Reading content from a web URL

Question

PS on 29 Aug 2024


https://es.mathworks.com/matlabcentral/answers/2148759-reading-conetent-from-web-url

Answered: PS on 30 Aug 2024

I know how to read URLs and save their content for further analysis.

The issue I am facing is that I want to read only certain content from a URL, in a specific way.

For example, take this URL: https://www.gem.wiki/Almaty-2_power_station. I would like to read Table 2 as a table, or read only the tables that contain specific words.

From searching the internet I found that tables can be read directly from URLs, but I am not sure whether the table I want is an actual HTML table or just text content.

Any help would be great.

2 comments

Mario Malic on 29 Aug 2024

There is no content on this page.

Voss on 29 Aug 2024

Try without the period at the end:

https://www.gem.wiki/Almaty-2_power_station


Answer 1

Rahul on 30 Aug 2024


Hi @PS,

I understand that you want to read 'Table 2' from the page at https://www.gem.wiki/Almaty-2_power_station into MATLAB.

You can achieve this with the following code:

url = 'https://www.gem.wiki/Almaty-2_power_station';
htmlContent = webread(url);  % Read the page content from the URL
tree = htmlTree(htmlContent);  % Parse the HTML into a DOM tree (Text Analytics Toolbox)
tables = findElement(tree, "table");  % Find all <table> elements in the DOM tree
secondTableElement = tables(4);  % Index 4 is used because other elements of the page are also matched as tables
% Find all rows in the second table
rows = findElement(secondTableElement, "tr");
% Initialize a cell array to store table data
tableData = {};
columnNames = {};
headerCells = findElement(rows(1), "th");
% Extract header text
for j = 1:numel(headerCells)
    columnNames{j} = strtrim(extractHTMLText(headerCells(j)));
end
% Extract data rows
for i = 2:numel(rows)  
    
    cells = findElement(rows(i), "td");
    
    % Extract text from each cell
    rowData = cell(1, numel(cells));
    for j = 1:numel(cells)
        rowData{j} = strtrim(extractHTMLText(cells(j)));
    end
    tableData = [tableData; rowData];
end
% The following part is just to get a string cell array for the header
headerCellstring = cell(size(columnNames));
for i = 1:numel(columnNames)
    headerCellstring{i} = columnNames{i}{1};
end
% Obtain the table using 'cell2table' function
secondTable = cell2table(tableData, 'VariableNames', headerCellstring);
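
The fixed index in `tables(4)` depends on the layout of this particular page. As a minimal sketch, assuming the same Text Analytics Toolbox functions used above, the tables could instead be filtered by a word they contain. The search term "CHP" and the names `keyword`, `isMatch`, and `matchingTables` are illustrative assumptions rather than anything the page or the toolbox prescribes.

```matlab
% Sketch: select tables by keyword instead of by a fixed index.
% Requires Text Analytics Toolbox (htmlTree, findElement, extractHTMLText).
url = 'https://www.gem.wiki/Almaty-2_power_station';
keyword = "CHP";  % assumed search term; replace with the word you need

tree = htmlTree(webread(url));
tables = findElement(tree, "table");  % every <table> element on the page

% Flag each table whose visible text contains the keyword
isMatch = false(numel(tables), 1);
for k = 1:numel(tables)
    isMatch(k) = contains(extractHTMLText(tables(k)), keyword);
end

matchingTables = tables(isMatch);
fprintf('%d of %d tables contain "%s"\n', nnz(isMatch), numel(tables), keyword);
```

Each element of `matchingTables` can then be passed through the same row and cell loops shown above. If `numel(tables)` is zero, the page has no `<table>` elements and the content would have to be extracted as text instead.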

You can refer to the following documentation:

'webread': https://www.mathworks.com/help/releases/R2024a/matlab/ref/webread.html?searchHighlight=webread&s_tid=doc_srchtitle

'htmlTree': https://www.mathworks.com/help/releases/R2024a/textanalytics/ref/htmltree.html?searchHighlight=htmltree&s_tid=doc_srchtitle

'findElement': https://www.mathworks.com/help/releases/R2024a/textanalytics/ref/htmltree.findelement.html?searchHighlight=findElement&s_tid=doc_srchtitle

'cell2table': https://www.mathworks.com/help/releases/R2024a/matlab/ref/cell2table.html?searchHighlight=cell2table&s_tid=doc_srchtitle

Hope this helps! Thanks.

1 comment

PS on 30 Aug 2024

Edited: PS on 30 Aug 2024

@Rahul I can't thank you enough. I was playing with htmlTree and findElement but could not figure out how to go further with them.

Your code will save me a ton of time, as I have hundreds of URLs to scan through. My sincere gratitude.

Thanks!


Answer 2

PS on 30 Aug 2024


I figured out another solution using readtable:

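url = "https://www.gem.wiki/Almaty-2_power_station";
% Select the table that has a header cell with the text 'CHP'
opts = htmlImportOptions('TableSelector', "//TABLE[.//TH='CHP']");
opts.VariableNamesRow = 1;  % The first row holds the column names
opts.DataRows = [2 Inf];    % Data runs from row 2 to the end
T = readtable(url, opts);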
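
Since there are hundreds of URLs to scan, the same import options could be reused in a loop. The following is only a sketch under stated assumptions: the `urls` list is a placeholder holding just the one page from this thread, the names `results` and `err` are illustrative, and it assumes every page marks its table with the same 'CHP' header cell.

```matlab
% Sketch: apply the same import options to a list of pages.
urls = ["https://www.gem.wiki/Almaty-2_power_station"];  % placeholder list; add further pages here

opts = htmlImportOptions('TableSelector', "//TABLE[.//TH='CHP']");
opts.VariableNamesRow = 1;
opts.DataRows = [2 Inf];

results = cell(numel(urls), 1);
for k = 1:numel(urls)
    try
        results{k} = readtable(urls(k), opts);
    catch err
        % A page that fails to load or has no matching table is skipped
        warning('Skipping %s: %s', char(urls(k)), err.message);
        results{k} = table();
    end
end
```

The `try`/`catch` keeps one failing page from stopping the whole run; skipped pages are left as empty tables in `results`, so they can be found afterwards with `cellfun(@isempty, results)`.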
