- Making sure that the text can be pulled out of a URL;
- Processing the text.
Extract data from HTML file stored in C drive of Laptop
Hello Everyone,
I want to extract data from a local HTML file stored on the C drive of my laptop.
Can anyone guide me on how to extract the data from the HTML file, convert it into a char array, and use it in later steps?
The file is in HTML format and its link looks something like file:///C:/Users/Pranav/OneDrive/Desktop/.....................................
Commands that I have already used: 1) str = fileread('xxxxxxxxxxxxxxxxx.html') followed by data = extractHTMLString(str),
but the output data is a 1 x 1000000 array in which each letter is treated as a separate element.
I look forward to some good advice.
Thanks in advance!
1 Comment
Walter Roberson
on 6 Sep 2022
As an experiment, what happens if you fileread() the file directly and process that?
You have two separate issues: making sure that the text can be pulled out of a URL, and processing the text.
Reading the file without a URL lets you test the processing part separately from reading from the URL.
To test reading from the URL, you could fileread() from the URL and fileread() from the local file without the URL, and compare the two.
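A minimal sketch of testing the two parts separately, assuming a Windows-style file:/// link. The URL, the temporary file, and the variable names below are hypothetical, and percent-encoded characters (such as %20) in the link are not handled. It also shows that fileread() returns a 1-by-N char vector with one element per character, so a very wide result is simply the raw file content.

```matlab
% Hypothetical file URL, as copied from a browser address bar
fileUrl = 'file:///C:/Users/example/Desktop/page.html';

% Part 1: turn the URL into a plain local path that fileread() accepts
localPath = regexprep(fileUrl, '^file:///', '');   % drop the scheme prefix
localPath = strrep(localPath, '/', '\');           % use Windows separators

% Part 2: try the processing on a small, known file first
htmlSample = '<html><body><p>Hello</p></body></html>';
tmpFile = [tempname '.html'];                      % hypothetical scratch file
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s', htmlSample);
fclose(fid);

raw = fileread(tmpFile);   % char row vector, one element per character
disp(class(raw));
disp(size(raw));
delete(tmpFile);           % clean up the scratch file
```

Once the processing works on the small file, the same code can be pointed at localPath.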
Answers (1)
Saffan
on 30 Aug 2023
Hi,
To accomplish this, you can add a step to your code that creates an HTML tree using the “htmlTree” function. This function parses the HTML code in the string and returns the resulting tree structure. You can then extract the text from the HTML tree as shown in the following code snippet:
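% Path to the local HTML file (placeholder; replace with your own path)
filePath = 'xxxxxxxxxxxxxxxxx.html';
% Read the HTML file into a char vector
htmlContent = fileread(filePath);
% Create an HTML tree from the content
tree = htmlTree(htmlContent);
% Extract the text from the HTML tree (markup is removed)
data = extractHTMLText(tree);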
Refer to the documentation for htmlTree and extractHTMLText for more information.
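As a rough illustration of using the result further, the sketch below runs the same steps on a small inline HTML string instead of a file, then converts the extracted text to a char array. It assumes Text Analytics Toolbox is installed (htmlTree and extractHTMLText belong to it); the sample HTML and the variable names are made up for the example.

```matlab
% Hypothetical sample HTML, used in place of the file content
htmlContent = '<html><body><h1>Title</h1><p>First paragraph.</p></body></html>';

% Parse the HTML and extract the visible text
tree = htmlTree(htmlContent);
txt  = extractHTMLText(tree);   % string scalar with the markup removed

% Convert to a char array if later code expects one
chars = char(txt);

% Optionally split the text into separate lines for further processing
textLines = splitlines(txt);

disp(txt);
disp(class(chars));
```

Keeping the text as a string until the last step makes functions such as splitlines and contains convenient to use; char() is only needed where a char array is required.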