This is an unusual problem. I am trying to extract a table from a website (I can't share the link for security reasons). The table loads normally when I open the site in a browser, but when I use Inspect Element on any value in that table, the table markup is not visible.
It just shows <html>_</html> with some scripts and links inside.
I first tried to extract the table with BeautifulSoup, but that was unsuccessful.
Then I tried pandas.read_html(html),
but the page contains more than one table, and the output looks something like this:
[ Code Name
0 A John
1 B Terry
2 C Kitty
Column 1 Column 2 Column 3
0 1 0.6173661242 8
1 2 0.7232098163 20
2 3 0.9954581943 39
3 4 0.5595425507 18
4 5 0.9644025159 20
5 6 0.3914102544 29
6 7 0.0154642132 49
....
[873 rows x 3 columns],
0\n\t\t\t\t\t\t\t\t\t
0 0 ]
Then I tried pandas.read_html(html, match="Column 1"),
which raises this error:
ValueError: No tables found matching pattern 'Column 1'
Any idea how I can use read_html to extract the table I need?
When you scrape data from a site like this, the site may be using JavaScript to load the tables, so the table markup never appears in the HTML you download. This could be why BeautifulSoup is not returning anything.
Do the "scripts and links inside" look like JavaScript?
If so, maybe have a look at Selenium, which drives a real browser so you can read the page after its scripts have run.
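To make the suggestion concrete, here is a minimal sketch rather than a tested solution for your site. It assumes the table is built by JavaScript and is not inside an iframe, that Chrome and Selenium are installed, and that the header text really is "Column 1", "Column 2", "Column 3" as in your printed output. The function names pick_table and fetch_rendered_html are made up for this example. Instead of relying on match=, it picks the table out of the list that read_html already returns by checking its column names.

```python
from io import StringIO

import pandas as pd


def pick_table(html, required_columns):
    """Return the first parsed table whose header contains all required columns."""
    tables = pd.read_html(StringIO(html))  # one DataFrame per <table> found
    wanted = set(required_columns)
    for table in tables:
        # Collapse runs of whitespace so "Column\n 1" still equals "Column 1".
        names = {" ".join(str(col).split()) for col in table.columns}
        if wanted <= names:
            return table
    raise ValueError(f"no table with columns {sorted(wanted)}")


def fetch_rendered_html(url, wait_seconds=10):
    """Load the page in a real browser so its scripts run, then return the HTML."""
    # Imported here so pick_table still works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is available on this machine
    try:
        driver.get(url)
        # Wait until at least one <table> exists in the rendered page.
        WebDriverWait(driver, wait_seconds).until(
            EC.presence_of_element_located((By.TAG_NAME, "table"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would be something like html = fetch_rendered_html(your_url) followed by df = pick_table(html, ["Column 1", "Column 2", "Column 3"]). If the site needs a login, or the table sits inside an iframe, the Selenium part will need extra steps that are not shown here.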