I'm trying to read in a table using read_html
import requests
import pandas as pd
import numpy as np
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate'
resp = requests.get(url)
tables = pd.read_html(resp.text)
But I get this error
IndexError: list index out of range
Other Wiki pages work fine. What's up with this page and how do I solve the above error?
Seems like the table can't be read because of the jquery table sorter. It's easy to read tables with the selenium library into a df when you're dealing with jquery instead of plain html. You'll still need to do some cleanup, but this will get the table into a df.
You'll need to install the selenium library and download a web browser driver too.
from selenium import webdriver
driver = r'C:\chromedriver_win32\chromedriver.exe'
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate'
driver = webdriver.Chrome(driver)
driver.get(url)
the_table = driver.find_element_by_xpath('//*[@id="mw-content-text"]/div/table[2]/tbody/tr/td[2]/table')
data = the_table.text
df = pd.DataFrame([x.split() for x in data.split('\n')])
driver.close()
print(df)
0 1 2 3 4 5 \
0 Country (or dependent territory, None None
1 subnational area, etc.) Region Subregion Rate
2 listed Source None None None None
3 None None None None None None
4 Burundi Africa Eastern Africa 6.02 635
5 Comoros Africa Eastern Africa 7.70 60
6 Djibouti Africa Eastern Africa 6.48 60
7 Eritrea Africa Eastern Africa 8.04 390
8 Ethiopia Africa Eastern Africa 7.56 7,552
9 Kenya Africa Eastern Africa 5.00 2,466
10 Madagascar Africa Eastern Africa 7.69 1,863
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.