Pandas read_html error when reading in a Wikipedia table

Question

I'm trying to read in a table from a Wikipedia page using pandas' read_html:

import requests 
import pandas as pd
import numpy as np

url = 'https://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate'
resp = requests.get(url)
tables = pd.read_html(resp.text)

But I get this error:

IndexError: list index out of range

Other Wiki pages work fine. What's up with this page and how do I solve the above error?

Answer 1

It seems the table can't be read because of the jQuery table sorter used on that page. When a table is driven by jQuery rather than plain HTML, the selenium library is an easy way to read it into a df, because it works on the page as the browser renders it. You'll still need to do some cleanup, but this will get the table into a df.

You'll need to install the selenium library and download a web browser driver too.


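import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Path to the downloaded ChromeDriver executable (adjust for your machine)
driver_path = r'C:\chromedriver_win32\chromedriver.exe'
url = 'https://en.wikipedia.org/wiki/List_of_countries_by_intentional_homicide_rate'

# Selenium 4 style: the driver path is passed through a Service object
driver = webdriver.Chrome(service=Service(driver_path))
driver.get(url)

# find_element_by_xpath was removed in Selenium 4; use find_element(By.XPATH, ...)
the_table = driver.find_element(By.XPATH, '//*[@id="mw-content-text"]/div/table[2]/tbody/tr/td[2]/table')

# Split the visible text into rows, then split each row on whitespace
data = the_table.text
df = pd.DataFrame([x.split() for x in data.split('\n')])

# Shut down the browser and the driver process
driver.quit()

print(df)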

Output:

                0          1           2           3          4       5   \
0          Country        (or   dependent  territory,       None    None   
1      subnational      area,       etc.)      Region  Subregion    Rate   
2           listed     Source        None        None       None    None   
3             None       None        None        None       None    None   
4          Burundi     Africa     Eastern      Africa       6.02     635   
5          Comoros     Africa     Eastern      Africa       7.70      60   
6         Djibouti     Africa     Eastern      Africa       6.48      60   
7          Eritrea     Africa     Eastern      Africa       8.04     390   
8         Ethiopia     Africa     Eastern      Africa       7.56   7,552   
9            Kenya     Africa     Eastern      Africa       5.00   2,466   
10      Madagascar     Africa     Eastern      Africa       7.69   1,863
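
As the output shows, splitting the element's text on whitespace breaks multi-word cells apart ("Eastern Africa" ends up in two columns), which is where most of the cleanup comes from. One possible way to reduce that, sketched here under assumptions rather than taken from the answer, is to read the element's HTML (for example with the_table.get_attribute('outerHTML')) and collect the cells row by row. The CellCollector class, the table_rows helper, the sample HTML, and the "Count" header label are all hypothetical; the sketch also assumes the element contains no further nested tables.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace but keep multi-word values together
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def table_rows(html):
    """Return the table's rows as lists of cell strings."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical stand-in for the_table.get_attribute('outerHTML');
# the values are copied from the output above, the layout is assumed.
sample_html = """
<table>
  <tr><th>Country</th><th>Region</th><th>Subregion</th><th>Rate</th><th>Count</th></tr>
  <tr><td>Burundi</td><td>Africa</td><td>Eastern Africa</td><td>6.02</td><td>635</td></tr>
  <tr><td>Ethiopia</td><td>Africa</td><td>Eastern Africa</td><td>7.56</td><td>7,552</td></tr>
</table>
"""

rows = table_rows(sample_html)
header, records = rows[0], rows[1:]
print(header)
print(records)
```

With rows in this shape, pd.DataFrame(records, columns=header) should give one column per table cell, and numeric columns can then be converted, for example by stripping the thousands separators before calling pd.to_numeric.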

solution1
1 2019-11-03 01:30:19
