How to convert a list to a pandas DataFrame?

I use BeautifulSoup to get some data from a webpage:

import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get("http://www.nationmaster.com/country-info/stats/Media/Internet-users")
soup = BeautifulSoup(res.content,'html5lib')
table = soup.find_all('table')[0] 
df = pd.read_html(str(table))

df.head()

But df is a list, not the pandas DataFrame I expected from pd.read_html.

How can I get a pandas DataFrame out of it?

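read_html always returns a list of DataFrames, one per table it finds, so select the one you want by index. You can also pass your URL to read_html directly and take the first table with [0]: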

df = pd.read_html("http://www.nationmaster.com/country-info/stats/Media/Internet-users")[0]
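To see the list behaviour without hitting the network, here is a minimal sketch that parses an inline table. The HTML string and the names tables and first are made up for illustration, and it assumes an HTML parser that pandas can use (lxml, or bs4 with html5lib) is installed:

```python
from io import StringIO

import pandas as pd

# Hypothetical two-row table standing in for the real page.
html = """
<table>
  <thead><tr><th>#</th><th>COUNTRY</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>China</td></tr>
    <tr><td>2</td><td>United States</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))

# Indexing the list gives the DataFrame itself.
first = tables[0]

print(type(tables).__name__, len(tables))
print(type(first).__name__, first.shape)
```

The same indexing should apply to the code in the question: pd.read_html(str(table))[0] would give the DataFrame rather than the list.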

Then, if necessary, drop the GRAPH and HISTORY columns and replace the NaNs in the # column by forward filling:

df = df.drop(['GRAPH','HISTORY'], axis=1)
df['#'] = df['#'].ffill()
print(df.head())
   #                                       COUNTRY         AMOUNT  DATE
0  1                                         China    389 million  2009
1  2                                 United States    245 million  2009
2  3                                         Japan  99.18 million  2009
3  3  Group of 7 countries (G7) average  (profile)  80.32 million  2009
4  4                                        Brazil  75.98 million  2009

print(df.tail())

        #                                        COUNTRY AMOUNT  DATE
244   214                                           Niue   1100  2009
245  =215  Saint Helena, Ascension, and Tristan da Cunha    900  2009
246  =215                                   Saint Helena    900  2009
247   217                                        Tokelau    800  2008
248   218                               Christmas Island    464  2001
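The cleanup step can also be tried on a small hand-made frame. This is only a sketch: the sample rows and the names sample, cleaned and ranks are hypothetical, chosen to mimic a table where an aggregate row has no rank of its own:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the aggregate row has no rank, so '#' is NaN there,
# and GRAPH / HISTORY hold nothing useful.
sample = pd.DataFrame({
    "#": [1, 2, 3, np.nan, 4],
    "COUNTRY": ["China", "United States", "Japan", "G7 average", "Brazil"],
    "GRAPH": [np.nan] * 5,
    "HISTORY": [np.nan] * 5,
})

# Drop the unused columns, then copy the last seen rank into the gaps.
cleaned = sample.drop(columns=["GRAPH", "HISTORY"])
cleaned["#"] = cleaned["#"].ffill()

ranks = cleaned["#"].tolist()
print(cleaned)
```

Forward filling copies the previous non-missing value downward, which is why the aggregate row ends up with the same rank as the row above it.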
