pandas read_csv fails when trying to read csv
I'm trying to read the life expectancy at birth data for Indonesia (here is the link if you want to take a look: https://data.worldbank.org/indicator/SP.DYN.LE00.IN?locations=ID), but I can't. This is my code:
import pandas as pd
import matplotlib.pyplot as plt
lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
print(lifeexpectacion)
The error is:
File "D:\programaizar\data economy\main.py", line 4, in <module>
lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
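The first 4 lines of the CSV contain the title, the last-updated date, and similar information. You need to skip the first 4 lines of the data file. Use pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv", skiprows=4)
I downloaded the linked file to see whether I could reproduce the error. This post has a similar problem.
The first four lines of the csv are: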
"Data Source","World Development Indicators",
"Last Updated Date","2022-12-22",
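If you delete these lines, it works as expected. The metadata misleads pandas into thinking there are only two columns, when there are actually sixty-seven.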
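To illustrate the point, here is a minimal sketch that reproduces the failure on a small in-memory sample rather than the real download. Only the two metadata lines come from the file. The blank lines between them, the shortened header, and the placeholder numbers are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical miniature of the World Bank file: two metadata lines (each
# assumed to be followed by a blank line), then the header and two data rows.
# The numeric values are placeholders, not real data.
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2022-12-22",\n'
    '\n'
    '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961",\n'
    '"Aruba","ABW","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","1.0","2.0",\n'
    '"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","3.0","4.0",\n'
)

# Without skiprows, pandas treats the first metadata line as the header and
# then fails when it reaches the wider rows that follow.
try:
    pd.read_csv(io.StringIO(sample))
    failed = False
except pd.errors.ParserError as exc:
    failed = True
    print("read_csv failed:", exc)

# Skipping the first four physical lines makes line 5 the header row.
df = pd.read_csv(io.StringIO(sample), skiprows=4)
print(df.shape)
print(df.columns.tolist())
```

The same `skiprows=4` argument should carry over to the real file, provided its header really sits on line 5.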
This works for me:
df = pd.read_csv(r'D:\temp\API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv',skiprows=4)
df
Out[149]:
Country Name Country Code ... 2021 Unnamed: 66
0 Aruba ABW ... NaN NaN
1 Africa Eastern and Southern AFE ... NaN NaN
2 Afghanistan AFG ... NaN NaN
3 Africa Western and Central AFW ... NaN NaN
4 Angola AGO ... NaN NaN
.. ... ... ... ... ...
261 Kosovo XKX ... NaN NaN
262 Yemen, Rep. YEM ... NaN NaN
263 South Africa ZAF ... NaN NaN
264 Zambia ZMB ... NaN NaN
265 Zimbabwe ZWE ... NaN NaN
[266 rows x 67 columns]
df.columns
Out[150]:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
'1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
'1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
'1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
'1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
'1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
'2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
'2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
'Unnamed: 66'],
dtype='object')
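The `Unnamed: 66` column in the output above most likely comes from a trailing comma at the end of each line in the file. As a possible follow-up, not part of the original answer, the sketch below drops such columns and picks out the Indonesia row. The sample is a made-up miniature of the file with placeholder values, and filtering on `IDN` assumes the file uses that country code for Indonesia.

```python
import io

import pandas as pd

# Hypothetical miniature of the file; values are placeholders, not real data.
sample = (
    '"Data Source","World Development Indicators",\n'
    '\n'
    '"Last Updated Date","2022-12-22",\n'
    '\n'
    '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961",\n'
    '"Aruba","ABW","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","1.0","2.0",\n'
    '"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","3.0","4.0",\n'
)

df = pd.read_csv(io.StringIO(sample), skiprows=4)

# Drop the empty column(s) that pandas names "Unnamed: N" because every
# line ends with a trailing comma.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Keep only the row for Indonesia (country code assumed to be "IDN").
indonesia = df[df["Country Code"] == "IDN"]
print(indonesia)
```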