pandas read_csv fails when trying to read csv
I am trying to read Indonesia's life expectancy at birth (the link, if you want to look, is https://data.worldbank.org/indicator/SP.DYN.LE00.IN?locations=ID ), but I can't. This is my code:
import pandas as pd
import matplotlib.pyplot as plt
lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
print(lifeexpectacion)
The error is:
File "D:\programaizar\data economy\main.py", line 4, in <module>
lifeexpectacion = pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv")
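The first 4 lines of the CSV contain information such as the title and the last-updated date. You need to skip the first 4 lines of the data file. Use pd.read_csv("API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv", skiprows=4)
I downloaded the linked file to see whether I could reproduce the error. There is an existing post about a similar problem.
The first four lines of the CSV are: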
"Data Source","World Development Indicators",
"Last Updated Date","2022-12-22",
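If you remove these lines, it works as expected. The metadata lines lead pandas to expect only the two or three fields they contain, when the data actually has sixty-seven columns.
Works for me: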
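To make the explanation above concrete, here is a minimal sketch that reproduces the failure on an in-memory string instead of the downloaded file. The miniature CSV is hypothetical: the blank lines between the metadata rows and all numeric values are assumptions made for illustration, not data taken from the World Bank file.

```python
import io

import pandas as pd

# Hypothetical miniature of the World Bank file layout. The blank lines between
# the metadata rows and all numeric values are assumptions for illustration.
raw = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2022-12-22",\n'
    "\n"
    '"Country Name","Country Code","1960","1961",\n'
    '"Aruba","ABW","64.0","64.5",\n'
    '"Indonesia","IDN","46.5","47.0",\n'
)

# Without skiprows, the first metadata line is taken as the header, so the
# wider data rows that follow cannot be tokenized.
error_message = None
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    error_message = str(exc)
print("without skiprows:", error_message)

# Skipping the four leading lines makes the real header row the first line read.
fixed = pd.read_csv(io.StringIO(raw), skiprows=4)
print(fixed.shape)
print(list(fixed.columns))
```

The trailing comma on each row is why an extra, empty "Unnamed" column shows up after the last year.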
df = pd.read_csv(r'D:\temp\API_SP.DYN.LE00.IN_DS2_en_csv_v2_4770434.csv',skiprows=4)
df
Out[149]:
                    Country Name Country Code  ...  2021  Unnamed: 66
0                          Aruba          ABW  ...   NaN          NaN
1    Africa Eastern and Southern          AFE  ...   NaN          NaN
2                    Afghanistan          AFG  ...   NaN          NaN
3     Africa Western and Central          AFW  ...   NaN          NaN
4                         Angola          AGO  ...   NaN          NaN
..                           ...          ...  ...   ...          ...
261                       Kosovo          XKX  ...   NaN          NaN
262                  Yemen, Rep.          YEM  ...   NaN          NaN
263                 South Africa          ZAF  ...   NaN          NaN
264                       Zambia          ZMB  ...   NaN          NaN
265                     Zimbabwe          ZWE  ...   NaN          NaN
[266 rows x 67 columns]
df.columns
Out[150]:
Index(['Country Name', 'Country Code', 'Indicator Name', 'Indicator Code',
'1960', '1961', '1962', '1963', '1964', '1965', '1966', '1967', '1968',
'1969', '1970', '1971', '1972', '1973', '1974', '1975', '1976', '1977',
'1978', '1979', '1980', '1981', '1982', '1983', '1984', '1985', '1986',
'1987', '1988', '1989', '1990', '1991', '1992', '1993', '1994', '1995',
'1996', '1997', '1998', '1999', '2000', '2001', '2002', '2003', '2004',
'2005', '2006', '2007', '2008', '2009', '2010', '2011', '2012', '2013',
'2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021',
'Unnamed: 66'],
dtype='object')
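Since the original goal was Indonesia's life expectancy, a possible next step is sketched below: drop the empty trailing "Unnamed" column and pull out the Indonesia row as a year-indexed series. This is only one way to do it. The in-memory sample and its numbers are made up for illustration, and the names year_cols and life are hypothetical. With the real file, the same steps would be applied to the DataFrame returned by read_csv with skiprows=4.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the World Bank file;
# the numeric values are made up for illustration.
raw = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","2022-12-22",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961",\n'
    '"Aruba","ABW","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","64.0","64.5",\n'
    '"Indonesia","IDN","Life expectancy at birth, total (years)","SP.DYN.LE00.IN","46.5","47.0",\n'
)

df = pd.read_csv(io.StringIO(raw), skiprows=4)

# Drop the empty column created by the trailing comma on every row.
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]

# Year columns are the ones whose names are all digits.
year_cols = [col for col in df.columns if col.isdigit()]

# Select the Indonesia row and turn it into a series indexed by integer year.
life = df.loc[df["Country Code"] == "IDN", year_cols].iloc[0]
life.index = life.index.astype(int)
print(life)
```

Filtering on "Country Code" rather than "Country Name" avoids depending on the exact spelling of the country name in the file.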