Pandas sum includes column headers

Background

I have a dataset covering about 200 countries (rows) across different time periods (columns). The pandas DataFrame for this dataset is as follows:

import pandas as pd

data = {'Country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
        '1958-1962': [0, 0, 0, 0, 0],
        '2008-2012': [0.0, 0.0, 8.425, 0.0, 0.0],
        '2013-2017': [0.0, 0.0, 10.46, 0.0, 0.0]}

df = pd.DataFrame(data)

     Country  1958-1962  2008-2012  2013-2017
 Afghanistan          0      0.000       0.00
     Albania          0      0.000       0.00
     Algeria          0      8.425      10.46
     Andorra          0      0.000       0.00
      Angola          0      0.000       0.00

I am trying to obtain the sum of all the values in each column using the following code.

y_data = []

period_list = list(df)
period_list.remove('Country')

for x in period_list:
    y_data.append(df[x].sum())

Error

TypeError: unsupported operand type(s) for +: 'int' and 'str'

Process finished with exit code 1

For some reason, pandas seems to be including the header in the sum. How do I resolve this?

Other tests

I tested the sum function on the following dataframe using df.sum(), and it correctly produced the column sums 18, 20, 20, 19.

df = pd.DataFrame({"A":[5, 3, 6, 4], 
                   "B":[11, 2, 4, 3], 
                   "C":[4, 3, 8, 5], 
                   "D":[5, 4, 2, 8]}) 

The output of print(df.drop("Country",axis=1).dtypes) on the original dataset is as follows:

1958-1962    object
1963-1967    object
1968-1972    object
1973-1977    object
1978-1982    object
1983-1987    object
1988-1992    object
1993-1997    object
1998-2002    object
2003-2007    object
2008-2012    object
2013-2017    object
dtype: object
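
The object dtype above suggests that the period columns hold Python objects (for example strings mixed with integers) rather than numbers, which would explain the 'int' + 'str' error without the header being involved. A minimal sketch for locating the offending cells, assuming a hypothetical column that mixes integers, a numeric string, and an unparseable placeholder (the sample values are invented for illustration, not taken from the original dataset):

```python
import pandas as pd

# Hypothetical sample: an object column mixing ints and strings
s = pd.Series([0, "8.425", 0, "n/a"], dtype=object, name="2008-2012")

# Count which Python types are actually stored in the column
type_counts = s.map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# With errors="coerce", values that cannot be parsed as numbers become NaN,
# so the rows that turned into NaN (but were not missing before) are the culprits
converted = pd.to_numeric(s, errors="coerce")
bad_values = s[converted.isna() & s.notna()].tolist()
print(bad_values)
```

Running the same check on each period column of the real dataframe should show whether the strings are parseable numbers or placeholder text.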

Solution

I used df = df.apply(pd.to_numeric, errors='ignore') to convert the object columns to numbers, and that resolved the issue.

Convert the columns you want to sum from object to numeric, then drop the Country column before summing the remaining columns.

Refer to this link for converting from object to numeric.
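
The steps described in this answer can be sketched as follows. This is an illustration under assumptions, not the asker's exact fix: the sample frame is hypothetical and stores its numbers as strings to mimic object columns, and it uses errors="coerce" (unparseable values become NaN) instead of errors='ignore', which, as far as I know, is deprecated from pandas 2.2 onward.

```python
import pandas as pd

# Hypothetical data: numbers stored as strings, as an object column might hold them
df = pd.DataFrame({
    "Country": ["Afghanistan", "Albania", "Algeria"],
    "2008-2012": ["0.0", "0.0", "8.425"],
    "2013-2017": ["0.0", "0.0", "10.46"],
})

# Select every column except Country, so Country is never part of the sum
period_cols = df.columns.drop("Country")

# Convert each period column to a numeric dtype; bad values become NaN
numeric = df[period_cols].apply(pd.to_numeric, errors="coerce")

# Column-wise sums as a plain list, like y_data in the question
y_data = numeric.sum().tolist()
print(y_data)
```

Because sum() skips NaN by default, any value that failed to convert is left out of the total rather than raising an error, so it is worth checking for NaN first if silent omissions would be a problem.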
