Pandas dataframe combine unique row values

I have a dataframe like the following, with over 90000 rows.

origin      destination people
101011001   101011001   7378
101011001   101011002   120
101011001   101011002   8
101011001   101011002   285
101011001   101011003   7
101011001   101011004   0
101011001   101011004   1
101011001   101011004   2
101011001   101011004   9
101011002   101011001   5

As you can see, some origin and destination pairs repeat; for example, there are multiple rows where origin=101011001 and destination=101011002. My goal is to group the repeating origin and destination values and sum the people column, so the dataframe looks like this:

origin      destination people
101011001   101011001   7378
101011001   101011002   413
101011001   101011003   7
101011001   101011004   12
101011002   101011001   5

I've tried jsondf.groupby(['origin', 'destination']).sum(), which gives me the correct sums and destination values, but it's not quite what I want: I want the origin value to also be shown in the row for each destination.
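A minimal sketch reproducing this behaviour, assuming the sample rows above are loaded into a DataFrame named `jsondf` (the name used in the attempt). After `groupby(...).sum()`, `origin` and `destination` are no longer ordinary columns; they become the two levels of a MultiIndex, which is why each origin value is printed only once per group.

```python
import pandas as pd

# Sample rows copied from the question (the real data has over 90000 rows).
jsondf = pd.DataFrame({
    "origin": [101011001] * 9 + [101011002],
    "destination": [101011001, 101011002, 101011002, 101011002, 101011003,
                    101011004, 101011004, 101011004, 101011004, 101011001],
    "people": [7378, 120, 8, 285, 7, 0, 1, 2, 9, 5],
})

grouped = jsondf.groupby(["origin", "destination"]).sum()

# The group keys now live in a two-level MultiIndex, not in regular columns.
print(list(grouped.index.names))  # ['origin', 'destination']
print(list(grouped.columns))      # ['people']
```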

Note: my end goal is to get this dataframe into a SQL database as a table, and with the .groupby() code above, the origin and destination values are actually interpreted as NULL, which is not what I want.

Thanks!

A quick and easy way to get each of your origin values to display is to reset the index after the groupby. Here is an example that shows what the dataframe looks like before and after resetting the index:

    df.groupby(['origin', 'destination']).sum()

origin      destination  people
101011001   101011001    7378
            101011002    413
            101011003    7
            101011004    12
101011002   101011001    5

Once you add reset_index(), the dataframe will show the origin value in every row.

    df.groupby(['origin', 'destination']).sum().reset_index()

    origin      destination people
0   101011001   101011001   7378
1   101011001   101011002   413
2   101011001   101011003   7
3   101011001   101011004   12
4   101011002   101011001   5
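The same steps as a self-contained sketch, using the sample rows from the question. `reset_index()` turns the `origin` and `destination` index levels back into regular columns. Passing `as_index=False` to `groupby` is a standard pandas alternative (not part of the answer above) that produces the same flat table in one step.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "origin": [101011001] * 9 + [101011002],
    "destination": [101011001, 101011002, 101011002, 101011002, 101011003,
                    101011004, 101011004, 101011004, 101011004, 101011001],
    "people": [7378, 120, 8, 285, 7, 0, 1, 2, 9, 5],
})

# Sum people per (origin, destination) pair, then move the keys back into columns.
summed = df.groupby(["origin", "destination"]).sum().reset_index()

# Equivalent one-step form: keep the group keys as columns from the start.
summed_alt = df.groupby(["origin", "destination"], as_index=False).sum()

print(summed)
```

Either form yields a default 0..n-1 row index with `origin`, `destination` and `people` as ordinary columns.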

This should allow you to send the dataframe to the SQL database without the origin values being interpreted as NULL.
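A minimal sketch of that final step, assuming an in-memory SQLite database via Python's built-in `sqlite3` module and a hypothetical table name `trips`; the question does not say which SQL database is actually used. `index=False` stops pandas from also writing the 0..n-1 index created by `reset_index()` as an extra column.

```python
import sqlite3

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "origin": [101011001] * 9 + [101011002],
    "destination": [101011001, 101011002, 101011002, 101011002, 101011003,
                    101011004, 101011004, 101011004, 101011004, 101011001],
    "people": [7378, 120, 8, 285, 7, 0, 1, 2, 9, 5],
})
summed = df.groupby(["origin", "destination"]).sum().reset_index()

# In-memory SQLite database; stands in for whatever SQL database is really used.
con = sqlite3.connect(":memory:")

# "trips" is a hypothetical table name; index=False skips the row index column.
summed.to_sql("trips", con, index=False)

# Read the table back to confirm origin and destination were stored as real values.
rows = con.execute(
    "SELECT origin, destination, people FROM trips ORDER BY origin, destination"
).fetchall()
null_origins = con.execute(
    "SELECT COUNT(*) FROM trips WHERE origin IS NULL OR destination IS NULL"
).fetchone()[0]
con.close()

print(rows)
```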

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM