How to merge two pandas DataFrames and aggregate one specific column

I have 2 DataFrames:

         city  count    school
0    New York      1  school_3
1  Washington      1  School_4
2  Washington      1  School_5
3          LA      1  School_1
4          LA      1  School_4

         city  count    school
0    New York      1  School_3
1  Washington      1  School_1
2          LA      1  School_3
3          LA      2  School_4

I want to get this result:

         city  count    school
0    New York      2  school_3
1  Washington      1  School_1
2  Washington      1  School_4
3  Washington      1  School_5
4          LA      1  School_1
5          LA      1  School_3
6          LA      3  School_4

Here is the code:

import pandas as pd

# Rows of the first DataFrame (note the lowercase 'school_3')
d1 = [{'city':'New York', 'school':'school_3', 'count':1},
      {'city':'Washington', 'school':'School_4', 'count':1},
      {'city':'Washington', 'school':'School_5', 'count':1},
      {'city':'LA', 'school':'School_1', 'count':1},
      {'city':'LA', 'school':'School_4', 'count':1}]

# Rows of the second DataFrame
d2 = [{'city':'New York', 'school':'School_3', 'count':1},
      {'city':'Washington', 'school':'School_1', 'count':1},
      {'city':'LA', 'school':'School_3', 'count':1},
      {'city':'LA', 'school':'School_4', 'count':2}]

x1 = pd.DataFrame(d1)
x2 = pd.DataFrame(d2)
# This just gives an empty DataFrame: the default merge is an inner join
# on all shared columns, and no row matches on city, school and count
print(pd.merge(x1, x2))

How can I get the aggregated result?

You can do:

>>> pd.concat([x1, x2]).groupby(["city", "school"], as_index=False)["count"].sum()
         city    school  count
0          LA  School_1      1
1          LA  School_3      1
2          LA  School_4      3
3    New York  School_3      1
4    New York  school_3      1
5  Washington  School_1      1
6  Washington  School_4      1
7  Washington  School_5      1

Note that New York appears 2 times because of a typo in the data (school_3 vs School_3).
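If the lowercase school_3 really is a typo, one way to get the result from the question is to normalize the school names before grouping. The following is a minimal sketch, not part of the original answer: it assumes that capitalizing the first letter with str.capitalize() is the right normalization for this data, and the variable names combined and result are only illustrative.

```python
import pandas as pd

x1 = pd.DataFrame({
    "city": ["New York", "Washington", "Washington", "LA", "LA"],
    "school": ["school_3", "School_4", "School_5", "School_1", "School_4"],
    "count": [1, 1, 1, 1, 1],
})
x2 = pd.DataFrame({
    "city": ["New York", "Washington", "LA", "LA"],
    "school": ["School_3", "School_1", "School_3", "School_4"],
    "count": [1, 1, 1, 2],
})

# Stack the two frames, then make the school names consistent.
# Assumption: the only inconsistency is the case of the first letter.
combined = pd.concat([x1, x2], ignore_index=True)
combined["school"] = combined["school"].str.capitalize()

# Sum the counts per (city, school) pair.
result = combined.groupby(["city", "school"], as_index=False)["count"].sum()
print(result)
```

With the names normalized, New York / School_3 collapses into a single row with count 2, and the frame has 7 rows instead of 8.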

Here's a slightly different implementation from @elyase's solution, using pandas.DataFrame.merge(...):

x1.merge(x2,on=['city', 'school', 'count'], how='outer').groupby(['city', 'school'], as_index=False)['count'].sum()

When timed with %timeit in an IPython notebook, this method is marginally faster than @elyase's (by less than 1 ms):

100 loops, best of 3: 6.25 ms per loop  #using concat(...) with @elyase's solution
100 loops, best of 3: 5.49 ms per loop #using merge(...) in this solution

If you want to use pandas' aggregate functionality, you can also do:

x1.merge(x2,on=['city', 'school', 'count'], how='outer').groupby(['city', 'school'], as_index=False).agg(numpy.sum)

The only caveat is that using agg(...) is the slowest of the 3 solutions.
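To reproduce the comparison outside a notebook, the standard-library timeit module can stand in for %timeit. This is only a sketch: the function names with_concat, with_merge and with_agg are made up for illustration, agg("sum") is used in place of agg(numpy.sum), and the numbers will vary with the machine and the pandas version.

```python
import timeit
import pandas as pd

x1 = pd.DataFrame({
    "city": ["New York", "Washington", "Washington", "LA", "LA"],
    "school": ["school_3", "School_4", "School_5", "School_1", "School_4"],
    "count": [1, 1, 1, 1, 1],
})
x2 = pd.DataFrame({
    "city": ["New York", "Washington", "LA", "LA"],
    "school": ["School_3", "School_1", "School_3", "School_4"],
    "count": [1, 1, 1, 2],
})
keys = ["city", "school"]

def with_concat():
    # Stack both frames, then sum counts per group.
    return pd.concat([x1, x2]).groupby(keys, as_index=False)["count"].sum()

def with_merge():
    # Outer merge on all three columns, then sum counts per group.
    merged = x1.merge(x2, on=keys + ["count"], how="outer")
    return merged.groupby(keys, as_index=False)["count"].sum()

def with_agg():
    # Same merge, but aggregating through agg().
    merged = x1.merge(x2, on=keys + ["count"], how="outer")
    return merged.groupby(keys, as_index=False).agg("sum")

# Best of 3 repeats, 50 calls each, reported as seconds per call.
timings = {}
for func in (with_concat, with_merge, with_agg):
    best = min(timeit.repeat(func, number=50, repeat=3)) / 50
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1000:.2f} ms per call")
```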

On this sample data, all 3 produce the same result:

         city    school  count
0          LA  School_1      1
1          LA  School_3      1
2          LA  School_4      3
3    New York  School_3      1
4    New York  school_3      1
5  Washington  School_1      1
6  Washington  School_4      1
7  Washington  School_5      1
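One thing worth checking before relying on the merge variant: because it joins on 'count' as well, a row that is identical in both DataFrames is matched and collapsed into one row instead of being counted twice. The sketch below uses two hypothetical one-row frames (a and b, not from the question) in which the school_3 typo has been fixed, to compare the two approaches.

```python
import pandas as pd

# Hypothetical data: the same row appears in both frames.
a = pd.DataFrame({"city": ["New York"], "school": ["School_3"], "count": [1]})
b = pd.DataFrame({"city": ["New York"], "school": ["School_3"], "count": [1]})
keys = ["city", "school"]

# concat keeps both rows, so the counts add up.
via_concat = pd.concat([a, b]).groupby(keys, as_index=False)["count"].sum()

# An outer merge on all three columns treats the rows as one match.
via_merge = (
    a.merge(b, on=["city", "school", "count"], how="outer")
     .groupby(keys, as_index=False)["count"].sum()
)

print(via_concat)  # count is 2
print(via_merge)   # count is 1
```

So if the same (city, school, count) combination can occur in both inputs, the concat-based version is the one that sums as the question asks.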
