
Inner join with group by in pandas (Python)

I have two dataframes named geostat and res, shown below:

geostat:
      count percent  grpno. state code
0          14.78       1         CA
1           0.00       2         CA
2           8.80       3         CA
3           9.60       4         FL
4          55.90       4         MA
5           0.00       2         FL
6           0.00       6         NC
7           0.00       5         NC
8           6.90       1         FL
9          59.00       4         MA
res:
    grpno.  MaxOfcount percent
0       1               14.78
1       2                0.00
2       3                8.80
3       4               59.00
4       5                0.00
5       6                0.00

I want to select FIRST(res.MaxOfcount percent), res.grpno., and FIRST(geostat.state code) from geostat inner joined with res on res.MaxOfcount percent = geostat.count percent AND res.grpno. = geostat.grpno., grouped by res.grpno.

I want to do this in Python with pandas, but I am not sure how to combine an inner join with a group by. Can anyone help me with this?

The expected output dataframe is given below:

   FirstOfMaxOfState count percent  state pool number FirstOfstate code
0                            14.78                  1                CA
1                             0.00                  2                CA
2                             8.80                  3                CA
3                            59.00                  4                MA
4                             0.00                  5                NC
5                             0.00                  6                NC

NOTE: FIRST(column name) is an Access function. What would be its equivalent in Python?

EDITED: Changed the output dataframe.

Use pandas.DataFrame.merge() for the inner join:

geostat.merge(res, left_on=['count percent', 'grpno.'], right_on=['MaxOfcount percent', 'grpno.'], how='inner')

   count percent  grpno. state code  MaxOfcount percent
0          14.78       1         CA               14.78
1           0.00       2         CA                0.00
2           0.00       2         FL                0.00
3           8.80       3         CA                8.80
4           0.00       6         NC                0.00
5           0.00       5         NC                0.00
6          59.00       4         MA               59.00
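
The merge above still returns two rows for group 2, because both the CA and FL rows match the maximum of 0.00. A minimal sketch of the remaining step follows, assuming that groupby(...).first() is an acceptable stand-in for the Access FIRST() function (it takes the first non-null value per column within each group, in row order). The dataframes are rebuilt from the tables in the question, and the variable names merged and result are only illustrative.

```python
import pandas as pd

# Rebuild the two dataframes from the tables in the question.
geostat = pd.DataFrame({
    "count percent": [14.78, 0.00, 8.80, 9.60, 55.90, 0.00, 0.00, 0.00, 6.90, 59.00],
    "grpno.": [1, 2, 3, 4, 4, 2, 6, 5, 1, 4],
    "state code": ["CA", "CA", "CA", "FL", "MA", "FL", "NC", "NC", "FL", "MA"],
})
res = pd.DataFrame({
    "grpno.": [1, 2, 3, 4, 5, 6],
    "MaxOfcount percent": [14.78, 0.00, 8.80, 59.00, 0.00, 0.00],
})

# Inner join on both conditions: matching group number and matching percentage.
merged = geostat.merge(
    res,
    left_on=["count percent", "grpno."],
    right_on=["MaxOfcount percent", "grpno."],
    how="inner",
)

# Group by the group number and keep the first value of each remaining column.
# Assumption: first() is used here as the counterpart of Access FIRST().
result = merged.groupby("grpno.", as_index=False)[
    ["MaxOfcount percent", "state code"]
].first()

print(result)
```

With the sample data this gives one row per grpno., with state codes CA, CA, CA, MA, NC, NC, which matches the expected output in the question. Note that the join compares floating-point percentages for exact equality; that works here because the values in res were taken from geostat, but it may need rounding if the two columns are computed separately.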


 