Pandas groupby based on column value

I have the following dataframe, dfgeo:

              x            y         z  zt  n  k  pv                         geometry        dist
0   6574878.210  4757530.610  1152.588   1  8  4  90  POINT (6574878.210 4757530.610)    0.000000
1   6574919.993  4757570.314  1174.724   0            POINT (6574919.993 4757570.314)   57.638760
2   6575020.518  4757665.839  1177.339   0            POINT (6575020.518 4757665.839)  138.673362
3   6575239.548  4757873.972  1160.156   1  8  4  90  POINT (6575239.548 4757873.972)  302.148120
4   6575351.603  4757980.452  1202.418   0            POINT (6575351.603 4757980.452)  154.577856
5   6575442.780  4758067.093  1199.297   0            POINT (6575442.780 4758067.093)  125.777217
6   6575538.217  4758157.782  1192.914   1  8  4  90  POINT (6575538.217 4758157.782)  131.653772
7   6575594.625  4758240.033  1217.442   0            POINT (6575594.625 4758240.033)   99.735096
8   6575738.820  4758450.289  1174.477   0            POINT (6575738.820 4758450.289)  254.950551
9   6575850.937  4758613.772  1123.852   1  8  4  90  POINT (6575850.937 4758613.772)  198.234490
10  6575984.323  4758647.118  1131.761   0            POINT (6575984.323 4758647.118)  137.491020
11  6576204.312  4758702.115  1119.407   0            POINT (6576204.312 4758702.115)  226.759410
12  6576303.976  4758727.031  1103.064   0            POINT (6576303.976 4758727.031)  102.731300
13  6576591.496  4758798.910   1114.06   0            POINT (6576591.496 4758798.910)  296.368590
14  6576736.965  4758835.277  1120.285   1  8  4  90  POINT (6576736.965 4758835.277)  149.945952

I am trying to group by zt values and sum the dist column within each group. I have tried this:

def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(
        D=('dist', 'sum')
    )
dfzp=dfgeo.apply(summarize)

But I get the following error on the last line of code:

    s = group['zt'].eq(1).cumsum()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 109, in pandas._libs.index.Int64Engine._check_type
KeyError: 'zt'

Any help in resolving this is appreciated.

If you need to pass the whole DataFrame to the function, call it directly:

dfzp=summarize(dfgeo)

Or use DataFrame.pipe:

dfzp=dfgeo.pipe(summarize)
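
As a rough check, here is a self-contained sketch of that call. It rebuilds only the zt and dist columns from the question and assumes zt is stored as integers (the printed table does not show dtypes, so that is a guess).

```python
import pandas as pd

# Only zt and dist are reproduced; zt is assumed to be an integer column.
dfgeo = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
    "dist": [0.000000, 57.638760, 138.673362, 302.148120, 154.577856,
             125.777217, 131.653772, 99.735096, 254.950551, 198.234490,
             137.491020, 226.759410, 102.731300, 296.368590, 149.945952],
})

def summarize(group):
    # Every row with zt == 1 starts a new segment; cumsum numbers them 1, 2, 3, ...
    s = group["zt"].eq(1).cumsum()
    # Sum dist within each segment, using named aggregation for the output column D.
    return group.groupby(s).agg(D=("dist", "sum"))

dfzp = dfgeo.pipe(summarize)  # equivalent to summarize(dfgeo)
print(dfzp)
```

With this data, zt equals 1 on five rows, so the result has five rows, one per segment, each holding the summed dist in column D.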

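DataFrame.apply calls the function once per column (or once per row if axis=1), so summarize receives a Series instead of the DataFrame. A column Series is indexed by the row labels, so group['zt'] finds no such label and raises KeyError: 'zt'.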
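
To see what DataFrame.apply actually hands to the function, a small sketch like the following may help. The four-row frame and the describe helper are made up for illustration; only the column names mirror dfgeo.

```python
import pandas as pd

# Tiny illustrative frame; the values are arbitrary.
df = pd.DataFrame({"zt": [1, 0, 0, 1], "dist": [0.0, 57.6, 138.7, 302.1]})

def describe(obj):
    # Hypothetical helper: report the object type and whether obj["zt"] works.
    try:
        obj["zt"]
        lookup = "ok"
    except KeyError:
        lookup = "KeyError"
    return f"{type(obj).__name__} named {obj.name!r}: {lookup}"

# Default axis=0: one call per column; the Series is indexed by row labels 0..3.
col_report = df.apply(describe)
print(col_report)

# axis=1: one call per row; the Series is indexed by the column names.
row_report = df.apply(describe, axis=1)
print(row_report)
```

Per column, the lookup fails because 'zt' is not a row label. Per row, the lookup succeeds but returns a single scalar, so the cumulative grouping in summarize would still not be possible there.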
