Pandas groupby based on column values

Question

I have the following DataFrame, dfgeo:

              x            y         z  zt  n  k  pv                         geometry        dist
0   6574878.210  4757530.610  1152.588   1  8  4  90  POINT (6574878.210 4757530.610)    0.000000
1   6574919.993  4757570.314  1174.724   0            POINT (6574919.993 4757570.314)   57.638760
2   6575020.518  4757665.839  1177.339   0            POINT (6575020.518 4757665.839)  138.673362
3   6575239.548  4757873.972  1160.156   1  8  4  90  POINT (6575239.548 4757873.972)  302.148120
4   6575351.603  4757980.452  1202.418   0            POINT (6575351.603 4757980.452)  154.577856
5   6575442.780  4758067.093  1199.297   0            POINT (6575442.780 4758067.093)  125.777217
6   6575538.217  4758157.782  1192.914   1  8  4  90  POINT (6575538.217 4758157.782)  131.653772
7   6575594.625  4758240.033  1217.442   0            POINT (6575594.625 4758240.033)   99.735096
8   6575738.820  4758450.289  1174.477   0            POINT (6575738.820 4758450.289)  254.950551
9   6575850.937  4758613.772  1123.852   1  8  4  90  POINT (6575850.937 4758613.772)  198.234490
10  6575984.323  4758647.118  1131.761   0            POINT (6575984.323 4758647.118)  137.491020
11  6576204.312  4758702.115  1119.407   0            POINT (6576204.312 4758702.115)  226.759410
12  6576303.976  4758727.031  1103.064   0            POINT (6576303.976 4758727.031)  102.731300
13  6576591.496  4758798.910   1114.06   0            POINT (6576591.496 4758798.910)  296.368590
14  6576736.965  4758835.277  1120.285   1  8  4  90  POINT (6576736.965 4758835.277)  149.945952

I am trying to group by the zt values, starting a new group at each row where zt is 1, and summarize the dist column for each group. I have tried this:

def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(
        D=('dist', 'sum')
    )
dfzp=dfgeo.apply(summarize)

But I get the following error on the last line of code:

    s = group['zt'].eq(1).cumsum()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 109, in pandas._libs.index.Int64Engine._check_type
KeyError: 'zt'

Any help resolving this is appreciated.
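
To see what the grouping key in summarize produces, here is a minimal sketch that applies eq(1).cumsum() to the zt values copied from the table above. It assumes zt is stored as a number, not as a string.

```python
import pandas as pd

# zt values copied from the dfgeo table in the question (assumed to be numeric)
zt = pd.Series([1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1], name="zt")

# eq(1) is True on every row where zt == 1; cumsum() adds 1 at each True,
# so each zt == 1 row opens a new segment label that the following 0 rows inherit
segments = zt.eq(1).cumsum()
print(segments.tolist())  # [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 5]
```

Passing these labels to groupby makes each segment, rather than each distinct zt value, a group. If zt were stored as strings, eq(1) would be False everywhere and every row would end up in a single group labelled 0.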

Answer 1

If you need to pass the whole DataFrame to the function, call the function directly:

dfzp=summarize(dfgeo)
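
A minimal runnable sketch of the direct call. The DataFrame here is a reduced stand-in for dfgeo that keeps only the zt and dist columns from the question's table, assuming zt is numeric.

```python
import pandas as pd

def summarize(group):
    # Start a new segment at every row where zt == 1, then sum dist per segment
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(
        D=('dist', 'sum')
    )

# Reduced stand-in for dfgeo: only the zt and dist columns from the question's table
dfgeo = pd.DataFrame({
    'zt': [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1],
    'dist': [0.000000, 57.638760, 138.673362, 302.148120, 154.577856,
             125.777217, 131.653772, 99.735096, 254.950551, 198.234490,
             137.491020, 226.759410, 102.731300, 296.368590, 149.945952],
})

# The whole DataFrame is passed in, so group['zt'] selects the zt column
dfzp = summarize(dfgeo)
print(dfzp)
```

The result has one row per segment, with its index named zt because the key Series inherits that column name. The D values should come out to about 196.31, 582.50, 486.34, 961.58 and 149.95 for segments 1 to 5.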

Or use DataFrame.pipe, which calls summarize(dfgeo) for you:

dfzp=dfgeo.pipe(summarize)
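
DataFrame.pipe(func, *args, **kwargs) calls func(df, *args, **kwargs). It therefore returns the same result as the direct call while fitting into a method chain. The sketch below checks this on the first six rows of zt and dist from the table. summarize_by is a hypothetical variant added only to show that extra arguments are forwarded.

```python
import pandas as pd

def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(D=('dist', 'sum'))

# Hypothetical variant that takes the column to total as a parameter
def summarize_by(group, value_col):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(D=(value_col, 'sum'))

# Small stand-in frame: first six rows of zt and dist from the question
dfgeo = pd.DataFrame({
    'zt': [1, 0, 0, 1, 0, 0],
    'dist': [0.0, 57.638760, 138.673362, 302.148120, 154.577856, 125.777217],
})

# df.pipe(f) is just f(df), so both calls give the same frame
via_pipe = dfgeo.pipe(summarize)
pd.testing.assert_frame_equal(via_pipe, summarize(dfgeo))

# Extra arguments are forwarded: this calls summarize_by(dfgeo, value_col='dist')
via_pipe_kw = dfgeo.pipe(summarize_by, value_col='dist')
print(via_pipe_kw)
```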

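DataFrame.apply works differently. It calls the function once per column (the default, axis=0), or once per row if axis=1, and passes each column or row in as a Series. Inside summarize, group is therefore a single column indexed by the row labels 0 to 14. As a result, group['zt'] looks for a row label 'zt', which does not exist, and raises KeyError: 'zt'.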
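
A minimal sketch that makes this visible, using a three-row stand-in for dfgeo. record_argument is a hypothetical helper introduced only for illustration: it logs what apply passes in. The last call reproduces the KeyError from the question.

```python
import pandas as pd

# Reduced stand-in for dfgeo: first three rows of zt and dist from the question
dfgeo = pd.DataFrame({
    'zt': [1, 0, 0],
    'dist': [0.0, 57.638760, 138.673362],
})

seen = []

def record_argument(arg):
    # Hypothetical helper: record the label, type and index of what apply passes in
    seen.append((arg.name, type(arg).__name__, list(arg.index)))
    return arg.sum()

# Default axis=0: one call per column, each column arriving as a Series
dfgeo.apply(record_argument)
print(seen)

def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(D=('dist', 'sum'))

# Inside apply, group is a column Series indexed 0..2, so group['zt']
# is a row-label lookup that fails, as in the question's traceback
caught = None
try:
    dfgeo.apply(summarize)
except KeyError as exc:
    caught = exc
print(repr(caught))
```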

Answer 1: score 2, accepted, posted 2020-08-22 12:38:06
