How to group by ordered runs of numbers (1, 123, 12, 12, etc.) in Python

I have a dataset with 3303 rows, and I am using pandas in Python. Here is a small example:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
    'E': ['1', '1', '2', '3', '1', '2', '1', '2'],
})

OUTPUT:
     A   B          C            D      E
0   foo one     -1.607303   1.343192    1
1   bar one      2.064340   1.000130    1
2   foo two     -0.362983   1.113389    2
3   bar three    0.486864   -0.804323   3
4   foo two      0.111030   -0.322696   1
5   bar two     -0.729870   0.912012    2
6   foo one      1.111405   0.076317    1
7   foo three    0.378172   0.298974    2

Do you know how to group by column 'E' following the order of its numbers? That is, how can I group by runs such as 1,2,3 in the 1st group, 1,2 in the 2nd group, 1 in the 3rd group, 1,2 in the 4th group, and so on, so that the result looks like this:

     A   B          C            D      E  G
0   foo one     -1.607303   1.343192    1  a
1   bar one      2.064340   1.000130    1  b
2   foo two     -0.362983   1.113389    2  b
3   bar three    0.486864   -0.804323   3  b
4   foo two      0.111030   -0.322696   1  c
5   bar two     -0.729870   0.912012    2  c
6   foo one      1.111405   0.076317    1  d
7   foo three    0.378172   0.298974    2  d

I then want new columns 'H' and 'I' holding the sums of the 'C' and 'D' values grouped by 'G'. Please suggest how to do this.

Try this:

df['G'] = df.E.eq('1').cumsum()

This works if every new group starts with '1'. If not, you need to resort to yatu's solution.
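To see why the "starts with '1'" condition matters, here is a minimal sketch. The series `e` copies column 'E' from the question; `e2` is a made-up input (not from the question) whose second run starts at '2'.

```python
import pandas as pd

# Column E from the question (values are strings, as in the original frame).
e = pd.Series(['1', '1', '2', '3', '1', '2', '1', '2'])

# True wherever a run restarts at '1'; cumsum gives each True a new group id.
g = e.eq('1').cumsum()
print(g.tolist())  # [1, 2, 2, 2, 3, 3, 4, 4]

# Hypothetical input: the intended runs are ['1', '2'] and ['2', '3'],
# but the second run never hits '1', so no new group is opened.
e2 = pd.Series(['1', '2', '2', '3'])
g2 = e2.eq('1').cumsum()
print(g2.tolist())  # [1, 1, 1, 1]
```

The first result matches the a, b, b, b, c, c, d, d labelling asked for in the question, just with numbers instead of letters.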

To answer your whole question:

df[['H','I']] = df.groupby(df.E.eq('1').cumsum())[['C','D']].transform('sum')
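As a rough check of what the line above produces, the sketch below uses fixed values for 'C' and 'D' (an assumption for readability; the question uses random numbers) so the group sums are easy to verify by hand. It assumes a reasonably recent pandas version.

```python
import pandas as pd

# Fixed C/D values are illustrative only; E is taken from the question.
df = pd.DataFrame({
    'C': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'D': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    'E': ['1', '1', '2', '3', '1', '2', '1', '2'],
})

# Group id: increases by one every time E restarts at '1'.
df['G'] = df['E'].eq('1').cumsum()

# transform('sum') returns one value per original row, so the result
# lines up with df and can be assigned directly to new columns.
df[['H', 'I']] = df.groupby('G')[['C', 'D']].transform('sum')

print(df['H'].tolist())  # [1.0, 9.0, 9.0, 9.0, 11.0, 11.0, 15.0, 15.0]
print(df['I'].tolist())  # [10.0, 90.0, 90.0, 90.0, 110.0, 110.0, 150.0, 150.0]
```

If one row per group is enough, `df.groupby('G')[['C', 'D']].sum()` gives the same totals without repeating them on every row.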

Numbering the resulting groups is probably a better idea. In that case, you can check whether each value in the series is less than or equal to the previous one (a shifted version of the series) and take the cumsum of the boolean result:

df['G'] = df.E.le(df.E.shift()).cumsum()

print(df)

     A      B         C         D  E  G
0  foo    one -1.495356  3.699348  1  0
1  bar    one -1.852039  0.569688  1  1
2  foo    two  0.875101  0.736014  2  1
3  bar  three -0.690525  0.132817  3  1
4  foo    two -0.742679  0.138903  1  2
5  bar    two -0.435063  1.525082  2  2
6  foo    one -0.985005  1.013949  1  3
7  foo  three  0.934254  1.157935  2  3
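One caveat worth noting: column 'E' in the question holds strings, and strings compare lexicographically. That is harmless for single digits, but with multi-digit values it may split runs in the wrong place. A minimal sketch, assuming the values can be converted to integers first (`e2` is a made-up input, not from the question):

```python
import pandas as pd

# Column E from the question, converted to integers before comparing.
e = pd.Series(['1', '1', '2', '3', '1', '2', '1', '2'])
nums = e.astype(int)

# A new group starts wherever the value does not increase.
# The first row compares against NaN, which evaluates to False.
g = nums.le(nums.shift()).cumsum()
print(g.tolist())  # [0, 1, 1, 1, 2, 2, 3, 3]

# Hypothetical input with multi-digit values and runs not starting at 1.
e2 = pd.Series(['9', '10', '11', '2', '3'])
n2 = e2.astype(int)
g2 = n2.le(n2.shift()).cumsum()
print(g2.tolist())  # [0, 0, 0, 1, 1]

# Why the conversion matters: as strings, '10' sorts before '9'.
print('10' <= '9')  # True
```

Unlike the `eq('1')` approach, this does not require each run to start at 1; it only requires values to increase strictly within a run.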
