How to group by ordered runs of numbers (1, 123, 12, 12, ...) in Python
I have data with 3303 rows, and I use pandas in Python. A small example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'], 'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'], 'C': np.random.randn(8), 'D': np.random.randn(8), 'E': ['1', '1', '2', '3', '1', '2', '1', '2']})
Output (C and D are random, so your values will differ):
A B C D E
0 foo one -1.607303 1.343192 1
1 bar one 2.064340 1.000130 1
2 foo two -0.362983 1.113389 2
3 bar three 0.486864 -0.804323 3
4 foo two 0.111030 -0.322696 1
5 bar two -0.729870 0.912012 2
6 foo one 1.111405 0.076317 1
7 foo three 0.378172 0.298974 2
Do you know how to group by column 'E' according to the order of its numbers? That is, each group should be one run of increasing numbers, such as 1,2,3 in one group, 1,2 in the next, 1 in another, and so on. For the data above the runs are 1 | 1,2,3 | 1,2 | 1,2, so the result would look like:
A B C D E G
0 foo one -1.607303 1.343192 1 a
1 bar one 2.064340 1.000130 1 b
2 foo two -0.362983 1.113389 2 b
3 bar three 0.486864 -0.804323 3 b
4 foo two 0.111030 -0.322696 1 c
5 bar two -0.729870 0.912012 2 c
6 foo one 1.111405 0.076317 1 d
7 foo three 0.378172 0.298974 2 d
Then I want new columns 'H' and 'I' containing the sums of 'C' and 'D' within each 'G' group. Please suggest how to do this.
Try this:
df['G'] = df.E.eq('1').cumsum()
This works if every new group starts with '1'. If not, you need to resort to yatu's solution.
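As a quick check of this approach on the question's E column (A to D omitted), the sketch below computes the run numbers. As an extra step that is not part of the answer, it maps them to letters 'a', 'b', ... to reproduce the desired G column; that letter mapping is an illustrative assumption and only covers up to 26 runs.

```python
import string

import pandas as pd

# The 'E' column from the question (stored as strings).
df = pd.DataFrame({'E': ['1', '1', '2', '3', '1', '2', '1', '2']})

# Every '1' starts a new run, so a running count of the '1's numbers the runs 1, 2, 3, ...
group_no = df.E.eq('1').cumsum()

# Illustrative extra step: map run numbers 1, 2, 3, ... to letters 'a', 'b', 'c', ...
# (assumes at most 26 runs).
df['G'] = group_no.map(lambda n: string.ascii_lowercase[n - 1])
print(df)
```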
To answer your whole question:
df[['H','I']] = df.groupby(df.E.eq('1').cumsum())[['C','D']].transform('sum')
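A runnable sketch of the full answer, using made-up deterministic C and D values instead of np.random.randn so the sums can be checked by hand. It passes the string 'sum' to transform (recent pandas versions emit a FutureWarning when the builtin sum is passed) and assigns H and I one at a time, which avoids relying on how a multi-column assignment lines up the columns.

```python
import pandas as pd

# Deterministic stand-in for the question's frame; C and D are made-up numbers.
df = pd.DataFrame({
    'C': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'D': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    'E': ['1', '1', '2', '3', '1', '2', '1', '2'],
})

# Run id: increments every time E == '1'.
run_id = df.E.eq('1').cumsum()

# transform('sum') returns one value per original row (the total of that row's run),
# so the result has the same index as df and can be assigned back directly.
sums = df.groupby(run_id)[['C', 'D']].transform('sum')
df['H'] = sums['C']
df['I'] = sums['D']

# If one row per group is wanted instead of per-row totals, aggregate instead of transform.
per_group = df.groupby(run_id)[['C', 'D']].sum()
print(df)
print(per_group)
```

transform keeps the original shape, which is what the question's H and I columns need; sum() collapses each run to a single row.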
Numbering the resulting groups (rather than lettering them) is probably a better idea. To do so, check whether each value in the series is less than or equal to the previous value (a shifted version of the series), and take the cumsum of the boolean result:
df['G'] = df.E.le(df.E.shift()).cumsum()
print(df)
A B C D E G
0 foo one -1.495356 3.699348 1 0
1 bar one -1.852039 0.569688 1 1
2 foo two 0.875101 0.736014 2 1
3 bar three -0.690525 0.132817 3 1
4 foo two -0.742679 0.138903 1 2
5 bar two -0.435063 1.525082 2 2
6 foo one -0.985005 1.013949 1 3
7 foo three 0.934254 1.157935 2 3