I have a dataset with 3303 rows, and I am using pandas in Python. Here is a small example:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
    'E': ['1', '1', '2', '3', '1', '2', '1', '2'],  # note: strings, not integers
})
Output:
     A      B         C         D  E
0  foo    one -1.607303  1.343192  1
1  bar    one  2.064340  1.000130  1
2  foo    two -0.362983  1.113389  2
3  bar  three  0.486864 -0.804323  3
4  foo    two  0.111030 -0.322696  1
5  bar    two -0.729870  0.912012  2
6  foo    one  1.111405  0.076317  1
7  foo  three  0.378172  0.298974  2
How can I group by column 'E' following the numeric order of its values? That is, I want to group by runs of consecutive numbers, for example 1, 2, 3 in the first group, 1, 2 in the second group, 1 in the third group, 1, 2 in the fourth group, and so on. A new group should begin every time the count restarts, so that the result looks like this:
     A      B         C         D  E  G
0  foo    one -1.607303  1.343192  1  a
1  bar    one  2.064340  1.000130  1  b
2  foo    two -0.362983  1.113389  2  b
3  bar  three  0.486864 -0.804323  3  b
4  foo    two  0.111030 -0.322696  1  c
5  bar    two -0.729870  0.912012  2  c
6  foo    one  1.111405  0.076317  1  d
7  foo  three  0.378172  0.298974  2  d
After that, I want new columns 'H' and 'I' holding the sums of the 'C' and 'D' values within each group of 'G'. Any suggestions or guidance would be appreciated.
Try this:
df['G'] = df.E.eq('1').cumsum()
This works if every new group starts with '1'. If not, you need to resort to yatu's solution.
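To see why the one-liner works, here is a minimal sketch that splits it into two steps, using only the 'E' column from the question. The intermediate name `starts` is just for illustration.

```python
import pandas as pd

# The 'E' column from the question, stored as strings.
df = pd.DataFrame({'E': ['1', '1', '2', '3', '1', '2', '1', '2']})

# True wherever a new run begins, i.e. wherever E equals the string '1'.
starts = df['E'].eq('1')

# Summing booleans cumulatively: every True bumps the counter by one,
# so all rows of the same run share the same group id.
df['G'] = starts.cumsum()

print(df)
```

The ids come out as integers starting at 1 rather than the letters a, b, c, d from the question, but they mark the same groups.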
To answer your whole question:
df[['H','I']] = df.groupby(df.E.eq('1').cumsum())[['C','D']].transform('sum')
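As a quick check of what `transform` does here, the following sketch uses made-up round numbers for 'C' and 'D' (an assumption, replacing the random values from the question) so the per-group sums are easy to verify by eye. It assigns 'H' and 'I' one at a time, which should be equivalent to the single assignment above.

```python
import pandas as pd

# Hypothetical values for C and D, chosen so the sums are easy to check.
df = pd.DataFrame({
    'C': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    'D': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0],
    'E': ['1', '1', '2', '3', '1', '2', '1', '2'],
})

# Group id: a new group starts at every '1' in column E.
group_id = df['E'].eq('1').cumsum()

# transform('sum') returns one row per input row, each holding
# the sum of its own group, so the result aligns with df.
sums = df.groupby(group_id)[['C', 'D']].transform('sum')
df['H'] = sums['C']
df['I'] = sums['D']

print(df)
```

Unlike `agg`, `transform` keeps the original row count, which is why the result can be written straight back into the frame.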
Numbering the resulting groups is probably a better idea. In that case you can check whether each value in the series is less than or equal to the previous one (a shifted version of the series) and take the cumsum of the boolean result:
df['G'] = df.E.le(df.E.shift()).cumsum()
print(df)
     A      B         C         D  E  G
0  foo    one -1.495356  3.699348  1  0
1  bar    one -1.852039  0.569688  1  1
2  foo    two  0.875101  0.736014  2  1
3  bar  three -0.690525  0.132817  3  1
4  foo    two -0.742679  0.138903  1  2
5  bar    two -0.435063  1.525082  2  2
6  foo    one -0.985005  1.013949  1  3
7  foo  three  0.934254  1.157935  2  3
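One caveat: 'E' holds strings in the question, and strings compare lexicographically, so '10' counts as smaller than '9'. With 3303 rows that could matter. Below is a minimal sketch that converts to integers before comparing. The helper name `run_ids` is hypothetical, and the second input is made-up data to show runs that do not start at 1.

```python
import pandas as pd

def run_ids(values: pd.Series) -> pd.Series:
    """Number runs of strictly increasing values, starting from 0."""
    # Convert to integers so that 10 > 9 (as strings, '10' < '9').
    nums = values.astype(int)
    # A new run starts wherever a value is <= the previous one.
    # The first row compares against NaN, which yields False.
    return nums.le(nums.shift()).cumsum()

# The 'E' column from the question.
print(run_ids(pd.Series(['1', '1', '2', '3', '1', '2', '1', '2'])).tolist())

# Hypothetical data: runs that do not start at 1 and that pass 9.
print(run_ids(pd.Series(['2', '3', '1', '2', '9', '10', '11', '5'])).tolist())
```

For the column from the question this gives the same ids as the output above: 0, 1, 1, 1, 2, 2, 3, 3.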