I have a pandas data frame like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a1':['astr1','jmtr2','astr2','mmsk3',
                         'astr6','jmtr2','astr2','mhhk',
                         'astr5','mmsk','astr6','astr1',
                         'mstr1','mhhk','mstr2','mhhk'],
                   'a2': np.random.randn(16)})
df
a1 a2
0 astr1 -0.490416
1 jmtr2 0.651627
2 astr2 0.784004
3 mmsk3 -1.595870
4 astr6 1.228631
5 jmtr2 -1.644518
6 astr2 -0.311709
7 mhhk -1.284221
8 astr5 -0.356339
9 mmsk -0.071046
10 astr6 1.620838
11 astr1 -0.717384
12 mstr1 0.830618
13 mhhk -0.020226
14 mstr2 -0.056465
15 mhhk -0.160234
What I want to do now is merge the rows of a1 whose first four letters are the same, and at the same time add up the corresponding values of a2.
Like this:
   a1  a2
0  astr  (sum of astr)
1  jmtr  (sum of jmtr)
2  mmsk  (sum of mmsk)
3  mhhk  (sum of mhhk)
4  mstr  (sum of mstr)
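I think you need groupby on the first 4 characters of a1, obtained by indexing with str, and then aggregate with sum. (Since a2 comes from np.random.randn, the sums further down are from a different random draw than the frame shown above.) The grouping key looks like this: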
print (df.a1.str[:4])
0 astr
1 jmtr
2 astr
3 mmsk
4 astr
5 jmtr
6 astr
7 mhhk
8 astr
9 mmsk
10 astr
11 astr
12 mstr
13 mhhk
14 mstr
15 mhhk
Name: a1, dtype: object
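Because .str[:4] applies ordinary Python slicing to each element, a value that is already 4 characters long (such as mhhk) is unchanged, and a value shorter than 4 characters is kept whole instead of raising an error. A small sketch; the 'ab' value is hypothetical and not part of the original data:

```python
import pandas as pd

# Element-wise slicing: each string is cut to at most 4 characters.
s = pd.Series(['astr1', 'mhhk', 'mmsk3', 'ab'])
prefix = s.str[:4]
print(prefix.tolist())  # ['astr', 'mhhk', 'mmsk', 'ab']
```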
print (df.a2.groupby(df.a1.str[:4]).sum().reset_index())
a1 a2
0 astr 1.112200
1 jmtr -1.559358
2 mhhk 1.113222
3 mmsk -0.023918
4 mstr -2.526466
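The expected output in the question lists the groups in order of first appearance (astr, jmtr, mmsk, mhhk, mstr), whereas groupby sorts the keys alphabetically by default. A minimal sketch, assuming that order matters to you: passing sort=False to groupby keeps first-appearance order. To make the result reproducible, it uses the fixed a2 values printed in the question instead of random numbers.

```python
import pandas as pd

df = pd.DataFrame({
    'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3',
           'astr6', 'jmtr2', 'astr2', 'mhhk',
           'astr5', 'mmsk', 'astr6', 'astr1',
           'mstr1', 'mhhk', 'mstr2', 'mhhk'],
    'a2': [-0.490416, 0.651627, 0.784004, -1.595870,
           1.228631, -1.644518, -0.311709, -1.284221,
           -0.356339, -0.071046, 1.620838, -0.717384,
           0.830618, -0.020226, -0.056465, -0.160234],
})

# Group key: the first 4 characters of each a1 value (the Series keeps the name 'a1').
key = df['a1'].str[:4]

# sort=False keeps groups in order of first appearance instead of alphabetical order;
# reset_index turns the group keys back into an 'a1' column.
out = df['a2'].groupby(key, sort=False).sum().reset_index()
print(out)
```

With the default sort=True you get the alphabetical order shown in the answer's output; which one to use depends only on whether you care about the row order.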