
pandas column values merge

I have a pandas data frame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3',
                          'astr6', 'jmtr2', 'astr2', 'mhhk',
                          'astr5', 'mmsk', 'astr6', 'astr1',
                          'mstr1', 'mhhk', 'mstr2', 'mhhk'],
                   'a2': np.random.randn(16)})  # 16 random values, different on every run
df

    a1      a2
0   astr1   -0.490416
1   jmtr2   0.651627
2   astr2   0.784004
3   mmsk3   -1.595870
4   astr6   1.228631
5   jmtr2   -1.644518
6   astr2   -0.311709
7   mhhk    -1.284221
8   astr5   -0.356339
9   mmsk    -0.071046
10  astr6   1.620838
11  astr1   -0.717384
12  mstr1   0.830618
13  mhhk    -0.020226
14  mstr2   -0.056465
15  mhhk    -0.160234

I want to merge the rows whose a1 values share the same first four letters and add up their a2 values.

Like this:

    a1     a2
0   astr   sum of astr
1   jmtr   sum of jmtr
2   mmsk   sum of mmsk
3   mhhk   sum of mhhk
4   mstr   sum of mstr
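Each a2 entry in the desired output is the total of a2 over the rows whose a1 starts with that prefix. For a single prefix this can be computed directly with boolean indexing; the sketch below uses a few rows in the question's format, with a2 values made up purely for illustration (they are not the question's data).

```python
import pandas as pd

# A few rows in the question's format; the a2 values are hypothetical
df = pd.DataFrame({'a1': ['astr1', 'jmtr2', 'astr2', 'mhhk'],
                   'a2': [1.0, 2.0, 3.0, 4.0]})

# "sum of astr": add up a2 over every row whose a1 starts with 'astr'
astr_sum = df.loc[df.a1.str.startswith('astr'), 'a2'].sum()
print(astr_sum)  # 4.0
```

Doing this once per prefix does not scale, which is why grouping on the prefix is the usual approach.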

You can group by the first four characters of a1, extracted with the .str accessor (df.a1.str[:4]), and aggregate with sum:

print(df.a1.str[:4])  # first four characters of each a1 value
0     astr
1     jmtr
2     astr
3     mmsk
4     astr
5     jmtr
6     astr
7     mhhk
8     astr
9     mmsk
10    astr
11    astr
12    mstr
13    mhhk
14    mstr
15    mhhk
Name: a1, dtype: object
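Indexing the .str accessor with a slice works element-wise and is equivalent to Series.str.slice; values shorter than four characters are returned whole instead of raising. A minimal sketch with some of the question's labels plus a hypothetical short label 'ab':

```python
import pandas as pd

s = pd.Series(['astr1', 'jmtr2', 'mhhk', 'ab'], name='a1')  # 'ab' is a hypothetical short label

prefix_indexing = s.str[:4]          # element-wise slice via the .str accessor
prefix_slice = s.str.slice(stop=4)   # the equivalent explicit method

print(prefix_indexing.tolist())              # ['astr', 'jmtr', 'mhhk', 'ab']
print(prefix_indexing.equals(prefix_slice))  # True
```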

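# a2 is regenerated by np.random.randn on every run, so these sums do not match the table in the question
print(df.a2.groupby(df.a1.str[:4]).sum().reset_index())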
     a1        a2
0  astr  1.112200
1  jmtr -1.559358
2  mhhk  1.113222
3  mmsk -0.023918
4  mstr -2.526466
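Because a2 is random, the printed sums cannot be checked against the question's table. A minimal sketch that rebuilds the frame from the sixteen values printed in the question (copied, not regenerated) makes the result reproducible. It also passes sort=False, which keeps groups in order of first appearance (astr, jmtr, mmsk, mhhk, mstr) as in the desired output; the default sort=True orders the keys alphabetically. Grouping on an assigned column with as_index=False is shown as one alternative to reset_index, not the only way.

```python
import pandas as pd

# Values copied from the table printed in the question (instead of np.random.randn)
df = pd.DataFrame({
    'a1': ['astr1', 'jmtr2', 'astr2', 'mmsk3',
           'astr6', 'jmtr2', 'astr2', 'mhhk',
           'astr5', 'mmsk', 'astr6', 'astr1',
           'mstr1', 'mhhk', 'mstr2', 'mhhk'],
    'a2': [-0.490416, 0.651627, 0.784004, -1.595870,
           1.228631, -1.644518, -0.311709, -1.284221,
           -0.356339, -0.071046, 1.620838, -0.717384,
           0.830618, -0.020226, -0.056465, -0.160234],
})

# Replace a1 with its first four characters, then sum a2 per prefix.
# sort=False keeps first-appearance order; as_index=False keeps a1 as a column.
result = (df.assign(a1=df.a1.str[:4])
            .groupby('a1', sort=False, as_index=False)['a2']
            .sum())
print(result)
```

With these inputs the sums are about 1.757625 (astr), -0.992891 (jmtr), -1.666916 (mmsk), -1.464681 (mhhk) and 0.774153 (mstr), which are simply the column additions of the question's table.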
