
Merging rows when some columns are the same using Pandas Python

I have a DataFrame and want to merge the rows that share the same value in column A. For each group, the value kept in column B should be the one that appears earliest in the list L = ['xx','yy','zz'].

    A   B
0   a   xx
1   a   yy
2   b   zz
3   b   yy

  1. For rows 0 and 1, the result is 'a' in column A and 'xx' in column B ('xx' comes before 'yy' in L).
  2. For rows 2 and 3, the result is 'b' in column A and 'yy' in column B ('yy' comes before 'zz' in L).

Desired outcome:

    A   B
0   a   xx
1   b   yy

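You can map each value of B to its position in L with pandas.Series.map, then use pandas.DataFrame.groupby to pick, within each group of A, the row with the lowest position: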

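# Map each value in B to its position in L (lower = higher priority)
df['C'] = df['B'].map(dict(zip(L, range(len(L)))))
# Within each group of A, take B from the row with the smallest C
df.groupby('A')[['B', 'C']].apply(lambda x: x.iloc[x['C'].argmin()]['B'])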
#A
#a    xx
#b    yy
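
For reference, here is a self-contained sketch of the same idea that returns a DataFrame shaped like the desired outcome. It rebuilds the sample data from the question and swaps the apply/argmin step for idxmin on the rank Series. The names rank, best_rows and result are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data and priority list taken from the question.
L = ['xx', 'yy', 'zz']
df = pd.DataFrame({'A': ['a', 'a', 'b', 'b'],
                   'B': ['xx', 'yy', 'zz', 'yy']})

# Rank each value of B by its position in L (lower rank = higher priority).
# Values of B that are not in L would map to NaN.
rank = df['B'].map({value: position for position, value in enumerate(L)})

# For each group of A, find the index label of the row with the smallest rank.
best_rows = rank.groupby(df['A']).idxmin()

# Select those rows and renumber the index from 0.
result = df.loc[best_rows].reset_index(drop=True)
print(result)
```

With the sample data this keeps row 0 (a, xx) and row 3 (b, yy). Because the rank lives in a separate Series, no helper column C is added to df.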

You can get the same result by converting B to an ordered pandas.Categorical whose category order follows L, so that min() respects that order:

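# Make B an ordered categorical; the order of L defines which value is "smaller"
df['B'] = pd.Categorical(df['B'], categories=L, ordered=True)
# The minimum of B in each group of A is the value that comes first in L
df.groupby('A').min()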
#      B
#A
#a    xx
#b    yy
