
Concatenate the IDs in column D based on three columns in a Python (pandas) DataFrame

My database has four columns (A, B, C, D). Columns A, B and C are used to group column D: for each combination of A, B and C, I want to concatenate the IDs in column D. Consider the following data:

A   B   C   D
A1  B1  C1  12
A1  B1  C1  15
A2  B2  C2  16
A4  B4  C4  18
A1  B1  C1  19

I expect the following result after running the code:

A   B   C   D
A1  B1  C1  12_15_19
A2  B2  C2  16
A4  B4  C4  18

I have used the code below to perform this operation:

df23['combined'] = df23.apply(lambda x: '%s_%s_%s' % (x['A'], x['B'], x['C']), axis=1)

for i in range(len(df23)):
    df23['ABC'] = df23.iloc[:, 3]
    for j in range(len(df23) + 1):
        cur = df23.iloc[i, 3]
        nxt = df23.iloc[j, 3]
        if cur == nxt:
            df23['ABC'] = df23.iloc[i, 4] + '_' + df23.iloc[j, 3]

It does not work as I expect. Can you suggest another way to achieve this? Thanks in advance :)

pandas.DataFrame.groupby

SYNTAX: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)

Group DataFrame or Series using a mapper or by a Series of columns.

A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
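To make the split-apply-combine idea concrete, here is a minimal sketch on the question's data (rebuilt inline as `df`; the names `groups` and `combined` are only illustrative). It prints each (A, B, C) group's D values, which is the "split" step, and then collects them into one list per group, which is the "apply" and "combine" steps:

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
                   'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
                   'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
                   'D': [12, 15, 16, 18, 19]})

# Split: one group per distinct (A, B, C) combination
groups = df.groupby(['A', 'B', 'C'])

# Inspect the D values that landed in each group
for key, group in groups:
    print(key, group['D'].tolist())

# Apply + combine: gather D into a list per group, one entry per (A, B, C) key
combined = groups['D'].apply(list)
print(combined)
```

Once the rows are grouped this way, turning each list into a `_`-separated string is a single extra step.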

pandas.DataFrame.apply

SYNTAX: DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)

Apply a function along an axis of the DataFrame.

Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
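As a small illustration of the `axis` argument, assuming the same example data: with `axis=1` each row arrives as a Series indexed by column name, which is how the question's `combined` key (A, B and C joined with `_`) can be built; with the default `axis=0` each column arrives instead. The name `totals` and the choice of summing column D are only for demonstration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
                   'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
                   'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
                   'D': [12, 15, 16, 18, 19]})

# axis=1: the function is called once per row; the row is a Series whose
# index is the DataFrame's columns, so row['A'] selects by column name.
df['combined'] = df.apply(lambda row: '_'.join([row['A'], row['B'], row['C']]), axis=1)

# axis=0 (default): the function is called once per column;
# here it sums column D.
totals = df[['D']].apply(lambda col: col.sum())

print(df)
print(totals)
```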

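import pandas as pd

# Create the example DataFrame
raw_data = {'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
            'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
            'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
            'D': [12, 15, 16, 18, 19]}
df = pd.DataFrame(raw_data, columns=['A', 'B', 'C', 'D'])
print(df)

# Group by A, B, C; render each group's D values as one newline-separated
# string, then replace the newlines with '_'.
# regex=True is needed on pandas >= 2.0, where str.replace matches literally by default.
df_grouped = (df.groupby(['A', 'B', 'C'])['D']
                .apply(lambda text: text.to_string(index=False))
                .str.replace(r'\n', '_', regex=True)
                .reset_index())
print(df_grouped)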

output:

    A   B   C   D
0  A1  B1  C1  12
1  A1  B1  C1  15
2  A2  B2  C2  16
3  A4  B4  C4  18
4  A1  B1  C1  19

    A   B   C            D
0  A1  B1  C1   12_ 15_ 19
1  A2  B2  C2           16
2  A4  B4  C4           18
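The spaces in `12_ 15_ 19` come from the padding that `to_string` adds to align the values (the exact spacing may vary between pandas versions). A minimal alternative sketch, not part of the original answer, converts each value with `astype(str)` and joins the values directly with `'_'`, which gives `12_15_19` exactly as the question asked:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
                   'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
                   'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
                   'D': [12, 15, 16, 18, 19]})

# Convert each D value to str and join the values of every (A, B, C) group
# with '_'; reset_index() turns the group keys back into columns A, B, C.
df_grouped = (df.groupby(['A', 'B', 'C'])['D']
                .agg(lambda s: '_'.join(s.astype(str)))
                .reset_index())

print(df_grouped.to_string(index=False))
```

Using `agg` with a reducing function keeps one row per (A, B, C) key, and no string post-processing is needed because the separator is inserted by `join` itself.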

NOTE: To print a DataFrame without its index, use:

print (df.to_string(index = False))
print (df_grouped.to_string(index = False))

output:

  A   B   C   D
 A1  B1  C1  12
 A1  B1  C1  15
 A2  B2  C2  16
 A4  B4  C4  18
 A1  B1  C1  19

  A   B   C            D
 A1  B1  C1   12_ 15_ 19
 A2  B2  C2           16
 A4  B4  C4           18
