My data has four columns (A, B, C, D). Columns A, B and C are used to group column D: for each combination of A, B and C, I want to concatenate the values in column D. Consider the data below:
A B C D
A1 B1 C1 12
A1 B1 C1 15
A2 B2 C2 16
A4 B4 C4 18
A1 B1 C1 19
I expect the following result after running the code:
A B C D
A1 B1 C1 12_15_19
A2 B2 C2 16
A4 B4 C4 18
I have used the code below to perform this operation:
df23['combined']=df23.apply(lambda x:'%s_%s_%s' % (x['A'],x['B'],x['C']),axis=1)
for i in range(len(df23)):
    df23['ABC'] = df23.iloc[:,3]
    for j in range(len(df23)+1):
        cur = df23.iloc[i,3]
        nxt = df23.iloc[j,3]
        if cur==nxt:
            df23['ABC'] = df23.iloc[i,4] +'_'+ df23.iloc[j,3]
It does not work as I expect. Can you please suggest another way to build this? Thanks in advance :)
pandas.DataFrame.groupby
SYNTAX: DataFrame.groupby(by=None, axis=0, level=None, as_index=True, sort=True, group_keys=True, squeeze=False, observed=False, **kwargs)
Group DataFrame or Series using a mapper or by a Series of columns.
A groupby operation involves some combination of splitting the object, applying a function, and combining the results. This can be used to group large amounts of data and compute operations on these groups.
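As a minimal sketch of the split / apply / combine idea, using the sample data from the question (the variable names `grouped`, `group_sizes` and `members` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
    'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
    'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
    'D': [12, 15, 16, 18, 19],
})

# Split: one group per distinct (A, B, C) combination.
grouped = df.groupby(['A', 'B', 'C'])

# Apply + combine: count the rows in each group.
group_sizes = grouped.size()

# Inspect which D values landed in which group; the keys are (A, B, C) tuples.
members = {key: group['D'].tolist() for key, group in grouped}
print(group_sizes)
print(members)
```

Looking at `members` first is a quick way to confirm that the grouping keys behave as intended before applying any concatenation.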
pandas.DataFrame.apply
SYNTAX: DataFrame.apply(func, axis=0, broadcast=None, raw=False, reduce=None, result_type=None, args=(), **kwds)
Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is either the DataFrame's index (axis=0) or the DataFrame's columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.
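To illustrate the difference between the two axes, here is a small sketch on a made-up two-row frame (the frame and the names `row_keys` and `col_lengths` are assumptions for illustration only):

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A2'], 'B': ['B1', 'B2'], 'D': [12, 16]})

# axis=1: the function receives one row at a time, as a Series
# indexed by the column names, so row['A'] and row['B'] work.
row_keys = df.apply(lambda row: f"{row['A']}_{row['B']}", axis=1)

# axis=0 (the default): the function receives one column at a time,
# as a Series indexed by the DataFrame's index.
col_lengths = df.apply(lambda col: len(col))

print(row_keys.tolist())
print(col_lengths.to_dict())
```

The row-wise form is what the question's `df23['combined']` line does when it builds a key from columns A, B and C.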
import pandas as pd
# Create the DataFrame
raw_data = {'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
            'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
            'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
            'D': [12, 15, 16, 18, 19]}
df = pd.DataFrame(raw_data, columns=['A', 'B', 'C', 'D'])
print(df)
# Convert D to strings and join the values of each (A, B, C) group with '_'
df_grouped = df.groupby(['A', 'B', 'C'])['D'].apply(lambda s: '_'.join(s.astype(str))).reset_index()
print(df_grouped)
Output:
A B C D
0 A1 B1 C1 12
1 A1 B1 C1 15
2 A2 B2 C2 16
3 A4 B4 C4 18
4 A1 B1 C1 19
A B C D
0 A1 B1 C1 12_15_19
1 A2 B2 C2 16
2 A4 B4 C4 18
NOTE: To print a DataFrame without its index, use:
print(df.to_string(index=False))
print(df_grouped.to_string(index=False))
Output:
A B C D
A1 B1 C1 12
A1 B1 C1 15
A2 B2 C2 16
A4 B4 C4 18
A1 B1 C1 19
A B C D
A1 B1 C1 12_15_19
A2 B2 C2 16
A4 B4 C4 18
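As a possible variation on the grouping step, the sketch below casts D to strings up front and lets `agg` call `'_'.join` on each group directly. Passing `sort=False` keeps the groups in order of first appearance and `as_index=False` keeps A, B and C as ordinary columns. The name `df_alt` is only illustrative, and this is one option among several rather than the definitive approach.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['A1', 'A1', 'A2', 'A4', 'A1'],
    'B': ['B1', 'B1', 'B2', 'B4', 'B1'],
    'C': ['C1', 'C1', 'C2', 'C4', 'C1'],
    'D': [12, 15, 16, 18, 19],
})

df_alt = (
    df.astype({'D': str})                                   # '_'.join needs strings
      .groupby(['A', 'B', 'C'], as_index=False, sort=False)  # keep keys as columns, keep input order
      .agg({'D': '_'.join})                                  # concatenate D within each group
)
print(df_alt.to_string(index=False))
```

Because the values are joined as plain strings, no whitespace or newline cleanup is needed afterwards.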