I'm using Python 3.7. I have a pandas DataFrame with three text columns: Name, Email, and Section. The sample data looks like this:
Name Email Section
abc abc@gmail.com purchase
cde - drawing
lmn-pqr None -
Hyphens can appear between two words in any of the three columns. I would like to join the three columns with "_" as the separator and create a new column, Group, ignoring None and "-" values. The combined result should look like this:
Name Email Section Group
abc abc@gmail.com purchase abc_abc@gmail.com_purchase
cde - drawing cde_drawing
lmn-pqr None - lmn-pqr
I'm not sure how to write the Python code. Can you please help me?
You can use str.cat, which drops null values:
df['Group'] = df.mask(df.isin(['-', None])).apply(lambda r: r.str.cat(sep='_'), axis=1)
Or, manually:
df['Group'] = df.apply(lambda r: '_'.join(r.replace('-', pd.NA).dropna()), axis=1)
Output:
Name Email Section Group
0 abc abc@gmail.com purchase abc_abc@gmail.com_purchase
1 cde - drawing cde_drawing
2 lmn-pqr None - lmn-pqr
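For anyone who wants to reproduce this, here is a minimal self-contained sketch of the str.cat approach. The DataFrame is rebuilt from the sample rows in the question. I am assuming the missing email is a real None rather than the string 'None', and the names `cols` and `cleaned` are only for illustration.

```python
import pandas as pd

# Sample data from the question (assumption: the missing email is a real None)
df = pd.DataFrame({
    "Name": ["abc", "cde", "lmn-pqr"],
    "Email": ["abc@gmail.com", "-", None],
    "Section": ["purchase", "drawing", "-"],
})

cols = ["Name", "Email", "Section"]

# Blank out cells that are exactly "-"; a hyphen inside a word is untouched.
# Cells that are already None stay missing.
cleaned = df[cols].mask(df[cols].isin(["-"]))

# Row-wise str.cat with a separator omits missing values from the result
df["Group"] = cleaned.apply(lambda r: r.str.cat(sep="_"), axis=1)

print(df)
```

Selecting `cols` explicitly keeps the result stable if the snippet is re-run after the Group column already exists.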
You can replace "-" with None, then filter it out when joining:
df['Group'] = df.replace({'-': None}).apply(lambda row: '_'.join(filter(None, row)), axis=1)
print(df)
Name Email Section Group
0 abc abc@gmail.com purchase abc_abc@gmail.com_purchase
1 cde - drawing cde_drawing
2 lmn-pqr None - lmn-pqr
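df['Group'] = df.apply(lambda x: '_'.join(v for v in [x['Name'], x['Email'], x['Section']] if isinstance(v, str) and v != '-'), axis=1)
With axis=1, x is a Series holding one row, so each column can be accessed by name. The isinstance check skips None and the comparison skips the "-" placeholder.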
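Because each row arrives as a Series when apply is called with axis=1, the joining logic can also live in a small named function, which is easier to read than a long lambda. This is only a sketch: `join_present` and its `sep` and `placeholder` parameters are hypothetical names, and the sample frame assumes the missing email is a real None.

```python
import pandas as pd


def join_present(row, sep="_", placeholder="-"):
    """Join the string values of one row, skipping missing values and the placeholder."""
    # isinstance(v, str) filters out None, NaN and pd.NA in one check
    parts = [v for v in row if isinstance(v, str) and v != placeholder]
    return sep.join(parts)


# Sample data from the question (assumption: the missing email is a real None)
df = pd.DataFrame({
    "Name": ["abc", "cde", "lmn-pqr"],
    "Email": ["abc@gmail.com", "-", None],
    "Section": ["purchase", "drawing", "-"],
})

# axis=1 passes each row to join_present as a Series
df["Group"] = df[["Name", "Email", "Section"]].apply(join_present, axis=1)

print(df)
```

Only values equal to the placeholder are dropped, so a name such as lmn-pqr keeps its hyphen.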