I have a dataframe like this:
EmpID FirstName LastName Relationship FirstName.1 LastName.1 Relationship.1
1 Ax Bx 1A Cx Dx 1B
My Excel source file didn't have .1 on the duplicate columns; pandas added it when I read the file. I think that's expected, since you can't have duplicate columns in the database.
I want to convert this into a dataframe like this:
EmpID FirstName LastName Relationship
1 Ax Bx 1A
1 Cx Dx 1B
How do we do this transformation? Thanks.
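For reference, the .1 suffixes described in the question can be reproduced with a small in-memory file. This is only a sketch: it uses pd.read_csv on a string as a stand-in for the Excel file (an assumption made so the snippet runs without a spreadsheet), and csv_text is made-up data mirroring the example row.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Excel sheet: the header repeats three column names.
csv_text = (
    "EmpID,FirstName,LastName,Relationship,FirstName,LastName,Relationship\n"
    "1,Ax,Bx,1A,Cx,Dx,1B\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# pandas keeps the first occurrence as-is and suffixes later duplicates with .1, .2, ...
columns = list(df.columns)
print(columns)
```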
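You could create two new dataframes, rename the columns of the second, and then concatenate them:
import pandas as pd
df1 = df[['EmpID', 'FirstName', 'LastName', 'Relationship']]
# Rename on the returned frame rather than inplace, to avoid SettingWithCopyWarning on a slice
df2 = df[['EmpID', 'FirstName.1', 'LastName.1', 'Relationship.1']].rename(columns=lambda x: x.replace('.1', ''))
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df1, df2], ignore_index=True)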
>>> print(df)
  EmpID FirstName LastName Relationship
0 1 Ax Bx 1A
1 1 Cx Dx 1B
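If the sheet has more than two repeated groups (.1, .2, and so on), the same slice-and-rename idea can be written as a loop. The following is a rough sketch, not a tested general solution: it assumes EmpID is the only key column and that a dot appears only in the suffixes pandas added. The names base_cols, suffixes and pieces are illustrative.

```python
import pandas as pd

# Sample frame matching the question's layout.
df = pd.DataFrame(
    [[1, "Ax", "Bx", "1A", "Cx", "Dx", "1B"]],
    columns=["EmpID", "FirstName", "LastName", "Relationship",
             "FirstName.1", "LastName.1", "Relationship.1"],
)

base_cols = ["FirstName", "LastName", "Relationship"]

# '' stands for the first (unsuffixed) group; the rest are collected from the
# column names and ordered numerically.
suffixes = [""] + sorted(
    {"." + c.rsplit(".", 1)[1] for c in df.columns if "." in c},
    key=lambda s: int(s[1:]),
)

pieces = []
for suffix in suffixes:
    cols = [c + suffix for c in base_cols]
    # Select one group and map its columns back to the unsuffixed names.
    piece = df[["EmpID"] + cols].rename(columns=dict(zip(cols, base_cols)))
    pieces.append(piece)

result = pd.concat(pieces, ignore_index=True)
print(result)
```

Note that this stacks all rows of the first group before those of the second; sort by EmpID afterwards if each employee's rows should sit together.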
This can be done with pandas.wide_to_long, but it expects every repeated column to have the form stub + separator + number. Your first set of columns has no suffix, so we append .0 to those names first.
import pandas as pd
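# Add a '.0' suffix to the first occurrence of each repeated column; leave EmpID untouched
df.columns = [f'{x}.0' if '.' not in x and x != 'EmpID' else x for x in df.columns]
# Reshape to long format, then drop the helper column holding the numeric suffix
pd.wide_to_long(df, stubnames=['FirstName', 'LastName', 'Relationship'],
                sep='.', i='EmpID', j='suff').reset_index().drop(columns='suff')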
EmpID FirstName LastName Relationship
0 1 Ax Bx 1A
1 1 Cx Dx 1B
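For a version that can be pasted and run as-is, here is a sketch of the wide_to_long approach on a small frame. The second employee row (EmpID 2) is invented for illustration, and the explicit sort is an addition so that each employee's rows end up adjacent. wide_to_long requires the i column to identify each row uniquely, so this assumes EmpID is not repeated in the wide data.

```python
import pandas as pd

# Sample data: the first row is from the question, the second is hypothetical.
df = pd.DataFrame(
    [[1, "Ax", "Bx", "1A", "Cx", "Dx", "1B"],
     [2, "Ex", "Fx", "2A", "Gx", "Hx", "2B"]],
    columns=["EmpID", "FirstName", "LastName", "Relationship",
             "FirstName.1", "LastName.1", "Relationship.1"],
)

id_col = "EmpID"

# Work on a renamed copy so the original column names stay intact.
renamed = df.rename(columns=lambda c: c if c == id_col or "." in c else f"{c}.0")

long_df = (
    pd.wide_to_long(
        renamed,
        stubnames=["FirstName", "LastName", "Relationship"],
        i=id_col,
        j="suff",
        sep=".",
    )
    .reset_index()
    .sort_values([id_col, "suff"])   # keep each employee's rows together
    .drop(columns="suff")
    .reset_index(drop=True)
)
print(long_df)
```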