I have a dataframe with 4 columns: 'age_1', 'name_1', 'age_2' and 'name_2'.
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0, 4, 6, 9],
                  data={'age_1': [18, np.nan, 12, np.nan],
                        'name_1': ['Fred', np.nan, 'Harry', np.nan],
                        'age_2': [np.nan, 34, np.nan, 45],
                        'name_2': [np.nan, 'Jim', np.nan, 'Fred']})
Output
age_1 name_1 age_2 name_2
0 18.0 Fred NaN NaN
4 NaN NaN 34.0 Jim
6 12.0 Harry NaN NaN
9 NaN NaN 45.0 Fred
All names appear twice (once in name_1 and once in name_2). I want to combine the rows where name_1 and name_2 contain the same name. For example, from the snippet above, I want to combine the first and last rows like this:
age_1 name_1 age_2 name_2
0 18.0 Fred 45.0 Fred
Any help would be great.
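You can split the dataframe into two parts and join them using merge. Since the join columns name_1 and name_2 contain nulls, you have to drop the nulls first: pandas matches null keys against each other in a merge, so otherwise the empty halves of the rows would pair up with each other.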
l1 = ['age_1', 'name_1']
l2 = ['age_2', 'name_2']
df[l1].dropna().merge(df[l2].dropna(), left_on='name_1', right_on='name_2')
#outputs:
age_1 name_1 age_2 name_2
0 18.0 Fred 45.0 Fred
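To see why the dropna calls matter, here is a small check that runs the same merge with and without them on the question's frame. It is a sketch relying on the documented pandas behaviour of matching null keys against each other; the names naive and clean are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0, 4, 6, 9],
                  data={'age_1': [18, np.nan, 12, np.nan],
                        'name_1': ['Fred', np.nan, 'Harry', np.nan],
                        'age_2': [np.nan, 34, np.nan, 45],
                        'name_2': [np.nan, 'Jim', np.nan, 'Fred']})
l1 = ['age_1', 'name_1']
l2 = ['age_2', 'name_2']

# Without dropna: the two NaN names on the left each match the two NaN
# names on the right, adding 2 x 2 spurious rows next to the Fred match.
naive = df[l1].merge(df[l2], left_on='name_1', right_on='name_2')

# With dropna: only rows with a real name take part, so only Fred pairs up.
clean = df[l1].dropna().merge(df[l2].dropna(), left_on='name_1', right_on='name_2')

print(len(naive), len(clean))
print(clean)
```

The extra rows in naive carry no names at all, which is why dropping the nulls before merging is the simplest fix.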
If df is your dataframe:
df[["age_1", "name_1"]].dropna(how="all").join(
    df[["name_2", "age_2"]].dropna(how="all").set_index("name_2")[["age_2"]],
    on="name_1",
)
This will give you approximately what you're looking for. The name will not be repeated as in your example: since it's the key being joined on, it appears just once.
Note that this is a left join: any name_2 values that do not have a corresponding name_1 will be thrown away (however, name_1 values with no corresponding name_2, like Harry, will remain). If you want to keep those name_2 values, just add how="outer" as a keyword argument to the join method. If you're sure that all names will always appear twice, then it won't matter.
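A quick way to compare the two join types on the question's frame is sketched below; left_join and outer_join are illustrative names. Only row counts and the matched ages are checked, since how pandas fills the key column and index for Jim's right-only row is not something this sketch relies on.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0, 4, 6, 9],
                  data={'age_1': [18, np.nan, 12, np.nan],
                        'name_1': ['Fred', np.nan, 'Harry', np.nan],
                        'age_2': [np.nan, 34, np.nan, 45],
                        'name_2': [np.nan, 'Jim', np.nan, 'Fred']})

left = df[["age_1", "name_1"]].dropna(how="all")
# The right-hand side is indexed by name so join(on=...) can look names up.
right = df[["name_2", "age_2"]].dropna(how="all").set_index("name_2")[["age_2"]]

# Default left join: Fred and Harry are kept (Harry gets NaN), Jim is dropped.
left_join = left.join(right, on="name_1")

# Outer join: Jim, who only appears in name_2, is kept as well.
outer_join = left.join(right, on="name_1", how="outer")

print(left_join)
print(outer_join)
```

The left join keeps the caller's index (0 and 6 here), which is convenient if you want to trace each result back to the original name_1 row.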
If a name_1 has multiple matching name_2 values, the row will be repeated once for each match. Again, if each name appears exactly twice (exactly once in the name_1 column and exactly once in the name_2 column), this won't matter. I would add a check for that like this:
# check that there are no repeats
for col in ("name_1", "name_2"):
    assert df[col].dropna().nunique() == df[col].dropna().shape[0]
# check that all `name_1`s have corresponding `name_2`s
assert set(df["name_1"].dropna()) == set(df["name_2"].dropna())
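As an alternative to the assert statements, pandas' merge can run similar checks itself: validate="one_to_one" raises a MergeError when a name repeats on either side, and indicator=True adds a _merge column that flags rows without a partner. The sketch below applies this to the merge-based split from the first answer rather than to join; unmatched and duplicate_detected are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(index=[0, 4, 6, 9],
                  data={'age_1': [18, np.nan, 12, np.nan],
                        'name_1': ['Fred', np.nan, 'Harry', np.nan],
                        'age_2': [np.nan, 34, np.nan, 45],
                        'name_2': [np.nan, 'Jim', np.nan, 'Fred']})

left = df[['age_1', 'name_1']].dropna()
right = df[['age_2', 'name_2']].dropna()

# Outer merge keeps unpaired names from both sides; _merge records whether
# each row came from 'left_only', 'right_only' or 'both'.
checked = left.merge(right, left_on='name_1', right_on='name_2',
                     how='outer', indicator=True, validate='one_to_one')
unmatched = checked[checked['_merge'] != 'both']
print(unmatched)

# A duplicated name on the right-hand side makes validate= raise.
try:
    left.merge(pd.concat([right, right]), left_on='name_1', right_on='name_2',
               validate='one_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```

On the sample data this reports Harry (only in name_1) and Jim (only in name_2), which is exactly the situation the set comparison above would reject.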
Edit: added the dropna calls, as suggested in the comments.
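import numpy as np
import pandas as pd
# blanks are stored as empty strings here instead of NaN
df = pd.DataFrame({'age_1': [18, '', 12, ''], 'name_1': ['Fred', '', 'Harry', ''],
                   'age_2': ['', 34, '', 45], 'name_2': ['', 'Jim', '', 'Fred']})
df1 = df[['age_1', 'name_1']]
df2 = df[['age_2', 'name_2']]
# left merge on the names; at this stage blank names also match each other
df_new = df1.merge(df2, how='left', left_on='name_1', right_on='name_2')
# turn the blanks into NaN and drop every row that is not a complete match
df_new = df_new.replace('', np.nan)
df_new.dropna(how='any', inplace=True)
df_new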
Output
age_1 name_1 age_2 name_2
0 18.0 Fred 45.0 Fred
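This answer works because the final dropna removes the extra rows after the fact. The sketch below shows how many rows the intermediate merge produces, and the alternative of converting the blanks to NaN before merging, which reduces it to the first answer's approach; raw and paired are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age_1': [18, '', 12, ''], 'name_1': ['Fred', '', 'Harry', ''],
                   'age_2': ['', 34, '', 45], 'name_2': ['', 'Jim', '', 'Fred']})

# Before cleaning: each of the two blank names on the left matches both blank
# names on the right (4 rows), plus Fred's match and Harry with no match.
raw = df[['age_1', 'name_1']].merge(df[['age_2', 'name_2']], how='left',
                                    left_on='name_1', right_on='name_2')

# Cleaning first: turn blanks into NaN and drop them on each side, so the
# merge only ever compares real names.
clean = df.replace('', np.nan)
left = clean[['age_1', 'name_1']].dropna()
right = clean[['age_2', 'name_2']].dropna()
paired = left.merge(right, left_on='name_1', right_on='name_2')

print(len(raw), len(paired))
print(paired)
```

Cleaning first avoids building the blank-to-blank pairs at all, which matters if the real frame has many empty rows.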