I want to merge the values of two columns from two different pandas DataFrames into one column of a new DataFrame.
pandas df1 =
hapX
pos 0.0
1 721 0.2
2 735 0.5
3 739 1.0
pandas df2 =
hapY
pos 0.1
1 721 0.0
2 735 0.6
3 739 1.5
I want to generate a new dataframe like:
df_joined['hapX|Y'] = df1.astype(str).add('|').add(df2.astype(str))
with expected output :
hapX|Y
pos 0.0|0.1
1 721 0.2|0.0
2 735 0.5|0.6
3 739 1.0|1.5
But this outputs a bunch of NaN values:
hapX hapY
pos NaN NaN
1 721 NaN NaN
2 735 NaN NaN
3 739 NaN NaN
Is the problem that the values are floats? (I don't think so.) What is wrong with my approach?
Also, is there a way to automate the process if the column names are hapX1, hapX2, hapX3 in one dataframe and hapY1, hapY2, hapY3 in the other?
Thanks,
You can merge the two dataframes and then concatenate the hapX and hapY columns as strings. Say your first column is named no.
df_joined = df1.merge(df2, on = 'no')
df_joined['hapX|Y'] = (df_joined['hapX'].astype(str))+'|'+(df_joined['hapY'].astype(str))
df_joined = df_joined.drop(['hapX', 'hapY'], axis = 1)  # drop returns a new frame, so assign it back
This gives you
no hapX|Y
0 pos 0.0|0.1
1 721 0.2|0.0
2 735 0.5|0.6
3 739 1.0|1.5
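As for why the original attempt returns NaN: arithmetic between two DataFrames aligns on both row and column labels, and hapX and hapY never match. The sketch below reproduces that and shows the same join done on Series, which align on the index only. The sample frames are hypothetical, modelled on the question, and assume pos is the index.

```python
import pandas as pd

# Hypothetical data modelled on the question; 'pos' is assumed to be the index.
pos = pd.Index([721, 735, 739], name="pos")
df1 = pd.DataFrame({"hapX": [0.2, 0.5, 1.0]}, index=pos)
df2 = pd.DataFrame({"hapY": [0.0, 0.6, 1.5]}, index=pos)

# DataFrame + DataFrame aligns on row AND column labels. 'hapX' and 'hapY'
# never match, so the result has both columns and every cell is missing.
broken = df1.astype(str).add("|").add(df2.astype(str))

# Series + Series aligns on the index only, so values pair up by 'pos'.
df_joined = pd.DataFrame(
    {"hapX|Y": df1["hapX"].astype(str) + "|" + df2["hapY"].astype(str)}
)
print(df_joined)
```

If pos is an ordinary column rather than the index, merging on it first, as in the answer above, gives the same pairing.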
Just to add to the previous answer, here is the general case of N DataFrames.
Suppose you have a number of DataFrames as follows (with random and pandas imported, the latter as pd):
dfs = [pd.DataFrame({'hapY'+str(j): [random.random() for i in range(10)]}) for j in range(5)]
such that
>>> dfs[0]
hapY0
0 0.175683
1 0.353729
2 0.949848
3 0.346088
4 0.435292
5 0.837879
6 0.277274
7 0.623121
8 0.325119
9 0.709252
Then,
>>> list(map(lambda m: '|'.join(m), zip(*[dfs[j]['hapY'+str(j)].astype(str) for j in range(5)])))  # list() is needed on Python 3, where map is lazy
['0.0845464936138|0.193336164837|0.551717121013|0.113566029656|0.479590342798',
'0.275851474238|0.694161791339|0.151607726092|0.615367668451|0.498997567849',
'0.116891472119|0.258406028668|0.315137581816|0.819992354178|0.864412473301',
'0.729581942312|0.614902776003|0.443986436146|0.227782256619|0.0149481683863',
'0.745583477173|0.441456815889|0.428691631831|0.307480112319|0.136790112739',
'0.981337451224|0.0117895017035|0.415140979617|0.650957722911|0.968082350568',
'0.725618728314|0.0546057041356|0.715910454674|0.0828229441557|0.220878025678',
'0.704047455894|0.303403129266|0.0499082759635|0.49727194707|0.251623048104',
'0.453595354131|0.146042134766|0.346665276655|0.911092176243|0.291405609407',
'0.140523603089|0.117930249858|0.902071673051|0.0804933425857|0.876006332635']
which you can later put into a DataFrame.
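One possible way to get the joined strings straight into a DataFrame is to concatenate the frames side by side and join each row. This is only a sketch: the output column name hapY is a made-up label, and the frames are assumed to share the same index.

```python
import random

import pandas as pd

# Hypothetical input: five single-column frames named hapY0..hapY4, as above.
dfs = [
    pd.DataFrame({"hapY" + str(j): [random.random() for _ in range(10)]})
    for j in range(5)
]

# Put the frames side by side on their shared index and convert to strings.
wide = pd.concat(dfs, axis=1).astype(str)

# Join the string values of each row with '|' and wrap the result in a
# one-column DataFrame ('hapY' is an arbitrary name chosen for this sketch).
joined = wide.apply("|".join, axis=1).to_frame("hapY")
print(joined.head())
```

Because pd.concat aligns on the index, rows are paired by label rather than by position, which differs from zip if the frames are ordered differently.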
I think the simplest way is to rename the columns with a dict, which can be created by a dict comprehension, and finally call add_suffix:
print (df1)
hapX1 hapX2 hapX3 hapX4
pos
23 1.0 0.0 1.0 1.0
24 1.0 1.0 1.5 1.0
28 1.0 0.0 0.5 0.0
print (df2)
hapY1 hapY2 hapY3 hapY4
pos
23 0.0 1.0 0.5 0.0
24 1.0 1.0 1.5 1.0
28 0.0 1.0 1.0 1.0
d = {'hapY' + str(x):'hapX' + str(x) for x in range(1,5)}
print (d)
{'hapY1': 'hapX1', 'hapY3': 'hapX3', 'hapY2': 'hapX2', 'hapY4': 'hapX4'}
df_joined = df1.astype(str).add('|').add(df2.rename(columns=d).astype(str)).add_suffix('|Y')
print (df_joined)
hapX1|Y hapX2|Y hapX3|Y hapX4|Y
pos
23 1.0|0.0 0.0|1.0 1.0|0.5 1.0|0.0
24 1.0|1.0 1.0|1.0 1.5|1.5 1.0|1.0
28 1.0|0.0 0.0|1.0 0.5|1.0 0.0|1.0
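If the number of columns is not known in advance, the rename dict could also be built from the column order instead of a hard-coded range. A minimal sketch, assuming the k-th column of df2 belongs with the k-th column of df1 (the small frames here are hypothetical):

```python
import pandas as pd

# Hypothetical two-column version of the frames shown above.
pos = pd.Index([23, 24, 28], name="pos")
df1 = pd.DataFrame({"hapX1": [1.0, 1.0, 1.0], "hapX2": [0.0, 1.0, 0.0]}, index=pos)
df2 = pd.DataFrame({"hapY1": [0.0, 1.0, 0.0], "hapY2": [1.0, 1.0, 1.0]}, index=pos)

# Assumption: columns correspond by position, so pair them up with zip.
d = dict(zip(df2.columns, df1.columns))

# Same chain as above: once df2 carries df1's column labels, the string
# addition aligns cell by cell instead of producing NaN.
df_joined = (
    df1.astype(str)
    .add("|")
    .add(df2.rename(columns=d).astype(str))
    .add_suffix("|Y")
)
print(df_joined)
```

This relies on both frames listing their columns in matching order; if they do not, sort the columns first or build the dict explicitly.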