How to merge two columns from two dataframes into one column of a new dataframe (pandas)?

I want to merge the values of two columns, each from a different pandas dataframe, into one column of a new dataframe.

pandas df1 =         

        hapX
  pos   0.0
1 721   0.2
2 735   0.5
3 739   1.0


pandas df2 =       

        hapY
  pos   0.1
1 721   0.0
2 735   0.6
3 739   1.5

I want to generate a new dataframe like:

  df_joined['hapX|Y'] = df1.astype(str).add('|').add(df2.astype(str))

with the expected output:

        hapX|Y
  pos   0.0|0.1
1 721   0.2|0.0
2 735   0.5|0.6
3 739   1.0|1.5

But this outputs a bunch of NaN:

        hapX    hapY
  pos   NaN      NaN
1 721   NaN      NaN
2 735   NaN      NaN
3 739   NaN      NaN

Is the problem that the values are floats (I don't think so)? What is wrong with my approach?

Also, is there a way to automate this when one dataframe has columns hapX1, hapX2, hapX3 and the other has hapY1, hapY2, hapY3?

Thanks,

You can merge the two dataframes on their shared key column and then concatenate hapX and hapY as strings. Say your first column is named no:

df_joined = df1.merge(df2, on='no')
df_joined['hapX|Y'] = df_joined['hapX'].astype(str) + '|' + df_joined['hapY'].astype(str)
# drop() returns a new frame, so assign the result back
df_joined = df_joined.drop(['hapX', 'hapY'], axis=1)

This gives you

    no  hapX|Y
0   pos 0.0|0.1
1   721 0.2|0.0
2   735 0.5|0.6
3   739 1.0|1.5
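A self-contained version of the merge approach, for anyone who wants to run it end to end. The sample frames are rebuilt by hand from the question's printed values; the key column name no is the answer's own assumption, and the keys are stored as strings here for simplicity. The first part also shows why the original attempt produced NaN: DataFrame arithmetic aligns on column labels, so hapX and hapY never line up.

```python
import pandas as pd

# Sample data rebuilt from the question. The key column name "no" is the
# answer's assumption; keys are stored as strings for simplicity.
df1 = pd.DataFrame({'no': ['pos', '721', '735', '739'],
                    'hapX': [0.0, 0.2, 0.5, 1.0]})
df2 = pd.DataFrame({'no': ['pos', '721', '735', '739'],
                    'hapY': [0.1, 0.0, 0.6, 1.5]})

# Why the question's code yields NaN: DataFrame arithmetic aligns on column
# labels. 'hapX' and 'hapY' never match, so the result holds the union of
# both labels and every cell is missing (shown with numeric addition).
aligned = df1[['hapX']].add(df2[['hapY']])
print(aligned.columns.tolist())  # ['hapX', 'hapY']

# The merge-based fix: join rows on "no", then concatenate as strings.
df_joined = df1.merge(df2, on='no')
df_joined['hapX|Y'] = df_joined['hapX'].astype(str) + '|' + df_joined['hapY'].astype(str)
df_joined = df_joined.drop(columns=['hapX', 'hapY'])
print(df_joined)
```

The inner merge keeps the row order of df1's keys, so the joined strings come out in the same order as the question's expected output.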

To add to the previous answer, here is the general case of N DataFrames.

Suppose you have a number of DataFrames as follows:

import random
import pandas as pd

dfs = [pd.DataFrame({'hapY' + str(j): [random.random() for i in range(10)]}) for j in range(5)]

such that

>>> dfs[0]
      hapY0
0  0.175683
1  0.353729
2  0.949848
3  0.346088
4  0.435292
5  0.837879
6  0.277274
7  0.623121
8  0.325119
9  0.709252

Then,

>>> list(map('|'.join, zip(*[dfs[j]['hapY' + str(j)].astype(str) for j in range(5)])))
['0.0845464936138|0.193336164837|0.551717121013|0.113566029656|0.479590342798',
 '0.275851474238|0.694161791339|0.151607726092|0.615367668451|0.498997567849',
 '0.116891472119|0.258406028668|0.315137581816|0.819992354178|0.864412473301',
 '0.729581942312|0.614902776003|0.443986436146|0.227782256619|0.0149481683863',
 '0.745583477173|0.441456815889|0.428691631831|0.307480112319|0.136790112739',
 '0.981337451224|0.0117895017035|0.415140979617|0.650957722911|0.968082350568',
 '0.725618728314|0.0546057041356|0.715910454674|0.0828229441557|0.220878025678',
 '0.704047455894|0.303403129266|0.0499082759635|0.49727194707|0.251623048104',
 '0.453595354131|0.146042134766|0.346665276655|0.911092176243|0.291405609407',
 '0.140523603089|0.117930249858|0.902071673051|0.0804933425857|0.876006332635']

which you can later put into a DataFrame.
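A sketch of that last step that stays inside pandas, assuming the frames share the same row index (as the dfs above do). It places the frames side by side with pd.concat, converts every value to a string, and joins each row with '|'. Small fixed values stand in for random.random() so the result is predictable, and the output column name hapY_joined is hypothetical.

```python
import pandas as pd

# Deterministic stand-ins for the random frames above: one column per frame,
# named hapY0, hapY1, hapY2 (values are made up for illustration).
dfs = [pd.DataFrame({'hapY' + str(j): [j + 0.5, j + 1.0]}) for j in range(3)]

# Side by side on the shared RangeIndex, everything converted to str.
wide = pd.concat(dfs, axis=1).astype(str)

# Join the string values of each row with '|'.
joined = wide.apply('|'.join, axis=1)

# Wrap the result in a new DataFrame (the column name is arbitrary).
df_joined = pd.DataFrame({'hapY_joined': joined})
print(df_joined)
```

Compared with map/zip, this keeps the original row index on the result, and in Python 3 there is no need to wrap map(...) in list(...).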

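I think the simplest approach is to rename the columns of df2 with a dict (built with a dict comprehension) so that they match the columns of df1, and finally call add_suffix. The renaming matters because DataFrame arithmetic such as add aligns on column labels; hapX and hapY never match, which is why the original attempt returned only NaN: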

print (df1) 
     hapX1  hapX2  hapX3  hapX4
pos                            
23     1.0    0.0    1.0    1.0
24     1.0    1.0    1.5    1.0
28     1.0    0.0    0.5    0.0

print (df2)
     hapY1  hapY2  hapY3  hapY4
pos                            
23     0.0    1.0    0.5    0.0
24     1.0    1.0    1.5    1.0
28     0.0    1.0    1.0    1.0

d = {'hapY' + str(x):'hapX' + str(x) for x in range(1,5)}
print (d)
{'hapY1': 'hapX1', 'hapY3': 'hapX3', 'hapY2': 'hapX2', 'hapY4': 'hapX4'}

df_joined = df1.astype(str).add('|').add(df2.rename(columns=d).astype(str)).add_suffix('|Y')
print (df_joined) 

     hapX1|Y  hapX2|Y  hapX3|Y  hapX4|Y
pos                                    
23   1.0|0.0  0.0|1.0  1.0|0.5  1.0|0.0
24   1.0|1.0  1.0|1.0  1.5|1.5  1.0|1.0
28   1.0|0.0  0.0|1.0  0.5|1.0  0.0|1.0
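A runnable sketch of the rename approach that derives the mapping from df2's own column names instead of a hard-coded range(1,5), so it also covers the hapX1, hapX2, ... / hapY1, hapY2, ... case from the question. It assumes every df2 column is 'hapY' followed by a suffix that also appears after 'hapX' in df1; the data is copied from the printed frames above.

```python
import pandas as pd

# Data copied from the printed df1 / df2 above, with "pos" as the index.
idx = pd.Index([23, 24, 28], name='pos')
df1 = pd.DataFrame({'hapX1': [1.0, 1.0, 1.0], 'hapX2': [0.0, 1.0, 0.0],
                    'hapX3': [1.0, 1.5, 0.5], 'hapX4': [1.0, 1.0, 0.0]}, index=idx)
df2 = pd.DataFrame({'hapY1': [0.0, 1.0, 0.0], 'hapY2': [1.0, 1.0, 1.0],
                    'hapY3': [0.5, 1.5, 1.0], 'hapY4': [0.0, 1.0, 1.0]}, index=idx)

# Rename map built from df2's columns (assumption: 'hapY<suffix>' pairs
# with 'hapX<suffix>' in df1), e.g. {'hapY1': 'hapX1', ...}.
d = {c: 'hapX' + c[len('hapY'):] for c in df2.columns}

# With matching labels, the element-wise string concatenation aligns
# column to column instead of producing NaN.
df_joined = (df1.astype(str)
                .add('|')
                .add(df2.rename(columns=d).astype(str))
                .add_suffix('|Y'))
print(df_joined)
```

Because the map is built from the columns themselves, adding a hapX5/hapY5 pair needs no code change.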

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM