Merge and combine 2 columns of different dataframes
I have 2 dataframes:
ID word
1 srv1
2 srv2
3 srv1
4 nan
5 srv3
6 srv1
7 srv5
8 nan
ID word
1 nan
2 srv12
3 srv10
4 srv8
5 srv4
6 srv7
7 nan
8 srv9
What I need is to merge those 2 dataframes on ID and combine the word columns to get:
ID word
1 srv1
2 srv2 , srv12
3 srv1 , srv10
4 srv8
5 srv3 , srv4
6 srv1 , srv7
7 srv5
8 srv9
With the following code
merge = pandas.merge(df1,df2,on="ID",how="left")
merge["word"] = merge["word_x"] + " , " + merge["word_y"]
I am getting:
ID word
1 nan
2 srv2 , srv12
3 srv1 , srv10
4 nan
5 srv3 , srv4
6 srv1 , srv7
7 nan
8 nan
which is not the correct result.
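The nan rows come from how + treats missing values: adding a string to NaN yields NaN, so any row where either side is missing ends up missing. A minimal sketch of that behaviour, using two small made-up Series (the names left and right are only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative data only: one value is missing on each side
left = pd.Series(["srv1", "srv2", np.nan])
right = pd.Series([np.nan, "srv12", "srv8"])

# Element-wise concatenation: a row is NaN as soon as one operand is NaN
combined = left + " , " + right
print(combined)
```

Only the middle row, where both words are present, survives the concatenation.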
You can use Series.str.cat with the na_rep option to populate the word column even if one of the source columns is NaN, then use str.strip to trim any leading/trailing ' , ' that is not between words.
m['word'] = m['word_x'].str.cat(m['word_y'], sep=' , ', na_rep='').str.strip(' , ')
returns
ID word_x word_y word
0 1 srv1 NaN srv1
1 2 srv2 srv12 srv2 , srv12
2 3 srv1 srv10 srv1 , srv10
3 4 NaN srv8 srv8
4 5 srv3 srv4 srv3 , srv4
5 6 srv1 srv7 srv1 , srv7
6 7 srv5 NaN srv5
7 8 NaN srv9 srv9
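For reference, here is a self-contained sketch of this approach. It assumes the frames are named df1 and df2 as in the question, that m is the result of the merge, and that the missing entries are real NaN values rather than the string 'nan'. Note that str.strip(' , ') treats its argument as a set of characters, so it removes any leading or trailing spaces and commas; that is safe here because the words themselves do not start or end with either.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN for the missing words
df1 = pd.DataFrame({
    "ID": range(1, 9),
    "word": ["srv1", "srv2", "srv1", np.nan, "srv3", "srv1", "srv5", np.nan],
})
df2 = pd.DataFrame({
    "ID": range(1, 9),
    "word": [np.nan, "srv12", "srv10", "srv8", "srv4", "srv7", np.nan, "srv9"],
})

# The shared column name "word" becomes word_x / word_y after the merge
m = pd.merge(df1, df2, on="ID", how="left")

# na_rep="" substitutes an empty string for NaN; strip removes the dangling separator
m["word"] = m["word_x"].str.cat(m["word_y"], sep=" , ", na_rep="").str.strip(" , ")

result = m[["ID", "word"]]
print(result)
```

If both words were missing for an ID, this would leave an empty string in that row rather than NaN.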
You can use np.select to select the existing value or the concatenated value. Try this:
import pandas as pd
import numpy as np
from io import StringIO
df1 = pd.read_csv(StringIO("""
ID word
1 srv1
2 srv2
3 srv1
4 nan
5 srv3
6 srv1
7 srv5
8 nan"""), sep=r"\s+")
df2 = pd.read_csv(StringIO("""
ID word
1 nan
2 srv12
3 srv10
4 srv8
5 srv4
6 srv7
7 nan
8 srv9"""), sep=r"\s+")
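# note: rows are compared by position, so both frames must list the same IDs in the same order
has1 = df1["word"].notna()  # True where df1 has a word
has2 = df2["word"].notna()  # True where df2 has a word
conditions = [has1 & ~has2, ~has1 & has2, has1 & has2]
choices = [df1["word"], df2["word"], df1["word"] + "," + df2["word"]]
# rows where both words are missing match no condition and get np.select's default (0)
df1["word"] = np.select(conditions, choices)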
print(df1)
Output:
ID word
0 1 srv1
1 2 srv2,srv12
2 3 srv1,srv10
3 4 srv8
4 5 srv3,srv4
5 6 srv1,srv7
6 7 srv5
7 8 srv9
Based on what I think you want to do, I would first get rid of those nan values:
df_1 = df_1.fillna(value="")
df_2 = df_2.fillna(value="")
And then I would try the merge again and see if you get what you want.
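A hedged sketch of how that could look end to end: once the NaNs are empty strings, plain concatenation with " , " would still leave a dangling separator on rows where one side is empty, so the version below joins only the non-empty words per row. The names df_1 and df_2 follow this answer; the helper join_words is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN for the missing words
df_1 = pd.DataFrame({
    "ID": range(1, 9),
    "word": ["srv1", "srv2", "srv1", np.nan, "srv3", "srv1", "srv5", np.nan],
})
df_2 = pd.DataFrame({
    "ID": range(1, 9),
    "word": [np.nan, "srv12", "srv10", "srv8", "srv4", "srv7", np.nan, "srv9"],
})

# fillna returns a new frame, so the result has to be assigned back
df_1 = df_1.fillna(value="")
df_2 = df_2.fillna(value="")

merged = pd.merge(df_1, df_2, on="ID", how="left")

def join_words(row):
    # Hypothetical helper: keep only the non-empty words and join them with " , "
    return " , ".join(w for w in (row["word_x"], row["word_y"]) if w)

merged["word"] = merged.apply(join_words, axis=1)
result = merged[["ID", "word"]]
print(result)
```

A row-wise apply is slower than a vectorised string method on large frames, but it keeps the intent easy to read.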