
Merge and combine 2 columns of different dataframes

I have 2 dataframes:

ID             word
1              srv1
2              srv2
3              srv1
4              nan
5              srv3
6              srv1
7              srv5
8              nan

ID             word
1              nan
2              srv12
3              srv10
4              srv8
5              srv4
6              srv7
7              nan
8              srv9

What I need is to merge those 2 dataframes on ID and combine the word columns to get:

ID             word
1              srv1 
2              srv2 , srv12
3              srv1 , srv10
4              srv8
5              srv3 , srv4
6              srv1 , srv7
7              srv5
8              srv9

With the following code:

merge = pandas.merge(df1,df2,on="ID",how="left")
merge["word"] = merge["word_x"] + " , " + merge["word_y"]

I am getting:

ID             word
1              nan 
2              srv2 , srv12
3              srv1 , srv10
4              nan
5              srv3 , srv4
6              srv1 , srv7
7              nan
8              nan

Which is not the correct result.
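The nan rows come from how pandas treats missing values in string concatenation: if either operand is NaN, the result for that row is NaN. A minimal sketch of that behaviour, using two made-up Series (`left` and `right` are illustrative names, not from the original post):

```python
import numpy as np
import pandas as pd

# Illustrative data: each Series has a missing value in a different row.
left = pd.Series(["srv1", np.nan, "srv5"])
right = pd.Series([np.nan, "srv8", "srv9"])

# "+" propagates NaN: a row is only populated when BOTH sides are present.
combined = left + " , " + right
print(combined)
```

Only the last row has both values, so only that row survives the concatenation.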

You can use Series.str.cat with the na_rep option to populate the word column even if one of the source columns is nan, then use str.strip to trim any leading/trailing ' , ' that is not between words.

m['word'] = m['word_x'].str.cat(m['word_y'], sep=' , ', na_rep='').str.strip(' , ')

returns

   ID word_x word_y          word
0   1   srv1    NaN          srv1
1   2   srv2  srv12  srv2 , srv12
2   3   srv1  srv10  srv1 , srv10
3   4    NaN   srv8          srv8
4   5   srv3   srv4   srv3 , srv4
5   6   srv1   srv7   srv1 , srv7
6   7   srv5    NaN          srv5
7   8    NaN   srv9          srv9
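For reference, here is a self-contained sketch of the same approach, from building the frames to the final column. The frames are rebuilt from the data in the question; `m` is assumed to be the merged frame with pandas' default `_x`/`_y` suffixes, and `result` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# The two frames from the question, with real NaN values for the gaps.
df1 = pd.DataFrame({"ID": range(1, 9),
                    "word": ["srv1", "srv2", "srv1", np.nan,
                             "srv3", "srv1", "srv5", np.nan]})
df2 = pd.DataFrame({"ID": range(1, 9),
                    "word": [np.nan, "srv12", "srv10", "srv8",
                             "srv4", "srv7", np.nan, "srv9"]})

# Merging on ID yields word_x (from df1) and word_y (from df2).
m = pd.merge(df1, df2, on="ID", how="left")

# na_rep="" substitutes an empty string for NaN, then strip removes
# the separator characters left dangling at either end.
m["word"] = m["word_x"].str.cat(m["word_y"], sep=" , ", na_rep="").str.strip(" , ")

result = m[["ID", "word"]]
print(result)
```

Note that str.strip treats its argument as a set of characters (here space and comma), not as a literal substring, so it would also trim words that themselves start or end with a comma or a space.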

You can use np.select to choose between the existing value and the concatenated value.

Try this:

import pandas as pd
import numpy as np
from io import StringIO

df1 = pd.read_csv(StringIO("""
ID             word
1              srv1
2              srv2
3              srv1
4              nan
5              srv3
6              srv1
7              srv5
8              nan"""), sep=r"\s+")

df2 = pd.read_csv(StringIO("""
ID             word
1              nan
2              srv12
3              srv10
4              srv8
5              srv4
6              srv7
7              nan
8              srv9"""), sep=r"\s+")


conditions = [(~df1["word"].isna()) & df2["word"].isna(), df1["word"].isna() & (~df2["word"].isna()), (~df1["word"].isna()) & (~df2["word"].isna())]
choices = [df1["word"], df2["word"], df1["word"] + "," + df2["word"]]

df1["word"] = np.select(conditions,choices)

print(df1)

Output:

   ID        word
0   1        srv1
1   2  srv2,srv12
2   3  srv1,srv10
3   4        srv8
4   5   srv3,srv4
5   6   srv1,srv7
6   7        srv5
7   8        srv9
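The snippet above compares df1 and df2 row by row, so it relies on both frames listing the IDs in the same order. If that is not guaranteed, one possible variation is to merge on ID first and run np.select on the merged columns. This is only a sketch: the small frames, the `has_x`/`has_y` helper names, and the explicit `default=np.nan` (for rows where both words are missing) are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Small illustrative frames; df2 is deliberately in a different row order.
df1 = pd.DataFrame({"ID": [1, 2, 3, 4],
                    "word": ["srv1", "srv2", np.nan, np.nan]})
df2 = pd.DataFrame({"ID": [4, 3, 2, 1],
                    "word": [np.nan, "srv8", "srv12", np.nan]})

# Align rows by ID instead of by position.
m = df1.merge(df2, on="ID", how="left", suffixes=("_x", "_y"))

has_x = m["word_x"].notna()
has_y = m["word_y"].notna()

conditions = [has_x & ~has_y, ~has_x & has_y, has_x & has_y]
choices = [m["word_x"], m["word_y"], m["word_x"] + " , " + m["word_y"]]

# Rows matching no condition (both words missing) stay NaN
# rather than taking np.select's default of 0.
m["word"] = np.select(conditions, choices, default=np.nan)
print(m[["ID", "word"]])
```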

Based on what I think you want to do, I would first get rid of those nan values:

# fillna returns a new frame, so assign the result back
df1 = df1.fillna(value="")
df2 = df2.fillna(value="")

And then I would try the merge again and see if you get what you want.
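As a rough sketch of what that might look like end to end (the three-row frames are made up for illustration): after filling the gaps with empty strings the concatenation no longer produces nan, but rows with only one word keep a dangling " , ", so a final str.strip is assumed here to clean that up.

```python
import numpy as np
import pandas as pd

# Illustrative frames with one gap on each side.
df1 = pd.DataFrame({"ID": [1, 2, 3], "word": ["srv1", np.nan, "srv3"]})
df2 = pd.DataFrame({"ID": [1, 2, 3], "word": [np.nan, "srv8", "srv4"]})

# Replace NaN with empty strings before merging.
df1 = df1.fillna(value="")
df2 = df2.fillna(value="")

m = pd.merge(df1, df2, on="ID", how="left")

# Concatenate, then strip the space/comma characters left at the ends
# of rows where one side was empty.
m["word"] = (m["word_x"] + " , " + m["word_y"]).str.strip(" ,")
print(m[["ID", "word"]])
```

One caveat: if df2 were missing some IDs entirely, the left merge would introduce new NaN values in word_y after the fillna step, so the fill would have to be repeated on the merged frame.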
