Append Pandas disjunction of 2 dataframes to first dataframe

I have 2 pandas tables, each with the 3 columns id, x and y (coordinates). Rows that share the same id represent one graph (path) with its x-y values. How can I find the paths that exist in the second table but not in the first, and append them to the first table? The key problem is that the order of the graphs can differ between the two tables.

Example:

df1 = pd.DataFrame({'id':[1,1,2,2,2,3,3,3], 'x':[1,1,5,4,4,1,1,1], 'y':[1,2,4,4,3,4,5,6]})
df2 = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,4,4,4], 'x':[1,1,1,1,1,5,4,4,10,10,9], 'y':[4,5,6,1,2,4,4,3,1,2,2]})

(df1   intersect df2  )  --------->  df1
id x y       id x y              id x y 
1  1 1       1  1 4              1  1 1 
1  1 2       1  1 5              1  1 2
2  5 4       1  1 6              2  5 4
2  4 4       2  1 1              2  4 4
2  4 3       2  1 2              2  4 3
3  1 4       3  5 4              3  1 4
3  1 5       3  4 4              3  1 5
3  1 6       3  4 3              3  1 6
             4  10 1             4  10 1
             4  10 2             4  10 2
             4   9 2             4   9 2 
Should become:
df1 = pd.DataFrame({'id':[1,1,2,2,2,3,3,3,4,4,4], 'x':[1,1,5,4,4,1,1,1,10,10,9], 'y':[1,2,4,4,3,4,5,6,1,2,2]})

As you can see, up to id = 3, df1 and df2 contain the same graphs, but in a different order. For example, the first graph in df1 is the second graph in df2. df2 also has a 4th path that is not in df1; that path should be detected and appended to df1. In other words, I want to take the intersection of the 2 pandas tables and append the part of df2 outside it (the disjunction) to the first table, given that the id, that is, the order of the paths, can differ from one table to the other.

Imports:

import pandas as pd

Set starting DataFrames:

df1 = pd.DataFrame({'id':[1,1,2,2,2,3,3,3], 
                    'x':[1,1,5,4,4,1,1,1], 
                    'y':[1,2,4,4,3,4,5,6]})
df2 = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,4,4,4], 
                    'x':[1,1,1,1,1,5,4,4,10,10,9], 
                    'y':[4,5,6,1,2,4,4,3,1,2,2]})

Outer merge:

df_merged = df1.merge(df2, on=['x', 'y'], how='outer')

produces:

df_merged =

   id_x  x  y   id_y
0   1.0  1  1   2
1   1.0  1  2   2
2   2.0  5  4   3
3   2.0  4  4   3
4   2.0  4  3   3
5   3.0  1  4   1
6   3.0  1  5   1
7   3.0  1  6   1
8   NaN  10 1   4
9   NaN  10 2   4
10  NaN  9  2   4

Note: Why does id_x become float?
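
The upcast comes from the NaN values: rows found only in df2 have no id_x, and a NumPy integer column cannot hold NaN, so pandas converts the column to float64. Here is a minimal sketch on made-up toy frames (left and right are illustrative names, not from the question). It also shows pandas' nullable Int64 dtype as one possible way to keep whole numbers:

```python
import pandas as pd

# Toy frames for illustration only; 'right' has one key (x=10) missing from 'left'.
left = pd.DataFrame({'id': [1, 2], 'x': [1, 5]})
right = pd.DataFrame({'id': [7, 8, 9], 'x': [1, 5, 10]})

merged = left.merge(right, on='x', how='outer')
print(merged['id_x'].dtype)  # float64, because the unmatched row holds NaN

# The nullable Int64 dtype stores the gap as <NA> and keeps whole numbers.
nullable = merged['id_x'].astype('Int64')
print(nullable.dtype)        # Int64
```

Filling the gaps and casting back to int, as done in the next step, reaches the same end result.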

Fill NaN:

df_merged.id_x = df_merged.id_x.fillna(df_merged.id_y).astype('int')

produces:

df_merged = 

 id_x   x   y   id_y
0   1   1   1   2
1   1   1   2   2
2   2   5   4   3
3   2   4   4   3
4   2   4   3   3
5   3   1   4   1
6   3   1   5   1
7   3   1   6   1
8   4   10  1   4
9   4   10  2   4
10  4   9   2   4

Drop id_y:

df_merged = df_merged.drop(['id_y'], axis=1)

produces:

df_merged = 

    id_x    x   y
0      1    1   1
1      1    1   2
2      2    5   4
3      2    4   4
4      2    4   3
5      3    1   4
6      3    1   5
7      3    1   6
8      4    10  1
9      4    10  2
10     4    9   2

Rename id_x to id:

df_merged = df_merged.rename(columns={'id_x': 'id'})

produces:

df_merged = 

    id  x   y
0   1   1   1
1   1   1   2
2   2   5   4
3   2   4   4
4   2   4   3
5   3   1   4
6   3   1   5
7   3   1   6
8   4   10  1
9   4   10  2
10  4   9   2

The final program is 4 lines of code:

import pandas as pd

df1 = pd.DataFrame({'id':[1,1,2,2,2,3,3,3], 
                    'x':[1,1,5,4,4,1,1,1], 
                    'y':[1,2,4,4,3,4,5,6]})
df2 = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,4,4,4], 
                    'x':[1,1,1,1,1,5,4,4,10,10,9], 
                    'y':[4,5,6,1,2,4,4,3,1,2,2]})

df_merged = df1.merge(df2, on=['x', 'y'], how='outer')
df_merged.id_x = df_merged.id_x.fillna(df_merged.id_y).astype('int')
df_merged = df_merged.drop(['id_y'], axis=1)
df_merged = df_merged.rename(columns={'id_x': 'id'})
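
If the goal is only to find the rows of df2 whose (x, y) point is missing from df1, merge's indicator=True option can label them directly, which avoids the NaN and float detour. This is a sketch of an alternative, not the code above. Like the merge above, it compares individual points rather than whole graphs, and the names flagged, only_in_df2 and result are my own:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3, 3, 3],
                    'x': [1, 1, 5, 4, 4, 1, 1, 1],
                    'y': [1, 2, 4, 4, 3, 4, 5, 6]})
df2 = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4],
                    'x': [1, 1, 1, 1, 1, 5, 4, 4, 10, 10, 9],
                    'y': [4, 5, 6, 1, 2, 4, 4, 3, 1, 2, 2]})

# Left-merge df2 against the distinct points of df1; the extra _merge column
# says whether each df2 row was found in df1 ('both') or not ('left_only').
flagged = df2.merge(df1[['x', 'y']].drop_duplicates(),
                    on=['x', 'y'], how='left', indicator=True)

# Keep only the rows that df1 does not have, then append them to df1.
only_in_df2 = flagged.loc[flagged['_merge'] == 'left_only', ['id', 'x', 'y']]
result = pd.concat([df1, only_in_df2], ignore_index=True)
print(result)
```

With the example data, result holds the eight rows of df1 followed by the three rows with id 4, and the id column stays integer throughout.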

Please remember to put a check next to the selected answer.

Mauritius, try this code:

import pandas as pd

df1 = pd.DataFrame({'id':[1,1,2,2,2,3,3,3], 'x':[1,1,5,4,4,1,1,1], 'y':[1,2,4,4,3,4,5,6]})
df2 = pd.DataFrame({'id':[1,1,1,2,2,3,3,3,4,4,4,5], 'x':[1,1,1,1,1,5,4,4,10,10,9,1], 'y':[4,5,6,1,2,4,4,3,1,2,2,2]})

# One set of (x, y) points per graph in df1, so that id and row order do not matter.
df1_s = [{(x,y) for x, y in df1[['x','y']][df1.id==i].values} for i in df1.id.unique()]

def f(group):
    # True if the point set of this df2 graph matches no graph in df1.
    data = {(x,y) for x, y in group[['x','y']].values}
    return data not in df1_s

check = df2.groupby('id').apply(f).apply(pd.Series)
ids = check[check[0]].index.values
df2 = df2.set_index('id').loc[ids].reset_index()

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
df1 = pd.concat([df1, df2])

OUT:

   id   x  y
0   1   1  1
1   1   1  2
2   2   5  4
3   2   4  4
4   2   4  3
5   3   1  4
6   3   1  5
7   3   1  6
0   4  10  1
1   4  10  2
2   4   9  2
3   5   1  2

I think it can be done in a simpler and more pythonic way, but I have thought about it a lot and still don't know how =)
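
One possibly simpler variant, offered as a sketch rather than a definitive answer: describe each graph by a frozenset of its (x, y) points, so that neither the id nor the row order matters, and keep the df2 graphs whose point set is unknown to df1. The helper graph_keys and the names known, new_ids and result are hypothetical. The sketch assumes that two graphs count as equal when they contain the same set of points, so repeated points and the order of points inside a graph are ignored:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3, 3, 3],
                    'x': [1, 1, 5, 4, 4, 1, 1, 1],
                    'y': [1, 2, 4, 4, 3, 4, 5, 6]})
df2 = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5],
                    'x': [1, 1, 1, 1, 1, 5, 4, 4, 10, 10, 9, 1],
                    'y': [4, 5, 6, 1, 2, 4, 4, 3, 1, 2, 2, 2]})


def graph_keys(df):
    """Map each id to a hashable, order-independent set of its (x, y) points."""
    return {gid: frozenset(zip(g['x'], g['y'])) for gid, g in df.groupby('id')}


# Point sets of all graphs already present in df1.
known = set(graph_keys(df1).values())

# ids of df2 graphs whose point set does not occur in df1.
new_ids = [gid for gid, pts in graph_keys(df2).items() if pts not in known]

# Append the rows of those graphs to df1.
result = pd.concat([df1, df2[df2['id'].isin(new_ids)]], ignore_index=True)
print(result)
```

For the data above, the graphs with id 4 and id 5 are the ones appended, the same as in the output shown earlier, except that ignore_index=True renumbers the row index.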

I also think one should check that the ids in df1 and df2 do not clash before appending one df to the other (at the end). I might add this later.
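
One way such a check could look, as a rough sketch only: instead of merely testing for clashes, give the graphs to be appended fresh ids that start after the largest id in df1. The frame new_graphs and the names offset, codes, renumbered and result are made up for illustration, and the sketch assumes the ids are integers:

```python
import pandas as pd

df1 = pd.DataFrame({'id': [1, 1, 2, 2], 'x': [1, 1, 5, 4], 'y': [1, 2, 4, 4]})
# Hypothetical graphs to append; their ids (1 and 2) clash with ids in df1.
new_graphs = pd.DataFrame({'id': [1, 1, 2], 'x': [10, 10, 9], 'y': [1, 2, 2]})

# Largest id already used in df1.
offset = df1['id'].max()

# factorize numbers the distinct ids 0, 1, 2, ... in order of first appearance.
codes, _ = pd.factorize(new_graphs['id'])

# Shift the codes so that the new ids continue after the ids of df1.
renumbered = new_graphs.assign(id=codes + offset + 1)
result = pd.concat([df1, renumbered], ignore_index=True)
print(result['id'].tolist())  # [1, 1, 2, 2, 3, 3, 4]
```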

Does this code do what you want?
