Conditionally merging CSV files using Python (pandas)

Question

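I am trying to merge two or more files that share the same schema.
The files will contain duplicate entries, but the rows will not be identical, for example: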

file1:
store_id,address,phone
9191,9827 Park st,999999999
8181,543 Hello st,1111111111

file2:
store_id,address,phone
9191,9827 Park st Apt82,999999999
7171,912 John st,87282728282

Expected output:
9191,9827 Park st Apt82,999999999
8181,543 Hello st,1111111111
7171,912 John st,87282728282

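As you may notice, based on store_id and phone, 9191,9827 Park st,999999999 and 9191,9827 Park st Apt82,999999999 refer to the same entry, but because the address in file2 is more descriptive, I picked that one.

store_id + phone is my composite primary key, used to look up a location and to find duplicates (in the example above store_id alone is enough to find it, but I need a key based on multiple column values).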

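Question:
- I need to merge multiple CSV files that have the same schema but contain duplicate rows.
- The row-level merge should have logic to pick specific values for a row based on the row's values, for example taking the phone from file1 and the address from file2.
- A combination of one or more column values defines whether rows are duplicates.

Can pandas do this?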

Answer 1

One way to smash them together is to use merge (on store_id and phone; if these were the index, this would be a join rather than a merge):

In [11]: res = df1.merge(df2, on=['store_id', 'phone'], how='outer')

In [12]: res
Out[12]:
   store_id     address_x        phone           address_y
0      9191  9827 Park st    999999999  9827 Park st Apt82
1      8181  543 Hello st   1111111111                 NaN
2      7171           NaN  87282728282         912 John st

Then you can use where to select address_y if it exists, and otherwise fall back to address_x:

In [13]: res['address'] = res.address_y.where(res.address_y.notnull(), res.address_x)

In [14]: del res['address_x'], res['address_y']

In [15]: res
Out[15]: 
   store_id        phone             address
0      9191    999999999  9827 Park st Apt82
1      8181   1111111111        543 Hello st
2      7171  87282728282         912 John st
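For reference, here is a self-contained sketch of the same merge-then-pick idea that can be run as a script. It uses the sample data from the question and swaps where for fillna, which expresses the same "use address_y if present, else address_x" rule. The suffixes _1 and _2 and the variable name key are my own choices, not part of the answer above. Note that this rule always prefers the second file's address when it exists; it does not compare which address is more descriptive.

```python
import io

import pandas as pd

# Sample data copied from the question.
t1 = """store_id,address,phone
9191,9827 Park st,999999999
8181,543 Hello st,1111111111"""

t2 = """store_id,address,phone
9191,9827 Park st Apt82,999999999
7171,912 John st,87282728282"""

df1 = pd.read_csv(io.StringIO(t1))
df2 = pd.read_csv(io.StringIO(t2))

# Composite key from the question (variable name is illustrative).
key = ["store_id", "phone"]

# An outer merge keeps rows that appear in only one of the files.
res = df1.merge(df2, on=key, how="outer", suffixes=("_1", "_2"))

# Prefer the second file's address; fall back to the first file's when missing.
res["address"] = res["address_2"].fillna(res["address_1"])
res = res.drop(columns=["address_1", "address_2"])

print(res)
```

The row order of an outer merge can differ between pandas versions, so sort explicitly if the order matters.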

Answer 2

How about using concat, groupby, and agg? You can then write an agg function that selects the right value:

import pandas as pd
import io

t1 = """store_id,address,phone
9191,9827 Park st,999999999
8181,543 Hello st,1111111111"""

t2 = """store_id,address,phone
9191,9827 Park st Apt82,999999999
7171,912 John st,87282728282"""

# t1 and t2 are str, so wrap them in StringIO (BytesIO needs bytes on Python 3)
df1 = pd.read_csv(io.StringIO(t1))
df2 = pd.read_csv(io.StringIO(t2))

df = pd.concat([df1, df2]).reset_index(drop=True)

def f(s):
    # Pick the longest string in the group, treating it as the most descriptive
    loc = s.str.len().idxmax()
    return s[loc]

df.groupby(["store_id", "phone"]).agg(f)
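The same idea extends to any number of files and to a different rule per column. The following is only a sketch under assumptions: the helper names longest and merge_frames are hypothetical, and "the longest string is the most descriptive" is the rule borrowed from f above, which may not fit real address data.

```python
import io

import pandas as pd


def longest(s):
    """Return the longest non-missing string in a group (assumed to be the most descriptive)."""
    s = s.dropna().astype(str)
    if s.empty:
        return float("nan")  # every value in the group was missing
    return s.loc[s.str.len().idxmax()]


def merge_frames(frames, key, rules):
    """Stack the frames, then collapse rows sharing `key` using one rule per column."""
    stacked = pd.concat(frames, ignore_index=True)
    return stacked.groupby(key, as_index=False, sort=False).agg(rules)


# Sample data copied from the question; for files on disk you could build
# the list with pd.read_csv(path) for each path instead.
t1 = """store_id,address,phone
9191,9827 Park st,999999999
8181,543 Hello st,1111111111"""

t2 = """store_id,address,phone
9191,9827 Park st Apt82,999999999
7171,912 John st,87282728282"""

frames = [pd.read_csv(io.StringIO(t)) for t in (t1, t2)]

merged = merge_frames(
    frames,
    key=["store_id", "phone"],      # composite key from the question
    rules={"address": longest},     # one aggregation rule per non-key column
)
print(merged)
```

Each non-key column gets its own entry in rules, so a column that should come from the first file it appears in could use the built-in "first" aggregation instead of longest.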

Answer 1
0 accepted 2013-11-19 01:22:56

Answer 2
0 2013-11-19 03:50:46