Pandas: create new column in one dataframe with values based on matching key from another dataframe
New column based on matching values from another dataframe pandas
Suppose we have two DataFrames, df1 and df2, as in the example below. How can we combine them to produce df3?
import pandas as pd
import numpy as np
# df1: each row holds a list of keys in column2
data1 = [("a1",["A","B"]),("a2",["A","B","C"]),("a3",["B","C"])]
df1 = pd.DataFrame(data1,columns = ["column1","column2"])
print(df1)
# df2: each key in column3 maps to a list of values in column4
data2 = [("A",["1","2"]),("B",["1","3","4"]),("C",["5"])]
df2 = pd.DataFrame(data2,columns=["column3","column4"])
print(df2)
# df3: the desired result, where column5 is the union of the values for each row's keys
data3 = [("a1",["A","B"],["1","2","3","4"]),
         ("a2",["A","B","C"],["1","2","3","4","5"]),
         ("a3",["B","C"],["1","3","4","5"])]
df3 = pd.DataFrame(data3,columns = ["column1","column2","column5"])
print(df3)
My goal is to avoid for loops, because I am working with large datasets.
Rebuild df1's list column as a DataFrame, stack it, and then map the values from df2 onto it.
Also, since you asked for a solution without a for loop, I used sum here. Note that sum is much slower in this case than a for loop or itertools.
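s=pd.DataFrame(df1.column2.tolist()).stack()
# Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df1['New']=s.map(df2.set_index('column3').column4).groupby(level=0).sum().apply(set)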
df1
Out[36]:
column1 column2 New
0 a1 [A, B] {2, 4, 3, 1}
1 a2 [A, B, C] {3, 5, 4, 2, 1}
2 a3 [B, C] {4, 3, 1, 5}
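On recent pandas versions the same stack-and-map idea can also be written with Series.explode (available since pandas 0.25). The following is only a minimal sketch, assuming every key in column2 exists in df2.column3; the names keys, lookup, values and result are illustrative and do not come from the answer above.

```python
import pandas as pd

df1 = pd.DataFrame({"column1": ["a1", "a2", "a3"],
                    "column2": [["A", "B"], ["A", "B", "C"], ["B", "C"]]})
df2 = pd.DataFrame({"column3": ["A", "B", "C"],
                    "column4": [["1", "2"], ["1", "3", "4"], ["5"]]})

# One row per key; the index still carries df1's original row label.
keys = df1["column2"].explode()

# Look up each key's list in df2, then flatten those lists as well.
lookup = df2.set_index("column3")["column4"]
values = keys.map(lookup).explode()

# Collect the unique values per original row, sorted for a stable order.
column5 = values.groupby(level=0).agg(lambda s: sorted(set(s)))

# assign() returns a new DataFrame and leaves df1 untouched.
result = df1.assign(column5=column5)
print(result)
```

This still runs a Python function once per group, so it is not guaranteed to be faster than the loop-based versions; it mainly avoids building the padded intermediate DataFrame.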
As I mentioned, and as most of us would suggest, you can also take a look at "For loops with pandas - When should I care?"
import itertools
d=dict(zip(df2.column3,df2.column4))
l=[set(itertools.chain(*[d[y] for y in x ])) for x in df1.column2.tolist()]
df1['New']=l
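The remark that sum is slower than itertools for joining lists can be checked with a rough timing sketch. The test data and the helper names flatten_with_sum and flatten_with_chain are hypothetical, and the timings will vary by machine, so no figures are quoted here.

```python
import itertools
import timeit

# Hypothetical test data: 2,000 small lists to flatten.
chunks = [[str(i), str(i + 1)] for i in range(2000)]

def flatten_with_sum(lists):
    # Every addition builds a brand-new list, so the work grows quadratically.
    return sum(lists, [])

def flatten_with_chain(lists):
    # chain.from_iterable visits each element exactly once.
    return list(itertools.chain.from_iterable(lists))

# Both helpers must produce the same flattened list.
assert flatten_with_sum(chunks) == flatten_with_chain(chunks)

t_sum = timeit.timeit(lambda: flatten_with_sum(chunks), number=20)
t_chain = timeit.timeit(lambda: flatten_with_chain(chunks), number=20)
print(f"sum: {t_sum:.4f}s  chain: {t_chain:.4f}s")
```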
You can do it like this:
df2_dict = {i:j for i,j in zip(df2['column3'].values, df2['column4'].values)}
# print(df2_dict)
def func(val):
    # join the value lists of every key in this row, drop duplicates, and sort
    return sorted(set(np.concatenate([df2_dict.get(i) for i in val])))
df1['column5'] = df1['column2'].apply(func)
print(df1)
Output:
column1 column2 column5
0 a1 [A, B] [1, 2, 3, 4]
1 a2 [A, B, C] [1, 2, 3, 4, 5]
2 a3 [B, C] [1, 3, 4, 5]
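One caveat with the dictionary approach above: df2_dict.get(i) returns None for a key that is missing from df2, and np.concatenate would then fail. A possible variant, sketched here without NumPy, treats unknown keys as contributing no values. The extra row a4 with the unknown key "Z" and the names lookup and collect_values are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"column1": ["a1", "a2", "a3", "a4"],
                    "column2": [["A", "B"], ["A", "B", "C"], ["B", "C"], ["C", "Z"]]})
df2 = pd.DataFrame({"column3": ["A", "B", "C"],
                    "column4": [["1", "2"], ["1", "3", "4"], ["5"]]})

# Plain dict from key to its list of values.
lookup = dict(zip(df2["column3"], df2["column4"]))

def collect_values(keys):
    # Union of the value lists for all keys; unknown keys add nothing.
    merged = set()
    for key in keys:
        merged.update(lookup.get(key, []))
    return sorted(merged)

df1["column5"] = df1["column2"].apply(collect_values)
print(df1)
```

Whether a missing key should be skipped silently or raise an error depends on the data, so this default is a design choice rather than a requirement.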
This works:
df1['column2'].apply(lambda x: list(set((np.concatenate([df2.set_index('column3')['column4'][i] for i in list(x)])) )))