Python: join two DataFrames on columns that satisfy a condition

Question

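Suppose I have two DataFrames, A and B, each containing two columns called x and y. I want to join these two DataFrames, not on rows where the x and y columns are equal in both, but on rows where A's x column is a substring of B's x column and y is the same. For example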

if A['x'][1] == 'mpla' and B['x'][1] == 'mplampla'

I would like that pair to be captured.

In SQL it would be something like:

select *
from A
join B
on A.x<=B.x and A.y<=B.y.

Is it possible to do something similar in Python?

Answer 1

You can match one string against all the strings in a column at once, like this:

import numpy as np

# np.char is the public name for these string routines; importing from numpy.core is deprecated
np.char.find(B.x.values.astype(str), 'mpla') >= 0

The catch is that you have to loop over every element of A. If you can afford that, it should work.

See also: pandas + dataframe - select by partial string
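The loop described above could look roughly like the sketch below. The sample frames, the `_A`/`_B` column suffixes, and the variable names are assumptions made for illustration, not part of the original answer; the `y` equality check is taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not from the original question
A = pd.DataFrame({"x": ["mpla", "foo"], "y": [1, 2]})
B = pd.DataFrame({"x": ["mplampla", "xfoox", "bar"], "y": [1, 2, 1]})

# Convert B.x once to a NumPy string array so np.char.find can scan it
b_x = B["x"].to_numpy().astype(str)
b_y = B["y"].to_numpy()

pieces = []
for a_row in A.itertuples(index=False):
    # True where B.x contains a_row.x as a substring and B.y equals a_row.y
    mask = (np.char.find(b_x, a_row.x) >= 0) & (b_y == a_row.y)
    matched = B[mask].add_suffix("_B")
    # Attach the values of the A row that produced these matches
    matched.insert(0, "x_A", a_row.x)
    matched.insert(1, "y_A", a_row.y)
    pieces.append(matched)

joined = pd.concat(pieces, ignore_index=True)
print(joined)
```

Each pass scans all of B, so the cost grows with len(A) * len(B), which is the trade-off mentioned above.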

Answer 2

You could try something like

# str.contains expects a single pattern, not a Series, so pass one value of A.x at a time (here the first)
B.x.where(B.x.str.contains(A.x.iloc[0], regex=False), B.index)  # keeps B.x where it matches; non-matching rows are replaced by their index label

B.x.where(B.x.str.match(A.x.iloc[0]), B.index)  # same, but str.match treats the value as a regex anchored at the start of the string. Negate the mask with "~" to keep the rows that don't match instead.

You could also try

np.where(B.x.str.contains(A.x.iloc[0], regex=False), B.index, np.nan)  # index label where B.x contains the first value of A.x, NaN elsewhere

You could also try:

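matchingmask = B[B.x.str.contains(A.x.iloc[0], regex=False)]  # rows of B whose x contains the first value of A.x

matchingframe = B.loc[matchingmask.index] #or (.ix has been removed from pandas; .loc selects by label)

matchingcolumn = B.loc[matchingmask.index].x #or

matchingindex = B.loc[matchingmask.index].index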

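All of these assume that both frames share the same index (I think).

You may want to look at the string methods: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods

You may also want to read up on regular expressions and the pandas where method: http://pandas.pydata.org/pandas-docs/dev/indexing.html#the-where-method-and-masking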
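To see what these masking calls return, here is a small check on made-up data. The frame contents and the names `pattern`, `contains`, `starts`, `kept`, `non_matching`, and `positions` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only
B = pd.DataFrame({"x": ["mplampla", "bar", "xmpla"]})
pattern = "mpla"

# Substring anywhere in the string (literal match, no regex)
contains = B["x"].str.contains(pattern, regex=False)

# str.match only succeeds when the pattern matches at the start of the string
starts = B["x"].str.match(pattern)

# where() keeps values where the mask is True and fills NaN elsewhere
kept = B["x"].where(contains)

# "~" inverts a boolean mask, selecting the rows that do not match
non_matching = B.loc[~contains, "x"]

# np.where returns the index label for matches and NaN for the rest
positions = np.where(contains, B.index, np.nan)

print(contains.tolist(), starts.tolist())
print(kept.tolist(), non_matching.tolist(), positions)
```

Note that `where` keeps the matching rows, so the non-matching ones come from negating the mask rather than from `where` itself.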
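A different route, not taken from either answer, is to join on y first and then filter on the substring condition, which avoids looping over A by hand. This is a sketch under the assumption that y must be equal in both frames, as stated in the question; the sample data and the `_A`/`_B` suffixes are made up.

```python
import pandas as pd

# Hypothetical sample data, not from the original question
A = pd.DataFrame({"x": ["mpla", "foo", "zzz"], "y": [1, 2, 1]})
B = pd.DataFrame({"x": ["mplampla", "xfoox", "foo"], "y": [1, 2, 1]})

# Equi-join on y: every pair of rows from A and B with the same y
pairs = A.merge(B, on="y", suffixes=("_A", "_B"))

# Keep only the pairs where A's x is a substring of B's x
is_substring = [a in b for a, b in zip(pairs["x_A"], pairs["x_B"])]
joined = pairs[is_substring].reset_index(drop=True)

print(joined)
```

The intermediate `pairs` frame can get large when many rows share the same y, so this trades memory for simplicity.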

2 answers

Answer 1
0 2015-01-21 06:09:36

Answer 2
0 2015-01-21 06:34:50
