If a row value of one DataFrame is in a column of another DataFrame, create a new column and get that index

Question

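I may be overcomplicating this problem, but I can't seem to find a simple solution.

I have two DataFrames. Let's call them df1 and df2. To keep things simple, assume df1 has one column called "Some Data", and df2 has two columns called "some data" and "other data".

Example: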

DF1

Some Data
"Lebron James 123"
"Lebron James 234"

DF2

some data                        other data
"Lebron James 123 + other text"  "I want this in df1["New?"]"
"Michael Jordan"                 "Doesn't Matter"

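So basically I want to create a new column in df1 called "New?". If df1["Some Data"] is not found anywhere in df2["some data"], this new column (in df1) should say "New". If it is found, df1["New?"] should be set to the value of df2["other data"] in that specific row.

Desired result after running: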

DF1

Some Data           New?
"Lebron James 123"  "I want this in df1["New?"]"
"Lebron James 234"  "New"

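As you can see, the New? column will contain the value from the specific row of the other data column. Lebron James 234 doesn't exist anywhere in df2's some data, so it is brand new.

I can get it to say True or False using the .isin() method, but I don't know how to grab the index of the other df and get the value from its other data column.

Thanks

Edit:

From what I know, this will work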

df1["New?"] = df1["Some Data"].isin(df2["some data"])

which will render

DF1["New?"]

True
False

So I want True to become "I want this in df1["New?"]" and False to become "New".

Answer 1

First, create a regular expression by joining the df1 series:

rgx = '|'.join(df1['some data'])

Now use np.where:

df1.assign(data=np.where(df2['some data'].str.match(rgx), df2['other data'], 'New'))

          some data                        data
0  Lebron James 123  I want this in df1["New?"]
1  Lebron James 234                         New
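
For reference, a self-contained version of the snippet above might look like the sketch below. The sample frames are rebuilt from the question (with df1's column named `some data`, as in this answer), and `re.escape` is an addition that is not in the original answer, included on the assumption that the values could contain regex metacharacters. Note that `np.where` pairs rows by position, so this only lines up when row i of df2 corresponds to row i of df1.

```python
import re

import numpy as np
import pandas as pd

df1 = pd.DataFrame({"some data": ["Lebron James 123", "Lebron James 234"]})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Alternation of all df1 values; re.escape is an added safeguard (assumption).
rgx = "|".join(re.escape(s) for s in df1["some data"])

# str.match anchors at the start of each df2 string; rows are paired by position.
matched = df2["some data"].str.match(rgx)
result = df1.assign(data=np.where(matched, df2["other data"], "New"))
print(result)
```

Because `str.match` only matches at the beginning of a string, `str.contains` may be the better fit if the df1 value can appear in the middle of the df2 text.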

Example with mismatched shapes:

df1 = pd.DataFrame({'a': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'a': ['aaaaa', 'bbbb', 'ffff', 'gggg', 'hhhh']})

rgx = '({})'.format('|'.join(df1.a))
m = df2.assign(flag=df2.a.str.extract(rgx))

df1.set_index('a').join(m.set_index('flag')).fillna('New').reset_index()

  index      a
0     a  aaaaa
1     b   bbbb
2     c    New
3     d    New
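
A variation on the same idea can be sketched with `Series.map` instead of a join, which keeps df1's original column and puts the looked-up value in a separate one. The names `flag`, `lookup`, and `match` are hypothetical, and `expand=False` and `re.escape` are additions not present in the answer. If several df2 rows match the same key, this sketch keeps only the first.

```python
import re

import pandas as pd

df1 = pd.DataFrame({"a": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"a": ["aaaaa", "bbbb", "ffff", "gggg", "hhhh"]})

# One capture group holding every df1 key.
rgx = "({})".format("|".join(re.escape(s) for s in df1["a"]))

# expand=False returns a Series: the first key found in each df2 row, NaN if none.
flag = df2["a"].str.extract(rgx, expand=False)

# Hypothetical lookup table: df1 key -> first matching df2 value.
lookup = (
    df2.assign(flag=flag)
       .dropna(subset=["flag"])
       .drop_duplicates("flag")
       .set_index("flag")["a"]
)

# Keys without a match map to NaN, which is then filled with 'New'.
out = df1.assign(match=df1["a"].map(lookup).fillna("New"))
print(out)
```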

Answer 2

Based on your information, it seems you just need a simple np.where (if the dfs have the same length)

df1['New?'] = np.where(df1["Some Data"].isin(df2["some data"]), df2['other data'], 'New')

    Some Data                       New?
0   Lebron James 123 + other text   I want this in df1["New?"]
1   Lebron James 234                New
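
One thing worth keeping in mind is that `isin` tests exact equality, which is presumably why the output above has the full string in df1. A minimal sketch, assuming df1 is built with that full string (the variable names `is_known` and `short` are hypothetical):

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})
# df1 holds the full string here, as in the output above, so isin can match it.
df1 = pd.DataFrame({"Some Data": ["Lebron James 123 + other text", "Lebron James 234"]})

# Both frames have two rows, so np.where can pair them by position.
is_known = df1["Some Data"].isin(df2["some data"])
df1["New?"] = np.where(is_known, df2["other data"], "New")
print(df1)

# With the question's shorter value, isin finds nothing: it is not a substring test.
short = pd.Series(["Lebron James 123"])
print(short.isin(df2["some data"]).tolist())
```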

For different lengths,

mask = df2["some data"].isin(df1["Some Data"]).values
df1.loc[mask, 'New?'] = df2.loc[mask, 'other data']

df1.fillna('New')

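Explanation

Basically, you build one mask and use that same mask to filter both DataFrames. Given your description, this yields the same number of rows from both dfs, and the 'other data' values from the filtered rows of df2 are assigned to the matching rows of df1.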
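
A boolean mask built from df2 has df2's length, so it can only index df1 directly when the two frames line up row for row. For frames of genuinely different lengths, one alternative is to treat df2 as a lookup table and use `Series.map`. This is a swapped-in technique rather than the method described above, and the three-row df1 below is made-up sample data.

```python
import pandas as pd

# Made-up df1 with three rows; df2 is taken from the question and has two.
df1 = pd.DataFrame({
    "Some Data": ["Michael Jordan", "Lebron James 234", "Lebron James 123 + other text"],
})
df2 = pd.DataFrame({
    "some data": ["Lebron James 123 + other text", "Michael Jordan"],
    "other data": ['I want this in df1["New?"]', "Doesn't Matter"],
})

# Lookup table keyed on 'some data'; duplicates dropped so the index is unique.
lookup = df2.drop_duplicates("some data").set_index("some data")["other data"]

# Exact-match lookup by value, not by position; misses become 'New'.
df1["New?"] = df1["Some Data"].map(lookup).fillna("New")
print(df1)
```

Since the lookup goes by value, row order and frame length no longer matter, but the match is still exact, like `isin`.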

Solution 1
1 2018-09-07 17:52:30

Solution 2
1 2018-09-07 17:54:40