How do I find whether a string is in a list in a specific column of a dataframe?

Question

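I have 2 large dataframes that I want to compare against each other. I split one of the columns with .split(" ") and put the result in a new column of the dataframe. I now want to check whether a value exists in that new column, instead of using .contains() on the original column, so that I don't pick up the value when it appears inside a word.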

Here is what I have tried and why I'm frustrated.

row['company'][i]  # -> 'nom'

L_df['Name split'][7126853]  # -> "['nom', '[this', 'is', 'nom]']"

row['company'][i] in L_df['Name split'][7126853]  # -> True (this is the index where I know the specific value occurs)

row['company'][i] in L_df['Name split']  # -> False (my attempt to check the entire column). WHAAT? Why is this False when I've shown the value exists?

L_df[L_df['Name split'].isin([row['company'][i]])]  # -> empty
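A likely explanation for the False result: for a pandas Series, `x in series` tests the index labels, not the cell values (the Series behaves like a dict here). The sketch below uses a small hypothetical stand-in for L_df['Name split'] whose cells are lists of words, and shows a per-cell membership test with apply instead. It assumes the cells really are lists; if they are strings like the one printed for index 7126853 (for example after a CSV round trip), `in` inside the lambda would do a substring test again and the lists would have to be restored first.

```python
import pandas as pd

# Hypothetical stand-in for L_df['Name split']: each cell holds a list of words.
name_split = pd.Series(
    [['nom', 'this', 'is', 'nom'], ['nominal', 'value']],
    index=[7126853, 7126854],
)

# `in` on a Series checks the index labels, not the cell values.
print('nom' in name_split)      # False
print(7126853 in name_split)    # True

# Per-cell membership: ask each list whether it contains the exact word.
mask = name_split.apply(lambda words: 'nom' in words)
print(mask.tolist())                     # [True, False]
print(name_split[mask].index.tolist())   # [7126853]
```

Because each cell is tested with list membership, 'nominal' does not count as a match for 'nom'.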

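Edit: I should also add that I'm trying to build a process in which I can iteratively check entries in the smaller dataset against entries in the larger dataset.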

result = L_df[  # i is the row index in the smaller dataset; the process steps through it row by row
    L_df['Company name'].str.contains(row['company'][i], na=False)  # Can be difficult with names like 'Nom'
    # (row['company'][i] in L_df['Name split'])
    & L_df['Industry'].str.contains('marketing', na=False)  # Unreliable currently, need looser matches; minimal reduction
    & L_df['Locality'].str.contains(row['city'][i], na=False)  # Reliable, but not always great at reducing results
    & ((row['workers'][i] >= L_df['Emp Lower bound']) & (row['workers'][i] <= L_df['Emp Upper bound']))  # Unreliable
]

The first line is what I'm trying to replace with this new process, so that I don't get a match when "nom" appears in the middle of a word.
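One way to keep the str.contains filter but avoid matches inside words is a regular expression with word boundaries (`\b`). This is a swapped-in technique, not something from the thread: the helper name whole_word_mask and the sample company names are hypothetical, and `\b` treats letters, digits and underscores as word characters, so "Nom-Co" would still match "nom".

```python
import re
import pandas as pd

# Hypothetical sample of the larger frame; only 'Company name' is used here.
L_df = pd.DataFrame({'Company name': ['Nom Holdings', 'Phenomenal Inc', 'Nominal Ltd', None]})

def whole_word_mask(series, word):
    """Return a boolean mask that is True where `word` appears as a whole word."""
    # re.escape keeps characters such as '.' or '(' in company names literal.
    pattern = r'\b' + re.escape(word) + r'\b'
    # case=False makes the match case-insensitive; na=False treats missing names as no match.
    return series.str.contains(pattern, case=False, na=False, regex=True)

print(whole_word_mask(L_df['Company name'], 'nom').tolist())
# [True, False, False, False]
```

Under these assumptions, the first condition in the filter could become whole_word_mask(L_df['Company name'], row['company'][i]), and the remaining `&` conditions stay unchanged.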

Answer 1

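Here is a solution that first combines the two dataframes into one and then uses a lambda to process the columns of interest. The result is placed in a new column named found: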

import pandas

df1 = pandas.DataFrame(data={'company': ['findme', 'asdf']})
df2 = pandas.DataFrame(data={'Name split': ["here is a string including findme and then some".split(" "), "something here".split(" ")]})
combined_df = pandas.concat([df1,df2], axis=1)
combined_df['found'] = combined_df.apply(lambda row: row['company'] in row['Name split'], axis=1)

Result:

  company                                         Name split  found
0  findme  [here, is, a, string, including, findme, and, ...   True
1    asdf                                  [something, here]  False

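Edit: To compare each value in the company column against every cell in the Name split column of the other dataframe, and to access the whole row of that latter dataframe, I would simply loop over the rows of both, see here: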

import pandas as pd

df1 = pd.DataFrame(data={'company': ['findme', 'asdf']})
df2 = pd.DataFrame(data={'Name split': ["random text".split(" "), "here is a string including findme and then some".split(" "), "somethingasdfq here".split(" ")], 'another column': [3, 1, 2]})
for index1, row1 in df1.iterrows():
    for index2, row2 in df2.iterrows():
        if row1['company'] in row2['Name split']:
            # do something here with row2
            print(row2)

This is probably not very efficient, but if we only need one match, it can be improved by breaking out of the inner loop as soon as a match is found.
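The early-exit variant can be sketched as follows, reusing the sample frames from the edit. Recording the first matching index label per company in a dict (first_match) is an illustrative choice, not part of the original answer; replace it with whatever should happen to the matching row.

```python
import pandas as pd

df1 = pd.DataFrame(data={'company': ['findme', 'asdf']})
df2 = pd.DataFrame(data={
    'Name split': ["random text".split(" "),
                   "here is a string including findme and then some".split(" "),
                   "somethingasdfq here".split(" ")],
    'another column': [3, 1, 2],
})

# Hypothetical result container: company -> index label of its first matching
# df2 row, or None when no row contains the company as a whole word.
first_match = {}
for index1, row1 in df1.iterrows():
    first_match[row1['company']] = None
    for index2, row2 in df2.iterrows():
        # List membership compares whole words, so 'asdf' does not match 'somethingasdfq'.
        if row1['company'] in row2['Name split']:
            first_match[row1['company']] = index2
            break  # stop scanning df2 once this company has a match

print(first_match)  # {'findme': 1, 'asdf': None}
```

The worst case is still one pass over df2 per company, so the break only helps when matches tend to appear early.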


Solution 1
0 2020-03-09 21:11:51
