How do I find if a string is in a list in a specific column of a dataframe?
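I have two large dataframes that I want to compare against each other. I applied .split(" ") to one of the columns and put the result in a new column of the dataframe. I now want to check whether a value is present in that new column, instead of using .contains() on the original column, so that I don't pick up the value when it appears inside a word.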
Here is what I've tried, and why I'm frustrated:
row['company'][i] == 'nom'
L_df['Name split'][7126853] == "['nom', '[this', 'is', 'nom]']"
row['company'][i] in L_df['Name split'][7126853]  # True (this is the index where I know the specific value occurs)
row['company'][i] in L_df['Name split']  # False?! (my attempt to check the entire column) Why, when I've shown it exists?
L_df[L_df['Name split'].isin([row['company'][i]])]  # empty
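The surprising False has a specific cause: for a pandas Series, the `in` operator tests membership in the index (the row labels), not in the values. Likewise, `isin` looks for cells that are equal to one of the given values, which is not the same as looking inside a list. A minimal sketch with made-up rows (only the column name `Name split` comes from the question) that tests each list cell instead, assuming the cells really hold Python lists; if they are strings such as "['nom', ...]" (for example after a CSV round trip), `in` does a substring test instead:

```python
import pandas as pd

# Hypothetical stand-in for L_df: each cell holds the list produced by str.split(" ")
L_df = pd.DataFrame({'Name split': [['nom', 'and', 'co'], ['nominal', 'inc'], ['acme']]})
target = 'nom'

# `in` on a Series looks at the index labels (0, 1, 2), not at the cell values
print(target in L_df['Name split'])  # False

# Test membership inside each list instead: whole-word match only
mask = L_df['Name split'].apply(lambda words: target in words)
print(mask.tolist())  # [True, False, False]

# Rows whose word list contains the target
print(L_df[mask])
```

Because the check is `target in words` on a list, 'nominal' does not match 'nom'.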
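Edit: I should also add that I'm trying to build a process that iteratively checks each entry in the smaller dataset against the entries in the larger dataset.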
result = L_df[  # i is the row index we iterate over, one row of the smaller dataset at a time
L_df['Company name'].str.contains(row['company'][i], na=False) #Can be difficult with names like 'Nom'
#(row['company'][i] in L_df['Name split'])
& L_df['Industry'].str.contains('marketing', na=False) #Unreliable currently, need to get looser matches; min. reduction
& L_df['Locality'].str.contains(row['city'][i], na=False) #Reliable, but not always great at reducing results
& ((row['workers'][i] >= L_df['Emp Lower bound']) & (row['workers'][i] <= L_df['Emp Upper bound'])) #Unreliable
]
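The first condition is what I'm trying to replace with this new process, so that I don't get a match when "nom" appears in the middle of a word.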
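If you would rather keep `str.contains` on the original column, one alternative (a different technique from splitting, not the one used in the question) is a regular expression with word boundaries `\b`, so that "nom" only matches as a whole word. A minimal sketch, assuming columns named `Company name` and `Locality` as in the question, with made-up rows; the helper `whole_word_pattern` is hypothetical, and `re.escape` keeps company names containing regex metacharacters literal:

```python
import re
import pandas as pd

# Hypothetical rows standing in for L_df
L_df = pd.DataFrame({
    'Company name': ['Nom Marketing', 'Nominal Ltd', 'Big Nom Co', None],
    'Locality': ['Paris', 'Paris', 'Lyon', 'Paris'],
})

def whole_word_pattern(name):
    # \b marks a word boundary; re.escape keeps characters like '.' or '(' literal
    return r'\b' + re.escape(name) + r'\b'

company, city = 'nom', 'Paris'
mask = (
    L_df['Company name'].str.contains(whole_word_pattern(company), case=False, na=False, regex=True)
    & L_df['Locality'].str.contains(city, na=False, regex=False)
)
print(mask.tolist())  # [True, False, False, False]
print(L_df[mask])
```

Note that `\b` only behaves as expected when the name starts and ends with a word character (letter, digit or underscore).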
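Here is a solution that first combines the two dataframes into one and then uses a lambda to process the columns of interest. The result is placed in a new column, found: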
import pandas as pd

df1 = pd.DataFrame(data={'company': ['findme', 'asdf']})
df2 = pd.DataFrame(data={'Name split': ["here is a string including findme and then some".split(" "), "something here".split(" ")]})
combined_df = pd.concat([df1, df2], axis=1)
combined_df['found'] = combined_df.apply(lambda row: row['company'] in row['Name split'], axis=1)
Result:
   company                                         Name split  found
0   findme  [here, is, a, string, including, findme, and, ...   True
1     asdf                                  [something, here]  False
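Note that `concat(..., axis=1)` lines the frames up by index, so each company is only checked against the `Name split` cell in the same row, not against every row of the other frame. If the frames have different lengths, the missing cells become NaN and `row['company'] in row['Name split']` raises a TypeError, because a float is not iterable. A small sketch of a guarded version, with hypothetical data and a hypothetical helper `company_in_words`:

```python
import pandas as pd

df1 = pd.DataFrame(data={'company': ['findme', 'asdf', 'extra']})
df2 = pd.DataFrame(data={'Name split': ["here is findme".split(" "), "something here".split(" ")]})

# axis=1 aligns on the index: row 2 has a company but no 'Name split' (NaN)
combined_df = pd.concat([df1, df2], axis=1)

def company_in_words(row):
    words = row['Name split']
    # Only lists can be searched; NaN from a missing row counts as "not found"
    return isinstance(words, list) and row['company'] in words

combined_df['found'] = combined_df.apply(company_in_words, axis=1)
print(combined_df['found'].tolist())  # [True, False, False]
```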
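Edit: To compare each value in the company column against every cell in the Name split column of the other dataframe, and to access the whole matching row of the latter dataframe, I would simply iterate over the rows of both dataframes: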
import pandas as pd

df1 = pd.DataFrame(data={'company': ['findme', 'asdf']})
df2 = pd.DataFrame(data={'Name split': ["random text".split(" "), "here is a string including findme and then some".split(" "), "somethingasdfq here".split(" ")], 'another column': [3, 1, 2]})
for index1, row1 in df1.iterrows():
    for index2, row2 in df2.iterrows():
        # Whole-word check: is the company one of the words in this row's list?
        if row1['company'] in row2['Name split']:
            # do something here with row2
            print(row2)
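This is probably not very efficient, but if only one match is needed, it can be improved by breaking out of the inner loop as soon as a match is found.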
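For larger frames, a vectorized alternative to the nested iterrows() loops is to explode the word lists into one row per word and then merge on the word. This is a different technique from the loop, sketched here with the same toy data; `Series.explode` requires pandas 0.25 or later, and the helper column name `word` is made up for the example:

```python
import pandas as pd

df1 = pd.DataFrame(data={'company': ['findme', 'asdf']})
df2 = pd.DataFrame(data={
    'Name split': ["random text".split(" "),
                   "here is a string including findme and then some".split(" "),
                   "somethingasdfq here".split(" ")],
    'another column': [3, 1, 2],
})

# One row per word; explode keeps the original df2 index, so each word points back to its row
words = df2['Name split'].explode().rename('word')

# Inner join: keep only (company, df2 row) pairs where the company equals one of the row's words
matches = (
    df1.merge(words.reset_index(), left_on='company', right_on='word')
       .drop_duplicates(subset=['company', 'index'])
)
print(matches)

# Pull the full matching rows of df2, like row2 in the loop
matched_rows = df2.loc[matches['index']]
print(matched_rows)
```

Because the merge compares whole words, 'asdf' does not match 'somethingasdfq', just as in the loop.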