How do I fix the For Loop to return a certain character from a DataFrame?
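I have imported an Excel file into a DataFrame and then looped over the "Title" column to pull out the titles that contain certain keywords. My list of matching titles is match_titles. What I want to do now is write a for loop that returns, for each title in match_titles, the value in the column before "Title" (Asin). I'm not sure why the code isn't working; any help would be appreciated.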
import pandas as pd
data = pd.read_excel(r'C:\Users\bryanmccormack\Downloads\asin_list.xlsx')
df = pd.DataFrame(data, columns=['Track','Asin','Title'])
excludes = ["Chainsaw", "Diaper pail", "Leaf Blower"]
my_excludes = [set(key_word.lower().split()) for key_word in excludes]
match_titles = [e for e in df.Title if
                any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]
a = []
for i in match_titles:
    a.append(df['Asin'])
print(a)
In your for loop you append the unfiltered column df['Asin'] to the list a once for every value in match_titles. The loop variable i is never used, so df is never filtered.
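To see the problem concretely, here is a minimal sketch that replaces the Excel file with a made-up three-row DataFrame (the sample rows are assumptions, not data from the original spreadsheet). Each pass through the loop appends the whole Asin column, so the list ends up holding one full copy of the column per matching title:

```python
import pandas as pd

# Hypothetical sample data standing in for asin_list.xlsx.
df = pd.DataFrame({
    "Track": [1, 2, 3],
    "Asin": ["A1", "A2", "A3"],
    "Title": ["Electric Chainsaw 16 inch", "Garden Hose", "Cordless Leaf Blower"],
})

excludes = ["Chainsaw", "Diaper pail", "Leaf Blower"]
my_excludes = [set(key_word.lower().split()) for key_word in excludes]
match_titles = [e for e in df.Title
                if any(keywords.issubset(e.lower().split()) for keywords in my_excludes)]

a = []
for i in match_titles:
    # `i` is never used, so nothing ties this append to the current title.
    a.append(df["Asin"])

print(len(match_titles))  # how many titles matched
print(len(a))             # one list entry per matching title
print(len(a[0]))          # but each entry is the full, unfiltered column
```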
One solution is to create a match_titles column in your df; after filtering on that match_titles column you can return the Asin column:
# make a function to perform your match analysis.
def is_match(title, excludes=["Chainsaw", "Diaper pail", "Leaf Blower"]):
    my_excludes = [set(key_word.lower().split()) for key_word in excludes]
    if any(keywords.issubset(title.lower().split()) for keywords in my_excludes):
        return True
    return False
# Make a new boolean column for the matches. This applies your
# function to each value in df['Title'] and puts the output in
# the new column.
df['match_titles'] = df['Title'].apply(is_match)
# Filter the df to only matches and return the column you want.
# Because the match_titles column is boolean it can be used as
# an index.
result = df[df['match_titles']]['Asin']
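For anyone who wants to try this without the spreadsheet, the following is a self-contained sketch of the same idea. The sample rows and the name mask are invented for illustration. It keeps the boolean mask in a variable instead of a new column and selects with df.loc, which is one possible alternative to the chained indexing above, not a required change:

```python
import pandas as pd

# Hypothetical sample data standing in for asin_list.xlsx.
df = pd.DataFrame({
    "Track": [1, 2, 3, 4],
    "Asin": ["A1", "A2", "A3", "A4"],
    "Title": [
        "Electric Chainsaw 16 inch",
        "Garden Hose",
        "Cordless Leaf Blower",
        "Diaper Pail Refill Bags",
    ],
})


def is_match(title, excludes=("Chainsaw", "Diaper pail", "Leaf Blower")):
    """Return True if every word of at least one keyword appears in the title."""
    words = set(title.lower().split())
    return any(set(key_word.lower().split()).issubset(words) for key_word in excludes)


# Boolean Series: True where the title matches one of the keywords.
mask = df["Title"].apply(is_match)

# Select the matching rows and the Asin column in a single step.
result = df.loc[mask, "Asin"]
print(result.tolist())  # ['A1', 'A3', 'A4']
```

Note that the match is on whole words, so a title such as "Chainsaws" would not match the keyword "Chainsaw".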