Classifying text from an array of words

Question

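I am trying to classify the text in a dataframe using a list of words held in an array. If one of the words is found, the next column should be filled with that word; otherwise it should give none.

Code so far: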

import pandas as pd
from collections import Counter

Product=['Fish','food','Product','Expensive','cheap','expensive','seafood','ice cream','delicious','taste','smell','selection','price','grilled']
df=pd.read_csv("text.csv")
df['classify']=""
for i in range(len(df)):
  paragraph=df['Text'][i]
  count = Counter(paragraph.split())

  for key, val in count.items():
    key = key.rstrip('.,?!\n') # removing possible punctuation signs
    if key in Product:
       df.loc[i, 'classify']=key

Expected result:

Text                               Classify
"The food is bad"                  food
"He parked the car"                none

Any help would be greatly appreciated!

Answer 1

This should work:

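import pandas as pd
Product=['Fish','food','Product','Expensive','cheap','expensive','seafood','ice cream','delicious','taste','smell','selection','price','grilled']
df=pd.DataFrame({'Text':["The food is bad", "He parked the car"]})

def classify(text):
    # Split the text into whitespace-separated words once
    words = text.split()
    # Return the first entry of Product that appears as a whole word
    for i in Product:
        if i in words:
            return i
    return None

# Apply to the Text column only, so each call receives a plain string
df['classify']=df['Text'].apply(classify)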

Output:

                Text classify
0    The food is bad     food
1  He parked the car     None
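
One limitation of matching on split words is that a multi-word entry such as 'ice cream' can never match, and trailing punctuation or different casing also blocks a match. As a rough sketch (not part of the original answer), the same lookup can be done with a single regular expression through pandas' `Series.str.extract`. The third sample row and the variable name `pattern` are made up for illustration, and unmatched rows come back as missing values (NaN) rather than None:

```python
import re
import pandas as pd

Product = ['Fish', 'food', 'Product', 'Expensive', 'cheap', 'expensive',
           'seafood', 'ice cream', 'delicious', 'taste', 'smell',
           'selection', 'price', 'grilled']

# The third row is an extra, made-up example to exercise a multi-word entry
df = pd.DataFrame({'Text': ["The food is bad",
                            "He parked the car",
                            "I love ice cream."]})

# Build one alternation; longer entries go first so phrases are tried
# before shorter words, and re.escape protects any special characters
alternatives = '|'.join(re.escape(w) for w in sorted(Product, key=len, reverse=True))
pattern = r'\b(' + alternatives + r')\b'

# expand=False returns a Series; rows without a match become NaN
df['classify'] = df['Text'].str.extract(pattern, flags=re.IGNORECASE, expand=False)
print(df)
```

Note that the extracted value is the text as it appears in the row, so with `re.IGNORECASE` a row containing "Food" would be labelled "Food", not "food".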

Answer 2

You should create a function like this:

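def classify(classification_list, text, data_id):
    # Relies on a module-level DataFrame named df
    for check_word in classification_list:
        # Case-insensitive substring match
        if check_word.lower() in text.lower():
            df.loc[data_id, 'classify'] = check_word
            break
    else:
        # The loop finished without break, so no word matched
        df.loc[data_id, 'classify'] = None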

And usage:

products=['Fish','food','Product','Expensive','cheap','expensive','seafood','ice cream','delicious','taste','smell','selection','price','grilled']

for data_id in range(0, len(df)):
    classify(products, df['text'][data_id], data_id)

In the end you will get a DataFrame like this:

>>> df
                text classify
0    The food is bad     food
1  He parked the car     None
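
A caveat with the `in` check on strings is that it matches substrings, so 'Fish' would also be found inside "selfish". A possible variant, sketched here under the assumption that whole-word matches are wanted, returns the label instead of writing into a global `df`. The name `classify_text` and the third sample row are hypothetical:

```python
import re
import pandas as pd

def classify_text(classification_list, text):
    """Return the first list entry found in text as a whole word, else None."""
    for check_word in classification_list:
        # \b anchors the match at word boundaries; re.escape keeps the
        # entry literal even if it contains regex metacharacters
        if re.search(r'\b' + re.escape(check_word) + r'\b', text, flags=re.IGNORECASE):
            return check_word
    return None

products = ['Fish', 'food', 'price']
df = pd.DataFrame({'text': ["The food is bad",
                            "He parked the car",
                            "A selfish act"]})

# The function has no side effects, so it can be passed straight to apply
df['classify'] = df['text'].apply(lambda t: classify_text(products, t))
print(df)
```

Because the function only reads its arguments, it can be tested on plain strings without building a DataFrame first.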
