How do I extract specific words from text snippets using a dictionary of words grouped by category?

Question

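I want to extract specific words from text in a dataframe. I have entered these words into lists in a dictionary, where each list belongs to a category (the key). From this I want to create columns that correspond to the categories the words are stored under. As always, it is best illustrated with an example:

I have a dataframe: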

import pandas as pd
df = pd.DataFrame({'Text': ["This car is fast, agile and large and wide", "This wagon is slow, sluggish, small and compact with alloy wheels"]})

which creates the table:

    Text
0   This car is fast, agile and large and wide
1   This wagon is slow, sluggish, small and compact with alloy wheels

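And a dictionary of words, grouped by category, that I want to extract from it. The words are all natural-language words without symbols, and they can include phrases, such as "alloy wheels" in this example (this doesn't have to be a dictionary, I just feel it is the best approach):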

myDict = {
  "vehicle": ["car", "wagon"],
  "speed": ["fast", "agile", "slow", "sluggish"],
  "size": ["large", "small", "wide", "compact"],
  "feature": ["alloy wheels"]
}

From this I want to create a table that looks like this:

|     Text                                                          | vehicle | speed          | size           | feature      |
| ----------------------------------------------------------------- | ------- | -------------- | -------------- | ------------ |
| This car is fast, agile and large and wide                        | car     | fast, agile    | large, wide    | NaN          |
| This wagon is slow, sluggish, small and compact with alloy wheels | wagon   | slow, sluggish | small, compact | alloy wheels |

Cheers in advance for the help! I'd love to use regex, but any solution is welcome!

Answer 1

There are many ways to tackle this. One way I might start: define a function that returns a list of the dictionary words that match your sentence.

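    def get_matching_words(sentence, category_dict, category):
        # Collect every word or phrase of the category that occurs in the sentence
        matching_words = list()

        for word in category_dict[category]:
            # Substring test, so "fast," and phrases like "alloy wheels" still match
            if word in sentence:
                matching_words.append(word)

        return matching_words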

Then you want to apply this function to your pandas dataframe.

    df["vehicle"] = df["Text"].apply(lambda x: get_matching_words(x, myDict, "vehicle"))

    df["speed"] = df["Text"].apply(lambda x: get_matching_words(x, myDict, "speed"))
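Rather than writing one line per category, the same call can be repeated for every key in the dictionary. This is a minimal sketch, assuming the `df` and `myDict` from the question and a compact list-returning matcher; the substring test and the loop variable names are assumptions of this sketch, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({"Text": [
    "This car is fast, agile and large and wide",
    "This wagon is slow, sluggish, small and compact with alloy wheels",
]})

myDict = {
    "vehicle": ["car", "wagon"],
    "speed": ["fast", "agile", "slow", "sluggish"],
    "size": ["large", "small", "wide", "compact"],
    "feature": ["alloy wheels"],
}

def get_matching_words(sentence, category_dict, category):
    # Keep the words or phrases of this category that occur in the sentence
    return [word for word in category_dict[category] if word in sentence]

# Create one column per category key; c=category binds the current key
# so each lambda uses its own category
for category in myDict:
    df[category] = df["Text"].apply(
        lambda x, c=category: get_matching_words(x, myDict, c)
    )
```

Each new column holds a list per row, and rows without a match get an empty list.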

The only thing left to add here is to join the list into a string instead of returning a list.

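    def get_matching_words(sentence, category_dict, category):
        # Collect every word or phrase of the category that occurs in the sentence
        matching_words = list()

        for word in category_dict[category]:
            if word in sentence:
                matching_words.append(word)

        # Join with ", " to match the "fast, agile" style of the desired table
        return ", ".join(matching_words)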
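Since the question mentions a preference for regex, here is one possible regex-based variant. It is a sketch, not the only way to do it. A plain substring test would also find "car" inside a word like "scary", so this version wraps the alternatives in `\b` word boundaries. It returns `NaN` when nothing matches, as in the desired table. The helper names `build_pattern` and `extract_category` are hypothetical, and the case-insensitive matching is an assumption:

```python
import re

import numpy as np
import pandas as pd

df = pd.DataFrame({"Text": [
    "This car is fast, agile and large and wide",
    "This wagon is slow, sluggish, small and compact with alloy wheels",
]})

myDict = {
    "vehicle": ["car", "wagon"],
    "speed": ["fast", "agile", "slow", "sluggish"],
    "size": ["large", "small", "wide", "compact"],
    "feature": ["alloy wheels"],
}

def build_pattern(words):
    # Hypothetical helper: one alternation per category.
    # Longest entries first so phrases are tried before shorter words;
    # re.escape guards against regex metacharacters in the entries.
    alternatives = sorted(words, key=len, reverse=True)
    body = "|".join(re.escape(w) for w in alternatives)
    return re.compile(r"\b(?:" + body + r")\b", flags=re.IGNORECASE)

def extract_category(sentence, pattern):
    # Hypothetical helper: join matches in the order they appear in the text,
    # or return NaN when the category does not occur at all
    matches = pattern.findall(sentence)
    return ", ".join(matches) if matches else np.nan

for category, words in myDict.items():
    pattern = build_pattern(words)
    df[category] = df["Text"].apply(lambda s, p=pattern: extract_category(s, p))
```

Unlike the loop over dictionary entries, the matches here come back in the order they occur in the sentence rather than in dictionary order.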
