How to remove English words from a column of a CSV file using Python

Question

Very new to Python.

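Problem: I have a CSV file containing rows of alphanumeric text, and I want to remove all English words. For example, the input is "Steam traps on Steam to 56X-233 Butane Vaporizer" and the desired output is just "56X-233".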

Is the answer similar to removing stopwords with NLTK?

Thank you.

Answer 1

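If you don't care about matching actual English words, you can use a regular expression to match any word made up only of letters (that is, with no digits in it):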

import re

def remove_words(line):
    # Remove words containing only letters
    line = re.sub(r"\b[A-Za-z]+\b", "", line)

    # Remove remaining extra spaces
    return re.sub(" +", " ", line).strip()

print(remove_words("Steam traps on Steam to 56X-233 Butane Vaporizer"))
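A slightly different sketch, not part of the original answer: split on whitespace and keep only tokens that contain at least one digit. The function name remove_words_no_digits is hypothetical. Unlike the regex, it drops a whole token such as "Steam," (punctuation included) instead of leaving the stray comma behind.

```python
def remove_words_no_digits(line):
    # Hypothetical alternative to remove_words: keep only whitespace-separated
    # tokens that contain at least one digit; every other token is dropped.
    tokens = line.split()
    kept = [tok for tok in tokens if any(ch.isdigit() for ch in tok)]
    return " ".join(kept)

print(remove_words_no_digits("Steam traps on Steam to 56X-233 Butane Vaporizer"))
```

For "Steam, 56X-233" the regex version returns ", 56X-233", while this token filter returns "56X-233".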

To do this for a whole file, read it line by line and run remove_words on each line:

with open("my_file.txt") as f:
    # Iterate over the file object directly; no need to load every line with readlines()
    for line in f:
        print(remove_words(line))
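Since the question asks about one column of a CSV file rather than whole lines, here is a minimal sketch using the standard csv module that cleans only a chosen column and copies every other field unchanged. The function name clean_column, the column index, and the has_header flag are assumptions for illustration; adjust them to your file's layout.

```python
import csv
import io
import re

def remove_words(text):
    # Same regex approach as the answer: drop tokens made only of ASCII letters,
    # then collapse the leftover runs of spaces.
    text = re.sub(r"\b[A-Za-z]+\b", "", text)
    return re.sub(r" +", " ", text).strip()

def clean_column(src, dst, column, has_header=False):
    # Hypothetical helper: copy rows from src to dst, cleaning only the field
    # at 0-based index `column`. A header row, if present, is copied as-is.
    reader = csv.reader(src)
    writer = csv.writer(dst)
    if has_header:
        header = next(reader, None)
        if header is not None:
            writer.writerow(header)
    for row in reader:
        if len(row) > column:
            row[column] = remove_words(row[column])
        writer.writerow(row)

# Usage with in-memory data standing in for real files.
sample = io.StringIO("1,Steam traps on Steam to 56X-233 Butane Vaporizer\n")
out = io.StringIO()
clean_column(sample, out, column=1)
print(out.getvalue())
```

With real files, open both with newline="" as the csv module documentation recommends, for example open("in.csv", newline="") and open("out.csv", "w", newline=""), where the file names are placeholders. Note that csv.writer ends rows with "\r\n" by default.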


Answer 1
0 2020-07-21 04:54:48
