Removing words that contain symbols such as "#", "@", or ":" in Python

Question

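I just started learning to code in Python this semester, and we were given some review exercises. I am stuck on one of the problems. The text file we were given contains tweets from the 2016 US election. A sample is shown below: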

I wish they would show out takes of Dick Cheney #GOPdebates
Candidates went after @HillaryClinton 32 times in the #GOPdebate-but remained silent about the issues that affect us. 
It seems like Ben Carson REALLY doesn't want to be there. #GOPdebates
RT @ColorOfChange: Or better said: #KKKorGOP #GOPDebate

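The problem asks me to write a Python program that reads from the file tweets.txt. Keep in mind that each line contains one tweet. For each tweet, the program should remove any word that is fewer than 8 characters long, as well as any word that contains a hash (#), at (@), or colon (:) character. What I have so far: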

for line in open("tweets.txt"):
  aline=line.strip()
  words=aline.split()
  length=len(words)
  remove=['#','@',':']
  for char in words:
    if "#" in char:
      char=''
    if "@" in char:
      char=''
    if ":" in char:
      char=''

This doesn't work; the resulting list still contains @, #, or :. Any help is appreciated! Thanks!

Answer 1

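Assigning char = '' inside the loop does not change or remove the actual char (which is really a word) in the list; it only binds the variable char to a different value.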
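A minimal sketch of that behaviour, using a made-up three-word list (the names `words`, `unchanged`, and `blanked` are illustrative only). Rebinding the loop variable leaves the list as it was, while writing back through an index does modify it:

```python
words = ["#GOPdebates", "Candidates", "@HillaryClinton"]

# Rebinding the loop variable only changes what the name `word` refers to;
# the list element itself is left untouched.
for word in words:
    if "#" in word:
        word = ""

unchanged = list(words)

# Assigning through the index replaces the element stored in the list.
blanked = list(words)
for i, word in enumerate(blanked):
    if "#" in word:
        blanked[i] = ""

print(unchanged)  # ['#GOPdebates', 'Candidates', '@HillaryClinton']
print(blanked)    # ['', 'Candidates', '@HillaryClinton']
```

Even the index-based version only blanks the word rather than removing it, so building a new, filtered list is usually the simpler route.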

Instead, you can use a list comprehension or generator expression to keep only the words that satisfy the conditions.

>>> tweet = "Candidates went after @HillaryClinton 32 times in the #GOPdebate-but remained silent about the issues that affect us."
>>> [w for w in tweet.split() if not any(c in w for c in "#@:") and len(w) >= 8]
['Candidates', 'remained']

Optionally, use ' '.join(...) to join the remaining words back into a "sentence", although that may not make much sense.
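Putting the pieces together, one possible sketch of the whole exercise might look like the following. The helper names `filter_tweet` and `filter_file` and the constants `SYMBOLS` and `MIN_LENGTH` are assumptions made for illustration; they do not come from the original answer.

```python
SYMBOLS = "#@:"   # characters that disqualify a word
MIN_LENGTH = 8    # words shorter than this are dropped


def filter_tweet(tweet):
    """Return the words that are long enough and contain none of SYMBOLS."""
    return [
        w for w in tweet.split()
        if len(w) >= MIN_LENGTH and not any(c in w for c in SYMBOLS)
    ]


def filter_file(path):
    """Yield each line's surviving words, joined back into one string."""
    with open(path) as f:
        for line in f:
            yield " ".join(filter_tweet(line))


sample = ("Candidates went after @HillaryClinton 32 times in the "
          "#GOPdebate-but remained silent about the issues that affect us.")
print(filter_tweet(sample))  # ['Candidates', 'remained']
```

With the file from the question, `for line in filter_file("tweets.txt"): print(line)` would print one filtered line per tweet.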

Answer 2

Use this code.

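import re

# Example tweet taken from the question
tweet = "RT @ColorOfChange: Or better said: #KKKorGOP #GOPDebate"
# Note: each call strips only the symbol itself; the word that contained it stays.
tweet = re.sub(r'#', '', tweet)
tweet = re.sub(r'@', '', tweet)
tweet = re.sub(r':', '', tweet)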
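Since the exercise asks for the whole word to be removed, not just the symbol, a regex-based variant would need to match the entire whitespace-delimited word. The following is a rough sketch under that assumption; `SYMBOL_WORD` and `drop_symbol_words` are hypothetical names, and the length rule is not handled here.

```python
import re

# A run of non-space characters that contains at least one of '#', '@', ':'
SYMBOL_WORD = re.compile(r"\S*[#@:]\S*")


def drop_symbol_words(tweet):
    """Remove every word containing '#', '@' or ':' and tidy the spacing."""
    without = SYMBOL_WORD.sub("", tweet)
    return " ".join(without.split())


tweet = "RT @ColorOfChange: Or better said: #KKKorGOP #GOPDebate"

chars_only = re.sub(r"[#@:]", "", tweet)   # strips the symbols, keeps the words
words_dropped = drop_symbol_words(tweet)   # drops the words themselves

print(chars_only)     # RT ColorOfChange Or better said KKKorGOP GOPDebate
print(words_dropped)  # RT Or better
```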

Answer 3

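The following opens the file (it is generally best to use "with open" when working with files), loops over all lines, and uses translate to remove the characters "#@:". It then removes words shorter than 8 characters, giving the output "new_line".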

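with open('tweets.txt') as rf:
    for sentence in rf:
        # Drop the trailing newline and surrounding whitespace
        line = sentence.strip()
        # Delete the characters '#', '@' and ':' (the rest of each word is kept)
        line = line.translate({ord(i): None for i in '#@:'})
        line = line.split()
        # Keep only words with at least 8 characters
        new_line = [ word for word in line if len(word) >= 8 ]
        print(new_line)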

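This is not the most concise approach, and there are certainly better ways to do it, but it may be easier to read and understand if, like me, you are just starting to learn.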
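One thing worth checking with the translate approach is that it deletes the symbols but keeps the rest of each word, so a word such as @HillaryClinton survives as HillaryClinton. The sketch below compares the two readings on a sample tweet from the question; the variable names are illustrative, and `str.maketrans("", "", "#@:")` is simply another way to build the same deletion table.

```python
tweet = ("Candidates went after @HillaryClinton 32 times in the "
         "#GOPdebate-but remained silent about the issues that affect us.")

# Reading 1: delete the symbols, then apply the length rule.
table = str.maketrans("", "", "#@:")
chars_stripped = [w for w in tweet.translate(table).split() if len(w) >= 8]

# Reading 2: drop any word that contains a symbol, then apply the length rule.
words_dropped = [
    w for w in tweet.split()
    if len(w) >= 8 and not set(w) & set("#@:")
]

print(chars_stripped)  # ['Candidates', 'HillaryClinton', 'GOPdebate-but', 'remained']
print(words_dropped)   # ['Candidates', 'remained']
```

The second result matches the wording of the exercise, which asks for words containing those characters to be removed.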

Answer 1
0 Accepted 2020-10-27 15:04:08

Answer 2
0 2020-10-27 15:14:35

Answer 3
0 2020-10-27 17:09:23