Searching a file for words from a list

Question

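I am trying to search a file for words. The words are stored in a separate list. The words that are found are stored in another list, which is returned at the end.

The code is as follows: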

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
             "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            for word in line.split():
                matching = [s for s in qualities if word.lower() in s]
                if matching is not None:
                    education.append(matching)
    return education

First, it returns a list with a bunch of empty "slots", which I assume means my comparison isn't working?

Result (scanning 4 files):

"C:\Program Files (x86)\Python2\python.exe" C:/Users/Vadim/PycharmProjects/TestFiles/ReadTXT.py
[[], [], [], [], [], [], [], [], [], ['java', 'javascript']]
[[], [], [], [], [], [], [], [], [], ['pascal']]
[[], [], [], [], [], [], [], [], [], ['linux']]
[[], [], [], [], [], [], [], [], [], [], ['c#']]

Process finished with exit code 0

The input file contains:

Name: Some Name
Phone: 1234567890
email: some@email.com
python,excel,linux

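Second, each file contains 3 different skills, but the function only finds 1 or 2 of them. Is that also down to a bad comparison, or do I have a different mistake here?

I would like the result to list only the skills that were found, without the empty slots, and to find all the skills in the file, not just some of them.

Edit: The function does find all the skills when I use word.split(', '). But if I want it to be more general, what would be a good way to find the skills when I don't know in advance what will separate them?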

Answer 1

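You get empty lists because None is not the same as an empty list. A list comprehension always returns a list, so matching is not None is true even when nothing matched. You might want to change the condition to the following: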

if matching:
    # do your stuff
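To make the difference concrete, here is a minimal sketch. The shortened qualities list and the variable names always_true and is_non_empty are only illustrative and do not come from the question.

```python
# A list comprehension always returns a list, so it is never None.
qualities = ["python", "java", "linux"]
matching = [s for s in qualities if s == "excel"]  # no match, so matching == []

always_true = matching is not None   # identity test against None
is_non_empty = bool(matching)        # what `if matching:` actually evaluates

print(always_true)   # True
print(is_non_empty)  # False
```

An empty list is falsy, so `if matching:` skips exactly the cases that produced the empty slots.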

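It also looks like you are checking whether the word is a substring of each string in the qualities list, which is probably not what you want. If you want to find the words on a line that appear in the qualities list, you might change the list comprehension to: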

words = line.split()
match = [word for word in words if word.lower() in qualities]
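As a small illustration of why the substring test over-matches, the sketch below runs both checks on the single word "java". The names substring_hits and exact_hits are made up for this example.

```python
qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"]

word = "java"

# Substring test from the question: is `word` contained in each quality?
substring_hits = [s for s in qualities if word in s]

# Membership test: is `word` itself one of the qualities?
exact_hits = [w for w in [word] if w in qualities]

print(substring_hits)  # ['java', 'javascript']
print(exact_hits)      # ['java']
```

This is consistent with the ['java', 'javascript'] entry in the question's output.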

If you need to split on both commas and spaces, you might want to look into regular expressions. See "Split Strings with Multiple Delimiters?".
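One possible way to do that with re.split is sketched below. The function name find_skills, the QUALITIES set, and the choice of commas, semicolons and whitespace as separators are assumptions for illustration; adjust the character class to whatever separators your files really use.

```python
import re

# Hypothetical constant: the skills from the question, stored as a set
QUALITIES = {"python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"}

def find_skills(line):
    """Return the known skills found on one line, in order of appearance."""
    # Split on any run of commas, semicolons, or whitespace (assumed separators)
    tokens = re.split(r"[,;\s]+", line.strip().lower())
    return [t for t in tokens if t in QUALITIES]

print(find_skills("python,excel,linux"))   # ['python', 'linux']
print(find_skills("Java, C#;  SQL\tc++"))  # ['java', 'c#', 'sql', 'c++']
```

Because the tokens are compared for equality rather than as substrings, "excel" is ignored and "java" does not also report "javascript".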

Answer 2

The code should be written as follows (if I have understood the desired output format correctly):

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
             "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            matching = []
            # Split on commas, then normalise each token before comparing
            for word in line.strip().split(","):
                word = word.strip().lower()
                if word in qualities:
                    matching.append(word)
            # Only keep lines on which at least one skill was found
            if matching:
                education.append(matching)
    return education

Answer 3

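First, you get a bunch of "empty slots" because the condition is defined incorrectly. If matching is an empty list, it is still not None. That is, [] is not None evaluates to True. That is why you get all those "empty slots".

Second, the condition in the list comprehension is not what you want either. Unless I have misunderstood your goal, the condition you are looking for is: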

[s for s in qualities if word.lower() == s]

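This goes through the qualities list and returns a non-empty list only if the word is one of the qualities. However, since this list will always have length 1 (if there is a match) or 0 (if there is not), we can swap it for a boolean using Python's built-in any() function: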

if any(s == word.lower() for s in qualities):
    education.append(word)

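I hope this helps. Feel free to ask any follow-up questions, and let me know if I have misunderstood your goal.

For convenience, here is the modified source I used to check my answer: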

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
             "html", "css", "jquery", "linux", "windows"]
    with open(file, 'r') as file1:
        for line in file1:
            for word in line.split():
                if any(s == word.lower() for s in qualities):
                    education.append(word)
    return education

Answer 4

You can also use a regular expression, as follows:

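import re

def scan_education(file_name):
    education = []
    qualities_list = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                      "html", "css", "jquery", "linux", "windows"]
    # re.escape handles "#" and "+"; the lookarounds stand in for \b, which
    # cannot match after a non-word character such as the "#" in "c#"
    pattern = '|'.join(re.escape(q) for q in qualities_list)
    qualities = re.compile(r'(?<!\w)(?:%s)(?!\w)' % pattern)
    with open(file_name, 'r') as f:
        for line in f:
            education += qualities.findall(line.lower())
    return list(set(education))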

Answer 5

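Here is a short example that uses sets and a little list-comprehension filtering to find the words common to a text file (or, as a fallback here, a plain text string) and the list you supply. This is generally faster and cleaner than trying to do it with loops.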

import string

try:
    with open('myfile.txt') as f:
        text = f.read()
except IOError:  # file missing or unreadable: fall back to a sample string
    text = "harry met sally; the boys went to the park.  my friend is purple?"

my_words = set(("harry", "george", "phil", "green", "purple", "blue"))

text = ''.join(x for x in text if x in string.ascii_letters or x in string.whitespace)

text = set(text.split()) # split on any whitespace

common_words = my_words & text # my_words.intersection(text) also does the same

print(common_words)
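Applied to the skills list from the question, the same set idea might look like the sketch below. Note that the letter-only filter above would strip the "#" and "+" from "c#" and "c++", and would fuse a comma-separated line such as "python,excel,linux" into a single word, so this variant tokenizes differently. The names QUALITIES and skills_in_text and the token pattern are assumptions for illustration, not part of the original answer.

```python
import re

# Hypothetical constant: the skills from the question, stored as a set
QUALITIES = {"python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c",
             "javascript", "pascal", "html", "css", "jquery", "linux", "windows"}

def skills_in_text(text):
    """Return the set of known skills that occur as whole tokens in `text`."""
    # Keep letters, digits, "+" and "#" so that "c++" and "c#" stay intact;
    # every other character acts as a separator.
    tokens = set(re.findall(r"[a-z0-9+#]+", text.lower()))
    return QUALITIES & tokens

sample = ("Name: Some Name\nPhone: 1234567890\n"
          "email: some@email.com\npython,excel,linux\n")
print(sorted(skills_in_text(sample)))  # ['linux', 'python']
```

Sets are unordered, so the result is sorted before printing; use a list-based approach instead if the order of appearance matters.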


5 solutions

Solution 1
1 Accepted 2016-09-25 07:56:22

Solution 2
1 2016-09-25 07:56:43

Solution 3
1 2016-09-25 07:59:43

Solution 4
1 2016-09-25 08:29:39

Solution 5
1 2016-09-25 08:32:27
