Searching a file for words from a list

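I am trying to search a file for words. The words are stored in a separate list. The words that are found are stored in another list, and that list is returned at the end.

The code looks like this: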

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    with open("C:\Users\Vadim\Desktop\Python\New_cvs\\" + file, 'r') as file1:
        for line in file1:
            for word in line.split():
                matching = [s for s in qualities if word.lower() in s]
                if matching is not None:
                    education.append(matching)
    return education

First, it returns a list with a bunch of empty "seats". Does that mean my comparison isn't working?

The result (scanning 4 files):

"C:\Program Files (x86)\Python2\python.exe" C:/Users/Vadim/PycharmProjects/TestFiles/ReadTXT.py
[[], [], [], [], [], [], [], [], [], ['java', 'javascript']]
[[], [], [], [], [], [], [], [], [], ['pascal']]
[[], [], [], [], [], [], [], [], [], ['linux']]
[[], [], [], [], [], [], [], [], [], [], ['c#']]

Process finished with exit code 0

The input file contains:

Name: Some Name
Phone: 1234567890
email: some@email.com
python,excel,linux

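Second issue: each file contains 3 different skills, but the function only finds 1 or 2. Is that also caused by a bad comparison, or do I have a different mistake here?

I would like the result to be a list of only the skills that were found, without the empty seats, and I want it to find all the skills in the file, not just some of them.

Edit: when I use word.split(', ') the function does find all the skills. But if I want it to be more general, what is a good way to find the skills when I don't know in advance what separates them?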

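You are getting the empty lists because an empty list is not None, so the check "matching is not None" always passes. You may want to change the condition to the following: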

if matching:
    # do your stuff
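
A quick check in the interpreter makes the difference visible. This is only an illustrative sketch; the shortened list and the word "excel" are made up for the example:

```python
# No entry in this (shortened, illustrative) list contains "excel".
matching = [s for s in ["python", "java"] if "excel" in s]

print(matching)               # []
print(matching is not None)   # True: an empty list is still a list object, not None
print(bool(matching))         # False: an empty list is falsy, so "if matching:" skips it
```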

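It looks like you are checking whether each word occurs as a substring of the strings in the qualities list, which is probably not what you want: the word "java", for example, matches both "java" and "javascript". If you want to find the words on a line that appear in the qualities list, you may want to change the list comprehension to: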

words = line.split()
match = [word for word in words if word.lower() in qualities]

If you want to split on both commas and spaces, you may want to look at regular expressions. See "Split string with multiple delimiters?"
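
As a rough sketch of that idea, re.split can break a line on any run of commas, semicolons, or whitespace before the membership test. The function name find_skills, the shortened QUALITIES set, and the choice of separators are assumptions made for this example, not part of the original code:

```python
import re

# Shortened, illustrative list; a set makes the membership test cheap.
QUALITIES = {"python", "java", "sql", "c#", "c++", "c", "linux"}

def find_skills(line, qualities=QUALITIES):
    """Return the known skills found in one line, in order of appearance."""
    # Split on any run of commas, semicolons, or whitespace. "#" and "+" are
    # deliberately not separators, so "c#" and "c++" survive as tokens.
    tokens = re.split(r"[,;\s]+", line.strip().lower())
    return [token for token in tokens if token in qualities]

print(find_skills("python,excel,linux"))   # ['python', 'linux']
print(find_skills("Java, C#  c++;SQL"))    # ['java', 'c#', 'c++', 'sql']
```

Plain line.split() would leave "python,excel,linux" as a single token, which is why only some skills were found in the comma-separated files.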

The code should be written as follows (if I understand the desired output format correctly):

import os

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                 "html", "css", "jquery", "linux", "windows"]
    # A raw string keeps the backslashes literal; os.path.join adds the separator.
    path = os.path.join(r"C:\Users\Vadim\Desktop\Python\New_cvs", file)
    with open(path, 'r') as file1:
        for line in file1:
            matching = []
            for word in line.strip().split(","):
                word = word.strip().lower()  # normalise before comparing
                if word in qualities:
                    matching.append(word)
            if matching:  # skip lines with no matches
                education.append(matching)
    return education

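First, you are getting a bunch of "empty seats" because the condition is not defined correctly. If matching is an empty list, it is still not None. That is, [] is not None evaluates to True, which is why you get all those empty seats.

Second, the condition in the list comprehension is not what you want either. Unless I have misunderstood your goal, the condition you are looking for is: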

[s for s in qualities if word.lower() == s]

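This goes through the qualities list and returns a non-empty list only if the word is one of the qualities. However, since the length of that list is always 1 (if there is a match) or 0 (if there is not), we can swap it for a boolean using Python's built-in any() function: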

if any(s == word.lower() for s in qualities):
    education.append(word)

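Hope this helps. Feel free to ask any follow-up questions, and let me know if I have misunderstood your goal.

For convenience, here is the modified source I used to check myself: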

def scanEducation(file):
    education = []
    qualities = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
             "html", "css", "jquery", "linux", "windows"]
    with open(file, 'r') as file1:
        for line in file1:
            for word in line.split():
                if any(s == word.lower() for s in qualities):
                    education.append(word)
    return education

You can also use a regular expression, like this:

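import re

def scan_education(file_name):
    qualities_list = ["python", "java", "sql", "mysql", "sqlite", "c#", "c++", "c", "javascript", "pascal",
                      "html", "css", "jquery", "linux", "windows"]
    # Escape regex metacharacters ("c#", "c++") and try longer names first so "c#" wins over "c".
    names = sorted(qualities_list, key=len, reverse=True)
    alternatives = '|'.join(re.escape(name) for name in names)
    # \b cannot match after "#" or "+", so use lookarounds as the word boundaries.
    qualities = re.compile(r'(?<!\w)(?:%s)(?!\w)' % alternatives)
    education = set()
    with open(file_name, 'r') as file1:
        for line in file1:
            education.update(qualities.findall(line.lower()))
    return list(education)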
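
One thing to watch with word boundaries: \b only matches between a word character and a non-word character, so a name ending in "#" or "+" cannot be followed by \b unless a letter, digit, or underscore comes next. A small sketch comparing \b with lookarounds (the sample text is made up for the example):

```python
import re

text = "c#, java"

# \b needs a word character on one side, so it cannot match between "#" and ",".
# The "c#" alternative fails and the regex falls back to plain "c".
with_boundaries = re.findall(r"\b(?:c\#|c)\b", text)
print(with_boundaries)    # ['c']

# Lookarounds only require that no word character is adjacent to the match.
with_lookarounds = re.findall(r"(?<!\w)(?:c\#|c)(?!\w)", text)
print(with_lookarounds)   # ['c#']
```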

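Here is a short example that uses sets and a little comprehension-based filtering to find the words common to a text file (or, as in the fallback here, just a text string) and a list you provide. This is usually faster and clearer than writing explicit loops.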

import string

try:
    with open('myfile.txt') as f:
        text = f.read()
except IOError:  # fall back to a sample string if the file cannot be read
    text = "harry met sally; the boys went to the park.  my friend is purple?"

my_words = set(("harry", "george", "phil", "green", "purple", "blue"))

# replace every non-letter with a space, so punctuation separates words instead of joining them
text = ''.join(x if x in string.ascii_letters else ' ' for x in text)

text = set(text.split()) # split on any whitespace

common_words = my_words & text # my_words.intersection(text) also does the same

print(common_words)
