I'm trying to find words from a text file in another text file

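I built a simple graphical user interface (GUI) with basketball info to make it easier to find information about players. The GUI uses data scraped from various sources with the "requests" library. It works well, but there is one problem: my code contains a list of players, and this list must be compared against the scraped data for everything to work. This means that if I want to add or remove any name from the list, I have to go into my IDE or directly into my code and change it. Having an external text file that stores all these player names would give me much-needed flexibility in managing them.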

#This is how the players list looks in the code.
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis' ... #and many others]

#This is how the info in the scraped file looks:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

#The rest of the code is working well; this is the final part, where it uses the list to write the players that were found in both files.

with open("freeze.csv",'r') as freeze:
    for word in basketball:
        if word in freeze:
            freeze.write(word)

# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate the same way

# I tried different solutions but none of them worked for me

with open('final_G_league.csv') as text,  open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip,filter_words))
    txt = next(text).split()
    out = [word  for word in txt if word not in st]

# This one gives me the first line of the scraped text

import csv

file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')

data_read1= csv.reader(file1)
data_read2 = csv.reader(file2)

# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]

for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")

file1.close()
file2.close()

#This one returns a list with a bunch of names and a list index error.

file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if j in i:

# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file

Suppose we have the following input files:

freeze_list.txt - a comma-separated list of filter words (players), each enclosed in quotes:

'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'

final_G_league.csv - the scraped lines that we want to filter, using the words from the freeze_list.txt file:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

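I would split the script's responsibilities into code segments to make it more readable and manageable:

  1. Define constants (later you can turn them into parameters)
  2. Read the filter words from a file
  3. Filter the scraped lines
  4. Dump the output to a file

Constants: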

FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"
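
If you later want to turn these constants into parameters, one option is the standard-library argparse module. The following is only a sketch: the function name parse_args and the option names --filter-words, --scrapped and --output are made up for illustration, and the defaults simply mirror the constants above.

```python
import argparse


def parse_args(argv=None):
    """Build a small command-line interface (hypothetical option names)."""
    parser = argparse.ArgumentParser(
        description="Filter scraped lines by a list of player names."
    )
    parser.add_argument("--filter-words", default="freeze_list.txt",
                        help="file containing the player names")
    parser.add_argument("--scrapped", default="final_G_league.csv",
                        help="file containing the scraped lines")
    parser.add_argument("--output", default="freeze.csv",
                        help="file to write the matching lines to")
    # argv=None makes argparse read the real command line (sys.argv[1:]).
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["--filter-words", "my_players.txt"])
print(args.filter_words, args.scrapped, args.output)
```

The parsed values (args.filter_words and so on) could then be used wherever the constants appear.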

Read the filter words from the file:

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = eval('(' + filter_words_file.read() + ')')
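
Note that eval executes whatever Python expression the file contains, so it should only be used on a file you fully trust. A possible alternative, shown here as a sketch, is ast.literal_eval from the standard library, which accepts only literals. The helper name parse_filter_words is hypothetical, and the sketch assumes the file keeps the quoted, comma-separated format shown above.

```python
import ast


def parse_filter_words(text):
    """Parse a comma-separated list of quoted names into a tuple of strings.

    ast.literal_eval only accepts Python literals, so any other expression
    in the file raises an error instead of being executed.
    """
    words = ast.literal_eval("(" + text + ")")
    # A single name without a trailing comma parses to a plain string.
    if isinstance(words, str):
        words = (words,)
    return tuple(words)


sample = "'Adebayo, Bam', 'Allen, Jarrett',\n'Forbes, Bryn'"
print(parse_filter_words(sample))
# ('Adebayo, Bam', 'Allen, Jarrett', 'Forbes, Bryn')
```

With a real file you would pass filter_words_file.read() instead of the sample string.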

Filter the lines from the scraped file:

matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words for performance and
                # to avoid sending the same line multiple times to the output
                break
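
The check filter_word in line is a substring test, so a filter word would also match if it happened to appear inside a longer field, for example in the injury description. If you need exact matches on whole fields, one option is to parse each line with the csv module and compare the fields against a set. This is a sketch rather than a drop-in replacement: filter_rows is a hypothetical helper, and it assumes that no quoted field spans more than one physical line.

```python
import csv


def filter_rows(lines, names):
    """Return the lines that have at least one CSV field equal to a name."""
    wanted = set(names)  # set lookups stay fast for long name lists
    matched = []
    for line in lines:
        # Parse one line at a time so the original text is kept verbatim.
        fields = next(csv.reader([line]), [])
        if wanted.intersection(fields):
            matched.append(line)
    return matched


scraped = [
    'Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle"\n',
    '"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,\n',
]
# "Ball" alone is not a whole field, so only the "Forbes, Bryn" line is kept.
print(filter_rows(scraped, ["Forbes, Bryn", "Ball"]))
```

Because every field is checked, this also finds names on lines where the team name comes first.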

Dump the filtered lines to a file:

with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)

After running the above segments in order, the output freeze.csv is:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

Suggestion

I'm not sure why you chose to store the filter words in a comma-separated list. I would prefer a simple list of words, one word per line.

freeze_list.txt

Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn

Reading the file becomes simple and straightforward:

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = [word.strip() for word in filter_words_file]
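
One thing to watch with this format: a blank line in the file becomes an empty string after strip(), and '' in line is true for every line, so a single blank line would make every scraped line match. A small variation that skips blank lines, as a sketch (read_filter_words is a hypothetical helper that takes any iterable of lines, such as an open file):

```python
def read_filter_words(lines):
    """Return the stripped, non-empty names from an iterable of lines."""
    return [word.strip() for word in lines if word.strip()]


sample = ["Adebayo, Bam\n", "\n", "  Forbes, Bryn  \n", "\n"]
print(read_filter_words(sample))  # ['Adebayo, Bam', 'Forbes, Bryn']
```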

The output freeze.csv is the same:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

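This assumes file2 is just a list of names and that you want to extract the rows of the first file whose name column matches a name in that list.

I suggest making the "freeze" file a text file with one name per line and removing the single quotes from the names; it is then easier to parse.

You can then do something like this to match the names in one file against the other: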

import csv

# convert the names data to a list
with open("freeze1.txt",'r') as file2:
  names = [s.strip() for s in file2]
  print("names:", names)

# next open league data and extract rows with matching names
with open("final_G_league.csv",'r') as file1:
  reader = csv.reader(file1)
  next(reader) # skip header (remove this line if the file has no header row)
  for row in reader:
    if row[0] in names:
      # print the matching name
      print(row[0])

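If the names don't exactly match how they appear in the final_G_league file, you may need to adjust accordingly, e.g. do case-insensitive matching or normalize the names (Last, First vs. First Last), etc.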
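
As a rough sketch of such normalization: the hypothetical helper normalize_name below reduces both "Last, First" and "First Last" to one lowercase key. It assumes that in the "First Last" form the last word is the surname, which is wrong for suffixes such as "Jr.", so treat it as a starting point only.

```python
def normalize_name(name):
    """Reduce a name to a lowercase 'last, first' key for comparison."""
    name = " ".join(name.split())  # trim and collapse repeated whitespace
    if "," in name:
        # 'Last, First' form
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        # 'First Last' form: assume the final word is the surname
        first, _, last = name.rpartition(" ")
    if not first:
        return last.casefold()
    return f"{last}, {first}".casefold()


names = {normalize_name(n) for n in ["Forbes, Bryn", "Butler, Jimmy"]}
print(normalize_name("bryn FORBES") in names)  # True
```

Normalizing both the names from the list and the name taken from each row before comparing them makes the match independent of case and name order.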
