I'm trying to find words from a text file in another text file

I built a simple graphical user interface (GUI) with basketball info to make it easier to find information about players. The GUI uses data scraped from various sources with the 'requests' library. It works well, but there is a problem: my code contains a hard-coded list of players that must be compared against the scraped data for everything to work properly. This means that whenever I want to add or remove a name, I have to open my IDE and edit the code directly, and I need to change this. Storing the player names in an external text file would give me much-needed flexibility in managing them.

# This is how the players list looks in the code.
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis']  # ... and many others

# This is how the data in the scraped file looks:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

# The rest of the code works well. This is the final part, where it uses the list to write out the players that were found in both files.

with open("freeze.csv",'r') as freeze:
    for word in basketball:
        if word in freeze:
            freeze.write(word)

# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate over it the same way.

# I tried different solutions, but none of them worked for me.

with open('final_G_league.csv') as text,  open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip,filter_words))
    txt = next(text).split()
    out = [word  for word in txt if word not in st]

# This one gives me the first line of the scraped text.

import csv

file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')

data_read1= csv.reader(file1)
data_read2 = csv.reader(file2)

# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]

for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")

file1.close()
file2.close()

#This one returns a list with a bunch of names and a list index error.

file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if j in i:

# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file

Let's assume we have the following input files:

freeze_list.txt - comma-separated list of filter words (players) enclosed in quotes:

'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'

final_G_league.csv - scraped lines that we want to filter, using words from the freeze_list.txt file:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

I would split the script's responsibilities into code segments to make it more readable and manageable:

  1. Define constants (later you could make them parameters)
  2. Read filter words from a file
  3. Filter scraped lines
  4. Dump output to a file

The constants:

FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"

Read filter words from a file:

import ast
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    # literal_eval parses the quoted names into a tuple without executing code
    filter_words = ast.literal_eval('(' + filter_words_file.read() + ')')

Filter lines from the scraped file:

matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words for performance and
                # to avoid sending the same line multiple times to the output
                break

Dump filtered lines into a file:

with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)

The output freeze.csv after running the above segments in sequence is:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
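
The four steps above can also be combined into one script with the file names passed as parameters. This is only a minimal sketch, assuming the quoted, comma-separated layout of freeze_list.txt shown earlier. The function names (read_filter_words, filter_lines, write_lines, main) are made up for illustration, and ast.literal_eval is used so that the file contents are parsed as literals and never executed.

```python
import ast


def read_filter_words(path):
    """Parse a file of quoted, comma-separated names into a tuple."""
    with open(path) as filter_words_file:
        # Wrapping in parentheses lets the names span several lines.
        parsed = ast.literal_eval("(" + filter_words_file.read() + ")")
    # A file holding a single name parses to a plain string, not a tuple.
    return (parsed,) if isinstance(parsed, str) else tuple(parsed)


def filter_lines(path, filter_words):
    """Return the lines of `path` that contain at least one filter word."""
    with open(path) as scraped_file:
        return [
            line
            for line in scraped_file
            if any(word in line for word in filter_words)
        ]


def write_lines(path, lines):
    """Write the matched lines to `path`, replacing any previous content."""
    with open(path, "w") as filtered_file:
        filtered_file.writelines(lines)


def main(filter_words_path="freeze_list.txt",
         scraped_path="final_G_league.csv",
         output_path="freeze.csv"):
    """Run the read, filter, and dump steps and return the matched lines."""
    filter_words = read_filter_words(filter_words_path)
    matched_lines = filter_lines(scraped_path, filter_words)
    write_lines(output_path, matched_lines)
    return matched_lines
```

Calling main() with no arguments uses the same three file names as the constants, and any of them can be overridden, for example main(output_path="other.csv"). Note that this is still a substring match on the whole line, exactly like the segments above.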

Suggestion

I'm not sure why you chose to store the filter words in a comma-separated list. I would prefer a plain list with one name per line.

freeze_list.txt:

Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn

Reading the file then becomes straightforward:

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = [word.strip() for word in filter_words_file]

The output freeze.csv is the same:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

If file2 is just a list of names, you can extract the rows of the first file whose name column matches a name in the list.

I suggest making the "freeze" file a text file with one name per line and removing the single quotes from the names, so that it is easier to parse.

You can then do something like this to match the names from one file against the other.

import csv

# convert the names data to a list
with open("freeze1.txt",'r') as file2:
  names = [s.strip() for s in file2]
  print("names:", names)

# next open league data and extract rows with matching names
with open("final_G_league.csv",'r') as file1:
  reader = csv.reader(file1)
  next(reader) # skip header
  for row in reader:
    if row and row[0] in names:  # guard against blank rows
      # print the matching name
      print(row[0])
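
In the sample data from the question, the first row of each team starts with the team name, so the player name is not always in column 0. One possible variation, assuming that layout, checks every field of a row against a set of names. The helper name rows_with_known_names is hypothetical, and the sample rows are copied from the question.

```python
import csv


def rows_with_known_names(csv_lines, names):
    """Yield parsed rows in which any field exactly equals a known name."""
    wanted = set(names)  # a set gives fast membership tests
    for row in csv.reader(csv_lines):
        if any(field in wanted for field in row):
            yield row


sample = [
    'Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, '
    'Wrist; Soreness (L Ankle, R Wrist)"',
    '"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness',
    '"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,',
]
matches = list(rows_with_known_names(sample, ["Ball, LaMelo", "Forbes, Bryn"]))
for row in matches:
    print(row)
```

csv.reader accepts any iterable of lines, so an open file object can be passed instead of the list, for example one opened with open("final_G_league.csv", newline=""). Because whole fields are compared, this is an exact match rather than a substring match.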

If the names don't match exactly as they appear in the final_G_league file, you may need to adjust accordingly, for example by doing a case-insensitive match or normalizing the names (last, first vs. first last).
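
As a rough illustration of such normalization, the sketch below maps both "Last, First" and "First Last" to the same lowercase key. The function name normalize_name is hypothetical, and the code assumes that in the "First Last" form the final word is the surname, which will not hold for names with suffixes or multi-word surnames.

```python
def normalize_name(name):
    """Return a case-insensitive 'last, first' key for either name order."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        # Assumption: the final word is the surname ("Bam Adebayo").
        first, _, last = name.rpartition(" ")
    if not first:
        return last.casefold()  # single-word name
    return f"{last}, {first}".casefold()


known = {normalize_name(n) for n in ["Adebayo, Bam", "Forbes, Bryn"]}
print(normalize_name("bam adebayo") in known)  # True
```

Both the names read from the text file and the names read from the CSV would need to go through the same function before being compared.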
