
I'm trying to find words from a text file in another text file

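I built a simple graphical user interface (GUI) with basketball information to make it easier to look up information about players. The GUI uses data scraped from various sources with the "requests" library. It works well, but there is one problem: my code contains a list of players that has to be compared against the scraped data for everything to work properly. This means that whenever I want to add or remove a name from this list, I have to go into my IDE or directly into my code and change it. Having an external text file that stores all these player names would provide much-needed flexibility in managing them.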

#This is how the players list looks in the code.
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis' ... #and many others]

#This is what the info in the scraped file looks like:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

#The rest of the code works well; this is the final part, where it uses the list to write the players that were found in both files.

with open("freeze.csv",'r') as freeze:
    for word in basketball:
        if word in freeze:
            freeze.write(word)

# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate the same way

# I tried different solutions, but none of them worked for me

with open('final_G_league.csv') as text,  open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip,filter_words))
    txt = next(text).split()
    out = [word  for word in txt if word not in st]

# This one gives me the first line of the scraped text

import csv

file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')

data_read1= csv.reader(file1)
data_read2 = csv.reader(file2)

# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]

for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")

file1.close()
file2.close()

#This one returns a list with a bunch of names and a list index error.

file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if j in i:

# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file

Suppose we have the following input files:

freeze_list.txt - a comma-separated list of filter words (players), each enclosed in quotes:

'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'

final_G_league.csv - the scraped lines that we want to filter, using the words from the freeze_list.txt file:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

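I would split the script's responsibilities into code segments to make it more readable and manageable:

  1. Define constants (later you can turn them into parameters)
  2. Read the filter words from a file
  3. Filter the scraped lines
  4. Dump the output to a file

Constants: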

FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"

Read the filter words from the file:

import ast  # literal_eval accepts only Python literals, so it is safer than eval
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = ast.literal_eval('(' + filter_words_file.read() + ')')

Filter the lines from the scraped file:

matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words for performance and
                # to avoid sending the same line multiple times to the output
                break

Dump the filtered lines to a file:

with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)

After running the segments above in order, the output freeze.csv is:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
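Step 1 mentions that the constants could later become parameters. The following is a minimal sketch of that idea, not part of the original answer: the function names and the command-line flags (--filter-words, --scraped, --output) are hypothetical, and it assumes the quoted, comma-separated freeze_list.txt format shown above. It parses the list with ast.literal_eval instead of eval and wraps the text in square brackets so that a file with a single name still yields a list.

```python
import argparse
import ast


def read_filter_words(path):
    """Read a quoted, comma-separated list such as 'Adebayo, Bam', 'Allen, Jarrett'."""
    with open(path) as filter_words_file:
        # Assumption: the file holds only Python string literals separated by commas
        return ast.literal_eval("[" + filter_words_file.read() + "]")


def filter_lines(lines, filter_words):
    """Keep each line that contains at least one filter word (substring match)."""
    return [line for line in lines if any(word in line for word in filter_words)]


def main(argv=None):
    # Hypothetical command-line interface; the defaults mirror the constants above
    parser = argparse.ArgumentParser(description="Filter scraped lines by player name")
    parser.add_argument("--filter-words", default="freeze_list.txt")
    parser.add_argument("--scraped", default="final_G_league.csv")
    parser.add_argument("--output", default="freeze.csv")
    args = parser.parse_args(argv)

    filter_words = read_filter_words(args.filter_words)
    with open(args.scraped) as scraped_file:
        matched_lines = filter_lines(scraped_file, filter_words)
    with open(args.output, "w") as filtered_file:
        filtered_file.writelines(matched_lines)
```

Calling main() at the bottom of the script runs it with the default file names; passing a list such as main(["--output", "other.csv"]) overrides individual parameters.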

Suggestion

I'm not sure why you chose to store the filter words in a comma-separated list. I would prefer a plain list of words, one word per line.

freeze_list.txt

Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn

Reading the file then becomes simple and straightforward:

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    # skip blank lines: an empty filter word would match every line
    filter_words = [word.strip() for word in filter_words_file if word.strip()]

The output freeze.csv is the same:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

If file2 is just a list of names, you can extract the rows of the first file whose name column matches a name in the list.

I suggest making the "freeze" file a text file with one name per line and removing the single quotes from the names; it is then easier to parse.
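A one-off conversion from the quoted list to the one-name-per-line format could look like the sketch below. It is an illustration rather than part of the original answer: convert_quoted_list is a hypothetical helper, and it assumes the source file contains only quoted names separated by commas, as in the freeze_list.txt example.

```python
import ast


def convert_quoted_list(source_path, target_path):
    """Rewrite a quoted, comma-separated name list as one name per line."""
    with open(source_path) as source_file:
        # Wrapping the text in brackets lets ast.literal_eval parse it as a list of strings
        names = ast.literal_eval("[" + source_file.read() + "]")
    with open(target_path, "w") as target_file:
        for name in names:
            # The quotes are dropped by parsing, so only the bare name is written
            target_file.write(name + "\n")
    return names
```

For example, convert_quoted_list("freeze_list.txt", "freeze1.txt") would write the file that the matching code reads.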

Then you can do something like this to match the names in one file against the other.

import csv

# convert the names data to a list
with open("freeze1.txt",'r') as file2:
  names = [s.strip() for s in file2]
  print("names:", names)

# next open league data and extract rows with matching names
with open("final_G_league.csv",'r') as file1:
  reader = csv.reader(file1)
  next(reader) # skip header
  for row in reader:
    if row[0] in names:
      # print the matching name
      print(row[0])

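If the names do not exactly match the way they appear in the final_G_league file, you may need to adjust accordingly, for example by doing a case-insensitive match or by normalizing the names (last name, first name vs. first name last name).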
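As a rough sketch of such an adjustment (the helpers normalize_name and matching_rows are hypothetical, not from the original answer), the code below compares names case-insensitively and accepts both "Last, First" and "First Last". It assumes that in a name without a comma the last whitespace-separated token is the surname, which will be wrong for multi-word surnames. It also checks every field of a row, because in the sample data the first row starts with a team name rather than a player name.

```python
import csv


def normalize_name(name):
    """Return a case-folded 'last, first' key for a player name.

    Assumption: a name without a comma is 'First Last', and the last
    whitespace-separated token is the surname.
    """
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        first, _, last = name.rpartition(" ")
    return f"{last}, {first}".casefold()


def matching_rows(rows, names):
    """Yield rows in which any field equals one of the names after normalization."""
    # Blank entries are skipped so that empty CSV fields never count as a match
    wanted = {normalize_name(name) for name in names if name.strip()}
    for row in rows:
        if any(normalize_name(field) in wanted for field in row):
            yield row


# Usage with two sample lines; csv.reader accepts any iterable of strings
sample = ['Charlotte Hornets,"Ball, LaMelo",Out,Soreness',
          '"Forbes, Bryn",Questionable,Illness,']
for row in matching_rows(csv.reader(sample), ["bryn forbes", "Ball, lamelo"]):
    print(row)
```

Normalizing both sides into a set keeps each lookup cheap, at the cost of treating every field as a potential name.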

