
I'm trying to find words from a text file in another text file

I built a simple graphical user interface (GUI) with basketball information to make it easier to find information about players. The GUI uses data scraped from various sources with the 'requests' library. It works well, but there is a problem: my code contains a hard-coded list of players, and this list must be compared against the scraped data for everything to work properly. This means that whenever I want to add or remove a name, I have to open my IDE and edit the code directly, and I want to change that. Storing the player names in an external text file would give me much-needed flexibility in managing them.

#This is how the players list looks in the code.
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis' ... #and many others]

#This is what the data in the scraped file looks like:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

#The rest of the code works well. This is the final part, where it uses the list to write out the players that were found in both files.

with open("freeze.csv",'r') as freeze:
    for word in basketball:
        if word in freeze:
            freeze.write(word)

# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate over it the same way

# I tried different solutions, but none of them worked for me

with open('final_G_league.csv') as text,  open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip,filter_words))
    txt = next(text).split()
    out = [word  for word in txt if word not in st]

# This one gives me the first line of the scraped text

import csv

file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')

data_read1= csv.reader(file1)
data_read2 = csv.reader(file2)

# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]

for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")

file1.close()
file2.close()

#This one returns a list with a bunch of names and a list index error.

file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')

list1 = file1.readlines()
list2 = file2.readlines()

for i in list1:
    for j in list2:
        if j in i:

# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file

Let's assume we have the following input files:

freeze_list.txt - comma-separated list of filter words (players) enclosed in quotes:

'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'

final_G_league.csv - scraped lines that we want to filter, using words from the freeze_list.txt file:

Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,

I would split the responsibilities of the script into code segments to make it more readable and manageable:

  1. Define constants (later you could make them parameters)
  2. Read filter words from a file
  3. Filter scraped lines
  4. Dump output to a file

The constants:

FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"

Read filter words from a file:

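import ast  # literal_eval only parses Python literals, so it cannot run code hidden in the file the way eval can
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = ast.literal_eval('(' + filter_words_file.read() + ')')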

Filter lines from the scraped file:

matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words for performance and
                # to avoid sending the same line multiple times to the output
                break

Dump filtered lines into a file:

with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)
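
The segments above can also be wrapped in functions so that the file names become parameters, as hinted in step 1. This is only a minimal sketch: `filter_lines` and `filter_file` are hypothetical names, not part of the original script, and the matching is the same plain substring test used above.

```python
def filter_lines(lines, filter_words):
    """Return the lines that contain at least one of the filter words."""
    # any() stops at the first matching word, like the break in the loop version
    return [line for line in lines if any(word in line for word in filter_words)]


def filter_file(filter_words, scraped_path, output_path):
    """Filter the scraped file and write the matching lines to output_path."""
    with open(scraped_path) as scraped_file:
        matched_lines = filter_lines(scraped_file, filter_words)
    with open(output_path, "w") as output_file:
        output_file.writelines(matched_lines)
    return matched_lines
```

With the constants defined earlier, a call would look like `filter_file(filter_words, SCRAPPED_FILE_NAME, FILTERED_FILE_NAME)`.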

The output freeze.csv after running the above segments in sequence is:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

Suggestion

I'm not sure why you chose to store the filter words as a comma-separated list. I would prefer a plain list, with one entry per line.

freeze_list.txt :

Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn
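
If the names already exist in the quoted, comma-separated form, they could be converted once instead of being retyped. The following is a rough sketch under that assumption; `quoted_list_to_lines` is a hypothetical helper, and it expects every name to be a valid Python string literal.

```python
import ast


def quoted_list_to_lines(text):
    """Convert "'Last, First', 'Last, First', ..." into one name per line."""
    stripped = text.strip().rstrip(",")
    if not stripped:
        return ""
    # The trailing comma makes the result a tuple even for a single name
    names = ast.literal_eval("(" + stripped + ",)")
    return "\n".join(names) + "\n"
```

The returned string can then be written to freeze_list.txt with an ordinary `open(..., "w")` call.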

The reading becomes straightforward:

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = [word.strip() for word in filter_words_file]

The output freeze.csv is the same:

"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,

If file2 is just a list of names and you want to extract the rows in the first file whose name column matches a name in the list, I suggest making the "freeze" file a text file with one name per line and removing the single quotes from the names. That makes it easier to parse.

You can then do something like this to match the names from one file against the other:

import csv

# convert the names data to a list
with open("freeze1.txt",'r') as file2:
  names = [s.strip() for s in file2]
  print("names:", names)

# next open league data and extract rows with matching names
with open("final_G_league.csv",'r') as file1:
  reader = csv.reader(file1)
  next(reader) # skip header
  for row in reader:
    if row[0] in names:
      # print the matching name
      print(row[0])
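
In the sample rows shown in the question, the first row of a team starts with the team name, so the player name is not always in the first column. A possible variation, assuming that layout, is to check every cell of the parsed row. `rows_with_known_names` is a hypothetical name used only for this sketch.

```python
import csv


def rows_with_known_names(lines, names):
    """Yield parsed CSV rows in which any cell exactly equals a known name."""
    wanted = set(names)  # set membership is faster than scanning a list
    for row in csv.reader(lines):
        if any(cell in wanted for cell in row):
            yield row
```

Because `csv.reader` accepts any iterable of strings, an open file object can be passed as `lines`. Whether a header line has to be skipped first depends on the actual file.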

If the names don't match exactly as they appear in the final_G_league file, you may need to adjust accordingly, for example by doing a case-insensitive match or by normalizing the names (last, first vs. first last).
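
One way such an adjustment might look is to reduce both sides to a common key before comparing. This is only a sketch: `normalize_name` is a hypothetical helper, and it assumes that in the "first last" form the final word is the last name, which will not hold for every player (suffixes such as "Jr." would need extra handling).

```python
def normalize_name(name):
    """Return a lowercase 'last, first' key for 'Last, First' or 'First Last'."""
    name = " ".join(name.split())  # collapse repeated whitespace
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
    else:
        # Assumption: the final word is the last name
        first, _, last = name.rpartition(" ")
    return f"{last}, {first}".casefold()
```

Both the names from the list and the names from the CSV rows would be passed through `normalize_name` before the membership test.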

