
When trying to delete rows in a CSV file using a list as a reference of the rows you want to keep, it deletes everything instead

I'm trying to delete any rows in my CSV file where the value in the "country" column does not match my countryList.

So far it runs with no errors, but it deletes everything in my document.

    import csv
    countryList = ['Azerbaijan', 'Belarus', 'China', 'Estonia', 'Finland', 'Georgia', 'Kazakhstan', 'Latvia', 'Lithuania', 'Mongolia', 'North Korea', 'Norway', 'Poland', 'Ukraine', 'United States', 'Venezuala']

    file = "C:\\Capstone\\Data\\WIID_30JUN2022_Altered.csv"

    with open(file, "r") as inCountryName:
       csvReader = csv.reader(inCountryName)
       header = next(csvReader)    
       countryIndex = header.index("country")
       with open(file,"w") as outCountryName:
           writer = csv.writer(outCountryName)
           for row in csv.reader(inCountryName):
               name = row[countryIndex]
               for country in countryList:
                   if name!= countryList:
                       writer.writerow(row)

I made a few edits from the suggestions.

I see two issues leading to some problems:

  1. Your file is being wiped, I believe, because you open it for writing while it's still open for reading. Opening a file in "w" mode truncates it to zero length right away, before your loop has written anything. I'm on a Mac and I don't actually end up with an emptied file there, so I'm partly guessing at how Windows and Python interact here.

    In general, I recommend against trying to read and write in the same pass, and I definitely recommend against writing to the file you're reading from (if that's even valid?). If your code is buggy, you could wipe your input. It also makes debugging difficult when you cannot compare the before and after.

    Instead, read into an intermediate list, then write that list out to a separate file. Once you're satisfied with the output, move/rename the output file with the OS (using Python if you need) to whatever you want.

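  2. Your logic for filtering out countries is wrong: name != countryList compares a single string to the whole list, which is never equal, so the test is always true. This is what @juanpa.arrivillaga meant when they said, "use 'in'", as in: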

     if country_name not in countries: filtered_rows.append(row)
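To make the difference concrete, here is a small sketch with made-up sample names (they are not taken from the real file). Note that `not in` drops the listed countries; if the goal is to keep only the listed countries, as the question describes, the condition would presumably be `in` instead.

```python
countries = ["Azerbaijan", "Belarus", "United States"]
names = ["Belarus", "Kiribati", "United States"]  # made-up sample values

# A string never equals a list, so this test is True for every name.
always_true = [name != countries for name in names]

# Membership tests compare the name against each item in the list.
kept_if_listed = [name for name in names if name in countries]
kept_if_unlisted = [name for name in names if name not in countries]

print(always_true)       # [True, True, True]
print(kept_if_listed)    # ['Belarus', 'United States']
print(kept_if_unlisted)  # ['Kiribati']
```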

I mocked up this small sample input CSV:

country,capital
Belarus,Minsk
Kiribati,South Tarawa 
Marshall Islands,Majuro
United States,Washington D.C.

My code looks pretty different from what you posted: I took the liberty of making variables more Pythonic; and when opening files for use with csv's reader and writer, we need to specify newline="" to avoid mangling valid newlines in the CSV file:

import csv

countries = ["Azerbaijan", "Belarus", "United States"]

filtered_rows = []
with open("input.csv", "r", newline="") as f:
    reader = csv.reader(f)

    header = next(reader)
    filtered_rows.append(header)  # keep header for output

    country_idx = header.index("country")
    for row in reader:
        country_name = row[country_idx]

        if country_name not in countries:
            filtered_rows.append(row)


with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(filtered_rows)
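As a side note on newline="", the following sketch shows one case where it matters: a quoted field with an embedded Windows-style line break. The scratch file name and the row contents are invented for the demonstration.

```python
import csv
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.csv")

# One row whose first field contains an embedded "\r\n" line break.
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["line one\r\nline two", "x"])

with open(path, "r", newline="") as f:
    with_newline_arg = next(csv.reader(f))

# Default newline handling translates "\r\n" to "\n" before csv sees it.
with open(path, "r") as f:
    without_newline_arg = next(csv.reader(f))

print(repr(with_newline_arg[0]))     # 'line one\r\nline two'
print(repr(without_newline_arg[0]))  # 'line one\nline two'
```

With newline="" the field comes back exactly as written; without it, the embedded line break is silently changed.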

Given that you know the "country" column by name, you might like using csv's DictReader and DictWriter; you can avoid explicitly getting the header and looking for the index of the column:

filtered_rows = []
with open("input.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    for row in reader:
        if row["country"] not in countries:
            filtered_rows.append(row)


print(filtered_rows)  # a list of dicts, keyed to your column names
# [
#     {'country': 'Kiribati', 'capital': 'South Tarawa '},
#     {'country': 'Marshall Islands', 'capital': 'Majuro'}
# ]


with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=filtered_rows[0])
    writer.writeheader()
    writer.writerows(filtered_rows)

You need to pass some kind of iterable to fieldnames=, and the first row (a dict) of the filtered rows works fine for that.
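One caveat with filtered_rows[0]: if the filter removes every row, the list is empty and indexing it raises an IndexError. A possible alternative, sketched here, is to take the column names from `reader.fieldnames`, which DictReader fills in from the header row. The in-memory `io.StringIO` objects stand in for the real files and are only for illustration.

```python
import csv
import io

countries = ["Azerbaijan", "Belarus", "United States"]

# In-memory stand-in for input.csv (illustrative data only).
source = io.StringIO(
    "country,capital\r\nBelarus,Minsk\r\nUnited States,Washington D.C.\r\n",
    newline="",
)

reader = csv.DictReader(source)
fieldnames = reader.fieldnames  # reads the header row: ['country', 'capital']
filtered_rows = [row for row in reader if row["country"] not in countries]

# filtered_rows is empty here, but the header can still be written.
target = io.StringIO(newline="")
writer = csv.DictWriter(target, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(filtered_rows)

print(repr(target.getvalue()))  # 'country,capital\r\n'
```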

Going back to Issue #1, and my recommendation to not try and do too much in a loop, I've even written CSV processors like this before:

countries = ["Azerbaijan", "Belarus", "United States"]

all_rows = []
with open("input.csv", "r", newline="") as f:
    reader = csv.DictReader(f)
    all_rows = list(reader)

filtered_rows = []
for row in all_rows:
    if row["country"] not in countries:
        filtered_rows.append(row)

row1 = filtered_rows[0]
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=row1)
    writer.writeheader()
    writer.writerows(filtered_rows)

I do it this way so I can be very clear with myself about when I'm modifying the data, as opposed to just reading it in or writing it out: it's exceptionally clear what's going on. If that helps. Good luck.
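For the move/rename step from point 1, a minimal sketch using `os.replace` from the standard library might look like this. The file names and the scratch directory are placeholders; the sketch assumes the output has already been checked and that both files live on the same filesystem.

```python
import os
import tempfile

# Placeholder files in a scratch directory, standing in for the real paths.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "input.csv")
filtered = os.path.join(workdir, "output.csv")

with open(original, "w", newline="") as f:
    f.write("country,capital\r\nBelarus,Minsk\r\nKiribati,South Tarawa\r\n")
with open(filtered, "w", newline="") as f:
    f.write("country,capital\r\nKiribati,South Tarawa\r\n")

# Overwrite the original with the filtered file in a single step.
os.replace(filtered, original)

with open(original, "r", newline="") as f:
    final_text = f.read()

print(os.path.exists(filtered))  # False
print(repr(final_text))          # 'country,capital\r\nKiribati,South Tarawa\r\n'
```

Keeping a copy of the original somewhere else before this step is a cheap safeguard, since the replacement cannot be undone.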
