I have two CSV files, CSV1
and CSV2
, with multiple columns and rows. One of the columns is titled ID
.
What I want to do is check CSV1
's rows and if there is an ID
in CSV2
's ID
column that matches CSV1
's rows ID
, then I want to keep that row. If there's no match, then I want to delete the row from CSV1
.
Basically the data in CSV1
is relevant to me, but only the people that are in CSV2
. The unique connection between the two is the ID
column. So I need to check all the rows in CSV1
to see if that Row ID
is in a row in CSV2
.
Here's what I have so far.
import csv
smarteeCSV = open("Smartee.csv", "r")
aeriesCSV = open("aeriesEditable.csv", "r+")
aeries = csv.reader(aeriesCSV, delimiter=',')##CSV1
smartee = csv.reader(smarteeCSV, delimiter=',')##CSV2
for row in aeries:
for item in smartee
if row[1] != item[1]##indexes for the columns with the ids
I can already tell I'm not on the right track, so can anyone assist?
You could extract all of the ID's in the second file and look those up every time you check one of the rows of the first file.
For example:
# extract ID column from CSV file 2 into a set
Ids = { row[1] for row in smartee }
# pick only rows whose ID is in Ids
filtered_rows = [item for item in aeries if item[1] in Ids]
Depending on what you plan on doing with the relevant rows of data, you might be able to use Python's built-in filter()
function to do what you need:
import csv
# first get the ids
with open('Smartee.csv', 'rb') as smarteeCSV: # CSV2
ids = set(row['ID'] for row in csv.DictReader(smarteeCSV, delimiter=','))
with open('aeriesEditable.csv', 'rb') as aeriesCSV: # CSV1
relevant = filter(lambda row: if row['ID'] in ids,
csv.DictReader(aeriesCSV, delimiter=','))
# relevant will be a list containing the desired rows from CSV1
If you want to iteratively process the rows, for the second part you could use a for
loop over the results of calling the itertools.ifilter()
function similarly instead.
First, read CSV2 to make just a set of IDs:
with open(CSV2) as f:
r = csv.DictReader(f)
theids = set(row['ID'] for row in r)
Then, while reading CSV1, just check if the ID is in the set:
with open(CSV1) as f, open(CSV1 + '.new', 'w') as out:
r = csv.DictReader(r)
w = csv.DictWriter(out, r.fieldnames)
for row in r:
if row['ID'] in theids:
w.writerow(row)
This assumes the CSV files are suitable for dict-based read/write (ie first row is a list of column names), but is easy to tweak if the col names come from other information too.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.