Python: General CSV file parsing and manipulation
The purpose of my Python script is to compare the data present in multiple CSV files, looking for discrepancies. The data are ordered, but the ordering differs between files. The files contain about 70K lines and weigh around 15MB. Nothing fancy or hardcore here. Here's part of the code:
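import csv

def getCSV(fpath):
    allRows = []
    # Python 3: open in text mode with newline="" as the csv module expects
    with open(fpath, newline="") as f:
        csvfile = csv.reader(f)
        for row in csvfile:
            allRows.append(row)
    # Transpose the rows so that each inner list holds one column
    allCols = list(map(list, zip(*allRows)))
    return allRows, allCols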
I'm using csv.reader, but would I benefit from using csv.DictReader?
This should work; you don't need to make another list to have access to the columns.
import csv
import sys

def getCSV(fpath):
    with open(fpath) as ifile:
        csvfile = csv.reader(ifile)
        rows = list(csvfile)
    # Keep only the rows whose element at index 20 equals 'value'
    value_20 = [x for x in rows if x[20] == 'value']
    return value_20
Are you sure you want to be keeping all rows around? This creates a list with matching values only. fname could also come from glob.glob() or os.listdir() or whatever other data source you choose. Just to note, you mention the 20th column, but row[20] will be the 21st column, because indexing starts at 0.
import csv

matching20 = []
for fname in ('file1.csv', 'file2.csv', 'file3.csv'):
    with open(fname) as fin:
        csvin = csv.reader(fin)
        next(csvin)  # <--- if you want to skip header row
        for row in csvin:
            if row[20] == 'value':
                matching20.append(row)  # or do something with it here
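To make the two remarks above concrete (file names coming from glob.glob(), and the 20th column living at index 19), here is a rough sketch. The helper name iter_matching, its parameters, and the temporary demo files are all made up for illustration; it also skips rows that are too short, which is an assumption about how you would want ragged data handled.

```python
import csv
import glob
import os
import tempfile

def iter_matching(pattern, column_number, wanted, skip_header=True):
    """Yield (filename, row) for rows whose 1-based column equals wanted."""
    index = column_number - 1  # e.g. the 20th column lives at row[19]
    for fname in sorted(glob.glob(pattern)):
        with open(fname, newline="") as fin:
            reader = csv.reader(fin)
            if skip_header:
                next(reader, None)  # the default avoids StopIteration on empty files
            for row in reader:
                # Rows that are too short are skipped instead of raising IndexError
                if len(row) > index and row[index] == wanted:
                    yield fname, row

# Small demonstration with two temporary files (made-up contents)
tmpdir = tempfile.mkdtemp()
for name, lines in (("a.csv", ["h1,h2", "x,value", "y,other"]),
                    ("b.csv", ["h1,h2", "z,value"])):
    with open(os.path.join(tmpdir, name), "w", newline="") as fout:
        fout.write("\n".join(lines) + "\n")

found = [(os.path.basename(f), row)
         for f, row in iter_matching(os.path.join(tmpdir, "*.csv"), 2, "value")]
print(found)
```

Because it is a generator, only one row is held in memory at a time, which fits the point about not keeping all rows around.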
You only want csv.DictReader if you have a header row and want to access your columns by name.
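For comparison, this is roughly what access by name looks like with csv.DictReader. The sample data and the column names (id, name, status) are invented for the example; your files will have their own header.

```python
import csv
import io

# Hypothetical sample data; the column names are made up for illustration.
sample = io.StringIO(
    "id,name,status\n"
    "1,alpha,ok\n"
    "2,beta,value\n"
    "3,gamma,value\n"
)

reader = csv.DictReader(sample)  # the first row supplies the dictionary keys
matching = [row for row in reader if row["status"] == "value"]
matching_ids = [row["id"] for row in matching]
print(matching_ids)
```

Each row is a mapping from header name to string, so the code no longer depends on the column's position, only on its name.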
If I understand the question correctly, you want to include a row if value is in the row, but you don't know which column value is in, correct?
If your rows are lists, then this should work:
testlist = [row for row in allRows if 'value' in row]
post-edit:
If, as you say, you want a list of rows where value is in a specified column (specified by an integer pos), then:
testlist = []
pos = 20  # zero-based index, so this is the 21st column
for row in allRows:
    # Keep the row only if the element at index pos equals 'value'
    if row[pos] == 'value':
        testlist.append(row)
(I haven't tested this, but let me know if that works.)
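Coming back to the goal stated in the question, finding discrepancies between files whose rows are ordered differently, one possible approach is to compare row counts rather than row positions. This is only a sketch under the assumption that whole rows should match exactly; the function names row_counts and discrepancies and the sample data are invented here.

```python
import csv
import io
from collections import Counter

def row_counts(fileobj):
    """Count each distinct row; rows become tuples because lists are not hashable."""
    return Counter(tuple(row) for row in csv.reader(fileobj))

def discrepancies(left, right):
    """Return the rows that appear more often in left, and those more often in right."""
    a, b = row_counts(left), row_counts(right)
    # Counter subtraction keeps only positive counts, so order never matters
    return sorted((a - b).elements()), sorted((b - a).elements())

# Made-up data: same rows in a different order, plus one differing row each
first = io.StringIO("1,a\n2,b\n3,c\n")
second = io.StringIO("3,c\n1,a\n2,x\n")

only_first, only_second = discrepancies(first, second)
print(only_first, only_second)
```

With real files you would pass open file objects instead of io.StringIO. A Counter also catches duplicated rows, which a plain set comparison would miss.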