Extract rows and filenames from multiple csv files
I have multiple csv files with the date as filename (20080101.csv to 20111031.csv) in a folder. The csv files have common headers. Each csv file looks like this:
20080101.csv
X ;Y; Z
1 ; 1 ; 3
1 ; 2 ; 6
1 ; 3 ; 24
2 ; 1 ; 24
2 ; 2 ; 24
20080102.csv
X ;Y; Z
1 ; 1 ; 0.1
1 ; 2 ; 2
1 ; 3 ; 67
2 ; 1 ; 24
2 ; 2 ; 24
20080103.csv
X ;Y; Z
1 ; 1 ; 3
1 ; 3 ; 24
2 ; 1 ; 24
2 ; 2 ; 24
20080104.csv
X ;Y; Z
1 ; 1 ; 34
1 ; 2 ; 23
1 ; 3 ; 67
2 ; 1 ; 24
2 ; 2 ; 24
… and so on. I want to write a script that reads the rows and, if a given row has X=1 and Y=2, copies the whole row to a new csv file along with the filename, giving the following output:
X ;Y ; Z ; filename
1 ; 2 ; 6 ; 20080101
1 ; 2 ; 2 ; 20080102
1 ; 2 ; NA; 20080103
1 ; 2 ; 23; 20080104
Any ideas on how this can be done, suggestions about modules I should look into, or examples would be welcome. Thanks for your time and help.
Cheers, Navin
This is a well-formed question, from which the logic should be apparent. For someone to provide finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do:
1) loop over the files (keeping track of each filename as it's opened)
2) read lines from the current file
3) if the selection criteria (x==1 and y==2) are met, then write the line.
To get started, try:
import csv, os
for fn in os.listdir():
    if fn.endswith(".csv"):  # only files whose name ends in .csv
        with open(fn, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=";")
            for row in reader:
                ...
Then extend the solution to open the output file and write the selected lines using csv.writer.
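One way the extension described above might look is sketched here. The function name collect_rows and its parameters folder and out_path are hypothetical, chosen only for illustration. The sketch assumes every input file starts with a header line, and it does not write NA rows for files without a match.

```python
import csv
import os


def collect_rows(folder, out_path, x="1", y="2"):
    """Copy rows with X == x and Y == y from every csv in folder to out_path."""
    out_name = os.path.basename(out_path)
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=";")
        writer.writerow(["X", "Y", "Z", "filename"])
        for fn in sorted(os.listdir(folder)):
            # Skip non-csv files and the output file itself.
            if not fn.endswith(".csv") or fn == out_name:
                continue
            with open(os.path.join(folder, fn), newline="") as f:
                reader = csv.reader(f, delimiter=";")
                next(reader, None)  # assumed header line, skipped
                for row in reader:
                    fields = [cell.strip() for cell in row]
                    if len(fields) >= 3 and fields[0] == x and fields[1] == y:
                        # Append the filename without its extension.
                        writer.writerow(fields[:3] + [os.path.splitext(fn)[0]])
```

Calling collect_rows(".", "output.csv") would process the current folder. The values are compared as stripped strings, so no numeric conversion is needed.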
You could read in one file at a time, line by line:
files = ['20080101.csv', '20080102.csv', '20080103.csv']  # ...etc
# Open the output once instead of reopening it for every matching row.
with open('output.csv', 'a') as fout:
    for f in files:
        with open(f, 'r') as file:
            for line in file:
                ray = line.split(';')
                # Skip short lines, then test for X == 1 and Y == 2.
                if len(ray) >= 3 and ray[0].strip() == '1' and ray[1].strip() == '2':
                    fout.write(ray[0].strip() + ' ; ' + ray[1].strip() + ' ; ' + ray[2].strip() + ' ; ' + f + '\n')
Tested and works. May need some slight modifications.
This should do the job:
import glob
import os
outfile = open('output.csv', 'w')
outfile.write('X ; Y ; Z ; filename\n')
for filename in sorted(glob.glob('*.csv')):  # sorted, so rows come out in date order
    if filename == 'output.csv':  # Skip the file we're writing.
        continue
    name = os.path.splitext(filename)[0]  # Drop '.csv' to match the requested output.
    with open(filename, 'r') as infile:
        count = 0
        lineno = 0
        for line in infile:
            lineno += 1
            if lineno == 1:  # Skip the header line.
                continue
            fields = line.split(';')
            x = int(fields[0])
            y = int(fields[1])
            z = float(fields[2])
            if x == 1 and y == 2:
                outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, name))
                count += 1
        if count == 0:  # Handle the case when no lines were found.
            outfile.write('1 ; 2 ; NA ; %s\n' % name)
outfile.close()
Note that if you can't control or trust the file format you may want to handle exceptions thrown by the conversions to int/float.
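A minimal sketch of that exception handling follows. The helper name parse_row is hypothetical, and returning None for a bad line is just one possible policy; logging or raising a more descriptive error would work too.

```python
def parse_row(line):
    """Return (x, y, z) parsed from a ';'-separated line, or None if malformed."""
    fields = line.split(";")
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except (ValueError, IndexError):
        # ValueError: a field is not numeric (for example a header or blank line).
        # IndexError: the line has fewer than three fields.
        return None
```

The caller can then skip any line for which parse_row returns None. With this in place the header line is rejected like any other non-numeric line, so it needs no special case.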
If you know that you have one file for each day, with no missing days, then I'd use glob('*.csv') to get the list of file names, open them one by one, then read like Tyler is doing.
If you know that there are days where the file is missing, I'd use datetime to start with datetime.date(2008,1,1) and loop, incrementing by one day. Then for each day I compose the file name using .strftime() + '.csv', and try to process the file (if there is no file, just write a record with NA).
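The date-driven loop could be sketched roughly as below. The names daily_filenames and find_z are hypothetical, and the "%Y%m%d" format is an assumption based on the filenames shown in the question.

```python
import datetime
import os


def daily_filenames(start, end):
    """Yield 'YYYYMMDD.csv' for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime("%Y%m%d") + ".csv"
        day += datetime.timedelta(days=1)


def find_z(path, x="1", y="2"):
    """Return the Z value of the first row matching x and y, or 'NA'."""
    if not os.path.exists(path):
        return "NA"  # no file for this day
    with open(path) as f:
        for line in f:
            fields = [s.strip() for s in line.split(";")]
            if len(fields) >= 3 and fields[0] == x and fields[1] == y:
                return fields[2]
    return "NA"  # the file exists but has no matching row
```

Looping over daily_filenames(datetime.date(2008, 1, 1), datetime.date(2011, 10, 31)) and writing find_z(name) together with name[:-4] for each name would give one output row per calendar day, whether or not that day's file exists.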
The following should work:
import csv
with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file) as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                outfile.write(fmt % ('NA', file[:-4]))