Extracting specific lines of text from a larger file using the csv module

Question

So I'm extracting the lines that I want from this larger file using this program:

import csv

name = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']
data = csv.reader(open('C:\\bigfile.csv'))

with open('C:\\smalldataset.xcl','w') as outf:
    csv.writer(outf).writerows(l for l in data if l[0] in name)

The program runs. However, I am only getting the line of data for NAMETHEFIRST, and no data for NAMETHEOTHERNAME is written to my small dataset file. For NAMETHEFIRST it works exactly as I want, writing all the relevant info from that line of the large data set, but I get nothing for the second name in the smaller file. Why isn't this working?

Answer 1

This is a list with one string:

['NAMETHEFIRST,' 'NAMEANOTHERNAME ']

This is a list with two strings:

['NAMETHEFIRST', 'NAMEANOTHERNAME ']

Note the placement of the comma.

Also note that your second string has a space at the end.
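A quick way to see both points is to check the lengths and a membership test. This is only a sketch (written for Python 3); the variable names `one_string`, `two_strings` and `clean_names` are made up for illustration.

```python
# Adjacent string literals are joined into one string, so this list has ONE element.
one_string = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']

# A comma between the literals makes TWO elements.
two_strings = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']

print(len(one_string), one_string)
print(len(two_strings), two_strings)

# The trailing space still prevents an exact match on the second name.
print('NAMEANOTHERNAME' in two_strings)

# Stripping whitespace from each entry gives a list that matches both names.
clean_names = [n.strip() for n in two_strings]
print('NAMEANOTHERNAME' in clean_names)
```

Whether stripping is appropriate depends on whether the names in the file really have no trailing space, which the question does not show.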

Answer 2

This line of code

name = ['NAMETHEFIRST,' 'NAMEANOTHERNAME ']

is equivalent to

name = ['NAMETHEFIRST,NAMEANOTHERNAME ']

because Python follows C in concatenating adjacent string constants at compile time.

You say """I am only getting the line of data from NAMETHEFIRST and I get no data from NAMETHEOTHERNAME written to my small dataset file""". However, the code that you show will NOT produce that result; it will select only lines that start with

"NAMETHEFIRST,NAMEANOTHERNAME ",

You will get the stated result only if that line is actually:

name = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']

and that is presumably because the second name in the file doesn't have a trailing space, unlike the string above.
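To illustrate the trailing-space effect, here is a minimal Python 3 sketch. The sample data is invented (the real contents of bigfile.csv are not shown in the question), and `select_rows` is a hypothetical helper, not something from the csv module.

```python
import csv
import io

# Hypothetical sample standing in for bigfile.csv; the real file layout is unknown.
sample = io.StringIO(
    "NAMETHEFIRST,1,2\n"
    "NAMEANOTHERNAME,3,4\n"
    "SOMEONEELSE,5,6\n"
)

def select_rows(rows, names):
    """Return rows whose first field is in names, ignoring surrounding whitespace."""
    wanted = {n.strip() for n in names}
    return [row for row in rows if row and row[0].strip() in wanted]

names = ['NAMETHEFIRST', 'NAMEANOTHERNAME ']  # note the trailing space
rows = list(csv.reader(sample))

# Exact comparison: the second name never matches because of the space.
exact = [row for row in rows if row and row[0] in names]

# Whitespace-tolerant comparison: both names match.
tolerant = select_rows(rows, names)

print(len(exact), len(tolerant))
```

With this made-up data the exact comparison keeps only the NAMETHEFIRST row, which is the symptom described in the question.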

Other problems:

csv.writer(outf).writerows(l for l in data if l[0] in name) is trying to be a bit too clever. If you break it down into bite-size chunks, you can much more easily use a debugger or just print statements to show you what is actually happening.

Try this:

print len(name), name
data = csv.reader(open('C:\\bigfile.csv', 'rb')) # ALWAYS open csv files in BINARY mode
with open('C:\\smalldataset.xcl','wb') as outf: # ALWAYS open csv files in BINARY mode
    writer = csv.writer(outf)
    for row_index, row in enumerate (data): # don't use 'l' as a variable name
        print row_index + 1, row
        if row[0] in name:
            writer.writerow(row)
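The code above is Python 2 (print statement, binary mode). On Python 3 the csv documentation recommends text mode with newline='' instead. A rough equivalent might look like the sketch below; `extract_rows` is a hypothetical helper, and the demo file and its contents are made up so the example can run anywhere.

```python
import csv
import os
import tempfile

def extract_rows(src_path, dst_path, names):
    """Copy rows whose first field is in names from src_path to dst_path."""
    wanted = set(names)
    kept = 0
    # Python 3: open csv files in text mode with newline='' rather than binary mode.
    with open(src_path, newline='') as inf, open(dst_path, 'w', newline='') as outf:
        writer = csv.writer(outf)
        for row_index, row in enumerate(csv.reader(inf), start=1):
            print(row_index, row)  # debugging aid, as in the loop above
            if row and row[0] in wanted:
                writer.writerow(row)
                kept += 1
    return kept

# Demo on a throwaway file; the contents are invented for illustration.
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'bigfile.csv')
dst = os.path.join(tmp_dir, 'smalldataset.csv')
with open(src, 'w', newline='') as f:
    csv.writer(f).writerows([
        ['NAMETHEFIRST', '1'],
        ['SOMEONEELSE', '2'],
        ['NAMEANOTHERNAME', '3'],
    ])

kept = extract_rows(src, dst, ['NAMETHEFIRST', 'NAMEANOTHERNAME'])
print(kept)
```

The `if row` guard skips blank lines, which would otherwise raise an IndexError on row[0].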

Answer 1
1 accepted 2010-07-20 17:13:23

Answer 2
1 2010-07-20 22:01:22
