How to read a specific row of a csv file in python?
I have searched like crazy trying to find specifically how to read a row in a csv file. I need to read a random row out of 1000, each of which has 3 columns. The first column has an email. I need to put in a random email, and get columns 2 and 3 out. (Python 2.7, csv file)
Example:
Name Date Color
Ray May Gray
Alex Apr Green
Ann Jun Blue
Kev Mar Gold
Rob May Black
Instead of column 1 row 3, I need [Ann], her whole row. This is a CSV file, with over 1000 names. I have to put in her name and output her whole row.
What I have tried
from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = file_location = "C:/Users/abriman/Desktop/Book.csv"
for row in spreadsheet:
    entry = Entry(*tuple(row))
ss_dict['Ann']
And my error reads
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: __new__() takes exactly 4 arguments (2 given)
I have tried other ways too and got little to no result. I'm a beginner at python.
You're on the right track. First issue: you're never opening the file located at file_location. Thus, when you write the loop for row in spreadsheet: you're iterating over the characters of spreadsheet, which are the characters of file_location, which are the characters of "C:/Users/...". So the first thing you want to do is actually open the file:
spreadsheet = open(file_location, 'r')
You still have another issue in your loop. When you iterate over a file in a for loop, you get back the lines of the file. So, at each iteration, row will be a line, e.g. "Ray May Gray". When you call tuple() on that, you're going to get a tuple that looks like ('R', 'a', 'y', ' ', ' ', 'M', ...). What you need to do is construct your tuple by splitting on whitespace (this matches the whitespace-separated sample shown; for a comma-separated file you would use row.strip().split(',') instead):
entry = Entry(*row.split())
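To see the difference concretely, here is a minimal sketch (written for Python 3, whereas the question uses 2.7) that applies both calls to the sample line as a string literal instead of a real file. The variable names as_chars and as_fields are made up for illustration.

```python
# One line as it would come out of the file, including the trailing newline.
row = "Ray May Gray\n"

# tuple() walks the string character by character.
as_chars = tuple(row)

# split() with no arguments splits on runs of whitespace and drops the newline.
as_fields = row.split()

print(as_chars[:5])
print(as_fields)
```

Only the second form yields exactly three items, which is what the three-field Entry needs.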
Then, you need to add your entry to the dictionary ss_dict:
ss_dict[entry.Name] = entry
Finally, you can read out the value of ss_dict['Ann'], but this should be outside your loop - if you do it inside your loop, you may be trying to read the value of ss_dict['Ann'] before it has been set. All in all, your code should look like this:
from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = open(file_location, 'r') # <--
for row in spreadsheet:
    entry = Entry(*row.split()) # <--
    ss_dict[entry.Name] = entry # <--
print ss_dict['Ann']
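For readers on Python 3, the same fix might be wrapped in a function along these lines. This is only a sketch: the name load_entries is hypothetical, the with block and the skipping of malformed lines are additions not present in the answer, and it still assumes whitespace-separated fields as in the sample.

```python
from collections import namedtuple

Entry = namedtuple('Entry', 'Name, Date, Color')

def load_entries(path):
    """Return a dict mapping each Name to its Entry (hypothetical helper)."""
    entries = {}
    with open(path, 'r') as spreadsheet:   # the file is closed when the block ends
        for row in spreadsheet:
            fields = row.split()
            if len(fields) != 3:           # skip blank or malformed lines
                continue
            entry = Entry(*fields)
            entries[entry.Name] = entry
    return entries
```

A lookup is then load_entries(file_location)['Ann']. Note that the header line is stored too, under the key 'Name'.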
Incidentally, the reason you're getting your error message there is that when you loop with for row in spreadsheet: while spreadsheet is a string, row is just a character, as I mentioned, and so tuple(row) is just a tuple containing one character, and hence is of length 1, so that you're only passing one argument rather than three when you do *tuple(row). The message says "4 arguments (2 given)" because Python also counts the implicit class argument that __new__ receives: the class plus one character makes 2 of the expected 4.
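The failure can be reproduced without any file at all. The sketch below assumes Python 3, where the wording of the TypeError differs from the 2.7 traceback in the question, so the message is printed rather than quoted. The names first and raised are made up for illustration.

```python
from collections import namedtuple

Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"

# Iterating over a string yields its characters, so the first "row" is 'C'.
first = next(iter(file_location))

try:
    Entry(*tuple(first))      # one positional argument instead of three
    raised = False
except TypeError as exc:
    raised = True
    print(exc)
```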
All that said, you might want to consider looking at the csv module, which is part of the standard library, and is precisely designed for reading csv files. It will probably make your life easier in the long run.
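As a rough idea of what that could look like, here is a sketch of a lookup by the first column using csv.reader. It assumes Python 3 and a genuinely comma-separated file. The function name find_row is hypothetical.

```python
import csv

def find_row(path, name):
    """Return the first row whose first column equals name, else None."""
    # newline='' is what the csv documentation recommends when opening files.
    with open(path, newline='') as f:
        for row in csv.reader(f):
            if row and row[0] == name:   # "row and" guards against blank lines
                return row
    return None
```

Calling find_row(file_location, 'Ann') would return the whole row as a list of strings, and it stops reading as soon as the match is found.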
I think what you need is enumerate
def read_csv_line(line_number, filename):
    # line_number is 1-based; returns None if the file has fewer lines
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == (line_number - 1):
                return line
    return None
Then you can feed your random number and filename to get a random line.
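One way to wire in the random number is sketched below. The block repeats read_csv_line, with the filename parameter actually passed to open, so that it runs on its own. The wrapper read_random_line and its total_lines parameter are hypothetical, and the caller is assumed to know how many lines the file has.

```python
import random

def read_csv_line(line_number, filename):
    """Return the 1-based line_number-th line of filename, or None."""
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == line_number - 1:
                return line
    return None

def read_random_line(filename, total_lines):
    """Pick a line number uniformly from 1..total_lines and return that line."""
    # random.randint includes both endpoints.
    return read_csv_line(random.randint(1, total_lines), filename)
```

If the first line is a header, drawing from random.randint(2, total_lines) instead would leave it out.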
The solution to your problem could be a simple dictionary comprehension:
>>> from collections import namedtuple
>>> Entry = namedtuple('Entry', 'Name, Date, Color')
>>> [l for l in open('t.tsv', 'r')]
<<<
['Name Date Color\n',
'Ray May Gray\n',
'Alex Apr Green\n',
'Ann Jun Blue\n',
'Kev Mar Gold\n',
'Rob May Black\n']
>>> [l.split() for l in open('t.tsv', 'r')]
<<<
[['Name', 'Date', 'Color'],
['Ray', 'May', 'Gray'],
['Alex', 'Apr', 'Green'],
['Ann', 'Jun', 'Blue'],
['Kev', 'Mar', 'Gold'],
['Rob', 'May', 'Black']]
>>> [Entry(*l.split()) for l in open('t.tsv', 'r')]
<<<
[Entry(Name='Name', Date='Date', Color='Color'),
Entry(Name='Ray', Date='May', Color='Gray'),
Entry(Name='Alex', Date='Apr', Color='Green'),
Entry(Name='Ann', Date='Jun', Color='Blue'),
Entry(Name='Kev', Date='Mar', Color='Gold'),
Entry(Name='Rob', Date='May', Color='Black')]
>>> {'fooo':e for e in Entry(*l.split()) for l in open('t.tsv', 'r')}
>>> {e.Name:e for e in list(Entry(*l.split()) for l in open('t.tsv', 'r'))}
<<<
{'Alex': Entry(Name='Alex', Date='Apr', Color='Green'),
'Ann': Entry(Name='Ann', Date='Jun', Color='Blue'),
'Kev': Entry(Name='Kev', Date='Mar', Color='Gold'),
'Name': Entry(Name='Name', Date='Date', Color='Color'),
'Ray': Entry(Name='Ray', Date='May', Color='Gray'),
'Rob': Entry(Name='Rob', Date='May', Color='Black')}
I think you are thinking of reading the first row as header names. Python has DictReader for that: https://docs.python.org/2/library/csv.html#csv.DictReader
>>> import csv
>>> for line in csv.DictReader(open('t.tsv')): print line # don't forget to make your file comma-separated.
{'Date': 'May', 'Color': 'Gray', 'Name': 'Ray'}
{'Date': 'Apr', 'Color': 'Green', 'Name': 'Alex'}
{'Date': 'Jun', 'Color': 'Blue', 'Name': 'Ann'}
{'Date': 'Mar', 'Color': 'Gold', 'Name': 'Kev'}
{'Date': 'May', 'Color': 'Black', 'Name': 'Rob'}
or with dictionary comprehension:
>>> { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
<<<
{'Alex': {'Color': 'Green', 'Date': 'Apr', 'Name': 'Alex'},
'Ann': {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'},
'Kev': {'Color': 'Gold', 'Date': 'Mar', 'Name': 'Kev'},
'Ray': {'Color': 'Gray', 'Date': 'May', 'Name': 'Ray'},
'Rob': {'Color': 'Black', 'Date': 'May', 'Name': 'Rob'}}
>>> rows_by_name = { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
>>> rows_by_name['Ann']
<<< {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'}
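The file name t.tsv suggests tab-separated data, although the thread never confirms the delimiter. Assuming tabs, DictReader can be told about them through its delimiter argument instead of converting the file. The sketch below (Python 3) uses an in-memory string in place of the file, and the sample data is shortened for illustration.

```python
import csv
import io

# Stand-in for a tab-separated file; the first line supplies the field names.
data = "Name\tDate\tColor\nRay\tMay\tGray\nAnn\tJun\tBlue\n"

reader = csv.DictReader(io.StringIO(data), delimiter='\t')
rows_by_name = {row['Name']: row for row in reader}

print(rows_by_name['Ann']['Color'])
```

Because DictReader consumes the header line itself, no 'Name' entry ends up in the dictionary.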
If you want random samples, I suggest first reading the rows into a list and then making a selection with the random module. Or... let's do it with Entry:
>>> rows = list(Entry(*l.split()) for l in open('t.tsv', 'r'))
>>> import random
>>> random.sample(rows, 1)
<<< [Entry(Name='Ray', Date='May', Color='Gray')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ray', Date='May', Color='Gray'),
Entry(Name='Kev', Date='Mar', Color='Gold'),
Entry(Name='Ann', Date='Jun', Color='Blue')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ann', Date='Jun', Color='Blue'),
Entry(Name='Rob', Date='May', Color='Black'),
Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Rob', Date='May', Color='Black'),
Entry(Name='Ann', Date='Jun', Color='Blue'),
Entry(Name='Kev', Date='Mar', Color='Gold')]
But beware: reading every row into a list can use a lot of memory if the file is large.
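If memory is a concern, one alternative (not part of the answer above, and named plainly here as reservoir sampling with a reservoir of size one) is to pick the random row in a single pass while keeping only one line in memory. A minimal sketch, assuming Python 3. The function name random_line and the skip_header flag are hypothetical.

```python
import random

def random_line(path, skip_header=True):
    """Return one uniformly chosen line from path, or None if there is none."""
    chosen = None
    with open(path) as f:
        if skip_header:
            next(f, None)                  # discard the header line if present
        for n, line in enumerate(f, start=1):
            # Keep the n-th line with probability 1/n; after the loop every
            # line has had the same overall chance of being the survivor.
            if random.randrange(n) == 0:
                chosen = line
    return chosen
```

The returned line can then be split into fields in the same way as in the earlier examples.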