How to read a specific row of a CSV file in Python?

I have searched like crazy trying to find out how to read a specific row in a CSV file.

I need to read a random row out of 1000, each of which has 3 columns. The first column has an email. I need to put in a random email and get columns 2 and 3 out. (Python 2.7, csv file)

Example:

Name Date  Color
Ray  May   Gray
Alex Apr   Green
Ann  Jun   Blue
Kev  Mar   Gold
Rob  May   Black

Instead of column 1, row 3, I need [Ann], her whole row. This is a CSV file with over 1000 names. I have to put in her name and output her whole row.


What I have tried

from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = file_location = "C:/Users/abriman/Desktop/Book.csv"
for row in spreadsheet:
    entry = Entry(*tuple(row))
    ss_dict['Ann']

And my error reads

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: __new__() takes exactly 4 arguments (2 given)

I have tried other ways too and got little to no result. I'm a beginner at Python.

You're on the right track. First issue: you never open the file located at file_location. Thus, when you iterate with for row in spreadsheet:, you're iterating over the characters of spreadsheet, which are the characters of file_location, which are the characters of "C:/Users/...". So the first thing you want to do is actually open the file:

spreadsheet = open(file_location, 'r')

You still have another issue in your loop. When you iterate over a file in a for loop, you get back the lines of the file. So, at each iteration, row will be a line, e.g. "Ray  May   Gray". When you call tuple() on that, you're going to get a tuple that looks like ('R', 'a', 'y', ' ', ' ', 'M', ...). What you need to do is construct your tuple by splitting on whitespace:

entry = Entry(*row.split())

Then, you need to add your entry to the dictionary ss_dict:

ss_dict[entry.Name] = entry

Finally, you can read out the value of ss_dict['Ann'], but this should be outside your loop. If you do it inside your loop, you may be trying to read the value of ss_dict['Ann'] before it has been set. All in all, your code should look like this:

from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = open(file_location, 'r') # <--
for row in spreadsheet:
    entry = Entry(*row.split()) # <--
    ss_dict[entry.Name] = entry # <--
print ss_dict['Ann']
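To try the corrected loop without a file on disk, here is a minimal sketch that accepts any iterable of lines. The names build_index and SAMPLE_LINES are made up for illustration, and the sample data is copied from the question. Unlike the code above, it also skips the header row and blank lines, and it uses print() so it runs on Python 3 as well as 2.7.

```python
from collections import namedtuple

Entry = namedtuple('Entry', 'Name, Date, Color')

# Sample data copied from the question; in real use this would be the open file.
SAMPLE_LINES = [
    "Name Date  Color",
    "Ray  May   Gray",
    "Alex Apr   Green",
    "Ann  Jun   Blue",
    "Kev  Mar   Gold",
    "Rob  May   Black",
]

def build_index(lines):
    """Map each name to its Entry, skipping the header and blank lines."""
    index = {}
    for line_number, line in enumerate(lines):
        fields = line.split()
        if line_number == 0 or not fields:
            continue  # header row or empty line
        entry = Entry(*fields)
        index[entry.Name] = entry
    return index

ss_dict = build_index(SAMPLE_LINES)
print(ss_dict['Ann'])      # the whole row for Ann
print(ss_dict.get('Zoe'))  # None rather than a KeyError for an unknown name
```

With a real file, passing the open file object to build_index should behave the same way, assuming every data line has exactly three whitespace-separated fields.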

Incidentally, the reason you're getting that error message is that when you do for row in spreadsheet: with spreadsheet being a string, row is just a character, as I mentioned. So tuple(row) is a tuple containing one character, hence of length 1, and *tuple(row) passes only one argument rather than three. (The message says 4 and 2 rather than 3 and 1 because the counts include the class itself, which __new__() receives as its implicit first argument.)
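The failure can be reproduced in a few lines. This is only an illustrative sketch: the path "Book.csv" is a placeholder and no file is opened, which is exactly the point. The wording of the TypeError differs between Python versions, so the sketch only checks that one is raised.

```python
from collections import namedtuple

Entry = namedtuple('Entry', 'Name, Date, Color')

path = "Book.csv"  # placeholder path; it is never opened here

# Iterating over a string yields its characters one at a time.
first = next(iter(path))
print(repr(first))   # 'B'
print(tuple(first))  # ('B',)

try:
    Entry(*tuple(first))  # one positional argument where three are required
    failed = False
except TypeError:
    failed = True

print(failed)  # True
```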


All that said, you might want to consider looking at the csv module, which is part of the standard library and is designed precisely for reading CSV files. It will probably make your life easier in the long run.

I think what you need is enumerate:
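As a rough sketch of what the csv module route could look like: the helper find_row and the comma-separated CSV_TEXT below are assumptions for illustration (the question does not show the real file's delimiter), and io.StringIO stands in for an open file. The sketch targets Python 3.

```python
import csv
import io

# Comma-separated version of the question's data (an assumption; the
# original file's exact delimiter is not shown).
CSV_TEXT = """Name,Date,Color
Ray,May,Gray
Alex,Apr,Green
Ann,Jun,Blue
"""

def find_row(fileobj, name):
    """Return the first row whose first column equals name, else None."""
    reader = csv.reader(fileobj)
    next(reader, None)  # skip the header row
    for row in reader:
        if row and row[0] == name:
            return row
    return None

ann = find_row(io.StringIO(CSV_TEXT), 'Ann')
print(ann)  # ['Ann', 'Jun', 'Blue']
```

With a real file on Python 3 you would pass open(file_location, newline='') instead of the StringIO object; on Python 2.7 the csv module expects the file to be opened in 'rb' mode.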

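def read_csv_line(line_number, filename):
    # line_number is 1-based: 1 returns the first line of the file
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == (line_number - 1):
                return line
    # the file has fewer than line_number lines
    return None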

Then you can feed in your random number and filename to get a random line.

The solution to your problem could be a simple dictionary comprehension:
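One possible way to wire this up is sketched below. The helper count_lines and the temporary demo file are assumptions added for illustration. The function is repeated here, reading from its filename argument, so the block runs on its own.

```python
import os
import random
import tempfile

def read_csv_line(line_number, filename):
    """Return line number line_number (1-based) of filename, or None."""
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == line_number - 1:
                return line
    return None

def count_lines(filename):
    """Count the lines in filename without keeping them in memory."""
    with open(filename) as fileobj:
        return sum(1 for _ in fileobj)

# Write a small throwaway file so the example is self-contained.
handle, demo_path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(handle, "w") as f:
    f.write("Name,Date,Color\nRay,May,Gray\nAlex,Apr,Green\nAnn,Jun,Blue\n")

total = count_lines(demo_path)
# Line 1 is the header, so draw from lines 2..total (randint is inclusive).
random_line = read_csv_line(random.randint(2, total), demo_path)
past_the_end = read_csv_line(total + 1, demo_path)
print(random_line)

os.remove(demo_path)
```

Note that this reads the file twice, once to count and once to fetch, which is acceptable for about 1000 rows.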

>>> from collections import namedtuple
>>> Entry = namedtuple('Entry', 'Name, Date, Color')
>>> [l for l in open('t.tsv', 'r')]
<<<
['Name Date  Color\n',
 'Ray  May   Gray\n',
 'Alex Apr   Green\n',
 'Ann  Jun   Blue\n',
 'Kev  Mar   Gold\n',
 'Rob  May   Black\n']
>>> [l.split() for l in open('t.tsv', 'r')]
<<<
[['Name', 'Date', 'Color'],
 ['Ray', 'May', 'Gray'],
 ['Alex', 'Apr', 'Green'],
 ['Ann', 'Jun', 'Blue'],
 ['Kev', 'Mar', 'Gold'],
 ['Rob', 'May', 'Black']]
>>> [Entry(*l.split()) for l in open('t.tsv', 'r')]
<<<
[Entry(Name='Name', Date='Date', Color='Color'),
 Entry(Name='Ray', Date='May', Color='Gray'),
 Entry(Name='Alex', Date='Apr', Color='Green'),
 Entry(Name='Ann', Date='Jun', Color='Blue'),
 Entry(Name='Kev', Date='Mar', Color='Gold'),
 Entry(Name='Rob', Date='May', Color='Black')]
>>> {e.Name:e for e in list(Entry(*l.split()) for l in open('t.tsv', 'r'))}
<<<
{'Alex': Entry(Name='Alex', Date='Apr', Color='Green'),
 'Ann': Entry(Name='Ann', Date='Jun', Color='Blue'),
 'Kev': Entry(Name='Kev', Date='Mar', Color='Gold'),
 'Name': Entry(Name='Name', Date='Date', Color='Color'),
 'Ray': Entry(Name='Ray', Date='May', Color='Gray'),
 'Rob': Entry(Name='Rob', Date='May', Color='Black')}

I think you want to read the first row as header names. Python has DictReader for that: https://docs.python.org/2/library/csv.html#csv.DictReader

>>> import csv
>>> for line in csv.DictReader(open('t.tsv')): print line # don't forget to make your file comma-separated.
{'Date': 'May', 'Color': 'Gray', 'Name': 'Ray'}
{'Date': 'Apr', 'Color': 'Green', 'Name': 'Alex'}
{'Date': 'Jun', 'Color': 'Blue', 'Name': 'Ann'}
{'Date': 'Mar', 'Color': 'Gold', 'Name': 'Kev'}
{'Date': 'May', 'Color': 'Black', 'Name': 'Rob'}

Or with a dictionary comprehension:

>>> { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
<<<
{'Alex': {'Color': 'Green', 'Date': 'Apr', 'Name': 'Alex'},
 'Ann': {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'},
 'Kev': {'Color': 'Gold', 'Date': 'Mar', 'Name': 'Kev'},
 'Ray': {'Color': 'Gray', 'Date': 'May', 'Name': 'Ray'},
 'Rob': {'Color': 'Black', 'Date': 'May', 'Name': 'Rob'}}
>>> rows_by_name = { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
>>> rows_by_name['Ann']
<<< {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'}
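To get just columns 2 and 3 for a given name, as the question asks, a small wrapper around such a dictionary might look like the sketch below. The function date_and_color and the inline CSV_TEXT are hypothetical; the data is assumed to be comma-separated, and io.StringIO stands in for the open file. The sketch targets Python 3.

```python
import csv
import io

# Comma-separated sample data (an assumption), standing in for the real file.
CSV_TEXT = """Name,Date,Color
Ray,May,Gray
Alex,Apr,Green
Ann,Jun,Blue
"""

# DictReader consumes the header row itself, so 'Name' never becomes a key.
rows_by_name = {row['Name']: row for row in csv.DictReader(io.StringIO(CSV_TEXT))}

def date_and_color(name):
    """Return (Date, Color) for name, or None if the name is not present."""
    row = rows_by_name.get(name)
    if row is None:
        return None
    return row['Date'], row['Color']

print(date_and_color('Ann'))  # ('Jun', 'Blue')
print(date_and_color('Zoe'))  # None
```

If two rows share the same name, the later row overwrites the earlier one in the dictionary.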

If you want random samples, I suggest first reading the rows into a list and then making the selection with the random module. Or... let's do it with Entry:

>>> rows = list(Entry(*l.split()) for l in open('t.tsv', 'r'))
>>> import random
>>> random.sample(rows, 1)
<<< [Entry(Name='Ray', Date='May', Color='Gray')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ray', Date='May', Color='Gray'),
 Entry(Name='Kev', Date='Mar', Color='Gold'),
 Entry(Name='Ann', Date='Jun', Color='Blue')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ann', Date='Jun', Color='Blue'),
 Entry(Name='Rob', Date='May', Color='Black'),
 Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Rob', Date='May', Color='Black'),
 Entry(Name='Ann', Date='Jun', Color='Blue'),
 Entry(Name='Kev', Date='Mar', Color='Gold')]

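But beware: this loads every row into memory, which can be too much for a large file. Also note that rows here still includes the header line, which is why Entry(Name='Name', Date='Date', Color='Color') shows up in some of the samples.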
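If memory is a concern, one alternative (not what the answer above does) is reservoir sampling with a reservoir of size one, which picks a uniformly random row in a single pass while holding only one line at a time. The function name random_data_row and the sample lines are made up for this sketch.

```python
import random

def random_data_row(lines, rng=random):
    """Pick one data row uniformly at random in a single pass.

    Reservoir sampling with a reservoir of size one: the k-th data row
    replaces the current pick with probability 1/k. The first line is
    treated as a header and never returned.
    """
    chosen = None
    data_rows_seen = 0
    for line_number, line in enumerate(lines):
        if line_number == 0:
            continue  # skip the header row
        data_rows_seen += 1
        if rng.randrange(data_rows_seen) == 0:
            chosen = line
    return chosen

# Sample data from the question; an open file object works the same way.
SAMPLE_LINES = [
    "Name Date  Color",
    "Ray  May   Gray",
    "Alex Apr   Green",
    "Ann  Jun   Blue",
]

picked = random_data_row(SAMPLE_LINES)
print(picked.split())
```

For a file of about 1000 rows the list-based approach is perfectly adequate; this variant only pays off when the file is too large to hold comfortably in memory.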
