
Python reading in integers from a csv file into a list

I am having some trouble trying to read a particular column in a CSV file into a list in Python. Below is an example of my CSV file:

Col 1       Col 2
1,000,000   1
  500,000   2
  250,000   3

Basically, I want to add column 1 to a list as integer values, and I am having a lot of trouble doing so. I have tried:

for row in csv.reader(csvfile):
    list = [int(row.split(',')[0]) for row in csvfile]

However, I get a ValueError that says: invalid literal for int() with base 10: '"1'

I then tried:

for row in csv.reader(csvfile):
    list = [(row.split(',')[0]) for row in csvfile]

This time I don't get an error; however, I get this list:

['"1', '"500', '"250']

I have also tried changing the delimiter:

for row in csv.reader(csvfile):
    list = [(row.split(' ')[0]) for row in csvfile]

This almost gives me the desired list; however, the list includes the second column as well as "\n" after each value:

['"1,000,000", 1\n', etc...]

If anyone could help me fix this, it would be greatly appreciated!

Cheers

You can open the file and split each line on whitespace using a regular expression:

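import re
with open('filename.csv') as f:
    # strip() removes the leading padding and the newline; \s+ splits on runs of whitespace
    file_data = [re.split(r'\s+', line.strip()) for line in f if line.strip()]
# skip the header row; remove the thousands separators before calling int()
final_data = [int(row[0].replace(',', '')) for row in file_data[1:]]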
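As a side note that goes beyond the original answer: str.split() called with no arguments already splits on runs of whitespace and ignores leading and trailing whitespace, so the regular expression is optional for this layout. A minimal sketch, assuming the whitespace-aligned file shown in the question; the helper name first_column_ints and the in-memory sample are hypothetical:

```python
import io

def first_column_ints(lines):
    """Return column 1 as ints, skipping the header line and blank lines."""
    result = []
    for line in list(lines)[1:]:  # assumes the first line is the header
        fields = line.split()  # no argument: split on any run of whitespace
        if fields:  # ignore blank lines
            result.append(int(fields[0].replace(',', '')))  # drop thousands separators
    return result

sample = io.StringIO("Col 1       Col 2\n1,000,000   1\n  500,000   2\n  250,000   3\n")
print(first_column_ints(sample))  # [1000000, 500000, 250000]
```

Any iterable of lines works, so an open file object can be passed in place of the StringIO.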

You should choose your delimiter wisely: if your numbers use "." as the decimal separator, use "," as the delimiter; if your numbers use "," (as a decimal or thousands separator), use ";" as the delimiter.
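To illustrate the point, here is a minimal sketch with hypothetical semicolon-delimited data in which commas appear inside the numbers: with delimiter=';' each comma stays part of its field, whereas the default comma delimiter splits the numbers apart.

```python
import csv
import io

# Hypothetical data: ';' separates fields, ',' appears inside the numbers.
data = "1,000,000;1\n500,000;2\n"

with_semicolon = list(csv.reader(io.StringIO(data), delimiter=';'))
with_comma = list(csv.reader(io.StringIO(data)))  # default delimiter is ','

print(with_semicolon)  # [['1,000,000', '1'], ['500,000', '2']]
print(with_comma)      # [['1', '000', '000;1'], ['500', '000;2']]
```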

Moreover, as described in the documentation for csv.reader, you can use the delimiter= argument to define your delimiter, like so:

import csv
with open('myfile.csv', 'r', newline='') as csvfile:
    mylist = []
    for row in csv.reader(csvfile, delimiter=';'):
        mylist.append(row[0])  # first column; a blank line gives [] and row[0] raises IndexError

Or, in a shorter version:

import csv
with open('myfile.csv', 'r', newline='') as csvfile:
    mylist = [row[0] for row in csv.reader(csvfile, delimiter=';')]

To parse your number as a float, you will first have to remove the commas:

 float(row[0].replace(',', ''))
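If, as the '"1' in the question's error message suggests, the real file contains quoted, comma-separated values such as "1,000,000", 1 (an assumption; the sample table in the question shows no quotes), then csv.reader with its default comma delimiter already keeps each quoted number together, and skipinitialspace=True drops the space after each comma. A sketch with hypothetical in-memory data:

```python
import csv
import io

# Hypothetical file contents: quoted numbers containing thousands separators.
data = 'Col 1, Col 2\n"1,000,000", 1\n"500,000", 2\n"250,000", 3\n'

reader = csv.reader(io.StringIO(data), skipinitialspace=True)
next(reader)  # skip the header row
values = [int(row[0].replace(',', '')) for row in reader]
print(values)  # [1000000, 500000, 250000]
```

The quoting is handled by the csv module itself, which is why splitting each raw line with str.split(',') cut "1,000,000" apart and left the stray quote in '"1'.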

First of all, you must parse your data correctly, because it is not, in fact, CSV (comma-separated values) but rather TSV (tab-separated values), and you should tell the CSV reader so. (I'm assuming it's tabs, but in theory you can use any whitespace with a few tweaks.)

for row in csv.reader(csvfile, delimiter="\t"):
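A self-contained sketch of that call, with an in-memory string (hypothetical data mirroring the question's table) standing in for the real file, shows what each row looks like once the reader knows the delimiter is a tab:

```python
import csv
import io

# Hypothetical tab-separated data standing in for the question's file.
data = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n"

rows = list(csv.reader(io.StringIO(data), delimiter="\t"))
for row in rows:
    print(row)
# ['Col 1', 'Col 2']
# ['1,000,000', '1']
# ['500,000', '2']
```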

Second, you should strip any commas from your integer values, as they add no information. After that, they can easily be parsed with int():

int(row[0].replace(',', ''))
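A minimal sketch of why the commas have to go: int() rejects thousands separators, so they are removed before the conversion. The helper name parse_grouped_int is hypothetical:

```python
def parse_grouped_int(text):
    """Convert a string such as '1,000,000' to an int by removing the commas."""
    return int(text.replace(',', ''))

try:
    int('1,000,000')
except ValueError as exc:
    print(exc)  # invalid literal for int() with base 10: '1,000,000'

print(parse_grouped_int('1,000,000'))  # 1000000
print(parse_grouped_int(' 500,000 '))  # 500000 (int() tolerates surrounding whitespace)
```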

Third, you really should not iterate over the same file twice (once through csv.reader and once directly in the comprehension). Use either a list comprehension or a normal for loop, not both at the same time with the same variable. For example, with a list comprehension:

import csv
from io import StringIO
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
next(reader, None)  # skip the header
lst = [int(row[0].replace(',', '')) for row in reader]

Or, with a normal for loop:

import csv
from io import StringIO
csvfile = StringIO("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
reader = csv.reader(csvfile, delimiter="\t")
lst = []
for i, row in enumerate(reader):
    if i == 0:
        continue  # header row: put your custom header-handling code here
    lst.append(int(row[0].replace(',', '')))

In both cases, lst is set to [1000000, 500000, 250000], as it should be. Enjoy.
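The same approach can be wrapped in a function that reads an actual file on disk. A minimal sketch, assuming a tab-delimited file with exactly one header row; the function name read_first_column is hypothetical, and the file is opened with newline='' as the csv module documentation recommends for file objects passed to csv.reader:

```python
import csv

def read_first_column(path, delimiter="\t"):
    """Return the first column of a delimited file as ints, skipping the header."""
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile, delimiter=delimiter)
        next(reader, None)  # skip the header row (no-op on an empty file)
        # blank lines come back as [], so `if row` skips them
        return [int(row[0].replace(',', '')) for row in reader if row]

if __name__ == "__main__":
    import os
    import tempfile

    # Write a small sample file so the example runs on its own.
    with tempfile.NamedTemporaryFile('w', suffix='.tsv', delete=False, newline='') as tmp:
        tmp.write("Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n")
    print(read_first_column(tmp.name))  # [1000000, 500000, 250000]
    os.remove(tmp.name)
```

Passing delimiter=';' or delimiter=',' covers the other layouts discussed in this thread.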

By the way, using the built-in name list as a variable name is a very bad idea: it is not a reserved keyword, so Python allows it, but it shadows the built-in list type.
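A short illustration of the problem: once the name list is bound to a value, calling list() fails in that scope until the shadowing name is deleted.

```python
list = [1000000, 500000, 250000]  # shadows the built-in list type

try:
    list("abc")  # calls the list object, not the type
except TypeError as exc:
    print(exc)  # 'list' object is not callable

del list  # remove the shadowing name; the built-in is visible again
print(list("abc"))  # ['a', 'b', 'c']
```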

UPDATE. There's one more option that I find interesting. Instead of setting the delimiter explicitly, you can use csv.Sniffer to detect it, e.g.:

import csv
from io import StringIO
csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
csvfile = StringIO(csvdata)
dialect = csv.Sniffer().sniff(csvdata)  # guesses the delimiter from the sample
reader = csv.reader(csvfile, dialect=dialect)

After that, proceed just like the snippets above. This will keep working even if you replace the tabs with semicolons, commas (which would require quotes around your comma-grouped integers) or, possibly, something else.
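Building on the update, csv.Sniffer also has a has_header() method that guesses whether the first row is a header, so the header skip can be decided automatically too. A sketch under the assumption that the sample is representative of the whole file; the function name first_column_auto is hypothetical. Both sniff() and has_header() are heuristics and can guess wrong on unusual data, and sniff() raises csv.Error when it cannot determine a delimiter.

```python
import csv
import io

def first_column_auto(text):
    """Detect the dialect and header row, then return column 1 as ints."""
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(text)  # guesses the delimiter (a tab for this data)
    reader = csv.reader(io.StringIO(text), dialect=dialect)
    if sniffer.has_header(text):  # heuristic: does row 1 look like a header?
        next(reader, None)
    return [int(row[0].replace(',', '')) for row in reader if row]

csvdata = "Col 1\tCol 2\n1,000,000\t1\n500,000\t2\n250,000\t3\n"
print(first_column_auto(csvdata))  # [1000000, 500000, 250000]
```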
