
Error in exporting a file into a dictionary in Python

I have a CSV file with two columns and more than 6000 rows, and I would like to load it into a dictionary in Python. Here is part of the file:

ENST00000589805,CCCTCCCGGACTCCTCTCCCCGGCCGGCCGGCAAGAGTTTACAA
ENST00000376512,GTTGCCGAGGGGACGGGCCGGGCAGATGCCAAC
ENST00000314332,TTTAAG

I wrote this function:

def file_to_dict(filename):
    f = open(filename, 'r')
    answer = {}
    for line in f:
        k, v = line.strip().split(',')
        answer[k.strip()] = v.strip()
    return answer

I tried it on a small file and it worked perfectly, but when I tried it on my big file, it gave this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in file_to_dict
ValueError: too many values to unpack

I tried to find a solution but did not manage to. Do you know how to resolve it? By the way, the dictionary should look like this:

{'ENST00000589805':'CCCTCCCGGACTCCTCTCCCCGGCCGGCCGGCAAGAGTTTACAA', 'ENST00000376512': 'GTTGCCGAGGGGACGGGCCGGGCAGATGCCAAC', 'ENST00000314332': 'TTTAAG'}

One likely (but not the only possible) cause of an unpacking error here is a blank line in the input, for example an extra newline at the end of your input file. On a blank line, split(',') returns a single empty string, so unpacking into k, v fails (strictly speaking, Python then reports too few values rather than too many). One way to guard against blank lines is as follows:

for line in f:
    line = line.strip()
    if line:  # skip blank lines
        k, v = line.split(',')
        answer[k.strip()] = v.strip()

It is equally possible that your input file breaks your assumptions in some other way. To handle this, you should beef up the error checking in your code so that a malformed line is reported instead of crashing the loop with a bare ValueError.
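As a rough sketch of what stronger error checking could look like: the version below skips blank lines and raises an error that names the offending line number whenever a line does not split into exactly two fields. The helper name lines_to_dict and the wording of the error message are made up for illustration; only file_to_dict comes from the question.

```python
def lines_to_dict(lines):
    """Build a dict from 'key,value' lines (hypothetical helper).

    Blank lines are skipped; any other line that does not split into
    exactly two comma-separated fields raises a ValueError that says
    which line was at fault.
    """
    answer = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank line, e.g. a trailing newline at end of file
        fields = line.split(',')
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 2 fields, got %d: %r"
                % (lineno, len(fields), line)
            )
        k, v = fields
        answer[k.strip()] = v.strip()
    return answer


def file_to_dict(filename):
    # 'with' closes the file even if a malformed line raises an error.
    with open(filename, 'r') as f:
        return lines_to_dict(f)
```

Calling file_to_dict on the big file should then either return the dictionary or stop with a message pointing at the first line that needs attention.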

One or more of the lines probably has more than one comma in it. Because you're splitting on commas, such a line is broken up into more than two values, but you've only given two names to unpack into, which is exactly what "too many values to unpack" means. Find the line with the extra comma and fix it, or add an extra variable name if the extra field is expected.
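To locate the lines with an extra comma, something along these lines may help. This is only a sketch: find_bad_lines and split_key_value are hypothetical names, not part of the question. The second helper shows an alternative to fixing the file, namely splitting on the first comma only (the maxsplit argument of str.split), which is only appropriate if extra commas really belong to the value.

```python
def find_bad_lines(lines):
    """Return (line_number, text) for non-blank lines without exactly one comma."""
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if line and line.count(',') != 1:
            bad.append((lineno, line))
    return bad


def split_key_value(line):
    """Split on the first comma only, so any further commas stay in the value.

    Still raises ValueError if the line contains no comma at all.
    """
    k, v = line.strip().split(',', 1)
    return k.strip(), v.strip()
```

For example, running find_bad_lines over the open file object and printing the result lists every line that would trip up the original k, v = line.strip().split(',') unpacking.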


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM