
Python: converting a .csv file with a comma delimiter to a dictionary

I have been trying to fix this problem for quite a while and have done some research into why my code won't work, but I simply can't get the dictionary to print with all the key:value pairs I need.

So here's the story. I am reading a .csv file where the first column contains text abbreviations and the second column contains their full English meanings. I have tried multiple ways of opening this file, reading it, and storing its contents in a dictionary. The file does get read: when I print the split pieces, it seems to go through the whole file (I can't be sure, because the output gets cut off around line 1007, while the file goes up to about line 4600). The problem comes when I try to put all of that into key:value pairs in a dictionary: the only pair that gets stored is the very first line of the file.

Here is the code:

def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        #line = line.strip()
        data = line.split(',')
        print data
        dic[data[0]] = data[1]
        print dic

What I assumed the issue was:

    print dic

I suspected it because it prints inside the loop, but since it is in the loop it should simply print the dictionary on every pass. I am confused about what I am doing wrong. I also tried json, but I don't know much about how to use it. I read up on the csv module too, but I don't think our professor wants us to use it, so I was hoping someone could spot my error. Thanks in advance!

EDIT

This is the output of my program:

going to be late\rg2cu', 'glad to see you\rg2e', 'got to eat\rg2g', 'got to go\rg2g2tb', 'got to go to the bathroom\rg2g2w', 'got to go to work\rg2g4aw', 'got to go for a while\rg2gb', 'got to go bye\rg2gb2wn', 'got to go back to work now\rg2ge', 'got to go eat\rg2gn', 'got to go now\rg2gp', 'got to go pee\rg2gpc', 'got 2 go parents coming\rg2gpp', 'got to go pee pee\rg2gs', 'got to go sorry\rg2k', 'good to know\rg2p', 'got to pee\rg2t2s', 'got to talk to someone\rg4u', 'good for you\rg4y', 'good for you\rg8', 'gate\rg9', 'good night\rga', 'go ahead\rgaalma', 'go away and leave me alone\rgafi', 'get away from it\rgafm', 'Get away from me\rgagp', 'go and get pissed\rgaj'

This goes on for a while until the end of the file, and after that it is supposed to print the entire dictionary, but instead I get this:

   {'$$': 'money\r/.'}

along with:

None

EDIT 2

Here is the full code:

def createDictionary(filename):
    f = open(filename, 'r')
    dic = {}
    for line in f:
        line = line.strip()
        data = line.split(',')
        print data
        dic[data[0]] = data[1]
        print dic

if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print x

EDIT 3

Here is the file I am trying to turn into a dictionary:

https://1drv.ms/u/s!AqnudQBXpxTGiC9vQEopu1dOciIS

Simply add a `return` to your function: without one, `createDictionary` returns `None`, which is what `print x` displays. Also note that the dictionary's length will not match the number of CSV rows, because some values in the first column repeat. Dictionary keys must be unique, so when an existing key is assigned again, the later value replaces the earlier one.

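def createDictionary(filename):
    dic = {}
    # In Python 3, text mode uses universal newlines by default, so "\r",
    # "\n" and "\r\n" line endings are all recognized.
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()          # drop the trailing newline
            if not line:
                continue                 # skip blank lines
            data = line.split(',', 1)    # split on the first comma only
            dic[data[0]] = data[1]
    return dic

if __name__ == "__main__":
    x = createDictionary("textToEnglish.csv")
    print(type(x))
    # <class 'dict'>

    print(len(x))
    # 4255

    for k, v in x.items():
        print(k, v)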
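To see the overwrite behavior described above, and to find out which abbreviations repeat, here is a small sketch using `collections.Counter`. The rows are made up for illustration (they are not taken from `textToEnglish.csv`), with one key deliberately repeated:

```python
from collections import Counter

# Hypothetical sample rows; "gr8" appears twice on purpose.
rows = [
    ("gr8", "great"),
    ("brb", "be right back"),
    ("gr8", "grate"),
]

dic = {}
for key, value in rows:
    dic[key] = value  # a repeated key overwrites the earlier value

# Count how often each key occurs to find the duplicates.
counts = Counter(key for key, _ in rows)
duplicates = sorted(key for key, n in counts.items() if n > 1)

print(len(rows), len(dic))  # 3 2
print(dic["gr8"])           # grate
print(duplicates)           # ['gr8']
```

Running the same `Counter` over the first column of the real file should show which abbreviations account for the gap between the row count and `len(x)`.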

Also, avoid printing the whole dictionary at once, especially inside the loop as in the question's code: with this many entries it produces a huge, hard-to-read wall of output. Iterating over the keys and values with a `for` loop, as shown above, is much easier to follow.
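One more observation: the output in the question contains `\r` between entries, which suggests the file uses old Mac-style `\r` line endings. Python 2's plain `open(filename, 'r')` does not treat a lone `\r` as a line break on Linux or Windows, so the loop sees the whole file as one line, `split(',')` cuts across entries, and only the first pair gets stored. Opening with `'rU'` in Python 2, or running under Python 3 (whose text mode applies universal newlines by default), avoids this. The sketch below reproduces the symptom with a short hypothetical string built from entries visible in the question's output; `build_dict` is an illustrative helper, not part of either post:

```python
# Hypothetical stand-in for textToEnglish.csv with "\r" line endings,
# using entries that appear in the question's output.
raw = "$$,money\rg2g,got to go\rg2k,good to know"


def build_dict(lines):
    """Mimic the question's loop: split each line on commas, store a pair."""
    dic = {}
    for line in lines:
        data = line.strip().split(",")
        dic[data[0]] = data[1]
    return dic


# Splitting only on "\n" (like Python 2's open() on a "\r"-only file)
# leaves the whole file as a single "line".
broken = build_dict(raw.split("\n"))

# str.splitlines() also treats "\r" as a line break, so every row is kept.
fixed = build_dict(raw.splitlines())

print(broken)  # {'$$': 'money\rg2g'}
print(fixed)   # {'$$': 'money', 'g2g': 'got to go', 'g2k': 'good to know'}
```

The `broken` result has the same shape as the `{'$$': 'money\r/.'}` dictionary reported in the question.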

Although there is nothing wrong with the other solutions presented, you could simplify your solution, and make it much easier to extend, by using Python's excellent pandas library.

pandas is a library for handling data in Python, and many data scientists prefer it.

pandas has a simplified CSV interface for reading and parsing files, which can return a list of dictionaries, each holding a single row of the file. The keys are the column names, and the values are the contents of each cell.

In your case:

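    import pandas

    def createDictionary(filename):
        # read_csv replaces DataFrame.from_csv, which was removed in pandas 1.0.
        # The file has no header row, so name the two columns explicitly
        # (these column names are chosen here for illustration).
        my_data = pandas.read_csv(filename, sep=',', header=None,
                                  names=['abbreviation', 'meaning'])
        # orient='records' gives one dict per row, in file order
        list_of_dicts = my_data.to_dict(orient='records')
        return list_of_dicts

    if __name__ == "__main__":
        x = createDictionary("textToEnglish.csv")
        print(type(x))
        # <class 'list'>
        print(len(x))
        # one entry per CSV row (repeated abbreviations are kept)
        print(type(x[0]))
        # <class 'dict'>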
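A list of row dictionaries is not quite the abbreviation-to-meaning lookup the question asks for. Here is a sketch of getting that mapping directly from the DataFrame, assuming the file has no header row (as the question's output suggests). It uses `io.StringIO` with a few hypothetical rows in place of `textToEnglish.csv`, and the column names `abbreviation` and `meaning` are illustrative:

```python
import io

import pandas

# Hypothetical sample standing in for textToEnglish.csv (no header row);
# "g2g" is repeated on purpose.
sample = io.StringIO("$$,money\ng2g,got to go\ng2g,good to go\ng8,gate\n")

df = pandas.read_csv(sample, header=None, names=["abbreviation", "meaning"])

# Index by the abbreviation column and turn the meaning column into a dict.
# With a repeated abbreviation, the later row wins, just like dic[key] = value.
lookup = df.set_index("abbreviation")["meaning"].to_dict()

print(len(df), len(lookup))  # 4 3
print(lookup["g2g"])         # good to go
```

`dict(zip(df["abbreviation"], df["meaning"]))` would build the same mapping. Either way, duplicates collapse exactly as they do in the plain-Python version.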
