
Creating a dictionary using data from a .csv file

I have a .csv file with 20 lines, and each line is formatted as follows:

Lucy, 23, F, diabetes
Darwin, 60, M, hypertension
Dave, 35, M, epilepsy
Tiffany, 12, F, asthma

... and so on.

I am looking to convert this .csv file into a dictionary, presented as follows:

dict = {
     'Lucy': {
           'age': 23,
           'gender': 'F',
           'condition': 'diabetes'
      },
     'Darwin': {
           'age': 60,
           'gender': 'M',
           'condition': 'hypertension'
      },
      #(and so on for all 20 lines)
}

Each line is in the form: name, age, gender, condition. Here's what I have tried so far.

dict ={}
f = open("medical.csv', mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
    line.split(",")

... and this is where I hit a halt. I cannot figure out how to assign the titles to each value in the line such that the dictionary will be displayed as above, with the tags 'age', 'gender' and 'condition'. And when I run the code, there is a SyntaxError: invalid syntax message on "medical.csv".

The age has to be an integer. If it is not an integer, I want the program to skip that line when creating the dictionary.

Any help would be much appreciated!

  1. I recommend not using the names as your dictionary keys, because names can be repeated.

  2. First create the main dictionary, then iterate over the lines of the CSV file. For each line, extract the person's name and properties. (You used the split method, which fits perfectly here, but use split(", ") instead of split(",") so that the space after each comma is removed as well.) Create a dictionary for each person and assign keys and values to it this way:

    person = {}

    person['age'] = 23

And so on... Then assign this person's dictionary as a value in the main dictionary, with the person's name as the key. Hope it helps a bit!
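The steps above can be sketched roughly as follows. This is only a minimal sketch: the function name build_records and the inline sample lines are assumptions for illustration, and it keys the main dictionary by name as step 2 describes, so repeated names would overwrite each other.

```python
def build_records(lines):
    """Build {name: {'age': ..., 'gender': ..., 'condition': ...}} from CSV-like lines."""
    records = {}                        # the main dictionary
    for line in lines:
        parts = line.strip().split(", ")
        if len(parts) != 4:
            continue                    # malformed line: skip it
        name, age, gender, condition = parts
        try:
            age = int(age)
        except ValueError:
            continue                    # age is not an integer: skip the line
        person = {}                     # one dictionary per person
        person['age'] = age
        person['gender'] = gender
        person['condition'] = condition
        records[name] = person          # the person's name is the key
    return records


# Hypothetical sample data; the second line has a non-integer age and is skipped.
sample = ["Lucy, 23, F, diabetes\n", "Dave, unknown, M, epilepsy\n"]
print(build_records(sample))
```

With a real file, the open file object can be passed directly, since iterating over it yields one line at a time.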

First of all, please keep in mind that there might be more "pythonic" answers to your problem.

Well, you are on the right path:

dict ={}
f = open("medical.csv", mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
    l = line.split(",")

Let's give the result of line.split(",") a name ( l ). Now l is in this format:

l[0] contains the name
l[1] contains the age
l[2] contains the sex
l[3] contains the condition
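As a quick illustration of what l actually holds, here is split(",") applied to one of the sample lines from the question. The leftover spaces and the trailing newline are the reason for the strip() calls used in the rest of this answer. The variable cleaned is only there for illustration.

```python
line = "Lucy, 23, F, diabetes\n"    # one line as read from the file
l = line.split(",")
print(l)                            # ['Lucy', ' 23', ' F', ' diabetes\n']

# strip() removes the surrounding whitespace, including the newline
cleaned = [field.strip() for field in l]
print(cleaned)                      # ['Lucy', '23', 'F', 'diabetes']
```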

Now, the first element of l is the name, so let's add it to the dictionary:

dict[l[0].strip()] = {}

Note:

  1. I'm using l[0].strip() because there might be unwanted whitespace at the beginning or end of it
  2. I'm initializing a new dictionary inside the dictionary (the data structure you want is a dictionary of dictionaries)

Now, let's add in turn the other fields:

dict[l[0].strip()]['gender'] = l[2].strip()
dict[l[0].strip()]['condition'] = l[3].strip()

This works for the string fields. The age must be an integer, though, so convert it beforehand inside a try/except block and skip the line when the conversion fails:

try: 
    age = int(l[1].strip())
except ValueError:
    continue    # You want to skip the current iteration, right?
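To get a feel for which values pass this check, here is a small sketch. The helper parse_age is hypothetical (it is not part of the code in this answer) and returns None in the cases where the loop would hit continue. Note that int() tolerates surrounding whitespace but rejects decimals, words, and empty strings.

```python
def parse_age(text):
    """Return the age as an int, or None if text is not an integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None


# A few hypothetical raw field values, as they might come out of split(",")
for raw in [" 23", "60\n", "23.5", "unknown", ""]:
    print(repr(raw), "->", parse_age(raw))
```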

Now we can put everything together, polishing the code a bit:

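    dict = {}
    f = open("medical.csv", mode="rt", encoding="utf8")
    s = f.readline()    # skips the first line; drop this if the file has no header row
    for line in f:
        l = line.split(",")
        try:
            age = int(l[1].strip())
        except ValueError:
            continue    # age is not an integer: skip this line
        key = l[0].strip()
        dict[key] = {}  # create the inner dictionary before filling it
        dict[key]['age'] = age
        dict[key]['gender'] = l[2].strip()
        dict[key]['condition'] = l[3].strip()
    f.close()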

Of course, this assumes all the names are different. (I've just read firanek's answer, and I agree that you should not use names as the key: with this approach, you lose the data for every person sharing a name except the last one.)
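If repeated names are a real possibility, one option is to store a list of records per name so that nothing is overwritten. This is a sketch of an alternative, not what the code above does; the function name group_by_name and the sample lines are assumptions for illustration, while collections.defaultdict is part of the standard library.

```python
from collections import defaultdict


def group_by_name(lines):
    """Map each name to a list of records, keeping every person with that name."""
    people = defaultdict(list)
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            continue                    # malformed line: skip it
        name, age, gender, condition = fields
        try:
            age = int(age)
        except ValueError:
            continue                    # age is not an integer: skip the line
        people[name].append({'age': age, 'gender': gender, 'condition': condition})
    return dict(people)


# Hypothetical sample with two people who share a name
sample = ["Dave, 35, M, epilepsy", "Dave, 41, M, asthma"]
print(group_by_name(sample))
```

The trade-off is that every lookup now returns a list, even for names that occur only once.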

Oh, I almost forgot: you can use a dictionary literal to replace the dict[key][<string>] = <thing> lines: dict[key] = {'age': age, 'gender': l[2].strip(), 'condition': l[3].strip()} .

I suggest using the csv module for this purpose. Note the handy skipinitialspace argument.

import csv
from pprint import pprint


def row_to_dict(ts):
    return {k: t for k, t in zip(("age", "gender", "condition"), ts)}


if __name__ == "__main__":
    result = {}
    with open("medical.csv") as f:
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            name, data = row[0], row[1:]
            result[name] = row_to_dict(data)

    pprint(result)
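The code above keeps the age as a string and does not skip rows with a non-integer age, which the question asks for. A possible variation that adds both, still using csv.reader with skipinitialspace=True, is sketched below. The function name read_patients and the in-memory sample (io.StringIO standing in for the open file) are assumptions for illustration.

```python
import csv
import io


def read_patients(f):
    """Read name, age, gender, condition rows from an open text file."""
    result = {}
    for row in csv.reader(f, skipinitialspace=True):
        if len(row) != 4:
            continue                    # blank or malformed row: skip it
        name, age, gender, condition = row
        try:
            age = int(age)
        except ValueError:
            continue                    # age is not an integer: skip the row
        result[name] = {"age": age, "gender": gender, "condition": condition}
    return result


# Hypothetical sample; the second row has a non-integer age and is skipped.
sample = io.StringIO("Lucy, 23, F, diabetes\nDarwin, sixty, M, hypertension\n")
print(read_patients(sample))
```

With a real file, the same function can be called as read_patients(open("medical.csv", newline="")), ideally inside a with block.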

You may want to check out the Pandas library and manipulate the data with DataFrames, as it has lots of built-in functionality.

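import pandas as pd

# The file has no header row, so the column names are supplied here
data = pd.read_csv("medical.csv", header=None, names=["Name", "Age", "Gender", "Condition"], index_col=False, skipinitialspace=True)
# Turn non-numeric ages into NaN so that dropna can remove those rows
data["Age"] = pd.to_numeric(data["Age"], errors="coerce")
newdata = data.dropna(subset=["Age"])
print("new data: \n", newdata)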

Also a similar question: Pandas: drop columns with all NaN's

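Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.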

 