
Creating a dictionary using data from a .csv file

I have a .csv file with 20 lines, and each line is formatted as follows:

Lucy, 23, F, diabetes
Darwin, 60, M, hypertension
Dave, 35, M, epilepsy
Tiffany, 12, F, asthma

... and so on.

I am looking to convert this .csv file into a dictionary, presented as follows:

dict = {
     'Lucy': {
           'age': 23,
           'gender': 'F',
           'condition': 'diabetes'
      },
     'Darwin': {
           'age': 60,
           'gender': 'M',
           'condition': 'hypertension'
      },
      #(and so on for all 20 lines)
}

Each line is in the form: name, age, gender, condition. Here's what I have tried so far.

dict ={}
f = open("medical.csv', mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
    line.split(",")

... and this is where I hit a halt. I cannot figure out how to assign the titles to each value in the line such that the dictionary will be displayed as above, with the tags 'age', 'gender' and 'condition'. And when I run the code, there is a SyntaxError: invalid syntax message on " medical.csv ".

The age has to be an integer. If it is not an integer, I want the program to skip that line when creating the dictionary.

Any help would be much appreciated!

  1. I recommend not using names as your dictionary keys, because names can be repeated.

  2. Create the main dict first, then iterate over the lines in the CSV. For each line, extract the person's properties (the split method you used fits perfectly here, but use split(", ") instead of split(",") so the space after each comma is removed too). Then create a dictionary for each person and assign keys and values to it like this:

    person = {}

    person['age'] = 23

And so on... Then store this person's dictionary as a value in the main dictionary, using the person's name as the key. Hope it helps a bit!
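The steps above could be sketched roughly as follows. This is only a minimal sketch: it assumes every line has exactly four fields separated by a comma and a space, the function name `build_people` and the inline sample lines are made up for illustration, and it does not yet skip lines whose age is not an integer.

```python
def build_people(lines):
    """Build {name: {'age': ..., 'gender': ..., 'condition': ...}} from CSV-like lines."""
    people = {}  # the main dictionary
    for line in lines:
        # Assumption: exactly four fields, separated by ", "
        name, age, gender, condition = line.strip().split(", ")
        person = {}
        person['age'] = int(age)  # raises ValueError if age is not an integer
        person['gender'] = gender
        person['condition'] = condition
        people[name] = person  # a repeated name overwrites the earlier entry
    return people


# Hypothetical sample standing in for the lines of medical.csv
sample = ["Lucy, 23, F, diabetes\n", "Darwin, 60, M, hypertension\n"]
people = build_people(sample)
print(people["Lucy"])
```

With a real file, the same function could be given the open file object instead of the list, since iterating over a file yields its lines.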

First of all, please keep in mind that there might be more "pythonic" answers to your problem.

Well, you are on the right path:

dict ={}
f = open("medical.csv", mode = "rt", encoding = "utf8")
s = f.readline()
for line in f:
    l = line.split(",")

Let's give the result of line.split(",") a name ( l ). Now l is in this format:

l[0] contains the name
l[1] contains the age
l[2] contains the sex
l[3] contains the condition

Now, the first element of l is the name, so let's add it to the dictionary:

dict[l[0].strip()] = {}

Note:

  1. I'm using l[0].strip() because there might be unwanted whitespace at the beginning or end of it
  2. I'm initializing a new dictionary inside the dictionary (the data structure you want is a dictionary of dictionaries)

Now, let's add in turn the other fields:

dict[l[0].strip()]['gender'] = l[2].strip()
dict[l[0].strip()]['condition'] = l[3].strip()

The age needs more care: it has to be an integer, so we convert it inside a try/except block before adding anything to the dictionary:

try: 
    age = int(l[1].strip())
except ValueError:
    continue    # You want to skip the current iteration, right?
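To see which values the int() conversion would accept or reject, here is a small sketch. The helper name `parse_age` is hypothetical and not part of the code above; it returns None instead of using continue so that it can be tried outside a loop.

```python
def parse_age(text):
    """Return the age as an int, or None if the text is not a valid integer."""
    try:
        return int(text.strip())
    except ValueError:
        return None


# A few hypothetical field values, as they might come out of line.split(",")
for raw in [" 23", "abc", "23.5", ""]:
    print(repr(raw), "->", parse_age(raw))
```

Note that int() rejects "23.5" as well as non-numeric text, so a line with a fractional age would also be skipped.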

Now we can put everything together, polishing the code a bit:

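    dict ={}
    f = open("medical.csv", mode = "rt", encoding = "utf8")
    s = f.readline()    # skips the first line, as in your code; remove it if the file has no header line
    for line in f:
        l = line.split(",")
        try:
            age = int(l[1].strip())
        except ValueError:
            continue    # age is not an integer: skip this line
        key = l[0].strip()
        dict[key] = {}    # create the inner dictionary before filling it
        dict[key]['age'] = age
        dict[key]['gender'] = l[2].strip()
        dict[key]['condition'] = l[3].strip()
    f.close()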

Of course, this assumes all the names are different. (I've just read firanek's answer, and I agree that you should not use names as keys: with this approach, if several people share a name, you lose the data for all of them except the last one.)
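If repeated names are a real possibility, one option (not something the question asked for, just a sketch) is to keep a list of records per name. The function name `group_by_name` and the second "Dave" row are invented for illustration.

```python
def group_by_name(rows):
    """Map each name to a list of records, so repeated names are all kept."""
    people = {}
    for name, age, gender, condition in rows:
        record = {"age": age, "gender": gender, "condition": condition}
        # setdefault creates the list the first time a name is seen
        people.setdefault(name, []).append(record)
    return people


# Hypothetical, already-parsed rows; the second Dave is made up
rows = [
    ("Dave", 35, "M", "epilepsy"),
    ("Dave", 41, "M", "asthma"),
    ("Lucy", 23, "F", "diabetes"),
]
grouped = group_by_name(rows)
print(len(grouped["Dave"]))
```

The trade-off is that every lookup now returns a list, even for names that appear only once.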

Oh, I almost forgot: you can use a dict literal to replace the dict[key][<string>] = <thing> lines: dict[key] = {'age': age, 'gender': l[2].strip(), 'condition': l[3].strip()} .

I suggest using the csv module for this purpose. Note the handy skipinitialspace argument, which drops the space that follows each comma.

import csv
from pprint import pprint


def row_to_dict(ts):
    return {k: t for k, t in zip(("age", "gender", "condition"), ts)}


if __name__ == "__main__":
    result = {}
    with open("medical.csv") as f:
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            name, data = row[0], row[1:]
            result[name] = row_to_dict(data)

    pprint(result)
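The code above keeps the age as a string and does not skip bad lines. A possible variant that also handles the integer requirement from the question might look like the sketch below. The function name `load_patients` and the in-memory sample (including the "sixty" row) are assumptions made for illustration, so that the example runs without a file on disk.

```python
import csv
import io


def load_patients(f):
    """Read name, age, gender, condition rows; skip rows whose age is not an integer."""
    result = {}
    reader = csv.reader(f, skipinitialspace=True)
    for row in reader:
        if len(row) != 4:
            continue  # skip blank or malformed lines
        name, age_text, gender, condition = row
        try:
            age = int(age_text)
        except ValueError:
            continue  # age is not an integer: skip this row
        result[name] = {"age": age, "gender": gender, "condition": condition}
    return result


# Hypothetical sample data; "sixty" is there to show a skipped row
sample = io.StringIO(
    "Lucy, 23, F, diabetes\n"
    "Darwin, sixty, M, hypertension\n"
    "Dave, 35, M, epilepsy\n"
)
patients = load_patients(sample)
print(patients)
```

For the real file, the function could be called inside `with open("medical.csv", newline="", encoding="utf8") as f:`.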

You may want to check out the Pandas library, and manipulate the data with DataFrames as it has lots of built-in functionality.

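import pandas as pd

# The file has no header row, so the column names are supplied explicitly
data = pd.read_csv("medical.csv", header=None, names=["Name", "Age", "Gender", "Condition"], skipinitialspace=True)
# Turn non-numeric ages into NaN, then drop those rows
data["Age"] = pd.to_numeric(data["Age"], errors="coerce")
newdata = data.dropna(subset=["Age"])
print("new data: \n", newdata)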

Also a similar question: Pandas: drop columns with all NaN's
