Casting Data Types When Converting a CSV file to a Dictionary in Python

I have a CSV file that looks like this:

Item,Price,Calories,Category
Orange,1.99,60,Fruit
Cereal,3.99,110,Box Food
Ice Cream,6.95,200,Dessert
...

and I want to form a Python dictionary in this format:

{'Orange': (1.99, 60, 'Fruit'), 'Cereal': (3.99, 110, 'Box Food'), ... }

I want to make sure the column titles are removed (i.e., the first row is NOT included).

Here is what I've tried so far:

import csv

result = {}
reader = csv.reader(open('storedata.csv'))

for row in reader:
    # only needed if empty lines in input
    if not row:
        continue
    key = row[0]
    x = float(row[1])
    y = int(row[2])
    z = row[3]
    result[key] = x, y, z

print(result)

However, when I do this, I get ValueError: could not convert string to float: 'Price', and I don't know how to fix it. I want to keep these three values in a tuple.

Thanks!

I recommend using pandas.read_csv to read your CSV file. To keep this example self-contained, I build the same data by hand here:

import pandas as pd

df = pd.DataFrame([["Orange",1.99,60,"Fruit"], ["Cereal",3.99,110,"Box Food"], ["Ice Cream",6.95,200,"Dessert"]],
            columns= ["Item","Price","Calories","Category"])

The resulting DataFrame looks like this:

print(df)
    Item         Price    Calories    Category
0   Orange       1.99       60          Fruit
1   Cereal       3.99       110         Box Food
2   Ice Cream    6.95       200         Dessert
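The frame above is built by hand. A minimal sketch of loading the same data with pandas.read_csv, which treats the first line as the header by default, might look like this. The io.StringIO object is only a stand-in for the question's storedata.csv so the snippet runs on its own:

```python
import io

import pandas as pd

# Stand-in for storedata.csv; the rows are the sample data from the question.
csv_text = """Item,Price,Calories,Category
Orange,1.99,60,Fruit
Cereal,3.99,110,Box Food
Ice Cream,6.95,200,Dessert
"""

# By default read_csv uses the first line as column names, so the header
# never shows up as a data row, and numeric columns are inferred.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)
```

With a real file, pd.read_csv('storedata.csv') would replace the StringIO call.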

First, create an empty Python dictionary to hold the results, then use pandas.DataFrame.iterrows() to iterate over the rows:

res = {}


for index, row in df.iterrows():
    item = row["Item"]
    # coerce unparseable prices to NaN, then convert to a plain Python float
    x = float(pd.to_numeric(row["Price"], errors="coerce"))
    y = int(row["Calories"])
    z = row["Category"]
    res[item] = (x, y, z)

Printing res gives the expected output:

print(res)

{'Orange': (1.99, 60, 'Fruit'),
 'Cereal': (3.99, 110, 'Box Food'),
 'Ice Cream': (6.95, 200, 'Dessert')}
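For comparison, the ValueError in the question comes from the header row being passed to float(). A sketch that stays with the standard csv module and simply skips that row could look like the following. The io.StringIO object stands in for storedata.csv, and the unpacking assumes every non-empty row has exactly four fields:

```python
import csv
import io

# Stand-in for storedata.csv; the rows are the sample data from the question.
csv_text = """Item,Price,Calories,Category
Orange,1.99,60,Fruit
Cereal,3.99,110,Box Food
Ice Cream,6.95,200,Dessert
"""

result = {}
# With a real file: with open('storedata.csv', newline='') as f:
with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    next(reader, None)  # consume the header row so 'Price' is never cast
    for row in reader:
        if not row:  # skip blank lines in the input
            continue
        item, price, calories, category = row
        result[item] = (float(price), int(calories), category)

print(result)
```

Passing None as the default to next() avoids a StopIteration if the file is empty.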

You can simply use dict plus zip if you already have a pandas.DataFrame called df (note that the values come out as lists rather than tuples):

>>> dict(zip(df['Item'], df[['Price', 'Calories', 'Category']].values.tolist()))
{'Orange': [1.99, 60, 'Fruit'], 'Cereal': [3.99, 110, 'Box Food'], 'Ice Cream': [6.95, 200, 'Dessert']}
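Since the question asks for tuples, one possible variant of the same dict plus zip idea uses DataFrame.itertuples with index=False and name=None, which yields plain tuples per row. The name as_tuples is only illustrative, and the DataFrame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    [["Orange", 1.99, 60, "Fruit"],
     ["Cereal", 3.99, 110, "Box Food"],
     ["Ice Cream", 6.95, 200, "Dessert"]],
    columns=["Item", "Price", "Calories", "Category"],
)

# itertuples(index=False, name=None) yields one plain tuple per row,
# without the index and without building a namedtuple.
value_cols = df[["Price", "Calories", "Category"]]
as_tuples = dict(zip(df["Item"], value_cols.itertuples(index=False, name=None)))

print(as_tuples)
```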
