Reading a CSV file into a dictionary?

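First off, I'd like to say that I'm not asking you to write the code. I just want to discuss and get feedback on the best way to write this program, because I'm stuck on how to break the problem down.

My program is supposed to open a CSV file that contains 7 columns: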

Name of the state,Crop,Crop title,Variety,Year,Unit,Value. 

Here is part of the file:

Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2012,Percent of all corn planted,60
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2013,Percent of all corn planted,73
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2014,Percent of all corn planted,78
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2015,Percent of all corn planted,76
Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2016,Percent of all corn planted,75
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2001,Percent of all corn planted,12
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2002,Percent of all corn planted,13
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2003,Percent of all corn planted,16
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2004,Percent of all corn planted,21
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2005,Percent of all corn planted,26
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2006,Percent of all corn planted,40
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2007,Percent of all corn planted,59
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2008,Percent of all corn planted,78
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2009,Percent of all corn planted,79
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2010,Percent of all corn planted,83
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2011,Percent of all corn planted,85
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2012,Percent of all corn planted,84
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2013,Percent of all corn planted,85
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2014,Percent of all corn planted,88
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2015,Percent of all corn planted,88
Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2016,Percent of all corn planted,86

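Then it should read each line into a dictionary. There are a lot of lines in this text file, and the only ones I want/need are the lines that say "All GE varieties" in the Variety column. Note that each state also has multiple lines. The next step is to take user input for a crop and look only at the data for that crop. The final step is to find (for each state) the maximum and minimum values and their corresponding years, and print them out.

The way I was thinking of doing it was to maybe create a set for each line, check whether "All GE varieties" is in the set, and if so add it to the dictionary. Then do something similar for the crop?

My biggest hangups are probably: 1.) I don't know how to ignore the lines that don't contain "All GE varieties". Do I do that before or after I create the dictionary? And 2.) I know how to create a dictionary with one key and one value, but how would I add the rest of the values to the key? Do you do it with a set? Or a list?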

Determining whether a string contains "All GE varieties" is relatively simple. Use the in keyword:

with open(datafile, 'r') as infile:
    for line in infile:
        if "All GE varieties" in line:
            # put the line into your data structure here
            pass
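
One caveat with the substring test is that it matches the text anywhere in the line, not just in the Variety column. A minimal sketch of a stricter check follows, assuming the column order given in the question; the helper name wanted and the crop comparison are illustrative, not part of the original answer. It also covers the question's crop-filtering step.

```python
def wanted(line, crop):
    """Return True if the line is an 'All GE varieties' row for the given crop."""
    fields = line.strip().split(',')
    # Column order from the question: state, crop, crop title, variety, year, unit, value.
    return (len(fields) == 7
            and fields[3] == 'All GE varieties'
            and fields[1].lower() == crop.lower())

# Two sample lines copied from the question's data.
lines = [
    "Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2012,Percent of all corn planted,60\n",
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11\n",
]

crop = 'corn'  # in the real program this could come from input()
kept = [line for line in lines if wanted(line, crop)]
print(len(kept))
```

A plain split(',') would break on quoted fields that contain commas; the csv module handles that case.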

As for the data structure, I tend to favor a list of dictionaries, where each dictionary has a defined set of keys:

myList = [ {}, {}, {}, ... ]

The catch in this case is that if every field is a value, I'm not sure what you would use as keys. Also keep in mind that the split() method can help you:

varieties = []
with open(datafile, 'r') as infile:
    for line in infile:
        if "All GE varieties" in line:
            # strip the trailing newline before splitting into fields
            varieties.append(line.strip().split(','))

This gives you a list (varieties) of lists, where each inner list holds the individual fields of one line.

Like this:

varieties = [['Indiana','Corn','Genetically engineered (GE) corn','All GE varieties','2000','Percent of all corn planted','11'], ['Indiana','Corn','Genetically engineered (GE) corn','All GE varieties','2001','Percent of all corn planted','12'], ... ]

From here it is fairly easy to use indexing and slicing (as on a 2D array) to select a state, a year, and so on.
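
As a rough sketch of that selection step, the snippet below picks columns by position and filters by state. The index constants and the key names in fieldnames are assumptions made for illustration; only the column order comes from the question.

```python
# Two sample rows in the list-of-lists shape shown above.
varieties = [
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties',
     '2000', 'Percent of all corn planted', '11'],
    ['Indiana', 'Corn', 'Genetically engineered (GE) corn', 'All GE varieties',
     '2001', 'Percent of all corn planted', '12'],
]

# Column positions, taken from the header order given in the question.
STATE, YEAR, VALUE = 0, 4, 6

# Pick one column out of every row.
years = [row[YEAR] for row in varieties]

# Keep only the rows for one state, as (year, value) integer pairs.
indiana = [(int(row[YEAR]), int(row[VALUE]))
           for row in varieties if row[STATE] == 'Indiana']

# Hypothetical key names: pairing them with each row gives a list of dicts.
fieldnames = ['state', 'crop', 'crop_title', 'variety', 'year', 'unit', 'value']
records = [dict(zip(fieldnames, row)) for row in varieties]

print(years)
print(indiana)
print(records[0]['year'], records[0]['value'])
```

Zipping the rows with key names is one way to get the list of dictionaries mentioned earlier, with the header fields acting as keys.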

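As already mentioned, you can use the csv module to read the CSV file. I'm not sure how you want to structure the data below the state key, but I think it is best to be able to look up each specific crop_title and then access the value for each year individually: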

In[33]: from collections import defaultdict
   ...: from csv import reader
   ...: 
   ...: crops = defaultdict(lambda: defaultdict(dict))
   ...: with open('hmm.csv', 'r') as csvfile:
   ...:     cropreader = reader(csvfile)
   ...:     for row in cropreader:
   ...:         state, crop_type, crop_title, variety, year, unit, value = row
   ...:         if variety == 'All GE varieties':
   ...:             crops[state][crop_title][year] = value
   ...: 
In[34]: crops
Out[34]: 
defaultdict(<function __main__.<lambda>>,
            {'Indiana': defaultdict(dict,
                         {'Genetically engineered (GE) corn': {'2000': '11',
                           '2001': '12',
                           '2002': '13',
                           '2003': '16',
                           '2004': '21',
                           '2005': '26',
                           '2006': '40',
                           '2007': '59',
                           '2008': '78',
                           '2009': '79',
                           '2010': '83',
                           '2011': '85',
                           '2012': '84',
                           '2013': '85',
                           '2014': '88',
                           '2015': '88',
                           '2016': '86'}})})
In[35]: crops['Indiana']['Genetically engineered (GE) corn']['2000']
Out[35]: '11'
In[36]: crops['Indiana']['Genetically engineered (GE) corn']['2015']
Out[36]: '88'

You could also convert year and value to integers, e.g. crops[state][crop_title][int(year)] = int(value), which lets you make calls like the following (the returned value is an integer):

In[38]: crops['Indiana']['Genetically engineered (GE) corn'][2015]
Out[38]: 88
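
With integer years and values, the minimum and maximum per state can be read straight off the nested dictionary. The following is a small sketch under that assumption; the helper name extremes is made up for illustration, and a few rows from the question are inlined through StringIO so it runs without a file.

```python
from collections import defaultdict
from csv import reader
from io import StringIO

# A few rows from the question, inlined so the sketch runs without a file.
sample = StringIO(
    "Indiana,Corn,Genetically engineered (GE) corn,Stacked gene varieties,2012,Percent of all corn planted,60\n"
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2000,Percent of all corn planted,11\n"
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2008,Percent of all corn planted,78\n"
    "Indiana,Corn,Genetically engineered (GE) corn,All GE varieties,2016,Percent of all corn planted,86\n"
)

crops = defaultdict(lambda: defaultdict(dict))
for state, crop_type, crop_title, variety, year, unit, value in reader(sample):
    if variety == 'All GE varieties':
        crops[state][crop_title][int(year)] = int(value)

def extremes(by_year):
    """Return ((min_year, min_value), (max_year, max_value)) for a {year: value} dict."""
    low_year = min(by_year, key=by_year.get)
    high_year = max(by_year, key=by_year.get)
    return (low_year, by_year[low_year]), (high_year, by_year[high_year])

for state, titles in crops.items():
    for crop_title, by_year in titles.items():
        low, high = extremes(by_year)
        print(state, crop_title, 'min', low, 'max', high)
```

When several years share the same value, min and max return the first one encountered, which here means the earliest row in the file.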

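I put your data into a file named "crop_data.csv". Here is some code that uses the standard csv module to read each row into its own dictionary. We use a simple if test to make sure we only keep rows where 'Variety' == 'All GE varieties', and we store the data for each state in all_data, which is a dictionary of lists, one list per state. Because the state 'Name' is used as the key in all_data, we don't need to keep it in the row dict; similarly, we can discard 'Variety' because we no longer need that information.

Once all the data has been collected, we can print it nicely using the json module.

Then we loop over all_data, state by state, and compute its maximum and minimum values.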

import csv
from collections import defaultdict
import json

filename = 'crop_data.csv'

fieldnames = 'Name,Crop,Title,Variety,Year,Unit,Value'.split(',')

all_data = defaultdict(list)

with open(filename) as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=fieldnames)
    for row in reader:
        # We only want 'All GE varieties'
        if row['Variety'] == 'All GE varieties':
            state = row['Name']
            # Get rid of unneeded fields
            del row['Name'], row['Variety']
            # Store it as a plain dict
            all_data[state].append(dict(row))

# Show all the data
print(json.dumps(all_data, indent=4))

#Find minimums & maximums

# Extract the 'Value' field from dict d and convert it to a number
def value_key(d):
    return int(d['Value'])

for state, data in all_data.items():
    print(state)
    row = min(data, key=value_key)
    print('min', row['Value'], row['Year'])

    row = max(data, key=value_key)
    print('max', row['Value'], row['Year'])

Output

{
    "Indiana": [
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2000",
            "Unit": "Percent of all corn planted",
            "Value": "11"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2001",
            "Unit": "Percent of all corn planted",
            "Value": "12"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2002",
            "Unit": "Percent of all corn planted",
            "Value": "13"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2003",
            "Unit": "Percent of all corn planted",
            "Value": "16"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2004",
            "Unit": "Percent of all corn planted",
            "Value": "21"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2005",
            "Unit": "Percent of all corn planted",
            "Value": "26"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2006",
            "Unit": "Percent of all corn planted",
            "Value": "40"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2007",
            "Unit": "Percent of all corn planted",
            "Value": "59"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2008",
            "Unit": "Percent of all corn planted",
            "Value": "78"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2009",
            "Unit": "Percent of all corn planted",
            "Value": "79"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2010",
            "Unit": "Percent of all corn planted",
            "Value": "83"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2011",
            "Unit": "Percent of all corn planted",
            "Value": "85"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2012",
            "Unit": "Percent of all corn planted",
            "Value": "84"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2013",
            "Unit": "Percent of all corn planted",
            "Value": "85"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2014",
            "Unit": "Percent of all corn planted",
            "Value": "88"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2015",
            "Unit": "Percent of all corn planted",
            "Value": "88"
        },
        {
            "Crop": "Corn",
            "Title": "Genetically engineered (GE) corn",
            "Year": "2016",
            "Unit": "Percent of all corn planted",
            "Value": "86"
        }
    ]
}
Indiana
min 11 2000
max 88 2014

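Note that this data has two years with a value of 88. If you want to break ties by year, you can use a fancier key function than value_key. Alternatively, you can use value_key to sort the whole data list for a state, which makes it easy to extract all of the lowest and highest records. For example, inside the for state, data loop, do: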

data.sort(key=value_key)
print(json.dumps(data, indent=4))

That prints all of that state's records in ascending numeric order of value.
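
As a sketch of the tie-breaking idea, a key function can return a tuple so that records with equal values are compared by year. The name value_then_year and the three sample records are illustrative; the records only mimic the shape of the entries in all_data.

```python
# Three records in the same shape as the entries of all_data['Indiana'].
data = [
    {'Year': '2000', 'Value': '11'},
    {'Year': '2014', 'Value': '88'},
    {'Year': '2015', 'Value': '88'},
]

def value_then_year(d):
    # Compare by value first, then by year, so ties on value go to the later year.
    return int(d['Value']), int(d['Year'])

latest_max = max(data, key=value_then_year)

# Alternatively, collect every record that shares the highest value.
top = max(int(d['Value']) for d in data)
all_max = [d for d in data if int(d['Value']) == top]

print(latest_max['Year'])
print([d['Year'] for d in all_max])
```

Used with min, the same key would pick the earliest year among records tied for the lowest value.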

