
Python 2.7: extracting only specific columns from a CSV file

Please excuse me, I'm fairly new to programming and trying to do something simple, but I can't seem to figure it out. It's probably something obvious.

I need to take a huge CSV file with about 6 columns, parse it, and extract only 2 columns into a dictionary, which I will later use to build an API call with a JSON payload. Any extra data will cause the call to fail.

I need to create a dictionary from the CSV file populated with only selected columns, let's say column1 and column5, preserving the key/value structure. So far I have been able to output only keys, only values, or all keys and values, but not a specific subset of key/value pairs.

I need to achieve this using standard Python 2.7 and the csv module, nothing extra such as pandas, as I have to work with what I have. I know I'm missing something obvious but just can't figure it out. Help is greatly appreciated.

Source file example:

column1,column2,column3,column4,column5
joe,43,888-123-4567,seattle,toyota
bill,18,888-123-4567,vancouver,gm
sally,32,888-987-1234,la,ford

Desired output, as a list of dictionaries:

[{'column1':'joe', 'column5':'toyota'},{'column1':'bill', 'column5':'gm'},{'column1':'sally', 'column5':'ford'}]

Code snippet:

import csv

def parseSourceFile(filename):
    filtered_data = {}
    reader = csv.DictReader(open(filename, "rb"))

    # some for loop here extracting only column1 and column5 with their values appending to filtered_data

    return filtered_data

def main():
    readerObj = parseSourceFile('somefile.csv')
    for row in readerObj:
        print row     #at this point i only want columns1,5 k,v data

if __name__ == '__main__':
    main()

You can store the keys you need in a list, and then for every row you read from the CSV file use a dict comprehension to pick out just those keys:

import csv
import pprint

KEYS = [
    'column1',
    'column5'
]

def parseSourceFile(filename):
    with open(filename) as f:
        reader = csv.DictReader(f)
        return [{key: row[key] for key in KEYS} for row in reader]

pprint.pprint(parseSourceFile('somefile.csv'))

Output:

[{'column1': 'joe', 'column5': 'toyota'},
 {'column1': 'bill', 'column5': 'gm'},
 {'column1': 'sally', 'column5': 'ford'}]
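
One thing the comprehension above does not guard against is a requested column that is missing from the header, which would surface as a KeyError on the first row. A minimal sketch of a variant that checks the header first, using only the csv module: the name `select_columns` and the in-memory `sample` lines are made up for illustration, and the function takes any iterable of lines (an open file object works too).

```python
import csv

def select_columns(lines, keys):
    """Return a list of dicts holding only `keys` from each CSV row.

    `lines` is any iterable of CSV lines, e.g. an open file object.
    (Hypothetical helper, not part of the csv module.)
    """
    reader = csv.DictReader(lines)
    # fieldnames is populated from the header row on first access;
    # it is None for an empty input, hence the `or []`.
    header = reader.fieldnames or []
    missing = [key for key in keys if key not in header]
    if missing:
        raise ValueError("columns not found in header: %s" % ", ".join(missing))
    return [{key: row[key] for key in keys} for row in reader]

# Sample data taken from the question, held in memory instead of a file.
sample = [
    "column1,column2,column3,column4,column5",
    "joe,43,888-123-4567,seattle,toyota",
    "bill,18,888-123-4567,vancouver,gm",
]
rows = select_columns(sample, ["column1", "column5"])
print(rows)
```

With a real file you would pass the open file object instead of `sample`, for example inside a `with open(filename) as f:` block.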

What about something like this:

import csv

def parseSourceFile(filename):
    reader = csv.DictReader(open(filename, "r"))

    result = []
    for row in reader:
        result.append({k:v for (k,v) in row.items() if k in ['column1', 'column5']})

    return result

def main():
    result = parseSourceFile('so.csv')

    # Print what you wrote you expected
    print(result)

    # Or iterate over the list elements and print each on separate lines
    for row in result:
        print(row)

if __name__ == '__main__':
    main()

Output:

[{'column1': 'joe', 'column5': 'toyota'}, {'column1': 'bill', 'column5': 'gm'}, {'column1': 'sally', 'column5': 'ford'}]

{'column1': 'joe', 'column5': 'toyota'}
{'column1': 'bill', 'column5': 'gm'}
{'column1': 'sally', 'column5': 'ford'}
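
Since the question mentions a JSON payload, here is a rough sketch of how the filtered list could be serialized with the standard json module. The `rows` literal stands in for whatever `parseSourceFile` returns, and the shape the API actually expects is an assumption, as the question does not say.

```python
import json

# Stand-in for the list returned by parseSourceFile (assumed shape).
rows = [
    {'column1': 'joe', 'column5': 'toyota'},
    {'column1': 'bill', 'column5': 'gm'},
    {'column1': 'sally', 'column5': 'ford'},
]

# Serialize to a JSON string; sort_keys just makes the output stable.
payload = json.dumps(rows, sort_keys=True)
print(payload)
```

Because only the selected columns are in `rows`, nothing else ends up in the payload.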
