Python 2.7: extracting only specific columns from a CSV file
Please excuse me, I'm fairly new to programming and trying to do something simple, but I can't seem to figure it out. It's probably something obvious.
I need to take a huge CSV file with about 6 columns, parse it, and extract only 2 columns into a dictionary, which I will later use to build an API call with a JSON payload. Any extra data will cause the call to fail.
I need to create a dictionary from the CSV file populated with only selected columns, let's say column1 and column5, preserving the key/value structure. So far I have been able to output only keys, only values, or all keys and values, but not specific key/value pairs.
I need to achieve this using standard Python 2.7 and the csv module, nothing extra such as pandas, as I have to work with what I have. I know I'm missing something obvious but just can't figure it out. Help is greatly appreciated.
Source file example:
column1,column2,column3,column4,column5
joe,43,888-123-4567,seattle,toyota
bill,18,888-123-4567,vancouver,gm
sally,32,888-987-1234,la,ford
Desired output (a list of dictionaries):
[{'column1':'joe', 'column5':'toyota'},{'column1':'bill', 'column5':'gm'},{'column1':'sally', 'column5':'ford'}]
Code snippet:
import csv

def parseSourceFile(filename):
    filtered_data = []
    reader = csv.DictReader(open(filename, "rb"))
    # some for loop here extracting only column1 and column5 with their values, appending to filtered_data
    return filtered_data

def main():
    readerObj = parseSourceFile('somefile.csv')
    for row in readerObj:
        print row  # at this point I only want the column1 and column5 key/value data

if __name__ == '__main__':
    main()
You can store the keys you need in a list, and then for every row you read from the CSV file, use a dict comprehension to pick only those keys:
import csv
import pprint

# Columns to keep from each row
KEYS = [
    'column1',
    'column5'
]

def parseSourceFile(filename):
    with open(filename) as f:
        reader = csv.DictReader(f)
        # Build one dict per row containing only the wanted keys
        return [{key: row[key] for key in KEYS} for row in reader]

pprint.pprint(parseSourceFile('somefile.csv'))
Output:
[{'column1': 'joe', 'column5': 'toyota'},
{'column1': 'bill', 'column5': 'gm'},
{'column1': 'sally', 'column5': 'ford'}]
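One thing to watch with the approach above: row[key] raises a KeyError on the first data row if a requested column is not in the header. The following is a minimal sketch, not part of the original answer, that checks the header up front. The function name select_columns and the inline sample data are made up for illustration, and the reader is fed a list of lines so the example runs without a file on disk.

```python
import csv

KEYS = ['column1', 'column5']

def select_columns(lines, keys=KEYS):
    """Return one dict per CSV row, keeping only the columns named in keys."""
    reader = csv.DictReader(lines)
    # fieldnames is populated from the header row on first access
    header = reader.fieldnames or []
    missing = [key for key in keys if key not in header]
    if missing:
        raise ValueError('missing columns: %s' % ', '.join(missing))
    return [{key: row[key] for key in keys} for row in reader]

# Sample data copied from the question (assumption: the real file looks like this)
sample = """column1,column2,column3,column4,column5
joe,43,888-123-4567,seattle,toyota
bill,18,888-123-4567,vancouver,gm
sally,32,888-987-1234,la,ford"""

rows = select_columns(sample.splitlines())
print(rows)
```

With a real file you would pass the open file object instead of sample.splitlines(), since csv.DictReader accepts any iterable of lines.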
What about something like this:
import csv

def parseSourceFile(filename):
    result = []
    # Use a with block so the file is closed once reading is done
    with open(filename, "r") as f:
        reader = csv.DictReader(f)
        for row in reader:
            # Keep only the two wanted columns from each row
            result.append({k: v for (k, v) in row.items() if k in ['column1', 'column5']})
    return result

def main():
    result = parseSourceFile('so.csv')
    # Print the whole list, as in your expected output
    print(result)
    # Or iterate over the list elements and print each on a separate line
    for row in result:
        print(row)

if __name__ == '__main__':
    main()
Output:
[{'column1': 'joe', 'column5': 'toyota'}, {'column1': 'bill', 'column5': 'gm'}, {'column1': 'sally', 'column5': 'ford'}]
{'column1': 'joe', 'column5': 'toyota'}
{'column1': 'bill', 'column5': 'gm'}
{'column1': 'sally', 'column5': 'ford'}
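Since the question mentions a huge file and a JSON payload, here is a rough sketch of how the same filtering could be done one row at a time with a generator and then serialized with the standard json module. It is only an illustration: the name iter_selected and the inline sample lines are hypothetical, and whether the API wants one payload per row or a single list is an assumption.

```python
import csv
import json

def iter_selected(lines, keys):
    """Yield one filtered dict per CSV row without building the full list."""
    for row in csv.DictReader(lines):
        yield {key: row[key] for key in keys}

# Stand-in for an open file object (assumption: same layout as the question's sample)
sample = [
    'column1,column2,column3,column4,column5',
    'joe,43,888-123-4567,seattle,toyota',
    'bill,18,888-123-4567,vancouver,gm',
    'sally,32,888-987-1234,la,ford',
]

# One JSON string per row; sort_keys keeps the key order stable
payloads = [json.dumps(record, sort_keys=True)
            for record in iter_selected(sample, ['column1', 'column5'])]

for payload in payloads:
    print(payload)
```

If the API expects all records in one call, json.dumps(list(iter_selected(...))) would produce a single array instead.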