Python: create a list of dictionaries from a CSV on S3

Question

I am trying to take a CSV stored on S3 and create a list of dictionaries from it in Python. The code is as follows:

import os
import boto3
import csv
import json
from io import StringIO
import logging
import time

s3 = boto3.resource('s3')
s3Client = boto3.client('s3','us-east-1')

bucket = 'some-bucket'
key = 'some-key'

obj = s3Client.get_object(Bucket = bucket, Key = key)
lines = obj['Body'].read().decode('utf-8').splitlines(True)

newl = []

for line in csv.reader(lines, quotechar='"', delimiter=',',quoting=csv.QUOTE_ALL,skipinitialspace=True, escapechar="\\"):
    newl.append(line)

fieldnames = newl[0]
newl1 = newl[1:]

reader = csv.DictReader(newl1,fieldnames)
out = json.dumps([row for row in reader])
jlist1 = json.loads(out)

But this gives me the error:

iterator should return strings, not list (did you open the file in text mode?)

If I alter the for loop to this:

for line in csv.reader(lines, quotechar='"', delimiter=',',quoting=csv.QUOTE_ALL,skipinitialspace=True, escapechar="\\"):
    newl.append(','.join(line))

then it works. However, some fields have commas in them, so this completely screws up the schema and shifts the data. For example:

|address1   |address2  |state|
------------------------------
|123 Main st|APT 3, Fl1|TX   |

becomes:

|address1   |address2  |state|null|
-----------------------------------
|123 Main st|APT 3     |Fl1  |TX  |

Where am I going wrong?

Answer 1

The problem is that you are building a list of lists here:

 newl.append(line)

and as the error says: iterator should return strings, not list

so try to cast line as a string:

newl.append(str(line))

Hope this helps :)
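
To see the difference concretely, here is a minimal sketch that uses a hypothetical two-line sample in place of the S3 object (the sample data and the `try_dictreader` helper are assumptions for illustration, not part of the original code). It feeds `csv.DictReader` three inputs: the already-parsed lists, the re-joined strings, and the original decoded lines. Note that `str(line)` on a parsed row yields the list's repr, such as `['123 Main st', 'APT 3, Fl1', 'TX']`, so it is probably not what you want to hand to `DictReader` either.

```python
import csv

# Hypothetical stand-in for obj['Body'].read().decode('utf-8').splitlines(True)
lines = [
    '"address1","address2","state"\n',
    '"123 Main st","APT 3, Fl1","TX"\n',
]

def try_dictreader(rows):
    """Run csv.DictReader over rows; return the dicts or the error text."""
    try:
        return [dict(r) for r in csv.DictReader(rows)]
    except csv.Error as exc:
        return str(exc)

parsed = list(csv.reader(lines))                    # list of lists of fields
rejoined = [','.join(fields) for fields in parsed]  # the quotes are lost here

from_lists = try_dictreader(parsed)       # fails: DictReader wants strings
from_rejoined = try_dictreader(rejoined)  # parses, but 'APT 3, Fl1' is split
from_lines = try_dictreader(lines)        # quoted comma stays in one field

print(from_lists)
print(from_rejoined)
print(from_lines)
```

In this sketch the extra field ends up under the `None` key, which `json.dumps` writes as `null`, matching the shifted table in the question. Passing the decoded lines straight to `csv.DictReader` avoids both problems.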

Answer 2

I ended up changing the code to this:

obj = s3Client.get_object(Bucket = bucket, Key = key)
lines1 = obj['Body'].read().decode('utf-8').split('\n')
fieldnames = lines1[0].replace('"','').split(',')
testls = [row for row in csv.DictReader(lines1[1:], fieldnames)]
out = json.dumps([row for row in testls])
jlist1 = json.loads(out)

And got the desired result.
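
A possible variant, sketched under some assumptions: splitting on `'\n'` and building `fieldnames` by hand can leave a trailing `'\r'` on the last header name for CRLF files, and may mishandle quoted fields that contain line breaks. Letting `csv.DictReader` read the header row itself sidesteps that. The helper names `csv_text_to_dicts` and `load_csv_from_s3` and the sample data below are hypothetical; the S3 call mirrors the `get_object` usage above and needs boto3 plus valid AWS credentials to run.

```python
import csv
import io
import json

def csv_text_to_dicts(text):
    """Parse CSV text whose first row is the header into a list of dicts."""
    # newline='' leaves line endings untouched so the csv module can deal
    # with CRLF files and quoted fields that contain line breaks.
    buffer = io.StringIO(text, newline='')
    reader = csv.DictReader(buffer, skipinitialspace=True)
    return [dict(row) for row in reader]

def load_csv_from_s3(bucket, key, region='us-east-1'):
    """Hypothetical helper: fetch an S3 object and parse it as CSV."""
    import boto3  # imported lazily so the parser above runs without boto3
    client = boto3.client('s3', region_name=region)
    body = client.get_object(Bucket=bucket, Key=key)['Body'].read()
    return csv_text_to_dicts(body.decode('utf-8'))

# Hypothetical sample with CRLF line endings and a comma inside a quoted field.
sample = '"address1","address2","state"\r\n"123 Main st","APT 3, Fl1","TX"\r\n'
rows = csv_text_to_dicts(sample)
print(json.dumps(rows))
```

The result is already a list of plain dictionaries, so the `json.dumps` / `json.loads` round trip is only needed if you actually want a JSON string.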

Answer 1
1 2020-02-28 18:59:41

Answer 2
0 Accepted 2020-02-28 18:47:31
