How to read a JSON string from a line in a CSV file?

I'm new to MapReduce and MRjob, and I am trying to read a CSV file that I want to process using MRjob in Python. The file has about 5 columns containing JSON strings (e.g. {}) or arrays of JSON strings (e.g. [{},{}]), and some of them are nested.

My mapper so far looks as follows:

from mrjob.job import MRJob
import csv
from io import StringIO

class MRWordCount(MRJob):
    def mapper(self, _, line):
        l = StringIO(line)
        reader = csv.reader(l) # returns a generator.

        for cols in reader:
            columns = cols

        yield None, columns

I get the error:

_csv.Error: field larger than field limit (131072)

That seems to happen because my code also splits the JSON strings into separate columns (because of the commas inside them).

How do I make sure the JSON strings are not split? Maybe I'm overlooking something?

Alternatively, are there any other ways to read this file with MRjob that would make this process simpler or cleaner?

Your JSON string is not surrounded by quote characters, so every comma in that field makes the csv engine think it's a new column. What you are looking for is quotechar: change your data so that your JSON is surrounded by a special character (the default is ") and adjust your csv reader accordingly.
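To illustrate the point about quotechar, here is a minimal sketch rather than a definitive fix. It assumes the data can be rewritten so that each JSON field is wrapped in single quotes and that the JSON itself contains no single quotes. The helper name parse_line and the sample rows are hypothetical and are not taken from the original file.

```python
import csv
import json
from io import StringIO


def parse_line(line, quotechar='"'):
    """Parse a single CSV line and return its fields as a list of strings."""
    reader = csv.reader(StringIO(line), quotechar=quotechar)
    return next(reader)


# Hypothetical sample rows (not from the original data).
# Unquoted JSON: the comma inside the object is treated as a column delimiter.
unquoted = '1,{"a": 1, "b": 2},x'

# Same row, but the JSON field is wrapped in single quotes,
# and the reader is told to use ' as its quotechar.
quoted = "1,'{\"a\": 1, \"b\": 2}',x"

unquoted_fields = parse_line(unquoted)               # JSON is split into two pieces
quoted_fields = parse_line(quoted, quotechar="'")    # JSON stays in one field

# Once the JSON arrives as a single field, it can be decoded normally.
payload = json.loads(quoted_fields[1])

print(len(unquoted_fields), unquoted_fields)
print(len(quoted_fields), quoted_fields)
print(payload)
```

Inside an MRjob mapper, the same csv.reader call with a matching quotechar could replace the default reader. If the file instead keeps the default " as quotechar, the double quotes inside each JSON field would need to be doubled ("") in the data, which is what csv.writer does when it writes such a field.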
