
Writing multiple JSON to CSV in Python - Dictionary to CSV

I am using Tweepy to stream tweets and would like to record them in CSV format so I can play around with them or load them into a database later. Please keep in mind that I am a noob, but I do realize there are multiple ways of handling this (suggestions are very welcome).

Long story short, I need to convert multiple Python dictionaries and append them to a CSV file. I have already done my research (How do I write a Python dictionary to a csv file?) and tried doing this with the DictWriter and writer methods.

However, there are a few more things that need to be accomplished:

1) Write the keys as a header only once.

2) As each new tweet is streamed, its values need to be appended without overwriting previous rows.

3) If a value is missing, record NULL.

4) Skip/fix ascii codec errors.

Here is the format I would like to end up with (each value in its own cell):

Header1_Key_1 Header2_Key_2 Header3_Key_3...

Row1_Value_1 Row1_Value_2 Row1_Value_3...

Row2_Value_1 Row2_Value_2 Row2_Value_3...

Row3_Value_1 Row3_Value_2 Row3_Value_3...

Row4_Value_1 Row4_Value_2 Row4_Value_3...

Here is my code:

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):

    def on_data(self, data):
        json_data = json.loads(data)

        data_header = json_data.keys()
        data_row = json_data.values()

        try:
            with open('csv_tweet3.csv', 'wb') as f:
                w = csv.DictWriter(f, data_header)
                w.writeheader(data_header)
                w.writerow(json_data)
        except BaseException, e:
            print 'Something is wrong', str(e)

        return True

    def on_error(self, status):
        print status

if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=['world cup'])

Thank you in advance!

I have done a similar thing with Facebook's Graph API (the facepy module)!

from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import csv
import json

consumer_key="XXXX"
consumer_secret="XXXX"
access_token="XXXX"
access_token_secret="XXXX"

class StdOutListener(StreamListener):
    _headers = None
    def __init__(self,headers,*args,**keys):
        StreamListener.__init__(self,*args,**keys)
        self._headers = headers

    def on_data(self, data):
        json_data = json.loads(data)

        #data_header = json_data.keys()
        #data_row = json_data.values()

        try:
            with open('csv_tweet3.csv', 'ab') as f: # a for append
                w = csv.writer(f)
                # write!
                # build a list: the Python 2 csv writer expects a sequence
                w.writerow([self._valToStr(json_data[header])
                            if header in json_data else ''
                            for header in self._headers])
        except Exception, e:
            print 'Something is wrong', str(e)

        return True

    @staticmethod
    def _valToStr(o):
        # json returns a set number of datatypes - parse accordingly
        # https://docs.python.org/2/library/json.html#encoders-and-decoders
        if type(o)==unicode: return StdOutListener._removeNonASCII(o)
        elif type(o)==bool: return str(o)
        elif o is None: return ''
        elif ...
        ...

    @staticmethod
    def _removeNonASCII(s):
        # drop every character outside the 7-bit ASCII range
        return ''.join(i if ord(i)<128 else '' for i in s)

    def on_error(self, status):
        print status

if __name__ == '__main__':
    headers = ['look','at','twitter','api',
               'to','find','all','possible',
               'keys']

    # initialize csv file with header info
    with open('csv_tweet3.csv', 'wb') as f:
        w = csv.writer(f)
        w.writerow(headers)

    l = StdOutListener(headers)
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    stream = Stream(auth, l)
    stream.filter(track=['world cup'])

It's not copy&paste ready, but it's clear enough that you should be able to finish it.
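
As a rough illustration of how the pieces could fit together, here is a minimal sketch in Python 3 (an assumption on my part, since the code above is Python 2). It uses `csv.DictWriter` with `restval` to fill in NULL for missing keys and `extrasaction="ignore"` to drop keys that are not in the header list. The `HEADERS` list, `to_cell`, and `append_tweet` are hypothetical names for illustration, not part of tweepy. The header check relies on `tell()` returning 0 for an empty stream, which I would expect to hold for a file opened in append mode, but treat that as an assumption to verify.

```python
import csv
import io
import json

# Hypothetical column list; substitute the keys you actually want to keep.
HEADERS = ["id_str", "text", "lang"]


def to_cell(value):
    """Convert one decoded JSON value into text for a CSV cell."""
    if value is None:
        return "NULL"
    if isinstance(value, (dict, list)):
        # Keep nested objects in a single cell by re-serialising them.
        return json.dumps(value, ensure_ascii=False)
    return str(value)


def append_tweet(stream, raw_json, headers=HEADERS):
    """Append one tweet as a CSV row; write the header only to an empty stream."""
    tweet = json.loads(raw_json)
    writer = csv.DictWriter(
        stream, fieldnames=headers, restval="NULL", extrasaction="ignore"
    )
    if stream.tell() == 0:
        writer.writeheader()
    # Keys absent from the tweet are left out here so restval fills them in.
    writer.writerow({k: to_cell(tweet[k]) for k in headers if k in tweet})


# In-memory demo; for a real file, something like
# open("csv_tweet3.csv", "a", newline="", encoding="utf-8") would be used instead.
demo = io.StringIO()
append_tweet(demo, '{"id_str": "1", "text": "caf\u00e9", "lang": null, "extra": 1}')
append_tweet(demo, '{"id_str": "2", "text": "hello"}')
rows = list(csv.reader(io.StringIO(demo.getvalue())))
```

Writing text as UTF-8 in Python 3 should sidestep the ascii codec errors without stripping characters, so the non-ASCII filter becomes optional rather than required.
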
For performance, you may want to look into opening the file, writing several records, and then closing the file. This way you're not constantly opening the file, initializing the csv writer, appending, and then closing the file for every tweet. I'm not familiar with the tweepy API, so I'm not sure exactly how this would work, but it's worth looking into.
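
One possible shape for that batching idea, again as a Python 3 sketch under my own assumptions: rows are collected in memory and written in one go once a threshold is reached. `BufferedCsvAppender` and `batch_size` are hypothetical names, not something tweepy or the csv module provides, and how it would be wired into the listener is left open.

```python
import csv
import io


class BufferedCsvAppender:
    """Collect rows in memory and write them to the stream in batches."""

    def __init__(self, stream, batch_size=100):
        self._stream = stream
        self._writer = csv.writer(stream)
        self._batch_size = batch_size
        self._pending = []

    def add(self, row):
        """Queue one row; write the whole batch once it is full."""
        self._pending.append(row)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Write any queued rows and push them out to the underlying stream."""
        if self._pending:
            self._writer.writerows(self._pending)
            self._stream.flush()
            self._pending = []


# In-memory demo with a tiny batch size so the batching is visible.
out = io.StringIO()
appender = BufferedCsvAppender(out, batch_size=2)
appender.add(["1", "first"])
after_one = out.getvalue()      # still empty: the batch is not full yet
appender.add(["2", "second"])   # batch is full, so both rows are written
after_two = out.getvalue()
appender.add(["3", "third"])
appender.flush()                # write the leftover row before shutting down
```

The trade-off is that rows still sitting in the buffer are lost if the process dies, so a final `flush()` on shutdown matters.
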

If you run into any trouble, I'll be happy to help - enjoy!
