How to save results to a CSV file using a Python scraper?

I found this Python code that scrapes Twitter using custom search queries:

https://github.com/tomkdickinson/Twitter-Search-API-Python/blob/master/TwitterScraper.py

I want to store the results from this code in a CSV file.

I tried adding a CSV writer at around line 245, inside the for loop that prints the tweets matching my search query, but the resulting CSV file is blank:

def save_tweets(self, tweets):
    """
    Just prints out tweets
    :return: True always
    """
    for tweet in tweets:
        # Lets add a counter so we only collect a max number of tweets
        self.counter += 1
        if tweet['created_at'] is not None:
            t = datetime.datetime.fromtimestamp((tweet['created_at']/1000))
            fmt = "%Y-%m-%d %H:%M:%S"
            myCsvRow = log.info("%i [%s] - %s" % (self.counter, t.strftime(fmt), tweet['text']))
            fd = open('document.csv','a')
            fd.write(myCsvRow)
            fd.close()

    return True

Also, there is a comment in the code at around line 170 that says:

@abstractmethod
def save_tweets(self, tweets):
    """
    An abstract method that's called with a list of tweets.
    When implementing this class, you can do whatever you want with these tweets.
    """

How can I use this class to save the tweets?

Your problem appears to be this line:

myCsvRow = log.info("%i [%s] - %s" % (self.counter, t.strftime(fmt), tweet['text']))

Looking at the code on the GitHub page you're using, I can see that log is a Python logger. The purpose of log.info is to write the string it is given somewhere (e.g., the console, a file, or any combination of these or other places). It does not return that string (it returns None), so myCsvRow never holds the text you meant to write.
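As a quick check of that behavior, here is a minimal sketch using the standard logging module. The logger name "demo" and the sample values are made up for illustration; they are not taken from the scraper.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")  # hypothetical logger name, for illustration only

# Same call shape as in the question, with placeholder values
result = log.info("%i [%s] - %s" % (1, "2017-01-01 00:00:00", "some tweet text"))

# log.info emits the message to the configured handlers but returns None
print(result is None)  # True
```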

What you more likely want is:

myCsvRow = "%i [%s] - %s" % (self.counter, t.strftime(fmt), tweet['text'])

A couple of notes on that, though:

(1) You are not putting commas between the fields, even though comma-separated fields are the norm for CSVs (CSV = Comma-Separated Values), and

(2) It's actually somewhat risky to write out a CSV line by hand when one of your fields is free text that could contain commas. If you naively write the text as-is, a comma in the tweet itself would cause whatever is parsing the CSV to think the row has extra fields. Luckily, Python comes with a csv library that helps you avoid these kinds of problems.
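To make the two notes concrete, here is a rough sketch of how the loop from the question might look with csv.writer. The function name append_tweets_to_csv and its parameters are hypothetical, not part of the scraper, and it assumes each tweet is a dict with 'created_at' (milliseconds since the epoch, or None) and 'text' keys, as in the question's code. It targets Python 3.

```python
import csv
import datetime


def append_tweets_to_csv(tweets, path, start_count=0):
    """Append one CSV row per dated tweet; return the updated running count.

    Hypothetical helper: the name and signature are illustrative only.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    count = start_count
    # newline="" lets the csv module manage line endings itself
    with open(path, "a", newline="", encoding="utf-8") as fd:
        writer = csv.writer(fd)
        for tweet in tweets:
            count += 1
            if tweet["created_at"] is not None:
                t = datetime.datetime.fromtimestamp(tweet["created_at"] / 1000)
                # writerow adds the separators and quotes any field
                # that contains commas, quotes, or newlines
                writer.writerow([count, t.strftime(fmt), tweet["text"]])
    return count
```

Inside save_tweets, something like self.counter = append_tweets_to_csv(tweets, 'document.csv', self.counter) could stand in for the manual open/write/close calls. Opening the file once per batch rather than once per tweet is a design choice here, not a requirement.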

