How to remove retweets from a collected dataset

Question

I have a collected dataset of tweets in Python (Jupyter Notebook), but it contains many duplicate tweets. How can I remove them programmatically with Python?

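import csv
import tweepy

# `api` is assumed to be an already-authenticated tweepy.API object (tweepy 3.x;
# tweepy 4.x renamed api.search to api.search_tweets)

# newline='' is what the csv module expects; encoding='utf-8' lets the tweet
# text be written directly instead of as a b'...' bytes literal
csvFile = open('ua.csv', 'a', newline='', encoding='utf-8')
csvWriter = csv.writer(csvFile)

search_words = "corona"
date_since = "2020-10-13"

# "-filter:retweets" is a search operator that excludes retweets from results
new_search = search_words + " -filter:retweets"

# Pass new_search (not search_words) so the retweet filter is actually applied
for tweet in tweepy.Cursor(api.search, q=new_search, count=100,
                           lang="id",
                           since=date_since).items():
    print(tweet.created_at, tweet.text)
    csvWriter.writerow([tweet.created_at, tweet.text])

csvFile.close()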

Answer 1

While you iterate through the tweets, you can keep the IDs of the tweets you have already written in a set, and skip any tweet whose ID is already in it.

tweet_set = set()  # IDs of tweets already written
for tweet in tweepy.Cursor(api.search, q=search_words, count=100,
                           lang="id",
                           since=date_since).items():

    if tweet.id not in tweet_set:
        print(tweet.created_at, tweet.text)
        csvWriter.writerow([tweet.created_at, tweet.text.encode('utf-8')])

        tweet_set.add(tweet.id)  # remember this ID so later copies are skipped
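If the duplicates are already in the collected file, checking `tweet.id` while collecting no longer helps, so the sketch below cleans an existing CSV afterwards. It assumes the two-column layout (created_at, text) written by the code above; the function names `clean_text` and `dedupe_csv` and the output file name are hypothetical. Unlike the ID check, it treats rows with identical text as duplicates, which also catches the same text posted as separate tweets, and it optionally drops rows whose text starts with `RT @`, the prefix classic retweets carry in the v1.1 `text` field. When rows were written with `tweet.text.encode('utf-8')`, the csv module stores the `str()` of the bytes object (e.g. `b'...'`), so `clean_text` decodes that form back to plain text before comparing.

```python
import ast
import csv


def clean_text(cell):
    """Turn a cell like "b'...'" (str() of a bytes object) back into text."""
    if cell.startswith(("b'", 'b"')):
        try:
            value = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            return cell  # not actually a bytes literal; keep as-is
        if isinstance(value, bytes):
            return value.decode("utf-8", errors="replace")
    return cell


def dedupe_csv(src_path, dst_path, drop_retweets=True):
    """Copy src_path to dst_path, keeping the first row for each tweet text.

    Returns the number of rows written.
    """
    seen = set()
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src:
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                if len(row) < 2:
                    continue  # skip blank or malformed lines
                created_at, text = row[0], clean_text(row[1])
                if drop_retweets and text.startswith("RT @"):
                    continue  # classic retweets start with "RT @user:"
                if text in seen:
                    continue  # identical text already written
                seen.add(text)
                writer.writerow([created_at, text])
                kept += 1
    return kept
```

For example, `dedupe_csv('ua.csv', 'ua_dedup.csv')` writes a cleaned copy and returns the number of rows kept. Writing to a separate file rather than overwriting `ua.csv` keeps the raw data in case the duplicate rule turns out to be too aggressive.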

Answer 1
0 2020-10-15 02:19:18