
Transforming scraping data into dataframe (weibo tweets)

Good evening,

I just started using Python for a project (I want to collect social media data from different platforms and then analyze it), and I need to retrieve various tweet data from Weibo.

I chose to use this library for the job. Following the example on its website, my code is the following:

from weibo_scraper import  get_weibo_tweets_by_name
for tweet in get_weibo_tweets_by_name(name='嘻红豆'):
    print(tweet)

The result looks like this:

{'card_type': 9, 'itemid': '1076033637346297_-_4341063131108312', 'scheme': 'https://m.weibo.cn/status/HheeR4Ek0?mblogid=HheeR4Ek0&luicode=10000011&lfid=1076033637346297', 'mblog': {'created_at': '12小时前', 'id': '4341063131108312', 'idstr': '4341063131108312', 'mid': '4341063131108312', 'can_edit': False, 'show_additional_indication': 0, 'text': '行吧//<a href=\'/n/夏正正\'>@夏正正</a>:我没有,我没说过。<span class="url-icon"><img alt=[感冒] src="//h5.sinaimg.cn/m/emoticon/icon/default/d_ganmao-babf39d6ae.png" style="width:1em; height:1em;" /></span>

I'm not sure whether the other way of retrieving tweets makes it easier to transform them into a dataframe, but here is that alternative:

from weibo_scraper import  get_formatted_weibo_tweets_by_name
result_iterator = get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=None)
for user_meta in result_iterator:
    for tweetMeta in user_meta.cards_node:
        print(tweetMeta.mblog.text)

With the following result:

行吧//<a href='/n/夏正正'>@夏正正</a>:我没有,我没说过。<span class="url-icon"><img alt=[感冒] src="//h5.sinaimg.cn/m/emoticon/icon/default/d_ganmao-babf39d6ae.png" style="width:1em; height:1em;" /></span>//<a href='/n/勺布斯'>@勺布斯</a>:<span class="url-icon"><img alt=[二哈] src="//h5.sinaimg.cn/m/emoticon/icon/others/d_erha-0d2bea3a7d.png" style="width:1em; height:1em;" /></span>//<a href='/n/暴躁豆奶包'>@暴躁豆奶包</a>:逃避虽然舒服但没用//<a href='/n/by语冰'>@by语冰</a>: 难受😖//<a href='/n/-Lillyyyyyy-'>@-Lillyyyyyy-</a>:扎心

From here, I'm not sure how I should proceed to transform the data into a pandas dataframe (create a CSV first? transform the data directly?).

I would like some guidance on this if possible.

Thank you very much for reading.

While it's hard for me to grasp exactly what you are looking to achieve, I think this should get you started with a dataframe. You can start earlier by appending the tweet itself to the list and then using pd.DataFrame(tweets) to create a dataframe, expanding and extracting from there, or you can do the below.
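To illustrate the "start earlier" route: the raw dicts yielded by get_weibo_tweets_by_name (as in the first snippet) can be flattened with pandas.json_normalize, which turns the nested 'mblog' dict into dotted columns. This is a sketch on a hand-trimmed sample dict copied from the output above, not live scraper output:

```python
import pandas as pd

# Trimmed sample mimicking one raw tweet dict from get_weibo_tweets_by_name;
# in practice, append each `tweet` the scraper yields to this list instead.
tweets = [
    {
        'card_type': 9,
        'itemid': '1076033637346297_-_4341063131108312',
        'mblog': {
            'id': '4341063131108312',
            'created_at': '12小时前',
            'text': '行吧...',
        },
    },
]

# json_normalize flattens nested dicts into columns like 'mblog.id'
df = pd.json_normalize(tweets)
print(df[['card_type', 'mblog.id', 'mblog.created_at']])
```

From there you can select only the columns you care about and rename them before exporting.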

import pandas as pd

from weibo_scraper import get_formatted_weibo_tweets_by_name

tweets = []
result_iterator = get_formatted_weibo_tweets_by_name(name='嘻红豆', pages=1)
for user_meta in result_iterator:
    for tweetMeta in user_meta.cards_node:
        tweets.append(tweetMeta.mblog.text)

df = pd.DataFrame(tweets)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)
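Since mblog.text contains embedded HTML (links and emoticon img tags, as seen in the output above), you may want to strip the markup before exporting. A minimal sketch on a sample string (swap in the `tweets` list you built from the scraper); the regex tag stripper is crude but adequate for this markup:

```python
import pandas as pd

# Sample scraped text; in practice use the list populated by the scraper loop.
tweets = [
    "行吧//<a href='/n/夏正正'>@夏正正</a>:我没有,我没说过。"
    '<span class="url-icon"><img alt=[感冒] src="//h5.sinaimg.cn/..." /></span>',
]

df = pd.DataFrame({'text': tweets})
# Remove anything between '<' and '>'; a real HTML parser is safer
# for arbitrary pages, but this works for the scraper's simple markup.
df['clean_text'] = df['text'].str.replace(r'<[^>]+>', '', regex=True)
df.to_csv('weibo_tweets.csv', index=False, encoding='utf-8-sig')
print(df['clean_text'][0])
```

Writing the CSV with encoding='utf-8-sig' keeps the Chinese text readable if the file is later opened in Excel.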
