Extracting information from a dictionary in a dataframe

Question

Using the Python module facebook_scraper, I would like to extract the text of the comments on Facebook posts in order to run a sentiment analysis of a certain page.

With the following call to the module's get_posts function,

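from facebook_scraper import get_posts
import pandas as pd

frames = []  # one single-row dataframe per scraped post

for post in get_posts('PAGE_NAME', extra_info=True, pages=50, options={"comments": True}):
    # Each post is a dict; turn it into a one-row dataframe.
    fb_post_df = pd.DataFrame.from_dict(post, orient='index').transpose()
    frames.append(fb_post_df)
    print(post['post_id'] + ' get')

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end.
post_df_full = pd.concat(frames, ignore_index=True)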

it is possible to scrape the post information into the dataframe fb_post_df, which looks like this (condensed to the relevant columns, since the function returns a dataframe with 50 columns):

post_id	text	...	comments_full
12345	'text of the post'	...	[{'comment_id': '12345', 'comment_url': 'https://facebook.com/12345', 'commenter_id': '12345', 'commenter_url': None, 'commenter_name': 'Jane Doe', 'commenter_meta': None, 'comment_text': 'THIS PIECE I NEED, TEXT OF THE COMMENT', 'comment_time': 2022-02-23 10:01:38, 'comment_image': None, 'comment_reactors': [], 'comment_reactions': None, 'comment_reaction_count': None, 'replies': []}]

The dtype of the column comments_full is object.

I've tried using pandas' from_dict to generate a new dataframe consisting solely of the comment texts, but it fails to recognize the contents of the column as a dictionary, since each cell is actually a list of dictionaries.

Please note that the column can be empty if a post has no comments; in that case the content of the cell looks like this: []

Answer 1

A list comprehension should do the trick:

post_df_full['comments_full'].apply(lambda x: [y['comment_text'] for y in x] if x else 'no comment')
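To see what that expression returns, here is a minimal sketch on a toy dataframe. The two posts and their comments are made up for illustration and only mimic the shape of comments_full described in the question. The second half is an optional alternative, not part of the original answer: it reshapes the data to one row per comment, which may be more convenient as input for sentiment analysis.

```python
import pandas as pd

# Toy stand-in for post_df_full; the values are invented for illustration.
post_df_full = pd.DataFrame({
    "post_id": ["1", "2"],
    "comments_full": [
        [{"comment_id": "a", "comment_text": "first comment"},
         {"comment_id": "b", "comment_text": "second comment"}],
        [],  # a post without comments
    ],
})

# The answer's approach: one list of comment texts per post,
# or the string 'no comment' when the list is empty.
texts_per_post = post_df_full["comments_full"].apply(
    lambda x: [y["comment_text"] for y in x] if x else "no comment"
)

# Alternative (an assumption about what is useful downstream):
# one row per comment. explode() turns empty lists into NaN,
# so those rows are dropped before reading the text.
comments_long = (
    post_df_full[["post_id", "comments_full"]]
    .explode("comments_full")
    .dropna(subset=["comments_full"])
    .reset_index(drop=True)
)
comments_long["comment_text"] = comments_long["comments_full"].apply(
    lambda comment: comment["comment_text"]
)
comments_long = comments_long[["post_id", "comment_text"]]
```

Note that the answer's version mixes lists and the string 'no comment' in one column, so any later processing has to handle both types; the long format avoids that by simply omitting posts without comments.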

Answer 1 accepted 2022-03-23 10:44:00
