
What is the best way to save the comments collected from Facebook using Python?

I'm collecting all the comments from some Facebook pages using Python and Facebook-SDK.

Since I want to run sentiment analysis on these comments, what is the best way to save the texts so that they don't need to be changed afterwards?

I'm currently storing the comments in a table and then saving it as a CSV file.

table.to_csv('file-name.csv')

But when I try to read the saved file back, I get the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position ...

By the way, I'm working with German texts.

Have you tried this?

Set the default encoding at the top of your code (this works in Python 2 only; Python 3 has no built-in reload and no sys.setdefaultencoding):

import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")

or

import pandas as pd
pd.read_csv('file-name.csv', encoding="ISO-8859-1")
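For context, the byte 0xfc from the error message is how ISO-8859-1 encodes "ü", which is why that encoding is a plausible guess for German text. A minimal sketch using only the standard library (the sample word is made up for illustration):

```python
# "ü" encoded as ISO-8859-1 (Latin-1) is the single byte 0xfc.
raw = "Glück".encode("iso-8859-1")

# The same bytes are not valid UTF-8, so decoding them as UTF-8 fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding that was actually used recovers the text.
decoded = raw.decode("iso-8859-1")
print(raw, utf8_ok, decoded)
```

If the file was written with a different single-byte encoding, the characters may decode without an error but still come out wrong, so it is worth checking a few umlauts by eye after reading.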

If you know the encoding of the data, you can simply use pandas to read your CSV as follows:

import pandas as pd
# 'encoding' is a placeholder: substitute the real codec name, e.g. 'ISO-8859-1'
pd.read_csv('filename.csv', encoding='encoding')
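One way to be sure of the encoding is to state it explicitly when writing the file and use the same value when reading it back. The sketch below assumes a small made-up table of German comments and a temporary file path; neither comes from the original question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the collected comments.
table = pd.DataFrame(
    {"comment": ["Das ist schön", "Grüße aus München", "Viel Spaß!"]}
)

# Write to a temporary location with an explicit encoding.
path = os.path.join(tempfile.mkdtemp(), "comments.csv")
table.to_csv(path, index=False, encoding="utf-8")

# Read back with the same encoding; the text comes back unchanged.
restored = pd.read_csv(path, encoding="utf-8")
print(restored)
```

Passing index=False keeps the row index out of the file, so the restored table has the same single column as the original.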

I would say it really depends on several factors, such as:

  • Size of the data
  • What kind of analysis, specifically, you anticipate doing
  • What format you are most comfortable working with

For most of my data munging in Python I like to use pandas if possible, but sometimes that isn't feasible given the size of the data. In that case you'd have to think about using something like PySpark. Here is a link to the pandas docs for reference; they have a lot of functionality for reading in all kinds of data: pandas docs
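When the file is too large to load at once but a cluster is not needed, pandas can also read a CSV in pieces through the chunksize parameter of read_csv. A minimal sketch, assuming a small made-up file standing in for the real comments:

```python
import os
import tempfile

import pandas as pd

# Hypothetical file with ten placeholder comments.
path = os.path.join(tempfile.mkdtemp(), "comments.csv")
pd.DataFrame({"comment": [f"Kommentar {i}" for i in range(10)]}).to_csv(
    path, index=False, encoding="utf-8"
)

# With chunksize set, read_csv yields DataFrames of at most that many rows.
total_rows = 0
chunk_sizes = []
for chunk in pd.read_csv(path, encoding="utf-8", chunksize=4):
    chunk_sizes.append(len(chunk))
    total_rows += len(chunk)

print(chunk_sizes, total_rows)
```

Each chunk is an ordinary DataFrame, so per-comment processing can run inside the loop without holding the whole file in memory.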

