What is the best way to save the comments collected from Facebook using Python?
I'm collecting all the comments from some Facebook pages using Python and the Facebook SDK.
Since I want to do sentiment analysis on these comments, what's the best way to save the texts so that no changes to them are needed?
I'm currently saving the comments as a table and then as a CSV file:
table.to_csv('file-name.csv')
But if I want to read this saved file, I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position ...
By the way, I'm working with German texts.
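For context: 0xfc is the Latin-1 (ISO-8859-1) byte for "ü", but it is not a valid byte in UTF-8, so German text written in one encoding and read back as the other fails exactly like this. A minimal reproduction (the sample string is hypothetical):

```python
# "süß" encoded as Latin-1 contains the byte 0xfc for "ü"
raw = "süß".encode("iso-8859-1")
print(raw)  # b's\xfc\xdf'

# Decoding those bytes as UTF-8 raises the same UnicodeDecodeError as above
try:
    raw.decode("utf-8")
except UnicodeDecodeError as err:
    print(err)

# Decoding with the matching encoding recovers the text
print(raw.decode("iso-8859-1"))  # süß
```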
Have you tried this?
Set a default encoding at the top of your code (note that this only works in Python 2, where `setdefaultencoding` has to be exposed via `reload`, and it is generally discouraged):
import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")
or
pd.read_csv('file-name.csv', encoding="ISO-8859-1")
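Rather than changing the interpreter's default encoding, a more robust option is to pass the same explicit encoding when writing and when reading. A sketch, assuming the comments live in a DataFrame called `table` as in the question (the sample data is hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the collected German comments
table = pd.DataFrame({"comment": ["Schöne Grüße", "alles süß"]})

# Write and read with the same explicit encoding so the text round-trips unchanged
table.to_csv("file-name.csv", index=False, encoding="utf-8")
restored = pd.read_csv("file-name.csv", encoding="utf-8")

print(restored["comment"].tolist())  # ['Schöne Grüße', 'alles süß']
```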
If you know the encoding of the data, you can simply use pandas to read your CSV as follows:
import pandas as pd
pd.read_csv('filename.csv', encoding='<the-encoding>')
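If the encoding is unknown, one option is to guess it from a byte sample with the third-party `chardet` package (a sketch; the sample text is hypothetical and would normally be the first chunk of the CSV file read in binary mode):

```python
import chardet

# Hypothetical raw bytes as they might come from the CSV file
sample = "Schöne Grüße aus München".encode("iso-8859-1")

# chardet.detect returns a dict with the guessed encoding and a confidence score
guess = chardet.detect(sample)
print(guess["encoding"], guess["confidence"])
```

The guessed name can then be passed to `pd.read_csv` via its `encoding` parameter.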
I would say it really depends on many different factors, such as:
For most of my data munging in Python I like to do it in pandas if possible, but sometimes that's not a feasible option given the size of the data. In that case you'd have to think about using something like pyspark. But here is a link to the pandas docs for reference; they have a lot of functionality for reading in all kinds of data: pandas docs