Reconverting a txt file (from Windows to Unix)
My university project, written in Java, takes tweets from Twitter and analyzes them.
In the first phase, I collect the tweets. I have to do that on a Windows machine; afterwards I put the program online on my Linux server and use it to analyze the tweets with a user feedback system.
When I open the txt file on the Linux machine, it asks me whether I want to convert it to UTF-8, and I click yes. But because of this operation, some special characters are not displayed correctly. If I try to convert the file back to its original encoding (maybe CP1252) with iconv, it returns an error caused by the special characters.
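One way to avoid the lossy conversion is to re-encode the original Windows file directly, before any editor touches it. The following is a minimal Java sketch of that idea, not a confirmed fix: it assumes the untouched Windows file is still available and that it really is CP1252 (the question only guesses this). The class name Cp1252ToUtf8 and the method recode are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class Cp1252ToUtf8 {
    // Assumption: the input bytes are CP1252 (windows-1252) text.
    // Decode them with that charset, then re-encode the text as UTF-8.
    static byte[] recode(byte[] cp1252Bytes) {
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java Cp1252ToUtf8 <input-cp1252.txt> <output-utf8.txt>");
            return;
        }
        // Reads the whole file into memory; a very large file may need streaming instead.
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), recode(original));
    }
}
```

This only helps when the special characters are still intact in the input; a file in which they have already been replaced by ? cannot be restored this way.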
I understand that it is impossible to convert those characters back, because each damaged character could stand for any of several original characters. But can I use some sort of predictive text to rewrite that character?
For example, if I have the word because, and e is a special character, I see the word as something like becaus? instead. If I remove the ? character, how can I put the e back?
I have tried to use Word, but the txt file is too big, so there is a large number of words with this problem, and with Word you have to check every word manually.
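The manual checking described above could in principle be automated with a word list. The sketch below is one possible approach under stated assumptions, not an established tool: it assumes each lost character became exactly one ?, and that a dictionary of expected words is available. The class PlaceholderRepair, its methods, and the sample dictionary are all hypothetical. A damaged word is replaced only when exactly one dictionary word fits; otherwise it is left unchanged for manual review.

```java
import java.util.Arrays;
import java.util.List;

public class PlaceholderRepair {
    // True when candidate has the same length as damaged and agrees with it
    // at every position that is not a '?' placeholder.
    static boolean matches(String damaged, String candidate) {
        if (damaged.length() != candidate.length()) return false;
        for (int i = 0; i < damaged.length(); i++) {
            char c = damaged.charAt(i);
            if (c != '?' && c != candidate.charAt(i)) return false;
        }
        return true;
    }

    // Returns the single dictionary word that fits the damaged word,
    // or the damaged word unchanged when there is no match or several.
    static String repair(String damaged, List<String> dictionary) {
        if (damaged.indexOf('?') < 0) return damaged;
        String found = null;
        for (String word : dictionary) {
            if (matches(damaged, word)) {
                if (found != null && !found.equals(word)) return damaged; // ambiguous
                found = word;
            }
        }
        return found != null ? found : damaged;
    }

    public static void main(String[] args) {
        // Hypothetical word list; a real one would be loaded from a file.
        List<String> dictionary = Arrays.asList("because", "before", "cafe", "care");
        System.out.println(repair("becaus?", dictionary)); // prints "because"
        System.out.println(repair("ca?e", dictionary));    // prints "ca?e" (cafe and care both fit)
    }
}
```

A genuine question mark at the end of a word looks the same as a placeholder, so a sketch like this can produce wrong repairs and its output still needs spot checks.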
You should use dos2unix to convert the file to Linux format.
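For reference, the core of what dos2unix does is turn Windows line endings (CRLF) into Unix line endings (LF). The same transformation can be sketched in Java as follows; this is a simplified illustration, not the dos2unix implementation, and the class name CrlfToLf and method toUnixLineEndings are hypothetical.

```java
public class CrlfToLf {
    // Replace every Windows line ending (\r\n) with a Unix one (\n).
    // Only line endings are touched; all other characters are left as they are.
    static String toUnixLineEndings(String text) {
        return text.replace("\r\n", "\n");
    }

    public static void main(String[] args) {
        String windowsText = "first tweet\r\nsecond tweet\r\n";
        String unixText = toUnixLineEndings(windowsText);
        // No carriage returns remain after the conversion.
        System.out.println(unixText.indexOf('\r') < 0); // prints "true"
    }
}
```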