
Reconverting a txt file (from Windows to Unix)

Original post: 2017-05-10 01:11:49. Tags: java, utf-8, type-conversion, prediction

My university project, written in Java, takes tweets from Twitter and analyzes them.

In the first phase, I collect the tweets; I have to do that on a Windows machine. Afterwards, I put the program online on my Linux server and use it to analyze the tweets with a user feedback system.

When I open the txt file on the Linux machine, I am asked whether I want to convert it to UTF-8, and I click yes. Because of this operation, some special characters are no longer displayed correctly. If I try to convert the file back to its original encoding (maybe CP1252) with iconv, it returns an error caused by the special characters.
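If the untouched file from the Windows machine is still available, one way to avoid the lossy conversion is to transcode it explicitly instead of letting an editor guess. The following is only a minimal sketch under assumptions that the post does not confirm: that the original encoding really is CP1252 (`windows-1252` in Java) and that Windows line endings should become Unix ones. The class name `Cp1252ToUtf8` and the method `toUtf8` are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class Cp1252ToUtf8 {

    // Decode the raw bytes with the assumed source charset, normalize
    // Windows line endings (CRLF) to Unix ones (LF), and re-encode as UTF-8.
    static byte[] toUtf8(byte[] raw, Charset source) {
        String text = new String(raw, source);
        String unixText = text.replace("\r\n", "\n");
        return unixText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java Cp1252ToUtf8 <input.txt> <output.txt>");
            return;
        }
        // Assumption: the file was written as CP1252 on the Windows machine.
        Charset source = Charset.forName("windows-1252");
        byte[] raw = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), toUtf8(raw, source));
    }
}
```

This reads the whole file into memory, which is usually acceptable for a text file of tweets; a streaming reader and writer would be the alternative for very large inputs. It cannot recover characters in a copy that has already been saved with replacement characters.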

I understand that it is impossible to convert those characters back, because each damaged character could stand for any of several possible original characters. But can I use some kind of text prediction to restore them?

For example, if I have the word because and the e is a special character, I see something like becaus?. If I remove the ? character, how can I put the e back? I have tried to use Word, but the txt file is too big: a large number of words have this problem, and with Word you have to check every word manually.
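The kind of prediction described above could be approximated with a word list: treat the damaged character as a wildcard and accept a replacement only when exactly one known word fits. This is a rough sketch, not a definitive solution. It assumes each damaged character was replaced by exactly one placeholder (`?` here, though an editor may write U+FFFD instead), and the class `PlaceholderRepair`, its methods, and the tiny dictionary are hypothetical. A real run would need a full word list for the language of the tweets.

```java
import java.util.Arrays;
import java.util.List;

final class PlaceholderRepair {

    // True when candidate has the same length as damaged and agrees with it
    // at every position that is not the placeholder.
    static boolean matches(String damaged, String candidate, char placeholder) {
        if (damaged.length() != candidate.length()) {
            return false;
        }
        for (int i = 0; i < damaged.length(); i++) {
            char d = damaged.charAt(i);
            if (d != placeholder && d != candidate.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Returns the single dictionary word that fits the damaged word.
    // If no word or more than one distinct word fits, the input is
    // returned unchanged so that it can be reviewed manually.
    static String repair(String damaged, List<String> dictionary, char placeholder) {
        if (damaged.indexOf(placeholder) < 0) {
            return damaged; // nothing to repair
        }
        String found = null;
        for (String word : dictionary) {
            if (matches(damaged, word, placeholder)) {
                if (found != null && !found.equals(word)) {
                    return damaged; // ambiguous: several words fit
                }
                found = word;
            }
        }
        return found != null ? found : damaged;
    }

    public static void main(String[] args) {
        // Hypothetical dictionary, for illustration only.
        List<String> dictionary = Arrays.asList("because", "become", "cat", "car");
        System.out.println(repair("becaus?", dictionary, '?')); // prints "because"
        System.out.println(repair("ca?", dictionary, '?'));     // ambiguous, prints "ca?"
    }
}
```

Leaving ambiguous words untouched keeps the automatic pass conservative, so only the remaining cases need a manual check. Note that a literal question mark at the end of a word is indistinguishable from a placeholder in this scheme, so tokens such as "why?" need separate handling.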