
Remove Unicode characters from a string

Below is my code snippet, in which I have managed to remove some escape characters. The problem is that I cannot remove the Unicode characters from the string NewOutput read from ParseLine(). I also want to count the number of lines that contain Unicode characters.

For example, the string NewOutput has 3 lines:

@KayKay121 dragged me to the library. Now I have to be productive \?\? https://t.co/HjZR3d5QaQ (timestamp: Thu Oct 29 17:51:50 +0000 2015)

6A has decided to postpone final vote until appeals are heard by executive board. What seems set: 7 regions. (timestamp: Thu Oct 29 17:51:51 +0000 2015)

@i_am_sknapp Thanks for following us, Seth. (timestamp: Thu Oct 29 18:10:49 +0000 2015)

It would be a great help to me :) Thanks!!!

if (readtweetfile.is_open()) 
{
    // Loop on getline itself so a failed final read is not processed as a line
    while (std::getline(readtweetfile, output))
    {
        ParseLine(output,NewOutput);
        std::string unicod_string = output;

        if(NewOutput!=" ")
        {   
            std::string firstChar="Check";
            std::string secondChar;
            std::string checkingChar="";
            for (std::string::iterator it = NewOutput.begin(), end = NewOutput.end(); it != end; ++it)
            {
                if(firstChar=="Check")
                    firstChar = *it;
                else
                {
                    secondChar = *it;
                    checkingChar = firstChar + secondChar;

                    if(checkingChar=="\\\"")
                    {
                        writetweetfile << secondChar ; 
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\/")
                    {
                        writetweetfile << secondChar; 
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\\'")
                    {
                        writetweetfile << secondChar; 
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\\n")
                    {
                        writetweetfile << " " ; 
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\\t")
                    {
                        writetweetfile <<  " "; 
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\  ")
                    {
                        writetweetfile <<  " "; 
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\\\")
                    {
                        writetweetfile << secondChar;
                        firstChar="Check";
                        continue;
                    }
                    else if(checkingChar=="\\u")
                    {
                        writetweetfile << "unicode";
                        firstChar="Check";
                        continue;
                    }

                    writetweetfile << firstChar;
                    firstChar=secondChar;
                }   
            }
        }
        writetweetfile << std::endl;
    }
}

Without actually knowing what you would like the output to be for your 3 samples, I came up with this:

\\(u|U)[a-zA-Z0-9]{4}|\\|\t|\n

This will find Unicode escapes (\u or \U followed by four characters) as well as stray backslashes, tabs, and newlines.
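As a rough sketch of how that pattern could be applied in C++, the following uses std::regex to delete every match and to count the lines that hold at least one \uXXXX escape. The helper names stripEscapes and cleanAndCount are made up for this illustration, and the sketch assumes the input contains the escapes as literal text (a backslash followed by u and four characters), not already-decoded characters.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <regex>
#include <string>

// The pattern from above: \uXXXX or \UXXXX escapes, any other backslash,
// tab characters, and newline characters.
const std::regex kEscapes(R"(\\(u|U)[a-zA-Z0-9]{4}|\\|\t|\n)");
// Only the \uXXXX part, used to decide whether a line "contains unicode".
const std::regex kUnicodeEscape(R"(\\(u|U)[a-zA-Z0-9]{4})");

// Hypothetical helper: returns `line` with every match of the pattern deleted.
std::string stripEscapes(const std::string& line) {
    return std::regex_replace(line, kEscapes, "");
}

// Hypothetical helper: cleans each line of `in`, writes it to `out`, and
// returns how many input lines held at least one \uXXXX escape.
std::size_t cleanAndCount(std::istream& in, std::ostream& out) {
    std::size_t unicodeLines = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_search(line, kUnicodeEscape)) {
            ++unicodeLines;
        }
        out << stripEscapes(line) << '\n';
    }
    return unicodeLines;
}
```

In the question's setup this would presumably be called as cleanAndCount(readtweetfile, writetweetfile), or stripEscapes could be applied to NewOutput after ParseLine(). Note that the pattern only deletes the backslash of a two-character sequence such as a backslash followed by n, so the n itself stays in the output; replacing those with a space, as the question's loop does, would need an extra alternative.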

If you need something different, revise the question with more examples and, more importantly, the finished output you would like.
