Remove special characters from a pandas DataFrame column
I want to remove special characters, and some words that I choose, from a column.
df['tweet_text'][0]
'\\": \\"#\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 TEXAS Corona update 19-MAY-21\\\\n\\\\nTotal Deaths 51","180\\\\n\\\\nhttps://t.co/jeoAqC07Oq\\\\n\\\\n#\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588updates #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\"","\\"'
I used
df['tweet_text'] = df['tweet_text'].str.replace('[#,@,&,{,},",:,//,\\\n,-,\\\\,u2588]', '')
' TEXAS Corona pdate 19MAY1nnTotal Deaths 110nnhttpst.cojeoAqC07Oqnn pdates '
As you can see in the output, the "nn" is not removed, and every "u" is removed. Can you help me figure this out? Thank you!
.str.replace() treats its pattern as a regular expression (by default in pandas versions before 2.0, and with regex=True in later versions). Your regex character class
'[#,@,&,{,},",:,//,\\\n,-,\\\\,u2588]'
is parsed as
[#,@,&,{,},",:,//,\
,-,\\,u2588]
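so it will match the newline character and each of the individual characters "#&,/258:@\u{} (not a dash, though, since inside a character class it is a range delimiter, and ,-, is simply the range from a comma to a comma). A character class matches single characters, never sequences, so u2588 means "u, 2, 5 or 8". That is why every "u" disappears, and why the backslashes of the literal \n sequences in your data are stripped while the "n" is left behind.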
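To check this reading, one can ask Python's re module which single characters the class matches. This is only a sketch: it assumes the pattern was typed exactly as in the question, it probes ASCII characters only, and the names pattern and matched are chosen here for illustration.

```python
import re

# The pattern from the question, typed as in the original call.
pattern = '[#,@,&,{,},",:,//,\\\n,-,\\\\,u2588]'

# Try every ASCII character on its own and keep the ones the class matches.
matched = sorted(ch for ch in map(chr, range(128)) if re.fullmatch(pattern, ch))
print(matched)

# "u" and "2" are removed, but the dashes survive because ",-," is a range.
print(re.sub(pattern, '', 'update 19-MAY-21'))  # prints: pdate 19-MAY-1
```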
You'll need to read up on the syntax for regular expressions.
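As a rough sketch of what a corrected pattern might look like: multi-character sequences go into alternation branches, and single characters go into one character class without commas between them. This assumes the column really contains literal backslash sequences such as \\u2588 and \\n (as the repr in the question suggests) and that the commas in the original class were meant as separators, not as characters to remove. The sample string and the variable names are made up for illustration.

```python
import re

# Hypothetical sample modelled on the question's data: the backslashes are
# literal characters in the text, not real newlines or block characters.
sample = r'#\\u2588\\u2588 TEXAS Corona update 19-MAY-21\\n\\nTotal Deaths'

# Step 1: turn each run of literal "\n" sequences into a single space.
text = re.sub(r'(?:\\+n)+', ' ', sample)

# Step 2: remove the literal "\u2588" sequences as whole units, then the
# unwanted single characters. The dash is last in the class so it is literal.
pattern = r'\\+u2588|[#@&{}":\\-]'
cleaned = re.sub(pattern, '', text).strip()
print(cleaned)  # prints: TEXAS Corona update 19MAY21 Total Deaths

# With pandas, the second step could presumably be written as
# df['tweet_text'].str.replace(pattern, '', regex=True)
```

Because "u2588" is only matched as part of the whole escape sequence, the "u" in ordinary words such as "update" is left alone.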
(However, if your dataframe has a string like that to begin with, I'm afraid your data is broken in other ways too...)