
Remove special characters from a Python data frame

I want to remove special characters, and some words of my choosing, from a column.

df['tweet_text'][0]
'\\": \\"#\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 TEXAS Corona update 19-MAY-21\\\\n\\\\nTotal Deaths 51","180\\\\n\\\\nhttps://t.co/jeoAqC07Oq\\\\n\\\\n#\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588updates #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588 #\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\\\u2588\\"","\\"'

I used:

df['tweet_text'] = df['tweet_text'].str.replace('[#,@,&,{,},",:,//,\\\n,-,\\\\,u2588]', '')

' TEXAS Corona pdate 19MAY1nnTotal Deaths 110nnhttpst.cojeoAqC07Oqnn pdates '

As you can see in the output, the "nn" is not removed, and every "u" is removed. Can you help me figure this out? Thank you!

.str.replace() treats the pattern as a regular expression here (in pandas 2.0 and later you have to pass regex=True for that). Your regex character class '[#,@,&,{,},",:,//,\\\n,-,\\\\,u2588]' is parsed as

[#,@,&,{,},",:,//,\
,-,\\,u2588]

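so it matches the newline character and each of the individual characters "#&,/258:@\u{} (but not a dash, since inside a character class the dash is a range delimiter, and ,-, is just the range from a comma to a comma). That is why every "u", "2", "5" and "8" disappears, and why "nn" is left behind: your text contains a literal backslash followed by "n" rather than a real newline, and only the backslashes are removed.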
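To check which characters the class really matches, a small sketch like the following can help. It uses only the standard re module and the pattern string exactly as written in the question; the names pattern, candidates and matched are just illustrative.

```python
import re

# The pattern from the question, written exactly as in the original call.
pattern = '[#,@,&,{,},",:,//,\\\n,-,\\\\,u2588]'

# Try every printable ASCII character, plus the newline, against the class.
candidates = [chr(code) for code in range(32, 127)] + ['\n']
matched = sorted(ch for ch in candidates if re.fullmatch(pattern, ch))

# Each entry is a single character that the class removes on its own.
print(matched)
```

The printed list contains "u" and the digits 2, 5 and 8 as separate entries, and it does not contain the dash.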

You'll need to read up on the syntax for regular expressions.
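As a rough sketch of one way to do it (not necessarily the only or best one): multi-character tokens such as the escaped u2588 go into an alternation, and only genuinely single characters go into the character class. The sample string, the clean helper, and the choice of which characters to drop are assumptions made for illustration; the sample only imitates the question's data, where backslashes appear literally in the text.

```python
import re

# Hypothetical sample imitating the question's data: it contains literal
# backslashes followed by "n" or "u2588", not real newlines or block characters.
sample = r'#\\u2588\\u2588 TEXAS Corona update 19-MAY-21\\n\\nTotal Deaths'

# One or more backslashes followed by "n": an escaped newline left in the text.
ESCAPED_NEWLINE = re.compile(r'\\+n')

# Whole tokens go in the alternation; the class lists single characters only,
# with no commas between them.
UNWANTED = re.compile(r'\\+u2588|[#@&{}":\\]')

def clean(text):
    """Turn escaped newlines into spaces, drop unwanted tokens, tidy whitespace."""
    text = ESCAPED_NEWLINE.sub(' ', text)
    text = UNWANTED.sub('', text)
    return ' '.join(text.split())

print(clean(sample))
```

On a data frame, the same function could be applied with df['tweet_text'].map(clean), or each pattern could be passed to .str.replace(..., regex=True). Note that the escaped-newline pattern would also swallow the "n" of any word that directly follows a backslash, so it may need tightening for other data.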

(However, if your dataframe has a string like that to begin with, I'm afraid your data is broken in other ways too...)

