Cannot replace special characters in a Python pandas dataframe

I'm working with Python 3.5 on Windows. I have a dataframe where a 'titles' column of str type contains headline titles, some of which have special characters such as â, € and ˜.

I am trying to remove these by replacing them with an empty string '' using pandas.replace. I have tried various iterations and nothing works. I am able to replace regular characters, but these special characters just don't seem to work.

The code runs without error, but the replacement simply does not occur, and the original title is returned instead. Below is what I have already tried. Any advice would be much appreciated.

df['clean_title'] = df['titles'].replace('€','',regex=True)
df['clean_titles'] = df['titles'].replace('€','')
df['clean_titles'] = df['titles'].str.replace('€','')

import re

def clean_text(row):
    return re.sub('€', '', str(row))
    # alternative tried: return str(row).replace('€', '')
df['clean_title'] = df['titles'].apply(clean_text)

We can only assume that by 'special' characters you mean non-ASCII characters.

To remove all non-ASCII characters in a pandas dataframe column, do the following:

df['clean_titles'] = df['titles'].str.replace(r'[^\x00-\x7f]', '', regex=True)

Note that this solution generalizes: it removes any non-ASCII character, not just the ones listed in the question.
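For anyone who wants to check this on a small example first, here is a minimal sketch. The sample titles are invented (the question does not show the real data), and the non-ASCII characters are written as escapes so the snippet does not depend on the file encoding. It also shows that Series.replace without regex=True only matches whole cell values, which may be one reason the second attempt in the question leaves the titles unchanged.

```python
import pandas as pd

# Invented sample data; "\u00e2\u20ac\u02dc" is the sequence "â€˜" from the question.
df = pd.DataFrame({
    "titles": [
        "Breaking news \u00e2\u20ac\u02dctoday",
        "plain title",
        "caf\u00e9 prices rise",
    ]
})

# Series.replace with a plain string compares against the whole cell value,
# so a title that merely contains the euro sign is returned unchanged.
whole_value = df["titles"].replace("\u20ac", "")

# Series.str.replace works on substrings. regex=True is passed explicitly
# because the default changed to False in pandas 2.0.
df["clean_titles"] = df["titles"].str.replace(r"[^\x00-\x7f]", "", regex=True)

print(whole_value.tolist())
print(df["clean_titles"].tolist())
```

Be aware that the pattern strips every non-ASCII character, including legitimate accented letters such as the é in the third title, so it is only appropriate when plain ASCII output is acceptable.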

How to remove escape sequence character in dataframe

Data:

product,rating pest,<br> test mouse,/
mousetest

Solution: Scala code

 val finaldf = df.withColumn("rating", regexp_replace(col("rating"), "\\\\", "/"))
 finaldf.show()
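If the data lives in pandas rather than Spark, a rough analogue of the snippet above might look like the following sketch. Only the column name rating comes from the snippet; the product and rating values are invented for illustration.

```python
import pandas as pd

# Invented sample values; only the column name 'rating' comes from the snippet above.
df = pd.DataFrame({
    "product": ["pest", "mouse"],
    "rating": ["a\\b", "no backslash"],
})

# regex=False treats the pattern as a literal backslash,
# so no extra regex escaping is needed.
df["rating"] = df["rating"].str.replace("\\", "/", regex=False)

print(df["rating"].tolist())
```

Using regex=False avoids the double escaping that the Spark regexp_replace call needs, where the backslash has to be escaped once for the string literal and once for the regex.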
