Python - Replace a non-ASCII character (») in a string

Question

I need to replace the character "»" in a string with whitespace, but I keep getting an error. This is the code I use:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# other code

soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')

UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)

But if I test it with this other script:

# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','')

It works. Why is that?

Answer 1

In Python 2, to replace the content of a byte string with the str.replace() method, you first need to decode the string to unicode, then replace the text, and finally encode the result back to bytes:

>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
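For readers on Python 3, where str is already Unicode and has no decode() method, the same decode, replace, encode round trip might look like the sketch below. The helper name remove_guillemet and the UTF-8 default are assumptions made for illustration; they are not part of the original answer.

```python
def remove_guillemet(data, encoding="utf-8"):
    """Remove every '»' (U+00BB) from data, which may be bytes or str.

    Hypothetical helper: assumes bytes input is encoded with `encoding`.
    """
    if isinstance(data, bytes):
        # Decode to text, replace, then encode back to the original bytes form.
        return data.decode(encoding).replace("\u00bb", "").encode(encoding)
    # Text strings are already Unicode in Python 3, so replace works directly.
    return data.replace("\u00bb", "")


print(repr(remove_guillemet("hi »")))                  # 'hi '
print(repr(remove_guillemet("hi »".encode("utf-8"))))  # b'hi '
```

In Python 3 the text returned by BeautifulSoup is a str, so the direct replace branch is the one that would normally apply.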

You may also use the following regex to remove all non-ASCII characters from the string:

>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
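Since the question asks for a whitespace replacement rather than removal, the same pattern can be wrapped in a small function that takes the replacement as a parameter. This is a minimal Python 3 sketch; the names NON_ASCII and replace_non_ascii are hypothetical and chosen only for this example.

```python
import re

# Matches any single character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7f]")


def replace_non_ascii(text, replacement=" "):
    """Replace each non-ASCII character in text with `replacement`."""
    return NON_ASCII.sub(replacement, text)


print(repr(replace_non_ascii("hi »")))      # 'hi  ' (original space plus the substituted one)
print(repr(replace_non_ascii("hi »", "")))  # 'hi '
```

Note that this touches every non-ASCII character, not only "»", so it is broader than the single-character replace in the question.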

Answer 2

@Moinuddin Quadri's answer fits your use case better, but in general, an easy way to remove non-ASCII characters from a given string is the following:

# the characters '¡' and '¢' are non-ASCII
string = "hello, my name is ¢arl... ¡Hola!"

all_ascii = ''.join(char for char in string if ord(char) < 128)

This results in:

>>> print(all_ascii)
hello, my name is arl... Hola!

You could also do this:

''.join(filter(lambda c: ord(c) < 128, string))

But that's about 30% slower than the generator-expression (char for char ...) approach.
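The relative speed will depend on the Python version and the input, so it may be worth measuring on your own data. A rough way to do that with the standard timeit module is sketched below for Python 3; the function names with_generator and with_filter and the run count are assumptions made for this example.

```python
import timeit

string = "hello, my name is ¢arl... ¡Hola!"


def with_generator():
    # Generator expression: keep only characters with code points below 128.
    return "".join(char for char in string if ord(char) < 128)


def with_filter():
    # filter() with a lambda: same condition, different mechanism.
    return "".join(filter(lambda c: ord(c) < 128, string))


# Both approaches should produce the same text.
assert with_generator() == with_filter()

for func in (with_generator, with_filter):
    seconds = timeit.timeit(func, number=20_000)
    print(f"{func.__name__}: {seconds:.3f} s for 20,000 runs")
```

No timings are shown here because they vary from machine to machine.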

Answer 1: score 16, accepted, posted 2016-11-29 17:37:16
Answer 2: score 6, posted 2016-11-29 17:42:09