
How to remove special characters in a string in Python 3?

I would like to convert this:

<b><i><u>Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.</u></i></b>

to this:

Charming boutique selling trendy casual dressy apparel for women, some plus sized items, swimwear, shoes jewelry.

I'm confused about how to remove not only the special characters but also the letters that sit between them. Can anyone suggest a way to do that?

Try the following:

import re

string = '<b><i><u>Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.</u></i></b>'

string = re.sub('</?[a-z]+>', '', string)
string = string.replace('&amp;', '&')  # turn an escaped ampersand back into a literal '&'

print(string)  # prints 'Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.'

The string you want to change looks like HTML that has been escaped more than once, so this solution only works for that kind of input.

I used a regex to replace the tags with empty strings, and I also replaced the escaped ampersand (&amp;) with a literal &.

Hopefully this is what you're looking for; let me know if you run into any trouble.
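The target output in the question also drops the ampersands themselves, which the answer above leaves in place. Here is a minimal sketch of one way to get there, assuming the raw input really is HTML escaped twice (the `raw` value below is a reconstruction for illustration, and `clean_text` is a hypothetical helper name, not part of any library):

```python
import html
import re


def clean_text(raw: str) -> str:
    """Unescape, strip simple tags, drop ampersands, tidy whitespace."""
    # Unescape repeatedly, because the input may have been escaped more than once.
    previous = None
    text = raw
    while text != previous:
        previous = text
        text = html.unescape(text)
    # Remove simple tags such as <b> or </u>.
    text = re.sub(r'</?[a-zA-Z][^>]*>', '', text)
    # Replace remaining ampersands with a space, then collapse whitespace runs.
    text = text.replace('&', ' ')
    return re.sub(r'\s+', ' ', text).strip()


# Assumed input: the question's text, escaped twice.
raw = (
    '&amp;lt;b&amp;gt;&amp;lt;i&amp;gt;&amp;lt;u&amp;gt;'
    'Charming boutique selling trendy casual &amp;amp; dressy apparel for women, '
    'some plus sized items, swimwear, shoes &amp;amp; jewelry.'
    '&amp;lt;/u&amp;gt;&amp;lt;/i&amp;gt;&amp;lt;/b&amp;gt;'
)

print(clean_text(raw))
```

A regex is only adequate for flat, well-formed tags like these; for arbitrary HTML a real parser is the safer choice.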

You can use the html module and BeautifulSoup to get the text without the escaped tags:

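from html import unescape

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4 lxml

s = "<b><i><u>Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.</u></i></b>"

# Undo the HTML escaping, then let BeautifulSoup parse the markup and drop the tags.
soup = BeautifulSoup(unescape(s), 'lxml')
print(soup.text)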

Prints:

Charming boutique selling trendy casual & dressy apparel for women, some plus sized items, swimwear, shoes & jewelry.
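If installing BeautifulSoup and lxml is not an option, roughly the same result can be had with the standard library alone. The following is a sketch rather than a drop-in replacement: `TextExtractor` and `strip_tags` are hypothetical names, and the `escaped` sample is a shortened, assumed form of the question's input.

```python
from html import unescape
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects only the text nodes and ignores every tag."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities in text for us.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_tags(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return ''.join(parser.parts)


# Assumed sample: tags escaped once, ampersand escaped twice.
escaped = '&lt;b&gt;shoes &amp;amp; jewelry.&lt;/b&gt;'
print(strip_tags(unescape(escaped)))
```

The explicit `unescape` call undoes the outer layer of escaping so that the parser sees real tags; the parser itself then decodes the remaining `&amp;`.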
