Replacing part of a string in a list in Python

Question

I know similar questions exist for this topic, but I've gone through them and still couldn't get it.

My Python program retrieves a subsection of HTML from a page using a regular expression. I just realised that I hadn't accounted for HTML special characters getting in the way.

Say I have:

regex_title = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']

I obviously want to change lt&#039; to a single quote '.

I've tried variations of:

for each in regex_title:
    if 'lt&#039;' in regex_title:
        str.replace("lt&#039;", "'")

but had no success. What am I missing?

NOTE: The purpose is to do this without importing any more modules.

Answer 1

str.replace does not replace in place. It returns the replaced string, so you need to assign the return value back.

>>> regex_title = ['I went to the store', 'Itlt&#039;s a nice day today',
...                'I went home for a rest']
>>> regex_title = [s.replace("lt&#039;", "'") for s in regex_title]
>>> regex_title
['I went to the store', "It's a nice day today", 'I went home for a rest']
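
To make the point concrete, here is a minimal sketch (the names `title`, `unchanged`, and `fixed` are illustrative and not from the question). Python strings are immutable, so calling `replace` without using its return value has no visible effect:

```python
title = "Itlt&#039;s a nice day today"

# The return value is discarded here, so `title` is not modified.
title.replace("lt&#039;", "'")
unchanged = title

# Binding the return value to a name keeps the replaced string.
fixed = title.replace("lt&#039;", "'")

print(unchanged)  # Itlt&#039;s a nice day today
print(fixed)      # It's a nice day today
```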

Answer 2

If your task is to unescape HTML, it is better to use the unescape function (Python 2 shown here):

>>> ll = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> print map(h.unescape, ll)
['I went to the store', u"Itlt's a nice day today", 'I went home for a rest']

Answer 3

You need to change your code to this:

for each in regex_title:
    if 'lt&#039;' in each:
        each.replace("lt&#039;", "'")

But that does not change your list, because each.replace returns a new string. You need to assign the replaced string back to the list at the matching index:

>>> for each in regex_title:
...         if 'lt&#039;' in each:
...             regex_title[regex_title.index(each)]=each.replace("lt&#039;", "'")
... 
>>> regex_title
['I went to the store', "It's a nice day today", 'I went home for a rest']
>>>
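
If you prefer to update the list in place, a sketch using `enumerate` avoids the extra `list.index()` lookup, which rescans the list on every match. This is an alternative to the loop above rather than the answerer's own code; the names `i` and `title` are illustrative:

```python
regex_title = ['I went to the store', 'Itlt&#039;s a nice day today',
               'I went home for a rest']

# enumerate yields (index, item) pairs, so each slot can be reassigned directly.
for i, title in enumerate(regex_title):
    # replace returns the string unchanged when the substring is absent,
    # so no membership check is needed first.
    regex_title[i] = title.replace("lt&#039;", "'")

print(regex_title)
```

No modules are imported here, which matches the constraint stated in the question.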

Answer 4

You don't explain why you want to avoid importing standard library modules. There are very few good reasons to deny yourself the use of Python's included batteries; unless you have such a reason (and if you do, you should state it), you should use the functionality provided to you.

In this case, it's the unescape() function from the html module: ¹

from html import unescape

titles = [
    'I went to the store',
    'It&#039;s a nice day today',
    'I went home for a rest'
]

fixed = [unescape(s) for s in titles]

>>> fixed
['I went to the store', "It's a nice day today", 'I went home for a rest']

Reimplementing html.unescape() yourself is:

Pointless.
Error-prone.
Going to mean constantly going back and adding new cases when new HTML entities crop up in your data.

¹ Since Python 3.4, anyway. For previous versions, use HTMLParser.HTMLParser.unescape() as per @stalk's answer.
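
If the same script has to run on both sides of that version boundary, one possible sketch is an import fallback. The try/except arrangement is an assumption added for illustration and is not part of the original answer; it relies only on `html.unescape` (Python 3.4+) and the Python 2 `HTMLParser` module mentioned in the footnote:

```python
try:
    # Python 3.4 and later
    from html import unescape
except ImportError:
    # Python 2: fall back to the method on an HTMLParser instance
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

titles = [
    'I went to the store',
    'It&#039;s a nice day today',
    'I went home for a rest',
]

# Strings without entities pass through unchanged.
fixed = [unescape(s) for s in titles]
print(fixed)
```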

Answer 5

Instead of doing this yourself, you'd be better off using the HTMLParser library, as described in https://stackoverflow.com/a/2087433/2314532. Read that question and answer for all the details, but the summary is:

import HTMLParser
parser = HTMLParser.HTMLParser()
print parser.unescape('&#039;')
# Will print a single ' character

So in your case, you'd want to do something like:

import HTMLParser
parser = HTMLParser.HTMLParser()
new_titles = [parser.unescape(s) for s in regex_title]

That will unescape any HTML escape, not just the &#039; escape that you asked about, and process the entire list all at once.
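
As a rough illustration of "any HTML escape", the sketch below uses `html.unescape` from the Python 3.4+ standard library in place of the Python 2 `HTMLParser` shown above. That substitution and the sample strings are assumptions made for illustration:

```python
from html import unescape  # standard library, Python 3.4+

# Hypothetical sample strings covering numeric and named entities.
samples = ["It&#039;s", "Fish &amp; chips", "1 &lt; 2", "&quot;quoted&quot;"]

# One call handles every kind of entity, with no per-entity replace needed.
decoded = [unescape(s) for s in samples]
print(decoded)
```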

Answer 6

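Try it like this:

regex_title = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']
joined = ','.join(regex_title)  # avoid shadowing the built-in name str
replaced = joined.replace("lt&#039;", "'")
print(replaced.split(','))  # split on the join separator; breaks if a title itself contains a comma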

6 answers

Answer 1: score 3, accepted, 2014-10-03 06:19:09
Answer 2: score 2, 2014-10-03 06:24:58
Answer 3: score 1, 2014-10-03 06:19:22
Answer 4: score 1, 2014-10-03 06:28:25
Answer 5: score 0, 2014-10-03 06:24:31
Answer 6: score 0, 2014-10-03 08:23:02
