Replacing part of a string in a list in Python

Question

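I know there are similar questions on this topic, but I have gone through them and still can't get this to work.

My Python program uses a regular expression to retrieve part of the HTML from a page. I have only just realized that I didn't account for HTML special characters getting in the way.

Say I have: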

regex_title = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']

I obviously want to change lt&#039; to a single quote '.

I have tried variations of the following:

for each in regex_title:
    if 'lt&#039;' in regex_title:
        str.replace("lt&#039;", "'")

But without success. What am I missing?

Note: the goal is to do this without importing any more modules.

Answer 1

str.replace does not replace in place. It returns the string with the replacement applied, so you need to assign the returned value back.

>>> regex_title = ['I went to the store', 'Itlt&#039;s a nice day today',
...                'I went home for a rest']
>>> regex_title = [s.replace("lt&#039;", "'") for s in regex_title]
>>> regex_title
['I went to the store', "It's a nice day today", 'I went home for a rest']
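
The reason the loop in the question has no effect is that Python strings are immutable: str.replace builds a new string and leaves the original alone. The following is a minimal sketch of that behaviour, using only built-ins; the names titles, original, replaced and alias are illustrative and do not come from the question. It also shows slice assignment as one way to update the existing list object rather than rebinding the name.

```python
titles = ['I went to the store', 'Itlt&#039;s a nice day today',
          'I went home for a rest']

# str.replace returns a new string; the original is left untouched.
original = titles[1]
replaced = original.replace("lt&#039;", "'")
print(original)   # still contains the escaped text
print(replaced)

# Slice assignment replaces the contents of the existing list object,
# so any other name bound to the same list sees the change as well.
alias = titles
titles[:] = [s.replace("lt&#039;", "'") for s in titles]
print(alias)
```

Plain reassignment, as in the answer above, is enough when nothing else holds a reference to the list.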

Answer 2

If your task is to unescape HTML, it is better to use the unescape function:

>>> ll = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> print map(h.unescape, ll)
['I went to the store', u"Itlt's a nice day today", 'I went home for a rest']

Answer 3

You need to change your code to this:

for each in regex_title:
    if 'lt&#039;' in each:
        each.replace("lt&#039;", "'")

But this does not change your list, so you need to assign the replaced string back to the list at its index:

>>> for each in regex_title:
...         if 'lt&#039;' in each:
...             regex_title[regex_title.index(each)]=each.replace("lt&#039;", "'")
... 
>>> regex_title
['I went to the store', "It's a nice day today", 'I went home for a rest']
>>>
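
One thing to keep in mind with the loop above is that list.index scans the list from the start every time a match is found. A possible alternative, sketched here in Python 3 syntax, is to let enumerate supply the index directly; the variable i is an illustrative name that is not in the answer.

```python
regex_title = ['I went to the store', 'Itlt&#039;s a nice day today',
               'I went home for a rest']

# enumerate yields (index, element) pairs, so no list.index lookup is needed.
for i, each in enumerate(regex_title):
    if 'lt&#039;' in each:
        regex_title[i] = each.replace("lt&#039;", "'")

print(regex_title)
```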

Answer 4

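You don't explain why you want to avoid importing standard library modules. There is very rarely a good reason to refuse the batteries that come with Python. Unless you have such a reason (and if you do, you should state it), you should use the functionality provided for you.

In this case, that is the unescape() function in the html module: ¹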

from html import unescape

titles = [
    'I went to the store',
    'It&#039;s a nice day today',
    'I went home for a rest'
]

fixed = [unescape(s) for s in titles]

>>> fixed
['I went to the store', "It's a nice day today", 'I went home for a rest']

Reimplementing html.unescape() yourself:

is pointless.
is error-prone.
means going back and adding new cases every time a new HTML entity turns up in your data.
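
As a rough illustration of the last point, the sketch below compares a single str.replace call with html.unescape on a few strings. The sample strings and the names samples, only_replace and unescaped are made up for this example and are not from the question.

```python
from html import unescape

samples = [
    'It&#039;s a nice day',        # decimal numeric reference
    'Fish &amp; chips',            # named entity
    '1 &lt; 2',                    # named entity
    'It&#x27;s hexadecimal',       # hexadecimal numeric reference
]

# A single str.replace call only covers the one form it was written for.
only_replace = [s.replace("&#039;", "'") for s in samples]
# html.unescape handles named, decimal and hexadecimal references alike.
unescaped = [unescape(s) for s in samples]

print(only_replace)
print(unescaped)
```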

¹ From Python 3.4 onwards, anyway. For earlier versions, use HTMLParser.HTMLParser.unescape() as in @stalk's answer.
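
If the same script has to run on both sides of that version boundary, one possible approach is an import fallback. This is only a sketch under the assumption that the html module is tried first and the HTMLParser method is used otherwise; binding the result to the name unescape is a choice made for this example.

```python
try:
    # Python 3.4 and later
    from html import unescape
except ImportError:
    # Older versions: fall back to the HTMLParser method named in the footnote
    try:
        from HTMLParser import HTMLParser   # Python 2
    except ImportError:
        from html.parser import HTMLParser  # Python 3.0 to 3.3
    unescape = HTMLParser().unescape

print(unescape('It&#039;s a nice day today'))
```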

Answer 5

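Rather than doing this by hand, you are better off using the HTMLParser library, as described in https://stackoverflow.com/a/2087433/2314532. Read that question and its answers for all the details, but the summary is: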

import HTMLParser
parser = HTMLParser.HTMLParser()
print parser.unescape('&#039;')
# Will print a single ' character

So in your case, you would want to do something like this:

import HTMLParser
parser = HTMLParser.HTMLParser()
new_titles = [parser.unescape(s) for s in regex_title]

This unescapes every HTML escape, not just the &#039; escape you asked about, and it processes the whole list in one go.

Answer 6

Try this:

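 regex_title = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']
 # Join with a separator, replace once, then split on the same separator.
 # Assumes no title contains a comma; avoids shadowing the built-in str.
 joined = ','.join(regex_title)
 replaced = joined.replace("lt&#039;", "'")
 print(replaced.split(','))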

6 answers

Answer 1
3 (accepted) 2014-10-03 06:19:09

Answer 2
2 2014-10-03 06:24:58

Answer 3
1 2014-10-03 06:19:22

Answer 4
1 2014-10-03 06:28:25

Answer 5
0 2014-10-03 06:24:31

Answer 6
0 2014-10-03 08:23:02
