How do I replace all strings in Python?

Question

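I'm building a proxy scraper using regular expressions. Parsing HTML with re is terrible, so I need to make sure no strings show up in the final result. How can I replace all strings with a blank? The code I currently have for cleaning the parsed data is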

print title.replace(',', '').replace("!", '').replace(":", '').replace(";", '').replace(str, '')

The str part is what I tried. Is there another way?

Answer 1

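If you want to extract all visible numbers from an HTML document, you can first parse the document with BeautifulSoup and extract the text from it. You can then pull all the numbers out of those text elements: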

from bs4 import BeautifulSoup
from urllib.request import urlopen
import re

# let’s use the StackOverflow homepage as an example
r = urlopen('http://stackoverflow.com')
soup = BeautifulSoup(r, 'html.parser')  # name the parser explicitly

# As we don’t want to get the content from script related
# elements, remove those.
for script in soup(['script', 'noscript']):
    script.extract()

# And now extract the numbers using regular expressions from
# all text nodes we can find in the (remaining) document.
numbers = [n for t in soup(text=True) for n in re.findall(r'\d+', t)]

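After that, numbers will contain all the visible numbers in the document. If you only want to search within certain elements, you can change the soup(text=True) part.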
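For readers without BeautifulSoup installed, the same idea (skip script-like elements, collect digit runs from text nodes, optionally only inside chosen tags) can be sketched with the standard library's html.parser. This swaps in a different parser from the one the answer uses; the names NumberCollector, extract_numbers and only_tags are hypothetical and only serve the illustration.

```python
import re
from html.parser import HTMLParser


class NumberCollector(HTMLParser):
    """Collect digit runs from text nodes, skipping script-like elements."""

    SKIPPED = {'script', 'noscript', 'style'}

    def __init__(self, only_tags=None):
        super().__init__()
        # Hypothetical option: if given, only text inside these tags counts.
        self.only_tags = set(only_tags) if only_tags else None
        self.open_tags = []  # stack of currently open tag names
        self.numbers = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in self.SKIPPED for t in self.open_tags):
            return
        if self.only_tags and not any(t in self.only_tags for t in self.open_tags):
            return
        self.numbers.extend(re.findall(r'\d+', data))


def extract_numbers(html, only_tags=None):
    parser = NumberCollector(only_tags)
    parser.feed(html)
    parser.close()
    return parser.numbers


SAMPLE = ('<body><script>var x = 42;</script><p>Updated 2014</p>'
          '<table><tr><td>127.0.0.1</td><td>8080</td></tr></table></body>')

print(extract_numbers(SAMPLE))                    # all visible numbers
print(extract_numbers(SAMPLE, only_tags=['td']))  # table cells only
```

The first call ignores the 42 inside the script element; the second also drops the 2014 from the paragraph because it is not inside a td.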

Answer 2

# Blacklist of ASCII values that should not appear in the result
# (see http://www.asciitable.com/). It includes all the letters
# and excludes the digits (48-57) and '.' (46).
replace1 = list(range(0, 46)) + list(range(58, 127)) + [47]

text = '<html><body><p>127.0.0.1</p></body></html>'  # Test data.
tmp = ''

# Go through each character in the text and append it to tmp
# only if its ASCII value is not in the blacklist.
for ch in text:
    if ord(ch) not in replace1:
        tmp += ch

print(tmp)  # 127.0.0.1
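The character-by-character filter above can also be written as a single regular-expression substitution. What follows is a minimal sketch, not the answerer's code: the function names are made up for illustration, and unlike the blacklist version it also drops non-ASCII characters. The second helper assumes the goal is to find dotted-quad addresses, which the question's proxy scraper suggests but does not state.

```python
import re


def keep_digits_and_dots(text):
    """Drop every character except ASCII digits and '.'."""
    return re.sub(r'[^0-9.]', '', text)


def find_ipv4_like(text):
    """Return substrings shaped like dotted quads (octets are not range-checked)."""
    return re.findall(r'\b(?:\d{1,3}\.){3}\d{1,3}\b', text)


text = '<html><body><p>127.0.0.1</p></body></html>'
print(keep_digits_and_dots(text))  # 127.0.0.1
print(find_ipv4_like(text))        # ['127.0.0.1']
```

Stripping characters works when the input holds a single address, but it glues separate numbers together when there are several; matching the address shape directly avoids that.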

Answer 1
3 2014-01-04 23:58:20

Answer 2
1 Accepted 2014-01-05 00:01:30
