Python: checking for multiple spaces in a string

Question

I am using this function to check whether a string contains multiple spaces:

def check_multiple_white_spaces(text):
    return "  " in text

It usually works fine, but it does not work in the following code:

from bs4 import BeautifulSoup
from string import punctuation

text = "<p>Hello &nbsp; &nbsp; &nbsp;world!!</p>\r\n\r"

text = BeautifulSoup(text, 'html.parser').text
text = ''.join(ch for ch in text if ch not in set(punctuation))
text = text.lower().replace('\n', ' ').replace('\t', '').replace('\r', '')

print(check_multiple_white_spaces(text))

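When printed, the final value of the text variable looks like hello      world (with several spaces between the two words), but I don't understand why check_multiple_white_spaces returns False instead of True.

How can I fix this?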

Answer 1

If you print the contents of text using repr(), you will see that it does not contain two consecutive spaces:

'hello \xa0 \xa0 \xa0world '

As a result, your function correctly returns False. You can fix this by converting the non-breaking spaces to regular spaces:

text = text.replace(u'\xa0', u' ')
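Putting the replacement and the check together, a minimal sketch might look like the following. It assumes Python 3, and the name has_multiple_spaces is a hypothetical helper, not something from the question.

```python
def has_multiple_spaces(text):
    """Return True if text contains two or more consecutive spaces,
    counting non-breaking spaces (U+00A0) as ordinary spaces."""
    # Assumption: only U+00A0 needs to be normalized; other Unicode
    # space characters are left untouched.
    normalized = text.replace("\xa0", " ")
    return "  " in normalized


# The value of text from the question, as shown by repr()
sample = "hello \xa0 \xa0 \xa0world "

print("  " in sample)               # False: the spaces alternate with U+00A0
print(has_multiple_spaces(sample))  # True once U+00A0 is replaced
```

Doing the replacement inside the check keeps the caller's string unchanged, which may or may not be what you want.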

Answer 2

First, your function check_multiple_white_spaces does not really check for multiple spaces, because there could be three or more of them.

You should use re.search(r"\s{2,}", text) instead.
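As a rough sketch of the regex approach, assuming Python 3 (where \s in a str pattern matches Unicode whitespace, including U+00A0); the names MULTI_WS and has_whitespace_run are made up for illustration:

```python
import re

# Two or more consecutive whitespace characters of any kind
MULTI_WS = re.compile(r"\s{2,}")


def has_whitespace_run(text):
    """Return True if text contains a run of at least two whitespace characters."""
    return MULTI_WS.search(text) is not None


print(has_whitespace_run("hello \xa0 \xa0 \xa0world "))  # True
print(has_whitespace_run("hello world"))                  # False
```

Note that \s also matches tabs and line breaks, so this is broader than a check for space characters only.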

Second, if you print text, you will find that you need to unescape it.

See this answer:

How do I unescape HTML entities in a string in Python 3.1?
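The linked answer is not reproduced here. As one possible illustration, assuming Python 3.4 or later, the standard library's html.unescape decodes entities such as &nbsp;. The variable names below are only for this sketch.

```python
import html

# The paragraph content from the question, without the surrounding tags
raw = "Hello &nbsp; &nbsp; &nbsp;world!!"

decoded = html.unescape(raw)
print(repr(decoded))

# &nbsp; decodes to U+00A0 (non-breaking space), not to a regular space,
# so the decoded string still has no two consecutive regular spaces.
print("\xa0" in decoded)
print("  " in decoded)
```

So unescaping on its own does not make the two-space check succeed; the non-breaking spaces still have to be replaced or matched explicitly.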

Answer 3

There are no consecutive spaces in the text variable, which is why check_multiple_white_spaces returns False.

>>> text
u'hello \xa0 \xa0 \xa0world '
>>> print text
hello      world

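\xa0 is a non-breaking space, also called a no-break space (NBSP) or hard space. The code point of a regular space is 32, and that of a non-breaking space is 160: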

(u' ', 32)
(u'\xa0', 160)
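The snippet that produced these pairs is not shown; a small sketch that inspects the two characters, assuming Python 3 and using the standard unicodedata module, could be:

```python
import unicodedata

# Print each character's repr, code point, and official Unicode name
for ch in (" ", "\xa0"):
    print(repr(ch), ord(ch), unicodedata.name(ch))

space_info = (ord(" "), unicodedata.name(" "))
nbsp_info = (ord("\xa0"), unicodedata.name("\xa0"))
```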

The character \xa0 is a NO-BREAK SPACE, and its closest ASCII equivalent is of course a regular space.

Use the unidecode module to convert all non-ASCII characters to their closest ASCII equivalents.

Demo:

>>> import unidecode
>>> unidecode.unidecode(text)
'hello      world '
>>> "  " in unidecode.unidecode(text)
True
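If installing a third-party package is not an option, a standard-library alternative (a different technique from unidecode, not a drop-in replacement) is Unicode compatibility normalization, which maps U+00A0 to a regular space. The sketch below assumes Python 3, and to_compat_form is a hypothetical name.

```python
import unicodedata


def to_compat_form(text):
    """Apply NFKC normalization; among other things this turns
    U+00A0 (NO-BREAK SPACE) into a regular space."""
    # Caveat: NFKC also rewrites other compatibility characters
    # (for example full-width letters), unlike a targeted replace().
    return unicodedata.normalize("NFKC", text)


sample = "hello \xa0 \xa0 \xa0world "
print(repr(to_compat_form(sample)))
print("  " in to_compat_form(sample))  # True
```

Unlike unidecode, this leaves accented letters and other non-ASCII text in place rather than transliterating them.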

3 answers

Answer 1
2, accepted, 2017-09-22 08:48:10

Answer 2
0, 2017-09-22 08:48:14

Answer 3
0, 2017-09-22 08:54:32
