Occurrences of a string in a text in Python

Question

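There are many posts about counting occurrences of a substring in Python, but I can't find anything about counting occurrences of a string as a standalone word in a text.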

testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

# Suppose my search term is "a"; then I would expect the output of my program to be:
print(testSTR.myfunc("a"))
>>1

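That is because there is only 1 standalone occurrence of the string "a" in the whole input. count() won't do, because it also counts substrings, so the output I get is: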

print(testSTR.count("a"))
>>5

Can this be done?

Answer 1

You can do this with collections.Counter after splitting the string.

from collections import Counter
print(Counter(testSTR.split()))

The output will look like

Counter({'you': 2, 'a': 1, 'and': 1, 'words': 1, 'text': 1, 'some': 1, 'the': 1, 'large': 1, 'to': 1, 'Suppose': 1, 'are': 1, 'have': 1, 'of': 1, 'specific': 1, 'trying': 1, 'find': 1, 'occurences': 1})

To get the count for a specific word such as a, use

from collections import Counter
res = Counter(testSTR.split())
print(res['a'])

If the count should be case-insensitive, convert the words with upper() or lower() before counting.

res = Counter(i.lower() for i in testSTR.split())
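Putting the pieces above together, here is a minimal sketch of a case-insensitive whole-word count. The names count_word and test_str are my own, not from the question, and note that tokens are only split on whitespace, so "a," would not be counted as "a".

```python
from collections import Counter

test_str = ("Suppose you have a large text and you are trying to find "
            "the specific occurences of some words")

def count_word(text, word):
    """Count whitespace-delimited tokens equal to `word`, ignoring case."""
    # Lowercase every token so "You" and "you" fall into the same bucket.
    counts = Counter(token.lower() for token in text.split())
    # A Counter returns 0 for keys it has never seen, so no KeyError here.
    return counts[word.lower()]

print(count_word(test_str, "a"))    # 1
print(count_word(test_str, "You"))  # 2
```

If you only ever need one word, building the whole Counter is more work than necessary, but it pays off when you look up several words in the same text.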

Answer 2

I think the most straightforward way is to use a regular expression:

import re
testSTR = "Suppose you have a large text and you are trying to find the specific occurences of some words"

print(len(re.findall(r"\ba\b", testSTR)))
# 1

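\ba\b checks for a "word boundary" before and after a, where a "word boundary" is punctuation, whitespace, or the beginning or end of the whole string. This is more useful than just splitting on whitespace, unless of course that is what you want.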

import re
str2 = "a large text a, a. a"

print(len(re.findall(r"\ba\b", str2)))
# 4
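If the search term is not hard-coded, the same idea can be wrapped in a small helper. This is only a sketch: count_whole_word and its ignore_case parameter are names I made up. re.escape keeps characters such as "." in the search term from being treated as regex syntax.

```python
import re

def count_whole_word(text, word, ignore_case=False):
    """Count occurrences of `word` surrounded by word boundaries."""
    # Escape the term so regex metacharacters in it are matched literally.
    pattern = r"\b" + re.escape(word) + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(pattern, text, flags))

print(count_whole_word("a large text a, a. a", "a"))               # 4
print(count_whole_word("A cat and a dog", "a", ignore_case=True))  # 2
```

One caveat: \b only behaves as expected when the search term starts and ends with a letter, digit or underscore, so a term like "c++" would need a different boundary check.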

Answer 3

If you are worried about punctuation, you should try this:

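# Lists have no .map() method in Python, so use a list comprehension
words = [s.strip(".!?:;,\"'") for s in testSTR.split()]
# Prints True if "a" occurs as a word; use words.count("a") for the number
print("a" in words)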
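Since the question asks for a count rather than a yes/no answer, the stripping approach can be extended along these lines. Again just a sketch: count_stripped is a hypothetical name, and I use string.punctuation from the standard library instead of the hand-written character list.

```python
import string

def count_stripped(text, word):
    """Count tokens equal to `word` after trimming surrounding punctuation."""
    # strip() only removes punctuation at the ends of each token,
    # so something like "don't" keeps its apostrophe.
    words = [token.strip(string.punctuation) for token in text.split()]
    return words.count(word)

print(count_stripped("a large text a, a. a", "a"))  # 4
```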

3 answers

Answer 1: score 5, accepted, 2016-08-08 19:14:37
Answer 2: score 2, 2016-08-08 19:20:56
Answer 3: score 1, 2016-08-08 19:15:30
