Count times a regex pattern appears in a list of strings
Say I have a list of schools:
schools = [
'00A000',
'01A000',
'00B000',
'01B000',
'00C000',
'01C000'
]
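I'm doing some data exploration, and the first thing I want to do is count all schools that look like %A% (with an A in the middle).
I thought I could use something like the following: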
schools.count('\BA')
But it looks like the only way I can use a regular expression is through the re module:
[re.findall('\BA', x) for x in schools].count(['A'])
Is this the simplest way to do it?
Full code:
import re
schools = [
'00A000',
'01A000',
'00B000',
'01B000',
'00C000',
'01C000'
]
# Data exploration. Find count of all district A schools.
# I thought I could use list's built in count and some kind of string regex for it to
# take in:
schools.count('\BA')
# Above example is invalid.
# It looks like I must loop over with regex and then add a count after, right?
[re.findall('\BA', x) for x in schools].count(['A'])
# Repeat for B and C...
You can drop regular expressions entirely. If you really want to match "xyAuv" but not "Axyuv" or "xyuvA", you can use:
len([1 for school in schools if 'A' in school[1:-1]])
If an 'A' anywhere in the string is acceptable, then of course just use 'A' in school as the condition.
A more playful way to write it is:
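The question also mentions repeating the count for B and C. A minimal sketch of how the slicing approach could be wrapped for that, assuming hypothetical helper names count_inner and count_anywhere (they are not from the original answer):

```python
schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000']


def count_inner(items, letter):
    """Count items containing `letter` anywhere except the first or last character."""
    return len([1 for item in items if letter in item[1:-1]])


def count_anywhere(items, letter):
    """Count items containing `letter` at any position."""
    return len([1 for item in items if letter in item])


# One count per district letter instead of repeating the expression by hand.
inner_counts = {letter: count_inner(schools, letter) for letter in 'ABC'}
print(inner_counts)  # {'A': 2, 'B': 2, 'C': 2}
```

The slice school[1:-1] drops the first and last characters, so a leading or trailing 'A' is ignored without any regex.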
sum('A' in school for school in schools)
But it can be confusing, since it relies on True counting as 1, and it is a bit slower.
Or:
from functools import reduce
from operator import add
reduce(add, ('A' in school for school in schools))
That one is just for fun, but it is faster.
How about joining the list into one string and counting the occurrences:
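Speed claims like these depend on the Python version and the size of the list, so it may be worth measuring. A rough sketch using timeit, with hypothetical wrapper names (with_len, with_sum, with_reduce); note that the reduce call here is given an initial value of 0, which the snippet above does not have, so that it also works on an empty list:

```python
import timeit
from functools import reduce
from operator import add

schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000']


def with_len(items):
    return len([1 for s in items if 'A' in s])


def with_sum(items):
    # Booleans are integers in Python, so True adds 1 and False adds 0.
    return sum('A' in s for s in items)


def with_reduce(items):
    # The initial value 0 is an addition; it avoids a TypeError on an empty list.
    return reduce(add, ('A' in s for s in items), 0)


variants = (with_len, with_sum, with_reduce)
counts = [variant(schools) for variant in variants]

for variant in variants:
    elapsed = timeit.timeit(lambda: variant(schools), number=1000)
    print(f'{variant.__name__}: {elapsed:.6f} s for 1000 runs')
```

All three return the same count; only the timings differ, and with a list this small the difference is unlikely to matter in practice.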
import re
print(len(re.findall(r'\BA',','.join(schools))))
Output:
2
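One detail worth spelling out: the separator passed to join matters. A comma is a non-word character, so \B cannot match between it and the first character of the next school. Joining with an empty string would glue neighbouring schools together and let a leading 'A' count as if it were in the middle. A small sketch with a made-up two-item list:

```python
import re

# Hypothetical sample: the second school starts with 'A'.
sample = ['00B000', 'A0D000']

# ',A' is a word boundary, so \B fails there and the leading 'A' is skipped.
with_comma = len(re.findall(r'\BA', ','.join(sample)))

# '0A' has no boundary, so the leading 'A' is wrongly counted.
without_separator = len(re.findall(r'\BA', ''.join(sample)))

print(with_comma, without_separator)  # 0 1
```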
As I said in the comments, I would go with:
len(re.findall('\BA\B', ','.join(schools)))
Here is a proof of concept:
Python 3.7.6 (default, Dec 19 2019, 22:52:49)
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> schools = [
... '00A000',
... '01A000',
... '00B000',
... '01B000',
... '00C000',
... '01C000',
... 'A0D000',
... '01B00A'
... ]
>>>
>>> len(re.findall('\BA\B', ','.join(schools)))
2
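Note that findall on the joined string counts occurrences, not schools, so a school with two inner A's would be counted twice. If the goal is the number of schools, one option is to search each string separately. A sketch, assuming a hypothetical helper named count_matching:

```python
import re

schools = [
    '00A000', '01A000', '00B000', '01B000',
    '00C000', '01C000', 'A0D000', '01B00A',
]


def count_matching(items, pattern):
    """Count the items in which `pattern` matches at least once."""
    compiled = re.compile(pattern)
    return sum(1 for item in items if compiled.search(item))


print(count_matching(schools, r'\BA\B'))  # 2: 'A' strictly inside the string
print(count_matching(schools, r'\BA'))    # 3: also counts the trailing 'A' in '01B00A'

# Occurrences versus schools: '0AA000' has two inner A's but is one school.
occurrences = len(re.findall(r'\BA\B', ','.join(['0AA000'])))
matching_schools = count_matching(['0AA000'], r'\BA\B')
print(occurrences, matching_schools)  # 2 1
```

The patterns are written as raw strings (r'...') so that \B reaches the regex engine unchanged; newer Python versions warn about unrecognised escapes such as '\B' in ordinary string literals.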