
Count times a regex pattern appears in a list of strings

Say I have a list of schools:

schools = [
    '00A000',
    '01A000',
    '00B000',
    '01B000',
    '00C000',
    '01C000'
]

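I'm doing some data exploration, and the first thing I want to do is count all the schools matching something like %A% (that is, with an A in the middle).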

I thought I could use something like the following:

schools.count('\BA')

But it looks like the only way I can use a regex is through the re module:

[re.findall(r'\BA', x) for x in schools].count(['A'])

Is this the simplest way?

Full code:

import re

schools = [
    '00A000',
    '01A000',
    '00B000',
    '01B000',
    '00C000',
    '01C000'
]

# Data exploration. Find count of all district A schools.

# I thought I could pass some kind of regex string to the list's built-in count:
schools.count('\BA')
# The above doesn't work: list.count compares elements for equality and does
# not treat its argument as a regex, so it simply returns 0.

# It looks like I must loop over with regex and then add a count after, right?
[re.findall(r'\BA', x) for x in schools].count(['A'])

# Repeat for B and C...
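For the "repeat for B and C" step, one option is to tally every district in a single pass instead of rerunning the count per letter. This is a minimal sketch, assuming every ID follows the six-character layout shown above (two digits, a district letter, three digits); the pattern DISTRICT_RE and the helper count_districts are hypothetical names for illustration. The last line shows a more direct single-letter count with re.search and sum.

```python
import re
from collections import Counter

schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000']

# Hypothetical pattern, assuming IDs are: two digits, district letter, three digits.
DISTRICT_RE = re.compile(r'\d{2}([A-Z])\d{3}')

def count_districts(ids):
    """Return a Counter mapping district letter -> number of schools."""
    counts = Counter()
    for school_id in ids:
        m = DISTRICT_RE.fullmatch(school_id)
        if m:  # silently skip IDs that don't fit the assumed layout
            counts[m.group(1)] += 1
    return counts

district_counts = count_districts(schools)
print(district_counts['A'])   # 2
print(dict(district_counts))  # {'A': 2, 'B': 2, 'C': 2}

# Single-letter count without building a list of lists:
print(sum(1 for s in schools if re.search(r'\BA', s)))  # 2
```

Because Counter returns 0 for missing keys, district_counts['D'] is simply 0 rather than a KeyError.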

You can drop regexes entirely. If you really do want to match 'xyAuv' but not 'Axyuv' or 'xyuvA', you can use:

len([1 for school in schools if 'A' in school[1:-1]])

If an 'A' anywhere in the string is fine, then of course just use 'A' in school.
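To see how the two membership checks differ, here is a small comparison. The IDs 'A0D000' and '01B00A' are made-up edge cases added for illustration; they are not part of the question's data.

```python
schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000',
           'A0D000', '01B00A']  # last two are hypothetical edge cases

# 'A' anywhere except the first and last character
inner = [s for s in schools if 'A' in s[1:-1]]
# 'A' anywhere at all
anywhere = [s for s in schools if 'A' in s]

print(inner)     # ['00A000', '01A000']
print(anywhere)  # ['00A000', '01A000', 'A0D000', '01B00A']
```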

A more fun way to write it is:

sum('A' in school for school in schools)

But it can be confusing, and it is a bit slower.
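The sum version works because bool is a subclass of int, so each True adds 1 and each False adds 0. A short demonstration, with a more explicit spelling at the end that some readers may find less confusing:

```python
schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000']

flags = ['A' in school for school in schools]
print(flags)                  # [True, True, False, False, False, False]
print(isinstance(True, int))  # True: bool subclasses int
print(sum(flags))             # 2: each True counts as 1

# Explicit alternative: add 1 only for matching schools
print(sum(1 for school in schools if 'A' in school))  # 2
```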

Or:

from functools import reduce
from operator import add

# 0 as the initializer keeps this from raising TypeError on an empty list
reduce(add, ('A' in school for school in schools), 0)

This one is just as fun, but faster.
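Speed claims like these are easy to check with the standard-library timeit module. This is a minimal sketch that first confirms the three variants agree and then times them on a list enlarged by repetition; the function names are hypothetical, and the absolute numbers (and even the ranking) depend on your Python version and data, so no results are shown here.

```python
import timeit
from functools import reduce
from operator import add

# Repeat the sample data so the timings are measurable.
schools = ['00A000', '01A000', '00B000', '01B000', '00C000', '01C000'] * 1000

def count_len_list():
    return len([1 for school in schools if 'A' in school[1:-1]])

def count_sum():
    return sum('A' in school for school in schools)

def count_reduce():
    # 0 as the initializer keeps this safe for an empty list
    return reduce(add, ('A' in school for school in schools), 0)

# All three must agree before comparing their speed.
assert count_len_list() == count_sum() == count_reduce() == 2000

for fn in (count_len_list, count_sum, count_reduce):
    seconds = timeit.timeit(fn, number=100)
    print(f'{fn.__name__}: {seconds:.4f}s for 100 runs')
```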

How about joining the list into a string and counting the occurrences:

import re
print(len(re.findall(r'\BA',','.join(schools))))

Output:

2
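The ',' separator matters here: \BA only matches an A whose preceding character is also a word character, and a comma is not a word character, so an A at the very start of an ID is not matched. Joining with an empty string would glue neighbouring IDs together and can inflate the count. A small check using a made-up ID 'A0D000':

```python
import re

ids = ['00A000', 'A0D000']  # 'A0D000' is hypothetical: A at the start

with_comma = re.findall(r'\BA', ','.join(ids))    # searches '00A000,A0D000'
no_separator = re.findall(r'\BA', ''.join(ids))   # searches '00A000A0D000'

print(len(with_comma))    # 1
print(len(no_separator))  # 2: the leading A now follows a digit
```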

As I said in the comments, I would go with:

len(re.findall(r'\BA\B', ','.join(schools)))

Here is a proof of concept:

Python 3.7.6 (default, Dec 19 2019, 22:52:49) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> schools = [
...     '00A000',
...     '01A000',
...     '00B000',
...     '01B000',
...     '00C000',
...     '01C000',
...     'A0D000',
...     '01B00A'
... ]
>>> 
>>> len(re.findall('\BA\B', ','.join(schools)))
2
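Note that findall on the joined string counts occurrences of the pattern, not schools: an ID containing two inner A's is counted twice. If you want the number of matching schools, testing each ID with re.search avoids that. A sketch, using a made-up ID '0AA000' to show the difference:

```python
import re

pattern = re.compile(r'\BA\B')
schools = ['00A000', '01A000', '00B000', 'A0D000', '01B00A',
           '0AA000']  # '0AA000' is hypothetical: two inner A's

occurrences = len(pattern.findall(','.join(schools)))
matching_schools = sum(1 for s in schools if pattern.search(s))

print(occurrences)       # 4: '0AA000' contributes two matches
print(matching_schools)  # 3: each school is counted at most once
```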


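Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.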

© 2020-2024 STACKOOM.COM