TypeError when using regular expressions for text analysis in Python

Question

I am trying to write some code that scans for every string matching the regular expression "PP+" and tells me how many times it occurs. Here is my code:

with open ('testfile.txt') as f:
    data = f.read()
    data = data.split()

import re


the_sum = 0

prolist = []

for word in data:
    pronoun = re.compile(r'PP+')
    result = pronoun.match(data)
    if word == result:
        the_sum += 1

print the_sum

I get this error message:

Traceback (most recent call last):
  File "C:/Python27/RE_counter.py", line 14, in <module>
    result = pronoun.match(data)
TypeError: expected string or buffer

Can anyone tell me what I am doing wrong?

Answer 1

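You are passing the whole list to match() on every iteration (that is what raises the TypeError), and you are also not checking the match result correctly, because match() does not return the word; it returns a match object or None: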

for word in data:
    pronoun = re.compile(r'PP+')
    result = pronoun.match(word)  # ← you had pronoun.match(data)
    if result is not None:        # ← you had if word == result
        the_sum += 1
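
For reference, a self-contained version of this fix might look like the sketch below. It is written for Python 3 (the question uses the Python 2 print statement), compiles the pattern once outside the loop, and wraps the logic in a hypothetical helper named count_matches that does not appear in the original post. The sample text is made up for illustration.

```python
import re

# Compile once, outside the loop: "P" followed by one or more "P".
PRONOUN = re.compile(r'PP+')


def count_matches(text, pattern=PRONOUN):
    """Count whitespace-separated words that start with a match of `pattern`.

    `count_matches` is a hypothetical helper name, not from the original post.
    """
    the_sum = 0
    for word in text.split():
        # match() takes a single string and returns a match object or None.
        if pattern.match(word) is not None:
            the_sum += 1
    return the_sum


if __name__ == '__main__':
    sample = 'PP went to PPP and saw NP'  # made-up sample text
    print(count_matches(sample))
```

Because match() only anchors at the start of the string, a word such as "PPS" is counted as well. If only whole words like "PP" or "PPP" should count, pattern.fullmatch(word) (available since Python 3.4) is one option.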

Answer 2

You can get the count directly from the file contents with re.findall:

import re

with open('testfile.txt') as f:
    data = f.read()
    print len(re.findall(r"\bPP\+\b", data))
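
One thing to watch in this pattern: `\+` matches a literal plus sign, whereas the question's `PP+` uses `+` as a quantifier. In addition, the `\b` after `\+` only matches when a word character immediately follows the plus sign. The sketch below (Python 3, with a made-up sample string) compares how the variants behave with re.findall; which one is right depends on what the tags in the file actually look like, which the post does not show.

```python
import re

sample = 'PP PPP PP+ NP'  # made-up sample text

# Unescaped +: "P" followed by one or more "P", bounded as a whole word.
quantifier_hits = re.findall(r'\bPP+\b', sample)

# Escaped +: the literal three-character string "PP+".
literal_hits = re.findall(r'PP\+', sample)

# Escaped + with a trailing \b: needs a word character right after the "+",
# so "PP+" followed by a space is not matched.
bounded_literal_hits = re.findall(r'\bPP\+\b', sample)

print(len(quantifier_hits), len(literal_hits), len(bounded_literal_hits))
```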

2 answers

Answer 1
1 2014-12-07 03:15:42

Answer 2
0 2014-12-07 06:30:50
