How can I use a regular expression to get matches within a given range?

Question

I am stuck on my code for returning all matches that fall within a given range. My data sample is:

        comment
0       [intj74, you're, whipping, people, is, a, grea...
1       [home, near, kcil2, meniaga, who, intj47, a, l...
2       [thematic, budget, kasi, smooth, sweep]
3       [budget, 2, intj69, most, people, think, of, e...

I want to get this result (where the given range is intj1 to intj75):

         comment
0        [intj74]   
1        [intj47]    
2        [nan]   
3        [intj69]

My code is:

df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74'])
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]

I am not sure how to use a regular expression so that t=='range' matches the whole range. Or is there any other way to do this?

Thanks in advance,

Pandas Python newbie

Answer 1

You could replace [t for t in x if t=='intj74'] with, for example,

[t for t in x if re.match('intj[0-9]+$', t)]

or even

[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]

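which also handles the case where there are no matches, so there is no need for df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]. The "trick" here is that an empty list evaluates to False, so `or` returns its right operand in that case.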
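Note that the pattern 'intj[0-9]+$' accepts any number after intj, not only 1 to 75. One way to enforce the range is to capture the digits and compare them as an integer. The following is a minimal sketch, not part of the original answer: the names `in_range` and `filter_tokens` are hypothetical, and `math.nan` stands in for `np.nan` only to keep the example free of dependencies.

```python
import math
import re

# Hypothetical helper names; the 1..75 bounds come from the question.
TOKEN_RE = re.compile(r'intj([0-9]+)')

def in_range(token, low=1, high=75):
    """True if token is exactly intj<N> with low <= N <= high."""
    m = TOKEN_RE.fullmatch(token)
    return m is not None and low <= int(m.group(1)) <= high

def filter_tokens(tokens, low=1, high=75):
    """Keep in-range tokens; fall back to [nan] when none match."""
    # An empty list is falsy, so `or` returns the right operand.
    return [t for t in tokens if in_range(t, low, high)] or [math.nan]

rows = [
    ['intj74', "you're", 'whipping'],
    ['thematic', 'budget'],
    ['intj76', 'intj47'],
]
result = [filter_tokens(r) for r in rows]
```

Assuming the column holds lists of tokens as in the question, the same function could be applied with df.comment.apply(filter_tokens).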

Answer 2

I am new to pandas as well. You may have initialized your DataFrame differently. Anyway, this is what I have:

import pandas as pd

data = {
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj69, most, people, think, of"
    ]
}
df = pd.DataFrame(data)
# Extract the first intj<number> token from each comment string
print(df.comment.str.extract(r'(intj\d+)'))
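The pattern r'(intj\d+)' also matches any number. If the comments are plain strings as in this answer, the range intj1 to intj75 can be encoded in the regex itself. The sketch below is an assumption layered on top of the answer, with a hypothetical helper `first_in_range`; it uses only the standard `re` module so the pattern can be checked in isolation.

```python
import re

# Assumed range intj1..intj75, encoded digit by digit:
#   [1-9]       -> 1..9
#   [1-6][0-9]  -> 10..69
#   7[0-5]      -> 70..75
# \b on both sides stops the match inside longer tokens such as intj750.
RANGE_RE = re.compile(r'\b(intj(?:[1-9]|[1-6][0-9]|7[0-5]))\b')

def first_in_range(text):
    """Return the first intj1..intj75 token found in text, or None."""
    m = RANGE_RE.search(text)
    return m.group(1) if m else None

samples = [
    "intj74, you're, whipping, people, is, a",
    "thematic, budget, kasi, smooth, sweep",
    "budget, 2, intj76, most, intj69",
]
found = [first_in_range(s) for s in samples]
```

Because the pattern contains one capturing group, it should also be usable as the argument to df.comment.str.extract in place of r'(intj\d+)'.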

2 answers

Answer 1: 1, accepted, 2016-09-15 08:49:26

Answer 2: 0, 2016-09-15 08:55:00
