
How to get match result by given range using regular expression?

I'm stuck trying to get all matches that fall within a given range. My sample data is:

        comment
0       [intj74, you're, whipping, people, is, a, grea...
1       [home, near, kcil2, meniaga, who, intj47, a, l...
2       [thematic, budget, kasi, smooth, sweep]
3       [budget, 2, intj69, most, people, think, of, e...

I want to get the following result (where the given range is intj1 to intj75):

         comment
0        [intj74]   
1        [intj47]    
2        [nan]   
3        [intj69]

My code is:

df.comment = df.comment.apply(lambda x: [t for t in x if t=='intj74'])
df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]

I'm not sure how to use a regular expression to match the whole range instead of the single value in t=='intj74'. Or is there another way to do this?

Thanks in advance,

Pandas Python Newbie

You could replace [t for t in x if t=='intj74'] with, e.g. (after import re),

[t for t in x if re.match('intj[0-9]+$', t)]

or even

[t for t in x if re.match('intj[0-9]+$', t)] or [np.nan]

which also handles the case where there are no matches, so one wouldn't need to check for that explicitly with df.ix[df.comment.apply(len) == 0, 'comment'] = [[np.nan]]. The "trick" here is that an empty list evaluates to False, so in that case the or returns its right operand.
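Note that the pattern intj[0-9]+$ accepts any number after intj, not only 1 to 75. Here is a minimal sketch of one way to enforce the range: capture the digits and compare them as an integer. The helper names in_range and keep_in_range, the lo/hi parameters, and the shortened sample rows (with an out-of-range intj99 added) are my own assumptions for illustration, and math.nan stands in for np.nan to keep the sketch dependency-free.

```python
import math
import re

# Matches a whole token of the form "intj<digits>" and captures the digits.
TOKEN = re.compile(r"intj([0-9]+)")

def in_range(token, lo=1, hi=75):
    """Return True if token is 'intj<N>' with lo <= N <= hi (hypothetical helper)."""
    m = TOKEN.fullmatch(token)
    return m is not None and lo <= int(m.group(1)) <= hi

def keep_in_range(tokens, lo=1, hi=75):
    """Keep only in-range tokens; fall back to [nan] when nothing matches."""
    return [t for t in tokens if in_range(t, lo, hi)] or [math.nan]

# Shortened sample rows; intj99 is added here to show an out-of-range token.
rows = [
    ["intj74", "you're", "whipping", "people"],
    ["home", "near", "kcil2", "meniaga", "who", "intj47"],
    ["thematic", "budget", "kasi", "smooth", "sweep"],
    ["budget", "2", "intj99", "most", "people"],
]

for tokens in rows:
    print(keep_in_range(tokens))
```

With a DataFrame whose comment column holds lists, the same function should plug in as df.comment = df.comment.apply(keep_in_range). Comparing integers avoids having to encode "1 to 75" in the pattern itself, which gets awkward quickly.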

I am new to pandas as well. You might have initialized your DataFrame differently. Anyway, this is what I have:

import pandas as pd

data = {
    'comment': [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj69, most, people, think, of"
    ]
}
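# Build the DataFrame from the sample data
df = pd.DataFrame(data)

# Extract the first intj<number> token from each comment (NaN if none)
print(df.comment.str.extract(r'(intj\d+)'))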
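The str.extract call above picks up any intj number, so it does not yet respect the intj1 to intj75 range. A rough sketch of one way to add that, assuming the comments are plain strings as in this answer: capture the number in a second group, convert it with pd.to_numeric, and keep only rows where it lies in the range. The match column name is my own choice, and I changed the last sample row to intj99 to show an out-of-range value.

```python
import pandas as pd

df = pd.DataFrame({
    "comment": [
        "intj74, you're, whipping, people, is, a",
        "home, near, kcil2, meniaga, who, intj47, a",
        "thematic, budget, kasi, smooth, sweep",
        "budget, 2, intj99, most, people, think, of",  # out of range on purpose
    ]
})

# Group 0 is the whole token, group 1 is just the number.
# str.extract only returns the first match in each row.
parts = df["comment"].str.extract(r"\b(intj(\d+))\b")

# Rows without a match become NaN here.
numbers = pd.to_numeric(parts[1])

# Keep the token only where its number lies in 1..75 (inclusive).
df["match"] = parts[0].where(numbers.between(1, 75))
print(df["match"])
```

One caveat: because only the first intj token per row is examined, a row whose first token is out of range but whose later token is in range would come back as NaN. If that can happen in your data, str.findall plus a per-row filter is probably the safer route.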

