I have a question very similar to this one, and I really wonder why my result is NaN.
I have a dataframe with this column:
Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush
and I want to build a new column with the cards that were played:
[J♡, K♧]
[5, 2]
or even:
[J, K]
[5, 2]
However, when I experiment with regex and use:
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)
I only get NaN.
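Your character class [A-Za-z0-9_] does not contain the suit symbols, the comma or the space, so the pattern can never reach the closing ] and str.extract returns NaN. You can add those characters to the character class in the capture group of your pattern:
\[([A-Za-z0-9_♤♡♢♧, ]+)\]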
or make the pattern a bit more specific:
\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]
The pattern matches:
\[ Match [
( Capture group 1
[A-Za-z0-9_] Match one of the listed chars
[♤♡♢♧]? Optionally match one of the listed chars
,\s*[A-Za-z0-9_][♤♡♢♧]? Match a comma and the same logic as before the comma
) Close group 1
] Match ]
For example
import pandas as pd
dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)
Output
Action cards
0 Player[J♡, K♧] won the $5.40 main pot with a S... J♡, K♧
1 Player [5, 2] won the $21.00 main pot with a f... 5, 2
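To see why the original pattern yields NaN while the widened character class works, here is a minimal sketch that uses only the standard re module on the two sample strings. The helper first_group is a hypothetical name introduced for illustration; with pandas, the same patterns would be passed to str.extract as shown above.

```python
import re

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Pattern from the question: no suit symbols, comma or space in the class
original = re.compile(r"\[([A-Za-z0-9_]+)\]")
# Widened class that also allows the suit symbols, the comma and the space
widened = re.compile(r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


original_hits = [first_group(original, a) for a in actions]
widened_hits = [first_group(widened, a) for a in actions]

print(original_hits)  # [None, None]
print(widened_hits)   # ['J♡, K♧', '5, 2']
```

A None here corresponds to the NaN that str.extract puts in the column when the pattern does not match.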
Try this pattern (I assumed that the text uses () instead of [], as in the posted regex demo):
\([^,]+,[^\)]+\)
Explanation:
\( - match ( literally
[^,]+ - match one or more characters other than ,
, - match , literally
[^\)]+ - match one or more characters other than )
\) - match ) literally
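As a rough sketch of how this pattern behaves, the snippet below runs it with the standard re module. The sample strings are hypothetical variants of the question's data written with parentheses, and the extra capture group around the inner part is my own addition (str.extract needs at least one capture group), not part of the pattern above.

```python
import re

# Hypothetical sample rows that use parentheses instead of square brackets
actions = [
    "Player(J♡, K♧) won the $5.40 main pot with a Straight",
    "Player (5, 2) won the $21.00 main pot with a flush",
]

# Same pattern as above, with a capture group added around the inner part
pattern = re.compile(r"\(([^,]+,[^\)]+)\)")

cards = []
for action in actions:
    match = pattern.search(action)
    # Keep the text between the parentheses, or None when nothing matches
    cards.append(match.group(1) if match else None)

print(cards)  # ['J♡, K♧', '5, 2']
```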
Use
>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
Action cards
0 Player[J♡, K♧] won the $5.40 main pot with a S... [J, K]
1 Player [5, 2] won the $21.00 main pot with a f... [5, 2]
>>>
Regex : (\w+)(?=[^][]*])
EXPLANATION
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^][]* any character except: ']', '[' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
] ']'
--------------------------------------------------------------------------------
) end of look-ahead
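The same lookahead can be tried outside pandas. This is a minimal sketch with the standard re module; joining each list into a single string is an optional extra step that I am assuming might be wanted, not something the question requires. The suit symbols are dropped because \w does not match them.

```python
import re

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# A run of word characters that is followed, without crossing another
# bracket, by a closing ]
pattern = re.compile(r"(\w+)(?=[^][]*])")

# One list of matches per row, like Series.str.findall
ranks = [pattern.findall(a) for a in actions]
print(ranks)  # [['J', 'K'], ['5', '2']]

# Optional: turn each list into a single comma-separated string
joined = [", ".join(r) for r in ranks]
print(joined)  # ['J, K', '5, 2']
```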