Get the string between [ brackets and special characters in python

Question

I have a question very similar to this one.

I really wonder why my result is NaN.

I have a DataFrame with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

and I want to build a new column with the cards that were played:

[J♡, K♧]
[5, 2]

or even:

[J, K]
[5, 2]

However, when I experiment with regex and use: dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

I only get NaN.

Answer 1

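Your pattern returns NaN because the character class [A-Za-z0-9_] contains neither the suit symbols nor the comma and the space, so the match can never reach the closing ]. You can add those characters to the character class in the capture group, as in \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern a bit more specific: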

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

The pattern matches:

\[ Match [
( Capture group 1
- [A-Za-z0-9_] Match one of the listed chars
- [♤♡♢♧]? Optionally match one of the listed chars
- ,\s*[A-Za-z0-9_][♤♡♢♧]? Match a comma and the same logic as before the comma
) Close group 1
] Match ]
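
As a rough sketch, the breakdown above can also be written with re.VERBOSE, so that each part of the pattern carries its own comment. The name CARDS is only illustrative, and the plain re module stands in here for the pandas call.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# CARDS is a hypothetical name chosen for this sketch.
CARDS = re.compile(
    r"""
    \[                  # literal opening bracket
    (                   # start of capture group 1
      [A-Za-z0-9_]      # first card: one of the listed characters
      [♤♡♢♧]?           # optional suit symbol
      ,\s*              # comma followed by optional whitespace
      [A-Za-z0-9_]      # second card: one of the listed characters
      [♤♡♢♧]?           # optional suit symbol
    )                   # end of capture group 1
    ]                   # literal closing bracket
    """,
    re.VERBOSE,
)

match = CARDS.search("Player[J♡, K♧] won the $5.40 main pot with a Straight")
print(match.group(1))
```

Because each card is matched by a single character from the class, a two-character rank such as 10 would not match this pattern.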

Regex demo

For example

import pandas as pd

dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
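
The first option, widening the character class, is not shown above. Here is a minimal sketch using the plain re module on the two sample rows. The variable names are illustrative. When search() finds no match it returns None, which is the case str.extract reports as NaN.

```python
import re

rows = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Pattern from the question: no suits, comma or space in the class.
original = re.compile(r"\[([A-Za-z0-9_]+)\]")
# Widened class: suit symbols, comma and space added.
widened = re.compile(r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]")

for row in rows:
    before = original.search(row)   # None for both rows
    after = widened.search(row)     # match object for both rows
    print(before, "->", after.group(1))
```

The widened class is more permissive than the specific pattern: it accepts any mix of the listed characters between the brackets, not just two cards.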

Answer 2

Try this pattern (I assumed that the text uses () instead of [], as posted in the regex demo):

\([^,]+,[^\)]+\)

Explanation:

\( - match ( literally

[^,]+ - match one or more characters other than ,

, - match , literally

[^\)]+ - match one or more characters other than )

\) - match ) literally

Regex demo
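
A small sketch of this pattern with the re module. The parenthesised sample string is made up for illustration, and bracket_pattern is an assumed adaptation of the same idea to the [ ] used in the question's data.

```python
import re

# Pattern from this answer, for text that uses ( and ).
paren_pattern = re.compile(r"\([^,]+,[^\)]+\)")
# Assumed adaptation for [ and ], as in the question's column.
bracket_pattern = re.compile(r"\[[^,]+,[^\]]+\]")

# Hypothetical sample text with parentheses.
paren_hit = paren_pattern.search("Player (5, 2) won the $21.00 main pot")
bracket_hit = bracket_pattern.search(
    "Player[J♡, K♧] won the $5.40 main pot with a Straight"
)

print(paren_hit.group(0))
print(bracket_hit.group(0))
```

Neither pattern contains a capture group, so the whole match, delimiters included, is returned. To use one of them with str.extract, which needs at least one capture group, wrap the part you want to keep in parentheses.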

Answer 3

Use

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>>

Regex : (\w+)(?=[^][]*])

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ]                        ']'
--------------------------------------------------------------------------------
  )                        end of look-ahead
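
The lookahead can be tried outside pandas with re.findall, as in the sketch below. The second pattern, with_suits, is not part of this answer. It is an assumed variant that swaps \w+ for "anything except whitespace, commas and brackets" so that the suit symbols are kept.

```python
import re

rows = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Pattern from this answer: word characters that are followed by a ]
# with no other bracket in between.
ranks_only = re.compile(r"(\w+)(?=[^][]*])")
# Assumed variant: keep the suit symbols as well.
with_suits = re.compile(r"([^\s,\[\]]+)(?=[^][]*])")

for row in rows:
    print(ranks_only.findall(row), with_suits.findall(row))
```

Unlike str.extract, findall yields a list per row, which is why the cards column in the output above holds lists rather than strings.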

3 answers

solution1
1 ACCEPTED 2021-02-14 10:46:36

solution2
0 2021-02-05 14:10:47

solution3
0 2021-02-14 01:03:41
