
Get the string between [ ] brackets, including special characters, in Python

I have a question very similar to this one.

I really wonder why my result is NaN.

I have a DataFrame with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

and I want to build a new column with the cards that were played:

[J♡, K♧]
[5, 2]

or even:

[J, K]
[5, 2]

However, when I experiment with regex and use dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

I only get NaN.

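Your pattern returns NaN because the character class [A-Za-z0-9_] does not include the suit symbols, the comma or the space, so the text between the brackets can never match. You can add those characters to the character class in the capture group, as in \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern a bit more specific: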

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

The pattern matches:

  • \[ Match [
  • ( Capture group 1
    • [A-Za-z0-9_] Match one of the listed chars
    • [♤♡♢♧]? Optionally match one of the listed chars
    • ,\s*[A-Za-z0-9_][♤♡♢♧]? Match a comma and the same logic as before the comma
  • ) Close group 1
  • ] Match ]
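As a quick check outside pandas, the sketch below runs the question's original pattern and the two patterns from this answer through Python's standard re module on the two sample strings. The helper name first_group is hypothetical; Series.str.extract with expand=False should give the same captures, with NaN wherever re.search finds no match.

```python
import re

# Sample values from the question's "Action" column
ACTIONS = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

PATTERNS = {
    # Question's pattern: the class lacks suits, comma and space, so it cannot match
    "original": r"\[([A-Za-z0-9_]+)\]",
    # Same idea with the suit symbols, comma and space added to the class
    "broad": r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]",
    # More specific: rank, optional suit, comma, rank, optional suit
    "specific": r"\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]",
}


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


results = {
    name: [first_group(pattern, text) for text in ACTIONS]
    for name, pattern in PATTERNS.items()
}

for name, captured in results.items():
    print(name, captured)
```

The original pattern yields [None, None], while both the broad and the specific pattern capture 'J♡, K♧' and '5, 2'.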


For example

import pandas as pd

dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
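The question also asks for the ranks alone ([J, K]). One way, sketched here with the standard re module, is to post-process the captured text: split it on the comma and delete any suit symbol. The function name ranks_only and the suit set ♤♡♢♧ (taken from the pattern above) are assumptions. With pandas, the function could be applied to the extracted column with Series.map(ranks_only, na_action="ignore") so that rows without a match stay NaN.

```python
import re

SUITS = "♤♡♢♧"  # suit symbols used in the answer's pattern


def ranks_only(cards):
    """Turn a captured string such as 'J♡, K♧' into a list of ranks: ['J', 'K']."""
    # Split on the comma (and any spaces after it), then delete the suit symbols
    return [re.sub(f"[{SUITS}]", "", card) for card in re.split(r",\s*", cards)]


print(ranks_only("J♡, K♧"))  # ['J', 'K']
print(ranks_only("5, 2"))    # ['5', '2']
```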

Try this pattern (I assumed that your text uses () instead of [], as in the regex demo that was posted):

\([^,]+,[^\)]+\)

Explanation:

\( - match ( literally

[^,]+ - match one or more characters other than ,

, - match , literally

[^\)]+ - match one or more characters other than )

\) - match ) literally
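Because this pattern assumes parentheses around the cards, the sketch below checks it with re.search on a hypothetical parenthesised version of a sample row. It also tries a bracket variant, an adaptation that is not part of the original answer, which swaps \( and \) for \[ and \] so it fits the data as posted. Neither pattern has a capture group, so group(0) includes the delimiters.

```python
import re

PAREN_PATTERN = r"\([^,]+,[^\)]+\)"
# Assumed adaptation for the square brackets actually shown in the question
BRACKET_PATTERN = r"\[[^,]+,[^\]]+\]"

paren_text = "Player (J♡, K♧) won the $5.40 main pot with a Straight"  # hypothetical input
bracket_text = "Player[J♡, K♧] won the $5.40 main pot with a Straight"

paren_match = re.search(PAREN_PATTERN, paren_text)
bracket_match = re.search(BRACKET_PATTERN, bracket_text)

# No capture group, so group(0) is the whole match including the delimiters
print(paren_match.group(0))    # (J♡, K♧)
print(bracket_match.group(0))  # [J♡, K♧]

# The parenthesis pattern finds nothing in the data as posted
print(re.search(PAREN_PATTERN, bracket_text))  # None
```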


Use

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>> 

Regex : (\w+)(?=[^][]*])

EXPLANATION

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (letters, digits, _;
                             Unicode-aware for Python 3 str patterns,
                             so ♡ and ♧ are not included) (1 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ]                        ']'
--------------------------------------------------------------------------------
  )                        end of look-ahead
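The lookahead only checks that a ] follows with no other bracket in between; the suits are dropped because symbols like ♡ are not word characters. A variant of the same idea, sketched below with re.findall, replaces \w+ with [^\s,\[\]]+ (any run of characters other than whitespace, commas and brackets) so each card keeps its suit. That replacement class is an assumption, not part of the original answer.

```python
import re

# Word-character version from the answer: ranks only
RANKS = r"(\w+)(?=[^][]*])"
# Assumed variant: any run of characters other than whitespace, commas and brackets
CARDS = r"([^\s,\[\]]+)(?=[^][]*])"

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# With one capture group, re.findall returns the list of captured strings
ranks = [re.findall(RANKS, text) for text in actions]
cards = [re.findall(CARDS, text) for text in actions]

print(ranks)  # [['J', 'K'], ['5', '2']]
print(cards)  # [['J♡', 'K♧'], ['5', '2']]
```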
