Get the string between [ brackets and special characters in python

I have a question very similar to this one.

I would really like to know why my result is NaN.

I have a dataframe with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

I want to build a new column with the cards that were played:

[J♡, K♧]
[5, 2]

or even:

[J, K]
[5, 2]

However, when I experiment with regex and use: dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

I only get NaN.

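The character class in your capture group contains neither the suit symbols nor the comma and the space, so nothing between the brackets can match and str.extract returns NaN. You can add those characters to the character class in the capture group, as in the pattern \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern more specific: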

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

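The pattern matches:

  • \[ matches [
  • ( opens capture group 1
    • [A-Za-z0-9_] matches one of the listed characters
    • [♤♡♢♧]? optionally matches one of the listed characters
    • ,\s*[A-Za-z0-9_][♤♡♢♧]? matches a comma, optional whitespace, and then the same logic as before the comma
  • ) closes group 1
  • ] matches ]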
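
To see the difference between the patterns outside pandas, here is a minimal sketch using the standard re module. The helper name first_group and the labels in the patterns dict are only illustrative; the two sample strings are the ones from the question.

```python
import re

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# The labels are illustrative; the patterns are the ones discussed above.
patterns = {
    "original": r"\[([A-Za-z0-9_]+)\]",
    "wider class": r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]",
    "specific": r"\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]",
}

def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

results = {
    name: [first_group(pattern, action) for action in actions]
    for name, pattern in patterns.items()
}

for name, captured in results.items():
    print(name, captured)
```

The original pattern finds nothing in either row, which is what pandas reports as NaN, while the other two capture the text between the brackets. Note that the specific pattern allows only a single character per rank, so it would presumably need adjusting if a rank were written with two characters.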

Regex demo

For example:

import pandas as pd

dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
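
The question asks for list-like values such as [J♡, K♧] or [J, K], whereas the extracted column holds a single string per row. One possible post-processing step is sketched below; the helper split_cards and its keep_suits parameter are hypothetical names introduced for illustration, not part of pandas.

```python
import re

SUITS = "♤♡♢♧"

def split_cards(text, keep_suits=True):
    """Split a captured string such as 'J♡, K♧' into a list of cards."""
    cards = re.split(r",\s*", text)
    if not keep_suits:
        # Drop a trailing suit symbol from each card, if there is one.
        cards = [card.rstrip(SUITS) for card in cards]
    return cards

captured = ["J♡, K♧", "5, 2"]  # the values extracted above

with_suits = [split_cards(text) for text in captured]
without_suits = [split_cards(text, keep_suits=False) for text in captured]

print(with_suits)
print(without_suits)
```

On a dataframe, a function like this could be applied to the new column with Series.apply, assuming every row matched; rows holding NaN would have to be skipped or filled first.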

Try this pattern (I assume you use () rather than [] in the text, as posted in the regex demo):

\([^,]+,[^\)]+\)

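Explanation:

\( - matches ( literally

[^,]+ - matches one or more characters other than ,

, - matches , literally

[^\)]+ - matches one or more characters other than )

\) - matches ) literally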
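
As a rough check of this pattern, the sketch below runs it on a made-up string that uses parentheses, and then runs a square-bracket adaptation on the strings from the question. The parenthesised sample text and the adapted pattern are assumptions for illustration, not something posted in the question.

```python
import re

paren_pattern = r"\([^,]+,[^\)]+\)"
# Adaptation for square brackets (an assumption, mirroring the pattern above).
bracket_pattern = r"\[[^,]+,[^\]]+\]"

# Hypothetical text using parentheses, as this answer assumes.
paren_text = "Player (J♡, K♧) won the $5.40 main pot with a Straight"

# The strings from the question, which use square brackets.
bracket_texts = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

paren_match = re.search(paren_pattern, paren_text).group(0)
bracket_matches = [re.search(bracket_pattern, text).group(0) for text in bracket_texts]

print(paren_match)
print(bracket_matches)
```

Neither pattern contains a capture group, so the whole match, delimiters included, is returned. To use one of them with Series.str.extract, the part to keep would need to be wrapped in parentheses, because that method requires at least one capture group.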

Regex demo

Use

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>> 

Regex: (\w+)(?=[^][]*])

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ]                        ']'
--------------------------------------------------------------------------------
  )                        end of look-ahead
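
The same lookahead pattern can be tried without pandas using re.findall, as in the sketch below (variable names are illustrative). Because \w does not match the suit symbols, only the ranks are returned, which corresponds to the [J, K] form the question mentions.

```python
import re

pattern = r"(\w+)(?=[^][]*])"

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# A run of word characters is kept only if a ']' follows it
# with no '[' or ']' in between, i.e. it sits inside the brackets.
ranks = [re.findall(pattern, action) for action in actions]

print(ranks)
```

Words outside the brackets, such as Player or the amounts after the closing bracket, are rejected because either a [ intervenes before the next ] or no ] follows at all.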
