
Get the string between [ brackets and special characters in python

I have a question very similar to this one.

I'd really like to know why my result is NaN.

I have a dataframe with this column:

Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush

I want to build a new column with the cards that were played:

[J♡, K♧]
[5, 2]

or even:

[J, K]
[5, 2]

But when I experiment with regular expressions and use: dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)

I only get NaN.

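Your pattern returns NaN because the class [A-Za-z0-9_] contains neither the suit symbols nor the comma and the space, so the + stops before it reaches the closing ] and there is no match. You can add those characters to the character class inside the capture group, as in the pattern \[([A-Za-z0-9_♤♡♢♧, ]+)\], or make the pattern more specific: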

\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]

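The pattern matches:

  • \[ matches [
  • ( opens capture group 1
    • [A-Za-z0-9_] matches a single character from the listed ranges (the rank)
    • [♤♡♢♧]? optionally matches one of the listed suit characters
    • ,\s*[A-Za-z0-9_][♤♡♢♧]? matches a comma, optional whitespace, and then the same logic as before the comma
  • ) closes group 1
  • ] matches ]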
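The effect of each pattern can be checked with Python's standard re module, whose syntax the pandas .str methods also use. In this sketch a None from re.search corresponds to the NaN that Series.str.extract puts in a row with no match; the helper first_capture is hypothetical and only exists for the comparison.

```python
import re

# The two sample rows from the question's 'Action' column
rows = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

patterns = {
    # Original: no suit symbols, comma or space in the class, so it never reaches "]"
    "original": r"\[([A-Za-z0-9_]+)\]",
    # Broadened class: suits, comma and space added inside the capture group's class
    "broadened": r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]",
    # Specific: rank, optional suit, comma, optional whitespace, rank, optional suit
    "specific": r"\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]",
}


def first_capture(pattern, text):
    """Return capture group 1 of the first match, or None (shown as NaN by pandas)."""
    m = re.search(pattern, text)
    return m.group(1) if m else None


results = {name: [first_capture(p, row) for row in rows] for name, p in patterns.items()}

for name, values in results.items():
    print(name, values)
```

Both fixed patterns capture the same text for these two rows; the specific one is stricter about what may appear between the brackets.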


For example:

import pandas as pd

# Sample rows from the question
dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
# Capture "rank[suit], rank[suit]" between the square brackets; rows without a match become NaN
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)

Output

                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  J♡, K♧
1  Player [5, 2] won the $21.00 main pot with a f...    5, 2
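Series.str.extract returns the captured text as one string, for example J♡, K♧. To get a list per row (the [J♡, K♧] form) or only the ranks (the [J, K] form), that string can be post-processed. A minimal sketch with the re module, using a hypothetical helper split_cards and assuming the cards are always separated by a comma and optional whitespace; in pandas, dfpot['cards'].str.split(', ') would give the list form for the whole column.

```python
import re


def split_cards(captured):
    """Split a captured string like 'J♡, K♧' into cards and ranks (hypothetical helper)."""
    # Cards are separated by a comma plus optional whitespace
    cards = re.split(r",\s*", captured)
    # Keep only the rank by deleting any of the four suit symbols
    ranks = [re.sub(r"[♤♡♢♧]", "", card) for card in cards]
    return cards, ranks


for captured in ["J♡, K♧", "5, 2"]:
    print(split_cards(captured))
```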

Try the pattern (I'm assuming you use () instead of [] in your text, as posted in the regex demo):

\([^,]+,[^\)]+\)

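Explanation:

\( - matches ( literally

[^,]+ - matches one or more characters other than ,

, - matches , literally

[^\)]+ - matches one or more characters other than )

\) - matches ) literally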
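Because that pattern targets parentheses, it finds nothing in the question's bracketed rows. The sketch below runs it on a hypothetical parenthesized row and also tries an adaptation with the delimiters swapped for square brackets (the bracket version and the helper whole_match are mine, not part of the answer). With no capture group, the whole match is returned, brackets included, which happens to be the [J♡, K♧] form from the question. Note that Series.str.extract raises an error for a pattern without a capture group, so in pandas you would wrap it in one or use Series.str.findall.

```python
import re

rows = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# The answer's pattern, written for "(...)" delimiters
paren_pattern = r"\([^,]+,[^\)]+\)"
# Adapted for "[...]": "[", non-commas, a comma, non-"]" characters, "]"
bracket_pattern = r"\[[^,]+,[^\]]+\]"


def whole_match(pattern, text):
    """Return the full matched text, or None when there is no match."""
    m = re.search(pattern, text)
    return m.group(0) if m else None


paren_hits = [whole_match(paren_pattern, row) for row in rows]
bracket_hits = [whole_match(bracket_pattern, row) for row in rows]
print(paren_hits)    # the sample rows contain no parentheses
print(bracket_hits)

# Hypothetical row using parentheses, as the answer assumed
print(whole_match(paren_pattern, "Player(J♡, K♧) won the $5.40 main pot with a Straight"))
```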


Use:

>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
                                              Action   cards
0  Player[J♡, K♧] won the $5.40 main pot with a S...  [J, K]
1  Player [5, 2] won the $21.00 main pot with a f...  [5, 2]
>>> 

Regex: (\w+)(?=[^][]*])

Explanation:

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^][]*                   any character except: ']', '[' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ]                        ']'
--------------------------------------------------------------------------------
  )                        end of look-ahead
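The same extraction can be reproduced with re.findall to see which tokens survive the look-ahead: only runs of word characters followed by a ] with no [ or ] in between, which is why Player and the words after the brackets are dropped. A sketch with the standard library; it also checks that in Python 3 the suit symbols are not \w characters even though \w is Unicode-aware, so J♡ yields just J.

```python
import re

rows = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Word runs followed (via look-ahead) by "]" with no "[" or "]" in between
rank_pattern = re.compile(r"(\w+)(?=[^][]*])")

ranks = [rank_pattern.findall(row) for row in rows]
print(ranks)

# Suit symbols are not word characters, even with Unicode matching
print(re.fullmatch(r"\w", "♡"))
```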
