Get the string between [ brackets and special characters in Python
I have a problem very similar to this one.
I would really like to know why my result is NaN.
I have a dataframe with this column:
Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush
I want to build a new column containing the cards that were played:
[J♡, K♧]
[5, 2]
or even:
[J, K]
[5, 2]
However, when I experiment with a regular expression and use:
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)
I only get NaN.
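Your character class [A-Za-z0-9_] does not include the comma, the space, or the suit symbols, so the pattern can never reach the closing ] and every row yields NaN.
You can add those characters to the character class inside the capture group, as in the pattern \[([A-Za-z0-9_♤♡♢♧, ]+)\]
or make the pattern more specific: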
\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]
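The pattern matches:
\[ matches [
( starts capture group 1
[A-Za-z0-9_] matches one of the listed characters
[♤♡♢♧]? optionally matches one of the listed characters
,\s*[A-Za-z0-9_][♤♡♢♧]? matches a comma, optional whitespace, and then the same logic as before the comma
) closes group 1
] matches ]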
For example:
import pandas as pd
dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)
Output
Action cards
0 Player[J♡, K♧] won the $5.40 main pot with a S... J♡, K♧
1 Player [5, 2] won the $21.00 main pot with a f... 5, 2
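To make the cause of the NaN visible, here is a minimal sketch that compares the question's pattern with the broadened character class from this answer. It uses the standard-library re module rather than pandas so that it runs on its own; first_group is a hypothetical helper introduced only for this illustration, and None stands in for the NaN that str.extract reports when nothing matches.

```python
import re

# Sample rows copied from the question's Action column.
actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# The question's pattern: only letters, digits and underscore may sit between the brackets.
original = re.compile(r"\[([A-Za-z0-9_]+)\]")
# The broadened class from the answer: suit symbols, comma and space are allowed too.
broader = re.compile(r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]")


def first_group(pattern, text):
    """Hypothetical helper: return capture group 1 of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


original_results = [first_group(original, text) for text in actions]
broader_results = [first_group(broader, text) for text in actions]

print(original_results)  # [None, None]
print(broader_results)   # ['J♡, K♧', '5, 2']
```

The original pattern fails on both rows because the first character after the rank (a suit symbol or a comma) is outside its character class.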
Try the pattern below (I assume your text uses () rather than [], as in the regex demo you posted):
\([^,]+,[^\)]+\)
Explanation:
\( - matches ( literally
[^,]+ - matches one or more characters other than ,
, - matches , literally
[^\)]+ - matches one or more characters other than )
\) - matches ) literally
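Since the sample rows in the question use square brackets, the same idea can be adapted by swapping the delimiters. The sketch below is an assumption-laden illustration, not part of the original answer: the capture group, the bracket variant of the pattern, and the parenthesised sample string paren_text are all added here for demonstration.

```python
import re

# The answer's pattern, with a capture group added around the cards (an addition for this sketch).
paren_pattern = re.compile(r"\(([^,]+,[^\)]+)\)")
# Assumed adaptation: the same logic with [ and ] as delimiters.
bracket_pattern = re.compile(r"\[([^,]+,[^\]]+)\]")

# Hypothetical row that uses parentheses, as the answer assumes.
paren_text = "Player (5, 2) won the $21.00 main pot with a flush"
# Rows as they appear in the question, with square brackets.
bracket_texts = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

paren_cards = paren_pattern.search(paren_text).group(1)
bracket_cards = [bracket_pattern.search(text).group(1) for text in bracket_texts]
# The parenthesis pattern finds nothing in bracketed text.
paren_on_brackets = paren_pattern.search(bracket_texts[0])

print(paren_cards)        # 5, 2
print(bracket_cards)      # ['J♡, K♧', '5, 2']
print(paren_on_brackets)  # None
```

Because the negated classes accept anything except the delimiter, this version keeps the suit symbols without having to list them.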
Use Series.str.findall:
>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
Action cards
0 Player[J♡, K♧] won the $5.40 main pot with a S... [J, K]
1 Player [5, 2] won the $21.00 main pot with a f... [5, 2]
>>>
Regular expression: (\w+)(?=[^][]*])
Explanation
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
\w+ word characters (a-z, A-Z, 0-9, _) (1 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
[^][]* any character except: ']', '[' (0 or
more times (matching the most amount
possible))
--------------------------------------------------------------------------------
] ']'
--------------------------------------------------------------------------------
) end of look-ahead
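The lookahead can also be tried outside pandas. The following is a minimal sketch using re.findall from the standard library on the question's two rows; the joined list is an extra step added here (an assumption about the desired format) to turn each list of ranks into a single string. Note that for Python 3 str patterns \w also matches Unicode letters and digits, not only a-z, A-Z, 0-9 and _, but the suit symbols are not word characters and are therefore skipped.

```python
import re

# Rows copied from the question's Action column.
actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# A run of word characters that is followed, with no bracket in between, by a closing ].
lookahead = re.compile(r"(\w+)(?=[^][]*])")

# findall returns the text of group 1 for every match in the string.
ranks = [lookahead.findall(text) for text in actions]
# Optional extra step: collapse each list into one comma-separated string.
joined = [", ".join(found) for found in ranks]

print(ranks)   # [['J', 'K'], ['5', '2']]
print(joined)  # ['J, K', '5, 2']
```

Words such as Player or won are rejected because the text after them either hits an opening [ first or never reaches a ].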