Get the string between [ brackets and special characters in python
I have a question very similar to this one.
I would really like to know why my result is NaN.
I have a dataframe with this column:
Action
Player[J♡, K♧] won the $5.40 main pot with a Straight
Player [5, 2] won the $21.00 main pot with a flush
I want to build a new column with the cards that were played:
[J♡, K♧]
[5, 2]
or even:
[J, K]
[5, 2]
However, when I experiment with regular expressions and use:
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_]+)\]', expand=False)
I only get NaN.
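The character class in your capture group contains neither the suit symbols nor the comma and space, so the pattern never reaches the closing ] and extract returns NaN.
You can add those characters to the character class in the capture group, as in the pattern \[([A-Za-z0-9_♤♡♢♧, ]+)\]
Or make the pattern more specific: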
\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]
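The pattern matches:
\[ - matches [
( - capture group 1
[A-Za-z0-9_] - matches one of the listed characters
[♤♡♢♧]? - optionally matches one of the listed characters
,\s*[A-Za-z0-9_][♤♡♢♧]? - matches a comma, optional whitespace, and the same logic as before the comma
) - closes group 1
] - matches ]
For example: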
import pandas as pd
dfpot = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
dfpot['cards'] = dfpot['Action'].str.extract(r'\[([A-Za-z0-9_][♤♡♢♧]?,\s*[A-Za-z0-9_][♤♡♢♧]?)]', expand=False)
print(dfpot)
Output
Action cards
0 Player[J♡, K♧] won the $5.40 main pot with a S... J♡, K♧
1 Player [5, 2] won the $21.00 main pot with a f... 5, 2
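To see why the original pattern produced NaN while the broadened character class works, here is a small sketch using the standard `re` module instead of pandas. It assumes, as far as I know, that `Series.str.extract` applies the same regex search per row and gives NaN where there is no match. The helper `first_group` is a made-up name for illustration only.

```python
import re

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

# Pattern from the question: the class has no suits, comma, or space.
original = re.compile(r"\[([A-Za-z0-9_]+)\]")
# Broadened class from the answer above.
broadened = re.compile(r"\[([A-Za-z0-9_♤♡♢♧, ]+)\]")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


original_results = [first_group(original, a) for a in actions]
broadened_results = [first_group(broadened, a) for a in actions]

print(original_results)   # [None, None]
print(broadened_results)  # ['J♡, K♧', '5, 2']
```

The first pattern fails on both rows because it hits `♡` or `,` before it can reach the closing `]`.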
Try this pattern (I assume your text uses () rather than [], as posted in the regex demo):
\([^,]+,[^\)]+\)
Explanation:
\( - matches ( literally
[^,]+ - matches one or more characters other than ,
, - matches , literally
[^\)]+ - matches one or more characters other than )
\) - matches ) literally
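A quick sketch of how this pattern behaves, using the standard `re` module. The parenthesised input string is hypothetical (the data in the question uses square brackets), and the second pattern with a capture group is my own addition: as far as I know, `str.extract` needs at least one capture group, so the pattern as written would have to be wrapped like this before being used there.

```python
import re

# Pattern exactly as given above (no capture group).
paren_pattern = re.compile(r"\([^,]+,[^\)]+\)")
# Same pattern with a capture group around the contents (an assumption for extract-style use).
paren_capture = re.compile(r"\(([^,]+,[^\)]+)\)")

# Hypothetical input that uses parentheses instead of square brackets.
paren_text = "Player (J♡, K♧) won the $5.40 main pot with a Straight"
# Input as shown in the question.
bracket_text = "Player[J♡, K♧] won the $5.40 main pot with a Straight"

whole = paren_pattern.search(paren_text).group(0)
inner = paren_capture.search(paren_text).group(1)
no_match = paren_pattern.search(bracket_text)

print(whole)     # (J♡, K♧)
print(inner)     # J♡, K♧
print(no_match)  # None
```

Note that the pattern finds nothing in the square-bracket text from the question, so it only helps if the real data uses parentheses.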
Use findall with a lookahead:
>>> import pandas as pd
>>> df = pd.DataFrame({'Action':['Player[J♡, K♧] won the $5.40 main pot with a Straight', 'Player [5, 2] won the $21.00 main pot with a flush']})
>>> df['cards'] = df['Action'].str.findall(r'(\w+)(?=[^][]*])')
>>> df
Action cards
0 Player[J♡, K♧] won the $5.40 main pot with a S... [J, K]
1 Player [5, 2] won the $21.00 main pot with a f... [5, 2]
>>>
Regex: (\w+)(?=[^][]*])
Explanation
--------------------------------------------------------------------------------
(          group and capture to \1:
\w+        word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
)          end of \1
(?=        look ahead to see if there is:
[^][]*     any character except: ']', '[' (0 or more times (matching the most amount possible))
]          ']'
)          end of look-ahead
--------------------------------------------------------------------------------
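The same lookahead can be tried without pandas. This is only a sketch with the standard `re` module, and the last input string is invented to show one caveat: the lookahead checks only that a `]` follows with no bracket in between, not that a `[` was ever opened.

```python
import re

# Word characters that are followed by a ']' with no '[' or ']' in between.
pattern = re.compile(r"(\w+)(?=[^][]*])")

actions = [
    "Player[J♡, K♧] won the $5.40 main pot with a Straight",
    "Player [5, 2] won the $21.00 main pot with a flush",
]

cards = [pattern.findall(a) for a in actions]
print(cards)  # [['J', 'K'], ['5', '2']]

# Invented input with a closing bracket but no opening one:
# every word before the ']' is still returned.
stray = pattern.findall("no opening bracket here] at all")
print(stray)  # ['no', 'opening', 'bracket', 'here']
```

The suit symbols are dropped because `\w` does not match them, which gives the `[J, K]` form from the question. If the column can contain a stray `]`, one of the patterns that requires the opening `\[` is probably safer.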