Match tickers using regex

Question

I need to extract tickers (stock symbols, which are abbreviations) from tweets. These tickers start with $ (a dollar sign) and are composed of uppercase letters and sometimes "-". Here is an example:

str = "VG Acquisition Has The Potential To Fly High $SPCE $STPK $VG-AC price is $0.88"

I tried many regexes, but none of them returned what I need:

\b\$.*\b
[$].*\s     
[$].*\b
[$].*\s$
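A quick way to see why these attempts fail is to run each one against the sample string. This sketch renames the variable to s so it does not shadow Python's built-in str. The greedy .* runs as far as it can and backtracks only to the last position where the rest of the pattern fits, so it grabs several tickers at once. The \b before \$ never matches here, because the space in front of each $ and the $ itself are both non-word characters.

```python
import re

s = "VG Acquisition Has The Potential To Fly High $SPCE $STPK $VG-AC price is $0.88"

# The four patterns tried in the question
attempts = [r"\b\$.*\b", r"[$].*\s", r"[$].*\b", r"[$].*\s$"]

# Collect what each pattern actually finds in the sample tweet
results = {pattern: re.findall(pattern, s) for pattern in attempts}
for pattern, found in results.items():
    print(f"{pattern:12} -> {found}")
```

Making the quantifier lazy (.*?) stops at the first whitespace instead, but it keeps that trailing space and would still capture $0.88 if it were followed by a space, which is why the answers below restrict what may follow the $.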

I need to match:

$SPCE 
$STPK 
$VG-AC

Answer 1

I would have suggested something like this: re.findall(r'\$[A-Z-]+', text)

\$ = starts with $

[A-Z-]+ = match uppercase letters A-Z, with the dash as a possibility (a - placed last in a character class is a literal dash). The + at the end allows repetition.

This regex works even with a pattern like ABS-DE-CE.
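A runnable sketch of this approach on the question's string, assuming the class is written as [A-Z-] (range A-Z plus a literal dash). Note that a ? inside a character class is a literal question mark, not an "optional" marker, so it is left out here; the + already lets dashes repeat. The last call uses a made-up input to show one trade-off: a trailing dash is swallowed too.

```python
import re

s = "VG Acquisition Has The Potential To Fly High $SPCE $STPK $VG-AC price is $0.88"

# $ followed by one or more uppercase letters or dashes
pattern = r"\$[A-Z-]+"

print(re.findall(pattern, s))                 # ['$SPCE', '$STPK', '$VG-AC']
print(re.findall(pattern, "$ABS-DE-CE"))      # ['$ABS-DE-CE']
print(re.findall(pattern, "buy $ABC- now"))   # ['$ABC-'] (trailing dash kept)
```

$0.88 is skipped only because the class requires at least one letter or dash right after the $.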

Answer 2

Use

re.findall(r'\$(?!\d+\.\d)\S+', text)


Explanation

--------------------------------------------------------------------------------
  \$                       '$'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \d                       digits (0-9)
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
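The lookahead only rules out prices written with a decimal point, and \S+ takes everything up to the next whitespace. The sketch below, using made-up inputs besides the question's string, shows both sides: the sample tweet works, but trailing punctuation is kept and a whole-dollar price such as $5 slips through.

```python
import re

pattern = r"\$(?!\d+\.\d)\S+"

s = "VG Acquisition Has The Potential To Fly High $SPCE $STPK $VG-AC price is $0.88"
print(re.findall(pattern, s))                          # ['$SPCE', '$STPK', '$VG-AC']
print(re.findall(pattern, "Watching $SPCE, $STPK."))   # ['$SPCE,', '$STPK.']
print(re.findall(pattern, "closed at $5 today"))       # ['$5']
```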

Answer 3

pytickersymbols, if it does what it says on the tin, should serve your purpose well. From its tests:

import yfinance as yf

# Look up a ticker symbol and fetch its recent price history
y_ticker = yf.Ticker('GOOG')
data = y_ticker.history(period='4d')
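The idea behind this answer is to check candidates against a list of real symbols rather than trusting the pattern alone. A minimal sketch of that combination follows; KNOWN_TICKERS is a hypothetical, hard-coded set standing in for a real symbol source (such as the one pytickersymbols provides, whose API is not shown here), and known_tickers_in is an illustrative helper name.

```python
import re

# Hypothetical whitelist; a real application would load symbols from a ticker database
KNOWN_TICKERS = {"SPCE", "STPK", "VG-AC"}

# Capture the symbol without the leading "$"
CANDIDATE = re.compile(r"\$([A-Z]+(?:-[A-Z]+)*)\b")


def known_tickers_in(text: str) -> list[str]:
    """Return symbols written as $TICKER in text that appear in KNOWN_TICKERS."""
    return [sym for sym in CANDIDATE.findall(text) if sym in KNOWN_TICKERS]


s = "VG Acquisition Has The Potential To Fly High $SPCE $STPK $VG-AC price is $0.88"
print(known_tickers_in(s))                  # ['SPCE', 'STPK', 'VG-AC']
print(known_tickers_in("$FAKE and $SPCE"))  # ['SPCE']
```

Because the pattern has one capturing group, re.findall returns only the group's text, so the $ is already stripped before the lookup.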

Answer 4

You can match 1 or more uppercase chars A-Z.

Then optionally repeat matching - and 1 or more uppercase chars A-Z.

\$[A-Z]+(?:-[A-Z]+)*\b

Explanation

\$[A-Z]+    Match $ and 1 or more uppercase chars A-Z
(?:         Non-capture group
  -[A-Z]+   Match - and 1 or more uppercase chars A-Z
)*          Close the group and repeat it 0 or more times
\b          A word boundary
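To see what the final \b and the dash group contribute, this sketch compares the pattern with and without \b on made-up strings. Without \b, $ABCd yields a partial match $ABC; with it, the match is rejected because the capitals run straight into a lowercase letter. Since the group requires letters after every -, a trailing dash is not included.

```python
import re

with_boundary = r"\$[A-Z]+(?:-[A-Z]+)*\b"
without_boundary = r"\$[A-Z]+(?:-[A-Z]+)*"

text = "$ABCd $SPCE $VG-AC"
print(re.findall(without_boundary, text))            # ['$ABC', '$SPCE', '$VG-AC']
print(re.findall(with_boundary, text))               # ['$SPCE', '$VG-AC']
print(re.findall(with_boundary, "buy $ABC- now"))    # ['$ABC'] (dash not kept)
```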


For example

import re
 
regex = r"\$[A-Z]+(?:-[A-Z]+)*\b"
s = "VG Acquisition Has The Potential To Fly High $SPCE $STPK $VG-AC price is $0.88"
print(re.findall(regex, s))

Output

['$SPCE', '$STPK', '$VG-AC']

Question: Match tickers using regex

4 answers

Answer 1: score 0, 2021-01-30 23:30:56

Answer 2: score 0, 2021-01-31 00:11:16

Answer 3: score 0, accepted, 2021-01-31 00:20:18

Answer 4: score 0, 2021-01-31 10:24:01
