Regex expression after quotation in Python
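I'm trying to write a Python program that gets the artist's name from Pandora tweets. For example, if I have this tweet:
I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.
I want to get only the name Luther Vandross. I don't know much about regex, so I tried the following code: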
print(re.findall(r'".+?" by [\w+]+', text))
But the result is `"I Can Make It Better" by Luther`.
Do you know how I can write a regex in Python to get just the name?
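To see why the attempt above falls short, here is a minimal sketch (assuming `text` holds the example tweet) that runs it next to a variant with a capturing group. Two separate things go wrong: without a group, `re.findall` returns the whole match, and the character class `[\w+]` contains no space, so it stops after the first word of the name.

```python
import re

# The example tweet from the question.
text = ('I\'m listening to "I Can Make It Better" by Luther Vandross '
        'on Pandora #pandora http://t.co/ieDbLC393F.')

# Original attempt: no capturing group, so findall returns the whole match,
# and [\w+]+ (word characters or '+') stops at the space after "Luther".
attempt = re.findall(r'".+?" by [\w+]+', text)
print(attempt)  # ['"I Can Make It Better" by Luther']

# Adding a group isolates the artist, but the character class still
# excludes spaces, so only the first name is captured.
grouped = re.findall(r'".+?" by ([\w+]+)', text)
print(grouped)  # ['Luther']
```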
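Your regex is close, but you can change the delimiters to `" by` and `on`. You also need a capturing group, i.e. parentheses, to pull out just the name.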
You can use a regex like this:
" by (.+?) on
The idea behind this regex is to capture whatever lies between `" by` and `on`, using a simple non-greedy match.
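A quick way to see what the non-greedy `(.+?)` buys you is to compare it with a greedy `(.+)` on a tweet in which ` on` occurs more than once. The tweet below is made up for illustration; in the question's tweet ` on` appears only once, so both forms happen to give the same answer there.

```python
import re

# Made-up tweet in which " on" appears twice.
tweet = '"Some Song" by Some Artist on Pandora, played on my phone'

# Non-greedy: stop at the first " on" after the artist.
lazy = re.search(r'" by (.+?) on', tweet).group(1)
# Greedy: run as far as possible, then back off to the last " on".
greedy = re.search(r'" by (.+) on', tweet).group(1)

print(lazy)    # Some Artist
print(greedy)  # Some Artist on Pandora, played
```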
Match information:
MATCH 1
1. [43-58] `Luther Vandross`
Code:
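import re

# Python 3: the ur'' prefix is a syntax error, so use a plain raw string.
p = re.compile(r'" by (.+?) on')
test_str = "I'm listening to \"I Can Make It Better\" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.\n"

m = p.search(test_str)
print(m.group(1))  # Luther Vandross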
>>> s = '''I'm listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'''
>>> import re
>>> m = re.search('to "?(.*?)"? by (.*?) on #?Pandora', s)
>>> m
<_sre.SRE_Match object; span=(14, 69), match='to "I Can Make It Better" by Luther Vandross on P>
>>> m.groups()
('I Can Make It Better', 'Luther Vandross')
More test cases:
>>> tests = [
...     '''I'm listening to "Don't Turn Out The Lights (D.T.O.T.L.)" by NKOTBSB on #Pandora''',
...     '''I'm listening to G.O.D. Remix by Canton Jones on #Pandora''',
...     '''I'm listening to "It's Been Awhile" by @staindmusic on Pandora #pandora http://pdora.co/R1OdxE''',
...     '''I'm listening to "Everlong" by @foofighters on #Pandora http://pdora.co/1eANfI0''',
...     '''I'm listening to "El Preso (2000)" by Fruko Y Sus Tesos on #Pandora http://pdora.co/1GtOHC1''',
...     '''I'm listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc''',
...     '''I'm listening to "Space Age Pimpin'" by 8 Ball & MJG on Pandora #pandora http://pdora.co/1h8swun'''
... ]
>>> expr = re.compile('to "?(.*?)"? by (.*?) on #?Pandora')
>>> for s in tests:
...     print(expr.search(s).groups())
...
("Don't Turn Out The Lights (D.T.O.T.L.)", 'NKOTBSB')
('G.O.D. Remix', 'Canton Jones')
("It's Been Awhile", '@staindmusic')
('Everlong', '@foofighters')
('El Preso (2000)', 'Fruko Y Sus Tesos')
('Cat Daddy', 'Rej3ctz')
("Space Age Pimpin'", '8 Ball & MJG')
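One caveat with the loop above: `expr.search(s)` returns `None` for a tweet that doesn't fit the pattern, and calling `.groups()` on `None` raises an `AttributeError`. A minimal guard could look like this; the helper name `extract` and the non-matching tweet are hypothetical.

```python
import re

expr = re.compile(r'to "?(.*?)"? by (.*?) on #?Pandora')

def extract(tweet):
    """Hypothetical helper: return (title, artist), or None if the tweet doesn't match."""
    m = expr.search(tweet)
    return m.groups() if m else None

tweets = [
    'I\'m listening to "Cat Daddy" by Rej3ctz on #Pandora http://pdora.co/1eALNpc',
    'Just a regular tweet with no song in it',  # made-up non-matching tweet
]
for t in tweets:
    print(extract(t))
# ('Cat Daddy', 'Rej3ctz')
# None
```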
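You need to use a capturing group.
print(re.findall(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})', text))
I used a repetition quantifier because a name may consist of just a first name, a first and last name, or a first, middle and last name.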
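Because `[A-Z][a-z]+` requires a capital letter followed by at least one lowercase letter, this pattern only recognises names written that way. Running it on trimmed versions of the question's tweet and two of the test cases above shows where it stops short:

```python
import re

pattern = re.compile(r'"[^"]*" by ([A-Z][a-z]+(?: [A-Z][a-z]+){0,2})')

# Trimmed tweets: the question's example plus two of the test cases.
samples = [
    '"I Can Make It Better" by Luther Vandross on Pandora',
    '"El Preso (2000)" by Fruko Y Sus Tesos on #Pandora',
    '"Everlong" by @foofighters on #Pandora',
]

results = [pattern.findall(s) for s in samples]
for r in results:
    print(r)
# ['Luther Vandross']  both words fit [A-Z][a-z]+
# ['Fruko']            "Y" has no lowercase letter after it, so repetition stops
# []                   "@" is not a capital letter, so nothing matches
```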
You can try this:
print(re.findall(r'".+?" by ((?:[A-Z][a-z]+ )+)', text))
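Since each word in `(?:[A-Z][a-z]+ )+` must be followed by a space, the captured name keeps a trailing space. A minimal sketch, assuming `text` holds the example tweet, strips it afterwards:

```python
import re

text = ('I\'m listening to "I Can Make It Better" by Luther Vandross '
        'on Pandora #pandora http://t.co/ieDbLC393F.')

# Every name part must be followed by a space, so the group ends with one.
names = re.findall(r'".+?" by ((?:[A-Z][a-z]+ )+)', text)
print(names)    # ['Luther Vandross ']

# Remove the trailing space from each captured name.
cleaned = [n.strip() for n in names]
print(cleaned)  # ['Luther Vandross']
```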
You can use this lookaround-based regex:
import re
s = 'I\'m listening to "I Can Make It Better" by Luther Vandross on Pandora #pandora http://t.co/ieDbLC393F.'
print(re.search(r'(?<=by ).+?(?= on)', s).group())
Luther Vandross
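The lookbehind `(?<=by )` only checks the three characters before the match, so it can also fire inside a song title that happens to contain "by ", such as "Baby". The sketch below uses a made-up tweet to show this, then anchors the lookbehind on the closing quote as well; Python's `re` accepts that because `(?<=" by )` is still a fixed-width lookbehind.

```python
import re

# Made-up tweet whose title contains "by " (inside "Baby").
tweet = 'I\'m listening to "Baby One More Time" by Britney Spears on Pandora'

# Lookbehind on "by " alone: the leftmost match starts right after "Baby ".
loose = re.search(r'(?<=by ).+?(?= on)', tweet).group()
print(loose)     # One More Time" by Britney Spears

# Including the closing quote keeps the lookbehind fixed-width (4 chars)
# and only matches after the quoted title.
anchored = re.search(r'(?<=" by ).+?(?= on)', tweet).group()
print(anchored)  # Britney Spears
```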