
Find the appropriate regular expression

Can you help me find the right regular expression to extract the appellation (Margaux or Saint-Julien) from the title on each of these two pages?

Page 1: Margaux, Rouge

Page 2: 2ème Vin, Saint-Julien, Rouge

My code:

item ["appelation"] = res.select('.//div[@class="pro_col_right"]/div[@class="pro_blk_trans"]/div[@class="pro_blk_trans_titre"]/text()').re(r'\s*\w+\-\w+\-\w+|\w+\-\w+|\[^Rouge,Blanc]')

My regular expression extracts Saint-Julien, but it fails to find Margaux.

I'm not sure why you need this, but assuming s holds the HTML of your page, this regex will extract the text you are looking for:

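import re

# s is assumed to hold the page's HTML as a str.
# The non-greedy (.*?) stops at the first closing </div> on the line.
m = re.search(r'<div class="pro_blk_trans_titre">(.*?)</div>', s)
print(m.group(1).strip().encode("utf8"))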

# page1: b'Margaux, Rouge'
# page2: b'2\xc3\xa8me Vin, Saint-Julien, Rouge'
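
To go from that title text to just the appellation, one option is to capture the item that sits right before the colour. The original pattern misses Margaux because its first two alternatives require a hyphen, and in the third the escaped `\[` is a literal bracket rather than the start of a character class, so that branch can never match. The sketch below assumes the colour is always Rouge or Blanc, that it is the last comma-separated item, and that the appellation comes immediately before it; `extract_appellation` is a hypothetical helper name, not part of your code.

```python
import re

# Assumption: the title ends with ", Rouge" or ", Blanc" and the appellation
# is the comma-separated item right before the colour.
APPELLATION_RE = re.compile(r"([^,]+?)\s*,\s*(?:Rouge|Blanc)\s*$")

def extract_appellation(title):
    """Return the appellation from a title string, or None if nothing matches."""
    m = APPELLATION_RE.search(title.strip())
    return m.group(1).strip() if m else None

print(extract_appellation("Margaux, Rouge"))                 # Margaux
print(extract_appellation("2ème Vin, Saint-Julien, Rouge"))  # Saint-Julien
```

Since the pattern has a single capturing group, it should also work when passed to the selector's .re() call in your spider, which as far as I know returns the captured group rather than the whole match; you may still need to strip leading whitespace from the result.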

