
Regex extract everything after and before a specific text

I need to extract from this:

<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg" 

The names shown in there: Óscar Mauricio Lizcano Arango and Berner León Zambrano Eraso.

So it would be something like everything after

<meta content=" 

and before

name="keywords". 

Also, using Python, I would like to put every name into a list as a separate element. I will repeat this many times for different strings, and the number of names varies (there could be 4 names instead of 2 as in this case).

How could I do this?

This is what I tried:

re.findall(r'(?<=content=",)[^.]+(?=name=)', names)

This might help you:

import re

or_str = '<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg"'
# Remove the newlines so that '.' can match across the whole attribute value.
new_str = or_str.replace("\n", "")
# Capture everything between 'meta content=",' and '" name="keywords"'.
li = re.findall(r'meta content=",(.*)" name="keywords"', new_str)
new_str = ''.join(li)
# Every name is followed by a comma, so a lazy match up to each comma yields one name.
print(re.findall(r'(.*?),', new_str))

I used the replace() method to remove all newline characters (\n) by replacing them with an empty string.
Then I used findall to capture the text between meta content=", and " name="keywords", and joined the result back into a single string. Finally, I used findall again to store every name as a separate element, since findall returns a list.
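As an alternative to the two findall passes described above, here is a minimal sketch that captures the attribute value once and then splits it on commas. The helper name extract_names and the shortened sample string are assumptions made for illustration, not part of the original answer. Note that it also collapses the double space inside the first name, which may or may not be what you want.

```python
import re

# Sample input based on the question (the og:title tag is shortened here).
html = (
    '<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\n'
    'Berner León Zambrano Eraso,\n\n\n\n\n" name="keywords">'
    '<meta content="Congreso Visible" property="og:title"/>'
)


def extract_names(text):
    """Return the comma-separated names from the keywords meta tag (hypothetical helper)."""
    # [^"]* cannot cross a double quote, so the match stays inside one attribute
    # value, and unlike '.' it matches newlines without needing re.DOTALL.
    match = re.search(r'<meta content="([^"]*)" name="keywords"', text)
    if match is None:
        return []
    # Split on commas, collapse runs of whitespace, and drop the empty pieces.
    parts = (" ".join(piece.split()) for piece in match.group(1).split(","))
    return [p for p in parts if p]


names = extract_names(html)
print(names)
```

Because the split happens on commas, the same function works whether the tag holds two names or four. For anything beyond a one-off scrape, an HTML parser is usually a sturdier choice than a regex, since attribute order and quoting can vary between pages.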

