I need to extract from this:
<meta content=",\n\n\nÓscar Mauricio Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg"
The names in it are Óscar Mauricio Lizcano Arango and Berner León Zambrano Eraso.
So it would be something like everything after <meta content=" and before name="keywords".
Also, using Python, I would like to put every name as an element of a list. I will repeat this many times for different strings, and the number of names varies (there could be 4 names instead of 2 as in this case).
How could I do this?
I got it working with:
re.findall(r'(?<=content=",)[^.]+(?=name=)', names)
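Note that this pattern returns the whole content value as a single string, so one more step is needed to get one list element per name. A minimal sketch, assuming the input is held in a variable called names as above (the sample string is shortened here, and the helper name split_names is made up for illustration). Because of [^.]+, it would also stop early on a name that contains a period:

```python
import re

# Shortened sample input; the real string has more <meta> tags after this one.
names = '<meta content=",\n\n\nÓscar Mauricio Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords">'

def split_names(html):
    """Return the comma-separated names found before name= as a list."""
    # Same pattern as above: everything after content=", up to name=
    chunks = re.findall(r'(?<=content=",)[^.]+(?=name=)', html)
    result = []
    for chunk in chunks:
        # Drop the trailing space and closing quote, then split on commas
        for part in chunk.rstrip().rstrip('"').split(','):
            part = part.strip()  # remove the surrounding newlines
            if part:             # skip the empty pieces between commas
                result.append(part)
    return result

print(split_names(names))
```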
This might help you:
# -*- coding: utf-8 -*-
import re
or_str = '<meta content=",\n\n\nÓscar Mauricio Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg"'
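# Remove every newline so the names are separated only by commas
new_str = or_str.replace("\n", "")
# Capture the text between content=", and " name="keywords" (non-greedy)
li = re.findall(r'meta content=",(.*?)" name="keywords"', new_str)
new_str = ''.join(li)
# Each name ends with a comma, so grab the text before every comma
print(re.findall(r'(.*?),', new_str))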
I used the replace() method to remove all the newline characters (\n) by replacing them with an empty string. Then I used findall to pull out the content of the keywords tag, and used findall again to store every name as an element of a list, since findall returns a list.
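Since this has to be repeated for many strings with a varying number of names, it may be easier to wrap the extraction in a function. The sketch below swaps the regular expressions for the standard library's html.parser, which reads the content attribute directly. The names KeywordsParser and extract_names are hypothetical, and it assumes that no individual name contains a comma:

```python
from html.parser import HTMLParser

class KeywordsParser(HTMLParser):
    """Collects the comma-separated entries of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") == "keywords":
            content = attrs.get("content") or ""
            # Split on commas and drop the empty or whitespace-only pieces
            for part in content.split(","):
                part = part.strip()
                if part:
                    self.names.append(part)

def extract_names(html):
    """Return the keyword names in html as a list (empty if none found)."""
    parser = KeywordsParser()
    parser.feed(html)
    return parser.names

sample = ('<meta content=",\n\n\nÓscar Mauricio Lizcano Arango,\n\n\n\n'
          'Berner León Zambrano Eraso,\n\n\n" name="keywords">'
          '<meta content="Congreso Visible" property="og:title"/>')
print(extract_names(sample))
```

The same function works unchanged when a page lists 4 names instead of 2, because the split produces however many entries the attribute contains.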