Regex to extract everything after and before a specific text

Question

I need to extract from this:

<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg"

The names shown in there: Óscar Mauricio Lizcano Arango and Berner León Zambrano Eraso.

So it would be something like everything after

<meta content="

and before

name="keywords".

Also, using Python, I would like to put every name as an element of a list. I would repeat this many times for different strings, and the number of names varies (it could be 4 names instead of 2 as in this case).

How could I do this?

Answer 1

I did it with:

re.findall(r'(?<=content=",)[^.]+(?=name=)', names)
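Note that on the sample string this pattern returns a single match containing both names, the newlines, and the closing quote, not one list element per name. A minimal sketch of the remaining cleanup, assuming names never contain a comma; `extract_names` and `sample` are names made up for this illustration:

```python
import re

sample = (
    '<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,'
    '\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords">'
    '<meta content="Congreso Visible - Toda la información sobre el Congreso '
    'Colombiano en un solo lugar" property="og:title"/>'
    '<meta content="/static/img/logo-fb.jpg"'
)

def extract_names(html):
    """Split the block matched by the pattern above into clean names."""
    blocks = re.findall(r'(?<=content=",)[^.]+(?=name=)', html)
    names = []
    for block in blocks:
        for piece in block.split(","):
            # Collapse whitespace runs, then drop the trailing quote.
            cleaned = " ".join(piece.split()).strip('" ')
            if cleaned:
                names.append(cleaned)
    return names

print(extract_names(sample))
# ['Óscar Mauricio Lizcano Arango', 'Berner León Zambrano Eraso']
```

The `[^.]+` part stops at the first period, so this only works while no period appears between `content=",` and `name=`.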

Answer 2

This might help you:

# -*- coding: utf-8 -*-
import re

or_str = '<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg"'
# Remove every newline so the names sit on a single line.
new_str = or_str.replace("\n", "")
# Capture the text between content=", and " name="keywords".
li = re.findall(r'meta content=",(.*)" name="keywords"', new_str)
new_str = ''.join(li)
# Each name ends with a comma; the lazy group captures one name at a time.
print(re.findall(r'(.*?),', new_str))

I used the replace() method to replace every newline character (\n) with an empty string, which removes them.
Then I used findall to pull out the part of the string that holds the names. Since findall returns a list, I joined the result back into one string and used findall again to store every name as a separate element of a list.
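Since the question mentions repeating this for many strings with a varying number of names, the same steps can be wrapped in a function. This is only a sketch: `names_from_keywords` and `KEYWORDS_RE` are hypothetical names, the pattern assumes the attribute order `content="..." name="keywords"` seen in the question, and the four names in the usage lines are placeholders.

```python
import re

# Assumes content="..." comes directly before name="keywords".
# [^"]* also matches newlines, so they need not be removed first.
KEYWORDS_RE = re.compile(r'<meta content="([^"]*)" name="keywords"')

def names_from_keywords(html):
    """Return the comma-separated names of the keywords meta tag as a list."""
    match = KEYWORDS_RE.search(html)
    if match is None:
        return []
    # Split on commas, collapse whitespace runs, and drop empty pieces.
    pieces = (" ".join(p.split()) for p in match.group(1).split(","))
    return [p for p in pieces if p]

# Placeholder input with four names instead of two.
html = '<meta content=",\n\nA B,\n\n\nC D,\nE F,\n\nG H,\n" name="keywords">'
print(names_from_keywords(html))
# ['A B', 'C D', 'E F', 'G H']
```

For anything beyond fixed, predictable markup like this, an HTML parser is usually a safer choice than a regex.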


Answer 1
1 2016-10-13 22:51:52

Answer 2
1 accepted 2016-10-13 23:23:07
