
Regex extract everything after and before a specific text

I need to extract from this:

<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg" 

The names shown in there: Óscar Mauricio Lizcano Arango and Berner León Zambrano Eraso.

So it would be something like everything after

<meta content=" 

and before

name="keywords". 

Also, using Python, I would like to put every name as an element of a list. I would repeat this many times for different strings, and the number of names varies (it could be 4 names instead of 2 as in this case).

How could I do this?

I tried this:

re.findall(r'(?<=content=",)[^.]+(?=name=)', names)

This might help you:

# -*- coding: utf-8 -*-
import re
or_str = '<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\nBerner León Zambrano Eraso,\n\n\n\n\n" name="keywords"><meta content="Congreso Visible - Toda la información sobre el Congreso Colombiano en un solo lugar" property="og:title"/><meta content="/static/img/logo-fb.jpg"'
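# Remove every newline so the names sit on a single line.
new_str = or_str.replace("\n", "")
# Capture the text between 'meta content=",' and '" name="keywords"'.
li = re.findall(r'meta content=",(.*)" name="keywords"', new_str)
new_str = ''.join(li)
# Each name ends with a comma; capture the text before each comma.
print(re.findall(r'(.*?),', new_str))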

I used the replace() method to remove all the newline characters (\n) by replacing each one with an empty string.
Then I used findall to capture the text between meta content=", and " name="keywords". That comes back as a one-element list, so I joined it into a single string. Finally, I used findall again to store every name as its own list element, since findall returns a list of all matches.
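The same steps can also be folded into one reusable function. The following is a minimal sketch, not a definitive implementation. The names `extract_names` and `KEYWORDS_RE` are hypothetical. It assumes the names always sit in the content attribute directly before name="keywords" and are separated by commas. Instead of removing newlines and matching twice, it splits the captured text on commas and collapses the whitespace in each piece.

```python
import re

# Hypothetical helper, not part of the original answer.
# Assumes the tag has the form: <meta content="..." name="keywords">
KEYWORDS_RE = re.compile(r'<meta content="([^"]*)" name="keywords"')

def extract_names(html):
    """Return the comma-separated names from the keywords meta tag as a list."""
    match = KEYWORDS_RE.search(html)
    if match is None:
        return []
    # Split on commas, then collapse runs of whitespace (including newlines).
    names = [" ".join(part.split()) for part in match.group(1).split(",")]
    # Drop the empty pieces left by the leading and trailing commas.
    return [name for name in names if name]

# Shortened version of the sample string from the question.
sample = (
    '<meta content=",\n\n\nÓscar Mauricio  Lizcano Arango,\n\n\n\n\n\n\n\n'
    'Berner León Zambrano Eraso,\n\n\n\n\n" name="keywords">'
    '<meta content="Congreso Visible" property="og:title"/>'
)
print(extract_names(sample))
```

On the sample this yields the two names as separate list elements. A string with four names would give a four-element list with no change to the pattern. Note that for arbitrary HTML a real parser is generally more robust than a regex.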
