
Find multiple occurrences of different URLs in a big string, where each URL is between two specific substrings, using Python

I have a file containing just one long string with multiple URLs embedded in it. The URLs are all different, but each one is always enclosed between the same two specific substrings. How can I extract all the URLs?

My file contents look like the following:

data-starred-src="www.example.com" data-non-starred-src asdf asdf ghgh data-starred-src="www.someurl.com" data-non-starred-src gjsltg ajshssl ahssfh data-starred-src="www.anotherurl.com" data-non-starred-src

I want to extract the URLs in this form:

www.example.com
www.someurl.com
www.anotherurl.com

On the example, with line holding the file's contents, this:

import re
print(re.findall(r'data-starred-src\s*=\s*"([^"]*)"', line))

Gives:

['www.example.com', 'www.someurl.com', 'www.anotherurl.com']
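For reference, a self-contained version of the one-liner above might look like the following sketch. The sample text is copied from the question. The names `text`, `STARRED_SRC`, and `extract_urls` are illustrative choices, not part of the original answer, and a real script would read the string from the file instead of hard-coding it.

```python
import re

# Sample text copied from the question; a real script would read it from the file.
text = (
    'data-starred-src="www.example.com" data-non-starred-src asdf asdf ghgh '
    'data-starred-src="www.someurl.com" data-non-starred-src gjsltg ajshssl ahssfh '
    'data-starred-src="www.anotherurl.com" data-non-starred-src'
)

# Capture whatever sits between the double quotes that follow data-starred-src=
# (optional whitespace is allowed around the equals sign).
STARRED_SRC = re.compile(r'data-starred-src\s*=\s*"([^"]*)"')


def extract_urls(s):
    """Return every quoted value that follows data-starred-src= in s."""
    return STARRED_SRC.findall(s)


print(extract_urls(text))
```

Because the pattern contains exactly one capturing group, `findall()` returns just the captured URLs rather than the full matches.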

This should do it:

(?<=\")([^"]+\.[^"]+\.[^"]+)(?=\")

Working regex example:

http://regex101.com/r/sI2jL7

Or another example:

http://regex101.com/r/sI2jL7
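To try the lookaround pattern from this answer in Python rather than on regex101, something like the sketch below should work. The variable names and the `extra` string are illustrative assumptions. Note that this pattern is not tied to the `data-starred-src` attribute: it accepts any double-quoted run of text that contains at least two dots.

```python
import re

# Sample text copied from the question.
text = (
    'data-starred-src="www.example.com" data-non-starred-src asdf asdf ghgh '
    'data-starred-src="www.someurl.com" data-non-starred-src gjsltg ajshssl ahssfh '
    'data-starred-src="www.anotherurl.com" data-non-starred-src'
)

# Lookbehind and lookahead require a double quote on each side without consuming it;
# the group in between must contain at least two dots and no quotes.
QUOTED_DOTTED = re.compile(r'(?<=\")([^"]+\.[^"]+\.[^"]+)(?=\")')

urls = QUOTED_DOTTED.findall(text)
print(urls)

# Caveat: any other quoted value with two dots is picked up as well.
# The attribute name "other" is made up for this illustration.
extra = 'other="a.b.c"'
print(QUOTED_DOTTED.findall(extra))
```

If the file can contain other quoted, dotted values, a pattern anchored on the attribute name is the safer choice.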

Try the following:

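import re
# The non-capturing groups match the delimiters; the capturing group keeps only the URL
r1 = re.compile(r'(?:AAA ")([^"]*)(?:" BBB)')
s = 'AAA "www.example.com" BBB asdf asdf ghgh AAA "www.someurl.com" BBB gjsltg ajshssl ahssfh AAA "www.anotherurl.com" BBB'
res = r1.findall(s)  # list of the captured URLs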

If s is really long, you may also consider using finditer(), which yields match objects one at a time instead of building the whole list up front.
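A minimal sketch of the finditer() variant, using the same AAA/BBB placeholder delimiters as this answer. The generator name `iter_urls` is an illustrative choice, not something from the original post.

```python
import re

# Same placeholder delimiters as in the answer; only the URL is captured.
pattern = re.compile(r'AAA "([^"]*)" BBB')

s = ('AAA "www.example.com" BBB asdf asdf ghgh '
     'AAA "www.someurl.com" BBB gjsltg ajshssl ahssfh '
     'AAA "www.anotherurl.com" BBB')


def iter_urls(text):
    """Yield each captured URL lazily, one match at a time."""
    for match in pattern.finditer(text):
        yield match.group(1)


for url in iter_urls(s):
    print(url)
```

Each URL can be processed as soon as it is found, so the full list of results never has to sit in memory at once.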

The updated regex looks like this:

r1 = re.compile('(?:data-starred-src=")([^"]*)(?:" data-non-starred-src)')

Note that I've simply replaced AAA and BBB with the new delimiters, so if the earlier version didn't work for you, this one may not either.
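One way to check the updated pattern is to run it against the sample text from the question, as in the sketch below. The variable names `text` and `res` are illustrative. The pattern requires exactly one space between the closing quote and `data-non-starred-src`, so it assumes the real file is formatted the same way as the sample.

```python
import re

# Sample text copied from the question.
text = (
    'data-starred-src="www.example.com" data-non-starred-src asdf asdf ghgh '
    'data-starred-src="www.someurl.com" data-non-starred-src gjsltg ajshssl ahssfh '
    'data-starred-src="www.anotherurl.com" data-non-starred-src'
)

# Updated pattern from the answer: both delimiters must be present around the URL.
r1 = re.compile(r'(?:data-starred-src=")([^"]*)(?:" data-non-starred-src)')

res = r1.findall(text)
print(res)
```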

