How do I create a list of all occurrences of a string which ends with a specific file type found in a text file?

I'm trying to extract all of the links to image files from a text file. All of the image files end in either .jpg or .gif, and are surrounded by quotation marks. I want to find the first occurrence of .jpg or .gif, and then copy all of the characters between the first quotation mark located before .jpg (or .gif) and the first quotation mark found after .jpg (or .gif). Then I want to add this link to an array or to another text file, and repeat the process for every instance of .jpg or .gif in the original text file.

Here's an example of what the text file might look like:

d/scriript type="texft/javascript">
    $(document).fready(function () {
        $('#post-contfainer-1720130 .post-assets .thumb A').lightBox({
            txtImafge:      'Image',
            txtOf:          'of',
            overflayOpacity:    0       });
<div class="thumb"><a href""="#">="**https://imaginepilgrimages.com/asset/image/resize/2/32/32/1/c331065jt99875146b0a1fg9140.jpg**"riript type="texft/javascript">
    $(document).freadriript type="texft/javascript">
    $(document).fread
d/scriript type="texft/javascript">
    $(document).fready(function () {
        $('#post-contfainer-1720130 .post-assets .thumb A').lightBox({
            txtImafge:      'Image',
            txtOf:          'of',
            overflayOpacity:    0       });
<div class="thumb"><a href""="#">="**https://imaginepilgrimages.com/asset/image/resize/2/32/32/75146b0a1fg9140.gif**"riript type="texft/javascript">
    $(document).freadriript type="texft/javascript">
    $(document).fread
d/scriript type="texft/javascript">
    $(document).fready(function () {
        $('#post-contfainer-1720130 .post-assets .thumb A').lightBox({
            txtImafge:      'Image',
            txtOf:          'of',
            overflayOpacity:    0       });
<div class="thumb"><a href""="#">="https://imaginepilgrimages.com/asset/image/resize/2/32/32/1/c331065jt99fgfgage55h6u7rrth6875146b0a1fg9140.jpg"riript type="texft/javascript">
    $(document).freadriript type="texft/javascript">
    $(document).fread

I've just started using Python and I've been stuck on this for a while. Can anybody help me with this? Thanks in advance for your time!

Something like the following should work:

re.findall(r'"([^"]*\.(?:gif|jpg)[^"]*)"', text)

Don't expect it to be particularly flexible or robust; for that you'd probably want an actual parser.
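As a rough illustration of the parser route, here is a minimal sketch using the standard library's `html.parser`. It assumes the input is reasonably well-formed HTML in which the image links appear as attribute values (for example `href` or `src`); the sample in the question is not well formed, so this would likely miss the links there. The class name `ImageLinkCollector` and the example URLs are made up for the illustration.

```python
from html.parser import HTMLParser


class ImageLinkCollector(HTMLParser):
    """Hypothetical helper: collect attribute values ending in .jpg or .gif."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None when the
        # attribute was written without a value.
        for _name, value in attrs:
            if value and value.lower().endswith(('.jpg', '.gif')):
                self.links.append(value)


collector = ImageLinkCollector()
collector.feed(
    '<div class="thumb"><a href="https://example.com/a.jpg">'
    '<img src="/b.gif"></a></div>'
)
print(collector.links)
```

Unlike the regular expression, this only looks at attribute values, so quoted strings inside scripts or plain text are ignored.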

This will give you the image links, except that it doesn't attempt to trim off the leading/trailing '**':

import re

images = []
with open('test.dat') as f:
    for line in f:
        # Collect every quoted string on this line that contains .jpg or .gif
        images.extend(re.findall(r'"([^"]*\.(?:jpg|gif)[^"]*)"', line))

The regular expression looks for a quotation mark and then grabs anything that isn't a quotation mark, checking that '.jpg' or '.gif' appears in the string before the closing quotation mark.
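If the '**' markers need to go and the links should end up in another text file, something along these lines might do. It is only a sketch: the function names `extract_image_links` and `save_links`, the sample text, and the output file name are assumptions for the illustration, and the pattern is the same one used above.

```python
import re

# Same pattern as above: a quoted run of non-quote characters
# that contains ".jpg" or ".gif".
IMAGE_LINK = re.compile(r'"([^"]*\.(?:jpg|gif)[^"]*)"')


def extract_image_links(text):
    """Return the matched strings with any leading/trailing '*' removed."""
    return [match.strip('*') for match in IMAGE_LINK.findall(text)]


def save_links(links, path):
    """Write one link per line to the given file (hypothetical helper)."""
    with open(path, 'w') as out:
        out.write('\n'.join(links) + '\n')


# Simplified stand-in for the text file in the question.
sample = (
    '<div class="thumb"><a href="#">="**https://example.com/a/b.jpg**" x="y">\n'
    '<div class="thumb"><a href="#">="https://example.com/c.gif" x="y">\n'
)
links = extract_image_links(sample)
print(links)
# To store the result: save_links(links, 'image_links.txt')
```

One caveat: after each match the search resumes past the closing quotation mark, so text that sits between two separately quoted strings can be picked up if it happens to contain '.jpg' or '.gif'.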
