Regex: match all occurrences between specified strings

Question

I'm dealing with a bunch of text files that refer to image filenames. These filenames were sanitized (made lowercase, with whitespace replaced by hyphens), but the text referring to them was not.

I need to transform strings like this:

(image: uploaded IMAGE.jpg caption: this is my caption)
(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087.png caption: this is my caption)
(image: IMG_6087 copy.gif)
(image: IMG_9999_copy.jpg)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)

to:

(image: uploaded-image.jpg caption: this is my caption)
(image: uploaded-image-copy.jpeg caption: this is my caption)
(image: img_6087.png caption: this is my caption)
(image: img_6087-copy.gif)
(image: img_9999_copy.jpg)
(image: somehow-a-comma.jpg)
(image: other-ridiculous-characters.jpg)

These strings are parts of larger blocks of text, but are all on their own lines, like so:

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: manhattan photo.jpg)

Drive till sunset and say goodbye to your body, because this is not a photograph. I saw sixteen americans, raised by wolves, probably lost in paradise city. I found your head — Do you still want it?

I'm using Sublime Text and was planning on doing multiple Replace Alls:

strip whitespace (replace it with hyphens)
strip characters that are not alphanumeric or _ or -
make lowercase

But I can't manage to capture all instances of something between the two delimiters.

(?<=^\(image: )[what do I do here??](?=\.jpe?g|png|gif)

Answer 1

You can use a non-greedy match-all: .*?

So ^\(image: (.*?\.(?:jpe?g|png|gif)) captures the filename including the extension.
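
To see why the non-greedy form matters, here is a minimal Python sketch (not part of the original answer). It uses the answer's pattern with the non-capturing group written as (?:...), next to a hypothetical greedy variant added only for comparison. The sample text and variable names are assumptions made up for illustration.

```python
import re

# The answer's pattern; re.MULTILINE lets ^ match at the start of every line.
LAZY = re.compile(r"^\(image: (.*?\.(?:jpe?g|png|gif))", re.MULTILINE)
# Hypothetical greedy variant, shown only to contrast the behaviour.
GREEDY = re.compile(r"^\(image: (.*\.(?:jpe?g|png|gif))", re.MULTILINE)

# Made-up sample: the first caption happens to mention another filename.
text = (
    "Some paragraph text.\n"
    "\n"
    "(image: uploaded IMAGE.jpg caption: compare with old.png)\n"
    "(image: IMG_6087 copy.gif)\n"
)

lazy_names = LAZY.findall(text)
greedy_names = GREEDY.findall(text)

print(lazy_names)    # stops at the first extension on each line
print(greedy_names)  # runs on to the last extension on each line
```

The lazy version stops at the first supported extension it meets, so a filename that itself contains something like ".png" in the middle would be cut short. For the filenames in the question this does not come up.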

Answer 2

You can grab the filenames with:

(?<=image:\s)([^.]++)(?=\.jpe?g|\.png|\.gif)

After that, the transformations depend on the language that you're working in. Add file extensions as you need them. Right now it supports jpg, jpeg, png, and gif.
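
As one possible illustration of the extraction step, here is a small Python sketch. It assumes a plain + in place of the possessive ++, since Python's re module only accepts possessive quantifiers from version 3.11 on. Here the two match the same text, because the lookahead requires a dot immediately after the name. The sample lines and variable names are made up for the example.

```python
import re

# The answer's pattern, with "+" instead of the possessive "++" (assumption:
# the code should also run on Python versions older than 3.11).
NAME = re.compile(r"(?<=image:\s)([^.]+)(?=\.jpe?g|\.png|\.gif)")

lines = [
    "(image: uploaded IMAGE.jpg caption: this is my caption)",
    "(image: IMG_6087 copy.gif)",
    "(image: somehow, a comma.jpg)",
    "(image: notes.txt)",  # unsupported extension, so no match
]

# Keep the captured name (without extension) for every line that matches.
names = [m.group(1) for line in lines if (m := NAME.search(line))]
print(names)
```

Because the name is matched with [^.], a filename that contains a dot before its extension would not match at all. That is worth keeping in mind if such names exist in the files.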

Answer 3

Here is a working way to do it in PHP:

<?php
$string =
"This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded IMAGE.jpg caption: this is my caption)
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087.png caption: this is my caption)
(image: IMG_6087 copy.gif) blah blah
(image: IMG_9999_copy.jpg)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)";

echo preg_replace_callback('~(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)~', function($matches)
{
    return preg_replace('~\W~', '-', stripslashes(strtolower($matches[1]))) . ".$matches[2]";
}, $string);

?>

[EDIT] Regex explanation:

(?<=\(image: ) : is a positive lookbehind, so it checks for the presence of '(image: ' without capturing it.
(.*?) : captures everything before the image extension in a non-greedy (lazy) way, so it matches as little text as possible.
\.(jpg|jpeg|png|gif) : matches a literal . followed by one of the given extensions, and captures the extension for reuse.
~ : is the delimiter, chosen only because it is very seldom used in strings, so there is no need to escape / with \
\W : is the opposite of \w and matches any character that is not a letter, digit, or underscore.

Will output (in view source):

This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded-image.jpg caption: this is my caption)
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.

(image: uploaded-image-copy.jpeg caption: this is my caption)
(image: img_6087.png caption: this is my caption)
(image: img_6087-copy.gif) blah blah
(image: img_9999_copy.jpg)
(image: somehow--a-comma.jpg)
(image: other-ridic-ulous-characters-.jpg)

You can then fine-tune in the callback which characters get transformed into what, with str_replace() for instance.

Hope it helps! ;)
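
As a rough sketch of what such fine-tuning could look like, here is the same callback idea in Python rather than PHP. It is not the answer's code. The names IMAGE, sanitize and fix_references are hypothetical, and the rules (collapse whitespace into one hyphen, then drop anything that is not a-z, 0-9, _ or -) are an assumption based on the output the question asks for.

```python
import re

# Same shape as the PHP pattern: lookbehind for "(image: ", lazy name, extension.
IMAGE = re.compile(r"(?<=\(image: )(.*?)\.(jpe?g|png|gif)")

def sanitize(match):
    """Rewrite one matched filename; the extension is kept unchanged."""
    name = match.group(1).lower()
    name = re.sub(r"\s+", "-", name)         # whitespace runs become one hyphen
    name = re.sub(r"[^a-z0-9_-]", "", name)  # drop every other character
    return f"{name}.{match.group(2)}"

def fix_references(text):
    """Apply sanitize() to every image reference found in text."""
    return IMAGE.sub(sanitize, text)

sample = (
    "(image: uploaded IMAGE copy.jpeg caption: this is my caption)\n"
    "(image: somehow, a comma.jpg)\n"
    "(image: other ridic'ulous characters!.jpg)"
)
print(fix_references(sample))
```

Replacing whitespace first and deleting the remaining characters afterwards is what avoids the doubled and trailing hyphens seen in the output above.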

Answer 4

Can you try the JetBrains WebStorm front-end IDE? It provides a lot of capabilities for performing regex operations in a readable way. Select the text you want to split and check it for delimiters or any whitespace.

You can get a 30-day trial version. I will also share the regex query shortly.

Also check out http://myregexp.com/ or a plugin to validate your regex queries.

Online Regex editor

Answer 1
0 2016-07-08 02:07:02

Answer 2
0 2016-07-08 03:29:09

Answer 3
0 2016-07-08 03:45:05

Answer 4
-1 2016-07-08 02:07:41