Regex: Match all occurrences between specified strings
I'm dealing with a bunch of text files that refer to image filenames. These filenames were sanitized (made lowercase, with whitespace replaced by hyphens), but the text referring to them was not.
I need to transform strings like this:
(image: uploaded IMAGE.jpg caption: this is my caption)
(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087.png caption: this is my caption)
(image: IMG_6087 copy.gif)
(image: IMG_9999_copy.jpg)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)
to:
(image: uploaded-image.jpg caption: this is my caption)
(image: uploaded-image-copy.jpeg caption: this is my caption)
(image: img_6087.png caption: this is my caption)
(image: img_6087-copy.gif)
(image: img_9999_copy.jpg)
(image: somehow-a-comma.jpg)
(image: other-ridiculous-characters.jpg)
These strings are parts of larger blocks of text, but are all on their own lines, like so:
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.
(image: manhattan photo.jpg)
Drive till sunset and say goodbye to your body, because this is not a photograph. I saw sixteen americans, raised by wolves, probably lost in paradise city. I found your head — Do you still want it?
I'm using Sublime Text and was planning on doing multiple Replace Alls.
But I can't manage to capture all instances of something between the two delimiters:
(?<=^\(image: )[what do I do here??](?=\.jpe?g|png|gif)
You can use the non-greedy match-all .*?, so
^\(image: (.*?\.(?:jpe?g|png|gif))
captures the filename including its extension.
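To try the non-greedy pattern outside the editor, here is a minimal sketch in Python (a swapped-in language, since the question itself only concerns Sublime Text). The helper name find_image_names and the sample text are made up for illustration, and re.MULTILINE is assumed so that ^ matches at the start of every line.

```python
import re

# Group 1 is the filename with its extension. The lazy .*? stops at the
# first "." that is followed by one of the known extensions.
IMAGE_RE = re.compile(r"^\(image: (.*?\.(?:jpe?g|png|gif))", re.MULTILINE)


def find_image_names(text):
    """Return every filename that follows '(image: ' at the start of a line."""
    return IMAGE_RE.findall(text)


# Hypothetical sample input, modeled on the strings in the question.
sample = (
    "The artists in Manhattan.\n"
    "(image: manhattan photo.jpg)\n"
    "(image: uploaded IMAGE copy.jpeg caption: this is my caption)\n"
)
print(find_image_names(sample))
```

Because the match is lazy, the caption text after the extension is left out of the capture.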
You can grab the filenames with:
(?<=image:\s)([^.]++)(?=\.jpe?g|\.png|\.gif)
After that, the transformations depend on the language that you're working in. Add file extensions as you need them. Right now you support jpg, jpeg, png, and gif.
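As a rough illustration of adding extensions as needed, the sketch below builds the lookaround pattern from a list in Python. The names EXTENSIONS and build_pattern are hypothetical, and the possessive [^.]++ is swapped for a plain [^.]+ because Python's re module only accepts possessive quantifiers from version 3.11 onward.

```python
import re

# Hypothetical list of supported extensions; extend it as needed.
EXTENSIONS = ["jpg", "jpeg", "png", "gif"]


def build_pattern(extensions):
    """Compile a lookaround pattern that matches only the base filename."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # The lookbehind requires "image:" plus one whitespace character;
    # the lookahead requires a dot followed by a listed extension.
    return re.compile(r"(?<=image:\s)[^.]+(?=\.(?:" + alternatives + r"))")


pattern = build_pattern(EXTENSIONS)
print(pattern.findall("(image: IMG_6087 copy.gif)\n(image: somehow, a comma.jpg)"))
```

Since both ends are lookarounds, each match is just the base name, which is convenient when only that part has to be rewritten. Note that [^.]+ assumes the base name itself contains no dots.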
Here is a working way to do it in PHP:
<?php
$string =
"This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.
(image: uploaded IMAGE.jpg caption: this is my caption)
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.
(image: uploaded IMAGE copy.jpeg caption: this is my caption)
(image: IMG_6087.png caption: this is my caption)
(image: IMG_6087 copy.gif) blah blah
(image: IMG_9999_copy.jpg)
(image: somehow, a comma.jpg)
(image: other ridic'ulous characters!.jpg)";
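// Find each filename after "(image: "; $matches[1] is the base name, $matches[2] the extension.
echo preg_replace_callback('~(?<=\(image: )(.*?)\.(jpg|jpeg|png|gif)~', function($matches)
{
// Lowercase the base name, then replace every non-word character with a hyphen.
return preg_replace('~\W~', '-', stripslashes(strtolower($matches[1]))) . ".$matches[2]";
}, $string);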
?>
[EDIT] Regex explanation:
(?<=\(image: ) : a positive lookbehind. It checks that '(image: ' precedes the match without making it part of the match.
(.*?) : captures everything before the image extension in a non-greedy (lazy) way, so it matches as little text as possible.
\.(jpg|jpeg|png|gif) : matches a literal . followed by one of the given extensions, and captures the extension for reuse.
~ : the delimiter. It was chosen only because it is very seldom used in strings, so there is no need to escape / with \.
\W : the opposite of \w. It matches any non-word character, that is, anything other than a letter, a digit, or an underscore.
Will output (in view source):
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.
(image: uploaded-image.jpg caption: this is my caption)
This is not a short guide to write about art. Go in, out of the window, inside New York’s stars qualities, dreams and schemes. People are gathered together, brewing coffee — you have seen their faces? The artists in Manhattan.
(image: uploaded-image-copy.jpeg caption: this is my caption)
(image: img_6087.png caption: this is my caption)
(image: img_6087-copy.gif) blah blah
(image: img_9999_copy.jpg)
(image: somehow--a-comma.jpg)
(image: other-ridic-ulous-characters-.jpg)
You can then fine-tune in the callback which characters are transformed into what, with str_replace() for instance.
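One possible fine-tuning, sketched here in Python rather than PHP, is to drop punctuation instead of hyphenating it and to collapse each run of whitespace into a single hyphen. The cleanup rules and the names slugify and fix_references are assumptions chosen to reproduce the output asked for in the question, not part of the answer above.

```python
import re

# Same idea as the PHP pattern: group 1 is the base name, group 2 the extension.
IMAGE_RE = re.compile(r"(?<=\(image: )(.*?)\.(jpe?g|png|gif)")


def slugify(name):
    """Lowercase a base filename and hyphenate it (assumed cleanup rules)."""
    name = name.lower()
    name = re.sub(r"[^\w\s-]", "", name)  # drop punctuation such as ' , !
    name = re.sub(r"\s+", "-", name)      # each whitespace run becomes one hyphen
    return name.strip("-")


def fix_references(text):
    """Rewrite every '(image: ...)' filename in text to its sanitized form."""
    return IMAGE_RE.sub(lambda m: slugify(m.group(1)) + "." + m.group(2), text)


print(fix_references("(image: other ridic'ulous characters!.jpg)"))
print(fix_references("(image: somehow, a comma.jpg)"))
```

Underscores survive because \w treats them as word characters, so IMG_9999_copy.jpg becomes img_9999_copy.jpg.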
Hope it helps! ;)
You could try JetBrains WebStorm, a front-end IDE that provides many capabilities for performing regex operations in a readable way. Select the text you want to split and check it for delimiters or any whitespace.
It is available as a 30-day trial version. I will also share the regex query shortly.
Also check out http://myregexp.com/ or a plugin to validate your regex queries.