
Exclude a few words from a simple regex in PHP

I'm categorizing a few folders on my drives and want to weed out low-quality files using this regex, which works:

xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP

Some files are in high definition but still have DVD or XviD in their names, alongside 1080p, 720p, 1080i or 720i. I need a single regex that matches the pattern above but excludes filenames containing 1080p, 720p, 1080i or 720i.

Use two regexes.

First, check whether the filename matches:

1080p|720p|1080i|720i

If it doesn't, that is, if no match is found for the above, check for matches against:

xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP
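
As a rough sketch, the two-step check could be wrapped in a small function. The name `isLowQuality` and the sample filenames are made up for illustration, and the `i` flag is an assumption that stands in for the hand-written case variants in the original pattern:

```php
<?php
// Hypothetical helper: returns true when a filename looks low quality.
function isLowQuality(string $filename): bool
{
    // Step 1: an HD marker anywhere in the name wins, so stop early.
    if (preg_match('~1080p|720p|1080i|720i~i', $filename) === 1) {
        return false;
    }
    // Step 2: no HD marker found, so look for the low-quality keywords.
    // Assumption: the "i" flag replaces the explicit XviD/XVID/xvid variants.
    return preg_match('~xvid|divx|480p|320p|dvdscr|pdtv|dvdrip~i', $filename) === 1;
}

var_dump(isLowQuality('Some Movie - XviD.avi'));           // bool(true)
var_dump(isLowQuality('Some Movie - DVDRip - 1080p.mkv')); // bool(false)
```

Keeping the two patterns separate means either list can be extended without touching the other.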

Regular expressions don't support inverse matching. You could use negative look-arounds, but I wouldn't say they're appropriate for this task. If you put a negative lookahead in place to cover all the cases like 1080p-divx, it still doesn't catch divx-10bit-1080p; you can't achieve this with a simple regex.

You can use a negative lookahead for this:

^(?!.*(?:1080p|720p|1080i|720i)).*(?:xvid|divx|480p|320p|DivX|XviD|DIVX|XVID|XViD|DiVX|DVDSCR|PDTV|pdtv|DVDRip|dvdrip|DVDRIP)

This will match on your search strings, but fail if the string also contains 1080p, 720p, 1080i or 720i.
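
A minimal PHP sketch of that lookahead in use follows. The filenames are invented for the example, and the `i` flag is an assumption that replaces the explicit case variants. Because the lookahead is anchored at `^` and scans the whole string with `.*`, the HD marker is rejected whether it comes before or after the keyword:

```php
<?php
// Assumption: the "i" flag replaces the hand-written case variants.
$pattern = '~^(?!.*(?:1080p|720p|1080i|720i)).*(?:xvid|divx|480p|320p|dvdscr|pdtv|dvdrip)~i';

// Hypothetical sample filenames.
$files = [
    'Old Show - PDTV.avi',
    'divx-10bit-1080p.mkv',
    '1080p-divx.mkv',
    'Holiday video.mp4',
];

// preg_grep keeps only the entries that match the pattern;
// array_values renumbers the result from 0.
$lowQuality = array_values(preg_grep($pattern, $files));
print_r($lowQuality);
```

Only `Old Show - PDTV.avi` should be kept: the two `1080p` names fail the lookahead, and the last name contains no keyword.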

You can do it like this:

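<pre><?php
// Sample filenames to test against.
$subjects = array('Arrival of the train at La Ciotat station.avi',
                  'Gardenator II - multi - DVDrip - 720i.mkv',
                  'The adventures of Roberto the bear - divx.avi',
                  'Tokyo’s Ginza District - dvdrip.mkv');

// "excl": the low-quality keywords. "keep": any character that does not start 1080i/p or 720i/p.
// Flags: i = case-insensitive, x = whitespace inside the pattern is ignored.
$pattern = '~(?(DEFINE)(?<excl>(?>d(?>vd(?>rip|scr)|ivx)|pdtv|xvid|320p|480p)))
             (?(DEFINE)(?<keep>(?>[^17]+?|1(?!080[ip])|7(?!20[ip]))))
             ^\g<keep>*\g<excl>\g<keep>*$  ~ix';

// Print only the filenames that contain a keyword and no HD marker.
foreach ($subjects as $subject) {
    if (preg_match($pattern, $subject)) {
        echo $subject . "\n";
    }
}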

The main point is to avoid testing a lookahead on each character.
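
For contrast, here is a sketch of what a per-character lookahead might look like. This is one reading of the remark above, not part of the original answer, and the filenames are invented. In this version `(?!1080[ip]|720[ip])` is evaluated before every single character, whereas the pattern above only looks ahead when it meets a `1` or a `7`:

```php
<?php
// Contrast pattern: the negative lookahead runs before each character consumed by ".".
$perChar = '~^(?:(?!1080[ip]|720[ip]).)*?'
         . '(?:dvdrip|dvdscr|divx|pdtv|xvid|320p|480p)'
         . '(?:(?!1080[ip]|720[ip]).)*$~i';

// Hypothetical sample filenames.
$subjects = [
    'Clip A - divx.avi',
    'Clip B - DVDrip - 720i.mkv',
    'Clip C.avi',
];

// Keep the names that contain a keyword and no HD marker.
$matches = array_values(preg_grep($perChar, $subjects));
print_r($matches);
```

Both styles should accept and reject the same filenames; the difference is only in how much work the regex engine does per character.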
