
How to exclude a symbol within [ ] with RegEx

I am using PHP preg_match_all, and this is what I have so far:

[A-Za-z+\W]+\s[\d]

The only problem is that I need the \W to not match a double quote (").

So I have tried:

[A-Za-z+[^\dA-Za-z"]\s?]+\s[\d]


[A-Za-z+]\s?+[^A-Za-z\d"]?\s[\d]

among other things, and it just fails. I really can't figure out why.

EDIT:

Here is the entire RegEx:

([A-Z][a-z]+\s){1,5}\s?[^a-zA-Z\d\s:,.\'\"]\s?
[A-Za-z+\W]+\s[\d]{1,2}\s[A-Z][a-z]+\s[\d]{4}

I split it into two lines; the second line begins with what I posted above.

Lines I am trying to match:

    India – Adulterated Tea Powder Seized 18 April 2011
    India – Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India – Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India – Adulteration Found in Edible Oils 28 April 2011
    India – Viral Disease Affects Chili Crop in Goa 28 April 2011
NOT ---->   Chili – India: Goa”. 8 April 2011
    Ivory Coast – Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan – Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar – Toxic Sardines 14 April 2011
    Madagascar – Update: Toxic Sardines 26 April 2011

The pattern you are showing matches all letters and all non-word characters. The only characters it leaves out are digits and the underscore, and you also want to exclude the double quote, so a negated class is simpler:

[^\d\"_]+\s\d
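
As a minimal sketch of the difference (the two sample strings and the variable names are made up for illustration, and the patterns are anchored with ^ only to keep the check simple), the original class lets a double quote through via \W, while the negated class does not:

```php
<?php
// Hypothetical sample lines, made up for illustration.
$good = 'Toxic Sardines 14 April 2011';
$bad  = 'India: Goa". 8 April 2011';

// Original class: \W matches any non-word character, including ".
$original = '/^[A-Za-z+\W]+\s\d/';
// Negated class: anything except digits, double quotes, and underscores.
$negated  = '/^[^\d"_]+\s\d/';

foreach (['good' => $good, 'bad' => $bad] as $label => $line) {
    printf(
        "%s: original=%d negated=%d\n",
        $label,
        preg_match($original, $line),
        preg_match($negated, $line)
    );
}
```

The quoted line should pass the original class but fail the negated one, while the clean line passes both.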

Edit:

I could be wrong, but from the sample input it appears you are just trying to match all lines that don't contain a double quote. If so, something like this is much easier, and I've even grouped the date separately from the rest of the string. If you don't need to group the string and the date, just remove all the parentheses.

<?php
error_reporting(E_ALL);
$str = "    India - Adulterated Tea Powder Seized 18 April 2011
    India - Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India - Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India - Adulteration Found in Edible Oils 28 April 2011
    India - Viral Disease Affects Chili Crop in Goa 28 April 2011
    Chili - India: Goa\". 8 April 2011
    Ivory Coast - Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan - Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar - Toxic Sardines 14 April 2011
    Madagascar - Update: Toxic Sardines 26 April 2011";
preg_match_all("/^([^\"]+?)(\d?\d\s[a-z]+\s\d{4})$/im", $str, $m);
echo '<pre>'.print_r($m, true).'</pre>';
?>

If you know that every line is either acceptable or contains a " (and is therefore unacceptable), then [^\"]+ should be fine.
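
One way to apply that idea line by line, as a sketch under the assumption that the input is a newline-separated string (the sample text and variable names here are hypothetical):

```php
<?php
// Hypothetical input: one entry per line, one of them containing a double quote.
$text = "Madagascar - Toxic Sardines 14 April 2011\n"
      . "Chili - India: Goa\". 8 April 2011\n"
      . "Japan - Sanuki Kanzume Co. 27 April 2011";

// Split on any newline sequence, then keep only the lines that consist
// entirely of characters other than a double quote.
$lines = preg_split('/\R/', $text);
$kept  = array_values(preg_grep('/^[^"]+$/', $lines));

print_r($kept);
```

preg_grep preserves the original array keys, so array_values is used here to reindex the result.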

Try this:

[A-Za-z+\W^\"]+\s[\d]
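
One caveat worth checking before relying on this: in PCRE a caret negates a character class only when it is the first character after the opening bracket; anywhere else it is a literal ^. The short sketch below (test strings are made up for illustration) suggests that a class of this shape still accepts a double quote, since \W already matches it:

```php
<?php
// A caret in the middle of a class is a literal character...
$literalCaret = preg_match('/^[a^]+$/', '^');
// ...while a caret at the start negates the class.
$negatedClass = preg_match('/^[^a]+$/', 'a');

// The class suggested above, anchored so the whole string must match.
$quoteAllowed = preg_match('/^[A-Za-z+\W^\"]+$/', 'Goa".');

var_dump($literalCaret, $negatedClass, $quoteAllowed);
```

To exclude the quote, the caret has to come first and the class has to list what is forbidden, as in [^\d\"_].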

Maybe I'm missing something here. With your own text and pattern, if I run this code:

$str = "India – Adulterated Tea Powder Seized 18 April 2011
    India – Importer of Haldiram’s Petha Sweet Cubes Issuing Voluntary Recall 26 April 2011
    India – Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    India – Adulteration Found in Edible Oils 28 April 2011
    India – Viral Disease Affects Chili Crop in Goa 28 April 2011
    Chili – India: Goa”. 8 April 2011
    Ivory Coast – Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    Japan – Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    Madagascar – Toxic Sardines 14 April 2011
    Madagascar – Update: Toxic Sardines 26 April 2011";
if(preg_match_all('~(?:[A-Z][a-z]+\s){1,5}\s?[^a-zA-Z\d\s:,.\'\"]\s?[A-Za-z+\W]+\s[\d]{1,2}\s[A-Z][a-z]+\s[\d]{4}~', $str, $m)) {
   print_r($m[0]);
}

The output is:

Array
(
    [0] => India – Adulterated Tea Powder Seized 18 April 2011
    [1] => India – Undeclared Gluten Found in Sweets by Canadian Authorities 27 April 2011
    [2] => India – Adulteration Found in Edible Oils 28 April 2011
    [3] => India – Viral Disease Affects Chili Crop in Goa 28 April 2011
    [4] => Ivory Coast – Potential Cocoa Quality Decline despite Sufficient Surplus 11 April 2011
    [5] => Japan – Sanuki Kanzume Co. and Failure to Comply with FDA Standards 27 April 2011
    [6] => Madagascar – Toxic Sardines 14 April 2011
    [7] => Madagascar – Update: Toxic Sardines 26 April 2011
)

And you can see that the line with Goa" doesn't appear in the output. Isn't that the behavior you wanted?
