简体   繁体   English

PHP-可运行的RegEx无法正常工作?

[英]PHP - Working RegEx is… not working?

I have a regular expression on line 6. I have tested it, and confirmed it works. 我在第6行有一个正则表达式。我已经对其进行了测试,并确认它可以工作。 Now, I have a very large array. 现在,我的阵列很大。 Though I have tested the regular expression on smaller arrays, it does not appear to work on the larger one! 尽管我已经在较小的数组上测试了正则表达式,但它似乎不适用于较大的数组! What gives? 是什么赋予了? Basically, the script below is intended to remove array entries that contain uncommon characters. 基本上,以下脚本旨在删除包含不常见字符的数组条目。 Look at what happens when you try both arrays--the first one successfully removes the odd array entry, but the larger one does not! 看看同时尝试两个数组会发生什么—第一个成功删除了奇数数组条目,但是较大的数组却没有!

See here: http://pastebin.com/raw.php?i=KpNZHbrv 看到这里: http : //pastebin.com/raw.php?i=KpNZHbrv

$sentences_working = array(
    'Egypt, officially the Arab Republic of Egypt, is a country mainly in North Africa, with the Sinai Peninsula forming a land bridge in Southwest Asia.',
    'Egypt is thus a transcontinental country, and a major power in Africa, the Mediterranean Basin, the Middle East and the Muslim world.',
    'The English name Egypt was borrowed from Middle French Egypte, from Latin, from ancient Greek Aígyptos, from earlier Linear B 𐁁𐀓𐀠𐀴𐀍 a-ku-pi-ti-yo.',
    'In later years, the dynasty became a British puppet.',
);

$sentences_notworking = unserialize(urldecode("a%3A30%3A%7Bi%3A0%3Bs%3A160%3A%22Egypt++%2C+officially+the+Arab+Republic+of+Egypt%2C+Arabic%3A+%2C+is+a+country+mainly+in+North+Africa%2C+with+the+Sinai+Peninsula+forming+a+land+bridge+in+Southwest+Asia.%22%3Bi%3A1%3Bs%3A133%3A%22Egypt+is+thus+a+transcontinental+country%2C+and+a+major+power+in+Africa%2C+the+Mediterranean+Basin%2C+the+Middle+East+and+the+Muslim+world.%22%3Bi%3A2%3Bs%3A223%3A%22Covering+an+area+of+about+1%2C010%2C000+square+kilometers+%2C+Egypt+is+bordered+by+the+Mediterranean+Sea+to+the+north%2C+the+Gaza+Strip+and+Israel+to+the+northeast%2C+the+Red+Sea+to+the+east%2C+Sudan+to+the+south+and+Libya+to+the+west.%22%3Bi%3A3%3Bs%3A74%3A%22Egypt+is+one+of+the+most+populous+countries+in+Africa+and+the+Middle+East.%22%3Bi%3A4%3Bs%3A171%3A%22The+great+majority+of+its+over+81+million+people+live+near+the+banks+of+the+Nile+River%2C+in+an+area+of+about+40%2C000+square+kilometers+%2C+where+the+only+arable+land+is+found.%22%3Bi%3A5%3Bs%3A60%3A%22The+large+areas+of+the+Sahara+Desert+are+sparsely+inhabited.%22%3Bi%3A6%3Bs%3A177%3A%22About+half+of+Egypt%27s+residents+live+in+urban+areas%2C+with+most+spread+across+the+densely+populated+centres+of+greater+Cairo%2C+Alexandria+and+other+major+cities+in+the+Nile+Delta.%22%3Bi%3A7%3Bs%3A118%3A%22Monuments+in+Egypt+such+as+the+Giza+pyramid+complex+and+its+Great+Sphinx+were+constructed+by+its+ancient+civilization.%22%3Bi%3A8%3Bs%3A155%3A%22Its+ancient+ruins%2C+such+as+those+of+Memphis%2C+Thebes%2C+and+Karnak+and+the+Valley+of+the+Kings+outside+Luxor%2C+are+a+significant+focus+of+archaeological+study.%22%3Bi%3A9%3Bs%3A83%3A%22The+tourism+industry+and+the+Red+Sea+Riviera+employ+about+12%25+of+Egypt%27s+workforce.%22%3Bi%3A10%3Bs%3A170%3A%22The+economy+of+Egypt+is+one+of+the+most+diversified+in+the+Middle+East%2C+with+sectors+such+as+tourism%2C+agriculture%2C+industry+and+service+at+almost+equal+production+levels.%22%3Bi%3A11%3Bs%3A133%3A%22In+early+2011%2C+Egypt+underwent+a+revolution%2C+which+resulted+in+the+ousting+of+President+Hosni+Mubarak+after+nearly+30+years+in+power.%22%3Bi%3A12%3Bs%3A50%3A%22Presidential+elections+are+scheduled+for+May+2012.%22%3Bi%3A13%3Bs%3A190%3A%22The+English+name+Egypt+was+borrowed+from+Middle+French+Egypte%2C+from+Latin+%2C+from+ancient+Greek+A%26iacute%3Bgyptos+%2C+from+earlier+Linear+B+%26%2365601%3B%26%2365555%3B%26%2365568%3B%26%2365588%3B%26%2365549%3B+a-ku-pi-ti-yo.%22%3Bi%3A14%3Bs%3A277%3A%22The+adjective+aig%26yacute%3Bpti-%2C+aig%26yacute%3Bptios+was+borrowed+into+Coptic+as+%26%2311397%3B%26%2311433%3B%26%2311425%3B%26%231007%3B%26%2311411%3B%26%2311423%3B%26%2311429%3B%2F%26%2311413%3B%26%2311433%3B%26%2311425%3B%26%231007%3B%26%2311411%3B%26%2311423%3B%26%2311429%3B+gyptios%2C+kyptios%2C+and+from+there+into+Arabic+as+%2C+back+formed+into+%2C+whence+English+Copt.%22%3Bi%3A15%3Bs%3A209%3A%22The+Greek+forms+were+borrowed+from+Late+Egyptian++Hikuptah+%22Memphis%22%2C+a+corruption+of+the+earlier+Egyptian+name+Hwt-ka-Ptah+%2C+meaning+%22home+of+the+ka++of+Ptah%22%2C+the+name+of+a+temple+to+the+god+Ptah+at+Memphis.%22%3Bi%3A16%3Bs%3A130%3A%22Strabo+attributed+the+word+to+a+folk+etymology+in+which+A%26iacute%3Bgyptos++evolved+as+a+compound+from++%2C+meaning+%22below+the+Aegean%22.%22%3Bi%3A17%3Bs%3A288%3A%22%2C+the+Arabic+and+modern+official+name+of+Egypt+%2C+is+of+Semitic+origin%2C+directly+cognate+with+other+Semitic+words+for+Egypt+such+as+the+Hebrew+%26lrm%3B+%2C+literally+meaning+%22the+two+straits%22+.+The+word+originally+connoted+%22metropolis%22+or+%22civilization%22+and+means+%22country%22%2C+or+%22frontier-land%22.%22%3Bi%3A18%3Bs%3A232%3A%22The+ancient+Egyptian+name+of+the+country+is+Kemet++%5B%26%2378222%3B%26%2378163%3B%26%2378799%3B%26%2378486%3B%5D%2C+which+means+%22black+land%22%2C+referring+to+the+fertile+black+soils+of+the+Nile+flood+plains%2C+distinct+from+the+deshret+%2C+or+%22red+land%22+of+the+desert.%22%3Bi%3A19%3Bs%3A152%3A%22The+name+is+realized+as++and++in+the+Coptic+stage+of+the+Egyptian+language%2C+and+appeared+in+early+Greek+as++.+Another+name+was++%22land+of+the+riverbank%22.%22%3Bi%3A20%3Bs%3A105%3A%22The+names+of+Upper+and+Lower+Egypt+were+Ta-Sheme%27aw++%22sedgeland%22+and+Ta-Mehew++%22northland%22%2C+respectively.%22%3Bi%3A21%3Bs%3A79%3A%22There+is+evidence+of+rock+carvings+along+the+Nile+terraces+and+in+desert+oases.%22%3Bi%3A22%3Bs%3A103%3A%22In+the+10th+millennium+BC%2C+a+culture+of+hunter-gatherers+and+fishers+replaced+a+grain-grinding+culture.%22%3Bi%3A23%3Bs%3A117%3A%22Climate+changes+and%2For+overgrazing+around+8000+BC+began+to+desiccate+the+pastoral+lands+of+Egypt%2C+forming+the+Sahara.%22%3Bi%3A24%3Bs%3A129%3A%22Early+tribal+peoples+migrated+to+the+Nile+River+where+they+developed+a+settled+agricultural+economy+and+more+centralized+society.%22%3Bi%3A25%3Bs%3A63%3A%22By+about+6000+BC+a+Neolithic+culture+rooted+in+the+Nile+Valley.%22%3Bi%3A26%3Bs%3A104%3A%22During+the+Neolithic+era%2C+several+predynastic+cultures+developed+independently+in+Upper+and+Lower+Egypt.%22%3Bi%3A27%3Bs%3A108%3A%22The+Badarian+culture+and+the+successor+Naqada+series+are+generally+regarded+as+precursors+to+dynastic+Egypt.%22%3Bi%3A28%3Bs%3A100%3A%22The+earliest+known+Lower+Egyptian+site%2C+Merimda%2C+predates+the+Badarian+by+about+seven+hundred+years.%22%3Bi%3A29%3Bs%3A198%3A%22Contemporaneous+Lower+Egyptian+communities+coexisted+with+their+southern+counterparts+for+more+than+two+thousand+years%2C+remaining+culturally+distinct%2C+but+maintaining+frequent+contact+through+trade.%22%3B%7D"));

$sentences = $sentences_working; // change here for testing
foreach ($sentences as $sentence_key => $sentence)
{
    if (preg_match('/[^\x20-\x7E]/', $sentence))
    {
        unset($sentences[$sentence_key]);
    }
}

echo "<pre>";
print_r($sentences);
echo "</pre>";

To test both arrays (the smaller and larger) simply change line 12. What's going on? 要测试两个数组(较小和较大),只需更改第12行。发生了什么?

The long string you provided is not only urlencoded, php serialized, but also has it's entities encoded. 您提供的长字符串不仅经过urlencoded,php序列化,而且还对其实体进行了编码。

html_entity_decode should fix it, but I would also think a bit about whether a triple encoding is really what you want. html_entity_decode应该修复它,但我还要考虑一下三重编码是否真的是您想要的。

Edit : Just saw your comment: 编辑 :刚刚看到您的评论:

I have a feeling the character codes are within the allowed range for some reason. 我出于某种原因感到字符代码在允许的范围内。

A few things here.. it is so very easy to find out.. just do a hexdump. 这里有几件事..非常容易发现..只是做一个十六进制转储。 However, you would have spotted this even earlier if you ran this in a terminal. 但是,如果您在终端中运行它,您甚至会更早发现它。 If you're gonna check out the output of a PHP script where encoding is relevant.. at least check 'view source' rather than your browser screen. 如果您要检查与编码相关的PHP脚本的输出,请至少检查“查看源代码”而不是浏览器屏幕。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM