
URL-encoded regex parsing issue in PHP

So, I've URL-encoded a Word doc and am trying to parse certain fields out of it, which is a pain. There are some "unexpected" results, but everything is running great except for this one-off case.

Here is an example of the output from Word for 99.8% of the results:

%13+FORMTEXT+%01%14wes%15

Normally, the regex I set up grabs all the fields exactly as I need, as it does for the example above. But the example below is a weird one: I'm trying to parse out "wes" from it.

%13+FORMTEXT+%01%15%86%15%9A%9C%9E%A0%F2%F4%0A%1A%1C%1E+468%3A%3C%3E%40TVXZ%5C%15%60bvxz%FC%F0%E0%14%D4%C1%06%14wes%15

Mind you, this is one big string, so it would continue on in this fashion:

%13+FORMTEXT+%01%15%86%15%9A%9C%9E%A0%F2%F4%0A%1A%1C%1E+468%3A%3C%3E%40TVXZ%5C%15%60bvxz%FC%F0%E0%14%D4%C1%06%14wes%15%13+FORMTEXT+%01%14wess%15

Notice the huge gap between %01 and %14, then the text between %14 and %15. Usually %01 and %14 are side by side; in this case there is nonsense between them, and lots of it (it has been shortened for this example).

Cheers, Wes
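One possible way to handle both the normal and the "weird" case is sketched below. It is not the regex from the original post (which was never shown) and rests on a few assumptions: every field starts with `%13+FORMTEXT+%01`, the junk between `%01` and the value never contains `%13`, and the wanted value is the text between the last `%14` of a field and the `%15` right after it. The function name `extractFormTextValues` is made up for illustration.

```php
<?php
// Hypothetical helper, not the original poster's code.
// Returns the decoded value of every FORMTEXT field in a URL-encoded string.
function extractFormTextValues(string $encoded): array
{
    // %13+FORMTEXT+%01      field start
    // (?:(?!%13).)*         greedy skip that never crosses into the next field,
    //                       so backtracking lands on the LAST %14 of this field
    // %14(...)%15           value: text with no %13/%14/%15 inside it
    $pattern = '/%13\+FORMTEXT\+%01(?:(?!%13).)*%14((?:(?!%1[345]).)*)%15/s';

    preg_match_all($pattern, $encoded, $matches);

    // Values are still URL-encoded (e.g. "+" for a space), so decode them.
    return array_map('urldecode', $matches[1]);
}

// The shortened sample string from the question.
$sample = '%13+FORMTEXT+%01%15%86%15%9A%9C%9E%A0%F2%F4%0A%1A%1C%1E+468%3A%3C%3E%40TVXZ%5C%15%60bvxz%FC%F0%E0%14%D4%C1%06%14wes%15'
        . '%13+FORMTEXT+%01%14wess%15';

print_r(extractFormTextValues($sample)); // wes, wess
```

The greedy skip is the key choice here: a lazy `.*?` would stop at the first `%14` inside the junk, whereas the greedy version backs up from the end of the field and so picks the last `%14` before the closing `%15`. If real documents contain `%13` inside the junk, this sketch would need a different field delimiter.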

I went another route: converted the doc to docx/OOXML and ran the regex on the XML.
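The regex used on the XML was not posted, so the following is only a rough sketch of what that route might look like. It assumes the usual WordprocessingML field layout (a `w:fldChar` with `w:fldCharType="begin"`, a `w:instrText` containing `FORMTEXT`, then `separate`, the result runs in `w:t`, and `end`), that the field code is not split across several runs, and that fields are not nested. The function name `extractFormTextFromXml` is hypothetical.

```php
<?php
// Hypothetical helper: pulls FORMTEXT results out of a WordprocessingML string.
function extractFormTextFromXml(string $xml): array
{
    // One field: "begin" marker, FORMTEXT field code, "separate" marker,
    // then the result (captured) up to the "end" marker.
    $field = '/<w:fldChar\b[^>]*w:fldCharType="begin".*?'
           . '<w:instrText[^>]*>\s*FORMTEXT\s*<\/w:instrText>.*?'
           . 'w:fldCharType="separate"(.*?)w:fldCharType="end"/s';
    preg_match_all($field, $xml, $fields);

    $values = [];
    foreach ($fields[1] as $result) {
        // Collect the text of every <w:t> run (but not <w:tab/>, <w:tbl>, ...).
        preg_match_all('/<w:t(?:\s[^>]*)?>(.*?)<\/w:t>/s', $result, $runs);
        $text = implode('', $runs[1]);
        // Undo XML escaping such as &amp; and &lt;.
        $values[] = html_entity_decode($text, ENT_QUOTES | ENT_XML1, 'UTF-8');
    }
    return $values;
}

// Minimal hand-written fragment for illustration, not real Word output.
$xml = '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
     . '<w:r><w:instrText xml:space="preserve"> FORMTEXT </w:instrText></w:r>'
     . '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
     . '<w:r><w:t>wes</w:t></w:r>'
     . '<w:r><w:fldChar w:fldCharType="end"/></w:r>';

print_r(extractFormTextFromXml($xml)); // wes
```

For a real .docx file, the XML string could be read with `ZipArchive::getFromName('word/document.xml')`. An XML parser would likely be more robust than regex for anything beyond simple documents.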
