Regex (PHP): find a character within an HTML tag

I'm stuck on a stubborn problem I can't seem to solve.

I'm trying to find a specific character only when it is inside an HTML tag itself (between < and >), not in the text between tags.

To test this I have 2 test strings:

  1. A string with NO HTML: this is sentence 2.
  2. A string with some HTML: this is <a href="www.somesite.com">sentence</a>

I'd like to find all the period characters within < > HTML tags, so the match should be the 2 periods within www.somesite.com, but I cannot get the match to work. Can someone please take a look at my regex and see what I am missing?

(<[^>]*>?(\.))>?

Try this:

// The part before | matches the text between tags (from > to the next <) and discards it with (*SKIP)(*F).
// The part after | then matches searchText only in what is left, i.e. inside a tag.
$re = "/>[^<]*<(*SKIP)(*F)|searchText/mi";
$str = "<div><a href=\"www.searchText.com\">This is <a href=\"www.searchText.com\">sentence</a> I want to test.</a></div>";

preg_match_all($re, $str, $matches);
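To apply the same idea to the periods from the question, the pattern can be adapted roughly as follows. This is only a sketch: the helper name periodOffsetsInTags is made up for illustration, and the \A and \z anchors are an addition (not part of the answer above) so that text before the first tag and after the last tag is skipped as well. It assumes attribute values never contain a raw < or >.

```php
<?php
// Hypothetical helper (not from the original answer): returns the byte
// offsets of every period that sits inside a <...> tag.
function periodOffsetsInTags(string $html): array
{
    // First alternative: a run of text outside tags, i.e. from the start of
    // the string or a ">" up to the next "<" or the end of the string.
    // (*SKIP)(*F) discards that run. Second alternative: a literal period,
    // which can then only be reached inside a tag.
    // Assumption: attribute values never contain a raw "<" or ">".
    $re = '/(?:\A|>)[^<]*(?:<|\z)(*SKIP)(*F)|\./';

    preg_match_all($re, $html, $matches, PREG_OFFSET_CAPTURE);

    // With PREG_OFFSET_CAPTURE each match is [text, offset]; keep the offset.
    return array_column($matches[0], 1);
}

// The two test strings from the question.
print_r(periodOffsetsInTags('this is sentence 2.'));
print_r(periodOffsetsInTags('this is <a href="www.somesite.com">sentence</a> I want to test.'));
```

The first call should find nothing, and the second should report only the two periods inside the href value, not the one that ends the sentence.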


Given the string "This is <a href="www.somesite.com">sentence</a> I want to test." the regex:

\.(?=\w)

will match the periods in the URL but not the one at the end of the sentence. Note that the regex is not URL specific: it just finds a period followed immediately by a word character, using a positive lookahead.
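A quick way to see both the behaviour and the caveat is to run the pattern against two strings. The second string ("Version 1.5 is out.") is an invented example, not from the question. It shows that the lookahead is not aware of tags at all, so it also matches a period in plain text when a word character follows.

```php
<?php
$re = '/\.(?=\w)/';

// String from the answer: only the two periods in the URL are followed
// by a word character; the sentence-final period is not.
$withLink = 'This is <a href="www.somesite.com">sentence</a> I want to test.';
$linkCount = preg_match_all($re, $withLink);
echo "with link: $linkCount\n";

// Invented example with no HTML: the period in "1.5" still matches,
// because the pattern only looks at the character after the period.
$plain = 'Version 1.5 is out.';
$plainCount = preg_match_all($re, $plain);
echo "plain text: $plainCount\n";
```

So this pattern fits the sample string, but it would need a tag-aware approach if the surrounding text can contain things like decimals or file names.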

Having said that, you should really be parsing HTML with something like PHP's DOMDocument.
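A minimal sketch of what the parser route could look like, assuming the dom extension is available and that "inside a tag" in practice means "inside an attribute value" (as with the href in the question). The helper name countPeriodsInAttributes is hypothetical.

```php
<?php
// Hypothetical helper: counts the periods in all attribute values of an
// HTML fragment, using DOMDocument instead of a regex.
function countPeriodsInAttributes(string $html): int
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them;
    // fragments and imperfect markup often trigger some.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $total = 0;

    // "//@*" selects every attribute node in the document.
    foreach ($xpath->query('//@*') as $attr) {
        $total += substr_count($attr->value, '.');
    }

    return $total;
}

echo countPeriodsInAttributes('this is sentence 2.'), "\n";
echo countPeriodsInAttributes('this is <a href="www.somesite.com">sentence</a>'), "\n";
```

Unlike the regex versions, this does not break when an attribute value happens to contain a > character, at the cost of only reporting counts per attribute rather than offsets in the original string.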
