Split HTML link tags

I have this HTML code in a string:

  Hello world
  <img src="mypicture.png" />
  <p>Some text in a tag</p>
  <a href="http://www.google.fr">Link to google</a> Some Text <a href="http://www.yahoo.fr">Link to yahoo</a> End of line
  <p>Some text in a tag</p>
  <a attribute="some value" href="http://www.apple.com">Link to apple</a>
  Some text

I want to convert this string into this array:

  0 => Hello world
  <img src="mypicture.png" />
  <p>Some text in a tag</p>
  <a href="

  1 => http://www.google.fr

  2 => ">Link to google</a> Some Text <a href="

  3 => http://www.yahoo.fr

  4 => ">Link to yahoo</a> End of line
  <p>Some text in a tag</p>
  <a attribute="some value" href="

  5 => http://www.apple.com

  6 => ">Link to apple</a>
  Some text

I have tried this regex. It extracts the links fine, but I can't figure out how to build my array from it:

  <a (.*?)href=(.*?)\"(.+?)\"(.*?)>
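For reference, here is the regex above applied in Python (the language is an assumption; the question does not say which one is used). Group 3 holds each href value, which is why extracting the links works, but the text between the tags is never captured, so there is nothing to build the array from.

```python
import re

# The question's regex as a Python raw string; \" simply matches a literal ".
LINK_RE = re.compile(r'<a (.*?)href=(.*?)\"(.+?)\"(.*?)>')

# The HTML from the question (assumed to have no trailing newline).
html = (
    'Hello world\n'
    '<img src="mypicture.png" />\n'
    '<p>Some text in a tag</p>\n'
    '<a href="http://www.google.fr">Link to google</a> Some Text '
    '<a href="http://www.yahoo.fr">Link to yahoo</a> End of line\n'
    '<p>Some text in a tag</p>\n'
    '<a attribute="some value" href="http://www.apple.com">Link to apple</a>\n'
    'Some text'
)

# Group 3 is the href value of each <a> tag.
urls = [m.group(3) for m in LINK_RE.finditer(html)]
print(urls)  # ['http://www.google.fr', 'http://www.yahoo.fr', 'http://www.apple.com']
```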

You can add a group that also captures everything before each link:

  ([\\W\\w]*?)(?:(<a .*?href=.*?\\")(.+?)(?=\\")|$)

  • Get every character (lazily) until...
  • A link is found:
    • Get the link up to the start of the href value (basically your regex)
    • Get the characters up to the next quote (the URL)
  • Or the end of the text is reached
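To see what the three groups hold, here is a minimal Python sketch (Python is an assumption) that prints the groups of the first match. The doubled backslashes in the pattern above are string-literal escaping, so in a Python raw string `\\W` becomes `\W` and `\\"` becomes a plain `"`.

```python
import re

# The answer's pattern as a Python raw string: \\W and \\w become \W and \w,
# and \\" becomes a plain " (the doubling was string-literal escaping).
PATTERN = re.compile(r'([\W\w]*?)(?:(<a .*?href=.*?")(.+?)(?=")|$)')

# The HTML from the question (assumed to have no trailing newline).
html = (
    'Hello world\n'
    '<img src="mypicture.png" />\n'
    '<p>Some text in a tag</p>\n'
    '<a href="http://www.google.fr">Link to google</a> Some Text '
    '<a href="http://www.yahoo.fr">Link to yahoo</a> End of line\n'
    '<p>Some text in a tag</p>\n'
    '<a attribute="some value" href="http://www.apple.com">Link to apple</a>\n'
    'Some text'
)

first = PATTERN.search(html)
print(repr(first.group(1)))  # everything before the first <a tag
print(repr(first.group(2)))  # '<a href="'
print(repr(first.group(3)))  # 'http://www.google.fr'
```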

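Then you just need to step through each match: add the text before the link plus the link itself (groups 1 and 2) to the array as one element, and the URL (group 3) as a separate element. The last match, which stops at the end of the text, only has group 1; add it as the final element.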
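Stepping through the matches could look like this in Python (a sketch, assuming Python; `split_links` is a hypothetical helper name and the input is the string from the question):

```python
import re

# Same pattern as in the answer, written as a Python raw string.
# Note: Python's $ also matches just before a trailing newline; use \Z
# instead if the input can end with one.
PATTERN = re.compile(r'([\W\w]*?)(?:(<a .*?href=.*?")(.+?)(?=")|$)')


def split_links(html: str) -> list[str]:
    """Hypothetical helper: [text before URL, URL, ..., text after last URL]."""
    parts: list[str] = []
    for m in PATTERN.finditer(html):
        before, link_start, url = m.group(1), m.group(2), m.group(3)
        if link_start is not None:
            # Link branch: preceding text plus '<a ... href="' is one element,
            # the URL itself is the next one.
            parts.append(before + link_start)
            parts.append(url)
        elif before:
            # End-of-text branch: whatever follows the last URL. The check also
            # skips the empty match finditer can report at the very end.
            parts.append(before)
    return parts


html = (
    'Hello world\n'
    '<img src="mypicture.png" />\n'
    '<p>Some text in a tag</p>\n'
    '<a href="http://www.google.fr">Link to google</a> Some Text '
    '<a href="http://www.yahoo.fr">Link to yahoo</a> End of line\n'
    '<p>Some text in a tag</p>\n'
    '<a attribute="some value" href="http://www.apple.com">Link to apple</a>\n'
    'Some text'
)

parts = split_links(html)
for i, part in enumerate(parts):
    print(i, '=>', repr(part))
```

Because the lookahead `(?=")` does not consume the closing quote, each element after a URL starts with `">`, as in the desired array, and joining all the parts gives back the original string.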
