
Regular expression in PHP to search a particular set of data

I want to extract one paragraph from my website. The index page uses more than 20 paragraph tags. The key difference is that in each of them the style18 class is used once and the style19 class is used 3 times. I want to find a paragraph by the content of its style18 span, e.g. "the main content".


<p class="margin">
    <span class="style18">*the main content*</span>
      » <a href="https://example1.html">
        somthing</a>

        <span class="style19">[somthing]</span>
         » <a href="https://example1.html">Town</a>

         <span class="style19">[somthing]</span>
          » <a href="https://example1.html">somthing</a>

    <span class="style19">[somthing]</span> »
    <a href="https://www.example.html">somthing</a>

    <span class="style19">[somthing]</span>

</p>

<?php
  $data = file_get_contents('https://www.example.net/index.php');

  preg_match('/<title>([^<]+)<\/title>/i', $data, $matches);
  $title = $matches[1];

  echo preg_match('/(<p)\s.+\n.+(style18).+Single\sTrack(.+)\n(.+)\n(.+)\n(.+)\n.+(style19).+\n(.+)\n(.+)\n.+(style19).+\n(.+)\n(.+)\n.+(style19).+\n(.+)\n(.+)\n.+(style19).+\n\n<\/p>/i', $data, $matches);

  $img = $matches[1];

  echo $title."<br>\n";
  echo $img;
  ?>

Welcome to the community, @Aerro.

If I understood your question correctly, you want to extract the inner content of a span that is surrounded by other spans following certain rules. While this could easily break your fingers with regexp, (tree/graph) query languages like XPath would be a good approach to solve this.

Have a look at, e.g., http://php.net/manual/en/simplexmlelement.xpath.php
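
A minimal sketch of the XPath idea, under a few assumptions: it uses `DOMDocument` and `DOMXPath` rather than `SimpleXMLElement`, because real-world HTML is often not well-formed XML; the helper name `findParagraph` and the inline sample markup are made up for illustration; and the paragraph is selected by the text of its `style18` span, which is how I read the question. On the real page, `$html` would come from `file_get_contents()`.

```php
<?php
// Sample markup modelled on the question (hypothetical content).
$html = <<<'HTML'
<html><body>
<p class="margin">
    <span class="style18">other content</span>
    <a href="https://example0.html">skip me</a>
    <span class="style19">[skipped]</span>
</p>
<p class="margin">
    <span class="style18">*the main content*</span>
    <a href="https://example1.html">somthing</a>
    <span class="style19">[first]</span>
    <a href="https://example1.html">Town</a>
    <span class="style19">[second]</span>
</p>
</body></html>
HTML;

/**
 * Hypothetical helper: return the first <p> whose style18 span
 * contains $needle, or null if no paragraph matches.
 */
function findParagraph(string $html, string $needle): ?array
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy HTML instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Every <p> that has a direct child <span class="style18">.
    foreach ($xpath->query('//p[span[@class="style18"]]') as $p) {
        $main = $xpath->query('span[@class="style18"]', $p)->item(0);

        // Compare in PHP so $needle never has to be escaped for XPath.
        if (strpos($main->textContent, $needle) === false) {
            continue;
        }

        $labels = [];
        foreach ($xpath->query('span[@class="style19"]', $p) as $span) {
            $labels[] = trim($span->textContent);
        }

        $links = [];
        foreach ($xpath->query('a', $p) as $a) {
            $links[] = trim($a->textContent);
        }

        return [
            'main'   => trim($main->textContent),
            'labels' => $labels,
            'links'  => $links,
        ];
    }

    return null;
}

$result = findParagraph($html, 'the main content');
print_r($result);
```

Because the query walks the parsed tree, it does not depend on line breaks or on how many `style19` spans follow, which is where the long regexp in the question is most likely to break.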

