
Get specific text from webpage

I have this page, Test1. On this other page, test, I have this PHP code running to extract some text from Test1:

<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://inviatapenet.gethost.ro/sop/test1.php");

$xpath = new DOMXpath($doc);

$elements = $xpath->query("//*[@type='button']/@onclick");

if (!is_null($elements)) {
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue. "\n";
        }
    }
}
?>
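The `@onclick` step in the XPath query selects attribute nodes (`DOMAttr`), not elements, so each handler's text can also be read directly through `->value` instead of looping over `childNodes`. Here is a minimal offline sketch of that, assuming hypothetical markup modeled on the output shown below (the real markup of test1.php is not shown in the question):

```php
<?php
// Hypothetical markup modeled on the question's output; the real test1.php may differ.
$html = <<<HTML
<html><body>
<input type="button" value="Play" onclick="OnPlay('sop://broker.sopcast.com:3912/120704 cod ', ' eu - Nr.1 in tv ! ')">
<input type="button" value="Play" onclick="OnPlay('sop://broker.sopcast.com:3912/589994 cod ', ' eu - tv ')">
</body></html>
HTML;

libxml_use_internal_errors(true); // silence warnings about imperfect HTML
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// The query returns DOMAttr nodes, one per matching onclick attribute
$attributes = $xpath->query("//*[@type='button']/@onclick");

$handlers = array();
if ($attributes !== false) { // query() returns false on an invalid expression
    foreach ($attributes as $attr) {
        // DOMAttr::$value holds the attribute text directly
        $handlers[] = $attr->value;
    }
}
print_r($handlers);
```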

The result is:

OnPlay('sop://broker.sopcast.com:3912/120704 cod ', ' eu - Nr.1 in tv ! ')
OnPlay('sop://broker.sopcast.com:3912/140601 cod ', ' eu - Nr.1 in tv ! ')     
OnPlay('sop://broker.sopcast.com:3912/124589 cod ', ' eu - Nr.1 tv') 
OnPlay('sop://broker.sopcast.com:3912/589994 cod ', ' eu - tv ') 
OnPlay('sop://broker.sopcast.com:3912/ cod ', ' eu - tv ')

But from all of that I only need this part: `sop://broker.sopcast.com:3912/140601`

I need it for every line.

How can I get rid of the extra text, or extract just the URLs (sop://broker.sopcast.com:3912/140601, sop://broker.sopcast.com:3912/120704)?

I think you need to do some string manipulation on the resulting onclick event-handler text. Each value has the form `OnPlay('URL cod ', 'title')`, so you can cut out the part between `OnPlay('` and `cod `.

<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://inviatapenet.gethost.ro/sop/test1.php");

$xpath = new DOMXPath($doc);

// Select the onclick attribute of every element with type="button"
$elements = $xpath->query("//*[@type='button']/@onclick");
$value_text = array();

if ($elements !== false) { // query() returns false on error, never null
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            $value_text[] = getRequiredValue($node->nodeValue);
        }
    }
    // $value_text now contains all required values as an array
    print_r($value_text);
}

// Extracts the sop:// URL from a string such as
// OnPlay('sop://broker.sopcast.com:3912/120704 cod ', ' eu - tv ')
function getRequiredValue($on_play)
{
    $prefix = "OnPlay('"; // 8 characters
    $pos = strpos($on_play, 'cod ');
    if ($pos === false) {
        return null; // unexpected format
    }
    // Take everything between the OnPlay(' prefix and the 'cod ' marker
    $updated_on_play = substr($on_play, strlen($prefix), $pos - strlen($prefix));
    return trim($updated_on_play);
}
?>
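To see why the offset is 8 and why the original length expression `strlen - (strlen - $pos) - 8` reduces to `$pos - 8`, here is a small offline sketch (no HTTP request) that runs the same `strpos`/`substr` logic on sample lines from the question. `extract_sop_url` is a hypothetical name, and the sketch assumes every handler starts with exactly `OnPlay('` and contains the marker `cod `:

```php
<?php
// Hypothetical helper mirroring the strpos/substr approach.
// Assumes each string starts with "OnPlay('" and contains the marker "cod ".
function extract_sop_url(string $on_play): ?string
{
    $prefix = "OnPlay('";                  // 8 characters, hence the offset of 8
    $start = strlen($prefix);
    if (strncmp($on_play, $prefix, $start) !== 0) {
        return null;                       // does not start with OnPlay('
    }
    // Index where the unwanted " cod " suffix begins (searched after the prefix)
    $pos = strpos($on_play, 'cod ', $start);
    if ($pos === false) {
        return null;                       // marker missing
    }
    // $pos - $start = number of characters between the prefix and the marker;
    // trim() removes the space left before "cod"
    return trim(substr($on_play, $start, $pos - $start));
}

$samples = [
    "OnPlay('sop://broker.sopcast.com:3912/120704 cod ', ' eu - Nr.1 in tv ! ')",
    "OnPlay('sop://broker.sopcast.com:3912/ cod ', ' eu - tv ')",
];

$urls = array_map('extract_sop_url', $samples);
print_r($urls);
```

Note that the last line of the question's output has no channel number, so this approach yields the bare `sop://broker.sopcast.com:3912/` for it; filter such entries afterwards if they are not wanted.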

If the string is always formatted like this, you can simply use explode to get the sop:// URL: split on the single quote, then split the first argument on the first space.

<?php

header('Content-Type: text/plain; charset=UTF-8');

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://inviatapenet.gethost.ro/sop/test1.php");

$xpath = new DOMXPath($doc);

$elements = $xpath->query("//*[@type='button']/@onclick");

if ($elements !== false) { // query() returns false on error, never null
    foreach ($elements as $element) {
        $nodes = $element->childNodes;
        foreach ($nodes as $node) {
            echo $node->nodeValue . "\n";
            $content = $node->nodeValue;
            // Split on single quotes: [1] is the first argument, e.g. "sop://... cod "
            $content = explode("'", $content, 3);
            if (!isset($content[1])) {
                continue; // not in the expected OnPlay('...') format
            }
            // Split the first argument on the first space: [0] is the sop:// URL
            $content = explode(" ", $content[1], 2);
            $sop = $content[0];
            var_dump($sop);
        }
    }
}
?>
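The explode approach can also be tried offline by wrapping it in a function and feeding it sample lines from the question. A minimal sketch, assuming (as the answer does) that the URL is the first single-quoted argument and is followed by a space; `sop_url_via_explode` is a hypothetical name:

```php
<?php
// Hypothetical helper wrapping the explode approach.
// Assumes the URL is the first single-quoted argument, followed by a space.
function sop_url_via_explode(string $handler): ?string
{
    // "OnPlay('sop://... cod ', ' eu - tv ')" splits on ' into at most 3 parts:
    // [0] => "OnPlay(", [1] => "sop://... cod ", [2] => the rest
    $parts = explode("'", $handler, 3);
    if (count($parts) < 2) {
        return null; // no single quote: not the expected format
    }
    // Split the first argument on the first space; the URL is the part before it
    $words = explode(' ', $parts[1], 2);
    return $words[0];
}

$lines = [
    "OnPlay('sop://broker.sopcast.com:3912/140601 cod ', ' eu - Nr.1 in tv ! ')",
    "OnPlay('sop://broker.sopcast.com:3912/589994 cod ', ' eu - tv ')",
];

foreach ($lines as $line) {
    var_dump(sop_url_via_explode($line));
}
```

Limiting the first explode to 3 parts keeps any quotes inside the title argument from producing extra pieces, and limiting the second to 2 parts stops at the first space after the URL.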

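Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.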

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM