PHP / XPath / XQuery

I want to scrape a webpage's content, which I have already done. My problem is that I can't get accurate link text when the link text contains a child tag. For example, my XPath query is "//div[@class='someclass']/div/a/text()". It returns an accurate result for a link like <a href='somelink'> this is link </a> (my output is: this is link). But if the link is <a href='somelink'> this is <br /> another text </a>, then my output is "this is" and "another text" as two separate results, because of the child tag br. After some googling, I think fn:string() may be the solution, but I can't figure out how to use fn:string() in XQuery/XPath in PHP.

text() selects all text nodes directly below a given element. For <a href='somelink'> this is <br /> another text </a>, there are two such text nodes. In the case of <a href='somelink'> this is <strong>another</strong> text </a>, text() will even omit the word another, as it isn't a direct child of the anchor tag.
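To see this behaviour, here is a minimal sketch using PHP's DOMDocument and DOMXPath. The markup is a hypothetical sample modelled on the links in the question, not the asker's real page.

```php
<?php
// Hypothetical sample markup, modelled on the links in the question.
$html = "<div class='someclass'><div>"
      . "<a href='somelink'> this is <strong>another</strong> text </a>"
      . "</div></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// text() matches only the text nodes that are direct children of <a>,
// so the word inside <strong> is never selected.
$directText = [];
foreach ($xpath->query("//div[@class='someclass']/div/a/text()") as $textNode) {
    $directText[] = $textNode->nodeValue;
}

print_r($directText);
```

The array holds two entries, one for the text before the strong tag and one for the text after it.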

If you are querying a single anchor tag within one XPath expression, use the string($element) function without any text() matcher, e.g. string(//div[@class='someclass']/div/a).
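In PHP this could look roughly like the sketch below. It assumes the page is loaded with DOMDocument and uses DOMXPath::evaluate(), which can return a string, where query() only returns node lists. The markup is again a hypothetical sample.

```php
<?php
// Hypothetical sample markup with a <br /> inside the link text.
$html = "<div class='someclass'><div>"
      . "<a href='somelink'> this is <br /> another text </a>"
      . "</div></div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// string() returns the concatenated text of the anchor and all its descendants.
$linkText = $xpath->evaluate("string(//div[@class='someclass']/div/a)");

echo $linkText, PHP_EOL;
```

The result still contains the original whitespace around the br tag, so you may want to collapse it afterwards.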


If your expression returns a sequence of results (in PHP: a DOMNodeList you can iterate like an array), loop over the results and run the XPath expression string(.) for each anchor tag (with . being the current context node). For more control, you might want to use .//text() to fetch all text nodes below the current context and concatenate them in PHP. There's another answer explaining this in detail.
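Both variants can be sketched as follows. The markup and the variable names ($viaString, $viaTextNodes) are made up for illustration; the second argument of evaluate() and query() sets the context node.

```php
<?php
// Hypothetical sample markup with two links that contain child tags.
$html = "<div class='someclass'>"
      . "<div><a href='one'> this is <br /> another text </a></div>"
      . "<div><a href='two'> this is <strong>bold</strong> text </a></div>"
      . "</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$viaString = [];
$viaTextNodes = [];

foreach ($xpath->query("//div[@class='someclass']/div/a") as $anchor) {
    // Variant 1: string(.) evaluated with the anchor as context node.
    $viaString[] = $xpath->evaluate('string(.)', $anchor);

    // Variant 2: collect every descendant text node and join them in PHP.
    $parts = [];
    foreach ($xpath->query('.//text()', $anchor) as $textNode) {
        $parts[] = trim($textNode->nodeValue);
    }
    // Drop empty pieces, then join the rest with single spaces.
    $viaTextNodes[] = implode(' ', array_filter($parts, 'strlen'));
}

print_r($viaTextNodes);
```

The second variant takes more code, but it lets you decide how the pieces are separated and cleaned up.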

Be aware PHP only supports XPath 1.0 – no XQuery, and no XPath 2.0.
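XPath 1.0 still has a few useful string functions. As a small sketch on hypothetical sample markup, normalize-space() (part of XPath 1.0) takes the string value of the anchor, trims it, and collapses runs of whitespace:

```php
<?php
// Hypothetical sample markup with a <br /> inside the link text.
$doc = new DOMDocument();
$doc->loadHTML(
    "<div class='someclass'><div>"
    . "<a href='somelink'> this is <br /> another text </a>"
    . "</div></div>"
);
$xpath = new DOMXPath($doc);

// normalize-space() converts the first matching node to its string value,
// strips leading/trailing whitespace and collapses inner whitespace runs.
$clean = $xpath->evaluate("normalize-space(//div[@class='someclass']/div/a)");

echo $clean, PHP_EOL;
```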

You didn't show your HTML code, so I assume it looks something like this:

<div class='someclass'>
   <div class='otherclass'>
      <a href='somelink'> some text including child element </a>
   </div>
</div>

A query on the inner div gives you all the content inside the otherclass div. If you try something like the code below instead, your problem may be solved:

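   // $xpath is assumed to be a DOMXPath instance for the loaded page.
   // Select every child element of the inner div (here: the links).
   $linkQuery = $xpath->query("//div[@class='someclass']/div/*");

   $linkText = array();

   // nodeValue holds the full text of a node, including text inside child tags.
   foreach ($linkQuery as $node) {
      // Collapse line breaks and runs of whitespace into single spaces.
      $linkText[] = preg_replace('/\s+/', ' ', $node->nodeValue);
   }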

Now $linkText contains all the text inside each link, including text inside child tags.


