Cannot get the right value with XPath
I'm using some source code that collects data and works as an API. The website it takes its data from has been redesigned, and now some of the XPath expressions work and some don't.
PHP:
protected $namexpath = ".//h1[contains(@itemprop,\"name\")]/a";
This works with the following HTML source:
<h1 itemprop="name" class="fn itemTitle">
<a title="https://www.paginegialle.it/altopascio-lu/lotto-ricevitorie/lucky-planet-duro-anastasia-tabaccheria-ricevitori" href="https://www.paginegialle.it/altopascio-lu/lotto-ricevitorie/lucky-planet-duro-anastasia-tabaccheria-ricevitori">
Lucky <strong>Planet</strong> - Duro Anastasia <strong>Tabaccheria</strong> Ricevitoria Lotto
</a>
</h1>
But this is not working:
PHP:
protected $telephonexpath = ".//div[@class=\"hidden-phone-elem visiblePhone\"]/span";
HTML source:
<section itemscope="" itemtype="https://schema.org/LocalBusiness" class="vcard listElement flFree " data-user="teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda" data-id="4" data-fl_free="true" data-cd_opec="GU01WAAW" data-cd_aggregazione="23787370" data-cd_id_sede="E57901ED-8833-A2AD-E040-A8C08D264C56">
<div class="container">
<div class="row">
<div class="col contentCol">
<header>
<div class="tabletOnlyBadge">
</div>
<h1 itemprop="name" class="fn itemTitle">
<a title="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda" href="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda">
<strong>Planet</strong> Cafe' di Tozzi Iolanda
</a>
</h1>
<span class="itemSubtitle">
</span>
<div>
<span class="itemAddress">
<span class="adr" itemprop="location" itemscope="" itemtype="https://schema.org/Place">
<div class="street-address">
<span>105, Via Roma</span> -
<span class="postal-code">81030</span>
<span class="locality">Teverola</span> <span class="region">(CE)</span>
</div>
<div style="display: none;">
<span>40.99494</span>
<span>14.2077</span>
</div>
</span>
</span>
</div>
</header>
<div>
<div class="hidden-phone-wrapper">
<span class="custom-label"></span>
<div class="hidden-phone-elem">
<div class="btn btn-yellow btn-show-phone" data-pag="mostra telefono" data-context="listing">
<span>MOSTRA TELEFONO</span>
</div>
<div class="btn btn-hidden-phone">
<span class="phIco "></span>
<span class="phone-label">081 5034556</span>
</div>
</div>
</div>
<div class="itemGeoLinks">
<ul>
</ul>
</div>
<div class="itemPayoff">
<p class="payoff-title">
<a class="cat" href="//www.paginegialle.it/ricerca/cat/008647000" rel="nofollow"><strong>Tabacchi</strong>, sigarette e sigari - produzione e commercio</a>
</p>
<p itemprop="description" class="payoff-txt"></p>
</div>
<div class="itemInfoTags">
</div>
</div>
</div>
<div class="col-3 logoCol">
<div class="itemRating">
<a rel="nofollow" href="//www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda/commenti#scrivi">
<ul class="stars">
<li></li>
<li></li>
<li></li>
<li></li>
<li></li>
</ul>
<span class="label scriviRecensione">Scrivi una recensione</span>
</a>
</div>
<figure class="itemLogo">
<div class="img-container-ext">
<div class="img-container-int">
<a href="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda" title="Dettagli azienda">
<img itemprop="image" alt="Planet Cafe' di Tozzi Iolanda" title="Planet Cafe' di Tozzi Iolanda" data-original="" class="elementImage photo" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" pagespeed_url_hash="1859759222" onload="pagespeed.CriticalImages.checkImageForCriticality(this);">
</a>
</div>
</div>
</figure>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="col">
<nav class="itemFooter">
<a class="btn btn-black icn-vetrina shinystat_ssxl" data-pag="vetrina" href="//www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda">Vetrina</a>
<a class="btn btn-blank icn-showOnMap btnShowOnMap shinystat_ssxl" data-pag="vedimappa" href="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda/mappa" rel="nofollow"> <span>Vedi su mappa</span></a>
</nav>
</div>
</div>
</div>
You might find it easier to inspect the HTML on the live page: www.paginegialle.it//ricerca//TABACCO%20PLANET?mr=50
I edited the question and am adding some text because the site won't let me finalize the edit; it says there is too much code. I fixed the first part and changed it from span to h1.
The XPath does not match the HTML. The relevant fragment seems to be:
<div class="hidden-phone-elem">
<div class="btn btn-yellow btn-show-phone" data-pag="mostra telefono" data-context="listing">
<span>MOSTRA TELEFONO</span>
</div>
<div class="btn btn-hidden-phone">
<span class="phIco "></span>
<span class="phone-label">081 5034556</span>
</div>
</div>
The div has only the class hidden-phone-elem, so an exact comparison of @class with "hidden-phone-elem visiblePhone" fails. It also has three descendant span elements, none of them a direct child, so div/span finds nothing either.
XPath 1.0 has no token selector function, but it can be emulated with string functions:
normalize-space() - replaces every whitespace sequence with a single space and trims the ends
concat() - concatenates strings
contains() - looks for a substring
The trick is to normalize the attribute to something like " classToMatch otherClass " and check whether that contains " classToMatch ". (Take note of the spaces at the start/end.)
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
$expression = 'string(
//div[
contains(concat(" ", normalize-space(@class), " "), " hidden-phone-elem ")
]
//span[
contains(concat(" ", normalize-space(@class), " "), " phone-label ")
]
)';
var_dump($xpath->evaluate($expression));
Output:
string(11) "081 5034556"
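The question stores its expressions as relative paths (starting with .//), so the same idea can be applied once per listing. The following is only a sketch under some assumptions: classToken() is a hypothetical helper that is not part of the original code, the markup is a trimmed copy of the fragment above, and the second listing without a phone number is made up for illustration.

```php
<?php
// Hypothetical helper (not from the original code): builds an XPath 1.0
// predicate that matches one whitespace-separated token inside @class.
function classToken(string $name): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $name);
}

// Trimmed sample markup. The first listing is taken from the question,
// the second (no phone number) is invented for this example.
$html = <<<'HTML'
<section class="vcard listElement flFree ">
  <div class="hidden-phone-elem">
    <div class="btn btn-hidden-phone">
      <span class="phIco "></span>
      <span class="phone-label">081 5034556</span>
    </div>
  </div>
</section>
<section class="vcard listElement flFree ">
  <div class="itemPayoff"></div>
</section>
HTML;

// Collect parser warnings (for example about <section>) instead of printing them.
libxml_use_internal_errors(true);
$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($document);

// Relative expression, evaluated against each listing as the context node.
$telephonexpath = 'string(.//div[' . classToken('hidden-phone-elem') . ']'
    . '//span[' . classToken('phone-label') . '])';

$phones = [];
foreach ($xpath->query('//section[' . classToken('vcard') . ']') as $listing) {
    $phones[] = $xpath->evaluate($telephonexpath, $listing);
}

// The expression from the question, for comparison: it matches no nodes here.
$old = $xpath->query('.//div[@class="hidden-phone-elem visiblePhone"]/span');

var_dump($old->length);
var_dump($phones);
```

Passing the listing as the second argument to evaluate() keeps each phone number tied to its own section, and a listing without a phone yields an empty string rather than an error.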