
DomCrawler select only paragraphs

I want to extract just the paragraphs in each .pertanyaan element (the ones before the .listjawaban elements) using the DomCrawler/Goutte Symfony components.

Is there any way to do this? I came up with $crawler->filter('.pertanyaan p')->eq($i)->html(), but it only gives me the first paragraph, because $i is the n-th position of the .pertanyaan element.

<div class="pertanyaan">
    <p></p>
    <p>Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.</p>
    <p><img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....</p>
    <p></p>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_A" value="A" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_A" value="A" onclick="showbutton(758);">A.
        </div>
        <div class="pilihanjawaban"> adisi </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_B" value="B" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_B" value="B" onclick="showbutton(758);">B.
        </div>
        <div class="pilihanjawaban"> delesi </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_C" value="C" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_C" value="C" onclick="showbutton(758);">C.
        </div>
        <div class="pilihanjawaban"> inversi </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_D" value="D" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_D" value="D" onclick="showbutton(758);">D.
        </div>
        <div class="pilihanjawaban"> duplikasi </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_E" value="E" style="display:none" disabled=""><input type="radio" name="answer_758" id="answer_758_E" value="E" onclick="showbutton(758);">E.
        </div>
        <div class="pilihanjawaban"> translokasi </div>
    </div>
    <div class="buttons">
        <input type="button" class="tombol_jawab" id="tombol_jawab_758" value="Jawab" style="display:none" onclick="executejawaban(758,&quot;http://indocademy.com&quot;)"><input type="button" class="tombol_clear" id="tombol_clear_758" value="Hapus" style="display:none" onclick="clearjawaban(758)">
    </div>
    <div class="kunci" id="kunci_758" style="display: none">
        <div class="tulisanjawab abu">
            <input type="button" id="tombol_kunci" value="+" class="jawaban_758" onclick="showkunci(this)"> Jawaban : <img id="loading_758" src="http://indocademy.com/images/loading.gif" style="height:12px;vertical-align:middle">
            <span id="hasil_758"> </span>
        </div>
        <div class="konten_kunci">
            <div class="konten_jawaban_758" id="isi_jawaban"></div>
        </div>
    </div>
</div>
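For comparison, filter('.pertanyaan p') matches the paragraphs of every question as one flat list, so eq($i) picks the i-th paragraph on the page rather than the paragraphs of the i-th question. Below is a minimal sketch of scoping the paragraph query to each question instead. It uses PHP's built-in DOM extension (DOMDocument/DOMXPath) rather than DomCrawler so that it runs without Composer, and the two-question HTML is a hypothetical, trimmed-down sample, not the real page.

```php
<?php
// Sketch using PHP's built-in DOM extension (ext-dom), not Symfony DomCrawler.
// The HTML below is a hypothetical two-question sample, not the real page.
libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them

$html = '<div class="pertanyaan"><p>Q1 text</p><div class="listjawaban">A</div></div>'
      . '<div class="pertanyaan"><p>Q2 line 1</p><p>Q2 line 2</p><div class="listjawaban">B</div></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath 1.0 translation of the CSS selector ".pertanyaan".
$questionNodes = $xpath->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' pertanyaan ')]");

$questions = [];
foreach ($questionNodes as $questionNode) {
    $paragraphs = [];
    // Relative query: only the <p> children of this particular question.
    foreach ($xpath->query('./p', $questionNode) as $p) {
        $paragraphs[] = trim($p->textContent);
    }
    $questions[] = $paragraphs; // $questions[$i] now holds the i-th question's paragraphs
}

print_r($questions);
```

With DomCrawler the same idea would presumably be $crawler->filter('.pertanyaan')->each(...), calling filter('p') on each question node inside the closure.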

This is the URL I want to crawl: http://indocademy.com/soal/sbmptn/biologi/2013
Everything works except for question #53, which has three paragraph tags to extract. I had assumed that the first paragraph tag of each question is the question text, and I don't know how to extract all the paragraphs before the .listjawaban elements.

Please help.
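One way to express "the paragraphs before .listjawaban" is an XPath following-sibling predicate: a <p> qualifies only if a .listjawaban <div> appears after it among its siblings. The sketch below uses PHP's built-in DOMXPath instead of DomCrawler and runs on the question #53 markup trimmed to the relevant parts. The trailing <p> after the answer list is a hypothetical addition to show that it is excluded, and skipping paragraphs with neither text nor an image is an assumption about what counts as question content.

```php
<?php
// Sketch using PHP's built-in DOMXPath (ext-dom), not Symfony DomCrawler.
libxml_use_internal_errors(true); // collect libxml parse warnings instead of printing them

// Question #53, trimmed; the last <p> is a hypothetical extra to show the cut-off.
$html = <<<'HTML'
<div class="pertanyaan">
    <p></p>
    <p>Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.</p>
    <p><img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....</p>
    <p></p>
    <div class="listjawaban"><div class="pilihanjawaban"> adisi </div></div>
    <p>Not part of the question</p>
</div>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath 1.0 translation of the CSS selector ".pertanyaan".
$pertanyaan = "//div[contains(concat(' ', normalize-space(@class), ' '), ' pertanyaan ')]";
// <p> children that have a .listjawaban <div> somewhere after them,
// and that contain either some text or an <img>.
$beforeAnswers = "./p[following-sibling::div[contains(concat(' ', normalize-space(@class), ' '), ' listjawaban ')]]"
               . "[normalize-space(.) != '' or .//img]";

$paragraphs = [];
foreach ($xpath->query($pertanyaan) as $questionNode) {
    foreach ($xpath->query($beforeAnswers, $questionNode) as $p) {
        // Inner HTML of the <p>: serialize each child node and concatenate.
        $inner = '';
        foreach ($p->childNodes as $child) {
            $inner .= $doc->saveHTML($child);
        }
        $paragraphs[] = trim($inner);
    }
}

print_r($paragraphs);
```

DomCrawler also provides a filterXPath() method, so a similar expression should carry over; this sketch sticks to ext-dom so it runs standalone.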

Since the page at that URL doesn't have this structure (the class .pertanyaan does not exist there), I copied the HTML snippet into a script and used DomCrawler to get the four <p> elements.

#!/usr/bin/php
<?php

require ('vendor/autoload.php');

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<div class="pertanyaan">
    <p></p>
    <p>Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.</p>
    <p><img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....</p>
    <p></p>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_A" value="A" style="display:none" disabled="">
            <input type="radio" name="answer_758" id="answer_758_A" value="A" onclick="showbutton(758);">A.
        </div>
        <div class="pilihanjawaban">
            adisi
        </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_B" value="B" style="display:none" disabled="">
            <input type="radio" name="answer_758" id="answer_758_B" value="B" onclick="showbutton(758);">B.
        </div>
        <div class="pilihanjawaban">
            delesi
        </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_C" value="C" style="display:none" disabled="">
            <input type="radio" name="answer_758" id="answer_758_C" value="C" onclick="showbutton(758);">C.
        </div>
        <div class="pilihanjawaban">
            inversi
        </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_D" value="D" style="display:none" disabled="">
            <input type="radio" name="answer_758" id="answer_758_D" value="D" onclick="showbutton(758);">D.
        </div>
        <div class="pilihanjawaban">
            duplikasi
        </div>
    </div>
    <div class="listjawaban">
        <div class="radiojawaban">
            <input type="radio" name="answer_dup_758" id="answer_dup_758_E" value="E" style="display:none" disabled="">
            <input type="radio" name="answer_758" id="answer_758_E" value="E" onclick="showbutton(758);">E.
        </div>
        <div class="pilihanjawaban">
            translokasi
        </div>
    </div>
    <div class="buttons">
        <input type="button" class="tombol_jawab" id="tombol_jawab_758" value="Jawab" style="display:none" onclick="executejawaban(758,&quot;http://indocademy.com&quot;)"><input type="button" class="tombol_clear" id="tombol_clear_758" value="Hapus" style="display:none"
          onclick="clearjawaban(758)">
    </div>

    <div class="kunci" id="kunci_758" style="display: none">
        <div class="tulisanjawab abu">
            <input type="button" id="tombol_kunci" value="+" class="jawaban_758" onclick="showkunci(this)"> Jawaban : <img id="loading_758" src="http://indocademy.com/images/loading.gif" style="height:12px;vertical-align:middle">
            <span id="hasil_758"> </span>
        </div>
        <div class="konten_kunci">
            <div class="konten_jawaban_758" id="isi_jawaban"></div>
        </div>
    </div>
</div>
HTML;

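// Wrap the HTML fragment in a Crawler (CSS filtering needs symfony/css-selector installed).
$crawler = new Crawler($html);

// Select every <p> inside .pertanyaan and collect each node's inner HTML.
$output = $crawler->filter('.pertanyaan p')->each(function (Crawler $node) {
    return $node->html();
});

print_r($output);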

each() calls the closure once for every matched node and returns an array of the closure's return values, here the inner HTML of the four paragraphs. The resulting array is:

Array
(
    [0] =>
    [1] => Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.
    [2] => <img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....
    [3] =>
)
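The array still contains the two empty <p> elements. A minimal sketch of dropping them and joining the rest into one question string; $output is reproduced literally from the array printed above so the snippet runs without DomCrawler, and the newline separator is an arbitrary choice.

```php
<?php
// $output copies the array printed above, so this runs without DomCrawler.
$output = [
    '',
    'Karena mengalami mutasi, kromosom mengalami perubahan seperti pada gambar di bawah.',
    '<img src="http://indocademy.com/images/ipa_2013_133/53_1.png" alt=""><br>Jenis mutasi tersebut adalah ....',
    '',
];

// Keep only paragraphs that are non-empty after trimming, then reindex from 0.
$paragraphs = array_values(array_filter($output, function ($html) {
    return trim($html) !== '';
}));

// Join the remaining paragraphs into a single question body.
$question = implode("\n", $paragraphs);

echo $question, PHP_EOL;
```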
