
Cannot extract the elements I want from HTML with jsoup

The page is here: http://www.yildiz.edu.tr/etkinlikler/

Its source: view-source:http://www.yildiz.edu.tr/etkinlikler/

I did not post screenshots because the page is very long and would need a lot of them.

I want to extract entries like this one:

title: 'ss Event',
start: new Date(y, m, 1)

Not the whole object, only the values after title and start.

But there seems to be no class or other tag I can select, because the data sits inside a JavaScript block:

 </div>
    </div>
</div>


</div>    </div>
</div>    <script>
    $(document).ready(function() {

        var date = new Date();
        var d = date.getDate();
        var m = date.getMonth();
        var y = date.getFullYear();

    $('#calendar').fullCalendar({
        header: {
            left: 'prev,next today',
            center: 'title',
            right: 'month,agendaWeek,agendaDay'
        },
        editable: false,
        events: [
            {
                title: 'birthday party',
                start: new Date(2015, 9, 26),
                end: new Date(2015, 10, 13),
                url: 'http://www.yildiz.edu.tr/etkinlikler/Uygarlıkların Geçiş Yolu  &  Anadolu Peyzajı/237'
            },
            {
                title: 'Concert',
                start: new Date(2015, 5, 12),
                end: new Date(2015, 5, 19),
                url: 'http://www.yildiz.edu.tr/etkinlikler/İki Seçki İki Salon İki Sergi/233'
            },
        ]
    });

});

</script>

    <style type='text/css'>
        #calendar {
            width: 900px;
            margin: 0 auto;
            }

</style>

I tried

Elements event = document.select("#events");

but it did not work. Should I use another tool?

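You need to use something else. jsoup does not run or interpret JavaScript; it only gives you the HTML parse tree. The selector #events looks for an element with id="events", and no such element exists: the events exist only as text inside the script.

At most, you can get the whole script text, for example with document.select("script").get(1).toString() (or .data() to get only the contents without the surrounding <script> tags). The index depends on the page, so check which script element holds the calendar.

Once you have the script text, you can extract the values with a regular expression, or with a JavaScript parsing tool if you need something more robust.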
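The regex option can be tried offline first. This is a minimal sketch that runs against a hard-coded excerpt shaped like the script in the question, so it needs neither the network nor jsoup. The names EventRegexSketch, SCRIPT and extract are hypothetical, and the pattern assumes that each title is a single-quoted string with no embedded quote and that start comes right after title. Unlike a pattern that requires an end property, it also matches entries with only title and start, like the one quoted in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EventRegexSketch {
    // Hypothetical excerpt shaped like the fullCalendar "events" array in the question.
    static final String SCRIPT =
        "events: [\n"
        + "  {\n"
        + "    title: 'ss Event',\n"
        + "    start: new Date(y, m, 1)\n"
        + "  },\n"
        + "  {\n"
        + "    title: 'Concert',\n"
        + "    start: new Date(2015, 5, 12),\n"
        + "    end: new Date(2015, 5, 19)\n"
        + "  }\n"
        + "]";

    // Group 1: text inside the quotes after "title:".
    // Group 2: the "new Date(...)" expression after "start:".
    // Assumption: titles contain no single quote; "end" is not required.
    static final Pattern EVENT = Pattern.compile(
        "title\\s*:\\s*'([^']*)'\\s*,\\s*start\\s*:\\s*(new Date\\([^)]*\\))");

    // Returns one {title, start} pair per event object found in the script text.
    static List<String[]> extract(String script) {
        List<String[]> out = new ArrayList<>();
        Matcher m = EVENT.matcher(script);
        while (m.find()) {
            out.add(new String[] {m.group(1), m.group(2)});
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] e : extract(SCRIPT)) {
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

Running main prints "ss Event -> new Date(y, m, 1)" and "Concert -> new Date(2015, 5, 12)". Note that "center: 'title'" in the calendar header is not matched, because the pattern requires a colon directly after the word title.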

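import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Fetch the page and take the <script> element that defines the calendar.
// The index (10 here) depends on the page layout and can change.
Document doc = Jsoup.connect("http://www.yildiz.edu.tr/etkinlikler/").get();
String script = doc.select("script").get(10).toString();
// Capture the value after "title:" and after "start:"; only objects that also have "end" match.
String pattern = "\\{\\s*title\\s*:\\s*(.*),\\s*start\\s*:\\s*(.*),\\s*end.*";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(script);
while (m.find()) {
    System.out.println(m.group(1) + " -> " + m.group(2));
}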

The output:

'Long Event' -> new Date(y, m, d-5)
'Lunch' -> new Date(y, m, d, 12, 0)
'Birthday Party' -> new Date(y, m, d+1, 19, 0)
'Kemal Gök Fotoğraf Sergisi :Kentleşme Sürecinde Çocuk İşçiler' -> new Date(2015, 10, 17)
'Uygarlıkların Geçiş Yolu  &  Anadolu Peyzajı' -> new Date(2015, 9, 26)
'Vizöre Çarpanlar' -> new Date(2015, 8, 9)
'İki Seçki İki Salon İki Sergi' -> new Date(2015, 5, 12)
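The captured start values are still JavaScript source text. If you need real dates, a sketch like the following could convert the literal ones; JsDateSketch and toLocalDate are hypothetical names. JavaScript months are zero-based, so new Date(2015, 9, 26) is 26 October 2015. Values built from y, m and d (like the first three lines of the output) are computed from the visitor's clock when the page runs, so they cannot be resolved from the source and the sketch returns empty for them. It also assumes four-digit years and in-range day values: JavaScript would roll an out-of-range day over into the next month, whereas LocalDate.of throws.

```java
import java.time.LocalDate;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or the original answer.
public class JsDateSketch {
    // Matches only numeric literals: new Date(year, month[, day, ...]).
    private static final Pattern LITERAL_DATE = Pattern.compile(
        "new Date\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*(?:,\\s*(\\d+))?[^)]*\\)");

    // Converts e.g. "new Date(2015, 9, 26)" to 2015-10-26.
    // JavaScript months start at 0 (January), so 1 is added.
    // Expressions using variables (y, m, d) do not match and yield empty.
    static Optional<LocalDate> toLocalDate(String jsDate) {
        Matcher m = LITERAL_DATE.matcher(jsDate.trim());
        if (!m.matches()) {
            return Optional.empty();
        }
        int year = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2)) + 1;
        // JavaScript defaults the day to 1 when it is omitted.
        int day = m.group(3) == null ? 1 : Integer.parseInt(m.group(3));
        return Optional.of(LocalDate.of(year, month, day));
    }

    public static void main(String[] args) {
        System.out.println(toLocalDate("new Date(2015, 9, 26)")); // Optional[2015-10-26]
        System.out.println(toLocalDate("new Date(y, m, d-5)"));   // Optional.empty
    }
}
```

Returning Optional keeps the unresolvable relative dates visible to the caller instead of silently guessing a value for them.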
