Extract 2 arguments from web page

Question

I want to extract two attributes (title and href) from each <a> tag on a Wikipedia page.

For example, for https://en.wikipedia.org/wiki/Riddley_Walker I want output like this:

Canterbury Cathedral  
/wiki/Canterbury_Cathedral

The code:

import lxml.html
from urllib.request import urlopen  # Python 3; Python 2 used urllib.urlopen

def extractplaces(hlink):
    connection = urlopen(hlink)
    dom = lxml.html.fromstring(connection.read())

    # This XPath selects only the title attribute of every <a> tag,
    # so the matching href is not available inside the loop.
    for name in dom.xpath('//a/@title'):
        print(name)

In this case I only get @title.

Answer 1

Select the a elements that have a title attribute (instead of selecting the title attribute directly), then use .attrib on each element to read the attributes you need. Example:

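for name in dom.xpath('//a[@title]'):
    # name is the <a> element itself, so all of its attributes are available
    print('title :', name.attrib['title'])
    # .get() returns None instead of raising KeyError if an <a> has no href
    print('href :', name.attrib.get('href'))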
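
For reference, here is a minimal self-contained sketch of the same idea that runs without a network connection. The SAMPLE_HTML string and the extract_links helper are made up for illustration and are not part of the original code. It also assumes you only want links that carry both attributes, so the XPath predicate is tightened to `@title and @href`.

```python
import lxml.html

# Hypothetical sample markup standing in for a downloaded page
SAMPLE_HTML = """
<html><body>
  <a href="/wiki/Canterbury_Cathedral" title="Canterbury Cathedral">Canterbury Cathedral</a>
  <a href="/wiki/Kent" title="Kent">Kent</a>
  <a href="#cite_note-1">[1]</a>
  <a title="No link target">orphan</a>
</body></html>
"""

def extract_links(html_text):
    """Return (title, href) pairs for every <a> that has both attributes."""
    dom = lxml.html.fromstring(html_text)
    # The predicate keeps only anchors carrying both attributes,
    # so .get() never returns None for the selected elements.
    return [(a.get('title'), a.get('href'))
            for a in dom.xpath('//a[@title and @href]')]

for title, href in extract_links(SAMPLE_HTML):
    print(title)
    print(href)
```

To use it on a real page, pass the bytes returned by urlopen(hlink).read() to extract_links instead of SAMPLE_HTML.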

1 answer

solution1
0 ACCEPTED 2015-10-27 16:10:51
