I want to extract two attributes (title and href) from every <a> tag on a Wikipedia page.
For example, for https://en.wikipedia.org/wiki/Riddley_Walker I want output like this:
Canterbury Cathedral
/wiki/Canterbury_Cathedral
The code:
import os, re, lxml.html, urllib

def extractplaces(hlink):
    connection = urllib.urlopen(hlink)
    places = {}
    dom = lxml.html.fromstring(connection.read())
    for name in dom.xpath('//a/@title'):  # selects only the title attribute of every <a> tag
        print name
With this code I only get the title attribute (@title), not the href.
Instead of selecting the title attribute directly, select the <a> elements that have a title attribute. Then use .attrib on each element to read whichever attributes you need. Example:
for name in dom.xpath('//a[@title]'):
    print('title :',name.attrib['title'])
    print('href :',name.attrib['href'])
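One thing to watch for: an <a> element can have a title but no href, in which case name.attrib['href'] raises a KeyError. Below is a minimal sketch of the same approach that collects (title, href) pairs and uses .get() so a missing href comes back as None. The function name extract_links and the inline sample markup are made up for illustration, and the code assumes Python 3 with lxml installed.

```python
import lxml.html

def extract_links(html_text):
    """Return (title, href) pairs for every <a> element that has a title.

    extract_links is a hypothetical helper name, not part of lxml.
    """
    dom = lxml.html.fromstring(html_text)
    pairs = []
    # Select the <a> elements themselves, not their attributes,
    # so that both title and href can be read from each one.
    for link in dom.xpath('//a[@title]'):
        # .get() returns None when the attribute is absent,
        # where .attrib['href'] would raise KeyError.
        pairs.append((link.get('title'), link.get('href')))
    return pairs

# Illustrative markup only; a real page would be downloaded first.
sample = (
    '<div>'
    '<a href="/wiki/Canterbury_Cathedral" title="Canterbury Cathedral">Canterbury</a>'
    '<a href="/wiki/Kent">Kent</a>'
    '<a title="No link">placeholder</a>'
    '</div>'
)

for title, href in extract_links(sample):
    print(title)
    print(href)
```

If you only want links that have both attributes, changing the XPath expression to '//a[@title and @href]' filters the others out in the query itself.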