
How do I use lxml to return the href attribute at an XPath as a string?

I have working code that prints the text of the elements matched by this XPath:

'//*[@id="all_TorontoBlueJayspitching"]/div/table/tbody/tr/th/a/text()'

on the page https://www.baseball-reference.com/boxes/CHA/CHA202206200.shtml

using this script:

import requests
from lxml import html

boxScore = "CHA/CHA202206200"
url = "https://www.baseball-reference.com/boxes/" + boxScore + ".shtml"
page = requests.get(url)

# Drop every line that contains an HTML comment marker, so markup the site
# wraps in <!-- ... --> is parsed as normal elements.
tree = html.fromstring(b''.join(line for line in page.content.splitlines() if b'<!--' not in line and b'-->' not in line))

# Team names from the scorebox, e.g. "Toronto Blue Jays"
getTeams = tree.xpath('//*[@class="scorebox"]/div/div/strong/a/text()')

for team in getTeams:
    team = team.replace(" ", "")
    stringy = '"all_' + team + 'pitching"'
    stringx = '//*[@id=' + stringy + ']/div/table/tbody/tr/th/a/text()'

# Runs after the loop, so it uses the stringx built for the last team
tambellini = tree.xpath(stringx)
print(tambellini)
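A note on the line that builds tree: baseball-reference.com appears to ship several of its box-score tables inside HTML comments, and lxml parses commented-out markup as a single comment node, so an XPath query cannot see the table inside it. Filtering out every line that contains <!-- or --> un-comments those tables before parsing. The sketch below reproduces the effect on a made-up snippet (the HTML, the id all_Examplepitching and the player path are invented for illustration); like the script, it assumes the comment markers sit on their own lines.

```python
from lxml import html

# Made-up page fragment: the table is hidden inside an HTML comment,
# with the comment markers on their own lines (an assumption).
raw = b"""<html><body>
<div id="all_Examplepitching">
<!--
<div><table><tbody><tr><th><a href="/players/x/example01.shtml">Example Pitcher</a></th></tr></tbody></table></div>
-->
</div>
</body></html>"""

query = '//*[@id="all_Examplepitching"]/div/table/tbody/tr/th/a/@href'

# Parsed as-is, the table is only text inside a comment node.
hidden = html.fromstring(raw).xpath(query)

# Dropping the lines that carry the comment markers exposes the table.
cleaned = b"".join(
    line for line in raw.splitlines() if b"<!--" not in line and b"-->" not in line
)
visible = html.fromstring(cleaned).xpath(query)

print(hidden)   # []
print(visible)  # ['/players/x/example01.shtml']
```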

The problem is that I don't want to print this text; I want to get at the matched elements themselves. In other words, I am trying to get to

'//*[@id="all_TorontoBlueJayspitching"]/div/table/tbody/tr/th/a'

and then read the href value on that a element (in this case href="/players/b/berrijo01.shtml").

Any guidance here would be helpful. I know how to print an element's text, but I don't know how to access the element itself, or its attributes, as a variable. Thank you.

Answer: change stringx so that the path ends in /@href (the attribute) instead of /text():

stringx = '//*[@id=' + stringy + ']/div/table/tbody/tr/th/a/@href'
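Why this works: in XPath, a path ending in /@href selects the attribute rather than the text() node, and lxml returns the attribute values as a list of strings (a str subclass, so they can be used anywhere a string is expected). A minimal self-contained sketch of the same per-team loop, run against an invented snippet instead of the live page; the team names, ids and player paths below are made up:

```python
from lxml import html

# Invented stand-in for the box-score page: a scorebox with two team links
# and one (already un-commented) pitching table per team.
doc = html.fromstring("""
<html><body>
<div class="scorebox">
  <div><div><strong><a href="/teams/AAA/2022.shtml">Away Team</a></strong></div></div>
  <div><div><strong><a href="/teams/BBB/2022.shtml">Home Team</a></strong></div></div>
</div>
<div id="all_AwayTeampitching"><div><table><tbody>
  <tr><th><a href="/players/a/away01.shtml">Away Pitcher</a></th></tr>
</tbody></table></div></div>
<div id="all_HomeTeampitching"><div><table><tbody>
  <tr><th><a href="/players/h/home01.shtml">Home Pitcher</a></th></tr>
  <tr><th><a href="/players/h/home02.shtml">Home Reliever</a></th></tr>
</tbody></table></div></div>
</body></html>
""")

hrefs_by_team = {}
for team in doc.xpath('//*[@class="scorebox"]/div/div/strong/a/text()'):
    team_id = "all_" + team.replace(" ", "") + "pitching"
    # Ending the path in /@href returns attribute values, not link text.
    hrefs_by_team[team] = doc.xpath(
        '//*[@id="' + team_id + '"]/div/table/tbody/tr/th/a/@href'
    )

print(hrefs_by_team)
```

Collecting the results in a dict inside the loop keeps one list per team, rather than only the list for whichever team the loop visited last.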

This should output

[
  '/players/l/lynnla01.shtml', 
  '/players/l/lopezre01.shtml', 
  '/players/g/graveke01.shtml', 
  '/players/k/kellyjo05.shtml'
]
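If you need both the player name and the link, another option is to stop the path at the a element: xpath() then returns the element objects themselves, and each element's attributes are available through .get(). This is a sketch on an invented fragment, not the live page; the id, player names and paths are made up.

```python
from lxml import html

# Invented fragment shaped like one pitching table from the question.
doc = html.fromstring(
    '<div id="all_Examplepitching"><div><table><tbody>'
    '<tr><th><a href="/players/x/example01.shtml">Example Pitcher</a></th></tr>'
    '<tr><th><a href="/players/x/example02.shtml">Example Reliever</a></th></tr>'
    '</tbody></table></div></div>'
)

# Without /text() or /@href at the end, xpath() returns the <a> elements.
links = doc.xpath('//*[@id="all_Examplepitching"]/div/table/tbody/tr/th/a')

pairs = []
for a in links:
    # .get() reads an attribute (None if it is absent);
    # .text_content() returns all text inside the element.
    pairs.append((a.text_content(), a.get("href")))

print(pairs)
```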
