Python get value from input element with lxml xpath

I want to log in to PayPal with cURL, so I need the auth value. I am trying to get it from the HTML source with this Python code:

import requests
import lxml.html

r = requests.get("https://paypal.com/cgi-bin/webscr?cmd=_login-run")
login_page = r.text.encode('utf-8')  # the HTML source as UTF-8 bytes
html = lxml.html.fromstring(login_page)  # an <Element html> object
auth = html.xpath('//input[@name="auth"]')  # a list of matching elements, not the value
print(auth)

But the code above prints [<InputElement 7fb0971e9f18 name='auth' type='hidden'>]. How can I get the auth value itself, with the entities &#x2d; and &#x2e; decoded? The input element looks like this:

<input name="auth" type="hidden" value="ADPifNsidn&#x2d;P0G6WmiMMeJbjEhnhIvZCNg7Fk11NUxc0DyYWzrH&#x2d;xk5ydV&#x2e;85WCzy">

Thank you very much.

If you'd like to retrieve the value attribute of that auth element, select the attribute directly in the XPath expression:

auth = html.xpath('//input[@name="auth"]/@value')

There is no need to decode anything: entities are expanded automatically when lxml parses the HTML, so the output will be a list containing the decoded string:

$ python sample.py 
['AmmyYuqCDmZRcSs6MaQi2tKhzZiyAX0eSERKqTi3pLB5pdceB726lx7jhXU2MGDN6']
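
To check the entity expansion without contacting PayPal, here is a minimal sketch that parses the input element quoted in the question from a local string. The surrounding html/body/form wrapper and the helper name extract_auth are assumptions made for illustration; they do not come from the real login page.

```python
import lxml.html

# The <input> quoted in the question, wrapped in a hypothetical minimal page.
SAMPLE = (
    '<html><body><form>'
    '<input name="auth" type="hidden" '
    'value="ADPifNsidn&#x2d;P0G6WmiMMeJbjEhnhIvZCNg7Fk11NUxc0DyYWzrH'
    '&#x2d;xk5ydV&#x2e;85WCzy">'
    '</form></body></html>'
)


def extract_auth(html_source):
    """Return the value of the first input named "auth", or None if absent."""
    doc = lxml.html.fromstring(html_source)
    # /@value selects the attribute itself, so xpath() returns a list of strings.
    values = doc.xpath('//input[@name="auth"]/@value')
    return values[0] if values else None


auth = extract_auth(SAMPLE)
print(auth)  # ADPifNsidn-P0G6WmiMMeJbjEhnhIvZCNg7Fk11NUxc0DyYWzrH-xk5ydV.85WCzy
```

Since xpath() always returns a list, the helper takes the first item and falls back to None when the page has no such input. If you already hold the element from the original query, element.get("value") should give the same decoded string.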
