
Extract token from within <script> tags BeautifulSoup4, Requests

I'm trying to isolate the securityToken from an HTML response. The securityToken sits inside a <script> tag, though, not in regular markup.

I've been able to isolate the tag with the code below:

import requests
from bs4 import BeautifulSoup
import re

url = 'https://obe.sandals.com/read-land-availability/'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
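soup = BeautifulSoup(r.text, 'html.parser')
# Find the <script> element whose text contains 'securityToken:'
# (`string=` is the current name for the older `text=` argument)
mytext = soup.find('script', string=re.compile('securityToken:'))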

print(mytext)

Here is the output, but I cannot figure out the last step to extract the securityToken:

<script> window._app.page = { jsView: './views/step1/Vacation', securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", step: 1 }; </script>

Process finished with exit code 0

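If you use 'html5lib' instead of 'html.parser', and the security token is always in the same position, you can split the script's text. Note that mytext is a Tag, not a str, so take its .string first:

mytext.string.split('securityToken: "')[1].split('", subSessionId:')[0]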
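The split idea above can be sketched offline against the script text shown in the question. This is only a minimal sketch: `token_by_split` is a hypothetical helper name, and it uses `str.partition` instead of `split` so that a missing marker yields None rather than an IndexError. It assumes the value is always wrapped in double quotes, as in the sample.

```python
def token_by_split(script_text, key="securityToken"):
    """Return the double-quoted value that follows `key: "`, or None if absent."""
    # partition never raises; `found` is an empty string when the marker is missing.
    _, found, rest = script_text.partition(key + ': "')
    if not found:
        return None
    # Everything up to the next double quote is the value.
    value, closing, _ = rest.partition('"')
    return value if closing else None


# Script text copied from the output shown in the question.
sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

print(token_by_split(sample))
```

With the live page you would pass `str(mytext)` (or `mytext.string`) in place of `sample`. Unlike the one-liner, this does not depend on subSessionId coming right after the token.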

To extract the value of securityToken, try the following:

import re
import requests
from bs4 import BeautifulSoup


url = 'https://obe.sandals.com/read-land-availability/'
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.103 Safari/537.36"})
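soup = BeautifulSoup(r.text, 'html.parser')
# `string=` is the current name for the older `text=` argument
mytext = soup.find('script', string=re.compile('securityToken:'))

# Capture the quoted value that follows `securityToken: `
print(re.search(r'securityToken: "(.*?)"', str(mytext)).group(1))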

Output:

5EFDCE1D62C5F1C1369EF3629F921B0F90301ACB51C5FD24321D7FB58D04DE50
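If you also need subSessionId, or want to avoid an AttributeError when the pattern does not match, a slightly more defensive variant of the regex approach might look like the sketch below. It runs offline on the script text from the question; `quoted_fields` and `PAIR` are hypothetical names, and the pattern assumes every value of interest is written as `name: "value"` with double quotes.

```python
import re

# Matches `name: "value"` pairs; single-quoted and numeric values are skipped.
PAIR = re.compile(r'(\w+):\s*"([^"]*)"')


def quoted_fields(script_text):
    """Return a dict of every double-quoted `name: "value"` pair in the text."""
    return dict(PAIR.findall(script_text))


# Script text copied from the output shown in the question.
sample = (
    "<script> window._app.page = { jsView: './views/step1/Vacation', "
    'securityToken: "BF8394B1DD5481AF43BE2AF02243903F121D26327E83ADC13785F6EF739B5870", '
    'subSessionId: "6D71C585C7F51CF105B3100A473635ACF3637329F2C1ABAADB1F2827832562D8", '
    "step: 1 }; </script>"
)

fields = quoted_fields(sample)
# .get returns None instead of raising when the key is missing.
print(fields.get("securityToken"))
```

The token printed above differs from the one in the question, which suggests the value changes between requests, so extract it from the same response (or session) you intend to reuse it with.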
