How to scrape a dynamic JavaScript variable value using BeautifulSoup and Requests

I am scraping a login page, and I only need the var salt= variable from a JavaScript script tag. The website is https://ib.muamalatbank.com/ib-app/loginpage

After reading the answers here, I can get these two variables using BeautifulSoup and Requests (maybe because they are static): var muserid='User ID must be filled'; var mpassword='Password must be filled';

But when I try to scrape var salt=, it gives me all of the var values instead (see my Python code below).

I need only the value of var salt, without the quotation marks. Picture: the var salt value in the page source.

I have already tried re.search, re.compile, and re.findall, but I am a newbie and I keep getting the error "Object cannot string...."

from bs4 import BeautifulSoup as bs
import requests
import re
import lxml
import json

URL = 'https://ib.muamalatbank.com/ib-app/loginpage'
REF = 'https://ib.muamalatbank.com'

HEADERS = {'User-Agent': 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0', 'origin': URL, 'referer': REF}

s = requests.session()
soup = bs(s.get(URL, headers=HEADERS, timeout=5, verify=False).text,"html.parser")

script = soup.find_all("script")[11]
ambilteks = soup.find_all(text=re.compile("salt=(.*?)"))
print(ambilteks)

Notes:

  1. I need help, but I am not interested in using Selenium.
  2. I have a fully working script in PHP-Laravel (I need it in Python), but I have no knowledge of Laravel. If anyone asks, I will share the Laravel code.

Please help me. Thank you very much.

Try using re.compile and include the single quotes ('') in your regex, then extract the first result. This was not tested against the page response, so first verify that the string is actually present in the response.

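# Capture only the text between the single quotes after "var salt="
p = re.compile(r"var salt='(.*?)'")
matches = p.findall(s.get(URL, headers=HEADERS, timeout=5, verify=False).text)
res = matches[0] if matches else None  # None if the pattern is not in the response
print(res)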
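The reason the question's attempt returns every variable is that soup.find_all(text=re.compile(...)) returns each whole text node that matches, which here is the entire script body. A capturing group applied to the raw text returns only the value. The following is a minimal offline sketch of that idea, not a tested solution for the real site: SAMPLE_HTML, the salt value a1b2c3d4, and the helper name extract_salt are all hypothetical, and the actual page may format the variable differently.

```python
import re

# Hypothetical stand-in for the login page HTML; the real page may differ.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">
var muserid='User ID must be filled';
var mpassword='Password must be filled';
var salt='a1b2c3d4';
</script>
</head><body></body></html>
"""

# Assumption: the value is a quoted string literal. The pattern tolerates
# optional whitespace around "=", either quote style, and any letter case.
SALT_PATTERN = re.compile(r"var\s+salt\s*=\s*(['\"])(.*?)\1", re.IGNORECASE)


def extract_salt(html):
    """Return the salt value without quotes, or None if it is not found."""
    match = SALT_PATTERN.search(html)
    # Group 1 is the opening quote; group 2 is the value between the quotes.
    return match.group(2) if match else None


if __name__ == "__main__":
    print(extract_salt(SAMPLE_HTML))
```

With the session from the question, the same helper could be called as extract_salt(s.get(URL, headers=HEADERS, timeout=5, verify=False).text). If it returns None, the salt is probably not in the static response, and it may be set by a later request or by script execution.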
