Passing session ID from Selenium to Python Requests
100% Python noob, so I apologize if any terms or phrases I use are incorrect or ambiguous.
I am trying to go to a random word generator website, refresh the random words generated from the default, scrape the website after the refresh, ingest the words, sort them alphabetically, then print the pre-sorted and post-sorted lists to the screen.
This is the website url: https://www.randomlists.com/random-words
I am using the latest version of Python 3.x, which comes with Requests, and I have installed Selenium using Pip.
Here's what I have been able to successfully do:
Here's what I can't do that I want to do:
I've discovered that the reason for this is that Selenium and Requests each use their own "instance" of the website, represented by a Session ID.
So (finally) here is my question: how do I pass the captured Session ID from the website refreshed with Selenium to Requests, which I then use to scrape the refreshed word list rather than the default word list?
All of the topics around this issue have dealt with passing a username and password login session from one to the other, which isn't what I'm trying to do here. My apologies if this question has already been asked. I spent several hours today researching, but was not able to find anything that matches my specific scenario.
More than happy to post a copy of my current code if necessary.
Thanks!
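For reference, the usual way to carry browser state from Selenium into Requests is to copy the cookies returned by `driver.get_cookies()` into the request. The following is a minimal sketch, assuming `driver` is an already-running Selenium WebDriver; the helper names `cookies_for_requests` and `fetch_with_browser_cookies` are hypothetical and not part of either library. It only helps when the server ties the page content to a cookie. If the words are chosen by JavaScript inside the browser, a second HTTP request would likely still not see the refreshed list.

```python
def cookies_for_requests(selenium_cookies):
    """Turn the list of cookie dicts from driver.get_cookies() into {name: value}."""
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def fetch_with_browser_cookies(driver, url):
    """Fetch url with Requests, reusing the cookies held by the Selenium browser."""
    # Third-party import kept local so the helper above has no dependencies.
    import requests

    cookies = cookies_for_requests(driver.get_cookies())
    response = requests.get(url, cookies=cookies)
    response.raise_for_status()
    return response
```

With a live browser, `fetch_with_browser_cookies(driver, url).content` could then be handed to `lxml.html.fromstring` in place of the plain `requests.get(url)` result.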
Edit to add code below
import lxml.html
import requests
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")
options.binary_location = "/usr/bin/chromium"
driver = webdriver.Chrome(executable_path=r"REDACTED")

url = 'https://www.randomlists.com/random-words'
driver.get(url)
refresh_button = driver.find_elements_by_xpath("//input[@name='submit' and @value='Refresh']")[0]
refresh_button.click()

rawList = []
sortedList = []

# Note: this is a separate HTTP request, not the page Selenium just refreshed
url = 'https://www.randomlists.com/random-words'
r = requests.get(url)
tree = lxml.html.fromstring(r.content)
elements = tree.get_element_by_id('result')
for el in elements:
    rawList.append(el.text_content())

print("This is the unsorted list:", "\n")
for i in rawList:
    print(i)
print("\n")
print("This is the sorted list:", "\n")
for i in rawList:
    sortedList.append(i)
sortedList.sort()
for i in sortedList:
    print(i)
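Since the refreshed page already lives in the Selenium browser, one option is to skip the second request and read the words from the driver itself. This is a sketch under assumptions: the container id `result` is taken from the code above and not checked against the live page, and the helper names `words_from_browser` and `print_report` are made up for illustration.

```python
def words_from_browser(driver, container_id="result"):
    """Read the words from the page the Selenium browser is currently showing."""
    # "id" and "xpath" are the string values of By.ID and By.XPATH.
    container = driver.find_element("id", container_id)
    children = container.find_elements("xpath", "./*")
    # Keep the text of each direct child, skipping empty entries.
    return [child.text.strip() for child in children if child.text.strip()]


def print_report(words):
    """Print the list as scraped, then the same words in alphabetical order."""
    print("This is the unsorted list:\n")
    for word in words:
        print(word)
    print("\nThis is the sorted list:\n")
    for word in sorted(words):
        print(word)
```

After `refresh_button.click()`, calling `print_report(words_from_browser(driver))` would cover the scraping, sorting, and printing steps. If the lxml parsing should stay, `lxml.html.fromstring(driver.page_source)` is another way to parse what the browser currently holds instead of `r.content`.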
I am trying to go to a random word generator website, refresh the random words generated from the default, scrape the website after the refresh, ingest the words, sort them alphabetically, then print the pre-sorted and post-sorted lists to the screen.
I'm not sure whether this is an assignment that has to be done in this specific way, but there may be an easier way. Looking at the requests made by that website, I found the link to the JSON of all the words they seem to be choosing random ones from. They don't seem to be sorted, but that is as simple as calling sorted(...) on the returned data.
>>> import requests
...
... url = 'https://www.randomlists.com/data/words.json'
... r = requests.get(url)
... r.raise_for_status()
... all_words = r.json()['data']
>>> len(all_words)
2643
>>> all_words == sorted(all_words)
False
>>> sorted_words = sorted(all_words)
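To finish the original task with this approach, a random subset can be drawn locally and shown before and after sorting. This is a minimal sketch, assuming the JSON keeps the `data` list shown above; the sample size of 12 and the helper names `pick_words` and `fetch_all_words` are arbitrary choices for illustration, not something the site specifies.

```python
import random


def pick_words(all_words, count=12, rng=random):
    """Return (unsorted, sorted) lists for `count` words drawn without repeats."""
    pool = list(all_words)
    chosen = rng.sample(pool, min(count, len(pool)))
    return chosen, sorted(chosen)


def fetch_all_words(url="https://www.randomlists.com/data/words.json"):
    """Download the full word list (URL taken from the session above)."""
    # Third-party import kept local so pick_words has no dependencies.
    import requests

    response = requests.get(url)
    response.raise_for_status()
    return response.json()["data"]
```

Calling `pick_words(fetch_all_words())` returns both lists, ready to print. Passing a seeded `random.Random` instance as `rng` makes the selection repeatable.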