
Passing session ID from Selenium to Python Requests

100% Python noob, so I apologize if any terms or phrases I use are incorrect or ambiguous.

I am trying to go to a random word generator website, refresh the random words generated by default, scrape the website after the refresh, ingest the words, sort them alphabetically, then print the pre-sorted and post-sorted lists to the screen.

This is the website URL: https://www.randomlists.com/random-words

I am using the latest version of Python 3.x with Requests, and I have installed Selenium using pip.

Here's what I have been able to do successfully:

  1. Use Selenium to pass a refresh command to the website and generate a new list of words
  2. Use Requests to scrape the default words from the website
  3. Sort the default word list alphabetically
  4. Print the pre-sorted and post-sorted default word lists to the screen

Here's what I want to do but can't:

  1. Use Selenium to refresh the website from the default
  2. Then use Requests to scrape the refreshed words from the website
  3. Then sort alphabetically and print the pre-sorted and post-sorted lists to the screen

I've discovered that the reason for this is that Selenium and Requests are each using their own "instance" of the website, represented by a Session ID.

So (finally) here is my question - how do I pass the Session ID captured from the website refreshed with Selenium to Requests, so that Requests scrapes the refreshed word list rather than the default word list?

All of the topics around this issue have dealt with passing a username and password login session from one to the other, which isn't what I'm trying to do here. My apologies if this question has already been asked - I spent several hours today researching, but was not able to find anything matching my specific scenario.

More than happy to post a copy of my current code if necessary.

Thanks!

Edit to add code below:

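import lxml.html
import requests
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('--ignore-certificate-errors')
options.add_argument("--test-type")
options.binary_location = "/usr/bin/chromium"
# Note: `options` is built above but is not passed to the driver here.
driver = webdriver.Chrome(executable_path=r"REDACTED")
url = 'https://www.randomlists.com/random-words'
driver.get(url)

# Click the page's Refresh button to generate a new word list in the browser.
refresh_button = driver.find_elements_by_xpath(
    "//input[@name='submit' and @value='Refresh']")[0]
refresh_button.click()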

rawList = []
sortedList= []
url = 'https://www.randomlists.com/random-words'
r = requests.get(url)
tree = lxml.html.fromstring(r.content)
elements = tree.get_element_by_id('result')
for el in elements:
    rawList.append(el.text_content())

print("This is the unsorted list:", "\n")
for i in rawList:
    print(i)

print("\n")
print("This is the sorted list:", "\n")
for i in rawList:
    sortedList.append(i)
sortedList.sort()
for i in sortedList:
    print(i)
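
On the literal question of handing a session from Selenium to Requests, the usual approach is to copy the browser's cookies into the request. The following is only a minimal sketch: the helper names `selenium_cookies_to_dict` and `get_with_browser_cookies` are made up for illustration, and it assumes the site ties the refreshed word list to a cookie, which nothing in this thread confirms.

```python
def selenium_cookies_to_dict(selenium_cookies):
    """Flatten Selenium's list of cookie dicts into a {name: value} dict.

    `driver.get_cookies()` returns a list of dicts that each carry at
    least a "name" and a "value" key; Requests accepts a plain dict.
    """
    return {cookie["name"]: cookie["value"] for cookie in selenium_cookies}


def get_with_browser_cookies(driver, url):
    """Fetch `url` with Requests, sending the cookies the browser holds.

    Hypothetical helper: `driver` is an already-open Selenium WebDriver.
    """
    import requests  # third-party; imported here so the helper above stays dependency-free

    cookies = selenium_cookies_to_dict(driver.get_cookies())
    response = requests.get(url, cookies=cookies, timeout=10)
    response.raise_for_status()
    return response
```

With the code above, this would be called as `r = get_with_browser_cookies(driver, url)` after `refresh_button.click()`. If the words are instead generated by JavaScript inside the browser, a second HTTP request will still return the default page regardless of cookies; in that case parsing `driver.page_source` with lxml may be the more direct route.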

I am trying to go to a random word generator website, refresh the random words generated from the default, scrape the website after the refresh, ingest the words, sort them alphabetically, then print the pre-sorted and post-sorted lists to the screen.

I'm not sure if this is an assignment that has to be done in this specific way, but there may be an easier way. Looking at the requests made by that website, I found the link to the JSON of all the words they seem to be choosing random ones from. They don't seem to be sorted, but sorting is as simple as calling sorted(...) on the returned data.

>>> import requests
... 
... url = 'https://www.randomlists.com/data/words.json'
... r = requests.get(url)
... r.raise_for_status()
... all_words = r.json()['data']
>>> len(all_words)
2643
>>> all_words == sorted(all_words)
False
>>> sorted_words = sorted(all_words)
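
To get from the full word list to what the question asks for (a random selection printed before and after sorting), something along these lines might work. It is a sketch, not the site's own logic: the function names, the default `count=12`, and the `seed` parameter are assumptions for illustration, and `fetch_words` relies on the JSON URL and `'data'` key shown above.

```python
import random


def fetch_words(url="https://www.randomlists.com/data/words.json"):
    """Download the full word list (URL and 'data' key taken from the session above)."""
    import requests  # third-party; only needed for the download step

    r = requests.get(url, timeout=10)
    r.raise_for_status()
    return r.json()["data"]


def pick_and_sort(all_words, count=12, seed=None):
    """Draw `count` distinct words at random; return (picked, sorted_picked).

    `count=12` is an arbitrary illustrative default, not the site's setting.
    Passing a `seed` makes the selection repeatable.
    """
    rng = random.Random(seed)
    picked = rng.sample(list(all_words), count)
    return picked, sorted(picked)


def print_lists(picked, sorted_picked):
    """Print the selection before and after sorting, one word per line."""
    print("This is the unsorted list:\n")
    print("\n".join(picked))
    print("\nThis is the sorted list:\n")
    print("\n".join(sorted_picked))
```

A typical call would be `print_lists(*pick_and_sort(fetch_words()))`. This skips Selenium entirely, since the random choice happens locally rather than in the browser.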

