
Scraping data from website Python - After interaction

Hello guys!

A friend of mine has to do a lot of typing for school in her IT classes, which means she has to learn to type fast on the keyboard. Lazy as she is, she asked me if I had any idea how she could get her texts typed on https://at4.typewriter.at/index.php?r=site/index without actually doing anything. I thought to myself, "Hey, that's a cool idea, I'll look into it."

This is what the website looks like:

That's the website where she has to type. There is a <span id="actualLetter"> tag holding the current character that has to be typed, and another <span id="remainingText"> tag holding the remaining text. I've been able to scrape the first "actualLetter" with BeautifulSoup and open the website with webbrowser. The problem is that on first load the "remainingText" span does not contain all of the remaining text. After the first letter has been typed, the span updates to the "full" text and I could scrape it. Once I'd scraped it, I'd just let the Python program type it with pynput.keyboard.
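To make the static-scraping step concrete, here is a minimal sketch of pulling the text of those two spans out of an HTML string. It uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages; with BeautifulSoup the equivalent lookup would be soup.find("span", id="actualLetter"). The sample markup in the usage note is made up for illustration and is not the site's real HTML. Like any scrape of the initially served page, it only sees what the server sent, not what the page's scripts fill in later.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collects the text content of <span> elements with the given ids."""

    def __init__(self, wanted_ids):
        super().__init__()
        self.wanted_ids = set(wanted_ids)
        self.texts = {}
        self._current = None  # id of the span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        span_id = dict(attrs).get("id")
        if tag == "span" and span_id in self.wanted_ids:
            self._current = span_id
            self.texts[span_id] = ""

    def handle_endtag(self, tag):
        # Assumption: the wanted spans contain no nested <span> elements.
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.texts[self._current] += data


def extract_spans(html, ids=("actualLetter", "remainingText")):
    """Return a dict mapping each span id found in `html` to its text."""
    parser = SpanTextParser(ids)
    parser.feed(html)
    parser.close()
    return parser.texts
```

For example, extract_spans('<span id="actualLetter">H</span><span id="remainingText">ello</span>') returns a dict with "H" and "ello" under the two ids.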

The problem I am facing is that I have no idea how to scrape data from a website that has already been opened in a web browser, that has already been edited, or that has already been interacted with. I'd be happy about any advice or solutions!

Thanks!

Normally, you'd have people asking what you've tried so far and to see your code, but I understand you're really in the dark on how to even get started with this problem.

If you need the Python script to be able to step in after the user has interacted with the site, you're in for a massive challenge. There are many variables, such as which browser is being used, on which operating system, at what resolution, with what settings, and so on.

Interacting with a live application will be fairly hard, although not impossible. If the site can be operated entirely from the keyboard, and you can find a reliable sequence of keyboard inputs that reaches the right controls, that could be an approach. Libraries like pywin32 could provide access to the API calls you'd need to send input to the screen.
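As a rough illustration of the "send keyboard input to whatever is on screen" approach, here is a minimal sketch. It swaps in pynput (the library mentioned in the question) rather than pywin32, and the function name type_text and its parameters are made up for this example. It assumes the text is already known and that the browser window gets focus during the countdown; the script itself has no idea which window receives the keystrokes.

```python
import time


def type_text(text, press=None, delay=0.05, countdown=5.0):
    """Send `text` one character at a time to the focused window.

    `press` is any callable that types a single character. If omitted,
    pynput's keyboard Controller is used (assumes pynput is installed).
    """
    if press is None:
        # Imported lazily so the function can be exercised without pynput.
        from pynput.keyboard import Controller
        press = Controller().type

    # Give the user time to click into the browser window.
    time.sleep(countdown)

    for ch in text:
        press(ch)
        time.sleep(delay)  # small pause so the page can keep up
    return len(text)
```

Calling type_text("some text") and then clicking into the typing area within five seconds would type the string there. The delay and countdown values are guesses and would need tuning.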

However, a better approach may be to cut out the user altogether and have the script perform all the interaction. You can do that with something like selenium and a driver like ChromeDriver, which basically lets you operate a website, with all its scripting, the way a user would.
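A minimal sketch of that idea with the selenium Python bindings is below. The span id "actualLetter" comes from the question; everything else is an assumption: the helper names type_exercise and run_with_selenium are made up, the script assumes Chrome and a matching driver are available, and it assumes the page shows an empty "actualLetter" once the exercise is finished. Because the browser is started by the script, the user can still log in and open the exercise by hand in that window, and the script then reads the live DOM, including anything the page's scripts have changed.

```python
import time

URL = "https://at4.typewriter.at/index.php?r=site/index"


def type_exercise(read_letter, send_key, max_chars=5000, delay=0.05):
    """Repeatedly read the current letter and type it until none is left.

    `read_letter` returns the character to type next ("" when done);
    `send_key` types one character. Returns everything that was typed.
    """
    typed = []
    for _ in range(max_chars):  # hard cap so a stuck page cannot loop forever
        letter = read_letter()
        if not letter:
            break
        send_key(letter)
        typed.append(letter)
        time.sleep(delay)  # assumption: the page updates within this pause
    return "".join(typed)


def run_with_selenium():
    # Imported lazily; requires the selenium package and a Chrome driver.
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(URL)
        input("Log in and open the exercise in the new window, then press Enter here...")

        def read_letter():
            # textContent is used instead of .text so a lone space is not stripped.
            span = driver.find_element(By.ID, "actualLetter")
            return span.get_attribute("textContent")

        def send_key(ch):
            # Sends the key to whichever element currently has focus.
            ActionChains(driver).send_keys(ch).perform()

        return type_exercise(read_letter, send_key)
    finally:
        driver.quit()
```

Calling run_with_selenium() opens the browser, waits for you to press Enter in the terminal, and then starts typing. Reading "actualLetter" before every keystroke avoids depending on "remainingText" being complete on first load.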

You should probably look into one of these approaches and come up with a basic attempt, then ask more specific questions if you run into problems.

I would really recommend looking into selenium as a webdriver. It allows for automation and offers scraping similar to BS4, and it is built specifically for interacting with DOM elements.

I'm somewhat unsure about the website, since I can't quite access it. However, I am sure that if you check out the selenium documentation, you should be able to solve your problem!

With selenium you'll probably need to install a browser driver, so depending on your setup and what you're allowed to install and execute, that may be an issue. The selenium Python bindings are relatively simple, though slightly more complicated than BS4 in my opinion. I would recommend checking out other SO posts if you get stuck, or diving into the documentation!
