
Scraping data from website Python - After interaction

Hello guys!

A friend of mine has to do a lot of typing for school in her IT classes, which means she has to learn to type fast on the keyboard. As lazy as she is, she asked me if I had any idea how she could get her texts typed on https://at4.typewriter.at/index.php?r=site/index without actually doing anything herself. I thought to myself, "Hey, that's a cool idea, I'll look into it."

[Screenshot: what the website looks like]

That's the website where she has to type. There is a <span id="actualLetter"> tag with the current character that has to be typed and another <span id="remainingText"> with the remaining text. I've been able to scrape the first "actualLetter" with BeautifulSoup and open the website with webbrowser. The problem is that on first load the span "remainingText" does not contain 100% of the remaining text. After the first letter has been typed, the span updates to the "full" text and I could scrape it. After scraping it, I'd just let the Python program type it with pynput.keyboard.

The problem I am facing is that I have no idea how to scrape data from a website that has already been opened in a web browser, that has already been edited, or that has already been interacted with. I'd be happy about any advice or solutions!

Thanks!

Normally, you'd have people asking for what you've tried so far and your code, but I understand you're really in the dark on how to even get started with this problem.

If you need the Python script to be able to step in after the user has interacted with the site, you're in for a massive challenge. There are many variables, like what browser is being used, on what operating system, at what resolution, with what settings, etc.

Interacting with a live application will be fairly hard, although not impossible. If the site can be operated entirely using the keyboard and you can find some reliable sequence of keyboard inputs that finds the right controls to send input to, that could be an approach, and libraries like pywin32 (on Windows) could provide access to the API calls you'd need to send input to the browser window.

However, a better approach may be to just cut out the user altogether and have the script perform all the interaction. You can do that through something like selenium and a driver like ChromeDriver that basically allows you to operate a website, with all its scripting, like a user would.
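As a rough sketch of what that could look like (not tested against this site, so treat it as a starting point rather than a working solution): the script opens the page itself, waits for you to start the exercise, and then repeatedly reads the `actualLetter` span from the live DOM and sends that character as a keystroke. The element id comes from the question. Everything else is an assumption: that the page accepts keystrokes sent to the focused element, that the exercise has to be started by hand, and the names `normalize_letter`, `type_exercise`, `max_chars` and `delay`, which are made up for illustration.

```python
import time

URL = "https://at4.typewriter.at/index.php?r=site/index"  # from the question


def normalize_letter(raw):
    """Turn the span's raw text into the character to type.

    Returns None when the span is empty (assumed to mean the text is done).
    A non-breaking space is mapped to a normal space, since pages often
    render a space that way (an assumption about this site).
    """
    if not raw:
        return None
    return raw.replace("\u00a0", " ")


def type_exercise(max_chars=500, delay=0.05):
    """Open the site in Chrome and type the exercise one letter at a time."""
    # Imported here so normalize_letter() can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(URL)
        # Logging in / picking a lesson is site-specific, so do it by hand.
        input("Start the exercise in the browser window, then press Enter here: ")
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.ID, "actualLetter"))
        )
        for _ in range(max_chars):
            # Look the span up again each round because the page rewrites it.
            spans = driver.find_elements(By.ID, "actualLetter")
            if not spans:
                break
            key = normalize_letter(spans[0].get_property("textContent"))
            if key is None:
                break
            # Send the keystroke to whatever element currently has focus.
            ActionChains(driver).send_keys(key).perform()
            time.sleep(delay)
    finally:
        driver.quit()
```

Calling `type_exercise()` starts the whole thing. Because the script reads the current letter from the browser it controls on every step, it never depends on `remainingText` being complete at load time.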

You should probably look into either of these approaches and come up with a basic attempt, then ask more specific questions if you run into problems.

I would really recommend looking into Selenium as a WebDriver: it allows for browser automation and scraping similar to BS4, and it is built specifically for interacting with DOM elements.

I'm sorta unsure about the website, since I can't quite access it; however, I am sure that if you check out the Selenium documentation, you should be able to solve your problem!

With Selenium you'll probably need to install a browser driver, which may be an issue depending on your setup and what you're allowed to install and execute. The Selenium Python bindings are relatively simple, though slightly more complicated than BS4, in my opinion. I would recommend checking out other SO posts if you get stuck, or diving into the documentation!
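For the specific problem in the question (the `remainingText` span only holds the full text after the first letter has been typed), Selenium's explicit waits are probably the piece to look at. Here is a minimal sketch, assuming you already have a running `driver` on the exercise page and a first snapshot of the span's text. The helper names `text_changed` and `read_full_remaining_text` are made up for illustration, and I haven't been able to check how the site actually updates the span.

```python
def text_changed(old_text, new_text):
    """True once the span holds non-empty text that differs from the first snapshot."""
    return bool(new_text) and new_text != old_text


def read_full_remaining_text(driver, first_snapshot, timeout=10):
    """Wait until #remainingText differs from first_snapshot, then return its text.

    `driver` is an already-running Selenium WebDriver on the exercise page.
    Raises selenium's TimeoutException if nothing changes within `timeout` seconds.
    """
    # Imported here so text_changed() can be tried without Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def current_text(d):
        value = d.find_element(By.ID, "remainingText").get_property("textContent")
        # WebDriverWait keeps polling for as long as this returns a falsy value.
        return value if text_changed(first_snapshot, value) else False

    return WebDriverWait(driver, timeout).until(current_text)
```

You would read the span once right after the page loads, type the first letter, and then call `read_full_remaining_text(driver, first_snapshot)` to get the updated text from the same browser session, which is something BS4 on its own can't do because it only ever sees the HTML you hand it.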

