
Python script that utilizes a single search bar out of several on a website

I have a list of 230 crystal structure space groups (strings). I'd like to write a python script to extract files, for each group, from http://rruff.geo.arizona.edu/AMS/amcsd.php .

I'd like the script to iteratively search for every space group in the "Cell Parameters and Symmetry" search option, and then download one of the files for some structure (say, the first one).

My list looks something like spaceGroups = ["A-1","A2","A2/a","A2/m","..."] . The search query for the first group, for example, would be sg=A-1 , and the results appear at http://rruff.geo.arizona.edu/AMS/result.php .

First I'd like to know if this is even possible, and if so, where to start?

Sure, it's possible. The "clean" way is to create a crawler to make requests, download and save the files.

You can use scrapy ( https://docs.scrapy.org/en/latest/ ) for the crawler and Fiddler ( https://www.telerik.com/fiddler ) to see what requests you need to recreate inside your spider.

In essence, you use the list of space groups to generate one request per group to the form on that page. After each request you parse the response, collect the IDs/download urls, and follow any subsequent result pages (so that you collect all IDs/download urls). Finally, you download the files.
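The first two steps (generating the requests and pulling IDs out of a response) can be sketched with the standard library alone. This is a minimal sketch under stated assumptions: the form field name CellParam and the id=<digits> link pattern are hypothetical and must be confirmed against the real traffic (in Fiddler or the browser's developer tools), and using result.php as the request target is inferred from the question rather than verified.

```python
import re

# Result page named in the question; assumed here to be the form's target.
SEARCH_URL = "http://rruff.geo.arizona.edu/AMS/result.php"

# Hypothetical form field name for the "Cell Parameters and Symmetry" box.
FIELD_NAME = "CellParam"

# Assumed shape of an entry link in the result HTML; adjust after inspecting it.
ID_PATTERN = re.compile(r"id=(\d+)")


def build_payloads(space_groups):
    """Return one form payload per space group, using the sg=<group> syntax."""
    return [{FIELD_NAME: f"sg={group}"} for group in space_groups]


def extract_ids(html):
    """Collect unique numeric IDs from a result page, in first-seen order."""
    seen = []
    for entry_id in ID_PATTERN.findall(html):
        if entry_id not in seen:
            seen.append(entry_id)
    return seen
```

Keeping these two functions free of network code makes them easy to check against a saved copy of a result page before running anything against the live site.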

If you don't want to use scrapy you can write your own logic with requests ( https://requests.readthedocs.io/en/latest/user/quickstart/ ), but scrapy issues requests concurrently, so it will usually finish faster, and it has a lot of features to help you.
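A requests-based loop might look roughly like the following. Treat it as a sketch, not the site's confirmed interface: the field name CellParam is hypothetical, and whether the form is submitted with POST (as assumed here) or GET has to be checked in the captured traffic. The session is passed in as an argument so the loop can be exercised with a stand-in object before it touches the real server.

```python
import time

# Result page named in the question; assumed here to be the form's target.
RESULT_URL = "http://rruff.geo.arizona.edu/AMS/result.php"

# Hypothetical form field name; confirm it against the real form.
FIELD_NAME = "CellParam"


def make_session():
    """Create a requests.Session (imported lazily; requires `pip install requests`)."""
    import requests
    return requests.Session()


def fetch_results(session, space_groups, delay=1.0):
    """Submit one search per space group and return {group: response HTML}."""
    pages = {}
    for group in space_groups:
        # Assumption: the form is submitted as a POST with a single field.
        response = session.post(
            RESULT_URL, data={FIELD_NAME: f"sg={group}"}, timeout=30
        )
        response.raise_for_status()  # stop on HTTP errors instead of saving junk
        pages[group] = response.text
        time.sleep(delay)  # pause between requests to go easy on the server
    return pages
```

Typical use would be pages = fetch_results(make_session(), spaceGroups), followed by parsing each page for the entry IDs.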

From a quick look at that page, it seems you only need the ID of each crystal; the actual download urls are simple to build from it.
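Once an ID is known, building the download url and saving the file is short. The endpoint download.php and the id/down parameter names below are assumptions for illustration only; copy the real pattern from one of the download links on a result page. One detail worth handling either way: space groups such as A2/a contain a slash, which cannot appear in a file name.

```python
from pathlib import Path
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the pattern seen in a real download link.
DOWNLOAD_URL = "http://rruff.geo.arizona.edu/AMS/download.php"


def download_url(entry_id, file_type="cif"):
    """Build a download url for one entry (parameter names are assumed)."""
    query = urlencode({"id": f"{entry_id}.{file_type}", "down": file_type})
    return f"{DOWNLOAD_URL}?{query}"


def safe_filename(space_group, entry_id, file_type="cif"):
    """Name the output file after the group, replacing '/' so the path is valid."""
    return f"{space_group.replace('/', '_')}_{entry_id}.{file_type}"


def save_file(directory, name, content):
    """Write downloaded bytes to directory/name, creating the directory if needed."""
    path = Path(directory) / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(content)
    return path
```

With a requests session, the bytes to save would come from session.get(download_url(entry_id)).content.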


 