
API to trigger series of Python scripts

I have 3 Python scripts:

  1. grab.py: Takes an Instagram account name as input and outputs a text file containing all of its followers.
  2. scrape.py: Takes the output of grab.py as input and outputs details of each account (follower count, post count, etc.) in CSV form.
  3. analyze.py: A basic machine learning model that uses the results of scrape.py to perform an analysis of the accounts.

The 3 scripts work as expected individually. The next step is to create an API endpoint which will take an account name as a request parameter, and then trigger the above 3 scripts for the received account. The final analysis results will be stored in a database.

The endpoint also needs to have a queueing mechanism to store account names received. The queue will be polled, and if account names are available, they will be processed sequentially.
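
The queueing behaviour described above could be sketched roughly as follows, using only the standard library. This is a minimal sketch under assumptions: `process_account` is a hypothetical placeholder for running the three scripts, the `processed` list stands in for the database, and a `None` sentinel is an illustrative way to stop the worker. Note that `queue.Queue.get()` blocks until an item arrives, so the worker does not need to spin in a busy loop.

```python
import queue
import threading

# In-memory queue of account names waiting to be processed.
account_queue = queue.Queue()
# Results collected by the worker; stands in for the database in this sketch.
processed = []


def process_account(account_name):
    # Hypothetical placeholder for running grab.py, scrape.py and analyze.py.
    processed.append(account_name)


def worker():
    # Handle one account at a time, in the order the names were queued.
    while True:
        account_name = account_queue.get()  # blocks until a name is available
        if account_name is None:  # sentinel (assumption): stop the worker
            account_queue.task_done()
            break
        try:
            process_account(account_name)
        finally:
            account_queue.task_done()


def start_worker():
    # Daemon thread so it does not keep the process alive on shutdown.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The endpoint handler would then only need to call `account_queue.put(name)` and return immediately. Because the queue lives in memory, any pending names are lost if the process restarts.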

My API development experience is limited so I am not sure of the best approach to tackle this problem. My questions are:

  1. Should my API endpoint be written in Python? If yes, is the Flask framework a viable option? If not, what are the other options I have?
  2. Is there some sort of a pipeline that I can use to seamlessly integrate the 3 scripts together?
  3. Is it a good idea to maintain the queue in memory and poll it from a separate thread running an infinite while loop? Is there a better way to accomplish this?

To fetch data from an API and save it, I would recommend using asyncio together with aiohttp and aiofiles, something like:

import asyncio
import json
import time

import aiofiles
import aiohttp

FILENAME = "foo.txt"


async def fetch(session, url):
    # Request one profile and append its JSON response as one line of text.
    async with session.get(url) as response:
        data = await response.json()
        async with aiofiles.open(FILENAME, "a") as out:
            # aiofiles writes are coroutines and must be awaited;
            # the file is flushed and closed when the block exits.
            await out.write(json.dumps(data) + "\n")


async def main():
    instagram_ids = []  # profile ids
    start = time.time()
    # Placeholder; the real URL must contain "{}" where the profile id goes.
    url = "INSTAGRAM_API_URL"
    async with aiohttp.ClientSession() as session:
        # Build one coroutine per profile and run them all concurrently.
        tasks = [fetch(session, url.format(profile_id)) for profile_id in instagram_ids]
        await asyncio.gather(*tasks)
    print(time.time() - start)


asyncio.run(main())

This approach helps because most of the time spent dealing with an API goes to waiting for responses.
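
On the pipeline question, one low-tech option is to chain the three scripts with `subprocess`, passing each stage's output file to the next. The following is only a sketch: the intermediate file names and the command-line argument order for grab.py, scrape.py and analyze.py are assumptions, since the question does not show how the scripts are invoked. The `runner` parameter is also just an illustrative hook so the chaining can be exercised without the real scripts.

```python
import subprocess
import sys


def build_commands(account_name):
    # Hypothetical file names and argument order; adjust to the real scripts.
    followers_file = f"{account_name}_followers.txt"
    details_file = f"{account_name}_details.csv"
    return [
        [sys.executable, "grab.py", account_name, followers_file],
        [sys.executable, "scrape.py", followers_file, details_file],
        [sys.executable, "analyze.py", details_file],
    ]


def run_pipeline(account_name, runner=subprocess.run):
    # Run the stages in order. With check=True, subprocess.run raises
    # CalledProcessError on a non-zero exit, so later stages are skipped.
    for command in build_commands(account_name):
        runner(command, check=True)
```

Calling `run_pipeline("some_account")` from a single background worker would process accounts one at a time, which matches the sequential requirement in the question.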
