
Python Reading a file from URL in chunks

I need to read a really big JSONL file from a URL. The approach I am using is as follows:

 import json
 import urllib.request

 bulk_status_info = _get_bulk_info(shop)
 url = bulk_status_info.get('bulk_info').get('url')
 file = urllib.request.urlopen(url)
 for line in file:
     print(json.loads(line.decode("utf-8")))

However, my CPU and memory are limited, which brings me to two questions:

  1. Is the file loaded all at once, or is there some buffering mechanism that prevents memory from overflowing?
  2. If my task fails, I want to resume from the place where it failed. Is there some sort of cursor I can save? Note that things like seek or tell do not work here, since it is not an actual file.

Some additional info: I am using Python 3 and urllib.

The file is not loaded in its entirety before the for loop runs. Iterating over the response reads the data packet by packet, but this buffering is abstracted away by urllib, so only the current line needs to be held in memory. If you want closer control over the chunking, you can do something similar with the requests library.
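
For example, here is a minimal sketch of the same loop with requests and stream=True (the URL below is a placeholder for the bulk URL from the question):

 import json
 import requests

 url = "https://example.com/bulk.jsonl"  # placeholder; use the bulk URL from the question

 # stream=True keeps the body on the socket until it is iterated,
 # so only the current line is held in memory at a time.
 with requests.get(url, stream=True) as resp:
     resp.raise_for_status()
     for line in resp.iter_lines(chunk_size=8192):
         if line:  # iter_lines can yield empty keep-alive lines
             print(json.loads(line.decode("utf-8")))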

Generally there is no way to resume the download of a webpage, or any file request for that matter, unless the server specifically supports it. That requires the server to allow a start point to be specified, typically via HTTP Range requests (the server advertises Accept-Ranges: bytes); video streaming protocols rely on the same idea.
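
If the server does support Range requests, you can keep your own byte-offset "cursor" and resume from it. A minimal sketch, assuming the server honours the Range header (the URL is again a placeholder):

 import json
 import urllib.request

 url = "https://example.com/bulk.jsonl"  # placeholder for the bulk URL
 offset = 0  # bytes already processed; persist this value somewhere durable

 req = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
 with urllib.request.urlopen(req) as resp:
     # 206 Partial Content means the server honoured the Range header;
     # 200 means it ignored it and is resending the whole file from the start.
     if resp.status != 206:
         offset = 0
     for line in resp:
         print(json.loads(line.decode("utf-8")))
         offset += len(line)  # advance the cursor by the bytes consumed (newline included)

Only persist the offset after a full line has been processed, so that a resumed run never starts in the middle of a JSON record.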
