How to continuously pull data from a URL in Python?
I have a link, e.g. www.someurl.com/api/getdata?password=... , and when I open it in a web browser it sends a constantly updating document of text. I'd like to make an identical connection in Python and dump this data to a file live as it's received. I've tried using requests.Session(), but since the stream of data never ends (and dropping it would lose data), the get request also never ends.
import requests

s = requests.Session()
x = s.get("https://www.someurl.com/api/getdata?password=...")  # never terminates
What's the proper way to do this?
I found the answer I was looking for here: Python Requests Stream Data from API

Full implementation:
import requests

url = "https://www.someurl.com/api/getdata?password=..."
s = requests.Session()

with open('file.txt', 'a') as fp:
    with s.get(url, stream=True) as resp:
        for line in resp.iter_lines(chunk_size=1):
            fp.write(line.decode() + '\n')
Note that chunk_size=1 is necessary for the data to immediately respond to new complete messages, rather than waiting for an internal buffer to fill before iterating over all the lines. I believe chunk_size=None is meant to do this, but it doesn't work for me.
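One detail worth noting: resp.iter_lines() yields bytes objects with the trailing newline stripped, so each line should be decoded and have its newline restored before being written to the file. A minimal sketch of that step (format_line is a hypothetical helper name, not part of requests):

```python
def format_line(raw: bytes) -> str:
    # iter_lines() strips the trailing newline and yields bytes;
    # decode and re-append the newline so the file mirrors the stream
    return raw.decode("utf-8", errors="replace") + "\n"

# inside the streaming loop this would be: fp.write(format_line(line))
```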
You can keep making get requests to the url:
import requests
import time

url = "https://www.someurl.com/api/getdata?password=..."
sess = requests.session()

while True:
    req = sess.get(url)
    time.sleep(10)
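The poll-and-sleep loop above can be factored so that the fetch step is injectable, which also makes it easy to cap the number of polls; poll, fetch, and handle are names I've made up for this sketch, not part of requests:

```python
import time

def poll(fetch, handle, interval=10.0, max_polls=None):
    # Call fetch() repeatedly, passing each result to handle(),
    # sleeping `interval` seconds between polls.
    # max_polls=None keeps polling forever.
    count = 0
    while max_polls is None or count < max_polls:
        handle(fetch())
        count += 1
        if max_polls is None or count < max_polls:
            time.sleep(interval)

# with requests this might be used as:
# import requests
# sess = requests.Session()
# poll(lambda: sess.get(url).text, print, interval=10)
```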
This will terminate the request after 1 second:
import multiprocessing
import time
import requests

def get_from_url(q):
    s = requests.Session()
    resp = s.get("https://www.someurl.com/api/getdata?password=...")
    # a plain global would not be visible to the parent process,
    # so hand the data back through a queue
    q.put(resp.text)

if __name__ == '__main__':
    while True:
        q = multiprocessing.Queue()
        p = multiprocessing.Process(target=get_from_url, name="get_from_url", args=(q,))
        p.start()
        # Wait 1 second for get request
        time.sleep(1)
        p.terminate()
        p.join()
        # do something with the data
        if not q.empty():
            print(q.get())  # or smth else
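Worth noting: each multiprocessing.Process runs with its own copy of module-level variables, so a plain global cannot carry the response back to the parent, while a multiprocessing.Queue can. A minimal standalone sketch of that hand-off (worker is a hypothetical name):

```python
import multiprocessing

def worker(q):
    # each Process gets its own copy of module globals, so
    # results must travel back through an explicit channel
    q.put("payload from child")

if __name__ == "__main__":
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(q,))
    p.start()
    p.join()
    print(q.get())  # receives the child's result
```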