
How to read data from a website through its API?

I am very new to Spark. I need to read data from the OpenSky website, using the API they provide for it ( https://openskynetwork.github.io/opensky-api/python.html ). The bbox parameter is a tuple of exactly four values (min_latitude, max_latitude, min_longitude, max_longitude). The following code shows flights registered within certain coordinates:

import json
from random import sample

from opensky_api import OpenSkyApi
api = OpenSkyApi()
states = api.get_states(bbox=(45.8389, 47.8229, 5.9962, 10.5226))

# Pick 5 random flights from the current state vectors
for s in sample(states.states, 5):
    flight = {
        'callsign': s.callsign,
        'country': s.origin_country,
        'longitude': s.longitude,
        'latitude': s.latitude,
        'velocity': s.velocity,
        'vertical_rate': s.vertical_rate,
    }
    # Serialize each flight inside the loop, not only the last one
    flight_data = json.dumps(flight, indent=2).encode('utf-8')
    print("(%r, %r, %r, %r, %r, %r)" % (s.callsign, s.origin_country, s.longitude, s.latitude, s.velocity, s.vertical_rate))
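
Since the order of the four bbox values is easy to mix up, a small helper can build the tuple from named arguments and reject swapped bounds. This is only a sketch: `make_bbox` is a hypothetical name, not part of opensky_api, and the range checks are an assumption about what counts as a sensible box.

```python
def make_bbox(min_latitude, max_latitude, min_longitude, max_longitude):
    """Return a bbox tuple in the order described above.

    Hypothetical helper, not part of opensky_api.
    """
    if not -90 <= min_latitude <= max_latitude <= 90:
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
    if not -180 <= min_longitude <= max_longitude <= 180:
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    return (min_latitude, max_latitude, min_longitude, max_longitude)


# The box used in the question
bbox = make_bbox(min_latitude=45.8389, max_latitude=47.8229,
                 min_longitude=5.9962, max_longitude=10.5226)
print(bbox)
```

The result can be passed directly as `api.get_states(bbox=bbox)`.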

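I need to create a Python program that sends the flight information every 10 seconds (through a port assigned to me). First, I have to run the Python program, a socket server that reads from OpenSky, in one terminal; then I have to run the Spark program with Structured Streaming in another terminal. The data must be sent and displayed in the terminal in JSON format (using the json.dumps function).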
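
One detail worth settling before touching the templates: Spark's socket source reads UTF-8 text and treats each line as one record, so each flight is easiest to handle as a single-line JSON object ending in a newline. A minimal sketch, assuming the field names from the code above; `encode_flight` is a hypothetical helper name and the sample values are made up for illustration.

```python
import json


def encode_flight(flight):
    """Serialize one flight dict as a single newline-terminated JSON line."""
    # No indent argument: pretty-printed JSON would span several lines,
    # and a line-oriented reader would see each line as a separate record.
    return (json.dumps(flight) + "\n").encode("utf-8")


# Made-up sample values, only to show the wire format
example = {
    "callsign": "SWR123",
    "country": "Switzerland",
    "longitude": 8.55,
    "latitude": 47.45,
    "velocity": 210.5,
    "vertical_rate": 0.0,
}
payload = encode_flight(example)
print(payload)
```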

I have the following templates for this, but I don't know how to modify them to read the data. The templates are:

Server socket:

import socket
server = socket.socket()
host = ????
port = ????
server.bind((host, port))
server.listen(2)
client_socket, addr = server.accept()
print("connection established.")

# Sending data
client_socket.sendall("Text".encode())

Spark Structured Streaming:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split

spark = SparkSession \
    .builder \
    .appName("FlightsInformation") \
    .getOrCreate()

flights = spark \
    .readStream \
    .format("socket") \
    .option("host", "????") \
    .option("port", ????) \
    .load()

flights_information = ????

query = flights_information \
    .writeStream \
    .outputMode("complete") \
    .format("console") \
    .start()

query.awaitTermination()

How can I do this?

This is how I created the socket to send the JSON data through it.

import socket
import sys
import json
from random import sample
from time import sleep

from opensky_api import OpenSkyApi
api = OpenSkyApi()
states = api.get_states(bbox=(45.8389, 47.8229, 5.9962, 10.5226))
# Create a socket (SOCK_STREAM means a TCP socket)
try:
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
except socket.error as err:
    print('Socket error because of %s' % (err))
    sys.exit(1)

try:
    # Bind to the local address; PORT must be set to the port you were assigned
    sock.bind(('127.0.0.1', PORT))
except socket.error as err:
    print('Error, could not bind to server because of %s' % (err))
    sys.exit(1)

sock.listen(2)
client_socket, addr = sock.accept()
print("connection established.")

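try:
    while True:
        # Fetch fresh states on each pass so the stream does not repeat stale data
        states = api.get_states(bbox=(45.8389, 47.8229, 5.9962, 10.5226))
        if states is None or not states.states:
            # The request can fail or return no flights; wait and retry
            sleep(10)
            continue

        # sample() raises ValueError if asked for more items than exist
        for s in sample(states.states, min(5, len(states.states))):
            vuelo_dict = {
                'callsign': s.callsign,
                'country': s.origin_country,
                'longitude': s.longitude,
                'latitude': s.latitude,
                'velocity': s.velocity,
                'vertical_rate': s.vertical_rate,
            }
            # One JSON object per line: the Spark socket source reads newline-delimited text
            flight_data = (json.dumps(vuelo_dict) + '\n').encode('utf-8')
            print("(%r, %r, %r, %r, %r, %r)" % (s.callsign, s.origin_country, s.longitude, s.latitude, s.velocity, s.vertical_rate))

            client_socket.sendall(flight_data)
            sleep(10)
except socket.error as err:
    # Raised by sendall when the client (Spark) disconnects
    print('Error sending data because of %s' % (err))
finally:
    client_socket.close()
    sock.close()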
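
Before starting Spark, it can help to check what the server actually sends. The sketch below is a plain Python client that connects and returns the first few lines it receives; `read_lines` is a hypothetical helper, and the host and port in the usage comment are placeholders for whatever the server is bound to.

```python
import socket


def read_lines(host, port, count):
    """Connect to host:port and return the first `count` text lines received."""
    lines = []
    with socket.create_connection((host, port)) as conn:
        # makefile gives a file-like reader that splits the byte stream on newlines
        with conn.makefile("r", encoding="utf-8") as reader:
            for line in reader:
                lines.append(line.rstrip("\n"))
                if len(lines) >= count:
                    break
    return lines


# Usage (placeholders; run while the server above is listening):
# for line in read_lines("127.0.0.1", PORT, 3):
#     print(line)
```

Each line returned should be one complete JSON object. If it is not, Spark's socket source will see the same fragments, since it also reads the stream line by line.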

Spark Structured Streaming:

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode
from pyspark.sql.functions import split

spark = SparkSession \
    .builder \
    .appName("FlightsInformation") \
    .getOrCreate()

flights_information= spark \
    .readStream \
    .format('socket')\
    .option('host', 'localhost')\
    .option('port', XXXXX)\
    .load()

query = flights_information\
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .start()

query.awaitTermination()
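
The query above prints each incoming line as a raw string in a single `value` column. If the lines are single-line JSON objects, they can be parsed into typed columns with `from_json`. A hedged sketch: the schema string is an assumption based on the dictionary keys in the server code, and `parse_flights` is a hypothetical helper name. pyspark is imported inside the function only so the schema can be inspected without a Spark installation.

```python
# Assumed schema, derived from the keys the server puts in each JSON object
FLIGHT_SCHEMA = (
    "callsign STRING, country STRING, longitude DOUBLE, "
    "latitude DOUBLE, velocity DOUBLE, vertical_rate DOUBLE"
)


def parse_flights(lines_df):
    """Turn the socket source's `value` column into one column per field."""
    from pyspark.sql.functions import col, from_json

    # from_json accepts a DDL-formatted schema string as well as a StructType
    parsed = lines_df.select(from_json(col("value"), FLIGHT_SCHEMA).alias("flight"))
    return parsed.select("flight.*")


# Usage with the streaming DataFrame defined above:
# query = parse_flights(flights_information) \
#     .writeStream \
#     .outputMode("append") \
#     .format("console") \
#     .start()
# query.awaitTermination()
```

Lines that are not valid JSON show up as nulls rather than raising an error, so a table full of nulls usually points at how the server frames its messages.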

