How do I ensure fast and reliable dataset migration to the FIWARE backend?

I am transferring a very large dataset to the FIWARE backend: SourceDB -> Orion -> Cygnus -> Postgres. For this I wrote a Python script that fetches rows and, for each row fetched, sends a payload to Orion.

The script fires at 150 ms (0.15 s) intervals, but to my surprise, after about 10 iterations only two values (the first and last payloads sent) were persisted to the Postgres sink, meaning 80% of the data never reached the sink.

Script:

import json
from time import sleep

import psycopg2
import requests
from tqdm import tqdm

from config import config

def val_json():
    # Build one JSON document per noise reading, joining each reading to its
    # deployment and location so the keys match the NGSI attribute names.
    return """\
        select to_json(d) from (
            select
                n.noise_data as measurand,
                n.factor as "sonometerClass",
                to_timestamp(n.seconds) as "dateObserved",
                l.description as name,
                json_build_object(
                    'coordinates',
                    json_build_array(l.node_lon, l.node_lat)
                ) as location
            from noise as n
                inner join deployment as d on
                    d.deployment_id = n.deployment_id
                inner join location as l on
                    l.location_id = d.location_id
            order by n.seconds asc
        ) as d"""

def main():

    url = 'http://localhost:1026/v2/entities/002/attrs?options=keyValues'
    headers = {"Content-Type": "application/json",
               "fiware-service": "urbansense",
               "fiware-servicepath": "/basic"}

    conn = None
    try:
        params = config()
        with psycopg2.connect(**params) as conn:
            # A named (server-side) cursor streams the result set in chunks
            # of itersize rows instead of loading everything into memory.
            with conn.cursor(name='my_cursor') as cur:
                cur.itersize = 5000
                cur.execute(val_json())

                for row in tqdm(cur):
                    # Each row is a one-element tuple holding the JSON object,
                    # so strip the enclosing [ ] that json.dumps adds.
                    jsonData = json.dumps(row)
                    if jsonData.startswith('[') and jsonData.endswith(']'):
                        jsonData = jsonData[1:-1]
                        print()
                        print(jsonData)
                    requests.post(url, data=jsonData, headers=headers)
                    sleep(0.15)
    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        # psycopg2's connection context manager commits or rolls back,
        # but does not close the connection, so close it explicitly.
        if conn is not None:
            conn.close()

if __name__ == '__main__':
    main()
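
As a side note, a quick way to confirm that the updates are actually reaching Orion (so any loss can be isolated to the Orion -> Cygnus leg) is to read the entity back after the run. A minimal check, assuming the same service headers as in the script above:

import requests

headers = {"fiware-service": "urbansense", "fiware-servicepath": "/basic"}

# Read the current state of entity 002; after the run it should hold the
# attribute values from the last payload sent, confirming Orion received it.
resp = requests.get("http://localhost:1026/v2/entities/002?options=keyValues",
                    headers=headers)
print(resp.status_code, resp.json())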

First ten iterations (payloads):

$python3 noiselevelObserved.py
0it [00:00, ?it/s]
{"measurand": 64.8, "sonometerClass": 1, "dateObserved": "1970-01-01T01:00:15+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
1it [00:00,  1.75it/s]
{"measurand": 58.8, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:16+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
2it [00:00,  2.23it/s]
{"measurand": 56.5, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:17+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
3it [00:00,  2.76it/s]
{"measurand": 61.1, "sonometerClass": 1, "dateObserved": "1970-01-01T01:00:18+01:00", "name": "Casa da Musica", "location": {"coordinates": [-8.63041, 41.158091]}}
4it [00:01,  3.31it/s]
{"measurand": 108.5, "sonometerClass": 2, "dateObserved": "1970-01-01T01:00:18+01:00", "name": "Pr. Liberdade Cardosas", "location": {"coordinates": [-8.611119, 41.146023]}}
5it [00:01,  3.86it/s]
{"measurand": 56.5, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:18+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
6it [00:01,  4.35it/s]
{"measurand": 59.9, "sonometerClass": 1, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "Casa da Musica", "location": {"coordinates": [-8.63041, 41.158091]}}
7it [00:01,  4.78it/s]
{"measurand": 97.2, "sonometerClass": 2, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "D. Manuel II", "location": {"coordinates": [-8.625192, 41.148558]}}
8it [00:01,  5.12it/s]
{"measurand": 108.6, "sonometerClass": 2, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "Pr. Liberdade Cardosas", "location": {"coordinates": [-8.611119, 41.146023]}}
9it [00:01,  5.41it/s]
{"measurand": 57.1, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:19+01:00", "name": "Trindade", "location": {"coordinates": [-8.609973, 41.151943]}}
10it [00:02,  5.63it/s]
{"measurand": 53.9, "sonometerClass": 0, "dateObserved": "1970-01-01T01:00:20+01:00", "name": "Casa da Musica", "location": {"coordinates": [-8.63041, 41.158091]}}

Reading the attribute values persisted into Postgres:

postgres=# select  * from urbansense.basic_002_noiselevelobserved ;
  recvtimets   |         recvtime         | fiwareservicepath | entityid |     entitytype     |    attrname    |    attrtype     |               attrvalue               | attrmd
---------------+--------------------------+-------------------+----------+--------------------+----------------+-----------------+---------------------------------------+--------
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | dateObserved   | Text            | 1970-01-01T01:00:15+01:00             | []
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | latitude       | Number          | 41.1591                               | []
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | location       | StructuredValue | {"coordinates":[-8.609973,41.151943]} | []
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | longitude      | Number          | -8.65915                              | []
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | measurand      | Number          | 64.8                                  | []
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | name           | Text            | Trindade                              | []
 1559045918129 | 2019-05-28T12:18:38.129Z | /basic            | 002      | NoiseLevelObserved | sonometerClass | Number          | 1                                     | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | dateObserved   | Text            | 1970-01-01T01:00:20+01:00             | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | latitude       | Number          | 41.1591                               | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | location       | StructuredValue | {"coordinates":[-8.63041,41.158091]}  | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | longitude      | Number          | -8.65915                              | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | measurand      | Number          | 53.9                                  | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | name           | Text            | Casa da Musica                        | []
 1559045919723 | 2019-05-28T12:18:39.723Z | /basic            | 002      | NoiseLevelObserved | sonometerClass | Number          | 0                                     | []
(14 rows)

Changing the firing rate to a 1-second interval doesn't give much improvement; only 3 payloads were persisted (70% lost):

postgres=# select  * from urbansense.basic_002_noiselevelobserved ;
  recvtimets   |         recvtime         | fiwareservicepath | entityid |     entitytype     |    attrname    |    attrtype     |               attrvalue               | attrmd
---------------+--------------------------+-------------------+----------+--------------------+----------------+-----------------+---------------------------------------+--------
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | dateObserved   | Text            | 1970-01-01T01:00:15+01:00             | []
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | latitude       | Number          | 41.1591                               | []
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | location       | StructuredValue | {"coordinates":[-8.609973,41.151943]} | []
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | longitude      | Number          | -8.65915                              | []
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | measurand      | Number          | 64.8                                  | []
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | name           | Text            | Trindade                              | []
 1559046840569 | 2019-05-28T12:34:00.569Z | /basic            | 002      | NoiseLevelObserved | sonometerClass | Number          | 1                                     | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | dateObserved   | Text            | 1970-01-01T01:00:18+01:00             | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | latitude       | Number          | 41.1591                               | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | location       | StructuredValue | {"coordinates":[-8.609973,41.151943]} | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | longitude      | Number          | -8.65915                              | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | measurand      | Number          | 56.5                                  | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | name           | Text            | Trindade                              | []
 1559046845620 | 2019-05-28T12:34:05.620Z | /basic            | 002      | NoiseLevelObserved | sonometerClass | Number          | 0                                     | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | dateObserved   | Text            | 1970-01-01T01:00:20+01:00             | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | latitude       | Number          | 41.1591                               | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | location       | StructuredValue | {"coordinates":[-8.63041,41.158091]}  | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | longitude      | Number          | -8.65915                              | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | measurand      | Number          | 53.9                                  | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | name           | Text            | Casa da Musica                        | []
 1559046850679 | 2019-05-28T12:34:10.679Z | /basic            | 002      | NoiseLevelObserved | sonometerClass | Number          | 0                                     | []
(21 rows)

I honestly wouldn't like the firing rate to go as high as a second, given the amount of data still to be migrated for my research. When I first tried a 1-second interval, I realised the migration would take months to complete (maybe 4).

Question: Is Orion CB (or maybe Cygnus) simply not built to receive entity/attribute values at such a rate (150 ms / 0.15 s)? Or is Cygnus not able to keep up with notifications from Orion at that rate?

I would appreciate any suggestion to ensure all values are persisted in the shortest time.

Based on the discussion in the question comments, the solution was to remove the throttling parameter from the subscription. It makes sense: throttling defines a minimum period between consecutive notifications, so any notification triggered inside that window is simply not sent (in this particular case, 80% of all the notifications).
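
As a minimal sketch of the fix: the subscription can be recreated without a throttling field. The Cygnus notification URL (port 5050 is the Cygnus default), the subscription description, and the empty attribute lists below are assumptions for illustration, not taken from the original setup:

import requests

# Illustrative subscription with no "throttling" field, so every entity
# update produces a notification. The Cygnus URL (default port 5050) and
# the entity id/type are assumptions based on the tables above.
subscription = {
    "description": "Notify Cygnus of every NoiseLevelObserved change",
    "subject": {
        "entities": [{"id": "002", "type": "NoiseLevelObserved"}],
        "condition": {"attrs": []}   # empty list: any attribute change triggers
    },
    "notification": {
        "http": {"url": "http://localhost:5050/notify"},
        "attrs": []                  # empty list: include all attributes
    }
}

headers = {"Content-Type": "application/json",
           "fiware-service": "urbansense",
           "fiware-servicepath": "/basic"}

resp = requests.post("http://localhost:1026/v2/subscriptions",
                     json=subscription, headers=headers)
print(resp.status_code)  # 201 Created on success

The old throttled subscription can be located with GET /v2/subscriptions (using the same service headers) and removed with DELETE /v2/subscriptions/<id> before creating the new one.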
