Net-Lib error during Connection timed out (110) on Airflow

I am running a process on Apache Airflow that has a loop in which it reads data from an MSSQL database, adds two columns, and writes the data to another MSSQL database. I am using MsSqlHook to connect to both databases. The process usually runs fine, reading and loading the data in the loop, but sometimes, after several successful writes, I get the following error message:

ERROR - (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\n')
Traceback (most recent call last):
  File "src/pymssql.pyx", line 636, in pymssql.connect
  File "src/_mssql.pyx", line 1957, in _mssql.connect
  File "src/_mssql.pyx", line 676, in _mssql.MSSQLConnection.__init__
  File "src/_mssql.pyx", line 1683, in _mssql.maybe_raise_MSSQLDatabaseException
_mssql.MSSQLDatabaseException: (20009, b'DB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\nDB-Lib error message 20009, severity 9:\nUnable to connect: Adaptive Server is unavailable or does not exist (SOURCE_DB.database.windows.net:PORT)\nNet-Lib error during Connection timed out (110)\n')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 984, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
    return_value = self.execute_callable()
  File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/DAG_NAME.py", line 156, in readWriteData
    df = readFromSource(query)
  File "/usr/local/airflow/dags/MX_CENT_SAMS_EXIT_APP_ITMS_MIGRATION.py", line 112, in readFromSource
    df = mssql_hook.get_pandas_df(sql=query)
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/dbapi_hook.py", line 99, in get_pandas_df
    with closing(self.get_conn()) as conn:
  File "/usr/local/lib/python3.7/site-packages/airflow/hooks/mssql_hook.py", line 48, in get_conn
    port=conn.port)
  File "src/pymssql.pyx", line 642, in pymssql.connect

I am guessing this happens because the connection to the source database is unstable, and once it is interrupted the hook cannot re-establish it. Is there a way to pause or make the process wait and retry when the source connection becomes unavailable?

This is my current code:

from contextlib import closing
from datetime import datetime

from airflow.hooks.mssql_hook import MsSqlHook


def readFromSource(query):
    """
    Args: query--> Query to be executed
    Returns: Dataframe with source tables data
    """
    print("Executing readFromSource()")
    mssql_hook = MsSqlHook(mssql_conn_id=SRC_CONN)
    mssql_hook.autocommit = True
    df = mssql_hook.get_pandas_df(sql=query)
    print(f"Source rows: {df.shape[0]}")
    print("readFromSource() execution completed")
    return df

def writeToTarget(df):
    print("Executing writeToTarget()")

    try:
        fast_sql_conn = FastMSSQLConnection(TGT_CONN)
        tgt_conn = fast_sql_conn.getConnection()
        with closing(tgt_conn) as conn:
            df.to_sql(
                name=TGT_TABLE,
                schema='dbo',
                con=conn,
                chunksize=CHUNK_SIZE,
                method='multi',
                index=False,
                if_exists='append'
                )
    except Exception as e:
        print("Error while loading data to target: " + str(e))

    print("writeToTarget() execution completed")

def readWriteData(*op_args, **context):
    """Loads info to target table
    """
    print("Executing readWriteData()")

    partition_column_list = context['ti'].xcom_pull(
        task_ids='getPartitionColumnList')

    parallelProcParams = context['ti'].xcom_pull(
        task_ids='setParallelProcessingParams')

    range_start = parallelProcParams['i'][op_args[0]][0]
    range_len = parallelProcParams['i'][op_args[0]][1]

    for i in range(range_start, range_start + range_len):
        filter_ = partition_column_list[i]
        print(f"Executing for audititemid: {filter_}")
        query = SRC_QUERY + ' and audititemid = ' + str(filter_).replace("[","").replace("]","") # a exit app
        df = readFromSource(query)
        df = df.rename(columns={"createdate": "CREAT_DATE", "scannedqty": "SCANNED_QTY", "audititemid":"AUDT_ITM_ID", "auditid":"AUDT_ID", "upc":"UPC", "itemnbr":"ITM_NBR", "txqty":"TXNS_QTY", "displayname":"DSPLY_NAME", "unitprice":"UNIT_PRICE", "cancelled":"CNCL"})
        df['LOADG_CHNNL'] = 'Airflow Exit App DB'
        df['LOADG_DATE'] = datetime.now()
        writeToTarget(df)

    print("readWriteData() execution completed")

You could split the task in two:

  1. Read from DB and persist
  2. Read persisted data and write to DB

The first task will read the data, transform it, and persist it (e.g. on the local disk). The second one will read the persisted data and write it to the DB in a transaction. For the second task, set the number of retries as needed.
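A rough sketch of the two callables, assuming the transformed data is staged as a Parquet file on local disk (the staging path and function names are illustrative; readFromSource() and writeToTarget() are the functions from the question):

from datetime import datetime
import pandas as pd

STAGING_PATH = '/tmp/exit_app_items.parquet'  # illustrative staging location

def extractAndPersist(query):
    """Task 1: read from the source DB, transform, persist to local disk."""
    df = readFromSource(query)                # hook-based read from the question
    df = df.rename(columns={'createdate': 'CREAT_DATE', 'scannedqty': 'SCANNED_QTY'})  # ... etc.
    df['LOADG_CHNNL'] = 'Airflow Exit App DB'
    df['LOADG_DATE'] = datetime.now()
    df.to_parquet(STAGING_PATH, index=False)

def loadToTarget():
    """Task 2: read the staged data and write it to the target DB."""
    df = pd.read_parquet(STAGING_PATH)
    writeToTarget(df)                         # write logic from the question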

Now, if the connection times out, the second task will fail, the changes to the DB will be rolled back, and Airflow will retry the task as many times as you set.
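Wiring the two tasks into the DAG could look roughly like this (the operator import path matches the Airflow 1.10-style paths in your traceback; task ids, retry counts and delays are illustrative, and a dag object is assumed to exist):

from datetime import timedelta
from airflow.operators.python_operator import PythonOperator

extract_task = PythonOperator(
    task_id='extract_and_persist',
    python_callable=extractAndPersist,
    op_args=[SRC_QUERY],              # build the query as in the question
    retries=3,                        # optionally retry the flaky source read too
    retry_delay=timedelta(minutes=5),
    dag=dag,
)

load_task = PythonOperator(
    task_id='load_to_target',
    python_callable=loadToTarget,
    retries=5,                        # retry the target write on connection timeouts
    retry_delay=timedelta(minutes=5),
    dag=dag,
)

extract_task >> load_task

One caveat: writeToTarget() in your code catches the exception and only prints it, so the task would still be marked as successful; for the retry to kick in, let the exception propagate (re-raise it or drop the try/except).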
