
How to implement some Python functions with an Azure Data Factory activity?

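Scenario: I have an Azure Service Bus that receives data on a topic. I also have an Azure Function (Service Bus topic trigger) that runs whenever a message arrives on the Service Bus and applies several functions to that message (see the code below).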

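The steps of the function (details are in the code) are:

  1. Receive a message and convert it to JSON

  2. Check whether the received message is valid

  3. Check a further condition on the received message

  4. If the conditions above are TRUE:

  5. Create an array from a field value in the received message

  6. Run feature extraction on the output of step 5

  7. Run normalization on the output of step 6

  8. Run classification on the output of step 7 and add the label to the received message

  9. Insert the output of step 8 into the database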

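Now I would like to know how to implement and run these functions (steps) as a pipeline using Data Factory activities. (Other guidance and suggestions for this scenario are also welcome.)

My code is: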

import logging
import json
import pickle
import statistics
import config
import psycopg2
import pandas as pd
import numpy as np
import azure.functions as func


def main(message: func.ServiceBusMessage):

    connection_db = psycopg2.connect(
        f"host={config.database_url} dbname=developer user={config.database_username} password={config.database_password}")
    cursor_connection = connection_db.cursor()

    # Validate the message and filter it by the following criteria:
    #   MSG_TYPE_TAG == 50
    #   logical_name == 'BLOCK'
    message_body = message.get_body().decode("utf-8")
    message_body = message_body.replace(";", ",")
    message_json = json.loads(message_body)
    logging.info("Json Converted")
    if message_json['error'] == {} and message_json['MSG_TYPE_TAG'] != '':
        logging.info("Data is Valid")
    else:
        # Stop here so an invalid message is never classified or inserted
        logging.info("Data Not Valid")
        connection_db.close()
        return

    if int(message_json['MSG_TYPE_TAG']) == 50 and message_json['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['logical_name'] == 'BLOCK':
        message_filtered = message_json

        # Build one flat list from the received acceleration array
        def _create_one_array(message_filtered):
            return list(message_filtered['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['acceleration_array'])

        # Feature extraction functions

        def percent_above_mean(acceleration_array_list):
            percent_above_mean = 0
            mean = np.mean(acceleration_array_list)
            for i in acceleration_array_list:
                if i > mean:
                    percent_above_mean += 1
            return percent_above_mean/len(acceleration_array_list)

        def variation_from_mean(acceleration_array_list):
            variation_from_mean = 0
            mean = np.mean(acceleration_array_list)
            for value in acceleration_array_list:
                variation_from_mean = variation_from_mean+abs(value-mean)
            return variation_from_mean/len(acceleration_array_list)

        def _feature_extraction(acceleration_array):
            feature = dict()
            feature['mean'] = np.mean(acceleration_array)
            feature['max'] = max(acceleration_array)
            feature['min'] = min(acceleration_array)
            feature['std'] = np.std(acceleration_array)
            feature['median'] = statistics.median(acceleration_array)
            feature['L1'] = sum(list(map(abs, acceleration_array)))
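            # Series.mad() was removed in pandas 2.0; compute the mean absolute deviation directly
            series = pd.Series(acceleration_array)
            feature['MAD'] = (series - series.mean()).abs().mean()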
            feature['percent_above_mean'] = percent_above_mean(
                acceleration_array)
            feature['variation_from_mean'] = variation_from_mean(
                acceleration_array)
            features_dataframe = pd.DataFrame(feature, index=[0])
            return features_dataframe

        def _normalization(df):
            with open('scaler.sav', 'rb') as scaler_file:
                scaler = pickle.load(scaler_file)
            # transform() returns a new array and leaves df untouched, so return the result
            return pd.DataFrame(scaler.transform(df), columns=df.columns, index=df.index)

        # Classification functions

        def _classification_label(normalized_features):
            with open('ExtraTreesClassifier.sav', 'rb') as classifier_file:
                classifier = pickle.load(classifier_file)
            prediction = dict()
            label = classifier.predict(normalized_features).tolist()[0]
            if label == 0:
                prediction['label'] = 'Hard'
            else:
                prediction['label'] = 'Easy'
            probability = classifier.predict_proba(normalized_features)
            prediction['probability'] = round(float(max(probability[0])), 2)
            return prediction

        def _classification(normalized_features):
            label = _classification_label(normalized_features)
            return label

        acceleration_array = _create_one_array(message_filtered)
        extracted_features = _feature_extraction(acceleration_array)
        normalized_features = _normalization(extracted_features)
        label = _classification(normalized_features)
        logging.info('functions done')

        # Insert into the database
        message_final = {**message_filtered, **message_filtered['LOG_TAG']}
        del message_final['error']
        del message_final['LOG_TAG']
        del message_final['acceleration_array']

        message_final['label'] = label['label']
        message_final['probability'] = label['probability']
        cursor_connection.execute(
            '''INSERT into dci_output_lable VALUES (%(MSG_TYPE_TAG)s , %(ATTACHED_DEVICE_SERIAL_NUMBER_TAG)s, %(date_time)s , %(name)s , %(number)s , %(sequence)s , %(label)s , %(probability)s);''', message_final)
        connection_db.commit()
        logging.info("Insert to database done")

    else:
        logging.info(" Input data isn't BLOCKS")
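
For reference, the feature-extraction step (step 6) does not depend on Azure, NumPy or pandas, so it can be pulled out and tested locally before wiring it into any pipeline. The following is only a minimal sketch using the standard library; the function name `extract_features` is hypothetical, and it assumes the acceleration values are plain numbers. Note that `MAD` and `variation_from_mean` are the same quantity (the mean absolute deviation from the mean).

```python
import statistics


def extract_features(acceleration_array):
    """Compute the same summary features as _feature_extraction, stdlib only."""
    values = [float(v) for v in acceleration_array]
    if not values:
        raise ValueError("acceleration_array must not be empty")
    mean = statistics.fmean(values)
    # Mean absolute deviation from the mean
    mean_abs_dev = sum(abs(v - mean) for v in values) / len(values)
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),  # population std, like np.std's default
        "median": statistics.median(values),
        "L1": sum(abs(v) for v in values),
        "MAD": mean_abs_dev,
        "percent_above_mean": sum(v > mean for v in values) / len(values),
        "variation_from_mean": mean_abs_dev,
    }
```

The returned dict can be turned into the one-row DataFrame used above with `pd.DataFrame(features, index=[0])`.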

You can run an Azure Function from Azure Data Factory using the Azure Function activity.

  1. Create an Azure Function linked service. It takes the Function App URL and the Function Key.

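  2. Create a pipeline in Azure Data Factory and add an Azure Function activity to it.

    (i) In Settings, select the linked service you created.

    (ii) Function name: the name of your Azure Function.

    (iii) Method: the HTTP method used to call the function.

    (iv) Body: the request body sent with the function call.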

  3. Once the pipeline has run successfully, you can see the output of the Azure Function activity.
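
The same setup can also be written down as JSON definitions instead of being clicked together in the UI. The sketch below builds them as Python dicts and prints the pipeline JSON. All names (`AzureFunctionLinkedService`, `ClassifyMessage`, `ClassificationPipeline`) and all values in angle brackets are placeholders I made up, and the property names are my understanding of the Azure Function linked service and activity schema, so check them against the reference before using them.

```python
import json

# Hypothetical names and placeholder values; replace with your own.
linked_service = {
    "name": "AzureFunctionLinkedService",
    "properties": {
        "type": "AzureFunction",
        "typeProperties": {
            "functionAppUrl": "https://<your-function-app>.azurewebsites.net",
            "functionKey": {"type": "SecureString", "value": "<function-key>"},
        },
    },
}

# The activity points at the linked service and names the function to call.
activity = {
    "name": "ClassifyMessage",
    "type": "AzureFunctionActivity",
    "linkedServiceName": {
        "referenceName": linked_service["name"],
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "functionName": "<your-function-name>",
        "method": "POST",
        "body": {"MSG_TYPE_TAG": "50"},  # example payload only
    },
}

pipeline = {
    "name": "ClassificationPipeline",
    "properties": {"activities": [activity]},
}

print(json.dumps(pipeline, indent=2))
```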

Reference: Azure Function activity
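
One caveat: the Azure Function activity calls the function over HTTP, so a function that only has a Service Bus topic trigger, as in the question, would most likely need an HTTP-triggered entry point as well, and its response should be a JSON object. Below is a rough sketch of how steps 1 to 4 could be separated from the trigger so that either entry point can reuse them. The function name `handle_request_body` and the `status` values are hypothetical; only the field names come from the question's code.

```python
import json


def handle_request_body(raw_body: str) -> dict:
    """Steps 1-4: parse, validate and filter. Returns a JSON-serializable dict."""
    # The question's messages use ';' where JSON expects ','
    message = json.loads(raw_body.replace(";", ","))

    # Step 2: validity check
    if message.get("error") != {} or message.get("MSG_TYPE_TAG", "") == "":
        return {"status": "invalid"}

    # Steps 3-4: only message type 50 for logical name 'BLOCK' is processed
    log = message.get("GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG", {})
    if int(message["MSG_TYPE_TAG"]) != 50 or log.get("logical_name") != "BLOCK":
        return {"status": "skipped"}

    # Steps 5-9 (features, normalization, classification, insert) would run here.
    return {"status": "accepted", "samples": len(log.get("acceleration_array", []))}
```

In an HTTP-triggered function, the body would come from `req.get_body().decode("utf-8")` and the result could be returned with `func.HttpResponse(json.dumps(result), mimetype="application/json")`.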
