How to implement some Python functions with an Azure Data Factory activity?

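Scenario: I have an Azure Service Bus and receive data on a topic. I also have an Azure Function (Service Bus topic trigger): when a message arrives on the Service Bus, the function runs several steps on that message (see the code below).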

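The steps of the function are (details are in the code):

  1. Receive a message and convert it to JSON

  2. Check that the received message is valid

  3. Check the conditions on the received message again

  4. If the above conditions are TRUE:

  5. Create an array from the field values in the received message

  6. Run feature extraction on the output of step 5

  7. Run normalization on the output of step 6

  8. Run classification on the output of step 7 and add the label to the received message

  9. Insert the output of step 8 into the database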

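Now I want to know how to implement and run these functions (steps) as a pipeline using Data Factory activities. (Other guidance and suggestions for this scenario are also welcome.)

My code is: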

import logging
import json
import pickle
import statistics
import config
import psycopg2
import pandas as pd
import numpy as np
import azure.functions as func


def main(message: func.ServiceBusMessage):

    connection_db = psycopg2.connect(
        f"host={config.database_url} dbname=developer user={config.database_username} password={config.database_password}")
    cursor_connection = connection_db.cursor()

    # Validate the message and filter it with the following criteria:
    #   MSG_TYPE_TAG == 50
    #   logical_name == 'BLOCK'
    message_body = message.get_body().decode("utf-8")
    message_body = message_body.replace(";", ",")
    message_json = json.loads(message_body)
    logging.info("JSON converted")
    if message_json['error'] == {} and message_json['MSG_TYPE_TAG'] != '':
        logging.info("Data is Valid")
    else:
        # Stop on invalid data; int('') below would raise ValueError.
        logging.info("Data Not Valid")
        connection_db.close()
        return

    if int(message_json['MSG_TYPE_TAG']) == 50 and message_json['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['logical_name'] == 'BLOCK':
        message_filtered = message_json

        # Build one array from the received array data

        def _create_one_array(message_filtered):
            acceleration_array_of_all = []
            temp_array = message_filtered['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['acceleration_array']
            for value in temp_array:
                acceleration_array_of_all.append(value)
            return acceleration_array_of_all

        # Feature extraction functions

        def percent_above_mean(acceleration_array_list):
            percent_above_mean = 0
            mean = np.mean(acceleration_array_list)
            for i in acceleration_array_list:
                if i > mean:
                    percent_above_mean += 1
            return percent_above_mean/len(acceleration_array_list)

        def variation_from_mean(acceleration_array_list):
            variation_from_mean = 0
            mean = np.mean(acceleration_array_list)
            for value in acceleration_array_list:
                variation_from_mean = variation_from_mean+abs(value-mean)
            return variation_from_mean/len(acceleration_array_list)

        def _feature_extraction(acceleration_array):
            feature = dict()
            feature['mean'] = np.mean(acceleration_array)
            feature['max'] = max(acceleration_array)
            feature['min'] = min(acceleration_array)
            feature['std'] = np.std(acceleration_array)
            feature['median'] = statistics.median(acceleration_array)
            feature['L1'] = sum(list(map(abs, acceleration_array)))
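            # Series.mad() was removed in pandas 2.0; compute the mean
            # absolute deviation directly.
            series = pd.Series(acceleration_array)
            feature['MAD'] = (series - series.mean()).abs().mean()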
            feature['percent_above_mean'] = percent_above_mean(
                acceleration_array)
            feature['variation_from_mean'] = variation_from_mean(
                acceleration_array)
            features_dataframe = pd.DataFrame(feature, index=[0])
            return features_dataframe

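        def _normalization(df):
            # Assuming scaler.sav holds a scikit-learn-style scaler:
            # transform() returns the scaled values rather than changing df
            # in place, so the result has to be returned.
            with open('scaler.sav', 'rb') as scaler_file:
                scaler = pickle.load(scaler_file)
            return pd.DataFrame(scaler.transform(df), columns=df.columns)

        # Classification function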

        def _classification_lable(normalized_features):
            with open('ExtraTreesClassifier.sav', 'rb') as model_file:
                classifier = pickle.load(model_file)
            prediction = dict()
            label = classifier.predict(normalized_features).tolist()[0]
            # Class 0 maps to 'Hard', any other class to 'Easy'
            if label == 0:
                prediction['label'] = 'Hard'
            else:
                prediction['label'] = 'Easy'
            probabilities = classifier.predict_proba(normalized_features)
            # Report the probability of the most likely class
            prediction['probability'] = round(max(probabilities[0]), 2)
            return prediction

        def _classification(normalized_features):
            label = _classification_lable(normalized_features)
            return label

        acceleration_array = _create_one_array(message_filtered)
        extracted_features = _feature_extraction(acceleration_array)
        normalized_features = _normalization(extracted_features)
        label = _classification(normalized_features)
        logging.info('functions done')

        # Insert into the database
        message_final = {**message_filtered,
                         **message_filtered['LOG_TAG']}
        del message_final['error']
        del message_final['LOG_TAG']
        del message_final['acceleration_array']

        message_final['label'] = label['label']
        message_final['probability'] = label['probability']
        cursor_connection.execute(
            '''INSERT into dci_output_lable VALUES (%(MSG_TYPE_TAG)s , %(ATTACHED_DEVICE_SERIAL_NUMBER_TAG)s, %(date_time)s , %(name)s , %(number)s , %(sequence)s , %(label)s , %(probability)s);''', message_final)
        connection_db.commit()
        logging.info("Insert to database done")

    else:
        logging.info(" Input data isn't BLOCKS")

You can run an Azure Function from Azure Data Factory by using the Azure Function activity.
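The activity calls the function over HTTP with a method and a request body, so a function that currently uses a Service Bus topic trigger would likely need an HTTP-triggered entry point for this approach. The following is a minimal sketch of such an entry point, not the asker's code. It assumes the request body carries a top-level `acceleration_array` field (a hypothetical payload shape), computes a subset of the features with the standard library only, and leaves out the pickled scaler, the classifier, and the database insert. The `azure.functions` import is guarded only so that the helper can be tried outside the Functions runtime.

```python
import json
import statistics

try:
    import azure.functions as func  # provided by the Azure Functions Python runtime
except ImportError:  # lets the pure helper below run outside Azure
    func = None


def extract_features(values):
    """Compute a subset of the question's features with the standard library."""
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "std": statistics.pstdev(values),  # population std, like np.std's default
        "median": statistics.median(values),
        "L1": sum(abs(v) for v in values),
        "percent_above_mean": sum(v > mean for v in values) / len(values),
        "variation_from_mean": sum(abs(v - mean) for v in values) / len(values),
    }


def main(req: "func.HttpRequest") -> "func.HttpResponse":
    # Hypothetical HTTP entry point: the Body configured on the activity arrives here.
    payload = req.get_json()
    # "acceleration_array" as a top-level key is an assumption for this sketch.
    features = extract_features(payload["acceleration_array"])
    # Return a JSON object so Data Factory can expose it as the activity output.
    return func.HttpResponse(json.dumps(features), mimetype="application/json")
```

Keeping the computation in a plain function such as `extract_features` means it can be unit tested without a Function App, while `main` only handles the HTTP request and response.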

  1. Create an Azure Function linked service. It takes the Function App URL and a Function Key.

[Screenshot: linked service details]

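  2. Create a pipeline in Azure Data Factory and add an Azure Function activity to it.

    (i) In Settings, select the linked service you created.

    (ii) Function name: the name of the function you created in your Function App.

    (iii) Method: the HTTP method used to call the function.

    (iv) Body: the request body sent with the function call.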


  3. Once the pipeline has run successfully, you can see the output.

[Screenshot: activity output]
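The linked service and activity settings from the steps above correspond roughly to the JSON definitions that Data Factory stores for a pipeline. The sketch below builds them as Python dictionaries and prints the result. All names and values (`AzureFunctionLinkedService`, `RunClassification`, `ClassificationPipeline`, the URL and key placeholders, and the example body) are hypothetical, and the property names follow the Azure Function activity documentation as I understand it, so check them against the current schema before use.

```python
import json

# Placeholder values: replace with your own Function App URL and key.
linked_service = {
    "name": "AzureFunctionLinkedService",  # hypothetical name
    "properties": {
        "type": "AzureFunction",
        "typeProperties": {
            "functionAppUrl": "https://<your-function-app>.azurewebsites.net",
            "functionKey": {"type": "SecureString", "value": "<function-key>"},
        },
    },
}

activity = {
    "name": "RunClassification",  # hypothetical activity name
    "type": "AzureFunctionActivity",
    "linkedServiceName": {
        "referenceName": linked_service["name"],
        "type": "LinkedServiceReference",
    },
    "typeProperties": {
        "functionName": "<your-function-name>",
        "method": "POST",
        # Example request body; the payload shape is an assumption.
        "body": {"acceleration_array": [0.1, 0.4, 0.2]},
    },
}

pipeline = {
    "name": "ClassificationPipeline",  # hypothetical pipeline name
    "properties": {"activities": [activity]},
}

print(json.dumps(pipeline, indent=2))
```

In a real factory the function key would normally come from a secret store rather than being written inline.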

Reference: Azure Function activity
