How to implement some Python functions with Azure Data Factory activities?
Scenario: I have an Azure Service Bus and receive data on a topic. When a message arrives on the Service Bus, an Azure Function (Service Bus topic trigger) runs several processing functions on that message (see the code below).
The steps of the function are (details of this code):
1. Receive a message and convert it to JSON
2. Check whether the received message is valid
3. Check a further condition on the received message
4. If the condition above is TRUE:
5. Create an array from a field value in the received message
6. Run feature extraction on the output of step 5
7. Run normalization on the output of step 6
8. Run classification on the output of step 7 and add the label to the received message
9. Insert the output of step 8 into the database
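For concreteness, the early steps above can be sketched as a chain of small pure functions. This is a simplified sketch: the field names mirror the code below, but the feature set is only a subset and the `';' → ','` fix-up is taken from the question's own parsing code.

```python
import json
import statistics

def parse_message(raw: str) -> dict:
    # Step 1: the raw message uses ';' separators, so normalize before json.loads
    return json.loads(raw.replace(";", ","))

def is_valid(msg: dict) -> bool:
    # Step 2: basic validity check, as in the question's code
    return msg.get("error") == {} and msg.get("MSG_TYPE_TAG") != ""

def is_block(msg: dict) -> bool:
    # Step 3: filter condition (message_type == 50)
    return int(msg["MSG_TYPE_TAG"]) == 50

def extract_features(arr: list) -> dict:
    # Step 6: a few of the features from the question, stdlib only
    mean = statistics.fmean(arr)
    return {
        "mean": mean,
        "max": max(arr),
        "min": min(arr),
        "median": statistics.median(arr),
        "percent_above_mean": sum(v > mean for v in arr) / len(arr),
    }

msg = parse_message('{"error": {}; "MSG_TYPE_TAG": "50"}')
print(is_valid(msg), is_block(msg))            # True True
print(extract_features([1, 2, 3, 10])["max"])  # 10
```

Factoring the steps this way is what makes the later question tractable: each function is stateless and JSON-in/JSON-out, so it can be lifted into its own Azure Function and orchestrated by a pipeline.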
Now I would like to know how to implement and run these functions (steps) as a pipeline using Data Factory activities. (Or any other guidance and suggestions for this scenario.)
My code is:
import logging
import json
import pickle
import statistics
import config
import psycopg2
import pandas as pd
import numpy as np
import azure.functions as func


def main(message: func.ServiceBusMessage):
    connection_db = psycopg2.connect(
        f"host={config.database_url} dbname=developer user={config.database_username} password={config.database_password}")
    cursor_connection = connection_db.cursor()

    # Validate and filter the data with the following criteria:
    #   message_type == 50
    #   logical_id == 'BLOCK'
    message_body = message.get_body().decode("utf-8")
    message_body = message_body.replace(";", ",")
    message_json = json.loads(message_body)
    logging.info("Json Converted")
    if message_json['error'] == {} and message_json['MSG_TYPE_TAG'] != '':
        logging.info("Data is Valid")
    else:
        logging.info("Data Not Valid")
    if int(message_json['MSG_TYPE_TAG']) == 50 and message_json['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['logical_name'] == 'BLOCK':
        message_filtered = message_json

        def _create_one_array(message_filtered):
            # Make one array from the received array data.
            acceleration_array_of_all = []
            temp_array = message_filtered['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['acceleration_array']
            for value in temp_array:
                acceleration_array_of_all.append(value)
            return acceleration_array_of_all

        # Feature extraction functions.
        def percent_above_mean(acceleration_array_list):
            count_above_mean = 0
            mean = np.mean(acceleration_array_list)
            for i in acceleration_array_list:
                if i > mean:
                    count_above_mean += 1
            return count_above_mean / len(acceleration_array_list)

        def variation_from_mean(acceleration_array_list):
            total_variation = 0
            mean = np.mean(acceleration_array_list)
            for value in acceleration_array_list:
                total_variation = total_variation + abs(value - mean)
            return total_variation / len(acceleration_array_list)

        def _feature_extraction(acceleration_array):
            feature = dict()
            feature['mean'] = np.mean(acceleration_array)
            feature['max'] = max(acceleration_array)
            feature['min'] = min(acceleration_array)
            feature['std'] = np.std(acceleration_array)
            feature['median'] = statistics.median(acceleration_array)
            feature['L1'] = sum(map(abs, acceleration_array))
            feature['MAD'] = pd.Series(acceleration_array).mad()
            feature['percent_above_mean'] = percent_above_mean(acceleration_array)
            feature['variation_from_mean'] = variation_from_mean(acceleration_array)
            features_dataframe = pd.DataFrame(feature, index=[0])
            return features_dataframe

        def _normalization(df):
            scaler = pickle.load(open('scaler.sav', 'rb'))
            # transform() returns a new array; assign it back so the
            # scaled values are not silently discarded.
            df[:] = scaler.transform(df)
            return df

        # Classification function.
        def _classification_label(normalized_features):
            classifier = pickle.load(open('ExtraTreesClassifier.sav', 'rb'))
            prediction = dict()
            label = classifier.predict(normalized_features).tolist()[0]
            if label == 0:
                prediction['label'] = 'Hard'
            else:
                prediction['label'] = 'Easy'
            probability = classifier.predict_proba(normalized_features)
            prediction['probability'] = round(max(probability[0]), 2)
            return prediction

        def _classification(normalized_features):
            return _classification_label(normalized_features)

        acceleration_array = _create_one_array(message_filtered)
        extracted_features = _feature_extraction(acceleration_array)
        normalized_features = _normalization(extracted_features)
        label = _classification(normalized_features)
        logging.info('functions done')

        # Insert into the database.
        message_final = {**message_filtered, **message_filtered['LOG_TAG']}
        del message_final['error']
        del message_final['LOG_TAG']
        del message_final['acceleration_array']
        message_final['label'] = label['label']
        message_final['probability'] = label['probability']
        cursor_connection.execute(
            '''INSERT INTO dci_output_lable VALUES (%(MSG_TYPE_TAG)s, %(ATTACHED_DEVICE_SERIAL_NUMBER_TAG)s, %(date_time)s, %(name)s, %(number)s, %(sequence)s, %(label)s, %(probability)s);''',
            message_final)
        connection_db.commit()
        logging.info("Insert to database done")
    else:
        logging.info("Input data isn't BLOCKS")
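One way to map these steps to Data Factory is to split the monolith into steps that each accept and return JSON, so that each can become its own Azure Function activity in a pipeline (note that the Azure Function activity invokes functions over HTTP, so the steps would need HTTP-triggered counterparts rather than the Service Bus trigger). A minimal local sketch of that idea; the step bodies and the classification rule are simplified placeholders, not the real feature set or pickled classifier:

```python
import json

# Each "step" takes a JSON string and returns a JSON string, so a
# Data Factory pipeline can pass one activity's output to the next.

def step_create_array(payload: str) -> str:
    # Step 5: pull the array field out of the message
    msg = json.loads(payload)
    return json.dumps({"acceleration_array": msg["acceleration_array"]})

def step_feature_extraction(payload: str) -> str:
    # Step 6: compute a toy subset of the real features
    arr = json.loads(payload)["acceleration_array"]
    mean = sum(arr) / len(arr)
    features = {"mean": mean, "max": max(arr), "min": min(arr)}
    return json.dumps({"features": features})

def step_classification(payload: str) -> str:
    # Step 8: placeholder rule instead of the pickled ExtraTreesClassifier
    features = json.loads(payload)["features"]
    label = "Hard" if features["mean"] > 0 else "Easy"
    return json.dumps({"label": label})

# Chaining locally, the way the pipeline would chain the activities:
out = step_create_array('{"acceleration_array": [1, 2, 3]}')
out = step_feature_extraction(out)
out = step_classification(out)
print(out)  # {"label": "Hard"}
```

With this shape, the pipeline's only job is orchestration: each activity calls one function and forwards its output (e.g. via `@activity('...').output`) to the next.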
You can use the Azure Function activity in Data Factory to run an Azure Function.
Linked service details:
Create a pipeline in Azure Data Factory and add an Azure Function activity to it.
(i) In Settings, specify the linked service you created.
(ii) Function name: the name of your Azure Function.
(iii) Method: the REST method for the function call.
(iv) Body: the request payload for the function call.
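As a rough sketch, the resulting activity in the pipeline JSON looks like the fragment below. This is a configuration sketch only: `AzureFunctionLinkedService` and `ClassifyMessage` are placeholder names for your own linked service and function, and the body shown is a dummy payload.

```json
{
    "name": "RunClassificationStep",
    "type": "AzureFunctionActivity",
    "linkedServiceName": {
        "referenceName": "AzureFunctionLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "functionName": "ClassifyMessage",
        "method": "POST",
        "body": "{\"name\": \"Azure\"}"
    }
}
```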
Reference: Azure Function activity
Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site or the original source. For any questions, contact yoyou2525@163.com.