How to implement some Python functions with Azure Data Factory activities?
Scenario: I have an Azure Service Bus and receive data on a topic. I also have an Azure Function (Service Bus topic trigger) that fires when a message arrives on the Service Bus and then runs some functions on that message (see the code below).

The steps of the function are (details of this code):

1. Receive a message and convert it to JSON
2. Check whether the received message is valid
3. Check a further condition on the received message
4. If the condition above is TRUE:
5. Create one array from the field values in the received message
6. Run feature extraction on the output of step 5
7. Run normalization on the output of step 6
8. Run classification on the output of step 7 and add the label to the received message
9. Insert the output of step 8 into the database

Now, I would like to know how to implement and run these functions (steps) as a pipeline using Data Factory activities. (Or any other guidance and suggestions for this scenario.)
My code is:
import logging
import json
import pickle
import statistics
import config
import psycopg2
import pandas as pd
import numpy as np
import azure.functions as func


def main(message: func.ServiceBusMessage):
    connection_db = psycopg2.connect(
        f"host={config.database_url} dbname=developer user={config.database_username} password={config.database_password}")
    cursor_connection = connection_db.cursor()

    # Validate and filter messages with the following criteria:
    #   message_type == 50
    #   logical_id == 'BLOCK'
    message_body = message.get_body().decode("utf-8")
    message_body = message_body.replace(";", ",")
    message_json = json.loads(message_body)
    print("Json Converted")
    if message_json['error'] == {} and message_json['MSG_TYPE_TAG'] != '':
        logging.info("Data is Valid")
    else:
        logging.info("Data Not Valid")
    if int(message_json['MSG_TYPE_TAG']) == 50 and message_json['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['logical_name'] == 'BLOCK':
        message_filtered = message_json

        def _create_one_array(message_filtered):
            """Build one array from the received array data."""
            acceleration_array_of_all = []
            temp_array = message_filtered['GET_RIGIDSENSE_SENSOR_ACCELDATA_LOG_TAG']['acceleration_array']
            for value in temp_array:
                acceleration_array_of_all.append(value)
            return acceleration_array_of_all

        # Feature-extraction functions
        def percent_above_mean(acceleration_array_list):
            count_above_mean = 0
            mean = np.mean(acceleration_array_list)
            for i in acceleration_array_list:
                if i > mean:
                    count_above_mean += 1
            return count_above_mean / len(acceleration_array_list)

        def variation_from_mean(acceleration_array_list):
            variation = 0
            mean = np.mean(acceleration_array_list)
            for value in acceleration_array_list:
                variation = variation + abs(value - mean)
            return variation / len(acceleration_array_list)

        def _feature_extraction(acceleration_array):
            feature = dict()
            feature['mean'] = np.mean(acceleration_array)
            feature['max'] = max(acceleration_array)
            feature['min'] = min(acceleration_array)
            feature['std'] = np.std(acceleration_array)
            feature['median'] = statistics.median(acceleration_array)
            feature['L1'] = sum(map(abs, acceleration_array))
            feature['MAD'] = pd.Series(acceleration_array).mad()
            feature['percent_above_mean'] = percent_above_mean(acceleration_array)
            feature['variation_from_mean'] = variation_from_mean(acceleration_array)
            features_dataframe = pd.DataFrame(feature, index=[0])
            return features_dataframe

        def _normalization(df):
            scaler = pickle.load(open('scaler.sav', 'rb'))
            # transform returns a new array; assign it back so the
            # normalized values are actually used
            df = pd.DataFrame(scaler.transform(df), columns=df.columns)
            return df

        # Classification function
        def _classification_label(normalized_features):
            classifier = pickle.load(open('ExtraTreesClassifier.sav', 'rb'))
            prediction = dict()
            label = classifier.predict(normalized_features).tolist()[0]
            if label == 0:
                prediction['label'] = 'Hard'
            else:
                prediction['label'] = 'Easy'
            probability = classifier.predict_proba(normalized_features)
            prediction['probability'] = round(max(probability[0]), 2)
            return prediction

        def _classification(normalized_features):
            label = _classification_label(normalized_features)
            return label

        acceleration_array = _create_one_array(message_filtered)
        extracted_features = _feature_extraction(acceleration_array)
        normalized_features = _normalization(extracted_features)
        label = _classification(normalized_features)
        logging.info('functions done')

        # Insert into database
        message_final = {**message_filtered,
                         **message_filtered['LOG_TAG']}
        del message_final['error']
        del message_final['LOG_TAG']
        del message_final['acceleration_array']
        message_final['label'] = label['label']
        message_final['probability'] = label['probability']
        cursor_connection.execute(
            '''INSERT INTO dci_output_lable VALUES (%(MSG_TYPE_TAG)s, %(ATTACHED_DEVICE_SERIAL_NUMBER_TAG)s, %(date_time)s, %(name)s, %(number)s, %(sequence)s, %(label)s, %(probability)s);''',
            message_final)
        connection_db.commit()
        logging.info("Insert to database done")
    else:
        logging.info("Input data isn't BLOCKS")
You can use the Azure Function activity in Azure Data Factory to run the Azure Function.

Linked service details:

Create a pipeline in Azure Data Factory and add an Azure Function activity to it.

(i) In the settings, specify the linked service you created.
(ii) Function name is the name created for your Azure Function.
(iii) Method: the method of the function call.
(iv) Body: the request of the function call.
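For reference, the activity settings above correspond roughly to this fragment of the pipeline JSON (the linked service name `MyFunctionLinkedService`, the function name `ClassifyBlockMessage`, and the body expression are placeholders, not values from the question):

```json
{
    "name": "RunClassificationStep",
    "type": "AzureFunctionActivity",
    "linkedServiceName": {
        "referenceName": "MyFunctionLinkedService",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "functionName": "ClassifyBlockMessage",
        "method": "POST",
        "body": {
            "value": "@json(activity('PreviousStep').output)",
            "type": "Expression"
        }
    }
}
```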
Output:

Reference: Azure Function activity
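Since the Azure Function activity calls an HTTP-triggered function rather than a Service Bus trigger, one way to split the steps in the question across activities is to factor each step's core logic into a pure function that an HTTP-triggered function (and hence the activity) can call with a JSON body. A minimal sketch of the feature-extraction step, using only the standard library; `handle_block_message` and its field names are illustrative assumptions, and the normalization/classification steps are omitted:

```python
import statistics


def handle_block_message(payload: dict) -> dict:
    """Core logic for one pipeline step, kept free of trigger bindings so it
    can be unit-tested and exposed behind an HTTP-triggered function."""
    accel = payload.get("acceleration_array", [])
    if not accel:
        return {"error": "empty acceleration_array"}
    mean = statistics.fmean(accel)
    features = {
        "mean": mean,
        "max": max(accel),
        "min": min(accel),
        "std": statistics.pstdev(accel),
        "median": statistics.median(accel),
        "L1": sum(abs(v) for v in accel),
        "percent_above_mean": sum(v > mean for v in accel) / len(accel),
        "variation_from_mean": sum(abs(v - mean) for v in accel) / len(accel),
    }
    # The returned dict becomes the JSON response that the next
    # Data Factory activity can consume via activity('...').output
    return {"features": features}
```

The HTTP wrapper itself would just parse `req.get_json()`, call this handler, and return `func.HttpResponse(json.dumps(result), mimetype="application/json")`.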