Dynamically create a string from three dataframes

Question

I have three dataframes, shown below. The first is df and the second holds the anomalies:

import pandas as pd

d = {'10028': [0], '1058': [25], '20120': [29], '20121': [22],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}
df1 = pd.DataFrame(data=d)

The anomalies dataframe is basically a mirror copy of df, except that its values are either 0 or 1: 1 marks an anomaly and 0 a non-anomaly.

d = {'10028': [0], '1058': [1], '20120': [1], '20121': [0],'20122': [0], '20123': [0], '5043': [0], '5046': [0]}

df2 = pd.DataFrame(data=d)

The third dataframe is as follows:

d = {'10028': ['US,IN'], '1058': ['NA, JO, US'], '20120': [''], '20121': ['US,PK'],'20122': ['IN'], '20123': ['Us,LN'], '5043': ['AI,AL'], '5046': ['AA,AB']}

df3 = pd.DataFrame(data=d)

I am converting these into a specific format with the following code:

details = (
        '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' + '\t' + 'Country' 
        '\n' + '10028:' + '\t'+ '\t' + str(df1.tail(1)['10028'][0]) + '\t' + str(df2['10028'][0]) + '\t'+ str(df3['10028'][0]) + 
        '\n' + '1058:' + '\t' + '\t' + str(df1.tail(1)['1058'][0]) + '\t' + str(df2['1058'][0]) + '\t'+ str(df3['1058'][0]) +
        '\n' + '20120:' + '\t' +'\t' + str(df1.tail(1)['20120'][0]) + '\t' + str(df2['20120'][0]) + '\t'+ str(df3['20120'][0]) +
        '\n' + '20121:' + '\t' + '\t' +str(round(df1.tail(1)['20121'][0], 2)) + '\t' + str(df2['20121'][0]) + '\t'+ str(df3['20121'][0]) +
        '\n' + '20122:' + '\t' + '\t' +str(round(df1.tail(1)['20122'][0], 2)) + '\t' + str(df2['20122'][0]) + '\t'+str(df3['20122'][0]) +
        '\n' + '20123:' + '\t' + '\t' +str(round(df1.tail(1)['20123'][0], 3)) + '\t' + str(df2['20123'][0]) + '\t'+str(df3['20123'][0]) +
        '\n' + '5043:' + '\t' + '\t' +str(round(df1.tail(1)['5043'][0], 3)) + '\t' + str(df2['5043'][0]) + '\t'+str(df3['5043'][0]) +
        '\n' + '5046:' + '\t' + '\t' +str(round(df1.tail(1)['5046'][0], 3)) + '\t' + str(df2['5046'][0]) + '\t'+str(df3['5046'][0]) +
        '\n\n' + 'message:' + '\t' +
        'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
            )

The problem is that the column names change on every run. In this run they are '10028', '1058', '20120', '20121', '20122', '20123', '5043', '5046', but in the next run they might be '10029', '1038', '20121', '20122', '20123', '5083', '5946'.

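How can I build details dynamically from whichever columns are present in the dataframes? I don't want to hard-code the names, and in the message I want to include the names of the columns whose anomaly value is 1.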

For df1 and df2 the column values will always be 1 or 0, whereas for df3 they will be either a list or blank.

Expected output:

For two dataframes I already have a working solution, shown below:

# first part of the string
s = '\n' + 'Metric Name' + '\t' + 'Count' + '\t' + 'Anomaly' 

# dynamically add the data (Series.iteritems() was removed in pandas 2.0; use items())
for idx, val in df1.iloc[-1].items():
    s += f'\n{idx}\t{val}\t{df2[idx][0]}' 
# last part
s += ('\n\n' + 'message:' + '\t' +
      'Something wrong with the platform as there is a spike in [values where anomalies == 1].'
     )

If no matching value exists, print null.

Answer 1

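To get the expected result, you can do the following (the input data must be dictionaries as shown in the question; if it is not, please provide the real input data):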

import pandas as pd

final_d = []
d = {'10028': 0, '1058': 25, '20120': 29, '20121': 22,'20122': 0, '20123': 0, '5043': 0, '5046': 0}
final_d.append(d)

d = {'10028': 0, '1058': 1, '20120': 1, '20121': 0,'20122': 0, '20123': 0, '5043': 0, '5046': 0, '91111':0}
final_d.append(d)

d = {'10028': ['US','IN'], '1058': ['NA', 'JO', 'US'], '20120': [''], '20121': ['US','PK'],'20122': ['IN'], '20123': ['Us','LN'], '5043': ['AI','AL'], '5046': ['AA','AB'], '00000':['kk','dd','ee']}
final_d.append(d)

# Now, we will merge the dictionaries on key
data = {}
for i, dt in enumerate(final_d):
    for k,v in dt.items():
        if k in data:
            if type(v)==list:
                data[k][i] = ','.join(v)
            else:
                data[k][i] = v
        else:
            data[k] = ['']*len(final_d)
            if type(v)==list:
                data[k][i] = ','.join(v)
            else:
                data[k][i] = v
maxlen = max([len(v) for v in data.values()])
data = {k:v if len(v)==maxlen else v+['']*(maxlen-len(v)) for k,v in data.items()}

# Creating the base dataframe
df = pd.DataFrame.from_dict(data)

# Converting the column headers (metric names) into a row in the dataframe
df = pd.concat([pd.DataFrame.from_dict({k:[v] for k,v in zip(df.columns.tolist(), df.columns.tolist())}), df], ignore_index=True)

# removing column names
df.columns = [''] * len(df.columns)

# organising the dataframe according to your required output
result = df.T.reset_index(drop=True)

# Adding the column names as required
result.columns = ['Metric Name', 'Count', 'Anomaly', 'Country']

# Voila!
print(result.to_string(index=False))

The resulting dataframe:

Metric Name Count Anomaly   Country
      10028     0       0     US,IN
       1058    25       1  NA,JO,US
      20120    29       1          
      20121    22       0     US,PK
      20122     0       0        IN
      20123     0       0     Us,LN
       5043     0       0     AI,AL
       5046     0       0     AA,AB
      91111             0          
      00000                kk,dd,ee
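
The table above does not yet include the message line the question asks for. As a rough sketch only, the whole string could also be built directly from the three dataframes, assuming df1, df2 and df3 are single-row dataframes shaped like those in the question. The helper name build_details and its missing parameter are hypothetical and not part of the original answer:

```python
import pandas as pd

def build_details(df1, df2, df3, missing="null"):
    """Build the tab-separated report from whatever columns df1 has."""
    lines = ["", "Metric Name\tCount\tAnomaly\tCountry"]
    counts = df1.iloc[-1]  # last row of counts, as in the question
    for name, count in counts.items():
        # Fall back to `missing` when df2 or df3 lacks this column.
        anomaly = df2[name].iloc[0] if name in df2.columns else missing
        country = df3[name].iloc[0] if name in df3.columns else missing
        lines.append(f"{name}:\t\t{count}\t{anomaly}\t{country}")

    # Names of the columns flagged as anomalies (value 1) in df2.
    flags = df2.iloc[0]
    anomalous = flags[flags == 1].index.tolist()

    lines.append("")
    lines.append(
        "message:\tSomething wrong with the platform as there is a spike in "
        + ", ".join(anomalous) + "."
    )
    return "\n".join(lines)

# Small example; df3 deliberately lacks the '20120' column.
df1 = pd.DataFrame({'10028': [0], '1058': [25], '20120': [29]})
df2 = pd.DataFrame({'10028': [0], '1058': [1], '20120': [1]})
df3 = pd.DataFrame({'10028': ['US,IN'], '1058': ['NA, JO, US']})
details = build_details(df1, df2, df3)
print(details)
```

Because the loop iterates over df1's own columns, nothing is hard-coded, and a column missing from df2 or df3 shows up as null. If no column is flagged, the message ends with an empty list of names, so a real implementation would probably want to skip the message in that case.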
