简体   繁体   English

附加不存在的行数据时,来自 pd.Crosstab 的 Multiindex 问题

[英]Problems with Multiindex from pd.Crosstab when appending non present row data

Morning All,大家早,

Happy Friday.星期五快乐。 I have some excel output which shows Client , their Sector and the counts of their Result via a crosstab command.我有一些 excel output 通过交叉表命令显示Client ,他们的Sector和他们的Result的计数。 This works well:这很好用:

dfAll_Clients = {'All_Client': ['AAA','BBB','CCC','DDD','EEE','FFF'],
                'City': ['SY','LN','NY','TO','TK','LA']}
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients)
df = {  'Client': ['AAA', 'AAA', 'AAA',
                 'BBB', 'BBB', 'BBB', 'BBB','BBB','BBB','BBB',
                 'CCC',
                'DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD','DDD'],
        'Sector': ['GOV', 'GOV', 'CORP',
                 'GOV', 'GOV', 'GOV', 'GOV','CORP','CORP','CORP',
                 'GOV',
                 'GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','GOV','CORP'],
        'Result': ['Covered', 'Customer Reject', 'Customer Timeout',
               'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ',
               'No RFQ',
               'Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','No RFQ']
      }
df = pd.DataFrame.from_dict(df)
# print(df)

vals = ['Covered',
'Customer Reject',
'Customer Timeout',
'Dealer Reject',
'Dealer Timeout',
'Done',
'No RFQ',
'Tied Covered',
'Tied Done',
'Tied Traded Away',
'Traded Away']

df = (pd.crosstab([df.Client,
                  df.Sector],
                 df.Result,
                 margins=True,
                 margins_name='Total_Result_Per_Client')
        .drop('Total_Result_Per_Client')
        .reindex(vals + ['Total_Result_Per_Client'], axis=1, fill_value=0))
# Total Priced Back = (All RFQ's - Dealer Reject - Dealer_Timeout) / All RFQ's
df['Total_Priced_Back'] = (df['Total_Result_Per_Client']- df['Dealer Reject'] - df['Dealer Timeout']) / (df['Total_Result_Per_Client'])
# Hit_Rate = (Done + Tied Done) / Total RFQ's less Customer Reject and Customer Timeout
df['Hit_Rate'] = (df['Done'] + df['Tied Done']) / (df['Total_Result_Per_Client']- df['Customer Reject'] - df['Customer Timeout'])
# Populate any nulls due to 0/0
df = df.fillna(0)
# Format Pct cols
decimals2 = 2
df['Total_Priced_Back'] = df['Total_Priced_Back'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
df['Hit_Rate'] = df['Hit_Rate'].apply(lambda x: round(x * 100, decimals2)).astype(str) + '%'
print (df)
df.to_excel('C:\Temp\Out_Data_EOM_Key_Clients_Corp.xlsx')

The excel extract meets the requirements. excel 提取物符合要求。

在此处输入图像描述

An additional request has come in to add all other possible clients which are not present in the current months data, but may be in futures months.一个额外的请求已经出现,以添加当前月份数据中不存在但可能在未来月份中的所有其他可能的客户。 For each Client with no data add a row in the crosstab, and for each field insert;对于没有数据的每个Client ,在交叉表中添加一行,并为每个字段插入; #NA . #NA The final output would be:最终的 output 将是:

在此处输入图像描述

I looked to add theses rows via:我希望通过以下方式添加这些行:

# Get list of all possible clients
dfAll_Clients = pd.DataFrame.from_dict(dfAll_Clients.All_Client)
new_index = tuple(list(dfAll_Clients.All_Client))
print(new_index)
# Append clients not present in current row entries
dfTemp = df.reindex(new_index, fill_value=0)
print(dfTemp)

The issue is that results of the crosstab are multiindexed.问题是交叉表的结果是多索引的。 I tried flattening the crosstab output with df = df.stack([0]).reset_index() but this changed the structure completely and deviates from the final output entirely.我尝试使用df = df.stack([0]).reset_index()平交叉表 output 但这完全改变了结构并完全偏离了最终的 output。 I now get TypeError: Expected tuple, got str我现在得到TypeError: Expected tuple, got str

Any help would be appreciated.任何帮助,将不胜感激。

You can try reindex您可以尝试reindex

#here E and F (l) you can get it by 

cond = dfAll_Clients.All_Client.isin(df.index.get_level_values(0))
l = dfAll_Clients.loc[~cond,'All_Client'].unique().tolist()
l = [(x, None)for x in l]
df = df.reindex(pd.MultiIndex.from_tuples(df.index.tolist()+l))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM