Nested dictionary error - Python Pandas

Question

I have the following code:

import os
import pandas as pd 
from pandas import ExcelWriter
from pandas import ExcelFile

fileName= input("Enter file name here (Case Sensitive) > ")
df = pd.read_excel(fileName +'.xlsx', sheetname=None, ignore_index=True)
xl = pd.ExcelFile(fileName +'.xlsx')
SystemCount= len(xl.sheet_names)
df1 = pd.DataFrame([])

for y in range(1, int(SystemCount)+ 1): 
    df = pd.read_excel(xl,'System ' + str(y))  #reads each sheet
    df['System {0}'.format(y)] = "1"  #adds a column for each system, sets the column = 1
    df1 = df1.append(df)  #appends all sheets together into a new df


df1 = df1.sort_values(['Email']) #sorts by email
df = df1['Email'].value_counts() #counts the amount each email shows
df1['Count'] = df1.groupby('Email')['Email'].transform('count') #adds the count to the end


df1 = df1.apply(lambda x : pd.to_numeric(x,errors='ignore')) #turns ints to floats
d = dict(zip(df1.columns[1:],['sum']*df1.columns[1:].str.contains('System').sum()+['first'])) #adds up each row
df1 = df1.fillna(0).groupby('Email').agg(d) #turns NAN into 0 and groups everything together
df1 = df1.reset_index() #email column was turned into an index with above line, this turns it back to a df column


SystemsList = []#creates empty list
for count in range(1, int(SystemCount)+1): #counts up to the system amount
    SystemsList.append(['System {0}'.format(count)]) #creates list of systems

SystemDict = {}
for item in SystemsList:
    SystemDict[item]=df1[df1[item]== 1]["Email"]

It outputs something along these lines (a small snippet of the output):

Email             System 1  System 2  System 3  System 4  Count
test1@test.com           0         1         0         1      2
test2@test.com           1         0         0         1      2
test3@test.com           1         1         0         1      3
test4@test.com           1         0         1         0      2

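I am trying to create a nested dictionary with one entry per system, holding the emails of every row that shows a 1 for that system, using this part of the code: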

SystemDict = {}
for item in SystemsList:
    SystemDict[item]=df1[df1[item]== 1]["Email"]

But I get the following error: ValueError: Boolean array expected for the condition, not float64. Any ideas on how to solve this?

Answer 1

This is one way.

import pandas as pd

lst = [['test1@test.com', 0, 1, 0, 1, 2],
       ['test2@test.com', 1, 0, 0, 1, 2],
       ['test3@test.com', 1, 1, 0, 1, 3],
       ['test4@test.com', 1, 0, 1, 0, 2]]

df = pd.DataFrame(lst, columns=['Email', 'System 1', 'System 2',
                                'System 3', 'System 4', 'Count'])

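# int * str repeats the string, so each product is '' (for 0) or the email (for 1);
# filter(None, ...) then drops the empty strings.
d = {'System'+str(i): list(filter(None, df['System '+str(i)]*df['Email']))
     for i in range(1, 5)}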

# {'System1': ['test2@test.com', 'test3@test.com', 'test4@test.com'],
#  'System2': ['test1@test.com', 'test3@test.com'],
#  'System3': ['test4@test.com'],
#  'System4': ['test1@test.com', 'test2@test.com', 'test3@test.com']}
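
Another option is a plain boolean mask per column. The error in the question probably comes from SystemsList.append(['System {0}'.format(count)]), which stores one-element lists: df1[item] is then a DataFrame rather than a Series, and df1[df1[item] == 1] is evaluated against a DataFrame-shaped mask instead of a row filter. A minimal sketch that uses plain strings as column names, assuming the sample data from the question and that every system column holds only 0 or 1 (the names system_cols and system_dict are made up for illustration):

```python
import pandas as pd

# Sample data copied from the question's output.
df1 = pd.DataFrame(
    [['test1@test.com', 0, 1, 0, 1, 2],
     ['test2@test.com', 1, 0, 0, 1, 2],
     ['test3@test.com', 1, 1, 0, 1, 3],
     ['test4@test.com', 1, 0, 1, 0, 2]],
    columns=['Email', 'System 1', 'System 2', 'System 3', 'System 4', 'Count'])

# Column names as plain strings, not one-element lists.
system_cols = [col for col in df1.columns if col.startswith('System')]

# df1[col] == 1 is a boolean Series, so .loc keeps only the matching rows.
system_dict = {col: df1.loc[df1[col] == 1, 'Email'].tolist()
               for col in system_cols}

print(system_dict)
```

Here the keys keep the original column names ('System 1', with the space), and each value is a plain list of emails. Drop .tolist() if a Series per system is preferred.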

Solution 1
0 2018-02-21 22:50:37