Problem converting lists from a large dictionary into DataFrames

Question

I created a dictionary this way:

The data looks like this:

GDS3:
ABC_1     ABC_2     BBB_1
cat        elf       123
dog        run       456
bird       burp      789

GDS4:
ABC_3     ABC_4     BCB_a
beer        yes      234
wine        no       543
gin         yes      743

GDS5:
ABC_5     ABC_6     BCD_c
lol        yea       543
lmao       NaN       446
asl        NaN       777

#create a dictionary in which all columns that start with the same 3 characters will be grouped in the same key. 
dict_2013 = {k: g for k, g in GDS3.groupby(by=lambda x: x[:3].lower(), axis=1)}

dict_2014 = {k: g for k, g in GDS4.groupby(by=lambda x: x[:3].lower(), axis=1)}

dict_2015 = {k: g for k, g in GDS5.groupby(by=lambda x: x[:3].lower(), axis=1)}

#start with year 2013:
global_dict=dict_2013

#if key in the new dictionary is in the old dictionary then 
#add the values from the new dictionary key to the old dictionary key
#else if the new dictionary key does not exist in the old dictionary then add a new key with the new values

for key, val in dict_2014.items():
    if key in global_dict:
        global_dict[key] = [global_dict[key], val]
    else:
        global_dict[key] = val

for key, val in dict_2015.items():  # to add items
    if key in global_dict:
        global_dict[key] = [global_dict[key], val]
    else:
        global_dict[key] = val

This is the output I want (one DataFrame per key):

  df_ABC:
  ABC_1     ABC_2     ABC_3   ABC_4   ABC_5
  cat        elf       beer    yes    lol
  dog        run       wine    no     lmao
  bird       burp      gin     yes    asl

  df_BBB:
  BBB_1
  cat   
  dog        
  bird

In other words, I want to convert each individual key into its own DataFrame (for all of the keys), so I tried the following:

ABC_dataframe=pd.DataFrame(global_dict['ABC'])

When I do this, I get the following error:

TypeError: Expected list, got DataFrame

This is strange, because global_dict['ABC'] is a list (I checked with type(global_dict['ABC'])).

What can I do to fix this? I tried flattening the list, but I still run into problems.

Answer 1

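The most confusing part of your logic is that the values of global_dict can be either DataFrames or lists. Keep the object type consistent: choose lists, and append to the list each time you add a value.

A Pythonic solution is to use a collections.defaultdict of list objects: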
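To make the type problem concrete, here is a minimal sketch that replays the merge step from the question on small stand-in frames. The frames a, b and c and the single key 'abc' are illustrative assumptions, not the asker's data. After the second merge the value is a list whose first element is itself a list, which is why flattening by hand becomes necessary.

```python
import pandas as pd

# Hypothetical stand-ins for the per-year groups stored under one key.
a = pd.DataFrame({"ABC_1": ["cat", "dog"]})
b = pd.DataFrame({"ABC_3": ["beer", "wine"]})
c = pd.DataFrame({"ABC_5": ["lol", "lmao"]})

global_dict = {"abc": a}  # first year: the value is a plain DataFrame
for val in (b, c):        # second year, then third year
    # Same update as in the question: wrap the old value and the new one in a list.
    global_dict["abc"] = [global_dict["abc"], val]

value = global_dict["abc"]
print(type(value).__name__)     # outer container
print(type(value[0]).__name__)  # first element: the list built in the first merge
print(type(value[1]).__name__)  # second element: the last DataFrame added
```

The value ends up shaped like [[a, b], c], so the depth of nesting grows with every additional year that is merged in.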

from collections import defaultdict

global_dict = defaultdict(list, {k: [v] for k, v in dict_2013.items()})

for key,val in dict_2014.items():
    global_dict[key].append(val)

for key,val in dict_2015.items():
    global_dict[key].append(val)

Then use pd.concat along axis=1:

abc = pd.concat(global_dict['abc'], axis=1)

print(abc)

  ABC_1 ABC_2 ABC_3 ABC_4 ABC_5 ABC_6
0   cat   elf  beer   yes   lol   yea
1   dog   run  wine    no  lmao   NaN
2  bird  burp   gin   yes   asl   NaN

I can't explain why your desired result is missing ABC_6.
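Since the question asks for a DataFrame for every key, the same idea can be sketched end to end as below. This is one possible arrangement, not the answerer's exact code: the sample frames are retyped from the question, and split_by_prefix is a hypothetical helper that stands in for the groupby call used there, so the example does not depend on groupby along axis=1.

```python
from collections import defaultdict

import pandas as pd

# Sample frames retyped from the question (None marks the missing values).
GDS3 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"],
                     "ABC_2": ["elf", "run", "burp"],
                     "BBB_1": [123, 456, 789]})
GDS4 = pd.DataFrame({"ABC_3": ["beer", "wine", "gin"],
                     "ABC_4": ["yes", "no", "yes"],
                     "BCB_a": [234, 543, 743]})
GDS5 = pd.DataFrame({"ABC_5": ["lol", "lmao", "asl"],
                     "ABC_6": ["yea", None, None],
                     "BCD_c": [543, 446, 777]})


def split_by_prefix(df, n=3):
    """Hypothetical helper: map lower-cased column prefix -> sub-DataFrame."""
    cols_by_prefix = defaultdict(list)
    for col in df.columns:
        cols_by_prefix[col[:n].lower()].append(col)
    return {prefix: df[cols] for prefix, cols in cols_by_prefix.items()}


# Every value is a list of DataFrames, so appending never changes its type.
global_dict = defaultdict(list)
for frame in (GDS3, GDS4, GDS5):
    for key, part in split_by_prefix(frame).items():
        global_dict[key].append(part)

# One DataFrame per key, concatenated column-wise.
frames_by_key = {key: pd.concat(parts, axis=1)
                 for key, parts in global_dict.items()}

print(sorted(frames_by_key))
print(frames_by_key["abc"].columns.tolist())
```

Keeping the results in a dictionary such as frames_by_key, rather than creating one variable per key, means the code does not need to know the prefixes in advance.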

Answer 2

If GDS3, GDS4 and GDS5 are already DataFrames, you can use pd.concat and groupby:

tdf = pd.concat([GDS3, GDS4, GDS5], axis=1)

g = tdf.groupby(tdf.columns.str[:3], axis=1)

# Now, let's create a dictionary of dataframes grouped 
# by the first three letters of each column.

df_list = {}
for n, i in g:
    df_list[n] = i


print(df_list['ABC'])
print(df_list['BBB'])

Or, as @jpp suggested, use:

dict_dfs = dict(tuple(g))

print(dict_dfs['ABC'])
print(dict_dfs['BBB'])

Output:

  ABC_1 ABC_2 ABC_3 ABC_4 ABC_5 ABC_6
0   cat   elf  beer   yes   lol   yea
1   dog   run  wine    no  lmao   NaN
2  bird  burp   gin   yes   asl   NaN
   BBB_1
0    123
1    456
2    789
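One caveat, as far as I know: pandas 2.1 and later emit a deprecation warning for groupby(..., axis=1), so the grouping call may warn or stop working on newer versions. A sketch of an alternative that avoids groupby altogether by selecting columns with a boolean mask per prefix; the two small frames are a cut-down stand-in for the question's data, and the names prefixes and dict_dfs are only illustrative.

```python
import pandas as pd

# Cut-down stand-ins for the frames in the question.
GDS3 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"],
                     "BBB_1": [123, 456, 789]})
GDS4 = pd.DataFrame({"ABC_3": ["beer", "wine", "gin"],
                     "BCB_a": [234, 543, 743]})

tdf = pd.concat([GDS3, GDS4], axis=1)

# First three characters of every column name, aligned with tdf.columns.
prefixes = tdf.columns.str[:3]

# Pick the columns of each prefix with a boolean mask; no groupby involved.
dict_dfs = {p: tdf.loc[:, prefixes == p] for p in prefixes.unique()}

print(dict_dfs["ABC"])
print(dict_dfs["BBB"])
```

Because the columns are selected rather than transposed, each column keeps its original dtype.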

2 answers

Answer 1: 2, accepted, 2018-09-03 23:33:46

Answer 2: 2, 2018-09-03 23:42:00
