Issues converting lists from massive dictionary to dataframe
I created a dictionary in the following way.
The data looks like this:
GDS3:
ABC_1 ABC_2 BBB_1
cat elf 123
dog run 456
bird burp 789
GDS4:
ABC_3 ABC_4 BCB_a
beer yes 234
wine no 543
gin yes 743
GDS5:
ABC_5 ABC_6 BCD_c
lol yea 543
lmao NaN 446
asl NaN 777
#create a dictionary in which all columns that start with the same 3 characters will be grouped in the same key.
dict_2013 = {k: g for k, g in GDS3.groupby(by=lambda x: x[:3].lower(), axis=1)}
dict_2014 = {k: g for k, g in GDS4.groupby(by=lambda x: x[:3].lower(), axis=1)}
dict_2015 = {k: g for k, g in GDS5.groupby(by=lambda x: x[:3].lower(), axis=1)}
#start with year 2013:
global_dict=dict_2013
#if key in the new dictionary is in the old dictionary then
#add the values from the new dictionary key to the old dictionary key
#else if the new dictionary key does not exist in the old dictionary then add a new key with the new values
for key,val in dict_2014.items():
    if key in global_dict:
        global_dict[key]=[global_dict[key],val]
    else:
        global_dict[key]=val
for key,val in dict_2015.items():#to add items
    if key in global_dict:
        global_dict[key]=[global_dict[key],val]
    else:
        global_dict[key]=val
This is my desired output (a dataframe for each key):
df_ABC:
ABC_1 ABC_2 ABC_3 ABC_4 ABC_5
cat elf beer yes lol
dog run wine no lmao
bird burp gin yes asl
df_BBB:
BBB_1
cat
dog
bird
In other words, I want to convert each individual key into its own dataframe (FOR ALL OF THE KEYS), so I tried the following:
ABC_dataframe=pd.DataFrame(global_dict['ABC'])
When I do this, I get the following error:
TypeError: Expected list, got DataFrame
This is strange, because global_dict['ABC'] is a list. (I checked with type(global_dict['ABC']).)
What can I do to fix this? I have tried flattening the list, but I still run into problems.
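The most confusing part of your logic is that the values of global_dict can be either dataframes or lists.
Keep the object type consistent: choose a list, and append to it every time you add a value.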
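To make the mixed types concrete, here is a minimal sketch of what the merge loop from the question does to a single key. The three one-column frames are hypothetical stand-ins for the per-year groups, not the asker's real data.

```python
import pandas as pd

# Hypothetical stand-ins for the "abc" group of each year.
df_2013 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"]})
df_2014 = pd.DataFrame({"ABC_3": ["beer", "wine", "gin"]})
df_2015 = pd.DataFrame({"ABC_5": ["lol", "lmao", "asl"]})

# Same merge logic as in the question: wrap the old value and the new one in a list.
global_dict = {"abc": df_2013}
for new_dict in ({"abc": df_2014}, {"abc": df_2015}):
    for key, val in new_dict.items():
        if key in global_dict:
            global_dict[key] = [global_dict[key], val]
        else:
            global_dict[key] = val

value = global_dict["abc"]
print(type(value).__name__)     # list
print(type(value[0]).__name__)  # list (the result of the first merge, now nested)
print(type(value[1]).__name__)  # DataFrame
```

After the second pass the value is a list whose first element is itself a list, and any key that only appears in one year is still a bare dataframe. That inconsistency is most likely what pd.DataFrame trips over.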
The Pythonic solution is to use a collections.defaultdict of list objects:
from collections import defaultdict
global_dict = defaultdict(list, {k: [v] for k, v in dict_2013.items()})
for key,val in dict_2014.items():
    global_dict[key].append(val)
for key,val in dict_2015.items():
    global_dict[key].append(val)
Then use pd.concat along axis=1:
abc = pd.concat(global_dict['abc'], axis=1)
print(abc)
ABC_1 ABC_2 ABC_3 ABC_4 ABC_5 ABC_6
0 cat elf beer yes lol yea
1 dog run wine no lmao NaN
2 bird burp gin yes asl NaN
I can't explain why your desired result is missing ABC_6.
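Since the question asks for a dataframe FOR ALL OF THE KEYS, the same idea can be extended with a dict comprehension. This is only a sketch: the sample frames are rebuilt from the data in the question, and group_by_prefix is a hypothetical helper that splits columns by plain column selection instead of groupby, so it does not depend on how a given pandas version handles axis=1.

```python
from collections import defaultdict

import pandas as pd

# Sample data rebuilt from the question.
GDS3 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"],
                     "ABC_2": ["elf", "run", "burp"],
                     "BBB_1": [123, 456, 789]})
GDS4 = pd.DataFrame({"ABC_3": ["beer", "wine", "gin"],
                     "ABC_4": ["yes", "no", "yes"],
                     "BCB_a": [234, 543, 743]})
GDS5 = pd.DataFrame({"ABC_5": ["lol", "lmao", "asl"],
                     "ABC_6": ["yea", None, None],
                     "BCD_c": [543, 446, 777]})


def group_by_prefix(df, n=3):
    """Hypothetical helper: map each lowercased n-character prefix to its columns."""
    cols_by_prefix = defaultdict(list)
    for col in df.columns:
        cols_by_prefix[col[:n].lower()].append(col)
    return {prefix: df[cols] for prefix, cols in cols_by_prefix.items()}


# Every value is a list of dataframes, never a bare dataframe.
global_dict = defaultdict(list)
for year_df in (GDS3, GDS4, GDS5):
    for key, val in group_by_prefix(year_df).items():
        global_dict[key].append(val)

# One concatenated dataframe per key.
frames = {key: pd.concat(parts, axis=1) for key, parts in global_dict.items()}
print(frames["abc"])
print(frames["bbb"])
```

Keys that occur in only one year, such as bbb here, go through the same code path because their list simply has one element.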
If GDS3, GDS4 and GDS5 are already dataframes, you can use pd.concat and groupby:
tdf = pd.concat([GDS3, GDS4, GDS5], axis=1)
g = tdf.groupby(tdf.columns.str[:3], axis=1)
# Now, let's create a dictionary of dataframes grouped
# by the first three letters of each column.
df_list = {}
for n, i in g:
    df_list[n] = i
print(df_list['ABC'])
print(df_list['BBB'])
Or, as @jpp suggested, use:
dict_dfs = dict(tuple(g))
print(dict_dfs['ABC'])
print(dict_dfs['BBB'])
Output:
ABC_1 ABC_2 ABC_3 ABC_4 ABC_5 ABC_6
0 cat elf beer yes lol yea
1 dog run wine no lmao NaN
2 bird burp gin yes asl NaN
BBB_1
0 123
1 456
2 789
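One caveat: as far as I know, passing axis=1 to groupby is deprecated in recent pandas releases, so the column-wise grouping may warn or fail depending on your version. A minimal sketch of an alternative that builds the same kind of dictionary with a boolean column mask is shown below. The sample frames are rebuilt from the question, and the names prefixes and dict_dfs are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
GDS3 = pd.DataFrame({"ABC_1": ["cat", "dog", "bird"],
                     "ABC_2": ["elf", "run", "burp"],
                     "BBB_1": [123, 456, 789]})
GDS4 = pd.DataFrame({"ABC_3": ["beer", "wine", "gin"],
                     "ABC_4": ["yes", "no", "yes"],
                     "BCB_a": [234, 543, 743]})
GDS5 = pd.DataFrame({"ABC_5": ["lol", "lmao", "asl"],
                     "ABC_6": ["yea", None, None],
                     "BCD_c": [543, 446, 777]})

# Put all columns side by side, then take the first three characters of each name.
tdf = pd.concat([GDS3, GDS4, GDS5], axis=1)
prefixes = tdf.columns.str[:3]

# Select the columns of each prefix with a boolean mask instead of groupby(axis=1).
dict_dfs = {p: tdf.loc[:, prefixes == p] for p in prefixes.unique()}

print(dict_dfs["ABC"])
print(dict_dfs["BBB"])
```

The keys stay in their original case here ('ABC', 'BBB', ...), matching the lookups used above.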