Pandas DataFrame from a dictionary of list values

Question

I have a dictionary whose values are lists, for example:

cols = {'animals':['dog','cat','fish'],
        'colors':['red','black','blue','dog']}

I want to convert it to a DataFrame in which every list item is listed against its key, resulting in

key variable
animals dog
animals cat
animals fish
colors red
colors black
colors blue
colors dog

So far I have tried this, but it does not give me the expected result:

cols_df = pd.DataFrame.from_dict(cols, orient='index')

How can I modify it to achieve the above?

Answer 1

This may not be the fastest solution, and it requires building two additional lists.

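import pandas as pd

d = {'animals': ['dog','cat','fish'],
     'colors': ['red','black','blue','dog']}

# Repeat each key once per item in its list, in the same order as the values.
keys = [k for k in d.keys() for v in d[k]]
# Flatten all the lists into one sequence.
values = [v for k in d.keys() for v in d[k]]
pd.DataFrame.from_dict({'index': keys, 'values': values})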

Answer 2

No additional imports, and it works for all inputs:

>>> pd.DataFrame([(key, var) for (key, L) in cols.items() for var in L], 
                 columns=['key', 'variable'])

       key variable
0  animals      dog
1  animals      cat
2  animals     fish
3   colors      red
4   colors    black
5   colors     blue
6   colors      dog
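
As a quick check of the "works for all inputs" claim, here is a small sketch that wraps the same comprehension in a helper and feeds it lists of unequal length, including an empty one. The helper name dict_to_long and the sample data are made up for illustration and are not part of the original answer.

```python
import pandas as pd

def dict_to_long(mapping, key_name="key", value_name="variable"):
    """Hypothetical helper: build one (key, item) row per list item."""
    rows = [(key, item) for key, items in mapping.items() for item in items]
    return pd.DataFrame(rows, columns=[key_name, value_name])

# Lists of different lengths; the empty list simply contributes no rows.
sample = {"animals": ["dog", "cat", "fish"], "colors": ["red"], "empty": []}
long_df = dict_to_long(sample)
print(long_df)
```

Because the rows are built one pair at a time, nothing needs padding, which is why unequal lengths are not a problem here.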

Answer 3

Using itertools.chain and itertools.repeat:

import pandas as pd
from itertools import chain, repeat

chainer = chain.from_iterable

d = {'animals': ['dog', 'cat', 'fish'],
     'colors': ['red', 'black', 'blue', 'dog']}

df = pd.DataFrame({'key': list(chainer(repeat(k, len(v)) for k, v in d.items())),
                   'variable': list(chainer(d.values()))})

print(df)

       key variable
0  animals      dog
1  animals      cat
2  animals     fish
3   colors      red
4   colors    black
5   colors     blue
6   colors      dog

Answer 4

pd.DataFrame.from_dict(cols, orient='index').T.unstack().dropna().reset_index(level=1,drop=True)

animals      dog
animals      cat
animals     fish
colors       red
colors     black
colors      blue
colors       dog

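We first need to pad the lists in cols to the same length, because from_dict(..., orient='columns') fails when they differ. There are two ways to do that:

pd.DataFrame.from_dict(cols, orient='index').T is an undocumented trick I found in this answer; building the frame with orient='index' fills the missing cells so that the result is rectangular, and the transpose then turns the keys into columns
The manual alternative is to work out how many cells each list needs to be padded by, e.g. using max(len(c) for c in cols.values()), and then pad with something like df_cols.apply(pd.Series.pad, max(len(c) for c in cols.values())) ... cols['animals'].append(np.nan)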
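
The manual alternative can be sketched roughly as follows. This is only one way to do the padding, not the answer's exact code: the names width and padded are introduced here for illustration, and the lists are copied instead of being appended to in place.

```python
import numpy as np
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

# Length of the longest list; every shorter list is padded up to it.
width = max(len(c) for c in cols.values())

# Build padded copies so the original lists are left untouched.
padded = {k: list(v) + [np.nan] * (width - len(v)) for k, v in cols.items()}

# All columns now have equal length, so the plain constructor accepts them.
df_cols = pd.DataFrame(padded)

# Same tail as the one-liner: unstack, drop the padding, drop the position level.
result = df_cols.unstack().dropna().reset_index(level=1, drop=True)
print(result)
```

The result is a Series indexed by key, matching the output shown for the one-liner.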

Answer 5

You can use stack:

df = pd.DataFrame.from_dict(cols, orient='index')
df = df.stack().to_frame().reset_index().drop('level_1', axis=1)
df.columns = ['key', 'variable']

df

       key variable
0   colors      red
1   colors    black
2   colors     blue
3   colors      dog
4  animals      dog
5  animals      cat
6  animals     fish

Demo:

df = pd.DataFrame.from_dict(cols, orient='index')
df

           0      1     2     3
colors   red  black  blue   dog
animals  dog    cat  fish  None

df.stack() returns a Series, so it needs to be converted to a DataFrame with to_frame(). reset_index() is then applied to turn the index levels into columns.

df.stack().to_frame().reset_index()


   level_0  level_1      0
0   colors        0    red
1   colors        1  black
2   colors        2   blue
3   colors        3    dog
4  animals        0    dog
5  animals        1    cat
6  animals        2   fish

Now drop('level_1', axis=1) and set the column names to get the expected frame.
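
The steps above can also be written as a single chain. The sketch below is one possible variant, not the original answer's code. It adds an explicit dropna() as a safeguard, on the assumption that whether stack() discards the empty cell by itself may depend on the pandas version, and it names the columns through rename_axis and reset_index(name=...) instead of assigning df.columns.

```python
import pandas as pd

cols = {'animals': ['dog', 'cat', 'fish'],
        'colors': ['red', 'black', 'blue', 'dog']}

df = (
    pd.DataFrame.from_dict(cols, orient='index')  # one row per key, short rows padded
      .stack()                                    # Series indexed by (key, position)
      .dropna()                                   # discard the padding cell if it survived
      .reset_index(level=1, drop=True)            # drop the position level
      .rename_axis('key')                         # name the remaining index level
      .reset_index(name='variable')               # back to a two-column DataFrame
)
print(df)
```

On Python 3.7 and later the rows follow the insertion order of the dictionary, so animals comes before colors here.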

Answer 6

Use itertools.product (a cross product) to build a list of key/value pairs that can be loaded into a DataFrame:

import itertools
import pandas as pd

cols = {'animals':['dog','cat','fish'],
        'colors':['red','black','blue','dog']}

data = []
for key, values in cols.items():
    # product([key], values) pairs the key with each of its values
    data.extend(itertools.product([key], values))

df = pd.DataFrame(data, columns=['category','value'])
print(df)

Output:

  category  value
0  animals    dog
1  animals    cat
2  animals   fish
3   colors    red
4   colors  black
5   colors   blue
6   colors    dog
