How can I convert a Pandas DataFrame to a three-level nested dictionary?
How can I convert a Pandas DataFrame to a three-level nested dictionary using column names?
The key columns are not the first three columns. I want to group by column artist, then by column album, and the keys need to be case-insensitive, preferably without using defaultdict.
This is a minimal reproducible example:
from collections import defaultdict
from itertools import product
from pandas import DataFrame

# Target structure: tree[a][b][c] -> [d, e, f]
tree = defaultdict(lambda: defaultdict(dict))
# Empty DataFrame with three string columns and three integer columns
columns = {'a': str(), 'b': str(), 'c': str(), 'd': int(), 'e': int(), 'f': int()}
df = DataFrame(columns, index=[])
for i, j, k in product('abcd', repeat=3):
    tree[i][j][k] = list(map('abcd'.index, (i, j, k)))
    # Append the same data as a new row of df
    df.loc[len(df)] = [i, j, k, *list(map('abcd'.index, (i, j, k)))]
How can I get a nested dictionary similar to tree from df?
I am really sorry that I can't provide any actual examples, because they wouldn't be minimal.
I tried to use .groupby(), but I have only ever seen it used with one column, and I really don't know what to do with the pandas.core.groupby.generic.DataFrameGroupBy object it returns; I just started using it today.
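For reference, a DataFrameGroupBy object can be iterated directly, and grouping by a list of columns yields one (key tuple, sub-frame) pair per distinct key combination. The following is only a minimal sketch on hypothetical toy data (the values 'x', 'y', 'p', 'q' are made up for illustration); it shows the shape of what .groupby() returns rather than a full solution:

```python
import pandas as pd

# Hypothetical toy data; only the column names follow the example above.
df = pd.DataFrame({
    'a': ['x', 'x', 'y'],
    'b': ['p', 'q', 'p'],
    'd': [1, 2, 3],
})

# Grouping by two columns: each key is a tuple (a, b),
# and each group is the sub-DataFrame of rows with that key.
groups = {}
for (a, b), group in df.groupby(['a', 'b']):
    groups[(a, b)] = group['d'].tolist()

print(groups)
```

Each key tuple could then be unpacked into nested dictionary levels in the same way as the loop-based version.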
Currently I can do this:
tree1 = dict()
for index, row in df.iterrows():
    # Create the first two levels on demand, using lowercased keys
    if not tree1.get(row['a'].lower()):
        tree1[row['a'].lower()] = dict()
    if not tree1[row['a'].lower()].get(row['b'].lower()):
        tree1[row['a'].lower()][row['b'].lower()] = dict()
    tree1[row['a'].lower()][row['b'].lower()][row['c'].lower()] = [row['d'], row['e'], row['f']]
I actually implemented case-insensitive str and dict classes, but for the sake of brevity (they are very long) I won't use them here.
But according to this answer, https://stackoverflow.com/a/55557758/16383578, iterating over rows like this is bad practice. What is a better way?
I would probably do it like this:
cols = ['a', 'b', 'c']
# Casefold the key columns so that lookups are case-insensitive
for col in cols:
    df[col] = df[col].str.casefold()
tree = {}
# After set_index and transpose, each column is one (a, b, c) key
for (a, b, c), values in (df.set_index(cols).T.to_dict(orient='list')
                          .items()):
    tree.setdefault(a, {}).setdefault(b, {})[c] = values
or
...
# Same preamble as above; collect each row's values into a list instead
for (a, b, c), values in (df.set_index(cols).apply(list, axis=1)
                          .to_dict()).items():
    tree.setdefault(a, {}).setdefault(b, {})[c] = values
This produces the same result (when the first part, which casefolds the key columns, is included):
def to_dict(df):
    # Map the first column to the second column of a two-column frame
    return df.set_index(df.columns[0]).iloc[:, 0].to_dict()

df['values'] = df[['d', 'e', 'f']].apply(list, axis=1)
df = df[['a', 'b', 'c', 'values']]
tree = (df.set_index(['a', 'b'])
          .groupby(['a', 'b']).apply(to_dict)
          .reset_index('b')
          .groupby('a').apply(to_dict)
          .to_dict())
but I think it's a bit too convoluted.但我认为这有点太复杂了。
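If the number of key levels varies, the setdefault idea can be wrapped in a small helper. This is only a sketch under assumptions: the function name nest and its parameters key_cols and value_cols are hypothetical, the sample data is made up, and every key column is assumed to hold strings. It casefolds keys on the fly instead of modifying df:

```python
import pandas as pd

def nest(df, key_cols, value_cols):
    """Hypothetical helper: nested dict keyed by key_cols (casefolded),
    with a list of the value_cols entries at each leaf."""
    tree = {}
    # Plain tuples of key values, one per row
    keys = df[key_cols].itertuples(index=False, name=None)
    # One Python list of values per row
    values = df[value_cols].to_numpy().tolist()
    for key, value in zip(keys, values):
        node = tree
        # Walk (and create) every level except the last
        for part in key[:-1]:
            node = node.setdefault(part.casefold(), {})
        node[key[-1].casefold()] = value
    return tree

# Made-up sample data with mixed-case keys
df = pd.DataFrame({
    'a': ['X', 'x', 'y'],
    'b': ['p', 'Q', 'p'],
    'c': ['m', 'n', 'M'],
    'd': [1, 2, 3],
    'e': [4, 5, 6],
    'f': [7, 8, 9],
})
tree = nest(df, ['a', 'b', 'c'], ['d', 'e', 'f'])
print(tree)
```

Because the keys are selected by name, it does not matter where the key columns sit in the DataFrame.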
Results:
{'a': {'a': {'a': [0, 0, 0], 'b': [0, 0, 1], 'c': [0, 0, 2], 'd': [0, 0, 3]},
       'b': {'a': [0, 1, 0], 'b': [0, 1, 1], 'c': [0, 1, 2], 'd': [0, 1, 3]},
       'c': {'a': [0, 2, 0], 'b': [0, 2, 1], 'c': [0, 2, 2], 'd': [0, 2, 3]},
       'd': {'a': [0, 3, 0], 'b': [0, 3, 1], 'c': [0, 3, 2], 'd': [0, 3, 3]}},
 'b': {'a': {'a': [1, 0, 0], 'b': [1, 0, 1], 'c': [1, 0, 2], 'd': [1, 0, 3]},
       'b': {'a': [1, 1, 0], 'b': [1, 1, 1], 'c': [1, 1, 2], 'd': [1, 1, 3]},
       'c': {'a': [1, 2, 0], 'b': [1, 2, 1], 'c': [1, 2, 2], 'd': [1, 2, 3]},
       'd': {'a': [1, 3, 0], 'b': [1, 3, 1], 'c': [1, 3, 2], 'd': [1, 3, 3]}},
 'c': {'a': {'a': [2, 0, 0], 'b': [2, 0, 1], 'c': [2, 0, 2], 'd': [2, 0, 3]},
       'b': {'a': [2, 1, 0], 'b': [2, 1, 1], 'c': [2, 1, 2], 'd': [2, 1, 3]},
       'c': {'a': [2, 2, 0], 'b': [2, 2, 1], 'c': [2, 2, 2], 'd': [2, 2, 3]},
       'd': {'a': [2, 3, 0], 'b': [2, 3, 1], 'c': [2, 3, 2], 'd': [2, 3, 3]}},
 'd': {'a': {'a': [3, 0, 0], 'b': [3, 0, 1], 'c': [3, 0, 2], 'd': [3, 0, 3]},
       'b': {'a': [3, 1, 0], 'b': [3, 1, 1], 'c': [3, 1, 2], 'd': [3, 1, 3]},
       'c': {'a': [3, 2, 0], 'b': [3, 2, 1], 'c': [3, 2, 2], 'd': [3, 2, 3]},
       'd': {'a': [3, 3, 0], 'b': [3, 3, 1], 'c': [3, 3, 2], 'd': [3, 3, 3]}}}
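One way to check a result like this against the expected structure is to rebuild the sample data and compare. The sketch below is self-contained and makes two assumptions of its own: it constructs the DataFrame from a list of rows rather than appending with df.loc, and it builds the expected tree with a plain dict comprehension instead of defaultdict. The names letters, rows and expected are introduced only for this illustration:

```python
from itertools import product

import pandas as pd

letters = 'abcd'
# One row per (i, j, k) combination: three keys, then their indices
rows = [[i, j, k, *map(letters.index, (i, j, k))]
        for i, j, k in product(letters, repeat=3)]
df = pd.DataFrame(rows, columns=list('abcdef'))

cols = ['a', 'b', 'c']
tree = {}
# set_index + transpose approach: each column is one (a, b, c) key
for (a, b, c), values in df.set_index(cols).T.to_dict(orient='list').items():
    tree.setdefault(a, {}).setdefault(b, {})[c] = values

# Expected structure, built without pandas
expected = {i: {j: {k: [letters.index(x) for x in (i, j, k)]
                    for k in letters}
                for j in letters}
            for i in letters}

assert tree == expected
```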