pandas to dict: to_dict does not store all values

I have a dataframe df with 40000 rows:

              0  bin
0      4.506840  4-5
1      4.506840  4-5
2      4.444245  4-5
3      4.485975  4-5
4      4.527705  4-5
...         ...  ...
39995  6.572475  6-7
39996  6.697665  6-7
39997  6.322095  6-7
39998  6.322095  6-7
39999  6.676800  6-7

It stores, for every number in column '0', the interval (bin) it belongs to. I want to convert it to a dict with:

dict(zip(df[0],df.bin))

to get an output like:

{4.506840: '4-5', 4.506840: '4-5', 4.444245: '4-5', ... }

So I want to store every value from column '0' together with the bin it belongs to. Somehow my dict has a length of 340, not 40000, so it doesn't store all of the rows. My question is: why? And how do I get all 40000 rows into the dict? Cheers!

Perhaps that column contains multiple values that are the same. Although Python lets a dict hold the same value many times, it cannot hold the same key more than once. I would suggest either not using a dict, or adding some identifier to each duplicate in the df before converting it, so that every key is distinct. Another way would be to split the df where the duplicates are, or to store all the duplicates in a list, although I'm not sure this is what you want.
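To make the point above concrete, here is a minimal sketch. It assumes a five-row stand-in for the real frame (the values are copied from the first rows shown in the question), and the names as_dict, pairs and by_row are only illustrative. It shows that the dict ends up with one entry per distinct value in column 0, and two ways to keep every row.

```python
import pandas as pd

# Hypothetical five-row stand-in for the 40000-row frame in the question.
df = pd.DataFrame({
    0: [4.506840, 4.506840, 4.444245, 4.485975, 4.527705],
    "bin": ["4-5", "4-5", "4-5", "4-5", "4-5"],
})

# A dict keeps one entry per distinct key; a repeated key overwrites
# the entry that was stored earlier.
as_dict = dict(zip(df[0], df["bin"]))
print(len(df), len(as_dict), df[0].nunique())  # 5 4 4

# A list of (value, bin) tuples keeps one entry per row, duplicates included.
pairs = list(zip(df[0], df["bin"]))

# Keying on the row index (unique in this frame) also keeps every row.
by_row = {
    idx: (value, label)
    for idx, value, label in zip(df.index, df[0], df["bin"])
}
print(len(pairs), len(by_row))  # 5 5
```

If the same reasoning holds for the full frame, df[0].nunique() should return 340 there, matching the length of the dict.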

Because df[0] contains duplicates, and a Python dictionary cannot hold the same key twice, you can collect the bins for each value in a list:

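result = {}
for i_0, i_bin in zip(df[0], df.bin):
    # First time this value is seen: start an empty list for it
    if i_0 not in result:
        result[i_0] = []
    # Record the bin of this row, so duplicate values are kept
    result[i_0].append(i_bin)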

Output:

{
    4.506840: ["4-5", "4-5"],
    4.444245: ["4-5"],
    ...
}

It depends on what you want to achieve, but this is one way to preserve all the values.

Edit:

As per @anky's comment, you can use a pandas aggregation function to do the same thing instead of the loop. It also performs better:

df.groupby(0)['bin'].agg(list).to_dict()
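As a quick check that the groupby one-liner and the loop give the same mapping, here is a small sketch. The five-row frame is a hypothetical stand-in built from values shown in the question, and the names looped and grouped are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the real frame, with repeated values in column 0.
df = pd.DataFrame({
    0: [4.506840, 4.506840, 4.444245, 6.322095, 6.322095],
    "bin": ["4-5", "4-5", "4-5", "6-7", "6-7"],
})

# Loop version: collect every bin label under the value it belongs to.
looped = {}
for value, label in zip(df[0], df["bin"]):
    looped.setdefault(value, []).append(label)

# groupby version: one list of bin labels per distinct value in column 0.
grouped = df.groupby(0)["bin"].agg(list).to_dict()

# Dict comparison ignores key order, so this checks the contents only.
print(grouped == looped)
```

One difference to keep in mind: by default groupby leaves out rows whose key is NaN, so the two versions can diverge if column 0 has missing values.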
