Insert Values into a Dictionary From a DF Column - Pandas (Python)
I have a list of values that I am trying to match against a column of a pandas DataFrame. I would then like to create a dictionary whose keys are the list values and whose values come from a different column of the DataFrame.
This is my list:
sample_list = [101,105,112]
My DataFrame:
import pandas as pd
sample_df = pd.DataFrame([[101, "NJ"], [105, "CA"], [111, "MO"], [101, "NJ"], [112, "NB"], [101, "NJ"], [105, "CA"]],
                         columns=["Col1", "Col2"])
It looks like this:
Col1 Col2
0 101 NJ
1 105 CA
2 111 MO
3 101 NJ
4 112 NB
5 101 NJ
6 105 CA
Now I am trying to iterate over the list values (which are the keys of my new_dict), match them against Col1, and, where they match, extract the Col2 values as my dictionary values. This is the code I have so far:
new_dict = {}
for value in sample_list:
    for i in sample_df['Col1']:
        if value == i:
            new_dict[value] = [i for i in sample_df['Col2']]
However, my new_dict looks like this:
{101: ['NJ', 'CA', 'MO', 'NJ', 'NB', 'NJ', 'CA'],
105: ['NJ', 'CA', 'MO', 'NJ', 'NB', 'NJ', 'CA'],
112: ['NJ', 'CA', 'MO', 'NJ', 'NB', 'NJ', 'CA']}
I need my output to look like this:
{101: ['NJ'],
105: ['CA'],
112: ['NB']}
How can I get my desired output? Any help would be appreciated.
Do it like this:
new_dict = {i: [sample_df[sample_df['Col1']==i]['Col2'].values[0]] for i in sample_list}
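One caveat with the one-liner above: `.values[0]` takes the first matching row, so it raises an IndexError if a list value never appears in Col1. A minimal sketch of a guarded variant follows; the extra value 999 is a hypothetical addition, used only to show the missing-key case.

```python
import pandas as pd

# 999 is a hypothetical value that does not occur in Col1
sample_list = [101, 105, 112, 999]
sample_df = pd.DataFrame(
    [[101, "NJ"], [105, "CA"], [111, "MO"], [101, "NJ"],
     [112, "NB"], [101, "NJ"], [105, "CA"]],
    columns=["Col1", "Col2"],
)

new_dict = {}
for key in sample_list:
    # All Col2 values on rows where Col1 equals the current key
    matches = sample_df.loc[sample_df["Col1"] == key, "Col2"]
    if not matches.empty:
        # Keep only the first match, wrapped in a list as in the desired output
        new_dict[key] = [matches.iloc[0]]

print(new_dict)
```

Here keys with no matching row are simply skipped. Whether to skip them, store an empty list, or raise an error depends on what the calling code expects.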
If you insist, here is another solution that should be efficient: use isin() to create a mask that filters out the unwanted rows.
m = sample_df['Col1'].isin(sample_list)
sample_df[m].drop_duplicates().groupby('Col1')['Col2'].apply(list).to_dict()
Returns: {101: ['NJ'], 105: ['CA'], 112: ['NB']}
Note: if a key has more than one distinct Col2 value, all of them will be in the list too. If you only want the first, use:
{k:[v] for k,v in sample_df[m].groupby('Col1')['Col2'].first().items()}
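The difference described in the note only shows up when a key maps to more than one distinct value, which the question's data never does. The sketch below uses a hypothetical frame (not the asker's data) in which 101 appears with both "NJ" and "NY".

```python
import pandas as pd

sample_list = [101, 105, 112]
# Hypothetical data: 101 appears with two different Col2 values
df = pd.DataFrame(
    [[101, "NJ"], [105, "CA"], [101, "NY"], [112, "NB"], [101, "NJ"]],
    columns=["Col1", "Col2"],
)
m = df["Col1"].isin(sample_list)

# drop_duplicates keeps each distinct (Col1, Col2) pair once
all_unique = df[m].drop_duplicates().groupby("Col1")["Col2"].apply(list).to_dict()

# first() keeps only the first Col2 value seen for each key
first_only = {k: [v] for k, v in df[m].groupby("Col1")["Col2"].first().items()}

print(all_unique)   # {101: ['NJ', 'NY'], 105: ['CA'], 112: ['NB']}
print(first_only)   # {101: ['NJ'], 105: ['CA'], 112: ['NB']}
```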
If you want only one item per key rather than all of them, why not store just the values instead of lists?
m = sample_df['Col1'].isin(sample_list)
sample_df[m].set_index('Col1')['Col2'].to_dict()
Returns: {101: 'NJ', 105: 'CA', 112: 'NB'}
Or, if you want all the items:
m = sample_df['Col1'].isin(sample_list)
sample_df[m].groupby('Col1')['Col2'].apply(list).to_dict()
Returns: {101: ['NJ', 'NJ', 'NJ'], 105: ['CA', 'CA'], 112: ['NB']}
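For completeness, the loop from the question can also be repaired directly. It produced the whole column for every key because the comprehension `[i for i in sample_df['Col2']]` runs over all of Col2 regardless of which row matched. A minimal sketch of one possible fix, pairing the two columns with zip and skipping repeated values (the `bucket` name is just illustrative):

```python
import pandas as pd

sample_list = [101, 105, 112]
sample_df = pd.DataFrame(
    [[101, "NJ"], [105, "CA"], [111, "MO"], [101, "NJ"],
     [112, "NB"], [101, "NJ"], [105, "CA"]],
    columns=["Col1", "Col2"],
)

new_dict = {}
for value in sample_list:
    # Walk the two columns in step so each Col1 value stays paired with its Col2 value
    for col1, col2 in zip(sample_df["Col1"], sample_df["Col2"]):
        if value == col1:
            # Collect only the Col2 value from the matching row, skipping repeats
            bucket = new_dict.setdefault(value, [])
            if col2 not in bucket:
                bucket.append(col2)

print(new_dict)  # {101: ['NJ'], 105: ['CA'], 112: ['NB']}
```

This scans the whole frame once per key, so on large data the isin() and groupby() approaches above are likely the better choice.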