Updating a pandas df from a dict

Question

I'd like to update the 'Frequency' column in my df 'co_names_df_1' from the values in the dict 'counts':

counts:
Counter({u'Apple': 1638, u'Facebook': 1169, u'Amazon': 1027, u'Boeing': 548, u'Microsoft': 437, u'JPMorgan': 435, u'Nasdaq': 364, u'Williams': 296, u'Disney': 270, u'Netflix': 260, u'Chevron': 258, u'Comcast': 213, u'CBS': 200, u'Carnival': 193, u'Intel': 188, u'IBM': 172, u'Starbucks': 165, u'Target': 143, u'Monsanto': 141, u'PayPal': 133, u'Viacom': 126, u'Equifax': 124, u'Anthem': 123, u'Pfizer': 121, u'Nike': 121, u'Caterpillar': 119, u'Citigroup': 116, u'AIG': 116, u'HP': 109, u'Aetna': 109, u'BlackRock': 109 ...

co_names_df_1:
         Name          Frequency
0        3M            0
1        A.O. Smith    0
2        Abbott        0
3        AbbVie        0
4        Accenture     0
5        Activision    0
6        Acuity Brands 0 ...

Answer 1

The following iterates through the keys in counts and, for each key, sets Frequency in your dataframe co_names_df_1 to the associated count on every row whose Name equals that key.

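from collections import Counter

counts = Counter({u'Apple': 1638, u'Facebook': 1169, u'Amazon': 1027, u'Boeing': 548})

# co_names_df_1 is the dataframe from the question
for name in counts:
    # .loc with a boolean mask writes to the dataframe itself; the chained form
    # df['Frequency'][mask] = ... may write to a temporary copy instead
    co_names_df_1.loc[co_names_df_1['Name'] == name, 'Frequency'] = counts[name]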

Update:

Using pandas' .map() method as follows appears to run faster than the above for loop (at least on this small sample set of 4 key:value pairs). Note that it recomputes the whole column: because counts is a Counter, any Name without an entry in counts gets 0.

co_names_df_1['Frequency'] = co_names_df_1['Name'].map(counts)

Using %%time in a Jupyter notebook cell, the .map() approach takes ~488 µs to run, whereas the for loop approach takes ~1.24 s.
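For anyone who wants to check the comparison outside a notebook, here is a minimal sketch using timeit. The six-row sample frame and the helper names with_loop and with_map are made up for illustration, and the timings will depend on the machine and on the size of the real co_names_df_1.

```python
import timeit
from collections import Counter

import pandas as pd

counts = Counter({'Apple': 1638, 'Facebook': 1169, 'Amazon': 1027, 'Boeing': 548})

# Hypothetical sample frame; the real co_names_df_1 is much longer.
names = ['3M', 'Apple', 'Abbott', 'Amazon', 'Boeing', 'Facebook']
base = pd.DataFrame({'Name': names, 'Frequency': 0})


def with_loop(df):
    """Update matching rows one key at a time."""
    df = df.copy()
    for name in counts:
        df.loc[df['Name'] == name, 'Frequency'] = counts[name]
    return df


def with_map(df):
    """Rebuild the whole column with a single vectorised lookup."""
    df = df.copy()
    df['Frequency'] = df['Name'].map(counts)
    return df


loop_result = with_loop(base)
map_result = with_map(base)

# The two agree here because Frequency starts at 0 on every row.
same = loop_result['Frequency'].tolist() == map_result['Frequency'].tolist()

loop_seconds = timeit.timeit(lambda: with_loop(base), number=20)
map_seconds = timeit.timeit(lambda: with_map(base), number=20)
print(same, loop_seconds, map_seconds)
```

The loop scans the Name column once per key, so its cost grows with the number of keys times the number of rows, while .map() makes a single pass over the column.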

Answer 2

You could use Series.map:

import collections
import pandas as pd
c = collections.Counter({u'Apple': 1638, u'Facebook': 1169, u'Amazon': 1027, u'Boeing': 548, u'Microsoft': 437, u'JPMorgan': 435, u'Nasdaq': 364, u'Williams': 296, u'Disney': 270, u'Netflix': 260, u'Chevron': 258, u'Comcast': 213, u'CBS': 200, u'Carnival': 193, u'Intel': 188,
                         u'IBM': 172, u'Starbucks': 165, u'Target': 143, u'Monsanto': 141, u'PayPal': 133, u'Viacom': 126, u'Equifax': 124, u'Anthem': 123, u'Pfizer': 121, u'Nike': 121, u'Caterpillar': 119, u'Citigroup': 116, u'AIG': 116, u'HP': 109, u'Aetna': 109, u'BlackRock': 109})
df = pd.DataFrame({'Name': {0: '3M',
                            1: 'A.O. Smith',
                            2: 'Abbott',
                            3: 'AbbVie',
                            4: 'Accenture',
                            5: 'Activision',
                            6: 'Acuity Brands',
                            7: 'AIG'},
                   'Frequency': {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 10}})

df['Frequency'] = df['Name'].map(c)
print(df)

yields

            Name  Frequency
0             3M          0
1     A.O. Smith          0
2         Abbott          0
3         AbbVie          0
4      Accenture          0
5     Activision          0
6  Acuity Brands          0
7            AIG        116

I added a row to df to show a non-trivial result.

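Series.map(c) returns a new Series, so the assignment replaces the whole Frequency column. Because c is a Counter, which returns 0 for missing keys, every Name without an entry in c gets 0; with a plain dict those rows would become NaN instead. Here the unmatched rows were already 0, so only the AIG row visibly changes (from 10 to 116).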
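To make the missing-key behaviour concrete, here is a small sketch on a made-up three-row frame (the starting Frequency values 5, 7 and 10 are illustrative only). It compares a Counter with a plain dict and shows one way, assuming you want unmatched rows to keep their existing value, to update only the matching rows.

```python
from collections import Counter

import pandas as pd

c = Counter({'AIG': 116, 'Apple': 1638})
df = pd.DataFrame({'Name': ['3M', 'Abbott', 'AIG'],
                   'Frequency': [5, 7, 10]})

# Counter defines __missing__, so names without an entry map to 0.
from_counter = df['Name'].map(c)

# A plain dict has no default, so names without an entry map to NaN.
from_dict = df['Name'].map(dict(c))

# To keep the existing value where there is no match, fill the NaN
# entries from the original column, then restore the integer dtype.
preserved = from_dict.fillna(df['Frequency']).astype(int)

print(from_counter.tolist())  # [0, 0, 116]
print(preserved.tolist())     # [5, 7, 116]
```

Assigning preserved back to df['Frequency'] changes only the AIG row, whereas assigning from_counter resets the unmatched rows to 0.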

2 answers

Answer 1
0 2019-01-30 18:34:37

Answer 2
0 2019-01-30 18:47:07
