How to efficiently replace values in a pandas DataFrame?

Question

I have a large DataFrame of shape (600k, 2) named data. Its second column holds a set of about 50k unique values distributed across the rows.

The data looks like this:

    image_id     term 
0   56127        23001  
1   56127        763003  
2   56127        51002  
3   26947        581007  
4   26947        14001  
5   26947        95000  
6   26947        92000  
7   26947        62004  
8   26947        224007
...600k more

I also have a Series named terms_indexed whose index is made up of these 50k terms, like this:

            NewTerm
Term                  
23001          9100
763003          402
51002         10608
581007          900
14001         42107
95000           900
92000          4002
62004         42107
224007         9100
...50k more

I want to replace those values in the original DataFrame efficiently, using the Series indexed by term. So far I have done it with the following loop:

# Slow: one label lookup and one scalar assignment per row
for i in range(data.shape[0]):
    data.loc[i, 'term'] = int(terms_indexed.loc[data.iloc[i, 1]])

However, this replacement takes a very long time: about 35 minutes on an Intel Core i7 with 8 GB of RAM. Is there a better way to do this operation? Thanks in advance.

Answer 1

If I understand your situation correctly, you can just do df['term'] = df['term'].map(terms_indexed), where df is your DataFrame (data in the question). Doing series1.map(series2) "translates" series1 by using its values as index labels into series2.
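A minimal sketch of that approach, using a few of the rows shown in the question as stand-in data. It assumes terms_indexed is a Series with unique index labels, as the question describes; the small frame and Series built here are illustrative, not the asker's full data.

```python
import pandas as pd

# Small stand-in for the 600k-row frame from the question
data = pd.DataFrame({
    "image_id": [56127, 56127, 56127, 26947, 26947],
    "term": [23001, 763003, 51002, 581007, 14001],
})

# Lookup Series: the index holds the old terms, the values hold the new terms
terms_indexed = pd.Series(
    [9100, 402, 10608, 900, 42107],
    index=pd.Index([23001, 763003, 51002, 581007, 14001], name="Term"),
    name="NewTerm",
)

# Vectorized lookup over the whole column; replaces the row-by-row loop
data["term"] = data["term"].map(terms_indexed)

new_terms = data["term"].tolist()
print(new_terms)
```

Any value in the term column that has no matching label in terms_indexed comes back as NaN, so it may be worth checking data["term"].isna().any() afterwards. If terms_indexed is really a one-column DataFrame (the printout in the question shows a NewTerm header), selecting that column first, for example terms_indexed["NewTerm"], gives a Series that map accepts.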

Answer 1
4 · accepted · 2014-09-03 21:11:50