How to efficiently replace values in a pandas DataFrame?

I have a large DataFrame of shape (600k, 2) named data. Its second column holds a set of about 50k unique values spread across the rows.

The data looks like this:

    image_id     term 
0   56127        23001  
1   56127        763003  
2   56127        51002  
3   26947        581007  
4   26947        14001  
5   26947        95000  
6   26947        92000  
7   26947        62004  
8   26947        224007
...600k more

I also have a Series named terms_indexed whose index consists of these 50k terms, like this:

            NewTerm
Term                  
23001          9100
763003          402
51002         10608
581007          900
14001         42107
95000           900
92000          4002
62004         42107
224007         9100
...50k more

I want to replace those values in the original DataFrame efficiently, using the Series indexed by term. So far I have done it with the following loop:

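for i in range(data.shape[0]):
    # Row-by-row lookup of each term in terms_indexed (this is the slow part).
    # .loc replaces .ix, which was removed in pandas 1.0.
    data.loc[i, 'term'] = int(terms_indexed.loc[data.iloc[i, 1]])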

However, this replacement takes a long time: about 35 minutes on an Intel Core i7 with 8 GB of RAM. Is there a better way to do this operation? Thanks in advance.

If I understand your situation correctly, you can just do df['term'] = df['term'].map(terms_indexed). Calling series1.map(series2) "translates" series1 by using its values as index labels into series2.
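As a minimal sketch of the map approach, the snippet below rebuilds the first few rows from the question by hand. The small data and terms_indexed objects are stand-ins for the real ones, and it assumes terms_indexed is a Series (if it is actually a one-column DataFrame, select the column first, as noted in the comment).

```python
import pandas as pd

# Small stand-in for the 600k-row DataFrame from the question.
data = pd.DataFrame({
    "image_id": [56127, 56127, 56127, 26947, 26947],
    "term": [23001, 763003, 51002, 581007, 14001],
})

# Lookup Series: the index holds the old terms, the values hold the new ones.
# Assumption: terms_indexed is a Series. If it is a one-column DataFrame,
# use terms_indexed["NewTerm"] in the map call instead.
terms_indexed = pd.Series(
    [9100, 402, 10608, 900, 42107],
    index=pd.Index([23001, 763003, 51002, 581007, 14001], name="Term"),
    name="NewTerm",
)

# One vectorized lookup replaces the whole Python-level loop.
data["term"] = data["term"].map(terms_indexed)

print(data["term"].tolist())  # [9100, 402, 10608, 900, 42107]
```

Note that any term missing from the index of terms_indexed comes back as NaN, which also turns the column into floats, so it may be worth checking data["term"].isna().any() after the call.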
