How to efficiently replace values in a pandas DataFrame?
I've got a large DataFrame of shape (600k, 2) named data, and its second column holds a set of 50k unique values distributed throughout the data.
The data looks like this:
image_id term
0 56127 23001
1 56127 763003
2 56127 51002
3 26947 581007
4 26947 14001
5 26947 95000
6 26947 92000
7 26947 62004
8 26947 224007
...600k more
On the other hand, I have a Series named terms_indexed whose index is composed of these 50k terms, like this:
NewTerm
Term
23001 9100
763003 402
51002 10608
581007 900
14001 42107
95000 900
92000 4002
62004 42107
224007 9100
...50k more
But I want to replace those values in the original DataFrame efficiently, using the Series with the indexed terms. So far I have done it with the following loop:
for i in range(data.shape[0]):
    data.loc[i, 'term'] = int(terms_indexed.ix[data.iloc[i][1]])
However, this replacement takes a very long time: about 35 minutes on an Intel Core i7 with 8 GB of RAM. I wanted to know if there is a better way to do this operation.
Thanks in advance
If I understand your situation right, you can just do
df['term'] = df['term'].map(terms_indexed)
Doing series1.map(series2) "translates" series1 by using its values as indexes into series2.
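To make that concrete, here is a minimal sketch of the map approach on a few rows copied from the question. The tiny data and terms_indexed objects below are stand-ins built by hand for illustration, not the real 600k-row data.

```python
import pandas as pd

# Small stand-in for the DataFrame from the question (first five rows only).
data = pd.DataFrame({
    "image_id": [56127, 56127, 56127, 26947, 26947],
    "term": [23001, 763003, 51002, 581007, 14001],
})

# Stand-in for terms_indexed: the index holds the old terms,
# the values hold the new terms.
terms_indexed = pd.Series(
    [9100, 402, 10608, 900, 42107],
    index=pd.Index([23001, 763003, 51002, 581007, 14001], name="Term"),
    name="NewTerm",
)

# One vectorized lookup instead of a Python-level loop:
# each value in data["term"] is used as a label into terms_indexed.
data["term"] = data["term"].map(terms_indexed)

print(data)
```

One thing to watch for: any term that is not present in the index of terms_indexed comes back as NaN, which also turns the column into floats. If that can happen in your data, check for it with data["term"].isna().any() before casting back to int.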