How to efficiently replace values in a pandas DataFrame?

Question

I have a large DataFrame (600k rows, 2 columns) named data. The second column holds a set of about 50k unique values spread throughout the data.

The data looks like this:

    image_id     term 
0   56127        23001  
1   56127        763003  
2   56127        51002  
3   26947        581007  
4   26947        14001  
5   26947        95000  
6   26947        92000  
7   26947        62004  
8   26947        224007
...600k more

On the other hand, I have a Series named terms_indexed whose index is made up of these 50k terms, like this:

            NewTerm
Term                  
23001          9100
763003          402
51002         10608
581007          900
14001         42107
95000           900
92000          4002
62004         42107
224007         9100
...50k more

I want to replace those values in the original DataFrame efficiently, using the Series with the indexed terms. So far I have done it with the following loop:

# Row-by-row lookup (slow): one label lookup and one assignment per row.
for i in range(data.shape[0]):
    data.loc[i, 'term'] = int(terms_indexed.loc[data.iloc[i, 1]])

However, this replacement takes a very long time: about 35 minutes on an Intel Core i7 with 8 GB of RAM. Is there a better way to do this operation? Thanks in advance.

Answer 1

If I understand your situation correctly, you can just do df['term'] = df['term'].map(terms_indexed), where df is your DataFrame (data in the question). Doing series1.map(series2) "translates" series1 by using each of its values as an index label to look up in series2.
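
As a rough illustration of the idea, here is a minimal sketch on a few rows taken from the question. The small data and terms_indexed objects below are stand-ins built by hand, assuming terms_indexed is a plain Series whose index holds the old terms and whose values hold the new ones.

```python
import pandas as pd

# Small stand-in for the 600k-row DataFrame from the question.
data = pd.DataFrame({
    "image_id": [56127, 56127, 56127, 26947, 26947],
    "term": [23001, 763003, 51002, 581007, 14001],
})

# Lookup Series: the index holds the old terms, the values hold the new terms.
terms_indexed = pd.Series(
    [9100, 402, 10608, 900, 42107],
    index=pd.Index([23001, 763003, 51002, 581007, 14001], name="Term"),
    name="NewTerm",
)

# Vectorized lookup: each value in data["term"] is used as a label
# into terms_indexed, replacing the explicit Python loop.
data["term"] = data["term"].map(terms_indexed)

print(data)
```

Note that any term missing from the index of terms_indexed comes back as NaN, which also turns the column into floats, so it may be worth checking data["term"].isna().any() after mapping.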
