I have a large DataFrame (600k rows × 2 columns) named data. Its second column contains about 50k unique values spread throughout the frame.
The data looks like this:
   image_id    term
0     56127   23001
1     56127  763003
2     56127   51002
3     26947  581007
4     26947   14001
5     26947   95000
6     26947   92000
7     26947   62004
8     26947  224007
... (600k rows in total)
I also have a Series named terms_indexed whose index consists of these 50k terms:
        NewTerm
Term
23001      9100
763003      402
51002     10608
581007      900
14001     42107
95000       900
92000      4002
62004     42107
224007     9100
... (50k entries in total)
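For anyone who wants to reproduce the problem, here is a small sketch that rebuilds only the rows shown above. Treating terms_indexed as a Series named NewTerm with an index named Term is an assumption read off the printed header; the real objects have 600k and 50k entries.

```python
import pandas as pd

# First rows of `data` as printed above (the real frame has ~600k rows).
data = pd.DataFrame({
    "image_id": [56127, 56127, 56127, 26947, 26947, 26947, 26947, 26947, 26947],
    "term": [23001, 763003, 51002, 581007, 14001, 95000, 92000, 62004, 224007],
})

# Lookup table: old term (index) -> new term (value).
# The names "Term" and "NewTerm" are assumed from the printed header.
terms_indexed = pd.Series(
    [9100, 402, 10608, 900, 42107, 900, 4002, 42107, 9100],
    index=pd.Index(
        [23001, 763003, 51002, 581007, 14001, 95000, 92000, 62004, 224007],
        name="Term",
    ),
    name="NewTerm",
)
```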
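I want to efficiently replace the values in the term column of the original DataFrame with the corresponding values from terms_indexed. So far I have done it with the following loop:
for i in range(data.shape[0]):
    # look up the term of row i in terms_indexed, one row at a time
    # (the original used .ix, which was removed in pandas 1.0; .loc does the label lookup)
    data.loc[i, 'term'] = int(terms_indexed.loc[data.iloc[i, 1]])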
However, this replacement takes a very long time: about 35 minutes on an Intel Core i7 with 8 GB of RAM. Is there a faster way to do this? Thanks in advance.
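If I understand your situation correctly, you can simply do data['term'] = data['term'].map(terms_indexed). Calling series1.map(series2) "translates" series1 by using each of its values as an index label into series2 and returning the value found there, so the whole column is replaced in one call instead of a Python-level loop over 600k rows. Values that have no matching label in series2's index become NaN.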
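A minimal runnable sketch of the map approach on a few of the sample rows. The term 999999 is hypothetical, added only to show that Series.map returns NaN for values missing from the lookup index; the fillna step that keeps the original term in that case is one possible choice, not part of the answer above.

```python
import pandas as pd

# Small lookup table taken from the sample above: old term -> new term.
terms_indexed = pd.Series(
    [9100, 402, 10608],
    index=pd.Index([23001, 763003, 51002], name="Term"),
    name="NewTerm",
)

data = pd.DataFrame({
    "image_id": [56127, 56127, 56127, 26947],
    # 999999 is a hypothetical term that does not appear in terms_indexed
    "term": [23001, 763003, 51002, 999999],
})

# Each value in data["term"] is used as a label into terms_indexed.
mapped = data["term"].map(terms_indexed)

# Unmatched terms come back as NaN (which also makes the column float);
# keep the original term for those rows and restore the integer dtype.
data["term"] = mapped.fillna(data["term"]).astype("int64")

print(data["term"].tolist())  # [9100, 402, 10608, 999999]
```

If every term is guaranteed to be present in terms_indexed, the fillna/astype step can be dropped and the plain one-liner from the answer is enough.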