More efficient way to assign group for each row in pandas

I have a data frame with more than 1000 columns and a predefined list of groups. I would like to compare each cell value with the boundaries of each group and create a new column that holds the name of the matching group. I have written nested for loops, but they take more than 5 minutes to run. Is there a more efficient way to achieve this? Thanks.

Here is my data frame

Frequency
21.0
18.0    
16.0    
10.0
10.0    
9.0    
10.0    
10.0      
5.0       
8.0 

And my predefined group list

> groups    
[(3, 5), (6, 10), (11, 30)]

What I would like to get is

Frequency   Group
21.0        11-30
18.0        11-30
16.0        11-30
10.0        6-10
10.0        6-10
9.0         6-10
10.0        6-10
10.0        6-10
5.0         3-5
8.0         6-10

Here is my code

for i in range(0, len(fre_table["Frequency"])):
    for j in range(0, len(groups)):
        if fre_table["Frequency"][i] >= groups[j][0] and fre_table["Frequency"][i] <= groups[j][1]:
            break
    fre_table['Group'][i] = "{}-{}".format(groups[j][0], groups[j][1])

Here is a timing comparison that shows the efficiency of the solution suggested by @BallpointBen in the comment section.

Data:

import numpy as np
import pandas as pd

fre_table = pd.DataFrame({'Index':[0,1,2,3,4,5,6,7,8,9],
             'Frequency':[21.0, 18.0, 16.0, 10.0, 10.0, 9.0, 10.0, 10.0, 5.0, 8.0]})
groups = [(3, 5), (6, 10), (11, 30)]

Time taken by the initial solution: 0.5420s

import timeit
start_time = timeit.default_timer()
fre_table['Group'] = ""  # placeholder; filled in row by row below
for i in range(len(fre_table)):
    for j in range(len(groups)):
        if groups[j][0] <= fre_table.loc[i, "Frequency"] <= groups[j][1]:
            break
    # .loc avoids chained assignment, which may silently write to a copy
    fre_table.loc[i, 'Group'] = "{}-{}".format(groups[j][0], groups[j][1])
elapsed_time = timeit.default_timer() - start_time
print(elapsed_time)

Time taken by the final solution: 0.0043s

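import timeit
start_time = timeit.default_timer()
# One interval per group; from_tuples defaults to closed='right', i.e. (3, 5], (6, 10], (11, 30]
bins = pd.IntervalIndex.from_tuples(groups)
# pd.cut assigns every value to its interval in a single vectorized call
fre_table['Group'] = pd.cut(fre_table['Frequency'], bins)
elapsed_time = timeit.default_timer() - start_time
print(elapsed_time)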
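
One caveat about the snippet above: with the default right-closed intervals, the Group column holds Interval labels such as (11, 30] rather than the "11-30" strings asked for in the question, and a value sitting exactly on a lower bound (3, 6 or 11) would come out as NaN. A minimal sketch of one way to handle both, assuming the group boundaries are meant to be inclusive on both ends as in the original loop (the names labels and binned are only illustrative):

```python
import pandas as pd

fre_table = pd.DataFrame(
    {"Frequency": [21.0, 18.0, 16.0, 10.0, 10.0, 9.0, 10.0, 10.0, 5.0, 8.0]}
)
groups = [(3, 5), (6, 10), (11, 30)]

# Assumption: both ends of every group are inclusive, like the >= / <= checks in the loop.
bins = pd.IntervalIndex.from_tuples(groups, closed="both")

# Build the "low-high" strings in the same order as the bins.
labels = ["{}-{}".format(lo, hi) for lo, hi in groups]

# pd.cut returns a categorical Series whose categories are the intervals;
# renaming the categories swaps each interval for its string label.
binned = pd.cut(fre_table["Frequency"], bins)
fre_table["Group"] = binned.cat.rename_categories(labels)

print(fre_table)
```

Values that fall between groups (5.5, for example) are left as NaN here, whereas the loop would silently give them the last group.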

That is more than 100 times faster (0.5420s vs. 0.0043s) on this 10-row sample!
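
A single run on 10 rows is a noisy measurement, so it may be worth repeating the comparison on a larger frame. The sketch below is one way to do that, not the benchmark used above: the 2,000-row synthetic frame, the helper names with_loop and with_cut, and the choice of closed="both" intervals are all assumptions made for illustration, and the loop is a .loc-based variant of the original.

```python
import timeit

import numpy as np
import pandas as pd

groups = [(3, 5), (6, 10), (11, 30)]
labels = ["{}-{}".format(lo, hi) for lo, hi in groups]

# Assumed synthetic data: 2,000 whole-number frequencies, each inside some group.
rng = np.random.default_rng(0)
allowed = np.array([v for lo, hi in groups for v in range(lo, hi + 1)], dtype=float)
big = pd.DataFrame({"Frequency": rng.choice(allowed, size=2000)})


def with_loop(df):
    """Row-by-row assignment, written with .loc instead of chained indexing."""
    out = df.copy()
    out["Group"] = None
    for i in range(len(out)):
        value = out.loc[i, "Frequency"]
        for (lo, hi), label in zip(groups, labels):
            if lo <= value <= hi:
                out.loc[i, "Group"] = label
                break
    return out


def with_cut(df):
    """Vectorized assignment with pd.cut over intervals closed on both ends."""
    out = df.copy()
    bins = pd.IntervalIndex.from_tuples(groups, closed="both")
    out["Group"] = pd.cut(out["Frequency"], bins).cat.rename_categories(labels)
    return out


# Take the best of three runs for each approach to reduce timing noise.
loop_s = min(timeit.repeat(lambda: with_loop(big), number=1, repeat=3))
cut_s = min(timeit.repeat(lambda: with_cut(big), number=1, repeat=3))
print("loop: {:.4f} s, pd.cut: {:.4f} s".format(loop_s, cut_s))
```

The exact numbers depend on the machine and the pandas version, so none are quoted here.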
