
Computing using two dataframes in Pandas

I'm trying to compute the following.

Given df1, a dataframe that holds the character speed (char_speed) of each subtitle that starts at start_time and ends at end_time:

  char_speed  start_time  end_time
0         34           3        15
1         19          15        21
2          9          21        28
...

and

df2, a dataframe that holds the user's listening log, where each row starts at start_time, ends at end_time, and records the playback speed the user listened at during that interval:

  start_time  end_time  speed
0       9.23    20.929    1.0 
1        1.4     20.26    1.5
2       20.0      27.6   1.25
...

then compute the total character count during each interval:

  start_time  end_time  speed  total_char
0       9.23    20.929    1.0        
1        1.4     20.26    1.5
2       20.0      27.6   1.25
... 

For example, df2['total_char'].iloc[0] would be

((15-9.23)*34) + ((20.929-15)*19) 

because, within the period 9.23 ~ 20.929,

during 9.23 ~ 15, the char_speed is 34,

during 15 ~ 20.929, the char_speed is 19

and df2['total_char'].iloc[1] would be

(3-1.4)*0 + ((15-3)*34) + ((20.26-15)*19)

because, within the period 1.4 ~ 20.26,

during 1.4 ~ 3, no char_speed is found in df1, so it counts as 0,

during 3 ~ 15, the char_speed is 34,

during 15 ~ 20.26, the char_speed is 19

I'm new to Pandas and have recently been impressed by how much it can do in short, simple code, but I'm not sure whether this computation can be written that concisely. Right now, the only way I can think of doesn't use any Pandas functions: take each row of df2, search through every row of df1, and compute the total from there.
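For reference, the row-by-row approach described here could be sketched roughly as follows. The function name total_chars_loop and the list totals_loop are hypothetical, and the sample frames contain only the three rows shown in the question (the elided rows are left out).

```python
import pandas as pd

# Sample data: only the rows shown in the question.
df1 = pd.DataFrame({"char_speed": [34, 19, 9],
                    "start_time": [3, 15, 21],
                    "end_time": [15, 21, 28]})
df2 = pd.DataFrame({"start_time": [9.23, 1.4, 20.0],
                    "end_time": [20.929, 20.26, 27.6],
                    "speed": [1.0, 1.5, 1.25]})

def total_chars_loop(log_row, subtitles):
    """Sum (overlap length * char_speed) over every subtitle row."""
    total = 0.0
    for sub in subtitles.itertuples(index=False):
        # Length of the intersection of the two intervals;
        # zero or negative means they do not overlap.
        overlap = (min(log_row["end_time"], sub.end_time)
                   - max(log_row["start_time"], sub.start_time))
        if overlap > 0:
            total += overlap * sub.char_speed
    return total

# One total per listening-log row, in df2's row order.
totals_loop = [total_chars_loop(row, df1) for _, row in df2.iterrows()]
```

On this sample the first two totals come out at roughly 308.831 and 507.94, matching the hand calculations above. Time not covered by any subtitle row contributes nothing, which is the same as treating its speed as 0.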

It would be helpful if you could show me an efficient way to code this using Pandas. Recommendations of functions to look at would be helpful too!

Thanks in advance! :)

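If you aren't opposed to merging the dataframes, then apply makes it easy.

import pandas as pd

# Rename the time columns so the two start_time/end_time pairs don't collide.
df2 = pd.concat(
    [df1.rename(columns={'start_time': 'start_time1', 'end_time': 'end_time1'}),
     df2.rename(columns={'start_time': 'start_time2', 'end_time': 'end_time2'})],
    axis=1, sort=False)

def speed_calc(row):
    return ((row['end_time1']-row['start_time1'])*row['char_speed']) + \
    ((row['end_time2']-row['end_time1'])*row['char_speed'])

df2['total_char'] = df2.apply(speed_calc, axis=1)

This requires adjusting the header names, as in the rename calls above. Note that concat with axis=1 pairs rows by index, so each listening-log row is combined only with the subtitle row at the same position; a log interval that spans several subtitle rows is not fully covered this way.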
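To handle log intervals that span several subtitle rows, one option is to pair every row of df2 with every row of df1 and sum the overlaps. The following is a minimal sketch, not the answer's own method: it assumes pandas 1.2 or later (for how="cross"), the names pairs and overlap and the "_sub" suffix are made up for illustration, and the sample frames hold only the rows shown in the question.

```python
import numpy as np
import pandas as pd

# Sample data: only the rows shown in the question.
df1 = pd.DataFrame({"char_speed": [34, 19, 9],
                    "start_time": [3, 15, 21],
                    "end_time": [15, 21, 28]})
df2 = pd.DataFrame({"start_time": [9.23, 1.4, 20.0],
                    "end_time": [20.929, 20.26, 27.6],
                    "speed": [1.0, 1.5, 1.25]})

# Pair every log row with every subtitle row; reset_index keeps the
# original df2 row label in an "index" column for regrouping later.
pairs = df2.reset_index().merge(df1, how="cross", suffixes=("", "_sub"))

# Overlap length of the two intervals, clipped to 0 where they do not intersect.
overlap = (np.minimum(pairs["end_time"], pairs["end_time_sub"])
           - np.maximum(pairs["start_time"], pairs["start_time_sub"])).clip(lower=0)

# Characters per pair, summed back onto the matching df2 row.
df2["total_char"] = (overlap * pairs["char_speed"]).groupby(pairs["index"]).sum()
```

The cross join builds len(df1) * len(df2) rows, so this trades memory for brevity; for large frames a loop over df1 or an interval-based lookup may be the safer choice.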
