
'Iterating backwards' through a huge dataset in Pandas DataFrame

I know that iterating is not considered acceptable in Pandas and that there are plenty of more efficient ways to do this, but for the sake of better understanding, let's stick with iterating.

I have a huge NetFlow database (it contains a timestamp, source IP, destination IP, protocol, source and destination port, and more attributes). I want to create custom attributes based on the previous rows.

Basically, I want to 'iterate' through the entire DataFrame and, for each row, take its source IP and then 'iterate' backwards over a limited time span, let's say one hour. Within that hour, I want to select all the rows that match the current row's source IP, and from only those rows I want to calculate a new attribute based on the last two attributes of those previous occurrences. This should be done for every source IP.
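To make the described loop concrete, here is a minimal sketch of the literal "iterate, then look back one hour" approach. The column names (`timestamp`, `src_ip`, `bytes`, `packets`), the helper `add_lookback_feature`, and the derived attribute (bytes per packet over the previous occurrences) are all assumptions made up for illustration; the real dataset's columns and the intended attribute are not given in the question.

```python
import pandas as pd

def add_lookback_feature(df, window=pd.Timedelta(hours=1)):
    """Hypothetical helper: for each row, aggregate earlier rows with the same
    source IP that fall inside the look-back window."""
    # Sort by time so the rows inside the window form one contiguous slice.
    df = df.sort_values("timestamp").reset_index(drop=True)
    ts = df["timestamp"]
    values = []
    for i in range(len(df)):
        # First position whose timestamp is >= (current time - window).
        start = ts.searchsorted(ts.iloc[i] - window, side="left")
        # Rows inside the window that come before the current row.
        past = df.iloc[start:i]
        same_src = past[past["src_ip"] == df.at[i, "src_ip"]]
        packets = same_src["packets"].sum()
        # Assumed attribute: total bytes / total packets of the previous occurrences.
        values.append(same_src["bytes"].sum() / packets if packets else float("nan"))
    df["prev_bytes_per_packet"] = values
    return df

# Tiny made-up sample, not real NetFlow data.
flows = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 10:20",
        "2020-01-01 10:30", "2020-01-01 11:10",
    ]),
    "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"],
    "bytes": [100, 50, 300, 40],
    "packets": [1, 5, 4, 2],
})
result = add_lookback_feature(flows)
```

Because the frame is sorted by time, `searchsorted` finds the start of the window without scanning the whole frame, so each step only touches the rows of the last hour. It is still a Python-level loop, so it will be slow on a truly huge dataset.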

One row from the dataset

You can do that without "iterating": apply a lambda function to the DataFrame and use indexing to implement your "backwards" logic. Iterating won't give you a better understanding; you will understand what you're doing better with df.apply().
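A rough sketch of what this suggestion could look like, assuming hypothetical column names (`timestamp`, `src_ip`, `bytes`, `packets`) and a made-up derived attribute, since the question does not show the real ones. The helper `lookback_ratio` is likewise invented for illustration.

```python
import pandas as pd

WINDOW = pd.Timedelta(hours=1)  # assumed look-back span

def lookback_ratio(row, df):
    """Hypothetical helper: bytes per packet over earlier rows with the same
    source IP inside the look-back window."""
    # Boolean indexing replaces the explicit backwards loop.
    mask = (
        (df["src_ip"] == row["src_ip"])
        & (df["timestamp"] < row["timestamp"])            # strictly earlier rows
        & (df["timestamp"] >= row["timestamp"] - WINDOW)  # at most one hour back
    )
    packets = df.loc[mask, "packets"].sum()
    return df.loc[mask, "bytes"].sum() / packets if packets else float("nan")

# Tiny made-up sample, not real NetFlow data.
flows = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-01-01 10:00", "2020-01-01 10:20",
        "2020-01-01 10:30", "2020-01-01 11:10",
    ]),
    "src_ip": ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1"],
    "bytes": [100, 50, 300, 40],
    "packets": [1, 5, 4, 2],
})
flows["prev_bytes_per_packet"] = flows.apply(
    lambda row: lookback_ratio(row, flows), axis=1
)
```

Note that `df.apply(..., axis=1)` still calls the function once per row, and each call builds a mask over the whole frame, so this is mainly more readable rather than faster. For a very large dataset, sorting by time and using a per-source-IP `groupby` with a time-based rolling window is probably worth trying instead.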
