
Most efficient way in Pandas to assign tuples to segments

I have written the following piece of code, which assigns tuples to segments. A segment is a container of tuples and spans a certain time interval, whereas a tuple has just one timestamp.

However, since I have about 30,000 tuples and this step is repeated quite often, my program spends a lot of time in this method.

Is there a more efficient way to handle this?

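# The enclosing function's name is not given in the question; assign_tuples is a placeholder.
def assign_tuples(tuples, segments):
    # For each row, look up the segment whose time interval contains its timestamp.
    for timestamp, row in tuples.iterrows():  # `row` avoids shadowing the built-in `tuple`
        this_seg = [s for s in segments if s.can_have(timestamp)]
        assert len(this_seg) <= 1  # a timestamp may belong to at most one segment
        for s in this_seg:
            s.append(row)
    return segments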

Here is some more context:

A segment is an instance of the class Segment, which has the following constructor:

def __init__(self, ts_max, ts_min):
    self._df = pd.DataFrame({})
    self._ts_max = ts_max
    self._ts_min = ts_min

The method can_have checks whether the given timestamp could be part of the segment, i.e. whether the timestamp lies between ts_min and ts_max.
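The question does not show can_have itself, so the following is only a minimal sketch of what it might look like. It assumes both bounds are inclusive, which the question does not state.

```python
import pandas as pd


class Segment:
    def __init__(self, ts_max, ts_min):
        self._df = pd.DataFrame({})
        self._ts_max = ts_max
        self._ts_min = ts_min

    def can_have(self, timestamp):
        # Assumption: the interval is closed on both ends.
        return self._ts_min <= timestamp <= self._ts_max


seg = Segment(ts_max=pd.Timestamp("2024-01-01 00:10"),
              ts_min=pd.Timestamp("2024-01-01 00:00"))
inside = seg.can_have(pd.Timestamp("2024-01-01 00:05"))
outside = seg.can_have(pd.Timestamp("2024-01-01 00:11"))
```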

tuples is a pandas DataFrame that has timestamps as its index and some other features as columns.

iterrows is generally one of the slowest ways to process a DataFrame in pandas. It's not clear from your question exactly what you're trying to do, but this tutorial offers several faster replacements for iterrows:

https://realpython.com/fast-flexible-pandas/
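As one possible replacement for the row-by-row loop in the question, the segment lookup can be done for all timestamps at once with a binary search. This is only a sketch under assumptions that the question does not confirm: the segments do not overlap, both bounds are inclusive, at least one segment exists, and the index of tuples is a timezone-naive DatetimeIndex. The function name assign_to_segments and the list bounds of (ts_min, ts_max) pairs are made up for illustration.

```python
import numpy as np
import pandas as pd


def assign_to_segments(tuples, bounds):
    """For each row of `tuples`, return the position in `bounds` of the
    (ts_min, ts_max) pair containing its timestamp, or -1 if none does."""
    starts = pd.DatetimeIndex([b[0] for b in bounds]).values.astype("datetime64[ns]")
    ends = pd.DatetimeIndex([b[1] for b in bounds]).values.astype("datetime64[ns]")
    order = np.argsort(starts)  # binary search needs sorted start times
    starts_sorted, ends_sorted = starts[order], ends[order]

    ts = tuples.index.values.astype("datetime64[ns]")
    # Position of the last segment whose start is <= the timestamp (-1 if none).
    pos = np.searchsorted(starts_sorted, ts, side="right") - 1
    safe = np.clip(pos, 0, len(bounds) - 1)
    # The candidate only matches if the timestamp is also <= its end.
    inside = (pos >= 0) & (ts <= ends_sorted[safe])
    return np.where(inside, order[safe], -1)


tuples = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.to_datetime(["2024-01-01 00:05", "2024-01-01 00:15",
                          "2024-01-01 00:25", "2024-01-01 02:00"]),
)
bounds = [
    (pd.Timestamp("2024-01-01 00:00"), pd.Timestamp("2024-01-01 00:09")),
    (pd.Timestamp("2024-01-01 00:10"), pd.Timestamp("2024-01-01 00:29")),
]

seg_idx = assign_to_segments(tuples, bounds)
matched = seg_idx >= 0
# One DataFrame of rows per segment position; unmatched rows are left out.
per_segment = {int(i): g for i, g in tuples[matched].groupby(seg_idx[matched])}
```

With the rows grouped like this, each segment could receive all of its rows in a single step instead of one append call per row, which is where most of the time in the original loop is likely to go.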
