
Most efficient way to enlarge the active area of a binary series in pandas?

I have a pandas DataFrame df:

Car               Open  Time
Audi A5              0     0
Audi A5              0     1
Audi A5              0     2
Audi A5              1     3
Audi A5              1     4
Audi A5              0     5
Audi A5              0     6
Audi A5              0     7
Audi A5              1     8
Audi A5              1     9
Mercedes Class A     1     0
Mercedes Class A     1     1
Mercedes Class A     1     2
Mercedes Class A     0     3
Mercedes Class A     0     4
Mercedes Class A     1     5
Mercedes Class A     1     6
Mercedes Class A     0     7
Mercedes Class A     0     8
Mercedes Class A     1     9

I want to enlarge the active parts of the binary series Open by n units, but only after grouping the DataFrame by Car.

An active part is a run of consecutive 1s that is either surrounded by 0s, preceded only by 0s, or followed only by 0s. The case where the series contains only 1s is ignored.

If n = 1, I want to get the following DataFrame:

Car               Open  Time
Audi A5              0     0
Audi A5              0     1
Audi A5              1     2
Audi A5              1     3
Audi A5              1     4
Audi A5              0     5
Audi A5              0     6
Audi A5              1     7
Audi A5              1     8
Audi A5              1     9
Mercedes Class A     1     0
Mercedes Class A     1     1
Mercedes Class A     1     2
Mercedes Class A     0     3
Mercedes Class A     1     4
Mercedes Class A     1     5
Mercedes Class A     1     6
Mercedes Class A     0     7
Mercedes Class A     1     8
Mercedes Class A     1     9
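
In other words, each run of 1s gets extended n rows backwards in time. As a plain-Python sketch of that rule (the helper name enlarge_left is made up here just for illustration, and it ignores the all-ones special case):

def enlarge_left(values, n=1):
    # Extend every run of 1s by n positions to the left (earlier indices).
    out = list(values)
    for i, v in enumerate(values):
        if v == 1:
            for j in range(max(0, i - n), i):
                out[j] = 1
    return out

enlarge_left([0, 0, 0, 1, 1, 0, 0, 0, 1, 1], n=1)
# -> [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]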

I can get the starting index of each active part using the following code:

import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

def enlarge(dataframe: pd.DataFrame, sensor: str, n: int = 1) -> pd.DataFrame:

    # Start index of a run, but only for runs spanning at least two rows;
    # single-row runs are ignored.
    get_group_indexes = (
        lambda x: x.index[0]
        if x.index[-1] - x.index[0] >= 1
        else None
    )

    # Label each run of 1s with the cumulative count of preceding 0s,
    # then reduce every run to its starting index.
    groups = (
        dataframe[sensor]
        .eq(0)
        .cumsum()[dataframe[sensor].ne(0)]
        .to_frame()
        .groupby(sensor)
        .apply(get_group_indexes)
        .dropna()
    )

    if groups.empty:
        return dataframe

    # Extend each run n rows to the left (towards earlier times).
    for index in groups:
        dataframe.loc[index - n:index, sensor] = 1

    return dataframe
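
For example, on the rows of a single car it gives the series I expect (quick check; the audi slice below is just for illustration):

audi = df[df["Car"] == "Audi A5"].copy()
enlarge(audi, "Open", n=1)["Open"].tolist()
# -> [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]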

It works when I don't have to group by Car, but I want to group by this column before performing the transformation. Does someone have an idea how to achieve this efficiently using pandas tricks? Thanks.
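
A direct way would be to just apply the function above per group, but I doubt that is the most efficient (sketch only, reusing enlarge from above):

out = (
    df.groupby("Car", group_keys=False)   # group_keys=False keeps the original index
      .apply(lambda g: enlarge(g.copy(), "Open", n=1))
)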

IIUC, you can bfill per group with a limit after masking the non-1 values:

n = 1
df['Open2'] = (df['Open']
               .where(df['Open'].eq(1))            # mask everything that is not 1 as NaN
               .groupby(df['Car']).bfill(limit=n)  # backfill at most n rows, per Car
               .fillna(df['Open'], downcast='infer')  # restore the original 0s
              )

output (as new column "Open2" for clarity):

                 Car  Time  Open  Open2
0            Audi A5     0     0      0
1            Audi A5     1     0      0
2            Audi A5     2     0      1
3            Audi A5     3     1      1
4            Audi A5     4     1      1
5            Audi A5     5     0      0
6            Audi A5     6     0      0
7            Audi A5     7     0      1
8            Audi A5     8     1      1
9            Audi A5     9     1      1
10  Mercedes Class A     0     1      1
11  Mercedes Class A     1     1      1
12  Mercedes Class A     2     1      1
13  Mercedes Class A     3     0      0
14  Mercedes Class A     4     0      1
15  Mercedes Class A     5     1      1
16  Mercedes Class A     6     1      1
17  Mercedes Class A     7     0      0
18  Mercedes Class A     8     0      1
19  Mercedes Class A     9     1      1
