I have a pandas dataframe df:
Car | Open | Time |
---|---|---|
Audi A5 | 0 | 0 |
Audi A5 | 0 | 1 |
Audi A5 | 0 | 2 |
Audi A5 | 1 | 3 |
Audi A5 | 1 | 4 |
Audi A5 | 0 | 5 |
Audi A5 | 0 | 6 |
Audi A5 | 0 | 7 |
Audi A5 | 1 | 8 |
Audi A5 | 1 | 9 |
Mercedes Class A | 1 | 0 |
Mercedes Class A | 1 | 1 |
Mercedes Class A | 1 | 2 |
Mercedes Class A | 0 | 3 |
Mercedes Class A | 0 | 4 |
Mercedes Class A | 1 | 5 |
Mercedes Class A | 1 | 6 |
Mercedes Class A | 0 | 7 |
Mercedes Class A | 0 | 8 |
Mercedes Class A | 1 | 9 |
I want to enlarge the active parts of the binary series Open by n units, but only after grouping the dataframe by Car.
An active part is a run of consecutive 1s that is either surrounded by 0s, preceded only by 0s, or followed only by 0s. The case where the series contains only 1s is ignored.
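To make the definition concrete, here is a minimal sketch that numbers the runs of 1s within each car. The names prev_open, run_start and run_id are illustrative only, and the sketch assumes the rows of each car are already in Time order. It does not special-case a car whose series is all 1s.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) * 2,
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

# Previous value of Open within the same car (0 for the first row of a car).
prev_open = df.groupby("Car")["Open"].shift(fill_value=0)

# A run starts where Open is 1 and the previous row of the same car is not 1.
run_start = df["Open"].eq(1) & prev_open.ne(1)

# Number the runs per car; rows where Open is 0 get the label 0.
run_id = (
    run_start.astype(int)
    .groupby(df["Car"])
    .cumsum()
    .where(df["Open"].eq(1), 0)
)

print(df.assign(run_id=run_id))
```

With the sample data, the Audi A5 has two runs (Time 3-4 and 8-9) and the Mercedes Class A has three (Time 0-2, 5-6 and 9). The first Mercedes run starts at the first row of its group, so there is nothing before it to enlarge.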
If n = 1, I want to get the following dataframe:
Car | Open | Time |
---|---|---|
Audi A5 | 0 | 0 |
Audi A5 | 0 | 1 |
Audi A5 | 1 | 2 |
Audi A5 | 1 | 3 |
Audi A5 | 1 | 4 |
Audi A5 | 0 | 5 |
Audi A5 | 0 | 6 |
Audi A5 | 1 | 7 |
Audi A5 | 1 | 8 |
Audi A5 | 1 | 9 |
Mercedes Class A | 1 | 0 |
Mercedes Class A | 1 | 1 |
Mercedes Class A | 1 | 2 |
Mercedes Class A | 0 | 3 |
Mercedes Class A | 1 | 4 |
Mercedes Class A | 1 | 5 |
Mercedes Class A | 1 | 6 |
Mercedes Class A | 0 | 7 |
Mercedes Class A | 1 | 8 |
Mercedes Class A | 1 | 9 |
I can get the index of all active parts using the following code:
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"]*10 + ["Mercedes Class A"]*10,
        "Time" : list(range(10)) + list(range(10)),
        "Open" : [0,0,0,1,1,0,0,0,1,1,1,1,1,0,0,1,1,0,0,1]
    }
)

def enlarge(dataframe : pd.DataFrame, sensor : str, n : int = 1) -> pd.DataFrame:
    # First index of each run of 1s; runs of a single row yield None and are dropped below
    get_group_indexes = (
        lambda x: x.index[0]
        if x.index[-1] - x.index[0] >= 1
        else None
    )
    # Label each run of non-zero values by the count of zeros seen before it
    groups = (
        dataframe[sensor]
        .eq(0)
        .cumsum()[dataframe[sensor].ne(0)]
        .to_frame()
        .groupby(sensor)
        .apply(get_group_indexes)
        .dropna()
    )
    if groups.empty:
        return dataframe
    # Set the n rows before each run start to 1 (modifies dataframe in place)
    for index in groups:
        dataframe.loc[index-n:index, sensor] = 1
    return dataframe
It works when I don't have to group by Car, but I want to group by this column before performing this transformation. Does someone have an idea how to achieve this efficiently using pandas tricks? Thanks.
IIUC, you can bfill per group with a limit after masking the non-1 values:
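n = 1
df['Open2'] = (df['Open']
    .where(df['Open'].eq(1))               # keep the 1s, turn everything else into NaN
    .groupby(df['Car']).bfill(limit=n)     # extend each run of 1s backwards by n rows, per car
    .fillna(df['Open'])                    # restore the remaining original values
    .astype(df['Open'].dtype)              # back to integers (replaces the deprecated downcast='infer')
)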
Output (as a new column Open2 for clarity):
                 Car  Time  Open  Open2
0            Audi A5     0     0      0
1            Audi A5     1     0      0
2            Audi A5     2     0      1
3            Audi A5     3     1      1
4            Audi A5     4     1      1
5            Audi A5     5     0      0
6            Audi A5     6     0      0
7            Audi A5     7     0      1
8            Audi A5     8     1      1
9            Audi A5     9     1      1
10  Mercedes Class A     0     1      1
11  Mercedes Class A     1     1      1
12  Mercedes Class A     2     1      1
13  Mercedes Class A     3     0      0
14  Mercedes Class A     4     0      1
15  Mercedes Class A     5     1      1
16  Mercedes Class A     6     1      1
17  Mercedes Class A     7     0      0
18  Mercedes Class A     8     0      1
19  Mercedes Class A     9     1      1
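For reuse with other values of n, the same idea can be wrapped in a small helper. This is only a sketch: the function name enlarge_per_group and its parameters are hypothetical, and it assumes the rows of each group are already in Time order and that the sensor column holds only 0s and 1s.

```python
import pandas as pd


def enlarge_per_group(frame: pd.DataFrame, sensor: str = "Open",
                      group: str = "Car", n: int = 1) -> pd.Series:
    """Return frame[sensor] with each run of 1s extended n rows backwards, per group."""
    # Keep the 1s and turn every other value into NaN.
    ones_only = frame[sensor].where(frame[sensor].eq(1))
    # Back-fill at most n NaNs before each run of 1s, never crossing a group boundary.
    filled = ones_only.groupby(frame[group]).bfill(limit=n)
    # Restore the untouched original values and the original integer dtype.
    return filled.fillna(frame[sensor]).astype(frame[sensor].dtype)


df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) * 2,
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

open_n1 = enlarge_per_group(df, n=1)
open_n2 = enlarge_per_group(df, n=2)
print(df.assign(Open_n1=open_n1, Open_n2=open_n2))
```

The helper returns a new Series instead of modifying the dataframe, so the original Open column stays available for comparison. With n=2, every gap in the Mercedes Class A series is only two rows wide, so that car ends up with all 1s.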