Most efficient way to enlarge the active area of a binary series in pandas?

I have a pandas dataframe df:

Car               Open  Time
Audi A5              0     0
Audi A5              0     1
Audi A5              0     2
Audi A5              1     3
Audi A5              1     4
Audi A5              0     5
Audi A5              0     6
Audi A5              0     7
Audi A5              1     8
Audi A5              1     9
Mercedes Class A     1     0
Mercedes Class A     1     1
Mercedes Class A     1     2
Mercedes Class A     0     3
Mercedes Class A     0     4
Mercedes Class A     1     5
Mercedes Class A     1     6
Mercedes Class A     0     7
Mercedes Class A     0     8
Mercedes Class A     1     9

I want to enlarge each active part of the binary series Open by n units, after grouping the dataframe by Car.

An active part is a group of consecutive 1s that is either surrounded by 0s, preceded only by 0s, or followed only by 0s. The case where the series contains only 1s is ignored.
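To make the definition concrete, here is a minimal sketch that lists the active parts per car as (start, end) values of Time. The names `changed`, `run_id`, `runs` and `active` are only illustrative, and the sketch does not special-case a series made only of 1s.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

# A new run starts whenever Open changes value or the car changes.
changed = df["Open"].ne(df["Open"].shift()) | df["Car"].ne(df["Car"].shift())
run_id = changed.cumsum()

# Keep only the runs made of 1s and report where each one starts and ends.
runs = df.assign(run=run_id)
active = (
    runs[runs["Open"].eq(1)]
    .groupby(["Car", "run"], sort=False)["Time"]
    .agg(start="min", end="max")
    .reset_index()
)
print(active[["Car", "start", "end"]])
```

With the sample data this reports two active parts for Audi A5 and three for Mercedes Class A, the last of which is a single row at Time 9.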

If n = 1, I want to get the following dataframe:

Car               Open  Time
Audi A5              0     0
Audi A5              0     1
Audi A5              1     2
Audi A5              1     3
Audi A5              1     4
Audi A5              0     5
Audi A5              0     6
Audi A5              1     7
Audi A5              1     8
Audi A5              1     9
Mercedes Class A     1     0
Mercedes Class A     1     1
Mercedes Class A     1     2
Mercedes Class A     0     3
Mercedes Class A     1     4
Mercedes Class A     1     5
Mercedes Class A     1     6
Mercedes Class A     0     7
Mercedes Class A     1     8
Mercedes Class A     1     9

I can get the index of all active parts using the following code:

import pandas as pd

df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

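def enlarge(dataframe: pd.DataFrame, sensor: str, n: int = 1) -> pd.DataFrame:

    # Return the first index of a run of 1s, or None when the run
    # covers a single row (such runs are skipped).
    get_group_indexes = (
        lambda x: x.index[0]
        if x.index[-1] - x.index[0] >= 1
        else None
    )

    # Label each run of non-zero values with the number of zeros seen
    # before it, then collect the first index of every run.
    groups = (
        dataframe[sensor]
        .eq(0)
        .cumsum()[dataframe[sensor].ne(0)]
        .to_frame()
        .groupby(sensor)
        .apply(get_group_indexes)
        .dropna()
    )

    # No run found: return the dataframe unchanged.
    if groups.empty:
        return dataframe

    # Set the n rows before each run start to 1 (modifies dataframe in place).
    for index in groups:
        dataframe.loc[index-n:index, sensor] = 1

    return dataframe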

It works when I don't have to group by Car, but I want to group by this column before performing this transformation. Does someone have an idea how to achieve this efficiently using pandas tricks? Thanks.

IIUC, you can bfill per group with a limit after masking the non-1 values:

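n = 1
df['Open2'] = (df['Open']
               .where(df['Open'].eq(1))              # keep the 1s, turn everything else into NaN
               .groupby(df['Car']).bfill(limit=n)    # back-fill at most n NaNs before each 1, per car
               .fillna(df['Open'], downcast='infer') # restore the remaining 0s and the integer dtype
              )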

Output (as new column "Open2" for clarity):

                 Car  Time  Open  Open2
0            Audi A5     0     0      0
1            Audi A5     1     0      0
2            Audi A5     2     0      1
3            Audi A5     3     1      1
4            Audi A5     4     1      1
5            Audi A5     5     0      0
6            Audi A5     6     0      0
7            Audi A5     7     0      1
8            Audi A5     8     1      1
9            Audi A5     9     1      1
10  Mercedes Class A     0     1      1
11  Mercedes Class A     1     1      1
12  Mercedes Class A     2     1      1
13  Mercedes Class A     3     0      0
14  Mercedes Class A     4     0      1
15  Mercedes Class A     5     1      1
16  Mercedes Class A     6     1      1
17  Mercedes Class A     7     0      0
18  Mercedes Class A     8     0      1
19  Mercedes Class A     9     1      1
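
For reuse, the same idea can be wrapped in a small function. This is only a sketch: the name `enlarge_per_group` and its parameters are hypothetical, and it restores the integer dtype with `astype` rather than the `downcast` argument of `fillna`, which recent pandas releases flag as deprecated.

```python
import pandas as pd


def enlarge_per_group(df, sensor="Open", group="Car", n=1):
    """Hypothetical helper: extend each run of 1s backwards by n rows, per group."""
    # Keep the 1s and mask every other value as NaN.
    ones = df[sensor].where(df[sensor].eq(1))
    # Within each group, back-fill at most n NaNs in front of each 1.
    filled = ones.groupby(df[group]).bfill(limit=n)
    # Put the original values back where nothing was filled, then restore the dtype.
    return filled.fillna(df[sensor]).astype(df[sensor].dtype)


df = pd.DataFrame(
    {
        "Car": ["Audi A5"] * 10 + ["Mercedes Class A"] * 10,
        "Time": list(range(10)) + list(range(10)),
        "Open": [0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1],
    }
)

df["Open2"] = enlarge_per_group(df, n=1)
print(df)
```

Because the back-fill runs inside each group, a 1 at the start of one car never spills into the last rows of the previous car.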
