
Is there a way to spread or pivot a df in pandas but maintain/fill in empty values?

This might be a complicated question, and I am very new to pandas. I basically want to turn the script from a machine into something that can be used as a key for data analysis.

I start with the following data frame:

```
   Function       Value
6      WAIT        60.0
7       ACT          27
8      WAIT        30.0
10     WAIT        30.0
12     WAIT        30.0
14     WAIT        30.0
16     WAIT        60.0
18     WAIT        60.0
20     WAIT        60.0
22     WAIT        60.0
24     WAIT        60.0
26     WAIT        60.0
28     WAIT        60.0
30     WAIT        60.0
33      ACT           0
34     WAIT        30.0
36     WAIT        30.0
38     WAIT        90.0
40     WAIT       120.0
42     WAIT       120.0
44     WAIT       120.0
46     WAIT       120.0
```

I want to end up with a data frame that looks like this:

    Time_Min  Condition
0        1.0          0
1        1.5         27
2        2.0         27
3        2.5         27
4        3.5         27
5        4.5         27
6        5.5         27
7        6.5         27
8        7.5         27
9        8.5         27
10       9.5         27
11      10.5         27
12      11.0         27
13      11.5          0
14      12.0          0
15      13.5          0
16      15.5          0
17      17.5          0
18      19.5          0
19      21.5          0

Here "WAIT" has been converted to accumulated time in minutes (currently it is the number of seconds between samples). ACT is 0 while it is not set; once it is set to 27, it stays 27 at each time point until it changes.

What's the best way to do this? I want to be able to import scripts that vary in ACT value and have different sampling times, but always produce the same output structure :)

Time_Min is just the cumsum() of Value divided by 60:

df['Time_Min'] = df['Value'].cumsum().div(60)

Condition is more involved but can be done with mask() and groupby(). Create a boolean mask where Function == ACT, then set the non-ACT rows to 0 and take the cumsum() within each ACT group. Here act.cumsum() increases by one at every ACT row, so each ACT row and the WAIT rows after it share a group label, and the cumulative sum inside a group carries that ACT value forward:

act = df['Function'].eq('ACT')
df['Condition'] = df['Value'].mask(~act, 0).groupby(act.cumsum()).cumsum()
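
For anyone who wants to run the two steps end to end, here is a minimal sketch that rebuilds the question's frame by hand and applies them. The `pd.to_numeric` call is an assumption on my part: the printed frame mixes `60.0` and `27`, which suggests the Value column may be stored as objects or strings.

```python
import pandas as pd

# Rebuild the frame from the question (index labels copied from the printout).
index = [6, 7, 8, 10, 12, 14, 16, 18, 20, 22, 24,
         26, 28, 30, 33, 34, 36, 38, 40, 42, 44, 46]
functions = ["WAIT", "ACT"] + ["WAIT"] * 12 + ["ACT"] + ["WAIT"] * 7
values = ([60.0, 27, 30.0, 30.0, 30.0, 30.0] + [60.0] * 8
          + [0, 30.0, 30.0, 90.0] + [120.0] * 4)
df = pd.DataFrame({"Function": functions, "Value": values}, index=index)

# Assumption: coerce Value to a numeric dtype in case it was read in as text.
df["Value"] = pd.to_numeric(df["Value"])

# Step 1: running total of Value, converted from seconds to minutes.
df["Time_Min"] = df["Value"].cumsum().div(60)

# Step 2: zero out non-ACT rows, then cumulative-sum within each ACT group
# so the ACT value is repeated until the next ACT row.
act = df["Function"].eq("ACT")
df["Condition"] = df["Value"].mask(~act, 0).groupby(act.cumsum()).cumsum()

print(df)
```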

Output:

   Function  Value  Time_Min  Condition
6      WAIT   60.0      1.00        0.0
7       ACT   27.0      1.45       27.0
8      WAIT   30.0      1.95       27.0
10     WAIT   30.0      2.45       27.0
12     WAIT   30.0      2.95       27.0
14     WAIT   30.0      3.45       27.0
16     WAIT   60.0      4.45       27.0
18     WAIT   60.0      5.45       27.0
20     WAIT   60.0      6.45       27.0
22     WAIT   60.0      7.45       27.0
24     WAIT   60.0      8.45       27.0
26     WAIT   60.0      9.45       27.0
28     WAIT   60.0     10.45       27.0
30     WAIT   60.0     11.45       27.0
33      ACT    0.0     11.45        0.0
34     WAIT   30.0     11.95        0.0
36     WAIT   30.0     12.45        0.0
38     WAIT   90.0     13.95        0.0
40     WAIT  120.0     15.95        0.0
42     WAIT  120.0     17.95        0.0
44     WAIT  120.0     19.95        0.0
46     WAIT  120.0     21.95        0.0
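
One thing to watch in the output above: Time_Min goes from 1.00 to 1.45 on the first ACT row, because the plain cumsum() also adds the ACT value (27 / 60 = 0.45) to the clock. The desired frame in the question has no ACT rows and counts only the waits. The following is a sketch of one way to get closer to that layout, not the method used above: it swaps in where() plus ffill() and wraps everything in a hypothetical helper, `script_to_timeline`, so it can be reused on scripts with other ACT values. The sample data is a shortened, made-up script in the same shape as the question's.

```python
import pandas as pd


def script_to_timeline(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per WAIT with elapsed minutes and the active ACT value."""
    value = pd.to_numeric(df["Value"])  # assumption: Value may arrive as text
    is_wait = df["Function"].eq("WAIT")
    is_act = df["Function"].eq("ACT")

    # Only WAIT rows advance the clock; every other row contributes 0 seconds.
    time_min = value.where(is_wait, 0).cumsum().div(60)

    # Keep the value on ACT rows, carry it forward, and use 0 before the first ACT.
    condition = value.where(is_act).ffill().fillna(0)

    out = pd.DataFrame({"Time_Min": time_min, "Condition": condition})
    # Drop the ACT rows and renumber from 0, as in the desired frame.
    return out.loc[is_wait].reset_index(drop=True)


# Shortened, made-up script for illustration.
sample = pd.DataFrame(
    {
        "Function": ["WAIT", "ACT", "WAIT", "WAIT", "ACT", "WAIT"],
        "Value": [60.0, 27, 30.0, 30.0, 0, 90.0],
    },
    index=[6, 7, 8, 10, 33, 34],
)
timeline = script_to_timeline(sample)
print(timeline)
```

Whether the ACT rows should be dropped, and whether Condition should be cast to an integer dtype, depends on what the downstream analysis expects, so treat both as choices rather than requirements.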
