R: How do I replace NAs in a data frame column with values derived from conditions on multiple other columns?
Using R, I am trying to fill the NAs in one column with values derived from conditions on other columns. The data frame has 4 columns, described below.
"Water_level": Contains some values and some NAs. This is the column whose NAs I want to replace. Treat it as the amount of water, in liters, in a tank.
"Tank": Unique identifier for each tank. In this sample, I have tank 1 and tank 2.
"Flag": A series of 0's and 1's. When the value is 0, the tap is open and the Water_level value decreases by a constant 0.05 per row. When the value is 1, the tank is being pumped, so the water level in that tank increases gradually to the peak value at the end of the series of 1's. The rate of increase varies and is determined by the length of the run of 1's in the Flag column, i.e. the Counter value at the end of that run.
"Counter": Counts the rows within each consecutive run of 0's or 1's in the Flag column, in order.
I need to fill the NAs in the "Water_level" column using the conditions of the other columns.
Honestly, I haven't been able to try anything, despite clearly understanding the required outcome.
df <- data.frame(
Water_level = c(67.92, rep(NA,9),67.96,10.5,rep(NA,8),20),
Flag = c(rep(0,5),rep(1,6),rep(0,5),rep(1,5)),
Tank= c(rep(1, 11), rep(2, 10)),
Counter = c(seq(1:5),seq(1:6), seq(1:5),seq(1:5))
)
df
Water_level Flag Tank Counter
1 67.92 0 1 1
2 NA 0 1 2
3 NA 0 1 3
4 NA 0 1 4
5 NA 0 1 5
6 NA 1 1 1
7 NA 1 1 2
8 NA 1 1 3
9 NA 1 1 4
10 NA 1 1 5
11 67.96 1 1 6
12 10.50 0 2 1
13 NA 0 2 2
14 NA 0 2 3
15 NA 0 2 4
16 NA 0 2 5
17 NA 1 2 1
18 NA 1 2 2
19 NA 1 2 3
20 NA 1 2 4
21 20.00 1 2 5
The expected result is to fill the NAs in Water_level as described by the conditions in my introduction.
For example, line 2 in "Water_level" should be 67.92 - 0.05 = 67.87, because the tap is open, i.e. Flag is 0. Line 3 will be 67.87 - 0.05 = 67.82, and so on.
The tricky part is line 6, where Flag changes to 1, i.e. the tank is being pumped. The series of 1's for Tank 1 ends at line 11, and the peak value recorded for Water_level is 67.96. So the rate of increase from line 6 to line 10 is given by the formula below.
(67.96 - value at line 5 following the decrease pattern) / number of Counter steps, i.e. 6 in this case
The same calculation continues for Tank 2.
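To make the arithmetic concrete, here is a small sketch of the calculation for Tank 1 only. The variable names (`drop`, `level_row5`, `rate`, and so on) are introduced here purely for illustration and are not part of the data frame.

```r
drop <- 0.05    # constant decrease per row while Flag is 0
start <- 67.92  # observed Water_level at line 1
peak <- 67.96   # observed Water_level at line 11
steps <- 6      # Counter value at the end of the run of 1's

# Line 5 is four decrements after line 1.
level_row5 <- start - drop * 4

# Rate of increase per row while Flag is 1.
rate <- (peak - level_row5) / steps

# Levels for lines 6 to 11; the last one equals the recorded peak.
fill_rows <- level_row5 + rate * seq_len(steps)
fill_rows
```

With these numbers, line 5 works out to 67.72 and the rate to 0.04 per row, so lines 6 to 11 climb from 67.76 to 67.96.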
Thanks in anticipation of a solution.
Update.
@manotheshark. This is a good beginning, but it doesn't generalise well. When I include rows 12 to 16, it produces the wrong output, i.e. the level doesn't decline by 0.05 from line 11.
df <- data.frame(
Water_level = c(67.92, rep(NA,9),67.96, rep(NA,5),10.5,rep(NA,8),20),
Flag = c(rep(0,5),rep(1,6),rep(0,5),rep(0,5),rep(1,5)),
Tank= c(rep(1, 16), rep(2, 10)),
Counter = c(seq(1:5),seq(1:6),seq(1:5), seq(1:5),seq(1:5))
)
df
Water_level Flag Tank Counter
1 67.92 0 1 1
2 NA 0 1 2
3 NA 0 1 3
4 NA 0 1 4
5 NA 0 1 5
6 NA 1 1 1
7 NA 1 1 2
8 NA 1 1 3
9 NA 1 1 4
10 NA 1 1 5
11 67.96 1 1 6
12 NA 0 1 1
13 NA 0 1 2
14 NA 0 1 3
15 NA 0 1 4
16 NA 0 1 5
17 10.50 0 2 1
18 NA 0 2 2
19 NA 0 2 3
20 NA 0 2 4
21 NA 0 2 5
22 NA 1 2 1
23 NA 1 2 2
24 NA 1 2 3
25 NA 1 2 4
26 20.00 1 2 5
The output from running your solution is presented below. Line 12 should be 67.96 - 0.05 = 67.91.
Water_level Flag Tank Counter
1 67.92000 0 1 1
2 67.87000 0 1 2
3 67.82000 0 1 3
4 67.77000 0 1 4
5 67.72000 0 1 5
6 67.30167 1 1 1
7 67.43333 1 1 2
8 67.56500 1 1 3
9 67.69667 1 1 4
10 67.82833 1 1 5
11 67.96000 1 1 6
12 67.37000 0 1 1
13 67.32000 0 1 2
14 67.27000 0 1 3
15 67.22000 0 1 4
16 67.17000 0 1 5
17 10.50000 0 2 1
18 10.45000 0 2 2
19 10.40000 0 2 3
20 10.35000 0 2 4
21 10.30000 0 2 5
22 12.24000 1 2 1
23 14.18000 1 2 2
24 16.12000 1 2 3
25 18.06000 1 2 4
26 20.00000 1 2 5
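The 67.37 at line 12 appears to come from grouping by Flag and Tank only: both draining periods of Tank 1 fall into the same group, so line 12 is treated as the 12th step of the decline that started at line 1. The sketch below, in base R, builds a per-run identifier that keeps the two periods apart. `key` and `run_id` are names introduced here for illustration, and the sketch assumes the rows are already in time order.

```r
Flag <- c(rep(0, 5), rep(1, 6), rep(0, 5), rep(0, 5), rep(1, 5))
Tank <- c(rep(1, 16), rep(2, 10))

# A run is a maximal stretch of consecutive rows sharing Tank and Flag.
key <- paste(Tank, Flag)
runs <- rle(key)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Lines 1 and 12 share Tank and Flag but belong to different runs.
run_id[c(1, 12)]

# Arithmetic behind the unexpected value at line 12:
# eleven decrements counted from line 1 instead of one from line 11.
wrong_row12 <- 67.92 - 0.05 * (12 - 1)
wrong_row12
```

Grouping on a run identifier like this, rather than on Flag and Tank alone, would let each draining period start from its own first value.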
Not tested if this works for multiple tank cycles. Converted the data.frame to a data.table.
library(data.table)
setDT(df)
# calculate tank levels when dropping with Flag of 0
df[Flag == 0, Water_level := first(Water_level) - 0.05 * (.I - first(.I)), by = .(Flag, Tank)]
# use sequence to determine tank levels when filling from previous minimum to new max
df[Flag == 1, Water_level := seq(df[Flag == 0, last(Water_level), by = .(Flag, Tank)][,V1][.GRP], last(Water_level), length.out = .N + 1)[-1], by = .(Flag, Tank)]
> df
Water_level Flag Tank Counter
1: 67.92 0 1 1
2: 67.87 0 1 2
3: 67.82 0 1 3
4: 67.77 0 1 4
5: 67.72 0 1 5
6: 67.76 1 1 1
7: 67.80 1 1 2
8: 67.84 1 1 3
9: 67.88 1 1 4
10: 67.92 1 1 5
11: 67.96 1 1 6
12: 10.50 0 2 1
13: 10.45 0 2 2
14: 10.40 0 2 3
15: 10.35 0 2 4
16: 10.30 0 2 5
17: 12.24 1 2 1
18: 14.18 1 2 2
19: 16.12 1 2 3
20: 18.06 1 2 4
21: 20.00 1 2 5
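For the multi-cycle data in the update, one possible approach is to walk through each tank run by run, carrying the last level forward. The following is only a base R sketch under stated assumptions, not a tested general solution: `fill_levels` and its `drop` argument are hypothetical names, rows are assumed to be in time order, every filling run is assumed to end with an observed peak, and a draining run that starts with NA is assumed to continue 0.05 below the previous row.

```r
df <- data.frame(
  Water_level = c(67.92, rep(NA, 9), 67.96, rep(NA, 5), 10.5, rep(NA, 8), 20),
  Flag = c(rep(0, 5), rep(1, 6), rep(0, 5), rep(0, 5), rep(1, 5)),
  Tank = c(rep(1, 16), rep(2, 10))
)

fill_levels <- function(df, drop = 0.05) {
  # Handle each tank separately, keeping the original row order.
  for (idx in split(seq_len(nrow(df)), df$Tank)) {
    runs <- rle(df$Flag[idx])
    ends <- cumsum(runs$lengths)
    starts <- ends - runs$lengths + 1
    prev <- NA_real_  # level at the end of the previous run in this tank
    for (r in seq_along(runs$lengths)) {
      rows <- idx[starts[r]:ends[r]]
      n <- length(rows)
      if (runs$values[r] == 0) {
        # Draining: start from the observed value, or one step below prev.
        first <- df$Water_level[rows[1]]
        if (is.na(first)) first <- prev - drop
        df$Water_level[rows] <- first - drop * (seq_len(n) - 1)
      } else {
        # Filling: equal steps from the previous level up to the peak.
        peak <- df$Water_level[rows[n]]
        if (!is.na(prev) && !is.na(peak)) {
          df$Water_level[rows] <- seq(prev, peak, length.out = n + 1)[-1]
        }
      }
      prev <- df$Water_level[rows[n]]
    }
  }
  df
}

out <- fill_levels(df)
out$Water_level[12:16]
```

On the update's data this gives 67.91 at line 12 and continues down by 0.05 to 67.71 at line 16, while lines 1 to 11 and Tank 2 match the output shown above. A filling run with no preceding level in the same tank is left unchanged, since the rule in the question gives no starting point for it.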