R: How do I replace NAs in a dataframe column with values based on conditions across multiple other columns?

Using R, I am trying to fill the NAs in one column with values derived from conditions on other columns. The data frame has 4 columns, described below.

"Water_level": Contains some values along with NAs. This is the column whose NAs I want to replace. Treat it as the amount of water, in liters, in a tank.

"Tank": Unique identifier for each tank. In this sample, I have tank 1 and tank 2.

"Flag": A series of 0's and 1's. When the flag is 0, the tap is open and the Water_level value decreases by a constant 0.05 per row. When the flag is 1, the tank is being pumped, so the water level in that tank rises gradually to the peak value at the end of the series of 1's. The rate of increase varies and is determined by the length of the run of 1's in the Flag column, that is, the Counter value at the end of the series of 1's.

"Counter": A running count of the consecutive 0's or 1's in the Flag column, restarting at 1 each time a new run begins.

I need to fill the NAs in the "Water_level" column using the conditions in the other columns.
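The question does not say how Counter was produced. As a rough sketch only, assuming it restarts whenever Flag changes or a new Tank begins, it could be derived in base R like this (the names `new_run` and `run_id` are illustrative, not from the original post):

```r
# Flag and Tank as used in the sample data of this question
Flag <- c(rep(0, 5), rep(1, 6), rep(0, 5), rep(1, 5))
Tank <- c(rep(1, 11), rep(2, 10))

# A new run starts at row 1 and wherever Flag or Tank differs from the previous row
n <- length(Flag)
new_run <- c(TRUE, Flag[-1] != Flag[-n] | Tank[-1] != Tank[-n])
run_id <- cumsum(new_run)

# Number the rows within each run, starting again at 1 for every run
Counter <- ave(seq_along(Flag), run_id, FUN = seq_along)
Counter
```

The same `run_id` is useful later, because it separates two draining periods of the same tank that a plain grouping by Flag and Tank would merge.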

Honestly, I haven't managed to try anything, even though I clearly understand the required outcome.

df <- data.frame(
  Water_level = c(67.92, rep(NA,9),67.96,10.5,rep(NA,8),20),
  Flag = c(rep(0,5),rep(1,6),rep(0,5),rep(1,5)),
  Tank= c(rep(1, 11), rep(2, 10)),
  Counter = c(seq(1:5),seq(1:6), seq(1:5),seq(1:5))
)

df

   Water_level Flag Tank Counter
1        67.92    0    1       1
2           NA    0    1       2
3           NA    0    1       3
4           NA    0    1       4
5           NA    0    1       5
6           NA    1    1       1
7           NA    1    1       2
8           NA    1    1       3
9           NA    1    1       4
10          NA    1    1       5
11       67.96    1    1       6
12       10.50    0    2       1
13          NA    0    2       2
14          NA    0    2       3
15          NA    0    2       4
16          NA    0    2       5
17          NA    1    2       1
18          NA    1    2       2
19          NA    1    2       3
20          NA    1    2       4
21       20.00    1    2       5

The expected result is to fill the NAs in Water_level according to the conditions described in my introduction.

For example, line 2 of "Water_level" should be 67.92 - 0.05 = 67.87, because the tap is open, i.e. Flag is 0. Line 3 will be 67.87 - 0.05 = 67.82, and so on.
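As a small illustration of this rule, assuming the decrease is applied once per row after the recorded starting value, the first five rows of Tank 1 can be computed directly (the variable names are illustrative):

```r
start_level <- 67.92   # recorded Water_level at row 1
drop <- 0.05           # constant decrease per row while Flag is 0

# Rows 1 to 5: zero to four decreases applied to the starting value
drain <- start_level - drop * (0:4)
round(drain, 2)   # 67.92 67.87 67.82 67.77 67.72
```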

The tricky part is at line 6, where the Flag changes to 1, i.e. the tank is being pumped. The series of 1's for Tank 1 ends at line 11, where the recorded peak Water_level is 67.96. So the rate of increase from line 6 to line 10 is given by the formula below.

(67.96 - value at line 5 following the decrease pattern) / number of Counter steps, i.e. 6 in this case
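Worked through for Tank 1, and assuming the increase is spread evenly over the six rows of the run, the arithmetic looks like this (variable names are illustrative):

```r
peak <- 67.96                    # recorded Water_level at row 11
level_row5 <- 67.92 - 0.05 * 4   # row 5 after four decreases of 0.05, i.e. 67.72
steps <- 6                       # Counter value at the end of the run of 1's

rate <- (peak - level_row5) / steps   # increase per row, 0.04 here

# Rows 6 to 11: one to six increases on top of the row 5 level
fill <- level_row5 + rate * seq_len(steps)
round(fill, 2)   # 67.76 67.80 67.84 67.88 67.92 67.96
```

The last value lands on the recorded peak, so only rows 6 to 10 are actually new.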

This calculation continues for Tank 2.

Thanks in anticipation of a solution.

Update.

@manotheshark. This is a good beginning, but it doesn't generalise well. When I include rows 12 to 16, it produces the wrong output, i.e. it doesn't decline by 0.05 from line 11.

df <- data.frame(
  Water_level = c(67.92, rep(NA,9),67.96, rep(NA,5),10.5,rep(NA,8),20),
  Flag = c(rep(0,5),rep(1,6),rep(0,5),rep(0,5),rep(1,5)),
  Tank= c(rep(1, 16), rep(2, 10)),
  Counter = c(seq(1:5),seq(1:6),seq(1:5), seq(1:5),seq(1:5))
)
df

   Water_level Flag Tank Counter
1        67.92    0    1       1
2           NA    0    1       2
3           NA    0    1       3
4           NA    0    1       4
5           NA    0    1       5
6           NA    1    1       1
7           NA    1    1       2
8           NA    1    1       3
9           NA    1    1       4
10          NA    1    1       5
11       67.96    1    1       6
12          NA    0    1       1
13          NA    0    1       2
14          NA    0    1       3
15          NA    0    1       4
16          NA    0    1       5
17       10.50    0    2       1
18          NA    0    2       2
19          NA    0    2       3
20          NA    0    2       4
21          NA    0    2       5
22          NA    1    2       1
23          NA    1    2       2
24          NA    1    2       3
25          NA    1    2       4
26       20.00    1    2       5

The output from running your solution is presented below. Line 12 should be 67.96 - 0.05 = 67.91.

   Water_level Flag Tank Counter
1     67.92000    0    1       1
2     67.87000    0    1       2
3     67.82000    0    1       3
4     67.77000    0    1       4
5     67.72000    0    1       5
6     67.30167    1    1       1
7     67.43333    1    1       2
8     67.56500    1    1       3
9     67.69667    1    1       4
10    67.82833    1    1       5
11    67.96000    1    1       6
12    67.37000    0    1       1
13    67.32000    0    1       2
14    67.27000    0    1       3
15    67.22000    0    1       4
16    67.17000    0    1       5
17    10.50000    0    2       1
18    10.45000    0    2       2
19    10.40000    0    2       3
20    10.35000    0    2       4
21    10.30000    0    2       5
22    12.24000    1    2       1
23    14.18000    1    2       2
24    16.12000    1    2       3
25    18.06000    1    2       4
26    20.00000    1    2       5

Not tested for multiple tank cycles. The data.frame is converted to a data.table.

library(data.table)
setDT(df)

# calculate tank levels when dropping with Flag of 0
df[Flag == 0, Water_level := first(Water_level) - 0.05 * (.I - first(.I)), by = .(Flag, Tank)]

# use sequence to determine tank levels when filling from previous minimum to new max
df[Flag == 1, Water_level := seq(df[Flag == 0, last(Water_level), by = .(Flag, Tank)][,V1][.GRP], last(Water_level), length.out = .N + 1)[-1], by = .(Flag, Tank)]

> df
    Water_level Flag Tank Counter
 1:       67.92    0    1       1
 2:       67.87    0    1       2
 3:       67.82    0    1       3
 4:       67.77    0    1       4
 5:       67.72    0    1       5
 6:       67.76    1    1       1
 7:       67.80    1    1       2
 8:       67.84    1    1       3
 9:       67.88    1    1       4
10:       67.92    1    1       5
11:       67.96    1    1       6
12:       10.50    0    2       1
13:       10.45    0    2       2
14:       10.40    0    2       3
15:       10.35    0    2       4
16:       10.30    0    2       5
17:       12.24    1    2       1
18:       14.18    1    2       2
19:       16.12    1    2       3
20:       18.06    1    2       4
21:       20.00    1    2       5
    Water_level Flag Tank Counter
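The update in the question shows that grouping by Flag and Tank merges the two draining periods of Tank 1, so row 12 continues from row 1 instead of from the peak at row 11. One possible way around this, sketched here in base R rather than data.table, is to process each consecutive run separately. The function `fill_water_level` and its `drop` parameter are hypothetical names, and the sketch assumes the rows are in time order, every filling run ends on a recorded peak, and every draining run either starts on a recorded value or follows another run of the same tank.

```r
# Hypothetical helper, not from the original answer: fills Water_level run by run
fill_water_level <- function(df, drop = 0.05) {
  n <- nrow(df)
  # A new run starts at row 1 and wherever Flag or Tank changes
  new_run <- c(TRUE, df$Flag[-1] != df$Flag[-n] | df$Tank[-1] != df$Tank[-n])
  run_id <- cumsum(new_run)
  level <- df$Water_level
  prev_end <- NA_real_   # filled level at the end of the previous run
  prev_tank <- NA        # tank of the previous run

  for (r in unique(run_id)) {
    idx <- which(run_id == r)
    k <- length(idx)
    same_tank <- !is.na(prev_tank) && df$Tank[idx[1]] == prev_tank
    if (df$Flag[idx[1]] == 0) {
      # Draining: start from the recorded value if there is one, otherwise
      # continue one step below the end of the previous run in the same tank
      start <- if (!is.na(level[idx[1]])) {
        level[idx[1]]
      } else if (same_tank) {
        prev_end - drop
      } else {
        NA_real_
      }
      level[idx] <- start - drop * (seq_len(k) - 1)
    } else if (same_tank && !is.na(prev_end)) {
      # Filling: equal steps from the previous run's end up to the recorded peak
      peak <- level[idx[k]]
      level[idx] <- prev_end + (peak - prev_end) * seq_len(k) / k
    }
    prev_end <- level[idx[k]]
    prev_tank <- df$Tank[idx[1]]
  }
  df$Water_level <- level
  df
}

# The extended sample from the update in the question
df <- data.frame(
  Water_level = c(67.92, rep(NA, 9), 67.96, rep(NA, 5), 10.5, rep(NA, 8), 20),
  Flag = c(rep(0, 5), rep(1, 6), rep(0, 5), rep(0, 5), rep(1, 5)),
  Tank = c(rep(1, 16), rep(2, 10)),
  Counter = c(1:5, 1:6, 1:5, 1:5, 1:5)
)
out <- fill_water_level(df)
out
```

On this sample, row 12 comes out as 67.96 - 0.05 = 67.91, as the update asks, and Tank 2 still rises from 10.30 to 20 in five equal steps. Recorded values in the middle of a run would be overwritten by this sketch, so it would need adjusting for data with more frequent readings.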
