
How do I fill in values for columns based on matching a few other columns' row values in R?

The data looks like this:

time <- c('Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000',
          'Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000',
          'Nov 1st 2014, 17:37:50.000','Nov 1st 2014, 17:37:50.000',
          'Nov 1st 2014, 17:37:50.000')
A <- c('20.79','NA','NA','NA','21.8','NA','NA')  
B <- c('NA','97.017','94.321','85.014','NA','87.1','67.1')
C <- c('NA','C1','C2','C3','NA','C1','C2')
D <- c('L1','L1','L1','L1','L2','L2','L2')
C1 <- c('NA','NA','NA','NA','NA','NA','NA')
C2 <- c('NA','NA','NA','NA','NA','NA','NA')
C3 <- c('NA','NA','NA','NA','NA','NA','NA')
df <- data.frame(time,A,B,C,D,C1,C2,C3)

I need the output in the format below.

#   time                         A      B   C   D   C1     C2      C3
# 1 Nov 1st 2014, 17:36:50.000   20.79  NA  NA  L1  97.02  94.321  85.014
# 2 Nov 1st 2014, 17:37:50.000   21.8   NA  NA  L2  87.1   67.1    47.3

How do I get the data into the above format, with just one row for each set of rows that share the same "time" and "D" values?

Thanks in advance!

You can do this with tidyr::spread() to reshape B into the columns C1, C2, and C3, and then dplyr::left_join() the result with the other columns, assuming each combination of date/time and D is unique.

library(dplyr)
library(tidyr)

df %>%
  select(time, A, B, C, D) %>%
  filter(!is.na(A)) %>%
  left_join(
    df %>%
      select(time, C, B, D) %>%
      spread(C, B) %>%
      select(-`<NA>`),
    by = c("time", "D")
  )

#                         time     A  B    C  D     C1     C2     C3
# 1 Nov 1st 2014, 17:36:50.000 20.79 NA <NA> L1 97.017 94.321 85.014
# 2 Nov 1st 2014, 17:37:50.000 21.80 NA <NA> L2 87.100 67.100 47.300
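
In current tidyr releases, spread() is superseded by pivot_wider(). As a rough sketch of the same idea (not part of the original answer), the reshape and join could look like the following. It assumes the sample data from the Data section below, with real NA values and without the empty C1, C2, C3 placeholder columns. The names a_rows, b_wide, and result are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data with real NA values (the empty C1..C3 placeholders are left out)
df <- read.table(text = "time A B C D
1 'Nov 1st 2014, 17:36:50.000' 20.79 NA NA L1
2 'Nov 1st 2014, 17:36:50.000' NA 97.017 C1 L1
3 'Nov 1st 2014, 17:36:50.000' NA 94.321 C2 L1
4 'Nov 1st 2014, 17:36:50.000' NA 85.014 C3 L1
5 'Nov 1st 2014, 17:37:50.000' 21.8 NA NA L2
6 'Nov 1st 2014, 17:37:50.000' NA 87.1 C1 L2
7 'Nov 1st 2014, 17:37:50.000' NA 67.1 C2 L2
8 'Nov 1st 2014, 17:37:50.000' NA 47.3 C3 L2",
                 header = TRUE,
                 stringsAsFactors = FALSE)

# Rows that carry A: one per time/D group
a_rows <- df %>%
  filter(!is.na(A)) %>%
  select(time, A, D)

# Rows that carry B: each distinct value of C becomes its own column
b_wide <- df %>%
  filter(!is.na(C)) %>%
  select(time, D, C, B) %>%
  pivot_wider(names_from = C, values_from = B)

# Attach the widened B values to the A rows via the key columns
result <- left_join(a_rows, b_wide, by = c("time", "D"))
result
```

Filtering out the rows where C is NA before reshaping avoids creating a column for the missing key, so no `<NA>` column has to be removed afterwards.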

Data

df <- read.table(text = "time A B C D C1 C2 C3
1 'Nov 1st 2014, 17:36:50.000' 20.79 NA NA L1 NA NA NA
2 'Nov 1st 2014, 17:36:50.000' NA 97.017 C1 L1 NA NA NA
3 'Nov 1st 2014, 17:36:50.000' NA 94.321 C2 L1 NA NA NA
4 'Nov 1st 2014, 17:36:50.000' NA 85.014 C3 L1 NA NA NA
5 'Nov 1st 2014, 17:37:50.000' 21.8 NA NA L2 NA NA NA
6 'Nov 1st 2014, 17:37:50.000' NA 87.1 C1 L2 NA NA NA
7 'Nov 1st 2014, 17:37:50.000' NA 67.1 C2 L2 NA NA NA
8 'Nov 1st 2014, 17:37:50.000' NA 47.3 C3 L2 NA NA NA",
                 header = TRUE,
                 stringsAsFactors = FALSE)

Step-by-step approach

If I understand correctly, the OP's dataset actually consists of two intermixed datasets:

df
                        time     A      B  C  D C1 C2 C3
1 Nov 1st 2014, 17:36:50.000 20.79     NA NA L1 NA NA NA
2 Nov 1st 2014, 17:36:50.000    NA 97.017 C1 L1 NA NA NA
3 Nov 1st 2014, 17:36:50.000    NA 94.321 C2 L1 NA NA NA
4 Nov 1st 2014, 17:36:50.000    NA 85.014 C3 L1 NA NA NA
5 Nov 1st 2014, 17:37:50.000  21.8     NA NA L2 NA NA NA
6 Nov 1st 2014, 17:37:50.000    NA   87.1 C1 L2 NA NA NA
7 Nov 1st 2014, 17:37:50.000    NA   67.1 C2 L2 NA NA NA

which need to be separated:

library(data.table)
df1 <- setDT(df)[A != "NA", .(time, A, D)]
df1
                         time     A  D
1: Nov 1st 2014, 17:36:50.000 20.79 L1
2: Nov 1st 2014, 17:37:50.000  21.8 L2

and

df2 <- df[A == "NA", .(time, B, C, D)]
df2
                         time      B  C  D
1: Nov 1st 2014, 17:36:50.000 97.017 C1 L1
2: Nov 1st 2014, 17:36:50.000 94.321 C2 L1
3: Nov 1st 2014, 17:36:50.000 85.014 C3 L1
4: Nov 1st 2014, 17:37:50.000   87.1 C1 L2
5: Nov 1st 2014, 17:37:50.000   67.1 C2 L2

The key columns that identify unique subsets of rows are time and D. Columns C1, C2, and C3 are dropped because they will be created in the next step.

The second dataset is then reshaped from long to wide format:

wide <- dcast(df2, time + D ~ C, value.var = "B")
wide
                         time  D     C1     C2     C3
1: Nov 1st 2014, 17:36:50.000 L1 97.017 94.321 85.014
2: Nov 1st 2014, 17:37:50.000 L2   87.1   67.1   <NA>

Now both partial results can be joined:

df1[wide, on = .(time, D)]
                         time     A  D     C1     C2     C3
1: Nov 1st 2014, 17:36:50.000 20.79 L1 97.017 94.321 85.014
2: Nov 1st 2014, 17:37:50.000  21.8 L2   87.1   67.1   <NA>

Note that columns B and C have been dropped from the result because they convey no information.
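
For readers without data.table, the same three steps (separate, reshape, join) can be sketched in base R with reshape() and merge(). This is only an illustrative alternative, not the approach used in this answer. It assumes the OP's data with "NA" given as strings, and the names df1, df2, and result are reused here purely for illustration.

```r
# OP's sample data, with "NA" stored as a string
t1 <- 'Nov 1st 2014, 17:36:50.000'
t2 <- 'Nov 1st 2014, 17:37:50.000'
df <- data.frame(
  time = c(t1, t1, t1, t1, t2, t2, t2),
  A = c('20.79', 'NA', 'NA', 'NA', '21.8', 'NA', 'NA'),
  B = c('NA', '97.017', '94.321', '85.014', 'NA', '87.1', '67.1'),
  C = c('NA', 'C1', 'C2', 'C3', 'NA', 'C1', 'C2'),
  D = c('L1', 'L1', 'L1', 'L1', 'L2', 'L2', 'L2'),
  stringsAsFactors = FALSE
)

# Step 1: separate the two intermixed datasets
df1 <- df[df$A != "NA", c("time", "A", "D")]
df2 <- df[df$A == "NA", c("time", "B", "C", "D")]

# Step 2: reshape the second dataset from long to wide;
# reshape() names the new columns B.C1, B.C2, B.C3
wide <- reshape(df2, idvar = c("time", "D"), timevar = "C",
                direction = "wide")
names(wide) <- sub("^B\\.", "", names(wide))

# Step 3: join both partial results on the key columns
result <- merge(df1, wide, by = c("time", "D"))
result
```

As in the data.table version, C3 for the second group comes out as NA because the OP's data contains no C3 row for L2.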

Compact code

The steps above can be combined into fewer statements:

library(data.table)
setDT(df)[, (paste0("C", 1:3)) := NULL]
df[A != "NA"][dcast(df[C != "NA"], time + D ~ C, value.var = "B"), on = .(time, D)]
                         time     A  B  C  D     C1     C2     C3
1: Nov 1st 2014, 17:36:50.000 20.79 NA NA L1 97.017 94.321 85.014
2: Nov 1st 2014, 17:37:50.000  21.8 NA NA L2   87.1   67.1   <NA>

Data

As provided by the OP, with NA values given as strings:

time <- c('Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000',
          'Nov 1st 2014, 17:36:50.000','Nov 1st 2014, 17:36:50.000',
          'Nov 1st 2014, 17:37:50.000','Nov 1st 2014, 17:37:50.000',
          'Nov 1st 2014, 17:37:50.000')
A <- c('20.79','NA','NA','NA','21.8','NA','NA')  
B <- c('NA','97.017','94.321','85.014','NA','87.1','67.1')
C <- c('NA','C1','C2','C3','NA','C1','C2')
D <- c('L1','L1','L1','L1','L2','L2','L2')
C1 <- c('NA','NA','NA','NA','NA','NA','NA')
C2 <- c('NA','NA','NA','NA','NA','NA','NA')
C3 <- c('NA','NA','NA','NA','NA','NA','NA')
df <- data.frame(time,A,B,C,D,C1,C2,C3)
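
Because "NA" is stored here as ordinary text, is.na() does not detect it, which is why the code above compares against the string "NA". If real missing values are preferred, one possible clean-up in base R is sketched below on a small made-up excerpt of the data. This is an optional extra step and not part of the original answer.

```r
# Hypothetical miniature of the OP's data: "NA" is a string, not a missing value
df <- data.frame(
  A = c("20.79", "NA", "NA"),
  B = c("NA", "97.017", "94.321"),
  C = c("NA", "C1", "C2"),
  stringsAsFactors = FALSE
)

# No missing values are detected yet, because "NA" is plain text
any(is.na(df$A))

# Replace the literal string "NA" with a real missing value everywhere
df[df == "NA"] <- NA

# Convert the measurement columns from character to numeric
df$A <- as.numeric(df$A)
df$B <- as.numeric(df$B)

str(df)
```

After this conversion, filters such as !is.na(A) work as expected, and A and B can be used in numeric computations.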

