Extracting values from rows in R

I have a data frame with a large number of columns. Each row contains many -1 values, and I only want to keep the values in each row that are not -1. For example, if my data is:

A1 A2 A3 A4 A5 
-1 -1  2 -1  6
 2 -1 -1 -1 -1
 4 -1 -1 -1  3
 6  5 -1  2  2

I want to extract all the values in each row other than -1 into new variables, say:

V1 V2 V3 V4
2   6
2
4   3
6   5  2  2

Rows 1 and 3 each have two values that are not -1, so those two values are moved to V1 and V2, and V3 and V4 are left empty. Row 2 has one such value, so it occupies V1, and V2, V3 and V4 are empty for this row. Row 4 has four values that are not -1, so they fill the new variables V1 to V4.

con <- textConnection("
A1 A2 A3 A4 A5
-1 -1 2 -1 6
2 -1 -1 -1 -1
4 -1 -1 -1 3
6 5 -1 2 2")

df <- read.delim(con, sep = " ")

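# Start from a copy of df with every cell blanked out
df2 <- df
df2[,] <- ""
m <- 0  # largest number of non -1 values seen in any row

for(i in 1:nrow(df)) {
  # Values in row i that are not -1, in their original order
  x <- df[i,][df[i,] != -1]
  # Write them into the leftmost cells of row i
  df2[i,1:length(x)] <- x
  m <- max(m, length(x))
}
# Keep only the columns that were actually filled
df2 <- df2[, 1:m]

colnames(df2) <- paste0("V", 1:m)
df2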
#   V1 V2 V3 V4
# 1  2  6      
# 2  2         
# 3  4  3      
# 4  6  5  2  2
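
One thing to be aware of with the loop above: because df2 is first filled with "", its columns become character vectors. If numeric columns with NA in the empty cells are preferred, a conversion step along these lines could be added. This is only a minimal sketch: the data frame below is typed in by hand to mimic the printed output, and to_numeric is a hypothetical helper name.

```r
# Hand-typed copy of the character result printed above (an assumption
# made so that this sketch is self-contained)
df2 <- data.frame(
  V1 = c("2", "2", "4", "6"),
  V2 = c("6", "", "3", "5"),
  V3 = c("", "", "", "2"),
  V4 = c("", "", "", "2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn "" into NA, then convert the column to numeric
to_numeric <- function(col) {
  col[col == ""] <- NA_character_
  as.numeric(col)
}

# Assigning into df2[] keeps the data frame structure and column names
df2[] <- lapply(df2, to_numeric)
str(df2)
```

Whether "" or NA is the better marker for an empty cell depends on what happens next: "" prints cleanly, while NA keeps the columns numeric for further computation.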

It looks like we can do this with apply (df1 is the question's data frame):

Filter(function(x) !all(is.na(x)), as.data.frame(t(apply(df1, 1, 
               function(x) c(x[x!= -1], rep(NA, sum(x == -1)))))))
#  V1 V2 V3 V4
#1  2  6 NA NA
#2  2 NA NA NA
#3  4  3 NA NA
#4  6  5  2  2
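
The one-liner may be easier to follow when split into steps. The sketch below does that under the assumption that df1 holds the question's data. shift_left is a hypothetical helper name, and the column names are set explicitly rather than relying on defaults.

```r
df1 <- read.table(text = "A1 A2 A3 A4 A5
-1 -1  2 -1  6
 2 -1 -1 -1 -1
 4 -1 -1 -1  3
 6  5 -1  2  2", header = TRUE)

# Hypothetical helper: keep the values that are not -1, in their original
# order, and pad with NA so every row keeps the same length
shift_left <- function(x) {
  keep <- unname(x[x != -1])
  c(keep, rep(NA, length(x) - length(keep)))
}

# apply() returns one column per input row, so transpose the result
padded <- t(apply(as.matrix(df1), 1, shift_left))

# Drop the columns that contain only NA
padded <- padded[, colSums(!is.na(padded)) > 0, drop = FALSE]

res <- as.data.frame(padded)
colnames(res) <- paste0("V", seq_len(ncol(res)))
res
```

Padding every row to the full width first is what lets apply() return a regular matrix. The all-NA columns are removed afterwards, which plays the role of the Filter() call in the one-liner.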

dt2 is the final output.

# Create example data frame
dt <- read.table(text = "A1 A2 A3 A4 A5 
-1 -1  2 -1  6
                 2 -1 -1 -1 -1
                 4 -1 -1 -1  3
                 6  5 -1  2  2",
                 header = TRUE)

# Replace -1 with NA
dt[dt == -1] <- NA

# Sort each row; sort() drops the NA values, and the result is a list
# because the rows have different numbers of non-NA values
dt_list <- apply(dt, 1, sort)

# Find the largest number of non-NA values in any row
max_len <- max(sapply(dt_list, length))

# Add NA based on the length of each row
dt_list2 <- lapply(dt_list, function(x){
  if (length(x) < max_len){
    x <- c(x, rep(NA, max_len - length(x)))
  } 
  return(x)
})

# Combine all rows, create a new data frame
dt2 <- as.data.frame(do.call(rbind, dt_list2))

# Change the column name
colnames(dt2) <- paste0("V", 1:ncol(dt2))

dt2
  V1 V2 V3 V4
1  2  6 NA NA
2  2 NA NA NA
3  3  4 NA NA
4  2  2  5  6
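
Note that sort() reorders the values as well as dropping NA, which is why rows 3 and 4 above come out as 3 4 and 2 2 5 6 rather than 4 3 and 6 5 2 2 as in the question's desired output. If the original left-to-right order matters, one possible variant is to drop the NA values by indexing instead of sorting. A minimal sketch, in which dt_ordered is a hypothetical name:

```r
dt <- read.table(text = "A1 A2 A3 A4 A5
-1 -1  2 -1  6
 2 -1 -1 -1 -1
 4 -1 -1 -1  3
 6  5 -1  2  2", header = TRUE)

dt[dt == -1] <- NA

# Drop NA from each row by indexing, which keeps the original order
# (sort() would drop NA too, but it also reorders the values)
dt_list <- lapply(seq_len(nrow(dt)), function(i) {
  x <- unlist(dt[i, ], use.names = FALSE)
  x[!is.na(x)]
})

# Largest number of non-NA values in any row
max_len <- max(lengths(dt_list))

# `length<-` pads each vector with NA up to max_len
dt_list2 <- lapply(dt_list, `length<-`, max_len)

dt_ordered <- as.data.frame(do.call(rbind, dt_list2))
colnames(dt_ordered) <- paste0("V", seq_len(ncol(dt_ordered)))
dt_ordered
```

Looping over the row indices with lapply() always yields a list. apply() would return a matrix instead whenever every row happens to have the same number of non-NA values.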
