
Shifting non-NA cells to the left

I have a dataset with many NAs, and I need to shift all the non-NA cells to the left (row by row).

Example - my data frame:

    df=data.frame(x=c("l","m",NA,NA,"p"),y=c(NA,"b","c",NA,NA),z=c("u",NA,"w","x","y"))
    df
         x    y    z
    1    l <NA>    u
    2    m    b <NA>
    3 <NA>    c    w
    4 <NA> <NA>    x
    5    p <NA>    y

I want to convert the above data frame into this:

      x    y  z
    1 l    u NA
    2 m    b NA
    3 c    w NA
    4 x <NA> NA
    5 p    y NA

Please help.

Thanks.

You can use the standard apply function:

df=data.frame(x=c("l","m",NA,NA,"p"),y=c(NA,"b","c",NA,NA),z=c("u",NA,"w","x","y"))
df2 = as.data.frame(t(apply(df,1, function(x) { return(c(x[!is.na(x)],x[is.na(x)]) )} )))
colnames(df2) = colnames(df)

> df
     x    y    z
1    l <NA>    u
2    m    b <NA>
3 <NA>    c    w
4 <NA> <NA>    x
5    p <NA>    y
> df2
  x    y    z
1 l    u <NA>
2 m    b <NA>
3 c    w <NA>
4 x <NA> <NA>
5 p    y <NA>

Thanks to @Richard Scriven for the good observation.

A) Using is.na, order, lapply and rbind for aggregation

nosort.df<-do.call(rbind,lapply(1:nrow(df),function(x) { z=df[x,][order(is.na(df[x,]))];colnames(z)<-c("x","y","z");return(z) } ))

> nosort.df
  x    y    z
1 l    u <NA>
2 m    b <NA>
3 c    w <NA>
4 x <NA> <NA>
5 p    y <NA>

B) If sorted rows are required:

Using sort, lapply and rbind

sort.df<-do.call(rbind,lapply(1:nrow(df),function(x) { z=sort(df[x,],na.last=TRUE);colnames(z)<-c("x","y","z");return(z) } ))

> sort.df
  x    y    z
1 l    u <NA>
2 b    m <NA>
3 c    w <NA>
4 x <NA> <NA>
5 p    y <NA> 
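
The difference between variants A and B is easiest to see on a single row. The following is a minimal sketch in base R; the vector `row2` simply mirrors row 2 of the example data and is introduced here only for illustration.

```r
# One row of the example data, as a named character vector (illustrative).
row2 <- c(x = "m", y = "b", z = NA)

# Variant A: order() on the is.na() flags moves NA last and keeps the
# remaining values in their original left-to-right order.
kept_order <- unname(row2[order(is.na(row2))])

# Variant B: sort() also moves NA last, but it reorders the values themselves.
sorted <- unname(sort(row2, na.last = TRUE))

print(kept_order)
print(sorted)
```

This is why row 2 reads "m b" in nosort.df but "b m" in sort.df.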

If you don't get a shorter answer, this should help:

df=data.frame(x=c("l","m",NA,NA,"p"),y=c(NA,"b","c",NA,NA),z=c("u",NA,"w","x","y"))
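# Convert every column to character and assign the result back
# (factor columns would otherwise break the row assignment below).
df[] <- lapply(df, as.character)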


for(i in 1:nrow(df)){
  sub <- df[i,c(which(!is.na(df[i,])),which(is.na(df[i,])))] 
  colnames(sub) <- colnames(df)
  df[i,] <- sub
}

Another answer with shorter syntax:

df=data.frame(x=c("l","m",NA,NA,"p"),y=c(NA,"b","c",NA,NA),z=c("u",NA,"w","x","y"))

      x   y   z  
[1,] "l" NA  "u"
[2,] "m" "b" NA 
[3,] NA  "c" "w"
[4,] NA  NA  "x"
[5,] "p" NA  "y"



sorted.df <- as.data.frame(t(apply(df, 1, function(x) x[order(is.na(x))])))

     [,1] [,2] [,3]
[1,] "l"  "u"  NA  
[2,] "m"  "b"  NA  
[3,] "c"  "w"  NA  
[4,] "x"  NA   NA  
[5,] "p"  "y"  NA 

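There have been a number of duplicate questions (here and here). I have gathered some of the more idiomatic answers and benchmarked them against my own Rcpp implementation.

For simplicity, I have compared functions that take a character matrix as input and return one as output, rather than a data frame containing only character variables. You can use as.matrix and as.data.frame to coerce from one to the other.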
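
As a rough illustration of that round trip, here is a minimal base R sketch. The helper name `shift_rows` is hypothetical and only stands in for any matrix-in, matrix-out function; it uses the same row-wise idea as the apply answers above. The dimnames are restored explicitly because apply() does not necessarily keep the column names.

```r
df <- data.frame(x = c("l", "m", NA, NA, "p"),
                 y = c(NA, "b", "c", NA, NA),
                 z = c("u", NA, "w", "x", "y"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: non-NA values first, then the NAs, row by row.
shift_rows <- function(x) {
  t(apply(x, 1L, function(y) {r <- is.na(y); c(y[!r], y[r])}))
}

m <- as.matrix(df)                # data frame -> character matrix
shifted <- shift_rows(m)          # matrix in, matrix out
dimnames(shifted) <- dimnames(m)  # restore names that apply() may drop
df_shifted <- as.data.frame(shifted, stringsAsFactors = FALSE)

print(df_shifted)
```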

Rcpp::sourceCpp(code = '
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void shift_na_in_place(CharacterMatrix x)
{
  int m = x.nrow();
  int n = x.ncol();
  for (int i = 0, k = 0, k0 = 0; i < m; ++i) {
    for (int j = 0; j < n; ++j) {
      if (x[k] != NA_STRING) {
        x[k0] = x[k];
        k0 += m;
      }
      k += m;
    }
    while (k0 < k) {
      x[k0] = NA_STRING;
      k0 += m;
    }
    k = (k % m) + 1;
    k0 = k;
  }
}

// [[Rcpp::export]]
CharacterMatrix shift_na(CharacterMatrix x)
{
  CharacterMatrix y = clone(x);
  shift_na_in_place(y);
  return y;
}
')
f1 <- function(x) {
  t(apply(x, 1L, function(y) {r <- is.na(y); c(y[!r], y[r])}))
}
f2 <- function(x) {
  t(apply(x, 1L, function(y) y[order(is.na(y), method = "radix")]))
}
f3 <- function(x) {
  d <- dim(x)
  dn <- dimnames(x)
  matrix(x[order(row(x), is.na(x), method = "radix")],
         nrow = d[1L], ncol = d[2L], byrow = TRUE,
         dimnames = if (!is.null(dn)) c(dn[1L], list(NULL)))
}
f4 <- function(x) {
  d <- dim(x)
  dn <- dimnames(x)
  matrix(x[order(is.na(x) + (row(x) - 1L) * 2L + 1L, method = "radix")],
         nrow = d[1L], ncol = d[2L], byrow = TRUE,
         dimnames = if (!is.null(dn)) c(dn[1L], list(NULL)))
}
set.seed(1L)
x <- sample(c(letters, NA), size = 1e+06L, replace = TRUE, prob = c(rep(1, 26), 13))
dim(x) <- c(1e+05L, 10L)
microbenchmark::microbenchmark(shift_na(x), f1(x), f2(x), f3(x), f4(x), check = "identical")
Unit: milliseconds
        expr       min        lq      mean    median        uq       max neval
 shift_na(x)  10.51297  10.71512  11.26035  10.82689  10.95819  25.86571   100
       f1(x) 136.64972 150.06148 179.82575 169.18783 212.53840 249.85297   100
       f2(x) 703.83650 755.42414 771.16689 762.34443 795.05738 850.33926   100
       f3(x)  18.22811  18.93702  23.92874  21.48275  22.10351  71.41487   100
       f4(x)  30.55000  31.27300  34.28563  33.78949  34.48608  81.68077   100

As you might expect, the dedicated Rcpp implementation shift_na is the fastest, but f3 and f4 are not much slower. Some finer points:

  • f1 and f2 call apply, which is built on an R for loop, so it is not surprising that they are slow.

  • f3 and f4 must allocate memory for is.na(x) and row(x), which could be a hindrance for sufficiently large x.

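  • f3 is faster than f4 because the "radix" method of order uses a slightly faster linear-time algorithm only when the range (maximum minus minimum) of the integer vector being sorted is less than 100000 (see ?sort). Here, the ranges are:

      is.na(x): 1
      row(x): 99999
      is.na(x) + (row(x) - 1L) * 2L + 1L: 199999

  • shift_na(x) creates a copy of x and modifies the copy in place. If you cannot allocate memory for a copy because x is very large, then you can call shift_na_in_place(x) to modify x in place.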
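
To make the comparison between f3 and f4 concrete, the sketch below builds both sort keys on the small example from the question (stored as a character matrix, which is an assumption made only for this illustration). It shows that the single packed key used by f4 produces the same ordering as the two keys used by f3, and that the packed key spans roughly twice the range of row(x).

```r
x <- matrix(c("l", "m", NA, NA, "p",
              NA, "b", "c", NA, NA,
              "u", NA, "w", "x", "y"),
            nrow = 5L)

# f3 style: two sort keys, the row index first and the NA flag second.
ord_two_keys <- order(row(x), is.na(x), method = "radix")

# f4 style: both keys packed into one integer per cell
# (odd values for non-NA cells, even values for NA cells).
key <- is.na(x) + (row(x) - 1L) * 2L + 1L
ord_one_key <- order(key, method = "radix")

# Range of each integer key (maximum minus minimum).
print(diff(range(row(x))))
print(diff(range(key)))

# Either ordering, filled row by row, gives the shifted matrix.
shifted <- matrix(x[ord_one_key], nrow = nrow(x), byrow = TRUE)
print(shifted)
```

With 100000 rows, the same arithmetic gives the ranges 99999 and 199999 quoted above, so only the row(x) key stays under the 100000 threshold.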

We can also use the pmap function from the purrr package to great advantage here:

library(dplyr)
library(purrr)

df %>% 
  pmap(., ~ c(c(...)[!is.na(c(...))], c(...)[is.na(c(...))])) %>%
  exec(rbind, !!!.) %>%
  as_tibble()

# A tibble: 5 x 3
  x     z     y    
  <chr> <chr> <chr>
1 l     u     NA   
2 m     b     NA   
3 c     w     NA   
4 x     NA    NA   
5 p     y     NA  

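I have included a function for this task in my package dedupewider (available on CRAN). It allows moving NAs to the right, to the left, or even to the top and bottom: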

library(dedupewider)

df <- data.frame(x = c("l", "m", NA, NA, "p"),
                 y = c(NA, "b", "c", NA, NA),
                 z = c("u", NA, "w", "x", "y"))

na_move(df) # 'right' direction is by default

#>   x    y  z
#> 1 l    u NA
#> 2 m    b NA
#> 3 c    w NA
#> 4 x <NA> NA
#> 5 p    y NA

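It implements a solution that reshapes the data (from wide format to long and back to wide) and uses data.table functions internally. As a result, it is significantly faster than the standard solution using apply: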

library(dedupewider)
library(microbenchmark)

df <- data.frame(x = c("l", "m", NA, NA, "p"),
                 y = c(NA, "b", "c", NA, NA),
                 z = c("u", NA, "w", "x", "y"))

df <- do.call(rbind, replicate(10000, df, simplify = FALSE))

apply_function <- function(df) {
  as.data.frame(t(apply(df, 1, function(x) c(x[!is.na(x)], x[is.na(x)]))))
}

microbenchmark(apply_function(df), na_move(df))

#> Unit: milliseconds
#>                expr      min       lq      mean    median       uq      max
#>  apply_function(df) 289.2032 361.0178 475.65281 425.79355 545.6405 999.4086
#>         na_move(df)  51.0419  58.1426  75.32407  65.01445  92.8706 216.6384
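
The wide-to-long-to-wide idea mentioned above can be sketched in base R as follows. This is not the package's actual implementation (which, per the answer, relies on data.table); the function name `shift_left_long` is hypothetical and the code only illustrates the reshaping approach under the assumption that all columns are character.

```r
# Hypothetical sketch of the reshape approach; not dedupewider's own code.
shift_left_long <- function(df) {
  m <- as.matrix(df)
  # Long format: one record per cell, tagged with its row index.
  long <- data.frame(row = as.vector(row(m)),
                     value = as.vector(m),
                     stringsAsFactors = FALSE)
  # Drop the NA cells, then number the survivors within each row.
  long <- long[!is.na(long$value), ]
  long$pos <- ave(seq_along(long$row), long$row, FUN = seq_along)
  # Back to wide format: place each value at (row, new position).
  out <- matrix(NA_character_, nrow(m), ncol(m),
                dimnames = list(NULL, colnames(m)))
  out[cbind(long$row, long$pos)] <- long$value
  as.data.frame(out, stringsAsFactors = FALSE)
}

df_small <- data.frame(x = c("l", "m", NA, NA, "p"),
                       y = c(NA, "b", "c", NA, NA),
                       z = c("u", NA, "w", "x", "y"),
                       stringsAsFactors = FALSE)
res <- shift_left_long(df_small)
print(res)
```

Because the work is done on whole vectors rather than one row at a time, this style avoids the per-row function calls that make apply slow.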

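If you do not want to use VBA, you can try the following steps.

1. Select your dataset.
2. Replace NA with empty cells.
3. Press F5, click Special, select Blanks, and click OK.
4. Right-click any of the selected cells and choose Delete (shift cells left).

I hope this helps.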
