
How to combine sparse columns tidily

A colleague has some data composed of many sparse columns that should be collapsed into a few filled columns. For example:

d1 <- data.frame(X1 = c(rep("Northampton", times=3), rep(NA, times=7)), 
                 X2 = c(rep(NA, times=3), rep("Amherst", times=5), rep(NA, times=2)), 
                 X3 = c(rep(NA, times=8), rep("Hadley", times=2)), 
                 X4 = c(rep("Stop and Shop", times=2), rep(NA, times=6), rep("Stop and Shop", times=2)), 
                 X5 = c(rep(NA, times=2), rep("Whole Foods", times=6), rep(NA, times=2)))

d1
            X1      X2     X3            X4          X5
1  Northampton    <NA>   <NA> Stop and Shop        <NA>
2  Northampton    <NA>   <NA> Stop and Shop        <NA>
3  Northampton    <NA>   <NA>          <NA> Whole Foods
4         <NA> Amherst   <NA>          <NA> Whole Foods
5         <NA> Amherst   <NA>          <NA> Whole Foods
6         <NA> Amherst   <NA>          <NA> Whole Foods
7         <NA> Amherst   <NA>          <NA> Whole Foods
8         <NA> Amherst   <NA>          <NA> Whole Foods
9         <NA>    <NA> Hadley Stop and Shop        <NA>
10        <NA>    <NA> Hadley Stop and Shop        <NA>

X1:X3 should be collapsed into one column named Town and X4:X5 into one column named Store. There must be a tidyverse solution here. I've tried with gather() and unite() but haven't found anything elegant.

You can use coalesce():

library(dplyr)

d1 %>% mutate_if(is.factor, as.character) %>%    # convert factor columns to character first
    transmute(town = coalesce(X1, X2, X3), 
              store = coalesce(X4, X5))

##           town         store
## 1  Northampton Stop and Shop
## 2  Northampton Stop and Shop
## 3  Northampton   Whole Foods
## 4      Amherst   Whole Foods
## 5      Amherst   Whole Foods
## 6      Amherst   Whole Foods
## 7      Amherst   Whole Foods
## 8      Amherst   Whole Foods
## 9       Hadley Stop and Shop
## 10      Hadley Stop and Shop
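To see why this works, it may help to look at coalesce() on plain vectors. This is only a small sketch with made-up vectors (a, b, c3 are illustrative names, not from the question): coalesce() goes position by position and keeps the first value that is not NA.

```r
library(dplyr)

# Three sparse vectors; each position is filled in exactly one of them
a  <- c("Northampton", NA, NA)
b  <- c(NA, "Amherst", NA)
c3 <- c(NA, NA, "Hadley")

# Position by position, keep the first non-NA value
town <- coalesce(a, b, c3)

# If more than one vector is filled at a position, the earlier argument wins
first_wins <- coalesce(c("x", NA), c("y", "z"))
```

So the order of the arguments only matters when the columns overlap, which they do not in the example data.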

I think a sequence of gather() calls and some pruning will get you what you want. One wrinkle is to use the na.rm = TRUE argument to gather() to cull the unwanted rows.

library(dplyr)
library(tidyr)

d1 %>% 
  gather(key = "town", value = "town_name", X1:X3, na.rm = TRUE) %>% 
  gather(key = "store", value = "store_name", X4:X5, na.rm = TRUE) %>%
  select(-town, -store)

Does that do the trick?
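In current tidyr, gather() is superseded by pivot_longer(), so the same idea could probably be written as below. This is a sketch, not something from the original answer; it assumes the columns are character (hence stringsAsFactors = FALSE), and values_drop_na = TRUE plays the role of na.rm = TRUE.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, built as character columns
d1 <- data.frame(
  X1 = c(rep("Northampton", 3), rep(NA, 7)),
  X2 = c(rep(NA, 3), rep("Amherst", 5), rep(NA, 2)),
  X3 = c(rep(NA, 8), rep("Hadley", 2)),
  X4 = c(rep("Stop and Shop", 2), rep(NA, 6), rep("Stop and Shop", 2)),
  X5 = c(rep(NA, 2), rep("Whole Foods", 6), rep(NA, 2)),
  stringsAsFactors = FALSE
)

res <- d1 %>%
  # Stack X1:X3 and drop the rows whose value is NA
  pivot_longer(X1:X3, names_to = "town", values_to = "town_name",
               values_drop_na = TRUE) %>%
  # Do the same for X4:X5
  pivot_longer(X4:X5, names_to = "store", values_to = "store_name",
               values_drop_na = TRUE) %>%
  # The old column names are no longer needed
  select(-town, -store)
```

Like the gather() version, this relies on every row having exactly one filled cell in each group; a row with two filled cells would come out duplicated.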

You can also do this in base R with apply() run row-wise:

d2 <- data.frame(X1 = apply(d1[,c("X1", "X2", "X3")], 1, function(x) x[!is.na(x)]),
                 X2 = apply(d1[,c("X4", "X5")], 1, function(x) x[!is.na(x)]),
                 stringsAsFactors = FALSE)

Result:

> d2
            X1            X2
1  Northampton Stop and Shop
2  Northampton Stop and Shop
3  Northampton   Whole Foods
4      Amherst   Whole Foods
5      Amherst   Whole Foods
6      Amherst   Whole Foods
7      Amherst   Whole Foods
8      Amherst   Whole Foods
9       Hadley Stop and Shop
10      Hadley Stop and Shop
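One caveat: x[!is.na(x)] only gives a clean vector when every row has exactly one filled cell. If a row has two filled cells or none, apply() returns a list instead. A more defensive variant could take the first non-NA value; the sketch below uses a made-up frame d and a hypothetical helper first_non_na, neither of which is from the answer above.

```r
# Hypothetical frame: row 2 has two filled cells, row 3 has none
d <- data.frame(X1 = c("Northampton", "Northampton", NA),
                X2 = c(NA, "Amherst", NA),
                stringsAsFactors = FALSE)

# The original approach returns a list here, because the results differ in length
ragged <- apply(d, 1, function(x) x[!is.na(x)])

# Taking the first non-NA value always yields one value per row (NA if none)
first_non_na <- function(x) x[!is.na(x)][1]
town <- apply(d, 1, first_non_na)
```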

Here is another way with base R using pmax/pmin:

data.frame(lapply(list(Town = d1[1:3], Store = d1[4:5]), function(x) 
           do.call(pmax, c(x, na.rm = TRUE))), stringsAsFactors=FALSE)
#          Town         Store
#1  Northampton Stop and Shop
#2  Northampton Stop and Shop
#3  Northampton   Whole Foods
#4      Amherst   Whole Foods
#5      Amherst   Whole Foods
#6      Amherst   Whole Foods
#7      Amherst   Whole Foods
#8      Amherst   Whole Foods
#9       Hadley Stop and Shop
#10      Hadley Stop and Shop
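The reason pmax works: with na.rm = TRUE it ignores NAs position by position, so when only one column is filled, that value is the "maximum". A small sketch on made-up vectors (x1, x2, x3 are illustrative names):

```r
x1 <- c("Northampton", NA, NA)
x2 <- c(NA, "Amherst", NA)
x3 <- c(NA, NA, "Hadley")

# Parallel maximum, skipping NAs: the single filled value wins at each position
town <- pmax(x1, x2, x3, na.rm = TRUE)

# Unlike coalesce(), overlapping values are resolved by sort order, not argument order
overlap <- pmax("Amherst", "Northampton", na.rm = TRUE)
```

So this is fine for sparse columns that never overlap, but with overlapping columns you would get whichever string sorts last.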

Data:

d1 <- data.frame(X1 = c(rep("Northampton", times=3), rep(NA, times=7)),
                 X2 = c(rep(NA, times=3), rep("Amherst", times=5), rep(NA, times=2)),
                 X3 = c(rep(NA, times=8), rep("Hadley", times=2)),
                 X4 = c(rep("Stop and Shop", times=2), rep(NA, times=6), rep("Stop and Shop", times=2)),
                 X5 = c(rep(NA, times=2), rep("Whole Foods", times=6), rep(NA, times=2)),
                 stringsAsFactors=FALSE)
