How to combine sparse columns tidily
A colleague has some data composed of many sparse columns that should be collapsed into a few filled columns. For example:
d1 <- data.frame(X1 = c(rep("Northampton", times=3), rep(NA, times=7)),
                 X2 = c(rep(NA, times=3), rep("Amherst", times=5), rep(NA, times=2)),
                 X3 = c(rep(NA, times=8), rep("Hadley", times=2)),
                 X4 = c(rep("Stop and Shop", times=2), rep(NA, times=6), rep("Stop and Shop", times=2)),
                 X5 = c(rep(NA, times=2), rep("Whole Foods", times=6), rep(NA, times=2)))
d1
X1 X2 X3 X4 X5
1 Northampton <NA> <NA> Stop and Shop <NA>
2 Northampton <NA> <NA> Stop and Shop <NA>
3 Northampton <NA> <NA> <NA> Whole Foods
4 <NA> Amherst <NA> <NA> Whole Foods
5 <NA> Amherst <NA> <NA> Whole Foods
6 <NA> Amherst <NA> <NA> Whole Foods
7 <NA> Amherst <NA> <NA> Whole Foods
8 <NA> Amherst <NA> <NA> Whole Foods
9 <NA> <NA> Hadley Stop and Shop <NA>
10 <NA> <NA> Hadley Stop and Shop <NA>
X1:X3 should be collapsed into one column named Town, and X4:X5 into one column named Store. There must be a tidyverse solution here. I've tried gather() and unite() but haven't found anything elegant.
You can use coalesce():
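library(dplyr)

d1 %>% mutate_if(is.factor, as.character) %>%  # convert factor columns to character explicitly
    transmute(town = coalesce(X1, X2, X3),      # first non-NA value among X1, X2, X3
              store = coalesce(X4, X5))         # first non-NA value among X4, X5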
## town store
## 1 Northampton Stop and Shop
## 2 Northampton Stop and Shop
## 3 Northampton Whole Foods
## 4 Amherst Whole Foods
## 5 Amherst Whole Foods
## 6 Amherst Whole Foods
## 7 Amherst Whole Foods
## 8 Amherst Whole Foods
## 9 Hadley Stop and Shop
## 10 Hadley Stop and Shop
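coalesce() returns, position by position, the first non-NA value among its arguments. As a rough illustration of that behaviour (not dplyr's actual implementation), here is a base-R sketch; `coalesce_base()` is a hypothetical helper name, and the data are rebuilt with character columns so the snippet runs on its own:

```r
# Hypothetical helper, for illustration only: a base-R stand-in for
# dplyr::coalesce(). Folding left to right, every NA still left in the
# accumulator is filled from the next vector.
coalesce_base <- function(...) {
  Reduce(function(acc, nxt) ifelse(is.na(acc), nxt, acc), list(...))
}

# Same data as in the question, with character (not factor) columns
d1 <- data.frame(
  X1 = c(rep("Northampton", 3), rep(NA, 7)),
  X2 = c(rep(NA, 3), rep("Amherst", 5), rep(NA, 2)),
  X3 = c(rep(NA, 8), rep("Hadley", 2)),
  X4 = c(rep("Stop and Shop", 2), rep(NA, 6), rep("Stop and Shop", 2)),
  X5 = c(rep(NA, 2), rep("Whole Foods", 6), rep(NA, 2)),
  stringsAsFactors = FALSE
)

res <- data.frame(
  town  = coalesce_base(d1$X1, d1$X2, d1$X3),
  store = coalesce_base(d1$X4, d1$X5),
  stringsAsFactors = FALSE
)
print(res)
```

The left-to-right fold means earlier columns win if a row ever has more than one non-NA value, which is the same precedence coalesce() uses.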
I think a sequence of gather() calls and some pruning will get you what you want. The key is the na.rm = TRUE argument to gather(), which culls the unwanted rows (the NA cells).
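library(dplyr)
library(tidyr)
d1 %>%
  # each row has exactly one non-NA value in X1:X3, so na.rm = TRUE leaves one row per original row
  gather(key = "town", value = "town_name", X1:X3, na.rm = TRUE) %>%
  # same again for X4:X5; rows come back grouped by source column, not in the original order
  gather(key = "store", value = "store_name", X4:X5, na.rm = TRUE) %>%
  select(-town, -store)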
Does that do the trick?
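What gather(..., na.rm = TRUE) does here is stack the selected columns on top of each other and drop the NA cells; because each original row has exactly one non-NA value in X1:X3 (and in X4:X5), exactly one stacked row survives per original row. The base-R sketch below mimics that stacking with a hypothetical `stack_non_na()` helper (not part of tidyr) that keeps an explicit row index, which also makes visible that the surviving rows come out grouped by source column:

```r
# Hypothetical helper (not part of tidyr): stack the given columns one after
# another, remembering the original row number, then drop the NA cells.
# Assumes character columns; unlist() on factors would return integer codes.
stack_non_na <- function(df, cols) {
  long <- data.frame(
    row   = rep(seq_len(nrow(df)), times = length(cols)),
    value = unlist(df[cols], use.names = FALSE),
    stringsAsFactors = FALSE
  )
  long[!is.na(long$value), ]
}

d1 <- data.frame(
  X1 = c(rep("Northampton", 3), rep(NA, 7)),
  X2 = c(rep(NA, 3), rep("Amherst", 5), rep(NA, 2)),
  X3 = c(rep(NA, 8), rep("Hadley", 2)),
  X4 = c(rep("Stop and Shop", 2), rep(NA, 6), rep("Stop and Shop", 2)),
  X5 = c(rep(NA, 2), rep("Whole Foods", 6), rep(NA, 2)),
  stringsAsFactors = FALSE
)

stores <- stack_non_na(d1, c("X4", "X5"))
print(stores$row)                      # grouped by column: X4 hits first, then X5
stores <- stores[order(stores$row), ]  # restore the original row order
print(stores$value)
```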
You can also do this in base R by running apply() row-wise:
# For each row, keep the single non-NA value from each group of columns
d2 <- data.frame(X1 = apply(d1[, c("X1", "X2", "X3")], 1, function(x) x[!is.na(x)]),
                 X2 = apply(d1[, c("X4", "X5")], 1, function(x) x[!is.na(x)]),
                 stringsAsFactors = FALSE)
Result:
> d2
X1 X2
1 Northampton Stop and Shop
2 Northampton Stop and Shop
3 Northampton Whole Foods
4 Amherst Whole Foods
5 Amherst Whole Foods
6 Amherst Whole Foods
7 Amherst Whole Foods
8 Amherst Whole Foods
9 Hadley Stop and Shop
10 Hadley Stop and Shop
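The apply() approach (like coalesce() and pmax()) relies on each row having exactly one non-NA value within each group of columns. When that does not hold, apply() cannot simplify its result and returns a list instead of a vector, so the data.frame() call no longer does what you want. A small check first is cheap; `one_value_per_row()` below is a hypothetical helper, sketched under that one-value-per-row assumption:

```r
# Hypothetical helper: TRUE if every row has exactly one non-NA value
# among the given columns.
one_value_per_row <- function(df, cols) {
  all(rowSums(!is.na(df[, cols, drop = FALSE])) == 1)
}

d1 <- data.frame(
  X1 = c(rep("Northampton", 3), rep(NA, 7)),
  X2 = c(rep(NA, 3), rep("Amherst", 5), rep(NA, 2)),
  X3 = c(rep(NA, 8), rep("Hadley", 2)),
  X4 = c(rep("Stop and Shop", 2), rep(NA, 6), rep("Stop and Shop", 2)),
  X5 = c(rep(NA, 2), rep("Whole Foods", 6), rep(NA, 2)),
  stringsAsFactors = FALSE
)

ok_town  <- one_value_per_row(d1, c("X1", "X2", "X3"))
ok_store <- one_value_per_row(d1, c("X4", "X5"))

# Break the assumption: give row 1 a second town value
bad <- d1
bad$X2[1] <- "Amherst"
ok_bad <- one_value_per_row(bad, c("X1", "X2", "X3"))

# Row 1 now yields two values, so apply() returns a list rather than a vector
picked <- apply(bad[, c("X1", "X2", "X3")], 1, function(x) x[!is.na(x)])
print(c(ok_town, ok_store, ok_bad))
print(class(picked))
```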
Here is another way with base R, using pmax() (pmin() works as well, because each row has only one non-NA value per group):
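# For each group of columns, take the row-wise maximum ignoring NAs;
# with only one non-NA value per row, that value is returned unchanged
data.frame(lapply(list(Town = d1[1:3], Store = d1[4:5]), function(x)
    do.call(pmax, c(x, na.rm = TRUE))), stringsAsFactors = FALSE)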
# Town Store
#1 Northampton Stop and Shop
#2 Northampton Stop and Shop
#3 Northampton Whole Foods
#4 Amherst Whole Foods
#5 Amherst Whole Foods
#6 Amherst Whole Foods
#7 Amherst Whole Foods
#8 Amherst Whole Foods
#9 Hadley Stop and Shop
#10 Hadley Stop and Shop
Data used for this answer (note stringsAsFactors = FALSE, so the columns are character vectors rather than factors):
d1 <- data.frame(X1 = c(rep("Northampton", times=3), rep(NA, times=7)),
                 X2 = c(rep(NA, times=3), rep("Amherst", times=5), rep(NA, times=2)),
                 X3 = c(rep(NA, times=8), rep("Hadley", times=2)),
                 X4 = c(rep("Stop and Shop", times=2), rep(NA, times=6), rep("Stop and Shop", times=2)),
                 X5 = c(rep(NA, times=2), rep("Whole Foods", times=6), rep(NA, times=2)),
                 stringsAsFactors = FALSE)
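Why pmax() works here: with na.rm = TRUE it ignores NAs and compares the remaining strings, and since each row holds a single non-NA string per group, the row-wise maximum is just that string. pmin() should therefore give the same answer; the sketch below checks this on the town columns (it rebuilds the character-column data so it runs on its own):

```r
d1 <- data.frame(
  X1 = c(rep("Northampton", 3), rep(NA, 7)),
  X2 = c(rep(NA, 3), rep("Amherst", 5), rep(NA, 2)),
  X3 = c(rep(NA, 8), rep("Hadley", 2)),
  X4 = c(rep("Stop and Shop", 2), rep(NA, 6), rep("Stop and Shop", 2)),
  X5 = c(rep(NA, 2), rep("Whole Foods", 6), rep(NA, 2)),
  stringsAsFactors = FALSE
)
town_cols <- d1[c("X1", "X2", "X3")]

# Row-wise max and min over the town columns, ignoring NA
by_max <- do.call(pmax, c(town_cols, na.rm = TRUE))
by_min <- do.call(pmin, c(town_cols, na.rm = TRUE))

print(identical(by_max, by_min))
print(by_max)
```

If a row ever contained two different towns, pmax() and pmin() would silently pick different ones, so the two disagreeing is a quick sign the one-value-per-row assumption is broken.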