Assign grouping variable based on dataframe rows present in R
I have a list like this in R:
cat1
cat7
cat10
cat4
frog
dino11
dino12
dino15
rabbit
I need to make a new data frame that looks like this:
cat1 frog
cat7 frog
cat10 frog
cat4 frog
dino11 rabbit
dino12 rabbit
dino15 rabbit
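Any ideas? Thanks!
We create a grouping variable from the elements of 'v1' that contain no digits (the cumulative sum of those matches, shifted by one with lag), create a new column 'v2' as the last element of 'v1' in each group, remove the last row of each group with slice, and then select the columns of interest.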
library(tidyverse)
df %>%
group_by(grp = lag(cumsum(grepl("^[^0-9]+$", v1)), default = 0)) %>%
mutate(v2 = last(v1)) %>%
slice(-n()) %>%
ungroup %>%
select(-grp)
# A tibble: 7 x 2
# v1 v2
# <chr> <chr>
#1 cat1 frog
#2 cat7 frog
#3 cat10 frog
#4 cat4 frog
#5 dino11 rabbit
#6 dino12 rabbit
#7 dino15 rabbit
Data.
df <- structure(list(v1 = c("cat1", "cat7", "cat10", "cat4", "frog",
"dino11", "dino12", "dino15", "rabbit")), .Names = "v1",
class = "data.frame", row.names = c(NA, -9L))
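To see why the lag is needed, it may help to look at the intermediate vectors one at a time. The sketch below rebuilds the grouping variable outside the pipeline; the names is_label and running are made up for illustration and do not appear in the answer above.

```r
library(dplyr)

v1 <- c("cat1", "cat7", "cat10", "cat4", "frog",
        "dino11", "dino12", "dino15", "rabbit")

# TRUE for elements made up entirely of non-digit characters (the label rows)
is_label <- grepl("^[^0-9]+$", v1)

# Running count of label rows seen so far, including the current row
running <- cumsum(is_label)
# 0 0 0 0 1 1 1 1 2

# Shift by one position so each label row stays in the group it closes
grp <- lag(running, default = 0)
# 0 0 0 0 0 1 1 1 1
```

Without the lag, "frog" would start group 1 instead of ending group 0, and last(v1) would pick the wrong element.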
Similar to @akrun's answer, but with data.table:
library(data.table)
setDT(df)
df[, .(
anum = v1[-.N],
a = v1[.N]
), by=.(g = cumsum(!(shift(v1) %like% "\\d")))]
g anum a
1: 1 cat1 frog
2: 1 cat7 frog
3: 1 cat10 frog
4: 1 cat4 frog
5: 2 dino11 rabbit
6: 2 dino12 rabbit
7: 2 dino15 rabbit
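The grouping expression in by can be unpacked the same way. This is a rough sketch under the assumption that v1 is a plain character vector; prev_has_digit is a hypothetical name used only here.

```r
library(data.table)

v1 <- c("cat1", "cat7", "cat10", "cat4", "frog",
        "dino11", "dino12", "dino15", "rabbit")

# shift() moves every element down one position; the first slot becomes NA.
# %like% is a grepl() wrapper, so the NA yields FALSE.
prev_has_digit <- shift(v1) %like% "\\d"

# A new group starts whenever the previous element had no digit,
# that is, at the very first row and right after each label row.
g <- cumsum(!prev_has_digit)
# 1 1 1 1 1 2 2 2 2
```

Because the shift happens before the cumulative sum, each label row ("frog", "rabbit") ends up as the last row of its group, which is what v1[.N] relies on.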
Using only base R, you can do this with grepl and rle.
where <- grepl("[[:digit:]]", x)
r <- rle(where)
A <- x[where]
B <- rep.int(x[!where], times = r$lengths[r$values])
data.frame(A, B)
# A B
#1 cat1 frog
#2 cat7 frog
#3 cat10 frog
#4 cat4 frog
#5 dino11 rabbit
#6 dino12 rabbit
#7 dino15 rabbit
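For readers unfamiliar with rle, here is a short sketch of what it returns for this input and how the run lengths feed rep.int. The stopifnot guard is an added assumption (exactly one label after each run of numbered items) and is not part of the answer above.

```r
x <- c("cat1", "cat7", "cat10", "cat4", "frog",
       "dino11", "dino12", "dino15", "rabbit")

# TRUE for elements that contain at least one digit
where <- grepl("[[:digit:]]", x)

# Run-length encoding of the TRUE/FALSE pattern
r <- rle(where)
r$lengths  # 4 1 3 1
r$values   # TRUE FALSE TRUE FALSE

# Lengths of the TRUE runs only: how many numbered items precede each label
n_items <- r$lengths[r$values]
# 4 3

# Guard (assumption): rep.int() needs one label per run of numbered items
stopifnot(sum(!where) == sum(r$values))

B <- rep.int(x[!where], times = n_items)
```

If two labels appeared in a row, or the vector ended without a label, the guard would fail instead of letting rep.int raise a less obvious error.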
Data.
x <- scan(what = character(), text = "
cat1
cat7
cat10
cat4
frog
dino11
dino12
dino15
rabbit
")