Count number of values followed by another value in R
df <- data.frame(Year = c("May","June","July"),
D1 = c(0,0,0),
D2 = c(0,0,0),
D3 = c(0,0,0),
D4 = c(0,0,1),
D5 = c(0,1,1),
D6 = c(0,1,1),
D7 = c(0,0,0),
D8 = c(0,0,0),
D9 = c(0,0,0),
D10 = c(0,0,0),
D11 = c(0,0,0),
D12 = c(0,0,0),
D13 = c(0,0,0),
D14 = c(1,1,0),
D15 = c(1,0,0),
D16 = c(0,1,0),
D17 = c(1,1,1),
D18 = c(0,0,0),
D19 = c(0,0,0),
D20 = c(0,0,0),
D21 = c(0,1,0),
D22 = c(0,0,0),
D23 = c(0,1,0),
D24 = c(0,0,0),
D25 = c(0,0,0),
D26 = c(1,0,0),
D27 = c(0,0,0),
D28 = c(1,0,1),
D29 = c(1,0,0),
D30 = c(1,1,0),
D31 = c(0,1,1)
)
I have a data frame (subset above) of months and days. I am trying to count the number of days on which a 1 is followed by a 0, a 0 is followed by a 1, and so on. For example, May would have 2 ones followed by zeros and 2 zeros followed by ones. I think a for loop would be the best way to go about this, but I am having trouble because the comparisons run along rows.
Based on the updated data, we may need a rolling paste over each row:
library(zoo)
out <- table(apply(df[-1], 1, function(x) rollapply(x, 2, paste, collapse="")))
out
# 00 01 10 11
# 56 14 12  8
sum(out)
#[1] 90
Or it can be made a bit more compact without the anonymous function call:
table(apply(df[-1], 1, rollapply, width = 2, paste, collapse=""))
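To make clear what the rolling paste computes, here is a minimal base R sketch for a single month. It is not the answer's code: the vector `may` is May's row from the data above typed out by hand, and the names `pairs` and `counts` are only illustrative. Each day is pasted to the day that follows it, and the resulting two-character strings are tabulated.

```r
# May's row from the data above, written out as a plain vector (31 days)
may <- c(rep(0, 13), 1, 1, 0, 1, rep(0, 8), 1, 0, 1, 1, 1, 0)

# Pair each day with the next one: day 1 with day 2, day 2 with day 3, ...
pairs <- paste0(head(may, -1), tail(may, -1))

# Tabulate all four possible pairs; the factor keeps zero counts visible
counts <- table(factor(pairs, levels = c("00", "01", "10", "11")))
counts
# 00 01 10 11
# 19  4  4  3
```

A 31-day row yields 30 pairs, which is why the three rows together sum to 90 in the output above.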
Or using tidyverse:
library(runner)
library(janitor)
library(dplyr)
library(tidyr)
df %>%
  rowwise %>%
  summarise(out = list(table(runner(c_across(starts_with('D')),
            f = function(x) paste(x, collapse=""), k = 2))), .groups = 'drop') %>%
  unnest_wider(c(out)) %>%
  adorn_totals()
#      0 00 01 10 11
#      1 19  4  4  3
#      1 16  6  5  3
#      1 21  4  3  2
#  Total 56 14 12  8
A base R option using gregexpr:
v <- do.call(paste0, df[-1])
rev(
stack(
sapply(
c("00", "01", "10", "11"),
function(x) sum(lengths(regmatches(v, gregexpr(x, v)))),
USE.NAMES = TRUE
)
)
)
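gives the output below. Note that gregexpr returns non-overlapping matches only, so a run such as 000 contributes a single 00, and the counts shown do not correspond to the updated data above.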
ind values
1 00 1
2 01 5
3 10 4
4 11 1
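Because `gregexpr` reports non-overlapping matches, a plain pattern such as `00` undercounts inside longer runs. One possible workaround, sketched here rather than taken from the answer, is a zero-width lookahead with `perl = TRUE`, which matches at every position where the pair starts. The input `v` below is a small made-up example, and `count_overlapping` is a hypothetical helper name.

```r
# Small made-up input: one string of 0/1 digits per row
v <- c("00011", "0100")

# Count every position where `pattern` starts, overlaps included.
# The lookahead (?=...) consumes no characters, so matches may overlap.
count_overlapping <- function(pattern, strings) {
  m <- gregexpr(paste0("(?=", pattern, ")"), strings, perl = TRUE)
  # gregexpr returns -1 when there is no match, so keep positive positions only
  sum(vapply(m, function(pos) sum(pos > 0), integer(1)))
}

counts <- vapply(c("00", "01", "10", "11"), count_overlapping,
                 integer(1), strings = v)
counts
# 00 01 10 11
#  3  2  1  1
```

With the plain pattern `00`, the same input would give 2 instead of 3, because the first string's `000` is matched only once.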
As you already recognized, the trouble is in the row-wise comparison, so we can reshape the data from wide format to long format.
Warning: because of the wide format, June has 31 days in your data, the same as May and July.
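library(dplyr)
library(tidyr)

# Long format: one row per month/day, in calendar order
reshape_df <- df %>%
  tidyr::pivot_longer(cols = D1:D31, names_to = "date", values_to = "value") %>%
  # index is 1 where the value differs from the previous day, else 0
  mutate(index = if_else(value != lag(value), 1, 0)) %>%
  replace_na(list(index = 0)) %>%
  # running count of changes gives a run id; runs continue across month boundaries
  mutate(index_group = cumsum(index))

# One row per run: where it starts, its value, and its length
reshape_df %>%
  group_by(index_group) %>%
  summarize(first_month = first(Year),
            first_date = first(date),
            first_value = first(value),
            length = n(),
            # assumed definition of the pattern column shown in the result
            pattern = if_else(first_value == 0, "0 -> 1", "1 -> 0"))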
Result for this data:
   index_group first_month first_date first_value length pattern
         <dbl> <chr>       <chr>            <dbl>  <int> <chr>
 1           0 May         D1                   0     13 0 -> 1
 2           1 May         D14                  1      2 1 -> 0
 3           2 May         D16                  0      1 0 -> 1
 4           3 May         D17                  1      1 1 -> 0
 5           4 May         D18                  0      8 0 -> 1
 6           5 May         D26                  1      1 1 -> 0
 7           6 May         D27                  0      1 0 -> 1
 8           7 May         D28                  1      3 1 -> 0
 9           8 May         D31                  0      5 0 -> 1
10           9 June        D5                   1      2 1 -> 0
# … with 18 more rows
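The same run-based idea can be sketched in base R with `rle`, applied to one month at a time so that a run cannot spill over into the next month (in the result above, the run starting at May D31 has length 5 because it continues into June). This is an illustration under assumptions, not the answer's code: `may` is May's row typed out by hand, and `from`, `to` and `transitions` are illustrative names.

```r
# May's row from the data above, written out as a plain vector (31 days)
may <- c(rep(0, 13), 1, 1, 0, 1, rep(0, 8), 1, 0, 1, 1, 1, 0)

# Run-length encoding: consecutive equal values collapse into one run
runs <- rle(may)
runs$lengths
# 13 2 1 1 8 1 1 3 1

# Each boundary between two runs is one transition
from <- head(runs$values, -1)
to   <- tail(runs$values, -1)
transitions <- table(paste(from, "->", to))
transitions
# 0 -> 1 1 -> 0
#      4      4
```

To cover every month, the same steps could be applied per row, for example with `apply(df[-1], 1, ...)`.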