Count number of values followed by another value in R

df <- data.frame(Year = c("May","June","July"), 
                 D1 = c(0,0,0), 
                 D2 = c(0,0,0), 
                 D3 = c(0,0,0), 
                 D4 = c(0,0,1), 
                 D5 = c(0,1,1),
                 D6 = c(0,1,1),
                 D7 = c(0,0,0),
                 D8 = c(0,0,0),
                 D9 = c(0,0,0),
                 D10 = c(0,0,0),
                 D11 = c(0,0,0),
                 D12 = c(0,0,0),
                 D13 = c(0,0,0), 
                 D14 = c(1,1,0), 
                 D15 = c(1,0,0),
                 D16 = c(0,1,0),
                 D17 = c(1,1,1),
                 D18 = c(0,0,0),
                 D19 = c(0,0,0),
                 D20 = c(0,0,0),
                 D21 = c(0,1,0),
                 D22 = c(0,0,0),
                 D23 = c(0,1,0), 
                 D24 = c(0,0,0), 
                 D25 = c(0,0,0),
                 D26 = c(1,0,0),
                 D27 = c(0,0,0),
                 D28 = c(1,0,1),
                 D29 = c(1,0,0),
                 D30 = c(1,1,0),
                 D31 = c(0,1,1)
                 )

I have a data frame (subset above) of months and days. I am trying to count the number of days on which a 1 is followed by a 0, a 0 is followed by a 1, and so on. For example, May would have 2 ones followed by zeros and 2 zeros followed by ones. I am thinking a for loop would be the best way to go about this, but I am having trouble since the comparisons run along rows.

Based on the updated data, we may need a rolling paste over each pair of consecutive days:

library(zoo)
out <- table(apply(df[-1], 1, function(x) rollapply(x, 2, paste, collapse="")))
out
#   00 01 10 11 
#   56 14 12  8 

sum(out)
#[1] 90

Or it can be made a bit more compact without the anonymous function call:

table(apply(df[-1], 1, rollapply, width = 2, paste, collapse=""))
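If per-month counts are wanted rather than one pooled table, the same pairwise comparison can be sketched in base R by lining up each day with the next one. This is only an illustration, not part of the answer above: `toy` is a hypothetical 6-day data frame in the same shape as `df`, and the column names `n00` to `n11` are made up for the example.

```r
# Hypothetical toy data in the same shape as df (6 days instead of 31)
toy <- data.frame(Year = c("May", "June"),
                  D1 = c(0, 1), D2 = c(0, 1), D3 = c(1, 0),
                  D4 = c(1, 0), D5 = c(0, 0), D6 = c(1, 1))

m    <- as.matrix(toy[-1])                 # days only, one row per month
prev <- m[, -ncol(m), drop = FALSE]        # day i
nxt  <- m[, -1, drop = FALSE]              # day i + 1

# Count each kind of consecutive pair within a row (month)
per_month <- data.frame(
  Month = toy$Year,
  n00 = rowSums(prev == 0 & nxt == 0),
  n01 = rowSums(prev == 0 & nxt == 1),
  n10 = rowSums(prev == 1 & nxt == 0),
  n11 = rowSums(prev == 1 & nxt == 1)
)
per_month
```

Each row should sum to the number of days minus one, which is a quick sanity check when swapping `toy` for the real `df`.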

Or using tidyverse:

library(runner)
library(janitor)
library(dplyr)
library(tidyr)

df %>% 
    rowwise %>%
    summarise(out = list(table(runner(c_across(starts_with('D')),
          f = function(x) paste(x, collapse=""), k = 2))), .groups = 'drop') %>%
    unnest_wider(c(out))  %>%
    adorn_totals() 
#     0 00 01 10 11
#     1 19  4  4  3
#     1 16  6  5  3
#     1 21  4  3  2
# Total 56 14 12  8

A base R option using gregexpr:

v <- do.call(paste0, df[-1])
rev(
  stack(
    sapply(
      c("00", "01", "10", "11"),
      function(x) sum(lengths(regmatches(v, gregexpr(x, v)))),
      USE.NAMES = TRUE
    )
  )
)

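gives the following (this output does not match the df shown above and appears to predate the updated data; note also that gregexpr returns non-overlapping matches, so a run such as 000 counts as a single 00):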

  ind values
1  00      1
2  01      5
3  10      4
4  11      1
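Because `gregexpr` reports non-overlapping matches, pairs inside longer runs are undercounted by a plain pattern such as `"00"`. One possible workaround, sketched here as an assumption rather than as the answerer's method, is to wrap each pattern in a zero-width lookahead (`perl = TRUE`) so that every starting position is tested. `toy` is a hypothetical 6-day data frame in the same shape as `df`.

```r
# Hypothetical toy data in the same shape as df (6 days instead of 31)
toy <- data.frame(Year = c("May", "June"),
                  D1 = c(0, 1), D2 = c(0, 1), D3 = c(1, 0),
                  D4 = c(1, 0), D5 = c(0, 0), D6 = c(1, 1))

v <- do.call(paste0, toy[-1])   # one string per month: "001101", "110001"
patterns <- c("00", "01", "10", "11")

# Lookahead matches are zero-width, so overlapping pairs are all found.
# gregexpr returns -1 when nothing matches, hence the h > 0 filter.
counts <- sapply(patterns, function(p) {
  hits <- gregexpr(paste0("(?=", p, ")"), v, perl = TRUE)
  sum(sapply(hits, function(h) sum(h > 0)))
})
counts
```

With this variant the counts should add up to the number of months times the number of days minus one, which the plain patterns do not guarantee.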

As you already recognized, the difficulty is that the comparison runs along rows. One way around it is to reshape the data from wide format to long format.

Warning: because of the wide format, June has 31 days in your data, just like May and July.

library(dplyr)
library(tidyr)

reshape_df <- df %>%
  tidyr::pivot_longer(cols = D1:D31, names_to = "date", values_to = "value") %>%
  mutate(index = if_else(value != lag(value), 1, 0)) %>%
  replace_na(list(index = 0)) %>%
  mutate(index_group = cumsum(index))

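reshape_df %>%
  group_by(index_group) %>%
  summarize(first_month = first(Year),
            first_date = first(date),
            first_value = first(value),
            length = n(),
            # pattern is not in the original code; reconstructed to match the output shown below
            pattern = if_else(first_value == 0, "0 -> 1", "1 -> 0"))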

Result for this data:

   index_group first_month first_date first_value length pattern
         <dbl> <chr>       <chr>            <dbl>  <int> <chr>  
 1           0 May         D1                   0     13 0 -> 1 
 2           1 May         D14                  1      2 1 -> 0 
 3           2 May         D16                  0      1 0 -> 1 
 4           3 May         D17                  1      1 1 -> 0 
 5           4 May         D18                  0      8 0 -> 1 
 6           5 May         D26                  1      1 1 -> 0 
 7           6 May         D27                  0      1 0 -> 1 
 8           7 May         D28                  1      3 1 -> 0 
 9           8 May         D31                  0      5 0 -> 1 
10           9 June        D5                   1      2 1 -> 0 
# … with 18 more rows
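The run table above lets a run continue from one month into the next (the run starting at May D31 has length 5). If runs should instead be cut at month boundaries, a base R sketch with `rle()` applied to each row is one option. This is an assumption about what may be wanted, not the answerer's code, and `toy` is a hypothetical 6-day data frame in the same shape as `df`.

```r
# Hypothetical toy data in the same shape as df (6 days instead of 31)
toy <- data.frame(Year = c("May", "June"),
                  D1 = c(0, 1), D2 = c(0, 1), D3 = c(1, 0),
                  D4 = c(1, 0), D5 = c(0, 0), D6 = c(1, 1))

# Run-length encode each month separately, then stack the results
runs <- lapply(seq_len(nrow(toy)), function(i) {
  r <- rle(unlist(toy[i, -1], use.names = FALSE))
  data.frame(month = toy$Year[i], value = r$values, length = r$lengths)
})
runs <- do.call(rbind, runs)
runs

# Number of value changes per month = number of runs minus one
changes <- table(runs$month) - 1
changes
```

Within a month, every boundary between two runs is exactly one 0 -> 1 or 1 -> 0 change, so the run count gives the total number of switches directly.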
