
How to find the minimum and maximum number of intervening items between consecutive occurrences of values in a data frame column

I have a factor vector called Categories with 47 levels:

Categories = as.factor(sort(make.unique(rep(letters, length.out = 47), sep='')))

[1] a  a1 b  b1 c  c1 d  d1 e  e1 f  f1 g  g1 h  h1 i  i1 j  j1 k  k1 l  l1 m  m1 n  n1 o  o1 p  p1 q  q1 r  r1 s  s1 t 
[40] t1 u  u1 v  w  x  y  z 
47 Levels: a a1 b b1 c c1 d d1 e e1 f f1 g g1 h h1 i i1 j j1 k k1 l l1 m m1 n n1 o o1 p p1 q q1 r r1 s s1 t t1 u u1 ... z
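The 47 levels come from two steps inside the Categories expression. A minimal base R sketch of those steps (the names x and u are illustrative only):

```r
# rep() recycles the 26 letters until there are 47 values: a..z, then a..u
x <- rep(letters, length.out = 47)
# make.unique() leaves the first copy of each value alone and appends a
# counter to later copies; sep = "" turns the second "a" into "a1"
u <- make.unique(x, sep = "")
u[c(1, 26, 27, 47)]  # "a" "z" "a1" "u1"
```

Sorting then places each letter next to its suffixed copy, which gives the a a1 b b1 ... u u1 v w x y z order printed above.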

I have another vector called cat with 9 of those levels:

cat = Categories[c(7,42,43,24,45,26,35,6,15)]
[1] d  u1 v  l1 x  m1 r  c1 h 
47 Levels: a a1 b b1 c c1 d d1 e e1 f f1 g g1 h h1 i i1 j j1 k k1 l l1 m m1 n n1 o o1 p p1 q q1 r r1 s s1 t t1 u u1 ... z
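Note that cat still reports 47 levels even though it holds only 9 values. A short sketch of why, and of how to drop the unused levels if needed:

```r
Categories <- as.factor(sort(make.unique(rep(letters, length.out = 47), sep = "")))
cat <- Categories[c(7, 42, 43, 24, 45, 26, 35, 6, 15)]
# Subsetting a factor keeps the full set of levels, which is why the
# printout above still says "47 Levels"
nlevels(cat)              # 47
# droplevels() keeps only the levels that actually occur
nlevels(droplevels(cat))  # 9
```

This matters for base R grouping: split() and tapply() create a group for every level, including empty ones, whereas the dplyr summaries in the answers below list only the 9 values that occur.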

I also have a data frame My_Data with 36 rows. One of its columns, cat, contains multiple occurrences of each value from cat:

My_Data = data.frame(name = make.unique(rep(c(1:10,LETTERS), length.out = 36), sep=''), cat = sample(rep(cat,4),36,replace = FALSE), position = 0)

    name cat position
1     1   v        0
2     2   r        0
3     3   h        0
4     4  m1        0
5     5   h        0
6     6  u1        0
7     7  l1        0
8     8   h        0
9     9  u1        0
10   10   x        0
11    A   x        0
12    B   v        0
13    C   d        0
14    D  c1        0
15    E   r        0
16    F   v        0
17    G  l1        0
18    H   d        0
19    I  l1        0
20    J  c1        0
21    K  u1        0
22    L   x        0
23    M   v        0
24    N   d        0
25    O  l1        0
26    P  m1        0
27    Q   r        0
28    R  m1        0
29    S   h        0
30    T  m1        0
31    U  c1        0
32    V   d        0
33    W   r        0
34    X   x        0
35    Y  c1        0
36    Z  u1        0

Using the code below, I can populate the position column to show how many times each value in the cat column has occurred so far:

transform(My_Data, position = ave(as.character(cat), cat, FUN = seq_along))

The first 15 rows of My_Data then look like this:

    name cat position
1     1   v        1
2     2   r        1
3     3   h        1
4     4  m1        1
5     5   h        2
6     6  u1        1
7     7  l1        1
8     8   h        3
9     9  u1        2
10   10   x        1
11    A   x        2
12    B   v        2
13    C   d        1
14    D  c1        1
15    E   r        2
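The ave() call works by grouping, numbering each group with seq_along(), and writing the numbers back in row order. A small sketch on the first eight values of cat from the table above; one side effect worth knowing is that ave() returns the same type as its first argument, so the as.character(cat) version yields a character position column:

```r
# The first eight values of the cat column from the table above
x <- c("v", "r", "h", "m1", "h", "u1", "l1", "h")
# ave() splits x by group, applies seq_along() to each group (1, 2, 3, ...)
# and writes the results back in the original row order. The result has the
# same type as the first argument, so a character input gives "1", "2", ...
pos_chr <- ave(x, x, FUN = seq_along)  # "1" "1" "1" "1" "2" "1" "1" "3"
# Passing an integer vector as the first argument keeps the result integer
pos_int <- ave(seq_along(x), x, FUN = seq_along)
pos_int  # 1 1 1 1 2 1 1 3
```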

Now I want to calculate the minimum and maximum number of intervening items between any two consecutive occurrences of the same value in the cat column.

How can I do this?
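To make "intervening items" concrete: in the first 15 rows above, h occurs at rows 3, 5 and 8, so one item lies between its first two occurrences (row 4) and two items between the second and third (rows 6 and 7). In other words, the count is the difference in row numbers minus 1. A minimal base R check of that arithmetic (it uses only these 15 rows, so the numbers differ from the full 36-row data):

```r
# The first 15 values of the cat column, copied from the table above
x <- c("v", "r", "h", "m1", "h", "u1", "l1", "h", "u1", "x",
       "x", "v", "d", "c1", "r")
# Rows where "h" occurs
r <- which(x == "h")   # 3 5 8
# Items strictly between consecutive occurrences: gap in row numbers minus 1
gaps <- diff(r) - 1    # 1 2
c(min = min(gaps), max = max(gaps))
```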

If I understand your question, here's one option:

library(tidyverse)

# Data
Categories = as.factor(sort(make.unique(rep(letters, length.out = 47), sep='')))  
cat = Categories[c(7,42,43,24,45,26,35,6,15)]
# Set a seed for reproducibility
set.seed(5)
My_Data = data.frame(name = make.unique(rep(c(1:10,LETTERS), length.out = 36), sep=''), 
                     cat = sample(rep(cat,4),36,replace = FALSE), 
                     position = 0)

The code below summarises the data to give the minimum and maximum number of intervening rows for each level of cat.

# Summarise to give min and max number rows between each occurrence
My_Data %>%
  mutate(row=1:n()) %>% 
  group_by(cat) %>% 
  summarise(min.diff=min(diff(row)-1, na.rm=TRUE),
            max.diff=max(diff(row)-1, na.rm=TRUE))
     cat min.diff max.diff
  <fctr>    <dbl>    <dbl>
1     c1        4        6
2      d        1       16
3      h        1       16
4     l1        0       13
5     m1        0       12
6      r        5       15
7     u1        2        7
8      v        1       16
9      x        6       12
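For comparison, a base R sketch of the same per-value summary, written as a hypothetical helper intervening_range() and run on the 15 rows shown in the question. One difference is deliberate: a value that occurs only once has no gaps, and min()/max() on an empty vector return Inf/-Inf with a warning, so the helper reports NA instead. In this answer's data every value occurs exactly four times (rep(cat, 4) gives 36 values), so the dplyr code above never hits that case.

```r
# Hypothetical helper: x is a character or factor vector of labels in row order
intervening_range <- function(x) {
  x <- droplevels(as.factor(x))                  # ignore levels that never occur
  rows <- split(seq_along(x), x)                 # row numbers for each value
  gaps <- lapply(rows, function(r) diff(r) - 1)  # items between consecutive hits
  # A value seen only once has no gaps: return NA rather than Inf/-Inf
  safe <- function(f) function(g) if (length(g)) f(g) else NA_real_
  data.frame(
    cat = names(gaps),
    min.diff = vapply(gaps, safe(min), numeric(1)),
    max.diff = vapply(gaps, safe(max), numeric(1)),
    row.names = NULL
  )
}

x <- c("v", "r", "h", "m1", "h", "u1", "l1", "h", "u1", "x",
       "x", "v", "d", "c1", "r")
intervening_range(x)
```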

If you want to mark the number of intervening rows in the original data frame, the code below adds a column giving the number of intervening rows since the previous occurrence of the same level of cat (NA for a first occurrence).

# Add column with intervening number of rows between each occurrence in cat
My_Data %>%
  mutate(row=1:n()) %>% 
  group_by(cat) %>% 
  mutate(diff=c(NA,diff(row)-1)) %>%
  select(-row)
     name    cat position  diff
   <fctr> <fctr>    <dbl> <dbl>
 1      1     c1        0    NA
 2      2     m1        0    NA
 3      3      x        0    NA
 4      4      d        0    NA
 5      5     l1        0    NA
 6      6     l1        0     0
 7      7      r        0    NA
 8      8     c1        0     6
 9      9      h        0    NA
10     10      v        0    NA
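The same per-row column can also be built in base R with ave(), the function the question already uses for position. A minimal sketch on the 15 rows shown in the question (diff_col is an illustrative name):

```r
x <- c("v", "r", "h", "m1", "h", "u1", "l1", "h", "u1", "x",
       "x", "v", "d", "c1", "r")
# For each value: NA at its first occurrence, then row gaps minus 1.
# ave() writes each group's results back at that group's rows.
diff_col <- ave(seq_along(x), x, FUN = function(r) c(NA, diff(r) - 1))
data.frame(cat = x, diff = diff_col)
```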

Here is a tidy solution using lag():

library(tidyverse)

# create data frame
set.seed(1)
Categories <- as.factor(sort(make.unique(rep(letters, length.out = 47), sep='')))
cat <- Categories[c(7,42,43,24,45,26,35,6,15)]
My_Data <- data.frame(
  name = make.unique(rep(c(1:10,LETTERS), length.out = 36), sep=''), 
  cat = sample(rep(cat,4),36,replace = FALSE), 
  position = 0
)

# solution
My_Data %>%
  mutate(row = 1:n()) %>%
  group_by(cat) %>%
  mutate(inter = row - lag(row) - 1) %>%
  summarize(min_inter = min(inter, na.rm = TRUE), max_inter = max(inter, na.rm = TRUE))

Result:

# A tibble: 9 x 3
     cat min_inter max_inter
  <fctr>     <dbl>     <dbl>
1     c1         0        10
2      d         4        11
3      h         0         8
4     l1         0         6
5     m1         1         3
6      r         0        16
7     u1         2         5
8      v         1        23
9      x         6        15
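Both answers compute the same quantity; their result tables differ because they use different seeds (set.seed(5) versus set.seed(1)) and therefore different random data. A small sketch showing that row - lag(row) - 1 and c(NA, diff(row) - 1) agree, using c(NA, head(r, -1)) as a base R stand-in for dplyr::lag() so it runs without the tidyverse:

```r
# Rows at which one value occurs (e.g. "h" in the 15 rows from the question)
r <- c(3L, 5L, 8L)
# lag() shifts r down by one and fills the first slot with NA;
# c(NA, head(r, -1)) performs the same shift in base R
prev <- c(NA, head(r, -1))
via_lag  <- r - prev - 1        # NA 1 2
via_diff <- c(NA, diff(r) - 1)  # NA 1 2
isTRUE(all.equal(via_lag, via_diff))  # TRUE
```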

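Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.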

粤ICP备18138465号  © 2020-2024 STACKOOM.COM