
Count occurrences of a specific value per column when the count exceeds a given frequency in R

I'm trying to get the frequency of a value in each column of a data frame, but only where that frequency is above a certain number. My data has multiple columns, and I want code that reports how many times "0" occurs in each column, keeping only the columns where "0" occurs more than 3 times.

My dataset is like this:

a   b   c   d   e   f   g   h 
0   1   0   1   1   1   1   1
2   0   0   0   0   0   0   0
0   1   2   2   2   1   0   1
0   0   0   0   1   0   0   0
1   0   2   1   1   0   0   0
1   1   0   0   1   0   0   0
0   1   2   2   2   2   2   2

The output I need is:
Variable     Frequency
a            4 
c            4 
f            4
g            5
h            4

So this shows the number of "0" values in each column of the data frame, for the columns where that number is greater than 3.

Thank you.

You can use colSums to count the number of 0s in each column, then subset the counts that are greater than 3.

subset(stack(colSums(df == 0, na.rm = TRUE)), values > 3)
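The one-liner above can be unpacked into separate steps, which may make it easier to see what each function contributes. This is only a sketch: the small `df` here is an illustrative three-column subset of the question's data, and the final renaming to `Variable` / `Frequency` is an optional addition to match the headers requested in the question.

```r
# Illustrative data: columns a, b and g from the question
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  g = c(1, 0, 0, 0, 0, 0, 2)
)

# Step 1: logical matrix, TRUE wherever a cell equals 0
is_zero <- df == 0

# Step 2: summing TRUEs per column gives the number of zeros per column
zero_counts <- colSums(is_zero, na.rm = TRUE)

# Step 3: turn the named vector into a data frame with columns `values` and `ind`
counts_df <- stack(zero_counts)

# Step 4: keep only the columns with more than 3 zeros
result <- subset(counts_df, values > 3)

# Optional: reorder and rename to match the requested headers
result <- setNames(result[, c("ind", "values")], c("Variable", "Frequency"))
print(result)
```

Note that `stack()` returns the column names in a factor column called `ind`, so convert it with `as.character()` if plain strings are needed.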

A tidyverse way would be:

library(dplyr)

df %>%
  summarise(across(everything(), ~ sum(.x == 0, na.rm = TRUE))) %>%
  tidyr::pivot_longer(cols = everything()) %>%
  filter(value > 3)

#  name  value
#  <chr> <int>
#1 a         4
#2 c         4
#3 f         4
#4 g         5
#5 h         4
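If the value to count or the threshold may change, the same idea can be wrapped in a small function. The sketch below uses base R only; `count_value_over`, `value` and `min_count` are hypothetical names chosen for illustration, not part of any package, and the `df` is again an illustrative subset of the question's data.

```r
# Hypothetical helper: count how often `value` occurs in each column and
# keep only the columns where that count is strictly greater than `min_count`.
count_value_over <- function(data, value = 0, min_count = 3) {
  counts <- colSums(data == value, na.rm = TRUE)  # occurrences per column
  keep <- counts > min_count                      # columns above the threshold
  data.frame(
    Variable = names(counts)[keep],
    Frequency = unname(counts[keep]),
    row.names = NULL
  )
}

# Illustrative data: columns a, b and g from the question
df <- data.frame(
  a = c(0, 2, 0, 0, 1, 1, 0),
  b = c(1, 0, 1, 0, 0, 1, 1),
  g = c(1, 0, 0, 0, 0, 0, 2)
)

zeros <- count_value_over(df)                            # zeros occurring more than 3 times
ones  <- count_value_over(df, value = 1, min_count = 2)  # ones occurring more than 2 times
print(zeros)
print(ones)
```

The defaults reproduce the case in the question (count 0, keep counts above 3), while the second call shows the same function applied to a different value and threshold.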


Data used in the examples:

df <- structure(list(a = c(0L, 2L, 0L, 0L, 1L, 1L, 0L), b = c(1L, 0L, 
1L, 0L, 0L, 1L, 1L), c = c(0L, 0L, 2L, 0L, 2L, 0L, 2L), d = c(1L, 
0L, 2L, 0L, 1L, 0L, 2L), e = c(1L, 0L, 2L, 1L, 1L, 1L, 2L), f = c(1L, 
0L, 1L, 0L, 0L, 0L, 2L), g = c(1L, 0L, 0L, 0L, 0L, 0L, 2L), h = c(1L, 
0L, 1L, 0L, 0L, 0L, 2L)), class = "data.frame", row.names = c(NA, -7L))
