
Count the number of times each value appears in a row of a data frame in R

I have the following data frame (79,000 rows):

ID       P1      P2      P3      P4        P5        P6      P7     P8  
1       38005   28002   38005   38005    28002    34002      NA     NA
2       28002   28002   28002   38005    28002    NA         NA     NA

I want to count the number of times each number (code) appears in each row of the data frame, so that the output looks something like this:

38005 appears 3   28002 appears 2    34002 appears 1     NA appears 2
28002 appears 4   38005 appears 1    NA appears 3

So far I have tried to find the most frequent number (code):

df$frequency <- apply(df[-1], 1, function(x) names(which.max(table(x))))  # df[-1] leaves the ID column out of the count

But I don't know how to count the number of times each number (code) appears in a row.

Using tidyverse and reshape2 you can do:

library(tidyverse)
library(reshape2)

df %>%
 gather(var, val, -ID) %>% # Transforming the data from wide to long format
 group_by(val, ID) %>% # Grouping by value and ID
 summarise(count = n()) %>% # Counting the rows in each group
 dcast(ID~val, value.var = "count") # Reshaping the data back to wide format

  ID 28002 34002 38005 NA
1  1     2     1     3  2
2  2     4    NA     1  3

To show, for each ID, only the two non-NA values with the highest counts:

df %>%
 gather(var, val, -ID) %>% #Transforming the data from wide to long format
 group_by(val, ID) %>% #Grouping
 mutate(temp = n()) %>% #Performing the count
 group_by(ID) %>% #Grouping
 mutate(temp2 = dense_rank(temp)) %>% #Creating the rank based on count
 group_by(ID, val) %>% #Grouping
 summarise(temp3 = first(temp2), #Summarising 
           temp = first(temp)) %>%
 arrange(ID, desc(temp3)) %>% #Arranging
 na.omit() %>% #Deleting the rows with NA
 group_by(ID) %>%
 mutate(temp4 = ifelse(temp3 == first(temp3) | temp3 == nth(temp3, 2), 1, 0)) %>% #Identifying the highest and the second highest count
 filter(temp4 == 1) %>% #Selecting the highest and the second highest count
 dcast(ID~val, value.var = "temp") #Reshaping the data

  ID 28002 38005
1  1     2     3
2  2     4     1

ID <- c("P1","P2","P3","P4","P5","P6","P7","P8","P1","P2","P3","P4","P5","P6","P7","P8","P1")
count <-c("38005","28002","38005","38005","28002","34002","NA","NA","2","28002","28002","28002","38005","28002","NA","NA","NA")

df<- cbind.data.frame(ID,count)

table(df$count)

Use this code to find out the count.

I think you're looking for this.

sort(table(unlist(df1[-1])), decreasing=TRUE)
# 31002 38005 24003 34002 28002 
# 13222 13193 13019 13018 12625 

That is, you exclude column 1, which contains the IDs, and unlist() the rest of your data frame into a vector. table() then counts the occurrences of each value, and you can sort() the result. Set the option decreasing=TRUE and the first two values are the two most frequent ones.

If the output gets too long because there are many distinct values, you can wrap the code in head(). By default head() returns the first six elements, but you can limit it to two by specifying n=2, which gives you exactly what you want. No need for any packages.

head(sort(table(unlist(df1[-1])), decreasing=TRUE), n=2)
# 31002 38005 
# 13222 13193
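
As a small check of the unlist() / table() / sort() chain, here is a minimal sketch on the two-row sample from the question. The data frame is typed in by hand, and useNA = "ifany" is an addition that is not in the answer above: by default table() drops NA, so this option is only needed if the NA count should be reported as well.

```r
# Two-row sample from the question, typed in by hand (an assumption
# about how the real 79,000-row data frame is laid out)
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

# Drop the ID column, flatten the rest into one vector, count every
# distinct value (including NA), and sort the counts in decreasing order
overall <- sort(table(unlist(df[-1]), useNA = "ifany"), decreasing = TRUE)
print(overall)
```

On this sample, 28002 is counted 6 times, NA 5 times, 38005 4 times and 34002 once. Note that these are counts over the whole data frame, not per row.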

DATA:

set.seed(42)  # for sake of reproducibility
df1 <- data.frame(id=1:9750,
                  matrix(sample(c(38005, 28002, 34002, NA, 24003, 31002), 7.8e4, 
                                replace=TRUE), nrow=9750,
                         dimnames=list(NULL, paste0("P", 1:8))))
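
Since the question asks about counts within each row, the same table() / sort() / head() chain can also be applied to every row separately. The following is only a sketch under assumptions: it uses the two-row sample from the question (typed in by hand) rather than the simulated data, and the names m, rows and top2 are made up for illustration.

```r
# Two-row sample from the question, typed in by hand
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

m <- as.matrix(df[-1])      # numeric matrix holding only the P columns
rows <- split(m, row(m))    # list with one vector of values per row
names(rows) <- df$ID        # label each list element with its ID

# For every row: count the non-NA values, sort the counts in
# decreasing order, and keep the two most frequent values
top2 <- lapply(rows, function(x) head(sort(table(x), decreasing = TRUE), n = 2))
print(top2)
```

For ID 1 this keeps 38005 (3 times) and 28002 (2 times); for ID 2 it keeps 28002 (4 times) and 38005 (once). With 79,000 rows a list of small tables may be slower than a reshape-based approach, so this is best treated as a readable baseline.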

data.table solution

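library(data.table)

# read sample data
dt <- fread("ID       P1      P2      P3      P4        P5        P6      P7     P8
1       38005   28002   38005   38005    28002    34002      NA     NA
2       28002   28002   28002   38005    28002    NA         NA     NA")
# melt to long format, keeping the NA values
dt.melt <- melt(dt, id.vars = "ID", measure.vars = patterns("^P"), na.rm = FALSE)
# cast back to wide format, counting how often each value occurs per ID
dcast(dt.melt, ID ~ value, fun.aggregate = length, fill = 0)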

#    ID 28002 34002 38005 NA
# 1:  1     2     1     3  2
# 2:  2     4     0     1  3
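
The wide count table above can be cross-checked without any packages. This is a minimal base R sketch, not part of the data.table answer: it assumes the two-row sample from the question (typed in by hand) and cross-tabulates each cell's row ID against its value; the names m and counts are illustrative.

```r
# Two-row sample from the question, typed in by hand
df <- data.frame(
  ID = 1:2,
  P1 = c(38005, 28002), P2 = c(28002, 28002), P3 = c(38005, 28002),
  P4 = c(38005, 38005), P5 = c(28002, 28002), P6 = c(34002, NA),
  P7 = c(NA_real_, NA_real_), P8 = c(NA_real_, NA_real_)
)

m <- as.matrix(df[-1])   # numeric matrix holding only the P columns

# row(m) gives the row index of every cell, so df$ID[row(m)] is the ID
# belonging to each value; table() then counts every ID/value pair.
# useNA = "ifany" adds a column for NA instead of silently dropping it.
counts <- table(ID = df$ID[row(m)], value = m, useNA = "ifany")
print(counts)
```

Each row of counts sums to 8, the number of P columns, and combinations that never occur (such as 34002 for ID 2) show up as 0.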

