
Loop through data frame and get counts. Output to other data frame

I have a 30 * 9 data frame filled with integers 1-9. Each integer can appear multiple times in a column, or not at all.

I want to count how many times each number appears in each column, producing a column of 9 counts for each column of the original data frame, and so end up with a 9 * 9 data frame of counts. A 0 should be placed where a number does not appear in a particular column.

So far I have tried multiple approaches with for loops, tapply, functions, etc., but I cannot seem to get a result that can be stored directly in a new data frame from inside a loop.

for (i in seq_along(columnHeaderQuosureList)) {
  original_data_frame %>%
    group_by(!! columnHeaderQuosureList[[i]]) %>% # Unquote with !!
    count(!! columnHeaderQuosureList[[i]]) %>%
    print()
}

This works and prints the counts for each column. I tried replacing print() with return() and then using cbind to attach the returned output to result_data_frame. Unfortunately I am getting nowhere, and I do not think my approach is feasible.

Does anyone have any better ideas, please?

The function to count instances of unique values in R is table.

# simulated data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))

# use stack to turn column name into a factor column (long format)
table(stack(df))

      ind
values V1 V2 V3 V4 V5 V6 V7 V8 V9
     1  4  0  1  2  6  3  6  3  2
     2  2  2  5  4  4  4  3  3  2
     3  4  4  3  7  2  7  1  4  2
     4  1  5  1  3  3  5  6  3  5
     5  3  4  4  4  4  2  2  3  6
     6  7  5  3  2  3  0  1  3  3
     7  4  5  3  3  2  1  3  3  1
     8  3  2  3  3  5  2  6  4  3
     9  2  3  7  2  1  6  2  4  6

Edit: forgot the tricky bit. The output of table is a table object, which looks like a matrix but gets turned into long format if you call as.data.frame on it. To turn your result into a 9x9 data frame, use

as.data.frame.matrix(table(stack(df)))
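To make the difference between the two conversions concrete, here is a minimal sketch on a tiny made-up data frame (df_small is a hypothetical example, not the asker's data):

```r
# Small deterministic data frame (hypothetical example data)
df_small <- data.frame(V1 = c(1, 1, 2), V2 = c(2, 3, 3))

tab <- table(stack(df_small))

# Default method: one row per (values, ind) combination plus a Freq column
long <- as.data.frame(tab)
dim(long)

# Matrix method: keeps the cross-tabulated shape, one column per original column
wide <- as.data.frame.matrix(tab)
wide
```

Here long should have 6 rows (3 values times 2 columns), while wide keeps the 3 x 2 layout with the values as row names.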

Caveat: if for some reason one of the 9 digits doesn't appear anywhere in the original df, then that row will be skipped (instead of being filled with 0s).
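One possible way around this caveat, offered as a sketch rather than part of the original answer, is to convert the stacked values to a factor with all nine levels declared up front, so that table keeps a row for digits that never occur. The data frame df_gap below is hypothetical and deliberately omits most digits:

```r
# Hypothetical data in which only the digits 1 and 3 occur
df_gap <- data.frame(V1 = c(1, 3, 3), V2 = c(1, 1, 3))

long <- stack(df_gap)

# Declare every expected digit as a factor level so absent ones still get a row
long$values <- factor(long$values, levels = 1:9)

counts <- as.data.frame.matrix(table(long))
counts
```

With the levels fixed, counts should always have 9 rows, and the rows for digits that are absent contain only zeros.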

This is very similar to the behaviour of tabulate(), so you can do:

#Create example data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))

#Counts of digits 1-9
as.data.frame(sapply(df, tabulate, nbins=9))
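As a small illustration of why this produces zero-filled rows, the sketch below applies tabulate to a short vector and then to a hypothetical two-column data frame (df_small is made up for the example). Position k of the result holds the count of the integer k, and, as far as I know, values outside 1..nbins are silently dropped, so this approach assumes the data really are positive integers:

```r
# Position k of the result is the number of times k occurs
x <- c(1, 3, 3, 9)
bins <- tabulate(x, nbins = 9)
bins

# Applied column by column, every column yields a vector of length nbins,
# so sapply can bind the results into a 9-row matrix
df_small <- data.frame(V1 = c(1, 1, 2), V2 = c(2, 9, 9))
counts <- as.data.frame(sapply(df_small, tabulate, nbins = 9))
counts
```

Because every column produces a vector of the same length, digits that never appear anywhere still get a row of zeros here.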

Here's a tidyverse solution that is robust with respect to the number of columns and the number of distinct values they contain. I avoid recourse to loops and non-standard evaluation by making the data tidy by pivoting.

First, create some test data:

library(tidyverse)

# For reproducibility
set.seed(123)

d <- tibble(
  c1=floor(runif(30, 1, 10)),
  c2=floor(runif(30, 1, 10)),
  c3=floor(runif(30, 1, 10)),
  c4=floor(runif(30, 1, 10)),
  c5=floor(runif(30, 1, 10)),
  c6=floor(runif(30, 1, 10)),
  c7=floor(runif(30, 1, 10)),
  c8=floor(runif(30, 1, 10)),
  c9=floor(runif(30, 1, 10))
)
d
# A tibble: 30 × 9
      c1    c2    c3    c4    c5    c6    c7    c8    c9
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     3     9     6     2     6     8     8     5     5
 2     8     9     1     6     3     5     3     3     6
 3     4     7     4     4     3     4     7     2     2
 4     8     8     3     6     2     3     3     7     6
 5     9     1     8     3     4     1     6     1     3
 6     1     5     5     2     9     4     5     7     7
 7     5     7     8     8     2     6     3     4     4
 8     9     2     8     1     1     2     6     4     9
 9     5     3     8     5     2     5     9     8     9
10     5     3     4     5     7     2     9     9     7
# … with 20 more rows

Now solve the problem:

d %>% 
  pivot_longer(
    everything(),
    names_to="Column", 
    values_to="Value"
  ) %>% 
  group_by(Column, Value) %>% 
  summarise(N=n(), .groups="drop") %>% 
  pivot_wider(
    names_from=Column,
    values_from=N,
    id_cols=Value,
    values_fill=0
  )
# A tibble: 9 × 10
  Value    c1    c2    c3    c4    c5    c6    c7    c8    c9
  <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     2     3     2     3     1     0     3     1
2     2     1     7     3     4     4     4     3     3     3
3     3     4     4     2     3     6     2     7     3     3
4     4     1     5     6     3     3     7     3     5     4
5     5     5     2     2     5     1     5     3     3     5
6     6     4     1     3     5     3     5     6     2     4
7     7     3     3     3     1     4     3     1     5     3
8     8     2     3     6     1     3     3     2     3     3
9     9     7     3     2     6     3     0     5     3     4
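Note that values_fill=0 only fills in combinations for values that occur in at least one column; a digit missing from every column would still have no row. If a fixed set of rows is needed, one option (a sketch, assuming the expected values are known to be 1 to 9, and using the hypothetical tibble d_gap) is to add tidyr's complete() before widening:

```r
library(dplyr)
library(tidyr)

# Hypothetical data in which only the digits 1 and 3 occur
d_gap <- tibble(c1 = c(1, 3, 3), c2 = c(1, 1, 3))

result <- d_gap %>%
  pivot_longer(everything(), names_to = "Column", values_to = "Value") %>%
  count(Column, Value, name = "N") %>%
  # Add a zero-count row for every Column / Value pair that never occurred
  complete(Column, Value = 1:9, fill = list(N = 0L)) %>%
  pivot_wider(names_from = Column, values_from = N)

result
```

Here count(Column, Value, name = "N") is shorthand for the group_by and summarise steps used above.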
