Loop through data frame and get counts. Output to other data frame
I have a 30 * 9 data frame filled with integers 1-9. Each integer can appear multiple times in a column, or not at all.
I want to count how many times each number appears in each column, producing a column of 9 counts for each column of the original data frame, and so end up with a 9 * 9 data frame of counts. Where a number does not appear in a particular column, I want a 0.
So far I have tried multiple approaches with for loops, tapply, functions, etc., but I cannot get a result that can be stored directly into a new data frame from within a loop.
for (i in seq_along(columnHeaderQuosureList)) {
  original_data_frame %>%
    group_by(!! columnHeaderQuosureList[[i]]) %>% # Unquote with !!
    count(!! columnHeaderQuosureList[[i]]) %>%
    print()
}
This works and prints the counts for each column. I tried replacing print() with return() and then using cbind to attach the returned output to result_data_frame. Unfortunately I am getting nowhere, and I do not think my approach is feasible.
Does anyone have any better ideas, please?
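For reference, a loop can fill a result directly if the result is preallocated and each count is assigned by index, rather than returned. Calling return() outside a function does not hand a value back to the loop, which is probably why that attempt stalled. The following is only a minimal base-R sketch: the simulated original_data_frame, the digits vector, and the result matrix are assumptions made for illustration, not part of the original post.

```r
# Simulated stand-in for the 30 * 9 data frame described above (an assumption)
set.seed(1)
original_data_frame <- as.data.frame(
  matrix(sample(1:9, 30 * 9, replace = TRUE), ncol = 9)
)

digits <- 1:9

# Preallocate a 9 * 9 matrix of zeros: one row per digit, one column per input column
result <- matrix(
  0L,
  nrow = length(digits),
  ncol = ncol(original_data_frame),
  dimnames = list(as.character(digits), names(original_data_frame))
)

# Fill the matrix cell by cell; digits that never occur simply stay at 0
for (j in seq_along(original_data_frame)) {
  column <- original_data_frame[[j]]
  for (k in digits) {
    result[k, j] <- sum(column == k)
  }
}

result_data_frame <- as.data.frame(result)
```

Because every cell starts at 0, no special handling is needed for digits that are missing from a column.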
The function to count instances of unique values in R is table.
# simulated data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))
# use stack to turn column name into a factor column (long format)
table(stack(df))
      ind
values V1 V2 V3 V4 V5 V6 V7 V8 V9
     1  4  0  1  2  6  3  6  3  2
     2  2  2  5  4  4  4  3  3  2
     3  4  4  3  7  2  7  1  4  2
     4  1  5  1  3  3  5  6  3  5
     5  3  4  4  4  4  2  2  3  6
     6  7  5  3  2  3  0  1  3  3
     7  4  5  3  3  2  1  3  3  1
     8  3  2  3  3  5  2  6  4  3
     9  2  3  7  2  1  6  2  4  6
Edit: forgot the tricky bit. The output of table is a table object, which looks like a matrix but gets turned into long format if you call as.data.frame on it. To turn your result into a 9x9 df, use
as.data.frame.matrix(table(stack(df)))
Caveat: if for some reason one of the 9 digits doesn't appear anywhere in the original df, then that row will be skipped (instead of being filled with 0s).
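One possible way around this caveat is to convert the stacked values to a factor with all nine levels declared before tabulating, since table() reports a count for every factor level, including unused ones. This is a sketch under assumptions: the data are simulated from 1:8 so that the digit 9 is deliberately absent, and the names long and counts are illustrative.

```r
# Simulated data in which the digit 9 never occurs (an assumption for the demo)
set.seed(1)
df <- as.data.frame(matrix(sample(1:8, 30 * 9, replace = TRUE), ncol = 9))

# Long format: 'values' holds the digits, 'ind' holds the source column name
long <- stack(df)

# Declare all nine levels so that table() keeps a row for unused digits
long$values <- factor(long$values, levels = 1:9)

counts <- as.data.frame.matrix(table(long))
counts
```

The row for 9 is then present and filled with zeros rather than dropped.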
This is very similar to the behaviour of tabulate(), so you can do:
#Create example data
df <- as.data.frame(matrix(sample(1:9, 30*9, TRUE), ncol=9))
#Counts of digits 1-9
as.data.frame(sapply(df, tabulate, nbins=9))
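To see why nbins=9 matters here, it may help to look at tabulate() on a small vector. The sketch below uses made-up input vectors; it illustrates that bins with no occurrences are reported as 0, and that values larger than nbins are silently ignored.

```r
# Bins 2, 4 and 5 have no occurrences and are reported as 0
zero_filled <- tabulate(c(1, 3, 3), nbins = 5)
zero_filled

# The value 5 exceeds nbins = 3 and is dropped without a warning
truncated <- tabulate(c(2, 3, 3, 5), nbins = 3)
truncated
```

The second behaviour means that out-of-range entries in the data would go unnoticed, so it may be worth checking the range of the input first.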
Here's a tidyverse solution that is robust with respect to the number of columns and the number of distinct values they contain. I avoid recourse to loops and non-standard evaluation by making the data tidy by pivoting.
First, create some test data
library(tidyverse)
# For reproducibility
set.seed(123)
d <- tibble(
  c1=floor(runif(30, 1, 10)),
  c2=floor(runif(30, 1, 10)),
  c3=floor(runif(30, 1, 10)),
  c4=floor(runif(30, 1, 10)),
  c5=floor(runif(30, 1, 10)),
  c6=floor(runif(30, 1, 10)),
  c7=floor(runif(30, 1, 10)),
  c8=floor(runif(30, 1, 10)),
  c9=floor(runif(30, 1, 10))
)
d
# A tibble: 30 × 9
      c1    c2    c3    c4    c5    c6    c7    c8    c9
   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1     3     9     6     2     6     8     8     5     5
 2     8     9     1     6     3     5     3     3     6
 3     4     7     4     4     3     4     7     2     2
 4     8     8     3     6     2     3     3     7     6
 5     9     1     8     3     4     1     6     1     3
 6     1     5     5     2     9     4     5     7     7
 7     5     7     8     8     2     6     3     4     4
 8     9     2     8     1     1     2     6     4     9
 9     5     3     8     5     2     5     9     8     9
10     5     3     4     5     7     2     9     9     7
# … with 20 more rows
Now solve the problem
d %>%
  pivot_longer(
    everything(),
    names_to="Column",
    values_to="Value"
  ) %>%
  group_by(Column, Value) %>%
  summarise(N=n(), .groups="drop") %>%
  pivot_wider(
    names_from=Column,
    values_from=N,
    id_cols=Value,
    values_fill=0
  )
# A tibble: 9 × 10
  Value    c1    c2    c3    c4    c5    c6    c7    c8    c9
  <dbl> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1     1     3     2     3     2     3     1     0     3     1
2     2     1     7     3     4     4     4     3     3     3
3     3     4     4     2     3     6     2     7     3     3
4     4     1     5     6     3     3     7     3     5     4
5     5     5     2     2     5     1     5     3     3     5
6     6     4     1     3     5     3     5     6     2     4
7     7     3     3     3     1     4     3     1     5     3
8     8     2     3     6     1     3     3     2     3     3
9     9     7     3     2     6     3     0     5     3     4
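Note that values_fill=0 only fills combinations of values that occur somewhere in the data; a digit absent from every column would still have no row. If a fixed set of rows for 1-9 is required, one option is tidyr's complete(), sketched below. The simulated three-column data (drawn from 1:8 so that 9 never occurs) and the name counts are assumptions for illustration.

```r
library(dplyr)
library(tidyr)

# Simulated data in which the digit 9 never occurs (an assumption for the demo)
set.seed(42)
d <- as.data.frame(matrix(sample(1:8, 30 * 3, replace = TRUE), ncol = 3))

counts <- d %>%
  pivot_longer(everything(), names_to = "Column", values_to = "Value") %>%
  count(Column, Value, name = "N") %>%
  # Add every Column/Value pair for Value in 1:9, with N = 0 where it was missing
  complete(Column, Value = 1:9, fill = list(N = 0L)) %>%
  pivot_wider(names_from = Column, values_from = N)

counts
```

Here complete() makes the full grid of columns and digits explicit before widening, so pivot_wider() has no gaps left to fill.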