Creating column that lists distinct observations
I have a data frame of observations that looks like this (showing course numbers of college classes offered each term). The columns are very long and of varying lengths.
spring summer fall
4a 5b 5c
4a 9c 11b
7c 5b 8a
... ... ...
I want to reformat it to make it look like this. First, I want to create a column, "Course_Names", that lists every distinct course offered. Then, I want to count the number of sections of each course offered each semester.
Course_Names spring summer fall
4a 2 0 0
5b 0 2 0
5c 0 0 1
7c 1 0 0
8a 1 0 1
9c 0 1 0
11b 0 0 1
Any advice or links to relevant posts would be very much appreciated! Thank you!
In base R, an option would be to stack the data.frame into a two-column dataset and use table:
table(stack(df1))
# ind
#values spring summer fall
# 11b 0 0 1
# 4a 2 0 0
# 5b 0 2 0
# 5c 0 0 1
# 7c 1 0 0
# 8a 0 0 1
# 9c 0 1 0
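The call above returns a table object with the course numbers as row names. If a regular data frame with an explicit Course_Names column is wanted, the table can be converted afterwards. This is a minimal sketch, assuming the same df1 as in this answer; the object names counts and result are illustrative only and not part of the original answer.

```r
# Example data, same values as df1 in this answer
df1 <- data.frame(
  spring = c("4a", "4a", "7c"),
  summer = c("5b", "9c", "5b"),
  fall   = c("5c", "11b", "8a"),
  stringsAsFactors = FALSE
)

# Cross-tabulate course number (rows) by term (columns)
counts <- table(stack(df1))

# Convert the two-way table to a plain data frame, one column per term
result <- as.data.frame.matrix(counts)

# Move the row names (course numbers) into a regular column
result <- cbind(Course_Names = rownames(result), result)
rownames(result) <- NULL

result
```

Since the question mentions columns of varying lengths: if the shorter columns are padded with NA, table() leaves NA values out of the counts by default, so no extra filtering should be needed in that case.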
Or in tidyverse, we can reshape into 'long' format with pivot_longer, get the count, and reshape back into 'wide' format with pivot_wider:
library(dplyr)
library(tidyr)
df1 %>%
   pivot_longer(everything()) %>%
   count(name, Course_Names = value) %>%
   pivot_wider(names_from = name, values_from = n, values_fill = list(n = 0))
# A tibble: 7 x 4
# Course_Names fall spring summer
# <chr> <int> <int> <int>
#1 11b 1 0 0
#2 5c 1 0 0
#3 8a 1 0 0
#4 4a 0 2 0
#5 7c 0 1 0
#6 5b 0 0 2
#7 9c 0 0 1
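In the output above the term columns come out as fall, spring, summer, because count sorts by term name before the data is widened. A possible variant that keeps the original spring, summer, fall order and also copes with NA-padded columns is sketched below. The data frame df_padded is a made-up example (assuming shorter columns are padded with NA), and the column name term is chosen here for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: columns of different lengths, padded with NA
df_padded <- data.frame(
  spring = c("4a", "4a", "7c", "8a"),
  summer = c("5b", "9c", "5b", NA),
  fall   = c("5c", "11b", NA, NA),
  stringsAsFactors = FALSE
)

result <- df_padded %>%
  # long format: one row per (term, course) pair, dropping the NA padding
  pivot_longer(everything(), names_to = "term", values_to = "Course_Names",
               values_drop_na = TRUE) %>%
  # number of sections per course and term
  count(Course_Names, term) %>%
  # wide format: one column per term, 0 where a course is not offered
  pivot_wider(names_from = term, values_from = n, values_fill = list(n = 0)) %>%
  # restore the column order of the input data
  select(Course_Names, all_of(names(df_padded)))

result
```

Note that the final select step assumes every term has at least one non-missing course; a term column that is entirely NA would not appear after pivot_wider, and all_of would then raise an error.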
Data used above:
df1 <- structure(list(spring = c("4a", "4a", "7c"),
    summer = c("5b", "9c", "5b"),
    fall = c("5c", "11b", "8a")),
    class = "data.frame", row.names = c(NA, -3L))
You can do this by gathering the data and then spreading it again, using the gather and spread functions from the tidyr package, as follows:
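library(dplyr)
library(tidyr)

data <-
  data.frame(
    spring = c("4a", "4a", "7c"),
    summer = c("5b", "9c", "5b"),
    fall = c("5c", "11b", "8a"),
    stringsAsFactors = FALSE  # keep course numbers as character (the default only from R 4.0)
  )

result <-
  data %>%
  # long format: one row per (term, course) pair
  gather(key = "term", value = "Course_Names") %>%
  # number of sections per course and term
  count(Course_Names, term) %>%
  # back to wide format, one column per term; missing combinations become 0
  spread(key = term, value = n, fill = 0)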