Creating column that lists distinct observations

I have a data frame of observations that looks like this (showing course numbers of college classes offered each term). The columns are very long and of varying lengths.

  spring   summer   fall
   4a       5b       5c
   4a       9c       11b
   7c       5b       8a 
   ...      ...      ...

I want to reformat it to make it look like this. First, I want to create a column, "Course_Names", that lists every distinct course offered. Then, I want to count the number of sections of each course offered each semester.

   Course_Names   spring   summer   fall
   4a             2        0        0
   5b             0        2        0
   5c             0        0        1
   7c             1        0        0
   8a             1        0        1
   9c             0        1        0
   11b            0        0        1        

Any advice or links to relevant posts would be very much appreciated! Thank you!

In base R, an option would be to stack the data.frame into a two-column dataset and use table:

table(stack(df1))
#    ind
#values spring summer fall
#   11b      0      0    1
#   4a       2      0    0
#   5b       0      2    0
#   5c       0      0    1
#   7c       1      0    0
#   8a       0      0    1
#   9c       0      1    0
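
To get from that contingency table to the layout asked for in the question, with the course codes in an ordinary column, one possibility is to convert the table with as.data.frame.matrix and move the row names into a column. This is only a minimal sketch: df1 is the sample data from this answer, rebuilt here so the snippet runs on its own, and the names tab and counts are purely illustrative.

```r
# Sample data (same values as df1 in this answer), repeated so the snippet is self-contained
df1 <- data.frame(spring = c("4a", "4a", "7c"),
                  summer = c("5b", "9c", "5b"),
                  fall   = c("5c", "11b", "8a"),
                  stringsAsFactors = FALSE)

# Cross-tabulate course code (rows) by term (columns)
tab <- table(stack(df1))

# as.data.frame.matrix keeps the wide layout instead of melting to long form
counts <- as.data.frame.matrix(tab)

# Move the row names (course codes) into a regular column named Course_Names
counts <- cbind(Course_Names = rownames(counts), counts)
rownames(counts) <- NULL
counts
```

Calling as.data.frame(tab) directly would instead return one row per course/term pair (long format), which is why the matrix method is used here.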

Or in tidyverse, we can reshape into 'long' format with pivot_longer, get the count, and reshape back into 'wide' format with pivot_wider:

library(dplyr)
library(tidyr)
df1 %>%
    pivot_longer(everything()) %>%
    count(name, Course_Names = value) %>%
    pivot_wider(names_from = name, values_from = n, values_fill = list(n = 0))
# A tibble: 7 x 4
#  Course_Names  fall spring summer
#  <chr>        <int>  <int>  <int>
#1 11b              1      0      0
#2 5c               1      0      0
#3 8a               1      0      0
#4 4a               0      2      0
#5 7c               0      1      0
#6 5b               0      0      2
#7 9c               0      0      1
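
The question mentions columns of varying lengths. In a data frame that would normally mean the shorter columns are padded with NA, although that is an assumption about the real data. Under that assumption, a variation of the pipeline above could drop the padding with values_drop_na = TRUE and put the term columns back in their original order. The data frame df2 and the column name term are hypothetical and only serve the illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: summer and fall are shorter and padded with NA
df2 <- data.frame(spring = c("4a", "4a", "7c"),
                  summer = c("5b", "9c", NA),
                  fall   = c("5c", NA, NA),
                  stringsAsFactors = FALSE)

res <- df2 %>%
    # values_drop_na = TRUE discards the NA padding before counting
    pivot_longer(everything(), names_to = "term", values_to = "Course_Names",
                 values_drop_na = TRUE) %>%
    # one row per term/course pair with its number of sections
    count(term, Course_Names) %>%
    pivot_wider(names_from = term, values_from = n, values_fill = list(n = 0)) %>%
    # count() sorts terms alphabetically; restore the original column order
    select(Course_Names, all_of(names(df2)))
res
```

Note that all_of(names(df2)) would fail if a term column consisted only of NA, since that term would never appear after pivoting; the sketch assumes every term has at least one course.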

data

df1 <- structure(list(spring = c("4a", "4a", "7c"), summer = c("5b", 
"9c", "5b"), fall = c("5c", "11b", "8a")), class = "data.frame", row.names = c(NA, 
-3L))

You can do this by gathering the data and then spreading it again, using the gather and spread functions from the tidyr package, as follows:

library(dplyr)
library(tidyr)

data <-
  data.frame(
    spring = c("4a", "4a", "7c"),
    summer = c("5b", "9c", "5b"),
    fall = c("5c", "11b", "8a")
  )

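result <-
  data %>%
  # long format: one row per (term, course) pair
  gather(key = "term", value = "Course_Names") %>%
  # count the sections of each course within each term
  group_by(term, Course_Names) %>%
  count() %>%
  # back to wide format: one column per term
  spread(key = term, value = n) %>%
  # courses not offered in a term come back as NA; show them as 0
  replace(is.na(.), 0)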
