
R: complicated version of converting a dataset to long format (id information is spread across multiple disparate rows)

I have student enrollment data that is currently organized as follows:

library(tibble)

df <- tibble(course_number = c("Snow", 12345, 56789, "Stark", 10111, 21314, 15161),
             academic_level = c("John", "UG", "UG", "Arya", "GR", "GR", "GR"),
             course_id = c("middlename", "Wall101", "Wall102", "middlename", "Assassin501", "Assassin502", "Assassin503"))

My actual datasets have thousands of students, as well as several more columns of course information, but the main problem I'm having is converting those rows of names into a new column in which each name repeats once for every course that student took. I'm familiar with gather and spread, and I have been able to separate out just the name information into its own column (currently saved in a separate df), but I need to find a way to count the courses so I know how many times each name has to repeat.

Thanks in advance!

Maybe we can create a grouping variable based on the occurrence of letters in 'course_number', create the 'name' column from the first element of 'course_number' and 'academic_level' within each group, and then remove the first row of each group.
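The grouping step described above can be checked in isolation. This is a minimal sketch that assumes, as in the sample data, that name rows are the only rows whose 'course_number' contains letters; `is_name_row` and `grp` are illustrative names.

```r
library(stringr)

# course_number as it is stored in the sample data (mixing strings and
# numbers in c() coerces everything to character)
course_number <- c("Snow", "12345", "56789", "Stark", "10111", "21314", "15161")

# TRUE on rows that hold a surname instead of a numeric course number
is_name_row <- str_detect(course_number, "[A-Za-z]")

# The running total goes up by one at each name row, so every student
# and the course rows below it share one group id
grp <- cumsum(is_name_row)
grp
# 1 1 1 2 2 2 2
```

Because the id only changes on a name row, the number of courses per student never has to be counted up front.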

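library(dplyr)
library(stringr)

df %>%
  # start a new group at every row whose course_number contains letters (a name row)
  group_by(grp = cumsum(str_detect(course_number, '[A-Za-z]'))) %>%
  # build the name from the first (name) row of each group
  mutate(name = str_c(first(course_number), first(academic_level), sep = " ")) %>%
  # drop the name row itself, keeping only the course rows
  slice(-1)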
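Since the question mentions tidyr, a possible alternative is to write the name on the name rows only and carry it downward with `tidyr::fill()`. This is a sketch under the same assumption that only name rows contain letters in 'course_number', and it further assumes the given name sits in 'academic_level' on those rows; `is_name_row`, `long` and `course_counts` are hypothetical names.

```r
library(dplyr)
library(tidyr)
library(stringr)

df <- tibble(
  course_number  = c("Snow", "12345", "56789", "Stark", "10111", "21314", "15161"),
  academic_level = c("John", "UG", "UG", "Arya", "GR", "GR", "GR"),
  course_id      = c("middlename", "Wall101", "Wall102", "middlename",
                     "Assassin501", "Assassin502", "Assassin503")
)

long <- df %>%
  mutate(
    # flag the rows that carry a student's name instead of a course
    is_name_row = str_detect(course_number, "[A-Za-z]"),
    # write the full name on name rows only, NA everywhere else
    name = if_else(is_name_row,
                   str_c(academic_level, course_number, sep = " "),
                   NA_character_)
  ) %>%
  # copy each name down onto the course rows that follow it
  fill(name) %>%
  # keep only the course rows
  filter(!is_name_row) %>%
  select(name, course_number, academic_level, course_id)

# courses per student, if that count is still needed
course_counts <- count(long, name)
```

With the sample data, `long` has five course rows (two for John Snow, three for Arya Stark), and `course_counts` gives the per-student totals the question asks about.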

