Conditional Character Column ffdf data
I imported a big dataset (~6 million rows) into R using the ffbase package. It lists people enrolled in high school in Brazil. In principle, I have 2 columns: Id (student ID number) and University (institution's name).
I would like to create a column, named Group in my example, relating each university to its educational group:
Id University Group
000001 Anhanguera Kroton
000002 Unopar Kroton
000003 Anhembi Laureate
000004 FMU Laureate
PS: I have no information about educational groups in my dataset, but I do know which group corresponds to each university. I need to attach this detail to my data.
PS2: The class of the University column is ff_vector.
I appreciate any contribution you might make.
If you have a long list of Groups, this may not be the quickest way, but you can use mutate from the dplyr package:
library(dplyr)

# Example data; sprintf() keeps the leading zeros shown in the table above
data <- data.frame("Id" = sprintf("%06d", 1:4), "University" = c("Anhanguera", "Unopar", "Anhembi", "FMU"))
# Map each university to its group; anything not listed becomes NA
data <- mutate(data, Group = as.factor(
  ifelse(University %in% "Anhanguera", "Kroton",
  ifelse(University %in% "Unopar", "Kroton",
  ifelse(University %in% "Anhembi", "Laureate",
  ifelse(University %in% "FMU", "Laureate", NA))))))
data
str(data)
I used University here, but just substitute it with your ff_vector.
If you would like to keep Group as character, remove the as.factor().
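For a longer list of groups, the nested ifelse calls get hard to maintain. A minimal sketch of an alternative is a named lookup vector in base R. The name group_lookup is made up for illustration, and this is written for an ordinary data.frame; I have not tested it against an ffdf.

```r
# Hypothetical lookup: names are universities, values are their groups
group_lookup <- c(
  Anhanguera = "Kroton",
  Unopar     = "Kroton",
  Anhembi    = "Laureate",
  FMU        = "Laureate"
)

data <- data.frame(
  Id = sprintf("%06d", 1:4),
  University = c("Anhanguera", "Unopar", "Anhembi", "FMU"),
  stringsAsFactors = FALSE
)

# Index the lookup by name; universities missing from the lookup give NA.
# as.character() guards against University being a factor.
data$Group <- unname(group_lookup[as.character(data$University)])
data
```

Adding a university then means adding one entry to group_lookup instead of another level of nesting.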
I'm not familiar with ffbase, but see ffbase2 for using dplyr with ffbase.
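Since the question says the university-to-group mapping is already known, another option is to store that mapping as its own small table and join it on University. The sketch below uses base merge on ordinary data frames; the names students, mapping and joined are made up for illustration. ffbase documents a merge method for ffdf objects, so the same idea may carry over, but I have not verified that here.

```r
# Example student data (ordinary data.frame, not an ffdf)
students <- data.frame(
  Id = sprintf("%06d", 1:4),
  University = c("Anhanguera", "Unopar", "Anhembi", "FMU"),
  stringsAsFactors = FALSE
)

# Hypothetical mapping table: one row per university
mapping <- data.frame(
  University = c("Anhanguera", "Unopar", "Anhembi", "FMU"),
  Group = c("Kroton", "Kroton", "Laureate", "Laureate"),
  stringsAsFactors = FALSE
)

# Left join: keep every student, attach Group where the university is known.
# Note that merge() sorts the result by the join column by default.
joined <- merge(students, mapping, by = "University", all.x = TRUE)
joined
```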