Fill in missing values in a dataset using values from another column
I have a data set like this:
Student_ID  City       Branch  Name_of_University
M2001       Hyderabad  C.S.E   JNTU
M2002       Delhi      E.C.E   DelhiUniversity
M2003       Hyderabad  C.S.E
M2004       Chennai    I.T
M2005       Chennai    C.S.E   AnnaUniversity
M2006       Hyderabad  E.C.E   OU
M2007       Delhi      I.T
M2008       Chennai    E.C.E
I would like to fill in the missing values of Name_of_University based on City. For example, M2003 (Hyderabad) could be filled with either OU or JNTU, but if JNTU appeared more often than OU for Hyderabad, it would be better to fill it with JNTU. So how can I choose the university name with the maximum number of occurrences for each city?
I need to do this in R. Please help me!
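A minimal base-R sketch of the decision step, assuming the blank cells in the table above are entered as NA (the data frame name `students` is only for illustration): cross-tabulate City against Name_of_University, then take the column with the largest count in each row.

```r
# The question's data; the blank cells are assumed to mean NA.
students <- data.frame(
  Student_ID = c("M2001", "M2002", "M2003", "M2004",
                 "M2005", "M2006", "M2007", "M2008"),
  City = c("Hyderabad", "Delhi", "Hyderabad", "Chennai",
           "Chennai", "Hyderabad", "Delhi", "Chennai"),
  Branch = c("C.S.E", "E.C.E", "C.S.E", "I.T",
             "C.S.E", "E.C.E", "I.T", "E.C.E"),
  Name_of_University = c("JNTU", "DelhiUniversity", NA, NA,
                         "AnnaUniversity", "OU", NA, NA),
  stringsAsFactors = FALSE
)

# Rows = cities, columns = universities, cells = counts.
# table() leaves out NA values by default.
counts <- table(students$City, students$Name_of_University)
counts

# For each city (row), take the university (column) with the largest count.
best <- colnames(counts)[apply(counts, 1, which.max)]
names(best) <- rownames(counts)
best
```

With this data, `best` maps Chennai to AnnaUniversity, Delhi to DelhiUniversity and Hyderabad to JNTU. Hyderabad is actually a tie (one JNTU, one OU): `which.max()` returns the first maximum and `table()` sorts the university names, so a tie goes to the name that sorts first. Indexing the named vector by city, e.g. `best[students$City]`, gives the candidate fill value for each row.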
Probably this is close to what you want:
> library(plyr)  # daply() and .() come from the plyr package
> # example data set
> set.seed(0)
> df <- data.frame(city = LETTERS[sample(3,20,TRUE)], univ = letters[sample(3,20,TRUE)])
> df$univ[sample(20, 5)] <- NA
> df
city univ
1 C c
2 A c
3 B a
# .. snip ..
18 C c
19 C a
20 B <NA>
>
> # find the most frequent univ for each city
> ma <- daply(df, .(city), function(x) names(which.max(table(x$univ))))
> ma
A B C
"b" "a" "a"
>
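> # replace each NA with its city's most frequent univ
> # (look up by city name, so this works whether city is a factor or a character column)
> df$univ <- ifelse(is.na(df$univ), ma[as.character(df$city)], as.character(df$univ))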
> df
city univ
1 C c
2 A c
3 B a
# .. snip ..
18 C c
19 C a
20 B a
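The same idea can be written without plyr (which is retired). This is a hedged base-R sketch using `tapply()`; the helper name `fill_na_with_group_mode` is hypothetical. It also guards against a city whose value is missing in every row, where `names(which.max(table(x)))` gives a zero-length result instead of a single name; such a city is left as NA.

```r
# Hypothetical helper: replace each NA in `values` with the most frequent
# non-NA value within the same group. Base R only.
fill_na_with_group_mode <- function(values, groups) {
  values <- as.character(values)
  groups <- as.character(groups)
  # Most frequent value per group. table() drops NAs; which.max() returns
  # the first maximum, so ties go to the value that table() sorts first.
  group_mode <- tapply(values, groups, function(v) {
    counts <- table(v)
    if (length(counts) == 0) NA_character_ else names(which.max(counts))
  })
  missing <- is.na(values)
  # Look up each missing entry's group by name; a group with no observed
  # values yields NA, so those entries stay missing.
  values[missing] <- group_mode[groups[missing]]
  values
}
```

For the answer's example, `df$univ <- fill_na_with_group_mode(df$univ, df$city)` would do the work of the `daply()` and `ifelse()` lines together.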