Removing repeats and blanks from R data frame
I apologise in advance for the data structure here, but I'm stuck with it...
I have a data frame with lots of repeats and blanks, like so:
df <- data.frame(
country=c("Afghanistan", "Afghanistan", "Algeria", "Australia", "Australia", "Australia"),
survey.1=c("Influenza","", "","","Influenza","Influenza"),
survey.2=c("","Hepatitis C","","","",""),
survey.3=c("West Nile Virus", "", "", "", "", "West Nile Virus"))
      country  survey.1    survey.2        survey.3
1 Afghanistan Influenza             West Nile Virus
2 Afghanistan           Hepatitis C
3     Algeria
4   Australia
5   Australia Influenza
6   Australia Influenza             West Nile Virus
I need to remove the repeats and blanks but keep the same data structure (I don't know what you would call this; 'concentrating' as opposed to 'aggregating', maybe?). So what I'd end up with is this:
      country  survey.1    survey.2        survey.3
1 Afghanistan Influenza Hepatitis C West Nile Virus
2   Australia Influenza             West Nile Virus
Can anyone help?
Using plyr:
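library(plyr)

# For each country, collapse every column to its first non-blank value
ddply(df, .(country), function(x) {
  sapply(x, function(y) {
    y <- as.character(y)            # guard against factor columns
    xx <- unique(y[nchar(y) > 0])   # distinct non-blank values
    if (length(xx) > 0) xx[1] else ""
  })
})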
      country  survey.1    survey.2        survey.3
1 Afghanistan Influenza Hepatitis C West Nile Virus
2     Algeria
3   Australia Influenza             West Nile Virus
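The plyr result above still contains the all-blank Algeria row, whereas the desired output in the question omits it. A minimal base R sketch of the same idea, which also drops such rows, might look like the following. The helper name first_nonblank is hypothetical, and the sketch assumes that keeping only the first distinct non-blank value per country and column is acceptable.

```r
# Same example data as in the question; stringsAsFactors = FALSE keeps the
# columns as character vectors on older R versions as well.
df <- data.frame(
  country  = c("Afghanistan", "Afghanistan", "Algeria",
               "Australia", "Australia", "Australia"),
  survey.1 = c("Influenza", "", "", "", "Influenza", "Influenza"),
  survey.2 = c("", "Hepatitis C", "", "", "", ""),
  survey.3 = c("West Nile Virus", "", "", "", "", "West Nile Virus"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: first distinct non-blank value of a vector, or "" if none.
first_nonblank <- function(y) {
  vals <- unique(y[nzchar(y)])
  if (length(vals) > 0) vals[1] else ""
}

# Collapse each country to a single row, one survey column at a time.
survey_cols <- setdiff(names(df), "country")
out <- aggregate(df[survey_cols], by = df["country"], FUN = first_nonblank)

# Drop countries whose survey columns are all blank (Algeria in this data).
keep <- rowSums(out[survey_cols] != "") > 0
out <- out[keep, ]
rownames(out) <- NULL
out
```

On the example data this should leave one row each for Afghanistan and Australia. If a country can have several different non-blank values in one column, the helper would need to combine them (for instance with paste) instead of taking the first.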