How to grab each column's unique values in multiple CSV files
I am relatively new to R, so bear with me. I have 50+ CSV files and want to run through each of them and grab each column's unique values. They are all formatted with the first row as the headers.
The ideal output would be a data frame giving the filename, column header, and unique values for each CSV. These are the unique values of each column taken one at a time, not unique combinations across columns.
Any help would be greatly appreciated!
Here is how I'm getting the unique values as a list, but I'm not sure what to do next:
lapply(files, function(x) {
  t <- read.csv(x, header=TRUE) # load file
  unq <- apply(t, 2, unique)
})
This should do the trick:
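do.call(rbind, lapply(files, function(x) {
  dat <- read.csv(x, header = TRUE, stringsAsFactors = FALSE)
  # One data frame per column: file name, column name, unique values
  do.call(rbind, lapply(seq_len(ncol(dat)), function(idx) {
    data.frame(filename = x,
               column = colnames(dat)[idx],
               # as.character so numeric and text columns stack cleanly
               value = as.character(unique(dat[[idx]])),
               stringsAsFactors = FALSE)
  }))
}))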
The outer lapply returns a data frame for each of your files x, and the inner lapply returns a data frame for each column numbered idx within x. Each do.call(rbind, ...) then stacks those data frames into a single one.
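To try the approach without touching real data, here is a minimal sketch that wraps the nested lapply in a function and runs it on two throwaway files. The helper name unique_values_by_column, the temporary directory, and the sample data are all made up for illustration. Using basename(x) for the filename column and converting values with as.character() are assumptions on my part, not something the question asked for.

```r
# Hypothetical helper: same nested lapply as in the answer, wrapped in a function.
unique_values_by_column <- function(files) {
  per_file <- lapply(files, function(x) {
    dat <- read.csv(x, header = TRUE, stringsAsFactors = FALSE)
    per_column <- lapply(seq_len(ncol(dat)), function(idx) {
      data.frame(filename = basename(x),             # assumption: drop the directory part
                 column = colnames(dat)[idx],
                 value = as.character(unique(dat[[idx]])),  # one common type for all columns
                 stringsAsFactors = FALSE)
    })
    do.call(rbind, per_column)   # stack the columns of one file
  })
  do.call(rbind, per_file)       # stack all files
}

# Two small CSV files with made-up data in a temporary directory
tmp <- tempfile("csv_demo_")
dir.create(tmp)
write.csv(data.frame(id = c(1, 2, 2), colour = c("red", "red", "blue")),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(size = c("S", "M", "S")),
          file.path(tmp, "b.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
result <- unique_values_by_column(files)
print(result)
```

For your own data, something like files <- list.files("path/to/folder", pattern = "\\.csv$", full.names = TRUE) should build the files vector, where the folder path is a placeholder. Note that a file with headers but no data rows would probably need special handling, since data.frame() cannot combine a length-one filename with zero values.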