
How to grab each column's unique values in multiple csv files

I am relatively new to R, so bear with me. I have 50+ csv files and want to run through each of them and grab each column's unique values. They are all formatted with the first row being the headers.

The ideal output would be a data frame indicating the filename, column headers, and unique values for each csv. These are the unique values of each column taken one at a time, not unique combinations across columns.

Any help would be greatly appreciated!

Here is how I'm getting unique values as a list, but I'm not sure what to do next:

lapply(files, function(x) {
  t <- read.csv(x, header=TRUE) # load file
  unq <- apply(t, 2, unique)
})

This should do the trick:

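do.call(rbind, lapply(files, function(x) {
  dat <- read.csv(x, header=TRUE, stringsAsFactors=FALSE)  # load one file
  # Build one data frame per column; seq_len() is safe if a file has no columns
  do.call(rbind, lapply(seq_len(ncol(dat)), function(idx) {
    # as.character() lets columns of different types share one value column
    data.frame(filename=x, column=colnames(dat)[idx],
               value=as.character(unique(dat[[idx]])))
  }))
}))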

The outer lapply returns one data frame per file x, and the inner lapply returns one data frame per column (numbered idx) within that file. Each do.call(rbind, ...) then stacks those pieces, so the final result has one row per unique value.
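To see the shape of the result, here is a minimal sketch run on made-up data. The two temporary files, their contents, and the helper name unique_values_long are assumptions for illustration only, not part of the original question. It assumes every file has at least one data row, and it converts values to character so that columns of different types can be stacked.

```r
# Hypothetical sample data: two small CSV files in a temporary directory.
tmp <- tempfile("csv_demo_")
dir.create(tmp)
write.csv(data.frame(id = c(1, 2, 2), colour = c("red", "red", "blue")),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(city = c("Oslo", "Oslo")),
          file.path(tmp, "b.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Illustrative helper: returns one row per unique value found in one file.
# Assumes the file has at least one data row.
unique_values_long <- function(path) {
  dat <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)
  per_column <- lapply(names(dat), function(col) {
    data.frame(filename = basename(path),
               column = col,
               value = as.character(unique(dat[[col]])),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, per_column)  # stack the per-column pieces
}

# Stack the per-file results into a single long data frame.
result <- do.call(rbind, lapply(files, unique_values_long))
print(result)
```

With this sample data, result has five rows: two for id and two for colour in a.csv, and one for city in b.csv. Storing the values as character is a design choice here; if you need the original types, keeping the list output per column may be a better fit.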
