Check number of different columns in a Data Frame in R
I am using R and have a dataset where each column is a production unit and each row is a time unit. Each variable is a crop rotation sequence applied to the production unit.
The dataset looks like this:
land_use_1 land_use_2 land_use_3 land_use_4 land_use_5 land_use_6
<chr> <chr> <chr> <chr> <chr> <chr>
1 PAST PAST PAST PAST SOY PAST
2 PAST PAST PAST PAST SOY PAST
3 PAST PAST PAST PAST PAST PAST
4 PAST PAST PAST PAST PAST SOY
5 PAST PAST PAST PAST CORN SOY
6 PAST PAST PAST PAST CORN PAST
I need to check how many of these columns (crop sequences) are unique, but I cannot do it one by one (with something like land_use_1 != land_use_2, then land_use_1 != land_use_3, etc.) because there are hundreds of columns in the dataset.
I tried to use this command:
dataset %>% unique(, MARGIN=2) %>% dim()
but it returns the same number of columns as the dataset, so it does not detect which columns are identical (I know that some are identical, because I checked a few of them by hand).
How can I do that in an efficient way?
Thanks
You can use the data.table function duplicated:
library(data.table)
DT <- data.table(yourdataframe)
# duplicated() flags rows that repeat an earlier row
DT$duplicated_rows <- duplicated(DT)
Here is a reproducible example:
DT <- data.table(A = rep(1:3, each = 4), B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A,B")
DT
A B C
1: 1 1 1
2: 1 1 2
3: 1 1 1
4: 1 2 2
5: 2 2 1
6: 2 2 2
7: 2 3 1
8: 2 3 2
9: 3 3 1
10: 3 4 2
11: 3 4 1
12: 3 4 2
duplicated(DT)
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
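Note that the example above flags duplicated rows, whereas the question asks about columns. A minimal sketch of a column-wise variant, assuming the columns can be compared as plain vectors: base R's duplicated() also accepts a list, so the data frame can be passed through as.list() first. The data frame df below is made up to mirror the question's sample and is not the real dataset.

```r
# Hypothetical data mirroring the question's sample (not the real dataset)
df <- data.frame(
  land_use_1 = rep("PAST", 6),
  land_use_2 = rep("PAST", 6),
  land_use_3 = rep("PAST", 6),
  land_use_4 = rep("PAST", 6),
  land_use_5 = c("SOY", "SOY", "PAST", "PAST", "CORN", "CORN"),
  land_use_6 = c("PAST", "PAST", "PAST", "SOY", "SOY", "PAST")
)

# TRUE marks a column that is identical to an earlier column
dup_cols <- duplicated(as.list(df))

# Names of the first occurrences, and the number of distinct columns
unique_names <- names(df)[!dup_cols]
n_unique <- sum(!dup_cols)
```

Here unique_names holds one name per distinct crop sequence, and n_unique is the count the question asks for.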
Here's a solution to generate a list of names of the unique columns and the total number of unique columns:
library(tidyverse)
df <- data.frame(land_use_1 = rep("PAST", 6),
                 land_use_2 = rep("PAST", 6),
                 land_use_3 = rep("PAST", 6),
                 land_use_4 = rep("PAST", 6),
                 land_use_5 = c("SOY", "SOY", "PAST", "PAST", "CORN", "CORN"),
                 land_use_6 = c("PAST", "PAST", "PAST", "SOY", "SOY", "PAST"))
# Transpose so each column becomes a row, then keep only distinct rows
unique_vars <- data.frame(t(df)) %>%
  rownames_to_column() %>%
  distinct_at(vars(-rowname), .keep_all = TRUE)
unique_vars$rowname
# [1] "land_use_1" "land_use_5" "land_use_6"
length(unique_vars$rowname)
# [1] 3
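The same transpose idea also suggests why the attempt in the question did not work: as far as I can tell, the data frame method of unique() deduplicates rows and ignores MARGIN, while the matrix method honours it. A minimal base R sketch, assuming all columns share one type so that as.matrix() does not change how values compare (the small data frame d is hypothetical):

```r
# Hypothetical data: columns a and b are identical, c differs
d <- data.frame(a = c("PAST", "PAST", "SOY"),
                b = c("PAST", "PAST", "SOY"),
                c = c("PAST", "SOY", "SOY"))

# as.matrix() makes unique() dispatch to the matrix method,
# where MARGIN = 2 means "compare columns"
m_unique <- unique(as.matrix(d), MARGIN = 2)

ncol(m_unique)      # number of distinct columns
colnames(m_unique)  # name of the first occurrence of each distinct column
```

With mixed column types as.matrix() coerces everything to character, so this variant is best suited to data like the crop labels in the question.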
Use:
unique(as.list(dataset))
This coerces the data frame into a list of columns and returns the unique elements of that list; wrapping the call in length() gives the number of unique columns.
E.g.:
> d <- data.frame(a=c(1,1,0) , b=c(1,1,0), c=c(1,0,1))
> unique(as.list(d))
[[1]]
[1] 1 1 0
[[2]]
[1] 1 0 1
> length(unique(as.list(d)))
[1] 2
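To see which columns are copies of each other, rather than only how many distinct ones exist, one possible sketch builds a string key per column and groups the column names by that key. The key construction with paste() and the "|" separator are assumptions for illustration and not part of the answer above; they only work if "|" never appears in the data.

```r
d <- data.frame(a = c(1, 1, 0), b = c(1, 1, 0), c = c(1, 0, 1))

# One string key per column; assumes "|" never occurs in the values
key <- vapply(d, paste, character(1), collapse = "|")

# Columns sharing a key are identical; number the groups by first appearance
group_id <- match(key, unique(key))
groups <- split(names(d), group_id)

groups               # list of column-name groups
lengths(groups) > 1  # which groups contain more than one column
```

Each element of groups lists the names of columns that carry the same sequence, so length(groups) again gives the number of unique columns.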