Transform dataframe of characters into a “more clear” dataframe with binary variables in R

Starting from a dataframe in R like the following (df):

year_1 <- c('James','Mike','Jane', NA)
year_2 <- c('Evelyn', 'Jackson', 'James', 'Avery')
year_3 <- c('Harper', 'Avery', NA, NA)
df <- data.frame(year_1, year_2, year_3)

...I would like to convert it into something like df_1 (of course, I have hundreds of elements in my original dataframe, so I can't do it manually):

names <- c('James','Mike','Jane','Evelyn', 'Jackson', 'Avery', 'Harper')
year_1 <- c('YES','YES','YES', 'NO', 'NO', 'NO', 'NO')
year_2 <- c('YES','NO','NO', 'YES', 'YES', 'YES', 'NO')
year_3 <- c('NO','NO','NO', 'NO', 'NO', 'YES', 'YES')
df_1 <- data.frame(year_1, year_2, year_3)
rownames(df_1) <- names

I have tried to:

  1. convert all elements of df into a string vector with unique elements
  2. construct the structure of df_1 using the names from step 1
  3. fill df_1 with a loop (this is where I am stuck: I am not able to build a proper loop that does the trick)

Any ideas?

Thanks!!

A base R option using stack + table

> as.data.frame(ifelse(table(stack(df)) > 0, "YES", "NO"))
        year_1 year_2 year_3
Avery       NO    YES    YES
Evelyn      NO    YES     NO
Harper      NO     NO    YES
Jackson     NO    YES     NO
James      YES    YES     NO
Jane       YES     NO     NO
Mike       YES     NO     NO
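
To see why the one-liner works, it can help to look at the intermediate steps. This is only a sketch, assuming the df from the question and R 4.0 or later (where strings are not converted to factors); the names long, counts and res are illustrative and not part of the answer above.

```r
# Rebuild the example data from the question
year_1 <- c('James', 'Mike', 'Jane', NA)
year_2 <- c('Evelyn', 'Jackson', 'James', 'Avery')
year_3 <- c('Harper', 'Avery', NA, NA)
df <- data.frame(year_1, year_2, year_3, stringsAsFactors = FALSE)

# stack() reshapes the wide data frame into two columns:
# `values` (the names) and `ind` (the column each name came from)
long <- stack(df)

# table() cross-tabulates values against ind; NA values are dropped by default
counts <- table(long)

# unclass() leaves a plain integer matrix; any count above zero means
# the name appears in that year
res <- as.data.frame(ifelse(unclass(counts) > 0, "YES", "NO"))
res
```

Testing for a count above zero rather than exactly one keeps the result correct if a name happens to appear twice in the same year.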

Here is an option with tidyverse: we reshape the data into 'long' format with pivot_longer, keep the distinct rows, create a column of 'YES', and reshape back to 'wide' with pivot_wider.

library(dplyr)
library(tidyr)
library(tibble)
df %>%
  pivot_longer(cols = everything(), values_drop_na = TRUE) %>%
  distinct %>%
  mutate(new = 'YES') %>% 
  pivot_wider(names_from = name, values_from = new, values_fill = 'NO') %>%
  column_to_rownames("value")

-output

#          year_1 year_2 year_3
#James      YES    YES     NO
#Evelyn      NO    YES     NO
#Harper      NO     NO    YES
#Mike       YES     NO     NO
#Jackson     NO    YES     NO
#Avery       NO    YES    YES
#Jane       YES     NO     NO

What about this?

sapply(df, function(x) sapply(na.omit(unique(unlist(df))), `%in%`, x))
#         year_1 year_2 year_3
# James     TRUE   TRUE  FALSE
# Mike      TRUE  FALSE  FALSE
# Jane      TRUE  FALSE  FALSE
# Evelyn   FALSE   TRUE  FALSE
# Jackson  FALSE   TRUE  FALSE
# Avery    FALSE   TRUE   TRUE
# Harper   FALSE  FALSE   TRUE
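
If the YES/NO labels from the question are needed rather than TRUE/FALSE, the logical matrix can be recoded afterwards. A minimal sketch, assuming the df from the question; it uses a single sapply instead of the nested one above, and the names people, present and res are made up for illustration.

```r
# Rebuild the example data from the question
year_1 <- c('James', 'Mike', 'Jane', NA)
year_2 <- c('Evelyn', 'Jackson', 'James', 'Avery')
year_3 <- c('Harper', 'Avery', NA, NA)
df <- data.frame(year_1, year_2, year_3, stringsAsFactors = FALSE)

# Unique names in order of first appearance, without NA
people <- unique(unlist(df))
people <- people[!is.na(people)]

# Logical matrix: one row per name, one column per year
present <- sapply(df, function(x) people %in% x)
rownames(present) <- people

# ifelse() keeps the dimensions and dimnames of the logical matrix,
# so the result converts cleanly to a data frame
res <- as.data.frame(ifelse(present, "YES", "NO"))
res
```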

To offer another option, first we can extract the unique names from df using a nested for loop. We test whether the name is already in our list, and further test whether we're looking at an NA.

people<-c()
for (i in 1:length(colnames(df))){
  for (j in 1:length(df[,1])){
    pers<-df[j,i]
    if (!(pers %in% people)){
      if (!is.na(pers)){
        people<-c(people,toString(pers))
      }
    }
  }
}

From here, we can iterate a simple %in% check over each year and combine the results into a full dataframe. The above answers are probably more straightforward, but I've found code like this useful if you need to make other small changes to the data as it passes through the script.

for (i in 1:length(colnames(df))){
  colname<-colnames(df)[i]
  peoplein<-people %in% df[,i]
  if (i == 1){
    df1<-cbind(people,peoplein)
    colnames(df1)[i+1]<-colname
  } else {
    df1<-cbind(df1,peoplein)
    colnames(df1)[i+1]<-colname
  }
}

The resulting df1 is shown below.

     people    year_1  year_2  year_3 
[1,] "James"   "TRUE"  "TRUE"  "FALSE"
[2,] "Mike"    "TRUE"  "FALSE" "FALSE"
[3,] "Jane"    "TRUE"  "FALSE" "FALSE"
[4,] "Evelyn"  "FALSE" "TRUE"  "FALSE"
[5,] "Jackson" "FALSE" "TRUE"  "FALSE"
[6,] "Avery"   "FALSE" "TRUE"  "TRUE" 
[7,] "Harper"  "FALSE" "FALSE" "TRUE" 
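
Note that cbind() on a character vector and a logical vector produces a character matrix, which is why TRUE and FALSE appear in quotes above. If real logical columns are wanted, one possibility is to build a data frame column by column instead. This is a sketch under the assumption that df is the data frame from the question; people is built here in a shorter way than the nested loop, but holds the same names in the same order.

```r
# Rebuild the example data from the question
year_1 <- c('James', 'Mike', 'Jane', NA)
year_2 <- c('Evelyn', 'Jackson', 'James', 'Avery')
year_3 <- c('Harper', 'Avery', NA, NA)
df <- data.frame(year_1, year_2, year_3, stringsAsFactors = FALSE)

# Unique names in order of first appearance, without NA
people <- unique(unlist(df))
people <- people[!is.na(people)]

# Start from a one-column data frame and add one logical column per year
df1 <- data.frame(people = people, stringsAsFactors = FALSE)
for (colname in colnames(df)) {
  df1[[colname]] <- people %in% df[[colname]]
}
df1
```

Because each column keeps its own type, the year columns stay logical and can be used directly for filtering or counting, for example colSums(df1[-1]).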
