R: split variables from data frame and find unique ones

I have a tibble with 28 rows:

> al
# A tibble: 28 x 1
   lang_name                                               
   <chr>                                                   
 1 Objective-C,Swift,Other                                 
 2 Ruby,Shell                                              
 3 Ruby,HTML,Shell                                         
 4 Java,HTML,Kotlin,Other                                  
 5 TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML         
 6 Vue,JavaScript,CSS,HTML                                 
 7 HTML,JavaScript,CSS                                     
 8 JavaScript,HTML,CSS,Other                               
 9 NA                                                      
10 Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other
# ... with 18 more rows

Which I got by subsetting another data frame with al <- gh[,'lang_name'] . I want to extract the data from every row and place it all in a single list, so I can find the unique values.

How do I do that?

I have tried splitting with al <- str_split(al, ",") , but it returns the following list:

[[1]]
  [1] "c(\"Objective-C"  "Swift"            "Other\""          " \"Ruby"         
  [5] "Shell\""          " \"Ruby"          "HTML"             "Shell\""         
  [9] " \"Java"          "HTML"             "Kotlin"           "Other\""         
 [13] " \"TypeScript"    "JavaScript"       "CSS"              "Inno Setup"      
 [17] "Shell"            "HTML\""           " \"Vue"           "JavaScript"      
 [21] "CSS"              "HTML\""           " \"HTML"          "JavaScript"      
 [25] "CSS\""            " \"JavaScript"    "HTML"             "CSS"             
 [29] "Other\""          " NA"              " \"Vim script"    "Ruby"            
 [33] "Shell"            "Python"           "CoffeeScript"     "Makefile"        
 [37] "Other\""          " \"PHP\""         " \"JavaScript"    "TypeScript"      
 [41] "Other\""          " \"JavaScript"    "Other\""          " \"JavaScript"   
 [45] "CSS"              "Shell\""          " \"Ruby"          "JavaScript"      
 [49] "HTML"             "Vue"              "CSS"              "Shell\""         
 [53] " \"Go"            "Assembly"         "HTML"             "C"               
 [57] "Shell"            "Perl\""           " \"Go"            "HCL"             
 [61] "Other\""          " \"JavaScript\""  " \"C++"           "JavaScript"      
 [65] "Python"           "Go"               "Shell"            "C\""             
 [69] " \n\"JavaScript"  "CSS"              "HTML"             "Other\""         
 [73] " \"C++"           "Cuda"             "C"                "CMake"           
 [77] "Java"             "Python"           "Other\""          " \"JavaScript"   
 [81] "GLSL\""           " \"JavaScript"    "TypeScript"       "CSS\""           
 [85] " \"Kotlin"        "C"                "Makefile"         "HTML"            
 [89] "C++"              "Java"             "Other\""          " \"Java"         
 [93] "Other\""          " \"Python"        "Jupyter Notebook" "C++"             
 [97] "HTML"             "Shell"            "JavaScript\""     " \"CSS"          
[101] "JavaScript"       "HTML"             "Other\""          " \"HTML"         
[105] "CSS"              "JavaScript\")"   

And unique(al) simply returns the same list.

I have also tried pasting everything into a single character string:

al <- gh[1,'lang_name']
i = 2
# note: i starts at 2 but row i + 1 is read, so row 2 is never appended
while(i < nrow(gh)) {
    al <- paste(al, ",", gh[i+1,'lang_name'])
    i = i + 1
}

Which results in the following character string:

[1] "Objective-C,Swift,Other , Ruby,HTML,Shell , Java,HTML,Kotlin,Other , TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML , Vue,JavaScript,CSS,HTML , HTML,JavaScript,CSS , JavaScript,HTML,CSS,Other , NA , Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other , PHP , JavaScript,TypeScript,Other , JavaScript,Other , JavaScript,CSS,Shell , Ruby,JavaScript,HTML,Vue,CSS,Shell , Go,Assembly,HTML,C,Shell,Perl , Go,HCL,Other , JavaScript , C++,JavaScript,Python,Go,Shell,C , JavaScript,CSS,HTML,Other , C++,Cuda,C,CMake,Java,Python,Other , JavaScript,GLSL , JavaScript,TypeScript,CSS , Kotlin,C,Makefile,HTML,C++,Java,Other , Java,Other , Python,Jupyter Notebook,C++,HTML,Shell,JavaScript , CSS,JavaScript,HTML,Other , HTML,CSS,JavaScript"

I don't know how to convert this into a vector of separate strings to run unique on.

I hope this gives you what you want:

library(tibble)

al <- tibble(lang_name=
c("Objective-C,Swift,Other",                                 
"Ruby,Shell",                                              
"Ruby,HTML,Shell",                                         
"Java,HTML,Kotlin,Other",                          
"TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML",         
"Vue,JavaScript,CSS,HTML",                                 
"HTML,JavaScript,CSS",                                     
"JavaScript,HTML,CSS,Other",                               
NA,                                                      
"Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other"))

l1 <- strsplit(al$lang_name,",")
l1

# [[1]]
# [1] "Objective-C" "Swift"       "Other"      
# 
# [[2]]
# [1] "Ruby"  "Shell"
# 
# [[3]]
# [1] "Ruby"  "HTML"  "Shell"
# 
# [[4]]
# [1] "Java"   "HTML"   "Kotlin" "Other" 
# 
# [[5]]
# [1] "TypeScript" "JavaScript" "CSS"        "Inno Setup" "Shell"      "HTML"      
# 
# [[6]]
# [1] "Vue"        "JavaScript" "CSS"        "HTML"      
# 
# [[7]]
# [1] "HTML"       "JavaScript" "CSS"       
# 
# [[8]]
# [1] "JavaScript" "HTML"       "CSS"        "Other"     
# 
# [[9]]
# [1] NA
# 
# [[10]]
# [1] "Vim script"   "Ruby"         "Shell"        "Python"       "CoffeeScript" "Makefile"     "Other"  

l2 <- unlist(l1)
l2
# [1] "Objective-C"  "Swift"        "Other"        "Ruby"         "Shell"        "Ruby"         "HTML"         "Shell"       
# [9] "Java"         "HTML"         "Kotlin"       "Other"        "TypeScript"   "JavaScript"   "CSS"          "Inno Setup"  
# [17] "Shell"        "HTML"         "Vue"          "JavaScript"   "CSS"          "HTML"         "HTML"         "JavaScript"  
# [25] "CSS"          "JavaScript"   "HTML"         "CSS"          "Other"        NA             "Vim script"   "Ruby"        
# [33] "Shell"        "Python"       "CoffeeScript" "Makefile"     "Other" 

l3 <- unique(l2)
l3

# [1] "Objective-C"  "Swift"        "Other"        "Ruby"         "Shell"        "HTML"         "Java"         "Kotlin"      
# [9] "TypeScript"   "JavaScript"   "CSS"          "Inno Setup"   "Vue"          NA             "Vim script"   "Python"      
# [17] "CoffeeScript" "Makefile"
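
A note on why the attempt in the question went wrong, as far as I can tell from the output: the whole tibble was passed to the splitting function rather than the column inside it, so the column appears to have been coerced to a single deparsed string (hence the leading c(\" and the stray quotes). The sketch below reproduces that coercion with a base data.frame and then shows the column-based fix, with the NA dropped at the end. The three-row df is a hypothetical stand-in for the real data.

```r
# Hypothetical three-row stand-in for the `lang_name` column in the question.
df <- data.frame(
  lang_name = c("Ruby,Shell", "Ruby,HTML,Shell", NA),
  stringsAsFactors = FALSE
)

# Coercing the whole data frame to character deparses the column
# into ONE string that starts with c(" ... which is not what we want.
whole <- as.character(df)
print(whole)

# Extracting the column gives a plain character vector, one element per row.
col <- df$lang_name

# Split each row on commas, flatten, de-duplicate,
# then drop the NA that came from the empty row.
langs <- unique(unlist(strsplit(col, ",")))
langs <- langs[!is.na(langs)]
print(langs)
```

With the real data, al$lang_name (or gh$lang_name) would take the place of df$lang_name.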

If you like tidyverse / purrr functions, you can do this in one piped step. stringr::str_split is a convenient wrapper around stringi::stri_split . purrr::reduce lets you apply a function, in this case c , repeatedly until the entire list of vectors returned by str_split has been reduced into one character vector. unlist from base R also works well in place of reduce . I have very purrr-focused habits with tasks like this, but that doesn't need to be the default for a simple task.

library(tidyverse)

al$lang_name %>%
  str_split(",") %>%
  reduce(c) %>%
  unique()
#>  [1] "Objective-C"  "Swift"        "Other"        "Ruby"        
#>  [5] "Shell"        "HTML"         "Java"         "Kotlin"      
#>  [9] "TypeScript"   "JavaScript"   "CSS"          "Inno Setup"  
#> [13] "Vue"          NA             "Vim script"   "Python"      
#> [17] "CoffeeScript" "Makefile"

Created on 2018-06-03 by the reprex package (v0.2.0).
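
To illustrate the point about reduce and unlist being interchangeable here, this is a small sketch that uses base R's Reduce() as a stand-in for purrr::reduce() (so it runs without any packages). The parts list is a hypothetical example of what splitting three rows on commas might return, not the real data.

```r
# Hypothetical result of splitting three rows on commas (the third row was NA).
parts <- list(c("Ruby", "Shell"), c("Ruby", "HTML", "Shell"), NA_character_)

# Reduce(c, ...) folds the list pairwise: c(c(parts[[1]], parts[[2]]), parts[[3]]).
# Base Reduce() is used here in place of purrr::reduce().
folded <- Reduce(c, parts)

# unlist() flattens the list in a single call.
flat <- unlist(parts)

# For an unnamed list of character vectors the two results are identical.
same <- identical(folded, flat)
print(same)

# unique() keeps NA as a value of its own, as in the outputs above.
print(unique(flat))
```

For a flat list of character vectors like this one, unlist() is the simpler choice; the fold is mainly useful when the combining function is something other than c.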
