R: split variables from data frame and find unique ones
I have a tibble with 28 rows:
> al
# A tibble: 28 x 1
lang_name
<chr>
1 Objective-C,Swift,Other
2 Ruby,Shell
3 Ruby,HTML,Shell
4 Java,HTML,Kotlin,Other
5 TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML
6 Vue,JavaScript,CSS,HTML
7 HTML,JavaScript,CSS
8 JavaScript,HTML,CSS,Other
9 NA
10 Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other
# ... with 18 more rows
Which I got by slicing another data frame with al <- gh[,'lang_name'].
I want to extract the data from every row and place it all in a single list, so I can find the unique values.
How do I do that?
I have tried splitting with al <- str_split(al, ","), but it returns the following list:
[[1]]
[1] "c(\"Objective-C" "Swift" "Other\"" " \"Ruby"
[5] "Shell\"" " \"Ruby" "HTML" "Shell\""
[9] " \"Java" "HTML" "Kotlin" "Other\""
[13] " \"TypeScript" "JavaScript" "CSS" "Inno Setup"
[17] "Shell" "HTML\"" " \"Vue" "JavaScript"
[21] "CSS" "HTML\"" " \"HTML" "JavaScript"
[25] "CSS\"" " \"JavaScript" "HTML" "CSS"
[29] "Other\"" " NA" " \"Vim script" "Ruby"
[33] "Shell" "Python" "CoffeeScript" "Makefile"
[37] "Other\"" " \"PHP\"" " \"JavaScript" "TypeScript"
[41] "Other\"" " \"JavaScript" "Other\"" " \"JavaScript"
[45] "CSS" "Shell\"" " \"Ruby" "JavaScript"
[49] "HTML" "Vue" "CSS" "Shell\""
[53] " \"Go" "Assembly" "HTML" "C"
[57] "Shell" "Perl\"" " \"Go" "HCL"
[61] "Other\"" " \"JavaScript\"" " \"C++" "JavaScript"
[65] "Python" "Go" "Shell" "C\""
[69] " \n\"JavaScript" "CSS" "HTML" "Other\""
[73] " \"C++" "Cuda" "C" "CMake"
[77] "Java" "Python" "Other\"" " \"JavaScript"
[81] "GLSL\"" " \"JavaScript" "TypeScript" "CSS\""
[85] " \"Kotlin" "C" "Makefile" "HTML"
[89] "C++" "Java" "Other\"" " \"Java"
[93] "Other\"" " \"Python" "Jupyter Notebook" "C++"
[97] "HTML" "Shell" "JavaScript\"" " \"CSS"
[101] "JavaScript" "HTML" "Other\"" " \"HTML"
[105] "CSS" "JavaScript\")"
And unique(al) simply returns the same string.
I have also tried to paste it all into a single character string:
al <- gh[1,'lang_name']
i = 2
while(i < nrow(gh)) {
al <- paste(al, ",", gh[i+1,'lang_name'])
i = i + 1
}
Which results in the following character string:
[1] "Objective-C,Swift,Other , Ruby,HTML,Shell , Java,HTML,Kotlin,Other , TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML , Vue,JavaScript,CSS,HTML , HTML,JavaScript,CSS , JavaScript,HTML,CSS,Other , NA , Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other , PHP , JavaScript,TypeScript,Other , JavaScript,Other , JavaScript,CSS,Shell , Ruby,JavaScript,HTML,Vue,CSS,Shell , Go,Assembly,HTML,C,Shell,Perl , Go,HCL,Other , JavaScript , C++,JavaScript,Python,Go,Shell,C , JavaScript,CSS,HTML,Other , C++,Cuda,C,CMake,Java,Python,Other , JavaScript,GLSL , JavaScript,TypeScript,CSS , Kotlin,C,Makefile,HTML,C++,Java,Other , Java,Other , Python,Jupyter Notebook,C++,HTML,Shell,JavaScript , CSS,JavaScript,HTML,Other , HTML,CSS,JavaScript"
Which I don't know how to split into separate strings to run unique on.
I hope this gives you what you want:
library(tibble)
al <- tibble(lang_name=
c("Objective-C,Swift,Other",
"Ruby,Shell",
"Ruby,HTML,Shell",
"Java,HTML,Kotlin,Other",
"TypeScript,JavaScript,CSS,Inno Setup,Shell,HTML",
"Vue,JavaScript,CSS,HTML",
"HTML,JavaScript,CSS",
"JavaScript,HTML,CSS,Other",
NA,
"Vim script,Ruby,Shell,Python,CoffeeScript,Makefile,Other"))
l1 <- strsplit(al$lang_name,",")
l1
# [[1]]
# [1] "Objective-C" "Swift" "Other"
#
# [[2]]
# [1] "Ruby" "Shell"
#
# [[3]]
# [1] "Ruby" "HTML" "Shell"
#
# [[4]]
# [1] "Java" "HTML" "Kotlin" "Other"
#
# [[5]]
# [1] "TypeScript" "JavaScript" "CSS" "Inno Setup" "Shell" "HTML"
#
# [[6]]
# [1] "Vue" "JavaScript" "CSS" "HTML"
#
# [[7]]
# [1] "HTML" "JavaScript" "CSS"
#
# [[8]]
# [1] "JavaScript" "HTML" "CSS" "Other"
#
# [[9]]
# [1] NA
#
# [[10]]
# [1] "Vim script" "Ruby" "Shell" "Python" "CoffeeScript" "Makefile" "Other"
l2 <- unlist(l1)
l2
# [1] "Objective-C" "Swift" "Other" "Ruby" "Shell" "Ruby" "HTML" "Shell"
# [9] "Java" "HTML" "Kotlin" "Other" "TypeScript" "JavaScript" "CSS" "Inno Setup"
# [17] "Shell" "HTML" "Vue" "JavaScript" "CSS" "HTML" "HTML" "JavaScript"
# [25] "CSS" "JavaScript" "HTML" "CSS" "Other" NA "Vim script" "Ruby"
# [33] "Shell" "Python" "CoffeeScript" "Makefile" "Other"
l3 <- unique(l2)
l3
# [1] "Objective-C" "Swift" "Other" "Ruby" "Shell" "HTML" "Java" "Kotlin"
# [9] "TypeScript" "JavaScript" "CSS" "Inno Setup" "Vue" NA "Vim script" "Python"
# [17] "CoffeeScript" "Makefile"
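A note on why the strsplit call above works where the original str_split(al, ",") attempt did not: the answer splits al$lang_name, a character vector, rather than the one-column tibble itself. When a data frame is coerced to character, each column is deparsed into a single string of the form c("...", "..."), which is where the stray c(\" and \" fragments in the question's output come from. The sketch below uses a small made-up data frame (df and its two rows are illustrative, not the asker's data) and base R only.

```r
# Hypothetical two-row data frame standing in for the asker's tibble
df <- data.frame(lang_name = c("Ruby,Shell", "Ruby,HTML,Shell"),
                 stringsAsFactors = FALSE)

# Coercing the whole data frame deparses the column into ONE string,
# so any later split operates on text that starts with 'c("'
collapsed <- as.character(df)
length(collapsed)
#> [1] 1

# Extracting the column first gives a plain character vector,
# which splits row by row as intended
langs <- unique(unlist(strsplit(df$lang_name, ",")))
langs
#> [1] "Ruby"  "Shell" "HTML"
```

df[["lang_name"]] would work equally well as df$lang_name here; the point is only that the splitting function should receive the column, not the table.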
If you like tidyverse / purrr functions, you can do this in one piped step. stringr::str_split is a convenient wrapper around stringi::stri_split. purrr::reduce lets you apply a function, in this case c, repeatedly until the entire list of vectors returned by str_split is reduced into one character vector. unlist from base R also works well in place of reduce; I have very purrr-focused habits with tasks like this, but that doesn't need to be the default for a simple task.
library(tidyverse)
al$lang_name %>%
str_split(",") %>%
reduce(c) %>%
unique()
#> [1] "Objective-C" "Swift" "Other" "Ruby"
#> [5] "Shell" "HTML" "Java" "Kotlin"
#> [9] "TypeScript" "JavaScript" "CSS" "Inno Setup"
#> [13] "Vue" NA "Vim script" "Python"
#> [17] "CoffeeScript" "Makefile"
Created on 2018-06-03 by the reprex package (v0.2.0).
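As a rough illustration of the unlist alternative mentioned in this answer, the sketch below checks that reduce(c) and unlist produce the same vector on a small made-up input, then drops the NA entry and trims stray whitespace before calling unique. The langs vector, including the space before HTML, is an assumption added for the example and is not taken from the asker's data.

```r
library(stringr)
library(purrr)

# Hypothetical input: one NA row and one entry with a space after the comma
langs <- c("Ruby,Shell", NA, "Ruby, HTML,Shell")

pieces <- str_split(langs, ",")   # list with one character vector per row

via_reduce <- reduce(pieces, c)   # concatenate the list elements pairwise with c()
via_unlist <- unlist(pieces)      # base R: flatten the list in one call
same <- identical(via_reduce, via_unlist)

# Optional cleanup: trim whitespace, then drop missing values
cleaned <- str_trim(via_unlist)
cleaned <- cleaned[!is.na(cleaned)]
result <- unique(cleaned)
result
#> [1] "Ruby"  "Shell" "HTML"
```

Whether NA should be kept depends on what the unique values are used for; the results in both answers keep it.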