
Split String into factor using R

I sent out a fun questionnaire to our office to get some data for putting together a workflow for handling questionnaires in future. Some of the questions had textual input, and the responses were comma-separated lists. The data were collected using a Google form, so they ended up in a spreadsheet. I'm linking directly to this spreadsheet to get the data into R, so I'd prefer not to do any more pre-processing on the data than I have to.

Because the CSV coming into R is comma separated too, I swap the commas for pipes ('|'). I'd like to make bar charts out of the responses to questions like "what's your favorite piece of industrial design", but lots of people have said things like "iPhone, coke bottle". This comes up for me as a single bar labeled iPhone|coke bottle.

I'd like to split it up so that the iPhone part contributes to the iPhone bar, and so on. In other languages I'd concatenate the whole list with a pipe separator, split it again on the pipes, and then work with that new list. I'm stuck trying this approach in R; is it the right way to go, or is there a more idiomatic R way to do it?

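a <- BVNdhData$Pets                  # column of pipe-separated responses
b <- paste(a, collapse = "|")        # join every response into one string
c <- strsplit(b, "|", fixed = TRUE)  # split on literal pipes; returns a list of length 1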

That all works, but leaves me with a list that I have no idea what to do with.

If you call unlist() on the results of strsplit() you get a single character vector with all of the components of your text:

text <- c("cake|pie|sausage roll", "scotch egg|pie")
x <- unlist(strsplit(text, "\\|"))
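Because the original answers were typed as "iPhone, coke bottle", swapping commas for pipes may leave a leading space on the second item. A minimal sketch of handling that, assuming hypothetical responses (the `responses` vector below is made up for illustration and is not the questionnaire data): strsplit() is vectorised, so the column can be split directly without paste(), and trimws() removes the stray whitespace.

```r
# Hypothetical responses; the leading spaces mimic "iPhone, coke bottle"
# after the commas have been swapped for pipes.
responses <- c("iPhone| coke bottle", "coke bottle", "iPhone| Anglepoise lamp")

# strsplit() works on the whole vector, so no paste(collapse = "|") is needed.
# fixed = TRUE treats "|" as a literal character rather than a regex.
pieces <- unlist(strsplit(responses, "|", fixed = TRUE))

# Trim surrounding whitespace so " coke bottle" and "coke bottle" match.
items <- trimws(pieces)
items
```

The resulting `items` vector can be tabulated in the same way as `x`.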

Use table() to tabulate the entries:

table(x)

x
        cake          pie sausage roll   scotch egg 
           1            2            1            1 

Then coerce it to a data frame...

dat <- as.data.frame(table(x))
dat


             x Freq
1         cake    1
2          pie    2
3 sausage roll    1
4   scotch egg    1

... and plot:

library(ggplot2)
ggplot(dat, aes(x, Freq)) + geom_point()
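Since the question asks for bar charts rather than points, here is a sketch of one possible variant using the same example data. It assumes ggplot2 is installed and recent enough to provide geom_col(); ordering the bars by count with reorder() is an optional extra, not something the original answer does.

```r
library(ggplot2)

# Same example data as above.
text <- c("cake|pie|sausage roll", "scotch egg|pie")
x <- unlist(strsplit(text, "|", fixed = TRUE))
dat <- as.data.frame(table(x))

# Reorder the factor levels by descending count so the tallest bar comes first.
dat$x <- reorder(dat$x, -dat$Freq)

# geom_col() draws one bar per row, with the bar height taken from Freq.
p <- ggplot(dat, aes(x, Freq)) + geom_col()
print(p)
```

Dropping the reorder() line keeps the bars in alphabetical order, which is the default for factor levels.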
