
R: split multiple value/key pairs in data.frame field

I've got a data.frame that contains a field like this:

:6:Description_C
:3:Description_A:2:Description_B:1:Description_C
:2:Description_C:1:Description_B:1:Description_A:1:Description_D:1:Description_E
:3:Description_B:3:Description_A

The number in front, surrounded by colons, is the number of times (out of a total of 6) that the Description is seen in that entry of the data.frame. An entry of :6:Description_X means that all 6 counts go to that description; otherwise the 6 counts are split among several descriptions, listed one after another.
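A minimal sketch of reading one entry as key/value pairs, assuming every entry is a repetition of `:count:description` and that the counts in one entry add up to 6. `parse_entry` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: turn one entry such as
# ":3:Description_A:2:Description_B:1:Description_C"
# into a named numeric vector (description -> count).
parse_entry <- function(s) {
  # Split on ":" and drop the empty string before the leading colon
  parts <- strsplit(s, ":", fixed = TRUE)[[1]][-1]
  counts <- as.numeric(parts[c(TRUE, FALSE)])  # odd positions hold the counts
  names(counts) <- parts[c(FALSE, TRUE)]       # even positions hold the descriptions
  counts
}

x <- parse_entry(":3:Description_A:2:Description_B:1:Description_C")
print(x)

# Sanity check: every entry should account for all 6 counts
stopifnot(sum(x) == 6)
```

The logical-index recycling (`c(TRUE, FALSE)`) picks alternate elements without computing `seq()` bounds.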

I would like to turn this field into a key/value hash of the number of counts for each description, so that I can then make a barplot of the total proportions across all counts, and also plot these proportions in combination with the other factors in the data.frame.
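One way to get both plots is a "long" table with one row per (entry, description) pair, which keeps the link back to the other columns. The sketch below assumes a hypothetical extra factor column `group` standing in for the data.frame's other columns; the column names `row`, `description` and `count` are likewise illustrative:

```r
# Toy data: the four entries from the question plus a HYPOTHETICAL factor `group`
df <- data.frame(
  group = c("a", "a", "b", "b"),
  field = c(":6:Description_C",
            ":3:Description_A:2:Description_B:1:Description_C",
            ":2:Description_C:1:Description_B:1:Description_A:1:Description_D:1:Description_E",
            ":3:Description_B:3:Description_A"),
  stringsAsFactors = FALSE
)

# Split each field on ":" and drop the leading empty string
tokens <- lapply(strsplit(df$field, ":", fixed = TRUE), `[`, -1)

# Long format: one row per (entry, description) pair; `row` indexes back into df
long <- do.call(rbind, lapply(seq_along(tokens), function(i) {
  tk <- tokens[[i]]
  data.frame(row = i,
             description = tk[c(FALSE, TRUE)],
             count = as.numeric(tk[c(TRUE, FALSE)]),
             stringsAsFactors = FALSE)
}))
long$group <- df$group[long$row]  # carry the other factor(s) along

# Total proportion of counts per description, across all entries
totals <- tapply(long$count, long$description, sum)
props  <- totals / sum(totals)
barplot(props, las = 2)

# Proportions within each level of the other factor (each column sums to 1)
by_group <- prop.table(xtabs(count ~ description + group, data = long), margin = 2)
barplot(by_group, beside = TRUE, legend.text = TRUE)
```

Keeping the data long means any other column can be joined in through `row` and used in `xtabs` or a plotting formula.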

EDIT: Looking at the documentation for colsplit, I suspect the answer will be that I need a new column for each description, since I only have about 8 descriptions in total. Still, I haven't figured out how to do it.
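A sketch of the one-column-per-description layout in base R. As far as I know, `colsplit` (from reshape/reshape2) splits a column into a fixed set of named pieces, which does not fit entries with a variable number of pairs, so this builds the columns directly instead. It assumes the same toy data as the question, with a hypothetical `group` column; descriptions missing from an entry get a count of 0:

```r
df <- data.frame(
  group = c("a", "a", "b", "b"),  # HYPOTHETICAL stand-in for other columns
  field = c(":6:Description_C",
            ":3:Description_A:2:Description_B:1:Description_C",
            ":2:Description_C:1:Description_B:1:Description_A:1:Description_D:1:Description_E",
            ":3:Description_B:3:Description_A"),
  stringsAsFactors = FALSE
)

# Named count vector per entry (description -> count)
tokens <- lapply(strsplit(df$field, ":", fixed = TRUE), `[`, -1)
counts <- lapply(tokens, function(tk)
  setNames(as.numeric(tk[c(TRUE, FALSE)]), tk[c(FALSE, TRUE)]))

# Every description seen anywhere becomes a column
all_desc <- sort(unique(unlist(lapply(counts, names))))

# vapply gives one column per entry; t() turns that into one row per entry
wide <- t(vapply(counts, function(cn) {
  out <- setNames(numeric(length(all_desc)), all_desc)  # zeros by default
  out[names(cn)] <- cn
  out
}, numeric(length(all_desc))))

# Attach the new columns to the original data.frame
df_wide <- cbind(df, as.data.frame(wide))
print(df_wide)
```

Dividing `wide` by 6 (or by `rowSums(wide)`) turns the counts into per-entry proportions.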

How can I do that in R?

I'm not sure what structure you wanted for the "key:value hash", but this will extract the strings and their associated numeric counts:

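inp <- readLines(textConnection(
":6:Description_C
:3:Description_A:2:Description_B:1:Description_C
:2:Description_C:1:Description_B:1:Description_A:1:Description_D:1:Description_E
:3:Description_B:3:Description_A"))

 # Split on ":" and drop the leading empty strings.
 # lapply (rather than sapply) always returns a list, even if all entries had equal length.
 inp2 <- lapply(strsplit(inp, ":"), "[", -1)
 # Odd positions are the counts, even positions are the descriptions
 reps   <- lapply(inp2, function(x) as.numeric(x[seq(1, length(x), by = 2)]))
 values <- lapply(inp2, function(x) x[seq(2, length(x), by = 2)])

# One barplot per entry, with bars labelled by description.
# Probably needs more work, but this demonstrates feasibility.
invisible(Map(function(r, v) barplot(setNames(r, v)), reps, values))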

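Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.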
