Creating values in separate columns that are dependent on different substrings

I have the following data frame in R after using melt on some wide-format data:

Condition value
C1SSC     4.5
C2SSC     7.7
TC1SSC    6.0
TC2SSC    7.3
PC1SSC    4.5
PC2SSC    5.7

Each character or substring has a specific meaning. For instance, TC2SSC means a condition where a textured [T] circle [C] was viewed with both eyes [2], and the 'starting shape' response was a circle [SSC].

What I want to do is generate new variable columns that depend on these characters and substrings: one for texture, one for shape, and so on. I thought about using grepl or substr, but I'm not sure whether these can evaluate specific parts of a string (i.e. when ascertaining shape, checking the first two characters to see if they contain a 'C').

Ideally, this is what I'd end up with (example for TC2SSC):

Texture    Shape    View    startShape    value
T          Circle   2       Circle        7.3

There are a lot of useful functions, but I'm not sure which is the best to use here. Any advice would be much appreciated.

Here's a straightforward way to approach the problem. Basically, use gsub with a pattern to wrap every token you want to split out in a delimiter (here "_"), and then call strsplit on the result. A condition without a texture prefix starts with the delimiter, so it yields an empty string in the first column. Here's how:

# df is the melted data frame from the question
split.df <- data.frame(
  do.call(rbind, strsplit(gsub("(C|SSC|[0-9]+)", "_\\1_", df$Condition), "[_]+")),
  stringsAsFactors = FALSE
)

#   X1 X2 X3  X4
# 1     C  1 SSC
# 2     C  2 SSC
# 3  T  C  1 SSC
# 4  T  C  2 SSC
# 5  P  C  1 SSC
# 6  P  C  2 SSC
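
To make the two steps easier to follow, here is a minimal sketch that runs them separately on the sample data. The data frame is rebuilt by hand from the table in the question, and the names marked and pieces are only illustrative.

```r
# Sample data rebuilt from the question (assumed to match the melted frame)
df <- data.frame(
  Condition = c("C1SSC", "C2SSC", "TC1SSC", "TC2SSC", "PC1SSC", "PC2SSC"),
  value     = c(4.5, 7.7, 6.0, 7.3, 4.5, 5.7),
  stringsAsFactors = FALSE
)

# Step 1: wrap each token (C, SSC, or a run of digits) in underscores
marked <- gsub("(C|SSC|[0-9]+)", "_\\1_", df$Condition)
marked[c(1, 4)]
# [1] "_C__1__SSC_" "T_C__2__SSC_"

# Step 2: split on runs of underscores; a leading underscore yields ""
pieces <- strsplit(marked, "[_]+")
pieces[[1]]
# [1] ""    "C"   "1"   "SSC"
pieces[[4]]
# [1] "T"   "C"   "2"   "SSC"
```

Because every condition ends up with exactly four pieces, do.call(rbind, ...) can stack them into a four-column matrix without recycling.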

Now the rest is pretty straightforward: change the names, convert the classes, and replace the codes with labels (for example, C with Circle).

names(split.df) <- c("Texture", "Shape", "View", "startShape")
split.df <- within(split.df, {
  Shape[Shape == "C"] <- "Circle"              # shape code to label
  View <- as.numeric(View)                     # number of eyes as numeric
  startShape[startShape == "SSC"] <- "Circle"  # response code to label
})
cbind(split.df, value = df$value)

#   Texture  Shape View startShape value
# 1         Circle    1     Circle   4.5
# 2         Circle    2     Circle   7.7
# 3       T Circle    1     Circle   6.0
# 4       T Circle    2     Circle   7.3
# 5       P Circle    1     Circle   4.5
# 6       P Circle    2     Circle   5.7
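
As a possible alternative, the same columns can be pulled out with capture groups using regexec and regmatches, which avoids the intermediate delimiter. This is only a sketch: the pattern assumes the texture prefix is either T, P, or absent and that every condition has the form seen in the sample data, and the names pattern, parts, result and has_c are made up for illustration. The last line also shows that grepl can be applied to a specific part of a string via substr, as asked in the question.

```r
# Sample data rebuilt from the question (assumed to match the melted frame)
df <- data.frame(
  Condition = c("C1SSC", "C2SSC", "TC1SSC", "TC2SSC", "PC1SSC", "PC2SSC"),
  value     = c(4.5, 7.7, 6.0, 7.3, 4.5, 5.7),
  stringsAsFactors = FALSE
)

# One capture group per component; the texture prefix is optional
pattern <- "^([TP]?)(C)([0-9]+)(SSC)$"
matches <- regmatches(df$Condition, regexec(pattern, df$Condition))

# Each element holds the full match followed by the four groups
parts <- do.call(rbind, matches)

result <- data.frame(
  Texture    = parts[, 2],
  Shape      = ifelse(parts[, 3] == "C", "Circle", NA_character_),
  View       = as.numeric(parts[, 4]),
  startShape = ifelse(parts[, 5] == "SSC", "Circle", NA_character_),
  value      = df$value,
  stringsAsFactors = FALSE
)
result

# grepl restricted to the first two characters of each condition
has_c <- grepl("C", substr(df$Condition, 1, 2), fixed = TRUE)
```

A condition that does not fit the pattern would produce a zero-length match, so on messier data it is worth checking that every element of matches has length 5 before calling rbind.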
