How to automate hierarchical grouping of variables based on variable name
I have my variables named in little-endian fashion, separated by periods.
I'd like to create index variables for each level and get summary output for the variables at each level, but I'm stuck at the first step: breaking the variable names apart and putting them in a table so I can start working with them.
Variable naming convention:
Example:
n <- 6
dat <- data.frame(
ph1.career_interest.delight.1.Friendly=sample(1:5, n, replace=TRUE),
ph1.career_interest.delight.2.Advantagious=sample(1:5, n, replace=TRUE),
ph1.career_interest.philosophy.1.Meaningful_Difference=sample(1:5, n, replace=TRUE),
ph1.career_interest.philosophy.2.Enable_Work=sample(1:5, n, replace=TRUE)
)
# create list of variable names
names <- as.list(colnames( dat ))
## Try to create a hierarchy of variables: Step 1: Create matrix
heir <- as.matrix(strsplit(names,".", fixed = TRUE))
I've gone through a couple of iterations, but it still returns an error:
Error in strsplit(names, ".", fixed = TRUE) : non-character argument
Instead of wrapping with as.list, use colnames directly, because according to ?strsplit, the input x should be
x - character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
Thus, if it is a list, it is not the input class that strsplit expects.
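A small check makes the type mismatch concrete. This is only a sketch: the names below are made up for illustration, and the tryCatch wrapper is there just to capture the error message instead of halting.

```r
# Hypothetical names, used only for illustration
nms_list <- as.list(c("a.b.c", "d.e.f"))

is.character(nms_list)          # a list is not a character vector
is.character(unlist(nms_list))  # unlist() recovers the character vector

# Capture the error message instead of stopping the script
msg <- tryCatch(
  strsplit(nms_list, ".", fixed = TRUE),
  error = function(e) conditionMessage(e)
)

# Splitting works once the input is a character vector again
parts <- strsplit(unlist(nms_list), ".", fixed = TRUE)
```

So if a list of names already exists, unlist() is one way to get back to something strsplit accepts.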
nm1 <- colnames(dat)
strsplit(nm1, ".", fixed = TRUE)
#[[1]]
#[1] "ph1" "career_interest" "delight" "1" "Friendly"
#[[2]]
#[1] "ph1" "career_interest" "delight" "2" "Advantagious"
#[[3]]
#[1] "ph1" "career_interest" "philosophy" "1" "Meaningful_Difference"
#[[4]]
#[1] "ph1" "career_interest" "philosophy" "2" "Enable_Work"
The output is a list of vectors. It is not clear from the OP's post what the expected output format is. If we need a matrix or data.frame, we can rbind those list elements (assuming they have the same length):
m1 <- do.call(rbind, strsplit(nm1, ".", fixed = TRUE))
This returns a matrix.
Or we can convert to a data.frame with rbind.data.frame.
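As a minimal sketch of the data.frame route, the matrix can also be passed to as.data.frame (swapped in here for rbind.data.frame because it gives predictable columns). The level names phase, domain, construct, item and label are assumptions made for illustration; the original post does not name the levels.

```r
# Column names copied from the example data
nm1 <- c(
  "ph1.career_interest.delight.1.Friendly",
  "ph1.career_interest.delight.2.Advantagious",
  "ph1.career_interest.philosophy.1.Meaningful_Difference",
  "ph1.career_interest.philosophy.2.Enable_Work"
)

# One row per variable, one column per level
m1 <- do.call(rbind, strsplit(nm1, ".", fixed = TRUE))
df1 <- as.data.frame(m1, stringsAsFactors = FALSE)

# Hypothetical level names; adjust to the real naming convention
names(df1) <- c("phase", "domain", "construct", "item", "label")

# Keep the full variable name so rows can be matched back to columns of dat
df1$variable <- nm1
```

Keeping the original name in a variable column makes it easy to look up which columns of dat belong to a given level.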
NOTE: names is a function name. It is better not to use function names as object names.
If the lengths are not the same, an option is to pad with NA at the end for the shorter elements:
lst1 <- strsplit(nm1, ".", fixed = TRUE)
lst1[[1]] <- lst1[[1]][1:3] # making lengths different
mx <- max(lengths(lst1))
do.call(rbind, lapply(lst1, `length<-`, mx))
# [,1] [,2] [,3] [,4] [,5]
#[1,] "ph1" "career_interest" "delight" NA NA
#[2,] "ph1" "career_interest" "delight" "2" "Advantagious"
#[3,] "ph1" "career_interest" "philosophy" "1" "Meaningful_Difference"
#[4,] "ph1" "career_interest" "philosophy" "2" "Enable_Work"
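Once the names are split, the levels can drive the grouping the question asks about. The following is a sketch under assumptions: it groups on the first three levels and uses a per-row mean as the summary, neither of which is specified in the post. set.seed is added only to make the random example data reproducible.

```r
set.seed(1)  # only for a reproducible example
n <- 6
dat <- data.frame(
  ph1.career_interest.delight.1.Friendly = sample(1:5, n, replace = TRUE),
  ph1.career_interest.delight.2.Advantagious = sample(1:5, n, replace = TRUE),
  ph1.career_interest.philosophy.1.Meaningful_Difference = sample(1:5, n, replace = TRUE),
  ph1.career_interest.philosophy.2.Enable_Work = sample(1:5, n, replace = TRUE)
)

# One row per variable, one column per level of the name
parts <- do.call(rbind, strsplit(colnames(dat), ".", fixed = TRUE))

# Group key built from the first three levels (assumed grouping depth)
key3 <- apply(parts[, 1:3, drop = FALSE], 1, paste, collapse = ".")

# Column names belonging to each group
groups <- split(colnames(dat), key3)

# Per-row mean of the columns in each group (assumed summary statistic)
level3_means <- sapply(groups, function(cols) rowMeans(dat[, cols, drop = FALSE]))
```

Changing 1:3 to 1:2 or 1:1 gives the same summary at a coarser level, so the pattern can be repeated for each depth of the hierarchy.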
You can count the number of '.' in the column names to determine how many new columns to create. We can then use tidyr::separate to divide the data into n new columns, splitting on '.'.
#Changing 1st column name to make length unequal
names(dat)[1] <- 'ph1.career_interest.delight.1'
#Number of new columns to be created
n <- max(stringr::str_count(names(dat), '\\.')) + 1
tidyr::separate(data.frame(name = names(dat)), name,
paste0('col', seq_len(n)), sep = '\\.', fill = 'right')
# col1 col2 col3 col4 col5
#1 ph1 career_interest delight 1 <NA>
#2 ph1 career_interest delight 2 Advantagious
#3 ph1 career_interest philosophy 1 Meaningful_Difference
#4 ph1 career_interest philosophy 2 Enable_Work