
How to automate hierarchical grouping of variables based on variable name

I have my variables named in little-endian fashion, separated by periods.

I'd like to create index variables for each level and get summary output for the variables at each level, but I'm stuck at the first step: breaking apart the variable names and putting them in a table so I can start working with them.

Variable naming convention:

  • Environment.Construct.Subconstruct_1.subconstruct_i.#.Short_Name

Example:

n <- 6
dat <- data.frame(
  ph1.career_interest.delight.1.Friendly=sample(1:5, n, replace=TRUE),
  ph1.career_interest.delight.2.Advantagious=sample(1:5, n, replace=TRUE),
  ph1.career_interest.philosophy.1.Meaningful_Difference=sample(1:5, n, replace=TRUE),
  ph1.career_interest.philosophy.2.Enable_Work=sample(1:5, n, replace=TRUE)
)

# create list of variable names
names <-  as.list(colnames( dat ))
## Try to create a hierarchy of variables. Step 1: create a matrix
heir <- as.matrix(strsplit(names,".", fixed = TRUE))

I've gone through a couple of iterations, but it still returns an error:

Error in strsplit(names, ".", fixed = TRUE) : non-character argument

Instead of wrapping with as.list, use the colnames directly. According to ?strsplit, the input x must be:

x - character vector, each element of which is to be split. Other inputs, including a factor, will give an error.

Thus, a list is not the expected input class for strsplit.

nm1 <- colnames(dat)
strsplit(nm1, ".", fixed = TRUE)
#[[1]]
#[1] "ph1"             "career_interest" "delight"         "1"               "Friendly"       

#[[2]]
#[1] "ph1"             "career_interest" "delight"         "2"               "Advantagious"   

#[[3]]
#[1] "ph1"                   "career_interest"       "philosophy"            "1"                     "Meaningful_Difference"

#[[4]]
#[1] "ph1"             "career_interest" "philosophy"      "2"               "Enable_Work"  

The output is a list of vectors. The expected output format is not clear from the OP's post. If a matrix or data.frame is needed, rbind the list elements (assuming they all have the same length):

m1 <- do.call(rbind, strsplit(nm1, ".", fixed = TRUE))

This returns a character matrix with one row per variable name.

Alternatively, convert to a data.frame with rbind.data.frame.
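As a minimal sketch of the data.frame route, the block below uses as.data.frame on the matrix (a swapped-in alternative to rbind.data.frame, which tends to produce awkward default column names). The level names environment, construct, subconstruct, item and short_name, and the object name key, are assumptions for illustration and do not come from the original post.

```r
# Column names from the question's example data
nm1 <- c(
  "ph1.career_interest.delight.1.Friendly",
  "ph1.career_interest.delight.2.Advantagious",
  "ph1.career_interest.philosophy.1.Meaningful_Difference",
  "ph1.career_interest.philosophy.2.Enable_Work"
)

# Split on literal periods and stack the pieces into a character matrix
m1 <- do.call(rbind, strsplit(nm1, ".", fixed = TRUE))

# Convert to a data.frame; the level names are illustrative assumptions
key <- as.data.frame(m1, stringsAsFactors = FALSE)
names(key) <- c("environment", "construct", "subconstruct", "item", "short_name")

# Keep the original column name so rows can be matched back to the data
key$variable <- nm1
```

Each row of key then describes one variable, so the table can be filtered or grouped by any level, for example key$variable[key$subconstruct == "delight"].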

NOTE: names is a function name. It is better not to use function names as object names.

Update

If the lengths are not the same, one option is to pad the shorter elements with NA at the end:

lst1 <- strsplit(nm1, ".", fixed = TRUE)
lst1[[1]] <- lst1[[1]][1:3] # making lengths different
mx  <- max(lengths(lst1))
do.call(rbind, lapply(lst1, `length<-`, mx))
#     [,1]  [,2]              [,3]         [,4] [,5]
#[1,] "ph1" "career_interest" "delight"    NA   NA                     
#[2,] "ph1" "career_interest" "delight"    "2"  "Advantagious"         
#[3,] "ph1" "career_interest" "philosophy" "1"  "Meaningful_Difference"
#[4,] "ph1" "career_interest" "philosophy" "2"  "Enable_Work"          

You can count the number of '.' characters in the column names to determine how many new columns to create. We can then use tidyr::separate to split the names into n new columns on '.', filling missing pieces on the right with NA.

#Changing 1st column name to make length unequal
names(dat)[1] <- 'ph1.career_interest.delight.1'
#Number of new columns to be created
n <- max(stringr::str_count(names(dat), '\\.')) + 1
tidyr::separate(data.frame(name = names(dat)), name, 
                paste0('col', seq_len(n)), sep = '\\.', fill = 'right')

#  col1            col2       col3 col4                  col5
#1  ph1 career_interest    delight    1                  <NA>
#2  ph1 career_interest    delight    2          Advantagious
#3  ph1 career_interest philosophy    1 Meaningful_Difference
#4  ph1 career_interest philosophy    2           Enable_Work
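Returning to the question's stated goal (an index variable and summary for each level), here is a hedged base-R sketch. It assumes every column name has at least three period-separated levels and that a row mean is an acceptable index. The object names parts, grp and idx, and the choice of rowMeans, are illustrative assumptions rather than anything stated in the original posts.

```r
set.seed(1)  # only to make this sketch reproducible
n <- 6
dat <- data.frame(
  ph1.career_interest.delight.1.Friendly = sample(1:5, n, replace = TRUE),
  ph1.career_interest.delight.2.Advantagious = sample(1:5, n, replace = TRUE),
  ph1.career_interest.philosophy.1.Meaningful_Difference = sample(1:5, n, replace = TRUE),
  ph1.career_interest.philosophy.2.Enable_Work = sample(1:5, n, replace = TRUE)
)

# One row per variable, one column per level of the naming hierarchy
parts <- do.call(rbind, strsplit(colnames(dat), ".", fixed = TRUE))

# Group label built from the first three levels
# (Environment.Construct.Subconstruct)
grp <- apply(parts[, 1:3, drop = FALSE], 1, paste, collapse = ".")

# For each group, average its columns row-wise to get one index per respondent
idx <- sapply(split(colnames(dat), grp),
              function(cols) rowMeans(dat[, cols, drop = FALSE]))

# Per-group summary of the index variables
summary(idx)
```

Changing 1:3 to 1:2 would group at the construct level instead. If names have unequal depth, pad them first as in the answers above.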
