I have my variables named in little-endian fashion, separated by periods.
I'd like to create index variables for each level and get summary output for the variables at each level, but I'm stuck at the first step: breaking the variable names apart and putting the pieces in a table I can work with.
Variable naming convention:
Example:
n <- 6
dat <- data.frame(
ph1.career_interest.delight.1.Friendly=sample(1:5, n, replace=TRUE),
ph1.career_interest.delight.2.Advantagious=sample(1:5, n, replace=TRUE),
ph1.career_interest.philosophy.1.Meaningful_Difference=sample(1:5, n, replace=TRUE),
ph1.career_interest.philosophy.2.Enable_Work=sample(1:5, n, replace=TRUE)
)
# create list of variable names
names <- as.list(colnames( dat ))
## Try to create a hierarchy of variables: Step 1: Create matrix
heir <- as.matrix(strsplit(names,".", fixed = TRUE))
I've gone through a couple iterations but it still returns an error:
Error in strsplit(names, ".", fixed = TRUE) : non-character argument
Instead of wrapping the column names with as.list, pass colnames(dat) directly. According to ?strsplit, the input x must be:
x - character vector, each element of which is to be split. Other inputs, including a factor, will give an error.
A list is therefore not an accepted input class for strsplit, even when each of its elements is a string.
nm1 <- colnames(dat)
strsplit(nm1, ".", fixed = TRUE)
#[[1]]
#[1] "ph1" "career_interest" "delight" "1" "Friendly"
#[[2]]
#[1] "ph1" "career_interest" "delight" "2" "Advantagious"
#[[3]]
#[1] "ph1" "career_interest" "philosophy" "1" "Meaningful_Difference"
#[[4]]
#[1] "ph1" "career_interest" "philosophy" "2" "Enable_Work"
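To see why the list input fails, a small check along the following lines may help. It is only a sketch: the two short names and the objects nms, lst, msg and parts are made up for illustration and are not part of the original data.

```r
# Hypothetical two-element example; `nms` and `lst` are illustrative names
nms <- c("ph1.career_interest", "ph1.philosophy")
lst <- as.list(nms)

is.character(nms)  # a plain character vector
is.character(lst)  # a list is not a character vector, even if it holds strings

# Capture the error message instead of stopping the script
msg <- tryCatch(
  strsplit(lst, ".", fixed = TRUE),
  error = function(e) conditionMessage(e)
)

# unlist() turns the list back into a character vector that strsplit() accepts
parts <- strsplit(unlist(lst), ".", fixed = TRUE)
```

If the names already sit in a list for some other reason, unlist() is one way to get back to the character vector that strsplit expects.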
The output of strsplit is a list of character vectors. The expected output format is not clear from the OP's post. If we need a matrix or data.frame, we can rbind those list elements (assuming they all have the same length):
m1 <- do.call(rbind, strsplit(nm1, ".", fixed = TRUE))
This returns a character matrix with one row per variable name. Alternatively, the list can be converted to a data.frame with rbind.data.frame.
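As a rough sketch of the data.frame route, assuming all names split into the same number of pieces: the example below converts the matrix with as.data.frame rather than rbind.data.frame, only because that makes the column names easy to set. The column labels (phase, topic, subtopic, item, label) are guesses at what each level means and are not given in the question.

```r
# Column names from the question, written out so the block runs on its own
nm1 <- c(
  "ph1.career_interest.delight.1.Friendly",
  "ph1.career_interest.delight.2.Advantagious",
  "ph1.career_interest.philosophy.1.Meaningful_Difference",
  "ph1.career_interest.philosophy.2.Enable_Work"
)

# One row per variable name, one column per level
m1 <- do.call(rbind, strsplit(nm1, ".", fixed = TRUE))

# Convert to a data.frame; the labels below are hypothetical
df1 <- as.data.frame(m1, stringsAsFactors = FALSE)
names(df1) <- c("phase", "topic", "subtopic", "item", "label")

# The item number is split out as text, so convert it to integer
df1$item <- as.integer(df1$item)
```

Keeping the full variable name as an extra column (for example df1$variable <- nm1) may be convenient when the table is later used to look up columns in dat.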
NOTE: names is the name of a base R function. It is better not to give objects the same name as a function.
If the lengths are not the same, one option is to pad the shorter elements with NA at the end:
lst1 <- strsplit(nm1, ".", fixed = TRUE)
lst1[[1]] <- lst1[[1]][1:3] # making lengths different
mx <- max(lengths(lst1))
do.call(rbind, lapply(lst1, `length<-`, mx))
# [,1] [,2] [,3] [,4] [,5]
#[1,] "ph1" "career_interest" "delight" NA NA
#[2,] "ph1" "career_interest" "delight" "2" "Advantagious"
#[3,] "ph1" "career_interest" "philosophy" "1" "Meaningful_Difference"
#[4,] "ph1" "career_interest" "philosophy" "2" "Enable_Work"
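Once the names are in a matrix, the levels can be used to group columns for the per-level summaries mentioned in the question. The following is a minimal sketch under assumptions: it groups on the third level only, uses a mean as the summary, and the objects n_obs, lvl, groups and scores are names invented for this example.

```r
set.seed(1)  # only so the sampled example data is reproducible
n_obs <- 6
dat <- data.frame(
  ph1.career_interest.delight.1.Friendly = sample(1:5, n_obs, replace = TRUE),
  ph1.career_interest.delight.2.Advantagious = sample(1:5, n_obs, replace = TRUE),
  ph1.career_interest.philosophy.1.Meaningful_Difference = sample(1:5, n_obs, replace = TRUE),
  ph1.career_interest.philosophy.2.Enable_Work = sample(1:5, n_obs, replace = TRUE)
)

# One row per column of dat, one column per naming level
lvl <- do.call(rbind, strsplit(colnames(dat), ".", fixed = TRUE))

# Group the column names of dat by their third-level label
groups <- split(colnames(dat), lvl[, 3])

# Mean score per respondent within each group (one matrix column per group)
scores <- sapply(groups, function(cols) rowMeans(dat[, cols, drop = FALSE]))
```

The same pattern should work for any other level by changing the column index of lvl, or for a different summary by swapping rowMeans for another function.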
You can count the number of '.' characters in the column names to determine how many new columns to create. We can then use tidyr::separate to split the names into n new columns on '.'.
#Changing 1st column name to make length unequal
names(dat)[1] <- 'ph1.career_interest.delight.1'
#Number of new columns to be created
n <- max(stringr::str_count(names(dat), '\\.')) + 1
tidyr::separate(data.frame(name = names(dat)), name,
paste0('col', seq_len(n)), sep = '\\.', fill = 'right')
# col1 col2 col3 col4 col5
#1 ph1 career_interest delight 1 <NA>
#2 ph1 career_interest delight 2 Advantagious
#3 ph1 career_interest philosophy 1 Meaningful_Difference
#4 ph1 career_interest philosophy 2 Enable_Work