
Split camelCase Column names

I've been trying to figure this out for a while, and thought I would ask here.

Say I have a data frame like the following:

df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

The problem with this data frame is that each of the last six column names encodes two variables: one whose values are either Reg or Irreg, and another whose values are Proto, Int, or Distant. What I would like to do is split these column names and make the table long, preferably using tidyr. I thought I could do it like this:

library("tidyr")
library("dplyr")  # provides select()
df_long <- df %>%
  gather(index, n, -group, -participant) %>%
  select(participant, group, index, n) %>%
  separate(index, into = c("verb", "similarity"), sep = "\\.?=\\p{Upper}")

This does what I want until separate(). I get an error message saying that the values were not split, but no hint as to why. I'm new to regex, so I suspect the problem is there, but I can't figure out what the correct syntax would be.

You can use this regex:

(?<=.)(?=[A-Z])

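This matches a zero-length position that is preceded by any character (the lookbehind (?<=.)) and followed by an uppercase letter (the lookahead (?=[A-Z])). Because nothing is consumed, no characters are lost in the split, and the lookbehind keeps the pattern from matching at the very start of the string, before the initial capital.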
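
To see where that position falls, here is a minimal base R sketch (not part of the original answer) that marks each match with an underscore. The vector name cols and the underscore marker are only illustrative. The last two lines check one likely reason the pattern in the question did not split anything: as written, it requires a literal "=" in front of an uppercase letter, and none of the column names contains one.

```r
# Column names taken from the data frame in the question
cols <- c("RegProto", "RegInt", "RegDistant",
          "IrregProto", "IrregInt", "IrregDistant")

# Insert an underscore at every position the pattern matches.
# perl = TRUE is required: R's default regex engine has no lookarounds.
marked <- gsub("(?<=.)(?=[A-Z])", "_", cols, perl = TRUE)
print(marked)

# The question's pattern contains a literal "=", so check whether
# any column name actually has that character.
has_equals <- grepl("=", cols, fixed = TRUE)
print(any(has_equals))
```

Each name should come back with exactly one underscore, sitting between the two parts that separate() is asked to produce.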

The command:

library(dplyr)
library(tidyr)  # gather() and separate() come from tidyr
df %>%
  gather(index, n, -group, -participant) %>%
  select(participant, group, index, n) %>%
  separate(index, into = c("verb", "similarity"), sep = "(?<=.)(?=[A-Z])")
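
As an alternative sketch, assuming tidyr 1.0.0 or later (where gather() is superseded by pivot_longer()), the reshape and the split can be done in one call. This is not from the original answer. The capture-group pattern below is my own choice and assumes every column name is exactly two capitalized words, which holds for this data frame.

```r
library(tidyr)

df <- data.frame(
  participant  = 1:6,
  group        = c("adult", "adult", "child", "child", "NSS", "NSS"),
  RegProto     = c(2, 3, 4, 2, 4, 3),
  RegInt       = c(2, 3, 4, 6, 6, 5),
  RegDistant   = c(3, 3, 4, 5, 4, 5),
  IrregProto   = c(4, 5, 3, 4, 3, 1),
  IrregInt     = c(4, 4, 4, 4, 4, 4),
  IrregDistant = c(4, 5, 6, 8, 9, 1)
)

# Each capture group in names_pattern fills one entry of names_to:
# the first capitalized word becomes "verb", the second "similarity".
df_long <- pivot_longer(
  df,
  cols          = -c(participant, group),
  names_to      = c("verb", "similarity"),
  names_pattern = "^([A-Z][a-z]+)([A-Z][a-z]+)$",
  values_to     = "n"
)
print(df_long)
```

Capture groups avoid zero-length matches altogether, at the cost of a pattern that is tied more closely to how these particular columns are named.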
