Is there any way to generate a year column from existing column names in R?

Question

I am working with a dataset that has the corresponding year attached to each variable name as a suffix, e.g., AXOX1991, where AXO is the variable. I am trying to separate the year from the column names to generate a year column so that the dataset can be analyzed as time-series data.

In other words, the existing dataset looks like:

Country	AXOX1991	AXOX1992	BXOX1991	BXOX1992	CXOX1991	CXOX1992
Afghanistan	1	2	3	4	5	6
USA	6	5	4	3	2	1

And I am trying to create the following:

Country	Year	AXO	BXO	CXO
Afghanistan	1991	1	3	5
Afghanistan	1992	2	4	6
USA	1991	6	4	2
USA	1992	5	3	1

As you can see, X not only acts as the delimiter that divides the variable name and the year, but it is also part of the variable name. Is there any way in R to separate the year from the variable name in existing column names and then to create a year column as shown above?

I have been thinking of workarounds, such as loops, but I haven't gotten very far, and I'm truly stumped. I have more than 900 variable-years, so I want to avoid doing it by hand if possible.

Thank you!

Answer 1

For the sake of completeness, here is a solution using melt() with the new measure() function (introduced in the development version of data.table, v1.14.1):

library(data.table) # development version 1.14.1
melt(setDT(df), measure.vars = measure(value.name, year, 
                                       pattern = "(\\w{3})X(\\d{4})"))

       Country year AXO BXO CXO
1: Afghanistan 1991   1   3   5
2:         USA 1991   6   4   2
3: Afghanistan 1992   2   4   6
4:         USA 1992   5   3   1
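Note that the year column above comes out as character. As a minimal sketch, assuming a data.table release that ships measure() (it is not available in older CRAN versions), a conversion function can be passed by name inside measure() so that year is stored as an integer. The names long and df here are just illustrative, and the setorder() call is only added to match the row order requested in the question:

```r
library(data.table)  # assumes a version that provides measure()

# Sample data from the question
df <- data.table(
  Country  = c("Afghanistan", "USA"),
  AXOX1991 = c(1L, 6L), AXOX1992 = c(2L, 5L),
  BXOX1991 = c(3L, 4L), BXOX1992 = c(4L, 3L),
  CXOX1991 = c(5L, 2L), CXOX1992 = c(6L, 1L)
)

# The first capture group names the value columns (AXO, BXO, CXO);
# the second becomes `year`, converted with as.integer
long <- melt(df, measure.vars = measure(value.name, year = as.integer,
                                        pattern = "(\\w{3})X(\\d{4})"))

# Sort by country, then year
setorder(long, Country, year)
long
```

Having year as an integer makes later filtering or plotting by year easier than working with strings.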

Data

library(data.table)
df <- fread("Country    AXOX1991    AXOX1992    BXOX1991    BXOX1992    CXOX1991    CXOX1992
Afghanistan 1   2   3   4   5   6
USA 6   5   4   3   2   1")

Answer 2

You can use tidyr::pivot_longer():

res <- tidyr::pivot_longer(df, cols = -Country, 
                    names_to = c('.value', 'Year'), 
                    names_pattern = '([A-Z]+)X(\\d+)')
res

#  Country     Year    AXO   BXO   CXO
#  <chr>       <chr> <int> <int> <int>
#1 Afghanistan 1991      1     3     5
#2 Afghanistan 1992      2     4     6
#3 USA         1991      6     4     2
#4 USA         1992      5     3     1
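In the output above, Year is a character column. If a numeric year is needed, one option is the names_transform argument of pivot_longer() (a sketch, assuming tidyr 1.1.0 or later). The pattern is also tightened here to exactly four trailing digits, which is an assumption about how the real column names look:

```r
library(tidyr)

# Sample data from the question
df <- data.frame(
  Country  = c("Afghanistan", "USA"),
  AXOX1991 = c(1L, 6L), AXOX1992 = c(2L, 5L),
  BXOX1991 = c(3L, 4L), BXOX1992 = c(4L, 3L),
  CXOX1991 = c(5L, 2L), CXOX1992 = c(6L, 1L)
)

res <- pivot_longer(
  df,
  cols = -Country,
  names_to = c(".value", "Year"),
  # Greedy [A-Z]+ backtracks so the last X before the digits acts as the delimiter
  names_pattern = "([A-Z]+)X(\\d{4})$",
  # Convert the extracted year from character to integer
  names_transform = list(Year = as.integer)
)
res
```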

Data

df <- structure(list(Country = c("Afghanistan", "USA"), AXOX1991 = c(1L, 
6L), AXOX1992 = c(2L, 5L), BXOX1991 = 3:4, BXOX1992 = 4:3, CXOX1991 = c(5L, 
2L), CXOX1992 = c(6L, 1L)), class = "data.frame", row.names = c(NA, -2L))
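Because X is both the delimiter and part of the variable names, it can be worth checking how the pattern splits the column names before reshaping 900+ columns. A small base R sketch follows; the vector cols is illustrative, and "XXOX1992" is a made-up name included only to show a variable that itself starts with X:

```r
# Illustrative column names; "XXOX1992" is hypothetical
cols <- c("AXOX1991", "AXOX1992", "BXOX1991", "XXOX1992")

# Anchored pattern: variable name, then a literal X, then a four-digit year
pattern <- "^([A-Z]+)X(\\d{4})$"

variable <- sub(pattern, "\\1", cols)              # part before the final X
year     <- as.integer(sub(pattern, "\\2", cols))  # four-digit suffix

# Names that do not fit the pattern would need separate handling
unmatched <- cols[!grepl(pattern, cols)]

data.frame(column = cols, variable = variable, year = year)
```

If unmatched is not empty for the real data, those columns would be dropped or mis-parsed by the reshaping calls above, so the pattern should be adjusted first.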

solution1
1 2021-07-03 08:11:36

solution2
0 ACCEPTED 2021-07-03 03:46:11

Is there any way to generate year column from existing column names in R?

Question

2 answers

solution1 1 2021-07-03 08:11:36

Data

solution2 0 ACCPTED 2021-07-03 03:46:11

solution1
1 2021-07-03 08:11:36

solution2
0 ACCPTED 2021-07-03 03:46:11