I am working with a dataset that has the corresponding year attached to the variable names as a suffix, e.g. AXOX1991, where AXO is the variable. I am trying to separate the year from the column names and generate a year column so that the dataset can be analyzed as time-series data.
In other words, the existing dataset looks like:
Country | AXOX1991 | AXOX1992 | BXOX1991 | BXOX1992 | CXOX1991 | CXOX1992 |
---|---|---|---|---|---|---|
Afghanistan | 1 | 2 | 3 | 4 | 5 | 6 |
USA | 6 | 5 | 4 | 3 | 2 | 1 |
And I am trying to create the following:
Country | Year | AXO | BXO | CXO |
---|---|---|---|---|
Afghanistan | 1991 | 1 | 3 | 5 |
Afghanistan | 1992 | 2 | 4 | 6 |
USA | 1991 | 6 | 4 | 2 |
USA | 1992 | 5 | 3 | 1 |
As you can see, X not only acts as the delimiter that divides the variable name and the year, but it is also part of the variable name. Is there any way in R to separate the year from the variable name in existing column names and then to create a year column as shown above?
I have been thinking of workarounds, such as loops, but I haven't gotten very far, and I'm truly stumped. I have more than 900 variable-years, so I want to avoid doing it by hand if possible.
Thank you!
For the sake of completeness, here is a solution using melt() with the new measure() function (introduced with data.table v1.14.1):
library(data.table) # development version 1.14.1
melt(setDT(df), measure.vars = measure(value.name, year,
pattern = "(\\w{3})X(\\d{4})"))
#        Country year AXO BXO CXO
# 1: Afghanistan 1991   1   3   5
# 2:         USA 1991   6   4   2
# 3: Afghanistan 1992   2   4   6
# 4:         USA 1992   5   3   1
data
library(data.table)
df <- fread("Country AXOX1991 AXOX1992 BXOX1991 BXOX1992 CXOX1991 CXOX1992
Afghanistan 1 2 3 4 5 6
USA 6 5 4 3 2 1")
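To see why the X inside the variable name does not confuse the pattern used in the melt() call above, here is a minimal base R sketch that applies the same capture groups to the column names directly. The anchors (^ and $), the variable-length variant pat_any, and the example name GDPXPCX2001 are assumptions added for illustration and are not part of the original answer.

```r
# Column names from the question's example
nms <- c("AXOX1991", "AXOX1992", "BXOX1991", "BXOX1992", "CXOX1991", "CXOX1992")

# Same groups as in the answer: three word characters, a literal X, four digits.
# The anchors ^ and $ are an addition for this sketch.
pat_fixed <- "^(\\w{3})X(\\d{4})$"

var_part  <- sub(pat_fixed, "\\1", nms)              # first capture group
year_part <- as.integer(sub(pat_fixed, "\\2", nms))  # second capture group

var_part   # "AXO" "AXO" "BXO" "BXO" "CXO" "CXO"
year_part  # 1991 1992 1991 1992 1991 1992

# Assumed variant for variable names of any length: the greedy (.+) backtracks
# to the last X that is followed by exactly four digits at the end of the name.
pat_any <- "^(.+)X(\\d{4})$"
long_name_var <- sub(pat_any, "\\1", "GDPXPCX2001")  # hypothetical column name
long_name_var  # "GDPXPC"
```

Because the year is always the last four digits, only the final X can act as the delimiter; any earlier X stays in the variable name.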
You can make use of tidyr::pivot_longer:
res <- tidyr::pivot_longer(df, cols = -Country,
names_to = c('.value', 'Year'),
names_pattern = '([A-Z]+)X(\\d+)')
res
# Country Year AXO BXO CXO
# <chr> <chr> <int> <int> <int>
#1 Afghanistan 1991 1 3 5
#2 Afghanistan 1992 2 4 6
#3 USA 1991 6 4 2
#4 USA 1992 5 3 1
data
df <- structure(list(Country = c("Afghanistan", "USA"), AXOX1991 = c(1L,
6L), AXOX1992 = c(2L, 5L), BXOX1991 = 3:4, BXOX1992 = 4:3, CXOX1991 = c(5L,
2L), CXOX1992 = c(6L, 1L)), class = "data.frame", row.names = c(NA, -2L))
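If neither package is available, the same reshaping can probably be done with base R's reshape(). This is only a sketch of an alternative technique, not one of the methods given in the answers above. It assumes every measure column ends in X followed by a four-digit year, and it first rewrites that final X to a dot, because reshape() cannot split names on X while X is also part of the variable name. The object names wide and long are placeholders chosen for this example.

```r
# Same example data as above, under a different (placeholder) name
wide <- data.frame(
  Country  = c("Afghanistan", "USA"),
  AXOX1991 = c(1L, 6L), AXOX1992 = c(2L, 5L),
  BXOX1991 = 3:4,       BXOX1992 = 4:3,
  CXOX1991 = c(5L, 2L), CXOX1992 = c(6L, 1L)
)

# Replace only the X directly before the trailing four-digit year with ".",
# e.g. "AXOX1991" becomes "AXO.1991"; "Country" is left unchanged
names(wide) <- sub("X(\\d{4})$", ".\\1", names(wide))

# Let reshape() guess variable names and years from the "name.year" columns
long <- reshape(wide,
                direction = "long",
                idvar     = "Country",
                varying   = names(wide)[-1],
                timevar   = "Year",
                sep       = ".")

# Sort by country and year so each country's series is contiguous
long <- long[order(long$Country, long$Year), ]
rownames(long) <- NULL
long
```

Here Year comes back as a number rather than a character string, which is usually what a time-series analysis needs.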