I have seven data.frames within a list, my_data. Three of these data.frames have 16 columns; the other four have 22. There are five columns in each data.frame that I need to bind into one data.frame (all_data). The problem is that I can't simply select the columns I want to retain by name, because the names differ (though they are similar) between data.frames, and the columns appear in different orders. For example, one data.frame has a column titled "X2012.NAICS.code" and another has a column titled "X2007.NAICS.codes.and.NAICS.based.rollup.code". These columns contain the same information (NAICS codes) and need to be bound together. The approach I am trying to use is this:
header_cols <- c( "Geographic.area.name", "Year", "**3rd column**", "**4th column**", "**5th column**" )
all_data <- map_dfr( my_data[grepl( "^ASM", names( my_data ))], ~
.x %>%
select( header_cols ))
Where the 3rd, 4th, and 5th columns are the three others I need (Year and Geographic.area.name are the same across all seven data.frames). All data.frame names in the list begin with "ASM", which is what the ^ASM pattern is for.
UPDATE: My current strategy is this:
# Make object for raw column name strings (all columns of interest contain these strings in all dataframes)
name_pattern <- c( "Geographic.area.name", "Geographic Area Name")
VoS_pattern <- c( "Total.value.of.shipment", "value of shipments")
NAICS_pattern <- c( "NAICS.code", "NAICS code")
industry_pattern <- c("Meaning.of.", "Meaning of NAICS code")
relative_pattern <- c("Relative.standard.error", "Relative standard error")
header_cols <- c( "Year" )
# Part 3: binding the data into one dataframe based on the columns of interest, uniting columns that contain the same information category
# Bind the columns of interest into one dataframe
combined_data <- map_dfr( my_data[grepl( "^ASM", names( my_data ))], ~
.x %>%
select( header_cols, contains( paste0( name_pattern ) ),
contains( paste0( VoS_pattern ) ),
contains( paste0( NAICS_pattern ) ),
contains( paste0( industry_pattern ) ),
-contains ( paste0( relative_pattern) ) ))
which works perfectly. Unfortunately, I can't use the map_dfr function (or any function specific to purrr), so I am looking for a way to do this with rbind.
One option is to standardize the column names with rename_at after selecting the columns:
library(dplyr)
library(stringr)
library(purrr)
map_dfr(my_data[grep('^ASM', names(my_data))], ~
.x %>%
select(header_cols[1:2],
matches("NAICS\\.(code|based\\.rollup\\.code)")) %>%
rename_at(vars(matches("NAICS")), ~ str_remove(., "^X\\d{4}\\.")))  # rename_at needs the selection helper wrapped in vars()
Or replace map_dfr with base R's lapply and do.call(rbind, ...):
v1 <- c("Year", "state_name", "VoS_thousUSD", "NAICS_code", "industry")
out <- lapply(my_data[grep('^ASM', names(my_data))],
function(x) x %>%
mutate_if(is.factor, as.character) %>%
select( header_cols, contains( paste0( name_pattern ) ),
contains( paste0( VoS_pattern ) ),
contains( paste0( NAICS_pattern ) ),
contains( paste0( industry_pattern ) ),
-contains ( paste0( relative_pattern) ) ) %>%
setNames(v1))  # stats::setNames from base R, so purrr's set_names is not required
combined_data <- do.call(rbind, out)
row.names(combined_data) <- NULL
# Make VoS numeric
combined_data_new <- combined_data %>%
dplyr::mutate( VoS_thousUSD = as.numeric( VoS_thousUSD ) )
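If dplyr is also unavailable, the same idea can be sketched with base R only. This is a minimal sketch under stated assumptions, not the asker's actual data: the toy my_data list, the patterns vector, and the pick_cols helper are all hypothetical names made up for illustration, and only four of the five columns are shown. It assumes each pattern matches exactly one column per data.frame and stops otherwise.

```r
# Toy stand-ins for two of the ASM data.frames (hypothetical values).
my_data <- list(
  ASM_2012 = data.frame(
    Year = 2012,
    Geographic.area.name = "Ohio",
    X2012.NAICS.code = "311",
    Total.value.of.shipments = "100",
    Relative.standard.error.of.total.value.of.shipments = "1.2",
    stringsAsFactors = FALSE
  ),
  ASM_2007 = data.frame(
    X2007.NAICS.codes.and.NAICS.based.rollup.code = "312",
    Geographic.area.name = "Iowa",
    Total.value.of.shipments = "250",
    Year = 2007,
    stringsAsFactors = FALSE
  ),
  other = data.frame(a = 1)  # skipped below because its name lacks the ASM prefix
)

# Standardized output name -> regular expression that finds the source column.
# (Assumed patterns; adjust them to the real column names.)
patterns <- c(
  Year         = "^Year$",
  state_name   = "Geographic[. ]area[. ]name",
  VoS_thousUSD = "^Total[. ]value[. ]of[. ]shipment",
  NAICS_code   = "NAICS[. ]code"
)

# Hypothetical helper: pick one column per pattern, in a fixed order,
# rename to the standardized names, and coerce to character so rbind
# never has to reconcile factor and character columns.
pick_cols <- function(df) {
  idx <- vapply(patterns, function(p) {
    hit <- grep(p, names(df), ignore.case = TRUE)
    stopifnot(length(hit) == 1L)  # fail loudly on zero or ambiguous matches
    hit
  }, integer(1))
  out <- df[idx]
  names(out) <- names(patterns)
  out[] <- lapply(out, as.character)
  out
}

asm <- my_data[grepl("^ASM", names(my_data))]
combined_data <- do.call(rbind, lapply(asm, pick_cols))
row.names(combined_data) <- NULL

# Restore numeric types after binding.
combined_data$VoS_thousUSD <- as.numeric(combined_data$VoS_thousUSD)
combined_data$Year <- as.integer(combined_data$Year)
```

Anchoring the shipments pattern with ^ keeps the "Relative standard error" column from matching, which plays the role of the -contains(relative_pattern) exclusion in the dplyr version.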