
Using lists to change columns in multiple dataframes in R

I am using a vector of country names to download data and create dataframes in R. I'd like to use the same vector to make changes to particular columns in each dataframe, but I am having trouble referring to a specific column through the names in that vector.

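library(XML)  # provides xmlToDataFrame()

countries <- c("USA", "CHN")

# url and savedata are character vectors (defined elsewhere) holding the
# download URL and the local file path for each country
for (i in seq_along(countries)) {
    download.file(url[i], savedata[i])
    assign(countries[i], xmlToDataFrame(savedata[i]))
}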

Now I have dataframes that look like this:

head(USA)
        indicator       country date          value decimal
1 GDP (current US$) United States 2012 15684800000000       0
2 GDP (current US$) United States 2011 14991300000000       0
3 GDP (current US$) United States 2010 14419400000000       0
4 GDP (current US$) United States 2009 13898300000000       0
5 GDP (current US$) United States 2008 14219300000000       0
6 GDP (current US$) United States 2007 13961800000000       0

I would like to go through and make several changes, such as formatting the date column with the as.Date() function or changing the units of the value column, and I want to do the same to both dataframes (or to an arbitrary number of them, in case I increase the length of countries).

However, whenever I try to do this, I can't seem to use the names in the countries variable to get 'inside' each data frame. My initial guess was to put something like this in a loop:

# Attempt inside the loop (does not change the date column)
assign(paste(countries[i], "date", sep = "$"),
    as.Date(get(paste(countries[i], "date", sep = "$"))))

In particular, I'm confused that get(paste(countries[i])) works when I want the whole data frame rather than the date column, and that paste(countries[i], "date", sep = "$") prints the correct name, yet I can't get at just the one column I'd like to manipulate.
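A minimal sketch of why that attempt fails and what does work in a loop, using two small made-up data frames in place of the downloaded ones. `get()` and `assign()` take a string and treat all of it as a single object name, so `"USA$date"` is looked up as a variable with that literal name, not as the `date` column of `USA`. Fetching the whole data frame, changing it, and assigning it back avoids this. Because the dates here are year-only, the sketch assumes 1 January to build a full date for `as.Date()`.

```r
# Made-up stand-ins for the data frames created with assign()
USA <- data.frame(date = c("2012", "2011"), stringsAsFactors = FALSE)
CHN <- data.frame(date = c("2012", "2011"), stringsAsFactors = FALSE)
countries <- c("USA", "CHN")

# paste() builds a correct-looking string, but get() and assign() treat it
# as one object name; nothing is called "USA$date", so this prints FALSE
print(exists(paste(countries[1], "date", sep = "$")))

# Working loop: fetch the whole data frame, change the column, put it back
for (i in seq_along(countries)) {
    df <- get(countries[i])                        # the data frame named "USA", then "CHN"
    df$date <- as.Date(paste0(df$date, "-01-01"))  # year only, so add month and day
    assign(countries[i], df)                       # overwrite the original object
}
print(class(USA$date))   # "Date"
```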

Additionally, I realize loops are not the ideal way of doing this, but I've been having the same problem with the apply functions, though that is likely due to my lack of experience. Suggestions for how to do it either in a loop or without one would be much appreciated. Super R novice here, just trying to learn. Also, if you've come across a clear explanation or answer for this somewhere else, I'd appreciate a pointer to it.

It's much easier if you use lists. Start with an empty one:

mylist <- list()

Then change this:

assign(countries[i],xmlToDataFrame(savedata[i]))

to this:

mylist[[i]] <- xmlToDataFrame(savedata[i])

Then make a function that does your formatting, for instance:

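f <- function(df){
    # as.Date() (capital D) needs a full date; 'date' holds only a year here,
    # so pin each year to 1 January
    within(df, date <- as.Date(paste0(date, "-01-01")))
}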
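A slightly fuller sketch of such a formatting function that also changes the units of `value`, assuming `date` holds year strings like `"2012"` and `value` holds GDP in US dollars as text. The conversion to billions is only an example of a unit change, and the name `format_country` is hypothetical.

```r
# Hypothetical formatting function applied to one country's data frame.
# Assumes 'date' holds a year as text and 'value' holds US dollars as text.
format_country <- function(df) {
    within(df, {
        # as.Date() needs a full date, so pin each year to 1 January
        date  <- as.Date(paste0(as.character(date), "-01-01"))
        # as.character() first in case the column came back as a factor
        value <- as.numeric(as.character(value)) / 1e9   # billions of US$
    })
}

# Two rows taken from the head(USA) output in the question
usa <- data.frame(date = c("2012", "2011"),
                  value = c("15684800000000", "14991300000000"),
                  stringsAsFactors = FALSE)
str(format_country(usa))
```

Converting through `as.character()` before `as.numeric()` matters if the column is a factor: calling `as.numeric()` directly on a factor returns the internal level codes, not the numbers shown.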

And use lapply to apply it to all dataframes:

mylist2 <- lapply(mylist, f)
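A self-contained sketch of the `lapply()` step, with two small made-up data frames standing in for the downloaded ones (the values are placeholders, not real GDP figures). As a small variation, the list is named before calling `lapply()`; since `lapply()` keeps the names, the result can be indexed by country straight away, and the original list is left unchanged.

```r
countries <- c("USA", "CHN")

# Made-up stand-ins for the data frames returned by xmlToDataFrame()
mylist <- list(
    data.frame(date = c("2012", "2011"), value = c("3", "2"), stringsAsFactors = FALSE),
    data.frame(date = c("2012", "2011"), value = c("5", "4"), stringsAsFactors = FALSE)
)
names(mylist) <- countries   # name first; lapply() keeps the names

# Formatting function: year-only dates are pinned to 1 January for as.Date()
f <- function(df) {
    within(df, date <- as.Date(paste0(date, "-01-01")))
}

mylist2 <- lapply(mylist, f)          # one call, however many countries there are
print(names(mylist2))                 # "USA" "CHN"
print(class(mylist2[["CHN"]]$date))   # "Date"
```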

If you want to access dataframes by name, use this:

names(mylist2) <- countries

And test:

mylist2[["USA"]]
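If the data frames already exist as separate objects (as after the original `assign()` loop), base R's `mget()` collects them into a list named by country, after which the same `lapply()` step applies. A sketch with made-up stand-in data; `fix_dates` is a hypothetical name.

```r
countries <- c("USA", "CHN")

# Made-up stand-ins for objects created earlier with assign()
USA <- data.frame(date = c("2012", "2011"), value = c("3", "2"), stringsAsFactors = FALSE)
CHN <- data.frame(date = c("2012", "2011"), value = c("5", "4"), stringsAsFactors = FALSE)

# mget() looks up every name in 'countries' and returns a list with those names
mylist <- mget(countries)

# Hypothetical formatting function: pin each year to 1 January
fix_dates <- function(df) within(df, date <- as.Date(paste0(date, "-01-01")))

mylist2 <- lapply(mylist, fix_dates)
print(mylist2[["USA"]]$date)   # "2012-01-01" "2011-01-01"
```

The separate `USA` and `CHN` objects are not modified; only the new list holds the formatted copies.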
