
How to assign a subset from data frame `a` to a subset of data frame `b`

It might be a trivial question (I am new to R), but I could not find an answer to it, either here on SO or anywhere else. My scenario is the following.

I have a data frame df and I want to update the tag values for a subset of its rows. df is similar to the following:

id = rep( c(1:4), 3)
tag = rep( c("aaa", "bbb", "rrr", "fff"), 3)
df = data.frame(id, tag)

Then I try to use match() to update the tag column within each subset of the data frame, using a second data frame (e.g., aux) with two columns, key and value. The subsets are defined by id == n, for each n in unique(df$id). aux looks like the following:

 > aux 
     key      value
   "aaa"  "valueAA"
   "bbb"  "valueBB"
   "rrr"  "valueRR"
   "fff"  "valueFF"
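For a reproducible example, `aux` can be rebuilt from the printout above. This is a sketch that assumes both columns are plain character vectors (the question does not show how `aux` was created); it also shows what `match()` returns for a lookup.

```r
# Rebuilt from the printout; assumes character (not factor) columns.
aux <- data.frame(key   = c("aaa", "bbb", "rrr", "fff"),
                  value = c("valueAA", "valueBB", "valueRR", "valueFF"),
                  stringsAsFactors = FALSE)

# match() gives, for each element of its first argument, the position of
# that element in aux$key, or NA when it is absent.
pos <- match(c("rrr", "aaa", "zzz"), aux$key)  # 3 1 NA
looked_up <- aux$value[pos]                    # "valueRR" "valueAA" NA
```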

I have tried to loop over the data frame, as follows:

for(i in unique(df$id)){
   indexer = df$id == i

   # here is how I tried to update the data frame:
   df[indexer,]$tag <- aux[match(df[indexer,]$tag, aux$key),]$value
}

I expected df[indexer,]$tag to be updated with the corresponding values from aux$value. Instead, df$tag ended up filled with NAs. There were no errors, only the following warning message:

In `[<-.factor`(`*tmp*`, df$id == i, value = c(NA, : invalid factor level, NA generated
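The warning comes from `[<-.factor`, which suggests `df$tag` is a factor: `data.frame()` converted strings to factors by default before R 4.0.0, and still does whenever `stringsAsFactors = TRUE`. Assigning a string that is not one of the factor's levels yields `NA`. Below is a minimal sketch that reproduces this and applies the usual fix (convert to character first), assuming that is the cause here. `make_df()` and `update_by_id()` are hypothetical helpers, and the assignment is written as `df[indexer, "tag"]` rather than `df[indexer,]$tag`.

```r
# Hypothetical helper: rebuild the question's df, optionally as a factor.
make_df <- function(as_factor) {
  data.frame(id  = rep(1:4, 3),
             tag = rep(c("aaa", "bbb", "rrr", "fff"), 3),
             stringsAsFactors = as_factor)
}
aux <- data.frame(key   = c("aaa", "bbb", "rrr", "fff"),
                  value = c("valueAA", "valueBB", "valueRR", "valueFF"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: the question's loop, one id subset at a time.
update_by_id <- function(df) {
  for (i in unique(df$id)) {
    indexer <- df$id == i
    df[indexer, "tag"] <- aux$value[match(df$tag[indexer], aux$key)]
  }
  df
}

# Factor column: "valueAA" etc. are not levels of tag, so every row
# becomes NA (with an "invalid factor level" warning on each iteration).
broken <- suppressWarnings(update_by_id(make_df(TRUE)))

# Fix: turn the factor into character before updating.
df <- make_df(TRUE)
df$tag <- as.character(df$tag)
fixed <- update_by_id(df)
```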

Previously I was using df$tag <- aux[match(df$tag, aux$key),]$value, which worked, but some duplicated df$tag values made match() produce misplaced updates in a number of rows. I also tested the subsetting on its own, and it works fine. Can someone suggest a solution for this update?
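Where the duplicates sit matters for `match()`. Duplicated values in its first argument (`df$tag`) are each looked up independently, but if the table (`aux$key`) contains a key more than once, `match()` always returns the position of the first occurrence. The question does not say which case applies; here is a small sketch with hypothetical data.

```r
# Hypothetical lookup table with a duplicated key.
aux <- data.frame(key   = c("aaa", "bbb", "aaa"),
                  value = c("first", "second", "third"),
                  stringsAsFactors = FALSE)

# Duplicates in the values being looked up are fine: each is matched.
idx <- match(c("aaa", "aaa", "bbb"), aux$key)  # 1 1 2

# Duplicates in the table: only the first "aaa" is ever returned,
# so "third" can never be selected.
picked <- aux$value[match("aaa", aux$key)]     # "first"

# Quick check before relying on match(): index of the first duplicate,
# or 0 if the keys are unique.
dup_at <- anyDuplicated(aux$key)               # 3
```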

UPDATE (what the final dataset should look like):

 > df
      id       tag
       1  "valueAA"
       2  "valueBB"
       3  "valueRR"
       4  "valueFF"
    (...)     (...)

Thank you in advance.

Does this produce the output you expect?

df$tag <- aux$value[match(df$tag, aux$key)]
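Here is a self-contained check of this answer. It assumes the question's `df` (with `tag` deliberately a factor, to mirror the warning) and an `aux` rebuilt from the printout. No loop over `id` is needed, because `match()` already works element by element across the whole column.

```r
df <- data.frame(id  = rep(1:4, 3),
                 tag = rep(c("aaa", "bbb", "rrr", "fff"), 3),
                 stringsAsFactors = TRUE)  # factor on purpose, as in the question
aux <- data.frame(key   = c("aaa", "bbb", "rrr", "fff"),
                  value = c("valueAA", "valueBB", "valueRR", "valueFF"),
                  stringsAsFactors = FALSE)

# `$<-` replaces the whole column, so the old factor levels are irrelevant:
# tag simply becomes the character vector looked up from aux.
df$tag <- aux$value[match(df$tag, aux$key)]
```

Because the column is replaced wholesale rather than assigned into row by row, the "invalid factor level" problem does not arise.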

merge() would work too, unless you have duplicate keys in aux.
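Here is a sketch of the `merge()` route, assuming character columns. `merge()` does not keep the original row order, so a helper `.row` column (a hypothetical name) is added and used to restore it. The last lines show how a duplicated key in `aux` multiplies the matching rows.

```r
df <- data.frame(id  = rep(1:4, 3),
                 tag = rep(c("aaa", "bbb", "rrr", "fff"), 3),
                 stringsAsFactors = FALSE)
aux <- data.frame(key   = c("aaa", "bbb", "rrr", "fff"),
                  value = c("valueAA", "valueBB", "valueRR", "valueFF"),
                  stringsAsFactors = FALSE)

df$.row <- seq_len(nrow(df))  # remember the original order
m <- merge(df, aux, by.x = "tag", by.y = "key", all.x = TRUE)
m <- m[order(m$.row), ]       # put rows back in their original order
merged <- data.frame(id = m$id, tag = m$value, stringsAsFactors = FALSE)

# With a duplicated key, each "aaa" row in df matches twice: 12 -> 15 rows.
aux_dup <- rbind(aux, data.frame(key = "aaa", value = "valueAA2",
                                 stringsAsFactors = FALSE))
n_dup <- nrow(merge(df, aux_dup, by.x = "tag", by.y = "key", all.x = TRUE))
```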

It turned out that my data broke every built-in approach I tried, each of them giving me a wrong dataset in the end. My solution (at least a preliminary one) was the following:

  1. process each subset individually;
  2. add each resulting data frame to a list;
  3. use rbindlist(a.list, use.names = TRUE) from the data.table package to combine them into a complete data frame with the results.
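The three steps above might look like the following base-R sketch. It uses `split()` and `lapply()` for steps 1 and 2, and `do.call(rbind, ...)` in place of `data.table::rbindlist()` for step 3, so it runs without extra packages; with data.table installed, `rbindlist(a.list, use.names = TRUE)` plays the same role. The per-subset update is assumed to be the `match()` lookup from the question.

```r
df <- data.frame(id  = rep(1:4, 3),
                 tag = rep(c("aaa", "bbb", "rrr", "fff"), 3),
                 stringsAsFactors = FALSE)
aux <- data.frame(key   = c("aaa", "bbb", "rrr", "fff"),
                  value = c("valueAA", "valueBB", "valueRR", "valueFF"),
                  stringsAsFactors = FALSE)

# Step 1: one data frame per id value.
subsets <- split(df, df$id)

# Steps 1-2: update each subset on its own and collect them in a list.
a.list <- lapply(subsets, function(s) {
  s$tag <- aux$value[match(s$tag, aux$key)]
  s
})

# Step 3: stack the pieces back into a single data frame.
result <- do.call(rbind, a.list)
rownames(result) <- NULL
```

The combined result is ordered by `id` rather than by the original row order; `unsplit(a.list, df$id)` would instead put the rows back where they came from.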
