简体   繁体   中英

Fill a data.frame column with values from other data.frame column with a condition R

I have this two data.frames:

  df1 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus"),
    "threat_status"=c("VU","EN","EN")
  )

> df1
      scientific_name threat_status
1 Anchietea ballardii            VU
2       Anemia patens            EN
3 Cololobus argenteus            EN
> 


  df2 = data.frame(
    "scientific_name"=c("Anchietea ballardii","Anemia patens","Cololobus argenteus","Coleocephalocereus pluricostatus", "Dyckia bracteata", "Merianthera parvifolia", "Kielmeyera rupestris"),
    "threat_status"=c(NA)
  )

> df2
                     scientific_name threat_status
1                Anchietea ballardii            NA
2                      Anemia patens            NA
3                Cololobus argenteus            NA
4   Coleocephalocereus pluricostatus            NA
5                   Dyckia bracteata            NA
6             Merianthera parvifolia            NA
7               Kielmeyera rupestris            NA
>

I need to fill the df2$threat_status with the correspondent df1$threat_status, but when there is not any value for the column df2$threat_status it can be filled with "LC". I'm trying with sqldf INSERT TO, but it's not working. I's like to have a solution in plain R, something tells me it would be more elegant. Somebody could help? Thanks a lot!

In base R, you can do:

df2 = merge(df2[,1,drop=F],df1, by="scientific_name", all.x=T)
df2[is.na(df2$threat_status), 2] <- "LC"

Output:

                   scientific_name threat_status
1              Anchietea ballardii            VU
2                    Anemia patens            EN
3 Coleocephalocereus pluricostatus            LC
4              Cololobus argenteus            EN
5                 Dyckia bracteata            LC
6             Kielmeyera rupestris            LC
7           Merianthera parvifolia            LC

Another option in your example is to simply set all the threat_status values in df2 to "LC" , and then just row bind df1 to the rows in df2 where df1$scientfic_name does not appear:

df2$threat_status="LC"
rbind(df1,df2[!df2$scientific_name %in% df1$scientific_name,])

Output:

                   scientific_name threat_status
1              Anchietea ballardii            VU
2                    Anemia patens            EN
3              Cololobus argenteus            EN
4 Coleocephalocereus pluricostatus            LC
5                 Dyckia bracteata            LC
6           Merianthera parvifolia            LC
7             Kielmeyera rupestris            LC

With sqldf you could :

library(sqldf)
sqldf("SELECT df2.scientific_name, 
              CASE WHEN df1.threat_status IS NULL THEN 'LC'
                   ELSE df1.threat_status
                   END AS threat_status
         FROM df2 LEFT JOIN df1 ON df2.scientific_name = df1.scientific_name")


#                    scientific_name threat_status
# 1              Anchietea ballardii            VU
# 2                    Anemia patens            EN
# 3              Cololobus argenteus            EN
# 4 Coleocephalocereus pluricostatus            LC
# 5                 Dyckia bracteata            LC
# 6           Merianthera parvifolia            LC
# 7             Kielmeyera rupestris            LC

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM