简体   繁体   中英

Assigning an ID vector to a dataframe in R, based on week number (ISOweek)

I'm trying to add ID codes to a dataframe (creel), which is a time series of shellfish landings from 2000 to 2015. The first few columns are given below:

    head(creel,10)

           week year       boat  fID
    1  2000-W01 2000      Mousa  NA
    2  2000-W01 2000       Yell  NA
    3  2000-W01 2000      Foula  NA
    4  2000-W01 2000      Foula  NA
    5  2000-W02 2000      Foula  NA
    6  2000-W02 2000 Papa Stour  NA
    7  2000-W02 2000     Fetlar  NA
    8  2000-W02 2000       Unst  NA
    9  2000-W03 2000      Foula  NA
    10 2000-W03 2000  Fair Isle  NA
    ...

The IDs are for boat owners, which change over time. I have details for who owned which boat and when, and have created unique codes to add to the "fID" column (which I created and filled with NAs). For the purposes of this, let's say "aa" is the ID for the Mousa, "ab" for Yell, "ac" for Foula etc. in the rows above. If the owner of Mousa then buys a new boat which is in the dataframe, the "aa" would go with them and be assigned to the new boat name.

The week vector was created from actual dates, using the ISOweek function. The week vector is an ordered factor, so R knows the correct order from start to finish in the time series:

    creel$week <- as.ordered(creel$week)
    #Levels:2000-W01 < 2000-W02 < 2000-W03 < 2000-W04 < 2000-W05<...<2015-W53

I have tried to add the unique fID codes for one boat to begin with, but it did not work:

    creel$fID[which(creel$boat=="Mousa" & creel$week=>"2004-W53" & creel$week=<"2015-W53"),] <- as.factor("aa") 

"aa" is the code I'd like to assign in the fID vector, only between weeks 2004-W53 and 2015-W53. I wasn't sure if R would recognise > or < when using with the weeks - I did find that unclass(creel$week) gives actual values for week numbers which could maybe be used instead.

I also tried using ifelse, but only for boats where the owner did not change throughout the entire dataset (in which case the week is irrelevant). Something like this, (also didn't work!):

    creel$fID <- ifelse(creel$boat=="Unst", as.factor(creel$fID=="ad"), NA)

The dataset is very large, although I'm happy to do each person/boat combo individually if it's easier.

UPDATE: Here's an example of the other df I have, which details who owned which boat, and when:

        Person  code     boat1 date_from  date_to  boat2 date_from2 date_to2
    1      Bob    aa     Mousa  2002-W53 2005-W34   <NA>       <NA>     <NA>
    2     Bill    ab      Yell  1999-W52 2010-W52   <NA>       <NA>     <NA>
    3    James    ac     Foula  1999-W52 2005-W26  Mousa   2005-W35 2015-W53
    4      Tom    ad      Unst  1999-W52 2015-W53   <NA>       <NA>     <NA>
    5   Willie    ae    Fetlar  2007-W35 2015-W53   <NA>       <NA>     <NA>
    6    Wayne    af      Yell  2011-W01 2013-W13   <NA>       <NA>     <NA>

You can see that James owned the "Mousa" after Bob, and that Wayne owned the "Yell" after Bill. I need James' ID to remain as "ac" for the weeks he owned both the Foula and the Mousa (ie so I can trace the fisherman through time, not necessarily just the boat).

Here is what I would do, though, there likely are better methods. I use dplyr but only marginally to count to observations per week. I believe everything else is done in base R.

library(dplyr)  

creel$ref.week<- rep(1:length(unique(creel$week)), 
                     (creel %>% group_by(week) %>% summarise(n= n()))$n)
#add a reference column

creel.subset<-creel[creel$ref.week %in% c(1,2),]
#subset the weeks you want by that reference column. Obviously your 
#reference weeks will be different. 

creel.subset$fID<-with(creel.subset, ifelse(boat =="Mousa", "aa", 
                                            ifelse(boat == "Yell", "ab",
                                                   ifelse(boat == "Foula", "ac", NA))))
#name the fID's however you want. This is just example.

creel.subset

      week year       boat  fID   ref.week
1 2000-W01 2000      Mousa   aa        1
2 2000-W01 2000       Yell   ab        1
3 2000-W01 2000      Foula   ac        1
4 2000-W01 2000      Foula   ac        1
5 2000-W02 2000      Foula   ac        2
6 2000-W02 2000 Papa_Stour <NA>        2
7 2000-W02 2000     Fetlar <NA>        2
8 2000-W02 2000       Unst <NA>        2

If you want to put it all back together into one big data.frame :

creel.back_together<-rbind(creel.subset, creel[!creel$ref.week %in% c(1,2),])
creel.back_together
       week year       boat  fID   ref.week
1  2000-W01 2000      Mousa   aa        1
2  2000-W01 2000       Yell   ab        1
3  2000-W01 2000      Foula   ac        1
4  2000-W01 2000      Foula   ac        1
5  2000-W02 2000      Foula   ac        2
6  2000-W02 2000 Papa_Stour <NA>        2
7  2000-W02 2000     Fetlar <NA>        2
8  2000-W02 2000       Unst <NA>        2
9  2000-W03 2000      Foula <NA>        3
10 2000-W03 2000  Fair_Isle <NA>        3

Edit: I spent over an hour trying to figure out how to make this work with the ISOweek values, but no luck. I definitely think this would be easier to do dealing with regular date values. Here is my solution with the additional data.frame you provided which i called mydata , though it ended up becoming mydata3 . I do expect this to be fairly slow for large data sets, but I am pretty sure it does what you are wanting:

library(ISOweek)
library(lubridate)
library(data.table)

fullWeek<-function(x){
  paste(x, "-1", sep = "")
}

creel$week<-as.character(creel$week)
creel$week<-fullWeek(creel$week)
creel$week<-ISOweek2date(creel$week)
creel$week<-as_date(ymd(creel$week))

mydata1<-mydata[,1:5]
mydata2<-mydata[,c(1:2,6:8)]
colnames(mydata2)<-colnames(mydata1)
mydata3<-na.omit(rbind(mydata1, mydata2))
mydata3[,4:5]<-sapply(mydata3[,4:5], fullWeek)
mydata3[,4:5]<-lapply(mydata3[,4:5], ISOweek2date)
mydata3[,4:5]<-lapply(mydata3[,4:5], function(x) as_date(ymd(x)))
## undoing all of the ISOweek nonsense

for(i in 1:nrow(mydata3)){
  boat1<-mydata3[i,]$boat1
  date_from<-mydata3[i,]$date_from
  date_to<-mydata3[i,]$date_to
  code<-mydata3[i,]$code

  for(j in 1:nrow(creel)){
    boat2<-creel[j,]$boat
    date<-creel[j,]$week

  if(boat1 == boat2 && date %between% c(date_from, date_to)) {
    creel[j,]$fID<-code
    }
  }
}

creel

      week year       boat  fID
2000-01-03 2000      Mousa <NA>
2000-01-03 2000       Yell   ab
2000-01-03 2000      Foula   ac
2000-01-03 2000      Foula   ac
2000-01-10 2000      Foula   ac
2000-01-10 2000 Papa_Stour <NA>
2000-01-10 2000     Fetlar <NA>
2000-01-10 2000       Unst   ad
2000-01-17 2000      Foula   ac
2000-01-17 2000  Fair_Isle <NA>

Now, if you decide you want to work with the ISOweek dates out of convenience, then:

creel$week<-ISOweek(creel$week)
creel

    week year       boat  fID
2000-W01 2000      Mousa <NA>
2000-W01 2000       Yell   ab
2000-W01 2000      Foula   ac
2000-W01 2000      Foula   ac
2000-W02 2000      Foula   ac
....

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM