
Conditional rolling string concat in data.table

I have a data.table obtained from a somewhat quirky file:

library(data.table)

istub  <- setDT(read.fwf( 'http://www.bls.gov/cex/pumd/2016/csxistub.txt', 
                          widths=c(2,3,64,12,2,3,10), skip=1,
                          stringsAsFactors=FALSE, strip.white=TRUE,
                          col.names = c( "type", "level", "title", "UCC", 
                                         "survey", "factor","group" )
                ) )

One of the quirks of the file is that if type==2, the row merely holds a continuation of the previous row's title field.

So, I want to append the continuation title to the previous row's title. I assume there is only ever one continuation line per ordinary line.

For each example, please begin with:

df <- copy(istub) # avoids extra requests of file
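If the BLS file is not reachable, a small stand-in table can be used to try the examples instead. The sketch below is hypothetical toy data (the name toy and its rows are made up for illustration, not taken from csxistub.txt); it only reproduces the quirk that a type==2 row continues the title of the row above it.

```r
library(data.table)

# Hypothetical toy data that mimics the continuation-line quirk
toy <- data.table(
  type  = c(1L, 1L, 2L, 1L, 1L, 2L),
  title = c("Food",
            "School books, supplies, equipment for vocational and",
            "technical schools",
            "Housing",
            "Other lodging away from",
            "home")
)

# Use this in place of copy(istub) when working offline
df <- copy(toy)

# Rows that hold continuation titles
continued <- which(df$type == 2)
```

With this table, rows 3 and 6 are the continuation lines and rows 2 and 5 hold the incomplete titles.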

Base R solution: (desired result)

I know I can do:

# if type == 2, "title" field should be appended to the above row's "title" field
continued <- which(df$type==2)

# You can see that these titles are incomplete,
#  e.g., "School books, supplies, equipment for vocational and"  
tail(df$title[continued-1])

df$title[continued-1] <- paste(df$title[continued-1],df$title[continued])

# Now they're complete
# e.g., "School books, supplies, equipment for vocational and technical schools"    
tail(df$title[continued-1])

# And we could get rid of the continuation lines
df <- df[-continued]

However, I would like to practice some data.table fu.

Attempts using data.table

First I tried using shift() in i to subset the rows, but that didn't work:

df[shift(type, type='lead')==2, 
     title := paste(title, shift(title, type='lead') ) ] # doesn't work
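A likely reason this fails: when i selects rows, j is evaluated only on those rows, so the shift() inside j leads within the subset rather than within the full table. The sketch below shows this on hypothetical toy data (toy, sel and in_subset are illustrative names).

```r
library(data.table)

# Hypothetical toy data: rows 3 and 6 are continuation lines
toy <- data.table(
  type  = c(1L, 1L, 2L, 1L, 1L, 2L),
  title = c("a", "b", "b-cont", "c", "d", "d-cont")
)

# Rows selected by i: those whose next row is a continuation (rows 2 and 5).
# The NA produced for the last row is treated as FALSE in i.
sel <- toy[shift(type, type = "lead") == 2, which = TRUE]

# Inside j, shift() only sees the selected rows, so the "lead" of row 2
# is row 5's title, and the last selected row gets NA
in_subset <- toy[sel, shift(title, type = "lead")]
in_subset
```

So row 2 would be pasted with row 5's title instead of its own continuation in row 3.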

This works:

# %in% treats the NA from the last row's lead as FALSE, so that row keeps its title
df[,title := ifelse( shift(type, type='lead') %in% 2,
                     paste(title, shift(title, type='lead')),
                     title ) ]

Am I stuck with two shifts (seems inefficient) or is there an awesomer way?

I was able to do it with a shift()-ed ifelse().

# fill='' keeps paste0() from appending "NA" to the last row's title
df[, title := paste0(title, shift( ifelse(type==2, paste0(' ',title), ''),
                                   type='lead', fill='')
                     ) ]
df <- df[type==1] # can get rid of continuation lines

It seems kind of hacky, paste0-ing a mostly empty string vector, so improvements welcome.
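One thing to watch with a lead shift: the last row has no successor, so shift() fills that position with NA by default, and paste0() turns NA into the literal text "NA". A minimal sketch on hypothetical toy data (toy, default_fill and empty_fill are illustrative names), comparing the default with fill = "":

```r
library(data.table)

# Hypothetical toy data: row 2 continues row 1, row 3 is the last row
toy <- data.table(type = c(1L, 2L, 1L), title = c("a", "a-cont", "b"))

# Continuation text to append, empty for ordinary rows
suffix <- toy[, ifelse(type == 2, paste0(" ", title), "")]

# Default fill: the lead of the last row is NA, which paste0() prints as "NA"
default_fill <- paste0(toy$title, shift(suffix, type = "lead"))

# fill = "": the last row's title is left unchanged
empty_fill <- paste0(toy$title, shift(suffix, type = "lead", fill = ""))
```

Here default_fill ends in "bNA" while empty_fill ends in "b".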

ifelse is pretty much always avoidable and worth avoiding.**

I'd probably do...

# back up the data before editing values
df0 = copy(df)

# find rows
w = df[type == 2, which = TRUE]

# edit at rows up one
stopifnot(all(w > 1))
df[w-1, title := paste(title, df$title[w])]

# drop rows
res = df[-w]
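If the assumption of one continuation line per ordinary line ever fails, a different technique, grouping with cumsum(), may be worth a look. This is only a sketch on hypothetical toy data (toy, grp and res are illustrative names), and it keeps just the columns listed in j, so the other columns of the real table would need to be added there.

```r
library(data.table)

# Hypothetical toy data: row 2 has two continuation lines
toy <- data.table(
  type  = c(1L, 1L, 2L, 2L, 1L),
  title = c("a", "b", "b-cont1", "b-cont2", "c")
)

# Each non-continuation row starts a new group; continuation rows inherit it
toy[, grp := cumsum(type != 2)]

# Keep the first row's type and glue all titles within the group together
res <- toy[, .(type = type[1L], title = paste(title, collapse = " ")), by = grp]

# The helper column is no longer needed
res[, grp := NULL]
res
```

This collapses each run of continuation lines into the ordinary row that precedes it and drops the continuation rows in the same step.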

** Some examples...

