
How do I specify the rows that have additional pieces while using separate() in R?

I'm using tidyr to clean up my data, like this:

df <- data.frame(Time = c("2014-01-03", "2014-01-04-morning", "01-06", "2014-01-07"), stringsAsFactors = FALSE)

str(df)
'data.frame':   4 obs. of  1 variable:
 $ Time: chr  "2014-01-03" "2014-01-04-morning" "01-06" "2014-01-07"

Then, when I use separate(), I get:

separate(df, Time, c("Y", "M", "D"), sep = '-')
     Y  M    D
1 2014 01   03
2 2014 01   04
3   01 06 <NA>
4 2014 01   07

Warning messages:
1: Expected 3 pieces. Additional pieces discarded in 1 rows [2].
2: Expected 3 pieces. Missing pieces filled with NA in 1 rows [3].

How can I get the list of rows that have additional pieces (in this example, [2])?

One option is to convert the column to the 'Date' class with anydate() from the anytime package. It handles most common date formats, but there are edge cases: "01-06", for example, is not a complete date because it lacks the year component.

library(tidyverse)
library(anytime)
df %>% 
   mutate(DATE = anydate(DATE)) %>% 
   separate(DATE, into = c("Y", "M", "D"))

Update

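If we need a flag column that marks the rows with additional pieces:

df %>%
   # TRUE when the string contains more than 3 alphanumeric pieces
   mutate(flag = str_count(DATE, "\\w+") > 3) %>%
   separate(DATE, into = c("Y", "M", "D"))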
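
To get the row numbers themselves rather than a flag column, the pieces can be counted before calling separate(). The following is a minimal base R sketch using the data from the question; the names n_pieces, extra_rows and missing_rows are made up for illustration, and it assumes "-" is the only separator.

```r
# Data from the question
df <- data.frame(Time = c("2014-01-03", "2014-01-04-morning", "01-06", "2014-01-07"),
                 stringsAsFactors = FALSE)

# Split each string on a literal "-" and count the resulting pieces
n_pieces <- lengths(strsplit(df$Time, "-", fixed = TRUE))

# Rows with more than 3 pieces (the "additional pieces" rows)
extra_rows <- which(n_pieces > 3)

# Rows with fewer than 3 pieces (the "missing pieces" rows)
missing_rows <- which(n_pieces < 3)

extra_rows
missing_rows
```

For this data, extra_rows is 2 and missing_rows is 3, matching the row numbers in the warnings.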

data

df <- data.frame(DATE = c("2014-01-03", "2014-01-04-A", "01-06", 
       "2014-01-07"), stringsAsFactors = FALSE)
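
Another possibility, sketched here on the data from the question, is to let separate() keep the additional pieces instead of discarding them, using its extra and fill arguments. The extra text then stays in the last column, where it can be detected. The names out and extra_rows are made up for illustration, and the sketch assumes "-" is the only separator.

```r
library(tidyr)

# Data from the question
df <- data.frame(Time = c("2014-01-03", "2014-01-04-morning", "01-06", "2014-01-07"),
                 stringsAsFactors = FALSE)

# extra = "merge" keeps any additional pieces in the last column;
# fill = "right" pads missing pieces with NA without a warning
out <- separate(df, Time, c("Y", "M", "D"), sep = "-",
                extra = "merge", fill = "right")

# Rows whose last column still contains a separator had additional pieces
extra_rows <- which(grepl("-", out$D, fixed = TRUE))

out
extra_rows
```

With these settings, the D column of row 2 holds "04-morning" rather than "04", so the original information is still available for cleanup.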
