
How to edit “row.names” after split and cut2 in R?

I want to strip part of the row names that are generated automatically after using split and cut2. See the following code:

#Mock data
date_time <- as.factor(c('8/24/07 17:30','8/24/07 18:00','8/24/07 18:30',
                        '8/24/07 19:00','8/24/07 19:30','8/24/07 20:00',
                        '8/24/07 20:30','8/24/07 21:00','8/24/07 21:30',
                        '8/24/07 22:00','8/24/07 22:30','8/24/07 23:00',
                        '8/24/07 23:30','8/25/07 00:00','8/25/07 00:30'))
U. <- as.numeric(c('0.2355','0.2602','0.2039','0.2571','0.1419','0.0778','0.3557',
                 '0.3065','0.1559','0.0943','0.1519','0.1498','0.1574','0.1929'
                 ,'0.1407'))

#Mock data frame
test_data <- data.frame(date_time,U.)


#To use cut2
library(Hmisc)

#Splitting the data into categories
sub_data <- split(test_data,cut2(test_data$U.,c(0,0.1,0.2)))
new_data <- do.call("rbind",sub_data)
test_data <- new_data

After this, the row names of "test_data" (shown as a "row.names" column in the data viewer) have values such as "[0.000,0.100).6", "[0.000,0.100).10", etc.

How do I remove the "[0.000,0.100)." prefix and keep only the number after the ".", such as 6 and 10, so that I can reference these rows by their original row number later?

Is there a better way to do this?

You could use a Regular Expression (Regex), as follows:

rownames(test_data) = gsub(".*[]\\)]\\.", "", rownames(test_data))

It's cryptic if you're not familiar with Regular Expressions, but it basically says: match any sequence of characters ( .* ) that is followed by either a closing square bracket or a closing parenthesis ( []\\)] ) and then by a period ( \\. ), and remove all of it.

The double backslashes are "escapes". In an R string, \\ stands for a single backslash, which tells the Regex engine that the character following it should be interpreted literally, rather than in its special Regex meaning (e.g., . means "match any single character", but \\. means "this is really just a period").
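To see the pattern at work without needing Hmisc, here is a minimal sketch that applies it to a few hard-coded row names. The strings in rn are assumed values written in the same format as the ones in the question, not output taken from the real data; sub() is used instead of gsub() because each name only needs one replacement.

```r
# Row names in the style produced by rbind-ing a split list
# (illustrative values, assumed for this example)
rn <- c("[0.000,0.100).6", "[0.000,0.100).10",
        "[0.100,0.200).5", "[0.200,0.356].1")

# Drop everything up to and including the ")." or "]." that ends the label
stripped <- sub(".*[]\\)]\\.", "", rn)
print(stripped)
# [1] "6"  "10" "5"  "1"

# Convert to integer if the values will be used as row indices
original_rows <- as.integer(stripped)
```

Because .* is greedy, the match runs to the last ")." or "]." in each name, so the periods inside the interval bounds are not mistaken for the separator.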

Just for fun, you can also use regmatches to extract the trailing digits directly:

> Names <- rownames(test_data)
> ( rownames(test_data) <- regmatches(Names, regexpr("[0-9]+$", Names))  )
 [1] "6"  "10" "5"  "9"  "11" "12" "13" "14" "15" "1"  "2"  "3"  "4"  "7"  "8" 

You could also set the names of sub_data to NULL before calling rbind, so that no prefix is added in the first place.

names(sub_data) <- NULL     
test_data <- do.call('rbind', sub_data)
row.names(test_data)
#[1] "6"  "10" "5"  "9"  "11" "12" "13" "14" "15" "1"  "2"  "3"  "4"  "7"  "8" 
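The reason this works is that rbind builds the prefix from the names of the list elements; with no names there is nothing to prepend. Below is a small base-R sketch of the difference. It uses cut() in place of Hmisc's cut2() so that it runs without extra packages, and the data frame df and its values are made up for illustration.

```r
# Toy data (hypothetical values); every group ends up with two rows
df <- data.frame(x = c(0.25, 0.05, 0.15, 0.08, 0.12, 0.27))

# Base-R cut() stands in for Hmisc::cut2() here
groups <- cut(df$x, breaks = c(0, 0.1, 0.2, 0.3), right = FALSE)
pieces <- split(df, groups)

# With a named list, rbind prefixes each row name with the group label
named <- do.call(rbind, pieces)
print(rownames(named))

# With the names removed, the original row numbers are kept as they are
names(pieces) <- NULL
unnamed <- do.call(rbind, pieces)
print(rownames(unnamed))
# [1] "2" "4" "3" "5" "1" "6"
```

The rows still come out grouped by interval, so the only thing that changes is the row names.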
