R - Categorize a dataset

Morning folks,

I'm trying to categorize a set of numerical values (days left divided by 365.2, which gives approximately the number of years left until maturity).

The result of this first calculation is a vector of 3560 values (for example: 0.81, 1.65, 3.26, [...], 0.2).

I'd like to categorise these results into intervals: [between 0 and 1 year, 0 and 2 years, 0 and 3 years, 0 and 4 years, over 4 years].

# Set up the data frame
dfMaturity <- data.frame(Maturity = DATA$Maturity)

# Load plyr and count the rows per maturity date
library(plyr)
MaturityX <- ddply(dfMaturity, .(Maturity), nrow)

# Set up the data frame
dfMaturityID <- data.frame(testttto = DATA$Security.Name)

# Count the rows per security name
MaturityID <- ddply(dfMaturityID, .(dfMaturityID$testttto), nrow)

survey <- data.frame(date=c(DATA$Maturity),tx_start=c("1/1/2022"))

survey$date_diff <- as.Date(as.character(survey$date), format="%m/%d/%Y")-
  as.Date(as.character(survey$tx_start), format="%m/%d/%Y")

# Data for the table
MaturityName <- MaturityID$`dfMaturityID$testttto`
MaturityZ <- survey$date
TimeToMaturity <- as.numeric(survey$date_diff)

# /!/ HERE IS WHERE I NEED HELP /!/ I'M TRYING TO CATEGORISE THE RESULTS OF THIS CALCULATION
Multiplier <- TimeToMaturity /365.2
cx <- cut(Multiplier, breaks=0:5)

The original data comes from an Excel file (DATA$Maturity).

In case it helps, here is the output of print(Multiplier):

   [1]  0.4956188  1.4950712  1.9989047  0.2464403  0.9994524  3.0010953  5.0000000  7.0016429  9.0005476
  [10] 21.0021906  4.1621030 13.1626506  1.1610077  8.6664841 28.5377875  3.1626506  6.7497262  2.0920044
  [19]  2.5602410  4.6495071  0.3368018  6.3225630  8.7130340 10.4956188  3.9019715 12.7957284  5.8378970

I copied only the first three lines of the output; there are 3560 values in total.

I'm open to any kind of help, I just want it to work :) Thank you!

The cut function does that:

example <- c(0.81, 1.65, 3.26, 0.2)

cut(example, breaks = c(0, 1, 2, 3, 4), 
    labels = c("newborn", "one year old", "two", "three"))
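With breaks that stop at 4, any value above 4 falls outside every interval and comes back as NA. Here is a minimal sketch that adds Inf as the last break, assuming the "over 4 years" bucket from the question should catch those values. The sample values, the variable names and the label strings are made up for illustration.

```r
# Hypothetical sample values, in the spirit of the Multiplier vector
years_left <- c(0.81, 1.65, 3.26, 0.2, 5, 21.0)

# Inf as the last break catches everything above 4 instead of returning NA.
# Intervals are right-closed by default: (0,1], (1,2], ..., (4,Inf]
maturity_bucket <- cut(
  years_left,
  breaks = c(0, 1, 2, 3, 4, Inf),
  labels = c("0-1 year", "1-2 years", "2-3 years", "3-4 years", "Over 4 years")
)

# Number of values per bucket
table(maturity_bucket)
```

A value of exactly 0 would still be NA with these settings; passing include.lowest = TRUE to cut makes the first interval include 0.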

Edit: From the comment

I'd then like to create a table showing, for example, that 30% of the objects have a maturity between 0 and 1 year

You could compute that using the function below:

example <- c(0.81, 1.65, 3.26, 0.2)

share <- function(x, lower = 0, higher= 1){
  x <- na.omit(x)
  sum((lower <= x) & (x < higher))/length(x)
}

share(1:10, lower = 0, higher = 3.5) # true for 1:3 out of 1:10, so 30%

share(1:10, lower = 4.5, higher = 5.5) # true only for 5, so 10%

share(example, 0, 3) # 0.81, 1.65 and 0.2 fall in [0, 3), so 0.75
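To get the whole table in one step instead of calling the function once per interval, one option is to combine cut with table and prop.table. This is only a sketch, assuming non-overlapping one-year buckets plus an open-ended last one; the bucket labels are illustrative. Since the intervals in the question all start at 0, cumsum then turns the per-bucket shares into cumulative ones.

```r
example <- c(0.81, 1.65, 3.26, 0.2)

# right = FALSE gives left-closed intervals [0,1), [1,2), ...,
# the same convention as lower <= x & x < higher
buckets <- cut(
  example,
  breaks = c(0, 1, 2, 3, 4, Inf),
  labels = c("0-1", "1-2", "2-3", "3-4", "Over 4"),
  right = FALSE
)

# Share of values per bucket (NA values are dropped by table)
shares <- prop.table(table(buckets))
shares

# Cumulative shares: 0 to 1 year, 0 to 2 years, 0 to 3 years, ...
cumsum(shares)
```

On the real data, Multiplier would take the place of example.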
