
Produce new column in data frame by assigning values based on quantiles in R?

Let's make a dummy vector called INCOME: INCOME <- rnorm(1:1000, 500, 100)

Then let's take quantiles using the function 'quantile': INCOME_QUANTILES <- quantile(INCOME, probs=c(0.05, 0.50, 1.00))

Now I want to make a new vector called INCOME QUANTILE and attach it to my vector INCOME to create a data frame with 2 columns (INCOME / INCOME QUANTILE) and 1000 observations. The new vector should hold a value of 1, 2, or 3, depending on which income quantile the observation falls into: 1 = 0.05 quantile, 2 = 0.50 quantile, and 3 = 1.00 quantile.

So for example, if the first observation of income falls into the 1.00 quantile, and the second observation falls into the 0.50 quantile, it'll look like:

INCOME   INCOME QUANTILE
550.50         3
415.20         2

A friend suggested writing a for loop, but I'm honestly not sure how to go about that. Any help would be much appreciated!

You can try this:

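# Draw 1000 incomes (rnorm(1:1000, ...) also returns 1000 values, but passing the count is clearer)
INCOME <- rnorm(1000, 500, 100)
# Include the 0 quantile (the minimum) so the first interval has a lower bound
INCOME_QUANTILES <- quantile(INCOME, probs=c(0, 0.05, 0.50, 1.00))

# cut() bins each income by the quantile breaks; as.numeric() turns the factor levels into 1, 2, 3
df <- data.frame(INCOME, 
                 INCOME_GROUP = as.numeric(cut(INCOME, breaks = INCOME_QUANTILES, include.lowest = TRUE)))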

Note that I had to add 0 as the lowest quantile so that the first interval has a lower bound. The groups are therefore: 0 to 0.05 = 1, above 0.05 to 0.50 = 2, above 0.50 = 3.
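To see why the 0 quantile matters, here is a small sketch comparing the two sets of breaks. The seed and the variable names no_zero, with_zero, n_missing and group_sizes are my own choices for illustration; labels = FALSE is used here so that cut returns integer codes directly instead of a factor.

```r
set.seed(1)  # assumption: a fixed seed, only so the run is reproducible
INCOME <- rnorm(1000, mean = 500, sd = 100)

# Without the 0 quantile there are only two intervals, so incomes at or
# below the 5% quantile fall outside every interval and become NA.
no_zero <- cut(INCOME,
               breaks = quantile(INCOME, probs = c(0.05, 0.50, 1.00)),
               labels = FALSE)
n_missing <- sum(is.na(no_zero))

# With the 0 quantile and include.lowest = TRUE every income gets a group.
with_zero <- cut(INCOME,
                 breaks = quantile(INCOME, probs = c(0, 0.05, 0.50, 1.00)),
                 labels = FALSE, include.lowest = TRUE)
group_sizes <- table(with_zero)

print(n_missing)    # number of incomes left without a group
print(group_sizes)  # how many incomes landed in groups 1, 2 and 3
```

With 1000 draws, group 1 should hold about 5% of the rows, group 2 about 45% and group 3 about 50%, while the version without the 0 quantile leaves the bottom 5% or so as NA.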
