Get the column name and assign as value in unlisted column in the dataframe using r
Assign column with value in the dataset itself using R
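I am new to R. I have a column with about 26,000 entries, of which about 1,200 are unique. Let's assume the column is named "Breed".
What I need:
I need to get the frequency of each unique value in the column.
I extracted BreedType and its frequency as shown below. (The breed column is named BreedType.)
Then, using an if condition, I need to create a new column that holds 'F' when a BreedType's frequency is less than 50, and the value of BreedType when it is greater than 50.
Here is what I tried.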
x<- sort(table(full$Breed),decreasing=T)
w=as.data.frame(x)
names(w)[1] = 'BreedType'
w$TrueFalse<-ifelse(w$Freq<50,F,w$BreedType)
w$TrueFalse
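But the output is not what I expected. Although F is assigned correctly to each row, w$BreedType does not yield the BreedType value; instead it gives integers incrementing by 1 rather than the specific BreedType.
Can someone explain why the output is not as expected?
The Breed column looks like the following in the dataset, with 20,000 rows and 1,200 unique values.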
Breed
Shetland Sheepdog Mix
Domestic Shorthair Mix
Pit Bull Mix
Domestic Shorthair Mix
Lhasa Apso/Miniature Poodle
Cairn Terrier/Chihuahua Shorthair
Domestic Shorthair Mix
Domestic Shorthair Mix
American Pit Bull Terrier Mix
Cairn Terrier
Domestic Shorthair Mix
Miniature Schnauzer Mix
Pit Bull Mix
Yorkshire Terrier Mix
Great Pyrenees Mix
Domestic Shorthair Mix
Domestic Shorthair Mix
Pit Bull Mix
Angora Mix
Flat Coat Retriever Mix
Queensland Heeler Mix
Domestic Shorthair Mix
Plott Hound/Boxer
My expected result is:
BreedType               Frequency  TrueFalse
Shetland Sheepdog Mix   60         Shetland Sheepdog Mix
Domestic Shorthair Mix  20         F
Pit Bull Mix            80         Pit Bull Mix
Domestic Shorthair Mix  10         F
Original data, the full data frame:
> full
# Breed
# 1: Shetland Sheepdog Mix
# 2: Domestic Shorthair Mix
# 3: Pit Bull Mix
# 4: Domestic Shorthair Mix
# 5: Lhasa Apso/Miniature Poodle
# 6: Cairn Terrier/Chihuahua Shorthair
# 7: Domestic Shorthair Mix
# 8: Domestic Shorthair Mix
# 9: American Pit Bull Terrier Mix
# 10: Cairn Terrier
# 11: Domestic Shorthair Mix
# 12: Miniature Schnauzer Mix
# 13: Pit Bull Mix
# 14: Yorkshire Terrier Mix
# 15: Great Pyrenees Mix
# 16: Domestic Shorthair Mix
# 17: Domestic Shorthair Mix
# 18: Pit Bull Mix
# 19: Angora Mix
# 20: Flat Coat Retriever Mix
# 21: Queensland Heeler Mix
# 22: Domestic Shorthair Mix
# 23: Plott Hound/Boxer
# Breed
Load the data.table library into the workspace:
library("data.table")
Convert the full data frame to a data table by reference:
setDT(full)
Copy full to dt1. This keeps full as a backup of the original data table.
dt1 <- copy(full)
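Group dt1 by the Breed column, then use the special variable .N, which holds the number of rows in each group, and apply an ifelse condition to it. The results are stored in the Frequency and TrueFalse columns.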
dt1[, c("Frequency", "TrueFalse") := .(.N, ifelse(.N < 50, FALSE, Breed)), by = Breed]
Print dt1 after the step above:
> dt1
# Breed Frequency TrueFalse
# 1: Shetland Sheepdog Mix 1 FALSE
# 2: Domestic Shorthair Mix 8 FALSE
# 3: Pit Bull Mix 3 FALSE
# 4: Domestic Shorthair Mix 8 FALSE
# 5: Lhasa Apso/Miniature Poodle 1 FALSE
# 6: Cairn Terrier/Chihuahua Shorthair 1 FALSE
# 7: Domestic Shorthair Mix 8 FALSE
# 8: Domestic Shorthair Mix 8 FALSE
# 9: American Pit Bull Terrier Mix 1 FALSE
# 10: Cairn Terrier 1 FALSE
# 11: Domestic Shorthair Mix 8 FALSE
# 12: Miniature Schnauzer Mix 1 FALSE
# 13: Pit Bull Mix 3 FALSE
# 14: Yorkshire Terrier Mix 1 FALSE
# 15: Great Pyrenees Mix 1 FALSE
# 16: Domestic Shorthair Mix 8 FALSE
# 17: Domestic Shorthair Mix 8 FALSE
# 18: Pit Bull Mix 3 FALSE
# 19: Angora Mix 1 FALSE
# 20: Flat Coat Retriever Mix 1 FALSE
# 21: Queensland Heeler Mix 1 FALSE
# 22: Domestic Shorthair Mix 8 FALSE
# 23: Plott Hound/Boxer 1 FALSE
# Breed Frequency TrueFalse
None of the breeds in the data you provided has a frequency of 50 or more. If one did, the ifelse statement would assign the breed name instead of FALSE.
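One caveat when FALSE and a breed name share a column: a grouped `:=` in data.table generally expects every group to return the same type for a given column, so it may be safer to compute the count first and then label the rows with a character placeholder. A minimal sketch, assuming a small made-up table `dt` and a hypothetical `threshold` of 3 so that the toy data exercises both branches:

```r
library(data.table)

# Toy data (made up for illustration): 8 + 3 + 1 rows
dt <- data.table(Breed = c(rep("Domestic Shorthair Mix", 8),
                           rep("Pit Bull Mix", 3),
                           "Cairn Terrier"))
threshold <- 3L  # hypothetical cutoff; the question uses 50

# Step 1: store the frequency of each row's breed
dt[, Frequency := .N, by = Breed]

# Step 2: keep the breed name for frequent breeds and use the string "F"
# otherwise, so the column is character throughout
dt[, TrueFalse := ifelse(Frequency < threshold, "F", Breed)]
```

Splitting the work into two ungrouped-friendly steps means the ifelse call sees whole columns rather than one group at a time, which sidesteps any per-group type mismatch.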
This assumes you have already computed the frequency of each BreedType. It is similar to @Sathish's answer, but uses a data.frame instead of a data.table.
testData <- data.frame(BreedType = c("Shetland Sheepdog Mix", "Domestic Shorthair Mix", "Pit Bull Mix", "Domestic Shorthair Mix"),
Frequency = c(60, 20, 80, 10), stringsAsFactors = F)
testData$TrueFalse <- testData$BreedType
testData$TrueFalse[testData$Frequency < 50] <- F
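The output matches what you expect. However, "FALSE" is stored as a string (not a boolean), because the column was initialized as a character vector. I am not sure whether you can mix booleans and strings.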
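On that last point: an atomic vector in R holds a single type, so a logical value assigned into a character vector is coerced to the string "FALSE". A small base R illustration (the vector `x` and the list `mixed` are made up for this sketch):

```r
x <- c("Shetland Sheepdog Mix", "Domestic Shorthair Mix")

# Assigning a logical into a character vector coerces it to a string
x[2] <- FALSE
class(x)                   # the vector stays character
identical(x[2], "FALSE")   # the stored value is the string "FALSE"

# A list, by contrast, can hold elements of different types
mixed <- list("Pit Bull Mix", FALSE)
sapply(mixed, class)
```

Since a data frame column is normally an atomic vector, using a string placeholder such as "F" keeps the column's type explicit.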
You can use the count function from the plyr package. Here is an example using the data you provided.
> library(plyr)
> df <- read.table(text = "Shetland Sheepdog Mix
Domestic Shorthair Mix
Pit Bull Mix
Domestic Shorthair Mix
Lhasa Apso/Miniature Poodle
Cairn Terrier/Chihuahua Shorthair
Domestic Shorthair Mix
Domestic Shorthair Mix
American Pit Bull Terrier Mix
Cairn Terrier
Domestic Shorthair Mix
Miniature Schnauzer Mix
Pit Bull Mix
Yorkshire Terrier Mix
Great Pyrenees Mix
Domestic Shorthair Mix
Domestic Shorthair Mix
Pit Bull Mix
Angora Mix
Flat Coat Retriever Mix
Queensland Heeler Mix
Domestic Shorthair Mix
Plott Hound/Boxer", sep='\n', stringsAsFactors = F, col.names = c('Breed'))
Use the plyr::count function:
> df <- count(df, 'Breed')
> df
## Breed freq
## 1 American Pit Bull Terrier Mix 1
## 2 Angora Mix 1
## 3 Cairn Terrier 1
## 4 Cairn Terrier/Chihuahua Shorthair 1
## 5 Domestic Shorthair Mix 8
## 6 Flat Coat Retriever Mix 1
## ...
## ...
> df$TrueFalse <- ifelse(df$freq >= 3, df$Breed, F)
> df
## Breed freq TrueFalse
## 1 American Pit Bull Terrier Mix 1 FALSE
## 2 Angora Mix 1 FALSE
## 3 Cairn Terrier 1 FALSE
## 4 Cairn Terrier/Chihuahua Shorthair 1 FALSE
## 5 Domestic Shorthair Mix 8 Domestic Shorthair Mix
## 6 Flat Coat Retriever Mix 1 FALSE
Well, you can also use base R's table to get the frequencies:
new_df <- data.frame(table(df$Breed))
# Var1 Freq
#1 American Pit Bull Terrier Mix 1
#2 Angora Mix 1
#3 Cairn Terrier 1
#4 Cairn Terrier/Chihuahua Shorthair 1
#5 Domestic Shorthair Mix 8
#6 Flat Coat Retriever Mix 1
#7 Great Pyrenees Mix 1
#8 Lhasa Apso/Miniature Poodle 1
#9 Miniature Schnauzer Mix 1
#10 Pit Bull Mix 3
#11 Plott Hound/Boxer 1
#12 Queensland Heeler Mix 1
#13 Shetland Sheepdog Mix 1
#14 Yorkshire Terrier Mix 1
Then use ifelse to fill in the TrueFalse column:
new_df$TrueFalse <- ifelse(new_df$Freq > 2, as.character(new_df$Var1), "F")
# Var1 Freq TrueFalse
#1 American Pit Bull Terrier Mix 1 F
#2 Angora Mix 1 F
#3 Cairn Terrier 1 F
#4 Cairn Terrier/Chihuahua Shorthair 1 F
#5 Domestic Shorthair Mix 8 Domestic Shorthair Mix
#6 Flat Coat Retriever Mix 1 F
#7 Great Pyrenees Mix 1 F
#8 Lhasa Apso/Miniature Poodle 1 F
#9 Miniature Schnauzer Mix 1 F
#10 Pit Bull Mix 3 Pit Bull Mix
#11 Plott Hound/Boxer 1 F
#12 Queensland Heeler Mix 1 F
#13 Shetland Sheepdog Mix 1 F
#14 Yorkshire Terrier Mix 1 F
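The as.character call is what addresses the integers seen in the question: a data frame built from table() stores the category column as a factor, and ifelse drops the factor attributes, leaving only the underlying integer codes. A minimal sketch with made-up data (the names `breed`, `w`, `codes` and `labels`, and the cutoff of 2, are assumptions for illustration):

```r
# as.data.frame(table(...)) stores the category column as a factor
breed <- c("Pit Bull Mix", "Angora Mix", "Pit Bull Mix", "Pit Bull Mix")
w <- as.data.frame(table(breed))
is.factor(w$breed)

# Passing the factor straight to ifelse() yields integer codes, not labels
codes <- ifelse(w$Freq < 2, FALSE, w$breed)

# Converting to character first keeps the breed names
labels <- ifelse(w$Freq < 2, "F", as.character(w$breed))
```

Here `codes` is numeric while `labels` contains "F" for the rare breed and the breed name for the frequent one.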
If we need aggregated output, then:
library(data.table)
setDT(df)[, .(Frequency = .N, TrueFalse = .N > 55), by = Breed]