Assign a column a value from the dataset itself using R (get the column name and assign it as the value of a new column in the data frame)
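I am new to R. I have a column containing about 26,000 values, of which about 1,200 are unique. Let's assume the column is named "Breed".
What I need is the frequency of each unique value in the column.
I extracted the BreedType and frequency as shown below. (The breed column is named BreedType.)
Then, using an if condition, I need to create a new column that holds 'F' if the frequency of a BreedType is less than 50, and the value of 'BreedType' if it is greater than 50.
Here is what I tried.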
x<- sort(table(full$Breed),decreasing=T)
w=as.data.frame(x)
names(w)[1] = 'BreedType'
w$TrueFalse<-ifelse(w$Freq<50,F,w$BreedType)
w$TrueFalse
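But the output is not what I expect. Although F is assigned correctly, w$BreedType does not return the value of BreedType; instead I get integers that increase by 1 rather than the specific BreedType.
Can someone explain why the output is not as expected?
The Breed column looks like the following in the dataset, which has 20,000 rows and 1,200 unique values.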
Breed
Shetland Sheepdog Mix
Domestic Shorthair Mix
Pit Bull Mix
Domestic Shorthair Mix
Lhasa Apso/Miniature Poodle
Cairn Terrier/Chihuahua Shorthair
Domestic Shorthair Mix
Domestic Shorthair Mix
American Pit Bull Terrier Mix
Cairn Terrier
Domestic Shorthair Mix
Miniature Schnauzer Mix
Pit Bull Mix
Yorkshire Terrier Mix
Great Pyrenees Mix
Domestic Shorthair Mix
Domestic Shorthair Mix
Pit Bull Mix
Angora Mix
Flat Coat Retriever Mix
Queensland Heeler Mix
Domestic Shorthair Mix
Plott Hound/Boxer
My expected result is
BreedType                 Frequency   TrueFalse
Shetland Sheepdog Mix     60          Shetland Sheepdog Mix
Domestic Shorthair Mix    20          F
Pit Bull Mix              80          Pit Bull Mix
Domestic Shorthair Mix    10          F
Original data: full
Data frame:
> full
# Breed
# 1: Shetland Sheepdog Mix
# 2: Domestic Shorthair Mix
# 3: Pit Bull Mix
# 4: Domestic Shorthair Mix
# 5: Lhasa Apso/Miniature Poodle
# 6: Cairn Terrier/Chihuahua Shorthair
# 7: Domestic Shorthair Mix
# 8: Domestic Shorthair Mix
# 9: American Pit Bull Terrier Mix
# 10: Cairn Terrier
# 11: Domestic Shorthair Mix
# 12: Miniature Schnauzer Mix
# 13: Pit Bull Mix
# 14: Yorkshire Terrier Mix
# 15: Great Pyrenees Mix
# 16: Domestic Shorthair Mix
# 17: Domestic Shorthair Mix
# 18: Pit Bull Mix
# 19: Angora Mix
# 20: Flat Coat Retriever Mix
# 21: Queensland Heeler Mix
# 22: Domestic Shorthair Mix
# 23: Plott Hound/Boxer
# Breed
Load the data.table library into the workspace
library("data.table")
Convert the full data frame to a data table by reference
setDT(full)
Copy full to dt1. This keeps full as a backup of the data table.
dt1 <- copy(full)
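Group dt1 by the breed column (Breed), then use the special symbol .N, which holds the number of rows in each group, and apply the ifelse condition to it. The results are saved as the Frequency and TrueFalse columns.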
dt1[, c("Frequency", "TrueFalse") := .(.N, ifelse(.N < 50, FALSE, Breed)), by = Breed]
Display dt1 after the steps above
> dt1
# Breed Frequency TrueFalse
# 1: Shetland Sheepdog Mix 1 FALSE
# 2: Domestic Shorthair Mix 8 FALSE
# 3: Pit Bull Mix 3 FALSE
# 4: Domestic Shorthair Mix 8 FALSE
# 5: Lhasa Apso/Miniature Poodle 1 FALSE
# 6: Cairn Terrier/Chihuahua Shorthair 1 FALSE
# 7: Domestic Shorthair Mix 8 FALSE
# 8: Domestic Shorthair Mix 8 FALSE
# 9: American Pit Bull Terrier Mix 1 FALSE
# 10: Cairn Terrier 1 FALSE
# 11: Domestic Shorthair Mix 8 FALSE
# 12: Miniature Schnauzer Mix 1 FALSE
# 13: Pit Bull Mix 3 FALSE
# 14: Yorkshire Terrier Mix 1 FALSE
# 15: Great Pyrenees Mix 1 FALSE
# 16: Domestic Shorthair Mix 8 FALSE
# 17: Domestic Shorthair Mix 8 FALSE
# 18: Pit Bull Mix 3 FALSE
# 19: Angora Mix 1 FALSE
# 20: Flat Coat Retriever Mix 1 FALSE
# 21: Queensland Heeler Mix 1 FALSE
# 22: Domestic Shorthair Mix 8 FALSE
# 23: Plott Hound/Boxer 1 FALSE
# Breed Frequency TrueFalse
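None of the breed types in the data you provided has a frequency above 50. If one did, the breed type would be assigned instead of FALSE, according to the ifelse statement.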
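Because the sample data never reaches the threshold, the branch that returns the breed name is not exercised above. The following is a minimal sketch of a two-step variant that keeps both branches as character, since mixing a logical FALSE with a character breed name in one column may lead to a type mismatch or silent coercion. The toy data, the name min_count, and the lowered threshold of 3 are assumptions made for illustration only.

```r
library(data.table)

# Toy data, made up for illustration (not the asker's dataset).
dt <- data.table(Breed = c("Pit Bull Mix", "Angora Mix", "Pit Bull Mix",
                           "Pit Bull Mix", "Cairn Terrier"))

# Hypothetical threshold; the question uses 50.
min_count <- 3L

# Step 1: attach the group size to every row.
dt[, Frequency := .N, by = Breed]

# Step 2: both branches are character, so the column has a single type.
dt[, TrueFalse := ifelse(Frequency < min_count, "F", as.character(Breed))]

print(dt)
```

Splitting the count and the label into two steps keeps the ifelse() call vectorised over rows rather than evaluated once per group.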
This assumes you have already computed the frequency of each BreedType. It is similar to @Sathish's answer, but uses a data.frame instead of a data.table
testData <- data.frame(BreedType = c("Shetland Sheepdog Mix", "Domestic Shorthair Mix", "Pit Bull Mix", "Domestic Shorthair Mix"),
Frequency = c(60, 20, 80, 10), stringsAsFactors = F)
testData$TrueFalse <- testData$BreedType
testData$TrueFalse[testData$Frequency < 50] <- F
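The output is the same as what you have. However, "FALSE" is stored as a string (not a boolean), because the column was initialised as a character vector. I am not sure whether booleans and strings can be mixed.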
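On the question of mixing booleans and strings: an atomic vector in R holds a single type, so a logical value placed next to character values is coerced to character. A small sketch of that behaviour follows; the variable names are made up for illustration.

```r
# Combining a logical and a string coerces everything to character.
mixed <- c(FALSE, "Pit Bull Mix")
print(class(mixed))

# Assigning a logical into an existing character vector does the same.
labels <- c("Shetland Sheepdog Mix", "Domestic Shorthair Mix")
labels[2] <- FALSE
print(labels)

# The stored value is the string "FALSE", not the logical FALSE.
print(identical(labels[2], "FALSE"))
```

If a plain marker is all that is needed, writing the string "F" explicitly makes the intent clearer than relying on this coercion.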
You can use the count function from the plyr package. I have demonstrated an example using the data you provided.
> library(plyr)
> df <- read.table(text = "Shetland Sheepdog Mix
Domestic Shorthair Mix
Pit Bull Mix
Domestic Shorthair Mix
Lhasa Apso/Miniature Poodle
Cairn Terrier/Chihuahua Shorthair
Domestic Shorthair Mix
Domestic Shorthair Mix
American Pit Bull Terrier Mix
Cairn Terrier
Domestic Shorthair Mix
Miniature Schnauzer Mix
Pit Bull Mix
Yorkshire Terrier Mix
Great Pyrenees Mix
Domestic Shorthair Mix
Domestic Shorthair Mix
Pit Bull Mix
Angora Mix
Flat Coat Retriever Mix
Queensland Heeler Mix
Domestic Shorthair Mix
Plott Hound/Boxer", sep='\n', stringsAsFactors = F, col.names = c('Breed'))
Use the plyr::count function.
> df <- count(df, 'Breed')
> df
## Breed freq
## 1 American Pit Bull Terrier Mix 1
## 2 Angora Mix 1
## 3 Cairn Terrier 1
## 4 Cairn Terrier/Chihuahua Shorthair 1
## 5 Domestic Shorthair Mix 8
## 6 Flat Coat Retriever Mix 1
## ...
## ...
> df$TrueFalse <- ifelse(df$freq >= 3, df$Breed, F)
> df
## Breed freq TrueFalse
## 1 American Pit Bull Terrier Mix 1 FALSE
## 2 Angora Mix 1 FALSE
## 3 Cairn Terrier 1 FALSE
## 4 Cairn Terrier/Chihuahua Shorthair 1 FALSE
## 5 Domestic Shorthair Mix 8 Domestic Shorthair Mix
## 6 Flat Coat Retriever Mix 1 FALSE
Well, you can also use base R table to get the frequencies
new_df <- data.frame(table(df$Breed))
# Var1 Freq
#1 American Pit Bull Terrier Mix 1
#2 Angora Mix 1
#3 Cairn Terrier 1
#4 Cairn Terrier/Chihuahua Shorthair 1
#5 Domestic Shorthair Mix 8
#6 Flat Coat Retriever Mix 1
#7 Great Pyrenees Mix 1
#8 Lhasa Apso/Miniature Poodle 1
#9 Miniature Schnauzer Mix 1
#10 Pit Bull Mix 3
#11 Plott Hound/Boxer 1
#12 Queensland Heeler Mix 1
#13 Shetland Sheepdog Mix 1
#14 Yorkshire Terrier Mix 1
Then use ifelse to get the values of the TrueFalse column
new_df$TrueFalse <- ifelse(new_df$Freq > 2, as.character(new_df$Var1), "F")
# Var1 Freq TrueFalse
#1 American Pit Bull Terrier Mix 1 F
#2 Angora Mix 1 F
#3 Cairn Terrier 1 F
#4 Cairn Terrier/Chihuahua Shorthair 1 F
#5 Domestic Shorthair Mix 8 Domestic Shorthair Mix
#6 Flat Coat Retriever Mix 1 F
#7 Great Pyrenees Mix 1 F
#8 Lhasa Apso/Miniature Poodle 1 F
#9 Miniature Schnauzer Mix 1 F
#10 Pit Bull Mix 3 Pit Bull Mix
#11 Plott Hound/Boxer 1 F
#12 Queensland Heeler Mix 1 F
#13 Shetland Sheepdog Mix 1 F
#14 Yorkshire Terrier Mix 1 F
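The as.character() call above matters, and it also accounts for the integers reported in the question: as.data.frame() on a table stores the names as a factor, and ifelse() does not preserve factor attributes, so the underlying integer codes come through instead of the labels. A minimal sketch with made-up toy data and a lowered threshold of 3:

```r
# Toy data, made up for illustration.
breeds <- c("Pit Bull Mix", "Pit Bull Mix", "Pit Bull Mix",
            "Angora Mix", "Cairn Terrier")

w <- as.data.frame(table(breeds))
names(w)[1] <- "BreedType"

# The first column of a table converted this way is a factor.
print(is.factor(w$BreedType))

# Without conversion, the "no" branch yields the factor's integer codes.
codes <- ifelse(w$Freq < 3, "F", w$BreedType)
print(codes)

# Converting to character first keeps the breed labels.
fixed <- ifelse(w$Freq < 3, "F", as.character(w$BreedType))
print(fixed)
```

When the table has been sorted by frequency, the factor levels follow that order, which would be consistent with the codes appearing to count up by 1.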
If we need summarised output, then
library(data.table)
setDT(df)[, .(Frequency = .N, TrueFalse = .N > 55), by = Breed]