Creating a function with multiple arguments that subsets a dataframe [R]

I have a data frame named titanic that contains 2021 rows, one per passenger on the Titanic, along with specific characteristics of each passenger:

  Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No
...

I want to create a function with multiple arguments that looks like this:

f1 <- function(sex, age, class, survived){
...
}

The arguments are where I enter criteria for the passengers. For example, I would like to be able to pass conditions to the function so that

f1("Female", "Child","3rd", "Yes")

returns

     Class    Sex   Age Survived
1534   3rd Female Child      Yes
1535   3rd Female Child      Yes
1536   3rd Female Child      Yes
1537   3rd Female Child      Yes
1538   3rd Female Child      Yes

Right now I have hard-coded it, using a single if/else statement to cover every possible combination.

function.q6.1 <- function(sex,age,class,survival){
  if(sex == "Male" & age == "Child" & class == "3rd" & survival == "No"){
    subset(titanic, Sex == "Male" & Age == "Child" & Class == "3rd" & Survived == "No")
  }
  else if(sex == "Female" & age == "Child" & class == "3rd" & survival == "No"){
    subset(titanic, Sex == "Female" & Age == "Child" & Class == "3rd" & Survived == "No")
  }
  else if(sex == "Male" & age == "Adult" & class == "3rd" & survival == "No"){
    subset(titanic, Sex == "Male" & Age == "Adult" & Class == "3rd" & Survived == "No")
  }
...
}

I would like to know whether there is a more efficient way to do this. Thank you in advance.

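This assumes that the first argument is the data frame and that the remaining arguments are the values for each column, either in the order in which the columns appear in the data frame or else named.

There can be fewer arguments than columns. In that case, unnamed arguments are matched against the same number of leading columns of the data frame; if the arguments are named, the names are used for matching. The arguments after the data frame must be either all named or all unnamed. If only the data frame is passed with no other arguments, NULL is returned invisibly.

If there is a non-zero number of arguments after the data frame, we take their names or, if they are unnamed, use the first n column names, where n is the number of arguments after the data frame. We then remove the rows containing NA from dat, on the assumption that such rows do not match. mapply compares successive columns with successive argument values and returns a logical matrix. apply reduces each row to a single logical value, and then we subscript.

The tests use the data frame shown reproducibly in the Note at the end.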

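f1 <- function(dat, ...) {
  # proceed only if values were passed after the data frame
  if (n <- ...length()) {
    # use the argument names if given (...names() needs R >= 4.1.0), else the first n column names
    if (is.null(nms <- ...names())) nms <- head(names(dat), n)
    dat <- na.omit(dat)  # rows with NA are treated as non-matching
    # compare each column with its value, then keep rows where all comparisons are TRUE
    dat[apply(mapply(`==`, dat[nms], list(...)), 1, all), ]
  }
}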
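
To make the mapply and apply steps described above easier to follow, here is a minimal sketch that runs them one at a time outside the function. It uses the built-in BOD data frame; the names nms, vals, m and keep are chosen only for this illustration.

```r
# BOD ships with R; its columns are Time and demand
nms  <- c("Time", "demand")
vals <- list(1, 8.3)

# Step 1: compare each column with its value.
# The result is a logical matrix with one column per criterion.
m <- mapply(`==`, BOD[nms], vals)

# Step 2: a row is kept only if every comparison in that row is TRUE.
keep <- apply(m, 1, all)

# Step 3: subscript the data frame with the logical vector.
BOD[keep, ]
```

The last line should give the same single row as the f1(BOD, 1, 8.3) call.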

Now we run some tests:

f1(dat, "3rd", "Male", "Child", "No")
##   Class  Sex   Age Survived
## 1   3rd Male Child       No
## 2   3rd Male Child       No
## 3   3rd Male Child       No
## 4   3rd Male Child       No
## 5   3rd Male Child       No
## 6   3rd Male Child       No

f1(dat, "3rd", "Female", "Child", "No")
## [1] Class    Sex      Age      Survived
## <0 rows> (or 0-length row.names)

f1(dat, "3rd")
##   Class  Sex   Age Survived
## 1   3rd Male Child       No
## 2   3rd Male Child       No
## 3   3rd Male Child       No
## 4   3rd Male Child       No
## 5   3rd Male Child       No
## 6   3rd Male Child       No

f1(BOD, 1, 8.3)  # BOD is built into R
##   Time demand
## 1    1    8.3

f1(BOD, demand = 8.3)
##   Time demand
## 1    1    8.3

Note

Lines <- "
Class  Sex   Age Survived
1   3rd Male Child       No
2   3rd Male Child       No
3   3rd Male Child       No
4   3rd Male Child       No
5   3rd Male Child       No
6   3rd Male Child       No"
dat <- read.table(text = Lines)

Update

Allowed fewer arguments than columns, and allowed the arguments to be named.
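
As a possible variant, the same matching rules can be sketched with Map and Reduce instead of mapply and apply. This is not the answer's code: f2 is a hypothetical name, and the choice to return the whole data frame when no criteria are given is an assumption made for this sketch. It builds one logical vector per criterion and combines them with &, so no logical matrix is needed, which also seems to behave when the data frame has a single row.

```r
# f2 is a hypothetical variant of f1 above, using Map/Reduce
f2 <- function(dat, ...) {
  vals <- list(...)
  # assumption for this sketch: no criteria means "keep everything"
  if (length(vals) == 0) return(dat)
  nms <- names(vals)
  # unnamed values are matched to the leading columns, as in f1
  if (is.null(nms)) nms <- head(names(dat), length(vals))
  # one logical vector per criterion; NA is treated as a non-match
  tests <- Map(function(col, val) !is.na(col) & col == val, dat[nms], vals)
  # a row is kept only if it passes every criterion
  keep <- Reduce(`&`, tests)
  dat[keep, , drop = FALSE]
}

f2(BOD, 1, 8.3)          # values matched to the first two columns
f2(BOD, demand = 8.3)    # value matched by name
f2(BOD[1, ], 1, 8.3)     # one-row data frame
```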

# toy dataset
set.seed(1912)
titanic <- data.frame(class = sample(c("1st","2nd","3rd"),100,replace = TRUE),
                      sex = sample(c("Male","Female"),100,replace = TRUE),
                      age = sample(c("Child","Adult"),100,replace = TRUE),
                      survival = sample(c("Yes","No"),100,replace = TRUE)
                      )

f1 <- function(sex,age,class,survival) {
  titanic[titanic$class==class&titanic$sex==sex&titanic$age==age&titanic$survival==survival,]
}

f1("Female", "Child","3rd", "Yes")

   class    sex   age survival
11   3rd Female Child      Yes
15   3rd Female Child      Yes
38   3rd Female Child      Yes
71   3rd Female Child      Yes
85   3rd Female Child      Yes
94   3rd Female Child      Yes
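
If some criteria should be optional, one way to extend the function above is to give every argument a default of NULL and skip the corresponding test when it is not supplied. This is only a sketch under that assumption: f1_optional and its data argument are hypothetical names, and the toy data is recreated so that the block runs on its own.

```r
# Toy data with the same lower-case column names as above
set.seed(1912)
titanic <- data.frame(
  class    = sample(c("1st", "2nd", "3rd"), 100, replace = TRUE),
  sex      = sample(c("Male", "Female"), 100, replace = TRUE),
  age      = sample(c("Child", "Adult"), 100, replace = TRUE),
  survival = sample(c("Yes", "No"), 100, replace = TRUE)
)

# Hypothetical variant: NULL means "do not filter on this column"
f1_optional <- function(sex = NULL, age = NULL, class = NULL, survival = NULL,
                        data = titanic) {
  keep <- rep(TRUE, nrow(data))
  if (!is.null(sex))      keep <- keep & data$sex == sex
  if (!is.null(age))      keep <- keep & data$age == age
  if (!is.null(class))    keep <- keep & data$class == class
  if (!is.null(survival)) keep <- keep & data$survival == survival
  data[keep, , drop = FALSE]
}

res <- f1_optional("Female", "Child", "3rd", "Yes")  # all four criteria
all_children <- f1_optional(age = "Child")           # only one criterion
```

With all four values supplied, this should select the same rows as f1 above.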

If you are using the data.frame shown in the question, you can use

library(dplyr)

# df is assumed to be the data frame shown in the question
my_filter <- function(sex, age, class, survived) {
  df %>%
    filter(Sex == sex, Age == age, Class == class, Survived == survived)
}

Now my_filter("Female", "Child","3rd", "Yes") returns

   Class    Sex   Age Survived
7    3rd Female Child      Yes
8    3rd Female Child      Yes
9    3rd Female Child      Yes
10   3rd Female Child      Yes
11   3rd Female Child      Yes 

Update:

Store the columns and the conditions each in a vector, then apply the function to the data frame:

library(dplyr)
library(stringr)

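# assumed: the condition values from the question's example
f1 <- c("Female", "Child", "3rd", "Yes")
f1 <- paste(f1, collapse = "|")
cols <- c("Sex", "Age", "Class", "Survived")

my_function <- function(df){
  df %>% 
    select(all_of(cols)) %>% 
    filter(if_all(everything(), ~str_detect(., f1)))
}
my_function(df)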

First answer:

Perhaps another strategy is:

library(dplyr)
library(stringr)

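# assumed: the condition values from the question's example
f1 <- c("Female", "Child", "3rd", "Yes")
f1 <- paste(f1, collapse = "|")

my_function <- function(df){
  df %>% 
    select(Sex, Age, Class, Survived) %>% 
    filter(if_all(everything(), ~str_detect(., f1)))
}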

my_function(df)

output:

       Sex   Age Class Survived
1534 Female Child   3rd      Yes
1535 Female Child   3rd      Yes
1536 Female Child   3rd      Yes
1537 Female Child   3rd      Yes
1538 Female Child   3rd      Yes
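
One thing to watch with the pattern-matching approach is that the combined pattern is a regular expression, so a value can also match as a substring. The sketch below illustrates this with grepl from base R, used here as a stand-in for str_detect (both test a regular expression against each string); the names pattern and anchored are chosen only for this illustration.

```r
# Combined pattern built the same way as above, for two example values
pattern <- paste(c("Male", "Adult"), collapse = "|")

# "Female" contains "Male", so the unanchored pattern matches it
grepl(pattern, "Female")    # TRUE

# Anchoring the alternatives forces a whole-string match
anchored <- paste0("^(", pattern, ")$")
grepl(anchored, "Female")   # FALSE
grepl(anchored, "Male")     # TRUE
```

For the example values from the question this makes no difference, but it may matter when filtering on "Male".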

