
Using a loop to change function and column names

I am performing a cluster analysis on data from the following site.

https://www.kaggle.com/arjunbhasin2013/ccdata/version/1#

I have segmented the dataset into 7 clusters using the following code.

    library(cluster)
    library(dplyr)

    CC_data <- read.csv("CC_GENERAL.csv")

    DistMatrix <- dist(CC_data[2:17])
    Ward_CCD <- hclust(DistMatrix, method = "ward.D2")
    CCD_hclust_cut <- cutree(tree = Ward_CCD, k = 7)
    CC_data <- mutate(CC_data, cluster = CCD_hclust_cut)

    # Subset the data into individual clusters for further analysis

    for (C in 1:7) {
      assign(paste0("cluster", C),filter(CC_data, cluster == C))
    }

Now I want to subset each cluster and generate boxplots to summarise the data. The problem is that some of the columns have already been scaled to [0,1], while the rest are in absolute dollar values, and one column (PRC_FULL_PAYMENT) is a percentage that needs to be rescaled.

I want to create two sets of boxplots for each cluster, using a loop to change the cluster being referred to in the code. Doing things manually, the code I have is:

    C1_frequency <- data.frame(
      cluster1$BALANCE_FREQUENCY, 
      cluster1$PURCHASES_FREQUENCY, 
      cluster1$ONEOFF_PURCHASES_FREQUENCY, 
      cluster1$PURCHASES_INSTALLMENTS_FREQUENCY,
      cluster1$CASH_ADVANCE_FREQUENCY,
      cluster1$PRC_FULL_PAYMENT / 100
    )

    C1_unscaled <- data.frame(
      cluster1$BALANCE,
      cluster1$PURCHASES,
      cluster1$ONEOFF_PURCHASES,
      cluster1$INSTALLMENTS_PURCHASES,
      cluster1$CASH_ADVANCE,
      cluster1$CASH_ADVANCE_TRX,
      cluster1$PURCHASES_TRX,
      cluster1$CREDIT_LIMIT,
      cluster1$PAYMENTS,
      cluster1$MINIMUM_PAYMENTS
    )

This works OK, but I want to avoid the needless repetition by using some sort of loop. I've been trying various combinations of the assign() and paste0() functions, as well as one attempt at using [[]], which I still don't really understand, but I keep getting a different error each time I try something.

How can I change the cluster number over 1:7 without doing a copy and paste job?

Someone can probably provide a more elegant answer, but here's a quick'n'dirty solution:

    library(dplyr)

    for (i in 1:7) {

      # Frequency-type columns, plus PRC_FULL_PAYMENT rescaled from a percentage
      assign(paste0("C", i, "_frequency"), {
        get(paste0("cluster", i)) %>%
          mutate(PRC_FULL_PAYMENT_SCALED = PRC_FULL_PAYMENT / 100) %>%
          select(BALANCE_FREQUENCY, PURCHASES_FREQUENCY,
                 ONEOFF_PURCHASES_FREQUENCY, PURCHASES_INSTALLMENTS_FREQUENCY,
                 CASH_ADVANCE_FREQUENCY, PRC_FULL_PAYMENT_SCALED)
      })

      # Columns in absolute values; no rescaling is needed for these
      assign(paste0("C", i, "_unscaled"), {
        get(paste0("cluster", i)) %>%
          select(BALANCE, PURCHASES, ONEOFF_PURCHASES, INSTALLMENTS_PURCHASES,
                 CASH_ADVANCE, CASH_ADVANCE_TRX, PURCHASES_TRX, CREDIT_LIMIT,
                 PAYMENTS, MINIMUM_PAYMENTS)
      })
    }
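A variant that avoids assign() and get() altogether keeps everything in one named list. The sketch below is only an illustration under stated assumptions: `toy` is a made-up stand-in for CC_data with a handful of its columns, and the names `freq_cols`, `unscaled_cols`, `clusters` and `subsets` are hypothetical. It also shows what `[[ ]]` does: it extracts a single list element by name or position.

```r
# Toy stand-in for CC_data: a few of the real column names, made-up values
set.seed(1)
toy <- data.frame(
  BALANCE_FREQUENCY   = runif(12),
  PURCHASES_FREQUENCY = runif(12),
  PRC_FULL_PAYMENT    = runif(12, 0, 100),
  BALANCE             = runif(12, 0, 5000),
  PURCHASES           = runif(12, 0, 2000),
  cluster             = rep(1:3, each = 4)
)

# Column groups are written once and reused for every cluster
freq_cols     <- c("BALANCE_FREQUENCY", "PURCHASES_FREQUENCY")
unscaled_cols <- c("BALANCE", "PURCHASES")

# split() returns one data frame per cluster, named "1", "2", "3"
clusters <- split(toy, toy$cluster)

# Build both subsets for each cluster and keep them in a nested list
subsets <- lapply(clusters, function(df) {
  frequency <- df[freq_cols]
  frequency$PRC_FULL_PAYMENT_SCALED <- df$PRC_FULL_PAYMENT / 100
  list(frequency = frequency, unscaled = df[unscaled_cols])
})

# [[ ]] pulls out one element by name; $ then picks the subset
head(subsets[["2"]]$frequency)
```

With the real data, `toy` would be replaced by CC_data and the two column vectors extended to the full lists used above.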

Maybe you could try to create a function:

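    create_subset <- function(df) {
      # Return both subsets as a named list. Inside list(), `=` names an
      # element, whereas `<-` would only create a local variable.
      list(
        frequency = data.frame(
          BALANCE_FREQUENCY = df$BALANCE_FREQUENCY,
          PURCHASES_FREQUENCY = df$PURCHASES_FREQUENCY,
          ONEOFF_PURCHASES_FREQUENCY = df$ONEOFF_PURCHASES_FREQUENCY,
          PURCHASES_INSTALLMENTS_FREQUENCY = df$PURCHASES_INSTALLMENTS_FREQUENCY,
          CASH_ADVANCE_FREQUENCY = df$CASH_ADVANCE_FREQUENCY,
          PRC_FULL_PAYMENT_SCALED = df$PRC_FULL_PAYMENT / 100),
        unscaled = data.frame(
          BALANCE = df$BALANCE,
          PURCHASES = df$PURCHASES,
          ONEOFF_PURCHASES = df$ONEOFF_PURCHASES,
          INSTALLMENTS_PURCHASES = df$INSTALLMENTS_PURCHASES,
          CASH_ADVANCE = df$CASH_ADVANCE,
          CASH_ADVANCE_TRX = df$CASH_ADVANCE_TRX,
          PURCHASES_TRX = df$PURCHASES_TRX,
          CREDIT_LIMIT = df$CREDIT_LIMIT,
          PAYMENTS = df$PAYMENTS,
          MINIMUM_PAYMENTS = df$MINIMUM_PAYMENTS))
    }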

and then use lapply to apply it to all clusters:

    lapply(mget(paste0("cluster", 1:7)), create_subset)
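To see what this pattern returns, here is a small self-contained sketch. The three tiny data frames and the helper `mean_balance` are hypothetical stand-ins, not part of the original data or code. The point is that mget() returns a list named after the variables it looked up, and lapply() keeps those names, so each result can be retrieved by cluster name.

```r
# Hypothetical per-cluster data frames standing in for cluster1 ... cluster3
cluster1 <- data.frame(BALANCE = c(10, 20))
cluster2 <- data.frame(BALANCE = c(30, 40))
cluster3 <- data.frame(BALANCE = c(50, 60))

# Hypothetical helper: any function that takes one data frame would do here
mean_balance <- function(df) mean(df$BALANCE)

# mget() looks up several variables by name and returns them as a named list
cluster_list <- mget(paste0("cluster", 1:3))
names(cluster_list)     # "cluster1" "cluster2" "cluster3"

# lapply() preserves the names, so results can be looked up per cluster
results <- lapply(cluster_list, mean_balance)
results[["cluster2"]]   # 35
results$cluster3        # 55
```

Storing the output of lapply() in a variable, rather than just calling it, is what makes the per-cluster results available for later steps.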

You could also include any other code that you want to apply to each cluster (boxplots etc.) in the same function create_subset.
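As a rough sketch of that idea, the function below draws the two boxplot panels for one cluster using base R graphics. Everything here is an assumption for illustration: `toy` is made-up data with only a few of the real column names, and `plot_cluster`, `freq_cols` and `dollar_cols` are hypothetical names.

```r
# Toy stand-in for the clustered data: a few real column names, made-up values
set.seed(42)
toy <- data.frame(
  BALANCE_FREQUENCY   = runif(12),
  PURCHASES_FREQUENCY = runif(12),
  PRC_FULL_PAYMENT    = runif(12, 0, 100),
  BALANCE             = runif(12, 0, 5000),
  PURCHASES           = runif(12, 0, 2000),
  cluster             = rep(1:3, each = 4)
)

freq_cols   <- c("BALANCE_FREQUENCY", "PURCHASES_FREQUENCY")
dollar_cols <- c("BALANCE", "PURCHASES")

# Draw two side-by-side boxplot panels for one cluster:
# left = columns on a 0-1 scale, right = columns in absolute dollar values
plot_cluster <- function(df, label) {
  scaled <- data.frame(df[freq_cols],
                       PRC_FULL_PAYMENT = df$PRC_FULL_PAYMENT / 100)
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))  # restore the previous plot layout on exit
  boxplot(scaled, main = paste("Cluster", label, "(0-1 scale)"), las = 2)
  boxplot(df[dollar_cols], main = paste("Cluster", label, "(dollars)"), las = 2)
  invisible(scaled)      # return the rescaled data without printing it
}

# One pair of panels per cluster
for (k in sort(unique(toy$cluster))) {
  plot_cluster(toy[toy$cluster == k, ], k)
}
```

Keeping the two groups on separate panels means the dollar-valued columns do not flatten the 0-1 columns onto a shared axis.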
