
Export descriptive statistics row of values into an Excel Sheet from R

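I have a large database with more than 100 variables and more than 85,000 values across more than 100 different companies. My goal is to determine the descriptive statistics (mean, standard deviation, min, max, and number of values) for several of those variables.

Below is a set of information for one given company, which I will call Company F.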

Attendance   Number of representatives   Number of Presenters     Company Audience  
29           2                            30                      2
20           3                            30                      4   
30           10                           20                      5
40           20                           10                      5
10           30                           13                      5

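What I would like is for R to compute the descriptive statistics [mean, standard deviation, min, and max] for each of these specific columns and export them to Excel in the following layout: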

Company F  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 

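Because that row is so long, to summarize: I am trying to find the descriptive statistics [mean, standard deviation, min, max, and n] for each of these columns. All of them should correspond to Company F.

How I have tried to solve this:

I have used a descriptive statistics function in R to compute these values from the data frame. To do this, I used the psych package: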

 library(psych)
 describe(CompanyF$Attendance)
 describe(CompanyF$NumberofRepresentatives)
 describe(CompanyF$Number_of_Presenters)
 # a column name containing a space must be wrapped in backticks
 describe(CompanyF$`Company Audience`)

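With this package I can get the statistics, then go into Excel and construct the row by hand, typing in the values I receive and ignoring anything psych reports that I am not interested in. Here is an example of the kind of output I get from the psych package: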

vars   n mean   sd median trimmed  mad min max range skew kurtosis   se
1    1 559 2.02 2.21      1    1.75 1.48   0   9     9 0.78    -0.65 0.09

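This process is very time-consuming and error-prone. After finishing Company F, I created new rows in Excel below the Company F row, this time for other companies such as Company G, and continued looking up the descriptive statistics [mean, standard deviation, min, max, and n] for each of the variables of interest (attendance, number of representatives, number of presenters, and company audience).

I have been looking at various solutions, one of them from the Stack Overflow post "Export data from R to Excel", but I could not find an explanation of how to export row-by-row information from R to Excel, or how to restrict it to the descriptive statistics listed above.

Ideally, the following output would end up in Excel: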

Company F  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 
Company G  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 
Company H  Average Number of Attendance Standard Deviation of Number of Attendance Min  Number of Attendance Max Number of Attendance and Number of People in Attendance Average of Number of Representatives   Standard Deviation of Number of Representatives Min of Number of Representatives Max Number of Representatives Total Number of Values Average Number of Presenters Standard Deviation Number of Presenters Min Number of Presenters Max Number of Presenters Total Number of Presenters Average Company Audience Standard Deviation Company Audience Min Number of Company Audience Max Number of Company Audience Total Number of Company Audience 

And so on.

A raw subset of my data is as follows:

structure(list(sn = structure(c(2L, 2L, 3L, 5L, 2L, 7L, 1L, 9L, 
1L, 9L, NA, 9L, 1L, 26L, 11L, 9L, 7L, NA, NA, 7L, 17L, 9L, NA, 
21L, 7L, 17L, 7L, 7L, 16L, 7L, 7L, 7L, 7L, 26L, 7L, 6L, 26L, 
22L, NA, NA, 11L, 23L, 23L, 26L, NA, 7L, 23L, 1L, NA, 1L, 7L, 
11L, 12L, 13L, 9L, NA, 15L, NA, 20L, 15L, NA, 17L, 5L, NA, 22L, 
15L, NA, NA, 5L, 8L, 32L, 29L, 23L, 33L, 1L, 23L, 14L, 6L, 7L, 
15L), .Label = c("Broome Street", "Company A", "Company B", "Company BC",
"Company C", "Company CC", "Company D Clinton", "Company DD", 
"Company E", "Company ED BroadCompany", "Company G", "Company H BroadCompany",
"Company I BroadCompany", "Company I Studio", "Company J", "Company K", 
"Company L", "Company M", "Company M BroadCompany", "Company M HS BroadCompany",
"Company MCC BroadCompany", "Company N", "Company P", "Company Q", 
"Company Q Company N", "Company Q Company ZZ", "Company R - Company ZZ", 
"Company SLab", "Company Z", "Company ZE", "Company ZED", "Company ZEQ", 
"Company ZZ", "Company ZZQ", "Company ZZQ Company N"), class = "factor"), 
earn_tot = c(21.85, 20.8, NA, 8.16, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 7.16, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, 43.32, NA, 30.48, NA, NA, 34.9, NA, NA, NA, NA, NA, 25.82, 
40.75, NA, NA, NA, NA, NA, NA, NA, NA, NA, 0, NA, NA, NA, 
30, NA, NA, NA, NA, NA, NA, 39.1, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, 52.29, 44.32, NA, 7, 38.32, 0, NA, NA, 8.25, 
NA, NA), earn_and_current_tot = c(29.43, 20.8, NA, 8.16, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, 7.16, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, 49.9, NA, 37.56, NA, NA, 41.98, 
NA, NA, NA, NA, NA, 37.32, 49, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, 0, NA, NA, NA, 37, NA, NA, NA, NA, NA, NA, 47.68, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 57.29, 48.48, NA, 
7, 45.9, 0, NA, NA, 15.75, NA, NA), pass_99 = c(0L, 0L, NA, 
NA, NA, NA, 1L, NA, NA, NA, NA, 5L, NA, 0L, NA, 5L, NA, NA, 
NA, 0L, NA, NA, NA, NA, NA, 1L, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, 0L, NA, NA, NA, NA, 5L, NA, NA, NA, NA, 4L, 0L, 
NA, NA, NA, 4L, 4L, NA, NA, NA, NA, NA, NA, 0L, NA, NA, NA, 
NA, 1L, NA, NA, NA, NA, 1L, NA, NA, 0L, 4L, 0L, NA, NA, 0L, 
NA, NA), pass_65 = c(0L, 0L, 5L, 0L, 6L, NA, 0L, 5L, NA, 
5L, NA, 6L, NA, 0L, 5L, 2L, NA, NA, NA, 0L, 5L, 5L, NA, NA, 
NA, 0L, NA, 1L, 4L, 7L, 5L, 5L, 7L, 0L, 5L, NA, 0L, 1L, NA, 
NA, NA, 2L, 0L, 6L, NA, 8L, 2L, 0L, NA, 4L, 0L, 1L, 3L, NA, 
NA, NA, NA, NA, 4L, 0L, NA, 5L, 7L, NA, 0L, NA, NA, NA, 5L, 
0L, 5L, 4L, 0L, 2L, 0L, 0L, 7L, 0L, NA, 5L)), .Names = c("sn", 
"earn_tot", "earn_and_current_tot", "pass_99", "pass_65"), row.names = c(NA, 
80L), class = "data.frame")

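Four columns in the subset matter most: "earn_tot", "earn_and_current_tot", "pass_99", and "pass_65". Many anonymized companies are listed here; I am working with about 100 companies. The company names are in the column titled "sn". The whole subset dataset is named Subset.MergedEx.So.

I apologize for not providing a good reproducible example earlier. Thank you for your patience. I have been reading up on how to construct one and used the following code: dput((head(Subset.MergedEx.SO,80)))

What you can do is melt your data into long format and then cast it back to wide format with multiple aggregation functions: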

library(data.table)
# dat is the example data.table defined at the end of this answer
dat.new <- dcast(melt(dat, id.vars = "company"),
                 company ~ variable,
                 fun.aggregate = list(mean, sd),
                 value.var = "value")

This gives:

> dat.new
   company value_mean_attendance value_mean_presenters value_mean_audience value_sd_attendance value_sd_presenters value_sd_audience
1:       A                   8.0                  24.8                60.6            1.870829            4.207137          7.668116
2:       B                   8.2                  23.8                64.2            2.489980            2.387467          2.049390

Now you can write this to an Excel file, for example with the WriteXLS package:

library(WriteXLS)
WriteXLS("dat.new","companies.xls")
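
The result above holds only the mean and standard deviation, while the question also asks for the minimum, maximum, and count. As a rough sketch, and switching to base R rather than data.table (so this is not the method used in this answer), all five statistics could be computed per company as shown below. The helper name five_stats and the objects agg and wide are made up for illustration, and the data simply mirrors the example data of this answer.

```r
# Hypothetical example data with the same shape as `dat` in this answer.
set.seed(21)
dat <- data.frame(company    = rep(c("A", "B"), each = 5),
                  attendance = sample(5:10, 10, TRUE),
                  presenters = sample(20:30, 10, TRUE),
                  audience   = sample(50:70, 10, TRUE))

# The five statistics asked for in the question (assumes no missing values).
five_stats <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x))
}

# aggregate() applies five_stats to every non-company column within each company.
agg <- aggregate(. ~ company, data = dat, FUN = five_stats)

# Each variable comes back as a matrix column; do.call(data.frame, ...) flattens
# them into ordinary columns named attendance.mean, attendance.sd, and so on.
wide <- do.call(data.frame, agg)

print(wide)
```

The flattened `wide` data frame has one row per company, so it should be possible to hand it to WriteXLS in the same way as dat.new above.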

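Because you want to compute many statistics for each company, you might consider writing each company's summary statistics to a separate sheet in the Excel file.

Again, convert the data to long format with melt, then compute the mean and standard deviation for each company and each variable with lapply(.SD, function(x) list(average = mean(x), sdev = sd(x)))$value. Split the resulting data.table by company into a list of data.tables. Finally, write that list to an Excel file: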

dat.new <- melt(dat, id="company")[, lapply(.SD, function(x) list(average = mean(x), sdev = sd(x)))$value, 
                                    .(company,variable)]

company.list <- split(dat.new, dat.new$company)

WriteXLS(company.list,"companies.xls")

Now you have an Excel file with a separate tab for each company.
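
To make the shape of that per-company list more concrete, here is a minimal base R sketch of the same idea, without data.table or WriteXLS. It is an illustration under assumptions, not the code of this answer: the data mirrors the example data, and the names vars and company.list.base are hypothetical.

```r
# Hypothetical example data with the same shape as `dat` in this answer.
set.seed(21)
dat <- data.frame(company    = rep(c("A", "B"), each = 5),
                  attendance = sample(5:10, 10, TRUE),
                  presenters = sample(20:30, 10, TRUE),
                  audience   = sample(50:70, 10, TRUE))

# All columns except the grouping column are summarised.
vars <- setdiff(names(dat), "company")

# split() cuts the data into one data frame per company; each piece is then
# reduced to one row per variable with its average and standard deviation.
company.list.base <- lapply(split(dat[vars], dat$company), function(d) {
  data.frame(variable  = vars,
             average   = vapply(d, mean, numeric(1)),
             sdev      = vapply(d, sd, numeric(1)),
             row.names = NULL)
})

# One element per company, named after the company.
str(company.list.base, max.level = 1)
```

The list names are what would end up as sheet names, and Excel limits sheet names to 31 characters, so long company names may need shortening before export.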


Data used:

set.seed(21)
dat <- data.table(company = rep(c("A","B"), each = 5),
                  attendance = sample(5:10,10,TRUE),
                  presenters = sample(20:30,10,TRUE),
                  audience = sample(50:70,10,TRUE))

This may not be an optimal solution, but it uses only base R and the psych package.

Here is the data

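# stringsAsFactors = TRUE keeps company a factor (no longer the default since R 4.0),
# which the levels() calls further down rely on
df <- data.frame(company = rep(c("A","B", "C","D"), each = 5),
              attendance = sample(5:10,20,TRUE),
              representatives = sample(2:30,20,TRUE),
              presenters = sample(20:30,20,TRUE),
              audience = sample(50:70,20,TRUE),
              stringsAsFactors = TRUE)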

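I wrote a function to get the values you want. I am assuming you have only 5 categories of information: company name, attendance, representatives, presenters, and audience.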

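get.values <- function(x) {
  library(psych)
  # Describe columns 2:5 separately for each company in column 1
  info <- describeBy(x[, 2:5], group = x[, 1])
  # Count companies from the argument x, not from the global df
  n.companies <- nlevels(factor(x[, 1]))
  n <- list()
  mean <- list()
  sd <- list()
  min <- list()
  max <- list()
  for (i in seq_len(n.companies)) {
    n[[i]]    <- info[[i]][, "n"]
    mean[[i]] <- info[[i]][, "mean"]
    sd[[i]]   <- info[[i]][, "sd"]
    min[[i]]  <- info[[i]][, "min"]
    max[[i]]  <- info[[i]][, "max"]
  }
  # One row per company: means, sds, mins, maxes, then counts
  l <- Map(c, mean, sd, min, max, n)
  valuedf <- do.call(rbind, l)
  return(valuedf)
}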

I also wrote a function to generate the column names, which you can change to whatever you like:

get.names <- function(x) {
  library(psych)
  names <- rownames(describe(x[, 2:5]))
  # paste() is vectorised, so no loop is needed
  avg   <- paste("average number of", names)
  sd    <- paste("standard deviation of", names)
  min   <- paste("min number of", names)
  max   <- paste("max number of", names)
  total <- paste("total number of", names)
  # Same order as the columns built by get.values
  cnames <- c(avg, sd, min, max, total)
  return(cnames)
}

Combine the values and names into a new table:

output<-get.values(df)
col.names<-get.names(df)
colnames(output)<-col.names
rownames(output)<-levels(factor(df[,1]))  # factor() also covers a character company column

Export to Excel:

library(xlsx)
write.xlsx(output, "descriptives.xlsx")
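
The data posted in the question contains many NA values, both in the measured columns and in the company column "sn", and the functions above assume complete data. As a hedged sketch of one way to handle that in base R (not part of either answer), the statistics can be computed on the non-missing values only. The data frame toy is a made-up miniature that borrows the question's column names, and na_stats, keep, and per_company are hypothetical names.

```r
# Hypothetical miniature of the asker's data: same column names, made-up rows.
toy <- data.frame(
  sn       = c("Company A", "Company A", "Company A", "Company B", "Company B", NA),
  earn_tot = c(21.85, 20.8, NA, 8.16, NA, 5),
  pass_65  = c(0L, 0L, 5L, NA, 6L, 1L)
)

# Statistics on the non-missing values only; n counts non-missing entries.
# A group with no non-missing values would give NaN/Inf plus a warning.
na_stats <- function(x) {
  x <- x[!is.na(x)]
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x), n = length(x))
}

# Drop rows whose company name is missing, then summarise each company.
keep <- !is.na(toy$sn)
per_company <- lapply(split(toy[keep, c("earn_tot", "pass_65")], toy$sn[keep]),
                      function(d) unlist(lapply(d, na_stats)))

# One row per company, columns named earn_tot.mean, earn_tot.sd, and so on.
output_na <- do.call(rbind, per_company)
print(output_na)
```

A company with a single non-missing value gets NA for its standard deviation, which is worth keeping in mind when reading the exported sheet.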
