Creating a "metadata" field in R

Question

I have a data frame set up like this:

id <- c(123,234,123,234)
task <- c(54,23,12,58)
a <- c(23,67,45,89)
b <- c(78,45,65,45)

df <- data.frame(id,task,a,b)
> df
   id task  a  b
1 123   54 23 78
2 234   23 67 45
3 123   12 45 65
4 234   58 89 45

For each row (one task per id), I compute a score as the mean of a and b:

df$score <- rowMeans(subset(df, select = c(3:4)), na.rm = TRUE)
> df
   id task  a  b score
1 123   54 23 78  50.5
2 234   23 67 45  56.0
3 123   12 45 65  55.0
4 234   58 89 45  67.0

For each id, I then compute an overall score like this:

library(plyr)
# Group by the id column and average the per-task scores
out <- ddply(df, "id", summarise,
             overall = mean(score, na.rm = TRUE))
> out
   id overall
1 123   52.75
2 234   61.50

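But I would like my final output to include a new column containing the scores that went into the overall value, keyed by their task IDs, like this: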

   id overall                                meta
1 123   52.75 "task_scores":[{"54":50.5,"12":55}]
2 234   61.50   "task_scores":[{"23":56,"58":67}]

How can I do this in R?

Answer 1

We can use jsonlite to build the structure:

library(jsonlite)
library(plyr)
ddply(df, "id", summarise, overall = mean(score, na.rm = TRUE),
    meta = paste0('"task_scores":', 
              toJSON(setNames(as.data.frame.list(score), task))))
#   id overall                                meta
#1 123   52.75 "task_scores":[{"54":50.5,"12":55}]
#2 234   61.50   "task_scores":[{"23":56,"58":67}]
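
One caveat: a string of the form "task_scores":[...] is not a complete JSON document on its own, because it lacks the enclosing braces. If the meta field needs to be parsed back later, a minimal sketch along the following lines may help. It assumes a slightly different shape than the one requested (a plain object keyed by task ID instead of an array holding one object), and make_meta is a hypothetical helper name, not part of jsonlite.

```r
library(jsonlite)

# Same example data as in the question
df <- data.frame(
  id   = c(123, 234, 123, 234),
  task = c(54, 23, 12, 58),
  a    = c(23, 67, 45, 89),
  b    = c(78, 45, 65, 45)
)
df$score <- rowMeans(df[, c("a", "b")], na.rm = TRUE)

# Hypothetical helper: one self-contained JSON object per group
make_meta <- function(d) {
  scores <- setNames(as.list(d$score), d$task)  # task IDs become the keys
  as.character(toJSON(list(task_scores = scores), auto_unbox = TRUE))
}

groups <- split(df, df$id)  # one data frame per id
out <- data.frame(
  id      = as.numeric(names(groups)),
  overall = unname(vapply(groups, function(d) mean(d$score, na.rm = TRUE), numeric(1))),
  meta    = unname(vapply(groups, make_meta, character(1)))
)
```

Because each meta value is a full JSON object here, fromJSON(out$meta[1])$task_scores should return the per-task scores for the first id.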

Answer 2

I don't know offhand how to build the metadata dictionary, but you could do the following:

library(dplyr)
library(magrittr)
out <- df %>% group_by(id) %>%  mutate(overall = mean(score))

> out
# A tibble: 4 x 6
# Groups:   id [2]
     id  task     a     b score overall
  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
1   123    54    23    78  50.5    52.8
2   234    23    67    45  56      61.5
3   123    12    45    65  55      52.8
4   234    58    89    45  67      61.5

This way, out holds both the aggregated score and the data from the original rows.

Answer 3

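You can do this with a few mutate() calls: compute the row means, then the group means, then paste the task and score pairs together into a tally string.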

library(dplyr)
df %>%
  mutate(score = rowMeans(subset(., select = c(3:4)), na.rm = TRUE)) %>% 
  group_by(id) %>% 
  mutate(overall = mean(score)) %>% 
  mutate(tally = paste(task, score, sep = ":", collapse = ","))

# A tibble: 4 x 7
# Groups:   id [2]
     id  task     a     b score overall tally        
  <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl> <chr>        
1   123    54    23    78  50.5    52.8 54:50.5,12:55
2   234    23    67    45  56      61.5 23:56,58:67  
3   123    12    45    65  55      52.8 54:50.5,12:55
4   234    58    89    45  67      61.5 23:56,58:67

To get the desired final output, just select the columns you need and slice the first row of each group.

df %>%
  mutate(score = rowMeans(subset(., select = c(3:4)), na.rm = TRUE)) %>% 
  group_by(id) %>% 
  mutate(overall = mean(score)) %>% 
  mutate(tally = paste(task, score, sep = ":", collapse = ",")) %>% 
  select(id, overall, tally) %>% 
  slice(1)

# A tibble: 2 x 3
     id overall tally        
  <dbl>   <dbl> <chr>        
1   123    52.8 54:50.5,12:55
2   234    61.5 23:56,58:67
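
As a possible simplification, the mutate, select and slice steps can probably be collapsed into a single summarise() call, which returns one row per group directly. This is a sketch under the same example data, not a change to the approach above; the column names overall and tally follow the answer.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  id   = c(123, 234, 123, 234),
  task = c(54, 23, 12, 58),
  a    = c(23, 67, 45, 89),
  b    = c(78, 45, 65, 45)
)

out <- df %>%
  # Row-wise mean of a and b, referenced by name instead of position
  mutate(score = rowMeans(cbind(a, b), na.rm = TRUE)) %>%
  group_by(id) %>%
  # summarise() collapses each group to one row, so no slice(1) is needed
  summarise(
    overall = mean(score, na.rm = TRUE),
    tally   = paste(task, score, sep = ":", collapse = ",")
  )
```

Referring to a and b by name also avoids relying on column positions 3:4, which would silently pick the wrong columns if the data frame were reordered.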


Answer 1
2, accepted, 2018-10-23 04:15:51

Answer 2
0, 2018-10-22 20:34:59

Answer 3
0, 2018-10-22 21:35:40
