
How to turn a df (x,y,z) into a table using the columns x and y as labels and fill the cells with a function including z?

My data frame is made up of three columns called "vehicle", "age" and "ways" like this:

# A tibble: 30 x 3
  vehicle     age   ways
   <dbl>     <dbl> <dbl>
 1     4        25 0.201
 2     1        24 0.216
 3     4        25 0.236
 4     4        25 0.147
 5     4        24 0.435
 6     4        25 2.54 
 7     1        24 0.268
 8     1        25 0.194
 9     4        23 0.360
10     1        26 0.248
11     5        24 0.239
12     2        26 0.162
13     4        23 2.15 
14     1        25 0.554
15     4        26 0.384
16     3        26 0.122
17     4        27 0.183
18     4        25 1.36 
19     4        25 1.27 
20     1        24 0.404
21     2        27 0.479
22     1        25 4.98 
23     3        25 0.113
24     4        25 0.297
25     4        24 0.566
26     4        24 1.12 
27     4        25 0.394
28     4        25 2.77 
29     4        24 4.63 
30     4        24 0.677

I want to transform this data frame into a table with the column "vehicle" as vertical labels and the column "age" as horizontal labels, which would look something like this:

vehicle/age|  23 | 24 | 25 | 26 ...
-----------------------------------
     1     |     |    |    |
-----------------------------------
     2     |     |    |    |
-----------------------------------
     3     |     |    |    |
-----------------------------------
     4     |     |    |    |

And I want to fill the unfilled cells with a mathematical function like this:

Example:

cell1 = ((∑ ways of vehicle 1 and age 23)*100) / (∑ all ways of all vehicle with age 23)
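
Written generally, the example above can be stated for any vehicle v and age a. This is only a restatement of the formula; the index i (running over the rows of the data frame) and the symbols v and a are notation introduced here, not part of the original post.

```latex
\[
\text{cell}_{v,a} \;=\; 100 \cdot
\frac{\sum_{i \,:\, \text{vehicle}_i = v,\ \text{age}_i = a} \text{ways}_i}
     {\sum_{i \,:\, \text{age}_i = a} \text{ways}_i}
\]
```

Under this definition the values in each age column add up to 100, provided the column contains at least one observation.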

I know how to do this manually, using filter, group_by and summarize and writing the individual results into Excel. I am just curious whether there is a better and faster way, because I need to do this with more data frames.

For everyone who is willing to find a solution for my problem, thank you very much!

With xtabs(). The following expresses each cell as a percentage of the grand total of ways:

res <- xtabs(ways ~ vehicle + age, D) / sum(D$ways) * 100
res
#        age
# vehicle         23         24         25         26         27
#       1  0.0000000  3.2058919 20.6794469  0.8953392  0.0000000
#       2  0.0000000  0.0000000  0.0000000  0.5848587  1.7293043
#       3  0.0000000  0.0000000  0.4079570  0.4404491  0.0000000
#       4  9.0616990 26.8168526 33.2683490  1.3863316  0.6606737
#       5  0.0000000  0.8628470  0.0000000  0.0000000  0.0000000

Use e.g. res <- data.frame(unclass(res)) to get a "data.frame" from the "table" object. You can also round the result, e.g. round(res, 2).
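
One detail of the conversion may be worth knowing: data.frame() checks column names by default, so purely numeric labels such as the ages get an "X" prefix. The sketch below compares three ways of converting, using a tiny made-up data frame (toy and its values are hypothetical and only there to keep the example self-contained).

```r
# Toy data (hypothetical values, only to illustrate the conversion)
toy <- data.frame(vehicle = c(1, 1, 2),
                  age     = c(23, 24, 24),
                  ways    = c(1, 2, 3))
tab <- xtabs(ways ~ vehicle + age, toy)

# Default: column names are made syntactically valid ("X23", "X24")
df_checked <- data.frame(unclass(tab))

# Keep the age labels unchanged ("23", "24")
df_plain  <- data.frame(unclass(tab), check.names = FALSE)
df_matrix <- as.data.frame.matrix(tab)

names(df_checked)
names(df_plain)
names(df_matrix)
```

Which variant is preferable probably depends on what happens next: prefixed names are easier to use with `$`, while the unchanged labels look nicer when the table is exported.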

Data

D <- structure(list(vehicle = c(4L, 1L, 4L, 4L, 4L, 4L, 1L, 1L, 4L, 
1L, 5L, 2L, 4L, 1L, 4L, 3L, 4L, 4L, 4L, 1L, 2L, 1L, 3L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L), age = c(25L, 24L, 25L, 25L, 24L, 25L, 24L, 
25L, 23L, 26L, 24L, 26L, 23L, 25L, 26L, 26L, 27L, 25L, 25L, 24L, 
27L, 25L, 25L, 25L, 24L, 24L, 25L, 25L, 24L, 24L), ways = c(0.201, 
0.216, 0.236, 0.147, 0.435, 2.54, 0.268, 0.194, 0.36, 0.248, 
0.239, 0.162, 2.15, 0.554, 0.384, 0.122, 0.183, 1.36, 1.27, 0.404, 
0.479, 4.98, 0.113, 0.297, 0.566, 1.12, 0.394, 2.77, 4.63, 0.677
)), row.names = c(NA, -30L), class = "data.frame")

Data thanks to jay.sf - Edit: I added vehicle == 1 & age == 23:

D <- structure(list(vehicle = c(4L, 1L, 4L, 4L, 4L, 4L, 1L, 1L, 4L, 
1L, 5L, 2L, 4L, 1L, 4L, 3L, 4L, 4L, 4L, 1L, 2L, 1L, 3L, 4L, 4L, 
4L, 4L, 4L, 4L, 4L,1L), age = c(25L, 24L, 25L, 25L, 24L, 25L, 24L, 
25L, 23L, 26L, 24L, 26L, 23L, 25L, 26L, 26L, 27L, 25L, 25L, 24L, 
27L, 25L, 25L, 25L, 24L, 24L, 25L, 25L, 24L, 24L,23L), ways = c(0.201, 
0.216, 0.236, 0.147, 0.435, 2.54, 0.268, 0.194, 0.36, 0.248, 
0.239, 0.162, 2.15, 0.554, 0.384, 0.122, 0.183, 1.36, 1.27, 0.404, 
0.479, 4.98, 0.113, 0.297, 0.566, 1.12, 0.394, 2.77, 4.63, 0.677, 0.55
)), row.names = c(NA, -31L), class = "data.frame")

Solution:

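library(dplyr)
library(tidyr)

D %>%
  group_by(vehicle, age) %>%
  summarise(ways = sum(ways)) %>%   # total ways per vehicle/age pair
  ungroup() %>%
  spread(age, ways) %>%             # wide format: missing pairs become NA
  gather(age, ways, -vehicle) %>%   # back to long format, NA rows kept
  # fill NA cells with (ways of vehicle 1, age 23) * 100 / total of all ways
  mutate(ways = case_when(is.na(ways) ~ ways[age == 23 & vehicle == 1]*100/sum(ways, na.rm = TRUE), TRUE ~ ways)) %>%
  spread(age, ways)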

Edit:

I updated the code to reflect the fill function based on what I understood it to be.

You can use xtabs, as already shown by @jay.sf, in combination with prop.table:

prop.table(xtabs(ways ~ vehicle + age, D)) * 100

In case you want to show the percentages per column (i.e. per age, as in the formula from the question) you can use:

prop.table(xtabs(ways ~ vehicle + age, D), 2) * 100
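
As a sanity check, the column-wise version can be compared against the formula from the question for a single cell. This is a minimal sketch that rebuilds the 30-row D from the answer above as a plain data.frame; the choice of vehicle 1 and age 24 as the cell to check is arbitrary, and the names pct and manual are introduced here for illustration.

```r
# The 30 rows of D from the answer above
D <- data.frame(
  vehicle = c(4, 1, 4, 4, 4, 4, 1, 1, 4, 1, 5, 2, 4, 1, 4,
              3, 4, 4, 4, 1, 2, 1, 3, 4, 4, 4, 4, 4, 4, 4),
  age     = c(25, 24, 25, 25, 24, 25, 24, 25, 23, 26, 24, 26, 23, 25, 26,
              26, 27, 25, 25, 24, 27, 25, 25, 25, 24, 24, 25, 25, 24, 24),
  ways    = c(0.201, 0.216, 0.236, 0.147, 0.435, 2.54, 0.268, 0.194, 0.36,
              0.248, 0.239, 0.162, 2.15, 0.554, 0.384, 0.122, 0.183, 1.36,
              1.27, 0.404, 0.479, 4.98, 0.113, 0.297, 0.566, 1.12, 0.394,
              2.77, 4.63, 0.677)
)

# Column-wise percentages (margin 2 = age)
pct <- prop.table(xtabs(ways ~ vehicle + age, D), 2) * 100

# The question's formula, written out for vehicle 1 and age 24
manual <- sum(D$ways[D$vehicle == 1 & D$age == 24]) * 100 /
  sum(D$ways[D$age == 24])

all.equal(as.numeric(pct["1", "24"]), manual)  # same value both ways
colSums(pct)                                   # every age column sums to 100
```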

Since R 4.0.0 you should use proportions instead of prop.table, which gives

proportions(xtabs(ways ~ vehicle + age, D)) * 100

and

proportions(xtabs(ways ~ vehicle + age, D), 2) * 100
proportions(xtabs(ways ~ vehicle + age, D), "age") * 100 #Alternative
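
Since the question mentions repeating this for several data frames, one possible approach is to wrap the call in a small function and apply it over a list. The following is a sketch under assumptions: pct_table is a hypothetical helper name, the two data frames are made up, and every data frame is assumed to have the columns vehicle, age and ways. It uses prop.table so that it also runs on older R versions.

```r
# Hypothetical helper: column-wise percentage table for one data frame
pct_table <- function(df, digits = 2) {
  tab <- xtabs(ways ~ vehicle + age, data = df)
  round(prop.table(tab, margin = 2) * 100, digits)
}

# Two small made-up data frames standing in for the real ones
dfs <- list(
  a = data.frame(vehicle = c(1, 4, 4), age = c(23, 23, 24), ways = c(1, 3, 2)),
  b = data.frame(vehicle = c(2, 2, 3), age = c(25, 26, 26), ways = c(5, 1, 1))
)

# One table per data frame, returned as a named list
tables <- lapply(dfs, pct_table)
tables$a
```

Each element of tables can then be converted to a data.frame and written out in whatever way the existing manual workflow already does.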
