R: Split a row into multiple rows, and then split the column into multiple columns
I am stuck on a seemingly simple task. Consider the following data.table:
library(data.table)

dt1 <- data.table(ID = as.factor(c("202E", "202E", "202E")),
timestamp = as.POSIXct(c("2017-05-02 00:00:00",
"2017-05-02 00:15:00",
"2017-05-02 00:30:00")),
acceleration_raw = c("-0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.727 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.164 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141 -0.703 0.656 0.141",
"-0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.680 0.680 0.117 -0.680 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117 -0.703 0.680 0.117",
"-0.750 0.586 0.117 -0.773 0.586 0.117 -0.773 0.609 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.750 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.141 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117 -0.773 0.586 0.117"))
Created on 2022-11-17 with reprex v2.0.2
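My idea is to split the acceleration_raw column into three separate columns: acc_x, acc_y and acc_z. Each row of acceleration_raw is a character string that ultimately yields 120 numeric observations. I would like to split acceleration_raw and then, moving in steps of 3, take every value starting from the first position and put it into acc_x, every value starting from the second position into acc_y, and every value starting from the third position into acc_z.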
I first tried to split acceleration_raw with separate_rows from tidyr:
library('tidyverse')
library('data.table')
dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = FALSE)
After that, I tried:
library('tidyverse')
library('data.table')
dt1 <- dt1 %>%
  separate_rows(acceleration_raw, sep = " ", convert = F) %>%
  mutate(acc_x = seq(acceleration_raw, from = 1, to = length(dt1), by = 3),
         acc_y = seq(acceleration_raw, from = 2, to = length(dt1), by = 3),
         acc_z = seq(acceleration_raw, from = 3, to = length(dt1), by = 3))
#> Warning in seq.default(acceleration_raw, from = 1, to = length(dt1), by = 3):
#> first element used of 'length.out' argument
#> Error in `mutate()`:
#> ! Problem while computing `acc_x = seq(acceleration_raw, from = 1, to =
#> length(dt1), by = 3)`.
#> Caused by error in `ceiling()`:
#> ! non-numeric argument to mathematical function
Any suggestions on how to proceed?
You can use pivot_wider and unnest:
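library(tidyverse)

dt1 %>%
  # one row per space-separated value
  separate_rows(acceleration_raw, sep = " ", convert = FALSE) %>%
  # label the values x, y, z in turn (assumes the row count is a multiple of 3)
  mutate(id = rep(c("acc_x", "acc_y", "acc_z"), times = nrow(.) / 3)) %>%
  # gather each axis into a list column per ID/timestamp, then expand it
  pivot_wider(names_from = id, values_from = acceleration_raw, values_fn = list) %>%
  unnest(cols = c("acc_x", "acc_y", "acc_z"))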
This returns
# A tibble: 120 × 5
ID timestamp acc_x acc_y acc_z
<fct> <dttm> <chr> <chr> <chr>
1 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
2 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
3 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
4 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
5 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
6 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
7 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
8 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
9 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
10 202E 2017-05-02 00:00:00 -0.703 0.656 0.164
# … with 110 more rows
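The acc_x, acc_y and acc_z columns in the output above are still character (<chr>). As a minimal base R sketch of the same reshaping that yields numeric columns directly, each string can be split, converted and laid out as a three-column matrix filled row by row. The toy values and the helper name split_xyz are assumptions made for illustration, and the sketch assumes every string holds complete x/y/z triplets:

```r
# Toy input (hypothetical values): two rows, each holding two x/y/z triplets.
df <- data.frame(
  ID = c("202E", "202E"),
  timestamp = as.POSIXct(c("2017-05-02 00:00:00", "2017-05-02 00:15:00"),
                         tz = "UTC"),
  acceleration_raw = c("-0.703 0.656 0.164 -0.727 0.656 0.141",
                       "-0.680 0.680 0.117 -0.703 0.680 0.117"),
  stringsAsFactors = FALSE
)

# split_xyz() is a hypothetical helper, not part of any package.
split_xyz <- function(x) {
  vals <- as.numeric(strsplit(x, " ", fixed = TRUE)[[1]])
  stopifnot(length(vals) %% 3 == 0)           # assumes complete triplets
  m <- matrix(vals, ncol = 3, byrow = TRUE)   # one matrix row per triplet
  data.frame(acc_x = m[, 1], acc_y = m[, 2], acc_z = m[, 3])
}

# Expand each input row, repeating its ID and timestamp for every triplet.
pieces <- lapply(seq_len(nrow(df)), function(i) {
  xyz <- split_xyz(df$acceleration_raw[i])
  cbind(df[rep(i, nrow(xyz)), c("ID", "timestamp")], xyz)
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
str(result)
```

Filling the matrix with byrow = TRUE is what places every third value in the same column, so no explicit index arithmetic is needed.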
I could not get pivot_wider to work without producing NAs, so this solution is not optimal:
library(tidyverse)
dt1 %>%
  as_tibble() %>%
  separate_rows(acceleration_raw, sep = " ") %>%
  group_by(group = as.integer(gl(n(), n()/3, n()))) %>%
  mutate(id = row_number()) %>%
  mutate(group = case_when(group == 1 ~ "acc_x",
                           group == 2 ~ "acc_y",
                           group == 3 ~ "acc_z")) %>%
  pivot_wider(names_from = group, values_from = acceleration_raw) %>%
  mutate(acc_y = lead(acc_y, n()/3),
         acc_z = lead(acc_z, n()/3*2)) %>%
  na.omit()
# A tibble: 120 x 6
ID timestamp id acc_x acc_y acc_z
<fct> <dttm> <int> <chr> <chr> <chr>
1 202E 2017-05-02 00:00:00 1 -0.703 -0.703 -0.750
2 202E 2017-05-02 00:00:00 2 0.656 0.680 0.586
3 202E 2017-05-02 00:00:00 3 0.164 0.117 0.117
4 202E 2017-05-02 00:00:00 4 -0.703 -0.680 -0.773
5 202E 2017-05-02 00:00:00 5 0.656 0.680 0.586
6 202E 2017-05-02 00:00:00 6 0.164 0.117 0.117
7 202E 2017-05-02 00:00:00 7 -0.703 -0.680 -0.773
8 202E 2017-05-02 00:00:00 8 0.656 0.680 0.609
9 202E 2017-05-02 00:00:00 9 0.164 0.117 0.117
10 202E 2017-05-02 00:00:00 10 -0.703 -0.680 -0.773
# ... with 110 more rows
# i Use `print(n = ...)` to see more rows
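In the output above, acc_x cycles through -0.703, 0.656 and 0.164, which are the x, y and z readings of the first timestamp, so the three columns appear to follow the three timestamps rather than the three axes. If the stride-of-3 split described in the question is what is wanted, a minimal data.table sketch could look like the following. The toy values 1 to 12 are an assumption chosen to make the stride visible, and the code assumes each string holds complete triplets. Note that seq() only generates positions; the numeric vector is then subset with them:

```r
library(data.table)

# Toy input (hypothetical values): two rows, each holding two x/y/z triplets.
dt <- data.table(
  ID = factor(c("202E", "202E")),
  timestamp = as.POSIXct(c("2017-05-02 00:00:00", "2017-05-02 00:15:00"),
                         tz = "UTC"),
  acceleration_raw = c("1 2 3 4 5 6", "7 8 9 10 11 12")
)

res <- dt[, {
  # split the string(s) of this group into one numeric vector
  vals <- as.numeric(unlist(strsplit(acceleration_raw, " ", fixed = TRUE)))
  n <- length(vals)
  # take every third value, starting at positions 1, 2 and 3
  list(acc_x = vals[seq(1, n, by = 3)],
       acc_y = vals[seq(2, n, by = 3)],
       acc_z = vals[seq(3, n, by = 3)])
}, by = .(ID, timestamp)]

print(res)
```

Grouping by ID and timestamp keeps those two columns attached to every triplet, and the result stays a data.table.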