
R - fill rows of a column down based on another row in another column

I am reading JSON data into RStudio that comes from sensors driving around a city. I then convert this data to a data frame with the sofa and jsonlite packages. The data is sensor data consisting of location data, and for each location measurement some environmental data is transmitted, coded in resource paths (for example /6/0/0 and /6/0/1 for latitude and longitude). Because of the structure of the JSON I am reading in, the location information in my R data frame ends up in the same "value" column as the environmental data such as humidity, CO2, etc. So I lose the location information for the individual observations, because the location is also treated as a value. See the converted JSON data frame below.

data.frame

> | resourcePath   | value | UTC      | lat | long |
> |----------------|-------|----------|-----|------|
> | /6/0/0         | 48.18 | 14:51:43 | 0   | 0    |
> | /6/0/1         | 16.39 | 14:51:43 | 0   | 0    |
> | /3300/515/5700 | 34    | 14:52:43 | 0   | 0    |
> | /3300/289/5700 | 15    | 14:53:43 | 0   | 0    |
> | /3300/515/5700 | 55    | 14:53:47 | 0   | 0    |
> | /3300/289/5700 | 9004  | 14:54:23 | 0   | 0    |
> | /3304/0/5700   | 367   | 14:54:34 | 0   | 0    |
> | /3315/0/5700   | 47    | 14:54:54 | 0   | 0    |
> | /6/0/0         | 50.34 | 14:57:11 | 0   | 0    |
> | /6/0/1         | 20.52 | 14:57:13 | 0   | 0    |
> | /3304/0/5700   | 84    | 14:57:34 | 0   | 0    |
> | /3315/0/5700   | 56    | 14:57:45 | 0   | 0    |

And here is a view of the desired data frame.

This is the desired df: each row has associated "lat" and "long" information taken from the most recent location reading in the "value" column, and these stay the same until a new location appears in the "value" column.

> | resourcePath   | value | UTC      | lat   | long  |
> |----------------|-------|----------|-------|-------|
> | /6/0/0         | 48.18 | 14:51:43 | 48.18 | 16.39 |
> | /6/0/1         | 16.39 | 14:51:43 | 48.18 | 16.39 |
> | /3300/515/5700 | 34    | 14:52:43 | 48.18 | 16.39 |
> | /3300/289/5700 | 15    | 14:53:43 | 48.18 | 16.39 |
> | /3300/515/5700 | 55    | 14:53:47 | 48.18 | 16.39 |
> | /3300/289/5700 | 9004  | 14:54:23 | 48.18 | 16.39 |
> | /3304/0/5700   | 367   | 14:54:34 | 48.18 | 16.39 |
> | /3315/0/5700   | 47    | 14:54:54 | 48.18 | 16.39 |
> | /6/0/0         | 50.34 | 14:57:11 | 50.34 | 20.52 |
> | /6/0/1         | 20.52 | 14:57:13 | 50.34 | 20.52 |
> | /3304/0/5700   | 84    | 14:57:34 | 50.34 | 20.52 |
> | /3315/0/5700   | 56    | 14:57:45 | 50.34 | 20.52 |

I tried looping with lapply, but so far I am not getting the desired df. Any hints are greatly appreciated. Thomas

Here is a solution using the tidyr package. It assumes that the first row of each group of data is the "/6/0/0" row and that the second row is "/6/0/1".

df<-structure(list(resourcePath = structure(c(5L, 6L, 2L, 1L, 2L, 
1L, 3L, 4L, 5L, 6L, 3L, 4L), .Label = c("/3300/289/5700", "/3300/515/5700", 
"/3304/0/5700", "/3315/0/5700", "/6/0/0", "/6/0/1"), class = "factor"), 
    value = c(48.18, 16.39, 34, 15, 55, 9004, 367, 47, 50.34, 
    20.52, 84, 56), UTC = structure(c(1L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 10L, 11L), .Label = c("14:51:43", "14:52:43", 
    "14:53:43", "14:53:47", "14:54:23", "14:54:34", "14:54:54", 
    "14:57:11", "14:57:13", "14:57:34", "14:57:45"), class = "factor"), 
    lat = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), 
    long = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("resourcePath", 
"value", "UTC", "lat", "long"), class = "data.frame", row.names = c(NA, 
-12L))    

#convert the factor column to character so it can be compared with strings
df$resourcePath<-as.character(df$resourcePath)

#reset lat and long columns to NA for the fill command
df$lat<-NA
df$long <- NA

#find rows with the lat resource
#assumes this is the first row of each data grouping
latrows<-which(df$resourcePath=="/6/0/0")
df$lat[latrows]<-df$value[latrows]
df$long[latrows]<-df$value[(latrows+1)]

library(tidyr)  #needed for the fill function
df<-fill(df, lat, long)
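
The `latrows+1` indexing above only works if every "/6/0/0" row is directly followed by a "/6/0/1" row. A minimal sketch of a check for that assumption, in base R, might look like the following; the helper name `lat_long_paired` and the small example data frame are hypothetical and not part of the original answer.

```r
# Hypothetical helper: TRUE only if every latitude row ("/6/0/0")
# has a longitude row ("/6/0/1") directly below it.
lat_long_paired <- function(df) {
  path <- as.character(df$resourcePath)
  latrows <- which(path == "/6/0/0")
  length(latrows) > 0 &&
    all(latrows < nrow(df)) &&          # a lat row must not be the last row
    all(path[latrows + 1] == "/6/0/1")  # next row must carry the longitude
}

# Small illustrative data set (values borrowed from the question)
example_df <- data.frame(
  resourcePath = c("/6/0/0", "/6/0/1", "/3300/515/5700",
                   "/6/0/0", "/6/0/1", "/3304/0/5700"),
  value = c(48.18, 16.39, 34, 50.34, 20.52, 84),
  stringsAsFactors = FALSE
)

lat_long_paired(example_df)
```

If the check returns FALSE, the row-offset approach would copy a non-longitude value into `long`, so the ordering should be fixed first.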

Edit note: This is a performance improvement over the initial version, provided the row ordering is consistent in the data frame.
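
If the "/6/0/1" row is not guaranteed to sit immediately after the "/6/0/0" row, one possible variation is to number the groups with `cumsum()` and look up each group's latitude and longitude by group number. This is only a sketch in base R, not the answer's method: the function name `add_location` is hypothetical, and it still assumes that each group starts with a "/6/0/0" row and that the longitude arrives somewhere inside the same group.

```r
# Hypothetical helper: fills lat/long per group, where a new group
# starts at every "/6/0/0" (latitude) row.
add_location <- function(df) {
  path <- as.character(df$resourcePath)

  # Group number for each row; rows before the first lat row get NA
  grp <- cumsum(path == "/6/0/0")
  grp[grp == 0] <- NA

  # One latitude per group, in group order
  lat_vals <- df$value[path == "/6/0/0"]

  # One longitude per group: the first "/6/0/1" row found in that group
  long_rows <- which(path == "/6/0/1")
  long_grp <- grp[long_rows]
  keep <- !is.na(long_grp) & !duplicated(long_grp)
  long_vals <- rep(NA_real_, length(lat_vals))
  long_vals[long_grp[keep]] <- df$value[long_rows[keep]]

  # Look up each row's location by its group number
  df$lat <- lat_vals[grp]
  df$long <- long_vals[grp]
  df
}

# Data from the question, rebuilt as a plain data frame
sensor_df <- data.frame(
  resourcePath = c("/6/0/0", "/6/0/1", "/3300/515/5700", "/3300/289/5700",
                   "/3300/515/5700", "/3300/289/5700", "/3304/0/5700",
                   "/3315/0/5700", "/6/0/0", "/6/0/1", "/3304/0/5700",
                   "/3315/0/5700"),
  value = c(48.18, 16.39, 34, 15, 55, 9004, 367, 47, 50.34, 20.52, 84, 56),
  stringsAsFactors = FALSE
)

result <- add_location(sensor_df)
```

Because the lookup is a plain vector index, this avoids both the loop and the tidyr dependency, at the cost of the grouping assumption stated above.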
