
Best practices for handling multidimensional (spatio-temporal) data with R

I have a question about using a (Postgre)SQL database from R. Much of the documentation on this topic says it only makes sense to use an SQL database with R if your data is too big to fit in RAM (e.g. see here and here). My situation is different, and I want to find out whether a (Postgre)SQL database would still be a reasonable choice. Here's my situation:

I'm well into an ecological research study in which I analyse roe deer GPS data at different sampling intervals (5 min and 3 h) over a span of about 2 years. In addition, I integrate two-axis acceleration data recorded at a sampling interval of 4 minutes.

To evaluate how the roe deer behave in relation to humans, I compare this multidimensional data with GPS data from humans recorded at a sampling interval of 5 seconds.

So far I've done this analysis with data.frame/data.table and dplyr. When all the data is merged into one dataset, the resulting table becomes really wide. The columns include timestamp, ID, X/Y positions, DOP and so on for both humans and roe deer. They also include all the derived values such as distance, speed, elevation, proximity and many more.

The data is also immensely long. The positions of multiple roe deer and multiple humans are recorded at the same time (a many-to-many relationship), which produces many repeated rows in the data frame. On top of that, because humans and roe deer are sampled at different intervals, the roe deer positions are repeated as well.
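Here is a minimal sketch of where the repetition comes from. It uses small hypothetical data frames; all IDs, coordinates and timestamps are made up for illustration. The first join pairs every roe deer with every human recorded at the same instant. The second attaches the most recent 3 h deer fix to each 5 s human fix using base R's `findInterval()`, so one deer position is copied onto many rows.

```r
# Hypothetical toy data: 2 roe deer and 3 humans, all fixed at the same instant.
t0 <- as.POSIXct("2019-05-01 06:00:00", tz = "UTC")
deer <- data.frame(
  timestamp = t0,
  deer_id   = c("D1", "D2"),
  deer_x    = c(10, 20),
  deer_y    = c(5, 15)
)
humans <- data.frame(
  timestamp = t0,
  human_id  = c("H1", "H2", "H3"),
  human_x   = c(11, 30, 12),
  human_y   = c(6, 25, 4)
)

# Many-to-many: joining on the timestamp pairs every deer with every human,
# so each deer position is repeated once per human (2 x 3 = 6 rows).
wide <- merge(deer, humans, by = "timestamp")
print(nrow(wide))
print(table(wide$deer_id))

# Different sampling intervals: deer fixes every 3 h, human fixes every 5 s.
deer_fixes  <- data.frame(time = t0 + c(0, 3 * 3600), x = c(10, 14), y = c(5, 9))
human_fixes <- data.frame(time = t0 + seq(0, 55, by = 5))  # one minute of fixes

# For each human fix, find the index of the latest deer fix at or before it.
idx <- findInterval(as.numeric(human_fixes$time), as.numeric(deer_fixes$time))
aligned <- data.frame(
  time   = human_fixes$time,
  deer_x = deer_fixes$x[idx],
  deer_y = deer_fixes$y[idx]
)
# All 12 human fixes fall before the second deer fix, so they all repeat fix 1.
print(nrow(aligned))
print(unique(idx))
```

Scaled to two years of data and many animals, both effects multiply, which is why the merged table grows so long.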

I'm hoping that with a database solution, I can

  1. write shorter, more elegant and more concise code to analyse my data
  2. keep a better overview of my data, since it would be
    • shorter (no repetitions) and
    • narrower (separate tables for the individual datasets, linked by the corresponding relationships)

Would you recommend using a database in my case? Would a database solution help me achieve the goals described above?

PostgreSQL offers all the protections of an ACID-compliant database.

I use both R and PostgreSQL for work. To be honest, I prefer to keep most things in the database.

For your many-to-many join, database normalization may help.
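To make the normalization idea concrete, here is a minimal sketch in base R, with data frames standing in for database tables. The table and column names (`deer`, `deer_fixes`, `human_fixes`) are hypothetical. Each fact is stored exactly once: an animal's attributes live in one table and its positions in another. The wide, repetitive table is built by a join only when a particular analysis needs it. In PostgreSQL the same step would be a `SELECT` with `JOIN` clauses.

```r
# Hypothetical normalized layout: one table per kind of entity or observation.
t0 <- as.POSIXct("2019-05-01 06:00:00", tz = "UTC")
deer <- data.frame(deer_id = c("D1", "D2"), sex = c("f", "m"))  # stored once per animal
deer_fixes <- data.frame(
  deer_id = c("D1", "D2", "D1"),
  time    = t0 + c(0, 0, 3 * 3600),
  x       = c(10, 20, 14),
  y       = c(5, 15, 9)
)
human_fixes <- data.frame(
  human_id = c("H1", "H2"),
  time     = t0,
  x        = c(11, 30),
  y        = c(6, 25)
)

# Build the wide table only on demand: attach deer attributes to the deer fixes,
# then pair them with human fixes taken at the same time (inner join).
pairs <- merge(
  merge(deer_fixes, deer, by = "deer_id"),
  human_fixes,
  by = "time",
  suffixes = c("_deer", "_human")
)

# Derived values are computed from the join result rather than stored.
pairs$dist <- sqrt((pairs$x_deer - pairs$x_human)^2 + (pairs$y_deer - pairs$y_human)^2)
print(pairs[, c("time", "deer_id", "sex", "human_id", "dist")])
```

The deer fix at 09:00 has no simultaneous human fix, so the inner join drops it. Matching fixes taken at different times would need a separate rule, such as the nearest or most recent fix.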

It may also help to select only the relevant columns from PostgreSQL and filter the rows there. More information on select queries can be found here: Ref PostgreSQL SELECT tutorial

For example, run `SELECT column1, column3 FROM example_table WHERE x = y` (and so on), and read the result into a data set.
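For comparison, here is the same column selection and row filter in base R, run against a data frame that is already in memory. `example_table`, `column1`, `column3`, `x` and `y` are the placeholder names from the query above, filled with made-up values. The two versions return the same rows and columns. The difference is where the work happens. When PostgreSQL runs the query, only the matching rows of the two selected columns are sent to R. With `subset()`, the whole table has to be loaded first.

```r
# Placeholder table using the names from the example query (values are made up).
example_table <- data.frame(
  column1 = 1:4,
  column2 = c("a", "b", "c", "d"),
  column3 = c(2.5, 3.5, 4.5, 5.5),
  x       = c(1, 2, 3, 4),
  y       = c(1, 0, 3, 0)
)

# In-memory equivalent of: SELECT column1, column3 FROM example_table WHERE x = y
result <- subset(example_table, x == y, select = c(column1, column3))
print(result)
```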

A database is better suited to managing data, while R is better suited to analysing it.

If you want to see the commands for calling PostgreSQL from R, you could look at this article from Google.

Ref RPostgreSQL

Example

```r
library(RPostgreSQL)

# Load the PostgreSQL driver
drv <- dbDriver("PostgreSQL")

# Open a connection
con <- dbConnect(drv, dbname = "R_Project")

# Submit a statement
rs <- dbSendQuery(con, "select * from R_Users")

# Fetch all rows of the result into a data frame, then release resources
users <- fetch(rs, n = -1)
dbClearResult(rs)
dbDisconnect(con)
```

All the best

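Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish one, please cite this site's URL or the original URL. For any questions, contact yoyou2525@163.com.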

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM