
Best practices for handling multidimensional (spatio-temporal) data with R

I have a question regarding the use of a (Postgre)SQL database in R: much of the documentation on this topic stresses that it only makes sense to use SQL databases in R if you are dealing with big data that doesn't fit into your RAM (e.g. see here and here). I have a different situation and wanted to find out whether using a PostgreSQL database would be a reasonable decision. Here's my situation:

I'm well into an ecological research study in which I analyse roe deer GPS data at different sampling intervals (5 min and 3 h) over a span of about 2 years. In addition, I integrate two-axis acceleration data at a sampling interval of 4 minutes.

To evaluate the behaviour of the roe deer in relation to humans, I analyse this multidimensional data by comparing it to GPS data of humans taken at a sampling interval of 5 seconds.

To date, I've been doing this analysis using data frames/data.tables with dplyr. When merging all the data into one dataset, the resulting data.table becomes really wide. The columns include the timestamp, ID, X/Y positions, DOP and so forth of both humans and roe deer, plus all the resulting calculated values such as distance, speed, elevation, proximity and lots more.

The data is also immensely long: the positions of multiple roe deer and multiple humans are recorded simultaneously (a many-to-many relationship), which leads to many repetitions in the data frame. On top of that, the different sampling intervals of humans and roe deer lead to repetition (of the roe deer positions) as well.
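To illustrate with a toy example (the column names below are made up for illustration), joining the two tracks on a shared time window repeats every roe deer fix once for each matching human fix:

```
library(data.table)

# Made-up columns: one roe deer fix (3 h interval) and five human fixes
# (5 s interval) that fall into the same time window.
deer  <- data.table(window = 1L, deer_id = 1L,
                    deer_x = 2600000, deer_y = 1200000)
human <- data.table(window = 1L, human_id = 7L,
                    human_x = 2600100 + 0:4, human_y = 1200050)

# The join repeats the single deer row once per human fix; with many
# animals, people and windows this is what makes the merged table so long,
# and every derived column (distance, speed, ...) makes it wider still.
merged <- human[deer, on = "window"]
nrow(merged)  # 5 rows from 1 deer fix and 5 human fixes
```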

I'm hoping that with a database solution, I can

  1. write shorter, more elegant and concise code to analyse my data
  2. keep a better overview of my data since it's
    • shorter (no repetitions) and
    • narrower (separate tables for the individual datasets, with the corresponding relationships)

Would you recommend using a database in my case? Would using a database solution help achieve the goals as described above?

PostgreSQL offers all the protection of an ACID database.

I use both R and PostgreSQL for work. To be honest, I prefer most things to be in the database.

In relation to your many-to-many data join, database normalization may help you there.
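As a rough sketch (the table and column names here are placeholders, not your actual schema), each dataset could live in its own narrow table, with the derived pairwise values in a link table that references both tracks instead of repeating their columns:

```
library(RPostgreSQL)

con <- dbConnect(dbDriver("PostgreSQL"), dbname = "R_Project")

# One narrow table per dataset; placeholder columns only.
dbSendQuery(con, "CREATE TABLE deer_fix  (fix_id serial PRIMARY KEY, deer_id int,
                  ts timestamptz, x double precision, y double precision, dop real)")
dbSendQuery(con, "CREATE TABLE human_fix (fix_id serial PRIMARY KEY, human_id int,
                  ts timestamptz, x double precision, y double precision, dop real)")

# The many-to-many relationship lives in a link table of foreign keys,
# so deer and human positions are never copied into each other's rows.
dbSendQuery(con, "CREATE TABLE encounter (deer_fix_id int REFERENCES deer_fix(fix_id),
                  human_fix_id int REFERENCES human_fix(fix_id),
                  distance double precision, speed double precision)")

dbDisconnect(con)
```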

Also, a SELECT from PostgreSQL on the relevant columns, applying a filter to the rows, may help. More information on SELECT queries can be found here: Ref PostgreSQL select tutorial

E.g. SELECT column1, column3 FROM example_table WHERE x = y, and reading the result into a data set, as sketched below.
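A minimal sketch of that pattern with RPostgreSQL (the table and column names are just the placeholders from the example query):

```
library(RPostgreSQL)

con <- dbConnect(dbDriver("PostgreSQL"), dbname = "R_Project")

# Select only the needed columns and filter the rows on the server;
# dbGetQuery returns the result directly as a data.frame.
df <- dbGetQuery(con, "SELECT column1, column3
                         FROM example_table
                        WHERE x = 'some_value'")

dbDisconnect(con)
```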

A database is more suited to handling data, while R is more suited to data analysis.

If you want to take a look at the commands for calling PostgreSQL from R, you could look at this article from Google.

Ref RPostgreSQL

Example

```
library(RPostgreSQL)

# load the PostgreSQL driver
drv <- dbDriver("PostgreSQL")

# open a connection
con <- dbConnect(drv, dbname = "R_Project")

# submit a statement
rs <- dbSendQuery(con, "select * from R_Users")
```
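To actually pull the result rows into R and release the resources afterwards, the usual follow-up (continuing with the rs, con and drv objects created above) looks roughly like this:

```
# fetch all remaining rows of the result into a data.frame
df <- fetch(rs, n = -1)

# release the result set, the connection and the driver
dbClearResult(rs)
dbDisconnect(con)
dbUnloadDriver(drv)
```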

All the best
