
How to generate random correlated uniform data from a correlation matrix?

I have a very specific problem to solve that makes researching a solution quite hard because I lack the requisite math skills.

My goal: given a covariance/correlation matrix and variable ranges, generate some random data. This data needs to meet 3 important conditions:

  • The covariance/correlation of this data should be similar to the provided covariance/correlation matrix.

  • The values of each variable (column) should lie within the provided range for that variable.

  • Each variable should have a uniform distribution.

Is there perhaps an R package or function that can generate data meeting these conditions from those arguments? Or maybe code in some other language that I could then rewrite in R?
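One commonly used approach to all three conditions is a Gaussian copula (also called the NORTA method): draw multivariate normal data, push each column through the standard normal CDF so it becomes uniform on (0, 1), then rescale each column to its range. The CDF step shrinks correlations slightly, so the target Pearson correlation r is first converted to a normal correlation rho = 2 sin(pi r / 6), which inverts the known relation r = (6/pi) arcsin(rho / 2) for this transform. This is a minimal sketch in Python (the language of the answer further down), not an R package; the function name `correlated_uniform_copula` is hypothetical, and it assumes the adjusted matrix is positive definite. For some valid targets it is not, and `np.linalg.cholesky` then raises `LinAlgError`.

```python
import math
import numpy as np

def correlated_uniform_copula(target_corr, ranges, n, seed=None):
    """Sketch of a Gaussian copula: correlated normals -> uniforms -> ranges."""
    rng = np.random.default_rng(seed)
    r = np.asarray(target_corr, dtype=float)
    # Normal correlation that yields Pearson correlation r after the CDF transform
    rho = 2.0 * np.sin(np.pi * r / 6.0)
    np.fill_diagonal(rho, 1.0)
    # Raises np.linalg.LinAlgError if rho is not positive definite
    L = np.linalg.cholesky(rho)
    z = rng.standard_normal((n, len(ranges))) @ L.T
    # Standard normal CDF, applied elementwise; each column becomes Uniform(0, 1)
    norm_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))
    u = norm_cdf(z)
    lo = np.array([a for a, _ in ranges], dtype=float)
    hi = np.array([b for _, b in ranges], dtype=float)
    # Positive affine rescaling onto [lo, hi] keeps uniformity and correlation
    return lo + (hi - lo) * u

R = [[1.0, 0.5, -0.3], [0.5, 1.0, 0.2], [-0.3, 0.2, 1.0]]
x = correlated_uniform_copula(R, [(1, 100), (50, 100), (1, 10)], 20_000, seed=1)
print(np.corrcoef(x, rowvar=False).round(2))
```

Each column is uniform on its range by construction; the sample correlation matrix should be close to the target for large n, with sampling noise shrinking roughly like 1/sqrt(n).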


EDIT1:

In case uniformity (condition 3) cannot be met, is there an R package or function that can generate data meeting just conditions 1 and 2? In other words, I don't care what distribution the variables take.
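If only conditions 1 and 2 matter, one simple option (a sketch, not a package recommendation) is to draw multivariate normal data with the target correlation and min-max rescale each column onto its range. A positive affine map a + b*x with b > 0 does not change Pearson correlation, so the rescaled data keeps exactly the sample correlation of the normal draw, while each column's minimum and maximum land on the range endpoints. The function name `correlated_in_ranges` is hypothetical, and the sketch assumes the target matrix is positive definite.

```python
import numpy as np

def correlated_in_ranges(target_corr, ranges, n, seed=None):
    """Sketch: correlated normals, min-max rescaled so each column spans its range."""
    rng = np.random.default_rng(seed)
    # Cholesky factor of the target; requires a positive definite matrix
    L = np.linalg.cholesky(np.asarray(target_corr, dtype=float))
    z = rng.standard_normal((n, len(ranges))) @ L.T
    zmin, zmax = z.min(axis=0), z.max(axis=0)
    lo = np.array([a for a, _ in ranges], dtype=float)
    hi = np.array([b for _, b in ranges], dtype=float)
    # Positive affine rescaling per column: correlation is unchanged
    return lo + (hi - lo) * (z - zmin) / (zmax - zmin)

R = [[1.0, 0.5, -0.3], [0.5, 1.0, 0.2], [-0.3, 0.2, 1.0]]
x = correlated_in_ranges(R, [(1, 100), (50, 100), (1, 10)], 10_000, seed=1)
print(np.corrcoef(x, rowvar=False).round(2))
print(x.min(axis=0), x.max(axis=0))
```

The marginals here are roughly bell-shaped rather than uniform, which is the trade-off for dropping condition 3.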


EDIT2:

Here is my first very terrible attempt at this problem. All it does so far is create positively correlated and uniform data. Tests are at the bottom:

generate_correlated_variables <- function(variable_ranges, numPoints = 100, nbins = 10) {

  df <- matrix(0, nrow = numPoints, ncol = length(variable_ranges))
  colnames(df) <- names(variable_ranges)

  # Sample the first variable and count how many points fall in each bin.
  # These counts are reused to sample every variable, bin by bin.
  lo1 <- variable_ranges[[1]][1]
  hi1 <- variable_ranges[[1]][2]
  first_var <- runif(numPoints, min = lo1, max = hi1)
  # length.out avoids the floating-point edge cases of seq(..., by = bin_width)
  breaks_first <- seq(lo1, hi1, length.out = nbins + 1)
  bin_counts <- table(cut(first_var, breaks = breaks_first, include.lowest = TRUE))

  for (i in seq_along(variable_ranges)) {

    # Equal-width bin edges for variable i
    breaks_i <- seq(variable_ranges[[i]][1], variable_ranges[[i]][2], length.out = nbins + 1)

    # Draw bin_counts[[j]] points uniformly inside bin j. Every column uses the
    # same counts in the same bin order, so row r lands in the same bin in every
    # column; that shared bin index is what makes the columns positively correlated.
    sampled_values_vec <- c()
    for (j in seq_len(nbins)) {
      draws <- runif(n = bin_counts[[j]], min = breaks_i[j], max = breaks_i[j + 1])
      sampled_values_vec <- c(sampled_values_vec, draws)
    }

    df[, i] <- sampled_values_vec
  }

  return(df)
}
  

#Tests
variable_ranges = list(A = c(1, 100), B = c(50, 100), C = c(1, 10))

a <- generate_correlated_variables(variable_ranges = variable_ranges, numPoints = 100, nbins = 2)
cor(a)

b <- generate_correlated_variables(variable_ranges = variable_ranges, numPoints = 100, nbins = 50)
cor(b)
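The attempt above can be analysed directly. Because every column reuses the same bin counts in the same order, row r falls in the same bin in every column, and only its position inside the bin is drawn independently. Each column is therefore a shared bin index plus independent uniform noise, and for equal-width bins the population correlation between any two columns works out to 1 - 1/nbins^2: 0 for one bin, 0.75 for two, about 0.9996 for fifty, and never negative. So the tests with nbins = 2 and nbins = 50 should give roughly 0.75 and 0.9996, and `nbins` alone cannot hit an arbitrary target. Below is a minimal Python port to check this numerically; `binned_correlated` is a hypothetical name, and the port draws bin indices directly instead of taking counts from a first column, which gives the same distribution.

```python
import numpy as np

def binned_correlated(ranges, n, nbins, seed=None):
    """Python port (sketch) of the binning idea: shared bin index, independent offset."""
    rng = np.random.default_rng(seed)
    # One bin index per row, shared by every column; sorting mirrors the R code,
    # which fills bin 1 first, then bin 2, and so on
    b = np.sort(rng.integers(0, nbins, size=n))
    cols = []
    for lo, hi in ranges:
        # Uniform position inside the shared bin, drawn independently per column
        u = (b + rng.random(n)) / nbins
        cols.append(lo + (hi - lo) * u)
    return np.column_stack(cols)

ranges = [(1, 100), (50, 100), (1, 10)]
for k in (1, 2, 5, 50):
    x = binned_correlated(ranges, 100_000, k, seed=0)
    print(k, round(np.corrcoef(x, rowvar=False)[0, 1], 4), 1 - 1 / k**2)
```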

Here is an idea for how to get correlated uniform random numbers.

Suppose you have a source of independent random bits:

  1. First, generate an array of X random bits (say, 2 bits).

  2. Then generate another random array and replace its upper (or middle, lower, or any fixed position) bits with the bits from step 1.

  3. Generate a third random array and replace the same bit positions with the bits from step 1.

The arrays from steps 2 and 3 are each still uniform, but they are correlated because they share the bits from step 1.
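Why the arrays stay uniform, and how strongly they end up correlated, can be worked out by modelling the shared top s bits as a uniform integer B and the remaining bits of each array as independent uniform noise. The noise is treated as continuous, which is a very good approximation when 30 or so bits remain. The notation (s, k, B, U, V) is introduced here, not taken from the answer:

```latex
\begin{align*}
k &= 2^{s}, \qquad B \sim \mathrm{Unif}\{0,\dots,k-1\}, \qquad U, V \sim \mathrm{Unif}[0,1) \text{ i.i.d., independent of } B \\
X &= \frac{B+U}{k}, \qquad Y = \frac{B+V}{k} \qquad (\text{each } \mathrm{Unif}[0,1)) \\
\operatorname{Var}(B) &= \frac{k^{2}-1}{12}, \qquad \operatorname{Var}(U) = \operatorname{Var}(V) = \frac{1}{12} \\
\operatorname{Cov}(X,Y) &= \frac{\operatorname{Var}(B)}{k^{2}}, \qquad \operatorname{Var}(X) = \operatorname{Var}(Y) = \frac{\operatorname{Var}(B) + \operatorname{Var}(U)}{k^{2}} = \frac{1}{12} \\
\rho(X,Y) &= \frac{\operatorname{Var}(B)}{\operatorname{Var}(B) + \operatorname{Var}(U)} = \frac{k^{2}-1}{k^{2}} = 1 - 4^{-s}
\end{align*}
```

With s = 1 (one shared top bit, as with the mask 2^31 - 1 in the code) this gives rho = 0.75, and with s = 2 it gives 0.9375. Sharing bits therefore only reaches a discrete set of positive correlations, and every pair of arrays built from the same shared bits gets the same correlation.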

Code for illustration (sorry, Python)

import numpy as np

N = 1000000

rng = np.random.default_rng()

# Mask that keeps the lower 31 bits (2^31 - 1)
m = np.uint32(2**31 - 1)

# Shared bits: a random 32-bit value with everything except the top bit (bit 31) cleared
f = rng.integers(low=0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
f = f - np.bitwise_and(f, m)  # keeps only the upper bit

q = rng.integers(low=0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)
z = rng.integers(low=0, high=4294967295, size=N, dtype=np.uint32, endpoint=True)

print("Uncorrelated")
print(np.corrcoef([q, z]))

# Replace the top bit of q and z with the shared bit from f; the lower 31 bits stay independent
q = f + np.bitwise_and(m, q)
z = f + np.bitwise_and(m, z)

print("Correlated")
print(np.corrcoef([q, z]))
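The code above produces correlated 32-bit integers but does not address the question's ranges (condition 2) or how to choose the strength of the correlation. A minimal sketch, assuming the top `shared_bits` bits are shared by all variables and each 32-bit value is mapped to [0, 1) by dividing by 2^32 before rescaling to its range; `shared_bit_uniforms` and `shared_bits` are hypothetical names. Under these assumptions the pairwise correlation is about 1 - 4^(-shared_bits).

```python
import numpy as np

def shared_bit_uniforms(n, ranges, shared_bits, seed=None):
    """Sketch: uniform columns sharing their top `shared_bits` bits, rescaled to ranges."""
    if not 0 <= shared_bits <= 32:
        raise ValueError("shared_bits must be between 0 and 32")
    rng = np.random.default_rng(seed)
    # Lower (32 - shared_bits) bits: drawn independently for every column
    low_mask = np.uint32((1 << (32 - shared_bits)) - 1)
    # Upper shared_bits bits: drawn once and shared by every column
    top = rng.integers(0, 2**32 - 1, size=n, dtype=np.uint32, endpoint=True) & ~low_mask
    cols = []
    for lo, hi in ranges:
        own = rng.integers(0, 2**32 - 1, size=n, dtype=np.uint32, endpoint=True) & low_mask
        # Combine and map the 32-bit integer to [0, 1), then onto [lo, hi)
        u = (top | own).astype(np.float64) / 2.0**32
        cols.append(lo + (hi - lo) * u)
    return np.column_stack(cols)

x = shared_bit_uniforms(200_000, [(1, 100), (50, 100), (1, 10)], shared_bits=1, seed=0)
print(np.corrcoef(x, rowvar=False).round(3))
```

Every pair of columns still gets the same correlation, so on its own this cannot reproduce a general correlation matrix.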
