Generating data that follows an exponential distribution

Question

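I want to generate a dataset of 30 entries, for example in the range (50-5000), that follows an increasing (logarithmic) curve, i.e. it rises quickly at the start and then levels off toward the end.

I came across from scipy.stats import expon, but I am not sure how to use the package in my scenario.

Can anyone help?

A possible output would look like [300, 1000, 1500, 1800, 1900, ...].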

Answer 1

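First, generate 30 random x values (uniformly), then take log(x). Ideally, log(x) itself would lie in [50, 5000). That, however, would require e^50 <= x <= e^5000, which overflows. One possible solution is to generate the random x values in [min_x, max_x), take their logarithms, and then scale the results to the desired range [50, 5000).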

import numpy as np

min_y = 50
max_y = 5000
min_x = 1
# any number max_x can be chosen
# this number controls the shape of the logarithm, therefore the final distribution
max_x = 10

# generate (uniformly) and sort 30 random float x in [min_x, max_x)
x = np.sort(np.random.uniform(min_x, max_x, 30))
# get log(x), i.e. values in [log(min_x), log(max_x))
log_x = np.log(x)
# scale log(x) to the new range [min_y, max_y)
y = (max_y - min_y) * ((log_x - np.log(min_x)) / (np.log(max_x) - np.log(min_x))) + min_y
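
The steps above can also be wrapped in a small reusable function. This is only a sketch of the same idea, not part of the original answer: the function name log_curve_samples and the parameters evenly_spaced and seed are hypothetical and introduced purely for illustration. With evenly_spaced=True it swaps the random draw for np.linspace, which makes the output deterministic and makes it hit both ends of the range.

```python
import numpy as np

def log_curve_samples(n=30, min_y=50, max_y=5000, max_x=10.0,
                      evenly_spaced=False, seed=None):
    """Return n sorted values in [min_y, max_y] lying on a logarithmic curve.

    Hypothetical helper: the name and the evenly_spaced/seed parameters
    are assumptions, not part of the original answer.
    """
    min_x = 1.0  # log(1) = 0, so the curve starts at min_y
    if evenly_spaced:
        # deterministic x values, both endpoints included
        x = np.linspace(min_x, max_x, n)
    else:
        # uniform random x values in [min_x, max_x), sorted ascending
        rng = np.random.default_rng(seed)
        x = np.sort(rng.uniform(min_x, max_x, n))
    # map log(x) from [log(min_x), log(max_x)] to [0, 1]
    unit = (np.log(x) - np.log(min_x)) / (np.log(max_x) - np.log(min_x))
    # stretch [0, 1] to [min_y, max_y]
    return min_y + (max_y - min_y) * unit

y_random = log_curve_samples(seed=0)
y_even = log_curve_samples(evenly_spaced=True)
print(np.round(y_even).astype(int))
```

A larger max_x makes the curve bend more sharply, so the values climb faster at the start and flatten earlier. If integers are needed, as in the example output in the question, rounding with np.round(...).astype(int) is one option.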

Answer 1
1 2019-10-30 10:21:52
