The following came up whilst generating some synthetic data.
Example code:
import numpy as np
import pandas as pd  # needed for pd.DataFrame below

def example(
    *,
    n_vars=4,
    n=10,
    flag_1=True,
    flag_2=True,
    flag_3=True,  # not used in this example
):
    data = {}
    # one scale and one loc per variable, so these lists have n_vars entries
    if flag_1:
        scales = [2 for _ in range(n_vars)]
    else:
        scales = [np.random.randint(0, 10) for _ in range(n_vars)]
    if flag_2:
        locs = [2 for _ in range(n_vars)]
    else:
        locs = [np.random.randint(0, 10) for _ in range(n_vars)]
    for i in range(n_vars):
        data[f"var_{i}"] = np.random.normal(loc=locs[i], scale=scales[i], size=n)
    return pd.DataFrame(data)
What I'm not sure about is how best to use values from multiple lists while also having a counter. Indexing with locs[i] inside the loop feels quite unnatural (in Python at least). But enumerate(some_list) wouldn't work here, as I have multiple lists, and zip(list1, list2) on its own doesn't give me the counter.
Something like enumerate(zip(list_1, list_2)) could provide a counter and a tuple on each iteration, though with three lists that feels as though it would become unwieldy as well.
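For reference, a minimal sketch of what the enumerate(zip(...)) form looks like with three lists. The list contents and the `describe` helper are made up purely for illustration; the point is only that the tuple can be unpacked in the loop header, so the pattern does not change shape as more lists are added.

```python
# Hypothetical inputs, one entry per variable.
locs = [0, 1, 2]
scales = [1, 2, 3]
names = ["a", "b", "c"]  # an illustrative third list


def describe(locs, scales, names):
    """Return one formatted line per position, using a counter and three lists."""
    rows = []
    # enumerate supplies the counter; the inner tuple from zip is unpacked in place
    for i, (loc, scale, name) in enumerate(zip(locs, scales, names)):
        rows.append(f"{i}: {name} loc={loc} scale={scale}")
    return rows


rows = describe(locs, scales, names)
print("\n".join(rows))
```

Note that zip stops at the shortest input, so lists of unequal length are truncated silently.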
Here's another approach:
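import numpy as np
import pandas as pd  # needed for pd.DataFrame below

def example_2(
    *,
    n_vars=4,
    n=6,
    flag_loc=True,
    flag_scale=True,
):
    data = {}
    np.random.seed(1)
    # build array for loc/scale
    # note: this array is overwritten by the DataFrame below; the randint call
    # still advances the random state, so removing it changes the later draws
    random_data = np.random.randint(0, 10, size=(n_vars, 2))
    random_data[:, 0] = random_data[:, 0] * int(flag_loc) + int(not flag_loc) * 2
    random_data[:, 1] = random_data[:, 1] * int(flag_scale) + int(not flag_scale) * 2
    # one row per variable: random loc/scale if the flag is set, otherwise a constant 2
    random_data = pd.DataFrame(
        {
            "locs": np.random.randint(0, 10, size=(n_vars))
            if flag_loc
            else [2] * n_vars,
            "scales": np.random.randint(0, 10, size=(n_vars))
            if flag_scale
            else [2] * n_vars,
        }
    )
    # iterrows yields (index, row), so the index doubles as the counter
    for i, r in random_data.iterrows():
        data[f"x{i}"] = np.random.normal(loc=r["locs"], scale=r["scales"], size=n)
    return pd.DataFrame(data)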
which returns
          x0         x1         x2        x3
0   5.559480  11.236594   5.726233  4.749504
1  10.261197   9.111226   8.827740  3.053234
2   9.386170   9.753313  -1.567655  3.090958
3   5.465608   9.752270   8.942386  3.829324
4   9.626370   8.671618  -0.524433  7.006377
5  10.674446   8.830913  -1.629373  6.321282
Which is a bit nicer perhaps, but still feels like it's lacking.
I appreciate that this is somewhat subjective, but I do feel as though there's a better approach to this in Python than what I've written. I'm happy for a solution to use base Python, or to include numpy / pandas.
You could use zip() to loop through multiple lists at once, with range(n_vars) as one of the zipped iterables to supply the counter, and lean more heavily on comprehensions:
import numpy as np
import pandas as pd

def example(*, n_vars=4, n=10, flag_1=True, flag_2=True, flag_3=True):
    # one scale and one loc per variable (flag_3 is unused here)
    scales = [2 if flag_1 else np.random.randint(0, 10) for _ in range(n_vars)]
    locs = [2 if flag_2 else np.random.randint(0, 10) for _ in range(n_vars)]
    # zipping with range(n_vars) gives the counter alongside each loc/scale pair
    data = {
        f"var_{i}": np.random.normal(loc=loc, scale=scale, size=n)
        for i, loc, scale in zip(range(n_vars), locs, scales)
    }
    return pd.DataFrame(data)

print(example())
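If the counter is only needed for the column names, the loop can arguably be dropped altogether. The following is a sketch, not a drop-in replacement: it assumes NumPy's Generator API (np.random.default_rng), and the names `example_vectorized`, `fixed_loc`, `fixed_scale` and `seed` are invented for illustration. It relies on loc and scale arrays of shape (n_vars,) broadcasting against size=(n, n_vars), so each column gets its own loc and scale.

```python
import numpy as np
import pandas as pd


def example_vectorized(*, n_vars=4, n=10, fixed_loc=True, fixed_scale=True, seed=None):
    """Draw an (n, n_vars) table of normal samples, one loc/scale per column."""
    rng = np.random.default_rng(seed)
    # one loc and one scale per variable: constant 2, or random integers in [0, 10)
    locs = np.full(n_vars, 2) if fixed_loc else rng.integers(0, 10, size=n_vars)
    scales = np.full(n_vars, 2) if fixed_scale else rng.integers(0, 10, size=n_vars)
    # locs and scales have shape (n_vars,), so they broadcast across the n rows
    values = rng.normal(loc=locs, scale=scales, size=(n, n_vars))
    return pd.DataFrame(values, columns=[f"var_{i}" for i in range(n_vars)])


df = example_vectorized(seed=0)
print(df.shape)
```

Because it uses a different random generator and draws all values in one call, the numbers will not match those from the loop-based versions even with the same seed.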