Avoiding use of `i` for indexing when using multiple python lists

The following came up whilst generating some synthetic data.

Example code:

import numpy as np
import pandas as pd


def example(
    *,
    n_vars=4,
    n=10,
    flag_1=True,
    flag_2=True,
    flag_3=True,
):
    data = {}
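    # One scale and one loc per variable, so the lists have length n_vars
    # (length n would raise an IndexError below whenever n_vars > n).
    if flag_1:
        scales = [2 for _ in range(n_vars)]
    else:
        scales = [np.random.randint(0, 10) for _ in range(n_vars)]

    if flag_2:
        locs = [2 for _ in range(n_vars)]
    else:
        locs = [np.random.randint(0, 10) for _ in range(n_vars)]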

    for i in range(n_vars):
        data[f"var_{i}"] = np.random.normal(loc=locs[i], scale=scales[i], size=n)
    return pd.DataFrame(data)

What I'm not sure about is how to draw values from several lists while also keeping a counter. Indexing with locs[i] inside the loop feels quite unnatural (in Python, at least). enumerate(some_list) on its own doesn't work here because I have multiple lists, and zip(list_1, list_2) on its own doesn't give me the counter.

Something like enumerate(zip(list_1, list_2)) would provide a counter and a tuple on each iteration, though with three or more lists it feels as though that would become unwieldy.
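For reference, zip accepts any number of iterables, and the tuple it yields can be unpacked directly in the loop header, so the enumerate(zip(...)) pattern does extend to three lists. A minimal sketch, where the list names and values (including the third list, sizes) are made up for illustration:

```python
locs = [0.0, 1.0, 2.0]
scales = [1.0, 2.0, 3.0]
sizes = [5, 5, 5]  # hypothetical third list, not part of the original code

labels = []
# enumerate supplies the counter; the parentheses unpack the zipped tuple
for i, (loc, scale, size) in enumerate(zip(locs, scales, sizes)):
    labels.append(f"var_{i}: loc={loc}, scale={scale}, size={size}")

print(labels[0])
```

Note that zip stops at the shortest input, so lists of unequal length are truncated silently.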

Here's another approach:

import numpy as np
import pandas as pd


def example_2(
    *,
    n_vars=4,
    n=6,
    flag_loc=True,
    flag_scale=True,
):
    data = {}
    np.random.seed(1)
    # build array for loc/scale
    # (this array is overwritten by the DataFrame below; the draw still
    # advances the seeded RNG)
    random_data = np.random.randint(0, 10, size=(n_vars, 2))
    random_data[:, 0] = random_data[:, 0] * int(flag_loc) + int(not flag_loc) * 2
    random_data[:, 1] = random_data[:, 1] * int(flag_scale) + int(not flag_scale) * 2
    random_data = pd.DataFrame(
        {
            "locs": np.random.randint(0, 10, size=(n_vars))
            if flag_loc
            else [2] * n_vars,
            "scales": np.random.randint(0, 10, size=(n_vars))
            if flag_scale
            else [2] * n_vars,
        }
    )
    for i, r in random_data.iterrows():
        data[f"x{i}"] = np.random.normal(loc=r["locs"], scale=r["scales"], size=n)
    return pd.DataFrame(data)


which returns

          x0         x1        x2        x3
0   5.559480  11.236594  5.726233  4.749504
1  10.261197   9.111226  8.827740  3.053234
2   9.386170   9.753313 -1.567655  3.090958
3   5.465608   9.752270  8.942386  3.829324
4   9.626370   8.671618 -0.524433  7.006377
5  10.674446   8.830913 -1.629373  6.321282

This is perhaps a bit nicer, but it still feels lacking.
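If the parameters do live in a DataFrame, itertuples() is one possible alternative to iterrows(): each row comes back as a named tuple, so the counter (row.Index) and the column values are available as attributes. A rough sketch, assuming a small hand-written parameter table (the values and the name params are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical parameter table, one row per generated variable
params = pd.DataFrame({"locs": [0.0, 5.0, 10.0], "scales": [1.0, 2.0, 3.0]})

rng = np.random.default_rng(1)
n = 6
data = {}
for row in params.itertuples():
    # row.Index is the DataFrame index; row.locs / row.scales are the columns
    data[f"x{row.Index}"] = rng.normal(loc=row.locs, scale=row.scales, size=n)

df = pd.DataFrame(data)
print(df.shape)
```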

I appreciate that there might be some subjectivity, but I do feel there's a better approach to this in Python than what I've written. I'm happy for a solution to use base Python or to include numpy / pandas.

Solution

You could use zip() to loop through several lists at once; passing range(n_vars) as the first iterable supplies the counter. The if/else blocks can also be folded into the list comprehensions:

import numpy as np
import pandas as pd

def example(*, n_vars=4, n=10, flag_1=True, flag_2=True, flag_3=True):
    # One scale and one loc per variable (flag_3 is unused, as in the question)
    scales = [2 if flag_1 else np.random.randint(0, 10) for _ in range(n_vars)]
    locs = [2 if flag_2 else np.random.randint(0, 10) for _ in range(n_vars)]

    # range(n_vars) supplies the counter; zip pairs it with each loc and scale
    data = {
        f"var_{i}": np.random.normal(loc=loc, scale=scale, size=n)
        for i, loc, scale in zip(range(n_vars), locs, scales)
    }

    return pd.DataFrame(data)

print(example())
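Since numpy is acceptable, another option is to drop the loop altogether and let broadcasting do the pairing. This is only a sketch, not part of the answer above: the function name example_vectorized and the parameters fixed_scale, fixed_loc and seed are made up for illustration, and it uses the Generator API (np.random.default_rng) rather than the legacy np.random functions.

```python
import numpy as np
import pandas as pd


def example_vectorized(*, n_vars=4, n=10, fixed_scale=True, fixed_loc=True, seed=None):
    rng = np.random.default_rng(seed)

    # One scale and one loc per variable: either the constant 2 or random integers in [0, 10)
    scales = np.full(n_vars, 2) if fixed_scale else rng.integers(0, 10, size=n_vars)
    locs = np.full(n_vars, 2) if fixed_loc else rng.integers(0, 10, size=n_vars)

    # loc and scale have shape (n_vars,) and broadcast across the n rows,
    # so column j is drawn with locs[j] and scales[j]
    values = rng.normal(loc=locs, scale=scales, size=(n, n_vars))

    return pd.DataFrame(values, columns=[f"var_{i}" for i in range(n_vars)])


print(example_vectorized(seed=0).shape)
```

The only counter left is the one that builds the column names, so no list is indexed by i.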

