
Get indexes for Subsample of list of lists

I have several lists of data in python:

a = [2,45,1,3]
b = [4,6,3,6,7,1,37,48,19]
c = [45,122]
total = [a,b,c]

I want to get n random indexes from them:

n = 7
# some code
result = [[1,3], [2,6,8], [0,1]] # or
result = [[0], [0,2,6,8], [0,1]] # or
result = [[0,1], [0,2,3,6,8], []] # or any other

The idea: pick elements (or rather, their indexes) randomly from any of the lists, but the total count of picked indexes must be n.

So my idea is to generate random flat indexes first:

import random

n = 7
total_len = sum(len(el) for el in total)
inds = random.sample(range(total_len), n)

But how do I then map these flat indexes back to per-list indexes? I thought about np.cumsum() and shifting the indexes afterwards, but I can't find an elegant solution...
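A sketch of the cumsum idea mentioned above (assuming NumPy is available): compute the offset at which each sub-list starts in the flattened view, then use np.searchsorted to find which sub-list each flat index falls into, and subtract that sub-list's offset to get the local index.

```python
import random
import numpy as np

a = [2, 45, 1, 3]
b = [4, 6, 3, 6, 7, 1, 37, 48, 19]
c = [45, 122]
total = [a, b, c]

n = 7
total_len = sum(len(el) for el in total)
inds = sorted(random.sample(range(total_len), n))

# Flat offset at which each sub-list starts: [0, 4, 13] for the lists above.
offsets = np.cumsum([0] + [len(el) for el in total[:-1]])

# For each flat index, find the sub-list it belongs to, then shift
# by that sub-list's offset to get the index within the sub-list.
which = np.searchsorted(offsets, inds, side='right') - 1
result = [[] for _ in total]
for flat, w in zip(inds, which):
    result[w].append(flat - offsets[w])
```

Since random.sample draws without replacement from range(total_len), the per-list index lists together always contain exactly n distinct positions.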


PS Actually, I need this for loading data from several csv files using the skiprows option. The idea: get indexes for every file, which lets me load only the necessary rows from each one. So my real task is: I have several csv files of different lengths and need to get n random rows from them in total. My approach:

lengths = my_func_to_get_lengths_for_every_csv(paths) # list of lengths
# generate a random subsample of indexes
skip = ...
for ind, fil in enumerate(files):
    pd.read_csv(fil, skiprows=skip[ind])
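A runnable sketch of that workflow (the file names, lengths, and column are made up for illustration): sample n flat row indexes, split them into per-file row lists, then invert each list into a skiprows argument. Note that skiprows counts file rows, so with a header the data row i is file row i + 1, and row 0 (the header) must never be skipped.

```python
import os
import random
import tempfile

import pandas as pd

# Hypothetical setup: write three small CSV files to sample from.
lengths = [4, 9, 2]          # data rows per file (excluding header)
tmpdir = tempfile.mkdtemp()
paths = []
for i, ln in enumerate(lengths):
    p = os.path.join(tmpdir, f"file{i}.csv")
    pd.DataFrame({"x": range(ln)}).to_csv(p, index=False)
    paths.append(p)

n = 7
flat = sorted(random.sample(range(sum(lengths)), n))

# Split the flat sample into per-file data-row indexes.
keep, offset = [], 0
for ln in lengths:
    keep.append([i - offset for i in flat if offset <= i < offset + ln])
    offset += ln

# skiprows wants the rows to *drop*: every data row that was not sampled,
# shifted by +1 so the header (file row 0) is always read.
frames = []
for p, ln, rows in zip(paths, lengths, keep):
    skip = [r + 1 for r in range(ln) if r not in rows]
    frames.append(pd.read_csv(p, skiprows=skip))

sample = pd.concat(frames, ignore_index=True)
```

Because the flat indexes are drawn without replacement, the concatenated result always holds exactly n rows, however they happen to be spread across the files.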

You could flatten the list first and then sample positions in the flattened list (sample from the index range, not from the values, since values may repeat):

total_flat = [item for sublist in total for item in sublist]
inds = random.sample(range(len(total_flat)), k=n)

Is this what you mean?

relative_inds = []
min_bound = 0
for lst in total:
    relative_inds.append([i - min_bound for i in inds if min_bound <= i < min_bound + len(lst)])
    min_bound += len(lst)
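Putting the two snippets together end to end (with the sample drawn from the flat index range) gives a quick sanity check that the per-list index groups add up to n:

```python
import random

a = [2, 45, 1, 3]
b = [4, 6, 3, 6, 7, 1, 37, 48, 19]
c = [45, 122]
total = [a, b, c]
n = 7

# Sample n distinct positions in the flattened view.
total_flat = [item for sublist in total for item in sublist]
inds = random.sample(range(len(total_flat)), k=n)

# Re-base each flat index onto the sub-list it falls into.
relative_inds = []
min_bound = 0
for lst in total:
    relative_inds.append(
        [i - min_bound for i in inds if min_bound <= i < min_bound + len(lst)]
    )
    min_bound += len(lst)
```

Every sampled flat index lands in exactly one of the half-open ranges [min_bound, min_bound + len(lst)), so no index is dropped or double-counted.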
