Read in n number of random columns in pandas

Question

I have an 80 GB h5 file, and I want to read just a random set of, say, 1000 columns, assuming I do not know the column names. How can I achieve this?

Answer 1

You first need to know the number of columns in your file. Let's assume it is 10000 here.

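You can then use a combination of numpy.random and the columns option of pandas.read_hdf. Note that columns expects column labels and only applies to files written in table format, so the call below works as written only if the columns are labeled with the integers 0 to 9999: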

pd.read_hdf('file', columns=sorted(np.random.choice(range(10000), size=1000, replace=False)))
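If the column labels are not plain integers, one possible approach is to read a single row first to discover the labels, then sample from them. This is only a minimal sketch: it assumes the file was written in table format (so that start, stop and columns are honored), and the helper names sample_labels and read_random_columns are made up for illustration.

```python
import numpy as np
import pandas as pd


def sample_labels(labels, n, seed=None):
    """Pick n distinct labels at random, keeping their original order."""
    rng = np.random.default_rng(seed)
    # Sample positions without replacement, then sort to preserve file order.
    idx = np.sort(rng.choice(len(labels), size=n, replace=False))
    return [labels[i] for i in idx]


def read_random_columns(path, key, n, seed=None):
    """Read n randomly chosen columns from a table-format HDF5 dataset."""
    # Read a single row only, just to discover the column labels.
    header = pd.read_hdf(path, key=key, start=0, stop=1)
    chosen = sample_labels(list(header.columns), n, seed=seed)
    # Second read: pass the sampled labels through the columns option.
    return pd.read_hdf(path, key=key, columns=chosen)
```

It could then be called as read_random_columns('file', 'data', 1000), where 'data' is a placeholder for whatever key the DataFrame is stored under in your file.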
