
Read in n number of random columns in pandas

I have an 80 GB h5 file and want to read just a random set of, say, 1000 columns. Assume I do not know the column names. How can I achieve this?

You should first know the number of columns in your file. Let's assume 10000 here.

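You can then combine numpy.random with the columns option of pandas.read_hdf. Note that columns selects by label rather than by position and is only supported for files written in table format, so the line below works as written only if the column labels are the integers 0 to 9999: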

# Assumes `import numpy as np`, `import pandas as pd`, and integer column labels 0..9999.
pd.read_hdf('file', columns=sorted(np.random.choice(10000, size=1000, replace=False)))
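If the column labels are not plain integers, one possible approach is to read a single row first to discover the labels, then sample from them. The following is only a sketch under assumptions: the file was written with format="table", the helper names pick_random_columns and read_random_columns are hypothetical, and the key argument stands for whatever dataset key your file uses.

```python
import numpy as np
import pandas as pd


def pick_random_columns(column_names, n, seed=None):
    """Return n distinct names drawn at random, kept in their original order."""
    rng = np.random.default_rng(seed)
    # Sample positions without replacement, then sort to preserve file order.
    positions = np.sort(rng.choice(len(column_names), size=n, replace=False))
    return [column_names[i] for i in positions]


def read_random_columns(path, key, n, seed=None):
    """Read n random columns from a table-format HDF5 dataset (hypothetical helper)."""
    # Read one row only, to discover the column labels without loading the data.
    header = pd.read_hdf(path, key=key, start=0, stop=1)
    chosen = pick_random_columns(list(header.columns), n, seed=seed)
    # `columns` expects labels and is only supported for format="table".
    return pd.read_hdf(path, key=key, columns=chosen)
```

A call might look like read_random_columns('file', 'df', 1000), where 'df' is a placeholder key. Passing a seed makes the selection reproducible.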
