I have an 80 GB h5 file, and I want to read just a random set of, say, 1000 columns, assuming I do not know the column names. How can I achieve this?
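You first need to know the number of columns in your file. Let's assume 10000 here.
You can then combine numpy.random with the columns option of pandas.read_hdf.
Note that columns expects column labels, so the call works as written only if the columns are labeled with the integers 0 to 9999 (the pandas default), and only if the data was saved in table format: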
pd.read_hdf('file', columns=sorted(np.random.choice(range(10000), size=1000, replace=False)))
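If the columns are not labeled 0 to 9999, one possible approach is to read a single row first to discover the real labels, and then sample from those. The following is only a minimal sketch under stated assumptions: the data was saved in table format, and the names pick_random_columns, read_random_columns, path, key and seed are hypothetical names chosen for illustration, not part of pandas.

```python
import random


def pick_random_columns(column_names, k, seed=None):
    """Return k distinct column labels, kept in their original file order."""
    rng = random.Random(seed)
    # Sample positions rather than labels so the original order is easy to keep.
    positions = sorted(rng.sample(range(len(column_names)), k))
    return [column_names[i] for i in positions]


def read_random_columns(path, key, k, seed=None):
    """Read k randomly chosen columns from a table-format HDF5 dataset.

    Assumption: the object stored under `key` was written with format='table';
    pandas cannot select columns from a fixed-format store.
    """
    import pandas as pd  # imported here; only this function needs pandas and PyTables

    # Read one row only, just to learn the actual column labels.
    header = pd.read_hdf(path, key=key, start=0, stop=1)
    chosen = pick_random_columns(list(header.columns), k, seed=seed)
    return pd.read_hdf(path, key=key, columns=chosen)
```

A call would then look like read_random_columns('file', 'data', 1000), where 'data' is a placeholder for whatever key the frame was stored under.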