
How can I select a sequence of random rows from a pandas DataFrame?

My data is:

         dOpen     dHigh      dLow    dClose   dVolume  day_of_week_0  day_of_week_1  ...  month_6  month_7  month_8  month_9  month_10  month_11  month_12
0     0.000000  0.000000  0.000000  0.000000  0.000000              0              0  ...        0        0        0        0         0         0         0
1     0.000000  0.006397  0.005000  0.007112  0.007111              1              0  ...        0        0        0        0         0         0         0
2     0.005686  0.002825  0.003554  0.002119  0.002119              0              1  ...        0        0        0        0         0         0         0
3     0.004240  0.010563  0.005666  0.010571  0.010571              0              0  ...        0        0        0        0         0         0         0
4     0.012667  0.005575  0.002113  0.004184  0.004184              0              0  ...        0        0        0        0         0         0         0
...        ...       ...       ...       ...       ...            ...            ...  ...      ...      ...      ...      ...       ...       ...       ...
6787 -0.002750  0.001527  0.002214  0.006877  0.006877              1              0  ...        0        0        0        0         0         0         0
6788  0.003309  0.002012  0.002823 -0.001525 -0.001525              0              1  ...        0        0        0        0         0         0         0
6789 -0.000366  0.001217  0.001285  0.002260  0.002260              0              0  ...        0        0        0        0         0         0         0
6790  0.007179  0.005775  0.006692  0.008318  0.008318              0              0  ...        0        0        0        0         0         0         0
6791  0.006066  0.003808  0.004249  0.003113  0.003113              0              0  ...        0        0        0        0         0         0         0

I want to select 5 consecutive rows, starting at a random position. I've tried .sample, but that just returns n random rows that aren't consecutive.

Here's one approach using random.randint:

import random

nrows = range(df.shape[0])
ix = random.randint(nrows.start, nrows.stop-5)
df.iloc[ix:ix+5, :]

   dOpen     dHigh      dLow    dClose   dVolume  day_of_week_0  \
4      4  0.012667  0.005575  0.002113  0.004184       0.004184   
5   6787 -0.002750  0.001527  0.002214  0.006877       0.006877   
6   6788  0.003309  0.002012  0.002823 -0.001525      -0.001525   
7   6789 -0.000366  0.001217  0.001285  0.002260       0.002260   
8   6790  0.007179  0.005775  0.006692  0.008318       0.008318   
9   6791  0.006066  0.003808  0.004249  0.003113       0.003113   

   day_of_week_1  ...  month_6  month_7  month_8  month_9  month_10  month_11  \
4              0    0        0        0        0        0         0         0   
5              1    0        0        0        0        0         0         0   
6              0    1        0        0        0        0         0         0   
7              0    0        0        0        0        0         0         0   
8              0    0        0        0        0        0         0         0   
9              0    0        0        0        0        0         0         0   

   month_12  
4         0  
5         0  
6         0  
7         0  
8         0  
9         0  

Choose a random starting row n, then take the five rows from n to n+4 (the slice n:n+5):

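import random

rows_in_dataframe = len(dataframe)
n = random.randint(0, rows_in_dataframe - 5)  # randint includes both endpoints

five_random_consecutive_rows = dataframe.iloc[n:n+5]  # positional slice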
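The same idea can be wrapped in a small helper. This is only a sketch: the function name sample_consecutive, its parameters k and seed, and the toy frame are made up for illustration and do not come from the question. It uses NumPy's Generator.integers, whose upper bound is exclusive, hence the + 1.

```python
import numpy as np
import pandas as pd


def sample_consecutive(df, k=5, seed=None):
    """Return k consecutive rows of df starting at a uniformly random position."""
    if k > len(df):
        raise ValueError("k is larger than the number of rows")
    rng = np.random.default_rng(seed)
    # Generator.integers excludes the upper bound, so valid starts are 0 .. len(df) - k
    start = int(rng.integers(0, len(df) - k + 1))
    return df.iloc[start:start + k]


# Toy frame standing in for the real data (values are made up for illustration)
toy = pd.DataFrame({"dOpen": np.arange(20) / 100.0})
window = sample_consecutive(toy, k=5, seed=0)
print(window)
```

Passing a seed makes the draw repeatable, which can help when debugging; leave it as None for a fresh window on each call.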

You can also use np.random.choice on df.index, then get the position of the chosen label with get_loc, and slice with df.iloc[]:

import numpy as np

s = np.random.choice(df.index[:-4], 1)  # [:-4] keeps every start where 5 rows still fit
pos = df.index.get_loc(s[0])  # assumes unique index labels
df.iloc[pos:pos+5]

Why not just take one sample and then get the N consecutive rows that follow it?

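# positional index of one sampled row (assumes unique index labels)
random_position = df.index.get_loc(df.sample(1).index[0])
no_consecutives = 5
len_df = len(df)
# if the window would run past the end of df, shift it back so it still fits
if random_position + no_consecutives > len_df:
    random_position = len_df - no_consecutives

# iloc slices by position and excludes the end, so this returns exactly 5 rows
df_random = df.iloc[random_position:random_position + no_consecutives]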
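One thing to keep in mind with the clamping step: every sampled position near the end of the frame is moved back to the same final window, so that window is chosen more often than the others. The sketch below enumerates this for a hypothetical 10-row frame (the sizes are assumptions picked only for illustration) by applying the same rule to each possible position.

```python
from collections import Counter

n_rows, k = 10, 5  # hypothetical sizes, chosen only for illustration

# For every row position that could be sampled, apply the same clamping rule:
# a position whose window would overrun the frame is moved back to n_rows - k
starts = Counter(min(pos, n_rows - k) for pos in range(n_rows))
print(starts)
```

If a uniform choice over windows matters, drawing the start only from the first len(df) - 4 positions, as the random.randint approaches do, avoids this.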


 