How to select a random series of consecutive rows from a pandas DataFrame?

Question

My data is:

         dOpen     dHigh      dLow    dClose   dVolume  day_of_week_0  day_of_week_1  ...  month_6  month_7  month_8  month_9  month_10  month_11  month_12
0     0.000000  0.000000  0.000000  0.000000  0.000000              0              0  ...        0        0        0        0         0         0         0
1     0.000000  0.006397  0.005000  0.007112  0.007111              1              0  ...        0        0        0        0         0         0         0
2     0.005686  0.002825  0.003554  0.002119  0.002119              0              1  ...        0        0        0        0         0         0         0
3     0.004240  0.010563  0.005666  0.010571  0.010571              0              0  ...        0        0        0        0         0         0         0
4     0.012667  0.005575  0.002113  0.004184  0.004184              0              0  ...        0        0        0        0         0         0         0
...        ...       ...       ...       ...       ...            ...            ...  ...      ...      ...      ...      ...       ...       ...       ...
6787 -0.002750  0.001527  0.002214  0.006877  0.006877              1              0  ...        0        0        0        0         0         0         0
6788  0.003309  0.002012  0.002823 -0.001525 -0.001525              0              1  ...        0        0        0        0         0         0         0
6789 -0.000366  0.001217  0.001285  0.002260  0.002260              0              0  ...        0        0        0        0         0         0         0
6790  0.007179  0.005775  0.006692  0.008318  0.008318              0              0  ...        0        0        0        0         0         0         0
6791  0.006066  0.003808  0.004249  0.003113  0.003113              0              0  ...        0        0        0        0         0         0         0

I want to select 5 consecutive rows at random. I've tried .sample, but that just returns n random rows that aren't consecutive.

Answer 1

Here's one approach using random.randint:

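import random

nrows = range(df.shape[0])
# randint includes both endpoints, so the largest start is len(df) - 5,
# which still leaves room for a full 5-row slice
ix = random.randint(nrows.start, nrows.stop - 5)
df.iloc[ix:ix+5, :]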

 dOpen     dHigh      dLow    dClose   dVolume  day_of_week_0  \
4      4  0.012667  0.005575  0.002113  0.004184       0.004184   
5   6787 -0.002750  0.001527  0.002214  0.006877       0.006877   
6   6788  0.003309  0.002012  0.002823 -0.001525      -0.001525   
7   6789 -0.000366  0.001217  0.001285  0.002260       0.002260   
8   6790  0.007179  0.005775  0.006692  0.008318       0.008318   
9   6791  0.006066  0.003808  0.004249  0.003113       0.003113   

   day_of_week_1  ...  month_6  month_7  month_8  month_9  month_10  month_11  \
4              0    0        0        0        0        0         0         0   
5              1    0        0        0        0        0         0         0   
6              0    1        0        0        0        0         0         0   
7              0    0        0        0        0        0         0         0   
8              0    0        0        0        0        0         0         0   
9              0    0        0        0        0        0         0         0   

   month_12  
4         0  
5         0  
6         0  
7         0  
8         0  
9         0
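For a fully self-contained version of the same idea, the sketch below wraps it in a function. The function name random_consecutive_rows, the toy DataFrame, and the use of NumPy's default_rng instead of the random module are illustrative assumptions, not part of the answer above. Generator.integers excludes its upper bound, so passing len(df) - n + 1 makes every valid starting position equally likely.

```python
import numpy as np
import pandas as pd


def random_consecutive_rows(df: pd.DataFrame, n: int = 5, rng=None) -> pd.DataFrame:
    """Return n consecutive rows of df, starting at a uniformly random position."""
    if n > len(df):
        raise ValueError("n is larger than the number of rows")
    rng = np.random.default_rng() if rng is None else rng
    # integers() excludes the upper bound, so the largest possible start is len(df) - n
    start = int(rng.integers(0, len(df) - n + 1))
    # iloc is position-based, so this works whatever the index labels are
    return df.iloc[start:start + n]


# Hypothetical stand-in for the question's data
df = pd.DataFrame({"dOpen": np.linspace(0.0, 0.1, 20), "dClose": np.linspace(0.1, 0.0, 20)})
window = random_consecutive_rows(df, 5, rng=np.random.default_rng(0))
print(window)
```

Passing a seeded Generator makes the choice reproducible; omitting rng gives a fresh random window on each call.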

Answer 2

Choose a random row n, then take the five rows starting at it (positions n through n+4):

import random

rows_in_dataframe = len(dataframe)
n = random.randint(0, rows_in_dataframe - 5)  # randint is inclusive on both ends
five_random_consecutive_rows = dataframe.iloc[n:n+5]
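To check the bounds used in this answer, the sketch below draws many start positions on a hypothetical 10-row frame (the frame and the seed are assumptions for illustration). Because random.randint includes both endpoints, n ranges over 0 to rows_in_dataframe - 5, so every draw yields exactly five rows and each of the six valid windows can occur.

```python
import random

import pandas as pd

dataframe = pd.DataFrame({"x": range(10)})  # hypothetical 10-row frame
rows_in_dataframe = len(dataframe)

random.seed(42)
starts = set()
for _ in range(1000):
    n = random.randint(0, rows_in_dataframe - 5)  # inclusive on both ends
    window = dataframe.iloc[n:n + 5]
    assert len(window) == 5  # the slice never runs past the end
    starts.add(n)

print(sorted(starts))
```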

Answer 3

You can also make a random choice from df.index, get that label's position with get_loc, and slice with df.iloc[]:

import numpy as np
s = np.random.choice(df.index[:-4], 1)  # any label that can start a full 5-row window
pos = df.index.get_loc(s[0])
df.iloc[pos:pos+5]
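The get_loc step is what lets this approach work when the index is not a plain 0..n-1 range. The sketch below assumes a hypothetical frame indexed by date strings; slicing df.index[:-4] keeps every label at positions 0 through len(df) - 5, i.e. every label that can start a full 5-row window, and the sampled label is translated back to a position before slicing with iloc.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are date strings, not 0..n-1
idx = [f"2020-01-{day:02d}" for day in range(1, 31)]
df = pd.DataFrame({"dClose": np.arange(30, dtype=float)}, index=idx)

s = np.random.choice(df.index[:-4], 1)  # a label that can start a full 5-row window
pos = df.index.get_loc(s[0])            # label -> integer position
window = df.iloc[pos:pos + 5]
print(window)
```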

Answer 4

Why not just take a single sample and then get the N consecutive rows after it?

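random_position = df.sample(1).index[0]  # a single label; assumes the default RangeIndex (label == position)
no_consecutives = 5
len_df = len(df)
# see whether adding the consecutive rows would run past the end of df
# (clamping makes the last window more likely than the others)
if random_position + no_consecutives > len_df:
    random_position = len_df - no_consecutives

# .loc slicing includes its end label, hence the - 1
df_random = df.loc[random_position:random_position + no_consecutives - 1]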
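When slicing by label as this answer does, note that .loc includes its end label, whereas .iloc (like ordinary Python slicing) excludes its end position, so df.loc[start:start + 5] returns six rows rather than five. A quick check on a hypothetical 10-row frame with the default RangeIndex:

```python
import pandas as pd

df = pd.DataFrame({"x": range(10)})  # default RangeIndex: labels 0..9

by_label = df.loc[3:3 + 5]      # .loc includes label 8 -> 6 rows
by_position = df.iloc[3:3 + 5]  # .iloc stops before position 8 -> 5 rows
fixed = df.loc[3:3 + 5 - 1]     # subtract 1 to get 5 rows with .loc

print(len(by_label), len(by_position), len(fixed))  # 6 5 5
```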

4 answers

Answer 1: 5 votes, 2020-01-22 15:08:08
Answer 2: 2 votes, accepted, 2020-01-22 15:08:17
Answer 3: 2 votes, 2020-01-22 15:13:48
Answer 4: 1 vote, 2020-01-22 15:13:26
