How can I select a random sequence of consecutive rows from a pandas DataFrame?
My data is:
dOpen dHigh dLow dClose dVolume day_of_week_0 day_of_week_1 ... month_6 month_7 month_8 month_9 month_10 month_11 month_12
0 0.000000 0.000000 0.000000 0.000000 0.000000 0 0 ... 0 0 0 0 0 0 0
1 0.000000 0.006397 0.005000 0.007112 0.007111 1 0 ... 0 0 0 0 0 0 0
2 0.005686 0.002825 0.003554 0.002119 0.002119 0 1 ... 0 0 0 0 0 0 0
3 0.004240 0.010563 0.005666 0.010571 0.010571 0 0 ... 0 0 0 0 0 0 0
4 0.012667 0.005575 0.002113 0.004184 0.004184 0 0 ... 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
6787 -0.002750 0.001527 0.002214 0.006877 0.006877 1 0 ... 0 0 0 0 0 0 0
6788 0.003309 0.002012 0.002823 -0.001525 -0.001525 0 1 ... 0 0 0 0 0 0 0
6789 -0.000366 0.001217 0.001285 0.002260 0.002260 0 0 ... 0 0 0 0 0 0 0
6790 0.007179 0.005775 0.006692 0.008318 0.008318 0 0 ... 0 0 0 0 0 0 0
6791 0.006066 0.003808 0.004249 0.003113 0.003113 0 0 ... 0 0 0 0 0 0 0
I want to select 5 consecutive rows at random. I've tried .sample, but that just returns n random rows that aren't consecutive.
Here's one approach using random.randint:
import random
nrows = range(df.shape[0])
# randint includes both bounds, so the last valid start is len(df) - 5
ix = random.randint(nrows.start, nrows.stop - 5)
df.iloc[ix:ix+5, :]
dOpen dHigh dLow dClose dVolume day_of_week_0 \
4 4 0.012667 0.005575 0.002113 0.004184 0.004184
5 6787 -0.002750 0.001527 0.002214 0.006877 0.006877
6 6788 0.003309 0.002012 0.002823 -0.001525 -0.001525
7 6789 -0.000366 0.001217 0.001285 0.002260 0.002260
8 6790 0.007179 0.005775 0.006692 0.008318 0.008318
9 6791 0.006066 0.003808 0.004249 0.003113 0.003113
day_of_week_1 ... month_6 month_7 month_8 month_9 month_10 month_11 \
4 0 0 0 0 0 0 0 0
5 1 0 0 0 0 0 0 0
6 0 1 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0
month_12
4 0
5 0
6 0
7 0
8 0
9 0
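If you need this more than once, the same idea can be wrapped in a small helper. This is a minimal sketch, assuming NumPy's `default_rng` Generator in place of the `random` module so that a seed makes the pick reproducible; the function name `random_consecutive_rows` and the demo frame are hypothetical.

```python
import numpy as np
import pandas as pd


def random_consecutive_rows(df: pd.DataFrame, n: int, seed=None) -> pd.DataFrame:
    """Return n consecutive rows of df, starting at a uniformly random position."""
    if not 0 < n <= len(df):
        raise ValueError(f"n must be between 1 and {len(df)}, got {n}")
    rng = np.random.default_rng(seed)
    # integers() excludes its upper bound, so starts range from 0 to len(df) - n
    start = int(rng.integers(0, len(df) - n + 1))
    return df.iloc[start:start + n]


# Hypothetical stand-in for the question's data: 20 rows, one column.
demo = pd.DataFrame({"dClose": np.linspace(0.0, 0.019, 20)})
window = random_consecutive_rows(demo, 5, seed=42)
print(window)
```

Using `iloc` keeps the slice positional, so the helper behaves the same whatever labels the index carries.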
Choose a random row n and then take the five rows starting at n (the slice n:n+5):
import random
# randint includes both bounds, so the last valid start is len(dataframe) - 5
n = random.randint(0, len(dataframe) - 5)
five_random_consecutive_rows = dataframe.iloc[n:n+5]
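To see why the largest start is the row count minus 5, here is a small worked check on a hypothetical 8-row frame: every start up to that bound yields a full window, and one position later the slice comes back short.

```python
import pandas as pd

# Hypothetical 8-row frame standing in for the real data.
df = pd.DataFrame({"dClose": range(8)})
k = 5
last_start = len(df) - k  # 3, so the final full window covers positions 3..7

# Every start from 0 through last_start (the inclusive bounds given to random.randint) gives k rows.
sizes = [len(df.iloc[s:s + k]) for s in range(last_start + 1)]
print(sizes)  # [5, 5, 5, 5]

# One position later, the slice runs off the end and returns fewer rows.
short = len(df.iloc[last_start + 1:last_start + 1 + k])
print(short)  # 4
```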
You can also make a random choice on df.index, then get the position of the chosen label with get_loc, and slice with df.iloc[]:
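import numpy as np
s = np.random.choice(df.index[:-4])  # all but the last 4 labels, so 5 rows always remain
start = df.index.get_loc(s)  # label -> integer position; assumes a unique index
df.iloc[start:start+5]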
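The get_loc step is what lets this approach work when the index is not 0..n-1, which is likely for daily price data indexed by date. A minimal sketch, assuming a hypothetical frame indexed by business days (the dates and values are made up) and NumPy's `default_rng` for the random choice:

```python
import numpy as np
import pandas as pd

# Hypothetical daily frame indexed by business-day dates rather than 0..n-1.
idx = pd.bdate_range("2024-01-01", periods=30)
df = pd.DataFrame({"dClose": np.arange(30) / 1000}, index=idx)

rng = np.random.default_rng(0)
# Leave out the last 4 labels so at least 5 rows remain from the chosen one onward.
label = pd.Timestamp(rng.choice(df.index[:-4].to_numpy()))
start = df.index.get_loc(label)  # date label -> integer position; assumes a unique index
window = df.iloc[start:start + 5]
print(window)
```

Slicing with `df.loc[label:...]` would need a date as the end point, so converting to a position first and using `iloc` is the simpler route here.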
Why not just take one random sample and then get the N consecutive rows that follow it?
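no_consecutives = 5
len_df = len(df)
# position of one randomly sampled row (get_loc assumes a unique index)
random_position = df.index.get_loc(df.sample(1).index[0])
# if the window would run past the end of df, shift it back so it still fits
if random_position + no_consecutives > len_df:
    random_position = len_df - no_consecutives
# iloc slices exclude the end, so this returns exactly no_consecutives rows
df_random = df.iloc[random_position:random_position + no_consecutives]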
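Clamping the start makes the last window more likely than the others, because a sample landing on any of the final few rows is pushed back to the same start. One way to keep the `df.sample` idea without that bias is to sample only from the rows that can begin a full window. This is a sketch assuming a unique index; `sample_window` and the demo frame are hypothetical.

```python
import numpy as np
import pandas as pd


def sample_window(df: pd.DataFrame, n: int, seed=None) -> pd.DataFrame:
    """Pick a start row with df.sample, restricted to rows that can begin a full window."""
    if not 0 < n <= len(df):
        raise ValueError(f"n must be between 1 and {len(df)}, got {n}")
    # Only the first len(df) - n + 1 rows can start a window of n rows,
    # so sampling among them gives every window the same chance and needs no clamping.
    candidates = df.iloc[: len(df) - n + 1]
    start_label = candidates.sample(1, random_state=seed).index[0]
    start = df.index.get_loc(start_label)  # assumes a unique index
    return df.iloc[start:start + n]


demo = pd.DataFrame({"dClose": np.arange(12) / 1000})
print(sample_window(demo, 5, seed=1))
```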