Pandas sampling a dataframe but treating multiple rows as a single row based on column
Consider the following toy code that performs a simplified version of my actual question:
import pandas
df = pandas.DataFrame(
{
'n_event': [1,2,3,4,5],
'some column': [0,1,2,3,4],
}
)
df = df.set_index(['n_event'])
print(df)
resampled_df = df.sample(frac=1, replace=True)
print(resampled_df)
The resampled_df is, as its name suggests, a resampled version of the original one (with replacement). This is exactly what I want. An example output of the previous code is
         some column
n_event
1                  0
2                  1
3                  2
4                  3
5                  4
         some column
n_event
4                  3
1                  0
4                  3
4                  3
2                  1
Now for my actual question I have the following dataframe:
import pandas
df = pandas.DataFrame(
{
'n_event': [1,1,2,2,3,3,4,4,5,5],
'n_channel': [1,2,1,2,1,2,1,2,1,2],
'some column': [0,1,2,3,4,5,6,7,8,9],
}
)
df = df.set_index(['n_event','n_channel'])
print(df)
which looks like
                   some column
n_event n_channel
1       1                    0
        2                    1
2       1                    2
        2                    3
3       1                    4
        2                    5
4       1                    6
        2                    7
5       1                    8
        2                    9
I want to do exactly the same as before, resampling with replacement, but treating each group of rows with the same n_event as a single entity. A hand-built example of what I want could look like this:
                   some column
n_event n_channel
2       1                    2
        2                    3
2       1                    2
        2                    3
3       1                    4
        2                    5
1       1                    0
        2                    1
5       1                    8
        2                    9
As seen, each n_event was treated as a whole, and the rows within each event were not mixed up.
How can I do this without resorting to brute force (i.e. without for loops, etc.)?
I have tried df.sample(frac=1, replace=True, ignore_index=False) and a few things using groupby, without success.
Would a pivot() / melt() sequence work for you?
Use pivot() to go from long to wide (make each group a single row).
Do the sampling.
Then go back from wide to long using melt().
I don't have time to work out a full answer, but I thought I would pass this idea along in case it helps.
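The three steps above can be sketched roughly as follows. This is only an illustration of the idea, not a tested answer from the original poster: the `draw` column is a hypothetical helper added here so that an event drawn more than once stays distinguishable after melting, and `random_state` is set only to make the run repeatable.

```python
import pandas as pd

df = pd.DataFrame({
    'n_event':     [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'n_channel':   [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    'some column': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Long -> wide: one row per n_event, one column per n_channel.
wide = df.pivot(index='n_event', columns='n_channel', values='some column')

# Sample whole events with replacement, then turn n_event back into a column.
sampled = wide.sample(frac=1, replace=True, random_state=0).reset_index()

# 'draw' is a hypothetical helper column that numbers the draws.
sampled['draw'] = list(range(len(sampled)))

# Wide -> long: the channel columns become rows again.
long_df = sampled.melt(
    id_vars=['draw', 'n_event'],
    var_name='n_channel',
    value_name='some column',
)
long_df['n_channel'] = long_df['n_channel'].astype(int)

# melt() lists all rows of channel 1 first, so sort to regroup each draw.
long_df = long_df.sort_values(['draw', 'n_channel']).reset_index(drop=True)
print(long_df)
```

Note that this relies on every event having the same set of channels; otherwise pivot() would introduce missing values in the wide table.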
Following jch's suggestion, I was able to find a solution by combining pivot and stack:
import pandas
df = pandas.DataFrame(
{
'n_event': [1,1,2,2,3,3,4,4,5,5],
'n_channel': [1,2,1,2,1,2,1,2,1,2],
'some column': [0,1,2,3,4,5,6,7,8,9],
'other col': [5,6,4,3,2,5,2,6,8,7],
}
)
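# Data columns: everything except the two key columns.
# A list is used instead of a set: recent pandas versions reject a set
# as an indexer, and a list also keeps the column order stable.
value_columns = [col for col in df.columns if col not in ('n_event', 'n_channel')]
# Long -> wide: one row per n_event, so that sample() draws whole events.
resampled_df = df.pivot(
    index = 'n_event',
    columns = 'n_channel',
    values = value_columns,
)
resampled_df = resampled_df.sample(frac=1, replace=True)
# Wide -> long: move n_channel back from the columns into the index.
resampled_df = resampled_df.stack()
print(resampled_df)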
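For comparison, a variant that skips the reshape entirely might sample the distinct n_event values and then merge the rows back in. This is a sketch of a different technique than the pivot/stack answer above, not part of it; the `draw` level is a hypothetical name introduced here to tell repeated draws of the same event apart, and `random_state` is only there for repeatability.

```python
import pandas as pd

df = pd.DataFrame({
    'n_event':     [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'n_channel':   [1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
    'some column': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    'other col':   [5, 6, 4, 3, 2, 5, 2, 6, 8, 7],
})

# One entry per distinct event, then sample those entries with replacement.
event_ids = pd.Series(df['n_event'].unique(), name='n_event')
drawn = event_ids.sample(frac=1, replace=True, random_state=0)

# Number the draws: the result has two columns, 'draw' and 'n_event'.
drawn = drawn.reset_index(drop=True).rename_axis('draw').reset_index()

# A left merge pulls in every row of each drawn event, in draw order.
resampled_df = drawn.merge(df, on='n_event', how='left')
resampled_df = resampled_df.set_index(['draw', 'n_event', 'n_channel'])
print(resampled_df)
```

Because the merge copies whatever rows each event has, this version should also cope with events that have different numbers of channels, where the pivot-based approach would produce missing values.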