Pandas sampling a dataframe but treating multiple rows as a single row based on column

Consider the following toy code that performs a simplified version of my actual question:

import pandas

df = pandas.DataFrame(
    {
        'n_event':     [1,2,3,4,5],
        'some column': [0,1,2,3,4],
    }
)

df = df.set_index(['n_event'])
print(df)

resampled_df = df.sample(frac=1, replace=True)
print(resampled_df)

As its name suggests, resampled_df is a resampled version of the original dataframe (with replacement). This is exactly what I want. An example output of the previous code is:

         some column
n_event             
1                  0
2                  1
3                  2
4                  3
5                  4
         some column
n_event             
4                  3
1                  0
4                  3
4                  3
2                  1

Now, for my actual question, I have the following dataframe:

import pandas

df = pandas.DataFrame(
    {
        'n_event':     [1,1,2,2,3,3,4,4,5,5],
        'n_channel':   [1,2,1,2,1,2,1,2,1,2],
        'some column': [0,1,2,3,4,5,6,7,8,9],
    }
)

df = df.set_index(['n_event','n_channel'])
print(df)

which looks like this:

                   some column
n_event n_channel             
1       1                    0
        2                    1
2       1                    2
        2                    3
3       1                    4
        2                    5
4       1                    6
        2                    7
5       1                    8
        2                    9

I want to do exactly the same as before, resampling with replacement, but treating each group of rows with the same n_event as a single entity. A hand-built example of what I want could look like this:

                   some column
n_event n_channel             
2       1                    2
        2                    3
2       1                    2
        2                    3
3       1                    4
        2                    5
1       1                    0
        2                    1
5       1                    8
        2                    9

As seen, each n_event was treated as a whole, and the rows within each event were not mixed up.

How can I do this without resorting to brute force (i.e. without for loops, etc.)?

I have tried df.sample(frac=1, replace=True, ignore_index=False) and a few things using groupby, without success.

Would a pivot() / melt() sequence work for you?

Use pivot() to go from long to wide (making each group a single row).
Do the sampling.
Then go back from wide to long using melt().

I don't have time to work out a full answer, but I thought I would pass this idea along in case it helps.
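
The three steps suggested above could be sketched roughly as follows. This is only an illustration and not the answerer's own code: the `draw` column is a hypothetical helper added to tell repeated draws of the same event apart, `random_state=0` is there only to make the run reproducible, and the sketch assumes a single value column and the same set of channels for every event.

```python
import pandas

df = pandas.DataFrame(
    {
        'n_event':     [1,1,2,2,3,3,4,4,5,5],
        'n_channel':   [1,2,1,2,1,2,1,2,1,2],
        'some column': [0,1,2,3,4,5,6,7,8,9],
    }
)

# Long to wide: one row per n_event, one column per n_channel.
wide = df.pivot(index='n_event', columns='n_channel', values='some column')

# Resample whole events with replacement.
sampled = wide.sample(frac=1, replace=True, random_state=0).reset_index()

# 'draw' is a hypothetical helper column numbering the draws, so that an
# event drawn twice still yields two distinguishable groups of rows.
sampled['draw'] = range(len(sampled))

# Wide to long again, then restore the event-by-event row ordering.
resampled_long = (
    sampled
    .melt(id_vars=['draw', 'n_event'], var_name='n_channel', value_name='some column')
    .astype({'n_channel': 'int64'})
    .sort_values(['draw', 'n_channel'])
    .set_index(['draw', 'n_event', 'n_channel'])
)
print(resampled_long)
```

melt() returns the rows ordered channel by channel, which is why the sketch sorts by `draw` and `n_channel` before rebuilding the index.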

Following the suggestion of jch, I was able to find a solution by combining pivot and stack:

import pandas

df = pandas.DataFrame(
    {
        'n_event':     [1,1,2,2,3,3,4,4,5,5],
        'n_channel':   [1,2,1,2,1,2,1,2,1,2],
        'some column': [0,1,2,3,4,5,6,7,8,9],
        'other col':   [5,6,4,3,2,5,2,6,8,7],
    }
)

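# Long to wide: one row per n_event, one column per (value column, n_channel).
resampled_df = df.pivot(
    index = 'n_event',
    columns = 'n_channel',
    # A list rather than a set: recent pandas versions reject sets as
    # column selectors, and a list keeps the column order deterministic.
    values = [c for c in df.columns if c not in ('n_event', 'n_channel')],
)
# Resample whole events with replacement.
resampled_df = resampled_df.sample(frac=1, replace=True)
# Wide to long: move n_channel back from the columns into the index.
resampled_df = resampled_df.stack()
print(resampled_df)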
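
As a possible alternative that skips the wide reshape, one could sample the distinct n_event values and then merge them back onto the original rows. This is a minimal sketch and not part of the original answers: the `draw` column name is hypothetical, and `random_state=0` is used only for reproducibility.

```python
import pandas

df = pandas.DataFrame(
    {
        'n_event':     [1,1,2,2,3,3,4,4,5,5],
        'n_channel':   [1,2,1,2,1,2,1,2,1,2],
        'some column': [0,1,2,3,4,5,6,7,8,9],
        'other col':   [5,6,4,3,2,5,2,6,8,7],
    }
)

# Draw event ids with replacement: one draw per distinct event.
events = df['n_event'].drop_duplicates()
draws = events.sample(frac=1, replace=True, random_state=0).reset_index(drop=True)

# 'draw' is a hypothetical column numbering the draws, so that repeated
# events remain distinguishable after the merge.
draws = draws.rename_axis('draw').reset_index()

# A left merge expands each drawn event into all of its channel rows,
# keeping the order of the draws.
resampled_df = (
    draws.merge(df, on='n_event', how='left')
    .set_index(['draw', 'n_event', 'n_channel'])
)
print(resampled_df)
```

Because the merge copies whatever rows belong to each drawn event, this variant should also cope with events that have different numbers of channels, a case in which the pivot-based approach would introduce NaN entries.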

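Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish this content, please cite this site's URL or the original source. For any questions, contact yoyou2525@163.com.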
