
Appending a pandas.DataFrame to one column of another pandas.DataFrame

A similar question has already been asked here, but it did not get an exact answer to what the OP wanted, so I will try again. I have data with 4 columns, named ['date', 'log time', 'choice', 'dt between ROI']. I would now like to filter this data based on two criteria, here named ['LED', 'drum']. In other words, if particular rows of the original pandas.DataFrame correspond to 'LED', they get sorted under the 'LED' column of the master dataframe, and if they correspond to 'drum', they get sorted under the 'drum' column of the master dataframe. In this way, both the 'LED' and 'drum' columns would have the same 4 subcolumns as the original data, ['date', 'log time', 'choice', 'dt between ROI']. Additionally, the 'LED' and 'drum' columns would not necessarily have the same number of rows.

To start, I created the master dataframe with the structure described above:

import pandas

master_df = pandas.DataFrame({
    'distraction': ['LED','LED','LED','LED','drum','drum','drum','drum'],
    '': ['date', 'log time', 'choice', 'dt between ROI', 'date', 'log time', 'choice', 'dt between ROI']
})

master_df = master_df.set_index(['distraction', '']).transpose()

This resulted in the desired final structure:

In: master_df
Out:
distraction     LED                                             drum
                date    log time    choice    dt between ROI    date    log time    choice    dt between ROI

In: master_df['LED']
Out:
date    log time    choice    dt between ROI

Next, my filtering function returns certain rows of the original dataframe:

output = filter_function(original_df)

Hence, output has the same structure as original_df:

In: output
Out:
date    log time    choice    dt between ROI
x1      x2          x3        x4
y1      y1          y3        y4

Then I tried appending this output to the master dataframe like so:

master_df['LED'] = master_df['LED'].append(output, ignore_index=True)

which resulted in the following error:

ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

Next, I tried:

master_df = master_df['LED'].append(output, ignore_index=True)

which simply overwrote the structure created above. What I really want to achieve is this:

In: master_df['LED'].append(output, ignore_index=True)
Out:
LED                                             drum
date    log time    choice    dt between ROI    date    log time    choice    dt between ROI
x1      x2          x3        x4
y1      y1          y3        y4

and likewise:

In: master_df['drum'].append(output, ignore_index=True)
Out:
LED                                             drum
date    log time    choice    dt between ROI    date    log time    choice    dt between ROI
                                                x1      x2          x3        x4
                                                y1      y1          y3        y4

I am not sure whether pandas can handle empty rows, but I guess NaN would be OK. After the filtering is done, I wish to retrieve the two filtered datasets by simply calling master_df['LED'] or master_df['drum']. Is there a way to do this?

Many thanks for your help!

EDIT: Changed 'criterium' to 'distraction' to avoid confusion.

@Dani's point is that your code does not work as posted. It does work if one replaces 'criterium' with 'distraction', which I assume was the intent.

As for your question, you can do the following. Prepend another level of column multi-index to your output so that it matches the column structure of master_df. Then you can safely append or concat:

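import pandas as pd

# Add an outer column level; use 'LED' if the rows belong to the LED group, otherwise 'drum'.
output_LED = pd.concat([output], keys=['LED'], axis=1)
# Note: DataFrame.append was removed in pandas 2.0; there, use pd.concat([master_df, output_LED]).
master_df2 = master_df.append(output_LED)
master_df2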

produces

   LED                                         drum
   choice  date  dt between ROI  log time      choice  date  dt between ROI  log time
0  x3      x1    x4              x2            NaN     NaN   NaN             NaN
1  y3      y1    y4              y1            NaN     NaN   NaN             NaN

and

master_df2['LED']

produces


   choice  date  dt between ROI  log time
0  x3      x1    x4              x2
1  y3      y1    y4              y1

and

master_df2['drum']

produces


   choice  date  dt between ROI  log time
0  NaN     NaN   NaN             NaN
1  NaN     NaN   NaN             NaN
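On pandas 2.0 and later, where DataFrame.append no longer exists, the same idea can be sketched with pd.concat alone. The variant below is a different arrangement from the one above: it places the two groups side by side, so a group with fewer rows is padded with NaN. The led_rows and drum_rows frames are hypothetical stand-ins for what filter_function would return, and the 'z' values are made up for illustration.

```python
import pandas as pd

cols = ["date", "log time", "choice", "dt between ROI"]

# Hypothetical filtered rows (stand-ins for filter_function output).
led_rows = pd.DataFrame(
    [["x1", "x2", "x3", "x4"], ["y1", "y1", "y3", "y4"]], columns=cols
)
drum_rows = pd.DataFrame([["z1", "z2", "z3", "z4"]], columns=cols)

# Add the outer column level to each group, as in the answer above.
output_LED = pd.concat([led_rows], keys=["LED"], axis=1)
output_drum = pd.concat([drum_rows], keys=["drum"], axis=1)

# Join side by side; rows align on the index and the shorter group gets NaN.
master_df2 = pd.concat([output_LED, output_drum], axis=1)

# Retrieve each group; drop the all-NaN padding rows where needed.
led = master_df2["LED"].dropna(how="all")
drum = master_df2["drum"].dropna(how="all")
```

With this layout, master_df2['LED'] and master_df2['drum'] return the filtered rows for each group, and dropna(how='all') removes the padding for the shorter one.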
