Merge multiple time columns into a single column using wide_to_long

I have a DataFrame that has multiple time columns, each with an assigned value column.

import numpy as np
import pandas as pd

df = pd.DataFrame({'time': {0: 0.2, 1: 0.3, 2: 0.4, 3: np.nan}, 'val1': {0: 1.0, 1: 3.0, 2: 1.0, 3: np.nan}, 'time2': {0: 0.1, 1: 0.4, 2: 0.8, 3: 1.0}, 'val2': {0: 2, 1: 2, 2: 9, 3: 2}})

Which looks like this:

   time  val1  time2  val2
0   0.2   1.0    0.1     2
1   0.3   3.0    0.4     2
2   0.4   1.0    0.8     9
3   NaN   NaN    1.0     2

There can be many more time and value columns (but they always come in pairs). I would like to merge all the time columns into ONE column, while keeping the val columns and filling them in at their corresponding times.

Example output:

   time  val1   val2
0   0.1   1.0    2.0     
1   0.2   1.0    2.0     
2   0.3   3.0    2.0     
3   0.4   1.0    2.0   
4   0.8   1.0    9.0   
5   1.0   1.0    2.0     
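Since the time and value columns always come in pairs, the pairs can be read off the column order instead of being typed by hand. A minimal sketch, assuming the columns strictly alternate time, value, time, value (as in the sample frame); the name `pairs` is only illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': [0.2, 0.3, 0.4, np.nan], 'val1': [1.0, 3.0, 1.0, np.nan],
                   'time2': [0.1, 0.4, 0.8, 1.0], 'val2': [2, 2, 9, 2]})

# Assumption: even positions hold time columns, odd positions hold value columns
pairs = list(zip(df.columns[0::2], df.columns[1::2]))
# pairs == [('time', 'val1'), ('time2', 'val2')]
print(pairs)
```

If the real frame has extra columns or a different order, the pairs would have to be built explicitly instead.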

I have asked this question previously, and one answer came very close. The answer and its output are below:

df1 = (pd.wide_to_long(df.rename(columns={'time':'time1'}).reset_index(),
                      'time', i='index', j='t')
        .sort_values(['time','val2'])
        .drop_duplicates('time')
        .dropna(subset=['time'])
        .reset_index(drop=True))

Output:

   val1  val2  time
0   1.0     2   0.1
1   1.0     2   0.2
2   3.0     2   0.3
3   3.0     2   0.4 <- val1 incorrect
4   1.0     9   0.8
5   NaN     2   1.0

IIUC, you can't achieve this with wide_to_long.

You don't have a canonical reshaping: there are duplicate values (e.g. time 0.4), and you need to decide which row each value should come from.
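To see the duplicate concretely, stack both time columns into one long frame while keeping each row's values. This illustration uses pd.concat rather than wide_to_long, and the names `long` and `source` are made up for the example. Time 0.4 shows up twice, once from each time column, with different val1/val2 pairs, so any reshape has to pick one of them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'time': [0.2, 0.3, 0.4, np.nan], 'val1': [1.0, 3.0, 1.0, np.nan],
                   'time2': [0.1, 0.4, 0.8, 1.0], 'val2': [2, 2, 9, 2]})

# One block per time column; every block keeps the full row of values
long = pd.concat(
    [df[['val1', 'val2']].assign(time=df[tcol], source=tcol) for tcol in ['time', 'time2']],
    ignore_index=True,
).dropna(subset=['time'])

# Time 0.4 comes from 'time' (row 2: val1=1.0, val2=9) and 'time2' (row 1: val1=3.0, val2=2)
dupes = long[long['time'].duplicated(keep=False)]
print(dupes)
```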

So, I guess you need to perform two merges and combine them in the desired order of priority:

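# m1: val1 from the rows of 'time', val2 from the rows of 'time2'
m1 = (
 df[['time', 'val1']]
 .merge(df[['time2', 'val2']]
        .rename(columns={'time2': 'time'}),
        on='time', how='outer')
 .sort_values(by='time')
)

# m2: the cross pairing (val2 from the rows of 'time', val1 from the rows of 'time2'),
# used only to fill the gaps left in m1
m2 = (
 df[['time', 'val2']]
 .merge(df[['time2', 'val1']]
        .rename(columns={'time2': 'time'}),
        on='time', how='outer')
 .sort_values(by='time')
)

# combine_first aligns on the index label, keeps m1's values and fills its NaNs from m2
out = m1.combine_first(m2).dropna(subset=['time'])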

Output:

   time  val1  val2
4   0.1   1.0   2.0
0   0.2   1.0   2.0
1   0.3   3.0   2.0
2   0.4   1.0   2.0
5   0.8   1.0   9.0
6   1.0   NaN   2.0
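The combine step relies on index alignment: DataFrame.combine_first keeps the caller's values and fills its NaNs from the other frame at the same row and column labels, regardless of row order. A small standalone illustration with made-up frames `a` and `b`:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'time': [0.1, 0.2], 'val1': [np.nan, 1.0]}, index=[4, 0])
# Same labels, different row order and different values
b = pd.DataFrame({'time': [0.2, 0.1], 'val1': [5.0, 1.0]}, index=[0, 4])

# Label 4: NaN in a is filled from b (1.0); label 0: a's own value (1.0) is kept over b's 5.0
out = a.combine_first(b)
print(out)
```

This is why the two merges above need to give each time the same index label; that appears to hold here because both are built from the same two key columns.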

Here is another simple approach: melt the data, sort so that time1 comes before time2, and, in case of duplicate times, take the first row for val1 and the last row for val2.

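cols = ['val1', 'val2']
(df
 .rename(columns={'time': 'time1'})
 # one row per (original row, time column); 'variable' holds 'time1' or 'time2'
 .melt(id_vars=cols, value_name='time')
 .dropna(subset=['time'])
 # for equal times, 'time1' rows sort before 'time2' rows
 .sort_values(by=['time', 'variable'])
 # val1 from the 'time1' row, val2 from the 'time2' row (first/last skip NaNs)
 .groupby('time').agg({'val1': 'first', 'val2': 'last'})
 .reset_index()
)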

Output:

   time  val1  val2
0   0.1   1.0     2
1   0.2   1.0     2
2   0.3   3.0     2
3   0.4   1.0     2
4   0.8   1.0     9
5   1.0   NaN     2
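The melt approach hard-codes first/last for exactly two pairs. Since the question says there can be more pairs, here is a sketch of one way to extend the same "prefer the row the time came from" rule to N pairs. It is not taken from either answer: `merge_time_columns` is a hypothetical helper, and it assumes you pass the (time, value) column pairs explicitly. Like the answers, it leaves NaN where no row supplies a value (val1 at time 1.0):

```python
import numpy as np
import pandas as pd


def merge_time_columns(df, pairs):
    """Hypothetical helper: merge N (time, value) column pairs into one time column.

    For value column k, rows whose time came from time column k take priority;
    otherwise the first non-null value from any row with that time is used.
    """
    val_cols = [v for _, v in pairs]
    parts = []
    for k, (tcol, _) in enumerate(pairs):
        part = df[val_cols].copy()
        part['time'] = df[tcol]
        part['src'] = k  # index of the time column this row's time came from
        parts.append(part)
    long = pd.concat(parts, ignore_index=True).dropna(subset=['time'])

    result = {}
    for k, vcol in enumerate(val_cols):
        # False sorts before True, so rows from time column k come first within each time
        ranked = long.assign(other_src=long['src'] != k).sort_values(['time', 'other_src'])
        # groupby(...).first() returns the first non-null value in each group
        result[vcol] = ranked.groupby('time')[vcol].first()
    return pd.DataFrame(result).rename_axis('time').reset_index()


df = pd.DataFrame({'time': [0.2, 0.3, 0.4, np.nan], 'val1': [1.0, 3.0, 1.0, np.nan],
                   'time2': [0.1, 0.4, 0.8, 1.0], 'val2': [2, 2, 9, 2]})
out = merge_time_columns(df, [('time', 'val1'), ('time2', 'val2')])
print(out)
```

For the two-pair sample this should reproduce the melt output above. Because first() skips NaNs, a NaN in the preferred row falls back to another row's value for the same time; drop that behaviour if it is not what you want.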

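Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.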

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM