Slice each dataframe row into 3 windows with different slicing ranges

Question

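I want to slice each row of my dataframe into 3 windows, with slice indices that are stored in another dataframe and change for each row. Afterwards I want to return a single dataframe containing the windows in the form of a MultiIndex. Rows in a window that are shorter than the longest row in that window should be filled with NaN values. Since my actual dataframe has around 100,000 rows and 600 columns, I am concerned about an efficient solution.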

Consider the following example:

This is my dataframe, which I want to slice into 3 windows:

>>> df
    0   1   2   3   4   5   6   7
0   0   1   2   3   4   5   6   7
1   8   9  10  11  12  13  14  15
2  16  17  18  19  20  21  22  23

The second dataframe contains my slicing indices and has the same number of rows as df:

>>> df_slice
   0  1
0  3  5
1  2  6
2  4  7

I tried slicing the windows like this:

first_window = df.iloc[:, :df_slice.iloc[:, 0]]
first_window.columns = pd.MultiIndex.from_tuples([("A", c) for c in first_window.columns])

second_window = df.iloc[:, df_slice.iloc[:, 0] : df_slice.iloc[:, 1]]
second_window.columns = pd.MultiIndex.from_tuples([("B", c) for c in second_window.columns])

third_window = df.iloc[:, df_slice.iloc[:, 1]:]
third_window.columns = pd.MultiIndex.from_tuples([("C", c) for c in third_window.columns])
result = pd.concat([first_window,
                    second_window,
                    third_window], axis=1)

This gives me the following error:

TypeError: cannot do slice indexing on <class 'pandas.core.indexes.range.RangeIndex'> with these indexers [0    3
1    2
2    4
Name: 0, dtype: int64] of <class 'pandas.core.series.Series'>
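
The message points at the slice bounds: a positional slice in iloc takes one scalar start and one scalar stop for the whole frame, so a Series holding a different stop per row is rejected. The following is only a small sketch of that behavior on the example frames (rebuilt here so the snippet runs on its own), not a fix for the problem:

```python
import numpy as np
import pandas as pd

# Example frames from the question, rebuilt so the snippet is self-contained.
df = pd.DataFrame(np.arange(24).reshape(3, 8))
df_slice = pd.DataFrame({0: [3, 2, 4], 1: [5, 6, 7]})

# A scalar bound is accepted: the same cut point is applied to every row.
same_cut = df.iloc[:, :3]

# A Series bound is rejected, because a slice cannot carry one stop per row.
try:
    df.iloc[:, :df_slice.iloc[:, 0]]
    raised = False
except TypeError:
    raised = True

# Slicing row by row with scalar bounds works, but needs a Python-level loop.
first_windows = [df.iloc[i, :df_slice.iloc[i, 0]].tolist() for i in range(len(df))]
print(raised, first_windows)
```

The row-by-row version is the kind of loop the question is trying to avoid on a large frame.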

My expected output looks like this:

>>> result
    A                   B                   C           
    0   1     2     3   4   5     6     7   8     9    10
0   0   1     2   NaN   3   4   NaN   NaN   5     6    7
1   8   9   NaN   NaN  10  11    12    13  14    15  NaN
2  16  17    18    19  20  21    22   NaN  23   NaN  NaN

Is there an efficient solution to my problem that does not require iterating over each row of my dataframe?

Answer 1

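Here is a solution that uses melt followed by pivot_table, plus some logic to:

Identify the three groups "A", "B" and "C".
Shift the columns to the left, so that NaN only appears on the right side of each window.
Rename the columns to get the expected output.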

    import pandas as pd

    # df and df_slice are the two dataframes from the question. The code below
    # refers to the slice columns as c_0 and c_1, so they are renamed first
    # (this renaming step is assumed; it is a no-op if they already have those names).
    df_slice = df_slice.rename(columns={0: "c_0", 1: "c_1"})

    # long format: one record per (row, column) cell, with that row's cut points
    t = df.reset_index().melt(id_vars="index")
    t = pd.merge(t, df_slice, left_on="index", right_index=True)
    t.variable = pd.to_numeric(t.variable)
    
    # assign every cell to window A, B or C based on its row's cut points
    t.loc[t.variable < t.c_0,"group"] = "A"
    t.loc[(t.variable >= t.c_0) & (t.variable < t.c_1), "group"] = "B"
    t.loc[t.variable >= t.c_1, "group"] = "C"

    # shift relevant values to the left
    shift_val = t.groupby(["group", "index"]).variable.transform("min") - t.groupby(["group"]).variable.transform("min")
    t.variable = t.variable - shift_val
    
    # extract a, b, and c groups, and create a multi-level index for their
    # columns
    df_a = pd.pivot_table(t[t.group == "A"], index= "index", columns="variable", values="value")
    df_a.columns = pd.MultiIndex.from_product([["a"], df_a.columns])
    
    df_b = pd.pivot_table(t[t.group == "B"], index= "index", columns="variable", values="value")
    df_b.columns = pd.MultiIndex.from_product([["b"], df_b.columns])
    
    df_c = pd.pivot_table(t[t.group == "C"], index= "index", columns="variable", values="value")
    df_c.columns = pd.MultiIndex.from_product([["c"], df_c.columns])
    
    res = pd.concat([df_a, df_b, df_c], axis=1)
    
    # renumber the second column level consecutively across all three windows
    res.columns = pd.MultiIndex.from_tuples([(c[0], i) for i, c in enumerate(res.columns)])
    
    print(res)

The output is:

          a                       b                       c           
         0     1     2     3     4     5     6     7     8     9    10
index                                                                 
0       0.0   1.0   2.0   NaN   3.0   4.0   NaN   NaN   5.0   6.0  7.0
1       8.0   9.0   NaN   NaN  10.0  11.0  12.0  13.0  14.0  15.0  NaN
2      16.0  17.0  18.0  19.0  20.0  21.0  22.0   NaN  23.0   NaN  NaN
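
For a frame of the size mentioned in the question, the same layout can also be produced by computing a target column for every cell and scattering the values with NumPy. This is not the approach of the answer above, only a minimal sketch under stated assumptions: the function name slice_windows and its labels parameter are hypothetical, the two cut points in each row of df_slice are sorted and lie between 0 and the number of columns, and the values are cast to float so that NaN padding is possible.

```python
import numpy as np
import pandas as pd


def slice_windows(df, df_slice, labels=("A", "B", "C")):
    """Hypothetical helper: split each row at its two cut points, pad with NaN."""
    values = df.to_numpy(dtype=float)   # float, so NaN padding is possible
    cuts = df_slice.to_numpy()          # shape (n_rows, 2), assumed sorted per row
    n_rows, n_cols = values.shape

    # Per-row window boundaries: [0, cut_0, cut_1, n_cols].
    bounds = np.column_stack(
        [np.zeros(n_rows, dtype=int), cuts, np.full(n_rows, n_cols)]
    )
    # Longest row in each window, and the output column where each window starts.
    widths = np.diff(bounds, axis=1).max(axis=0)
    offsets = np.concatenate(([0], np.cumsum(widths)[:-1]))

    rows = np.arange(n_rows)[:, None]
    cols = np.arange(n_cols)[None, :]
    # Window number (0, 1 or 2) of every cell: how many cut points are <= its column.
    win = (cols >= cuts[:, [0]]).astype(int) + (cols >= cuts[:, [1]])
    # Output column: start of the window plus the position inside the window.
    target = offsets[win] + cols - bounds[rows, win]

    out = np.full((n_rows, int(widths.sum())), np.nan)
    out[rows, target] = values

    columns = pd.MultiIndex.from_arrays(
        [np.repeat(labels, widths), np.arange(out.shape[1])]
    )
    return pd.DataFrame(out, index=df.index, columns=columns)


# Example frames from the question.
df = pd.DataFrame(np.arange(24).reshape(3, 8))
df_slice = pd.DataFrame({0: [3, 2, 4], 1: [5, 6, 7]})
result = slice_windows(df, df_slice)
print(result)
```

On the example frames this gives the same 3 x 11 layout as the expected output in the question, with float values. The sketch avoids the long-format intermediate of melt and pivot_table and holds only a few arrays the size of the input; whether it is actually faster on a real frame would have to be measured.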

Solution 1
0 Accepted 2020-07-18 08:10:53
