Pandas: merge or join on a smaller DataFrame

Question

I have a long dataframe and a short one, and I want to merge them so that the shorter dataframe repeats itself to fill the length of the longer (left) df.

df1:

| Index  | Wafer | Chip | Value |
---------------------------------
| 0      | 1     | 32   | 0.99  |
| 1      | 1     | 33   | 0.89  |
| 2      | 1     | 39   | 0.96  |
| 3      | 2     | 32   | 0.81  |
| 4      | 2     | 33   | 0.87  |

df2:

| Index  |   x   |   y  |
-------------------------
| 0      |   1   |   3  |
| 1      |   2   |   2  |
| 2      |   1   |   6  |


df_combined:

| Index  | Wafer | Chip | Value |   x   |   y   |
-------------------------------------------------
| 0      | 1     | 32   | 0.99  |   1   |   3   |
| 1      | 1     | 33   | 0.89  |   2   |   2   |
| 2      | 1     | 39   | 0.96  |   1   |   6   |
| 3      | 2     | 32   | 0.81  |   1   |   3   |  <--- auto-repeats...
| 4      | 2     | 33   | 0.87  |   2   |   2   |

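Is this a built-in join/merge type, or does it need some kind of loop?

{This is just dummy data, but the real dfs are over 1000 rows...}

The current code is a simple outer merge, but it does not fill or repeat through to the end:

df = main.merge(df_coords, left_index=True, right_index=True, how='outer')

It gives NaN for the rows beyond the length of the shorter df.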
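The behaviour described above can be reproduced with the sample tables. This is only a minimal sketch: the frame names `main` and `df_coords` come from the snippet, the values are the dummy data from the tables, and the default integer index is assumed to stand in for the Index column.

```python
import pandas as pd

# Dummy data from the tables above; the default index plays the role of "Index"
main = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df_coords = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# An index-on-index outer merge only aligns the labels both frames share (0, 1, 2).
# Labels 3 and 4 exist only in `main`, so their x and y come back as NaN.
df = main.merge(df_coords, left_index=True, right_index=True, how="outer")
print(df)
```

Nothing in this call tells pandas to cycle the shorter frame, which is why the last two rows stay empty.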

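I have looked at "Merge two python pandas data frames of different length but keep all rows in output data frame" and "pandas: duplicate rows from small dataframe to large based on cell value".

It feels like this might be an argument somewhere in the merge function... but I can't find it. Any help is much appreciated.

Thanks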

Answer 1

You can repeat df2 until it is as long as df1, then reset_index and merge:

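import math
import pandas as pd

# Copies of df2 needed to cover df1; ceil (not round) so we never come up short
n_copies = math.ceil(len(df1) / len(df2))
repeated = (pd.concat([df2] * n_copies)
              .reset_index(drop=True)   # fresh 0..N-1 index, old index discarded
              .iloc[:len(df1)])         # trim to exactly the length of df1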

repeated
   x  y
0  1  3
1  2  2
2  1  6
3  1  3
4  2  2

df1.merge(repeated, how="outer", left_index=True, right_index=True)
   Wafer  Chip  Value  x  y
0      1    32   0.99  1  3
1      1    33   0.89  2  2
2      1    39   0.96  1  6
3      2    32   0.81  1  3
4      2    33   0.87  2  2

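A bit hacky, but it should work.

Note: I'm assuming your Index column isn't actually a column but is meant to represent the dataframe index. I make this assumption because you reference the left_index / right_index args in your merge() code. If Index is in fact its own column, this code will basically still work; you'll just need to drop Index if you don't want it in the final df.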
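As an illustration of that last case, here is a rough sketch in which Index is a real column in both frames. The data is the dummy data from the question. Using `ignore_index=True` and a left merge are choices made for this sketch, not something stated in the answer.

```python
import math
import pandas as pd

# Dummy data from the question, with "Index" as an ordinary column
df1 = pd.DataFrame({
    "Index": [0, 1, 2, 3, 4],
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"Index": [0, 1, 2], "x": [1, 2, 1], "y": [3, 2, 6]})

# Repeat df2 (without its own Index column) enough times to cover df1
n_copies = math.ceil(len(df1) / len(df2))
repeated = (pd.concat([df2.drop(columns="Index")] * n_copies, ignore_index=True)
              .iloc[:len(df1)])

# Both frames now share a 0..len(df1)-1 row index, so merge on it
df_combined = df1.merge(repeated, how="left", left_index=True, right_index=True)

# Optional: drop Index if it is not wanted in the final df
df_combined = df_combined.drop(columns="Index")
print(df_combined)
```

This relies on df1 having the default 0..N-1 row index; if it does not, calling `reset_index(drop=True)` on df1 first would be needed.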

Answer 2

You can achieve this with a left join on the value of df1["Index"] modulo the length of df2["Index"]:

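import pandas as pd

# Creating modular index values on df1 (assumes Index is an integer column in
# both frames; the key is kept as int so its dtype matches df2["Index"])
n = df2.shape[0]
df1["Modular Index"] = df1["Index"].astype(int) % n

# Merging dataframes: every df1 row is kept and picks up the matching df2 row
df_combined = df1.merge(df2, how="left", left_on="Modular Index", right_on="Index")

# Dropping unnecessary columns (the helper key and df2's copy of Index)
df_combined = df_combined.drop(["Modular Index", "Index_y"], axis=1)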

print(df_combined)

   Index_x  Wafer  Chip  Value  x  y
0        0      1    32   0.99  1  3
1        1      1    33   0.89  2  2
2        2      1    39   0.96  1  6
3        3      2    32   0.81  1  3
4        4      2    33   0.87  2  2
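The same modulo idea can also be applied to row positions instead of the values of an Index column, which avoids the helper column and the later drop. The sketch below is a variation, not the code from this answer: it assumes rows should be matched purely by position, and it uses NumPy to build the positions.

```python
import numpy as np
import pandas as pd

# Dummy data from the question
df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# Row i of df1 takes row (i mod len(df2)) of df2: 0, 1, 2, 0, 1
positions = np.arange(len(df1)) % len(df2)

# Pick those rows by position, then give them df1's index so they line up
repeated = df2.iloc[positions].copy()
repeated.index = df1.index

# Side-by-side concatenation; the indexes are identical, so no NaN is introduced
df_combined = pd.concat([df1, repeated], axis=1)
print(df_combined)
```

Because the rows are selected by position, this should also work when df1's index is not a simple 0..N-1 range.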

Answer 1
2, accepted, 2018-08-06 10:53:10

Answer 2
0, 2018-08-06 11:41:45
