
Pandas merge or join in smaller dataframe

I have an issue whereby I have one long dataframe and one short dataframe, and I want to merge so that the shorter dataframe repeats itself to fill the length of the longer (left) df.

df1:

| Index  | Wafer | Chip | Value |
---------------------------------
| 0      | 1     | 32   | 0.99  |
| 1      | 1     | 33   | 0.89  |
| 2      | 1     | 39   | 0.96  |
| 3      | 2     | 32   | 0.81  |
| 4      | 2     | 33   | 0.87  |

df2:

| Index  |   x   |   y  |
-------------------------
| 0      |   1   |   3  |
| 1      |   2   |   2  |
| 2      |   1   |   6  |


df_combined:

| Index  | Wafer | Chip | Value |   x   |   y   |
-------------------------------------------------
| 0      | 1     | 32   | 0.99  |   1   |   3   |
| 1      | 1     | 33   | 0.89  |   2   |   2   |
| 2      | 1     | 39   | 0.96  |   1   |   6   |
| 3      | 2     | 32   | 0.81  |   1   |   3   |  <--- auto-repeats...
| 4      | 2     | 33   | 0.87  |   2   |   2   |

Is this a built-in join/merge type, or does it require a loop of some sort?

(This is just dummy data; the real dfs are over 1000 rows.)

My current code is a simple outer merge, but it doesn't fill/repeat to the end and just gives NaNs:

df = main.merge(df_coords, left_index=True, right_index=True, how='outer')

I've checked around:

Merge two python pandas data frames of different length but keep all rows in output data frame
pandas: duplicate rows from small dataframe to large based on cell value

It feels like this could be an argument somewhere in a merge function, but I can't find it. Any help gratefully received.

Thanks

You can repeat df2 until it's at least as long as df1, then reset the index, trim to the length of df1, and merge:

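import math
import pandas as pd

# Number of copies of df2 needed to cover df1 (round up so we never fall short)
n_repeats = math.ceil(len(df1) / len(df2))
repeated = (pd.concat([df2] * n_repeats)
              .reset_index(drop=True)   # fresh 0..N-1 index, old one discarded
              .iloc[:len(df1)])         # trim to df1's length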

repeated
   x  y
0  1  3
1  2  2
2  1  6
3  1  3
4  2  2

df1.merge(repeated, how="outer", left_index=True, right_index=True)
   Wafer  Chip  Value   x  y
0      1    32    0.99  1  3
1      1    33    0.89  2  2
2      1    39    0.96  1  6
3      2    32    0.81  1  3
4      2    33    0.87  2  2

A little hacky, but it should work.

Note: I'm assuming your Index column is not actually a column, but is in fact intended to represent the data frame index. I'm making this assumption because you refer to the left_index / right_index args in your merge() code. If Index is actually its own column, this code will basically work; you'll just need to drop Index as well if you don't want it in the final df.
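
As a variation on the same idea, the repetition can also be expressed positionally: row i of df1 takes row i % len(df2) of df2. The following is a minimal sketch rather than part of the answer above. It assumes the sample data from the question and that the frames should be matched by row position rather than by index label; the names positions and cycled are purely illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (Index treated as the DataFrame index)
df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# Row positions 0, 1, 2, 0, 1, ... cycling through df2
positions = np.arange(len(df1)) % len(df2)

# Take those rows from df2 and relabel them with df1's index so they align
cycled = df2.iloc[positions].set_index(df1.index)

# Column-wise concatenation aligns on the (now identical) index
df_combined = pd.concat([df1, cycled], axis=1)
print(df_combined)
```

This avoids building more copies of df2 than needed, and it also works when df1 has a non-default index, since only row positions are used.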

You can achieve this with a left join on the value of df1["Index"] modulo the length of df2:

# Creating modular index values on df1
# (assumes Index is an integer column in both df1 and df2, so the merge keys share a dtype)
n = df2.shape[0]
df1["Modular Index"] = df1["Index"].astype(int) % n

# Merging dataframes
df_combined = df1.merge(df2, how="left", left_on="Modular Index", right_on="Index")

# Dropping unnecessary columns
df_combined = df_combined.drop(["Modular Index", "Index_y"], axis=1)

print(df_combined)

  Index_x Wafer Chip Value  x  y
0       0     1   32  0.99  1  3
1       1     1   33  0.89  2  2
2       2     1   39  0.96  1  6
3       3     2   32  0.81  1  3
4       4     2   33  0.87  2  2
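
If Index is the DataFrame index rather than a column (as the question's use of left_index / right_index suggests), the same modulo join can be sketched with a temporary key built from row positions. This is only an illustration under that assumption, using the question's sample data; the column name _key is hypothetical, and df2 is assumed to have a default 0..n-1 index.

```python
import numpy as np
import pandas as pd

# Sample data from the question (Index treated as the DataFrame index)
df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

n = len(df2)

# Temporary key: row position in df1 modulo the length of df2
keyed = df1.assign(_key=np.arange(len(df1)) % n)

# Left join the key against df2's index, then discard the helper column
df_combined = (keyed.merge(df2, how="left", left_on="_key", right_index=True)
                    .drop(columns="_key"))
print(df_combined)
```

Using assign leaves df1 itself unmodified, so no helper column has to be cleaned up on the original frame afterwards.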

