I have one long dataframe and one short dataframe, and I want to merge them so that the shorter dataframe repeats itself to fill the length of the longer (left) df.
df1:
| Index | Wafer | Chip | Value |
---------------------------------
| 0 | 1 | 32 | 0.99 |
| 1 | 1 | 33 | 0.89 |
| 2 | 1 | 39 | 0.96 |
| 3 | 2 | 32 | 0.81 |
| 4 | 2 | 33 | 0.87 |
df2:
| Index | x | y |
-------------------------
| 0 | 1 | 3 |
| 1 | 2 | 2 |
| 2 | 1 | 6 |
df_combined:
| Index | Wafer | Chip | Value | x | y |
-------------------------------------------------
| 0 | 1 | 32 | 0.99 | 1 | 3 |
| 1 | 1 | 33 | 0.89 | 2 | 2 |
| 2 | 1 | 39 | 0.96 | 1 | 6 |
| 3 | 2 | 32 | 0.81 | 1 | 3 | <--- auto-repeats...
| 4 | 2 | 33 | 0.87 | 2 | 2 |
Is this a built-in join/merge type, or does it require a loop of some sort?
(This is just dummy data, but the real dfs are over 1000 rows.)
Current code is a simple outer merge, but doesn't provide the fill/repeat to end:
df = main.merge(df_coords, left_index=True, right_index = True, how='outer')
and just gives NaNs.
I've checked around, including "Merge two python pandas data frames of different length but keep all rows in output data frame" and "pandas: duplicate rows from small dataframe to large based on cell value", and it feels like this could be an argument somewhere in a merge function... but I can't find it. Any help gratefully received.
Thanks
You can repeat df2 until it's as long as df1, then reset_index and merge:
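import math

# Number of copies of df2 needed to cover df1; round up so we never fall short
# (round() would give too few copies when, e.g., len(df1) == 4 and len(df2) == 3)
n_repeats = math.ceil(len(df1) / len(df2))
repeated = (pd.concat([df2] * n_repeats)
            .reset_index(drop=True)   # discard the repeated 0..len(df2)-1 index
            .iloc[:len(df1)])         # trim any surplus rows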
repeated
x y
0 1 3
1 2 2
2 1 6
3 1 3
4 2 2
df1.merge(repeated, how="outer", left_index=True, right_index=True)
Wafer Chip Value x y
0 1 32 0.99 1 3
1 1 33 0.89 2 2
2 1 39 0.96 1 6
3 2 32 0.81 1 3
4 2 33 0.87 2 2
A little hacky, but it should work.
Note: I'm assuming your Index column is not actually a column, but is in fact intended to represent the data frame index. I'm making this assumption because you refer to left_index / right_index args in your merge() code. If Index is actually its own column, this code will basically work; you'll just need to drop Index as well if you don't want it in the final df.
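For reference, here is the repeat-then-merge idea run end to end on the sample data from the question. This is only a minimal sketch: the helper name repeat_to_length is hypothetical, the frames are rebuilt by hand from the tables in the question, and it assumes both frames use a default 0..n-1 index and that df2 is not empty.

```python
import math
import pandas as pd

# Sample data copied from the question (Index treated as the frame index)
df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})


def repeat_to_length(short, length):
    """Hypothetical helper: tile `short` row-wise until it has `length` rows.

    Assumes `short` has at least one row.
    """
    # Round up so the tiled frame is never shorter than requested
    n_repeats = max(1, math.ceil(length / len(short)))
    return (pd.concat([short] * n_repeats)
            .reset_index(drop=True)
            .iloc[:length])


repeated = repeat_to_length(df2, len(df1))
df_combined = df1.merge(repeated, how="left", left_index=True, right_index=True)
print(df_combined)
```

Because the repeated frame is trimmed to exactly len(df1) rows, a left merge and an outer merge give the same result here.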
You can achieve this with a left join on the value of df1["Index"] mod the length of df2["Index"]:
# Creating modular index values on df1
# (kept as integers so the key dtype matches df2["Index"]; a str key would not match an int column)
n = df2.shape[0]
df1["Modular Index"] = df1["Index"].astype(int) % n
# Merging dataframes
df_combined = df1.merge(df2, how="left", left_on="Modular Index", right_on="Index")
# Dropping unnecessary columns
df_combined = df_combined.drop(["Modular Index", "Index_y"], axis=1)
print(df_combined)
Index_x Wafer Chip Value x y
0 0 1 32 0.99 1 3
1 1 1 33 0.89 2 2
2 2 1 39 0.96 1 6
3 3 2 32 0.81 1 3
4 4 2 33 0.87 2 2
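If Index is the frame's index rather than a real column, the same modulo idea can be applied to row positions instead of to a key column. The following is a rough sketch under that assumption, using the sample data from the question; the names positions and cycled are illustrative only, and df2 is assumed to be non-empty.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Index treated as the frame index)
df1 = pd.DataFrame({
    "Wafer": [1, 1, 1, 2, 2],
    "Chip": [32, 33, 39, 32, 33],
    "Value": [0.99, 0.89, 0.96, 0.81, 0.87],
})
df2 = pd.DataFrame({"x": [1, 2, 1], "y": [3, 2, 6]})

# Row position in df1 modulo len(df2) picks the df2 row to pair with it:
# 0, 1, 2, 0, 1 for the sample data
positions = np.arange(len(df1)) % len(df2)

# Select df2 rows by position, cycling through them as often as needed
cycled = df2.iloc[positions].reset_index(drop=True)

# Give the cycled rows df1's index so the column-wise concat aligns row by row
cycled.index = df1.index
df_combined = pd.concat([df1, cycled], axis=1)
print(df_combined)
```

This avoids both the helper column and the merge, at the cost of relying on row order rather than on an explicit key.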