Join series with repeated index on dataframe where column values are equal to the index in the series
Suppose I have a Series and a DataFrame such as:
import pandas as pd
s = pd.Series([10, 20, 11, 12, 30, 34],
              index=["red", "red", "blue", "blue", "green", "green"])
s.index.name = "numbers"
df = pd.DataFrame({
    "color": ["red", "green", "blue", "blue", "red", "green"],
    "id": [1, 2, 3, 4, 5, 6]})
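I want to add the values in s to a new column in df, in the same order in which they appear in s wherever the index of s equals df["color"], i.e. something like: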
pd.some_function(df,s,left_on="color",right_index=True)
color  id  numbers
  red   1       10
green   2       30
 blue   3       11
 blue   4       12
  red   5       20
green   6       34
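I have tried pd.merge, pd.join, etc., but I just can't get it to work (short of looping over df, filtering by color, adding the data from s, and concatenating everything at the end).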
You can use groupby.cumcount to build a unique key for the merge:
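# occurrence number of each color within s
idx1 = s.groupby(level=0).cumcount()
# [0, 1, 0, 1, 0, 1]
# occurrence number of each color within df
idx2 = df.groupby('color').cumcount()
# [0, 0, 0, 1, 1, 1]
s.index.name = "color"
# merge on (color, occurrence number); the array key ends up in a 'key_1' column, dropped here
out = (df
       .merge(s.reset_index(name='number'),
              left_on=['color', idx2], right_on=['color', idx1])
       .drop(columns='key_1')
      )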
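To see why the extra key is needed, here is a minimal sketch using the data from the question. Merging on color alone pairs every row of df with every row of s that has the same color, whereas the (color, occurrence number) pair is unique on both sides. The names naive, left_keys and right_keys are only illustrative.

```python
import pandas as pd

s = pd.Series([10, 20, 11, 12, 30, 34],
              index=["red", "red", "blue", "blue", "green", "green"])
s.index.name = "color"
df = pd.DataFrame({
    "color": ["red", "green", "blue", "blue", "red", "green"],
    "id": [1, 2, 3, 4, 5, 6]})

# Merging on color alone: each color occurs twice on both sides,
# so every color contributes 2 * 2 rows instead of 2.
naive = df.merge(s.reset_index(name="number"), on="color")
print(len(naive))  # 12

# Adding the per-color occurrence number makes the key unique on each side.
left_keys = df.assign(idx=df.groupby("color").cumcount())[["color", "idx"]]
right_keys = (s.reset_index(name="number")
               .assign(idx=s.groupby(level=0).cumcount().to_numpy())
               [["color", "idx"]])
print(left_keys.duplicated().any(), right_keys.duplicated().any())  # False False
```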
Variant:
s.index.name = "color"
out = (df
       .assign(idx=df.groupby('color').cumcount())
       .merge(s.reset_index(name='number')
               .assign(idx=s.groupby(level=0).cumcount().values),
              on=['color', 'idx'])
       .drop(columns='idx')
      )
Output:
   color  id  number
0    red   1      10
1  green   2      30
2   blue   3      11
3   blue   4      12
4    red   5      20
5  green   6      34
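If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: join_by_occurrence and the temporary _occ column are hypothetical names, and how="left" with validate="one_to_one" are my own additions so that rows of df without a matching occurrence in s are kept (as NaN) and non-unique keys raise an error.

```python
import pandas as pd

def join_by_occurrence(df, s, on, name):
    """Match the n-th occurrence of each key in df[on] with the
    n-th occurrence of the same key in s.index."""
    # occurrence number of each key on the DataFrame side
    left = df.assign(_occ=df.groupby(on).cumcount())
    # same counter on the Series side, aligned by position
    right = (s.rename_axis(on)
              .reset_index(name=name)
              .assign(_occ=s.groupby(level=0).cumcount().to_numpy()))
    # a left merge keeps the row order of df; validate guards key uniqueness
    return (left.merge(right, on=[on, "_occ"], how="left",
                       validate="one_to_one")
                .drop(columns="_occ"))

s = pd.Series([10, 20, 11, 12, 30, 34],
              index=["red", "red", "blue", "blue", "green", "green"])
df = pd.DataFrame({
    "color": ["red", "green", "blue", "blue", "red", "green"],
    "id": [1, 2, 3, 4, 5, 6]})

out = join_by_occurrence(df, s, on="color", name="number")
print(out)
```

Note that with a left merge, any color that occurs more often in df than in s gets NaN in the new column, which also turns that column into floats.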