How to add new rows to a dataframe based on ranges of two columns in the same dataframe?
I'm struggling with this problem.
Suppose we have a dataframe that looks like this:
import pandas as pd

df = pd.DataFrame({'col0': ['string1', 'string2'],
                   'col1': ['some string', 'another string'],
                   'start': [100, 1],
                   'end': [107, 5]})
      col0            col1  start  end
0  string1     some string    100  107
1  string2  another string      1    5
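The goal is to find the difference between start and end, add that many rows to my dataframe, ffill the rest of the columns, and add a cumulative count over the range between start and end. The expected output is below: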
df2 = pd.DataFrame({'col0': ['string1'] * 8,
                    'col1': ['some string'] * 8,
                    'new_col': list(range(100, 108))})
df3 = pd.DataFrame({'col0': ['string2'] * 5,
                    'col1': ['another string'] * 5,
                    'new_col': list(range(1, 6))})
output = pd.concat([df2, df3]).reset_index(drop=True)
       col0            col1  new_col
0   string1     some string      100
1   string1     some string      101
2   string1     some string      102
3   string1     some string      103
4   string1     some string      104
5   string1     some string      105
6   string1     some string      106
7   string1     some string      107
8   string2  another string        1
9   string2  another string        2
10  string2  another string        3
11  string2  another string        4
12  string2  another string        5
My first thought was to create a new dataframe, something like:
vals = list(zip(df['start'], df['end']+1))
pd.concat([pd.DataFrame([i], columns=['new_col']) for val in vals for i in range(*val)])
But this seems very inefficient, and I'm struggling to add the remaining data.
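The goal stated above (repeat each row once per value in the inclusive start..end range, carry the other columns along, and number the rows) can be sketched without building one small dataframe per value. This is a minimal sketch rather than part of the original post. It assumes df has a unique index, and the names n_rows, expanded, offset and result are chosen here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col0': ['string1', 'string2'],
                   'col1': ['some string', 'another string'],
                   'start': [100, 1],
                   'end': [107, 5]})

# How many rows each original row expands into (the range is inclusive).
n_rows = df['end'] - df['start'] + 1

# Repeat each row that many times; the other columns are copied along,
# so no separate ffill step is needed.
expanded = df.loc[df.index.repeat(n_rows)]

# 0, 1, 2, ... within each original row (grouped by the repeated index label).
offset = expanded.groupby(level=0).cumcount().to_numpy()

result = (expanded
          .assign(new_col=expanded['start'].to_numpy() + offset)
          .drop(columns=['start', 'end'])
          .reset_index(drop=True))
print(result)
```

Working on NumPy arrays for the addition sidesteps index alignment on the duplicated index labels that the repeat step produces.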
First, use a for loop with range to create a list column; the problem then becomes one of unnesting.
df['New'] = [list(range(y, x + 1)) for x, y in zip(df.pop('end'), df.pop('start'))]
unnesting(df,['New'])
   New     col0            col1
0  100  string1     some string
0  101  string1     some string
0  102  string1     some string
0  103  string1     some string
0  104  string1     some string
0  105  string1     some string
0  106  string1     some string
0  107  string1     some string
1    1  string2  another string
1    2  string2  another string
1    3  string2  another string
1    4  string2  another string
1    5  string2  another string
For reference:
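import numpy as np
import pandas as pd

def unnesting(df, explode):
    # Repeat each index label once per element of that row's list.
    idx = df.index.repeat(df[explode[0]].str.len())
    df1 = pd.concat([pd.DataFrame({x: np.concatenate(df[x].values)}) for x in explode], axis=1)
    df1.index = idx
    # Re-attach the non-list columns by index label.
    return df1.join(df.drop(columns=explode), how='left')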
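As an alternative to the hand-written helper above, pandas 0.25 and later include a built-in DataFrame.explode that flattens a list column in the same way. The sketch below is an assumption about how the same result could be reached with it and is not part of the original answer. The names new_col and out are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col0': ['string1', 'string2'],
                   'col1': ['some string', 'another string'],
                   'start': [100, 1],
                   'end': [107, 5]})

# One list per row, covering start..end inclusive.
df['new_col'] = [list(range(s, e + 1)) for s, e in zip(df['start'], df['end'])]

# explode gives each list element its own row and repeats the other columns.
# The exploded column comes back as object dtype, so cast it to integers.
out = (df.drop(columns=['start', 'end'])
         .explode('new_col')
         .astype({'new_col': 'int64'})
         .reset_index(drop=True))
print(out)
```

Unlike the pop-based version, this leaves start and end in df and drops them only from the result.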