Pandas: split a DataFrame based on a percentage of a column's sum
Suppose I have a DataFrame sorted by the ratio column from smallest to largest, as below (the actual DataFrame has thousands of rows):
identifier  total  ratio
1           15     0.21
2           500    0.21
3           70     0.56
4           200    0.75
5           540    0.99
and a cutoff value of:
cutoff = .3
In the end I want two CSV files: one containing the lowest-ratio rows that together make up 30% of the sum of total (type1.csv), and one containing the remaining 70% (type2.csv).
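For the sample data the arithmetic can be checked by hand: the total column sums to 1325, so 30% of it is 397.5. The short sketch below (plain Python, values copied from the table above) computes the running total in ratio order and finds the first row whose running total reaches that target. It shows that the 30% mark falls inside the second row, so no set of whole rows adds up to exactly 30%.

```python
from itertools import accumulate

# total column from the question, rows already sorted by ratio
totals = [15, 500, 70, 200, 540]
cutoff = 0.3

grand_total = sum(totals)        # 1325
target = cutoff * grand_total    # amount of total that should land in type1.csv

# cumulative total after each row: [15, 515, 585, 785, 1325]
running = list(accumulate(totals))

# position of the first row whose running total reaches the target
first_over = next(i for i, r in enumerate(running) if r >= target)
print(grand_total, target, running, first_over)
```

Here first_over is 1 (identifier 2): the running total jumps from 15 to 515, past the 397.5 target, so that row has to go entirely to one file or the other.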
So far I have tried simply taking the first 30% of the rows, and also computing total * ratio and sorting on that new column. Neither produced the correct lists.
How do I weight each row by its total value, but then make the cut on the ratio column?
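One possible reading of the question is that each row is weighted by its total alone, and the cut is made on the cumulative share of total after sorting by ratio. The following is a minimal sketch under that assumption; note that it differs from the answer below, which weights rows by total * ratio. The stable sort (kind="mergesort") is an added choice so that rows with equal ratio keep their original order.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "identifier": [1, 2, 3, 4, 5],
        "total": [15, 500, 70, 200, 540],
        "ratio": [0.21, 0.21, 0.56, 0.75, 0.99],
    }
)
cutoff = 0.3

# Sort by ratio so the lowest-ratio rows come first (stable sort keeps ties in order).
df = df.sort_values("ratio", kind="mergesort").reset_index(drop=True)

# Cumulative share of the column sum of total, weighting each row by its total only.
share = df["total"].cumsum() / df["total"].sum()

# Rows whose cumulative share stays below the cutoff go to type1, the rest to type2.
type1 = df[share < cutoff]
type2 = df[share >= cutoff]

type1.to_csv("type1.csv", index=False)
type2.to_csv("type2.csv", index=False)
```

With the sample data only identifier 1 ends up in type1.csv (15 of 1325), because identifier 2 alone moves the running share from about 1% to about 39%. Whether that straddling row belongs in type1 or type2 is a choice the question leaves open.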
Like this?
import pandas as pd

cols = ['identifier', 'total', 'ratio']
data = [
    [1, 15, 0.21],
    [2, 500, 0.21],
    [3, 70, 0.56],
    [4, 200, 0.75],
    [5, 540, 0.99]
]
df = pd.DataFrame(data=data, columns=cols)
# running sum of total * ratio (the rows are already sorted by ratio)
df['s'] = (df.total * df.ratio).cumsum()
# cumulative share of that running sum, ending at 1.0 on the last row
df['cutoff'] = df.s / df.s.iloc[-1]
# rows below the 30% mark go to type1.csv, the rest to type2.csv
type1 = df[df['cutoff'] < 0.3]
type1[['identifier', 'total', 'ratio']].to_csv(index=False, path_or_buf='type1.csv')
type2 = df[df['cutoff'] >= 0.3]
type2[['identifier', 'total', 'ratio']].to_csv(index=False, path_or_buf='type2.csv')
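To compare weightings side by side, here is a hypothetical helper, split_by_weighted_share (not a pandas function), that takes the weight Series as a parameter and can optionally keep the row that straddles the cutoff in the low group. It is a sketch that assumes the DataFrame has a unique index; it re-sorts by ratio so unsorted input also works.

```python
import pandas as pd


def split_by_weighted_share(df, weights, cutoff, include_boundary=False):
    """Hypothetical helper: split df into (low, high) by cumulative share of weights.

    Rows are ordered by ratio (stable sort). By default a row goes to the low
    group if the cumulative share up to and including it is below cutoff.
    """
    df = df.sort_values("ratio", kind="mergesort")
    w = weights.loc[df.index]  # align the weights with the sorted rows
    running = w.cumsum()
    if include_boundary:
        # use the share accumulated *before* each row, so the row that
        # crosses the cutoff is kept in the low group
        running = running - w
    mask = running / w.sum() < cutoff
    return df[mask], df[~mask]


df = pd.DataFrame(
    {
        "identifier": [1, 2, 3, 4, 5],
        "total": [15, 500, 70, 200, 540],
        "ratio": [0.21, 0.21, 0.56, 0.75, 0.99],
    }
)

# weight by total only
low, high = split_by_weighted_share(df, df["total"], 0.3)
# same weighting, but keep the straddling row in the low group
low_b, high_b = split_by_weighted_share(df, df["total"], 0.3, include_boundary=True)
# weight by total * ratio, as in the code above
low_tr, high_tr = split_by_weighted_share(df, df["total"] * df["ratio"], 0.3)
```

On the sample data, the total * ratio weighting reproduces the answer's type1 (identifiers 1, 2, 3), while weighting by total alone gives only identifier 1, or identifiers 1 and 2 with include_boundary=True.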