Pandas: split a dataframe by percentage of a column's sum

Question

Suppose I have a dataframe sorted in ascending order by the ratio column, as below (the actual dataframe has thousands of rows):

identifier total ratio
1          15     0.21
2          500    0.21
3          70     0.56
4          200    0.75
5          540    0.99

and a cutoff value of:

cutoff = .3

At the end I want two CSV files: one (type1.csv) containing the lowest-ratio rows that make up 30% of the sum of total, and one (type2.csv) containing the remaining 70%.
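To make the target concrete, here is a small sketch that works out the numbers for the five sample rows. It assumes the 30% refers to the sum of the total column alone; the variable names `grand_total`, `target`, and `running_total` are illustrative, not from the original post.

```python
import pandas as pd

# Sample rows from the question, already sorted by ratio
df = pd.DataFrame({
    "identifier": [1, 2, 3, 4, 5],
    "total": [15, 500, 70, 200, 540],
    "ratio": [0.21, 0.21, 0.56, 0.75, 0.99],
})
cutoff = 0.3

grand_total = df["total"].sum()        # 15 + 500 + 70 + 200 + 540 = 1325
target = cutoff * grand_total          # about 397.5
running_total = df["total"].cumsum()   # 15, 515, 585, 785, 1325
```

With so few rows the split is coarse: the running total jumps from 15 to 515 at the second row, so no prefix of rows sums to exactly 397.5. With thousands of rows the boundary should land much closer to the cutoff.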

So far I have tried taking just the first 30% of the rows, and also computing

total * ratio

and sorting on that new column. Neither produced the correct lists.

How do I weight the rows by the total column's values, but make the cut along the ratio column?

Answer 1

Like this?

import pandas as pd

cols = ['identifier', 'total', 'ratio']
data = [
    [1, 15, 0.21],
    [2, 500, 0.21],
    [3, 70, 0.56],
    [4, 200, 0.75],
    [5, 540, 0.99],
]
df = pd.DataFrame(data=data, columns=cols)

cutoff = 0.3

# Running sum of total * ratio (rows are already sorted by ratio)
df['s'] = (df['total'] * df['ratio']).cumsum()
# Fraction of the overall total * ratio sum reached at each row
df['cutoff'] = df['s'] / df['s'].iloc[-1]

# Rows whose cumulative fraction is below the cutoff go to type1.csv
type1 = df[df['cutoff'] < cutoff]
type1[cols].to_csv('type1.csv', index=False)

# All remaining rows go to type2.csv
type2 = df[df['cutoff'] >= cutoff]
type2[cols].to_csv('type2.csv', index=False)
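Note that the code above accumulates total * ratio, so its cutoff is a share of that product rather than of total itself. If the goal is 30% of the sum of total alone, as the question's wording suggests, one possible variant is sketched below. The helper name `split_by_total_share` and its parameters are hypothetical, and sending the row that crosses the cutoff to the second part is an assumption about the desired boundary behavior.

```python
import pandas as pd

def split_by_total_share(df, cutoff, weight_col="total", sort_col="ratio"):
    """Split df into (low, high) parts by cumulative share of weight_col.

    Rows are ordered by sort_col; a row lands in `low` while the running
    share of weight_col stays at or below cutoff. The row that crosses
    the cutoff goes to `high` (an assumption, adjust if needed).
    """
    # Stable sort keeps the original order of rows with equal ratio
    ordered = df.sort_values(sort_col, kind="stable").reset_index(drop=True)
    # Cumulative share of the weight column, ending at 1.0 on the last row
    share = ordered[weight_col].cumsum() / ordered[weight_col].sum()
    mask = share <= cutoff
    return ordered[mask], ordered[~mask]

# Sample rows from the question
df = pd.DataFrame({
    "identifier": [1, 2, 3, 4, 5],
    "total": [15, 500, 70, 200, 540],
    "ratio": [0.21, 0.21, 0.56, 0.75, 0.99],
})

low, high = split_by_total_share(df, cutoff=0.3)
```

On the sample data, `low` holds only identifier 1, because adding identifier 2 would push the running share of total to 515/1325, which is above 0.3. Each part can then be written out with `to_csv(..., index=False)` as in the answer.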

Answer 1
0 Accepted 2019-01-03 20:16:21
