How to split sort a column using the first row's value?
My dataset df looks like this:
time Open
2017-01-01 2.2475
2017-01-02 3.2180
2017-01-03 5.2128
2017-01-04 1.2128
2017-01-05 2.2128
...., ....
2017-12-31 6.7388
I want to sort the Open column in increasing order, but relative to the value in the first row.
The first row's value always stays on top (1st row). Starting from the second row, the remaining rows are sorted in increasing order by comparison with the first row's value, beginning with the closest value. All lower values are kept at the bottom, e.g. 1.2128.
[OP seeks a method where values greater than the first row in a selected column appear in ascending order from row 2 to row n, and values less than the first row come after row n (after all of the preceding values).]
For example, the new df would be:
time Open
2017-01-01 2.2475
2017-01-05 2.2128
2017-01-02 3.2180
2017-01-03 5.2128
...., ....
2017-12-31 6.7388
2017-01-04 1.2128
What did I do?
I can sort by column doing this:
df.sort_values(by='Open', ascending=False)
but that sorts by column. Now how do I sort by the first row's value, which is 2.2475?
IIUC, given a df:
time Open
0 2017-01-01 2.2475
1 2017-01-02 3.2180
2 2017-01-03 5.2128
3 2017-01-04 1.2128
4 2017-01-05 2.2128
5 2017-12-31 6.7388
OP wants the order row_0, (rows greater than row_0), (rows smaller than row_0). This can be achieved using the difference between each row and row_0:
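# Difference between each row's Open and the first row's Open, keyed by row label (default 0..n-1 index)
s = df['Open'].sub(df['Open'][0]).to_dict()
# sorted() is stable: rows with a non-negative difference (key False) come first, then the negative ones
df.iloc[sorted(s, key = lambda x: s.get(x) < 0)]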
Output:
time Open
0 2017-01-01 2.2475
1 2017-01-02 3.2180
2 2017-01-03 5.2128
5 2017-12-31 6.7388
3 2017-01-04 1.2128
4 2017-01-05 2.2128
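The key used above only separates the two groups; within each group the rows keep their original order, which happens to be ascending already in this sample. A minimal sketch of one way to also sort inside each group is to use a tuple as the key. The names diff, order and result are illustrative, and the sketch assumes the default 0..n-1 index shown above:

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2017-01-01", "2017-01-02", "2017-01-03",
             "2017-01-04", "2017-01-05", "2017-12-31"],
    "Open": [2.2475, 3.2180, 5.2128, 1.2128, 2.2128, 6.7388],
})

# Difference between each row and the first row, keyed by row position
# (assumption: the index is the default 0..n-1 range).
diff = df["Open"].sub(df["Open"].iloc[0]).to_dict()

# Tuple key: first "is this row below row 0?" (False sorts before True),
# then the difference itself, so each group comes out in ascending order.
order = sorted(diff, key=lambda i: (diff[i] < 0, diff[i]))
result = df.iloc[order]
```

Row 0 has a difference of 0, the smallest non-negative value, so it stays on top. For this sample the resulting row order is 0, 1, 2, 5, 3, 4, the same as the output above.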
OP is after a method where the first row of a DataFrame column is used as a baseline for a split sort of that column: values greater than the first row should appear in ascending order from row 2 to row n, and values less than the first row should come after row n (after all of the preceding values).
This can be achieved by the following function:
import pandas as pd

df = pd.DataFrame({'time': ['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04', '2017-01-05', '2017-01-06'],
                   'Open': [2.24, 1.21, 1.51, 3.21, 5.21, 6.21]})

def pin_row_and_sort(f):
    # Rows at or above the first row's value, ascending; the first row has the smallest value in this group
    values_above = f.loc[f['Open'] >= f['Open'].iloc[0]].sort_values(by='Open')
    # Rows below the first row's value, ascending, placed after the first group
    values_below = f.loc[f['Open'] < f['Open'].iloc[0]].sort_values(by='Open')
    return pd.concat([values_above, values_below])

new_frame = pin_row_and_sort(df)
I'd be keen to see any improvements/suggestions on this method. Or just down-vote without explaining why :)
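One possible variation, offered only as a sketch: the same ordering can be produced with a single stable sort instead of two filtered sorts and a concat. The function name pin_first_and_sort and its column parameter are hypothetical and not part of the original answer. It relies on np.lexsort, which treats the last key as the primary one:

```python
import numpy as np
import pandas as pd

def pin_first_and_sort(frame, column="Open"):
    """Hypothetical variant: order the rows with one np.lexsort call."""
    values = frame[column].to_numpy()
    # 1 for rows strictly below the first row's value, 0 otherwise
    below = (values < values[0]).astype(int)
    # The last key is the primary one: group (0 before 1), then value ascending.
    order = np.lexsort((values, below))
    return frame.iloc[order]

df = pd.DataFrame({
    "time": ["2017-01-01", "2017-01-02", "2017-01-03",
             "2017-01-04", "2017-01-05", "2017-01-06"],
    "Open": [2.24, 1.21, 1.51, 3.21, 5.21, 6.21],
})
new_frame = pin_first_and_sort(df)
```

Because np.lexsort is stable, the first row stays on top even when another row shares its value. On the sample frame the rows come out in index order 0, 3, 4, 5, 1, 2.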