How to create a column that contains the penultimate value of each row?
I have a DataFrame and I need to create a new column that contains the second-largest value of each row in the original DataFrame.
Sample:
df = pd.DataFrame(np.random.randint(1,100, 80).reshape(8, -1))
Desired output:
0 1 2 3 4 5 6 7 8 9 penultimate
0 52 69 62 7 20 69 38 10 57 17 62
1 52 94 49 63 1 90 14 76 20 84 90
2 78 37 58 7 27 41 27 26 48 51 58
3 6 39 99 36 62 90 47 25 60 84 90
4 37 36 91 93 76 69 86 95 69 6 93
5 5 54 73 61 22 29 99 27 46 24 73
6 71 65 45 9 63 46 4 93 36 18 71
7 85 7 76 46 65 97 64 52 28 80 85
How can this be done in as little code as possible?
You could use NumPy for this:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1))
# Sort each row in ascending order and take the second-to-last element
df['penultimate'] = np.sort(df.values, axis=1)[:, -2]
print(df)
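Using NumPy is faster than a row-wise apply, because the sort runs over the whole array at once. Note that when a row's maximum is tied (row 0 in the sample has 69 twice), this returns the maximum itself, 69, rather than the 62 shown in the desired output.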
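If a tied maximum should be skipped, as row 0 of the desired output suggests, one possible vectorized sketch is to mask every occurrence of the row maximum and then take the maximum of what remains. The helper name `second_largest_distinct` and the choice to return NaN for rows with a single distinct value are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def second_largest_distinct(df):
    """Second-largest distinct value per row (hypothetical helper)."""
    # Work on floats so that -inf and NaN can be used as markers
    values = df.to_numpy(dtype=float)
    # Largest value of each row, kept as a column for broadcasting
    row_max = values.max(axis=1, keepdims=True)
    # Hide every occurrence of the row maximum
    masked = np.where(values == row_max, -np.inf, values)
    result = masked.max(axis=1)
    # Assumption: rows with only one distinct value get NaN
    result[np.isneginf(result)] = np.nan
    return result

# First two rows of the sample from the question
df = pd.DataFrame([[52, 69, 62, 7, 20, 69, 38, 10, 57, 17],
                   [52, 94, 49, 63, 1, 90, 14, 76, 20, 84]])
df['penultimate'] = second_largest_distinct(df)
print(df)
```

The result column is float because of the NaN marker; cast it back to an integer type if every row is known to have at least two distinct values.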
Here is a simple lambda function!
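import numpy as np
import pandas as pd

# Input
df = pd.DataFrame(np.random.randint(1, 100, 80).reshape(8, -1))
# Output: sort each row, drop duplicates, take the second-largest distinct value
out = df.apply(lambda x: x.sort_values().unique()[-2], axis=1)
df['penultimate'] = out
print(df)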
Cheers!
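One edge case with the lambda: if every value in a row is identical, `unique()` yields a single element and indexing with `[-2]` raises an IndexError. A minimal sketch of a guarded version follows; the function name `second_largest` and the NaN fallback are assumptions for illustration, not something the original answer specifies.

```python
import numpy as np
import pandas as pd

def second_largest(row):
    """Second-largest distinct value of a row, or NaN (hypothetical helper)."""
    # np.unique returns the distinct values sorted in ascending order
    distinct = np.unique(row.to_numpy())
    # Assumption: fall back to NaN when there is no second distinct value
    return distinct[-2] if distinct.size >= 2 else np.nan

df = pd.DataFrame([[52, 69, 62, 69],   # tied maximum
                   [5, 5, 5, 5]])      # only one distinct value
df['penultimate'] = df.apply(second_largest, axis=1)
print(df)
```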