
(Python, DataFrame): Add a column and insert the nth smallest value in the row

How do I find the nth smallest number in each row of a DataFrame and add that value as an entry in a new column? I would ultimately like to export the data.

Example Data

[image: example data]

Setup

import numpy as np
import pandas as pd

np.random.seed([3,14159])
df = pd.DataFrame(np.random.randint(10, size=(4, 5)), columns=list('ABCDE'))

   A  B  C  D  E
0  4  8  1  1  9
1  2  8  1  4  2
2  8  2  8  4  9
3  4  3  4  1  5

In all of the following solutions, I assume n = 3.

Solution 1
function prt below
Use np.partition to move the smallest values to the left of the partition point and the largest to the right. With the partition at index 3, the first three entries of each row are its three smallest values (in no particular order), so their maximum is the 3rd smallest.

df.assign(nth=np.partition(df.values, 3, axis=1)[:, :3].max(1))

   A  B  C  D  E  nth
0  4  8  1  1  9    4
1  2  8  1  4  2    2
2  8  2  8  4  9    8
3  4  3  4  1  5    4
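
The partition idea can also be wrapped in a small helper. The sketch below is one possible way to do it, not the answerer's code: the name nth_smallest_per_row and the bounds check are assumptions made for illustration. It partitions at index n - 1, so the nth smallest value sits exactly at that position and no max over the left block is needed.

```python
import numpy as np
import pandas as pd


def nth_smallest_per_row(df, n):
    """Return df plus a column 'nth' with each row's nth smallest value (1-based).

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    if not 1 <= n <= df.shape[1]:
        raise ValueError("n must be between 1 and the number of columns")
    values = df.to_numpy()
    # After partitioning at index n - 1, that position holds the value it
    # would have in a fully sorted row.
    nth = np.partition(values, n - 1, axis=1)[:, n - 1]
    return df.assign(nth=nth)


# Same numbers as the example frame above, typed in by hand.
example = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)
result = nth_smallest_per_row(example, 3)
print(result)
```

The 1-based n matches the convention used by the solutions in this answer.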

Solution 2
function srt below
More intuitive, but np.sort has a higher time complexity than np.partition: it fully sorts each row before the value at index n - 1 is picked.

df.assign(nth=np.sort(df.values, axis=1)[:, 2])

   A  B  C  D  E  nth
0  4  8  1  1  9    4
1  2  8  1  4  2    2
2  8  2  8  4  9    8
3  4  3  4  1  5    4

Solution 3
function rnk below
Using pd.DataFrame.rank
Concise version that upcasts the result to float

df.assign(nth=df.where(df.rank(1, method='first').eq(3)).stack().values)

   A  B  C  D  E  nth
0  4  8  1  1  9  4.0
1  2  8  1  4  2  2.0
2  8  2  8  4  9  8.0
3  4  3  4  1  5  4.0
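
If the float upcast is unwanted, one option is to cast the result back to the original dtype. The following is a minimal sketch, assuming every column shares one integer dtype and there are no missing values. It swaps stack() for a row-wise max to pull out the single surviving value per row, which is a substitution made here and not what the answer does.

```python
import pandas as pd

df = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)
n = 3

# method='first' breaks ties by position, so exactly one cell per row has rank n.
mask = df.rank(axis=1, method="first").eq(n)

# where() leaves NaN everywhere except the selected cell, so the row-wise
# max (which skips NaN) is that cell's value; cast back to the integer dtype.
nth = df.where(mask).max(axis=1).astype(df["A"].dtype)
out = df.assign(nth=nth)
print(out)
```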

Solution 4
function whr below
Using np.where and pd.DataFrame.rank

i, j = np.where(df.rank(1, method='first') == 3)
df.assign(nth=df.values[i, j])

   A  B  C  D  E  nth
0  4  8  1  1  9    4
1  2  8  1  4  2    2
2  8  2  8  4  9    8
3  4  3  4  1  5    4

Timing
Notice that srt is the quickest for small numbers of columns, though prt is comparable; as the number of columns grows, the more efficient algorithm behind prt takes over.

res.plot(loglog=True)

[image: log-log plot of res, run time against number of columns for prt, srt, rnk and whr]

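from timeit import timeit

import numpy as np
import pandas as pd

# Each function returns df with a new column 'nth' holding the nth smallest value per row.
prt = lambda df, n: df.assign(nth=np.partition(df.values, n, axis=1)[:, :n].max(1))
srt = lambda df, n: df.assign(nth=np.sort(df.values, axis=1)[:, n - 1])
rnk = lambda df, n: df.assign(nth=df.where(df.rank(1, method='first').eq(n)).stack().values)
def whr(df, n):
    # positions of the cells whose row-wise rank equals n (one per row)
    i, j = np.where(df.rank(1, method='first').values == n)
    return df.assign(nth=df.values[i, j])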

res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000],
    columns='prt srt rnk whr'.split(),
    dtype=float
)

for i in res.index:
    num_rows = int(np.log(i))
    d = pd.DataFrame(np.random.rand(num_rows, i))
    for j in res.columns:
        stmt = '{}(d, 3)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=100)
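
timeit also accepts a callable, which avoids the 'from __main__ import ...' setup string. The sketch below illustrates that variant under reduced assumptions: a smaller grid of column counts, fewer repetitions, a fixed five rows, and only prt and srt. It is not a reproduction of the benchmark above, and its numbers will differ.

```python
from timeit import timeit

import numpy as np
import pandas as pd


def prt(df, n):
    return df.assign(nth=np.partition(df.values, n, axis=1)[:, :n].max(1))


def srt(df, n):
    return df.assign(nth=np.sort(df.values, axis=1)[:, n - 1])


funcs = {"prt": prt, "srt": srt}
sizes = [10, 100, 1000]  # number of columns; smaller than the original grid
res = pd.DataFrame(index=sizes, columns=list(funcs), dtype=float)

rng = np.random.default_rng(0)
for num_cols in sizes:
    d = pd.DataFrame(rng.random((5, num_cols)))
    for name, func in funcs.items():
        # Passing a callable means no setup string is needed.
        res.at[num_cols, name] = timeit(lambda: func(d, 3), number=20)

print(res)
```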

Here is a method that finds the nth smallest item in a list:

def find_nth_in_list(values, n):
    # n is 1-based: n=1 returns the smallest item
    return sorted(values)[n-1]

The usage:

values = [10,5,7,9,8,4,6,2,1,3]
print(find_nth_in_list(values, 2))

Output:

2

You can pass the items of a row to this function as a list.

EDIT

You can collect the rows with this function:

# Returns all rows as a list of lists
def find_rows(df):
    rows = []
    for index, data in df.iterrows():
        rows.append(data.tolist())
    return rows

Example usage:

rows = find_rows(df)                            # all rows as a list
third_smallest = find_nth_in_list(rows[2], 3)   # 3rd row, 3rd smallest item
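
To get from the list helper to the new column the question asks for, the helper can be applied to each row. This is a minimal sketch, not part of the original answer: the helper is repeated so the block is self-contained, the data is the example frame from the first answer typed in by hand, and the column name nth is an arbitrary choice.

```python
import pandas as pd


def find_nth_in_list(values, n):
    # Same idea as the helper above: sort, then take the 1-based nth item.
    return sorted(values)[n - 1]


df = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)

# axis=1 hands each row to the lambda as a Series.
df["nth"] = df.apply(lambda row: find_nth_in_list(row.tolist(), 3), axis=1)
print(df)
```

This keeps the per-row Python loop, so it is likely to be slower than the NumPy-based solutions on wide or long frames.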

You can do this as follows, where nth is a zero-based position (nth = 1 selects the second smallest value in each row):

df.assign(nth=df.apply(lambda x: np.partition(x, nth)[nth], axis='columns'))

Example:

In[72]: df = pd.DataFrame(np.random.rand(3, 3), index=list('abc'), columns=[1, 2, 3])
In[73]: df
Out[73]: 
          1         2         3
a  0.436730  0.653242  0.843014
b  0.643496  0.854859  0.531652
c  0.831672  0.575336  0.517944

In[74]: df.assign(nth=df.apply(lambda x: np.partition(x, 1)[1], axis='columns'))
Out[74]: 
          1         2         3       nth
a  0.436730  0.653242  0.843014  0.653242
b  0.643496  0.854859  0.531652  0.643496
c  0.831672  0.575336  0.517944  0.575336

Generate some random data

dd=pd.DataFrame(data=np.random.rand(7,3))

Find the minimum value per row using NumPy

dd['minPerRow']=dd.apply(np.min,axis=1)

Export results

dd['minPerRow'].to_csv('file.csv')
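
The same export step works for an nth-smallest column instead of the minimum. The sketch below is an illustration under assumptions: the column name nthPerRow and n = 2 are made up here, and the CSV is written to an in-memory buffer so the example leaves no file behind.

```python
import io

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dd = pd.DataFrame(rng.random((7, 3)))

n = 2  # 1-based: second smallest value in each row
dd["nthPerRow"] = np.sort(dd.to_numpy(), axis=1)[:, n - 1]

# Write to an in-memory buffer here; pass a path such as 'file.csv' to write a file.
buffer = io.StringIO()
dd.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```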
