(Python, DataFrame): Add a Column and insert the nth smallest value in the row

How do I find the nth smallest number in each row of a DataFrame and add that value as an entry in a new column? I would ultimately like to export the data.

Example data: [image]

Setup

import numpy as np
import pandas as pd

np.random.seed([3,14159])

df = pd.DataFrame(np.random.randint(10, size=(4, 5)), columns=list('ABCDE'))

   A  B  C  D  E
0  4  8  1  1  9
1  2  8  1  4  2
2  8  2  8  4  9
3  4  3  4  1  5

In all of the following solutions, I assume n = 3 (the third smallest value in each row).

Solution 1
function prt below
Use np.partition to move the smallest values to the left of the partition point and the largest to the right. Then take the n values on the left and find their max; that max is the nth smallest value in the row.

df.assign(nth=np.partition(df.values, 3, axis=1)[:, :3].max(1))

   A  B  C  D  E  nth
0  4  8  1  1  9    4
1  2  8  1  4  2    2
2  8  2  8  4  9    8
3  4  3  4  1  5    4
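
A possible variant of the same idea, offered as a sketch rather than as the answer's own code: partitioning at index n - 1 leaves the nth smallest value exactly at that position, so it can be read off directly without the extra max. The helper name nth_smallest_partition is hypothetical, and the data is copied from the table shown in the setup.

```python
import numpy as np
import pandas as pd

def nth_smallest_partition(df, n):
    """Return a copy of df with an 'nth' column (n is 1-based)."""
    if not 1 <= n <= df.shape[1]:
        raise ValueError("n must be between 1 and the number of columns")
    values = df.to_numpy()
    # After partitioning at index n - 1, that position holds the value
    # it would have in a fully sorted row.
    nth = np.partition(values, n - 1, axis=1)[:, n - 1]
    return df.assign(nth=nth)

df = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)
result = nth_smallest_partition(df, 3)
print(result)
```

Unlike partitioning at index n, this form also works when n equals the number of columns.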

Solution 2
function srt below
More intuitive, but np.sort fully sorts each row, which has a worse time complexity than partitioning.

df.assign(nth=np.sort(df.values, axis=1)[:, 2])

   A  B  C  D  E  nth
0  4  8  1  1  9    4
1  2  8  1  4  2    2
2  8  2  8  4  9    8
3  4  3  4  1  5    4

Solution 3
function rnk below
Using pd.DataFrame.rank
Concise version, but the result is upcast to float because where introduces NaN

df.assign(nth=df.where(df.rank(axis=1, method='first').eq(3)).stack().values)

   A  B  C  D  E  nth
0  4  8  1  1  9  4.0
1  2  8  1  4  2  2.0
2  8  2  8  4  9  8.0
3  4  3  4  1  5  4.0

Solution 4
function whr below
Using np.where and pd.DataFrame.rank

i, j = np.where(df.rank(axis=1, method='first') == 3)
df.assign(nth=df.values[i, j])

   A  B  C  D  E  nth
0  4  8  1  1  9    4
1  2  8  1  4  2    2
2  8  2  8  4  9    8
3  4  3  4  1  5    4
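
Both rank-based solutions pass method='first'. The sketch below is an illustration of why that matters, using the data from the setup table: it counts how many cells in each row have a rank equal to 3 under the default tie handling and under method='first'. The variable names are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)

# Default method='average': tied values share a fractional rank,
# so a row with ties may contain no cell ranked exactly 3.
avg_ranks = df.rank(axis=1)
# method='first': ties are ranked in order of appearance, so every
# row contains each rank from 1 to 5 exactly once.
first_ranks = df.rank(axis=1, method="first")

# Number of cells per row whose rank equals 3 under each method.
hits_avg = avg_ranks.eq(3).sum(axis=1)
hits_first = first_ranks.eq(3).sum(axis=1)
print(hits_avg.tolist())
print(hits_first.tolist())
```

With exactly one match per row, the selected values line up one-to-one with the rows, which is what the assignment to the new column relies on.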

Timing
Notice that srt is the quickest for small numbers of columns, with prt staying comparable. For larger numbers of columns, the more efficient algorithm behind prt wins out.

res.plot(loglog=True)

from timeit import timeit

prt = lambda df, n: df.assign(nth=np.partition(df.values, n, axis=1)[:, :n].max(1))
srt = lambda df, n: df.assign(nth=np.sort(df.values, axis=1)[:, n - 1])
rnk = lambda df, n: df.assign(nth=df.where(df.rank(axis=1, method='first').eq(n)).stack().values)
def whr(df, n):
    i, j = np.where(df.rank(axis=1, method='first').values == n)
    return df.assign(nth=df.values[i, j])

res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000],
    columns='prt srt rnk whr'.split(),
    dtype=float
)

for i in res.index:
    num_rows = int(np.log(i))
    d = pd.DataFrame(np.random.rand(num_rows, i))
    for j in res.columns:
        stmt = '{}(d, 3)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=100)
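
Before comparing speed, it can be worth confirming that the functions return the same values. The following is a small sketch of such a check, not part of the original answer. It redefines prt, srt and whr so that it runs on its own, leaves out rnk because that one returns floats, and uses randomly generated test data as an assumption.

```python
import numpy as np
import pandas as pd

def prt(df, n):
    return df.assign(nth=np.partition(df.values, n, axis=1)[:, :n].max(1))

def srt(df, n):
    return df.assign(nth=np.sort(df.values, axis=1)[:, n - 1])

def whr(df, n):
    i, j = np.where(df.rank(axis=1, method="first").values == n)
    return df.assign(nth=df.values[i, j])

# Random float data: 6 rows, 50 columns (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
d = pd.DataFrame(rng.random((6, 50)))

# Collect the 'nth' column produced by each function for n = 3.
results = {f.__name__: f(d, 3)["nth"].to_numpy() for f in (prt, srt, whr)}
all_agree = all(np.array_equal(results["srt"], r) for r in results.values())
print(all_agree)
```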

Here is a function that finds the nth smallest item in a list:

def find_nth_in_list(values, n):
    # n is 1-based: n=1 returns the smallest item
    return sorted(values)[n - 1]

The usage:

values = [10, 5, 7, 9, 8, 4, 6, 2, 1, 3]
print(find_nth_in_list(values, 2))

Output:

2

You can pass the items of a row to this function as a list.

EDIT

You can get the rows with this function:

# Returns all rows as a list of lists
def find_rows(df):
    rows = []
    for index, data in df.iterrows():
        rows.append(data.tolist())
    return rows

Example usage:

rows = find_rows(df)                            # all rows as a list
third_smallest = find_nth_in_list(rows[2], 3)   # 3rd row, 3rd smallest item
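
To get from a single row to the new column the question asks for, the same helper can be applied to every row. This is one possible sketch, assuming the data from the setup table; df.values.tolist() is used here as a shortcut for collecting the rows, and the name with_nth is only for illustration.

```python
import pandas as pd

def find_nth_in_list(values, n):
    # n is 1-based: n=1 returns the minimum.
    return sorted(values)[n - 1]

df = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)

# Convert every row to a plain list, then compute one value per row.
rows = df.values.tolist()
with_nth = df.assign(nth=[find_nth_in_list(row, 3) for row in rows])
print(with_nth)
```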

You can do this as follows (here nth is a zero-based position, so nth = 1 selects the second smallest value):

df.assign(nth=df.apply(lambda x: np.partition(x, nth)[nth], axis='columns'))

Example:

In[72]: df = pd.DataFrame(np.random.rand(3, 3), index=list('abc'), columns=[1, 2, 3])
In[73]: df
Out[73]: 
          1         2         3
a  0.436730  0.653242  0.843014
b  0.643496  0.854859  0.531652
c  0.831672  0.575336  0.517944

In[74]: df.assign(nth=df.apply(lambda x: np.partition(x, 1)[1], axis='columns'))
Out[74]: 
          1         2         3       nth
a  0.436730  0.653242  0.843014  0.653242
b  0.643496  0.854859  0.531652  0.643496
c  0.831672  0.575336  0.517944  0.575336

Generate some random data:

dd = pd.DataFrame(data=np.random.rand(7, 3))

Find the minimum value per row using numpy (this is the n = 1 case):

dd['minPerRow'] = dd.apply(np.min, axis=1)

Export the results:

dd['minPerRow'].to_csv('file.csv')
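
The same export step also works for a general n. As a rough sketch, assuming the data from the setup table and n = 3, the nth smallest value can be added as a column and the whole frame written out as CSV. An in-memory buffer stands in for a real file path here so that the example leaves no file behind.

```python
import io
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[4, 8, 1, 1, 9], [2, 8, 1, 4, 2], [8, 2, 8, 4, 9], [4, 3, 4, 1, 5]],
    columns=list("ABCDE"),
)
n = 3  # 1-based position of the value wanted in each sorted row
out = df.assign(nth=np.sort(df.to_numpy(), axis=1)[:, n - 1])

# An in-memory buffer stands in for a real path such as 'file.csv'.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing a path such as 'file.csv' to to_csv instead of the buffer writes the same content to disk.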
