
Can Pandas perform row-wise min() and max() functions?

In my DataFrame I wish to clip the value of a particular column between 0 and 100. For instance, given the following:

    a    b
0  10   90
1  20  150
2  30  -30

I want to get:

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

I know that in Pandas certain arithmetic operations work across columns. For instance, I could double every number in column b like so:

>>> df["c"] = df["b"] * 2
>>> df
    a    b    c
0  10   90  180
1  20  150  300
2  30  -30  -60

However, this doesn't work for built-in functions like min and max:

>>> df["c"] = min(100, max(0, df["b"]))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there some way to accomplish what I want efficiently?

You can use Series.clip:

df['c'] = df['b'].clip(0, 100)
print(df)
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
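
For reference, here is a self-contained sketch of the same approach that rebuilds the example frame from the question. The lower= and upper= keywords are the parameter names of Series.clip. The one-sided column d is an added illustration and not part of the original answer.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Clip column b element-wise to the range [0, 100]
df["c"] = df["b"].clip(lower=0, upper=100)

# Illustration only: a one-sided clip that enforces just the lower bound
df["d"] = df["b"].clip(lower=0)

print(df)
```

Passing only one of the two bounds leaves the other side unbounded, which can be handy when only negative values need fixing.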

You can apply the pandas min and max functions across an axis; with axis=1 they return the row-wise minimum or maximum. The two can then be combined.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html

For example:

df.max(axis=1)

But it looks like you want to clip the values instead of min/max.
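
If you do want to go the row-wise route, one possible sketch is below. It pairs column b with constant helper columns so that min(axis=1) and max(axis=1) act as per-row min and max against the bounds. The helper column names hi and lo are hypothetical and exist only for this example.

```python
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Row-wise reductions across the existing columns a and b
row_max = df.max(axis=1)
row_min = df.min(axis=1)

# Clip b using row-wise reductions against constant helper columns
# ("hi" and "lo" are hypothetical names used only in this sketch)
capped = df[["b"]].assign(hi=100).min(axis=1)            # min(b, 100) per row
clipped = capped.to_frame("b").assign(lo=0).max(axis=1)  # max(result, 0) per row

df["c"] = clipped
print(df)
```

This builds temporary frames, so it is likely slower and harder to read than clip; it mainly shows what "row-wise min and max" means here.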

A NumPy approach. Not as elegant as clip.

Option 1

df.assign(c=np.minimum(np.maximum(df.b.values, 0), 100))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

Option 2

b = df.b.values
df.assign(c=np.where(b > 100, 100, np.where(b < 0, 0, b)))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
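
A further variant, not part of the original answer, is NumPy's own np.clip, which applies both bounds in a single call. A minimal sketch, assuming the same example frame:

```python
import numpy as np
import pandas as pd

# Example frame from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# np.clip bounds every element of the array to [0, 100] in one call;
# assign returns a new frame and leaves df itself unchanged
out = df.assign(c=np.clip(df["b"].to_numpy(), 0, 100))
print(out)
```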

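Timing

Code below. Each value is a method's run time divided by the fastest method's run time in that row, so 1.0 marks the fastest. The index is the number of times the example frame was repeated.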

res.div(res.min(axis=1), axis=0)

            pir1  pir2       jez1
10     30.895514   1.0  75.210427
30     28.611177   1.0  49.913498
100    20.658307   1.0  50.823106
300    19.842134   1.0  39.162901
1000   14.078159   1.0  25.148937
3000    8.767133   1.0  15.066847
10000   4.377849   1.0   8.849138
30000   2.634263   1.0   4.653956


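import numpy as np
import pandas as pd
from timeit import timeit

# Example frame from the question
df = pd.DataFrame({'a': [10, 20, 30], 'b': [90, 150, -30]})

# One row per repetition count, one column per method
res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns=['pir1', 'pir2', 'jez1'],
    dtype=float
)

# jez1: Series.clip; pir1: np.minimum/np.maximum; pir2: nested np.where
# Note that pir2 returns only the clipped array and does not call assign
jez1 = lambda d: d.assign(c=d.b.clip(0, 100))
pir1 = lambda d: d.assign(c=np.minimum(np.maximum(d.b.values, 0), 100))
pir2 = lambda d: (lambda b: np.where(b > 100, 100, np.where(b < 0, 0, b)))(d.b.values)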

for i in res.index:
    d = pd.concat([df] * i, ignore_index=True)
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=10)

res.plot(loglog=True)
