
Pandas: Calculating a value in a separate data frame column based on a range of values in another data frame column (Python)

I'm using Python 3.9, and I'm trying to calculate an output value in one dataframe column based on the range that the value in another column falls into.

For instance, in df['a'] , I have integers between 0 and 50, in no particular order.

I am trying to create another column named df['output_column'] in that same dataframe based on an if statement.

import pandas as pd
import numpy as np

p = 'a'

if df[p] in range(0, 7):
    df['output_column'] = 95
elif df[p] in range(8, 14):
    df['output_column'] = 90
elif df[p] in range(15, 21):
    df['output_column'] = 85
elif df[p] in range(22, 28):
    df['output_column'] = 80
else:
    df['output_column'] = 75

However, I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [18], in <module>
      1 p = 'a'
----> 3 if df[p] in range(0, 7):
      4     df['output_column'] = 95
      5 elif df[p] in range(8, 14):

File ~\path_to_pandas\pandas\core\generic.py:1535, in NDFrame.__nonzero__(self)
   1533 @final
   1534 def __nonzero__(self):
-> 1535     raise ValueError(
   1536         f"The truth value of a {type(self).__name__} is ambiguous. "
   1537         "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1538     )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

How can I correct this?
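
For context, the error comes from `if` (and `in`) needing a single True/False answer, while a comparison against a Series produces one boolean per row. The sketch below reproduces this on a small made-up frame (the sample values are an assumption, since the post does not show `df`) and contrasts it with an element-wise mask:

```python
import pandas as pd

# Hypothetical sample data; the original post does not show df.
df = pd.DataFrame({"a": [3, 10, 17, 24, 40]})

try:
    # `in` needs one True/False answer, but comparing against a Series
    # yields one boolean per row, so pandas refuses to collapse it.
    if df["a"] in range(0, 7):
        pass
    raised = False
except ValueError:
    raised = True

# An element-wise comparison returns a boolean mask instead of one bool.
mask = (df["a"] >= 0) & (df["a"] < 7)
print(raised)
print(mask.tolist())
```

The answers below all build on this idea: compute a per-row mask or bin, then assign values by row rather than branching once for the whole column.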

You can use pd.cut to do this:

df['output'] = pd.cut(df[p], 
                      bins=[-np.inf,8,15,22,29,np.inf], 
                      labels=[95,90,85,80,75]).astype(int)
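
One detail worth checking with `pd.cut` is which side of each bin is closed. By default (`right=True`) the bins are `(-inf, 8]`, `(8, 15]`, and so on, so a value of exactly 8 gets 95; with `right=False` the bins become `[-inf, 8)`, `[8, 15)`, and 8 gets 90. The sketch below compares the two on made-up sample values (the data and the column names `closed_right` and `closed_left` are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample values chosen to sit on the bin edges.
df = pd.DataFrame({"a": [0, 7, 8, 14, 15, 28, 29, 50]})
bins = [-np.inf, 8, 15, 22, 29, np.inf]
labels = [95, 90, 85, 80, 75]

# Default right=True: each bin includes its right edge, e.g. (8, 15].
df["closed_right"] = pd.cut(df["a"], bins=bins, labels=labels).astype(int)

# right=False: each bin includes its left edge, e.g. [8, 15).
df["closed_left"] = pd.cut(
    df["a"], bins=bins, labels=labels, right=False
).astype(int)

print(df)
```

Which variant is appropriate depends on where the boundary values (8, 15, 22, 29) are meant to land, which the question's ranges leave open.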

You can set your ranges with .between() and then populate your new output_column with np.select().

import pandas as pd
import numpy as np

ranges = [df['a'].between(0, 6),
          df['a'].between(7, 13), df['a'].between(14, 20),
          df['a'].between(21, 27), df['a'].between(28, 999)]

values = [95,90, 85, 80, 75]

df['output_column'] = np.select(ranges, values)
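
As a self-contained variation on the `np.select()` approach, the sketch below uses made-up sample data and replaces the final `between(28, 999)` condition with the `default` argument, so any value not matched by an earlier condition receives 75. Note that `.between()` includes both endpoints by default, and that `np.select()` falls back to 0 when no `default` is given.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values chosen to sit on the range boundaries.
df = pd.DataFrame({"a": [0, 6, 7, 13, 14, 27, 28, 50]})

# Each condition is a boolean Series; between() is inclusive on both ends.
conditions = [
    df["a"].between(0, 6),
    df["a"].between(7, 13),
    df["a"].between(14, 20),
    df["a"].between(21, 27),
]
values = [95, 90, 85, 80]

# Rows matching none of the conditions get the default value.
df["output_column"] = np.select(conditions, values, default=75)

print(df)
```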
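
Another option is to assign a default value first and then overwrite it with successive df.loc masks:

df["output_column"] = 95
df.loc[df[p]>=8, "output_column"] = 90
df.loc[df[p]>=15, "output_column"] = 85
df.loc[df[p]>=22, "output_column"] = 80
df.loc[df[p]>=29, "output_column"] = 75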
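
The `df.loc` assignments above can also be written as a loop over (threshold, value) pairs. This is only a sketch on made-up sample data; the `thresholds` list is a hypothetical name introduced here. The thresholds need to be in ascending order, because each later assignment overwrites the earlier ones for rows that pass it.

```python
import pandas as pd

# Hypothetical sample values chosen to sit on the thresholds.
df = pd.DataFrame({"a": [0, 7, 8, 14, 15, 22, 29, 50]})
p = "a"

# (lower bound, value) pairs in ascending order of the lower bound.
thresholds = [(8, 90), (15, 85), (22, 80), (29, 75)]

# Start from the value for the lowest range, then overwrite upward.
df["output_column"] = 95
for lower, value in thresholds:
    df.loc[df[p] >= lower, "output_column"] = value

print(df)
```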
