简体   繁体   中英

Insert value based on row index number in a pandas dataframe

I need to insert value into a column based on row index of a pandas dataframe.

import pandas as pd
df=pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD'))
df['ticker']='na'
df

Sample DataFrame In the above sample dataframe, the ticker column for first 25% of the total number of records must have value '$" the next 25% of the records must have value "$$" and so on.

I tried to get the length of the dataframe and calculate 25,50,75 percent on it and then access one row at a time and assign value to "ticker" based on row index.

total_row_count=len(df)
row_25 = int(total_row_count * .25)
row_50 = int(total_row_count * .5)
row_75=int(total_row_count*.75)

if ((row.index >=0) and (row.index<=row_25)):
    return"$"
elif ((row.index > row_25) and (row.index<=row_50)):
    return"$$"
elif ((row.index > row_50) and (row.index<=row_75)):
    return"$$$"
elif (row.index > row_75):
    return"$$$$"

But I'm not able to get the row index. Please let me know if there is a different way to assign these values

I think cut can solve this problem

df['ticker']=pd.cut(np.arange(len(df))/len(df), [-np.inf,0.25,0.5,0.75,1], labels=["$","$$",'$$$','$$$$'],right=True)
df
Out[35]: 
     A   B   C   D ticker
0   63  51  19  33      $
1   12  80  57   1      $
2   53  27  62  26      $
3   97  43  31  80     $$
4   91  22  92  11     $$
5   39  70  82  26     $$
6   32  62  17  75    $$$
7    5  59  79  72    $$$
8   75   4  47   4    $$$
9   43   5  45  66   $$$$
10  29   9  74  94   $$$$

I like to use np.select for this kind of task, because I find the syntax intuitive and readable:

# Set up your conditions:
conds = [(df.index >= 0) & (df.index <= row_25),
         (df.index > row_25) & (df.index<=row_50),
         (df.index > row_50) & (df.index<=row_75),
         (df.index > row_75)]

# Set up your target values (in the same order as your conditions)
choices = ['$', '$$', '$$$', '$$$$']

# Assign df['ticker']
df['ticker'] = np.select(conds, choices)

returns this:

>>> df
     A   B   C   D ticker
0   92  97  25  79      $
1   76   4  26  94      $
2   49  65  19  91      $
3   76   3  83  45     $$
4   83  16   0  16     $$
5    1  56  97  44     $$
6   78  17  18  86    $$$
7   55  56  83  91    $$$
8   76  16  52  33    $$$
9   55  35  80  95   $$$$
10  90  29  41  87   $$$$

You can set up a few np.where statements to handle this. Try something like the following:

import numpy as np
...
df['ticker'] = np.where(df.index < row_25, "$", df['ticker'])
df['ticker'] = np.where(row_25 <= df.index < row_50, "$$", df['ticker'])
df['ticker'] = np.where(row_50 <= df.index < row_75, "$$$", df['ticker'])
df['ticker'] = np.where(row_75 <= df.index, "$$$$", df['ticker'])

This is one explicit solution using .loc accessor.

import pandas as pd

df = pd.DataFrame(np.random.randint(0,100,size=(11, 4)), columns=list('ABCD'))
n = len(df.index)

df['ticker'] = 'na'
df.loc[df.index <= n/4, 'ticker'] = '$'
df.loc[(n/4 < df.index) & (df.index <= n/2), 'ticker'] = '$$'
df.loc[(n/2 < df.index) & (df.index <= n*3/4), 'ticker'] = '$$$'
df.loc[df.index > n*3/4, 'ticker'] = '$$$$'

#      A   B   C   D ticker
# 0   47  64   7  46      $
# 1   53  55  75   3      $
# 2   93  95  28  47      $
# 3   35  88  16   7     $$
# 4   99  66  88  84     $$
# 5   75   2  72  90     $$
# 6    6  53  36  92    $$$
# 7   83  58  54  67    $$$
# 8   49  83  46  54    $$$
# 9   69   9  96  73   $$$$
# 10  84  42  11  83   $$$$

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM