
pandas groupby column values and replace grouped values in another column

I have a dataframe like this:

Ticker instrument_name year month instrument_type expiry_type
ABAN10SEPFUT ABAN 10 SEP FUT NaN
ABAN10OCTFUT ABAN 10 OCT FUT NaN
ABAN10NOVFUT ABAN 10 NOV FUT NaN

I want to group by instrument_type ('FUT') and find the unique values in month. Then I want to compare those unique values with the month column and write 'I', 'II', 'III' into the expiry_type column accordingly.

Result expected:

Ticker instrument_name year month instrument_type expiry_type
ABAN10SEPFUT ABAN 10 SEP FUT I
ABAN10OCTFUT ABAN 10 OCT FUT II
ABAN10NOVFUT ABAN 10 NOV FUT III

My code looks like this:

#1

def condition(x):
    if x == 'SEP':
        return "I"
    elif x == 'OCT':
        return "II"
    elif x == 'NOV':
        return "III"
    else:
        return ''

#2

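import numpy as np
import pandas as pd
# path is a DataFrame whose 'location' column holds the parquet file paths
for index, row in path.iterrows():
    data = pd.read_parquet(row['location'])
    data['expiry_type'] = np.where(data['instrument_type'] == 'FUT', data['month'].apply(condition), '')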

Since I already know the unique values in the month column, I created a custom function to fill in the expiry_type column. I have many similar files, so is there a way to find the unique values and replace them automatically? Thank you in advance!

Considering you have grouped by instrument_type, you could build a function like the one in #1 that takes a whole row instead of a single value:

def condition(x):
    if x.month =='SEP':
        return "I"
    elif x.month =='OCT':
        return "II"
    elif x.month =='NOV':
        return "III"
    else:
        return ''

Then apply this function to each row and assign the result to the expiry_type column:

df['expiry_type'] = df.apply(condition, axis=1)
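When the set of months is fixed and known, the same lookup can also be written as a dictionary passed to Series.map, which avoids the row-by-row apply. The following is only a sketch: it rebuilds the sample frame from the question, and the name month_to_expiry is illustrative rather than part of the original code.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Ticker": ["ABAN10SEPFUT", "ABAN10OCTFUT", "ABAN10NOVFUT"],
    "instrument_name": ["ABAN", "ABAN", "ABAN"],
    "year": [10, 10, 10],
    "month": ["SEP", "OCT", "NOV"],
    "instrument_type": ["FUT", "FUT", "FUT"],
})

# Hypothetical lookup table: month code -> expiry label.
month_to_expiry = {"SEP": "I", "OCT": "II", "NOV": "III"}

# Only futures rows get a label; everything else stays an empty string.
is_fut = df["instrument_type"] == "FUT"
df["expiry_type"] = ""
df.loc[is_fut, "expiry_type"] = (
    df.loc[is_fut, "month"].map(month_to_expiry).fillna("")
)

print(df[["Ticker", "month", "expiry_type"]])
```

Months that are missing from the dictionary map to NaN, which fillna("") turns back into an empty string, matching the else branch of condition.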

You can find the unique values in a column with the pandas unique function. Loop over your DataFrames and, for each one, call unique on the month column to get an array of its distinct values in order of first appearance. Then build a dictionary with those values as keys and the new representation (Roman numerals in this example) as values. Finally, use the map function to translate the month column through that dictionary and assign the result to the expiry_type column.

import pandas as pd

def toRoman(n):
    # n is a zero-based position: 0 -> 'I', 1 -> 'II', ...
    roman = ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X', 'XI', 'XII']
    return roman[n]

df_list = ['df1.csv', 'df2.csv', 'df3.csv']
for df_file in df_list:
    df = pd.read_csv(df_file)
    g = df.groupby('instrument_type')
    # unique months of the first instrument_type group (the only group
    # when every row is 'FUT'), in order of first appearance
    uniq = g['month'].unique().iloc[0]
    # create a dictionary using the unique values
    dict_map = {name: toRoman(idx) for idx, name in enumerate(uniq)}
    df['expiry_type'] = df['month'].map(dict_map)
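If a file mixes futures with other instrument types, the labelling can be restricted to the 'FUT' rows. Below is a minimal sketch of that idea using pd.factorize, which numbers values in order of first appearance. The helper name add_expiry_type and the 'OPT' row in the demo are assumptions made for illustration, and the approach only gives sensible labels if the months appear in expiry order within each file.

```python
import pandas as pd

ROMAN = ['I', 'II', 'III', 'IV', 'V', 'VI', 'VII', 'VIII', 'IX', 'X', 'XI', 'XII']

def add_expiry_type(df):
    """Label 'FUT' rows by the order in which their month first appears."""
    out = df.copy()
    out["expiry_type"] = ""
    is_fut = out["instrument_type"] == "FUT"
    # factorize returns a zero-based code per distinct month, -1 for NaN.
    codes, _ = pd.factorize(out.loc[is_fut, "month"])
    out.loc[is_fut, "expiry_type"] = [ROMAN[c] if c >= 0 else "" for c in codes]
    return out

# Demo frame (hypothetical): repeated months and one non-futures row.
demo = pd.DataFrame({
    "month": ["SEP", "OCT", "NOV", "SEP", "OCT"],
    "instrument_type": ["FUT", "FUT", "FUT", "FUT", "OPT"],
})
result = add_expiry_type(demo)
print(result)
```

The function returns a copy, so it can be called inside the loop over files as df = add_expiry_type(pd.read_csv(df_file)). More than twelve distinct months would raise an IndexError, the same limit as the toRoman list.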
