Trying to Pass Pandas DataFrame to a Function and Return a Modified DataFrame

I'm trying to pass different pandas DataFrames to a function that performs string modifications (usually a str.replace operation on a column, based on mapping tables stored in CSV files) and returns the modified DataFrame. I'm running into errors, especially when handling the DataFrame as a parameter.

The mapping table in CSV is structured as follows:

From(Str)     To(Str)   Regex(True/False)
A             A2
B             B2
CD (.*) FG    CD FG     True

My code looks something like this:

def apply_mapping_table(p_df, p_df_col_name, p_mt_name):

    df_mt = pd.read_csv(p_mt_name)

    for index in range(df_mt.shape[0]):
        # If regex is true
        if df_mt.iloc[index][2] is True:
            # perform regex replacing
            p_df[p_df_col_name] = p_df[p_df_col_name].replace(to_replace=df_mt.iloc[index][0], value=df_mt.iloc[index][1], regex=True)
        else:
            # perform normal string replacing
            p_df[p_df_col_name] = p_df[p_df_col_name].replace(df_mt.iloc[index][0], df_mt.iloc[index][1])

    return p_df

df_new1 = apply_mapping_table(df_old1, 'Target_Column1', 'MappingTable1.csv')
df_new2 = apply_mapping_table(df_old2, 'Target_Column2', 'MappingTable2.csv')

I'm getting 'IndexError: single positional indexer is out-of-bounds' for 'df_mt.iloc[index][2]', so I haven't even reached the part where the actual replacement happens. Any suggestions to make this work, or a better way to do DataFrame string replacements based on mapping tables?
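
The IndexError means that a positional index exceeds the size of the table as it was parsed. One possible cause, offered here only as an assumption because the actual file is not shown, is a delimiter other than a comma, in which case read_csv produces a single column. A minimal sketch with a hypothetical semicolon-separated table shows how to check this:

```python
import io
import pandas as pd

# Hypothetical mapping table that uses semicolons instead of commas.
raw = (
    "From(Str);To(Str);Regex(True/False)\n"
    "A;A2;\n"
    "B;B2;\n"
    "CD (.*) FG;CD FG;True\n"
)

# With the default sep=',' every line is parsed as a single field.
df_wrong = pd.read_csv(io.StringIO(raw))
# With the matching delimiter the three columns are recovered.
df_right = pd.read_csv(io.StringIO(raw), sep=";")

print(df_wrong.shape)          # (3, 1)
print(df_right.shape)          # (3, 3)
print(list(df_right.columns))  # inspect the header names that were parsed
```

If the shape shows fewer than three columns, any lookup of column position 2 will fail, whatever the loop looks like.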

You can use the .iterrows() method to iterate over the rows of the lookup table. .iterrows() is generally slow, but a lookup table is normally small, so the cost is negligible here. Accessing the columns by name rather than by position also makes the code easier to read.

You can adapt your function as I did in the following snippet:

import pandas as pd

def apply_mapping_table(p_df, p_df_col_name, p_mt_name):
    # Load the mapping table; columns are looked up by header name below
    df_mt = pd.read_csv(p_mt_name)
    for _, row in df_mt.iterrows():
        # Empty Regex cells are read as NaN, which is truthy, so compare explicitly
        use_regex = str(row['Regex(True/False)']).strip().lower() == 'true'
        if use_regex:
            # perform regex replacing
            p_df[p_df_col_name] = p_df[p_df_col_name].replace(to_replace=row['From(Str)'], value=row['To(Str)'], regex=True)
        else:
            # perform normal (whole-value) string replacing
            p_df[p_df_col_name] = p_df[p_df_col_name].replace(row['From(Str)'], row['To(Str)'])
    return p_df
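
To try this end to end without any files on disk, here is a minimal sketch that reads the question's mapping table from an in-memory string. The function name apply_mapping, the sample data in df_old, and the choice to work on a copy so the caller's DataFrame stays untouched are all assumptions made for illustration, not part of the original code.

```python
import io
import pandas as pd

def apply_mapping(df, col_name, mapping):
    """Return a copy of df with each mapping row applied to df[col_name]."""
    out = df.copy()  # assumption: leave the caller's DataFrame unmodified
    for _, row in mapping.iterrows():
        # Empty Regex cells load as NaN, so test for an explicit "true".
        use_regex = str(row["Regex(True/False)"]).strip().lower() == "true"
        out[col_name] = out[col_name].replace(
            row["From(Str)"], row["To(Str)"], regex=use_regex
        )
    return out

# The mapping table from the question, written as comma-separated text.
mapping_csv = io.StringIO(
    "From(Str),To(Str),Regex(True/False)\n"
    "A,A2,\n"
    "B,B2,\n"
    "CD (.*) FG,CD FG,True\n"
)
mapping = pd.read_csv(mapping_csv)

# Hypothetical input data.
df_old = pd.DataFrame({"Target_Column1": ["A", "B", "CD xyz FG", "AB"]})
df_new = apply_mapping(df_old, "Target_Column1", mapping)

print(df_new["Target_Column1"].tolist())  # ['A2', 'B2', 'CD FG', 'AB']
```

Note that without regex=True, Series.replace only matches whole cell values, which is why 'AB' is left unchanged. If substring replacement is wanted, Series.str.replace is the method to reach for instead.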
