Pandas apply function treating input as a series rather than a single cell in the passed series

Question

I am working in a Jupyter notebook to do some string comparisons between two dataframes and I have run into a confusing issue.

I wrote a simple function to remove all of the stop words and punctuation from a string. When I try to apply it to a column in pandas (that is, call it once for each index of that column), the entire column is passed to the function instead of a single cell, and the output is garbage.

Here is the statement that is causing problems:

bank_exp['Description'] = bank_exp.apply(lambda row : preprocess_cell(bank_exp['Description'], stopwords_all), axis = 1)

The preprocess function in pseudocode works something like:

def preprocess_cell(string, stopwords_set):
    # Remove punctuation from the string
    string = remove_punc(string)
    # Filter the string
    filtered_sentence = filt_str(string, stopwords_set)
    # Convert list back into string
    filtered_cell = re_format(filtered_sentence)
    
    return filtered_cell
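
For anyone who wants to reproduce this, a runnable stand-in for that pseudocode could look like the minimal sketch below. The helper logic (stripping string.punctuation, splitting on whitespace, case-insensitive stop word matching) and the sample stop word set are assumptions for illustration, not my exact code.

```python
import string


def preprocess_cell(text, stopwords_set):
    # Remove punctuation from the string
    no_punc = text.translate(str.maketrans("", "", string.punctuation))
    # Keep only the words that are not stop words (case-insensitive match)
    filtered_words = [w for w in no_punc.split() if w.lower() not in stopwords_set]
    # Convert the list back into a string
    return " ".join(filtered_words)


stopwords_all = {"the", "at", "for"}  # hypothetical stop word set
print(preprocess_cell("Payment at THE store, for coffee!", stopwords_all))
# prints: Payment store coffee
```

The function expects one string per call, which is why handing it a whole column breaks it.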

For reference, my table originally looks something like this: [image: base table format]

When I run the code as it stands, I get output like this: [image: partial traceback]

I have been staring at this for a bit now so any ideas would be greatly appreciated.

Answer 1

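You have to pass the value from the current row, not the entire column. With axis = 1, apply calls the lambda once per row, so row['Description'] is a single cell, whereas bank_exp['Description'] is the whole Series on every call: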

bank_exp['Description'] = bank_exp.apply(lambda row : preprocess_cell(row['Description'], stopwords_all), axis = 1)
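
To see the difference on a small example, here is a minimal sketch. The two-row DataFrame, the stop word set, and the simplified preprocess_cell are all hypothetical stand-ins for the asker's data and function. It also shows Series.apply on the column itself, which is an alternative when only one column is needed.

```python
import pandas as pd


# Hypothetical stand-in for the asker's function: it only drops stop words.
def preprocess_cell(text, stopwords_set):
    return " ".join(w for w in text.split() if w.lower() not in stopwords_set)


stopwords_all = {"the", "at"}  # hypothetical stop word set
bank_exp = pd.DataFrame(
    {"Description": ["Coffee at the cafe", "Rent"], "Amount": [4.5, 900.0]}
)

# Row-wise: the lambda receives one row, so row["Description"] is a single string.
by_row = bank_exp.apply(
    lambda row: preprocess_cell(row["Description"], stopwords_all), axis=1
)

# Column-wise alternative: Series.apply calls the function once per cell.
by_cell = bank_exp["Description"].apply(preprocess_cell, args=(stopwords_all,))

print(by_row.tolist())
print(by_cell.tolist())
```

Both calls should give the same cleaned strings. The Series.apply form avoids building a row object for every call, so it is usually the simpler choice when the function reads a single column.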

1 answer

solution1
1 ACCEPTED 2022-02-06 06:46:08