Pandas apply function sees the input as a Series rather than the individual cells in the passed Series

Question

I am working in a Jupyter notebook to do some string comparisons between two dataframes, and I have run into a confusing issue.

I wrote a simple function that removes all stop words and punctuation from a string. When I try to apply it to a column in pandas (i.e., run it on the cell at every index of that column), it instead passes the entire column to the function and outputs garbage.

Here is the statement that is causing problems:

bank_exp['Description'] = bank_exp.apply(lambda row : preprocess_cell(bank_exp['Description'], stopwords_all), axis = 1)
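A minimal reproduction of the symptom, using a made-up two-row frame. The sample data and the show_input helper are hypothetical (not from the original notebook); show_input just reports the type of whatever it is given, standing in for preprocess_cell. It shows that every call receives the whole Description column as a Series instead of one cell.

```python
import pandas as pd

# Hypothetical sample data standing in for bank_exp
bank_exp = pd.DataFrame({'Description': ['Coffee at the cafe!', 'Rent, for March']})

def show_input(value):
    # Hypothetical stand-in for preprocess_cell: report the type it was given
    return type(value).__name__

# Same shape as the problematic statement: the lambda never uses `row`
result = bank_exp.apply(lambda row: show_input(bank_exp['Description']), axis=1)
print(result.tolist())  # ['Series', 'Series']
```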

In pseudocode, the preprocessing function works something like this:

def preprocess_cell(string, stopwords_set):
    # Remove punctuation from the string
    string = remove_punc(string)
    # Filter the string
    filtered_sentence = filt_str(string, stopwords_set)
    # Convert list back into string
    filtered_cell = re_format(filtered_sentence)
    
    return filtered_cell
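A runnable sketch of the pseudocode above, under stated assumptions: the bodies of remove_punc, filt_str and re_format are hypothetical implementations, the stop-word set is made up, and the parameter is renamed to text so it does not shadow Python's string module. Note that it expects a single string, i.e. one cell.

```python
import string

def remove_punc(text):
    # Hypothetical helper: strip ASCII punctuation characters
    return text.translate(str.maketrans('', '', string.punctuation))

def filt_str(text, stopwords_set):
    # Hypothetical helper: keep words that are not stop words (case-insensitive)
    return [word for word in text.split() if word.lower() not in stopwords_set]

def re_format(words):
    # Hypothetical helper: join the remaining words back into one string
    return ' '.join(words)

def preprocess_cell(text, stopwords_set):
    # Expects one string (a single cell), not a whole column
    text = remove_punc(text)
    filtered_sentence = filt_str(text, stopwords_set)
    return re_format(filtered_sentence)

stopwords_all = {'at', 'the', 'for'}  # hypothetical stop-word set
print(preprocess_cell('Coffee at the cafe!', stopwords_all))  # Coffee cafe
```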

For reference, my table originally looks something like this: [image: base table format]

When I run the code, I get output like this: [image: partial traceback]

I have been staring at this for a while now, so any ideas would be greatly appreciated.

Answer 1

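You have to pass the value from the current row, not the entire column. With axis=1, apply calls the lambda once per row and hands it that row as row; referring to bank_exp['Description'] inside the lambda ignores row and passes the whole column on every call: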

bank_exp['Description'] = bank_exp.apply(lambda row : preprocess_cell(row['Description'], stopwords_all), axis = 1)
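A self-contained sketch of the fix, using made-up data and a stand-in preprocess_cell (the real helpers are not shown in the question, so this body is an assumption). It runs the corrected row-wise statement and, for comparison, Series.apply on the single column, where the args tuple forwards the extra stop-word argument.

```python
import string
import pandas as pd

# Hypothetical sample data and stop words
bank_exp = pd.DataFrame({'Description': ['Coffee at the cafe!', 'Rent, for March']})
stopwords_all = {'at', 'the', 'for'}

def preprocess_cell(text, stopwords_set):
    # Stand-in implementation: strip punctuation, then drop stop words
    text = text.translate(str.maketrans('', '', string.punctuation))
    return ' '.join(w for w in text.split() if w.lower() not in stopwords_set)

# Corrected row-wise version: index into the row the lambda receives
row_wise = bank_exp.apply(lambda row: preprocess_cell(row['Description'], stopwords_all), axis=1)

# Equivalent alternative: apply to the column, one cell per call
col_wise = bank_exp['Description'].apply(preprocess_cell, args=(stopwords_all,))

bank_exp['Description'] = col_wise
print(bank_exp['Description'].tolist())  # ['Coffee cafe', 'Rent March']
```

Because Series.apply passes each cell directly, it skips building a row Series for every call, so it is usually the more direct choice when only one column is involved.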

Solution 1 (accepted, score 1), 2022-02-06 06:46:08