
I am trying to apply a function to a column of a DataFrame but get the error "ufunc 'add' did not contain a loop with signature matching types"

Hello, I am trying to run the following code:

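import pandas as pd

def f(df):
    # df is a single cell of the column (one abstract), wrapped in a Series
    new = pd.Series(df)

    # lowercase, drop everything except letters and whitespace, one word per row
    i = new.str.lower() \
        .str.replace(r'[^a-z\s]', '', regex=True) \
        .str.split(expand=True) \
        .stack()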

    # generate bigrams by concatenating unigram columns
    j = i + ' ' + i.shift(-1)
    digrams = []
    for k in j[:]:
        k=str(k)
        k = k.split(" ")
        s = "_".join(k)
        digrams.append(s)

    return pd.Series(digrams)

df = pd.read_csv("labeled_new.csv")

#vectorize documents
df["abstract_text_x"]=df["abstract_text_x"].apply(f)

df is a DataFrame with several columns and rows. I am trying to apply the function f to just one column, abstract_text_x. This column contains text in string format. The function f creates bigrams and joins the words of each bigram with "_". The function itself works; the problem occurs when I try to assign the result of f back to the DataFrame. I get the following error:

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U32') dtype('<U32') dtype('<U32')

What does that mean? How could I fix it?

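That error means NumPy's add operation was given operand types it has no implementation for. This typically happens when a string is added to something that is not a string, such as a number or a NaN, or the other way round. In your code the addition is most likely the line j = i + ' ' + i.shift(-1). Try adding a str() conversion to the variable s when you append it to digrams. Also make sure every value in "abstract_text_x" is actually a string and not a NaN, a number, or an array. Note that a column of strings usually shows dtype object in pandas, so check the individual values rather than the column dtype. Basically, iterate through your data and check the types; you will likely find a value that doesn't match up.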
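The type check suggested above could look something like the sketch below. It is only an illustration under stated assumptions: the helper names find_non_strings and bigrams are hypothetical, the sample rows are made up, and the bigram helper uses plain Python (re and zip) instead of the pandas string methods from the question. It also returns a list rather than a Series, so each cell holds one list of bigrams.

```python
import re

import pandas as pd


def find_non_strings(column):
    """Return the entries of `column` that are not Python str objects."""
    is_str = column.map(lambda value: isinstance(value, str))
    return column[~is_str]


def bigrams(text):
    """Return underscore-joined bigrams of `text`; an empty list for non-strings."""
    if not isinstance(text, str):
        return []
    # lowercase, keep only letters and whitespace, then split into words
    words = re.sub(r"[^a-z\s]", "", text.lower()).split()
    # pair each word with the word that follows it
    return [first + "_" + second for first, second in zip(words, words[1:])]


# made-up sample data standing in for labeled_new.csv
sample = pd.DataFrame(
    {"abstract_text_x": ["Deep learning for text", float("nan"), "", "One"]}
)

# rows whose value is not a string (here: the NaN row)
bad = find_non_strings(sample["abstract_text_x"])

# one list of bigrams per row; non-strings and short texts give an empty list
sample["bigrams"] = sample["abstract_text_x"].apply(bigrams)
```

Running find_non_strings on the real column first shows which rows need cleaning before f is applied. Whether empty or missing abstracts are really the cause in the original data is an assumption that this check would confirm or rule out.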
