Concatenating strings and assigning the result to a DataFrame column raises "ufunc 'add' did not contain a loop with signature matching types"

I want to concatenate some strings and store the result in a new column of the dataframe. The first code snippet works, but the second one fails with this error:

numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

The only difference is that the second code snippet includes one more string, '_'.

Does anyone have a suggestion as to why this error occurs?

First code snippet:

df["identifier"]=df.index.get_level_values(0).values.astype(str) + df["mother tongue iso636-3"].astype(str)+ '_' + df["country iso3166-2"].astype(str)

Second code snippet:

df["identifier"]=df.index.get_level_values(0).values.astype(str) + '_' + df["mother tongue iso636-3"].astype(str)+ '_' + df["country iso3166-2"].astype(str)

I've had a similar error when adding string-cast Series.

This lambda solution will be slightly slower, but it ensures that the values being added are Python str objects.

df['level_values'] = df.index.get_level_values(0).values
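df["identifier"] = df.apply(
    # use the row (x) for every field, not the whole df column
    lambda x: str(x['level_values']) + '_' + str(x["mother tongue iso636-3"])
    + '_' + str(x["country iso3166-2"]), axis=1)
# drop the helper column (columns= is needed; the default axis drops rows)
df.drop(columns='level_values', inplace=True)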

The exception is caused by the index expression:

df.index.get_level_values(0).values.astype(str)

If you first add it as a column to the dataframe and then use that column instead of the raw NumPy array, the problem no longer occurs, because the leftmost operand is then a pandas Series and pandas handles the concatenation:

df['index'] = df.index.get_level_values(0).values
df["identifier"]=df['index'].astype(str) + '_' + df["mother tongue iso636-3"].astype(str)+ '_' + df["country iso3166-2"].astype(str)
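
The same idea can be sketched without keeping a helper column around. The data below is made up for illustration (the question did not include a sample dataframe); only the two column names are taken from the question. It assumes the index is a MultiIndex, and it relies on Index.to_series to wrap the level values in a Series that shares df's index:

```python
import pandas as pd

# Hypothetical sample data; the real dataframe was not shown in the question.
index = pd.MultiIndex.from_tuples([("a1", 0), ("b2", 1)], names=["id", "n"])
df = pd.DataFrame(
    {"mother tongue iso636-3": ["deu", "fra"], "country iso3166-2": ["DE", "FR"]},
    index=index,
)

# Turn the first index level into a Series aligned with df, then cast to str.
# Because this is a pandas Series, "+" is handled by pandas, not by a NumPy
# fixed-width string array.
level0 = df.index.get_level_values(0).to_series(index=df.index).astype(str)

df["identifier"] = (
    level0
    + "_"
    + df["mother tongue iso636-3"].astype(str)
    + "_"
    + df["country iso3166-2"].astype(str)
)
print(df["identifier"].tolist())
```

Keeping every operand a Series means the expression never depends on how a particular NumPy version treats "+" for its own string dtypes.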

I'm surprised that your first case works. It would help if you gave a simple version of your dataframe.

Since you didn't do that, I'll have to make one up :(

In [321]: df = pd.DataFrame([[1,'foo'],[2,'bar']])                                                     
In [322]: df                                                                                           
Out[322]: 
   0    1
0  1  foo
1  2  bar

First look at the index :

In [323]: df.index.values                                                                              
Out[323]: array([0, 1])            # numeric in my case
In [324]: df.index.values.astype(str)                                                                  
Out[324]: array(['0', '1'], dtype='<U21')    # numpy dtype
In [325]: df.index.values.astype(str)+'_'                                                              
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
<ipython-input-325-230387b2895a> in <module>
----> 1 df.index.values.astype(str)+'_'

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

+ (the add ufunc) is not defined for numpy string dtypes, at least in the NumPy version used here.

Now look at the string column:

In [330]: df[1].values                                                                                 
Out[330]: array(['foo', 'bar'], dtype=object)  # pandas uses python strings

Converting that array to a numpy str dtype and then adding produces the same error:

In [331]: df[1].values.astype(str)                                                                     
Out[331]: array(['foo', 'bar'], dtype='<U3')
In [332]: df.index.values.astype(str)+df[1].values.astype(str)                                         
---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
<ipython-input-332-7bc2436a1bf8> in <module>
----> 1 df.index.values.astype(str)+df[1].values.astype(str)

UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')

That's why I wonder why your first case runs.

If I leave the object dtype strings as is:

In [333]: df.index.values.astype(str)+df[1].values                                                     
Out[333]: array(['0foo', '1bar'], dtype=object)

numpy converts the index array to object dtype (the common dtype) and applies + element by element, which for python strings is concatenation.

Applying that idea to the case with a '_':

In [334]: df.index.values.astype(str).astype(object)+'_'+df[1].values                                  
Out[334]: array(['0_foo', '1_bar'], dtype=object)
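
As an alternative sketch (not part of the session above), np.char.add does element-wise concatenation directly on numpy string arrays, so the detour through object dtype is not needed. The arrays here are made up to mirror the example dataframe:

```python
import numpy as np

# Same values as the example: a stringified numeric index and a string column.
idx = np.array([0, 1]).astype(str)   # fixed-width unicode dtype
words = np.array(["foo", "bar"])     # fixed-width unicode dtype as well

# np.char.add concatenates element by element; the scalar "_" is broadcast.
joined = np.char.add(np.char.add(idx, "_"), words)
print(joined)
```

The result stays a fixed-width unicode array rather than an object array, which may or may not be what you want before assigning it back to a dataframe column.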
