I want to concatenate some strings and store the result in a new column of the dataframe. The first code snippet works, but when I try the second one it fails with this error:
numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
The only difference is that the second code snippet includes another string, '_', right after the index values.
Does anyone have a suggestion as to why this error occurs?
First code snippet:
df["identifier"]=df.index.get_level_values(0).values.astype(str) + df["mother tongue iso636-3"].astype(str)+ '_' + df["country iso3166-2"].astype(str)
Second code snippet:
df["identifier"]=df.index.get_level_values(0).values.astype(str) + '_' + df["mother tongue iso636-3"].astype(str)+ '_' + df["country iso3166-2"].astype(str)
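I've had a similar error when adding Series that had been cast to str.
This lambda solution is slower (apply calls a Python function once per row), but it ensures that the values being added are Python str objects.
df['level_values'] = df.index.get_level_values(0).values
# apply with axis=1 passes each row as x, so every column is read from x, not df
df["identifier"] = df.apply(lambda x:
    str(x['level_values']) + '_' + str(x["mother tongue iso636-3"])
    + '_' + str(x["country iso3166-2"]), axis=1)
# drop the helper column (columns=...; the default axis would look for a row label)
df.drop(columns='level_values', inplace=True)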
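A self-contained sketch of the apply approach, run on a small made-up DataFrame. The index level names ("id", "year") and all row values are hypothetical; only the two column names come from the question.

```python
import pandas as pd

# Hypothetical sample data: a two-level index plus the two columns from the question
idx = pd.MultiIndex.from_tuples([(101, 2020), (102, 2021)], names=["id", "year"])
df = pd.DataFrame(
    {"mother tongue iso636-3": ["deu", "fra"], "country iso3166-2": ["DE-BY", "FR-IDF"]},
    index=idx,
)

# Copy the first index level into a temporary column so apply() sees it per row
df["level_values"] = df.index.get_level_values(0).values

# Build the identifier row by row with plain Python str concatenation
df["identifier"] = df.apply(
    lambda row: str(row["level_values"]) + "_"
    + str(row["mother tongue iso636-3"]) + "_"
    + str(row["country iso3166-2"]),
    axis=1,
)

# Remove the helper column again (columns=, not the default row axis)
df = df.drop(columns="level_values")
print(df["identifier"].tolist())  # ['101_deu_DE-BY', '102_fra_FR-IDF']
```

Because every value goes through str() inside the lambda, the concatenation never touches numpy string arrays, which is why this route avoids the UFuncTypeError at the cost of a per-row Python call.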
What triggers the exception is the index expression:
df.index.get_level_values(0).values.astype(str)
If you first add it to the dataframe as a column and then use that column instead of the expression, the problem no longer occurs:
df['index'] = df.index.get_level_values(0).values
df["identifier"]=df['index'].astype(str) + '_' + df["mother tongue iso636-3"].astype(str)+ '_' + df["country iso3166-2"].astype(str)
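A runnable sketch of the column approach, again on a hypothetical two-level index (level names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data (index level names and values are made up)
idx = pd.MultiIndex.from_tuples([(101, 2020), (102, 2021)], names=["id", "year"])
df = pd.DataFrame(
    {"mother tongue iso636-3": ["deu", "fra"], "country iso3166-2": ["DE-BY", "FR-IDF"]},
    index=idx,
)

# Store the first index level as an ordinary column first
df["index"] = df.index.get_level_values(0).values

# Series.astype(str) keeps the data inside a pandas Series, so pandas performs
# the element-wise + itself instead of handing a bare numpy '<U' array to np.add
df["identifier"] = (
    df["index"].astype(str) + "_"
    + df["mother tongue iso636-3"].astype(str) + "_"
    + df["country iso3166-2"].astype(str)
)
print(df["identifier"].tolist())  # ['101_deu_DE-BY', '102_fra_FR-IDF']
```

The leftmost operand is now a Series rather than a numpy array, so every + in the chain is a pandas operation.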
I'm surprised that your first case works. It would help if you gave a small sample of your dataframe.
Since you didn't, I'll have to make one up :(
In [321]: df = pd.DataFrame([[1,'foo'],[2,'bar']])
In [322]: df
Out[322]:
   0    1
0  1  foo
1  2  bar
First, look at the index:
In [323]: df.index.values
Out[323]: array([0, 1]) # numeric in my case
In [324]: df.index.values.astype(str)
Out[324]: array(['0', '1'], dtype='<U21') # numpy dtype
In [325]: df.index.values.astype(str)+'_'
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
<ipython-input-325-230387b2895a> in <module>
----> 1 df.index.values.astype(str)+'_'
UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
+ (np.add) is not defined for numpy string dtypes such as '<U21', at least in the NumPy version used here.
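If you would rather keep working with numpy string arrays, numpy's np.char.add concatenates them element-wise. This is a swapped-in alternative, not something used in the question; the sample arrays mirror the made-up dataframe above.

```python
import numpy as np

# Fixed-width numpy unicode arrays, like df.index.values.astype(str) above
index_str = np.array([0, 1]).astype(str)          # dtype '<U21'
column_str = np.array(['foo', 'bar']).astype(str)  # dtype '<U3'

# np.char.add concatenates element-wise and broadcasts the scalar '_'
res = np.char.add(np.char.add(index_str, '_'), column_str)
print(res.tolist())  # ['0_foo', '1_bar']
```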
Now look at the string column:
In [330]: df[1].values
Out[330]: array(['foo', 'bar'], dtype=object) # pandas uses python strings
Converting that array to a numpy string dtype and adding it to the index array produces the same error:
In [331]: df[1].values.astype(str)
Out[331]: array(['foo', 'bar'], dtype='<U3')
In [332]: df.index.values.astype(str)+df[1].values.astype(str)
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
<ipython-input-332-7bc2436a1bf8> in <module>
----> 1 df.index.values.astype(str)+df[1].values.astype(str)
UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> dtype('<U21')
That's why I'm surprised your first case runs.
If I leave the object-dtype strings as they are:
In [333]: df.index.values.astype(str)+df[1].values
Out[333]: array(['0foo', '1bar'], dtype=object)
numpy converts the index array to object dtype (the common dtype) and applies + element by element, which for Python strings is concatenation.
Applying that idea to the case with a '_':
In [334]: df.index.values.astype(str).astype(object)+'_'+df[1].values
Out[334]: array(['0_foo', '1_bar'], dtype=object)
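The same object-dtype trick applied to the question's three-part identifier, as a sketch on a hypothetical MultiIndex dataframe (level names and values are made up; the two column names come from the question):

```python
import pandas as pd

# Hypothetical sample data with a two-level index
idx = pd.MultiIndex.from_tuples([(101, 2020), (102, 2021)], names=["id", "year"])
df = pd.DataFrame(
    {"mother tongue iso636-3": ["deu", "fra"], "country iso3166-2": ["DE-BY", "FR-IDF"]},
    index=idx,
)

# Cast the first index level to str, then to object, so + works element-wise
# on Python strings instead of going to np.add for a '<U' dtype
level0 = df.index.get_level_values(0).values.astype(str).astype(object)

df["identifier"] = (
    level0 + "_"
    + df["mother tongue iso636-3"].astype(str) + "_"
    + df["country iso3166-2"].astype(str)
)
print(df["identifier"].tolist())  # ['101_deu_DE-BY', '102_fra_FR-IDF']
```

Only the index part needs the extra .astype(object); once it is added to a Series, pandas handles the rest of the chain.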