
Is this a pandas bug or feature?

I am trying to fill a pandas series with a constant, provided some condition is met. As a simplified test case I will use the following:

'-'*pd.Series([True]*5, dtype=bool)

This results in:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-89-0e3400ddc239> in <module>()
----> 1 '-'*pd.Series([True]*5, dtype=bool)

C:\Anaconda\lib\site-packages\pandas\core\ops.pyc in wrapper(left, right, name)
    529             if hasattr(lvalues, 'values'):
    530                 lvalues = lvalues.values
--> 531             return left._constructor(wrap_results(na_op(lvalues, rvalues)),
    532                                      index=left.index, name=left.name,
    533                                      dtype=dtype)

C:\Anaconda\lib\site-packages\pandas\core\ops.pyc in na_op(x, y)
    476                 result = np.empty(len(x), dtype=x.dtype)
    477                 mask = notnull(x)
--> 478                 result[mask] = op(x[mask], y)
    479             else:
    480                 raise TypeError("{typ} cannot perform the operation {op}".format(typ=type(x).__name__,op=str_rep))

TypeError: only integer arrays with one element can be converted to an index

However, if I do the following:

'-'*pd.Series([True]*5, dtype=bool).astype(object)

I get the expected:

0    -
1    -
2    -
3    -
4    -
dtype: object

Can somebody explain to me what is going on? Am I maybe choosing an awkward way of doing this?

I think using the * operator is an awkward way to do this. It might be easier to use pandas.Series.map.

For example:

pd.Series([True]*5,dtype=bool).map( lambda x : '-' if x else None )
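To see what the map approach does when the condition is not met everywhere, here is a minimal sketch with a mixed mask. The mask values are made up for illustration (the question used all True); positions where the condition fails come back as missing values.

```python
import pandas as pd

# Hypothetical mask with mixed values (the question used all True).
mask = pd.Series([True, False, True, True, False], dtype=bool)

# Map each boolean to the fill constant, or to None where the condition fails.
filled = mask.map(lambda x: '-' if x else None)

# Rows 1 and 4 hold a missing value; the others hold '-'.
print(filled)
```

If you would rather have an empty string than a missing value, returning '' instead of None from the lambda should do it.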

If you're set on using the * operator, note that you can use it on two vectors instead of on a scalar and a vector:

my_filter = pd.Series([True]*5,dtype=bool)
pd.Series('-',index=my_filter.index) * my_filter

Or, as you more or less identified, it works if you set the dtype to object up front:

'-' * pd.Series([True]*5,dtype=object)
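As for why the object variant behaves differently: my understanding (not confirmed against the pandas source) is that with object dtype the multiplication ends up being applied element by element using Python's own rules, where a bool counts as 1 or 0 when repeating a string. The sketch below checks that plain-Python behaviour and shows numpy.where as another way to get the same result without relying on operator dispatch. The mask values are again hypothetical.

```python
import numpy as np
import pandas as pd

# Plain Python: bool is a subclass of int, so str * bool repeats 1 or 0 times.
assert '-' * True == '-'
assert '-' * False == ''

# Hypothetical mask for illustration.
mask = pd.Series([True, False, True], dtype=bool)

# Pick between two constants per element, keeping the original index.
filled = pd.Series(np.where(mask.to_numpy(), '-', ''), index=mask.index)

print(filled.tolist())
```

This keeps the mask as a real bool Series and makes the value for unmet positions explicit, which may be easier to read than the multiplication trick.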
