Inconsistent results when adding a new column in Pandas DataFrame. Is it a Series or a Value?

Question

So I know I can add a new column trivially in Pandas like this:

df
=====
  A
1 5
2 6
3 7

df['new_col'] = "text"

df
====
  A    new_col
1 5    text
2 6    text
3 7    text

And I can also set a new column based on an operation on an existing column.

def times_two(x):
    return x * 2

df['newer_col'] = times_two(df.A)
df
====
  A    new_col   newer_col
1 5    text      10
2 6    text      12
3 7    text      14

However, when I try to operate on a text column, I get an unexpected AttributeError.

df['new_text'] = df['new_col'].upper()
AttributeError: 'Series' object has no attribute 'upper'

It is now treating the value as a series, not the value in that "cell".

Why does this happen with text and not with numbers, and how can I update my DataFrame with a new column based on an existing text column?

Answer 1

It's because the * operator is implemented for a Series (through its __mul__ method), whilst upper isn't defined for a Series. You have to use str.upper, which is implemented for a Series that holds string values:

In[53]:
df['new_text'] = df['new_col'].str.upper()
df

Out[53]: 
   A new_col new_text
1  5    text     TEXT
2  6    text     TEXT
3  7    text     TEXT
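To make the difference concrete, here is a minimal sketch that rebuilds the DataFrame from the question (the variable names doubled, upper_vectorized and upper_mapped are introduced here for illustration only). It shows that times_two receives the whole Series, that a Series supports multiplication but has no upper method, and that mapping str.upper over the values is one possible element-wise alternative to the .str accessor:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])
df["new_col"] = "text"

def times_two(x):
    return x * 2

# times_two is called once with the whole Series, not once per cell.
doubled = times_two(df["A"])
print(type(doubled).__name__)

# A Series defines arithmetic operators, but it has no .upper method.
print(hasattr(df["A"], "__mul__"))
print(hasattr(df["new_col"], "upper"))

# Vectorized string method through the .str accessor.
upper_vectorized = df["new_col"].str.upper()

# Element-wise alternative: call str.upper on each value.
upper_mapped = df["new_col"].map(str.upper)

print(upper_vectorized.tolist())
print(upper_mapped.tolist())
```

Both approaches give the same values here; the .str accessor is usually the more idiomatic choice for string columns.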

There is no magic here.

For df['new_col'] = "text", this is just assigning a scalar value and conforming to broadcasting rules: the scalar is broadcast to the length of the DataFrame, so every row receives the same value. For an explanation of broadcasting, see the related question: What does the term "broadcasting" mean in Pandas documentation?
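As a rough illustration of that broadcasting, the sketch below compares the scalar assignment with an explicitly built column of the same length (the name explicit is hypothetical and only used for the comparison):

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])

# Assigning a scalar repeats it for every row of the frame.
df["new_col"] = "text"

# Roughly equivalent explicit form: one value per index label.
explicit = pd.Series(["text"] * len(df), index=df.index)

print(df["new_col"].tolist())
print(df["new_col"].tolist() == explicit.tolist())
```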

solution1
1 ACCEPTED 2019-04-12 15:07:18
