Inconsistent results when adding a new column in Pandas DataFrame. Is it a Series or a Value?

So I know I can add a new column trivially in Pandas like this:

df
=====
  A
1 5
2 6
3 7

df['new_col'] = "text"

df
====
  A    new_col
1 5    text
2 6    text
3 7    text

And I can also set a new column based on an operation on an existing column.

def times_two(x):
    return x * 2

df['newer_col'] = times_two(df.A)
df
====
  A    new_col   newer_col
1 5    text      10
2 6    text      12
3 7    text      14

However, when I try to operate on a text column, I get an unexpected AttributeError.

df['new_text'] = df['new_col'].upper()
AttributeError: 'Series' object has no attribute 'upper'

It is now treating the value as a series, not the value in that "cell".

Why does this happen with text and not with numbers, and how can I update my DF with a new column based on an existing text column?

It's because the * operator is implemented for a Series (as its mul method), so it is applied element-wise, whilst upper isn't defined for a Series. You have to use str.upper, which is implemented for a Series that holds strings:

In[53]:
df['new_text'] = df['new_col'].str.upper()
df

Out[53]: 
   A new_col new_text
1  5    text     TEXT
2  6    text     TEXT
3  7    text     TEXT
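
To make the difference concrete, here is a minimal sketch that rebuilds the frame from the question and runs both cases side by side. It assumes pandas is installed; the `raised` flag and the `new_text_map` column are illustrative names that do not appear in the question.

```python
import pandas as pd

# Rebuild the frame from the question (index 1..3, one numeric column).
df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])
df["new_col"] = "text"


def times_two(x):
    # x is a whole Series here; Series implements *, so this is element-wise.
    return x * 2


df["newer_col"] = times_two(df["A"])

# A Series has no upper() method, so this call raises AttributeError.
try:
    df["new_col"].upper()
    raised = False
except AttributeError:
    raised = True

# The .str accessor exposes string methods that operate on every element.
df["new_text"] = df["new_col"].str.upper()

# Alternative: apply the plain str method to each element with map().
df["new_text_map"] = df["new_col"].map(str.upper)
```

In both cases the function receives the whole Series, not a single cell. The numeric version only appears to work on values because Series defines the arithmetic operators.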

There is no magic here.

For df['new_col'] = "text", this is just assigning a scalar value and conforming to broadcasting rules: the scalar is broadcast to the length of the df, so every row gets the same value. For an explanation of that, see: What does the term "broadcasting" mean in Pandas documentation?
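
As a rough illustration of that broadcasting, the sketch below assigns a scalar and compares the result with a column built explicitly, one value per row. It assumes pandas is installed; `explicit` and `same` are illustrative names only.

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])

# Assigning a scalar repeats it for every row of the frame.
df["new_col"] = "text"

# The same column written out by hand: one value per row, same index.
explicit = pd.Series(["text"] * len(df), index=df.index)

# Both spellings hold identical values.
same = df["new_col"].tolist() == explicit.tolist()
```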
