Inconsistent results when adding a new column in Pandas DataFrame. Is it a Series or a Value?
So I know I can add a new column trivially in Pandas like this:
df
=====
A
1 5
2 6
3 7
df['new_col'] = "text"
df
====
A new_col
1 5 text
2 6 text
3 7 text
And I can also set a new column based on an operation on an existing column.
def times_two(x):
    return x * 2
df['newer_col'] = times_two(df.A)
df
====
A new_col newer_col
1 5 text 10
2 6 text 12
3 7 text 14
However, when I try to operate on a text column I get an unexpected AttributeError.
df['new_text'] = df['new_col'].upper()
AttributeError: 'Series' object has no attribute 'upper'
It is now treating the value as a Series, not the value in that "cell".
Why does this happen with text and not with numbers, and how can I update my DF with a new column based on an existing text column?
It's because the * operator is implemented as a mul operator on a Series, whilst upper isn't defined for a Series. You have to use str.upper, which is implemented for a Series where the dtype is str:
In[53]:
df['new_text'] = df['new_col'].str.upper()
df
Out[53]:
A new_col new_text
1 5 text TEXT
2 6 text TEXT
3 7 text TEXT
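To make the difference concrete, here is a minimal sketch that rebuilds the frame from the question (the variable names doubled, upper_via_accessor, upper_via_map and raised are just illustrative). It shows that multiplication works on the whole Series, that the bare upper call fails, and that either the .str accessor or mapping the plain Python str.upper over each element gives the expected column:

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])
df["new_col"] = "text"

# Arithmetic operators are defined on Series and work element-wise,
# which is why times_two(df.A) succeeds even though it gets a whole Series.
doubled = df["A"] * 2

# String methods are not attributes of the Series itself ...
try:
    df["new_col"].upper()
    raised = False
except AttributeError:
    raised = True

# ... they live under the .str accessor instead.
upper_via_accessor = df["new_col"].str.upper()

# Alternative: call the ordinary Python str.upper on every element.
upper_via_map = df["new_col"].map(str.upper)

df["new_text"] = upper_via_accessor
print(df)
```

The .str accessor is usually the more convenient choice for built-in string operations; map is handy when the per-element function has no accessor equivalent.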
There is no magic here.
For df['new_col'] this is just assigning a scalar value and conforming to broadcasting rules, where the scalar is broadcast to the length of the df along the minor axis; see this for an explanation of that: What does the term "broadcasting" mean in Pandas documentation?
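As a rough illustration of that broadcasting, the sketch below (with an illustrative variable named explicit) compares the scalar assignment to building a Series of the same value on the frame's index by hand. Both end up with one copy of the scalar per row:

```python
import pandas as pd

df = pd.DataFrame({"A": [5, 6, 7]}, index=[1, 2, 3])

# Assigning a scalar: pandas repeats it once for every row of df.
df["new_col"] = "text"

# Roughly the same thing spelled out: a Series holding the scalar
# for each label in df's index.
explicit = pd.Series("text", index=df.index)

print(df["new_col"].tolist())
print(explicit.tolist())
```

Either way the column holds a Series, so anything done to df['new_col'] afterwards operates on the Series as a whole rather than on a single cell.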