Classifying a series to a new column in pandas
I want to take my current data set, which is filled with ints, and classify the rows according to certain criteria. The table looks something like this:
[in]> df = pd.DataFrame({'A':[0,2,3,2,0,0],'B': [1,0,2,0,0,0],'C': [0,0,1,0,1,0]})
[out]>
   A  B  C
0  0  1  0
1  2  0  0
2  3  2  1
3  2  0  0
4  0  0  1
5  0  0  0
I'd like to classify these in a separate column by string. Being more familiar with R, I tried to create a new column with the rules in that column's definition. After that I tried .ix and lambdas, both of which resulted in type errors (between ints and Series). I'm under the impression that this is a fairly simple question. Although the following is completely wrong, here is the logic from attempt 1:
df['D']=(
    if ((df['A'] > 0) & (df['B'] == 0) & df['C']==0):
        return "c1";
    elif ((df['A'] == 0) & ((df['B'] > 0) | df['C'] >0)):
        return "c2";
    else:
        return "c3";)
for a final result of:
   A  B  C     D
0  0  1  0  "c2"
1  2  0  0  "c1"
2  3  2  1  "c3"
3  2  0  0  "c1"
4  0  0  1  "c2"
5  0  0  0  "c3"
If someone could help me figure this out, it would be much appreciated.
I can think of two ways. The first is to write a classifier function and then .apply it row-wise:
>>> import pandas as pd
>>> df = pd.DataFrame({'A':[0,2,3,2,0,0],'B': [1,0,2,0,0,0],'C': [0,0,1,0,1,0]})
>>>
>>> def classifier(row):
...     if row["A"] > 0 and row["B"] == 0 and row["C"] == 0:
...         return "c1"
...     elif row["A"] == 0 and (row["B"] > 0 or row["C"] > 0):
...         return "c2"
...     else:
...         return "c3"
...
>>> df["D"] = df.apply(classifier, axis=1)
>>> df
   A  B  C   D
0  0  1  0  c2
1  2  0  0  c1
2  3  2  1  c3
3  2  0  0  c1
4  0  0  1  c2
5  0  0  0  c3
and the second is to use advanced indexing:
>>> df = pd.DataFrame({'A':[0,2,3,2,0,0],'B': [1,0,2,0,0,0],'C': [0,0,1,0,1,0]})
>>> df["D"] = "c3"
>>> df.loc[(df["A"] > 0) & (df["B"] == 0) & (df["C"] == 0), "D"] = "c1"
>>> df.loc[(df["A"] == 0) & ((df["B"] > 0) | (df["C"] > 0)), "D"] = "c2"
>>> df
   A  B  C   D
0  0  1  0  c2
1  2  0  0  c1
2  3  2  1  c3
3  2  0  0  c1
4  0  0  1  c2
5  0  0  0  c3
Which one is clearer depends on the situation. Usually, the more complex the logic, the more likely I am to wrap it up in a function that I can then document and test.
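For completeness, a third option that is not one of the two approaches above would be numpy.select, which takes a list of boolean conditions and a matching list of labels. The following is only a sketch under the assumption that NumPy is available alongside pandas; the names conditions and labels are mine, chosen for illustration. Conditions are evaluated in order, so the first match wins, and anything unmatched gets the default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0, 2, 3, 2, 0, 0],
                   "B": [1, 0, 2, 0, 0, 0],
                   "C": [0, 0, 1, 0, 1, 0]})

# Illustrative names: one boolean mask per class, checked in order.
conditions = [
    (df["A"] > 0) & (df["B"] == 0) & (df["C"] == 0),   # -> "c1"
    (df["A"] == 0) & ((df["B"] > 0) | (df["C"] > 0)),  # -> "c2"
]
labels = ["c1", "c2"]

# Rows that match neither mask fall back to the default label.
df["D"] = np.select(conditions, labels, default="c3")
print(df)
```

Note that each comparison is wrapped in its own parentheses: & and | bind more tightly than == and >, so an unparenthesised term such as df['C']==0 at the end of an & chain would not mean what it appears to.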