Convert column in dataframe to “classes”?

So I've essentially got this dataframe:

,club_name,tr_begin,year,ranking
0,ADO Den Haag,1357,2010,6.0
1,ADO Den Haag,1480,2011,15.0
2,ADO Den Haag,1397,2012,9.0
3,ADO Den Haag,1384,2013,9.0
4,ADO Den Haag,1451,2014,13.0

What I want to do is go through every ranking and put it into a class based on its value. So a ranking of 6 would go into class number 2 and a ranking of 1 would go into class number 1. The conversion table is this:

if 0 < ranking <= 3:
    rank_class = 1
elif 3 < ranking <= 6:
    rank_class = 2

etc etc etc

I would like this to continue in multiples of 3 up to 18.

So my hoped-for output would be:

,club_name,tr_begin,year,ranking,ranking_class
0,ADO Den Haag,1357,2010,6.0,2
1,ADO Den Haag,1480,2011,15.0,5
2,ADO Den Haag,1397,2012,9.0,3
3,ADO Den Haag,1384,2013,9.0,3
4,ADO Den Haag,1451,2014,13.0,5

I tried with the mask function, and by making a new dataframe and then merging. This worked but seemed very sloppy. Is there an easier way to do this?

Thanks in advance

Using pandas.cut, you can define iterables for your "bins" and "labels". This is simplified by the fact that both can be defined using range objects.

I recommend you convert your ranking series to int first; as floats, the values may be affected by floating-point rounding, which may yield undesirable results.

import pandas as pd

# The first (unnamed) CSV column holds the row index
df = pd.read_csv('file.csv', index_col=0)

binrange = range(0, 19, 3)
labrange = range(1, 7)

df['ranking_class'] = pd.cut(df['ranking'], bins=binrange, labels=labrange)

print(df)

      club_name  tr_begin  year  ranking ranking_class
0  ADO Den Haag      1357  2010      6.0             2
1  ADO Den Haag      1480  2011     15.0             5
2  ADO Den Haag      1397  2012      9.0             3
3  ADO Den Haag      1384  2013      9.0             3
4  ADO Den Haag      1451  2014     13.0             5
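
The snippet above passes the float column straight to pandas.cut, without the int conversion recommended earlier. A minimal sketch of that conversion follows, assuming every ranking is a whole number with no missing values. The CSV text is inlined from the question so the example runs on its own, and the names ranking_int, bin_edges and class_labels are illustrative only.

```python
import io

import pandas as pd

# Sample rows copied from the question, inlined so the sketch is self-contained.
csv_text = """,club_name,tr_begin,year,ranking
0,ADO Den Haag,1357,2010,6.0
1,ADO Den Haag,1480,2011,15.0
2,ADO Den Haag,1397,2012,9.0
3,ADO Den Haag,1384,2013,9.0
4,ADO Den Haag,1451,2014,13.0
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Cast to int before binning (assumption: whole-number rankings, no NaN).
ranking_int = df['ranking'].astype(int)

bin_edges = list(range(0, 19, 3))  # 0, 3, ..., 18 -> intervals (0, 3], (3, 6], ...
class_labels = list(range(1, 7))   # one label per interval: 1..6

# pd.cut returns a categorical column; cast it to int for a plain numeric one.
df['ranking_class'] = pd.cut(
    ranking_int, bins=bin_edges, labels=class_labels
).astype(int)

print(df)
```

Because pandas.cut uses right-closed intervals by default, a ranking of exactly 3, 6, 9 and so on falls into the lower class, which matches the conversion table in the question.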

I think integer division // would do it, as long as it rounds up rather than down (plain df.ranking // 3 would put a ranking of 13 in class 4 instead of 5):

df.assign(ranking_class=(-(-df.ranking // 3)).astype(int))

      club_name  tr_begin  year  ranking  ranking_class
0  ADO Den Haag      1357  2010      6.0              2
1  ADO Den Haag      1480  2011     15.0              5
2  ADO Den Haag      1397  2012      9.0              3
3  ADO Den Haag      1384  2013      9.0              3
4  ADO Den Haag      1451  2014     13.0              5
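
As a quick check of how the division approach lines up with the question's conversion table, the sketch below compares floor division and ceiling division against pandas.cut for every whole ranking from 1 to 18. The series and column names are illustrative and not part of the original answers.

```python
import pandas as pd

# Every whole ranking covered by the conversion table.
rankings = pd.Series(range(1, 19), dtype=float)

# Floor division rounds down: 13 // 3 == 4.
floor_class = (rankings // 3).astype(int)

# Ceiling division via negation: -(-13 // 3) == 5.
ceil_class = (-(-rankings // 3)).astype(int)

# Reference result using the right-closed intervals (0, 3], (3, 6], ..., (15, 18].
cut_class = pd.cut(
    rankings, bins=list(range(0, 19, 3)), labels=list(range(1, 7))
).astype(int)

comparison = pd.DataFrame({
    'ranking': rankings,
    'floor': floor_class,
    'ceil': ceil_class,
    'cut': cut_class,
})
print(comparison)
```

Floor division agrees with the table only for exact multiples of 3, while ceiling division matches pandas.cut across the whole range.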
