Label a column based on the value of another column (same row) in a pandas DataFrame
I have a list of sub-categories that correspond to a particular category. Think of it like this:
Category | Sub Category
a | 1
a | 2
a | 3
b | 4
b | 5
etc...
I was wondering what the best way is to apply the Category value to each row of the dataframe (~800,000 rows) based on the Sub Category that is defined.
I am currently using this method, but I know it's not the best or even good:
df.loc[df.Subcategory =='1', 'Category'] = 'a'
df.loc[df.Subcategory =='2', 'Category'] = 'a'
df.loc[df.Subcategory =='3', 'Category'] = 'a'
df.loc[df.Subcategory =='4', 'Category'] = 'b'
and so on...
That leaves me with a long chunk of ugly code, and it isn't very efficient.
I was wondering if anyone has another method that might help. I am fairly new to coding (this is only the 5th or so program I've written) and am mostly self-taught, so any help would be really appreciated.
Based on your code, it looks like you have a DataFrame column called "Subcategory" and you want to create the column "Category" based on some mapping of subcategory to category. (Your initial description suggests that you already have the "Category" column, but then there would be no point to your code.)
If I'm understanding correctly and you want to create the "Category" column, equal to "a" when subcategory == 1, equal to "a" when subcategory == 2, ..., equal to "b" when subcategory == 5, and so on, then you could use the pandas Series.map() method with a dictionary:
subcategory_to_category_map = { "1": "a", "2": "a", "3": "a", "4": "b", "5": "b" }
df["Category"] = df["Subcategory"].map( subcategory_to_category_map )
Make sure you use the same data type in the dictionary as in your "Subcategory" values (i.e. if they are numeric, use numeric keys, and if they are strings ("1", "2", etc.), use strings, as shown). Also note that any value of "Subcategory" that is not a key in the dictionary will result in a missing value (NaN) in the new "Category" column.
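To make the two caveats above concrete, here is a minimal sketch on a small made-up DataFrame. The sample values, the "unknown" placeholder, and the extra names (unmapped, Category_filled, int_keys, mismatched) are assumptions for illustration only and are not taken from your data.

```python
import pandas as pd

# Hypothetical sample data; "9" is deliberately absent from the mapping.
df = pd.DataFrame({"Subcategory": ["1", "2", "3", "4", "5", "9"]})

# Keys are strings because the Subcategory values are strings.
subcategory_to_category_map = {"1": "a", "2": "a", "3": "a", "4": "b", "5": "b"}

# One vectorized lookup replaces the chain of df.loc assignments.
df["Category"] = df["Subcategory"].map(subcategory_to_category_map)

# Subcategory values with no dictionary key end up as missing values.
unmapped = df.loc[df["Category"].isna(), "Subcategory"].unique()

# Optional: substitute a placeholder label (an assumed choice) for them.
df["Category_filled"] = df["Category"].fillna("unknown")

# Type mismatch: integer keys never match string values, so every row is missing.
int_keys = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b"}
mismatched = df["Subcategory"].map(int_keys)

print(df)
print("Unmapped subcategories:", list(unmapped))
print("All missing with integer keys:", mismatched.isna().all())
```

Checking for unmapped values right after the map() call is a cheap way to catch typos or a type mismatch in the dictionary before the missing values spread into later steps.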