Roll up categorical values into a second variable in Python
I have data that comes from a data source with pre-coded categorical variables. Unfortunately, these are not the variables I need for my analysis, so I need to roll them up into a second column:
age_group   lifestage
18-24       young adult
25-34       adult
35-44       adult
45-54       adult
...         ...
I am currently using a loop over lists to do this:
age_group = ['18-24', '25-34', '35-44', '45-54']  # sample values from the table above

ya_list = ['18-24']
adult_list = ['25-34', '35-44', '45-54']

for age in age_group:
    if age in ya_list:
        lifestage = 'young adult'
    elif age in adult_list:
        lifestage = 'adult'
This works fine for this example, with only a few groups to recode into, but when I have 10 or more groups to recode, it becomes much more unwieldy. I can't help but think there has to be a better way to do this, but I haven't been able to find one.
You want a dictionary:
stages = {'18-24': 'young adult',
          '25-34': 'adult',
          # ... one entry per remaining age group
          }

for age in age_group:
    lifestage = stages[age]
This is the canonical replacement for a long chain of elif statements in Python.
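When there are many groups, typing one dictionary entry per age group can itself get long. A minimal sketch, assuming you would rather list each lifestage once together with the age groups it covers: invert that grouping into the flat lookup table with a dict comprehension, and use dict.get() with a default so that an unmapped code does not raise KeyError. The names GROUPS, STAGES and roll_up(), the 'unknown' default and the '65+' code are hypothetical, for illustration only.

```python
# Hypothetical grouping: each lifestage listed once with the age groups it covers.
GROUPS = {
    'young adult': ['18-24'],
    'adult': ['25-34', '35-44', '45-54'],
}

# Invert the grouping into a flat lookup table: age_group -> lifestage.
STAGES = {age: stage for stage, ages in GROUPS.items() for age in ages}


def roll_up(age_groups, default='unknown'):
    """Map each age group to its lifestage; codes missing from STAGES get `default`."""
    return [STAGES.get(age, default) for age in age_groups]


print(roll_up(['18-24', '45-54', '65+']))
# ['young adult', 'adult', 'unknown']
```

With this layout, adding a new lifestage is one new key in GROUPS, and the inverted STAGES table is rebuilt automatically.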
You could use split() and a list comprehension to get the actual numbers to work with:
for age in age_group:
    lower, higher = [int(i) for i in age.split("-")]
    if higher <= 24:
        lifestage = "young adult"
    elif lower <= 54:
        lifestage = "adult"
    # etc...
I'm not sure whether what you are scaling up is the number of age ranges or the number of stages, but hopefully this will help you get started.
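If the stages are defined by age cut-offs rather than by listing codes, the chain of numeric comparisons can be replaced with a sorted list of upper bounds and the standard-library bisect module. This is a swapped-in technique, not part of the answer above, and it is only a sketch assuming every code has the form 'low-high'. The cut-offs, the 'older adult' stage and the lifestage() helper are hypothetical.

```python
import bisect

# Hypothetical cut-offs: the highest age covered by each stage, sorted ascending.
UPPER_BOUNDS = [24, 54]
# One more name than bounds: the last name covers everything above the final bound.
STAGE_NAMES = ['young adult', 'adult', 'older adult']


def lifestage(age_group):
    """Return the stage whose upper bound is the first one >= the group's upper age."""
    higher = int(age_group.split('-')[1])
    return STAGE_NAMES[bisect.bisect_left(UPPER_BOUNDS, higher)]


print([lifestage(a) for a in ['18-24', '35-44', '55-64']])
# ['young adult', 'adult', 'older adult']
```

Adding a stage then means adding one bound and one name instead of another elif branch. Open-ended codes such as '65+' would need separate handling, because they do not split into two integers.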