
Roll up categorical values into a second variable in Python

I have data from a source with pre-coded categorical variables. Unfortunately, these are not the variables I need for my analysis, so I need to roll them up into a second column:

age_group  lifestage
18-24      young adult
25-34      adult
35-44      adult
45-54      adult
...        ...

I am currently looping through lists to do this:

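age_group = ['18-24', '25-34', '35-44', '45-54']  # sample values from the table above
ya_list = ['18-24']
adult_list = ['25-34', '35-44', '45-54']

lifestage = []  # one stage per row, in the same order as age_group
for age in age_group:
    if age in ya_list:
        lifestage.append('young adult')
    elif age in adult_list:
        lifestage.append('adult')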

This works fine for this example, which has only a few groups to recode into, but when I have 10 or more groups to recode, it becomes much more unwieldy. I can't help thinking there must be a better way to do this, but I haven't been able to find one.

You want a dictionary:

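stages = {'18-24': 'young adult',
          '25-34': 'adult',
          '35-44': 'adult',
          '45-54': 'adult'}  # ... add one entry per remaining age group

# One lookup per row; raises KeyError for a group missing from stages
lifestage = [stages[age] for age in age_group]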

This is the canonical Python replacement for a long chain of elif branches.
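If the worry is the sheer number of groups, one option (a sketch, not part of the original answer) is to write each stage once, together with the list of age groups it covers, and invert that into the flat lookup table. STAGE_GROUPS and roll_up are hypothetical names, and the 'unknown' fallback for unmapped groups is an assumption; dict.get is what supplies it instead of raising KeyError.

```python
# Hypothetical layout: each stage listed once with the age groups it covers.
STAGE_GROUPS = {
    'young adult': ['18-24'],
    'adult': ['25-34', '35-44', '45-54'],
}

# Invert into a flat group -> stage lookup table (one entry per age group).
stages = {group: stage
          for stage, groups in STAGE_GROUPS.items()
          for group in groups}


def roll_up(age_groups, lookup, default='unknown'):
    """Map each age group to its stage; groups not in lookup get default."""
    return [lookup.get(age, default) for age in age_groups]


age_group = ['18-24', '25-34', '45-54', '35-44']
lifestage = roll_up(age_group, stages)
print(lifestage)  # ['young adult', 'adult', 'adult', 'adult']
```

With this layout, adding a stage or an age group means editing a single list, and a group missing from the table shows up as 'unknown' instead of stopping the loop.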

Alternatively, you could use split() and a list comprehension to extract the actual numbers and compare them against cut-offs:

lifestage = []
for age in age_group:
    # e.g. '25-34' -> lower = 25, higher = 34
    lower, higher = [int(i) for i in age.split("-")]
    if higher <= 24:
        lifestage.append("young adult")
    elif lower <= 54:
        lifestage.append("adult")
    # etc...

I'm not sure whether what you are scaling up is the number of age ranges or the number of stages, but hopefully this helps you get started.
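When the stages follow contiguous age cut-offs, the comparison chain can also be replaced by a sorted list of upper bounds and the standard-library bisect module. This is a sketch under stated assumptions: every group is a "low-high" string (an open-ended group such as "65+" would make int() raise ValueError), the bounds are sorted in ascending order, and UPPER_BOUNDS, STAGE_NAMES, stage_for and the 'older adult' label are all hypothetical.

```python
from bisect import bisect_left

# Hypothetical cut-offs: the highest age in each stage, in ascending order.
# Only 'young adult' (up to 24) and 'adult' (up to 54) come from the question;
# 'older adult' is a made-up label for anything above 54.
UPPER_BOUNDS = [24, 54]
STAGE_NAMES = ['young adult', 'adult', 'older adult']


def stage_for(age_range):
    """Classify a 'low-high' string such as '25-34' by its upper bound."""
    # The lower bound is parsed but unused: ranges are assumed contiguous.
    _lower, higher = (int(part) for part in age_range.split('-'))
    # bisect_left returns how many bounds are strictly less than higher,
    # which is exactly the index of the matching stage name.
    return STAGE_NAMES[bisect_left(UPPER_BOUNDS, higher)]


age_group = ['18-24', '25-34', '45-54', '55-64']
print([stage_for(age) for age in age_group])
# ['young adult', 'adult', 'adult', 'older adult']
```

The design choice here is that adding a stage means appending one bound and one name, rather than writing another elif branch.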

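Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.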
