
Combining categories of a categorical variable

I would like to combine some Brazilian political party names in a categorical variable (partido_pref) that was wrongly coded.

The categories that I would like to combine are "PC do B" and "PCdoB", and "PT do B" and "PTdoB". The spellings with and without spaces refer to the same parties.

I would rather do it in Stata, but I can also work in R.

Below you will find the list of political parties.

    . tab partido_pref

    partido_pref | Freq. Percent Cum.
    ---------------+-----------------------------------
    DEM | 2,267 2.14 2.14
    NA | 34,848 32.84 34.98
    Não disponível | 2 0.00 34.98
    Outra situação | 19 0.02 35.00
    PAN | 6 0.01 35.00
    PC do B | 260 0.25 35.25
    PCB | 2 0.00 35.25
    PCdoB | 7 0.01 35.26
    PCO | 1 0.00 35.26
    PDT | 3,933 3.71 38.97
    PFL | 6,811 6.42 45.39
    PHS | 194 0.18 45.57
    PL | 2,525 2.38 47.95
    PMDB | 14,833 13.98 61.93
    PMN | 410 0.39 62.31
    PP | 5,467 5.15 67.47
    PPB | 1,661 1.57 69.03
    PPL | 10 0.01 69.04
    PPS | 2,493 2.35 71.39
    PR | 1,861 1.75 73.14
    PRB | 298 0.28 73.43
    PRN | 9 0.01 73.43
    PRONA | 26 0.02 73.46
    PRP | 273 0.26 73.72
    PRTB | 121 0.11 73.83
    PSB | 2,905 2.74 76.57
    PSC | 480 0.45 77.02
    PSD | 816 0.77 77.79
    PSDB | 11,316 10.66 88.45
    PSDC | 121 0.11 88.57
    PSL | 273 0.26 88.83
    PSOL | 4 0.00 88.83
    PST | 48 0.05 88.87
    PSTU | 1 0.00 88.88
    PT | 5,258 4.96 93.83
    PT do B | 139 0.13 93.96
    PTB | 5,383 5.07 99.03
    PTC | 140 0.13 99.17
    PTdoB | 10 0.01 99.18
    PTN | 108 0.10 99.28
    PV | 702 0.66 99.94
    Recusa | 2 0.00 99.94
    Sem partido | 62 0.06 100.00
    ---------------+-----------------------------------
    Total | 106,105 100.00

Thank you in advance!

One option is fct_collapse from forcats:

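library(forcats)

# Merge the spelling variants into one level each and assign the result back
df1$partido_pref <- fct_collapse(df1$partido_pref,
                                 PCdoB = c("PC do B", "PCdoB"),
                                 PTdoB = c("PT do B", "PTdoB"))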
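If forcats is not available, the same merge can be sketched in base R. This is only an illustration under stated assumptions: the toy data frame below stands in for the real data, and the `fixes` lookup is a hypothetical helper, not something from the question. It relies on the fact that assigning duplicated names to `levels()` merges the corresponding factor levels.

```r
# Toy data standing in for the real variable (illustrative values only)
df1 <- data.frame(
  partido_pref = c("PC do B", "PCdoB", "PT do B", "PTdoB", "PT", "Sem partido"),
  stringsAsFactors = TRUE
)

# Hypothetical lookup: miscoded spelling -> canonical spelling
fixes <- c("PC do B" = "PCdoB", "PT do B" = "PTdoB")

# Rewrite only the level names that appear in the lookup
lv <- levels(df1$partido_pref)
hit <- lv %in% names(fixes)
lv[hit] <- unname(fixes[lv[hit]])

# Assigning duplicated level names merges those levels
levels(df1$partido_pref) <- lv

table(df1$partido_pref)
```

Levels that are not listed in `fixes`, such as "PT" and "Sem partido", are left untouched.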

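If your problem is just getting rid of whitespace, and partido_pref is a string variable:

replace partido_pref = subinstr(partido_pref, " ", "", .)

The fourth argument (.) tells subinstr to replace every occurrence. Note that this strips spaces from all values, including "Sem partido" and "Outra situação"; to restrict it, add a qualifier such as if inlist(partido_pref, "PC do B", "PT do B").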

See help string_functions for more options.

R is more flexible, but Stata can handle that level of simple text management.
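For comparison, the whitespace-stripping idea can be sketched in R as well, restricted to the two miscoded spellings so that multi-word categories keep their spaces. The vector `partido` and its values are illustrative assumptions, not the real data.

```r
# Toy character vector (illustrative values, not the real data)
partido <- c("PC do B", "PCdoB", "PT do B", "Sem partido")

# Only touch the values known to be miscoded
miscoded <- partido %in% c("PC do B", "PT do B")

# fixed = TRUE treats the pattern as a literal space, not a regex
partido[miscoded] <- gsub(" ", "", partido[miscoded], fixed = TRUE)

partido
```

Here "Sem partido" is left as it is, because it is not in the list of miscoded values.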


