Combining categories of a categorical variable
I would like to combine some Brazilian political party names in a categorical variable (partido_pref) that was coded inconsistently.
The categories I would like to combine are "PC do B" with "PCdoB", and "PT do B" with "PTdoB". The spellings with and without spaces refer to the same party.
I would rather do it in Stata, but I can also work in R.
Below you will find the list of political parties.
. tab partido_pref
  partido_pref |      Freq.     Percent        Cum.
---------------+-----------------------------------
           DEM |      2,267        2.14        2.14
            NA |     34,848       32.84       34.98
Não disponível |          2        0.00       34.98
Outra situação |         19        0.02       35.00
           PAN |          6        0.01       35.00
       PC do B |        260        0.25       35.25
           PCB |          2        0.00       35.25
         PCdoB |          7        0.01       35.26
           PCO |          1        0.00       35.26
           PDT |      3,933        3.71       38.97
           PFL |      6,811        6.42       45.39
           PHS |        194        0.18       45.57
            PL |      2,525        2.38       47.95
          PMDB |     14,833       13.98       61.93
           PMN |        410        0.39       62.31
            PP |      5,467        5.15       67.47
           PPB |      1,661        1.57       69.03
           PPL |         10        0.01       69.04
           PPS |      2,493        2.35       71.39
            PR |      1,861        1.75       73.14
           PRB |        298        0.28       73.43
           PRN |          9        0.01       73.43
         PRONA |         26        0.02       73.46
           PRP |        273        0.26       73.72
          PRTB |        121        0.11       73.83
           PSB |      2,905        2.74       76.57
           PSC |        480        0.45       77.02
           PSD |        816        0.77       77.79
          PSDB |     11,316       10.66       88.45
          PSDC |        121        0.11       88.57
           PSL |        273        0.26       88.83
          PSOL |          4        0.00       88.83
           PST |         48        0.05       88.87
          PSTU |          1        0.00       88.88
            PT |      5,258        4.96       93.83
       PT do B |        139        0.13       93.96
           PTB |      5,383        5.07       99.03
           PTC |        140        0.13       99.17
         PTdoB |         10        0.01       99.18
           PTN |        108        0.10       99.28
            PV |        702        0.66       99.94
        Recusa |          2        0.00       99.94
   Sem partido |         62        0.06      100.00
---------------+-----------------------------------
         Total |    106,105      100.00
Thank you in advance!
One option is fct_collapse from forcats:
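library(forcats)
# Assign the result back; otherwise the collapsed factor is only printed.
# The new levels reuse the party names, since a level called "pt" is easy to confuse with the separate party "PT".
df1$partido_pref <- fct_collapse(df1$partido_pref,
                                 PCdoB = c("PC do B", "PCdoB"),
                                 PTdoB = c("PT do B", "PTdoB"))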
If your problem is just getting rid of whitespace:
replace partido_pref = subinstr(partido_pref, " ", "", .)
See help string_functions for more options.
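Stripping every space would also change other values in the table above, such as "Não disponível", "Outra situação" and "Sem partido". If only the two party names should be merged, a more targeted approach may be safer. The following is a minimal sketch, assuming partido_pref is stored as a string variable, which the question does not confirm:

```stata
* Sketch only: assumes partido_pref is a string variable
* (check the storage type with: describe partido_pref)
replace partido_pref = "PCdoB" if partido_pref == "PC do B"
replace partido_pref = "PTdoB" if partido_pref == "PT do B"

* Confirm that only the merged spellings remain
tab partido_pref
```

If partido_pref turns out to be a numeric variable with value labels, the string comparison will not work as written. In that case it would probably need to be converted with decode first, and optionally converted back with encode afterwards.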
R is more flexible, but Stata can handle this level of simple text management.