Match value labels with na_tags

The haven package preserves both value labels and tagged NAs when reading Stata/SPSS files. For example, in the GSS's variable for self-employment, the labels suggest there are three different kinds of NA values:

library(tidyverse)
library(haven)

download.file(url="http://gss.norc.org/Documents/stata/2016_stata.zip",
              destfile = "2016_stata.zip")
unzip("2016_stata.zip")

gss <- read_dta("GSS2016.dta")

attr(gss$wrkslf, "labels")
#> self-employed  someone else            DK           IAP            NA 
#>             1             2            NA            NA            NA

Looking at the na_tag() for that variable, we can confirm that there are three types of NA tags:

table(na_tag(gss$wrkslf))
#> 
#>  d  i  n 
#>  4 90  5

My question is: how do we find out which strings in the labels correspond to which of the NA tags? In this example, we can infer that the d, i, and n tags probably correspond to the DK, IAP, and NA labels respectively, just based on their letters (and we could always check the documentation), but I'd like a way to do this programmatically, if possible.

This would be useful if, for example, you wanted to produce a tabulation of a particular variable which displays the values of a variable alongside their associated labels, including for tagged NAs.

Looking at the definition of print_labels() in haven, I see that NA tags and labels are associated by calling format_tagged_na() on the labels attribute, like this:

format_tagged_na(attr(gss$wrkslf, "labels"))
#> self-employed  someone else            DK           IAP            NA 
#>       "    1"       "    2"       "NA(d)"       "NA(i)"       "NA(n)"
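
Building on that, here is a minimal sketch of how the tag-to-label mapping could be turned into a lookup vector and used in a tabulation. To stay self-contained it does not download the GSS file. Instead it builds a small toy vector `x` whose labels mirror those of `gss$wrkslf`; the data values are made up, and the names `x`, `labs`, `tag_lookup`, and `tag_table` are illustrative only. It relies on `na_tag()` and `is_tagged_na()` accepting the labels attribute, which is itself a named double vector.

```r
library(haven)

# Toy stand-in for gss$wrkslf: same labels, made-up data values
x <- labelled(
  c(1, 2, 2, tagged_na("d"), tagged_na("i"), tagged_na("i"), tagged_na("n")),
  labels = c(
    "self-employed" = 1,
    "someone else"  = 2,
    "DK"            = tagged_na("d"),
    "IAP"           = tagged_na("i"),
    "NA"            = tagged_na("n")
  )
)

labs <- attr(x, "labels")

# Keep only the labels whose value is a tagged NA,
# then name each label string by its tag letter
is_tag <- is_tagged_na(labs)
tag_lookup <- setNames(names(labs)[is_tag], na_tag(labs)[is_tag])

# Tabulate the tags found in the data and attach the matching label
counts <- table(na_tag(x))
tag_table <- data.frame(
  tag   = names(counts),
  label = unname(tag_lookup[names(counts)]),
  n     = as.integer(counts)
)
tag_table
```

With the real data, replacing `x` with `gss$wrkslf` should give one row per tag, pairing d, i, and n with whatever label strings the file actually carries.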
