
Creating a subset of data of missing values from two columns

I have a dataset containing the names of individuals and two telephone numbers for each, Tel_1 and Tel_2. Some of the telephone fields are empty, and some are filled with 0 or 00000.

| Name  |  Tel_1  |  Tel_2 |
|-------|:-------:|-------:|
| Tom   |  87669  |        |
| Dave  |    0    |        |
| Jess  |    0    | 767589 |
| Mike  | 5673254 | 755995 |
| Jerry |         | 43789  |
| Yen   |         |        |
| Mary  | 34545   |        |

I want two outputs. The first should contain the records where either telephone number is zero or missing, as shown below:

| Name  | Tel_1 |  Tel_2 |
|-------|:-----:|-------:|
| Tom   | 87669 |        |
| Dave  |   0   |        |
| Jess  |   0   | 767589 |
| Jerry |       | 43789  |
| Yen   |       |        |
| Mary  | 34545 |        |

The second should contain the records where both telephone numbers are 0 or missing, as shown below:

| Name | Tel_1 | Tel_2 |
|-----:|-------|-------|
| Dave | 0     |       |
| Yen  |       |       |

library(dplyr)

# First output: either number is missing or zero
data %>% filter(is.na(Tel_1) | is.na(Tel_2) | Tel_1 == 0 | Tel_2 == 0)

# Second output: both numbers are missing or zero
data %>% filter((is.na(Tel_1) | Tel_1 == 0) & (is.na(Tel_2) | Tel_2 == 0))
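
To make the two filters easy to try out, here is a minimal sketch on sample data rebuilt from the question. It assumes the telephone columns are numeric and that empty cells were read in as NA. The names `either_blank` and `both_blank` are only illustrative. It uses `if_any()` and `if_all()`, which need dplyr 1.0.4 or later, to apply one condition to both columns instead of spelling it out twice.

```r
library(dplyr)

# Sample data rebuilt from the question; blanks are assumed to be NA.
data <- data.frame(
  Name  = c("Tom", "Dave", "Jess", "Mike", "Jerry", "Yen", "Mary"),
  Tel_1 = c(87669, 0, 0, 5673254, NA, NA, 34545),
  Tel_2 = c(NA, NA, 767589, 755995, 43789, NA, NA)
)

# Rows where at least one of the two numbers is missing or zero.
either_blank <- data %>%
  filter(if_any(c(Tel_1, Tel_2), ~ is.na(.x) | .x == 0))

# Rows where both numbers are missing or zero.
both_blank <- data %>%
  filter(if_all(c(Tel_1, Tel_2), ~ is.na(.x) | .x == 0))
```

With more than two telephone columns, only the column selection inside `c(...)` would need to change.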

Suppose your table is stored in dt. I recommend using data.table for this, as its slicing syntax is more intuitive (and often faster on large tables) than the tidyverse equivalent.

First things first:

library(data.table)
dt <- as.data.table(dt)

To generate table 1:

dt1 <- dt[is.na(Tel_1) | Tel_1 == 0 | is.na(Tel_2) | Tel_2 == 0]

Table 2:

dt2 <- dt[(Tel_1 == 0 | is.na(Tel_1)) & (Tel_2 == 0 | is.na(Tel_2))]

If efficiency is an issue, you can do the following:

dt[is.na(dt)] <- 0 # Replace all NAs with a zero.
dt1 <- dt[Tel_1 == 0 | Tel_2 == 0]
dt2 <- dt[Tel_1 == 0 & Tel_2 == 0]

This does the same with less code and fewer logical operators.
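
One caveat with the NA-replacement shortcut is that it overwrites the missing values in dt, so the subsets show 0 where the original had NA. If that matters, a small helper keeps the code short without touching the data. This is only a sketch: `is_blank` is a hypothetical helper name, and the sample table assumes numeric columns with blanks stored as NA.

```r
library(data.table)

# Sample data rebuilt from the question; blanks are assumed to be NA.
dt <- data.table(
  Name  = c("Tom", "Dave", "Jess", "Mike", "Jerry", "Yen", "Mary"),
  Tel_1 = c(87669, 0, 0, 5673254, NA, NA, 34545),
  Tel_2 = c(NA, NA, 767589, 755995, 43789, NA, NA)
)

# Hypothetical helper: TRUE where a number is missing or zero.
# is.na(x) is TRUE for NA, so the result never contains NA itself.
is_blank <- function(x) is.na(x) | x == 0

dt1 <- dt[is_blank(Tel_1) | is_blank(Tel_2)]  # either number blank
dt2 <- dt[is_blank(Tel_1) & is_blank(Tel_2)]  # both numbers blank
```

Here dt is left unchanged, so the NA entries are still visible in dt1 and dt2.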

If Tel_1 and Tel_2 are really character columns (and not factors, which they may be if you have them in a data.frame), you're looking for something like:

mat <- as.matrix(df[, c("Tel_1", "Tel_2")])
# TRUE where a cell is NA, an empty string, or "0"
rowHasZeros <- is.na(mat) | (nchar(mat) == 0) | (mat == "0")
idx1 <- rowSums(rowHasZeros) > 0   # at least one number is blank
version1 <- df[idx1, ]

idx2 <- rowSums(rowHasZeros) == 2  # both numbers are blank
version2 <- df[idx2, ]

If the data are numeric:

mat <- as.matrix(df[, c("Tel_1", "Tel_2")])
# TRUE where a cell is NA or 0
rowHasZeros <- is.na(mat) | (mat == 0)
idx1 <- rowSums(rowHasZeros) > 0   # at least one number is blank
version1 <- df[idx1, ]

idx2 <- rowSums(rowHasZeros) == 2  # both numbers are blank
version2 <- df[idx2, ]
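
The question also mentions entries such as 00000, which the character comparison against "0" does not catch. A possible base R sketch for that case follows. It assumes the numbers are stored as strings, and `is_blank` is a hypothetical helper that treats NA, empty or whitespace-only strings, and strings made up only of zeros as blank. The sample values are illustrative, not the asker's actual data.

```r
# Illustrative sample data: telephone numbers stored as character strings.
df <- data.frame(
  Name  = c("Tom", "Dave", "Jess", "Mike", "Jerry", "Yen", "Mary"),
  Tel_1 = c("87669", "0", "00000", "5673254", "", NA, "34545"),
  Tel_2 = c("", NA, "767589", "755995", "43789", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for NA, empty or whitespace-only strings,
# and strings consisting only of zeros ("0", "00000", ...).
# as.character() also makes it work if the columns are factors.
is_blank <- function(x) {
  x <- trimws(as.character(x))
  is.na(x) | x == "" | grepl("^0+$", x)
}

# Logical matrix with one column per telephone field.
blank <- sapply(df[c("Tel_1", "Tel_2")], is_blank)

version1 <- df[rowSums(blank) > 0, ]             # either number blank
version2 <- df[rowSums(blank) == ncol(blank), ]  # both numbers blank
```

Comparing against `ncol(blank)` rather than a hard-coded 2 keeps the second filter correct if more telephone columns are added to the selection.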

