Creating a subset of data of missing values from two columns

Question

I have a dataset that contains the names of individuals and their telephone numbers, Tel_1 and Tel_2. Some of these telephone numbers are missing, and some are filled with 0 or 00000.

| Name  |  Tel_1  |  Tel_2 |
|-------|:-------:|-------:|
| Tom   |  87669  |        |
| Dave  |    0    |        |
| Jess  |    0    | 767589 |
| Mike  | 5673254 | 755995 |
| Jerry |         | 43789  |
| Yen   |         |        |
| Mary  | 34545   |        |

I want two outputs. The first should contain the records where either of the telephone numbers is zero or missing, as shown below:

| Name  | Tel_1 |  Tel_2 |
|-------|:-----:|-------:|
| Tom   | 87669 |        |
| Dave  |   0   |        |
| Jess  |   0   | 767589 |
| Jerry |       | 43789  |
| Yen   |       |        |
| Mary  | 34545 |        |

The second should contain the records where both telephone numbers are zero or missing, as shown below:

| Name | Tel_1 | Tel_2 |
|-----:|-------|-------|
| Dave | 0     |       |
| Yen  |       |       |

Answer 1

library(dplyr)

# First output: rows where either number is NA or 0
data %>% filter(is.na(Tel_1) | is.na(Tel_2) | Tel_1 == 0 | Tel_2 == 0)

# Second output: rows where both numbers are NA or 0
data %>% filter((is.na(Tel_1) | Tel_1 == 0) & (is.na(Tel_2) | Tel_2 == 0))
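
To try this on the sample data from the question, here is a minimal sketch. It assumes the blank cells were read in as NA and that both telephone columns are numeric; the helper name `is_blank` is made up for illustration and is not part of dplyr.

```r
library(dplyr)

# Sample data from the question; blanks are assumed to be NA
data <- data.frame(
  Name  = c("Tom", "Dave", "Jess", "Mike", "Jerry", "Yen", "Mary"),
  Tel_1 = c(87669, 0, 0, 5673254, NA, NA, 34545),
  Tel_2 = c(NA, NA, 767589, 755995, 43789, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where a value is NA or 0
is_blank <- function(x) is.na(x) | x == 0

# Either number is NA or 0
either <- data %>% filter(is_blank(Tel_1) | is_blank(Tel_2))

# Both numbers are NA or 0
both <- data %>% filter(is_blank(Tel_1) & is_blank(Tel_2))
```

Wrapping the NA-or-zero check in one function keeps the two filters readable and makes it harder to drop a pair of parentheses in the second one.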

Answer 2

Suppose your table is stored in dt. I recommend data.table for this: I find its subsetting syntax more intuitive than the tidyverse equivalent, and it is usually faster on large tables.

First things first:

library(data.table)
dt <- as.data.table(dt)

To generate table 1:

dt1 <- dt[is.na(Tel_1) | Tel_1 == 0 | is.na(Tel_2) | Tel_2 == 0]

Table 2:

dt2 <- dt[(Tel_1 == 0 | is.na(Tel_1)) & (Tel_2 == 0 | is.na(Tel_2))]

If efficiency is an issue, you can do the following:

dt[is.na(dt)] <- 0 # Replace all NAs with a zero.
dt1 <- dt[Tel_1 == 0 | Tel_2 == 0]
dt2 <- dt[Tel_1 == 0 & Tel_2 == 0]

This does the same with less code and fewer logical operators. Note that it overwrites the NAs in dt itself, so dt1 and dt2 will show 0 where the original had a missing value.
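
If the missing values need to survive in the result, one option is to do the replacement on a copy and use it only to build the row filter. This is a sketch under the assumption that both columns are numeric; `tmp` is just an illustrative name.

```r
library(data.table)

# Sample data from the question; blanks are assumed to be NA
dt <- data.table(
  Name  = c("Tom", "Dave", "Jess", "Mike", "Jerry", "Yen", "Mary"),
  Tel_1 = c(87669, 0, 0, 5673254, NA, NA, 34545),
  Tel_2 = c(NA, NA, 767589, 755995, 43789, NA, NA)
)

# Replace NAs with 0 on a copy, leaving dt untouched
tmp <- copy(dt)
tmp[is.na(Tel_1), Tel_1 := 0]
tmp[is.na(Tel_2), Tel_2 := 0]

# Use the cleaned copy only to decide which rows of dt to keep
dt1 <- dt[tmp$Tel_1 == 0 | tmp$Tel_2 == 0]
dt2 <- dt[tmp$Tel_1 == 0 & tmp$Tel_2 == 0]
```

The `copy()` call matters here because `:=` modifies a data.table by reference; without it, `tmp` and `dt` would be the same object.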

Answer 3

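If Tel_1 and Tel_2 are really characters (and not factors, which they may be if you have them in a data.frame), you're looking for something like

# Character matrix of the two telephone columns
mat <- as.matrix(df[, c("Tel_1", "Tel_2")])
# TRUE where a cell is NA, an empty string, or the string "0"
rowHasZeros <- is.na(mat) | (nchar(mat) == 0) | (mat == "0")
# At least one of the two columns is zero or missing
idx1 <- rowSums(rowHasZeros) > 0
version1 <- df[idx1,]

# Both columns are zero or missing
idx2 <- rowSums(rowHasZeros) == 2
version2 <- df[idx2,]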

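If the data is numeric:

# Numeric matrix of the two telephone columns
mat <- as.matrix(df[, c("Tel_1", "Tel_2")])
# TRUE where a cell is NA or 0
rowHasZeros <- is.na(mat) | (mat == 0)
# At least one of the two columns is zero or missing
idx1 <- rowSums(rowHasZeros) > 0
version1 <- df[idx1,]

# Both columns are zero or missing
idx2 <- rowSums(rowHasZeros) == 2
version2 <- df[idx2,]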
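
The question also mentions entries such as 00000, which a comparison against "0" would miss when the columns are character. A possible base R sketch for that case follows; it assumes blanks may be either NA or empty strings, and the helper name `is_blank` and the sample values are made up for illustration.

```r
# Sample data as character columns; "00000", "" and NA all count as blank
df <- data.frame(
  Name  = c("Tom", "Dave", "Jess", "Mike", "Jerry", "Yen", "Mary"),
  Tel_1 = c("87669", "0", "00000", "5673254", "", NA, "34545"),
  Tel_2 = c("", NA, "767589", "755995", "43789", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for NA, empty strings, and strings of only zeros
is_blank <- function(x) is.na(x) | grepl("^0*$", trimws(x))

# Logical matrix with one column per telephone column
blank <- sapply(df[c("Tel_1", "Tel_2")], is_blank)

version1 <- df[rowSums(blank) > 0, ]   # either number blank
version2 <- df[rowSums(blank) == 2, ]  # both numbers blank
```

The pattern `^0*$` matches the empty string as well as any run of zeros, so one regular expression covers both cases after whitespace is trimmed.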

3 answers

solution1
1 ACCEPTED 2019-07-11 19:26:20

solution2
1 2019-07-11 19:41:26

solution3
0 2019-07-11 19:28:27