
Compare one column in a dataframe to two columns of another dataframe

I have two dataframes, and I need to know whether the values in the first dataframe lie between two values (a minimum and a maximum) in the second dataframe.

I did something similar before with two other data frames: I used a nested loop and between() from {dplyr}. However, that dataset only had three variables, so I could make it work with 8 if statements. This is where I get stuck: dataframe1 has 62 variables and 477 observations, and dataframe2 has 124 variables (min and max values) and 50 observations. Below is an example of the two dataframes and the result I am looking for.
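For reference, `dplyr::between(x, left, right)` treats both bounds as inclusive. It returns NA, not FALSE, when a bound is missing, which matters here because most bounds in the second data frame are NA. That NA is why the range checks are summed with `na.rm = TRUE`. Here is a quick check:

```r
library(dplyr)

# Both bounds are inclusive: 5 and 20 themselves count as "between 5 and 20"
between(c(4, 5, 12, 20, 21), 5, 20)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE

# A missing bound gives NA rather than FALSE
between(12, NA_real_, 20)
#> [1] NA
```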

So I am looking for a solution where I don't have to write around a thousand if/else statements. I hope someone can help, or tell me whether this is even possible.

Here is an example of how the data looks. I can still change the dataframes, but this is where I am at.

Df1
   id type data1 data2 data3
1   1   ab     0     0     0
2   2   cd     0     0     0
3   3   dd     0    10     5
4   4   ed     0     0     0
5   5   kd     0     0    15
6   6   xd     0     5     0
7   7   ab     0     0     0
8   8   cd     0     0     0
9   9   dd     0    10    10
10 10   ed     0     0     0
11 11   kd     0     0    12
12 12   xd     0    12     0
13 13   ab     0     0     0
14 14   cd     0     0     0
15 15   dd     0     5    15
16 16   ed     0     0     0
17 17   kd     0     0    15
18 18   xd     0     7     0
19 19   ab     0     0     0
20 20   cd     0     0     0
21 21   dd     0    18    10
22 22   ed     0     0     0
23 23   kd     0     0     5

I usually match the rows on "type" and then check whether each data value lies between the lower and upper boundary.
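The matching described above can be written without any `if` statements. First look up each observation's bounds row with `match()` on `type`. Then loop over the data columns, not the rows, and add up the hits.

This is a minimal base R sketch on a five-row excerpt of the example data. It assumes that the bounds for every column `dataN` sit in the columns `dataN` (lower) and `dataNmax` (upper) of the second data frame, as in Df2. The names `data_cols`, `bounds` and `counts` are only illustrative.

```r
# A five-row excerpt of Df1 (ids 1, 3, 6, 9, 11) and the matching rows of Df2
df1 <- data.frame(id = c(1L, 3L, 6L, 9L, 11L),
                  type = c("ab", "dd", "xd", "dd", "kd"),
                  data1 = 0, data2 = c(0, 10, 5, 10, 0), data3 = c(0, 5, 0, 10, 12),
                  stringsAsFactors = FALSE)
df2 <- data.frame(type = c("ab", "dd", "xd", "kd"),
                  data1 = NA_real_, data1max = NA_real_,
                  data2 = c(NA, 5, 1, NA), data2max = c(NA, 20, 30, NA),
                  data3 = c(NA, 10, NA, 5), data3max = c(NA, 100, NA, 20),
                  stringsAsFactors = FALSE)

# Data columns to check (assumption: bounds for "dataN" are "dataN" and "dataNmax" in df2)
data_cols <- grep("^data[0-9]+$", names(df1), value = TRUE)

# Bounds row for each observation, matched on type (a type missing from df2 gives an all-NA row)
bounds <- df2[match(df1$type, df2$type), ]

# Loop over the columns, not the rows, and add up the hits
counts <- integer(nrow(df1))
for (col in data_cols) {
  hit <- df1[[col]] >= bounds[[col]] & df1[[col]] <= bounds[[paste0(col, "max")]]
  counts <- counts + (hit %in% TRUE)  # NA (no bounds for this column) counts as 0
}
df1$qualifyingfields <- counts
df1
```

The loop runs once per data column (62 times for the real data), whatever the number of rows.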

Df2
  type data1 data1max data2 data2max data3 data3max
1   ab    NA       NA    NA       NA    NA       NA
2   dd    NA       NA     5       20    10      100
3   xd    NA       NA     1       30    NA       NA
4   ed    NA       NA    NA       NA    NA       NA
5   cd    NA       NA    NA       NA    NA       NA
6   kd    NA       NA    NA       NA     5       20

The result should be a count, for each observation, of how many data fields fall within the qualifying range (in Df2, dataN is the lower bound and dataNmax the upper bound):

Df3
   id type qualifyingfields
1   1   ab                0
2   2   cd                0
3   3   dd                1
4   4   ed                0
5   5   kd                1
6   6   xd                1
7   7   ab                0
8   8   cd                0
9   9   dd                2
10 10   ed                0
11 11   kd                1
12 12   xd                1
13 13   ab                0
14 14   cd                0
15 15   dd                2
16 16   ed                0
17 17   kd                1
18 18   xd                1
19 19   ab                0
20 20   cd                0
21 21   dd                2
22 22   ed                0
23 23   kd                1
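library(dplyr)
library(tidyr)

df1 %>% 
  # attach the bounds for each type; columns present in both data frames
  # get the suffix "val" (from df1) or "min" (from df2)
  right_join(., df2, by = "type", suffix = c("val", "min")) %>% 
  group_by(type, id) %>% 
  # one row per id and column name, e.g. "data2val", "data2min", "data2max"
  pivot_longer(-c(id, type), names_to = "data", values_to = "value") %>% 
  # split "data2val" into "data2" and "val" at the digit/non-digit boundary,
  # so multi-digit names such as "data12val" are not split twice
  separate(col = data, into = c("data", "var"), sep = "(?<=\\d)(?=\\D)") %>% 
  # one row per id and data column, with val, min and max side by side
  pivot_wider(names_from = var, values_from = value) %>% 
  group_by(id, type, data) %>% 
  mutate(qualifyingfields = sum(between(val, min, max), na.rm = TRUE)) %>% 
  group_by(id, type) %>% 
  summarise(qualifyingfields = sum(qualifyingfields))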

#> # A tibble: 23 x 3
#> # Groups:   type, id [23]
#>       id type  qualifyingfields
#>    <int> <chr>            <int>
#>  1     1 ab                   0
#>  2     2 cd                   0
#>  3     3 dd                   1
#>  4     4 ed                   0
#>  5     5 kd                   1
#>  6     6 xd                   1
#>  7     7 ab                   0
#>  8     8 cd                   0
#>  9     9 dd                   2
#> 10    10 ed                   0
#> # ... with 13 more rows
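This sketch shows what the reshaping steps produce before anything is counted. It runs the join, pivot_longer(), separate() and pivot_wider() part of the pipeline above on a single observation: id 3, type "dd", with values copied from the question. The names `one_obs`, `dd_bounds`, `long` and `checked` are only illustrative.

After reshaping there is one row per data column, with `val`, `min` and `max` side by side, so the range check becomes a plain row-wise comparison. The answer groups by id, type and data before calling `between()`, which keeps `min` and `max` to a single value per call. As far as I know, dplyr releases before 1.1.0 only accepted single values for `left` and `right`, so the sketch uses `>=` and `<=` instead.

```r
library(dplyr)
library(tidyr)

# Row 3 of df1 and the matching "dd" row of df2, values copied from the question
one_obs <- data.frame(id = 3L, type = "dd", data1 = 0, data2 = 10, data3 = 5,
                      stringsAsFactors = FALSE)
dd_bounds <- data.frame(type = "dd", data1 = NA_real_, data1max = NA_real_,
                        data2 = 5, data2max = 20, data3 = 10, data3max = 100,
                        stringsAsFactors = FALSE)

# Join and reshape, stopped before the counting step
long <- one_obs %>%
  right_join(dd_bounds, by = "type", suffix = c("val", "min")) %>%
  pivot_longer(-c(id, type), names_to = "data", values_to = "value") %>%
  separate(col = data, into = c("data", "var"), sep = "(?<=\\d)") %>%
  pivot_wider(names_from = var, values_from = value)
long

# Inclusive range check per row; a missing bound gives NA
checked <- long %>% mutate(in_range = val >= min & val <= max)
checked

# Count of qualifying fields for this observation (NA dropped)
sum(checked$in_range, na.rm = TRUE)
```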

Data:

df1 <- read.table(text="   id type data1 data2 data3
1   1   ab     0     0     0
2   2   cd     0     0     0
3   3   dd     0    10     5
4   4   ed     0     0     0
5   5   kd     0     0    15
6   6   xd     0     5     0
7   7   ab     0     0     0
8   8   cd     0     0     0
9   9   dd     0    10    10
10 10   ed     0     0     0
11 11   kd     0     0    12
12 12   xd     0    12     0
13 13   ab     0     0     0
14 14   cd     0     0     0
15 15   dd     0     5    15
16 16   ed     0     0     0
17 17   kd     0     0    15
18 18   xd     0     7     0
19 19   ab     0     0     0
20 20   cd     0     0     0
21 21   dd     0    18    10
22 22   ed     0     0     0
23 23   kd     0     0     5", 
header=T, stringsAsFactors=F)

df2 <- read.table(text="  type data1 data1max data2 data2max data3 data3max
1   ab    NA       NA    NA       NA    NA       NA
2   dd    NA       NA     5       20    10      100
3   xd    NA       NA     1       30    NA       NA
4   ed    NA       NA    NA       NA    NA       NA
5   cd    NA       NA    NA       NA    NA       NA
6   kd    NA       NA    NA       NA     5       20", 
header=T, stringsAsFactors=F, na.strings = "NA")

Here is a more general solution that works regardless of how many data[n] columns there are:

library('dplyr')
library('tidyr')

# Make dataframes tidy: one row per observation and data column
Df1_tidy <- df1 %>%   
    gather(key='data_name', value='value', -(id:type))

# One row per type and data column, with the bounds in Min and Max columns
Df2_tidy <- df2 %>%
    gather(key='data_name', value='value', -type) %>%
    mutate(limit=ifelse(grepl('max', data_name), 'Max', 'Min'),
           data_name=gsub('max', '', data_name)) %>% 
    spread(limit, value) 

# Count qualifying fields (bounds are inclusive; a missing bound does not qualify)
Df3 <- full_join(Df1_tidy, Df2_tidy, by=c('type', 'data_name')) %>%
    group_by(id, type) %>%
    summarise(qualifyingfields = sum(value >= Min & value <= Max, na.rm=TRUE)) %>%
    ungroup()

Df3
# # A tibble: 23 x 3
#       id type  qualifyingfields
#    <int> <chr>            <int>
#  1     1 ab                   0
#  2     2 cd                   0
#  3     3 dd                   1
#  4     4 ed                   0
#  5     5 kd                   1
#  6     6 xd                   1
#  7     7 ab                   0
#  8     8 cd                   0
#  9     9 dd                   2
# 10    10 ed                   0
# # ... with 13 more rows

Get the data (copied from @M--'s answer):

df1 <- read.table(text="   id type data1 data2 data3
1   1   ab     0     0     0
2   2   cd     0     0     0
3   3   dd     0    10     5
4   4   ed     0     0     0
5   5   kd     0     0    15
6   6   xd     0     5     0
7   7   ab     0     0     0
8   8   cd     0     0     0
9   9   dd     0    10    10
10 10   ed     0     0     0
11 11   kd     0     0    12
12 12   xd     0    12     0
13 13   ab     0     0     0
14 14   cd     0     0     0
15 15   dd     0     5    15
16 16   ed     0     0     0
17 17   kd     0     0    15
18 18   xd     0     7     0
19 19   ab     0     0     0
20 20   cd     0     0     0
21 21   dd     0    18    10
22 22   ed     0     0     0
23 23   kd     0     0     5", 
header=T, stringsAsFactors=F)

df2 <- read.table(text="  type data1 data1max data2 data2max data3 data3max
1   ab    NA       NA    NA       NA    NA       NA
2   dd    NA       NA     5       20    10      100
3   xd    NA       NA     1       30    NA       NA
4   ed    NA       NA    NA       NA    NA       NA
5   cd    NA       NA    NA       NA    NA       NA
6   kd    NA       NA    NA       NA     5       20", 
header=T, stringsAsFactors=F, na.strings = "NA")
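In current tidyr, `gather()` and `spread()` are superseded by `pivot_longer()` and `pivot_wider()`. Below is a sketch of the same tidy-data approach written with the newer functions. The example data is rebuilt inline so the block runs on its own. `df1` and `df2` match the answers; `obs_long`, `bounds_tidy` and `counts` are only illustrative names.

A `left_join()` on both `type` and `data_name` is used here, so an observation whose type has no bounds row simply counts 0.

```r
library(dplyr)
library(tidyr)

# Example data from the question, rebuilt compactly
df1 <- data.frame(
  id    = 1:23,
  type  = c(rep(c("ab", "cd", "dd", "ed", "kd", "xd"), 3), "ab", "cd", "dd", "ed", "kd"),
  data1 = 0,
  data2 = c(0, 0, 10, 0, 0, 5, 0, 0, 10, 0, 0, 12, 0, 0, 5, 0, 0, 7, 0, 0, 18, 0, 0),
  data3 = c(0, 0, 5, 0, 15, 0, 0, 0, 10, 0, 12, 0, 0, 0, 15, 0, 15, 0, 0, 0, 10, 0, 5),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  type  = c("ab", "dd", "xd", "ed", "cd", "kd"),
  data1 = NA_real_, data1max = NA_real_,
  data2 = c(NA, 5, 1, NA, NA, NA), data2max = c(NA, 20, 30, NA, NA, NA),
  data3 = c(NA, 10, NA, NA, NA, 5), data3max = c(NA, 100, NA, NA, NA, 20),
  stringsAsFactors = FALSE
)

# One row per observation and data column
obs_long <- df1 %>%
  pivot_longer(-c(id, type), names_to = "data_name", values_to = "value")

# One row per type and data column, with the bounds in Min and Max
bounds_tidy <- df2 %>%
  pivot_longer(-type, names_to = "data_name", values_to = "value") %>%
  mutate(limit     = ifelse(grepl("max$", data_name), "Max", "Min"),
         data_name = sub("max$", "", data_name)) %>%
  pivot_wider(names_from = limit, values_from = value)

# Inclusive range check; a missing bound gives NA, which na.rm = TRUE drops
counts <- obs_long %>%
  left_join(bounds_tidy, by = c("type", "data_name")) %>%
  group_by(id, type) %>%
  summarise(qualifyingfields = sum(value >= Min & value <= Max, na.rm = TRUE),
            .groups = "drop")

counts
```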
