I have a question that I have an answer to, but I think my answer is inefficient, so I want a better way. I am querying a database using dbplyr and joining tables on a few ID columns.
For example, I have one table (Table_1) in the database with org_ID, sender_ID and geography columns. The first is the number of an organization within a given geography, and the second is the number of an organization it interacted with in that geography. IDs are only unique within a geography, so there is an org_ID 1 in geography 2, an org_ID 1 in geography 3, and so forth.
I also have a single identification table (Table_2) with an org_ID column and a corresponding geography column. Table_2 has no sender_ID column, but each sender_ID/geography pair in Table_1 does match an org_ID/geography pair in Table_2.
I want to combine these tables, but I effectively need to use Table_2 twice. What works, but is slow, is the following:
df <- Table_1 %>%
  left_join(Table_2, by = c("org_ID" = "org_ID", "geography" = "geography")) %>%
  left_join(Table_2, by = c("sender_ID" = "org_ID", "geography" = "geography")) %>%
  # various grouping and summary commands
  collect()
Any ideas for better ways?
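For anyone who wants to try this without the real database, here is a minimal, hypothetical reproduction of the two-join approach. It assumes the dbplyr and RSQLite packages are installed; the table contents and the `org_name` column are invented for illustration. It also renames the columns of Table_2 before each join, so the result gets readable names instead of `.x`/`.y` suffixes. That is only a readability tweak and is not claimed to be faster.

```r
library(dplyr)
library(dbplyr)

# Hypothetical toy stand-ins for the real tables, stored in an in-memory
# SQLite database (memdb_frame() requires the RSQLite package)
Table_1 <- memdb_frame(
  org_ID    = c(1L, 1L, 2L, 5L),
  sender_ID = c(2L, 3L, 1L, 6L),
  geography = c(2L, 3L, 2L, 2L)
)
Table_2 <- memdb_frame(
  org_ID    = c(1L, 2L, 1L),
  geography = c(2L, 2L, 3L),
  org_name  = c("A", "B", "C")  # hypothetical attribute column
)

# Two renamed, lazy views of the same lookup table
orgs    <- Table_2 %>% select(org_ID, geography, org_name)
senders <- Table_2 %>% select(sender_ID = org_ID, geography, sender_name = org_name)

# One row per Table_1 record, with both the organization's and the sender's info
df <- Table_1 %>%
  left_join(orgs,    by = c("org_ID", "geography")) %>%
  left_join(senders, by = c("sender_ID", "geography")) %>%
  collect()

df
```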
If it is mainly a question of how to do the two joins above more effectively (with only a single join), then I would suggest:
df <- Table_1 %>%
  left_join(Table_2, by = "geography", suffix = c(".x", ".y")) %>%
  # sender_ID exists only in Table_1, so it gets no suffix
  filter(org_ID.x == org_ID.y | sender_ID == org_ID.y) %>%
  # various grouping and summary commands
  collect()
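However, this imposes a certain handling of records in Table_1 that have no match in Table_2: the filter drops them, so it behaves more like an inner join than a left join. It also returns a separate row for each match (one for the organization and one for the sender) instead of putting both on one row as the two joins do.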
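A sketch of that behaviour on hypothetical toy data (invented values; dbplyr and RSQLite assumed to be installed): the Table_1 record with no match in Table_2 disappears, and records that match both an organization and a sender come back as two rows.

```r
library(dplyr)
library(dbplyr)

# Hypothetical toy tables in an in-memory SQLite database (needs RSQLite)
Table_1 <- memdb_frame(
  org_ID    = c(1L, 1L, 2L, 5L),
  sender_ID = c(2L, 3L, 1L, 6L),
  geography = c(2L, 3L, 2L, 2L)
)
Table_2 <- memdb_frame(
  org_ID    = c(1L, 2L, 1L),
  geography = c(2L, 2L, 3L),
  org_name  = c("A", "B", "C")  # hypothetical attribute column
)

# Single join on geography, then keep rows where Table_2's org_ID matches
# either the organization or the sender
single <- Table_1 %>%
  left_join(Table_2, by = "geography", suffix = c(".x", ".y")) %>%
  filter(org_ID.x == org_ID.y | sender_ID == org_ID.y) %>%
  collect()

nrow(single)               # 5: one row per match, from 4 Table_1 records
5L %in% single$org_ID.x    # FALSE: the unmatched record (org 5) was dropped
```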
Regarding performance: collect() copies the data from the database into R, so this may be the slowest point of your code.
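One way to keep collect() cheap, sketched on hypothetical toy data (dbplyr and RSQLite assumed): keep everything lazy, check the generated SQL with show_query(), and do the grouping and summarising inside the database so that only the summary is transferred. The group_by()/summarise() step is a stand-in for your own grouping and summary commands, and I have not measured it against your data.

```r
library(dplyr)
library(dbplyr)

# Hypothetical toy tables in an in-memory SQLite database (needs RSQLite)
Table_1 <- memdb_frame(
  org_ID    = c(1L, 1L, 2L, 5L),
  sender_ID = c(2L, 3L, 1L, 6L),
  geography = c(2L, 3L, 2L, 2L)
)
Table_2 <- memdb_frame(
  org_ID    = c(1L, 2L, 1L),
  geography = c(2L, 2L, 3L),
  org_name  = c("A", "B", "C")  # hypothetical attribute column
)

orgs    <- Table_2 %>% select(org_ID, geography, org_name)
senders <- Table_2 %>% select(sender_ID = org_ID, geography, sender_name = org_name)

# Still lazy: nothing has been run against the database yet
joined <- Table_1 %>%
  left_join(orgs,    by = c("org_ID", "geography")) %>%
  left_join(senders, by = c("sender_ID", "geography"))

show_query(joined)  # print the SQL that dbplyr will send

# Aggregate in the database; collect() only transfers one row per geography
summary_df <- joined %>%
  group_by(geography) %>%
  summarise(n_interactions = n()) %>%
  collect()

summary_df
```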