SQL equivalent of an R query

I have two data sets, author_data and paper_author.

author_data:

author_id   author_name         author_affiliation
25          William H. Nailon
37          P. B. Littlewood    Cavendish Laboratory|Cambridge University
44          A. Kuroiwa          Department of Molecular Biology

paper_author:

paper_id     author_id      author_name      author_affiliation
  1          521630         Ayman Kaheel     Cairo Microsoft Innovation Lab
  1          972575       Mahmoud Refaat     Cairo Microsoft Innovation Lab

I have run the following query in R:

author_data[which(author_data$author_id %in% paper_author$author_id &
                  author_data$author_name %in% paper_author$author_name & 
                  author_data$author_affiliation %in% paper_author$author_affiliation), ]

That is, I want to find the rows of author_data that have a match in paper_author on all three columns: author_id, author_name and author_affiliation.

I have written a query to get this result in SQL, but I am not getting it right. The query I tried is:

statement <- "select
              author_data.author_id,
              author_data.author_name,
              author_data.author_affiliation
        FROM author_data
        INNER JOIN paper_author
          ON author_data.author_id = paper_author.author_id
            AND author_data.author_name = paper_author.author_name
            AND author_data.author_affiliation = paper_author.author_affiliation"

With this query I get more rows than there are in author_data, although the result should be a subset of author_data. I cannot figure out what is wrong, as I am new to SQL.

What is wrong with this query?

Thanks

There is a difference between which in R and join in SQL. While which effectively subsets the given data frame, join returns one row for every pair of rows that satisfies the join condition. I am almost sure that in your case the same combination of author_id, author_name and author_affiliation occurs multiple times in paper_author. As a result, each row in author_data is multiplied by the number of matching rows in paper_author.
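The row multiplication described above can be reproduced on a toy dataset. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database rather than the asker's environment, and the two paper_author rows (one author listed on two papers) are invented for the demonstration.

```python
import sqlite3

# In-memory SQLite database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author_data (
        author_id INTEGER, author_name TEXT, author_affiliation TEXT);
    CREATE TABLE paper_author (
        paper_id INTEGER, author_id INTEGER,
        author_name TEXT, author_affiliation TEXT);

    INSERT INTO author_data VALUES
        (37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University');

    -- Assumption: the same author appears on two different papers.
    INSERT INTO paper_author VALUES
        (1, 37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University'),
        (2, 37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University');
""")

# The asker's join, unchanged apart from table aliases.
join_sql = """
    SELECT a.author_id, a.author_name, a.author_affiliation
    FROM author_data a
    INNER JOIN paper_author p
      ON a.author_id = p.author_id
     AND a.author_name = p.author_name
     AND a.author_affiliation = p.author_affiliation
"""
joined_rows = conn.execute(join_sql).fetchall()
author_count = conn.execute("SELECT COUNT(*) FROM author_data").fetchone()[0]

# One author row, but the join returns it once per matching paper_author row.
print(author_count, len(joined_rows))  # prints: 1 2
```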

Your query was almost correct; you need to add distinct or group by, or use exists:

Distinct:

select
   distinct
   author_data.author_id,
   author_data.author_name,
   author_data.author_affiliation
from
   author_data
   INNER JOIN paper_author
          ON author_data.author_id = paper_author.author_id
            AND author_data.author_name = paper_author.author_name
            AND author_data.author_affiliation = paper_author.author_affiliation

Group by:

select
   author_data.author_id,
   author_data.author_name,
   author_data.author_affiliation
from
   author_data
   INNER JOIN paper_author
          ON author_data.author_id = paper_author.author_id
            AND author_data.author_name = paper_author.author_name
            AND author_data.author_affiliation = paper_author.author_affiliation
group by
   author_data.author_id,
   author_data.author_name,
   author_data.author_affiliation

You can also use exists:

select
   author_data.author_id,
   author_data.author_name,
   author_data.author_affiliation
from
   author_data
where
   exists (select 1 from paper_author where
       author_data.author_id = paper_author.author_id
       AND author_data.author_name = paper_author.author_name
       AND author_data.author_affiliation = paper_author.author_affiliation
       )
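To check that the three variants agree, they can be run side by side on a small dataset. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; the table contents are made up for illustration (two author_data rows taken from the question, with invented paper_author rows), and the aliases a and p are introduced here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author_data (
        author_id INTEGER, author_name TEXT, author_affiliation TEXT);
    CREATE TABLE paper_author (
        paper_id INTEGER, author_id INTEGER,
        author_name TEXT, author_affiliation TEXT);

    INSERT INTO author_data VALUES
        (37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University'),
        (44, 'A. Kuroiwa', 'Department of Molecular Biology');

    -- Hypothetical: author 37 is on two papers, author 44 on none.
    INSERT INTO paper_author VALUES
        (1, 37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University'),
        (2, 37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University');
""")

cols = "a.author_id, a.author_name, a.author_affiliation"
match = """a.author_id = p.author_id
       AND a.author_name = p.author_name
       AND a.author_affiliation = p.author_affiliation"""

queries = {
    "distinct": f"""SELECT DISTINCT {cols}
                    FROM author_data a
                    INNER JOIN paper_author p ON {match}""",
    "group_by": f"""SELECT {cols}
                    FROM author_data a
                    INNER JOIN paper_author p ON {match}
                    GROUP BY {cols}""",
    "exists":   f"""SELECT {cols}
                    FROM author_data a
                    WHERE EXISTS (SELECT 1 FROM paper_author p WHERE {match})""",
}

# Run each variant and sort so the comparison ignores row order.
results = {name: sorted(conn.execute(sql).fetchall())
           for name, sql in queries.items()}

for name, rows in results.items():
    print(name, rows)
```

On this data each variant returns the single row for author 37. One difference worth keeping in mind: if author_data itself contained duplicate rows, distinct and group by would collapse them, while exists would keep every original row, which is closer to subsetting a data frame.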

Try this.

SELECT author_data.author_id,author_data.author_name,author_data.author_affiliation
FROM author_data, paper_author
WHERE author_data.author_id = paper_author.author_id 
AND author_data.author_name=paper_author.author_name 
AND author_data.author_affiliation=paper_author.author_affiliation
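Note that a comma-separated FROM list with the conditions in WHERE is an implicit inner join, so on its own it can be expected to return the same multiplied rows as the query in the question. The sketch below illustrates this, assuming SQLite via Python's sqlite3 module and invented sample rows; adding DISTINCT is shown as one possible way to remove the repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author_data (
        author_id INTEGER, author_name TEXT, author_affiliation TEXT);
    CREATE TABLE paper_author (
        paper_id INTEGER, author_id INTEGER,
        author_name TEXT, author_affiliation TEXT);

    INSERT INTO author_data VALUES
        (37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University');

    -- Hypothetical: the same author is listed on two papers.
    INSERT INTO paper_author VALUES
        (1, 37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University'),
        (2, 37, 'P. B. Littlewood', 'Cavendish Laboratory|Cambridge University');
""")

columns = ("author_data.author_id, author_data.author_name, "
           "author_data.author_affiliation")
condition = """author_data.author_id = paper_author.author_id
    AND author_data.author_name = paper_author.author_name
    AND author_data.author_affiliation = paper_author.author_affiliation"""

# Implicit (comma) join, explicit inner join, and a de-duplicated variant.
comma_join = f"SELECT {columns} FROM author_data, paper_author WHERE {condition}"
inner_join = f"SELECT {columns} FROM author_data INNER JOIN paper_author ON {condition}"
deduplicated = (f"SELECT DISTINCT {columns} "
                f"FROM author_data, paper_author WHERE {condition}")

comma_rows = conn.execute(comma_join).fetchall()
inner_rows = conn.execute(inner_join).fetchall()
distinct_rows = conn.execute(deduplicated).fetchall()

# Both join spellings return two rows here; DISTINCT reduces them to one.
print(len(comma_rows), len(inner_rows), len(distinct_rows))  # prints: 2 2 1
```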
