How does DISTINCT work when using JPA and Hibernate

What column does DISTINCT work with in JPA and is it possible to change it?

Here's an example JPA query using DISTINCT:

select DISTINCT c from Customer c

This doesn't make a lot of sense to me: which column is the distinct based on? Is it specified on the entity as an annotation? I couldn't find one.

I would like to specify the column to make the distinction on, something like:

select DISTINCT(c.name) c from Customer c

I'm using MySQL and Hibernate.

You're close.

select DISTINCT(c.name) from Customer c
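
Note that DISTINCT is not a function that takes a column: the parentheses only group the expression, and DISTINCT applies to the whole selected row. The following is a minimal plain-Java sketch of that behaviour, not a JPA call. The customer rows and the `city` column are invented for illustration and are not part of the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctProjectionSketch {

    // Mimics "select distinct c.name from Customer c": each row is projected
    // to the name first, then duplicate names are dropped.
    static List<String> distinctNames(List<String[]> customers) {
        return customers.stream()
                .map(row -> row[0])
                .distinct()
                .collect(Collectors.toList());
    }

    // Mimics "select distinct c.name, c.city from Customer c" (city is a
    // hypothetical column): a row is a duplicate only if every selected
    // value matches.
    static List<List<String>> distinctNameAndCity(List<String[]> customers) {
        return customers.stream()
                .map(row -> Arrays.asList(row[0], row[1]))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> customers = Arrays.asList(
                new String[] {"Alice", "Paris"},
                new String[] {"Alice", "Berlin"},
                new String[] {"Bob", "Paris"},
                new String[] {"Alice", "Paris"});

        System.out.println(distinctNames(customers));       // [Alice, Bob]
        System.out.println(distinctNameAndCity(customers)); // [[Alice, Paris], [Alice, Berlin], [Bob, Paris]]
    }
}
```

So selecting only `c.name` is what makes the result distinct by name; adding more columns to the select list changes what counts as a duplicate.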

Depending on the underlying JPQL or Criteria API query type, DISTINCT has two meanings in JPA.

Scalar queries

For scalar queries, which return a scalar projection, like the following query:

List<Integer> publicationYears = entityManager
.createQuery(
    "select distinct year(p.createdOn) " +
    "from Post p " +
    "order by year(p.createdOn)", Integer.class)
.getResultList();

LOGGER.info("Publication years: {}", publicationYears);

The DISTINCT keyword should be passed to the underlying SQL statement because we want the DB engine to filter duplicates prior to returning the result set:

SELECT DISTINCT
    extract(YEAR FROM p.created_on) AS col_0_0_
FROM
    post p
ORDER BY
    extract(YEAR FROM p.created_on)

-- Publication years: [2016, 2018]
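
For comparison, here is a rough in-memory equivalent of that scalar query in plain Java. It is only a sketch: the `createdOn` dates are made up, and the class and method names are hypothetical.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class DistinctYearsSketch {

    // In-memory counterpart of:
    //   select distinct year(p.createdOn) from Post p order by year(p.createdOn)
    static List<Integer> publicationYears(List<LocalDate> createdOnValues) {
        return createdOnValues.stream()
                .map(LocalDate::getYear)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented sample dates, one per post row.
        List<LocalDate> createdOn = Arrays.asList(
                LocalDate.of(2018, 5, 1),
                LocalDate.of(2016, 8, 30),
                LocalDate.of(2016, 10, 12),
                LocalDate.of(2018, 1, 2));

        System.out.println(publicationYears(createdOn)); // [2016, 2018]
    }
}
```

Doing this in the application would mean fetching every row first, which is why letting the database apply DISTINCT is the better choice for scalar projections.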

Entity queries

For entity queries, DISTINCT has a different meaning.

Without using DISTINCT, a query like the following one:

List<Post> posts = entityManager
.createQuery(
    "select p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();

LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

is going to JOIN the post and the post_comment tables like this:

SELECT p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'

-- Fetched the following Post entity identifiers: [1, 1]

But the parent post records are duplicated in the result set for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.
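
The duplication can be pictured with a small plain-Java sketch. It is a simplified model and not Hibernate's actual result processing: the `Post` class, the `hydrate` method, and the comment identifiers 10 and 11 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinFetchDuplicatesSketch {

    // Simplified stand-in for the Post entity.
    static class Post {
        final long id;
        final List<Long> commentIds = new ArrayList<>();

        Post(long id) {
            this.id = id;
        }
    }

    // Each row is {postId, commentId}, as produced by the join.
    // One list element is emitted per row; rows that share a post id resolve
    // to the same Post instance, so the list holds repeated references.
    static List<Post> hydrate(long[][] joinRows) {
        Map<Long, Post> byId = new HashMap<>();
        List<Post> result = new ArrayList<>();
        for (long[] row : joinRows) {
            Post post = byId.computeIfAbsent(row[0], Post::new);
            post.commentIds.add(row[1]);
            result.add(post);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Post> posts = hydrate(new long[][] {{1, 10}, {1, 11}});
        posts.forEach(p -> System.out.println(p.id));     // prints 1 twice
        System.out.println(posts.get(0) == posts.get(1)); // true
    }
}
```

The two list elements are the same object, which is why the logged identifiers are [1, 1].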

To eliminate the duplicate Post entity references, we need to use DISTINCT:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.getResultList();
 
LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

But then DISTINCT is also passed to the SQL query, and that's not desirable at all:

SELECT DISTINCT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'
 
-- Fetched the following Post entity identifiers: [1]

Passing DISTINCT to the SQL query makes the execution plan run an extra Sort phase. That adds overhead without bringing any value, because the child PK column already makes every parent-child combination unique:

Unique  (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
  ->  Sort  (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
        Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
              Hash Cond: (pc.post_id = p.id)
              ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
              ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
                          Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
                          Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms
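
To see why the SQL-level DISTINCT cannot remove anything here, consider this plain-Java sketch of row-level deduplication. The rows are reduced to two columns and the identifier values are invented for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SqlDistinctNoOpSketch {

    // Row-level DISTINCT: two rows are duplicates only if all columns match.
    static Set<List<Long>> distinctRows(List<List<Long>> rows) {
        return new LinkedHashSet<>(rows);
    }

    public static void main(String[] args) {
        // Columns: post.id, post_comment.id (the child primary key).
        List<List<Long>> joinRows = Arrays.asList(
                Arrays.asList(1L, 10L),
                Arrays.asList(1L, 11L));

        // The child PK differs on every row, so nothing is removed.
        System.out.println(distinctRows(joinRows).size() == joinRows.size()); // true
    }
}
```

The database still has to sort or hash the rows to find that out, which is the overhead visible in the plan above.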

Entity queries with HINT_PASS_DISTINCT_THROUGH

To eliminate the Sort phase from the execution plan, we need to set the Hibernate-specific HINT_PASS_DISTINCT_THROUGH query hint to false through the JPA setHint method:

List<Post> posts = entityManager
.createQuery(
    "select distinct p " +
    "from Post p " +
    "left join fetch p.comments " +
    "where p.title = :title", Post.class)
.setParameter(
    "title", 
    "High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
 
LOGGER.info(
    "Fetched the following Post entity identifiers: {}", 
    posts.stream().map(Post::getId).collect(Collectors.toList())
);

And now, the SQL query will not contain DISTINCT, but the duplicate Post entity references are still going to be removed:

SELECT
       p.id AS id1_0_0_,
       pc.id AS id1_1_1_,
       p.created_on AS created_2_0_0_,
       p.title AS title3_0_0_,
       pc.post_id AS post_id3_1_1_,
       pc.review AS review2_1_1_,
       pc.post_id AS post_id3_1_0__
FROM   post p
LEFT OUTER JOIN
       post_comment pc ON p.id=pc.post_id
WHERE
       p.title='High-Performance Java Persistence eBook has been released!'
 
-- Fetched the following Post entity identifiers: [1]

And the Execution Plan is going to confirm that we no longer have an extra Sort phase this time:

Hash Right Join  (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
  Hash Cond: (pc.post_id = p.id)
  ->  Seq Scan on post_comment pc  (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
  ->  Hash  (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on post p  (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
              Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
              Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms
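
With the hint set to false, the duplicates are removed in the application after the rows come back rather than by the database. Conceptually that step looks like the sketch below. This is an assumption-laden illustration and not Hibernate's actual code; `distinctByReference` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DistinctReferencesSketch {

    // Keeps the first occurrence of each object reference, preserving order.
    // Comparison is by identity (==), not by equals().
    static <T> List<T> distinctByReference(List<T> rootEntities) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        List<T> unique = new ArrayList<>();
        for (T entity : rootEntities) {
            if (seen.add(entity)) {
                unique.add(entity);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        Object post = new Object(); // stands in for the Post with id 1
        List<Object> fetched = new ArrayList<>();
        fetched.add(post);
        fetched.add(post); // the same reference, once per joined comment row

        System.out.println(distinctByReference(fetched).size()); // 1
    }
}
```

This pass is linear in the number of returned rows, so it is cheap compared with a sort on the database side.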

@Entity
@NamedQuery(name = "Customer.listUniqueNames", 
            query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
        ...

        private String name;

        public static List<String> listUniqueNames() {
             return getEntityManager().createNamedQuery(
                   "Customer.listUniqueNames", String.class)
                   .getResultList();
        }
}

Update: See the top-voted answer please.

My own is currently obsolete. Only kept here for historical reasons.

DISTINCT in HQL is usually needed in joins, not in simple examples like yours.

See also "How do you create a Distinct query in HQL".

I agree with kazanaki's answer, and it helped me. I wanted to select the whole entity, so I used:

 select DISTINCT(c) from Customer c

In my case I have a many-to-many relationship, and I want to load entities with collections in one query.

I used LEFT JOIN FETCH and at the end I had to make the result distinct.

I would use JPA's constructor expression feature. See also the following answer:

JPQL Constructor Expression - org.hibernate.hql.ast.QuerySyntaxException:Table is not mapped

Following the example in the question, it would be something like this:

SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
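
The constructor expression needs a class with a constructor that matches the selected values. The answer does not show that class, so the following is only a guess at what a minimal `MyNameType` could look like. In a real project it would be declared `public` in the `com.mypackage` package named in the query; the package declaration is left out here to keep the sketch self-contained.

```java
class MyNameType {

    private final String name;

    // "new com.mypackage.MyNameType(c.name)" in the query calls a
    // constructor that takes the selected value, here a single String.
    public MyNameType(String name) {
        this.name = name;
    }

    public String getName() {
        return name;
    }

    @Override
    public String toString() {
        return "MyNameType{name='" + name + "'}";
    }
}
```

The query would then return a `List<MyNameType>` with one element per distinct name.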

I'm adding an answer that is slightly specific, in case someone encounters the same issue as I did and finds this question.

I used JPQL with query annotations (no query building). I needed to get distinct values for an entity that was embedded into another entity; the relationship was declared via a @ManyToOne annotation.

I have two database tables:

  • MainEntity, for which I want distinct values
  • LinkEntity, which is a relationship table between MainEntity and another table. It has a composite primary key made up of its three columns.

In Java Spring code, this leads to three classes:

LinkEntity:

@Entity
@Immutable
@Table(name="link_entity")
public class LinkEntity implements Entity {

    @EmbeddedId
    private LinkEntityPK pk;

    // ... Getter, setter, toString()
}

LinkEntityPK:

@Embeddable
public class LinkEntityPK implements Entity, Serializable {

    /** The main entity we want to have distinct values of */
    @ManyToOne
    @JoinColumn(name = "code_entity")
    private MainEntity mainEntity;

    /** */
    @Column(name = "code_pk2")
    private String codeOperation;

    /** */
    @Column(name = "code_pk3")
    private String codeFonction;

MainEntity:

@Entity
@Immutable
@Table(name = "main_entity")
public class MainEntity implements Entity {

    /** We use this for LinkEntity*/
    @Id
    @Column(name="code_entity")
    private String codeEntity;


    private String name;
    // And other attributes, getters and setters

So the final query to get distinct values for the main entity is:

@Repository
// The ID type parameter matches the @EmbeddedId type of LinkEntity.
public interface EntityRepository extends JpaRepository<LinkEntity, LinkEntityPK> {

    @Query(
        "Select " +
            "Distinct linkEntity.pk.mainEntity " +
        "From " +
            "LinkEntity as linkEntity " +
            "Join MainEntity as mainEntity On " +
                 "mainEntity = linkEntity.pk.mainEntity ")
    List<MainEntity> getMainEntityList();

}

Hope this can help someone.
