
SQL / Hibernate: insert only if the URL does not exist

I have a list of URLs and a table that contains URLs. I only want to insert a URL if it is not already in the table.

Data in the Table: 
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... | 

import java.util.ArrayList;
import java.util.List;

List<String> urls = new ArrayList<>();
urls.add("example.com/");
urls.add("example.com/#");
urls.add("www.example.com/");
urls.add("https://www.example.com/");
urls.add("example.net");

After the insert, the table should contain:

Data in the Table: 
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... | 
| 2 | example.net | ... |

My current attempt is a method findByURL(url): List that I call for every URL in the list. If the returned list is empty, I insert the URL into the table, but unfortunately my query treats example.com and example.com/# as different URLs.

@Entity
@Table(name = "url_to_edit")
@NamedQueries({
        @NamedQuery(name = UrlToEdit.FIND_BY_URL, query = "select urlToEdit from UrlToEdit urlToEdit where urlToEdit.url = :url")
})
@NoArgsConstructor
public class UrlToEdit { ... }

With my current solution, the table contains the following rows:

Data in the Table: 
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... | 
| 2 | example.com/ | ... | 
| 3 | example.com/# | ... | 
| 4 | www.example.com/ | ... | 
| 5 | https://www.example.com/ | ... | 
| 6 | example.net | ... | 


How can I tell SQL that these are the same URL? Or do I need some kind of pre-parser? And is it possible to do a bulk insert? My current code inserts the rows one after the other.

EDIT: I have multiple URLs from one host, so I can't go by the hostname alone, e.g. example.com/test/, example.com/test/# and example.com/#.

I think you should transform the URLs before storing them in the database; this way all your data is normalized and you won't have to check every row manually. Adding a UNIQUE constraint on the url column of the table would help too.
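The normalize-first idea can be sketched in plain Java: normalize every candidate, drop the ones already stored and the repeats inside the list, and insert whatever is left in one go. This is only a sketch under assumptions: the class name, the `urlsToInsert` method and the demo normalizer are hypothetical, and loading the stored (already normalized) values and the actual insert are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

public class NewUrlFilter {

    // Normalizes every candidate and keeps only keys that are not already stored.
    // `stored` must hold values normalized with the same function; the
    // LinkedHashSet also removes repeats inside the batch, keeping the first.
    static List<String> urlsToInsert(List<String> candidates,
                                     Set<String> stored,
                                     UnaryOperator<String> normalize) {
        Set<String> fresh = new LinkedHashSet<>();
        for (String candidate : candidates) {
            String key = normalize.apply(candidate);
            if (!stored.contains(key)) {
                fresh.add(key);
            }
        }
        return new ArrayList<>(fresh);
    }

    public static void main(String[] args) {
        // Demo-only normalizer (hypothetical): drop http(s)://, a leading "www.",
        // the #fragment and trailing slashes.
        UnaryOperator<String> demoNormalize = u -> u.replaceFirst("^https?://", "")
                .replaceFirst("^www\\.", "")
                .replaceFirst("#.*$", "")
                .replaceFirst("/+$", "");
        List<String> urls = List.of("example.com/", "example.com/#", "www.example.com/",
                "https://www.example.com/", "example.net");
        // The table from the question already contains "example.com".
        System.out.println(urlsToInsert(urls, Set.of("example.com"), demoNormalize));
        // prints [example.net]
    }
}
```

The returned list is what would then be inserted, for example as one batch. With a UNIQUE constraint on `url` as well, a concurrent insert of the same normalized value fails instead of silently creating a duplicate row.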

As for the transformation, I think (not guaranteed) that the following regex might work:

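import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Optional http(s)://, optional "www.", then everything up to the first '/' (group 2).
Pattern URL_REGEX = Pattern.compile("(?:https?:\\/\\/)?(www\\.)?([^\\/]+).*");
String url = "http://www.example.com/xxx";
Matcher matcher = URL_REGEX.matcher(url);
if (matcher.matches()) {
    url = matcher.group(2); // "example.com"
}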
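Wrapped in a small helper, the regex can be applied to the whole list from the question. The `HostKey` class and the `hostKey` method are hypothetical names; the pattern is the one from this answer. Running it maps the first four URLs to `example.com` and the last to `example.net`. Note that everything after the first `/` is discarded, so `example.com/test/` also becomes `example.com`, which matters for the EDIT in the question.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostKey {

    // Optional http(s)://, optional "www.", then the host up to the first '/'.
    private static final Pattern URL_REGEX =
            Pattern.compile("(?:https?:\\/\\/)?(www\\.)?([^\\/]+).*");

    // Returns group 2 (the host without "www."), or the input unchanged
    // if the pattern does not match (e.g. an empty string).
    static String hostKey(String url) {
        Matcher matcher = URL_REGEX.matcher(url);
        return matcher.matches() ? matcher.group(2) : url;
    }

    public static void main(String[] args) {
        List<String> urls = List.of("example.com/", "example.com/#", "www.example.com/",
                "https://www.example.com/", "example.net");
        for (String url : urls) {
            System.out.println(url + " -> " + hostKey(url));
        }
    }
}
```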

NOTE: I adapted the regular expression to fit your data, but I wouldn't consider example.com and www.example.com to be the same URL.
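Because the question's EDIT mentions several paths per host, a host-only key is probably too coarse. Below is a sketch of a normalizer that keeps the path while removing the differences that appear in the question. The rules and names are assumptions, not a standard: in particular it treats `www.example.com` and `example.com` as the same (which the note above argues against), and it treats a trailing slash as insignificant, which is not guaranteed for every server.

```java
public class UrlNormalizer {

    // Hypothetical rules chosen to fit the question's examples:
    // drop http(s)://, drop a leading "www.", drop the #fragment and
    // trailing slashes. The path and any query string are kept as-is.
    static String normalize(String url) {
        String s = url.trim();
        s = s.replaceFirst("(?i)^https?://", "");
        s = s.replaceFirst("(?i)^www\\.", "");
        int hash = s.indexOf('#');
        if (hash >= 0) {
            s = s.substring(0, hash); // the fragment is not sent to the server
        }
        while (s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] urls = {"example.com/test/", "example.com/test/#", "example.com/#",
                "https://www.example.com/", "example.net"};
        for (String url : urls) {
            System.out.println(url + " -> " + normalize(url));
        }
    }
}
```

With these rules `example.com/test/` and `example.com/test/#` both become `example.com/test`, while `example.com/#` becomes `example.com`, so the two paths stay distinct.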

Maybe you can first check whether it already exists, with:

select count(urlToEdit) from UrlToEdit urlToEdit where urlToEdit.url like concat('%', :url, '%')

If the count is zero, you can insert the URL.
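A rough in-memory model of that check, only to show what a `%...%` pattern around the parameter matches: a stored row counts when it *contains* the candidate. So `example.com/` is not found when only `example.com` is stored, while a stored `example.com/test/` would count as a match for `example.com`. The class and method names are hypothetical, and the model ignores `%` or `_` inside the URL itself as well as database-specific case sensitivity of `like`.

```java
import java.util.List;

public class LikeCheckDemo {

    // Substring test standing in for: url like concat('%', :url, '%')
    static long countLike(List<String> storedUrls, String url) {
        return storedUrls.stream().filter(stored -> stored.contains(url)).count();
    }

    public static void main(String[] args) {
        List<String> stored = List.of("example.com");
        for (String candidate : List.of("example.com", "example.com/", "www.example.com/", "example.net")) {
            System.out.println(candidate + " -> count " + countLike(stored, candidate));
        }
    }
}
```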
