SQL / Hibernate: insert only if URL does not exist
I have a list of URLs and a table that contains URLs. I want to insert a URL only if it is not already in the table.
Data in the Table:
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... |
List<String> urls = new ArrayList<>();
urls.add("example.com/");
urls.add("example.com/#");
urls.add("www.example.com/");
urls.add("https://www.example.com/");
urls.add("example.net");
After the insert, the table should contain:
Data in the Table:
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... |
| 2 | example.net | ... |
My current attempt is a method findByURL(url):List that I call for every URL in the list. If the returned list is empty, I insert the URL into the table. Unfortunately, my query treats example.com and example.com# as different URLs.
@Table(name = "url_to_edit")
@NamedQueries({
@NamedQuery(name= UrlToEdit.FIND_BY_URL, query = "select urlToEdit from UrlToEdit urlToEdit where urlToEdit.url = :url")
})
@NoArgsConstructor
public class UrlToEdit { ... }
With my current solution the table contains the following rows:
Data in the Table:
|id | url | ... |
|---| --- | --- |
| 1 | example.com | ... |
| 2 | example.com/ | ... |
| 3 | example.com/# | ... |
| 4 | www.example.com/ | ... |
| 5 | https://www.example.com/ | ... |
| 6 | example.net | ... |
How can I express in SQL that these are the same URL? Or do I need some kind of pre-parser? And is it possible to do a bulk insert? My current code inserts the URLs one after the other.
EDIT: I have multiple URLs from one host, so I can't match on hostnames alone, e.g. example.com/test/, example.com/test/# and example.com/#.
I think you should transform the URLs before storing them in the database. That way all your data is normalized and you don't have to check every row manually. Adding a UNIQUE constraint on the url column of the table would help too.
As for the transformation, I think (but have not verified) that the following regex might work:
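// Requires java.util.regex.Matcher and java.util.regex.Pattern.
// Optional scheme, optional "www.", then the host up to the first slash.
Pattern URL_REGEX = Pattern.compile("(?:https?:\\/\\/)?(www\\.)?([^\\/]+).*");
String url = "http://www.example.com/xxx";
Matcher matcher = URL_REGEX.matcher(url);
if (matcher.matches()) {
    url = matcher.group(2); // host without scheme and "www."
}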
NOTE: I adapted the regular expression to fit your data, but I wouldn't consider example.com and www.example.com to be the same URL.
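Since the question's EDIT mentions several URLs per host, keeping only the host would merge example.com/test/ with example.com/. Here is a minimal sketch of a normalizer that keeps the path instead. The class name `UrlNormalizer` and the exact rules (drop the fragment, the scheme, a leading "www." and trailing slashes, lower-case the host) are assumptions chosen to fit the sample data, not a general URL canonicalization:

```java
import java.util.Locale;

// Hypothetical helper; the rules below are assumptions based on the sample data.
class UrlNormalizer {

    static String normalize(String raw) {
        String url = raw.trim();

        // Drop the fragment ("#" and everything after it).
        int hash = url.indexOf('#');
        if (hash >= 0) {
            url = url.substring(0, hash);
        }

        // Drop an http/https scheme and a leading "www." (case-insensitive).
        url = url.replaceFirst("(?i)^https?://", "");
        url = url.replaceFirst("(?i)^www\\.", "");

        // Drop trailing slashes so "example.com/" equals "example.com".
        url = url.replaceAll("/+$", "");

        // Lower-case only the host part; paths may be case-sensitive.
        int slash = url.indexOf('/');
        String host = slash < 0 ? url : url.substring(0, slash);
        String path = slash < 0 ? "" : url.substring(slash);
        return host.toLowerCase(Locale.ROOT) + path;
    }

    public static void main(String[] args) {
        String[] samples = {
            "example.com/", "example.com/#", "www.example.com/",
            "https://www.example.com/", "example.net", "example.com/test/#"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + normalize(s));
        }
    }
}
```

If the normalized value is what gets stored and queried, the existing equality check in findByURL would then work unchanged.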
Maybe you can check beforehand whether the URL already exists, binding :url to the search term wrapped in % wildcards:
select count(urlToEdit) from UrlToEdit urlToEdit where urlToEdit.url like :url
If the count is zero, you can insert.
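Running one count query per URL can be avoided by collecting the candidates first. The sketch below assumes the already-stored URLs have been loaded into a `Set` by a single query (not shown) and that some normalizing function is available. `NewUrlFilter`, `selectNew` and the inline normalizer are hypothetical names for illustration only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper: decides in memory which URLs still need to be inserted.
class NewUrlFilter {

    static List<String> selectNew(Collection<String> candidates,
                                  Set<String> existing,
                                  UnaryOperator<String> normalizer) {
        // LinkedHashSet removes duplicates among the candidates and keeps their order.
        Set<String> unique = new LinkedHashSet<>();
        for (String candidate : candidates) {
            unique.add(normalizer.apply(candidate));
        }
        // Whatever is already stored must not be inserted again.
        unique.removeAll(existing);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        // Simplified normalizer, assumed for this demo only.
        UnaryOperator<String> normalizer = s -> s
                .replaceFirst("^https?://", "")
                .replaceFirst("^www\\.", "")
                .replaceFirst("#.*$", "")
                .replaceAll("/+$", "");

        List<String> urls = Arrays.asList(
                "example.com/", "example.com/#", "www.example.com/",
                "https://www.example.com/", "example.net");
        // Assumed to be the result of one query against the table.
        Set<String> existing = new HashSet<>(Arrays.asList("example.com"));

        System.out.println(selectNew(urls, existing, normalizer));
    }
}
```

The returned list could then be persisted in one transaction. A UNIQUE constraint on the column would still be the safeguard against concurrent inserts, since this check happens in memory.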