
Deduplication with scoring framework/application/server on Java to work with database input staging

Original post 2012-02-26 09:29:46 3 2 java/ database/ duplicates/ record-linkage

Please suggest a Java product (preferably open source) that provides the following:

data deduplication
deduplication scoring
customizable deduplication rules and scoring rules

Please see the example:

I have an input staging database named "INPUT_DB"
I have a table named "INPUT_PERSONS"
There are several fields in this table:
ID (some meaningless surrogate primary key)
FIRST_NAME
LAST_NAME
SECOND_NAME
BIRTH_DATE
PASSPORT_SERIES (PASSPORT_SERIES + PASSPORT_NUM is a unique identifier of a citizen)
PASSPORT_NUM

I have to look through all records in INPUT_PERSONS and find duplicates and matches. Several rules should be created:

If PASSPORT_SERIES + PASSPORT_NUM is equal to that of another record, the two records are duplicates. The score in this case is 100 out of 100.
If FIRST_NAME and LAST_NAME are equal but PASSPORT_SERIES + PASSPORT_NUM differs by one character (a misprint, for example), the records are possible duplicates and their score is 90 out of 100.
And so on.
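
The two rules above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the API of any existing product: the class and method names (`DedupScorer`, `Person`, `score`, `differsByOneChar`) are hypothetical, names are compared case-insensitively, and "one different character" is read as a single substituted character (not an inserted or deleted one).

```java
/** Hypothetical rule-based scorer for the two example rules; all names are illustrative. */
final class DedupScorer {

    /** Minimal holder for the INPUT_PERSONS columns used by the two rules. */
    static final class Person {
        final String firstName;
        final String lastName;
        final String passportSeries;
        final String passportNum;

        Person(String firstName, String lastName, String passportSeries, String passportNum) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.passportSeries = passportSeries;
            this.passportNum = passportNum;
        }

        /** PASSPORT_SERIES + PASSPORT_NUM, as in the question. */
        String passport() {
            return passportSeries + passportNum;
        }
    }

    /** Returns a score from 0 to 100; only the two rules from the question are implemented. */
    static int score(Person a, Person b) {
        String pa = a.passport();
        String pb = b.passport();
        if (pa.equals(pb)) {
            return 100; // rule 1: identical passport identifier
        }
        // Assumption: names are compared ignoring case.
        boolean sameName = a.firstName.equalsIgnoreCase(b.firstName)
                && a.lastName.equalsIgnoreCase(b.lastName);
        if (sameName && differsByOneChar(pa, pb)) {
            return 90; // rule 2: same name, one substituted character in the passport
        }
        return 0; // "and so on": further rules would go here
    }

    /** True when both strings have equal length and differ in exactly one position. */
    static boolean differsByOneChar(String x, String y) {
        if (x.length() != y.length()) {
            return false;
        }
        int diff = 0;
        for (int i = 0; i < x.length(); i++) {
            if (x.charAt(i) != y.charAt(i)) {
                diff++;
            }
        }
        return diff == 1;
    }
}
```

Calling `DedupScorer.score(a, b)` for every candidate pair is quadratic in the number of rows, so a real solution would normally group records first (for example by LAST_NAME) and only score pairs within a group.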

Is there a ready-made solution that I could use as a base?

2 answers

I've done this in the past and based it on the Fellegi-Sunter algorithm. See this question: Is there an open source implementation for Fellegi-Sunter?
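
For readers unfamiliar with Fellegi-Sunter, here is a minimal sketch of how its match weights are usually computed. The answer does not show its own implementation, so this is not that code: the class and method names are hypothetical, and the m and u probabilities (probability that a field agrees for a true match and for a non-match) as well as the thresholds have to be estimated or chosen by whoever uses it.

```java
/** Sketch of Fellegi-Sunter match weights; class and method names are hypothetical. */
final class FellegiSunter {

    /**
     * Log2 weight of one field.
     * m = P(field agrees | records match), u = P(field agrees | records do not match).
     */
    static double fieldWeight(boolean agrees, double m, double u) {
        double ratio = agrees ? m / u : (1.0 - m) / (1.0 - u);
        return Math.log(ratio) / Math.log(2.0);
    }

    /** Sums the field weights for one record pair; the three arrays are parallel. */
    static double totalWeight(boolean[] agreements, double[] m, double[] u) {
        double total = 0.0;
        for (int i = 0; i < agreements.length; i++) {
            total += fieldWeight(agreements[i], m[i], u[i]);
        }
        return total;
    }

    /** Classifies a pair with a lower and an upper threshold on the total weight. */
    static String classify(double weight, double lower, double upper) {
        if (weight >= upper) {
            return "MATCH";
        }
        if (weight <= lower) {
            return "NON_MATCH";
        }
        return "POSSIBLE_MATCH";
    }
}
```

The total weight is not on a 0 to 100 scale; mapping it to the scores asked for in the question would be an additional design choice.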

The Duke project can meet your requirements: https://github.com/larsga/Duke

Related questions

Framework for JAVA socket server application

Deduplication work not as expected in reducer

Deduplication of repeated numbers in java

String Deduplication feature of Java 8

Data Deduplication In Cloud WIth Java

Deduplication using a Java Set

String Deduplication with Flatbuffers in Java

Java SE server + database + rest framework

Database server for a simple java desktop application

Java Application Deploying With SQL Server Database



solution1
1 2012-02-26 09:56:04

solution2
0 2017-04-20 02:40:41