Comparing csv data with oracle database table using java

I need to compare the data in my CSV file with an Oracle database table. The data contains nearly 9000 rows. Are there any links or sources that show how to do this? I am following the thread linked below, but it uses the equals method on a list of strings, which does not compare the CSV file and the database table row by row.

Compare csv file with MySQL database

Java? I don't speak Java. But, as it is an Oracle database, I'd suggest another approach: an external table. Here's an example based on Scott's sample schema and its DEPT table. The CSV file contains data that "fits" that table, and I'd like to see the differences.

test_dept.csv file:

10,ACCOUNTING,NEW YORK
20,SALES,CHICAGO
30,RESEARCH,DALLAS
40,OPERATIONS,BOSTON
50,CIA,LANGLEY

External table: in order to use it, there must be a directory (line #8), which is an Oracle object that points to a filesystem directory, usually located on the database server. That directory contains the CSV file (line #18). The user who will be using it has to have at least the read privilege on it:

SQL> create table dept_ext
  2    (deptno   char(2),
  3     dname    char(20),
  4     loc      char(20)
  5    )
  6  organization external (
  7    type oracle_loader
  8    default directory ext_dir
  9    access parameters (
 10      records delimited by newline
 11      fields terminated by ','
 12      missing field values are null
 13      ( deptno  char(2),
 14        dname   char(20),
 15        loc     char(20)
 16      )
 17    )
 18    location ('test_dept.csv')
 19  )
 20  reject limit unlimited;

Table created.

Does it see any data?

SQL> select * from dept_ext;

DE DNAME                LOC
-- -------------------- --------------------
10 ACCOUNTING           NEW YORK
20 SALES                CHICAGO
30 RESEARCH             DALLAS
40 OPERATIONS           BOSTON
50 CIA                  LANGLEY

Yes, it does. What's in the "original" dept table?

SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

OK, so now what? As it is a "table", you can write any select you want, join it to other tables... for example: which departments from the csv file don't exist in the database table?

SQL> select * from dept_ext
  2  where deptno not in (select deptno from dept);

DE DNAME                LOC
-- -------------------- --------------------
50 CIA                  LANGLEY

If I join the tables on deptno, are there any differences in department name?

SQL> select e.deptno, e.dname, e.loc, d.dname, d.loc
  2  from dept_ext e join dept d on d.deptno = e.deptno
  3                             and trim(d.dname) <> trim(e.dname);

DE DNAME                LOC                  DNAME          LOC
-- -------------------- -------------------- -------------- -------------
20 SALES                CHICAGO              RESEARCH       DALLAS
30 RESEARCH             DALLAS               SALES          CHICAGO

SQL>

And so forth. Looks like it might do what you want.
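
For readers who have to stay in Java, as the question asks, the same two checks can be sketched with plain collections. This is only a rough sketch under stated assumptions: the class and method names (DeptCompare, parse, onlyInCsv, nameDiffers) are made up for illustration, the CSV has no quoted fields or embedded commas, and the database rows are hard-coded here where a real program would read them from a JDBC ResultSet.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the answer above: compares rows keyed by DEPTNO.
final class DeptCompare {

    // Turns lines like "10,ACCOUNTING,NEW YORK" into deptno -> [dname, loc].
    // Assumes exactly three unquoted fields per line.
    static Map<String, String[]> parse(List<String> lines) {
        Map<String, String[]> rows = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split(",", -1);
            rows.put(f[0].trim(), new String[] { f[1].trim(), f[2].trim() });
        }
        return rows;
    }

    // Keys present in the CSV but missing from the table (mirrors the NOT IN query).
    static List<String> onlyInCsv(Map<String, String[]> csv, Map<String, String[]> db) {
        List<String> out = new ArrayList<>();
        for (String key : csv.keySet()) {
            if (!db.containsKey(key)) {
                out.add(key);
            }
        }
        return out;
    }

    // Keys present on both sides whose department name differs (mirrors the join query).
    static List<String> nameDiffers(Map<String, String[]> csv, Map<String, String[]> db) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String[]> e : csv.entrySet()) {
            String[] dbRow = db.get(e.getKey());
            if (dbRow != null && !dbRow[0].equals(e.getValue()[0])) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Contents of test_dept.csv from the example above.
        Map<String, String[]> csv = parse(Arrays.asList(
                "10,ACCOUNTING,NEW YORK", "20,SALES,CHICAGO", "30,RESEARCH,DALLAS",
                "40,OPERATIONS,BOSTON", "50,CIA,LANGLEY"));
        // Stand-in for the DEPT table; in practice, fill this from a JDBC ResultSet.
        Map<String, String[]> db = parse(Arrays.asList(
                "10,ACCOUNTING,NEW YORK", "20,RESEARCH,DALLAS", "30,SALES,CHICAGO",
                "40,OPERATIONS,BOSTON"));

        System.out.println(onlyInCsv(csv, db));   // prints [50]
        System.out.println(nameDiffers(csv, db)); // prints [20, 30]
    }
}
```

Keying both sides by deptno in a map keeps the comparison to one pass over the file, which should be comfortable for roughly 9000 rows.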

The code tends to get very long if you try to do this in plain Java. It is more convenient to compare a CSV file with a table in an Oracle database using SPL, an open-source Java package.

Suppose we have an employee table in an Oracle database:

CREATE TABLE EMPLOYEE
  (EID NUMBER(8),
  NAME VARCHAR2(255),
  SURNAME VARCHAR2(255),
  GENDER VARCHAR2(255),
  STATE VARCHAR2(255),
  BIRTHDAY DATE,
  HIREDATE DATE,
  DEPT VARCHAR2(255),
  SALARY NUMBER(8)
);

INSERT INTO EMPLOYEE VALUES (1,'Rebecca','Moore','F','California',TIMESTAMP'1974-11-20 00:00:00.0',TIMESTAMP'2005-03-11 00:00:00.0','R&D',7000);
INSERT INTO EMPLOYEE VALUES (2,'Ashley','Wilson','F','New York',TIMESTAMP'1980-07-19 00:00:00.0',TIMESTAMP'2008-03-16 00:00:00.0','Finance',11000);
INSERT INTO EMPLOYEE VALUES (3,'Rachel','Johnson','F','New Mexico',TIMESTAMP'1970-12-17 00:00:00.0',TIMESTAMP'2010-12-01 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (4,'Emily','Smith','F','Texas',TIMESTAMP'1985-03-07 00:00:00.0',TIMESTAMP'2006-08-15 00:00:00.0','HR',7000);
INSERT INTO EMPLOYEE VALUES (5,'Ashley','Smith','F','Texas',TIMESTAMP'1975-05-13 00:00:00.0',TIMESTAMP'2004-07-30 00:00:00.0','R&D',16000);
INSERT INTO EMPLOYEE VALUES (6,'Matthew','Johnson','M','California',TIMESTAMP'1984-07-07 00:00:00.0',TIMESTAMP'2005-07-07 00:00:00.0','Sales',11000);
INSERT INTO EMPLOYEE VALUES (7,'Alexis','Smith','F','Illinois',TIMESTAMP'1972-08-16 00:00:00.0',TIMESTAMP'2002-08-16 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (8,'Megan','Wilson','F','California',TIMESTAMP'1979-04-19 00:00:00.0',TIMESTAMP'1984-04-19 00:00:00.0','Marketing',11000);
INSERT INTO EMPLOYEE VALUES (9,'Victoria','Davis','F','Texas',TIMESTAMP'1983-12-07 00:00:00.0',TIMESTAMP'2009-12-07 00:00:00.0','HR',3000);
INSERT INTO EMPLOYEE VALUES (10,'Ryan','Johnson','M','Pennsylvania',TIMESTAMP'1976-03-12 00:00:00.0',TIMESTAMP'2006-03-12 00:00:00.0','R&D',13000);

And a CSV file employee.csv:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000

We want the difference between the Oracle employee table and the CSV file, that is, the rows that are in the table but not in the file. The expected result is:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
2,Ashley,Wilson,F,New York,1980-07-19 00:00:00,2008-03-16 00:00:00,Finance,11000
4,Emily,Smith,F,Texas,1985-03-07 00:00:00,2006-08-15 00:00:00,HR,7000
6,Matthew,Johnson,M,California,1984-07-07 00:00:00,2005-07-07 00:00:00,Sales,11000
8,Megan,Wilson,F,California,1979-04-19 00:00:00,1984-04-19 00:00:00,Marketing,11000
10,Ryan,Johnson,M,Pennsylvania,1976-03-12 00:00:00,2006-03-12 00:00:00,R&D,13000

We also want the intersection of the Oracle employee table and the CSV file:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000

We just need a few lines of SPL code:

A
1 =ORACLE.query@x("SELECT * FROM EMPLOYEE")
2 =file("employee.csv").import@ct(EID:decimal,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY:decimal)
3 =INTERSECT=[A1,A2].merge@oi(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY)
4 =MINUS=[A1,A2].merge@od(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY)
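
To make the two expected results concrete, here is a rough plain-Java sketch of a whole-row difference and intersection. It is not what SPL does internally; it only reproduces the expected outputs listed above. The class name RowSetCompare and its methods are hypothetical, and it assumes each row from both sides has already been rendered as one comma-separated line in exactly the same format (same date formatting, same number formatting).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: treats each row as one normalized text line.
final class RowSetCompare {

    // Rows found in the table but not in the CSV file, in table order.
    static Set<String> minus(List<String> table, List<String> csv) {
        Set<String> result = new LinkedHashSet<>(table);
        result.removeAll(new HashSet<>(csv));
        return result;
    }

    // Rows found in both the table and the CSV file, in table order.
    static Set<String> intersect(List<String> table, List<String> csv) {
        Set<String> result = new LinkedHashSet<>(table);
        result.retainAll(new HashSet<>(csv));
        return result;
    }

    public static void main(String[] args) {
        // First four EMPLOYEE rows, written the way they appear in the CSV format above.
        // In practice these lines would be built from a JDBC ResultSet.
        List<String> table = Arrays.asList(
                "1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000",
                "2,Ashley,Wilson,F,New York,1980-07-19 00:00:00,2008-03-16 00:00:00,Finance,11000",
                "3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000",
                "4,Emily,Smith,F,Texas,1985-03-07 00:00:00,2006-08-15 00:00:00,HR,7000");
        // Matching data lines from employee.csv (header line already skipped).
        List<String> csv = Arrays.asList(
                "1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000",
                "3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000");

        System.out.println("MINUS:");
        minus(table, csv).forEach(System.out::println);      // rows for EID 2 and 4
        System.out.println("INTERSECT:");
        intersect(table, csv).forEach(System.out::println);  // rows for EID 1 and 3
    }
}
```

Comparing whole lines as strings is the simplest option but also the most fragile one: any formatting difference between the two sides (trailing ".0" on a timestamp, padding, letter case) makes otherwise equal rows look different, so normalize both sides first.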

SPL offers a JDBC driver that can be invoked from Java. Just store the above SPL script as cmp.splx and invoke it in Java the same way you would call a stored procedure:

…
Class.forName("com.esproc.jdbc.InternalDriver");
con= DriverManager.getConnection("jdbc:esproc:local://");
st=con.prepareCall("call cmp()");
st.execute();
…
