
Comparing CSV data with an Oracle database table using Java

I need to compare my CSV file data with an Oracle database table. The data contains nearly 9000 rows. Are there any links or sources on how I can do this? I am using this thread, but it uses the equals method on a list of strings, which does not compare the CSV and database table data row by row:

Compare csv file with MySQL database

Java? I don't speak Java. But, as it is an Oracle database, I'd suggest another approach: an external table. Here's an example based on Scott's sample schema and its DEPT table. The CSV file contains data that "fits" that table, but I'd like to see the differences.

test_dept.csv file:

10,ACCOUNTING,NEW YORK
20,SALES,CHICAGO
30,RESEARCH,DALLAS
40,OPERATIONS,BOSTON
50,CIA,LANGLEY

External table: in order to use it, there must be a directory (line #8), an Oracle object that points to a filesystem directory (usually located on the database server) which contains the CSV file (line #18). The user who will be using it has to have at least the read privilege on that directory:

SQL> create table dept_ext
  2    (deptno   char(2),
  3     dname    char(20),
  4     loc      char(20)
  5    )
  6  organization external (
  7    type oracle_loader
  8    default directory ext_dir
  9    access parameters (
 10      records delimited by newline
 11      fields terminated by ','
 12      missing field values are null
 13      ( deptno  char(2),
 14        dname   char(20),
 15        loc     char(20)
 16      )
 17    )
 18    location ('test_dept.csv')
 19  )
 20  reject limit unlimited;

Table created.
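If part of the comparison ends up in Java after all, the same parsing rules ("fields terminated by ','", "missing field values are null") can be mirrored when reading the file. A minimal sketch, assuming unquoted fields (the access parameters above declare no enclosure character either); the class name DeptCsv is hypothetical, and trimming whitespace and ignoring extra fields are assumptions of this sketch, not something the external table definition specifies:

```java
import java.util.ArrayList;
import java.util.List;

class DeptCsv {
    /**
     * Splits one CSV record on ',' (no quoting support) and pads it to the
     * expected column count, using null for missing or empty fields, roughly
     * like "missing field values are null". Extra fields are ignored.
     */
    static List<String> parseRecord(String line, int columnCount) {
        // limit -1 keeps trailing empty fields instead of dropping them
        String[] parts = line.split(",", -1);
        List<String> fields = new ArrayList<>(columnCount);
        for (int i = 0; i < columnCount; i++) {
            String value = i < parts.length ? parts[i].trim() : null;
            // In Oracle an empty string is NULL anyway, so treat it the same way
            fields.add(value == null || value.isEmpty() ? null : value);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("50,CIA,LANGLEY", 3)); // [50, CIA, LANGLEY]
        System.out.println(parseRecord("60,LEGAL", 3));       // [60, LEGAL, null]
    }
}
```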

Does it see any data?

SQL> select * from dept_ext;

DE DNAME                LOC
-- -------------------- --------------------
10 ACCOUNTING           NEW YORK
20 SALES                CHICAGO
30 RESEARCH             DALLAS
40 OPERATIONS           BOSTON
50 CIA                  LANGLEY

Yes, it does. What's in the "original" dept table?

SQL> select * from dept;

    DEPTNO DNAME          LOC
---------- -------------- -------------
        10 ACCOUNTING     NEW YORK
        20 RESEARCH       DALLAS
        30 SALES          CHICAGO
        40 OPERATIONS     BOSTON

OK, so now what? As it is a "table", you can write any select you want and join it to other tables. For example: which departments from the CSV file don't exist in the database table?

SQL> select * from dept_ext
  2  where deptno not in (select deptno from dept);

DE DNAME                LOC
-- -------------------- --------------------
50 CIA                  LANGLEY
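On the Java side, the same "which CSV departments are missing from the table" question is a set difference on the key. A sketch, assuming the keys from both sides have already been read into collections (the CSV via a reader, the table via something like SELECT deptno FROM dept over JDBC) and converted to the same type; DeptDiff and missingFromTable are hypothetical names:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DeptDiff {
    /** Keys present in the CSV but absent from the table (CSV minus table). */
    static Set<Integer> missingFromTable(List<Integer> csvKeys, Set<Integer> tableKeys) {
        Set<Integer> result = new LinkedHashSet<>(csvKeys); // keeps CSV order
        result.removeAll(tableKeys);
        return result;
    }

    public static void main(String[] args) {
        // deptno values from test_dept.csv and from the dept table shown above
        List<Integer> csv = List.of(10, 20, 30, 40, 50);
        Set<Integer> table = Set.of(10, 20, 30, 40);
        System.out.println(missingFromTable(csv, table)); // [50]
    }
}
```

One difference worth knowing: in SQL, x NOT IN (subquery) returns no rows at all if the subquery yields a NULL, whereas a set difference like this is not affected that way.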

If I join the tables on deptno, are there any differences in department names?

SQL> select e.deptno, e.dname, e.loc, d.dname, d.loc
  2  from dept_ext e join dept d on d.deptno = e.deptno
  3                             and trim(d.dname) <> trim(e.dname);

DE DNAME                LOC                  DNAME          LOC
-- -------------------- -------------------- -------------- -------------
20 SALES                CHICAGO              RESEARCH       DALLAS
30 RESEARCH             DALLAS               SALES          CHICAGO

SQL>
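The join-and-compare query corresponds to a lookup by key followed by a field comparison. A sketch, assuming each side has already been loaded into a Map from deptno to dname (DeptNameCheck and changedNames are hypothetical names); trim() mirrors the trim() calls in the query:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DeptNameCheck {
    /** Keys present on both sides whose names differ after trimming. */
    static List<Integer> changedNames(Map<Integer, String> csv, Map<Integer, String> table) {
        List<Integer> changed = new ArrayList<>();
        for (Map.Entry<Integer, String> e : csv.entrySet()) {
            String dbName = table.get(e.getKey());
            // Inner join: keys missing from the table, or NULL names on either
            // side, are skipped (in SQL, <> involving NULL is never true)
            if (dbName != null && e.getValue() != null
                    && !dbName.trim().equals(e.getValue().trim())) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Map<Integer, String> csv = new TreeMap<>(Map.of(
                10, "ACCOUNTING", 20, "SALES", 30, "RESEARCH", 40, "OPERATIONS", 50, "CIA"));
        Map<Integer, String> table = new TreeMap<>(Map.of(
                10, "ACCOUNTING", 20, "RESEARCH", 30, "SALES", 40, "OPERATIONS"));
        System.out.println(changedNames(csv, table)); // [20, 30]
    }
}
```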

And so forth. Looks like it might do what you want.

The code will be very long if you do this in plain Java. It is much more convenient to compare a CSV file with a table in an Oracle database using SPL, an open-source Java package.

Suppose we have an employee table in an Oracle database:

CREATE TABLE EMPLOYEE
  (EID NUMBER(8),
  NAME VARCHAR2(255),
  SURNAME VARCHAR2(255),
  GENDER VARCHAR2(255),
  STATE VARCHAR2(255),
  BIRTHDAY DATE,
  HIREDATE DATE,
  DEPT VARCHAR2(255),
  SALARY NUMBER(8)
);

INSERT INTO EMPLOYEE VALUES (1,'Rebecca','Moore','F','California',TIMESTAMP'1974-11-20 00:00:00.0',TIMESTAMP'2005-03-11 00:00:00.0','R&D',7000);
INSERT INTO EMPLOYEE VALUES (2,'Ashley','Wilson','F','New York',TIMESTAMP'1980-07-19 00:00:00.0',TIMESTAMP'2008-03-16 00:00:00.0','Finance',11000);
INSERT INTO EMPLOYEE VALUES (3,'Rachel','Johnson','F','New Mexico',TIMESTAMP'1970-12-17 00:00:00.0',TIMESTAMP'2010-12-01 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (4,'Emily','Smith','F','Texas',TIMESTAMP'1985-03-07 00:00:00.0',TIMESTAMP'2006-08-15 00:00:00.0','HR',7000);
INSERT INTO EMPLOYEE VALUES (5,'Ashley','Smith','F','Texas',TIMESTAMP'1975-05-13 00:00:00.0',TIMESTAMP'2004-07-30 00:00:00.0','R&D',16000);
INSERT INTO EMPLOYEE VALUES (6,'Matthew','Johnson','M','California',TIMESTAMP'1984-07-07 00:00:00.0',TIMESTAMP'2005-07-07 00:00:00.0','Sales',11000);
INSERT INTO EMPLOYEE VALUES (7,'Alexis','Smith','F','Illinois',TIMESTAMP'1972-08-16 00:00:00.0',TIMESTAMP'2002-08-16 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (8,'Megan','Wilson','F','California',TIMESTAMP'1979-04-19 00:00:00.0',TIMESTAMP'1984-04-19 00:00:00.0','Marketing',11000);
INSERT INTO EMPLOYEE VALUES (9,'Victoria','Davis','F','Texas',TIMESTAMP'1983-12-07 00:00:00.0',TIMESTAMP'2009-12-07 00:00:00.0','HR',3000);
INSERT INTO EMPLOYEE VALUES (10,'Ryan','Johnson','M','Pennsylvania',TIMESTAMP'1976-03-12 00:00:00.0',TIMESTAMP'2006-03-12 00:00:00.0','R&D',13000);

And a CSV file, employee.csv:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000

The difference between the Oracle employee table and the CSV file (the table rows that are not in the CSV file) should be:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
2,Ashley,Wilson,F,New York,1980-07-19 00:00:00,2008-03-16 00:00:00,Finance,11000
4,Emily,Smith,F,Texas,1985-03-07 00:00:00,2006-08-15 00:00:00,HR,7000
6,Matthew,Johnson,M,California,1984-07-07 00:00:00,2005-07-07 00:00:00,Sales,11000
8,Megan,Wilson,F,California,1979-04-19 00:00:00,1984-04-19 00:00:00,Marketing,11000
10,Ryan,Johnson,M,Pennsylvania,1976-03-12 00:00:00,2006-03-12 00:00:00,R&D,13000

And the intersection of the Oracle employee table and the CSV file should be:

EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000

Only a few lines of SPL code are needed:

   A
1  =ORACLE.query@x("SELECT * FROM EMPLOYEE")
2  =file("employee.csv").import@ct(EID:decimal,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY:decimal)
3  =INTERSECT=[A1,A2].merge@oi(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY)
4  =MINUS=[A1,A2].merge@od(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY)
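For comparison, A3 and A4 compute an intersection and a difference of whole rows. The set logic alone looks like this in plain Java; it is a sketch that assumes both sides have already been loaded and every row converted to a List<String> with identical formatting (for example, dates rendered as yyyy-MM-dd HH:mm:ss on both sides). That loading and normalization, where most of the extra Java code goes, is not shown, and RowSets is a hypothetical name:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class RowSets {
    /** Rows that appear in both inputs (like A3 / INTERSECT). */
    static Set<List<String>> intersect(List<List<String>> a, List<List<String>> b) {
        Set<List<String>> result = new LinkedHashSet<>(a);
        result.retainAll(new LinkedHashSet<>(b)); // hashed lookup instead of List.contains
        return result;
    }

    /** Rows of a that do not appear in b (like A4 / MINUS). */
    static Set<List<String>> minus(List<List<String>> a, List<List<String>> b) {
        Set<List<String>> result = new LinkedHashSet<>(a);
        result.removeAll(new LinkedHashSet<>(b));
        return result;
    }

    public static void main(String[] args) {
        // Shortened rows (EID, NAME, SALARY) just to keep the demo small
        List<List<String>> db = List.of(
                List.of("1", "Rebecca", "7000"), List.of("2", "Ashley", "11000"),
                List.of("3", "Rachel", "9000"));
        List<List<String>> csv = List.of(
                List.of("1", "Rebecca", "7000"), List.of("3", "Rachel", "9000"));
        System.out.println(intersect(db, csv)); // [[1, Rebecca, 7000], [3, Rachel, 9000]]
        System.out.println(minus(db, csv));     // [[2, Ashley, 11000]]
    }
}
```

Because rows are compared as whole lists, this is a true row-by-row comparison rather than a single equals on two lists; collecting into sets also drops duplicate rows, as SQL INTERSECT and MINUS do.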

SPL provides a JDBC driver that can be called from Java. Just save the above SPL script as cmp.splx and invoke it from Java the way you would call a stored procedure:

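import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
…
// Load the esProc SPL JDBC driver and open a local (embedded) connection
Class.forName("com.esproc.jdbc.InternalDriver");
try (Connection con = DriverManager.getConnection("jdbc:esproc:local://");
     CallableStatement st = con.prepareCall("call cmp()")) {
    st.execute(); // runs the cmp.splx script
}
…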

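Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.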

 
© 2020-2024 STACKOOM.COM