Comparing CSV data with an Oracle database table using Java
I need to compare the data in my CSV file with an Oracle database table. The data contains nearly 9000 rows. Are there any links or sources on how I can do this? I am using this thread, but it uses the equals method on a list of strings, which does not compare the CSV data and the database table row by row:
Compare csv file with MySQL database
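Why equals is not enough: List.equals returns true only when both lists hold equal elements in the same order, and it reports a single boolean. One extra, missing, or reordered row makes the whole comparison false without saying which row is responsible. Comparing row by row needs each row indexed by a key first. A minimal Java sketch, assuming the first column is the key and the file is simple CSV (no header, no quoted fields, no embedded commas); `CsvRowIndex` and `indexByFirstColumn` are hypothetical names:

```java
import java.util.*;

class CsvRowIndex {
    // Maps the first column (the key) of each CSV line to the list of all its columns.
    // Assumes simple CSV: no header line, no quoted fields, no embedded commas.
    // A duplicate key silently replaces the earlier row.
    static Map<String, List<String>> indexByFirstColumn(List<String> lines) {
        Map<String, List<String>> rows = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines such as a trailing newline
            }
            // limit -1 keeps trailing empty fields, so every row keeps all its columns
            List<String> cols = new ArrayList<>(Arrays.asList(line.split(",", -1)));
            rows.put(cols.get(0).trim(), cols);
        }
        return rows;
    }
}
```

For a file on disk, `Files.readAllLines(Paths.get("test_dept.csv"))` supplies the lines. The database side can be put into the same shape by reading each ResultSet row into a `List<String>`, after which rows are matched by key rather than by position.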
Java? I don't speak Java. But, as it is an Oracle database, I'd suggest another approach: an external table. Here's an example based on Scott's sample schema and its DEPT table. The CSV file contains data that "fit" that table, but I'd like to see the differences.
File test_dept.csv:
10,ACCOUNTING,NEW YORK
20,SALES,CHICAGO
30,RESEARCH,DALLAS
40,OPERATIONS,BOSTON
50,CIA,LANGLEY
External table: in order to use it, there must be a directory (line #8), which is an Oracle object that points to a filesystem directory, usually located on the database server, and which contains the CSV file (line #18). The user who will be using it has to have at least the read privilege on it:
SQL> create table dept_ext
2 (deptno char(2),
3 dname char(20),
4 loc char(20)
5 )
6 organization external (
7 type oracle_loader
8 default directory ext_dir
9 access parameters (
10 records delimited by newline
11 fields terminated by ','
12 missing field values are null
13 ( deptno char(2),
14 dname char(20),
15 loc char(20)
16 )
17 )
18 location ('test_dept.csv')
19 )
20 reject limit unlimited;
Table created.
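The directory object and the privilege mentioned above are a one-time setup, usually done by a DBA because creating a directory requires the CREATE ANY DIRECTORY privilege. As the question is about Java, here is a minimal sketch of issuing that setup over JDBC. EXT_DIR matches the example; the server path /u01/app/csv and the grantee SCOTT are hypothetical. The sketch also grants WRITE, on the assumption that ORACLE_LOADER will write its log and bad files into the default directory, which it typically does unless NOLOGFILE/NOBADFILE are specified:

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

class ExtDirSetup {
    // Builds the DDL for an Oracle directory object plus the grants on it.
    // dirName and grantee are concatenated as-is, so pass trusted identifiers only.
    static List<String> setupStatements(String dirName, String osPath, String grantee) {
        String quotedPath = "'" + osPath.replace("'", "''") + "'"; // escape quotes for a SQL literal
        return List.of(
                "CREATE OR REPLACE DIRECTORY " + dirName + " AS " + quotedPath,
                // READ to load the CSV; WRITE so ORACLE_LOADER can create its log and bad files
                "GRANT READ, WRITE ON DIRECTORY " + dirName + " TO " + grantee);
    }

    // Runs each statement; the connection must belong to a user allowed to create directories.
    static void apply(Connection con, List<String> ddl) throws SQLException {
        try (Statement st = con.createStatement()) {
            for (String sql : ddl) {
                st.execute(sql);
            }
        }
    }
}
```

Usage, with a hypothetical DBA connection: `ExtDirSetup.apply(dbaConnection, ExtDirSetup.setupStatements("EXT_DIR", "/u01/app/csv", "SCOTT"));`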
Does it see any data?
SQL> select * from dept_ext;
DE DNAME LOC
-- -------------------- --------------------
10 ACCOUNTING NEW YORK
20 SALES CHICAGO
30 RESEARCH DALLAS
40 OPERATIONS BOSTON
50 CIA LANGLEY
Yes, it does. What's in the "original" DEPT table?
SQL> select * from dept;
DEPTNO DNAME LOC
---------- -------------- -------------
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
OK, so now what? As it is a "table", you can write any SELECT you want and join it to other tables. For example: which departments from the CSV file don't exist in the database table?
SQL> select * from dept_ext
2 where deptno not in (select deptno from dept);
DE DNAME LOC
-- -------------------- --------------------
50 CIA LANGLEY
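If both sides are already loaded into Java, the same check is an anti-join over key sets. A minimal sketch, assuming the keys have been normalized to trimmed strings (the external table declares DEPTNO as CHAR(2), while DEPT.DEPTNO is a number); `AntiJoin.missingFromDb` is a hypothetical name. One caveat on the SQL side: if the subquery can return a NULL, NOT IN yields no rows at all, so NOT EXISTS is the safer form when the key column is nullable.

```java
import java.util.*;

class AntiJoin {
    // Keys present in csvKeys but absent from dbKeys, in CSV order.
    // Mirrors "where deptno not in (select deptno from dept)" when neither side has null keys.
    static List<String> missingFromDb(Collection<String> csvKeys, Set<String> dbKeys) {
        List<String> missing = new ArrayList<>();
        for (String key : csvKeys) {
            if (!dbKeys.contains(key)) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

With the keys from test_dept.csv (10 to 50) and from DEPT (10 to 40), this returns only 50, the same department the query reports.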
If I join the tables on DEPTNO, are there any differences in department name?
SQL> select e.deptno, e.dname, e.loc, d.dname, d.loc
2 from dept_ext e join dept d on d.deptno = e.deptno
3 and trim(d.dname) <> trim(e.dname);
DE DNAME LOC DNAME LOC
-- -------------------- -------------------- -------------- -------------
20 SALES CHICAGO RESEARCH DALLAS
30 RESEARCH DALLAS SALES CHICAGO
SQL>
And so forth. Looks like it might do what you want.
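The join query above can likewise be mirrored in memory: look up each CSV row by key among the table's rows and compare one column after trimming. The trim() calls in the SQL are likely there because the external table declares DNAME as CHAR(20), which is blank-padded. A minimal sketch, assuming both sides are maps from key to column list and that the compared values are non-null; `JoinDiff.differingKeys` is a hypothetical name:

```java
import java.util.*;

class JoinDiff {
    // Inner-joins csvRows and dbRows on their map keys and returns the keys whose column `col`
    // differs after trimming, in the iteration order of csvRows.
    // Assumes the compared values are non-null (trim() on null would throw).
    static List<String> differingKeys(Map<String, List<String>> csvRows,
                                      Map<String, List<String>> dbRows,
                                      int col) {
        List<String> diffs = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : csvRows.entrySet()) {
            List<String> dbRow = dbRows.get(entry.getKey());
            if (dbRow == null) {
                continue; // key only in the CSV: not part of an inner join
            }
            String csvValue = entry.getValue().get(col).trim();
            String dbValue = dbRow.get(col).trim();
            if (!csvValue.equals(dbValue)) {
                diffs.add(entry.getKey());
            }
        }
        return diffs;
    }
}
```

Loaded with the DEPT rows and the test_dept.csv rows, `differingKeys(csvRows, dbRows, 1)` reports keys 20 and 30, matching the two rows returned by the SQL join.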
The code will be very long if you try to do this in plain Java. But it is convenient to compare a CSV file with a table in an Oracle database using SPL, an open-source Java package.
Suppose we have an EMPLOYEE table in the Oracle database:
CREATE TABLE EMPLOYEE
(EID NUMBER(8),
NAME VARCHAR2(255),
SURNAME VARCHAR2(255),
GENDER VARCHAR2(255),
STATE VARCHAR2(255),
BIRTHDAY DATE,
HIREDATE DATE,
DEPT VARCHAR2(255),
SALARY NUMBER(8)
);
INSERT INTO EMPLOYEE VALUES (1,'Rebecca','Moore','F','California',TIMESTAMP'1974-11-20 00:00:00.0',TIMESTAMP'2005-03-11 00:00:00.0','R&D',7000);
INSERT INTO EMPLOYEE VALUES (2,'Ashley','Wilson','F','New York',TIMESTAMP'1980-07-19 00:00:00.0',TIMESTAMP'2008-03-16 00:00:00.0','Finance',11000);
INSERT INTO EMPLOYEE VALUES (3,'Rachel','Johnson','F','New Mexico',TIMESTAMP'1970-12-17 00:00:00.0',TIMESTAMP'2010-12-01 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (4,'Emily','Smith','F','Texas',TIMESTAMP'1985-03-07 00:00:00.0',TIMESTAMP'2006-08-15 00:00:00.0','HR',7000);
INSERT INTO EMPLOYEE VALUES (5,'Ashley','Smith','F','Texas',TIMESTAMP'1975-05-13 00:00:00.0',TIMESTAMP'2004-07-30 00:00:00.0','R&D',16000);
INSERT INTO EMPLOYEE VALUES (6,'Matthew','Johnson','M','California',TIMESTAMP'1984-07-07 00:00:00.0',TIMESTAMP'2005-07-07 00:00:00.0','Sales',11000);
INSERT INTO EMPLOYEE VALUES (7,'Alexis','Smith','F','Illinois',TIMESTAMP'1972-08-16 00:00:00.0',TIMESTAMP'2002-08-16 00:00:00.0','Sales',9000);
INSERT INTO EMPLOYEE VALUES (8,'Megan','Wilson','F','California',TIMESTAMP'1979-04-19 00:00:00.0',TIMESTAMP'1984-04-19 00:00:00.0','Marketing',11000);
INSERT INTO EMPLOYEE VALUES (9,'Victoria','Davis','F','Texas',TIMESTAMP'1983-12-07 00:00:00.0',TIMESTAMP'2009-12-07 00:00:00.0','HR',3000);
INSERT INTO EMPLOYEE VALUES (10,'Ryan','Johnson','M','Pennsylvania',TIMESTAMP'1976-03-12 00:00:00.0',TIMESTAMP'2006-03-12 00:00:00.0','R&D',13000);
And a CSV file, employee.csv:
EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000
To get the difference between the Oracle EMPLOYEE table and the CSV file (the table rows that are missing from the file), the expected result is:
EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
2,Ashley,Wilson,F,New York,1980-07-19 00:00:00,2008-03-16 00:00:00,Finance,11000
4,Emily,Smith,F,Texas,1985-03-07 00:00:00,2006-08-15 00:00:00,HR,7000
6,Matthew,Johnson,M,California,1984-07-07 00:00:00,2005-07-07 00:00:00,Sales,11000
8,Megan,Wilson,F,California,1979-04-19 00:00:00,1984-04-19 00:00:00,Marketing,11000
10,Ryan,Johnson,M,Pennsylvania,1976-03-12 00:00:00,2006-03-12 00:00:00,R&D,13000
To get the intersection of the Oracle EMPLOYEE table and the CSV file, the expected result is:
EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY
1,Rebecca,Moore,F,California,1974-11-20 00:00:00,2005-03-11 00:00:00,R&D,7000
3,Rachel,Johnson,F,New Mexico,1970-12-17 00:00:00,2010-12-01 00:00:00,Sales,9000
5,Ashley,Smith,F,Texas,1975-05-13 00:00:00,2004-07-30 00:00:00,R&D,16000
7,Alexis,Smith,F,Illinois,1972-08-16 00:00:00,2002-08-16 00:00:00,Sales,9000
9,Victoria,Davis,F,Texas,1983-12-07 00:00:00,2009-12-07 00:00:00,HR,3000
Both results take just a few lines of SPL code:
  | A |
---|---|
1 | =ORACLE.query@x("SELECT * FROM EMPLOYEE") |
2 | =file("employee.csv").import@ct(EID:decimal,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY:decimal) |
3 | =INTERSECT=[A1,A2].merge@oi(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY) |
4 | =MINUS=[A1,A2].merge@od(EID,NAME,SURNAME,GENDER,STATE,BIRTHDAY,HIREDATE,DEPT,SALARY) |
SPL offers a JDBC driver that can be invoked from Java. Just save the above SPL script as cmp.splx and invoke it from Java the way you would call a stored procedure:
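import java.sql.*;
…
// Register the esProc JDBC driver
Class.forName("com.esproc.jdbc.InternalDriver");
try (Connection con = DriverManager.getConnection("jdbc:esproc:local://");
     // cmp.splx is invoked by name, like a stored procedure
     CallableStatement st = con.prepareCall("call cmp()")) {
    st.execute();
    …
}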
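For comparison, the set logic in cells A3 and A4 (intersection, and table-minus-file difference, over all nine columns) can be sketched in plain Java once both sides are in memory. This covers only the comparison step: the JDBC loading and the conversion of every value to one canonical text form (for example, formatting DATE values as yyyy-MM-dd HH:mm:ss to match the CSV) are assumed to have happened already, and `RowSets` is a hypothetical name. Unlike SQL MINUS and INTERSECT, it keeps duplicate rows from the table side.

```java
import java.util.*;

class RowSets {
    // Rows of `table` that also appear in `csv` (all columns equal), in table order.
    static List<List<String>> intersect(List<List<String>> table, List<List<String>> csv) {
        return filter(table, csv, true);
    }

    // Rows of `table` that do not appear in `csv`, in table order.
    static List<List<String>> minus(List<List<String>> table, List<List<String>> csv) {
        return filter(table, csv, false);
    }

    // List.equals and List.hashCode compare element by element, so whole rows work as set members.
    private static List<List<String>> filter(List<List<String>> table, List<List<String>> csv,
                                             boolean keepMatches) {
        Set<List<String>> csvRows = new HashSet<>(csv);
        List<List<String>> result = new ArrayList<>();
        for (List<String> row : table) {
            if (csvRows.contains(row) == keepMatches) {
                result.add(row);
            }
        }
        return result;
    }
}
```

Hashing whole rows keeps each lookup constant-time on average, so roughly 9000 rows per side is a small workload for this approach.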
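Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.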