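How do you parse a difficult .txt file?
I'm very new to Java and have been trying to read a very difficult .txt file and load it into my MySQL database.
To me, the file has some very strange delimiting rules. Everything appears to be comma-delimited, but the rest of the format makes no sense. Here are a few examples: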
" "," "," "," "," "
" ",,,,,,," "
" ",0.00," "
" ",," ",," ",," "
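What I do know is that all fields containing letters use the normal "text", format.
All columns containing only numbers use the format ,0.00, except the first column, which uses the normal quoted format: "123456789",
Fields with no data alternate between ,, and ," ",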
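I have been able to get the program to read the file correctly using java.sql.Statement, but I need it to work with java.sql.PreparedStatement.
I can only get it to work when I select a few columns, but I need to handle 100+ columns, and some fields contain commas, for example "Some Company, LLC".
Here is my current code, but I don't know where to go next.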
import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.*;

public class AccountTest {

    public static void main(String[] args) throws Exception {

        //Declare DB settings
        String dbName = "jdbc:mysql://localhost:3306/local";
        String userName = "root";
        String password = "";
        String fileName = "file.txt";
        String psQuery = "insert into accounttest"
                + "(account,account_name,address_1,address_2,address_3) values"
                + "(?,?,?,?,?)";
        Connection connect = null;
        PreparedStatement statement = null;
        String account = null;
        String accountName = null;
        String address1 = null;
        String address2 = null;
        String address3 = null;

        //Load JDBC Driver
        try {
            Class.forName("com.mysql.jdbc.Driver");
        }
        catch (ClassNotFoundException e) {
            System.out.println("JDBC driver not found.");
            e.printStackTrace();
            return;
        }

        //Attempt connection
        try {
            connect = DriverManager.getConnection(dbName,userName,password);
        }
        catch (SQLException e) {
            System.out.println("E1: Connection Failed.");
            e.printStackTrace();
            return;
        }

        //Verify connection
        if (connect != null) {
            System.out.println("Connection successful.");
        }
        else {
            System.out.println("E2: Connection Failed.");
        }

        BufferedReader bReader = new BufferedReader(new FileReader(fileName));
        String line;

        //import file into mysql DB
        try {
            //Looping the read block until all lines in the file are read.
            while ((line = bReader.readLine()) != null) {

                //Splitting the content of comma delimited file
                String data[] = line.split("\",\"");

                //Renaming array items for ease of use
                account = data[0];
                accountName = data[1];
                address1 = data[2];
                address2 = data[3];
                address3 = data[4];

                // removing double quotes so they do not get put into the db
                account = account.replaceAll("\"", "");
                accountName = accountName.replaceAll("\"", "");
                address1 = address1.replaceAll("\"", "");
                address2 = address2.replaceAll("\"", "");
                address3 = address3.replaceAll("\"", "");

                //putting data into database
                statement = connect.prepareStatement(psQuery);
                statement.setString(1, account);
                statement.setString(2, accountName);
                statement.setString(3, address1);
                statement.setString(4, address2);
                statement.setString(5, address3);
                statement.executeUpdate();
            }
        }
        catch (Exception e) {
            e.printStackTrace();
            statement = null;
        }
        finally {
            bReader.close();
        }
    }
}
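Sorry if this isn't formatted properly. I'm still learning, and after several frazzled days of trying to figure this out, I didn't bother making it look nice.
My question is: is this even possible with such a messy file? If so, how would I go about it? Also, I'm not entirely familiar with prepared statements: do I have to declare every column, or is there a simpler way?
Thanks in advance for your help.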
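EDIT: To clarify, I need to upload a .txt file to a MySQL database, and I need a way to read and split it (unless there is a better way) based on "," and ,,,, and , ,0.00, while still keeping together fields that contain commas, such as Some Company, LLC. I need to do this with 100+ columns, and the file varies from 3,000 to 6,000 lines. It needs to be done as a prepared statement. I'm not sure whether this is possible, but I'd appreciate any input anyone might have on the matter.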
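EDIT 2: Thanks to rpc1, I was able to figure out how to sort out the messy file. Instead of String data[] = line.split("\",\"");
I used String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
I still had to write out each variable to link it to data[], then write out statement.setString for each column, as well as replaceAll("\"", ""); for each column, but I got it working, and I couldn't find another way to use a prepared statement. Thanks for your help!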
You can use a loop, for example:
String psQuery = "insert into accounttest"
        + "(account,account_name,address_1,address_2,address_3,..,address_n) values"
        + "(?,?,?,?,?,?,..,?)"; // you need m = n + 2 placeholders, one per column
.....
// you can change the separator; this assumes the data never contains ';'
String data[] = line.replace("\",\"", ";").replace("\"", "").split(";");
for (int i = 0; i < m; i++) {
    if (i < data.length) // index is within the array
        statement.setString(i + 1, data[i]);
    else
        statement.setString(i + 1, ""); // pad missing columns with an empty string
}
statement.executeUpdate();
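With 100+ columns, typing the column list and the matching question marks by hand is easy to get wrong. As a rough sketch (the class name InsertSqlBuilder and its build method are made up for illustration, and it assumes the table and column names come from your own code rather than from user input), the statement text for the loop above could be generated like this:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of JDBC: builds the INSERT text for a PreparedStatement.
final class InsertSqlBuilder {

    // Returns "insert into <table>(c1,c2,...) values(?,?,...)" with one placeholder per column.
    // Assumption: table and column names are trusted constants, because they are concatenated as-is.
    static String build(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String names = String.join(",", columns);
        String marks = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table + "(" + names + ") values(" + marks + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("accounttest",
                Arrays.asList("account", "account_name", "address_1")));
    }
}
```

The size of the column list can then double as the loop bound (m in the loop above), so the number of placeholders and the number of setString calls cannot drift apart.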
P.S. If your CSV file is large, use batch inserts (addBatch()) and use a precompiled Pattern to split the strings:
Pattern p = Pattern.compile(";");
String[] parts = p.split(st);
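To make the addBatch() suggestion concrete, here is a minimal sketch. The class BatchInsertSketch, the method insertAll, and the batchSize parameter are assumptions made for illustration; only setString, addBatch and executeBatch are actual PreparedStatement methods. It assumes the statement was prepared once, outside the loop, and that every row has already been split into fields.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Hypothetical helper: queues rows on one PreparedStatement and sends them in groups.
final class BatchInsertSketch {

    // Binds each row, adds it to the batch, and executes the batch every batchSize rows.
    // Rows shorter than columnCount are padded with empty strings.
    // Returns the number of rows that were queued.
    static int insertAll(PreparedStatement statement, List<String[]> rows,
                         int columnCount, int batchSize) throws SQLException {
        int pending = 0;
        int total = 0;
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                statement.setString(i + 1, i < row.length ? row[i] : "");
            }
            statement.addBatch();
            total++;
            if (++pending == batchSize) {
                statement.executeBatch();
                pending = 0;
            }
        }
        if (pending > 0) {
            statement.executeBatch(); // flush the final partial batch
        }
        return total;
    }
}
```

For a file of a few thousand lines, a batchSize somewhere in the hundreds is a reasonable starting point, but that number is a guess to tune rather than a recommendation from the driver.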
EDIT: Try this split function:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile the patterns once and reuse them for every line
private static final Pattern pSplit = Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'");
private static final Pattern pReplace = Pattern.compile("\"");

public static String[] split(String st) {
    List<String> list = new ArrayList<String>();
    Matcher m = pSplit.matcher(st);
    while (m.find()) {
        // Strip the surrounding double quotes from each matched token
        list.add(pReplace.matcher(m.group(0)).replaceAll(""));
    }
    return list.toArray(new String[0]);
}
For example, with the input string st = "\"1212\",\"LL C ,DDD \",\"CA, SPRINGFIELD\",232.11,3232.00";
the result is an array of 5 items:
1212
LL C ,DDD
CA, SPRINGFIELD
232.11
3232.00
EDIT 2
This example solves all of your problems (even empty values):
private static Pattern pSplit = Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

public static String[] split2(String st) {
    // Split on commas followed by an even number of double quotes, i.e. commas outside quoted fields
    return pSplit.split(st);
}
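One caveat with the lookahead pattern: Pattern.split(st) with no limit argument drops trailing empty strings, so a line that ends in ,, yields fewer items than it has columns. Passing a limit of -1 keeps them. The sketch below shows this; the class name CsvLineSplitter is made up, and trimming each field (so that " " becomes an empty string) is an assumption about what the database should receive, not something the file format requires.

```java
import java.util.regex.Pattern;

// Hypothetical helper built around the same lookahead pattern.
final class CsvLineSplitter {

    // A comma is a separator only when an even number of double quotes follows it,
    // which means the comma is not inside a quoted field.
    private static final Pattern SEPARATOR =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    static String[] split(String line) {
        // limit -1 keeps trailing empty fields, which split(line) alone would discard
        String[] fields = SEPARATOR.split(line, -1);
        for (int i = 0; i < fields.length; i++) {
            // Assumption: quotes are never wanted in the data, and blank fields should be empty
            fields[i] = fields[i].replace("\"", "").trim();
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "\"123456789\",,0.00,\" \",\"Some Company, LLC\",,";
        String[] fields = split(line);
        System.out.println(fields.length + " fields");
        for (String field : fields) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because the field count now always matches the column count, the result can be bound to the prepared statement by index without padding.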
I was able to sort out both of the problems I was having with this bit of code. Thanks again for your help!
for (String line = bReader.readLine(); line != null; line = bReader.readLine()) {
    //Splitting the content of comma delimited file
    String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
    //Binding every field of the line and inserting the row into the table.
    statement = connect.prepareStatement(psQuery);
    for (int i = 0; i < data.length; i++) {
        //Removing double quotes so they do not get put into the db
        String temp = data[i].replaceAll("\"", "");
        statement.setString(i + 1, temp);
    }
    statement.executeUpdate();
}