
How do you parse a difficult .txt file?

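I'm very new to Java and have been trying to read a very difficult .txt file and import it into my MySQL database.

To me, the file has some very strange delimiting rules. The delimiters all seem to be commas, but nothing else about the layout makes sense. Here are a few examples: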

" "," "," "," "," "

" ",,,,,,," "

" ",0.00," "

" ",," ",," ",," "

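What I do know is that every field containing letters uses the normal "text", format.

Every column that contains only numbers uses the format ,0.00, except for the first column, which uses the normal quoted format "123456789",

Fields with no data alternate between ,, and " ",

I have been able to get the program to read the file correctly using java.sql.Statement, but I need it to work with java.sql.PreparedStatement.

I can only get it to work when I select just a few columns, but I need it to work with 100+ columns, and some fields contain commas, e.g. "Some Company, LLC"

Here is my current code, but I don't know where to go next.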

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.*;


public class AccountTest {

  public static void main(String[] args) throws Exception {


        //Declare DB settings
    String dbName = "jdbc:mysql://localhost:3306/local";
    String userName = "root";
    String password = "";
    String fileName = "file.txt";
    String psQuery = "insert into accounttest"
                     + "(account,account_name,address_1,address_2,address_3) values"
                     + "(?,?,?,?,?)";
    Connection connect = null;
    PreparedStatement statement = null;
    String account = null;
    String accountName = null;
    String address1 = null;
    String address2 =null;
    String address3 = null;


        //Load JDBC Driver
    try {
        Class.forName("com.mysql.jdbc.Driver");
    }
    catch (ClassNotFoundException e) {
        System.out.println("JDBC driver not found.");
        e.printStackTrace();
        return;
    }


        //Attempt connection
    try {
    connect = DriverManager.getConnection(dbName,userName,password);
    }
    catch (SQLException e) {
        System.out.println("E1: Connection Failed.");
        e.printStackTrace();
        return;         
    }


        //Verify connection
    if (connect != null) {
        System.out.println("Connection successful.");
    }   
    else {
        System.out.println("E2: Connection Failed.");
    }


      BufferedReader bReader = new BufferedReader(new FileReader(fileName));
        String line;

        //import file into mysql DB
    try {

        //Looping the read block until all lines in the file are read.
    while ((line = bReader.readLine()) != null) {

            //Splitting the content of comma delimited file
        String data[] = line.split("\",\"");

            //Renaming array items for ease of use
        account = data[0];
        accountName = data[1];
        address1 = data[2];
        address2 = data[3];
        address3 = data[4];

            // removing double quotes so they do not get put into the db
        account = account.replaceAll("\"", "");
        accountName = accountName.replaceAll("\"", "");
        address1 = address1.replaceAll("\"", "");
        address2 = address2.replaceAll("\"", "");
        address3 = address3.replaceAll("\"", "");

            //putting data into database
        statement = connect.prepareStatement(psQuery);
        statement.setString(1, account);
        statement.setString(2, accountName);
        statement.setString(3, address1);
        statement.setString(4, address2);
        statement.setString(5, address3);
        statement.executeUpdate();
    }
    }
    catch (Exception e) {
        e.printStackTrace();
        statement = null;
    }
    finally {
        bReader.close();
    }
}   
}

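Sorry if this isn't formatted properly. I'm still learning, and after several flustered days of trying to figure this out I didn't bother making it look nice.

My question: is this even possible with such a messy file? If so, how do I go about it? Also, I'm not entirely familiar with prepared statements: do I have to declare every column, or is there an easier way?

Thanks in advance for your help.

EDIT: To clarify, I need to upload a txt file to a MySQL database, and I need a way to read and split it (unless there is a better way) based on "," and ,,,,, and ,0.00, while still keeping fields that contain commas, such as "Some Company, LLC", intact. I need to do this with 100+ columns, and the file varies from 3000 to 6000 rows. It has to be done as a prepared statement. I'm not sure this is possible, but I appreciate any input anyone might have.

EDIT 2: Thanks to rpc1, I was able to figure out how to sort out the messy file. Instead of String data[] = line.split("\",\""); I used String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"); I still had to write out each variable to link it to data[], then write a statement.setString for every column as well as a replaceAll("\"", ""); for every column, but I got it working and couldn't find another way to use prepared statements. Thank you for your help!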

You can use a loop, for example:

    String psQuery = "insert into accounttest"
                         + "(account,account_name,address_1,address_2,address_3,..,address_n) values"
                         + "(?,?,?,?,?,?,..,?)";  // one placeholder per column: m = n + 2 in total

.....

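     // replace the "," separator with ; (pick a character that never occurs in the data), then strip the remaining quotes
            String data[] = line.replace("\",\"",";").replace("\"","").split(";");

              for(int i=0;i<m;i++)
              { 
                  if(i<data.length) // index is within the array: bind the parsed value
                      statement.setString(i+1, data[i]);
                  else
                      statement.setString(i+1, ""); // field missing from this line: bind an empty string
              }
              statement.executeUpdate();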
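
With 100+ columns, typing the column list and every ? by hand is easy to get wrong. The following is a minimal sketch of generating the statement text from a list of column names instead. The class name InsertSqlBuilder, the method buildInsert and the three column names in main are made up for illustration, and the sketch assumes the names come from your own code (they are concatenated straight into the SQL).

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InsertSqlBuilder {

    // Builds "insert into <table> (c1,c2,...) values (?,?,...)" with one
    // placeholder per column. Table and column names are concatenated as-is,
    // so they must come from trusted code, never from the input file.
    public static String buildInsert(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("at least one column is required");
        }
        String columnList = String.join(",", columns);
        String placeholders = String.join(",", Collections.nCopies(columns.size(), "?"));
        return "insert into " + table + " (" + columnList + ") values (" + placeholders + ")";
    }

    public static void main(String[] args) {
        // Hypothetical column names, shortened to three for the example.
        List<String> columns = Arrays.asList("account", "account_name", "address_1");
        System.out.println(buildInsert("accounttest", columns));
    }
}
```

The size of the column list can then also serve as the m used in the binding loop, so the placeholder count and the loop bound cannot drift apart.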

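PS: If your CSV file is large, use batch inserts (addBatch()) and a precompiled Pattern to split the strings:

Pattern p = Pattern.compile(";"); // compile once, reuse for every line
String[] parts = p.split(st);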
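
As a rough sketch of the addBatch() idea, assuming the lines have already been split into String[] rows: the class BatchInsertSketch, the method insertRows and the batchSize parameter are illustrative names, not part of the code above. Only the standard JDBC calls setString, addBatch and executeBatch are used.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsertSketch {

    // Binds each row to the prepared statement and sends rows to the driver in
    // groups of batchSize. Missing trailing fields are bound as empty strings;
    // fields beyond columnCount are ignored. Returns the number of rows queued.
    public static int insertRows(PreparedStatement statement, List<String[]> rows,
                                 int columnCount, int batchSize) throws SQLException {
        int queued = 0;
        for (String[] row : rows) {
            for (int i = 0; i < columnCount; i++) {
                statement.setString(i + 1, i < row.length ? row[i] : "");
            }
            statement.addBatch();
            queued++;
            if (queued % batchSize == 0) {
                statement.executeBatch(); // flush a full batch
            }
        }
        if (queued % batchSize != 0) {
            statement.executeBatch(); // flush the remaining rows
        }
        return queued;
    }
}
```

For a file of 3000 to 6000 lines, a batch size of a few hundred is a reasonable starting point. Whether batching is noticeably faster depends on the driver and its settings, so it is worth measuring.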

EDIT: Try this split function:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Patterns are compiled once and reused for every line
private static Pattern pSplit = Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'");
private static Pattern pReplace = Pattern.compile("\"");

public static Object[] split(String st)
{
   List<String> list = new ArrayList<String>();
   Matcher m = pSplit.matcher(st);
   while (m.find())
      list.add(pReplace.matcher(m.group(0)).replaceAll("")); // strip the double quotes from each token
   return list.toArray();
}

For example, the input string st = "\"1212\",\"LL C ,DDD \",\"CA, SPRINGFIELD\",232.11,3232.00"; is split into a 5-item array:

1212
LL C ,DDD
CA, SPRINGFIELD
232.11
3232.00
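
One thing worth checking before using this function on the file from the question, which contains runs such as ,,,,: the matcher only reports non-empty tokens, so an empty field between two commas produces no list entry and every later value shifts one position to the left. A small sketch that shows this with the same pattern (the class name MatcherSplitCheck is made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherSplitCheck {

    // Same token pattern as above: an unquoted run, or a quoted field.
    private static final Pattern TOKEN =
            Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'");

    // Collects every token the pattern finds, with double quotes removed.
    public static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            result.add(m.group(0).replace("\"", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Three fields in the input, the middle one empty.
        System.out.println(tokens("\"1212\",,\"CA, SPRINGFIELD\"").size()); // prints 2
    }
}
```

If the column position matters, as it does when binding to a fixed list of placeholders, the empty fields have to survive the split.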

EDIT2

This example solves all your problems (even empty values):


private static Pattern pSplit = Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");
public static String[] split2(String st)
{
    String[] tokens = pSplit.split(st);       
    return tokens;
}
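
The lookahead treats a comma as a separator only when an even number of double quotes follows it, which is what keeps "Some Company, LLC" in one piece. One caveat: split without a limit argument drops trailing empty strings, so a line ending in ,, yields fewer tokens than columns. A minimal sketch that passes a limit of -1 to keep them and strips the quotes in the same pass follows; the names QuotedCsvSplit and fields are illustrative, and it assumes fields never contain escaped double quotes.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class QuotedCsvSplit {

    // A comma is a separator only if an even number of double quotes follows it,
    // i.e. the comma is not inside a quoted field.
    private static final Pattern SEPARATOR =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Splits one line and strips the double quotes from every field. The -1 limit
    // keeps trailing empty fields, which split without a limit would discard.
    public static String[] fields(String line) {
        String[] tokens = SEPARATOR.split(line, -1);
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokens[i].replace("\"", "");
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\"1212\",\"Some Company, LLC\",,0.00,,";
        System.out.println(Arrays.toString(fields(line)));
        System.out.println(fields(line).length);
    }
}
```

For the sample line in main this gives six fields, the last two empty, so the array length matches the number of columns in the line.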

I was able to work out both of the problems I was having with this bit of code. Thanks again for your help!

for (String line = bReader.readLine(); line != null; line = bReader.readLine()) {

    // Split on commas that are not inside double quotes
    String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Bind every field of this line to the statement, then insert the row
    statement = connect.prepareStatement(psQuery);
    for (int i = 0; i < data.length; i++) {
        // Strip the double quotes so they do not end up in the database
        String temp = data[i].replaceAll("\"", "");
        statement.setString(i + 1, temp);
    }
    statement.executeUpdate();
}


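Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.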

 