
How do you parse a difficult .txt file?

I'm fairly new to Java and have been attempting to read a very difficult .txt file and insert its contents into my MySQL DB.

To me, the file has some very weird delimiting rules. The delimiters all seem to be commas, but other parts just do not make any sense. Here are a few examples:

" "," "," "," "," "

" ",,,,,,," "

" ",0.00," "

" ",," ",," ",," "

What I do know is that all fields containing letters use the normal ,"text", format.

All columns that contain only numerals follow this format: ,0.00, except for the first column, which follows the normal quoted format: "123456789",

Any field with no data appears as either ,, or ," ",

I have been able to get the program to read correctly with java.sql.Statement, but I need it to work with java.sql.PreparedStatement.

I can get it to work with only a few columns selected, but I need this to work with 100+ columns, and some fields contain commas, e.g. "Some Company, LLC".
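The format described above (quoted text, bare numbers, ,, or ," ", for empty values, commas allowed inside quotes) looks like ordinary CSV quoting, so one option is to scan each line character by character instead of using a regex. Below is a minimal sketch, assuming the file follows common CSV conventions (RFC 4180), including a doubled quote "" standing for a literal quote inside a quoted field; the class name CsvLineSplitter is hypothetical. A CSV library such as Apache Commons CSV or OpenCSV is another option.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quote-aware splitter for one line of the file.
// Assumption: common CSV (RFC 4180) rules, where "" inside a quoted field
// stands for one literal quote character.
final class CsvLineSplitter {

    static String[] split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // A doubled quote is an escaped literal quote; a single one closes the field.
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"');
                        i++;
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current.append(c); // commas inside quotes are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString()); // a comma outside quotes ends the field
                current.setLength(0);
            } else {
                current.append(c); // unquoted data such as 0.00
            }
        }
        fields.add(current.toString()); // the last field (possibly empty)
        return fields.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "\"123456789\",\"Some Company, LLC\",0.00,,\" \"";
        for (String field : split(line)) {
            System.out.println("[" + field + "]");
        }
    }
}
```

Because the scanner tracks whether it is inside quotes, the quote characters are dropped from the returned values, so no separate replaceAll step is needed afterwards.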

Here is the code I currently have but I am at a loss as to where to go next.

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.*;


public class AccountTest {

    public static void main(String[] args) throws Exception {

        // Declare DB settings
        String dbName = "jdbc:mysql://localhost:3306/local";
        String userName = "root";
        String password = "";
        String fileName = "file.txt";
        String psQuery = "insert into accounttest"
                + "(account,account_name,address_1,address_2,address_3) values"
                + "(?,?,?,?,?)";
        Connection connect = null;
        PreparedStatement statement = null;
        String account = null;
        String accountName = null;
        String address1 = null;
        String address2 = null;
        String address3 = null;

        // Load JDBC driver
        try {
            Class.forName("com.mysql.jdbc.Driver");
        } catch (ClassNotFoundException e) {
            System.out.println("JDBC driver not found.");
            e.printStackTrace();
            return;
        }

        // Attempt connection
        try {
            connect = DriverManager.getConnection(dbName, userName, password);
        } catch (SQLException e) {
            System.out.println("E1: Connection Failed.");
            e.printStackTrace();
            return;
        }

        // Verify connection
        if (connect != null) {
            System.out.println("Connection successful.");
        } else {
            System.out.println("E2: Connection Failed.");
        }

        BufferedReader bReader = new BufferedReader(new FileReader(fileName));
        String line;

        // Import file into MySQL DB
        try {
            // Loop the read block until all lines in the file are read
            while ((line = bReader.readLine()) != null) {

                // Split the content of the comma-delimited file
                String[] data = line.split("\",\"");

                // Rename array items for ease of use
                account = data[0];
                accountName = data[1];
                address1 = data[2];
                address2 = data[3];
                address3 = data[4];

                // Remove double quotes so they do not get put into the DB
                account = account.replaceAll("\"", "");
                accountName = accountName.replaceAll("\"", "");
                address1 = address1.replaceAll("\"", "");
                address2 = address2.replaceAll("\"", "");
                address3 = address3.replaceAll("\"", "");

                // Put data into the database
                statement = connect.prepareStatement(psQuery);
                statement.setString(1, account);
                statement.setString(2, accountName);
                statement.setString(3, address1);
                statement.setString(4, address2);
                statement.setString(5, address3);
                statement.executeUpdate();
            }
        } catch (Exception e) {
            e.printStackTrace();
            statement = null;
        } finally {
            bReader.close();
        }
    }
}

Sorry if it's not formatted correctly. I am still learning, and after being flustered for several days trying to figure this out, I didn't bother making it look nice.

My question is: would something like this be possible with such a jumbled-up file? If so, how do I go about making it possible? Also, I am not entirely familiar with prepared statements: do I have to declare every single column, or is there a simpler way?
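On the last question: the ? placeholders do have to match the columns one for one, but the SQL string does not have to be typed by hand. A minimal sketch, assuming the column names come from your own code or config (identifiers cannot be bound as ? parameters, so they must never come from untrusted input); the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical helper: builds "INSERT INTO t (c1, c2, ...) VALUES (?, ?, ...)"
// so that 100+ columns do not have to be written out manually.
// Assumption: table and column names are trusted, because they are
// concatenated into the SQL rather than bound as parameters.
final class InsertSqlBuilder {

    static String buildInsertSql(String table, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("need at least one column");
        }
        String columnList = String.join(", ", columns);
        // One ? per column, in the same order as the column list
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> columns = List.of("account", "account_name", "address_1", "address_2", "address_3");
        System.out.println(buildInsertSql("accounttest", columns));
    }
}
```

The column list itself could live in a config file, or be read from the table with JDBC's DatabaseMetaData.getColumns; either way, the loop that calls setString can then run from 1 to columns.size().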

Thanks in advance for your help.

EDIT: To clarify, I need to upload a .txt file to a MySQL database. I need a way to read and split the data (unless there is a better way) on any of these separators: "," or ,, or ,0.00, while still keeping together fields that contain commas, such as Some Company, LLC. I need to do this with 100+ columns, and the file varies from 3,000 to 6,000 rows. Doing this with a prepared statement is required. I'm not sure if this is possible, but I appreciate any input anyone might have on the matter.
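Once a line is split, the two "no data" spellings (,, and ," ",) still need to be treated the same way, and every row has to end up with exactly one value per column. A minimal sketch, assuming the fields still carry their surrounding quotes (as they do after a plain regex split) and that both empty spellings should be stored as SQL NULL; the class and method names are hypothetical, and the trim step is an assumption you may not want if leading or trailing spaces are meaningful.

```java
import java.util.Arrays;

// Hypothetical helpers that turn one split line into exactly columnCount values.
// Assumption: empty (,,) and blank (," ",) fields are both stored as NULL;
// remove the null mapping if empty strings are preferred.
final class RowNormalizer {

    // Removes one pair of surrounding double quotes, trims, and maps blank to null.
    static String normalize(String raw) {
        String value = raw;
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            value = value.substring(1, value.length() - 1);
        }
        value = value.trim();
        return value.isEmpty() ? null : value;
    }

    // Normalizes every field and pads missing trailing columns with null.
    static String[] toRow(String[] fields, int columnCount) {
        if (fields.length > columnCount) {
            throw new IllegalArgumentException(
                    "expected at most " + columnCount + " fields but got " + fields.length);
        }
        String[] row = new String[columnCount]; // all entries start out as null
        for (int i = 0; i < fields.length; i++) {
            row[i] = normalize(fields[i]);
        }
        return row;
    }

    public static void main(String[] args) {
        String[] fields = {"\"123456789\"", "\"Some Company, LLC\"", "0.00", "", "\" \""};
        System.out.println(Arrays.toString(toRow(fields, 6)));
    }
}
```

Failing loudly on rows with too many fields, rather than silently truncating them, makes a mis-split line (for example one with an unbalanced quote) easy to spot.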

EDIT2: I was able to get the messy file sorted out thanks to rpc1. Instead of String data[] = line.split("\",\""); I used String data[] = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"); I still had to write out each variable to link it to data[], then write out a statement.setString and a replaceAll("\"", ""); for each column, but I got it working. I couldn't find another way to use prepared statements. Thank you for all your help!
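The regex in EDIT2 splits only on commas that are followed by an even number of double quotes up to the end of the line, which is another way of saying "commas that are not inside a quoted field". One detail worth knowing: String.split without a limit discards trailing empty strings, so a line ending in ,, silently loses its last columns; passing -1 as the limit keeps them. A small demonstration (the class name is hypothetical):

```java
// Demonstrates the "comma outside quotes" split from EDIT2.
// The lookahead (?=([^"]*"[^"]*")*[^"]*$) only succeeds when an even number
// of quote characters follows the comma, i.e. the comma is not inside quotes.
final class QuoteAwareRegexSplit {

    static final String COMMA_OUTSIDE_QUOTES = ",(?=([^\"]*\"[^\"]*\")*[^\"]*$)";

    static String[] split(String line) {
        // limit -1 keeps trailing empty fields, such as the last two in ...,0.00,,
        return line.split(COMMA_OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        String line = "\"123456789\",\"Some Company, LLC\",0.00,,";
        System.out.println(line.split(COMMA_OUTSIDE_QUOTES).length); // prints 3: trailing empties dropped
        System.out.println(split(line).length);                      // prints 5: all fields kept
    }
}
```

The fields still include their surrounding quotes, which is why the replaceAll("\"", "") step is still needed afterwards.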

You can use a loop, for example:

    String psQuery = "insert into accounttest"
            + "(account,account_name,address_1,address_2,address_3,..,address_n) values"
            + "(?,?,?,?,?,?,..,?)";  // you need m = n + 2 placeholders, one per column

.....

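    // Optionally change the separator: turn each "," into ; and strip the remaining quotes
    // (assumes the data itself never contains a semicolon)
    String[] data = line.replace("\",\"", ";").replace("\"", "").split(";");

    for (int i = 0; i < m; i++) {
        if (i < data.length) {       // index is within the array
            statement.setString(i + 1, data[i]);
        } else {                     // missing trailing column: store NULL
            statement.setNull(i + 1, java.sql.Types.VARCHAR);
        }
    }
    statement.executeUpdate();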

PS: If your CSV file is large, use batch inserts (addBatch() and executeBatch()) and a precompiled Pattern to split the string:

Pattern p = Pattern.compile(";"); // compile once, outside the read loop
String[] parts = p.split(st);
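For 3,000 to 6,000 rows, batching mainly cuts down the number of round trips to the server. A minimal sketch, assuming each row is already a String[] with exactly one entry per ? placeholder (null meaning SQL NULL); the class name and the batch size of 500 are illustrative choices, not tuned recommendations.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.List;

// Hypothetical batching helper. Assumption: every row has exactly as many
// entries as the prepared statement has ? placeholders.
final class BatchInserter {

    static final int BATCH_SIZE = 500; // illustrative value only

    // Binds every row, sends rows in groups of batchSize, and returns the
    // number of executeBatch() calls that were made.
    static int insertAll(PreparedStatement ps, List<String[]> rows, int batchSize) throws SQLException {
        int pending = 0;
        int roundTrips = 0;
        for (String[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                if (row[i] == null) {
                    ps.setNull(i + 1, Types.VARCHAR); // JDBC parameters are 1-based
                } else {
                    ps.setString(i + 1, row[i]);
                }
            }
            ps.addBatch(); // queue this row with its current parameters
            if (++pending == batchSize) {
                ps.executeBatch();
                roundTrips++;
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch(); // flush the final, partial batch
            roundTrips++;
        }
        return roundTrips;
    }
}
```

To use it with the question's code, prepare the statement once (for example with try (PreparedStatement ps = connect.prepareStatement(psQuery)) { ... }) and pass in the parsed rows. With MySQL Connector/J, the rewriteBatchedStatements=true connection property is commonly used so that batches are sent as multi-row inserts; check the driver documentation for your version.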

EDIT: Try this split function:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// ... inside your class (compile the patterns once and reuse them):
private static final Pattern pSplit = Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'");
private static final Pattern pReplace = Pattern.compile("\"");

public static String[] split(String st) {
    List<String> list = new ArrayList<>();
    Matcher m = pSplit.matcher(st);
    while (m.find()) {
        // Each match is one field; strip its double quotes
        list.add(pReplace.matcher(m.group(0)).replaceAll(""));
    }
    return list.toArray(new String[0]);
}

For example, the input string st = "\"1212\",\"LL C ,DDD \",\"CA, SPRINGFIELD\",232.11,3232.00"; is split into a 5-item array:

1212
LL C ,DDD
CA, SPRINGFIELD
232.11
3232.00
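One property of the matching approach above is worth checking against this particular file: because it collects matches instead of splitting on separators, an empty field such as ,, produces no token at all, so every later value shifts one column to the left. The sketch below (class name hypothetical) runs the same token regex and the lookahead split from the next edit on a line with one empty field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compares the token-matching regex with the "comma outside quotes" split
// on a line that contains one empty (,,) field.
final class EmptyFieldCheck {

    private static final Pattern TOKEN =
            Pattern.compile("[^,\"']+|\"([^\"]*)\"|'([^']*)'");
    private static final Pattern COMMA_OUTSIDE_QUOTES =
            Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

    // Collects every match; empty fields never match, so they disappear.
    static List<String> matchTokens(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Splits on separators, so an empty field stays in its position.
    static String[] splitFields(String line) {
        return COMMA_OUTSIDE_QUOTES.split(line, -1);
    }

    public static void main(String[] args) {
        String line = "\"1212\",,\"CA, SPRINGFIELD\",232.11";
        System.out.println(matchTokens(line).size() + " tokens: " + matchTokens(line));
        System.out.println(splitFields(line).length + " fields");
    }
}
```

With the question's 100+ column layout, keeping each value at a fixed index matters more than stripping quotes, so a separator-based split is the safer fit here.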

EDIT2

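This version handles all of these cases, including empty values:


// Splits on commas that are outside double quotes.
// Note: the returned fields still contain their surrounding quotes.
private static final Pattern pSplit = Pattern.compile(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)");

public static String[] split2(String st) {
    return pSplit.split(st, -1); // limit -1 keeps trailing empty fields
}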

I was able to solve both issues I was having with this little bit of code. Again, thanks for all of your help!

// Prepare the statement once, outside the loop, and reuse it for every row
statement = connect.prepareStatement(psQuery);

for (String line = bReader.readLine(); line != null; line = bReader.readLine()) {

    // Split on commas outside double quotes; -1 keeps trailing empty fields
    String[] data = line.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)", -1);

    // Bind each field, with its double quotes removed, to the matching placeholder
    for (int i = 0; i < data.length; i++) {
        String temp = data[i].replaceAll("\"", "");
        statement.setString(i + 1, temp);
    }
    statement.executeUpdate();
}

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this post, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM