简体   繁体   中英

MySQL Procedure: substring_index throwing exception from special characters (executed in bash)

So the thing that makes this whole question hard is that I am working in a bash shell environment. I am parsing a large amount of data that is all located in text files in a set of directories. The environment I am working in does not have a gui, and is just the shell, and I am executing the commands from the shell through mysql, I am not logged into mysql.

I am the partner on a project, the main part is a bash script that searches for information and inserts it into text files in several directories. My operations parse out the needed data and inserts it into the database.

I run my main loop through a shell script. It loops through a set of directories and searches for the .txt files in each. I then pass the information to my procedure. In something like the below.

NOTE: I am not an expert in bash and have just started learning.

mysql - user -p'mypassword' --database=dbname <<EFO
call Procedure_Name("`cat ${textfile}`");
EOF

Since I am working in mysql and bash only I can not use another language to make my life easier so I use SUBSTRING_INDEX mostly. So an illustration of the procedure is shown below.

DELIMITER $$
CREATE PROCEDURE Procedure_name(textfile LONGTEXT)
BEGIN
    DECLARE data LONGTEXT;
    SET data = SUBSTRING_INDEX(SUBSTRING_INDEX(textfile,"(+++)",1),"(++)",-1));
    INSERT INTO Table_Name (column) values (data);
END; $$
DELIMITER ;

The text file is a clean structure that allows for me to cut it up, but the problem I am having is that special characters inside of the textfile is causing my procedure to throw an error. I believe they are escape characters and I need a way around this. Just about any character could appear in the data I am parsing so I need a way to ignore these characters in the procedure or to cause them to not affect my process.

I tried looking into mysql_real_escape_string() however the parameters were hard to figure out and it looks like it only works in PHP but I am not sure. So I would like to do something at the beginning of my procedure to maybe insert "\\"'s or something into the string to not cause my procedure to fail.

Also, these textfiles range from 16k to 11000k so I need something that can handle that. My process works sometimes but is getting caught up on a lot of stuff and my searching has not helped me at all. So any help would be greatly appreciated!!!

and thanks to all to reading this long description. normally I can find my answer or piece it together from questions but I had no luck this time so I figured it was about time to make an account and ask something.

Your question is really too board, but here is an example of what I mean

  a script file:

  #!/bin/bash

  case $# in
     1 ) inFile=$1 ;;
     * ) echo "usage: myLoader infile"; exit 1 ;;
  esac 

  awk 'BEGIN {
    FS="\t"'; OFS="|"
  } 
  {
     sub(/badChars/, "", $0); sub(/otherBads/, "", $0) ; # .... as many as needed
     # but be careful, easy to delete stuff that with too broad a brush.
     print $1, $2, $5, $4, $9
  }' $inFile > $inFile.psv

  bcp -in -f ${formatFile:-formatFile} $inFile.psv

Note how awk makes it very easy, by repeating sub(...) commands to remove any "bad chars" you may have in your source data AND to reorganize the order of the columns in your data. Each $n is the value in numbered column on a line, so $1, $2, $5 skips fields $3 and $4, for example.

The OFS is set to the pipe char, making it easy to see in your output where exactly the field boundaries are AND if there are any leading or trailing whitespace characters that may be throwing off your load.

The > $inFile.psv keeps your original file, just in case you make a mistake in the awk script. If you create really small test data files, you can eliminate saving to a file and just let the output go to the screen, editing until you get it right.

You'll have to find out exactly how mySQL's equivalent of bcp works. I'm pretty sure I've seen postings here. Either that, or post a separate question, "I have this pipe-delimited file with 8 columns, how do I load it to my table?".

The reference in my sample code to ${formatFile} is that hopefully the mySQL bcp command can take a format file that specifies the order and types of fields to be loaded into a file. Good bcp fmt files allow a fair amount of flexibility, but you'll have to read the man page for that utility AND do some research to understand the scope and restraints on that flexibility.

Going forward, you should post individual questions like, "I've tried x using lang Y to filter Z characters. Right now I'm getting output z, What am I doing wrong?"

Divide and conquer. There is no easy way. Reset those customer and boss expectations, you're learning something new, and it will take a little study to get it right. Good luck.

IHTH

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM