简体   繁体   中英

multiline regex match string replacement on large file

I have a large mysqldump (4+ gigs), and we have an archive type table which suffers from this bug Net result is that I need to reset the AUTO_INCREMENT counter to zero. Did manage to do the replacement, but it was ugly, involving splitting the file into smaller chunks, then grepping to find the table, looking to see the number I wanted to change and then using sed on the original file to replace just that match on the auto increment. Like I said, horrible, but it worked.

So - I have tried to decipher multiline sed and didn't get that far. What I want to do is to seek to the table name I'm interested in, and then from that point find the next AUTO_INCREMENT= , and then match the number in it, and make it zero. Here's the table: (assume there is scads of data before this point, and after it)

DROP TABLE IF EXISTS `archive_exported_problems`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `archive_exported_problems` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `export_id` int(11) DEFAULT NULL,
  `problem_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ARCHIVE AUTO_INCREMENT=478 DEFAULT CHARSET=latin1;

What I want to do, is to (automatically) scan the file until it matches

(?:CREATE TABLE `archive_exported_problems).*?AUTO_INCREMENT=(\d+)

(regex which seems to work) and then replace the capture group with 0

I assume this is possible - any help most appreciated!

If perl is an option then it is easier using DOTALL flag in perl like this:

perl -00 -pe 
 's/(?s)(CREATE TABLE `archive_exported_problems`.*?AUTO_INCREMENT)=\d+/$1=0/' file.sql

DROP TABLE IF EXISTS `archive_exported_problems`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `archive_exported_problems` (
      `id` int(11) NOT NULL AUTO_INCREMENT,
      `export_id` int(11) DEFAULT NULL,
      `problem_id` int(11) DEFAULT NULL,
      PRIMARY KEY (`id`)
) ENGINE=ARCHIVE AUTO_INCREMENT=0 DEFAULT CHARSET=latin1;

Options used are:

-00    # slurps whole file
(?s)   # enable DOTALL flag for regex

Consider this:

$ sed -r '/CREATE TABLE `archive_exported_problems`/,/AUTO_INCREMENT=/ {s/(AUTO_INCREMENT)=[[:digit:]]+/\1=0/;}' file
DROP TABLE IF EXISTS `archive_exported_problems`;
/*!40101 SET @saved_cs_client     = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `archive_exported_problems` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `export_id` int(11) DEFAULT NULL,
  `problem_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=ARCHIVE AUTO_INCREMENT=0 DEFAULT CHARSET=latin1;

For Mac OSX (BSD), try:

$ sed -E -e '/CREATE TABLE `archive_exported_problems`/,/AUTO_INCREMENT=/ {s/(AUTO_INCREMENT)=[[:digit:]]+/\1=0/;}' file

How it works

  • /CREATE TABLE archive_exported_problems /, /AUTO_INCREMENT=/

    This restricts the subsequent commands to ranges on lines that start with a line containing CREATE TABLE 'archive_exported_problems' and end with a line containing AUTO_INCREMENT= .

  • s/(AUTO_INCREMENT)=[[:digit:]]+/\\1=0/

    This performs the substitution that you wanted.

Limitation

This approach assumes that the CREATE TABLE phrase and the AUTO_INCREMENT= phrase will never be on the same line. If that is not true, we need to make some minor changes.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM