Perl: Keep only one of two consecutive characters

I'm having trouble writing a regex that keeps only one of two consecutive C/O pairs in a column. In the file below, a C line followed by an O line appears at the end of number 1 and again at the start of number 2. I would like to write a new file in which only the C/O pair of number 1 is kept. This needs to be repeated throughout the file: between number 2 and 3, keep number 2's pair; between number 3 and 4, keep number 3's pair; and so on.

Input:
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       C       36.6954
    2       O       -118.5597
    2       N       133.6704
    2       H       28.3581

Output:
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       N       133.6704
    2       H       28.3581

This is what I have so far; I hope my logic is semi-clear. I'm still learning, and any commentary is greatly appreciated!

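#!/usr/bin/perl

use strict;
use warnings;

my $file = 'data.txt';

open my $fh, '<', $file or die "Can't read $file: $!";

# Reads one line at a time, so each test below only ever sees a single line.
while (my $line = <$fh>) {
    chomp $line;
    my @column = split ' ', $line;    # split on any run of tabs or spaces
    # Column 1 holds a single letter (H, C, O, N), so "COCO" never matches here:
    # the C and the O of a pair sit on different lines.
    if ($column[1] =~ s/COCO//g) {
        print "@column\n";
    }
}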

You could do it all at once: read the whole file into a single string, then run it through this regex.

 # s/(?m)(^\h+(\d+)\h+C.*\s+^\h+\2\h+O.*\n)\s*^\h+(?!\2)(\d+)\h+C.*\s+^\h+\3\h+O.*\n(?!\s*\z)/$1/g

 (?xm-)
 # C-O in the bottom of a segment
 (                             # (1 start), Keep this
      ^ \h+                         # new line
      ( \d+ )                       # (2), col 1 number
      \h+ C .* \s+                  # C
      ^ \h+                         # next line
      \2 \h+ O .* \n                # \2 .. O 
 )                             # (1 end)
 # Throw this away
 # C-O in the top of next segment
 \s* 
 ^ \h+                         # new line
 (?! \2 )                      # Not \2
 ( \d+ )                       # (3), col 1 num
 \h+ C .* \s+                  # C
 ^ \h+                         # next line
 \3 \h+ O .* \n                # \3 .. O
 (?! \s* \z )                  # Not the last in file
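Since the question asks for a new file rather than printed output, here is a minimal sketch that reads the whole input file into one string, runs the compact substitution above unchanged, and writes the result out. The sub names `keep_first_co` and `rewrite_file` and the output name `data_new.txt` are hypothetical; `data.txt` is the input name from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Runs the compact substitution from above on a whole string and returns the result.
sub keep_first_co {
    my ($text) = @_;
    $text =~ s/(?m)(^\h+(\d+)\h+C.*\s+^\h+\2\h+O.*\n)\s*^\h+(?!\2)(\d+)\h+C.*\s+^\h+\3\h+O.*\n(?!\s*\z)/$1/g;
    return $text;
}

# Reads $in_file into one string, applies keep_first_co, and writes $out_file.
sub rewrite_file {
    my ($in_file, $out_file) = @_;

    open my $in, '<', $in_file or die "Can't read $in_file: $!";
    my $text = do { local $/; <$in> };    # undefined $/ = slurp the whole file
    close $in;

    open my $out, '>', $out_file or die "Can't write $out_file: $!";
    print {$out} keep_first_co($text);
    close $out or die "Can't close $out_file: $!";
}

# Hypothetical command line: perl keep_first_co.pl data.txt data_new.txt
rewrite_file(@ARGV) if @ARGV == 2;
```

Using `local $/` inside the `do` block limits slurp mode to that one read, so any later reads elsewhere in a larger program still go line by line.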

Perl code:

use strict;
use warnings;

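# Slurp mode: with $/ undefined, <DATA> returns everything up to end of file.
# ($/ = "" would be paragraph mode and stop at the first blank line.)
my $input = do { local $/; <DATA> };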
print "Input:\n$input\n";

$input =~

 s/(?xm-)
     # C-O in the bottom of a segment
     (                             # (1 start), Keep this
          ^ \h+                         # new line
          ( \d+ )                       # (2), col 1 number
          \h+ C .* \s+                  # C
          ^ \h+                         # next line
          \2 \h+ O .* \n                # \2 .. O 
     )                             # (1 end)
     # Throw this away
     # C-O in the top of next segment
     \s* 
     ^ \h+                         # new line
     (?! \2 )                      # Not \2
     ( \d+ )                       # (3), col 1 num
     \h+ C .* \s+                  # C
     ^ \h+                         # next line
     \3 \h+ O .* \n                # \3 .. O
     (?! \s* \z )                  # Not the last in file
/$1/g;

print "Output:\n$input\n";

__DATA__
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       C       36.6954
    2       O       -118.5597
    2       N       133.6704
    2       H       28.3581

Code output:

Input:
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       C       36.6954
    2       O       -118.5597
    2       N       133.6704
    2       H       28.3581

Output:
    1       H       27.5310
    1       H       27.0882
    1       C       36.8857
    1       O       -118.2564
    2       N       133.6704
    2       H       28.3581
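The same rule can also be written without a regex, by treating the file as a list of lines and checking each line's neighbors. This is a minimal sketch under stated assumptions (whitespace-separated "number label value" columns, labels that are exactly `C` and `O`, blank lines not needed in the output); the sub name `keep_first_co_lines` is hypothetical and not part of the answer above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Regex-free sketch: walk the lines as a list and drop a C/O pair when it
# opens a new number right after a C/O pair that closed the previous number.
# Assumes whitespace-separated "number label value" columns.
sub keep_first_co_lines {
    my @lines  = grep { /\S/ } @_;                   # blank lines are dropped
    my @fields = map { [ split ' ', $_ ] } @lines;   # [number, label, value]
    my @keep;

    my $i = 0;
    while ($i < @lines) {
        my $drop = $i >= 2 && $i + 2 < @lines        # a pair before, a line after
            && $fields[$i - 2][1] eq 'C' && $fields[$i - 1][1] eq 'O'
            && $fields[$i - 2][0] eq $fields[$i - 1][0]    # previous number ends C, O
            && $fields[$i][0] ne $fields[$i - 1][0]        # line $i starts a new number
            && $fields[$i][1] eq 'C' && $fields[$i + 1][1] eq 'O'
            && $fields[$i + 1][0] eq $fields[$i][0];       # ... which starts C, O

        if ($drop) {
            $i += 2;                                 # skip the second C/O pair
        }
        else {
            push @keep, $lines[$i];
            $i++;
        }
    }
    return @keep;
}

# Hypothetical command line: perl keep_first_co_lines.pl data.txt > data_new.txt
if (@ARGV) {
    print for keep_first_co_lines(<>);
}
```

On the sample input this keeps the same six lines as the regex. Comparing labels with `eq` means a label such as `CA` is not treated as `C`, whereas `C .*` in the regex would accept it; which behavior is wanted depends on the real data.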

The technical post webpages of this site follow the CC BY-SA 4.0 protocol.

© 2020-2024 STACKOOM.COM