
Special preg_match for a string

This is my string:

================================================================================
                                       INPUT FILE
================================================================================
NAME = CO-c0m1.txt
|  1> ! HF def2-TZVP opt numfreq

|  2> 

|  3> % scf

|  4>      convergence tight

|  5> end

|  6> 

|  7> * xyz 0 1

|  8> C 0 0 0

|  9> O 0 0 1

| 10> *

| 11> 
| 12>                          ****END OF INPUT****
================================================================================

I want to get this output:

! HF def2-TZVP opt numfreq
% scf
     convergence tight
end

* xyz 0 1
C 0 0 0
O 0 0 1
*

I've been trying to do this for about 5 hours and can't get it to work. Please help; this is my preg_match:

$regx = '/INPUT FILE...................................................................................(.*?)........................END OF INPUT/s';
      if(preg_match($regx, $source[$i], $matches)) {
        $input[$i] = preg_replace('/\s\s\s\s+/', "\n", $matches[1]);
      }

I'm very new to regex and it seems really hard. Can someone please help? Thanks in advance :)!

$p ="/[|]\s*\d*[>]\s(.+)/";
$t = "================================================================================
                                       INPUT FILE
================================================================================
NAME = CO-c0m1.txt
|  1> ! HF def2-TZVP opt numfreq

|  2> 

|  3> % scf

|  4>      convergence tight

|  5> end

|  6> 

|  7> * xyz 0 1

|  8> C 0 0 0

|  9> O 0 0 1

| 10> *

| 11> 
| 12>                          ****END OF INPUT****
================================================================================";


preg_match_all($p,$t,$res);

die(json_encode($res[1], JSON_PRETTY_PRINT));

/* Output:
[
    "! HF def2-TZVP opt numfreq",
    "% scf",
    "     convergence tight",
    "end",
    "* xyz 0 1",
    "C 0 0 0",
    "O 0 0 1",
    "*",
    "                         ****END OF INPUT****"
]
 */

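The second element of $res ($res[1]) is an array that holds what you want. Note that its last entry is the ****END OF INPUT**** line, which you still have to drop.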

You need a regular expression that matches the lines that start with |, followed by some spaces, then one or more digits, then >. You only need the text that follows this prefix.

The regular expression is /^\|\s*\d+>(.*)$/m. It contains a capturing group for the text you need (the captured text still starts with the space that follows >). preg_match_all() puts the captured fragments in $matches[1]:

preg_match_all('/^\|\s*\d+>(.*)$/m', $source[$i], $matches);
echo(implode("\n", $matches[1]));

You can then remove the line that contains ****END OF INPUT**** by other means (array_pop(), array_filter(), etc.).
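A minimal sketch of that cleanup, assuming PHP 7.4+ (for arrow functions). $source here is a plain string standing in for the asker's $source[$i]; it is a shortened copy of the input from the question with the blank separator lines omitted, and $lines is just an illustrative name. The sketch drops the END OF INPUT line with array_pop(), strips the single space that follows >, and uses array_filter() to drop the empty lines:

```php
<?php
// Shortened copy of the input from the question (blank separator lines omitted).
$source = <<<'TXT'
================================================================================
                                       INPUT FILE
================================================================================
NAME = CO-c0m1.txt
|  1> ! HF def2-TZVP opt numfreq
|  2>
|  3> % scf
|  4>      convergence tight
|  5> end
|  6>
|  7> * xyz 0 1
|  8> C 0 0 0
|  9> O 0 0 1
| 10> *
| 11>
| 12>                          ****END OF INPUT****
================================================================================
TXT;

preg_match_all('/^\|\s*\d+>(.*)$/m', $source, $matches);
$lines = $matches[1];

// The last captured line is the "****END OF INPUT****" marker: drop it.
array_pop($lines);

// Each capture still starts with the one space that follows ">".
// Remove only that space so the indentation of "convergence tight" survives.
$lines = array_map(fn($line) => preg_replace('/^ /', '', $line), $lines);

// Drop the empty lines (from "|  2>", "|  6>" and "| 11>") and reindex.
$lines = array_values(array_filter($lines, fn($line) => trim($line) !== ''));

echo implode("\n", $lines), "\n";
```

If you want to keep the empty lines, leave out the array_filter() step.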

Check it in action: https://3v4l.org/hUEBk

The regex explained:

/             # regex delimiter
    ^         # match the beginning of the line
    \|        # match '|' (it needs to be escaped because it is a meta-character)
    \s        # match a whitespace character (space, tab, newline, ...)
    *         # the previous (a whitespace) can appear zero or more times
    \d        # match a digit (0..9)
    +         # the previous (a digit) can appear one or more times
    >         # match '>'
    (         # beginning of a capturing group
      .*      # match any character except a newline, any number of times
    )         # end of the capturing group
    $         # match the end of the line
/             # regex delimiter
m             # multiline modifier: ^ and $ match at the start and end of every line, not only of the whole string
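To see what the m modifier changes, here is a small sketch that runs the same pattern with and without it. $text is a hypothetical three-line sample taken from the question's input; its first line does not start with |:

```php
<?php
// Hypothetical sample: the first line does not start with "|".
$text = "NAME = CO-c0m1.txt\n|  8> C 0 0 0\n|  9> O 0 0 1";

// Without /m, ^ can only match at the very start of the string,
// which is "N", so the pattern never matches.
$without = preg_match_all('/^\|\s*\d+>(.*)$/', $text, $m1);

// With /m, ^ and $ also match right after and right before each "\n".
$with = preg_match_all('/^\|\s*\d+>(.*)$/m', $text, $m2);

var_dump($without); // int(0)
var_dump($with);    // int(2)
print_r($m2[1]);    // the captures still start with the space after ">"
```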

Read more about Perl-Compatible Regular Expressions in PHP.

You don't need to run the first regex on your text; running only this regex is enough:

preg_match_all("/[|]\s*\d*[>]\s(.+)/", $source[$i], $matches);
echo(implode("\n", $matches[1]));

This works fine in my tests.

You can get all of that data in one go with a single regex:

^\|\h+\d+>(?!\h*\Q****END OF INPUT****\E)\h\K.+

Breakdown:

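  • ^ Match the beginning of a line
  • \|\h+\d+> Match |, one or more horizontal whitespace characters, one or more digits and >
  • (?! Start of a negative lookahead
    • \h* Optional horizontal whitespace
    • \Q****END OF INPUT****\E Followed by the literal text ****END OF INPUT**** (\Q...\E turns off metacharacters such as *)
  • ) End of lookahead (the line is rejected if the lookahead matches)
  • \h\K Match one horizontal whitespace, then reset the start of the match so the prefix is not part of the result
  • .+ Match up to the end of the line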
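A quick sketch of what \K and the lookahead do, on a hypothetical three-line sample: a normal line, an empty numbered line and the END OF INPUT line. The second pattern is an equivalent variant (not from the answer) that uses a capturing group instead of \K:

```php
<?php
$pattern = '~^\|\h+\d+>(?!\h*\Q****END OF INPUT****\E)\h\K.+~mi';

// Hypothetical sample: normal line, empty numbered line, END OF INPUT line.
$text = "|  7> * xyz 0 1\n|  6>\n| 12>                          ****END OF INPUT****";

preg_match_all($pattern, $text, $m);
// Only one match, "* xyz 0 1": \K cuts off the "|  7> " prefix, the empty
// line has no \h followed by text, and the lookahead rejects the END line.
print_r($m[0]);

// Variant with a capturing group instead of \K: the whole match keeps the
// prefix, and the wanted text is in group 1.
$withGroup = '~^\|\h+\d+>(?!\h*\Q****END OF INPUT****\E)\h(.+)~mi';
preg_match_all($withGroup, $text, $m2);
print_r($m2[1]);
```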

PHP code:

preg_match_all("~^\|\h+\d+>(?!\h*\Q****END OF INPUT****\E)\h\K.+~mi", $str, $matches);


Output of print_r($matches[0]):

Array
(
    [0] => ! HF def2-TZVP opt numfreq
    [1] => % scf
    [2] =>      convergence tight
    [3] => end
    [4] => * xyz 0 1
    [5] => C 0 0 0
    [6] => O 0 0 1
    [7] => *
)

Then use implode(PHP_EOL, $matches[0]) to join the values together.
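Putting it together, a minimal end-to-end sketch: $str is a shortened copy of the question's input (blank separator lines omitted) and $output is an illustrative name. Because of \K, $matches[0] already contains only the text after "> ", so implode() is the only post-processing step:

```php
<?php
// Shortened copy of the input from the question (blank separator lines omitted).
$str = <<<'TXT'
================================================================================
                                       INPUT FILE
================================================================================
NAME = CO-c0m1.txt
|  1> ! HF def2-TZVP opt numfreq
|  2>
|  3> % scf
|  4>      convergence tight
|  5> end
|  6>
|  7> * xyz 0 1
|  8> C 0 0 0
|  9> O 0 0 1
| 10> *
| 11>
| 12>                          ****END OF INPUT****
================================================================================
TXT;

preg_match_all('~^\|\h+\d+>(?!\h*\Q****END OF INPUT****\E)\h\K.+~mi', $str, $matches);

// Join the matched lines back into a single block of text.
$output = implode(PHP_EOL, $matches[0]);
echo $output, PHP_EOL;
```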

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM