Is there a more efficient way to code this (no MySQL)?

Question

I want to take a few hundred lines from a text file containing 20,000 to 30,000 lines. After writing those lines to a smaller (extracted) file, I want to remove them from the larger (source) file. I have cobbled together some PHP code that does the job on my development files, but it looks like it will be very inefficient on the larger files.

This is what I put together:

<?php 
// WRITE SPECIFIED NUMBER OF LINES OF TEXT FROM SOURCE FILE TO EXTRACTED FILE

$file = fopen("extracted.txt", "w+");
flock($file, 2);

$file2read = fopen("source.txt","r+");
for ($i = 0; $i <= 9; $i++) {
    $keyphrase = fgets($file2read) ;
    fputs ($file, $keyphrase);
}

//     SOURCE LINES HAVE BEEN WRITTEN TO EXTRACTED FILE

fclose($file2read);
flock($file, 3);
fclose($file);

// NOW PROCESS SOURCE FILE TO STRIP OUT EXTRACTED FILES AND COMPACT SOURCE FILE FOR NEXT EXTRACT

for ($i = 0; $i <= 9; $i++) {
    $file2read = fopen("source.txt","r+");
    $keyphrase = fgets($file2read) ;

    $rows = file("source.txt");    

    foreach($rows as $key => $row) {
        if(preg_match("/($keyphrase)/", $row)) {
            unset($rows[$key]);
        }
    }

    file_put_contents("temporary1.txt", implode("\n", $rows));

    // STRIP OUT EMPTY LINES WHILE COPYING FROM MODIFIED SOURCE IMAGE (TEMPORARY1.TXT) TO TEMPORARY FILE

    file_put_contents('temporary.txt',
        preg_replace(
            '/\R+/',
            "\n",
            file_get_contents('temporary1.txt')
        )
    );

    // COPY TEMPORARY FILE TO PRODUCE UPDATED SOURCE FILE WITH EXTRACTED LINES REMOVED
    copy("temporary.txt","source.txt");

}  // CLOSE INITIAL 'FOR LOOP'

unlink ("temporary1.txt");  // CLEAR DISKSPACE OF TEMPORARY FILE 
unlink ("temporary.txt");  // CLEAR DISKSPACE OF TEMPORARY FILE

Copying the lines to the extracted file in the first block of code seems straightforward enough. But removing the lines and compacting the source file is ugly: it re-reads and rewrites the whole source file once per extracted line. Is there a better way, without using a MySQL database?

Answer 1

To expand on my comment: make this a program that filters lines from the input into two temporary files, then moves the temporary files into place.

I couldn't quite get a grip on the logic for whether a line is accepted or rejected, so it's replaced here with a silly check for whether the line starts with a lowercase vowel.

(Also, it's been a while since I've written PHP, so it might be buggy.)

<?php

$input = fopen("source.txt", "r");
$output_1 = fopen("temp1.txt", "w");
$output_2 = fopen("temp2.txt", "w");

// fgets() returns each line with its line ending still attached
while (($line = fgets($input)) !== false) {
    $accepted = preg_match("/^[aeiou]/", $line); // TODO: replace this logic
    fwrite(($accepted ? $output_1 : $output_2), $line); // no extra "\n" needed
}
fclose($input);
fclose($output_1);
fclose($output_2);

// Move the temporary files into place
rename("temp1.txt", "extracted.txt");
rename("temp2.txt", "source.txt");
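
The accept/reject check marked TODO above can be made pluggable. What follows is only a minimal sketch of that idea, not part of the original answer: the function name partition_lines, its parameters, and the two-argument callback are all assumptions made for illustration. The callback receives each line and its 1-based line number and decides which file the line goes to.

```php
<?php
// Hypothetical helper (not from the answer): stream $sourcePath once and send
// each line to one of two files, depending on what $accept returns.
// $accept receives the line (line ending included) and its 1-based number.
function partition_lines(
    string $sourcePath,
    string $acceptedPath,
    string $rejectedPath,
    callable $accept
): int {
    $in = fopen($sourcePath, 'r');
    $accepted = fopen($acceptedPath, 'w');
    $rejected = fopen($rejectedPath, 'w');
    if ($in === false || $accepted === false || $rejected === false) {
        throw new RuntimeException('Could not open one of the files');
    }

    $lineNumber = 0;
    $acceptedCount = 0;
    while (($line = fgets($in)) !== false) {
        $lineNumber++;
        if ($accept($line, $lineNumber)) {
            fwrite($accepted, $line);
            $acceptedCount++;
        } else {
            fwrite($rejected, $line);
        }
    }

    fclose($in);
    fclose($accepted);
    fclose($rejected);

    // Number of lines written to the accepted file
    return $acceptedCount;
}
```

For the rule in the question, a callback such as function ($line, $n) { return $n <= 10; } accepts the first ten lines. Renaming the two output files to extracted.txt and source.txt afterwards, as in the answer's code, finishes the job in a single pass over the source file.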

Answer 2

If I may make a suggestion, you're using the wrong tool for the job here. I'd suggest using something that is designed to work with lines of text, like awk. This simple program takes the same approach as AKX's answer: go through each line of the file, check a condition (here, the line number), and write to one file or the other. Since awk is small and fast, it should be more performant than PHP.

Run this from your shell:

awk '{sub(/\r/, "")} NR <= 10 {print > "extracted.txt"; next} {print > "newsource.txt"}' source.txt

This uses some shortcuts to minimize line length; here is the same program written out more verbosely:

awk '{
    # $0 is the contents of the current line, remove CR from it
    sub(/\r/, "", $0);
    # NR indicates the current line number
    if (NR <= 10) {
        print($0) > "extracted.txt";
        # jump to the next line in the file
        next;
    }
    print($0) > "newsource.txt";
}' source.txt

Answer 3

Adding this as another answer just for completeness, but again this is not a job for PHP.

If you're simply pulling the first x lines from a file, why are you running preg_match() on every line of the file x times? Setting aside the fact that loading the entire file into an array and looping over it is totally unnecessary, and that you don't need to search for text at all, regular expression functions are a very heavy tool. They should only be used when needed, and never in place of a simple str_contains() or str_replace(). If you're looking for particular line numbers, simply keep a counter of the line number and check its value before deciding what to do.

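<?php
$extracted = fopen("extracted.txt", "w");
$source = fopen("source.txt", "r");
$newsource = fopen("newsource.txt", "w");

$lines = 10; // number of lines to extract
$i = 0;      // number of lines read so far

// No length argument: a length limit would split long lines into several reads
while (($buffer = fgets($source)) !== false) {
    $buffer = str_replace("\r", "", $buffer); // normalize CRLF to LF
    // $i++ < $lines is true for exactly the first $lines lines
    if ($i++ < $lines) {
        fputs($extracted, $buffer);
        continue;
    }
    fputs($newsource, $buffer);
}
fclose($source);
fclose($extracted);
fclose($newsource);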
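
To illustrate the point about regular expressions: the question's preg_match("/($keyphrase)/", $row) drops a raw line, trailing newline included, into a pattern, so a line containing characters such as (, + or / no longer means what it says. Below is a minimal sketch of a literal comparison instead, assuming PHP 8 for str_contains(). The helper name line_contains and the sample strings are made up for illustration.

```php
<?php
// Hypothetical helper: literal substring test, no pattern syntax involved.
function line_contains(string $row, string $keyphrase): bool
{
    // fgets() keeps the line ending, so strip it before comparing.
    $needle = rtrim($keyphrase, "\r\n");

    // An empty needle would match every row, so treat it as "no match".
    return $needle !== '' && str_contains($row, $needle);
}

$keyphrase = "c++ (intro)\n";          // a line as fgets() would return it
$row = "learn c++ (intro) today\n";

var_dump(line_contains($row, $keyphrase)); // bool(true)

// If a regular expression is genuinely needed, escape the literal text first.
$pattern = '/' . preg_quote(rtrim($keyphrase, "\r\n"), '/') . '/';
var_dump(preg_match($pattern, $row) === 1); // bool(true)
```

Neither check is needed when the lines to remove are identified by position, which is why the counter in the loop above is enough for this question.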

3 answers

solution1
1 2021-05-17 19:08:03

solution2
1 2021-05-17 22:31:28

solution3
0 2021-05-18 15:28:28
