
Read numbers from file until line < n

As shocked as I am, I can't find this anywhere, and my bash skills are still sub-par.

I have a text file of prime numbers:

2\n
3\n
5\n
7\n
11\n
etc...

I want to pull all primes under 2^32 (4294967296), plus one additional prime number, and save them to their own text file, formatted the same way. My file has just over 1.3 billion lines so far, so stopping once the limit is reached would be ideal.

Update: Problem.

The bash script has been looping through these 11 numbers for quite some time without me noticing:

4232004449
4232004479
4232004493
4232004509
4232004527
4232004533
4232004559
4232004589
4232004593
4232004613
004437

What's even weirder is I grepped primes.txt (the original) and "^004437" was nowhere to be found. Is this some kind of limitation of bash?
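
One way to spot this kind of corruption early is to check that the output is strictly increasing, since a list of primes always should be. Below is a minimal Python sketch of that check; the function name `first_out_of_order` is hypothetical, and the sample data is just the eleven lines quoted above.

```python
def first_out_of_order(lines):
    """Return the 0-based index of the first value that is not larger
    than the value before it, or None if the sequence is strictly increasing."""
    previous = None
    for index, line in enumerate(lines):
        value = int(line)  # int() accepts leading zeros and a trailing newline
        if previous is not None and value <= previous:
            return index
        previous = value
    return None


# The eleven lines the script kept looping through.
looped = [
    "4232004449", "4232004479", "4232004493", "4232004509",
    "4232004527", "4232004533", "4232004559", "4232004589",
    "4232004593", "4232004613", "004437",
]
print(first_out_of_order(looped))
```

Passing an open file object instead of a list works the same way and reads one line at a time, so the check does not need the whole file in memory.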

Update: Solution

It appears to be some kind of limitation of something; I really don't know what. I'm re-choosing the Perl script as my answer because not only did it work, but it created the ~2GB file from scratch in ~80 seconds and included the additional prime. Go here for a solution to the bash error.

$  perl -lne 'print; last if $_ > 2**32' < myprimes.txt > myprimes2.txt

This gives you the input series of primes up to one prime past 2**32, then stops. It does not read the source file into memory.
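
The "print first, then test" order is what lets one prime past the limit through. A small Python sketch of the same logic follows, purely as an illustration; `take_through_limit` is a made-up name, and the limit of 6 stands in for 2**32 so the behaviour is visible on a short list.

```python
def take_through_limit(lines, limit):
    """Yield lines up to and including the first value greater than limit."""
    for line in lines:
        yield line              # emit first, like `print` in the one-liner
        if int(line) > limit:   # then test, like `last if $_ > 2**32`
            break


sample = ["2\n", "3\n", "5\n", "7\n", "11\n", "13\n"]
result = list(take_through_limit(sample, 6))
print(result)
```

Here 7 is the first value past the limit, so it is the last one emitted, and 11 and 13 are never looked at.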

In shell, without loading the whole 1.3 billion numbers into memory, you can use:

n=4294967296   # 2^32
last=0
while read -r number
do
    # Stop once the previously written number was already past the limit.
    if [ "$last" -gt "$n" ]
    then break
    fi
    echo "$number"
    last=$number
done < primes.txt > primes2.txt

You could lose the last variable too:

n=4294967296   # 2^32
while read -r number
do
    echo "$number"
    # Write first, then test, so one prime past the limit is kept.
    if [ "$number" -gt "$n" ]
    then break
    fi
done < primes.txt > primes2.txt

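This is straightforward in Bash: cat the file primes.txt to read it, go through each number, check that it is no larger than 2^32, and if so, append it to primes2.txt. Note that the command substitution reads the whole file before the loop starts, and the loop neither stops at the limit nor keeps the one additional prime.

The exact code is below.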

#!/bin/bash

n=4294967296 # 2^32

for i in $(cat primes.txt)
do
        if [ "$i" -le "$n" ]
        then
                echo "$i" >> primes2.txt
        fi
done

Or you can use this simple Python solution, which does not require loading the entire file into memory.

n = 2**32

with open('primes.txt') as primes, open('primes2.txt', 'w') as new_primes:
    for p in primes:  # iterates one line at a time
        new_primes.write(p)
        if int(p) > n:
            break  # keep one prime past the limit, then stop

I would recommend doing something like this in Perl:

EDIT: Hm, it was probably the array that used up all your RAM. This should be friendlier to your resources.

#!/usr/bin/env perl

use warnings;
use strict;

my $max_value   = 2 ** 32;
my $input_file  = 'primes.txt';
my $output_file = 'primes2.txt';

open( my $INPUT_FH, '<', $input_file )
    or die "could not open $input_file: $!";

open( my $OUTPUT_FH, '>', $output_file )
    or die "could not open $output_file: $!";

# Read one line at a time; foreach over <$INPUT_FH> would slurp the whole file.
while ( my $prime = <$INPUT_FH> ) {
  chomp($prime);
  print $OUTPUT_FH "$prime\n";
  last if $prime > $max_value;    # keep one prime past the limit, then stop
}

close($OUTPUT_FH) or die "could not close $output_file: $!";
close($INPUT_FH);
