How can I use grep to search for years from 1900 to 2100?

For example, if I have a variable containing 20123320, I want to print 2012.

Funny ways using bash (sh users beware!):

If you want to match and print all the years in that range that appear at the beginning of lines in a file named file:

printf "^%s\n" {1900..2100} | grep -of - file

If you have a variable named variable that contains 20123320:

variable=20123320
printf "^%s\n" {1900..2100} | grep -of - <(echo "$variable")
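
For readers on plain sh, where brace expansion and <(...) are not available, the same idea can be sketched with a loop. This is only an illustration under assumptions: the function names year_patterns and first_year are hypothetical, and grep -o is a GNU/BSD extension rather than POSIX. The sketch builds the same list of anchored patterns (^1900 through ^2100) and passes it to grep with -e, which accepts newline-separated patterns.

```shell
# Hypothetical helper: print one anchored pattern per year in [$1, $2].
year_patterns() {
    y=$1
    while [ "$y" -le "$2" ]; do
        printf '^%s\n' "$y"
        y=$((y + 1))
    done
}

# Hypothetical helper: print the leading year of $1 if it is in 1900..2100.
first_year() {
    # -e takes the newline-separated pattern list; -o prints only the match.
    printf '%s\n' "$1" | grep -o -e "$(year_patterns 1900 2100)"
}
```

Calling first_year "$variable" should then behave like the bash one-liner above.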

Now please detail a little bit more what you want to do exactly so that we can give you the most appropriate answer.

Edit. As I see other answers using tools other than grep, here's a 100% bash solution:

variable="20123320"
# take the first 4 characters of variable:
year="${variable:0:4}"
# check that year is an integer and that it falls into the given range
if [[ "$year" =~ ^[[:digit:]]+$ ]] && (( 1900<=year && year<=2100)); then
    echo "$year"
else
    # Do whatever you want here
    echo "You dumbo, I couldn't find a valid year in your string"
fi
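
With GNU awk (FIELDWIDTHS is a gawk extension), reading the value on standard input:

echo "$variable" | awk 'BEGIN{FIELDWIDTHS="4 "}{if($1~/^[0-9]+$/&&$1>=1900&&$1<=2100)print $1}'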
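
One caveat with the pure-bash check above: inside (( )), a value with a leading zero such as 0900 is read as octal, and bash prints a "value too great for base" error before falling through to the else branch. A possible way around that, sketched here as a function with the hypothetical name extract_year, is to force base 10 with the 10# prefix:

```shell
# Hypothetical wrapper around the bash check; requires bash, not plain sh.
extract_year() {
    local year=${1:0:4}
    # Require exactly four digits, then compare in base 10 so that
    # leading zeros (e.g. 0900) are not interpreted as octal.
    if [[ $year =~ ^[0-9]{4}$ ]] && (( 1900 <= 10#$year && 10#$year <= 2100 )); then
        printf '%s\n' "$year"
    else
        return 1
    fi
}
```

Used as year=$(extract_year "$variable"), it prints the year on success and returns a non-zero status otherwise, so the caller decides what to report.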

Try this:

echo "$var" | grep -Eo '\b(((19|20)[0-9][0-9])|2100)'

Or see my solution, since I think using regex here is not the best path.

grep is not the best tool for this; Perl is more suitable, easier, and more robust for testing numeric ranges:

echo "$var" | perl -lne '
    $year = substr($_, 0, 4);
    print $year if $year <= 2100 && $year >= 1900 && $year =~ /^\d+$/
'

or with awk, using the same logic:

echo "$var" | awk '
{
    # awk strings are 1-indexed; + 0 forces a numeric comparison
    year = substr($0, 1, 4)
    if (year ~ /^[0-9]+$/ && year + 0 <= 2100 && year + 0 >= 1900) {
        print year
    }
}'

If you insist on using grep for this, you can.

I'll assume that you want to match a variable that starts with 4 digits in the range 1900 to 2100, and you want to print just those 4 digits.

echo "$var" | grep -Eo '^(((19|20)[0-9][0-9])|2100)'

This ignores whatever may follow those first 4 digits (because I can't think of a way to check the rest of the string without printing it).
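
One possible workaround, offered only as a sketch, is to chain two grep calls: the first keeps the line only if the whole string has the expected shape, and the second prints the year. The function name year_if_eight_digits is hypothetical, and the "exactly 8 digits" shape is an assumption based on the 20123320 example.

```shell
# Hypothetical helper: print the leading year only if $1 is exactly 8 digits.
year_if_eight_digits() {
    printf '%s\n' "$1" |
        grep -E '^[0-9]{8}$' |                  # stage 1: validate the whole string
        grep -Eo '^((19|20)[0-9][0-9]|2100)'    # stage 2: print just the year
}
```

The first grep passes the full line through unchanged, so nothing beyond the year is printed at the end.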

But grep is not the obvious tool for this job, nor is a regular expression the best tool for matching a range of numbers. For example, if you needed to match numbers from 1950 to 2100, the regular expression would have to be substantially different.
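
To illustrate how much the pattern changes, here is a sketch of what a 1950 to 2100 version might look like. The range has to be split by hand into 1950-1999, 2000-2099, and 2100; the function name year_1950_2100 is hypothetical.

```shell
# Hypothetical helper: print the leading year of $1 if it is in 1950..2100.
year_1950_2100() {
    # 19[5-9][0-9] covers 1950-1999, 20[0-9][0-9] covers 2000-2099.
    printf '%s\n' "$1" | grep -Eo '^(19[5-9][0-9]|20[0-9][0-9]|2100)'
}
```

Every change to the bounds means re-deriving the alternation, which is why a numeric comparison tends to be easier to maintain.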

Personally, I'd use Perl:

echo "$var" | perl -ne 'if (/^(\d{4})\d{4}$/ and $1 >= 1900 and $1 <= 2100) { print "$1\n" }'

This checks that $var contains exactly 8 decimal digits. If you want to check that they make up a valid date, you'll need some more code.

You could also do it fairly cleanly in awk, which might be a bit faster.
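
A sketch of what the awk version might look like, applying the same rule as the Perl one-liner (exactly 8 digits, first 4 in range). The function name year_awk is hypothetical, and the length test is used instead of an interval expression such as {8} because not every awk implementation supports intervals.

```shell
# Hypothetical helper: awk counterpart of the Perl check.
year_awk() {
    printf '%s\n' "$1" | awk '
        length($0) == 8 && $0 ~ /^[0-9]+$/ {
            year = substr($0, 1, 4) + 0   # + 0 forces a numeric value
            if (year >= 1900 && year <= 2100) print year
        }'
}
```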


 