Fit for print: How to cut strings with codepoints that take more than one print column?

Question

Is there a shorter way to cut this Chinese text so that it fits in a print column that is $print_length columns wide?

#!/usr/bin/env perl
use warnings;
use 5.10.1;
use utf8;
binmode STDOUT, ':utf8';
use Unicode::GCString;

my $print_length = 15;

my $string1 = 'abcdefghijklmnopqrstuvwxyz';
say substr( $string1, 0, $print_length );

my $string2 = '大佛頂如來密因修證了義諸菩薩萬行首楞嚴經'; # don't know what that means
say fit_for_column( $string2 );

sub fit_for_column {
    my ( $string ) = @_;

    my $gcs = Unicode::GCString->new( $string ); 
    my $pcw = $gcs->columns();

    while ( $pcw > $print_length ) {
        $string =~ s/\X\z//;
        $gcs = Unicode::GCString->new( $string );
        $pcw = $gcs->columns();
    }
    return $string;
}

Answer 1

For small texts like this I don't think you can do much else. For longer texts, though, you might want to look into East Asian Width (Unicode Standard Annex #11) and use that (maybe compressed into blocks of codepoints instead of one by one) as a reference for the width of the characters in your text. You could have a function that takes a string and returns its width (assuming, say, narrow and halfwidth characters = 1 column, W and F = 2, and so on), or one that returns the part of the text that fits within a set width.
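A minimal sketch of that idea, using only core Perl and its built-in East_Asian_Width property instead of Unicode::GCString. The width table is an assumption (Wide and Fullwidth = 2 columns, nonspacing and enclosing marks = 0, everything else = 1) and real terminals may disagree for some characters. The helper names char_width and string_width are made up for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
use utf8;
binmode STDOUT, ':utf8';

# Assumed width table: combining marks take 0 columns, Wide (W) and
# Fullwidth (F) characters take 2, everything else takes 1.
sub char_width {
    my ( $char ) = @_;
    return 0 if $char =~ /\A[\p{Mn}\p{Me}]\z/;
    return 2 if $char =~ /\A[\p{East_Asian_Width=W}\p{East_Asian_Width=F}]\z/;
    return 1;
}

# Width of a whole string in print columns, under the table above.
sub string_width {
    my ( $string ) = @_;
    my $width = 0;
    $width += char_width( $_ ) for split //, $string;
    return $width;
}

# Longest prefix of $string that fits into $max_columns. The string is
# walked once, one grapheme cluster (\X) at a time, so no cluster is split.
sub fit_for_column {
    my ( $string, $max_columns ) = @_;
    my ( $width, $fitted ) = ( 0, '' );
    for my $cluster ( $string =~ /\X/g ) {
        my $cluster_width = string_width( $cluster );
        last if $width + $cluster_width > $max_columns;
        $width  += $cluster_width;
        $fitted .= $cluster;
    }
    return $fitted;
}

say fit_for_column( 'abcdefghijklmnopqrstuvwxyz', 15 );    # abcdefghijklmno
say fit_for_column( '大佛頂如來密因修證了義諸菩薩萬行首楞嚴經', 15 );    # 大佛頂如來密因
```

Compared with chopping one cluster off the end and re-measuring the whole string each time, this walks the string once from the front and stops as soon as the next cluster would not fit. With 15 columns the Chinese text is cut after 7 characters (14 columns), because an eighth would need 16.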

solution1
1 ACCEPTED 2012-07-20 13:40:01
