
SQL - Get maximum chars and no words are cut off

I have a description column whose values vary in length.

Example with one value from description:

description
today is saturday

I now need to limit the description to a maximum of 11 characters without cutting off any word. The first 11 characters of that string are 'today is sa'. The result should be 'today is', since 'saturday' would be cut off. I can solve this with the query below:

SELECT 
  LEFT(description, 11 - POSITION(' ' IN REVERSE(LEFT(description, 11)))) AS desc
FROM test

However, if I change the limit to 8 characters ('today is'), the query no longer works: the result is 'today' when it should be 'today is', since 'is' is not cut off.

Any advice on how to update the query so that it handles all cases? Thanks in advance!

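Your approach doesn't work in general, because it drops the last word of the truncated string even when that word fits completely: with a limit of 8 you get 'today', and with a limit of 17 (the full length of the string) you still get 'today is'. This is because you always search for the last space within the first n characters and cut there, whether or not a word was actually split: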

demo for your approach: db<>fiddle
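
The behaviour can be checked side by side. The sketch below (PostgreSQL) runs the question's expression next to one possible patch, which looks one character past the limit on a copy of the string with a space appended. The inline row t, the column aliases original and patched, and the chosen limits are illustrative assumptions, not part of the question's schema. NULLIF makes the patched version return NULL when not even the first word fits.

```sql
-- Sketch only: the inline VALUES row and the list of limits are illustrative.
WITH t(description) AS (VALUES ('today is saturday'))
SELECT
    n,
    -- expression from the question: always cuts at the last space it finds
    LEFT(description,
         n - POSITION(' ' IN REVERSE(LEFT(description, n)))) AS original,
    -- possible patch: append a space and inspect n + 1 characters, so a word
    -- that ends exactly at the limit is followed by a space and is kept
    RTRIM(LEFT(description,
               n + 1 - NULLIF(POSITION(' ' IN REVERSE(LEFT(description || ' ', n + 1))), 0)
    )) AS patched
FROM t, unnest(ARRAY[4, 8, 11, 17, 20]) AS n;

-- n =  4 : original 'toda',        patched NULL
-- n =  8 : original 'today',       patched 'today is'
-- n = 11 : original 'today is',    patched 'today is'
-- n = 17 : original 'today is',    patched 'today is saturday'
-- n = 20 : original 'today is sa', patched 'today is saturday'
```

Note that the original expression can even split a word (limits 4 and 20 above) when the limit is shorter than the first word or longer than the string.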


Another possible approach:

demo:db<>fiddle

SELECT
    tr,
    STRING_AGG(word, ' ' ORDER BY index) FILTER (WHERE sum + index - 1 <= tr)           -- 4
FROM (
    SELECT
        *,
        SUM(length(word)) OVER (PARTITION BY tr ORDER BY index)                         -- 2
    FROM test,
        regexp_split_to_table(description, '\s') WITH ORDINALITY as words(word, index)  -- 1
) s
GROUP BY tr                                                                             -- 3

  1. Split your phrases into words and put each word into its own record. WITH ORDINALITY adds an ordered index to the words, so we can use it later for ordering and for reaggregating in the correct order.
  2. The cumulative SUM window function sums up the word lengths, record by record, within your description. So, for today is saturday it returns (5, 7, 15).
  3. With this additional information we can reaggregate the descriptions by grouping on the original phrases (here I am using my tr column, which also serves as the max length test value, but you should use a real id column).
  4. STRING_AGG reaggregates the words into a string, using a space as the delimiter. The FILTER clause keeps only the records we want: those where the cumulative sum is less than or equal to the required max length (the tr value in my case). We also have to take the original spaces into account, which is what the generated index is good for as well: between n words there are n - 1 spaces, so we add index - 1 to the cumulative sum before comparing.
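
The four steps can be tried without an existing table. The following is a minimal sketch (PostgreSQL), assuming a hypothetical id column and a separate max_len column in place of tr; the names id, max_len, idx, cum_len and shortened are illustrative and do not come from the original schema.

```sql
-- Sketch only: test is defined inline; id and max_len are assumed columns.
WITH test(id, description, max_len) AS (
    VALUES (1, 'today is saturday', 11),
           (2, 'today is saturday', 8),
           (3, 'today is saturday', 4)
)
SELECT
    id,
    max_len,
    -- step 4: keep only the words whose running length, plus one space
    -- per preceding word, still fits into max_len
    STRING_AGG(word, ' ' ORDER BY idx)
        FILTER (WHERE cum_len + idx - 1 <= max_len) AS shortened
FROM (
    SELECT
        t.id,
        t.max_len,
        w.word,
        w.idx,
        -- step 2: running total of word lengths: 5, 7, 15
        SUM(length(w.word)) OVER (PARTITION BY t.id ORDER BY w.idx) AS cum_len
    FROM test AS t,
         -- step 1: one record per word, numbered in order
         regexp_split_to_table(t.description, '\s') WITH ORDINALITY AS w(word, idx)
) s
GROUP BY id, max_len   -- step 3: back to one record per description
ORDER BY id;

-- id 1 (max_len 11): 'today is'
-- id 2 (max_len  8): 'today is'
-- id 3 (max_len  4): NULL, because no word passes the filter
```

Grouping by id rather than by the limit keeps two rows apart even when they share the same text or the same limit.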

I would use a case expression that combines left() with regular-expression matching and replacement:

with inparms as (
  select 'today is saturday' as words
)
select len, 
       case 
         when length(words) <= len then words  -- equal or under the limit
         when left(words, len + 1) ~ '\s+$' then left(words, len)  -- word ends on limit 
         when left(words, len) !~ '\s' then null  -- first word longer than limit
         else regexp_replace(left(words, len), '^(.*)\s.*', '\1') -- get complete words up to limit
       end
  from inparms
       cross join generate_series(4, 18) as gs(len);

┌─────┬───────────────────┐
│ len │  regexp_replace   │
├─────┼───────────────────┤
│   4 │                   │
│   5 │ today             │
│   6 │ today             │
│   7 │ today             │
│   8 │ today is          │
│   9 │ today is          │
│  10 │ today is          │
│  11 │ today is          │
│  12 │ today is          │
│  13 │ today is          │
│  14 │ today is          │
│  15 │ today is          │
│  16 │ today is          │
│  17 │ today is saturday │
│  18 │ today is saturday │
└─────┴───────────────────┘
(15 rows)
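
To apply the same case expression to a table column, it could be wrapped in a SQL function. This is only a sketch (PostgreSQL 9.2 or later for named parameters): the function name truncate_words and its parameter names are hypothetical, and the body is the case expression from above.

```sql
-- Hypothetical helper: the name truncate_words and its parameters are assumptions.
CREATE OR REPLACE FUNCTION truncate_words(words text, len integer)
RETURNS text
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT CASE
        WHEN length(words) <= len THEN words                      -- equal or under the limit
        WHEN left(words, len + 1) ~ '\s+$' THEN left(words, len)  -- word ends on the limit
        WHEN left(words, len) !~ '\s' THEN NULL                   -- first word longer than limit
        ELSE regexp_replace(left(words, len), '^(.*)\s.*', '\1')  -- complete words up to limit
    END
$$;

SELECT truncate_words('today is saturday', 11);  -- 'today is'
SELECT truncate_words('today is saturday', 8);   -- 'today is'
SELECT truncate_words('today is saturday', 4);   -- NULL
```

With the question's table this would be called as SELECT truncate_words(description, 11) FROM test.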

Using a regular expression:

with t(x) as (values('today is saturday'))
select
    n,
    (regexp_match(x, '(^.{0,'||n||'})(\W|$)'))[1]
from t, generate_series(0,20) as n;
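
┌────┬───────────────────┐
│ n  │   regexp_match    │
├────┼───────────────────┤
│  0 │                   │
│  1 │                   │
│  2 │                   │
│  3 │                   │
│  4 │                   │
│  5 │ today             │
│  6 │ today             │
│  7 │ today             │
│  8 │ today is          │
│  9 │ today is          │
│ 10 │ today is          │
│ 11 │ today is          │
│ 12 │ today is          │
│ 13 │ today is          │
│ 14 │ today is          │
│ 15 │ today is          │
│ 16 │ today is          │
│ 17 │ today is saturday │
│ 18 │ today is saturday │
│ 19 │ today is saturday │
│ 20 │ today is saturday │
└────┴───────────────────┘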

demo
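
One point to be aware of with this pattern: \W matches any non-word character, not only whitespace, so punctuation inside a word also counts as a place to cut. The sketch below (PostgreSQL 10 or later for regexp_match) compares \W with \s on a made-up sample string; the sample text and the column aliases are illustrative assumptions.

```sql
-- Sketch only: the sample string and the aliases are illustrative.
WITH t(x) AS (VALUES ('don''t stop now'))
SELECT
    n,
    (regexp_match(x, '(^.{0,' || n || '})(\W|$)'))[1] AS split_on_non_word,
    (regexp_match(x, '(^.{0,' || n || '})(\s|$)'))[1] AS split_on_whitespace
FROM t, unnest(ARRAY[4, 5, 10]) AS n;

-- n =  4 : split_on_non_word 'don',        split_on_whitespace NULL
-- n =  5 : split_on_non_word 'don't',      split_on_whitespace 'don't'
-- n = 10 : split_on_non_word 'don't stop', split_on_whitespace 'don't stop'
```

If words are defined strictly as whitespace-separated tokens, the \s variant is probably the closer fit; for text without punctuation inside words the two behave the same.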
