I have a description column whose rows have different lengths.
Example with one value from description:
description |
---|
today is saturday |
Now I need to limit the description to a maximum of 11 characters without cutting off any word. The first 11 characters of that string are 'today is sa'. The result should be 'today is', since 'saturday' would be cut off. I can solve this with the query below:
SELECT
    LEFT(description, 11 - POSITION(' ' IN REVERSE(LEFT(description, 11)))) AS desc
FROM test
However, if I change the limit to 8 characters ('today is'), the query no longer works: the result is 'today' when it should be 'today is', since 'is' is not a cut-off word.
Any advice on how to update the query so that it handles every case? Thanks in advance!
Your approach doesn't work in general, because it always cuts off the last word, even if the limit is large enough. This is because you always search for the last space and cut to the left of it, even for strings that are already short enough:
demo for your approach: db<>fiddle
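One way to repair the original query, offered as a minimal sketch rather than a definitive fix, is to inspect len + 1 characters instead of len: if the character right after the limit is a space, the last word is complete and survives the cut. The inline test CTE, the len column produced by generate_series and the short_description alias are assumptions for illustration, and the sketch assumes words are separated by single spaces.

```sql
-- Sketch: look one character beyond the limit, so a space directly
-- after the limit marks the last word as complete.
WITH test(description) AS (
    VALUES ('today is saturday')
)
SELECT
    len,
    CASE
        -- the whole string fits, nothing to cut
        WHEN length(description) <= len THEN description
        ELSE rtrim(LEFT(
            description,
            -- POSITION returns 0 when there is no space; NULLIF turns that
            -- into NULL, so the result is NULL when even the first word is too long
            len + 1 - NULLIF(POSITION(' ' IN REVERSE(LEFT(description, len + 1))), 0)
        ))
    END AS short_description
FROM test
CROSS JOIN generate_series(4, 17) AS gs(len);
```

For this sample value, that should give NULL for a limit of 4, 'today' for 5 to 7, 'today is' for 8 to 16 and the full string for 17.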
Another possible approach:
SELECT
    tr,
    STRING_AGG(word, ' ') FILTER (WHERE sum + index - 1 <= tr)      -- 4
FROM (
    SELECT
        *,
        SUM(length(word)) OVER (PARTITION BY tr ORDER BY index)     -- 2
    FROM test,
        regexp_split_to_table(description, '\s') WITH ORDINALITY as words(word, index) -- 1
) s
GROUP BY tr                                                         -- 3
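To see what the numbered steps in the query operate on, the inner query can be run on its own. The following is a sketch under assumptions: the inline test CTE with a single row (tr = 11) stands in for the real table, and the running_length and length_with_spaces aliases are hypothetical names added for readability.

```sql
-- Sketch: intermediate rows before GROUP BY / STRING_AGG are applied.
WITH test(tr, description) AS (
    VALUES (11, 'today is saturday')
)
SELECT
    tr,
    index,
    word,
    -- cumulative length of the words only
    SUM(length(word)) OVER (PARTITION BY tr ORDER BY index) AS running_length,
    -- cumulative length including the index - 1 spaces between the words
    SUM(length(word)) OVER (PARTITION BY tr ORDER BY index) + index - 1 AS length_with_spaces
FROM test,
    regexp_split_to_table(description, '\s') WITH ORDINALITY AS words(word, index);
```

For this row, length_with_spaces comes out as 5, 8 and 17 for 'today', 'is' and 'saturday', so with tr = 11 only the first two words pass the FILTER condition.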
1. WITH ORDINALITY adds an ordered index to your words, so we can use it later for ordering and reaggregating in the correct order.
2. The SUM window function sums up the word lengths per record within your description. So, for 'today is saturday' it returns (5, 7, 15).
3. Group by the record (here the tr column, which also serves as max length test value, but you should consider a real id column).
4. STRING_AGG reaggregates the words into a string, taking the space as delimiter. The FILTER clause is for filtering the correct aggregate records: we are only interested in those where the cumulative sum is less than or equal to the max required length (the tr value in my case). We have to take the original spaces into account. This is what the generated index is good for, as well: we know that between n records there are n - 1 spaces. So we add index - 1 to the total sum for comparison.

I would use a case expression that combines left() and regexp comparators and replacements:
with inparms as (
  select 'today is saturday' as words
)
select len,
       case
         when length(words) <= len then words                       -- equal or under the limit
         when left(words, len + 1) ~ '\s+$' then left(words, len)   -- word ends on limit
         when left(words, len) !~ '\s' then null                    -- first word longer than limit
         else regexp_replace(left(words, len), '^(.*)\s.*', '\1')   -- get complete words up to limit
       end
  from inparms
       cross join generate_series(4, 18) as gs(len);
┌─────┬───────────────────┐
│ len │ regexp_replace │
├─────┼───────────────────┤
│ 4 │ │
│ 5 │ today │
│ 6 │ today │
│ 7 │ today │
│ 8 │ today is │
│ 9 │ today is │
│ 10 │ today is │
│ 11 │ today is │
│ 12 │ today is │
│ 13 │ today is │
│ 14 │ today is │
│ 15 │ today is │
│ 16 │ today is │
│ 17 │ today is saturday │
│ 18 │ today is saturday │
└─────┴───────────────────┘
(15 rows)
Using regular expression:
with t(x) as (values('today is saturday'))
select
  n,
  (regexp_match(x, '(^.{0,'||n||'})(\W|$)'))[1]
from t, generate_series(0,20) as n;
n | regexp_match |
---|---|
0 | |
1 | |
2 | |
3 | |
4 | |
5 | today |
6 | today |
7 | today |
8 | today is |
9 | today is |
10 | today is |
11 | today is |
12 | today is |
13 | today is |
14 | today is |
15 | today is |
16 | today is |
17 | today is saturday |
18 | today is saturday |
19 | today is saturday |
20 | today is saturday |