Oracle: Coercing VARCHAR2 and CLOB to the same type without truncation

In an app that supports MS SQL Server, MySQL, and Oracle, there's a table with the following relevant columns (types shown here are for Oracle):

ShortText VARCHAR2(1700) indexed
LongText CLOB

The app stores values 850 characters or less in ShortText, and longer ones in LongText. I need to create a view that returns that data, whichever column it's in. This works for SQL Server and MySQL:

SELECT
  CASE
    WHEN ShortText IS NOT NULL THEN ShortText
    ELSE LongText
  END AS TheValue
FROM MyTable

However, on Oracle, it generates this error:

ORA-00932: inconsistent datatypes: expected CHAR got CLOB 

...meaning that Oracle won't implicitly convert the two columns to the same type, so the query has to do it explicitly. I don't want the data to get truncated, so the target type has to be able to hold as much data as a CLOB, which, as I understand it (I'm not an Oracle expert), means CLOB is the only choice.

This works on Oracle:

SELECT
  CASE
    WHEN ShortText IS NOT NULL THEN TO_CLOB(ShortText)
    ELSE LongText
  END AS TheValue
FROM MyTable

However, performance is amazingly awful. A query that returns LongText directly took 70-80 ms for about 9k rows, but the above construct took between 30 and 60 seconds, which is unacceptable.

So:

  1. Are there any other Oracle types I could coerce both columns to that can hold as much data as a CLOB? Ideally something more text-oriented, like MySQL's LONGTEXT, or SQL Server's NTEXT (or even better, NVARCHAR(MAX))?
  2. Any other approaches I should be looking at?

Some specifics, in particular ones requested by @Guido Leenders:

Oracle version: Oracle Database 11g 11.2.0.1.0 64bit Production
Not certain if I was the only user, but the relative times are still striking.

Stats for the small table where I saw the performance I posted earlier:
  rowcount: 9,237
  varchar column total length: 148,516
  clob column total length: 227,020

TO_CLOB is fairly expensive, so try to avoid it where you can. Still, I think it should perform reasonably well for 9K rows. The following test case is based on one of the applications we develop, which has a similar data model:

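create table bubs_projecten_sample
( id number
, toelichting varchar2(1700)
, toelichting_l clob
);

-- As written, even ids get both a short and a long value and odd ids get
-- neither. rpad runs inside a SQL statement here, so its result is a
-- VARCHAR2 capped at 4000 bytes.
begin
  for i in 1..10000
  loop
    insert into bubs_projecten_sample
    ( id
    , toelichting
    , toelichting_l
    )
    values
    ( i
    , case when mod(i, 2) = 0 then 'short' else null end
    , case when mod(i, 2) = 0 then rpad('long', i, '*') else null end
    )
    ;
  end loop;
  commit;
end;
/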

Now make sure everything is in the cache and the dirty blocks have been written out:

select *
from   bubs_projecten_sample;

Test performance:

create table bubs_projecten_flat
as
select id
,      to_clob(toelichting) toelichting_any
from   bubs_projecten_sample
where  toelichting is not null
union all
select id
,      toelichting_l
from   bubs_projecten_sample
where  toelichting_l is not null;

The create table takes less than 1 second on a normal entry-level server, including writing out the data, with 17K consistent gets and 4K physical reads. The data stored on disk (note the rpad) is 25K for toelichting and 16M for toelichting_l.
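The same UNION ALL pattern could be applied to the view from the question. This is only a sketch: MyTableValues is a made-up view name, and whether it is actually faster than the CASE expression on the asker's data is something to measure rather than assume.

```sql
-- Hypothetical view name; MyTable, ShortText and LongText come from the question.
create or replace view MyTableValues as
select to_clob(ShortText) as TheValue
from   MyTable
where  ShortText is not null
union all
-- Mirrors the ELSE branch of the original CASE: every row without a
-- ShortText value returns LongText, including rows where both are NULL.
select LongText
from   MyTable
where  ShortText is null;
```

Each row lands in exactly one branch, so the view returns the same rows as the CASE version, although not necessarily in the same order.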

Can you further elaborate on the problem?

Please check that large CLOBs are not stored inline. Normally, large CLOBs are stored in a separate, system-maintained LOB segment. Storing large CLOBs inside the table itself can make a full table scan expensive.
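One way to check how the CLOB column is stored is to query the USER_LOBS dictionary view. This is a sketch that assumes the table was created without a quoted name, so the dictionary holds it in upper case as MYTABLE.

```sql
-- IN_ROW = 'YES' means the column was created with ENABLE STORAGE IN ROW,
-- so sufficiently small values are kept in the table row itself.
select table_name
,      column_name
,      segment_name
,      in_row
,      chunk
,      cache
from   user_lobs
where  table_name = 'MYTABLE';
```

The CACHE column is also worth a look, since it shows whether LOB reads go through the buffer cache.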

Also, I can imagine always populating both columns. You would still have the benefit of the index on the first so many characters. You just need an indicator column in the table that records whether the CLOB or the ShortText column is leading.
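A minimal sketch of that layout could look like the following. All names here (MyTableBoth, IsLong, the index and the view) are hypothetical and not part of the asker's schema; the idea is that LongText always holds the complete value, so the view never needs TO_CLOB.

```sql
create table MyTableBoth
( Id        number primary key
, ShortText varchar2(850 char)             -- first 850 characters, indexed
, LongText  clob                           -- always the complete value
, IsLong    char(1) default 'N' not null
            check (IsLong in ('Y', 'N'))   -- 'Y' when LongText is leading
);

create index MyTableBoth_ShortText_ix on MyTableBoth (ShortText);

-- Example row for a value that fits in ShortText.
insert into MyTableBoth (Id, ShortText, LongText, IsLong)
values (1, 'a short value', to_clob('a short value'), 'N');

-- The view reads a single column, so no type coercion is needed.
create or replace view MyTableBothValues as
select Id
,      LongText as TheValue
from   MyTableBoth;
```

The trade-off is that short values are stored twice, and the application has to keep the two columns and the indicator consistent on every write.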

As a side note, I see a difference between 850 and 1700. I would recommend making them equal, but remember to check that you are creating the table using character semantics. That can be done at the statement level by using "varchar2(850 char)". Please note that Oracle will actually create a column that fits 850 * 4 bytes (in AL32UTF8 at least, where the "32" stands for "at most 4 bytes per character"). Good luck!
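To see what Oracle actually creates, the column can be declared with character semantics and then inspected in the data dictionary. This is a sketch with a hypothetical table name, CharSemanticsDemo.

```sql
-- Which character set does the database use?
select value
from   nls_database_parameters
where  parameter = 'NLS_CHARACTERSET';

create table CharSemanticsDemo
( ShortText varchar2(850 char)
);

-- CHAR_USED is 'C' for character semantics and 'B' for byte semantics;
-- DATA_LENGTH is the column size in bytes.
select column_name
,      char_length
,      char_used
,      data_length
from   user_tab_columns
where  table_name = 'CHARSEMANTICSDEMO';
```

In an AL32UTF8 database, following the reasoning above, DATA_LENGTH should come out as 850 * 4 = 3400 while CHAR_LENGTH stays at 850.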
