
Comma-delimited fields in a CSV file in PL/SQL

I have

   WHILE INSTR (l_buffer, ',', 1, l_col_no) != 0

which checks whether l_buffer contains an l_col_no-th comma and, if it does, enters the loop.

Now I have a file with values

CandidateNumber,rnumber,title,OrganizationCode,OrganizationName,JobCode,JobName
10223,1600003B,Admin Officer,00000004,"Org Land, Inc.",ORGA03,ORGA03 HR & Admin

With this file, the loop treats "Org Land, Inc." as two fields because of the comma inside it. Is there a way to treat it as a single field, using INSTR or anything else?
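To make the symptom concrete, here is a minimal sketch of such a loop. Only the WHILE condition comes from the code above; the loop body, the variables l_from and l_to, and the shortened sample line are assumptions added for illustration.

```sql
DECLARE
   -- Shortened copy of the data line from the file (assumption: one line per buffer).
   l_buffer VARCHAR2(4000) :=
      '10223,1600003B,Admin Officer,00000004,"Org Land, Inc.",ORGA03';
   l_col_no PLS_INTEGER := 1;   -- which comma we are looking for
   l_from   PLS_INTEGER := 1;   -- start of the current field (hypothetical helper)
   l_to     PLS_INTEGER;        -- position of the current comma (hypothetical helper)
BEGIN
   WHILE INSTR (l_buffer, ',', 1, l_col_no) != 0
   LOOP
      l_to := INSTR (l_buffer, ',', 1, l_col_no);
      -- Everything between the previous comma and this one is printed as a field,
      -- whether or not the comma sits inside double quotes.
      DBMS_OUTPUT.put_line (l_col_no || ': ' || SUBSTR (l_buffer, l_from, l_to - l_from));
      l_from   := l_to + 1;
      l_col_no := l_col_no + 1;
   END LOOP;

   -- The last field has no trailing comma.
   DBMS_OUTPUT.put_line (l_col_no || ': ' || SUBSTR (l_buffer, l_from));
END;
/
```

With server output enabled, this should list seven fields instead of six, because INSTR counts the comma inside the quotes like any other: "Org Land ends up in field 5 and Inc." in field 6.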

Horrible idea. If you are forced to use character-delimited strings, the least you should be able to require is that the delimiter be a character that is all but guaranteed not to appear in regular field values.

The problem you raised can be solved. I show a solution below. It is probably not close to the most efficient, but at least the logic shouldn't be difficult to follow. I assumed that any comma between a pair of double-quotes (an opening one and a closing one) should become "invisible": treated not as a delimiter but as part of the field value. That breaks if a double-quote is used in a way different from the "usual"; I intentionally included a sample string (#5) to demonstrate how it can fail. It will also break on any other "natural" use of a comma that is not meant as a delimiter. For example, what if you have a field with a value of $1,000.00? Now you need to "escape" that comma too. One could probably come up with at least ten more similar situations. Are you going to code around all of them?

Now, for my own learning and practice, I pretended the ONLY way a comma may need to be "escaped" (to become invisible to the tokenization process) is if it is enclosed between an opening and a closing double-quote (determined simply by ordering: a double-quote with an odd count from the beginning of the string is an opening one, and a double-quote with an even count is a closing one). Here is the solution; test strings at the top, including a few to test proper treatment of nulls, and the output following immediately after.

Good luck!

with test_strings (r, s) as (
        select 1, 'abdc, ronfn 0003, "ABC, Inc.", 9939' from dual union all
        select 2, 'New Delhi'                           from dual union all
        select 3, null                                  from dual union all
        select 4, ','                                   from dual union all
        select 5, 'If needed, use double quote("), OK?' from dual
     ),
     -- t: wrap each string in commas so every token sits between two delimiters
     t (r, s) as (
        select r, ',' || s || ',' from test_strings
     ),
     -- ct: number of commas (nc) and double-quotes (nq) in each wrapped string
     ct (r, nc, nq) as (
        select r, regexp_count(s, ','), regexp_count(s, '"') from t
     ),
     -- c: position of every comma, one row per comma
     c (r, pos) as (
        select t.r, instr(t.s, ',', 1, level) from t join ct on t.r = ct.r
        connect by level <= ct.nc and t.r = prior t.r and prior sys_guid() is not null
     ),
     -- q: position of every double-quote (a single row with pos = 0 if there are none)
     q (r, pos) as (
        select t.r, instr(t.s, '"', 1, level) from t join ct on t.r = ct.r
        connect by level <= ct.nq and t.r = prior t.r and prior sys_guid() is not null
     ),
     -- p: keep only commas preceded by an even number of double-quotes (outside quotes)
     --    and pair each one with the next such comma
     p (r, pos_from, pos_to, rn) as (
        select r, pos, lead(pos) over (partition by r order by pos),
               row_number() over (partition by r order by pos) from c
           where mod((select count(1) from q where q.r = c.r and q.pos != 0
                                                             and q.pos < c.pos), 2) = 0
     )
-- each token is the text between two consecutive delimiter commas
select p.r as string_number, p.rn as token_number,
       substr(t.s, p.pos_from + 1, p.pos_to - p.pos_from - 1) as token
from t join p on t.r = p.r
where p.pos_to is not null
order by string_number, token_number
;

Results:

STRING_NUMBER TOKEN_NUMBER TOKEN
------------- ------------ --------------------
            1            1 abdc
            1            2  ronfn 0003
            1            3  "ABC, Inc."
            1            4  9939
            2            1 New Delhi
            3            1
            4            1
            4            2
            5            1 If needed

9 rows selected.
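
The same odd/even double-quote rule can also be applied one line at a time inside a PL/SQL loop, which is closer to the WHILE loop in the question. The following is only a sketch under the same assumption (a comma is a delimiter unless an odd number of double-quotes precedes it); the variable names and the shortened sample line are made up for illustration, and it inherits the same failure modes as the query.

```sql
DECLARE
   -- Shortened copy of the data line from the question.
   l_buffer    VARCHAR2(4000) :=
      '10223,1600003B,Admin Officer,00000004,"Org Land, Inc.",ORGA03';
   l_token     VARCHAR2(4000);          -- field being accumulated
   l_ch        VARCHAR2(1 CHAR);        -- current character
   l_in_quotes BOOLEAN := FALSE;        -- TRUE after an odd number of double-quotes
   l_col_no    PLS_INTEGER := 1;        -- field counter
BEGIN
   FOR i IN 1 .. NVL (LENGTH (l_buffer), 0)
   LOOP
      l_ch := SUBSTR (l_buffer, i, 1);

      IF l_ch = '"' THEN
         -- An odd-numbered quote opens, an even-numbered quote closes.
         l_in_quotes := NOT l_in_quotes;
         l_token := l_token || l_ch;    -- quotes stay in the token, as in the query output
      ELSIF l_ch = ',' AND NOT l_in_quotes THEN
         -- A comma outside quotes ends the current field.
         DBMS_OUTPUT.put_line (l_col_no || ': ' || l_token);
         l_col_no := l_col_no + 1;
         l_token  := NULL;
      ELSE
         -- Any other character, including a comma inside quotes, belongs to the field.
         l_token := l_token || l_ch;
      END IF;
   END LOOP;

   -- The last field has no trailing comma.
   DBMS_OUTPUT.put_line (l_col_no || ': ' || l_token);
END;
/
```

With server output enabled, this should print six fields, the fifth being "Org Land, Inc." with its quotes and comma intact. An unbalanced double-quote, as in test string #5, swallows every field after it.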

Use Notepad++ to change the delimiter from ',' to ';'. First, use a regular-expression replace to change every comma between double quotes to a placeholder, say '@'. Then use Ctrl+H to replace ',' with ';' and, finally, '@' with ','.
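
If the file cannot be edited by hand, the same placeholder swap can be sketched in PL/SQL. This is an illustration only: it assumes that neither '@' nor ';' ever appears in the data and that double-quotes are balanced, and the variable names and sample line are made up.

```sql
DECLARE
   -- Shortened copy of the data line from the question.
   l_line      VARCHAR2(4000) :=
      '10223,1600003B,Admin Officer,00000004,"Org Land, Inc.",ORGA03';
   l_out       VARCHAR2(4000);          -- rewritten line
   l_ch        VARCHAR2(1 CHAR);        -- current character
   l_in_quotes BOOLEAN := FALSE;        -- TRUE while between double-quotes
BEGIN
   -- Step 1: change commas between double-quotes to the placeholder '@'.
   FOR i IN 1 .. NVL (LENGTH (l_line), 0)
   LOOP
      l_ch := SUBSTR (l_line, i, 1);

      IF l_ch = '"' THEN
         l_in_quotes := NOT l_in_quotes;
      ELSIF l_ch = ',' AND l_in_quotes THEN
         l_ch := '@';
      END IF;

      l_out := l_out || l_ch;
   END LOOP;

   -- Step 2: the remaining commas are delimiters, so change them to ';'.
   -- Step 3: turn the placeholders back into commas.
   l_out := REPLACE (REPLACE (l_out, ',', ';'), '@', ',');

   DBMS_OUTPUT.put_line (l_out);
END;
/
```

For the sample line this should print 10223;1600003B;Admin Officer;00000004;"Org Land, Inc.";ORGA03, which can then be split on ';' with a plain INSTR loop.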


 