简体   繁体   中英

How can a Windows batch file read data correctly from a delimited text file when some fields are null?

I have a comma-delimited text file with three fields. The first always containd a string, but the second, third, or both can be empty. When all contain strings, when only the third is emppty, and when the second and third are both empty, I get the expected result when it is read using the FOR command, the expected result being that the variables read from fields containing strings are equal to those strings, and the variables read from empty fields have null values. However, when the second fielkd is empty, and the third field contains a string, I get the unexpected result that the second variable, the one that was supposed to be read from the second field equals the contents of the third field, and the third variable has a null value.

How can I work around this problem?

This information is copied verbatim from my DosTips post: Safely parse nearly any CSV with parseCSV.bat

It is fairly common that someone wants to parse CSV using FOR /F. This is a simple task if you know all columns are populated, and there are no commas, newlines, or quotes within values. Assume there are 4 columns:

@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
  echo ----------------------
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo(
)

But things become more difficult if any of the following conditions occur:

1) Values may be empty, with consecutive commas. FOR /F treats consecutive delimiters as one, so it will throw off the column assignment.

2) Quoted values may contain commas. FOR /F will incorrectly treat a quoted comma as a column delimiter.

3) Quoted values may contain newlines. FOR /F will break the line at the newline and incorrectly treat the one row as two.

4) Quoted values may contain paired quotes that represent one quote.
For example, "He said, ""Hello there"" . A method is needed to convert "" into " .

Then there is a secondary problems that can crop up if delayed expansion is enabled.

5) A FOR variable %%A will be corrupted if it contains ! (or sometimes ^ ) if delayed expansion is enabled when the variable is expanded.

There are fairly easy solutions for some of these issues, but solving all of them is extremely difficult (and slow) with pure batch.

I have written a hybrid JScript/batch utility called parseCSV.bat that makes it easy and relatively efficient to correctly parse nearly any CSV file with FOR /F.

parseCSV.bat

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScrpt comment

::************ Documentation ***********
::parseCSV.bat version 1.0
:::
:::parseCSV  [/option]...
:::
:::  Parse stdin as CSV and write it to stdout in a way that can be safely
:::  parsed by FOR /F. All columns will be enclosed by quotes so that empty
:::  columns may be preserved. It also supports delimiters, newlines, and
:::  quotes within quoted values. Two consecutive quotes within a quoted value
:::  are converted into one quote.
:::
:::  Available options:
:::
:::    /I:string = Input delimiter. Default is a comma.
:::
:::    /O:string = Output delimiter. Default is a comma.
:::
:::    /E = Encode output delimiter in value as \D
:::         Encode newline in value as \N
:::         Encode backslash in value as \S
:::
:::    /D = Escape exclamation point and caret for delayed expansion
:::         ! becomes ^!
:::         ^ becomes ^^
:::
:::parseCSV  /?
:::
:::  Display this help
:::
:::parseCSV  /V
:::
:::  Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::

::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
  setlocal disableDelayedExpansion
  for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
  exit /b 0
)
if /i "%~1" equ "/V" (
  for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
  exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0


************ JScript portion ***********/
var args     = WScript.Arguments.Named,
    stdin    = WScript.Stdin,
    stdout   = WScript.Stdout,
    escape   = args.Exists("E"),
    delayed  = args.Exists("D"),
    inDelim  = args.Exists("I") ? args.Item("I") : ",",
    outDelim = args.Exists("O") ? args.Item("O") : ",",
    quote    = false,
    ln, c, n;
while (!stdin.AtEndOfStream) {
  ln=stdin.ReadLine();
  if (!quote) stdout.Write('"');
  for (n=0; n<ln.length; n++ ) {
    c=ln.charAt(n);
    if (c == '"') {
      if (quote && ln.charAt(n+1) == '"') {
        n++;
      } else {
        quote=!quote;
        continue;
      }
    }
    if (c == inDelim && !quote) c='"'+outDelim+'"';
    if (escape) {
      if (c == outDelim) c="\\D";
      if (c == "\\") c="\\S";
    }
    if (delayed) {
      if (c == "!") c="^!";
      if (c == "^") c="^^";
    }
    stdout.Write(c);
  }
  stdout.Write( (quote) ? ((escape) ? "\\N" : "\n") : '"\n' );
}

I have also written a script that defines a macro to assist with parsing the most problematic CSV files. See http://www.dostips.com/forum/viewtopic.php?f=3&t=1827 for background information about batch macros with arguments.

define_csvGetCol.bat

::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
::   @echo off
::   setlocal disableDelayedExpansion
::   call define_csvGetCol
::   setlocal enableDelayedExpansion
::   for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
::     %== Load and decode column values ==%
::     %csvGetCol% A "," %%A
::     %csvGetCol% B "," %%B
::     %csvGetCol% C "," %%C
::     %== Display the result ==%
::     echo ----------------------
::     for %%V in (A B C) do echo %%V=!%%V!
::     echo(
::   )
::
:: Written by Dave Benham
::

:: Delayed expansion must be disabled during macro definition

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

:: Define csvGetCol
:: %csvGetCol%  envVarName  "Delimiter"  FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
  endlocal^&endlocal%\n%
  set "%%1=%%~3"!%\n%
  if defined %%1 (%\n%
    for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
    set "%%1=!%%1:\D=%%~2!"%\n%
    set "%%1=!%%1:\S=\!"%\n%
  )%\n%
)) else setlocal disableDelayedExpansion ^& set args=


Usage is extremely simple if you know there are no commas or newlines in any values, and delayed expansion is not needed:

test1.csv

"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

test1.bat - no delayed expansion, no commas or newlines in values

@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

--OUTPUT1--

-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4


It is also quite simple when commas are in values if you know of a character that does not exist in any value. Simply specify a unique character for the output delimiter.

test2.csv

"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

test2.bat - no delayed expansion, no newlines or pipes in values. Note that the entire option must be quoted if the delimiter is a poison character

@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

--OUTPUT2--

-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4


It only takes a bit more code if values may contain newlines or if you don't know of a character that does not appear in any value. This solution encodes newlines, delimiters, and slashes as \\N , \\D , and \\S . Delayed expansion is needed within the loop to decode the values, so ! and ^ must be escaped as ^! and ^^ .

test3.csv

"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows

test3.bat - Allow virtually any valid CSV, without using a macro.

@echo off
setlocal enableDelayedExpansion

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load columns with encoded values. The trailing ! is important ==%
  set "A=%%~A"!
  set "B=%%~B"!
  set "C=%%~C"!
  set "D=%%~D"!
  %== Decode values ==%
  for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
    set "%%V=!%%V:\N=%%~L!"
    set "%%V=!%%V:\D=,!"
    set "%%V=!%%V:\S=\!"
  )
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

--OUTPUT3--

---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=
B=
C=
D=value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows


test4.bat - Allow virtually any valid CSV, but now use the %csvGetCol% macro.

@echo off

:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol

:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load and decode column values ==%
  %csvGetCol% A "," %%A
  %csvGetCol% B "," %%B
  %csvGetCol% C "," %%C
  %csvGetCol% D "," %%D
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

Output is identical to test3.bat


If the CSV file is very large, then it is much more efficient to save the output of parseCSV.bat to a temporary file, and then use the FOR /F loop to read the temporary file.


There are still a couple inherent limitations that are true for all FOR /F usage:

1) A single FOR /F cannot parse more than 32 columns.

2) Batch line length restriction of 8191 characters can still be a problem.

No sample data, so solution incomplete.

@ECHO OFF
SETLOCAL enabledelayedexpansion
(
 FOR /f "delims=" %%a IN (q27830845.txt) DO (
  SET "line=%%a"
  SET "line=!line:,,,= , , ,!"
  SET "line=!line:,,= , ,!"
  FOR /f "tokens=1-4delims=," %%b IN ("!LINE!") DO (
   ECHO(%%a--^>^>%%b++%%c++%%d++%%e++
  )
 )
)>newfile.txt

GOTO :EOF

I used a file named q27830845.txt containing this data for my testing.

col1,col 2,col 3,col4
one,two,three,four
ONE,,THREE,FOUR - no two
ONE,,,FOUR - 3 and 2 missing
,,,Only FOUR

Produces newfile.txt with content

col1,col 2,col 3,col4-->>col1++col 2++col 3++col4++
one,two,three,four-->>one++two++three++four++
ONE,,THREE,FOUR - no two-->>ONE ++ ++THREE++FOUR - no two++
ONE,,,FOUR - 3 and 2 missing-->>ONE ++ ++ ++FOUR - 3 and 2 missing++
,,,Only FOUR-->> ++ ++ ++Only FOUR++

Note that %%a etc may have Space appended. Will no doubt show sensitivity to the characters which have meaning to cmd like ! and % . ++ used simply as an obvious visual separator between fields.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM