
How can a Windows batch file read data correctly from a delimited text file when some fields are null?

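I have a comma-delimited text file with three fields. The first always contains a string, but the second, the third, or both may be null. When all three contain strings, when only the third is empty, and when both the second and third are empty, reading the file with the FOR command gives the expected results: variables read from fields that contain strings equal those strings, and variables read from empty fields are empty. But when the second field is empty and the third contains a string, I get an unexpected result: the second variable, which should be read from the second field, equals the contents of the third field, and the third variable is empty.

How can I fix this?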

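This information is copied verbatim from my DosTips post: Safely parse nearly any CSV with parseCSV.bat

It is very common for someone to want to parse CSV with FOR /F. This is a simple task if you know that all columns are populated and that the values contain no commas, newlines, or quotes. Assuming there are 4 columns: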

@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
  echo ----------------------
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
  echo D=%%~D
  echo(
)

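But things get more difficult if any of the following conditions occur:

1) Values may be empty, with consecutive commas. FOR /F treats consecutive delimiters as one, so it throws off the column assignment.

2) Quoted values may contain commas. FOR /F will incorrectly treat a quoted comma as a column delimiter.

3) Quoted values may contain newlines. FOR /F will break the line at the newline and incorrectly treat the one record as two.

4) Quoted values may contain doubled quotes that represent a single quote.
For example, "He said, ""Hello there""". A method is needed to convert "" into "

Then there are secondary issues that can arise if delayed expansion is enabled.

5) A FOR variable such as %%A will be corrupted if it contains ! (or sometimes ^) and delayed expansion is enabled when the variable is expanded.

There are fairly simple solutions for some of these issues, but solving all of them with pure batch is extremely difficult (and slow).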
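
Issue 1 is the one behind the original question, and it is easier to see outside of cmd. The sketch below is a simplified model, written in plain JavaScript for Node.js, of how FOR /F hands out tokens. The helper name forFTokens is hypothetical, and the model deliberately ignores eol, skip, and the * token.

```javascript
// Simplified model of FOR /F "tokens=1-<count> delims=<delim>".
// forFTokens is a hypothetical helper, not part of cmd.exe or parseCSV.bat.
function forFTokens(line, delim, count) {
  var tokens = [], i = 0, start;
  while (tokens.length < count) {
    // A whole run of delimiters is skipped at once: this hides empty fields.
    while (i < line.length && line.charAt(i) === delim) i++;
    if (i >= line.length) break;
    start = i;
    while (i < line.length && line.charAt(i) !== delim) i++;
    tokens.push(line.substring(start, i));
  }
  // Variables for tokens that were never found expand to nothing.
  while (tokens.length < count) tokens.push("");
  return tokens;
}

console.log(forFTokens("value1,value2,value3", ",", 3).join("|")); // value1|value2|value3
console.log(forFTokens("value1,,value3", ",", 3).join("|"));       // value1|value3|
```

In the second call the third field lands in the second variable and the third variable is empty, which is exactly the symptom described in the question.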

I have written a hybrid JScript/batch utility called parseCSV.bat that makes it simple and relatively efficient to correctly parse nearly any CSV file with FOR /F.

parseCSV.bat

@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScript comment

::************ Documentation ***********
::parseCSV.bat version 1.0
:::
:::parseCSV  [/option]...
:::
:::  Parse stdin as CSV and write it to stdout in a way that can be safely
:::  parsed by FOR /F. All columns will be enclosed by quotes so that empty
:::  columns may be preserved. It also supports delimiters, newlines, and
:::  quotes within quoted values. Two consecutive quotes within a quoted value
:::  are converted into one quote.
:::
:::  Available options:
:::
:::    /I:string = Input delimiter. Default is a comma.
:::
:::    /O:string = Output delimiter. Default is a comma.
:::
:::    /E = Encode output delimiter in value as \D
:::         Encode newline in value as \N
:::         Encode backslash in value as \S
:::
:::    /D = Escape exclamation point and caret for delayed expansion
:::         ! becomes ^!
:::         ^ becomes ^^
:::
:::parseCSV  /?
:::
:::  Display this help
:::
:::parseCSV  /V
:::
:::  Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::

::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
  setlocal disableDelayedExpansion
  for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
  exit /b 0
)
if /i "%~1" equ "/V" (
  for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
  exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0


************ JScript portion ***********/
var args     = WScript.Arguments.Named,
    stdin    = WScript.Stdin,
    stdout   = WScript.Stdout,
    escape   = args.Exists("E"),
    delayed  = args.Exists("D"),
    inDelim  = args.Exists("I") ? args.Item("I") : ",",
    outDelim = args.Exists("O") ? args.Item("O") : ",",
    quote    = false,
    ln, c, n;
while (!stdin.AtEndOfStream) {
  ln=stdin.ReadLine();
  if (!quote) stdout.Write('"');
  for (n=0; n<ln.length; n++ ) {
    c=ln.charAt(n);
    if (c == '"') {
      if (quote && ln.charAt(n+1) == '"') {
        n++;
      } else {
        quote=!quote;
        continue;
      }
    }
    if (c == inDelim && !quote) c='"'+outDelim+'"';
    if (escape) {
      if (c == outDelim) c="\\D";
      if (c == "\\") c="\\S";
    }
    if (delayed) {
      if (c == "!") c="^!";
      if (c == "^") c="^^";
    }
    stdout.Write(c);
  }
  stdout.Write( (quote) ? ((escape) ? "\\N" : "\n") : '"\n' );
}
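
To see what the utility emits without running cscript, the quoting loop of the JScript portion above can be lifted into a pure function. The following is a sketch for Node.js under stated assumptions: only the default options are modelled (comma in, comma out, no /E, no /D), and the name quoteColumns is hypothetical.

```javascript
// Sketch of the quoting pass from the JScript portion, default options only.
// quoteColumns is a hypothetical name used for this illustration.
function quoteColumns(text) {
  var lines = text.split("\n"), out = "", quote = false, ln, c, n, i;
  for (i = 0; i < lines.length; i++) {
    ln = lines[i];
    if (!quote) out += '"';               // open the first column of a record
    for (n = 0; n < ln.length; n++) {
      c = ln.charAt(n);
      if (c === '"') {
        if (quote && ln.charAt(n + 1) === '"') {
          n++;                            // "" inside quotes becomes one literal "
        } else {
          quote = !quote;                 // enter or leave a quoted value
          continue;
        }
      }
      if (c === "," && !quote) c = '","'; // close one column, open the next
      out += c;
    }
    out += quote ? "\n" : '"\n';          // a newline inside quotes stays in the value
  }
  return out;
}

console.log(quoteColumns('value1,,value3\n"say ""hi""",,x'));
// "value1","","value3"
// "say "hi"","","x"
```

Because every column is now wrapped in quotes, no two delimiters are ever adjacent, so FOR /F keeps the column positions and the %%~A modifier strips the enclosing quotes again.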

I have also written a script that defines a macro to help parse the most problematic CSV files. For background on batch macros with arguments, see http://www.dostips.com/forum/viewtopic.php?f=3&t=1827

define_csvGetCol.bat

::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
::   @echo off
::   setlocal disableDelayedExpansion
::   call define_csvGetCol
::   setlocal enableDelayedExpansion
::   for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
::     %== Load and decode column values ==%
::     %csvGetCol% A "," %%A
::     %csvGetCol% B "," %%B
::     %csvGetCol% C "," %%C
::     %== Display the result ==%
::     echo ----------------------
::     for %%V in (A B C) do echo %%V=!%%V!
::     echo(
::   )
::
:: Written by Dave Benham
::

:: Delayed expansion must be disabled during macro definition

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

:: Define csvGetCol
:: %csvGetCol%  envVarName  "Delimiter"  FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
  endlocal^&endlocal%\n%
  set "%%1=%%~3"!%\n%
  if defined %%1 (%\n%
    for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
    set "%%1=!%%1:\D=%%~2!"%\n%
    set "%%1=!%%1:\S=\!"%\n%
  )%\n%
)) else setlocal disableDelayedExpansion ^& set args=


Usage is extremely simple if you know that no value contains a comma or newline, and delayed expansion is not required:

test1.csv

"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

test1.bat - No delayed expansion, no commas or newlines in values

@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

--OUTPUT1--

-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4


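It is also fairly simple when values contain commas, provided you know of a character that does not appear in any value. Simply specify that unique character as the output delimiter.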

test2.csv

"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4

test2.bat - No delayed expansion, no newlines or pipes in values. Note that the entire option must be quoted if the delimiter is a poison character

@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
  echo(
)

--OUTPUT2--

-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16

-------------
A=value1
B=
C=value3
D=value4

-------------
A=value1
B=
C=
D=value4

-------------
A=value1
B=
C=
D=

-------------
A=
B=
C=
D=value4


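Only a bit more code is needed if the values may contain newlines, or if you don't know of a character that never appears in any value. This solution encodes newlines, delimiters, and backslashes as \N, \D, and \S. Delayed expansion is needed within the loop to decode the values, so ! and ^ must be escaped as ^! and ^^.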
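
The decode step can be sketched outside of batch to make the ordering visible. This is an illustration for Node.js only: decodeValue is a hypothetical helper, and it does not model the ^! and ^^ escapes, which cmd itself consumes during delayed expansion.

```javascript
// Hypothetical helper that applies the three substitutions in the order
// \N, then \D, then \S.
function decodeValue(encoded, delim) {
  return encoded
    .split("\\N").join("\n")   // \N -> newline
    .split("\\D").join(delim)  // \D -> the output delimiter (comma by default)
    .split("\\S").join("\\");  // \S -> backslash, decoded last
}

console.log(decodeValue("value3 line1\\Nvalue3 line2", ","));
// value3 line1
// value3 line2
console.log(decodeValue("He said\\D hi", ","));   // He said, hi
console.log(decodeValue("c:\\SNew", ","));        // c:\New
```

Decoding \S last matters in the final call: if the backslash were restored first, the resulting c:\New would contain \N and the next substitution would turn it into a newline.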

test3.csv

"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows

test3.bat - Allow nearly any valid CSV, without using the macro.

@echo off
setlocal enableDelayedExpansion

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load columns with encoded values. The trailing ! is important ==%
  set "A=%%~A"!
  set "B=%%~B"!
  set "C=%%~C"!
  set "D=%%~D"!
  %== Decode values ==%
  for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
    set "%%V=!%%V:\N=%%~L!"
    set "%%V=!%%V:\D=,!"
    set "%%V=!%%V:\S=\!"
  )
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

--OUTPUT3--

---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=
B=
C=
D=value4

---------------------
A=value1
B=value2
C=value3
D=value4

---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows


test4.bat - Allow nearly any valid CSV, but now using the %csvGetCol% macro.

@echo off

:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol

:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
  %== Load and decode column values ==%
  %csvGetCol% A "," %%A
  %csvGetCol% B "," %%B
  %csvGetCol% C "," %%C
  %csvGetCol% D "," %%D
  %== Print results ==%
  echo ---------------------
  for %%V in (A B C D) do echo(%%V=!%%V!
  echo(
)

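Output is the same as for test3.bat


If the CSV file is very large, it is more efficient to save the output of parseCSV.bat to a temporary file and then read the temporary file with the FOR /F loop.


There are still some inherent limitations that apply to all FOR /F usage:

1) A single FOR /F cannot parse more than 32 columns.

2) The batch line length limit of 8191 characters is still an issue.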

No sample data was provided, so this solution is incomplete.

@ECHO OFF
SETLOCAL enabledelayedexpansion
(
 FOR /f "delims=" %%a IN (q27830845.txt) DO (
  SET "line=%%a"
  SET "line=!line:,,,= , , ,!"
  SET "line=!line:,,= , ,!"
  FOR /f "tokens=1-4delims=," %%b IN ("!LINE!") DO (
   ECHO(%%a--^>^>%%b++%%c++%%d++%%e++
  )
 )
)>newfile.txt

GOTO:EOF

I used a file named q27830845.txt containing my test data.

col1,col 2,col 3,col4
one,two,three,four
ONE,,THREE,FOUR - no two
ONE,,,FOUR - 3 and 2 missing
,,,Only FOUR

This produces newfile.txt containing

col1,col 2,col 3,col4-->>col1++col 2++col 3++col4++
one,two,three,four-->>one++two++three++four++
ONE,,THREE,FOUR - no two-->>ONE ++ ++THREE++FOUR - no two++
ONE,,,FOUR - 3 and 2 missing-->>ONE ++ ++ ++FOUR - 3 and 2 missing++
,,,Only FOUR-->> ++ ++ ++Only FOUR++

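Note that %%a etc. may have spaces appended, and will no doubt be sensitive to characters that are significant to cmd, such as ! and %. The ++ is used merely as an obvious visual separator between fields.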
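
The two SET substitutions in the batch file can be modelled in a few lines to show why the three-comma pass has to run before the two-comma pass. This is a sketch for Node.js; padEmptyFields is a hypothetical name, and split/join is used because, like the batch substitution, it replaces non-overlapping matches from left to right.

```javascript
// Hypothetical model of the padding step: ",,," first, then ",,".
function padEmptyFields(line) {
  return line.split(",,,").join(" , , ,").split(",,").join(" , ,");
}

console.log(padEmptyFields("ONE,,THREE,FOUR")); // ONE , ,THREE,FOUR
console.log(padEmptyFields("ONE,,,FOUR"));      // ONE , , ,FOUR

// With only the two-comma pass, three commas still leave an adjacent pair:
console.log("ONE,,,FOUR".split(",,").join(" , ,")); // ONE , ,,FOUR
```

Every empty field now holds a single space, so FOR /F no longer sees adjacent delimiters. The cost is the padding visible in newfile.txt, which the caller has to trim.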
