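How can a Windows batch file read data correctly from a delimited text file when some fields are null?
I have a comma-delimited text file with three fields. The first always contains a string, but the second, the third, or both may be empty. When all fields contain strings, when only the third is empty, and when both the second and third are empty, reading with the FOR command gives the expected result: variables read from fields that contain strings equal those strings, and variables read from empty fields are empty. However, when the second field is empty and the third contains a string, I get an unexpected result: the second variable, which should be read from the second field, equals the content of the third field, and the third variable is empty.
How can I solve this?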
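This information is copied verbatim from my DosTips post: Safely parse nearly any CSV with parseCSV.bat
It is very common for someone to want to parse CSV with FOR /F. This is a simple task if you know that all columns are populated and that the values contain no commas, newlines, or quotes. Suppose there are 4 columns: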
@echo off
for /f "tokens=1-4 delims=," %%A in (test.csv) do (
echo ----------------------
echo A=%%~A
echo B=%%~B
echo C=%%~C
echo D=%%~D
echo(
)
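But things get more difficult if any of the following can occur:
1) Values may be empty, giving consecutive commas. FOR /F treats consecutive delimiters as one, so it throws off the column assignment.
2) Quoted values may contain commas. FOR /F will incorrectly treat a quoted comma as a column delimiter.
3) Quoted values may contain newlines. FOR /F will break the line at the newline and incorrectly treat one row as two.
4) Quoted values may contain paired quotes that represent one quote. For example, "He said, ""Hello there""". A method is needed to convert "" into ".
A secondary problem can also arise if delayed expansion is enabled:
5) A FOR variable %%A will be corrupted if it contains ! (or sometimes ^) and delayed expansion is enabled when the variable is expanded.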
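Problem 1 is the one the question runs into. A minimal sketch that reproduces it with an in-line string instead of a file (the sample values are made up for illustration):

```batch
@echo off
rem The second field of the sample string is empty
for /f "tokens=1-3 delims=," %%A in ("value1,,value3") do (
  echo A=%%A
  echo B=%%B
  echo C=%%C
)
```

This should print A=value1, B=value3 and an empty C: the two adjacent commas collapse into a single delimiter, so value3 slides into the second token.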
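There are fairly easy solutions for some of these problems, but solving all of them with pure batch is extremely difficult (and slow).
I have written a hybrid JScript/batch utility called parseCSV.bat that makes it easy and relatively efficient to correctly parse nearly any CSV file with FOR /F.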
parseCSV.bat
@if (@X)==(@Y) @end /* harmless hybrid line that begins a JScript comment
::************ Documentation ***********
::parseCSV.bat version 1.0
:::
:::parseCSV [/option]...
:::
::: Parse stdin as CSV and write it to stdout in a way that can be safely
::: parsed by FOR /F. All columns will be enclosed by quotes so that empty
::: columns may be preserved. It also supports delimiters, newlines, and
::: quotes within quoted values. Two consecutive quotes within a quoted value
::: are converted into one quote.
:::
::: Available options:
:::
::: /I:string = Input delimiter. Default is a comma.
:::
::: /O:string = Output delimiter. Default is a comma.
:::
::: /E = Encode output delimiter in value as \D
::: Encode newline in value as \N
::: Encode backslash in value as \S
:::
::: /D = Escape exclamation point and caret for delayed expansion
::: ! becomes ^!
::: ^ becomes ^^
:::
:::parseCSV /?
:::
::: Display this help
:::
:::parseCSV /V
:::
::: Display the version of parseCSV.bat
:::
:::parseCSV.bat was written by Dave Benham. Updates are available at the original
:::posting site: http://www.dostips.com/forum/viewtopic.php?f=3&t=5702
:::
::************ Batch portion ***********
@echo off
if "%~1" equ "/?" (
setlocal disableDelayedExpansion
for /f "delims=: tokens=*" %%A in ('findstr "^:::" "%~f0"') do echo(%%A
exit /b 0
)
if /i "%~1" equ "/V" (
for /f "delims=:" %%A in ('findstr /bc:"::%~nx0 version " "%~f0"') do echo %%A
exit /b 0
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0
************ JScript portion ***********/
var args = WScript.Arguments.Named,
stdin = WScript.Stdin,
stdout = WScript.Stdout,
escape = args.Exists("E"),
delayed = args.Exists("D"),
inDelim = args.Exists("I") ? args.Item("I") : ",",
outDelim = args.Exists("O") ? args.Item("O") : ",",
quote = false,
ln, c, n;
while (!stdin.AtEndOfStream) {
ln=stdin.ReadLine();
if (!quote) stdout.Write('"');
for (n=0; n<ln.length; n++ ) {
c=ln.charAt(n);
if (c == '"') {
if (quote && ln.charAt(n+1) == '"') {
n++;
} else {
quote=!quote;
continue;
}
}
if (c == inDelim && !quote) c='"'+outDelim+'"';
if (escape) {
if (c == outDelim) c="\\D";
if (c == "\\") c="\\S";
}
if (delayed) {
if (c == "!") c="^!";
if (c == "^") c="^^";
}
stdout.Write(c);
}
stdout.Write( (quote) ? ((escape) ? "\\N" : "\n") : '"\n' );
}
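To see what the utility hands to FOR /F, here is a small sketch for the three-field case from the question. It assumes parseCSV.bat is in the current directory or on the PATH; sample.csv is a hypothetical scratch file created only for this demonstration.

```batch
@echo off
rem Create a one-line sample file whose second field is empty
>sample.csv echo value1,,value3

rem Show the raw parseCSV output (CALL so control returns to this script)
call parseCSV <sample.csv

rem Parse that output; %%~B strips the enclosing quotes
for /f "tokens=1-3 delims=," %%A in ('parseCSV ^<sample.csv') do (
  echo A=%%~A
  echo B=%%~B
  echo C=%%~C
)
del sample.csv
```

The raw output should be "value1","","value3". Because every column is now quoted, the commas are no longer adjacent, so FOR /F sees three tokens and the loop should report A=value1, an empty B, and C=value3.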
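I have also written a script that defines a macro to help parse the most problematic CSV files. See http://www.dostips.com/forum/viewtopic.php?f=3&t=1827 for background information on batch macros with arguments.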
define_csvGetCol.bat
::define_csvGetCol.bat version 1.1
::
:: Defines variable LF and macro csvGetCol to be used with
:: parseCSV.bat to parse nearly any CSV file.
::
:: This script must be called with delayedExpansion disabled.
::
:: The %csvGetCol% macro must be used with delayedExpansion enabled.
::
:: Example usage:
::
:: @echo off
:: setlocal disableDelayedExpansion
:: call define_csvGetCol
:: setlocal enableDelayedExpansion
:: for /f "tokens=1-3 delims=," %%A in ('parseCSV /d /e ^<test.csv') do (
:: %== Load and decode column values ==%
:: %csvGetCol% A "," %%A
:: %csvGetCol% B "," %%B
:: %csvGetCol% C "," %%C
:: %== Display the result ==%
:: echo ----------------------
:: for %%V in (A B C) do echo %%V=!%%V!
:: echo(
:: )
::
:: Written by Dave Benham
::
:: Delayed expansion must be disabled during macro definition
:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE
:: define a newline with line continuation
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"
:: Define csvGetCol
:: %csvGetCol% envVarName "Delimiter" FORvar
set csvGetCol=for %%# in (1 2) do if %%#==2 (%\n%
setlocal enableDelayedExpansion^&for /f "tokens=1,2*" %%1 in ("!args!") do (%\n%
endlocal^&endlocal%\n%
set "%%1=%%~3"!%\n%
if defined %%1 (%\n%
for %%L in ("!LF!") do set "%%1=!%%1:\N=%%~L!"%\n%
set "%%1=!%%1:\D=%%~2!"%\n%
set "%%1=!%%1:\S=\!"%\n%
)%\n%
)) else setlocal disableDelayedExpansion ^& set args=
If you know that none of the values contain commas or newlines, then usage is very simple and delayed expansion is not needed:
test1.csv
"value1 with ""quotes""",value2: No problem!,value3: 2^3=8,value4: (2^2)!=16
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4
test1.bat - No delayed expansion, no commas or newlines in values
@echo off
for /f "tokens=1-4 delims=," %%A in ('parseCSV ^<test1.csv') do (
echo -------------
echo(A=%%~A
echo(B=%%~B
echo(C=%%~C
echo(D=%%~D
echo(
)
--OUTPUT1--
-------------
A=value1 with "quotes"
B=value2: No problem!
C=value3: 2^3=8
D=value4: (2^2)!=16
-------------
A=value1
B=
C=value3
D=value4
-------------
A=value1
B=
C=
D=value4
-------------
A=value1
B=
C=
D=
-------------
A=
B=
C=
D=value4
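It is also simple when values contain commas, provided you know of a character that does not appear in any value. Simply specify that unique character as the output delimiter.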
test2.csv
"value1 with ""quotes""","value2, No problem!","value3, 2^3=8","value4, (2^2)!=16"
value1,,value3,value4
value1,,,value4
value1,,,
,,,value4
test2.bat - No delayed expansion, no newlines or pipes in values. Note that the entire option must be quoted if the delimiter is a poison character
@echo off
for /f "tokens=1-4 delims=|" %%A in ('parseCSV "/o:|" ^<test2.csv') do (
echo -------------
echo(A=%%~A
echo(B=%%~B
echo(C=%%~C
echo(D=%%~D
echo(
)
--OUTPUT2--
-------------
A=value1 with "quotes"
B=value2, No problem!
C=value3, 2^3=8
D=value4, (2^2)!=16
-------------
A=value1
B=
C=value3
D=value4
-------------
A=value1
B=
C=
D=value4
-------------
A=value1
B=
C=
D=
-------------
A=
B=
C=
D=value4
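It takes just a bit more code if values may contain newlines, or if you do not know of a character that is absent from every value. This solution encodes newlines, delimiters, and backslashes as \N, \D, and \S. Delayed expansion is needed within the loop to decode the values, so ! and ^ must be escaped as ^! and ^^.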
test3.csv
"2^3=8","(2^2)!=16","Success!",Value4
value1,value2,value3,value4
,,,value4
"value1","value2","value3","value4"
"He said, ""Hey cutie.""","She said, ""Drop dead!""","value3 line1
value3 line2",c:\Windows
test3.bat - Allows nearly any valid CSV, without using a macro.
@echo off
setlocal enableDelayedExpansion
:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
%== Load columns with encoded values. The trailing ! is important ==%
set "A=%%~A"!
set "B=%%~B"!
set "C=%%~C"!
set "D=%%~D"!
%== Decode values ==%
for %%L in ("!LF!") do for %%V in (A B C D) do if defined %%V (
set "%%V=!%%V:\N=%%~L!"
set "%%V=!%%V:\D=,!"
set "%%V=!%%V:\S=\!"
)
%== Print results ==%
echo ---------------------
for %%V in (A B C D) do echo(%%V=!%%V!
echo(
)
--OUTPUT3--
---------------------
A=2^3=8
B=(2^2)!=16
C=Success!
D=Value4
---------------------
A=value1
B=value2
C=value3
D=value4
---------------------
A=
B=
C=
D=value4
---------------------
A=value1
B=value2
C=value3
D=value4
---------------------
A=He said, "Hey cutie."
B=She said, "Drop dead!"
C=value3 line1
value3 line2
D=c:\Windows
test4.bat - Allows nearly any valid CSV, but now using the %csvGetCol% macro.
@echo off
:: Delayed expansion must be disabled during macro definition
setlocal disableDelayedExpansion
call define_csvGetCol
:: Delayed expansion must be enabled when using %csvGetCol%
setlocal enableDelayedExpansion
for /f "tokens=1-4 delims=," %%A in ('parseCSV /e /d ^<test3.csv') do (
%== Load and decode column values ==%
%csvGetCol% A "," %%A
%csvGetCol% B "," %%B
%csvGetCol% C "," %%C
%csvGetCol% D "," %%D
%== Print results ==%
echo ---------------------
for %%V in (A B C D) do echo(%%V=!%%V!
echo(
)
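The output is the same as for test3.bat.
If the CSV file is very large, it is more efficient to save the output of parseCSV.bat to a temporary file and then read the temporary file with the FOR /F loop.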
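A minimal sketch of that temporary-file variant, reusing test1.csv from above. The variable name tmpCsv and the temp file name are assumptions chosen for illustration, not part of parseCSV.bat.

```batch
@echo off
setlocal
rem Hypothetical temp file name; any unique writable path would do
set "tmpCsv=%temp%\parseCSV_%random%.tmp"

rem CALL is needed so control returns to this script after parseCSV.bat ends
call parseCSV <test1.csv >"%tmpCsv%"

rem usebackq makes FOR /F treat the quoted path as a file name, not a string
for /f "usebackq tokens=1-4 delims=," %%A in ("%tmpCsv%") do (
  echo -------------
  echo(A=%%~A
  echo(B=%%~B
  echo(C=%%~C
  echo(D=%%~D
)
del "%tmpCsv%"
```

The loop body is the same as in test1.bat; only the source of the lines changes from a command to a file.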
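There are still some inherent limitations with all FOR /F usage:
1) A single FOR /F cannot parse more than 32 columns.
2) The batch line length limit of 8191 characters can still be a problem.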
No sample data was provided, so this solution is incomplete.
@ECHO OFF
SETLOCAL enabledelayedexpansion
(
FOR /f "delims=" %%a IN (q27830845.txt) DO (
SET "line=%%a"
SET "line=!line:,,,= , , ,!"
SET "line=!line:,,= , ,!"
FOR /f "tokens=1-4delims=," %%b IN ("!LINE!") DO (
ECHO(%%a--^>^>%%b++%%c++%%d++%%e++
)
)
)>newfile.txt
GOTO:EOF
I used a file named q27830845.txt containing my test data:
col1,col 2,col 3,col4
one,two,three,four
ONE,,THREE,FOUR - no two
ONE,,,FOUR - 3 and 2 missing
,,,Only FOUR
This produces newfile.txt containing:
col1,col 2,col 3,col4-->>col1++col 2++col 3++col4++
one,two,three,four-->>one++two++three++four++
ONE,,THREE,FOUR - no two-->>ONE ++ ++THREE++FOUR - no two++
ONE,,,FOUR - 3 and 2 missing-->>ONE ++ ++ ++FOUR - 3 and 2 missing++
,,,Only FOUR-->> ++ ++ ++Only FOUR++
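Note that the field variables (%%b and so on) may have a space appended. The routine will no doubt also be sensitive to characters that have special meaning to cmd, such as ! and %. The ++ is used merely as an obvious visual separator between fields.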
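If the padding space is unwanted, one possible way to remove it is to copy each token into an ordinary variable and drop a single trailing space. This is only a sketch: the names f1 to f4 are made up for illustration, it cannot tell a padding space from a genuine trailing space in the data, and it shares the sensitivity to !, % and quotes noted above.

```batch
@echo off
setlocal enableDelayedExpansion
for /f "delims=" %%a in (q27830845.txt) do (
  set "line=%%a"
  set "line=!line:,,,= , , ,!"
  set "line=!line:,,= , ,!"
  for /f "tokens=1-4 delims=," %%b in ("!line!") do (
    rem Copy the tokens into ordinary variables so they can be edited
    set "f1=%%b"
    set "f2=%%c"
    set "f3=%%d"
    set "f4=%%e"
    rem Drop one trailing space, if present, from each defined field
    for %%V in (f1 f2 f3 f4) do if defined %%V if "!%%V:~-1!"==" " set "%%V=!%%V:~0,-1!"
    echo([!f1!][!f2!][!f3!][!f4!]
  )
)
```

With the test data above, the line ONE,,THREE,FOUR - no two should come out as [ONE][][THREE][FOUR - no two], with the empty second field left undefined rather than holding a single space.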