Count delimiters in a file in Windows

I have a bunch of files that each contain thousands of records. The structure of each file is the same.

Each record is on a separate line and has multiple fields separated by a delimiter '|'.

Each row should have 36 fields, but the problem is that some of these rows have a different number of fields, i.e. the number of '|' characters in them is not 35.

Can someone please suggest a way, in Windows, to identify such rows? (For example, any record whose delimiter count is not 35 should be written to a bad file.)

@ECHO Off
SETLOCAL
:: Looking for exactly 36 fields - no empty fields
FOR /f "delims=" %%a IN (q25936461.txt) DO (
 SET good=Y
 FOR /f "tokens=1,30*delims=|" %%m IN ("%%a") DO (
  IF "%%o" equ "" (SET "good=") ELSE (
   FOR /f "tokens=1,6,7delims=|" %%p IN ("%%o") DO (
    IF "%%r" neq "" SET "good="
    IF "%%q" equ "" SET "good="
   )
  )
 )
 IF NOT DEFINED good ECHO(%%a
)
ECHO ========== method 1 done =============
:: Looking for exactly 36 fields - allow empty fields
FOR /f "delims=" %%a IN (q25936461.txt) DO (
 SET good=Y
 SET "line=%%a"
 SET /a count=0
 CALL :analyse
 IF NOT DEFINED good ECHO(%%a
)
ECHO ========== method 2 done =============

GOTO :EOF
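:analyse
REM Stop when nothing is left: a trailing '|' empties the variable, and the
REM substitution below misbehaves on an undefined variable.
IF NOT DEFINED line GOTO counted
SET "linem=%line:*|=%"
IF "%linem%" neq "%line%" SET /a count+=1&SET "line=%linem%"&GOTO analyse
:counted
IF %count% neq 35 SET "good="
GOTO :eof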

Here are two methods; both echo the bad lines, and q25936461.txt stands for your data file. Testing is your problem....
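For comparison, the counting idea behind method 2 can be sketched outside batch. This is not part of the answer above: it is a minimal Python sketch that assumes Python is available on the Windows machine, and the names find_bad_lines and write_bad_file, as well as the UTF-8 encoding, are hypothetical choices made for illustration.

```python
from pathlib import Path

DELIMITER = "|"
EXPECTED_DELIMITERS = 35  # 36 fields per record means 35 delimiters


def find_bad_lines(lines, expected=EXPECTED_DELIMITERS, delimiter=DELIMITER):
    """Return (line_number, text) pairs whose delimiter count differs from expected."""
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if not text:
            continue  # skip empty lines, as FOR /F does in the batch version
        if text.count(delimiter) != expected:
            bad.append((number, text))
    return bad


def write_bad_file(source, target, expected=EXPECTED_DELIMITERS):
    """Copy the bad lines of source into target and return how many were written."""
    # Assumption: the data files are text that can be read as UTF-8.
    with open(source, encoding="utf-8", errors="replace") as handle:
        bad = find_bad_lines(handle, expected)
    Path(target).write_text("".join(f"{text}\n" for _, text in bad), encoding="utf-8")
    return len(bad)
```

Calling write_bad_file("q25936461.txt", "bad.txt") would write every offending record to bad.txt. Unlike method 1, this counts delimiters directly, so empty fields are allowed.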

On

cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00

this command

findstr /r /i /n /v "^.*|.*|.*|.*$" "C:\Users\User\Desktop\test.txt"

shows

6:cat|26/7/14|$15.00

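Note that this only catches lines with too few delimiters: the pattern matches any line containing at least three '|' characters, and /v prints the lines that do not match, so a line with extra fields is not reported. Type findstr /? for more.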

It could be done with FINDSTR alone if the number of columns on valid lines was <=15.

For example, the following would show all lines that do not have exactly 3 columns:

findstr /vx "[^|]*|[^|]*|[^|]*" test.txt

But FINDSTR cannot handle more than 15 character class terms. See "What are the undocumented features and limitations of the Windows FINDSTR command?" for more info. Your search would require 36 such terms, one per column.

The following solution returns all the faulty lines, except it ignores empty lines. It relies on REPL.BAT - a hybrid JScript/batch utility that performs a regex search/replace on stdin and writes the result to stdout. REPL.BAT is pure script that will run on any modern Windows machine from XP onward.

The solution uses REPL.BAT to remove all characters from lines that have exactly 36 columns, and then uses FINDSTR to print remaining lines that have at least one character.

<test.txt repl "^([^|]*\|){35}[^|]*$" ""|findstr .
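The same regular expression can be tried without REPL.BAT. The following is a minimal Python sketch, not the answer's own method: it assumes Python is installed, and the names VALID_RECORD and faulty_lines are hypothetical. Like the command above, it ignores empty lines.

```python
import re

# Same pattern as in the REPL.BAT call: 35 "field|" groups, then a last field.
# fullmatch() anchors the pattern at both ends, so ^ and $ are not needed here.
VALID_RECORD = re.compile(r"([^|]*\|){35}[^|]*")


def faulty_lines(lines, pattern=VALID_RECORD):
    """Return the non-empty lines that do not have exactly 36 columns."""
    result = []
    for line in lines:
        text = line.rstrip("\r\n")
        if text and pattern.fullmatch(text) is None:
            result.append(text)
    return result
```

For example, faulty_lines(open("test.txt", encoding="utf-8")) would return the bad records of test.txt as a list, assuming the file is UTF-8 text.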

