

Count delimiters in a file in Windows

I have a bunch of files that contain thousands of records. The structure of each file is the same.

Each record is on a separate line and has multiple fields separated by the delimiter '|'.

Each row should have 36 fields, but the problem is that some of these rows have a different number of fields, i.e. <>35 '|' characters.

Can someone please suggest a way in Windows by which I can identify such rows? (For example, records with <>35 delimiters should be written to a bad file.)
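If a scripting language is an option, the check itself is simple to express outside batch. The following is a minimal Python 3 sketch, assuming Python is installed on the Windows machine and the files are UTF-8 text; the helper names and the file names records.txt and bad.txt are hypothetical. A good record has 36 fields, i.e. exactly 35 '|' characters; every other line goes to the bad file.

```python
from pathlib import Path

# Hypothetical helpers; assumes Python 3 and UTF-8 input files.
EXPECTED_DELIMITERS = 35  # 36 fields -> 35 '|' separators


def split_records(lines, expected=EXPECTED_DELIMITERS, delimiter="|"):
    """Partition lines into (good, bad) by the number of delimiter characters."""
    good, bad = [], []
    for line in lines:
        # Counting raw delimiters keeps empty fields ("a||b") in the count.
        if line.count(delimiter) == expected:
            good.append(line)
        else:
            bad.append(line)
    return good, bad


def write_bad_records(src, bad_path):
    """Write every record with the wrong delimiter count to bad_path; return how many."""
    with open(src, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f]
    _, bad = split_records(lines)
    Path(bad_path).write_text("".join(line + "\n" for line in bad), encoding="utf-8")
    return len(bad)
```

Usage would be something like write_bad_records("records.txt", "bad.txt"), called once per data file. Counting characters rather than splitting into tokens is deliberate: it treats empty fields the same as non-empty ones.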

@ECHO Off
SETLOCAL
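:: Method 1: looking for exactly 36 fields - no empty fields
:: FOR /F treats consecutive delimiters as one, so an empty field would
:: shift the token count. %%o holds the rest of the line from field 31 on;
:: after re-splitting it, field 36 (%%q) must exist and field 37 (%%r) must not.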
FOR /f "delims=" %%a IN (q25936461.txt) DO (
 SET good=Y
 FOR /f "tokens=1,30*delims=|" %%m IN ("%%a") DO (
  IF "%%o" equ "" (SET "good=") ELSE (
   FOR /f "tokens=1,6,7delims=|" %%p IN ("%%o") DO (
    IF "%%r" neq "" SET "good="
    IF "%%q" equ "" SET "good="
   )
  )
 )
 IF NOT DEFINED good ECHO(%%a
)
ECHO ========== method 1 done =============
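:: Method 2: looking for exactly 36 fields - allow empty fields
:: :analyse strips everything up to the first delimiter on each pass and
:: counts the passes; a good line contains exactly 35 delimiters.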
FOR /f "delims=" %%a IN (q25936461.txt) DO (
 SET good=Y
 SET "line=%%a"
 SET /a count=0
 CALL :analyse
 IF NOT DEFINED good ECHO %%a
)
ECHO ========== method 2 done =============

GOTO :EOF
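:analyse
:: A trailing delimiter leaves LINE undefined; stop counting at that point
IF NOT DEFINED line GOTO check
SET "linem=%line:*|=%"
IF "%linem%" neq "%line%" SET /a count+=1&SET "line=%linem%"&GOTO analyse
:check
IF %count% neq 35 SET "good="
GOTO :eof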

Here are two methods. Testing is your problem...
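Why method 1 cannot allow empty fields: FOR /F treats a run of consecutive delimiters as a single delimiter, so an empty field disappears from the token count, while method 2's :analyse loop counts every '|'. Below is a small Python model of the two counting strategies. The function names are hypothetical, and the FOR /F model covers only the delimiter-collapsing behaviour, not its other parsing rules.

```python
# Hypothetical model of the two batch counting strategies; assumes Python 3.


def for_f_token_count(line, delim="|"):
    """Tokens as FOR /F sees them: runs of delimiters collapse, so empty fields vanish."""
    return len([tok for tok in line.split(delim) if tok])


def count_by_stripping(line, delim="|"):
    """Mirror of the :analyse loop: drop text up to the first delimiter until none remain."""
    count = 0
    while delim in line:
        line = line.split(delim, 1)[1]  # like %line:*|=%
        count += 1
    return count


for sample in ["a|b|c", "a||c", "a|b|"]:
    pipes = count_by_stripping(sample)
    print(f"{sample!r}: FOR /F tokens={for_f_token_count(sample)}, "
          f"'|' count={pipes}, fields={pipes + 1}")
```

All three samples have three fields, but only the first yields three FOR /F tokens, which is why method 1 is limited to data without empty fields.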

On this input:

cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|26/7/14|$15.00
cat|dog|26/7/14|$15.00
cat|dog|26/7/14|$15.00

this command

findstr /r /i /n /v "^.*|.*|.*|.*$" "C:\Users\User\Desktop\test.txt"

shows

6:cat|26/7/14|$15.00

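Type findstr /? for more. Note that FINDSTR has no alternation operator, so each | in the pattern is a literal character: the search matches lines containing at least three | characters, and /v lists the lines with fewer. Lines with too many fields are not reported.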
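Because FINDSTR treats | as a literal, the pattern above effectively asks "does the line contain at least three | characters?". Here is a Python sketch of the same test using re.search with escaped pipes. The function name is hypothetical, and the 1-based numbering imitates /n.

```python
import re

# Hypothetical helper; assumes Python 3.
# FINDSTR's "^.*|.*|.*|.*$" with | taken literally: at least three '|' on the line.
AT_LEAST_THREE = re.compile(r"^.*\|.*\|.*\|.*$")


def findstr_v(lines, pattern=AT_LEAST_THREE):
    """Like findstr /n /v: (line_number, line) for lines the pattern does NOT match."""
    return [(n, line) for n, line in enumerate(lines, start=1) if not pattern.search(line)]


sample = (["cat|dog|26/7/14|$15.00"] * 5
          + ["cat|26/7/14|$15.00"]
          + ["cat|dog|26/7/14|$15.00"] * 2)
for n, line in findstr_v(sample):
    print(f"{n}:{line}")  # prints 6:cat|26/7/14|$15.00

# A line with too MANY fields still matches, so /v does not report it.
print(findstr_v(["cat|dog|extra|26/7/14|$15.00"]))  # prints []
```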

It could be done with FINDSTR alone if the number of columns on valid lines were <=15.

For example, the following would show all lines that do not have exactly 3 columns:

findstr /vx "[^|]*|[^|]*|[^|]*" test.txt

But FINDSTR cannot handle more than 15 character class terms. See "What are the undocumented features and limitations of the Windows FINDSTR command?" for more info. Your search would require 36 such terms, one per column.
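To see how the whole-line pattern grows, the Python sketch below builds it for n columns, with one [^|]* term per column, and applies it with re.fullmatch, the analogue of /x. Python's re module has no such 15-term limit. The helper names are hypothetical.

```python
import re

# Hypothetical helpers; assumes Python 3.


def columns_pattern(n):
    """Whole-line pattern for exactly n '|'-separated columns, one [^|]* term per column."""
    return re.compile(r"\|".join([r"[^|]*"] * n))


def lines_without_n_columns(lines, n):
    """Like findstr /v /x: keep lines that do NOT fully match the n-column pattern."""
    pattern = columns_pattern(n)
    return [line for line in lines if not pattern.fullmatch(line)]


print(columns_pattern(3).pattern)  # [^|]*\|[^|]*\|[^|]*
print(lines_without_n_columns(["a|b|c", "a||", "a|b", "a|b|c|d"], 3))  # ['a|b', 'a|b|c|d']
```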

The following solution returns all the faulty lines, except that it ignores empty lines. It relies on REPL.BAT, a hybrid JScript/batch utility that performs a regex search/replace on stdin and writes the result to stdout. REPL.BAT is pure script that will run on any modern Windows machine from XP onward.

The solution uses REPL.BAT to remove all characters from lines that have exactly 36 columns, and then uses FINDSTR to print the remaining lines that have at least one character.

<test.txt repl "^([^|]*\|){35}[^|]*$" ""|findstr .
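The same two-step pipeline can be sketched in Python. re.sub blanks every line that fully matches ^([^|]*\|){35}[^|]*$, and a non-empty filter plays the role of findstr ., so genuinely empty input lines are dropped as well, just as in the batch version. The function name is hypothetical, and the sketch processes input line by line.

```python
import re

# Hypothetical helper; assumes Python 3.
# Exactly 36 columns: 35 "field|" groups followed by a final field with no '|'.
GOOD_LINE = re.compile(r"^([^|]*\|){35}[^|]*$")


def faulty_lines(lines):
    """Python analogue of the REPL.BAT + findstr . pipeline."""
    blanked = (GOOD_LINE.sub("", line) for line in lines)  # good lines become ""
    return [line for line in blanked if line]  # findstr . drops empty lines


good = "|".join(["f"] * 36)
print(faulty_lines([good, "f|f", "", good + "|extra"]))
```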
