How to extract numbers from text file using Windows batch file?

I need to do the following using cmd (Windows command line).

I have one file named DDD.CLI026.WK0933.DDDMR45.001.head.

The data in the file is as follows (on one long line):

HEAD HEALTHDMD Weekly  DDD.CLI026 Centocor  W200908021012 
TRAIL0101 000000000581 00000CKSUM00000223680

I need to extract 581 from 000000000581 and copy it into another file, IMS_FILE_to_LND.par, using the Windows command line or DOS.

How do I go about it?

Irveen, for the input file (one line), you can have the following files:

infile.txt (the input file, on one line):
    HEAD HEALTHDMD Weekly  DDD.CLI026 Centocor  W200908021012
     TRAIL0101 000000000581 00000CKSUM00000223680

pre.txt (the first half of your desired file):
    [WCPIT_BIO_EDW.WF:w_DDDMD_LNDG_IMS_NONRET_SALES]
    $$Cust_RowCount=72648
    $$Sales_RowCount=5235998
    $$OuletChangeLog_RowCount=931

post.txt (the second half of your desired file):
    $$Control_RowCount=4495
    $$Outl_Subcat_RowCount=105
    $$Fac_Subcat_RowCount=149

go.cmd (the command file to create your desired file):
    @echo off
    setlocal enableextensions enabledelayedexpansion
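    rem Read the 8th field; infile.txt is assumed to hold a single line.
    for /f "tokens=8" %%i in (infile.txt) do set num=%%i
    rem Strip leading zeros outside the for block, because goto inside a
    rem parenthesised block abandons the rest of that block.
    :loop1
    if "!num!"=="0" goto :skip1
    if not "!num:~0,1!"=="0" goto :skip1
    set num=!num:~1!
    goto :loop1
    :skip1
    type pre.txt >outfile.txt
    echo $$DRM45_RowCount=!num!>>outfile.txt
    type post.txt >>outfile.txt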
    endlocal

This produces the file:

outfile.txt:
    [WCPIT_BIO_EDW.WF:w_DDDMD_LNDG_IMS_NONRET_SALES]
    $$Cust_RowCount=72648
    $$Sales_RowCount=5235998
    $$OuletChangeLog_RowCount=931
    $$DRM45_RowCount=581
    $$Control_RowCount=4495
    $$Outl_Subcat_RowCount=105
    $$Fac_Subcat_RowCount=149

which is what, I believe, you wanted from this series of questions.

By way of explanation, the for loop processes your one line, extracting the 8th field (000...00581). The loop1/skip1 section then removes leading zeros one at a time until you have either a 0 on its own or a number with no leading zero (cmd arithmetic such as set /a treats numbers with leading zeros as octal, which is no good for us here).

Once the number is extracted, you simply construct your file from a pre and post bit, along with the line you want to modify.

I know it's a bit more of a kludge than the earlier awk solution I gave, but it'll do the trick in Windows without having to add third-party software (which you indicated was not an option in one of your other questions).

Update 1: Here is a version that, as requested, uses a single template file to create your output file. The template file must have lines beginning with either "pre:" or "post:" to dictate whether they come before or after the line to be inserted. Lines without a marker are not used at all, so you can insert blank lines or comments to your heart's content. So your file would be:

pre:[WCPIT_BIO_EDW.WF:w_DDDMD_LNDG_IMS_NONRET_SALES]
pre:$$Cust_RowCount=72648
pre:$$Sales_RowCount=5235998
pre:$$OuletChangeLog_RowCount=931

post:$$Control_RowCount=4495
post:$$Outl_Subcat_RowCount=105
post:$$Fac_Subcat_RowCount=149

And this is the command script which will give you what you need. It uses a trick, temporarily creating the pre and post files from the template, to minimize the changes needed.

@echo off
setlocal enableextensions enabledelayedexpansion
del /q pre.txt post.txt >nul: 2>nul:
for /f "delims=" %%j in (template.txt) do (
    set ln=%%j
    if "!ln:~0,4!"=="pre:" echo !ln:~4!>>pre.txt
    if "!ln:~0,5!"=="post:" echo !ln:~5!>>post.txt
)
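rem Read the 8th field; infile.txt is assumed to hold a single line.
for /f "tokens=8" %%i in (infile.txt) do set num=%%i
rem Strip leading zeros outside the for block, because goto inside a
rem parenthesised block abandons the rest of that block.
:loop1
if not "!num!"=="0" (
    if "!num:~0,1!"=="0" (
        set num=!num:~1!
        goto :loop1
    )
)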
type pre.txt >outfile.txt
echo $$DRM45_RowCount=!num!>>outfile.txt
type post.txt >>outfile.txt
del /q pre.txt post.txt >nul: 2>nul:
endlocal

It outputs:

[WCPIT_BIO_EDW.WF:w_DDDMD_LNDG_IMS_NONRET_SALES]
$$Cust_RowCount=72648
$$Sales_RowCount=5235998
$$OuletChangeLog_RowCount=931
$$DRM45_RowCount=581
$$Control_RowCount=4495
$$Outl_Subcat_RowCount=105
$$Fac_Subcat_RowCount=149

just like the pre/post solution above, but with your new requirement satisfied.

Update 2: If you can convince them to go for a Cygwin solution, this is all you need:

x=$(expr 0 + $(awk '{print $8}' infile))
sed "s/^\$\$DRM45_RowCount=.*$/\$\$DRM45_RowCount=$x/" cfgfile >cfgfile_new

With cfgfile containing:

[WCPIT_BIO_EDW.WF:w_DDDMD_LNDG_IMS_NONRET_SALES]
$$Cust_RowCount=72648
$$Sales_RowCount=5235998
$$OuletChangeLog_RowCount=931
$$DRM45_RowCount=whatever
$$Control_RowCount=4495
$$Outl_Subcat_RowCount=105
$$Fac_Subcat_RowCount=149

and infile containing (shorter but same number of fields):

HD HLTHDMD Wkly DDD.CLI Cntcr  W200908021012 TRAIL0101 00581 00000CKSUM680

you get the following cfgfile_new:

[WCPIT_BIO_EDW.WF:w_DDDMD_LNDG_IMS_NONRET_SALES]
$$Cust_RowCount=72648
$$Sales_RowCount=5235998
$$OuletChangeLog_RowCount=931
$$DRM45_RowCount=581
$$Control_RowCount=4495
$$Outl_Subcat_RowCount=105
$$Fac_Subcat_RowCount=149

Voila! So much simpler. Feel free to use the cmd script and Cygwin script to convince your management they should be using better tools :-)
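
As a rough illustration of why the first line adds 0 with expr: expr reads its operands as decimal, so the zero padding simply disappears, whereas shell arithmetic of the $(( )) kind treats a leading 0 as octal, much like cmd does. The sketch below shows three POSIX sh ways to get the same result. The variable names and the strip_zeros helper are made up for this example; they are not part of the scripts above.

```shell
#!/bin/sh
# Sketch: three ways to turn a zero-padded field into a plain decimal number.
raw=000000000581

# 1. expr parses operands as decimal, so adding 0 drops the padding.
#    Note that expr exits with status 1 when the result is 0.
via_expr=$(expr 0 + "$raw")

# 2. awk converts the field to a number when it is used in arithmetic.
via_awk=$(printf '%s\n' "$raw" | awk '{print $1 + 0}')

# 3. A pure-shell loop that mirrors the cmd script: drop one leading
#    zero at a time, but leave a lone "0" alone.
strip_zeros() {
    num=$1
    while [ "$num" != "0" ] && [ "${num#0}" != "$num" ]; do
        num=${num#0}
    done
    printf '%s\n' "$num"
}
via_loop=$(strip_zeros "$raw")

echo "$via_expr $via_awk $via_loop"
```

All three should print 581 for the sample value. The loop variant is the closest match to what the cmd version does by hand.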

Can you install Cygwin? Or use Microsoft PowerShell? If so, you will have far more powerful tools (e.g. regular expressions) for doing this.
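
For instance, under Cygwin a single regular expression could pull the number out and drop the padding in one step. This is only a sketch: it assumes the count always follows a TRAILnnnn field, which is a guess from the one sample line in the question rather than a documented format, and the variable names are invented for the example.

```shell
#!/bin/sh
# Sample data from the question, joined onto one line.
line='HEAD HEALTHDMD Weekly  DDD.CLI026 Centocor  W200908021012 TRAIL0101 000000000581 00000CKSUM00000223680'

# Match the field after TRAILnnnn, skip its leading zeros with 0*,
# and keep the remaining digits in capture group 1.
# If the field is all zeros, backtracking leaves a single 0 in the group.
count=$(printf '%s\n' "$line" | sed -E 's/.*TRAIL[0-9]+ +0*([0-9]+) .*/\1/')

echo "$count"
```

With the sample line this prints 581. Unlike the field-counting approach, it does not depend on the number being the 8th field, only on it following the TRAIL token.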
