
Batch file to extract string from multiple text files

I have over a hundred text files formatted like this:

    <TITLE> This is the title
    <SUBJECT> This is the subject
    <XTITLE>

I want to use a Windows batch file to extract the title values, e.g. "This is the title", from each of these text files into a single output file, and also include the name of the text file in which each title was found. Each text file can contain multiple title tags. Example output:

    This is the title textfile1.txt
    This is the second title textfile1.txt
    This is the third title textfile2.txt
    This is the fourth title textfile3.txt

Anyone?

@ECHO Off
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "outfile=%destdir%\outfile.txt"
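(
rem List every .txt file in sourcedir; bare names only, no directories.
FOR /f "delims=" %%i IN ('dir /b/a-d "%sourcedir%\*.txt"') DO (
 rem Split each line of the file on the angle brackets and the equals sign.
 FOR /f "usebackq tokens=1-3 delims=<=>" %%a IN ("%sourcedir%\%%i") DO (
  rem Indented tag: token 1 is the indent, token 2 the tag, token 3 the title.
  IF "%%b"=="TITLE" ECHO(%%i %%c
  rem Unindented tag: token 1 is the tag, token 2 the title.
  IF "%%a"=="TITLE" ECHO(%%i %%b
 )
)
)>"%outfile%"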

GOTO :EOF

You would need to change the settings of sourcedir and destdir to suit your circumstances.

The script produces the file defined as %outfile%.

The IF "%%a" line fires when the <TITLE> tag has no leading spaces. The IF "%%b" line fires when it does, because the leading spaces then become the first token and push the tag into the second.
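To see the two cases side by side, the following sketch tokenises two sample lines with the same delimiter set. Both sample strings are made up for illustration; nothing is read from a file.

```batch
@echo off
setlocal
rem Demonstration only: both sample lines below are made up.
rem Case 1, indented tag: the indent itself becomes token a.
for /f "tokens=1-3 delims=<=>" %%a in ("    <TITLE> Indented title") do echo a=[%%a] b=[%%b] c=[%%c]
rem Case 2, tag at column one: leading delimiters are skipped, so the tag is token a.
for /f "tokens=1-3 delims=<=>" %%a in ("<TITLE> Flush title") do echo a=[%%a] b=[%%b] c=[%%c]
```

This should print:

    a=[    ] b=[TITLE] c=[ Indented title]
    a=[TITLE] b=[ Flush title] c=[]

Note that the title keeps the space that follows the tag, and that a title containing <, = or > would be cut at that character, because all three act as delimiters here.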

I changed the order of the report fields, as that seemed to make more sense to me. If you truly want the report in the opposite order, simply reverse the %%i and the %%c/%%b in the ECHO statements.

This routine produces one output line per title found, each prefixed with the name of the file it came from.
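If the title-first order from the question is preferred, the swap might look like the sketch below. It reuses the placeholder locations U:\sourcedir and U:\destdir from the script above, which are assumptions to be replaced with real paths.

```batch
@ECHO OFF
SETLOCAL
rem Placeholder locations, as in the script above; change them to suit.
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "outfile=%destdir%\outfile.txt"
(
FOR /f "delims=" %%i IN ('dir /b/a-d "%sourcedir%\*.txt"') DO (
 FOR /f "usebackq tokens=1-3 delims=<=>" %%a IN ("%sourcedir%\%%i") DO (
  rem Title first, then the file name, as in the sample output of the question.
  IF "%%b"=="TITLE" ECHO(%%c %%i
  IF "%%a"=="TITLE" ECHO(%%b %%i
 )
)
)>"%outfile%"
GOTO :EOF
```

Each output line still starts with the space that follows the <TITLE> tag in the input.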


@ECHO Off
SETLOCAL
SET "sourcedir=U:\sourcedir"
SET "destdir=U:\destdir"
SET "outfile=%destdir%\outfile.txt"
(
FOR /f "delims=" %%i IN ('dir /s/b/a-d "%sourcedir%\*.txt"') DO (
 rem With /s, dir lists full paths, so each entry is opened as it stands.
 FOR /f "usebackq tokens=1-3 delims=<=>" %%a IN ("%%i") DO (
  IF "%%b"=="TITLE" ECHO(%%i %%c
  IF "%%a"=="TITLE" ECHO(%%i %%b
 )
)
)>"%outfile%"

GOTO :EOF

This is the same routine, adjusted to scan subdirectories as well. Note that in this case dir /s /b includes the full path in the listing, so %%i is used on its own as the file name.

You may wish to put the echoed %%i in quotes in case the path or file names contain separators such as spaces.
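As a rough illustration of that quoting, the sketch below uses two hypothetical values standing in for %%i and %%c. It also trims the space in front of the title with a nested for /f, which is an optional extra and not part of the routine above.

```batch
@echo off
setlocal
rem Hypothetical sample values standing in for the file path and the title token.
set "file=U:\source dir\sub folder\textfile1.txt"
set "title= This is the title"
rem tokens=* drops the leading spaces of the title; the quotes keep the path in one piece.
for /f "tokens=*" %%t in ("%title%") do echo "%file%" %%t
```

This should print "U:\source dir\sub folder\textfile1.txt" This is the title, with the quotes around the path included, so a later parser can tell where the path ends even though it contains spaces.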

@echo off
pushd "c:\folder_with_files"
for %%# in (textfile*.txt) do (
  rem find prints a header line per file; it has no closing bracket, so token b stays empty
  for /f "tokens=1* delims=>" %%a in ('find "<TITLE>" "%%#"') do (
    if "%%b" neq "" (
        echo %%b : file %%#
    )
  )
)>>"c:\output.txt"
popd

You might need to change the file mask in the first for loop, and you need to change the PUSHD location to the folder that holds your files.

This method should run faster, especially if the files are large:

@echo off
setlocal EnableDelayedExpansion

rem Group titles of same files in same array elements
for /F "tokens=1,3 delims=:>" %%a in ('findstr /L "<TITLE>" *.txt') do (
   set "t[%%a]=!t[%%a]!  %%b"
)

rem Show the titles
(for /F "tokens=2,3 delims=[]=" %%a in ('set t[') do echo %%~Fa: %%b) > output.txt

For example, with these input files:

textfile1.txt

    <TITLE> This is the title
    <SUBJECT> This is the subject
    <XTITLE>

    <TITLE> This is the second title
    <SUBJECT> This is the subject
    <XTITLE>

textfile2.txt

    <TITLE> This is the third title
    <SUBJECT> This is the subject
    <XTITLE>

textfile3.txt

    <TITLE> Fourth title
    <SUBJECT> This is the subject
    <XTITLE>

    <TITLE> Fifth title
    <SUBJECT> This is the subject
    <XTITLE>

    <TITLE> Sixth title
    <SUBJECT> This is the subject
    <XTITLE>

This is the output:

C:\Folder\textfile1.txt:    This is the title   This is the second title
C:\Folder\textfile2.txt:    This is the third title
C:\Folder\textfile3.txt:    Fourth title   Fifth title   Sixth title
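To see why tokens 1 and 3 carry the file name and the title, the sketch below splits one line in the shape that findstr prints for a wildcard search. The sample line is made up for illustration.

```batch
@echo off
setlocal
rem Demonstration only: a made-up line in the shape findstr prints for a wildcard search.
set "sample=textfile1.txt:    <TITLE> This is the title"
rem Token 1 is the file name, token 2 the indent plus tag, token 3 the title.
for /F "tokens=1,3 delims=:>" %%a in ("%sample%") do echo file=[%%a] title=[%%b]
```

This should print file=[textfile1.txt] title=[ This is the title]. Because only the third token is kept, a title that itself contains a colon or > would be cut at that character.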
