PowerShell multithreading

I have a PowerShell script that converts Office documents to PDF. I would like to multithread it, but I cannot figure out how from the other examples I have seen. The main script (OfficeToPDF.ps1) scans through a list of files and calls a separate script for each file type/Office application (e.g. for .doc files, WordToPDF.ps1 is called to do the conversion). The main script passes one file name at a time to the child script (I did this for a couple of reasons).

Here is an example of the main script:

    $documents_path = "C:\Documents\Test_Docs"
    $pdf_out_path = "C:\Documents\Converted_PDFs"
    $failed_path = "C:\Documents\Failed_to_Convert"

    # Sets the root directory of this script
    $PSScriptRoot = Split-Path -parent $MyInvocation.MyCommand.Definition

    $date = Get-Date -Format "MM_dd_yyyy"
    $Logfile = "$PSScriptRoot\logs\OfficeToTiff_$Date.log"

    $word2PDF = "$PSScriptRoot\WordToPDF.ps1"
    $arguments = "'$documents_path'", "'$pdf_out_path'", "'$Logfile'"

    # Function to write to log file
    Function LogWrite
    {
       Param ([string]$logstring)
       $time = Get-Date -Format "hh:mm:ss:fff"

       Add-content $Logfile -value "$date $time $logstring"
    }


    ################################################################################
    # Word to PDF                                                                  #
    ################################################################################

    LogWrite "*** BEGIN CONVERSION FROM DOC, DOCX, RTF, TXT, HTM, HTML TO PDF ***"

    Get-ChildItem -Path $documents_path\* -Include *.docx, *.doc, *.rtf, *.txt, *.htm? -recurse | ForEach-Object {

        $original_document = "$($_.FullName)"

        # Verifies that a document exists before calling the convert script
        If ($original_document -ne $null)
        {
            Invoke-Expression "$word2PDF $arguments"

            # checks to see if document was successfully converted and deleted.  If not, doc is moved to another directory
            If (Test-Path -path $original_document)
            {
                Move-Item $original_document $failed_path
            }
        }
    }

    $original_document = $null

    [gc]::collect()
    [gc]::WaitForPendingFinalizers()

Here is the script (WordToPDF.ps1) that is called by the main script:

    Param($documents, $pdf_out_path, $Logfile)

    # NOTE: $_ and $date are not defined in this script. They resolve only because
    # PowerShell looks variables up in the caller's scope (the ForEach-Object block
    # and the main script), so this script would not work if run on its own.

    # Function to write to the log file
    Function LogWrite
    {
       Param ([string]$logstring)
       $time = Get-Date -Format "hh:mm:ss:fff"

       Add-content $Logfile -value "$date $time $logstring"
    }

    $word_app = New-Object -ComObject Word.Application

    $document = $word_app.Documents.Open($_.FullName)
    $original_document = "$($_.FullName)"

    # Creates the output file name with path
    $pdf_document = "$($pdf_out_path)\$($_.BaseName).pdf"

    LogWrite "Converting: $original_document to $pdf_document"
    # 17 is the Word save format for PDF (wdFormatPDF)
    $document.SaveAs([ref] $pdf_document, [ref] 17)
    $document.Close()

    # Deletes the original document after it has been converted
    Remove-Item $original_document
    LogWrite "Deleting: $original_document"

    $word_app.Quit()

Any suggestions would be appreciated. Thanks.

I was just going to comment and link you to this question: "Can PowerShell run commands in Parallel". I then noticed the date of that question and its answers; PowerShell v3.0 has some new features that might work better for you.

That question covers the use of PowerShell jobs. These can work, but they require you to keep track of each job's status, which adds a bit of extra code to manage.
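To illustrate the kind of bookkeeping jobs involve, here is a minimal sketch of a throttled job loop. The names `$convert`, `$files` and `$maxJobs` are made up for illustration, and the script block is only a placeholder for the real conversion (for example, a call to WordToPDF.ps1 with the file path passed as an argument). Note that a job runs in a separate process, so it will not see variables such as `$_` from the calling script; everything it needs has to be passed in through `-ArgumentList`.

```powershell
# Hypothetical stand-in for the real per-file conversion work.
$convert = {
    param([string]$Path)
    # Replace this line with the actual conversion code.
    "Converted: $Path"
}

# Assumed inputs; in the real script these would come from Get-ChildItem.
$files   = 'a.docx', 'b.doc', 'c.rtf', 'd.txt', 'e.htm', 'f.html', 'g.docx'
$maxJobs = 5   # assumed cap on how many jobs run at the same time

$jobs = @()
foreach ($file in $files) {
    # Wait until fewer than $maxJobs of our jobs are still running.
    while (@($jobs | Where-Object { $_.State -eq 'Running' }).Count -ge $maxJobs) {
        Start-Sleep -Milliseconds 200
    }
    $jobs += Start-Job -ScriptBlock $convert -ArgumentList $file
}

# Block until every job has finished, collect the output, then clean up.
$results = $jobs | Wait-Job | Receive-Job
$jobs | Remove-Job
$results
```

Removing finished jobs matters once the file list gets long, since completed jobs otherwise stay in the session along with their output.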

PowerShell v3 opened the door a bit more with workflows, which are based on Windows Workflow Foundation. A good article on the basics of how this new feature works can be found on the Script Guy's blog. You can basically adjust your code to run your conversion in a workflow, and it will be performed in parallel:

    workflow foreachfile {
      param([string[]]$files)   # file paths to process, passed in by the caller
      foreach -parallel ($f in $files) {
        # Put your code here that does the work
      }
    }

From what I can find, this is limited to 5 threads at a time. I am not sure how accurate that is, but a blog post noted the limitation. However, since the Application COM objects for Word and Excel can be very CPU intensive, running 5 threads at a time would probably work well.

I have a multithreaded PowerShell environment for indicator-of-compromise scanning on all AD devices, threaded 625 times with Gearman: http://gearman.org

It is open source and leaves open the option of going cross-platform. It threads using a server/worker flow and runs via Python. I highly recommend it, speaking as someone who has abused threading in PowerShell. This isn't so much an answer as a pointer to something I had never heard of before but now love and use daily. Pass it forward. Open source for the win :)

I have also used PSJobs before, and they are great up to a certain scale. Maybe it is my lack of .NET expertise, but PowerShell has some quirky, subtle memory behavior that can create nasty effects at large scale.


 