
PowerShell - MS Word docx table to CSV

I've looked at several solutions for this, since I'm still new to PowerShell, but the same kind of code shows up everywhere. My problem is that it does not output all the contents of the Word table to the CSV file in the right format: only a single value from the last column ends up in the file. I can't understand where I'm going wrong. Please help me out.

```powershell
$objWord = New-Object -Com Word.Application
$filename = 'path to file'
$outputfile= 'path to file'
$objDocument = $objWord.Documents.Open($filename)

$Table = $objDocument.Tables.Item(1)
$TableCols = $Table.Columns.Count
$TableRows = $Table.Rows.Count
for($r=1; $r -le $TableRows; $r++) {
    for($c=1; $c -le $TableCols; $c++) {
        #Write-Host $r "x" $c
        $content = $Table.Cell($r,$c).Range.Text
        Write-Host $content
        $content | Out-File $outputfile
    }
}
$objDocument.Close()
$objWord.Quit()
# Stop Winword Process
$rc = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($objWord)
```

I've tried using Export-Csv, but it only gives me a Length column, and even after adding -NoTypeInformation there is no usable result. Also, how can I have the CSV file created dynamically, instead of always having to create a new empty CSV file first?

Your code currently writes the contents of every single cell to the file individually, and because Out-File without -Append overwrites the file on every call, only the very last cell survives.
Creating a CSV from a Word table like this is doable, but you first need to capture the cell contents of each row in an array variable and join the elements with a comma (or another delimiter).
Then output the row.
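A minimal, Word-free sketch of the difference, assuming a hypothetical `$table` array of string arrays that stands in for the cell texts and a scratch file in the temp folder: writing cell by cell with Out-File keeps only the last value, while joining each row first keeps the whole table.

```powershell
# Hypothetical stand-in for the Word table: one string array per row
$table = @(
    @('Name', 'City'),
    @('Ann',  'Oslo')
)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'rows-demo.csv'

# Out-File without -Append overwrites the file on every call,
# so writing cell by cell leaves only the last cell behind
foreach ($row in $table) {
    foreach ($cell in $row) { $cell | Out-File -FilePath $path }
}
Get-Content -Path $path    # Oslo

# Join each row into one line first, then write all lines in one go
$lines = foreach ($row in $table) { $row -join ',' }
$lines | Set-Content -Path $path
Get-Content -Path $path    # Name,City and Ann,Oslo
```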

For safety, quote every cell value so that fields containing a comma do not produce a misaligned file afterwards, and double any double quotes that appear inside a value.
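The quoting rule can be isolated in a small helper. `ConvertTo-CsvField` below is a hypothetical name, not a built-in cmdlet; it doubles any embedded double quote and then wraps the value in double quotes, which is the same escaping the code in this answer applies inline.

```powershell
# Hypothetical helper (not a built-in cmdlet): quote a single CSV field
function ConvertTo-CsvField([string]$Value) {
    # double embedded quotes, then wrap the whole value in quotes
    '"{0}"' -f ($Value -replace '"', '""')
}

$fields = (ConvertTo-CsvField 'Smith, John'), (ConvertTo-CsvField 'say "hi"'), (ConvertTo-CsvField '42')
$fields -join ','    # "Smith, John","say ""hi""","42"
```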

Another snag is that Word ends the text of every table cell with the control characters 0x0D and 0x07 (the end-of-cell marker), so you need to remove those as well.
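To see what is being removed, the string below simulates the `Range.Text` of a Word cell (the literal is an assumption standing in for real COM output): two paragraphs separated by a CR, followed by the CR + BEL end-of-cell marker. One design note, offered tentatively: a pattern like `[\x00-\x1F\x7F]` strips every control character, so a cell with several paragraphs has them glued together without a space. Stripping only the trailing marker and turning the remaining CRs (and 0x0B manual line breaks) into spaces keeps the words apart.

```powershell
# Simulated Range.Text of a Word table cell: two paragraphs, then the
# end-of-cell marker, CR (0x0D, `r) followed by BEL (0x07, `a)
$raw = "Line one`rLine two`r`a"

# Stripping every control character glues the paragraphs together
$allStripped = $raw -replace '[\x00-\x1F\x7F]'              # Line oneLine two

# Alternative: drop only the trailing marker, then turn inner breaks into spaces
$clean = ($raw -replace '\r\a$') -replace '[\r\x0B]', ' '   # Line one Line two
```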

Try

```powershell
$objWord     = New-Object -ComObject Word.Application
$filename    = 'D:\Test\blah.docx'
$outputfile  = 'D:\Test\blah.csv'
$objDocument = $objWord.Documents.Open($filename)

$Table     = $objDocument.Tables.Item(1)
$TableCols = $Table.Columns.Count
$TableRows = $Table.Rows.Count
# this gets the list separator character your local Excel expects when double-clicking a CSV file
$delimiter = [cultureinfo]::CurrentCulture.TextInfo.ListSeparator

# Add-Content creates the file if it does not exist yet, so there is no need to prepare an empty CSV;
# remove output left over from a previous run so new rows are not appended to it
if (Test-Path -Path $outputfile) { Remove-Item -Path $outputfile }

for($r = 1; $r -le $TableRows; $r++) {
    # capture an array of cell contents
    $content = for($c = 1; $c -le $TableCols; $c++) {
        # surround each value with quotes so that fields containing the delimiter character do not break the CSV,
        # double any double quotes the value may contain,
        # remove the control characters (0x0D 0x07) Word appends to the cell text,
        # and trim leading and trailing whitespace from the resulting value
        '"{0}"' -f ($Table.Cell($r,$c).Range.Text -replace '"', '""' -replace '[\x00-\x1F\x7F]').Trim()
    }
    # output this array joined with the delimiter, both on screen and to file
    $content -join $delimiter | Add-Content -Path $outputfile -PassThru
}
$objDocument.Close()
$objWord.Quit()
# Stop Winword Process
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($objDocument)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($objWord)
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
```
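On the Export-Csv question: Export-Csv (like ConvertTo-Csv) writes the properties of each input object, and a plain string's only property is Length, which is why piping strings into it yields a Length column. -NoTypeInformation only suppresses the `#TYPE` header line; it does not change which properties are exported. Below is a sketch of an object-based alternative, assuming the cleaned cell texts have already been collected into a hypothetical `$rows` array and that the first table row holds the column headers. ConvertTo-Csv is used so the result can be shown without writing a file; Export-Csv takes the same input, handles the quoting and quote doubling itself, and creates the output file if it does not exist.

```powershell
# Strings only expose a Length property, so this yields "Length","5","4"
$lengthOnly = 'alpha', 'beta' | ConvertTo-Csv -NoTypeInformation

# Hypothetical: cleaned cell texts, one string array per table row,
# assuming the first row holds the column headers
$rows = @(
    @('Name', 'Comment'),
    @('Ann',  'likes "quotes", commas'),
    @('Bob',  'plain')
)
$headers = $rows[0]

# Turn every data row into an object whose property names come from the headers
$objects = for ($r = 1; $r -lt $rows.Count; $r++) {
    $props = [ordered]@{}
    for ($c = 0; $c -lt $headers.Count; $c++) {
        $props[$headers[$c]] = $rows[$r][$c]
    }
    [pscustomobject]$props
}

# ConvertTo-Csv quotes every field and doubles embedded quotes on its own.
# To write a file instead, something like:
#   $objects | Export-Csv -Path $outputfile -NoTypeInformation -UseCulture
# (-UseCulture picks the current culture's list separator, like $delimiter above)
$csv = $objects | ConvertTo-Csv -NoTypeInformation -Delimiter ','
$csv
# "Name","Comment"
# "Ann","likes ""quotes"", commas"
# "Bob","plain"
```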

Using the file you have made available, the output CSV (opened in Excel) looks like this:

[Screenshot: the resulting CSV opened in Excel]
