
Extract text from MS Word table without bullets [Powershell 4.0]

I want to extract some text from a particular table in an MS Word document: [image: table and text]

However, when I execute this code:

$objWord = New-Object -ComObject Word.Application
$objWord.Visible = $true
$filename = 'D:\test.docx'
$objDocument = $objWord.Documents.Open($filename)
$LETable = $objDocument.Tables.Item(1)
$LETableCols = $LETable.Columns.Count
$LETableRows = $LETable.Rows.Count

Write-output "Starting to write... "

$content2 = $LETable.Cell(6,2).Range.Text
$content3 = $LETable.Cell(7,1).Range.Text
$content4 = $LETable.Cell(7,2).Range.Text
#Write-host $content2
$doc2 = $objWord.Documents.Add()
$objWord.Selection.typetext("$content2")
$objWord.Selection.typetext("$content3")
$objWord.Selection.typetext("$content4")
#$objDocument.Close()
#$objWord.Quit()
# Stop Winword Process
#$rc = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($objWord)

the output contains extra characters: [image: bullets shown as question marks]

How can I remove those bullet/question-mark characters? I just want plain text.

You need to find the Unicode code point of that character. Once you have it, a regular expression can replace it with an empty string, a space, or a tab. I also tried pasting the character itself (for example "✀") directly into the pattern, and that works as well.
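One way to find the code point is to print every character of the extracted string together with its numeric value. The sketch below is only an illustration: `Get-CodePoint` is a hypothetical helper name, and the sample string is made up rather than taken from the document in the question.

```powershell
# Hypothetical helper: lists each character of a string as a U+XXXX code point.
function Get-CodePoint {
    param([string]$Text)
    foreach ($ch in $Text.ToCharArray()) {
        # Format the UTF-16 code unit as four hex digits
        'U+{0:X4}' -f [int]$ch
    }
}

# Sample input only: the letter A followed by CR and BEL
Get-CodePoint ("A" + [char]13 + [char]7)
# prints U+0041, U+000D, U+0007 (one per line)
```

A code point found this way can be written in a `-replace` pattern as `\uXXXX`, for example `'\u0007'`.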

To remove the character:

$String -replace '✀'

To replace it:

$String = "Just ✀ and another ✀"
# replace ✀ with cat
$String -replace '✀','cat'

And the result is:

Just cat and another cat
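The same idea can be applied to table cells. Word normally ends the `Range.Text` of a table cell with an end-of-cell marker (a carriage return followed by U+0007), and the sketch below assumes that this marker is what shows up as the stray character in the question. `Remove-CellMarker` is a hypothetical helper name, not part of Word or PowerShell.

```powershell
# Hypothetical helper: strips Word's end-of-cell marker and other control characters.
function Remove-CellMarker {
    param([string]$Text)
    # Drop the trailing CR + BEL pair that ends a cell's text
    $clean = $Text -replace '\r?\x07$', ''
    # Remove remaining control characters, keeping tab, LF and CR
    $clean -replace '[\x00-\x08\x0B\x0C\x0E-\x1F]', ''
}

# Sample input only: simulated cell text with the trailing marker
$sample = "Some cell text" + [char]13 + [char]7
Remove-CellMarker $sample
```

In the question's script this would be used when reading a cell, for example `$content2 = Remove-CellMarker $LETable.Cell(6,2).Range.Text`, before the text is passed to `TypeText`.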

What you actually need is the plain text of the document. Check out Open-Xml-PowerTools.

Because .docx files use the Open XML format, you can take advantage of this tool and its commands.
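To illustrate what reading the Open XML directly involves, here is a rough sketch that uses only .NET classes rather than Open-Xml-PowerTools (whose cmdlets are not shown here). A .docx file is a ZIP archive whose main part is `word/document.xml`; the helper names `Get-DocxXml` and `Get-DocxCellText` are hypothetical, and the sketch assumes a simple table without merged or nested cells.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: loads word/document.xml from a .docx file.
function Get-DocxXml {
    param([string]$Path)
    $zip = [System.IO.Compression.ZipFile]::OpenRead($Path)
    try {
        $entry  = $zip.GetEntry('word/document.xml')
        $reader = New-Object System.IO.StreamReader($entry.Open())
        try { [xml]$reader.ReadToEnd() } finally { $reader.Close() }
    } finally {
        $zip.Dispose()
    }
}

# Hypothetical helper: joins the text runs of one table cell (1-based indexes).
function Get-DocxCellText {
    param([xml]$DocumentXml, [int]$Row, [int]$Column, [int]$Table = 1)
    $ns = New-Object System.Xml.XmlNamespaceManager($DocumentXml.NameTable)
    $ns.AddNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main')
    # w:tbl = table, w:tr = row, w:tc = cell, w:t = text run
    $xpath = "(//w:tbl)[$Table]/w:tr[$Row]/w:tc[$Column]//w:t"
    $nodes = $DocumentXml.SelectNodes($xpath, $ns)
    ($nodes | ForEach-Object { $_.InnerText }) -join ''
}

# Usage, assuming the file from the question exists:
# $xml = Get-DocxXml 'D:\test.docx'
# Get-DocxCellText -DocumentXml $xml -Row 6 -Column 2
```

Text read this way comes from the `w:t` elements only, so it should not contain the end-of-cell marker that the COM `Range.Text` property adds. Paragraphs inside a cell are joined without separators in this sketch.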
