
How do I get a list of all of the XML tags in a file using PowerShell and Regular Expressions?

This question is related to RegEx find all XML tags but I'm trying to do it in Windows PowerShell.

I have a huge XML file that contains many different XML tags, so I want to use RegEx to parse the file and output the names of all the tags as a list. The file is not a valid XML document even though it contains XML tags and elements, so the XML functions of PowerShell won't work on it. I get many errors when trying to load it as an XML document, hence the need for RegEx.

I've determined that the following RegEx identifies the tags (thanks to the related question mentioned above): (?<=<)([^\/]*?)((?= \/>)|(?=>))

Here's a very small snippet of the file I'm parsing:

<data><bp_year /><bp_make>John Deere</bp_make><bp_model>650</bp_model><bp_price>3000.00</bp_price><bp_txtDayPhone>555-555-5555</bp_txtDayPhone><bp_bestPrice>3000.0000</bp_bestPrice><bp_txtComments>Best price available?</bp_txtComments><bp_url>https://www.example.com</bp_url></data>
<data><receiveOffers /><link>http://example.com/inventory.htm?id=2217405&amp;used=1</link><itemName>2007 Yamaha RHINO 660</itemName></data>
<data><vehicleYear>2008</vehicleYear><vehicleMake>Buick</vehicleMake><vehicleModel>Enclave</vehicleModel><vehicleStyle>CX</vehicleStyle><vehicleInformation /><vehicleMileage /><phone>555-555-5555</phone><timeOfDay>Morning</timeOfDay><message /></data>
<data><mo_year>2009</mo_year><mo_make>Webasto</mo_make><mo_model>Air Top 2000</mo_model><mo_price /><mo_txtDayPhone>555-555-5555</mo_txtDayPhone><mo_txtOffer>700</mo_txtOffer><mo_txtTrade /><mo_txtComments /></data>

I really don't have much experience with PowerShell, but from my understanding, you can do grep-style searches with it. After searching around on the internet, I found some resources that pointed me towards a solution using the PowerShell Select-String command.

I've attempted the following PowerShell command, but it gives me way too much output. I just want a master "Matches" list.

Select-String -Path '.\dataXML stuff - Copy.xml' -Pattern "(?<=<)([^\/]*?)((?= \/>)|(?=>))" -AllMatches | Format-List -Property Matches

Sample of Output generated:

Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, address, city, region...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, address, city, region...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, vehicleYear, vehicleMake, vehicleModel...}
Matches : {data, mo_year, mo_make, mo_model...}

Basically, I want something like:

data
vehicleYear
vehicleMake
vehicleModel
address
city
region
mo_year
mo_make
mo_model

and so on and on....

Where only the matched strings are returned and listed, rather than telling me what matched on each line of the XML file. I prefer the list format because then I can pump this into Excel and get a distinct list of tag names, and then start actually doing what I need to accomplish, but the overwhelming number of different XML tags and not knowing what they are is holding me up.

Maybe Select-String isn't the best method to use, but I feel like I'm close to my solution after finding this Microsoft post: https://social.technet.microsoft.com/Forums/windowsserver/en-US/d5bbd2fb-c8fa-43ed-b432-79ebfeee82ea/return-only-matches-from-selectstring?forum=winserverpowershell

Basically, here's the solution modified to fit my needs:

Gc 'C:\Documents\dataXML stuff - Copy.xml'|Select-String -Pattern "(?<=<)([^\/]*?)((?= \/>)|(?=>))"|foreach {$_.matches}|select value 

It provides a list of all the xml tags, just like I wanted, except it only returns the first XML tag of that line, so I get a lot of:

data
data
data

but no vehicleYear, vehicleMake, vehicleModel, etc., which would have been the 2nd or 3rd or 11th xml tag of that line.

As for ...

Like I mentioned earlier in the post, I do not use PowerShell at all

Reading is a good thing, but seeing it in action is better. There are many free video resources that cover PowerShell from the beginning, and tons of references. Then there are the MS TechNet virtual labs to leverage.

See this post for folks providing some paths for learning PowerShell.

Does anyone have any experience teaching others powershell?

https://www.reddit.com/r/PowerShell/comments/7oir35/help_with_teaching_others_powershell

Sure you could do it with RegEx, but it is best to handle it natively.

In PowerShell, XML is a big deal, as is JSON. All the help files are just XML files. There are built-in cmdlets to deal with it.

# Get parameters, examples, full and Online help for a cmdlet or function

Get-Command -Name '*xml*' | Format-Table -AutoSize

(Get-Command -Name Select-Xml).Parameters
Get-help -Name Select-Xml -Examples
Get-help -Name Select-Xml -Full
Get-help -Name Select-Xml -Online

Get-Help about_*

# Find all cmdlets / functions with a target parameter
Get-Help * -Parameter xml

# All Help topics locations
explorer "$pshome\$($Host.CurrentCulture.Name)"
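
As a small illustration of one of those built-in cmdlets, here is a minimal sketch using Select-Xml on a shortened row from the question's sample. The variable names $row and $tagNames are only for illustration, and the sketch assumes the row parses as well-formed XML.

```powershell
# One shortened row from the question's sample, held as an in-memory string
$row = '<data><bp_year /><bp_make>John Deere</bp_make><bp_model>650</bp_model></data>'

# Select every direct child element of <data> and emit its tag name
$tagNames = Select-Xml -Content $row -XPath '/data/*' |
    ForEach-Object { $_.Node.Name }

$tagNames
```

Each object returned by Select-Xml carries the matched node in its Node property, so the tag name is available without any string parsing.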

And many sites that present articles on dealing with it.

PowerShell Data Basics: XML

To master PowerShell, you must know how to use XML. XML is an essential data interchange format because it remains the most reliable way of ensuring that an object's data is preserved. Fortunately, PowerShell makes it all easy, as Michael Sorens demonstrates.

https://www.red-gate.com/simple-talk/sysadmin/powershell/powershell-data-basics-xml

Converting XML to PowerShell PSObject

Recently, I was working on some code (of course) and had a need to convert some XML to PowerShell PSObjects. I found some snippets out there that sort of did this, but not the way that I needed for this exercise. In this case I'm converting XML meta data from Plex.

https://consciouscipher.wordpress.com/2015/06/05/converting-xml-to-powershell-psobject

Mastering everyday XML tasks in PowerShell

PowerShell has awesome XML support. It is not obvious at first, but with a little help from your friends here at PowerShellMagazine.com, you'll soon solve every-day XML tasks – even pretty complex ones – in no time.

So let's check out how you put very simple PowerShell code to work to get the things done that used to be so mind-blowingly complex in the pre-PowerShell era.

http://www.powershellmagazine.com/2013/08/19/mastering-everyday-xml-tasks-in-powershell

For all intents and purposes, if I just take one row from your sample and cast it with the [xml] type accelerator (the .NET System.Xml.XmlDocument class)...

($MyXmlData = [xml]'<data><bp_year /><bp_make>John Deere</bp_make><bp_model>650</bp_model><bp_price>3000.00</bp_price><bp_txtDayPhone>555-555-5555</bp_txtDayPhone><bp_bestPrice>3000.0000</bp_bestPrice><bp_txtComments>Best price available?</bp_txtComments><bp_url>https://www.example.com</bp_url></data>')

data
----
data

You get results like this...

$MyXmlData.data

bp_year        : 
bp_make        : John Deere
bp_model       : 650
bp_price       : 3000.00
bp_txtDayPhone : 555-555-5555
bp_bestPrice   : 3000.0000
bp_txtComments : Best price available?
bp_url         : https://www.example.com

with IntelliSense / autocomplete of the nodes / elements...

$MyXmlData.data.bp_year

Another view...

$MyXmlData.data | Format-Table -AutoSize

bp_year bp_make    bp_model bp_price bp_txtDayPhone bp_bestPrice bp_txtComments        bp_url                 
------- -------    -------- -------- -------------- ------------ --------------        ------                 
        John Deere 650      3000.00  555-555-5555   3000.0000    Best price available? https://www.example.com

And from that, just getting the tags / names

$MyXmlData.data.ChildNodes.Name

bp_year
bp_make
bp_model
bp_price
bp_txtDayPhone
bp_bestPrice
bp_txtComments
bp_url

So, armed with the above approaches / notes, it just becomes a matter of looping through your file to get all you are after.

Taking your sample and dumping it into a file with no changes, one can do this.

$MyXmlData = (Get-Content -Path 'D:\Scripts\MyXmlData.xml')

$MyXmlData | Format-List -Force

ForEach($DataRow in $MyXmlData)
{
    ($DataObject = [xml]$DataRow).Data | Format-Table -AutoSize

}

bp_year bp_make    bp_model bp_price bp_txtDayPhone bp_bestPrice bp_txtComments        bp_url                 
------- -------    -------- -------- -------------- ------------ --------------        ------                 
        John Deere 650      3000.00  555-555-5555   3000.0000    Best price available? https://www.example.com



receiveOffers link                                               itemName             
------------- ----                                               --------             
              http://example.com/inventory.htm?id=2217405&used=1 2007 Yamaha RHINO 660



vehicleYear vehicleMake vehicleModel vehicleStyle vehicleInformation vehicleMileage phone        timeOfDay message
----------- ----------- ------------ ------------ ------------------ -------------- -----        --------- -------
2008        Buick       Enclave      CX                                             555-555-5555 Morning          



mo_year mo_make mo_model     mo_price mo_txtDayPhone mo_txtOffer mo_txtTrade mo_txtComments
------- ------- --------     -------- -------------- ----------- ----------- --------------
2009    Webasto Air Top 2000          555-555-5555   700    



ForEach($DataRow in $MyXmlData)
{
    ($DataObject = [xml]$DataRow).Data.ChildNodes.Name

}


bp_year
bp_make
bp_model
bp_price
bp_txtDayPhone
bp_bestPrice
bp_txtComments
bp_url
receiveOffers
link
itemName
vehicleYear
vehicleMake
vehicleModel
vehicleStyle
vehicleInformation
vehicleMileage
phone
timeOfDay
message
mo_year
mo_make
mo_model
mo_price
mo_txtDayPhone
mo_txtOffer
mo_txtTrade
mo_txtComments
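
The question mentions that the real file is not fully valid XML and that a distinct list is the goal. A minimal sketch of the same loop that skips rows the [xml] cast rejects and de-duplicates the names could look like the following. The $lines array stands in for Get-Content output, the third row is deliberately malformed for illustration, and the names $badLines, $tagNames and $unique are assumptions, not part of the original.

```powershell
# Stand-in for: $lines = Get-Content -Path 'D:\Scripts\MyXmlData.xml'
$lines = @(
    '<data><bp_year /><bp_make>John Deere</bp_make></data>'
    '<data><receiveOffers /><itemName>2007 Yamaha RHINO 660</itemName></data>'
    '<data><phone>555-555-5555</phone><message></data>'   # malformed on purpose
)

$badLines = 0
$tagNames = foreach ($line in $lines) {
    # Skip blank rows rather than trying to parse them
    if ([string]::IsNullOrWhiteSpace($line)) { continue }
    try {
        $doc = [xml]$line                      # throws if the row is not well-formed
        $doc.DocumentElement.Name              # the root tag, e.g. data
        # Direct child elements of the root
        $doc.DocumentElement.SelectNodes('*') | ForEach-Object { $_.Name }
    }
    catch {
        $badLines++                            # count rows that failed to parse
    }
}

# Distinct tag names, ready to paste into Excel
$unique = $tagNames | Sort-Object -Unique
$unique
"Rows skipped as malformed: $badLines"
```

Rows that fail to parse contribute nothing to the list here, so if many rows are malformed a text-based approach may recover more tag names.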

Note, though, that this is not the only way to do this.
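
For example, the RegEx route from the question can be made to return every tag on a line, not just the first, by adding -AllMatches and then expanding each line's Matches collection. This is a minimal sketch under the assumption that the tags carry no attributes (the question's pattern would include them in the match). The $lines array stands in for Get-Content output and the variable names are illustrative.

```powershell
# Stand-in for: $lines = Get-Content -Path 'C:\Documents\dataXML stuff - Copy.xml'
$lines = @(
    '<data><bp_year /><bp_make>John Deere</bp_make></data>'
    '<data><vehicleYear>2008</vehicleYear><message /></data>'
)

# Pattern taken from the question; single quotes keep PowerShell from touching it
$pattern = '(?<=<)([^\/]*?)((?= \/>)|(?=>))'

$tagNames = $lines |
    Select-String -Pattern $pattern -AllMatches |   # -AllMatches: every hit per line
    ForEach-Object { $_.Matches } |                 # unroll the per-line match collection
    ForEach-Object { $_.Value }                     # keep only the matched text

# Distinct list of tag names
$tagNames | Sort-Object -Unique
```

Without -AllMatches, Select-String stops at the first match on each line, which is why the command in the question only ever returned data.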
