I have a directory of XML files and I want to extract the title from each one. I am very new to PowerShell, and have tried the following.
Get-ChildItem -recurse | Get-Content | Select-String -pattern "<title>" -list | Set-Content protid_output.txt
An example of the relevant part of the XML files: <title> protein name </title>
This outputs the title tag but not the actual title. How can I go through the directory and output the titles to one file?
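If you're sure that all of <title>this title</title>
is on a SINGLE line, then try:
Get-ChildItem -Recurse -Filter *.xml | % {
    # Keep only the lines containing <title>, strip both tags, then trim whitespace.
    # @() keeps -match filtering lines even when a file has just one line.
    @(Get-Content $_.FullName) -match '<title>' -replace '</?title>' | % { $_.Trim() }
} | Set-Content protid_output.txt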
If the title is spread over several lines, like this:
<?xml version="1.0" encoding="ISO-8859-1"?>
<example>
<title>
protein name
</title>
</example>
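Then parse the file into an XML object first, which is also easier to read. Avoid this on files larger than about 10 MB, because the whole document is loaded into memory. Example:
Get-ChildItem -Recurse -Filter *.xml | % {
    # Read the current file (not a hard-coded name) and cast its content to [xml]
    $x = [xml](Get-Content $_.FullName)
    # Assumes the root element is <example>, as in the sample above
    $x.example.title.Trim()
} | Set-Content protid_output.txt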
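If the root element is not always called example, an XPath query may be more convenient than walking the object by property name. The following is only a sketch of that idea using the built-in Select-Xml cmdlet. The function name Get-XmlTitle and its -Root parameter are made up for illustration, and it assumes the files do not declare a default XML namespace.

```powershell
# Hypothetical helper (name and parameter are illustrative, not from the original answer):
# returns the trimmed text of every <title> element found in the *.xml files
# under $Root, regardless of what the root element is called.
function Get-XmlTitle {
    param([string]$Root = '.')

    Get-ChildItem -Path $Root -Recurse -Filter *.xml | ForEach-Object {
        # Select-Xml parses the file and evaluates the XPath query against it
        Select-Xml -Path $_.FullName -XPath '//title' | ForEach-Object {
            $_.Node.InnerText.Trim()
        }
    }
}

# Usage: Get-XmlTitle -Root . | Set-Content protid_output.txt
```

If the files do declare a default namespace, the plain //title query would match nothing, and Select-Xml would need its -Namespace parameter with a prefixed query instead.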