little bit of background: I work at a multilingual communication company, where we're working with a CMS system. Since its last update, all the files I export out of the system are 'polluted' with metadata, which I don't want to see, use or replace. To filter and change a heap of xml files, I use Powergrep, which operates with regexes.
I want my regex to find, eg "there is no spoon", "oracle", "I know kung-fu" and "bending method" (all straight quotation marks) and replace it with “there is no spoon”, “oracle”, “I know kung-fu” and “bending method” (all with curly quotation marks).
I don't want it to find the metadata "concept.dtd"
and "map.dtd"
The following lines are the first lines of my xml file. It's this "concept.dtd"
that I would like to ignore.
<?xml version="1.0" encoding="UTF-16" standalone="no"?>
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"[
]>
<?ish ishref="GUID-6B84EF92-DA99-4C54-BA91-FD0A113D4A96" version="1" lang="sv" srclng="en"?>
This is somewhere in the middle of the xml file
<row>
<entry colname="col1" valign="middle" align="left">"Bending method" </entry>
<entry colname="col2" valign="middle" align="left">another word</entry>
</row>
So.. this is the original regex:
(?<!=)”\b(.+?)\b”(?! \[)
Replacement:
“1”
Problem: As the metadata “concept.dtd” and “map.dtd” are part of the file, I don't want to replace their quotation marks in order not to change anything crucial. So I tried rewriting the regex:
(?<!=)”\b(.+?[\.d])\b”(?! \[)
It almost works: “concept.dtd” and “map.dtd” are skipped, most of the terms between quotation marks are found, but not all: “Bending method” is not found, for example.
What am I missing? Any help or opinions would be greatly appreciated!
Based on your last answers, here is a regexp that can help you:
(?<=<entry)[^>]+>[^<>]*?(".+?")[^<>]*?(?=<\x2Fentry>)
I assume that you have one serie of words between straight quotations and that there are no carriage returns ou line feeds inside an <entry>
node.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.