
Replace multiple strings in XML using a key->value pair in a CSV file

I have a dump from our application server that contains XML with multiple strings. I am interested in the userID, which is embedded in the XML tags in the format lasfir1, as in the XML examples below:

<row>
  <string></string>
  <integer>2177</integer>
  <string>assignee =lasfir1 </string>
  <string>Firstname Lastname</string>
  <integer>10</integer>
  <string xsi:nil="true"/>
  <integer>450</integer>
</row>

<row>
  <string>#ffd600</string>
  <integer>2199</integer>
  <integer>23</integer>
  <integer>474</integer>
  <string>assignee</string>
  <string>lasfir1</string>
</row>

<row>
  <integer>1536</integer>
  <string>lasfir1</string>
  <integer>235</integer>
  <string>USER</string>
</row>

<row>
  <string>#ffd610</string>
  <integer>2200</integer>
  <integer>25</integer>
  <integer>464</integer>
  <string>assignee</string>
  <string>lisfar1</string>
</row>

The requirement is to convert only the string "lasfir1" into its equivalent email ID. The email IDs are available in a separate CSV (text) file that holds key->value pairs of userID and email ID, with the email ID in the first column:

FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1

The XML may not always be the same, but the string to search for will be. The match should depend on the string itself, not on the pattern of what comes before or after it.

Is there a simple way to read the key->value pairs from the CSV file, check whether each key (userID) exists in the XML file, and then replace it with the value (email ID)?

This is required for a set of 300+ userID and email ID combinations, not all of which may appear in the XML.
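The first two steps asked for above (reading the pairs and checking which userIDs actually occur in the XML) can be sketched in Python as follows. This is only a minimal sketch: the function names `load_mapping` and `ids_present` are made up for illustration, the inline sample text stands in for the real files, and it assumes every userID consists only of letters, digits, and underscores.

```python
import csv
import io
import re


def load_mapping(csv_text):
    """Return {userID: email} from lines shaped like 'email,userID'."""
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2:  # skip blank or malformed lines
            email, user_id = row[0].strip(), row[1].strip()
            mapping[user_id] = email
    return mapping


def ids_present(xml_text, mapping):
    """Return the known userIDs that occur as whole words in the XML text."""
    # Assumption: userIDs are made of word characters only (\w).
    words = set(re.findall(r"\w+", xml_text))
    return sorted(mapping.keys() & words)


CSV_TEXT = """FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1
"""

XML_TEXT = """<row>
  <string>assignee =lasfir1 </string>
  <string>lisfar1</string>
</row>
"""

mapping = load_mapping(CSV_TEXT)
print(ids_present(XML_TEXT, mapping))  # ['lasfir1', 'lisfar1']
```

Here firlas1 is in the CSV but not in the XML, so it is simply not reported.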

Check out this Perl one-liner solution:

$ cat gagneet.csv
FirstName.LastName@abc.com,lasfir1
FarstName.ListName@abc.com,lisfar1
LastName.FirstName@abc.com,firlas1

$ cat gagneet.xml
<row>
  <string></string>
  <integer>2177</integer>
  <string>assignee =lasfir1 </string>
  <string>Firstname Lastname</string>
  <integer>10</integer>
  <string xsi:nil="true"/>
  <integer>450</integer>
</row>

. . . . 
. . . . 

$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/${y}/$kv{$y}/gm; } print "$1$xml$3\n"; } exit } '
<row>
  <string></string>
  <integer>2177</integer>
  <string>assignee =FirstName.LastName@abc.com </string>
  <string>Firstname Lastname</string>
  <integer>10</integer>
  <string xsi:nil="true"/>
  <integer>450</integer>
</row>
<row>
  <string>#ffd600</string>
  <integer>2199</integer>
  <integer>23</integer>
  <integer>474</integer>
  <string>assignee</string>
  <string>FirstName.LastName@abc.com</string>
</row>
<row>
  <integer>1536</integer>
  <string>FirstName.LastName@abc.com</string>
  <integer>235</integer>
  <string>USER</string>
</row>
<row>
  <string>#ffd610</string>
  <integer>2200</integer>
  <integer>25</integer>
  <integer>464</integer>
  <string>assignee</string>
  <string>FarstName.ListName@abc.com</string>
</row>

If you want to edit only values that sit directly between <string> tags, then:

$ perl -ne 'BEGIN { %kv=map{chomp;(split(",",$_))[1,0] } qx(cat gagneet.csv) ; $content=qx(cat gagneet.xml);while($content=~/(<row>)(.*?)(<\/row>)/smg) { $xml=$2;foreach $y (keys %kv) { $xml=~s/<string>${y}<\/string>/<string>$kv{$y}<\/string>/gm; } print "$1$xml$3\n"; } exit } '
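For comparison, the same "only between tags" idea might look like this in Python. It is a rough sketch, not a translation of the one-liner: `replace_whole_elements` is a hypothetical name, and it assumes the <string> elements carry no attributes and no nested markup. Instead of looping over every key, it looks up each element's text in the dictionary.

```python
import re

# Matches a <string> element whose content has no nested tags.
STRING_ELEMENT = re.compile(r"<string>([^<]*)</string>")


def replace_whole_elements(xml_text, mapping):
    """Swap a <string> element's text only when it is exactly a known userID."""
    def swap(match):
        text = match.group(1)
        # Unknown text is written back unchanged.
        return "<string>{}</string>".format(mapping.get(text, text))
    return STRING_ELEMENT.sub(swap, xml_text)


mapping = {"lasfir1": "FirstName.LastName@abc.com"}
sample = "<string>assignee =lasfir1 </string>\n<string>lasfir1</string>\n"
print(replace_whole_elements(sample, mapping))
```

With this variant the "assignee =lasfir1 " element is left alone, because its text is not exactly the userID.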

I created a script using Python 3 that takes the CSV and the XML file as input and outputs an XML file with the changes. The command is:

python xml_converter.py --csvfile file.csv --xmlfile file.xml --outfile output_file.xml

It is not as optimized as I would like and runs on a single thread. It assumes that the files are UTF-8 encoded.

usage: Replace username to user email of a given xml file
       [-h] --csvfile CSVFILE --xmlfile XMLFILE --outfile OUTFILE

optional arguments:
  -h, --help         show this help message and exit
  --csvfile CSVFILE  csv file that provide user name and email pair
  --xmlfile XMLFILE  xml file that to be searched and replaced
  --outfile OUTFILE  output file name

The basic script is:

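import argparse
import csv
import re


def parse_args():
    # Not shown in the original post; reconstructed from the usage text above.
    parser = argparse.ArgumentParser(
        description='Replace username to user email of a given xml file')
    parser.add_argument('--csvfile', required=True,
                        help='csv file that provide user name and email pair')
    parser.add_argument('--xmlfile', required=True,
                        help='xml file that to be searched and replaced')
    parser.add_argument('--outfile', required=True, help='output file name')
    args = parser.parse_args()
    return args.csvfile, args.xmlfile, args.outfile


class XMLConvert:
    def __init__(self, csv_file, xml_file, out_file):
        self._csv = csv_file
        self._xml = xml_file
        self._out = out_file

        self._kv_dict = self.prepare_kv_dict()

    def prepare_kv_dict(self):
        # Each CSV row is "email,userID"; build a userID -> email lookup.
        with open(self._csv, newline='', encoding='utf-8') as f:
            reader = csv.reader(f)
            result = dict()
            for row in reader:
                if len(row) >= 2:  # skip blank or malformed rows
                    result[row[1]] = row[0]
        return result

    def convert(self):
        # Stream the XML line by line so the whole file is never held in memory.
        with open(self._xml, 'r', encoding='utf-8') as f:
            for line in f:
                yield self.convert_line(line)

    def convert_line(self, line):
        # self._kv_dict = {'lasfir1': 'First.Name@abc.com'}
        # Apply every matching userID, not only the first one found on the line.
        for k, v in self._kv_dict.items():
            if k in line:
                # re.escape keeps special characters in a userID literal.
                line = re.sub(re.escape(k), v, line)
        return line

    def start(self):
        with open(self._out, 'w', encoding='utf-8') as f:
            for line in self.convert():
                f.write(line)


if __name__ == '__main__':
    csv_file, xml_file, out_file = parse_args()
    converter = XMLConvert(csv_file, xml_file, out_file)
    converter.start()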

I am trying to add threads and modify it accordingly to speed it up. If anyone has a better way, please let me know.
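Before adding threads, it may be worth reducing the work done per line, since in CPython threads generally do not speed up CPU-bound work such as regex substitution. One possible approach, sketched below under the assumption that userIDs are plain word-like tokens, is to compile all keys into a single alternation pattern once and let a callback look up the replacement, so each line is scanned once instead of once per key. The name `build_replacer` is made up for this illustration and is not part of the script above.

```python
import re


def build_replacer(mapping):
    """Return a function that replaces every known userID in a line in one pass."""
    if not mapping:
        return lambda line: line  # nothing to replace
    # Longest keys first, so a longer ID is tried before a shorter one it starts with.
    keys = sorted(mapping, key=len, reverse=True)
    # \b limits matches to whole words (assumes IDs start and end with word characters).
    pattern = re.compile(r"\b(?:{})\b".format("|".join(map(re.escape, keys))))
    return lambda line: pattern.sub(lambda m: mapping[m.group(0)], line)


mapping = {
    "lasfir1": "FirstName.LastName@abc.com",
    "lisfar1": "FarstName.ListName@abc.com",
}
replace = build_replacer(mapping)
for line in [
    "  <string>assignee =lasfir1 </string>",
    "  <string>lisfar1</string>",
    "  <integer>2177</integer>",
]:
    print(replace(line))
```

Because the replacement text is taken from the callback and never rescanned, an email that happens to contain another userID is not replaced a second time.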
