How can I efficiently replace strings in a large CSV-based array using a dictionary?

Question

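I have a very large array with many rows and many columns (called "self.csvFileArray"). It is made up of rows that I read from a CSV file, using the following code in a class that handles CSV files...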

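import csv

# newline='' is what the csv module expects; the old 'rU' mode was removed in Python 3.11
with open(self.nounDef["Noun Source File Name"], newline='') as csvFile:
  for idx, row in enumerate(csv.reader(csvFile, delimiter=',')):
    if idx == 0:
      self.csvHeader = row  # the first row is the header
    self.csvFileArray.append(row)  # the header row is stored in the array too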

I have a very long dictionary of replacement mappings that I want to use for the replacements...

replacements = {"str1a":"str1b", "str2a":"str2b", "str3a":"str3b", etc.}

I want to do this in a class method that looks like this...

def m_globalSearchAndReplace(self, replacements):
  # apply replacements dictionary to self.csvFileArray...

My question: what is the most efficient way to replace strings throughout the array "self.csvFileArray" using the "replacements" dictionary?

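Clarifications:

I looked at this post, but couldn't get it to work for this case.
Also, I want to replace strings inside matching words, not just whole words. So, with a replacement mapping of "SomeCompanyName": "xyz", I might have a sentence like "The SomeCompanyName company holds a patent for a product named abcSomeCompanyNamedef." Note that the string has to be replaced twice in that sentence: once as a whole word and once as an embedded string.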
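
To make the requirement concrete, here is a minimal sketch of the behaviour I am after. It uses the built-in `str.replace`, which substitutes every occurrence of a substring, whether it stands alone or sits inside a longer word. The sentence and mapping are just the example above, not real data.

```python
# Example mapping and sentence taken from the clarification above.
replacements = {"SomeCompanyName": "xyz"}
sentence = ("The SomeCompanyName company holds a patent for a product "
            "named abcSomeCompanyNamedef.")

result = sentence
for old, new in replacements.items():
    # str.replace substitutes every occurrence, whole-word or embedded.
    result = result.replace(old, new)

print(result)
# The xyz company holds a patent for a product named abcxyzdef.
```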

Answer 1

The following works for the case described above and has been fully tested...

  def m_globalSearchAndReplace(self, dataMap):
    replacements = dataMap.m_getMappingDictionary()
    keys = replacements.keys()
    for row in self.csvFileArray: # Loop through each row/list
      for idx, w in enumerate(row): # Loop through each word in the row/list
        for key in keys: # For every key in the dictionary...
          if key != 'NULL' and key != '-' and key != '.' and key != '':
            w = w.replace(key, replacements[key])
        row[idx] = w

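In short, loop over every row in csvFileArray and get each word.
Then, for each word in the row, loop over the keys of the dictionary (called "replacements") to access and apply each mapping.
Then (assuming the key is not one of the placeholders 'NULL', '-', '.' or '') replace every occurrence of the key in the word with its mapped value from the dictionary.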

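Note: While it works, I don't believe these nested loops are the most efficient way to solve the problem, and I believe there must be a better way using regular expressions. So I'll leave this open for a while to see whether anyone can improve on the answer.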
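
For reference, one possible regex-based variant is sketched below. It is not the tested code above, only an illustration under some assumptions: `build_replacer` and `replace_in_rows` are hypothetical helper names, and the array is assumed to be a plain list of lists of strings. All keys are compiled into a single alternation so each cell is scanned once. Unlike the chained `str.replace` calls, text produced by one replacement is not scanned again for other keys, so the results can differ when mappings overlap. I have not measured whether it is faster on real data.

```python
import re

def build_replacer(replacements, skip=("NULL", "-", ".", "")):
    """Return a function that applies all replacements in one regex pass."""
    # Drop the placeholder keys that the loop version ignores.
    keys = [k for k in replacements if k not in skip]
    if not keys:
        return lambda text: text
    # Longest keys first, so a longer key wins over a key that is its prefix.
    keys.sort(key=len, reverse=True)
    # re.escape makes each key match literally, even with regex metacharacters.
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return lambda text: pattern.sub(lambda m: replacements[m.group(0)], text)

def replace_in_rows(rows, replacements):
    """Apply the replacements in place to a list of rows (lists of strings)."""
    replace = build_replacer(replacements)
    for row in rows:
        row[:] = [replace(cell) for cell in row]

rows = [["SomeCompanyName Inc.", "abcSomeCompanyNamedef"], ["-", "NULL"]]
replace_in_rows(rows, {"SomeCompanyName": "xyz", "-": "dash", "NULL": "n/a"})
print(rows)
# [['xyz Inc.', 'abcxyzdef'], ['-', 'NULL']]
```

Inside the class this would amount to calling `replace_in_rows(self.csvFileArray, replacements)` from `m_globalSearchAndReplace`.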

Answer 2

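Big loops? You could load the csv file as a single string, so you only have to go through your list once instead of once for every item. It's not very efficient, since Python strings are immutable, but you're facing the same problem either way.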
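
A rough sketch of that whole-file idea, assuming the file fits comfortably in memory. `replace_in_file` is a hypothetical helper name, and `old_file` and `new_file` are placeholder paths.

```python
def replace_in_file(old_file, new_file, replacements):
    """Read the whole file as one string, apply every replacement, write it out."""
    # newline="" keeps the original line endings untouched.
    with open(old_file, "r", newline="") as f:
        text = f.read()
    for old, new in replacements.items():
        # Each call builds a new string, since Python strings are immutable.
        text = text.replace(old, new)
    with open(new_file, "w", newline="") as f:
        f.write(text)
```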

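According to this answer, Optimizing find and replace over large files in Python (re efficiency), going line by line may work better, so that you don't hold a giant string in memory if that actually becomes a problem.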

Edit: So something like this...

# open original and new file.
with open(old_file, 'r') as old_f, open(new_file, 'w') as new_f:
    # loop through each line of the original file (old file)
    for old_line in old_f:
        new_line = old_line
        # loop through your dictionary of replacements and make them.
        for r in replacements:
            new_line = new_line.replace(r, replacements[r])
        # write each line to the new file.
        new_f.write(new_line)

Either way, I'd forget that the file is a csv file and just treat it as a big collection of lines or characters.

2 solutions

Solution 1
1 Accepted 2017-10-24 01:29:32

Solution 2
0 2017-10-20 22:52:14
