Find/replace strings in a CSV with a Python dictionary when each cell holds several replaceable strings

Question

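Please note that this is a revised/refined version of my original question, which I hope is clearer than my first attempt. I'm new to programming and am trying to write a script that performs a series of specific find-and-replace operations on a CSV, using another CSV table as a correction guide (e.g. chiken becomes chicken, bcon becomes bacon).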

So in the simple case:
chikn,1,a
bcon,2,b
egs,3,c

becomes
chicken,1,a
bacon,2,b
eggs,3,c
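In this simple case each cell in the target column holds exactly one misspelled entry, so looking the whole cell up in the dictionary is enough. A minimal sketch, assuming the corrections are already in a dict; fix_cell() and the in-memory files are illustrative stand-ins, not part of the original script:

```python
import csv
import io

# Hypothetical corrections taken from the simple example above.
reps = {"chikn": "chicken", "bcon": "bacon", "egs": "eggs"}


def fix_cell(cell, reps):
    """Return the correction for an exact match, else the cell unchanged."""
    return reps.get(cell, cell)


# io.StringIO stands in for the real input and output files.
source = io.StringIO("chikn,1,a\nbcon,2,b\negs,3,c\n")
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for row in csv.reader(source):
    row[0] = fix_cell(row[0], reps)  # column 0 holds the words to correct
    writer.writerow(row)

print(out.getvalue(), end="")
```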

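So far, using the code below, I have built a dictionary from the input CSV and can apply most of the corrections to the target CSV as expected in the simple case. The real challenge, however, is that the actual dataset often has 1-3 entries per cell (separated by a common delimiter, ":"), and many of them contain spaces (i.e. they are phrases rather than single words). Building on the previous example with an updated dictionary, this would be: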

Starting with:
chk sandwich:egs,1,a
bcon,2,b
Bcon:egs,3,c

It should end with:
chicken sandwich:eggs,1,a
bacon,2,b
bacon:eggs,3,c
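The second example also has both bcon and Bcon mapping to the same word. If corrections should ignore case (an assumption about the desired behaviour, not something the post states), one option is to normalise the dictionary keys once with str.casefold() and look up each ":"-separated entry the same way. The dictionary here is hypothetical, since the updated one isn't shown:

```python
# Hypothetical corrections; the post's updated dictionary is not shown.
raw_reps = {"chk sandwich": "chicken sandwich", "bcon": "bacon", "egs": "eggs"}

# Normalise keys once so lookups ignore case and stray spaces.
reps = {key.strip().casefold(): value for key, value in raw_reps.items()}


def fix_cell(cell, reps, sep=":"):
    """Correct each sep-separated entry; unknown entries are kept as-is."""
    fixed = []
    for entry in cell.split(sep):
        fixed.append(reps.get(entry.strip().casefold(), entry))
    return sep.join(fixed)


for cell in ["chk sandwich:egs", "bcon", "Bcon:egs"]:
    print(fix_cell(cell, reps))
```

This prints chicken sandwich:eggs, bacon and bacon:eggs, matching the expected output above.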

Instead, my current output drops the latter part and prints:
chicken sandwich,1,a
bacon,2,b
bacon,3,c

Code:

#!/usr/bin/env python
"""A script for finding and replacing values in CSV files.

"""

import csv
import re
import sys


def main(args):
    """Execute the transformation script.

    Args:
        args (list of `str`): The command line arguments.

    """
    transform(args[1], args[2], create_reps(args[3]), int(args[4]))


def transform(infile, outfile, reps, column):
    """Write a new CSV file with replaced text.

    Args:
        infile (str): the sheet of original text with errors
        outfile (str): the sheet with the revised text with corrections in place of errors
        reps (dict): maps each error word to its corrected word
        column (int): the column (0 based) the word revisions will take place in

    """
    # newline='' stops the csv module from writing blank lines on Windows.
    with open(infile, newline='') as csvfile:
        with open(outfile, 'w', newline='') as w:
            spamreader = csv.reader(csvfile)
            spamwriter = csv.writer(w)
            for row in spamreader:
                row[column] = new_replace_all(row[column], reps)
                spamwriter.writerow(row)


def create_reps(infile):
    """Create reps object to use as reference dictionary for transform.

    Args:
        infile (str): The sheet of original and corrected words used to
            generate the dictionary

    Returns:
        dict: the error words mapped to their corrections

    """
    reps = {}
    with open(infile) as csvfile:
        dictreader = csv.reader(csvfile)
        for row in dictreader:
            reps[row[0]] = row[1]

    return reps


# Original version, kept for reference. It returns as soon as one
# replacement changes the text (and returns None if nothing matches).
# def replace_all(text, reps):
#     """Iterate through `reps` and replace key => value in `text`."""
#     last = text
#     for i, j in reps.items():
#         text = text.replace(i, j)
#         if last != text:
#             return text


def new_replace_all(text, reps):
    """Updated version: do a single-pass replacement from a dictionary."""
    # Join every key into one alternation so each match is replaced once.
    pattern = re.compile(r'\b(' + '|'.join(reps.keys()) + r')\b')
    return pattern.sub(lambda x: reps[x.group()], text)

if __name__ == "__main__":
    main(sys.argv)
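For comparison, the commented-out replace_all() returns from inside the loop as soon as one str.replace() call changes the text, and returns None when no key matches, so only the first key that matches is ever applied. The sketch below reproduces that loop next to one that applies every pair before returning; both function names and the sample dictionary are illustrative:

```python
def replace_first_only(text, reps):
    """Mirrors the commented-out loop: stops after the first change."""
    last = text
    for old, new in reps.items():
        text = text.replace(old, new)
        if last != text:
            return text
    # Falls off the end here, so an unmatched cell comes back as None.


def replace_every_pair(text, reps):
    """Applies every key/value pair, then returns the text."""
    for old, new in reps.items():
        text = text.replace(old, new)
    return text


reps = {"Bcon": "bacon", "egs": "eggs"}  # dicts keep insertion order (3.7+)
print(replace_first_only("Bcon:egs", reps))   # bacon:egs
print(replace_every_pair("Bcon:egs", reps))   # bacon:eggs
print(replace_first_only("toast", reps))      # None
```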

Thanks in advance for your time and support. I look forward to your guidance!

Best,

---------------- Update 4/5/18 ----------------------------------------

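With HFBrowing's generous help, I was able to modify this code so that it works with the sample dataset I originally provided. In my real application, however, it still breaks when it hits some of the more complex strings in the dataset. I'd welcome any suggestions on how to fix this; some examples and the error are below.

Ideally, items in a cell joined by "|" would stay together, while items in a cell joined by ":" would be treated as separate strings and replaced individually.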

So if:
"A|first" = "A1" and "B|first" = "B1"
then
"A|first:B|first" should be converted to "A1:B1".
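Those rules map onto str.split(":") followed by an exact dictionary lookup for each piece. Because split() treats "|" as an ordinary character, "A|first" stays a single key. A minimal sketch using the post's own A|first/B|first example (fix_cell() is a hypothetical helper name):

```python
def fix_cell(cell, reps, sep=":"):
    """Split on sep, replace each whole piece by exact lookup, then rejoin."""
    # Pieces with no entry in reps are passed through unchanged.
    return sep.join(reps.get(piece, piece) for piece in cell.split(sep))


reps = {"A|first": "A1", "B|first": "B1"}
print(fix_cell("A|first:B|first", reps))  # A1:B1
print(fix_cell("A|first:C|first", reps))  # A1:C|first
```

Because each piece must match a key exactly, a correction that itself contains ":" (such as "Accounting:Actuarial Science") is inserted as-is and is not split and looked up again.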

Using this more complex string data, I've provided examples, the expected and current output, and the error I receive.

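Sample dictionary:
Wrong word,Correct word
Actuarial Science,Accounting:Actuarial Science
Anthropology,Anthropology:General
Undeclared,Undecided
Information Technology and Administrative Management|Administrative Management Specialization,Information Technology and Administrative Management:Administrative Management Specialization
Biology,Biology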
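Note that create_reps() stores every row, so if the corrections file really starts with a "Wrong word,Correct word" header line as shown, that header becomes a dictionary entry too, and a blank line would raise an IndexError on row[0]. A sketch of a more defensive loader, assuming a header row and two columns (load_reps and has_header are hypothetical names):

```python
import csv
import io


def load_reps(f, has_header=True):
    """Build {wrong: correct} from a two-column CSV, skipping the header."""
    reader = csv.reader(f)
    if has_header:
        next(reader, None)  # drop the "Wrong word,Correct word" row
    reps = {}
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        reps[row[0].strip()] = row[1].strip()
    return reps


# io.StringIO stands in for the real corrections file.
sample = io.StringIO(
    "Wrong word,Correct word\n"
    "Actuarial Science,Accounting:Actuarial Science\n"
    "Anthropology,Anthropology:General\n"
    "\n"
    "Biology,Biology\n"
)
reps = load_reps(sample)
print(reps)
```

With a real file, open it the way the csv module documentation recommends, e.g. with open("replacements.csv", newline="") as f: reps = load_reps(f).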

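Sample input:
Major,ID,Last
Actuarial Science,111,Smith
Anthropology,222,Bob
Anthropology:Actuarial Science,333,Johnson
Information Technology and Administrative Management|Administrative Management Specialization,444,Frank
555,Undisclosed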

Current output error:

    Traceback (most recent call last):
      File "myscript3.py", line 89, in <module>
        main(sys.argv)
      File "myscript3.py", line 21, in main
        transform(args[1], args[2], create_reps(args[3]), int(args[4]))
      File "myscript3.py", line 41, in transform
        row[column] = new_replace_all(row[column], reps)
      File "myscript3.py", line 68, in new_replace_all
        return pattern.sub(lambda x: reps[x.group()], text)
      File "myscript3.py", line 68, in <lambda>
        return pattern.sub(lambda x: reps[x.group()], text)
    KeyError: 'Information Technology and Administrative Management'
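The traceback is consistent with the "|" in the long dictionary key being treated as regex alternation: because the keys are joined into the pattern without escaping, that key contributes two separate alternatives, one of which matches "Information Technology and Administrative Management" on its own. That matched text is not a key in reps, so the lambda raises KeyError. A sketch of one common fix, escaping each key with re.escape() and trying longer keys first (this is not the approach the asker eventually adopted):

```python
import re


def new_replace_all(text, reps):
    """Single-pass replacement that treats every key as literal text."""
    if not reps:
        return text  # with no keys the pattern would match the empty string
    # Longest keys first, so a long key wins over any shorter key it contains.
    keys = sorted(reps, key=len, reverse=True)
    # re.escape() makes "|" (and other metacharacters) match literally.
    # The \b anchors assume each key starts and ends with a word character.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: reps[m.group(0)], text)


long_key = ("Information Technology and Administrative Management|"
            "Administrative Management Specialization")
reps = {
    long_key: ("Information Technology and Administrative Management:"
               "Administrative Management Specialization"),
    "Anthropology": "Anthropology:General",
}
print(new_replace_all(long_key, reps))
print(new_replace_all("Anthropology", reps))  # Anthropology:General
```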

Current output CSV:
Major,ID,Last
Accounting:Actuarial Science,111,Sumeri
Anthropology:General,222,Nelson
Anthropology:General;Accounting:Actuarial Science,333,Newman

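----------------------- Update 4/6/18: Solved -----------------------

Hi all,

Thank you all for your support. At a colleague's suggestion, I modified the original "replace_all" code as shown below, and it now seems to work as intended in my context.

Thanks again for your time and support!

Code: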

#!/usr/bin/env python
"""A script for finding and replacing values in CSV files.

Example::
    ./myscript school-data.csv outfile-data.csv replacements.csv 4

"""

import csv
import sys


def main(args):
    """Execute the transformation script.

    Args:
        args (list of `str`): The command line arguments.

    """
    transform(args[1], args[2], create_reps(args[3]), int(args[4]))


def transform(infile, outfile, reps, column):
    """Write a new CSV file with replaced text.

    Args:
        infile (str): the sheet of original text with errors
        outfile (str): the sheet with the revised text with corrections in
            place of errors
        reps (dict): maps each error word to its corrected word
        column (int): the column (0 based) the word revisions will take place
            in

    """
    # newline='' stops the csv module from writing blank lines on Windows.
    with open(infile, newline='') as csvfile:
        with open(outfile, 'w', newline='') as w:
            spamreader = csv.reader(csvfile)
            spamwriter = csv.writer(w)
            for row in spamreader:
                row[column] = replace_all(row[column], reps)
                spamwriter.writerow(row)


def create_reps(infile):
    """Create reps object to use as reference dictionary for transform.

    Args:
        infile (str): The sheet of original and corrected words used to
            generate the dictionary

    Returns:
        dict: the error words mapped to their corrections

    """
    reps = {}
    with open(infile) as csvfile:
        dictreader = csv.reader(csvfile)
        for row in dictreader:
            reps[row[0]] = row[1]

    return reps


def replace_all(text, reps):
    """Iterate through `reps` and replace key => value in `text`.

    Args:
        text (str): The text to search and replace.
        reps (dict): Search for `key` and replace with `value`.

    Returns:
        str: The string with every replacement applied.

    """
    # Apply every pair in turn instead of returning after the first change.
    for i, j in reps.items():
        text = text.replace(i, j)
    return text

if __name__ == "__main__":
    main(sys.argv)
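One thing to keep in mind with this version: str.replace() matches substrings anywhere in the cell, not whole ":"-separated entries. With the sample dictionary that is harmless, but a hypothetical key like "Biology" would also rewrite part of a cell such as "Marine Biology". The sketch contrasts the two behaviours; replace_entries() and the extra dictionary entry are illustrative assumptions:

```python
def replace_all(text, reps):
    """Same loop as the accepted version: every str.replace() in turn."""
    for old, new in reps.items():
        text = text.replace(old, new)
    return text


def replace_entries(text, reps, sep=":"):
    """Alternative sketch: replace only whole sep-separated entries."""
    return sep.join(reps.get(entry, entry) for entry in text.split(sep))


reps = {"Biology": "Biology:General"}  # hypothetical entry
print(replace_all("Marine Biology", reps))      # Marine Biology:General
print(replace_entries("Marine Biology", reps))  # Marine Biology
print(replace_entries("Biology", reps))         # Biology:General
```

The trade-off is that replace_entries() only fixes entries that match a key exactly, including case and spacing, whereas str.replace() will also catch an error embedded in a longer phrase.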

Answer 1

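Actually, I couldn't get your code sample to replace anything at all, so I'm sure its structure differs somewhat from the CSV you're working with. Still, I think the problem is in your replace_all() function, since replacing text sequentially can be tricky. Here is the solution to the linked question, adapted to your function. Does this solve the problem for you?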

import re


def new_replace_all(text, reps):
    """Do a single-pass replacement from a dictionary"""
    pattern = re.compile(r'\b(' + '|'.join(reps.keys()) + r')\b')
    return pattern.sub(lambda x: reps[x.group()], text)
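To illustrate why sequential replacement can be tricky: each str.replace() call sees the output of the previous one, so a later key can rewrite text that an earlier replacement produced. A single regex pass scans the original text once and never revisits replaced text. A toy sketch with a hypothetical A/B swap (not data from the question):

```python
import re


def sequential(text, reps):
    """Apply each pair in turn; later pairs see earlier output."""
    for old, new in reps.items():
        text = text.replace(old, new)
    return text


def single_pass(text, reps):
    """Replace all keys in one scan of the original text."""
    keys = sorted(reps, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: reps[m.group(0)], text)


swap = {"A": "B", "B": "A"}  # hypothetical: swap two labels
print(sequential("A B", swap))   # A A
print(single_pass("A B", swap))  # B A
```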

Answer 2

#!/usr/bin/env python
"""A script for finding and replacing values in CSV files.

Example::
    ./myscript school-data.csv outfile-data.csv replacements.csv 4

"""

import csv
import sys


def main(args):
    """Execute the transformation script.

    Args:
        args (list of `str`): The command line arguments.

    """
    transform(args[1], args[2], create_reps(args[3]), int(args[4]))


def transform(infile, outfile, reps, column):
    """Write a new CSV file with replaced text.

    Args:
        infile (str): the sheet of original text with errors
        outfile (str): the sheet with the revised text with corrections in
            place of errors
        reps (dict): maps each error word to its corrected word
        column (int): the column (0 based) the word revisions will take place
            in

    """
    # newline='' stops the csv module from writing blank lines on Windows.
    with open(infile, newline='') as csvfile:
        with open(outfile, 'w', newline='') as w:
            spamreader = csv.reader(csvfile)
            spamwriter = csv.writer(w)
            for row in spamreader:
                row[column] = replace_all(row[column], reps)
                spamwriter.writerow(row)


def create_reps(infile):
    """Create reps object to use as reference dictionary for transform.

    Args:
        infile (str): The sheet of original and corrected words used to
            generate the dictionary

    Returns:
        dict: the error words mapped to their corrections

    """
    reps = {}
    with open(infile) as csvfile:
        dictreader = csv.reader(csvfile)
        for row in dictreader:
            reps[row[0]] = row[1]

    return reps


def replace_all(text, reps):
    """Iterate through `reps` and replace key => value in `text`.

    Args:
        text (str): The text to search and replace.
        reps (dict): Search for `key` and replace with `value`.

    Returns:
        str: The string with every replacement applied.

    """
    # Apply every pair in turn instead of returning after the first change.
    for i, j in reps.items():
        text = text.replace(i, j)
    return text

if __name__ == "__main__":
    main(sys.argv)


2 solutions

Solution 1
0 2018-04-05 18:33:42

Solution 2
0 2018-04-06 22:56:40
