
sed to python replace extra delimiters in a

sed 's/\t/_tab_/3g'

I have a sed command that replaces all excess tab delimiters in my text document. My documents are supposed to have 3 columns, but occasionally there's an extra delimiter. I don't have control over the files.

I use the above command to clean up the document. However, all my other operations on these files are in Python. Is there a way to do the above sed command in Python?

Sample input:

Column1   Column2         Column3
James     1,203.33        comment1
Mike      -3,434.09       testing testing 123
Sarah     1,343,342.23    there   here

Sample output:

Column1   Column2         Column3
James     1,203.33        comment1
Mike      -3,434.09       testing_tab_testing_tab_123
Sarah     1,343,342.23    there_tab_here

You may read the file line by line, split each line on tabs, and, if there are more than 3 items, join the items from the 3rd one onward with _tab_:

lines = []
with open('inputfile.txt', 'r') as fr:
    for line in fr:
        split = line.split('\t')
        if len(split) > 3:
            tmp = split[:2]                      # Slice the first two items
            tmp.append("_tab_".join(split[2:]))  # Append the rest joined with _tab_
            lines.append("\t".join(tmp))         # Use the updated line
        else:
            lines.append(line)                   # Else, put the line as is

See the Python demo.

The lines variable will contain something like:

Mike    -3,434.09   testing_tab_testing_tab_123
Mike    -3,434.09   testing_tab_256
No  operation   here
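
A more compact variant of the same idea, offered as a sketch rather than as part of the original answer: `str.split` accepts a `maxsplit` argument, so splitting at most twice leaves every extra tab inside the last field, where `str.replace` can rewrite it. The function name `collapse_extra_tabs` and its parameters are hypothetical, chosen only for illustration.

```python
def collapse_extra_tabs(line, columns=3, replacement="_tab_"):
    """Keep the first `columns - 1` tabs and replace any later ones."""
    # maxsplit=columns - 1 yields at most `columns` fields; any surplus
    # tabs stay inside the last field.
    fields = line.split("\t", columns - 1)
    fields[-1] = fields[-1].replace("\t", replacement)
    return "\t".join(fields)


# Rows modelled on the sample input from the question.
sample = [
    "James\t1,203.33\tcomment1\n",
    "Mike\t-3,434.09\ttesting\ttesting\t123\n",
    "Sarah\t1,343,342.23\tthere\there\n",
]
cleaned = [collapse_extra_tabs(line) for line in sample]
```

Because the trailing newline stays attached to the last field, the cleaned lines can be written straight back to a file.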

import os
# file_path is assumed to hold the path of the file to clean.
os.system("sed -i 's/\t/_tab_/3g' " + file_path)

Does this work? Note the -i argument in the sed command above, which modifies the input file in place.

You can mimic the sed behavior in Python:

import re

pattern = re.compile(r'\t')
string = 'Mike\t3,434.09\ttesting\ttesting\t123'
replacement = '_tab_'
count = -1
spans = []
start = 2 # Starting index of matches to replace (0 based)
for match in re.finditer(pattern, string):
    count += 1
    if count >= start:
        spans.append(match.span())
spans.reverse()
new_str = string
for sp in spans:
    new_str = new_str[0:sp[0]] + replacement + new_str[sp[1]:]

And now new_str is 'Mike\t3,434.09\ttesting_tab_testing_tab_123'.

You can wrap it in a function and repeat for every line. However, note that this GNU sed behavior isn't standard:

'NUMBER' Only replace the NUMBERth match of the REGEXP.

Note: the POSIX standard does not specify what should happen when you mix the 'g' and NUMBER modifiers, and currently there is no widely agreed upon meaning across 'sed' implementations. For GNU 'sed', the interaction is defined to be: ignore matches before the NUMBERth, and then match and replace all matches from the NUMBERth on.
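
Wrapped in a function, the finditer approach above might look like the following sketch. The name `sub_from_nth` and its parameters are assumptions made for illustration. Here `nth` is 1-based to mirror the NUMBER in GNU sed's `s/REGEXP/REPLACEMENT/NUMBERg`, and the replacement is treated as literal text with no backreference expansion.

```python
import re


def sub_from_nth(pattern, replacement, string, nth):
    """Replace the nth match (1-based) and every later match in `string`.

    `replacement` is inserted literally; backreferences are not expanded.
    """
    pieces = []
    last_end = 0
    for index, match in enumerate(re.finditer(pattern, string), start=1):
        if index < nth:
            continue  # leave the earlier matches untouched
        pieces.append(string[last_end:match.start()])
        pieces.append(replacement)
        last_end = match.end()
    pieces.append(string[last_end:])  # text after the last replaced match
    return "".join(pieces)


line = "Mike\t3,434.09\ttesting\ttesting\t123"
result = sub_from_nth(r"\t", "_tab_", line, 3)
```

Since sed applies the substitution to each line separately, the function would be called once per line when processing a file.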
