Find and Edit Text File

Question

I'm looking to find if there is a way of automating this process. Basically I have 300,000 rows of data needed to download on a daily basis. There are a couple of rows that need to be edited before it can be uploaded to SQL.

Jordan || Michael | 23 | Bulls | Chicago

Bryant | Kobe ||| 8 || LA

What I want to accomplish is to just have 4 vertical bars per row. Normally, I would search for a keyword then edit it manually then save. These two are the only anomalies in my data.

Find "Jordan", then remove the excess 1 vertical bar "|" right after it.
I need to find "Kobe", then remove the two excess vertical bars "|" right after it.

Correct format is below -

Jordan | Michael | 23 | Bulls | Chicago

Bryant | Kobe | 8 || LA

Not sure if this can be done in vbscript or Python. Any help would be appreciated. Thanks!

Answer 1

Python or vbscript could be used but they are overkill for something this simple. Try sed :

$ sed -E 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file 
Jordan | Michael | 23 | Bulls | Chicago
Bryant | Kobe | 8 || LA

To save to a new file:

sed -E 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file >newfile

Or, to change the existing file in-place:

sed -Ei.bak 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file

How it works

sed reads and processes a file line by line. In our case, we need only the substitute command which has the form s/old/new/g where old is a regular expression and, if it is found, it is replaced by new . The optional g at the end of the command tells sed to perform the substitution command 'globally', meaning not just once but as many times as it appears on the line.

s/(Jordan *)\\|/\\1/g

This tells sed to look for Jordan followed by zero or more spaces followed by a vertical bar and remove the vertical bar.
In more detail, the parens in (Jordan *) tell sed to save the string Jordan followed by zero or more spaces as a group. In the replacement side, we reference that group as \\1 .
s/(Kobe *)\\| *\\|/\\1/g

Similarly, this tells sed to look for Kobe followed by zero or more spaces followed by a vertical bar and remove the vertical bar.

Using python

Using the same logic as above, here is a python program:

$ cat kobe.py
import re
with open('file') as f:
    for line in f:
        line = re.sub(r'(Jordan *)\|', r'\1', line)
        line = re.sub(r'(Kobe *)\| *\|', r'\1', line)
        print(line.rstrip('\n'))
$ python kobe.py
Jordan | Michael | 23 | Bulls | Chicago
Bryant | Kobe | 8 || LA

To save that to a new file:

python kobe.py >newfile

Answer 2

I wrote a code snippet in Python 3.5 as follows.

# -*- coding: utf-8 -*-

rows = ["Jordan||Michael|23|Bulls|Chicago",
        "Bryant|Kobe|||8||LA"]

keywords = ["Jordan", "Kobe"]        

def get_keyword(row, keywords):
    for word in keywords:
        if word in row:
            return word
    else:
        return None            

for line in rows:
    num_bars = line.count('|')
    num_bars_del = num_bars - 4  # Number of bars to be deleted
    kw = get_keyword(line, keywords)
    if kw:  # this line contains a keyword
        # Split the line by the keyword
        first, second = line.split(kw)
        second = second.lstrip()
        result = "%s%s%s"%(first, kw, second[num_bars_del:])
        print(result)

Find and Edit Text File

Question

2 answers

solution1
2 ACCPTED 2016-08-24 05:41:23

How it works

Using python

solution2
1 2016-08-24 06:00:00

Find and Edit Text File

Question

2 answers

solution1 2 ACCPTED 2016-08-24 05:41:23

How it works

Using python

solution2 1 2016-08-24 06:00:00

solution1
2 ACCPTED 2016-08-24 05:41:23

solution2
1 2016-08-24 06:00:00