Find and edit a text file

Question

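I am looking for a way to automate this process. Basically, I need to download 300,000 lines of data every day. A few lines need to be edited before the data is uploaded to SQL.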

Jordan || Michael | 23 | Bulls | Chicago

Bryant | Kobe ||| 8 || LA

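What I want to end up with is exactly 4 vertical bars per line. Normally I search for a keyword, edit the line by hand, and save. These two are the only anomalies in my data.

Find "Jordan", then delete the 1 extra vertical bar "|" after it.
Find "Kobe", then delete the 2 extra vertical bars "|" after it.

The correct format is as follows: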

Jordan | Michael | 23 | Bulls | Chicago

Bryant | Kobe | 8 || LA

I don't know whether this can be done in VBScript or Python. Any help would be appreciated. Thank you!

Answer 1

Python or VBScript would work, but they are overkill for something this simple. Try sed:

$ sed -E 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file 
Jordan | Michael | 23 | Bulls | Chicago
Bryant | Kobe | 8 || LA

To save the result to a new file:

sed -E 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file >newfile

Or, to change the existing file in place (keeping a backup with the .bak extension):

sed -Ei.bak 's/(Jordan *)\|/\1/g; s/(Kobe *)\| *\|/\1/g' file

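How it works

sed reads and processes the file line by line. In our case we only need the substitute command, which has the form s/old/new/g, where old is a regular expression and, whenever it matches, the matched text is replaced with new. The optional g at the end tells sed to apply the substitution "globally": to every match on the line, not just the first one.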
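
To see the effect of the trailing g outside of sed, here is a minimal Python sketch. Python's re.sub replaces every match by default, so count=1 stands in for a sed command without g. The sample string is made up purely for illustration.

```python
import re

# Hypothetical sample: the pattern "a" occurs three times on one line.
line = "a-a-a"

# Like s/a/X/ (no g): only the first occurrence is replaced.
first_only = re.sub(r"a", "X", line, count=1)

# Like s/a/X/g: every occurrence on the line is replaced.
every_match = re.sub(r"a", "X", line)

print(first_only)   # X-a-a
print(every_match)  # X-X-X
```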

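s/(Jordan *)\|/\1/g

This tells sed to look for Jordan followed by zero or more spaces, followed by a vertical bar, and to remove that vertical bar. Because -E enables extended regular expressions, \| matches a literal vertical bar.

In more detail, the parentheses in (Jordan *) tell sed to save the string Jordan, together with the zero or more spaces after it, as a group. In the replacement text that group is referred to as \1, so the match is replaced by everything except the bar.

s/(Kobe *)\| *\|/\1/g

Similarly, this tells sed to look for Kobe followed by zero or more spaces, a vertical bar, zero or more spaces, and a second vertical bar, and to remove both bars.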
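
The same two patterns can be inspected in Python, whose re module accepts this group syntax as well. This is only a sketch for checking what the group captures and what the substitution leaves behind; the sample lines are the two from the question.

```python
import re

jordan = "Jordan || Michael | 23 | Bulls | Chicago"
kobe = "Bryant | Kobe ||| 8 || LA"

# Group 1 holds the keyword plus any spaces that follow it;
# group 0 is the whole match, including the bar to be dropped.
m = re.search(r"(Jordan *)\|", jordan)
print(repr(m.group(1)))  # 'Jordan '
print(repr(m.group(0)))  # 'Jordan |'

# Replacing the whole match with group 1 removes the bar(s) only.
fixed_jordan = re.sub(r"(Jordan *)\|", r"\1", jordan)
fixed_kobe = re.sub(r"(Kobe *)\| *\|", r"\1", kobe)
print(fixed_jordan)  # Jordan | Michael | 23 | Bulls | Chicago
print(fixed_kobe)    # Bryant | Kobe | 8 || LA
```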

Using Python

Using the same logic as above, here is a Python program:

$ cat kobe.py
import re
with open('file') as f:
    for line in f:
        line = re.sub(r'(Jordan *)\|', r'\1', line)
        line = re.sub(r'(Kobe *)\| *\|', r'\1', line)
        print(line.rstrip('\n'))
$ python kobe.py
Jordan | Michael | 23 | Bulls | Chicago
Bryant | Kobe | 8 || LA

To save the output to a new file:

python kobe.py >newfile
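
For an in-place edit comparable to sed -i.bak, one option is the standard-library fileinput module. A minimal sketch, assuming the data sits in a file whose path you pass in; the function name fix_in_place is made up for this example.

```python
import fileinput
import re

def fix_in_place(path):
    """Rewrite `path` in place, keeping the original as `path + '.bak'`."""
    # With inplace=True, anything printed inside the loop is written
    # back to the file instead of the terminal.
    with fileinput.input(files=[path], inplace=True, backup=".bak") as f:
        for line in f:
            line = re.sub(r"(Jordan *)\|", r"\1", line)
            line = re.sub(r"(Kobe *)\| *\|", r"\1", line)
            print(line, end="")
```

Calling fix_in_place('file') would then leave the corrected data in file and the untouched original in file.bak.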

Answer 2

I wrote a snippet in Python 3.5, shown below.

# -*- coding: utf-8 -*-

rows = ["Jordan||Michael|23|Bulls|Chicago",
        "Bryant|Kobe|||8||LA"]

keywords = ["Jordan", "Kobe"]

def get_keyword(row, keywords):
    """Return the first keyword found in row, or None if there is none."""
    for word in keywords:
        if word in row:
            return word
    return None

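for line in rows:
    num_bars = line.count('|')
    num_bars_del = num_bars - 4  # Number of bars to be deleted
    kw = get_keyword(line, keywords)
    if kw and num_bars_del > 0:  # keyword present and there are extra bars
        # Split the line at the first occurrence of the keyword only
        first, second = line.split(kw, 1)
        second = second.lstrip()
        # Drop the extra bars that immediately follow the keyword
        result = "%s%s%s" % (first, kw, second[num_bars_del:])
        print(result)
    else:
        # Lines without anomalies are passed through unchanged
        print(line)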
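
The slicing idea above can also be wrapped in a function that tolerates spaces around the bars, as in the question's sample data. This is a sketch only: the name drop_extra_bars and the expected_bars parameter are made up here, and it assumes the surplus bars always come directly after the keyword, as in the two sample rows.

```python
def drop_extra_bars(line, keywords=("Jordan", "Kobe"), expected_bars=4):
    """Remove surplus '|' characters that directly follow a keyword."""
    extra = line.count("|") - expected_bars
    if extra <= 0:
        return line  # already has the expected number of bars (or fewer)
    for kw in keywords:
        head, found, tail = line.partition(kw)
        if not found:
            continue  # this keyword is not in the line
        # Keep the spaces between the keyword and the first bar.
        rest = tail.lstrip(" ")
        leading = tail[:len(tail) - len(rest)]
        # Strip one bar at a time until the surplus is gone.
        while extra > 0 and rest.startswith("|"):
            rest = rest[1:].lstrip(" ")
            extra -= 1
        return head + kw + leading + rest
    return line  # no keyword found: leave the line alone

print(drop_extra_bars("Jordan || Michael | 23 | Bulls | Chicago"))
print(drop_extra_bars("Bryant | Kobe ||| 8 || LA"))
```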
