Obfuscating data in a .csv using a word list from a .txt file

Question

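I want to obfuscate words that appear in a column of a .csv file, based on a list of data to remove that is stored in a separate .txt file.

Ideally I would be able to ignore the case of my data and then, in the .csv file, replace every word that matches an entry in the "to remove" file with '*' characters. I am not sure of the best way to replace words in the .csv file while ignoring case. What I have so far does not work, and I am open to solutions.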

Example data file:

This is a line of text in .csv column that I want to remove a word from or data such as 123 from.

My .txt file would be a list of the data to remove:

want
remove
123

The output should be:

This is a line of text in .csv column that I **** to ****** a word from or data such as *** from.

My code:

import csv

with open('MyFileName.csv' , 'rb') as csvfile, open ('DataToRemove.txt', 'r') as removetxtfile:
    reader = csv.reader(csvfile)
    reader.next()
    for row in reader:
        csv_words = row[3].split(" ") #Gets the word for the 4th column in .csv file
            for line in removetxtfile:
                for wordtoremove in line.split():
                    if csv_words.lower() ==  wordtoremove.lower()
                        csv_words = csv_words.replace(wordtoremove.lower(), '*' * len(csv_words))

Answer 1

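I would first build a set of censor words. My input is basically a plain text file of newline-separated words. If your text file is laid out differently, you may need to parse it differently.

Other thoughts:

Write the censored output to a separate file instead of trying to overwrite your input file. That way, if you get the algorithm wrong, you do not lose data.

Calling .split(" ") on the 4th column is only needed if that column holds several space-separated words. If it does not, you can skip the for w in csv_words loop, which iterates over all the words in the 4th column.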

import csv
import re
import string

# Split on runs of whitespace and/or ASCII punctuation
PUNCTUATION_SPLIT_REGEX = re.compile(r'[\s{}]+'.format(re.escape(string.punctuation)))

# construct a set of lowercased words to censor
censor_words = set()
with open('DataToRemove.txt', 'r') as removetxtfile:
    for line in removetxtfile:
        for w in PUNCTUATION_SPLIT_REGEX.split(line):
            if w:  # skip empty strings produced by the split
                censor_words.add(w.lower())

# Python 3: open csv files in text mode with newline='' (Python 2 used 'rb' instead)
with open('MyFileName.csv', newline='') as csvfile, \
        open('CensoredFileName.csv', 'w', newline='') as f:
    reader = csv.reader(csvfile)
    writer = csv.writer(f)
    # next(reader)  # uncomment to skip a header row (it will then not be written out)
    for row in reader:
        csv_words = row[3].split(' ')  # words in the 4th column of the .csv file
        new_column = []
        for w in csv_words:
            if w.lower() in censor_words:
                new_column.append('*' * len(w))
            else:
                new_column.append(w)
        row[3] = ' '.join(new_column)
        writer.writerow(row)  # csv.writer keeps delimiters and quoting intact
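
The loop above compares whole space-separated tokens, so a word with punctuation attached (for example "from.") would not match its entry in the list. As a possible alternative, here is a minimal sketch that uses a single case-insensitive regular expression with word boundaries. The helper names build_censor_pattern and censor are made up for this illustration and are not part of the original answer; the sketch also assumes each entry in the list begins and ends with a letter, digit, or underscore, since \b only matches next to such characters.

```python
import re

def build_censor_pattern(words):
    """Compile one case-insensitive pattern that matches any listed word as a whole word."""
    # Hypothetical helper: longest entries first, so a longer entry wins over its own prefix.
    ordered = sorted((w for w in words if w), key=len, reverse=True)
    alternatives = '|'.join(re.escape(w) for w in ordered)
    return re.compile(r'\b(?:{})\b'.format(alternatives), re.IGNORECASE)

def censor(text, pattern):
    """Replace every match with a run of asterisks of the same length."""
    return pattern.sub(lambda m: '*' * len(m.group(0)), text)

pattern = build_censor_pattern(['want', 'remove', '123'])
line = ('This is a line of text in .csv column that I WANT to remove '
        'a word from or data such as 123 from.')
print(censor(line, pattern))
```

In the csv loop, this would replace the inner word loop with row[3] = censor(row[3], pattern). Because the substitution works on the matched text itself, the original casing of the rest of the column is left untouched.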
