How do I create reversible fake data in Python?
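I'm trying to come up with a data masking technique that replaces actual data with reversible fake data.
For example:
If my data consists of the string 'Hello', I want to mask it as 'Hi' and then be able to restore the original string 'Hello' using a key or some algorithm.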
'Hello'
--- Mask ---> 'Hi'
--- unMask ---> 'Hello'
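I did some research and found the Fisher-Yates shuffle algorithm, which might work in my case.
I thought about implementing it by shuffling the string and then restoring it with some mechanism such as a key.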
'Hello'
--- Mask ---> 'ellho'
--- unMask ---> 'Hello'
However, I'm not quite sure how to implement this approach.
Please advise.
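The keyed Fisher-Yates idea from the question can be sketched as follows. This is a minimal illustration, not code from the thread: the helper names keyed_permutation, shuffle_mask and shuffle_unmask are hypothetical, and the "key" is assumed to be any string used to seed Python's random.Random. Seeding with the same key regenerates the same sequence of swaps, so the shuffle can be undone by putting each character back where it came from.

```python
import random


def keyed_permutation(n: int, key: str) -> list[int]:
    """Return a permutation of range(n) produced by a Fisher-Yates shuffle seeded with key."""
    rng = random.Random(key)  # the same key always yields the same sequence of swaps
    perm = list(range(n))
    # Durstenfeld's in-place Fisher-Yates: walk backwards, swapping each slot
    # with a randomly chosen slot at or before it
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i + 1)  # 0 <= j <= i
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def shuffle_mask(text: str, key: str) -> str:
    perm = keyed_permutation(len(text), key)
    # Output position i takes the character from input position perm[i]
    return "".join(text[p] for p in perm)


def shuffle_unmask(masked: str, key: str) -> str:
    perm = keyed_permutation(len(masked), key)  # regenerate the identical permutation
    original = [""] * len(masked)
    for i, p in enumerate(perm):
        original[p] = masked[i]  # send each character back to where it came from
    return "".join(original)


masked = shuffle_mask("Hello", "my secret key")
print(masked)
print(shuffle_unmask(masked, "my secret key"))  # Hello
```

random.Random is not a cryptographic generator, so this only obfuscates. Its randrange sequence is also not guaranteed to stay identical across Python versions, so for long-lived masked data it may be safer to store the permutation itself. Note that 'Hello' contains two 'l's, so a particular shuffle can happen to reproduce the original string.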
The first thing that comes to mind is to encode and then decode the string.
Or something fun like this, though it can easily be cracked:
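text = "String to encode"
print(text)
# Reverse the string to mask it. Avoid str.capitalize() here: it lowercases every
# letter after the first, so the original could not always be restored.
masked = text[::-1]
print(masked)
# Reversing again restores the original
original_text = masked[::-1]
print(original_text)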
Output:
String to encode
edocne ot gnirtS
String to encode
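The "encode then decode" idea mentioned at the start of this answer could look like the sketch below, using the standard library's base64 module. This is a swapped-in illustration rather than the answerer's code: Base64 is reversible by anyone without a key, and it makes the output roughly a third longer, so it only hides data from casual view.

```python
import base64

text = "String to encode"

# Encode: UTF-8 bytes -> Base64 ASCII text (reversible, but anyone can decode it)
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(encoded)

# Decode: Base64 text -> bytes -> original string
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # String to encode
```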
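What you want is a bijective function on the set of strings. Shuffling the letters is the simplest way to get a bijection, because a rearrangement can always be undone; as a consequence, every word is mapped to a word of the same length. A rearrangement can be described by where each character's index moves to.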
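To make "where each index moves to" concrete, here is a small sketch that treats a key as a permutation list and computes its inverse. The names permute and invert, and the example key [2, 0, 4, 1, 3], are hypothetical; the point is that applying the inverse permutation undoes the shuffle exactly.

```python
def invert(perm: list[int]) -> list[int]:
    """Return the inverse permutation: if perm sends index i to perm[i], the inverse sends it back."""
    inverse = [0] * len(perm)
    for i, p in enumerate(perm):
        inverse[p] = i
    return inverse


def permute(perm: list[int], word: str) -> str:
    """Move the character at index i to index perm[i]."""
    out = [""] * len(word)
    for i, p in enumerate(perm):
        out[p] = word[i]
    return "".join(out)


perm = [2, 0, 4, 1, 3]             # hypothetical key for 5-letter words
masked = permute(perm, "hello")
print(masked)                      # elhol
print(invert(perm))                # [1, 3, 0, 4, 2]
print(permute(invert(perm), masked))  # hello
```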
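Here is a way to shuffle with a key. This is not encryption of any kind; I see that was discussed in the comments, but it wasn't part of the original question. If security matters, don't use this.
I would make one key for every possible string length.
This was untested at first since I had no interpreter at hand. Update: tested, and the bugs are fixed. Example output: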
>>> mask("hello")
'eholl'
>>> unmask("eholl")
'hello'
>>> mask("foobarbaz")
'faorbzboa'
>>> unmask("faorbzboa")
'foobarbaz'
The two functions are inverses of each other:
>>> mask(unmask("bazzbazzbazz"))
'bazzbazzbazz'
>>> unmask(mask("bazzbazzbazz"))
'bazzbazzbazz'
I also realized that which function is called "mask" and which is called "unmask" is arbitrary, so you can use them either way round:
>>> unmask("hello")
'eholl'
>>> mask("eholl")
'hello'
Code:
import random

keys = {}
# Pneumonoultramicroscopicsilicovolcanoconiosis, the longest word in English, has 45 characters,
# so build keys for lengths 2..45 inclusive (range's end is exclusive, hence 46).
for i in range(2, 46):  # you can't shuffle strings of length 0 or 1
    key = list(range(i))
    # Reshuffle in case the result is the identity permutation: unlikely but possible,
    # since random.shuffle has no built-in check for this.
    while key == list(range(i)):
        random.shuffle(key)
    keys[i] = key
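The keys above are regenerated at random every time the script runs, so anything masked in one run cannot be unmasked in a later run unless the keys are stored. A minimal sketch of saving and reloading them with the standard json module follows; it mirrors the answer's key generation (lengths 2 to 45) and uses an in-memory string, though json.dump/json.load would work the same way with a file. One detail to watch: JSON object keys are always strings, so the word lengths must be converted back to int.

```python
import json
import random

# Build one non-identity permutation per word length, mirroring the answer's code
keys = {}
for n in range(2, 46):
    key = list(range(n))
    while key == list(range(n)):
        random.shuffle(key)
    keys[n] = key

# Save the keys; without them, masked data cannot be unmasked later
saved = json.dumps(keys)

# Load them back. JSON object keys come back as strings, so convert them to int
loaded = {int(length): perm for length, perm in json.loads(saved).items()}
print(loaded == keys)  # True
```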
Now we can use the keys to shuffle:
def mask(word: str) -> str:
    key = keys[len(word)]  # KeyError for lengths 0 and 1, which have no key
    # I'm quite certain there will be some builtin library that can do this with one
    # function call and efficiently, but I'll do it manually here.
    new_word_characters = [""] * len(word)
    # Move the character at position j to position key[j]
    for i, character in zip(key, word):
        new_word_characters[i] = character
    new_word = "".join(new_word_characters)
    return new_word


def unmask(word: str) -> str:
    key = keys[len(word)]
    new_word_characters = [""] * len(word)
    # Position k gets back the character that mask() moved to position key[k]
    for k, i in enumerate(key):
        new_word_characters[k] = word[i]
    new_word = "".join(new_word_characters)
    return new_word
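Because there is one key per word length, a whole sentence can be masked by masking each word separately and keeping the spaces. The sketch below is self-contained, so it regenerates keys the same way as the answer (seeded here only to make the demo repeatable) and redefines mask/unmask with a guard that leaves 0- and 1-character words unchanged; the names mask_sentence and unmask_sentence are hypothetical. Words longer than 45 characters still raise KeyError rather than silently passing through unmasked.

```python
import random

# Per-length keys built like the answer's (lengths 2..45); the seed is for the demo only
rng = random.Random("demo seed")
keys = {}
for n in range(2, 46):
    key = list(range(n))
    while key == list(range(n)):
        rng.shuffle(key)
    keys[n] = key


def mask(word: str) -> str:
    if len(word) < 2:
        return word  # nothing to shuffle
    key = keys[len(word)]  # raises KeyError for words longer than 45
    out = [""] * len(word)
    for i, character in zip(key, word):
        out[i] = character
    return "".join(out)


def unmask(word: str) -> str:
    if len(word) < 2:
        return word
    key = keys[len(word)]
    return "".join(word[i] for i in key)


def mask_sentence(sentence: str) -> str:
    # Splitting on " " keeps every word's length and position, including empty words
    return " ".join(mask(w) for w in sentence.split(" "))


def unmask_sentence(sentence: str) -> str:
    return " ".join(unmask(w) for w in sentence.split(" "))


s = "a data masking example"
m = mask_sentence(s)
print(m)
print(unmask_sentence(m))  # a data masking example
```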
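Following up on my suggestion: here is an (overkill) example of AES encryption in CTR mode, written as a unittest test method. As I wrote in the comments, the encryption keeps the plaintext length unchanged, but only in binary form. Since printable text is needed here, the example switches to hex output, which doubles the length. Encrypting line by line may not make sense, but the example should give an idea of how to reach the goal. If security is not a goal of this fake-data masking, the encryption/decryption methods could be replaced with a Lorenz cipher for simplicity.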
def test_plaintext_encryption(self):
    plaintext = 'some words to encrypt'
    # Remember each word's length so the spaces can be restored later
    words_lengths = [len(item) for item in plaintext.split(" ")]
    plaintext_joined = plaintext.replace(" ", "")

    encryptor = Encryption('some key', 'some nonce')
    encryptor.init_encryption()
    encryptor.update_payload_to_encrypt(plaintext_joined.encode())
    # Hex-encode the ciphertext so it is printable (this doubles its length)
    cipher_as_text = encryptor.encrypted_payload.hex()
    self.assertEqual("c8638dd3ee70e8a7bf9c1c943507fe61b8cb", cipher_as_text)

    # Split the hex string back into "words": 2 hex digits per plaintext character
    split_encrypted_in = []
    for word_len in words_lengths:
        split_encrypted_in.append(cipher_as_text[:2 * word_len])
        cipher_as_text = cipher_as_text[2 * word_len:]
    split_encrypted = " ".join(split_encrypted_in)
    self.assertEqual("c8638dd3 ee70e8a7bf 9c1c 943507fe61b8cb", split_encrypted)

    decryptor = Encryption('some key', 'some nonce')
    decryptor.init_decryption()
    joined_encrypted = split_encrypted.replace(" ", "")
    self.assertEqual("c8638dd3ee70e8a7bf9c1c943507fe61b8cb", joined_encrypted)
    decryptor.update_payload_to_decrypt(bytes.fromhex(joined_encrypted))
    plaintext_joined = decryptor.decrypted_payload.decode()
    self.assertEqual("somewordstoencrypt", plaintext_joined)

    # Recover the word lengths from the hex "words" (integer division, so they can be used as slice bounds)
    plaintext_words_lengths = [len(item) // 2 for item in split_encrypted.split(" ")]
    self.assertEqual([4, 5, 2, 7], plaintext_words_lengths)
    plaintext_words = []
    for word_len in plaintext_words_lengths:
        plaintext_words.append(plaintext_joined[:word_len])
        plaintext_joined = plaintext_joined[word_len:]
    decrypted_plaintext = ' '.join(plaintext_words)
    self.assertEqual("some words to encrypt", decrypted_plaintext)
It is based on this class (named Encryption), which uses the cryptography package:
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


class Encryption(object):
    def __init__(self, key='aKeyNobodyWIllEverUse', nonce='PleaseMakeMeRandomEachTime'):
        # Repeat the key until it is at least 32 bytes, then truncate (AES-256 key size)
        key = str(key).encode()
        while len(key) < 32:
            key += key
        key = key[:32]
        # CTR mode takes a 16-byte nonce (one AES block); it must not be reused with the same key
        nonce = str(nonce).encode()
        while len(nonce) < 16:
            nonce += nonce
        nonce = nonce[:16]
        backend = default_backend()
        self._cipher = Cipher(algorithms.AES(key), modes.CTR(nonce), backend=backend)
        self._encryptor = None
        self._encrypted_payload = None
        self.init_encryption()
        self._decryptor = None
        self._decrypted_payload = None
        self.init_decryption()

    def init_encryption(self):
        # A fresh encryptor restarts the CTR keystream
        self._encryptor = self._cipher.encryptor()
        self._encrypted_payload = None

    def update_payload_to_encrypt(self, payload):
        # payload must be bytes; CTR output has the same length as the input
        if self._encryptor:
            self._encrypted_payload = self._encryptor.update(payload)

    @property
    def encrypted_payload(self):
        if self._encrypted_payload:
            return self._encrypted_payload
        return b''

    def init_decryption(self):
        self._decryptor = self._cipher.decryptor()
        self._decrypted_payload = None

    def update_payload_to_decrypt(self, payload):
        if self._decryptor:
            self._decrypted_payload = self._decryptor.update(payload)

    @property
    def decrypted_payload(self):
        if self._decrypted_payload:
            return self._decrypted_payload
        return b''
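The answer's closing suggestion, a Lorenz cipher when security is not required, relies on the same principle as CTR mode: XOR the data with a keystream, and XOR again with the same keystream to undo it. The sketch below is a toy XOR stream cipher in that spirit, not a model of the actual Lorenz machine; the name xor_keystream is hypothetical and the keystream comes from random.Random, which is not cryptographically secure. It also shows the length point from the answer: the binary ciphertext is as long as the plaintext, while its hex form is twice as long.

```python
import random


def xor_keystream(data: bytes, key: str) -> bytes:
    """XOR data with a pseudo-random keystream seeded by key.

    Applying it twice with the same key returns the original bytes.
    NOT secure: random.Random is not a cryptographic generator.
    """
    rng = random.Random(key)
    return bytes(b ^ rng.randrange(256) for b in data)


plaintext = "somewordstoencrypt".encode("utf-8")
cipher = xor_keystream(plaintext, "some key")
print(len(plaintext), len(cipher))  # 18 18: same length in binary form
print(len(cipher.hex()))            # 36: hex output doubles the length
print(xor_keystream(cipher, "some key").decode("utf-8"))  # somewordstoencrypt
```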
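Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.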