
How should I perform data masking with Pentaho PDI (Spoon)?

I need to perform data masking on more than 10 tables, each of which has more than 100 columns.

I tried to mask the data using the Pentaho PDI tool, but I couldn't figure out how to produce masked data with it.

How should I perform data masking with Pentaho? I think one way is to use the step named "Replace in String", but I couldn't change any string when I tried it.

My questions are:

  1. Is "Replace in String" the correct way to do data masking?
  2. If it is, how should I fill in the values in its fields?

I want to replace the characters of a value with a mask character such as * or x. For example, the value "this is sample value" should become something like "txxx xx xxxxxx xxxxe".

[Image: PDI screen]

Please help.

This is not about Kettle, it's about regular expressions. I can confirm that the "Replace in String" step behaves strangely and unpredictably when a regex is used inside it. The official docs also explain very little about this step. In any case, you can use the Regex Evaluation step to capture the part you need and replace it inside the original string.

But there is a workaround that makes this easier:

[Image: screenshot of the workaround]
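The capture-and-replace idea can be sketched in plain JavaScript. This only illustrates the regular expression logic; it is not a configuration of the Regex Evaluation step itself. The function name `maskByCapture` is hypothetical, and the sketch assumes that only ASCII letters and digits should be masked.

```javascript
// Hypothetical helper (not part of PDI): capture the first and last
// character, mask everything in between, then reassemble the string.
function maskByCapture(value) {
  var text = String(value);
  // Group 1: optional leading letter/digit
  // Group 2: the middle part (as short as possible, may span newlines)
  // Group 3: optional trailing letter/digit
  var parts = /^([A-Za-z0-9]?)([\s\S]*?)([A-Za-z0-9]?)$/.exec(text);
  // Only letters and digits in the middle are masked; spaces and
  // punctuation are left untouched.
  var maskedMiddle = parts[2].replace(/[A-Za-z0-9]/g, "x");
  return parts[1] + maskedMiddle + parts[3];
}
```

With this sketch, `maskByCapture("this is sample value")` returns "txxx xx xxxxxx xxxxe".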

JavaScript step with str.replace

This can be done with a JavaScript step, for example:

//input field (data_to_mask is the name of the incoming field)
var str = data_to_mask;

//first letter (match returns an array, or null if there is no match)
var first = str.match(/^[A-Za-z0-9]/);

//last letter
var last = str.match(/[A-Za-z0-9]$/);

//replace all letters and digits with "x"
str = str.replace(/[A-Za-z0-9]/g, "x");

//put the first and the last letter back, if there were any
if (first) str = str.replace(/^x/, first[0]);
if (last) str = str.replace(/x$/, last[0]);

(I think Simar's answer works as well, and maybe it's a bit more elegant :)
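As a variation on the JavaScript step above, the same masking can be done in a single pass by giving `replace` a callback, and the logic can be wrapped in a function so it is reusable across many columns. This is a minimal sketch in plain JavaScript: the names `maskKeepEnds` and `maskFields` and the row-as-object shape are assumptions made for illustration, and how the function is wired to the fields of a PDI step is not shown.

```javascript
// Hypothetical helper: mask ASCII letters and digits with maskChar,
// keeping the characters at the first and last position unchanged.
function maskKeepEnds(value, maskChar) {
  if (value === null || value === undefined) {
    return value; // leave missing values untouched
  }
  var text = String(value);
  var lastIndex = text.length - 1;
  // The callback receives the matched character and its offset.
  return text.replace(/[A-Za-z0-9]/g, function (ch, offset) {
    return (offset === 0 || offset === lastIndex) ? ch : maskChar;
  });
}

// Hypothetical helper: return a copy of a row (a plain object) in which
// the listed fields are masked and all other fields are copied as-is.
function maskFields(row, fieldNames, maskChar) {
  var masked = {};
  for (var key in row) {
    if (Object.prototype.hasOwnProperty.call(row, key)) {
      masked[key] = row[key];
    }
  }
  for (var i = 0; i < fieldNames.length; i++) {
    masked[fieldNames[i]] = maskKeepEnds(row[fieldNames[i]], maskChar);
  }
  return masked;
}
```

For example, `maskFields({ id: 7, name: "this is sample value" }, ["name"], "x")` leaves `id` unchanged and turns `name` into "txxx xx xxxxxx xxxxe". Passing the mask character as a parameter makes it easy to switch between x and *.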
