
How should I perform data masking with Pentaho PDI (Spoon)?

I need to mask data in more than 10 tables, and each table has more than 100 columns.

I tried to mask the data using the Pentaho PDI tool, but I couldn't figure out how to set up the masking with it.

How should I perform data masking with Pentaho? I think one way is to use the step named "Replace in String", but I couldn't change any string even when I tried it.

My questions are:

  1. Is "Replace in String" the correct way to do data masking?
  2. If it is, how should I fill in the values in the respective fields?

I want to replace part of a value with a masking character such as *. For example, the value "this is sample value" should become something like "txxx xx xxxxxx xxxxe".

[Screenshot of the PDI screen]

Please help.

This isn't really about Kettle, it's about regular expressions. I can confirm that the "Replace in String" step behaves strangely and unpredictably when a regex is used inside it. The official docs don't explain this step in much detail either. In any case, you can use the Regex Evaluation step to capture the part you need and then replace it inside the original string.
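The capture-and-replace idea can be sketched in plain JavaScript with one regular expression and three capture groups. This only illustrates the regex itself, not how the Regex Evaluation step is configured; the function name `maskMiddle` is made up for the example.

```javascript
// Hypothetical helper: keep the first and last character of a string
// and mask every alphanumeric character in between with "x".
function maskMiddle(value) {
  // Group 1: first character, group 2: everything in between, group 3: last character.
  // [\s\S] is used instead of "." so that line breaks are matched too.
  return value.replace(/^([\s\S])([\s\S]*)([\s\S])$/, function (whole, first, middle, last) {
    return first + middle.replace(/[A-Za-z0-9]/g, "x") + last;
  });
}

var masked = maskMiddle("this is sample value");
// masked is "txxx xx xxxxxx xxxxe"
```

Strings shorter than two characters do not match the pattern and are returned unchanged.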

But there is a workaround that makes it easier:


JavaScript-Step with str.replace

This can be done with a JavaScript step, like this:

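// field to mask (data_to_mask is the incoming field)
var str = data_to_mask;

// remember the first character, if it is alphanumeric
var first = str.match(/^[A-Za-z0-9]/);

// remember the last character, if it is alphanumeric
var last = str.match(/[A-Za-z0-9]$/);

// replace every alphanumeric character with "x"
str = str.replace(/[A-Za-z0-9]/g, "x");

// put the first and last characters back (match returns null when there was none)
if (first) str = str.replace(/^x/, first[0]);
if (last) str = str.replace(/x$/, last[0]);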

(Simar's answer works as well, I think, and maybe it's a bit more elegant.)
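Since the question involves many columns, the same masking logic could be wrapped in a function and applied to a list of fields. The following is a plain JavaScript sketch, not PDI-specific code: the `row` object and the `columnsToMask` list are hypothetical, and how fields are exposed to the script inside PDI is not shown here.

```javascript
// Same masking logic as the step script above, wrapped in a function.
function maskValue(value) {
  // Leave missing values alone instead of failing on them.
  if (value === null || value === undefined) return value;
  var str = String(value);
  var first = str.match(/^[A-Za-z0-9]/);
  var last = str.match(/[A-Za-z0-9]$/);
  str = str.replace(/[A-Za-z0-9]/g, "x");
  if (first) str = str.replace(/^x/, first[0]);
  if (last) str = str.replace(/x$/, last[0]);
  return str;
}

// Hypothetical row and column list, for illustration only.
var row = { id: 7, name: "this is sample value", city: "Tokyo" };
var columnsToMask = ["name", "city"];

for (var i = 0; i < columnsToMask.length; i++) {
  var column = columnsToMask[i];
  row[column] = maskValue(row[column]);
}
// row.name is now "txxx xx xxxxxx xxxxe" and row.city is "Txxxo"; row.id is untouched
```

Keeping the column names in one list means a new column only needs to be added in one place.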
