Working on migration of SPL 3.0 to 4.2 (TEDA)

I am migrating 3.0 code to the new 4.2 framework and have run into a few difficulties:

  1. How do I do CDR-level deduplication in the new 4.2 framework? (Note: table-level deduplication is already done.)

  2. Where should PostDedupProcessor be implemented: in the context custom or the chainsink custom? In either case, do I need to remove duplicate hashcodes from the list, or just reject the tuples? I am also updating columns for a few tuples here.

  3. My file is not moving to the archive. A temporary output file is generated, but it is empty and sits outside the load directory. What could be the reasons? I have thoroughly checked the config parameters, and the logs I added suggest that the transformer custom is sending correct output, so I don't know where it gets stuck. For logging, I printed the TableRowGenerator stream (end of DataProcessor).

1. and 2.:

You need to select one type of deduplication. It makes little difference whether you choose "table-" or "cdr-level-deduplication", but the setting of ite.businessLogic.transformation.outputType does affect this choice. There is only one Dedup; you cannot have both.

Select recordStream for "cdr-level-deduplication", and do the transformation to table row format (e.g. if you want to use the TableFileWriter) in xxx.chainsink.custom::PostContextDataProcessor. In xxx.chainsink.custom::PostContextDataProcessor you also need to add custom code for duplicate handling: reject (discard) the tuples, set special column values, or write them to different target tables.
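The three duplicate-handling options can be sketched in plain Python. This is only an illustration of the decision logic, not TEDA or SPL code: the `Record` type, the `IS_DUPLICATE` column, the table names, and the `seen` set (standing in for the framework's dedup state) are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class DuplicatePolicy(Enum):
    REJECT = "reject"      # discard the duplicate tuple
    MARK = "mark"          # keep it, but set a special column value
    REROUTE = "reroute"    # write it to a different target table


@dataclass
class Record:
    hashcode: str
    columns: dict = field(default_factory=dict)
    target_table: str = "CDR_MAIN"  # hypothetical table name


def handle_duplicates(records, seen, policy):
    """Return the records to forward downstream, applying the policy to duplicates.

    `seen` is a plain set standing in for the framework's dedup state.
    """
    out = []
    for rec in records:
        if rec.hashcode not in seen:
            seen.add(rec.hashcode)
            out.append(rec)
        elif policy is DuplicatePolicy.MARK:
            rec.columns["IS_DUPLICATE"] = "Y"     # hypothetical column
            out.append(rec)
        elif policy is DuplicatePolicy.REROUTE:
            rec.target_table = "CDR_DUPLICATES"   # hypothetical table
            out.append(rec)
        # DuplicatePolicy.REJECT: drop the tuple, forward nothing
    return out
```

In this sketch, rejecting a duplicate simply means not forwarding the tuple; the recorded hashcodes are left untouched.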

3.:

Possible reasons:

  • Window punctuations or statistic tuples are not being forwarded.
  • An error in the BloomFilter configuration. You would spot this easily, because the PE goes down and the error log hints at wrong sha2 functions being used.
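To make the first point concrete, here is a toy Python model of a swallowed punctuation. It is not the TEDA implementation; it assumes a sink that finalizes its output only when an end-of-window marker arrives, and every name in it (`PUNCT`, `ToySink`, `custom_stage`) is hypothetical.

```python
PUNCT = object()  # stand-in for a window punctuation marker


class ToySink:
    """Buffers rows in a temporary file and 'archives' only on punctuation."""

    def __init__(self):
        self.temp_rows = []  # rows in the still-open temporary file
        self.archived = []   # finalized files, each a list of rows

    def submit(self, item):
        if item is PUNCT:
            self.archived.append(self.temp_rows)
            self.temp_rows = []
        else:
            self.temp_rows.append(item)


def custom_stage(items, forward_punctuation):
    """Pass rows through; forward the punctuation only if asked to."""
    for item in items:
        if item is PUNCT and not forward_punctuation:
            continue  # the bug: the marker is swallowed here
        yield item


def run(forward_punctuation):
    sink = ToySink()
    for item in custom_stage(["row1", "row2", PUNCT], forward_punctuation):
        sink.submit(item)
    return sink
```

With `run(True)` the sink ends up with one archived file and no open temporary data. With `run(False)` the rows stay in the temporary buffer and nothing is archived, which resembles the symptom of a file that never leaves its temporary state.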

To troubleshoot your ITE application, I recommend enabling the following debug sinks if checking the Streams Studio live graph is not sufficient:

  • ite.businessLogic.transformation.debug=on
  • ite.businessLogic.group.debug=on
  • ite.businessLogic.sink.debug=on

Run a test with a single input file only and check the flow of your record and statistic tuples. The debug sinks also write punctuation markers to the debug files.
