
What is the MapReduce-equivalent capability in Oracle RDBMS?

Is there a feature in Oracle RDBMS equivalent to MapReduce that can use parallel processing to handle large data sets?

Oracle published a white paper on implementing the MapReduce algorithm using PL/SQL. Find it here.

Whilst the code in the white paper works, the underlying premise does not stand up to scrutiny. MapReduce works by applying the brute force of massively parallel operations to sort, filter and transform data. But because of Oracle's licensing policies, hardly anybody can afford enough CPUs to make MapReduce worthwhile on the database.

Fortunately, Oracle's built-in functionality, the stuff we pay for with those eye-watering license fees, is sufficiently powerful to render MapReduce irrelevant. It's better to learn how to use SQL properly, especially analytic functions and (in 12c) MATCH_RECOGNIZE. Oh, and proper data modeling too.


"When I was reading about NoSQL DBs, especially MongoDB, MapReduce was mentioned as a capability difference compared to RDBMS."

warning: opinions ahead

It's a capability difference in the same way that a pirate with a peg leg has the capability to stamp out the holes in ring doughnuts.

Essentially, MapReduce is a process for transforming a bunch of data stored in one undifferentiated mass into a particular shape suited for a particular task. Google devised the MapReduce algorithm to process screeds of web pages, extracting all the different words and totalling the occurrences of each. That's why a word count demo is the MapReduce equivalent of "Hello World".
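To make the word count concrete, here is a minimal single-process sketch of the three phases in Python. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample pages are my own, purely for illustration; a real MapReduce framework would run the map and reduce calls across many machines, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group the emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each word's list of counts into one total."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(documents):
    pairs = []
    # A real cluster would map each document on a different worker;
    # here the loop simply runs them one after another.
    for document in documents:
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical sample data standing in for crawled web pages.
pages = ["the cat sat", "the dog sat down"]
counts = word_count(pages)
print(counts)
```

Note that all the structure (what a "word" is, what gets counted) lives in the job code, not in the data store. That is the point of the paragraphs that follow.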

Stores like MongoDB hold their data in schema-less formats, i.e. documents. That's very good for persisting and retrieving whole documents but not so good for querying parts of a document or joining across multiple documents. That's why they need a capability like MapReduce. There's no intelligence in the store itself.

Oracle doesn't need MapReduce because it has a different paradigm. It holds data in schemas, which apply intelligence and structure to the data, and has a query engine optimized for joining data structures.
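For comparison, here is roughly what the word count looks like once the data lives in a schema. This is a sketch only: it uses Python's built-in `sqlite3` module as a stand-in because it runs anywhere, not because SQLite behaves like Oracle, and the `page_words` table and its columns are hypothetical names I made up. The `GROUP BY` query itself is plain SQL, so the same statement shape would apply in Oracle.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a relational store.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per word occurrence on a page.
conn.execute("CREATE TABLE page_words (page_id INTEGER, word TEXT)")
conn.executemany(
    "INSERT INTO page_words (page_id, word) VALUES (?, ?)",
    [
        (1, "the"), (1, "cat"), (1, "sat"),
        (2, "the"), (2, "dog"), (2, "sat"), (2, "down"),
    ],
)

# The whole "map, shuffle, reduce" pipeline collapses into one
# declarative query; the engine decides how to execute it.
rows = conn.execute(
    """
    SELECT word, COUNT(*) AS occurrences
    FROM page_words
    GROUP BY word
    ORDER BY word
    """
).fetchall()
conn.close()

word_totals = dict(rows)
print(word_totals)
```

The query states what result is wanted and leaves the how to the engine, which is why there is no separate map or reduce step to write.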

Both approaches have their benefits and their costs. An RDBMS offers data integrity and querying speed at the price of upfront design and requiring loads to conform to a fixed structure. A "schema-less" data store makes it easier to persist arbitrarily structured documents but read tasks pay the toll in pre-processing those documents to generate the data sub-sets they need.

Yes. It's called parallel execution.
