Teradata 14.10 ups in-memory and in-database analytics
The recently released Teradata 14.10 platform adds several features that one-up some of its newer analytic platform rivals. Highlights include dynamic tiering of hot data into memory, expanded support for in-database analytics, better connectivity to Hadoop, and new optimizations for running R that fully exploit parallel processing.
Some of the enhancements, such as in-memory tiering, in-database analytic functions, and tighter Hadoop integration, are not necessarily unique to Teradata, but the implementations are. As for R scale-up, Teradata is uniquely applying MapReduce-like enhancements to enable R to better utilize the platform’s massively parallel architecture.
In sum, the enhancements are essential for maintaining Teradata’s premium positioning for scalability and performance for workloads that still require the service levels and data protections offered by the SQL environment.
Taking advantage of memory
Memory is an ongoing theme for all data platforms, both analytic and transactional, with the price of memory (and flash storage) dropping to levels where it becomes possible not simply to cache data, but to persist it. While some providers, such as SAP and Kognitio, promote or offer all-in-memory data stores as an option, Teradata believes such a strategy is overkill.
No matter how cheap memory gets, disk (and, to a lesser extent, flash) is still cheaper. Based on monitoring of several hundred customers, Teradata found that barely 1% of the data accounted for 50% of the IOPS (input/output operations per second), while 10% of the data accounted for 85% of the load. So it has introduced Teradata Intelligent Memory, a configurable memory manager for allocating how much memory is reserved for cache and how much is used for storing hot data. For now, the settings must be entered manually, but on the horizon, Teradata will introduce tools that optimize the settings.
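The skew described above is what makes selective tiering pay off. The following is a minimal sketch of the idea, not Teradata's actual algorithm: it ranks data blocks by access count and keeps only the hottest ones within a fixed memory budget. The function names, the block-id access log, and the synthetic workload are all assumptions made for illustration.

```python
from collections import Counter


def pick_hot_blocks(access_log, memory_budget):
    """Return the ids of the blocks to keep in memory, hottest first.

    access_log: iterable of block ids, one entry per I/O (hypothetical input).
    memory_budget: number of blocks that fit in the hot-data area (hypothetical unit).
    """
    counts = Counter(access_log)
    # most_common() orders blocks by descending access count.
    return [block for block, _ in counts.most_common(memory_budget)]


def io_share(access_log, hot_blocks):
    """Fraction of all I/Os that touch a block in the hot set."""
    hot = set(hot_blocks)
    hits = sum(1 for block in access_log if block in hot)
    return hits / len(access_log)


# Synthetic, deliberately skewed workload (not measured data):
# 100 blocks, where block 0 alone receives half of the 198 I/Os.
log = [0] * 99 + list(range(1, 100))
hot = pick_hot_blocks(log, memory_budget=1)
print(hot, io_share(log, hot))  # prints: [0] 0.5
```

In this toy workload, holding 1% of the blocks in memory covers half of the I/O, which mirrors the shape of the customer figures quoted above without reproducing them.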
Today, data tiering (selectively placing data in the right area of storage) in memory is table stakes for analytic platforms. Teradata’s extended memory enhancements keep the platform competitive, but not necessarily ahead of the pack. For instance, IBM recently introduced similar enhancements as part of its BLU architecture with DB2 10.5, and Oracle Exadata has doubled its memory, quadrupled its flash storage, and automates tiering (admittedly, Exadata is a broader platform).
Extended memory management is just one piece of the puzzle when it comes to optimizing performance. There are also classic features, such as query optimization that tunes SQL for the data layout, and emerging approaches, such as hybrid columnar support (which Teradata offers) and data skipping (which IBM has, but Teradata does not). The bottom line is that Teradata Intelligent Memory keeps the platform quite competitive, but not ahead of the pack.
In-database analytic functions
Teradata has made an important leap ahead with in-database analytic functions, where capabilities are embedded inside the database (rather than in the application tier). Concurrent with the 14.10 announcement, Teradata has expanded its existing relationship with Fuzzy Logix, a third party that offers a library of roughly 800 analytic functions that can be called by SQL routines.
Before this, Teradata directly supported only about 200 of those functions. Aside from saving the effort of custom-writing the routines, the big advantage is performance, because the functions run where the data resides instead of pulling it out to the application tier. Until now, Teradata was hardly alone in supporting analytic libraries from Fuzzy Logix; rivals such as IBM, Microsoft, and SAP/Sybase already had similar arrangements. In most cases, these arrangements involved support of only a fraction of Fuzzy Logix’s libraries.
Teradata has upped the ante by formalizing a reselling agreement and extending support to the rest of the Fuzzy Logix portfolio that others do not already support. It offers support on the two most recent generations of the platform (back to 13.10), and early next year it will extend support to the companion Teradata Aster line.
The benefits accrue not only to data within Teradata, but also to data in Hadoop. With its SQL-H connectivity to Hadoop (where queries are made against Hadoop and the data is downloaded in bulk to Teradata as external tables), Teradata has benchmarks showing performance gains versus running the same problem inside a Hadoop cluster.
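To make the in-database idea in this section concrete, here is a small sketch of the calling pattern: an analytic function is registered with the database engine and invoked from SQL, so only the aggregated result comes back to the caller. It uses Python's built-in sqlite3 module purely as a stand-in; it is not Teradata's or Fuzzy Logix's API, and the function name stddev_pop, the sales table, and its values are hypothetical.

```python
import sqlite3


class StdDev:
    """Population standard deviation as a SQL aggregate (illustrative only)."""

    def __init__(self):
        self.n = 0
        self.total = 0.0
        self.total_sq = 0.0

    def step(self, value):
        # Called by the engine once per row in the group; NULLs are skipped.
        if value is None:
            return
        self.n += 1
        self.total += value
        self.total_sq += value * value

    def finalize(self):
        # Called once per group to produce the aggregate's result.
        if self.n == 0:
            return None
        mean = self.total / self.n
        variance = max(self.total_sq / self.n - mean * mean, 0.0)
        return variance ** 0.5


conn = sqlite3.connect(":memory:")
# Register the aggregate so that SQL statements can call it by name.
conn.create_aggregate("stddev_pop", 1, StdDev)
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 2.0), ("east", 4.0), ("west", 5.0), ("west", 5.0)],
)

# The analysis is expressed in SQL; the caller receives one row per region
# instead of fetching every sale and computing the statistic itself.
rows = conn.execute(
    "SELECT region, stddev_pop(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)
```

A prebuilt library spares users from writing classes like StdDev themselves, which is the effort-saving benefit described above.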
Making R massively parallel
With 14.10, Teradata has not rewritten R itself – it continues to partner with Revolution Analytics, which provides a commercially supported version of R.
But Teradata has developed new optimizations for running R in a mode that is similar to MapReduce. The challenge with R is that it was developed as a program that ran on a single server; Teradata’s rivals (e.g., Oracle, IBM) have developed their own “enterprise” versions that run R across multiple servers. (Oracle has developed a version that interfaces to SQL, while IBM’s offering runs on its BigInsights Hadoop platform.)
A further limitation is that most existing multi-node implementations in the SQL world compute on each node in isolation; this is typical of classic HPC (high-performance computing) clusters. Teradata has added a mode that operates like MapReduce, where results on each node are shuffled at each step along the way. There are different uses for each style; for instance, HPC-style is useful for scoring, while MapReduce-style is better suited for regression analyses.
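The difference between the two styles can be sketched in a few lines. This is an illustration under stated assumptions, not Teradata's or Revolution Analytics' implementation, and it is written in Python rather than R to stay self-contained. Scoring needs nothing from other nodes, while fitting a regression requires each node's partial results to be combined. The function names and the two-partition data set are hypothetical, and a one-variable least-squares fit is used because it needs only a single combine step; iterative fits would repeat that exchange on every pass.

```python
def score_partition(rows, slope, intercept):
    """HPC-style: a node scores its own rows; nothing is exchanged."""
    return [slope * x + intercept for x, _ in rows]


def partial_sums(rows):
    """Map step: summarize one partition as (n, sum_x, sum_y, sum_xx, sum_xy)."""
    n = len(rows)
    sx = sum(x for x, _ in rows)
    sy = sum(y for _, y in rows)
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return (n, sx, sy, sxx, sxy)


def fit_line(partitions):
    """Reduce step: add up the per-partition sums, then solve least squares."""
    per_node = [partial_sums(rows) for rows in partitions]
    n, sx, sy, sxx, sxy = (sum(column) for column in zip(*per_node))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept


# Hypothetical (x, y) rows split across two "nodes"; all lie on y = 2x + 1.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]

slope, intercept = fit_line(partitions)  # needs sums from every node
scores = [score_partition(p, slope, intercept) for p in partitions]  # node-local
print(slope, intercept, scores)
```

No single node could have produced the fitted line from its own rows alone, whereas each node can score its rows independently once the coefficients are known.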
Teradata’s new implementation of R is an important bridging capability, bringing Hadoop-like computation to SQL platforms. There are still advantages to performing such runs on Hadoop, where compute cycles are cheaper, but for analytics that demand higher performance or involve more sensitive data, Teradata’s R implementation provides a useful new option.