Conor O'Mahony's Database Diary

Your source of IBM database software news (DB2, Informix, Hadoop, & more)

Archive for November 2011

Comparing HDFS and GPFS for Hadoop

Here is a chart that compares the performance of Hadoop Distributed File System (HDFS) with General Parallel File System-Shared Nothing Cluster (GPFS-SNC) for certain Hadoop-based workloads (it comes from the Understanding Big Data book). As you can see, GPFS-SNC easily out-performs HDFS. In fact, the book claims that a 10-node GPFS-SNC-based Hadoop cluster can match the performance of a 16-node HDFS-based Hadoop cluster.

Comparing HDFS and GPFS for Hadoop Workloads
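To put the book's 10-node versus 16-node claim in per-node terms, here is a small back-of-the-envelope calculation. It assumes that "match the performance" means equal total throughput for the workload in question, which is my reading of the claim rather than something the book spells out.

```python
# Node counts quoted from the book's claim above.
gpfs_nodes = 10
hdfs_nodes = 16

# Assumption: both clusters deliver the same total throughput, so
# per-node throughput scales inversely with the node count.
per_node_ratio = hdfs_nodes / gpfs_nodes
nodes_saved = 1 - gpfs_nodes / hdfs_nodes

print(f"Per-node throughput, GPFS-SNC relative to HDFS: {per_node_ratio:.2f}x")
print(f"Reduction in node count: {nodes_saved:.1%}")
```

Under that assumption, each GPFS-SNC node does the work of 1.6 HDFS nodes, and the cluster needs 37.5% fewer nodes.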

GPFS was developed by IBM in the 1990s for high-performance computing applications. It has been used in many of the world’s fastest computers (including Blue Gene and Watson). Recently, IBM extended GPFS to develop GPFS-SNC, which is suitable for Hadoop environments. A key difference between GPFS-SNC and HDFS is that GPFS-SNC is a kernel-level file system, whereas HDFS runs on top of the operating system. This means that GPFS-SNC offers several advantages over HDFS, including:

  • Better performance
  • Storage flexibility
  • Concurrent read/write
  • Improved security

If you are interested in seeing how GPFS-SNC performs in your Hadoop cluster, please contact IBM. Although GPFS-SNC is not in the current release of InfoSphere BigInsights (IBM’s Hadoop-based product), GPFS-SNC is currently available to select clients as a technology preview.


Written by Conor O'Mahony

November 30, 2011 at 1:07 pm

Informix Users are Going to San Diego

It has just been announced that next year’s International Informix Users Group (IIUG) conference will be held in San Diego, California on 22 – 25 April. The IIUG Conference continues to offer incredible value. Sign up soon to get the $695 early bird rate, and if you sign up for free IIUG membership, you even get $100 off that rate. $595 for a conference of this length and quality is amazing value. But you’re going to have to act fast to get this discount rate!

And don’t forget that San Diego is a great city to visit. Not only does it have an ideal year-round climate, but it also has a fantastic array of attractions, like the world-famous San Diego Zoo, SeaWorld, LEGOLAND, and the Zoo Safari Park (a personal favorite).

International Informix Users Group (IIUG) Conference

Written by Conor O'Mahony

November 30, 2011 at 9:22 am

Posted in DBA, IIUG, Informix

Highlights from the IDUG EMEA Conference

I’m still in the afterglow of the International DB2 User Group (IDUG) conference in Prague, Czech Republic. It was another great conference at a great facility in a great city. The conference organizers should be commended on a truly outstanding event. It’s incredible to think that the conference organizers are user volunteers, and not professional conference planners! I’m already looking forward to the next IDUG EMEA conference in Berlin next year. If you are interested in a more in-depth discussion of the conference, including lessons learned from the technical sessions, Norberto Filho will be appearing on the DB2Night Show on Friday 02 December 2011. Even if you were at the conference, there was so much happening there that you are sure to learn something new from Norberto’s experiences.

Written by Conor O'Mahony

November 30, 2011 at 8:29 am

Posted in DB2 for LUW, DB2 for z/OS, DBA, IDUG

IBM is Baking NoSQL Capabilities into DB2 and Informix

IBM recently revealed its plan to integrate certain NoSQL capabilities into IBM DB2 and Informix. In particular, it is working to integrate graph store and key:value store capabilities into the flagship IBM database products. IBM is not yet indicating when these new capabilities will be available.

IBM does not plan to integrate all NoSQL technologies into DB2 and Informix. After all, there are many NoSQL technologies, and quite a few of them are clearly not suitable for integration into IBM’s products. The following chart summarizes the NoSQL product landscape. This landscape includes more than 100 products across a number of database categories. IBM is saying that it will integrate certain NoSQL capabilities into its products and work hand-in-hand with other NoSQL technologies.

NoSQL Landscape

Readers of this blog will know that these developments are consistent with my view that certain NoSQL technologies will eventually find themselves integrated into the major relational database products. In much the same way as the major relational database products fended off the challenge of object databases by adding features like stored procedures and user-defined functions, I expect the major relational database products to fend off the NoSQL challenge with similar tactics. And don’t forget that the major relational database products have already integrated XML capabilities, providing XQuery as an alternate query language. It’s not too much of a stretch to imagine how several of these NoSQL capabilities might be supported in an optimized way as part of a relational database product.

I look forward to blogging more about this topic as news about it emerges…

Written by Conor O'Mahony

November 21, 2011 at 9:00 am

IBM DB2 Analytics Accelerator—Bringing Netezza to the Mainframe

Now that the IBM Information on Demand (IOD) and International DB2 User Group (IDUG) conferences are behind me, I have time to blog about some of the great announcements from those conferences. Probably the announcement that generated the most interest among conference attendees is the new release of the IBM DB2 Analytics Accelerator (IDAA). This product takes advantage of Netezza to accelerate analytics queries on DB2 for z/OS.

The way it works is… you specify the data whose analysis you want to speed up, and a copy of that data is placed on Netezza (DB2 for z/OS continues to be the system of record for all data). Then, when DB2 for z/OS receives a query, an optimizer determines whether that query should be handled by DB2 for z/OS or by IBM Netezza. Here is a chart from the IDUG conference that summarizes the query execution flow.

IBM DB2 Analytics Accelerator
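To make the routing idea concrete, here is a toy sketch of the decision described above. It is not IBM's optimizer logic. The `Query` class, the `route` function, the table names, the cost threshold, and the two routing rules are all hypothetical, chosen only to illustrate that a query can go to the accelerator only when a copy of its data is there.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Query:
    """Hypothetical stand-in for a query seen by the optimizer."""
    tables: FrozenSet[str]   # tables the query reads
    estimated_cost: float    # made-up cost estimate, arbitrary units


# Assumption: tables whose data has been copied to the accelerator.
ACCELERATED_TABLES = frozenset({"SALES", "CUSTOMERS"})

# Assumption: only queries at or above this cost are worth offloading.
COST_THRESHOLD = 1000.0


def route(query: Query,
          accelerated: FrozenSet[str] = ACCELERATED_TABLES,
          threshold: float = COST_THRESHOLD) -> str:
    """Return the name of the engine that should run the query."""
    # Every table must have a copy on the accelerator; DB2 for z/OS
    # remains the system of record for everything else.
    if not query.tables <= accelerated:
        return "DB2 for z/OS"
    # Cheap queries stay where they are (illustrative rule only).
    if query.estimated_cost < threshold:
        return "DB2 for z/OS"
    return "Netezza"


print(route(Query(frozenset({"SALES", "CUSTOMERS"}), 50_000.0)))
print(route(Query(frozenset({"SALES", "ORDERS"}), 50_000.0)))
print(route(Query(frozenset({"SALES"}), 10.0)))
```

The first query is sent to Netezza. The second stays on DB2 for z/OS because ORDERS has no copy on the accelerator, and the third stays because it is too cheap to be worth offloading.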

Conceptually, you could almost think of the IBM DB2 Analytics Accelerator as a mainframe specialty processor for analytics. I know it’s not actually a specialty processor, but it does perform the processing involved with complex analytics queries. It also makes life easier for database administrators who often struggle with long-running complex queries, by providing them with an accelerator that does not require additional tuning. To see how much faster it is, here is another chart from the IDUG conference. It shows the experiences of IBM DB2 Analytics Accelerator Beta program participants.

IBM DB2 Analytics Accelerator Performance

If you run complex analytical queries on DB2 for z/OS, it is almost certainly worth your while to learn more about the IBM DB2 Analytics Accelerator.

Written by Conor O'Mahony

November 18, 2011 at 9:00 am

What will Happen to “In-Memory” when Storage Class Memory Arrives?

During this week’s keynote address at the International DB2 User Group (IDUG) conference in Prague, Namik Hrle talked about Storage Class Memory. Storage Class Memory is a technology in development that promises the performance of Solid State Drive (SSD) technology at the low cost of Hard Disk Drive (HDD) technology. It also promises compelling breakthroughs in space and power consumption. Storage Class Memory is essentially the marriage of scalable non-volatile memory technology and ultra high-density technology. Here is a table that projects the 2020 characteristics of Storage Class Memory:

Storage Class Memory

This table was actually created in 2008. From what Mr. Hrle says, we are tracking ahead of this schedule and will have these capabilities available sooner than 2020.

The performance limitations of disk-based systems have led to the addition of many database and data warehouse “features” (clever optimizations that address these limitations, and provide acceptable performance). If Storage Class Memory delivers on its random and sequential I/O performance promises, as well as its cost promises, many of these optimizations will become either less important, or perhaps unnecessary. In fact, it makes you wonder if our industry’s current fixation with in-memory capabilities may be short-sighted. Several vendors have in-memory database product visions that will not be realized until the latter half of this decade, which is a similar time frame to the projected availability of low-cost Storage Class Memory. Certainly food for thought…

Written by Conor O'Mahony

November 17, 2011 at 10:17 am

Posted in Cost, Performance

Comparing “New Big Data” with IMS on the Mainframe

While it does not come up often in today’s data management conversations, the IMS database software is at the heart of many major corporations around the world. For many people, it is the undisputed leader for mission-critical, enterprise transaction and data-serving workloads. IMS users routinely handle peaks of 100 million transactions in a day, and there are quite a few users who report more than 3,000 days without unplanned outages. That’s more than 8 years without an unplanned outage!

IBM recently announced IMS 12, claiming peak performance at a remarkable 66,000 transactions per second. The new release features improved performance and CPU efficiency for most IMS use cases, and a significant improvement in performance for certain use cases. For instance, workloads that use the Fast Path Secondary Index are 60% faster.

It is interesting to compare the performance of IMS with the headline-grabbing “big data” solutions that are all the rage today. For instance, at the end of August this year, we read the headline “Beyonce Pregnancy News Births New Twitter Record Of 8,868 Tweets Per Second.” I am not saying that IMS can replace the infrastructure of Twitter. Far from it. However, I am saying that, when you consider that IMS can handle 66,000 transactions per second, the relative performance levels of the “new big data” solutions when compared with IMS are food for thought. Especially when you consider the very significant infrastructure in place at Twitter, and the staff needed to manage that infrastructure. And don’t forget that IMS supports these performance levels with full read-write capability, full data integrity, and mainframe-level security.
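For anyone who wants to check the arithmetic behind these figures, here is a quick calculation using only the numbers quoted in this post. The 365.25-day year is my own choice. Keep in mind that an IMS transaction and a tweet are not the same unit of work, so the ratio is a rough perspective check rather than a benchmark.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures quoted in the post.
ims_daily_peak = 100_000_000   # transactions in a peak day
ims_peak_tps = 66_000          # claimed IMS 12 peak, transactions/second
twitter_record_tps = 8_868     # record tweets/second, August 2011
days_without_outage = 3_000

# Average rate implied by 100 million transactions spread over one day.
avg_tps = ims_daily_peak / SECONDS_PER_DAY

# Raw ratio of the two per-second peaks (different kinds of work).
peak_ratio = ims_peak_tps / twitter_record_tps

# 3,000 days expressed in years (assumes a 365.25-day year).
years_without_outage = days_without_outage / 365.25

print(f"Average IMS rate on a 100M-transaction day: {avg_tps:,.0f} per second")
print(f"IMS 12 peak vs. Twitter record: {peak_ratio:.1f}x")
print(f"3,000 days without an unplanned outage: {years_without_outage:.1f} years")
```

The results are roughly 1,157 transactions per second averaged over a 100-million-transaction day, a 7.4x ratio between the two peak rates, and 8.2 years without an unplanned outage.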

I appreciate that many of today’s Web-scale businesses begin with capital investments that preclude the hardware and software investments required for something like IMS. These new businesses need to be relatively agile, and depend upon the low barrier of entry that x86-based systems and open source/inexpensive software afford. However, I still think it interesting to put this “new big data” in perspective.

Written by Conor O'Mahony

November 9, 2011 at 2:17 pm
