Conor O'Mahony's Database Diary

Your source of IBM database software news (DB2, Informix, Hadoop, & more)

Win a Trip to the IDUG Conference of your Choice


The International DB2 User Group (IDUG) is a user-run organization. If you want independent information about DB2, IDUG is the place to go. This year, IDUG is holding conferences in the US (Denver), Germany (Berlin), and Australia (Sydney). The good news is that the DB2Night Show is holding a contest, and the prize is an all-expenses-paid trip to the IDUG conference of your choice. The contest aims to identify new users who can speak about their experiences with DB2. It’s a talent contest of sorts, where the talent is sharing your experiences. If you have ever considered speaking at a conference, this contest is the ideal way to see how you might do in a fun setting.


Written by Conor O'Mahony

January 25, 2012 at 2:01 pm

Anatomy of an Oracle Marketing Claim

with 10 comments

Yesterday, Oracle announced a new TPC-C benchmark result. They claim:

In this benchmark, the Sun Fire X4800 M2 server equipped with eight Intel® Xeon® E7-8870 processors and 4TB of Samsung’s Green DDR3 memory, is nearly 3x faster than the best published eight-processor result posted by an IBM p570 server equipped with eight Power 6 processors and running DB2. Moreover, Oracle Database 11g running on the Sun Fire X4800 M2 server is nearly 60 percent faster than the best DB2 result running on IBM’s x86 server.

Let’s have a closer look at this claim, starting with the first part: “nearly 3x faster than the best published eight-processor result posted by an IBM p570 server“. Interestingly, Oracle do not lead by comparing their new leading x86 result with IBM’s leading x86 result. Instead they choose to compare their new result to an IBM result from 2007, exploiting the fact that even though this IBM result was on a different platform, it uses the same number of processors. Of course, we all know that the advances in hardware, storage, networking, and software technology over half a decade are simply too great to form any basis for reasonable comparison. Thankfully, most people will see straight through this shallow attempt by Oracle to make themselves look better than they are. I cannot imagine any reasonable person claiming that Oracle’s x86 solutions offer 3x the performance of IBM’s Power Systems solutions, when comparing today’s technology. I’m sure most people will agree that this first comparison is simply meaningless.

Okay, now let’s look at the second claim: “nearly 60 percent faster than the best DB2 result running on IBM’s x86 server“. Oracle now compare their new leading x86 result with IBM’s leading x86 result. However, if you look at the benchmark details, you will see that IBM’s result uses half the number of processors, cores, and threads. If you look at performance per core, the Oracle result achieves 60,046 tpmC per core, while the IBM result achieves 75,367 tpmC per core. So while Oracle claims to be 60% faster, once you take system size into account and compare performance per core, IBM is actually about 25% faster than Oracle.

Finally, let’s not forget the price/performance metric from these benchmark results. This new Oracle result achieved US$.98/tpmC, whereas the leading IBM x86 result achieved US$.59/tpmC. That’s correct: when you compare the cost per tpmC for these two benchmark results, IBM is about 40% less expensive than Oracle. (BTW, I haven’t had a chance yet to determine if Oracle Used their Usual TPC Price/Performance Tactics for this benchmark result, as the result details are not yet available to me; but if they have, the IBM system will prove to be even less expensive again than the Oracle system.)

Benchmark results are as of January 17, 2012. Source: Transaction Processing Performance Council (TPC).
Oracle result: Oracle Sun Fire X4800 M2 server (8 chips/80 cores/160 threads) – 4,803,718 tpmC, US$.98/tpmC, available 06/26/12.
IBM results: IBM System p 570 server (8 chips/16 cores/32 threads) – 1,616,162 tpmC, US$3.54/tpmC, available 11/21/2007. IBM System x3850 X5 (4 chips/40 cores/80 threads) – 3,014,684 tpmC, US$.59/tpmC, available 09/22/11.
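
If you want to check the arithmetic behind these comparisons yourself, here is a minimal Python sketch. The figures are the published results listed above; the class and helper names (TpccResult, pct_more) are just illustrative names I made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TpccResult:
    """One published TPC-C result, using the figures quoted in this post."""
    system: str
    cores: int
    tpmc: int            # throughput in tpmC
    usd_per_tpmc: float  # price/performance in US$ per tpmC

    @property
    def tpmc_per_core(self) -> float:
        return self.tpmc / self.cores


def pct_more(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage."""
    return (a / b - 1.0) * 100.0


oracle_x4800 = TpccResult("Oracle Sun Fire X4800 M2", 80, 4_803_718, 0.98)
ibm_x3850 = TpccResult("IBM System x3850 X5", 40, 3_014_684, 0.59)
ibm_p570 = TpccResult("IBM System p 570", 16, 1_616_162, 3.54)

# Oracle's headline numbers: raw throughput, ignoring system size and age.
p570_ratio = oracle_x4800.tpmc / ibm_p570.tpmc
raw_advantage = pct_more(oracle_x4800.tpmc, ibm_x3850.tpmc)

# The same x86 comparison, normalized per core and per dollar.
per_core_advantage = pct_more(ibm_x3850.tpmc_per_core, oracle_x4800.tpmc_per_core)
price_saving = (1.0 - ibm_x3850.usd_per_tpmc / oracle_x4800.usd_per_tpmc) * 100.0

print(f"Oracle vs. IBM p570 (2007), raw tpmC: {p570_ratio:.2f}x")
print(f"Oracle vs. IBM x3850, raw tpmC: {raw_advantage:.1f}% higher")
print(f"Oracle tpmC per core: {oracle_x4800.tpmc_per_core:,.0f}")
print(f"IBM tpmC per core: {ibm_x3850.tpmc_per_core:,.0f}")
print(f"IBM vs. Oracle, per core: {per_core_advantage:.1f}% higher")
print(f"IBM vs. Oracle, cost per tpmC: {price_saving:.1f}% lower")
```

Note that this only normalizes by core count and list price/performance. It says nothing about how the two systems would behave on any workload other than TPC-C.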

Written by Conor O'Mahony

January 18, 2012 at 11:01 am

Top Posts of 2011


It’s that time of year again. Here are the top posts from this blog in 2011, as judged by number of views.

  1. IBM DB2 Welcomes Oracle Database/HP Itanium Customers
  2. New IBM DB2 vs. Oracle Database Advertising Campaign
  3. A Closer Examination of Oracle’s “Database Performance” Advertisement
  4. Comparing Price for Oracle Exadata and IBM Smart Analytics System
  5. IBM DB2 Strikes Another Blow to Oracle Database

As you can see, there is a strong DB2/Oracle Database competitive theme running through these popular topics. And here are the top posts of 2011, as judged by reader participation. In other words, as judged by the number of comments (or perhaps the amount of controversy).

  1. New IBM DB2 vs. Oracle Database Advertising Campaign (20 comments)
  2. A Closer Examination of Oracle’s “Database Performance” Advertisement (19 comments)
  3. The Future of the NoSQL, SQL, and RDBMS Markets (12 comments)
  4. Update on the IBM DB2 “SQL Skin” for Migrating from Sybase ASE (8 comments)
  5. Industry Benchmark Result for DB2 pureScale: SAP Transaction Banking (TRBK) Benchmark (7 comments)

Written by Conor O'Mahony

December 19, 2011 at 11:00 am

Posted in Uncategorized

Deploying DB2 and InfoSphere Warehouse on Private Clouds


Cloud computing is certainly a hot topic these days. If an organization is not already using cloud computing, it has plans to do so. The economics, agility, and value offered by cloud computing are just too persuasive for IT organizations to ignore.

Even the high-profile Amazon outage couldn’t slow cloud computing’s relentless march towards mainstream adoption. If anything, that outage helped make cloud computing more robust by highlighting the need for hardened policies and procedures around provisioning in the cloud.

IBM recently announced updates to a set of products that make it easy to deploy DB2 and InfoSphere Warehouse on private clouds:

  • IBM Workload Deployer (previously known as WebSphere CloudBurst), which is a hardware/software appliance that streamlines the deployment and management of software on private clouds.
  • IBM Transactional Database Pattern, which works with the IBM Workload Deployer to generate DB2 instances that are suitable for transactional workloads.
  • IBM Data Mart Pattern, which generates InfoSphere Warehouse instances for data mart workloads.

These patterns do more than just deploy virtual images with pre-configured software. You should instead think of them as mini-applications for configuring and deploying cloud-based database instances. Users specify information about the database, and then the pattern builds and deploys the database instance.

The Transactional Database Pattern is for OLTP deployments. It includes templates for sizing the virtual machine, database backup scheduling, database deployment cloning capabilities, and tooling (including Data Studio). The Data Mart Pattern incorporates the features of the OLTP pattern, together with deep compression and data movement tools. But, of course, it is configured and optimized for data mart workloads in a virtual environment.

Written by Conor O'Mahony

December 12, 2011 at 5:40 pm

Need Help Determining Hadoop Split Sizes? Use Adaptive MapReduce Instead!

with 2 comments

IBM is actively working on adaptive features for the Map and Reduce phases of its InfoSphere BigInsights product (which is based on Apache Hadoop). In some cases, this involves applying techniques commonly found in mature data management products, and in some cases it involves developing new techniques. While a number of these adaptive features are still under development, there are some features in the product today. For instance, BigInsights currently includes an Adaptive Mapper capability that allows Mappers to successively process multiple splits for a job, and avoid the start-up costs for subsequent splits.

When a MapReduce job begins, Hadoop divides the data into multiple splits. It then creates a Mapper task for each split. Hadoop deploys the first wave of Mapper tasks to the available processors. Then, as Mapper tasks complete, Hadoop deploys the next Mapper tasks in the queue to the available processors. However, each Mapper task has a start-up cost, and that start-up cost is repeated each time a Mapper task starts.

With BigInsights, there is not a separate Mapper task for each split. Instead, BigInsights creates Mapper tasks on each available processor, and those Mapper tasks successively process the splits. This means that BigInsights significantly reduces the Mapper start-up cost. You can see the results of a benchmark for a set-similarity join workload in the following chart. In this case, the tasks have a high start-up cost. The AM bar (Adaptive Mapper) in the chart is based on a 32MB split size. You can see that by avoiding the recurring start-up costs, you can significantly improve performance.

Adaptive MapReduce Benchmark: Set-Similarity Join Workload

Of course, if you chose the largest split size (2GB), you would achieve similar results to the Adaptive Mapper. However, you would then risk the imbalanced workloads that sometimes accompany very large splits.
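
To see why the start-up cost matters so much, here is a rough back-of-the-envelope model in Python. It is only a sketch of the idea described above, not how BigInsights or Hadoop actually schedule work. It assumes every split takes the same time to process, and it ignores data locality, stragglers, and the Reduce phase. The function names and all of the numbers (64GB of input, 16 processors, a 20-second start-up cost, 2 seconds of work per 32MB split) are made up for illustration.

```python
import math


def classic_makespan(num_splits, slots, startup_s, work_per_split_s):
    """One Mapper task per split, so every split pays the start-up cost."""
    waves = math.ceil(num_splits / slots)
    return waves * (startup_s + work_per_split_s)


def adaptive_makespan(num_splits, slots, startup_s, work_per_split_s):
    """One long-lived Mapper per processor: start-up is paid once,
    then the Mapper works through its splits back to back."""
    splits_per_mapper = math.ceil(num_splits / slots)
    return startup_s + splits_per_mapper * work_per_split_s


# Hypothetical job: 64GB of input on 16 processors.
SLOTS = 16
STARTUP_S = 20          # assumed start-up cost per Mapper task
WORK_PER_32MB_S = 2     # assumed processing time for one 32MB split

small_splits = 64 * 1024 // 32   # 2048 splits of 32MB
large_splits = 64 // 2           # 32 splits of 2GB
work_per_2gb_s = WORK_PER_32MB_S * (2 * 1024 // 32)  # 64x the data per split

classic_32mb = classic_makespan(small_splits, SLOTS, STARTUP_S, WORK_PER_32MB_S)
adaptive_32mb = adaptive_makespan(small_splits, SLOTS, STARTUP_S, WORK_PER_32MB_S)
classic_2gb = classic_makespan(large_splits, SLOTS, STARTUP_S, work_per_2gb_s)

print(f"Classic, 32MB splits:  {classic_32mb} s")
print(f"Adaptive, 32MB splits: {adaptive_32mb} s")
print(f"Classic, 2GB splits:   {classic_2gb} s")
```

In this toy model, the classic approach with 2GB splits lands close to the adaptive approach with 32MB splits, because both pay the start-up cost only a handful of times. What the model leaves out is skew: if some 2GB splits take much longer than others, the large-split run has few splits left to rebalance with, whereas small splits can be spread across whichever processors finish first.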

The following chart shows the results of a benchmark for a join query on TERASORT records. Again the AM bar (Adaptive Mapper) in the chart is based on a 32MB split size.

Adaptive MapReduce Benchmark: TERASORT Join Workload

In this case, the Adaptive Mapper delivers a more modest performance improvement, although it is still an improvement. The key benefit of these Adaptive MapReduce features is that they eliminate some of the hassles associated with determining the split sizes, while also improving performance.

As I mentioned earlier in this post, a number of additional Adaptive MapReduce features are currently in development for future versions of BigInsights. I look forward to telling you about them when they are released…

In the meantime, make sure to check out the free online Hadoop courses at Big Data University. I previously blogged about my experiences with these courses in Hadoop Fundamentals Course on

Written by Conor O'Mahony

December 7, 2011 at 1:07 pm

Comparing HDFS and GPFS for Hadoop


Here is a chart that compares the performance of Hadoop Distributed File System (HDFS) with General Parallel File System-Shared Nothing Cluster (GPFS-SNC) for certain Hadoop-based workloads (it comes from the Understanding Big Data book). As you can see, GPFS-SNC easily outperforms HDFS. In fact, the book claims that a 10-node GPFS-SNC-based Hadoop cluster can match the performance of a 16-node HDFS-based Hadoop cluster.

Comparing HDFS and GPFS for Hadoop Workloads

GPFS was developed by IBM in the 1990s for high-performance computing applications. It has been used in many of the world’s fastest computers (including Blue Gene and Watson). Recently, IBM extended GPFS to develop GPFS-SNC, which is suitable for Hadoop environments. A key difference between GPFS-SNC and HDFS is that GPFS-SNC is a kernel-level file system, whereas HDFS runs on top of the operating system. This means that GPFS-SNC offers several advantages over HDFS, including:

  • Better performance
  • Storage flexibility
  • Concurrent read/write
  • Improved security

If you are interested in seeing how GPFS-SNC performs in your Hadoop cluster, please contact IBM. Although GPFS-SNC is not in the current release of InfoSphere BigInsights (IBM’s Hadoop-based product), GPFS-SNC is currently available to select clients as a technology preview.

Written by Conor O'Mahony

November 30, 2011 at 1:07 pm

Informix Users are Going to San Diego


It has just been announced that next year’s International Informix Users Group (IIUG) conference will be held in San Diego, California on 22 – 25 April. The IIUG Conference continues to offer incredible value. Sign up soon to get the $695 early bird rate, and if you sign up for free IIUG membership, you even get $100 off that rate. $595 for a conference of this length and quality is amazing value. But you’re going to have to act fast to get this discount rate!

And don’t forget that San Diego is a great city to visit. Not only does it have an ideal year-round climate, but it also has a fantastic array of attractions, like the world-famous San Diego Zoo, SeaWorld, LEGOLAND, and the Zoo Safari Park (a personal favorite).

International Informix Users Group (IIUG) Conference

Written by Conor O'Mahony

November 30, 2011 at 9:22 am

Posted in DBA, IIUG, Informix

