The fastest I/O is the one you never do. By applying optimization, customers can dramatically cut the number of disk I/Os needed to run a database query. This is the story of one customer’s results from a 4-day EMC Oracle Assessment, and how they were able to avoid over 1 billion I/Os per day on a single database.
The Problem
The customer is a transportation company. They perform large nightly database updates. These updates have critical timelines for completion, so that the results are ready for use when other processes kick off in the morning. The updated data relates directly to the profitability of ongoing operations.
The application had been running OK, though it used a lot of hardware (Sun E10K servers and a pair of DMX-4 arrays, duplicated for DR and linked with SRDF). Since the application was purchased from an outside vendor, the tuning options were limited. The customer had worked with the vendor on several occasions to optimize the nightly batch processing. Benchmarks were conducted. Extensive examination was done of the data model. It appeared that the database and application were about as tuned as they could be, and more hardware was going to be the only path to keep up with growth.
EMC was engaged to perform an Oracle assessment. The proposed assessment provides a complete analysis of a single production database instance. It looks at everything from network traffic to CPU and memory usage to semaphores and buffer usage to file systems and disk I/O. This particular assessment is conducted in a 4-day format:
- Day 1: Install the tools
- Day 2: Gather the data
- Day 3: Process the data
- Day 4: Discuss the results
When a system generates this level of I/O, the reports coming out can get rather large. There is also an index noting which areas are the highest priority to fix, and the risk level associated with each change.
The Results
There were several high priority suggestions that came out of this report, but one was critical. There were a few SQL statements that were shown to be generating a huge volume of I/O that was flagged as unnecessary.
APPLICATION Observation: Some SQL statements are performing an excessive amount of I/O. This can slow performance, since this might cause other cached data buffers to be quickly aged out of the SGA, thereby causing subsequent queries to perform physical I/O rather than logical I/O. This can be caused by over-indexing tables, joining tables in the wrong order, or using an inappropriate index, or if there is a missing index that Oracle could have otherwise used to reduce the total number of I/O's. This can also be caused by SQL statements which use numeric group functions (such as AVG, COUNT, MIN, MAX, and SUM). Queries which perform many Logical I/O's consume an excessive amount of CPU cycles and can cause latch contention problems, due to the serialization of latches. The highest number of I/O's per execution measured was 36,406,850. The total number of excessive I/O's measured was 2,505,090,174.
With the additional data around the execution plan for these statements that was included in the detailed report, it was easy to identify what should be changed. As a result, the customer was able to take a clear set of directions back to the application vendor for a fairly simple adjustment.
Conclusion
The vendor was able to quickly integrate the changes and eliminate over 1 billion I/Os per day. The assessment cost less than the value generated by not missing the nightly batch window for a single night. And it may have eliminated the next hardware upgrade cycle, or at least allowed it to be completed with a much lower scale of system and storage. This is an example of the results that are possible with a custom Oracle engagement from EMC. Obviously, most systems will not get this level of performance boost, as most systems never get near this level of I/O. So how many I/Os might your systems be able to avoid?