Thoughts on performance

Every Informix DBA is concerned about performance (which is rather odd, because in general Informix performs very well out of the box!).

But things very rarely go wrong in normal implementations, so everybody focuses on performance to have something to do. The problem is that when things do go wrong, it’s rarely Informix’s “fault”. And yet, based on my personal experience over 35 years, the very first thing everyone will blame is the database.

I have seen performance issues that genuinely were caused by the database, but even then they are often down to human error.

In general, the distribution of performance problems is that 1-2% will be hardware-, OS-, or network-related, 2-5% will be database- or configuration-related and the rest will be “application issues”.

Note that I regard indexing as an application issue, because the indexing strategy is defined by the application usage.

Also note that the impact of a problem is not related to its likelihood. Network problems are rare, but when they occur they can slow everything down. On the other hand, an application problem requiring indexing changes and/or code changes might affect only a single transaction.

There are a lot of best practice recommendations that fall into the “angels dancing on the head of a pin” category: you should follow them in case something happens in extremis, but in normal use they are not likely to matter.

So, when you’re faced with a problem, you need to triage it. Some problems are self-evident (if you have sufficient experience!). If you’re working in the shell and there’s a long delay between pressing a key and the character appearing on screen, that’s almost always a network issue. If you have a transaction that used to be fine but suddenly slowed down, that’s almost always an optimiser issue, which could require an index or UPDATE STATISTICS.
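As a sketch of what the statistics side of that fix might look like (the table and column names here are invented), refreshing the distributions is usually just:

    -- rebuild column distributions so the optimiser is working from current data
    UPDATE STATISTICS MEDIUM FOR TABLE orders;

    -- more detailed distributions on the column(s) driving the troublesome WHERE clause
    UPDATE STATISTICS HIGH FOR TABLE orders (customer_num);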

But it’s always vital to understand WHY something changed, and often the answer you’re given is “nothing”. Persist, because this is always inaccurate. Someone has inevitably done something “that couldn’t possibly affect this”.

I was really lucky recently to be faced with a performance issue where there was a very clear change, and I was told about it relatively early on. That change rendered the optimiser statistics not just useless, but positively detrimental to performance. In this case, dropping the medium and high level statistics completely resolved the issue, which had started badly in the morning and got progressively worse during the day as data volumes increased.
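For reference, discarding the distributions for a single table is a one-liner (the table name here is hypothetical):

    -- drop the medium/high distributions, keeping only the basic low-level statistics
    UPDATE STATISTICS LOW FOR TABLE stock_movement DROP DISTRIBUTIONS;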

Another particular bugbear of mine is “oh, the indexes are fine, the optimiser is using an index”. Using AN index is not the same as using the CORRECT index, especially when the correct index does not yet exist. An easy way to tell that your index is not as good as it could be is when the SET EXPLAIN output includes one or more “Filters:” conditions. If it does, the index is selecting a set of rows which the engine then has to inspect individually and discard. It is MUCH more efficient to index all the columns being compared to values, so that the index alone identifies exactly the rows you need.
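To make that concrete, here is a contrived sketch (the table, column and index names are all invented, and the date format depends on your DBDATE setting): the query filters on three columns, but the existing index only covers the first one.

    SET EXPLAIN ON;

    SELECT order_num
      FROM orders
     WHERE customer_num = 1234
       AND order_status = "P"
       AND order_date >= "01/01/2024";

    -- If sqexplain.out shows the index on (customer_num) being chosen, with
    -- "Filters:" listed for order_status and order_date, the engine is reading
    -- every row for that customer and throwing most of them away. A composite
    -- index lets the index alone identify exactly the rows needed:

    CREATE INDEX ix_orders_cust_status_date
        ON orders (customer_num, order_status, order_date);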

Quite often people will use SELECT * or a collection of unnecessary columns as well. This adds to the amount of data taken off the server and sent across the network. In some cases it might be worth pushing data into an index so that you can do a key-only search and not access the table data at all.
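Continuing the invented example above, if the query only ever returns order_num, appending that column to the index means the engine can answer the query from the index alone, and the explain output will flag the path as key-only:

    -- order_num is appended purely so the query can be satisfied without touching the table
    CREATE INDEX ix_orders_covering
        ON orders (customer_num, order_status, order_date, order_num);

The trade-off is a slightly larger index and a little more work on writes, so this is worth doing for hot queries rather than everywhere.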

Also, if you’re a 4GL programmer, you’re probably quite used to writing reports that go something like this:


# note that this is not syntactically correct code!

START REPORT r_summary

DECLARE c CURSOR FOR
    SELECT * FROM table WHERE ... ORDER BY 1, 2, 3

LET cnt = 0

FOREACH c INTO p_table.*
    LET cnt = cnt + 1
    DISPLAY cnt, " rows processed" AT 10, 10
    OUTPUT TO REPORT r_summary(p_table.*)
END FOREACH

FINISH REPORT r_summary

REPORT r_summary(r_table)

    FORMAT

    ON EACH ROW
        # singleton lookup, executed once for every row in the driving cursor
        SELECT foo INTO p_foo FROM dingbat WHERE prop = r_table.prop

        IF r_table.prop = "ZOOT" THEN
            LET zoot_cnt = zoot_cnt + 1
        END IF

        LET tot_val = tot_val + r_table.val

    ON LAST ROW
        PRINT "ZOOTs:", zoot_cnt
        PRINT "Total:", tot_val
        PRINT COUNT(*), " rows processed"

END REPORT

I’m not judging you. I used to write the exact same code, because I grew up with Standard Engine and 9600 baud and the best way of showing a user that the computer was doing something was to show them something happening. They wouldn’t care that the code was inefficient, because they could clearly see that something was happening and it looked fast.

But the truth is, this model doesn’t scale. It’s horrendously inefficient when you fetch an entire row and only use one value. It’s insane to use a report to calculate an aggregate. I’ll admit this is an extreme case, but I see lots of code like this out in the wild.

Nowadays you would almost certainly calculate your counts and totals in a single select and output that to the report. The problem is that fixing reports like this after decades of data growth is extremely difficult. The code will never be as simple as that outlined above. There will be many lookup and aggregate selects scattered throughout the code, in the driving cursor(s) as well as in the report.
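To make that concrete with the toy report above (which, remember, was never real code either), the whole FOREACH loop and report collapse into something like a single aggregate query whose one result row you print or feed to a trivial report:

    SELECT COUNT(*)                                       AS row_cnt,
           SUM(val)                                       AS tot_val,
           SUM(CASE WHEN prop = "ZOOT" THEN 1 ELSE 0 END) AS zoot_cnt
      FROM table
     WHERE ...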

Ideally you need to prune the driving cursors to a) fetch only the columns you actually use and b) fold all the singleton lookups and aggregates scattered through the code into one select. This pushes the work back into the engine and reduces network traffic, which becomes a real problem at scale. Just because 4GL is legacy code doesn’t mean you shouldn’t try to refactor it.
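Sticking with the same invented tables, a pruned driving cursor might look something like the sketch below: only the columns the report actually prints, with the dingbat lookup folded into the join instead of being run once per row.

    DECLARE c CURSOR FOR
        SELECT t.prop, t.val, d.foo
          FROM table t, dingbat d
         WHERE d.prop = t.prop
           AND ...
         ORDER BY 1, 2

If dingbat rows can legitimately be missing, you’d want an outer join (FROM table t, OUTER dingbat d) so the driving rows aren’t silently dropped.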

I could probably go on for ever, so I’ll stop here and possibly revisit this on another day.