

Routine that became a meltdown


By Sue Ashton Davies

from The Australian, 30 June 1998
(Transferred to the CII website because the original is hard to read.)



A routine Friday afternoon batch job turned into disaster when a computer meltdown brought a manufacturing system to its knees.

The computer room was humming, and all systems were go for one of Australia's largest manufacturers.

Then Jeff Steel, project manager of Infact Consultants, reset the system clock to January 7, 2000, and waited to see what would happen.

The routine batch job, which involved 800 custom-built Cobol and PL/I programs in a manufacturing mainframe environment, was expected to take six hours to run.

Close by, a terminal in the control room was set up to track the programs as they went through the batch run.

Although he anticipated some problems, Steel was not prepared for anything coming out of left field.

His team of 12 programmers had worked methodically for nine months, manually sifting through millions of lines of code and fixing the two-digit year problem so the system could handle the year 2000.

Great care had been taken to keep the crew motivated and focused on their tasks, so that time was spent productively and rework was kept to a minimum.

At worst, he expected to have to make a few specific changes that would be easy to spot.

Operations had hardly begun before the first programs started to run slowly.

By the time the sixth program started, the system began to falter. Then, one after another, programs fell over.

By the time the 10th program failed, Steel decided to let the job run to the end because, in all likelihood, it would all be over within half an hour anyway.

Within minutes, 750 programs had fallen over. One of the few programs to continue running was invoicing, but it was producing invoices for the 43rd day of the 14th month.

As the job finally ground to a halt, silence hung over the room as everyone stared vacantly at the terminal.

Steel stood frozen to the spot in shock, as did his team, which had been hired under a $3 million contract.

Twelve people stared at the terminal where a complete suite of programs had died instantly.

Fortunately the meltdown had taken place in a test environment.

The search was now on to diagnose the fault. One team member traced it to an obscure mainframe program.

The culprit was a non-Y2K-compliant link editor used by a PL/I program that had last run in 1987.

A link editor takes the separately compiled modules of a program and combines them into a single executable program.

With the problem identified and a Y2K-compliant link editor installed, the 30 programs were rerun and the problem was solved.

Steel says the use of the test environment saved the company from bankruptcy.

"The consequences in a live environment would have been devastating," he says. As well as bringing the business to a standstill, it would have left the company unable to operate for six months, and could have taken suppliers and customers down with it.

Situations like this are typical of what is happening, and they bear out rumours that large companies are not yet meeting Y2K compliance requirements, Steel says.

The post-mortem meeting found that diagnosing such an obscure problem in a live environment would have taken about a month, and a fix would have taken six months.

"The problem was so unusual, you wouldn't have known if it was hardware, software or system utilities," Steel says. "The horrible thing about it was that it was such an obscure component that nobody even thought that it could fail."

Even with hindsight, the problem could never have been spotted before testing because it was too obscure.

"In nine months of remediation, no-one had ever got near this problem," he says.

Steel says the meltdown was so catastrophic that even a contingency plan wouldn't have saved the day.

The only way to find the Y2K bugs in a system is to manually trawl the program code line by line to find the date fields, some of which are very obscure, he says.

One date was buried deep in the job control language, where six characters within a 30-character sort field turned out to make up a date.

Even though testing is complete, Steel cannot say definitively that the system is now 100 per cent Y2K compliant.

As part of his strategy to protect himself and the company from legal action, he worked with an auditor looking over his shoulder at every stage, confirming that his approach was the best method available.

"All I can say to the client is that I can't guarantee that there will not be any problems after the year 2000," he says.

Steel says most organisations don't understand Y2K.

"Until something like this happens, they don't understand what Y2K can do to them," he says.



© News Limited 1998