Wed, 12 May 1999
Sheri Nakken (Y2k Network)
Y2K & Nuclear Safety
From Gary North... Nuclear power plants are more analog than digital, a fact the industry uses to play down the threat of Y2K. Nevertheless, the plants are required to report on their safety status, and these reporting systems are digital and at risk. In addition, the plants depend on the grid to supply them with power.
This is from a senior British nuclear regulator. The paper was delivered at a meeting of industry representatives in Vienna in September 1998.
The Millennium Bug and Nuclear Safety
Mr L G Williams
Director, Nuclear Safety Directorate
Chief Inspector, HM Nuclear Installations Inspectorate
The potential threat of the so-called millennium bug is now recognised by many of the world's governments, businesses and financial institutions. Significantly, the issue was discussed at some length by the G8 countries at their recent meeting in the UK. In the nuclear sector, OECD/NEA have issued a questionnaire to member states seeking statements on their proposed actions to deal with the matter, and are planning a workshop on the topic in February of next year to review progress. IAEA has also raised the profile of the issue by including it as a topic at its recent meeting on nuclear radiation and waste management. Similarly, at an EC-sponsored meeting of Eastern and Western European nuclear regulators in June, the topic was thoroughly reviewed and the attendees gave verbal statements of progress. Indeed, there is clear evidence of a proactive approach by many countries. For example, the US NRC has issued an Information Notice, and the US DOE Office of Environment, Safety and Health has posted a bulletin on the topic on its Internet web site. In the UK, my Directorate has issued a generic letter to all its nuclear licensees and has received strategies and action plans from them.
The purpose of my presentation today is to share with you my concerns, and the activities taking place in the UK to address them. It is important for us all to understand that the potential for coincident (common-cause) failures of safety-related computer systems on certain dates is real, and must be actively and systematically addressed. Appropriate contingency plans will also be necessary. However, although the issue is technically based, handling it is essentially a matter of strategic management. The nuclear industry has a clear responsibility to tackle this issue positively and comprehensively so that this common-cause event cannot cause undue risk to the public.
I know that each country has primary responsibility for ensuring that its own nuclear industry deals appropriately with this potential safety threat. However, because of the international nature of nuclear energy, all countries should be prepared to share information on approaches and experiences so that we may benefit from each other's knowledge whilst also gaining mutual re-assurance that the problem is being tackled effectively, on a global basis.
THE YEAR 2000 PROBLEM
As you know, the millennium bug, or Year 2000 computer problem, stems from the practice of dropping the century digits when storing and manipulating dates. Using only the last two digits to represent the year means that the year 2000, represented as '00', will appear to be earlier than the year 1999, represented as '99'. Thus the use of such two-digit dates in computations will produce anomalies which, if not compensated for, will give incorrect results. A system may become locked into an updating loop because it assumes '00' is an invalid date. Restarting some systems in the new millennium may be a problem because a date entry of '00' will be regarded as invalid. Other potentially problematic areas include the failure of trending and averaging routines in control and surveillance systems as they encounter date discontinuities.
Date-related problems have already been discovered in the UK's nuclear industry. For example, corrective action is known to be required on a distributed process control system, a burst can detection system, a feedwater chemistry monitoring system and a fire alarm system. A fuller listing, including more recent examples, is given in Annex 2.
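The comparison anomaly, and one commonly used remedy known as "windowing", can be sketched in a few lines of Python. This is a minimal illustration only, not taken from any of the systems discussed here; the function names and the pivot value of 50 are assumptions chosen for the example.

```python
def is_later_two_digit(yy_a: int, yy_b: int) -> bool:
    """Naive comparison on two-digit years, as in the legacy practice described above."""
    return yy_a > yy_b


def expand_year(yy: int, pivot: int = 50) -> int:
    """Map a two-digit year to four digits using a fixed window (assumed pivot of 50):
    00-49 -> 2000-2049, 50-99 -> 1950-1999."""
    if not 0 <= yy <= 99:
        raise ValueError(f"two-digit year out of range: {yy}")
    return 2000 + yy if yy < pivot else 1900 + yy


# The naive test says 2000 ('00') is NOT later than 1999 ('99').
print(is_later_two_digit(0, 99))                  # False
# After windowing, the comparison comes out the right way round.
print(expand_year(0) > expand_year(99))           # True
# An elapsed-years calculation goes negative unless the years are expanded first.
print(0 - 99, expand_year(0) - expand_year(99))   # -99 1
```

Windowing only moves the problem to the edge of the chosen window; storing four-digit years removes it.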
TYPES OF SYSTEM POTENTIALLY AFFECTED
We all know that many aspects of the operation of a nuclear installation employ computers. This means that the date problem has the potential to affect every activity on a site. In some cases it may be easy to see that a computer is present in a system because it is overtly visible. In other instances, however, a computer may be embedded (see ref 3) in a device without any visible indication of its presence. Even once a computer's presence has been identified, determining whether and how it uses dates will not necessarily be straightforward: the obvious signs, such as a displayed date or a need to enter a date, may be absent even though a date is used. For instance, some systems obtain their date and time from a central host computer or a network server or, in some instances, via a radio transmission.
As can be seen, determining the extent of the systems that may be affected could require appreciable detective work, but the types of system that need to be considered include:
safety systems (protection systems, safety actuation systems and safety system support features);
The above list, which is not exhaustive, represents a considerable amount of reviewing and testing. There may also be a need for a significant programme of modifications and upgrades, the full extent of which can only be confirmed once the time-consuming and labour-intensive tasks of identification and testing have been completed. Given these uncertainties in the programme of work, and the fact that the dates themselves cannot be moved, it is clear that urgent effort must be applied to be sure of achieving a robust demonstration of safety for each date of concern.
THE SAFETY CONCERNS
I can report that the majority of the UK's nuclear plant protection and safety actuation systems do not employ computers; and in the few cases where computers are employed, the trip/actuate actions (a) do not use date information in the on-line systems and (b) for the higher risks, are additionally backed up by independent, hard-wired systems. Furthermore, the safety case/analysis for any nuclear installation must necessarily consider the effects of failures of single and multiple systems (computer-based or otherwise). In principle, a date-related failure is simply another failure cause within the fault sequences already analysed for that installation. These two factors give me comfort that a UK nuclear installation would shut down safely if a serious date-related failure were to arise. However, the potential extent of the effects of common-cause failures must be acknowledged, as must the possibility that these might, in some cases, challenge the bounds of the current fault-sequence analyses. For example, the simultaneous failure, on a given date, of only some of the systems I listed earlier could place additional, unanalysed burdens on an installation's staff. Erroneous information may be generated which, if not checked, may result in unsafe actions; and many computer-assisted operations may need to be performed manually, with all the potential for human error that the increased stress brings. Although we may feel confident about the inherent safety of our installations, it is nevertheless essential that we carry out the examinations and analyses which positively confirm and secure these expectations.
Another source of concern is loss of grid supplies caused by computer failures in the non-nuclear generation system. Loss of grid due to other stations tripping simultaneously would cause major loading transients. The grid might then not be restored for some time, making it necessary to run the diesel generators for longer than usual. Should diesel fuel stocks be insufficient for such an event, an emergency could develop. Diesel suppliers may be unable to meet the demand; indeed, some may be experiencing their own millennium problems. This possibility therefore places a duty on nuclear operators to secure in advance, for the millennium-related dates, the supply of all such consumables.
Some operations, of course, are not continuous, and hence can be, and perhaps will be, shut down over the date-critical periods. This action should not be regarded as ensuring safety, since the date discontinuity may cause the plant to enter an unsafe (possibly unrevealed) state on start-up, which may then lead to an accident. Operators of such plant have a duty to review carefully, and where necessary rectify, date-discontinuity problems. Taking no corrective action, on the basis that shutting down over the critical periods constitutes an alternative and appropriate handling strategy (commonly termed a 'work-around'), should be regarded as unacceptable.
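Whether diesel stocks are "insufficient" reduces to simple arithmetic: endurance is stock divided by consumption rate, and any shortfall is consumption over the assumed outage minus stock. The sketch below assumes a constant burn rate, and every figure in it is hypothetical rather than taken from any station.

```python
def endurance_hours(stock_litres: float, burn_rate_litres_per_hour: float) -> float:
    """Hours the on-site diesel stock can sustain the generators at a constant burn rate."""
    if burn_rate_litres_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return stock_litres / burn_rate_litres_per_hour


def shortfall_litres(stock: float, burn_rate: float, outage_hours: float) -> float:
    """Extra fuel that would have to be secured in advance to cover the outage (0 if none)."""
    return max(0.0, burn_rate * outage_hours - stock)


# Hypothetical figures: 60,000 l in stock, 500 l/h consumption, an assumed 7-day grid outage.
stock, rate, outage = 60_000.0, 500.0, 7 * 24
print(endurance_hours(stock, rate))            # 120.0
print(shortfall_litres(stock, rate, outage))   # 24000.0
```

In practice the burn rate varies with load, and the outage duration is itself the main uncertainty, so such a check would be run over a range of assumed outages.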
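One way a shutdown can leave an unrevealed unsafe state is sketched below. The restart check, its two-day threshold and the (year, day-of-year) timestamp format are all hypothetical, chosen only to show how a negative elapsed time across the date change can silently skip a check that should run on start-up.

```python
from datetime import date

PRESTART_CHECK_AFTER_DAYS = 2   # hypothetical threshold, for illustration only


def naive_outage_days(shut_yy: int, shut_doy: int, start_yy: int, start_doy: int) -> int:
    """Outage length from two-digit years and day-of-year (also ignores leap years)."""
    return (start_yy - shut_yy) * 365 + (start_doy - shut_doy)


def needs_prestart_checks(outage_days: int) -> bool:
    """Hypothetical rule: a longer outage requires full pre-start checks."""
    return outage_days > PRESTART_CHECK_AFTER_DAYS


# Shut down 30 Dec 1999 (day 364), restarted 3 Jan 2000 (day 3).
naive = naive_outage_days(99, 364, 0, 3)
actual = (date(2000, 1, 3) - date(1999, 12, 30)).days
print(naive, needs_prestart_checks(naive))     # -36496 False  (checks silently skipped)
print(actual, needs_prestart_checks(actual))   # 4 True
```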
ADDRESSING THE PROBLEM - NUCLEAR OPERATORS' ACTIONS
The generally agreed approach (see ref 1) for dealing with this potential threat is, firstly, for nuclear operators to identify all systems on their site which contain software (including those employing embedded software - sometimes known as firmware). Since safety is the concern here, they should separate out those systems which have been identified as having a safety significance or which ensure nuclear safety. As a diverse check, a review should also be undertaken of all systems important to safety identified in the installation's safety case. As mentioned earlier, embedded software may not be self-evident, so careful investigation may be required. This list of systems should now be prioritised in terms of safety significance.
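The identification, separation, diverse check and prioritisation steps above can be sketched as a simple filter over an inventory. The record fields, the 0 to 3 significance scale and the example entries are assumptions for illustration; a real inventory would carry far more information per system.

```python
from dataclasses import dataclass


@dataclass
class SystemRecord:
    name: str
    contains_software: bool      # including embedded software (firmware)
    safety_significance: int     # hypothetical scale: 0 = none ... 3 = highest


def prioritise(inventory, safety_case_systems):
    """Return (ranked, flagged).

    ranked:  software-bearing systems with any safety significance, highest first.
    flagged: systems named in the safety case that are not in the ranked list,
             i.e. missing from the inventory or classed as software-free or
             non-significant; these need a second look (the diverse check)."""
    ranked = sorted(
        (s for s in inventory if s.contains_software and s.safety_significance > 0),
        key=lambda s: (-s.safety_significance, s.name),
    )
    ranked_names = {s.name for s in ranked}
    flagged = sorted(set(safety_case_systems) - ranked_names)
    return ranked, flagged


inventory = [
    SystemRecord("fire alarm panel", True, 2),
    SystemRecord("feedwater chemistry monitor", True, 1),
    SystemRecord("office printer", True, 0),
    SystemRecord("pressure relief valve", False, 3),
    SystemRecord("distributed control system", True, 3),
]
safety_case = ["distributed control system", "pressure relief valve", "reactor trip breaker"]
ranked, flagged = prioritise(inventory, safety_case)
print([s.name for s in ranked])
# ['distributed control system', 'fire alarm panel', 'feedwater chemistry monitor']
print(flagged)
# ['pressure relief valve', 'reactor trip breaker']
```

The flagged list is the point of the diverse check: one entry was recorded as containing no software (which may be true, or may be overlooked firmware), and one was never inventoried at all.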
Based on this prioritised list, systems should be reviewed for potential date-related problems. Here manufacturers may need to be contacted. Reviews of maintenance and operators' manuals may also be necessary, as well as consultation with operational and maintenance staff themselves, so as to identify systems employing dates. Any date-related failure modes must next be established through inspection and test (see ref 2). Clearly at this stage plant safety must be paramount; hence the risks associated with the investigative work must be assessed. Finally, any problematic systems need either to be modified or replaced, and then re-tested, or to have safe work-around strategies devised. The interaction of these work-arounds needs to be considered, since individually they may be adequate but invoking several at the same time may prove unmanageable, or incompatible in a safety sense, and hence constitute a hazard. All of these latter activities should, of course, be covered by the established modification procedures (including the associated quality controls) applicable to the plant and its operations. Timely training programmes will need to be developed to ensure that staff are fully familiar with any new or revised procedures well before the associated critical dates.
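The point that work-arounds may each be adequate but unmanageable together can be made concrete with a small combined check. The sketch assumes each work-around can be summarised by the extra staff it needs and that known conflicts are listed as pairs; the names, numbers and the conflict shown are all hypothetical.

```python
from itertools import combinations


def check_workarounds(workarounds, staff_available, incompatible_pairs):
    """Hypothetical check that a set of work-arounds invoked together is manageable.

    workarounds: dict mapping work-around name -> extra staff needed while in force.
    incompatible_pairs: set of frozensets naming work-arounds that conflict."""
    problems = []
    demand = sum(workarounds.values())
    if demand > staff_available:
        problems.append(f"staff demand {demand} exceeds {staff_available} available")
    for a, b in combinations(sorted(workarounds), 2):
        if frozenset((a, b)) in incompatible_pairs:
            problems.append(f"{a} conflicts with {b}")
    return problems


workarounds = {"manual feedwater sampling": 2, "manual permit issue": 1, "paper alarm log": 2}
conflicts = {frozenset(("manual feedwater sampling", "paper alarm log"))}

# Each work-around on its own fits within 4 available staff...
print(check_workarounds({"manual feedwater sampling": 2}, 4, conflicts))   # []
# ...but invoked together they do not.
print(check_workarounds(workarounds, 4, conflicts))
# ['staff demand 5 exceeds 4 available', 'manual feedwater sampling conflicts with paper alarm log']
```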
Nuclear operators should establish that their own suppliers of safety significant items (equipment and consumables) are dealing satisfactorily themselves with the Year 2000 problem. Consideration should also be given to emergency arrangements. In particular, the equipment should be checked and contingency plans laid, possibly including manning any emergency facilities at the millennium change. Also, where appropriate, headquarters' equipment should not escape scrutiny since some may have safety implications.
Despite all the above precautions, there may still be undetected errors (both old ones and, due to the modifications, new ones): the 'residual risk'. Hence operators will require contingency plans tailored to handling possible multiple system failures and their consequences, e.g. a major plant failure. Such plans may include, for example, double-shifting over the critical dates associated with the millennium change and the pre-manning of emergency control rooms. Additionally, where possible, all invasive plant operations (e.g. on-line refuelling) should be avoided at the critical dates; and all necessary resources (e.g. fuel and communications) that depend on external suppliers need to be secured in advance.
ADDRESSING THE PROBLEM - REGULATORY ACTIONS
As regulators we must ensure that nuclear operators are aware of, and effectively responding to, the problem; specifically, that each has an adequate strategy and action plan in place to deal with the safety issues. We will also need to monitor the implementation of the action plans; to review any safety submissions arising from the investigations and subsequent modifications; and to oversee, as appropriate, the arrangements each nuclear operator has in place for the critical dates. Finally, as mentioned above, the regulator's own emergency arrangements and the equipment required for that activity should be checked to ensure that no problems will arise due to date discontinuities. Similarly, the regulatory bodies themselves will need to be in an adequate state of alert during the critical periods.
CURRENT UK POSITION
My Directorate is using its regulatory powers to ensure that the UK's nuclear licensees are addressing the issues posed by the critical dates around the end of the millennium. The UK has a total of 15 nuclear licensees who operate 40 licensed sites; these include power reactors, research reactors, nuclear chemical plant and naval dockyards. We have ensured that all operators are aware of the matter, and we have been ensuring that each has an adequate strategy and action plan in place to address it. These operators are currently implementing intensive programmes of work: preparing inventories, prioritising systems by safety significance, investigating them, and providing solutions where required. They also have to consider whether special additional contingency arrangements need to be in place at the critical dates. Information is being actively shared between the operators, and all have attended a workshop to further this exchange; another such workshop is planned for the near future.
My Inspectors are monitoring the implementation of the operators' action plans (which so far appear generally to be on programme); reviewing any safety submissions arising from the investigations and subsequent modifications; and will be ensuring that the arrangements each operator has in place at the critical dates are adequate. To this end we have developed a set of assessment criteria against which the operators' strategies, guidance, procedures and activities will be judged. Each licensed site will be expected to produce a justification for continued operation before each of the critical dates. This justification is intended to cover not only plant which is required to operate through the critical dates but also plant that will be shut down and then restarted after a critical date. In the latter case, the safety of both the shut-down state and the start-up will need to be justified.
Finally, we intend to check the adequacy of our own systems and arrangements for the millennium change and the other key dates.
Essentially, the millennium bug has the potential to affect the safety of any nuclear plant in any part of the world. Whilst the avoidance of an associated, uncontrolled, release of radioactivity is paramount, the inadvertent shutdown of one, or several, nuclear power plants, albeit safely in a nuclear sense, may still be a far from acceptable outcome if, for example, this were contributory to the general collapse of a country's grid supplies.
Whereas there is no avoiding the need for every plant owner individually to do all that is necessary for the purposes of ensuring (and demonstrating) 'millennium dates' safety, effectiveness is much aided by awareness of how others are tackling the same problem and of what they are finding in the process. Similarly, because of the global safety threat potentially posed, all of us wish to be assured that the matter is being tackled systematically wherever nuclear plants are present, and whatever their operational states.
It is clear that there is already much international exchange on the topic, fostered by the various international bodies. Continuing such exchanges is vital, and the UK will play its full part in them. Modern electronic communications, such as those available via the Internet, provide excellent opportunities for posting information globally through web sites, home pages, etc. (OECD/NEA has, in fact, set up a mailbox through which nuclear regulators can exchange such information.) Organisations such as IAEA can play a key role in facilitating such exchanges, both through their meeting programmes and topic workshops and by acting as a communication focus.
In conclusion, the potential safety concerns associated with the year 2000 computer problem are real. They need to be addressed systematically, comprehensively and in a timely manner by the nuclear industry. We as regulators have an important role in ensuring that our nuclear licensees are vigorously pursuing action plans to identify potentially problematic computer-based systems and to test them for safe operation. Where safety cannot be so demonstrated, the associated systems should either be corrected or have safe operating procedures devised for them. In addition, contingency plans should be made to ensure safe operation at the critical dates and adequate supplies of safety-significant items. The argument that computers are not being used in safety systems (or in some cases control systems) provides a certain level of safety assurance but does not remove the need to demonstrate systematically that date-related (common-cause) failures in other systems could not directly or indirectly threaten safety.
The current efforts by international bodies such as IAEA to stimulate exchanges of experience in handling this issue are vital. We need to maintain (and if possible enhance) the momentum of these efforts so that we all may benefit from each other's activities.
1. "Health and safety and the year 2000 problem: guidance on the year 2000 issues as they affect safety-related control systems", Health and Safety Executive, INDG267 C1000 5/98.
2. "Testing safety-related control systems for year 2000 compliance", Health and Safety Executive, 1998, ISBN 0 7176 1596 0.
3. "Embedded Systems and the Year 2000 Problem, Guidance Notes", IEE Technical Guidelines 9:1997, ISBN 0 85296 930 9.
ANNEX 1 - Critical Dates
The critical dates are generally regarded as:
Additionally, 21-22 August 1999 might cause a problem to systems which depend upon the Global Positioning System (GPS); for example, the transporting of nuclear fuel where knowledge of its location is important.
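The August 1999 date is the GPS week-number rollover. The navigation message carries the week count in a 10-bit field counted from 6 January 1980, so it wraps back to zero after 1024 weeks; a receiver that assumes it is still in the first 1024-week cycle then reports a date in 1980. The sketch below shows the arithmetic; how a real receiver resolves the ambiguity varies by design, and passing the cycle ("era") count in explicitly is an assumption made for illustration.

```python
from datetime import date, timedelta

GPS_EPOCH = date(1980, 1, 6)   # start of GPS week 0
WEEK_MODULUS = 1024            # the broadcast week number is 10 bits wide


def naive_week_start(broadcast_week: int) -> date:
    """Receiver that assumes it is still in the first 1024-week cycle."""
    return GPS_EPOCH + timedelta(weeks=broadcast_week % WEEK_MODULUS)


def week_start_with_era(broadcast_week: int, era: int) -> date:
    """Resolve the ambiguity with an externally known cycle count (an assumption)."""
    return GPS_EPOCH + timedelta(weeks=era * WEEK_MODULUS + broadcast_week % WEEK_MODULUS)


print(GPS_EPOCH + timedelta(weeks=WEEK_MODULUS))   # 1999-08-22: the first rollover
print(naive_week_start(0))                         # 1980-01-06: wrong after the rollover
print(week_start_with_era(0, 1))                   # 1999-08-22
```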
A more extensive list covering non-millennium related problem dates can be found in Appendix B of ref 2.
ANNEX 2 - Examples of Date-Related Problematic Systems Which Have Been Found
The examples given below have been supplied by British Energy and Magnox Electric, who have asked that they be accompanied by the following text.
In common with many major companies across the world, the UK nuclear electricity generators, British Energy and Magnox Electric, have been addressing the millennium issue since 1996. The approach being taken is to ensure that safe and continued generation is achieved. Inventories of all potentially affected items have been drawn up for each station and a top-down inspection of safety case documentation and site licence conditions has been undertaken, to confirm that safety-related items are being reviewed. The NII, as the nuclear regulator, is being kept fully informed of the approach and progress.
Inventories of important systems have been built up and have been assessed for safety and business criticality, and prioritised accordingly. Systems assessed as "essential" or "business-critical" are targeted to be fully investigated and fixed by December 1998, and all other important systems by October 1999. Investigations will lead either to the conclusion that a system is already "millennium-compliant" or to a decision to:
* apply remedial work to make it compliant;
* replace it by a compliant system; or
* accept any non-compliance where suitable "work-arounds" can be engineered.
The problems identified with plant systems (so far) have affected the date displayed, printed or recorded, the order of recorded data or have caused a system to fail to start-up or halt. Of a large number of items investigated the following are examples of problems which have either been fixed, or are in the process of being fixed, through the normal modifications process.
In addition, significant work is in hand to address business and technical computer systems.
Data Processing Systems
The data processing systems are being assessed for problems associated with sensitive dates. Types of problems found are:
* incorrect date-stamp on some entries in the alarm and event log, e.g. year set to 28 instead of 00, 29 instead of 01;
* the rod-drop logger would not accept a date set beyond 1999, although it did correctly work through the date change to 2000;
* the punch history programme which prints out data after a trip can, if the year is set to zero, follow a path which leads to an incorrect date being printed - this problem is being investigated.
A problem was found with a distributed monitoring and control system which normally obtains its date and time from a radio clock signal. In the event of the radio clock being unavailable, the systems would not accept "00" or "2000" as a valid date. This could have led to the system becoming degraded if parts of the system needed to be re-booted from cold. Under these relatively uncommon circumstances, if the fault was not corrected, then the system would have to work with the date set in the past with the potential for misinterpretation of date stamping. This fault has been rectified.
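The cold-restart fault above is an input-validation problem: with the radio clock unavailable, a manually entered "00" or "2000" is rejected. The sketch contrasts a hypothetical legacy check (accepting only two-digit years from 80 to 99, an invented rule) with one that accepts both forms, assuming two-digit years are interpreted elsewhere through a 1950 to 2049 window.

```python
def legacy_valid_year(text: str) -> bool:
    """Hypothetical legacy check: only two-digit years from 80 to 99 are accepted."""
    return len(text) == 2 and text.isascii() and text.isdigit() and 80 <= int(text) <= 99


def valid_year(text: str) -> bool:
    """Accept two-digit years (assumed to be interpreted through a 1950-2049 window)
    or four-digit years within that same window."""
    if not (text.isascii() and text.isdigit()):
        return False
    if len(text) == 2:
        return True
    return len(text) == 4 and 1950 <= int(text) <= 2049


for entry in ("99", "00", "2000"):
    print(entry, legacy_valid_year(entry), valid_year(entry))
# 99 True True
# 00 False True
# 2000 False True
```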
The provided access control systems fail due to excessive error messages being generated on transition to year 2000. Setting the date back or re-starting in 2000 is being investigated as a contingency measure. The systems are being upgraded.
Emergency Plume Gamma Monitoring System
Historical trend information is presented correctly if all data is in this millennium or all the data is in the next millennium, but trends do not appear correctly if the data is spanning the transition.
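Behaviour like this, correct within one century but wrong across the transition, is what one would expect if samples were ordered by a two-digit year key. Whether the plume monitoring system actually stores dates this way is not stated; the YYMMDD keys and readings below are hypothetical.

```python
# Hypothetical samples keyed by a YYMMDD date string, as a legacy logger might store them.
samples = {"991230": 4.1, "991231": 4.3, "000101": 4.4, "000102": 4.6}


def naive_trend(samples):
    """Order by the raw YYMMDD key: the year-2000 samples sort before the 1999 ones."""
    return [samples[k] for k in sorted(samples)]


def windowed_key(k, pivot=50):
    """Expand the two-digit year (assumed 1950-2049 window) before comparing."""
    yy = int(k[:2])
    year = 2000 + yy if yy < pivot else 1900 + yy
    return (year, k[2:])


def corrected_trend(samples):
    return [samples[k] for k in sorted(samples, key=windowed_key)]


print(naive_trend(samples))       # [4.4, 4.6, 4.1, 4.3]
print(corrected_trend(samples))   # [4.1, 4.3, 4.4, 4.6]
```

If every sample falls in the same century the two orderings agree, which matches the observation that trends are correct unless the data span the transition.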
Main Turbine and Main Boiler Feed Pump Governors
The version of the operating system used in this equipment has a problem which prevents it being re-started in Year 2000. Upgrades to address this problem are being progressed.
Fuel Flask Leak Detection
This equipment includes a calibration date and a check that it is within a yearly calibration period. The comparison of current and calibration due date needs to address the transition from 1999 to 2000 (99 to 00). The software was intended to deal with this, but causes an illegal syntax error and halts the processor when it does this check on 1/1/1999.
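One way the 99-to-00 step can go wrong in a due-date check is sketched below: if the due year is held in a two-digit field, 99 + 1 wraps to 00, and a calibration done in 1999 immediately looks overdue. This does not reproduce the actual fault (a syntax error that halted the processor); it only illustrates the class of error. The corrected version compares full dates and assumes a calibration on 29 February falls due on 28 February of the following year.

```python
from datetime import date


def naive_due_yy(cal_yy: int) -> int:
    """Due year kept in a two-digit field: 99 + 1 wraps to 00."""
    return (cal_yy + 1) % 100


def naive_overdue(today_yy: int, cal_yy: int) -> bool:
    return today_yy > naive_due_yy(cal_yy)


def overdue(today: date, calibrated: date) -> bool:
    """Compare full dates; due one year after calibration (29 Feb mapped to 28 Feb)."""
    try:
        due = calibrated.replace(year=calibrated.year + 1)
    except ValueError:   # calibrated on 29 February, which has no counterpart next year
        due = calibrated.replace(year=calibrated.year + 1, day=28)
    return today > due


# Calibrated 1 Jan 1999, checked the next day.
print(naive_overdue(99, 99))                        # True: falsely reported as overdue
print(overdue(date(1999, 1, 2), date(1999, 1, 1)))  # False
print(overdue(date(2000, 1, 2), date(1999, 1, 1)))  # True: genuinely overdue a year later
```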
Water Chemistry Control System
A water treatment plant control and chemical monitoring system has been found to work incorrectly in the year 2000. Whilst not causing a loss of feed water to the boilers, it had the potential to affect water quality resulting in the longer term in an increase in the number of boiler tube failures.
Activity in Low Level Waste Drums
A system which monitors the activity of low level waste stored in drums will not operate after 31 December 1999 because its calibration routine is not able to handle the change from 1999 to 2000. In addition it does not recognise 29 February 2000. If this system is not corrected it will not pose a direct safety hazard but could result in delays in the despatching of solid low level waste off the site.
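The failure to recognise 29 February 2000 typically comes from an incomplete leap-year rule. In the Gregorian calendar a year is a leap year if it is divisible by 4, except for century years, which are leap years only if divisible by 400; 2000 is therefore a leap year. The "partial" rule below, which applies the century exception but omits the 400-year exception, is a hypothetical example of the kind of error involved; the report does not say which rule the waste-drum system used.

```python
def is_leap_gregorian(year: int) -> bool:
    """Full Gregorian rule: every 4th year, except centuries, except every 400th year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def is_leap_partial(year: int) -> bool:
    """Hypothetical partial rule: applies the century exception but not the 400-year one."""
    return year % 4 == 0 and year % 100 != 0


for y in (1900, 1996, 2000, 2100):
    print(y, is_leap_gregorian(y), is_leap_partial(y))
# 1900 False False
# 1996 True True
# 2000 True False   <- the partial rule rejects 29 February 2000
# 2100 False False
```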
Burst Can Detection System
A burst can detection system which monitors a reactor's primary cooling system for activity fails to scan the several inputs located around the reactor. Although it can still detect a leak of activity into the cooling system and indicate this to an operator, who will trip the reactor, the detection of this activity could be delayed, further worsening the incident.
Maintenance Scheduling Computer
A maintenance scheduling computer is year 2000 non-compliant and requires modification. Whilst not affecting safety directly, problems with this system could mean maintenance was not carried out at the correct time and that there was an increased burden on the maintenance staff.
A Remote Emergency Indication Centre has a number of date related non-compliances which need to be rectified. If this is not done the efficient handling of emergencies would be in jeopardy.
Work Permit System
A system used to ensure the safety of personnel working on plant, by preparing the necessary safety documentation, needed to be replaced. Had this system not been updated, a manual alternative would have had to be brought into use, placing an additional burden on the operational/maintenance staff.
"There are only two ways to live your life: as though nothing is a miracle, or as though everything is a miracle." --Albert Einstein
Sheri Nakken Coordinator - Western Nevada County Y2K Preparedness
PO Box 1563, Nevada City, California 95959 Phone 530-478-1242
Business Owner - Well Within & Earth Mysteries & Sacred Site Tours
Broadcaster/DJ/Reporter
KVMR FM, Nevada City, CA
Host: The Y2k Forum, 1st & 3rd Wednesdays 12 Noon - 1 P.M.