The PathLAN Software Y2K Bug

1) The PathLAN software Y2K Bug

The PathLAN software in use at Sheffield was derived from a standalone PC package. The specific area which failed was a date calculation module which extracted date information from strings to calculate maternal age at the expected date of delivery (EDD). Date information was stored as a 10 character string: dd/mm/yyyy. Specific items of information were extracted using a ‘Move’ command from either the Birthdate field or the Cldate [Collection date] field.

Assuming a birth date of 12/04/1960 and a collection date of 11/03/2000, the original lines read, and would have extracted, the following:

    Move Birthdate[4,2] to #N0    extracts 04
    Move Birthdate[9,2] to #N1    extracts 60
    Move CLdate[4,2] to #N2       extracts 03
    Move CLdate[9,2] to #N3       extracts 00

And the corrected lines read and extract:

    Move Birthdate[4,2] to #N0    extracts 04
    Move Birthdate[7,4] to #N1    extracts 1960
    Move CLdate[4,2] to #N2       extracts 03
    Move CLdate[7,4] to #N3       extracts 2000

The values are converted into a number of weeks by multiplying the year figure by 52 and the month figure by 4. The ‘birth weeks’ are subtracted from the ‘collection weeks’, and the weeks remaining to delivery (40 – gestation period) are added to give the number of weeks from birth to the expected date of delivery. This is then divided by 52 to arrive at an age in years [as a decimal number].
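The calculation described above can be sketched in Python to show why the two-digit year failed once the collection date reached 2000. This is an illustration only, not the PathLAN source: the function names and the gestation of 16 weeks are assumptions made for the example, and the 1-based `[start,length]` indexing is inferred from the extracted values quoted above.

```python
def extract(field: str, start: int, length: int) -> int:
    """Mimic 'Move Field[start,length]': 1-based start, fixed length."""
    return int(field[start - 1 : start - 1 + length])


def age_at_edd(birthdate: str, cldate: str, gestation_weeks: float,
               four_digit_year: bool) -> float:
    """Age in years at EDD using the report's year*52 + month*4 method."""
    # Original lines read [9,2] (two-digit year); corrected lines read [7,4].
    year_start, year_len = (7, 4) if four_digit_year else (9, 2)

    birth_weeks = (extract(birthdate, year_start, year_len) * 52
                   + extract(birthdate, 4, 2) * 4)
    collection_weeks = (extract(cldate, year_start, year_len) * 52
                        + extract(cldate, 4, 2) * 4)

    # Add the weeks remaining until delivery (40 minus gestation).
    weeks_at_edd = collection_weeks - birth_weeks + (40 - gestation_weeks)
    return weeks_at_edd / 52


# Assumed gestation of 16 weeks at collection (hypothetical value).
original = age_at_edd("12/04/1960", "11/03/2000", 16, four_digit_year=False)
corrected = age_at_edd("12/04/1960", "11/03/2000", 16, four_digit_year=True)
print(f"original lines:  {original:.2f} years")   # -59.62
print(f"corrected lines: {corrected:.2f} years")  # 40.38
```

With the original lines, the year 2000 is read as 00, so the collection date appears to fall 60 years before the birth date and the computed age is negative. For collection dates in 1999 or earlier, both versions give the same result.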
As a date calculation routine, this is very poor. Firstly, it ignores the day of birth, so a woman born on the first day of the month gets the same age at EDD as one born on the last day of the month. It also treats every month as 4 weeks, so that twelve months count as only 48 of the 52 weeks allowed for a year. Even though the calculator has been used for years and the Y2K error has been ‘solved’, I suggest that a date calculation routine that actually calculates dates properly, rather than making grossly inaccurate calculations, should be used instead. An example that could be used is shown later in this report.

A further problem with the date calculation routine is the lack of any error trapping. In my software (Downcalc), ages outside of the range 12 – 54 years are automatically rejected. This should have been implemented in the PathLAN routines. An alternative that would also have helped would have been to report the age used in the calculation, rather than just the date of birth.
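To illustrate the two points above (exact date arithmetic and error trapping), a minimal Python sketch follows. It is not the Downcalc code and not the example referred to in this report; the function names, the use of gestation in days, and the 365.25-day year are assumptions. Only the 12 – 54 year limits are taken from the text.

```python
from datetime import date, timedelta

MIN_AGE_YEARS = 12  # lower limit quoted for Downcalc
MAX_AGE_YEARS = 54  # upper limit quoted for Downcalc


def parse_ddmmyyyy(text: str) -> date:
    """Parse a 10-character dd/mm/yyyy string into a date."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day)


def age_at_edd_checked(birthdate: str, cldate: str, gestation_days: int) -> float:
    """Maternal age in years at EDD, rejecting implausible ages."""
    birth = parse_ddmmyyyy(birthdate)
    collection = parse_ddmmyyyy(cldate)

    # EDD is 280 days (40 weeks) from the start of gestation.
    edd = collection + timedelta(days=280 - gestation_days)

    # Assumption: an average year length of 365.25 days.
    age = (edd - birth).days / 365.25

    if not MIN_AGE_YEARS <= age <= MAX_AGE_YEARS:
        raise ValueError(f"maternal age at EDD out of range: {age:.2f} years")
    return age


# Assumed gestation of 112 days (16 weeks) at collection.
age = age_at_edd_checked("12/04/1960", "11/03/2000", 112)
print(f"age at EDD: {age:.2f} years")
```

A check of this kind would have stopped a negative or otherwise absurd age from reaching the risk calculation, and printing the age on the report gives the user a second chance to notice a problem.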
This is of course easy to say with the benefit of the retrospectoscope, but it should be a recommendation of any final report, to prevent others from having a similar problem.

The most critical question is whether the software fix described above has truly repaired the PathLAN program and returned the calculations to ‘normal’. I have yet to fully analyse the before & after data from PathLAN, but I have carried out two exercises that make me 99% certain that all problems have been solved.

Firstly, I calculated a prediction of the age-related underestimate of risk that would be expected if the only problem was the millennium bug described above (Figure 1).

Figure 1: Predicted risk underestimate

Then I took a selection of 30 patients with incorrect & correct risks, representing a range of ages from 18 – 43 years, derived the observed inaccuracy, and compared it with the prediction.

Figure 2: Observed variation in Risk

It is clear that the observed variations lie almost exactly on the predicted line.
The slight variation is probably due to the inaccuracy of the date routine described above. This means that, for the 30 patients randomly selected from approximately 7000 cases, the agreement between predicted and observed risk is essentially exact. It is almost unthinkable that this could have occurred by chance. We can therefore be effectively certain that all of the error was due to the Y2K bug described above. Once all of the data are available [data search currently being prepared], I will carry out a final test, but this is really for completeness rather than to satisfy any lingering doubt.
A final test based on 6240 results, using the same graphical technique as figure 11, is shown below. This demonstrates that the vast majority of results lie on the predicted line. Only a small proportion lie off the line; these represent results whose values were recalculated because of changes in gestation date or other factors. They are clearly few in number, so it is valid to conclude that correction of the one Y2K problem has satisfactorily returned the Sheffield screening program to its previous position.

Figure 3.

The remaining part of this report will deal with areas of the Sheffield Down’s screen which I believe ought to be changed: firstly, amendments that should be made immediately within PathLAN to provide confidence that no future errors are missed; and secondly, amendments that should be made to the program when any new computer system is installed.
Understanding of the issues included here may be aided by the description of how Down’s risk calculations work later in this report.

2) Further Problems with the PathLAN software

2a) Age Risk Calculation

The age risk calculation formula used in the Sheffield program is:

[Equation 3 – as in PathLAN]

$$R = \frac{0.999373 + e^{(0.286 \cdot \mathrm{Age} - 16.2395)}}{0.000627 + e^{(0.286 \cdot \mathrm{Age} - 16.2395)}}$$

[Equation 3 – restated]

$$R = \frac{0.999373 - e^{(0.286 \cdot \mathrm{Age} - 16.2395)}}{0.000627 + e^{(0.286 \cdot \mathrm{Age} - 16.2395)}}$$

The PathLAN version has a plus sign in the numerator where the restated version has a minus sign. The effect this has on screening is minimal, but it should be corrected.

Table 3: Correct & incorrect age risks due to PathLAN bug

Age    Correct Age Risk    Sheffield Age Risk
15     1577.62550          1577.64591
20     1528.04102          1528.12360
25     1350.63542          1350.94047
30      909.29654           910.15502
35      383.99492           385.51214
40      111.85583           113.71430
45       27.54383            29.50804
46       20.53980            22.51279
47       15.23659            17.21623
48       11.22891            13.21358
49        8.20468            10.19314
50        5.92507             7.91639

2b) Calculation of Median Values & Population Parameters

Currently median values are calculated on a ‘completed week’ basis. I have previously demonstrated this to be an ineffective way to correct for maternal age in some screening programs. The table shows the population distribution derived from patient data and current medians. To be an effective correction factor, the median & mean values should be close to 1 (or 0 in the log column). In fact they do conform quite well, having an error of 0.6 – 2.5%, but use of week + day gestation dating with exponential regression to determine medians should improve matters [Burton figures show errors of 0.009 – 0.06%].
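Returning briefly to the age risk formula in section 2a, the two versions of Equation 3 can be compared directly with a short Python sketch. The function names are illustrative and not taken from PathLAN; the constants are those given in the equations, and the ages are those listed in Table 3.

```python
import math


def age_risk_restated(age: float) -> float:
    """Equation 3 as restated: minus sign in the numerator."""
    e = math.exp(0.286 * age - 16.2395)
    return (0.999373 - e) / (0.000627 + e)


def age_risk_pathlan(age: float) -> float:
    """Equation 3 as in PathLAN: plus sign in the numerator."""
    e = math.exp(0.286 * age - 16.2395)
    return (0.999373 + e) / (0.000627 + e)


print(f"{'Age':>3} {'Restated':>12} {'PathLAN':>12} {'Excess %':>9}")
for age in (15, 20, 25, 30, 35, 40, 45, 46, 47, 48, 49, 50):
    restated = age_risk_restated(age)
    pathlan = age_risk_pathlan(age)
    # Percentage by which the PathLAN figure exceeds the restated one.
    excess = 100 * (pathlan - restated) / restated
    print(f"{age:>3} {restated:>12.5f} {pathlan:>12.5f} {excess:>9.3f}")
```

The absolute difference between the two figures stays below 2 at every age listed, but because the risk figure itself falls steeply with age, the relative difference grows from a negligible amount at age 15 to roughly a third at age 50.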
It is not possible to calculate the effect of day + week derived exponentially