% BCS Electronic Publishing specialist group conference, 23 Sep 1987
%
% review by David Osborne, Cripps Computing Centre, Univ of Nottingham
% [ JANET: cczdao@uk.ac.nott.vaxa ] 3-5 Oct, 1987
%
\def\quid{{\it\$}} % credit to Malcolm Clark for this!
\centerline{\bf BCS Electronic Publishing conference}
\centerline{\bf 23rd September}
\centerline{\bf Storage Technology in EP}
\medskip

\noindent This was the first meeting of the BCS EP group I had attended, though the advance programme described it as the fourth full-day meeting in what is now the third year of the group's existence. The meeting was held at the School of Oriental and African Studies in London, and was well attended by representatives of both commercial and academic organisations.

In the absence of the scheduled speaker from Hatfield Polytechnic, Ken~Jamieson of Microrite (chairman of the EP group) gave the first presentation, an ``Overview of Relevant Storage Technology'', which set the theme of the meeting in context. He reviewed both familiar and unfamiliar media for storing text and fonts, leading on to the newer technologies of CD-ROM and WORM, which were described more fully later in the day. His description of exchangeable 15.6Mb hard disks for PCs enlightened many of those present who, like myself, had not come across these devices. He went on to contrast the storage needed for an average A4 page of text, about 0.5Mb, with that needed for the raster images increasingly demanded in documents that combine text and graphics: eight quarter-page graphic images rendered at 300 dots per inch would require about 2Mb of storage.

Andrew~James, Ventura Product Manager at Rank-Xerox, then led a discussion on the storage problems experienced by those in the audience.
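As a rough check on the 2Mb figure for graphics, the following sketch (not something shown at the meeting) works out the uncompressed size of a raster image. It assumes bilevel images at one bit per pixel, an A4 page of 210 by 297 mm, and a ``quarter page'' of half the width by half the height; the function name {\tt raster\_bytes} is purely illustrative.

```python
# Rough check: storage for uncompressed bilevel raster images.
# Assumptions (not from the talk): A4 = 210 mm x 297 mm, 1 bit per pixel,
# each scan line padded to a whole byte, no compression.

MM_PER_INCH = 25.4


def raster_bytes(width_mm: float, height_mm: float, dpi: int, bits_per_pixel: int = 1) -> int:
    """Uncompressed size in bytes of a raster covering width_mm x height_mm at dpi."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    # Pad each scan line to a whole number of bytes.
    bytes_per_line = (width_px * bits_per_pixel + 7) // 8
    return bytes_per_line * height_px


A4_W, A4_H = 210.0, 297.0
# A quarter page: half the width by half the height of A4.
quarter = raster_bytes(A4_W / 2, A4_H / 2, dpi=300)
total = 8 * quarter
print(f"one quarter-page image: {quarter / 1e6:.2f} MB")
print(f"eight images:           {total / 1e6:.2f} MB")
```

Under these assumptions each quarter-page image needs about 0.27Mb and eight of them about 2.17Mb, in line with the figure quoted; any compression scheme would bring this down.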
Andrew has written a review of the DTP (desk-top publishing) market in the September 1987 issue of ``PC~User'', drawing on his involvement with Ventura, and his interests naturally centre on the special problems of PC users. As he pointed out, most PC publishing users have far more restricted resources than text-processing users on minis or mainframes. Shortage of disc space is a common difficulty, particularly with the growing use of packages that rely on bit-mapped font files (in the pages of \TeX line, one such immediately springs to mind!). On a slightly partisan note, he commented that some DTP packages, such as {\it PageMaker\/}, exacerbate storage problems by using large, monolithic document files, whereas others, such as {\it Ventura\/}, minimise them by keeping a document as a collection of smaller files.

Having discussed the problems of storage, the meeting turned to techniques for rapid text capture in a pair of presentations on OCR and desktop scanning. Alan Howard of Kurzweil gave a fascinating description of the Kurzweil ``Discover'' intelligent scanner. Intended for use with a PC, it uses a special processor board designed by Kurzweil, running expert-system software that performs pattern recognition on the scanned material, which may be text or graphics. The text mode of scanning seems particularly sophisticated: it can ignore graphics and ``noise'' (such as handwritten comments, staple marks and coffee stains), and can scan an A4 page in about 70 seconds with a 95\% assurance level of character recognition. The ``format analysis'' performed by the processor even takes account of page layout, indentation and tabbing. For output, the conversion phase can produce a file, as a mixture of text and markup commands, in formats acceptable to most common word-processing packages.
In its graphics mode, the scanner can perform contrast enhancement and can handle continuous-tone photographs, with output in formats suitable for {\it Ventura\/} and the {\it Xerox Documenter\/}.

It was perhaps unfortunate that a traffic incident delayed the arrival of Sandra Woodley from Sintrom, since it meant she had to speak after the Kurzweil demonstration: a difficult `act' to follow. She described the Datacopy OCR scanner, which uses a different, less sophisticated recognition technique and is therefore restricted to a set of 19 fonts, with up to 9 different fonts on one page. In a ``learning'' mode, the user can spend about 20 minutes teaching the scanner to recognise a new font, which can then be added to the machine's library of pre-defined typefaces (such as Courier 12 and Times Roman 10 \dots\ not a Computer Modern to be seen!). At present the scanner copes only with text, but it handles both monospaced and proportionally spaced fonts, though kerning and ligatures can cause some problems. Again, the scanner is PC-based and can generate output for a variety of word-processing programs. Its ability to scan bound volumes without their having to be unbound is certain (I nearly said ``bound'') to make it popular with librarians. Choosing between the two scanners comes down to ``paying your money and taking your choice'': the Datacopy costs \quid 1658 for the flatbed scanner plus \quid 742 for its software (\quid 2400 in all), while the much more sophisticated Kurzweil is pitched at \quid 8000 for the package of scanner, processor board and software.

We then moved further into high technology with a presentation on WORM (Write Once, Read Many) technology from Geoff Pullen of Microscribe. Unlike CD-ROM, WORM has no world standard for the format of its optical discs, and discs are manufactured for specific drives.
The discs themselves are made of either glass or optical polycarbonate and are now guaranteed by the manufacturers for 30 years, though there was some discussion as to whether this guarantee covers the integrity of the data recorded on them. Many types of drive are available, and capacities of 1 gigabyte per side have been common since 1983. Three sizes are in use: 14-inch, with 6.8Gb capacity; 12-inch; and 5\frac1/4-inch, offering 115Mb--800Mb per disc. ``Juke box'' storage for up to two hundred 12-inch discs is available. Image data can be retrieved from WORM discs in standard MS-DOS file formats, so that once retrieved it can be exchanged between media. An entry-level 5\frac1/4-inch system should cost \quid 16000--20000, while top-level systems are much more expensive: \quid 84000 for the Racal REOS and \quid 100000 for the Philips Megadoc. Despite these high prices, the value of WORM for secure storage of large volumes of text is clear.

Contrasted with WORM was the other optical medium, CD-ROM, described in a talk by a speaker from Saztec Ltd., in the context of their contract with the British Library to convert the BL's General Catalogue. This is a huge task: the catalogue details 4.2 million volumes deposited with the library, and itself occupies 360 volumes taking up 42 feet of shelf space. These contain 176,000 pages, representing about 1 billion characters. The conversion phase of the contract is costing about \quid 2M and will occupy about 70 people for 4 years. The claim that this is an unparalleled task was disputed by Lawrie Newton of OUP, who stated that the computerisation of the Oxford English Dictionary is on a comparable scale. The opportunity is being taken to restructure the catalogue as part of the conversion to optical disc. CD-ROM discs are similar to the audio compact discs familiar to hi-fi enthusiasts and have an unformatted capacity of 600Mb, giving about 550Mb of storage when formatted and indexed.
This means that the British Library catalogue can be held on just four discs. CD-ROM is a suitable publishing medium for unchanging reference data such as the BL catalogue: it can hold both text and graphics, and offers rapid searching and the downloading of selected text for later processing on a PC, for example. The current high costs of mastering and production militate against its use for rapidly changing data, or where the production run is small, although the hardware to read the discs is relatively cheap, at about \quid 200 per drive. Data preparation for CD-ROM costs about \quid 9000, with mastering an additional \quid 2000 and replication \quid 12--\quid 18 per disc. This puts the cost of a production run of one disc at about \quid 12000, but of a run of a thousand at only \quid 25000.

Rounding off the meeting, Mike Daines of Signus discussed ``Soft Scanning Techniques for Fonts'', with particular reference to Signus' involvement with the {\it Ikarus\/} system. Signus specialise in producing digital typefaces for various manufacturers, for both typesetters and low-resolution devices such as laser and dot-matrix printers. Mike noted that ``resolution independence'' of typefaces can be achieved by storing the font at a higher resolution than that used by any available device; this is the method adopted for the {\it Lucida\/} family by Bigelow and Holmes (who described it at {\it EP86} in Nottingham). It is also used by {\it Ikarus\/}, which has a resolution of 15000 dots across the em square in both {\it x\/} and {\it y\/}. He then noted the advantages of hand digitisation, claiming that an effective time of 2.6 minutes per letter could be attained once the draft designs had been marked up. Signus digitise to an accuracy of \frac1/{100}{\it mm\/} for a letter 10{\it cm\/} high. Almost all digital formats can be calculated from this hand-digitised base.
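To make the idea of resolution independence concrete, here is a rough sketch (not Signus' or Ikarus' actual software) of how a coordinate stored on a 15000-unit em might be scaled to device pixels for a given point size and printer resolution. The helper names {\tt pixels\_per\_em} and {\tt to\_device}, the use of TeX's 72.27 points per inch and simple rounding to the nearest pixel are all assumptions made for illustration; a square raster is assumed, so one scale factor serves both {\it x\/} and {\it y\/}.

```python
# Sketch: mapping outline coordinates held on a fine master grid to device pixels.
# MASTER_UNITS_PER_EM = 15000 follows the Ikarus figure quoted in the talk;
# the TeX point (72.27 per inch), nearest-pixel rounding and the helper
# names are illustrative assumptions, not Ikarus behaviour.

MASTER_UNITS_PER_EM = 15000
POINTS_PER_INCH = 72.27


def pixels_per_em(point_size: float, dpi: int) -> float:
    """Device pixels spanned by one em at the given size and resolution."""
    return point_size / POINTS_PER_INCH * dpi


def to_device(x_units: int, y_units: int, point_size: float, dpi: int) -> tuple[int, int]:
    """Scale a master-grid point to the nearest device pixel.

    A square raster is assumed, so one scale factor serves both x and y.
    """
    scale = pixels_per_em(point_size, dpi) / MASTER_UNITS_PER_EM
    return round(x_units * scale), round(y_units * scale)


# The same master point (the centre of the em) at 10pt on three example devices:
# a 300dpi laser printer, a 600dpi printer and a 2540dpi typesetter.
for dpi in (300, 600, 2540):
    print(f"{dpi:5d} dpi: {pixels_per_em(10, dpi):6.1f} px/em, centre at {to_device(7500, 7500, 10, dpi)}")
```

With these assumptions the same outline gives about 41.5 pixels per em at 10pt on a 300dpi printer and about 83 at 600dpi, without redigitising for either device. For comparison, digitising a 10{\it cm\/} letter to \frac1/{100}{\it mm\/} gives 10000 steps over its height, the same order of magnitude as the 15000-unit em.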
Soft scanning consists of calculating the intersections of character outlines in the Ikarus `DI' format (which uses circles, arcs and straight lines) with the raster pattern of the display device, producing scan lines for output. This method allows the fonts to be reprocessed later to make any corrections that may be necessary. Summing up, Mike described what he felt were the essential requirements for digital typesetting: fonts should have a square raster resolution (equal in {\it x\/} and {\it y\/}), a constant database of resolution-independent character definitions and a common font format; printers should have a minimum resolution of 600 dots per inch, with fonts downloaded from the host machine rather than resident in the printer.

\medskip\noindent (The next meeting of the EP group is scheduled for November 26th, to discuss DTP in the printing industry, with speakers including Pierre Mackay and representatives from OUP and John Wiley.)

\medskip
\rightline{\it David Osborne, Nottingham University}