\documentclass[a4paper]{article} \usepackage{array} \makeatletter \@ifundefined{l@nohyphenation}{\newlanguage\l@nohyphenation}{} \DeclareRobustCommand\meta[1]{% \ensuremath{\langle}% \sbox{\z@}{% \setlanguage\l@nohyphenation \normalfont\itshape #1\/% \setlanguage\language }% \unhbox\z@ \ensuremath{\rangle}% } \makeatother \DeclareRobustCommand\marg[1]{% \texttt{\char`\{}\meta{#1}\texttt{\char`\}}% } \DeclareRobustCommand\cs[1]{\texttt{\char`\\#1}} \makeatletter \DeclareTextFontCommand\textsmaller{% \fontsize{\scaledsize{\f@size}}{\f@baselineskip}\selectfont } \newcommand\scaledsize[1]{% \ifdim #1\p@>6\p@ \ifdim #1\p@>7\p@ \ifdim #1\p@>8\p@ \ifdim #1\p@>9\p@ \ifdim #1\p@>10\p@ \ifdim #1\p@>11\p@ \ifdim #1\p@>12\p@ \ifdim #1\p@>14\p@ 14% \else 12\fi \else 11\fi \else 10\fi \else 9\fi \else 8\fi \else 7\fi \else 6\fi \else 5\fi } \makeatother \DeclareRobustCommand\ETX{\textsmaller{ETX}} \DeclareRobustCommand\PDF{\textsmaller{PDF}} % From tugboat.cls \def\thinskip{\hskip 0.16667em\relax} \def\endash{--} \def\emdash{\endash-} \makeatletter \def\d@sh#1#2{\unskip#1\thinskip#2\thinskip\ignorespaces} \def\dash{\d@sh\nobreak\endash} \def\Dash{\d@sh\nobreak\emdash} \def\ldash{\d@sh\empty{\hbox{\endash}\nobreak}} \def\rdash{\d@sh\nobreak\endash} \def\Ldash{\d@sh\empty{\hbox{\emdash}\nobreak}} \def\Rdash{\d@sh\nobreak\emdash} \newcommand{\La}% {L\kern-.36em {\setbox0\hbox{T}% \vbox to\ht0{\hbox{$\m@th$% \csname S@\f@size\endcsname \fontsize\sf@size\z@ \math@fontsfalse\selectfont A}% \vss}% }} \IfFileExists{mflogo.sty}% {\RequirePackage{mflogo}}% {\TBWarning {Package mflogo.sty not available --\MessageBreak Proceeding to emulate mflogo.sty} \DeclareRobustCommand\logofamily{% \not@math@alphabet\logofamily\relax \fontencoding{U}\fontfamily{logo}\selectfont} \DeclareTextFontCommand{\textlogo}{\logofamily} \def\MF{\textlogo{META}\-\textlogo{FONT}\@} \def\MP{\textlogo{META}\-\textlogo{POST}\@} \DeclareFontFamily{U}{logo}{} \DeclareFontShape{U}{logo}{m}{n}{% <8><9>gen*logo% 
<10><10.95><12><14.4><17.28><20.74><24.88>logo10% }{} \DeclareFontShape{U}{logo}{m}{sl}{% <8><9>gen*logosl% <10><10.95><12><14.4><17.28><20.74><24.88>logosl10% }{} \DeclareFontShape{U}{logo}{m}{it}{% <->ssub*logo/m/sl% }{}% } \makeatother \def\AllTeX{(\La\kern-.075em)\kern-.075em\TeX} \usepackage{shortvrb} \MakeShortVerb{\|} \DeclareRobustCommand\cs[1]{\texttt{\char`\\#1}} \newcommand{\TeXOmega}{Omega} \DeclareRobustCommand\eTeX{\ensuremath{\varepsilon}-\kern-.125em\TeX} \DeclareRobustCommand\package[1]{\textsf{#1}} \providecommand*{\href}[2]{#2} \newcommand*{\ctanref}[2]{\href{ftp://ftp.ctan.org/#1}{#2}} \title{Writing \ETX\ format font encoding specifications} \author{Lars Hellstr\"om} \date{2003/07/09} \begin{document} \maketitle \begin{abstract} This paper explains how one writes formal specifications of font encodings for \LaTeX\ and suggests a ratification procedure for such specifications. \end{abstract} \tableofcontents \vspace{0mm plus 35mm} \pagebreak[2] \section{Introduction} One of the many difficult problems any creator of a new typesetting system encounters is that of \emph{font construction}\Dash to create fonts that provide all information that the typesetting system needs to do its job. From the early history of \TeX, we learn that this problem is so significant that it motivated the creation of \TeX's companion and equal \MF, whose implementation proved to be an even greater scientific challenge than \TeX\ was. It is also a tell-tale sign that the \texttt{fonts} subtree of the te\TeX\ distribution is about three times as large as the \texttt{tex} subtree: fonts are important, and not at all trivial to generate. The most respected and celebrated part of font construction is \emph{font design}\Ldash the creation from practically nothing of new letter (and symbol) shapes, in pursuit of an artistic vision\Dash but it is also something very few people have the time and skill to carry through. 
More common is the task of \emph{font installation}, where one has to solve the very concrete problem of how to set up an existing font so that it can be used with \AllTeX. The subproblems in this domain range from the very technical\Ldash how to make different pieces of software ``talk'' to each other, for example making information in file format~$A$ available to program~$B$\Dash to the almost artistic\Ldash finding values for glyph metrics and kerns that will make them look good in text\Dash but these extremes tend to be clearly defined even if solving them can be hard, so they are not what will be considered here. Rather, this paper is about a class of more subtle problems that have to do with how a font is organised.

The technical name for such a ``font organisation'' is a \emph{font encoding}. In some contexts, font encodings are assumed to be mere mappings from a set of ``slots'' to a set of glyph identifiers, but in \TeX\ the concept entails much more; the various aspects are detailed in subsequent sections. For the moment, it is sufficient to observe that the role a font encoding plays in a typesetting system is that of a standard: it describes what an author can expect from a font, so that documents and macro packages can be written to work with a large class of fonts rather than with just one font family. The world of \AllTeX\ would be very different if a paper published in journal~$X$, which is printed in the commercial font~$Y$, could not use essentially the same source as the author prepared for typesetting in the free font~$Z$. Fine-tuning of a document (overfull lines, bad page breaks, etc.\@) depends on the exact font used, but it is a great convenience that one can typeset a well-coded body of text under a rather wide range of values for the layout parameters (the main font family being one of them) and still expect the result to look decent, often even good. Had font encodings not been standardised, the results might not even have been readable.
When font encodings are viewed as standards, the historical state of most \AllTeX\ font encodings becomes rather embarrassing, as they lack something as fundamental as proper specifications! The typical origin of a font encoding has been that some\-one creates a font that behaves noticeably differently from other fonts, macro packages are then created to support this new font, and in time other people create other fonts that work with the same macros. At the end of this story the new encoding exists, but it is not clear who created it, and there is probably no document that describes all aspects of the encoding. Later contributors have typically had to rely on a combination of imitation of previous works, folklore, and reverse engineering of existing software when trying to figure out what they need to provide, but the results are not always verifiable. Furthermore, errors in this area are usually silent\Ldash the classical error being that a `\textdollar' was substituted for a `\textsterling' (or vice versa)\Dash which means they can only be discovered through careful proofreading, and then only \emph{provided} that a document exercising all aspects of the font encoding exists at all. Since font encodings interact with hyphenation, exhaustive font verification through proofreading is probably beyond the capabilities of any living \TeX pert on purely linguistic grounds.

Proper specifications of font encodings make the task of font installation\Ldash and to some extent also the task of font design, as it too is subject to the technicalities of font encodings\Dash much simpler, as there is then a document that authoritatively gives all details of a font encoding. This paper goes one step further, and proposes (i)~a standard format for formal specifications of \AllTeX\ font encodings and (ii)~a process through which such specifications can be ratified as \emph{the} specification of a particular encoding.
My hope is that future \AllTeX\ font encodings will have proper specifications from the start, as this will greatly simplify making more fonts available in these encodings, and perhaps also make font designers aware of the subtler points of \AllTeX\ font design, many of which have been poorly documented.

The proposed file format for encoding specifications is a development of the \textsf{fontinst}~\cite{fontinst-pre} \ETX\ format. One reason for this choice was that it is an established format; many of those who are making fonts already use it, even if for a slightly different purpose. Another major reason is that an \ETX\ file is both a \LaTeX\ document and a processable data file; this is the same kind of bilinguality that has made the \texttt{.dtx} format so useful. Finally, the \ETX\ format makes it easy to create experimental font installations when a new encoding is being designed; \textsf{fontinst} can directly read the file, but the file can also be automatically converted to a PostScript encoding vector if that approach seems more convenient. On the other hand, there are some features of the \ETX\ format\Ldash most notably the prominent role of the glyph names\Dash that would probably have been done differently in a file format built from scratch, but retaining them is necessary for several of the advantages listed above.
% \paragraph*{Why should one make formal specifications?}
% Because the informal specifications that we have today
% are incomplete and hard to use. E.g.\ the \LaTeX\
% \meta{enc}\texttt{enc.def} files only say something about the
% characters that are accessed via commands, and even for those you
% really have to do reverse engineering to figure out what the
% encoding contains.
% To figure out what the remaining characters
% should do you have to compile what the various user manuals claim
% to work and then work backwards from that, but I don't think the
% general problem of which character tokens are allowed in input is
% thoroughly treated anywhere. On top of that, \LaTeX\ itself
% contributes some character tokens when the document is being
% typeset.\footnote{This is basically the ``a \texttt{T}$*$ encoding
% must contain the characters \ldots'' problem that was the reason
% that the \texttt{T2} encoding had to be split up.}
%
% On the other side of things there are the files which tell e.g.
% \textsf{fontinst} or \textsf{AFMtoTFM} what the target font
% encoding is. These are basically recipes which are known (?\@) to
% produce valid results, and they do usually provide more
% information about the encoding than the sources listed above, but
% they don't give much information about where the recipe can be
% modified.
% \paragraph*{Why use the \ETX\ format?}

\section{Points to keep in mind}

\subsection{Characters, glyphs, and slots}

One fundamental difference one must understand is that between characters and glyphs. A \emph{character} is a semantic entity---it carries some meaning, even if you usually have to combine several characters to make up even one word---whereas a \emph{glyph} is simply a piece of graphics. In printed text, glyphs are used to represent characters, and the first step of reading is to determine which character(s) a given glyph represents.\footnote{Some \PDF\ viewers also try to accomplish this, but in general they need extra information to do it right. The generic solution provided is to embed a \emph{ToUnicode CMap}\Ldash which is precisely a map from slots to characters\Rdash in the \PDF\ font object.} In the output, \TeX\ really deals with neither characters nor glyphs (although many of its messages speak of characters), but with \emph{slots}, which essentially are numbered positions in a font.
To \TeX, a slot is simply something which can have certain metric properties (width, height, depth, etc.\@), but to the driver which actually does the printing the slot also specifies a glyph. The same slot in two different fonts can correspond to two quite different characters. For completeness it should also be mentioned that the \emph{input} of \TeX\ is a stream of semantic entities and thus \TeX\ is dealing with characters on that side, but the input is not the subject of this paper.

\subsection{Ligatures}

In typography, a \emph{ligature} is a glyph which has been formed by joining glyphs that represent two or more characters; this joining can involve quite a lot of deformation of the original shapes. Examples of ligatures are the `fi' ligature (from `f' and `i'), the `\AE' ligature (from `A' and `E'), and the `\textit{\&}' character (from `E' and `t'), the latter two of which have evolved to become characters of their own. For those ligatures (such as `fi') that have not evolved into characters, \TeX\ has a mechanism for forming the ligature out of the characters it is composed of, under the guidance of ligature\slash kerning programs found in the font. More technically, what happens is that if the |\char| (or equivalent) for one slot is immediately followed by the |\char| (or equivalent) for another (or the same) slot, and there is a ligaturing instruction in the \texttt{\small LIGKERN} table of the current font which applies to this slot pair, then this ligaturing instruction is executed. This usually replaces the two slots in the pair with a single new slot specified by the ligaturing instruction (it could also keep one or both of the original slots, but that is less common). \TeX\ has no idea whether these replacements change the meaning of anything; it simply assumes that they do not, and it is up to the font designer to ensure that this is the case.
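As an illustration, the ligaturing instructions that turn an `f' followed by `i', `f', or `l' into a single slot can be written in the \ETX\ syntax described in Section~\ref{Sec:FontEncSpec}. The following is only a sketch: it assumes the \texttt{T1} slot number~102 for \texttt{f} and the glyph names \texttt{f}, \texttt{i}, \texttt{l}, \texttt{ff}, \texttt{fi}, and \texttt{fl}, and it uses |\ligature| because these ligatures are ordinary rather than mandatory.
\begin{quote}\begin{verbatim}
\nextslot{102}
\setslot{f}
   \Unicode{0066}{LATIN SMALL LETTER F}
   \ligature{LIG}{i}{fi}
   \ligature{LIG}{f}{ff}
   \ligature{LIG}{l}{fl}
\endsetslot
\end{verbatim}\end{quote}
When a font is generated from this, each |\ligature| line becomes an instruction in the \texttt{\small LIGKERN} program for \texttt{f}: for instance, an \texttt{f} slot immediately followed by an \texttt{i} slot is replaced by the single \texttt{fi} slot.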
Apart from forming ligatures in text, the ligaturing mechanism of \TeX\ is traditionally also employed for another task which is much more problematic. Ligatures are used to produce certain characters which are not part of visible ASCII---the most common are the endash (typed as |--|) and the emdash (typed as |---|). This is a problem because it violates \TeX's assumption that the meaning is unchanged; the classical example appears in the \texttt{OT2} encoding, where the Unicode character \texttt{U+0446} (\textsc{cyrillic small letter tse}) could be typed as |ts|, whilst |t| and |s| on their own produced the Unicode characters \texttt{U+0442} (\textsc{cyrillic small letter te}) and \texttt{U+0441} (\textsc{cyrillic small letter es}) respectively. \TeX's hyphenation mechanism can however decompose ligatures, so it sometimes happened that the \textsc{tse} was hyphenated as \textsc{te}-\textsc{es}, which is quite different from what was intended. Since this is such an obvious disadvantage, the use of ligatures for forming non-English letters quickly disappeared after 8-bit input encodings became available. The practice still remains in use for punctuation, however, and the font designer must be aware of this.

For many font encodings there is a set of ligatures which must be present and which replace two or more characters by a single, different character. These ligatures are called \emph{mandatory ligatures} in this paper. The use of mandatory ligatures in new font encodings is strongly discouraged, for a number of reasons. The main problem is that they create unhealthy dependencies between input and output encoding, whereas these should ideally be totally independent. Using ligatures in this way complicates the internal representation of text, and it also makes it much harder to typeset text where those ligatures are not wanted (such as verbatim text).
Furthermore it creates problems with kerning, since the ``ligature'' has not yet been formed when a kern to the left of it is inserted. Finally, a much better solution (when it is available) is to use an \TeXOmega\ translation process (see~\cite[Sec.~8--11]{Omega-doc}), since such processes \emph{are} independent of the font, can be combined with each other, and can easily handle ``abbreviations'' much more complicated than anything ligatures can deal with.

\subsection{Output stages}

On its way out of \LaTeX\ towards the printed text, a character passes through a number of stages. The following five seem to cover what is relevant for the present discussion:
\begin{enumerate}
  \item \emph{\LaTeX\ Internal Character Representation} (LICR); see~\cite{LaTeXCompanion}, Section~7.11 for a full description. At this point the character is a character token (e.g.~|a|), a text command (e.g.~|\ss|), or a combination (e.g.~|\H{o}|).
  \item \emph{Horizontal material;} this is what the character has become on its way from \TeX's mouth to its stomach. For most characters this is equivalent to a single |\char| command (e.g.\ |a| is equivalent to |\char|\,|97|), but some require more than one, some are combined using the |\accent| and |\char| commands, some involve rules and\slash or kerns, and some are built using boxes that arbitrarily combine the above elements.
  \item \emph{DVI commands;} these are the DVI file commands that produce the printed representation of the character.
  \item \emph{Printed text;} this is the graphical representation of the character, e.g.\ as ink on paper or as a pattern on a computer screen. Here the text consists of glyphs.
  \item \emph{Interpreted text;} this is essentially printed text modulo equivalence of interpretation, hence the text doesn't really reach this stage until someone reads it. Here the text consists of characters.
\end{enumerate}
In theory there is a universal mapping from LICR to interpreted text, but various technical restrictions make it impossible to support the entire mapping simultaneously. A \LaTeX\ encoding selects a restriction of this mapping to a limited set which will be ``well supported'' (meaning that kerning and the like between characters in the set work), whereas elements outside this set can at best be supported through temporary encoding changes. The encoding also specifies a decomposition of the mapping into one part which maps LICR to horizontal material and one part which maps horizontal material to interpreted text. The first part is realized by the text command definitions usually found in the \meta{enc}\texttt{enc.def} file for the encoding. The second part is the font encoding, the specification of which is the topic of this paper. It is also worth noticing that an actual font is a mapping of horizontal material to printed text.

An alternative decomposition of the mapping from LICR to interpreted text would be at the DVI command level, but even though this decomposition is realized in most \TeX\ implementations, it has very little relevance for the discussion of encodings. The main reason for this is that it depends not only on the encoding of a font, but also on its metrics. Furthermore, in e.g.\ pdf\TeX\ there need not be a DVI command level at all.

\subsection{Hyphenation}

There are strong connections between font encoding and hyphenation because \TeX's hyphenation mechanism operates on horizontal material; more precisely, the hyphenation mechanism only works on pieces of horizontal material that are equivalent to sequences of |\char| commands. This implies that hyphenation patterns, as selected via the |\language| parameter, are not only for a specific language; they are also for a specific font encoding.
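The decomposition, and the way it interacts with hyphenation, can be illustrated by how the standard \LaTeX\ encoding definition files map the LICR |\H{o}| to horizontal material. The declarations below are modelled on those in \texttt{t1enc.def} and \texttt{ot1enc.def}; this is a sketch, and the slot numbers should be checked against the actual files rather than taken as authoritative.
\begin{quote}\begin{verbatim}
% T1: there is an accent slot and also a precomposed letter,
% so \H{o} becomes the single slot 174 (a plain \char)
\DeclareTextAccent{\H}{T1}{5}
\DeclareTextComposite{\H}{T1}{o}{174}

% OT1: only the accent exists, so \H{o} becomes
% \accent125 o, which is not a simple \char sequence
\DeclareTextAccent{\H}{OT1}{125}
\end{verbatim}\end{quote}
Under \texttt{T1} a word containing `\H{o}' thus remains a sequence of |\char| commands which the hyphenation mechanism can work on, whereas under \texttt{OT1} the |\accent| construction breaks up that sequence, so \TeX\ cannot treat the word as a single hyphenatable unit.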
The hyphenation mechanism uses the |\lccode| values to distinguish between three types of slots: lower case letters (|\lccode|\(\,n = n\)), upper case letters (|\lccode|\(\,n \notin \{0,n\}\)), and non-letters (|\lccode|\(\,n = 0\)); only the first two types can be part of a hyphenatable word, and only lower case letters are needed in the hyphenation patterns. This does however place severe restrictions on how letters can be placed in a text font, because \TeX\ uses the same |\lccode| values for all text in a paragraph and therefore these values cannot be changed whenever the encoding changes. In \LaTeX\ the |\lccode| table is not allowed to change at all, and consequently all text font encodings must work using the standard set of |\lccode| values. In \eTeX\ each set of hyphenation patterns can have its own set of |\lccode| values for hyphenation, so the problem isn't as severe there. The hyphenation mechanism of \TeXOmega\ should become completely independent of the font encoding, although the last time I checked it was still operating on material encoded according to a font encoding.

\subsection{Production and specification \ETX\ files}

Finally, it is worth pointing out the difference between an \ETX\ file created for the specification of a font encoding and one created for actually producing fonts with this encoding. They are usually not the same. Specification \ETX s may certainly be of direct use in the production of fonts---especially experimental fonts produced as part of the work on a new encoding---but they are usually not ideal for the purpose. In particular, there is often a need to switch between alternative names for a glyph to accommodate what is actually in the fonts, but such tricks are undesirable complications in a specification. On the other hand, a production \ETX\ file has little need for verbose comments, whereas they are an advantage in a specification \ETX\ file.
Therefore one shouldn't be surprised if there are two \ETX\ files for a specific encoding: a specification version and a production version. If both might need to be in the same directory then one should, as a rule of thumb, include `\texttt{spec}' in the name of the specification version.

\section{Font encoding specifications}
\label{Sec:FontEncSpec}

\subsection{Basic principles}

Most features of the font encoding are categorized as either \emph{mandatory} or \emph{ordinary}. The mandatory features are what macros may rely on, whereas the ordinary ones are simply something which fonts with this encoding normally provide. Font designers may choose to provide other features than the ordinary ones, but are recommended to provide the ordinary features to the extent that the available resources permit.

Many internal references in the specification are in the form of \emph{glyph names}, and the choice of these is a slightly tricky matter. From the point of view of formal specification, the choice can be completely arbitrary, but from the point of view of practical usefulness it most likely is not. One of the main advantages of the \ETX\ format for specifications is that such specifications can also be used to make experimental implementations, but this requires that the glyph names in the specification are the same as those used in the fonts from which the experimental implementation should be built. Yet another aspect is that the glyph names are best chosen to be the ones one can expect to find in actual fonts, as that will make things easier for other people who want to make non-experimental implementations later. For this last purpose, a good reference is Adobe's technical note on Unicode and glyph names~\cite{unicodesign}.
For most common glyphs, \cite{unicodesign} ends up recommending that one should follow the Adobe glyph list~\cite{AGL}, which however has the peculiar trait of recommending names of the form \texttt{afii}\textit{ddddd} (rather than the Unicode-based alternative \texttt{uni}\textit{xxxx}) for most non-Latin glyphs. This is somewhat put in perspective by~\cite{ATN5013}.

\subsection{Slot assignments}

The purpose of the slot assignments is to specify for each slot which character or characters it is mapped to. That one slot is mapped to several characters is an unfortunate, but not uncommon, reality in many encodings, as limitations in font size have often encouraged identifications of two characters which are almost the same. It should be avoided in new encodings.

Slot assignments are done using the |\nextslot| command and a |\setslot| \dots\ |\endsetslot| construction as follows:
\begin{quote}
  |\nextslot|\marg{slot number}\\*
  |\setslot|\marg{glyph name}\\*
  \mbox{\quad}\meta{slot commands}\\*
  |\endsetslot|
\end{quote}
A typical example of this is
\begin{quote}\begin{verbatim}
\nextslot{65}
\setslot{A}
   \Unicode{0041}{LATIN CAPITAL LETTER A}
\endsetslot
\end{verbatim}\end{quote}
which gets typeset as
\begin{quote}
  \textbf{Slot 65 `\texttt{A}'}\\*
  Unicode character \texttt{U+0041}, \textsc{latin capital letter a}.
\end{quote}
The |\nextslot| command does not typeset anything; it simply stores the slot number in a counter, for later use by |\setslot|. The |\endsetslot| command increments this counter by one. Hence the |\nextslot| command is unnecessary between |\setslot|s for consecutive slots. Besides |\nextslot|, there is also a command |\skipslots| which increments the slot number counter by a specified amount. The argument of both |\nextslot| and |\skipslots| can be arbitrary \package{fontinst} integer expressions (see~\cite{fontinst-man}).
All \TeX\ \meta{number}s that survive full expansion are valid \package{fontinst} integer expressions, but for example |`\~| isn't, as |\~| is a macro which breaks under the full expansion that the expression undergoes before it is typeset. These cases can however be fixed by preceding the \TeX\ \meta{number} with |\number|, as |\number`\~| survives full expansion by expanding to |126|.

The main duty of the \meta{slot commands} is to specify the target character (or characters) for this slot. The simplest way of doing this is to use the |\Unicode| command, which has the syntax
\begin{quote}
  |\Unicode|\marg{code point}\marg{name}
\end{quote}
The \meta{code point} is the number of the character (in hexadecimal notation, usually a four-digit number) and the \meta{name} is the name of the character. Case is insignificant in these arguments.

If a slot corresponds to a string of characters rather than to a single character, then one uses the |\charseq| command, which has the syntax
\begin{quote}
  |\charseq|\marg{\cs{Unicode} commands}
\end{quote}
e.g.
\begin{quote}\begin{verbatim}
\nextslot{30}
\setslot{ffi}
   \charseq{
      \Unicode{0066}{LATIN SMALL LETTER F}
      \Unicode{0066}{LATIN SMALL LETTER F}
      \Unicode{0069}{LATIN SMALL LETTER I}
   }
\endsetslot
\end{verbatim}\end{quote}
Several |\Unicode| commands not in the argument of a |\charseq| instead mean that each of the listed characters is a valid interpretation of the slot.

If a character cannot be specified in terms of Unicode code points, then the specification should simply be a description in text which identifies the character. Such descriptions are written using the |\comment| command
\begin{quote}
  |\comment|\marg{text}
\end{quote}
It is worth noticing that the \meta{text} is technically only an argument of |\comment| when the program processing the \ETX\ file is ignoring |\comment| commands. This means that |\verb| and similar catcode-changing commands \emph{can} be used in the \meta{text}. The |\par| command, on the other hand, is not allowed in the \meta{text}.
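As a sketch of a slot with more than one acceptable interpretation, consider a capital Delta glyph that is commonly used both as a Greek letter and as the increment operator. The slot number~1 and the glyph name \texttt{Delta} follow the traditional \texttt{OT1} layout, but the assignment below is only an illustration, not a quotation from any ratified specification.
\begin{quote}\begin{verbatim}
\nextslot{1}
\setslot{Delta}
   \Unicode{0394}{GREEK CAPITAL LETTER DELTA}
   \Unicode{2206}{INCREMENT}
   \comment{The same glyph is used both for the Greek letter
      and for the mathematical increment operator.}
\endsetslot
\end{verbatim}\end{quote}
Since the two |\Unicode| commands are not inside a |\charseq|, they list alternative interpretations of the slot rather than a sequence of characters, and the |\comment| records why the identification was made.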
The |\comment| command should also be used for any further explanation of, or commentary on, the character used for the slot, if the exposition seems to need it. There can be any number of |\comment| commands in the \meta{slot commands}.

\subsection{Ligatures}

There are three classes of ligatures in the font encoding specifications: mandatory, ordinary, and odd. Mandatory ligatures must be present in any font which complies with the encoding, whereas ordinary and odd ligatures need not be. No clear distinction can be made between ordinary and odd ligatures, but a non-mandatory ligature should be categorized as ordinary if it makes sense for the majority of users, and as odd otherwise. Hence the `fi' ligature is categorized as ordinary in the \texttt{T1} encoding (although it makes no sense in Turkish), whereas the `ij' ligature is odd.

In the \ETX\ format, a ligature is specified using one of the slot commands
\begin{quote}
  |\Ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\oddligature|\marg{note}\marg{ligtype}\marg{right}\marg{new}
\end{quote}
|\Ligature| is used for mandatory ligatures, |\ligature| is used for ordinary ligatures, and |\oddligature| is used for odd ligatures. The \meta{right} and \meta{new} arguments are names of the glyphs assigned to the slots involved in this ligature. The \meta{right} specifies the right part of the slot pair affected by the ligature, whereas the left part is the slot of the |\setslot| \dots\ |\endsetslot| construction in which the ligaturing command is placed. The \meta{new} specifies a new slot which will be inserted by the ligaturing instruction. The \meta{ligtype} is the actual ligaturing instruction that will be used; it must be |LIG|, |/LIG|, |/LIG>|, |LIG/|, |LIG/>|, |/LIG/|, |/LIG/>|, or |/LIG/>>|.
The slashes specify retention of the left or right original character; the |>| signs specify passing over that many slots in the result without further ligature processing. \meta{note}, finally, is a piece of text which explains when the odd ligature may be appropriate. It is typeset as a footnote.

As an example of ligatures we find the following in the specification of the \texttt{T1} encoding:
\begin{quote}
  |\nextslot{33}|\\
  |\setslot{exclam}|\\
  |   \Unicode{0021}{EXCLAMATION MARK}|\\
  |   \Ligature{LIG}{quoteleft}{exclamdown}|\\
  |\endsetslot|
\end{quote}
It is typeset as
\begin{quote}
  \textbf{Slot 33 `\texttt{exclam}'}\\*
  Unicode character \texttt{U+0021}, \textsc{exclamation mark}.\\*
  \textbf{Mandatory ligature}
  \texttt{exclam}${}*{}$\texttt{quoteleft}${}\rightarrow {}$\texttt{exclamdown}
\end{quote}
With other \meta{ligtype}s there may be more names listed on the right hand side, and possibly a `$\lfloor$' symbol showing the position at which ligature processing will start afterwards.

\subsection{Math font specialities}

There are numerous technicalities which are special to math fonts, but only a few of them are exhibited in \ETX\ files.\footnote{For an overview of the subject, see for example Vieth~\cite{Vieth2001}.} Most of these have to do with the \TeX\ mechanisms that find sufficiently large characters for commands like |\left|, |\sqrt|, and |\widetilde|.

The first mechanism for this is that a character in a font can in effect say ``If I'm too small, then try character \dots\ instead''. This is expressed in an \ETX\ file using the |\nextlarger| command, which has the syntax
\begin{quote}
  |\nextlarger|\marg{glyph name}
\end{quote}
The second mechanism constructs a sufficiently large character from smaller pieces; this is known as a `varchar' or `extensible character'.
This is expressed in an \ETX\ file using an ``extensible recipe'', the syntax for which is
\begin{quote}
  |\varchar| \meta{varchar commands} |\endvarchar|
\end{quote}
where each \meta{varchar command} is one of
\begin{quote}
  |\varrep|\marg{glyph name}\\
  |\vartop|\marg{glyph name}\\
  |\varmid|\marg{glyph name}\\
  |\varbot|\marg{glyph name}
\end{quote}
There can be at most one of each, and their order is irrelevant. The most important is the |\varrep| command, as that is the part which is repeated until the character is sufficiently large. The |\vartop|, |\varmid|, and |\varbot| commands specify other parts which should be put at the top, middle, and bottom of the extensible character respectively. Not all extensible recipes use all of these, however. As an example, here is how a very large left brace is constructed:
\begin{center}
  \begin{tabular}{>{%
      \fontencoding{OMX}\fontfamily{cmex}\selectfont
      $\vcenter\bgroup\hbox\bgroup
    }l<{\egroup\egroup$} l}
    \char"38& For |\vartop{bracelefttp}|\\
    \char"3E& For |\varrep{braceex}|\\
    \char"3C& For |\varmid{braceleftmid}|\\
    \char"3E& Again for |\varrep{braceex}|\\
    \char"3A& For |\varbot{braceleftbt}|
  \end{tabular}
\end{center}
Both |\nextlarger| and |\varchar| commands are like |\ligature| in that they describe ordinary features of the encoding; they appear in a specification \ETX\ file mainly to explain the purpose of some ordinary character. There is no such thing as a mandatory |\nextlarger| or |\varchar|, but varchars are occasionally used to a similar effect. In these cases, the character generated by the extensible recipe is something quite different from what a |\char| for that slot would produce. Thus for the slot to produce the expected result it must be referenced using a |\delimiter| or |\radical| primitive, since those are the only ones which make use of the extensible recipe.
The effect is that the slot has a \emph{semimandatory} assignment; the
result of |\char| is unspecified (as for a slot with an ordinary
assignment), but the result for a large delimiter or radical is not (as
for a slot with a mandatory assignment). Thus the specifications of
some math font encodings have an extra section ``Semimandatory
characters'' between the mandatory and ordinary character sections. In
that section for the \texttt{OMX} encoding we find for example
\begin{quote}\begin{verbatim}
\nextslot{60}
\setslot{braceleftmid}
   \Unicode{2016}{DOUBLE VERTICAL LINE}
   \comment{This is the large size of the |\Arrowvert|
      delimiter, a glyphic variation on |\Vert|. The
      \texttt{braceleftmid} glyph ordinarily placed in this
      slot must not be too tall, or else the extensible
      recipe actually producing the character might
      sometimes not be used.}
   \varchar
      \varrep{arrowvertex}
   \endvarchar
\endsetslot
\end{verbatim}\end{quote}
which is typeset as
\begin{quote}
  \textbf{Slot 60 `\texttt{braceleftmid}'}\\*
  Unicode character \texttt{U+2016}, \textsc{double vertical line}.\\
  This is the large size of the |\Arrowvert| delimiter, a glyphic
  variation on |\Vert|. The \texttt{braceleftmid} glyph ordinarily
  placed in this slot must not be too tall, or else the extensible
  recipe actually producing the character might sometimes not be
  used.\\
  \textbf{Extensible glyph:}\\*
  \textbf{Repeated} \texttt{arrowvertex}
\end{quote}

\subsection{Fontdimens}

Each \TeX\ font contains a list of fontdimens, numbered from $1$ and
up, which are accessible via the |\fontdimen| \TeX\ primitive. Quite a
few of them are moreover used implicitly by \TeX\ and therefore cannot
be left out even if they are totally irrelevant, but as one can always
include some extra fontdimens in a font---the only bounds on how many
fontdimens there may be are the general bound on the size of a TFM file
and the amount of font memory \TeX\ has available---this is usually not
a problem.

The reason fontdimens are part of font encoding specifications is that
the meaning of e.g.\ |\fontdimen|\,|8| varies between different fonts
depending on their encoding; thus the encoding specification must
define the quantity stored in each |\fontdimen| parameter. This is done
using the |\setfontdimen| command, which has the syntax
\begin{quote}
  |\setfontdimen|\marg{number}\marg{name}
\end{quote}
The \meta{number} is the fontdimen number (as a sequence of decimal
digits where the first digit isn't zero) and the \meta{name} is a
symbolic name for the quantity. The standard list of symbolic names for
fontdimen quantities appears below; the listed quantities should always
be described using the names in this list. Encoding specifications that
employ other quantities as fontdimens should include definitions of
these quantities. Those quantities that are defined as ``Formula
parameter \dots'' have to do with how mathematical formulae are
rendered and are usually much too complicated to explain here. For
exact definitions of these parameters, the reader is referred to
Appendix~G of \textit{The \TeX book}~\cite{TeXbook}.
\begin{list}{}{%
  \setlength\labelwidth{0pt}%
  \setlength\itemindent{-\leftmargin}%
  \def\makelabel#1{\hspace{\labelsep}\normalfont\itshape #1}%
  \setlength\itemsep{0.5\itemsep}%
  \setlength\parsep{0.5\parsep}%
}
\item[acccapheight] The height of accented full capitals.
\item[ascender] The height of lower case letters with ascenders.
\item[axisheight] Formula parameter $\sigma_{22}$.
\item[baselineskip] The font designer's recommendation for natural
  length of the \TeX\ parameter |\baselineskip|.
\item[bigopspacing1] Formula parameter $\xi_{9}$.
\item[bigopspacing2] Formula parameter $\xi_{10}$.
\item[bigopspacing3] Formula parameter $\xi_{11}$.
\item[bigopspacing4] Formula parameter $\xi_{12}$.
\item[bigopspacing5] Formula parameter $\xi_{13}$.
\item[capheight] The height of full capitals.
\item[defaultrulethickness] Formula parameter $\xi_{8}$.
\item[delim1] Formula parameter $\sigma_{20}$. \item[delim2] Formula parameter $\sigma_{21}$. \item[denom1] Formula parameter $\sigma_{11}$. \item[denom2] Formula parameter $\sigma_{12}$. \item[descender] The depth of lower case letters with descenders. \item[digitwidth] The median width of the digits in the font. \item[extraspace] The natural width of extra interword glue at the end of a sentence. \TeX\ implicitly uses this parameter if |\spacefactor| is $2000$ or more and |\xspaceskip| is zero. \item[interword] The natural width of interword glue (spaces). \TeX\ implicitly uses this parameter unless |\spaceskip| is nonzero. \item[italicslant] The slant per point of the font. Unlike all other fontdimens, it is not proportional to the font size. \item[maxdepth] The maximal depth over all slots in the font. \item[maxheight] The maximal height over all slots in the font. \item[num1] Formula parameter $\sigma_{8}$. \item[num2] Formula parameter $\sigma_{9}$. \item[num3] Formula parameter $\sigma_{10}$. \item[quad] The quad width of the font, normally approximately equal to the font size and\slash or the width of an `M'. Also implicitly available as the length unit |em| and used for determining the size of the length unit |mu|. \item[shrinkword] The (finite) shrink component of interword glue (spaces). \TeX\ implicitly uses this parameter unless |\spaceskip| is nonzero. \item[stretchword] The (finite) stretch component of interword glue (spaces). \TeX\ implicitly uses this parameter unless |\spaceskip| is nonzero. \item[sub1] Formula parameter $\sigma_{16}$. \item[sub2] Formula parameter $\sigma_{17}$. \item[subdrop] Formula parameter $\sigma_{19}$. \item[sup1] Formula parameter $\sigma_{13}$. \item[sup2] Formula parameter $\sigma_{14}$. \item[sup3] Formula parameter $\sigma_{15}$. \item[supdrop] Formula parameter $\sigma_{18}$. \item[verticalstem] The dominant width of vertical stems. This quantity is meant to be used as a measure of how ``dark'' the font is. 
\item[xheight] The x-height (height of lower case letters without
  ascenders). Also implicitly available as the length unit |ex|.
\end{list}

\subsection{The codingscheme}

The final encoding-dependent piece of information in a \TeX\ font is
the codingscheme, which is essentially a string declaring what encoding
the font has. This information is currently only used by programs that
convert the information in a \TeX\ font to some other format, and these
use it to identify the glyphs in the font. Therefore this string should
be chosen so that the contents of the slots in the font can be
positively identified. Observe that the encoding specification by
itself does not provide enough information for this, since there are
usually a couple of slots that do not contain mandatory characters. On
the other hand, it is not a problem in this context if the font leaves
some of the slots (even mandatory ones) empty, since that is easily
detected anyway. The only problem is with fonts where the slots are
assigned to other characters than the ones specified in the encoding.

For that reason, it is appropriate to assign two codingscheme strings
to each encoding. The main codingscheme is for fonts where all slots
(mandatory and ordinary alike) have been assigned according to the
specification or have been left empty. The variant codingscheme is for
fonts where some ordinary slots have been assigned other characters
than the ones listed in the specification, but where the mandatory
slots are still assigned according to the specification or are left
empty. The font encoding specification should give the main
codingscheme name, whereas the variant codingscheme name could be
formed by adding \verb*| VARIANT| to the main codingscheme name.

Technically the codingscheme is specified by setting the
\texttt{codingscheme} string variable. This has the syntax
\begin{quote}
  |\setstr{codingscheme}|\marg{codingscheme name}
\end{quote}
e.g.
\begin{quote}
  |\setstr{codingscheme}{EXTENDED TEX FONT ENCODING - LATIN}|
\end{quote}
which is typeset as
\begin{quote}
  \textbf{Default} s(\texttt{codingscheme}) =
  \verb*|EXTENDED TEX FONT ENCODING - LATIN|
\end{quote}
A codingscheme name may be at most 40 characters long and may not
contain parentheses. If the entire \verb*| VARIANT| cannot be suffixed
to a main name because the result becomes too long (as in the above
example) then use the first 40 characters of the result; for the above
example this gives \verb*|EXTENDED TEX FONT ENCODING - LATIN VARIA|.

\subsection{Overall document structure}
\label{Ssec:Structure}

The overall structure of a font encoding specification should be
roughly the following:
\begin{quote}
  |\relax|\\
  |\documentclass[twocolumn]{article}|\\
  |\usepackage[specification]{fontdoc}|\\
  \meta{preamble}\\
  |\begin{document}|\\
  \meta{title}\\
  \meta{manifest}\\
  |\encoding|\\
  \meta{body}\\
  |\endencoding|\\
  \meta{discussion}\\
  \meta{change history}\\
  \meta{bibliography}\\
  |\end{document}|
\end{quote}
The commands described in the preceding subsections must all go in the
\meta{body} part of the document, as that is the only part of the file
which actually gets processed as a data file. The part before
|\encoding| is skipped and the part after |\endencoding| is never even
input, so whatever appears there is only part of the \LaTeX\ document.
For the purposes of processing as a data file, the important markers in
the file are the |\relax|, the |\encoding|, and the |\endencoding|
commands.

The \meta{title} is the usual |\maketitle| material (and the like). The
person or persons who appear as author(s) are elsewhere in this paper
described as the \emph{encoding proposers}. The \meta{title} should
also give the date when the specification was last changed.

The \meta{manifest} is an important, although usually pretty short,
part of the specification. It is a piece of text which explains the
purpose of the encoding (in particular what it can be used for) and the
basic ideas (if any) which have been used in its construction.
It is often best marked up as an abstract.

The \meta{discussion} is the place for any longer comments on the
encoding, such as analyses of different implementations, comparisons
with other encodings, etc. This is also the place to explain any more
general structures in the encoding, such as the arrow kit in the
proposed \texttt{MS2} encoding~\cite{ClasenVieth}. In cases where the
specification is mainly a formulation of what is already an established
standard, the \meta{discussion} is often rather short since the
relevant discussion has already been published elsewhere, but it is
nevertheless a service to the reader to include this information.
References to the original documents should always be given. It might
be convenient to include an FAQ section at the end of the discussion.
This is particularly suited for explaining things where one has to
search for a while and consult the references to find the relevant
information.

The \meta{change history} documents how the specification has changed
over time. It should preferably be detailed, as each detail in an
encoding is important, but one should not be surprised if it is
nevertheless rather short, simply because there have not been that
many changes.

The \meta{bibliography} is an important part of the specification. It
should at the very least include all the sources which have been used
in compiling the encoding specification, regardless of whether they are
printed, available on the net, merely ``personal communication'', or
something else. It is also a service to the reader to include in the
bibliography some more general references for related matters.

The \meta{preamble} is just a normal \LaTeX\ preamble and there are no
restrictions on defining new commands in it, although use of such
commands in the \meta{body} part is subject to the same restrictions as
use of any general \LaTeX\ command.
The preamble should however \emph{not} load any packages that are not
part of the required suite of \LaTeX\ packages, as that may prevent
users who do not have these packages from typesetting the
specification. Likewise, the specification should \emph{not} require
that some special font is available. Glyph examples for characters are
usually better referenced via Unicode character charts than via special
fonts.

An exception to this rule about packages is that the specification must
load the \package{fontdoc} package, as shown in the outline above,
since that defines the |\setslot| etc.\ commands that should appear in
the \meta{body}. This should not be a problem, as the \package{fontdoc}
package can be kept in the same directory as the collection of
encoding specifications (see below). The \texttt{specification} option
should be passed to the package to let it know that the file being
processed is an encoding specification; otherwise |\Ligature| and
|\ligature| will get the same formatting, for one. It is not necessary
to use the \package{article} document class, nor to pass it the
\texttt{twocolumn} option, but it is customary to do so. In principle
any other document class within required \LaTeX\ will do just as well.

If you are convinced that using some non-required package significantly
improves the specification, then try writing the code so that it loads
the package only if it is available and provide some kind of fallback
definition for sites where it is not. E.g.\ the \package{url} package
could be loaded as
\begin{verbatim}
\IfFileExists{url.sty}{\usepackage{url}}{}
\providecommand\url{\verb}
\end{verbatim}
The |\url| command defined by this is not equivalent to the command
defined by the \package{url} package, but it can serve fairly well
(with a couple of extra overfull lines as the only ill effect) if its
use is somewhat restricted; in particular, its argument must then be
delimited as for \cs{verb}, by some character other than a brace, which
is a form the \package{url} package also accepts.
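
To make the outline at the start of this subsection more concrete, here
is a sketch of a minimal specification file that follows it. It is an
illustration only, not an excerpt from any existing specification: the
encoding name \texttt{XYZ}, the author, the date, and the section
headings are hypothetical placeholders, and the \meta{body} is left as
a comment since its syntax is described in the next subsection.
\begin{quote}\begin{verbatim}
\relax
\documentclass[twocolumn]{article}
\usepackage[specification]{fontdoc}
% preamble: required LaTeX packages only
\begin{document}
\title{The \texttt{XYZ} font encoding} % hypothetical
\author{A. N. Proposer}                % placeholder
\date{2003/07/01}  % date of last change (placeholder)
\maketitle
\begin{abstract}
  % manifest: what the encoding is for and the
  % ideas used in its construction
\end{abstract}
\encoding
% body: encoding commands only
\endencoding
\section{Discussion}      % hypothetical heading
\section{Change history}  % hypothetical heading
% bibliography, e.g. a thebibliography environment
\end{document}
\end{verbatim}\end{quote}
Marking up the manifest as an abstract follows the suggestion made
above; any other markup would do equally well.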

Finally, a technical restriction on the \meta{preamble}, \meta{title},
and \meta{manifest} is that they must not contain any mismatched
|\if|s (of any type) or |\fi|s, as \TeX\ conditionals will be used for
skipping those parts of the file when it is processed as a data file.
If the definition of some macro includes mismatched |\if|s or |\fi|s
(this will probably occur only rarely) then include some extra code so
that they do match.

% All technical parts of the encoding specification (slot assignments,
% fontdimens, etc.\@) have to be in the \meta{encoding commands} part.
% The other parts are suitably used for longer commentary, such as the
% manifest (see below), revision history, and bibliography.
%
% When the file is being typeset as a \LaTeX\ document there is nothing
% special going on. The |\encoding| and |\endencoding| commands may set
% some internal variables, but otherwise they do very little. When the
% file is being read by \package{fontinst}, things are quite different.
% Everything between the initial |\relax| and |\encoding| is skipped,
% and the file is not read further than to the |\endencoding|. Hence
% the \meta{preamble}, \meta{\LaTeX\ text 1}, and \meta{\LaTeX\ text 2}
% can contain pretty much anything (with a few exceptions) which is
% legal in a \LaTeX\ document.

\subsection{Encoding specification body syntax}

The \meta{body} part of an encoding specification must comply with a
much stricter syntax than the rest of the file. The \meta{body} is a
sequence of \meta{encoding command}s, each of which should be one of
the following:
\begin{quote}
  |\setslot|\marg{glyph name} \meta{slot commands} |\endsetslot|\\
  |\nextslot|\marg{number}\\
  |\skipslots|\marg{number}\\
  |\setfontdimen|\marg{number}\marg{name}\\
  |\setstr{codingscheme}|\marg{codingscheme name}\\
  |\needsfontinstversion|\marg{version number}
\end{quote}
The |\needsfontinstversion| command is usually placed immediately after
the |\encoding| command.
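
The following fragment sketches how a few of these commands might
combine at the start of a \meta{body}. It is a hypothetical example,
not an excerpt from a real specification: the codingscheme name is made
up, and leaving slot~35 unassigned with |\skipslots| is done purely for
illustration (the actual \texttt{T1} encoding assigns that slot). The
slot numbers in the comments assume the \package{fontinst} convention
that |\nextslot| sets the current slot number, that each |\setslot|
uses the current number and advances it by one at |\endsetslot|, and
that |\skipslots|\marg{number} advances it by \meta{number} without
assigning anything.
\begin{quote}\begin{verbatim}
\encoding
\needsfontinstversion{1.918}
\setstr{codingscheme}{HYPOTHETICAL ENCODING}
\setfontdimen{5}{xheight}  % TeX's unit ex
\setfontdimen{6}{quad}     % TeX's unit em
\nextslot{33}
\setslot{exclam}           % slot 33
   \Unicode{0021}{EXCLAMATION MARK}
   \Ligature{LIG}{quoteleft}{exclamdown}
\endsetslot
\setslot{quotedbl}         % slot 34
   \Unicode{0022}{QUOTATION MARK}
\endsetslot
\skipslots{1}              % slot 35 left unassigned
\setslot{dollar}           % slot 36
   \Unicode{0024}{DOLLAR SIGN}
\endsetslot
\endencoding
\end{verbatim}\end{quote}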
The \meta{version number} must be at least |1.918| for many of the
features described in this paper to be available, and at least |1.928|
if the |\charseq| command is used.

The \meta{slot commands} are likewise a sequence of \meta{slot command}s,
each of which should be one of the following:
\begin{quote}
  |\Unicode|\marg{code point}\marg{name}\\
  |\charseq|\marg{\cs{Unicode} commands}\\
  |\comment|\marg{text}\\
  |\Ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\ligature|\marg{ligtype}\marg{right}\marg{new}\\
  |\oddligature|\marg{note}\marg{ligtype}\marg{right}\marg{new}\\
  |\nextlarger|\marg{glyph name}\\
  |\varchar| \meta{varchar commands} |\endvarchar|
\end{quote}
where \meta{varchar commands} similarly is a sequence of \meta{varchar command}s,
each of which should be one of the following:
\begin{quote}
  |\varrep|\marg{glyph name}\\
  |\vartop|\marg{glyph name}\\
  |\varmid|\marg{glyph name}\\
  |\varbot|\marg{glyph name}
\end{quote}
Finally, one can include any number of \meta{comment command}s between
any two encoding, slot, or varchar commands. The comment commands are
\begin{quote}
  |\begincomment| \meta{\LaTeX\ text} |\endcomment|\\
  |\label|\marg{reference label}
\end{quote}
The \meta{\LaTeX\ text} can be pretty much any \LaTeX\ code that can
appear in conditional text. (|\begincomment| is either |\iffalse| or
|\iftrue| depending on whether the encoding specification is processed
as a data file or typeset as a \LaTeX\ document respectively.
|\endcomment| is always |\fi|.) The |\label| command is just the normal
\LaTeX\ |\label| command; when it is used among the \meta{slot commands}
of a slot, it references that particular slot (by number and glyph
name).

The full syntax of the \ETX\ format can be found in the
\package{fontinst} manual~\cite{fontinst-man}, but font encoding
specifications only need a subset of that.

\subsection{Additional \package{fontdoc} features}

There is an ``in comment paragraph'' form |\textunicode| of the
|\Unicode| command.
Both commands have the same syntax, but |\textunicode| is only allowed
in ``comment'' contexts. A typical use of |\textunicode| is
\begin{quote}
  |\comment{An |\dots\\
  \quad\dots| this is \textunicode{2012}{FIGURE DASH}; in |\dots\\
  |}|
\end{quote}
which is typeset as
\begin{quote}
  An \dots\ this is \texttt{U+2012} (\textsc{figure dash}); in \dots
\end{quote}
The \package{fontdoc} package inputs a configuration file
\texttt{fontdoc.cfg} if that exists. This can be used to pass
additional options to the package. Currently the only options for which
this is of interest are \texttt{hypertex} and \texttt{pdftex}, which
hyperlink each \texttt{U+}\dots\ generated by |\Unicode| or
|\textunicode| (using Hyper\TeX\ or pdf\TeX\ conventions\footnote{One
could just as well do the same thing using some other convention if a
suitable definition of \cs{FD@codepoint} is included in
\texttt{fontdoc.cfg}. See the \package{fontinst}
sources~\cite{fontinst-pre} for more details.} respectively) to a
corresponding glyph image on the Unicode consortium website. To use
this feature one should put the line
\begin{quote}
  |\ExecuteOptions{hypertex}|
\end{quote}
or
\begin{quote}
  |\ExecuteOptions{pdftex}|
\end{quote}
in the \texttt{fontdoc.cfg} file. \emph{Please} do not include this
option in the |\usepackage|\nolinebreak[1]|{fontdoc}| of an encoding
specification file, as that can be a severe annoyance for people whose
\TeX\ program or DVI viewers do not support the necessary extensions.
% Hyper\TeX\ |\special|s.

\section{Font encoding ratification}

This section describes a suggested ratification process for font
encoding specifications. As there are fewer technical matters that
impose restrictions on what it may look like, it is probably more
subjective than the other parts of this paper.

\medskip

A specification in the process of being ratified can be in one of three
different stages: \emph{draft}, \emph{beta}, or \emph{final}.
Initially the specification is in the draft stage, during which it
will be scrutinized and can be subject to major changes. A
specification which is in the beta stage has received formal approval,
but the encoding in question may still be subject to some minor changes
if weighty arguments present themselves. Once the specification has
reached the final stage, the encoding may not change at all.

\subsection{Getting to the draft stage}

The process of taking an encoding to the draft stage can be summarized
in the following steps. Being in the draft stage doesn't really say
anything about whether the encoding is in any way correct or useful,
except that some people (the encoding proposers) believe it is and are
willing to spend some time on ratifying it.

\paragraph{Write an encoding specification}
The first step is to write a specification for the font encoding in
question. This document must not only describe the encoding
technically but also explain what the encoding is for and why it was
created. See Subsection~\ref{Ssec:Structure} for details on how the
document is preferably organised.

\paragraph{Request an encoding name}
The second step is to write to the \LaTeX3 project and request a
\LaTeX\ encoding name for the encoding. This mail should be in the form
of a \LaTeX\ bug report; it must be sent to
\begin{quote}
  \href{mailto:latex-bugs@latex-project.org}%
    {\texttt{latex-bugs@latex-project.org}},
\end{quote}
and it must include the encoding specification file. Suggestions for an
encoding name are appreciated, but not necessarily accepted. The
purpose of this mail is \emph{not} to get an approval of the encoding,
but only to have a reasonable name assigned to it.

\paragraph{Upload the specification to CTAN}
The third step is to make the encoding specification publicly available
by uploading it to CTAN.
Encoding specifications are collected in the
\begin{quote}
  \ctanref{info/encodings}{\texttt{info/encodings}}
\end{quote}
directory (which should also contain the most recent version of this
paper). The name of the uploaded file should be
\meta{encoding name}\texttt{draft.etx}. The reason for this naming is
that it must be clear that the specification has not yet been ratified.

\paragraph{Announce the encoding}
When the upload has been confirmed, it is time to announce the encoding
by posting a message about it to the relevant forums. Most important is
the \texttt{tex-fonts} mailing list, since that is where new encodings
should be debated. Messages should also be posted to the
\texttt{comp.text.tex} newsgroup and to any forums related to the
intended use of the encoding, to the extent that such forums exist: an
encoding for Sanskrit should be announced on Indian \TeX\ user forums,
an encoding for printing chess positions should be announced on some
chess-with-\TeX\ user forum, etc.

The full address of the \texttt{tex-fonts} mailing list is
\begin{quote}
  \texttt{tex-fonts@math.utah.edu}
\end{quote}
This list rejects postings from non-members, so you need to subscribe
to it before you can post your announcement. This is done by sending a
`subscribe me' mail to
\begin{quote}
  \href{mailto:tex-fonts-request@math.utah.edu}
    {\texttt{tex-fonts-request@math.utah.edu}}
\end{quote}
The list archives can be found at
\begin{quote}
  \href{http://www.math.utah.edu/mailman/listinfo/tex-fonts}
    {\textsc{http}:/\slash \texttt{www.math.utah.edu}\slash
    \texttt{mailman}\slash \texttt{listinfo}\slash
    \texttt{tex-fonts}}
\end{quote}
It is a good idea to read through the messages from the last couple of
months before you write up your announcement, as that should help you
get acquainted with the normal style on the list. Please do not send
messages encoded in markup languages (notably, \textsmaller{HTML},
\textsmaller{XML}, and word processor formats) to the list.

\paragraph{Experimental encodings}
There is a point in going through the above procedure even for
experimental encodings, i.e., encodings whose names start with an
\texttt{E}. Of course there is no point in ratifying a specification of
an experimental encoding, as it is very likely to change frequently,
but having a proper name assigned to the encoding and uploading its
specification to CTAN makes it much simpler for other people to learn
about and make references to the encoding.

\subsection{From draft to beta stage}

The main difference between a draft stage and a beta stage
specification is that the latter has been scrutinized by other people
and found to be free of errors. In practice this is done through a
debate (in the normal anarchical manner of mailing list debates) on the
\texttt{tex-fonts} mailing list. In particular the following aspects of
the specification should be checked:
\begin{enumerate}
  \item \emph{Is the encoding technically correct?} There are many
    factors which affect what \TeX\ does and it is easy to overlook
    some. (The \cs{lccode}s seem to be particularly troublesome in this
    respect.) Sometimes fonts simply cannot work as an encoding
    specifies they should, and it is important that such defects in
    the encoding are discovered at an early stage.
  \item \emph{Are there any errors in the specification?} A font
    encoding specification is largely a table and typos are easy to
    make. Proof-reading may be boring, but it is very, very important.
  \item \emph{Is the specification sufficiently precise?} Are there any
    omissions, ambiguities, inaccuracies, or completely irrelevant
    material in the specification? There shouldn't be.
\end{enumerate}
During the debate, the encoding proposers should hear what other people
have to say about the encoding draft, revise it accordingly when some
flaw is pointed out, and upload the revised version. This cycle may
well have to be repeated several times before everyone is content.

It is worth pointing out that in practice the debate should turn out to
be more of a collective authoring of the specification than a defense
of its validity. There is no point in going into it expecting the
worst. Unfortunately, it might happen that there is never complete
agreement on an encoding specification---depending on which side one
takes, either the encoding proposers refuse to correct obvious flaws in
it, or someone on the list insists that there is a flaw although there
obviously is not---but hopefully that will never happen. If it does
happen nonetheless, then the person objecting should send a mail to the
list whose subject contains the phrase ``formal protest against XXX
encoding'' (with XXX replaced by whatever the encoding is called). Then
it will be up to the powers that be to decide on the fate of the
encoding (see below).

\paragraph{Summarize the debate}
When the debate on the encoding is over---e.g.\ a month after anyone
last posted anything new on the subject---the encoding proposers should
summarize the debate on the encoding specification draft and post this
summary as a follow-up to the original mail to \texttt{latex-bugs}.
This summary should list the changes that have been made to the
encoding, what suggestions there were for changes which have not been
included, and whether there were any formal protests against the
encoding. The summary should also explain what the proposers want to
have done with the encoding. In the usual case this is having it
advanced to beta stage, but the proposers might alternatively at this
point have reached the conclusion that the encoding wasn't such a good
idea to start with and therefore withdraw it, possibly to come back
later with a different proposal.

In response to this summary, the \LaTeX-project people may do one of
three things:
\begin{itemize}
  \item If the proposers want the encoding specification advanced and
    there are no formal protests against this, then the encoding should
    be advanced to the beta stage. The \LaTeX-project people do this by
    adding the encoding to the list of approved (beta or final stage)
    encodings that they [presumably] maintain.
  \item If the proposers want to withdraw the encoding specification,
    then the name assigned to it should once again be made available
    for use for other encodings.
  \item If the proposers want the encoding specification advanced but
    there is some formal protest against this, then the entire matter
    should be handed over to some suitable authority, for example some
    technical TUG committee, for resolution.
\end{itemize}

\paragraph{Update the specification on CTAN}
When the specification has reached the beta stage, its file on CTAN
should be updated to say so. In particular the file name should be
changed from \meta{encoding name}\texttt{draft.etx} to
\meta{encoding name}\texttt{spec.etx}.

\paragraph{Modifying beta stage encodings}
If a beta stage encoding is modified then the revised specification
should go through the above procedure of ratification again before it
can replace the previous \meta{encoding name}\texttt{spec.etx} file on
CTAN. The revised version should thus initially be uploaded as
\meta{encoding name}\texttt{draft.etx}, reannounced, and redebated. It
can however be expected that such debates will not be as extensive as
the original debates.

\subsection{From beta stage to final stage}

The requirements for going from beta stage to final stage are more
about showing that the encoding has reached a certain maturity than
about demonstrating any technical merits of it.
The main difference in usefulness between a beta stage encoding and a
final stage encoding is that the latter can be considered safe for
archival purposes, whereas one should have certain reservations about
such use of beta stage encodings. It seems reasonable that the
following conditions should have to be fulfilled before a beta stage
encoding can be made a final stage encoding:
\begin{itemize}
  \item At least one year must have passed since the last change was
    made to the specification.
  \item At least two people other than the proposer must have succeeded
    in implementing the encoding in a font.
\end{itemize}
It is quite possible that some condition should be added or some of
the above conditions reformulated.

% References updated 2004/08/07.
\begin{thebibliography}{???}
\bibitem{ATN5013}
  Adobe Systems Incorporated:
  \textit{Adobe Standard Cyrillic Font Specification},
  Adobe Technical Note \#5013, 1998;
  \href{http://partners.adobe.com/asn/developer/pdfs/tn/%
    5013.Cyrillic_Font_Spec.pdf}{\textsc{http}:/\slash
  \texttt{partners.adobe.com}\slash \texttt{asn}\slash
  \texttt{developer}\slash \texttt{pdfs}\slash \texttt{tn}\slash
  \texttt{5013.Cyrillic\_Font\_Spec.pdf}}.
\bibitem{AGL}
  Adobe Systems Incorporated: \textit{Adobe Glyph List}, text file,
  1998;
  \href{http://partners.adobe.com/asn/developer/type/glyphlist.txt}
    {\textsc{http}:/\slash \texttt{partners.adobe.com}\slash
    \texttt{asn}\slash \texttt{developer}\slash \texttt{type}\slash
    \texttt{glyphlist.txt}}.
\bibitem{unicodesign}
  Adobe Systems Incorporated:
  \textit{Adobe Solutions Network: Unicode and Glyph Names}, web page,
  1998;
  \href{http://partners.adobe.com/asn/developer/type/unicodegn.html}
    {\textsc{http}:/\slash \texttt{partners.adobe.com}\slash
    \texttt{asn}\slash \texttt{developer}\slash \texttt{type}\slash
    \texttt{unicodegn.html}}.
\bibitem{ClasenVieth}
  Matthias Clasen and Ulrik Vieth:
  \textit{Towards a new Math Font Encoding for (La)TeX}, March 1998,
  presented at EuroTeX'98;
  \href{http://tug.org/twg/mfg/papers/current/mfg-euro-all.ps.gz}
    {\textsc{http}:/\slash \texttt{tug.org}\slash \texttt{twg}\slash
    \texttt{mfg}\slash \texttt{papers}\slash \texttt{current}\slash
    \texttt{mfg-euro-all.ps.gz}}.
\bibitem{fontinst-man}
  Alan Jeffrey, Rowland McDonnell, Ulrik Vieth, and Lars Hellstr\"om:
  \textit{\package{fontinst}---font installation software for \TeX}
  (manual), 2004;
  \ctanref{fonts/utilities/fontinst/doc/fontinst.tex}{%
    \textsc{ctan}:\discretionary{}{}{\thinspace}%
    \texttt{fonts}\slash \texttt{utilities}\slash
    \texttt{fontinst}\slash \texttt{doc}\slash \texttt{fontinst.tex}}.
% \bibitem{fontinst}
%   Alan Jeffrey, Sebastian Rahtz, and Ulrik Vieth:
%   \textit{The \package{fontinst} utility}, documented source code,
%   v\,1.801,
%   \ctanref{fonts/utilities/fontinst/source}{%
%     \textsc{ctan}:\discretionary{}{}{\thinspace}%
%     \texttt{fonts}\slash \texttt{utilities}\slash
%     \texttt{fontinst}\slash \texttt{source}/}.
\bibitem{fontinst-pre}
  Alan Jeffrey, Sebastian Rahtz, Ulrik Vieth, and Lars Hellstr\"om:
  \textit{The \package{fontinst} utility}, documented source code,
  v\,1.9xx;
  \ctanref{fonts/utilities/fontinst/source}{%
    \textsc{ctan}:\discretionary{}{}{\thinspace}%
    \texttt{fonts}\slash \texttt{utilities}\slash
    \texttt{fontinst}\slash \texttt{source}/}.
\bibitem{TeXbook}
  Donald E.\ Knuth, Duane Bibby (illustrations):
  \textit{The \TeX book}, Ad\-di\-son--Wes\-ley, 1991;
  volume A of \textit{Computers and typesetting}.
\bibitem{LaTeXCompanion}
  Frank Mittelbach and Michel Goossens, with Johannes Braams, David
  Carlisle, and Chris Rowley: \textit{The \LaTeX\ Companion} (second
  edition), Ad\-di\-son--Wes\-ley, 2004; ISBN~0-201-36299-6.
\bibitem{Omega-doc}
  John Plaice and Yannis Haralambous:
  \textit{Draft documentation for the Omega system}, version~1.12,
  1999;
  \href{http://omega.cse.unsw.edu.au:8080/doc-1.12.ps}{%
    \textsc{http}:/\slash \texttt{omega.cse.unsw.edu.au:8080}\slash
    \texttt{doc-1.12.ps}}.
%   \textsc{ctan}:\discretionary{}{}{\thinspace}%
%   \texttt{systems}\slash \texttt{omega}\slash
%   \texttt{omega-doc-1.8.tar.gz}.
\bibitem{Vieth2001}
  Ulrik Vieth:
  \textit{Math typesetting in \TeX: The~good, the~bad, the~ugly},
  to appear in the proceedings of Euro\TeX\ 2001;
  \href{http://www.ntg.nl/eurotex/vieth.pdf}{%
    \textsc{http}:/\slash \texttt{www.ntg.nl}\slash
    \texttt{eurotex}\slash \texttt{vieth.pdf}}.
\end{thebibliography}

\end{document}