\documentstyle{article}
\makeindex
\input{windex}
\begin{document}
\bigskip
\centerline{\Large \bf Windex: An index compilation program}
\centerline{by Ronald L. Rivest}
\centerline{MIT Laboratory for Computer Science}
\centerline{\tt rivest@theory.lcs.mit.edu}
\centerline{\today}
\bigskip

Windex is a system to assist in compiling an index for a Latex document. It consists of a modification to the Latex \verb+\index+ command and a C program for compiling the index.

\section{Using windex --- basic case}
To use windex in its simplest form, you need to obtain (or obtain access to) two files:
\begin{itemize}
\item {\tt windex.tex}---a Latex macro package.
\item {\tt windex}---the compiled form of {\tt windex.c}.
\end{itemize}
You should then do the following:
\begin{itemize}
\item Include the lines
\begin{quote}
\verb+\makeindex+ \\
\verb+\input{windex}+
\end{quote}
before the \verb+\begin{document}+ command in your Latex file.
\item Include an index command of the form \verb+\index{term}+ at each position in your document where you wish to index \verb+term+.
\item Compile your document with Latex. If your document is called \verb+paper+, then Latex will create a file called \verb+paper.idx+ containing the raw index information.
\item Issue the command \verb+windex paper+. The windex program will read the file \verb+paper.idx+ and produce as output a file called \verb+paper.index+ containing the actual Latex commands necessary to produce the index.
\item Include the command \verb+\input{paper.index}+ at the place you wish the index to appear in your paper.
\item Recompile \verb+paper+ with Latex.
\item Review the index produced and modify your paper as necessary until the index is satisfactory.
\end{itemize}
The output index will be alphabetized, and page ranges will be created automatically when possible to conserve space. For example, the page numbers ``20, 21, 22, 23'' will be condensed to read ``20--23''.
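As a minimal sketch of the steps above, a complete source file might look as follows. The file name {\tt paper.tex} and the sample sentence are only illustrative, and the sketch assumes that {\tt windex.tex} can be found by Latex. The \verb+\input{paper.index}+ line should be added only after windex has been run once, since {\tt paper.index} does not exist before then.
\begin{verbatim}
\documentstyle{article}
\makeindex
\input{windex}
\begin{document}
% The \index command directly precedes the term it indexes.
This paper falls in the area of
\index{machine learning}machine learning.

% Add this line only after ``windex paper'' has produced paper.index.
\input{paper.index}
\end{document}
\end{verbatim}
Under these assumptions, the cycle is to run Latex on {\tt paper}, issue \verb+windex paper+, and then run Latex on {\tt paper} again.
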
The steps above provide a very simple indexing capability. Windex also has a number of more sophisticated features that you can use for more complicated indices. In particular, windex supports the following capabilities:
\begin{itemize}
\item using multi-level entries (up to three levels),
\item using \verb+\index+ commands in the source document to indicate where a range of relevant pages begins and where it ends,
\item separately specifying the text to sort an entry by, and the text to print,
\item formatting control for page numbers, such as allowing italicized page numbers, or replacing the page number with text of the form ``See fruit.'' for cross-references.
\end{itemize}
The following sections explain how to use the \verb+\index+ command to obtain these features with satisfactory results.

\section{Placing index commands in your document}
In general, you should place the \verb+\index+ command immediately {\em before} the term being indexed, with no intervening spaces, so that the page reference generated is to the beginning of the term. Once in a while Latex may split a term across pages, and you want the index command to refer to the first page the term is on. As an example,
\begin{quote}
\verb+falls in the area of \index{machine learning}machine learning, which+\ldots
\end{quote}
gives the desired behavior, even if a page break occurs between \verb+machine+ and \verb+learning+.

The text of the index entry may not contain the characters \verb+{ } / < > [ ]+ as ordinary text, since these characters have special meaning to windex, as noted below. The text may contain blanks, hyphens, periods, or other special characters.

Windex does not directly support non-arabic page numbers, such as you might have in a preface. If you place an \verb+\index+ command on a page with non-arabic page numbers, you'll get the arabic version of the page number in the index. By using the format control for page numbers, however, you can solve this problem (see Section~\ref{sec:page-format}).
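
To illustrate the placement rule given at the start of this section, the following sketch contrasts the recommended placement with a riskier one. The sentence is the illustrative one used earlier, and the second case assumes that the page recorded is the one on which the \verb+\index+ command itself lands.
\begin{verbatim}
% Recommended: the command directly precedes the term, so the
% reference is to the page on which the term begins.
falls in the area of \index{machine learning}machine learning, which

% Riskier: the command follows the term, so if a page break falls
% inside the term, the later page is presumably the one recorded.
falls in the area of machine learning\index{machine learning}, which
\end{verbatim}
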
\section{Multi-level entries}
You can use the ``{\tt /}'' character to separate levels in a multi-level index entry. For example, an entry of the form \verb+\index{fruit/apples}+ on page 10 will produce in the index a two-level entry of the form
\begin{quote}
fruit\\
\hspace*{2em}apples, 10
\end{quote}
You can create an index with up to three levels. For example, an entry of the form \verb+\index{fruit/apples/Macintosh}+ would cause a three-level entry to be generated.

Note that you don't need to specify the ``parent'' entries separately. In the example above, the parent entry for ``fruit'' (with no page reference) was automatically generated from the occurrence of \verb+\index{fruit/apples}+. You may give index commands for the parent entry too, but a dummy parent with no page reference is generated if you don't give one explicitly.

\section{Alphabetization}
Windex uses the following rules when alphabetizing the index.
\begin{itemize}
\item The entries are sorted according to their top-level entry, then according to the second-level entry, and so on, in the natural manner.
\item Blanks and hyphens are ignored. Thus, we might have as output:
\begin{quote}
high bettor, 10\\
highest elevation, 12\\
high-point, 15
\end{quote}
\item In the second and third levels, initial prepositions and articles are also ignored. Thus, we might have as output:
\begin{quote}
fruit\\
\hspace*{2em}banana, 15\\
\hspace*{2em}in cakes, 20\\
\hspace*{2em}as a decoration, 30\\
\hspace*{2em}grapefruit, 19
\end{quote}
\item Capitalization is ignored unless two entries agree in all but capitalization, in which case the capitalized entry goes second. For example, we might have as output:
\begin{quote}
apple, 15\\
Apple, 20
\end{quote}
\end{itemize}

\section{Page ranges}
You can place one \verb+\index+ command at the beginning of a range of pages for a given term, and another \verb+\index+ command at the end of the range of pages.
The first \verb+\index+ command should contain the character ``{\tt <}'', and the second command should contain the character ``{\tt >}''. For example, the entry \verb+\index{number-theory algorithms/gcd<}+ on page 300 and the corresponding entry \verb+\index{number-theory algorithms/gcd>}+ on page 306 generate as output
\begin{quote}
number-theory algorithms\\
\hspace*{2em}gcd, 300--306
\end{quote}
In addition to the ranges you create explicitly in this manner, windex may create ranges of its own by combining a sequence of consecutive page references into a page range. Windex may also combine several page ranges into one, or extend a range to include additional pages, in the obvious manner. In general, windex attempts to describe the pages covered in the most compact form possible. For example, the list of pages and ranges
\begin{quote}
301,302,302--305,304,306--308,308,309--311,311--320,321
\end{quote}
combines to yield
\begin{quote}
301--321
\end{quote}
If a range includes individual pages in its interior, they are eliminated from the listing (e.g., page 304 in the above example). If the two index commands you give to specify a range happen to end up on the same page, then only a single reference, rather than a range, is printed.

\section{Controlled printing of index keys}
Windex allows you to print the text for a given entry in an arbitrary manner. For example, your index entry terms can be printed in italics or can contain mathematical symbols. The \verb+\index+ command allows you to specify how to print an index term {\em separately} from how to place it in the sorted order. To specify special printing for an index entry, merely include within the argument to the \verb+\index+ command the Latex commands for printing the entry, enclosed within braces.
For example, the commands
\begin{quote}
\verb+\index{beta-decay{$\beta$-decay}}+ on page 12\\
\verb+\index{Lovasz, L.{Lov\'asz, L.}}+ on page 15\\
\verb+\index{fruit/apple{\bf apple}}+ on page 18
\end{quote}
might produce as output
\begin{quote}
$\beta$-decay, 12\\
fruit\\
\hspace*{2em}{\bf apple}, 18\\
Lov\'asz, L., 15
\end{quote}
Let us call the text within braces the {\em print value} for that index entry, and call the unenclosed text the {\em sort key} for that index entry. Ordinarily, only the sort key is given, and the print value is automatically determined by windex to be the same as the sort key. (Or, in the case of a multi-level entry, the print value is automatically taken to be the same as the last component of the multi-level entry.)

Note that specifying the print value affects only how that entry is printed; the print value is not considered during the alphabetization of the entries. In particular, two entries are considered the same if and only if they have the same sort key. If two entries have the same sort key, but only one of them specifies a print value, then the entries will be considered the same, and printed with the given print value. For example, the commands
\begin{quote}
\verb+\index{fruit<}+ on page 101\\
\verb+\index{fruit{\bf fruit}>}+ on page 120\\
\verb+\index{fruit}+ on page 130
\end{quote}
will produce as output
\begin{quote}
{\bf fruit}, 101--120, 130
\end{quote}
Only one print value can be associated with a given sort key. If two different \verb+\index+ commands specify different print values for the same sort key, an error is reported.

\section{Controlled printing of page references}
\label{sec:page-format}
Windex also allows you to control how the page references are printed, by including a {\em format} for the page reference in brackets within the argument to the \verb+\index+ command. The digit ``{\tt 0}'', if present within the format, will be replaced with the actual page reference.
For example, the commands
\begin{quote}
\verb+\index{vegetable}+ on page 10 \\
\verb+\index{vegetable[\bf 0]}+ on page 15 \\
\verb+\index{vegetable[\it 0]}+ on page 20 \\
\verb+\index{vegetable[0 ff.]}+ on page 25
\end{quote}
will produce as output:
\begin{quote}
vegetable, 10, {\bf 15}, {\it 20}, 25 ff.
\end{quote}
Note that the brackets delimit the scope of Latex commands. (That is, there is no need to surround the format with an outer set of braces; the format is automatically surrounded by braces when included in the Latex output.)

You can also obtain roman numerals in this manner for page references in prefaces, etc. For example, the command
\begin{quote}
\verb+\index{Knuth, Donald E.[\romanpage{0}]}+ on page {\em xv}
\end{quote}
where \verb+\romanpage+ has been defined by
\begin{quote}
\verb+\newcounter{pgno}+\\
\verb+\newcommand{\romanpage}[1]{\setcounter{pgno}{#1}\roman{pgno}}+
\end{quote}
will produce as output
\begin{quote}
Knuth, Donald E., {\em xv}
\end{quote}
Note, however, that the roman format for the page reference doesn't affect where in the list of page references {\em xv} will appear; it is sorted as page 15, so it will appear after page 13 and before page 17. (If this is a problem, you have to edit your {\tt .index} file by hand or modify the {\tt windex.c} program.)

A special case occurs when the format does not include the digit ``{\tt 0}'' at all. In this case the format actually specifies a {\em replacement} for the page reference. For example, the commands
\begin{quote}
\verb+\index{pomme}+ on page 10\\
\verb+\index{pomme[{\em See also} apple]}+ on page 20 \\
\verb+\index{gcd[{\em See} greatest common divisor]}+ on page 30
\end{quote}
will produce as output
\begin{quote}
gcd, {\em See} greatest common divisor\\
pomme, {\em See also} apple, 10
\end{quote}
Note that replacement texts are always placed at the beginning of a list of page references, independent of the page on which the \verb+\index+ command specifying the replacement text occurred.
If no format is given, the default format of ``{\tt [0]}'' is used, causing just the page number to be printed. However, parent entries that are created automatically have a default format of ``{\tt []}'', causing no page reference or replacement text at all to be printed. For example, the two-level command
\begin{quote}
\verb+\index{fruit/apples}+
\end{quote}
is entirely equivalent to the pair of commands
\begin{quote}
\verb+\index{fruit[]}+\\
\verb+\index{fruit/apples}+
\end{quote}
Specifying a format of ``{\tt []}'' can be useful when you wish to give a print value to an automatically created parent entry. For example, if we wish to put both ``fruit'' and ``apples'' in boldface in the above example, the commands
\begin{quote}
\verb+\index{fruit{\bf fruit}[]}+ on page 10\\
\verb+\index{fruit/apples{\bf apples}}+ on page 10
\end{quote}
produce as output
\begin{quote}
{\bf fruit}\\
\hspace*{2em}{\bf apples}, 10
\end{quote}

The specification of formats and replacement texts affects how windex combines page references into ranges. Specifically, pages are only combined into ranges if they have the same format. If they are combined, the range string ``{\tt nnn-mmm}'' is placed at the position of the \verb+0+ in the format string. For example, the commands
\begin{quote}
\verb+\index{apple[\em 0]}+ on page 301\\
\verb+\index{apple[\em 0]}+ on page 302\\
\verb+\index{apple}+ on page 302\\
\verb+\index{apple}+ on page 303\\
\verb+\index{apple<}+ on page 400\\
\verb+\index{apple[\em 0]<}+ on page 401\\
\verb+\index{apple[\bf 0]}+ on page 403\\
\verb+\index{apple[\bf 0]}+ on page 404\\
\verb+\index{apple>}+ on page 405\\
\verb+\index{apple}+ on page 406\\
\verb+\index{apple[\em 0]>}+ on page 410
\end{quote}
produce as output
\begin{quote}
apple, {\em 301--302}, 302--303, 400--406, {\em 401--410}, {\bf 403--404}
\end{quote}
In such an example, the ranges are sorted according to the first page number in each range.
If you really want a ``0'' in your format or replacement string, use the Tex command ``\verb+\char48+''. For example, you might have
\begin{quote}
\verb+\index{zero-one programming[{\em See} \char48-1 programming]}+
\end{quote}
which, since the format contains no digit ``{\tt 0}'' and is therefore a replacement text, will produce
\begin{quote}
zero-one programming, {\em See} 0-1 programming
\end{quote}

\section{Macros and interfacing with Latex}
If all you do to create \verb+\index+ commands is to place them directly in your document, then you don't need to read this section. If, however, you wish to write Latex macros that generate \verb+\index+ commands automatically, then you will find this section helpful.

If you use the \verb+\index+ command inside a macro, and there are Latex commands inside the argument to \verb+\index+, then you should precede them with \verb+\string+ to prevent their interpretation until the index is produced. For example, the following defines a variant of the \verb+\index+ command that automatically makes a boldface print value.
\begin{quote}
\verb+\newcommand{\bfindex}[1]{\index{#1{\string\bf{#1}}}}+
\end{quote}
Because of the special way that the \verb+\index+ command parses its argument, writing macros that generate \verb+\index+ commands is rather tricky, and is best avoided. The given macro \verb+\bfindex+, for example, does not work correctly if formats are given containing Latex commands.

\section{Implementation notes}
The ``{\tt .idx}'' file that is produced by the \verb+\index+ commands consists of one line for each \verb+\index+ command. Each such line has the form
\begin{quote}
\verb+text:nn+
\end{quote}
where {\tt text} is the argument to the \verb+\index+ command and {\tt nn} is the page number on which that \verb+\index+ command was placed.

The C program {\tt windex.c} is extensively commented, and easily modified, should you want slightly different behavior from {\tt windex}.
\end{document}