\iffalse meta-comment

File: l3term-glossary.tex

Copyright (C) 2018-2024 The LaTeX Project

It may be distributed and/or modified under the conditions of the
LaTeX Project Public License (LPPL), either version 1.3c of this
license or (at your option) any later version.  The latest version
of this license is in the file

   https://www.latex-project.org/lppl.txt

This file is part of the "l3kernel bundle" (The Work in LPPL)
and all files in that bundle must be distributed together.

The released version of this bundle is available from CTAN.

\fi

\documentclass{l3doc}

\title{%
  Glossary of \TeX{} terms used to describe \LaTeX3 functions%
}

\author{%
  The \LaTeX{} Project\thanks
    {%
      E-mail:
      \href{mailto:latex-team@latex-project.org}%
        {latex-team@latex-project.org}%
    }%
}

\date{Released 2024-05-08}

\newcommand{\TF}{\textit{(TF)}}

\begin{document}

\maketitle

This file describes aspects of \TeX{} programming that are relevant in an
\pkg{expl3} context.

\section{Reading a file}

Tokenization.

Treatment of spaces, such as the trap that \verb|\~~a| is equivalent to
\verb|\~a| in \pkg{expl3} syntax, or that \verb|~| fails to give a space
at the beginning of a line.

\section{Structure of tokens}

We refer to the documentation of \texttt{l3token} for a complete
description of all \TeX{} tokens.  We distinguish between the meaning of
a token, which controls the expansion of the token and its effect on
\TeX{}'s state, and its shape, which is used when comparing token lists,
for instance when matching delimited arguments.  At any given time two
tokens of the same shape automatically have the same meaning, but the
converse does not hold, and the meaning associated with a given shape
changes when doing assignments.

Apart from a few exceptions, a token has one of the following shapes.
\begin{itemize}
\item A control sequence, characterized by the sequence of characters
  that constitute its name: for instance, \cs{use:n} is a five-letter
  control sequence.
\item An active character token, characterized by its character code
  (between $0$ and $1114111$ for \LuaTeX{} and \XeTeX{} and less for
  other engines) and category code~$13$.
\item A character token such as |A| or |#|, characterized by its
  character code and category code (one of $1$, $2$, $3$, $4$, $6$,
  $7$, $8$, $10$, $11$ or~$12$ whose meaning is described below).
\end{itemize}

The meaning of a (non-active) character token is fixed by its category
code (and character code) and cannot be changed.  We call these tokens
\emph{explicit} character tokens.  Category codes that a character
token can have are listed below by giving a sample output of the \TeX{}
primitive \tn{meaning}, together with their \pkg{expl3} names and most
common example:
\begin{itemize}
\item[1] begin-group character (|group_begin|, often |{|),
\item[2] end-group character (|group_end|, often |}|),
\item[3] math shift character (|math_toggle|, often |$|), % $
\item[4] alignment tab character (|alignment|, often |&|),
\item[6] macro parameter character (|parameter|, often |#|),
\item[7] superscript character (|math_superscript|, often |^|),
\item[8] subscript character (|math_subscript|, often |_|),
\item[10] blank space (|space|, often character code~$32$),
\item[11] the letter (|letter|, such as |A|),
\item[12] the character (|other|, such as |0|).
\end{itemize}
Category code~$13$ (|active|) is discussed below.  Input characters can
also have several other category codes which do not lead to character
tokens for later processing: $0$~(|escape|), $5$~(|end_line|),
$9$~(|ignore|), $14$~(|comment|), and $15$~(|invalid|).

The meaning of a control sequence or active character can be identical
to that of any character token listed above (with any character code),
and we call such tokens \emph{implicit} character tokens.
The meaning is otherwise in the following list:
\begin{itemize}
\item a macro, used in \pkg{expl3} for most functions and some variables
  (|tl|, |fp|, |seq|, \ldots{}),
\item a primitive such as \tn{def} or \tn{topmark}, used in \pkg{expl3}
  for some functions,
\item a register such as \tn{count}|123|, used in \pkg{expl3} for the
  implementation of some variables (|int|, |dim|, \ldots{}),
\item a constant integer such as \tn{char}|"56| or \tn{mathchar}|"121|,
  used when defining a constant using \cs{int_const:Nn},
\item a font selection command,
\item undefined.
\end{itemize}

Macros can be \tn{protected} or not, \tn{long} or not (the opposite of
what \pkg{expl3} calls |nopar|), and \tn{outer} or not (unused in
\pkg{expl3}).  Their \tn{meaning} takes the form
\begin{quote}
  \meta{prefix} |macro:|\meta{argument}|->|\meta{replacement}
\end{quote}
where \meta{prefix} is among \tn{protected}\tn{long}\tn{outer},
\meta{argument} describes parameters that the macro expects, such as
|#1#2#3|, and \meta{replacement} describes how the parameters are
manipulated, such as~|\int_eval:n{#2+#1*#3}|.  This information can be
accessed by \cs{cs_prefix_spec:N}, \cs{cs_parameter_spec:N},
\cs{cs_replacement_spec:N}.

When a macro takes an undelimited argument, explicit space characters
(with character code $32$ and category code $10$) are ignored.  If the
following token is an explicit character token with category code $1$
(begin-group) and an arbitrary character code, then \TeX{} scans ahead
to obtain an equal number of explicit character tokens with category
code $1$ (begin-group) and $2$ (end-group), and the resulting list of
tokens (with outer braces removed) becomes the argument.  Otherwise, a
single token is taken as the argument for the macro: we call such single
tokens \enquote{N-type}, as they are suitable to be used as an argument
for a function with the signature~\texttt{:N}.
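
The rules for undelimited arguments can be illustrated with a minimal
sketch.  The function |\mypkg_wrap:n| is a hypothetical name introduced
only for this example, and the scratch variable |\l_tmpa_tl| is used
merely to display the results.  Assuming \pkg{expl3} syntax is active,
one could write
\begin{verbatim}
\ExplSyntaxOn
% Hypothetical helper: wraps its single undelimited argument in brackets.
\cs_new:Npn \mypkg_wrap:n #1 { [ #1 ] }

% Braced argument: the outer braces are removed, so #1 is "abc".
\tl_set:Nx \l_tmpa_tl { \mypkg_wrap:n { abc } }
\tl_show:N \l_tmpa_tl  % should show [abc]

% Unbraced argument: only the single (N-type) token "a" is grabbed.
\tl_set:Nx \l_tmpa_tl { \mypkg_wrap:n abc }
\tl_show:N \l_tmpa_tl  % should show [a]bc

% \c_space_tl is expanded first, leaving an explicit space in front of
% "abc"; that space is skipped and "a" is again taken as the argument.
\tl_set:Nx \l_tmpa_tl { \exp_after:wN \mypkg_wrap:n \c_space_tl abc }
\tl_show:N \l_tmpa_tl  % should show [a]bc
\ExplSyntaxOff
\end{verbatim}
In the last case the space has to be produced by expansion, because
literal spaces in the source are ignored in \pkg{expl3} syntax.
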
When a macro takes a delimited argument \TeX{} scans ahead until finding
the delimiter (outside any pairs of begin-group/end-group explicit
characters), and the resulting list of tokens (with outer braces
removed) becomes the argument.  Note that explicit space characters at
the start of the argument are \emph{not} ignored in this case (and they
prevent brace-stripping).

\section{Handling of hash tokens}

\TeX{} uses the hash (octothorpe) character |#| to denote parameters for
macros: these must be numbered sequentially.  To allow handling of
nested macros, \TeX{} requires that hash tokens are doubled for each
nesting level.  For example
\begin{verbatim}
\cs_new:Npn \mypkg_outer:N #1
  {
    \cs_new:Npn \mypkg_inner:N ##1 { #1 ##1 }
  }
\end{verbatim}
would define both |\mypkg_outer:N| and |\mypkg_inner:N| as taking
exactly one argument.  If we then do
\begin{verbatim}
\mypkg_outer:N \foo
\cs_show:N \mypkg_inner:N
\end{verbatim}
\TeX{} will report
\begin{verbatim}
> \mypkg_inner:N=\long macro:#1->\foo #1.
\end{verbatim}
i.e.~the hash is no longer doubled: it now denotes the parameter of the
inner macro.

Exactly the same concept applies anywhere that inline code is nested in
\pkg{expl3}, for example inline mapping code, key definitions, etc.

\end{document}