\iffalse meta-comment

File: l3term-glossary.tex

Copyright (C) 2018-2024 The LaTeX Project

It may be distributed and/or modified under the conditions of the
LaTeX Project Public License (LPPL), either version 1.3c of this
license or (at your option) any later version.  The latest version
of this license is in the file

   https://www.latex-project.org/lppl.txt

This file is part of the "l3kernel bundle" (The Work in LPPL)
and all files in that bundle must be distributed together.

The released version of this bundle is available from CTAN.

\fi

\documentclass{l3doc}


\title{%
  Glossary of \TeX{} terms used to describe \LaTeX3 functions%
}
\author{%
  The \LaTeX{} Project\thanks
    {%
      E-mail:
      \href{mailto:latex-team@latex-project.org}%
        {latex-team@latex-project.org}%
    }%
}
\date{Released 2024-09-10}

\newcommand{\TF}{\textit{(TF)}}

\begin{document}

\maketitle

This file describes aspects of \TeX{} programming that are relevant in an
\pkg{expl3} context.

\section{Reading a file}

Tokenization.

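Treatment of spaces.  In \pkg{expl3} syntax, spaces are ignored and
\verb|~| has category code~$10$ (space), which leads to some traps.  For
instance, \verb|\~~a| is equivalent to \verb|\~a|: \TeX{} skips blanks
after a control symbol whose character has category code~$10$, so the
second \verb|~| is dropped.  Similarly, \verb|~| does not give a space at
the beginning of a line, since \TeX{} also skips blanks there.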
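
Both traps can be observed with a short test file.  The sketch below
assumes a \LaTeX{} kernel recent enough to preload \pkg{expl3}; it
compares token lists with \cs{tl_if_eq:nnTF} and reports the outcome on
the terminal with \cs{iow_term:n}.  In the second test the |~| is
indented on purpose: in \pkg{expl3} syntax the indenting spaces are
ignored (category code~$9$) and do not change the state of \TeX{}'s
input reader, so the |~| still counts as being at the beginning of the
line.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
% ~ has category code 10 here.  After the control symbol \~ TeX
% skips blanks, so the second ~ never becomes a token.
\tl_if_eq:nnTF { \~~a } { \~a }
  { \iow_term:n { test~1:~identical~token~lists } }
  { \iow_term:n { test~1:~different~token~lists } }
% A ~ at the start of a line is skipped like any other blank; the
% ignored indentation leaves TeX in its "new line" state.
\tl_if_eq:nnTF
  {
    ~x
  }
  { x }
  { \iow_term:n { test~2:~identical~token~lists } }
  { \iow_term:n { test~2:~different~token~lists } }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}
Both tests should report identical token lists.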

\section{Structure of tokens}

We refer to the documentation of \texttt{l3token} for a complete
description of all \TeX{} tokens.  We distinguish the meaning of the
token, which controls the expansion of the token and its effect on
\TeX{}'s state, and its shape, which is used when comparing token lists
such as for delimited arguments.  At any given time two tokens of the
same shape automatically have the same meaning, but the converse does
not hold, and the meaning associated with a given shape changes when
assignments are made.
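
The distinction can be seen in a small sketch, assuming a \LaTeX{}
kernel that preloads \pkg{expl3}; the names |\mypkg_a:| and |\mypkg_b:|
are hypothetical.  The conditional \cs{token_if_eq_meaning:NNTF}
compares meanings, whereas \cs{tl_if_eq:nnTF} compares token lists by
shape.  The two control sequences first share a meaning without sharing
a shape, and an assignment to |\mypkg_a:| then changes its meaning while
its shape stays the same.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
\cs_new:Npn \mypkg_a: { hello }
\cs_new_eq:NN \mypkg_b: \mypkg_a:  % copy the meaning of \mypkg_a:
% The shapes differ: the two control sequences have different names.
\tl_if_eq:nnTF { \mypkg_a: } { \mypkg_b: }
  { \iow_term:n { same~shape } }
  { \iow_term:n { different~shapes } }
% The meanings agree, for now.
\token_if_eq_meaning:NNTF \mypkg_a: \mypkg_b:
  { \iow_term:n { same~meaning } }
  { \iow_term:n { different~meanings } }
% An assignment changes the meaning attached to the shape \mypkg_a:.
\cs_set:Npn \mypkg_a: { goodbye }
\token_if_eq_meaning:NNTF \mypkg_a: \mypkg_b:
  { \iow_term:n { same~meaning } }
  { \iow_term:n { different~meanings } }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}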

Apart from a few exceptions, a token has one of the following shapes.
\begin{itemize}
  \item A control sequence, characterized by the sequence of characters
    that constitute its name: for instance, the name of \cs{use:n}
    consists of the five characters |use:n|.
  \item An active character token, characterized by its character code
    (between $0$ and $1114111$ for \LuaTeX{} and \XeTeX{}; other engines
    have a smaller range, for instance $0$ to $255$ for pdf\TeX{}) and
    category code~$13$.
  \item A character token such as |A| or |#|, characterized by its
    character code and category code (one of $1$, $2$, $3$, $4$, $6$,
    $7$, $8$, $10$, $11$ or~$12$, whose meanings are described below).
\end{itemize}
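
Two character tokens with the same character code but different
category codes therefore have different shapes.  As a sketch (assuming
a \LaTeX{} kernel that preloads \pkg{expl3}), \cs{char_generate:nn}
builds an |A| with category code~$12$, which we compare with an
ordinary letter~|A| using \cs{tl_if_eq:NNTF}.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
% A letter A: typed in the source, so it has category code 11.
\tl_set:Nn \l_tmpa_tl { A }
% An A with category code 12, generated from its character code.
\tl_set:Nx \l_tmpb_tl { \char_generate:nn { `A } { 12 } }
\tl_if_eq:NNTF \l_tmpa_tl \l_tmpb_tl
  { \iow_term:n { same~shape } }
  { \iow_term:n { different~shapes } }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}
The test should report different shapes, even though both tokens
typeset as~|A|.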

The meaning of a (non-active) character token is fixed by its category
code (and character code) and cannot be changed.  We call these tokens
\emph{explicit} character tokens.  The category codes that such a token
can have are listed below, each with a sample of the output of the
\TeX{} primitive \tn{meaning}, the \pkg{expl3} name of the category
code, and its most common example:
\begin{itemize}
  \item[1] begin-group character (|group_begin|, often |{|),
  \item[2] end-group character (|group_end|, often |}|),
  \item[3] math shift character (|math_toggle|, often |$|), % $
  \item[4] alignment tab character (|alignment|, often |&|),
  \item[6] macro parameter character (|parameter|, often |#|),
  \item[7] superscript character (|math_superscript|, often |^|),
  \item[8] subscript character (|math_subscript|, often |_|),
  \item[10] blank space (|space|, often character code~$32$),
  \item[11] the letter (|letter|, such as |A|),
  \item[12] the character (|other|, such as |0|).
\end{itemize}
Category code~$13$ (|active|) is discussed below.  Input characters can
also have several other category codes which do not lead to character
tokens for later processing: $0$~(|escape|), $5$~(|end_line|),
$9$~(|ignore|), $14$~(|comment|), and $15$~(|invalid|).
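
The \tn{meaning} samples above can be reproduced with
\cs{token_to_meaning:N}, which expands to the output of \tn{meaning}.
The following sketch assumes a \LaTeX{} kernel that preloads
\pkg{expl3}; the comments give the text that \TeX{} writes to the
terminal for each explicit character token.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
\iow_term:x { \token_to_meaning:N A }  % the letter A
\iow_term:x { \token_to_meaning:N 0 }  % the character 0
\iow_term:x { \token_to_meaning:N $ }  % math shift character $
\iow_term:x { \token_to_meaning:N & }  % alignment tab character &
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}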

The meaning of a control sequence or active character can be identical
to that of any character token listed above (with any character code),
and we call such tokens \emph{implicit} character tokens.  Otherwise,
the meaning is one of the following:
\begin{itemize}
  \item a macro, used in \pkg{expl3} for most functions and some variables
    (|tl|, |fp|, |seq|, \ldots{}),
  \item a primitive such as \tn{def} or \tn{topmark}, used in \pkg{expl3}
    for some functions,
  \item a register such as \tn{count}|123|, used in \pkg{expl3} for the
    implementation of some variables (|int|, |dim|, \ldots{}),
  \item a constant integer such as \tn{char}|"56| or
    \tn{mathchar}|"121|, used for some constants defined with
    \cs{int_const:Nn},
  \item a font selection command,
  \item undefined.
\end{itemize}
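
The \pkg{expl3} conditionals \cs{token_if_group_begin:NTF},
\cs{token_if_macro:NTF}, \cs{token_if_int_register:NTF} and
\cs{token_if_primitive:NTF} distinguish several of these meanings.  The
sketch below assumes a \LaTeX{} kernel that preloads \pkg{expl3} and
uses two hypothetical names: |\c_mypkg_bgroup_token|, an implicit
begin-group character token obtained by copying the meaning of
\cs{c_group_begin_token}, and |\mypkg_classify:N|, a helper that tests
these cases in turn and writes the first match to the terminal.
Meanings not covered by these tests fall through to the final branch.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
% An implicit begin-group character token: a control sequence whose
% meaning is that of an explicit begin-group character.
\cs_new_eq:NN \c_mypkg_bgroup_token \c_group_begin_token
\cs_new_protected:Npn \mypkg_classify:N #1
  {
    \token_if_group_begin:NTF #1
      { \iow_term:n { begin-group~character } }
      {
        \token_if_macro:NTF #1
          { \iow_term:n { macro } }
          {
            \token_if_int_register:NTF #1
              { \iow_term:n { integer~register } }
              {
                \token_if_primitive:NTF #1
                  { \iow_term:n { primitive } }
                  { \iow_term:n { something~else } }
              }
          }
      }
  }
\mypkg_classify:N \c_mypkg_bgroup_token  % implicit character token
\mypkg_classify:N \use:n                 % macro
\mypkg_classify:N \l_tmpa_int            % register
\mypkg_classify:N \tex_def:D             % primitive (\def)
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}
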
Macros can be \tn{protected} or not, \tn{long} or not (the opposite of
what \pkg{expl3} calls |nopar|), and \tn{outer} or not (unused in \pkg{expl3}).
Their \tn{meaning} takes the form
\begin{quote}
  \meta{prefix} |macro:|\meta{argument}|->|\meta{replacement}
\end{quote}
where \meta{prefix} lists whichever of \tn{protected}, \tn{long} and
\tn{outer} apply (in that order); \meta{argument} describes the
parameters that the macro expects, such as |#1#2#3|; and
\meta{replacement} describes how the parameters are manipulated, such
as~|\int_eval:n {#2+#1*#3}|.  These three parts can be extracted with
\cs{cs_prefix_spec:N}, \cs{cs_parameter_spec:N} and
\cs{cs_replacement_spec:N}, respectively.
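
As a sketch, assuming a \LaTeX{} kernel that preloads \pkg{expl3} and
using the hypothetical macro |\mypkg_combine:nnn|, each of these
functions expands to a string that can be written to the terminal.
Since \cs{cs_new:Npn} creates \tn{long} macros, the prefix should be
|\long|; the last line writes the full \tn{meaning} for comparison.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
% A hypothetical macro with three undelimited parameters.
\cs_new:Npn \mypkg_combine:nnn #1#2#3 { \int_eval:n { #2 + #1 * #3 } }
% Each function expands to a string, which \iow_term:x writes out.
\iow_term:x { \cs_prefix_spec:N \mypkg_combine:nnn }     % \long
\iow_term:x { \cs_parameter_spec:N \mypkg_combine:nnn }  % #1#2#3
\iow_term:x { \cs_replacement_spec:N \mypkg_combine:nnn }
\iow_term:x { \token_to_meaning:N \mypkg_combine:nnn }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}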

When a macro takes an undelimited argument, explicit space characters
(with character code $32$ and category code $10$) preceding the argument
are skipped.  If the
following token is an explicit character token with category code $1$
(begin-group) and an arbitrary character code, then \TeX{} scans ahead
to obtain an equal number of explicit character tokens with category
code $1$ (begin-group) and $2$ (end-group), and the resulting list of
tokens (with outer braces removed) becomes the argument.  Otherwise, a
single token is taken as the argument for the macro: we call such single
tokens \enquote{N-type}, as they are suitable to be used as an argument
for a function with the signature~\texttt{:N}.
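
The following sketch (assuming a \LaTeX{} kernel that preloads
\pkg{expl3}) shows these rules with a hypothetical function
|\mypkg_first:nn| that simply returns its first argument; in \pkg{expl3}
syntax |~| produces an explicit space token.  The comments give what
\cs{iow_term:x} writes after full expansion.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
% A hypothetical function returning the first of two arguments.
\cs_new:Npn \mypkg_first:nn #1#2 {#1}
% The space before the braced group is skipped and the outer
% braces are stripped: writes ab
\iow_term:x { \mypkg_first:nn ~ { ab } c }
% Only the outer pair of braces is removed: writes a{b}
\iow_term:x { \mypkg_first:nn { a { b } } c }
% Without braces a single (N-type) token is taken: writes x
\iow_term:x { \mypkg_first:nn x y }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}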

When a macro takes a delimited argument, \TeX{} scans ahead until it
finds the delimiter (outside any pairs of begin-group/end-group explicit
characters), and the resulting list of tokens becomes the argument; if
this list consists of a single braced group, the outer braces are
removed.  Note that explicit space characters at the start of the
argument are \emph{not} skipped in this case (and they prevent
brace-stripping).
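
A similar sketch for delimited arguments, assuming a \LaTeX{} kernel
that preloads \pkg{expl3}, uses a hypothetical function
|\mypkg_upto_stop:w| whose argument is delimited by the quark
\cs{q_stop}; the brackets in its replacement text make the exact
argument visible.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
% A hypothetical function whose argument is delimited by \q_stop.
\cs_new:Npn \mypkg_upto_stop:w #1 \q_stop { [#1] }
% The whole argument is one braced group, so the braces are
% stripped: writes [ab]
\iow_term:x { \mypkg_upto_stop:w { ab } \q_stop }
% A leading space is kept and prevents brace-stripping:
% writes [ {ab}]
\iow_term:x { \mypkg_upto_stop:w ~ { ab } \q_stop }
% With two groups nothing is stripped; writes [{a}{b}]
\iow_term:x { \mypkg_upto_stop:w { a } { b } \q_stop }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}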

\section{Handling of hash tokens}

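\TeX{} uses the hash (octothorpe) character |#| to denote parameters for
macros: these must be numbered consecutively, starting from |#1|.  To
allow nested definitions, hash tokens must be doubled for each level of
nesting: when \TeX{} stores the body of a definition, it reduces each
pair |##| to a single |#|, which a nested definition then sees as
introducing one of its own parameters.  For example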
\begin{verbatim}
\cs_new:Npn \mypkg_outer:N #1
  {
    \cs_new:Npn \mypkg_inner:N ##1
      {
        #1
        ##1
      }
  }
\end{verbatim}
defines |\mypkg_outer:N|, which takes one argument and, when used,
defines |\mypkg_inner:N|, which also takes exactly one argument.  If we
then do
\begin{verbatim}
\mypkg_outer:N \foo
\cs_show:N \mypkg_inner:N
\end{verbatim}
\TeX{} will report
\begin{verbatim}
> \mypkg_inner:N=\long macro:#1->\foo #1.
\end{verbatim}
i.e.~the |##1| of the outer definition has become a single |#1|, the
parameter of |\mypkg_inner:N|, while the outer |#1| has been replaced by
|\foo|.
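
Each further level of nesting doubles the hashes again.  In the sketch
below (hypothetical names, with a \LaTeX{} kernel that preloads
\pkg{expl3}) the parameter of the innermost macro is written |####1|:
defining |\mypkg_level_i:n| reduces it to |##1|, and defining
|\mypkg_level_ii:n| reduces that to |#1|, which becomes the parameter of
|\mypkg_level_iii:n|.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
\cs_new:Npn \mypkg_level_i:n #1
  {
    \cs_new:Npn \mypkg_level_ii:n ##1
      {
        \cs_new:Npn \mypkg_level_iii:n ####1
          { \iow_term:n { #1 ##1 ####1 } }
      }
  }
\mypkg_level_i:n { a }    % defines \mypkg_level_ii:n
\mypkg_level_ii:n { b }   % defines \mypkg_level_iii:n
\mypkg_level_iii:n { c }  % writes abc
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}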

The same rule applies wherever inline code is nested in \pkg{expl3}, for
example in inline mapping code (such as \cs{seq_map_inline:Nn}) and in
key definitions.
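
For instance, nested inline mappings follow the same pattern.  In this
sketch (assuming a \LaTeX{} kernel that preloads \pkg{expl3}), |#1|
refers to the item of the outer \cs{clist_map_inline:nn} and |##1| to
the item of the inner one, so the terminal should show |a1|, |a2|,
|b1| and~|b2|, one per line.
\begin{verbatim}
\documentclass{article}
\ExplSyntaxOn
\clist_map_inline:nn { a , b }
  {
    % #1 is the outer item; ##1 is the item of the inner mapping.
    \clist_map_inline:nn { 1 , 2 }
      { \iow_term:n { #1 ##1 } }
  }
\ExplSyntaxOff
\begin{document}
\end{document}
\end{verbatim}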

\end{document}