% !Rnw weave = Sweave

\documentclass[nojss,shortnames,article]{jss}

%% -- LaTeX packages and custom commands ---------------------------------------

%% recommended packages
\usepackage{thumbpdf,lmodern}
\usepackage[utf8]{inputenc}

%% other packages
\usepackage{nicefrac,array}
\usepackage{float}
\usepackage{amsmath,amsfonts,stfloats,footmisc}
% \usepackage[svgnames]{xcolor}
\usepackage{tikz}
\usepackage[all]{xy}
\usepackage{adjustbox}
\usepackage[position=bottom]{subfig}
\captionsetup[subfloat]{%
font=footnotesize,
labelformat=parens,labelsep=space,
listofformat=subparens,
captionskip=.3cm}
\usepackage{xspace}
% \usepackage{color}
% \definecolor{Gray}{gray}{0.85}
%% new custom commands
\newcommand{\class}[1]{`\code{#1}'}
\newcommand{\fct}[1]{\code{#1()}}
\def\att{\text{\raisebox{2pt}{$\scriptstyle \ast$}}}
\newcommand{\id}{\text{\raisebox{1pt}{$\scriptstyle =$}}}
\newcommand{\idn}{\text{\raisebox{1pt}{$\scriptstyle \neq$}}}
\definecolor{Blue}{rgb}{0,0,0.5}
\def\plus{\text{\raisebox{1pt}{$\;\scriptstyle +\;$}}}

\fnbelowfloat
%% For Sweave-based articles about R packages:
%% need no 
\usepackage{Sweave}
\SweaveOpts{engine=R, keep.source = TRUE,concordance=TRUE}
<<preliminaries, echo=FALSE, results=hide>>=
options(prompt = "R> ", continue = "+  ", width = 70, useFancyQuotes = FALSE)
@

%\VignetteIndexEntry{Introduction to the CNA method and package}

%% -- Article metainformation (author, title, ...) -----------------------------


\author{Michael Baumgartner\\University of Bergen, Norway 
   \And Mathias Amb\"uhl\\
  Consult AG, Switzerland}
\Plainauthor{Michael Baumgartner, Mathias Amb\"uhl}


\title{\pkg{cna}: An \proglang{R} Package for Configurational Causal Inference and Modeling}
\Plaintitle{cna: An R Package for Configurational Causal Inference and Modeling}
\Shorttitle{\pkg{cna}: Configurational Causal Inference and Modeling}

%% - \Abstract{} almost as usual
\Abstract{
 The \proglang{R} package \pkg{cna} provides comprehensive functionalities for causal inference and modeling with \emph{Coincidence Analysis} (CNA), which is a configurational comparative meth\-od of causal data analysis. 
In this vignette, we first review the theoretical and methodological background of CNA. Second, we introduce the data types processable by CNA, the package's core analytical functions with their arguments, and some auxiliary functions for data simulations. Third, CNA's output along with relevant fit parameters and output attributes are discussed. Fourth, we provide guidance on how to interpret that output and, in particular, on how to proceed in case of model ambiguities. Finally, some considerations are offered on benchmarking the reliability of CNA.}


\Keywords{configurational comparative methods, set-theoretic methods, Coincidence Analysis, Qualitative Comparative Analysis, INUS causation, Boolean causation}
\Plainkeywords{configurational comparative methods, set-theoretic methods, Coincidence Analysis, Qualitative Comparative Analysis, INUS causation, Boolean causation}


\Address{
  Michael Baumgartner\\
  University of Bergen\\
Department of Philosophy\\
Postboks 7805\\
5020 Bergen\\
Norway\\
  E-mail: \email{michael.baumgartner@uib.no}\\
  URL: \url{https://m-baum.github.io}
\bigskip

  Mathias Amb\"uhl\\
  Consult AG Statistical Services\\
Tramstrasse 10\\
8050 Z\"urich\\
  E-mail: \email{mathias.ambuehl@consultag.ch}
}

\begin{document}
\SweaveOpts{concordance=TRUE}




\section[Introduction]{Introduction} \label{intro}


\emph{Coincidence Analysis} (CNA) is a configurational comparative method of causal data analysis that was introduced for crisp-set (i.e.\ binary) data in (Baumgartner \citeyear{Baumgartner:2007a,Baumgartner:2008,Baumgartner:pars}) and generalized for multi-value and fuzzy-set data in \citep{BaumgartnerfsCNA}. In recent years, CNA has been applied in numerous studies across the social, political, and behavioral sciences, with a particularly rapid uptick in usage in public health, covering a wide range of topics such as colorectal cancer screening, patient safety in nursing homes, implementation of Hepatitis C virus treatments, drug withdrawal, COVID-19 vaccination rates, or the connection between firearm laws and homicide rates.%
% For example, \cite{Dy:2020} used the method to investigate how different implementation strategies %and social network factors
%  affect patient safety culture in medical homes. \cite{Yakovchenko:2020} applied CNA to data on factors affecting the uptake of innovation in the treatment of hepatitis C virus infection, or \cite{Haesebrouck:2019} used it to search for factors influencing EU member states' participation in military operations.
\footnote{The \href{https://www.zotero.org/groups/4567107/coincidence.analysis/library}{Zotero CNA library} provides detailed references to more than 100 applications of CNA. Among other impacts, CNA has been showcased in the flagship journal of implementation science \citep{Birken:CNA}.}
In contrast to more standard methods of data analysis, which primarily quantify effect sizes, CNA belongs to a family of methods designed to group causal influence factors conjunctively (i.e.\ in complex bundles) and disjunctively (i.e.\ on alternative pathways). It is firmly rooted in a so-called regularity theory of causation and it is the only method of its kind that can recover causal structures with multiple outcomes (effects), for example, causal chains.  



Many disciplines investigate causal structures with  one or both of the following features: (i) causes are arranged in complex bundles that only become operative when all of their components are properly co-instantiated, each of which in isolation is ineffective or leads to different outcomes, and (ii) outcomes can be brought about along alternative causal routes such that, when one route is suppressed, the outcome may still be produced via another one. 
For example, from a given set of implementation strategies available to medical facilities, some strategies yield a desired outcome (e.g.\ high uptake of treatment innovation) in combination with certain other strategies, whereas in different combinations the same strategies may have opposite or no effects (e.g.\ \citealp{Yakovchenko:2020}). Or, a variation in a phenotype only occurs if many single-nucleotide polymorphisms  
interact, and various such interactions can independently induce the same phenotype (e.g.\ \citealp{Culverhouse2002}). 
Different labels are used for features (i) and (ii) in different disciplines: ``interactions'', 
``component causation'', ``conjunctural causation'', ``alternative causation'', ``equifinality'', etc. For uniformity's sake, we will subsequently refer to (i) as \emph{conjunctivity} and to (ii) as \emph{disjunctivity} of causation, reflecting the fact that causes form conjunctions and disjunctions, that is, Boolean \textsc{and}- and \textsc{or}-connections. 


Causal structures featuring conjunctivity and disjunctivity pose severe challenges for %both for theories of causation and 
methods of causal data analysis. As many theories of causation entail that it is necessary (though not sufficient) for X to be a cause of Y that there be some kind of dependence (e.g.\ probabilistic or counterfactual) between X and Y, standard methods---for instance, regression and Bayesian network methods---infer that X is \emph{not} a cause of Y if X and Y are \emph{not} pairwise dependent (i.e.\ correlated). %\footnote{Methods of causal inference must be distinguished from methods of causal reasoning (e.g.\ \citealp[5-6]{Peters:2017}). The former are methods discovering or learning causal model from data, and the latter are methods testing given models by, for example, inferring predictions from them. This paper is only concerned with the forme r type of methods.} 
However, structures displaying conjunctivity and disjunctivity often do not exhibit pairwise dependencies. As a very simple illustration, consider the interplay between a person's skills to perform an activity, the challenges posed by that activity, and the actor's autotelic experience of complete involvement with the activity called \emph{flow} \citep{boredom1975}. A binary model of this interplay involves the factors S, with values 0/1 representing low/high skills, C, with 0/1 standing for low/high challenges, and F, with 0/1 representing the absence/presence of flow.  Csikszentmihalyi's (\citeyear[ch.\ 4]{boredom1975}) flow theory entails that flow is triggered if, and only if, skills and challenges are either both high or both low, meaning that $\text{F}\id 1$ has the two alternative causes $\text{S}\id 1\;\&\;\text{C}\id 1$ and $\text{S}\id 0 \; \&\; \text{C}\id 0$.
If the flow theory is true, ideal (i.e.\ non-fragmented, unconfounded, noise-free) data on this structure feature the four configurations $c_1$ to $c_4$ in Table \ref{tab_ex}a, and no others.
\begin{table}[tb]
  {  \centering
\includegraphics[width=8cm]{tab1.pdf}

  }
\caption{Table (a) contains ideal configurational data, where each row depicts a different configuration of the factors S, C and F. Configuration $c_1$, for example, represents cases (units of observation) in which all factors take the value 1, whereas in $c_2$, S and C are 0 and F is 1, etc. Table (b) is the corresponding correlation matrix.}\label{tab_ex}
\end{table}
As can easily be seen from the corresponding correlation matrix in Table \ref{tab_ex}b, there are no pairwise dependencies. % between these three factors. 
In consequence, standard methods will struggle to find the flow model, even when processing ideal data on it.

Although standard methods provide various protocols for tracing interaction effects involving two or three exogenous factors, these interaction calculations face tight computational complexity restrictions when more exogenous factors are involved and quickly run into multicollinearity issues \citep{brambor2006}. Yet, structures with conjunctivity and disjunctivity may be much more complex than the flow model. Consider the electrical circuit in Figure \ref{fig1}a. It comprises a lamp \textsf{L} that can be on or off and four switches \textsf{A} to \textsf{D}, each of which can either be in position 1 or position 0.
There are three alternative conjunctions of switch positions that close the circuit and cause the lamp to be on:
 $\text{A}\id 0$ $\&$ $\text{B}\id 1$ $\&$ $\text{D}\id 1\;\;$ \textsc{or}  $\;\;\text{A}\id 1$ $\&$ $\text{C}\id 0$ $\&$ $\text{D}\id 0\;\;$ \textsc{or}   $\;\;\text{B}\id 0$ $\&$ $\text{C}\id 1$ $\&$ $\text{D}\id 1$.
As the switches are mutually independent, there are $2^4=16$ logically possible configurations of switch positions. For each of these configurations $c_1$ to $c_{16}$, Table \ref{fig1}c lists whether the lamp is on ($\text{L}\id 1$) or off ($\text{L}\id 0$). That table thus contains all and only the empirically possible configurations of the five binary factors representing the switches and the lamp. These are ideal data for the circuit in Figure \ref{fig1}a. 
Yet, even though all of the switch positions are causes of the lamp being on in some combination or other, factors A, B, and C are pairwise \emph{independent} of L; only D is weakly correlated with L, as can be seen from the correlation matrix in Table \ref{fig1}b (which results from Table \ref{fig1}c). 
Standard methods of causal data analysis cannot infer the causal structure behind that circuit from Table \ref{fig1}c. They are not designed to group causes conjunctively and disjunctively. %, rather, their aim is to quantify effect sizes


\setcounter{figure}{1}   
\addtolength{\belowcaptionskip}{-0.3cm}
\renewcommand{\figurename}{Figure/Table}

 \begin{figure}\vspace{-.6cm}
  {  \centering
\includegraphics[width=13.5cm]{fig1.pdf}

  }

    \caption{Diagram (a) depicts a simple electrical circuit with  three single-pole switches \textsf{D}, \textsf{B}, \textsf{A}, one double-pole switch \textsf{C}, and one lamp \textsf{L}. Table (c) comprises ideal data on that circuit and Table (b) the correlation matrix corresponding to that data.}\label{fig1}
  \end{figure}

\renewcommand{\figurename}{Figure}


A switch position as $\text{A}\id 0$ can only be identified as cause of $\text{L}\id 1$ by finding the whole conjunction of switch positions in which $\text{A}\id 0$ is indispensable for closing the circuit. More generally, discovering causal structures exhibiting conjunctivity and disjunctivity calls for a method that tracks causation as defined by a theory not treating a dependence between individual causes and effects as necessary for causation and that embeds values of exogenous factors in complex Boolean \textsc{and}- and \textsc{or}-functions over many other causes, fitting those functions as a whole to the data. But the space of Boolean functions over even a handful of factors is vast. For $n$ binary factors there exist $2^{2^n}$ Boolean functions. For the switch positions in our circuit there exist 65536 Boolean functions; if we add only one additional binary switch that number jumps to 4.3 billion and if we also consider factors with more than two values that number explodes beyond controllability. That means a method capable of correctly discovering causal structures with conjunctivity and disjunctivity must find ways to efficiently  navigate in that vast space of possibilities. This is the purpose of CNA.

CNA takes data on binary, multi-value or fuzzy-set factors as input and infers causal structures as defined by the so-called \emph{INUS theory} from it. %, which may comprise binary, multi-value or  
The INUS theory was first developed by \citet{Mackie:1974} and later refined to the \emph{MINUS theory} by \cite{grasshoff2001} (see also \citealp{BaumFalk}; \citealp{Beirlaen2018}). It defines %causally relevant factor values 
causation in terms of redundancy-free Boolean dependency structures and, importantly, does not require causes and their outcomes to be pairwise dependent. As such, it is custom-built to account for structures featuring conjunctivity and disjunctivity. 

CNA is not the only method for the discovery of (M)INUS structures. Other methods that can be used for that purpose are Logic Regression (\citealp{Ruczinski2003,Kooperberg2005}), which is implemented in the \proglang{R} package \href{https://cran.r-project.org/package=LogicReg}{\pkg{LogicReg}} \citep{LogicReg},\footnote{Another package implementing a variation of Logic Regression is \href{https://cran.r-project.org/package=logicFS}{\pkg{logicFS}}  \citep{logicFS}.} 
 and Qualitative Comparative Analysis (QCA; \citealp{Ragin:2008,Rihoux:2009,cronqvist2009,Thiem2018}), implemented in the \proglang{R} packages \href{https://cran.r-project.org/package=QCApro}{\pkg{QCApro}} \citep{Thiem2018} and \href{https://cran.r-project.org/package=QCA}{\pkg{QCA}} \citep{QCARef}.\footnote{Other useful QCA software include \href{https://cran.r-project.org/package=QCAfalsePositive}{\pkg{QCAfalsePositive}} \citep{QCAfalsePositive} and \href{https://cran.r-project.org/package=SetMethods}{\pkg{SetMethods}} \citep{SetMethods}.}
But CNA is the only method of its kind that can build models with more than one outcome and, hence,
can analyze common-cause and causal chain structures as well as causal cycles and feedbacks. Moreover, unlike the models produced by Logic Regression or Qualitative Comparative Analysis, CNA's models are guaranteed to be redundancy-free, which makes them directly causally interpretable in terms of the (M)INUS theory; and CNA is more successful than any other method at exhaustively uncovering \emph{all} (M)INUS models that fit the data equally well. For detailed comparisons of CNA with Qualitative Comparative Analysis and Logic Regression see \citep{BaumgartnerfsCNA,Swiatczak2021} and \citep{CNA_LR}, respectively.


The \pkg{cna} package reflects and implements CNA's latest stage of development.
This vignette provides a detailed introduction to \pkg{cna}. We first exhibit \pkg{cna}'s theoretical and methodological background. Second, we discuss the main inputs of the package's core function \code{cna()} along with numerous auxiliary functions for data review and simulation. Third, the working of the algorithm implemented in \code{cna()} is presented.
Fourth, we explain \code{cna()}'s output along with relevant fit parameters and output attributes. Fifth, we provide some guidance on how to interpret that output and, in particular, on how to proceed in case of model ambiguities. Finally, some considerations are offered on benchmarking the reliability of \code{cna()}.




\section[Background]{Background}

The (M)INUS theory of causation belongs to the family of so-called \emph{regularity theories}, which have roots as far back as \cite{Hume:1999}). It is a type-level theory of causation  (cf.\ \citealp{BaumCaus}) that analyzes the dependence relation of causal relevance between factors/variables taking on specific values, as in ``$\text{X}\id\chi$ is causally relevant to  $\text{Y}\id\gamma$''. 
It assumes that causation is ultimately a deterministic form of dependence, such that whenever the same complete cause occurs the same effect follows. This entails that indeterministic behavior patterns in data result from insufficient control over background influences generating noise and not from the indeterministic nature of the underlying causal processes. For $\text{X}\id\chi$ to be a (M)INUS cause of $\text{Y}\id\gamma$, $\text{X}\id\chi$ must be a difference-maker of $\text{Y}\id\gamma$, meaning---roughly---that there exists a context in which other causes take constant values and a change from $\text{X}\idn\chi$ to $\text{X}\id\chi$ is associated with a change from $\text{Y}\idn\gamma$ to $\text{Y}\id\gamma$.




To further clarify that theory as well as the characteristics and requirements of inferring (M)INUS structures from empirical data a number of preliminaries are needed. 

\subsection{Factors and their values}

Factors are the basic modeling devices of CNA. They are analogous to (random) variables in statistics, that is, they are functions from (measured) properties into a range of values. They can be used to represent categorical properties that partition sets of units of observation (cases) either into two sets, in case of binary properties, or into more than two (but finitely many) sets, in case of multi-value properties, such that the resulting sets are exhaustive and pairwise disjoint. Factors representing binary properties can be \emph{crisp-set} ($cs$) or \emph{fuzzy-set} ($fs$); the former can take on $0$ and $1$ as possible values, whereas the latter can take on any (continuous) values from the unit interval $[0,1]$. Factors representing multi-value properties are called \emph{multi-value ($mv$) factors}; they can take on any of an open (but finite) number of non-negative integers.  

Values of a $cs$ or $fs$ factor X can be interpreted as membership scores in the set of cases exhibiting the property represented by X.
A case of type $\text{X}\id 1$ is a full member of that set, a case of type $\text{X}\id 0$ is a (full) non-member, and a case of type $\text{X}\id \chi_i$, $0<\chi_i<1$, is a member to degree $\chi_i$. A case is considered a member of X if its membership score $\chi_i$ reaches the 0.5-anchor, that is, $\chi_i \geq 0.5$, and it is a non-member of X if $\chi_i < \ $0.5. An alternative interpretation, which lends itself particularly well for causal modeling, is that ``$\text{X}\id 1$'' stands for the full presence of the property represented by X, ``$\text{X}\id 0$'' for its full absence, and ``$\text{X}\id \chi_i$'' for its partial presence (to degree $\chi_i$). By contrast, the values of an $mv$ factor X designate the particular way in which the property represented by X is exemplified. For instance, if X represents the education of subjects, $\text{X}\id 2$ may stand for ``high school'', with $\text{X}\id 1$ (``no completed primary schooling'') and $\text{X}\id 3$ (``university'') designating other possible property exemplifications. $Mv$ factors taking on one of their possible values also define sets, but the values themselves must not be interpreted as membership scores; rather they denote the relevant property exemplification.


As the explicit ``Factor$=$value'' notation yields convoluted syntactic expressions with increasing model complexity, the \pkg{cna} package uses the following shorthand notation, which is standard in Boolean algebra \citep{Bowran:1965}: membership in a set is expressed by italicized upper case and non-membership by italicized lower case letters. ``$X$'' signifies membership in the set of cases exhibiting the property represented by X and ``$x$'' signifies non-membership in that set. Italicization thus carries meaning: ``X'' designates the factor and ``$X$'' membership in the set of cases with values of X above 0.5. In case of $mv$ factors, value assignments to factors are not abbreviated but always written out, using the ``Factor$=$value'' notation.


\subsection[Boolean operations]{Boolean operations}

The (M)INUS theory defines causation using the Boolean operations of negation ($\neg X$, or $x$), conjunction ($X\att Y$), disjunction ($X + Y$), implication ($X\rightarrow Y$), and equivalence ($X\leftrightarrow Y$).\footnote{Note that ``\att'' and ``+'' are used as in Boolean algebra here, which means, in particular, that they do not represent the linear algebraic (arithmetic) operations of multiplication and addition (notational variants of Boolean ``\att'' and ``+'' are ``$\wedge$'' and ``$\vee$''). For a standard introduction to Boolean algebra see \citep{Bowran:1965}. }  Negation is a unary truth function, the other operations are binary truth functions. That is, they take one resp.\ two truth values as inputs and output a truth value. When applied to $cs$ factors, both their input and output set is $\{0,1\}$. 
Negation is translated by ``not'', conjunction by ``and'', disjunction by ``or'', implication by ``if \ldots then'', and equivalence by ``if and only if (iff)''. Their classical definitions are given in Table \ref{tab1}.
\begin{table}[tb]\centering
\begin{tabular}{|cc|| >{\centering}m{.7cm}|c|c|c|c|} %|c|c|}
\hline
\multicolumn{2}{|c||}{{\small Inputs}}&\multicolumn{5}{|c|}{{\small Outputs}}\\[.1cm]\hline %&\multicolumn{2}{|c|}{{\footnotesize equivalent to ``$\rightarrow$''}}\\[.1cm]\hline
$X$& $Y$ & $\neg X$ & $X\att Y$ & $X + Y$ & $X\rightarrow Y$ & $X\leftrightarrow Y$\\\hline %&$x + Y$& $\neg(X\att y)$\\\hline
$1$& $1$ & $0$& $1$ &$1$&$1$& $1$  \\   
$1$& $0$& $0$&  $0$ &$1$& $0$ & $0$ \\  
$0$& $1$& $1$&  $0$ &$1$&$1$ &  $0$\\   
$0$& $0$& $1$&  $0$ & $0$&$1$ &$1$\\    
  \hline
\end{tabular}\caption{Classical Boolean operations applied to $cs$ factors.
}\label{tab1}
\end{table}

These operations can be straightforwardly applied to $mv$ factors as well, in which case they amount to functions from the $mv$ factors' domain of values into the set $\{0,1\}$. To illustrate, let both X and Y be ternary factors with values from the domain $\{0,1,2\}$. The negation of $\text{X}\id 2$, \emph{viz.}\ $\neg (\text{X}\id 2)$, then returns $1$ iff X is not $2$, meaning iff X is $0$ or $1$. $\text{X}\id 2 \att \text{Y}\id 0$ yields $1$ iff X is $2$ and Y is $0$. $\text{X}\id 2 + \text{Y}\id 0$ returns $1$ iff X is $2$ or Y is $0$. $\text{X}\id 2 \rightarrow \text{Y}\id 0$ yields $1$ iff  X is not $2$ or Y is $0$. $\text{X}\id 2 \leftrightarrow \text{Y}\id 0$ issues $1$ iff either X is $2$ and Y is $0$ or X is not $2$ and Y is not $0$.

For $fs$ factors, the classical Boolean operations must be translated into fuzzy logic. There exist numerous systems of fuzzy logic (for an overview cf.\ \citealp{Hajek1998}), many of which come with their own renderings of Boolean operations. In the context of CNA (and QCA), the following fuzzy-logic rendering is standard: negation $\neg X$ is $1-X$, conjunction $X\att Y$ is the minimum membership score in $X$ and $Y$, i.e., $\min(X,Y)$, disjunction $X+Y$ is the maximum membership score in $X$ and $Y$, i.e., $\max(X,Y)$, an implication $X\rightarrow Y$ is taken to express that the membership score in $X$ is smaller or equal to $Y$ ($X\leq Y$), and an equivalence $X\leftrightarrow Y$ that the membership scores in $X$ and $Y$ are equal ($X=Y$).\label{fs.render}


Based on the implication operator, the notions of \emph{sufficiency} and \emph{necessity} are defined, which are the two Boolean dependencies exploited by the (M)INUS theory:
%
\begin{description}\itemsep0pt
\item[Sufficiency] $X$ is sufficient for $Y$ iff $X\,\rightarrow\, Y$ (or equivalently: $x + Y$; and colloquially:  ``if $X$ is present, then $Y$ is present''); 
\item[Necessity] $X$ is necessary for $Y$ iff $Y\,\rightarrow\, X$ (or equivalently: $x\rightarrow y$ or $y + X$; and colloquially:  ``if $Y$ is present, then $X$ is present''). 
\end{description}
%
\vspace{-.1cm}
Analogously for more complex expressions: 
%
\vspace{-.1cm}
\begin{itemize}\itemsep0pt
\item $\text{X}\id 3\, \att \text{Z}\id 2\;$ is sufficient for $\;\text{Y}\id 4$ \hspace{.15cm} iff \hspace{.15cm}  $\text{X}\id 3 \att \text{Z}\id 2 \;\rightarrow\; \text{Y}\id 4$;
\item $\text{X}\id 3\, +\, \text{Z}\id 2\;$ is necessary for $\;\text{Y}\id 4$ \hspace{.15cm} iff \hspace{.15cm}  $\text{Y}\id 4\;\rightarrow\; \text{X}\id 3\,+ \, \text{Z}\id 2$; 
\item $\text{X}\id 3\,+\, \text{Z}\id 2\;$ is sufficient and necessary for $\;\text{Y}\id 4$ \hspace{.15cm} iff  \hspace{.15cm} $\text{X}\id 3\,+\,  \text{Z}\id 2 \;\leftrightarrow\; \text{Y}\id 4$.
\end{itemize}


\newpage

\subsection[(M)INUS causation]{(M)INUS causation}\label{models}
% \nopagebreak
Boolean dependencies of sufficiency and necessity amount to mere patterns of co-occurrence of factor values; as such, they carry no causal connotations whatsoever. In fact, most Boolean dependencies do not reflect causal dependencies. 
To just mention two well-rehearsed examples: the sinking of a properly functioning barometer in combination with high temperatures and blue skies is sufficient for weather changes, but it does not cause the weather; or whenever it rains, the street gets wet, hence, wetness of the street is necessary for rainfall but certainly not causally relevant for it. At the same time, some dependencies of sufficiency and necessity are in fact due to underlying causal dependencies: rainfall is sufficient for wet streets and also a cause thereof, or the presence of oxygen is necessary for fires and also a cause thereof. 

That means the crucial problem to be solved by the (M)INUS theory is to filter out those Boolean dependencies that are due to underlying causal dependencies and are, hence, amenable to a causal interpretation. The main reason why most sufficiency and necessity relations %structures of Boolean dependencies
 do not reflect causation is that they either contain redundancies or are themselves redundant to account for the behavior of the outcome, whereas causal conditions do not feature redundant elements and are themselves indispensable to account for the outcome in at least one context.
Accordingly, to filter out the causally interpretable Boolean dependencies, they need to be freed of redundancies. 
In Mackie's (\citeyear[62]{Mackie:1974}) words, causes are \emph{I}nsufficient but \emph{N}on-redundant parts of \emph{U}nnecessary but \emph{S}ufficient conditions (thus the acronym INUS).


While Mackie's INUS theory only requires that sufficient conditions be freed of redundancies, he himself formulates a problem for that theory, \emph{viz.}\ the \emph{Manchester Factory Hooters} problem \citep[81-87]{Mackie:1974}, which \cite{grasshoff2001} solve by eliminating redundancies also from necessary conditions. Accordingly, modern versions of the INUS theory stipulate that whatever can be removed from  sufficient or necessary conditions without affecting their sufficiency and necessity is not a difference-maker and, hence, not a cause. The causally interesting sufficient and necessary conditions are \emph{minimal} in the following sense: % that they do not contain sufficient and necessary proper parts. More explicitly: 
\begin{description}
\item[Minimal sufficiency]A conjunction $\Phi$ of coincidently instantiated factor values (e.g., $X_{1} \att \ldots\\ \att X_{n}$) is a minimally sufficient condition of $Y$ iff $\Phi\,\rightarrow \,Y$ and there does not exist a proper part $\Phi^{\prime}$ of $\Phi$ such that $\Phi^{\prime} \,\rightarrow\, Y$, where a proper part $\Phi^{\prime}$ of $\Phi$ is the result of eliminating one or more conjuncts from $\Phi$.
\item[Minimal necessity] A disjunction $\Psi$ of minimally sufficient conditions (e.g., $\Phi_{1}  + \ldots + \Phi_{n}$) is a minimally necessary condition of $Y$ iff %$\Psi$ is necessary for $Y$ 
$Y \,\rightarrow\, \Psi$ and there does not exist a proper part $\Psi^{\prime}$ of $\Psi$ such that $Y \,\rightarrow\, \Psi^{\prime}$, where a proper part $\Psi^{\prime}$ of $\Psi$ is the result of eliminating one or more disjuncts from $\Psi$.
\end{description}
Minimally sufficient and minimally necessary conditions can be combined to so-called \emph{atomic MINUS-formulas} (\citealp{Beirlaen2018}; or, equivalently, \emph{minimal theories}, \citealp{grasshoff2001}): %\footnote{We provide suitably simplified definitions that suffice for our purposes here. For complete definitions see \citep{BaumFalk}.}
\begin{description}
\item[Atomic MINUS-formula] An atomic MINUS-formula of an outcome $Y$ is an expression $\Psi \leftrightarrow Y$, where $\Psi$ is a minimally necessary disjunction of minimally sufficient conditions of $Y$, in disjunctive normal form (DNF).\footnote{An expression is in DNF iff it is a disjunction of one or more conjunctions of one or more literals (i.e.\ factor values; \citealp[190]{Lemmon:1965}; or \citealp[13]{Bowran:1965}). \label{fnDNF}}
\end{description}

Atomic MINUS-formulas can represent structures with one outcome only. To represent structures with more than one outcome, atomic MINUS-formulas are conjunctively combined to \emph{complex MINUS-formulas}. But conjunctive concatenation can introduce new redundancies. It is possible that a conjunction of, say, three atomic MINUS-formulas $\mathbf{m}_1\att \mathbf{m}_2 \att \mathbf{m}_3$ is logically equivalent to a conjunction of only two of them, for instance, $\mathbf{m}_1\att \mathbf{m}_2$ (see \citealp{BaumFalk} for a concrete example). In that case, $\mathbf{m}_3$ makes no difference to the behavior of the factors in the structure, meaning it is redundant. \cite{BaumFalk} call this a \emph{structural redundancy}. Consequently, the definition of a complex MINUS-formula must include an additional non-redundancy constraint: 
\begin{description}
\item[Complex MINUS-formula] A complex MINUS-formula of outcomes $Y_1,\ldots, Y_n$ is a conjunction $(\Psi_1 \leftrightarrow Y_1)\att\ldots\att(\Psi_n \leftrightarrow Y_n)$ of atomic MINUS-formulas that is itself free of structural redundancies. %, meaning it is not logically equivalent to a proper part of itself. 
\label{minus_formula}
\end{description}
Both atomic and complex MINUS-formulas are referred to as \emph{MINUS-formulas}, for short. They serve as a bridge between Boolean dependencies and causal dependencies: only those Boolean dependencies are causally interpretable that appear in MINUS-formulas. To make this concrete, consider the following atomic exemplar:
\begin{equation}\label{ex1}
A\att e \,+ \,C\att d \;\leftrightarrow\; B 
\end{equation}
\eqref{ex1} being a MINUS-formula of $B$ entails that $A\att e$ and $C\att d$, but neither $A$, $e$, $C$, nor $d$ alone, are sufficient for $B$ and that $A\att e \,+ \,C\att d$, but neither $A\att e$ nor $C\att d$ alone, are necessary for $B$. If this holds, it follows that for each (appearance of a) factor value in \eqref{ex1} there exists a \emph{difference-making pair}, meaning a pair of configurations such that a change in that factor value alone accounts for a change in the outcome \citep{BaumFalk}. For example, $A$ being part of the MINUS-formula \eqref{ex1} entails that there are two configurations $\sigma_i$ and $\sigma_j$ such that $e$ is given and $C\att d$ is not given in both $\sigma_i$ and $\sigma_j$, while $\sigma_i$ features $A$ and $B$ and $\sigma_j$ features $a$ and $b$.  Only if such a difference-making pair $\langle \sigma_i,\sigma_j\rangle$ exists is $A$ indispensable to account for $B$. Analogously, \eqref{ex1} being a MINUS-formula entails that there exist difference-making pairs for all other (appearances of) factor values in \eqref{ex1}.



To define causation in terms of Boolean difference-making,  
an additional condition is needed because not all MINUS-formulas faithfully represent causation. Complete redundancy elimination is relative to the set of analyzed factors $\mathbf{F}$, meaning that factor values contained in a MINUS-formula relative to some $\mathbf{F}$ may fail to be part of a MINUS-formula relative to supersets of $\mathbf{F}$ \citep{Baumgartner:actual}. In other words, by adding further factors %(suitable for causal modeling) 
to the analysis, factor values that originally appeared to be non-redundant to account for an outcome can turn out to be redundant after all. 
Hence, a \emph{permanence} condition needs to be imposed: only factor values that are permanently non-redundant, meaning that cannot be rendered redundant by expanding factor sets, are causally relevant. 



These considerations yield the following definition of causation:

\begin{description}\item[Causal Relevance (MINUS)]
 $X$ is causally relevant to $Y$ if, and only if, (I) $X$ is part of a MINUS-formula of $Y$ relative to some factor set $\mathbf{F}$ and (II) $X$ remains part of a MINUS-formula of $Y$ across all expansions of $\mathbf{F}$.
\end{description}

Two features of the (MINUS) definition make it particularly well suited for the analysis of structures affected by conjunctivity and disjunctivity. First, (MINUS) does not require that causes and effects are pairwise dependent. The following is a well-formed MINUS-formula expressing the flow model from the introduction: $S\att C \,+ \,s\att c \,\leftrightarrow\, F$. As shown in Table \ref{tab_ex}, ideal data generated from that model feature no pairwise dependencies. Nonetheless, if, say, high skills are permanently non-redundant to account for flow in combination with high challenges, they are causally relevant for flow subject to (MINUS), despite being uncorrelated with flow. Second, MINUS-formulas whose elements satisfy the permanence constraint not only identify causally relevant factor values but also place a Boolean ordering over these causes, such that conjunctivity and disjunctivity can be directly read off their syntax. 
 Take the following complex MINUS-formula:\begin{equation}\label{mt2}
(A\att b\; +\;a\att B \;\leftrightarrow\; C)\;\att\;(C\att f\; +\; D \;\leftrightarrow\; E)\end{equation} 
Causally interpreting \eqref{mt2} against the background of (MINUS) entails these causal ascriptions:%
\begin{enumerate}\itemsep0pt
\item the factor values listed on the left-hand sides of ``$\leftrightarrow$'' are causally relevant for the factor values on the right-hand sides; 
%
\item $A$ and $b$ are jointly relevant to $C$ and located on a causal path that differs from the path on which the jointly relevant $a$ and $B$ are located; $C$ and $f$ are jointly relevant to $E$ and located on a path that differs from $D$'s path; 
%
\item there is a causal chain from $A\att b$ and $a\att B$ via $C$ to $E$. 
\end{enumerate}




\subsection[Inferring MINUS causation from data]{Inferring MINUS causation from data}\label{inference}

Inferring MINUS causation from data faces various challenges. First, as anticipated in section \ref{intro}, causal structures for which conjunctivity and disjunctivity hold cannot be uncovered by scanning data for dependencies between pairs of factor values and suitably combining  dependent pairs. Instead, discovering MINUS causation requires searching for dependencies between complex Boolean functions of exogenous factors and outcomes. %But the space of Boolean functions over even a handful of binary factors is vast. For three factors there are 256 possible Boolean functions, for four 65'536, and 
But the space of Boolean functions over more than five factors is so vast that it cannot be exhaustively scanned. Hence, 
 algorithmic strategies are needed to purposefully narrow down the search.



Second, condition (MINUS.II) is not comprehensively testable. Once a MIN\-US-formula of an outcome $Y$ comprising a factor value $X$ has been inferred from data  $\delta$, the question arises whether the non-redundancy of $X$ in accounting for $Y$ is an artifact of $\delta$, due, for example, to the uncontrolled variation of confounders, or whether it is genuine and persists when further factors are taken into consideration. But in practice, expanding the set of factors is only feasible within narrow confines. To make up for the impossibility to  test  (MINUS.II), the analyzed data $\delta$ should be collected in such a way that Boolean dependencies in $\delta$ are not induced by an uncontrolled variation of latent causes but by the measured factors themselves. 
If the dependencies in $\delta$ are not artefacts of latent causes, they cannot be neutralized by factor set expansions, meaning they are permanent and, hence, causal. It follows that in order for it to be guaranteed that causal inferences drawn from $\delta$ are error-free, $\delta$ must meet very high quality standards. In particular, the uncontrolled causal background of $\delta$ must be \emph{homogeneous} \citep[286]{Baumgartner:simul}:
 \begin{description}\item[Homogeneity] The unmeasured causal background of data $\delta$ is homogeneous if, and only if, latent causes not connected to the outcome(s) in $\delta$ on causal paths via the measured exogenous factors (so-called \emph{off-path} causes) take constant values (i.e.\ do not vary) in the cases recorded in $\delta$.\end{description}
  
However, third, real-life data often do not meet very high quality standards. Rather, they tend to be \emph{fragmented} and \emph{noisy}. Data are fragmented when not all possible configurations of the analyzed factors are observed, and they are noisy when they contain cases incompatible with the data-generating structure. The degree of fragmentation corresponds to the ratio of configurations of exogenous factors that are compatible with the data-generating structure but missing from the data, due to practical limitations of data collection. Noise, in turn, is measurable as the ratio of cases in the data that are incompatible with the data-generating structure. Noise can be induced, for instance, by measurement error or limited control over latent causes, i.e.\ confounding. In the presence of fragmentation and noise, there typically are no strict Boolean sufficiency or necessity relations in the data. In consequence, methods of MINUS discovery have to carefully evaluate the available evidence on sufficiency and necessity relations; and they can only approximate strict MINUS structures by fitting their models more or less closely to the data using suitable parameters and thresholds of model fit (\citealp{DeSouter2024,newMeasures}; for details see section \ref{cons} below). Moreover, noise stemming from the uncontrolled variation of latent causes gives rise to homogeneity violations, which yield that inferences to MINUS causation are not guaranteed to be error-free. In order to nonetheless distill causal information from noisy data, strategies for avoiding over- and underfitting and estimating the error risk are needed (see \citealp{Parkkinen2023}).



Fourth, according to the MINUS theory, the inference to \emph{causal irrelevance} is much more demanding than the inference to causal relevance. Establishing that $X$ is a MINUS cause of $Y$ requires demonstrating the existence of at least one context with a constant background in which a difference in $X$ is associated with a difference in $Y$, whereas establishing that $X$ is \emph{not} a MINUS cause of $Y$ requires demonstrating the \emph{non-existence} of such a context, which is impossible on the basis of the non-exhaustive data samples that are typically analyzed in real-life studies. Correspondingly, the fact that, say, $G$ does not appear in \eqref{mt2} does not imply that $G$ is causally irrelevant to $C$ or $E$. The non-inclusion of $G$ simply means that the data from which \eqref{mt2} has been derived do not contain evidence for the causal relevance of $G$. However, future research having access to additional data might reveal the existence of a difference-making context for $G$ and, hence, entail the causal relevance of $G$ to $C$ or $E$ after all.

Finally, on a related note, as a result of the common fragmentation of real-life data $\delta$ MINUS-formulas inferred from $\delta$ cannot be expected to completely reflect the causal structure generating $\delta$. That is, MINUS-formulas inferred from $\delta$ are inevitably going to be \emph{incomplete}. They only detail those causally relevant factor values along with those conjunctive, disjunctive, and sequential groupings for which $\delta$ contain difference-making evidence. What difference-making evidence is contained in $\delta$  not only depends on the cases recorded in $\delta$ but, when $\delta$ is noisy, also on the tuning thresholds imposed to approximate strict Boolean dependency structures; relative to some such tuning settings an association between $X$ and $Y$ may pass as a sufficiency or necessity relation whereas relative to another setting it will not.
Hence, the inference to MINUS causation is sensitive to the chosen tuning settings, to the effect that choosing different settings is often going to be associated with changes in inferred MINUS-formulas. 

Some variance (but not all) in inferred MINUS-formulas is unproblematic. Two different MINUS-formulas $\mathbf{m}_{i}$ and $\mathbf{m}_{j}$ derived from $\delta$ using different tuning settings are in no disagreement if $\mathbf{m}_{i}$ and $\mathbf{m}_{j}$ are related in terms of the submodel relation:
%the causal claims entailed by $\mathbf{m}_{i}$ and $\mathbf{m}_{j}$ stand in a subset relation; that is, if one formula is a submodel of the other:
\begin{description}\item[Submodel relation]A MINUS-formula $\mathbf{m}_i$ is a \emph{submodel} of another MINUS-formula $\mathbf{m}_j$ if, and only if, the causal  ascriptions entailed by $\mathbf{m}_i$ are a subset of the causal ascriptions entailed by $\mathbf{m}_j$. %\begin{enumerate}\itemsep0pt\item[(i)] all factor values causally relevant according to $\mathbf{m}_i$ are also causally relevant according to $\mathbf{m}_j$, \item[(ii)] all factor values contained in two different disjuncts in $\mathbf{m}_i$ are also contained in two different disjuncts in $\mathbf{m}_j$, \item[(iii)] all factor values contained in the same conjunct in $\mathbf{m}_i$ are also contained in the same conjunct in $\mathbf{m}_j$, and \item[(iv)] if $\mathbf{m}_i$ and  $\mathbf{m}_j$ are complex MINUS-formulas, all atomic components $\mathbf{m}_i^k$ of $\mathbf{m}_i$ have a counterpart $\mathbf{m}_j^k$ in $\mathbf{m}_j$ such (i) to (iii) are satisfied for $\mathbf{m}_i^k$ and $\mathbf{m}_j^k$.\end{enumerate}
\end{description} 
\noindent If $\mathbf{m}_i$ is a submodel of $\mathbf{m}_j$, $\mathbf{m}_j$ is a \emph{supermodel} of $\mathbf{m}_i$. All of $\textbf{m}_i$'s causal ascriptions are contained in its supermodels' ascriptions, and $\textbf{m}_i$ contains the causal ascriptions of its own submodels. The submodel relation is reflexive: every model is a submodel (and supermodel) of itself; moreover, if $\mathbf{m}_i$ and $\mathbf{m}_j$ are submodels of one another, then $\mathbf{m}_i$ and $\mathbf{m}_j$ are identical. Most importantly, if two MINUS-formulas related by the submodel relation are not identical, they can be interpreted as describing the same causal structure at different levels of detail. 



\medskip
Before we turn to the \pkg{cna} package, a terminological note is required. In the literature on configurational comparative methods it has become customary to refer to the models produced by the methods as \emph{solution formulas}. To mirror that convention, the \pkg{cna} package refers to atomic MINUS-formulas inferred from data by CNA as \emph{atomic solution formulas}, \emph{\textbf{asf}}, for short, and to complex MINUS-formulas inferred from data as \emph{complex solution formulas}, \emph{\textbf{csf}}. For brevity, we will henceforth mainly use the shorthands \emph{asf} and \emph{csf}.  





\section[The input of CNA]{The input of CNA}

The goal of CNA is to output all \emph{asf} and \emph{csf} within provided bounds of model complexity that fit an input data set relative to provided tuning settings, in particular, fit thresholds.
 The algorithm performing this task in the \pkg{cna} package is implemented in the function \code{cna()}. Its most important arguments are:
\begin{Code}
cna(x, outcome = TRUE, con = 1, cov = 1, maxstep = c(3, 4, 10), 
    measures = c("standard consistency", "standard coverage"), 
    ordering = NULL, strict = FALSE, exclude = character(0), notcols = NULL, 
    what = if (control$suff.only) "m" else "ac", details = FALSE, 
    suff.only = FALSE, acyclic.only = FALSE, cycle.type=c("factor","value"))
\end{Code}
%
This section explains most of these inputs and introduces some auxiliary functions. The arguments \code{what}, \code{acyclic.only}, and \code{cycle.type} will be discussed in section \ref{output}.


\subsection[Data]{Data}



Data $\delta$ processed by CNA have the form of $m\times k$ matrices, where $m$ is the number of units of observation (cases) and $k$ is the number of measured factors. $\delta$ can either be of type ``crisp-set'' ($cs$), ``multi-value'' ($mv$) or ``fuzzy-set'' ($fs$). Data that feature $cs$ factors only are $cs$. If the data contain at least one $mv$ factor, they count as $mv$. Data featuring at least one $fs$ factor are treated as $fs$.\footnote{Note, first, that factors calibrated at crisp-set thresholds may appear with unsuitably extreme values if the data as a whole are treated as $fs$ due to some $fs$ factor, and second, that mixing $mv$ and $fs$ factors in one analysis is not possible.} Examples of each data type are given in Table \ref{tab2}. Raw data collected in a study typically need to be suitably calibrated before they can be fed to \code{cna()}. We do not address the calibration problem here because it is the same for CNA as for QCA, in which context it has been extensively discussed, for example, by \cite{Thiem:2013} or \cite{Schneider:2012}. The \proglang{R} packages \href{https://cran.r-project.org/package=QCApro}{\pkg{QCApro}}, \href{https://cran.r-project.org/package=QCA}{\pkg{QCA}}, and \href{https://cran.r-project.org/package=SetMethods}{\pkg{SetMethods}} provide all tools necessary for data calibration.
\begin{table}[tb]\centering
\subfloat[$cs$ data]{\begin{tabular}{r|cccc}
& $A$ & $B$ & $C$ & $D$  \\\hline
$c_1$ &0 &0 &0& 0  \\
$c_2$ &0 &1 &0& 0  \\
$c_3$ &1 &1 &0& 0  \\
$c_4$ &0 &0 &1& 0  \\
$c_5$ &1 &0 &0& 1  \\
$c_6$ &1 &0 &1& 1  \\
$c_7$ &0 &1 &1& 1  \\
$c_8$ &1 &1 &1& 1 \\  \hline
  
\end{tabular}
}\hspace{.5cm}
\subfloat[$mv$ data]{
\begin{tabular}{r|cccc}
& $A$ & $B$ & $C$ & $D$  \\\hline
$c_1$ &1 &3 &3& 1  \\
$c_2$ &2 &2 &1& 2  \\
$c_3$ &2 &1 &2& 2  \\
$c_4$ &2 &2 &2& 2  \\
$c_5$ &3 &3 &3& 2  \\
$c_6$ &2 &4 &3& 2  \\
$c_7$ &1 &3 &3& 3  \\
$c_8$ &1 &4 &3& 3 \\  \hline
\end{tabular}
}\hspace{.5cm}
\subfloat[$fs$ data]{
\begin{tabular}{r|ccccc}
      & $A$ & $B$ & $C$ & $D$ & $E$      \\\hline 
$c_1$ & 0.37 &0.30 &0.16 &0.06 &0.25 \\
$c_2$ & 0.89 &0.39 &0.64 &0.09 &0.03 \\
$c_3$ & 0.06 &0.61 &0.92 &0.37 &0.15 \\
$c_4$ & 0.65 &0.93 &0.92 &0.18 &0.93 \\
$c_5$ & 0.08 &0.08 &0.12 &0.86 &0.91 \\
$c_6$ & 0.70 &0.02 &0.85 &0.91 &0.97 \\
$c_7$ & 0.04 &0.72 &0.76 &0.90 &0.68 \\
$c_8$ & 0.81 &0.96 &0.89 &0.72 &0.82 \\
  \hline
\end{tabular}%\caption{}
}

\caption{Data types processable by CNA.}\label{tab2}
\end{table}

Calibrated data are given to the \code{cna()} function via the argument \code{x}, which must be a data frame or an object of class ``configTable'' as output by the \code{configTable()} function (see section \ref{configTable} below). The \pkg{cna} package contains a number of example data sets from published studies, \code{d.autonomy}, \code{d.educate}, \code{d.irrigate}, \code{d.jobsecurity}, \code{d.minaret}, \code{d.pacts}, \code{d.pban}, \code{d.performance}, \code{d.volatile}, \code{d.women}, and one simulated data set, \code{d.highdim}. For details on their contents and sources, see the \pkg{cna} \href{https://cran.r-project.org/web/packages/cna/cna.pdf}{reference manual}. After having loaded the \pkg{cna} package, 
all of these data sets are available for processing:
<<data examples, results=hide , message = FALSE, warning=FALSE>>=
library(cna)
cna(d.educate)
cna(d.women)
@
%
%Prior to version 3.2 of the \pkg{cna} package, \code{cna()} needed be told explicitly what type of data \code{x} contains using the \code{type} argument. Now, \code{type} has the default value \code{"auto"} inducing automatic detection of the data type. The \code{type} argument remains in the package for backwards compatibility and in order to allow the user to specify the data type manually: \code{type = "cs"} stands for $cs$ data, \code{type = "mv"} for $mv$ data, and \code{type = "fs"} for $fs$ data.\footnote{The corresponding shortcut functions \code{cscna()}, \code{mvcna()}, and \code{fscna()} also remain available; see \code{?shortcuts}.} %  though: crisp-set data is specified by \code{type = "cs"}, multi-value data by \code{type = "mv"}, and fuzzy-set  If the data are not of type $cs$, \code{cna()} must be told explicitly what type of data \code{x} contains using the \code{type} argument, which takes the values \code{"mv"} for $mv$ data and \code{"fs"} for $fs$ data. The functions \code{mvcna(x, ...)} and \code{fscna(x, ...)} are available as shorthands for \code{cna(x, type = "mv", ...)} and \code{cna(x, type = "fs", ...)}, respectively.
% # <<data type, eval=F>>=
% # cna(d.jobsecurity)
% # cna(d.pban) 
% # @


\subsubsection[Configuration tables]{Configuration tables}\label{configTable}

To facilitate the reviewing of data, the function \code{configTable(x, case.cutoff = 0)}  assembles cases with identical configurations in one row of a so-called \emph{configuration table}. A configuration table is not to be confused with what is called a \emph{truth table} in QCA. While a QCA truth table indicates for every configuration of all exogenous factors (i.e.\ for every minterm) whether it is sufficient for the outcome, a CNA configuration table does not express relations of sufficiency but simply provides a compact representation of the data that lists all configurations exactly once and adds a column indicating how many instances (cases) of each configuration are contained in the data. 

The first argument \code{x} of \code{configTable()} is a data frame or matrix. The function then merges multiple rows of \code{x} featuring the same configuration into one row, such that each row of the resulting table corresponds to one determinate configuration of the factors in \code{x}. The number of occurrences of a configuration and an enumeration of the cases instantiating it are saved as attributes ``n'' and ``cases'', respectively. %The argument \code{type} is the same as in the \code{cna()} function; it specifies the data type and takes the default value \code{"auto"} inducing automatic data type detection.\enlargethispage{0.7cm}
%When not applied to $cs$ data, the data type must be specified with the \code{type} argument. Alternatively, the shorthand functions \code{csct(x)}, \code{mvct(x)} and \code{fsct(x)} are available.
<<configTable1, eval=T>>=
configTable(d.women)
@
%
The second argument \code{case.cutoff} allows for setting a minimum frequency cutoff determining that configurations with less instances in the data are not included in the configuration table and the ensuing analysis. For instance, \code{configTable(x, case.cutoff = 3)} entails that configurations that are instantiated in less than 3 cases are excluded.

Configuration tables produced by \code{configTable()} can be directly passed to \code{cna()}. Moreover, as configuration tables generated by \code{configTable()} are objects that are very particular to the \pkg{cna} package, the function \code{ct2df()} is available to transform configuration tables back into ordinary \proglang{R} data frames.
<<configTable2, eval = F>>=
pact.ct <- configTable(d.pacts, case.cutoff = 2)
ct2df(pact.ct)
@


\subsubsection[Data simulations]{Data simulations}\label{simul}

The \pkg{cna} package provides extensive functionalities for data simulations---which, in turn, are essential for inverse search trials that benchmark CNA's output (see section \ref{bench}). In a nutshell, the functions \code{allCombs()} and \code{full.ct()} generate the space of all logically possible configurations over a given set of factors, \code{selectCases()} selects, from this space, the configurations that are compatible with a data-generating causal structure, which, in turn, can be randomly drawn by \code{randomAsf()} and \code{randomCsf()}, \code{makeFuzzy()} fuzzifies that data, and \code{some()} randomly selects cases, for instance, to produce data fragmentation. 

More specifically, \code{allCombs(x)} takes an integer vector \code{x} as input and generates a configuration table featuring all possible value configurations of \code{length(x)} factors---the first factor having \code{x[1]} values, the second \code{x[2]} values etc. The factors are labeled using capital letters in alphabetical order. Analogously, but more flexibly, \code{full.ct(x)} generates a configuration table with all logically possible value configurations of the factors defined in the input \code{x}, which can be a configuration table, a data frame, an integer, a list specifying the factors' value ranges, or a character vector featuring all admissible factor values.
<<simul1, eval = F>>=
allCombs(c(2, 2, 2)) - 1 
allCombs(c(3, 4, 5))
full.ct("A + B*c")
full.ct(6)
full.ct(list(A = 1:2, B = 0:1, C = 1:4)) 
@
%
The input of \code{selectCases(cond, x)} is a character string \code{cond} specifying a Boolean function, which typically (but not necessarily) expresses a data-generating MINUS structure, as well as, optionally, a data frame or configuration table \code{x}. If \code{x} is specified, the function selects the cases that are compatible with \code{cond} from \code{x}; if \code{x} is not specified, it selects from \code{full.ct(cond)}. It is possible to randomly draw \code{cond} using \code{randomAsf(x)} or \code{randomCsf(x)}, which generate random atomic and complex solution (i.e.\ MINUS-)formulas, respectively, from a data frame or configuration table \code{x}.
<<simul1, eval = F>>=
dat1 <- allCombs(c(2, 2, 2)) - 1 
selectCases("A + B <-> C", dat1)
selectCases("(h*F + B*C*k + T*r <-> G)*(A*b + H*I*K <-> E)")
target <- randomCsf(full.ct(6))
selectCases(target)
@
%
% The closely related function \code{selectCases1(cond, x, con = 1, cov = 1)} additionally allows for providing consistency (\code{con}) and coverage (\code{cov}) thresholds (see section \ref{cons}), such that some cases that are incompatible with \code{cond} are also selected, as long as \code{cond} still meets \code{con} and \code{cov} in the resulting data. Thereby, measurement error or noise can be simulated in a manner that allows for controlling the degree of case incompatibilities.
% <<simul2, eval= F>>=
% dat2 <- full.ct(list(EN = 0:2, TE = 0:4, RU = 1:4)) 
% selectCases1("EN=1*TE=3 + EN=2*TE=0 <-> RU=2", dat2, con = .75, cov = .75)
% @
%

\code{makeFuzzy(x, fuzzvalues = c(0, 0.05, 0.1))} simulates fuzzy-set data by transforming a data frame or configuration table \code{x} consisting of $cs$ factors into an $fs$ configuration table. To this end, the function adds values selected at random from the argument \code{fuzzvalues} to the 0's and subtracts them from the 1's in \code{x}. \code{fuzzvalues} is a numeric vector of values from the interval [0,1].
<<simul3, eval= F>>=
makeFuzzy(selectCases("Hunger + Heat <-> Run"), 
   fuzzvalues = seq(0, 0.4, 0.05))
@
%
Finally, \code{some(x, n = 10, replace = TRUE)} randomly selects \code{n} cases from a data frame or configuration table \code{x}, with or without replacement. If \code{x} features all configurations that are compatible with a data-generating structure and \code{n < nrow(x)}, the data frame or configuration table issued by \code{some()} is fragmented, meaning it does not contain all empirically possible configurations. If \code{n > nrow(x)}, data of large sample sizes can be generated featuring multiple instances of the empirically possible configurations. 
<<simul4, eval= F>>=
dat3 <- allCombs(c(3, 4, 5))
dat4 <- selectCases("A=1*B=3 + A=3 <-> C=2", configTable(dat3))
some(dat4, n = 10, replace = FALSE)
some(dat4, n = 1000, replace = TRUE)
@


\subsection{Evaluating sufficiency and necessity} \label{cons}
\nopagebreak
As we have seen in section \ref{inference}, real-life data tend to be affected by fragmentation and noise, in the presence of which strictly sufficient or necessary conditions for an outcome often do not exist. Accordingly, methods for MINUS discovery have to carefully evaluate what inferences on sufficiency and necessity relations the available evidence warrants. In particular, it must be assessed whether the nonexistence of strictly sufficient or necessary conditions is more likely due to deficiencies in data collection and lack of control over background influences, or whether it is indicative of the absence of causal relations. The former holds if, and only if, strict sufficiency and necessity would have been observed had the  data been ideal, that is, non-fragmented and noise-free. Various measures are available to help evaluate whether that hypothetical holds or not. As the implication operator underlying the notions of sufficiency and necessity is defined differently in classical and fuzzy logic, the evaluation measures for $cs$ and $mv$ data have different formal definitions than the measures for $fs$ data. But since they are very closely related conceptually, the measures have identical names in the $cs$/$mv$ and $fs$ contexts. Their names reflect the fact that they have all evolved from \emph{consistency} and \emph{coverage}, which \citet{ragin2006} imported into the QCA protocol and which have proven serviceable to the purposes of CNA as well. This section first presents the variants of consistency and coverage suitable for $cs$ and $mv$ data, then turns to the $fs$ case, and finally discusses the choice of evaluation measures and the setting of corresponding thresholds.


\subsubsection{Crisp-set and multi-value data}

To introduce the relevant evaluation measures for $cs$ and $mv$ data, we use $\Phi$ as a placeholder for a Boolean expression in DNF (see fn.\ \ref{fnDNF}), for example, $A$, $A\att C$, or $A\id 2\att C\id 3 \,+\, A\id 1\att C\id 1$, while $\phi$ represents the negation of that DNF. Analogously, $Y$ and $y$ shall be placeholders for single factor values and their negations, for example, $A$ and $a$. We call $\Phi$ the \emph{antecedent} whose sufficiency or necessity for the \emph{outcome} $Y$ is to be assessed. %of an implication $\Phi\rightarrow Y$, and $Y$ is the \emph{consequent}.
Moreover, we use cardinality bars $|\ldots |$ to refer to the number of cases in the analyzed data $\delta$ satisfying the enclosed expression. For example, $|\Phi \att Y|$ designates the number of cases in $\delta$ instantiating $\Phi \att Y$ (i.e.\ both the antecedent and the outcome).

In versions of \pkg{cna} $<4.0$, the only implemented sufficiency measure was \emph{consistency}, which we give in two equivalent forms here---the first being the most commonly used form and the second being the form that renders the measure's penalty term ($|\Phi \att y|$) explicit.\footnote{The evaluation measures discussed in this section contain arithmetic sums. We symbolize sums with script-style ``$\scriptstyle +$'', as opposed to the ``$+$'' used for Boolean disjunction. }
\begin{align*}
  consistency(\Phi,Y) = \frac{|\Phi \att Y|}{|\Phi|} = \frac{|\Phi \att Y|}{|\Phi \att Y| \plus |\Phi \att y|} %\label{con}
\end{align*}
Consistency is formally equivalent to what is known as \emph{positive predictive value} or \emph{precision} in various fields of machine learning. It considers all cases in $\delta$ featuring $\Phi$ and measures the proportion of them that satisfy $\Phi \rightarrow Y$, which are those that instantiate $Y$ in addition to $\Phi$. It penalizes the cases with $\Phi$ that violate $\Phi \rightarrow Y$, that is, 
cases with $\Phi \att y$. 

Even though this penalization schema makes good intuitive sense, consistency has two distinctive weaknesses (see \citealt{DeSouter2024}). First, as cases exhibiting $Y$ cannot violate $\Phi \rightarrow Y$, consistency tends to be high in data $\delta$ with a high proportion of cases with $Y$, meaning with high outcome prevalence. Even if $\Phi$ and $Y$ are entirely independent in $\delta$, the consistency of $\Phi$ for $Y$ is equal to the prevalence of $Y$ \citep{newMeasures}. In other words, when the prevalence of $Y$ is high, consistency is high for every (arbitrary) $\Phi$. But of course, the mere fact that most cases in $\delta$ instantiate $Y$ is not evidence in favor of every $\Phi$ being sufficient for  $Y$ in the (hypothetical) ideal version of $\delta$. Second, if cases with $Y$ are rare, there are only few cases that could possibly corroborate that $\Phi\rightarrow Y$ holds (i.e.\ the few cases with $Y$), and if some of them are affected by noise, consistency plummets. In consequence, the chances that consistency can detect sufficiency satisfaction are low. Overall, consistency is too lenient when prevalence is high and it is overly strict when prevalence is low.

To address these weaknesses, \pkg{cna} $4.0$ makes three new sufficiency measures available: \emph{preva\-lence-adjusted consistency} (PA-consistency), \emph{contrapositive consistency} (C-consistency), and \emph{antecedent-adjusted C-consistency} (AAC-consistency).
\begin{align*}PA\text{-}consistency(\Phi, Y) &= \frac{|\Phi \att Y|}{|\Phi \att Y| \plus \frac{|Y|}{|y|} \cdot |\Phi \att y|} \\[.3cm]
  C\text{-}consistency(\Phi, Y) &=\frac{|\phi \att y|}{|\phi \att y| \plus  |\Phi \att y|} \\[.3cm]
  AAC\text{-}consistency(\Phi, Y) &= \frac{|\phi \att y|}{|\phi \att y| \plus \frac{|\phi|}{|\Phi|} \cdot |\Phi \att y|} %\label{AACon}
\end{align*}
PA-consistency is a variant of consistency that is equivalent to calibrated precision as proposed by  \cite{siblini_master_2020}. It differs from consistency by the weight $\frac{|Y|}{|y|}$ attached to the penalty term
$|\Phi \att y|$ in the denominator. When $|Y| = |y|$, this weight is $1$, to the effect that PA-consistency and consistency are equal. However, the weight increases as $|Y|$ increases relative to $|y|$ and it decreases as $|Y|$ decreases relative to $|y|$. As a result, $|\Phi \att y|$ is penalized more strongly when prevalence is high and less strongly when prevalence is low. 

C-con\-sist\-ency is equivalent to the measure of \emph{specificity} used in machine learning. 
Its use as sufficiency measure in CNA is based on the rule of contraposition, which states that $\Phi \rightarrow Y$ is logically equivalent to $y \rightarrow \phi$. In order for $\Phi$ to be sufficient for $Y$, the cases with $y$ must exhibit $\phi$. Accordingly, C-consistency
penalizes the cases with $y$ exhibiting $\Phi$, which are the same cases penalized by consistency and exactly the cases violating  $\Phi\rightarrow Y$ and $y \rightarrow \phi$. \cite{DeSouter2024} has shown that a combined use of consistency and C-consistency can increase model quality substantively. Still, even though C-consistency does not suffer from the weaknesses of consistency, it has its own limitations. It is too lenient when $\Phi$ is infrequent in $\delta$, and it is overly strict when $\Phi$ is frequent (see \citealt{newMeasures} for details).

AAC-consistency addresses the limitations of C-consistency by adjusting the penalty term $|\Phi \att y|$ in the denominator of C-consistency by the weight $\frac{|\phi|}{|\Phi|}$. If $|\phi| = |\Phi|$, this weight is $1$, rendering  AAC-consistency and C-consistency equal.  However, the weight increases as $|\phi|$ increases relative to $|\Phi|$ and it decreases as $|\phi|$ decreases relative to $|\Phi|$, yielding that $|\Phi \att y|$ is penalized more strongly when $\Phi$ is infrequent and less strongly when $\Phi$ is frequent.


% It is calculated by dividing the number of cases instantiating $\Phi \att Y$ by the sum of these cases and the number of cases instantiating $\Phi \att y$ multiplied by the ratio of cases instantiating $Y$ to cases instantiating $y$. This ratio is the prevalence of $Y$ in $\delta$. PC-consistency is a more stringent measure than consistency, as it penalizes cases with $\Phi$ that violate $\Phi \rightarrow Y$ more heavily when $Y$ is rare.

The only necessity measure implemented in versions of \pkg{cna} $<4.0$ was \emph{coverage}, which, for reasons of transparency, we also write in two equivalent forms here.
\begin{align*}
  coverage(\Phi, Y) = \frac{|\Phi \att Y|}{|Y|} = \frac{|\Phi \att Y|}{|\Phi \att Y| \plus |\phi \att Y|} 
\end{align*}
Coverage is equivalent to what is labelled \emph{sensitivity} or \emph{recall} in machine learning. It considers all cases in $\delta$ featuring $Y$ and measures the proportion of them that comply with the necessity of $\Phi$ for $Y$ (i.e.\ $Y \rightarrow \Phi$), which are those that instantiate both $\Phi$ and $Y$, and it penalizes the cases exhibiting $Y$ without $\Phi$, that is, the cases with $\phi \att Y$.

Despite following a sensible penalization scheme for necessity relations, coverage has weaknesses analogous to those of consistency \citep{DeSouter2024}. On the one hand, it tends to be high in data $\delta$ with a high proportion of cases featuring the antecedent $\Phi$. %Even if $\Phi$ and $Y$ are entirely independent, the coverage of $\Phi \rightarrow Y$ is equal to the proportion of cases featuring $\Phi$ \citep{newMeasures}. 
Any (arbitrary) $\Phi$ that is independent of $Y$ scores high on coverage, as long as it is given in a high proportion of cases in $\delta$ \citep{newMeasures}. 
Plainly, though, that most cases instantiate $\Phi$ is not evidence in favor of $\Phi$ being necessary for $Y$ in the (hypothetical) ideal version of $\delta$. On the other hand, coverage is greatly susceptible to noise when the proportion of cases with the antecedent is low. As it penalizes cases with $\phi \att Y$ in proportion to cases with $\Phi \att Y$, only a few noisy cases with $\phi \att Y$ pull down coverage significantly, irrespective of whether $\Phi$ is actually a cause of $Y$. So, coverage is too lenient when the proportion of $\Phi$ is high and too strict when that proportion is low. %, making it an unreliable evaluation measure for the necessity of $\Phi$ for $Y$ when $|\Phi|\big/N$ is at high or low extremes.

Version $4.0$ of \pkg{cna} provides three new necessity measures: \emph{antecedent-adjusted coverage} (AA-coverage), \emph{contrapositive coverage} (C-coverage), and \emph{prevalence-adjusted C-coverage} (PAC-coverage).
\begin{align*}
AA\text{-}coverage(\Phi, Y) &= \frac{|\Phi \att Y|}{|\Phi \att Y| \plus \frac{|\Phi|}{|\phi|} \cdot |\phi \att Y|} \\[.3cm]
  C\text{-}coverage(\Phi, Y) &= \frac{|\phi \att y|}{|\phi \att y| \plus |\phi \att Y|}\\[.3cm]
PAC\text{-}coverage(\Phi, Y) &= \frac{|\phi \att y|}{|\phi \att y| \plus \frac{|y|}{|Y|} \cdot |\phi \att Y|} 
\end{align*}
As regular coverage is too lenient when the proportion of $\Phi$ is high and too strict when that proportion is low, AA-coverage adds the weight $\frac{|\Phi|}{|\phi|}$ to the penalty $|\phi \att Y|$. This makes AA-coverage stricter than coverage when the antecedent proportion is high and more lenient than coverage when it is low, thereby mitigating the limitations of coverage.

C-coverage is equivalent to \emph{negative predictive value} (NPV). Its use as necessity measure is justified by the rule of contraposition, which guarantees that $\Phi$ is necessary for $Y$ if, and only if, $y$ is necessary for $\phi$. Both necessity relations are violated by cases with $\phi \att Y$ (and by no others). Correspondingly, C-coverage penalizes the cases exhibiting $\phi\att Y$ by measuring the proportion of cases with $\phi$ that instantiate $y$. It does not suffer from the limitations of coverage, but it has its own weaknesses. It is too lenient when prevalence is low, and it is too strict when prevalence is high \citep{newMeasures}. 

To alleviate these problems of C-coverage, PAC-coverage adjusts for prevalence by adding the weight $ \frac{|y|}{|Y|}$ to the penalty term $|\phi \att Y|$. This weight ensures that, when prevalence is low, cases with $\phi \att Y$ are penalized more strongly, and less strongly when prevalence is high.






\subsubsection{Fuzzy-set data}

Contrary to the $cs/mv$ measures, the $fs$ measures are not defined in terms of case cardinalities but based on sums over membership scores in the data $\delta$---for example, $\sum\nolimits_{i=1}^n \min(\Phi_i,Y_i)$, which is the sum over $\min(\Phi_i,Y_i)$\footnote{The minimum function is the $fs$ rendering of conjunction, see p.\ \pageref{fs.render} above.} in all $n$ cases of $\delta$. For convenience, we will shorten sums by dropping running indices, meaning that $\sum\nolimits_{i=1}^n \min(\Phi_i,Y_i)$ becomes $\sum\min(\Phi,Y)$. To maximize transparency with respect to penalization, we again provide two equivalent forms for the $fs$ versions of the standard sufficiency and necessity measures:
\begin{align*}
  consistency(\Phi, Y) &= \frac{\sum\min(\Phi,Y)}{\sum\Phi}\\ &=  \frac{\sum \min(\Phi,Y)}{\sum (\min(\Phi,Y)\plus  \min(\Phi,y) - \min(\Phi,\phi,Y,y))} \\[.4cm]
   coverage(\Phi, Y) &= \frac{\sum \min(\Phi,Y)}{\sum Y}\\ &=  \frac{\sum\min(\Phi,Y)}{\sum (\min(\Phi,Y)\plus  \min(\phi,Y) - \min(\Phi,\phi,Y,y))}
\end{align*}
Subtracting $\min(\Phi,\phi,Y,y)$ from the penalty terms in the denominators of consistency and coverage is needed to correct for the double-counting due to cases with non-zero membership scores in all of $\Phi$, $\phi$, $Y$, and $y$ (for more details see \citealt{DeSouter2024}, 24). Such corrections are not necessary in $cs$ and $mv$ measures, as values of $cs$ and $mv$ factors are mutually exclusive, meaning that $\min(\Phi,\phi,Y,y)$ is always $0$.


The limitations of standard consistency and coverage for sufficiency and necessity evaluation in $fs$ data are analogous to those in $cs$ and $mv$ data. To address them, version 4.0 of \pkg{cna} provides $fs$ variants of PA-consistency, C-consistency, AAC-consistency, AA-coverage, C-coverage, and PAC-coverage. The $fs$ measures are defined as follows: %, where $k$, $0\leq k\leq 1$, is a hyperparameter (to be set by the user) that allows for weighing the double-counting corrections in the adjustment weights:
% VERSION WITH k
% \begin{align*}
%   PA\text{-}con(\Phi\rightarrow Y) &=  \frac{\sum \min(\Phi,Y)}{\sum \min(\Phi,Y) \plus  \frac{\sum Y + k\cdot \sum\min(\Phi,\phi,Y,y)}{\sum y-  k\cdot \sum \min(\Phi,\phi,Y,y)} \cdot \sum (\min(\Phi,y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
% C\text{-}con(\Phi\rightarrow Y)  &=  \frac{\sum \min(\phi,y)}{\sum (\min(\phi,y)\plus  \min(\Phi,y) - \min(\Phi,\phi,Y,y))} \end{align*}
% \begin{align*}
% AAC\text{-}con(\Phi\rightarrow Y) &=  \frac{\sum \min(\phi,y)}{\sum \min(\phi,y) \plus  \frac{\sum \phi + k\cdot \sum\min(\Phi,\phi,Y,y)}{\sum \Phi -  k\cdot \sum \min(\Phi,\phi,Y,y)} \cdot \sum (\min(\Phi,y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
% AA\text{-}cov(\Phi\rightarrow Y) &=  \frac{\sum \min(\Phi,Y)}{\sum \min(\Phi,Y) \plus  \frac{\sum \Phi + k\cdot \sum\min(\Phi,\phi,Y,y)}{\sum \phi -  k\cdot \sum \min(\Phi,\phi,Y,y)} \cdot \sum (\min(\phi,Y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
%  C\text{-}cov(\Phi\rightarrow Y) &=  \frac{\sum\min(\phi,y)}{\sum (\min(\phi,y)\plus  \min(\phi,Y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
%  PAC\text{-}cov(\Phi\rightarrow Y) &=  \frac{\sum \min(\phi,y)}{\sum \min(\phi,y) \plus  \frac{\sum y + k\cdot \sum\min(\Phi,\phi,Y,y)}{\sum Y -  k\cdot \sum \min(\Phi,\phi,Y,y)} \cdot \sum (\min(\phi,Y) - \min(\Phi,\phi,Y,y))}
% \end{align*}
% VERSION WITHOUT k
\begin{align*}
  PA\text{-}consistency(\Phi, Y) &=  \frac{\sum \min(\Phi,Y)}{\sum \min(\Phi,Y) \plus  \frac{\sum Y}{\sum y} \cdot \sum (\min(\Phi,y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
C\text{-}consistency(\Phi, Y)  &=  \frac{\sum \min(\phi,y)}{\sum (\min(\phi,y)\plus  \min(\Phi,y) - \min(\Phi,\phi,Y,y))} \end{align*}
\begin{align*}
AAC\text{-}consistency(\Phi, Y) &=  \frac{\sum \min(\phi,y)}{\sum \min(\phi,y) \plus  \frac{\sum \phi }{\sum \Phi} \cdot \sum (\min(\Phi,y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
AA\text{-}coverage(\Phi, Y) &=  \frac{\sum \min(\Phi,Y)}{\sum \min(\Phi,Y) \plus  \frac{\sum \Phi }{\sum \phi} \cdot \sum (\min(\phi,Y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
 C\text{-}coverage(\Phi, Y) &=  \frac{\sum\min(\phi,y)}{\sum (\min(\phi,y)\plus  \min(\phi,Y) - \min(\Phi,\phi,Y,y))} \\[.3cm]
 PAC\text{-}coverage(\Phi, Y) &=  \frac{\sum \min(\phi,y)}{\sum \min(\phi,y) \plus  \frac{\sum y }{\sum Y} \cdot \sum (\min(\phi,Y) - \min(\Phi,\phi,Y,y))} 
\end{align*}

% wcon(X,Y,f)=SXY/(SXY + (SY+kâ‹…SXxYy)/(Sy-kâ‹…SXxYy)  â‹…(SXy - SXxYy) )

\subsubsection{Choosing measures and setting thresholds}

Sufficiency and necessity measures play a twofold role in CNA: on the one hand, they are key in CNAâ€™s model-building algorithm (see section \ref{algo}), and on the other, they are used to select among multiple model candidates output by that algorithm (section \ref{ambigu}). When the analyzed data are noisy and/or fragmented, strictly sufficient and necessary conditions may not exist for scrutinized outcomes. But associations that score reasonably high on chosen sufficiency and necessity measures may be acceptable as sufficiency or necessity relations nonetheless, on the grounds that they would have been relations of strict sufficiency or necessity had the data been ideal. By lowering the thresholds for associations to pass the sufficiency or necessity test and accepting the concomitant error risk, it becomes possible to extract causal information from data despite the presence of noise and fragmentation.

The argument \code{measures} of the \code{cna()} function selects the sufficiency and the necessity measure to be employed for model-building. It expects a vector of length two as input, where the first value specifies the variant of consistency to be used as sufficiency measure and the second the variant of coverage to be used as necessity measure. The measures can be identified by their full names or by their aliases from the following list.
<<showMeasures>>=
showConCovMeasures()
@
The default is \code{measures = c("standard consistency", "standard coverage")}, which\linebreak can be abbreviated as \code{measures = c("scon", "scov")}. For ideal data, all combinations of sufficiency and necessity measures yield the same output (at maximal thresholds), meaning that any combination is as good as any other. However, for noisy and fragmented data, different combinations may produce different models. These differences tend to increase with the degree at which the value distributions of the factors in the data are imbalanced. Which measure combinations to use for which data scenarios is a matter of ongoing research. The results of \cite{newMeasures} show that the measure combination producing the most reliable models for $cs$ data is \emph{PA-consistency} for sufficiency evaluation and \emph{PAC-coverage} for necessity evaluation, that is, \code{measures = c("PAcon", "PACcov")}. Their results also show that when the prevalence of the outcome is balanced in a mid-range (i.e.\ above 0.3 and below 0.7), all measure combinations perform similarly well. Unfortunately, no analogous studies have been conducted for $mv$ and $fs$ data yet. Hence, for these data types, the default measure combination \code{measures = c("scon", "scov")} is recommended at present. Still, for all data types and scenarios, different measure combinations can be used for cross-validation purposes. \cite{DeSouter2024} has shown that models scoring similarly high on consistency and C-consistency as well as on coverage and C-coverage are more likely to be correct than models scoring high on only consistency and coverage.

The \code{details} argument can be used to obtain the scores of the returned \emph{asf} and \emph{csf} on any of the available evaluation measures, irrespective of whether they are used for model-building or not. The argument expects a character vector specifying the measures of interest, which can be identified by their full names or by their aliases.\label{details_first}
<<showDetails,eval= F>>=
cna(d.women, measures = c("PAcon", "PACcov"), details = c("scon", "scov",
   "ccon", "ccov", "AAcov", "AACcon"))
@
The function behind the \code{details} argument is also available as stand-alone function \linebreak \code{detailMeasures()}, which is explained in the \pkg{cna} \href{https://cran.r-project.org/web/packages/cna/cna.pdf}{reference manual}.

Thresholds (from the interval $[0,1]$) for the chosen variants of consistency and coverage can be given to the \code{cna()} function using the arguments \code{con} and \code{cov}. The \code{con} argument sets the consistency (or sufficiency) threshold for minimally sufficient conditions (\emph{msc}), for \emph{asf} and \emph{csf}, while \code{cov} sets the coverage (or necessity) threshold for \emph{asf} and \emph{csf} (no coverage threshold is imposed on \emph{msc}). %As illustrated on pp.\ 19-20 of the \pkg{cna} \href{https://cran.r-project.org/web/packages/cna/cna.pdf}{reference manual}, setting different consistency thresholds for \emph{msc} and \emph{asf/csf} can enhance the informativeness of \code{cna()}'s output in certain cases but is non-standard. The standard setting is \code{con = con}. 
The default numeric value for all thresholds is 1, %\code{con = con = cov = 1}, 
which corresponds to strict Boolean sufficiency and necessity. Contrary to QCA, which often returns solutions that do not comply with the chosen consistency threshold and which does not impose a coverage threshold at all, CNA uses the selected sufficiency and necessity measures as authoritative model building criteria such that, if they are not met, CNA abstains from issuing solutions. That means, if the default thresholds are used, \code{cna()} will only output strictly sufficient \emph{msc}, \emph{asf}, and \emph{csf} and only strictly necessary \emph{asf} and \emph{csf}. 

If the data are noisy, the default thresholds will typically not yield any solution formulas. In such cases, \code{con} and \code{cov} may be suitably lowered. By lowering \code{con} below 1 in a $cs$ analysis, \code{cna()} is given permission to treat $\Phi$ as sufficient for $Y$, even though there are some cases with $\Phi\att y$ in the data. Or by lowering \code{cov} in an $fs$ analysis, \code{cna()} is allowed to treat $\Phi$ as necessary for $Y$, even though some cases have higher membership scores on $Y$ than on $\Phi$, meaning that the sum of the membership scores in $Y$ over all cases in the data exceeds the sum of the membership scores in $\min(\Phi,Y)$. 

Determining the optimal values to which \code{con} and \code{cov} should be lowered in a specific discovery context is a delicate task. On the one hand, CNA faces a severe overfitting risk when the data contain configurations incompatible with the data-generating structure, meaning that \code{con} and \code{cov} must not be set too high (i.e.\ too close to 1). On the other hand, the lower \code{con} and \code{cov} are set, the less complex and informative CNA's output will be, that is, the more CNA's purpose of uncovering causal complexity will be undermined. To find a suitable balance between over- and underfitting, \cite{Parkkinen2023} systematically re-analyze the data at all \code{con} and \code{cov} settings in the interval $[0.7,1]$, collect all solutions resulting from such a re-analysis series in a set $\mathbf{M}$, and select the solution formulas with the most sub- and supermodels in $\mathbf{M}$. These are the solutions with the highest \emph{overlap in causal ascriptions} with the other solutions in $\mathbf{M}$. They are the most \emph{robust} solutions inferable from the data. This approach to robustness scoring is implemented in the function \code{frscored_cna(x, fit.range, granularity)} of the \proglang{R} package \href{https://cran.r-project.org/package=frscore}{\pkg{frscore}} \citep{frscore-pkg}.\footnote{In addition, the \proglang{R} package \href{https://cran.r-project.org/package=cnaOpt}{\textbf{cnaOpt}} \citep{cnaOptRef} provides functions for finding the maximal consistency and coverage scores obtainable from a given data set and for identifying models reaching those scores. For a discussion of possible applications of maximal scores see \citep{optimize}.} The function accepts all arguments of \code{cna()}, except for \code{con} and \code{cov}. It analyzes the data \code{x} with the sufficiency and necessity measures selected by \code{measures} at all threshold combinations in the interval \code{fit.range} with increments specified by \code{granularity}, and it scores the resulting models based on their robustness. 
<<frscore, eval=F>>=
library(frscore)
frscored_cna(d.pban, fit.range = c(1, 0.8), granularity = 0.1)
@


If the analyst does not want to conduct a whole robustness analysis, reasonable non-perfect threshold settings are \code{con} $=$ \code{cov} $=0.8$ or $0.75$. 
To illustrate, \code{cna()} does not build solutions for the $fs$ data named \code{d.jobsecurity} at the following \code{con} and \code{cov} thresholds (the argument \code{outcome} is explained in section \ref{order} below): 
<<cons1, eval=F>>=
cna(d.jobsecurity, outcome = "JSR", con = 1, cov = 1)
cna(d.jobsecurity, outcome = "JSR", con = .9, cov = .9)
@
But if \code{con} and \code{cov} are set to $0.75$, 20 solutions are returned with the default \code{measures} and 17 solutions with \code{measures = c("PAcon", "PACcov")}---the latter set of solutions being a subset of the former: 
<<cons2, eval=F>>=
cna(d.jobsecurity, outcome = "JSR", con = .75, cov = .75)
cna(d.jobsecurity, outcome = "JSR", con = .75, cov = .75,
   measures = c("PAcon", "PACcov"))
@
In the presence of noise, it is generally advisable to vary the \code{con} and \code{cov} settings to some degree to assess how sensitive the model space is to changes in tuning parameters and to evaluate the overlap in causal ascriptions across different solutions.  
Less complex solutions are generally preferable over more complex ones, and solutions with more overlap are preferable over solutions with less overlap. 
If the sufficiency and necessity scores of resulting solutions can be increased by raising the \code{con} and \code{cov} settings without, at the same time, disproportionately increasing the solutions' complexity, solutions with higher fit are preferable over solutions with lower fit. But if an increase in fit comes with a substantive increase in model complexity, less complex models with lower fit are to be preferred (to avoid overfitting).




\subsection[Outcome, ordering, and exclude]{Outcome, ordering, and exclude}\label{order}
\nopagebreak
In principle, the \code{cna()} function does not need to be told which factors in the data \code{x} have values that are endogenous (i.e.\ are outcomes). % and which ones have exogenous values (i.e.\ causes). 
It attempts to infer that from \code{x}. However, in ordinary research contexts, analysts do not start from scratch. They often possess some prior theoretical or causal knowledge about an investigated structure that allows them to, for example, identify potential outcomes or to exclude certain causal relationships. The \code{cna()} function provides three arguments---\code{outcome}, \code{ordering}, and \code{exclude}---through which prior causal information can be supplied to efficiently constrain the search space.  This section introduces these arguments and their interplay.

Prior knowledge about which factors have values that can figure as outcomes can be given to \code{cna()} via the argument \code{outcome}, which takes as input a character vector specifying one or several factor values that are to be considered as potential outcome(s). For $cs$ and $fs$ data, factor values are expressed by upper and lower cases (e.g.~\code{outcome = c("A", "b")}), for $mv$ data, they are expressed by the ``Factor=value'' notation (e.g.\ \code{outcome = c("A=1","B=3")}). The default is \code{outcome = TRUE}, which means that values of all factor in \code{x} are potential outcomes. For example, the following function call determines that of all 9 factors in \code{d.volatile}, only $VO2$, i.e.\ VO2 taking the value 1, is a potential outcome, meaning that \code{cna()} does not attempt to model values of any other factors as outcomes:
<<outcome1, eval=F>>=
cna(d.volatile, outcome = "VO2")
@

When the data \code{x} contain multiple potential outcomes, it may moreover be known that these outcomes are causally ordered in a certain way, to the effect that some of them are causally upstream of the others. Such information can be given to CNA via a \emph{causal ordering}, which is a relation $\text{A}\prec \text{C}$  (defined on the factors in \code{x}) entailing that values of C cannot cause values of A (e.g.\ because instances of A occur temporally before instances of C). That is, an ordering excludes certain causal dependencies but does not stipulate any. The corresponding argument, \code{ordering}, takes as value a character string. For example, \code{ordering = "A, B < C"} determines that factor C is causally located after A and B, meaning that values of C are not potential causes of values of A and B. The latter are located on the same level of the ordering, for A and B are unrelated by $\prec$, whereas C is located on a level that is downstream of the $\text{A},\text{B}$-level. If an ordering is provided, \code{cna()} only searches for MINUS-formulas in accordance with the ordering. 
%if no ordering is provided, \code{cna()} treats values of all factors in \code{x} as potential outcomes and explores whether a MINUS-formula for them can be inferred.
An ordering does not need to explicitly mention all factors in \code{x}. If only a subset of the factors are assigned to \code{ordering}, the non-included factors are entailed to be upstream of the included ones. Hence, \code{ordering = "C"} means that C is located downstream of all other factors in \code{x}. 

To further specify the causal ordering, the logical argument \code{strict} is available. It determines whether the elements of one level in an ordering can be causally related or not. For example, if \code{ordering = "A, B < C"} and \code{strict = TRUE}, then values of A and B are excluded to be causally related and \code{cna()} skips corresponding tests. By contrast, if \code{ordering = "A, B < C"} and \code{strict = FALSE}, then \code{cna()} also searches for dependencies among values of A and B. 

Let us illustrate this with the data \code{d.autonomy}. Relative to the following function call, which stipulates that values of AU cannot cause values of EM, SP, and CO and that the latter factors are not mutually causally related, \code{cna()} infers that $SP$ is causally relevant to $AU$:\footnote{The function \code{csf()} used in the following code builds the \emph{csf} from a \code{cna()} solution object; see section \ref{what}.}
<<ordering1, message = FALSE, warning=FALSE>>=
dat.aut.1 <- d.autonomy[15:30, c("AU","EM","SP","CO")]
ana.aut.1 <- cna(dat.aut.1, ordering = "EM, SP, CO < AU", strict = TRUE,
   con = .9, cov = .9)
printCols <- c("condition", "con", "cov")
csf(ana.aut.1)[printCols]
@
If we set \code{strict} to \code{FALSE} and, thereby, allow for causal dependencies among values of EM, SP, and CO, it turns out that $SP$ not only causes $AU$, but, on another causal path, also makes a difference to $EM$:
<<ordering2, message = FALSE, warning=FALSE>>=
ana.aut.2 <- cna(dat.aut.1, ordering = "EM, SP, CO < AU", strict = FALSE, 
   con = .9, cov = .9)
csf(ana.aut.2)[printCols]
@
The arguments \code{ordering} and \code{outcome} interact closely. It is often not necessary to specify both of them. For example, \code{ordering = "C", strict = TRUE} is equivalent to \code{outcome = "C"}. Still, it is important to note that the characters assigned to \code{ordering} are interpreted as \emph{factors}, whereas the characters assigned to \code{outcome} are interpreted as \emph{factor values}. This difference may require the specification of both \code{ordering} and \code{outcome}, in particular, when only specific values of the factors in the ordering are potential outcomes. To illustrate, compare the following two function calls:
<<outcome2, eval=F>>=
cna(d.pban, ordering = "T, PB", con = .75, cov = .75)
cna(d.pban, outcome = c("T=2", "PB=1"), ordering = "T, PB",
   con = .75, cov = .75)
@
The first call entails that any values of the factors T and PB, in that order, are located at the downstream end of the causal structure generating the data \code{d.pban}. It returns various solutions for $\text{PB}\id 1$ as well as for both $\text{T}\id 0$ and $\text{T}\id 2$. The second call, by contrast, narrows the search down to $\text{T}\id 2$ as only potential outcome value of factor T, such that no solutions for $\text{T}\id 0$ are produced.

%As of version 3.6.1 of \pkg{cna}, the \code{cna()} function provides yet another argument, called \code{exclude}, that allows users to constrain the search space based on causal knowledge available prior to the analysis. 

A causal ordering  excludes \emph{all} values of a factor as potential causes of an outcome. However, a user might only wish to exclude \emph{some} values as potential causes. This selective exclusion can be specified in the \code{exclude} argument. It is assigned a vector of character strings, where factor values to be excluded are listed to the left of the "\code{->}" sign, and the corresponding outcomes are listed to the right. For example, \code{exclude = "A=1,C=3 -> B=1"} excludes that the value 1 of factor A and the value 3 of factor C are considered as causes of the value 1 of factor B. It is also possible to exclude factor values as potential causes of multiple outcomes; for instance, \code{exclude = c("A,c -> B", "b,H -> D")}. In the context of $cs$ and $fs$ data, upper case letters are interpreted as 1, while lower case letters are interpreted as 0. If factor names have multiple letters, any upper case letter is interpreted as 1, and the absence of upper case letters as 0. For $mv$ data, the ``Factor=value'' notation is required. The \code{exclude} argument can be used either independently or in conjunction with \code{outcome} and \code{ordering}. But if the assignments to \code{outcome} and \code{ordering} contradict those to \code{exclude}, the assignments to \code{exclude} will be disregarded. %If \code{exclude} is assigned values of factors that do not appear in the data \code{x}, an error will be returned.

For example, the following call narrows down the search beyond what is specified in the \code{outcome} and \code{ordering} arguments by excluding the causal relevance of $\text{C}\id 2$ for $\text{T}\id 2$ and of $\text{T}\id 1$ and $\text{V}\id 0$ for $\text{PB}\id 1$:
<<exclude1, eval=F>>=
cna(d.pban, outcome = c("T=2", "PB=1"), ordering = "T, PB", 
   con = .75, cov = .75, exclude = c("C=2 -> T=2", "T=1,V=0 -> PB=1"))
@
Selective exclusion of certain causal relationships might also be the only type of prior knowledge available to an analyst. In that case, \code{exclude} can be used as sole search space constraint.
<<exclude2, eval=F>>=
cna(d.jobsecurity, con = .85, cov = .85, exclude = c("s,c -> JSR",
   "jsr, L -> R"))
@

In general, \code{cna()} should be given all the causal information about the interplay of the factors in the data that is available prior to the analysis. There often exist many MINUS-formulas that fit the data equally well. The more prior information \code{cna()} has at its disposal, the more specific the output will be, on average. 



\subsection[Maxstep]{Maxstep}\label{maxstep}
As will be exhibited in more detail in section \ref{algo}, \code{cna()} builds atomic solution formulas (\emph{asf}), \emph{viz.}\ minimally necessary disjunctions of minimally sufficient conditions (\emph{msc}), from the bottom up by gradually permuting and testing conjunctions and disjunctions of increasing complexity for sufficiency and necessity. The combinatorial search space that this algorithm has to scan depends on a variety of different aspects, for instance, on the number of factors in \code{x}, on the number of values these factors can take, on the number and length of the recovered \emph{msc}, etc. As the search space may be too large to be exhaustively scanned in reasonable time, the argument \code{maxstep} allows for setting an upper bound for the complexity of the generated \emph{asf}. \code{maxstep} takes a vector of three integers $c(i,j,k)$ as input, entailing that the generated \emph{asf} have maximally $j$ disjuncts with maximally $i$ conjuncts each and a total of maximally $k$ factor values. The default is \code{maxstep = c(3,4,10)}. The user can set it to any complexity level if computational time and resources are not an issue.

The \code{maxstep} argument is particularly relevant for the analysis of high dimensional data and data featuring severe model ambiguities. As an example of the first kind, consider the data \code{d.highdim} comprising 50 crisp-set factors, V1 to V50, and 1191 cases, which were simulated from a presupposed data-generating structure with the outcomes $V13$ and $V11$ (see the \pkg{cna} \href{https://cran.r-project.org/web/packages/cna/cna.pdf}{reference manual} for details). These data feature 20\% noise and massive fragmentation. At the default \code{maxstep}, the following analysis, which finds the complete data-generating structure, takes between 15 and 20 seconds to complete; lowering \code{maxstep} to \code{c(2,3,10)} reduces that time to less than one second, at the expense of only finding half of the data-generating structure:
<<maxstep0, eval=F>>=
cna(d.highdim, outcome = c("V13", "V11"), con = .8, cov = .8)
cna(d.highdim, outcome = c("V13", "V11"), con = .8, cov = .8,
   maxstep = c(2,3,10))
@



A telling example of extensive model ambiguities is the data set \code{d.volatile}. When only constrained by an ordering, \code{cna()} quickly recovers 416 complex solution formulas (\emph{csf}) at the default \code{maxstep}. But those are by far not all \emph{csf} that fit \code{d.volatile} equally well. When \code{maxstep} is increased only slightly to \code{c(4,4,10)}, the number of \emph{csf} jumps to 2860:\footnote{In the standard print method of \code{cna()}, the \code{n.init} parameter in \code{csf()} is set to 1000; to get all \emph{csf}, this parameter needs to be increased. See section \ref{what} for details.}
<<maxstep1, eval=F>>=
cna(d.volatile, ordering = "VO2", maxstep = c(3,4,10))
vol1 <- cna(d.volatile, ordering = "VO2", maxstep = c(4,4,10))
csf(vol1, n.init = 3000)
@




If \code{maxstep} is further increased, the number of solutions explodes and the analysis soon fails to terminate in reasonable time. When a complete analysis cannot be completed, \code{cna()} can be told to only search for \emph{msc} by setting the argument \code{suff.only} to its non-default value \code{TRUE}. As the search for \emph{msc} is the part of a CNA analysis that is least computationally demanding, it will typically terminate quickly and, thus, shed some light on the dependencies among the factors in \code{x} even when the construction of all models is infeasible. 
<<maxstep2, eval=F>>=
cna(d.volatile, ordering = "VO2", maxstep = c(8,10,40), suff.only = TRUE)
@
If \code{suff.only} is set to \code{TRUE}, CNA can process data of higher dimensionality than at the argument's default value. \citet{Yakovchenko:2020}, for example, run \code{cna()} on data comprising 73 exogenous factors with \code{suff.only = TRUE}. Based on the resulting \emph{msc}, they then select a proper subset of those factors for further processing.


While the \code{maxstep} argument is valuable for controlling the search space in case of high-dimensional and ambiguous data, it also comes with a pitfall: it may happen that \code{cna()} fails to find a model because of a \code{maxstep} that is too low. An example is \code{d.jobsecurity}. At the default \code{maxstep}, \code{cna()} does not build a solution, but if \code{maxstep} is increased, two solutions are found.
<<maxstep3>>=
ana.jsc.1 <- cna(d.jobsecurity, ordering = "JSR", con = .9, cov = .85)
csf(ana.jsc.1)[printCols]
ana.jsc.2 <- cna(d.jobsecurity, ordering = "JSR", con = .9, cov = .85,
  maxstep = c(3,5,12))
csf(ana.jsc.2)[printCols]
@
In sum, there are two possible reasons for why \code{cna()} fails to build a solution: (i) the chosen \code{maxstep} is too low; (ii) the chosen \code{con} and/or \code{cov} values are too high, meaning the processed data \code{x} are too noisy.
Accordingly, in case of a null result, two paths should be explored (in that order): (i) gradually increase \code{maxstep};
(ii) lower \code{con} and \code{cov}, as described in section \ref{cons} above.

\subsection[Negated outcomes]{Negated outcomes}

In classical logic, the law of Contraposition ensures that an expression of type $\Phi \leftrightarrow Y$ is equivalent to the expression that results from negating both sides of the double arrow: $\neg \Phi \leftrightarrow \neg Y$. Applied to the context of configurational causal modeling that entails that an \emph{asf} for $Y$ can be transformed into an \emph{asf} for the negation of $Y$, \emph{viz.}\ $y$, based on logical principles alone, i.e.\ without a separate data analysis. However, that transformability only holds for \emph{asf} with perfect consistency and coverage (\code{con = cov = 1}) that are inferred from exhaustive (non-fragmented) data (see section \ref{exhaustive} for details on exhaustiveness).
If an \emph{asf} of an outcome $Y$ does not reach perfect consistency or coverage or is inferred from fragmented data, identifying the causes of $y$ requires a separate application of \code{cna()}. % explicitly targeting the causes of the negated outcome(s). 

There are two ways to search for the causes of negated outcomes. The first is by simply specifying the factor values of interest in the \code{outcome} argument. While \code{outcome = c("A", "B")} yields MINUS-formulas for the positive outcomes $A$ and $B$,   \code{outcome = c("a", "b")} induces \code{cna()} to search for models of the corresponding negated outcomes. Alternatively, the argument \code{notcols} allows for negating the values of factors in $cs$ and $fs$ data (in case of $mv$ data, \code{cna()} automatically searches for models of all possible values of endogenous factors, thereby rendering \code{notcols} redundant). If \code{notcols = "all"}, all factors are negated, i.e.\ their values $i$ are replaced by $1-i$. If \code{notcols} is given a character vector of factors in the data, only the values of the factors in that vector are negated. For example, \code{notcols = c("A", "B")} determines that only $A$ and $B$ are negated. 

When processing $cs$ or $fs$ data, CNA should first be used to model the positive outcomes. If resulting \emph{asf} and \emph{csf} do not reach perfect consistency, coverage, and exhaustiveness scores (and the causes of the negated outcomes are of interest), a second CNA should be run negating the values of all factors that have been modeled as outcomes in the first CNA. To illustrate, we revisit our analysis of \code{d.autonomy} from section \ref{order}, which identified $AU$ and $EM$ as outcomes. The following two calls conduct analyses of the corresponding negated outcomes that produce the same solutions.
<<notcols, message = FALSE, warning=FALSE>>=
ana.aut.3 <- cna(dat.aut.1, outcome = c("au", "em"), con = .88, cov = .82) 
csf(ana.aut.3)[printCols] 
ana.aut.4 <- cna(dat.aut.1, ordering = "AU", con = .88, cov = .82,
   notcols = c("AU", "EM")) 
csf(ana.aut.4)[printCols] 
@


\section[The CNA algorithm]{The CNA algorithm}\label{algo}

This section explains the working of the algorithm implemented in the \code{cna()} function. We first provide an informal summary and then a detailed outline in four stages. The aim of \code{cna()} is to find all \emph{msc}, \emph{asf}, and \emph{csf} in the input data \code{x} that meet the thresholds \code{con} and \code{cov} of the sufficiency and necessity measures specified in \code{measures}  in accordance with \code{outcome}, \code{ordering}, \code{exclude}, and \code{maxstep}. The algorithm starts with single factor values and tests whether they meet \code{con}; if that is not the case, it proceeds to test conjunctions of two factor values, then to conjunctions of three, and so on. Whenever a conjunction meets \code{con} (and no proper part of it has previously been identified to meet \code{con}), it is automatically a minimally sufficient condition \emph{msc}, and supersets of it do not need to be tested any more. Then, it tests whether single \emph{msc} meet \code{con} and \code{cov}; if not, it proceeds to disjunctions of two, then to disjunctions of three, and so on. Whenever a disjunction meets \code{con} and \code{cov} (and no proper part of it has previously been identified to meet \code{con} and \code{cov}), it is automatically a minimally necessary disjunction of \emph{msc}, and supersets of it do not need to be tested any more. All and only those disjunctions of \emph{msc} that meet both \code{con} and \code{cov} are then issued as \emph{asf}. Finally, recovered \emph{asf} are conjunctively concatenated to \emph{csf} while ensuring that concatenation does not introduce new redundancies and that the resulting \emph{csf} comply with the chosen tuning settings. 

%eliminating structural redundancies and deleting tautologous and contradictory solutions as well as solutions with partial structural redundancies and constant factors.



The \code{cna()} algorithm can be more specifically broken down into four stages.


\vspace{-.2cm}
\begin{description}\item[Stage 1] On the basis of \code{outcome}, \code{ordering}, and \code{exclude}, \code{cna()} first builds a set of potential outcomes $\textbf{O}= \{\text{O}_h\id \omega_f,\ldots ,\text{O}_m\id \omega_g\}$ from the set of factors $\mathbf{F}=\{\text{O}_1,\ldots,\text{O}_n\}$ in \code{x},\footnote{Note that if \code{x} is a data frame, \code{cna()} first transforms \code{x} into a configuration table by means of \code{configTable(x)}. %, thereby passing the argument \code{type} (and the two additional arguments \code{rm.dup.factors} and \code{rm.const.factors}) to the \code{configTable()} function.
} 
where $1\leq h \leq m\leq n$, and second assigns a set of potential cause factors $\textbf{C}_{\text{O}_i}$ from $\mathbf{F}\setminus \{\text{O}_i\}$ to every element $\text{O}_i\id \omega_k$ of $\mathbf{O}$. If no \code{outcome} and \code{ordering} are provided, all value assignments to all elements of $\mathbf{F}$ are treated as possible outcomes in case of $mv$ data, whereas in case of $cs$ and $fs$ data $\mathbf{O}$ is set to $\{\text{O}_1\id 1, \ldots, \text{O}_n\id 1\}$. If both \code{ordering} and \code{exclude} are empty, all values of all factors in $\mathbf{F}\setminus \{\text{O}_i\}$ are treated as possible causes of $\text{O}_i\id \omega_k$, for every $\text{O}_i\id \omega_k\in\mathbf{O}$.

\item[Stage 2] \code{cna()} attempts to build a set \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$} of minimally sufficient conditions that meet \code{con} for each $\text{O}_i\id \omega_k\in \mathbf{O}$. To this end, it first checks for each value assignment $\text{X}_h\id \chi_j$ of each element of $\textbf{C}_{\text{O}_i}$, such that $\text{X}_h\id \chi_j$ has a membership score above 0.5 in at least one case in \code{x}, whether $\text{X}_h\id \chi_j \,\rightarrow\, \text{O}_i\id \omega_k$ 
meets \code{con}. %, i.e.\ whether 
%$con(\text{X}_h\id \chi_j\,\rightarrow\, \text{O}_i\id \omega_k) \geq$ \code{con}. 
If, and only if, that is the case, $\text{X}_h\id \chi_j$ is put into the set  \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$}. Next, \code{cna()} checks for each conjunction of two factor values $\text{X}_m\id \chi_j\,\att \,\text{X}_n\id \chi_l$ from $\textbf{C}_{\text{O}_i}$, such that $\text{X}_m\id \chi_j\,\att\, \text{X}_n\id \chi_l$ has a membership score above 0.5 in at least one case in \code{x} and no part of $\text{X}_m\id \chi_j\,\att\, \text{X}_n\id \chi_l$ is already contained in \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$}, whether $\text{X}_m\id \chi_j\,\att\, \text{X}_n\id \chi_l\;\rightarrow\; \text{O}_i\id \omega_k$ meets \code{con}. If, and only if, that is the case, $\text{X}_m\id \chi_j\,\att\, \text{X}_n\id \chi_l$ is put into the set \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$}. Next, conjunctions of three factor values with no parts already contained in \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$} are tested, then conjunctions of four factor values, etc., until either all logically possible conjunctions of the elements of $\textbf{C}_{\text{O}_i}$ have been tested or \code{maxstep} is reached. Every non-empty \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$} is passed on to the third stage.  

\item[Stage 3] \code{cna()} attempts to build a set \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$} of atomic solution formulas for every $\text{O}_i\id \omega_k\in \mathbf{O}$, which has a non-empty \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$}, by disjunctively concatenating the elements of \textbf{\textsc{msc}$_{\text{O}_i\id \omega_k}$} to minimally necessary conditions of $\text{O}_i\id \omega_k$ that meet \code{con} and \code{cov}. To this end, it first checks for each single condition $\Phi_h \in \text{\textbf{\textsc{msc}}}_{\text{O}_i\id \omega_k}$ 
whether $\Phi_h\rightarrow \text{O}_i\id \omega_k$ meets \code{con} and \code{cov}. If, and only if, that is the case, $\Phi_h$ is put into the set \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$}. Next, \code{cna()} checks for each disjunction of two conditions $\Phi_m + \Phi_n$ from \textbf{\textsc{msc}}$_{\text{O}_i\id \omega_k}$, such that no part of $\Phi_m + \Phi_n$ is already contained in \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$}, whether $\Phi_m + \Phi_n\rightarrow \text{O}_i\id \omega_k$ meets \code{con} and \code{cov}. If, and only if, that is the case, $\Phi_m + \Phi_n$ is put into the set \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$}. Next, disjunctions of three conditions from \textbf{\textsc{msc}}$_{\text{O}_i\id \omega_k}$ with no parts already contained in \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$} are tested, then disjunctions of four conditions, etc., until either all logically possible disjunctions of the elements of \textbf{\textsc{msc}}$_{\text{O}_i\id \omega_k}$ have been tested or \code{maxstep} is reached. Every non-empty \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$} is passed on to the fourth stage. 


\item[Stage 4] \code{cna()} calls the function \code{csf()}, which builds a set \textbf{\textsc{csf}$_{\mathbf{O}}$} of complex solution formulas. This is done in a stepwise manner as follows. First, all logically possible conjunctions of exactly one element from every non-empty \textbf{\textsc{asf}}$_{\text{O}_i\id \omega_k}$ are constructed.
 Second, the conjunctions resulting from step 1 are freed of structural redundancies (cf.\ p.\ \pageref{minus_formula} above; \citealp{BaumFalk}), and  tautologous and contradictory solutions as well as solutions with constant factors are eliminated. Third, \emph{csf} with so-called \emph{partial structural redundancies}, which may arise from noisy data, are eliminated (see the Appendix on p.\ \pageref{redundant} for more). Fourth, if \code{acyclic.only = TRUE}, solutions with cyclic substructures are removed (see section \ref{cycles}). Fifth, for those solutions that were modified in the previous steps, the sufficiency and necessity scores specified by \code{measures} are re-calculated and solutions that no longer reach \code{con} or \code{cov} are deleted. Finally, the remaining solutions are checked for submodel relations: if one of them is a submodel of another one, it is deleted---the output of \code{cna()} shall be maximally informative. The remaining solutions are then returned as \textbf{\textsc{csf}$_{\mathbf{O}}$}. If there is only one non-empty set \textbf{\textsc{asf}}$_{\text{O}_i\id \omega_k}$, \textbf{\textsc{csf}$_{\mathbf{O}}$} is identical to \textbf{\textsc{asf}}$_{\text{O}_i\id \omega_k}$.\end{description}

To illustrate, the following code chunk, first, simulates the data in Table \ref{tab2}c, p.\ \pageref{tab2}, and second, runs \code{cna()} (and \code{csf()}) on that data at \code{con = .8} and \code{cov = .8}, with the default \code{measures} and \code{maxstep}, and without specification of \code{outcome}, \code{ordering}, and \code{exclude}.
<<tab2c, eval=F>>=
dat5 <- allCombs(c(2, 2, 2, 2, 2)) -1
dat6 <- selectCases("(A + B <-> C)*(A*B + D <-> E)", dat5)
set.seed(3) 
tab3c <- makeFuzzy(dat6, fuzzvalues = seq(0, 0.4, 0.01))
cna(tab3c, con = .8, cov = .8)
@

Table \ref{tab2}c contains data of type $fs$, meaning that the values in the data matrix are interpreted as membership scores in fuzzy sets. As is customary for this data type, we use uppercase italics for membership in a set and lowercase italics for non-membership. In the absence of any prior causal knowledge, the set of potential outcomes is determined to be $\mathbf{O}=\{A,B,C,D,E\}$ in stage 1, that is, the presence of each factor in Table \ref{tab2}c is treated as a potential outcome. Moreover, all other factors are potential cause factors of every element of $\mathbf{O}$, hence, $\textbf{C}_{A}=\{\text{B},\text{C},\text{D},\text{E}\}$, $\textbf{C}_{B}=\{\text{A},\text{C},\text{D},\text{E}\}$, $\textbf{C}_{C}=\{\text{A},\text{B},\text{D},\text{E}\}$, $\textbf{C}_{D}=\{\text{A},\text{B},\text{C},\text{E}\}$, and $\textbf{C}_{E}=\{\text{A},\text{B},\text{C},\text{D}\}$.

In stage 2, \code{cna()} succeeds in building non-empty sets of minimally sufficient conditions in compliance with \code{con} and \code{con} for all elements of $\mathbf{O}$: \textbf{\textsc{msc}}$_A =\{B\att d\att E\}$, \textbf{\textsc{msc}}$_B =\{C\att d,\, d\att E,\,$ $ a\att C\att D,\,$ $ a\att C\att E,\, a\att C\att e \}$, \textbf{\textsc{msc}}$_C =\{A,\, B,\, d\att E\}$, 
\textbf{\textsc{msc}}$_D =\{b\att E,\, a\att E,\, c\att E\}$, 
\textbf{\textsc{msc}}$_E =\{D,\, A\att B,\,$ $ A\att C\}$. 
But only the elements of \textbf{\textsc{msc}}$_C$ and \textbf{\textsc{msc}}$_E$ can be disjunctively combined to atomic solution formulas that meet \code{cov} in stage 3: \textbf{\textsc{asf}}$_C = \{A + B \leftrightarrow C\}$ and \textbf{\textsc{asf}}$_E = \{D + A\att B \leftrightarrow E, \; D + A\att C \leftrightarrow E \}$. For the other three factors in $\mathbf{O}$, the \code{cov} threshold of $0.8$ cannot be satisfied. \code{cna()} therefore abstains from issuing \emph{asf} for $A$, $B$ and $D$.  


Finally, in stage 4  one redundancy-free \emph{csf} is built from the inventory of \emph{asf} in \textbf{\textsc{asf}}$_C$ and \textbf{\textsc{asf}}$_E$, which constitutes \code{cna()}'s final output for Table \ref{tab2}c:
\begin{equation}\label{m8}
 (A \; +\; B \leftrightarrow C)\;\att\;(D \; +\; A\att B \leftrightarrow E)\;\;\;\,\,\; con=0.836 ;\; cov=0.897
 \end{equation}


\section[The output of CNA]{The output of CNA}\label{output}

\subsection[Customizing the output]{Customizing the output} \label{what}


The default output of \code{cna()} first lists the provided ordering (if any), second, the pre-identified outcomes (if any), third, the implemented sufficiency and necessity measures, fourth, the recovered \emph{asf}, and fifth, the \emph{csf}. \emph{Asf} and \emph{csf} are ordered by complexity and the product of their \code{con} and \code{cov} scores.
For \emph{asf} and \emph{csf}, three attributes are standardly computed: \code{con}, \code{cov}, and \code{complexity}. The first two correspond to a solution's scores on the selected variants of consistency and coverage (see section \ref{cons} above), and the \code{complexity} score amounts to the number of factor value appearances on the left-hand sides of "$\rightarrow$" or "$\leftrightarrow$" in \emph{asf} and \emph{csf}. %; and the \code{inus} attribute indicates whether a solution has the form of a well-formed MINUS structure (for more on the \code{inus} attribute cf.\ section \ref{inus} below).

As indicated on page \pageref{details_first}, %additional solution attributes can be requested via the argument \code{details}
\code{cna()} can also return the scores on the evaluation measures not used for model building by requesting them via the argument \code{details}. A number of additional solution attributes, all of which will be explained below, can be computed: \code{exhaustiveness}, and \code{faithfulness} for both \emph{asf} and \emph{csf}, as well as \code{coherence} and \code{cyclic} for \emph{csf}. These attributes are also accessible via the \code{details} argument by giving it a character vector that specifies the attributes to be computed: for example, \code{details = c("faithful\-ness", "exhaustiveness")}---the strings can be abbreviated, e.g. \code{"f"} for \code{"faithful\-ness"}, \code{"e"} for \code{"exhaustiveness"}, etc. 
<<details, eval=F>>=
cna(d.educate, details = c("e", "f", "co", "cy", "PAcon", "PACcov"))
@ 
 
The output of \code{cna()} can be further customized through the argument \code{what} that controls which solution items to print. It can be given a character string specifying the requested solution items: \code{"t"} stands for the configuration table, \code{"m"} for minimally sufficient conditions (\emph{msc}), \code{"a"} for \emph{asf}, \code{"c"} for \emph{csf}, and \code{"all"} for all solution items. 
<<what, eval=F>>=
cna(d.educate, what = "tm")
cna(d.educate, what = "mac")
cna(d.educate, what = "all")
@

As shown in section \ref{maxstep}, it can happen that many \emph{asf} and \emph{csf} fit the data equally well. The standard output of \code{cna()} only features 5 solution items of each type. To recover all \emph{msc} and \emph{asf} the functions \code{msc(x)} and \code{asf(x)} are available, where \code{x} is a solution object of \code{cna()}.
<<vol1, eval=F>>=
vol2 <- cna(d.volatile, ordering = "VO2", con = .9, cov = .9)
msc(vol2)
asf(vol2)
print(asf(vol2), Inf)
@
While \code{msc()} and \code{asf()} simply access the complete sets of \emph{msc} and \emph{asf} stored in \code{x}, the \emph{csf} are not stored in \code{x}. The construction of \emph{csf} in the fourth stage of the CNA algorithm is not conducted by the \code{cna()} function itself, rather, it is outsourced to the function \code{csf()} with these main arguments:
\begin{Code}
csf(x, n.init = 1000, cyclic.only = x$acyclic.only, details = x$details, 
   cycle.type = x$cycle.type, verbose = FALSE)
\end{Code}
The argument \code{details} is the same as in \code{cna()} (see p.\ \pageref{details_first}); the arguments \code{cyclic.only} and \code{cycle.type} will be further discussed in section \ref{cycles}, \code{n.init} and \code{verbose} are explained in the remainder of this one. It can happen that the set \textbf{\textsc{asf}$_{\text{O}_i\id \omega_k}$} contains too many \emph{asf} to construct all \emph{csf} in reasonable time. The argument \code{n.init} therefore allows for controlling how many conjunctions of \emph{asf} are initially built in the first step of \emph{csf} construction (see stage 4 of the CNA algorithm); it defaults to 1000. Increasing or lowering that default results in more or less \emph{csf} being built and in longer or shorter computing times, respectively.
<<vol1, eval=F>>=
csf(vol2, n.init = 2000)
csf(vol2, n.init = 100)
@
Setting the argument \code{verbose} to its non-default value \code{TRUE} prints some information about the \emph{csf} construction process to the console, e.g.\ how many structural redundancies or cyclic substructures have been eliminated along the way.
<<vol1, eval=F>>=
csf(vol2, verbose = TRUE)
@

% \subsection[INUS vs. non-INUS solutions]{INUS vs. non-INUS solutions} \label{inus}
% 
% The (M)INUS-theory of causation (cf.\ section \ref{models}) has been developed for strictly Boolean discovery contexts, meaning for deterministic (i.e.\ noise-free) data that feature perfectly sufficient and necessary conditions. In such contexts, some Boolean expressions can be identified as non-minimal (i.e.\ as featuring redundant elements) on mere \emph{logical grounds}, that is, independently of data. For instance, in an expression as \begin{equation}A \; +\; a\att B \;\leftrightarrow\; C\label{redun} \end{equation} $a$ in the second disjunct is redundant, for \eqref{redun} is logically equivalent to $A \, + \, B \;\leftrightarrow \;C$. These two formulas state exactly the same. Under no conceivable circumstances could $a$ as contained in \eqref{redun} ever make a difference to $C$. To see this, note that a necessary condition for $a\att B$ to be a complex cause of $C$ is that there exists a context $\mathcal{F}$ such that $C$ is only instantiated when \emph{both} $a$ and $B$ are given. That means that, in $\mathcal{F}$, $C$ is not instantiated if $B$ is given but $a$ is not, which, in turn, means that $C$ is not instantiated if $B$ is given and $A$ (\emph{viz.}\ not-$a$) is given. But such an $\mathcal{F}$ cannot possibly exist, for $A$ itself is sufficient for $C$ according to \eqref{redun}. It follows that in every context where $B$ is instantiated, a change from $A$ to $a$ is not associated with a change in $C$ (which takes the value 1 throughout the change in the factor A), meaning that $a$ cannot possibly make a difference to $C$ and, hence, cannot be a cause of $C$. 
% That is, \eqref{redun} can be identified as non-minimal independently of all data. \eqref{redun} is not a well-formed causal model. It is not a MINUS-formula, or not an \emph{INUS solution}. 
% 
% Correspondingly, a solution as \eqref{redun} or any other non-INUS expression will never be inferred from strictly deterministic data. By contrast, it is possible that a necessary disjunction of sufficient conditions that does \emph{not} have INUS form is redundancy-free (or minimal) relative to indeterministic data. Hence, the solution attribute \code{inus} makes explicit whether an \emph{asf} or \emph{csf} is an INUS solution. Moreover, the \code{cna()} and the \code{csf()} functions both have an argument \code{inus.only} controlling whether only INUS solutions (MINUS-formulas) shall be built or whether non-INUS solutions shall be issued as well (if they are inferable from data). As \code{inus.only} defaults to \code{TRUE}, standard calls of \code{cna()} and \code{csf()} will never yield non-INUS solutions, even if such solutions might be inferable from data. But if the analyst is interested in non-(M)INUS structures as well, she can request corresponding solutions by \code{inus.only = FALSE}.
% 
% 
% 
% To illustrate this, we first show that CNA  never infers non-(M)INUS solutions from deterministic data---even if \code{inus.only = FALSE}. For this purpose, we simulate ideal data from the non-INUS solution in \eqref{redun}; \code{cna(..., inus.only = FALSE)} will always, i.e.\ upon an open number of re-runs of the following code chunk, return $A \; + \; B \leftrightarrow C$, regardless of the fact that we select cases based on \eqref{redun} with \code{selectCases1("A + a*B <-> C", ...)}.
% <<inus1,  message = FALSE, warning=FALSE>>=
% dat.inu.1 <- allCombs(c(2, 2, 2)) -1
% dat.inu.2 <- some(dat.inu.1, 40, replace = TRUE)
% dat.inu.3 <- selectCases1("A + a*B <-> C",  con = 1, cov = 1, dat.inu.2)
% asf(cna(dat.inu.3, con = 1, cov = 1, inus.only = FALSE))
% @
% But in real-life discovery contexts, especially in observational studies, deterministic dependencies are the exception rather than the norm. Ordinary (observational) data are indeterministic, meaning that causes tend to be combined both with the presence and the absence of outcomes. In such discovery contexts, which can be simulated by lowering \code{con} and \code{cov} in \code{selectCases1()}, non-INUS expressions may count as minimally necessary disjunctions of minimally sufficient conditions. Correspondingly, if \code{inus.only} is set to \code{FALSE}, \code{cna()} may infer non-INUS solutions:
% <<inus2,  message = FALSE, warning=FALSE>>=
% set.seed(26)
% dat.inu.4 <- some(dat.inu.1, 40, replace = TRUE)
% dat.inu.5 <- selectCases1("A + a*B <-> C", con = .8, cov = .8, dat.inu.4)
% asf(cna(dat.inu.5, con = .8, cov = .8, inus.only = FALSE))
% @
% In indeterministic data, it can happen that $a$ is needed to lift the consistency of $B$ above the chosen \code{con} threshold. In such a case, $a$ can be argued to make a difference to $C$: only in conjunction with $a$ does $B$ reach \code{con}; and this holds notwithstanding the fact that $A$ itself also meets \code{con}. Or put differently, when the consistency of $A\rightarrow C$ is below 1, there exist cases where $A$ is instantiated and $C$ is not, which, in turn, renders it possible for a change from $A$ to $a$, while $B$ is constantly instantiated, to be associated with a change in $C$, meaning that $a$ can be a difference-maker for $C$.
% 
%  
% 
% If non-INUS expressions can be inferred from indeterministic data, the crucial follow-up question is whether the indeterminism in the data is due to insufficient control of background influences (i.e.\ to noise, measurement error, etc.)\ or to the inherent indeterministic nature of the physical processes themselves (as can e.g.\ be found in the domain of quantum mechanics, cf.\ \citealp{Albert:1992}). If the latter is the case, the difference-making relations stipulated by non-INUS solutions should be taken seriously. %and, hence, the argument \code{inus.only} should be set to its non-default value \code{FALSE}. 
% By contrast, if the former is the case (i.e.\ the indeterminism is due to noise), the difference-making relations entailed by non-INUS solutions are mere artifacts of the noise in the data, meaning that they would disappear if the corresponding causal structure were investigated under more ideal discovery circumstances. In that case, which  obtains in the macro domains to which CNA is typically applied, the argument \code{inus.only} should be left at its default value \code{TRUE} both in \code{cna()} and \code{csf()}. %, such that non-INUS solutions are not built to being with. 
% If \code{inus.only} is switched to \code{TRUE} in the \code{cna()}-call from the previous code chunk, no solution is returned any more, that is, there does not exist a MINUS-formula satisfying \code{con} and \code{cov} for \code{dat.inu.5}:
% <<inus4,  message = FALSE, warning=FALSE>>=
% asf(cna(dat.inu.5, con = .8, cov = .8, inus.only = TRUE))
% @
% The function behind the solution attribute \code{inus} and the argument \code{inus.only} is also available as stand-alone function \code{is.inus()}. Logical redundancies as contained in non-INUS solutions can be eliminated by means of the function \code{minimalize()} (see the \pkg{cna} \href{https://cran.r-project.org/web/packages/cna/cna.pdf}{reference manual} for details). 
% 

\subsection[Exhaustiveness]{Exhaustiveness and faithfulness}\label{exhaustive}


Exhaustiveness and faithfulness are two measures of model fit that quantify the degree of correspondence between the configurations that are, in principle, compatible with a solution $\mathbf{m}$ and the configurations actually contained in the data from which $\mathbf{m}$ is derived. To demonstrate those measures, let $\mathbf{F}_\mathbf{m}$ symbolize the set of factors with values contained in $\mathbf{m}$. Exhaustiveness is high when \emph{all} or most configurations of the factors in $\mathbf{F}_\mathbf{m}$ that are \emph{compatible} with $\mathbf{m}$ are actually contained in the data. More specifically, it amounts to the ratio of the number of configurations over $\mathbf{F}_\mathbf{m}$ in the data that are compatible with $\mathbf{m}$ to the total number of configurations over $\mathbf{F}_\mathbf{m}$ that are compatible with $\mathbf{m}$. %\footnote{In versions of the \pkg{cna} package prior to 3.1, exhaustiveness and faithfulness were not relativized to $\mathbf{F}_\mathbf{m}$. In consequence, the exhaustiveness and faithfulness scores obtained from previous package versions may be different from the current scores. } 
To illustrate, consider \code{d.educate}, which contains all configurations that are compatible with the two \emph{csf} issued by \code{cna()} and \code{csf()}:
<<d.edu1>>=
printCols <- c("condition", "con", "cov", "exhaustiveness")
csf(cna(d.educate, details = "exhaust"))[printCols]
@
If, say, the first configuration in \code{d.educate} (\emph{viz.} $U\att D\att L\att G\att E$) is not observed or removed---as in \code{d.educate[-1,]}---, \code{cna()} still builds the same solutions (with perfect consistency and coverage). In that case, however, the resulting \emph{csf} are not exhaustively represented in the data, for one configuration that is compatible with both \emph{csf} is not contained therein.
<<d.edu2>>=
csf(cna(d.educate[-1,], details = "exhaust"))[printCols]
@
In a sense, faithfulness is the complement of exhaustiveness. It is high when \emph{no} or only few configurations of the factors in $\mathbf{F}_\mathbf{m}$ that are \emph{incompatible} with $\mathbf{m}$ are in the data.
More specifically, faithfulness amounts to the ratio of the number of configurations over $\mathbf{F}_\mathbf{m}$ in the data that are compatible with $\mathbf{m}$ to the total number of configurations over $\mathbf{F}_\mathbf{m}$ in the data. The two \emph{csf} resulting from \code{d.educate} also reach perfect faithfulness:
<<d.edu3>>=
printCols <- c("condition", "con", "cov", "faithfulness")
csf(cna(d.educate, details = "faithful"))[printCols]
@
If we add a configuration that is not compatible with these \emph{csf}, say, $U\att D\att l\att G\att e$ and lower the consistency threshold, the same solutions along with one other result---this time, however, with non-perfect faithfulness scores.
<<d.edu4>>=
csf(cna(rbind(d.educate,c(1,1,0,1,0)), con = .8, details = "f"))[printCols]
@
If both exhaustiveness and faithfulness are high, the configurations over  $\mathbf{F}_\mathbf{m}$ in the data are all and only the configurations of the factors in $\mathbf{F}_\mathbf{m}$ that are compatible with $\mathbf{m}$. Low exhaustiveness and/or faithfulness, by contrast, means that the data do not contain many configurations of the factors in $\mathbf{F}_\mathbf{m}$ compatible with $\mathbf{m}$ and/or the data contain many configurations not compatible with $\mathbf{m}$. In general, solutions with higher exhaustiveness and faithfulness scores are preferable over solutions with lower scores.




\subsection[Coherence]{Coherence}


Coherence is a measure for model fit that is custom-built for \emph{csf}. It measures the degree to which the \emph{asf} combined in a \emph{csf} cohere, that is, are instantiated together in the data rather than independently of one another. Coherence is intended to capture the following intuition. Suppose a \emph{csf} entails that $A$ is a sufficient cause of $B$, which, in turn, is entailed to be a sufficient cause of $C$. Corresponding data $\delta$ should be such that the $A-B$ link of that causal chain and the $B-C$ link are either both instantiated or both not instantiated in the cases recorded in $\delta$. By contrast, a case in $\delta$ such that, say, only the $A-B$ link is instantiated but the $B-C$ link is not, pulls down the coherence of that \emph{csf}. The more such non-cohering cases are contained in $\delta$, the lower the overall coherence score of the \emph{csf}.  


Coherence is more specifically defined as the ratio of the number of cases satisfying all \emph{asf} contained in a \emph{csf} to the number of cases satisfying at least one \emph{asf} in the \emph{csf}. More formally, let a \emph{csf} contain $asf_1, asf_2, \ldots, asf_n$, coherence then amounts to (where $|\ldots|_\delta$ represents the cardinality of the set of cases in $\delta$ satisfying the corresponding expression): $$\frac{\left\vert{\,asf_1\att asf_2 \att \ldots\att asf_n\,}\right\vert_\delta}{\left\vert{\,asf_1 + asf_2 +\ldots+ asf_n\,}\right\vert_\delta}$$  

To illustrate, we add a case of type $U\att d\att L\att g\att e$ to \code{d.educate}. 
When applied to the resulting data (\code{d.edu.exp1}), \code{cna()} and \code{csf()} issue two \emph{csf}. 
<<rownames, results=hide, echo=F>>=
rownames(d.educate) <- 1:8
@
<<coherence>>=
d.edu.exp1 <- rbind(d.educate, c(1,0,1,0,0))
printCols <- c("condition", "con", "cov", "coherence")
csf(cna(d.edu.exp1, con = .8, details = "cohere"))[printCols]
@
In the added case, none of these two \emph{csf} cohere, as only one of their component \emph{asf} is satisfied. %Moreover, for the second \emph{csf} there is yet another non-cohering case in \code{d.edu.exp1} (case \#7). 

Coherence is an additional parameter of model fit that allows for selecting among multiple solutions: the higher the coherence score of a \emph{csf}, the better the overall model fit.  %In \code{d.edu.exp1}, thus, the first and the third \emph{csf} are preferable over the second solution subject to their superior coherence.



\newpage

\subsection[Cycles]{Cycles} \label{cycles}

Detecting causal cycles is one of the most challenging tasks in causal data analysis---in all methodological traditions. One reason is that factors in a cyclic structure are so highly interdependent that, even under optimal discovery conditions, the diversity of (observational) data tends to be too limited to draw informative conclusions about the data-generating structure. Various methods in fact are restricted to analyzing acyclic structures only (most notably, Bayes nets methods, cf.\ \citealp{Spirtes:2000}).

 % as the \emph{csf} inferred from the \code{d.autonomy} data in the previous section demonstrate
\code{cna()} and \code{csf()} output cyclic \emph{csf} if they fit the data. The optional solution attribute \code{cyclic} identifies those \emph{csf} that contain a cyclic substructure. A causal structure has a cyclic substructure if, and only if, it contains a directed causal path from at least one cause back to itself. The MINUS theory spells this criterion out more explicitly as follows: 
\begin{description}\item[Cycle]
A complex MINUS-formula $\mathbf{m}$ has a cyclic substructure if, and only if, $\mathbf{m}$ contains a sequence $\langle Z_1, Z_2,\ldots, Z_n\rangle$ such that every $Z_i$ is contained in an atomic MINUS-formula  of $Z_{i+1}$ and $Z_1=Z_n$ in $\mathbf{m}$.
\end{description}
To illustrate, consider the following analysis of the \code{d.autonomy} data.

<<cycle>>=
printCols <- c("condition", "con", "cov", "cyclic")
csf(cna(d.autonomy, ordering = "AU", con = .9, cov = .94,
   details = "cy", maxstep = c(2, 2, 8)))[printCols]
@
All \emph{csf} inferred in this analysis contain the cyclic sequence  $\langle SP, EM, SP\rangle$ and, thus, represent causal cycles.
% 
% with a simple example, \eqref{cycl} contains the cyclic sequence $\langle A, B, A\rangle$ and, thus, represents a causal cycle:
% \begin{equation}\label{cycl}
% (A\att c \, + \, D\att E\; \leftrightarrow\; B)\, \att \, (B\att F\, +\, G\att h \;\leftrightarrow\; A)\end{equation}
Typically, when cyclic models fit the data, the output of \code{cna()} and \code{csf()} is very ambiguous. Therefore, if there are independent reasons to assume that the data are not generated by a cyclic structure, both \code{cna()} and \code{csf()} have the argument \code{acyclic.only}, which, if set to its non-default value \code{TRUE}, prevents solutions with cycles from being returned and, thereby, reduces model ambiguities. For example, by switching \code{acyclic.only} from \code{FALSE} to \code{TRUE} in the following analysis, the solution space is reduced from 72 to 23:
<<cycle1>>=
csf(cna(d.irrigate, con = .75, cov = .75, acyclic.only = F)) |> nrow()
csf(cna(d.irrigate, con = .75, cov = .75, acyclic.only = T)) |> nrow()
@

The \code{cycle.type} argument---also available in both \code{cna()} and \code{csf()}---controls whether a cyclic sequence $\langle Z_1, Z_2,\ldots, Z_n\rangle$ is composed of factors (\code{cycle.type = "factor"}), which is the default, or factor values (\code{cycle.type = "value"}). To illustrate the difference, if \code{cycle.type = "factor"}, \eqref{cycl2} counts as cyclic: 
\begin{equation}\label{cycl2} (A\, +\, B\; \leftrightarrow \;C)\,*\,(c\, +\, D\; \leftrightarrow\; A)\end{equation} The factor A (with value 1) appears in an \emph{asf} of $C$ (i.e.\ $\text{C}\id 1$), and the factor C (with value 0) appears in an \emph{asf} of $A$. But if \code{cycle.type = "value"}, \eqref{cycl2} does not pass as cyclic. Although $A$ appears in an \emph{asf} of $\text{C}\id 1$, that same value of C does not appear in an \emph{asf} of $A$; rather, $\text{C}\id 0$ appears in the \emph{asf} of $A$. 

The function behind the solution attribute \code{cyclic} and behind the corresponding \code{cna()} arguments is also available as stand-alone function \code{cyclic()} (see the \pkg{cna} \href{https://cran.r-project.org/web/packages/cna/cna.pdf}{reference manual} for details).



\subsection[Plotting the output]{Plotting the output} \label{plots}

MINUS-formulas can be visualized as causal hypergraphs, which are related to directed acyclic graphs (DAGs; \citealp{greenland1999,Spirtes:2000}), the most widely used tool for visualizing causal structures. But while edges in DAGs connect exactly two nodes, indicating the direction of causation, edges in hypergraphs can connect more than two nodes and, thereby, represent more than just the direction of causation. Causal hypergraphs can merge nodes into bundles and then connect these bundles to other nodes. This allows for representing conjunctive and disjunctive groupings of causes and, accordingly, for capturing the causal complexity encoded in MINUS-formulas. Furthermore, while DAGs are assumed not to contain cycles, causal hypergraphs may include cycles to accommodate the fact that MINUS-formulas may have cyclic substructures. 

For convenience, we use the acronym CHG to refer to causal hypergraphs. A CHG is a pair $(\mathbf{F},\mathbf{E})$, where $\mathbf{F}$ is a set of nodes and $\mathbf{E}$ is a set of ordered pairs $\langle \mathbf{F}_i, \mathbf{F}_j\rangle$ of disjoint subsets of $\mathbf{F}$.\footnote{For more on hypergraphs see e.g.\ \citep{Gallo1993} or \citep{Bretto2013}.} Each of these ordered pairs $\langle \mathbf{F}_i, \mathbf{F}_j\rangle \in \mathbf{E}$ is called a directed \emph{hyperedge}. The subset $\mathbf{F}_i$ is called the \emph{tail} of the hyperedge, $\mathbf{F}_j$ is its \emph{head}. The heads of hyperedges in CHGs representing MINUS-formulas are always singleton sets, whereas their tails can contain multiple elements.\footnote{Such hyperedges are also called \emph{backward} hyperedges, cf.\ (\citealp{Pons2024}).} Just as edges in DAGs, hyperedges in CHGs represent the relation of direct causal relevance. But while nodes in DAGs represent factors or variables such as $\mathrm{A}$ and $\mathrm{B}$, the nodes in CHGs represent factor values such as $A$, $b$, or $\text{C}\id 3$. Hence, DAGs represent causal relationships between factors or variables, whereas CHGs represent causal relationships between factor values.


\begin{figure}
\captionsetup[subfigure]{oneside,margin={-1.2cm,0cm}}
\subfloat[.49\textwidth][set-CHG]{\includegraphics[width=.46\textwidth]{plot1.pdf}}
\hspace{1.3cm}
\captionsetup[subfigure]{oneside,margin={-1.8cm,0cm}}
\subfloat[.49\textwidth][mv-CHG]{\includegraphics[width=.46\textwidth]{plot2.pdf}}
\caption{Causal Hypergraphs.}\label{fig3}
\end{figure}
There are two types of CHGs: \emph{set-CHGs} for causal structures involving values of crisp-set or fuzzy-set factors and \emph{mv-CHGs} for structures of multi-value factors. Besides nodes and directed hyperedges, set-CHGs contain two further graphical elements: ``${\bullet}$'' for bundling nodes in a conjunction and ``${\scriptstyle{\Diamond}}$'' at tails of hyperedges for negating factor values. Mv-CHGs also symbolize conjunction through ``${\bullet}$'', but instead of a negation sign, they feature numeric values directly assigned to the tails and heads of hyperedges indicating the factor values that are connected by the hyperedge. In both set- and mv-CHGs, hyperedges with the same head form a disjunction. Figure \ref{fig3}a is an example of a set-CHG, and Figure \ref{fig3}b an example of an mv-CHG.


The \proglang{R} package \href{https://cran.r-project.org/package=causalHyperGraph}{\pkg{causalHyperGraph}} provides functions to visualize the output of \code{cna()} as CHGs. The most basic plotting function is \code{plot(x)}, which takes a solution object \code{x} of \code{cna()} as input an draws the solution formulas contained in \code{x} as CHGs. For example, the following code draws the set-CHG in Figure \ref{fig3}a from the  solution object \code{cna()} infers from \code{d.women}:

<<dat.redun, eval=F>>=
library(causalHyperGraph)
ana.d.women <- cna(d.women)
plot(ana.d.women)
@

There is also a function \code{causalHyperGraph(x)} that takes a character vector \code{x} expressing  MINUS-formulas as input and draws the corresponding CHGs. To illustrate, the mv-CHG in Figure \ref{fig3}b is drawn as follows:
<<dat.redun, eval=F>>=
causalHyperGraph("(A=1*B=2 + C=0*D=2 <-> E=1)*(E=1 + F=0 <-> G=1)")
@

\section[Interpreting the output]{Interpreting the output}

The ultimate output of \code{cna()} and \code{csf()} is the set \textbf{\textsc{csf}$_{\mathbf{O}}$} of \emph{csf}, which may coincide with \emph{asf} if the data contain only one endogenous factor. The causal inferences warranted by the data input \code{x}, relative to the selected \code{measures} and their associated \code{con} and \code{cov} thresholds, while adhering to the specified \code{outcome}, \code{ordering}, \code{exclude}, and \code{maxstep} settings, must be drawn from the set \textbf{\textsc{csf}$_{\mathbf{O}}$}. This section explains the final interpretative step in a CNA analysis.


%
There are three possible types of outputs:
\begin{enumerate}\itemsep0pt
\item \textbf{\textsc{csf}$_{\mathbf{O}}$} contains no \emph{csf} (and, correspondingly, no \emph{asf});
\item \textbf{\textsc{csf}$_{\mathbf{O}}$} contains exactly one \emph{csf} (and, correspondingly, exactly one \emph{asf} for each endogenous factor);
\item \textbf{\textsc{csf}$_{\mathbf{O}}$} contains more than one \emph{csf}  (and, correspondingly, more than one \emph{asf} for at least one endogenous factor).
\end{enumerate}

\subsection[No solution]{No solution}
As indicated in section \ref{maxstep}, a null result can have two sources: either the data are too noisy to render the \code{con} and \code{cov} thresholds satisfiable or the \code{maxstep} is too low. If increasing \code{maxstep} does not yield solutions at the chosen \code{con} and \code{cov} thresholds, the latter may be lowered, preferably with a concomitant robustness analysis as described in section \ref{cons}. If no solutions are recovered at \code{con = cov = .7}, the data are too noisy to warrant reliable causal inferences. Users are then advised to go back to the data and follow standard guidelines (known from other methodological frameworks) to improve data quality, e.g.\ by integrating further relevant factors into the analysis, enhancing the control of unmeasured causes, expanding the population of cases or disregarding inhomogeneous cases, correcting for measurement error, supplying missing values, etc. 

It must be emphasized again (see section \ref{inference}) that, under normal circumstances, an empty \textbf{\textsc{csf}$_{\mathbf{O}}$} does not warrant the conclusion that the factors contained in the data input \code{x} are causally irrelevant to one another. The inference to causal irrelevance is much more demanding than the inference to causal relevance. 
A null result only furnishes evidence for causal irrelevance if there are independent reasons to assume that all potentially relevant factors are measured in \code{x} and that \code{x} exhausts the space of empirically possible configurations.
 


\subsection[A unique solution]{A unique solution}
That \textbf{\textsc{csf}$_{\mathbf{O}}$} contains  exactly one \emph{csf} is the optimal completion of a CNA analysis. It means that the data input \code{x} contains sufficient evidence for a determinate causal inference. The factor values on the left-hand sides of ``$\leftrightarrow$'' in the \emph{asf} constituting that \emph{csf} can be interpreted as causes of the factor values on the right-hand sides. % of ``$\leftrightarrow$''
Moreover, their conjunctive, disjunctive, and sequential groupings reflect the actual properties of the data-generating causal structure. 

Plainly, as with any other method of causal inference, the reliability of CNA's causal conclusions essentially hinges on the quality of the processed data. If the data satisfy homogeneity (see section \ref{inference}), a unique solution is guaranteed to correctly reflect the data-generating structure. With increasing data deficiencies (noise, fragmentation), the (inductive) risk of committing causal fallacies inevitably increases as well. For details on the degree to which the reliability of CNA's causal conclusions decreases with increasing data deficiencies see \citep{BaumgartnerfsCNA} and \citep{Parkkinen2023}. 





\subsection[Multiple solutions]{Multiple solutions}\label{ambigu}

If \textbf{\textsc{csf}$_{\mathbf{O}}$} has more than one element, the processed data underdetermine their own causal modeling. That means the evidence contained in the data is insufficient to determine which of the solutions contained in \textbf{\textsc{csf}$_{\mathbf{O}}$} corresponds to the data-generating causal structure. An output set of multiple solutions $\{csf_1, csf_2, ..., csf_n\}$ is to be interpreted \emph{disjunctively}: the data-generating causal structure is %correctly reflected by 
$$csf_1\; \text{ OR }\; csf_2 \;\text{ OR }\; ... \;\text{ OR }\; csf_n$$ but, based on the evidence contained in the data, it is ambiguous which of these $csf$ is actually operative. 

That empirical data underdetermine their own causal modeling is a very common phenomenon in all methodological traditions (\citealp{simon1954}; \citealp[59-72]{Spirtes:2000};  \citealp{Kalisch2012}; \citealp{eberhardt2013}; \citealp{BaumgartnerAmbigu}). But while some methods are designed to automatically generate all fitting models, e.g.\ Bayes nets methods and configurational comparative methods, other methods rely on search heuristics that zoom in on one best fitting model only, e.g.\ logic regression or regression analytic methods, more generally. Whereas model ambiguities have long been a thoroughly investigated topic in certain traditions, such as Bayes nets methods, they are only beginning to be studied in the literature on configurational comparative methods. %, in particular on QCA. %There is still an unfortunate practice of model-underreporting in QCA studies. In fact, most QCA software regularly fails to find all data-fitting models. The only currently available QCA program that recovers the whole model space by default is \pkg{QCApro} \citep{Thiem2018}.

CNA---on a par with any other method---cannot disambiguate what is empirically underdetermined. Rather, it draws those and only those causal conclusions for which the data \emph{de facto} contain evidence. In cases of empirical underdetermination it therefore renders transparent all data-fitting models and leaves the disambiguation up to the analyst.

That \code{cna()} and \code{csf()} issue multiple solutions for some data input \code{x} does not necessarily mean that \code{x} is deficient. In fact, even data that are \emph{ideal} by all quality standards of configurational causal modeling can give rise to model ambiguities. The following simulates a case in point:
<<ambigu1>>=
dat7 <- selectCases("a*B + A*b + B*C <-> D")
printCols <- c("condition", "con", "cov", "exhaustiveness","faithfulness")
csf(cna(dat7, details = c("ex", "fa")))[printCols]
@
\code{dat7} induces perfect consistency and coverage scores and is free of fragmentation; it contains all and only the configurations that are compatible with the target structure, which accordingly is exhaustively and faithfully reflected in \code{dat7}. Nonetheless, two models can be inferred. The causal structures expressed by these two models generate the exact same data, meaning they are \emph{empirically indistinguishable}. 

Although a unique solution is more determinate and, thus, preferable to multiple solutions, the fact that \code{cna()} and \code{csf()} generate multiple equally data-fitting models is not generally an uninformative result. In the above example, both models feature $a\att B \, + \, A\att b$. That is, the data contain enough evidence to establish the joint relevance of $a\att B$ and of $A\att b$ for $D$ (on alternative paths). What is more, it can be conclusively inferred that $D$ has a further complex cause, \emph{viz.}\ either $A\att C$ or $B\att C$. It is merely an open question which of these candidate causes is actually operative.

That different model candidates have some \emph{msc} in common is a frequent phenomenon. Here's a real-life example, where two alternative causes, \emph{viz.} $C\id 1\; + \; F\id 2$, are present in all solutions:
<<ambigu2>>=
csf(cna(d.pban, cov = .95, maxstep = c(3, 5, 10)))["condition"]
@
Such commonalities can be reported as conclusive results.

Moreover, even though multiple solutions do not permit pinpointing the causal structure behind an outcome, they nonetheless allow for constraining the range of possibilities. In a context where the causes of some outcome are unknown it amounts to a significant gain of scientific insight when a study can show that the structure behind that outcome has one of a small number of possible forms, even if it cannot determine which one exactly.

However, the larger the amount of data-fitting solutions and the lower the amount of commonalities among them, the lower the overall informativeness of a CNA output. Indeed, if data fragmentation is high, meaning if there are many unobserved possible configurations, the ambiguity ratio in configurational causal modeling can reach dimensions where nothing at all can be concluded about the data-generating structure any more. Hence, a highly ambiguous result is on a par with a null result. A telling example of this sort is \code{d.volatile} which was discussed in section \ref{maxstep} above (cf.\ also \citealp{BaumgartnerAmbigu}).

%As the problem of model ambiguities is still under-investigated in the CNA literature, there do not yet exist conventionalized guidelines for how to proceed in cases of ambiguities. 
The model fit scores and solution attributes reported in the output objects of \code{cna()} and \code{csf()} often provide some leverage to narrow down the space of model candidates. For instance, if, in a particular discovery context, there is reason to assume that data have been collected as exhaustively as possible, to the effect that most configurations compatible with an investigated causal structure should be contained in the data, the model space may be restricted to \emph{csf} with a high score on exhaustiveness. By way of example, for \code{d.pban} a total of 14 \emph{csf} are built at \code{cov = .95}:
<<ambigu3>>=
ana.pban <- cna(d.pban, cov = .95, maxstep = c(6, 6, 10),
   details = c("fa", "ex"))
csf.pban <- csf(ana.pban)
length(csf.pban$condition)
@
If only \emph{csf} with \code{exhaustiveness >= .85} are considered, the amount of candidate \emph{csf} is reduced to 2:
<<ambigu4>>=
csf.pban.ex <- subset(csf.pban, exhaustiveness >= .85)
length(csf.pban.ex$condition)
@
To also resolve this final ambiguity, complexity may be brought to bear. Among equally data-fitting models the less complex ones are generally preferable because they are less likely to be overfitted and make less causal claims, resulting in a lower error risk. In the above example, if complexity is required to be as low as possible, only one model remains:
<<ambigu5>>=
subset(csf.pban.ex, complexity == min(csf.pban.ex$complexity))
@



Clearly though, the fit parameters and solution attributes provided by \code{cna()} and \code{csf()} will not always provide a basis for complete ambiguity elimination. The evidence contained in data is often insufficient to draw determinate causal conclusions. In such instances, data-external sources of information, such as prior causal knowledge or background theories, may be available that can be used to suitably constrain the search space of \code{cna()} via the arguments \code{outcome}, \code{ordering}, or \code{exclude} (see section \ref{order} above). This tends to bring down ambiguities significantly. Moreover, the next section will show how knowledge about individual cases in the data can be leveraged to select among the model candidates. Nevertheless, it may also be impossible to resolve all ambiguities when the evidence in the data is complemented by data-external sources of information. 

The most important course of action in the face of ambiguities is to \emph{render them transparent}. By default, readers of CNA publications should be informed about the degree of ambiguity. Full transparency with respect to model ambiguities, first, allows readers to determine for themselves how much confidence to have in the conclusions drawn in a study, and second, paves the way for follow-up studies that are purposefully designed to resolve previously encountered ambiguities.



\subsection["Back to the cases"]{``Back to the cases''}\label{back}


When CNA is applied to small- or intermediate-$N$ data, researchers may be familiar with some or all of the cases in their data. For instance, they may know that in a particular case certain causes of an outcome are operative while others are not. Or they may know why certain cases are outliers or why others feature an outcome but none of the potential causes. A proper interpretation of a CNA result may therefore require that the performance of the obtained models be assessed on the case level and against the background of the available case knowledge.  

The function that facilitates the evaluation of recovered \emph{msc}, \emph{asf}, and \emph{csf} on the case level is \code{condition(x, ct, measures)}. Its first input is a character vector \code{x} specifying Boolean expres\-sions, typically \emph{asf} or \emph{csf}. The second input is a data frame or configuration table \code{ct}. The \code{measures} argument is the same as in the \code{cna()} function (see section \ref{cons} above): it expects a character vector of length 2 indicating the measures to be used for evaluating sufficiency and necessity. In case of $cs$ or $mv$ data, the output of \code{condition()} then highlights in which cases \code{x} is instantiated, whereas for $fs$ data, the output lists relevant membership scores in exogenous and endogenous factors. Moreover, if \code{x} is an \emph{asf} or \emph{csf}, \code{condition()} issues their scores on the chosen \code{measures}. 

% Instead of a configuration table, it is also possible to give \code{condition()} a data frame as second input. In this case, the data type must be specified using the \code{type} argument. To abbreviate the specification of the data type, the functions \code{cscond(x, ct)}, \code{mvcond(x, ct)}, and \code{fscond(x, ct)} are available as shorthands.

To illustrate, we re-analyze \code{d.autonomy}:
<<back1, results=hide>>=
dat.aut.2 <- d.autonomy[15:30, c("AU","EM","SP","CO","RE","DE")]
ana.aut.5 <- cna(dat.aut.2, outcome = c("EM","AU"), con = .91, cov = .91)
condition(csf(ana.aut.5)$condition, dat.aut.2)
@
That function call returns a list of three tables, each corresponding to one of the three \emph{csf} contained in \code{ana.aut.5} and breaking down the relevant \emph{csf} to the case level by contrasting the membership scores in the left-hand and right-hand sides of the component \emph{asf}. A case with a higher left-hand score is one that pulls down the sufficiency measure (e.g.\ standard consistency), whereas a case with a higher right-hand score pulls down the necessity measure (e.g.\ standard coverage). For each \emph{csf}, \code{condition()} moreover returns overall scores on the chosen evaluation \code{measures} as well as the corresponding scores for the component \emph{asf}.

The three \emph{csf} in \code{ana.aut.5} differ only in regard to their component \emph{asf} for outcome $AU$. The function \code{group.by.outcome(condList)}, which takes an output object \code{condList} of \code{condition()} as input, lets us more specifically compare these different \emph{asf} with respect to how they fare on the case level. 
<<back2>>=
group.by.outcome(condition(asf(ana.aut.5)$condition, dat.aut.2))$AU
@
The first three columns of that table list the membership scores of each case in the left-hand sides of the \emph{asf}, and the fourth column reports the membership scores in $AU$. The table shows that the first \emph{asf} ($SP \leftrightarrow AU$) outperforms the other \emph{asf} in cases ENacg3/6/7, ENacto1, ENacosa1, and ENacat3, while it is outperformed by another \emph{asf} in cases ENacg2 and ENacg4. In all other cases, the three solution candidates fare equally. If the analyst is closely familiar with some of these cases, performance differences on the case level can help to choose among the candidates. For instance, if it is known that there are no other factors operative in case ENacg7 than the ones contained in \code{dat.aut.2}, it follows that ENacg7's full membership in $AU$ must be brought about by $SP$---which, in turn, disqualifies the other solutions. By contrast, if the absence of other relevant factors can be assumed for case ENacg4, the \emph{asf} featuring $SP$ as cause of $AU$ is disqualified.


\section[Benchmarking]{Benchmarking}\label{bench}


Benchmarking the reliability of a method of causal inference is an essential element of method development and validation. In a nutshell, it amounts to testing to what degree the benchmarked method recovers the true data-generating structure $\Delta$ or proper substructures of $\Delta$ from data of varying quality. As $\Delta$ is not normally known in real-life discovery contexts, the reliability of a method cannot be assessed by applying it to real-life data. Instead, reliability benchmarking is done in so-called \emph{inverse searches}, which reverse the order of causal discovery as it is commonly conducted in scientific practice. An inverse search comprises three steps:
\begin{enumerate}\itemsep0pt\item[(1)] a data-generating causal structure $\Delta$ is presupposed/drawn (as ground truth),
\item[(2)] artificial data $\delta$ is simulated from $\Delta$, possibly featuring various deficiencies (e.g.\ noise or fragmentation),  
\item[(3)] $\delta$ is processed by the tested method in order to check whether its output meets the tested reliability benchmark. \end{enumerate} 

A benchmark test can measure various properties of a method's output, for instance, whether it is error-free, correct or complete, etc. 
As real-life data are often fragmented, methods for MINUS discovery typically do not infer the complete $\Delta$ from a real-life $\delta$ but only proper substructures thereof (see section \ref{inference}). Thus, since completeness is not CNA's primary aim, it should likewise not be the primary reliability benchmark for CNA; it is more important that its output scores high on \emph{error-freeness} and \emph{correctness}. 

CNA's output, \emph{viz.}\ the issued set \textbf{\textsc{csf}$_{\mathbf{O}}$} of \emph{csf}, is error-free iff it does not entail a causal claim that is false of the ground truth $\Delta$ (i.e.\ no false positive). 
That can be satisfied in two ways: either (i) \textbf{\textsc{csf}$_{\mathbf{O}}$} is empty, meaning no causal inferences are drawn, or (ii) \textbf{\textsc{csf}$_{\mathbf{O}}$} contains at least one\footnote{Recall from section \ref{ambigu} that an output containing multiple solutions is to be interpreted disjunctively; and a disjunction of solutions is true iff at least one solution is true.} solution $\mathbf{m}_i$ that is correct of $\Delta$, which is the case iff $\mathbf{m}_i$ is a submodel of $\Delta$ (for details on the submodel relation see section \ref{inference}). So, \textbf{\textsc{csf}$_{\mathbf{O}}$} satisfies the error-freeness benchmark iff it satisfies conditions (i) or (ii). With increasing stringency, \textbf{\textsc{csf}$_{\mathbf{O}}$} can then be said to be correct of $\Delta$ iff condition (ii) is satisfied, meaning \textbf{\textsc{csf}$_{\mathbf{O}}$} actually contains at least one solution $\mathbf{m}_i$ that is a submodel of $\Delta$, and thus correct. Finally, completeness measures the informativeness of \textbf{\textsc{csf}$_{\mathbf{O}}$}, that is, the ratio of causal properties of $\Delta$ captured and revealed by the solutions in \textbf{\textsc{csf}$_{\mathbf{O}}$}.



The \pkg{cna} package provides many functionalities to conduct inverse searches that are tailor-made to benchmark the output of \code{cna()} and \code{csf()}. The functions \code{randomAsf()} and \code{randomCsf()} can be used to draw a data-generating structure $\Delta$ in step (1). \code{randomAsf(x)} generates a structure with a single outcome (i.e.\ a random \emph{asf}) and \code{randomCsf(x)} an acyclic multi-outcome structure (i.e.\ a random \emph{csf}), where \code{x} is a data frame or \code{configTable} defining the factors and their possible values from which the structures are drawn. The function \code{selectCases()}, which has already been discussed in section \ref{simul}, can be employed to simulate data $\delta$ in the course of step (2). Finally, \code{is.submodel(x, y)} determines whether models are related by the submodel relation, which, in turn, helps in assessing whether \textbf{\textsc{csf}$_{\mathbf{O}}$} is true of $\Delta$. 
\code{is.submodel()} takes a character vector \code{x} of \emph{asf} as first input and tests whether the elements of that vector are submodels of \code{y}, which, in turn, is a character string of length 1 representing the target \emph{asf} (i.e.\ $\Delta$). If $\Delta$ is a \emph{csf} with multiple outcomes, the function \code{causal\_submodel(x, y)} from the \href{https://cran.r-project.org/package=frscore}{\pkg{frscore}} package should be used to determine whether \code{x} is true of \code{y}.
%\footnote{Note that \code{is.submodel()} only yields adequate correctness assessments if ground truths have no more than one outcome. If ground truths have multiple outcomes, the function \code{causal\_submodel(x, y)} from the \href{https://cran.r-project.org/package=frscore}{\pkg{frscore}} package should be used instead.} 
Moreover, the function \code{identical.model(x, y)} is available to check whether \code{x} (which must have length 1) and \code{y} are identical.




Against that background, the following might be a core of a error-freeness benchmark test that simulates multi-value data with 20\% missing observations and 10\% random noise (i.e.\ cases incompatible with the ground truth), and that runs \code{cna()} and \code{csf()} using the evaluation \code{measures} of prevalence-adjusted consistency and antecedent-adjusted coverage at \code{con} $=$ \code{cov} $=$ $0.75$ and giving the algorithm and \code{outcome} specification.
<<details, eval=F>>=
# Draw a ground truth with outcomes A=1 and B=2.
fullData <- allCombs(c(4,4,4,4,4)) |> ct2df()
groundTruth <- randomCsf(fullData, outcome = c("A=1", "B=2"), compl = 3)
# Generate ideal data for groundTruth.
idealData <- ct2df(selectCases(groundTruth, fullData))
# Introduce 20% fragmentation.
fragData <- idealData[-sample(1:nrow(idealData), nrow(idealData)*0.2), ] 
# Add 10% random noise (cases incompatible with ground truth).
incompCases <- dplyr::setdiff(fullData, idealData)
x <- rbind(incompCases[sample(1:nrow(incompCases), 
   nrow(fragData) * 0.1), ], fragData)  
# Run CNA with outcome specification and PAcon/AAcov as evaluation 
# measures.
csfs <- csf(cna(x, outcome = c("A=1", "B=2"), con = .75, cov = .75, 
   maxstep = c(3, 3, 12), measures = c("PAcon", "AAcov")))
# Check whether no causal error (no false positive) is returned.
if(length(csfs$condition)==0) {
   TRUE } else {any(unlist(lapply(csfs$condition,
   function(x) frscore::causal_submodel(x, groundTruth, fullData))))}
@
Every re-run of this code chunk generates a different ground truth and different data. In some runs CNA passes the test, in others it does not. To determine CNA's error-freeness ratio under these test conditions, the above core must be embedded in a suitable test loop. To estimate CNA's overall error-freeness ratio, the test conditions should be systematically varied by, for instance, varying the complexity of the ground truth, the degree of fragmentation and noise, the evaluation measures and corresponding thresholds, or by drawing the noise with a bias or supplying CNA with more or less prior causal information via  \code{ordering} or \code{exclude}. Correctness and completeness tests can be designed analogously, by suitably modifying the last line that evaluates the solution object \code{csfs}. For single-outcome structures (\emph{asf}), benchmark tests with some of the above variations have been conducted in \citep{BaumgartnerfsCNA, CNA_LR, Swiatczak2024, DeSouter2024,newMeasures}; corresponding tests for multi-outcome structures (\emph{csf}) have been carried out in \citep{Parkkinen2023}.






\section{Summary}

This vignette introduced the theoretical foundations as well as the main functions of the \pkg{cna} \proglang{R} package for configurational causal inference and modeling with Coincidence Analysis (CNA). Moreover, we explained how to interpret the output of CNA, provided some guidance for how to use various model fit parameters for the purpose of ambiguity reduction, and supplied a benchmarking template. 

CNA is currently the only method searching for (M)INUS causation in data that builds multi-outcome models and, hence, not only orders causes conjunctively and disjunctively but also sequentially. %conjunctivity and disjunctivity, but also sequentiality.
Moreover, it builds causal models on the basis of a bottom-up algorithm that is unique among configurational comparative methods and gives CNA an edge over other methods in guaranteeing the redundancy-freeness of its models, which, in turn, is crucial for their causal interpretability. Overall, CNA constitutes a powerful methodological alternative for researchers interested in causal structures featuring conjunctivity and disjunctivity. The \pkg{cna} package makes that inferential power available to end-users.

\section*{Acknowledgments}
We are grateful to Alrik Thiem, Martyna Klein, Jonathan Freitas, and Luna De Souter for helpful comments on earlier drafts of this vignette, and we thank the Toppforsk-program of the Trond Mohn Foundation and the University of Bergen (grant nr.\ 811886), the Research Council of Norway (grant nr.\ 326215), and the Swiss National Science Foundation (grant nr.\ PP00P1\_144736/1) for generous support of the research behind the \pkg{cna} package over the years.


  % \bibliography{integra}
\begin{thebibliography}{53}
\newcommand{\enquote}[1]{``#1''}
\providecommand{\natexlab}[1]{#1}
\providecommand{\url}[1]{\texttt{#1}}
\providecommand{\urlprefix}{URL }
\expandafter\ifx\csname urlstyle\endcsname\relax
  \providecommand{\doi}[1]{doi:\discretionary{}{}{}#1}\else
  \providecommand{\doi}{doi:\discretionary{}{}{}\begingroup
  \urlstyle{rm}\Url}\fi
\providecommand{\eprint}[2][]{\url{#2}}

\bibitem[{Amb\"uhl and Baumgartner(2022)}]{cnaOptRef}
Amb\"uhl M, Baumgartner M (2022).
\newblock \emph{\pkg{cnaOpt}: Optimizing Consistency and Coverage in
  Configurational Causal Modeling.}
\newblock \textsf{R} Package Version 0.5.2.
\newblock \urlprefix\url{https://cran.r-project.org/package=cnaOpt}.

\bibitem[{Baumgartner(2009{\natexlab{a}})}]{Baumgartner:2007a}
Baumgartner M (2009{\natexlab{a}}).
\newblock \enquote{Inferring Causal Complexity.}
\newblock \emph{Sociological Methods \& Research}, \textbf{38}, 71--101.

\bibitem[{Baumgartner(2009{\natexlab{b}})}]{Baumgartner:2008}
Baumgartner M (2009{\natexlab{b}}).
\newblock \enquote{Uncovering Deterministic Causal Structures: A {B}oolean
  Approach.}
\newblock \emph{Synthese}, \textbf{170}, 71--96.

\bibitem[{Baumgartner(2013)}]{Baumgartner:actual}
Baumgartner M (2013).
\newblock \enquote{A Regularity Theoretic Approach to Actual Causation.}
\newblock \emph{Erkenntnis}, \textbf{78}, 85--109.

\bibitem[{Baumgartner(2015)}]{Baumgartner:pars}
Baumgartner M (2015).
\newblock \enquote{Parsimony and Causality.}
\newblock \emph{Quality \& Quantity}, \textbf{49}, 839--856.

\bibitem[{Baumgartner(2020)}]{BaumCaus}
Baumgartner M (2020).
\newblock \enquote{Causation.}
\newblock In D~Berg-Schlosser, B~Badie, L~Morlino (eds.), \emph{The {SAGE}
  Handbook of Political Science}, pp. 305--321. SAGE, London.

\bibitem[{Baumgartner and Amb\"uhl(2020)}]{BaumgartnerfsCNA}
Baumgartner M, Amb\"uhl M (2020).
\newblock \enquote{Causal Modeling with Multi-Value and Fuzzy-Set Coincidence
  Analysis.}
\newblock \emph{Political Science Research and Methods}, \textbf{8}, 526--542.
\newblock \doi{10.1017/psrm.2018.45}.

\bibitem[{Baumgartner and Amb\"{u}hl(2021)}]{optimize}
Baumgartner M, Amb\"{u}hl M (2021).
\newblock \enquote{Optimizing Consistency and Coverage in Configurational
  Causal Modeling.}
\newblock \emph{Sociological Methods \& Research}, \textbf{52}(3), 1288--1320.
\newblock \doi{10.1177/0049124121995554}.

\bibitem[{Baumgartner and Falk(2023{\natexlab{a}})}]{BaumFalk}
Baumgartner M, Falk C (2023{\natexlab{a}}).
\newblock \enquote{Boolean Difference-Making: A Modern Regularity Theory of
  Causation.}
\newblock \emph{The British Journal for the Philosophy of Science},
  \textbf{74}(1), 171--197.
\newblock \doi{10.1093/bjps/axz047}.

\bibitem[{Baumgartner and Falk(2023{\natexlab{b}})}]{CNA_LR}
Baumgartner M, Falk C (2023{\natexlab{b}}).
\newblock \enquote{Configurational Causal Modeling and Logic Regression.}
\newblock \emph{Multivariate Behavioral Research}, \textbf{58}(2), 292--310.
\newblock \doi{10.1080/00273171.2021.1971510}.

\bibitem[{Baumgartner and Thiem(2017)}]{BaumgartnerAmbigu}
Baumgartner M, Thiem A (2017).
\newblock \enquote{Model Ambiguities in Configurational Comparative Research.}
\newblock \emph{Sociological Methods \& Research}, \textbf{46}(4), 954--987.

\bibitem[{Baumgartner and Thiem(2020)}]{Baumgartner:simul}
Baumgartner M, Thiem A (2020).
\newblock \enquote{Often Trusted but Never (Properly) Tested: Evaluating
  {Q}ualitative {C}omparative {A}nalysis.}
\newblock \emph{Sociological Methods \& Research}, \textbf{49}, 279--311.
\newblock \doi{10.1177/0049124117701487}.

\bibitem[{Beirlaen \emph{et~al.}(2018)Beirlaen, Leuridan, and Van
  De~Putte}]{Beirlaen2018}
Beirlaen M, Leuridan B, Van De~Putte F (2018).
\newblock \enquote{A logic for the discovery of deterministic causal
  regularities.}
\newblock \emph{Synthese}, \textbf{195}(1), 367--399.
\newblock \doi{10.1007/s11229-016-1222-x}.

\bibitem[{Bowran(1965)}]{Bowran:1965}
Bowran AP (1965).
\newblock \emph{A Boolean Algebra. Abstract and Concrete}.
\newblock Macmillan, London.

\bibitem[{Brambor \emph{et~al.}(2006)Brambor, Clark, and Golder}]{brambor2006}
Brambor T, Clark WR, Golder M (2006).
\newblock \enquote{Understanding Interaction Models: Improving Empirical
  Analyses.}
\newblock \emph{Political Analysis}, \textbf{14}(1), 63--82.
\newblock \doi{10.1093/pan/mpi014}.

\bibitem[{Braumoeller(2015)}]{QCAfalsePositive}
Braumoeller B (2015).
\newblock \emph{\textbf{QCAfalsePositive}: Tests for Type I Error in
  Qualitative Comparative Analysis (QCA)}.
\newblock R package version 1.1.1,
  \urlprefix\url{https://CRAN.R-project.org/package=QCAfalsePositive}.

\bibitem[{Bretto(2013)}]{Bretto2013}
Bretto A (2013).
\newblock \emph{Hypergraph Theory: An Introduction}.
\newblock Springer International Publishing.

\bibitem[{Cronqvist and Berg-Schlosser(2009)}]{cronqvist2009}
Cronqvist L, Berg-Schlosser D (2009).
\newblock \enquote{Multi-Value {QCA} ({mvQCA}).}
\newblock In B~Rihoux, CC~Ragin (eds.), \emph{Configurational Comparative
  Methods: Qualitative Comparative Analysis ({QCA}) and Related Techniques},
  pp. 69--86. Sage Publications, London.

\bibitem[{Csikszentmihalyi(1975)}]{boredom1975}
Csikszentmihalyi M (1975).
\newblock \emph{Beyond Boredom and Anxiety}.
\newblock Jossey-Bass Publishers, San Francisco.

\bibitem[{Culverhouse \emph{et~al.}(2002)Culverhouse, Suarez, Lin, and
  Reich}]{Culverhouse2002}
Culverhouse R, Suarez BK, Lin J, Reich T (2002).
\newblock \enquote{A Perspective on Epistasis: Limits of Models Displaying No
  Main Effect.}
\newblock \emph{The American Journal of Human Genetics}, \textbf{70}(2),
  461--471.
\newblock \doi{10.1086/338759}.

\bibitem[{De~Souter(2024)}]{DeSouter2024}
De~Souter L (2024).
\newblock \enquote{Evaluating {B}oolean Relationships in Configurational
  Comparative Methods.}
\newblock \emph{Journal of Causal Inference}, \textbf{12}(1).
\newblock \doi{10.1515/jci-2023-0014}.

\bibitem[{De~Souter and Baumgartner(2025)}]{newMeasures}
De~Souter L, Baumgartner M (2025).
\newblock \enquote{New sufficiency and necessity measures for model building with Coincidence Analysis.}
\newblock \emph{Zenodo}. 
\newblock \urlprefix\url{https://doi.org/10.5281/zenodo.13619580}.

\bibitem[{Dusa(2024)}]{QCARef}
Dusa A (2024).
\newblock \emph{\pkg{QCA}: A Package for {Q}ualitative {C}omparative
  {A}nalysis}.
\newblock \textsf{R} Package Version 3.23.
\newblock \urlprefix\url{https://cran.r-project.org/package=QCA}.

\bibitem[{Eberhardt(2013)}]{eberhardt2013}
Eberhardt F (2013).
\newblock \enquote{Experimental Indistinguishability of Causal Structures.}
\newblock \emph{Philosophy of Science}, \textbf{80}(5), 684--696.

\bibitem[{Gallo \emph{et~al.}(1993)Gallo, Longo, Pallottino, and
  Nguyen}]{Gallo1993}
Gallo G, Longo G, Pallottino S, Nguyen S (1993).
\newblock \enquote{Directed Hypergraphs and Applications.}
\newblock \emph{Discrete Applied Mathematics}, \textbf{42}(2), 177--201.
\newblock \doi{10.1016/0166-218X(93)90045-P}.

\bibitem[{Gil-Pons \emph{et~al.}(2024)Gil-Pons, Ward, and Miller}]{Pons2024}
Gil-Pons R, Ward M, Miller L (2024).
\newblock \enquote{Finding (s,d)-Hypernetworks in F-Hypergraphs is NP-Hard.}
\newblock \emph{Information Processing Letters}, \textbf{184}, 106433.
\newblock \doi{10.1016/j.ipl.2023.106433}.

\bibitem[{{Gra{\ss}hoff} and May(2001)}]{grasshoff2001}
{Gra{\ss}hoff} G, May M (2001).
\newblock \enquote{Causal Regularities.}
\newblock In W~Spohn, M~Ledwig, M~Esfeld (eds.), \emph{Current Issues in
  Causation}, pp. 85--114. Mentis, Paderborn.

\bibitem[{Greenland \emph{et~al.}(1999)Greenland, Pear, and
  Robins}]{greenland1999}
Greenland S, Pear J, Robins JM (1999).
\newblock \enquote{Causal Diagrams for Epidemiologic Research.}
\newblock \emph{Epidemiology}, \textbf{10}(1), 37--48.

\bibitem[{H\'ajek(1998)}]{Hajek1998}
H\'ajek P (1998).
\newblock \emph{Metamathematics of Fuzzy Logic}.
\newblock Kluwer, Dordrecht.

\bibitem[{Hume(1999 (1748))}]{Hume:1999}
Hume D (1999 (1748)).
\newblock \emph{An Enquiry Concerning Human Understanding}.
\newblock Oxford University Press, Oxford.

\bibitem[{Kalisch \emph{et~al.}(2012)Kalisch, Maechler, Colombo, Maathuis, and
  Buehlmann}]{Kalisch2012}
Kalisch M, Maechler M, Colombo D, Maathuis MH, Buehlmann P (2012).
\newblock \enquote{Causal Inference Using Graphical Models with the \textsf{R}
  Package \pkg{pcalg}.}
\newblock \emph{Journal of Statistical Software}, \textbf{47}(11), 1--26.

\bibitem[{Kooperberg and Ruczinski(2005)}]{Kooperberg2005}
Kooperberg C, Ruczinski I (2005).
\newblock \enquote{Identifying Interacting {SNP}s Using {M}onte {C}arlo Logic
  Regression.}
\newblock \emph{Genetic Epidemiology}, \textbf{28}(2), 157--170.
\newblock \doi{10.1002/gepi.20042}.

\bibitem[{Kooperberg and Ruczinski(2023)}]{LogicReg}
Kooperberg C, Ruczinski I (2023).
\newblock \emph{\pkg{LogicReg}: Logic Regression}.
\newblock \proglang{R} package version 1.6.6.
\newblock \urlprefix\url{https://CRAN.R-project.org/package=LogicReg}.

\bibitem[{Lemmon(1965)}]{Lemmon:1965}
Lemmon EJ (1965).
\newblock \emph{Beginning Logic}.
\newblock Chapman \& Hall, London.

\bibitem[{Mackie(1974)}]{Mackie:1974}
Mackie JL (1974).
\newblock \emph{The Cement of the Universe. A Study of Causation}.
\newblock Clarendon Press, Oxford.

\bibitem[{Oana \emph{et~al.}(2025)Oana, Medzihorsky, Quaranta, and
  Schneider}]{SetMethods}
Oana IE, Medzihorsky J, Quaranta M, Schneider CQ (2025).
\newblock \emph{\textbf{SetMethods}: Functions for Set-Theoretic Multi-Method
  Research and Advanced QCA}.
\newblock R package version 4.1,
  \urlprefix\url{https://CRAN.R-project.org/package=SetMethods}.

\bibitem[{Parkkinen and Baumgartner(2023)}]{Parkkinen2023}
Parkkinen VP, Baumgartner M (2023).
\newblock \enquote{Robustness and Model Selection in Configurational Causal
  Modeling.}
\newblock \emph{Sociolocial Methods \& Research}, \textbf{52}(1), 176--208.

\bibitem[{Parkkinen and Baumgartner(2024)}]{frscore-pkg}
Parkkinen VP, Baumgartner M (2024).
\newblock \emph{\pkg{frscore}: Functions for Calculating Fit-Robustness of
  {CNA}-solutions}.
\newblock \textsf{R} Package Version 0.4.1.
\newblock \urlprefix\url{https://CRAN.R-project.org/package=frscore}.

\bibitem[{Ragin(2006)}]{ragin2006}
Ragin CC (2006).
\newblock \enquote{Set Relations in Social Research: Evaluating Their
  Consistency and Coverage.}
\newblock \emph{Political Analysis}, \textbf{14}(3), 291--310.

\bibitem[{Ragin(2008)}]{Ragin:2008}
Ragin CC (2008).
\newblock \emph{Redesigning Social Inquiry: Fuzzy Sets and Beyond}.
\newblock University of Chicago Press, Chicago.

\bibitem[{Rihoux and Ragin(2009)}]{Rihoux:2009}
Rihoux B, Ragin CC (eds.) (2009).
\newblock \emph{Configurational Comparative Methods. Qualitative Comparative
  Analysis (QCA) and Related Techniques}.
\newblock Sage, Thousand Oaks.

\bibitem[{Ruczinski \emph{et~al.}(2003)Ruczinski, Kooperberg, and
  LeBlanc}]{Ruczinski2003}
Ruczinski I, Kooperberg C, LeBlanc M (2003).
\newblock \enquote{Logic Regression.}
\newblock \emph{Journal of Computational and Graphical Statistics},
  \textbf{12}(3), 475--511.
\newblock \doi{10.1198/1061860032238}.

\bibitem[{Schneider and Wagemann(2012)}]{Schneider:2012}
Schneider CQ, Wagemann C (2012).
\newblock \emph{Set-Theoretic Methods: A User's Guide for Qualitative
  Comparative Analysis ({QCA}) and Fuzzy-Sets in the Social Sciences}.
\newblock Cambridge University Press, Cambridge.

\bibitem[{Schwender and Tietz(2024)}]{logicFS}
Schwender H, Tietz T (2024).
\newblock \emph{\pkg{logicFS}: Identification of SNP Interactions}.
\newblock \proglang{R} package version 2.26.0.
\newblock \doi{10.18129/B9.bioc.logicFS}.

\bibitem[{Siblini \emph{et~al.}(2020)Siblini, Fr\'ery, He-Guelton, Obl\'e, and
  Wang}]{siblini_master_2020}
Siblini W, Fr\'ery J, He-Guelton L, Obl\'e F, Wang YQ (2020).
\newblock \enquote{Master Your Metrics with Calibration.}
\newblock In MR~Berthold, A~Feelders, G~Krempl (eds.), \emph{Advances in
  {Intelligent} {Data} {Analysis} {XVIII}}, pp. 457--469. Springer, Cham.
\newblock \doi{10.1007/978-3-030-44584-3_36}.

\bibitem[{Simon(1954)}]{simon1954}
Simon HA (1954).
\newblock \enquote{Spurious Correlation: A Causal Interpretation.}
\newblock \emph{Journal of the American Statistical Association},
  \textbf{49}(267), 467--479.

\bibitem[{Spirtes \emph{et~al.}(2000)Spirtes, Glymour, and
  Scheines}]{Spirtes:2000}
Spirtes P, Glymour C, Scheines R (2000).
\newblock \emph{Causation, Prediction, and Search}.
\newblock 2 edition. MIT Press, Cambridge.

\bibitem[{Swiatczak(2021)}]{Swiatczak2021}
Swiatczak MD (2021).
\newblock \enquote{Different Algorithms, Different Models.}
\newblock \emph{Quality \& Quantity}, \textbf{56}(4), 1913--1937.
\newblock \doi{10.1007/s11135-021-01193-9}.

\bibitem[{Swiatczak and Baumgartner(2024)}]{Swiatczak2024}
Swiatczak MD, Baumgartner M (2024).
\newblock \enquote{Data Imbalances in Coincidence Analysis: A Simulation
  Study.}
\newblock \emph{Sociological Methods \& Research}.
\newblock ISSN 1552-8294.
\newblock \doi{10.1177/00491241241227039}.

\bibitem[{Thiem(2018)}]{Thiem2018}
Thiem A (2018).
\newblock \emph{\pkg{QCApro}: Advanced Functionality for Performing and
  Evaluating {Q}ualitative {C}omparative {A}nalysis}.
\newblock \textsf{R} Package Version 1.1-2.
\newblock \urlprefix\url{https://CRAN.R-project.org/package=QCApro}.

\bibitem[{Thiem and Du\c{s}a(2013)}]{Thiem:2013}
Thiem A, Du\c{s}a A (2013).
\newblock \emph{Qualitative Comparative Analysis With {R}: A User's Guide}.
\newblock Springer, New York, NY.

\bibitem[{Whitaker \emph{et~al.}(2020)Whitaker, Sperber, Birken, Baumgartner,
  Thiem, Cragun, Damschroder, Miech, and Slade}]{Birken:CNA}
Whitaker RG, Sperber N, Birken S, Baumgartner M, Thiem A, Cragun D, Damschroder
  L, Miech E, Slade A (2020).
\newblock \enquote{Coincidence Analysis: A New Method for Causal Inference in
  Implementation Science.}
\newblock \emph{Implementation Science}, \textbf{15}.
\newblock \doi{10.1186/s13012-020-01070-3}.

\bibitem[{Yakovchenko \emph{et~al.}(2020)Yakovchenko, Miech, Chinman, Chartier,
  Gonzalez, Kirchner, Morgan, Park, Powell, Proctor, Ross, Waltz, and
  Rogal}]{Yakovchenko:2020}
Yakovchenko V, Miech EJ, Chinman MJ, Chartier M, Gonzalez R, Kirchner JE,
  Morgan TR, Park A, Powell BJ, Proctor EK, Ross D, Waltz TJ, Rogal SS (2020).
\newblock \enquote{Strategy Configurations Directly Linked to Higher Hepatitis
  C Virus Treatment Starts: An Applied Use of Configurational Comparative
  Methods.}
\newblock \emph{Medical Care}, \textbf{58}(5).
\newblock \doi{10.1097/MLR.0000000000001319}.

\end{thebibliography}


\section*{Appendix}
\subsection*{Partial structural redundancies} \label{redundant}



% It is not only possible that
% Boolean expressions describing the behavior of single outcomes contain redundant proper parts, but such expressions can themselves---as a whole---be redundant in superordinate structures. For instance, when three \emph{asf} are conjunctively concatenated to \emph{asf}$_1\,\att\,$\emph{asf}$_2\,\att\,$\emph{asf}$_3$ in the first step of stage 4 of the CNA algorithm (see section \ref{algo}), it can happen that \emph{asf}$_1\,\att\,$\emph{asf}$_2\,\att\,$\emph{asf}$_3$ is logically equivalent to \emph{asf}$_1\,\att\,$\emph{asf}$_2$, meaning that \emph{asf}$_3$ makes no difference to accounting for the behavior of the outcomes in that structure and is, thus, redundant. This is called a \emph{structural redundancy} (for a detailed discussion see \citealp{BaumFalk}).
% 
% 
% 
% This type of redundancy is best introduced with a concrete example. Consider the following complex MINUS-formula:
% \begin{equation}(A\att B\, +\, C \; \leftrightarrow \; D)\;\att\;(a \,+\, c \;\leftrightarrow \; E)
% \label{red1} \end{equation}
% \eqref{red1} represents a causal structure such that $A\att B$ and $C$ are two alternative causes of $D$ and $a$ and $c$ are two alternative causes of $E$. That is, the presence of $A$ and $C$ is relevant to $D$ and their absence is  relevant to $E$.
% A possible interpretation of these factors might be the following. Suppose a city has two power stations: a wind farm and a nuclear plant. Let $A$ express that the wind farm is operational and $C$ that the nuclear plant is operational and let operationality be sufficient for a nuclear plant to produce electricity, while a wind farm produces electricity provided it is operational and there is wind ($B$). Hence, the wind farm being operational while it is windy or the nuclear plant being operational ($A\att B \,+\, C$) are two alternative causes of the city being power supplied ($D$). Whereas the wind farm or the nuclear plant not being operational ($a\, +\, c$) are two alternative causes of an alarm being triggered ($E$).
% 
% 
% The following data (\code{dat.redun}) comprise all and only the configurations that are compatible with \eqref{red1}:\label{dat.redun}
% <<redundant1>>=
% (dat.redun <- ct2df(selectCases("(A*B + C <-> D)*(a + c <-> E)")))
% @
% The problem now is that \code{dat.redun} does not only entail the two \emph{asf} contained in \eqref{red1}, \emph{viz.}\ \eqref{rdnD} and \eqref{rdnE}, but also a third one, \emph{viz.}\ \eqref{rdnC}:
% \begin{align}
% A\att B \,+\, C \; \leftrightarrow \; D\label{rdnD}\\
% a \,+\, c \;\leftrightarrow \; E \label{rdnE}\\
% a\att D\, +\, e\; \leftrightarrow \; C  \label{rdnC}
% \end{align}
% That means the behavior of $C$, which is exogenous in the data-generating structure \eqref{red1}, can be expressed as a redundancy-free Boolean function of its two effects $D$ and $E$. \eqref{rdnC}, hence, amounts to an upstream (or backtracking) \emph{asf}, which, obviously, must not be causally interpreted. Indeed, when \eqref{rdnC} is embedded in the superordinate dependency structure \eqref{ex15} that results from a conjunctive concatenation of all \emph{asf} that follow from \code{dat.redun}, it turns out that \eqref{rdnC} is redundant. The reason is that \eqref{ex15} has a proper part which is logically equivalent to \eqref{ex15}, namely \eqref{red1}.
% \begin{equation}
% (A\att B \, +\,  C \; \leftrightarrow \; D)\;\att\;(a \,+\, c \;\leftrightarrow \; E)\;\att\;(a\att D \, +\,  e\; \leftrightarrow \; C) \label{ex15}
% \end{equation}
% \eqref{ex15} and \eqref{red1} state exactly the same about the behavior of the factors in \code{dat.redun}, meaning that \eqref{rdnC} makes no difference to that behavior over and above \eqref{rdnD} and \eqref{rdnE}. By contrast, neither \eqref{rdnD} nor \eqref{rdnE} can be eliminated from \eqref{ex15} such that the remaining expression is logically equivalent to \eqref{ex15}. Both of these downstream \emph{asf} make their own distinctive difference to the behavior of the factors in \code{dat.redun}. The upstream \emph{asf} \eqref{rdnC}, however, is a \emph{structural redundancy} in \eqref{ex15}. \eqref{ex15} must not be causally interpreted because it is not a complex MINUS-formula (see p.\ \pageref{minus_formula} above).
% 
% 
% Accordingly, the \code{csf()} function, which performs stage 4 of the CNA algorithm, removes all structurally redundant \emph{asf} from conjunctions of \emph{asf}; that is, when applied to \code{dat.redun} it returns \eqref{red1}, not \eqref{ex15}:
% <<redundant2>>=
% printCols <- c("condition", "con", "cov", "redundant")
% csf(cna(dat.redun, details = "r"))[printCols]
% @
% % In previous versions (< 3.0) of the \pkg{cna} package, structurally redundant \emph{asf} were not automatically removed but only marked by means of the solution attribute \code{redundant}. The solutions with \code{redundant = TRUE} then had to be further processed by the function \code{minimalizeCsf()}. To reproduce that old behavior, \code{csf()} now has an additional argument \code{minimalizeCsf}, which defaults to \code{TRUE}. If set to \code{FALSE}, structural redundancies are not automatically eliminated. Accordingly, the following call returns \eqref{ex15} and marks it as containing a structural redundancy---and thus as not having (M)INUS form:
% % <<redundant3>>=
% % csf(cna(dat.redun, details = "r"), inus.only = FALSE,
% %     minimalizeCsf = FALSE)[printCols]
% % @
% % As \emph{csf} with \code{redundant = TRUE} must never be causally interpreted, the setting \code{minimalizeCsf = FALSE} is deprecated. It is mainly kept in the package for backwards compatibility and developing purposes. Correspondingly, the solution attribute \code{redundant} is no longer relevant, as \code{cna()} and \code{csf()} no longer output \emph{csf} with structural redundancies in the first place.
% 

As discussed in section \ref{models}, conjunctively concatening atomic MINUS-formulas or \emph{asf} may give rise to structural redundancies (cf.\ also \citealp{BaumFalk}), which are eliminated in stage 4 of the CNA algorithm. While structural redundancies of whole \emph{asf} can occur in both ideal and noisy data, the latter type of data may induce yet another, but related, type of redundancy, which is not documented anywhere in the research literature and, thus, requires explicit discussion in this Appendix.

When data do not feature strict Boolean dependencies, building \emph{csf} from the inventory of \emph{asf} recovered in stage 3 of the CNA algorithm may lead to the redundancy of proper parts of \emph{asf}---parts which are not redundant when those \emph{asf} are considered in isolation. That is, a complex structure can entail that one of its \emph{asf} has a redundant proper part, which redundancy, however, is not visible in the data. We call this a \emph{partial structural redundancy}.

A concrete example helps to clarify the problem. Consider the solutions obtained from analyzing the data set \code{d.autonomy} without elimination of partial structural redundancies. This is achieved by setting the developer argument \code{inus.only} to \code{FALSE}, which adds an additional column, \code{inus}, to the output, indicating with \code{TRUE}/\code{FALSE} whether the formula in the corresponding row qualifies as a MINUS-formula:
<<redundant4a, results=hide, echo=F>>=
options(width=80)
@
<<redundant4>>=
printCols <- c("condition", "con", "cov", "inus")
csf(cna(d.autonomy, ordering = "AU", con = .9, cov = .94,
   maxstep = c(2, 2, 8), inus.only = FALSE))[printCols]
@
Solution \#3 in that list logically entails \eqref{equiv}:
\begin{equation}\label{equiv} 
(SP\att RE \,+\, ci\att cn\; \leftrightarrow\; EM)\,\att\,(ci\, +\, EM \;\leftrightarrow \;SP)
\end{equation}
That is, if the behavior of $EM$ is regulated by the first \emph{asf} in solution \#3 and \eqref{equiv}, $co$ in \#3---for pure logical reasons---cannot make a difference to $SP$ and, hence, is redundant. That partial structural redundancy, however, is not visible in the data \code{d.autonomy} where $EM$ alone (i.e.\ without $co$) is not sufficient for $SP$ with standard consistency $0.9$, which is the threshold chosen for the above analysis. By itself, $EM$ only reaches a standard consistency of 0.891 for $SP$, which can be shown using the \code{condTbl()} function as follows:
<<redundant5>>=
condTbl("EM -> SP", configTable(d.autonomy))
@
Hence, the data suggest that $co$ makes a difference to $SP$, to the effect that meeting \code{con} $ = 0.9$ for all \emph{msc} requires $EM\att co$ (and not $EM$ alone) to be treated as cause of $SP$. At the same time, that \eqref{equiv} logically follows from solution \#3 implies that it is logically excluded that $co$ is a difference-maker of $SP$ in the context of solution \#3. The result is a contradiction: the data call for including $co$ as cause of $SP$, whereas the structure inferred from that data entails not to include $co$.

The case of solution \#6 is analogous. Solution \#6 not only entails \eqref{equiv} but is logically equivalent to it. The upshot is the same: the data determine that some causal relevance relation obtains, which is logically excluded by the very structure inferred from the data. Such inconsistencies cannot be resolved by modifying solutions \#3 and \#6; rather, these solutions are not, and cannot be transformed into, well-formed MINUS-formulas that would meet \code{con}. They must be eliminated from the output. This is exactly what happens when \code{cna()} and \code{csf()} run normally, that is, without setting  \code{inus.only} to \code{FALSE}:
<<redundant6>>=
csf(cna(d.autonomy, ordering = "AU", con = .9, cov = .94,
   maxstep = c(2, 2, 8), details = "inus"))[printCols]
@
% In sum, as of version 3.0 of the \pkg{cna} package, both structural and partial structural redundancies are automatically resolved and eliminated. The functions \code{cna()} and \code{csf()} now exclusively output MINUS-formulas (i.e.\ INUS solutions).
\end{document}