\newpage \section{\label{sub:NN}Neural Network Analysis}
\subsection{\label{sub:Variables}Variables for NN training}
\noindent Following the same procedure as in the previous analysis, we determine the signal and background content of the preselected sample, increase the signal-to-background ratio and, from this, measure the cross-section. The procedure adopted in the p17 analysis was to feed a set of topological variables into an artificial neural network in order to provide the best possible separation between signal and background. As before, the criteria for choosing these variables were high discriminating power and the absence of correlation with the $\tau$ variables. The set is presented below:
\begin{itemize}
\item \textit{\textbf{$H_{T}$}} - the scalar sum of the $p_{T}$ of all jets (here and below including $\tau$ lepton candidates).
\item \textit{\textbf{$\not\!\! E_{T}$ significance}} - since this is the variable that provides the best signal-background separation, we decided to optimize the cut applied to it (Section \ref{sub:NN-optimization}).
\item \textit{\textbf{Aplanarity}} \cite{p17topo} - the normalized momentum tensor is defined as
\begin{equation}
{\cal M}_{ij} = \frac{\sum_{o}p^{o}_{i}p^{o}_{j}}{\sum_{o}|\overrightarrow{p^{o}}|^{2}}
\label{tensor}
\end{equation}
\noindent where $\overrightarrow{p^{o}}$ is the momentum vector of a reconstructed object $o$ and $i$ and $j$ are Cartesian coordinates. From the diagonalization of $\cal M$ we find three eigenvalues $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ satisfying $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$. The aplanarity ${\cal A}$ is given by ${\cal A} = \frac{3}{2}\lambda_{3}$ and measures the flatness of an event; it is therefore defined in the range $0 \leq {\cal A} \leq 0.5$ (see the sketch after this list). Large values of ${\cal A}$ correspond to more spherical events, such as $t\bar{t}$ events, which are typical of decays of heavy objects. On the other hand, both QCD and $W + \mbox{jets}$ events are more planar, since jets in these events are primarily due to initial state radiation.
\item \textit{\textbf{Sphericity}} \cite{p17topo} - defined as ${\cal S} = \frac{3}{2}(\lambda_{2} + \lambda_{3})$, with range $0 \leq {\cal S} \leq 1$, sphericity is a measure of the summed $p^{2}_{\perp}$ with respect to the event axis. In this sense a two-jet event corresponds to ${\cal S} \approx 0$ and an isotropic event to ${\cal S} \approx 1$. $t\bar{t}$ events are very isotropic, being typical of decays of heavy objects, while both QCD and $W + \mbox{jets}$ events are less isotropic because jets in these events come primarily from initial state radiation.
\item \textit{\textbf{Top and $W$ mass likelihood}} - a $\chi^{2}$-like variable, $L\equiv\left(\frac{M_{3j}-m_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$, where $m_{t}$ and $M_{W}$ are the top and $W$ masses (172.4 GeV and 81.02 GeV, respectively) and $\sigma_{t}$ and $\sigma_{W}$ the corresponding resolutions (19.4 GeV and 8.28 GeV, respectively). $M_{3j}$ and $M_{2j}$ are the invariant masses of three-jet and two-jet combinations; we choose the combination that minimizes $L$.
\item \textit{\textbf{Centrality}} - defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum of the energies of the jets.
\item \textit{\textbf{$\cos(\theta^{*})$}} - the cosine of the angle between the beam axis and the highest-$p_T$ jet in the rest frame of all the jets in the event.
\item \textit{\textbf{$\sqrt{s}$}} - the invariant mass of all jets and $\tau$s in the event.
\end{itemize}
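The event-shape quantities above can be made concrete with a short sketch. The following illustrative Python fragment (the function name \texttt{event\_shapes} and its input format are ours, not part of the analysis code) builds the normalized momentum tensor of Equation \ref{tensor} and extracts aplanarity and sphericity from its eigenvalues:
\begin{verbatim}
import numpy as np

def event_shapes(momenta):
    # momenta: (N, 3) array of (px, py, pz) for all jets and
    # tau candidates in the event (illustrative input format).
    p = np.asarray(momenta, dtype=float)
    # Normalized momentum tensor M_ij = sum_o p_i p_j / sum_o |p|^2;
    # its trace is 1, so the eigenvalues sum to 1 by construction.
    M = (p.T @ p) / np.sum(p ** 2)
    # Eigenvalues sorted so that lambda1 >= lambda2 >= lambda3.
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(M))[::-1]
    aplanarity = 1.5 * l3           # A in [0, 0.5]
    sphericity = 1.5 * (l2 + l3)    # S in [0, 1]
    return aplanarity, sphericity
\end{verbatim}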
The chosen variables are ultimately a consequence of the method employed in this analysis: use events from the QCD-enriched loose-tight sample to model QCD events in the signal-rich sample, and use a b-tag veto sample as an independent control sample to check the validity of this background modeling.
%\clearpage
\subsection{\label{sub:NN-variables}Topological NN}
For training the Neural Network we used the Multilayer Perceptron algorithm, as described in \cite{MLPfit}. As explained in Section \ref{sub:Results-of-the}, the first 1400000 events in the ``loose-tight'' sample were used as background for the NN training for tau types 1 and 2, and the first 600000 events of the same sample for the NN training for type 3 taus. This means that the different tau types are treated separately in the topological NN. In both cases 1/3 of the Alpgen sample of $t\bar{t} \rightarrow \tau +jets$ was used for NN training and 2/3 of it for the measurement. When performing the measurement later on (Section \ref{sub:xsect}) we pick the tau with the highest $NN(\tau)$ in the signal sample as the tau candidate, while taus in the loose-tight sample are picked at random, since all of them are regarded as fake taus by being below the cut $NN(\tau) = 0.7$. By doing this we expect to avoid any bias when selecting real taus for the measurement. Figures \ref{fig:nnout_type2_training} and \ref{fig:nnout_type3_training} show the effect of each of the chosen topological event NN input variables on the final output. Figures \ref{fig:nnout_type2} and \ref{fig:nnout_type3} show the NN output resulting from the training described above. It is evident from both figures that high values of the NN output correspond to the signal-enriched region. A sketch of the training arrangement is given at the end of this subsection.
\begin{figure}[h] \includegraphics[scale=0.6]{plots/SetI_NNout_SM_type2_tauQCD.eps} \caption{Training of the topological Neural Network for the type 1 and 2 $\tau$ channel. Upper left: relative impact of each of the input variables; upper right: topological structure; lower right: final signal-background separation of the method; lower left: convergence curves.} \label{fig:nnout_type2_training} \end{figure}
\newpage
\begin{figure}[h] \includegraphics[scale=0.6]{plots/SetI_NNout_SM_type3_tauQCD.eps} \caption{Training of the topological Neural Network for the type 3 $\tau$ channel. Upper left: relative impact of each of the input variables; upper right: topological structure; lower right: final signal-background separation of the method; lower left: convergence curves.} \label{fig:nnout_type3_training} \end{figure}
\begin{figure}[h] \includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeI_II/nnout.eps} \caption{The topological Neural Network output for the type 1 and 2 $\tau$ channel.} \label{fig:nnout_type2} \end{figure}
\newpage
\begin{figure}[t] \includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeIII/nnout.eps} \caption{The topological Neural Network output for the type 3 $\tau$ channel.} \label{fig:nnout_type3} \end{figure}
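As an illustration of the sample bookkeeping described above, here is a minimal sketch using scikit-learn's \texttt{MLPClassifier} as a stand-in for MLPfit (the helper name and input arrays are hypothetical; the actual training was performed with MLPfit \cite{MLPfit}):
\begin{verbatim}
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_topological_nn(X_qcd, X_sig, n_hidden):
    # X_qcd: topological input variables of the loose-tight
    # (fake-tau) events; X_sig: the same variables for Alpgen
    # ttbar -> tau+jets events.  One row per event.
    n_train = len(X_sig) // 3       # 1/3 for training ...
    X = np.vstack([X_qcd, X_sig[:n_train]])
    y = np.concatenate([np.zeros(len(X_qcd)), np.ones(n_train)])
    # One hidden layer with twice as many nodes as input
    # variables, following the convention quoted in the text.
    nn = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500)
    nn.fit(X, y)
    return nn, X_sig[n_train:]      # ... 2/3 for the measurement
\end{verbatim}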
\subsection{\label{sub:NN-optimization}NN optimization}
One difference between the present analysis and the previous p17 analysis is that we performed a NN optimization along with a $\not\!\! E_{T}$ significance optimization. Previously, a cut of $>$ 3.0 was applied to the $\not\!\! E_{T}$ significance at the preselection stage and it was then included as one of the variables for NN training. This time we chose to optimize this cut, since the $\not\!\! E_{T}$ significance is still a good variable for signal-background discrimination (Figure \ref{fig:metl_note}).
It is important to stress that, after the optimization, we performed the analysis with the optimized $\not\!\! E_{T}$ significance cut applied when doing both $\tau$ and b ID (Section \ref{sub:Results-of-the}), i.e.\ after the preselection, where no $\not\!\! E_{T}$ significance cut was applied. We then went back and reprocessed (preselected) all MC samples with the optimized cut. Both results, with the $\not\!\! E_{T}$ significance cut applied during and after preselection, were identical. We chose to present this analysis with the cut applied at the preselection level in order to have a consistent cut flow throughout the analysis (Section \ref{sub:Preselection}).
\begin{figure}[h] \includegraphics[scale=0.5]{plots/metl_allEW.eps} \caption{$\not\!\! E_{T}$ significance distribution for signal and backgrounds.} \label{fig:metl_note} \end{figure}
\newpage
This part of the analysis was split into two steps:
\begin{enumerate}
\item {\bf Set optimization:} we applied an arbitrary cut on the $\not\!\! E_{T}$ significance of $\geq$ 4.0 and varied the set of variables going into the NN training.
\item {\bf $\not\!\! E_{T}$ significance optimization:} after choosing the best set, based on the lowest RMS, we varied the $\not\!\! E_{T}$ significance cut.
\end{enumerate}
The sets of variables considered for the NN training are listed below, where metl denotes the $\not\!\! E_{T}$ significance and topmassl the top and $W$ mass likelihood:
\begin{itemize}
\item \textit{\textbf{Set I}} : $H_{T}$, aplan (aplanarity), sqrts ($\sqrt{s}$)
\item \textit{\textbf{Set II}} : $H_{T}$, aplan, cent (centrality)
\item \textit{\textbf{Set III}} : $H_{T}$, aplan, spher (sphericity)
\item \textit{\textbf{Set IV}} : $H_{T}$, cent, spher
\item \textit{\textbf{Set V}} : aplan, cent, spher
\item \textit{\textbf{Set VI}} : $H_{T}$, aplan, sqrts, spher
\item \textit{\textbf{Set VII}} : $H_{T}$, aplan, sqrts, cent
\item \textit{\textbf{Set VIII}} : $H_{T}$, aplan, sqrts, costhetastar ($\cos(\theta^{*})$)
\item \textit{\textbf{Set IX}} : $H_{T}$, aplan, sqrts, cent, spher
\item \textit{\textbf{Set X}} : $H_{T}$, aplan, sqrts, cent, costhetastar
\item \textit{\textbf{Set XI}} : $H_{T}$, aplan, sqrts, spher, costhetastar
\item \textit{\textbf{Set XII}} : metl, $H_{T}$, aplan, sqrts
\item \textit{\textbf{Set XIII}} : metl, $H_{T}$, aplan, cent
\item \textit{\textbf{Set XIV}} : metl, $H_{T}$, aplan, spher
\item \textit{\textbf{Set XV}} : metl, $H_{T}$, cent, spher
\item \textit{\textbf{Set XVI}} : metl, $H_{T}$, aplan
\item \textit{\textbf{Set XVII}} : metl, $H_{T}$, sqrts
\item \textit{\textbf{Set XVIII}} : metl, aplan, sqrts
\item \textit{\textbf{Set XIX}} : metl, $H_{T}$, cent
\item \textit{\textbf{Set XX}} : metl, $H_{T}$, aplan, sqrts, cent
\item \textit{\textbf{Set XXI}} : metl, $H_{T}$, aplan, cent, spher
\item \textit{\textbf{Set XXII}} : metl, $H_{T}$, aplan, sqrts, spher
\item \textit{\textbf{Set XXIII}} : metl, $H_{T}$, aplan, sqrts, costhetastar
\item \textit{\textbf{Set XXIV}} : metl, sqrts, cent, spher, costhetastar
\item \textit{\textbf{Set XXV}} : metl, $H_{T}$, cent, spher, costhetastar
\item \textit{\textbf{Set XXVI}} : metl, aplan, cent, spher, costhetastar
\item \textit{\textbf{Set XXVII}} : metl, $H_{T}$, aplan, cent, costhetastar
\item \textit{\textbf{Set XXVIII}} : $H_{T}$, aplan, topmassl
\item \textit{\textbf{Set XXIX}} : $H_{T}$, aplan, sqrts, topmassl
\item \textit{\textbf{Set XXX}} : $H_{T}$, aplan, sqrts, cent, topmassl
\item \textit{\textbf{Set XXXI}} : $H_{T}$, aplan, sqrts, costhetastar, topmassl
\item \textit{\textbf{Set XXXII}} : metl, $H_{T}$, topmassl, aplan, sqrts
\item \textit{\textbf{Set XXXIII}} : metl, spher, costhetastar, aplan, cent
%\item \textit{\textbf{Set XXXIV}} : metl, spher, sqrts, topmassl, ktminp
\end{itemize}
The criteria used to decide which variables should enter the NN are the following:
\begin{itemize}
\item No more than 5 variables, to keep the NN simple and stable. More variables lead to instabilities (a different result after each retraining) and require larger training samples.
%\item We want to use the $metl$ (\met significance) variable, since it is the one providing the best discrimination.
\item We do not want to use highly correlated variables in the same NN.
%\item We can not use tau-based variables.
\item We want to use variables with high discriminating power.
\end{itemize}
In order to decide which of these variable sets is optimal, we created an ensemble of 20000 pseudo-datasets, each containing events randomly picked from the QCD, EW and $\ttbar$ templates, with yields fluctuated according to Poisson distributions. Each of these datasets was treated like real data, meaning that all cuts were applied and the shape fit of the topological event NN was performed. The QCD templates for the fit were made from the same ``loose-tight $\tau$ sample'' from which the QCD component of the ``data'' was drawn. The figure of merit chosen is given by Equation \ref{merit} below:
\begin{equation}
f = \frac{N_{fit} - N_{true}}{N_{true}}
\label{merit}
\end{equation}
\noindent where $N_{fit}$ is the number of $t\bar{t}$ pairs given by the fit and $N_{true}$ is the number of $t\bar{t}$ pairs drawn from the Poisson distribution. In both the set and the $\not\!\! E_{T}$ significance optimizations, the lowest RMS of $f$ was used to characterize the best configuration (a sketch of this pseudo-experiment loop is given below). The plots showing the results of the set optimization are found in Appendix \ref{app:set_opt} and are summarized in Table \ref{setopt_table} below, where the RMS and mean of each set are shown. For the NN training it is standard to choose the number of hidden nodes as twice the number of variables used in the training; the number in parentheses after each set ID gives the number of hidden nodes.
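A minimal sketch of this pseudo-experiment loop follows, assuming hypothetical helpers \texttt{draw\_events} (sampling events from the QCD, EW and $\ttbar$ templates) and \texttt{fit\_ttbar} (the topological-NN shape fit); it implements the figure of merit of Equation \ref{merit}:
\begin{verbatim}
import numpy as np

def run_ensemble(n_pseudo, mu_qcd, mu_ew, mu_ttbar,
                 draw_events, fit_ttbar, seed=0):
    # draw_events(source, n) samples n events from a template;
    # fit_ttbar(events) returns (N_fit, sigma_fit) from the shape
    # fit of the topological NN output.  Both are hypothetical.
    rng = np.random.default_rng(seed)
    f = np.empty(n_pseudo)
    for k in range(n_pseudo):
        n_true = rng.poisson(mu_ttbar)  # Poisson-fluctuated truth
        pseudo_data = np.concatenate([
            draw_events("qcd", rng.poisson(mu_qcd)),
            draw_events("ew", rng.poisson(mu_ew)),
            draw_events("ttbar", n_true),
        ])
        n_fit, _sigma_fit = fit_ttbar(pseudo_data)
        f[k] = (n_fit - n_true) / n_true  # figure of merit f
    # The configuration giving the smallest RMS of f is optimal.
    return f.mean(), f.std()
\end{verbatim}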
\begin{table}[htbp]
\begin{tabular}{|c|c|c|}
\hline
Set of variables & RMS & mean \\ \hline \hline
Set1(6) & 0.1642 & 0.0265 \\ \hline
Set2(6) & 0.1840 & 0.0054 \\ \hline
Set3(6) & 0.1923 & 0.0060 \\ \hline
Set4(6) & 0.1978 & 0.0175 \\ \hline
Set5(6) & 0.2385 & 0.0022 \\ \hline
Set6(8) & 0.1687 & 0.0115 \\ \hline
Set7(8) & 0.1667 & 0.0134 \\ \hline
Set8(10) & 0.1668 & 0.0162 \\ \hline
Set9(10) & 0.1721 & 0.0102 \\ \hline
Set10(10) & 0.1722 & 0.0210 \\ \hline
Set11(10) & 0.1716 & 0.0180 \\ \hline
Set12(8) & 0.1662 & 0.0039 \\ \hline
Set13(8) & 0.1819 & 0.0018 \\ \hline
Set14(8) & 0.1879 & 0.0019 \\ \hline
Set15(8) & 0.1884 & -0.0004 \\ \hline
Set16(6) & 0.1912 & 0.0034 \\ \hline
Set17(6) & 0.1768 & 0.0074 \\ \hline
Set18(6) & 0.2216 & -0.0030 \\ \hline
Set19(6) & 0.1921 & 0.0015 \\ \hline
Set20(10) & 0.1620 & 0.0262 \\ \hline
Set21(10) & 0.1753 & 0.0010 \\ \hline
Set22(10) & 0.1646 & 0.0086 \\ \hline
Set23(10) & 0.1683 & 0.0132 \\ \hline
Set24(10) & 0.2053 & 0.0122 \\ \hline
Set25(10) & 0.1906 & 0.0038 \\ \hline
Set26(10) & 0.2130 & 0.0028 \\ \hline
Set27(10) & 0.1859 & 0.0004 \\ \hline
Set28(6) & 0.1910 & -0.0022 \\ \hline
Set29(8) & 0.1587 & 0.0214 \\ \hline
Set30(10) & 0.1546 & 0.0148 \\ \hline
Set31(10) & 0.1543 & 0.0203 \\ \hline
Set32(10) & 0.1468 & 0.0172 \\ \hline
Set33(10) & 0.2201 & 0.0081 \\ \hline
%Set34(10) & 0.1955 & 0.0184 \\ \hline
\end{tabular}
\caption{Results of the set optimization, with a $\not\!\! E_{T}$ significance cut of $\geq$ 4.0 applied to all sets. The number in parentheses refers to the number of hidden nodes in each case.}
\label{setopt_table}
\end{table}
From Table \ref{setopt_table} we see that Set XXXII has the lowest RMS; we therefore chose it as the set to be used in the $\not\!\! E_{T}$ significance optimization, whose results are shown in Appendix \ref{app:metl_opt} and summarized in Table \ref{metlopt_table} below.
\begin{table}[htbp]
\begin{tabular}{|c|c|c|c|}
\hline
Set of variables & $\not\!\! E_{T}$ significance cut & RMS & mean \\ \hline \hline
%Set6(10) & 1.0 & 0.2611 & \\ \hline
%Set6(10) & 1.5 & 0.2320 & \\ \hline
%Set6(10) & 2.0 & 0.2102 & \\ \hline
%Set6(10) & 2.5 & 0.2021 & \\ \hline
Set32(10) & 3.0 & 0.1507 & 0.0157 \\ \hline
Set32(10) & 3.5 & 0.1559 & 0.0189 \\ \hline
Set32(10) & 4.0 & 0.1468 & 0.0172 \\ \hline
Set32(10) & 4.5 & 0.1511 & 0.0153 \\ \hline
Set32(10) & 5.0 & 0.1552 & 0.0205 \\ \hline
%Set6(10) & 5.5 & 0.4008 & \\ \hline
\end{tabular}
\caption{Results of the $\not\!\! E_{T}$ significance optimization, obtained by varying the $\not\!\! E_{T}$ significance cut. The number in parentheses refers to the number of hidden nodes in each case.}
\label{metlopt_table}
\end{table}
The combined results from Tables \ref{setopt_table} and \ref{metlopt_table} show that the best configuration found was Set XXXII with $\not\!\! E_{T}$ significance $\geq$ 4.0. Therefore, this was the configuration used to perform the cross-section measurement. Figure \ref{fig:METsig_RMS} shows the variation of the RMS as a function of the $\not\!\! E_{T}$ significance cut applied.
\begin{figure}[b] \includegraphics[scale=0.4]{plots/METsig-RMS.eps} \caption{RMS as a function of the $\not\!\! E_{T}$ significance cut applied.} \label{fig:METsig_RMS} \end{figure}
\clearpage
In order to check the validity of our ensemble test procedure, it is instructive to plot both the distribution of the predicted number of $t\bar{t}$ pairs and the ``pull'', defined in Equation \ref{pull} below:
\begin{equation}
p = \frac{N_{fit}-N_{true}}{\sigma_{fit}}
\label{pull}
\end{equation}
\noindent where $\sigma_{fit}$ is the error on the number of $t\bar{t}$ pairs given by the fit. Figures \ref{fig:gaus_ttbar} and \ref{fig:pull} show the two aforementioned distributions. From Figure \ref{fig:gaus_ttbar} we see good agreement between the number of $t\bar{t}$ pairs initially set in the ensemble and the measured value, and Figure \ref{fig:pull} shows a Gaussian shape, indicating a good behaviour of the fit uncertainties in the ensembles.
\begin{figure}[t] \includegraphics[scale=0.5]{plots/gaus_ttbar.eps} \caption{Distribution of the output ``measurement'' for an ensemble with 116.9 $\ttbar$ events.} \label{fig:gaus_ttbar} \end{figure}
\begin{figure}[t] \includegraphics[scale=0.5]{plots/pull1-40.eps} \caption{The pull distribution of the ensemble tests.} \label{fig:pull} \end{figure}
%\newpage
\clearpage
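For completeness, here is a minimal sketch of the pull computation of Equation \ref{pull}, assuming arrays of fitted yields, fit errors and true yields collected from the ensemble (all names are illustrative):
\begin{verbatim}
import numpy as np

def pull(n_fit, sigma_fit, n_true):
    # n_fit, sigma_fit: fitted ttbar yields and their fit errors,
    # one entry per pseudo-experiment; n_true: Poisson-drawn truths.
    p = (np.asarray(n_fit) - np.asarray(n_true)) / np.asarray(sigma_fit)
    # Well-behaved fit uncertainties give a unit Gaussian pull:
    # mean compatible with 0 and width compatible with 1.
    return p.mean(), p.std()
\end{verbatim}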