\newpage \section{\label{sub:NN}Neural Network Analysis}
\subsection{\label{sub:Variables}Variables for NN training}
\noindent Following the same procedure as in the p17 analysis, we determine the signal and background content of the preselected sample, increase the signal-to-background ratio and, from this, measure the cross section. In p17, an artificial neural network based on topological characteristics of an event was used to extract the signal from a background-enriched region. As before, the criteria used in choosing the variables were discriminating power and lack of correlation with the $\tau$ variables. The following variables were considered:
\begin{itemize}
\item \textit{\textbf{$H_{T}$}} - The scalar sum of all jet $p_{T}$'s (here and below including $\tau$ lepton candidates). For $H_{T}$ values above $\sim$ 200 GeV we observe a dominance of signal over background.
\item \textit{\textbf{$\not\!\! E_{T}$ significance}} - Computed from the calculated resolutions of the physical objects (jets, electrons, muons and unclustered energy) \cite{p17_note,METsig}. It was chosen and optimized due to its good signal-background discrimination power.
\item \textit{\textbf{Aplanarity}} \cite{p17topo} - The normalized momentum tensor is defined as
\begin{center}
\begin{equation}
{\cal M}_{ab} \equiv \frac{\sum_{i}p_{ia}p_{ib}}{\sum_{i}p^{2}_{i}}
\label{tensor}
\end{equation}
\end{center}
\noindent where $p_{i}$ is the momentum vector and the index $i$ runs over all the jets and the $W$. From the diagonalization of $\cal M$ we obtain three eigenvalues $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ with the constraint $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$. The aplanarity is defined as ${\cal A} = \frac{3}{2}\lambda_{3}$ and measures the flatness of an event. It takes values in the range $0 \leq {\cal A} \leq 0.5$. It was chosen for the NN because large values of $\cal A$ correspond to more spherical events, such as $t\bar{t}$ events, which are typical of cascade decays of heavy objects. On the other hand, both QCD and $W + \mbox{jets}$ events tend to be more collinear, since jets in these events are primarily due to initial state radiation.
\item \textit{\textbf{Sphericity}} \cite{p17topo} - Defined as ${\cal S} = \frac{3}{2}(\lambda_{2} + \lambda_{3})$, with range $0 \leq {\cal S} \leq 1.0$, sphericity is a measure of the summed $p^{2}_{\perp}$. More isotropic events have ${\cal S} \approx 1$ while less isotropic ones have ${\cal S} \approx 0$. Sphericity is a good discriminator since $t\bar{t}$ events are very isotropic, as is typical of the decays of heavy objects, while both QCD and $W + \mbox{jets}$ events are less isotropic because jets in these events come primarily from initial state radiation.
\item \textit{\textbf{Top and $W$ mass likelihood}} - A $\chi^{2}$-like variable, $L\equiv\left(\frac{M_{3j}-m_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$, where $m_{t}$ and $M_{W}$ are the top and $W$ masses (172.4 GeV and 81.02 GeV, respectively) and $\sigma_{t}$ and $\sigma_{W}$ are the corresponding resolutions (19.4 GeV and 8.28 GeV, respectively). $M_{3j}$ and $M_{2j}$ are the invariant masses of 3- and 2-jet combinations, respectively. We choose the combination that minimizes $L$.
\item \textit{\textbf{Centrality}} - Defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum of the energies of the jets. It is used as a discriminating variable since high values ($\sim$ 1.0) are more signal-dominated while low values ($\sim$ 0) are more background-dominated.
\item \textit{\textbf{$\cos(\theta^{*})$}} - The cosine of the angle between the beam axis and the highest-$p_T$ jet in the rest frame of all the jets in the event. $t\bar{t}$ events tend to have lower ($\sim$ 0) $\cos(\theta^{*})$ values, which motivated its choice.
\item \textit{\textbf{$M_{jj\tau}$}} - The invariant mass of all jets and $\tau$s in the event.
\end{itemize}
The chosen variables are ultimately a consequence of the method employed in this analysis: events from the QCD-enriched loose-tight sample are used to model QCD events in the signal-rich sample, and a b-tag veto sample is used as an independent control sample to check the validity of this background modeling. Plots of all the variables described above are found in Appendix \ref{app:discri_var}.
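To make these definitions concrete, the following is a minimal numerical sketch (not the analysis code) of how the momentum-tensor variables and the top/$W$ mass likelihood can be computed from jet four-vectors; the toy jets and the independent treatment of the 2- and 3-jet combinations in $L$ are illustrative assumptions only.
\begin{verbatim}
import numpy as np
from itertools import combinations

def aplanarity_sphericity(p3):
    """Aplanarity and sphericity from the normalized momentum tensor
    M_ab = sum_i p_ia p_ib / sum_i |p_i|^2 defined above, for an array of
    3-momenta p3 with shape (n, 3) (jets plus the reconstructed W)."""
    m = p3.T @ p3 / np.sum(p3 ** 2)
    lam = np.sort(np.linalg.eigvalsh(m))[::-1]    # lambda_1 >= lambda_2 >= lambda_3
    return 1.5 * lam[2], 1.5 * (lam[1] + lam[2])  # A = 3/2 l3, S = 3/2 (l2 + l3)

def inv_mass(p4s):
    """Invariant mass of a set of four-vectors (E, px, py, pz)."""
    tot = np.sum(p4s, axis=0)
    return np.sqrt(max(tot[0] ** 2 - np.dot(tot[1:], tot[1:]), 0.0))

def mass_likelihood(jets, m_t=172.4, m_w=81.02, s_t=19.4, s_w=8.28):
    """Top/W mass likelihood L, minimized over jet combinations.  Whether the
    2-jet (W) pair must come from within the 3-jet (top) combination is not
    specified here; all independent combinations are tried."""
    best = np.inf
    for trip in combinations(range(len(jets)), 3):
        m3 = inv_mass(jets[list(trip)])
        for pair in combinations(range(len(jets)), 2):
            m2 = inv_mass(jets[list(pair)])
            best = min(best, ((m3 - m_t) / s_t) ** 2 + ((m2 - m_w) / s_w) ** 2)
    return best

# Toy event: four hypothetical jet four-vectors (E, px, py, pz) in GeV.
jets = np.array([[120., 80., 40., 60.], [90., -50., 60., 30.],
                 [70., 30., -50., 20.], [60., -20., -30., 40.]])
A, S = aplanarity_sphericity(jets[:, 1:])
print(f"aplanarity={A:.3f}  sphericity={S:.3f}  L_min={mass_likelihood(jets):.2f}")
\end{verbatim}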
%\clearpage
\subsection{\label{sub:NN-variables}Topological NN}
For training the Neural Network we used the Multilayer Perceptron algorithm, as described in \cite{MLPfit}. As explained in Section \ref{sub:Results-of-the}, the first 1400000 events in the ``loose-tight'' sample were used as background in the NN training for taus of Types 1 and 2, and the first 600000 events of the same sample in the NN training for Type 3 taus. In both cases 1/3 of the Alpgen sample of $t\bar{t} \rightarrow \tau +jets$ was used for the NN training and 2/3 of it for the measurement. When performing the measurement later on (Section \ref{sub:xsect}) we pick the tau with the highest $NN(\tau)$ in the signal sample as the tau candidate, while taus in the loose-tight sample are picked at random, since all of them are regarded as fake taus by being below the cut $NN(\tau) = 0.7$. By doing this we expect to avoid any bias when selecting real taus for the measurement. Figures \ref{fig:nnout_type2_training} and \ref{fig:nnout_type3_training} show the effect of each of the chosen topological event NN input variables on the final output. Figures \ref{fig:nnout_type2} and \ref{fig:nnout_type3} show the NN output resulting from the training described above. It is evident from both figures that high values of the NN output correspond to the signal-enriched region.
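For orientation, the sketch below reproduces the structure of such a training with a single hidden layer, using scikit-learn's \texttt{MLPClassifier} as a stand-in for the MLPfit package actually used; the Gaussian toy inputs, sample sizes and hidden-layer size are purely illustrative.
\begin{verbatim}
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for the training inputs: rows are events, columns are the
# topological input variables (HT, aplanarity, M_jjtau, ...).  In the real
# analysis the background rows come from the loose-tight sample and the
# signal rows from the Alpgen ttbar -> tau+jets sample.
n_bkg, n_sig = 5000, 5000
x_bkg = rng.normal(loc=0.0, scale=1.0, size=(n_bkg, 4))
x_sig = rng.normal(loc=1.0, scale=1.0, size=(n_sig, 4))
X = np.vstack([x_bkg, x_sig])
y = np.concatenate([np.zeros(n_bkg), np.ones(n_sig)])

# One hidden layer; the number of hidden nodes (6-10) depends on the
# variable set, as quoted in the optimization tables below.
nn = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
nn.fit(X, y)

# The NN output used later in the shape fit: signal-class probability,
# peaking near 1 for signal-like events and near 0 for background.
nn_output = nn.predict_proba(X)[:, 1]
print("mean NN output, background vs signal:",
      nn_output[:n_bkg].mean(), nn_output[n_bkg:].mean())
\end{verbatim}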
\begin{figure}[h]
\includegraphics[scale=0.49]{plots/SetI_NNout_SM_type2_tauQCD.eps}
\caption{Training of the topological Neural Network for the Type 1 and 2 $\tau$ channels combined. Upper left: relative impact of each of the input variables; upper right: relative weights of the synaptic connections of the trained network; lower left: convergence curves; lower right: the output distributions of the signal and background test samples after training.}
\label{fig:nnout_type2_training}
\end{figure}
%\newpage
\begin{figure}[h]
\includegraphics[scale=0.49]{plots/SetI_NNout_SM_type3_tauQCD.eps}
\caption{Training of the topological Neural Network for the Type 3 $\tau$ channel. Upper left: relative impact of each of the input variables; upper right: relative weights of the synaptic connections of the trained network; lower left: convergence curves; lower right: the output distributions of the signal and background test samples after training.}
\label{fig:nnout_type3_training}
\end{figure}
\begin{figure}[h]
\includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeI_II/nnout.eps}
\caption{The topological Neural Network output for the Type 1 and 2 $\tau$ channels.}
\label{fig:nnout_type2}
\end{figure}
\newpage
\begin{figure}[t]
\includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeIII/nnout.eps}
\caption{The topological Neural Network output for the Type 3 $\tau$ channel.}
\label{fig:nnout_type3}
\end{figure}
\subsection{\label{sub:NN-optimization}NN optimization}
One difference between the present analysis and the previous p17 analysis is that we performed a NN optimization together with a $\not\!\! E_{T}$ significance optimization. Previously a cut of $>$ 3.0 was applied to the $\not\!\! E_{T}$ significance at the preselection stage, and it was then included as one of the variables for the NN training. This time we chose to optimize it, since it is still a variable that provides good signal-background discrimination (Figure \ref{fig:metl_note}). It is important to stress that after the optimization we performed the analysis with the optimized $\not\!\! E_{T}$ significance cut applied when doing both the $\tau$ and b ID (Section \ref{sub:Results-of-the}), i.e.\ after the preselection, where no $\not\!\! E_{T}$ significance cut was applied. We then went back and reprocessed (preselected) all MC samples with the optimized cut. Both results, with the $\not\!\! E_{T}$ significance cut applied during and after the preselection, were identical. We therefore chose to present this analysis with the cut applied at the preselection level, in order to have a consistent cut flow throughout the analysis (Section \ref{sub:Preselection}).
\begin{figure}[h]
\includegraphics[scale=0.5]{plots/metl_allEW.eps}
\caption{$\not\!\! E_{T}$ significance distribution for signal and backgrounds.}
\label{fig:metl_note}
\end{figure}
\newpage
This part of the analysis was split into two steps:
\begin{enumerate}
\item {\bf Set optimization:} We applied a ``reasonable'' cut on the $\not\!\! E_{T}$ significance of $\geq$ 4.0 and varied the set of variables going into the NN training.
\item {\bf $\not\!\! E_{T}$ significance optimization:} After choosing the best set, based on the lowest RMS of the figure of merit used (see Eq. \ref{merit}), we optimized the $\not\!\! E_{T}$ significance cut.
\end{enumerate}
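Schematically, the two-step optimization amounts to the loop below; \texttt{ensemble\_rms} is a placeholder for the full train-and-ensemble-test machinery described in the remainder of this section, and only two of the 33 variable sets are listed for brevity.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def ensemble_rms(variables, met_sig_cut):
    """Placeholder: in the real procedure this would train the NN for the
    given configuration, run the pseudo-dataset ensemble and return the RMS
    of the figure of merit f (Eq. merit).  Here it returns a toy number."""
    return 0.15 + 0.05 * rng.random()

variable_sets = {
    "Set29": ["HT", "aplan", "Mjjtau", "topmassl"],
    "Set32": ["METsig", "HT", "topmassl", "aplan", "Mjjtau"],
}

# Step 1: set optimization at a fixed, "reasonable" METsig cut of 4.0.
rms_by_set = {name: ensemble_rms(vars_, met_sig_cut=4.0)
              for name, vars_ in variable_sets.items()}
best_set = min(rms_by_set, key=rms_by_set.get)

# Step 2: METsig-cut optimization, performed for the winning set only.
rms_by_cut = {cut: ensemble_rms(variable_sets[best_set], met_sig_cut=cut)
              for cut in (3.0, 3.5, 4.0, 4.5, 5.0)}
best_cut = min(rms_by_cut, key=rms_by_cut.get)
print("chosen configuration:", best_set, "with METsig cut", best_cut)
\end{verbatim}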
For this part of the analysis the following sets of variables were considered for the NN training:
\begin{itemize}
\item \textit{\textbf{Set 1}} : {$H_{T}$}, aplan (aplanarity), Mjjtau ($M_{jj\tau}$)
\item \textit{\textbf{Set 2}} : {$H_{T}$}, aplan, cent (centrality)
\item \textit{\textbf{Set 3}} : {$H_{T}$}, aplan, spher (sphericity)
\item \textit{\textbf{Set 4}} : {$H_{T}$}, cent, spher
\item \textit{\textbf{Set 5}} : aplan, cent, spher
\item \textit{\textbf{Set 6}} : {$H_{T}$}, aplan, Mjjtau, spher
\item \textit{\textbf{Set 7}} : {$H_{T}$}, aplan, Mjjtau, cent
\item \textit{\textbf{Set 8}} : {$H_{T}$}, aplan, Mjjtau, costhetastar ($\cos(\theta^{*})$)
\item \textit{\textbf{Set 9}} : {$H_{T}$}, aplan, Mjjtau, cent, spher
\item \textit{\textbf{Set 10}} : {$H_{T}$}, aplan, Mjjtau, cent, costhetastar
\item \textit{\textbf{Set 11}} : {$H_{T}$}, aplan, Mjjtau, spher, costhetastar
\item \textit{\textbf{Set 12}} : METsig ($\not\!\! E_{T}$ significance), {$H_{T}$}, aplan, Mjjtau
\item \textit{\textbf{Set 13}} : METsig, {$H_{T}$}, aplan, cent
\item \textit{\textbf{Set 14}} : METsig, {$H_{T}$}, aplan, spher
\item \textit{\textbf{Set 15}} : METsig, {$H_{T}$}, cent, spher
\item \textit{\textbf{Set 16}} : METsig, {$H_{T}$}, aplan
\item \textit{\textbf{Set 17}} : METsig, {$H_{T}$}, Mjjtau
\item \textit{\textbf{Set 18}} : METsig, aplan, Mjjtau
\item \textit{\textbf{Set 19}} : METsig, {$H_{T}$}, cent
\item \textit{\textbf{Set 20}} : METsig, {$H_{T}$}, aplan, Mjjtau, cent
\item \textit{\textbf{Set 21}} : METsig, {$H_{T}$}, aplan, cent, spher
\item \textit{\textbf{Set 22}} : METsig, {$H_{T}$}, aplan, Mjjtau, spher
\item \textit{\textbf{Set 23}} : METsig, {$H_{T}$}, aplan, Mjjtau, costhetastar
\item \textit{\textbf{Set 24}} : METsig, Mjjtau, cent, spher, costhetastar
\item \textit{\textbf{Set 25}} : METsig, {$H_{T}$}, cent, spher, costhetastar
\item \textit{\textbf{Set 26}} : METsig, aplan, cent, spher, costhetastar
\item \textit{\textbf{Set 27}} : METsig, {$H_{T}$}, aplan, cent, costhetastar
\item \textit{\textbf{Set 28}} : {$H_{T}$}, aplan, topmassl (top and $W$ mass likelihood)
\item \textit{\textbf{Set 29}} : {$H_{T}$}, aplan, Mjjtau, topmassl
\item \textit{\textbf{Set 30}} : {$H_{T}$}, aplan, Mjjtau, cent, topmassl
\item \textit{\textbf{Set 31}} : {$H_{T}$}, aplan, Mjjtau, costhetastar, topmassl
\item \textit{\textbf{Set 32}} : METsig, {$H_{T}$}, topmassl, aplan, Mjjtau
\item \textit{\textbf{Set 33}} : METsig, spher, costhetastar, aplan, cent
% \item \textit{\textbf{Set XXXIV}} : metl, spher, Mjjtau, topmassl, ktminp
\end{itemize}
The p17 analysis tried only three different sets among hundreds of possible combinations. We believe that the 33 sets tested here are sufficient to yield an optimal result. The criteria used to decide which variables should be included are the following:
\begin{itemize}
\item Use no more than five variables, to keep the NN simple and stable; more variables would require larger training samples.
\item Use the METsig variable, since it provides the best discrimination.
\item Do not use highly correlated variables, such as $H_{T}$ and the jet $p_{T}$'s, in the same NN.
% \item We can not use tau-based variables.
\item Use variables with high discriminating power.
\end{itemize}
In order to decide which of these 33 choices is optimal, we created an ensemble of 20000 pseudo-datasets, each containing events picked randomly (according to a Poisson distribution) from the QCD, EW and $\ttbar$ templates.
Each of these datasets was treated like real data, i.e.\ all the cuts were applied and the shape fit of the event topological NN output was performed. The QCD templates for the fit were made from the same ``loose-tight $\tau$ sample'' from which the QCD component of the ``data'' was drawn. We used the following quantity as the figure of merit:
\begin{equation}
f = \displaystyle \frac{(N_{fit} - N_{true})}{N_{true}}
\label{merit}
\end{equation}
\noindent where $N_{fit}$ is the number of $t\bar{t}$ pairs given by the fit and $N_{true}$ is the number of $t\bar{t}$ pairs drawn from the Poisson distribution. In both the set and the $\not\!\! E_{T}$ significance optimizations, the lowest RMS of $f$ was used to identify the best configuration. The plots showing the results of the set optimization are found in Appendix \ref{app:set_opt} and are summarized in Table \ref{setopt_table} below, which lists the RMS and mean of $f$ for each set. The number in parentheses after each set ID is the number of hidden nodes used in the NN training.
\begin{table}[htbp]
\begin{tabular}{|c|c|c|}
\hline
Set of variables & RMS & mean \\ \hline \hline
Set1(6) & 0.1642 & 0.0265 \\ \hline
Set2(6) & 0.1840 & 0.0054 \\ \hline
Set3(6) & 0.1923 & 0.0060 \\ \hline
Set4(6) & 0.1978 & 0.0175 \\ \hline
Set5(6) & 0.2385 & 0.0022 \\ \hline
Set6(8) & 0.1687 & 0.0115 \\ \hline
Set7(8) & 0.1667 & 0.0134 \\ \hline
Set8(10) & 0.1668 & 0.0162 \\ \hline
Set9(10) & 0.1721 & 0.0102 \\ \hline
Set10(10) & 0.1722 & 0.0210 \\ \hline
Set11(10) & 0.1716 & 0.0180 \\ \hline
Set12(8) & 0.1662 & 0.0039 \\ \hline
Set13(8) & 0.1819 & 0.0018 \\ \hline
Set14(8) & 0.1879 & 0.0019 \\ \hline
Set15(8) & 0.1884 & -0.0004 \\ \hline
Set16(6) & 0.1912 & 0.0034 \\ \hline
Set17(6) & 0.1768 & 0.0074 \\ \hline
Set18(6) & 0.2216 & -0.0030 \\ \hline
Set19(6) & 0.1921 & 0.0015 \\ \hline
Set20(10) & 0.1620 & 0.0262 \\ \hline
Set21(10) & 0.1753 & 0.0010 \\ \hline
Set22(10) & 0.1646 & 0.0086 \\ \hline
Set23(10) & 0.1683 & 0.0132 \\ \hline
Set24(10) & 0.2053 & 0.0122 \\ \hline
Set25(10) & 0.1906 & 0.0038 \\ \hline
Set26(10) & 0.2130 & 0.0028 \\ \hline
Set27(10) & 0.1859 & 0.0004 \\ \hline
Set28(6) & 0.1910 & -0.0022 \\ \hline
Set29(8) & 0.1587 & 0.0214 \\ \hline
Set30(10) & 0.1546 & 0.0148 \\ \hline
Set31(10) & 0.1543 & 0.0203 \\ \hline
Set32(10) & 0.1468 & 0.0172 \\ \hline
Set33(10) & 0.2201 & 0.0081 \\ \hline
%Set34(10) & 0.1955 & 0.0184 \\ \hline
\end{tabular}
\caption{Results of the set optimization, with the $\not\!\! E_{T}$ significance cut of $>$ 4.0 applied to all sets. The number in parentheses is the number of hidden nodes used in each case.}
\label{setopt_table}
\end{table}
From Table \ref{setopt_table} we see that Set 32 has the lowest RMS, and we therefore chose it as the set to be used in the $\not\!\! E_{T}$ significance optimization, whose results are shown in Appendix \ref{app:metl_opt} and summarized in Table \ref{metlopt_table} below.
\begin{table}[htbp]
\begin{tabular}{|c|c|c|c|c|}
\hline
Set 32 & Number of hidden nodes & $\not\!\! E_{T}$ significance cut & RMS & mean \\ \hline \hline
%Set6(10) & 1.0 & 0.2611 \\ \hline
%Set6(10) & 1.5 & 0.2320 \\ \hline
%Set6(10) & 2.0 & 0.2102 \\ \hline
%Set6(10) & 2.5 & 0.2021 \\ \hline
1 & 10 & 3.0 & 0.1507 & 0.0157 \\ \hline
2 & 10 & 3.5 & 0.1559 & 0.0189 \\ \hline
3 & 10 & 4.0 & 0.1468 & 0.0172 \\ \hline
4 & 10 & 4.5 & 0.1511 & 0.0153 \\ \hline
5 & 10 & 5.0 & 0.1552 & 0.0205 \\ \hline
%Set6(10) & 5.5 & 0.4008 \\ \hline
\end{tabular}
\caption{Results of the $\not\!\! E_{T}$ significance optimization, obtained by varying the $\not\!\! E_{T}$ significance cut for Set 32 with 10 hidden nodes.}
\label{metlopt_table}
\end{table}
Combining the results from Tables \ref{setopt_table} and \ref{metlopt_table}, the best configuration found is Set 32 with $\not\!\! E_{T}$ significance $\geq$ 4.0. This was therefore the configuration used to perform the cross section measurement. Figure \ref{fig:METsig_RMS} shows the variation of the RMS as a function of the applied $\not\!\! E_{T}$ significance cut.
\begin{figure}[h]
\includegraphics[scale=0.35]{plots/METsig-RMS.eps}
\caption{RMS as a function of the applied $\not\!\! E_{T}$ significance cut.}
\label{fig:METsig_RMS}
\end{figure}
%\clearpage
In order to check the validity of our ensemble-test procedure, it is instructive to plot both the distribution of the fitted number of $t\bar{t}$ events and the ``pull'', defined in Equation \ref{pull} below:
\begin{equation}
p = \displaystyle \frac{(N_{fit}-N_{true})}{\sigma_{fit}}
\label{pull}
\end{equation}
\noindent where $\sigma_{fit}$ is the uncertainty on the number of $t\bar{t}$ pairs given by the fit. Figures \ref{fig:gaus_ttbar} and \ref{fig:pull} show these two distributions. From Figure \ref{fig:gaus_ttbar} we see good agreement between the number of $t\bar{t}$ pairs initially set in the ensemble and the measured value, and Figure \ref{fig:pull} shows a Gaussian curve, indicating a good behaviour of the fit uncertainties in the ensembles.
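The closure of the ensemble tests can be illustrated with the short sketch below, in which the NN-output shape fit is replaced by a toy Gaussian smearing of the true $t\bar{t}$ yield; the assumed fit uncertainty and the expected yield of 116.9 events (taken from Figure \ref{fig:gaus_ttbar}) are for illustration only. A pull width close to one indicates well-behaved fit uncertainties.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

n_pseudo = 20000          # same ensemble size as used in the text
n_ttbar_expected = 116.9  # expected ttbar yield (cf. Fig. gaus_ttbar)

f_vals, pulls = [], []
for _ in range(n_pseudo):
    # Poisson-fluctuated "true" ttbar content of this pseudo-dataset.
    n_true = rng.poisson(n_ttbar_expected)
    # Toy stand-in for the NN-output shape fit; in the real analysis this is
    # the fit of the QCD, EW and ttbar templates to the pseudo-data.
    sigma_fit = np.sqrt(n_true + 10.0 ** 2)    # hypothetical fit uncertainty
    n_fit = rng.normal(n_true, sigma_fit)
    f_vals.append((n_fit - n_true) / n_true)   # figure of merit, Eq. (merit)
    pulls.append((n_fit - n_true) / sigma_fit) # pull, Eq. (pull)

f_vals, pulls = np.asarray(f_vals), np.asarray(pulls)
print(f"f:    mean = {f_vals.mean():.4f}, RMS = {f_vals.std():.4f}")
print(f"pull: mean = {pulls.mean():.3f}, width = {pulls.std():.3f}")
\end{verbatim}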
\begin{figure}[t]
\includegraphics[scale=0.40]{plots/gaus_ttbar.eps}
\caption{Distribution of the fitted number of $\ttbar$ events (the output ``measurement'') for an ensemble with 116.9 $\ttbar$ events.}
\label{fig:gaus_ttbar}
\end{figure}
\begin{figure}[b]
\includegraphics[scale=0.40]{plots/pull1-40.eps}
\caption{Pull distribution of the ensemble tests.}
\label{fig:pull}
\end{figure}
%\newpage
\clearpage