\newpage \section{\label{sub:NN}Neural Network Analysis}
\subsection{\label{sub:Variables}Variables for NN training}
\noindent Following the same procedure as in the p17 analysis, we determine the signal and background content of the preselected sample, increase the signal-to-background ratio and, from this, measure the cross section. In p17, an artificial neural network based on topological characteristics of an event was used to extract the signal from a background-enriched region. As before, the criteria used in choosing the variables were discriminating power and lack of correlation with the $\tau$ variables. The following variables were considered:
\begin{itemize}
\item \textit{\textbf{$H_{T}$}} - The scalar sum of all jet $p_{T}$'s (here and below including $\tau$ lepton candidates). For $H_{T}$ values above $\sim$ 200 GeV we observe a dominance of signal over background.
\item \textit{\textbf{$\not\!\! E_{T}$ significance}} - Computed from the calculated resolutions of the physical objects (jets, electrons, muons and unclustered energy) \cite{p17_note,METsig}. It was chosen and optimized due to its good signal-background discrimination power.
\item \textit{\textbf{Aplanarity}} \cite{p17topo} - The normalized momentum tensor is defined as
\begin{center}
\begin{equation}
{\cal M}_{ab} \equiv \frac{\sum_{i}p_{ia}p_{ib}}{\sum_{i}p^{2}_{i}}
\label{tensor}
\end{equation}
\end{center}
\noindent where $p_{i}$ is the momentum vector and the index $i$ runs over all the jets and the $W$. From the diagonalization of $\cal M$ we obtain three eigenvalues $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ with the constraint $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$. The aplanarity is defined as ${\cal A} = \frac{3}{2}\lambda_{3}$ and measures the flatness of an event. It takes values in the range $0 \leq {\cal A} \leq 0.5$. It was chosen for the NN because large values of $\cal A$ correspond to more spherical events, such as $t\bar{t}$ events, which are typical of cascade decays of heavy objects. On the other hand, both QCD and $W + \mbox{jets}$ events tend to be more collinear, since jets in these events are primarily due to initial state radiation.
\item \textit{\textbf{Sphericity}} \cite{p17topo} - Defined as ${\cal S} = \frac{3}{2}(\lambda_{2} + \lambda_{3})$, with range $0 \leq {\cal S} \leq 1.0$, sphericity is a measure of the summed $p^{2}_{\perp}$. More isotropic events have ${\cal S} \approx 1$ while less isotropic ones have ${\cal S} \approx 0$. Sphericity is a good discriminator since $t\bar{t}$ events are very isotropic, as is typical of the decays of heavy objects, while both QCD and $W + \mbox{jets}$ events are less isotropic because jets in these events come primarily from initial state radiation.
\item \textit{\textbf{Top and $W$ mass likelihood}} - A $\chi^{2}$-like variable, $L\equiv\left(\frac{M_{3j}-m_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$, where $m_{t}$ and $M_{W}$ are the top and $W$ masses (172.4 GeV and 81.02 GeV, respectively) and $\sigma_{t}$ and $\sigma_{W}$ are the corresponding resolutions (19.4 GeV and 8.28 GeV, respectively). $M_{3j}$ and $M_{2j}$ are the invariant masses of 3- and 2-jet combinations, respectively. We choose the combination that minimizes $L$.
\item \textit{\textbf{Centrality}} - Defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum of the energies of the jets. It is used as a discriminating variable since high values ($\sim$ 1.0) are more signal-dominated while low values ($\sim$ 0) are more background-dominated.
\item \textit{\textbf{$\cos(\theta^{*})$}} - The cosine of the angle between the beam axis and the highest-$p_T$ jet in the rest frame of all the jets in the event. $t\bar{t}$ events tend to have lower ($\sim$ 0) $\cos(\theta^{*})$ values, which motivated its choice.
\item \textit{\textbf{$M_{jj\tau}$}} - The invariant mass of all jets and $\tau$s in the event.
\end{itemize}
The chosen variables are ultimately a consequence of the method employed in this analysis: events from the QCD-enriched loose-tight sample are used to model QCD events in the signal-rich sample, and a b-tag veto sample is used as an independent control sample to check the validity of this background modeling. Plots of all the variables described above are found in Appendix \ref{app:discri_var}.
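To make these definitions concrete, the following is a minimal numerical sketch (not the analysis code) of how the momentum-tensor variables and the top/$W$ mass likelihood can be computed from jet four-vectors; the toy jets and the independent treatment of the 2- and 3-jet combinations in $L$ are illustrative assumptions only.
\begin{verbatim}
import numpy as np
from itertools import combinations

def aplanarity_sphericity(p3):
    """Aplanarity and sphericity from the normalized momentum tensor
    M_ab = sum_i p_ia p_ib / sum_i |p_i|^2 defined above, for an array of
    3-momenta p3 with shape (n, 3) (jets plus the reconstructed W)."""
    m = p3.T @ p3 / np.sum(p3 ** 2)
    lam = np.sort(np.linalg.eigvalsh(m))[::-1]    # lambda_1 >= lambda_2 >= lambda_3
    return 1.5 * lam[2], 1.5 * (lam[1] + lam[2])  # A = 3/2 l3, S = 3/2 (l2 + l3)

def inv_mass(p4s):
    """Invariant mass of a set of four-vectors (E, px, py, pz)."""
    tot = np.sum(p4s, axis=0)
    return np.sqrt(max(tot[0] ** 2 - np.dot(tot[1:], tot[1:]), 0.0))

def mass_likelihood(jets, m_t=172.4, m_w=81.02, s_t=19.4, s_w=8.28):
    """Top/W mass likelihood L, minimized over jet combinations.  Whether the
    2-jet (W) pair must come from within the 3-jet (top) combination is not
    specified here; all independent combinations are tried."""
    best = np.inf
    for trip in combinations(range(len(jets)), 3):
        m3 = inv_mass(jets[list(trip)])
        for pair in combinations(range(len(jets)), 2):
            m2 = inv_mass(jets[list(pair)])
            best = min(best, ((m3 - m_t) / s_t) ** 2 + ((m2 - m_w) / s_w) ** 2)
    return best

# Toy event: four hypothetical jet four-vectors (E, px, py, pz) in GeV.
jets = np.array([[120., 80., 40., 60.], [90., -50., 60., 30.],
                 [70., 30., -50., 20.], [60., -20., -30., 40.]])
A, S = aplanarity_sphericity(jets[:, 1:])
print(f"aplanarity={A:.3f}  sphericity={S:.3f}  L_min={mass_likelihood(jets):.2f}")
\end{verbatim}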
%\clearpage
\subsection{\label{sub:NN-variables}Topological NN}
For training the Neural Network we used the Multilayer Perceptron algorithm, as described in \cite{MLPfit}. As explained in Section \ref{sub:Results-of-the}, the first 1400000 events in the ``loose-tight'' sample were used as background in the NN training for taus of Types 1 and 2, and the first 600000 events of the same sample in the NN training for Type 3 taus. In both cases 1/3 of the Alpgen sample of $t\bar{t} \rightarrow \tau +jets$ was used for the NN training and 2/3 of it for the measurement. When performing the measurement later on (Section \ref{sub:xsect}) we pick the tau with the highest $NN(\tau)$ in the signal sample as the tau candidate, while taus in the loose-tight sample are picked at random, since all of them are regarded as fake taus by being below the cut $NN(\tau) = 0.7$. By doing this we expect to avoid any bias when selecting real taus for the measurement. Figures \ref{fig:nnout_type2_training} and \ref{fig:nnout_type3_training} show the effect of each of the chosen topological event NN input variables on the final output. Figures \ref{fig:nnout_type2} and \ref{fig:nnout_type3} show the NN output resulting from the training described above. It is evident from both figures that high values of the NN output correspond to the signal-enriched region.
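For orientation, the sketch below reproduces the structure of such a training with a single hidden layer, using scikit-learn's \texttt{MLPClassifier} as a stand-in for the MLPfit package actually used; the Gaussian toy inputs, sample sizes and hidden-layer size are purely illustrative.
\begin{verbatim}
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Toy stand-ins for the training inputs: rows are events, columns are the
# topological input variables (HT, aplanarity, M_jjtau, ...).  In the real
# analysis the background rows come from the loose-tight sample and the
# signal rows from the Alpgen ttbar -> tau+jets sample.
n_bkg, n_sig = 5000, 5000
x_bkg = rng.normal(loc=0.0, scale=1.0, size=(n_bkg, 4))
x_sig = rng.normal(loc=1.0, scale=1.0, size=(n_sig, 4))
X = np.vstack([x_bkg, x_sig])
y = np.concatenate([np.zeros(n_bkg), np.ones(n_sig)])

# One hidden layer; the number of hidden nodes (6-10) depends on the
# variable set, as quoted in the optimization tables below.
nn = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
nn.fit(X, y)

# The NN output used later in the shape fit: signal-class probability,
# peaking near 1 for signal-like events and near 0 for background.
nn_output = nn.predict_proba(X)[:, 1]
print("mean NN output, background vs signal:",
      nn_output[:n_bkg].mean(), nn_output[n_bkg:].mean())
\end{verbatim}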
\begin{figure}[h]
\includegraphics[scale=0.49]{plots/SetI_NNout_SM_type2_tauQCD.eps}
\caption{Training of the topological Neural Network for the Type 1 and 2 $\tau$ channels combined. Upper left: relative impact of each of the input variables; upper right: relative weights of the synaptic connections of the trained network; lower left: convergence curves; lower right: the output distributions of the signal and background test samples after training.}
\label{fig:nnout_type2_training}
\end{figure}
%\newpage
\begin{figure}[h]
\includegraphics[scale=0.49]{plots/SetI_NNout_SM_type3_tauQCD.eps}
\caption{Training of the topological Neural Network for the Type 3 $\tau$ channel. Upper left: relative impact of each of the input variables; upper right: relative weights of the synaptic connections of the trained network; lower left: convergence curves; lower right: the output distributions of the signal and background test samples after training.}
\label{fig:nnout_type3_training}
\end{figure}
\begin{figure}[h]
\includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeI_II/nnout.eps}
\caption{The topological Neural Network output for the Type 1 and 2 $\tau$ channels.}
\label{fig:nnout_type2}
\end{figure}
\newpage
\begin{figure}[t]
\includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeIII/nnout.eps}
\caption{The topological Neural Network output for the Type 3 $\tau$ channel.}
\label{fig:nnout_type3}
\end{figure}
\subsection{\label{sub:NN-optimization}NN optimization}
One difference between the present analysis and the previous p17 analysis is that we performed a NN optimization together with a $\not\!\! E_{T}$ significance optimization. Previously a cut of $>$ 3.0 was applied to the $\not\!\! E_{T}$ significance at the preselection stage, and it was then included as one of the variables for the NN training. This time we chose to optimize it, since it is still a variable that provides good signal-background discrimination (Figure \ref{fig:metl_note}). It is important to stress that after the optimization we performed the analysis with the optimized $\not\!\! E_{T}$ significance cut applied when doing both the $\tau$ and b ID (Section \ref{sub:Results-of-the}), i.e.\ after the preselection, where no $\not\!\! E_{T}$ significance cut was applied. We then went back and reprocessed (preselected) all MC samples with the optimized cut. Both results, with the $\not\!\! E_{T}$ significance cut applied during and after the preselection, were identical. We therefore chose to present this analysis with the cut applied at the preselection level, in order to have a consistent cut flow throughout the analysis (Section \ref{sub:Preselection}).
\begin{figure}[h]
\includegraphics[scale=0.5]{plots/metl_allEW.eps}
\caption{$\not\!\! E_{T}$ significance distribution for signal and backgrounds.}
\label{fig:metl_note}
\end{figure}
\newpage
This part of the analysis was split into two steps:
\begin{enumerate}
\item {\bf Set optimization:} We applied a ``reasonable'' cut on the $\not\!\! E_{T}$ significance of $\geq$ 4.0 and varied the set of variables going into the NN training.
\item {\bf $\not\!\! E_{T}$ significance optimization:} After choosing the best set, based on the lowest RMS of the figure of merit used (see Eq. \ref{merit}), we optimized the $\not\!\! E_{T}$ significance cut.
\end{enumerate}
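Schematically, the two-step optimization amounts to the loop below; \texttt{ensemble\_rms} is a placeholder for the full train-and-ensemble-test machinery described in the remainder of this section, and only two of the 33 variable sets are listed for brevity.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)

def ensemble_rms(variables, met_sig_cut):
    """Placeholder: in the real procedure this would train the NN for the
    given configuration, run the pseudo-dataset ensemble and return the RMS
    of the figure of merit f (Eq. merit).  Here it returns a toy number."""
    return 0.15 + 0.05 * rng.random()

variable_sets = {
    "Set29": ["HT", "aplan", "Mjjtau", "topmassl"],
    "Set32": ["METsig", "HT", "topmassl", "aplan", "Mjjtau"],
}

# Step 1: set optimization at a fixed, "reasonable" METsig cut of 4.0.
rms_by_set = {name: ensemble_rms(vars_, met_sig_cut=4.0)
              for name, vars_ in variable_sets.items()}
best_set = min(rms_by_set, key=rms_by_set.get)

# Step 2: METsig-cut optimization, performed for the winning set only.
rms_by_cut = {cut: ensemble_rms(variable_sets[best_set], met_sig_cut=cut)
              for cut in (3.0, 3.5, 4.0, 4.5, 5.0)}
best_cut = min(rms_by_cut, key=rms_by_cut.get)
print("chosen configuration:", best_set, "with METsig cut", best_cut)
\end{verbatim}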
For this part of the analysis the following sets of variables were considered for the NN training:
\begin{itemize}
\item \textit{\textbf{Set 1}} : {$H_{T}$}, aplan (aplanarity), Mjjtau ($M_{jj\tau}$)
\item \textit{\textbf{Set 2}} : {$H_{T}$}, aplan, cent (centrality)
\item \textit{\textbf{Set 3}} : {$H_{T}$}, aplan, spher (sphericity)
\item \textit{\textbf{Set 4}} : {$H_{T}$}, cent, spher
\item \textit{\textbf{Set 5}} : aplan, cent, spher
\item \textit{\textbf{Set 6}} : {$H_{T}$}, aplan, Mjjtau, spher
\item \textit{\textbf{Set 7}} : {$H_{T}$}, aplan, Mjjtau, cent
\item \textit{\textbf{Set 8}} : {$H_{T}$}, aplan, Mjjtau, costhetastar ($\cos(\theta^{*})$)
\item \textit{\textbf{Set 9}} : {$H_{T}$}, aplan, Mjjtau, cent, spher
\item \textit{\textbf{Set 10}} : {$H_{T}$}, aplan, Mjjtau, cent, costhetastar
\item \textit{\textbf{Set 11}} : {$H_{T}$}, aplan, Mjjtau, spher, costhetastar
\item \textit{\textbf{Set 12}} : METsig ($\not\!\! E_{T}$ significance), {$H_{T}$}, aplan, Mjjtau
\item \textit{\textbf{Set 13}} : METsig, {$H_{T}$}, aplan, cent
\item \textit{\textbf{Set 14}} : METsig, {$H_{T}$}, aplan, spher
\item \textit{\textbf{Set 15}} : METsig, {$H_{T}$}, cent, spher
\item \textit{\textbf{Set 16}} : METsig, {$H_{T}$}, aplan
\item \textit{\textbf{Set 17}} : METsig, {$H_{T}$}, Mjjtau
\item \textit{\textbf{Set 18}} : METsig, aplan, Mjjtau
\item \textit{\textbf{Set 19}} : METsig, {$H_{T}$}, cent
\item \textit{\textbf{Set 20}} : METsig, {$H_{T}$}, aplan, Mjjtau, cent
\item \textit{\textbf{Set 21}} : METsig, {$H_{T}$}, aplan, cent, spher
\item \textit{\textbf{Set 22}} : METsig, {$H_{T}$}, aplan, Mjjtau, spher
\item \textit{\textbf{Set 23}} : METsig, {$H_{T}$}, aplan, Mjjtau, costhetastar
\item \textit{\textbf{Set 24}} : METsig, Mjjtau, cent, spher, costhetastar
\item \textit{\textbf{Set 25}} : METsig, {$H_{T}$}, cent, spher, costhetastar
\item \textit{\textbf{Set 26}} : METsig, aplan, cent, spher, costhetastar
\item \textit{\textbf{Set 27}} : METsig, {$H_{T}$}, aplan, cent, costhetastar
\item \textit{\textbf{Set 28}} : {$H_{T}$}, aplan, topmassl (top and $W$ mass likelihood)
\item \textit{\textbf{Set 29}} : {$H_{T}$}, aplan, Mjjtau, topmassl
\item \textit{\textbf{Set 30}} : {$H_{T}$}, aplan, Mjjtau, cent, topmassl
\item \textit{\textbf{Set 31}} : {$H_{T}$}, aplan, Mjjtau, costhetastar, topmassl
\item \textit{\textbf{Set 32}} : METsig, {$H_{T}$}, topmassl, aplan, Mjjtau
\item \textit{\textbf{Set 33}} : METsig, spher, costhetastar, aplan, cent
% \item \textit{\textbf{Set XXXIV}} : metl, spher, Mjjtau, topmassl, ktminp
\end{itemize}
The p17 analysis tried only three different sets among hundreds of possible combinations. We believe that the 33 sets tested here are sufficient to yield an optimal result. The criteria used to decide which variables should be included are the following:
\begin{itemize}
\item Use no more than five variables, to keep the NN simple and stable; more variables would require larger training samples.
\item Use the METsig variable, since it provides the best discrimination.
\item Do not use highly correlated variables, such as $H_{T}$ and the jet $p_{T}$'s, in the same NN.
% \item We can not use tau-based variables.
\item Use variables with high discriminating power.
\end{itemize}
In order to decide which of these 33 choices is optimal, we created an ensemble of 20000 pseudo-datasets, each containing events picked randomly (according to a Poisson distribution) from the QCD, EW and $\ttbar$ templates.
Each of these datasets was treated like real data, i.e.\ all the cuts were applied and the shape fit of the event topological NN output was performed. The QCD templates for the fit were made from the same ``loose-tight $\tau$ sample'' from which the QCD component of the ``data'' was drawn. We used the following quantity as the figure of merit:
\begin{equation}
f = \displaystyle \frac{(N_{fit} - N_{true})}{N_{true}}
\label{merit}
\end{equation}
\noindent where $N_{fit}$ is the number of $t\bar{t}$ pairs given by the fit and $N_{true}$ is the number of $t\bar{t}$ pairs drawn from the Poisson distribution. In both the set and the $\not\!\! E_{T}$ significance optimizations, the lowest RMS of $f$ was used to identify the best configuration. The plots showing the results of the set optimization are found in Appendix \ref{app:set_opt} and are summarized in Table \ref{setopt_table} below, which lists the RMS and mean of $f$ for each set. The number in parentheses after each set ID is the number of hidden nodes used in the NN training.
\begin{table}[htbp]
\begin{tabular}{|c|c|c|}
\hline
Set of variables & RMS & mean \\ \hline \hline
Set1(6) & 0.1642 & 0.0265 \\ \hline
Set2(6) & 0.1840 & 0.0054 \\ \hline
Set3(6) & 0.1923 & 0.0060 \\ \hline
Set4(6) & 0.1978 & 0.0175 \\ \hline
Set5(6) & 0.2385 & 0.0022 \\ \hline
Set6(8) & 0.1687 & 0.0115 \\ \hline
Set7(8) & 0.1667 & 0.0134 \\ \hline
Set8(10) & 0.1668 & 0.0162 \\ \hline
Set9(10) & 0.1721 & 0.0102 \\ \hline
Set10(10) & 0.1722 & 0.0210 \\ \hline
Set11(10) & 0.1716 & 0.0180 \\ \hline
Set12(8) & 0.1662 & 0.0039 \\ \hline
Set13(8) & 0.1819 & 0.0018 \\ \hline
Set14(8) & 0.1879 & 0.0019 \\ \hline
Set15(8) & 0.1884 & -0.0004 \\ \hline
Set16(6) & 0.1912 & 0.0034 \\ \hline
Set17(6) & 0.1768 & 0.0074 \\ \hline
Set18(6) & 0.2216 & -0.0030 \\ \hline
Set19(6) & 0.1921 & 0.0015 \\ \hline
Set20(10) & 0.1620 & 0.0262 \\ \hline
Set21(10) & 0.1753 & 0.0010 \\ \hline
Set22(10) & 0.1646 & 0.0086 \\ \hline
Set23(10) & 0.1683 & 0.0132 \\ \hline
Set24(10) & 0.2053 & 0.0122 \\ \hline
Set25(10) & 0.1906 & 0.0038 \\ \hline
Set26(10) & 0.2130 & 0.0028 \\ \hline
Set27(10) & 0.1859 & 0.0004 \\ \hline
Set28(6) & 0.1910 & -0.0022 \\ \hline
Set29(8) & 0.1587 & 0.0214 \\ \hline
Set30(10) & 0.1546 & 0.0148 \\ \hline
Set31(10) & 0.1543 & 0.0203 \\ \hline
Set32(10) & 0.1468 & 0.0172 \\ \hline
Set33(10) & 0.2201 & 0.0081 \\ \hline
%Set34(10) & 0.1955 & 0.0184 \\ \hline
\end{tabular}
\caption{Results of the set optimization, with the $\not\!\! E_{T}$ significance cut of $>$ 4.0 applied to all sets. The number in parentheses is the number of hidden nodes used in each case.}
\label{setopt_table}
\end{table}
From Table \ref{setopt_table} we see that Set 32 has the lowest RMS, and we therefore chose it as the set to be used in the $\not\!\! E_{T}$ significance optimization, whose results are shown in Appendix \ref{app:metl_opt} and summarized in Table \ref{metlopt_table} below.
\begin{table}[htbp]
\begin{tabular}{|c|c|c|c|c|}
\hline
Set 32 & Number of hidden nodes & $\not\!\! E_{T}$ significance cut & RMS & mean \\ \hline \hline
%Set6(10) & 1.0 & 0.2611 \\ \hline
%Set6(10) & 1.5 & 0.2320 \\ \hline
%Set6(10) & 2.0 & 0.2102 \\ \hline
%Set6(10) & 2.5 & 0.2021 \\ \hline
1 & 10 & 3.0 & 0.1507 & 0.0157 \\ \hline
2 & 10 & 3.5 & 0.1559 & 0.0189 \\ \hline
3 & 10 & 4.0 & 0.1468 & 0.0172 \\ \hline
4 & 10 & 4.5 & 0.1511 & 0.0153 \\ \hline
5 & 10 & 5.0 & 0.1552 & 0.0205 \\ \hline
%Set6(10) & 5.5 & 0.4008 \\ \hline
\end{tabular}
\caption{Results of the $\not\!\! E_{T}$ significance optimization, obtained by varying the $\not\!\! E_{T}$ significance cut for Set 32 with 10 hidden nodes.}
\label{metlopt_table}
\end{table}
Combining the results from Tables \ref{setopt_table} and \ref{metlopt_table}, the best configuration found is Set 32 with $\not\!\! E_{T}$ significance $\geq$ 4.0. This was therefore the configuration used to perform the cross section measurement. Figure \ref{fig:METsig_RMS} shows the variation of the RMS as a function of the applied $\not\!\! E_{T}$ significance cut.
\begin{figure}[h]
\includegraphics[scale=0.35]{plots/METsig-RMS.eps}
\caption{RMS as a function of the applied $\not\!\! E_{T}$ significance cut.}
\label{fig:METsig_RMS}
\end{figure}
%\clearpage
In order to check the validity of our ensemble-test procedure, it is instructive to plot both the distribution of the fitted number of $t\bar{t}$ events and the ``pull'', defined in Equation \ref{pull} below:
\begin{equation}
p = \displaystyle \frac{(N_{fit}-N_{true})}{\sigma_{fit}}
\label{pull}
\end{equation}
\noindent where $\sigma_{fit}$ is the uncertainty on the number of $t\bar{t}$ pairs given by the fit. Figures \ref{fig:gaus_ttbar} and \ref{fig:pull} show these two distributions. From Figure \ref{fig:gaus_ttbar} we see good agreement between the number of $t\bar{t}$ pairs initially set in the ensemble and the measured value, and Figure \ref{fig:pull} shows a Gaussian curve, indicating a good behaviour of the fit uncertainties in the ensembles.
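The closure of the ensemble tests can be illustrated with the short sketch below, in which the NN-output shape fit is replaced by a toy Gaussian smearing of the true $t\bar{t}$ yield; the assumed fit uncertainty and the expected yield of 116.9 events (taken from Figure \ref{fig:gaus_ttbar}) are for illustration only. A pull width close to one indicates well-behaved fit uncertainties.
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(2)

n_pseudo = 20000          # same ensemble size as used in the text
n_ttbar_expected = 116.9  # expected ttbar yield (cf. Fig. gaus_ttbar)

f_vals, pulls = [], []
for _ in range(n_pseudo):
    # Poisson-fluctuated "true" ttbar content of this pseudo-dataset.
    n_true = rng.poisson(n_ttbar_expected)
    # Toy stand-in for the NN-output shape fit; in the real analysis this is
    # the fit of the QCD, EW and ttbar templates to the pseudo-data.
    sigma_fit = np.sqrt(n_true + 10.0 ** 2)    # hypothetical fit uncertainty
    n_fit = rng.normal(n_true, sigma_fit)
    f_vals.append((n_fit - n_true) / n_true)   # figure of merit, Eq. (merit)
    pulls.append((n_fit - n_true) / sigma_fit) # pull, Eq. (pull)

f_vals, pulls = np.asarray(f_vals), np.asarray(pulls)
print(f"f:    mean = {f_vals.mean():.4f}, RMS = {f_vals.std():.4f}")
print(f"pull: mean = {pulls.mean():.3f}, width = {pulls.std():.3f}")
\end{verbatim}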
\begin{figure}[t]
\includegraphics[scale=0.40]{plots/gaus_ttbar.eps}
\caption{Distribution of the fitted number of $\ttbar$ events (the output ``measurement'') for an ensemble with 116.9 $\ttbar$ events.}
\label{fig:gaus_ttbar}
\end{figure}
\begin{figure}[b]
\includegraphics[scale=0.40]{plots/pull1-40.eps}
\caption{Pull distribution of the ensemble tests.}
\label{fig:pull}
\end{figure}
%\newpage
\clearpage