
\newpage
\section{\label{sub:NN}Neural Network Analysis}

\subsection{\label{sub:Variables}Variables for NN training}

\noindent Following the same procedure as in the previous analysis, we determine the signal
and background content of the preselected sample, enhance the signal-to-background ratio,
and from this measure the cross-section.
The procedure adopted in the p17 analysis was to feed a set of topological variables into an
artificial neural network in order to provide the best possible separation between
signal and background. As before, the criteria for choosing such variables were high discriminating power
and the absence of correlation with the $\tau$ variables. The set is presented below:

\begin{itemize}
\item \textit{\textbf{$H_{T}$}} - the scalar sum of the $p_{T}$ of all jets (here and below, jets include $\tau$ lepton candidates). 

\item \textit{\textbf{$\not\!\! E_{T}$ significance}} - since this is the variable that provides the best signal-background
separation, we decided to optimize the cut on it (Section \ref{sub:NN-optimization}).

\item \textit{\textbf{Aplanarity}} \cite{p17topo} - the normalized momentum tensor is defined as

\begin{equation}
{\cal M}_{ij} = \frac{\sum_{o}p^{o}_{i}p^{o}_{j}}{\sum_{o}|\overrightarrow{p^{o}}|^{2}}
\label{tensor}
\end{equation}

\noindent where $\overrightarrow{p^{o}}$ is the momentum vector of a reconstructed object $o$
and $i$ and $j$ are Cartesian coordinates. From the diagonalization of $\cal M$ we find three eigenvalues
$\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ with the constraint $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$.
The aplanarity ${\cal A}$ is given by ${\cal A} = \frac{3}{2}\lambda_{3}$ and measures the flatness of an event.
Hence, it is defined in the range $0 \leq {\cal A} \leq 0.5$. Large values of ${\cal A}$ correspond to more spherical events,
such as $t\bar{t}$ events, which are typical of decays of heavy objects. On the other hand,
both QCD and $W + \mbox{jets}$ events are more planar, since jets in these events are primarily due to
initial state radiation.

\item \textit{\textbf{Sphericity}} \cite{p17topo} - defined as ${\cal S} = \frac{3}{2}(\lambda_{2} + \lambda_{3})$,
with range $0 \leq {\cal S} \leq 1$, sphericity is a measure of the summed $p^{2}_{\perp}$ with 
respect to the event axis. In this sense a two-jet event corresponds to ${\cal S} \approx 0$ and an isotropic event
to ${\cal S} \approx 1$. $t\bar{t}$ events are very isotropic, as is typical of the decays of heavy objects,
while both QCD and $W + \mbox{jets}$ events are less isotropic because jets in these events come 
primarily from initial state radiation (a computational sketch of both quantities is given after this list).

\item \textit{\textbf{Top and $W$ mass likelihood}} - a $\chi^{2}$-like variable, 
$L\equiv\left(\frac{M_{3j}-m_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$,
where $m_{t}$ and $M_{W}$ are the top and $W$ masses (172.4
GeV and 81.02 GeV, respectively) and $\sigma_{t}$ and $\sigma_{W}$ are the corresponding resolutions (19.4 GeV and 8.28
GeV, respectively). $M_{3j}$ and $M_{2j}$ are the invariant masses of three-jet and two-jet
combinations. We choose the combination that minimizes $L$ (see the second sketch after this list). 

\item \textit{\textbf{Centrality}} - defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum
of the energies of the jets. 
\item \textit{\textbf{$\cos(\theta^{*})$}} - the cosine of the angle between the beam axis and the 
highest-$p_T$ jet in the rest frame of all the jets in the event.
\item \textit{\textbf{$\sqrt{s}$}} - the invariant mass of all jets and $\tau$ candidates in the event.

\end{itemize}
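To make the event-shape definitions concrete, the following is a minimal sketch (not the analysis code) of how aplanarity and sphericity follow from the eigenvalues of the normalized momentum tensor of Equation \ref{tensor}; the function name and input format are illustrative assumptions.

\begin{verbatim}
import numpy as np

def shape_variables(momenta):
    """momenta: (N, 3) array of (px, py, pz) for all jets and taus."""
    p = np.asarray(momenta, dtype=float)
    # Normalized momentum tensor M_ij = sum_o p_i p_j / sum_o |p|^2
    M = p.T @ p / np.sum(p ** 2)
    # Eigenvalues sorted as lambda1 >= lambda2 >= lambda3;
    # they sum to 1 by construction (unit trace)
    lam = np.sort(np.linalg.eigvalsh(M))[::-1]
    aplanarity = 1.5 * lam[2]                 # 0 <= A <= 0.5
    sphericity = 1.5 * (lam[1] + lam[2])      # 0 <= S <= 1
    return aplanarity, sphericity
\end{verbatim}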
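Similarly, a sketch of the top and $W$ mass likelihood minimization; taking the $W$ pair as a sub-combination of the three-jet top candidate is an assumption of this sketch, not a statement about the analysis code.

\begin{verbatim}
import itertools
import numpy as np

def mass_likelihood(jets, m_t=172.4, m_w=81.02, sig_t=19.4, sig_w=8.28):
    """jets: list of four-vectors (E, px, py, pz).
    Returns the minimum of the chi2-like L over jet combinations."""
    def inv_mass(indices):
        e, px, py, pz = np.sum([jets[i] for i in indices], axis=0)
        return np.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))
    best = float("inf")
    for trio in itertools.combinations(range(len(jets)), 3):
        m3 = inv_mass(trio)                  # top candidate mass M_3j
        # assumption: the W pair is drawn from within the top triplet
        for pair in itertools.combinations(trio, 2):
            m2 = inv_mass(pair)              # W candidate mass M_2j
            L = ((m3 - m_t) / sig_t) ** 2 + ((m2 - m_w) / sig_w) ** 2
            best = min(best, L)
    return best
\end{verbatim}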

The chosen variables are ultimately a consequence of the method employed in this
analysis: events from the QCD-enriched
loose-tight sample are used to model QCD events in the signal-rich sample, and
a b-tag veto sample is used as an independent control sample to check the validity of this
background modeling.

%\clearpage

\subsection{\label{sub:NN-variables}Topological NN}
For training the Neural Network we used the Multilayer Perceptron algorithm, as described in 
\cite{MLPfit}. As explained in Section \ref{sub:Results-of-the}, the first 1,400,000 
events in the ``loose-tight'' sample were used as background 
for the NN training for tau types 1 and 2, and the first 600,000 events of the same sample for the NN training for type 3 taus.
This means that the different tau types are treated separately in the topological NN.
In both cases 1/3 of the Alpgen $t\bar{t} \rightarrow \tau +jets$ sample was used for the NN training and 2/3 of it
for the measurement.
When performing the measurement later on (Section \ref{sub:xsect}) we pick the tau with the highest $NN(\tau)$
in the signal sample as the tau candidate, while taus in the loose-tight sample are picked at
random, since all of them are regarded as fake taus, being below the $NN(\tau) = 0.7$ cut. By doing this 
we expect to avoid any bias when selecting real taus for the measurement (the selection logic is sketched below).
Figures \ref{fig:nnout_type2_training} and \ref{fig:nnout_type3_training} show the 
effect of each of the chosen topological event NN input variables on the final output.
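As a minimal sketch of the sample splitting and $\tau$-candidate picking described above, assuming a hypothetical event structure (an event with a list of taus carrying an nn_tau attribute); this illustrates only the logic, not the actual framework code.

\begin{verbatim}
import random

def split_for_training(ttbar_mc):
    """1/3 of the ttbar MC for NN training, 2/3 for the measurement."""
    n_train = len(ttbar_mc) // 3
    return ttbar_mc[:n_train], ttbar_mc[n_train:]

def pick_tau(event, in_signal_sample):
    """Signal sample: take the tau with the highest NN(tau) as the
    candidate.  Loose-tight sample: every tau is below the
    NN(tau) = 0.7 cut and regarded as fake, so pick one at random
    to avoid a selection bias."""
    if in_signal_sample:
        return max(event.taus, key=lambda t: t.nn_tau)
    return random.choice(event.taus)
\end{verbatim}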

Figures \ref{fig:nnout_type2} and \ref{fig:nnout_type3} show the NN output resulting from the training
described above. It is evident from both figures that high values of the NN output correspond to 
the signal-enriched region. 



\begin{figure}[h]
\includegraphics[scale=0.6]{plots/SetI_NNout_SM_type2_tauQCD.eps}
\caption{Training of the topological Neural Network for the type 1 and 2 $\tau$ channels. 
Upper left: relative impact of each of the input variables; upper right: topological structure;
lower right: final signal-background separation of the method; lower left: convergence curves.}
\label{fig:nnout_type2_training}
\end{figure}

\newpage


\begin{figure}[h]
\includegraphics[scale=0.6]{plots/SetI_NNout_SM_type3_tauQCD.eps}
\caption{Training of the topological Neural Network for the type 3 $\tau$ channel. 
Upper left: relative impact of each of the input variables; upper right: topological structure;
lower right: final signal-background separation of the method; lower left: convergence curves.}
\label{fig:nnout_type3_training}
\end{figure}

\begin{figure}[h]
\includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeI_II/nnout.eps}
\caption{The topological Neural Network output for the type 1 and 2 $\tau$ channels.}
\label{fig:nnout_type2}
\end{figure}

\newpage

\begin{figure}[t]
\includegraphics[scale=0.5]{CONTROLPLOTS/Std_TypeIII/nnout.eps}
\caption{The topological Neural Network output for the type 3 $\tau$ channel.}
\label{fig:nnout_type3}
\end{figure}


\subsection{\label{sub:NN-optimization}NN optimization}
One difference between this present analysis and the previous p17 is that we performed a NN optimization along with a
$\not\!\! E_{T}$ significance optimization. Previously a cut of $>$ 3.0 was applied to $\not\!\! E_{T}$ significance 
at the preselection stage and then it was included as one of the variables for NN training. This time as we 
chose to optimize it, since it is still a good variable to provide signal-background discrimination (Figure \ref{fig:metl_note}).
It is important to stress out that after the optimization we performed the 
analysis with the optimized $\not\!\! E_{T}$ significance cut
applied when doing both $\tau$ and b ID (Section \ref{sub:Results-of-the}), therefore 
after the preselection where no $\not\!\! E_{T}$ significance cut was applied. 
We then went back anp reprocessed (preselected) all MC samples with the optimized cut. Both results,
with $\not\!\! E_{T}$ significance applied during and after preselectio were identical. 
We then chose to present this analyis with this cut applied at the preselection 
level in order to have a consistent cut flow throughout the analysis(Section \ref{sub:Preselection}).


\begin{figure}[h]
\includegraphics[scale=0.5]{plots/metl_allEW.eps}
\caption{$\not\!\! E_{T}$ significance distribution for signal and backgrounds.}
\label{fig:metl_note}
\end{figure}


\newpage

We split this part of the analysis into two steps:

\begin{enumerate}
\item {\bf Set optimization:} We applied an arbitrary cut on the $\not\!\! E_{T}$ significance of $\geq$ 4.0 and 
varied the set of variables going into the NN training.
\item {\bf $\not\!\! E_{T}$ significance optimization:} After choosing the best set based on the lowest RMS, 
we then varied the $\not\!\! E_{T}$ significance cut.

The sets of variables considered for the NN training are listed below:
\begin{itemize}
 \item \textit{\textbf{Set I}} : {$H_{T}$},  aplan (aplanarity), sqrts ($\sqrt{s}$)
 \item \textit{\textbf{Set II}} : {$H_{T}$},  aplan, cent (centrality)
 \item \textit{\textbf{Set III}} : {$H_{T}$},  aplan, spher (sphericity)
 \item \textit{\textbf{Set IV}} : {$H_{T}$}, cent, spher
 \item \textit{\textbf{Set V}} : aplan, cent, spher
 \item \textit{\textbf{Set VI}} : {$H_{T}$}, aplan, sqrts, spher
 \item \textit{\textbf{Set VII}} : {$H_{T}$}, aplan, sqrts, cent
 \item \textit{\textbf{Set VIII}} : {$H_{T}$}, aplan, sqrts, costhetastar ($\cos(\theta^{*})$)
 \item \textit{\textbf{Set IX}} : {$H_{T}$}, aplan, sqrts, cent, spher
 \item \textit{\textbf{Set X}} : {$H_{T}$}, aplan, sqrts, cent, costhetastar
 \item \textit{\textbf{Set XI}} : {$H_{T}$}, aplan, sqrts, spher, costhetastar
 \item \textit{\textbf{Set XII}} : metl, {$H_{T}$}, aplan, sqrts
 \item \textit{\textbf{Set XIII}} : metl, {$H_{T}$}, aplan, cent
 \item \textit{\textbf{Set XIV}} : metl, {$H_{T}$}, aplan, spher
 \item \textit{\textbf{Set XV}} : metl, {$H_{T}$}, cent, spher
 \item \textit{\textbf{Set XVI}} : metl, {$H_{T}$}, aplan
 \item \textit{\textbf{Set XVII}} : metl, {$H_{T}$}, sqrts
 \item \textit{\textbf{Set XVIII}} : metl, aplan, sqrts
 \item \textit{\textbf{Set XIX}} : metl, {$H_{T}$}, cent
 \item \textit{\textbf{Set XX}} : metl, {$H_{T}$}, aplan, sqrts, cent
 \item \textit{\textbf{Set XXI}} : metl, {$H_{T}$}, aplan, cent, spher
 \item \textit{\textbf{Set XXII}} : metl, {$H_{T}$}, aplan, sqrts, spher
 \item \textit{\textbf{Set XXIII}} : metl, {$H_{T}$}, aplan, sqrts, costhetastar
 \item \textit{\textbf{Set XXIV}} : metl, sqrts, cent, spher, costhetastar
 \item \textit{\textbf{Set XXV}} : metl, {$H_{T}$}, cent, spher, costhetastar
 \item \textit{\textbf{Set XXVI}} : metl, aplan, cent, spher, costhetastar
 \item \textit{\textbf{Set XXVII}} : metl, {$H_{T}$}, aplan, cent, costhetastar
 \item \textit{\textbf{Set XXVIII}} : {$H_{T}$}, aplan, topmassl
 \item \textit{\textbf{Set XXIX}} : {$H_{T}$}, aplan, sqrts, topmassl
 \item \textit{\textbf{Set XXX}} : {$H_{T}$}, aplan, sqrts, cent, topmassl
 \item \textit{\textbf{Set XXXI}} : {$H_{T}$}, aplan, sqrts, costhetastar, topmassl
 \item \textit{\textbf{Set XXXII}} : metl, {$H_{T}$}, topmassl, aplan, sqrts
 \item \textit{\textbf{Set XXXIII}} : metl, spher, costhetastar, aplan, cent
% \item \textit{\textbf{Set XXXIV}} : metl, spher, sqrts, topmassl, ktminp
 \end{itemize}

The criteria used to decide which variables to use are the following:
\begin{itemize}
 \item No more than 5 variables, to keep the NN simple and stable. More variables lead to instabilities (a different 
 result after each retraining) and require larger training samples.
% \item We want to use $metl$ (\met significance) variable, since it's the one providing best discrimination.
 \item We do not want to use highly correlated variables in the same NN (see the sketch after this list).
% \item We can not use tau-based variables. 
 \item We want to use variables with high discriminating power.
\end{itemize}
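As a sketch of how the correlation criterion could be screened in practice, one can inspect the linear correlation matrix of the candidate inputs; the function name and the 0.8 threshold below are purely illustrative assumptions.

\begin{verbatim}
import numpy as np

def correlated_pairs(X, names, threshold=0.8):
    """X: (n_events, n_vars) array of candidate NN input variables."""
    corr = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                # highly correlated: avoid placing both in the same set
                flagged.append((names[i], names[j], corr[i, j]))
    return flagged
\end{verbatim}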

In order to decide which of these sets is optimal, we created an ensemble of 
20000 pseudo-datasets, each containing events randomly picked (according to a Poisson distribution) 
from the QCD, EW and $\ttbar$ templates. Each of these datasets was treated like real data, i.e., all 
the cuts were applied and the shape fit of the topological event NN output was performed. The QCD templates for the fit were made from the same 
``loose-tight $\tau$ sample'' from which the QCD component of the ``data'' was drawn. 
The figure of merit chosen is given by Equation \ref{merit} below:

\begin{equation}
f = \displaystyle \frac{(N_{fit} - N_{true})}{N_{true}}
\label{merit}
\end{equation}


\noindent where $N_{fit}$ is the number of $t\bar{t}$ pairs given by 
the fit and $N_{true}$ is the number of $t\bar{t}$ pairs drawn from the Poisson distribution. 
In both the set and the $\not\!\! E_{T}$ significance optimizations, the lowest RMS of $f$ was used 
to characterize which configuration is the best in each case (sketched below).
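The following is a schematic sketch of the ensemble machinery, assuming hypothetical template pools and a stand-in fit_ttbar function in place of the real template shape fit; all names here are illustrative.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng()

def ensemble_rms(templates, expected, fit_ttbar, n_pseudo=20000):
    """templates: dict name -> array of template events;
    expected: dict name -> expected yield;
    fit_ttbar: function(pseudo_data) -> fitted number of ttbar pairs."""
    f_values = []
    for _ in range(n_pseudo):
        pseudo, n_true = [], 0
        for name, pool in templates.items():
            n = rng.poisson(expected[name])   # Poisson-fluctuated yield
            picks = rng.integers(0, len(pool), size=n)
            pseudo.extend(pool[i] for i in picks)
            if name == "ttbar":
                n_true = n
        if n_true == 0:
            continue                          # guard against division by zero
        f_values.append((fit_ttbar(pseudo) - n_true) / n_true)
    f = np.asarray(f_values)
    return f.std(), f.mean()   # RMS and mean of the figure of merit
\end{verbatim}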


The plots showing the results of the set optimization are found in Appendix \ref{app:set_opt} and are summarized 
in Table \ref{setopt_table} below, where the RMS and mean of $f$ for each set are shown.
For NN training it is standard to choose the number of hidden nodes as twice 
the number of variables used for the training. The number in parentheses after each set ID is the number of 
hidden nodes used in the NN training.

\begin{table}[htbp]
\begin{tabular}{|c|c|c|} \hline
Set of variables & RMS    & mean    \\ \hline
\hline
Set1(6)       &  0.1642  &  0.0265  \\ \hline
Set2(6)       &  0.1840  &  0.0054  \\ \hline
Set3(6)       &  0.1923  &  0.0060  \\ \hline
Set4(6)       &  0.1978  &  0.0175  \\ \hline
Set5(6)       &  0.2385  &  0.0022  \\ \hline
Set6(8)       &  0.1687  &  0.0115  \\ \hline
Set7(8)       &  0.1667  &  0.0134  \\ \hline
Set8(10)      &  0.1668  &  0.0162  \\ \hline
Set9(10)      &  0.1721  &  0.0102  \\ \hline
Set10(10)     &  0.1722  &  0.0210  \\ \hline
Set11(10)     &  0.1716  &  0.0180  \\ \hline
Set12(8)      &  0.1662  &  0.0039  \\ \hline
Set13(8)      &  0.1819  &  0.0018  \\ \hline
Set14(8)      &  0.1879  &  0.0019  \\ \hline
Set15(8)      &  0.1884  &  -0.0004 \\ \hline
Set16(6)      &  0.1912  &  0.0034  \\ \hline
Set17(6)      &  0.1768  &  0.0074  \\ \hline
Set18(6)      &  0.2216  &  -0.0030 \\ \hline
Set19(6)      &  0.1921  &  0.0015  \\ \hline
Set20(10)     &  0.1620  &  0.0262  \\ \hline
Set21(10)     &  0.1753  &  0.0010  \\ \hline
Set22(10)     &  0.1646  &  0.0086  \\ \hline
Set23(10)     &  0.1683  &  0.0132  \\ \hline
Set24(10)     &  0.2053  &  0.0122  \\ \hline
Set25(10)     &  0.1906  &  0.0038  \\ \hline
Set26(10)     &  0.2130  &  0.0028  \\ \hline
Set27(10)     &  0.1859  &  0.0004  \\ \hline
Set28(6)      &  0.1910  &  -0.0022 \\ \hline
Set29(8)      &  0.1587  &  0.0214  \\ \hline
Set30(10)     &  0.1546  &  0.0148  \\ \hline
Set31(10)     &  0.1543  &  0.0203  \\ \hline
Set32(10)     &  0.1468  &  0.0172  \\ \hline
Set33(10)     &  0.2201  &  0.0081  \\ \hline
%Set34(10)    &  0.1955  &  0.0184  \\ \hline
\end{tabular}
\caption{Results of the set optimization with a $\not\!\! E_{T}$ significance cut of $\geq$ 4.0 applied to all sets.
The number in parentheses is the number of hidden nodes in each case.}
\label{setopt_table} 
\end{table}

From Table \ref{setopt_table} we see that Set XXXII has the lowest RMS, thus we chose it
as the set to be used in the $\not\!\! E_{T}$ significance optimization, whose results are
shown in Appendix \ref{app:metl_opt} and summarized in Table \ref{metlopt_table} below.

\begin{table}[htbp]
\begin{tabular}{|c|c|c|c|} \hline
Set of variables  & $\not\!\! E_{T}$ significance cut & RMS & mean \\ \hline
\hline
%Set6(10)     &  1.0 &  0.2611 &         \\ \hline
%Set6(10)     &  1.5 &  0.2320 &         \\ \hline
%Set6(10)     &  2.0 &  0.2102 &         \\ \hline
%Set6(10)     &  2.5 &  0.2021 &         \\ \hline
Set32(10)     &  3.0 &  0.1507 &  0.0157 \\ \hline
Set32(10)     &  3.5 &  0.1559 &  0.0189 \\ \hline
Set32(10)     &  4.0 &  0.1468 &  0.0172 \\ \hline
Set32(10)     &  4.5 &  0.1511 &  0.0153 \\ \hline
Set32(10)     &  5.0 &  0.1552 &  0.0205 \\ \hline
%Set6(10)     &  5.5 &  0.4008 &         \\ \hline
\end{tabular}
\caption{Results of the $\not\!\! E_{T}$ significance optimization, obtained by varying the $\not\!\! E_{T}$ significance cut.
The number in parentheses is the number of hidden nodes in each case.}
\label{metlopt_table} 
\end{table}



Combined results from Tables \ref{setopt_table} and \ref{metlopt_table} show that the best configuration found
was Set XXXII with a $\not\!\! E_{T}$ significance cut of $\geq$ 4.0. This was therefore the 
configuration used to perform the cross-section measurement. Figure \ref{fig:METsig_RMS} shows the variation of the RMS as a function 
of the $\not\!\! E_{T}$ significance cut applied.

\begin{figure}[b]
\includegraphics[scale=0.4]{plots/METsig-RMS.eps}
\caption{RMS as a function of the $\not\!\! E_{T}$ significance cut applied.}
\label{fig:METsig_RMS}
\end{figure}


\clearpage


In order to check the validity of our ensemble-test procedure, it is instructive to plot both the 
distribution of the predicted number of $t\bar{t}$ events and the so-called ``pull'', defined in Equation  
\ref{pull} below:

\begin{equation}
p = \displaystyle \frac{(N_{fit}-N_{true})}{\sigma_{fit}}
\label{pull}
\end{equation}

\noindent where $\sigma_{fit}$ is the error on the number of $t\bar{t}$ pairs given by the fit.

Figures \ref{fig:gaus_ttbar} and \ref{fig:pull} show these two distributions.

From Figure \ref{fig:gaus_ttbar} we see good agreement between the number of $t\bar{t}$ pairs
initially set in the ensemble and the measured value, and Figure \ref{fig:pull} shows a Gaussian
curve, indicating good behaviour of the fit uncertainties in the ensembles.
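A corresponding sketch for the pull: over the same ensemble, a mean near zero indicates an unbiased fit and a width near one indicates correctly estimated fit uncertainties. The input arrays are hypothetical per-pseudo-experiment fit results.

\begin{verbatim}
import numpy as np

def pull_summary(n_fit, n_true, sigma_fit):
    """Pull of Equation (pull): p = (N_fit - N_true) / sigma_fit."""
    p = (np.asarray(n_fit) - np.asarray(n_true)) / np.asarray(sigma_fit)
    return p.mean(), p.std()   # expect ~0 and ~1 for a healthy fit
\end{verbatim}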

\begin{figure}[t]
\includegraphics[scale=0.5]{plots/gaus_ttbar.eps}
\caption{Distribution of the output ``measurement'' for an ensemble with 116.9 $\ttbar$ events.}
\label{fig:gaus_ttbar}
\end{figure}

\begin{figure}[t]
\includegraphics[scale=0.5]{plots/pull1-40.eps}
\caption{Pull distribution of the ensemble tests.}
\label{fig:pull}
\end{figure}


%\newpage





\clearpage

