--- ttbar/p20_taujets_note/NN.tex 2011/05/18 21:30:39 1.1.1.1
+++ ttbar/p20_taujets_note/NN.tex 2011/06/01 01:20:54 1.3
@@ -4,42 +4,44 @@
 \subsection{\label{sub:Variables}Variables for NN training}
-\noindent Following the same procedure as in the previous analysis, we determine the content
+\noindent Following the same procedure as in the p17 analysis, we determine the content
 of signal and background in the preselected sample, increase signal/background rate
-and from this, measure the cross-section.
-The procedure adopted in the p17 analysis was feed a set of topological variables into an
-artificial neural network in order to provide the best possible separation between
-signal and background. As before, the criteria for choosing such variables were: power of discrimination
-and $\tau$-uncorrelated variables. The set is presented below:
+and from this, measure the cross section.
+In p17, an artificial neural network based on topological characteristics of an event was used to
+extract signal from a background-enriched region. As before, the criteria used in choosing the variables were:
+power of discrimination and $\tau$-uncorrelated variables. The following variables were considered:
 \begin{itemize}
-\item \textit{\textbf{$H_{T}$}} - the scalar sum of all jet's $p_{T}$ (here and below including $\tau$ lepton candidates).
+\item \textit{\textbf{$H_{T}$}} - The scalar sum of all jet $p_{T}$'s (here and below including $\tau$ lepton candidates).
+For $H_{T}$ values above $\sim$ 200 GeV we observed a dominance of signal over background.
-\item \textit{\textbf{$\not\!\! E_{T}$ significance}} - As being the variable that provides the best signal-background
-separation we decided to optimize it.
+\item \textit{\textbf{$\not\!\! E_{T}$ significance}} - It is computed from calculated resolutions of
+physical objects (jets, electrons, muons and unclustered energy) \cite{p17_note,METsig}.
+It was chosen to be used and optimized due to its good signal-background discrimination power.
 \item \textit{\textbf{Aplanarity}} \cite{p17topo} - the normalized momentum tensor is defined as
 \begin{center}
 \begin{equation}
-{\cal M} = \frac{\sum_{o}p^{o}_{i}p^{o}_{j}}{\sum_{o}|\overrightarrow{p^{o}}|}
+{\cal M}_{ab} \equiv \frac{\sum_{i}p_{ia}p_{ib}}{\sum_{i}p^{2}_{i}}
 \label{tensor}
 \end{equation}
 \end{center}
-\noindent where $\overrightarrow{p^{0}}$ is the momentum-vector of a reconstructed object $o$
-and $i$ and $j$ are cartesian coordinates. From the diagonalization of $\cal M$ we find three eigenvalues
+\noindent where $p_{i}$ is the momentum vector, $a$ and $b$ are Cartesian coordinates,
+and the index $i$ runs over all the jets and the $W$. From the diagonalization of $\cal M$ we find three eigenvalues
 $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ with the constraint $\lambda_{1} + \lambda_{2} + \lambda_{3} = 1$.
-The aplanarity {\cal A} is given by {$\cal A$} = $\frac{3}{2}\lambda_{3}$ and measures the flatness of an event.
-Hence, it is defined in the range $0 \leq {\cal M} \leq 0.5$. Large values of {$\cal A$} correspond to more spherical events,
-like $t\bar{t}$ events for instance, since they are typical of decays of heavy objects. On the other hand,
-both QCD and $W + \mbox{jets}$ events are more planar since jets in these events are primarily due to
+The aplanarity is defined as {$\cal A$} = $\frac{3}{2}\lambda_{3}$ and measures the flatness of an event.
+It assumes values in the range $0 \leq {\cal A} \leq 0.5$.
+It was chosen to be used in the NN due to the fact that large values of {$\cal A$} correspond to more spherical events,
+like $t\bar{t}$ events for instance, since they are typical of cascade decays of heavy objects. On the other hand,
+both QCD and $W + \mbox{jets}$ events tend to be more collinear since jets in these events are primarily due to
 initial state radiation.
-\item \textit{\textbf{Sphericity}} \cite{p17topo} - being defined as {$\cal S$} = $\frac{3}{2}(\lambda_{2} + \lambda_{3})$,
-and having a range $0 \leq {\cal S} \leq 1.0$, sphericity is a measure of the summed $p^{2}_{\perp}$ with
-respect to the event axis. In this sense a 2-jets event corresponds to {$\cal S$} $\approx 0$ and an isotropic event
-{$\cal S$} $\approx 1$. $t\bar{t}$ events are very isotropic as they are typical of the decays of heavy objects
+\item \textit{\textbf{Sphericity}} \cite{p17topo} - Defined as {$\cal S$} = $\frac{3}{2}(\lambda_{2} + \lambda_{3})$,
+and ranging over $0 \leq {\cal S} \leq 1.0$, sphericity is a measure of the summed $p^{2}_{\perp}$ with respect to the event axis.
+More isotropic events have {$\cal S$} $\approx 1$ while less isotropic ones have {$\cal S$} $\approx 0$.
+Sphericity is a good discriminator since $t\bar{t}$ events are very isotropic as they are typical of the decays of heavy objects
 and both QCD and $W + \mbox{jets}$ events are less isotropic due to the fact that jets in these events
 come primarily from initial state radiation.
@@ -48,13 +50,17 @@ $L\equiv\left(\frac{M_{3j}-m_{t}}{\sigma
 where $m_{t}, M_{W},\sigma_{t},\sigma_{W}$ are top and W masses (172.4 GeV and 81.02 GeV respectively) and resolution values
 (19.4 GeV and 8.28 GeV respectively). $M_{3j}$ and $M_{2j}$ are invariant masses composed
-of the jet combinations. We choose combination that minimizes $L$.
+of 2- and 3-jet combinations. We choose the combination that minimizes $L$.
 \item \textit{\textbf{Centrality}}, defined as $\frac{H_{T}}{H_{E}}$ , where $H_{E}$ is sum
-of energies of the jets.
+of energies of the jets. Used as a discriminating variable since higher values ($\sim$ 1.0) are
+more signal-dominated while low values ($\sim$ 0) are more background-dominated.
+
 \item \textit{\textbf{$\cos(\theta*)$}} - The angle between the beam axis and the
-highest-$p_T$ jet in the rest frame of all the jets in the event.
-\item \textit{\textbf{$\sqrt(s)$}} - The invariant mass of all jets and $\tau$s in the event.
+highest-$p_T$ jet in the rest frame of all the jets in the event. $t\bar{t}$ events tend to have
+lower ($\sim$ 0) $\cos(\theta*)$ values. This motivated its choice.
+
+\item \textit{\textbf{$M_{jj\tau}$}} - The invariant mass of all jets and $\tau$s in the event.
 \end{itemize}
@@ -62,7 +68,7 @@ The chosen variables are in the end a co
 analysis: use events from the QCD-enriched
 loose-tight sample to model QCD events in the signal-rich sample, and use a b-tag veto sample as an independent control sample
 to check the validity of such
-background modeling.
+background modeling. Plots of all variables described above are found in Appendix \ref{app:discri_var}.
 %\clearpage
@@ -70,8 +76,7 @@ background modeling.
 For training the Neural Network we used the Multilayer Perceptron algorithm, as described in \cite{MLPfit}.
 As explained before in Section \ref{sub:Results-of-the}, the first 1400000 events in the ``loose-tight'' sample were used as background
-for NN training for taus types 1 and 2, and the first 600000 of the same sample for NN training for type 3 taus.
-This means that different tau types are being treated separately in the topological NN.
+for NN training for taus of Types 1 and 2, and the first 600000 of the same sample for NN training for type 3 taus.
 In both cases 1/3 of the Alpgen sample of $t\bar{t} \rightarrow \tau +jets$ was used for NN training and 2/3 of it
 for the measurement.
 When doing the measurement later on (Section \ref{sub:xsect}) we pick the tau with the highest $NN(\tau)$
@@ -88,21 +93,25 @@ the signal-enriched region.
 \begin{figure}[h]
-\includegraphics[scale=0.6]{plots/SetI_NNout_SM_type2_tauQCD.eps}
-\caption{Training of topological Neural Network output for type 1 and 2 $\tau$ channel.
-Upper left: relative impact of each of the input variables; upper right: topological structure;
-lower right: final signal-background separation of the method; lower left: convergence curves.}
+\includegraphics[scale=0.49]{plots/SetI_NNout_SM_type2_tauQCD.eps}
+\caption{Training of topological Neural Network output for Type 1 and 2 $\tau$ channel combined.
+Upper left: relative impact of each of the input variables; upper right: relative weights
+of the synaptic connections of the trained network;
+lower left: convergence curves; lower right: the output distribution of signal and background
+test samples after training.}
 \label{fig:nnout_type2_training}
 \end{figure}
-\newpage
+%\newpage
 \begin{figure}[h]
-\includegraphics[scale=0.6]{plots/SetI_NNout_SM_type3_tauQCD.eps}
+\includegraphics[scale=0.49]{plots/SetI_NNout_SM_type3_tauQCD.eps}
 \caption{Training of topological Neural Network output for type 3 $\tau$ channel.
-Upper left: relative impact of each of the input variables; upper right: topological structure;
-lower right: final signal-background separation of the method; lower left: convergence curves.}
+Upper left: relative impact of each of the input variables; upper right: relative weights
+of the synaptic connections of the trained network;
+lower left: convergence curves; lower right: the output distribution of signal and background
+test samples after training.}
 \label{fig:nnout_type3_training}
 \end{figure}
@@ -148,66 +157,67 @@ level in order to have a consistent cut
 Below we describe how we split this part of the analysis into two parts:
 \begin{enumerate}
-\item {\bf Set optimization:} We applied an arbitrary cut on $\not\!\! E_{T}$ significance of $\geq$ 4.0 and
-varied the set of varibles going into NN training
-\item {\bf $\not\!\! E_{T}$ significance optimization:} After chosing the best set based on the lowest RMS,
-we then varied the $\not\!\! E_{T}$ significance cut
+\item {\bf Set optimization:} We applied a ``reasonable'' cut on $\not\!\!
E_{T}$ significance of $\geq$ 4.0 and
+varied the set of variables going into NN training.
+\item {\bf $\not\!\! E_{T}$ significance optimization:} After choosing the best set based on the lowest RMS of the
+figure of merit used (see Eq. \ref{merit}), we then optimized the $\not\!\! E_{T}$ significance cut.
 \end{enumerate}
-For this part of the analysis we present the sets of variables that were taken into account to perform the NN traning
+For this part of the analysis we present the sets of variables that were taken into account to perform the NN training:
 \begin{itemize}
- \item \textit{\textbf{Set I}} : {$H_{T}$}, aplan (aplanarity), sqrts ($\sqrt{s}$)
- \item \textit{\textbf{Set II}} : {$H_{T}$}, aplan, cent (centrality)
- \item \textit{\textbf{Set III}} : {$H_{T}$}, aplan, spher (spherecity)
- \item \textit{\textbf{Set IV}} : {$H_{T}$}, cent, spher
- \item \textit{\textbf{Set V}} : aplan, cent, spher
- \item \textit{\textbf{Set VI}} : {$H_{T}$}, aplan, sqrts, spher
- \item \textit{\textbf{Set VII}} : {$H_{T}$}, aplan, sqrts, cent
- \item \textit{\textbf{Set VIII}} : {$H_{T}$}, aplan, sqrts, costhetastar ($cos(\theta^{*})$
- \item \textit{\textbf{Set IX}} : {$H_{T}$}, aplan, sqrts, cent, spher
- \item \textit{\textbf{Set X}} : {$H_{T}$}, aplan, sqrts, cent, costhetastar
- \item \textit{\textbf{Set XI}} : {$H_{T}$}, aplan, sqrts, spher, costhetastar
- \item \textit{\textbf{Set XII}} : metl, {$H_{T}$}, aplan, sqrts
- \item \textit{\textbf{Set XIII}} : metl, {$H_{T}$}, aplan, cent
- \item \textit{\textbf{Set XIV}} : metl, {$H_{T}$}, aplan, spher
- \item \textit{\textbf{Set XV}} : metl, {$H_{T}$}, cent, spher
- \item \textit{\textbf{Set XVI}} : metl, {$H_{T}$}, aplan
- \item \textit{\textbf{Set XVII}} : metl, {$H_{T}$}, sqrts
- \item \textit{\textbf{Set XVIII}} : metl, aplan, sqrts
- \item \textit{\textbf{Set XIX}} : metl, {$H_{T}$}, cent
- \item \textit{\textbf{Set XX}} : metl, {$H_{T}$}, aplan, sqrts, cent
- \item \textit{\textbf{Set XXI}} : metl, {$H_{T}$},
aplan, cent, spher
- \item \textit{\textbf{Set XXII}} : metl, {$H_{T}$}, aplan, sqrts, spher
- \item \textit{\textbf{Set XXIII}} : metl, {$H_{T}$}, aplan, sqrts, costhetastar
- \item \textit{\textbf{Set XXIV}} : metl, sqrts, cent, spher, costhetastar
- \item \textit{\textbf{Set XXV}} : metl, {$H_{T}$}, cent, spher, costhetastar
- \item \textit{\textbf{Set XXVI}} : metl, aplan, cent, spher, costhetastar
- \item \textit{\textbf{Set XXVII}} : metl, {$H_{T}$}, aplan, cent, costhetastar
- \item \textit{\textbf{Set XXVIII}} : {$H_{T}$}, aplan, topmassl
- \item \textit{\textbf{Set XXIX}} : {$H_{T}$}, aplan, sqrts, topmassl
- \item \textit{\textbf{Set XXX}} : {$H_{T}$}, aplan, sqrts, cent, topmassl
- \item \textit{\textbf{Set XXXI}} : {$H_{T}$}, aplan, sqrts, costhetastar, topmassl
- \item \textit{\textbf{Set XXXII}} : metl, {$H_{T}$}, topmassl, aplan, sqrts
- \item \textit{\textbf{Set XXXIII}} : metl, spher, costhetastar, aplan, cent
-% \item \textit{\textbf{Set XXXIV}} : metl, spher, sqrts, topmassl, ktminp
+ \item \textit{\textbf{Set 1}} : {$H_{T}$}, aplan (aplanarity), Mjjtau ($M_{jj\tau}$)
+ \item \textit{\textbf{Set 2}} : {$H_{T}$}, aplan, cent (centrality)
+ \item \textit{\textbf{Set 3}} : {$H_{T}$}, aplan, spher (sphericity)
+ \item \textit{\textbf{Set 4}} : {$H_{T}$}, cent, spher
+ \item \textit{\textbf{Set 5}} : aplan, cent, spher
+ \item \textit{\textbf{Set 6}} : {$H_{T}$}, aplan, Mjjtau, spher
+ \item \textit{\textbf{Set 7}} : {$H_{T}$}, aplan, Mjjtau, cent
+ \item \textit{\textbf{Set 8}} : {$H_{T}$}, aplan, Mjjtau, costhetastar ($\cos(\theta^{*})$)
+ \item \textit{\textbf{Set 9}} : {$H_{T}$}, aplan, Mjjtau, cent, spher
+ \item \textit{\textbf{Set 10}} : {$H_{T}$}, aplan, Mjjtau, cent, costhetastar
+ \item \textit{\textbf{Set 11}} : {$H_{T}$}, aplan, Mjjtau, spher, costhetastar
+ \item \textit{\textbf{Set 12}} : METsig ($\not\!\!
E_{T}$ significance), {$H_{T}$}, aplan, Mjjtau
+ \item \textit{\textbf{Set 13}} : METsig, {$H_{T}$}, aplan, cent
+ \item \textit{\textbf{Set 14}} : METsig, {$H_{T}$}, aplan, spher
+ \item \textit{\textbf{Set 15}} : METsig, {$H_{T}$}, cent, spher
+ \item \textit{\textbf{Set 16}} : METsig, {$H_{T}$}, aplan
+ \item \textit{\textbf{Set 17}} : METsig, {$H_{T}$}, Mjjtau
+ \item \textit{\textbf{Set 18}} : METsig, aplan, Mjjtau
+ \item \textit{\textbf{Set 19}} : METsig, {$H_{T}$}, cent
+ \item \textit{\textbf{Set 20}} : METsig, {$H_{T}$}, aplan, Mjjtau, cent
+ \item \textit{\textbf{Set 21}} : METsig, {$H_{T}$}, aplan, cent, spher
+ \item \textit{\textbf{Set 22}} : METsig, {$H_{T}$}, aplan, Mjjtau, spher
+ \item \textit{\textbf{Set 23}} : METsig, {$H_{T}$}, aplan, Mjjtau, costhetastar
+ \item \textit{\textbf{Set 24}} : METsig, Mjjtau, cent, spher, costhetastar
+ \item \textit{\textbf{Set 25}} : METsig, {$H_{T}$}, cent, spher, costhetastar
+ \item \textit{\textbf{Set 26}} : METsig, aplan, cent, spher, costhetastar
+ \item \textit{\textbf{Set 27}} : METsig, {$H_{T}$}, aplan, cent, costhetastar
+ \item \textit{\textbf{Set 28}} : {$H_{T}$}, aplan, topmassl
+ \item \textit{\textbf{Set 29}} : {$H_{T}$}, aplan, Mjjtau, topmassl
+ \item \textit{\textbf{Set 30}} : {$H_{T}$}, aplan, Mjjtau, cent, topmassl
+ \item \textit{\textbf{Set 31}} : {$H_{T}$}, aplan, Mjjtau, costhetastar, topmassl
+ \item \textit{\textbf{Set 32}} : METsig, {$H_{T}$}, topmassl, aplan, Mjjtau
+ \item \textit{\textbf{Set 33}} : METsig, spher, costhetastar, aplan, cent
+% \item \textit{\textbf{Set XXXIV}} : metl, spher, Mjjtau, topmassl, ktminp
 \end{itemize}
+P17 tried only three different sets among hundreds of possible combinations. We believe that the
+33 sets tested above suffice in giving an optimal result.
 The criteria used for making a decision on which variable should be used follow:
 \begin{itemize}
- \item No more than 5 variables to keep NN simple and stable.
More variables leads to instabilities (different
- result after each retraining) and require larger training samples.
-% \item We want to use $metl$ (\met significance) variable, since it's the one providing best discrimination.
- \item We do not want to use highly correlated variables in same NN.
+ \item No more than 5 variables to keep NN simple and stable. More variables require larger training samples.
+ \item We want to use the METsig variable, since it is the one providing the best discrimination.
+ \item We do not want to use highly correlated variables, such as $H_{T}$ and jet $p_{T}$, in the same NN.
 % \item We can not use tau-based variables.
  \item We want to use variables with high discriminating power.
 \end{itemize}
-In order to make the decision about which of these 11 choices is the optimal we created an ensemble of
+In order to make the decision about which of these 33 choices is optimal, we created an ensemble of
 20000 pseudo-datasets each containing events randomly (according to a Poisson distribution) picked from
 QCD, EW and $\ttbar$ templates. Each of these datasets was treated like real data, meaning applying all the cuts and
 doing the shape fit of event topological NN. QCD templates for fit were made from the same
 ``loose-tight $\tau$ sample'' from which the QCD component of the ``data'' was drawn.
-The figure of merit chosen is given by Equation \ref{merit} below:
+We used the following quantity as the figure of merit:
 \begin{equation}
 f = \displaystyle \frac{(N_{fit} - N_{true})}{N_{true}}
@@ -218,13 +228,11 @@ f = \displaystyle \frac{(N_{fit} - N_{tr
 \noindent where $N_{fit}$ is the number of $t\bar{t}$ pairs given by
 the fit and $N_{true}$ is the number of $t\bar{t}$ pairs from the Poisson distribution.
 In both Set and $\not\!\! E_{T}$ significance optimization, the lowest RMS was used
-to caracterize which configuration is the best in each case.
+to characterize which configuration is the best in each case.
-The plots showing results concerning the set optimizations are found in Appendix \ref{app:set_opt} and are summarized
-in Table \ref{setopt_table} below, where each RMS and mean are shown.
-For NN training is standard to choose the number of hidden nodes as being twice
-the number the number of variables used for the training. The parenthesis after each set ID show the number of
+The plots showing results concerning the set optimization are found in Appendix \ref{app:set_opt} and are summarized
+in Table \ref{setopt_table} below, where each RMS and mean are shown. The parentheses after each set ID show the number of
 hidden nodes in NN training.
 \begin{table}[htbp]
@@ -307,13 +315,13 @@ The number in parenthesis refers to numb
 \label{setopt_table}
 \end{table}
-From Table \ref{setopt_table} we see that Set I has the lowest RMS, thus we chose it
+From Table \ref{setopt_table} we see that Set 32 has the lowest RMS, thus we chose it
 as the set to be used in $\not\!\! E_{T}$ significance optimization part, whose results are shown in Appendix \ref{app:metl_opt}
 and then summarized in Table \ref{metlopt_table} below
 \begin{table}[htbp]
-\begin{tabular}{|c|r|r|r|} \hline
-Set of variables & $\not\!\! E_{T}$ significance cut & RMS & \multicolumn{1}{c|}{mean} \\ \hline
+\begin{tabular}{|c|r|r|r|r|} \hline
+Set 32 & Number of hidden nodes &$\not\!\! E_{T}$ significance cut & RMS & \multicolumn{1}{c|}{mean} \\ \hline
 \hline
@@ -326,15 +334,15 @@ Set of variables & $\not\!\!
E_{T}$ sig
 %Set6(10) & \multicolumn{1}{c|}{2.5} & \multicolumn{1}{c|}{0.2021} \\ \hline
-Set32(10) & \multicolumn{1}{c|}{3.0} & \multicolumn{1}{c|}{0.1507} & \multicolumn{1}{c|}{0.0157}\\ \hline
+1 & \multicolumn{1}{c|}{10} & \multicolumn{1}{c|}{3.0} & \multicolumn{1}{c|}{0.1507} & \multicolumn{1}{c|}{0.0157}\\ \hline
-Set32(10) & \multicolumn{1}{c|}{3.5} & \multicolumn{1}{c|}{0.1559} & \multicolumn{1}{c|}{0.0189}\\ \hline
+2 & \multicolumn{1}{c|}{10} & \multicolumn{1}{c|}{3.5} & \multicolumn{1}{c|}{0.1559} & \multicolumn{1}{c|}{0.0189}\\ \hline
-Set32(10) & \multicolumn{1}{c|}{4.0} & \multicolumn{1}{c|}{0.1468} & \multicolumn{1}{c|}{0.0172}\\ \hline
+3 & \multicolumn{1}{c|}{10} & \multicolumn{1}{c|}{4.0} & \multicolumn{1}{c|}{0.1468} & \multicolumn{1}{c|}{0.0172}\\ \hline
-Set32(10) & \multicolumn{1}{c|}{4.5} & \multicolumn{1}{c|}{0.1511} & \multicolumn{1}{c|}{0.0153}\\ \hline
+4 & \multicolumn{1}{c|}{10} & \multicolumn{1}{c|}{4.5} & \multicolumn{1}{c|}{0.1511} & \multicolumn{1}{c|}{0.0153}\\ \hline
-Set32(10) & \multicolumn{1}{c|}{5.0} & \multicolumn{1}{c|}{0.1552} & \multicolumn{1}{c|}{0.0205}\\ \hline
+5 & \multicolumn{1}{c|}{10} & \multicolumn{1}{c|}{5.0} & \multicolumn{1}{c|}{0.1552} & \multicolumn{1}{c|}{0.0205}\\ \hline
 %Set6(10) & \multicolumn{1}{c|}{5.5} & \multicolumn{1}{c|}{0.4008} \\ \hline
 \end{tabular}
@@ -346,18 +354,18 @@ The number in parenthesis refers to numb
 Combined results from Tables \ref{setopt_table} and \ref{metlopt_table} show that the best configuration found
-was Set I with $\not\!\! E_{T}$ significance $\geq$ 4.0. Therefore, this was the
-configuration used to perform the cross-section measurement.Figure \ref{fig:METsig_RMS} shows the variation of the RMS as function
+was Set 32 with $\not\!\! E_{T}$ significance $\geq$ 4.0. Therefore, this was the
+configuration used to perform the cross section measurement. Figure \ref{fig:METsig_RMS} shows the variation of the RMS as a function
 of the $\not\!\! E_{T}$ significance we applied.
-\begin{figure}[b]
-\includegraphics[scale=0.4]{plots/METsig-RMS.eps}
+\begin{figure}[h]
+\includegraphics[scale=0.35]{plots/METsig-RMS.eps}
 \caption{Plot of RMS as a function of the $\not\!\! E_{T}$ significance applied}
 \label{fig:METsig_RMS}
 \end{figure}
-\clearpage
+%\clearpage
 In order to check the validity of our ensemble test procedure, it is instructive to plot both the
@@ -378,13 +386,13 @@ initially set in the ensemble and the me
 curve, that indicates a good behaviour of the fit uncertainties in the ensembles.
 \begin{figure}[t]
-\includegraphics[scale=0.5]{plots/gaus_ttbar.eps}
+\includegraphics[scale=0.40]{plots/gaus_ttbar.eps}
 \caption{Distribution of the output ``measurement'' for an ensemble with 116.9 $\ttbar$ events.}
 \label{fig:gaus_ttbar}
 \end{figure}
-\begin{figure}[t]
-\includegraphics[scale=0.5]{plots/pull1-40.eps}
+\begin{figure}[b]
+\includegraphics[scale=0.40]{plots/pull1-40.eps}
 \caption{The ensemble test's pull.}
 \label{fig:pull}
 \end{figure}
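
As an illustration of the aplanarity and sphericity definitions touched by this change, the following is a minimal sketch, not the analysis code. It assumes the objects entering the sum are supplied as plain (px, py, pz) triplets, and the function name `event_shapes` is hypothetical. It builds the normalized momentum tensor ${\cal M}_{ab}$, diagonalizes it, and applies ${\cal A} = \frac{3}{2}\lambda_{3}$ and ${\cal S} = \frac{3}{2}(\lambda_{2} + \lambda_{3})$.

```python
import numpy as np


def event_shapes(momenta):
    """Return (aplanarity, sphericity) for a list of 3-momenta.

    Hypothetical helper for illustration only: `momenta` is assumed to be
    a sequence of (px, py, pz) triplets for the objects in the event.
    """
    p = np.asarray(momenta, dtype=float)  # shape (n_objects, 3)

    # Normalized momentum tensor: M_ab = sum_i p_ia p_ib / sum_i |p_i|^2.
    tensor = p.T @ p / np.sum(p * p)

    # The tensor is real and symmetric, so eigvalsh applies; sort the
    # eigenvalues so that lambda_1 >= lambda_2 >= lambda_3 (they sum to 1).
    lam = np.sort(np.linalg.eigvalsh(tensor))[::-1]

    aplanarity = 1.5 * lam[2]
    sphericity = 1.5 * (lam[1] + lam[2])
    return aplanarity, sphericity


# Two back-to-back objects along z: the most collinear configuration.
two_jet = [[0.0, 0.0, 10.0], [0.0, 0.0, -10.0]]

# Six equal-magnitude objects along +-x, +-y, +-z: an isotropic configuration.
isotropic = [[5, 0, 0], [-5, 0, 0], [0, 5, 0], [0, -5, 0], [0, 0, 5], [0, 0, -5]]

print(event_shapes(two_jet))
print(event_shapes(isotropic))
```

The two toy events sit at the ends of the quoted ranges: the back-to-back pair gives values at the lower bounds, and the isotropic configuration gives values at the upper bounds $0.5$ and $1.0$, up to floating-point rounding.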