% CVS revision 1.1 (initial revision), Wed May 18 21:30:39 2011 UTC, by uid12904

\section{Tools}

\subsection{Object ID}

\subsubsection{\label{sub:tau--ID}$\tau$ ID}

\paragraph{Tau decay modes}

The $\tau$ lepton has several decay channels, classified by the number of charged particles (tracks) associated with the decay \cite{PDG}:
\begin{itemize}
\item electron or muon ($\tau\rightarrow e\nu_{e}\nu_{\tau}$ or $\tau\rightarrow\mu\nu_{\mu}\nu_{\tau}$), BR = 35\%
\item charged hadron ($\tau\rightarrow\pi^{-}\nu_{\tau}$), BR = 12\%
\item charged hadron + $\geq1$ neutral particle (e.g. $\tau\rightarrow\rho^{-}\nu_{\tau}\rightarrow\pi^{0}\pi^{-}\nu_{\tau}$), BR = 38\%
\item 3 charged hadrons + $\geq0$ neutral hadrons, BR = 15\% (so-called ``3-prong'' decays)
\end{itemize}

\paragraph{Tau ID variables}

At D0, $\tau$ leptons are identified in their hadronic decay modes (which contributes to the inefficiency of the ID) as narrow (0.3 cone) jets that are isolated and matched to a charged track. The most important discriminating variables are \cite{tau ID}:
\begin{itemize}
\item Profile, defined as $\frac{E_{T}^{1}+E_{T}^{0}}{\sum_{i}E_{T}^{i}}$, where $E_{T}^{i}$ is the $E_{T}$ of the $i$th highest $E_{T}$ tower in the cluster
\item Isolation, defined as $\frac{E(0.5)-E(0.3)}{E(0.3)}$, where $E(R)$ is the energy contained within a radius $R$ around the calorimeter cluster centroid
\item Track isolation, defined as the $\sum p_{T}$ of non-$\tau$ tracks in a cone of 0.5 around the calorimeter cluster centroid
\end{itemize}
Using these and other variables, 2 neural networks are trained to identify 3 types of $\tau$ ($\pi$-type, $\rho$-type and 3-prong). The output of these NNs provides a set of 3 variables (nnout 1... 3) that are used to select $\tau$ candidates in the event. High NN values should correspond to physical $\tau$ leptons, while low values should indicate fakes. For more details, see \cite{tau ID}.

\paragraph{Energy Scale}

In \cite{ingo1} the process $Z\rightarrow\tau\tau$ was studied.
In particular, Figure 4 of that note demonstrates excellent agreement between the data and the $Z\rightarrow\tau\tau$ MC in the distribution of the invariant mass of the $\tau$ pair. Figure 15 of \cite{ingo2} shows other important properties ($P_{T}$ of the $\tau$ and $\not E_{T}$), which also agree very well. Since no energy correction was applied to the $\tau$ in that work, we conclude that the energy scale of the $\tau$ ID can be taken to be 1.

\paragraph{Performance}

The $\tau$ NN was trained and optimized for low jet multiplicity events (i.e. $Z\rightarrow\tau\tau$). We wanted to compare its performance for the high multiplicity signal (top) that we are searching for here. In order to evaluate the ID efficiency reliably, one has to match the reconstructed $\tau$ candidate with the true $\tau$ from MC. We start with all the $\tau$ candidates in an event, regardless of their quality. We then select those that correspond with high confidence to real $\tau$ leptons. The assumption is that $\tau$ candidates whose energy and direction match those of a physical $\tau$ do represent the detector signature of that particle. We can then determine how well the $\tau$ ID identifies this $\tau$ lepton. Figure \ref{cap:Matching-of-MC} illustrates the matching procedure: $\tau$ candidates within $\Delta R$ of 0.05 and $\Delta P$ of 10 GeV of a true MC $\tau$ are deemed to be the ``real'' matched $\tau$. For such $\tau$ we plotted the NN output for the different $\tau$ types (Fig. \ref{cap:NN-for-matched}). From these distributions one can determine the efficiency of the $\tau$ ID for various cuts on the NN (Fig. \ref{tauID}).
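
For concreteness, the matching and the resulting efficiency can be written out as follows. This is only a sketch: it assumes the usual definition of $\Delta R$ in pseudorapidity $\eta$ and azimuth $\phi$, and the symbols $N_{\mathrm{match}}$, $\varepsilon$ and $c$ are notation introduced here for illustration, not taken from the analysis code.
\begin{equation}
\Delta R=\sqrt{\left(\eta_{\mathrm{reco}}-\eta_{\mathrm{MC}}\right)^{2}+\left(\phi_{\mathrm{reco}}-\phi_{\mathrm{MC}}\right)^{2}},
\end{equation}
where the azimuthal difference is taken in the range $[-\pi,\pi]$. If $N_{\mathrm{match}}$ is the number of reconstructed candidates that satisfy both matching requirements, the efficiency for an NN cut $c$ would then be
\begin{equation}
\varepsilon(c)=\frac{N_{\mathrm{match}}(\mathrm{NN}>c)}{N_{\mathrm{match}}}.
\end{equation}
Read this way, $\varepsilon(c)$ is the efficiency of the NN requirement for candidates that are already reconstructed and matched. It does not include the probability of reconstructing the candidate in the first place.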
%
\begin{figure}
\subfigure[$\Delta R$ between reco $\tau$ and MC $\tau$]{\includegraphics[scale=0.3]{plots_for_talk/drmin}}%
\subfigure[$\Delta R$ between reco $\tau$ and MC $\tau$ (low values)]{\includegraphics[scale=0.3]{plots_for_talk/drmin_zoomed}}
\subfigure[Difference in energy between reco and MC $\tau$]{\includegraphics[scale=0.3]{plots_for_talk/dpmin}}%
\subfigure[Difference in energy between MC and reco $\tau$ that were matched in angle]{\includegraphics[scale=0.3]{plots_for_talk/dpmin005}}
\caption{Matching of MC $\tau$ and reco $\tau$. Black is $Z\rightarrow\tau\tau$, red is $t\overline{t}\rightarrow\tau+jets$. The histograms are normalized to 1 to enable comparison.}
\label{cap:Matching-of-MC}
\end{figure}
%
\begin{figure}
\subfigure[NN for ALL types]{\includegraphics[scale=0.3]{plots_for_talk/nnmatched}}%
\subfigure[NN for type 1]{\includegraphics[scale=0.3]{plots_for_talk/nn1matched}}
\subfigure[NN for type 2]{\includegraphics[scale=0.3]{plots_for_talk/nn2matched}}%
\subfigure[NN for type 3]{\includegraphics[scale=0.3]{plots_for_talk/nn3matched}}
\caption{NN for matched $\tau$. Black is $Z\rightarrow\tau\tau$, red is $t\overline{t}\rightarrow\tau+jets$. The histograms are normalized to 1 to enable comparison.}
\label{cap:NN-for-matched}
\end{figure}
%
\begin{figure}
\subfigure[ALL types]{\includegraphics[scale=0.3]{plots_for_talk/eff0}}%
\subfigure[Type 1]{\includegraphics[scale=0.3]{plots_for_talk/eff1}}
\subfigure[Type 2]{\includegraphics[scale=0.3]{plots_for_talk/eff2}}%
\subfigure[Type 3]{\includegraphics[scale=0.3]{plots_for_talk/eff3}}
\caption{$\tau$ ID efficiencies for different types. Black is $Z\rightarrow\tau\tau$, red is $t\overline{t}\rightarrow\tau+jets$.}
\label{tauID}
\end{figure}

In order to choose the best cut on the $\tau$ NN, one also has to consider the fake rate (the number of fake $\tau$ candidates that pass the ID requirements).
For this purpose we examined the $\tau$ candidates in the preselected ALLJET data sample (the details of the preselection are described in section \ref{sub:Preselection}). Since this dataset is QCD dominated (no more than 0.2\% electroweak contribution is expected), we can safely assume all $\tau$ candidates in it to be fake (this assumption is employed again for our QCD background estimation in section \ref{sub:QCD-modeling}). Figure \ref{cap:NN-for-fake} shows the distribution of the NN output for these $\tau$ candidates. From this we can determine the dependence of the fake rate on the NN cut (Fig. \ref{tauID_Fake}). We note that type 3 has a noticeably higher fake rate. This is to be expected, since most jets have higher track multiplicities than type 1 and 2 $\tau$, making it harder for them to pass the $\tau$ ID requirements for those types.
%
\begin{figure}
\subfigure[NN for ALL types]{\includegraphics[scale=0.3]{plots/NN0fakeNN}}%
\subfigure[NN for type 1]{\includegraphics[scale=0.3]{plots/NN1akeNN}}
\subfigure[NN for type 2]{\includegraphics[scale=0.3]{plots/NN2akeNN}}%
\subfigure[NN for type 3]{\includegraphics[scale=0.3]{plots/NN3akeNN}}
\caption{NN for fake $\tau$.}
\label{cap:NN-for-fake}
\end{figure}
%
\begin{figure}
\subfigure[ALL types]{\includegraphics[scale=0.3]{plots/NN0fake}}%
\subfigure[Type 1]{\includegraphics[scale=0.3]{plots/NN1fake}}
\subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NN2fake}}%
\subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NN3fake}}
\caption{$\tau$ ID fake rate for different types.}
\label{tauID_Fake}
\end{figure}

In Fig. \ref{tauID_Fake_Eff} we plot the fake rate vs. the efficiency of the $\tau$ ID for our channel. From this we can select the optimal cut on the $\tau$ NN, based on the $\tau$ ID significance, defined as $\frac{\mathrm{number\ of\ real\ }\tau}{\sqrt{\mathrm{number\ of\ real\ }\tau+\mathrm{number\ of\ fakes}}}$ (Fig. \ref{tauID_Fake_signif}).
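One way to connect this definition to the efficiency and fake-rate curves is sketched below. It assumes that the numbers of real and fake $\tau$ passing an NN cut $c$ scale with the efficiency $\varepsilon(c)$ and the fake rate $f(c)$; the symbols $S$, $N_{\mathrm{real}}$, $N_{\mathrm{fake}}$, $N_{\tau}$ and $N_{\mathrm{jet}}$ are introduced here for illustration only.
\begin{equation}
S(c)=\frac{N_{\mathrm{real}}(c)}{\sqrt{N_{\mathrm{real}}(c)+N_{\mathrm{fake}}(c)}},\qquad N_{\mathrm{real}}(c)=\varepsilon(c)\,N_{\tau},\qquad N_{\mathrm{fake}}(c)=f(c)\,N_{\mathrm{jet}},
\end{equation}
where $N_{\tau}$ and $N_{\mathrm{jet}}$ denote the numbers of true $\tau$ and of jet-induced candidates before the NN cut. Tightening the cut lowers both $\varepsilon$ and $f$, so the significance is largest where further rejection of fakes no longer compensates for the loss of real $\tau$.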
It is computed on our preselected analysis data set (section \ref{sub:Preselection}). We conclude that the D0 $\tau$ ID algorithm has an efficiency for $t\bar{t}$ comparable with that for $Z\rightarrow\tau\tau$. The optimal cut on the $\tau$ NN appears to be 0.95 for all the types.
%
\begin{figure}
\subfigure[ALL types]{\includegraphics[scale=0.3]{plots/NN0fake_eff}}%
\subfigure[Type 1]{\includegraphics[scale=0.3]{plots/NN1fake_eff}}
\subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NN2fake_eff}}%
\subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NN3fake_eff}}
\caption{$\tau$ ID efficiency vs. the fake rate. Type 2 is the cleanest, type 3 has the highest fake rate, as expected.}
\label{tauID_Fake_Eff}
\end{figure}
%
\begin{figure}
\subfigure[ALL types]{\includegraphics[scale=0.3]{plots/NN0fake_signif}}%
\subfigure[Type 1]{\includegraphics[scale=0.3]{plots/NN1fake_signif}}
\subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NN2fake_signif}}%
\subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NN3fake_signif}}
\caption{$\tau$ ID significance vs. the NN cut. The 0.95 cut appears to be advantageous for all the types.}
\label{tauID_Fake_signif}
\end{figure}

\subsubsection{\label{sub:B-tagging}B-tagging}

The chosen b-tagging algorithm is the Secondary Vertex Tagger (SVT) \cite{b-ID}. It is characterized by high purity compared to other taggers, which is essential for a QCD-dominated channel such as ours. The algorithm reconstructs secondary vertices inside a jet, using the jet's associated tracks. The tracks are also required to pass the set of cuts outlined in Table \ref{cap:The-standard-cuts}. Then the decay length significance is computed. If this significance is greater than 7 (for SVT TIGHT), the jet is considered b-tagged. As can be seen from Figures \ref{cap:SVT-Signifficance} and \ref{cap:SVT-Signifficance_data}, the TIGHT cut is the most appropriate for our signal.
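
The tagging condition can be summarized in one line. The sketch below assumes the usual definition of the decay length significance, namely the transverse decay length $L_{xy}$ of the secondary vertex with respect to the primary vertex divided by its uncertainty. The symbols $S_{L}$ and $S_{\mathrm{cut}}$ are chosen here for illustration.
\begin{equation}
S_{L}=\frac{L_{xy}}{\sigma(L_{xy})},\qquad\mbox{jet is b-tagged if }S_{L}>S_{\mathrm{cut}},
\end{equation}
with $S_{\mathrm{cut}}=5$, 6 and 7 for the LOOSE, MEDIUM and TIGHT working points respectively (Table \ref{cap:The-standard-cuts}).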
%
\begin{table}
\begin{tabular}{|l|l|l|l|}
\hline
{\large SVT}& & & \tabularnewline
\hline
 & LOOSE & MEDIUM & TIGHT\tabularnewline
\hline
Number of SMT hits & 2 & 2 & 2\tabularnewline
\hline
$P_{T}$ of tracks & 1 GeV/c & 1 GeV/c & 1 GeV/c\tabularnewline
\hline
Impact parameter significance of tracks & 3 & 3.5 & 3.5\tabularnewline
\hline
Track $\chi^{2}$& 10 & 10 & 10\tabularnewline
\hline
Max vertex $\chi^{2}$& 100 & 100 & 100\tabularnewline
\hline
Vertex collinearity & 0.9 & 0.9 & 0.9\tabularnewline
\hline
Max vertex decay length & 2.6 cm & 2.6 cm & 2.6 cm\tabularnewline
\hline
Decay length significance cut & 5 & 6 & 7\tabularnewline
\hline
\end{tabular}
\caption{The standard cuts on SVT \cite{b-ID}.}
\label{cap:The-standard-cuts}
\end{table}
%
\begin{figure}
\includegraphics[scale=0.5]{MS_thesis/proposal_plots/svt}
\caption{SVT decay length significance for the b-jets in $t\overline{t}\rightarrow\mu+jets$ MC.}
\label{cap:SVT-Signifficance}
\end{figure}
%
\begin{figure}
\subfigure[SVT significance across the entire range]{\includegraphics[scale=0.3]{plots/svt_data}}%
\subfigure[SVT significance at values near 0]{\includegraphics[scale=0.3]{plots/svt_data_zoomed}}
\caption{SVT decay length significance for all the jets in the ALLJET skim data.}
\label{cap:SVT-Signifficance_data}
\end{figure}

\paragraph{Taggability}

In order to reconstruct a secondary vertex in a jet, the jet must contain at least 2 tracks. If such tracks are found and their $P_{T}$ is greater than 0.5 GeV, the jet is called taggable. In MC it is important to distinguish the taggability from the tagging efficiency, since the latter depends on the jet's flavor.

\paragraph{B-tagging efficiency}

It is known that b-tagging applied directly to MC gives an overestimated efficiency. To account for this, SVT was parameterized on $t\overline{t}\rightarrow\mu+jets$ MC and $\mu+jets$ data to compute the correction factor that has to be applied to MC.
As a result we obtain the MC tagging probability and the data-corrected one (Figure \ref{bID}). The data-corrected efficiency is indeed noticeably ($>$30\%) lower than what we would expect from applying SVT directly to MC.

\paragraph{C-tagging efficiency}

It is assumed that the correction factor obtained by dividing the semi-leptonic b-tagging efficiency in data by that in MC is also valid for c-jets. Hence the MC-obtained inclusive c-tagging efficiency is multiplied by this factor (and by the taggability) in order to estimate the c-tagging probability.

\paragraph{Light jet tagging efficiency}

The b-tag fake rate from light quarks is computed by measuring the negative tag rate. It is defined as the rate of appearance of secondary vertices with negative decay length significance. It is assumed that light quarks have equal chances of producing a secondary vertex with positive and with negative decay length significance (due to finite resolution effects), while heavy flavor jets can only produce secondary vertices with positive decay length significance. This, however, is not quite true, and a scale factor ($SF_{hf}$) is introduced to correct for the fraction of heavy flavors among the jets with negative decay length significance. Another correction accounts for the presence of long-lived particles in light jets ($SF_{ll}$). Both factors are derived from Monte Carlo.
%
\begin{figure}
\subfigure[SVT efficiency vs. $P_{T}$]{\includegraphics[scale=0.3]{plots/SVTeff_pt}}%
\subfigure[SVT efficiency vs. $\eta$]{\includegraphics[scale=0.3]{plots/SVTeff_eta}}
\subfigure[SVT efficiency parametrized on data]{\includegraphics[scale=0.3]{plots/SVTeff}}%
\subfigure[SVT efficiency parametrized on MC]{\includegraphics[scale=0.3]{plots/SVTeffMC}}
\caption{SVT efficiency for $t\overline{t}\rightarrow\tau+jets$ MC. Red is the MC parameterization, black is data-corrected. Flavor dependence is taken into account. The lower plots show 2D parametrizations.}
\label{bID}
\end{figure}

\paragraph{Event tagging efficiency}

The tag rates and the taggability were combined and used to predict the probability for a jet to be b-tagged (b-tagging weight). The resulting per-event probability of having at least one such tag for the $t\overline{t}\rightarrow\tau+jets$ MC is plotted in Figure \ref{cap:The-probability-to}.
%
\begin{figure}
\includegraphics[scale=0.5]{plots/ttb_eventprobtag}
\caption{The probability to tag at least one jet with SVT for $t\overline{t}\rightarrow\tau+jets$ MC.}
\label{cap:The-probability-to}
\end{figure}

Finally, we tried to avoid overlap between the $\tau$ ID and b-tagging. To that end, we remove the jets that are matched to a $\tau$ candidate with NN $>$ 0.8 within $\Delta R$ = 0.5.

\subsection{Trigger}

\subsubsection{\label{sub:Running-trigsim}Running TRIGSIM}

In order to search for a signal, one has to collect the data with a chosen set of triggers. We want to find the combination that maximizes the fraction of signal events written out. For that purpose the trigger simulation program was run on the MC signal.
The following efficiencies were obtained for version 12.30 of the global D0 trigger definition list (Table \ref{cap:Event-overlaps-between}).
%
\begin{table}
\begin{tabular}{|l||l||}
\hline
{\large Trigger}& {\large Fraction of events passing}\tabularnewline
\hline
4JT12 & 0.74$\pm$0.05\tabularnewline
3J15\_2J25\_PVZ & 0.73$\pm$0.05\tabularnewline
MHT30\_3CJT5 & 0.68$\pm$0.04\tabularnewline
MU\_JT20\_L2M0 & 0.30$\pm$0.01\tabularnewline
\hline
MU\_JT20\_L2M0 \&\& MHT30\_3CJT5 & 0.2$\pm$0.01\tabularnewline
\hline
MHT30\_3CJT5 \&\& 4JT12 & 0.4$\pm$0.04\tabularnewline
\hline
4JT12 \&\& 3J15\_2J25\_PVZ & 0.67$\pm$0.04\tabularnewline
\hline
\end{tabular}
\caption{Trigger efficiencies and event overlaps between the most efficient (for selecting $t\overline{t}\rightarrow\tau+jets$) unprescaled triggers. As one can see, 3J15\_2J25\_PVZ has a large overlap with 4JT12, and since 4JT12 is better studied it has been chosen for this analysis.}
\label{cap:Event-overlaps-between}
\end{table}

Taking this into account, we are left with 3 triggers, giving altogether $\sim$85\% efficiency:

$\mathit{MHT30\_3CJT5}$: a $\not\!\! E_{T}$ trigger requiring at least 30 GeV at level 3, which leads to $\sim$30\% inefficiency, since our missing $E_{T}$ peaks around 50 GeV.

\emph{Description}: {\large L1}: At least three calorimeter trigger towers with $E_{T}>5$ GeV. {\large L2}: Require jet $E_{T}>20$ GeV. {\large L3}: Vector $H_{T}$ sum $>30$ GeV. Also, one in 4000 events is recorded and marked as ``unbiased''.

$\mathit{4JT12}$: a trigger designed for the $t\bar{t}\rightarrow jets$ analysis \cite{alljet}. Meets all the jet number requirements, but often does not have high enough $\not E_{T}$.

\emph{Description}: {\large L1}: At least three calorimeter JET trigger towers with $E_{T}>5$ GeV.
{\large L2}: Three JET candidates with $E_{T}>8$ GeV and $H_{T}>50$ GeV. {\large L3}: Four $|\eta|<3.6$ jet candidates with $E_{T}>10$ GeV found using a simple cone algorithm. Three of those jets must have $E_{T}>15$ GeV. One in 500 events is recorded and marked as ``unbiased''.

$\mathit{MU\_JT20\_L2M0}$: a muon trigger with $<$20\% efficiency for our signal, but it is unprescaled and has little overlap with the others.

\emph{Description}: {\large L1}: A single muon trigger based on the muon scintillator, also requiring one calorimeter JET trigger tower with $E_{T}>3$ GeV. {\large L2}: At least one muon found meeting MEDIUM quality requirements, with no $P_{T}$ or region requirement. Also require at least one jet with $E_{T}>10$ GeV. {\large L3}: At least one jet with $E_{T}>20$ GeV is found using a simple cone algorithm. Additionally, one in 500 of all events is recorded and marked as ``unbiased''.

\subsubsection{Triggers in version 13}

A new trigger list has recently come into use for D0 data-taking. It includes a number of new triggers and modifications to existing ones. Running TRIGSIM on this trigger list, we find that the triggers that were best in v12 ($\mathit{4JT12}$ and $\mathit{MHT30\_3CJT5}$) are also the most efficient in v13. In fact, the OR of just these two triggers gives 90$\pm$5\% efficiency. The names and definitions of these triggers have changed:

4JT12 became JT2\_4JT12L\_HT. An additional $H_{T}$ cut of 120 GeV is applied.

\emph{Description}: {\large L1}: Three calorimeter trigger towers with $E_{T}>5$ GeV. {\large L2}: Pass events with at least three JET candidates with $E_{T}>6$ GeV and $H_{T}$, formed with jets above 6 GeV, greater than 70 GeV. {\large L3}: Requires at least four jets with $E_{T}>12$ GeV and at least three jets with $E_{T}>15$ GeV. Also require at least two jets to have $E_{T}>25$ GeV. Event $H_{T}$ (calculated using all jets with $E_{T}>9$ GeV) $>120$ GeV.

MHT30\_3CJT5 became JT2\_MHT25\_HT.
\emph{Description}: {\large L1}: Three calorimeter trigger towers with $E_{T}>4$ GeV, $|\eta|<2.4$, and two calorimeter trigger towers with $E_{T}>5$ GeV. {\large L2}: Pass events with at least three JET candidates with $E_{T}>6$ GeV and $H_{T}$, formed with jets above 6 GeV, greater than 70 GeV. {\large L3}: Vector $H_{T}$ sum for the event must be above 25 GeV. Also require event scalar $H_{T}$ (calculated using all jets with $E_{T}>9$ GeV) $>125$ GeV.

\subsubsection{Turn-on curves}

These features are reflected in the corresponding turn-on curves (integrated efficiencies of signal). Such curves for the MHT30 and 4JT12 triggers are shown in Figure \ref{trig}.
%
\begin{figure}
\includegraphics[scale=0.8]{MS_thesis/proposal_plots/MET_eff_new__7}
\includegraphics[scale=0.6]{MS_thesis/proposal_plots/4JT12jet3pt}
\caption{Integrated trigger efficiency for MHT30 and 4JT12 triggers for the $t\overline{t}\rightarrow\tau+jets$ MC, obtained using TRIGSIM.}
\label{trig}
\end{figure}

\subsubsection{Trigger simulation}

At the time of the analysis TRIGSIM had not reached the state in which it could reliably reproduce the trigger efficiency on data. Therefore, the accepted practice is to parameterize the trigger turn-ons on data and apply this parameterization to MC files. This procedure was performed with the top\_trigger package \cite{top_trigger}. Figure \ref{cap:The-trigger-efficiency} shows the results of a test of the validity of this approach. We used the dataset collected by a single muon trigger (MU\_JT20\_L2M0). We can assume that such data has little bias with respect to the 4JT10 trigger. Hence, if we count the number of 4 jet events that passed 4JT10 and compare it with the top\_trigger prediction for the same events, we can check how well top\_trigger performs.
As can be seen from Figure \ref{cap:The-trigger-efficiency}, the agreement is fairly good, especially in the region that we use in this analysis (we require jets to have $P_{T}>20$ GeV). The efficiency turn-on curve produced by top\_trigger is shown in Fig. \ref{cap:The-trigger-efficiency_MC} and is in agreement with the TRIGSIM result (Fig. \ref{trig}).
%
\begin{figure}
\includegraphics[scale=0.7]{plots/4JT10_MULOOSE}
\caption{The integrated trigger efficiency closure plot. Black is the MU\_JT20\_L2M0 data, red is the top\_trigger prediction for this data.}
\label{cap:The-trigger-efficiency}
\end{figure}
%
\begin{figure}
\includegraphics[scale=0.7]{plots/4JT12_EFF_toptrig}
\caption{The integrated trigger efficiency for the $t\overline{t}\rightarrow\tau+jets$ MC obtained using top\_trigger.}
\label{cap:The-trigger-efficiency_MC}
\end{figure}
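
The closure test described above can be phrased as follows. This is only a sketch: $P_{i}$, the per-event trigger probability returned by the parameterization, and the counts $N$, $N_{\mathrm{pass}}$ and $N_{\mathrm{pred}}$ are notation assumed here rather than taken from top\_trigger itself.
\begin{equation}
N_{\mathrm{pred}}=\sum_{i=1}^{N}P_{i},\qquad\varepsilon_{\mathrm{obs}}=\frac{N_{\mathrm{pass}}}{N},\qquad\varepsilon_{\mathrm{pred}}=\frac{N_{\mathrm{pred}}}{N},
\end{equation}
where the sum runs over the $N$ four-jet events in the MU\_JT20\_L2M0 sample and $N_{\mathrm{pass}}$ is the number of those events that also fired 4JT10. In this notation, closure means that $\varepsilon_{\mathrm{obs}}$ and $\varepsilon_{\mathrm{pred}}$ agree within uncertainties over the jet $P_{T}$ range used in the analysis.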