\section{Analysis}


\subsection{Outline}

The analysis procedure involved several stages:

\begin{itemize}
\item Preselection (section \ref{sub:Preselection}). At least 4 jets and
$\not\!\! E_{T}$ significance $>3$ are required. 653727 events are selected in the
data, of which 109.93$\pm$7.26 are expected to be $t\bar{t}$. S:B =
1:6000.
\item ID cuts (section \ref{sub:Results-of-the}). At least one good $\tau$
candidate and at least one tight SVT tag are required. We also require
$\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20$ GeV. 216 events are selected
in the data, of which 9.320$\pm$0.620 are expected to be $t\bar{t}$.
S:B = 1:58.
\item Topological NN (section \ref{sub:NN-variables}). A sequence of two
feed-forward NNs was trained and applied. The optimal cut on the
output of the second NN was found to be 0.6. With this final cut we
obtain 13 events in the data, of which 4.93$\pm$0.33 are expected to be $t\bar{t}$.
S:B = 1:2.5.
\end{itemize}
The $W$ background was modeled using ALPGEN Monte Carlo simulation,
while the QCD background was extracted from the data using the procedure
described in section \ref{sub:QCD-modeling}.


\subsection{\label{sub:Preselection}Preselection}

The total number of events in this 351 $pb^{-1}$ data skim is 17 million.
This is a very large and rather unwieldy dataset. Hence, the main
goal of the preselection was to reduce this dataset while imposing the
most obvious and straightforward requirements characterizing our signal
signature. These characteristic features include the following:

\begin{itemize}
\item Moderate $\not\!\! E_{T}$, arising from the neutrinos of both the $W$ and the $\tau$
decays.
\item At least 4 jets.
\item A $\tau$ lepton and 2 b-jets.
\end{itemize}
Since both $\tau$ ID and b-tagging involve complex algorithms, which
are likely to be signal-sensitive and may require extensive ``tuning'',
we have chosen not to use them at the preselection stage.

Similarly, we chose not to impose any jet $P_{T}$ cuts, since
such cuts depend strongly on the JES corrections and their uncertainties,
and are therefore better applied at a later stage.

The first 3 preselection criteria were chosen to be similar to those of the $t\overline{t}\rightarrow jets$
analysis \cite{alljet}:

\begin{itemize}
\item The primary vertex is reconstructed within the central tracker
volume (within 60 cm in $z$ of the detector center) and has at least 3 tracks
associated with it.
\item Isolated electrons and muons are vetoed, to avoid overlap with the $t\overline{t}\rightarrow lepton+jets$
cross section analysis.
\item $N_{jets}\geq4$ with $P_{T}>8\, GeV$.
\end{itemize}
At this stage, the $t\overline{t}\rightarrow e+jets$ and $t\overline{t}\rightarrow\mu+jets$
analyses \cite{l+jets} apply cuts on $\Delta\phi$ between
the lepton and $\not\!\! E_{T}$, as well as so-called ``triangular''
cuts in the $\Delta\phi$--$\not\!\! E_{T}$ plane. The goal is to eliminate
events with fake $\not\!\! E_{T}$: the neutrino and the lepton coming
from the $W$ are expected to fly in opposite directions most of the time.
However, as can be seen in Figure \ref{cap:dphi}, no such simple
cut is obvious in the case of the $\tau$. This is to be expected, since
the $\tau$ itself emits a neutrino in its decay, which contributes to $\not\!\! E_{T}$.
Instead, a different variable is used to reject events with fake $\not\!\! E_{T}$
and to reduce the sample size.

%
\begin{figure}
\includegraphics[scale=0.4]{analysis/plots/dphitaumet}


\caption{$\Delta\phi$ between $\tau$ and $\not\!\! E_{T}$ for QCD (black)
and $t\bar{t}\rightarrow\tau+jets$ (red).}

\label{cap:dphi}
\end{figure}


%
\begin{figure}
\includegraphics[scale=0.4]{analysis/plots/metl}


\caption{$\not\!\! E_{T}$ significance for QCD and $t\bar{t}\rightarrow\tau+jets$.}

\label{cap:metl}
\end{figure}


$\not\!\! E_{T}$ significance \cite{metl} is defined as a measure
of the likelihood that the observed $\not\!\! E_{T}$ arises from physical sources
rather than from fluctuations in the detector measurements. As can be seen
in Fig. \ref{cap:metl}, it is an effective way to reduce
the data skim. A cut of 3 was used for the preselection.
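
The preselection requirements above, together with the $\not\!\! E_{T}$ significance cut, can be summarized as a single event-level predicate. The Python sketch below is illustrative only: the \texttt{Event} record and its field names are hypothetical and do not correspond to the actual analysis code, while the thresholds are the ones quoted in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """Hypothetical flat event record; the field names are illustrative only."""
    pv_z_cm: float              # primary-vertex z relative to the detector center (cm)
    pv_ntracks: int             # number of tracks attached to the primary vertex
    n_isolated_leptons: int     # isolated electrons plus muons
    jet_pts: List[float] = field(default_factory=list)  # jet P_T values (GeV)
    met_significance: float = 0.0


def passes_preselection(ev: Event) -> bool:
    """Return True if the event satisfies the preselection quoted in the text."""
    # Primary vertex within 60 cm of the detector center, with at least 3 tracks
    if abs(ev.pv_z_cm) >= 60.0 or ev.pv_ntracks < 3:
        return False
    # Veto isolated electrons and muons (overlap with the lepton+jets analysis)
    if ev.n_isolated_leptons > 0:
        return False
    # At least 4 jets with P_T > 8 GeV
    if sum(1 for pt in ev.jet_pts if pt > 8.0) < 4:
        return False
    # Missing E_T significance above 3
    return ev.met_significance > 3.0
```

For example, an event with a primary vertex at $z=12$ cm with 7 tracks, no isolated leptons, jets of 55, 40, 22 and 9.5 GeV and a $\not\!\! E_{T}$ significance of 4.1 passes, while the same event with a significance of 2.9 does not.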

We now need to scale the original 10K events of the MC sample to 349
$pb^{-1}$. The total $t\bar{t}$ cross section is 6.8 pb \cite{NNLO}.
Taking into account the branching fractions to the hadronic $\tau+jets$
mode, the effective cross section is:

\[
B(\tau\rightarrow hadrons)\cdot B(t\bar{t}\rightarrow\tau+jets)\cdot\sigma(t\bar{t})=0.65\cdot0.15\cdot6.8\,\mathrm{pb}=0.66\,\mathrm{pb}\]
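
The normalization can be written down explicitly as two small helper functions. The sketch assumes the usual scaling of an MC sample, $N=\sigma\cdot\int\mathcal{L}dt\cdot N_{passed}/N_{generated}$; the function names are hypothetical, and the sketch is not meant to reproduce Table \ref{presel} exactly.

```python
def effective_cross_section(br_tau_hadrons: float, br_ttbar_taujets: float,
                            sigma_ttbar_pb: float) -> float:
    """B(tau -> hadrons) * B(ttbar -> tau+jets) * sigma(ttbar), in pb."""
    return br_tau_hadrons * br_ttbar_taujets * sigma_ttbar_pb


def expected_events(sigma_pb: float, lumi_inv_pb: float,
                    n_passed: float, n_generated: float) -> float:
    """Expected yield N = sigma * L * (N_passed / N_generated)."""
    return sigma_pb * lumi_inv_pb * n_passed / n_generated


sigma_eff = effective_cross_section(0.65, 0.15, 6.8)
print(f"{sigma_eff:.2f} pb")
```

With the inputs quoted above it prints \texttt{0.66 pb}, matching the equation.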

Throughout this work, however, we used $\sigma(t\bar{t})=5.5$ pb,
the value computed by the ALPGEN simulation taking into
account the generation cuts. The effective cross section used for
scaling is then 0.53 pb. Since this value is used only for reference
and for the optimization of S:B, the choice between the two values is not important.

The relative flavor fractions of the $W+4jets$ process were taken
from the ALPGEN simulation as ratios of the simulated cross sections.
The total was then normalized to the measured value of 4.5 $\pm$ 2.2
pb \cite{W+4j}.
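
A minimal sketch of this normalization, assuming that the flavor fractions are simply the ALPGEN cross sections of Table \ref{presel} divided by their sum. The $\pm2.2$ pb uncertainty and any generation-cut corrections are ignored, and the names are hypothetical.

```python
# ALPGEN cross sections (pb) of the W+4 jets flavor samples, from Table presel
ALPGEN_SIGMA_PB = {
    "Wbbjj": 0.222,
    "Wccjj": 0.527,
    "Wcjjj": 0.920,
    "Wjjjj": 14.14,
}


def normalize_flavors(sigmas: dict, measured_total_pb: float) -> dict:
    """Keep the ALPGEN flavor fractions, rescale so the sum equals the measurement."""
    total = sum(sigmas.values())
    return {name: measured_total_pb * s / total for name, s in sigmas.items()}


normalized = normalize_flavors(ALPGEN_SIGMA_PB, 4.5)
for name, sigma in normalized.items():
    print(f"{name}: {sigma:.3f} pb")
```

By construction the rescaled cross sections sum to 4.5 pb while keeping the ALPGEN ratios between the flavor samples.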

Table \ref{presel} shows the results of the preselection for the
data, the signal and the backgrounds.

%
\begin{table}
\begin{tabular}{|c|c|c|c|}
\hline
&
\# passed&
ALPGEN $\sigma$, pb&
\# passed scaled\tabularnewline
\hline
\hline
data&
653727/17M&
&
653727\tabularnewline
\hline
$t\overline{t}\rightarrow\tau+jets$&
6141/10878&
0.821 $\pm$ 0.004&
109.93 $\pm$7.26\tabularnewline
\hline
$Wbbjj\rightarrow$ $\tau\nu+bbjj$&
2321/11576&
0.222 $\pm$ 0.044&
9.98 $\pm$ 2.08\tabularnewline
\hline
$Wccjj\rightarrow$ $\tau\nu+ccjj$&
2289/10995&
0.527 $\pm$ 0.059&
24.77 $\pm$ 3.22\tabularnewline
\hline
$Wcjjj\rightarrow$ $\tau\nu+cjjj$&
2169/10435&
0.920 $\pm$0.087 &
42.23 $\pm$ 4.87\tabularnewline
\hline
$Wjjjj\rightarrow$ $\tau\nu+jjjj$&
2683/11920&
14.14 $\pm$ 1.3&
720.33 $\pm$ 81.48 \tabularnewline
\hline
\end{tabular}


\caption{Preselection results. Shown are the total acceptances (including
preselection) and the \# of events scaled to 349 $\pm$23 $pb^{-1}$.
Apart from this luminosity uncertainty, no systematic uncertainties are
included. The generation cuts of the ALPGEN samples are described in \cite{l+jets}.}

\label{presel}
\end{table}



\subsection{\label{sub:Results-of-the}Results of the ID cuts}

The next step was to require a $\tau$ candidate and a b tag.
Table \ref{cap:btaggingandtau} lists the selection criteria that
we apply to data and MC, and Table \ref{b and tau} shows the
resulting numbers of selected events and acceptances.
S:B at this stage is 1:58, which is far too low.
In section \ref{sub:NN-variables} we describe the topological
NN used to enhance the signal content.

%
\begin{table}
\begin{tabular}{|c|c|c|}
\hline
&
{\scriptsize data}&
{\scriptsize taggingMC}\tabularnewline
\hline
\hline
&
{\scriptsize $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
{\scriptsize $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
\hline
&
{\scriptsize $\geq1$ SVT}&
{\scriptsize $TrigWeight\cdot bTagProb$}\tabularnewline
\hline
&
{\scriptsize $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
{\scriptsize $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
\hline
\end{tabular}


\caption{b-tagging and $\tau$ ID. In the MC we use the certified b-tagging
parametrization rather than the actual b-tagging, i.e. we apply the
b-tagging weight ($bTagProb$). We also apply the trigger weight
as computed by top\_trigger.}

\label{cap:btaggingandtau}
\end{table}


%
\begin{table}
\begin{tabular}{|c|c|c|c|}
\hline
&
{\small \# passed}&
{\small Acceptance}&
{\small \# passed scaled}\tabularnewline
\hline
\hline
{\small data}&
{\small 216/653727}&
&
{\small 216}\tabularnewline
\hline
{\small $t\overline{t}\rightarrow\tau+jets$}&
{\small 524.0/6141}&
{\small 0.0480$\pm$0.0020}&
{\small 9.320$\pm$0.620}\tabularnewline
\hline
{\small $Wbbjj\rightarrow$ $\tau\nu+bbjj$}&
{\small 54.5/2321}&
{\small 0.0150$\pm$0.0024}&
{\small 0.012$\pm$0.002}\tabularnewline
\hline
{\small $Wccjj\rightarrow$ $\tau\nu+ccjj$}&
{\small 13.3/2289}&
{\small 0.0039$\pm$0.0012}&
{\small 0.034$\pm$0.005}\tabularnewline
\hline
{\small $Wcjjj\rightarrow$ $\tau\nu+cjjj$}&
{\small 8.0/2169}&
{\small 0.0025$\pm$0.0010}&
{\small 0.160$\pm$0.020}\tabularnewline
\hline
{\small $Wjjjj\rightarrow$ $\tau\nu+jjjj$}&
{\small 3.3/2683}&
{\small 0.0009$\pm$0.0006}&
{\small 0.860$\pm$0.100}\tabularnewline
\hline
\end{tabular}


\caption{b-tagging and $\tau$ ID results. Shown are the total acceptances
(including preselection) and the \# of events scaled to the integrated luminosity.}

\label{b and tau}
\end{table}


For the purposes of this analysis we define 3 subsamples of the
original preselected data sample:

\begin{itemize}
\item The ``signal'' sample: at least 1 $\tau$ with $NN_{\tau}>0.95$
and at least one SVT tag (as in Table \ref{cap:btaggingandtau}).
This is the main sample used for the measurement (268 events).
\item The ``$\tau$ veto'' sample: the same selection, except that $0<NN_{\tau}<0.5$
is required for the $\tau$ candidates instead of $NN_{\tau}>0.95$, and events
with any ``good'' ($NN_{\tau}>0.8$) $\tau$ are rejected. This sample is used
for the topological NN training (21022 events).
\item The ``$b$ veto'' sample: at least 1 $\tau$ with $NN_{\tau}>0.95$,
but no SVT tags. This sample is used for the QCD prediction
(4642 events).
\end{itemize}
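
The three subsample definitions can be expressed as one classification function. This is a hypothetical sketch that encodes only the $NN_{\tau}$ and SVT-tag conditions listed above; the $\tau$ and jet kinematic requirements of Table \ref{cap:btaggingandtau} are assumed to have been applied beforehand.

```python
from typing import List, Optional


def classify_event(tau_nn: List[float], n_svt_tags: int) -> Optional[str]:
    """Assign an event to one of the three subsamples.

    tau_nn     -- NN_tau output of every tau candidate in the event
    n_svt_tags -- number of SVT-tagged jets
    Returns "signal", "tau_veto", "b_veto" or None (event not used).
    """
    has_tight_tau = any(nn > 0.95 for nn in tau_nn)
    if has_tight_tau:
        # Tight tau: signal sample with a b tag, QCD-dominated b-veto sample without
        return "signal" if n_svt_tags >= 1 else "b_veto"
    has_loose_tau = any(0.0 < nn < 0.5 for nn in tau_nn)
    has_good_tau = any(nn > 0.8 for nn in tau_nn)
    # Tau-veto sample: b-tagged, only loose tau candidates, no "good" tau at all
    if n_svt_tags >= 1 and has_loose_tau and not has_good_tau:
        return "tau_veto"
    return None
```

Because any candidate with $NN_{\tau}>0.95$ also counts as ``good'', the three samples are mutually exclusive.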

\subsection{\label{sub:QCD-modeling}QCD modeling}

The difference between the data and the total number of expected $t\bar{t}$ and $W$ events
has to be attributed to QCD events, in which the $\tau$ candidate
is a jet misidentified as a $\tau$. To estimate
this background contribution, the following strategy was employed.


\subsubsection{Parametrization}

In this section our definition of the $\tau$ fake rate differs
from the one in Figure \ref{tauID_Fake_Eff}. There, the goal was
to determine the total number of fake $\tau$ candidates per event
in the ALLJET data skim. Here our goal is to estimate the number of
events that would pass all our signal selection criteria yet contain
no physical $\tau$ leptons, only fakes. In other words, we are
modeling the QCD contribution to our final $t\bar{t}$ candidate event
selection.

We start with the ``$b$ veto'' sample, which can be considered a
predominantly QCD data sample: almost all $\tau$ candidates in it
should be fakes. Figure \ref{cap:taufaketaus} shows the distribution
of these candidates in $P_{T}$ and $|\eta|$, while Fig.
\ref{cap:taufakejets} shows the jets found in the same events.

%
\begin{figure}
\includegraphics[scale=0.5]{plots/jet_trf}


\caption{Jets in the QCD sample}

\label{cap:taufakejets}
\end{figure}


%
\begin{figure}
\includegraphics[scale=0.4]{plots/tau_trf}


\caption{$\tau$ candidates in the QCD sample}

\label{cap:taufaketaus}
\end{figure}


Since the $\tau$ candidates here are really jets, we can simply divide the $\tau$ candidate histogram
by the jet histogram bin by bin to parametrize the $\tau$ fake rate. Figure
\ref{cap:taufakerate} shows this parametrization.
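
The bin-by-bin division can be sketched as follows. The histograms are plain nested lists indexed as \texttt{[eta\_bin][pt\_bin]}, and the binomial uncertainty $\sqrt{r(1-r)/N_{jets}}$ is an assumed error model, not necessarily the one used for Figure \ref{cap:taufakerate}; it illustrates why bins containing few jets fluctuate strongly.

```python
import math
from typing import List

Hist2D = List[List[float]]  # counts indexed as [eta_bin][pt_bin]


def fake_rate_map(tau_counts: Hist2D, jet_counts: Hist2D) -> Hist2D:
    """Bin-by-bin ratio (tau candidates / jets); empty jet bins are set to 0."""
    return [[t / j if j > 0 else 0.0 for t, j in zip(tau_row, jet_row)]
            for tau_row, jet_row in zip(tau_counts, jet_counts)]


def fake_rate_errors(tau_counts: Hist2D, jet_counts: Hist2D) -> Hist2D:
    """Binomial uncertainty sqrt(r * (1 - r) / N_jets) per bin (assumed error model)."""
    errors = []
    for tau_row, jet_row in zip(tau_counts, jet_counts):
        row = []
        for t, j in zip(tau_row, jet_row):
            if j > 0:
                r = t / j
                row.append(math.sqrt(max(r * (1.0 - r), 0.0) / j))
            else:
                row.append(0.0)
        errors.append(row)
    return errors
```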

%
\begin{figure}
\includegraphics[scale=0.5]{plots/tauTRF}


\caption{$\tau$ fake rate parametrization}

\label{cap:taufakerate}
\end{figure}


The large isolated spikes are caused by the limited statistics available
in these bins. To reduce this effect and minimize the statistical
uncertainty, we performed a 2D fit to this distribution. This fit
is then used for the QCD prediction.


\subsubsection{Fit\label{sub:Fit}}

The fit was performed separately in the $\eta$ and $P_{T}$
projections; that is, we assumed that the 2D parametrization
factorizes into two components:

\[
F(\eta,P_{T})\equiv A(\eta)\cdot B(P_{T})\]


As observed in section \ref{sub:Signal-characteristics}, the $\eta$ distributions
are symmetric around 0, hence the fit can be performed in $|\eta|$.
The fitting function was the following:

\[
A(\eta)\equiv a_{1}+a_{2}\cdot\eta{}^{2}+a_{3}\cdot\eta{}^{3}+a_{4}\cdot\eta{}^{4}+...+a_{7}\cdot\eta{}^{7}\]


For $\eta=0$, $a_{1}=0$ was set to avoid a singularity.

The fitting function for $P_{T}$ was chosen so that it
describes the data well and saturates at high $P_{T}$ (that is, we want
$\lim_{P_{T}\rightarrow\infty}B\left(P_{T}\right)=const$; for the function
below this limit is $b_{1}+b_{2}$):

\[
B(P_{T})\equiv b_{1}\cdot\exp\left(\frac{P_{T}}{\left(P_{T}+b_{3}\right)^{2}}\right)+b_{2}\cdot\left(\frac{P_{T}}{P_{T}+b_{3}}\right)\]


The $\eta$ and $P_{T}$ distributions were fitted separately
with $A(\eta)$ and $B(P_{T})$. The result of this procedure
can be seen in Fig. \ref{cap:taufakerate_fit}.
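
For reference, the fit functions can be written as the following NumPy sketch. Since $A(\eta)$ is linear in its coefficients, it can be fitted by (weighted) linear least squares; this is an assumed fitting method, the special treatment of $a_{1}$ at $\eta=0$ is not reproduced, and the nonlinear fit of $B(P_{T})$, which would need an iterative minimizer, is left out. All names are hypothetical.

```python
import numpy as np

# Powers of |eta| in A(eta): constant term, then eta^2 ... eta^7 (no linear term)
ETA_POWERS = (0, 2, 3, 4, 5, 6, 7)


def A(eta, a):
    """A(eta) = a1 + a2*eta^2 + ... + a7*eta^7, evaluated at |eta|."""
    x = np.abs(np.asarray(eta, dtype=float))
    return sum(c * x ** p for c, p in zip(a, ETA_POWERS))


def B(pt, b1, b2, b3):
    """B(P_T) = b1*exp(P_T/(P_T+b3)^2) + b2*P_T/(P_T+b3); tends to b1+b2."""
    pt = np.asarray(pt, dtype=float)
    return b1 * np.exp(pt / (pt + b3) ** 2) + b2 * pt / (pt + b3)


def F(eta, pt, a, b):
    """Factorized fake rate F(eta, P_T) = A(eta) * B(P_T)."""
    return A(eta, a) * B(pt, *b)


def fit_A(abs_eta, rate, weights=None):
    """Weighted linear least-squares fit of the coefficients of A(eta)."""
    x = np.asarray(abs_eta, dtype=float)
    design = np.vstack([x ** p for p in ETA_POWERS]).T
    y = np.asarray(rate, dtype=float)
    if weights is not None:
        w = np.sqrt(np.asarray(weights, dtype=float))
        design, y = design * w[:, None], y * w
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs
```

Evaluating \texttt{B} at very large $P_{T}$ returns approximately $b_{1}+b_{2}$, the saturation value required above.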

As can be seen, the fit in $\eta$ fails around $\eta=1$.
This is the ICD region, which is expected to affect the different
$\tau$ types differently. To account for this effect, we
performed the fit for each type separately. The result can be
seen in Fig. \ref{cap:taufakerate_fit_types}.

As can be seen, the effect of the ICD region is largest for type 1
and minor for type 2. At the same time, the $\eta$ distribution
in the signal (Fig. \ref{cap:reco tau}) is fairly uniform.

Hence we imposed the following cuts to remove these ICD fakes:

\begin{itemize}
\item For type 1: the region $0.8<|\eta|<1.3$ is removed.
\item For type 3: the region $0.85<|\eta|<1.1$ is removed.
\end{itemize}
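
The type-dependent ICD cuts amount to a simple acceptance test on each $\tau$ candidate. A minimal sketch with hypothetical names; type 2 candidates are not affected.

```python
# |eta| windows removed for each tau type (ICD region), as listed in the text
ICD_WINDOWS = {1: (0.8, 1.3), 3: (0.85, 1.1)}


def passes_icd_cut(tau_type: int, eta: float) -> bool:
    """Return False if the tau candidate falls inside its type's removed window."""
    window = ICD_WINDOWS.get(tau_type)
    if window is None:
        return True  # type 2: no eta region removed
    lo, hi = window
    return not (lo < abs(eta) < hi)
```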
With these cuts, the fits are much improved (Fig. \ref{cap:taufakerate_fit_types_noeta}).

%
\begin{figure}
\includegraphics[scale=0.4]{plota_may18/fit_alltypes}


\caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake
rate.}

\label{cap:taufakerate_fit}
\end{figure}


%
\begin{figure}
{\tiny \subfigure[Type 1 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type1}}\subfigure[Type 2 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type2}}}{\tiny \par}

{\tiny \subfigure[Type 3 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type3}}}{\tiny \par}


\caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake
rate for each $\tau$ type.}

\label{cap:taufakerate_fit_types}
\end{figure}


%
\begin{figure}
{\tiny \subfigure[Type 1 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type1_cut}}\subfigure[Type 2 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type2}}}{\tiny \par}

{\tiny \subfigure[Type 3 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type3_cut}}}{\tiny \par}


\caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake
rate for each $\tau$ type. The ICD region has been removed for types 1 and 3.}

\label{cap:taufakerate_fit_types_noeta}
\end{figure}


As can be seen from Table \ref{b and tau (types)}, the type 1 $\tau$ candidates
contribute less than 1 signal event even before the $\eta$ cut. After the
cut their contribution is totally negligible, so it was decided to discard
these events from the $t\bar{t}$ cross section measurement. The final
2D parametrization of the $\tau$ fake rate, $F(\eta,P_{T})$, is
shown in Fig. \ref{cap:taufakerate_fit2D}. Table \ref{b and tau (types) after eta}
shows how the $\eta$ cut affects the number of selected
events.

%
\begin{table}
\begin{tabular}{|c|c|}
\hline
{\tiny data}&
{\tiny taggingMC}\tabularnewline
\hline
\hline
{\tiny $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
{\tiny $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
\hline
{\tiny $\geq1$ SVT}&
{\tiny $TrigWeight\cdot bTagProb$}\tabularnewline
\hline
{\tiny $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
{\tiny $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
\hline
\end{tabular}

\begin{tabular}{|c|c|c|c|}
\hline
&
Type 1&
Type 2&
Type 3\tabularnewline
\hline
\hline
data&
28&
91&
94\tabularnewline
\hline
$t\overline{t}\rightarrow\tau+jets$&
0.73$\pm$0.05&
5.61$\pm$0.37&
3.12$\pm$0.20\tabularnewline
\hline
$W\rightarrow\tau\nu+jets$&
0.094$\pm$0.005&
0.93$\pm$0.04&
0.39$\pm$0.02\tabularnewline
\hline
\end{tabular}


\caption{b-tagging and $\tau$ ID results per type. Shown are the cuts applied
and the numbers of events predicted for the signal and the $W$ background
and observed in the data.}

\label{b and tau (types)}
\end{table}


%
\begin{table}
\begin{tabular}{|c|c|c|}
\hline
&
Type 2&
Type 3\tabularnewline
\hline
\hline
data&
91&
71\tabularnewline
\hline
$t\overline{t}\rightarrow\tau+jets$&
5.61$\pm$0.37&
2.81$\pm$0.18\tabularnewline
\hline
$W\rightarrow\tau\nu+jets$&
0.93$\pm$0.04&
0.32$\pm$0.01\tabularnewline
\hline
\end{tabular}


\caption{b-tagging and $\tau$ ID results per type after the $\eta$ cut.
Shown are the numbers of events predicted for the signal and the $W$
background and observed in the data.}

\label{b and tau (types) after eta}
\end{table}


%
\begin{figure}
\subfigure[Type 2 2D fit]{\includegraphics[scale=0.2]{plota_may18/type2_surf}}\subfigure[Type 3 2D Fit]{\includegraphics[scale=0.2]{plota_may18/type3_surf}}


\caption{The 2D combined fit (in $\eta$ and $P_{T}$) of the $\tau$ fake
rate}

\label{cap:taufakerate_fit2D}
\end{figure}



\subsubsection{Closure tests}

The validity of fitting separately in $\eta$ and $P_{T}$, i.e. of
ignoring possible correlations between them, had to be checked.
Fig. \ref{cap:Closure_test} shows the closure test that was
used for this purpose. In the same ``$b$ veto'' sample we applied
the resulting $F(\eta,P_{T})$ to each jet and compared the resulting
(predicted) $\tau$ distributions with those obtained from the actual
$\tau$ candidates (which, of course, are predominantly fakes here).

However, one could imagine a pair of 2D distributions that agree
perfectly in both projections and yet are very different. To
test against this possibility, we performed the same
cross-check as before, but required the jets to satisfy $0.5<\eta<1$.
For this $\eta$ ``slice'' we applied $F(\eta,P_{T})$
and compared the actual $P_{T}$ distribution with the predicted one.
Figure \ref{cap:Closure_test_2} shows that the agreement is still fairly good.
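
The $\eta$-slice closure test can be sketched as follows: every jet in the slice enters the predicted $P_{T}$ histogram with weight $F(\eta,P_{T})$, while the actual $\tau$ candidates in the same slice enter with unit weight. The function and argument names are hypothetical, and \texttt{fake\_rate} stands for any callable implementing $F(\eta,P_{T})$.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float],
              weights: Sequence[float] = None) -> List[float]:
    """Fill a 1D histogram with optional per-entry weights (overflow dropped)."""
    counts = [0.0] * (len(edges) - 1)
    if weights is None:
        weights = [1.0] * len(values)
    for v, w in zip(values, weights):
        for i in range(len(edges) - 1):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += w
                break
    return counts


def closure_pt(jets, taus, fake_rate, pt_edges, eta_range=(0.5, 1.0)):
    """Observed vs predicted tau-candidate P_T in a signed eta slice.

    jets, taus -- lists of (eta, pt) pairs from the b-veto sample
    fake_rate  -- callable F(eta, pt)
    """
    lo, hi = eta_range
    sel_jets = [(e, p) for e, p in jets if lo < e < hi]
    sel_taus = [(e, p) for e, p in taus if lo < e < hi]
    # Prediction: each jet weighted by its probability to fake a tau
    predicted = histogram([p for _, p in sel_jets], pt_edges,
                          weights=[fake_rate(e, p) for e, p in sel_jets])
    observed = histogram([p for _, p in sel_taus], pt_edges)
    return observed, predicted
```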

%
\begin{figure}
\includegraphics[scale=0.2]{plota_may18/closure_eta_2}\includegraphics[scale=0.2]{plota_may18/closure_pt_2}

\includegraphics[scale=0.2]{plota_may18/closure_eta_3}\includegraphics[scale=0.2]{plota_may18/closure_pt_3}


\caption{The closure test of the $\tau$ fake rate function. The red histograms
are for the actual $\tau$ candidates in the ``veto'' sample.
The green ones are the prediction. The $\eta$ distributions show some
discrepancy, related to the uncertainty of the fit.}

\label{cap:Closure_test}
\end{figure}


%
\begin{figure}
\subfigure[Type 2]{\includegraphics[scale=0.45]{plots/pt_closure_type2}}\subfigure[Type 3]{\includegraphics[scale=0.45]{plots/pt_closure_type3}}


\caption{The closure test of the $\tau$ fake rate function. The red histograms
are for the actual $\tau$ candidates in the ``veto'' sample.
The green ones are the prediction. The jets were selected with
$0.5<\eta<1$. An asymmetric range was chosen to avoid possible
bias.}

\label{cap:Closure_test_2}
\end{figure}



\subsubsection{Computing the QCD fraction}

We assume that the probability for a jet to fake a $\tau$ is simply $F(\eta,P_{T})$,
independently of the other jets in the event. Then the probability that at least one
of the jets in the event fakes a $\tau$ can be computed as follows:

\[
P_{event}=1-\prod_{j}\left(1-F(\eta^{j},P_{T}^{j})\right)\]

Summing these probabilities over the events of the tagged data sample gives the QCD
background estimate.
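
The event probability and its sum over the tagged data can be sketched as below. Which jets are passed for each event (and whether any are excluded) is left to the caller as an assumption; \texttt{fake\_rate} is any callable implementing $F(\eta,P_{T})$, and the names are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Jet = Tuple[float, float]  # (eta, P_T)


def event_fake_probability(jets: List[Jet],
                           fake_rate: Callable[[float, float], float]) -> float:
    """P_event = 1 - prod_j (1 - F(eta_j, P_T_j)), assuming independent jets."""
    p_no_fake = 1.0
    for eta, pt in jets:
        p_no_fake *= 1.0 - fake_rate(eta, pt)
    return 1.0 - p_no_fake


def qcd_estimate(events: Iterable[List[Jet]],
                 fake_rate: Callable[[float, float], float]) -> float:
    """Sum of P_event over the tagged data events."""
    return sum(event_fake_probability(jets, fake_rate) for jets in events)
```

For instance, with a constant fake rate of 0.1, an event with two jets contributes $1-0.9^{2}=0.19$.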

Using the parametrization described in the previous section, we obtain $N_{QCD}=71.13\pm1.56$
for $\tau$ type 2 and $N_{QCD}=77.46\pm0.80$ for $\tau$
type 3, which agrees fairly well with the observed data (Table \ref{b and tau (types) after eta}).
One can also see (Appendix) that the predicted
distributions of the main topological variables (section \ref{sub:NN-variables})
are in fairly good agreement with those observed in the data.


\subsection{\label{sub:NN-variables}Topological NN}

For the signal training sample, 7481 preselected $t\overline{t}$ MC events
were used (\emph{not} the same events as the 6141 events of the selection sample). For
the background, the $\tau$ veto sample was used.

Similarly to the alljet analysis \cite{alljet}, we define 2 networks:

\begin{enumerate}
\item The first contains 3 topological variables (aplanarity, sphericity and centrality)
and 2 energy-based ones ($H_{T}$ and $\sqrt{S}$).
\item The second contains the output of the first, the $W$ and top mass likelihood,
the b-jet $P_{T}$ and the b-jet decay length.
\end{enumerate}
The kinematic and topological variables used are:

\begin{itemize}
\item $H_{T}$: the scalar sum of the $P_{T}$ of all jets (and of the $\tau$).
\item Sphericity and aplanarity: these variables are formed from the eigenvalues
of the normalized momentum tensor of the jets in the event. They
are expected to be higher in top pair events than in a typical
QCD event.
\item Centrality, defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum
of the jet energies.
\item Top and $W$ mass likelihood, a $\chi^{2}$-like variable: $L\equiv\left(\frac{M_{3j}-M_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$,
where $M_{t},M_{W}$ are the top and $W$ masses (175
GeV and 80 GeV, respectively) and $\sigma_{t},\sigma_{W}$ the corresponding
resolutions (45 GeV and 10 GeV, respectively \cite{alljet}). $M_{3j}$ and $M_{2j}$
are built from the jet combination that minimizes $L$.
\item $P_{T}$ and lifetime significance of the leading b-tagged jet.
\end{itemize}
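
Several of the variables above can be computed directly from the jet four-vectors. The NumPy sketch below is an illustration under stated assumptions: sphericity and aplanarity use the common definition $S=\frac{3}{2}(\lambda_{2}+\lambda_{3})$, $A=\frac{3}{2}\lambda_{3}$, with $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ the eigenvalues of the normalized momentum tensor, and in the mass likelihood the $W$ pair is assumed to be taken from the three jets of the top candidate. All function names are hypothetical.

```python
import itertools

import numpy as np

M_TOP, M_W = 175.0, 80.0          # GeV, as quoted in the text
SIGMA_TOP, SIGMA_W = 45.0, 10.0   # GeV resolutions, as quoted in the text


def event_shapes(p3):
    """Return (sphericity, aplanarity) for jet 3-momenta of shape (n_jets, 3).

    Normalized momentum tensor S^{ab} = sum_j p_j^a p_j^b / sum_j |p_j|^2.
    """
    p3 = np.asarray(p3, dtype=float)
    tensor = p3.T @ p3 / np.sum(p3 ** 2)
    l3, l2, _ = np.linalg.eigvalsh(tensor)  # eigvalsh returns ascending order
    return 1.5 * (l2 + l3), 1.5 * l3


def centrality(pts, energies):
    """H_T / H_E: scalar sum of jet P_T divided by the sum of jet energies."""
    return sum(pts) / sum(energies)


def inv_mass(four_vectors):
    """Invariant mass of the sum of (E, px, py, pz) four-vectors."""
    e, px, py, pz = np.sum(np.asarray(four_vectors, dtype=float), axis=0)
    return float(np.sqrt(max(e ** 2 - px ** 2 - py ** 2 - pz ** 2, 0.0)))


def mass_likelihood(jets):
    """Minimum of L over jet assignments (W pair taken from the top triplet)."""
    best = float("inf")
    for triplet in itertools.combinations(jets, 3):
        m3 = inv_mass(triplet)
        for pair in itertools.combinations(triplet, 2):
            m2 = inv_mass(pair)
            like = ((m3 - M_TOP) / SIGMA_TOP) ** 2 + ((m2 - M_W) / SIGMA_W) ** 2
            best = min(best, like)
    return best
```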
Many of these variables (for instance the mass likelihood and aplanarity)
are only defined for events with 2 or more jets. We therefore now require
2 jets with $P_{T}>20$ GeV and $|\eta|<2.5$.

The Appendix contains the plots of all these variables, which also serve as
an additional check of the agreement between the data and the prediction.
Two of these plots are shown in Fig. \ref{cap:The-nn0-input-small}.
As can be seen, the NN input variables show fairly good agreement
between data and MC, which gives us confidence that the NN will provide
sensible output using these variables.

%
\begin{figure}
\includegraphics[scale=0.3]{analysis/CONTROLPLOTS/aplan_0_type2}\includegraphics[scale=0.3]{analysis/CONTROLPLOTS/ht_0_type2}


\caption{2 of the 5 input variables of the first topological NN before the
NN cut ($\tau$ type 2). The Kolmogorov-Smirnov (KS) probabilities
are shown, indicating how good the agreement is.}

\label{cap:The-nn0-input-small}
\end{figure}



\subsection{NN optimization}

To train the NNs we used the Multi Layer Perceptron (MLP) \cite{MLPfit}
as implemented in the ROOT framework. The input events were split
into 7466 training and 14932 test entries. At each of the 500 training
``epochs'' the fractional error is evaluated for both signal and
background, showing how successfully the network discriminates the
test events (Figure \ref{cap:NN-error}).

%
\begin{figure}
\subfigure[The first NN]{\includegraphics[scale=0.4]{analysis/GOODNN_NOTAU/nn0training300}}\subfigure[The second NN]{\includegraphics[scale=0.4]{analysis/GOODNN_NOTAU/nn1training300}}


\caption{NN error. Red is the test sample, blue is the training sample.}

\label{cap:NN-error}
\end{figure}


The resulting NNs are shown in Figs. \ref{cap:NN0} and \ref{cap:NN1}.
There one can see the structure of the trained NN (blue interconnected
nodes) and the performance evaluation based on the training samples.
In the Appendix (Figs. \ref{cap:The-resulting-output_type2} and \ref{cap:The-resulting-output_type3})
we show this final NN output in the main analysis data sample
(as well as in the signal and in the backgrounds).

%
\begin{figure}
\subfigure[The first NN]{\includegraphics[scale=0.6]{analysis/GOODNN_NOTAU/nn0analysis300}}


\caption{NN0 structure. The upper left plots show the relative impact of the
variables on the NN output. The bottom left plot shows the distribution of the
NN output, the bottom right plot the efficiencies. Red is signal, blue is background.}

\label{cap:NN0}
\end{figure}


%
\begin{figure}
\includegraphics[scale=0.6]{analysis/GOODNN_NOTAU/nn1analysis300}


\caption{NN1 structure. The upper left plots show the relative impact of the
variables on the NN output. The bottom left plot shows the distribution of the
NN output, the bottom right plot the efficiencies. Red is signal, blue is background.}

\label{cap:NN1}
\end{figure}


The result of applying this NN to the data is shown in Figure \ref{cap:Result-of-applying}.
At this point we had to determine which cut on the topological NN
output maximizes the signal significance. The signal significance is
defined as $\frac{N_{S}}{\sqrt{N_{S}+N_{B}}}$, where $N_{S}$ and $N_{B}$ are the
numbers of signal and background events passing the cut, and is shown in
Figure \ref{signal-signifficance}. It reaches its maximum
at $NN1>0.9$ for both type 2 and type 3. Therefore, this is the
cut we used for the cross section measurement. The results of this
measurement are summarized in Table \ref{cap:RESULTS}.
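
The cut optimization amounts to scanning the NN output and maximizing $N_{S}/\sqrt{N_{S}+N_{B}}$. A minimal sketch, assuming per-event weights are available for signal and background (for example MC weights, or $P_{event}$ for the QCD estimate); the names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def significance_scan(sig_nn: Sequence[float], sig_w: Sequence[float],
                      bkg_nn: Sequence[float], bkg_w: Sequence[float],
                      cuts: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (cut, N_S / sqrt(N_S + N_B)) for each cut 'NN output > cut'."""
    result = []
    for cut in cuts:
        n_s = sum(w for x, w in zip(sig_nn, sig_w) if x > cut)
        n_b = sum(w for x, w in zip(bkg_nn, bkg_w) if x > cut)
        total = n_s + n_b
        result.append((cut, n_s / math.sqrt(total) if total > 0 else 0.0))
    return result


def best_cut(scan: List[Tuple[float, float]]) -> float:
    """Cut value with the largest significance."""
    return max(scan, key=lambda item: item[1])[0]
```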
! 741:
! 742: %
! 743: \begin{figure}
! 744: \subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NNresult_tau2}}\subfigure[Type 2 (zoomed)]{\includegraphics[scale=0.3]{plots/NNresult_zoomed_tau2}}
! 745:
! 746: \subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NNresult_tau3}}\subfigure[Type 3 (zoomed)]{\includegraphics[scale=0.3]{plots/NNresult_zoomed_tau3}}
! 747:
! 748:
! 749: \caption{Result of applying NN cut. $t\bar{t}$, $W$ and QCD are plotted
! 750: incrementally in order to compare with \# of events observed in data.
! 751: Error bars include only statistical errors. $\sigma(t\bar{t})=5.54$
! 752: pb is assumed. The right plot only shows the entries with high NN.
! 753: The errors are statistical only.}
! 754:
! 755: \label{cap:Result-of-applying}
! 756: \end{figure}
! 757:
! 758:
! 759: %
! 760: \begin{figure}
! 761: \subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NNresult_signiff_tau2}}\subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NNresult_signiff_tau3}}
! 762:
! 763:
! 764: \caption{$t\bar{t}\rightarrow\tau+jets$ signal significance}
! 765:
! 766: \label{signal-signifficance}
! 767: \end{figure}
! 768:
! 769:
! 770: %
! 771: \begin{table}
! 772: \begin{centering}\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
! 773: \hline
! 774: Channel &
! 775: $N^{obs}$ &
! 776: ${\mathcal{B}}$ &
! 777: $\int{\mathcal{L}}dt$ &
! 778: \multicolumn{2}{c|}{Bakgrounds}&
! 779: $\varepsilon(t\bar{t})$ (\%) &
! 780: $s$ (7 pb) &
! 781: s+b \tabularnewline
! 782: \hline
! 783: $\tau$+jets type 2 &
! 784: 5 &
! 785: 0.1 &
! 786: 349.3 &
! 787: $W\rightarrow\tau\nu$ &
! 788: 0.60$\pm$0.03&
! 789: 1.57$\pm$0.01 &
! 790: 3.83$_{-0.51}^{+0.46}$ &
! 791: 6.84$_{-0.51}^{+0.46}$ \tabularnewline
! 792: &
! 793: &
! 794: &
! 795: &
! 796: fakes &
! 797: 2.41$\pm$0.09 &
! 798: &
! 799: &
! 800: \tabularnewline
! 801: \hline
! 802: $\tau$+jets type 3 &
! 803: 5 &
! 804: 0.1 &
! 805: 349.3 &
! 806: $W\rightarrow\tau\nu$ &
! 807: 0.27$\pm$0.01&
! 808: 0.73$\pm$0.01 &
! 809: 1.80$_{-0.23}^{+0.22}$ &
! 810: 4.39$_{-0.23}^{+0.22}$ \tabularnewline
! 811: &
! 812: &
! 813: &
! 814: &
! 815: fakes &
! 816: 2.33$\pm$0.09 &
! 817: &
! 818: &
! 819: \tabularnewline
! 820: \hline
! 821: \end{tabular}\par\end{centering}


\caption{Summary of the final results after the NN$>$0.9 cut. $\varepsilon(t\bar{t})$
is the total signal acceptance, $s$ is the expected number of signal
events for $\sigma(t\bar{t})=7$~pb, and $s+b$ is the total expected
number of events.}

\label{cap:RESULTS}
\end{table}
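The $s$ column can be cross-checked against the cross-section definition
given later in this note, inverted to
$s=\sigma(t\bar{t})\cdot\varepsilon(t\bar{t})\cdot\mathcal{B}\cdot\int\mathcal{L}\,dt$.
For the $\tau$+jets type 2 channel with $\sigma(t\bar{t})=7$~pb this gives
\[
s=7\times0.0157\times0.1\times349.3\approx3.84,\qquad s+b\approx3.83+0.60+2.41=6.84,
\]
in agreement with Table \ref{cap:RESULTS} up to the rounding of $\varepsilon(t\bar{t})$.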



\section{Systematic uncertainties}

The most important systematic effects (except for b-tagging, which
is treated separately below) are summarized in Table \ref{cap:Syst}.

%
\begin{table}
{\footnotesize }\begin{tabular}{|c||c|c|}
\hline
Channel&
{\footnotesize $\tau$+jets type 2 }&
{\footnotesize $\tau$+jets type 3 }\tabularnewline
\hline
\hline
{\footnotesize Jet Energy Scale }&
{\footnotesize $_{-0.27}^{+0.30}$ }&
{\footnotesize $_{-0.69}^{+0.53}$ }\tabularnewline
\hline
{\footnotesize Primary Vertex }&
{\footnotesize $_{+0.037}^{-0.036}$ }&
{\footnotesize $_{+0.095}^{-0.093}$ }\tabularnewline
\hline
{\footnotesize MC stat }&
{\footnotesize $_{+0.25}^{-0.22}$ }&
{\footnotesize $_{+0.65}^{-0.58}$ }\tabularnewline
\hline
{\footnotesize Trigger }&
{\footnotesize $_{-0.020}^{+0.0025}$ }&
{\footnotesize $_{-0.069}^{+0.0056}$ }\tabularnewline
\hline
{\footnotesize Branching ratio }&
{\footnotesize $_{+0.074}^{-0.071}$ }&
{\footnotesize $_{+0.19}^{-0.18}$ }\tabularnewline
\hline
{\footnotesize QCD fake rate parametrization }&
{\footnotesize $_{+0.17}^{-0.17}$ }&
{\footnotesize $_{+0.34}^{-0.34}$ }\tabularnewline
\hline
{\footnotesize $W\rightarrow\tau\nu$ }&
{\footnotesize $_{+0.19}^{-0.19}$ }&
{\footnotesize $_{+0.19}^{-0.19}$ }\tabularnewline
\hline
\end{tabular}{\footnotesize \par}


\caption{Systematic uncertainties on $\sigma(t\bar{t})$ (in pb).}

\label{cap:Syst}
\end{table}



\subsection{JES}

The jet energy scale (JES) corrections applied to data and MC have
associated uncertainties, which result in a systematic shift of the
measured cross section. To evaluate this systematic, the JES corrections
in MC were shifted up and down by
\[
\delta JES^{data}=\sqrt{(\delta_{syst}^{data})^{2}+(\delta_{stat}^{data})^{2}+(\delta_{syst}^{MC})^{2}+(\delta_{stat}^{MC})^{2}}.
\]
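A minimal Python sketch of the quadrature sum above and of the resulting
up/down shift of a single corrected jet energy. The function names, the
component values and the convention that $\delta JES^{data}$ acts as a
relative uncertainty multiplying the corrected jet energy are hypothetical
illustrations; the shifted MC would then be passed through the full
selection to obtain the shifted cross section.

```python
import math


def jes_total_uncertainty(syst_data, stat_data, syst_mc, stat_mc):
    """Quadrature sum of the four (relative) JES uncertainty components."""
    return math.sqrt(syst_data**2 + stat_data**2 + syst_mc**2 + stat_mc**2)


def shifted_jet_energy(e_raw, jes_correction, delta, direction):
    """Corrected jet energy with the JES correction shifted by +/- delta.

    direction is +1 (shift up) or -1 (shift down). Treating delta as a
    relative uncertainty on the correction is an assumption of this sketch.
    """
    return e_raw * jes_correction * (1.0 + direction * delta)


# Hypothetical inputs, for illustration only (not values from this note).
delta = jes_total_uncertainty(0.03, 0.01, 0.02, 0.01)
e_up = shifted_jet_energy(40.0, 1.3, delta, +1)
e_down = shifted_jet_energy(40.0, 1.3, delta, -1)
print(f"delta = {delta:.4f}, E_up = {e_up:.2f} GeV, E_down = {e_down:.2f} GeV")
```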


\subsection{Primary Vertex and Branching Ratio}

The primary vertex (PV) requirement and the $t\bar{t}$ and $W$ branching
fractions were assigned uncertainties of 1\% and 2\%, respectively,
as in \cite{alljet}.
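Because $\sigma$ is inversely proportional to $\varepsilon(t\bar{t})\cdot\mathcal{B}$,
scaling either factor by $(1\pm\delta)$ scales $\sigma$ by $1/(1\pm\delta)$.
The Python sketch below applies this to the 1\% and 2\% uncertainties
quoted above, using the central cross sections from the final results;
it reproduces the Primary Vertex and Branching ratio entries of Table
\ref{cap:Syst}. Treating the PV uncertainty as a relative uncertainty
on the acceptance is an assumption of the sketch.

```python
def shifts(sigma, rel_unc):
    """Shift of sigma (pb) when a factor in the denominator of the
    cross-section formula is scaled up or down by rel_unc.

    Returns (shift for factor scaled up, shift for factor scaled down).
    """
    up = sigma * (1.0 / (1.0 + rel_unc) - 1.0)
    down = sigma * (1.0 / (1.0 - rel_unc) - 1.0)
    return up, down


# Central cross sections (pb) quoted in the Cross section section.
central = {"type 2": 3.63, "type 3": 9.39}
# Relative uncertainties from the text: PV 1%, branching ratio 2%.
sources = {"Primary Vertex": 0.01, "Branching ratio": 0.02}

for channel, sigma in central.items():
    for name, rel in sources.items():
        up, down = shifts(sigma, rel)
        print(f"{channel} {name}: up {up:+.3f} pb, down {down:+.3f} pb")
```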


\subsection{Luminosity}

The total integrated luminosity of the data used in this analysis
is $349\pm23$~pb$^{-1}$. This uncertainty yields the luminosity error
on the cross section, which is quoted separately (lumi) in the final results.
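To first order the relative luminosity uncertainty propagates directly
to the cross section, since $\sigma\propto1/\int\mathcal{L}\,dt$. For
the $\tau$+jets type 2 channel and for the combination,
\[
\delta\sigma_{\mathrm{lumi}}\approx3.63\times\frac{23}{349}\approx0.24~\mathrm{pb},\qquad\delta\sigma_{\mathrm{lumi}}^{\mathrm{comb}}\approx5.05\times\frac{23}{349}\approx0.33~\mathrm{pb},
\]
matching the (lumi) terms of the cross-section results.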


\subsection{Trigger}

The systematic uncertainty due to the trigger parametrization is computed
with top\_trigger \cite{top_trigger}.


\subsection{B-tagging}

B-tagging uncertainties are taken into account by varying the
systematic and statistical errors on the MC tagging weights.

These errors arise from several independent sources:

\begin{itemize}
\item B-jet tagging parametrization.
\item C-jet tagging parametrization.
\item Light-jet tagging parametrization (negative tag rate). Derived by
varying the parametrization by $\pm1\sigma$ and adding in quadrature
an 8\% relative uncertainty from the variation of the negative tag
rate measured in different samples.
\item Scale factors $SF_{hf}$ and $SF_{ll}$. Their systematic uncertainties
are derived from the statistical error due to finite MC statistics.
\item Semi-leptonic b-tagging efficiency parametrization in MC and in data
(System 8).
\item Taggability. This includes the statistical error due to the finite
size of the samples from which it was derived, and a systematic error
reflecting the (neglected) dependence of taggability on jet multiplicity.
\end{itemize}
The effect of each of these error sources on the final result is summarized
in Table \ref{cap:b-tagging-systematics-sources}, along with the total
b-ID systematic error (quoted in Table \ref{cap:Syst}).

%
\begin{table}
\begin{tabular}{|c|c|c|}
\hline
Channel&
{\footnotesize $\tau$+jets type 2 }&
{\footnotesize $\tau$+jets type 3 }\tabularnewline
\hline
\hline
b-tagging&
{\tiny $_{-0.13}^{+0.076}$ }&
{\tiny $_{-0.26}^{+0.41}$ }\tabularnewline
\hline
c-tagging&
{\tiny $_{-0.20}^{+0.16}$ }&
{\tiny $_{-0.48}^{+0.60}$ }\tabularnewline
\hline
l-tagging&
{\tiny $_{-0.0051}^{+0.0051}$ }&
{\tiny $_{-0.014}^{+0.014}$ }\tabularnewline
\hline
$SF_{hf}$&
{\tiny $_{-0.00036}^{+0.00036}$ }&
{\tiny $_{-0.00094}^{+0.00094}$ }\tabularnewline
\hline
$SF_{ll}$&
{\tiny $_{-0.00036}^{+0.00036}$ }&
{\tiny $_{-0.00094}^{+0.00094}$ }\tabularnewline
\hline
$\mu$ b-tagging (data)&
{\tiny $_{-0.091}^{+0.094}$ }&
{\tiny $_{-0.24}^{+0.25}$ }\tabularnewline
\hline
$\mu$ b-tagging (MC)&
{\tiny $_{+0.11}^{-0.10}$ }&
{\tiny $_{+0.28}^{-0.25}$ }\tabularnewline
\hline
taggability&
{\tiny $_{-0.048}^{+0.049}$ }&
{\tiny $_{-0.13}^{+0.13}$ }\tabularnewline
\hline
\end{tabular}


\caption{Individual sources of the b-tagging systematic uncertainty on $\sigma(t\bar{t})$ (in pb).}

\label{cap:b-tagging-systematics-sources}
\end{table}
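The rule for combining the individual b-tagging sources is not spelled
out here. A common choice for independent sources, assumed in this
Python sketch, is to add the positive and the negative shifts of all
sources in quadrature separately. It is applied to the $\tau$+jets type 2
column of Table \ref{cap:b-tagging-systematics-sources}, reading each
entry as a (superscript, subscript) pair; with these inputs it gives
roughly $+0.23/-0.28$~pb. The combination rule is an assumption and not
necessarily the one used for the total b-ID error.

```python
import math

# tau+jets type 2 column of the b-tagging systematics table, in pb:
# (superscript, subscript) shift of sigma(ttbar) for each source.
TYPE2_SHIFTS = {
    "b-tagging": (+0.076, -0.13),
    "c-tagging": (+0.16, -0.20),
    "l-tagging": (+0.0051, -0.0051),
    "SF_hf": (+0.00036, -0.00036),
    "SF_ll": (+0.00036, -0.00036),
    "mu b-tagging (data)": (+0.094, -0.091),
    "mu b-tagging (MC)": (-0.10, +0.11),
    "taggability": (+0.049, -0.048),
}


def combine_asymmetric(shifts):
    """Quadrature sum of the largest positive and the largest negative
    shift of each source, treating sources as independent (assumption)."""
    up2 = 0.0
    down2 = 0.0
    for pair in shifts.values():
        up2 += max(max(pair), 0.0) ** 2
        down2 += min(min(pair), 0.0) ** 2
    return math.sqrt(up2), -math.sqrt(down2)


total_up, total_down = combine_asymmetric(TYPE2_SHIFTS)
print(f"b-tagging total (type 2): +{total_up:.2f} / {total_down:.2f} pb")
```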



\subsection{Fake rate}

The systematic uncertainty associated with the $\tau$ fake rate is
taken to be the statistical error of the fit described in Section \ref{sub:Fit}.


\subsection{W background prediction}

The method used to describe the $W\rightarrow\tau\nu$ background
is not perfect. There are two potential sources of error:

\begin{itemize}
\item Only the $W$+4 partons MC sample was used. However, $W$+2 and $W$+3
partons events are expected to make some (albeit smaller) contribution.
Taking them into account properly would require combining the samples
of all jet multiplicities. Neglecting them leads to a slight underestimation
of the background.
\item The {}``$b$ veto'' sample may contain some $W$ contribution from
$Wjjjj$ events. This leads to double-counting of these events and
hence to an overestimation of the background.
\end{itemize}
A conservative 50\% uncertainty was assigned to the number of $W$ events
in the final sample: this number was varied up and down by 50\%, and
the resulting change in the cross section is quoted in Table \ref{cap:Syst}.


\section{Cross section}

The cross section is defined as
\[
\sigma=\frac{N^{obs}-N^{bkg}}{\varepsilon(t\bar{t})\cdot\mathcal{B}(t\bar{t})\cdot\int\mathcal{L}\,dt},
\]
where $N^{obs}$ is the number of observed events and $N^{bkg}$ the
expected background ($W\rightarrow\tau\nu$ and fakes), so that the
numerator is the number of signal events. The results are:

\begin{center}$\tau$+jets type 2 cross section: \[
3.63\;\;_{-3.50}^{+4.72}\;\;(\mathrm{stat})\;\;_{-0.48}^{+0.49}\;\;(\mathrm{syst})\;\;\pm0.24\;\;(\mathrm{lumi})\;\;\mathrm{pb}\]
\par\end{center}

\begin{center}$\tau$+jets type 3 cross section: \[
9.39\;\;_{-7.49}^{+10.10}\;\;(\mathrm{stat})\;\;_{-1.18}^{+1.25}\;\;(\mathrm{syst})\;\;\pm0.61\;\;(\mathrm{lumi})\;\;\mathrm{pb}\]
\par\end{center}
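The central values above can be reproduced from the inputs of Table
\ref{cap:RESULTS}, taking the number of signal events as $N^{obs}$ minus
the sum of the $W\rightarrow\tau\nu$ and fake backgrounds. The Python
sketch below does this; with the rounded acceptances from the table it
gives 3.63~pb for type 2 and about 9.41~pb for type 3, the small difference
from 9.39~pb presumably coming from the rounding of $\varepsilon(t\bar{t})$.

```python
# Inputs per channel from the final results table (luminosity in pb^-1,
# acceptance as a fraction rather than a percentage).
CHANNELS = {
    "tau+jets type 2": dict(n_obs=5, backgrounds=(0.60, 2.41),
                            acceptance=0.0157, br=0.1, lumi=349.3),
    "tau+jets type 3": dict(n_obs=5, backgrounds=(0.27, 2.33),
                            acceptance=0.0073, br=0.1, lumi=349.3),
}


def cross_section(n_obs, backgrounds, acceptance, br, lumi):
    """sigma = (N_obs - N_bkg) / (acceptance * BR * luminosity), in pb."""
    n_signal = n_obs - sum(backgrounds)
    return n_signal / (acceptance * br * lumi)


for name, inputs in CHANNELS.items():
    print(f"{name}: sigma = {cross_section(**inputs):.2f} pb")
```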

The combined cross section was obtained by minimizing the sum of the
negative log-likelihood functions of the two channels. The functional
form of the likelihood is the same as that used for the $e\mu$ channel
\cite{emu}. The combined cross section is

\begin{center}\[
5.05\;\;_{-3.46}^{+4.31}\;\;(\mathrm{stat})\;\;_{-0.67}^{+0.68}\;\;(\mathrm{syst})\;\;\pm0.33\;\;(\mathrm{lumi})\;\;\mathrm{pb}\]
\par\end{center}
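The combination can be sketched in Python as follows. The note only states
that the likelihood has the same functional form as in the $e\mu$
analysis \cite{emu}; the Poisson likelihood used here, with expected yield
$\mu_{i}=\sigma\,\varepsilon_{i}\,\mathcal{B}\int\mathcal{L}\,dt+b_{i}$ in
channel $i$, is an assumption, as are the golden-section minimizer and
the $\Delta(-\ln L)=0.5$ rule for the statistical interval. With the
rounded inputs of Table \ref{cap:RESULTS} the minimum should land close
to the quoted combined value.

```python
import math

# (n_obs, total background, acceptance * BR * luminosity [pb^-1]) per
# channel, built from the final results table.
CHANNELS = [
    (5, 0.60 + 2.41, 0.0157 * 0.1 * 349.3),  # tau+jets type 2
    (5, 0.27 + 2.33, 0.0073 * 0.1 * 349.3),  # tau+jets type 3
]


def nll(sigma, channels=CHANNELS):
    """Sum over channels of -ln(Poisson), dropping sigma-independent terms."""
    total = 0.0
    for n, b, a in channels:
        mu = a * sigma + b
        total += mu - n * math.log(mu)
    return total


def minimize(f, lo, hi, tol=1e-6):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - g * (hi - lo), lo + g * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:  # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - g * (hi - lo)
            f1 = f(x1)
        else:  # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + g * (hi - lo)
            f2 = f(x2)
    return 0.5 * (lo + hi)


def crossing(f, target, a, b, tol=1e-6):
    """Bisection for f(x) = target, assuming a single crossing in [a, b]."""
    fa = f(a) - target
    while abs(b - a) > tol:
        m = 0.5 * (a + b)
        fm = f(m) - target
        if (fm > 0) == (fa > 0):
            a, fa = m, fm
        else:
            b = m
    return 0.5 * (a + b)


sigma_hat = minimize(nll, 0.0, 50.0)
nll_min = nll(sigma_hat)
# Points where -ln L rises by 0.5 above its minimum (68% stat interval).
upper = crossing(nll, nll_min + 0.5, sigma_hat, 50.0)
lower = crossing(nll, nll_min + 0.5, 0.0, sigma_hat)
print(f"sigma = {sigma_hat:.2f} +{upper - sigma_hat:.2f} "
      f"-{sigma_hat - lower:.2f} (stat) pb")
```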