Annotation of ttbar/p20_taujets_note/Analysis.tex, revision 1.1.1.1
2: \section{Analysis}
3:
4:
5: \subsection{Outline}
6:
7: The analysis procedure involved several stages:
8:
9: \begin{itemize}
\item Preselection (section \ref{sub:Preselection}). At least 4 jets and
$\not\!\! E_{T}$ significance $>3$ are required. 653727 events are selected in the
data, of which 109.93$\pm$7.26 are expected to be $t\bar{t}$. S:B =
1:6000.
\item ID cuts (section \ref{sub:Results-of-the}). At least one good $\tau$
candidate and at least one tight SVT tag are required. We also require
$\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20$ GeV. 216 events are selected
in the data, of which 9.320$\pm$0.620 are expected to be $t\bar{t}$.
S:B = 1:58.
\item Topological NN (section \ref{sub:NN-variables}). A sequence of two
feed-forward NNs was trained and applied. The optimal cut on the
second NN was found to be 0.6. With this final cut we obtained
13 events in data, of which 4.93$\pm$0.33 are expected to be $t\bar{t}$.
S:B = 1:2.5.
24: \end{itemize}
The W background was modeled using the ALPGEN Monte Carlo simulation,
while the QCD background was extracted from the data using the procedure described
in section \ref{sub:QCD-modeling}.
28:
29:
30: \subsection{\label{sub:Preselection}Preselection}
31:
The total number of events in this 351 $pb^{-1}$ data skim is 17 million.
This is a very large and rather unwieldy dataset. Hence, the main
goal of the preselection was to reduce this dataset while imposing the
most obvious and straightforward requirements that characterize our signal
signature. Such characteristic features include the following:
37:
38: \begin{itemize}
39: \item Moderate $\not\!\! E_{T}$ arising from both the W vertex and $\tau$
40: decay.
41: \item At least 4 jets have to be present.
\item A $\tau$ lepton and 2 b-jets are present.
43: \end{itemize}
Since both $\tau$ ID and b-tagging involve complex algorithms which
are likely to be signal-sensitive and may require extensive {}``tuning'',
we chose not to use them at the preselection stage.
47:
Similarly, we chose not to impose any jet $P_{T}$ cuts, since
such cuts strongly depend on the JES corrections and their associated errors
and hence are better applied at a later stage.
51:
The first 3 preselection criteria were chosen to be similar to those of the $t\overline{t}\rightarrow jets$
analysis \cite{alljet}:
54:
55: \begin{itemize}
56: \item Primary Vertex is reconstructed and is within the central tracker
57: volume (60 cm in Z from the detector center) and has at least 3 tracks
58: associated with it.
59: \item Veto on isolated electrons and muons to avoid overlap with the $t\overline{t}\rightarrow lepton+jets$
60: cross section analysis.
61: \item $N_{jets}\geq4$ with $P_{T}>8\, GeV$.
62: \end{itemize}
At this point, the $t\overline{t}\rightarrow e+jets$ and $t\overline{t}\rightarrow\mu+jets$
analyses \cite{l+jets} apply cuts on $\Delta\phi$ between
the lepton and $\not\!\! E_{T}$ as well as so-called {}``triangular''
cuts in the $\Delta\phi$ - $\not\!\! E_{T}$ plane. The goal is to eliminate
the events with fake $\not\!\! E_{T}$. The neutrino and the lepton coming
from the W are expected to fly in opposite directions most of the time.
However, as can be seen in Figure \ref{cap:dphi}, no such simple
cuts are obvious in the case of the $\tau$. That is to be expected, since the
$\tau$ itself emits a neutrino in its decay, which contributes to $\not\!\! E_{T}$.
Instead, a new variable is proposed to cut off the fake $\not\!\! E_{T}$
events and reduce the sample size.
74:
75: %
76: \begin{figure}
77: \includegraphics[scale=0.4]{analysis/plots/dphitaumet}
78:
79:
80: \caption{$\Delta\phi$ between $\tau$ and $\not\!\! E_{T}$ for QCD (black)
81: and $t\bar{t}\rightarrow\tau+jets$ (red).}
82:
83: \label{cap:dphi}
84: \end{figure}
85:
86:
87: %
88: \begin{figure}
89: \includegraphics[scale=0.4]{analysis/plots/metl}
90:
91:
92: \caption{$\not\!\! E_{T}$ significance for QCD and $t\bar{t}\rightarrow\tau+jets$.}
93:
94: \label{cap:metl}
95: \end{figure}
96:
97:
$\not\!\! E_{T}$ significance \cite{metl} is defined as a measure
of the likelihood that $\not\!\! E_{T}$ arises from physical sources
rather than from fluctuations in detector measurements. As can be seen
in Fig. \ref{cap:metl}, it proves to be an effective way to reduce
the data skim. A cut of 3 was used for the preselection.
103:
Next, the original 10K events of the MC sample have to be scaled to 349
$pb^{-1}$. The total $t\bar{t}$ cross section is 6.8 pb \cite{NNLO}.
Taking into account the branching fraction to the hadronic $\tau+jets$
mode, the effective cross section comes out to be:

\[
B(\tau\rightarrow hadrons)\cdot B(t\bar{t}\rightarrow\tau+jets)\cdot\sigma(t\bar{t})=0.65\cdot0.15\cdot6.8\,\mathrm{pb}=0.66\,\mathrm{pb}\]
111:
Throughout this work, however, we used $\sigma(t\bar{t})=5.5$ pb,
the value computed by the ALPGEN simulation taking into
account the generation cuts. The effective cross section used for
scaling is then 0.53 pb. Since this value is only used for reference
and for the optimization of S:B, it is of no importance which number is used.
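
As a quick cross-check of this arithmetic, the product of branching fractions can be evaluated for both cross section values:

\[
0.65\cdot0.15=0.0975,\qquad0.0975\cdot6.8\,\mathrm{pb}\approx0.66\,\mathrm{pb},\qquad0.0975\cdot5.5\,\mathrm{pb}\approx0.536\,\mathrm{pb}\]

The last value is the one quoted above as 0.53 pb. Assuming the predicted signal yield is simply linear in $\sigma(t\bar{t})$, the two choices differ only by a common scale factor:

\[
\frac{5.5\,\mathrm{pb}}{6.8\,\mathrm{pb}}\approx0.81\]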
117:
The relative flavor fractions of the $W+4jets$ process were taken
from the ALPGEN simulation as ratios of the simulated cross sections. The
total $W+4jets$ cross section was then normalized to the measured value of 4.5 $\pm$ 2.2
pb \cite{W+4j}.
122:
123: Table \ref{presel} shows the results of the preselection for both
124: data and the backgrounds.
125:
126: %
127: \begin{table}
128: \begin{tabular}{|c|c|c|c|}
129: \hline
130: &
131: \# passed&
132: ALPGEN $\sigma$, pb&
133: \# passed scaled\tabularnewline
134: \hline
135: \hline
136: data&
137: 653727/17M&
138: &
139: 653727\tabularnewline
140: \hline
141: $t\overline{t}\rightarrow\tau+jets$&
142: 6141/10878&
143: 0.821 $\pm$ 0.004&
144: 109.93 $\pm$7.26\tabularnewline
145: \hline
146: $Wbbjj\rightarrow$ $\tau\nu+bbjj$&
147: 2321/11576&
148: 0.222 $\pm$ 0.044&
149: 9.98 $\pm$ 2.08\tabularnewline
150: \hline
151: $Wccjj\rightarrow$ $\tau\nu+ccjj$&
152: 2289/10995&
153: 0.527 $\pm$ 0.059&
154: 24.77 $\pm$ 3.22\tabularnewline
155: \hline
156: $Wcjjj\rightarrow$ $\tau\nu+cjjj$&
157: 2169/10435&
158: 0.920 $\pm$0.087 &
159: 42.23 $\pm$ 4.87\tabularnewline
160: \hline
161: $Wjjjj\rightarrow$ $\tau\nu+jjjj$&
162: 2683/11920&
163: 14.14 $\pm$ 1.3&
164: 720.33 $\pm$ 81.48 \tabularnewline
165: \hline
166: \end{tabular}
167:
168:
169: \caption{Preselection results. Shown are the total acceptances (including
170: preselection) and the \# of events scaled to 349 $\pm$23 $pb^{-1}$
171: (no systematic uncertainties except for this luminosity error are
included). The generation cuts of the ALPGEN samples are described in \cite{l+jets}.}
173:
174: \label{presel}
175: \end{table}
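
The numbers in Table \ref{presel} can be cross-checked with a short calculation. This is only a sketch: it assumes, as the caption states, that the quoted errors on the scaled yields are dominated by the luminosity uncertainty, so that the relative error on a yield $N$ equals the relative error on the luminosity $\mathcal{L}$:

\[
\frac{\delta N}{N}\approx\frac{\delta\mathcal{L}}{\mathcal{L}}=\frac{23}{349}\approx6.6\%,\qquad109.93\cdot\frac{23}{349}\approx7.2\]

This is close to the quoted $\pm$7.26 for $t\overline{t}\rightarrow\tau+jets$. Likewise, the ratio of the expected signal to the number of selected data events reproduces the quoted S:B of about 1:6000:

\[
\frac{109.93}{653727}\approx\frac{1}{5950}\]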
176:
177:
178:
179: \subsection{\label{sub:Results-of-the}Results of the ID cuts}
180:
The next step was to apply the $\tau$ ID and b-tagging requirements.
Table \ref{cap:btaggingandtau} shows the selection criteria that
we apply to data and MC and the resulting selection efficiencies.
The results of this procedure are given in Table \ref{b and tau}.
S:B at this stage is 1:58, which is far too low.
In section \ref{sub:NN-variables} we describe the topological
NN used to enhance the signal content.
188:
189: %
190: \begin{table}
191: \begin{tabular}{|c|c|c|}
192: \hline
193: &
194: {\scriptsize data}&
195: {\scriptsize taggingMC}\tabularnewline
196: \hline
197: \hline
198: &
199: {\scriptsize $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
200: {\scriptsize $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
201: \hline
202: &
203: {\scriptsize $\geq1$ SVT}&
204: {\scriptsize $TrigWeight\cdot bTagProb$}\tabularnewline
205: \hline
206: &
207: {\scriptsize $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
208: {\scriptsize $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
209: \hline
210: \end{tabular}
211:
212:
\caption{b-tagging and $\tau$ ID. In the MC we use the certified b-tagging
parametrization rather than actual b-tagging, that is, we apply the
b-tagging weight ($bTagProb$). We also use the trigger weight
as computed by top\_trigger.}
217:
218: \label{cap:btaggingandtau}
219: \end{table}
220:
221:
222: %
223: \begin{table}
224: \begin{tabular}{|c|c|c|c|}
225: \hline
226: &
227: {\small \# passed}&
228: {\small Acceptance}&
229: {\small \# passed scaled}\tabularnewline
230: \hline
231: \hline
232: {\small data}&
233: {\small 216/653727}&
234: &
235: {\small 216}\tabularnewline
236: \hline
237: {\small $t\overline{t}\rightarrow\tau+jets$}&
238: {\small 524.0/6141}&
239: {\small 0.0480$\pm$0.0020}&
240: {\small 9.320$\pm$0.620}\tabularnewline
241: \hline
242: {\small $Wbbjj\rightarrow$ $\tau\nu+bbjj$}&
243: {\small 54.5/2321}&
244: {\small 0.0150$\pm$0.0024}&
245: {\small 0.012$\pm$0.002}\tabularnewline
246: \hline
247: {\small $Wccjj\rightarrow$ $\tau\nu+ccjj$}&
248: {\small 13.3/2289}&
249: {\small 0.0039$\pm$0.0012}&
250: {\small 0.034$\pm$0.005}\tabularnewline
251: \hline
252: {\small $Wcjjj\rightarrow$ $\tau\nu+cjjj$}&
253: {\small 8.0/2169}&
254: {\small 0.0025$\pm$0.0010}&
255: {\small 0.160$\pm$0.020}\tabularnewline
256: \hline
257: {\small $Wjjjj\rightarrow$ $\tau\nu+jjjj$}&
258: {\small 3.3/2683}&
259: {\small 0.0009$\pm$0.0006}&
260: {\small 0.860$\pm$0.100}\tabularnewline
261: \hline
262: \end{tabular}
263:
264:
265: \caption{b-tagging and $\tau$ ID results. Shown are the total acceptances
266: (including preselection) and the \# of events scaled to Luminosity.}
267:
268: \label{b and tau}
269: \end{table}
270:
271:
272: For the purposes of this analysis we define 3 subsamples out of the
273: original preselected data sample:
274:
275: \begin{itemize}
\item The {}``signal'' sample: at least 1 $\tau$ with $NN>0.95$
and at least one SVT tag are required (as in Table \ref{cap:btaggingandtau}).
This is the main sample used for the measurement (268 events).
\item The {}``$\tau$ veto'' sample: the same selection, but instead of $NN_{\tau}>0.95$,
$0<NN_{\tau}<0.5$ was required for the $\tau$ candidates, and no events
with {}``good'' ($NN>0.8$) taus were allowed. This sample is used
for the topological NN training (21022 events).
\item The {}``$b$ veto'' sample: at least 1 $\tau$ with $NN>0.95$,
but no SVT tags. This sample is used for the QCD prediction
(4642 events).
286: \end{itemize}
287:
288: \subsection{\label{sub:QCD-modeling}QCD modeling}
289:
The difference between the data and the total number of $t\bar{t}$ and $W$ events
has to be attributed to QCD events, in which the $\tau$ candidate
is a jet mistakenly identified as a $\tau$. In order to estimate
this background contribution the following strategy was employed.
294:
295:
296: \subsubsection{Parametrization}
297:
In this section our definition of the $\tau$ fake rate is different
from the one in Figure \ref{tauID_Fake_Eff}. There, the goal was
to determine the total number of fake $\tau$ candidates per event
in the ALLJET data skim. Now our goal is to estimate the number of
events that would pass all our signal selection criteria, yet contain
no physical $\tau$ leptons but only fakes. In other words, we are
modeling the QCD contribution to our final $t\bar{t}$ candidate event
selection.
306:
We started with the {}``$b$ veto'' sample. It can be considered
a predominantly QCD data sample, so almost all $\tau$ candidates in it
have to be fake. Figure \ref{cap:taufaketaus} shows the distribution
of these candidates in $P_{T}$ and $|\eta|$, while Fig.
\ref{cap:taufakejets} displays the jets found in the same events.
312:
313: %
314: \begin{figure}
315: \includegraphics[scale=0.5]{plots/jet_trf}
316:
317:
318: \caption{Jets in the QCD sample}
319:
320: \label{cap:taufakejets}
321: \end{figure}
322:
323:
324: %
325: \begin{figure}
326: \includegraphics[scale=0.4]{plots/tau_trf}
327:
328:
329: \caption{$\tau$ candidates in the QCD sample}
330:
331: \label{cap:taufaketaus}
332: \end{figure}
333:
334:
Since the $\tau$ candidates here are really jets, we can simply divide one histogram
by the other bin by bin to parametrize the $\tau$ fake rate. Figure
\ref{cap:taufakerate} demonstrates this parametrization.
338:
339: %
340: \begin{figure}
341: \includegraphics[scale=0.5]{plots/tauTRF}
342:
343:
344: \caption{$\tau$ fake rate parametrization}
345:
346: \label{cap:taufakerate}
347: \end{figure}
348:
349:
The large isolated spikes are caused by the limited statistics available
in these bins. In order to reduce this effect and minimize the statistical
uncertainty we performed a 2D fit to this distribution. This fit
is then used for the QCD prediction.
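
The bin-by-bin division described above can be written out explicitly. The notation here is introduced only for illustration: $N_{kl}^{\tau}$ and $N_{kl}^{jet}$ denote the numbers of $\tau$ candidates and jets in bin $k$ of $|\eta|$ and bin $l$ of $P_{T}$. The uncertainty estimate assumes simple binomial statistics, which is our assumption and not necessarily what was used in the analysis:

\[
f_{kl}=\frac{N_{kl}^{\tau}}{N_{kl}^{jet}},\qquad\delta f_{kl}\approx\sqrt{\frac{f_{kl}\left(1-f_{kl}\right)}{N_{kl}^{jet}}}\]

Under this assumption $\delta f_{kl}$ grows as $N_{kl}^{jet}$ decreases, which is why sparsely populated bins can produce isolated spikes and why a smooth fit is preferable to the raw ratio.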
354:
355:
356: \subsubsection{Fit\label{sub:Fit}}
357:
The fit was performed separately in the $\eta$ and $P_{T}$
projections, that is, we assumed that the 2D parametrization can
simply be factored into two components:
361:
362: \[
363: F(\eta,P_{T})\equiv A(\eta)\cdot B(P_{T})\]
364:
365:
The $\eta$ distributions (as we have observed in section \ref{sub:Signal-characteristics})
are symmetric around 0, hence we can perform the fit as a function of the absolute
value of $\eta$. The fitting function was the following:
369:
370: \[
371: A(\eta)\equiv a_{1}+a_{2}\cdot\eta{}^{2}+a_{3}\cdot\eta{}^{3}+a_{4}\cdot\eta{}^{4}+...+a_{7}\cdot\eta{}^{7}\]
372:
373:
For $\eta=0$, $a_{1}=0$ was set to avoid a singularity.
375:
The fitting function for $P_{T}$ was picked so that it would
describe the data well and approach a constant at high $P_{T}$ rather than grow monotonically (that is, we want
$\lim_{P_{T}\rightarrow\infty}B\left(P_{T}\right)=const$):
380:
381: \[
382: B(P_{T})\equiv b_{1}\cdot\exp\left(\frac{P_{T}}{\left(P_{T}+b_{3}\right)^{2}}\right)+b_{2}\cdot\left(\frac{P_{T}}{P_{T}+b_{3}}\right)\]
383:
384:
The distributions in $\eta$ and $P_{T}$ were fitted separately
with $A(\eta)$ and $B(P_{T})$. The result of this procedure
can be seen in Fig. \ref{cap:taufakerate_fit}.
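
The limiting behavior of $B(P_{T})$ follows directly from its definition. As a short check (assuming $b_{3}\neq0$ for the low-$P_{T}$ limit), the argument of the exponential vanishes at both ends of the range, while the second term goes from 0 to $b_{2}$:

\[
B(0)=b_{1}\cdot e^{0}+b_{2}\cdot0=b_{1},\qquad\lim_{P_{T}\rightarrow\infty}B(P_{T})=b_{1}\cdot e^{0}+b_{2}\cdot1=b_{1}+b_{2}\]

So the fitted fake rate saturates at $b_{1}+b_{2}$ at high $P_{T}$, and $b_{3}$ sets the $P_{T}$ scale at which the transition happens.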
388:
As can be seen, the fit in $\eta$ fails around $\eta=1$.
This is the ICD region, which is expected to have a different effect
on different $\tau$ types. In order to account for this effect we
performed the fit for each type separately. The result can be
seen in Fig. \ref{cap:taufakerate_fit_types}.
394:
As can be seen, the effect of the ICD region is largest for type 1
and is minor for type 2. At the same time the $\eta$ distribution
in signal (Fig. \ref{cap:reco tau}) is fairly uniform.
398:
Hence we imposed the following cuts to remove these ICD fakes:
401:
402: \begin{itemize}
403: \item For type 1: $0.8<|\eta|<1.3$ region cut off
404: \item For type 3: $0.85<|\eta|<1.1$ region cut off
405: \end{itemize}
With these cuts, the fits are much improved (Fig. \ref{cap:taufakerate_fit_types_noeta}).
408:
409: %
410: \begin{figure}
411: \includegraphics[scale=0.4]{plota_may18/fit_alltypes}
412:
413:
414: \caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake
415: rate.}
416:
417: \label{cap:taufakerate_fit}
418: \end{figure}
419:
420:
421: %
422: \begin{figure}
423: {\tiny \subfigure[Type 1 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type1}}\subfigure[Type 2 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type2}}}{\tiny \par}
424:
425: {\tiny \subfigure[Type 3 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type3}}}{\tiny \par}
426:
427:
428: \caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake
429: rate.}
430:
431: \label{cap:taufakerate_fit_types}
432: \end{figure}
433:
434:
435: %
436: \begin{figure}
437: {\tiny \subfigure[Type 1 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type1_cut}}\subfigure[Type 2 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type2}}}{\tiny \par}
438:
439: {\tiny \subfigure[Type 3 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type3_cut}}}{\tiny \par}
440:
441:
\caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake
rate. The ICD region has been cut off for types 1 and 3.}
444:
445: \label{cap:taufakerate_fit_types_noeta}
446: \end{figure}
447:
448:
As can be seen from Table \ref{b and tau (types)}, type 1 $\tau$
candidates contribute less than 1 event even before the $\eta$ cut. After the
cut their contribution is totally negligible, so it was decided to discard
these events from the $t\bar{t}$ cross section measurement. The final
2D parametrization of the $\tau$ fake rate ($F(\eta,P_{T})$) is
shown in Fig. \ref{cap:taufakerate_fit2D}. Table \ref{b and tau (types) after eta}
shows how the $\eta$ cut affects the number of selected
events.
457:
458: %
459: \begin{table}
460: \begin{tabular}{|c|c|}
461: \hline
462: {\tiny data}&
463: {\tiny taggingMC}\tabularnewline
464: \hline
465: \hline
466: {\tiny $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
467: {\tiny $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
468: \hline
469: {\tiny $\geq1$ SVT}&
470: {\tiny $TrigWeight\cdot bTagProb$}\tabularnewline
471: \hline
472: {\tiny $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}&
473: {\tiny $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline
474: \hline
475: \end{tabular}
476:
477: \begin{tabular}{|c|c|c|c|}
478: \hline
479: &
480: Type 1&
481: Type 2&
482: Type 3\tabularnewline
483: \hline
484: \hline
485: data&
486: 28&
487: 91&
488: 94\tabularnewline
489: \hline
490: $t\overline{t}\rightarrow\tau+jets$&
491: 0.73$\pm$0.05&
492: 5.61$\pm$0.37&
493: 3.12$\pm$0.20\tabularnewline
494: \hline
495: $W\rightarrow\tau\nu+jets$&
496: 0.094$\pm$0.005&
497: 0.93$\pm$0.04&
498: 0.39$\pm$0.02\tabularnewline
499: \hline
500: \end{tabular}
501:
502:
503: \caption{b-tagging and $\tau$ ID results per type. Shown are the \# of events
504: predicted in signal and observed in the data as well as the cuts applied.}
505:
506: \label{b and tau (types)}
507: \end{table}
508:
509:
510: %
511: \begin{table}
512: \begin{tabular}{|c|c|c|}
513: \hline
514: &
515: Type 2&
516: Type 3\tabularnewline
517: \hline
518: \hline
519: data&
520: 91&
521: 71\tabularnewline
522: \hline
523: $t\overline{t}\rightarrow\tau+jets$&
524: 5.61$\pm$0.37&
525: 2.81$\pm$0.18\tabularnewline
526: \hline
527: $W\rightarrow\tau\nu+jets$&
528: 0.93$\pm$0.04&
529: 0.32$\pm$0.01\tabularnewline
530: \hline
531: \end{tabular}
532:
533:
534: \caption{b-tagging and $\tau$ ID results per type after the $\eta$ cut.
535: Shown are the \# of events predicted in signal and observed in the
536: data as well as the cuts applied.}
537:
538: \label{b and tau (types) after eta}
539: \end{table}
540:
541:
542: %
543: \begin{figure}
544: \subfigure[Type 2 2D fit]{\includegraphics[scale=0.2]{plota_may18/type2_surf}}\subfigure[Type 3 2D Fit]{\includegraphics[scale=0.2]{plota_may18/type3_surf}}
545:
546:
547: \caption{The 2D combined fit (in $\eta$ and $P_{T}$) of the $\tau$ fake
548: rate}
549:
550: \label{cap:taufakerate_fit2D}
551: \end{figure}
552:
553:
554:
555: \subsubsection{Closure tests}
556:
The validity of fitting separately in $\eta$ and
$P_{T}$, ignoring possible correlations, had to be checked.
Fig. \ref{cap:Closure_test} demonstrates the closure test that was
used for this purpose. In the same {}``$b$ veto'' sample we applied
the resulting $F(\eta,P_{T})$ to each jet and compared the resulting
(predicted) $\tau$ distributions with the ones obtained from the actual
$\tau$ candidates (which of course are predominantly fakes here).
564:
However, one could imagine a pair of 2D distributions that agree
perfectly in both projections and yet are still very different. In
order to test against such a possibility we performed the same
cross-check as before, but required the jets to have $\eta$ between 0.5 and
1. For this $\eta$ {}``slice'' we applied $F(\eta,P_{T})$
and compared the actual $P_{T}$ distribution with the predicted one. Figure \ref{cap:Closure_test_2}
demonstrates that the agreement is still fairly good.
572:
573: %
574: \begin{figure}
575: \includegraphics[scale=0.2]{plota_may18/closure_eta_2}\includegraphics[scale=0.2]{plota_may18/closure_pt_2}
576:
577: \includegraphics[scale=0.2]{plota_may18/closure_eta_3}\includegraphics[scale=0.2]{plota_may18/closure_pt_3}
578:
579:
\caption{The closure test of the $\tau$ fake rate function. The red histograms
are for the actual $\tau$ candidates in the {}``veto'' sample.
The green ones are the prediction. The $\eta$ distributions show some
discrepancy related to the error of the fit.}
584:
585: \label{cap:Closure_test}
586: \end{figure}
587:
588:
589: %
590: \begin{figure}
591: \subfigure[Type 2]{\includegraphics[scale=0.45]{plots/pt_closure_type2}}\subfigure[Type 3]{\includegraphics[scale=0.45]{plots/pt_closure_type3}}
592:
593:
\caption{The closure test of the $\tau$ fake rate function. The red histograms
are for the actual $\tau$ candidates in the {}``veto'' sample.
The green ones are the prediction. The jets were selected with
$0.5<\eta<1$. An asymmetric range was chosen to avoid possible
bias.}
599:
600: \label{cap:Closure_test_2}
601: \end{figure}
602:
603:
604:
605: \subsubsection{Computing the QCD fraction}
606:
We assume that the probability for a jet to fake a $\tau$ is simply $F(\eta,P_{T})$.
Then the probability that at least one of the jets in the event will
fake a $\tau$ can be computed as follows:

\begin{center}$P_{event}=1-\prod_{j}(1-F(\eta^{j},P_{T}^{j}))$\par\end{center}
612:
Summing up such probabilities over the tagged data we obtain the QCD
background estimate.
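
To illustrate the formula, consider an event with only two jets and write $F_{j}\equiv F(\eta^{j},P_{T}^{j})$ as a shorthand introduced here. Expanding the product gives

\[
P_{event}=1-(1-F_{1})(1-F_{2})=F_{1}+F_{2}-F_{1}F_{2}\]

If the per-jet fake rates are small, the cross term is negligible and $P_{event}\approx\sum_{j}F_{j}$. The QCD estimate described above is then the sum over the selected events,

\[
N_{QCD}=\sum_{events}P_{event}\]

Note that this treats the jets as faking $\tau$ candidates independently of each other, which is implicit in the product form.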
615:
Using the results described in the previous section we get $N_{QCD}=71.13\pm1.56$
for $\tau$ type 2 and $N_{QCD}=77.46\pm0.80$ for $\tau$
type 3, which agrees with the observed data (Table \ref{b and tau (types) after eta})
fairly well. One can also see (see Appendix) that the predicted
distributions of the main topological variables (section \ref{sub:NN-variables})
are in fairly good agreement with what is observed in the data.
622:
623:
624: \subsection{\label{sub:NN-variables}Topological NN}
625:
For the signal training sample, 7481 preselected $t\overline{t}$ MC events
were used (not the same events as the 6141 in the selection sample). For
the background, the $\tau$ veto sample was used.
629:
630: Similarly to the alljet analysis \cite{alljet} we define 2 networks:
631:
632: \begin{enumerate}
\item The first contains 3 topological variables (aplanarity, sphericity and centrality) and
2 energy-based ones ($H_{T}$ and $\sqrt{S}$).
\item The second contains the output of the first, the W and top mass likelihood, the b-jet
$P_{T}$ and the b-jet decay length.
637: \end{enumerate}
638: These are the kinematic and topological variables used:
639:
640: \begin{itemize}
\item $H_{T}$ - the scalar sum of the $P_{T}$ of all jets (and the $\tau$).
\item Sphericity and aplanarity - these variables are formed from the eigenvalues
of the normalized momentum tensor of the jets in the event. They
are expected to be higher in top pair events than in a typical
QCD event.
\item Centrality, defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum
of the energies of the jets.
\item Top and W mass likelihood - a $\chi^{2}$-like variable, $L\equiv\left(\frac{M_{3j}-M_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$,
where $M_{t},M_{W},\sigma_{t},\sigma_{W}$ are the top and W masses (175
GeV and 80 GeV respectively) and resolution values (45 GeV and 10
GeV respectively \cite{alljet}). $M_{3j}$ and $M_{2j}$ are built
from the jet combinations chosen so as to minimize $L$.
653: \item $P_{T}$ and lifetime significance of the leading b-tagged jet.
654: \end{itemize}
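
For reference, the event shape variables in the list above can be written out. The note does not give the formulas, so the following uses the common convention and should be read as an assumption: the normalized momentum tensor is built from the jet momenta $\vec{p}_{j}$, and its eigenvalues are ordered as $\lambda_{1}\geq\lambda_{2}\geq\lambda_{3}$ with $\lambda_{1}+\lambda_{2}+\lambda_{3}=1$.

\[
M^{ab}=\frac{\sum_{j}p_{j}^{a}p_{j}^{b}}{\sum_{j}|\vec{p}_{j}|^{2}},\qquad S=\frac{3}{2}\left(\lambda_{2}+\lambda_{3}\right),\qquad A=\frac{3}{2}\lambda_{3}\]

With this convention $0\leq S\leq1$ and $0\leq A\leq\frac{1}{2}$. As a numerical illustration of the mass likelihood, take a hypothetical event (values chosen only for this example) with $M_{3j}=190$ GeV and $M_{2j}=85$ GeV:

\[
L=\left(\frac{190-175}{45}\right)^{2}+\left(\frac{85-80}{10}\right)^{2}\approx0.11+0.25=0.36\]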
Many of these variables (for instance the mass likelihood and aplanarity)
are only defined for events with 2 or more jets. So we now require
2 jets with $P_{T}>20$ GeV and $|\eta|<2.5$.
658:
The Appendix contains the plots of all these variables, which also serve as
an additional check of the agreement between the data and the prediction.
Two of these plots are shown in Fig. \ref{cap:The-nn0-input-small}.
As can be seen, the NN input variables show fairly good agreement
between data and MC, which gives us confidence that the NN will provide
sensible output using these variables.
665:
666: %
667: \begin{figure}
668: \includegraphics[scale=0.3]{analysis/CONTROLPLOTS/aplan_0_type2}\includegraphics[scale=0.3]{analysis/CONTROLPLOTS/ht_0_type2}
669:
670:
671: \caption{2 of the 5 input variables of the first topological NN before the
672: NN cut ($\tau$ type 2). The Kolmogorov-Smirnov (KS) probabilities
673: are shown, indicating how good the agreement is.}
674:
675: \label{cap:The-nn0-input-small}
676: \end{figure}
677:
678:
679:
680: \subsection{NN optimization}
681:
For training the NN we used the Multi Layer Perceptron (MLP) \cite{MLPfit},
as implemented in the ROOT framework. The input events were split
into 7466 training and 14932 test entries. At each of the 500 training
{}``epochs'' the MLP evaluates the fractional error for both signal and
background, showing how successful it has been in discriminating the
test events (Figure \ref{cap:NN-error}).
688:
689: %
690: \begin{figure}
691: \subfigure[The first NN]{\includegraphics[scale=0.4]{analysis/GOODNN_NOTAU/nn0training300}}\subfigure[The second NN]{\includegraphics[scale=0.4]{analysis/GOODNN_NOTAU/nn1training300}}
692:
693:
694: \caption{NN error. Red is test sample, blue is training sample}
695:
696: \label{cap:NN-error}
697: \end{figure}
698:
699:
The resulting NNs are shown in Fig. \ref{cap:NN0} and \ref{cap:NN1}.
There one can see the structure of the trained NN (blue interconnected
nodes) and the performance evaluation based on the training samples.
In the Appendix (Fig. \ref{cap:The-resulting-output_type2} and \ref{cap:The-resulting-output_type3})
we can see the final NN output in the main analysis data sample
(as well as in the signal and in the backgrounds).
706:
707: %
708: \begin{figure}
709: \subfigure[The first NN]{\includegraphics[scale=0.6]{analysis/GOODNN_NOTAU/nn0analysis300}}
710:
711:
712: \caption{NN0 structure. The upper left plots show the relative impact of the
713: variables on the NN output. The bottom left is distribution of NNout,
714: the bottom right - efficiencies. Red is signal, blue is background.}
715:
716: \label{cap:NN0}
717: \end{figure}
718:
719:
720: %
721: \begin{figure}
722: \includegraphics[scale=0.6]{analysis/GOODNN_NOTAU/nn1analysis300}
723:
724:
725: \caption{NN1 structure. The upper left plots show the relative impact of the
726: variables on the NN output. The bottom left is distribution of NNout,
727: the bottom right - efficiencies. Red is signal, blue is background.}
728:
729: \label{cap:NN1}
730: \end{figure}
731:
732:
The result of applying this NN to data is shown in Figure \ref{cap:Result-of-applying}.
At this point we had to determine which cut on the topological NN
output maximizes the signal significance. The signal significance is
defined as $\frac{S}{\sqrt{S+B}}$, where $S$ and $B$ are the numbers of signal and background events,
and is shown in Figure \ref{signal-signifficance}. It
reaches its maximum at $NN1>0.9$ for both type 2 and type 3. Therefore this is the
cut we used for the cross section measurement. The results of this
measurement are summarized in Table \ref{cap:RESULTS}.
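
As an illustration, the significance can be evaluated with the expected yields listed in Table \ref{cap:RESULTS}. This is only a sketch: it takes $s$ and $s+b$ at face value from that table and ignores their uncertainties.

\[
\tau\mbox{ type 2: }\frac{s}{\sqrt{s+b}}=\frac{3.83}{\sqrt{6.84}}\approx1.46,\qquad\tau\mbox{ type 3: }\frac{s}{\sqrt{s+b}}=\frac{1.80}{\sqrt{4.39}}\approx0.86\]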
741:
742: %
743: \begin{figure}
744: \subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NNresult_tau2}}\subfigure[Type 2 (zoomed)]{\includegraphics[scale=0.3]{plots/NNresult_zoomed_tau2}}
745:
746: \subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NNresult_tau3}}\subfigure[Type 3 (zoomed)]{\includegraphics[scale=0.3]{plots/NNresult_zoomed_tau3}}
747:
748:
\caption{Result of applying the NN cut. $t\bar{t}$, $W$ and QCD are plotted
incrementally in order to compare with the \# of events observed in data.
Error bars include only statistical errors. $\sigma(t\bar{t})=5.54$
pb is assumed. The right plots show only the entries with high NN output.}
754:
755: \label{cap:Result-of-applying}
756: \end{figure}
757:
758:
759: %
760: \begin{figure}
761: \subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NNresult_signiff_tau2}}\subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NNresult_signiff_tau3}}
762:
763:
764: \caption{$t\bar{t}\rightarrow\tau+jets$ signal significance}
765:
766: \label{signal-signifficance}
767: \end{figure}
768:
769:
770: %
771: \begin{table}
772: \begin{centering}\begin{tabular}{|c|c|c|c|c|c|c|c|c|}
773: \hline
774: Channel &
775: $N^{obs}$ &
776: ${\mathcal{B}}$ &
777: $\int{\mathcal{L}}dt$ &
\multicolumn{2}{c|}{Backgrounds}&
779: $\varepsilon(t\bar{t})$ (\%) &
780: $s$ (7 pb) &
781: s+b \tabularnewline
782: \hline
783: $\tau$+jets type 2 &
784: 5 &
785: 0.1 &
786: 349.3 &
787: $W\rightarrow\tau\nu$ &
788: 0.60$\pm$0.03&
789: 1.57$\pm$0.01 &
790: 3.83$_{-0.51}^{+0.46}$ &
791: 6.84$_{-0.51}^{+0.46}$ \tabularnewline
792: &
793: &
794: &
795: &
796: fakes &
797: 2.41$\pm$0.09 &
798: &
799: &
800: \tabularnewline
801: \hline
802: $\tau$+jets type 3 &
803: 5 &
804: 0.1 &
805: 349.3 &
806: $W\rightarrow\tau\nu$ &
807: 0.27$\pm$0.01&
808: 0.73$\pm$0.01 &
809: 1.80$_{-0.23}^{+0.22}$ &
810: 4.39$_{-0.23}^{+0.22}$ \tabularnewline
811: &
812: &
813: &
814: &
815: fakes &
816: 2.33$\pm$0.09 &
817: &
818: &
819: \tabularnewline
820: \hline
821: \end{tabular}\par\end{centering}
822:
823:
\caption{The final result summary after the $NN>0.9$ cut. $\varepsilon(t\bar{t})$
is the total signal acceptance.}
826:
827: \label{cap:RESULTS}
828: \end{table}
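
The expected signal column of Table \ref{cap:RESULTS} appears to be consistent with a simple product of the other columns. As a sketch, assuming $s=\sigma\cdot\int\mathcal{L}dt\cdot\mathcal{B}\cdot\varepsilon$ with $\sigma=7$ pb and the luminosity in $pb^{-1}$:

\[
\tau\mbox{ type 2: }7\cdot349.3\cdot0.1\cdot0.0157\approx3.84,\qquad\tau\mbox{ type 3: }7\cdot349.3\cdot0.1\cdot0.0073\approx1.78\]

These agree with the tabulated 3.83 and 1.80 within the rounding of $\varepsilon(t\bar{t})$.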
829:
830:
831:
832: \section{Systematic uncertainties}
833:
The most important systematic effects (except for b-tagging,
which is treated later) are summarized in Table \ref{cap:Syst}.
836:
837: %
838: \begin{table}
839: {\footnotesize }\begin{tabular}{|c||c|c|}
840: \hline
841: Channel&
842: {\footnotesize $\tau$+jets type 2 }&
843: {\footnotesize $\tau$+jets type 3 }\tabularnewline
844: \hline
845: \hline
846: {\footnotesize Jet Energy Scale }&
847: {\footnotesize $_{-0.27}^{+0.30}$ }&
848: {\footnotesize $_{-0.69}^{+0.53}$ }\tabularnewline
849: \hline
850: {\footnotesize Primary Vertex }&
851: {\footnotesize $_{+0.037}^{-0.036}$ }&
852: {\footnotesize $_{+0.095}^{-0.093}$ }\tabularnewline
853: \hline
854: {\footnotesize MC stat }&
855: {\tiny $_{+0.25}^{-0.22}$ }&
856: {\tiny $_{+0.65}^{-0.58}$ }\tabularnewline
857: \hline
858: {\footnotesize Trigger }&
859: {\footnotesize $_{-0.020}^{+0.0025}$ }&
860: {\footnotesize $_{-0.069}^{+0.0056}$ }\tabularnewline
861: \hline
862: {\footnotesize Branching ratio }&
863: {\footnotesize $_{+0.074}^{-0.071}$ }&
864: {\footnotesize $_{+0.19}^{-0.18}$ }\tabularnewline
865: \hline
866: {\footnotesize QCD fake rate parametrization }&
867: {\footnotesize $_{+0.17}^{-0.17}$ }&
868: {\footnotesize $_{+0.34}^{-0.34}$ }\tabularnewline
869: \hline
870: $W\rightarrow\tau\nu$&
871: {\footnotesize $_{+0.19}^{-0.19}$ }&
872: {\footnotesize $_{+0.19}^{-0.19}$ }\tabularnewline
873: \hline
874: \end{tabular}{\footnotesize \par}
875:
876:
877: \caption{Systematic uncertainties on $\sigma(t\bar{t})$ (in pb).}
878:
879: \label{cap:Syst}
880: \end{table}
881:
882:
883:
884: \subsection{JES}
885:
886: The energy scale corrections applied to data and MC have uncertainties
887: associated with them. These uncertainties result in a systematic shift
888: in the measured cross section. To compute this systematic, the JES
889: corrections in MC were shifted up (or down) by $\delta JES^{data}=\sqrt{(\delta_{syst}^{data})^{2}+(\delta_{stat}^{data})^{2}+(\delta_{syst}^{MC})^{2}+(\delta_{stat}^{MC})^{2}}$.
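
As an illustration only, and assuming that the systematic is taken as
the difference between the shifted and the nominal result (the symbols
$\sigma_{nom}$ and $\Delta\sigma_{\pm}^{JES}$ are introduced here and
are not defined elsewhere in this note), the procedure above can be
sketched as
\[
\Delta\sigma_{\pm}^{JES}=\sigma\left(JES^{MC}\pm\delta JES^{data}\right)-\sigma_{nom},
\]
where $\sigma_{nom}$ is the cross section obtained with the unshifted
corrections. Under this assumption the upward and downward shifts need
not be equal in magnitude, which would be consistent with the asymmetric
JES entries in Table \ref{cap:Syst}.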
890:
891:
892: \subsection{Primary Vertex and Branching Ratio}
893:
894: The primary vertex (PV) and the $t\bar{t}$ and W branching fractions were assigned
895: uncertainties of 1\% and 2\%, respectively, the same as in \cite{alljet}.
896:
897:
898: \subsection{Luminosity}
899:
900: The total integrated luminosity of the data used in this analysis
901: is $349\pm23$ $pb^{-1}$. This error propagates to the uncertainty quoted in Table
902: \ref{cap:Syst}.
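
Since the cross section is inversely proportional to the integrated
luminosity, a first-order propagation gives the size of this effect.
In this sketch $\mathcal{L}$ denotes the integrated luminosity, a symbol
introduced here for illustration:
\[
\frac{\delta\sigma_{lumi}}{\sigma}\approx\frac{\delta\mathcal{L}}{\mathcal{L}}=\frac{23}{349}\approx6.6\%.
\]
For example, for a cross section of 3.63 pb this corresponds to
$3.63\times0.066\approx0.24$ pb.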
903:
904:
905: \subsection{Trigger}
906:
907: The systematic uncertainty of the trigger parametrization is computed by top\_trigger
908: \cite{top_trigger}.
909:
910:
911: \subsection{B-tagging}
912:
913: B-tagging uncertainty effects are taken into account by varying the
914: systematic and statistical errors on the MC tagging weights.
915:
916: These errors arise from several independent sources:
917:
918: \begin{itemize}
919: \item B-jet tagging parametrization.
920: \item C-jet tagging parametrization.
921: \item Light jet tagging parametrization (negative tag rate). Derived by
922: varying the parametrization by $\pm1\sigma$ and adding in quadrature
923: an 8\% relative uncertainty from the variation of the negative tag rate
924: measured in different samples.
925: \item Systematic uncertainties on the scale factors $SF_{hf}$ and $SF_{ll}$
926: are derived from the statistical error due to finite MC statistics.
927: \item Semi-leptonic b-tagging efficiency parametrization in MC and in data
928: (System 8).
929: \item Taggability. This includes the statistical error due to the finite statistics
930: of the samples from which it was derived and a systematic error reflecting
931: the (neglected) dependence of the taggability on the jet multiplicity.
932: \end{itemize}
933: The resulting effect of all of these error sources on the final number
934: is summarized in Table \ref{cap:b-tagging-systematics-sources}
935: along with the total b-ID systematic error (quoted in Table \ref{cap:Syst}).
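
The quadrature sum used for the light jet tagging uncertainty can be
written explicitly. In this sketch $\varepsilon_{l}$ denotes the light
jet tag rate and $\delta_{param}$ the shift obtained from the $\pm1\sigma$
variation of the parametrization; both symbols are introduced here for
illustration only:
\[
\frac{\delta\varepsilon_{l}}{\varepsilon_{l}}=\sqrt{\left(\frac{\delta_{param}}{\varepsilon_{l}}\right)^{2}+(0.08)^{2}}.
\]
With a purely hypothetical parametrization shift of 6\%, this would give
$\sqrt{0.06^{2}+0.08^{2}}=0.10$, i.e. a 10\% relative uncertainty.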
936:
937: %
938: \begin{table}
939: \begin{tabular}{|c|c|c|}
940: \hline
941: Channel&
942: {\footnotesize $\tau$+jets type 2 }&
943: {\footnotesize $\tau$+jets type 3 }\tabularnewline
944: \hline
945: \hline
946: b-tagging&
947: {\tiny $_{-0.13}^{+0.076}$ }&
948: {\tiny $_{-0.26}^{+0.41}$ }\tabularnewline
949: \hline
950: c-tagging&
951: {\tiny $_{-0.20}^{+0.16}$ }&
952: {\tiny $_{-0.48}^{+0.60}$ }\tabularnewline
953: \hline
954: l-tagging&
955: {\tiny $_{-0.0051}^{+0.0051}$ }&
956: {\tiny $_{-0.014}^{+0.014}$ }\tabularnewline
957: \hline
958: $SF_{hf}$&
959: {\tiny $_{-0.00036}^{+0.00036}$ }&
960: {\tiny $_{-0.00094}^{+0.00094}$ }\tabularnewline
961: \hline
962: $SF_{ll}$&
963: {\tiny $_{-0.00036}^{+0.00036}$ }&
964: {\tiny $_{-0.00094}^{+0.00094}$ }\tabularnewline
965: \hline
966: $\mu$ b-tagging (data)&
967: {\tiny $_{-0.091}^{+0.094}$ }&
968: {\tiny $_{-0.24}^{+0.25}$ }\tabularnewline
969: \hline
970: $\mu$ b-tagging (MC)&
971: {\tiny $_{+0.11}^{-0.10}$ }&
972: {\tiny $_{+0.28}^{-0.25}$ }\tabularnewline
973: \hline
974: taggability&
975: {\tiny $_{-0.048}^{+0.049}$ }&
976: {\tiny $_{-0.13}^{+0.13}$ }\tabularnewline
977: \hline
978: \end{tabular}
979:
980:
981: \caption{Sources of b-tagging systematic uncertainty.}
982:
983: \label{cap:b-tagging-systematics-sources}
984: \end{table}
985:
986:
987:
988: \subsection{Fake rate}
989:
990: The systematic uncertainty associated with the $\tau$ fake rate
991: is taken to be the statistical error of the fit described in section \ref{sub:Fit}.
992:
993:
994: \subsection{W background prediction}
995:
996: The method used to describe the $W\rightarrow\tau\nu$ background
997: is not perfect. There are two potential sources of error:
998:
999: \begin{itemize}
1000: \item Only W+4 partons MC was used. It is, however, expected that W+2
1001: and W+3 would make some (albeit smaller) contribution. In order to properly
1002: take this into account one would need to combine all jet multiplicity
1003: samples. This leads to a slight underestimation of the result.
1004: \item The {}``$b$ veto'' sample may contain some W contribution from
1005: Wjjjj events. This leads to double-counting of these events and hence
1006: an overestimation of the result.
1007: \end{itemize}
1008: A conservative estimate of 50\% uncertainty on the number of W events
1009: in the final sample was applied. That is, we varied this number
1010: up and down by 50\% and observed the effect on the cross section (as
1011: quoted in Table \ref{cap:Syst}).
1012:
1013:
1014: \section{Cross section}
1015:
1016: The cross section is defined as $\sigma=\frac{\mathrm{Number\ of\ signal\ events}}{\varepsilon(t\bar{t})\cdot BR(t\bar{t})\cdot\mathrm{Luminosity}}$.
1017: The results were the following:
1018:
1019: \begin{center}$\tau$+jets type 2 cross section: \[
1020: 3.63\;\;_{-3.50}^{+4.72}\;\;\mathrm{(stat)}\;\;_{-0.48}^{+0.49}\;\;\mathrm{(syst)}\;\;\pm0.24\;\;\mathrm{(lumi)}\;\;\mathrm{pb}\]
1021: \par\end{center}
1022:
1023: \begin{center}$\tau$+jets type 3 cross section: \[
1024: 9.39\;\;_{-7.49}^{+10.10}\;\;\mathrm{(stat)}\;\;_{-1.18}^{+1.25}\;\;\mathrm{(syst)}\;\;\pm0.61\;\;\mathrm{(lumi)}\;\;\mathrm{pb}\]
1025: \par\end{center}
1026:
1027: The combined cross section was estimated by minimizing the sum of
1028: the negative log-likelihood functions for each channel. The functional
1029: form of the likelihood function was the same as that used for
1030: the $e\mu$ channel \cite{emu}. The combined cross section is
1031:
1032: \begin{center}\[
1033: 5.05\;\;_{-3.46}^{+4.31}\;\;\mathrm{(stat)}\;\;_{-0.67}^{+0.68}\;\;\mathrm{(syst)}\;\;\pm0.33\;\;\mathrm{(lumi)}\;\;\mathrm{pb}\]
1034: \par\end{center}
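
This note does not spell out the likelihood used in the combination;
its form is given in \cite{emu}. Purely as an illustration, and assuming
a Poisson likelihood in each channel (an assumption, not taken from this
note), the minimized quantity would have the form
\[
-\ln L(\sigma)=\sum_{i}\left[\mu_{i}(\sigma)-N_{i}^{obs}\ln\mu_{i}(\sigma)\right]+\mathrm{const},\qquad\mu_{i}(\sigma)=\sigma\cdot\varepsilon_{i}(t\bar{t})\cdot BR(t\bar{t})\cdot\mathcal{L}+N_{i}^{bkg},
\]
where the sum runs over the type 2 and type 3 channels. The symbols
$N_{i}^{obs}$ (observed events), $N_{i}^{bkg}$ (expected background),
$\mu_{i}$ (expected total yield) and $\mathcal{L}$ (integrated luminosity)
are introduced here for illustration. The expression for $\mu_{i}$ inverts
the cross section definition given above, under the further assumption that
the number of signal events is the observed yield minus the expected background.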