\section{Analysis} \subsection{Outline} The analysis procedure involved several stages: \begin{itemize} \item Preselection (section \ref{sub:Preselection}). At least 4 jets and $\not\!\! E_{T}$ significance > 3 are required. 653727 events are selected in the data, with 109.93$\pm$7.26 expected $t\bar{t}$ among them. S:B = 1:6000. \item ID cuts (section \ref{sub:Results-of-the}). At least one good $\tau$ candidate and at least one tight SVT tag are required. We also required $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20$ GeV. 216 events are selected in the data, with 9.320$\pm$0.620 expected $t\bar{t}$ among them. S:B = 1:58. \item Topological NN (section \ref{sub:NN-variables}). A sequence of two feed-forward NNs was trained and applied. The optimal cut on the second NN was found to be 0.6. With this final cut we obtained 13 events in the data, with 4.93$\pm$0.33 expected $t\bar{t}$ among them. S:B = 1:2.5. \end{itemize} The W background was modeled using the ALPGEN Monte Carlo simulation, while the QCD background was extracted from the data using the procedure described in section \ref{sub:QCD-modeling}. \subsection{\label{sub:Preselection}Preselection} The total number of events in this 351 $pb^{-1}$ data skim is 17 million. This is a very large and rather unwieldy dataset. Hence, the main goal of preselection was to reduce this dataset by imposing the most obvious and straightforward requirements characterizing our signal signature. Such characteristic features include the following: \begin{itemize} \item Moderate $\not\!\! E_{T}$ arising from both the W vertex and the $\tau$ decay. \item At least 4 jets have to be present. \item A $\tau$ lepton and 2 b-jets are present. \end{itemize} Since both $\tau$ ID and b-tagging involve complex algorithms which are likely to be signal-sensitive and may require extensive {}``tuning'', we chose not to use them at the preselection stage. 
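The jet-counting and $\not\!\! E_{T}$-significance requirements listed in the outline amount to a simple event filter. The following is a minimal sketch of that logic; the event representation (a plain dict with keys \texttt{jet\_pts} and \texttt{met\_significance}) is an illustrative assumption, not the actual analysis framework.

```python
# A minimal sketch of the preselection filter described above: at least
# 4 jets (pT > 8 GeV, per the criteria below) and missing-ET
# significance > 3. The dict-based event representation is hypothetical,
# not the real analysis framework.

def passes_preselection(event):
    n_jets = sum(1 for pt in event["jet_pts"] if pt > 8.0)
    return n_jets >= 4 and event["met_significance"] > 3.0
```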
Similarly, we chose not to impose any jet $P_{T}$ cuts, since such cuts strongly depend on the JES corrections and associated errors and hence are better applied at a later stage. The first 3 preselection criteria were chosen to be similar to the $t\overline{t}\rightarrow jets$ analysis \cite{alljet}: \begin{itemize} \item Primary Vertex is reconstructed, is within the central tracker volume (60 cm in Z from the detector center), and has at least 3 tracks associated with it. \item Veto on isolated electrons and muons to avoid overlap with the $t\overline{t}\rightarrow lepton+jets$ cross section analysis. \item $N_{jets}\geq4$ with $P_{T}>8\, GeV$. \end{itemize} At this point, the $t\overline{t}\rightarrow e+jets$ and $t\overline{t}\rightarrow\mu+jets$ analyses \cite{l+jets} apply cuts on $\Delta\phi$ between the lepton and $\not\!\! E_{T}$ as well as so-called {}``triangular'' cuts in the $\Delta\phi$ - $\not\!\! E_{T}$ plane. The goal is to eliminate events with fake $\not\!\! E_{T}$. The neutrino and lepton coming from the W are expected to fly in opposite directions most of the time. However, as can be seen in Figure \ref{cap:dphi}, no such simple cuts are obvious in the case of the $\tau$. That is to be expected, since the $\tau$ itself emits a neutrino in its decay, contributing to $\not\!\! E_{T}$. Instead, a new variable is proposed to cut off the fake $\not\!\! E_{T}$ events and reduce the sample size. % \begin{figure} \includegraphics[scale=0.4]{analysis/plots/dphitaumet} \caption{$\Delta\phi$ between $\tau$ and $\not\!\! E_{T}$ for QCD (black) and $t\bar{t}\rightarrow\tau+jets$ (red).} \label{cap:dphi} \end{figure} % \begin{figure} \includegraphics[scale=0.4]{analysis/plots/metl} \caption{$\not\!\! E_{T}$ significance for QCD and $t\bar{t}\rightarrow\tau+jets$.} \label{cap:metl} \end{figure} $\not\!\! E_{T}$ significance \cite{metl} is defined as a measure of the likelihood of $\not\!\! E_{T}$ arising from physical sources rather than from fluctuations in the detector measurements. As can be seen in Fig. \ref{cap:metl}, it proves to be an effective way to reduce the data skim. A cut of 3 was used for preselection. We then need to scale the original 10K events of the MC sample to 349 $pb^{-1}$. The total $t\bar{t}$ cross section is 6.8 pb \cite{NNLO}. Taking into account the branching fraction to the hadronic $\tau+jets$ mode, the effective cross section comes out to be: $B(\tau\rightarrow hadrons)\cdot B(t\bar{t}\rightarrow\tau+jets)\cdot\sigma(t\bar{t})=0.65\cdot0.15\cdot6.8=0.66$ pb. Throughout this work, however, we used a $\sigma(t\bar{t})$ value of 5.5 pb, as computed by the ALPGEN simulation taking into account the generation cuts. The effective cross section used for scaling is then 0.53 pb. Since this value is only used for reference and for the optimization of S:B, it is of no importance which number is used. The relative flavor fractions of the $W+4jets$ process were taken from the ALPGEN simulation as ratios of the simulated cross sections. They were then normalized to the measured total value of 4.5 $\pm$ 2.2 pb \cite{W+4j}. Table \ref{presel} shows the results of the preselection for both data and the backgrounds. % \begin{table} \begin{tabular}{|c|c|c|c|} \hline & \# passed& ALPGEN $\sigma$, pb& \# passed scaled\tabularnewline \hline \hline data& 653727/17M& & 653727\tabularnewline \hline $t\overline{t}\rightarrow\tau+jets$& 6141/10878& 0.821 $\pm$ 0.004& 109.93 $\pm$7.26\tabularnewline \hline $Wbbjj\rightarrow$ $\tau\nu+bbjj$& 2321/11576& 0.222 $\pm$ 0.044& 9.98 $\pm$ 2.08\tabularnewline \hline $Wccjj\rightarrow$ $\tau\nu+ccjj$& 2289/10995& 0.527 $\pm$ 0.059& 24.77 $\pm$ 3.22\tabularnewline \hline $Wcjjj\rightarrow$ $\tau\nu+cjjj$& 2169/10435& 0.920 $\pm$0.087 & 42.23 $\pm$ 4.87\tabularnewline \hline $Wjjjj\rightarrow$ $\tau\nu+jjjj$& 2683/11920& 14.14 $\pm$ 1.3& 720.33 $\pm$ 81.48 \tabularnewline \hline \end{tabular} \caption{Preselection results. 
Shown are the total acceptances (including preselection) and the \# of events scaled to 349 $\pm$23 $pb^{-1}$ (no systematic uncertainties except for this luminosity error are included). The ALPGEN samples' generation cuts are described in \cite{l+jets}.} \label{presel} \end{table} \subsection{\label{sub:Results-of-the}Results of the ID cuts} The next step was to apply the $\tau$ ID and b-tagging requirements. Table \ref{cap:btaggingandtau} shows the selection criteria that we apply to data and MC and the resulting selection efficiencies. The results of this procedure can be observed in Table \ref{b and tau}. It can be noted that S:B at this stage is 1:58, which is far too low. In section \ref{sub:NN-variables} we will describe the topological NN used to enhance the signal content. % \begin{table} \begin{tabular}{|c|c|c|} \hline & {\scriptsize data}& {\scriptsize taggingMC}\tabularnewline \hline \hline & {\scriptsize $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}& {\scriptsize $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline \hline & {\scriptsize $\geq1$ SVT}& {\scriptsize $TrigWeight\cdot bTagProb$}\tabularnewline \hline & {\scriptsize $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}& {\scriptsize $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline \hline \end{tabular} \caption{b-tagging and $\tau$ ID. In the MC we use the certified b-tagging parametrization rather than actual b-tagging; that is, we applied the b-tagging weight ($bTagProb$). 
We also used the triggering weight as computed by top\_trigger.} \label{cap:btaggingandtau} \end{table} % \begin{table} \begin{tabular}{|c|c|c|c|} \hline & {\small \# passed}& {\small Acceptance}& {\small \# passed scaled}\tabularnewline \hline \hline {\small data}& {\small 216/653727}& & {\small 216}\tabularnewline \hline {\small $t\overline{t}\rightarrow\tau+jets$}& {\small 524.0/6141}& {\small 0.0480$\pm$0.0020}& {\small 9.320$\pm$0.620}\tabularnewline \hline {\small $Wbbjj\rightarrow$ $\tau\nu+bbjj$}& {\small 54.5/2321}& {\small 0.0150$\pm$0.0024}& {\small 0.012$\pm$0.002}\tabularnewline \hline {\small $Wccjj\rightarrow$ $\tau\nu+ccjj$}& {\small 13.3/2289}& {\small 0.0039$\pm$0.0012}& {\small 0.034$\pm$0.005}\tabularnewline \hline {\small $Wcjjj\rightarrow$ $\tau\nu+cjjj$}& {\small 8.0/2169}& {\small 0.0025$\pm$0.0010}& {\small 0.160$\pm$0.020}\tabularnewline \hline {\small $Wjjjj\rightarrow$ $\tau\nu+jjjj$}& {\small 3.3/2683}& {\small 0.0009$\pm$0.0006}& {\small 0.860$\pm$0.100}\tabularnewline \hline \end{tabular} \caption{b-tagging and $\tau$ ID results. Shown are the total acceptances (including preselection) and the \# of events scaled to the luminosity.} \label{b and tau} \end{table} For the purposes of this analysis we define 3 subsamples of the original preselected data sample: \begin{itemize} \item The {}``signal'' sample - require at least 1 $\tau$ with $NN>0.95$ and at least one SVT tag (as in Table \ref{cap:btaggingandtau}). This is the main sample used for the measurement - 268 events. \item The {}``$\tau$ veto'' sample - same selection, but instead of $NN_{\tau}>0.95$, taus with $0<NN_{\tau}<0.8$ were allowed. This sample is used for the topological NN training - 21022 events. \item The {}``$b$ veto'' sample - at least 1 $\tau$ with $NN>0.95$, but NO SVT tags. 
This sample is to be used for the QCD prediction - 4642 events. \end{itemize} \subsection{\label{sub:QCD-modeling}QCD modeling} The difference between the total number of $t\bar{t}$ and $W$ events and the data has to be attributed to QCD events, where the $\tau$ candidate is a jet mistakenly identified as a $\tau$. In order to estimate this background contribution, the following strategy was employed. \subsubsection{Parametrization} In this section our definition of the $\tau$ fake rate is different from the one in Figure \ref{tauID_Fake_Eff}. There, the goal was to determine the total number of fake $\tau$ candidates per event in the ALLJET data skim. Now our goal is to estimate the number of events that would pass all our signal selection criteria, yet contain no physical $\tau$ leptons but only fakes. In other words, we are modeling the QCD contribution to our final $t\bar{t}$ candidate event selection. We started with the {}``$b$ veto'' sample. It can be considered a predominantly QCD data sample: almost all $\tau$ candidates in it have to be fake. Figure \ref{cap:taufaketaus} shows the distribution of these candidates in $P_{T}$ and $|\eta|$. On the other hand, Fig. \ref{cap:taufakejets} displays the jets found in the same events. % \begin{figure} \includegraphics[scale=0.5]{plots/jet_trf} \caption{Jets in the QCD sample} \label{cap:taufakejets} \end{figure} % \begin{figure} \includegraphics[scale=0.4]{plots/tau_trf} \caption{$\tau$ candidates in the QCD sample} \label{cap:taufaketaus} \end{figure} Since the $\tau$ candidates here are really jets, we can simply divide one histogram by the other bin by bin to parametrize the $\tau$ fake rate. Figure \ref{cap:taufakerate} demonstrates this parametrization. % \begin{figure} \includegraphics[scale=0.5]{plots/tauTRF} \caption{$\tau$ fake rate parametrization} \label{cap:taufakerate} \end{figure} The large isolated spikes are caused by the limited statistics available in these bins. 
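The bin-by-bin division just described can be sketched with NumPy 2D histograms; the input arrays and the binning below are illustrative assumptions, not the values used in the analysis.

```python
import numpy as np

# Sketch of the bin-by-bin fake-rate parametrization: histogram the tau
# candidates and the jets of the "b veto" sample in (|eta|, pT) and
# divide one by the other. Inputs and binning are illustrative only.

def fake_rate_hist(tau_eta, tau_pt, jet_eta, jet_pt, eta_bins, pt_bins):
    num, _, _ = np.histogram2d(np.abs(tau_eta), tau_pt, bins=(eta_bins, pt_bins))
    den, _, _ = np.histogram2d(np.abs(jet_eta), jet_pt, bins=(eta_bins, pt_bins))
    # guard against empty denominator bins (the "spikes" discussed above)
    return np.where(den > 0, num / np.maximum(den, 1), 0.0)
```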
In order to reduce this effect and minimize the statistical uncertainty, we performed a 2D fit to this distribution. This fit is then to be used for the QCD prediction. \subsubsection{Fit\label{sub:Fit}} The fit was performed separately in the $\eta$ and $P_{T}$ projections; that is, we assumed that the 2D parametrization can simply be factorized into two components: \[ F(\eta,P_{T})\equiv A(\eta)\cdot B(P_{T})\] The $\eta$ distributions (as we have observed in section \ref{sub:Signal-characteristics}) are symmetric around 0, hence we can perform the fit in the absolute value $|\eta|$. The fitting function was the following: \[ A(\eta)\equiv a_{1}+a_{2}\cdot\eta^{2}+a_{3}\cdot\eta^{3}+a_{4}\cdot\eta^{4}+...+a_{7}\cdot\eta^{7}\] At $\eta=0$, $a_{1}=0$ was set to avoid a singularity. The fitting function for $P_{T}$ was picked so that it would describe the data well and flatten out at high $P_{T}$ (that is, we want $\lim_{P_{T}\rightarrow\infty}B\left(P_{T}\right)\rightarrow const$): \[ B(P_{T})\equiv b_{1}\cdot\exp\left(\frac{P_{T}}{\left(P_{T}+b_{3}\right)^{2}}\right)+b_{2}\cdot\left(\frac{P_{T}}{P_{T}+b_{3}}\right)\] The distributions in $\eta$ and $P_{T}$ were fitted separately with $A(\eta)$ and $B(P_{T})$. The result of this procedure can be seen in Fig. \ref{cap:taufakerate_fit}. As can be observed, the fit in $\eta$ fails around $\eta=1$. This is the ICD region, which is expected to have a different effect on different $\tau$ types. In order to account for this effect, we performed the fit for each type separately. The result can be seen in Fig. \ref{cap:taufakerate_fit_types}. As can be seen, the effect of the ICD region is largest for type 1 and is minor for type 2. At the same time, the $\eta$ distribution in the signal (Fig. \ref{cap:reco tau}) is fairly uniform. 
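The factorized fit can be sketched with \texttt{scipy.optimize.curve\_fit} standing in for the original fitting machinery; the sample points and parameter values below are synthetic and purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Sketch of the factorized fit F(eta, pT) = A(eta) * B(pT) described
# above. curve_fit is a stand-in for the original fitting machinery;
# the sample points and "true" parameters below are synthetic.

def A(eta, a1, a2, a3, a4, a5, a6, a7):
    # polynomial in |eta| with no linear term, as in the text
    return a1 + sum(a * eta**k for a, k in
                    zip((a2, a3, a4, a5, a6, a7), range(2, 8)))

def B(pt, b1, b2, b3):
    # approaches a constant as pT -> infinity
    return b1 * np.exp(pt / (pt + b3) ** 2) + b2 * pt / (pt + b3)

eta = np.linspace(0.0, 2.4, 30)
pt = np.linspace(20.0, 120.0, 30)
a_fit, _ = curve_fit(A, eta, A(eta, 0.02, 0.005, 0, 0, 0, 0, 0))
b_fit, _ = curve_fit(B, pt, B(pt, 0.5, 1.0, 30.0), p0=(1.0, 1.0, 20.0))
```

In the analysis the two projections are fitted to the fake-rate histograms themselves; here each function is simply fitted to points generated from itself, to keep the sketch self-contained.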
Hence we imposed the following cuts to remove these ICD fakes: \begin{itemize} \item For type 1: the $0.8<|\eta|<1.3$ region is cut off. \item For type 3: the $0.85<|\eta|<1.1$ region is cut off. \end{itemize} With these cuts, the fits are much improved (Fig. \ref{cap:taufakerate_fit_types_noeta}). % \begin{figure} \includegraphics[scale=0.4]{plota_may18/fit_alltypes} \caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake rate.} \label{cap:taufakerate_fit} \end{figure} % \begin{figure} {\tiny \subfigure[Type 1 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type1}}\subfigure[Type 2 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type2}}}{\tiny \par} {\tiny \subfigure[Type 3 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type3}}}{\tiny \par} \caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake rate.} \label{cap:taufakerate_fit_types} \end{figure} % \begin{figure} {\tiny \subfigure[Type 1 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type1_cut}}\subfigure[Type 2 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type2}}}{\tiny \par} {\tiny \subfigure[Type 3 fit]{\includegraphics[scale=0.2]{plota_may18/fit_type3_cut}}}{\tiny \par} \caption{Fit of the $\eta$ and $P_{T}$ distributions of the $\tau$ fake rate. The ICD region has been cut off for types 1 and 3.} \label{cap:taufakerate_fit_types_noeta} \end{figure} As can be seen from Table \ref{b and tau (types)}, the type 1 $\tau$s contribute less than 1 event even before the $\eta$ cut. After the cut their contribution is totally negligible, so it was decided to discard these events from the $t\bar{t}$ cross section measurement. The final 2D parametrization of the $\tau$ fake rate ($F(\eta,P_{T})$) is shown in Fig. \ref{cap:taufakerate_fit2D}. In Table \ref{b and tau (types) after eta} we can observe how the $\eta$ cut affects the number of selected events. 
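The ICD exclusion windows quoted above can be expressed as a small per-type veto; the representation (a type number plus an $\eta$ value) is an illustrative assumption.

```python
# Sketch of the per-type ICD exclusion windows quoted above. The
# representation (type number plus eta value) is illustrative only.

ICD_VETO = {1: (0.8, 1.3), 3: (0.85, 1.1)}  # |eta| windows removed

def passes_icd_cut(tau_type, eta):
    window = ICD_VETO.get(tau_type)
    if window is None:
        return True  # type 2: no window is removed
    lo, hi = window
    return not (lo < abs(eta) < hi)
```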
% \begin{table} \begin{tabular}{|c|c|} \hline {\tiny data}& {\tiny taggingMC}\tabularnewline \hline \hline {\tiny $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}& {\tiny $\geq1$ $\tau$ with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline \hline {\tiny $\geq1$ SVT}& {\tiny $TrigWeight\cdot bTagProb$}\tabularnewline \hline {\tiny $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}& {\tiny $\geq2$ jets with $|\eta|<2.4$ and $P_{T}>20\, GeV$}\tabularnewline \hline \end{tabular} \begin{tabular}{|c|c|c|c|} \hline & Type 1& Type 2& Type 3\tabularnewline \hline \hline data& 28& 91& 94\tabularnewline \hline $t\overline{t}\rightarrow\tau+jets$& 0.73$\pm$0.05& 5.61$\pm$0.37& 3.12$\pm$0.20\tabularnewline \hline $W\rightarrow\tau\nu+jets$& 0.094$\pm$0.005& 0.93$\pm$0.04& 0.39$\pm$0.02\tabularnewline \hline \end{tabular} \caption{b-tagging and $\tau$ ID results per type. Shown are the \# of events predicted in signal and observed in the data as well as the cuts applied.} \label{b and tau (types)} \end{table} % \begin{table} \begin{tabular}{|c|c|c|} \hline & Type 2& Type 3\tabularnewline \hline \hline data& 91& 71\tabularnewline \hline $t\overline{t}\rightarrow\tau+jets$& 5.61$\pm$0.37& 2.81$\pm$0.18\tabularnewline \hline $W\rightarrow\tau\nu+jets$& 0.93$\pm$0.04& 0.32$\pm$0.01\tabularnewline \hline \end{tabular} \caption{b-tagging and $\tau$ ID results per type after the $\eta$ cut. 
Shown are the \# of events predicted in signal and observed in the data as well as the cuts applied.} \label{b and tau (types) after eta} \end{table} % \begin{figure} \subfigure[Type 2 2D fit]{\includegraphics[scale=0.2]{plota_may18/type2_surf}}\subfigure[Type 3 2D fit]{\includegraphics[scale=0.2]{plota_may18/type3_surf}} \caption{The 2D combined fit (in $\eta$ and $P_{T}$) of the $\tau$ fake rate} \label{cap:taufakerate_fit2D} \end{figure} \subsubsection{Closure tests} In order to test the validity of fitting separately in $\eta$ and $P_{T}$, the effect of ignoring the possible correlations had to be checked. Fig. \ref{cap:Closure_test} demonstrates the closure test that was used for this purpose. In the same {}``$b$ veto'' sample we applied the resulting $F(\eta,P_{T})$ to each jet and compared the resulting (predicted) $\tau$ distributions with the ones obtained from the actual $\tau$ candidates (which of course are predominantly fakes here). However, one could imagine a pair of 2D distributions that would agree perfectly in both projections and yet still be very different. In order to test against such a possibility, we performed the same cross-check as before, but required the jets to lie between 0.5 and 1 in $\eta$. For this $\eta$ {}``slice'' we applied $F(\eta,P_{T})$ and compared the actual $P_{T}$ distribution with the predicted one. Figure \ref{cap:Closure_test_2} demonstrates that the agreement is still fairly good. % \begin{figure} \includegraphics[scale=0.2]{plota_may18/closure_eta_2}\includegraphics[scale=0.2]{plota_may18/closure_pt_2} \includegraphics[scale=0.2]{plota_may18/closure_eta_3}\includegraphics[scale=0.2]{plota_may18/closure_pt_3} \caption{The closure test of the $\tau$ fake rate function. The red histograms are for the actual $\tau$ candidates in the {}``veto'' sample. The green ones are the prediction. 
The $\eta$ distributions show some discrepancy related to the error of the fit.} \label{cap:Closure_test} \end{figure} % \begin{figure} \subfigure[Type 2]{\includegraphics[scale=0.45]{plots/pt_closure_type2}}\subfigure[Type 3]{\includegraphics[scale=0.45]{plots/pt_closure_type3}} \caption{The closure test of the $\tau$ fake rate function. The red histograms are for the actual $\tau$ candidates in the {}``veto'' sample. The green ones are the prediction. The jets were selected with $0.5<\eta<1$. An asymmetric range was chosen to avoid possible bias.} \label{cap:Closure_test_2} \end{figure} \subsubsection{Computing the QCD fraction} We assume that the probability for a jet to fake a $\tau$ is simply $F(\eta,P_{T})$. Then, the probability that at least one of the jets in the event will fake a $\tau$ can be computed as follows: \begin{center}$P_{event}=1-\prod_{j}(1-F(P_{T}^{j},\eta^{j}))$\par\end{center} Summing such probabilities over the tagged data, we obtain the QCD background estimate. Using the results described in the previous section, we get $N_{QCD}=71.13\pm1.56$ for $\tau$ type 2 and $N_{QCD}=77.46\pm0.80$ for $\tau$ type 3, which agrees with the observed data (in Table \ref{b and tau (types) after eta}) fairly well. One can also observe (see Appendix) that the predicted distributions of the main topological variables (section \ref{sub:NN-variables}) are in fairly good agreement with what is observed in the data. \subsection{\label{sub:NN-variables}Topological NN} For the signal training sample, 7481 preselected $t\overline{t}$ MC events were used (NOT the same as the 6141 selected events). For the background, the $\tau$ veto sample was used. Similarly to the alljet analysis \cite{alljet}, we define 2 networks: \begin{enumerate} \item Contains 3 topological variables (aplanarity, sphericity, and centrality) and 2 energy-based ones ($H_{T}$ and $\sqrt{S}$). \item Contains the output of the first, the W and top mass likelihood, the b-jet's $P_{T}$ and the b-jet's decay lengths. 
\end{enumerate} These are the kinematic and topological variables used: \begin{itemize} \item $H_{T}$ - the scalar sum of all jet (and $\tau$) $P_{T}$s. \item Sphericity and Aplanarity - these variables are formed from the eigenvalues of the normalized momentum tensor of the jets in the event. They are expected to be higher in top pair events than in a typical QCD event. \item Centrality, defined as $\frac{H_{T}}{H_{E}}$, where $H_{E}$ is the sum of the energies of the jets. \item Top and W mass likelihood - a $\chi^{2}$-like variable. $L\equiv\left(\frac{M_{3j}-M_{t}}{\sigma_{t}}\right)^{2}+\left(\frac{M_{2j}-M_{W}}{\sigma_{W}}\right)^{2}$, where $M_{t},M_{W},\sigma_{t},\sigma_{W}$ are the top and W masses (175 GeV and 80 GeV respectively) and resolution values (45 GeV and 10 GeV respectively \cite{alljet}). $M_{3j}$ and $M_{2j}$ are composed of the jet combinations chosen so as to minimize $L$. \item $P_{T}$ and lifetime significance of the leading b-tagged jet. \end{itemize} Many of these variables (for instance the mass likelihood and aplanarity) are only defined for events with 2 or more jets. So, we now require 2 jets with $P_{T}>20$ GeV and $|\eta|<2.5$. The Appendix has the plots of all these variables, which also serve as an additional check of the agreement between the data and the prediction. Two of these plots can be seen in Fig. \ref{cap:The-nn0-input-small}. As can be seen, the NN input variables show fairly good agreement between data and MC, which gives us confidence that the NN will provide sensible output using these variables. % \begin{figure} \includegraphics[scale=0.3]{analysis/CONTROLPLOTS/aplan_0_type2}\includegraphics[scale=0.3]{analysis/CONTROLPLOTS/ht_0_type2} \caption{2 of the 5 input variables of the first topological NN before the NN cut ($\tau$ type 2). 
The Kolmogorov-Smirnov (KS) probabilities are shown, indicating how good the agreement is.} \label{cap:The-nn0-input-small} \end{figure} \subsection{NN optimization} For training the NN we used the Multi Layer Perceptron (MLP) \cite{MLPfit}, as implemented in the ROOT framework. The input events were split into 7466 training and 14932 test entries. At each of the 500 training {}``epochs'' it evaluates the fractional error for both signal and background, showing how successful it has been in discriminating the test events (Figure \ref{cap:NN-error}). % \begin{figure} \subfigure[The first NN]{\includegraphics[scale=0.4]{analysis/GOODNN_NOTAU/nn0training300}}\subfigure[The second NN]{\includegraphics[scale=0.4]{analysis/GOODNN_NOTAU/nn1training300}} \caption{NN error. Red is the test sample, blue is the training sample} \label{cap:NN-error} \end{figure} The resulting NNs are shown in Figs. \ref{cap:NN0} and \ref{cap:NN1}. There one can observe the structure of the trained NN (blue interconnected nodes) and the performance evaluation based on the training samples. In the Appendix (Figs. \ref{cap:The-resulting-output_type2} and \ref{cap:The-resulting-output_type3}) we can observe this final NN output in the main analysis data sample (as well as in the signal and in the backgrounds). % \begin{figure} \subfigure[The first NN]{\includegraphics[scale=0.6]{analysis/GOODNN_NOTAU/nn0analysis300}} \caption{NN0 structure. The upper left plots show the relative impact of the variables on the NN output. The bottom left is the distribution of the NN output, the bottom right the efficiencies. Red is signal, blue is background.} \label{cap:NN0} \end{figure} % \begin{figure} \includegraphics[scale=0.6]{analysis/GOODNN_NOTAU/nn1analysis300} \caption{NN1 structure. The upper left plots show the relative impact of the variables on the NN output. The bottom left is the distribution of the NN output, the bottom right the efficiencies. 
Red is signal, blue is background.} \label{cap:NN1} \end{figure} The result of applying this NN to data is shown in Figure \ref{cap:Result-of-applying}. At this point we had to determine what cut on the topological NN output maximizes the signal significance. The signal significance is defined as $\frac{Number\, of\, signal\, events}{\sqrt{Number\, of\, Signal+Background\, events}}$ and is shown in Figure \ref{signal-signifficance}. It reaches its maximum at $NN1>0.9$ for both types 2 and 3. Therefore, this is the cut we used for the cross section measurement. The results of this measurement are summarized in Table \ref{cap:RESULTS}. % \begin{figure} \subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NNresult_tau2}}\subfigure[Type 2 (zoomed)]{\includegraphics[scale=0.3]{plots/NNresult_zoomed_tau2}} \subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NNresult_tau3}}\subfigure[Type 3 (zoomed)]{\includegraphics[scale=0.3]{plots/NNresult_zoomed_tau3}} \caption{Result of applying the NN cut. $t\bar{t}$, $W$ and QCD are plotted incrementally in order to compare with the \# of events observed in the data. Error bars include only statistical errors. $\sigma(t\bar{t})=5.54$ pb is assumed. The right plots show only the entries with high NN. 
The errors are statistical only.} \label{cap:Result-of-applying} \end{figure} % \begin{figure} \subfigure[Type 2]{\includegraphics[scale=0.3]{plots/NNresult_signiff_tau2}}\subfigure[Type 3]{\includegraphics[scale=0.3]{plots/NNresult_signiff_tau3}} \caption{$t\bar{t}\rightarrow\tau+jets$ signal significance} \label{signal-signifficance} \end{figure} % \begin{table} \begin{centering}\begin{tabular}{|c|c|c|c|c|c|c|c|c|} \hline Channel & $N^{obs}$ & ${\mathcal{B}}$ & $\int{\mathcal{L}}dt$ & \multicolumn{2}{c|}{Backgrounds}& $\varepsilon(t\bar{t})$ (\%) & $s$ (7 pb) & s+b \tabularnewline \hline $\tau$+jets type 2 & 5 & 0.1 & 349.3 & $W\rightarrow\tau\nu$ & 0.60$\pm$0.03& 1.57$\pm$0.01 & 3.83$_{-0.51}^{+0.46}$ & 6.84$_{-0.51}^{+0.46}$ \tabularnewline & & & & fakes & 2.41$\pm$0.09 & & & \tabularnewline \hline $\tau$+jets type 3 & 5 & 0.1 & 349.3 & $W\rightarrow\tau\nu$ & 0.27$\pm$0.01& 0.73$\pm$0.01 & 1.80$_{-0.23}^{+0.22}$ & 4.39$_{-0.23}^{+0.22}$ \tabularnewline & & & & fakes & 2.33$\pm$0.09 & & & \tabularnewline \hline \end{tabular}\par\end{centering} \caption{The final result summary after the NN>0.9 cut; $\epsilon(t\bar{t})$ is the total signal acceptance.} \label{cap:RESULTS} \end{table} \section{Systematic uncertainties} The most important systematic effects (except for the b-tagging, which is treated later) are summarized in Table \ref{cap:Syst}. 
% \begin{table} {\footnotesize }\begin{tabular}{|c||c|c|} \hline Channel& {\footnotesize $\tau$+jets type 2 }& {\footnotesize $\tau$+jets type 3 }\tabularnewline \hline \hline {\footnotesize Jet Energy Scale }& {\footnotesize $_{-0.27}^{+0.30}$ }& {\footnotesize $_{-0.69}^{+0.53}$ }\tabularnewline \hline {\footnotesize Primary Vertex }& {\footnotesize $_{+0.037}^{-0.036}$ }& {\footnotesize $_{+0.095}^{-0.093}$ }\tabularnewline \hline {\footnotesize MC stat }& {\footnotesize $_{+0.25}^{-0.22}$ }& {\footnotesize $_{+0.65}^{-0.58}$ }\tabularnewline \hline {\footnotesize Trigger }& {\footnotesize $_{-0.020}^{+0.0025}$ }& {\footnotesize $_{-0.069}^{+0.0056}$ }\tabularnewline \hline {\footnotesize Branching ratio }& {\footnotesize $_{+0.074}^{-0.071}$ }& {\footnotesize $_{+0.19}^{-0.18}$ }\tabularnewline \hline {\footnotesize QCD fake rate parametrization }& {\footnotesize $_{+0.17}^{-0.17}$ }& {\footnotesize $_{+0.34}^{-0.34}$ }\tabularnewline \hline $W\rightarrow\tau\nu$& {\footnotesize $_{+0.19}^{-0.19}$ }& {\footnotesize $_{+0.19}^{-0.19}$ }\tabularnewline \hline \end{tabular}{\footnotesize \par} \caption{Systematic uncertainties on $\sigma(t\bar{t})$ (in pb).} \label{cap:Syst} \end{table} \subsection{JES} The energy scale corrections applied to data and MC have uncertainties associated with them. These uncertainties result in a systematic shift of the measured cross section. To compute this systematic uncertainty, the JES corrections in MC were shifted up (or down) by $\delta JES^{data}=\sqrt{(\delta_{syst}^{data})^{2}+(\delta_{stat}^{data})^{2}+(\delta_{syst}^{MC})^{2}+(\delta_{stat}^{MC})^{2}}$. \subsection{Primary Vertex and Branching Ratio} The PV and the $t\bar{t}$ and W branching fractions were assigned uncertainties of 1\% and 2\% respectively, the same as in \cite{alljet}. \subsection{Luminosity} The total integrated luminosity of the data used in this analysis is $349\pm23$ $pb^{-1}$. This error leads to the uncertainty quoted in Table \ref{cap:Syst}. 
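Since the luminosity enters the cross section as a pure multiplicative scale, its contribution can be checked by simple propagation, $\delta\sigma=\sigma\cdot\delta L/L$; the cross-section value used below is the type 2 result quoted with the final results.

```python
# The luminosity error is purely multiplicative, so its contribution to
# the measured cross section is sigma * (delta_L / L). Quick check with
# the quoted 349 +- 23 pb^-1 and the type 2 cross section of 3.63 pb:

lumi, dlumi = 349.0, 23.0
sigma = 3.63
dsigma = sigma * dlumi / lumi
print(round(dsigma, 2))  # -> 0.24 pb, matching the quoted lumi error
```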
\subsection{Trigger} The trigger parametrization systematic uncertainty is computed by top\_trigger \cite{top_trigger}. \subsection{B-tagging} B-tagging uncertainty effects are taken into account by varying the MC tagging weights within their systematic and statistical errors. These errors arise from several independent sources: \begin{itemize} \item B-jet tagging parametrization. \item C-jet tagging parametrization. \item Light jet tagging parametrization (negative tag rate). Derived by varying the parametrization by $\pm1\sigma$ and adding in quadrature an 8\% relative uncertainty from the variation of the negative tag rate measured in different samples. \item Systematic uncertainties on the scale factors $SF_{hf}$ and $SF_{ll}$, derived from the statistical error due to the finite MC statistics. \item Semi-leptonic b-tagging efficiency parametrization in MC and in data (System 8). \item Taggability. This includes the statistical error due to the finite statistics of the samples from which it was derived, and a systematic error reflecting the (neglected) dependence of the taggability on the jet multiplicity. \end{itemize} The resulting effect of all of these error sources on the final number is summarized in Table \ref{cap:b-tagging-systematics-sources}, along with the total b-ID systematic error (quoted in Table \ref{cap:Syst}). 
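Assuming the sources listed above are independent, they would combine in quadrature with the upward and downward variations treated separately; the following is a minimal sketch of such a combination (quadrature is an assumption here, and the input pairs, taken from the first rows of Table \ref{cap:b-tagging-systematics-sources} for type 2, are used purely as an illustration).

```python
import math

# Sketch of combining independent asymmetric error sources in
# quadrature, treating up and down variations separately. Quadrature
# combination is an assumption; the inputs below are the b-, c- and
# light-jet tagging rows for type 2, used purely as an illustration.

def combine_asymmetric(sources):
    """sources: iterable of (up, down) error pairs -> (up, down) totals."""
    up = math.sqrt(sum(u * u for u, _ in sources))
    down = math.sqrt(sum(d * d for _, d in sources))
    return up, down

up, down = combine_asymmetric([(0.076, 0.13), (0.16, 0.20), (0.0051, 0.0051)])
```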
% \begin{table} \begin{tabular}{|c|c|c|} \hline Channel& {\footnotesize $\tau$+jets type 2 }& {\footnotesize $\tau$+jets type 3 }\tabularnewline \hline \hline b-tagging& {\tiny $_{-0.13}^{+0.076}$ }& {\tiny $_{-0.26}^{+0.41}$ }\tabularnewline \hline c-tagging& {\tiny $_{-0.20}^{+0.16}$ }& {\tiny $_{-0.48}^{+0.60}$ }\tabularnewline \hline l-tagging& {\tiny $_{-0.0051}^{+0.0051}$ }& {\tiny $_{-0.014}^{+0.014}$ }\tabularnewline \hline $SF_{hf}$& {\tiny $_{-0.00036}^{+0.00036}$ }& {\tiny $_{-0.00094}^{+0.00094}$ }\tabularnewline \hline $SF_{ll}$& {\tiny $_{-0.00036}^{+0.00036}$ }& {\tiny $_{-0.00094}^{+0.00094}$ }\tabularnewline \hline $\mu$ b-tagging (data)& {\tiny $_{-0.091}^{+0.094}$ }& {\tiny $_{-0.24}^{+0.25}$ }\tabularnewline \hline $\mu$ b-tagging (MC)& {\tiny $_{+0.11}^{-0.10}$ }& {\tiny $_{+0.28}^{-0.25}$ }\tabularnewline \hline taggability& {\tiny $_{-0.048}^{+0.049}$ }& {\tiny $_{-0.13}^{+0.13}$ }\tabularnewline \hline \end{tabular} \caption{b-tagging systematics sources} \label{cap:b-tagging-systematics-sources} \end{table} \subsection{Fake rate} The systematic uncertainty associated with the $\tau$ fake rate is just the statistical error of the fit described in section \ref{sub:Fit}. \subsection{W background prediction} The method used to describe the $W\rightarrow\tau\nu$ background is not perfect. There are two potential sources of error: \begin{itemize} \item Only the W+4 partons MC has been used. It is, however, expected that W+2 and W+3 would make some (albeit smaller) contribution. In order to properly take this into account one would need to combine all jet multiplicity samples. This leads to a slight underestimation of the result. \item The {}``$b$ veto'' sample may contain some W contribution from $Wjjjj$ events. This leads to double-counting of these events and hence an overestimation of the result. \end{itemize} A conservative estimate of a 50\% uncertainty on the number of W events in the final sample was applied. 
That is, by varying this number up and down by 50\% we observed the effect on the cross section (as quoted in Table \ref{cap:Syst}). \section{Cross section} The cross section is defined as $\sigma=\frac{Number\, of\, signal\, events}{\varepsilon(t\bar{t})\cdot BR(t\bar{t})\cdot Luminosity}$. The results were the following: \begin{center}$\tau$+jets type 2 cross section: \[ 3.63\;\;_{-3.50}^{+4.72}\;\;(stat)\;\;_{-0.48}^{+0.49}\;\;(syst)\;\;\pm0.24\;\;(lumi)\;\; pb\] \par\end{center} \begin{center}$\tau$+jets type 3 cross section: \[ 9.39\;\;_{-7.49}^{+10.10}\;\;(stat)\;\;_{-1.18}^{+1.25}\;\;(syst)\;\;\pm0.61\;\;(lumi)\;\; pb\] \par\end{center} The combined cross section was estimated by minimizing the sum of the negative log-likelihood functions for each channel. The functional form of the likelihood function was the same as that used for the $e\mu$ channel \cite{emu}. The combined cross section is \begin{center}\[ 5.05\;\;_{-3.46}^{+4.31}\;\;(stat)\;\;_{-0.67}^{+0.68}\;\;(syst)\;\;\pm0.33\;\;(lumi)\;\; pb\] \par\end{center}
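As a numerical cross-check of the formula above, the type 2 central value can be reproduced from the entries of Table \ref{cap:RESULTS}, assuming the number of signal events is $N^{obs}$ minus the predicted backgrounds.

```python
# Reproducing the tau+jets type 2 central value from the results table:
# sigma = (N_obs - N_bkg) / (eff * BR * L), with N_bkg the sum of the
# W -> tau nu and fake backgrounds.

n_obs = 5
n_bkg = 0.60 + 2.41                 # W -> tau nu + fakes
eff, br, lumi = 0.0157, 0.1, 349.3  # acceptance, branching ratio, pb^-1
sigma = (n_obs - n_bkg) / (eff * br * lumi)
print(round(sigma, 2))  # -> 3.63 pb, the quoted type 2 central value
```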