The files report byte and packet counts in 10-millisecond bins. The portion
of the data considered here lies between the 180000th and the 720000th
bins.
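As a rough illustration of what this window covers, the sketch below converts bin indices to elapsed time and slices a count series. The 0-based, half-open indexing and the names `bin_to_seconds` and `select_window` are assumptions made for the example; the source does not say whether the end bins are included.

```python
BIN_WIDTH_MS = 10        # bin width stated in the text
FIRST_BIN = 180_000      # start of the analysed window
LAST_BIN = 720_000       # end of the analysed window

def bin_to_seconds(index, width_ms=BIN_WIDTH_MS):
    """Elapsed time in seconds at the start of a given bin."""
    return index * width_ms / 1000

def select_window(counts, first=FIRST_BIN, last=LAST_BIN):
    """Return the bins in [first, last); the half-open convention is assumed."""
    return counts[first:last]

if __name__ == "__main__":
    start_s = bin_to_seconds(FIRST_BIN)
    end_s = bin_to_seconds(LAST_BIN)
    print(f"window: {start_s:.0f} s to {end_s:.0f} s "
          f"({(end_s - start_s) / 60:.0f} minutes)")
```

Under these assumptions the window starts 30 minutes into the trace and lasts 90 minutes.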
The figure (left) shows the original Cleveland (clev) data (red) and the replayed
version of the same data (blue). The top row shows packet counts, the lower
row shows byte counts. The black line shows the sample mean for the data.
All plots show some uptrend in the mean rate. This poses some issues for
queueing analysis, but might be handled either by detrending or changing
the server rate to more closely track the increasing "local mean" (preferred). The
original data shows additional spikes. As shown below, from a queueing
perspective, at least some of these spikes lead to qualitative differences
in the queueing at medium to high utilizations.
The figure (left) shows the original Indianapolis (ipls) data (red) and the
replayed version of the same data (blue). The top row shows packet counts,
the lower row shows byte counts. The black line shows the sample mean for
the data. A linear trend in the mean rate is less evident than with the
Cleveland data. The original data
shows considerable 'spikiness' that is missing in the replayed data.
Visually these data sets appear to be quite different. Less obvious is
a fall in the mean rate near the end of the replayed data, especially the
byte counts, that is not present in the original data.
Adapting to Mean Trends
The mean plays a unique role in queueing analysis because it determines
the server rate, C, through the relationship
C = mean / utilization.    (*)
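To make relationship (*) concrete, here is a minimal sketch of how a server rate and the resulting queue lengths might be computed from binned counts. The queue model is an assumption: a single-server, infinite-buffer queue updated once per bin with the Lindley recursion q = max(0, q + arrivals - C). The source does not state which queue model was used, and the function names are hypothetical.

```python
def server_rate(counts, utilization):
    """Server rate per bin from relationship (*): C = mean / utilization."""
    mean = sum(counts) / len(counts)
    return mean / utilization

def queue_lengths(counts, rates):
    """Queue length at the end of each bin (Lindley recursion, assumed model).

    counts: arrivals per bin (packets or bytes)
    rates:  service capacity per bin, same units as counts
    """
    q = 0.0
    lengths = []
    for arrivals, capacity in zip(counts, rates):
        # Work carried over plus new arrivals, minus what the server can drain.
        q = max(0.0, q + arrivals - capacity)
        lengths.append(q)
    return lengths

if __name__ == "__main__":
    counts = [3, 1, 0, 4]                      # toy data, not from the traces
    c = server_rate(counts, utilization=1.0)   # mean is 2, so C is 2
    print(queue_lengths(counts, [c] * len(counts)))
```

Passing the rates as a per-bin list rather than a single number lets the same function be reused when C varies over time.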
If one assumes the input data is stationary, then the mean rate is the
same throughout the data. In practice, real data shows mean trends and
mean shifts. This Abilene data shows mean trends. Several queueing
comparisons were performed to deal with changes in the mean rate, including:
1) ignore them [click here],
2) use regression to approximate the mean trend line, and allow C to
vary by using the mean trend line [click here], and
3) find the 'local means' in short intervals, and allow C to vary by
using the local means [click here].
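The three approaches can be sketched as three ways of producing a per-bin server rate from relationship (*). This is an illustration under stated assumptions, not the procedure actually used: the regression is taken to be an ordinary least-squares line against bin index, the local means are taken over non-overlapping blocks, and the `window` parameter and all function names are hypothetical.

```python
def constant_rate(counts, utilization):
    """Method 1: ignore trends and use the overall sample mean."""
    mean = sum(counts) / len(counts)
    return [mean / utilization] * len(counts)

def trend_rate(counts, utilization):
    """Method 2: fit a least-squares line to the counts, let C follow it.

    Needs at least two bins so that the slope is defined.
    """
    n = len(counts)
    t_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(counts))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return [(intercept + slope * t) / utilization for t in range(n)]

def local_mean_rate(counts, utilization, window):
    """Method 3: use the mean of each non-overlapping block of `window` bins."""
    rates = []
    for start in range(0, len(counts), window):
        block = counts[start:start + window]
        local = sum(block) / len(block)
        rates.extend([local / utilization] * len(block))
    return rates

if __name__ == "__main__":
    counts = [1, 2, 3, 4]   # toy data with a linear uptrend
    print(constant_rate(counts, 0.5))
    print(trend_rate(counts, 0.5))
    print(local_mean_rate(counts, 0.5, window=2))
```

With a short window, the local means follow the data closely, so the server rate absorbs much of the variability that would otherwise build a queue.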
For this data, method 2 appears to be the most appropriate.
Conclusions
Visually, the replayed data appears much smoother than the original
data. As a consequence, the queue lengths from the replayed data are generally
much smaller (less variable). The queue length distributions reflect these
smaller queue lengths and also show qualitative differences in the rate
of tail decay. The one exception to this occurs with the Cleveland data
using local means at utilizations below 75%. This similarity may
be more of an artifact of the connections between the data and its local
means than of a similarity between the two datasets. By itself, and given
the visual differences, it is not convincing that even the Cleveland replayed
data is similar in a queueing sense to the original data.
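One way to compare tail decay between two queue-length series is the empirical complementary distribution, P(Q > x), evaluated at a set of thresholds. The sketch below is a generic illustration; the source does not say how its queue length distributions were estimated, and the function name and thresholds are hypothetical.

```python
def empirical_ccdf(queue_lengths, thresholds):
    """Fraction of bins whose queue length exceeds each threshold."""
    n = len(queue_lengths)
    return [sum(1 for q in queue_lengths if q > x) / n for x in thresholds]

if __name__ == "__main__":
    original = [0, 0, 1, 2, 5]   # toy queue lengths, not from the traces
    replayed = [0, 0, 0, 1, 1]
    thresholds = [0, 1, 4]
    print(empirical_ccdf(original, thresholds))
    print(empirical_ccdf(replayed, thresholds))
```

Plotting these values on a logarithmic vertical axis makes differences in the rate of tail decay easier to see.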