Queueing Comparison of Original and Replayed Abilene Data

Data Description

The files report byte and packet counts in 10-millisecond bins. The portion of the data considered here lies between the 180,000th and the 720,000th bins.
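
As a quick check on the size of this window, the sketch below slices a count series to those bins and converts the number of bins to elapsed time. The array `counts` is synthetic stand-in data, the function names are hypothetical, and treating the range as half-open (start included, end excluded) is an assumption, since the text does not say whether the endpoints are included. Under that assumption the window holds 540,000 bins, or 90 minutes of traffic.

```python
import numpy as np

BIN_SECONDS = 0.010                      # each bin covers 10 milliseconds
START_BIN, END_BIN = 180_000, 720_000    # window from the text; half-open range is an assumption

def select_window(counts, start=START_BIN, end=END_BIN):
    """Return the bins in [start, end) of a per-bin count series."""
    counts = np.asarray(counts)
    if end > counts.size:
        raise ValueError("series is shorter than the requested window")
    return counts[start:end]

def window_seconds(n_bins, bin_seconds=BIN_SECONDS):
    """Convert a number of bins to elapsed seconds."""
    return n_bins * bin_seconds

# Synthetic stand-in for a real byte-count or packet-count series.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=800_000)

window = select_window(counts)
duration = window_seconds(window.size)
print(window.size, "bins,", duration / 60, "minutes")
```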
 

[Figure: clev.overall.summary1.pdf]
The figure shows the original Cleveland (clev) data (red) and the replayed version of the same data (blue). The top row shows packet counts; the bottom row shows byte counts. The black line shows the sample mean of the data. All plots show some uptrend in the mean rate. This poses some issues for queueing analysis, but it might be handled either by detrending or (preferably) by changing the server rate to track the increasing "local mean" more closely. The original data shows additional spikes. As shown below, from a queueing perspective at least some of these spikes lead to qualitative differences in the queueing at medium to high utilizations.
 

[Figure: ipls.overall.summary1.pdf]
The figure shows the original Indianapolis (ipls) data (red) and the replayed version of the same data (blue). The top row shows packet counts; the bottom row shows byte counts. The black line shows the sample mean of the data. A linear trend in the mean rate is less evident than with the Cleveland data. The original data shows considerable 'spikiness' that is missing in the replayed data. Visually, these data sets appear to be quite different. Less obvious is a fall in the mean rate near the end of the replayed data, especially in the byte counts, that is not present in the original data.

Adapting to Mean Trends
The mean plays a unique role in queueing analysis because it determines the server rate, C, through the relationship
    C = mean / utilization.    (*)
If one assumes the input data is stationary, then the mean rate is the same throughout the data. In practice, real data shows mean trends and mean shifts. This Abilene data shows mean trends. Several queueing comparisons were performed to deal with changes in the mean rate, including:
1) ignore them,
2) use regression to approximate the mean trend line, and allow C to vary by using the mean trend line, and
3) find the 'local means' in short intervals, and allow C to vary by using the local means.
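
The three options above can be sketched as three ways of building a per-bin server rate from relationship (*). This is a minimal illustration, not the code used for the comparisons: the function names, the block length `block`, the synthetic input `x`, and the choice of an ordinary least-squares straight line for the regression are all assumptions.

```python
import numpy as np

def constant_rate(x, utilization):
    """Method 1: ignore the trend; C is the global mean over utilization in every bin."""
    x = np.asarray(x, dtype=float)
    return np.full(x.size, x.mean() / utilization)

def trend_rate(x, utilization):
    """Method 2: fit a straight line to the counts and let C follow that line."""
    x = np.asarray(x, dtype=float)
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)   # coefficients, highest degree first
    return (slope * t + intercept) / utilization

def local_mean_rate(x, utilization, block):
    """Method 3: C is piecewise constant, set from the mean of each block of bins."""
    x = np.asarray(x, dtype=float)
    rate = np.empty(x.size)
    for start in range(0, x.size, block):
        rate[start:start + block] = x[start:start + block].mean() / utilization
    return rate

# Synthetic counts with a mild upward trend in the mean (not Abilene data).
rng = np.random.default_rng(1)
t = np.arange(6000)
x = rng.poisson(100 + 0.01 * t)

c1 = constant_rate(x, utilization=0.8)
c2 = trend_rate(x, utilization=0.8)
c3 = local_mean_rate(x, utilization=0.8, block=500)
```

With the local-means version, shorter blocks make C follow the data itself more closely, so the block length is a real modelling choice rather than a detail.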

For this data, method 2 appears to be the most appropriate.
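
Once a server rate is chosen, one common way to turn per-bin arrivals into queue lengths is the discrete-time Lindley recursion, Q[n] = max(0, Q[n-1] + A[n] - C[n]). The sketch below assumes an infinite buffer and a server that drains up to C[n] units per bin; whether the comparisons here used exactly this recursion is an assumption, and `queue_lengths` is a hypothetical name.

```python
import numpy as np

def queue_lengths(arrivals, service, q0=0.0):
    """Lindley recursion: Q[n] = max(0, Q[n-1] + A[n] - C[n]).

    `service` may be a scalar (constant C) or an array with one value per bin.
    """
    arrivals = np.asarray(arrivals, dtype=float)
    service = np.broadcast_to(np.asarray(service, dtype=float), arrivals.shape)
    q = np.empty(arrivals.size)
    prev = q0
    for n in range(arrivals.size):
        prev = max(0.0, prev + arrivals[n] - service[n])
        q[n] = prev
    return q

# Toy example: four bins of arrivals served at a constant rate of 2 per bin.
q = queue_lengths([3, 0, 5, 1], service=2)
print(q)  # [1. 0. 3. 2.]
```

Passing an array for `service` covers the varying-rate cases, where C changes from bin to bin.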

Conclusions
Visually, the replayed data appears much smoother than the original data. As a consequence, the queue lengths from the replayed data are generally much smaller (less variable). The queue length distributions reflect these smaller queue lengths and also show qualitative differences in the rate of tail decay. The one exception occurs with the Cleveland data using local means at utilizations below 75%. This similarity may be more an artifact of the connection between the data and its local means than a genuine similarity between the two datasets. By itself, and given the visual differences, it is not convincing evidence that even the replayed Cleveland data is similar to the original data in a queueing sense.
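
One way to compare the rate of tail decay is to estimate the complementary distribution P(Q > b) of each queue-length series at a common set of thresholds and inspect the two curves on a logarithmic scale. The sketch below shows only that estimate. The function name and the toy queue lengths are hypothetical, and the numbers are not results from the Abilene data.

```python
import numpy as np

def empirical_ccdf(q, thresholds):
    """Fraction of bins whose queue length exceeds each threshold, P(Q > b)."""
    q = np.sort(np.asarray(q, dtype=float))
    thresholds = np.asarray(thresholds, dtype=float)
    # With side="right", searchsorted counts the samples that are <= each threshold.
    n_le = np.searchsorted(q, thresholds, side="right")
    return 1.0 - n_le / q.size

# Toy queue lengths, one value per bin.
q_toy = [0, 0, 1, 2, 5]
ccdf = empirical_ccdf(q_toy, thresholds=[0, 1, 4, 5])
print(ccdf)
```

Applying the same thresholds to the original and the replayed queue lengths gives two curves that can be compared point by point; a faster-falling curve indicates a lighter tail.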