Mobile Cloud Storage Dataset

We contribute to the research community a dataset of HTTP-level request logs collected from the storage front-end servers of a mobile cloud storage service. The service is very similar to Google Drive. The dataset spans one week and contains about 350 million HTTP request logs. Each day is stored in a separate plain-text file, as listed below. We also provide a small sample file of several thousand lines to facilitate quick browsing of the dataset format.

Each line corresponds to one HTTP request and has 10 fields.

  1. Timestamp in seconds: relative to the first request in the dataset.
  2. Mobile device type: 0 for Android, 1 for iOS.
  3. Device ID (anonymized): numerical ID that uniquely identifies a mobile device.
  4. User ID (anonymized): numerical ID that uniquely identifies a registered user. A user might use several devices.
  5. Request type: 0 for file storage operation request, 1 for file retrieval operation request, 2 for chunk uploading request, 3 for chunk downloading request.
  6. Data volume: the volume of uploaded (resp. downloaded) data for a storage (resp. retrieval) request.
  7. Request processing time: the time between the first byte received by the front-end server and the last byte sent to the mobile client.
  8. Upstream response time: the time spent storing/preparing the requested content by the upstream storage servers, i.e., the servers that physically host the data. This value is missing from some logs, in which case it is recorded as '-1'.
  9. RTT: the average of all RTTs measured for the TCP connection over which the HTTP request is transferred. If missing, '-1' is recorded.
  10. Proxied or not: whether the request passed through a proxy, as determined from the HTTP header X-FORWARDED-FOR. 0 for not proxied, 1 for proxied.
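
The field list above can be turned into a small parser. The following is a minimal sketch, not an official loader: it assumes the 10 fields are separated by whitespace (the delimiter is not stated here, so check the sample file first), and the names Request and parse_line are hypothetical. Units of the volume and time fields are not assumed; values are kept as plain numbers, and '-1' in fields 8 and 9 is mapped to None.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """One log line, with fields in the order listed above."""
    timestamp: float                 # seconds since the first request
    device_type: int                 # 0 = Android, 1 = iOS
    device_id: str                   # anonymized, kept as an opaque string
    user_id: str                     # anonymized, kept as an opaque string
    request_type: int                # 0..3, see field 5
    data_volume: float
    processing_time: float
    upstream_time: Optional[float]   # None when logged as -1
    rtt: Optional[float]             # None when logged as -1
    proxied: bool


def _optional(value: str) -> Optional[float]:
    """Convert a numeric field, mapping the '-1' marker to None."""
    number = float(value)
    return None if number == -1 else number


def parse_line(line: str) -> Request:
    # Assumption: fields are whitespace-separated.
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    return Request(
        timestamp=float(fields[0]),
        device_type=int(fields[1]),
        device_id=fields[2],
        user_id=fields[3],
        request_type=int(fields[4]),
        data_volume=float(fields[5]),
        processing_time=float(fields[6]),
        upstream_time=_optional(fields[7]),
        rtt=_optional(fields[8]),
        proxied=fields[9] == "1",
    )
```

Because each daily file is large, it is usually better to iterate over the file line by line and call parse_line on each line than to read the whole file into memory.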