Video Archive Format Details


Updated October 26, 2000

Introduction

COOL.STF has contracted with the Internet Archive to record 20 television channels in digital format for a year or more. The incoming video (in either digital or analog format) is brought down to composite or S-Video (Y/C) and then re-encoded using FutureTel NS320 video encoders. Each archiving system runs four encoders and buffers the generated MPEG data using two hard drives. When all four of the 1GB buffers on a hard drive fill up, the system switches recording to the other hard drive, and a background process then transfers the 4 x 1GB files onto a Breece Hill Q215 DLT 7000 tape robot.

The system also uses a couple of Nokia 9600 satellite receivers with special software to download the EPG (Electronic Program Guide) from the DISH Network and ExpressVu DBS services. This data (or pseudo-generated events for channels that are not carried on DISH Network or ExpressVu) is used as the basis for the EPG that indexes the data on the tapes produced.

MPEG-2 Format

The MPEG-2 specification allows for many different recording formats. These are the settings we have chosen for our encoders:

Setting                   Value
Video                     MPEG-2
Video Resolution          2/3 D1 (480x480 for NTSC, 480x576 for PAL)
Video Chroma Resolution   4:2:0 standard
Video Bitrate             3,000,000 bps
Audio                     MPEG-1 Layer 2
Audio Sampling Rate       48 kHz stereo
Audio Bitrate             224,000 bps
Multiplex                 Program Stream
Multiplex Bitrate         Approx. 3.275 Mbps
PES Packet Size           1024
Pack Size                 One PES packet per pack
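
As a rough cross-check of these figures, the sketch below derives the multiplexing overhead and the approximate recording time held by one 1GB file. It assumes that "1GB" means 2**30 bytes and that the approximate multiplex bitrate is exactly 3,275,000 bps; the function names are illustrative only.

```python
VIDEO_BPS = 3_000_000   # video bitrate from the settings above
AUDIO_BPS = 224_000     # audio bitrate from the settings above
MUX_BPS = 3_275_000     # "approx. 3.275 Mbps", taken here as exact
FILE_BYTES = 2**30      # assumption: a "1GB" file is 2**30 bytes


def mux_overhead_bps(video=VIDEO_BPS, audio=AUDIO_BPS, mux=MUX_BPS):
    """Bits per second spent on pack, system and PES headers."""
    return mux - (video + audio)


def seconds_per_file(file_bytes=FILE_BYTES, mux=MUX_BPS):
    """Approximate recording time held by one file on tape."""
    return file_bytes * 8 / mux


print(mux_overhead_bps())                  # 51000
print(round(seconds_per_file() / 60, 1))   # 43.7 (minutes)
```

Under these assumptions each 1GB file holds a little under 44 minutes of one channel.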

MPEG-2 Decoder Compatibility

We have successfully tested extracted video with the following playback devices/programs. Generally speaking, software playback of MPEG-2 video requires a Pentium II class processor running at 333MHz or higher.

Product: Sigma Designs Hollywood Plus (PCI Card)
Version/OS: 1.81, Windows 98
Results: Almost perfect. Doesn't interpolate the picture up to full D1 when in full-screen mode.

Product: Sigma Designs NetStream 2 (PCI Card)
Version/OS: 4.60 (1.60a), Windows NT 4 SP6
Results: Perfect

Product: Sigma Designs NetStream 2000 (PCI Card)
Version/OS: 1.0 build 125, Windows NT SP6 & Windows 2000
Results: Perfect

Product: Optibase VideoPlex Express (PCI Card)
Version/OS: 1.2, Windows NT SP6 & Windows 2000
Results: Perfect

Product: Stradis SDM275 (PCI Card)
Version/OS: 2.01.007, Windows NT SP6 & Windows 2000
Results: Perfect

Product: PowerDVD (Software DVD Player)
Version/OS: 2.5, Windows 98
Results: Perfect

Product: MGI Soft DVD MAX (Software DVD Player)
Version/OS: 3.32.03, Windows 98
Results: Perfect (but must use a supported AGP graphics card to use this product)

Product: MediaMatics DVD Express (Software DVD Player - OEM Product)
Version/OS: 5.00.00.5.6.1, Windows 98
Results: Perfect

Product: Xing DVD Player (Software DVD Player)
Version/OS: 1.61, Windows 98
Results: Doesn't play. Irrelevant since this product has been withdrawn from the market.

Product: Ligos (Software DirectShow Filter)
Version/OS: 1.0, Windows 98
Results: Occasionally crashes. Doesn't interpolate the picture up to full D1 resolution.

Product: ATI DVD Player (Software DVD Player)
Version/OS: 3.5, Windows 98
Results: Perfect

Product: Herosoft SDVD 2000
Version/OS: Windows 2000
Results: Works very well. Occasionally, you have to right click on the video window and select "Original Size" to get a picture.

After a hardware decoder, our recommended player is PowerDVD since it offers very good picture quality and compatibility with many different PCs.

Tape Layout

Tapes are identified by a barcode label that's read by the tape loader. Tapes are written in raw format with fixed 64K blocks. Files are separated by filemarks.

File 0 is a 64K block that contains a copy of the barcode label and a copy of this text in HTML format.

Files 1 through 32 contain MPEG-2 Program Stream Data. The size of each file is 1GB.

File 33 contains the raw event file for the particular recording system. This way, every tape contains not only the MPEG-2 data, but also the description of how to extract it and an index to the data on the tape (as well as other tapes depending on when the recording was made).

Since MPEG-2 data is already well compressed, compression is disabled on the drive. This results in 35 decimal GB of storage per tape or roughly 32.5 real GB.
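
The capacity figures can be checked with a few lines of arithmetic. This sketch assumes "decimal GB" means 10**9 bytes, "real GB" means 2**30 bytes, and that each 1GB MPEG-2 file is exactly 2**30 bytes; the names are illustrative.

```python
DECIMAL_GB = 10**9      # "decimal GB"
BINARY_GB = 2**30       # "real GB" (assumption: 2**30 bytes)
BLOCK_SIZE = 64 * 1024  # fixed tape block size

# Number of 64K tape blocks in one 1GB file.
BLOCKS_PER_FILE = BINARY_GB // BLOCK_SIZE


def decimal_to_binary_gb(decimal_gb):
    """Convert a capacity quoted in 10**9-byte GB to 2**30-byte GB."""
    return decimal_gb * DECIMAL_GB / BINARY_GB


print(round(decimal_to_binary_gb(35), 1))  # 32.6
print(BLOCKS_PER_FILE)                     # 16384
```

With 32 MPEG-2 files of 1GB each, roughly half a gigabyte remains for file 0, the event file and any repeated blocks.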

Some DLT tapes yield slightly less than 32.5GB. This is due to excessive space being used to repeat tape blocks as a result of errors on the tape. For these tapes, as soon as we get an error, the tape is removed from the loader and the current file is then repeated as the first MPEG-2 file on the next tape loaded. These "bad" tapes are then restored to a separate system, erased and then re-recorded. If the tape has 29 or more files already recorded, a copy of the EPG file is written and the tape is considered completed. If fewer than 29 files are on the tape, it's loaded back into a tape drive and filled to capacity like any normal tape.

Event File Layout

The event file is the key to decoding the system. It is in effect a program guide that lists all the events (or programs) that are contained in the archive. This file is tab delimited with a carriage-return & line-feed at the end of each line.

Parameter Description
event-id A unique identifier for the event. This is a 28-bit value starting at zero. The top 4 bits indicate the archiving system that recorded the event.
start-date-time The starting date and time of the event using Universal Time.
run-time The running time of the event in hours and minutes.
tape-id The barcode ID of the tape containing the event.
file-number The file number on the tape that contains the event.
byte-offset The offset within the file at which the event starts. If an event spans files, the next file will use an offset of zero.
channel-name The name of the channel that carried the event.
event-title Name of the event.
event-description Description of the event.

The event file can be parsed to generate a web viewable program guide. It is expected that the data will be imported into a database. From there, all that's needed to retrieve an event from tape is the event-id.
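
A minimal parsing sketch follows. Several details are assumptions, not facts from this document: that the columns appear in the order listed above, that event-id, file-number and byte-offset are written as decimal integers, and that the event-id is stored as a 32-bit number whose top 4 bits are the system number and whose low 28 bits are the counter. The date and run-time formats are not specified here, so they are kept as plain strings. The sample line is invented for illustration.

```python
from typing import NamedTuple


class EventRecord(NamedTuple):
    event_id: int
    start_date_time: str   # format not specified; kept as text
    run_time: str          # format not specified; kept as text
    tape_id: str
    file_number: int
    byte_offset: int
    channel_name: str
    event_title: str
    event_description: str


def parse_event_line(line):
    """Parse one tab-delimited, CR/LF-terminated line of the event file."""
    parts = line.rstrip("\r\n").split("\t")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    return EventRecord(
        int(parts[0]), parts[1], parts[2], parts[3],
        int(parts[4]), int(parts[5]), parts[6], parts[7], parts[8],
    )


def split_event_id(event_id):
    """Assumed layout: top 4 bits = system, low 28 bits = counter."""
    return (event_id >> 28) & 0xF, event_id & 0x0FFFFFFF


# Hypothetical sample line; real files may format these fields differently.
sample = ("268435461\t2000-01-08 14:00\t01:30\tABC123\t3\t131072\t"
          "Example Channel\tExample Title\tExample description\r\n")
record = parse_event_line(sample)
print(record.tape_id, record.file_number, record.byte_offset)  # ABC123 3 131072
print(split_event_id(record.event_id))                         # (1, 5)
```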

Extracting Events as MPEG-2 Files

This is the algorithm that's used to extract an event and write it as an MPEG-2 Program Stream file. The key to finding an event is the event ID.

  1. Extract all entries in the event file that match the requested event-id
  2. After all records have been found for the requested event, find the next event on the same channel. This is the stopping point of the event.
  3. For each of the event records:
  4.  Ensure the appropriate tape is loaded (can be verified by reading the first block of file zero)
  5.  Send a command to the tape drive to space file-number - 1 filemarks into the tape
  6.  Send a command to the tape drive to space byte-offset / 64K blocks into the file
  7.  For the first event record, align the output (see below)
  8.  Write read data to the output file
  9. Loop for each event record
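
The positioning arithmetic in the steps above can be sketched as a planning function that issues no tape commands itself. The record keys and the function name are illustrative, and the bytes_to_discard field is an inference: it covers the case where byte-offset is not a multiple of the 64K block size, which the steps above do not spell out.

```python
BLOCK_SIZE = 64 * 1024  # fixed tape block size


def plan_reads(records, event_id):
    """Return one positioning step per event-file record for event_id."""
    plan = []
    for rec in records:
        if rec["event_id"] != event_id:
            continue
        blocks, remainder = divmod(rec["byte_offset"], BLOCK_SIZE)
        plan.append({
            "tape_id": rec["tape_id"],
            # As stated above: space (file-number - 1) filemarks into the tape.
            "filemarks_to_space": rec["file_number"] - 1,
            # Whole 64K blocks to space into the file.
            "blocks_to_space": blocks,
            # Assumption: leftover bytes inside the first block read.
            "bytes_to_discard": remainder,
            # Only the first record of an event needs output alignment.
            "align_output": not plan,
        })
    return plan


# Hypothetical records for an event that spans two files on one tape.
records = [
    {"event_id": 5, "tape_id": "T1", "file_number": 3, "byte_offset": 131172},
    {"event_id": 6, "tape_id": "T1", "file_number": 3, "byte_offset": 900000},
    {"event_id": 5, "tape_id": "T1", "file_number": 4, "byte_offset": 0},
]
for step in plan_reads(records, 5):
    print(step)
```

Each step would then be turned into the drive's own space-filemark and space-block commands, which depend on the tape driver in use.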

MPEG-2 Program Stream files need to start with a specific set of packets to ensure compatibility between decoders. This is how to align the output so that the MPEG-2 files can be read correctly. This applies only to the first file within an event, since data after the start of the event is inherently already aligned:

  1. Loop through recovered data until a pack header (00 00 01 BA) is found
  2. From this point, search until a system header (00 00 01 BB) is found taking into consideration any stuffing in the pack header. Make a copy of the system header
  3. Now search for a sequence header (00 00 01 B3). This sequence header will be part of a PES packet
  4. Search backwards until the start of the video PES packet is found (00 00 01 E0)
  5. Ensure that the video PES packet is preceded by a pack header (00 00 01 BA). If not, keep searching. The pack header sets the System Time Clock (STC) and is very important for setting up the decoder correctly.
  6. Once this point has been found, write the pack header we just found, followed by the saved system header, followed by the video PES packet.
  7. The initial video PES packet has any data between the PES header and sequence header removed and the size of the PES packet is adjusted. This means that the first PES packet will typically be shorter than the remainder of the PES packets in the stream and will ensure that it doesn't contain any erroneous data left over from the previous PES packets.
  8. Because of the way the FutureTel multiplexer works and this search algorithm, this initial PES packet will always start a Group Of Pictures (GOP), which prevents incorrect frames on the first few pictures prior to the start of the GOP.
  9. Once these headers have been output correctly, the remainder of the MPEG data for the event can be written straight to disk with no processing until the end of the event.
  10. When all data for the event has been extracted, the software again aligns to a pack header. This time, however, the pack header is replaced with the program end code (00 00 01 B9), which indicates the end of the stream to the decoder.
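
Steps 1 through 7 can be sketched as follows. This is an illustration under stated assumptions, not the archive's actual extraction software: it assumes MPEG-2 pack headers (14 bytes plus the stuffing count in the low 3 bits of the last byte), MPEG-2 PES headers (header data length in the ninth byte), and that the whole start of the event is already in memory. All function names are invented.

```python
import struct

PACK_START = b"\x00\x00\x01\xba"
SYSTEM_START = b"\x00\x00\x01\xbb"
SEQUENCE_START = b"\x00\x00\x01\xb3"
VIDEO_PES_START = b"\x00\x00\x01\xe0"


def pack_header_length(data, pos):
    """Size of the MPEG-2 pack header at pos, including stuffing bytes."""
    if pos + 14 > len(data):
        raise ValueError("truncated pack header")
    return 14 + (data[pos + 13] & 0x07)


def align_start(data):
    """Return (bytes_to_write_first, offset_to_resume_copying_from)."""
    first_pack = data.find(PACK_START)                          # step 1
    if first_pack < 0:
        raise ValueError("no pack header found")
    # Step 2: skip the pack header and its stuffing, then save the system header.
    sys_pos = data.find(SYSTEM_START,
                        first_pack + pack_header_length(data, first_pack))
    if sys_pos < 0:
        raise ValueError("no system header found")
    (sys_len,) = struct.unpack_from(">H", data, sys_pos + 4)
    system_header = data[sys_pos:sys_pos + 6 + sys_len]

    search_from = sys_pos + len(system_header)
    while True:
        seq = data.find(SEQUENCE_START, search_from)            # step 3
        if seq < 0:
            raise ValueError("no usable sequence header found")
        search_from = seq + 4
        pes = data.rfind(VIDEO_PES_START, 0, seq)               # step 4
        if pes < 0 or pes + 9 > len(data):
            continue
        (pes_len,) = struct.unpack_from(">H", data, pes + 4)
        pes_end = pes + 6 + pes_len
        payload_start = pes + 9 + data[pes + 8]
        if not payload_start <= seq < pes_end:
            continue  # sequence header is not inside this PES payload
        pack = data.rfind(PACK_START, 0, pes)                   # step 5
        if pack >= 0 and pack + pack_header_length(data, pack) == pes:
            break

    # Step 7: drop payload bytes before the sequence header, fix the length.
    removed = seq - payload_start
    trimmed_pes = (data[pes:pes + 4] + struct.pack(">H", pes_len - removed)
                   + data[pes + 6:payload_start] + data[seq:pes_end])
    # Step 6: pack header, saved system header, then the video PES packet.
    return data[pack:pes] + system_header + trimmed_pes, pes_end
```

The caller writes the returned bytes first and then copies the input unchanged from the returned offset, as in step 9.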

Note: due to some timing problems (clock synchronization related at this end), tapes recorded prior to July 1, 2000 will have an incorrect byte-offset in the EPG data. A simple workaround is to read from the start of the file and compare the time code in each MPEG-2 GOP header against the start time of the event until they match (both are relative to UTC).
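
The workaround can be sketched by scanning for GOP headers (00 00 01 B8) and decoding the 25-bit time code that follows each one. The sketch assumes the encoder writes the time of day into that field, as the note implies, and that no GOP header is split across two PES packets; the function names are illustrative.

```python
GOP_START = b"\x00\x00\x01\xb8"


def iter_gop_time_codes(data):
    """Yield (offset, hours, minutes, seconds, pictures) per GOP header."""
    pos = data.find(GOP_START)
    while pos >= 0 and pos + 8 <= len(data):
        bits = int.from_bytes(data[pos + 4:pos + 8], "big")
        # Layout after the start code: drop_frame(1) hours(5) minutes(6)
        # marker(1) seconds(6) pictures(6), then closed_gop and broken_link.
        yield (pos,
               (bits >> 26) & 0x1F,
               (bits >> 20) & 0x3F,
               (bits >> 13) & 0x3F,
               (bits >> 7) & 0x3F)
        pos = data.find(GOP_START, pos + 4)


def find_gop_at(data, hours, minutes, seconds):
    """Offset of the first GOP header with this time code, or None."""
    for pos, h, m, s, _pictures in iter_gop_time_codes(data):
        if (h, m, s) == (hours, minutes, seconds):
            return pos
    return None
```

The offset returned by find_gop_at can then stand in for the incorrect byte-offset before the output is aligned.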

Samples

Sample raw event file
Sample EPG data in HTML format (01/08/2000)
Sample EPG data in HTML format (01/09/2000)
Sample EPG data in HTML format (01/10/2000)
Sample EPG data in HTML format (01/11/2000)