Streaming Protocol in TCP 2.0

Version V1.0.16

Version Date Change Log
V1.0.00 2008/04/02
  • Document created
V1.0.01 2008/06/19
  • Change the Rsvd[2] field in the VIDEO_INFO data structure to match the SDK implementation.
  • Remove AUTHEN_REQ_FRAME and AUTHEN_REPLY_FRAME structures
V1.0.02 2008/07/14
  • Add QVGA and QQVGA for PAL in Video Resolution index mapping table
  • Update the StreamID usage descriptions
V1.0.03 2008/08/28
  • Add Video Loss and motion descriptions for QUAD video servers
  • Add more Time Zones
V1.0.04 2008/09/02
  • Change QCIF resolution to 160x112
V1.0.05 2008/09/05
  • Add more Time Zones
V1.0.06 2008/12/19
  • Change the Video Loss and Motion configurations for ACD2000Q
  • Update the stream ID in the TCP header for ACE2000Q and Platform
V1.0.07 2008/12/29
  • Modify motion and video loss bit definition of B2  in ACD-2000Q
V1.0.08 2009/2/3
  • Modify motion status flag in B2 header for PlatformA general encoder
V1.0.09 2009/09/01
  • The last byte, reserved byte, in VIDEO_INFO was defined for CSDK. For firmware, this reserved byte is fixed to 0.
V1.0.10 2009/10/07
  • Update the motion in B2 for PIR motion sensor.
V1.0.11 2010/03/02
  • Add the error return code in the authentication result for streaming ID error.
V1.0.12 2010/06/07
V1.0.13 2011/3/14
V1.0.14 2011/5/23
V1.0.15 2011/7/12
  • Correct the audio frame size in the audio frame section
V1.0.16 2011/11/01
  • Add note for 1920x1072 resolution in the video B2 index

Description

The streaming protocol defines the message handshaking for a decoder to receive and decode video/audio data from encoders such as cameras or video servers. This section defines the message formats for authentication, video/audio frames, and control messages. For connection handling, refer to Connection Management.

Topology

Message Format

There are three types of message in the video/audio sessions: Authentication frames, Video/Audio frames, and Streaming control message frames.

Authentication Frames

// definitions of the encryption method for Encryption in AUTHEN_REQ
#define ENCRYPTION_NONE 0 // no encryption algorithm is applied to response[64]. response[64] carries the password in plain text for authentication.
#define ENCRYPTION_BASE64 1 // the base64 coding algorithm is applied to response[64].
response[64] = base64(password)
Please refer to the important note below.
#define ENCRYPTION_MD5 2 // the MD5 coding algorithm is applied to response[64].
response[64] = md5(password)
Note: Because base64(password) is longer than 64 bytes when the password is longer than 53 bytes, authentication then fails every time. ENCRYPTION_BASE64 is therefore not a proper design choice here, and designers should not rely on it for security. Use ENCRYPTION_MD5 instead.
typedef struct {
  char name[32];   // user name for authentication. The last character must be '\0', so the maximum length is 31 characters.
  // DI / Motion triggered to video and audio streaming
  tStreamByEvent event;   // Streaming method configuration. Please refer to Event Triggered Streaming on Protocol TCP 2.0
  char rsvd[20];   // reserved. not used.
  unsigned short StreamID;   // Video stream ID, starting from 0.
For single-stream encoders, this value is fixed to 0.
For dual-stream encoders and QUAD video servers, this value gives the stream/channel number, which is StreamID+1. For example, set StreamID=1 to get stream 2 or channel 2 video.
  unsigned short Encryption;   // Encryption method for response[64]. The supported encryption methods are listed above.
  char response[64];   // Encryption code. See the encryption method definitions above.
} AUTHEN_REQ;
// definitions in authentication result in status in AUTHEN_REPLY
#define AUTHEN_SUCCESS 0    
#define AUTHEN_FAIL 1    
#define ERROR_STREAM_ID 2 // PlatformT:4.07.11 and later
typedef struct {
  char status;   // Authentication Result.
  char rsvd;   // reserved. not used.
  unsigned short StreamID;   // Same definition as StreamID in AUTHEN_REQ
  int SockID;   // The socket file descriptor of this TCP connection, used for exchanging control messages between decoder and encoder.
In encoder, SockID = accept (socket, *address,*address_len);
  char CameraName[32];   // Encoder's CameraName. The last character must be '\0', so the maximum length is 31 characters.
  char rsvd1[88];   // reserved. not used.
} AUTHEN_REPLY;
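As an illustrative sketch (not from the spec), the following C fragment fills an AUTHEN_REQ for plain-text authentication with ENCRYPTION_NONE. The layout of tStreamByEvent comes from the Event Triggered Streaming document; the 8-byte placeholder and the helper name build_authen_req are assumptions here.

```c
#include <string.h>

/* Placeholder: the real tStreamByEvent layout comes from the
 * Event Triggered Streaming spec; its size here is an assumption. */
typedef struct { char rsvd[8]; } tStreamByEvent;

#define ENCRYPTION_NONE   0
#define ENCRYPTION_BASE64 1
#define ENCRYPTION_MD5    2

typedef struct {
    char name[32];            /* user name, '\0'-terminated, max 31 chars */
    tStreamByEvent event;
    char rsvd[20];
    unsigned short StreamID;  /* 0 for single-stream encoders */
    unsigned short Encryption;
    char response[64];
} AUTHEN_REQ;

/* Fill an AUTHEN_REQ for plain-text authentication (ENCRYPTION_NONE).
 * Returns 0 on success, -1 if name or password exceed the field limits. */
int build_authen_req(AUTHEN_REQ *req, const char *name,
                     const char *password, unsigned short stream_id)
{
    if (strlen(name) > 31 || strlen(password) > 63)
        return -1;
    memset(req, 0, sizeof(*req));    /* zeroes the reserved fields too */
    strcpy(req->name, name);
    req->StreamID = stream_id;
    req->Encryption = ENCRYPTION_NONE;
    strcpy(req->response, password); /* plain text when no encryption */
    return 0;
}
```

With ENCRYPTION_MD5, response[64] would carry md5(password) instead; an MD5 routine is outside this sketch.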
Flow of authentication

Video Frames
// definitions in media type in MediaType in B2_HEADER
// ##### VIDEO TYPE ######
#define B2_VIDEO_MPEG4 0x01 // Video data encoded by MPEG4
#define B2_VIDEO_MJPEG 0x04 // Video data encoded by MJPEG
#define B2_VIDEO_H264 0x05 // Video data encoded by H.264
// ##### AUDIO TYPE #####
#define B2_AUDIO_8KPCM 0x02 // Audio data encoded by 8KHz, 16bit in width, PCM
#define B2_AUDIO_8KPCM_TS 0x03 // Audio data encoded by 8KHz, 16bit in width, PCM with timestamp
#define B2_AUDIO_G711A 0x06 // G.711 a-law compressed audio data
#define B2_AUDIO_G711U 0x07 // G.711 u-law compressed audio data
typedef struct {
  char key[4];   // 4-character key, fixed to key[0]=0x00, key[1]=0x00, key[2]=0x01, key[3]=0xb2
  unsigned char MediaType;   // see the definitions above.
  unsigned char StreamID;   // Same definition as StreamID in AUTHEN_REQ
  unsigned char ExtB2Len;   // By default, the B2 frame has a fixed length (basic B2 frame). For the video intelligence feature, the B2 frame becomes variable length (advanced B2 frame). This field tells whether more data is appended to the basic B2 frame; 0x00 means a basic B2 frame.
  unsigned char rsvd;   // not used. It must be fixed to 0x00 for backward compatibility.
  unsigned int len;   // in bytes, the length of the video/audio frame, not including this header
} B2_HEADER;
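A decoder can frame the stream by checking the 4-byte start code and then reading len. A minimal validation sketch; the helper names are illustrative:

```c
#include <stddef.h>

typedef struct {
    char key[4];              /* 0x00 0x00 0x01 0xb2 */
    unsigned char MediaType;
    unsigned char StreamID;
    unsigned char ExtB2Len;
    unsigned char rsvd;
    unsigned int len;         /* payload length, header excluded */
} B2_HEADER;

/* Return 1 if the 4-byte start code matches 00 00 01 b2, else 0. */
int b2_key_valid(const B2_HEADER *h)
{
    return (unsigned char)h->key[0] == 0x00 &&
           (unsigned char)h->key[1] == 0x00 &&
           (unsigned char)h->key[2] == 0x01 &&
           (unsigned char)h->key[3] == 0xb2;
}

/* Total bytes to read for a whole frame: header plus payload. */
size_t b2_frame_total(const B2_HEADER *h)
{
    return sizeof(B2_HEADER) + h->len;
}
```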
 
typedef struct {
  time_t TSInSec;   // The timestamp in second. In encoder, TSInSec = time(NULL);
  unsigned char TimeZone;   // the index of the time zone. Please refer to the time zone index mapping table.
  unsigned char VLoss;   // video loss indication. Please refer to Definition of VLoss in VIDEO_INFO.
  unsigned char motion;   // triggers of motion detection. Please refer to Definition of motion in VIDEO_INFO.
  unsigned char DIs;   // signal level of the DIs. In general, there are two DIs in the encoder; bits 0/1 represent the DI 1/2 signal levels. For example, DIs=0x02 means DI1 is low and DI2 is high.
  unsigned int FrameCount;   // frame counter. It starts from 1 and increases by 1 for every video/audio frame. This counter is not increased in video loss B2 frames.
  unsigned char resolution;   // the index of the video resolution. Please refer to the video resolution index mapping table.
  unsigned char bitrate;   // the index of the video bitrate. Please refer to the video bitrate index mapping table.
  unsigned char FpsMode;   // 0: Constant FPS mode (MODE 1), 1: Variable FPS mode (MODE 2)
  unsigned char FpsNum;   // the FPS number. This value depends on FpsMode. If FpsMode=0, it gives the constant FPS number. If FpsMode=1, it gives the FPS number of this streaming session.
  struct timeval timestamp;   // The timestamp in 10 msec units. In encoder, gettimeofday(&timestamp, NULL);
  unsigned short MDActives[3];   // the number of active micro-blocks in each motion region. MDActives[0] gives the number of active micro-blocks in motion region 1. A micro-block is 16x16 pixels in the video image. If the motion was not triggered, the corresponding MDActives value must be zero. In PlatformW encoders and QUAD video servers, these fields are fixed to 0 because the feature is not supported.
  unsigned char FixTimeZone;   // Reserved for SDK only. The firmware sets this byte to 0x00.
  unsigned char isPre:1;   // This field is for CSDK: the frame needs to be decoded but not rendered.
For devices, this field is reserved and fixed to 0.
  unsigned char PreCounts:7;   // This field is for CSDK. Valid if Pre-Frame is enabled (1), max: 0x7F = 127.
For devices, this field is not used and fixed to 0.
} VIDEO_INFO;

Definition of VLoss in VIDEO_INFO

Except for QUAD video servers, bit 0 in VLoss represents the video locked state: "1" means the video source is locked; "0" means video loss was detected.

In the SED2300Q QUAD video server, there are 4 video channels. Bits 0/1/2/3 in VLoss represent the video locked state of channels 1/2/3/4. When the corresponding bit is set, video loss was found in that channel. For example, VLoss=0x01 means channel 1 has video loss while channels 2/3/4 have video.

In the ACD2000Q QUAD video server, bit 0 in VLoss indicates the video locked state in 4CH mode. In QUAD mode, bits 0/1/2/3 represent the video locked state of channels 1/2/3/4. When the corresponding bit is set, the video is locked in that channel.

Here is a summary of VLoss on encoders:

Encoder                   Bits used in VLoss                               Bit value when video is locked
General Encoder           bit 0                                            1
SED2300Q                  bits 0,1,2,3 map ch1,ch2,ch3,ch4                 0
ACD2000Q Single Mode      bit 0 maps video loss in every channel B2 frame  1
ACD2000Q QUAD Mode        bits 0,1,2,3 map ch1,ch2,ch3,ch4                 1
ACD2000Q Sequential Mode  bits 0,1,2,3 map ch1,ch2,ch3,ch4                 1

Note on ACD2000Q Single Mode: because there is no channel ID in the video B2 frame, the client cannot tell which channel a video loss event came from. The client needs to link its network socket to the channel stream to monitor events in video B2 frames.
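The per-model polarity in the summary can be folded into one helper. This is an illustrative sketch; the function name and the is_sed2300q flag are not part of the protocol:

```c
/* Returns 1 if the given channel's video is locked according to VLoss.
 * General encoders and ACD2000Q: a set bit (1) means locked.
 * SED2300Q: a set bit means video loss, so locked is the cleared bit. */
int video_locked(unsigned char vloss, int channel /* 1..4 */, int is_sed2300q)
{
    unsigned char bit = (vloss >> (channel - 1)) & 1;
    return is_sed2300q ? (bit == 0) : (bit == 1);
}
```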

Definition of motion in VIDEO_INFO

Except for QUAD video servers, there are three motion regions in the video stream. Bits 0/1/2 in motion represent motion regions 1/2/3. When the corresponding bit is set, motion occurred in that region. For example, motion=0x01 means motion region 1 has motion while motion regions 2 and 3 do not.

In the SED2300Q QUAD video server, there is only one motion region per channel. Bits 0/1/2/3 in motion represent motion in channels 1/2/3/4. When the corresponding bit is set, motion occurred in that channel. For example, motion=0x01 means channel 1 has motion while channels 2/3/4 do not.

In the ACD2000Q QUAD video server, one motion region is available in each channel in Single mode, and 4 motion regions are available in QUAD mode (one per channel). In both Single and QUAD modes, bits 0/1/2/3 in motion represent motion in channels 1/2/3/4. When the corresponding bit is set, motion occurred in that region. For example, motion=0x01 in QUAD mode means channel 1 has motion while channels 2/3/4 do not.

If the camera has a PIR motion sensor, the MSB is used: 1 means the PIR motion sensor detected motion, otherwise it is 0. Currently, only some cameras have a built-in PIR motion sensor.

Here is a summary of motion on encoders:

Encoder                    Number of Motion Regions  Bits used in motion trigger                    Bit value when motion occurs
General Encoder            3                         bits 0/1/2 map to regions 1/2/3;               1
                                                     bit 7 maps to the PIR sensor state
SED2300Q                   4                         bits 0/1/2/3 map to ch1/ch2/ch3/ch4            1
ACD-2000Q Single Mode      1 in each channel         bit 0 maps to motion region 1 in each channel  1
ACD-2000Q QUAD Mode        4                         bits 0/1/2/3 map to ch1/ch2/ch3/ch4            1
ACD-2000Q Sequential Mode  0                         None                                           0
TCD-2000Q Single Mode      1 in each channel         bit 0 maps to motion region 1 in each channel  1
  (Not Available)
TCD-2000Q QUAD Mode        4                         bits 0/1/2/3 map to ch1/ch2/ch3/ch4            1
TCD-2000Q Sequential Mode  0                         None                                           0

Note on Single Mode (ACD-2000Q and TCD-2000Q): because there is no channel ID in the video B2 frame, the client cannot tell which channel a motion event came from. The client needs to link its network socket to the channel stream to monitor events in video B2 frames.
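For a general encoder, the motion byte can be unpacked as below. The helper names are illustrative only:

```c
/* General encoder: motion regions 1..3 map to bits 0..2. */
int region_has_motion(unsigned char motion, int region /* 1..3 */)
{
    return (motion >> (region - 1)) & 1;
}

/* PIR motion sensor state (cameras with a built-in PIR sensor): bit 7. */
int pir_triggered(unsigned char motion)
{
    return (motion >> 7) & 1;
}
```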

Important Note:

  • The motion, DIs, and MDActives[3] fields are updated on a per-frame basis. They are NOT limited by the event timer and motion trigger timer used in the control sessions.
  • In PlatformT and PlatformK, motion detection is only available in stream 1. However, the two streams use the same video source, so to provide motion events in stream 2, the firmware copies the motion detection results from the stream 1 video B2 into the stream 2 video B2.
  • To simplify NVR software integration, the motion region bitmap in TCD-2000Q follows general encoder definitions.

Time Zone index mapping table for TimeZone in VIDEO_INFO

Time Zone Index Time Zone Index
-12:00 0x00 +01:00 0x0D
-11:00 0x01 +02:00 0x0E
-10:00 0x02 +03:00 0x0F
-09:00 0x03 +04:00 0x10
-08:00 0x04 +05:00 0x11
-07:00 0x05 +06:00 0x12
-06:00 0x06 +07:00 0x13
-05:00 0x07 +08:00 0x14
-04:00 0x08 +09:00 0x15
-03:00 0x09 +10:00 0x16
-02:00 0x0A +11:00 0x17
-01:00 0x0B +12:00 0x18
+00:00 0x0C +13:00 0x19
-09:30 0x20 -04:30 0x21
-03:30 0x22 +03:30 0x23
+04:30 0x24 +05:30 0x25
+05:45 0x26 +06:30 0x27
+09:30 0x28 +11:30 0x29
+12:45 0x2A    

Video Resolution index mapping table for resolution in VIDEO_INFO

The mapping between video resolution and the 8-bit index is listed below.
bit 7     TV standard. 0 for NTSC, 1 for PAL.
bit 6     0: the D1, CIF and QCIF video resolutions for NTSC and PAL.
          1: the VGA and mega-pixel video resolutions.
bits 5~0  Resolution index
Video Resolution Index Video Resolution Index
N160x120 (QQVGA) 0x47 P160x112 (QQVGA) 0xC7
N160x112 (QCIF) 0x02 P176x144 (QCIF) 0x05
N176x120 (QCIF) 0x06 P320x240 (QVGA) 0xC6
N320x240 (QVGA) 0x46 P352x288 (CIF) 0x04
N352x240 (CIF) 0x01 P640x480 (VGA) 0xC0
N640x480 (VGA) 0x40 P720x576 (D1) 0x03
N720x480 (D1) 0x00    
N1280x720 (720P) 0x41    
N1280x960 0x42    
N1280x1024 0x43    
N1600x1200 0x44    
N1920x1080 0x45    
N2032x1920 0x48    
N1280x352 0x49    
N1920x1072* 0x4A    
*: In WEB UI and URL commands, the video resolution 1920x1080 is used.
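The bit layout can be decoded as follows. Judging from the table, the bit-7 PAL rule holds for the VGA/mega-pixel family (bit 6 = 1), while the legacy D1/CIF/QCIF indices 0x00 through 0x06 simply enumerate both TV standards; treat this sketch and its names as illustrative:

```c
typedef struct {
    int is_pal;            /* bit 7: 0 = NTSC, 1 = PAL */
    int is_vga_family;     /* bit 6: 1 = VGA/mega-pixel resolutions */
    unsigned char index;   /* bits 5..0: resolution index */
} RES_INDEX;

RES_INDEX decode_resolution(unsigned char res)
{
    RES_INDEX r;
    r.is_pal = (res >> 7) & 1;
    r.is_vga_family = (res >> 6) & 1;
    r.index = res & 0x3F;
    return r;
}
```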

Video Bitrate index mapping table for bitrate in VIDEO_INFO

Video Bitrate Index Video Bitrate Index
28 Kbps 0x00 2 Mbps 0x0A
56 Kbps 0x01 2.5 Mbps 0x0B
128 Kbps 0x02 3 Mbps 0x0C
256 Kbps 0x03 3.5 Mbps 0x0D
384 Kbps 0x04 4 Mbps 0x0E
500 Kbps 0x05 4.5 Mbps 0x0F
750 Kbps 0x06 5 Mbps 0x10
1 Mbps 0x07 5.5 Mbps 0x11
1.2 Mbps 0x08 6 Mbps 0x12
1.5 Mbps 0x09 ---- ----

Video Frame Architecture

// ##### video B2 frame #####
typedef struct {
  B2_HEADER header;   // header.len = sizeof(VIDEO_INFO) + Length of Video Raw DATA;
  VIDEO_INFO info;  
} VIDEO_B2;
// ##### extended video B2 #####
typedef struct {
  unsigned char ExtB2[1];   // valid only if ExtB2Len in B2_HEADER is nonzero. The extended B2 is appended to the VIDEO_B2, and its length is included in len in B2_HEADER, so it is treated as part of the video B2 frame.
} VIDEO_EXT_B2;

In the TCP 2.0 protocol, the video B2 frame is prepended to every encoded video frame.
In the RTP protocol, the video B2 frame may or may not be appended at the end of the video frame.

Audio Frames

typedef struct {
  B2_HEADER head;   // refer to B2 Header in Video Frames
  struct timeval timestamp;   // The timestamp in 10 msec units. In encoder, gettimeofday(&timestamp, NULL);
  unsigned char rsvd[8];   // not used, fixed to 0x00s
} AUDIO_B2;

Audio Frame Architecture

In the TCP 2.0 protocol, the audio B2 frame is prepended to the audio data in every audio frame. In the RTP protocol, the audio B2 frame is NOT added to the audio RTP frames.

The audio frame is designed to avoid fragmentation at the TCP/UDP level; that is, the length of the audio frame including the networking headers (TCP/UDP/RTP) is kept shorter than the MTU. To achieve better AV synchronization, the audio raw data is fragmented into small packets for sending. The fragment size varies with platforms. For example, in PlatformW the audio frame size is 536 bytes in NTSC and 640 bytes in PAL; in PlatformA it is 1024 bytes. The audio frame size does not change during streaming. Therefore, to work across platforms, the length information in the audio B2 or RTP header should be used to extract the actual audio raw data.
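Assuming the audio head.len mirrors the video case (covering the timestamp and rsvd[8] plus the raw data), and assuming the timestamp occupies two 32-bit words on the wire, the raw audio length could be recovered as below. Both assumptions and the helper name are illustrative, not from the spec:

```c
/* Assumed fixed info bytes after the B2 header: an 8-byte on-wire
 * timestamp plus rsvd[8]. These sizes are assumptions for this sketch. */
#define AUDIO_B2_INFO_BYTES (8 + 8)

/* Raw audio bytes in the frame, given len from the B2 header. */
unsigned int audio_payload_len(unsigned int b2_len)
{
    return b2_len > AUDIO_B2_INFO_BYTES ? b2_len - AUDIO_B2_INFO_BYTES : 0;
}
```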

Streaming Control Message Frames

These streaming control messages are only valid when the encoder is on the TCP 2.0 protocol.

// definitions in Stream Control Message Type in the MsgType field in STREAM_HEADER
#define MSG_VARIABLE_FPS 0x20 // change the variable FPS number. Not available in PlatformT encoders
#define MSG_PAUSE_CTRL 0x21 // hold off or resume streaming the video/audio frames
#define MSG_MAX_STREAM_TIMEOUT 0x22 // the streaming time has reached the maximum streaming time; the decoder or encoder will disconnect immediately.
// definitions of the return status in the MsgType field in STREAM_HEADER. It indicates the result of executing the control message.
#define MSG_STATE_OK 0x80    
#define MSG_STATE_ERR    
typedef struct {
  unsigned char key[4]; // 4-character key, fixed to key[0]=0x00, key[1]=0x00, key[2]=0x01, key[3]=0xb2
  unsigned char MsgType; // see the definitions above.
  unsigned char StreamID; // Not used. Fixed to 0.
  unsigned char ExtB2Len; // Not used. Fixed to 0.
  unsigned char rsvd; // Not used. It must be fixed to 0x00 for backward compatibility.
  unsigned int len; // in bytes, the length of the streaming control message, not including this header
} STREAM_HEADER;
typedef struct {
  STREAM_HEADER head; // refer to stream header in streaming control message frames.
  unsigned char msg[1]; // Streaming control message. The content of this field depends on MsgType. In the encoder's reply, this field indicates which control message was executed.
} STREAM_MSG;
// definitions of the control message in msg in STREAM_MSG
// ##### For variable FPS control, head.MsgType = MSG_VARIABLE_FPS #####
// msg[1] carries the variable FPS number. This message is only valid when the encoder is in variable FPS mode.
// ##### For PAUSE control, head.MsgType = MSG_PAUSE_CTRL #####
#define STREAM_PAUSE_OFF 0 // resume streaming video/audio frames to the decoder
#define STREAM_PAUSE_ON 1 // hold off streaming video/audio frames to the decoder
// ##### For stream timeout, head.MsgType = MSG_MAX_STREAM_TIMEOUT #####
// msg[1] is fixed to 0.
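Putting the header and message byte together, a decoder might build a pause/resume request as in this sketch. The helper name is illustrative, and the sketch writes len in host byte order, which is an assumption about the wire format:

```c
#include <stddef.h>
#include <string.h>

#define MSG_PAUSE_CTRL   0x21
#define STREAM_PAUSE_OFF 0
#define STREAM_PAUSE_ON  1

typedef struct {
    unsigned char key[4];
    unsigned char MsgType;
    unsigned char StreamID;
    unsigned char ExtB2Len;
    unsigned char rsvd;
    unsigned int len;
} STREAM_HEADER;

/* Fill a one-byte pause/resume control message into buf.
 * Returns the total number of bytes to send. */
size_t build_pause_msg(unsigned char *buf, int pause)
{
    STREAM_HEADER h;
    memset(&h, 0, sizeof(h));
    h.key[0] = 0x00; h.key[1] = 0x00; h.key[2] = 0x01; h.key[3] = 0xb2;
    h.MsgType = MSG_PAUSE_CTRL;
    h.len = 1;                        /* one message byte follows */
    memcpy(buf, &h, sizeof(h));
    buf[sizeof(h)] = pause ? STREAM_PAUSE_ON : STREAM_PAUSE_OFF;
    return sizeof(h) + 1;
}
```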
Flow of streaming control message

For Variable FPS or PAUSE control :

How the decoder handles an error code in the encoder's reply depends on the decoder's implementation. It can ignore the error and send the control message again (this might be a good approach), or treat it as a video session error.

The maximum time the decoder waits for the encoder's reply after sending a control message is not defined here. Again, how to react when that time expires depends on the decoder's implementation (re-sending might be a good approach). Setting this maximum wait time equal to the maximum video/audio receiving time (refer to the next chapter for details) might be a good approach.

For max streaming timeout control :

This function is designed to limit the client streaming time to prevent overloading the encoder's capacity. The encoder sends this control message to inform the decoder that it is going to disconnect in 5 seconds; the decoder should reply to this message with MSG_STATE_OK. Even if the encoder does not receive the response in time, it will still disconnect the connection.

Connection Management

There are several states in the encoder and decoder to handle the video and audio data. The state transition helps to describe the behavior (protocol) of handling video and audio data in encoder and decoder.

State Machine in Encoder

Item Maximum time in second Description
Hunt I-Frame Timer 5 Maximum hunting the first I-Frame time
State Machine in Decoder

Item Maximum time in second Description
Connection Timer 10 TCP 3-Way Handshaking Timer
Authentication Timer 5 Authentication Timer
Video/Audio Receiving Timer 5 Timer for receiving Video/Audio frames
  1. "ACCOUNT/PASSWORD ERROR, WAN_IP" when the authentication fails or the authentication timer expires
  2. "CONNECT TO SERVER_IP FAIL" when it fails to receive or decode video/audio frames
  3. "VIDEO LOSS, WAN_IP" when no video frame is received within the Video/Audio Receiving Timer

See Also

Control Session Design Specification
Function Availability with Platform Firmware
Event Triggered Streaming on Protocol TCP 2.0
