Version | Date | Change Log |
---|---|---|
V1.0.00 | 2008/04/02 | |
V1.0.01 | 2008/06/19 | |
V1.0.02 | 2008/07/14 | |
V1.0.03 | 2008/08/28 | |
V1.0.04 | 2008/09/02 | |
V1.0.05 | 2008/09/05 | |
V1.0.06 | 2008/12/19 | |
V1.0.07 | 2008/12/29 | |
V1.0.08 | 2009/02/03 | |
V1.0.09 | 2009/09/01 | |
V1.0.10 | 2009/10/07 | |
V1.0.11 | 2010/03/02 | |
V1.0.12 | 2010/06/07 | |
V1.0.13 | 2011/03/14 | |
V1.0.14 | 2011/05/23 | |
V1.0.15 | 2011/07/12 | |
V1.0.16 | 2011/11/01 | |
The streaming protocol defines the message handshaking used by a decoder to receive and decode video/audio data from encoders such as cameras or video servers. This section defines the message formats for authentication, video/audio frame delivery, and control messages. Refer to the connection management section for session setup.
There are three types of message in a video/audio session: authentication frames, video/audio frames, and streaming control message frames.
Flow of authentication
```c
// definitions of the encryption method for Encryption in AUTHEN_REQ
#define ENCRYPTION_NONE   0 // no encryption algorithm is applied to response[64];
                            // response[64] carries the password in plain text
#define ENCRYPTION_BASE64 1 // the base64 coding algorithm is applied to response[64]:
                            //     response[64] = base64(password)
                            // please refer to the important note below
#define ENCRYPTION_MD5    2 // the MD5 coding algorithm is applied to response[64]:
                            //     response[64] = md5(password)
```

Note: Because base64(password) is longer than 64 bytes when the password is longer than 53 bytes, authentication fails every time in that case. ENCRYPTION_BASE64 is therefore not a proper design, and designers should not rely on this mechanism for security. The ENCRYPTION_MD5 method is the better choice.

```c
typedef struct {
    char name[32];             // user name for authentication; the last character must be
                               // '\0', so the maximum length is 31
    tStreamByEvent event;      // DI/Motion-triggered video and audio streaming configuration;
                               // refer to Event Triggered Streaming on Protocol TCP 2.0
    char rsvd[20];             // reserved, not used
    unsigned short StreamID;   // video stream ID, starting from 0.
                               // Single-stream encoders: fixed to 0.
                               // Dual-stream encoders and QUAD video servers: the stream or
                               // channel number is StreamID+1 (e.g. set StreamID=1 to get
                               // stream 2 or channel 2 video)
    unsigned short Encryption; // encryption method for response[64]; see the list above
    char response[64];         // encryption code; see the encryption method definitions
} AUTHEN_REQ;

// definitions of the authentication result for status in AUTHEN_REPLY
#define AUTHEN_SUCCESS  0
#define AUTHEN_FAIL     1
#define ERROR_STREAM_ID 2 // PlatformT 4.07.11 and later

typedef struct {
    char status;             // authentication result
    char rsvd;               // reserved, not used
    unsigned short StreamID; // same definition as StreamID in AUTHEN_REQ
    int SockID;              // socket file descriptor of this TCP connection, used to
                             // exchange control messages between decoder and encoder.
                             // In the encoder: SockID = accept(socket, *address, *address_len);
    char CameraName[32];     // encoder's CameraName; the last character must be '\0',
                             // so the maximum length is 31
    char rsvd1[88];          // reserved, not used
} AUTHEN_REPLY;
```

Video Frames
Audio Frames
```c
// definitions of the media type for MediaType in B2_HEADER
// ##### VIDEO TYPES #####
#define B2_VIDEO_MPEG4    0x01 // video data encoded in MPEG4
#define B2_VIDEO_MJPEG    0x04 // video data encoded in MJPEG
#define B2_VIDEO_H264     0x05 // video data encoded in H.264
// ##### AUDIO TYPES #####
#define B2_AUDIO_8KPCM    0x02 // audio data encoded as 8 KHz, 16-bit PCM
#define B2_AUDIO_8KPCM_TS 0x03 // audio data encoded as 8 KHz, 16-bit PCM with timestamp
#define B2_AUDIO_G711A    0x06 // G.711 a-law compressed audio data
#define B2_AUDIO_G711U    0x07 // G.711 u-law compressed audio data

typedef struct {
    char key[4];             // 4-character key, fixed to key[0]=0x00, key[1]=0x00,
                             // key[2]=0x01, key[3]=0xb2
    unsigned char MediaType; // see the definitions above
    unsigned char StreamID;  // same definition as StreamID in AUTHEN_REQ
    unsigned char ExtB2Len;  // by default the B2 frame has a fixed length (basic B2 frame),
                             // but for video intelligence features it becomes variable
                             // length (advanced B2 frame). This field tells whether more
                             // data is appended to the basic B2 frame; 0x00 means a basic
                             // B2 frame
    unsigned char rsvd;      // not used; must be fixed to 0x00 for backward compatibility
    unsigned int len;        // length in bytes of the video/audio frame, excluding this header
} B2_HEADER;

typedef struct {
    time_t TSInSec;           // timestamp in seconds; in the encoder, TSInSec = time(NULL);
    unsigned char TimeZone;   // time zone index; refer to the time zone index mapping table
    unsigned char VLoss;      // video loss indication; refer to Definition of VLoss in VIDEO_INFO
    unsigned char motion;     // motion detection triggers; refer to Definition of motion in VIDEO_INFO
    unsigned char DIs;        // signal levels of the DIs. In general an encoder has two DIs;
                              // bits 0/1 carry the DI 1/2 signal levels. For example,
                              // DIs=0x02 means DI1 is low and DI2 is high
    unsigned int FrameCount;  // frame counter; starts from 1 and increases by 1 for every
                              // video/audio frame. Not increased for video loss B2 frames
    unsigned char resolution; // video resolution index; refer to the video resolution index
                              // mapping table
    unsigned char bitrate;    // video bitrate index; refer to the video bitrate index
                              // mapping table
    unsigned char FpsMode;    // 0: constant FPS mode (MODE 1); 1: variable FPS mode (MODE 2)
    unsigned char FpsNum;     // FPS number. If FpsMode=0, the constant FPS number;
                              // if FpsMode=1, the FPS number of this streaming session
    struct timeval timestamp; // timestamp in 10 msec units; in the encoder,
                              // gettimeofday(&timestamp, NULL);
    unsigned short MDActives[3]; // number of active micro-blocks in each motion region;
                              // MDActives[0] is for motion region 1, and a micro-block is
                              // 16x16 pixels of the video image. If the motion was not
                              // triggered, the corresponding MDActives must be zero.
                              // Fixed to 0 on PlatformW encoders and QUAD video servers,
                              // which do not support this feature
    unsigned char FixTimeZone; // reserved for the SDK only; the firmware sets this byte to 0x00
    unsigned char isPre:1;     // for CSDK: the frame needs to be decoded but not rendered.
                               // For devices, reserved and fixed to 0
    unsigned char PreCounts:7; // for CSDK: valid if Pre-Frame is enabled (1); max 0x7F = 127.
                               // For devices, not used and fixed to 0
} VIDEO_INFO;
```

Definition of VLoss in VIDEO_INFO
Except on the QUAD video servers, bit 0 of VLoss represents the video lock state: "1" means the video source is locked, and "0" means video loss was detected.
In the SED2300Q QUAD video server there are 4 video channels, and bits 0/1/2/3 of VLoss represent the video lock state of channels 1/2/3/4. When the corresponding bit is set, video loss was detected on that channel. For example, VLoss=0x01 means channel 1 lost video while channels 2/3/4 have video.
In the ACD2000Q QUAD video server, bit 0 of VLoss indicates the video lock state in 4CH mode. In QUAD mode, bits 0/1/2/3 represent the video lock state of channels 1/2/3/4. When the corresponding bit is set, the video is locked on that channel.
Here is the summary of VLoss on encoders:

Encoder | Bits used in VLoss | Bit value when video is locked
---|---|---
General encoder | bit 0 | 1
SED2300Q | bits 0,1,2,3 | 0
ACD2000Q Single Mode | bit 0 maps video loss in every channel B2 frame (*) | 1
ACD2000Q QUAD Mode | bits 0,1,2,3 map ch1,ch2,ch3,ch4 | 1
ACD2000Q Sequential Mode | bits 0,1,2,3 map ch1,ch2,ch3,ch4 | 1

(*) Because there is no channel ID in the video B2 frame, the client cannot tell which channel raised the video loss event. The client needs to link its network socket to the channel stream to monitor events in video B2 frames.

Definition of motion in VIDEO_INFO
Except on the QUAD video servers, there are three motion regions in the video stream. Bits 0/1/2 of motion represent motion regions 1/2/3. When the corresponding bit is set, motion occurred in that region. For example, motion=0x01 means motion region 1 detected motion while regions 2 and 3 did not.
In the SED2300Q QUAD video server there is only one motion region per channel. Bits 0/1/2/3 of motion represent motion in channels 1/2/3/4. When the corresponding bit is set, motion occurred in that channel. For example, motion=0x01 means channel 1 detected motion while channels 2/3/4 did not.
In the ACD2000Q QUAD video server, one motion region is available per channel in Single mode and 4 motion regions are available in QUAD mode, one per channel. In both Single and QUAD modes, bits 0/1/2/3 of motion represent motion in channels 1/2/3/4. When the corresponding bit is set, motion occurred in that region. For example, motion=0x01 in QUAD mode means channel 1 detected motion while channels 2/3/4 did not.
If the camera has a PIR motion sensor, the MSB is used: 1 means the PIR motion sensor detected motion, otherwise it is 0. Currently only some cameras have a built-in PIR motion sensor.
Here is the summary of the motion on encoders
Important Note:
- The motion, DIs, and MDActives[3] fields are updated on a per-frame basis. They are NOT limited by the event timer and motion trigger timer used in the control sessions.
- In PlatformT and PlatformK, motion detection is only available on stream 1. However, the two streams use the same video source, so to provide motion events on stream 2 the firmware copies the motion detection results from the stream 1 video B2 into the stream 2 video B2.
- To simplify NVR software integration, the motion region bitmap in the TCD-2000Q follows the general encoder definitions.
Time Zone index mapping table for TimeZone in VIDEO_INFO

Time Zone | Index | Time Zone | Index
---|---|---|---
-12:00 | 0x00 | +01:00 | 0x0D
-11:00 | 0x01 | +02:00 | 0x0E
-10:00 | 0x02 | +03:00 | 0x0F
-09:00 | 0x03 | +04:00 | 0x10
-08:00 | 0x04 | +05:00 | 0x11
-07:00 | 0x05 | +06:00 | 0x12
-06:00 | 0x06 | +07:00 | 0x13
-05:00 | 0x07 | +08:00 | 0x14
-04:00 | 0x08 | +09:00 | 0x15
-03:00 | 0x09 | +10:00 | 0x16
-02:00 | 0x0A | +11:00 | 0x17
-01:00 | 0x0B | +12:00 | 0x18
+00:00 | 0x0C | +13:00 | 0x19
-09:30 | 0x20 | -04:30 | 0x21
-03:30 | 0x22 | +03:30 | 0x23
+04:30 | 0x24 | +05:30 | 0x25
+05:45 | 0x26 | +06:30 | 0x27
+09:30 | 0x28 | +11:30 | 0x29
+12:45 | 0x2A | |

Video Resolution index mapping table for resolution in VIDEO_INFO
The mapping between video resolution and the 8-bit index is listed below.

Bits | Meaning
---|---
bit 7 | TV standard: 0 for NTSC, 1 for PAL
bit 6 | 0: D1, CIF, and QCIF resolutions for NTSC and PAL; 1: VGA and Mega-Pixel resolutions
bits 5~0 | resolution index

Video Resolution | Index | Video Resolution | Index
---|---|---|---
N160x120 (QQVGA) | 0x47 | P160x112 (QQVGA) | 0xC7
N160x112 (QCIF) | 0x02 | P176x144 (QCIF) | 0x05
N176x120 (QCIF) | 0x06 | P320x240 (QVGA) | 0xC6
N320x240 (QVGA) | 0x46 | P352x288 (CIF) | 0x04
N352x240 (CIF) | 0x01 | P640x480 (VGA) | 0xC0
N640x480 (VGA) | 0x40 | P720x576 (D1) | 0x03
N720x480 (D1) | 0x00 | N1280x720 (720P) | 0x41
N1280x960 | 0x42 | N1280x1024 | 0x43
N1600x1200 | 0x44 | N1920x1080 | 0x45
N2032x1920 | 0x48 | N1280x352 | 0x49
N1920x1072* | 0x4A | |

*: In the WEB UI and URL commands, the video resolution 1920x1080 is used.

Video Bitrate index mapping table for bitrate in VIDEO_INFO
Video Bitrate | Index | Video Bitrate | Index
---|---|---|---
28 Kbps | 0x00 | 2 Mbps | 0x0A
56 Kbps | 0x01 | 2.5 Mbps | 0x0B
128 Kbps | 0x02 | 3 Mbps | 0x0C
256 Kbps | 0x03 | 3.5 Mbps | 0x0D
384 Kbps | 0x04 | 4 Mbps | 0x0E
500 Kbps | 0x05 | 4.5 Mbps | 0x0F
750 Kbps | 0x06 | 5 Mbps | 0x10
1 Mbps | 0x07 | 5.5 Mbps | 0x11
1.2 Mbps | 0x08 | 6 Mbps | 0x12
1.5 Mbps | 0x09 | ---- | ----

Video Frame Architecture
```c
// ##### video B2 frame #####
typedef struct {
    B2_HEADER header;  // header.len = sizeof(VIDEO_INFO) + length of the raw video data
    VIDEO_INFO info;
} VIDEO_B2;

// ##### extended video B2 #####
typedef struct {
    unsigned char ExtB2[1]; // valid only if ExtB2Len in B2_HEADER is non-zero; then this
                            // extended B2 is appended to the VIDEO_B2 and its length is
                            // included in len in B2_HEADER, so the extended B2 is treated
                            // as part of the video B2 frame
} VIDEO_EXT_B2;
```

In the TCP 2.0 protocol, the video B2 frame is prepended to every encoded video frame. In the RTP protocol, the video B2 frame may or may not be appended to the end of the video frame.
```c
typedef struct {
    B2_HEADER head;           // refer to the B2 header in Video Frames
    struct timeval timestamp; // timestamp in 10 msec units; in the encoder,
                              // gettimeofday(&timestamp, NULL);
    unsigned char rsvd[8];    // not used, fixed to 0x00
} AUDIO_B2;
```

Audio Frame Architecture

In the TCP 2.0 protocol, the audio B2 frame is prepended to the audio data of every audio frame. In the RTP protocol, the audio B2 frame is NOT added to the audio RTP frames.
The audio frame is designed to avoid fragmentation at the TCP/UDP level; that is, the length of the audio frame including the networking headers (TCP/UDP/RTP) is kept shorter than the MTU. To achieve better A/V synchronization, the raw audio data is split into small packets before sending. The fragment size varies with the platform: for example, on PlatformW the audio frame size is 536 bytes for NTSC and 640 bytes for PAL, while on PlatformA it is 1024 bytes. The audio frame size does not change during streaming. Therefore, to work across platforms, the length information in the audio B2 or RTP header should be used to recover the actual raw audio data.
These streaming control messages are only valid when the encoder uses the TCP 2.0 protocol.
Flow of streaming control message
```c
typedef struct {
    STREAM_HEADER head;   // refer to the stream header in streaming control message frames
    unsigned char msg[1]; // streaming control message; its content depends on head.MsgType.
                          // In the encoder, this field indicates which control message was
                          // executed
} STREAM_MSG;

// definitions of the control message in msg in STREAM_MSG

// ##### variable FPS number control, head.MsgType = MSG_VARIABLE_FPS #####
// msg[1] carries the variable FPS number; this message is only valid when
// the encoder is in variable FPS mode

// ##### PAUSE control, head.MsgType = MSG_PAUSE_CTRL #####
#define STREAM_PAUSE_OFF 0 // resume streaming video/audio frames to the decoder
#define STREAM_PAUSE_ON  1 // hold off streaming video/audio frames to the decoder

// ##### stream timeout, head.MsgType = MSG_MAX_STREAM_TIMEOUT #####
// msg[1] is fixed to 0
```

For Variable FPS or PAUSE control:
How the decoder reacts to an encoder reply carrying an error code is implementation-defined: it can ignore the error and send the control message again (often a good approach), or treat it as a video session error.
The maximum time a decoder should wait for the encoder's reply after sending a control message is not defined here. Again, how to react when that time expires is implementation-defined (resending might be a good approach). A reasonable choice is to set the wait time equal to the maximum video/audio receiving time (refer to the next chapter for details).
For max streaming timeout control :
This function is designed to limit a client's streaming time to prevent overloading the encoder's capacity. The encoder sends this control message to inform the decoder that it is going to disconnect in 5 seconds, and the decoder should reply with MSG_STATE_OK. Even if the encoder does not get the response before the deadline, it still disconnects the connection.
There are several states in the encoder and decoder for handling video and audio data. The state transitions describe the behavior (protocol) of handling video and audio data on both sides.
State Machine in Encoder
- Transition State Diagram
- There is a 5-second timer for hunting the first I-frame after authentication succeeds. This timeout should never occur; if it does, the camera hardware is most likely out of order.
- The encoder will NOT close the video/audio session when the client's control data/AudioOut sessions are closed.
- Timing Parameters

Item | Maximum time in seconds | Description
---|---|---
Hunt I-Frame Timer | 5 | Maximum time to hunt the first I-frame

State Machine in Decoder
- Transition State Diagram
- Control Session Error means anything went wrong while handling control data in the control session, for example a control socket error.
- The network socket is closed and a new socket connection is issued when the state moves to the Start state from another state.
- In the ECM and decoder implementations, the video session is not closed when a control session error occurs; in the NVR implementation, it is.
- The Timeout event depends on the timing parameters of each state. The maximum values of these timing parameters are listed below.
- Timing Parameters
Item | Maximum time in seconds | Description
---|---|---
Connection Timer | 10 | TCP 3-way handshaking timer
Authentication Timer | 5 | Authentication timer
Video/Audio Receiving Timer | 5 | Timer for receiving video/audio frames
- Error Message in OSD for Decoder
- "ACCOUNT/PASSWORD ERROR, WAN_IP" when authentication fails or the authentication timer expires
- "CONNECT TO SERVER_IP FAIL" when receiving or decoding video/audio frames fails
- "VIDEO LOSS, WAN_IP" when no video frame is received within the Video/Audio Receiving Timer period
Control Session Design Specification
Function Availability with Platform Firmware
Event Triggered Streaming on Protocol TCP 2.0