US007715475B1

(12) United States Patent — Puri et al.
(10) Patent No.: US 7,715,475 B1
(45) Date of Patent: *May 11, 2010

(54) CONTENT ADAPTIVE VIDEO ENCODER

(75) Inventors: Atul Puri, Riverdale, NY (US); Mehmet Reha Civanlar, Middletown, NJ (US)

(73) Assignee: AT&T Intellectual Property II, L.P., New York, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1599 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/970,607

(22) Filed: Oct. 21, 2004

Related U.S. Application Data

(63) Continuation of application No. 09/874,873, filed on Jun. 5, 2001.

(51) Int. Cl.
H04N 7/12 (2006.01)
G06K 9/46 (2006.01)

(52) U.S. Cl. .......... 375/240.01; 382/190

(58) Field of Classification Search .......... 375/240.01; 382/190; G06F 17/30
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS

5,448,307 A 9/1995 Gelissen et al.
5,513,282 A 4/1996 Williams
5,579,052 A 11/1996 Artieri
5,748,789 A 5/1998 Lee et al.
5,946,043 A * 8/1999 Lee et al. .......... 375/240.24
5,949,484 A * 9/1999 Nakaya et al. .......... 348/384.1
6,094,457 A 7/2000 Linzer et al.
6,161,137 A * 12/2000 Ogdon et al. .......... 709/224
6,175,596 B1 1/2001 Kobayashi et al.
6,182,069 B1 * 1/2001 Niblack et al. .......... 707/6
6,256,423 B1 7/2001 Krishnamurthy et al.
6,263,022 B1 7/2001 Chen et al.
6,266,442 B1 * 7/2001 Laumeyer et al. .......... 382/190
6,266,443 B1 7/2001 Vetro et al.
6,285,797 B1 9/2001 Lubin et al.
6,388,688 B1 5/2002 Schileru-Key
6,404,814 B1 * 6/2002 Apostolopoulos et al. .......... 375/240.12
6,430,591 B1 8/2002 Goddard

(Continued)

OTHER PUBLICATIONS

Abe Shinji et al., Video Retrieval Method, Feb. 13, 1998, Nippon Corporation, pp. 1-6 (JP410040260A).*

Primary Examiner—Gims S Philippe

(57) ABSTRACT

A system for content adaptive encoding and decoding video is disclosed. The system comprises modules for segmenting video content into segments based on predefined classifications or models. Examples of such classifications comprise action scenes, slow scenes, low or high detail scenes, and brightness of the scenes. Based on the segment classifications, each segment is encoded with a different encoder chosen from a plurality of encoders. Each encoder is associated with a model. The chosen encoder is particularly suited to encoding the unique subject matter of the segment. The coded bit-stream for each segment includes information regarding which encoder was used to encode that segment. A matching decoder of a plurality of decoders is chosen using the information in the coded bitstream to decode each segment using a decoder suited for the classification or model of the segment. If scenes exist which do not fall in a predefined classification, or where classification is more difficult based on the scene content, these scenes are segmented, coded and decoded using a generic coder and decoder.

17 Claims, 15 Drawing Sheets
[Representative drawing — FIG. 8: semantic and structure descriptors to nearest content model mapper. Content model A and content model B video semantic and structure descriptors feed per-model comparators; minimum computer and selector stages choose the nearest content model.]
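The nearest-model mapping of FIG. 8 (per-model comparators feeding a minimum computer and selector) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the descriptor vectors, model names, and the Euclidean distance metric with a fallback threshold are all assumptions.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Hypothetical numeric descriptor vectors: (action intensity, detail, brightness).
MODEL_DESCRIPTORS = {
    "model_a": (0.9, 0.4, 0.6),   # e.g. action-heavy content
    "model_b": (0.1, 0.2, 0.8),   # e.g. slow, bright scenes
}

def map_to_nearest_model(segment_descriptors, threshold=0.5):
    """Return the content model whose descriptor vector is closest to the
    segment's, or "generic" when no model is close enough."""
    best_model, best_distance = "generic", threshold
    for model, reference in MODEL_DESCRIPTORS.items():
        d = dist(segment_descriptors, reference)   # one "comparator" per model
        if d < best_distance:                      # running minimum
            best_model, best_distance = model, d
    return best_model

print(map_to_nearest_model((0.85, 0.5, 0.55)))  # → model_a
```

The threshold plays the role of the generic fallback described in the abstract: a segment that no model matches well enough is routed to the generic codec.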
U.S. PATENT DOCUMENTS (continued)

6,477,201 B1 * 11/2002 Wine et al. .......... 375/240.08
6,493,023 B1 12/2002 Watson
6,493,386 B1 12/2002 Vetro et al.
6,496,607 B1 * 12/2002 Krishnamurthy et al. .......... 382/282
6,516,090 B1 2/2003 Lennon et al.
6,539,060 B1 3/2003 Lee et al.
6,549,658 B1 4/2003 Schweid et al.
6,631,162 B1 10/2003 Lee et al.
6,643,387 B1 11/2003 Sethuraman et al.
6,665,346 B1 12/2003 Lee et al.
6,671,412 B2 12/2003 Katata et al.
6,678,413 B1 1/2004 Liang et al.
6,704,281 B1 * 3/2004 Hourunranta et al. .......... 370/230
6,748,113 B1 6/2004 Kondo et al.
6,763,069 B1 7/2004 Divakaran et al.
6,909,745 B1 * 6/2005 Puri et al. .......... 375/240.01
7,245,821 B2 7/2007 Okada
7,456,760 B2 11/2008 Normile et al.

* cited by examiner
[Sheet 7 of 15 — FIG. 9: encoder set, comprising a generic model video encoder and content model A through content model G video encoders.]
[Sheet 9 of 15 — FIG. 11: segment description encoder 348. Mappers convert segment descriptors, IDs and time code values to indices (LUT 1); subsegment/ROI descriptors, IDs and time code values to indices (LUT 2); preprocessing values to indices (LUT 3); content model values to indices (LUT 4); and coding noise filter values to indices (LUT 5). Indices-to-binary-code mappers produce the coded segment description output.]
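The LUT-based coding suggested by the FIG. 11 labels (value-to-index mappers, look-up tables, index-to-binary-code mappers) can be sketched as below. The table contents, code width, and function names are illustrative assumptions, not values from the patent.

```python
# Assumed descriptor table standing in for one of the LUTs of FIG. 11.
LUT1 = ["slow", "action", "high_detail", "bright"]

def encode_descriptor(value, lut=LUT1, bits=4):
    """Quantize a descriptor value to a table index, then emit it as a
    fixed-length binary code (values-to-indices, indices-to-binary-code)."""
    index = lut.index(value)
    return format(index, f"0{bits}b")

def decode_descriptor(code, lut=LUT1):
    """Invert both steps: binary code back to an index, index back to a value."""
    return lut[int(code, 2)]

code = encode_descriptor("action")
print(code, decode_descriptor(code))  # → 0001 action
```

The decoder of FIG. 12 mirrors this structure, which is why the two figures list the same LUTs in reverse order.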
[Sheet 10 of 15 — FIG. 12: segment description decoder. Binary-code-to-indices mappers and LUTs 1 through 5 invert the encoder mappings of FIG. 11, recovering the segment descriptors, IDs and time code values; subsegment/ROI descriptors, IDs and time code values; preprocessing values; content model values; and coding noise filter values.]
[Sheet 11 of 15 — FIG. 13: model video decoders 378, comprising a generic model video decoder and content model A through content model G video decoders.]
[Sheet 13 of 15 — FIG. 15: video segment scene assembler.]
[Sheet 14 of 15 — FIG. 16a, the encoding method:]

1602: Analyze input video; classify and extract video segments (represent descriptors).
1604: Input next video segment.
Analyze the video segment; identify subsegments/ROI (represent descriptors).
1612: Selectively, spatially and temporally downsample subsegments/ROI (represent descriptors).
1616: Assign an appropriate model to each segment if possible (represent descriptors).
1620: Generic model? If yes, encode using the generic model encoder; if no, encode using an encoder from a plurality of encoders, each associated with a specific content model.
Estimate the different types of coding noise and the filters for their removal (represent descriptors).
1630: Encode the segment description.
1632: Multiplex and send to channel.
1634: All segments encoded? If yes, end.
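The encoding flow of FIG. 16a can be sketched as a loop that classifies each segment, routes it to a model-specific or generic encoder, and attaches the segment description to each multiplexed unit. Every name here (the toy classifier, the lambda encoders, the dictionary bitstream format) is a hypothetical stand-in, not the patent's implementation.

```python
def classify_segment(segment):
    """Toy classifier: derive descriptors and pick a content model,
    falling back to "generic" when no model fits."""
    descriptors = {"action": segment.get("action", 0.0)}
    model = "action_model" if descriptors["action"] > 0.7 else "generic"
    return model, descriptors

def encode_video(segments, model_encoders, generic_encoder):
    bitstream = []
    for segment in segments:                            # input next segment
        model, descriptors = classify_segment(segment)  # assign a model
        if model in model_encoders:
            coded = model_encoders[model](segment)      # model-specific encoder
        else:
            coded = generic_encoder(segment)            # generic model encoder
        bitstream.append({"model": model,               # segment description
                          "descriptors": descriptors,   # travels with the data
                          "data": coded})               # multiplexed unit
    return bitstream

stream = encode_video(
    [{"id": 1, "action": 0.9}, {"id": 2, "action": 0.1}],
    model_encoders={"action_model": lambda s: f"A:{s['id']}"},
    generic_encoder=lambda s: f"G:{s['id']}",
)
```

Carrying the model identifier inside each coded unit is the key point: it is what lets the receiving end select the matching decoder.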
[Sheet 15 of 15 — FIG. 16b, the decoding method:]

1702: Open connection to channel to begin receiving bitstream.
1704: Continue receiving bitstream and demultiplex.
Are these segment description bits? If yes, derive descriptors.
Decode using the generic model decoder, or 1716: decode using a decoder from a plurality of decoders, each associated with a specific content model.
Apply coding noise removal filters.
Selectively, spatially and temporally upsample subsegments/ROI.
Assemble video segment for output to display.
1728: Derive descriptors at each stage to steer decoding, filtering and upsampling. All segments decoded? If yes, end.
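The decoding flow of FIG. 16b mirrors the encoder: each demultiplexed unit carries its segment description, which selects the matching model decoder, with the generic decoder as fallback. The unit format and all function names below are illustrative assumptions.

```python
def decode_bitstream(units, model_decoders, generic_decoder):
    """Decode each demultiplexed unit with the decoder named in its
    segment description, falling back to the generic model decoder."""
    frames = []
    for unit in units:                                    # demultiplex
        decoder = model_decoders.get(unit["model"], generic_decoder)
        segment = decoder(unit["data"])                   # model or generic path
        # Coding-noise filtering and subsegment/ROI upsampling would go here.
        frames.append(segment)                            # assemble for display
    return frames

out = decode_bitstream(
    [{"model": "action_model", "data": "A:1"},
     {"model": "generic", "data": "G:2"}],
    model_decoders={"action_model": lambda d: d.replace("A:", "frame-")},
    generic_decoder=lambda d: d.replace("G:", "frame-"),
)
print(out)  # → ['frame-1', 'frame-2']
```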
CONTENT ADAPTIVE VIDEO ENCODER

RELATED APPLICATIONS

The present application claims priority to U.S. patent application Ser. No. 09/874,873, filed Jun. 5, 2001, the contents of which are incorporated herein by reference. The present disclosure is related to: Ser. No. 09/874,872, entitled "A Method of Content Adaptive Video Encoding," filed concurrently herewith and which is incorporated herein by reference; Ser. No. 09/874,879, entitled "A System for Content Adaptive Video Decoding," filed concurrently herewith and which is incorporated herein by reference; Ser. No. 09/874,878, entitled "A Method of Content Adaptive Video Decoding," filed concurrently herewith and which is incorporated herein by reference; and Ser. No. 09/874,877, entitled "A System and Method of Filtering Noise," filed concurrently herewith and which is incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to the encoding of video signals, and more particularly, content adaptive encoding that improves efficient compression of movies.

BACKGROUND OF THE INVENTION

Video compression has been a popular subject for academia, industry and international standards bodies alike for more than two decades. Consequently, many compressors/decompressors, or coders/decoders ("codecs"), have been developed providing performance improvements or new functionality over the existing ones. Several video compression standards include MPEG-2, MPEG-4, which has a much wider scope, and H.26L and H.263, which mainly target communications applications.

Some generic codecs supplied by companies such as Microsoft® and Real Networks® enable the coding of generic video/movie content. Currently, the MPEG-4 standard and the H.26L, H.263 standards offer the latest technology in standards-based codecs, while another codec, DivX;-), is emerging as an open-source, ad-hoc variation of the MPEG-4 standard. There are a number of video codecs that do not use these or earlier standards and claim significant improvements in performance; however, many such claims are difficult to validate. General purpose codecs do not provide significant improvement in performance. To obtain significant improvements, video codecs need to be highly adapted to the content they expect to code.

The main applications of video codecs may be classified in two broad categories based on their interactivity. The first category is interactive bi-directional video. Peer-to-peer communications applications usually involve interactive bi-directional video such as video telephony. In video telephony, the need exists for low delay to insure that a meaningful interaction can be achieved between the two parties and the audio and video (speaker lip movements) are not out of synchronization. Such a bi-directional video communication system requires each terminal both to encode and decode video. Further, low delay real-time encoding and decoding and cost and size issues require similar complexity in the encoders and decoders (the encoder may still be 2-4 times more complex than the decoder), resulting in an almost symmetrical arrangement.

The second category of video codecs relates to video distribution applications, including broadcast and Video-on-Demand (VoD). This second category usually does not involve bi-directional video and, hence, allows the use of high complexity encoders and can tolerate larger delays. The largest application of the second group is entertainment and, in particular, distribution of full-length movies. Compressing movies for transmission over the common broadband access pipes such as cable TV or DSL has obvious and significant applications. An important factor in delivering movies in a commercially plausible way includes maintaining quality at an acceptable level at which viewers are willing to pay. The challenge is to obtain a very high compression in coding of movies while maintaining an acceptable quality.

The video content in movies typically covers a wide range of characteristics: slow scenes, action-packed scenes, low or high detailed scenes, scenes with bright lights or shot at night, scenes with simple camera movements to scenes with complex movements, and special effects. Many of the existing video compression techniques may be adequate for certain types of scenes but inadequate for other scenes. Typically, codecs designed for videotelephony are not as efficient for coding other types of scenes. For example, the International Telecommunications Union (ITU) H.263 standard codec performs well for scenes having little detail and slow action because in video telephony, scenes are usually less complex and motion is usually simple and slow. The H.263 standard optimally applies to videoconferencing and videotelephony for applications ranging from desktop conferencing to video surveillance and computer-based training and education. The H.263 standard aims at video coding for lower bit rates in the range of 20-30 kbps.

Other video coding standards are aimed at higher bitrates or other functionalities, such as MPEG-1 (CDROM video), MPEG-2 (digital TV, DVD and HDTV), MPEG-4 (wireless video, interactive object based video), or still images such as JPEG. As can be appreciated, the various video coding standards, while being efficient for the particular characteristics of a certain type of content such as still pictures or low bit rate transmissions, are not optimal for a broad range of content characteristics. Thus, at present, none of the video compression techniques adequately provides acceptable performance over the wide range of video content.

FIG. 1 illustrates a prior art frame-based video codec and FIG. 2 illustrates a prior art object based video codec. As shown in FIG. 1, a general purpose codec 100 is useful for coding and decoding video content such as movies. Video information may be input to a spatial or temporal downsampling processor 102 to undergo fixed spatial/temporal downsampling first. An encoder 104 encodes video frames (or fields) from the downsampled signal. An example of such an encoder is an MPEG-1 or MPEG-2 video encoder. Encoder 104 generates a compressed bitstream that can be stored or transmitted via a channel. The bitstream is eventually decoded via corresponding decoder 106 that outputs reconstructed frames to a postprocessor 108 that may spatially and/or temporally upsample the frames for display.

FIG. 2 shows a block diagram of a specialized object-based codec 200 for coding and decoding video objects as is known in the art. Video content is input to a scene segmenter 202 that segments the content into video objects. A segment is a temporal fragment of the video. The segmenter 202 also produces a scene description 204 for use by the compositor 240 in reconstructing the scene. Not shown in FIG. 2 is the encoder of the scene description produced by segmenter 202.

The video objects are output from lines 206 to a preprocessor 208 that may spatially and/or temporally downsample the objects to output lines 210. The downsampled signal may be input to an encoder 212 such as a video object encoder using the MPEG-2, MPEG-4 or other standard known to
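The fixed pipeline of FIG. 1 (downsampling processor 102, encoder 104, decoder 106, postprocessor 108) can be sketched as follows. This is a toy illustration only: the string-based "encoding," the 2:1 temporal factor, and frame repetition for upsampling are all assumptions, not anything the patent specifies.

```python
# Minimal sketch of the fixed frame-based pipeline of FIG. 1.
def downsample_temporal(frames, factor=2):       # processor 102: drop frames
    return frames[::factor]

def encode(frames):                              # encoder 104 (stand-in)
    return [f"coded({f})" for f in frames]

def decode(bitstream):                           # decoder 106 (stand-in)
    return [c[len("coded("):-1] for c in bitstream]

def upsample_temporal(frames, factor=2):         # postprocessor 108
    return [f for f in frames for _ in range(factor)]  # frame repetition

frames = ["f0", "f1", "f2", "f3"]
restored = upsample_temporal(decode(encode(downsample_temporal(frames))))
print(restored)  # → ['f0', 'f0', 'f2', 'f2']
```

The point of the sketch is the fixedness the patent criticizes: the same downsampling and the same encoder are applied regardless of what the frames contain.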
those of skill in the art. The contents of the MPEG-2, MPEG-4, H.26L and H.263 standards are incorporated herein by reference. The encoder 212 encodes each of these video objects separately and generates bitstreams 214 that are multiplexed by a multiplexer 216 that can either be stored or transmitted on a channel 218. The encoder 212 also encodes header information. An external encoder (not shown) encodes scene description information 204 produced by segmenter 202.

The video objects bitstream is eventually demultiplexed using a demultiplexer 220 into individual video object bitstreams 224 that are decoded in video object decoder 226. The resulting decoded video objects 228 may undergo spatial and/or temporal upsampling using a postprocessor 230, and the resulting signals on lines 232 are composed to form a scene at compositor 240 that uses a scene description 204 generated at the encoder 202, coded by external means and decoded and input to the compositor 240.

Some codecs are adaptive in terms of varying the coding scheme according to certain circumstances, but these codecs generally change "modes" rather than address the difficulties explained above. For example, some codecs will switch to a different coding mode if a buffer is full of data. The new mode may involve changing the quantizer to prevent the buffer from again becoming saturated. Further, some codecs may switch modes based on a data block size to more easily accommodate varying sized data blocks. In sum, although current codecs may exhibit some adaptiveness or mode selection, they still fail to address the inefficiencies in encoding and decoding a wide variety of video content using codecs developed for narrow applications.

SUMMARY

What is needed in the art is a codec that adaptively changes its coding techniques based on the content of the particular video scene or portion of a scene. The present invention alleviates the disadvantages of the prior art by content adaptive coding in which the video codec adapts to the characteristics and attributes of the video content. The present invention is preferably targeted for coding and decoding full-length feature movies, although it is clearly applicable to any delivery of content. The present invention differs from existing codecs that treat video content as a sequence of frames consisting of matrices of pixels. Rather, the present invention relates to segmenting the movie into fragments or portions that can be coded by specialized coders optimized for the properties of the particular fragment.

This segmentation/classification process may involve a manual operation, or a semi-manual or automatic method. Considering the cost of movie production, the increase in cost to perform this process either manually, semi-automatically, or automatically will likely be negligible.

The proposed coding structure preferably classifies a video/movie into segments and further into subsegments and regions of interest (ROIs). The video content is thus divided into portions. Segments and subsegments are typically temporal fragments of the video, and a ROI is a spatial fragment of the video. The content may also be divided in other ways for classification and comparison to predefined content models. For example, a region of interest that covers several frames can be referred to as a spatio-temporal fragment or portion. The classification of portions can be done manually by a human operator, or semi-automatically by a human operator helped by a specialized editing system designed to extract descriptors describing the scene. In another aspect of the invention, the process is fully automated.

The descriptors are used to select an encoder from a plurality of encoders for each portion (segment/subsegment or ROI) so that the highest compression can be achieved for the particular content in each portion. The descriptors are a predetermined set of classifications such as, for example, action intensity, details in a scene, or brightness of the scene. Each encoder uses a coding algorithm particularly suited to efficiently encode and produce a bitstream according to the characteristics of each segment. One of the available encoders is a generic encoder preserved for segments that have classifications that do not fit the predefined classifications or that are difficult to classify.

The descriptors used according to the present invention may have some overlapping general descriptions with those used in the MPEG-7 standard. For example, a classification of camera motion may be a descriptor used both in MPEG-7 and in the classification of video content according to the present invention. However, as explained above, the descriptors used in MPEG-7 are for video indexing and retrieval rather than encoding and decoding processes.

The coded bitstream includes information about the selected encoder. This information enables the selection of a matching decoder chosen from a plurality of decoders. Each encoder/decoder pair is designed to fit a range of characteristics, or a model, and is referred to as a codec for that model. For example, camera motions such as zooming and rotations may require sophisticated tools such as zoom and rotation compensation and may belong to a particular model. A multimedia or video portion having these particular camera motions may be encoded by its corresponding codec to achieve higher efficiency.

In another example, in some video scenes, a specialized type of subscene will show a conversation between two people using the technique called "opposing glances." In an opposing glances scene, the camera focuses alternately on the two participants in the scene. The segments and/or subsegments associated with an opposing glances scene will be mapped to have associated models for video content that does not vary much throughout the scene. The segments are then transmitted to an encoder of the plurality of encoders that will encode the segments in an efficient manner for that model. Thus for one scene, background information may be encoded by one encoder and information associated with the speakers (perhaps defined as an ROI for that frame) encoded by a different encoder.

Some scenes will not fit one of the predefined models. The segments associated with these scenes will be transmitted to a generic encoder and, after transmission or storage, will be decoded by the corresponding generic decoder before assembly for display.

The invention disclosed herein comprises a system and method of adaptively analyzing video content, determining a model for each segment of the content from a list of existing models, and selecting a codec that matches or most closely matches the model.

BRIEF DESCRIPTION

The present invention may be understood with reference to the attached drawings, of which:

FIG. 1 illustrates a prior art frame-based video codec;

FIG. 2 illustrates a prior art object-based video codec;

FIG. 3 shows an exemplary content adaptive segment based video codec;

FIG. 4 is a diagram showing an example of a video/movie sequence consisting of a number of types of video segments;
FIG. 5 is a diagram showing an example of an "opposing glances" video segment consisting of a number of subsegments;

FIG. 6 is a block diagram illustrating a semantics and global scene attributes-based classifier and video segments extractor;

FIG. 7 is a block diagram illustrating a structure and local scene attributes based classifier, and a subsegments and ROI identifier;

FIG. 8 shows a block diagram of a semantic and structure descriptors to nearest content model mapper;

FIG. 9 is a block diagram illustrating an exemplary set of content model video segment encoders;

FIG. 10 is a block diagram illustrating a coding noise analyzer and filter decoder;

FIG. 11 is a block diagram illustrating a segment description encoder;

FIG. 12 is a block diagram illustrating a segment description decoder;

FIG. 13 is a block diagram illustrating an exemplary set of content model video segment decoders;

FIG. 14 is a block diagram illustrating a set of coding noise removal filters;

FIG. 15 is a block diagram illustrating an exemplary video segment scene assembler; and

FIGS. 16a and 16b show an example of a method of encoding and decoding a bitstream according to an aspect of the present invention.

DETAILED DESCRIPTION

The present invention may be understood with reference to FIGS. 3-16b, which illustrate embodiments and aspects of the invention. FIG. 3 illustrates a system for providing video content encoding and decoding according to a first embodiment of the invention. A block diagram of the system 300 illustrates a specialized codec for coding and decoding video portions (segments, subsegments or ROIs). The video portions may be part of a movie or any kind of video or multimedia content. The video content is input via line 301 to an extractor 302 for semantic and global statistics analysis based on predefined classifications. The extractor 302 also performs video segments extraction. The outcome of the classification and extraction process is a video stream divided into a number of portions on outputs 304, as well as specific descriptors output on line 306 defining high level semantics of each portion, and identifiers and time code output on line 308 for each portion.

The terms "portion" or "fragment" as used herein most commonly refer to a video "segment," but as made clear above, these terms may refer to any of a segment, subsegment, region of interest, or other data. Similarly, when the other terms are used herein, they may not be limited to the exact definition of the term. For example, the term "segment" when used herein may primarily refer to a segment, but it may also refer to a region of interest or a subsegment or some other data.

Turning momentarily to a related industry standard, MPEG-7, called the "Multimedia Content Description Interface," relates to multimedia content and supports a certain degree of interpretation of the information's meaning. The MPEG-7 standard is tangentially related to the present disclosure and its contents in its final form are incorporated herein by reference. The standard produces descriptors associated with multimedia content. A descriptor in MPEG-7 is a representation of a feature of the content, such as grid layouts of images, histograms of a specific visual item, color or shape, object motion or camera motion. The MPEG-7 standard, however, is primarily focused on providing a quick and efficient searching mechanism for locating information about various types of multimedia material. Therefore, the MPEG-7 standard fails to address video content encoding and decoding.

The MPEG-7 standard is useful, for example, to describe and index audio/video content to enable such uses as a song location system. In this example, if a person wishes to locate a song but does not know the title, the person may hum or sing a portion of the song to a speech recognition system. The received data is used to perform a search of a database of the indexed audio content to locate the song for the person. The concept of indexing audio/video content is related to the present disclosure, and some of the parameters and methods of indexing content according to MPEG-7 may be applicable to the preparation of descriptors and identifiers of audio/video content for the present invention.

Returning to the description of the present invention, the descriptors, identifiers and time code output on lines 306 and 308 of FIG. 3 are shown as single signals, but are vectors and carry information for all portions in the video content. The descriptors may be similar to some of the descriptors used in MPEG-7. However, the descriptors contemplated according to the present invention are beyond the categorizations set forth in MPEG-7. For example, descriptors related to such video features as rotation, zoom compensation, and global motion estimation are necessary for the present invention but may not be part of MPEG-7.

Portions output on lines 304 are input to a locator or location module 310 that classifies the portion based on structure and local statistics. The locator 310 also locates subsegments and regions of interest (ROI). When a classification of motion, color, brightness or other feature is local within a subsegment, then the locator 310 may perform the classifications. When classifications are globally uniform, then the extractor 302 may classify them.

The process of locating a region of interest means noting coordinates of a top left corner (or other corner) and a size, typically in an x and y dimension, of an area of interest. Locating an area of interest may also include noting a timecode of the frame or frames in which an ROI occurs. An example of a ROI includes an athlete such as a tennis player who moves around a scene, playing in a tennis match. The moving player may be classified as a region of interest since the player is the focus of attention in the game.

The locator 310 further classifies each segment into subsegments as well as regions of interest and outputs the subsegments on lines 316. The locator 310 also outputs descriptors 312 defining the structure of each subsegment and ROI, and outputs timecode and ROI identifiers 314. Further descriptors for an ROI may include a mean or variance in brightness or, for example, if the region is a flat region or contains edges, descriptors corresponding to the region's characteristics.

The subsegments 316 output from the locator 310 may be spatially/temporally down-sampled by a preprocessor 320. However, depending on the locator signals 312 and 314, an exception may be made to retain full quality for certain subsegments or ROIs. The operation of the downsampling processor is similar to that of the corresponding processors used in FIG. 1 and FIG. 2.

The preprocessor 320 outputs on lines 324 down-sampled segments that are temporarily stored in a buffer 326 to await encoding. Buffer outputs 328 make the segments available for further processing. The signal 322 optionally carries information regarding what filters were used prior to downsampling to reduce aliasing, such that an appropriate set of filters can be employed for upsampling at the decoding end.