
US007715475B1

(12) United States Patent — Puri et al.
(10) Patent No.: US 7,715,475 B1
(45) Date of Patent: *May 11, 2010

(54) CONTENT ADAPTIVE VIDEO ENCODER

Inventors: Puri, Riverdale, NY (US); Mehmet Reha Civanlar, Middletown, NJ (US)

(73) Assignee: AT&T Intellectual Property II, L.P., New York, NY (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1599 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 10/970,607
(22) Filed: Oct. 21, 2004

Related U.S. Application Data
(63) Continuation of application No. 09/874,873, filed on Jun. 5, 2001.

(51) Int. Cl.: H04N 7/12 (2006.01); G06K 9/46 (2006.01)
(52) U.S. Cl.: 375/240; 382/190
(58) Field of Classification Search: 375/240.01; 382/190; G06F 17/30; 709/224 (partially illegible in the scan). See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
5,448,307 A 9/1995 Gelissen et al.
5,513,282 A 4/1996 Williams
5,579,052 A 11/1996 Amen
5,748,789 A 5/1998 Lee et al.
5,946,043 A * 8/1999 Lee et al. .............. 375/240.24
5,949,484 A * 9/1999 Nakaya et al. .......... 348/384.1
6,094,457 A 7/2000 Linzer et al.
6,161,137 A * 12/2000 Ogdon et al. ........... 709/224
6,175,596 B1 1/2001 Kobayashi et al.
6,182,069 B1 * 1/2001 [name illegible] et al. .. 707/6
6,256,423 B1 7/2001 Krishnamurthy et al.
6,263,022 B1 7/2001 Chen et al.
6,266,442 B1 * 7/2001 Laumeyer et al. ........ 382/190
6,266,443 B1 7/2001 Vetro et al.
6,285,797 B1 9/2001 Lubin et al.
6,388,688 B1 5/2002 Schileru-Key
6,404,814 B1 * 6/2002 Apostolopoulos et al. .. 375/240.12
6,430,591 B1 8/2002 Goddard
(Continued)

OTHER PUBLICATIONS
Abe Shinji et al., "Video Retrieval Method," Feb. 13, 1998, Nippon Corporation, pp. 1-6 (JP410040260A).*

Primary Examiner: Gims S Philippe

(57) ABSTRACT

A system for content adaptive encoding and decoding video is disclosed. The system comprises modules for segmenting video content into segments based on predefined classifications or models. Examples of such classifications comprise action scenes, slow scenes, low or high detail scenes, and brightness of the scenes. Based on the segment classifications, each segment is encoded with a different encoder chosen from a plurality of encoders. Each encoder is associated with a model. The chosen encoder is particularly suited to encoding the unique subject matter of the segment. The coded bit-stream for each segment includes information regarding which encoder was used to encode that segment. A matching decoder of a plurality of decoders is chosen using the information in the coded bitstream to decode each segment using a decoder suited for the classification or model of the segment. If scenes exist which do not fall in a predefined classification, or where classification is more difficult based on the scene content, these scenes are segmented, coded and decoded using a generic coder and decoder.

17 Claims, 15 Drawing Sheets

[Front-page drawing — FIG. 8, content model mapper 830: segment semantic and structure descriptors enter banks of comparators for Model A and Model B video semantic and structure descriptors; the comparator outputs feed minimum computers and selectors that choose the nearest content model. Most other text in the scanned figure is illegible.]
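The hand-off described in the abstract — each coded segment carries information identifying the encoder model used, and the receiver selects the matching decoder from that information — can be sketched as follows. This is an illustrative sketch only; the model names, dictionary format, and helper functions are hypothetical and not part of the patent.

```python
# Hypothetical sketch of the abstract's per-segment codec selection.
# A segment whose classification is not among the predefined models
# falls back to the generic coder, as the abstract describes.

MODELS = ("action", "slow", "low_detail", "high_detail", "generic")

def encode_segment(segment_class, payload):
    """Tag the coded payload with the model used, so a matching
    decoder can be chosen at the receiver."""
    model = segment_class if segment_class in MODELS else "generic"
    return {"model": model, "data": payload}

def decode_segment(bitstream_unit, decoders):
    """Look up the decoder matching the model tag embedded per segment."""
    return decoders[bitstream_unit["model"]](bitstream_unit["data"])

# Toy decoders that just report which model handled the data.
decoders = {m: (lambda data, m=m: (m, data)) for m in MODELS}
unit = encode_segment("action", b"coded-bits")
model_used, data = decode_segment(unit, decoders)
```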

US 7,715,475 B1 — Page 2

U.S. PATENT DOCUMENTS (continued)
6,477,201 B1 * 11/2002 Wine et al. ............ 375/240.08
6,493,023 B1 12/2002 Watson
6,493,386 B1 12/2002 Vetro et al.
6,496,607 B1 * 12/2002 Krishnamurthy et al. .. 382/282
6,516,090 B1 2/2003 Lennon et al.
6,539,060 B1 3/2003 Lee et al.
6,549,658 B1 4/2003 Schweid et al.
6,631,162 B1 10/2003 Lee et al.
6,643,387 B1 11/2003 Sethuraman et al.
6,665,346 B1 12/2003 Lee et al.
6,671,412 B2 12/2003 Katata et al.
6,678,413 B1 1/2004 Liang et al.
6,704,281 B1 * 3/2004 Hourunranta et al. .... 370/230
6,748,113 B1 6/2004 Kondo et al.
6,763,069 B1 7/2004 Divakaran et al.
6,909,745 B1 * 6/2005 Puri et al. ............ 375/240.01
7,245,821 B2 7/2007 Okada
7,456,760 B2 11/2008 Normile et al.
* cited by examiner

U.S. Patent — May 11, 2010 — Sheet 1 of 15 — US 7,715,475 B1
[Drawing sheet; the figure text is illegible in the scan.]

[Sheet 5 of 15: drawing sheet; figure text illegible in the scan.]

[Sheet 6 of 15: drawing sheet; figure text illegible in the scan.]

[Sheet 7 of 15 — FIG. 9: ENCODER SET, comprising a generic model video encoder, a content model A video encoder, and further content model video encoders through a content model G video encoder.]

[Sheet 8 of 15: drawing sheet; figure text illegible in the scan.]

[Sheet 9 of 15 — FIG. 11: segment description encoder 348. Descriptor, ID and time code values are mapped to indices via look-up tables (LUT 1 through LUT 5) and then to binary codes by indices-to-binary-code mappers, with separate paths for segment descriptors, subsegment/ROI descriptors, preprocessing values, content model values, and coding noise filter values (blocks numbered roughly 1102-1150). Remaining figure text is illegible in the scan.]

[Sheet 10 of 15 — FIG. 12: segment description decoder. Binary-code-to-indices mappers and look-up tables LUT 1 through LUT 5 invert the encoder-side mapping, recovering segment descriptors, ID and time code values; subsegment/ROI descriptors; preprocessing values; content model values; and coding noise filter values (blocks numbered roughly 1200-1250). Remaining figure text is illegible in the scan.]

[Sheet 11 of 15 — FIG. 13: model video decoders 378, comprising a generic model video decoder, a content model A video decoder, and a content model G video decoder.]

[Sheet 13 of 15 — FIG. 15: video segment scene assembler; figure text illegible in the scan.]

U.S. Patent — May 11, 2010 — Sheet 14 of 15 — US 7,715,475 B1
FIG. 16a (encoding method; steps reconstructed from the scanned flowchart, step numbering partly illegible):
1602: analyze input video, classify and extract video segments; represent A descriptors (1606).
1608: input next video segment; 1610: analyze video segment, identify subsegments/ROI; represent B descriptors.
Selectively, spatially and temporally downsample subsegments/ROI (1612/1614); represent C descriptors.
1616: assign an appropriate model to each segment if possible (1618); represent D descriptors.
1620: generic model? If yes, 1622: encode using generic model encoder; if no, 1624: encode using an encoder from a plurality of encoders, each associated with a specific content model.
1626/1628: estimate the different types of coding noise and the filters for their removal; represent E descriptors; encode.
1630: encode segment description; 1632: multiplex and send to channel.
1634: all segments encoded? If yes, done; otherwise return to input the next segment.
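The FIG. 16a encoding loop — encode each classified segment with a model-specific encoder when a model was assigned, fall back to the generic encoder otherwise, and multiplex the segment description (which encoder was used) with the coded data — can be sketched as below. Every helper name is a hypothetical stand-in for the numbered blocks in the figure, not the patent's own implementation.

```python
# Hypothetical sketch of the FIG. 16a segment-encoding loop.

def encode_video(segments, model_encoders, generic_encoder):
    """segments: list of (model_or_None, data) pairs produced by the
    analysis/classification step. Returns the multiplexed channel units."""
    channel = []
    for model, data in segments:
        if model in model_encoders:        # model assigned: use its encoder
            coded = model_encoders[model](data)
        else:                              # no predefined model fits
            model, coded = "generic", generic_encoder(data)
        # The segment description records which encoder was used and is
        # multiplexed with the coded data.
        channel.append({"model": model, "coded": coded})
    return channel

enc = {"action": lambda d: d.upper()}      # toy "action model" encoder
out = encode_video([("action", "ab"), (None, "cd")], enc, lambda d: d)
```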

U.S. Patent — May 11, 2010 — Sheet 15 of 15 — US 7,715,475 B1
FIG. 16b (decoding method; steps reconstructed from the scanned flowchart):
1702: open connection to channel to begin receiving bitstream.
1704: continue receiving bitstream and demultiplex.
Are these segment description bits? If yes, 1710: derive descriptors.
Decode using the generic model decoder, or 1716: decode using a decoder from a plurality of decoders, each associated with a specific content model.
Apply coding noise removal filters; selectively, spatially and temporally upsample subsegments/ROI.
Assemble video segment for output to display.
All segments decoded? (Remaining branches, including further derive-descriptors steps around 1728, are partly illegible in the scan.)
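The FIG. 16b decoding flow is the mirror image of the encoding loop: demultiplex each unit, pick the decoder named in its segment description, then conceptually apply noise-removal filtering and upsampling before assembly for display. The sketch below covers only the decoder-selection step; helper names are hypothetical.

```python
# Hypothetical sketch of the FIG. 16b decoder-selection loop.

def decode_video(channel, model_decoders, generic_decoder):
    """channel: list of units as produced by the encoding side, each
    carrying the model tag from its segment description."""
    frames = []
    for unit in channel:                   # continue receiving / demultiplex
        # Choose the matching decoder; unknown tags use the generic decoder.
        decoder = model_decoders.get(unit["model"], generic_decoder)
        segment = decoder(unit["coded"])
        frames.append(segment)             # assemble for output to display
    return frames

dec = {"action": lambda d: d.lower()}      # toy "action model" decoder
video = decode_video([{"model": "action", "coded": "AB"},
                      {"model": "generic", "coded": "cd"}], dec, lambda d: d)
```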

US 7,715,475 B1

CONTENT ADAPTIVE VIDEO ENCODER

RELATED APPLICATIONS

The present application claims priority to U.S. patent application Ser. No. 09/874,873, filed Jun. 5, 2001, the contents of which are incorporated herein by reference. The present disclosure is related to: Ser. No. 09/874,872, entitled "A Method of Content Adaptive Video Encoding" filed concurrently herewith and which is incorporated herein by reference; Ser. No. 09/874,879, entitled "A System for Content Adaptive Video Decoding", filed concurrently herewith and which is incorporated herein by reference; Ser. No. 09/874,878, entitled "A Method of Content Adaptive Video Decoding" filed concurrently herewith and which is incorporated herein by reference; and Ser. No. 09/874,877, entitled "A System and Method of Filtering Noise" filed concurrently herewith and which is incorporated herein by reference.

FIELD OF THE INVENTION

The invention relates to the encoding of video signals, and more particularly, content adaptive encoding that improves efficient compression of movies.

BACKGROUND OF THE INVENTION

Video compression has been a popular subject for academia, industry and international standards bodies alike for more than two decades. Consequently, many compressors/decompressors, or coders/decoders ("codecs") have been developed providing performance improvements or new functionality over the existing ones. Several video compression standards include MPEG-2, MPEG-4, which has a much wider scope, and H.26L and H.263 that mainly target communications applications.

Some generic codecs supplied by companies such as Microsoft® and RealNetworks® enable the coding of generic video/movie content. Currently, the MPEG-4 standard and the H.26L, H.263 standards offer the latest technology in standards-based codecs, while another codec, DivX;-), is emerging as an open-source, ad-hoc variation of the MPEG-4 standard. There are a number of video codecs that do not use these or earlier standards and claim significant improvements in performance; however, many such claims are difficult to validate. General purpose codecs do not provide significant improvement in performance. To obtain significant improvements, video codecs need to be highly adapted to the content they expect to code.

The main application of video codecs may be classified in two broad categories based on their interactivity. The first category is interactive bi-directional video. Peer-to-peer communications applications usually involve interactive bi-directional video such as video telephony. In video telephony, the need exists for low delay to insure that a meaningful interaction can be achieved between the two parties and the audio and video (speaker lip movements) are not out of synchronization. Such a bi-directional video communication system requires each terminal both to encode and decode video. Further, low delay real-time encoding and decoding and cost and size issues require similar complexity in the encoders and decoders (the encoder may still be 2-4 times more complex than the decoder), resulting in an almost symmetrical arrangement.

The second category of video codecs relates to video distribution applications, including broadcast and Video-on-Demand (VoD). This second category usually does not involve bi-directional video and, hence, allows the use of high complexity encoders and can tolerate larger delays. The largest application of the second group is entertainment and, in particular, distribution of full-length movies. Compressing movies for transmission over the common broadband access pipes such as cable TV or DSL has obvious and significant applications. An important factor in delivering movies in a commercially plausible way includes maintaining quality at an acceptable level at which viewers are willing to pay.

The challenge is to obtain a very high compression in coding of movies while maintaining an acceptable quality. The video content in movies typically covers a wide range of characteristics: slow scenes, action-packed scenes, low or high detailed scenes, scenes with bright lights or shot at night, scenes with simple camera movements to scenes with complex movements, and special effects. Many of the existing video compression techniques may be adequate for certain types of scenes but inadequate for other scenes. Typically, codecs designed for videotelephony are not as efficient for coding other types of scenes. For example, the International Telecommunications Union (ITU) H.263 standard codec performs well for scenes having little detail and slow action because in video telephony, scenes are usually less complex and motion is usually simple and slow. The H.263 standard optimally applies to videoconferencing and videotelephony for applications ranging from desktop conferencing to video surveillance and computer-based training and education. The H.263 standard aims at video coding for lower bit rates in the range of 20-30 kbps.

Other video coding standards are aimed at higher bitrates or other functionalities, such as MPEG-1 (CDROM video), MPEG-2 (digital TV, DVD and HDTV), MPEG-4 (wireless video, interactive object based video), or still images such as JPEG. As can be appreciated, the various video coding standards, while being efficient for the particular characteristics of a certain type of content such as still pictures or low bit rate transmissions, are not optimal for a broad range of content characteristics. Thus, at present, none of the video compression techniques adequately provides acceptable performance over the wide range of video content.

FIG. 1 illustrates a prior art frame-based video codec and FIG. 2 illustrates a prior art object-based video codec. As shown in FIG. 1, a general purpose codec 100 is useful for coding and decoding video content such as movies. Video information may be input to a spatial or temporal downsampling processor 102 to undergo fixed spatial/temporal downsampling first. An encoder 104 encodes video frames (or fields) from the downsampled signal. An example of such an encoder is an MPEG-1 or MPEG-2 video encoder. Encoder 104 generates a compressed bitstream that can be stored or transmitted via a channel. The bitstream is eventually decoded via corresponding decoder 106 that outputs reconstructed frames to a postprocessor 108 that may spatially and/or temporally upsample the frames for display.

FIG. 2 shows a block diagram of a specialized object-based codec 200 for coding and decoding video objects as is known in the art. Video content is input to a scene segmenter 202 that segments the content into video objects. A segment is a temporal fragment of the video. The segmenter 202 also produces a scene description 204 for use by the compositor 240 in reconstructing the scene. Not shown in FIG. 2 is the encoder of the scene description produced by segmenter 202. The video objects are output from lines 206 to a preprocessor 208 that may spatially and/or temporally downsample the objects to output lines 210. The downsampled signal may be input to an encoder 212 such as a video object encoder using the MPEG-2, MPEG-4 or other standard known to
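The FIG. 1 data flow just described (fixed downsampling in processor 102, encoding in 104, decoding in 106, upsampling in postprocessor 108) can be mimicked with a toy pipeline. The functions below are illustrative stand-ins only; real MPEG encoders are vastly more complex, and this sketch only mirrors the order of the stages.

```python
# Toy end-to-end sketch of the FIG. 1 frame-based codec pipeline.

def downsample(frames, factor=2):   # processor 102: keep every Nth frame
    return frames[::factor]

def encode(frames):                 # encoder 104: stand-in for MPEG-1/2
    return list(frames)             # identity "compression" for the sketch

def decode(bitstream):              # decoder 106
    return list(bitstream)

def upsample(frames, factor=2):     # postprocessor 108: repeat frames
    return [f for f in frames for _ in range(factor)]

display = upsample(decode(encode(downsample([1, 2, 3, 4]))))
```

The point of the sketch is the fixed, content-independent nature of the pipeline: every input passes through the same downsampling and the same single encoder, which is exactly what the content adaptive approach later replaces.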

those of skill in the art. The contents of the MPEG-2, MPEG-4, H.26L and H.263 standards are incorporated herein by reference. The encoder 212 encodes each of these video objects separately and generates bitstreams 214 that are multiplexed by a multiplexer 216 that can either be stored or transmitted on a channel 218. The encoder 212 also encodes header information. An external encoder (not shown) encodes scene description information 204 produced by segmenter 202.

The video objects bitstream is eventually demultiplexed using a demultiplexer 220 into individual video object bitstreams 224 and are decoded in video object decoder 226. The resulting decoded video objects 228 may undergo spatial and/or temporal upsampling using a postprocessor 230 and the resulting signals on lines 232 are composed to form a scene at compositor 240 that uses a scene description 204 generated at the encoder 202, coded by external means and decoded and input to the compositor 240.

Some codecs are adaptive in terms of varying the coding scheme according to certain circumstances, but these codecs generally change "modes" rather than address the difficulties explained above. For example, some codecs will switch to a different coding mode if a buffer is full of data. The new mode may involve changing the quantizer to prevent the buffer from again becoming saturated. Further, some codecs may switch modes based on a data block size to more easily accommodate varying sized data blocks. In sum, although current codecs may exhibit some adaptiveness or mode selection, they still fail to address the inefficiencies in encoding and decoding a wide variety of video content using codecs developed for narrow applications.

SUMMARY

What is needed in the art is a codec that adaptively changes its coding techniques based on the content of the particular video scene or portion of a scene. The present invention alleviates the disadvantages of the prior art by content adaptive coding in which the video codec adapts to the characteristics and attributes of the video content. The present invention is preferably targeted for coding and decoding full-length feature movies, although it is clearly applicable to any delivery of content. The present invention differs from existing codecs that treat video content as a sequence of frames consisting of matrices of pixels. Rather, the present invention relates to segmenting the movie into fragments or portions that can be coded by specialized coders optimized for the properties of the particular fragment.

This segmentation/classification process may involve a manual operation, or a semi-manual or automatic method. Considering the cost of movie production, the increase in cost to perform this process either manually, semi-automatically, or automatically will likely be negligible.

The proposed coding structure preferably classifies a video/movie into segments and further into subsegments and regions of interest (ROIs). The video content is thus divided into portions. Segments and subsegments are typically temporal fragments of the video, and a ROI is a spatial fragment of the video. The content may also be divided in other ways for classification and comparison to predefined content models. For example, a region of interest that covers several frames can be referred to as a spatio-temporal fragment or portion.

The classification of portions can be done manually by a human operator or semi-automatically by a human operator helped by a specialized editing system designed to extract descriptors describing the scene. In another aspect of the invention, the process is fully automated.

The invention disclosed herein comprises a system and method of adaptively analyzing video content, determining a model for each segment of the content from a list of existing models, and selecting a codec that matches or most closely matches the model. The descriptors are used to select an encoder from a plurality of encoders for each portion (segment/subsegment or ROI) so that the highest compression can be achieved for the particular content in each portion. The descriptors are a predetermined set of classifications such as, for example, action intensity, details in a scene, or brightness of the scene. Each encoder uses a coding algorithm particularly suited to efficiently encode and produce a bitstream according to the characteristics of each segment. One of the available encoders is a generic encoder reserved for segments that have classifications that do not fit the predefined classifications or that are difficult to classify.

The descriptors used according to the present invention may have some overlapping general descriptions with those used in the MPEG-7 standard. For example, a classification of camera motion may be a descriptor used both in MPEG-7 and in the classification of video content according to the present invention. However, as explained above, the descriptors used in MPEG-7 are for video indexing and retrieval rather than encoding and decoding processes.

The coded bitstream includes information about the selected encoder. This information enables the selection of a matching decoder chosen from a plurality of decoders. Each encoder/decoder pair is designed to fit a range of characteristics, or a model, and is referred to as a codec for that model. For example, camera motions such as zooming and rotations may require sophisticated tools such as zoom and rotation compensation and may belong to a particular model. A multimedia or video portion having these particular camera motions may be encoded by its corresponding codec to achieve higher efficiency.

In another example, in some video scenes, a specialized type of subscene will show a conversation between two people using the technique called "opposing glances." In an opposing glances scene, the camera focuses alternately on the two participants in the scene. The segments and/or subsegments associated with an opposing glances scene will be mapped to have associated models for video content that does not vary much throughout the scene. The segments are then transmitted to an encoder of the plurality of encoders that will encode the segments in an efficient manner for that model. Thus for one scene, background information may be encoded by one encoder and information associated with the speakers (perhaps defined as an ROI for that frame) encoded by a different encoder.

Some scenes will not fit one of the predefined models. The segments associated with these scenes will be transmitted to a generic encoder and, after transmission or storage, will be decoded by the corresponding generic decoder before assembly for display.

BRIEF DESCRIPTION

The present invention may be understood with reference to the attached drawings, of which:
FIG. 1 illustrates a prior art frame-based video codec;
FIG. 2 illustrates a prior art object-based video codec;
FIG. 3 shows an exemplary content adaptive segment-based video codec;
FIG. 4 is a diagram showing an example of a video/movie sequence consisting of a number of types of video segments;
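The descriptor-driven encoder selection described in the summary amounts to a nearest-match search over predefined content models, with a fallback to the generic codec when no model fits well. The following is a minimal sketch of that idea; the descriptor names, the distance metric, and the threshold are all hypothetical, not taken from the patent.

```python
# Hypothetical nearest-content-model selection over segment descriptors
# (e.g., action intensity, scene detail, brightness).

MODEL_DESCRIPTORS = {
    "opposing_glances": {"action": 0.2, "detail": 0.5, "brightness": 0.6},
    "high_action":      {"action": 0.9, "detail": 0.7, "brightness": 0.5},
}

def nearest_model(segment_desc, threshold=0.5):
    """Return the model with minimum descriptor distance, or 'generic'
    when even the best match is too far away to fit a predefined model."""
    def distance(model_desc):
        return sum(abs(model_desc[k] - segment_desc[k]) for k in model_desc)
    best = min(MODEL_DESCRIPTORS, key=lambda m: distance(MODEL_DESCRIPTORS[m]))
    return best if distance(MODEL_DESCRIPTORS[best]) <= threshold else "generic"

model = nearest_model({"action": 0.85, "detail": 0.7, "brightness": 0.45})
```

A segment whose descriptors sit far from every model lands in the generic branch, mirroring the generic encoder reserved for hard-to-classify scenes.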

FIG. 5 is a diagram showing an example of an "opposing glances" video segment consisting of a number of subsegments;
FIG. 6 is a block diagram illustrating a semantics and global scene attributes-based classifier and video segments extractor;
FIG. 7 is a block diagram illustrating a structure and local scene attributes-based classifier, and a subsegments and ROI identifier;
FIG. 8 shows a block diagram of a semantic and structure descriptors to nearest content model mapper;
FIG. 9 is a block diagram illustrating an exemplary set of content model video segment encoders;
FIG. 10 is a block diagram illustrating a coding noise analyzer and filter decoder;
FIG. 11 is a block diagram illustrating a segment description encoder;
FIG. 12 is a block diagram illustrating a segment description decoder;
FIG. 13 is a block diagram illustrating an exemplary set of content model video segment decoders;
FIG. 14 is a block diagram illustrating a set of coding noise removal filters;
FIG. 15 is a block diagram illustrating an exemplary video segment scene assembler; and
FIGS. 16a and 16b show an example of a method of encoding and decoding a bitstream according to an aspect of the present invention.

DETAILED DESCRIPTION

The present invention may be understood with reference to FIGS. 3-16b that illustrate embodiments and aspects of the invention. FIG. 3 illustrates a system for providing video content encoding and decoding according to a first embodiment of the invention. A block diagram of the system 300 illustrates a specialized codec for coding and decoding video portions (segments, subsegments or ROIs). The video portions may be part of a movie or any kind of video or multimedia content. The video content is input via line 301 to an extractor 302 for semantic and global statistics analysis based on predefined classifications. The extractor 302 also performs video segments extraction. The outcome of the classification and extraction process is a video stream divided into a number of portions on outputs 304, as well as specific descriptors output on line 306 defining high level semantics of each portion, as well as identifiers and time code output on line 308 for each portion.

The terms "portion" or "fragment" as used herein may most commonly refer to a video "segment" but, as made clear above, these terms may refer to any of a segment, subsegment, region of interest, or other data. Similarly, when the other terms are used herein, they may not be limited to the exact definition of the term. For example, the term "segment" when used herein may primarily refer to a segment but it may also refer to a region of interest or a subsegment or some other data.

Turning momentarily to a related industry standard, MPEG-7, called the "Multimedia Content Description Interface", relates to multimedia content and supports a certain degree of interpretation of the information's meaning. The MPEG-7 standard is tangentially related to the present disclosure and its contents in its final form are incorporated herein by reference. The standard produces descriptors associated with multimedia content. A descriptor in MPEG-7 is a representation of a feature of the content, such as grid layouts of images, histograms of a specific visual item, color or shape, object motion or camera motion. The MPEG-7 standard, however, is primarily focused on providing a quick and efficient searching mechanism for locating information about various types of multimedia material. Therefore, the MPEG-7 standard fails to address video content encoding and decoding.

The MPEG-7 standard is useful, for example, to describe and index audio/video content to enable such uses as a song location system. In this example, if a person wishes to locate a song but does not know the title, the person may hum or sing a portion of the song to a speech recognition system. The received data is used to perform a search of a database of the indexed audio content to locate the song for the person. The concept of indexing audio/video content is related to the present disclosure and some of the parameters and methods of indexing content according to MPEG-7 may be applicable to the preparation of descriptors and identifiers of audio/video content for the present invention.

Returning to the description of the present invention, the descriptors, identifiers and time code output on lines 306 and 308 of FIG. 3 are shown as single signals, but are vectors and carry information for all portions in the video content. The descriptors may be similar to some of the descriptors used in MPEG-7. However, the descriptors contemplated according to the present invention are beyond the categorizations set forth in MPEG-7. For example, descriptors related to such video features as rotation, zoom compensation, and global motion estimation are necessary for the present invention but may not be part of MPEG-7.

Portions output on lines 304 are input to a locator or location module 310 that classifies the portion based on structure and local statistics. The locator 310 also locates subsegments and regions of interest (ROI). When a classification of motion, color, brightness or other feature is local within a subsegment, then the locator 310 may perform the classifications. When classifications are globally uniform, then the extractor 302 may classify them. The process of locating a region of interest means noting coordinates of a top left corner (or other corner) and a size, typically in an x and y dimension, of an area of interest. Locating an area of interest may also include noting a timecode of the frame or frames in which an ROI occurs. An example of a ROI includes an athlete such as a tennis player who moves around a scene, playing in a tennis match. The moving player may be classified as a region of interest since the player is the focus of attention in the game.

The locator 310 further classifies each segment into subsegments as well as regions of interest and outputs the subsegments on lines 316. The locator 310 also outputs descriptors 312 defining the structure of each subsegment and ROI, and outputs timecode and ROI identifiers 314. Further descriptors for an ROI may include a mean or variance in brightness or, for example, if the region is a flat region or contains edges, descriptors corresponding to the region's characteristics.

The subsegments 316 output from the locator 310 may be spatially/temporally down-sampled by a preprocessor 320. However, depending on the locator signals 312 and 314, an exception may be made to retain full quality for certain subsegments or ROIs. The operation of the downsampling processor is similar to that of similar processors used in FIG. 1 and FIG. 2. The preprocessor 320 outputs on lines 324 down-sampled segments that are temporarily stored in a buffer 326 to await encoding. Buffer outputs 328 make the segments available for further processing. The signal 322 optionally carries information regarding what filters were used prior to downsampling to reduce aliasing, such that an appropriate set of filters can be employed for upsampling at the decoding end.
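The ROI-locating convention described above (a corner coordinate, a size in the x and y dimensions, and the timecode of the frame or frames in which the ROI occurs) maps naturally onto a small record type. The field names below are illustrative only, not taken from the patent.

```python
# Hypothetical record for a located region of interest, following the
# convention described in the text: corner coordinates, x/y size, timecode.
from dataclasses import dataclass

@dataclass
class ROI:
    x: int            # top-left corner, x coordinate
    y: int            # top-left corner, y coordinate
    width: int        # size in the x dimension
    height: int       # size in the y dimension
    timecode: str     # frame(s) in which the ROI occurs

    def contains(self, px, py):
        """True if pixel (px, py) falls inside the region of interest."""
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

# e.g., the moving tennis player from the example in the text
player = ROI(x=120, y=40, width=64, height=128, timecode="00:01:12:05")
```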