AI for Video Understanding
== <span style="color: #FFFFFF;">Remembering</span> ==
* '''Action recognition''': Classifying what action or activity is being performed in a video clip.
* '''Temporal modeling''': Modeling how video content changes over time; the core challenge of video AI.
* '''Two-Stream Network''': An early, influential architecture that combines a spatial stream (RGB frames) with a temporal stream (optical flow) for action recognition.
* '''3D CNN''': A network whose convolutions span both the spatial and temporal dimensions; captures short-range motion patterns.
* '''I3D (Inflated 3D ConvNet)''': Inflates 2D ImageNet-trained weights to 3D; a seminal video understanding architecture.
* '''Video Transformer (ViViT, TimeSformer)''': Transformer architectures for video that apply self-attention over space and time.
* '''Optical flow''': A dense field of pixel motion vectors between consecutive frames; the classical representation of video motion.
* '''Temporal grounding''': Locating the start and end time of a described event in a video.
* '''Video captioning''': Generating natural language descriptions of video content.
* '''Video QA''': Answering natural language questions about video content.
* '''Kinetics dataset''': A large-scale action recognition benchmark with 400 to 700 action classes and roughly 240K to 650K video clips, depending on the version.
* '''ActivityNet''': A large benchmark for activity recognition and dense video captioning.
* '''SlowFast Networks''': A two-pathway network. The slow pathway runs at a low frame rate with high channel capacity to capture spatial semantics; the fast pathway runs at a high frame rate with low channel capacity to capture motion. Together they model different temporal granularities.
* '''Video diffusion models''': Models that apply the diffusion framework to generate realistic video sequences; examples include Sora and the models from Runway and Pika.
* '''Long-form video understanding''': Reasoning about events across minutes or hours of video; challenging for models with limited temporal context.
</div>
<div style="background-color: #006400; color: #FFFFFF; padding: 20px; border-radius: 8px; margin-bottom: 15px;">
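The I3D entry in the glossary describes inflating 2D weights to 3D. As a rough illustration, the sketch below repeats a 2D kernel along the time axis and divides by the temporal depth, so that a clip made of one repeated frame gives the same response as the original 2D filter on that frame. The function names (<code>inflate_kernel</code>, <code>response_2d</code>, <code>response_3d</code>) and the toy values are hypothetical and chosen only for this example; real implementations operate on full weight tensors in a deep learning framework.

```python
def inflate_kernel(kernel_2d, time_depth):
    """Inflate a 2D kernel (list of rows) into a 3D kernel with `time_depth` slices.

    Each temporal slice is the 2D kernel scaled by 1/time_depth, so summing the
    slices over time reproduces the original 2D weights.
    """
    return [
        [[w / time_depth for w in row] for row in kernel_2d]
        for _ in range(time_depth)
    ]


def response_2d(patch, kernel_2d):
    """Dot product of one image patch with a 2D kernel (a single conv output)."""
    return sum(
        p * w
        for patch_row, kernel_row in zip(patch, kernel_2d)
        for p, w in zip(patch_row, kernel_row)
    )


def response_3d(clip, kernel_3d):
    """Dot product of a stack of patches (one per frame) with a 3D kernel."""
    return sum(response_2d(frame, k) for frame, k in zip(clip, kernel_3d))


# Toy data (assumed for illustration): a Sobel-like filter and a 3x3 patch.
kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
patch = [[3.0, 1.0, 0.0], [4.0, 1.0, 0.0], [5.0, 1.0, 2.0]]

static_clip = [patch] * 4                      # the same frame repeated 4 times
kernel_3d = inflate_kernel(kernel, time_depth=4)

# On a static clip, the inflated kernel matches the 2D kernel's response.
print(response_2d(patch, kernel), response_3d(static_clip, kernel_3d))
```

The 1/time_depth scaling is what lets the inflated network start from the behavior of the pretrained 2D network before it is fine-tuned on video.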