Skip to content Skip to sidebar Skip to footer

Author page: admin

SAM2Long: A Training-Free Enhancement to SAM 2 for Long-Term Video Segmentation

Long Video Segmentation involves breaking down a video into certain parts to analyze complex processes like motion, occlusions, and varying light conditions. It has various applications in autonomous driving, surveillance, and video editing. It is challenging yet critical to accurately segment objects in long video sequences. The difficulty lies in handling extensive memory requirements and…

Read More

New generative AI tools open the doors of music creation

This work was made possible by core research and engineering efforts from Andrea Agostinelli, Zalán Borsos, George Brower, Antoine Caillon, Cătălina Cangea, Noah Constant, Michael Chang, Chris Deaner, Timo Denk, Chris Donahue, Michael Dooley, Jesse Engel, Christian Frank, Beat Gfeller, Tobenna Peter Igwe, Drew Jaegle, Matej Kastelic, Kazuya Kawakami, Pen Li, Ethan Manilow, Yotam Mann,…

Read More

LongAlign: A Segment-Level Encoding Method to Enhance Long-Text to Image Generation

The rapid progress of text-to-image (T2I) diffusion models has made it possible to generate highly detailed and accurate images from text inputs. However, as the length of the input text increases, current encoding methods, such as CLIP (Contrastive Language-Image Pretraining), encounter various limitations. These methods struggle to capture the full complexity of long text descriptions,…

Read More

Latent Action Pretraining for General Action models (LAPA): An Unsupervised Method for Pretraining Vision-Language-Action (VLA) Models without Ground-Truth Robot Action Labels

Vision-Language-Action Models (VLA) for robotics are trained by combining large language models with vision encoders and then fine-tuning them on various robot datasets; this allows generalization to new instructions, unseen objects, and distribution shifts. However, various real-world robot datasets mostly require human control, which makes scaling difficult. On the other hand, Internet video data offers…

Read More