Skip to content Skip to sidebar Skip to footer

Author page: admin

URBAN-SIM: Advancing Autonomous Micromobility with Scalable Urban Simulation

Micromobility solutions—such as delivery robots, mobility scooters, and electric wheelchairs—are rapidly transforming short-distance urban travel. Despite their growing popularity as flexible, eco-friendly transport alternatives, most micromobility devices still rely heavily on human control. This dependence limits operational efficiency and raises safety concerns, especially in complex, crowded city environments filled with dynamic obstacles like pedestrians and…

Read More

Meta AI Researchers Release MapAnything: An End-to-End Transformer Architecture that Directly Regresses Factored, Metric 3D Scene Geometry

A team of researchers from Meta Reality Labs and Carnegie Mellon University has introduced MapAnything, an end-to-end transformer architecture that directly regresses factored metric 3D scene geometry from images and optional sensor inputs. Released under Apache 2.0 with full training and benchmarking code, MapAnything advances beyond specialist pipelines by supporting over 12 distinct 3D vision…

Read More

NVIDIA AI Presents ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning

Estimated reading time: 5 minutes Introduction Embodied AI agents are increasingly being called upon to interpret complex, multimodal instructions and act robustly in dynamic environments. ThinkAct, presented by researchers from Nvidia and National Taiwan University, offers a breakthrough for vision-language-action (VLA) reasoning, introducing reinforced visual latent planning to bridge high-level multimodal reasoning…

Read More

IBM AI Releases Granite-Docling-258M: An Open-Source, Enterprise-Ready Document AI Model

IBM has released Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction—tables, code, equations, lists, captions, and reading order—emitting a structured, machine-readable representation rather than lossy Markdown. It is available on Hugging Face with a live demo and MLX build for Apple Silicon. What’s new compared to…

Read More

Strengthening our Frontier Safety Framework

We’re expanding our risk domains and refining our risk assessment process. AI breakthroughs are transforming our everyday lives, from advancing mathematics, biology and astronomy to realizing the potential of personalized education. As we build increasingly powerful AI models, we’re committed to responsibly developing our technologies and taking an evidence-based approach to staying ahead of emerging…

Read More

Gemini Robotics 1.5: DeepMind’s ER↔VLA Stack Brings Agentic Robots to the Real World

Can a single AI stack plan like a researcher, reason over scenes, and transfer motions across different robots—without retraining from scratch? Google DeepMind’s Gemini Robotics 1.5 says yes, by splitting embodied intelligence into two models: Gemini Robotics-ER 1.5 for high-level embodied reasoning (spatial understanding, planning, progress/success estimation, tool-use) and Gemini Robotics 1.5 for low-level visuomotor…

Read More