Research Highlights

For the complete publication list, please refer to Google Scholar.

Autonomous Driving

CoPlanner: An Interactive Motion Planner with Contingency-Aware Diffusion for Autonomous Driving

Accurate trajectory prediction and motion planning are crucial for autonomous driving in complex, interactive environments with multimodal uncertainties. Current generation-then-evaluation frameworks construct multiple plausible trajectory hypotheses but adopt a single outcome, leading to overconfident decisions and lacking fallback strategies. Decoupling prediction and planning may result in socially inconsistent joint trajectories. We propose a contingency-aware diffusion planner (CoPlanner), a unified framework that jointly models multi-agent interactive trajectory generation and contingency-aware motion planning. CoPlanner preserves feasible fallback options and enhances robustness under uncertainty, consistently surpassing state-of-the-art methods on the nuPlan benchmark in safety and comfort.

VLM-E2E: Enhancing End-to-End Autonomous Driving with Multimodal Driver Attention Fusion

Human drivers excel at navigating complex scenarios by leveraging rich attentional semantics, whereas current autonomous systems often lose crucial semantic information when converting 2D observations into 3D space, limiting their effectiveness in dynamic environments. To address this challenge, we propose VLM-E2E, a novel framework that utilizes the superior scene understanding and reasoning capabilities of Vision-Language Models (VLMs) to enhance training through attentional cues. Our method integrates textual representations into Bird's-Eye-View (BEV) features for semantic supervision, enabling the model to learn richer feature representations that explicitly capture driver attentional semantics and align more closely with human-like driving behavior. Furthermore, we introduce a learnable BEV-Text weighted fusion strategy to dynamically balance the contributions of BEV and text features, effectively addressing the modality importance imbalance in multimodal fusion. This approach ensures that complementary visual and textual information is fully leveraged, resulting in more holistic and robust representations of driving environments.
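One simple way to picture a learnable BEV-Text weighted fusion is as two trainable logits passed through a softmax, which keeps both modality weights positive and summing to one. The sketch below is only that, a sketch under stated assumptions. The parameterization and the names are ours, not necessarily VLM-E2E's actual design. It also assumes both features have already been projected to a common dimension. In a real model the logits would be learned by backpropagation together with the rest of the network.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def weighted_fusion(bev: List[float], text: List[float], logits: List[float]) -> List[float]:
    """Fuse same-length BEV and text feature vectors with softmax-normalized weights.

    logits: two trainable scalars [bev_logit, text_logit] (hypothetical parameterization).
    """
    if len(bev) != len(text):
        raise ValueError("BEV and text features must have the same dimension")
    w_bev, w_text = softmax(logits)
    return [w_bev * b + w_text * t for b, t in zip(bev, text)]
```

With equal logits the two modalities contribute equally, so `weighted_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])` returns `[2.0, 3.0]`. Training can push the logits apart when one modality proves more informative.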

OmniReason: A Temporal-Guided Vision-Language-Action Framework for Autonomous Driving

Recent advances in vision-language models (VLMs) have demonstrated impressive spatial reasoning capabilities for autonomous driving, yet existing methods predominantly focus on static scene understanding while neglecting the essential temporal dimension of real-world driving scenarios. To address this limitation, we propose the OmniReason framework, which establishes robust spatiotemporal reasoning by jointly modeling dynamic 3D environments and their underlying decision-making processes. Our work makes two fundamental advances: (1) We introduce OmniReason-Data, two large-scale vision-language-action (VLA) datasets with dense spatiotemporal annotations and natural language explanations, generated through a hallucination-mitigated auto-labeling pipeline that ensures physical plausibility and temporal coherence; (2) We develop the OmniReason-Agent architecture, which integrates a sparse temporal memory module for persistent scene context modeling and an explanation generator that produces human-interpretable decision rationales, facilitated by our spatiotemporal knowledge distillation approach that captures spatiotemporal causal reasoning patterns.

HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation

While Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge, conventional single-agent RAG remains fundamentally limited in resolving complex queries demanding coordinated reasoning across heterogeneous data ecosystems. We present HM-RAG, a novel Hierarchical Multi-agent Multimodal RAG framework that pioneers collaborative intelligence for dynamic knowledge synthesis across structured, unstructured, and graph-based data. The framework is composed of a three-tiered architecture with specialized agents: a Decomposition Agent that dissects complex queries into contextually coherent sub-tasks via semantic-aware query rewriting and schema-guided context augmentation; Multi-source Retrieval Agents that carry out parallel, modality-specific retrieval using plug-and-play modules designed for vector, graph, and web-based databases; and a Decision Agent that uses consistency voting to integrate multi-source answers and resolve discrepancies in retrieval results through Expert Model Refinement.
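The Decision Agent's consistency-voting step can be roughly illustrated as follows. The sketch takes one answer per retrieval agent, normalizes the strings, and accepts the most common answer only if it wins a strict majority. Otherwise it returns `None`, which marks where a refinement step such as HM-RAG's Expert Model Refinement would take over. Exact-match voting on normalized strings is a simplifying assumption. The actual framework may compare answers semantically.

```python
from collections import Counter
from typing import List, Optional, Tuple


def normalize(answer: str) -> str:
    # Lowercase and collapse whitespace so trivially different strings count as one vote.
    return " ".join(answer.lower().strip().split())


def consistency_vote(answers: List[str], min_agreement: float = 0.5) -> Tuple[Optional[str], float]:
    """Return (majority answer, vote share), or (None, share) if no answer's share
    exceeds min_agreement, signalling that further refinement is needed."""
    if not answers:
        return None, 0.0
    counts = Counter(normalize(a) for a in answers)
    best, votes = counts.most_common(1)[0]
    share = votes / len(answers)
    return (best if share > min_agreement else None), share
```

For example, answers `["Paris", "paris ", "Lyon"]` from three retrieval agents yield `("paris", 2/3)`, while a 1-to-1 split yields `None` and would be escalated.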

SEG-Parking: Towards Safe, Efficient, and Generalizable Autonomous Parking via End-to-End Offline Reinforcement Learning

Autonomous parking is a critical component for achieving safe and efficient urban autonomous driving. However, unstructured environments and dynamic interactions pose significant challenges to autonomous parking tasks. To address this problem, we propose SEG-Parking, a novel end-to-end offline reinforcement learning (RL) framework to achieve interaction-aware autonomous parking. Notably, a specialized parking dataset is constructed for parking scenarios, which include those without interference from the opposite vehicle (OV) and complex ones involving interactions with the OV. Based on this dataset, a goal-conditioned state encoder is pretrained to map the fused perception information into the latent space. Then, an offline RL policy is optimized with a conservative regularizer that penalizes out-of-distribution actions. Extensive closed-loop experiments are conducted in the high-fidelity CARLA simulator. Comparative results demonstrate the superior performance of our framework with the highest success rate and robust generalization to out-of-distribution parking scenarios.
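The summary does not specify SEG-Parking's conservative regularizer. A widely used penalty with exactly this behavior is the Conservative Q-Learning (CQL) term. It pushes down a soft maximum of Q over all actions and pushes up Q for the action actually recorded in the dataset, so actions never seen in the data are not overestimated. The sketch below shows the discrete-action form as an illustration of the mechanism only. A parking policy likely acts in a continuous space, where the log-sum-exp is usually estimated by sampling actions. The weight `alpha` is a hypothetical value.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    # Stable log(sum(exp(v))) via the max trick.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conservative_penalty(q_values: Sequence[Sequence[float]], data_actions: Sequence[int]) -> float:
    """CQL(H)-style penalty for a discrete action space.

    q_values[i][a]: Q estimate for state i and action a.
    data_actions[i]: action taken in the offline dataset at state i.
    Each term is non-negative and is small only when the dataset action
    already dominates the Q values of all other (possibly OOD) actions.
    """
    terms = [logsumexp(q) - q[a] for q, a in zip(q_values, data_actions)]
    return sum(terms) / len(terms)


def total_loss(td_loss: float, penalty: float, alpha: float = 1.0) -> float:
    # alpha trades Bellman error against conservatism (hypothetical default).
    return td_loss + alpha * penalty
```

The design choice here is that the penalty is relative rather than absolute. It only cares whether unseen actions score higher than the logged one, not about the overall Q scale.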

TSCLIP: Robust CLIP Fine-Tuning for Worldwide Cross-Regional Traffic Sign Recognition

Traffic sign recognition is essential for navigation and traffic control but suffers from performance drops under cross-regional domain shifts. We present TSCLIP, a robust fine-tuning approach leveraging the CLIP vision-language model for worldwide traffic sign recognition. TSCLIP introduces a prompt engineering scheme that integrates scene descriptions and traffic rules to generate semantically rich text prompts, and employs adaptive dynamic weight ensembling (ADWE) to combine zero-shot CLIP knowledge with newly learned representations. This design enhances generalization while effectively adapting to diverse regional traffic sign data, marking the first use of CLIP for global cross-regional traffic sign recognition.
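ADWE's adaptive rule is not described here, so the sketch below shows only the idea underneath it: weight-space ensembling. Each parameter tensor is linearly interpolated between the zero-shot and fine-tuned checkpoints with a coefficient `alpha`. This assumes both checkpoints share the same architecture. In this sketch `alpha` is a fixed input, whereas ADWE presumably adjusts the weighting dynamically. Parameters are stored as flat lists keyed by name only to keep the example self-contained.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def ensemble_weights(zero_shot: Weights, fine_tuned: Weights, alpha: float) -> Weights:
    """Interpolate parameters: (1 - alpha) * zero_shot + alpha * fine_tuned.

    alpha = 0 keeps the zero-shot CLIP weights; alpha = 1 keeps the fine-tuned ones.
    A fixed alpha is an assumption; ADWE's adaptive weighting is not modeled here.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if zero_shot.keys() != fine_tuned.keys():
        raise ValueError("parameter names must match")
    return {
        name: [(1.0 - alpha) * z + alpha * f for z, f in zip(zero_shot[name], fine_tuned[name])]
        for name in zero_shot
    }
```

Intermediate values of `alpha` keep part of the zero-shot knowledge, which is the motivation for ensembling rather than simply replacing the pretrained weights.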

CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation

Curb detection is vital for defining drivable areas in intelligent driving, yet remains challenging due to complex road environments. This paper presents CurbNet, a novel framework for curb detection based on LiDAR point cloud segmentation. To support research in this domain, we construct 3D-Curb, the largest and most diverse curb point cloud dataset with 3D annotations derived from SemanticKITTI. CurbNet exploits the height variation characteristic of curbs by leveraging spatially rich 3D point clouds and introducing a Multi-Scale and Channel Attention (MSCA) module to enhance detection of features distributed unevenly in space. We further design an adaptive weighted loss function group to address class imbalance in curb data. Comprehensive validation shows that CurbNet achieves robust, accurate curb detection with strong generalization in diverse real-world scenarios.
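CurbNet's loss group is described as adaptive, and its exact form is not given here. A standard baseline for the same class-imbalance problem is inverse-frequency class weighting of the cross-entropy. The sketch below shows that baseline as an assumption, not CurbNet's actual loss. Label 1 stands for curb points, which is a hypothetical labeling.

```python
import math
from collections import Counter
from typing import List, Sequence


def inverse_frequency_weights(labels: Sequence[int], num_classes: int) -> List[float]:
    """Weight each class by total / (num_classes * count), so rare classes
    (e.g., curb points) weigh as much in aggregate as common ones.
    Classes absent from the labels get weight 0."""
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) if counts[c] else 0.0 for c in range(num_classes)]


def weighted_cross_entropy(probs: Sequence[Sequence[float]], labels: Sequence[int],
                           weights: Sequence[float]) -> float:
    """Weighted mean negative log-likelihood over points.
    probs[i][c] is the predicted probability of class c for point i."""
    num = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den
```

For labels `[0, 0, 0, 1]` the weights are roughly `[0.67, 2.0]`, so the single curb point counts as much as the three background points combined.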

Trajectory Tree-Based Pairwise Game for Interactive Decision-Making and Motion Planning in Autonomous Driving

This paper presents a trajectory tree-based pairwise game approach for interactive decision-making and motion planning. It formulates multi-vehicle interaction as multiple pairwise games, each of which is modeled by trajectory trees. Furthermore, a novel utility function identification method is proposed to recover the utilities of opponent vehicles from limited state observations. Simulations demonstrate significant improvements in computational efficiency and travel safety across different multi-vehicle interaction scenarios.

Monocular 3D Lane Detection for Autonomous Driving: Recent Achievements, Challenges, and Outlooks

3D lane detection is vital for autonomous driving, enabling vehicles to perceive road geometry in 3D for safe navigation. Due to the cost of sensors and the richness of visual data, monocular-based approaches have gained significant attention. However, current methods remain unreliable, hindering full vision-based autonomy. This review summarizes recent advances, compares algorithm performance and complexity, discusses datasets and unresolved challenges, and suggests future directions.

LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios

This work presents LD-Scene, a novel framework that integrates Large Language Models (LLMs) with Latent Diffusion Models (LDMs) for user-controllable adversarial driving scenario generation through natural language. It comprises an LDM that captures realistic driving trajectory distributions and an LLM-based guidance module that translates user queries into adversarial guidance functions, facilitating the generation of scenarios aligned with user queries. The guidance module integrates an LLM-based Chain-of-Thought (CoT) code generator and an LLM-based code debugger, enhancing the controllability and robustness in generating guidance functions.

NetRoller: Interfacing General and Specialized Models for End-to-End Autonomous Driving

Integrating General Models (GMs) like Large Language Models with Specialized Models (SMs) offers a promising way to address data diversity and capacity limitations in autonomous driving. To bridge the inherent asynchrony between GMs and SMs, we propose NetRoller, an adapter with three key mechanisms: (1) early-stopped reasoning to extract efficient, context-rich representations from GMs, (2) learnable and positional embeddings for robust cross-modality translation, and (3) lightweight Query/Feature Shift tuning to boost SM performance. With these designs, NetRoller allows SMs to run at native frequencies while leveraging GM insights, achieving notable gains in planning safety and perception accuracy.

PlanScope: Learning to Plan Within Decision Scope Does Matter

In autonomous driving, learning-based planning often suffers from reasoning disturbance due to unpredictable events in driving logs, such as sudden obstacles or traffic signal changes. To address this, we propose identifying decisions with their time horizons and retaining only those within derivable horizons, thereby mitigating irrational behaviors. A key implementation is temporal batch normalization, which proves particularly effective and consistently improves driving scores in closed-loop evaluations on the nuPlan motion planning leaderboard. This method also serves as a plug-and-play enhancement for other learning-based planning models.

CoDriveVLM: VLM-Enhanced Urban Cooperative Dispatching and Planning for Future Autonomous Mobility on Demand Systems

This work introduces CoDriveVLM, a novel framework for integrated dispatching and cooperative planning in Autonomous Mobility-on-Demand (AMoD) systems. By leveraging Vision-Language Models (VLMs), CoDriveVLM enhances multimodal perception and enables high-fidelity dispatching with real-time collision risk assessment. A decentralized ADMM-based cooperative planning method ensures scalable, safe trajectory optimization with mutual avoidance among connected and autonomous vehicles (CAVs). Simulations validate the framework's robustness and effectiveness across diverse urban traffic conditions, demonstrating its potential to improve the efficiency and reliability of future AMoD services.

VLM-UDMC: VLM-Enhanced Unified Decision-Making and Motion Control for Urban Autonomous Driving

This work presents VLM-UDMC, a vision-language model (VLM)-enhanced framework for unified decision-making and motion control in urban autonomous driving. It integrates scene reasoning and risk-aware attention through a two-step RAG-based slow system that dynamically reconfigures motion planning using context-aware potential functions. A lightweight multi-kernel decomposed LSTM enables real-time, accurate trajectory prediction for diverse traffic participants. Validated through simulations and real-world tests on a full-size autonomous vehicle, VLM-UDMC improves driving performance with enhanced interpretability and adaptability to complex urban environments.

UDMC: Unified Decision-Making and Control Framework for Urban Autonomous Driving With Motion Prediction of Traffic Participants

This work presents UDMC, an interpretable and unified framework for Level 4 autonomous driving that integrates decision-making and motion control into a single optimal control problem (OCP). By modeling traffic participants and regulations through innovative potential functions and incorporating a dedicated motion prediction module, UDMC ensures safe, rule-compliant behavior in complex urban environments. The framework enables real-time execution of adaptive driving maneuvers while maintaining computational efficiency. High-fidelity CARLA simulations demonstrate its robustness and superior performance over baseline methods, with enhanced safety and traffic compliance.

Occlusion-Aware Contingency Safety-Critical Planning for Autonomous Vehicles

This research introduces an occlusion-aware contingency planning framework for real-time, safe, and efficient autonomous driving in dynamic, partially occluded environments. The framework jointly optimizes an exploratory trajectory to enhance situational awareness and a fallback trajectory to ensure safety, with a shared initial segment enabling smooth transitions. Leveraging consensus ADMM, the optimization is decomposed into low-dimensional convex subproblems, ensuring real-time coordination and computational efficiency. Extensive simulations and real-world experiments on a 1:10 scale autonomous vehicle demonstrate that our approach outperforms state-of-the-art baselines in terms of safety and efficiency under occluded traffic conditions.

Fast and Scalable Game-Theoretic Trajectory Planning with Intentional Uncertainties

While game-theoretic methods are acknowledged for their effectiveness in modeling multi-agent interactions, they are known to be computationally heavy, especially when intentional uncertainties are involved. We propose a novel game-theoretic interactive trajectory planning method with intentional uncertainties that achieves both high computational efficiency and enhanced scalability. We model the interactions between agents with intentional uncertainties as a general Bayesian game; since this game is equivalent to a potential game, its equilibrium can be obtained by solving a unified optimization problem. We further present a distributed algorithm based on dual consensus ADMM, tailored to solving this problem in parallel, which significantly enhances computational efficiency and scalability.
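The toy problem below gives a feel for why a consensus formulation parallelizes well. It applies standard global consensus ADMM (scaled form), not the dual consensus variant named above, to a problem in which each agent owns one scalar quadratic cost. The local `x` updates are independent of each other and could run in parallel. Agents interact only through the averaging step. The toy cost is an assumption standing in for the much richer trajectory objectives of the actual method.

```python
from typing import List, Tuple


def consensus_admm(a: List[float], b: List[float], rho: float = 1.0,
                   iters: int = 200, tol: float = 1e-9) -> Tuple[float, int]:
    """Global consensus ADMM (scaled form) for
        minimize sum_i 0.5 * a[i] * (x - b[i])**2,  with a[i] > 0.
    Agent i keeps a local copy x_i; consensus forces all copies to agree on z.
    Returns (z, iterations used)."""
    n = len(a)
    x = [0.0] * n
    u = [0.0] * n  # scaled dual variables, one per agent
    z = 0.0
    for k in range(1, iters + 1):
        # Local step: closed-form minimizer of each agent's augmented cost (parallelizable).
        x = [(a[i] * b[i] + rho * (z - u[i])) / (a[i] + rho) for i in range(n)]
        z_old = z
        # Consensus step: average the local copies plus their duals.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: accumulate each agent's disagreement with the consensus value.
        u = [u[i] + x[i] - z for i in range(n)]
        primal = max(abs(x[i] - z) for i in range(n))
        dual = rho * abs(z - z_old)
        if primal < tol and dual < tol:
            return z, k
    return z, iters
```

The exact optimum here is the weighted mean `sum(a*b) / sum(a)`. For `a = [1, 3]` and `b = [0, 4]` that is 3, which the iteration approaches geometrically.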

DSDrive: Distilling Large Language Model for Lightweight End-to-End Autonomous Driving with Unified Reasoning and Planning

This study proposes DSDrive, a streamlined end-to-end paradigm tailored for integrating the reasoning and planning of autonomous vehicles into a unified framework. Our approach addresses two challenges: the high computational demands of LLMs, and the mapping of high-level textual reasoning to low-level trajectory planning for autonomous vehicles. Specifically, DSDrive leverages a compact LLM that employs a distillation method to preserve the enhanced reasoning capabilities of a larger vision-language model (VLM). A waypoint-driven dual-head coordination module is further developed to effectively align the reasoning and planning tasks. DSDrive has been thoroughly tested in closed-loop simulations, where it performs on par with benchmark models and even outperforms them on many key metrics, all while being more compact in size.

CALMM-Drive: Confidence-Aware Autonomous Driving with Large Multimodal Model

This study proposes CALMM-Drive, a novel Confidence-Aware Large Multimodal Model (LMM) empowered Autonomous Driving framework. Our approach employs Top-K confidence elicitation, which facilitates the generation of multiple candidate decisions along with their confidence levels. Furthermore, we propose a novel planning module that integrates a diffusion model for trajectory generation and a hierarchical refinement process to find the optimal path. This framework enables the selection of the best plan accounting for both low-level solution quality and high-level tactical confidence, which mitigates the risks of one-shot decisions and overcomes the limitations induced by short-sighted scoring mechanisms.

Integrating Decision-Making Into Differentiable Optimization Guided Learning for End-to-End Planning of Autonomous Vehicles

This research highlights the development of an end-to-end planning framework that enhances driving performance beyond mere imitation of expert demonstrations. By framing decision-making and trajectory planning as a differentiable nonlinear optimization problem, the framework effectively integrates with a learning-based approach while preserving intrinsic constraints throughout the learning process. Validated on the Waymo Open Motion dataset, it consistently outperforms baseline methods. We provide a thorough analysis of how optimized decisions contribute to overall enhancements in driving performance.

Synergizing Decision Making and Trajectory Planning Using Two-Stage Optimization for Autonomous Vehicles

This paper presents a local planner that combines decision-making and trajectory planning for autonomous driving, structured as a nonlinear programming problem. To address the complexities of mixed-integer programming, a two-stage optimization (TSO) approach is proposed. Our closed-loop simulations in CARLA highlight its adaptability to changing driving conditions with high computational efficiency.

Game-Theoretic Driver Modeling and Decision-Making for Autonomous Driving with Temporal-Spatial Attention-Based Deep Q-Learning

A temporal-spatial attention-based deep Q-learning (TSA-DQN) algorithm is developed to estimate the decision levels of surrounding vehicles and optimize the ego vehicle's decisions. Simulations demonstrate improved safety, efficiency, and success rates over baselines in various driving scenarios.

A Universal Multi-Vehicle Cooperative Decision-Making Approach in Structured Roads by Mixed-Integer Potential Game

This paper proposes a universal game-theoretic method for multi-vehicle cooperative decision-making on structured roads. We transform the decision-making problem into a graph path-searching problem within a waypoint graph framework. The problem is first formulated as a mixed-integer linear programming (MILP) problem and then transformed into a mixed-integer potential game (MIPG). Two Gauss-Seidel algorithms for cooperative decision-making are presented to solve the MIPG problem.
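The Gauss-Seidel idea can be illustrated on a much smaller game than the paper's MIPG. In the hypothetical lane-choice game below, each vehicle pays its own lane-preference cost plus one unit per other vehicle sharing its lane. That makes it a congestion game, and hence an exact potential game. Vehicles update one at a time using the latest choices of the others, which is what distinguishes Gauss-Seidel from a simultaneous (Jacobi) update. This illustrates the update scheme only. It is not either of the paper's two algorithms.

```python
from typing import List


def player_cost(i: int, lane: int, choice: List[int], pref: List[List[float]]) -> float:
    # Own preference cost plus congestion: number of *other* vehicles in the same lane.
    others = sum(1 for j, l in enumerate(choice) if j != i and l == lane)
    return pref[i][lane] + others


def gauss_seidel_best_response(pref: List[List[float]], max_sweeps: int = 100) -> List[int]:
    """Sequential best-response dynamics for a lane-choice congestion game.

    pref[i][l]: hypothetical preference cost of vehicle i for lane l.
    Each vehicle moves only if it strictly lowers its own cost, using the
    latest choices of the others. In a finite potential game such strict
    improvements cannot cycle, so the loop ends at a pure Nash equilibrium.
    """
    n_players, n_lanes = len(pref), len(pref[0])
    # Start with every vehicle in its individually preferred lane.
    choice = [min(range(n_lanes), key=lambda l: pref[i][l]) for i in range(n_players)]
    for _ in range(max_sweeps):
        changed = False
        for i in range(n_players):
            current = player_cost(i, choice[i], choice, pref)
            best = min(range(n_lanes), key=lambda l: player_cost(i, l, choice, pref))
            if player_cost(i, best, choice, pref) < current:
                choice[i] = best  # later vehicles in this sweep see the update immediately
                changed = True
        if not changed:
            break
    return choice
```

With three vehicles that all mildly prefer lane 0 (`pref = [[0.0, 0.5]] * 3`), the first vehicle moves to lane 1 and the others stay, giving `[1, 0, 0]`. No vehicle can then lower its cost by switching.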

Integrated Decision Making and Trajectory Planning for Autonomous Driving Under Multimodal Uncertainties: A Bayesian Game Approach

This research proposes an innovative integrated decision-making and trajectory planning framework for autonomous vehicles. The approach models multimodal interactions of traffic participants as a general Bayesian game, and the corresponding Bayesian coarse correlated equilibrium (Bayes-CCE) reveals the optimal decision and planning scheme under multimodal uncertainties.

LMMCoDrive: Cooperative Driving with Large Multimodal Model

This research introduces LMMCoDrive, a framework for decentralized cooperative scheduling and motion planning in Autonomous Mobility-on-Demand (AMoD) systems. It integrates scheduling and motion planning using a Large Multimodal Model (LMM) with a BEV representation, refining CAV trajectories while ensuring safety. A decentralized ADMM-based optimization strategy evolves the CAV graph. Simulations highlight the LMM's effectiveness in improving traffic efficiency and enabling safe, practical AMoD operations.

Safe and Real-Time Consistent Planning for Autonomous Vehicles in Partially Observed Environments via Parallel Consensus Optimization

This research introduces a consistent parallel trajectory optimization (CPTO) approach for real-time, consistent, and safe trajectory planning for autonomous driving in partially observed environments. The CPTO framework introduces a consensus safety barrier module, ensuring that each generated trajectory maintains a consistent and safe segment, even when faced with varying levels of obstacle detection accuracy. We validate our CPTO framework through extensive comparisons with state-of-the-art baselines across multiple driving tasks in partially observable environments. Our results demonstrate improved safety and consistency using both synthetic and real-world traffic datasets.

Barrier-Enhanced Parallel Homotopic Trajectory Optimization for Safety-Critical Autonomous Driving

This research seamlessly integrates discrete decision-making maneuvers with continuous trajectory variables for safety-critical autonomous driving. The algorithm operates in real-time, optimizing trajectories of autonomous vehicles to ensure safety, stability, and proactive interaction with uncertain human-driven vehicles across various driving tasks, utilizing over-relaxed ADMM iterations. We provide a comprehensive theoretical analysis of safety and computational efficiency.
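The role of over-relaxation can be seen on a toy problem. The sketch below applies over-relaxed ADMM to a separable quadratic with box bounds. The box is an assumed stand-in for a safety corridor, and the projection step enforces it exactly at every iteration. The problem, the closed-form `x` update, and the parameter values are illustrative assumptions. The paper's barrier-enhanced, homotopic formulation is considerably richer.

```python
from typing import List, Tuple


def over_relaxed_admm_box(q: List[float], r: List[float], lo: List[float], hi: List[float],
                          rho: float = 1.0, alpha: float = 1.6,
                          iters: int = 500, tol: float = 1e-9) -> Tuple[List[float], int]:
    """Over-relaxed ADMM (scaled form) for
        minimize 0.5 * sum_i q[i] * (x[i] - r[i])**2   s.t.  lo <= x <= hi,
    split as f(x) + g(z) with x = z, where g is the indicator of the box.
    alpha in (1, 2) over-relaxes; alpha = 1 recovers plain ADMM."""
    n = len(q)
    z = [min(max(0.0, lo[i]), hi[i]) for i in range(n)]
    u = [0.0] * n
    for k in range(1, iters + 1):
        # x-update: closed-form minimizer of the quadratic plus the augmented term.
        x = [(q[i] * r[i] + rho * (z[i] - u[i])) / (q[i] + rho) for i in range(n)]
        # Over-relaxation: blend the new x with the previous z.
        x_hat = [alpha * x[i] + (1.0 - alpha) * z[i] for i in range(n)]
        z_old = z
        # z-update: projection onto the box (the hard "safety" set in this toy).
        z = [min(max(x_hat[i] + u[i], lo[i]), hi[i]) for i in range(n)]
        # Scaled dual update uses the relaxed iterate.
        u = [u[i] + x_hat[i] - z[i] for i in range(n)]
        primal = max(abs(x[i] - z[i]) for i in range(n))
        dual = rho * max(abs(z[i] - z_old[i]) for i in range(n))
        if primal < tol and dual < tol:
            return z, k
    return z, iters
```

The solution is simply each `r[i]` clipped to its bounds, which makes convergence easy to check. On this particular toy problem, `alpha = 1.6` needs fewer iterations than `alpha = 1.0`. That is the usual motivation for over-relaxation, though the speed-up is problem-dependent.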

Improved Consensus ADMM for Cooperative Motion Planning of Large-Scale Connected Autonomous Vehicles with Limited Communication

This research presents a parallel optimization algorithm for cooperative motion planning of large-scale CAVs under limited communication, achieving O(N) time complexity by leveraging sparsity and an improved consensus ADMM. A lightweight evolution strategy for managing small CAV groups further enhances computational efficiency. Validated within a receding horizon scheme, the method outperforms existing solvers in CARLA simulations with up to 100 vehicles, showcasing its efficiency, scalability, and effectiveness.

A Universal Cooperative Decision-Making Framework for Connected Autonomous Vehicles with Generic Road Topologies

This research proposes a general approach for optimal cooperative decision-making of connected autonomous vehicles (CAVs). The approach utilizes the graph representation of generic road topologies and reformulates the cooperative decision-making problem of CAVs as a mixed-integer linear program (MILP). The corresponding solution results in optimal cooperative decision-making, and simulations in various traffic scenarios demonstrate improved comfort, safety, and traffic efficiency.

Spatiotemporal Receding Horizon Control with Proactive Interaction Towards Autonomous Driving in Dense Traffic

This research proposes a computationally efficient spatiotemporal receding horizon control (ST-RHC) scheme to generate a safe, dynamically feasible, energy-efficient trajectory in control space, where different driving tasks in dense traffic can be achieved with high accuracy and safety in real time. The effectiveness of the proposed ST-RHC scheme is demonstrated through comprehensive comparisons with state-of-the-art algorithms on synthetic and real-world traffic datasets under dense traffic.

Decentralized iLQR for Cooperative Trajectory Planning of Connected Autonomous Vehicles via Dual Consensus ADMM

This research proposes a decentralized iterative LQR algorithm for cooperative trajectory planning of connected autonomous vehicles (CAVs) using dual consensus ADMM. The approach reformulates a non-convex problem into a series of convex ones, enabling parallel optimization. Real-time performance and scalability are achieved through efficient trajectory updates. Experiments show superior scalability and efficiency compared to baseline methods.

Robotics

ManiVID-3D: 3D Visual Reinforcement Learning for Robotic Manipulation

We propose ManiVID-3D, a novel 3D visual reinforcement learning architecture designed for robotic manipulation, which learns view-invariant representations through self-supervised disentangled feature learning. The framework incorporates ViewNet, a lightweight yet effective module that aligns point clouds from arbitrary viewpoints into a unified coordinate system without the need for extrinsic calibration. Additionally, we develop an efficient GPU-accelerated batch rendering module enabling large-scale training for 3D visual RL at unprecedented speeds. The system's robustness to severe view changes highlights the effectiveness of geometrically consistent representations for scalable robotic manipulation.

Embracing Bulky Objects with Humanoid Robots: Whole-Body Manipulation with Reinforcement Learning

This paper introduces a reinforcement learning framework that integrates a pre-trained human motion prior with a neural signed distance field (NSDF) representation to achieve robust whole-body embracing. Our method leverages a teacher-student architecture to distill large-scale human motion data, generating kinematically natural and physically feasible whole-body motion patterns. This facilitates coordinated control across the arms and torso, enabling stable multi-contact interactions that enhance both manipulation robustness and load capacity. The embedded NSDF further provides accurate and continuous geometric perception, improving contact awareness throughout long-horizon tasks.

FisheyeDepth: A Real-Scale Self-Supervised Depth Estimation Model for Fisheye Cameras

Accurate depth estimation is vital for 3D scene understanding in robotics, yet fisheye cameras face challenges from image distortion and scarce ground-truth data. We propose FisheyeDepth, a self-supervised depth estimation model specifically designed for fisheye imagery. Our method integrates a fisheye camera model into projection and reprojection during training, improving distortion handling and training stability. By using real-scale pose information instead of network-estimated poses, FisheyeDepth produces physically meaningful depth suitable for robotic applications. Additionally, a multi-channel output strategy fuses multi-scale features to enhance robustness against pose noise. This design enables reliable, true-scale depth estimation for autonomous driving and robotic perception.
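FisheyeDepth's specific camera model is not named here. As an assumption, the sketch below uses the simple equidistant model r = f·θ, where θ is the angle between the viewing ray and the optical axis. It shows the two operations a fisheye-aware self-supervised pipeline needs: projecting a 3D point to a pixel, and back-projecting a pixel with a known depth. In training, the back-projected point would be moved by the (real-scale) relative pose and projected into the neighboring frame to compute a photometric loss. Real lenses usually need extra distortion terms (for example, a polynomial in θ), which are omitted here. Depth is taken as distance along the ray rather than z-depth.

```python
import math
from typing import Tuple


def project_equidistant(p: Tuple[float, float, float], f: float,
                        cx: float, cy: float) -> Tuple[float, float]:
    """Project a camera-frame 3D point with the equidistant model r = f * theta."""
    x, y, z = p
    theta = math.atan2(math.hypot(x, y), z)  # angle from the optical (z) axis
    phi = math.atan2(y, x)                   # azimuth around the axis
    r = f * theta
    return cx + r * math.cos(phi), cy + r * math.sin(phi)


def unproject_equidistant(uv: Tuple[float, float], dist: float, f: float,
                          cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project a pixel to 3D given the Euclidean distance along its ray."""
    du, dv = uv[0] - cx, uv[1] - cy
    theta = math.hypot(du, dv) / f
    phi = math.atan2(dv, du)
    s = math.sin(theta)
    return (dist * s * math.cos(phi), dist * s * math.sin(phi), dist * math.cos(theta))
```

Unlike the pinhole model, the equidistant model stays finite for rays at or beyond 90 degrees from the axis. This is why fisheye pipelines cannot simply reuse the pinhole `x / z` projection.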

LOVON: Legged Open-Vocabulary Object Navigator

In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. To tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian Variance Filtering for visual stabilization. We also develop a functional execution logic for the robot that guarantees LOVON's capabilities in autonomous navigation, task adaptation, and robust task completion. Extensive evaluations demonstrate the successful completion of long-sequence tasks involving real-time detection, search, and navigation toward open-vocabulary dynamic targets. Furthermore, real-world experiments across different legged robots (Unitree Go2, B2, and H1-2) showcase the compatibility and appealing plug-and-play feature of LOVON.
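Variance of the Laplacian is a standard sharpness measure. Blur suppresses high-frequency edges, so the Laplacian response flattens and its variance drops. The pure-Python sketch below computes it with a 4-neighbor kernel on a grayscale image given as nested lists, then keeps only frames above a threshold. The threshold and the frame-dropping policy are hypothetical choices for illustration. LOVON's filter may differ in kernel, threshold, and how it handles rejected frames.

```python
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbor Laplacian over the image interior.
    Sharp frames give high values; motion-blurred (jittery) frames give low ones."""
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    lap: List[float] = []
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            lap.append(img[i - 1][j] + img[i + 1][j] + img[i][j - 1] + img[i][j + 1]
                       - 4.0 * img[i][j])
    mean = sum(lap) / len(lap)
    return sum((v - mean) ** 2 for v in lap) / len(lap)


def keep_sharp_frames(frames: Sequence[Image], threshold: float) -> List[int]:
    """Indices of frames whose Laplacian variance exceeds a (hypothetical) threshold."""
    return [k for k, frame in enumerate(frames) if laplacian_variance(frame) > threshold]
```

A uniform frame scores 0 and would be dropped, while a high-contrast checkerboard scores well above any small threshold.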

MonoGlass3D: Monocular 3D Glass Detection with Plane Regression and Adaptive Feature Fusion

Detecting and localizing glass in 3D environments is difficult due to glass’s optical properties and the absence of dedicated datasets. We introduce a new dataset with varied glass configurations and precise 3D annotations from real-world scenarios, and propose MonoGlass3D, a monocular 3D glass detection network. Our network performs glass segmentation and plane regression simultaneously. It features an adaptive feature fusion module for capturing diverse contextual information and a plane regression pipeline, which seamlessly integrates the geometric properties and semantic context, improving glass surface understanding.

Local Reactive Control for Mobile Manipulators With Whole-Body Safety in Complex Environments

We present a novel local reactive controller that reformulates the time-domain single-step problem into a multi-step optimization problem in the spatial domain, leveraging the propagation of a serial kinematic chain. This transformation facilitates the formulation of customized, decoupled link-specific constraints, which are then solved efficiently with augmented Lagrangian differential dynamic programming (AL-DDP). Our approach naturally absorbs spatial kinematic propagation in the forward pass and processes all link-specific constraints simultaneously during the backward pass, enhancing both constraint management and computational efficiency. Notably, in this framework, we formulate collision avoidance constraints for each link using accurate geometric models with extracted free regions, which improves the maneuverability of the mobile manipulator in narrow, cluttered spaces.

Interactive Navigation for Legged Manipulators with Learned Arm-Pushing Controller

Interactive navigation is crucial in scenarios where proactively interacting with objects can yield shorter paths, thus significantly improving traversal efficiency. Existing methods primarily focus on using the robot body to relocate obstacles during navigation. However, they prove ineffective in narrow or constrained spaces where the robot's dimensions restrict its manipulation capabilities. This paper introduces a novel interactive navigation framework for legged manipulators, featuring an active arm-pushing mechanism that enables the robot to reposition movable obstacles in space-constrained environments. To this end, we develop a reinforcement learning-based arm-pushing controller with a two-stage reward strategy for object manipulation. Specifically, this strategy first directs the manipulator to a designated pushing zone to achieve a kinematically feasible contact configuration. Then, the end effector is guided to maintain its position at appropriate contact points for stable object displacement while preventing toppling. The simulations validate the robustness of the arm-pushing controller, showing that the two-stage reward strategy improves policy convergence and long-term performance. Real-world experiments further demonstrate the effectiveness of the proposed navigation framework, which achieves shorter paths and reduced traversal time.

FLORES: A Reconfigured Wheel-Legged Robot for Enhanced Steering and Adaptability

We present FLORES (reconfigured wheel-legged robot for enhanced steering and adaptability), a novel robot featuring a unique front-leg design that replaces the conventional hip-roll degree of freedom (DoF) with hip-yaw DoFs. This innovation enables efficient movement on flat surfaces and greater adaptability on complex terrains, allowing seamless transitions between wheeled and legged locomotion. To leverage FLORES’s mechanical advantages, we develop a customized reinforcement learning controller with a tailored reward structure for this configuration. Our approach generates adaptive, multi-modal locomotion strategies and enables the robot to achieve novel, efficient gaits that combine the strengths of both movement modes. Experiments demonstrate FLORES's improved steering, navigation efficiency, and versatile mobility across diverse environments.

DuLoc: Life-Long Dual-Layer Localization in Changing and Dynamic Expansive Scenarios

Our paper introduces DuLoc, a robust and accurate localization framework that tightly couples LiDAR-inertial odometry with offline map-based localization. By integrating a prior global map with dynamic real-time local maps and incorporating a constant-velocity motion model, DuLoc significantly enhances localization repeatability, accuracy, and adaptability to changing environments. Extensive experiments were conducted on 32 Intelligent Guided Vehicles (IGVs) operating in ultra-large port areas, covering 2,856 hours of real-world data. The results demonstrate that DuLoc consistently outperforms state-of-the-art LiDAR localization systems in large-scale, dynamic outdoor scenarios, offering superior reliability and robustness against long-term environmental changes.
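A constant-velocity motion model typically predicts the next pose from the last two, giving scan-to-map registration a good initial guess. The planar (SE(2)) sketch below is a simplification under stated assumptions. DuLoc works with full 3D poses and fuses inertial data. The linear scaling of the pose increment for unequal time steps is also only a first-order approximation. The `Pose2D` type is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float
    y: float
    yaw: float  # radians


def predict_constant_velocity(prev: Pose2D, curr: Pose2D, dt_prev: float, dt_next: float) -> Pose2D:
    """Extrapolate the next pose assuming the body-frame velocity between
    prev and curr stays constant (first-order approximation for dt_next != dt_prev)."""
    # Relative motion prev -> curr, expressed in prev's body frame.
    dx, dy = curr.x - prev.x, curr.y - prev.y
    c, s = math.cos(prev.yaw), math.sin(prev.yaw)
    bx, by = c * dx + s * dy, -s * dx + c * dy
    dyaw = math.atan2(math.sin(curr.yaw - prev.yaw), math.cos(curr.yaw - prev.yaw))
    # Scale the increment to the next time step.
    k = dt_next / dt_prev
    bx, by, dyaw = bx * k, by * k, dyaw * k
    # Apply the scaled increment in curr's body frame.
    c2, s2 = math.cos(curr.yaw), math.sin(curr.yaw)
    return Pose2D(curr.x + c2 * bx - s2 * by, curr.y + s2 * bx + c2 * by, curr.yaw + dyaw)
```

For equal time steps, the prediction simply repeats the last relative transform. A vehicle that moved from (0, 0) to (1, 0) heading along x is predicted at (2, 0).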

On the Surprising Robustness of Sequential Convex Optimization for Contact-Implicit Motion Planning

We propose CRISP (Contact-implicit motion planning with Sequential convex programming), a robust solver that leverages sequential convex programming to tackle contact-implicit motion planning problems. Unlike traditional primal-dual approaches, CRISP focuses solely on the primal problem, solving convex quadratic programs with adaptive trust regions at each iteration. Our method demonstrates surprising robustness in discovering contact patterns with naive initializations, often succeeding even with all-zero starting points, while providing theoretical convergence guarantees to first-order stationary points.
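The core loop of a primal-only sequential convex method with an adaptive trust region can be sketched on a toy unconstrained problem. This is an illustration under stated assumptions, not CRISP itself: there are no contact or complementarity constraints, the Hessian is convexified by clipping its eigenvalues (our choice), and the trust-region update constants (0.25, 0.75, 0.1) are common textbook values. Each iteration solves a convex quadratic model inside a ball, then grows or shrinks the radius based on how well the model predicted the actual decrease. The Rosenbrock function is started from the all-zero point.

```python
import numpy as np


def rosenbrock(x):
    return (1 - x[0]) ** 2 + 100 * (x[1] - x[0] ** 2) ** 2


def rosenbrock_grad(x):
    return np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0] ** 2),
                     200 * (x[1] - x[0] ** 2)])


def rosenbrock_hess(x):
    return np.array([[2 - 400 * (x[1] - x[0] ** 2) + 800 * x[0] ** 2, -400 * x[0]],
                     [-400 * x[0], 200.0]])


def convexify(H, eps=1e-6):
    """Make the quadratic model convex by clipping eigenvalues from below."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.maximum(w, eps)) @ V.T


def solve_subproblem(g, B, delta):
    """Minimise g.d + 0.5 d'Bd subject to ||d|| <= delta (B positive definite)."""
    def step(lam):
        return np.linalg.solve(B + lam * np.eye(len(g)), -g)
    d = step(0.0)
    if np.linalg.norm(d) <= delta:
        return d
    lo, hi = 0.0, 1.0
    while np.linalg.norm(step(hi)) > delta:   # bracket the multiplier
        hi *= 2.0
    for _ in range(100):                      # bisection: ||d(lam)|| decreases in lam
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(step(mid)) > delta:
            lo = mid
        else:
            hi = mid
    return step(hi)


def scp_trust_region(f, grad, hess, x0, delta=1.0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        B = convexify(hess(x))
        d = solve_subproblem(g, B, delta)
        predicted = -(g @ d + 0.5 * d @ B @ d)
        actual = f(x) - f(x + d)
        rho = actual / predicted if predicted > 0 else -1.0
        if rho < 0.25:                                         # poor model: shrink
            delta *= 0.25
        elif rho > 0.75 and np.linalg.norm(d) >= 0.99 * delta:  # good model at boundary: grow
            delta *= 2.0
        if rho > 0.1:                                          # accept sufficiently good steps
            x = x + d
    return x
```

Calling `scp_trust_region(rosenbrock, rosenbrock_grad, rosenbrock_hess, np.zeros(2))` should approach the minimiser (1, 1). The first full step from zero overshoots badly and is rejected, and the shrinking trust region then keeps subsequent steps within the region where the convex model is trustworthy.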

RoboDexVLM: Vision-Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation

FRTree Planner: Robot Navigation in Cluttered and Unknown Environments with Tree of Free Regions

We propose FRTree, a novel navigation framework that leverages a dynamic tree structure of free regions for robot navigation in cluttered, unknown environments with narrow passages. FRTree continuously incorporates real-time perceptive information to identify distinct navigation options and dynamically expands the tree toward explorable and traversable directions. Extensive simulations and real-world tests show that FRTree outperforms benchmark methods in generating safe and efficient motion plans in highly cluttered and unknown terrains.
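As a rough picture of the data structure, the sketch below models each free region as a collision-free disc linked to its parent, with a child added only when it overlaps its parent (so the robot can pass between them). Discs, the overlap test, and the class name `FreeRegion` are simplifying assumptions for illustration; FRTree's actual region representation and expansion rules are not reproduced here.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FreeRegion:
    """Node of a hypothetical free-region tree: a collision-free disc in the plane."""
    center: Tuple[float, float]
    radius: float
    # repr/compare disabled to avoid infinite recursion through parent <-> children links
    parent: Optional["FreeRegion"] = field(default=None, repr=False, compare=False)
    children: List["FreeRegion"] = field(default_factory=list, repr=False, compare=False)

    def overlaps(self, other: "FreeRegion") -> bool:
        # Discs overlap when the centre distance is below the sum of radii.
        return math.dist(self.center, other.center) < self.radius + other.radius

    def expand(self, center, radius) -> Optional["FreeRegion"]:
        """Attach a newly perceived free region if it is reachable from this node."""
        child = FreeRegion(center, radius, parent=self)
        if not self.overlaps(child):
            return None
        self.children.append(child)
        return child

    def branch_to_root(self) -> List[Tuple[float, float]]:
        """Region centres from the root to this node, i.e. one navigation option."""
        node, path = self, []
        while node is not None:
            path.append(node.center)
            node = node.parent
        return path[::-1]
```

Each leaf then corresponds to a distinct branch the planner could follow, and the chain of overlapping regions along that branch provides a collision-free corridor.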

GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting

This research introduces GS-LIVM, a real-time photo-realistic LiDAR-Inertial-Visual mapping framework with Gaussian Splatting tailored for outdoor scenes. Compared to existing methods based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), our approach enables real-time photo-realistic mapping while ensuring high-quality image rendering in large-scale unbounded outdoor environments.
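Rendering in Gaussian Splatting relies on front-to-back alpha compositing of depth-sorted splats, C = Σᵢ cᵢ αᵢ Πⱼ<ᵢ (1 − αⱼ). The sketch below shows that standard blending step for a single pixel. It assumes each αᵢ is already the splat opacity multiplied by its projected 2D Gaussian falloff at that pixel. It illustrates generic 3DGS rendering, not GS-LIVM's specific pipeline.

```python
import numpy as np


def composite_front_to_back(colors, alphas, min_transmittance=1e-4):
    """Blend depth-sorted splats: C = sum_i c_i a_i prod_{j<i} (1 - a_j)."""
    pixel = np.zeros(3)
    transmittance = 1.0                      # fraction of light not yet absorbed
    for c, a in zip(colors, alphas):
        pixel += transmittance * a * np.asarray(c, dtype=float)
        transmittance *= 1.0 - a
        if transmittance < min_transmittance:  # optional early stop once nearly opaque
            break
    return pixel
```

With a red splat in front and a green splat behind, both at α = 0.5, the pixel becomes [0.5, 0.25, 0.0]. The rear splat contributes only through the half of the light the front splat lets through.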

Arm-Constrained Curriculum Learning for Loco-Manipulation of a Wheel-Legged Robot

This research introduces an arm-constrained curriculum learning architecture to address the challenges introduced by mounting a manipulator on a wheel-legged robot. First, we develop an arm-constrained reinforcement learning algorithm to ensure safe and reliable control performance once the manipulator is equipped. Second, to address discrepancies in reward settings between the arm and the base, we propose a reward-aware curriculum learning method. The policy is first trained in Isaac Gym and then transferred to the physical robot to complete grasping tasks, including the door-opening task, the fan-twitching task, and the relay-baton-picking and following task. The results demonstrate that our proposed approach effectively controls the arm-equipped wheel-legged robot to master grasping abilities, including dynamic grasping skills that allow it to chase and catch a moving object while in motion. Please refer to our website for the code and supplemental videos.
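One way to picture a reward-aware curriculum is a schedule that raises the weight on the arm reward only after the base has become competent. The class below is a hypothetical sketch of that idea. The threshold, increment, and the name `RewardAwareCurriculum` are assumptions, not the method's actual promotion criteria.

```python
from dataclasses import dataclass


@dataclass
class RewardAwareCurriculum:
    """Hypothetical schedule: raise the arm-reward weight only once the base is competent."""
    base_threshold: float = 0.8  # assumed normalised base reward needed to advance a level
    step: float = 0.1            # weight increment per promotion
    arm_weight: float = 0.0      # current weight on the arm (grasping) reward

    def update(self, mean_base_reward: float) -> float:
        """Call once per evaluation window with the mean normalised base reward."""
        if mean_base_reward >= self.base_threshold:
            self.arm_weight = min(1.0, self.arm_weight + self.step)
        return self.arm_weight

    def total_reward(self, base_reward: float, arm_reward: float) -> float:
        return base_reward + self.arm_weight * arm_reward
```

Delaying the arm term this way keeps early training from trading base stability for grasping reward, which is the kind of arm/base reward conflict the curriculum is meant to resolve.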

Collision-Free Trajectory Optimization in Cluttered Environments Using Sums-of-Squares Programming

This research introduces a trajectory optimization framework for robot navigation in cluttered 3D environments by guiding robot motion through a graph of convex regions within collision-free space. A Sums-of-Squares (SOS) optimization problem is formulated to determine the minimum scaling factor for a region to contain the robot at a fixed configuration, and safety constraints are then established by limiting this scaling factor along the trajectory. A guiding direction, derived from the Lagrangian gradient at the SOS optimum, is integrated with the AL-iLQR algorithm to efficiently solve the nonlinear trajectory optimization problem.
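To build intuition for the scaling factor, consider the special case where both the convex region and the robot are polytopes. Scale the region {x : Ax ≤ b} about an interior point x₀. The smallest factor α that still contains the robot is then a maximum over vertex-face ratios, and α ≤ 1 means the robot fits. This closed form is a simplified stand-in for the SOS program, which handles more general geometry. It covers neither the Lagrangian gradient nor the AL-iLQR coupling, and the function name is illustrative.

```python
import numpy as np


def min_scaling_factor(A, b, x0, robot_vertices):
    """Smallest alpha such that {x : A (x - x0) <= alpha (b - A x0)} contains every
    robot vertex. alpha <= 1 means the robot lies inside the original region."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    V = np.asarray(robot_vertices, dtype=float)
    slack = b - A @ x0                      # positive when x0 is strictly inside
    if np.any(slack <= 0):
        raise ValueError("x0 must lie strictly inside the region")
    ratios = (V - x0) @ A.T / slack         # shape (num_vertices, num_faces)
    return float(ratios.max())              # convex robot fits iff all vertices fit
```

For the box [-1, 1]² scaled about the origin, a square robot of half-width 0.5 centred at the origin gives α = 0.5 (safe). The same robot centred at (0.8, 0) gives α = 1.3, which violates the α ≤ 1 safety constraint.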

Geometry-Aware Safety-Critical Local Reactive Controller for Robot Navigation in Unknown and Cluttered Environments

This research proposes a safety-critical local reactive controller for robot navigation in unknown environments. The trajectory tracking task is formulated as a constrained polynomial optimization problem with safety constraints imposed via Sum-of-Squares (SOS) certificates. The problem is convexified into a semidefinite program (SDP) using truncated multi-sequences and moment relaxation, enabling real-time performance.
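For reference, the generic moment relaxation behind this kind of convexification is shown below. It is the standard Lasserre-type hierarchy for a polynomial program, not the controller's specific formulation with SOS safety certificates. Here $y = (y_\alpha)_{|\alpha| \le 2k}$ is a truncated multi-sequence, $L_y(f) = \sum_\alpha f_\alpha y_\alpha$ replaces each monomial $x^\alpha$ by $y_\alpha$, $M_k(y)$ is the moment matrix, $M_{k-d_j}(g_j y)$ is the localizing matrix of $g_j$, and $d_j = \lceil \deg g_j / 2 \rceil$.

```latex
\begin{aligned}
&\text{Polynomial program:} && \min_{x \in \mathbb{R}^n} \; f(x)
  \quad \text{s.t.} \quad g_j(x) \ge 0, \; j = 1,\dots,m \\
&\text{Order-}k\text{ moment relaxation (SDP):} && \min_{y} \; L_y(f)
  \quad \text{s.t.} \quad y_0 = 1, \;
  M_k(y) \succeq 0, \;
  M_{k-d_j}(g_j\, y) \succeq 0, \; j = 1,\dots,m
\end{aligned}
```

Every feasible $x$ yields a feasible $y$ (its monomial evaluations), so the SDP optimum is a lower bound on the polynomial optimum that tightens as $k$ grows. Keeping $k$ small keeps the SDP small enough for real-time use.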
