Teaching AVs the Language of Human Driving Behavior with Trajeglish

Teaching Autonomous Vehicles the Language of Human Driving

Summary

Autonomous vehicles (AVs) need to understand the nuances of human driving behavior to coexist safely and efficiently on the roads. A new approach, called Trajeglish, uses tools from discrete sequence modeling to simulate realistic multi-agent driving scenarios. By tokenizing motion in the same way language models tokenize words and phrases, Trajeglish can predict the future motion of vehicles, pedestrians, and cyclists based on their initial locations and interactions.

The Challenge of Human Driving Behavior

Human driving behavior is complex and diverse, making it challenging to simulate realistically. Traditional physics-based simulation models often fail to capture the intricacies of human behavior, leading to unrealistic scenarios. To address this, researchers have turned to data-driven approaches that learn from real-world driving logs.

Introducing Trajeglish

Trajeglish is a data-driven traffic model that treats multi-agent driving as a discrete sequence modeling problem. Each agent's motion is converted into tokens, much as a language model converts text into words and phrases, and the model is trained to predict the next token in the sequence. Sampling from the trained model produces future motion for vehicles, pedestrians, and cyclists, conditioned on their initial locations and on one another.

How Trajeglish Works

Trajeglish consists of two main components:

Tokenization: Trajeglish converts the motion in a driving scenario into discrete tokens, similar to how language models break down text into words and phrases. With a discrete vocabulary, predicting how agents move and interact becomes a next-token prediction problem.
Autoregressive Modeling: Trajeglish uses an autoregressive transformer-based architecture to model the distribution of tokenized scenarios. This enables the model to predict the next token in the sequence, given the context of the previous tokens.
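
To make the tokenization step concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the tokenizer used by Trajeglish: the five-entry vocabulary, the distance function, and the helper names (to_local, apply_token, tokenize, detokenize) are all hypothetical. The idea it shows is that each step of a logged trajectory is replaced by the index of the closest motion template, expressed in the agent's own frame.

```python
import math

# Hypothetical motion vocabulary (an assumption for illustration only).
# Each token is a (dx, dy, dheading) template in the agent's previous frame.
VOCAB = [
    (0.0, 0.0, 0.0),    # 0: stay still
    (1.0, 0.0, 0.0),    # 1: move forward 1 m
    (2.0, 0.0, 0.0),    # 2: move forward 2 m
    (1.0, 0.2, 0.1),    # 3: forward with a slight left turn
    (1.0, -0.2, -0.1),  # 4: forward with a slight right turn
]


def to_local(prev, curr):
    """Express state `curr` in the frame of `prev`; states are (x, y, heading)."""
    px, py, ph = prev
    cx, cy, ch = curr
    dx, dy = cx - px, cy - py
    cos_h, sin_h = math.cos(ph), math.sin(ph)
    local_x = cos_h * dx + sin_h * dy
    local_y = -sin_h * dx + cos_h * dy
    # Wrap the heading change into [-pi, pi).
    dh = (ch - ph + math.pi) % (2 * math.pi) - math.pi
    return (local_x, local_y, dh)


def apply_token(state, token_id):
    """Move `state` by the template that `token_id` refers to."""
    x, y, h = state
    dx, dy, dh = VOCAB[token_id]
    new_x = x + math.cos(h) * dx - math.sin(h) * dy
    new_y = y + math.sin(h) * dx + math.cos(h) * dy
    return (new_x, new_y, h + dh)


def _distance(template, local):
    """Assumed distance: squared error over position and heading."""
    return sum((t - l) ** 2 for t, l in zip(template, local))


def tokenize(trajectory):
    """Greedily map a list of (x, y, heading) states to token ids."""
    state = trajectory[0]
    tokens = []
    for target in trajectory[1:]:
        local = to_local(state, target)
        token_id = min(range(len(VOCAB)), key=lambda i: _distance(VOCAB[i], local))
        tokens.append(token_id)
        # Continue from the quantized state so rounding errors do not pile up.
        state = apply_token(state, token_id)
    return tokens


def detokenize(start, tokens):
    """Replay token ids from `start` to recover an approximate trajectory."""
    states = [start]
    for token_id in tokens:
        states.append(apply_token(states[-1], token_id))
    return states
```

For example, tokenize([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 0.0, 0.0)]) yields the token ids 1, 2, 0, and detokenize replays them into the same states. A real vocabulary would be far larger and learned or designed from driving logs; the sketch only shows the encode and decode round trip.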

Evaluating Trajeglish

Trajeglish was evaluated on the Waymo Sim Agents Benchmark, where it outperformed 16 other models in terms of realism and interaction metrics. The model demonstrated a significant improvement in scenarios with dense interaction between agents, such as traffic jams, merging scenarios, and four-way stop intersections.

Key Features of Trajeglish

Intra-timestep interaction: Trajeglish models the interaction between agents within a single timestep, allowing it to capture complex behaviors such as grouping and coordination.
Context length: Trajeglish can handle varying context lengths, enabling it to predict scenarios of different lengths and complexity.
Scalability: Trajeglish is scalable with respect to parameter count and dataset size, making it suitable for large-scale simulations.
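
The intra-timestep interaction described above can be sketched as a sampling loop. The following Python snippet is a toy illustration, not the Trajeglish model: toy_next_token_weights is a hypothetical stand-in for the transformer, and the three-token vocabulary and the rule it applies are assumptions chosen to make the conditioning visible. What it shows is the ordering: within each timestep agents are sampled one after another, and each agent's token is conditioned on the tokens already chosen by earlier agents in that same timestep as well as on all previous timesteps.

```python
import random

VOCAB_SIZE = 3  # hypothetical tokens: 0 = stop, 1 = creep, 2 = go


def toy_next_token_weights(history, tokens_this_step):
    """Hypothetical stand-in for the transformer's next-token distribution.

    history: list of per-timestep token lists from earlier timesteps
             (unused by this toy rule, but a real model would attend to it).
    tokens_this_step: tokens already sampled for earlier agents in the
                      current timestep.
    """
    weights = [1.0, 1.0, 1.0]
    # Toy interaction rule: once one agent has chosen "go" in this timestep,
    # later agents in the same timestep may not also choose "go".
    if 2 in tokens_this_step:
        weights[2] = 0.0
    return weights


def rollout(num_agents, num_steps, seed=0):
    """Sample a tokenized scenario one agent and one timestep at a time."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_steps):
        tokens_this_step = []
        for _agent in range(num_agents):
            weights = toy_next_token_weights(history, tokens_this_step)
            token = rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]
            tokens_this_step.append(token)
        history.append(tokens_this_step)
    return history
```

Calling rollout(4, 10, seed=1) returns ten timesteps of four tokens each, and because of the toy rule no timestep contains more than one "go" token. A model that sampled all agents independently within a timestep could not enforce a constraint like this, which is the motivation for conditioning on same-timestep tokens.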

Tables

Model	Realism Meta Metric	Interaction Metric
Trajeglish	3.3% higher than prior work	9.9% higher than prior work
Wayformer	-	-
MultiPath++	-	-
MTR	-	-

Figures

Figure 1: Trajeglish Architecture

Figure 2: Tokenization Example

Figure 3: Intra-timestep Interaction

Figure 4: Context Length and Scalability


Conclusion

Trajeglish treats traffic modeling as discrete sequence modeling in order to simulate realistic multi-agent driving scenarios. By tokenizing motion and modeling intra-timestep interaction, it predicts the future motion of vehicles, pedestrians, and cyclists, and it outperformed the 16 other models it was compared against on the Waymo Sim Agents Benchmark. As autonomous vehicles continue to evolve, traffic models of this kind could play an important role in their development and deployment.
