Introduction

Our project aims to enhance a segment of SANDAG's Activity-Based Model (ABM). ABMs base travel demand on individuals' daily activities and simulate the travel-related decisions of individuals and households. We focus on the ABM's Trip Destination component, which forecasts where individuals and households travel and serves as a crucial step in determining activity patterns and travel demand.

ABMs have significant implications for SANDAG's infrastructure and urban planning policies in San Diego. Currently, the trip destination component takes 40 minutes to generate 12 million trips. Our project applies machine learning to improve the computational efficiency of SANDAG's trip destination component while preserving its predictive accuracy.

Data and Exploratory Data Analysis

The ABM uses a statistical model to generate synthetic population data from San Diego census data. The synthetic population datasets offer detailed sociodemographic profiles of regional households and individuals, including age, race, and education level, and serve as the foundation of the pipeline.

The statistical model then generates a set of synthetic trips for the synthetic population. The synthetic tours dataset contains tour details such as origin, purpose, and mode. The project uses the synthetic population and trips data to train the model and predict trip destinations. More information about each feature in the datasets can be found in SANDAG's ABM GitHub repository.

Because the full San Diego County dataset contains 12 million trips, the data was filtered to households in Districts 1, 2, 5, and 6. These districts were selected as areas of interest because they represent communities of diverse socioeconomic backgrounds. Trip destinations were recorded as Traffic Analysis Zones (TAZ), which were then aggregated into Land Use Zones (LUZ).
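The filtering and aggregation step can be sketched roughly as follows. This is a minimal illustration on made-up records, not the team's actual preprocessing: the field names (household_id, district, destination_taz), the TAZ-to-LUZ lookup, and the helper filter_and_aggregate are all hypothetical, and the real SANDAG datasets may be structured differently.

```python
from collections import Counter

# Hypothetical toy records; real SANDAG field names and values may differ.
households = [
    {"household_id": 1, "district": 1},
    {"household_id": 2, "district": 3},
    {"household_id": 3, "district": 5},
    {"household_id": 4, "district": 6},
]
trips = [
    {"household_id": 1, "destination_taz": 101},
    {"household_id": 1, "destination_taz": 102},
    {"household_id": 2, "destination_taz": 201},
    {"household_id": 3, "destination_taz": 201},
    {"household_id": 4, "destination_taz": 301},
    {"household_id": 3, "destination_taz": 101},
]
# Hypothetical lookup from fine-grained TAZ ids to coarser LUZ ids.
taz_to_luz = {101: 10, 102: 10, 201: 20, 301: 30}
STUDY_DISTRICTS = {1, 2, 5, 6}


def filter_and_aggregate(households, trips, taz_to_luz, districts):
    """Keep trips made by households in the given districts and attach
    the LUZ that each destination TAZ belongs to."""
    kept_ids = {h["household_id"] for h in households if h["district"] in districts}
    return [
        {**trip, "destination_luz": taz_to_luz[trip["destination_taz"]]}
        for trip in trips
        if trip["household_id"] in kept_ids
    ]


study_trips = filter_and_aggregate(households, trips, taz_to_luz, STUDY_DISTRICTS)
# Number of trips ending in each LUZ, i.e. the class distribution to predict.
luz_counts = Counter(trip["destination_luz"] for trip in study_trips)
```

Aggregating TAZ destinations into LUZ reduces the number of target classes the classifier has to distinguish, at the cost of coarser spatial resolution.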

Interactive 3D Map for Synthetic Population Trip Destination Distribution

Switch Browser for Optimal Performance if Lag Occurs

Methodology

Model selection on the synthetic dataset identified a Decision Tree Classifier as the most accurate model. The baseline Decision Tree achieved 90% training accuracy, 72% testing accuracy, a 72% weighted-average F1 score, and a 69% macro-average F1 score. To mitigate class imbalance, our team applied the Synthetic Minority Oversampling Technique (SMOTE) to balance the class distribution of the dataset. Grid search with cross-validation then tuned the model's hyperparameters to reduce overfitting and optimize the weighted-average F1 score.
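A rough sketch of this kind of pipeline, using scikit-learn on synthetic data, is shown below. Several parts are assumptions made only for illustration: the data comes from make_classification rather than the SANDAG datasets, the hyperparameter grid is hypothetical, and smote_like_oversample is a simplified hand-written stand-in for SMOTE rather than the implementation the team used.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE-style oversampling (illustrative stand-in): grow each
    class to the size of the largest one by interpolating between a sample
    and one of its k nearest same-class neighbors."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = int(counts.max())
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        count = int(count)
        n_new = target - count
        if n_new == 0 or count < 2:
            continue
        X_cls = X[y == cls]
        n_neighbors = min(k, count - 1)
        nn = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X_cls)
        # Column 0 is normally the query point itself, so it is dropped.
        neighbor_idx = nn.kneighbors(X_cls, return_distance=False)[:, 1:]
        base = rng.integers(0, count, size=n_new)
        chosen = neighbor_idx[base, rng.integers(0, n_neighbors, size=n_new)]
        gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
        X_parts.append(X_cls[base] + gap * (X_cls[chosen] - X_cls[base]))
        y_parts.append(np.full(n_new, cls))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Synthetic, imbalanced three-class data standing in for trip records.
X, y = make_classification(
    n_samples=1500, n_features=10, n_informative=6, n_classes=3,
    weights=[0.7, 0.2, 0.1], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Oversample the training split only; the test split keeps its real balance.
X_bal, y_bal = smote_like_oversample(X_train, y_train)

# Hypothetical grid: shallower trees and larger leaves limit overfitting.
param_grid = {"max_depth": [4, 8, 12], "min_samples_leaf": [1, 5, 20]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1_weighted",
    cv=3,
)
search.fit(X_bal, y_bal)

y_pred = search.predict(X_test)
weighted_f1 = f1_score(y_test, y_pred, average="weighted")
macro_f1 = f1_score(y_test, y_pred, average="macro")
```

One caveat of this sketch: oversampling before cross-validation lets synthetic points derived from one fold appear in another, which can inflate the cross-validated score. Resampling inside each fold, for example with the SMOTE class and Pipeline from the imbalanced-learn package, avoids that leakage.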

Discussion

The results indicate that machine learning applications have the potential to be impactful in the domain of ABMs. A p-value of 0.9374 indicates that there is not enough evidence to conclude that the two samples come from different distributions. The model achieves both precision and recall of 73%, showing its ability to correctly identify true instances while limiting false predictions. Compared to SANDAG's trip destination component, which has linear time complexity and requires 40 minutes to generate 12 million trips, the decision tree model requires 1.5 minutes to train on 4.5 million trips and predict 1 million trips.
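The statistical test behind the reported p-value is not named here. As one possible illustration only, the sketch below assumes a chi-square test of homogeneity on destination-zone counts, using scipy.stats.chi2_contingency. The zone ids, probabilities, and sample sizes are invented, and both samples are drawn from the same distribution on purpose, so a large p-value is the expected outcome.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Hypothetical destination zones and their shares of trips.
zones = np.arange(8)
zone_probs = np.array([0.25, 0.20, 0.15, 0.12, 0.10, 0.08, 0.06, 0.04])

# Two samples drawn from the same distribution, standing in for the
# existing ABM's destinations and the model's predicted destinations.
abm_destinations = rng.choice(zones, size=5000, p=zone_probs)
model_destinations = rng.choice(zones, size=5000, p=zone_probs)

# Contingency table: one row per sample, one column per zone.
abm_counts = np.bincount(abm_destinations, minlength=len(zones))
model_counts = np.bincount(model_destinations, minlength=len(zones))
table = np.vstack([abm_counts, model_counts])

# Null hypothesis: both samples share the same destination distribution.
chi2, p_value, dof, expected = chi2_contingency(table)
```

A large p-value means the test fails to reject the null hypothesis. It does not prove that the two distributions are identical, which matches the cautious wording above.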

In the future, the team hopes to analyze how the decision tree performs on a larger dataset, such as the full 12 million trips. We also hope to explore alternative oversampling techniques and more advanced models, such as neural networks or the ensemble methods used in previous literature, which may better capture complex relationships between trips and individuals and generalize better than a decision tree.

Conclusion

The project provides valuable insight into machine learning applications in the domain of ABMs. The decision tree model produced trip destination predictions similar to those of the current SANDAG ABM while improving computational efficiency. However, our team's findings do not prove that machine learning directly improves the predictive power of ABMs, nor do they show that machine learning is a better alternative to ABMs. As the field of ABMs continues to grow, it is essential to seek methods that improve current pipelines, and incorporating machine learning techniques into activity-based models holds promise for improving our understanding of travel behavior. Combining these techniques can allow organizations to capture complex relationships between individuals, daily activity patterns, and travel demand, to develop more innovative solutions, and to establish local policies that are beneficial and equitable for all residents.