How we harnessed Machine Learning to predict taxi fares for travel patterns decision making for a taxi agency

admin |
October 23, 2023

The Puzzle and Misguided Trail

A leading taxi agency sought a way to enhance their services, to foresee fare estimates with precision. The hurdle? The unpredictability woven into the city’s fabric—a multifaceted taxi world involving numerous variables—distance, time of day, traffic conditions, and more. Traditional methods fell short in providing precise estimations, leading to inefficient planning and resource allocation.

The agency had previously built a simplistic linear regression model. However, the city’s dynamics defied this approach—the relationships between factors were far from linear. The model struggled to grasp the complexities of rush hours, surges in demand, and the geographical intricacies that painted the city’s fare landscape. The models yielded Root Mean Squared Error (RMSE) scores well above 10, indicating significant discrepancies between predicted and actual fares. It became apparent that it was not just about having the right data and applying statistical models. They needed experts who can apply the right algorithm to their data and fine tune the model.

Cracking the Code With Our ML Experts

Our secret weapon? The XGBoost algorithm—a beacon in the realm of predictive analytics. 

Our journey began with sourcing the publicly available NYC taxi data from AWS S3. We then delved into exploratory data analysis to grasp the intricacies of the dataset. Feature engineering played a pivotal role, where we transformed raw data into meaningful predictors, leveraging techniques like encoding categorical variables, creating time-based features, and incorporating spatial attributes to capture location-specific trends.

Model development involved an iterative process of training, validation, and refinement. XGBoost’s ability to handle complex relationships and non-linearities in the data proved instrumental. The RMSE plummeted to an impressive 4.2, signifying a significant leap towards predictive accuracy. Hyperparameter tuning fine-tuned the model’s performance, optimizing its predictive capabilities.

All this magic was performed in Azure Databricks using allocated resources efficiently to save on model building, training, and fine-tuning costs in cloud.

The Triumph

With the help our team, the agency gained a beacon of reliability. Not only did the RMSE witness a drastic reduction, but other key metrics—R² score, indicating the model’s goodness-of-fit, surged above 0.75. The model’s predictions now aligned closely with actual fares, enabling the agency to chart routes and allocate resources with unprecedented precision.

Through this journey, the agency discovered that success lay not just in adopting innovative approaches but in selecting the right tool for the complexity at hand. XGBoost’s ability to untangle the city’s intricacies transformed the agency’s operations, ushering in an era of informed decision-making and enhanced customer satisfaction.

Read more posts

Scroll to Top