This session focuses on methodological advances with transit data, including creating, mixing, and matching data sets and employing machine learning (ML) techniques to address a range of issues. Topics include transit and COVID-19, customer satisfaction, e-scooters, direct demand modeling, unplanned disruptions, Wi-Fi, and much more!
Using Data Science and GIS-Based Analysis of Transit Passenger Complaints to Uncover Patterns of Passenger Frustration
Moran Yona, Israel Public Transport Authority, Israel Ministry of Transport
Genadi Birfir, Adalya Consulting and Management
Sigal Kaplan, Hebrew University of Jerusalem
This study shows how mapping and statistical modelling can improve system-wide analysis of passenger complaints. The analyzed data set consists of 718 passenger complaints concerning municipal bus lines. GIS-based analysis showed that the spatial spread of complaints changes over time, as a function of service disruption type and geographical area. Results from a recursive bivariate probit model indicated that the most acute sources of frustration are problem recurrence and monetary loss. The service problems associated with problem recurrence are mainly crowding, followed by delays and line cancellations. A negative binomial model estimation showed that the number of complaints increases with the ratio of boardings to bus arrivals and decreases with increasing motorization rate and with distance from the central bus terminal. Latent cluster analysis shows that acute frustration is mainly related to one type of complaint.
Synthetic Mobility Traces from Mobile Phone Data, Infection Modelling and Public Transport
Sebastian Müller, Technische Universität Berlin
Christian Rakow, Technische Universität Berlin
Kai Nagel, Technische Universität Berlin
Epidemiological simulations are used to better understand the spread of infectious diseases like COVID-19 and to predict their course. This paper uses a data-driven and activity-based model based on mobile phone data to assess the influence of public transport on infection dynamics in the current COVID-19 pandemic. Results for Berlin show that wearing masks significantly reduces the share of infections occurring in public transport. When 90% of passengers wear cloth masks, the share of infections in public transport falls by about 70%. When 90% of passengers wear the more effective N95 masks, infections in public transport fall by nearly 100%. These results show that it is not necessary to shut down public transport when passengers wear masks. Rather, care should be taken to ensure that the density in vehicles is not too high by using large vehicles and by increasing the supply at peak times.
Analysing the Role of Waiting Time Reliability in Public Transport Route Choice Using Smart Card Data
Sanmay Shelat, Delft University of Technology
Oded Cats, Delft University of Technology
Niels van Oort, Technische Universiteit Delft
Hans van Lint, Technische Universiteit Delft
Waiting time has been consistently shown to have a significant impact on route choice behaviour in public transport networks. Furthermore, unreliability in this component may cause frustration and anxiety in travellers. Although the effect of travel time reliability has been studied extensively, most studies have used stated preferences, which have disadvantages such as inherent hypothetical bias. In this study, we use revealed preferences derived from passively collected smart card data to analyse the role of waiting time reliability in public transport route choice in The Hague. Waiting time reliability is studied as regular and irregular deviations from scheduled values, and a number of indicators for the latter are examined. Results indicate reliability ratios of 0.20–1.12, which are relatively low compared to the literature and potentially add to the growing consensus that stated preferences tend to overestimate values. Small coefficients in combination with the overall reliable service mean that unreliability plays a small role in travellers’ route choices in The Hague. Additionally, behaviour in morning peak and off-peak hours is contrasted, and differences in reliability coefficients for different modes in the network, and for origin and transfer stops, are reported.
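As a rough illustration of the reliability ratio mentioned in the abstract above: the ratio is commonly defined as the utility coefficient on a variability measure divided by the coefficient on the corresponding mean time. The sketch below assumes that definition; the coefficient values and variable names are hypothetical and are not taken from the paper.

```python
def reliability_ratio(beta_variability: float, beta_mean_time: float) -> float:
    """Ratio of the variability coefficient to the mean-time coefficient."""
    if beta_mean_time == 0:
        raise ValueError("mean-time coefficient must be non-zero")
    return beta_variability / beta_mean_time


# Hypothetical utility coefficients (per minute), for illustration only.
beta_wait = -0.10      # assumed coefficient on scheduled waiting time
beta_wait_sd = -0.05   # assumed coefficient on waiting-time variability

rr = reliability_ratio(beta_wait_sd, beta_wait)
print(f"reliability ratio: {rr:.2f}")
```

With these made-up values, one minute of variability is weighted half as heavily as one minute of scheduled waiting time.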
Public Opinions on Public Transportation amid COVID-19 Pandemic: Preliminary Analysis of Social Media Data
Song He, George Mason University
Ossama Salem, George Mason University
The COVID-19 pandemic has exerted significant, long-lasting, and ever-changing impacts on many aspects of U.S. society. Public transportation has seen a major reduction in ridership since March 2020, and it is important to understand riders’ perspectives behind the numbers, improve rider and employee safety, and recover revenue losses for transit systems. Traditional means of investigating public opinion, such as surveys and interviews, are discrete, costly, and time-consuming, and so fail to capture the dynamic impacts of the COVID-19 pandemic on public transportation. This research proposes a framework to continuously analyze social media data from Twitter through topic modeling to reveal riders’ interests and concerns regarding public transportation during the COVID-19 pandemic in near real-time. The temporal and spatial trends of riders’ concerns expressed on Twitter are also derived to support public transportation agencies in forming a better understanding of the impacts of the COVID-19 pandemic on transit. Results show that COVID-19 status, personal protective equipment, transportation system service and performance, and social distancing were the most prominent concerns that riders in the United States expressed on Twitter. Discussions on these topics gradually increased over time, except for those on social distancing, and significant spatial variations were observed across different states including California, Washington D.C., Florida, New York, Texas, and Washington. This framework has the potential to assist public agencies in adjusting operations, recovering transit ridership, and planning next steps during the COVID-19 pandemic and similar incidents in the future.
Short-term Forecast on Individual Accessibility in Bus System Based on Neural Network
Yufan Zuo, Southeast University
Di Huang, Southeast University
Xiao Fu, Southeast University
Zhiyuan Liu, Southeast University
This study proposes a three-stage method, based on a neural network (NN), for short-term forecasting of individual accessibility in a bus system. In the first stage, an NN is designed to determine whether a passenger will make bus trips in the prediction period. An extra layer is added to the NN to account for the bus travel generation rate in the region. The inputs to the NN are indicators of whether each passenger made bus trips in historical periods. In the second stage, the probability of each passenger’s destination choice is calculated from his or her historical bus trip data. In the third stage, land use information is combined with the results of the second stage to obtain individual accessibility in the bus system for the prediction period. Land use information is represented by the spatial distribution of the number of points of interest in the study area. Results show that the proposed method gives precise short-term forecasts of individual accessibility in the bus system on both weekdays and weekends. The results also demonstrate the value of combining deep learning methods, traffic data, and land use information to predict the future spatial distribution of individual accessibility in a traffic system.
Automated Capturing of Transit Passenger Transfers and Network O-D Using An Optimized Wi-Fi-Based Recognition Algorithm
Majeed Algomaiah, University of Louisville
Richard Li, University of Louisville
The origins and destinations (O-D) of public transit passengers are important for the planning and operation of a transit system. However, only 46% of public transit agencies in the US have a smartcard system, and most of those require an entry-only tap, which prevents passenger destinations from being identified. Many transit agencies collect passenger O-D information without considering passenger transfers, which yields a route-level O-D matrix instead of the more accurate network-level one. There is a need for a cost-effective and automated solution that helps the majority of U.S. transit agencies recognize passenger origins and destinations and capture passenger transfers. The objective of this paper is to create a novel strategy for transit agencies to capture a network-based passenger O-D matrix that accounts for transfers, using a cost-effective Wi-Fi sensing approach. When recognizing passenger boarding and alighting at bus stops, a GIS-based optimization algorithm is used to maximize the passenger recognition rate. Two pilot studies were conducted, and the results reveal that the proposed Wi-Fi-based approach is capable of recognizing 78.7% of total passengers as well as detecting their boarding and alighting activities. The paper demonstrates that the proposed method can detect passengers at a reasonable rate using Wi-Fi technology on bus routes, which makes it feasible for transit agencies to conduct frequent network-level passenger O-D studies.
Why Do People Take E-scooter Trips? Big Data and Unsupervised Machine Learning Insights on Temporal and Spatial Usage Patterns
Nitesh Shah, University of Tennessee
Jing Guo, University of Tennessee, Knoxville
Lee Han, University of Tennessee
Christopher Cherry, University of Tennessee, Knoxville
Electric scooters (e-scooters) are becoming one of the most popular micromobility options in the United States. However, little is known about the patterns of e-scooter use. This study proposes a framework for high-resolution clustering of micromobility data based on temporal, spatial, and weather attributes in order to classify trip types. As a case study, we examined more than one million scooter trips in Nashville, Tennessee, from September 1, 2018, to August 31, 2019. The trip data were complemented by weather data, land use data from the Nashville Travel Demand Model, and Point of Interest (POI) data scraped from Google Maps. The combination of Principal Component Analysis (PCA) and a K-Means unsupervised machine learning algorithm identified five distinct e-scooter usage patterns, namely daytime short errand, utilitarian, evening social, night-time entertainment district, and recreational trips. Among other findings, the most popular use of e-scooters in Nashville was to travel within the entertainment district at night, which accounted for 26% of all e-scooter trips. We did not find e-scooter use patterns that resemble typical commuting patterns. The average daily number of trips on a typical weekend was 84% higher than on a typical weekday. We also found variation in e-scooter usage patterns over the year. The findings of this study can help city administrations, planners, and micromobility operators understand when and where people are using e-scooters. Such knowledge can guide them in making data-driven decisions regarding safety, sustainability, and mode substitution of emerging micromobility.
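The clustering step described in the abstract above can be sketched with a plain K-Means (Lloyd's algorithm) loop. This is a minimal illustration only: the PCA step is omitted, the two features (start hour, trip distance in km), the toy trips, and the initial centroids are all hypothetical, and treating the hour as a plain number ignores its cyclic nature.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical trips: (start hour, trip distance in km).
trips = [(13.0, 0.8), (12.5, 1.0), (14.0, 0.9), (22.5, 1.5), (23.0, 1.2), (21.5, 1.4)]
centroids, labels = kmeans(trips, centroids=[trips[0], trips[3]])
print(labels)
```

On this toy input the midday trips and the late-evening trips fall into separate clusters. A real analysis would standardize the features and choose the number of clusters from the data.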
Unplanned Disruption Analysis in Urban Railway Systems Using Smart Card Data
Tianyou Liu, Northeastern University
Zhenliang Ma, Monash University
Haris Koutsopoulos, Northeastern University
Metro system disruptions are a major concern due to their impacts on safety, service quality, and operating efficiency. A better understanding of system performance and passenger behavior under unplanned disruptions is critical for efficient decision making, effective customer communication, and identifying potential improvements. However, few studies explore disruption impacts on individual passenger behavior, and those that do mostly rely on manually collected data. Given the limitations of surveys, this study examines the potential of automated data to analyze unplanned disruption impacts comprehensively. We propose a systematic approach to evaluate disruption impacts on system performance and passenger behavior using automated fare collection (AFC) data. The approach proposes various metrics and inference methods to evaluate performance from the perspectives of train operations, information provision (customer communication), and bridging strategy (use of shuttle services to connect stations impacted by an incident). The proposed approach is demonstrated using data from a major metro system. The results highlight the ability of AFC data to provide new insights for unplanned disruption analysis that are difficult to extract from traditional data collection methods.
Analyzing Bus Ridership with a Spatial Direct Demand Model
Raven McKnight, Metro Transit (MN)
Eric Lind, Metro Transit, Minneapolis-St. Paul
Direct demand transit modeling is challenging due to complex demographic and geographic phenomena. Many direct demand models rely on overly general demographic characteristics such as population density as predictors. Additionally, they often ignore the inherently spatial nature of ridership by using non-spatial methods or by aggregating bus stops counterintuitively into Census geographies. These practices fail to appropriately model the spatial structure of transit data and limit the ability of the model to describe ridership in terms of rider demographics. In this paper, we implement a spatial Bayesian model, the BYM2, to model bus ridership at Metro Transit (Minneapolis-Saint Paul, MN). We incorporate more descriptive demographic predictors to understand characteristics of transit riders in the region. The model conducts geographic smoothing, which improves model fit and more accurately describes spatial bus ridership data. We identify demographic predictors that are rarely used in the transit modeling literature. Finally, we recommend spatial modeling techniques as a partial solution to known bounding issues for bus stops aggregated to Census geographies.
Using Origin-Destination Flows Determined from APC and AFC Data to Correct Biases in Socioeconomic and Travel Characteristics Obtained from Transit Onboard Surveys
Rabi Mishalani, Ohio State University
Mark McCord, Ohio State University
Transit agencies collect onboard surveys of passengers’ socioeconomic and travel (SE&T) characteristics on a regular basis. The surveys are subject to sample and response biases, which can affect the representativeness of the results. Independently conducted passenger stop-to-stop origin-destination (OD) flow surveys are used to correct for some biases, but these additional surveys are costly and time-consuming. The OD flows can be estimated using data already being collected from Automatic Passenger Count (APC) and Automatic Fare Collection (AFC) systems, and these estimates can conceivably be used to correct the SE&T characteristics, thereby reducing or eliminating the need for the additional OD surveys. A large-scale empirical study was conducted to investigate the sensitivity of adjusted SE&T characteristics to OD flows derived from different data sources. The results indicate that using OD flows determined from existing APC and AFC data leads to similar or better quality of adjusted SE&T characteristics compared with using the additional OD surveys administered in practice.
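One simple way to picture how OD flows could correct survey results is to reweight each response so that every OD pair counts in proportion to its estimated flow. The abstract above does not describe the adjustment procedure actually used, so the sketch below is only an assumed post-stratification-style weighting, with hypothetical OD pairs, flows, and survey responses.

```python
from collections import Counter

# Hypothetical onboard survey responses: (OD pair, respondent is low income?).
responses = [
    ("A-B", True),
    ("A-B", False),
    ("A-B", False),
    ("A-C", True),
]

# Hypothetical OD flows, e.g. as estimated from APC and AFC data.
od_flows = {"A-B": 300.0, "A-C": 300.0}

# Each response represents flow / (number of responses) riders on its OD pair.
responses_per_od = Counter(od for od, _ in responses)
weights = [od_flows[od] / responses_per_od[od] for od, _ in responses]

unadjusted = sum(flag for _, flag in responses) / len(responses)
adjusted = sum(w for w, (_, flag) in zip(weights, responses) if flag) / sum(weights)

print(f"unadjusted share: {unadjusted:.3f}")
print(f"adjusted share:   {adjusted:.3f}")
```

In this toy case the under-surveyed pair A-C receives a larger weight, so the adjusted low-income share differs from the raw survey share.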
Short-term Metro Passenger Flow Prediction Capturing the Impact of Unplanned Events
Yangyang Zhao, Southwest Jiaotong University
Zhenliang Ma, Monash University
Unplanned events present challenges for operations and management in metro systems. Short-term passenger flow prediction can help agencies design better contingency strategies and communicate with passengers during unplanned events. Although many short-term prediction methods have been proposed in the literature, most studies have focused on normal situations. This study focuses on short-term metro passenger flow prediction under disruptions and explores novel mechanisms for capturing the impact of unplanned events and for addressing the imbalanced training dataset. Typical machine learning and deep learning methods are developed for exploration. Large-scale automatic fare collection (AFC) data and incident log data from a heavily used metro system are used for empirical studies. The analysis found that unplanned events of the same type share a similar and consistent pattern of demand change (with respect to normal situations) at the station level. The synthetic minority over-sampling technique (SMOTE) can enrich the passenger flow observations under unplanned events and generate a balanced dataset for model training. The results show that the combination of the passenger flow change ratio and SMOTE oversampling enables the prediction models to learn the impact of unplanned events, and thus significantly improves prediction accuracy under disruptions. However, the over-sampling techniques (i.e., SMOTE and replication) slightly reduce prediction accuracy for passenger flow under normal situations. The findings offer insights into mechanisms for representing disruption impacts and oversampling imbalanced data in model training, and guide the development of short-term prediction under unplanned events.
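The core idea of SMOTE, as referenced in the abstract above, is to create synthetic minority samples by interpolating between a minority observation and one of its nearest minority neighbours. The following is a minimal standard-library sketch of that idea, not the authors' pipeline; the feature layout (flow change ratio, hour of day) and the sample values are hypothetical.

```python
import math
import random


def smote(minority, n_synthetic, k=2, seed=0):
    """Generate synthetic points on segments between minority samples
    and their k nearest minority neighbours (needs at least 2 samples)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen base point.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical observations under disruption: (flow change ratio, hour of day).
disrupted = [(0.6, 8.0), (0.7, 9.0), (0.5, 8.5)]
new_points = smote(disrupted, n_synthetic=5)
print(new_points)
```

Because each synthetic point is a convex combination of two real observations, it always stays within the range spanned by the minority samples.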
An Examination of New York City Transit’s Bus and Subway Ridership Trends during the COVID-19 Pandemic
Anne Halvorsen, MTA New York City Transit
Daniel Wood, MTA New York City Transit
Darian Jefferson, MTA New York City Transit
Timon Stasko, No Organization
Jack Hui, MTA New York City Transit
Alla Reddy, MTA New York City Transit
The New York City metropolitan area was hard hit by COVID-19, and the pandemic brought with it unprecedented challenges for New York City Transit. This paper addresses techniques used to estimate dramatically changing ridership, at a time when previously dependable sources suddenly became unavailable (e.g., local bus payment data, manual field checks). The paper describes alterations to ridership models, as well as expanding usage of Automated Passenger Counters, including validation of the new technology and scaling to account for partial data availability. The paper then examines trends in subway and bus ridership. Peak periods shifted in both time of day and relative intensity compared to the rest of the day, but not in the same way on weekdays and weekends. On average, trip distances became longer for subway and local bus routes, but overall average bus trip distances decreased due to a drop in express bus usage. Subway ridership changes were compared to neighborhood demographic statistics and numerous correlations were identified, including with employment, income, and race and ethnicity. Other factors, such as the presence of hospitals, were not found to be significant.
Understanding Ridesplitting Behavior with Interpretable Machine Learning Models: Comparing Trip-level and Community-level Characteristics using Chicago’s Ridesourcing Trips
Hoseb Abkarian, Northwestern University
Ying Chen, Northwestern University
Hani Mahmassani, Northwestern University
As congestion levels increase in cities, it is important to analyze people’s choices among the different services provided by Transportation Network Companies (TNCs). Using machine learning techniques in conjunction with large TNC datasets, this paper focuses specifically on uncovering the complex relationships underlying ridesplitting market share. A real-world dataset provided by TNCs in Chicago is used to analyze ridesourcing trips from November 2018 to December 2019 and to understand trends in the city. Aggregated origin-destination trip-level characteristics, such as mean cost, mean time, and travel time reliability, are extracted and combined with origin-destination community-level characteristics. Three tree-based algorithms are then used to model the market share of ridesplitting trips. The most significant factors are extracted, along with their marginal effects on ridesplitting behavior, using partial dependence plots. The results suggest that, overall, community-level factors are as important as or more important than trip-level characteristics. Additionally, the percentage of White people strongly affects ridesplitting market share, as do the percentage of bachelor’s degree holders and the percentage of two-person households. Finally, the potential impact of taxes, crime, cultural differences, and comfort in driving the market share is discussed, and suggestions are presented for future research and data collection efforts.
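A partial dependence curve, as mentioned in the abstract above, averages a model's predictions over the dataset while one feature is held at each value of a grid. The sketch below shows that computation under stated assumptions: `toy_model` is a hypothetical stand-in with arbitrary coefficients (not one of the paper's tree-based models), and the feature names and rows are invented for illustration.

```python
def partial_dependence(model, rows, feature, grid):
    """Average prediction over all rows with `feature` fixed at each grid value."""
    curve = []
    for value in grid:
        predictions = [model({**row, feature: value}) for row in rows]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(row):
    # Hypothetical stand-in for a fitted model; coefficients are arbitrary.
    return 0.30 - 0.01 * row["mean_cost"] + 0.001 * row["pct_bachelors"]


# Hypothetical origin-destination observations.
rows = [
    {"mean_cost": 10.0, "pct_bachelors": 20.0},
    {"mean_cost": 20.0, "pct_bachelors": 60.0},
]

curve = partial_dependence(toy_model, rows, "mean_cost", grid=[5.0, 15.0])
print(curve)
```

Plotting the curve against the grid gives the partial dependence plot for that feature; the other features keep their observed values in every row.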
Examining the Discrepancy between Self-Reported and Actual Commuting Behavior at the Individual Level
Tianyu Su, Massachusetts Institute of Technology (MIT)
M. Elena Renda, Istituto di Informatica e Telematica del CNR
Jinhua Zhao, Massachusetts Institute of Technology (MIT)
Travel surveys lay the foundation of transportation modeling and planning, making travel survey methods an extensively studied area. Given the rapid development of information technology and urban sensing systems, recent years have seen substantial improvements in survey-elicited and passive mobility data collection approaches. These developments enable strict and detailed comparisons between self-reported travel behavior extracted from travel surveys and actual travel behavior revealed by urban sensing systems such as smart cards (SC) and parking systems. Most previous works examined this discrepancy at the population level; however, an individual-level investigation of this discrepancy is crucial and has vast potential, from informing targeted travel demand management to designing personalized transportation services. In this research, the discrepancy between self-reported and actual travel behavior is studied at both the individual and aggregated levels, leveraging the available mobility data, namely commuting diaries and passive mobility records. We propose a group of discrepancy measurements for two types of commuting activities (i.e., transit and driving) and apply the framework to an empirical analysis at the Massachusetts Institute of Technology (MIT). The application reveals that survey-elicited commuting diaries are relatively reliable when examining overall commuting trends, yet they are less accurate when used to investigate individual-level discrepancies between reported and actual commuting behavior of MIT employees. Furthermore, this paper identifies associations between commuting discrepancies and certain individual characteristics, including employee type, age, gender, stated commuting mode, and actual commuting frequency. In addition, the distributions of discrepancy measurements across different employee groups are found to vary substantially.
DISCLAIMER: All information shared in the TRB Annual Meeting Online Program is subject to change without notice. Changes, if necessary, will be updated in the Online Program and this page is the final authority on schedule information.