This session features a wide variety of transit data and application advancements, including General Transit Feed Specification (GTFS) realtime; ferry smart card data; GTFS for trains; transit load profiles; transit customer satisfaction measures using Twitter; mesoscopic transit networks; high-resolution GPS; and much more.
Constructing Spatiotemporal Load Profiles of Transit Vehicles with Multiple Data Sources
Ding Luo, Delft University of Technology
Oded Cats, Delft University of Technology
Hans Van Lint, Delft University of Technology
Load profiles of transit vehicles are crucial for operators to improve transit planning and operations. However, obtaining such information has long been difficult due to technical and financial constraints. Although data collection related to both transit demand and supply has advanced significantly over the past decade, information on load profiles at the vehicle level is either impossible or very difficult to retrieve from these data. As a result, these data are often underutilized because of deficiencies in the data themselves, in the algorithms that process them, or both. The contribution of this paper is a new methodology to process, infer and integrate multiple data sources for constructing load profiles of transit vehicles. This methodology integrates three types of transit data sources: automatic fare collection (AFC), automatic vehicle location (AVL), and General Transit Feed Specification (GTFS) data. It consists of four steps, each of which addresses specific issues that arise from a single data set or from a combination of them. Through these steps, the methodology generates passenger load profiles which enable detailed investigation into vehicle movements and demand patterns over time and space, including service utilization and the propagation of delays and crowding. To demonstrate the methodology, we use data collected from the urban transit network in The Hague, The Netherlands. We illustrate that with a series of inference and matching approaches, spatiotemporal load profiles of transit vehicles can be constructed. The visualization of these profiles through space-time seat occupancy graphs provides operators with a compact and powerful reference to improve their services.
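As a minimal sketch of the last stage of such a pipeline, assume that boardings and alightings per stop have already been inferred and matched to a single vehicle trip. The function name `load_profile` and the sample counts are hypothetical and are not taken from the paper. Under that assumption, the on-board load on departure from each stop is the running sum of boardings minus alightings:

```python
from itertools import accumulate

def load_profile(boardings, alightings):
    """Return the passenger load on departure from each stop of one vehicle trip."""
    if len(boardings) != len(alightings):
        raise ValueError("boardings and alightings must cover the same stops")
    # Net change in load at each stop, then a running total along the trip.
    net = [b - a for b, a in zip(boardings, alightings)]
    return list(accumulate(net))

# Hypothetical counts for a five-stop trip (illustration only).
boardings = [12, 8, 5, 0, 0]
alightings = [0, 3, 6, 9, 7]
profile = load_profile(boardings, alightings)
```

Stacking such per-trip profiles by departure time would give the raw material for a space-time occupancy graph of the kind the abstract describes.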
Data-Driven Mesoscopic Simulation Models of Large-Scale Surface Transit Networks
Bo Wen, IBI Group
Siva Srikukenthiran, University of Toronto
Amer Shalaby, University of Toronto
The planning of transit services, assessment of operational strategies, and evaluation of service changes across any transit network can benefit tremendously from utilizing a high-fidelity model of the entire network. While microsimulation methods represent an attractive option, they are typically used for modeling one corridor or a small sub-network. Developing a microsimulation model of an entire transit network is a daunting or otherwise practically infeasible task, particularly for extensive networks. In response to this challenge, this study presents an alternative data-driven mesoscopic simulation pipeline that models surface transit movements based on open data and machine learning. Vehicle running speed regression models and lognormal dwell time distribution models were used to perform the mesoscopic transit simulations. A comprehensive comparison of running speed models using multiple linear regression (MLR), support vector machine (SVM), linear mixed effect model (LME), regression tree (RT) and random forest (RF) showed that LME and RF outperformed MLR in terms of RMSE by 7.9% and 5.9% respectively. It was also found that the dwell time models and the running time model together adequately replicated the variations in headways, delays, and dwell times. Validations of the simulation at the stop level and the route level, while showing encouraging results, demonstrated the need to capture passenger demand and congestion variations using additional data in future studies.
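The sketch below illustrates, under stated assumptions, how a mesoscopic run might combine a running-speed prediction with lognormal dwell time draws. The function `simulate_trip`, the constant speed, and all parameter values are hypothetical placeholders; in the study the speed would come from a fitted regression model rather than a constant.

```python
import random

def simulate_trip(segment_lengths_m, speed_mps, dwell_mu, dwell_sigma, rng):
    """Return arrival times (seconds) at each downstream stop for one vehicle trip.

    speed_mps stands in for the output of a fitted running-speed model.
    Dwell times are drawn from a lognormal distribution whose underlying
    normal distribution has mean dwell_mu and standard deviation dwell_sigma.
    """
    clock = 0.0
    arrivals = []
    for length in segment_lengths_m:
        clock += rng.lognormvariate(dwell_mu, dwell_sigma)  # dwell at upstream stop
        clock += length / speed_mps                         # run to the next stop
        arrivals.append(clock)
    return arrivals

# Hypothetical three-segment route; a seeded generator keeps runs repeatable.
rng = random.Random(42)
arrivals = simulate_trip([400.0, 650.0, 500.0], speed_mps=6.0,
                         dwell_mu=3.0, dwell_sigma=0.5, rng=rng)
```

Repeating the run with different seeds would produce a distribution of arrival times that could be compared against observed headways and delays.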
Applying the Bellwether Strategy in Public Transit Data Analytics
Ka Kee Alfred Chu, Autorité régionale de transport métropolitain
Andre Lomone, Autorité régionale de transport métropolitain
From a data-poor environment not so long ago, public transit organizations are nowadays overwhelmed by data streams. These data present more opportunities than ever to answer relevant questions for planning and decision-making. However, due to limits on computational resources and time, some information may be neither worthwhile nor practical to generate using a full-fledged data processing procedure. The bellwether strategy proposed in this paper uses simple models to relate two or more strategically selected travel patterns in order to bypass complex algorithms and obtain quick approximations. The strategy also provides a good way to detect data collection errors and anomalies in datasets, and allows new data processing algorithms to be trialed before a full-fledged implementation.
Three case studies, covering multiple scenarios, are used to illustrate real-world applications: deriving the daily metro network ridership using one station; deriving the origin-destination matrix from a reference travel pattern; and deriving missing commuter rail ridership based on one metro station. The difference between the calculated results, based on the full-fledged algorithm, and the estimated results, based on the approximation model, is used to evaluate prediction accuracy. The results of the case studies are reasonably accurate. The approach is generalizable, and its specification depends on the user's needs and choices. Adopting the strategy would relieve the computational burden in many transit organizations and bridge the gap between academic research and industry practice.
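The first case study can be pictured as a simple regression from one bellwether station to the whole network. The sketch below assumes an ordinary least-squares line, which is only one possible "simple model"; the function `fit_line` and all ridership figures are made up for illustration and are exactly linear by construction so that the result is easy to check.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up daily ridership in thousands: one bellwether station vs. the network.
station = [10.0, 12.0, 15.0, 11.0]
network = [205.0, 245.0, 305.0, 225.0]

slope, intercept = fit_line(station, network)
# Quick approximation of network ridership for a day with 13k station entries.
estimate = slope * 13.0 + intercept
```

In practice the fitted line would be checked against results from the full-fledged procedure on held-out days, which is how the abstract describes evaluating prediction accuracy.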
Alighting Stop Determination and Origin–Destination Matrix Estimation in Bus Transit Systems Based on User Segmentation
Fenfan Yan, Tongji University
Chao Yang, Tongji University
Satish Ukkusuri, Purdue University
Origin-destination (OD) matrix estimation is an important problem in bus transit systems for transportation system analysis, planning and ridership analysis. However, the smart cards of most bus transit systems only record the boarding stops of passengers, which hinders direct extraction of the transit OD matrix. In this paper, we combine smart card data, bus GPS data and static transit network information to derive the transit OD matrix. Passengers are segmented into regular and irregular groups using K-means clustering based on the number of trips per day and the number of days with trip records per week. Regular users are believed to have a higher tendency to return to previously visited locations and to exhibit high predictability. A deterministic model based on trip chain analysis is applied to these passengers. Around 84.55% of records can be determined using the developed model. For irregular users, who have greater uncertainty and limited records, machine learning algorithms are developed for alighting stop classification. The performance of naïve Bayes, SVM, decision tree, random forest, KNN and ensemble learning is then compared, and an average accuracy of around 70% is achieved. Transfer trip recognition helps to distinguish transfer trips from single trips. This research sheds light on forming a set of simple and applicable methodologies for alighting stop determination and OD matrix estimation using smart card data.
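A minimal sketch of the trip chain idea follows. It assumes the common rule that a passenger alights at the downstream stop closest to where the next trip begins. This rule, the function `infer_alighting`, and the coordinates are illustrative assumptions, not the authors' exact model.

```python
import math

def infer_alighting(boarding_index, route_stops, next_boarding_xy):
    """Pick the downstream stop on the route closest to the next boarding location.

    route_stops is an ordered list of (stop_id, (x, y)) tuples.
    Returns None when the boarding stop is the last stop on the route.
    """
    downstream = route_stops[boarding_index + 1:]
    if not downstream:
        return None
    nearest = min(downstream, key=lambda stop: math.dist(stop[1], next_boarding_xy))
    return nearest[0]

# Hypothetical route along a straight line, coordinates in kilometres.
route = [("A", (0.0, 0.0)), ("B", (1.0, 0.0)), ("C", (2.0, 0.0)), ("D", (3.0, 0.0))]

# The passenger boards at A; the next recorded boarding is near stop C.
alight = infer_alighting(0, route, (2.1, 0.3))
```

A real implementation would typically also cap the walking distance to the next boarding stop and treat the last trip of the day separately, for example by chaining it back to the first boarding stop.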
Mapping the Unmapped Transit Network of Bogotá, Colombia
Eric Goldwyn, Columbia University
C. Erik Vergel-Tovar, Universidad del Rosario
New tools have enabled “civic mappers” and transportation researchers to map previously unmapped transit networks that have been historically the purview of locals and insiders. These new datasets and maps show the extent of these systems while also enumerating basic operating characteristics, such as travel speed, route distance, frequency, and fare data. In this paper, we detail our process of visualizing Bogotá’s entire transit network, both the centrally-planned system of buses and the decentralized network of jitneys. By seeing the entire network, we argue that we can disentangle the development patterns of the city and monitor who has access to reliable transit, which also happens to be one of the United Nations’ Sustainable Development Goals. Since this type of work is still in its infancy, it is critical that researchers go out into the field, add more examples of how to do this work, and share their process so different methodologies can be tested in different types of cities. In Bogotá, we, researchers from NYU’s Marron Institute of Urban Management, worked with researchers and students from the Universidad del Rosario and the civic mapping community in Bogotá, using smartphones, cloud-based data management systems, and mapmaking software to bring Bogotá’s unmapped transit network out of the shadows and put it on an equal footing with the established network of buses.
Temporal and Spatial Interactions Between Transit Ridership and Urban Land Use Patterns
Merkebe Demissie, University of Calgary
Lina Kattan, University of Calgary
The movement of people in urban areas is influenced by the distribution of home, work and other activity areas and the transportation links between them. The use of urban areas continually changes, and the activity patterns of significant trip generators evolve over time. Efficient public transit planning requires frequent estimates of the spatiotemporal distribution of different activity areas and measurement of the likely consequences of these changes for transit use. Public transit agencies generate a large volume and variety of data at high velocity. These data are obtained from on-board sensors and data collection points introduced by automated data collection systems (e.g., automatic passenger count, APC). Many transit agencies also generate General Transit Feed Specification (GTFS) data and share them openly with the public. This study explores the use of APC, GTFS and land use data to examine a variety of land use and transit ridership interactions using visualization, data mining, and statistical analysis techniques. Results show that transit ridership at the individual bus stop level gives a better understanding of the unique land use that surrounds each bus stop. Zonal level transit ridership patterns reveal the different trip generation and attraction roles of the corresponding land uses. In addition, transit routes with different land use mixtures generate significantly different ridership patterns and volumes in the morning and afternoon peak hours.
Measuring and Visualizing Transit Customers’ Satisfaction Using Twitter Data
Baocheng Wu, University of British Columbia, Okanagan
Ahmed Idris, University of British Columbia
The feasibility of utilizing Twitter data for the purposes of measuring and visualizing transit customers’ satisfaction is evaluated based on a series of analyses: data mining, semantic analysis, sentiment analysis, and GIS visualization. With free access (under certain restrictions) to Twitter databases through the Twitter API, search parameters can be modified to suit the needs of the developer. The Twitter data used in this study were acquired through a unique search combination, with keywords such as agency name and mode choice accompanied by a search area and a language. This methodology can collect all kinds of tweets related to public transportation, with or without an agency name contained in the text. Further, semantic analysis was applied to classify and store each tweet in its corresponding category based on a high-volume lexical analysis. In order to quantify service quality, sentiment analysis was used as a customer satisfaction measurement system that grades a tweet from extremely negative to extremely positive. In the end, tweets were visualized by location for problem detection and identification.
Using Smart Card Data to Identify Critical Transfer Hubs and Characterize Transfer Time Between Metro and Bus System
Siyu Hao, National University of Singapore
Der-Horng Lee, National University of Singapore
The emergence of smart card data gives great impetus to a better understanding of mobility patterns. Taking advantage of smart card data in Singapore, which integrate bus and metro trips in the same data frame and record both the boarding and alighting activities of passengers, we are able to investigate multi-modal transfer patterns between the metro and bus systems. In this paper, we first explore the variation of multi-modal transfer time and the factors that affect it. We find that the multi-modal transfer time between the metro and bus systems varies significantly across different times of day and different transfer hubs (metro stations) in the city. Explanations for the variations in transfer time are given, considering trip purpose, bus service frequency and several properties of metro stations, such as layout, scale and location. In addition, a rule-based approach to identifying critical transfer hubs is proposed. Based on this approach, we isolate a group of critical transfer hubs in the morning peak and another group of critical transfer hubs in the evening peak, which demonstrates that the proposed approach is efficient and interpretable. Finally, the distribution of metro-to-bus transfer time is estimated. We present a comparison among a set of commonly used theoretical distributions to model transfer time. The fitting results indicate that the lognormal distribution outperforms all other candidates and can be considered a representative descriptor of metro-to-bus transfer time. To the best of our knowledge, this is the first attempt to model the distribution of multi-modal transfer time using large quantities of real-world data passively collected by smart cards.
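For readers unfamiliar with the lognormal fit mentioned above, the sketch below estimates the two lognormal parameters by maximum likelihood, which amounts to taking the mean and population standard deviation of the log-transformed samples. The function `fit_lognormal` and the transfer times are hypothetical; the abstract does not state which estimation method or goodness-of-fit measure was used.

```python
import math
import statistics

def fit_lognormal(samples):
    """Maximum-likelihood estimates (mu, sigma) of a lognormal distribution.

    mu and sigma are the mean and standard deviation of log(sample).
    """
    logs = [math.log(s) for s in samples]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # the MLE uses the population form (divide by n)
    return mu, sigma

# Hypothetical metro-to-bus transfer times in seconds (illustration only).
transfer_times = [120.0, 180.0, 240.0, 300.0, 600.0]
mu, sigma = fit_lognormal(transfer_times)

# The median of a lognormal distribution is exp(mu).
median_transfer = math.exp(mu)
```

Comparing candidate distributions would then require a common criterion such as log-likelihood or an information criterion computed on the same samples.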
Quality Control: Lessons Learned from the Deployment and Evaluation of GTFS Real-Time Feeds
Sean Barbeau, USF Center for Urban Transportation Research
Real-time transit information has many benefits to transit riders and agencies, including shorter perceived and actual wait times, a lower learning curve for new riders, an increased feeling of safety, and increased ridership. In the last few years, a real-time complement to the General Transit Feed Specification (GTFS) format, GTFS-realtime, has emerged. GTFS-realtime has the potential to standardize real-time data feeds and lead to widespread adoption for transit agencies and multimodal apps. However, GTFS-realtime suffers from a lack of clear documentation and openly available validation tools, which significantly increases the time and effort necessary to create and maintain GTFS-realtime feeds. More importantly, bad data has been shown to have a negative effect on ridership, the rider’s opinion of the agency, and the rider’s satisfaction with multimodal apps. This paper discusses the lessons learned in the deployment of a GTFS-realtime feed with an open-source mobile transit app as part of a regional transit information system for the Tampa Bay area in Florida. These experiences led to improvements to the GTFS-realtime specification itself, as well as the creation of an open-source GTFS-realtime validation tool. An evaluation of 65 transit agency GTFS-realtime feeds using the validation tool showed integrity errors in 55 feeds and warnings in 51 feeds, indicating widespread problems with quality control. This paper concludes with recommendations going forward that will help reduce the time needed to develop, test, deploy, and maintain GTFS-realtime feeds, which will in turn lead to better quality real-time information for transit riders.
Evaluation of General Transit Feed Specification Data for Electric Train Energy-Consumption Estimation
Weichang Yuan, North Carolina State University
H. Christopher Frey, North Carolina State University
Energy consumption of electric trains is sensitive to route characteristics and travel activity. Publicly available data can be used to develop inputs needed for train energy consumption estimation. However, such data have not been evaluated. The objective of this paper is to evaluate General Transit Feed Specification (GTFS) data for route characteristics and travel activity data using the Washington Metropolitan Area Transit Authority (WMATA) Metrorail system as an example. The methodology includes data collection, evaluation, and estimation of derived activity data. Data were collected from GTFS-Static data, GTFS-Realtime (GTFS-RT) data, field measurements, and the District of Columbia Geographic Information System (DCGIS) services. Data evaluation refers to comparing GTFS-Static and GTFS-RT data to field measured data, DCGIS data, or both. Station stop locations, route lengths, stop-by-stop segment distances and travel times, and dwell times at station stops were evaluated. Segment average speeds were estimated. GTFS-Static data can provide station stop locations with high accuracy; however, GTFS-Static data underestimate route length and segment distances. GTFS-Static travel time is based on schedules, not actual travel time. GTFS-RT data need bias-correction to estimate actual time that the train is in motion, considering dwell times at station stops. Segment average speeds were estimated based on DCGIS distances and average bias-corrected GTFS-RT times. Average speeds between stops are available for the entire system. Based on data for the WMATA Metrorail system, GTFS data have minor errors and correctable bias. However, GTFS data need to be combined with other data sources for electric train energy consumption estimation.
A Study of Bus High-Resolution GPS Speed Data Accuracy
Miguel Figliozzi, Portland State University
The recent availability of high-frequency transit (HFT) data for buses has allowed the estimation of bus travel speed profiles between bus stops. HFT data are defined as GPS vehicle trajectory data recorded at intervals of five seconds or less. With HFT data it is now possible to measure relative changes in bus speed at intersections, ramps, crosswalks, etc. The word “relative” is emphasized because previous research efforts have never compared GPS-based bus speeds with general traffic speeds. This research fills this knowledge gap. It uses accurate stationary radar speed data as a baseline, or ground truth, to estimate the accuracy of bus GPS-based speeds. A thorough analysis of the bus and traffic speed data indicates that HFT speed estimates between stops are accurate and highly correlated with traffic speeds. Time-space speed profiles and regression analysis are used to quantify factors that affect HFT speed estimation accuracy. The relative advantages and limitations of the HFT data are presented and discussed. The role of transit vehicle frequency is examined, and the study concludes that large HFT datasets are highly suitable for cost-effectively monitoring recurrent arterial speed performance and analyzing causes of congestion along transit routes.
Temporal Sampling and Service Frequency Harmonics in Transit Accessibility Evaluation
Andrew Owen, University of Minnesota, Twin Cities
Brendan Murphy, University of Minnesota, Twin Cities
In the context of public transit networks, repeated calculation of accessibility at multiple departure times provides a more robust representation of local accessibility. However, these calculations can require significant amounts of time and/or computing power. One way to reduce these requirements is to calculate accessibility only for a sample of time points over a time window of interest, rather than every one. To date, many accessibility evaluation projects have employed temporal sampling strategies, but the effects of different strategies have not been investigated and their performance has not been compared. Using detailed block-level accessibility calculated at 1-minute intervals as a reference dataset, four different temporal sampling strategies are evaluated using aggregate sample error metrics as well as indicators of spatially clustered error. Systematic sampling at a regular interval performs well on average but is susceptible to spatially-clustered harmonic error effects which may bias aggregate accessibility results. A constrained random walk sampling strategy provides slightly worse average sample error, but eliminates the risk of harmonic error effects.
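The two sampling strategies named above can be contrasted with a short sketch. The abstract does not define the constrained random walk, so the version below is one plausible reading and an assumption: each step to the next departure time is drawn uniformly between a minimum and a maximum number of minutes. Function names and parameter values are hypothetical.

```python
import random

def systematic_sample(window_minutes, interval):
    """Departure times (minutes from window start) at a fixed interval."""
    return list(range(0, window_minutes, interval))

def constrained_random_walk_sample(window_minutes, min_step, max_step, rng):
    """Departure times whose spacing varies randomly within [min_step, max_step].

    Irregular spacing avoids locking onto a multiple of a route's headway,
    which is the harmonic effect a fixed interval is exposed to.
    """
    times = []
    t = rng.randint(0, max_step - 1)  # random offset for the first sample
    while t < window_minutes:
        times.append(t)
        t += rng.randint(min_step, max_step)
    return times

fixed = systematic_sample(60, 15)
walk = constrained_random_walk_sample(120, 5, 15, random.Random(1))
```

With a 15-minute fixed interval and a 15-minute route headway, every sample would see the same point in the service cycle; the bounded random steps break that alignment while still limiting the gap between consecutive samples.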
Analysis of Ferry Passenger Alighting and Boarding Movement Using Smart Card Data Visualization in Brisbane, Queensland, Australia
Roy Zhu, University of Queensland
Samuel Hislop-Lynch, University of Queensland
SangHyung Ahn, University of Queensland
The city of Brisbane, Australia is forecast to undergo extensive population growth, presenting contemporary and future challenges to its transportation infrastructure. Consequently, it is crucial to implement new strategies that sustainably meet mobility demands. Fortunately, Brisbane’s urban structure creates opportunities for the use of non-road public transportation, specifically the CityCat ferry. Operating parallel to Brisbane’s road corridors, the CityCat offers the ability to bypass congestion. However, a better understanding of Brisbane’s ferry network and its users is crucial for determining its potential to facilitate transit. The requisite information is potentially available through smart card data, which is emerging as a source of insight for public transit networks. Past researchers used Brisbane’s smart card data to investigate macroscopic trends in ferry usage. However, there is limited literature employing a microscopic approach to analyze the ferry system. This paper thus aims to adopt a more human-centric approach for smart card data analysis by understanding passenger boarding and alighting patterns. Two data visualization tools were introduced to assist analysis. The headway-dwell time plot depicts dwell time for each ferry due to alighting and boarding movements of each passenger at a terminal. This information is used to generate the flow-dwell time plot, which conveys dwell time due to a specific quantity of alighting, boarding or total passenger movements for a terminal. Numerous suggestions emerged regarding the potential of the information gained from these visualizations. This information can inform researchers and operators about stop performance by pinpointing inefficiencies in passenger movements, and it can provide simulation model inputs.