Identifying and Inventorying Legacy Materials for Digitization at the National Transportation Library
Nancy Loi, Simmons College; Mary Moulton, OST-R/Bureau of Transportation Statistics
As an all-digital repository of transportation knowledge, the National Transportation Library (NTL) has undertaken several digitization projects over the years to preserve legacy print materials and make them accessible to stakeholders, researchers, and the general public. In keeping with the library's collection development policy, NTL is particularly interested in publications and documents created by United States Department of Transportation (DOT) agencies.
This poster details efforts to review NTL's past digitization practices and identify DOT materials that should be prioritized for future digitization projects due to their unique nature or at-risk status. We anticipate that this preliminary scoping and assessment work will help inform NTL's budget decisions for digitization.
P18-20566
Preparing for a Trustworthiness Assessment of the National Transportation Library’s Digital Repository ROSA-P
Erica Zhang, University of Illinois at Urbana-Champaign; Mary Moulton, OST-R/Bureau of Transportation Statistics
The National Transportation Library (NTL) is an all-digital repository of transportation knowledge that falls under federal mandates to “serve as a central clearinghouse for transportation data and information of the Federal Government” as well as “use best practices in digital preservation and search to optimize archiving, dissemination, and search features which will ensure long-term stewardship of the results of federally-funded research.”
In line with this governing legislation, NTL has committed to maintaining a trustworthy repository that can reliably provide long-term public access to its holdings. This poster documents NTL’s process of planning for a self-assessment and peer review of ROSA-P against the 16 Core Requirements of Repository Trustworthiness set forth by the Data Seal of Approval and the World Data System. A primary component of this process was gathering policy, procedural, and planning documentation to serve as potential evidence of NTL’s trustworthiness. This task allowed NTL to identify gaps in its documentation, which not only benefits the trustworthiness assessment but also, more generally, gives NTL knowledge management guidance by revealing undocumented workflows and tacit knowledge that will need to be captured in the near future.
P18-20570
ROSA-P: The National Transportation Library’s Repository and Open Science Access Portal
Mary Moulton, OST-R/Bureau of Transportation Statistics
The National Transportation Library (NTL) was founded as an all-digital repository of USDOT research reports, technical publications, and data products. NTL’s primary public offering is ROSA-P, the Repository and Open Science Access Portal. An open access repository, ROSA-P is designated as the full-text archive for the results of USDOT-funded research under USDOT’s public access plan. Collections in ROSA-P are available without restriction to transportation researchers, statistical organizations, the media, and the public. In 2014, NTL began planning a migration of its digital repository to ROSA-P, a new open source platform hosted by the Centers for Disease Control and Prevention (CDC). NTL worked closely with CDC’s team on content migration, training, and the successful move to production. NTL staff re-architected repository metadata to meet Public Access requirements and implemented digital data curation for USDOT-funded research results, including the assignment of digital object identifiers (DOIs). ROSA-P makes NTL’s digital collections findable and accessible, improves search and discovery, and links data to publications. NTL’s digital resources also gain greater exposure through specialized search services such as Google Scholar.
P18-20571
Knowledge Extraction from Transportation Research Thesaurus
Subasish Das, Texas A&M Transportation Institute; Anandi Dutta, Texas A&M University
The concept of a research thesaurus has evolved from a list of associated words to controlled vocabularies in which terms form complex structures through semantic associations. On the Semantic Web, a thesaurus is a documentary language that uses a controlled vocabulary to mitigate the ambiguity of natural language in indexing and information retrieval. The Transportation Research Thesaurus (TRT), curated and maintained by the Transportation Research Board (TRB), is a web-based tool for improving the indexing and retrieval of transportation information. The TRT contains nearly 10,000 unique words or word pairs related to transportation engineering, arranged hierarchically, and also provides each term's source, definition, hierarchy, and date added. This study applied text mining and topic modeling to a bag of 150,000 words drawn from TRT term contents, metadata, and hierarchy tags to perform knowledge extraction. The analysis used n-grams for different hierarchy levels, network plots for various cluster groups, term frequency-inverse document frequency (TF-IDF) plots for significant clusters, and structural topic models. The findings showed that the hierarchy tagging needs to be modified in some places, based on patterns found in the knowledge extraction. This study should help organize the TRT more effectively and provide future directions for developing an effective Simple Knowledge Organization System (SKOS) representation.
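TF-IDF scores a term highly when it is frequent within one cluster but rare across clusters. A minimal Python sketch follows, assuming each hierarchy cluster is treated as one document and using tf = count / length and idf = ln(N / df); the cluster names and terms are hypothetical placeholders, not TRT data, and this is not the study's own pipeline.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for each term in each document.

    docs: mapping of document name -> list of tokens.
    tf is the term count divided by document length; idf is ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))
    weights = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights

# Hypothetical clusters: each list stands in for terms filed under one hierarchy branch.
clusters = {
    "pavements": ["asphalt", "concrete", "pavement", "cracking", "pavement"],
    "safety": ["crash", "pavement", "speed", "crash", "injury"],
    "transit": ["bus", "rail", "ridership", "bus", "fare"],
}

w = tf_idf(clusters)
for name, scores in w.items():
    # Two highest-weighted terms per cluster.
    top = sorted(scores, key=scores.get, reverse=True)[:2]
    print(name, top)
```

Note that "pavement" is down-weighted because it appears in two clusters; a term present in every cluster would receive a weight of zero.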
P18-20572
Transportation Research Thesaurus: The Swiss Army Knife of Controlled Vocabularies
Janet Daly, Transportation Research Board; Sandra Tucker, Texas A&M University
The Transportation Research Thesaurus (TRT) is a controlled vocabulary designed to improve the indexing and retrieval of transportation information. Although the thesaurus was originally developed in 1993, it is regularly updated to capture cutting-edge transportation concepts. Each record in the Transportation Research Information Services (TRIS) Databases, including the Transport Research International Documentation (TRID), Research in Progress (RIP), Research Needs Statements (RNS), and Publications Index databases, is tagged with terms from the TRT, and searchers can also access the thesaurus directly to improve the quality and efficiency of their searches. Because the TRT provides a common, consistent language for producers and users of transportation information, it can also be used wherever knowledge needs to be organized for retrieval. This poster provides an overview of the structure and use of the TRT. It describes how searchers can leverage the TRT to improve their own literature searches. It also highlights how transportation agencies are incorporating the TRT into a variety of other applications, such as tagging records in their own databases, locating definitions, and finding alternative terminology. The poster concludes with some potential future directions for improving the usability of the TRT.
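One way a searcher can use a thesaurus hierarchy is to broaden a query with narrower terms. The Python sketch below assumes a simple broader-to-narrower mapping; the terms are invented for illustration and are not actual TRT entries, and `expand` and `or_query` are hypothetical helpers, not part of any TRT interface.

```python
def expand(term, narrower, depth=None):
    """Return the term plus all narrower terms beneath it, depth-first.

    narrower: dict mapping a term to its list of narrower terms (assumed acyclic).
    depth: optional limit on how many levels to descend; None means no limit.
    """
    result = [term]
    if depth == 0:
        return result
    next_depth = None if depth is None else depth - 1
    for child in narrower.get(term, []):
        for t in expand(child, narrower, next_depth):
            if t not in result:  # avoid duplicates when a term has several parents
                result.append(t)
    return result

def or_query(terms):
    """Join terms into a quoted Boolean OR search string."""
    return " OR ".join(f'"{t}"' for t in terms)

# Hypothetical fragment of a thesaurus hierarchy (not actual TRT records).
narrower = {
    "pavements": ["flexible pavements", "rigid pavements"],
    "flexible pavements": ["asphalt pavements"],
    "rigid pavements": ["concrete pavements"],
}

terms = expand("pavements", narrower)
print(or_query(terms))
```

Limiting `depth` lets a searcher trade recall for precision, for example expanding only one level below the starting term.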
P18-20575
National Transportation Library Fellowship: Year in Review
Laura Farley, National Transportation Library
This poster will follow the first seven months of the National Transportation Library (NTL) Fellowship. Started in June, this new fellowship offers a librarian the opportunity to build skills as a data manager and have a lasting impact on federal data curation. The fellowship program fosters increased technical skills, the opportunity to think broadly and creatively, and the chance to work with a range of professionals. During the first year, I will complete rotations through all areas of the library: reference, cataloging, data management, and systems operations. This poster will look at each area of operation, including my contributions to NTL and lessons learned. During years two and three, I will turn to a data-focused project of my choosing. Preliminary ideas for the project will be included in the poster, likely with an emphasis on adding interactive data visualizations to ROSA-P.
P18-20643
Design of HBase and Hybrid Hadoop Ecosystem Architecture in Transportation Data Management
Hang Yue, Johns Hopkins University
The four Vs of big data are volume, variety, velocity, and veracity, and the five Ms of big data analysis are measure, mapping, methods, meanings, and matching. With the growing collection and use of unstructured and semi-structured data, NoSQL databases, including the column-family-oriented HBase, have become popular. HBase is scalable and can keep multiple versions of each cell value, which suits transportation data archiving. HBase tables can hold very large numbers of rows and columns, and rows and columns are not arranged as in the classic spreadsheet model; instead, they use a tag metaphor (that is, information is available under a specific tag). In addition, different column families, with their columns and row keys, are stored in separate HFiles. Traffic data queries in HBase take the form of key lookups or key range scans, and the HBase shell provides the create, get, put, scan, and delete commands for querying. Beyond HBase design for transportation, this research describes the HBase architecture and a Hybrid Hadoop Ecosystem Architecture design for developing transportation data management systems. The hybrid architecture combines traditional transportation databases with various Hadoop components and offers potential transportation applications, such as capturing and analyzing voluminous complex data, leveraging traffic data correlations, inferring knowledge from noisy and heterogeneous sources, facilitating research into specific transportation topics, identifying new approaches for transportation engineering, and moving from subjective decision-making to evidence-based transportation data analysis.
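Because HBase keeps rows sorted by row key, a well-designed key turns a traffic query into a single-key get or a contiguous range scan. The toy Python class below mimics that behavior with a sorted list; it does not use any HBase client library, and the `sensor#timestamp` key layout and `d:volume` column name are hypothetical choices for illustration.

```python
import bisect

def row_key(sensor_id, epoch_seconds):
    # Hypothetical composite key: zero-padded sensor ID, then zero-padded timestamp,
    # so lexicographic order matches (sensor, time) order.
    return f"{sensor_id:06d}#{epoch_seconds:010d}"

class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by row key."""

    def __init__(self):
        self._keys = []   # sorted row keys
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, key, column, value):
        if key not in self._rows:
            bisect.insort(self._keys, key)
            self._rows[key] = {}
        self._rows[key][column] = value

    def get(self, key):
        # Key lookup: one row by exact key, or None if absent.
        return self._rows.get(key)

    def scan(self, start, stop):
        # Key range scan: start inclusive, stop exclusive.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

table = SortedTable()
for t, volume in [(1000, 12), (1060, 15), (1120, 9)]:
    table.put(row_key(42, t), "d:volume", volume)
table.put(row_key(43, 1000), "d:volume", 30)

# All readings for sensor 42 from t=1000 up to (not including) t=1100.
hits = table.scan(row_key(42, 1000), row_key(42, 1100))
print([row["d:volume"] for _, row in hits])
```

Zero-padding both key parts keeps lexicographic order consistent with numeric order, so a time window for one sensor maps to one contiguous key range. Leading with the sensor ID rather than the timestamp also avoids funneling all new writes into the same key region.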
P18-20645
Unstructured Traffic Data Analysis Using Apache Drill
Hang Yue, Johns Hopkins University
Semi-structured or self-describing data will account for more than 80% of the data collected by organizations. Data volumes are doubling every two years. Relational database management systems (RDBMSs) have been the default choice for data analysis but face cost and time challenges with rapidly growing, varied data sets. NoSQL systems are designed for big data, and the 40-year monopoly of the RDBMS is over. Several data analysis challenges remain: schemas are hard to keep in sync when data structures change rapidly; non-repetitive, ad hoc query needs may not justify schema modeling costs; complex data is hard to map to relational tables; and a single query may need to join data from different sources in multiple databases. This research demonstrates Apache Drill for unstructured traffic data analysis and assesses the benefits and potential of this new tool through case studies of different traffic data queries. Drill is the industry’s first schema-free SQL engine for big data; it brings SQL to scalable data in distributed systems without compromising system flexibility. It offers flexible access to different data types, yet it is not anti-schema or anti-DBA. Apache Drill does not require ETL for unstructured data, compiles queries at runtime for dynamic schema discovery, and can query traffic data in situ without first loading it. It can reduce the cost and time associated with traditional transportation database systems and achieve agility without IT intervention. Moreover, Drill supports existing business intelligence (BI) tools, allows reuse of existing SQL tools and skills, and brings SQL into the unstructured-data future of transportation database system development.
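The schema-on-read idea behind Drill (discovering structure when data is read rather than declaring it up front) can be illustrated without Drill itself. The Python sketch below is an analogy only, not Drill's API or internals; the record fields and sensor IDs are hypothetical, and `discover_schema` and `select` are illustrative helpers.

```python
import json
from collections import defaultdict

# Hypothetical newline-delimited JSON from heterogeneous traffic sources;
# the records do not share a fixed set of fields.
RAW = """\
{"sensor": "I-95-N-12", "speed_mph": 54.2, "volume": 310}
{"sensor": "I-95-N-13", "speed_mph": 61.0}
{"incident_id": 881, "type": "crash", "lanes_blocked": 2}
{"sensor": "I-95-N-12", "speed_mph": 38.5, "volume": 402, "weather": {"rain": true}}
"""

def read_records(text):
    """Parse one JSON object per non-empty line; no schema is declared."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

def discover_schema(records):
    """Infer field names and the Python types observed for each, at read time."""
    schema = defaultdict(set)
    for rec in records:
        for field, value in rec.items():
            schema[field].add(type(value).__name__)
    return dict(schema)

def select(records, fields, where=lambda r: True):
    """Project fields, returning None for any field a record lacks."""
    return [tuple(r.get(f) for f in fields) for r in records if where(r)]

records = read_records(RAW)
print(discover_schema(records))
# Records without a speed reading are treated as not slow.
slow = select(records, ["sensor", "speed_mph"],
              where=lambda r: r.get("speed_mph", float("inf")) < 50)
print(slow)
```

The point of the analogy is that sensor readings and incident reports can sit in the same file and still be queried together, with missing fields tolerated instead of rejected by a fixed table definition.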
P18-20647
Delivering Data Packages for Discovery, Analysis, and Preservation
Leighton Christiansen, OST-R/Bureau of Transportation Statistics
The United States Department of Transportation (USDOT) Plan to Increase Public Access to the Results of Federally-Funded Scientific Research (PA) requires, in part, that “digitally formatted scientific data resulting from unclassified research supported wholly or in part by Federal funding to be stored and publicly accessible for search, retrieval, and analysis.” The PA also requires researchers to deliver a Data Management Plan (DMP) that identifies the practices they will employ to ensure long-term access to and preservation of the project data.
The National Transportation Library (NTL), part of USDOT’s Bureau of Transportation Statistics (BTS), is applying the standards and practices required by the PA to new datasets created by BTS. Further, to ensure the longest possible discoverability and preservation, and to encourage data interoperability and reuse, NTL is helping BTS staff create even more robust documentation for datasets. This documentation is collectively known as a “data package.” In addition to the final dataset and the DMP, a data package includes other documentation that, as defined by NTL, is “needed to contextualize the dataset for any and all users.”
This poster will explore the various elements of data packages and look at their initial use within BTS and the NTL repository.
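A data package, as described above, bundles the dataset, the DMP, and contextual documentation. The Python sketch below is a minimal completeness check, assuming those three components are required; the component keys, title, and file names are hypothetical and do not reflect NTL's actual packaging specification.

```python
from dataclasses import dataclass, field

# Hypothetical required components; NTL's actual data package definition may differ.
REQUIRED = ("dataset", "data_management_plan", "documentation")

@dataclass
class DataPackage:
    title: str
    files: dict = field(default_factory=dict)  # component -> list of file names

    def add(self, component, filename):
        """Attach a file to a component, creating the component if needed."""
        self.files.setdefault(component, []).append(filename)

    def missing(self):
        """Return the required components that have no files yet."""
        return [c for c in REQUIRED if not self.files.get(c)]

    def is_complete(self):
        return not self.missing()

pkg = DataPackage("Example dataset")
pkg.add("dataset", "example_data.csv")
pkg.add("data_management_plan", "dmp.pdf")
print(pkg.missing())
pkg.add("documentation", "data_dictionary.csv")
pkg.add("documentation", "readme.txt")
print(pkg.is_complete())
```

Keeping documentation as a list per component reflects that "other documentation" may span several files, such as a data dictionary and a readme.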
P18-20648
National RTAP’s Resource Share Curation Project: Development of a Collection Tailored Toward Rural and Tribal Transit Technical Assistance
Cara Marcus, National Rural Transit Assistance Program (RTAP); Robin Phillips, National Rural Transit Assistance Program (RTAP)
The National Rural Transit Assistance Program (National RTAP) operates under a cooperative agreement between the Federal Transit Administration and the Neponset Valley Transportation Management Association. Our comprehensive set of free technical assistance resources includes training materials, toolkits, webinars, technical briefs, and web applications that address the training and technical assistance needs of rural and Tribal transit operators and the State RTAP programs across the nation. In 2017, our Resource Library’s online portal (Resource Share) underwent a comprehensive upgrade to improve its look and feel and overall functionality. This upgrade gave the Resource Center Manager the opportunity to undertake a thorough curation project covering more than 700 resources in the system, both our own and those of other key organizations. This poster describes the project scope, objectives, methodology, outcomes, and lessons learned.
P18-20651
AASHTO Digital Publications: Before You Download . . . (Don’t!)
Carol Paszamant, New Jersey Department of Transportation
AASHTO’s move to providing its publications to state DOTs in digital format has caused considerable confusion. The download process is complex, each institution is limited to one free download, and opening a document requires a DRM account login and password whose credentials, once used, cannot be transferred to another account. State DOT libraries can therefore help by informing their organizations of a best practice: let the department library download these publications so that access to all of them is predictable and consistent.
P18-20965