TT2011: The Need for a New Paradigm in Transport Data Collection and Analysis

Geo-tagged open-source Flickr data in San Francisco. (Red dots are photos taken by tourists, blue dots are photos taken by locals, and yellow dots could be either.) Photo by Eric Fischer.

In a large city with broad institutional capacity, such as New York City, data is plentiful. The city draws useful numbers from a variety of sources: community-based organizations that track the block-by-block details of the city, meetings and conferences about larger city goals, and institutions like the Department of Health and local universities.

Janette Sadik-Khan, commissioner of the New York City Department of Transportation, was profiled by Esquire magazine late last year as an unrelenting number-cruncher who has advanced her agenda swiftly and efficiently. For Sadik-Khan, collecting data is paramount:

“Stat! The commissioner dispatches data streams as though from a machine gun, pelting dissenters with a language that is part English and part numerical: Injuries to motorists and passengers in the project areas are down 63 percent. Pedestrian injuries are down 35 percent. Eighty percent fewer pedestrians are walking in the roadway in Times Square.”

She has used data to build compelling arguments, sustainable infrastructure and services, and people-friendly improvements to mass transit in New York City. For the most part, the city likes her work, though, like any change-maker, she has her detractors. The city now has car-free plazas, bike lanes, two bus rapid transit routes and other pedestrian- and cycle-friendly facilities.

As Esquire continues:

“But she won’t stop there. After stuffing you full with safety data she’ll insist you try a pie of peripheral benefits. One of the reasons GPS units were plugged into all thirteen thousand yellow taxis in New York was so that the DOT could track the performance of the new system. More stats! They found that northbound taxi trips in west midtown were 17 percent faster in the fall of 2009 (after the Broadway shutdown) than in the fall of 2008. And the stats don’t lie!”

Sadik-Khan's success stems in part from her use of data, which has given government agencies like NYCDOT and the Department of City Planning the evidence they need to make good decisions and build a strong case for change. Yet in the field of transportation, many issues, particularly in the developing world, lack the numbers.

Last week, during the annual Transforming Transportation event, organized by EMBARQ (the producer of this blog) and other partners, a panel of distinguished experts spoke about the lack of comprehensive and reliable data in the transport sector. The data that are collected usually relate to car traffic. Still needed are statistics on non-motorized transport, modal shares and where people travel.

The gaps also vary by country. Madhav Pai, director of EMBARQ's India program, says, "In a place like India, data is lacking everywhere. We're missing information on how the poor are moving and there's absolutely no information on walking, bike riding and paratransit." Plus, he says, for transport operations like rickshaws, there are twice as many informal operators as regulated ones. "So much is in the unofficial realm."

Other reasons the transport field lacks high-quality data include:

  • Lack of uniform definitions. For example, the definition of a road fatality varies from country to country, so there is no uniform way for an agency to know how to collect the data. The World Health Organization's standard international definition counts a death as a traffic fatality if it occurs within 30 days of a traffic crash and results from that crash. Applying this definition requires coordination between the police and hospitals to follow up on what happened to each crash victim. Often in the developing world, a death is counted as a traffic fatality only if the person dies at the scene of the crash, which means traffic fatalities are underreported.
  • A lack of supervising institutions with broad scope, and therefore of international standards for obtaining high-quality data, unlike fields such as demographics, public health and international trade.
  • Fragmented transport information. Different institutions handle data in different ways, whether at a regional, citywide or national scale.
  • Measurement methods vary from country to country and even within countries. In Mexico, for example, some cities report only walking trips longer than 15 minutes, while others report walking trips longer than 5 minutes. The underlying issue is that most transport indicators have no universal definitions.
  • Cities have different policy priorities and therefore track different types of data. For example, some cities may not track the emissions or energy intensity of certain vehicles.
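To see why inconsistent definitions matter, it helps to apply two definitions to the same raw records. The sketch below is illustrative only: the `CrashVictim` record, the counting functions and all sample figures are hypothetical, and it assumes every recorded death resulted from the crash. It contrasts the 30-day fatality rule with an at-the-scene rule, and the 5- and 15-minute walking-trip thresholds mentioned for Mexican cities.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CrashVictim:
    # Hypothetical record fields, for illustration only.
    died_at_scene: bool            # True if the victim died at the crash scene
    days_to_death: Optional[float]  # days from crash to death; None if survived


def count_fatalities_30_day(victims: List[CrashVictim]) -> int:
    """Counts deaths within 30 days of the crash (WHO-style definition).
    Assumes every recorded death resulted from the crash."""
    return sum(
        1 for v in victims
        if v.days_to_death is not None and v.days_to_death <= 30
    )


def count_fatalities_on_scene(victims: List[CrashVictim]) -> int:
    """Counts only deaths at the crash scene, a narrower definition."""
    return sum(1 for v in victims if v.died_at_scene)


def count_walking_trips(durations_min: List[float], threshold_min: float) -> int:
    """Counts walking trips longer than a city's reporting threshold (minutes)."""
    return sum(1 for d in durations_min if d > threshold_min)


# Made-up sample records, not real data.
victims = [
    CrashVictim(died_at_scene=True, days_to_death=0),
    CrashVictim(died_at_scene=False, days_to_death=3),
    CrashVictim(died_at_scene=False, days_to_death=20),
    CrashVictim(died_at_scene=False, days_to_death=45),
    CrashVictim(died_at_scene=False, days_to_death=None),
]
walks = [3, 8, 12, 20, 35]  # made-up walking trip durations in minutes

print(count_fatalities_30_day(victims))   # 3
print(count_fatalities_on_scene(victims))  # 1
print(count_walking_trips(walks, 5))       # 4
print(count_walking_trips(walks, 15))      # 2
```

The same five crash records yield three fatalities under the 30-day rule but only one under the at-the-scene rule, and the same trips yield four or two walking trips depending on the threshold, which is exactly why figures from different places cannot be compared directly.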

When it comes to transport, academics, public policy experts and engineers will argue that collecting and analyzing good data is key to achieving policy changes, developing better and more efficient transport systems, figuring out what works and what does not in the local context, deciding what's measurable (and should be adjusted) and determining what's financially feasible. Ironically, more data exist in areas of enforcement and regulation. Because data availability follows demand, emerging issues often lack the information needed to support them.

What makes data useful?

Lee Schipper, founder of EMBARQ, project scientist for Global Metropolitan Studies at the University of California, Berkeley, and a senior research engineer at Stanford University, explained that a good dataset should let you understand how it was created, how the data were measured, what it includes and, perhaps more importantly, what it does not include. For example, a dataset may claim to report the total kilometers traveled in a country but include only national highways, which carry just a portion of total traffic.
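Schipper's criteria can be read as a checklist of metadata that should travel with every dataset. The sketch below is a hypothetical data structure, not a standard schema: the `TransportDataset` class, its field names and the example values are assumptions for illustration. It records how the data were collected, what they cover and what they leave out, so that a partial total is not mistaken for a complete one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TransportDataset:
    # Hypothetical metadata fields inspired by the criteria above.
    name: str
    indicator: str
    collection_method: str  # how the data were created and measured
    includes: List[str]     # what the figures cover
    excludes: List[str] = field(default_factory=list)  # what they leave out

    def coverage_note(self) -> str:
        """Summarizes coverage so users don't read a partial total as a full one."""
        note = f"{self.indicator}: covers {', '.join(self.includes)}"
        if self.excludes:
            note += f"; excludes {', '.join(self.excludes)}"
        return note


# Hypothetical example mirroring the highway-only case described above.
ds = TransportDataset(
    name="National travel estimate (example)",
    indicator="total kilometers traveled",
    collection_method="traffic counts on monitored roads",
    includes=["national highways"],
    excludes=["urban streets", "rural local roads"],
)
print(ds.coverage_note())
# total kilometers traveled: covers national highways; excludes urban streets, rural local roads
```

Keeping the exclusions as an explicit field, rather than in a footnote, makes the gap visible wherever the dataset is used.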

Information from the Panelists

Jose Barbero, a consultant with the Inter-American Development Bank, said some sectors have more data than others, at least in Latin America. Sectors with very high data availability include international trade, energy use, national accounts and emissions. Data with intermediate availability include the types of fuel used by transport modes, types of vehicles and trailers in freight transport, and road transport. The least available data generally concern non-motorized transport and trip purpose. Barbero also said the private sector is not involved enough in processing and sharing data.

Rodolfo Huici, principal economist for Infrastructure and Environment at the Inter-American Development Bank, said 50 percent of transportation-related emissions come from freight. To address issues of urban livability, it is essential to also analyze the urban freight industry. However, a large share of domestic road freight transportation is often informal. He elaborated on an IDB proposal to develop a regional "observatory" to collect data, establish and analyze transport indicators, and improve knowledge of freight travel.

James Leather, transport specialist at the Asian Development Bank, spoke on the quality and availability of transport data in Asia. He said different levels of government collect and hold information separately, making it inefficient to get details and analyze policies based on numbers. There's a mismatch, he pointed out, between where budgets are allocated and where data are available. Leather also noted that Asia has extremely limited data on walking and biking.

How to Make Progress

To deal with the issue of data availability, there needs to be institutional coordination and clear definitions. People at every level, from the top to the bottom, have to know what data they're looking for and how it is defined. Plus, national-level actors must be motivated to collect the data, which means the information has to be useful to them, not just to international actors. Datasets that are difficult, expensive and time-consuming to obtain are a huge challenge for organizations with limited capacity. Local ownership is key.

The lack of data stems partly from the fact that passenger transit, carbon reduction, mobility and biking are only now becoming mainstream issues in development and in national and local policy. As we wrote last week, the sustainable transport sector still struggles to articulate its co-benefits and achieve consensus on an international scale. However, the lack of data is not specific to transport; local officials don't realize the importance of good data collection in general.

Projects like the Partnership on Sustainable Low Carbon Transport, or SLoCaT, convened by Cornie Huizenga and Tom Hamlin, are working to facilitate a “transport data group” and to strengthen capacity for data collection at the national and local level, particularly in developing countries.

EMBARQ's Madhav Pai says decision-making has been so ad hoc in urban planning and transport in the developing world that there hasn't been much rigor in using data. "Conventional tools and models have been made in the west," he says. "Developing cities are in such a transition that the instruments that have been developed cannot predict and apply to the massive growth."

Because the tools for collecting data are failing, there is little incentive to collect it. "There's been a leap of technology so there should also be a leap in how data are used and generated," he says. In Indian cities, he points out, the traditional process of making infrastructure and urban growth plans based on data just doesn't apply, because the information becomes irrelevant so quickly. Urban plans based on data just won't be the model for places like India. "We need new paradigms of how data are collected," Pai says, adding, "It's essential to shorten the time between collection and analysis."

The question is: What's the best tradeoff between speed (i.e., quick data collection) and accuracy? If you save time by collecting data more superficially, is the quality still good enough to inform decision-making?
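One common way to put a number on part of this tradeoff, assuming a simple random sample (an assumption here; the panel did not describe a specific method), is the margin of error of an estimated share, such as the fraction of trips made on foot. The sketch below uses the normal approximation, about 1.96 × sqrt(p(1 − p)/n) at 95 percent confidence; the function name and the 30 percent walking share are hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p estimated from a
    simple random sample of n observations (normal approximation;
    z = 1.96 corresponds to roughly 95 percent confidence)."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical survey result: 30 percent of sampled trips are walking trips.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.30, n), 3))
```

Because the error shrinks with the square root of the sample size, quadrupling a survey's size (and roughly its time and cost) only halves the margin of error. A quick, smaller survey can therefore be good enough when decisions hinge on large differences, but not when they hinge on small ones. Note that this covers sampling error only; it says nothing about the definitional inconsistencies discussed earlier.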

Given the pace and scale of change and the difficulty of making predictions in the transport field, Pai points out that it's imperative to build systems that allow for continuous adjustment and improvement, which is one reason he supports flexible solutions like bus rapid transit.
