Making money from accessing the vast amounts of information collected by the U.S. government has been the basis for many commercial enterprises. The widespread use of Census Bureau data alone has been a great business resource for decades — with a relatively new twist as a component of Google Maps.
Now the U.S. government has undertaken a major effort to make tons of federal information from all agencies more accessible for the general public and the business community. The government’s Open Data Policy requires federal agencies to make most of their information resources available in electronically accessible formats.
“One of the things we’re doing to fuel more private sector innovation and discovery is to make vast amounts of America’s data open and easy to access for the first time in history,” said President Obama when he announced the program in a May 9 executive order. The Office of Management and Budget supplemented the order with an implementation directive.
Mix-and-Match IT Tools
The program will provide an interesting example of matching older government “legacy” IT systems with newer technologies, including cloud services and the involvement of GitHub — a software startup that wasn’t even in business five years ago.
“By requiring that government agencies provide newly generated government data in machine-readable formats like CSV, XML and JSON — and when appropriate, expose data via Application Programming Interfaces — the new executive order and policy will further accelerate the liberation of government data,” said Federal Chief Information Officer Steven VanRoekel.
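The practical difference a machine-readable format makes can be shown in a few lines: the same record, parsed from CSV and re-serialized as JSON, can be consumed by software without any manual cleanup. This is a minimal sketch; the agency, dataset name, and figures below are invented for illustration.

```python
import csv
import io
import json

# Hypothetical sample record; field names and values are invented.
csv_text = "agency,dataset,records\nDOT,On-Time Flights,52000\n"

# Parse the CSV and re-serialize it as JSON. Both representations are
# machine-readable, which is the property the policy mandates.
rows = list(csv.DictReader(io.StringIO(csv_text)))
json_text = json.dumps(rows, indent=2)
print(json_text)
```

Either format can be loaded directly by downstream tools, which is what distinguishes it from, say, a scanned PDF of the same table.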
The Open Data Policy directs federal agencies to collect or create information in a way that supports downstream information processing and dissemination activities, and to build information systems to support interoperability and accessibility. Agencies will be required to improve data management and release practices, as well as to strengthen privacy and security measures.
The OMB directive includes several IT compliance elements:
- Formatting: Agencies must use machine-readable and open formats for information as it is collected or created. While information should be collected electronically by default, machine-readable and open formats must be used in conjunction with electronic, telephone or paper-based collection.
- Open access: Federal data managers must apply open licenses to information as it is collected or created so that if data is made public, there are no restrictions on copying, publishing, distributing, transmitting, adapting, or otherwise using the information for noncommercial or commercial purposes.
- Structure: Agencies must describe information using common core metadata as it is collected and created. Metadata should also include information about origin, linked data, geographic location, time series continuations, data quality, and other relevant indices that reveal relationships between datasets and allow the public to determine the fitness of the data source.
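A common-core metadata record of the kind the directive describes might look like the following. This is an illustrative sketch only: the field names and values here are assumptions chosen to mirror the categories listed above (origin, geographic location, time series coverage, and so on), not the official government schema.

```python
import json

# Illustrative dataset metadata record; field names are assumptions,
# not the official common-core schema.
metadata = {
    "title": "On-Time Flight Performance",
    "description": "Monthly on-time arrival statistics for U.S. carriers.",
    "publisher": "Department of Transportation",   # origin of the data
    "keyword": ["aviation", "on-time", "travel"],
    "modified": "2013-05-09",
    "spatial": "United States",                    # geographic location
    "temporal": "2012-01/2012-12",                 # time series coverage
}
print(json.dumps(metadata, indent=2))
```

Publishing records like this alongside each dataset is what lets the public judge a source's fitness and discover relationships between datasets without downloading them first.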
As the program unfolds, the federal chief information officer and chief technology officer will release free, open source tools on GitHub.
“This effort, known as Project Open Data, can accelerate the adoption of open data practices by providing plug-and-play tools and best practices to help agencies improve the management and release of open data,” said VanRoekel.
Launched in 2008, GitHub is a software clearinghouse designed to facilitate code collaboration and development for open source and private development projects. The company sprang from Git, a distributed version control system first developed by Linus Torvalds, the creator of Linux.
For example, one tool released in conjunction with the Open Data launch automatically converts simple spreadsheets and databases into APIs for easier consumption by developers. Anyone — from government agencies to private citizens to local governments and for-profit companies — can freely use and adapt these tools, starting immediately.
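The core idea behind such a converter can be sketched in a few lines: treat each spreadsheet row as a record and let callers filter records by field value, returning the matches as JSON. This is a minimal sketch of the concept, not the actual tool's code; a real version would serve the results over HTTP, which is omitted here, and the sample data is invented.

```python
import csv
import io
import json

def csv_to_api(csv_text, **filters):
    """Return the rows of a CSV matching the given field=value filters, as JSON.

    A minimal sketch of the spreadsheet-to-API idea; HTTP serving is omitted.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    matched = [r for r in rows
               if all(r.get(k) == v for k, v in filters.items())]
    return json.dumps(matched)

# Invented sample data for illustration.
data = "state,recalls\nCA,12\nTX,7\n"
print(csv_to_api(data, state="CA"))
```

A developer can then query the spreadsheet the way they would query any JSON API, without ever handling the original file format.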
The Cloud Technology Component
To further spur agency action, operations at Data.gov, the federal website and portal for accessing executive branch information, will be expanded to include new services such as improved visualization, mapping tools, and better context to help locate and understand data resources.
In addition, the site will offer robust API access for developers. Since Data.gov was launched in 2009, available data sets have grown from 47 to 400,000. Consumers have used the site for tasks from tracking product recalls to obtaining air travel on-time records.
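Data.gov's catalog runs on the open source CKAN platform, so a developer searching it programmatically would build a query against its search endpoint roughly as follows. The endpoint path is assumed from CKAN's conventions, and no network request is actually made here; this only shows how such a query URL is assembled.

```python
from urllib.parse import urlencode

# Catalog search endpoint; path assumed from CKAN's API conventions.
BASE = "https://catalog.data.gov/api/3/action/package_search"

def search_url(query, rows=10):
    """Build a catalog search URL for datasets matching `query`."""
    return BASE + "?" + urlencode({"q": query, "rows": rows})

print(search_url("product recalls"))
```

Fetching that URL would return matching dataset records as JSON, which is the kind of programmatic access that lets consumer applications sit on top of the catalog.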
One likely impact of the program will be the incorporation of cloud technology into the process.
“We believe the economics and the ubiquity of access of the cloud are central to any open data strategy. Cloud technology also provides the agility and ease of use that a modern government needs in order to adapt and iterate quickly,” Saf Rabah, vice president of marketing at Socrata, told the E-Commerce Times.
“Open data, unlike legacy IT domains, is evolving very rapidly — thanks to mainstream adoption globally, spurred by a thriving ecosystem of developers, entrepreneurs and big Internet players like Google and Yelp, who see an opportunity to create new consumer services with newly available government data. The rate of innovation far outpaces traditional procurement and deployment cycles in government,” he explained.
“Since open data is the one area where government is the accelerator of innovation, it’s essential to keep pace with the rest of the field,” Rabah said.
“Cloud technology will be a part of achieving the Open Data goals. It may not be the only innovation, but it could play a significant role,” Chris Wilson, vice president of federal government affairs at TechAmerica, told the E-Commerce Times.
The Critical Interoperability Factor
One aspect of the program presents both an opportunity and a challenge. The policy specifically calls for improved interoperability as a factor for advancing open data implementation.
“Interoperability is critical, and without it you are really stymied in utilizing any of the other tools that may be available,” Hudson Hollister, executive director of the Data Transparency Coalition, told the E-Commerce Times.
“Right now, standards setting for interoperability seems to be nobody’s job — and the federal government has the opportunity to take the lead here,” he said.
Federal managers first need to ensure that IT systems within their own agencies are compatible and talk to each other. Then there is the problem of making IT interoperable across government agencies on a functional basis, and then among and between subject areas.
Hollister said he hopes that pending IT-related legislation on Capitol Hill will improve the development of interoperability standards.
One element of the policy that could play into the interoperability issue is a requirement that agencies publicly list an inventory of the data sets they possess. The scope of information revealed in such inventories across the government could provide a better basis for framing the tools needed to make the data more accessible.
“When open data procedures are incorporated into agency processes from the start, we’ll start to see more systems designed for bulk access from the start, and we’ll be better able to recoup all the missed opportunities in legacy datasets that are still closed,” said John Wonderlich, policy director at the Sunlight Foundation.
“We’ll be able to evaluate agencies’ transparency against what they’ve defined as their candidates for release, and clearly identify areas where agencies avoid disclosure altogether,” he noted.
While the Sunlight Foundation is primarily interested in open government issues, the information resource disclosure requirement will also improve the ability of private sector firms to evaluate government data resources and deploy the IT tools needed to access the information.