Gale Brewer, chair of the Committee on Technology in Government of the New York City Council, has introduced a draft law that would adopt open data sharing standards for the city's government.
At EveryBlock, we've been working with Council Member Brewer's office for more than two years. As we worked to launch our New York City site, we saw that she was instrumental in the passage of Local Law 47, which requires the publication of monthly reports on data collected from calls made to the 311 system. We saw it as a step in the right direction, and Council Member Brewer as someone we wanted to work with.
Recently, we were invited to submit testimony in favor of this Local Law, with a focus on concrete examples of our experience with the publication of data in New York and other EveryBlock cities. We show that publishing frequently updated, well-structured, complete data sets can be a simple matter.
This is an active piece of legislation. If you live in New York, please contact your council member and express your support for the strongest possible Open Data law.
Here's the testimony we submitted in support of open data standards in New York City:
EveryBlock strongly supports the introduction of a Local Law to create open data standards in New York City
EveryBlock is a neighborhood news site serving 15 cities, including New York. You can see our work at http://nyc.everyblock.com/, where we combine public records from New York City's government with news articles, business reviews, images, and other items collected from across the Web.
In a typical month, we add thousands of crime reports, building permits, restaurant inspections, street closings, business licenses, news articles, and other news for every block in the city. Wherever possible, items are published at the block level, so that people can see what's going on near them. We also offer the public the ability to subscribe to daily updates through e-mail and RSS feeds. We continually look for ways to add more information and to present it more usefully.
In the course of our work in the past two years, we've worked with city leaders in each of the cities we cover: department heads, council members, technology developers, policy makers, and so on. We share some of what we've learned about data sharing in New York and other places below, and we look forward to working with DOITT, the New York City Council Technology Committee, and other stakeholders to fashion an effective local law that benefits all New Yorkers. We really hope that this law, and the data published as a result of it, serves as an example for other municipalities.
Some thoughts on the draft language of the Local Law
We reviewed the draft amendment to title 23 of the administrative code of the city of New York as seen here on the Council Web site. Setting aside any word-parsing, we find this to be a strong move forward in open data law for municipalities. There are three areas we see as especially promising:
The provision that "all public records shall also be made available in their raw or unprocessed form" is especially welcome. Very often, municipalities seek overly expensive, complicated technology projects to present data. Leveraging the efforts of citizen developers working with powerful tools is the way to go. As part of the EveryBlock project, for instance, we recently open-sourced the site's backend code.
"All public records shall be presented and structured in a format that permits automated processing." This is a much less complicated requirement than it seems. See below for examples of structured formats that existing, widely available tools can already produce in New York and other municipalities.
The draft local law states that, "All public records shall be updated as often as necessary to preserve the integrity and usefulness of the records." Often, we see municipalities publish information once and fail to update it on a regular basis.
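To make "a format that permits automated processing" concrete, here is a minimal sketch in Python of what a developer can do once records arrive as plain CSV text. The column names and sample rows are invented for illustration and do not reflect any actual city data set.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: the columns and rows below are made up
# for illustration and are not taken from any real city file.
SAMPLE_CSV = """permit_id,issued,address,job_type
P-001,2010-06-01,100 Example St,Alteration
P-002,2010-06-01,102 Example St,New Building
P-003,2010-06-02,100 Example St,Alteration
"""


def count_by_field(csv_text, field):
    """Count how many records share each distinct value of `field`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[field] for row in reader)


# Tally the sample permits by job type.
counts = count_by_field(SAMPLE_CSV, "job_type")
for job_type, total in sorted(counts.items()):
    print(job_type, total)
```

Because the file is structured, the same few lines work no matter how many rows the city publishes, and nothing has to be scraped from a Web page.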
Examples of data published by the City of New York and displayed on EveryBlock
The City of New York already publishes a significant amount of data to its Web site. Here's a quick review of a number of these data types.
This data is published in Excel spreadsheet format as Job Weekly Statistical Reports by the Department of Buildings. The department updates its data weekly, and we at EveryBlock publish it shortly thereafter.
This data comes from the Sign Monthly Statistical Reports published by the New York City Department of Buildings in Excel spreadsheet format. The data is updated regularly and we at EveryBlock publish it shortly thereafter.
This data is published in Excel spreadsheet format on the Rolling Sales Update section of the city Web site. The New York City Department of Finance maintains the data, and updates it once a month. In general, the Finance Department deserves a lot of credit for the amount of data it publishes in this format. The department also provides RSS feeds to alert users when new data is published.
Examples of data we'd like to see or existing data that fails to meet standards set forth in this local law
This data comes from the precinct reports published by the police department. These reports are incomplete (they include only seven crime types), imprecise (they are reported only at the precinct level), and infrequent (published weekly). On each of these criteria, the data lags far behind what many other cities publish.
On EveryBlock, we publish information that we collect from the NYC*scout page, run by the Mayor's Office of Operations. We obtain this data by scraping the maps in the NYC*scout application. This is a tiny sliver of the service requests completed by the city. It would be much better to have a formal feed of all 311 data, along with details on the final disposition of service requests.
This data comes from this database of completed graffiti cleanup locations and this database of pending graffiti cleanup locations, maintained by the Mayor's Community Affairs Unit. It would be better if this data were published in formal feeds with structured formats. That would be a more sustainable method than scraping Web databases.
This data comes from the online restaurant inspection database published by the Department of Health and Mental Hygiene. It would be better if this data were published in a formal feed with a structured format. That would be a more sustainable method than scraping Web databases.
This data is no longer updated on EveryBlock because the source database, the CityAdmin search tool formerly maintained by the Center for New York City Law, is no longer available. This is a great example of why it's important to have reliable, centralized, well-structured datasets available to the public.
Other technology methods from other cities
Here are some examples of extremely lightweight data sharing at the civic level — next to zero effort with huge utility to developers like EveryBlock.
We worked with the San Francisco Police Department to help them create an XML file that they update for the public daily. The police do this as an export straight out of their CAD system — no development costs, no maintenance.
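As a rough illustration of how little code a daily XML export requires on the consuming side, here is a minimal Python sketch. The element names, attributes, and sample records are hypothetical; they are not the San Francisco Police Department's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical export: the schema and records below are invented
# for illustration and do not mirror any real police feed.
SAMPLE_XML = """<incidents>
  <incident id="1">
    <category>Burglary</category>
    <date>2010-06-01</date>
    <block>100 block of Example St</block>
  </incident>
  <incident id="2">
    <category>Vandalism</category>
    <date>2010-06-02</date>
    <block>200 block of Sample Ave</block>
  </incident>
</incidents>"""


def parse_incidents(xml_text):
    """Turn each <incident> element into a plain dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {
            "id": node.get("id"),
            "category": node.findtext("category"),
            "date": node.findtext("date"),
            "block": node.findtext("block"),
        }
        for node in root.findall("incident")
    ]


incidents = parse_incidents(SAMPLE_XML)
for incident in incidents:
    print(incident["date"], incident["category"], incident["block"])
```

Each record comes out as a dictionary keyed by field name, ready to be geocoded or displayed at the block level.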
Database dumps: San Francisco restaurant inspections
The City and County of San Francisco exports its database as an .mdb file, in its native format, and publishes it to an FTP server that EveryBlock and other approved entities can access.
Text files: San Jose building permits
We see this ordinance as a welcome next step in New York City government. It affirms the importance of data sharing, provides clear instruction on what should be published, and begins to set technology standards for format and structure.