Open Data

Definition

Under § 3(11) of Act No. 106/1999 Sb., on Free Access to Information (zákon o svobodném přístupu k informacím), open data is:

Information published in a way that allows remote access, in an open and machine-readable format, whose manner and purpose of subsequent use is not restricted, and which is registered in the national open data catalogue.

Requirements

The following requirements must be met:

  • Available in a convenient, machine-readable open format
  • Access is not restricted or made conditional (e.g., on registration), and reuse is permitted
  • Registered in the national open data catalogue with direct links to the data
  • Curated, with a curator contact provided for feedback and requests
  • Documented
  • Kept up to date
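
The requirements above can be turned into a rough self-check. The sketch below is only an illustration: the field names (`format`, `requires_registration`, `license`, `catalogue_url`, `download_url`, `curator_contact`, `documentation_url`, `last_updated`) and the set of open formats are assumptions made for this example and do not follow any particular catalogue schema.

```python
# Field names below are hypothetical; a real catalogue defines its own schema.
OPEN_FORMATS = {"csv", "json", "xml", "rdf"}  # illustrative, not exhaustive


def unmet_requirements(record: dict) -> list[str]:
    """Return labels of the requirements that the record does not satisfy."""
    problems = []
    if str(record.get("format", "")).lower() not in OPEN_FORMATS:
        problems.append("open machine-readable format")
    # Treat missing information as a failure: access must be explicitly open.
    if record.get("requires_registration", True) or not record.get("license"):
        problems.append("unrestricted access and reuse")
    if not (record.get("catalogue_url") and record.get("download_url")):
        problems.append("catalogue entry with direct data link")
    if not record.get("curator_contact"):
        problems.append("curator contact")
    if not record.get("documentation_url"):
        problems.append("documentation")
    # Only checks that an update date is declared, not that it is recent.
    if not record.get("last_updated"):
        problems.append("declared update date")
    return problems
```

A record that is complete except for `format` set to `pdf` would come back with a single entry, the format requirement. Whether the data is genuinely current still needs a human look.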

What is not Open Data?

  • A web service with a limited number of requests
  • Data in PDF format
  • Data in XLS formatted for printing
  • Data in pseudo-CSV (e.g., TSV), i.e., a non-standard format
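
To make the pseudo-CSV point concrete, here is a minimal heuristic sketch using Python's standard `csv` module. The function name `looks_like_plain_csv` is made up for this example, and the check is deliberately crude: it would also reject a legitimate single-column CSV file.

```python
import csv
import io


def looks_like_plain_csv(text: str) -> bool:
    """Heuristic: comma-delimited, non-empty, same column count in every row."""
    # The default dialect splits on commas and honours double-quoted fields.
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return False
    width = len(rows[0])
    # A single column usually means another delimiter (tab, semicolon) was used.
    return width > 1 and all(len(row) == width for row in rows)
```

Tab-separated or semicolon-separated input parses as one wide column under the comma dialect, so it fails the check, as does a file whose rows have differing numbers of fields.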

5★ Open Data

💡 Tip: Read more on https://5stardata.info/cs/

★ make your stuff available on the Web (whatever format) under an open license ⚠

★★ make it available as structured data (e.g., Excel instead of image scan of a table) ⚠

★★★ make it available in a non-proprietary open format (e.g., CSV instead of Excel)

★★★★ use URIs to denote things, so that people can point at your stuff

★★★★★ link your data to other data to provide context


⚠ Note that data can be considered open-ish only from ★★★ upward.
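
The star levels are cumulative: each one assumes all the lower ones. A minimal sketch of that rule follows; the `Dataset` class and its field names are invented for illustration and are not part of the 5★ scheme itself.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical flags, one per star level, lowest level first.
    on_web_open_license: bool
    structured: bool
    open_format: bool
    uses_uris: bool
    linked: bool


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: a level counts only if all lower levels hold."""
    levels = (
        ds.on_web_open_license,
        ds.structured,
        ds.open_format,
        ds.uses_uris,
        ds.linked,
    )
    stars = 0
    for ok in levels:
        if not ok:
            break
        stars += 1
    return stars


def is_openish(ds: Dataset) -> bool:
    """Apply the note above: open-ish starts at three stars."""
    return star_rating(ds) >= 3
```

A dataset published as Excel under an open license, with URIs but no open format, still stops at two stars, because the third level is not met.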

What are the costs & benefits of ★ Web data?

As a consumer …

  • ✔ You can look at it.
  • ✔ You can print it.
  • ✔ You can store it locally (on your hard drive or on a USB stick).
  • ✔ You can enter the data into any other system.
  • ✔ You can change the data as you wish.
  • ✔ You can share the data with anyone you like.

As a publisher …

  • ✔ It’s simple to publish.
  • ✔ You do not have to explain repeatedly to others that they can use your data.

What are the costs & benefits of ★★ Web data?

As a consumer …

You can do everything you can do with ★ Web data, and additionally:

  • ✔ You can directly process it with proprietary software to aggregate it, perform calculations, visualise it, etc.
  • ✔ You can export it into another (structured) format.

As a publisher …

  • ✔ It’s still simple to publish.

What are the costs & benefits of ★★★ Web data?

As a consumer …

You can do everything you can do with ★★ Web data, and additionally:

  • ✔ You can manipulate the data in any way you like, without the need to own any proprietary software package.

As a publisher …

  • ✔ It’s still rather simple to publish.
  • ⚠ You might need converters or plug-ins to export the data from the proprietary format.

What are the costs & benefits of ★★★★ Web data?

As a consumer …

You can do everything you can do with ★★★ Web data, and additionally:

  • ✔ You can link to it from any other place (on the Web or locally).
  • ✔ You can bookmark it.
  • ✔ You can reuse parts of the data.
  • ✔ You may be able to reuse existing tools and libraries, even if they only understand parts of the pattern the publisher used.
  • ✔ You can combine the data safely with other data. URIs are a global scheme, so if two things have the same URI then it’s intentional, and if so that’s well on its way to being 5-star data!
  • ⚠ Understanding the structure of an RDF “Graph” of data can be more effort than tabular (Excel/CSV) or tree (XML/JSON) data.

As a publisher …

  • ✔ You have fine-grained control over the data items and can optimise their access (load balancing, caching, etc.)
  • ✔ Other data publishers can now link into your data, promoting it to 5 star!
  • ⚠ You typically invest some time slicing and dicing your data.
  • ⚠ You’ll need to assign URIs to data items and think about how to represent the data.
  • ⚠ You need to either find existing patterns to reuse or create your own.
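
Assigning URIs to data items can start very simply. The sketch below builds one URI per item from a dataset name and an item identifier; the base address `https://data.example.org` is a placeholder and the `mint_uri` helper is hypothetical, so treat the path layout as one possible pattern rather than a recommendation.

```python
from urllib.parse import quote

BASE = "https://data.example.org"  # placeholder domain, not a real catalogue


def mint_uri(dataset: str, item_id: str) -> str:
    """Build a percent-encoded URI for one data item."""
    # safe="" also encodes "/" so an identifier cannot add path segments.
    return f"{BASE}/{quote(dataset, safe='')}/{quote(item_id, safe='')}"
```

With this layout, `mint_uri("bus stops", "42/A")` gives `https://data.example.org/bus%20stops/42%2FA`. The important property is stability: once published, the same item should keep the same URI.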

What are the costs & benefits of ★★★★★ Web data?

As a consumer …

You can do everything you can do with ★★★★ Web data, and additionally:

  • ✔ You can discover more (related) data while consuming the data.
  • ✔ You can directly learn about the data schema.
  • ⚠ You now have to deal with broken data links, just like 404 errors in web pages.
  • ⚠ Presenting data from an arbitrary link as fact is as risky as letting people include content from any website in your pages. Caution, trust and common sense are all still necessary.

As a publisher …

  • ✔ You make your data discoverable.
  • ✔ You increase the value of your data.
  • ✔ Your own organisation will gain the same benefits from the links as the consumers.
  • ⚠ You’ll need to invest resources to link your data to other data on the Web.
  • ⚠ You may need to repair broken or incorrect links.
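
Linking to other data means stating, in a machine-readable way, that your item relates to someone else's. One common way to write such a statement is a line of N-Triples using the `owl:sameAs` predicate. The sketch below only formats the line; the `link_triple` helper is hypothetical, the URIs in the usage note are placeholders, and no validation or escaping of the URIs is attempted.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def link_triple(subject_uri: str, target_uri: str, predicate: str = OWL_SAME_AS) -> str:
    """Serialise one link as an N-Triples line: <subject> <predicate> <object> ."""
    return f"<{subject_uri}> <{predicate}> <{target_uri}> ."
```

For example, `link_triple("https://data.example.org/cities/brno", "https://other.example.com/place/brno")` produces one line stating that both URIs denote the same thing. Checking that the target still resolves is a separate, recurring task.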

What is the point of it?

Lifecycle

Transparency

Infrastructure

Normalization

Economic aspect

Open Data is a convenient way to provide information. Creating PDF files, web sites or applications incurs additional costs that usually do not arise with Open Data.

In principle, making data publicly available is merely a matter of posting it in its raw form (of course, remember the 5★).

And many, many more …

Call to arms

To be:

  • a curious consumer
  • a smart architect
  • a responsible and aware publisher
  • a forthcoming maintainer


I no longer think that:

  • open data is a buzzword I don’t really understand
  • open data is any data published under an open license
  • I am satisfied with data being only in PDF
  • I can use a whole sentence as a column name and not piss anyone off

FAQ

Thank You!

Resources