Search & Discovery

Jun 21, 2021

In March we participated in a Climate Hackathon, building a solution to query full-text documents in a global climate legislation database and surface insights for researchers. At first the project seemed like a detour from our storage efforts, but the material motivated us, both the tech and the content. Indexing and analyzing vast troves of documents is a challenge full of potential. Plus, these docs are government climate policies, legislation that traces evolving international thinking.

In the hackathon we teamed up with two skilled hackers, Lasse Andersen and Laurence Watson, and despite infrastructure glitches things fell into place conceptually. We collected the documents from disparate sources, indexed them, and represented them logically as a Directed Acyclic Graph (DAG), with documents linked either chronologically or by other contextual relationships, though at this stage it was only a demo. We named the product PoliGrok because it helps users peruse the reams of legalese.
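
To make the DAG idea concrete, here is a minimal sketch in Python. It is not the hackathon code: the document IDs, titles, dates, and the `builds_on` relationship are all made-up placeholders, and the only assumption is that each document points back to earlier documents it relates to.

```python
from datetime import date
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical documents: id -> (title, publication date). Placeholder data only.
docs = {
    "A": ("Framework law", date(2008, 1, 1)),
    "B": ("Amendment", date(2015, 6, 1)),
    "C": ("Implementation plan", date(2019, 3, 1)),
}

# Hypothetical edges: each document maps to the earlier documents it builds on.
builds_on = {"A": set(), "B": {"A"}, "C": {"A", "B"}}


def check_time_order(docs, builds_on):
    """Raise ValueError if a document builds on one that is not strictly older."""
    for doc_id, parents in builds_on.items():
        for parent in parents:
            if docs[parent][1] >= docs[doc_id][1]:
                raise ValueError(f"{doc_id} builds on {parent}, which is not older")


def reading_order(builds_on):
    """Return document ids so that every document follows the ones it builds on."""
    # TopologicalSorter raises graphlib.CycleError if the graph is not acyclic.
    return list(TopologicalSorter(builds_on).static_order())


check_time_order(docs, builds_on)
order = reading_order(builds_on)
```

Requiring every edge to point from a newer document to a strictly older one is one simple way to guarantee the graph stays acyclic; other contextual relationships would need their own check.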

In the course of hacking it became apparent that the solution we had built together could also be used for documents stored on IPFS. This serendipitously supplied a piece of our puzzle: how to index the Paris Climate Accord docs if they’re stored in a Web3 environment.

In previous hackathons, like HackFS and Apollo, we focused on our ClimateDataPool storage product: how to usher documents from civil servants in the signatory countries to decentralized storage, and then make the docs available (i.e. discoverable) through a web frontend and API. Searching the document content was a follow-on step that depended on solving the storage question.

For the IPFS search implementation we’ll have to modify the links within query results. Typically a document management system resides on a server and the docs are stored in some sort of bucket on that server (e.g. the current United Nations system at unfccc.int). The search engine indexes all the docs in the bucket. Then, when a user receives a search result and wants to view the original, the result links to the doc in the bucket.

We would have to modify the link so that it points to an IPFS resource, the Web3 location of the stored document. This means the index will reside on our web server (and could be mirrored elsewhere), while the documents themselves will be stored in a decentralized manner on IPFS.
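
As a rough illustration of that link change, the sketch below builds a search result whose URL points at an IPFS gateway instead of a server bucket. Everything named here is an assumption for illustration: the `IndexedDoc` fields, the choice of `https://ipfs.io` as gateway, and the idea that the index stores each document's content identifier (CID) at indexing time. The CID shown in the usage note is a placeholder, not a real one.

```python
from dataclasses import dataclass

# Assumed gateway; any public or self-hosted IPFS HTTP gateway could be used.
GATEWAY = "https://ipfs.io"


@dataclass
class IndexedDoc:
    """Hypothetical index entry: the search index keeps the CID, not a bucket path."""
    title: str
    snippet: str
    cid: str  # IPFS content identifier recorded when the document was stored


def ipfs_link(cid: str, gateway: str = GATEWAY) -> str:
    """Build a path-style gateway URL of the form <gateway>/ipfs/<cid>."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def render_result(doc: IndexedDoc) -> dict:
    """Shape one search hit so its link resolves to the document on IPFS."""
    return {
        "title": doc.title,
        "snippet": doc.snippet,
        "url": ipfs_link(doc.cid),        # works in any browser via the gateway
        "ipfs_uri": f"ipfs://{doc.cid}",  # native form for IPFS-aware clients
    }
```

For example, `render_result(IndexedDoc("Sample policy", "…", "bafyEXAMPLECID"))` would yield a hit whose `url` is `https://ipfs.io/ipfs/bafyEXAMPLECID`. Because the link is derived from the CID, the index can be mirrored on other servers without changing where the documents live.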

Last summer, at the start of the HackFS hackathon, we wanted to give our storage project its own identity, so we coined the name ClimateDataPool. But we also realized that Chaîne ‘the org’ could have multiple products. With PoliGrok, and with thanks to our collaborators, that’s now the case. We’re expanding our horizons and discovering new methods & tech to foster climate-conscious societies. ✅

Chaîne Research
