End of GSoC journey


Hi everyone, this is Vishav, and I am here with the final installment of my blog series, “Journey to GSoC”. If you haven’t read my previous post, you can read it here to catch up.

The last week of the three-month Google Summer of Code period is here. It’s been a great learning experience and a fantastic journey. Apart from the technical learning, this project also introduced me to an amazing community. This post describes my whole journey through the project in one place. A single post cannot capture everything I learned and experienced, but I am doing my best to pour it all in here.

Improve Synced Data Structures (#336)

This PR marked the beginning of my GSoC project and implemented the basic synced data structures. Previously, the JSON backend was implemented using the classes SyncedAttrDict, SyncedList, and JSONDict. These classes provided limited functionality, such as a single backend and limited support for nested structures. To support different backends and different data structures, I refactored these classes. In this PR, I added the following classes:

  • SyncedCollection: This class is intended for use as an abstract base class. It declares the abstract methods that every subclass must implement to match the API.
  • SyncedAttrDict: Implements the dict data structure of the API.
  • SyncedList: Implements the list data structure of the API.
  • JSONCollection: Implements the synchronization functions for the JSON backend.
  • JSONDict: Implements the dict data structure with the JSON backend.
  • JSONList: Implements the list data structure with the JSON backend.
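The layering described in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the signac implementation: the class names carry a Sketch suffix, and the _load and _save method names are hypothetical stand-ins for whatever abstract methods the real base class declares.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from collections.abc import MutableMapping


class SyncedCollectionSketch(ABC):
    """Hypothetical base class: subclasses supply the backend I/O."""

    @abstractmethod
    def _load(self):
        """Return the data currently stored in the backend."""

    @abstractmethod
    def _save(self):
        """Write the in-memory data to the backend."""


class SyncedDictSketch(SyncedCollectionSketch, MutableMapping):
    """Dict-like behaviour: load before every read, save after every write."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        self._data = self._load()
        return self._data[key]

    def __setitem__(self, key, value):
        self._data = self._load()
        self._data[key] = value
        self._save()

    def __delitem__(self, key):
        self._data = self._load()
        del self._data[key]
        self._save()

    def __iter__(self):
        self._data = self._load()
        return iter(self._data)

    def __len__(self):
        self._data = self._load()
        return len(self._data)


class JSONBackendSketch(SyncedCollectionSketch):
    """Backend layer: stores the data in a JSON file (assumed layout)."""

    def _load(self):
        if not os.path.exists(self._filename):
            return {}
        with open(self._filename) as f:
            return json.load(f)

    def _save(self):
        with open(self._filename, "w") as f:
            json.dump(self._data, f)


class JSONDictSketch(JSONBackendSketch, SyncedDictSketch):
    """Combines the dict data structure with the JSON backend."""

    def __init__(self, filename):
        super().__init__()
        self._filename = filename


def demo():
    """Write through one object and read back through another."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.json")
        JSONDictSketch(path)["a"] = 1
        return dict(JSONDictSketch(path))
```

The point of the split is that the data-structure class and the backend class do not know about each other, so a new backend only needs to provide the two I/O methods.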

Drop Support for Python 3.5 (#340)

This PR drops support for Python 3.5. This was necessary because the implementation uses collections.abc.Collection, which was introduced in Python 3.6.
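For readers unfamiliar with it, collections.abc.Collection is the abstract base class for sized, iterable containers. The short sketch below shows how an isinstance check against it behaves. The Bag class and the is_collection helper are hypothetical, and how the project actually uses Collection is not covered in this post.

```python
from collections.abc import Collection


class Bag:
    """Hypothetical container defining the three methods Collection requires."""

    def __init__(self, items):
        self._items = list(items)

    def __contains__(self, item):
        return item in self._items

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)


def is_collection(obj):
    """True for any object providing __contains__, __iter__ and __len__."""
    return isinstance(obj, Collection)
```

Bag never inherits from Collection, yet the check succeeds because the ABC recognizes any class that defines all three methods. A plain iterator, which has no length, does not pass.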

Added backends to SyncedCollection (#364)

This PR adds ZarrCollection, RedisCollection, and MongoDBCollection, which implement the Zarr, Redis, and MongoDB backends, respectively, for the synced data structures. Each backend also provides dict and list implementations, similar to the JSON backend.

Added buffering and caching to SyncedCollection (#363)

In buffering, we suspend synchronization with the backend and temporarily store the data in a buffer. All write operations go to the buffer, and read operations are served from the buffer whenever possible. In caching, we keep a copy of the data in memory so that subsequent read operations fetch the data from memory instead of the underlying backend. Both improve performance because the data is fetched from memory.
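The two ideas above can be illustrated with a small sketch. Everything here is an assumption made for illustration: CountingBackend stands in for a slow backend, and BufferedCachedStore with its get, set and flush methods is a hypothetical interface, not the API added in the PR.

```python
class CountingBackend:
    """Hypothetical stand-in for a slow backend; counts reads and writes."""

    def __init__(self):
        self.store = {}
        self.reads = 0
        self.writes = 0

    def read(self, key):
        self.reads += 1
        return self.store.get(key)

    def write(self, key, value):
        self.writes += 1
        self.store[key] = value


class BufferedCachedStore:
    """Caches values in memory; defers writes while used as a context manager."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}    # copies of values already read or written
        self._buffer = {}   # pending writes while buffering is active
        self._buffering = False

    def __enter__(self):
        self._buffering = True
        return self

    def __exit__(self, exc_type, exc, tb):
        self._buffering = False
        self.flush()
        return False

    def flush(self):
        """Write every pending value to the backend, then empty the buffer."""
        for key, value in self._buffer.items():
            self._backend.write(key, value)
        self._buffer.clear()

    def set(self, key, value):
        self._cache[key] = value
        if self._buffering:
            self._buffer[key] = value   # synchronization is suspended
        else:
            self._backend.write(key, value)

    def get(self, key):
        if key in self._buffer:         # read from the buffer when possible
            return self._buffer[key]
        if key not in self._cache:      # only the first read hits the backend
            self._cache[key] = self._backend.read(key)
        return self._cache[key]
```

Inside a with block, repeated writes to the same key reach the backend once, on exit. The trade-off in this sketch is that the cache can go stale if something else modifies the backend, so a real implementation needs some way to detect or invalidate outdated entries.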

Added hypothesis based test to SyncedCollection (#373)

I worked on adding hypothesis-based testing to SyncedCollection. There were a lot of problems using hypothesis in combination with pytest fixtures, so we decided to close this PR and approach the problem at a later date.

Added validation layer to SyncedCollection (#378)

This PR adds a validation layer to SyncedCollection: a validator (or list of validators) is applied to inputs. Previously, we only had a function that validated the keys of SyncedAttrDict. This PR generalizes that behaviour to validate all input data, not just that of SyncedAttrDict.
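A validator in this sense can be thought of as a callable that inspects incoming data and raises an error if it is unacceptable. The sketch below is an assumption about the general shape only: the two rules (string keys, no dots in keys) and the names require_str_keys, no_dots_in_keys and ValidatedDict are hypothetical examples, not the validators from the PR.

```python
from collections.abc import Mapping, Sequence


def _walk_keys(data):
    """Yield every mapping key found anywhere in nested dicts and lists."""
    if isinstance(data, Mapping):
        for key, value in data.items():
            yield key
            yield from _walk_keys(value)
    elif isinstance(data, Sequence) and not isinstance(data, (str, bytes)):
        for item in data:
            yield from _walk_keys(item)


def require_str_keys(data):
    """Hypothetical rule: every mapping key must be a string."""
    for key in _walk_keys(data):
        if not isinstance(key, str):
            raise TypeError(f"Keys must be str, got {type(key).__name__}")


def no_dots_in_keys(data):
    """Hypothetical rule: string keys must not contain a dot."""
    for key in _walk_keys(data):
        if isinstance(key, str) and "." in key:
            raise ValueError(f"Key {key!r} must not contain '.'")


class ValidatedDict(dict):
    """Runs every validator on new input before storing it."""

    _validators = (require_str_keys, no_dots_in_keys)

    def __setitem__(self, key, value):
        for validator in self._validators:
            validator({key: value})
        super().__setitem__(key, value)
```

Because the validators walk nested dicts and lists, the same functions can guard a list-like structure as well as a dict-like one. Note that this sketch only guards item assignment; the dict constructor and update() bypass it.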

What’s left to do

  • Lazy statepoint loading: This changes the behavior of Job to load its statepoint lazily when it is opened by id.
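The idea behind lazy loading can be sketched as below. This is only an illustration of the pattern, assuming a hypothetical LazyJobSketch class and a load_statepoint callable that maps an id to a dict; it does not reflect how Job is actually structured.

```python
class LazyJobSketch:
    """Hypothetical job opened by id; the statepoint is read on first access."""

    def __init__(self, job_id, load_statepoint):
        self.id = job_id
        self._load_statepoint = load_statepoint  # assumed callable: id -> dict
        self._statepoint = None

    @property
    def statepoint(self):
        # Defer the (potentially slow) load until someone asks for the data.
        if self._statepoint is None:
            self._statepoint = self._load_statepoint(self.id)
        return self._statepoint
```

Opening many jobs by id then costs nothing up front, and only the jobs whose statepoint is actually inspected pay for the load.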

I believe this GSoC journey will shape my career path and point my attitude in the right direction. Over the summer I learned a lot and have come out a better developer and a better person as a whole. I’d like to extend my heartfelt thanks to my mentors and the whole signac community for their constant help and support throughout the summer.
