Buffering


Hi everyone, this is Vishav, and I am here with the third post in my blog series, “Journey to GSoC.” If you haven’t read my previous post, you can read it here to catch up.

The first phase of Google Summer of Code has been completed, and I have finished the JSON backend (JSONCollection, JSONDict, JSONList). You can see the work in this pull request.

The goal for the second phase of the summer is to add features for buffering and caching.

In the first week of the second phase, I worked on finalizing the JSON backend PR from the first phase. With the help of my mentors, I made some touch-ups and addressed a few minor issues. Some of the issues and discussions we addressed were:

  • When I tried to split the classes (SyncedCollection, SyncedDict, etc.) into separate files, I got a circular import error, because SyncedCollection.from_base creates instances of JSONDict and JSONList. I discussed this problem with my mentors, and they suggested using a metaclass for SyncedCollection that registers every child class.
  • There were some problems with the tests, so I updated them to use pytest fixtures and decided to use Hypothesis for property-based testing.
  • The code worked, but there were many discrepancies in the docstrings. With the help of my mentors, I resolved them so that the changes are understandable even for new users.
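The registry idea from the first bullet can be sketched in plain Python. This is a minimal illustration, not the code from the pull request: the metaclass name SyncedCollectionMeta, the backend class attribute, and the is_base_type helper are hypothetical. Every subclass that declares a backend is recorded when its class statement runs, so from_base can find JSONDict and JSONList through the registry instead of importing them, which is what breaks the import cycle.

```python
from collections import defaultdict


class SyncedCollectionMeta(type):
    """Hypothetical metaclass that registers subclasses by backend name."""

    registry = defaultdict(list)

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        backend = namespace.get("backend")
        if backend is not None:
            # Only classes that declare a backend in their body are registered.
            SyncedCollectionMeta.registry[backend].append(cls)


class SyncedCollection(metaclass=SyncedCollectionMeta):
    @classmethod
    def is_base_type(cls, data):
        raise NotImplementedError

    @classmethod
    def from_base(cls, data, backend):
        # Resolve the concrete class through the registry at call time,
        # so this base module never has to import JSONDict or JSONList.
        for klass in SyncedCollectionMeta.registry[backend]:
            if klass.is_base_type(data):
                return klass(data)
        return data  # Scalars such as ints and strings are returned unchanged.


# In a real package these classes would live in their own module that
# imports SyncedCollection; the base module does not import them back.
class JSONDict(SyncedCollection):
    backend = "json"

    def __init__(self, data):
        self._data = {k: SyncedCollection.from_base(v, "json") for k, v in data.items()}

    @classmethod
    def is_base_type(cls, data):
        return isinstance(data, dict)


class JSONList(SyncedCollection):
    backend = "json"

    def __init__(self, data):
        self._data = [SyncedCollection.from_base(v, "json") for v in data]

    @classmethod
    def is_base_type(cls, data):
        return isinstance(data, list)


obj = SyncedCollection.from_base({"a": [1, {"b": 2}]}, "json")
print(type(obj).__name__, type(obj._data["a"]).__name__)  # JSONDict JSONList
```

Python's __init_subclass__ hook is a lighter alternative to a metaclass for the same registration pattern.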

With all of the above resolved, I finally started working on buffering.

Buffering

In buffering mode, synchronization with the backend is suspended and the data is temporarily stored in a buffer. All write operations go to the buffer, and read operations are served from the buffer whenever possible. When we exit buffering mode, all the buffered data is written to the backend. Buffering improves performance because reads and writes happen in memory instead of going to the backend (for example, a JSON file on disk) on every operation.
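To make the performance argument concrete, here is a small sketch that counts how many times the backend is written. CountingBackend is a hypothetical stand-in for a JSON file, and BufferableDict with its start_buffering/stop_buffering methods are illustrative names, not signac functions. Without buffering, every assignment costs a full load and save; with buffering, the writes land in memory and the backend is saved once.

```python
class CountingBackend:
    """Hypothetical stand-in for a file: stores a snapshot, counts saves."""

    def __init__(self):
        self.data = {}
        self.writes = 0

    def load(self):
        return dict(self.data)

    def save(self, data):
        self.writes += 1  # Each save represents one expensive disk write.
        self.data = dict(data)


class BufferableDict:
    def __init__(self, backend):
        self._backend = backend
        self._buffer = None  # None means "not buffering".

    def __getitem__(self, key):
        if self._buffer is not None:
            return self._buffer[key]  # Read from memory while buffering.
        return self._backend.load()[key]

    def __setitem__(self, key, value):
        if self._buffer is not None:
            self._buffer[key] = value  # Write to memory only.
        else:
            data = self._backend.load()
            data[key] = value
            self._backend.save(data)  # Write through on every assignment.

    def start_buffering(self):
        self._buffer = self._backend.load()

    def stop_buffering(self):
        self._backend.save(self._buffer)  # One write for all buffered changes.
        self._buffer = None


plain = BufferableDict(CountingBackend())
for i in range(100):
    plain[str(i)] = i

fast = BufferableDict(CountingBackend())
fast.start_buffering()
for i in range(100):
    fast[str(i)] = i
fast.stop_buffering()

print(plain._backend.writes, fast._backend.writes)  # 100 1
```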

API for Buffering

Buffering will be provided by signac.buffered and SyncedCollection.buffered. Both are context managers that enable buffering mode.

signac.buffered enables a global buffering mode in which all instances of synced data structures, such as JSONDict and JSONList, are buffered. All write operations are deferred until the flush_all function is called, the buffer overflows, or buffering mode is exited.

This is a typical example of signac.buffered:

import signac

jsd = signac.JSONDict('test.json')
with signac.buffered():
    jsd.a = 'buffered'
assert jsd.a == 'buffered'
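The three flush triggers of the global mode (an explicit flush_all call, a buffer overflow, and leaving the outermost buffered block) can be sketched with module-level state. This is an illustration under assumptions, not signac's implementation: BUFFER_CAPACITY, the pending-write counter, SyncedDictSketch, and its storage attribute standing in for the JSON file are all hypothetical, and the sketch uses item access instead of the attribute access in the example above.

```python
from contextlib import contextmanager

BUFFER_CAPACITY = 3  # Hypothetical: pending writes allowed before overflow.
_depth = 0           # Nesting level of the global buffered() context.
_pending = {}        # id(obj) -> obj for every instance with unflushed writes.
_pending_writes = 0


class SyncedDictSketch:
    """Hypothetical synced dict; `storage` stands in for the backend file."""

    def __init__(self):
        self.storage = {}  # What the "file" currently contains.
        self._data = {}    # In-memory state seen by reads and writes.

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        global _pending_writes
        self._data[key] = value
        if _depth == 0:
            self._sync()  # Not buffering: write through immediately.
            return
        _pending[id(self)] = self
        _pending_writes += 1
        if _pending_writes > BUFFER_CAPACITY:
            flush_all()  # Buffer overflowed: write everything out early.

    def _sync(self):
        self.storage = dict(self._data)


def flush_all():
    """Write every buffered instance to its backend and empty the buffer."""
    global _pending_writes
    for obj in _pending.values():
        obj._sync()
    _pending.clear()
    _pending_writes = 0


@contextmanager
def buffered():
    global _depth
    _depth += 1
    try:
        yield
    finally:
        _depth -= 1
        if _depth == 0:
            flush_all()  # Leaving the outermost buffered block flushes.
```

The nesting counter means an inner buffered() block does not flush early; only the outermost exit does.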

SyncedCollection.buffered is a context manager provided by each individual instance of a synced data structure. All write operations are deferred until the flush function is called or buffering mode is exited.

This is a typical example of SyncedCollection.buffered:

import signac

jsd = signac.JSONDict('test.json')
with jsd.buffered() as b:
    b.a = 'buffered'
assert jsd.a == 'buffered'
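A per-instance version needs only the state of a single object. The sketch below is hypothetical: JSONDictSketch is not signac's JSONDict, and it uses item access rather than attribute access. Its buffered() yields the instance itself, mirroring the `as b` form in the example, and flush() writes pending changes to disk without leaving buffering mode. Nested buffered() calls on the same instance are not handled.

```python
import json
import os
import tempfile
from contextlib import contextmanager


class JSONDictSketch:
    """Hypothetical JSON-file-backed dict with per-instance buffering."""

    def __init__(self, filename):
        self._filename = filename
        self._buffer = None  # Holds the data while this instance is buffered.

    def _load(self):
        if not os.path.exists(self._filename):
            return {}
        with open(self._filename) as f:
            return json.load(f)

    def _save(self, data):
        with open(self._filename, "w") as f:
            json.dump(data, f)

    def __getitem__(self, key):
        data = self._buffer if self._buffer is not None else self._load()
        return data[key]

    def __setitem__(self, key, value):
        if self._buffer is not None:
            self._buffer[key] = value  # Deferred: memory only.
            return
        data = self._load()
        data[key] = value
        self._save(data)  # Not buffering: write through immediately.

    def flush(self):
        """Write pending changes to the file without leaving buffered mode."""
        if self._buffer is not None:
            self._save(self._buffer)

    @contextmanager
    def buffered(self):
        self._buffer = self._load()
        try:
            yield self
        finally:
            self.flush()  # Exiting buffered mode writes the buffer to disk.
            self._buffer = None


path = os.path.join(tempfile.mkdtemp(), "test.json")
jsd = JSONDictSketch(path)
with jsd.buffered() as b:
    b["a"] = "buffered"  # Not on disk yet.
print(jsd["a"])  # buffered
```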
