Buffering
Hi everyone, this is Vishav and I am here with the 3rd iteration of my blog, “Journey to GSoC.” If you haven’t read my previous blog, you can read it here and keep up.
The first phase of Google Summer of Code has been completed.
I am done with the JSON backend (JSONCollection, JSONDict, JSONList).
You can see the work in this pull request.
The goal for the second phase of the summer is to add features for buffering and caching.
In the first week of the second phase I worked on finalizing the JSON backend PR from the first phase. Some touch-ups and minor issues were addressed with the help of the mentors. Some of the issues and discussions that came up:
- When I tried to split the classes (SyncedCollection, SyncedDict, etc.) into different files, I got a circular import error, because I create instances of JSONDict and JSONList in SyncedCollection.from_base. I discussed this problem with my mentors and they suggested using a metaclass for SyncedCollection and registering every child class with it.
- There were some problems with the tests. I updated them using pytest fixtures and decided to use hypothesis.
- The code was fine, but there were many discrepancies in the docstrings. With the help of the mentors I resolved them so that the changes are understandable even for new users.
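To make the metaclass suggestion from the first point more concrete, here is a minimal sketch of the idea. It is not the actual signac code: the metaclass name, the registry list, and the base_type attribute are assumptions made for illustration, and the classes are simplified stand-ins. The point is that from_base looks up child classes in a registry, so the parent module never has to import its children.

```python
class SyncedCollectionMeta(type):
    """Hypothetical metaclass that records every concrete child class."""

    registry = []

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        # Register only classes that declare which built-in type they wrap.
        if namespace.get("base_type") is not None:
            SyncedCollectionMeta.registry.append(cls)


class SyncedCollection(metaclass=SyncedCollectionMeta):
    base_type = None  # the abstract parent itself is not registered

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_base(cls, data):
        # Consult the registry instead of importing JSONDict / JSONList here.
        for child in SyncedCollectionMeta.registry:
            if isinstance(data, child.base_type):
                return child(data)
        return data  # anything else is returned unchanged


# In a real package these could live in separate files that import
# SyncedCollection; the parent never needs to import them back.
class JSONDict(SyncedCollection):
    base_type = dict


class JSONList(SyncedCollection):
    base_type = list
```

Because registration happens when each child class is defined, the dependency only points one way (child imports parent), which is what removes the circular import.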
With all the above resolved, I finally started working on the buffering.
Buffering
In buffering, we suspend the synchronization with the backend and the data is temporarily stored in a buffer. All write operations go to the buffer, and read operations are served from the buffer whenever possible. When we exit the buffering mode, all the buffered data is written to the backend. Buffering provides better performance because reads and writes happen in memory instead of touching the backend on every operation.
API for Buffering
The buffering will be provided by signac.buffered and SyncedCollection.buffered.
These methods provide a context manager for buffering mode.
The signac.buffered context manager is a global buffered mode where all the instances of synced data structures such as JSONDict and JSONList are buffered.
All write operations are deferred until the flush_all function is called, the buffer overflows, or the buffered mode is exited.
This is a typical example of signac.buffered:
import signac

jsd = signac.JSONDict('test.json')
with signac.buffered():
    jsd.a = 'buffered'  # the write is deferred while in buffered mode
assert jsd.a == 'buffered'
The SyncedCollection.buffered context manager is provided by individual instances of synced data structures.
All write operations are deferred until the flush function is called or the buffered mode is exited.
This is a typical example of SyncedCollection.buffered:
jsd = signac.JSONDict('test.json')
with jsd.buffered() as b:
    b.a = 'buffered'
assert jsd.a == 'buffered'
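For the instance-level mode, a rough sketch of how such a context manager could work is shown below. This is a toy class written for this post, not the signac implementation: ToyJSONDict, its _load and _save helpers, and the item-style access are all assumptions. It reads the file once on entry, keeps every change in memory, and writes once on exit.

```python
import json
import os
import tempfile
from contextlib import contextmanager


class ToyJSONDict:
    """Toy dict synced to a JSON file (hypothetical, for illustration)."""

    def __init__(self, filename):
        self.filename = filename
        self._buffer = None  # None means "not in buffered mode"

    def _load(self):
        if not os.path.exists(self.filename):
            return {}
        with open(self.filename) as f:
            return json.load(f)

    def _save(self, data):
        with open(self.filename, "w") as f:
            json.dump(data, f)

    def __getitem__(self, key):
        # Read from the buffer while buffering, otherwise from the file.
        data = self._buffer if self._buffer is not None else self._load()
        return data[key]

    def __setitem__(self, key, value):
        if self._buffer is not None:
            self._buffer[key] = value  # in memory only, no file access
        else:
            data = self._load()
            data[key] = value
            self._save(data)

    def flush(self):
        """Write the buffered data to the file without leaving the mode."""
        if self._buffer is not None:
            self._save(self._buffer)

    @contextmanager
    def buffered(self):
        self._buffer = self._load()  # a single read on entry
        try:
            yield self
        finally:
            self.flush()  # a single write on exit
            self._buffer = None


path = os.path.join(tempfile.mkdtemp(), "test.json")
jsd = ToyJSONDict(path)
with jsd.buffered() as b:
    b["a"] = "buffered"
    written_while_buffered = os.path.exists(path)  # nothing on disk yet
assert jsd["a"] == "buffered"
```

Outside the with block every assignment costs one read and one write of the file, while inside it the file is only touched on entry, on an explicit flush, and on exit.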