Accessing Chronicle Engine via NFS

Overview

Chronicle Engine is a data virtualisation layer.  It abstracts away the complexity of accessing, manipulating and subscribing to various data sources so that the user of that data doesn't need to know how or where the data is actually stored.  This means the data can be migrated between systems, or stored in a manner which is more efficient but would be too complex for the developer to use directly.

The basic interfaces are a Concurrent Map and a simple Pub/Sub.  Using these in combination with stream-like filters and transformations, you can access files, in-memory data caches, LDAP, SQL databases, key-value NoSQL databases and low-latency persisted stores.

What we are investigating is using NFS as a means of access, alongside our Java and C# clients, so the data can be accessed in a natural way.  This way any program on Windows, Unix or Mac OS X could use it. How might this look?

Access via NFS.

The data stores in Chronicle Engine are organised hierarchically as a tree, rather like a directory structure.  The keys of a key-value store are like files in a directory and the values are the contents of the files. This translates naturally to a virtual file system.

In Java, whether on the server or a remote client, you access a map like this:

Map<String, String> map = acquireMap("/group/data", String.class, String.class);

map.put("key-1", "Hello World");
map.put("key-2", "G-Day All");

However, with an NFS mount we can access the same map from any program, even a shell.

~ $ cd /group/data
/group/data $ echo Hello World > key-1
/group/data $ echo G-Day All > key-2

To get a value, this is really simple in Java

String value = map.get("key-1");

And via NFS it is also simple

/group/data $ cat key-1
Hello World
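
Because the map appears as a directory, a Java program without the Chronicle client could in principle use ordinary file I/O as well. The following is a minimal sketch under that assumption: the class and method names are hypothetical, it uses only the JDK, and a temporary directory stands in for the mounted /group/data so that it runs anywhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MountedMapExample {
    // Write a value the way `echo value > key` does: the file name is the key.
    public static void put(Path dir, String key, String value) throws IOException {
        Files.writeString(dir.resolve(key), value + "\n");
    }

    // Read a value the way `cat key` does, dropping the trailing newline.
    public static String get(Path dir, String key) throws IOException {
        return Files.readString(dir.resolve(key)).stripTrailing();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the mount point (e.g. /group/data) is passed as the first argument.
        // Without one, a temporary directory stands in for the mount.
        Path dir = args.length > 0 ? Path.of(args[0]) : Files.createTempDirectory("group-data");
        put(dir, "key-1", "Hello World");
        put(dir, "key-2", "G-Day All");
        System.out.println(get(dir, "key-1"));
    }
}
```

Nothing here is specific to Chronicle Engine, which is the point: any language with file access would work the same way.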

What about more complex functions?

An advantage of having our own NFS server is that we can add virtual files which can perform functions provided they follow the general file access contract.

In Java we can apply a query to get in real time all the people over 20 years old.  If an entry is added, it is printed as it happens.

map.entrySet().query()
    .filter(e -> e.getValue().age > 20)
    .map(e -> e.getKey())
    .subscribe(System.out::println);

So how could this translate on NFS?

/group/data $ tail -9999f '.(select key where age > 20)'
Bob Brown
Cate Class

This would give you all the current names, plus any new names as they happen.
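
The "current entries first, then new ones as they arrive" behaviour can be illustrated with the JDK alone. This is only a rough sketch of the semantics, not how Chronicle Engine implements its queries: the class, the Person type, the ages and the extra name "Ann Apple" are all made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class LiveQuerySketch {
    // Hypothetical value type; the article only implies an `age` field.
    public static final class Person {
        public final int age;
        public Person(int age) { this.age = age; }
    }

    private final Map<String, Person> data = new ConcurrentHashMap<>();
    private final List<Consumer<Map.Entry<String, Person>>> listeners = new CopyOnWriteArrayList<>();

    // Store the entry, then notify every subscriber of the change.
    public void put(String key, Person value) {
        data.put(key, value);
        Map.Entry<String, Person> entry = Map.entry(key, value);
        listeners.forEach(l -> l.accept(entry));
    }

    // Replay matching keys already in the map, then keep delivering new matches.
    // Simplification: replay and registration are not atomic, so a put racing
    // with this call could be missed or delivered twice.
    public void subscribeKeys(Predicate<Person> filter, Consumer<String> out) {
        data.forEach((k, v) -> { if (filter.test(v)) out.accept(k); });
        listeners.add(e -> { if (filter.test(e.getValue())) out.accept(e.getKey()); });
    }

    public static void main(String[] args) {
        LiveQuerySketch people = new LiveQuerySketch();
        people.put("Bob Brown", new Person(35));
        people.put("Ann Apple", new Person(18));
        people.subscribeKeys(p -> p.age > 20, System.out::println); // prints Bob Brown
        people.put("Cate Class", new Person(42));                    // prints Cate Class
    }
}
```

The tail -f view of a virtual file maps onto the same two phases: the existing contents are the replay, and appended lines are the live updates.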

Choosing your format.

By having virtual files you can ask for them in a different format.  Say the underlying data object is a row in an RDBMS database.  You might want it in CSV format, or you might want it in XML or JSON.

/group/users $ ls
peter-lawrey
/group/users $ cat peter-lawrey.csv
Peter,Lawrey,UK,1001
/group/users $ cat peter-lawrey.xml
<user id="1001">
    <first>Peter</first>
    <last>Lawrey</last>
    <country>UK</country>
</user>
/group/users $ cat peter-lawrey.json
{"user": { "id": "1001", "first": "Peter", "last": "Lawrey", "country": "UK" }}

By adding a recognised file extension, the file can appear in the format desired.
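
One way to picture this is a server-side function that picks a renderer from the extension of the requested file name. The sketch below is purely illustrative: the User class and render method are hypothetical, the field layout is inferred from the sample output above, and escaping of special characters is left out for brevity.

```java
public class FormatByExtension {
    // Hypothetical record shape, taken from the fields shown in the sample output.
    public static final class User {
        final String id, first, last, country;
        public User(String id, String first, String last, String country) {
            this.id = id; this.first = first; this.last = last; this.country = country;
        }
    }

    // Choose the output format from the extension of the requested file name.
    public static String render(User u, String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1);
        switch (ext) {
            case "csv":
                return String.join(",", u.first, u.last, u.country, u.id);
            case "xml":
                return "<user id=\"" + u.id + "\">\n"
                        + "    <first>" + u.first + "</first>\n"
                        + "    <last>" + u.last + "</last>\n"
                        + "    <country>" + u.country + "</country>\n"
                        + "</user>";
            case "json":
                return String.format(
                        "{\"user\": { \"id\": \"%s\", \"first\": \"%s\", \"last\": \"%s\", \"country\": \"%s\" }}",
                        u.id, u.first, u.last, u.country);
            default:
                // Assumption: what an unrecognised or missing extension should return is not specified.
                throw new IllegalArgumentException("Unrecognised extension: " + fileName);
        }
    }

    public static void main(String[] args) {
        User peter = new User("1001", "Peter", "Lawrey", "UK");
        System.out.println(render(peter, "peter-lawrey.csv"));
        System.out.println(render(peter, "peter-lawrey.json"));
    }
}
```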

Updating a record could be as simple as writing to a file.

What are the advantages over using a normal NFS file system?

The main advantage is extensibility.  Chronicle Engine supports:
  • billions of entries in one map (directory).
  • LAN and WAN data replication.
  • real time updates of changes.
  • query support.
  • data compression.
  • traffic shaping.
  • auditability of who changed what and when.
We also plan to support data distribution and more back-end data stores.

Feedback

What would you use such a system for?  What features would you like to see?

You can comment here or on the Chronicle Forum.

I look forward to hearing your thoughts.

Comments

  1. Sorry for being thick but I don't really understand what you're describing here. AFAIK Chronicle uses a couple of files on the disk (one for data and one for index if I recall correctly) which it maps into the communicating processes' address space (plus there is the asynchronous mirroring over TCP).

    Are you talking here about putting those files on NFS and sharing it between computers? Or something else entirely?

    If the former, it is a very risky proposition - there are all kinds of bugs related to the interaction between NFS clients / servers and the OS, which - independently of what you do in user mode - can corrupt data (frequently silently) and even crash the OS :-(. I wouldn't use NFS for anything else than the most simple data distribution - certainly not for a high performance / high concurrency system.
