Dan Harabagiu
Munich, Bavaria · Joined November 2013

Posted to JSON database engine over 1 year ago

The idea would be not to store it in memory, and definitely not to have the whole dataset in memory :), that would be very stupid. The idea was to distribute the tree structure across data files and be able to search them fast. The whole concept would move away from the "document" concept, while the data structure as a whole would "be" only one document.

One of the ideas I had for storing this information was something like the following:
* Each level of the JSON tree would be stored in a separate file, with a text index attached to it.
* Each entry in the file would consist of the "leaf" entry to which it relates (the deep key), the key, and the value.

So for the root of the JSON I would have a file, say "DB_0", where each record has the layout:

deepKey<separator>keyName<separator>value-or-pointer-to-deeper-level<end-separator>

So for level 0 we would have, for example:
NULL<separator>key1<separator>keyvalue1<end-separator>
NULL<separator>key2<separator>keyvalue2<end-separator>
NULL<separator>key3<separator>#deepKeyID1<end-separator>

and for level 1 we would have a file called "DB_1" with the same record layout, where for example we could have:

deepKeyID1<separator>keylvl1<separator>keylvl1value1<end-separator>
deepKeyID1<separator>keylvl2<separator>keylvl2value2<end-separator>
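
To make the record layout concrete, here is a minimal Python sketch of the per-level encoding. The separator bytes, the "#" pointer prefix, and the deepKeyID numbering are placeholders of my own, not fixed by the concept above.

```python
import itertools

SEP = "\x1f"   # hypothetical field separator
END = "\x1e"   # hypothetical end-of-record marker

def encode_levels(doc):
    """Flatten a nested dict into per-level record lists (DB_0, DB_1, ...)."""
    files = {}
    ids = itertools.count(1)

    def walk(node, deep_key, level):
        records = files.setdefault(f"DB_{level}", [])
        for key, value in node.items():
            if isinstance(value, dict):
                child_id = f"deepKeyID{next(ids)}"
                # a "#" prefix marks a pointer into the next level's file
                records.append(f"{deep_key}{SEP}{key}{SEP}#{child_id}{END}")
                walk(value, child_id, level + 1)
            else:
                records.append(f"{deep_key}{SEP}{key}{SEP}{value}{END}")

    walk(doc, "NULL", 0)
    return files

doc = {"key1": "keyvalue1",
       "key2": "keyvalue2",
       "key3": {"keylvl1": "keylvl1value1", "keylvl2": "keylvl2value2"}}

for name, records in encode_levels(doc).items():
    print(name, records)
```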

Considering that each file would also have an index built into it, or adjacent to it, for fast text search, the lookup process would be the following:

for query: find "key1" -> the application would look into the root file "DB_0" and get its value.

for query: find "key3.keylvl1" -> the application would look into "DB_0" and get its value, see that it points to a deeper level, open "DB_1", look inside for deep_key == value of key3 and key == keylvl1, and return the value.

for query: find "key3.*" -> the application would look into "DB_0" and get its value, see that it points to a deeper level, open "DB_1", look inside for deep_key == value of key3, get all the keys and values, and build the resulting JSON object.
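
A minimal sketch of that lookup logic, operating on the records the encoding sketch above would produce; the linear scan stands in for the per-file text index, and the dotted-path syntax is just my illustration:

```python
SEP = "\x1f"   # same placeholder separators as in the encoding sketch
END = "\x1e"

# level files as the encoding sketch above would produce them
files = {
    "DB_0": [f"NULL{SEP}key1{SEP}keyvalue1{END}",
             f"NULL{SEP}key2{SEP}keyvalue2{END}",
             f"NULL{SEP}key3{SEP}#deepKeyID1{END}"],
    "DB_1": [f"deepKeyID1{SEP}keylvl1{SEP}keylvl1value1{END}",
             f"deepKeyID1{SEP}keylvl2{SEP}keylvl2value2{END}"],
}

def scan(level, deep_key, key=None):
    """Linear scan standing in for the per-file text index."""
    for record in files[f"DB_{level}"]:
        dk, k, v = record.rstrip(END).split(SEP)
        if dk == deep_key and (key is None or k == key):
            yield k, v

def find(path):
    """Resolve a dotted path such as 'key3.keylvl1' or 'key3.*'."""
    deep_key = "NULL"
    for level, part in enumerate(path.split(".")):
        if part == "*":                       # wildcard: rebuild the subtree
            return dict(scan(level, deep_key))
        value = next((v for _, v in scan(level, deep_key, part)), None)
        if value is None or not value.startswith("#"):
            return value                      # plain value, or a miss
        deep_key = value[1:]                  # follow the pointer one level down
    return deep_key                           # path ended on a pointer

print(find("key1"))           # keyvalue1
print(find("key3.keylvl1"))   # keylvl1value1
print(find("key3.*"))         # {'keylvl1': ..., 'keylvl2': ...}
```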

With some extrapolation more cases can be built, and arrays could even be stored using fixed numeric keys (which would restrict the user from using plain numbers as object keys).
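
As a sketch of that array idea: an array would become a deeper level whose keys are its indices ("0", "1", ...), and on read an all-numeric-key subtree could be folded back into an array. The helper below is hypothetical:

```python
def maybe_array(obj):
    """Fold a subtree whose keys are exactly "0".."n-1" back into a list."""
    if obj and all(k.isdigit() for k in obj):
        keys = sorted(obj, key=int)
        if keys == [str(i) for i in range(len(obj))]:
            return [obj[k] for k in keys]
    return obj

print(maybe_array({"0": "red", "1": "green"}))   # ['red', 'green']
print(maybe_array({"name": "Dan"}))              # left as a dict
```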

I am ignoring from the start here the caching of the most-used entries, and also the additional concept of storing each "value" only once in a separate table and using reference keys for values (if the value "Germany" or "New York" exists multiple times, it would be stored only once).
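
The value-deduplication part could look roughly like this; the in-memory table and the "V<n>" reference keys are placeholders of my own:

```python
values = {}   # value -> reference key, i.e. the separate value table

def value_ref(value):
    """Return a stable reference key for a value, creating it on first sight."""
    return values.setdefault(value, f"V{len(values)}")

print(value_ref("Germany"))    # V0
print(value_ref("New York"))   # V1
print(value_ref("Germany"))    # V0 again: "Germany" is stored only once
```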

The possible problem that I see here is that the deeper-level files would be considerably larger than the root ones. Maybe partitioning or sharding based on source_key would help.
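
A sketch of what that partitioning might look like, with a made-up shard count and file naming; hashing on the deep key keeps all records of one subtree in the same shard file:

```python
import zlib

N_SHARDS = 16   # arbitrary shard count, just for illustration

def shard_file(level, deep_key):
    """Map a record to a shard file such as 'DB_2.part07'."""
    shard = zlib.crc32(deep_key.encode()) % N_SHARDS
    return f"DB_{level}.part{shard:02d}"

print(shard_file(2, "deepKeyID1"))
```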

Cheers,
Dan.

PS: I updated this comment several times to match the markup language.

Posted to JSON database engine over 1 year ago

As I said in the conclusion, using MongoDB or any JSON-based document store would mean having to handle databases, collections, and documents, which is a bit of an overhead for the scope of the idea above.

If I used one db, one collection, and one document, I am not sure what MongoDB's size limit on a single document is, considering that in the above concept I would store a few GB of data.

Anyhow, thanks for the hint.
