Posted to JSON database engine over 1 year ago
As I said in the conclusion, using MongoDB or any JSON-based document store would mean that I would need to handle databases, collections, and documents, which is a bit of an overhead for the scope of the idea above.
If I were to use one DB, one collection, and one document, I am not sure what MongoDB's document size limit is, considering that in the above concept I would store a few GB of data.
Anyhow, thanks for the hint.
The idea would be not to store it in memory, and definitely not to have the whole dataset in memory :); that would be very stupid. The idea was to distribute the tree structure across data files and be able to search them fast. The whole concept would move away from the "document" concept, while the data structure would still "be" only one document.
One of the ideas I had regarding storing this information was something like the following:
* Each level of the JSON tree would be stored in a separate file, with a text index attached to it.
* Each entry in the file would consist of the "leaf" entry to which it relates, the key, and the value.
So for the root of the JSON I would have a file, say "DB_0", containing lines of the form:
deepKey<separator>KeyName<separator>Value or pointer to a deeper level<end-separator>
So for level 0 we would have, for example:
NULL<separator>key1<separator>keyvalue1<end-separator>
NULL<separator>key2<separator>keyvalue2<end-separator>
NULL<separator>key3<separator>#deepKeyID1<end-separator>
and for level 1 we would have a file called "DB_1" with the same line format, where for example we could have:
deepKeyID1<separator>keylvl1<separator>keylvl1value1<end-separator>
deepKeyID1<separator>keylvl2<separator>keylvl2value2<end-separator>
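A minimal sketch of this flattening in Python, assuming hypothetical separator characters ("\x1f" between fields, a newline as the end separator) and the DB_0/DB_1 file names from the example; the deep-key IDs are simply generated from a counter:

```python
import itertools
import os

FIELD_SEP = "\x1f"   # hypothetical field separator ("separator" above)
END_SEP = "\n"       # hypothetical end-of-record separator ("end-separator" above)
_ids = itertools.count(1)

def flatten(obj, out_dir, level=0, deep_key="NULL"):
    """Append one record per key to DB_<level>; nested objects get a
    generated deep-key ID and their children go into DB_<level+1>."""
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, f"DB_{level}"), "a", encoding="utf-8") as f:
        for key, value in obj.items():
            if isinstance(value, dict):
                deep_id = f"deepKeyID{next(_ids)}"
                f.write(FIELD_SEP.join([deep_key, key, "#" + deep_id]) + END_SEP)
                flatten(value, out_dir, level + 1, deep_id)
            else:
                f.write(FIELD_SEP.join([deep_key, key, str(value)]) + END_SEP)

# Reproduces the DB_0 / DB_1 records from the example:
flatten({"key1": "keyvalue1",
         "key2": "keyvalue2",
         "key3": {"keylvl1": "keylvl1value1",
                  "keylvl2": "keylvl2value2"}}, "jsondb")
```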
Considering that each file would additionally have an index built into it, or adjacent to it, for fast text-file search, the lookup process would be the following:
for query: find "key1" -> the application would look into the root file "DB_0" and get its value.
for query: find "key3.keylvl1" -> the application would look into "DB_0" and get the value of key3, see that it points to a deeper key, open "DB_1", look inside for deep_key == value of key3 and key == keylvl1, and return the value.
for query: find "key3.*" -> the application would look into "DB_0" and get the value of key3, see that it points to a deeper key, open "DB_1", look inside for deep_key == value of key3, get all the keys and values, and build the resulting JSON object.
With some extrapolation, more cases could be built, and arrays could even be stored using fixed numeric keys (which would restrict the user from using numbers as keys).
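One possible reading of the array idea, as a small hypothetical extension of the flatten() sketch above: the element indices become the keys of a deeper level, which is why plain numeric keys would be off-limits for users.

```python
def flatten_list(key, values, out_dir, level=0, deep_key="NULL"):
    """Store a list like a nested object whose keys are the element indices."""
    flatten({key: {str(i): v for i, v in enumerate(values)}}, out_dir, level, deep_key)

# flatten_list("cities", ["Berlin", "Hamburg"], "jsondb") would produce:
#   DB_0:  NULL<separator>cities<separator>#deepKeyID<n><end-separator>
#   DB_1:  deepKeyID<n><separator>0<separator>Berlin<end-separator>
#          deepKeyID<n><separator>1<separator>Hamburg<end-separator>
```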
I am ignoring from the start here the caching of the most-used entries, and also the additional idea of storing each "value" only once in a separate table and just using reference keys for values (if the value "Germany" or "New York" exists multiple times, it would be stored only once).
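A rough illustration of that value-deduplication idea, assuming a single hypothetical VALUES file that maps reference keys to the actual strings; the level files would then store the reference key instead of the raw value:

```python
import os

def intern_value(value, seen, out_dir="jsondb"):
    """Return a reference key for `value`, writing it to the VALUES file
    only the first time it is seen (so "Germany" is stored once)."""
    if value not in seen:
        seen[value] = f"@val{len(seen)}"
        os.makedirs(out_dir, exist_ok=True)
        with open(os.path.join(out_dir, "VALUES"), "a", encoding="utf-8") as f:
            f.write(seen[value] + "\x1f" + value + "\n")
    return seen[value]

seen = {}
print(intern_value("Germany", seen))    # @val0, written to VALUES
print(intern_value("New York", seen))   # @val1, written to VALUES
print(intern_value("Germany", seen))    # @val0, reused instead of stored again
```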
The possible problem that I see here is that the deeper-level files would be considerably larger than the root ones. Maybe partitioning or sharding based on source_key would help.
Cheers,
Dan.
PS: I updated this comment several times to match the markup language.