Big Data

There are 2 articles in total.

Data structures for massive datasets

data structures big data probability count-min sketch bloom filters reservoir-sampling

The algorithms we use every day to manipulate data assume that we have access to all the data we need. What if there’s more data than can fit on a single computer, or if accessing the data to run searches is expensive? In those cases, we can use specialized data structures that “estimate” the actual value without computing it exactly; in some cases an estimate is good enough. Three such structures are the count-min sketch, Bloom filters, and reservoir sampling.
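To make the idea of "estimating" concrete, here is a minimal count-min sketch in Go. It is only a sketch under assumptions of my own: the type and method names, the 4x1024 table size, and the use of FNV-1a seeded with the row number are illustrative choices, not taken from the article.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// CountMinSketch estimates how often an item was seen using fixed memory.
// Estimates never undercount; hash collisions can only inflate them.
type CountMinSketch struct {
	depth, width uint32
	rows         [][]uint32
}

// NewCountMinSketch allocates depth rows of width counters each.
func NewCountMinSketch(depth, width uint32) *CountMinSketch {
	rows := make([][]uint32, depth)
	for i := range rows {
		rows[i] = make([]uint32, width)
	}
	return &CountMinSketch{depth: depth, width: width, rows: rows}
}

// index picks the column for an item in a given row. Prefixing the row
// number gives each row a different hash function (an illustrative choice).
func (c *CountMinSketch) index(row uint32, item string) uint32 {
	h := fnv.New32a()
	h.Write([]byte{byte(row)})
	h.Write([]byte(item))
	return h.Sum32() % c.width
}

// Add increments one counter per row for the item.
func (c *CountMinSketch) Add(item string) {
	for r := uint32(0); r < c.depth; r++ {
		c.rows[r][c.index(r, item)]++
	}
}

// Estimate returns the smallest counter across rows, which is the
// least-inflated view of the item's true count.
func (c *CountMinSketch) Estimate(item string) uint32 {
	lowest := ^uint32(0)
	for r := uint32(0); r < c.depth; r++ {
		if v := c.rows[r][c.index(r, item)]; v < lowest {
			lowest = v
		}
	}
	return lowest
}

func main() {
	cms := NewCountMinSketch(4, 1024)
	for i := 0; i < 3; i++ {
		cms.Add("apple")
	}
	cms.Add("pear")
	// Each estimate is at least the true count (3 and 1 here).
	fmt.Println(cms.Estimate("apple"), cms.Estimate("pear"))
}
```

Memory use depends only on depth and width, not on how many distinct items are added, which is the trade-off the summary describes: a bounded overestimate in exchange for a fixed footprint.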

Published on Sat, May 9, 2020
Last modified on Fri, Nov 22, 2024
810 words

Memtable & SSTable (Sorted String Table)

memtable sorted string table data structures big data linked list

The pattern of batching data up in memory, tracking it in a write-ahead log, and periodically flushing it to disk is ubiquitous today. Open-source examples include LevelDB, Cassandra, InfluxDB, and HBase.

In this article I implement a tiny memtable for a time-series database in Go and briefly talk about how it can be compressed into a sorted string table.

Published on Sat, Feb 29, 2020
Last modified on Sun, Nov 10, 2024
892 words