logblock2

module

Version: v0.0.0-...-b92c3bc | Published: Jun 11, 2024 | License: BSD-3-Clause


Repository

github.com/akmistry/logblock2

README

Logblock: An LSM-based block device

Logblock is a software block device built on the concept of log-structured merge-trees (LSMs).

Logblock is inspired by how the Flash Translation Layer (FTL) of an SSD works. But instead of NAND devices, we have blob storage systems, which are conceptually similar at the API level. By pushing the FTL up the stack, we can build block devices on top of abstract distributed storage systems, decoupling the block device from the underlying storage hardware. This is more CPU/memory intensive than simply mapping block ranges on a storage device to a logical block device, which is largely how traditional distributed block devices are implemented. But decoupling from the storage hardware creates a large amount of flexibility in terms of data allocation, replication, encoding, etc.

How to use

Currently, logblock is very basic and only supports storing data on the local filesystem. The logblock binary is located in cmd/logblock and can be run as:

% ./logblock -size 64G /dev/nbd0 /path/to/data/storage/directory
% sudo mke2fs /dev/nbd0
% sudo mount /dev/nbd0 /mnt/test

Overview

Conceptually, a block device can be thought of as a key-value store (a very limited one). The key is a fixed 8-byte integer, which is the block index (or LBA), and the value is simply the block data. Therefore, one can trivially build a block device on top of their favourite key-value database.
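To make the key-value view concrete, here is a minimal sketch in which a Go map plays the role of the key-value database. The names (kvBlockDevice, WriteBlock, ReadBlock) and the 4096-byte block size are assumptions made for illustration; they are not taken from logblock's code.

```go
package main

import (
	"bytes"
	"fmt"
)

// blockSize is an assumed value for this sketch, not logblock's actual block size.
const blockSize = 4096

// kvBlockDevice uses a map as the key-value store: key = block index (LBA),
// value = block data. Hypothetical type, for illustration only.
type kvBlockDevice struct {
	blocks    map[uint64][]byte
	numBlocks uint64
}

func newKVBlockDevice(numBlocks uint64) *kvBlockDevice {
	return &kvBlockDevice{blocks: make(map[uint64][]byte), numBlocks: numBlocks}
}

// WriteBlock stores a copy of data under the given LBA.
func (d *kvBlockDevice) WriteBlock(lba uint64, data []byte) error {
	if lba >= d.numBlocks {
		return fmt.Errorf("lba %d out of range", lba)
	}
	if len(data) != blockSize {
		return fmt.Errorf("block must be %d bytes, got %d", blockSize, len(data))
	}
	d.blocks[lba] = append([]byte(nil), data...)
	return nil
}

// ReadBlock returns the block at lba. Blocks that were never written read as zeros.
func (d *kvBlockDevice) ReadBlock(lba uint64) ([]byte, error) {
	if lba >= d.numBlocks {
		return nil, fmt.Errorf("lba %d out of range", lba)
	}
	out := make([]byte, blockSize)
	copy(out, d.blocks[lba]) // copies nothing if the key is absent
	return out, nil
}

func main() {
	dev := newKVBlockDevice(1024)
	data := bytes.Repeat([]byte{0xAB}, blockSize)
	if err := dev.WriteBlock(7, data); err != nil {
		panic(err)
	}
	got, _ := dev.ReadBlock(7)
	fmt.Println("round trip ok:", bytes.Equal(got, data))
}
```

Swapping the map for a real key-value database would keep the same two operations; only the storage behind them changes.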

This is not particularly useful for building a locally-attached block device (except perhaps for simulating a very large device for testing). However, one can use a distributed KV-store to build scalable, fault-tolerant block devices for VMs and other applications.

Logblock goes one step further, and instead of building on top of a database, it leverages the properties of a block device (i.e. fixed-size keys and values) to build a more efficient solution.
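As a rough illustration of what fixed-size keys and values make possible, the sketch below encodes an LBA as an 8-byte key and locates packed records with plain arithmetic, with no per-record length headers. The 4096-byte block size, the big-endian encoding, the helper names, and reading "64G" as 64 GiB are all assumptions; this is not logblock's on-disk format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// blockSize is an assumed value for this sketch.
const blockSize = 4096

// encodeKey turns an LBA into a fixed 8-byte key. Big-endian is chosen here
// so that keys sort bytewise in numeric order (an assumption of this sketch).
func encodeKey(lba uint64) [8]byte {
	var k [8]byte
	binary.BigEndian.PutUint64(k[:], lba)
	return k
}

// decodeKey is the inverse of encodeKey.
func decodeKey(k [8]byte) uint64 {
	return binary.BigEndian.Uint64(k[:])
}

// numBlocks returns how many whole blocks fit in a device of sizeBytes.
func numBlocks(sizeBytes uint64) uint64 {
	return sizeBytes / blockSize
}

// recordOffset returns the byte offset of the i-th record in a file of
// back-to-back fixed-size blocks.
func recordOffset(i uint64) uint64 {
	return i * blockSize
}

func main() {
	const size = 64 << 30 // 64 GiB, assuming "64G" means GiB
	fmt.Println(numBlocks(size))            // 16777216
	fmt.Println(recordOffset(3))            // 12288
	fmt.Println(decodeKey(encodeKey(12345))) // 12345
}
```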

As in any LSM, writes are first appended to a write-ahead log. Once the log reaches a certain size, it is "compacted" into a sparse block format. A block mapping keeps track of which file contains which block. The compaction step also frees space occupied by blocks that have since been overwritten by more recent writes.
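The write path described above can be sketched in memory as follows. This is a toy model under stated assumptions, not logblock's implementation: the types (lsmDevice, segment, location) are hypothetical, the log "size" is counted in entries, block size is not enforced, and compaction only drops copies overwritten within the log being compacted.

```go
package main

import (
	"fmt"
	"sort"
)

// walEntry is one logged write: a block index and its data.
type walEntry struct {
	lba  uint64
	data []byte
}

// segment stands in for one compacted file: one entry per LBA, sorted by LBA.
type segment struct {
	entries []walEntry
}

// location records where the newest copy of a block lives.
type location struct {
	inWAL bool
	seg   int // index into segments, valid when inWAL is false
	idx   int // index into the WAL or into the segment's entries
}

type lsmDevice struct {
	wal      []walEntry
	walLimit int // compact once the WAL holds this many entries
	segments []*segment
	blockMap map[uint64]location
}

func newLSMDevice(walLimit int) *lsmDevice {
	return &lsmDevice{walLimit: walLimit, blockMap: make(map[uint64]location)}
}

// Write appends to the WAL, updates the block map, and compacts when full.
func (d *lsmDevice) Write(lba uint64, data []byte) {
	d.wal = append(d.wal, walEntry{lba, append([]byte(nil), data...)})
	d.blockMap[lba] = location{inWAL: true, idx: len(d.wal) - 1}
	if len(d.wal) >= d.walLimit {
		d.compact()
	}
}

// compact keeps only the newest copy of each block in the WAL, writes those
// copies out as a sorted segment, repoints the block map, and clears the WAL.
func (d *lsmDevice) compact() {
	latest := make(map[uint64][]byte)
	for _, e := range d.wal {
		latest[e.lba] = e.data // later entries overwrite earlier ones
	}
	lbas := make([]uint64, 0, len(latest))
	for lba := range latest {
		lbas = append(lbas, lba)
	}
	sort.Slice(lbas, func(i, j int) bool { return lbas[i] < lbas[j] })

	seg := &segment{}
	segID := len(d.segments)
	for i, lba := range lbas {
		seg.entries = append(seg.entries, walEntry{lba, latest[lba]})
		d.blockMap[lba] = location{seg: segID, idx: i}
	}
	d.segments = append(d.segments, seg)
	d.wal = nil
}

// Read follows the block map to the newest copy; unmapped blocks return nil.
func (d *lsmDevice) Read(lba uint64) []byte {
	loc, ok := d.blockMap[lba]
	if !ok {
		return nil
	}
	if loc.inWAL {
		return d.wal[loc.idx].data
	}
	return d.segments[loc.seg].entries[loc.idx].data
}

func main() {
	dev := newLSMDevice(3)
	dev.Write(5, []byte("a"))
	dev.Write(5, []byte("b")) // overwrites block 5 while still in the WAL
	dev.Write(2, []byte("c")) // third entry triggers compaction
	fmt.Printf("segments=%d entries=%d block5=%s\n",
		len(dev.segments), len(dev.segments[0].entries), dev.Read(5))
}
```

In this run, three logged writes compact down to two stored blocks, because the first copy of block 5 was overwritten before compaction. Reclaiming blocks that live in older segments would need a further merge step, which this sketch leaves out.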

QNAs (Questions Nobody Asked)

Why is it called "logblock2"?

Because it's a "Log"-structured "Block" device. Also, naming is hard.

No, why is it called logblock2, with a "2"?

Because I have an original logblock git repo, which is >5 years old. It has multiple iterations of the write-ahead log and sparse block format, various block mapping data structures, a "graveyard" of unused code, and a git history which is starting to look like https://xkcd.com/1296/. It was probably best to start a new repo as I open-source this project.

How is this better than existing-distributed-block-device?

Honestly, it isn't. For one, this isn't a complete project, but rather a technical demo. It has too many missing features to be useful, not the least of which is integration with a blob/log storage backend.

TODO

TRIM support
Re-designed metadata model
Storage backends (e.g. S3, HDFS)
Performance improvements
More tests
Alternate compaction strategies

LICENSE

Logblock is released under a BSD 3-Clause License.

Directories

Path	Synopsis
cmd
logblock command
internal
adaptor
app/logblock
block
blockmap
metadata
metadata/pb
rangemap
readers
sparseblock
sparseblock/pb
storage
storage/cloud
storage/local
t
testutil
tracker
util
wal
wal/pb
