Log-structured merge-tree

In computer science, the log-structured merge-tree (also known as LSM tree, or LSMT) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches.

DeleteMinAvg
O
DeleteMinWorst
O
Depiction
LSM Tree.png
FindMinAvg
O
FindMinWorst
O
Has abstract
In computer science, the log-structured merge-tree (also known as LSM tree, or LSMT) is a data structure whose performance characteristics make it attractive for providing indexed access to files with high insert volume, such as transactional log data. Like other search trees, LSM trees maintain key-value pairs. They keep data in two or more separate structures, each optimized for its underlying storage medium, and data is synchronized between the structures efficiently, in batches.
One simple version is the two-level LSM tree. As described by Patrick O'Neil, a two-level LSM tree comprises two tree-like structures, called C0 and C1. C0 is smaller and entirely resident in memory, whereas C1 is resident on disk. New records are inserted into the memory-resident C0 component. If an insertion causes C0 to exceed a certain size threshold, a contiguous segment of entries is removed from C0 and merged into C1 on disk. The performance characteristics of LSM trees stem from the fact that each component is tuned to the characteristics of its underlying storage medium, and that data is migrated across media efficiently in rolling batches, using an algorithm reminiscent of merge sort.
Most LSM trees used in practice employ multiple levels. Level 0 is kept in main memory and might be represented using a tree. The on-disk data is organized into sorted runs, each containing data sorted by the index key. A run can be represented on disk as a single file, or alternatively as a collection of files with non-overlapping key ranges. To look up the value associated with a particular key, one must search the Level 0 tree and also each run. The Stepped-Merge version of the LSM tree is a variant that supports multiple levels with multiple tree structures at each level.
A particular key may appear in several runs, and what that means for a query depends on the application. Some applications simply want the newest key-value pair with a given key. Others must combine the values in some way to get the proper aggregate value to return. For example, in Apache Cassandra, each value represents a row in a database, and different versions of the row may have different sets of columns. To keep down the cost of queries, the system must avoid a situation where there are too many runs. Extensions to the 'leveled' method that incorporate B+ tree structures have been suggested, for example bLSM and Diff-Index.
The LSM tree was originally designed for write-intensive workloads. As read and write workloads increasingly co-exist on an LSM-tree storage structure, reads can experience high latency and low throughput because LSM-tree compaction operations frequently invalidate cached data in buffer caches. To re-enable effective buffer caching for fast data accesses, a Log-Structured buffered-Merged tree (LSbM-tree) has been proposed and implemented.
LSM trees are used in data stores such as Apache AsterixDB, Bigtable, HBase, LevelDB, Apache Accumulo, SQLite4, Tarantool, RocksDB, WiredTiger, Apache Cassandra, InfluxDB and ScyllaDB.
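The write and read paths described in the abstract can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the design of any listed data store: the class name TinyLSM and the parameters c0_limit and max_runs are hypothetical, the "disk" runs are plain in-memory lists, the whole of C0 is flushed as one new run (rather than merging a contiguous segment into C1), and compaction merges all runs into a single run. A query returns the newest value for a key, which is only one of the query semantics mentioned above.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory component plus immutable sorted runs.

    All names and thresholds here are illustrative assumptions.
    """

    def __init__(self, c0_limit=4, max_runs=3):
        self.c0 = {}              # memory-resident component (Level 0)
        self.runs = []            # sorted runs, newest first; each is [(key, value), ...]
        self.c0_limit = c0_limit  # size threshold that triggers a flush
        self.max_runs = max_runs  # run count that triggers compaction

    def put(self, key, value):
        # New records always go to the in-memory component first.
        self.c0[key] = value
        if len(self.c0) > self.c0_limit:
            self._flush()

    def _flush(self):
        # Write the in-memory entries out as one batch, sorted by key.
        self.runs.insert(0, sorted(self.c0.items()))
        self.c0 = {}
        if len(self.runs) > self.max_runs:
            self._compact()

    def _compact(self):
        # Merge every run into one so queries do not have to visit too many runs.
        merged = {}
        for run in reversed(self.runs):  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())]

    def get(self, key, default=None):
        # Search Level 0 first, then each run from newest to oldest.
        if key in self.c0:
            return self.c0[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))  # binary search within a sorted run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return default


db = TinyLSM(c0_limit=2, max_runs=2)
for version, key in enumerate(["a", "b", "c", "a"]):
    db.put(key, version)
latest_a = db.get("a")  # the value from the most recent put of "a"
```

Because lookups stop at the first component that contains the key, a newer value shadows any older copies still sitting in older runs; an application that needs to combine versions would instead have to visit every run.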
Hypernym
Structure
InsertAvg
O
InsertWorst
O
InventedBy
Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil
InventedYear
1996
Is primary topic of
Log-structured merge-tree
Label
Log-structured merge-tree
Link from a Wikipage to an external page
asterixdb.apache.org/
www.benstopford.com/2015/02/14/log-structured-merge-trees
Link from a Wikipage to another Wikipage
Apache Accumulo
Apache Cassandra
B+ tree
Bigtable
Category:Database index techniques
Category:Trees (data structures)
Computer science
Database index
Data structure
Elizabeth O'Neil
File:LSM Tree.png
HBase
InfluxDB
LevelDB
Merge sort
Patrick O'Neil
RocksDB
Scylla (database)
Search tree
SQLite4
Tarantool
Transaction log
Tree (data structure)
WiredTiger
Name
Log-structured merge-tree
SameAs
4qj8H
Log-structured merge-tree
LSM-дерево
m.0r4qf5x
Q6666764
Subject
Category:Database index techniques
Category:Trees (data structures)
Thumbnail
LSM Tree.png?width=300
Type
Hybrid
WasDerivedFrom
Log-structured merge-tree?oldid=1120239072&ns=0
WikiPageLength
9000
Wikipage page ID
38562148
Wikipage revision ID
1120239072
WikiPageUsesTemplate
Template:Cite book
Template:Cite journal
Template:CS trees
Template:Infobox data structure
Template:More footnotes
Template:Reflist
Template:Short description