mirror of
git://git.sv.gnu.org/emacs.git
synced 2026-04-27 16:51:06 -07:00
1061 lines
43 KiB
ReStructuredText
1061 lines
43 KiB
ReStructuredText
.. mode: -*- rst -*-
|
|
|
|
|
|
Manual Variable Temporal (MVT) pool design
|
|
==========================================
|
|
|
|
:Tag: design.mps.poolmvt
|
|
:Author: P T Withington
|
|
:Date: 1998-02-13
|
|
:Status: incomplete design
|
|
:Revision: $Id$
|
|
:Copyright: See section `C. Copyright and License`_.
|
|
|
|
|
|
Introduction
|
|
------------
|
|
|
|
_`.intro`: This is a second-generation design for a pool that manually
|
|
manages variable-sized objects. It is intended as a replacement for
|
|
poolmv (except in its control pool role) and poolepdl, and it is
|
|
intended to satisfy the requirements of the Dylan "misc" pool and the
|
|
product malloc/new drop-in replacement.
|
|
|
|
_`.readership`: MM developers
|
|
|
|
_`.source`: req.dylan(6), req.epcore(16), req.product(2)
|
|
|
|
_`.background`: design.mps.poolmv_, design.mps.poolepdl(0),
|
|
design.product.soft.drop(0), paper.wil95(1), paper.vo96(0),
|
|
paper.grun92(1), paper.beck82(0), `mail.ptw.1998-02-25.22-18`_.
|
|
|
|
.. _design.mps.poolmv: poolmv
|
|
|
|
.. _mail.ptw.1998-02-25.22-18: https://info.ravenbrook.com/project/mps/mail/1998/02/25/22-18/0.txt
|
|
|
|
|
|
Definitions
|
|
-----------
|
|
|
|
_`.def.alignment`: Alignment is a constraint on an object's address,
|
|
typically to be a power of 2 (see also, glossary.alignment )
|
|
|
|
_`.def.bit-map`: A bitmap is a boolean-valued vector (see also,
|
|
glossary.bitmap ).
|
|
|
|
_`.def.block`: A block is a contiguous extent of memory. In this
|
|
document, block is used to mean a contiguous extent of memory managed
|
|
by the pool for the pool client, typically a subset of a segment
|
|
(compare with `.def.segment`_).
|
|
|
|
_`.def.cartesian-tree`: A cartesian tree is a binary tree ordered by
|
|
two keys (paper.stephenson83(0)).
|
|
|
|
_`.def.crossing-map`: A mechanism that supports finding the start of
|
|
an object from any address within the object, typically only required
|
|
on untagged architectures (see also, glossary.crossing.map ).
|
|
|
|
_`.def.footer`: A block of descriptive information describing and
|
|
immediately following another block of memory (see also
|
|
`.def.header`_).
|
|
|
|
_`.def.fragmentation`: Fragmented memory is memory reserved to the
|
|
program but not usable by the program because of the arrangement of
|
|
memory already in use (see also, glossary.fragmentation ).
|
|
|
|
_`.def.header`: A block of descriptive information describing and
|
|
immediately preceding another block of memory (see also,
|
|
glossary.in-band.header ).
|
|
|
|
_`.def.in-band`: From "in band signalling", when descriptive
|
|
information about a data structure is stored in the data structure
|
|
itself (see also, glossary.in-band.header ).
|
|
|
|
_`.def.out-of-band`: When descriptive information about a data
|
|
structure is stored separately from the structure itself (see also,
|
|
glossary.out-of-band.header ).
|
|
|
|
_`.def.refcount`: A refcount is a count of the number of users of an
|
|
object (see also, glossary.reference.count ).
|
|
|
|
_`.def.segment`: A segment is a contiguous extent of memory. In this
|
|
document, segment is used to mean a contiguous extent of memory
|
|
managed by the MPS arena (design.mps.arena_) and subdivided by the
|
|
pool to provide blocks (see `.def.block`_) to its clients.
|
|
|
|
.. _design.mps.arena: arena
|
|
|
|
_`.def.splay-tree`: A splay tree is a self-adjusting binary tree
|
|
(paper.st85(0), paper.sleator96(0)).
|
|
|
|
_`.def.splinter`: A splinter is a fragment of memory that is too small
|
|
to be useful (see also, glossary.splinter )
|
|
|
|
_`.def.subblock`: A subblock is a contiguous extent of memory. In this
|
|
document, subblock is used to mean a contiguous extent of memory
|
|
manage by the client for its own use, typically a subset of a block
|
|
(compare with `.def.block`_).
|
|
|
|
|
|
Abbreviations
|
|
-------------
|
|
|
|
_`.abbr.abq`: ABQ = Available Block Queue
|
|
|
|
_`.abbr.ap`: AP = Allocation Point
|
|
|
|
_`.abbr.cbs`: CBS = Coalescing Block Structure
|
|
|
|
_`.abbr.mps`: MPS = Memory Pool System
|
|
|
|
_`.abbr.mv`: MV = Manual-Variable
|
|
|
|
_`.abbr.ps`: PS = PostScript
|
|
|
|
|
|
|
|
Overview
|
|
--------
|
|
|
|
_`.overview`: MVT is intended to satisfy the requirements of the
|
|
clients that need manual-variable pools, improving on the performance
|
|
of the existing manual-variable pool implementations, and reducing the
|
|
duplication of code that currently exists. The expected clients of MVT
|
|
are: Dylan (currently for its misc pool), EP (particularly the dl
|
|
pool, but all pools other than the PS object pool), and Product
|
|
(initially the malloc/new pool, but also other manual pool classes).
|
|
|
|
|
|
Requirements
|
|
------------
|
|
|
|
_`.req.cat`: Requirements are categorized per guide.req(2).
|
|
|
|
_`.req.risk`: req.epcore(16) is known to be obsolete, but the revised
|
|
document has not yet been accepted.
|
|
|
|
|
|
Critical requirements
|
|
.....................
|
|
|
|
_`.req.fun.man-var`: The pool class must support manual allocation and
|
|
freeing of variable-sized blocks (source: req.dylan.fun.misc.alloc,
|
|
req.epcore.fun.{dl,gen,tmp,stat,cache,trap}.{alloc,free},
|
|
req.product.fun.{malloc,new,man.man}).
|
|
|
|
_`.non-req.fun.gc`: There is not a requirement that the pool class
|
|
support formatted objects, scanning, or collection objects; but it
|
|
should not be arbitrarily precluded.
|
|
|
|
_`.req.fun.align`: The pool class must support aligned allocations to
|
|
client-specified alignments. An individual instance need only support
|
|
a single alignment; multiple instances may be used to support more
|
|
than one alignment (source: req.epcore.attr.align).
|
|
|
|
_`.req.fun.reallocate`: The pool class must support resizing of
|
|
allocated blocks (source req.epcore.fun.dl.promise.free,
|
|
req.product.dc.env.{ansi-c,cpp}).
|
|
|
|
_`.non-req.fun.reallocate.in-place`: There is not a requirement blocks
|
|
must be resized in place (where possible); but it seems like a good
|
|
idea.
|
|
|
|
_`.req.fun.thread`: Each instance of the pool class must support
|
|
multiple threads of allocation (source req.epcore.fun.dl.multi,
|
|
req.product.dc.env.{ansi-c,cpp}).
|
|
|
|
_`.req.attr.performance`: The pool class must meet or exceed
|
|
performance of "competitive" allocators (source:
|
|
rec.epcore.attr.{run-time,tp}, req.product.attr.{mkt.eval, perform}).
|
|
[Dylan does not seem to have any requirement that storage be allocated
|
|
with a particular response time or throughput, just so long as we
|
|
don't block for too long. Clearly there is a missing requirement.]
|
|
|
|
_`.req.attr.performance.time`: By inference, the time overhead must be
|
|
competetive.
|
|
|
|
_`.req.attr.performance.space`: By inference, the space overhead must
|
|
be competetive.
|
|
|
|
_`.req.attr.reliability`: The pool class must have "rock-solid
|
|
reliability" (source: req.dylan.attr.rel.mtbf, req.epcore.attr.rel,
|
|
req.product.attr.rel).
|
|
|
|
_`.req.fun.range`: The pool class must be able to manage blocks
|
|
ranging in size from 1 byte to all of addressable memory
|
|
(req.epcore.attr.{dl,gen,tmp,stat,cache,trap}.obj.{min,max}. The range
|
|
requirement may be satisfied by multiple instances each managing a
|
|
particular client-specified subrange of sizes. [Dylan has requirements
|
|
req.dylan.attr.{capacity,obj.max}, but no requirement that such
|
|
objects reside in a manual pool.]
|
|
|
|
_`.req.fun.debug`: The pool class must support debugging erroneous
|
|
usage by client programs (source: req.epcore.fun.{dc.variety,
|
|
debug.support}, req.product.attr.{mkt.eval,perform}). Debugging is
|
|
permitted to incur additional overhead.
|
|
|
|
_`.req.fun.debug.boundaries`: The pool class must support checking for
|
|
accesses outside the boundaries of live objects.
|
|
|
|
_`.req.fun.debug.log`: The pool class must support logging of all
|
|
allocations and deallocations.
|
|
|
|
_`.req.fun.debug.enumerate`: The pool class must support examining all
|
|
allocated objects.
|
|
|
|
_`.req.fun.debug.free`: The pool class must support detecting
|
|
incorrect, overlapping, and double frees.
|
|
|
|
_`.req.fun.tolerant`: The pool class must support tolerance of
|
|
erroneous usage (source req.product.attr.use.level.1).
|
|
|
|
|
|
Essential requirements
|
|
......................
|
|
|
|
_`.req.fun.profile`: The pool class should support memory usage
|
|
profiling (source: req.product.attr.{mkt.eval, perform}).
|
|
|
|
_`.req.attr.flex`: The pool class should be flexible so that it can be
|
|
tuned to specific allocation and freeing patterns (source:
|
|
req.product.attr.flex,req.epcore.attr.{dl,cache,trap}.typ). The
|
|
flexibility requirement may be satisfied by multiple instances each
|
|
optimizing a specific pattern.
|
|
|
|
_`.req.attr.adapt`: The pool class should be adaptive so that it can
|
|
accommodate changing allocation and freeing patterns (source:
|
|
req.epcore.fun.{tmp,stat}.policy,
|
|
req.product.attr.{mkt.eval,perform}).
|
|
|
|
|
|
Nice requirements
|
|
.................
|
|
|
|
_`.req.fun.suballocate`: The pool class may support freeing of any
|
|
aligned, contiguous subset of an allocated block (source
|
|
req.epcore.fun.dl.free.any, req.product.attr.{mkt.eval,perform}).
|
|
|
|
|
|
Architecture
|
|
------------
|
|
|
|
_`.arch.overview`: The pool has several layers: client allocation is
|
|
by Allocation Points (APs).
|
|
|
|
_`.arch.overview.ap`: APs acquire storage from the pool
|
|
available-block queue (ABQ).
|
|
|
|
_`.arch.overview.abq`: The ABQ holds blocks of a minimum configurable
|
|
size: "reuse size".
|
|
|
|
_`.arch.overview.storage`: The ABQ acquires storage from the arena,
|
|
and from its internal free block managers.
|
|
|
|
_`.arch.overview.storage.contiguous`: The arena storage is requested
|
|
to be contiguous to maximize opportunities for coalescing (Loci will
|
|
be used when available).
|
|
|
|
_`.arch.overview.cbs`: The free block managers hold blocks freed by
|
|
the client until, through coalescing, they have reached the reuse
|
|
size, at which point they are made available on the ABQ.
|
|
|
|
_`.arch.ap`: The pool will use allocation points as the allocation
|
|
interface to the client.
|
|
|
|
_`.arch.ap.two-phase`: Allocation points will request blocks from the
|
|
pool and suballocate those blocks (using the existing AP, compare and
|
|
increment, 2-phase mechanism) to satisfy client requests.
|
|
|
|
_`.arch.ap.fill`: The pool will have a configurable "fill size" that
|
|
will be the preferred size block used to fill the allocation point.
|
|
|
|
_`.arch.ap.fill.size`: The fill size should be chosen to amortize the
|
|
cost of refill over a number of typical reserve/commit operations, but
|
|
not so large as to exceed the typical object population of the pool.
|
|
|
|
_`.arch.ap.no-fit`: When an allocation does not fit in the remaining
|
|
space of the allocation point, there may be a remaining fragment.
|
|
|
|
_`.arch.ap.no-fit.sawdust`: If the fragment is below a configurable
|
|
threshold (minimum size), it will be left unused (but returned to the
|
|
free block managers so it will be reclaimed when adjacent objects are
|
|
freed).
|
|
|
|
_`.arch.ap.no-fit.splinter`: otherwise, the remaining fragment will be
|
|
(effectively) returned to the head of the available-block queue, so
|
|
that it will be used as soon as possible (that is, by objects of similar
|
|
birthdate).
|
|
|
|
_`.arch.ap.no-fit.oversize`: If the requested allocation exceeds the
|
|
fill size it is treated exceptionally (this may indicate the client
|
|
has either misconfigured or misused the pool and should either change
|
|
the pool configuration or create a separate pool for these exceptional
|
|
objects for best performance).
|
|
|
|
_`.arch.ap.no-fit.oversize.policy`: Oversize blocks are assumed to
|
|
have exceptional lifetimes, hence are allocated to one side and do not
|
|
participate in the normal storage recycling of the pool.
|
|
|
|
_`.arch.ap.refill.overhead`: If reuse size is small, or becomes small
|
|
due to `.arch.adapt`_, all allocations will effectively be treated
|
|
exceptionally (the AP will trip and a oldest-fit block will be chosen
|
|
on each allocation). This mode will be within a constant factor in
|
|
overhead of an unbuffered pool.
|
|
|
|
_`.arch.abq`: The available block queue holds blocks that have
|
|
coalesced sufficiently to reach reuse size.
|
|
|
|
_`.arch.abq.reuse.size`: A multiple of the quantum of virtual memory
|
|
is used as the reuse size (`.anal.policy.size`_).
|
|
|
|
_`.arch.abq.fifo`: It is a FIFO queue (recently coalesced blocks go to
|
|
the tail of the queue, blocks are taken from the head of the queue for
|
|
reuse).
|
|
|
|
_`.arch.abq.delay-reuse`: By thus delaying reuse, coalescing
|
|
opportunities are greater.
|
|
|
|
_`.arch.abq.high-water`: It has a configurable high water mark, which
|
|
when reached will cause blocks at the head of the queue to be returned
|
|
to the arena, rather than reused.
|
|
|
|
_`.arch.abq.return`: When the MPS supports it, the pool will be able
|
|
to return free blocks from the ABQ to the arena on demand.
|
|
|
|
_`.arch.abq.return.segment`: `.arch.abq.return`_ can be guaranteed to
|
|
be able to return a segment by setting reuse size to twice the size of
|
|
the segments the pool requests from the arena.
|
|
|
|
_`.arch.cbs`: The coalescing block structure holds blocks that have
|
|
been freed by the client.
|
|
|
|
_`.arch.cbs.optimize`: The data structure is optimized for coalescing.
|
|
|
|
_`.arch.cbs.abq`: When a block reaches reuse size, it is added to the
|
|
ABQ.
|
|
|
|
_`.arch.cbs.data-structure`: The data structures are organized so that
|
|
a block can be both in the free block managers and on the ABQ
|
|
simultaneously to permit additional coalescing, up until the time the
|
|
block is removed from the ABQ and assigned to an AP.
|
|
|
|
_`.arch.fragmentation.internal`: Internal fragmentation results from
|
|
The pool will request large segments from the arena to minimize the
|
|
internal fragmentation due to objects not crossing segment boundaries.
|
|
|
|
_`.arch.modular`: The architecture will be modular, to allow building
|
|
variations on the pool by assembling different parts.
|
|
|
|
_`.arch.modular.example`: For example, it should be possible to build
|
|
pools with any of the freelist mechanisms, with in-band or out-of-band
|
|
storage (where applicable), that do or do not support derived object
|
|
descriptions, etc.
|
|
|
|
_`.arch.modular.initial`: The initial architecture will use
|
|
`.sol.mech.free-list`_ for the free block managers,
|
|
`.sol.mech.storage.out-of-band`_, `.sol.mech.desc.derived`_, and
|
|
`.sol.mech.allocate.buffer`_.
|
|
|
|
_`.arch.segregate`: The architecture will support segregated
|
|
allocation through the use of multiple allocation points. The client
|
|
will choose the appropriate allocation point either at run time, or
|
|
when possible, at compile time.
|
|
|
|
_`.arch.segregate.initial`: The initial architecture will segregate
|
|
allocations into two classes: large and small. This will be
|
|
implemented by creating two pools with different parameters.
|
|
|
|
_`.arch.segregate.initial.choice`: The initial architecture will
|
|
provide glue code to choose which pool to allocate from at run time.
|
|
If possible this glue code will be written in a way that a good
|
|
compiler can optimize the selection of pool at compile time.
|
|
Eventually this glue code should be subsumed by the client or
|
|
generated automatically by a tool.
|
|
|
|
_`.arch.debug`: Debugging features such as tags, fenceposts, types,
|
|
creators will be implemented in a layer above the pool and APs. A
|
|
generic pool debugging interface will be developed to support
|
|
debugging in this outer layer.
|
|
|
|
_`.arch.debug.initial`: The initial architecture will have counters
|
|
for objects/bytes allocated/freed and support for detecting
|
|
overlapping frees.
|
|
|
|
_`.arch.dependency.loci`: The architecture depends on the arena being
|
|
able to efficiently provide segments of varying sizes without
|
|
excessive fragmentation. The locus mechanism should satisfy this
|
|
dependency. (See `.anal.strategy.risk`_.)
|
|
|
|
_`.arch.dependency.mfs`: The architecture internal data structures
|
|
depend on efficient manual management of small, fixed-sized objects (2
|
|
different sizes). The MFS pool should satisfy this dependency.
|
|
|
|
_`.arch.contingency`: Since the strategy we propose is new, it may not
|
|
work.
|
|
|
|
_`.arch.contingency.pathological`: In particular, pathological
|
|
allocation patterns could result in fragmentation such that no blocks
|
|
recycle from the free bock managers to the ABQ.
|
|
|
|
_`.arch.contingency.fallback`: As a fallback, there will be a pool
|
|
creation parameter for a high water mark for the free space.
|
|
|
|
_`.arch.contingency.fragmentation-limit`: When the free space as a
|
|
percentage of all the memory managed by the pool (a measure of
|
|
fragmentation) reaches that high water mark, the free block managers
|
|
will be searched oldest-fit before requesting additional segments from
|
|
the arena.
|
|
|
|
_`.arch.contingency.alternative`: We also plan to implement
|
|
`.sol.mech.free-list.cartesian-tree`_ as an alternative free block
|
|
manager, which would permit more efficient searching of the free
|
|
blocks.
|
|
|
|
_`.arch.parameters`: The architecture supports several parameters so
|
|
that multiple pools may be instantiated and tuned to support different
|
|
object cohorts. The important parameters are: reuse size, minimum
|
|
size, fill size, ABQ high water mark, free block fragmentation limit
|
|
(see `.arch.contingency.fragmentation-limit`_).
|
|
|
|
_`.arch.parameters.client-visible`: The client-visible parameters of
|
|
the pool are the minimum object size, the mean object size, the
|
|
maximum object size, the reserve depth and fragmentation limit. The
|
|
minimum object size determines when a splinter is kept on the head of
|
|
the ABQ (`.arch.ap.no-fit.splinter`_). The maximum object size
|
|
determines the fill size (`.arch.ap.fill.size`_) and hence when a
|
|
block is allocated exceptionally (`.arch.ap.no-fit.oversize`_). The
|
|
mean object size is the most likely object size. The reserve depth is
|
|
a measure of the hysteresis of the object population. The mean object
|
|
size, reserve depth and, maximum object size are used to determine the
|
|
size of the ABQ (`.arch.abq.high-water`_). The fragmentation limit is
|
|
used to determine when contingency mode is used to satisfy an
|
|
allocation request (`.arch.contingency`_).
|
|
|
|
_`.arch.adapt`: We believe that an important adaptation to explore is
|
|
tying the reuse size inversely to the fragmentation (as measured in
|
|
`.arch.contingency.fragmentation-limit`_).
|
|
|
|
_`.arch.adapt.reuse`: By setting reuse size low when fragmentation is
|
|
high, smaller blocks will be available for reuse, so fragmentation
|
|
should diminish.
|
|
|
|
_`.arch.adapt.overhead`: This will result in higher overhead as the AP
|
|
will need to be refilled more often, so reuse size should be raised
|
|
again as fragmentation diminishes.
|
|
|
|
_`.arch.adapt.oldest-fit`: In the limit, if reuse size goes to zero,
|
|
the pool will implement a "oldest-fit" policy: the oldest free block
|
|
of sufficient size will be used for each allocation.
|
|
|
|
_`.arch.adapt.risk`: This adaptation is an experimental policy and
|
|
should not be delivered to clients until thoroughly tested.
|
|
|
|
|
|
Analysis
|
|
--------
|
|
|
|
_`.anal.discard`: We have discarded many traditional solutions based
|
|
on experience and analysis in paper.wil95(1). In particular, managing
|
|
the free list as a linear list arranged by address or size and basing
|
|
policy on searching such a linear list in a particular direction, from
|
|
a particular starting point, using fit and/or immediacy as criteria.
|
|
We believe that none of these solutions is derived from considering
|
|
the root of the problem to be solved (as described in
|
|
`.anal.strategy`_), although their behavior as analyzed by Wilson
|
|
gives several insights.
|
|
|
|
_`.anal.strategy`: For any program to run in the minimum required
|
|
memory (with minimal overhead -- we discard solutions such as
|
|
compression for now), fragmentation must be eliminated. To eliminate
|
|
fragmentation, simply place blocks in memory so that they die "in
|
|
order" and can be immediately coalesced. This ideal is not achievable,
|
|
but we believe we can find object attributes that correlate with
|
|
deathtime and exploit them to approximate the ideal. Initially we
|
|
believe birth time and type (as approximated by size) will be useful
|
|
attributes to explore.
|
|
|
|
_`.anal.strategy.perform`: To meet `.req.attr.performance`_, the
|
|
implementation of `.sol.strategy`_ must be competitive in both time
|
|
and space.
|
|
|
|
_`.anal.strategy.risk`: The current MPS segment substrate can cause
|
|
internal fragmentation which an individual pool can do nothing about.
|
|
We expect that request.epcore.170193.sugg.loci_ will be implemented to
|
|
remove this risk.
|
|
|
|
.. _`request.epcore.170193.sugg.loci`: https://info.ravenbrook.com/project/mps/import/2001-11-05/mmprevol/request/epcore/170193/
|
|
|
|
_`.anal.policy`: Deferred coalescing, when taken to the extreme will
|
|
not minimize the memory consumption of a program, as no memory would
|
|
ever be reused. Eager reuse appears to lead to more fragmentation,
|
|
whereas delayed reuse appears to reduce fragmentation
|
|
(paper.wil95(1)). The systems studied by Wilson did not directly
|
|
address deferring reuse. Our proposed policy is to reuse blocks when
|
|
they reach a (configurable) size. We believe that this policy along
|
|
with the policy of segregating allocations by death time, will greatly
|
|
reduce fragmentation.
|
|
|
|
_`.anal.policy.risk`: This policy could lead to pathological behavior
|
|
if allocations cannot be successfully segregated.
|
|
|
|
_`.anal.policy.allocate.segregate`: This policy has some similarities
|
|
to CustomAlloc (paper.grun92(1)). CustomAlloc segregates objects by
|
|
size classes, and then within those classes chooses a different
|
|
allocator depending on whether that size class has a stable or
|
|
unstable population. Classes with stable population recycle storage
|
|
within the class, whereas classes with unstable populations return
|
|
their storage to the general allocation pool for possible reuse by
|
|
another class. CustomAlloc, however, requires profiling the
|
|
application and tuning the allocator according to those profiles.
|
|
Although we intend to support such tuning, we do not want to require
|
|
it.
|
|
|
|
_`.anal.policy.reallocate`: For reallocation, `.req.fun.suballocate`_
|
|
can be used to free the remainder if a block is made smaller. Doing so
|
|
will cause the freed block to obey `.sol.policy.allocate`_ (that is,
|
|
the freed block will not be treated specially, it will be subject to
|
|
the normal policy on reuse). Copying can be used if a block is made
|
|
larger. paper.vo96(0) reports success in over-allocating a block the
|
|
first time it is resized larger, presumably because blocks that are
|
|
resized once tend to be resized again and over-allocating may avoid a
|
|
subsequent copy. If each object that will be reallocated can be given
|
|
its own allocation point until its final reallocation, the allocation
|
|
point can be used to hold released or spare storage.
|
|
|
|
_`.anal.policy.size`: We believe that this will take advantage of the
|
|
underlying virtual memory system's ability to compact the physical
|
|
memory footprint of the program by discarding free fragments that
|
|
align with the virtual memory quantum. (In a VM system one can
|
|
approximate compaction by sparse mapping. If every other page of a
|
|
segment is unused, the unused pages can be unmapped, freeing up
|
|
physical memory that can be mapped to a new contiguous vm range.)
|
|
|
|
_`.anal.mech.free-list`: The literature (paper.grun92(1),
|
|
paper.vo96(0)) indicate that `.sol.mech.free-list.cartesian-tree`_
|
|
provides a space-efficient implementation at some cost in speed.
|
|
`.sol.mech.free-list.splay-tree`_ is faster but less space-efficient.
|
|
`.sol.mech.free-list.bitmap`_ is unstudied. Many of the faster
|
|
allocators maintain caches of free blocks by size to speed allocation
|
|
of "popular" sizes. We intend to initially explore not doing so, as we
|
|
believe that policy ultimately leads to fragmentation by mixing
|
|
objects of varying death times. Instead we intend to use a free list
|
|
mechanism to support fast coalescing, deferring reuse of blocks until
|
|
a minimum size has been reached.
|
|
|
|
_`.anal.mech.allocate.optimize-small`: Wilson (paper.wil95(1)) notes
|
|
that small blocks typically have short lifetimes and that overall
|
|
performance is improved if you optimize the management of small
|
|
blocks, e.g., `.sol.mech.allocate.lookup-table`_ for all small blocks.
|
|
We believe that `.sol.mech.allocate.buffer`_ does exactly that.
|
|
|
|
_`.anal.mech.allocate.optimize-new`: Wilson (paper.wil95(1)) reports
|
|
some benefit from "preserving wilderness", that is, when a block of
|
|
memory must be requested from the system to satisfy an allocation,
|
|
only the minimum amount of that block is used, the remainder is
|
|
preserved (effectively by putting it at the tail of the free list).
|
|
This mechanism may or may not implement `.sol.policy.allocate`_. We
|
|
believe a better mechanism is to choose to preserve or not, based on
|
|
`.sol.policy.allocate`_.
|
|
|
|
|
|
Ideas
|
|
-----
|
|
|
|
_`.sol`: Many solution ideas for manual management of variable-sized
|
|
memory blocks are enumerated by paper.wil95(1). Here we list the most
|
|
promising, and some of our own.
|
|
|
|
|
|
Strategy
|
|
........
|
|
|
|
_`.sol.strategy`: To run a program in the minimal required memory,
|
|
with minimal overhead, utilize memory efficiently. Memory becomes
|
|
unusable when fragmented. Strategy is to minimize fragmentation. So
|
|
place blocks where they won't cause fragmentation later.
|
|
|
|
_`.sol.strategy.death`: Objects that will die together (in time)
|
|
should be allocated together (in space); thus they will coalesce,
|
|
reducing fragmentation.
|
|
|
|
_`.sol.strategy.death.birth`: Assume objects allocated near each other
|
|
in time will have similar deathtimes (paper.beck82(0)).
|
|
|
|
_`.sol.strategy.death.type`: Assume objects of different type may have
|
|
different deathtimes, even if born together.
|
|
|
|
_`.sol.strategy.death.predict`: Find and use program features to predict deathtimes.
|
|
|
|
_`.sol.strategy.reallocate`: Reallocation implies rebirth, or at least
|
|
a change in lifetime
|
|
|
|
_`.sol.strategy.debug`: As much of the debugging functionality as
|
|
possible should be implemented as a generally available MPS utility;
|
|
the pool will provide support for debugging that would be expensive or
|
|
impossible to allocate outside the pool.
|
|
|
|
|
|
Policy
|
|
......
|
|
|
|
Policy is an implementable decision procedure, hopefully approximating
|
|
the strategy.
|
|
|
|
_`.sol.policy.reuse`: Defer reusing blocks, to encourage coalescing.
|
|
|
|
_`.sol.policy.split`: When a block is split to satisfy an allocation,
|
|
use the remainder as soon as possible.
|
|
|
|
_`.sol.policy.size`: Prevent `.sol.policy.reuse`_ from consuming all
|
|
of memory by choosing a (coalesced) block for reuse when it reaches a
|
|
minimum size.
|
|
|
|
_`.sol.policy.size.fixed`: Use the quantum of virtual memory (e.g.,
|
|
one page) as minimum size.
|
|
|
|
_`.sol.policy.size.tune`: Allow tuning minimum size.
|
|
|
|
_`.sol.policy.size.adapt`: Adaptively change minimum size.
|
|
|
|
_`.sol.policy.allocate`: Allocate objects with similar birthdate and
|
|
lifetime together.
|
|
|
|
_`.sol.policy.allocate.segregate`: Segregate allocations by type.
|
|
|
|
_`.sol.policy.allocate.segregate.size`: Use size as a substitute for type.
|
|
|
|
_`.sol.policy.allocate.segregate.tune`: Permit tuning of segregation.
|
|
|
|
_`.sol.policy.allocate.segregate.adapt`: Adaptively segregate allocations.
|
|
|
|
_`.sol.policy.reallocate`: Implement reallocation in a central
|
|
mechanism outside of the pool, create a generic pool interface in
|
|
support of same.
|
|
|
|
_`.sol.policy.debug`: Implement a pool debugging interface.
|
|
|
|
_`.sol.policy.debug.counters`: Implement debugging counters in the
|
|
pool that are queried with a generic interface.
|
|
|
|
_`.sol.policy.debug.verify`: Implement debugging error returns on
|
|
overlapping frees.
|
|
|
|
|
|
Mechanism
|
|
.........
|
|
|
|
Mechanisms are algorithms or data structures used to implement policy.
|
|
|
|
_`.sol.mech.free-list`: Mechanisms that can be used to describe the
|
|
free list.
|
|
|
|
_`.sol.mech.free-list.cartesian-tree`: Using address and size as keys
|
|
supports fast coalescing of adjacent blocks and fast searching for
|
|
optimal-sized blocks. Unfortunately, because the shape of the tree is
|
|
constrained by the second key, it can become unbalanced. This data
|
|
structure is used in the SunOS 4.1 malloc (paper.grun92(1)).
|
|
|
|
_`.sol.mech.free-list.splay-tree`: The amortized cost of a splay tree
|
|
is competitive with balanced binary trees in the worst case, but can
|
|
be significantly better for regular patterns of access because
|
|
recently-accessed keys are moved to the root of the tree and hence can
|
|
be re-accessed quickly. This data structure is used in the System Vr4
|
|
malloc (paper.vo96(0)). (For a complete analysis of the splay tree
|
|
algorithm time bounds see paper.st85(0).)
|
|
|
|
_`.sol.mech.free-list.bitmap`: Using address as an index and
|
|
fix-sized blocks, the booleans can represent whether a block is free
|
|
or not. Adjacent blocks can be used to construct larger blocks.
|
|
Efficient algorithms for searching for runs in a vector are known.
|
|
This data structure is used in many file system disk block managers.
|
|
|
|
_`.sol.mech.free-list.refcount`: A count of the number of allocated
|
|
but not freed subblocks of a block can be used to determine when a
|
|
block is available for reuse. This is an extremely compact data
|
|
structure, but does not support subblock reuse.
|
|
|
|
_`.sol.mech.free-list.hybrid`: Bitmaps appear suited particularly to
|
|
managing small, contiguous blocks. The tree structures appear suited
|
|
particularly to managing varying-sized, discontiguous blocks. A
|
|
refcount can be very efficient if objects can be placed accurately
|
|
according to death time. A hybrid mechanism may offer better
|
|
performance for a wider range of situations.
|
|
|
|
_`.sol.mech.free-list.linked`: An address-ordered singly linked free
|
|
list using space in each free block to store the block's size and a
|
|
pointer to the next block.
|
|
|
|
_`.sol.mech.storage`: Methods that can be used to store the free list description.
|
|
|
|
_`.sol.mech.storage.in-band`: The tree data structures (and
|
|
`.sol.mech.free-list.linked`_) are amenable to being stored in the
|
|
free blocks themselves, minimizing the space overhead of management.
|
|
To do so imposes a minimum size on free blocks and reduces the
|
|
locality of the data structure.
|
|
|
|
_`.sol.mech.storage.out-of-band`: The bit-map data structure must be
|
|
stored separately.
|
|
|
|
_`.sol.mech.desc`: for an allocated block to be freed, its base and
|
|
bound must be known
|
|
|
|
_`.sol.mech.desc.derived`: Most clients can supply the base of the
|
|
block. Some clients can supply the bound.
|
|
|
|
_`.sol.mech.desc.in-band`: When the bound cannot be supplied, it can
|
|
be stored as an in-band "header". If neither the base nor bound can be
|
|
supplied (e.g., the client may only have an interior pointer to the
|
|
block), a header and footer may be required.
|
|
|
|
_`.sol.mech.desc.out-of-band`: In un-tagged architectures, it may be
|
|
necessary to store the header and footer out-of-band to distinguish
|
|
them from client data. Out-of-band storage can improve locality and
|
|
reliability. Any of the free-list structures can also be used to
|
|
describe allocated blocks out-of-band.
|
|
|
|
_`.sol.mech.desc.crossing-map`: An alternative for untagged
|
|
architectures is to store a "crossing map" which records an encoding
|
|
of the start of objects and then store the descriptive information
|
|
in-band.
|
|
|
|
_`.sol.mech.allocate`: Mechanisms that can be used to allocate blocks
|
|
(these typically sit on top of a more general free-list manager).
|
|
|
|
_`.sol.mech.allocate.lookup-table`: Use a table of popular sizes to
|
|
cache free blocks of those sizes.
|
|
|
|
_`.sol.mech.allocate.buffer`: Allocate from contiguous blocks using
|
|
compare and increment.
|
|
|
|
_`.sol.mech.allocate.optimize-small`: Use a combination of techniques
|
|
to ensure the time spent managing a block is small relative to the
|
|
block's lifetime; assume small blocks typically have short lifetimes.
|
|
|
|
_`.sol.mech.allocate.optimize-new`: When "virgin" memory is acquired
|
|
from the operating system to satisfy a request, try to preserve it
|
|
(that is, use only what is necessary).
|
|
|
|
_`.sol.mech.allocate.segregate.size`: Use size as a substitute for
|
|
type.
|
|
|
|
_`.sol.mech.reallocate`: use `.req.fun.suballocate`_ to return unused
|
|
memory when a block shrinks, but differentiate this from an erroneous
|
|
overlapping free by using separate interfaces.
|
|
|
|
|
|
|
|
Implementation
|
|
--------------
|
|
|
|
The implementation consists of the following separable modules:
|
|
|
|
|
|
Splay Tree
|
|
..........
|
|
|
|
_`.impl.c.splay`: The implementation of
|
|
`.sol.mech.free-list.splay-tree`_. See design.mps.splay_.
|
|
|
|
.. _design.mps.splay: splay.txt
|
|
|
|
|
|
Coalescing Block Structure
|
|
..........................
|
|
|
|
_`.impl.c.cbs`: The initial implementation will use
|
|
`.sol.mech.free-list.splay-tree`_ and
|
|
`.sol.mech.storage.out-of-band`_. For locality, this storage should be
|
|
managed as a linked free list of splay nodes suballocated from blocks
|
|
acquired from a pool shared by all CBS's. Must support creation and
|
|
destruction of an empty tree. Must support search, insert and delete
|
|
by key of type Addr. Must support finding left and right neighbors of
|
|
a failed search for a key. Must support iterating over the elements of
|
|
the tree with reasonable efficiency. Must support storing and
|
|
retrieving a value of type Size associated with the key. Standard
|
|
checking and description should be provided. See design.mps.cbs_.
|
|
|
|
.. _design.mps.cbs: cbs.txt
|
|
|
|
|
|
Fail-over to address-ordered free list
|
|
......................................
|
|
|
|
_`.impl.c.freelist`: Because the CBS uses out-of-band storage, it may
|
|
be unable to handle insert (design.mps.cbs.function.cbs.insert.fail_)
|
|
and delete (design.mps.cbs.function.cbs.delete.fail_) operations. When
|
|
this happen MVT fails over to an address-ordered singly linked free
|
|
list. This uses in-band storage and so cannot run out of memory, but
|
|
it has much worse performance than the CBS. Therefore MVT eagerly
|
|
attempts to flush blocks from the free list back to the CBS. See
|
|
design.mps.freelist_ for the design and implementation of the free
|
|
list.
|
|
|
|
.. _design.mps.cbs.function.cbs.delete.fail: cbs#function.cbs.delete.fail
|
|
.. _design.mps.cbs.function.cbs.insert.fail: cbs#function.cbs.insert.fail
|
|
.. _design.mps.freelist: freelist
|
|
|
|
|
|
Available Block Queue
|
|
.....................
|
|
|
|
_`.impl.c.abq`: The initial implementation will be a queue of fixed
|
|
size (determined at pool creation time from the high water mark). Must
|
|
support creation and destruction of an empty queue. Must support
|
|
insertion at the head or tail of the queue (failing if full), peeking
|
|
at the head of the queue, and removal of the head (failing if empty)
|
|
or any element of the queue (found by a search). Standard checking and
|
|
description should be provided. See design.mps.abq_.
|
|
|
|
.. _design.mps.abq: abq.txt
|
|
|
|
|
|
Pool implementation
|
|
...................
|
|
|
|
_`.impl.c`: The initial implementation will use the above modules to
|
|
implement a buffered pool. Must support creation and destruction of
|
|
the pool. Creation takes parameters: minimum size, mean size, maximum
|
|
size, reserve depth and fragmentation limit. Minimum, mean, and
|
|
maximum size are used to calculate the internal fill and reuse sizes.
|
|
Reserve depth and mean size are used to calculate the ABQ high water
|
|
mark. Fragmentation limit is used to set the contingency mode. Must
|
|
support buffer initialization, filling and emptying. Must support
|
|
freeing. Standard checking and description should be provided.
|
|
[Eventually, it should support scanning, so it can be used with
|
|
collected pools, but no manual pool currently does.]
|
|
|
|
_`.impl.c.future`: The implementation should not preclude "buffered
|
|
free" (`mail.ptw.1997-12-05.19-07`_) being added in the future.
|
|
|
|
.. _mail.ptw.1997-12-05.19-07: https://info.ravenbrook.com/project/mps/mail/1997/12/05/19-07/0.txt
|
|
|
|
_`.impl.c.parameters`: The pool parameters are calculated as follows
|
|
from the input parameters: minimum, mean, and maximum size are taked
|
|
directly from the parameters.
|
|
|
|
_`.impl.c.parameter.fill-size`: The fill size is set to the maximum
|
|
size times the reciprocal of the fragmentation limit, rounded up to
|
|
the arena grain size.
|
|
|
|
_`.imple.c.parameter.reuse-size`: The reuse size is set to twice the
|
|
fill size (see `.arch.abq.return.segment`_,
|
|
`.impl.c.free.merge.segment`_).
|
|
|
|
_`.impl.c.parameter.abq-limit`: The ABQ high-water limit is set to the
|
|
reserve depth times the mean size (that is, the queue should hold as
|
|
many reuse blocks as would take to cover the population hysteresis if
|
|
the population consisted solely of mean-sized blocks, see
|
|
`.arch.abq.high-water`_).
|
|
|
|
_`.impl.c.parameter.avail-limit`: The free block high-water limit is
|
|
implemented by comparing the available free space to an "available
|
|
limit". The available limit is updated each time a segment is
|
|
allocated from or returned to the arena by setting it to the total
|
|
size of the pool times the fragmentation limit divide vy 100 (see
|
|
`.arch.contingency.fallback`_).
|
|
|
|
_`.impl.c.ap.fill`: An AP fill request will be handled as follows:
|
|
|
|
- If the request is larger than fill size, attempt to request a
|
|
segment from the arena sufficient to satisfy the request.
|
|
|
|
- Use any previously returned splinter (from `.impl.c.ap.empty`_), if
|
|
large enough.
|
|
|
|
- Attempt to retrieve a free block from the head of the ABQ (removing
|
|
it from ABQ and the free block managers if found).
|
|
|
|
- If above fragmentation limit, attempt to find a block in the free
|
|
block managers, using oldest-fit search.
|
|
|
|
- Attempt to request a segment of fill size from the arena.
|
|
|
|
- Attempt to find a block in the free block managers, using oldest-fit
|
|
search.
|
|
|
|
- Otherwise, fail.
|
|
|
|
_`.impl.c.ap.empty`: An AP empty request will be handled as follows:
|
|
|
|
- If remaining free is less than min size, return it to the free block
|
|
managers.
|
|
|
|
- If the remaining free is larger than any previous splinter, return
|
|
that splinter to the free block managers and save this one for use
|
|
by a subsequent fill.
|
|
|
|
- Otherwise return the remaining block to the free block managers.
|
|
|
|
_`.impl.c.free`: When blocks are returned to the free block managers
|
|
they may be merged with adjacent blocks. If a merge occurs with a
|
|
block on the ABQ, the ABQ must be adjusted to reflect the merge.
|
|
|
|
_`.impl.c.free.exception`: Exceptional blocks are returned directly to
|
|
the arena.
|
|
|
|
_`.impl.c.free.merge`: If a merge occurs and the merged block is
|
|
larger than reuse size:
|
|
|
|
- If the ABQ is full, remove the block at the head of the ABQ from the
|
|
ABQ and the free block managers and return it to the arena(*).
|
|
|
|
- Insert the newly merged block at the tail of the ABQ, leaving it in
|
|
the free block managers for further merging.
|
|
|
|
_`.impl.c.free.merge.segment`: (*) Merged blocks may not align with
|
|
arena segments. If necessary, return the interior segments of a block
|
|
to the arena and return the splinters to the free block managers.
|
|
|
|
_`.impl.c.free.merge.segment.reuse`: If the reuse size (the size at
|
|
which blocks recycle from the free block managers to the ABQ) is at
|
|
least twice the fill size (the size of segments the pool allocates
|
|
from the arena), we can guarantee that there will always be a
|
|
returnable segment in every ABQ block.
|
|
|
|
_`.impl.c.free.merge.segment.overflow`: If the reuse size is set
|
|
smaller (see `.arch.adapt`_), there may not be a returnable segment in
|
|
an ABQ block, in which case the ABQ has "overflowed". Whenever this
|
|
occurs, the ABQ will be refilled by searching the free block managers
|
|
for dropped reusable blocks when needed.
|
|
|
|
_`.impl.c.free.merge.segment.risk`: The current segment structure does
|
|
not really support what we would like to do. Loci should do better:
|
|
support reserving contiguous address space and mapping/unmapping any
|
|
portion of that address space.
|
|
|
|
_`.impl.c.free.merge.alternative`: Alternatively, if the MPS segment
|
|
substrate permitted mapping/unmapping of pages, the pool could use
|
|
very large segments and map/unmap pages as needed.
|
|
|
|
|
|
AP Dispatch
|
|
...........
|
|
|
|
_`.impl.c.multiap`: The initial implementation will be a glue layer
|
|
that selects among several AP's for allocation according to the
|
|
predicted deathtime (as approximated by size) of the requested
|
|
allocation. Each AP will be filled from a pool instance tuned to the
|
|
range of object sizes expected to be allocated from that AP. [For
|
|
bonus points provide an interface that creates a batch of pools and
|
|
AP's according to some set of expected object sizes. Eventually expand
|
|
to understand object lifetimes and general lifetime prediction keys.]
|
|
|
|
_`.impl.c.multiap.sample-code`: This glue code is not properly part of
|
|
the pool or MPS interface. It is a layer on top of the MPS interface,
|
|
intended as sample code for unsophisticated clients. Sophisticated
|
|
clients will likely want to choose among multiple AP's more directly.
|
|
|
|
|
|
Testing
|
|
-------
|
|
|
|
_`.test.component`: Components `.impl.c.splay`_, `.impl.c.cbs`_, and
|
|
`.impl.c.abq`_ will be subjected to individual component tests to
|
|
verify their functionality.
|
|
|
|
_`.test.regression`: All tests applied to MV (design.mps.poolmv_) and
|
|
EPDL (design.mps.poolepdl(0)) will be applied to poolmvt to ensure
|
|
that mvt is at least as functional as the pools it is replacing.
|
|
|
|
.. _design.mps.poolmv: poolmv
|
|
|
|
_`.test.qa`: Once poolmvt is integrated into the MPS, the standard MPS
|
|
QA tests will be applied to poolmvt prior to each release.
|
|
|
|
_`.test.customer`: Customer acceptance tests will be performed on a
|
|
per-customer basis before release to that customer (cf.
|
|
proc.release.epcore(2).test)
|
|
|
|
|
|
Text
|
|
----
|
|
|
|
Possible tweaks (from `mail.pekka.1998-04-15.13-10`_):
|
|
|
|
.. _mail.pekka.1998-04-15.13-10: https://info.ravenbrook.com/project/mps/mail/1998/04/15/13-10/0.txt
|
|
|
|
- Try to coalesce splinters returned from AP's with the front (or any)
|
|
block on the ABQ.
|
|
|
|
- Sort ABQ in some other way to minimize splitting/splinters. For
|
|
example, proximity to recently allocated blocks.
|
|
|
|
|
|
|
|
B. Document History
|
|
-------------------
|
|
|
|
- 1998-02-04 PTW Initial e-mail discussion. See thread starting with
|
|
`mail.ptw.1998-02-04.21-27`_.
|
|
|
|
.. _mail.ptw.1998-02-04.21-27: https://info.ravenbrook.com/project/mps/mail/1998/02/04/21-27/0.txt
|
|
|
|
- 1998-02-13 PTW Initial draft based on e-mail request for comments.
|
|
See thread starting with `mail.ptw.1998-02-12.03-36`_.
|
|
|
|
.. _mail.ptw.1998-02-12.03-36: https://info.ravenbrook.com/project/mps/mail/1998/02/12/03-36/0.txt
|
|
|
|
- 1998-04-01 PTW Revised in response to e-mail request for comments.
|
|
See thread starting with `mail.ptw.1998-03-23.20-43`_.
|
|
|
|
.. _mail.ptw.1998-03-23.20-43: https://info.ravenbrook.com/project/mps/mail/1998/03/23/20-43/0.txt
|
|
|
|
- 1998-04-15 PTW Revised in response to e-mail request for comments.
|
|
See thread starting with `mail.ptw.1998-04-13.21-40`_.
|
|
|
|
.. _mail.ptw.1998-04-13.21-40: https://info.ravenbrook.com/project/mps/mail/1998/04/13/21-40/0.txt
|
|
|
|
- 1998-05-06 PTW Revised in response to review
|
|
review.design.mps.poolmv2.2_(0).
|
|
|
|
.. _design.mps.poolmv2.2: poolmv2#2
|
|
|
|
- 2002-06-07 RB_ Converted from MMInfo database design document.
|
|
|
|
- 2013-05-21 GDR_ Converted to reStructuredText.
|
|
|
|
.. _RB: http://www.ravenbrook.com/consultants/rb/
|
|
.. _GDR: http://www.ravenbrook.com/consultants/gdr/
|
|
|
|
|
|
C. Copyright and License
|
|
------------------------
|
|
|
|
Copyright (C) 2002-2014 Ravenbrook Limited. All rights reserved.
|
|
<http://www.ravenbrook.com/>. This is an open source license. Contact
|
|
Ravenbrook for commercial licensing options.
|
|
|
|
Redistribution and use in source and binary forms, with or without
|
|
modification, are permitted provided that the following conditions are
|
|
met:
|
|
|
|
#. Redistributions of source code must retain the above copyright
|
|
notice, this list of conditions and the following disclaimer.
|
|
|
|
#. Redistributions in binary form must reproduce the above copyright
|
|
notice, this list of conditions and the following disclaimer in the
|
|
documentation and/or other materials provided with the distribution.
|
|
|
|
#. Redistributions in any form must be accompanied by information on how
|
|
to obtain complete source code for this software and any
|
|
accompanying software that uses this software. The source code must
|
|
either be included in the distribution or be available for no more than
|
|
the cost of distribution plus a nominal fee, and must be freely
|
|
redistributable under reasonable conditions. For an executable file,
|
|
complete source code means the source code for all modules it contains.
|
|
It does not include source code for modules or files that typically
|
|
accompany the major components of the operating system on which the
|
|
executable file runs.
|
|
|
|
**This software is provided by the copyright holders and contributors
|
|
"as is" and any express or implied warranties, including, but not
|
|
limited to, the implied warranties of merchantability, fitness for a
|
|
particular purpose, or non-infringement, are disclaimed. In no event
|
|
shall the copyright holders and contributors be liable for any direct,
|
|
indirect, incidental, special, exemplary, or consequential damages
|
|
(including, but not limited to, procurement of substitute goods or
|
|
services; loss of use, data, or profits; or business interruption)
|
|
however caused and on any theory of liability, whether in contract,
|
|
strict liability, or tort (including negligence or otherwise) arising in
|
|
any way out of the use of this software, even if advised of the
|
|
possibility of such damage.**
|