mirror of
git://git.sv.gnu.org/emacs.git
synced 2026-04-27 08:43:40 -07:00
241 lines
10 KiB
ReStructuredText
241 lines
10 KiB
ReStructuredText
.. _mmref-recycle:
|
|
|
|
Recycling techniques
|
|
====================
|
|
|
|
There are many ways for automatic memory managers to determine what
|
|
memory is no longer required. In the main, garbage collection relies
|
|
on determining which blocks are not pointed to by any program
|
|
variables. Some of the techniques for doing this are described briefly
|
|
below, but there are many potential pitfalls, and many possible
|
|
refinements. These techniques can often be used in combination.
|
|
|
|
|
|
Tracing collectors
|
|
------------------
|
|
|
|
Automatic memory managers that follow pointers to determine which
|
|
blocks of memory are :term:`reachable` from program variables (known
|
|
as the :term:`root set`) are known as :dfn:`tracing` collectors. The
|
|
classic example is the mark-sweep collector.
|
|
|
|
|
|
Mark-sweep collection
|
|
^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
In a :term:`mark-sweep` collection, the collector first examines the
|
|
program variables; any blocks of memory pointed to are added to a list
|
|
of blocks to be examined. For each block on that list, it sets a flag
|
|
(the mark) on the block to show that it is still required, and also
|
|
that it has been processed. It also adds to the list any blocks
|
|
pointed to by that block that have not yet been marked. In this way,
|
|
all blocks that can be reached by the program are marked.
|
|
|
|
In the second phase, the collector *sweeps* all allocated memory,
|
|
searching for blocks that have not been marked. If it finds any, it
|
|
returns them to the allocator for reuse.
|
|
|
|
.. figure:: ../diagrams/mark-sweep.svg
|
|
:align: center
|
|
:alt: Diagram: Five memory blocks, three of which are reachable from program variables.
|
|
|
|
Five memory blocks, three of which are reachable from program variables.
|
|
|
|
In the diagram above, block 1 is directly accessible from a program
|
|
variable, and blocks 2 and 3 are indirectly accessible. Blocks 4 and 5
|
|
cannot be reached by the program. The first step would mark block 1,
|
|
and remember blocks 2 and 3 for later processing. The second step
|
|
would mark block 2. The third step would mark block 3, but wouldn't
|
|
remember block 2 as it is already marked. The sweep phase would ignore
|
|
blocks 1, 2, and 3 because they are marked, but would recycle blocks 4
|
|
and 5.
|
|
|
|
The two drawbacks of simple mark-sweep collection are:
|
|
|
|
* it must scan the entire memory in use before any memory can be freed;
|
|
|
|
* it must run to completion or, if interrupted, start again from scratch.
|
|
|
|
If a system requires real-time or interactive response, then simple
|
|
mark-sweep collection may be unsuitable as it stands, but many more
|
|
sophisticated garbage collection algorithms are derived from this
|
|
technique.
|
|
|
|
|
|
Copying collection
|
|
^^^^^^^^^^^^^^^^^^
|
|
|
|
After many memory blocks have been allocated and recycled, there are
|
|
two problems that typically occur:
|
|
|
|
* the memory in use is widely scattered in memory, causing poor
|
|
performance in the :term:`memory caches <cache (1)>` or
|
|
:term:`virtual memory` systems of most modern computers (known as
|
|
poor :term:`locality of reference`);
|
|
|
|
* it becomes difficult to allocate large blocks because free memory is
|
|
divided into small pieces, separated by blocks in use (known as
|
|
:term:`external fragmentation`).
|
|
|
|
One technique that can solve both these problems is :term:`copying
|
|
garbage collection`. A copying garbage collector may move allocated
|
|
blocks around in memory and adjust any references to them to point to
|
|
the new location. This is a very powerful technique and can be
|
|
combined with many other types of garbage collection, such as
|
|
mark-sweep collection.
|
|
|
|
The disadvantages of copying collection are:
|
|
|
|
* it is difficult to combine with :term:`incremental garbage
|
|
collection` (see below) because all references must be adjusted to
|
|
remain consistent;
|
|
|
|
* it is difficult to combine with :term:`conservative garbage
|
|
collection` (see below) because references cannot be confidently
|
|
adjusted;
|
|
|
|
* extra storage is required while both new and old copies of an object
|
|
exist;
|
|
|
|
* copying data takes extra time (proportional to the amount of
|
|
:term:`live` data).
|
|
|
|
|
|
Incremental collection
|
|
^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Older garbage collection algorithms relied on being able to start
|
|
collection and continue working until the collection was complete,
|
|
without interruption. This makes many interactive systems pause during
|
|
collection, and makes the presence of garbage collection obtrusive.
|
|
|
|
Fortunately, there are modern techniques (known as :term:`incremental
|
|
garbage collection`) to allow garbage collection to be performed in a
|
|
series of small steps while the program is never stopped for long. In
|
|
this context, the program that uses and modifies the blocks is
|
|
sometimes known as the :term:`mutator`. While the collector is trying
|
|
to determine which blocks of memory are reachable by the mutator, the
|
|
mutator is busily allocating new blocks, modifying old blocks, and
|
|
changing the set of blocks it is actually looking at.
|
|
|
|
Incremental collection is usually achieved with either the cooperation
|
|
of the memory hardware or the mutator; this ensures that, whenever
|
|
memory in crucial locations is accessed, a small amount of necessary
|
|
bookkeeping is performed to keep the collector's data structures
|
|
correct.
|
|
|
|
|
|
Conservative garbage collection
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Although garbage collection was first invented in 1958, many languages
|
|
have been designed and implemented without the possibility of garbage
|
|
collection in mind. It is usually difficult to add normal garbage
|
|
collection to such a system, but there is a technique, known as
|
|
:term:`conservative garbage collection`, that can be used.
|
|
|
|
The usual problem with such a language is that it doesn't provide the
|
|
collector with information about the data types, and the collector
|
|
cannot therefore determine what is a pointer and what isn't. A
|
|
conservative collector assumes that anything *might* be a pointer. It
|
|
regards any data value that looks like a pointer to or into a block of
|
|
allocated memory as preventing the recycling of that block.
|
|
|
|
Note that, because the collector does not know for certain which
|
|
memory locations contain pointers, it cannot readily be combined with
|
|
copying garbage collection. Copying collection needs to know where
|
|
pointers are in order to update them when blocks are moved.
|
|
|
|
You might think that conservative garbage collection could easily
|
|
perform quite poorly, leaving a lot of garbage uncollected. In
|
|
practice, it does quite well, and there are refinements that improve
|
|
matters further.
|
|
|
|
|
|
Reference counts
|
|
----------------
|
|
|
|
A reference count is a count of how many :term:`references` (that is,
|
|
pointers) there are to a particular memory block from other blocks. It
|
|
is used as the basis for some automatic recycling techniques that do
|
|
not rely on tracing.
|
|
|
|
|
|
Simple reference counting
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
In a simple :term:`reference counting` system, a reference count is
|
|
kept for each :term:`object`. This count is incremented for each new
|
|
reference, and is decremented if a reference is overwritten, or if the
|
|
referring object is recycled. If a reference count falls to zero, then
|
|
the object is no longer required and can be recycled.
|
|
|
|
Reference counting is frequently chosen as an automatic memory
|
|
management strategy because it seems simple to implement using
|
|
:term:`manual memory management` primitives. However, it is hard to
|
|
implement efficiently because of the cost of updating the counts. It
|
|
is also hard to implement reliably, because the standard technique
|
|
cannot reclaim objects connected in a loop. In many cases, it is an
|
|
inappropriate solution, and it would be preferable to use
|
|
:term:`tracing garbage collection` instead.
|
|
|
|
Reference counting is most useful in situations where it can be
|
|
guaranteed that there will be no loops and where modifications to the
|
|
reference structure are comparatively infrequent. These circumstances
|
|
can occur in some types of database structure and some file systems.
|
|
Reference counting may also be useful if it is important that objects
|
|
are recycled promptly, such as in systems with tight memory
|
|
constraints.
|
|
|
|
|
|
Deferred reference counting
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
The performance of reference counting can be improved if not all
|
|
references are taken into account. In one important technique, known
|
|
as :term:`deferred reference counting`, only references from other
|
|
objects are counted, and references from program variables are
|
|
ignored. Since most of the references to the object are likely to be
|
|
from local variables, this can substantially reduce the overhead of
|
|
keeping the counts up to date. An object cannot be reclaimed as soon
|
|
as its count has dropped to zero, because there might still be a
|
|
reference to it from a program variable. Instead, the program
|
|
variables (including the :term:`control stack`) are periodically
|
|
:term:`scanned <scan>`, and any objects which are not referenced from
|
|
there and which have zero count are reclaimed.
|
|
|
|
Deferred reference counting cannot normally be used unless it is
|
|
directly supported by the compiler. It's more common for modern
|
|
compilers to support tracing garbage collectors instead, because they
|
|
can reclaim loops. Deferred reference counting may still be useful for
|
|
its promptness---but that is limited by the frequency of scanning the
|
|
program variables.
|
|
|
|
|
|
One-bit reference counting
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Another variation on reference counting, known as the :term:`one-bit
|
|
reference count`, uses a single bit flag to indicate whether each
|
|
object has either "one" or "many" references. If a reference to an
|
|
object with "one" reference is removed, then the object can be
|
|
recycled. If an object has "many" references, then removing references
|
|
does not change this, and that object will never be recycled. It is
|
|
possible to store the flag as part of the *pointer* to the object, so
|
|
no additional space is required in each object to store the count.
|
|
One-bit reference counting is effective in practice because most
|
|
actual objects have a reference count of one.
|
|
|
|
|
|
Weighted reference counting
|
|
^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
|
|
|
Reference counting is often used for tracking inter-process references for :term:`distributed garbage collection`. This fails to collect objects in separate processes if they have looped references, but tracing collectors are usually too inefficient as inter-process tracing entails much communication between processes. Within a process, tracing collectors are often used for local recycling of memory.
|
|
|
|
Many distributed collectors use a technique called :term:`weighted
|
|
reference counting`, which reduces the level of communication even
|
|
further. Each time a reference is copied, the weight of the reference
|
|
is shared between the new and the old copies. Since this operation
|
|
doesn't change the total weight of all references, it doesn't require
|
|
any communication with the object. Communication is only required when
|
|
references are deleted.
|