
.. sources:

    `<https://info.ravenbrook.com/project/mps/master/design/object-debug/>`_

.. mps:prefix:: design.mps.object-debug

Debugging features for client objects
=====================================


Introduction
------------

:mps:tag:`intro` This is the design for all the various debugging
features that MPS clients (and sometimes MPS developers) can use to
discover what is happening to their objects and the memory space.

:mps:tag:`readership` MPS developers.


History
-------

:mps:tag:`hist.0` The first draft merely records all the various ideas
about fenceposting that came up in discussions in June, July and
September 1998. This includes the format wrapping idea from
:mps:ref:`mail.ptw.1998-06-19.21-13(0)`. Pekka Pirinen, 1998-09-10.

:mps:tag:`hist.1` Converted from MMInfo database design document.
Richard Brooksby, 2002-06-07.

:mps:tag:`hist.2` Converted to reStructuredText. Gareth Rees,
2013-04-14.


Overview
--------

:mps:tag:`over.fenceposts` In its current state, this document mostly
talks about fenceposts, straying a little into tagging where these
features have an effect on each other. [There exist other documents
that list other required features, and propose interfaces and
implementations. These will eventually be folded into this one. pekka
1998-09-10]


Requirements
------------

:mps:tag:`req.fencepost` Try to detect overwrites and underwrites of
allocated blocks by adding fenceposts (source req.product.??? VC++,
:mps:ref:`req.epcore.fun.debug.support`).

:mps:tag:`req.fencepost.size` The fenceposts should be at least 4
bytes on either side or 8 bytes if on one side only, with an
adjustable content (although VC++ only has 4 bytes with pattern
0xFDFDFDFD, having unwisely combined the implementation with other
debug features).

:mps:tag:`req.fencepost.check` There should be a function to check all
the fenceposts (source :mps:ref:`req.epcore.fun.debug.support`).

:mps:tag:`req.free-block` Try to detect attempts to write and read
free blocks.

:mps:tag:`req.walk` There should be a way to map ("walk") a user
function over all allocated objects (except PS VM objects), possibly
only in a separate debugging variety/mode (source
:mps:ref:`req.epcore.fun.debug.support`).

:mps:tag:`req.tag` There should be a way to store at least a word of
user data (a "tag", borrowing the SW term) with every object in
debugging mode, to be used in memory dumps (source req.product.???
VC++).

:mps:tag:`req.tag.walk` The walking function (as required by
:mps:ref:`.req.walk`) should have access to this data (source
:mps:ref:`req.epcore.fun.debug.support`).

:mps:tag:`req.dump.aver` It must be possible to perform a memory dump
after an :c:func:`AVER` has fired (naturally, if the information
required for the dump has been corrupted, it will fail, as softly as
possible). (source @@@@)

[There are more, especially about memory dumps and allocation
locations. pekka 1998-09-10]


Solution ideas
--------------

:mps:tag:`note.assumptions` I've tried not to assume anything about
the coincidence of manual/automatic, formatted/unformatted, and
ap/mps_alloc. I think those questions deserve to be decided on their
own merits, instead of being constrained by a debug feature.

:mps:tag:`fence.content.repeat` The content of a fencepost could be
specified as a byte/word which is used repeatedly to fill the fencepost.

:mps:tag:`fence.content.template` The content could be given as a
template which is of the right size and is simply copied onto the
fencepost.
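The two content styles above can be compared with a minimal sketch.
The function names are hypothetical and not part of any MPS
interface, and the fencepost is assumed to be a plain byte range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Repeat" style: fill the fencepost with one repeated byte. */
static void example_fence_fill_repeat(void *fence, size_t fence_size,
                                      unsigned char byte)
{
    assert(fence != NULL);
    memset(fence, byte, fence_size);
}

/* "Template" style: copy a client-supplied template that is exactly
 * fence_size bytes long onto the fencepost. */
static void example_fence_fill_template(void *fence, const void *tmpl,
                                        size_t fence_size)
{
    assert(fence != NULL && tmpl != NULL);
    memcpy(fence, tmpl, fence_size);
}
```

The repeat style needs only a byte and a size as parameters, while
the template style lets the client choose an arbitrary pattern at the
cost of supplying a buffer of the right size.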

:mps:tag:`fence.walk` :mps:ref:`.req.fencepost.check` requires the
ability to find all the allocated objects. In formatted pools, this is
not a problem. In unformatted pools, we could use the walker. It's a
feasible strategy to bet that any pool that might have to support
fenceposting will also have a walking requirement.

:mps:tag:`fence.tag` Fenceposting also needs to keep track of which
objects have fenceposts, unless we manage to do them all. It would be
easiest to put this in the tags.

:mps:tag:`fence.check.object` A function to check the fenceposts on a
given object would be nice.

:mps:tag:`fence.ap` APs could support fenceposting transparently by
having a mode where :c:func:`mps_reserve` always goes out-of-line and
fills in the fenceposts (the pool's :c:func:`BufferFill` method isn't
involved). This would leave the MPS with more freedom of
implementation, especially when combined with some of the other ideas.
We think doing a function call for every allocation is not too bad for
debugging.

:mps:tag:`fence.outside-ap` We could also let the client insert their
own fenceposts outside the MPS allocation mechanism. Even if
fenceposting were done like this, we'd still want it to be an MPS
feature, so we'd offer sample C macros for adding the size of the
fencepost and filling in the fencepost pattern. Possibly something
like this (while we could still store the parameters in the pool or
allocation point, there seems little point in doing so in this case,
and having them as explicit parameters to the macros allows the client
to specify constants to gain efficiency)::

    #define mps_add_fencepost(size, fp_size)
    #define mps_fill_fenceposts(obj, size, fp_size, fp_pattern)

The client would need to supply their own fencepost checking function,
obviously, but again we could offer one that matches the sample
macros.
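As an illustration only, client-side macros of this kind and a
matching checker might look like the sketch below. The design names
the macros but does not define them, so the ``EXAMPLE_`` names, the
choice of a fencepost on both sides, the single-byte pattern, and the
convention that the fill macro takes the base of the whole block are
all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical definitions; the design only names such macros. */

/* Size to request so that a fencepost fits on each side of the object. */
#define EXAMPLE_ADD_FENCEPOST(size, fp_size) ((size) + 2 * (fp_size))

/* Fill the head and tail fenceposts of a block that starts at base and
 * holds an object of the given size; fp_pattern is a single byte. */
#define EXAMPLE_FILL_FENCEPOSTS(base, size, fp_size, fp_pattern) \
    do { \
        unsigned char *base_ = (unsigned char *)(base); \
        memset(base_, (fp_pattern), (fp_size)); \
        memset(base_ + (fp_size) + (size), (fp_pattern), (fp_size)); \
    } while (0)

/* Client-side checker matching the macros above: returns 1 if both
 * fenceposts are intact, 0 otherwise. */
static int example_check_fenceposts(const void *base, size_t size,
                                    size_t fp_size, unsigned char fp_pattern)
{
    const unsigned char *p = base;
    size_t i;
    assert(base != NULL);
    for (i = 0; i < fp_size; ++i) {
        if (p[i] != fp_pattern || p[fp_size + size + i] != fp_pattern)
            return 0;
    }
    return 1;
}
```

Because the sizes and the pattern are macro arguments, a client that
passes constants lets the compiler fold the size arithmetic at compile
time.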

:mps:tag:`fence.tail-only` In automatic pools, the presence of a
fencepost at the head of the allocated block results in the object
reference being an internal pointer. This means that the format or the
pool would need to know about fenceposting and convert between
references and pointers. This would slow down the critical path when
fenceposting is used. This can be ameliorated by putting a fencepost
at the tail of the block only: this obviates the internal pointer
problem and could provide almost the same degree of checking (provided
the size was twice as large), especially in copying pools, where there
are normally no gaps between allocated blocks. In addition to the
inescapable effects on allocation and freeing (including copying and
reclaim thereunder), only scanning would have to know about
fenceposts.
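The difference between the two layouts can be sketched as pointer
arithmetic. The names are hypothetical, and the sketch assumes the
fenceposts sit immediately next to the object with no padding.

```c
#include <assert.h>
#include <stddef.h>

/* Two-sided layout: [fence][object][fence]. The client reference is an
 * internal pointer into the block. */
static unsigned char *example_two_sided_ref(unsigned char *base,
                                            size_t fp_size)
{
    assert(base != NULL);
    return base + fp_size;
}

/* Tail-only layout: [object][fence]. The client reference is the block
 * base, so no conversion is needed. */
static unsigned char *example_tail_only_ref(unsigned char *base)
{
    assert(base != NULL);
    return base;
}

/* Where the tail fencepost starts in the tail-only layout. */
static unsigned char *example_tail_fence(unsigned char *base,
                                         size_t obj_size)
{
    assert(base != NULL);
    return base + obj_size;
}
```

For a 16-byte object, two 8-byte fenceposts and one 16-byte tail
fencepost both give a 32-byte block, so doubling the tail costs no
extra space compared with the two-sided layout.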

:mps:tag:`fence.tail-only.under` Walking over all the objects in the
pool would be necessary to detect underwrites, as one couldn't be sure
that there is a fencepost before any given object (or where it's
located exactly). If the pool were doing the checking, it could be
sure: it would know about alignments and it could put fenceposts in
padding objects (free blocks will have them because they were once
allocated) so there'd be one on either side of any object (except at
the head of a segment, which is not a major problem, and could be
fixed by adding a padding object at the beginning of every segment).
This requires some cleverness to avoid splinters smaller than the
fencepost size, but it can be done.

:mps:tag:`fence.wrapper` On formatted pools, fenceposting could be
implemented by "wrapping" the client-supplied format at creation time.
The wrapper can handle the conversion from the fenceposted object and
back. This will be invisible to the client and gives the added benefit
that the wrapper can validate fenceposts on every format operation,
should it desire. That is, the pool would see the fenceposts as part
of the client object, but the client would only see its object; the
format wrapper would translate between the two. Note that hiding the
fenceposts from scan methods, which are required to take a contiguous
range of objects, is a bit complicated.
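A minimal sketch of the wrapping idea follows, reduced to a single
skip method. The ``ExampleFormat`` and ``ExampleWrappedFormat`` types
and the toy client format are hypothetical. The sketch also assumes a
fencepost on both sides of each object and a single-byte pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, much-reduced format: only a skip method. */
typedef struct ExampleFormat {
    /* Returns the address just past the object at obj. */
    unsigned char *(*skip)(unsigned char *obj);
} ExampleFormat;

/* Wrapper state: the client's format plus the fencepost parameters. */
typedef struct ExampleWrappedFormat {
    const ExampleFormat *client;
    size_t fp_size;
    unsigned char fp_byte;
} ExampleWrappedFormat;

/* Returns 1 if all fp_size bytes at fence hold the fencepost byte. */
static int example_fence_intact(const ExampleWrappedFormat *w,
                                const unsigned char *fence)
{
    size_t i;
    for (i = 0; i < w->fp_size; ++i)
        if (fence[i] != w->fp_byte)
            return 0;
    return 1;
}

/* Skip as the pool sees it: base points at the head fencepost, and the
 * result points just past the tail fencepost. Sets *ok to 0 if either
 * fencepost has been damaged. */
static unsigned char *example_wrapped_skip(const ExampleWrappedFormat *w,
                                           unsigned char *base, int *ok)
{
    unsigned char *obj, *obj_end;
    assert(w != NULL && base != NULL && ok != NULL);
    obj = base + w->fp_size;             /* pool view -> client view */
    obj_end = w->client->skip(obj);      /* client's own skip method */
    *ok = example_fence_intact(w, base) && example_fence_intact(w, obj_end);
    return obj_end + w->fp_size;         /* client view -> pool view */
}

/* Toy client format for the demonstration: the first byte of each
 * object holds the object's total size in bytes. */
static unsigned char *example_client_skip(unsigned char *obj)
{
    return obj + obj[0];
}
```

The client's skip method never sees the fenceposts. Only the wrapper
adds and subtracts ``fp_size``, which is also the natural place to
validate the fenceposts.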

:mps:tag:`fence.client-format` The MPS would supply such a wrapper,
but clients could also be allowed to write their own fenceposted
formats (provided they coordinate with allocation, see below). This
would make scanning fenceposted segments more efficient.

:mps:tag:`fence.wrapper.variable` Furthermore, you could create
different classes of fencepost within a pool, because the fencepost
itself could have a variable format. For instance, you might choose to
have the fencepost be minimal (one to two words) for small objects,
and more detailed/complex for large objects (imagining that large
objects are likely vector-ish and subject to overruns). You could get
really fancy and have the fencepost class keyed to the object class
(for example, different allocation points create different classes of
fenceposting).

:mps:tag:`fence.wrapper.alloc` Even with a wrapped format, allocation
and freeing would still have to know about the fenceposts. If allocation
points are used, either MPS-side (:mps:ref:`.fence.ap`) or client-side
(:mps:ref:`.fence.outside-ap`) fenceposting could be used, with the obvious
modifications.

:mps:tag:`fence.wrapper.alloc.format` We could add three format
methods, to adjust the pointer and the size for alloc and free, to put
down the fenceposts during alloc, and to check them; to avoid slowing
down all allocation, this would require some MOPping to make the
format class affect the choice of the alloc and free methods (see
:mps:ref:`mail.pekka.1998-06-11.18-18`).

:mps:tag:`fence.wrapper.alloc.size` We could just communicate the size
of the fenceposts between the format and the allocation routines, but
then you couldn't use variable fenceposts
(:mps:ref:`.fence.wrapper.variable`). [All this applies to copying and
reclaim in a straightforward manner, I think.]

:mps:tag:`fence.pool.wrapper` Pools can be wrapped as well. This could
be a natural way to represent/implement the fenceposting changes to
the Alloc and Free methods. [@@@@alignment]

:mps:tag:`fence.pool.new-class` We could simply offer a debugging
version of each pool class (e.g., :c:func:`mps_pool_class_mv_debug`).
As we have seen, debugging features have synergies which make it
advantageous to have a coordinated implementation, so splitting them
up would not just complicate the client interface, it would also be an
implementation problem; we can turn features on or off with pool init
parameters.

:mps:tag:`fence.pool.abstract` We could simply use pool init
parameters only to control all debugging features (optargs would be
useful here). While there might be subclasses and wrappers internally,
the client would only see a single pool class; in the internal view,
this would be an abstract class, and the parameters would determine
which concrete class actually gets instantiated.

:mps:tag:`tag.out-of-line` It would be nice if tags were stored
out-of-line, so they can be used to study allocation patterns and
fragmentation behaviours. Such an implementation of tagging could also
easily be shared among several pools.


Architecture
------------

:mps:tag:`pool` The implementation is at the pool level, because pools
manage allocated objects. A lot of the code will be generic,
naturally, but the data structures and the control interfaces attach
to pools. In particular, clients will be able to use tagging and
fenceposting separately on each pool.

:mps:tag:`fence.size` Having fenceposts of adjustable size and pattern
is quite useful. We feel that restricting the size to an integral
multiple of the [pool or format?] alignment is harmless and simplifies
the implementation enormously.
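To see why the restriction simplifies matters, consider the following
sketch. The helper names are hypothetical, and the alignment is
assumed to be a power of two. If the fencepost size is a multiple of
the alignment, an aligned block base plus the fencepost size is again
aligned, so the object behind a head fencepost needs no extra padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of align (a power of two). */
static size_t example_align_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Returns 1 if addr is a multiple of align, 0 otherwise. */
static int example_is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0);
    return addr % align == 0;
}
```

With an 8-byte alignment, a requested fencepost of 5 bytes would be
rounded up to 8, and an object placed after it at an aligned base
stays aligned. An unrounded 5-byte fencepost would leave the object
misaligned.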

:mps:tag:`fence.template` We use templates
(:mps:ref:`.fence.content.template`) to fill in the fenceposts, but we
do not give any guarantees about the location of the fenceposts, only
that they're properly aligned. This leaves us the opportunity to do
tail-only fenceposting, if we choose.

:mps:tag:`fence.slop` [see :mps:ref:`impl.c.dbgpool.FenceAlloc` @@@@]

:mps:tag:`fence.check.free` We check the fenceposts when freeing an
object.

:mps:tag:`unified-walk` Combine the walking and tagging requirements
(:mps:ref:`.req.tag.walk` and @@@@) into a generic facility for
walking and tagging objects with just one interface and one name:
tagging. Also combine the existing formatted object walker into this
metaphor, but allow the format and tag parameters of the step
function to be optional [this part has not been implemented yet pekka
1998-09-10].

:mps:tag:`init` It simplifies the implementation of both tagging and
fenceposting if they are always on, so that we don't have to keep
track of which objects have been fenceposted and which have not, and
don't have to have three kinds of tags: for user data, for
fenceposting, and for both. So we determine this at pool init time
(and let fenceposting turn on tagging, if necessary).

:mps:tag:`pool-parameters` Fencepost templates and tag formats are
passed in as pool parameters.

:mps:tag:`modularity` While a combined generic implementation of tags
and fenceposts is provided, it is structured so that each part of it
could be implemented by a pool-specific mechanism with a minimum of
new protocol. [This will be improved, when we figure out formatted
pools -- they don't need tags for fenceposting.]

:mps:tag:`out-of-space` If there's no room for tags, we will not dip
into the reservoir, just fail to allocate the tag. If the alloc call
had a reservoir permit, we let it succeed even without a tag, and just
make sure the free method will not complain if it can't find a tag. If
the call didn't have a reservoir permit, we free the block allocated
for the object and fail the allocation, so that the client gets a
chance to do whatever low-memory actions they might want to do.
[Should this depend on whether there is anything in the reservoir?]
This breaks the one-to-one relationship between tags and objects, so
some checks cannot be made, but we do count the "lost" tags.
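The policy described in :mps:ref:`.out-of-space` can be summarised as
a small decision function. This is a sketch only: the enum, the
function name, and the counter are hypothetical, and the real
allocation path involves the pool and the reservoir, which are not
modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum ExampleOutcome {
    EXAMPLE_TAGGED,     /* object and tag both allocated */
    EXAMPLE_UNTAGGED,   /* object kept, tag lost and counted */
    EXAMPLE_FAILED      /* object block freed, allocation fails */
} ExampleOutcome;

/* Decide what happens to an allocation whose object block has already
 * been obtained, depending on whether its tag could be allocated. */
static ExampleOutcome example_tag_policy(bool tag_allocated,
                                         bool reservoir_permit,
                                         size_t *lost_tags)
{
    assert(lost_tags != NULL);
    if (tag_allocated)
        return EXAMPLE_TAGGED;
    if (reservoir_permit) {
        ++*lost_tags;           /* remember that a tag is missing */
        return EXAMPLE_UNTAGGED;
    }
    return EXAMPLE_FAILED;      /* caller frees the object block */
}
```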

[need to hash out how to do fenceposting in formatted pools]


Client interface
----------------

:mps:tag:`interface.fenceposting.check`
:c:func:`mps_pool_check_fenceposts` is a function to check all
fenceposts in a pool (:c:func:`AVER` if a problem is found).

[from here on, these are tentative and incomplete]

.. c:function:: mps_res_t mps_fmt_fencepost_wrap(mps_fmt_t *format_return, mps_arena_t arena, mps_fmt_t format, [fp parameters])

:mps:tag:`interface.fenceposting.format` A function to wrap a format
(class) to provide fenceposting.

.. c:function:: void (*mps_fmt_adjust_fencepost_t)(size_t *size_io)

:mps:tag:`interface.fenceposting.add` A format method to adjust size
of a block about to be allocated to allow for fenceposts.

.. c:function:: void (*mps_fmt_put_fencepost_t)(mps_addr_t *addr_io, size_t size)

:mps:tag:`interface.fenceposting.add` A format method to add a
fencepost around a block about to be allocated [the ``NULL`` method
adds a tail fencepost].

.. c:function:: mps_bool_t (*mps_fmt_check_fenceposts_t)(mps_addr_t)

:mps:tag:`interface.fenceposting.add` A format method to check the
fenceposts around an object [the ``NULL`` method checks tails].

.. c:function:: mps_class_t mps_debug_class(mps_class_t class)

:mps:tag:`interface.fenceposting.pool` A function to wrap a pool class
to provide fenceposting (note absence of arena parameter).

.. c:function:: mps_res_t mps_alloc(mps_addr_t *, mps_pool_t, size_t);
.. c:function:: mps_res_t mps_alloc_dbg(mps_addr_t *, mps_pool_t, size_t, ...);
.. c:function:: mps_res_t mps_alloc_dbg_v(mps_addr_t *, mps_pool_t, size_t, va_list);

:mps:tag:`interface.tags.alloc` Three functions to replace existing
:c:func:`mps_alloc` (request.???.??? proposes to remove the varargs).

.. c:function:: void (*mps_objects_step_t)(mps_addr_t addr, size_t size, mps_fmt_t format, mps_pool_t pool, void *tag_data, void *p)
.. c:function:: void mps_pool_walk(mps_arena_t arena, mps_pool_t pool, mps_objects_step_t step, void *p)
.. c:function:: void mps_arena_walk(mps_arena_t arena, mps_objects_step_t step, void *p)

:mps:tag:`interface.tags.walker` Functions to walk all the allocated
objects in a pool or an arena (only client pools in this case).
``format`` and ``tag_data`` can be ``NULL`` (``tag_data`` really wants
to be ``void *``, not :c:type:`mps_addr_t`, because it's stored
together with the internal tag data in an MPS internal pool).
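The following sketch shows how a step function with a closure pointer
and an optional tag might be used. It does not call the tentative MPS
interface: the ``example_`` types and the array-based walker are
hypothetical stand-ins, and the ``format`` and ``pool`` parameters are
left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step function type, reduced from the proposal above. */
typedef void (*example_step_t)(void *addr, size_t size,
                               void *tag_data, void *p);

/* Stand-in for an allocated object known to the walker. */
typedef struct ExampleObject {
    void *addr;
    size_t size;
    void *tag_data;     /* NULL if the object has no tag */
} ExampleObject;

/* Stand-in walker: applies step to every object in an array. */
static void example_walk(const ExampleObject *objs, size_t n,
                         example_step_t step, void *p)
{
    size_t i;
    assert(step != NULL);
    for (i = 0; i < n; ++i)
        step(objs[i].addr, objs[i].size, objs[i].tag_data, p);
}

/* Closure data accumulated by the step function. */
typedef struct ExampleTotals {
    size_t objects;
    size_t bytes;
    size_t untagged;
} ExampleTotals;

/* Step function: counts objects and bytes, and notes missing tags. */
static void example_count_step(void *addr, size_t size,
                               void *tag_data, void *p)
{
    ExampleTotals *totals = p;
    (void)addr;                 /* address not needed for counting */
    totals->objects += 1;
    totals->bytes += size;
    if (tag_data == NULL)
        totals->untagged += 1;
}
```

The step function must tolerate a ``NULL`` tag, since untagged
objects are passed to it like any other.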


Examples
--------

:mps:tag:`example.debug-alloc` ::

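    /* Record the allocation site in a static tag and pass it to
       mps_alloc as the extra (varargs) argument. */
    #define MPS_ALLOC_DBG(res_io, addr_io, pool, size) \
      MPS_BEGIN \
        static mps_tag_A_s _ts = { __FILE__, __LINE__ }; \
        *(res_io) = mps_alloc(addr_io, pool, size, _ts); \
      MPS_END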


Implementation
--------------

:mps:tag:`new-pool` The client interface to control fenceposting
consists of the new classes :c:func:`mps_pool_class_mv_debug`,
:c:func:`mps_pool_class_epdl_debug`, and
:c:func:`mps_pool_class_epdr_debug`, and their new init parameter of
type :c:type:`mps_pool_debug_option_s`. [This is a temporary solution,
to get it out without writing lots of new interface. pekka 1998-09-10]

:mps:tag:`new-pool.impl` The debug pools are implemented using the
"class wrapper" :c:func:`EnsureDebugClass`, which produces a subclass
with modified ``init``, ``finish``, ``alloc``, and ``free`` methods.
These methods are implemented in the generic debug class code
(:mps:ref:`impl.c.dbgpool`), and are basically wrappers around the
superclass methods (invoked through the ``pool->class->super`` field).
To find the data stored in the class for the debugging features, they
use the ``debugMixin`` method provided by the subclass. So to make a
debug subclass, three things should be provided: a structure
definition of the instance containing a
:c:type:`PoolDebugMixinStruct`, a pool class function that uses
:c:func:`EnsureDebugClass`, and a ``debugMixin`` method that locates
the :c:type:`PoolDebugMixinStruct` within an instance.
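The mixin arrangement can be illustrated with a reduced sketch. All
the types and names below are hypothetical simplifications, not the
MPS definitions. The sketch assumes the superclass instance is the
first member of the debug instance, so a pointer to one can be
converted to a pointer to the other.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the debug mixin data. */
typedef struct ExampleDebugMixin {
    size_t fence_size;
    const void *fence_template;
} ExampleDebugMixin;

/* Hypothetical stand-in for the superclass pool instance. */
typedef struct ExamplePool {
    unsigned long serial;
} ExamplePool;

/* Debug subclass instance: the superclass instance comes first,
 * followed by the mixin. */
typedef struct ExampleDebugPool {
    ExamplePool pool;
    ExampleDebugMixin debug;
} ExampleDebugPool;

/* The "debugMixin" method for this subclass: given the pool, locate
 * the mixin inside the enclosing debug instance. */
static ExampleDebugMixin *example_debug_mixin(ExamplePool *pool)
{
    assert(pool != NULL);
    return &((ExampleDebugPool *)pool)->debug;
}
```

Generic debug code only ever calls the ``debugMixin`` method, so it
does not need to know the layout of any particular subclass instance.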

:mps:tag:`tags.splay` The tags are stored in a splay tree of tags
allocated from a subsidiary MFS pool. The client needs to specify the
(maximum) size of the client data in a tag, so that the pool can be
created.

[Lots more should be said, eventually. pekka 1998-09-10]