.. index::
   single: Memory Pool System; performance
   single: performance
   single: generation; choosing size

.. _guide-perf:

Tuning the Memory Pool System for performance
=============================================

.. note::

    When developing a benchmark to profile your program against, bear
    in mind that the benchmark should allocate several times the
    amount of physical memory that you expect to be available to the
    process. If the total allocation fits into the available memory,
    there's no point running a garbage collector at all: you might as
    well just allocate and never collect.

The most important aspect of tuning the MPS is to choose good sizes
for the :term:`generations` in your :term:`generation chain`. The
ideal size of a generation is such that when it is collected, most of
the blocks allocated in that generation are found to be :term:`dead`
(and so the cost of :term:`scanning <scan>` and
:term:`copying <copying garbage collection>` them can be avoided
entirely). Collecting a generation when its blocks are mostly alive
is a waste of time.

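By way of illustration, here is a minimal sketch of how a generation
chain might be set up, using the three-generation configuration that
does well in the tables below. (The function name ``make_chain`` is
just for illustration; ``arena`` is assumed to have been created
already.)

.. code-block:: c

    #include "mps.h"

    /* Each generation is described by its capacity (in kilobytes)
       and its predicted mortality: the fraction of blocks in the
       generation expected to be dead when it is collected. */
    static mps_gen_param_s gen_params[] = {
        { 6400, 0.80 },
        { 6400, 0.80 },
        { 6400, 0.80 },
    };

    static mps_res_t make_chain(mps_arena_t arena, mps_chain_t *chain_o)
    {
        return mps_chain_create(chain_o, arena,
                                sizeof gen_params / sizeof gen_params[0],
                                gen_params);
    }

The chain is then passed to a pool at creation time (for example, via
the ``MPS_KEY_CHAIN`` keyword argument to ``mps_pool_create_k()``), so
that blocks allocated in that pool are managed by these generations.
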
In the tables below I give the execution time (user plus system, in
seconds) of ``test-leaf.scm`` in the toy Scheme interpreter under
different settings for its generation chain. (This test case
allocates hundreds of millions of small short-lived objects.)

First, the effect of varying the capacity (in kilobytes) of a chain
with a single generation.

======== ========= =========================
Capacity Mortality Execution time (user+sys)
======== ========= =========================
100      0.80      362.6
200      0.80      354.9
400      0.80      349.7
800      0.80      314.4
1600     0.80      215.7
3200     0.80      94.0
6400     0.80      53.5
12800    0.80      79.6
25600    0.80      77.6
======== ========= =========================

Second, the effect of varying the mortality of a chain with a single
generation.

======== ========= =========================
Capacity Mortality Execution time (user+sys)
======== ========= =========================
6400     0.20      55.4
6400     0.40      54.0
6400     0.60      54.0
6400     0.80      53.5
6400     0.99      54.8
======== ========= =========================

Third, the effect of varying the number of generations (all
generations being identical).

=========== ======== ========= =========================
Generations Capacity Mortality Execution time (user+sys)
=========== ======== ========= =========================
1           6400     0.80      53.5
2           6400     0.80      42.4
3           6400     0.80      42.1
4           6400     0.80      42.2
5           6400     0.80      42.2
=========== ======== ========= =========================

These tables suggest that:

1. The improvement in performance to be gained by getting generation
   sizes right is dramatic: much bigger than the small improvements
   to be gained from other techniques.

2. The predicted mortality doesn't make much difference to the
   overall execution time (it does, however, affect the distribution
   of pause times: see :ref:`topic-collection-schedule`).

3. You can make generations too big as well as too small.

4. There are rapidly diminishing returns to be gained from adding
   generations.

.. note::

    :ref:`topic-telemetry` can be used to discover when generations
    are being collected and what proportion of blocks were found to
    be alive.

The table below shows the effect of varying the initial allocation of
address space to the arena (using three generations, each with
capacity 6400 kB and mortality 0.80). The "Extensions" column counts
the number of times the arena had to extend its address space.

============= ========== =========== =========================
Address space Extensions Collections Execution time (user+sys)
============= ========== =========== =========================
2             32         371         52.0
4             21         370         47.0
8             0          [1]_        [1]_
14            0          [1]_        [1]_
16            0          2436        160.5
18            0          1135        89.1
20            0          673         60.6
22            0          484         48.7
24            0          400         43.1
32            0          368         41.2
64            0          368         43.1
128           0          368         46.4
256           0          368         46.3
512           0          368         49.3
1024          0          368         42.0
2048          0          368         43.2
4096          0          368         43.5
8192          0          368         46.1
16384         0          368         49.2
32768         0          368         57.1
65536         0          368         71.1
131072        0          368         101.3
262144        0          368         161.3
524288        0          368         273.0
1048576       0          368         504.6
============= ========== =========== =========================

.. note::

    .. [1] With this initial allocation of address space, the test
           case failed to run to completion after thousands of
           seconds and tens of thousands of garbage collection
           cycles.

The lesson here is that the allocation of address space has to be
comfortably larger than the working set of the program, but that a
very large address space is ruinous to performance.

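For reference, here is a sketch of how the initial allocation of
address space is specified when creating a virtual memory arena,
using the ``MPS_KEY_ARENA_SIZE`` keyword argument. (The figure of 32
megabytes and the function name ``make_arena`` are illustrative
only.)

.. code-block:: c

    #include "mps.h"
    #include "mpsavm.h"  /* virtual memory arena class */

    /* Create a virtual memory arena, initially reserving 32
       megabytes of address space.  The arena extends this
       reservation if the program turns out to need more. */
    static mps_res_t make_arena(mps_arena_t *arena_o)
    {
        mps_res_t res;
        MPS_ARGS_BEGIN(args) {
            MPS_ARGS_ADD(args, MPS_KEY_ARENA_SIZE,
                         (size_t)32 * 1024 * 1024);
            res = mps_arena_create_k(arena_o, mps_arena_class_vm(),
                                     args);
        } MPS_ARGS_END(args);
        return res;
    }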