XBZRLE (Xor Based Zero Run Length Encoding)
===========================================

XBZRLE (Xor Based Zero Run Length Encoding) reduces VM downtime and the total
live-migration time of virtual machines.
It is particularly useful for virtual machines running memory-write-intensive
workloads that are typical of large enterprise applications such as SAP ERP
systems, and more generally for any application that has a sparse memory
update pattern.

Instead of sending the whole changed guest memory page, this solution sends a
compressed version of the updates, thus reducing the amount of data sent during
live migration.
To calculate the update, the previous contents of the memory pages need to
be stored on the source. Those pages are stored in a dedicated cache
(hash table) and are accessed by their address.
The larger the cache, the better the chance that a page has already
been stored in it.
A small cache results in a high cache miss rate.
The cache size can be changed before and during migration.

Format
=======

The compression format performs an XOR between the previous and current content
of the page, where zero represents an unchanged value.
The page data delta is represented by zero and non-zero runs.
A zero run is represented by its length (in bytes).
A non-zero run is represented by its length (in bytes) and the new data.
The run length is encoded using ULEB128 (http://en.wikipedia.org/wiki/LEB128)
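
As an illustration of the run-length encoding, here is a minimal sketch of
generic ULEB128 helpers. The function names are hypothetical and this is not
the helper code used by QEMU itself; it only shows how each byte carries 7
bits of the value plus a continuation bit. Any run length within a 4096-byte
page fits in at most two bytes, since 4096 < 2^14.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n as ULEB128 into dst and return the number of bytes written.
 * dst must have room for 5 bytes (enough for any 32-bit value). */
static size_t uleb128_encode(uint8_t *dst, uint32_t n)
{
    size_t len = 0;

    assert(dst != NULL);
    do {
        uint8_t byte = n & 0x7f;    /* low 7 bits of the value */
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;           /* continuation bit: more bytes follow */
        }
        dst[len++] = byte;
    } while (n != 0);
    return len;
}

/* Decode a ULEB128 value from src (at most src_len bytes) into *n.
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes. */
static size_t uleb128_decode(const uint8_t *src, size_t src_len, uint32_t *n)
{
    uint32_t value = 0;
    size_t i;

    assert(src != NULL && n != NULL);
    for (i = 0; i < src_len && i < 5; i++) {
        value |= (uint32_t)(src[i] & 0x7f) << (7 * i);
        if ((src[i] & 0x80) == 0) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}
```

With these helpers, a run length of 1001 encodes to the two bytes e9 07.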

There can be more than one valid encoding; the sender may send a longer
encoding in order to reduce computation cost.

page = zrun nzrun
       | zrun nzrun page

zrun = length

nzrun = length byte...

length = uleb128 encoded integer
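
The grammar above can be turned into a straightforward byte-at-a-time encoder.
The following is a sketch under stated assumptions, not QEMU's optimized
implementation: the names xbzrle_encode_sketch and put_uleb128 are
hypothetical, trailing unchanged bytes are simply left unencoded, and running
out of room in the destination buffer is reported as an overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append n as ULEB128 at dst[*pos]; returns false if dst is full. */
static bool put_uleb128(uint8_t *dst, size_t dst_len, size_t *pos, size_t n)
{
    do {
        uint8_t byte = n & 0x7f;
        n >>= 7;
        if (n != 0) {
            byte |= 0x80;
        }
        if (*pos >= dst_len) {
            return false;
        }
        dst[(*pos)++] = byte;
    } while (n != 0);
    return true;
}

/* Encode the delta between old_buf and new_buf (both len bytes) into dst.
 * Returns the encoded length, 0 if the buffers are identical, or -1 if
 * the encoding does not fit in dst_len bytes (overflow). */
static long xbzrle_encode_sketch(const uint8_t *old_buf,
                                 const uint8_t *new_buf, size_t len,
                                 uint8_t *dst, size_t dst_len)
{
    size_t i = 0, pos = 0;

    assert(old_buf != NULL && new_buf != NULL && dst != NULL);
    while (i < len) {
        size_t start = i;

        /* zrun: count bytes whose XOR is zero (unchanged) */
        while (i < len && (old_buf[i] ^ new_buf[i]) == 0) {
            i++;
        }
        if (i == len) {
            break;              /* trailing unchanged bytes are not sent */
        }
        if (!put_uleb128(dst, dst_len, &pos, i - start)) {
            return -1;
        }

        /* nzrun: count changed bytes, then emit length and new data */
        start = i;
        while (i < len && (old_buf[i] ^ new_buf[i]) != 0) {
            i++;
        }
        if (!put_uleb128(dst, dst_len, &pos, i - start) ||
            i - start > dst_len - pos) {
            return -1;
        }
        memcpy(dst + pos, new_buf + start, i - start);
        pos += i - start;
    }
    return (long)pos;
}
```

A sender would typically pass a destination buffer no larger than one page, so
that an overflow means the delta is not smaller than the page itself.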

On the sender side XBZRLE is used as a compact delta encoding of page updates,
retrieving the old page content from the cache (default size of 64MB). The
receiving side uses the existing page's content and XBZRLE to decode the new
page's content.
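
The receiving side can be sketched in the same spirit. The decoder below is an
illustration only (xbzrle_decode_sketch and get_uleb128 are hypothetical
names, not QEMU functions): it walks the encoded stream, skips each zero run
in the existing page and overwrites each non-zero run with the new data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a ULEB128 value of up to 4 bytes; returns bytes consumed or 0. */
static size_t get_uleb128(const uint8_t *src, size_t src_len, size_t *n)
{
    size_t value = 0, i;

    for (i = 0; i < src_len && i < 4; i++) {
        value |= (size_t)(src[i] & 0x7f) << (7 * i);
        if ((src[i] & 0x80) == 0) {
            *n = value;
            return i + 1;
        }
    }
    return 0;
}

/* Apply an encoded delta in place to page, which must hold the old
 * content. Returns 0 on success or -1 on a malformed stream, in which
 * case page may have been partially updated. */
static int xbzrle_decode_sketch(const uint8_t *src, size_t src_len,
                                uint8_t *page, size_t page_len)
{
    size_t in = 0, out = 0, run = 0, used;

    assert(src != NULL && page != NULL);
    while (in < src_len) {
        /* zrun: these bytes are unchanged, so just skip over them */
        used = get_uleb128(src + in, src_len - in, &run);
        if (used == 0 || run > page_len - out) {
            return -1;
        }
        in += used;
        out += run;

        /* nzrun: length followed by that many bytes of new data */
        used = get_uleb128(src + in, src_len - in, &run);
        if (used == 0) {
            return -1;
        }
        in += used;
        if (run > page_len - out || run > src_len - in) {
            return -1;
        }
        memcpy(page + out, src + in, run);
        in += run;
        out += run;
    }
    return 0;
}
```

Because unchanged bytes are never transmitted, the destination must already
hold the previous version of the page for the result to be correct.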

This work was originally based on research results published at
VEE 2011: Evaluation of Delta Compression Techniques for Efficient Live
Migration of Large Virtual Machines by Benoit, Svard, Tordsson and Elmroth.
Additionally, the delta encoder XBRLE was further improved by using XBZRLE
instead.

XBZRLE has a sustained bandwidth of 2-2.5 GB/s for typical workloads, making it
ideal for in-line, real-time encoding such as is needed for live migration.

Example
old buffer:
1001 zeros
05 06 07 08 09 0a 0b 0c 0d 0e 0f 10 11 12 13 68 00 00 6b 00 6d
3074 zeros

new buffer:
1001 zeros
01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 68 00 00 67 00 69
3074 zeros

encoded buffer:

encoded length 24
e9 07 0f 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f 03 01 67 01 01 69

Cache update strategy
=====================
Keeping the hot pages in the cache is effective for decreasing cache
misses. XBZRLE uses a counter as the age of each page. The counter will
increase after each ram dirty bitmap sync. When a cache conflict is
detected, XBZRLE will only evict pages in the cache that are older than
a threshold.
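
The age-based strategy can be sketched as a small direct-mapped cache. This is
an illustration under assumptions, not QEMU's page cache code: the type and
function names, the slot count and the threshold value are all hypothetical.
The "now" argument stands for the counter that increases after each dirty
bitmap sync.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096u
#define SKETCH_NUM_SLOTS  4u    /* hypothetical; must be a power of two */
#define SKETCH_AGE_LIMIT  2u    /* hypothetical eviction threshold */

struct sketch_slot {
    bool     used;
    uint64_t addr;                      /* page address stored here */
    uint64_t age;                       /* counter at last insert or hit */
    uint8_t  data[SKETCH_PAGE_SIZE];
};

struct sketch_cache {
    struct sketch_slot slots[SKETCH_NUM_SLOTS];
};

/* Map a page address to its single candidate slot. */
static struct sketch_slot *sketch_slot_for(struct sketch_cache *c,
                                           uint64_t addr)
{
    return &c->slots[(addr / SKETCH_PAGE_SIZE) & (SKETCH_NUM_SLOTS - 1)];
}

/* Return the cached page for addr, or NULL on a miss.
 * A hit refreshes the age so that hot pages stay young. */
static uint8_t *sketch_lookup(struct sketch_cache *c, uint64_t addr,
                              uint64_t now)
{
    struct sketch_slot *s = sketch_slot_for(c, addr);

    if (s->used && s->addr == addr) {
        s->age = now;
        return s->data;
    }
    return NULL;
}

/* Store page under addr. On a conflict, the existing entry is evicted
 * only if it is at least SKETCH_AGE_LIMIT syncs old; otherwise the
 * insert is refused and false is returned. */
static bool sketch_insert(struct sketch_cache *c, uint64_t addr,
                          const uint8_t *page, uint64_t now)
{
    struct sketch_slot *s = sketch_slot_for(c, addr);

    assert(page != NULL);
    if (s->used && s->addr != addr && s->age + SKETCH_AGE_LIMIT > now) {
        return false;
    }
    s->used = true;
    s->addr = addr;
    s->age = now;
    memcpy(s->data, page, SKETCH_PAGE_SIZE);
    return true;
}
```

When an insert is refused, the sender in this sketch would transmit the page
uncompressed and leave the cached entry in place.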

Usage
======================
1. Verify the destination QEMU version is able to decode the new format.
    {qemu} info migrate_capabilities
    {qemu} xbzrle: off , ...

2. Activate xbzrle on both source and destination:
    {qemu} migrate_set_capability xbzrle on

3. Set the XBZRLE cache size (on the source only). The cache size is in MBytes
and should be a power of 2. The default cache size is 64 MBytes.
    {qemu} migrate_set_cache_size 256m

4. Start outgoing migration
    {qemu} migrate -d tcp:destination.host:4444
    {qemu} info migrate
    capabilities: xbzrle: on
    Migration status: active
    transferred ram: A kbytes
    remaining ram: B kbytes
    total ram: C kbytes
    total time: D milliseconds
    duplicate: E pages
    normal: F pages
    normal bytes: G kbytes
    cache size: H bytes
    xbzrle transferred: I kbytes
    xbzrle pages: J pages
    xbzrle cache miss: K
    xbzrle overflow : L

xbzrle cache miss: the number of cache misses to date. A high cache-miss rate
indicates that the cache size is set too low.
xbzrle overflow: the number of overflows, where the delta could not be
compressed. This can happen if the changes in the pages are too
large or there are many short changes; for example, changing every second byte
(half a page).

Testing: Testing indicated that live migration with XBZRLE completed in 110
seconds, whereas without it the migration was not able to complete.

A simple synthetic memory r/w load generator:
    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
        /* 4096 pages of 4096 bytes each (16 MB), zero-initialized */
        char *buf = (char *) calloc(4096, 4096);

        if (!buf) {
            return 1;
        }
        while (1) {
            int i;
            /* dirty one byte every 1024 bytes, i.e. 4 bytes per page */
            for (i = 0; i < 4096 * 4; i++) {
                buf[i * 4096 / 4]++;
            }
            printf(".");
            fflush(stdout);
        }
    }