Merging and Compacting VirtualBox Snapshots

Jun 17, 2010. Tags: python, VirtualBox

Using the VirtualBox GUI to manually merge lots of snapshots is time-consuming and fiddly, so I wrote a Python script called vboxmerge.py to do it automatically. The script merges a snapshot branch into the base VDI with a single command (it also illustrates how easy it is to script VirtualBox using Python).

Important: This post is obsolete: a much easier way is to use the VirtualBox VM Clone command – see Update December 2013: An easier way to merge snapshots in my Cloning and Copying VirtualBox virtual machines blog post.

The more VirtualBox snapshots you have, the more disk space you consume. Snapshot VDIs grow with every previously unallocated guest OS write, and it’s not long before the total size of a machine’s base VDI plus snapshots exceeds the machine’s logical HDD size (a very good reason not to oversize your hard disks).

Here’s an example of the savings on one of my Windows XP guests (it has a 50GB logical HDD):

Before merge: Base VDI plus 13 snapshots totaling 90.1GB

After merge: Base VDI 46.0GB

After zeroing and compacting: 8.1GB (see Compacting VDIs below).

The merge almost halved the size; the compaction brought it to below 10% of the original total size.
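
The two ratios can be checked with a few lines of Python. The sizes are the figures from this example; the helper name percent_of is only for illustration.

```python
# Sizes in GB, taken from the example above.
before_merge = 90.1   # base VDI plus 13 snapshots
after_merge = 46.0    # base VDI after merging
after_compact = 8.1   # base VDI after zeroing and compacting

def percent_of(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)

merge_pct = percent_of(after_merge, before_merge)
compact_pct = percent_of(after_compact, before_merge)

print("after merge: %s%% of the original total" % merge_pct)
print("after compaction: %s%% of the original total" % compact_pct)
```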

Compacting without first zeroing out free space on the guest file system generally provides little or no benefit.

If you’re unfamiliar with the workings of VirtualBox snapshots and VDIs there is a great set of FAQs and Tutorial: All about VDIs in the VBox forums.

Merging the Snapshots

To merge snapshots into the base VDI just run the script specifying the machine name e.g.

python vboxmerge.py "My Machine"

If the machine’s snapshot tree has multiple branches you will need to run vboxmerge once for each branch to merge the entire tree. The final machine state will be that of the last merged branch.
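
To see why each branch needs its own run, here is a toy model of the walk the script performs: start at the current snapshot and follow parent links back to the root. The Snapshot class and the merge_order function are made up for this sketch and are not part of the VirtualBox API; the skip and count parameters imitate the script's --skip and --count options.

```python
class Snapshot(object):
    """Toy stand-in for a snapshot: a name plus a link to its parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

def merge_order(current, skip=0, count=1000):
    """List the snapshot names visited from `current` back to the root,
    ignoring the `skip` most recent and returning at most `count`."""
    names = []
    snap = current
    while snap:
        if skip <= 0 and count > 0:
            names.append(snap.name)
            count -= 1
        snap = snap.parent
        skip -= 1
    return names

# A tree with two branches that share the snapshot "s1".
s1 = Snapshot("s1")
a2 = Snapshot("a2", s1)
a3 = Snapshot("a3", a2)
b2 = Snapshot("b2", s1)

print(merge_order(a3))          # branch A, most recent first
print(merge_order(b2))          # branch B is only reached from its own tip
print(merge_order(a3, skip=1))  # leave the most recent snapshot alone
```

The model only shows the visiting order. It says nothing about which deletions VirtualBox will accept: the script stops with a "more than one child VDI" error when a snapshot survives the delete call.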

Important: Back up all your virtual machines before running the script and do dry runs first (using the --dryrun command option).

Gotcha: If the machine being processed is selected in the VirtualBox GUI then the GUI sometimes throws an error or stops responding. No damage results, and the problem can be avoided by selecting another machine in the GUI before you run vboxmerge.

Compacting VDIs

Merging snapshots removes redundant shadowed blocks and will return a lot of space, but it doesn’t return blocks that have previously been written but are no longer used by the guest file system. VirtualBox processes VDIs as block devices; it knows nothing about files and file systems. The VirtualBox compact command compacts blocks of zeros, so for compaction to be effective you need to zero free space from within the guest operating system before compacting the VDI using VirtualBox.
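
A small simulation makes the point. It is only a sketch of the idea, not of VirtualBox's actual compaction code: the block size is an arbitrary illustrative value and count_blocks is a made-up helper. A block-level tool cannot tell a deleted file's leftover data from live data, but it can recognise a block that contains nothing but zeros.

```python
BLOCK_SIZE = 4096  # illustrative only, not the real VDI block size

def count_blocks(image, block_size=BLOCK_SIZE):
    """Return (total, all_zero) block counts for a bytes object."""
    total = all_zero = 0
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        total += 1
        if block == b"\0" * len(block):
            all_zero += 1
    return total, all_zero

live = b"A" * BLOCK_SIZE     # block still in use by the guest
stale = b"B" * BLOCK_SIZE    # deleted file whose old contents remain on disk
zeroed = b"\0" * BLOCK_SIZE  # the same block after zeroing free space

before_zeroing = count_blocks(live + stale)
after_zeroing = count_blocks(live + zeroed)

print("before zeroing: %d blocks, %d reclaimable" % before_zeroing)
print("after zeroing:  %d blocks, %d reclaimable" % after_zeroing)
```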

Windows: Use sdelete, for example to zero all unused space on drive C:

sdelete.exe -c C:

Linux: A zeroing utility for ext2 and ext3 file systems is zerofree written by Ron Yorston. Your file system must be unmounted or mounted read-only before using this utility.
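
If neither tool is available, a cruder approach that does not depend on the file system is to fill the free space with a file of zeros and then delete it. The sketch below is not part of vboxmerge.py and is not how sdelete or zerofree work; the function name zero_fill, the file name zero.fill and the limit parameter are all assumptions made for illustration. With limit=None it writes until the disk is full, so only run it that way inside a guest you have backed up.

```python
import errno
import os
import tempfile

def zero_fill(directory, limit=None, chunk_size=1024 * 1024):
    """Write zeros to a temporary file in `directory`, then delete it.

    Stops after `limit` bytes, or when the disk is full if limit is None.
    Returns the number of bytes handed to write().
    """
    path = os.path.join(directory, "zero.fill")  # hypothetical file name
    chunk = b"\0" * chunk_size
    written = 0
    try:
        try:
            with open(path, "wb") as f:
                while limit is None or written < limit:
                    size = chunk_size if limit is None else min(chunk_size, limit - written)
                    f.write(chunk[:size])
                    written += size
                f.flush()
                os.fsync(f.fileno())  # push the zeros out to the virtual disk
        except (IOError, OSError) as e:
            if e.errno != errno.ENOSPC:  # "disk full" is the expected way to stop
                raise
    finally:
        if os.path.exists(path):
            os.remove(path)
    return written

# Harmless demonstration: write and remove 3 MB in a scratch directory.
demo_dir = tempfile.mkdtemp()
demo_written = zero_fill(demo_dir, limit=3 * 1024 * 1024)
os.rmdir(demo_dir)
print("wrote and removed %d bytes of zeros" % demo_written)
```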

If you decide to compact, zero out the free space after merging (if you do it before merging, the zeroing will balloon the most recent snapshot and the subsequent merge will take much longer). The steps are:

  1. Merge all snapshots, for example:

    python vboxmerge.py "My Machine"
    
  2. Open the merged guest machine and use an appropriate command to zero fill unused disk space.

  3. Compact the base VDI and create a base snapshot, for example:

    python vboxmerge.py --compact --snapshot "My Machine"
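
If you repeat these steps for several machines, the command lines can be generated rather than typed. The sketch below only builds the argument lists, using the options shown in this post, and puts a dry run in front of the real merge. The helper names vboxmerge_cmd and workflow are invented here, and it assumes vboxmerge.py is in the current directory.

```python
def vboxmerge_cmd(machine, *options):
    """Build the argument list for one vboxmerge.py invocation."""
    return ["python", "vboxmerge.py"] + list(options) + [machine]

def workflow(machine):
    """Return the commands to run before and after zeroing free space
    inside the guest (step 2, which has to be done by hand)."""
    before_zeroing = [
        vboxmerge_cmd(machine, "--dryrun"),  # rehearse the merge first
        vboxmerge_cmd(machine),              # step 1: merge all snapshots
    ]
    after_zeroing = [
        vboxmerge_cmd(machine, "--compact", "--snapshot"),  # step 3
    ]
    return before_zeroing, after_zeroing

before_cmds, after_cmds = workflow("My Machine")
for cmd in before_cmds + after_cmds:
    print(cmd)
```

Each list can be passed to subprocess.check_call to actually run the command.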
    

It would be nice if file systems had some sort of zero after delete option to zero free disk space automatically (see zerofree).

Why create a base snapshot?: Because I never like to write directly to the base VDI – creating a snapshot effectively leaves the base VDI read-only so even if your machine crashes half way through writing and corrupts the current state you can restore back to the base VDI.

Prerequisites

This is the environment I used to develop and test vboxmerge.py:

You will also need to download and install Python for Windows extensions (in my case 64bit).

Set up the Python VirtualBox bindings wrapper module (vboxapi) which is installed by the VirtualBox installer:

Test your environment by running Python and executing:

>>> import vboxapi
>>> vb = vboxapi.VirtualBoxManager(None,None)

If there are no errors you’re OK.

To print a list of your VM names and UUIDs execute this:

>>> for m in vb.getArray(vb.vbox,'machines'):
...     print m.name, m.id
...

The vboxmerge.py script

Important: The script was developed and tested with VirtualBox 3.2.4; it might not work with other versions of VirtualBox (see Prerequisites).

#!/usr/bin/env python
'''
vboxmerge.py - Merge VirtualBox snapshots into base VDI

Run 'python vboxmerge.py --help' to display command options.

Written by Stuart Rackham, <srackham@gmail.com>
Copyright (C) 2010 Stuart Rackham. Free use of this software is
granted under the terms of the MIT License.
'''
import os, sys
import vboxapi
import pywintypes

PROG = os.path.basename(__file__)
VERSION = '0.1.1'
VBOX = vboxapi.VirtualBoxReflectionInfo(False)  # VirtualBox constants.


def out(fmt, *args):
    if not OPTIONS.quiet:
        sys.stdout.write((fmt % args))

def die(msg, exitcode=1):
    OPTIONS.quiet = False
    out('ERROR: %s\n', msg)
    sys.exit(exitcode)

def runcmd(async_cmd, *args):
    '''
    Run the bound asynchronous method async_cmd with arguments args.
    Display progress and return once the command has completed.
    If an error occurs print the error and exit the program.
    '''
    if not OPTIONS.dryrun:
        try:
            progress = async_cmd(*args)
            while not progress.completed:
                progress.waitForCompletion(30000)   # Update progress every 30 seconds.
                out('%s%% ', progress.percent)
            out('\n')
        except pywintypes.com_error as e:
            die(e.args[2][2])   # Print COM error textual description and exit.

def vboxmerge(machine_name):
    '''
    Merge snapshots using global OPTIONS.
    '''
    vbm = vboxapi.VirtualBoxManager(None, None)
    vbox = vbm.vbox
    try:
        mach = vbox.findMachine(machine_name)
    except pywintypes.com_error:
        die('machine not found: %s' % machine_name)
    out('\nmachine: %s: %s\n', mach.name, mach.id)
    if mach.state != VBOX.MachineState_PoweredOff:
        die('machine must be powered off')
    session = vbm.mgr.getSessionObject(vbox)
    vbox.openSession(session, mach.id)
    try:
        snap = mach.currentSnapshot
        if snap:

            if OPTIONS.discard_currentstate:
                out('\ndiscarding current machine state\n')
                runcmd(session.console.restoreSnapshot, snap)

            skip = int(OPTIONS.skip)
            count = int(OPTIONS.count)
            while snap:
                parent = snap.parent
                if skip <= 0 and count > 0:
                    out('\nmerging: %s: %s\n', snap.name, snap.id)
                    runcmd(session.console.deleteSnapshot, snap.id)

                    # The deleteSnapshot API sometimes silently skips snapshots
                    # so test to make sure the snapshot is no longer valid.
                    try: snap.id
                    except pywintypes.com_error: pass
                    else:
                        if not OPTIONS.dryrun:
                            die('%s: %s: more than one child VDI' % (snap.name, snap.id))

                    count -= 1
                snap = parent
                skip -= 1
        else:
            out('no snapshots\n')

        if OPTIONS.snapshot:
            # Create a base snapshot.
            out('\ncreating base snapshot\n')
            runcmd(session.console.takeSnapshot, 'Base', 'Created by vboxmerge')

        if OPTIONS.compact:
            # Compact the base VDI.
            for attachment in mach.mediumAttachments:
                if attachment.type == VBOX.DeviceType_HardDisk:
                    base = attachment.medium.base
                    if base.type == VBOX.MediumType_Normal:
                        out('\ncompacting base VDI: %s\n', base.name)
                        runcmd(base.compact)
    finally:
        session.close()


if __name__ == '__main__':
    description = '''Merge VirtualBox snapshots into base VDI. MACHINE is the machine name.'''
    from optparse import OptionParser
    parser = OptionParser(usage='%prog [OPTIONS] MACHINE',
        version='%s %s' % (PROG,VERSION),
        description=description)
    parser.add_option('--skip', dest='skip',
        help='skip most recent N snapshots', metavar='N', default=0)
    parser.add_option('--count', dest='count',
        help='only merge N snapshots', metavar='N', default=1000)
    parser.add_option('-q', '--quiet',
        action='store_true', dest='quiet', default=False,
        help='do not display progress messages')
    parser.add_option('-n', '--dryrun',
        action='store_true', dest='dryrun', default=False,
        help='do nothing except display what would be done')
    parser.add_option( '--compact',
        action='store_true', dest='compact', default=False,
        help='compact the base VDI')
    parser.add_option( '--snapshot',
        action='store_true', dest='snapshot', default=False,
        help='create base snapshot')
    parser.add_option( '--discard-currentstate',
        action='store_true', dest='discard_currentstate', default=False,
        help='discard the current state of the MACHINE')
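    if len(sys.argv) == 1:
        parser.parse_args(['--help'])
    (OPTIONS, args) = parser.parse_args()
    # Exit with a usage message instead of an IndexError when MACHINE is missing.
    if len(args) != 1:
        parser.error('exactly one MACHINE argument is required')
    vboxmerge(args[0])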
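
The polling loop in runcmd can be tried out without VirtualBox installed by swapping in a stand-in for the progress object. FakeProgress below is entirely hypothetical: it imitates only the three members that runcmd uses (completed, percent and waitForCompletion) and advances one step per call instead of blocking. The name wait_and_report is also made up for this sketch.

```python
class FakeProgress(object):
    """Hypothetical stand-in for the progress object returned by the
    asynchronous VirtualBox calls; not part of vboxapi."""
    def __init__(self, steps):
        self.steps = steps
        self.done = 0

    @property
    def completed(self):
        return self.done >= self.steps

    @property
    def percent(self):
        return 100 * self.done // self.steps

    def waitForCompletion(self, timeout_ms):
        # The real call waits up to timeout_ms; the stand-in just
        # pretends one more step of work has finished.
        self.done += 1

def wait_and_report(progress, timeout_ms=30000):
    """Poll until the operation completes; return each percentage seen."""
    seen = []
    while not progress.completed:
        progress.waitForCompletion(timeout_ms)
        seen.append(progress.percent)
    return seen

reported = wait_and_report(FakeProgress(4))
print(reported)
```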

