Come on, vagrant up! Saving Vagrant images that don't get a NAT address

Andrew Phillips

As part of testing and demonstrating our advanced deployment automation1 platform Deployit, we at XebiaLabs use a lot of cloud and DevOps tooling: we need to be able to handle all the different types of middleware we support, as well as the build, CI and Ops tooling with which we integrate2.

I was recently setting up a Vagrant3 environment to demonstrate Deployit's Puppet module, which automatically registers new Puppet-provisioned middleware with your deployment automation platform to enable application-tier deployments to it. In the process, I ended up wrestling for quite some time with a tricky VirtualBox problem.

The issue in question has been around for over two years now, and relates to VirtualBox's DHCP server sometimes, under as-yet-undetermined circumstances, failing to allocate an IP address to the NAT interface.
Since all Vagrant-managed images get a NAT interface by default4, this is more than a little inconvenient: Vagrant simply hangs during the VM configuration phase.

Since the problem doesn't occur deterministically, one way to work around this issue is simply to avoid having to reboot the image: play the "NAT lottery" until you get lucky, killing the VBoxManage process if the image is hanging and trying vagrant up again. Once the image is up, run vagrant suspend rather than vagrant halt, and you can resume the image when you need it.
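For what it's worth, the lottery loop can be sketched as a small shell function. This is a hedged sketch, not part of my actual setup: the commands to run and to use for cleanup are passed in as parameters so the retry logic stands on its own, and the pkill target shown in the comment is an assumption you'd adjust to however you kill the stuck process on your host.

```shell
# Sketch of the "NAT lottery" loop described above. On a real host you would
# call it along the lines of:
#   retry_up "vagrant up" "pkill -f VBoxHeadless" 5
retry_up() {
    up_cmd=$1; kill_cmd=$2; max=$3
    i=0
    while [ "$i" -lt "$max" ]; do
        if $up_cmd; then
            echo "box is up after $((i + 1)) attempt(s)"
            return 0
        fi
        $kill_cmd    # clean up the hung attempt before retrying
        i=$((i + 1))
    done
    echo "gave up after $max attempts" >&2
    return 1
}
```

Once the box is up, vagrant suspend / vagrant resume instead of vagrant halt keeps you from having to play the lottery again on the next boot.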

That will work, but I wasn't particularly happy with this approach because, aside from me not liking the idea of repeatedly killing hypervisor processes (I'm somewhat of a pacifist in this regard ;-)), it effectively "cripples" Vagrant: the ease with which you can start, stop and re-configure images is precisely one of the things that makes Vagrant so useful!

One of the things I quickly discovered is that, if you start a Vagrant-created image via the VirtualBox UI and it experiences the problem5, cycling the NAT adapter with ifdown eth0 && ifup eth0 fixes things: this second time, it is able to pick up an IP address from the DHCP server.

Unfortunately, this doesn't get you far with an image that Vagrant itself is trying to start: Vagrant creates headless sessions, so you can't actually access them through the VirtualBox UI until you've killed Vagrant and the VBoxHeadless process it starts.
Edited April 24th to add: see Richard Pot's comment for instructions on how to start boxes in GUI mode.

Luckily, VirtualBox allows you to execute commands on the guest OS without having to use the UI, via VBoxManage's guestcontrol command. So when Vagrant was again hanging while waiting to connect to the image, the first thing I tried was

/path/to/vboxmanage guestcontrol my-vagrant-image exec "/usr/bin/sudo" --username vagrant --password vagrant --verbose --wait-stdout ifdown eth0
/path/to/vboxmanage guestcontrol my-vagrant-image exec "/usr/bin/sudo" --username vagrant --password vagrant --verbose --wait-stdout ifup eth0

That did, as hoped, allow the NAT adapter to pick up an IP address. Unfortunately, it also confused Vagrant, which (presumably thinking that the image had gone offline) quit.

Happily, you don't have to bring down the adapter to request a new IP address: dhclient will do just as well. And indeed

/path/to/vboxmanage guestcontrol my-vagrant-image exec "/usr/bin/sudo" --username vagrant --password vagrant --verbose --wait-stdout dhclient

works: the NAT adapter picks up an IP address and, after a few seconds, Vagrant continues with the image configuration.
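To save retyping that incantation every time a box hangs, the fix can be wrapped in a tiny script. This is a sketch under the same assumptions as the commands above (the VM name, the vagrant/vagrant credentials, and the old VirtualBox guestcontrol exec syntax); to stay on the safe side it only prints the command it would run until you uncomment the last line.

```shell
#!/bin/sh
# Sketch: kick dhclient inside a hanging Vagrant box via VBoxManage
# guestcontrol. The VM name and credentials are assumptions; edit to taste.
VM_NAME=${1:-my-vagrant-image}

# Build the guestcontrol invocation as a string so it can be inspected first.
nat_fix_cmd() {
    printf '%s' "VBoxManage guestcontrol $1 exec \"/usr/bin/sudo\"" \
                " --username vagrant --password vagrant" \
                " --verbose --wait-stdout dhclient"
}

echo "Would run: $(nat_fix_cmd "$VM_NAME")"
# Uncomment the next line to actually run it:
# eval "$(nat_fix_cmd "$VM_NAME")"
```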

Something to hopefully help out, even if it indeed takes another couple of years to get to the bottom of the actual issue ;-)

Footnotes

  1. Or Application Release Automation (ARA), if you follow Gartner
  2. Check the platform support page for details
  3. For those not familiar with Vagrant, it's a powerful tool (written in Ruby) that allows you to declaratively define multiple related virtual images based on templates called 'boxes'. Vagrant orchestrates the interaction with VirtualBox to give you a very simple way of stopping, starting and configuring a cluster of images. In that sense, it's a little like a VirtualBox-based CloudFormation.
  4. That's how Vagrant communicates with the image while configuring it
  5. You'll know because ifconfig will show that the eth0 adapter does not have an IPv4 address
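The check from footnote 5 can also be scripted. A minimal sketch, assuming the guest uses the old net-tools ifconfig output format ("inet addr:...") that images of that era printed, and that eth0 is the NAT adapter:

```shell
# Sketch: does this ifconfig output show an IPv4 address? Matches the old
# net-tools format ("inet addr:10.0.2.15").
has_ipv4() {
    printf '%s\n' "$1" | grep -q 'inet addr:'
}

# Typical use on the guest:
#   if ! has_ipv4 "$(ifconfig eth0)"; then echo "no NAT address on eth0"; fi
```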

Comments (3)

  1. Richard Pot

    April 24, 2012 at 9:22 am

    If you add "config.vm.boot_mode = :gui" to your Vagrantfile you get a VirtualBox GUI. There you can log on to the new VM and execute the commands.

    I've also seen some Vagrant / NAT problems if my host machine has no network connection. The "vagrant up" is then stuck at the same command. Executing a "vagrant ssh" in another terminal gets the connection to the new VM. Perhaps this also works in your scenario.

  2. Andrew Phillips

    April 24, 2012 at 4:43 pm

    @Richard: > If you add “config.vm.boot_mode = :gui” to your Vagrantfile you get a Virtualbox GUI

    Eek. Thanks! RTFM ;-) And no, vagrant ssh unfortunately didn't work for me. Or more accurately: connecting with PuTTY didn't work (vagrant ssh isn't supported on Windows) since the machine is not yet listening at the NAT address.

  3. Koaps

    August 6, 2013 at 10:42 am

    Heya,

    I ran into the same issues. The basebox had issues booting by itself, which was fixed by blanking and disabling the udev rules:

    # Make sure Udev doesn't block our network
    # http://6.ptmc.org/?p=164
    echo "cleaning up udev rules"
    rm /etc/udev/rules.d/70-persistent-net.rules
    # creating a directory with the rules file's name stops udev re-generating it
    mkdir /etc/udev/rules.d/70-persistent-net.rules
    rm -rf /dev/.udev/
    rm /lib/udev/rules.d/75-persistent-net-generator.rules

    When I was trying to build a cluster, spinning up multiple VMs from a Vagrantfile, it started failing again.

    I found that the issue for me was that the MAC address and UUID (probably the real issue) were still present in ifcfg-eth0 in the basebox image.

    I removed them and restarted networking, and vagrant up completed OK.

    Wanted to throw this in since this post seemed like the best one out there on the subject.

    Thanks for the other tips, was helpful getting access to the headless VM.
