Booting Windows on different hardware using iSCSI

This was one of the hardest problems I have ever worked on. It took me almost 2 years to get around it. I solved it a month back and only got time to write about it today.

Think of this:

1 iSCSI target, 1 Windows image to serve over TFTP, and multiple nodes that boot from it. It works fine as long as all the nodes have exactly the same config. Change the network card of one node and it will fail to boot. That’s fine: Windows needs the network adapter to boot over iSCSI, and if there is no driver for the different adapter, it will die. Makes sense.

The problem was that even after installing the different drivers [which is not so straightforward either], Windows won’t boot. There are some LWF [lightweight filter] binding issues that won’t let it happen. Theoretically, if you get rid of these bindings, it should work. It took me a long time to figure out all the parts of it.

In the end, I was able to boot over iSCSI even after changing the network adapter! So, here are my notes; they might save someone some time. There are products like CCBoot that have already solved this problem, but all of these tools are commercial.

There are 2 main parts to think of, and each has multiple ways to solve it:

1] install drivers offline

2] messing with the registry

 

1] Installing the device drivers offline –

There are a few ways of doing this. Only 1 worked for me.

a] You could figure out where to put the driver files and registry entries all by yourself and code a client. I don’t know how to do this.

b] There is a tiny utility called devcon.exe that installs drivers offline, as in on any raw/vhd/qcow2/wim image. I don’t remember exactly, but devcon installs the drivers under a different enumeration key in the registry. [Yeah, that is related to the problem.]

c] You can have a sysprepped image, or a Windows PE based one for newer versions of Windows. The important part here is that you have to configure sysprep/WinPE so that on first boot the image boots into setup and not directly to the desktop. It should enter setup mode, install the drivers and reboot.

d] You can create a deployment image using SCCM and configure a task to install drivers during deployment. Alternatively, you can simply use dism.exe (SCCM uses the same anyway) to install drivers in offline mode. In my personal experience, this worked best: I have seen it install the drivers, and the registry entries etc. were all placed correctly (unlike with devcon.exe).

Use any method (I’d prefer dism.exe) to get the image in place.
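To make option d] a bit more concrete, here is a minimal Python sketch that only builds the dism.exe command lines for mounting an image, injecting drivers offline and committing the result. The paths (C:\images\base.wim, C:\mount, C:\drivers) and the function names are made up for illustration. The switches are the ones from newer DISM versions; on Windows 7 the mount and unmount switches are called /Mount-Wim and /Unmount-Wim instead.

```python
import subprocess


def dism_mount_cmd(image_file, index, mount_dir):
    """Command that mounts one image index from a .wim file into mount_dir."""
    return [
        "dism.exe",
        "/Mount-Image",
        f"/ImageFile:{image_file}",
        f"/Index:{index}",
        f"/MountDir:{mount_dir}",
    ]


def dism_add_driver_cmd(mount_dir, driver_dir):
    """Command that injects every .inf driver found under driver_dir."""
    return [
        "dism.exe",
        f"/Image:{mount_dir}",
        "/Add-Driver",
        f"/Driver:{driver_dir}",
        "/Recurse",  # walk sub-folders of driver_dir as well
    ]


def dism_unmount_cmd(mount_dir):
    """Command that unmounts the image and keeps the changes."""
    return ["dism.exe", "/Unmount-Image", f"/MountDir:{mount_dir}", "/Commit"]


def inject_drivers(image_file, index, mount_dir, driver_dir, runner=subprocess.run):
    """Run mount, add-driver and unmount in order; stop on the first failure."""
    steps = [
        dism_mount_cmd(image_file, index, mount_dir),
        dism_add_driver_cmd(mount_dir, driver_dir),
        dism_unmount_cmd(mount_dir),
    ]
    for cmd in steps:
        runner(cmd, check=True)
    return steps


if __name__ == "__main__":
    # Hypothetical paths; this only prints the commands, it does not run them.
    for cmd in (
        dism_mount_cmd(r"C:\images\base.wim", 1, r"C:\mount"),
        dism_add_driver_cmd(r"C:\mount", r"C:\drivers"),
        dism_unmount_cmd(r"C:\mount"),
    ):
        print(" ".join(cmd))
```

Calling inject_drivers(...) on a Windows machine with an elevated prompt would actually run the three steps. The runner parameter is only there so the sequence can be dry-run or tested without DISM being present.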

2] Messing with registry

There is something called a lightweight filter (LWF) driver that gets bound to the network card. A few things one should know:

– These bindings are formed if you do a standalone install. If you install directly on an iSCSI drive, you won’t see them.

– As far as I could see, they were also formed after a new network device got installed, even if the system had been on an iSCSI drive all along.

– Once you get rid of those bindings for a specific machine, they won’t regenerate if the service is disabled.

Thanks to Chau Chee Yang, I know how to get rid of the bindings. His method works perfectly:

http://chee-yang.blogspot.in/2012/05/migrate-windows-7-instance-to-iscsi.html

There is also one more change that I had to make.

Change the value of FilterRunType to 2 (which marks the filter as optional rather than mandatory) at

HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Network\{4d36e974-e325-11ce-bfc1-08002be10318}\{B70D6460-3635-4D42-B866-B8AB1A24454C}\Ndi
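The FilterRunType change above can be written down as a small regedit-style patch, so it does not have to be clicked through by hand every time. The sketch below only generates the text. It assumes FilterRunType is stored as a DWORD, and reg_dword_patch is a helper name I made up. I left out the usual "Windows Registry Editor" header line on purpose, since I am not sure every offline tool accepts it.

```python
# Key from the text above (with ControlSet001, since that is what exists offline).
FILTER_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Network"
    r"\{4d36e974-e325-11ce-bfc1-08002be10318}"
    r"\{B70D6460-3635-4D42-B866-B8AB1A24454C}\Ndi"
)


def reg_dword_patch(key, name, value):
    """Return regedit-style text that sets one DWORD value under one key."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"[{key}]\n" f'"{name}"=dword:{value:08x}\n'


if __name__ == "__main__":
    # Prints the two-line patch; redirect it into a .reg file to keep it.
    print(reg_dword_patch(FILTER_KEY, "FilterRunType", 2), end="")
```

Saved to a file, this is the kind of input that tools such as virt-win-reg take in merge mode.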

* THIS TOOK 2 YEARS * FACEPALM *

And now the fun part: making sure that the bindings are removed just before the machine boots.

CCBoot solves this with a boot-time driver. I have no idea how to build that. But hey! I know Python, and Python is awesome for one more reason called libguestfs. It has a subproject called hivex that lets you modify the Windows registry offline. More about it here: http://libguestfs.org/virt-win-reg.1.html

So when iPXE chainloads and sends an HTTP request back to your server, just make sure to use libguestfs to get rid of the bindings offline, and then serve that image.
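A rough sketch of that "patch, then serve" step in Python, assuming virt-win-reg is installed on the server and that the registry changes are already sitting in a .reg file. The file names and the function names are hypothetical. The only thing taken from the virt-win-reg man page is the --merge mode, which accepts a disk image in place of a domain name. The image should not be attached to a running client while it is being modified.

```python
import subprocess


def merge_cmd(image_path, reg_path):
    """Command line that merges a .reg file into the registry inside a disk image."""
    return ["virt-win-reg", "--merge", image_path, reg_path]


def patch_before_serving(image_path, reg_path, runner=subprocess.run):
    """Apply the registry patch offline; raise if virt-win-reg fails.

    Meant to be called from whatever handles the iPXE HTTP request,
    before the iSCSI target hands the image out to the client.
    """
    cmd = merge_cmd(image_path, reg_path)
    runner(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; prints the command instead of running it.
    print(" ".join(merge_cmd("/srv/iscsi/win7.img", "/srv/iscsi/remove-bindings.reg")))
```

The runner argument is there so the call can be swapped for a logger or a fake during testing. In real use the default subprocess.run does the work.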

This should work. I did most of the steps by hand, and I was experimenting on VirtualBox since I don’t have enough resources here. It did in fact work.

Some things that you should know about ControlSets:

ControlSet001 is the one that should be modified. It is the one that is in use. (If in doubt, the Current value under HKEY_LOCAL_MACHINE\SYSTEM\Select tells you which set is actually in use.)

CurrentControlSet is just like a symlink that points to 001. You won’t see it in offline mode.

ControlSet002 is the backup. It is used when you select the ‘Last known good configuration’ option in the boot menu.
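Since CurrentControlSet does not exist in an offline hive, any path copied from a running system has to be rewritten before it can be used offline. Here is a small Python sketch of that mapping. The helper names are mine, and the current number is whatever you read from the Current value under the Select key (1 in my case).

```python
import re

# Matches the "\CurrentControlSet" path component, in any letter case.
_CCS = re.compile(r"\\CurrentControlSet(?=\\|$)", re.IGNORECASE)


def control_set_name(current):
    """Turn the Select\\Current number into a key name such as ControlSet001."""
    if current < 1:
        raise ValueError("control set numbers start at 1")
    return f"ControlSet{current:03d}"


def to_offline_path(path, current=1):
    """Rewrite a live registry path so that it exists in an offline SYSTEM hive."""
    replacement = "\\" + control_set_name(current)
    return _CCS.sub(lambda _match: replacement, path)


if __name__ == "__main__":
    live = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class"
    print(to_offline_path(live))
```

Paths that already name a numbered control set are left alone, so it is safe to run every path through to_offline_path before writing it into a patch.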

Also, UUIDs of hardware change from machine to machine.

Drivers also change location in

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Class\{4D36E972-E325-11CE-BFC1-08002BE10318}

I had the Realtek at 0007. After I installed that other Intel PRO something, it took 0007 and the Realtek was moved to 0013, for reasons I have no idea about.
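Because the numbered subkeys move around like this, it is safer to look an adapter up by its DriverDesc value than to hard-code 0007 or 0013. The Python sketch below works on a plain dictionary that stands in for the class key. The adapter names in SAMPLE_CLASS_KEY are made-up placeholders, and how you load the real values (hivex, virt-win-reg export, etc.) is left open.

```python
def find_adapter_subkeys(class_key, needle):
    """Return the subkey names whose DriverDesc contains needle (case-insensitive).

    class_key maps subkey names ("0007", "0013", ...) to a dict of their values.
    Subkeys without a DriverDesc value (for example "Properties") are skipped.
    """
    needle = needle.lower()
    return sorted(
        name
        for name, values in class_key.items()
        if needle in values.get("DriverDesc", "").lower()
    )


# Hypothetical snapshot of the class key after the second NIC was installed.
SAMPLE_CLASS_KEY = {
    "0007": {"DriverDesc": "Intel PRO example adapter"},
    "0013": {"DriverDesc": "Realtek example adapter"},
    "Properties": {},
}

if __name__ == "__main__":
    print(find_adapter_subkeys(SAMPLE_CLASS_KEY, "realtek"))
```

The function returns a list because the same driver can show up under more than one subkey, and an empty list means the driver is not registered under that class at all.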

“””

Note:

Information provided here can be used for commercial or non-commercial use, with or without attribution. 

The information in this article along with the references from different sites, is provided as is, without any guarantee or warranty. I shall not be held responsible for any loss that may occur. I am not liable to provide any sort of support, paid or unpaid.

This article is strongly opinionated. This worked for me, it may or may not work for you. You are expected to learn from it and improve.

I would strongly suggest testing on a virtual machine.

“””


6 comments

  1. Hi, I am struggling with the same problem here… Why do you insist on having the bindings removed just before the machine boots? Shouldn’t removing them on a virtual machine once and for all work? Another thing: I found a nice solution for installing drivers offline, dpinst from the Windows Driver Kit. What do you think of it?

    • Hi Ivan,

      Getting rid of the bindings every time was required in my particular case. It was to ensure that the image will boot even if the hardware is completely new to the base image. If skipping it works in your case, then maybe you can, but I doubt that will work. As far as I remember, those bindings regenerate on the first boot on new devices, and hence you will get the BSOD from the second boot onwards.

      You can test it on a VM. Get rid of the bindings and reboot on different hardware, which should work. Then reboot again and it will fail, even on the same device. At this point, getting rid of them once from the backend may get it working forever for this particular new device, but in my case this sort of adjustment was not allowed.

      I used dpinst in the very early days of this problem. I don’t recollect what exactly happened, and somehow in the myriad combinations I missed using it again. Check whether it registers the device in HKLM\system\ControlSet001\Enum or creates a new enum alongside called “root”. If it adds it to root, then it’s likely not what you want.

      See if this makes sense and you can get it working. Get back to me otherwise and we will look at it in detail.

      • Wow, I haven’t seen your reply at first and thought your blog was dead. My mistake =(. Can we discuss this problem somewhere more efficient? I’m developing a diskless boot server for both windows and linux guests and this problem is the weakest part of whole idea. My contact email is stein[dot]freel[at]gmail[dot]com

  2. Hi,
    Do I need to keep the PCI slot same on both device for this method to work ? I have PCI 224 on the machine VM and 256 on the machine where I am booting using the iSCSI to source.

    • You shouldn’t really have to. Also, check for the updated patch MS released in May 2015. With it, you probably don’t have to do the registry hacks yourself.

  3. Unfortunately, it did not work for me with different PCI unless the registry files are already there. Can you please elaborate on the patch you are talking about ?

