Wednesday, 19 June 2013

Hiatus

Between the launch of Z87, multiple projects at work and recently starting an IT degree, I haven't had much time to put into blogging.

Lately I've also had some personal projects, such as building a reasonably sized Litecoin/Bitcoin farm. So far we are up to 7000 KH/s, with another 2500 coming within a few weeks. I also have an upcoming article for ABC Tech covering cryptocurrency, where I hope to come at it from a different perspective than most other writers by looking at the community from the inside out.

I have a backlog of blog posts to write, and the next should be up within a few weeks (I hope).

Cheers

Tuesday, 30 April 2013

Microsoft Failover Cluster CSV Volume Disappear

We recently began experiencing some problems with the 3rd member of our Windows Failover Cluster. Our cluster consists of 3 servers running 2008 R2 SP1 and Failover Cluster Manager, with a SAN backend. This SAN presents a number of Cluster Shared Volumes (CSVs) to the servers, and all of our data sits on these CSVs.

One afternoon our primary CSV went into redirected mode. This is a normal occurrence during backup operations, but no backup was scheduled and we were not able to turn off redirected mode. We had to schedule a short outage, fully power off the Hyper-V hosts and power them back on. After a full investigation turned up nothing, we put this down to an anomaly. Three weeks later the problem happened again, so this time we scheduled a longer outage to investigate more thoroughly.

During testing we discovered that one of the 3 hosts was causing the issue. When it was removed from the cluster there were no problems; when it was in the cluster, the CSV in question would randomly go into redirected mode. Logs from the SAN and Hyper-V hosts turned up nothing, and all the cluster tests passed perfectly.

Unfortunately, during our testing we encountered a bigger problem. When bringing the faulty host back online for the 3rd time, the CSV itself disappeared on the 2 healthy hosts, although it was still visible on the 3rd host. We promptly removed the 3rd host from the cluster, but the CSV did not reappear on the 2 healthy hosts.




What didn't work

We tried a number of things to get the volume to reappear:
  • Rescanning/refreshing in Disk Manager
  • Deleting and re-adding the CSV
  • Repairing the CSV
  • Restarting the Hyper-V hosts
  • Removing the faulty host from the SAN LUN zones
At this point we were a little worried: our primary CSV was displayed in Windows as an empty disk (as above). With the Failover Cluster tools we checked the DiskSignature and were greeted with a grim 0.

Command: cluster resource VMs /priv
D  VMs                  DiskSignature                  0 (0x0)

Scanning the FailoverClustering event logs we turned up the following events:

Event ID: 1034
Source: FailoverClustering
Cluster physical disk resource 'VMs' cannot be brought online because the associated disk could not be found. The expected signature of the disk was 'F62FC592'. If the disk was replaced or restored, in the Failover Cluster Manager snap-in, you can use the Repair function (in the properties sheet for the disk) to repair the new or restored disk. If the disk will not be replaced, delete the associated disk resource.

and

Event ID: 1568
Source: FailoverClustering
Cluster disk resource 'VMs' found the disk identifier to be stale. This may be expected if a restore operation was just performed or if this cluster uses replicated storage. The DiskSignature or DiskUniqueIds property for the disk resource has been corrected.

This was repeated over and over: the cluster was trying to repair the problem but not having any success.
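The expected signature quoted in the first event is worth capturing, as it is the value the cluster still wants to see on the disk. As a minimal sketch (not something we ran at the time), the snippet below pulls it out of the event text with a regular expression. The function name and the pattern are my own assumptions, based only on the message wording shown above.

```python
import re

# Matches the quoted hex signature in the "disk could not be found" event text.
# The pattern is an assumption based on the message wording quoted above.
SIGNATURE_RE = re.compile(r"expected signature of the disk was '([0-9A-Fa-f]{1,8})'")


def expected_signature(event_message):
    """Return the expected disk signature as upper-case hex, or None if absent."""
    match = SIGNATURE_RE.search(event_message)
    return match.group(1).upper() if match else None


message = (
    "Cluster physical disk resource 'VMs' cannot be brought online because "
    "the associated disk could not be found. The expected signature of the "
    "disk was 'F62FC592'."
)
print(expected_signature(message))
```

Run against an exported event log, this gives you the value to compare with the DiskSignature reported by the cluster resource.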




The Solution

After reading this thread, we noticed that in the last post a user mentioned "a Microsoft tech fixed the problem, the disk first sector was corrupted". We decided a partition table scan and re-write were worth a shot.

Using testdisk I was able to successfully recover the volume by first analyzing the disk for partitions then writing the changes.

I then re-wrote the disk signature (which I found in the FailoverClustering logs, as per above) to the volume using the below command.

CLUSTER RESOURCE VMs /priv DiskSignature=0xF62FC592

The volume then successfully came online, phew, and all within my outage window!
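For context on why a damaged first sector can produce a signature of 0: on an MBR disk the 32-bit disk signature is stored in the first sector at offset 0x1B8, and a valid MBR ends with the bytes 55 AA. The sketch below assumes the CSV was an MBR disk (the 8-digit signature suggests so, but I did not confirm it this way) and works on a synthetic sector, not on data from the real volume. The function name is hypothetical.

```python
import struct

MBR_SIZE = 512
SIGNATURE_OFFSET = 0x1B8    # 4-byte disk signature, stored little-endian
BOOT_MARKER_OFFSET = 0x1FE  # a valid MBR ends with the bytes 55 AA


def mbr_disk_signature(sector):
    """Return the MBR disk signature as 8 hex digits, or None if the
    sector does not look like a valid MBR."""
    if len(sector) < MBR_SIZE:
        return None
    if sector[BOOT_MARKER_OFFSET:BOOT_MARKER_OFFSET + 2] != b"\x55\xaa":
        return None
    (signature,) = struct.unpack_from("<I", sector, SIGNATURE_OFFSET)
    return "%08X" % signature


# Synthetic first sector for illustration only (not read from the real disk).
sector = bytearray(MBR_SIZE)
struct.pack_into("<I", sector, SIGNATURE_OFFSET, 0xF62FC592)
sector[BOOT_MARKER_OFFSET:BOOT_MARKER_OFFSET + 2] = b"\x55\xaa"
print(mbr_disk_signature(bytes(sector)))
```

A sector that has been zeroed or overwritten fails the 55 AA check, which is consistent with Windows showing the disk as empty.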


Monday, 8 April 2013

DPM 2012 SP1 replica inconsistent - datasource is owned by a different DPM server

Recently we took the leap of faith to DPM 2012 SP1 + Update Rollup 1. SP1 offers proper compatibility with SQL 2012 and Server 2012, products we have begun using within our organization. The initial install and management went without a hitch; in fact, it went eerily well.

The following month, however, wasn't such smooth sailing. Within a few days a number of SQL data sources belonging to two different protection groups began failing with "DPM could not run the backup/recovery job for the data source because it is owned by a different DPM server.". The error description went on to say "Owner DPM Server: .", claiming that "." owned the DPM job.

This was an unusual error to receive, as there has only ever been a single DPM server within the organization, so the possibility of another DPM server owning the job was highly unlikely.


The problem in detail

The 5 data sources that were failing were all SharePoint data sources. We are using a SharePoint 2010 Farm protection group (PG) and backing up any SQL resources that aren't covered in this PG with a simple SQL PG. "Sharepoint_Config" was one of the failing resources, as well as 4 SQL jobs: "Application_Registry_Service", "Bdc_Service_DB", "Managed Metadata Service" and "PerformancePoint Service Application".

DPM complained that the "Replica is inconsistent" and attached the following detailed error description:
"The replica of SQL Server 2008 database SERVER\Application_Registry_Service on server.contoso.internal is inconsistent with the protected data source. All protection activities for data source will fail until the replica is synchronized with consistency check. You can recover data from existing recovery points, but new recovery points cannot be created until the replica is consistent.
For SharePoint farm, recovery points will continue getting created with the databases that are consistent. To backup inconsistent databases, run a consistency check on the farm. (ID 3106)
DPM could not run the backup/recovery job for the data source because it is owned by a different DPM server.
Data Source: SERVER\Application_Registry_Service
Owner DPM Server: . (ID 3184 Details: The file or directory is corrupted and unreadable (0x80070570))"

DPM suggests taking ownership. I attempted this and re-ran the consistency check, and within 5 minutes received the error messages again.

Our logs were also complaining of communication problems, which we initially put down to network issues, but this theory was quickly debunked as other data sources on the same server were successfully backing up.
FsmBlock.cs(178)        2DE6593E-B086-4002-9205-0A57B65BDC8E    WARNING    Backup.DeltaDataTransferLoop.CommonLoop : RAReadDatasetDelta, StatusReason = Error (StatusCode = -2147023671, ErrorCode = CommunicationProblem, workitem = 70aeaa93-b090-4a3c-bea0-c6fd1a1b4625)
01    TaskExecutor.cs(843)        2DE6593E-B086-4002-9205-0A57B65BDC8E    FATAL    Task stopped (state=Failed, error=RmAgentCommunicationError; -2147023671; WindowsHResult),
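The log reports the status as a signed decimal (-2147023671), while the DPM console shows codes in hex (0x80070570), which makes them awkward to compare or search for. As a small sketch, the helper below converts either form and splits out the facility and code fields using the standard HRESULT layout. The function name is my own, and nothing here is DPM-specific.

```python
def decode_hresult(value):
    """Split a (possibly signed) 32-bit HRESULT into hex form, facility and code."""
    unsigned = value & 0xFFFFFFFF  # reinterpret a negative value as unsigned 32-bit
    return {
        "hex": "0x%08X" % unsigned,
        "failure": bool(unsigned & 0x80000000),  # top bit set means failure
        "facility": (unsigned >> 16) & 0x7FF,    # 11-bit facility field
        "code": unsigned & 0xFFFF,               # low 16 bits: the error code
    }


print(decode_hresult(-2147023671))  # status code from the DPM log
print(decode_hresult(0x80070570))   # code from the ID 3184 details
```

Both values decode to facility 7 (Win32), so the low 16 bits are ordinary Windows error numbers that can be looked up with "net helpmsg <code>".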

We also tried removing the Protection Group and re-adding it, checking SQL permissions, repairing the DPM agent, and uninstalling and re-installing the agent, all of which failed.



The Solution

This was a very difficult one to troubleshoot: SP1 was so new that no one had published details of similar problems.

Eventually we tracked the issue down to ActiveOwner problems on the SQL server. The ActiveOwner files are located in "c:\program files\Microsoft Data Protection Manager\dpm\activeowner" on the server hosting the databases (the SQL server being backed up). These ActiveOwner files are used to manage ownership of databases, which is important for ensuring multiple DPM servers aren't attempting to backup/restore resources at the same time.

After opening the directory and locating the ActiveOwner files for the failing databases, we noticed they were all 0 KB, while healthy ActiveOwner files were 1 KB and contained the name of the owner DPM server. The fix is as follows:
1. Open "c:\program files\Microsoft Data Protection Manager\dpm\activeowner" on the database server.
2. Rename any 0 KB files to <name>.old
3. Run SetDpmServer from "c:\program files\Microsoft Data Protection Manager\dpm\bin"
   Syntax: SetDpmServer.exe -DpmServerName <SERVER>
   Replace <SERVER> with the computer name of your DPM server.
4. Re-run your synchronization
This fix literally takes 3 minutes, yet it took us an entire week of investigation to come to this conclusion.
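If there are many databases on the server, step 2 can be scripted. The following is a rough sketch rather than what we actually ran (we renamed the files by hand): it renames every 0-byte file in a directory to <name>.old and reports what it touched. The function name and the demo file names are hypothetical, and the demo runs against a temporary directory instead of the real ActiveOwner folder.

```python
import os
import tempfile


def rename_empty_owner_files(directory, suffix=".old"):
    """Rename every 0-byte file in `directory` to <name>.old.

    Returns the original names of the files that were renamed.
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-directories and files that were already renamed.
        if name.endswith(suffix) or not os.path.isfile(path):
            continue
        if os.path.getsize(path) == 0:
            os.rename(path, path + suffix)
            renamed.append(name)
    return renamed


# Demo on a throwaway directory; the file names are made up for illustration.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "broken_db"), "w").close()  # 0 KB file
    with open(os.path.join(demo_dir, "healthy_db"), "w") as handle:
        handle.write("DPMSERVER")                            # non-empty file
    print(rename_empty_owner_files(demo_dir))
```

You would still need to run SetDpmServer and a consistency check afterwards, as in steps 3 and 4.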

Wednesday, 6 March 2013

Distributing Adobe Acrobat 9.x updates in an enterprise

I will save you my Adobe hate rant, but if you ever look at my Twitter it's well documented that I am not a fan of the Adobe update model. It's slow, updates need to be run consecutively, and it uses lots of CPU cycles and bandwidth.

I am sure administrators who are pushed for time just ignore Acrobat updates; after all, end users will never notice the benefits of security patches, right? I prefer to play it safe and follow best practice where possible, so I have chosen a simple KiXtart script to manage the update process.


The script explained

The code is very simple, providing a step-by-step update from 9.0.0 right up to version 9.5.3, the current version in the 9.x stream as of the time this article was written.

At the very least you need to set the $repopath variable to a network location your user/computer accounts can access. You must also copy all the Acrobat .msp updates into $repopath.

I am using an SCCM "Whether or not a user is logged on" deployment, which means the SYSTEM account is used during the installation. As a result, I permissioned my update repository to allow the "Domain Computers" group read access. I decided on installing from the network as opposed to downloading all the updates locally due to sheer size: the repo is 1.5 GB and some computers may only need 100 MB of updates, so copying everything locally would add unnecessary load to the network.

I won't paste the whole script here, but below is an example of the update process I am using. It checks the version, installs the next update inline, then checks the version again, repeat, repeat.
Install Update, Check Version
  if ($ver = "9.1.0")
    gosub installAcro911
    gosub acroVerCheck
  endif
Example update install
:installAcro911
  ? "Installing Acrobat 9.1.1 upgrade"
  SHELL '%comspec% /c msiexec /p "$repopath\AcrobatUpd911_all_incr.msp" /qn /norestart REINSTALL=ALL REINSTALLMODE=omus'
  Copy ("generic.tch") ("$touchpath\adobeupgrade911.tch") /H
  ? "Acrobat 9.1.1 upgrade complete"
return

In the above code I use a "copy generic.tch" command. This is just an empty file I copy to the local file system; it allows me to quickly check the current update level of Acrobat 9.x. You can remove this step if you wish.
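The check, install, check again pattern boils down to walking an ordered list of versions from wherever the machine currently sits. The sketch below shows that logic in Python purely as an illustration; it is not the KiXtart script itself. The function name is hypothetical and the version chain is deliberately truncated, as the real script covers every 9.x release up to 9.5.3.

```python
def pending_updates(installed, chain):
    """Return the versions still to be installed, in order, after `installed`."""
    if installed not in chain:
        raise ValueError("unknown Acrobat version: %s" % installed)
    return chain[chain.index(installed) + 1:]


# Truncated, illustrative chain; extend it with every release you deploy.
CHAIN = ["9.0.0", "9.1.0", "9.1.1"]

for version in pending_updates("9.0.0", CHAIN):
    print("would install update to", version)
```

A machine that is already on the last version in the chain gets an empty list, which is the equivalent of the script falling through every "if" block without installing anything.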

I'm using the "HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall" registry key to check for the current version. I tried reading versions from files in the Acrobat folder and checking the Acrobat registry key but both were unreliable.

The script is available here from my GitHub, enjoy!

Tuesday, 12 February 2013

Decreasing Windows 7 and Xendesktop logon time

A Windows domain environment provides many benefits, such as group policy and the ability to deploy software and customize user settings. With these benefits also come increased logon times and lag associated with automation, GPO and drive/printer mappings.

In a conventional situation, when a user logs onto a machine for the first time their profile is created, and during future logons there is no requirement to re-apply settings/group policy unless the policy has changed. Citrix have addressed the situation of profiles within Xendesktop environments with their Citrix Profile Manager (CPM). CPM can be perfect for some situations, but not all. What if you want your users' settings sanitized after every logon? What if you need a clean slate or don't want to manage/delete problematic profiles as they arise?

If you go without CPM then logons are invariably slower due to the re-creation of %userprofile% and collation of policies into HKCU on every logon. This is where creating a custom default profile can be handy. If you pre-create the profile and then remove some of the Windows customization stubs, you can cut valuable seconds off your logon time.

For example, my default Xendesktop logon time was around 1 minute for users without CPM. Once I added a custom default profile it dropped to around 45 seconds, and removing some of the default Windows customization stubs dropped the time even further to 40 seconds. If a 20 second reduction doesn't sound like much, just ask your end users who have to endure an eternity of Windows welcome screens taunting them with a never-ending spinning circle.



Creating a custom default user profile

Microsoft suggest using their copyprofile unattended.xml method, which does work well and is the only Microsoft-supported method of overriding the default user profile in Windows 7. Unfortunately, for those users that already have a working and highly customized Citrix vDisk, the thought of sysprepping might not be the most welcome idea.

The other method is to do an old-fashioned override of the default user profile. However, there are some caveats with this: it's unsupported by Microsoft and there can be issues, such as the My Documents folder being named after the account from which you copied the default user profile. I have found no such issues with my Xendesktop profiles, and I used the override method. If you do choose to use this method, please test robustly.

An extremely handy tool for the override method is Windows Enabler. It un-greys out (for lack of a better term) the "Copy to...." profile box under the Windows User Profiles control panel applet.



I would, however, suggest that if you are planning to use a customized default profile with a flat Windows 7 (non-virtualized) deployment, you use the copyprofile method, perhaps as part of your SCCM/MDT deployment process.



Removing customization stubs to decrease logon time

Even more frustrating than waiting at the welcome screen is getting past it, then realizing you're going to have to wait another 15-30 seconds for Windows to "personalize your settings". This is where customization stubs come in.

Under the registry path "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\" there are a number of listed IDs in the format of "{2C7339CF-2B09-4501-B3F3-F3508C9228ED}". Within some of these IDs is a REG_EXPAND_SZ value named "StubPath".

When a user logs on, regardless of whether the default user profile already contains the required settings, any StubPath commands in this registry path are executed, costing you valuable milliseconds during logon. We can speed up the logon simply by removing the stubs we don't need.

You can find the stubs you want to remove by searching for "stubpath" within the "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\" key; the "(Default)" value will tell you what the stub in question is responsible for.


Below is an example of some of the stubs I remove by simply applying this .reg file to my vDisk.


Windows Registry Editor Version 5.00
;IE9
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\>{26923b43-4d38-484f-9b9e-de460746276c}]
"StubPath"=-
;Browser
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\>{60B49E34-C7CC-11D0-8953-00A0C90347FF}]
"StubPath"=-
;Themes
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\{2C7339CF-2B09-4501-B3F3-F3508C9228ED}]
"StubPath"=-
;MailNews
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\{44BBA840-CC51-11CF-AAFA-00AA00B6015C}]
"StubPath"=-
;WMP 12.0
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components\{6BF52A52-394A-11d3-B153-00C04F79FAA6}]
"StubPath"=-
You should test, test and then test again when removing any of these stubpaths as they can cause unintended consequences. In fact unless you are using a customized default user profile it is probably safest to leave stubpaths alone. 
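If your list of stubs differs between images, it can be easier to generate the .reg file than to hand-edit it. The sketch below is one possible way to do that and is not how I built the file above. The function name is hypothetical, and the two entries are copied from the example .reg file.

```python
BASE_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Active Setup\Installed Components"


def build_stub_removal_reg(stubs):
    """Build .reg text that deletes the StubPath value for each (label, key name)."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for label, key_name in stubs:
        lines.append(";" + label)                         # comment naming the stub
        lines.append("[%s\\%s]" % (BASE_KEY, key_name))   # full registry key path
        lines.append('"StubPath"=-')                      # "=-" deletes the value
        lines.append("")
    return "\n".join(lines)


stubs = [
    ("Themes", "{2C7339CF-2B09-4501-B3F3-F3508C9228ED}"),
    ("WMP 12.0", "{6BF52A52-394A-11d3-B153-00C04F79FAA6}"),
]
print(build_stub_removal_reg(stubs))
```

The same warning applies to a generated file: only include stubs you have tested removing against your own default profile.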

For example, if you remove the Windows Theme stub without populating the default user profile, the result is a classic theme (Windows XP style). It is the Windows Theme stub that applies the Windows Basic and, if possible, Aero themes during logon.


Let's hope these couple of simple changes can improve your users' experience.

Friday, 1 February 2013

IBM AutoLoader TS2900 DPM 2012 setup tips


At the end of 2012 I received a new IBM TS2900 autoloader. I didn't have time to set it up, so for a couple of months it was being used as a glorified tape drive.

Last month I got around to configuring DPM 2012 to use the autoloader and hit a few speed bumps along the way. Hopefully the information below can help you set up your device quicker.

Just for reference my platform is Server 2008 R2 running DPM 2012 SP1 with Update Rollup 1.



I can't see the Autoloader in DPM or Device Manager

Hold on, didn't I just buy an autoloader? So why can I only see the tape drive in Device Manager and DPM? This took me a while to work out, but it's really easy to resolve.

The library must be set to random mode; if you are using sequential mode, your autoloader is essentially a manually controlled tape drive. As soon as you set random mode, the autoloader is presented to Windows. You can set random mode by logging into your TS2900 front end and following the below steps.

  1. Click on "Logical", located under the "Configure Library" menu.
  2. Select "Random" from the "Library Mode:" drop down box.
  3. Click submit.
If you are running sequential mode, you might see the below error messages when attempting to install the IBM driver. They occur because random mode has not been selected and the autoloader is not being presented to Windows, hence there is no device for the driver to install on.

DBG:         install_exclusive.c, 1239: InstallVirtualBus: UpdateDriverForPlugAndPlayDevices failed - Update.
EXT: 0 -> -1: install_exclusive.c, 1249: InstallVirtualBus: status 0xe0000235.
DBG:         install_exclusive.c, 885: InstallVirtualBusByType: InstallVirtualBus failed
Program stopped prematurely due to error(s).If the debug flag was set, check debug.txt for details.


Driver Installation

Next you need to install the IBM driver from the IBM download centre. The one I am using at the time of writing this article is named "IBMTape.x64_w08_6233_WHQL_Cert.zip", but I have had success with non-WHQL versions as well. There is a small tweak required to get the driver working with DPM, so follow the below steps to install the driver correctly.

  1. Extract the zip and install the driver by clicking on "install_exclusive.exe"
  2. Wait for the driver installation to complete, then open up Control Panel > Device Manager
  3. You should see the "IBM TotalStorage 3572 Tape Library" under "Medium Changer devices"; this is your autoloader. If you don't see it, try uninstalling the driver, rebooting and re-installing the driver. Also ensure you followed the above steps to enable Random library mode on your autoloader.
  4. Right click the "IBM ULTRIUM 5 HH 3580 TAPE DRIVE" (or your equivalent tape drive), select "Update Driver Software".
  5. Select "Browse my computer for driver software"
  6. Select "Let me pick from a list of device drivers on my computer"
  7. Select "LTO Tape Drive"
  8. Install the driver and close device manager.


This process replaces the recently installed IBM LTO Tape driver with the default Microsoft driver. This is required as DPM (as of release 2012 SP1 Update Rollup 1) doesn't work with the IBM provided tape driver. You still must install the IBM driver however, as the autoloader "IBM TotalStorage 3572 Tape Library" does require the IBM driver package to work.

Failure to replace the IBM drivers with the Microsoft drivers will result in the error message below.
The operation failed because of a protection agent failure. (ID 998 Details: The parameter is incorrect (0x80070057))
There you have it, a working TS2900 autoloader, albeit with a few quirky workarounds. Since following the above steps mine hasn't missed a beat.



Thursday, 10 January 2013

Citrix Xenserver 6.1 Xentools installation problems

I really do love the Citrix Xendesktop platform and the products associated with it, but all too often Citrix have launch issues with their products. The latest issue is the Xentools 6.1 installer being a little dodgy, with missing features (no PVS/VSS support) being a key problem.
I also experienced a number of issues upgrading from Xentools standard 6.0 to 6.1 on machines that didn't require PVS support.
One of the main issues that I had is what Citrix call "continuous reboots with standard tools installation never finishing". Citrix provide the following explanation for the problem:
"This issue occurs when attempting to install the Standard Tools shipped with XenServer 6.1 into a VM that has no virtual network interfaces. A workaround is to create at least one virtual network interface, install the Standard Tools, and then remove the virtual network interface (if so desired)."
In my case there was in fact a virtual network interface and I was still getting the reboot loop. Eventually, after around 10 reboots, the installer simply hangs at the very start of the "installing drivers, installing guest tools" screen and goes no further.
Windows event logs give no clues and the Citrix logs don't provide any useful information. After following the "Uninstalling the Xenserver 6.1 standard tools" steps from the CTX135099 Xenserver Tools Workaround Guide for 6.1.0, including removing all the Windows Driver packages manually, I still had no joy. In fact on some systems I couldn't remove the "Windows Driver Package - Citrix Systems Inc. (xennet) Net" package.
After scouring Windows Programs and Features and removing everything guest/driver related, I fell back to the trusted "wmic product get name" command to get a list of installed products. I found "Citrix Xen Windows x64 PV Drivers", which wasn't listed in the Windows Programs and Features GUI. It seems the "Citrix Xen Windows x64 PV Drivers" package failed to uninstall or install properly and was holding up the Xentools installation process. Resolving the problem is fairly simple.
  1. Boot into Windows and open a command prompt
  2. Issue the command - wmic product where name='Citrix Xen Windows x64 PV Drivers' call uninstall
  3. Reboot
  4. Rerun the xentools installation process
After following the above steps I finally had Xentools standard successfully installed, with statistics being reported back to Xencenter.
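If you need to check a number of VMs for the same leftover package, the output of "wmic product get name" can be filtered with a few lines of script. The following is a minimal sketch under a couple of assumptions: the function name and keyword list are my own, and the sample output is illustrative, with only the PV Drivers entry taken from this article.

```python
def find_products(wmic_output, keywords=("citrix", "xen")):
    """Return product names from `wmic product get name` output that
    contain any of the given keywords (case-insensitive)."""
    names = []
    for line in wmic_output.splitlines():
        name = line.strip()
        if not name or name == "Name":  # skip blank lines and the column header
            continue
        if any(keyword in name.lower() for keyword in keywords):
            names.append(name)
    return names


# Illustrative output; the first product line is a made-up placeholder entry.
sample = """Name
Some Other Installed Product
Citrix Xen Windows x64 PV Drivers

"""
print(find_products(sample))
```

Anything it returns that doesn't show up in Programs and Features is a candidate for the wmic uninstall in step 2.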

Remember, if you want to install the legacy tools (with support for PVS and volume shadow copy) then your VM must have its platform:device_id set to 0001. You can read more about changing the device_id under the section titled "Preparing to Install the XenServer 6.0.2 Hotfix 9 Tools or XenServer 6.1 Legacy Tools in a New Windows Vista, Windows 7, Windows Server 2008, or Windows Server 2008 R2 VM (for PVS or VSS Support)" of CTX135099.