Friday, June 30, 2017

Tivoli (TSM) Backup can't back up a drive in Windows

Over here we use IBM's TSM backup system. I do not want to go over its features and setup, but the bottom line is that I get an email listing the backup status for each machine (known as nodes in TSM lingo) I am backing up. And one day one of those nodes barked:

backup7x SERVER02.EXAMPLE           Failed***    12     2017-06-22 00:00:00 2017-06-22 00:01:06 2017-06-22 00:01:07

If you are curious about the 12, here is what it means, right out of that very same email (I copied that section, including the wasteful blank lines):

Result:

0 - Success.

1 - See explanation for 'Missed'.

4 - The operation completed successfully, but some files were not
processed.

8 - The operation completed with at least one warning message.

12 - The operation completed with at least one error message
(except for error messages for skipped files).

That does not help me much. You see, I like to have access to logs and not sad-face cryptic messages. So I went to C:\Program Files\Tivoli\TSM\baclient to look into dsmsched.log for any funny business. And funny business I found:

06/22/2017 00:01:09 --- SCHEDULEREC OBJECT BEGIN D-0000AM 06/22/2017 00:00:00
06/22/2017 00:01:10 Incremental backup of volume '\\server02\d$'
06/22/2017 00:01:11 ANS1228E Sending of object '\\server02\d$' failed.
06/22/2017 00:01:11 ANS1751E Error processing '\\server02\d$': The file system can not be accessed.
06/22/2017 00:01:11 --- SCHEDULEREC STATUS BEGIN
06/22/2017 00:01:11 --- SCHEDULEREC OBJECT END D-0000AM 06/22/2017 00:00:00
06/22/2017 00:01:11 ANS1512E Scheduled event 'D-0000AM' failed.  Return code = 12.
06/22/2017 00:01:11 Sending results for scheduled event 'D-0000AM'.
06/22/2017 00:01:11 Results sent to server for scheduled event 'D-0000AM'.
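
If the log is long, you do not have to eyeball it. Here is a minimal Python sketch that pulls out the ANS error lines. The function name find_errors is mine, and I am assuming, based only on the three message IDs above, that an ID is "ANS", four digits, and a letter, with E marking an error.

```python
import re
from typing import List, Tuple

# Timestamp, then a message ID such as ANS1228E, then the message text.
# Assumption: the ID shape is inferred from the IDs seen in this log.
ANS_LINE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) (ANS\d{4}[A-Z]) (.*)$"
)

def find_errors(lines: List[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, message_id, text) for every ANS line ending in E."""
    hits = []
    for line in lines:
        m = ANS_LINE.match(line.rstrip())
        if m and m.group(2).endswith("E"):
            hits.append((m.group(1), m.group(2), m.group(3)))
    return hits

# A few lines copied from the log excerpt above.
SAMPLE = [
    "06/22/2017 00:01:10 Incremental backup of volume '\\\\server02\\d$'",
    "06/22/2017 00:01:11 ANS1228E Sending of object '\\\\server02\\d$' failed.",
    "06/22/2017 00:01:11 ANS1751E Error processing '\\\\server02\\d$': "
    "The file system can not be accessed.",
    "06/22/2017 00:01:11 ANS1512E Scheduled event 'D-0000AM' failed.  "
    "Return code = 12.",
    "06/22/2017 00:01:11 Sending results for scheduled event 'D-0000AM'.",
]

for stamp, msg_id, text in find_errors(SAMPLE):
    print(stamp, msg_id, text)
```

To run it against the real thing, read dsmsched.log into a list of lines and pass that in instead of SAMPLE.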

Ok, what's so special about the D drive? I looked at the config file, C:\Program Files\Tivoli\TSM\baclient\dsm.opt, and it seems to be right. If you do not believe me (I wouldn't and I have to live with me), here are its first few lines:

NODENAME SERVER02.EXAMPLE
TCPSERVERADDRESS backup7x.example.com

DOMAIN "\\server02\d$"
MANAGEDSERVICES WEBCLIENT SCHEDULE
webports 1501 1581

txnbytelimit 25600
schedmode prompted
schedlogretent 30,d
errorlogretent 30,d
passwordaccess generate
quiet
tapeprompt no


EXCLUDE.BACKUP "*:\Thumbs.db"
EXCLUDE.BACKUP "*:\desktop.ini"
EXCLUDE.BACKUP "*:\*.tmp"
EXCLUDE.BACKUP "*:\...\Scans\mpcache-*"
EXCLUDE.BACKUP "*:\microsoft uam volume\...\*"
EXCLUDE.BACKUP "*:\microsoft uam volume\...\*.*"
EXCLUDE.BACKUP "*:\...\EA DATA. SF"
EXCLUDE.BACKUP "*:\IBMBIO.COM"
EXCLUDE.BACKUP "*:\IBMDOS.COM"
EXCLUDE.BACKUP "*:\IO.SYS"
[...]

As you can see, I am telling it to back up only the D drive. Maybe we should take a look at this drive and see who can access it:

C:\Users\raub> icacls d:\
d:\ AD\EXAMPLE_Domain Admins:(OI)(CI)(F)
    AD\EXAMPLE_Users:(RX)

Successfully processed 1 files; Failed processing 0 files
C:\Users\raub>

Where:
  • OI: Object inherit
  • CI: Container inherit
  • F: Full access
  • RX: Read and execute
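
If you would rather have a script read that output than squint at it, the Python sketch below turns the text into a mapping of account to flags. The helper name parse_icacls is my own, nothing official, and it assumes the output has the same shape as the listing above: the path on the first line and one account:flags entry per line.

```python
import re
from typing import Dict, List

def parse_icacls(output: str, path: str) -> Dict[str, List[str]]:
    """Map each account in icacls output to the flags listed for it."""
    acl: Dict[str, List[str]] = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line.startswith("Successfully processed"):
            continue
        # The first entry shares its line with the path, so drop that prefix.
        if line.lower().startswith(path.lower()):
            line = line[len(path):].strip()
        # Account names may contain spaces, so split at the ":(" before the flags.
        account, sep, flags = line.partition(":(")
        if not sep:
            continue
        found = re.findall(r"\(([^)]+)\)", "(" + flags)
        acl.setdefault(account, []).extend(found)
    return acl

# Output copied from the icacls listing above (without the prompt lines).
SAMPLE = r"""d:\ AD\EXAMPLE_Domain Admins:(OI)(CI)(F)
    AD\EXAMPLE_Users:(RX)

Successfully processed 1 files; Failed processing 0 files
"""

acl = parse_icacls(SAMPLE, "d:\\")
for account, flags in acl.items():
    print(account, flags)
```

With the result in a dictionary, asking whether a given account has any entry at all becomes a one-line membership check.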

We can also do that through PowerShell:

PS C:\Users\raub> get-acl d:\ | fl


Path   : Microsoft.PowerShell.Core\FileSystem::D:\
Owner  : BUILTIN\Administrators
Group  : AD\Domain Users
Access : AD\EXAMPLE_Domain Admins Allow  FullControl
         AD\EXAMPLE_Users Allow  ReadAndExecute, Synchronize
Audit  :
Sddl   : O:BAG:DUD:PAI(A;OICI;0x1200a9;;;SY)(A;OICI;FA;;;S-1-5-21-344340502-4252695000-2390403120-1439459)(A;;0x1200a9;
         ;;S-1-5-21-344340502-4252695000-2390403120-1439468)(A;OICI;FA;;;S-1-5-21-344340502-4252695000-2390403120-14759
         66)



PS C:\Users\raub>

which, as you can see, is a more verbose way to say the same thing. But what is missing here? You see, by default Windows services run as the system user (its full name is NT AUTHORITY\SYSTEM), and that account is nowhere in the Access list. So let's add it. Does it need to write to the drive as far as TSM is concerned? We are only backing up here. If we ever need to restore, it might need to write, but we will cross that bridge when we get to it (hopefully never).

You can add that user and set up the permissions (I went with read and execute, but wonder if read-only would suffice; let me know if you find the answer) using Windows Explorer, icacls, or Set-Acl. Pick one; what really matters is that at the end of the day you should have something like this:

C:\Users\raub> icacls d:\
d:\ AD\EXAMPLE_Domain Admins:(OI)(CI)(F)
    AD\EXAMPLE_Users:(RX)
    NT AUTHORITY\SYSTEM:(OI)(CI)(RX)

Successfully processed 1 files; Failed processing 0 files
C:\Users\raub>

or in PowerShell:

PS C:\Users\raub> get-acl d:\ | fl


Path   : Microsoft.PowerShell.Core\FileSystem::D:\
Owner  : BUILTIN\Administrators
Group  : AD\Domain Users
Access : NT AUTHORITY\SYSTEM Allow  ReadAndExecute, Synchronize
         AD\EXAMPLE_Domain Admins Allow  FullControl
         AD\EXAMPLE_Users Allow  ReadAndExecute, Synchronize
Audit  :
Sddl   : O:BAG:DUD:PAI(A;OICI;0x1200a9;;;SY)(A;OICI;FA;;;S-1-5-21-344340502-4252695000-2390403120-1439459)(A;;0x1200a9;
         ;;S-1-5-21-344340502-4252695000-2390403120-1439468)(A;OICI;FA;;;S-1-5-21-344340502-4252695000-2390403120-14759
         66)



PS C:\Users\raub>
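
For the icacls route, the grant itself might look like the sketch below. It is Python only so the pieces of the command are spelled out; the helper names icacls_grant_args and apply_grant are hypothetical, and I am assuming the same inheritance flags, (OI)(CI), that the listing above shows for the SYSTEM entry. Actually running it needs an elevated prompt on the Windows box, so apply_grant is defined but never called here.

```python
import subprocess
from typing import List

def icacls_grant_args(path: str, account: str,
                      rights: str = "RX", inherit: bool = True) -> List[str]:
    """Build the argument list for: icacls <path> /grant <account>:<flags><rights>."""
    # (OI)(CI) makes the entry inherit to files and subfolders.
    flags = "(OI)(CI)" if inherit else ""
    return ["icacls", path, "/grant", f"{account}:{flags}{rights}"]

def apply_grant(args: List[str]) -> None:
    """Run the command. Windows only, and it changes the ACL for real."""
    subprocess.run(args, check=True)

args = icacls_grant_args("d:\\", "NT AUTHORITY\\SYSTEM")
print(" ".join(args))
```

Typed by hand, the account and permission go in quotes because of the space in NT AUTHORITY; passing a list to subprocess takes care of that quoting.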

And now I get an email saying all is well:

backup7x SERVER02.EXAMPLE           Completed    0      2017-06-24 00:00:00 2017-06-24 00:00:54 2017-06-24 01:07:57

Some of you may have noticed this status email is from two days later. The reason is that on the 23rd the backup was catching up, and that took quite a while.