ESM Troubleshooting

You are sitting contentedly, minding your own business, applying the latest Banner upgrade using Ellucian Solution Manager (ESM), when the job suddenly fails with an error.

What the hell do you do now?

ESM, when it works, actually makes the Banner Admin's life much easier. Instead of endlessly typing "sqlplus /nolog @gostage", we can now apply most upgrades with just a "click of a button".

Unfortunately, the flip side of this is that when things break, it's MUCH more difficult to troubleshoot. Error logs are usually buried several layers deep in subdirectories with obscure names.

This week, let's go over some troubleshooting tips when ESM errs out.

 

ESM File Layout:

The ESM process runs on the Jobsub server as a Jenkins agent under the "banner" user. The agent also sets up the following two directories within $BANNER_HOME:

$BANNER_HOME/upgrades - this directory contains the actual upgrade files.

Under this directory you will see directories such as stu80500u and arsys80301u. ESM actually runs the upgrades from here.

$BANNER_HOME/bmui - this directory contains a "logs" subdirectory that holds the logfiles of every upgrade that has been run.

The top-level directories are named using the Unix timestamp of the time the upgrade was started. The easiest way to find the most recent log directory (which in most cases is the one you are most interested in) is to run "ls -lt" here.

Under each log directory, you will see a BMUI_ENV* directory, which contains the actual logfiles.
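If you find yourself repeating the "ls -lt" steps often, the lookup can be scripted. This is a minimal Python sketch, assuming the layout described above ($BANNER_HOME/bmui/logs, timestamp-named run directories, BMUI_ENV* subdirectories) and that "most recent" means latest modification time, as with ls -lt. The function names are hypothetical helpers, not part of ESM.

```python
import os
from pathlib import Path


def newest(paths):
    """Return the entry with the latest modification time (the top line of `ls -lt`)."""
    paths = list(paths)
    if not paths:
        return None
    return max(paths, key=lambda p: p.stat().st_mtime)


def latest_esm_logfile(banner_home):
    """Locate the newest ESM logfile under <banner_home>/bmui/logs.

    Assumes timestamp-named run directories, each holding one or more
    BMUI_ENV* directories that contain the logfiles.
    """
    logs_root = Path(banner_home) / "bmui" / "logs"
    if not logs_root.is_dir():
        return None
    # Newest run directory, like the first entry of `ls -lt` in bmui/logs.
    run_dir = newest(p for p in logs_root.iterdir() if p.is_dir())
    if run_dir is None:
        return None
    env_dir = newest(p for p in run_dir.glob("BMUI_ENV*") if p.is_dir())
    if env_dir is None:
        return None
    # Search recursively in case logfiles sit one level deeper (an assumption).
    return newest(p for p in env_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    home = os.environ.get("BANNER_HOME")
    if home:
        print(latest_esm_logfile(home))
    else:
        print("BANNER_HOME is not set")
```

Run as the "banner" user on the Jobsub server so that $BANNER_HOME is already set; it prints the path of the logfile you would otherwise find by hand.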


So with this in mind, when ESM errs out, here's a quick list of things you can try to resolve the problem.

  • Locate the appropriate log file directory (ls -lt), then locate the most recent logfile in it (ls -lt again) and check it for errors. If you find any, fix them and then restart the job. Common errors include locked module owner accounts (BANINST1, SATURN, etc.) and tablespaces that have run out of extents.
  • When a job errs out, you need to cancel it in ESM (remember to cancel it from the Jobs menu) and then restart it. In most cases restarting is not a problem, because ESM calls GOSTAGE, and GOSTAGE remembers the last completed step from the GURDMOD table.
  • In 99% of cases, checking the log files will give you enough information to resolve the issue(s). If you still cannot fix the problem, you can continue the upgrade manually from the $BANNER_HOME/upgrades directory. If you decide to do this, be sure to edit the login.sql file and fill in the passwords of all module owners. You can then start the upgrade using 'perl sctupgd.pl'.
  • If you finish an upgrade manually outside of ESM, remove or rename the restart directory located under $BANNER_HOME/bmui. This directory holds restart information for the current upgrade; if you finish that upgrade manually and leave it in place, you will have problems when you run a subsequent upgrade via ESM.
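The first check in the list above, scanning the newest logfile for errors, can also be scripted. This is a minimal Python sketch, assuming the upgrade logs contain standard Oracle (ORA-), PL/SQL (PLS-) and SQL*Plus (SP2-) error codes. The HINTS table is hypothetical and only covers the two common failures mentioned above (locked accounts and tablespaces out of space). The script only points you at the offending lines; fixing the cause and restarting the job in ESM is still up to you.

```python
import re
import sys

# Oracle, PL/SQL and SQL*Plus error prefixes typically seen in sqlplus output.
ERROR_PATTERN = re.compile(r"\b(?:ORA|PLS|SP2)-\d{4,5}\b")

# Hypothetical hints for the two common failures; extend for your site.
HINTS = {
    "ORA-28000": "account is locked: unlock the module owner (e.g. BANINST1, SATURN)",
    "ORA-01653": "unable to extend table: add space to the tablespace",
    "ORA-01654": "unable to extend index: add space to the tablespace",
}


def scan_log(lines):
    """Yield (line_number, error_code, hint, text) for each error code found."""
    for lineno, line in enumerate(lines, start=1):
        for code in ERROR_PATTERN.findall(line):
            yield lineno, code, HINTS.get(code, ""), line.rstrip()


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: scan_log.py LOGFILE")
    else:
        # errors="replace" keeps going if the log has odd bytes in it.
        with open(sys.argv[1], errors="replace") as fh:
            for lineno, code, hint, text in scan_log(fh):
                print(f"{lineno}: {code} {hint}\n    {text}")
```

Pass it the logfile you located with ls -lt, for example "python3 scan_log.py <logfile>" (the script name is just an example).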