SharePoint 2010 and governance are tightly connected these days. Governance consists of policies, roles, responsibilities and processes to ‘avoid’ chaos in the SharePoint environment.
Some of the policies an organization implements can be monitored by SharePoint itself through a new feature, the SharePoint Health Analyzer. The Health Analyzer is rule based: each rule checks the farm for potential problems. SharePoint ships with several out-of-the-box rules, such as ‘Drives are running out of disk space’ and ‘The server farm account should not be used for other services’. To make administrators aware of possible issues with the farm, all failing rules are listed in the Health Reports list.
To monitor additional items not available out of the box, custom health rules can be built.
Functionality of the custom health rule
Sandboxed solutions can be monitored somewhat better than SharePoint does by default.
When the Microsoft SharePoint Foundation Sandboxed Code Service is started, sandboxed solutions can be deployed and used. To monitor the resources consumed by the sandboxed solutions, three timer jobs measure and collect the consumed resources:
- Solution Daily Resource Usage Update
- Solution Resource Usage Log Processing
- Solution Resource Usage Update
It is possible to deploy sandboxed solutions when the timer jobs are not running. In that case the resources consumed by the sandboxed solutions are not measured. This means that when the sandboxed solutions consume a lot of resources, the farm can go down ungracefully. One of the strengths of sandboxed solutions is that they are hosted in a partially trusted context so they do not affect the rest of the SharePoint implementation...
Wouldn’t it be nice to be sure the timer jobs are running when the Sandboxed Code Service is running?
Build the rule
To set up a health analyzer rule, create an empty SharePoint project and add a class that inherits from SPHealthAnalysisRule. This class implements the rule definition. The class can also inherit from SPRepairableHealthAnalysisRule, which means the rule can repair the problem itself. When inheriting from SPRepairableHealthAnalysisRule, the method Repair() has to be implemented as well.
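As an illustration of what a repairable variant of the sandbox rule could do, the sketch below enables the three resource-measuring timer jobs when they are disabled. It is not part of the rule built in this article. The class and method names (SandboxJobRepair, EnableMonitoredJobs) are hypothetical, the web application is assumed to be supplied by the caller, and the code needs a reference to Microsoft.SharePoint.dll to compile. A rule that inherits from SPRepairableHealthAnalysisRule could call the helper from its Repair() override.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.SharePoint.Administration;
using Microsoft.SharePoint.Administration.Health;

// Hypothetical helper, not part of the SharePoint API.
public static class SandboxJobRepair
{
    // Titles of the timer jobs that measure sandboxed solution resources.
    private static readonly string[] MonitoredJobTitles =
    {
        "Solution Daily Resource Usage Update",
        "Solution Resource Usage Log Processing",
        "Solution Resource Usage Update"
    };

    // Enables every monitored job that is currently disabled.
    public static SPHealthRepairStatus EnableMonitoredJobs(SPWebApplication webApplication)
    {
        try
        {
            // Collect first, so the job collection is not changed while enumerating it.
            List<SPJobDefinition> disabledJobs = new List<SPJobDefinition>();
            foreach (SPJobDefinition job in webApplication.JobDefinitions)
            {
                if (job.IsDisabled && Array.IndexOf(MonitoredJobTitles, job.Title) >= 0)
                {
                    disabledJobs.Add(job);
                }
            }

            foreach (SPJobDefinition job in disabledJobs)
            {
                job.IsDisabled = false;
                job.Update(); // persist the change to the configuration database
            }

            return SPHealthRepairStatus.Succeeded;
        }
        catch (Exception)
        {
            // Any failure is reported to the Health Analyzer as a failed repair.
            return SPHealthRepairStatus.Failed;
        }
    }
}
```

Whether an automatic repair is desirable is a governance decision; the rule in this article only reports the problem.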
To define a rule, some properties have to be overridden:
- Category - rule definitions are grouped by Category in the default view
- Summary - the title for the rule
- Explanation - displayed in the Health Reports list when the rule fails
- Remedy - displayed in the Health Reports list when the rule fails
- ErrorLevel - severity of the failure of the rule
All rule definitions can be found in the rule definition list: Central Administration, Monitoring, Review rule definitions. The Health Reports list can be found under Central Administration, Monitoring, Review problems and solutions. The Health Reports list explains the issue, lists the failing servers and services, and gives the remedy for the problem.
The difference between the two lists is that the rule definition list displays all the existing rules; the Health Reports list displays the rules which cause problems in the farm.
The rules are categorized in both lists. Code listing 1 lists the custom rule in the Configuration category.
public override SPHealthCategory Category
{
    get
    {
        return SPHealthCategory.Configuration;
    }
}
Listing 1: Override the Category property
The title of the rule can be set by overriding the Summary property as shown in code listing 2.
public override string Summary
{
    get
    {
        return "Verify the sandbox related jobs are started.";
    }
}
Listing 2: Override the Summary property
The Summary is shown in the rule definition list and the health reports list.
The custom rule can be scheduled by overriding the AutomaticExecutionParameters property. The get accessor returns an SPHealthAnalysisRuleAutomaticExecutionParameters object.
When the property isn’t overridden, the rule can be scheduled manually by a farm administrator.
public override SPHealthAnalysisRuleAutomaticExecutionParameters AutomaticExecutionParameters
{
    get
    {
        SPHealthAnalysisRuleAutomaticExecutionParameters parameter =
            new SPHealthAnalysisRuleAutomaticExecutionParameters();
        parameter.Schedule = SPHealthCheckSchedule.Hourly;
        parameter.Scope = SPHealthCheckScope.Any;
        parameter.RepairAutomatically = false;
        parameter.ServiceType = typeof(SPTimerService);
        return parameter;
    }
}
Listing 3: Override the AutomaticExecutionParameters property
In code listing 3 the rule is scheduled on an hourly basis and the scope is set to SPHealthCheckScope.Any. This means the rule will run on an hourly basis on the first available computer with the specified service.
The code that identifies the real problem is in the Check() method. The Check() method returns the outcome of the check: SPHealthCheckStatus.Failed or SPHealthCheckStatus.Passed. When the result of the check is SPHealthCheckStatus.Failed, the rule is displayed in the Health Reports list to make the administrator aware of the problem.
public override SPHealthCheckStatus Check()
{
    // Start from an empty list so titles from an earlier run are not reported again.
    jobTitleOfDisabledJobs.Clear();

    SPUserCodeService userCodeService = SPUserCodeService.Local;
    if (userCodeService.IsEnabled)
    {
        // Replace "your_site_url" with the URL of a site collection in the farm.
        using (SPSite site = new SPSite("your_site_url"))
        {
            foreach (SPJobDefinition job in site.WebApplication.JobDefinitions)
            {
                switch (job.Title)
                {
                    // The three jobs that measure sandboxed solution resources.
                    case "Solution Daily Resource Usage Update":
                    case "Solution Resource Usage Log Processing":
                    case "Solution Resource Usage Update":
                        AddJobTitle(job);
                        break;
                }
            }
        }

        if (jobTitleOfDisabledJobs.Count != 0)
        {
            return SPHealthCheckStatus.Failed;
        }
    }

    return SPHealthCheckStatus.Passed;
}
Listing 4: Implementation of the Check() method
The code in listing 4 checks if the Sandboxed Code Service is started. If the service is started, it checks if the timer jobs involved in measuring the resources consumed by the solutions are enabled. When the service isn’t started, there are no running sandboxed solutions and no resources have to be measured. The AddJobTitle() method adds the title(s) of the disabled job(s) to a generic list of strings, as shown in code listing 5.
private void AddJobTitle(SPJobDefinition job)
{
    if (job.IsDisabled)
    {
        jobTitleOfDisabledJobs.Add(job.Title);
    }
}
Listing 5: Collection of titles of disabled jobs
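The selection logic of listings 4 and 5 does not depend on SharePoint types, so it can be isolated and tested outside a farm. The following is a minimal sketch of that idea, not the code used in the rule. The class and method names (SandboxJobFilter, GetDisabledMonitoredJobs) are hypothetical, and each job is assumed to be represented as a pair of title and disabled flag.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that mirrors the switch in Check() and the test in AddJobTitle().
public static class SandboxJobFilter
{
    // Titles of the timer jobs that measure sandboxed solution resources.
    private static readonly HashSet<string> MonitoredJobTitles = new HashSet<string>
    {
        "Solution Daily Resource Usage Update",
        "Solution Resource Usage Log Processing",
        "Solution Resource Usage Update"
    };

    // jobs: Key is the job title, Value is true when the job is disabled.
    public static List<string> GetDisabledMonitoredJobs(
        IEnumerable<KeyValuePair<string, bool>> jobs)
    {
        List<string> disabledTitles = new List<string>();
        foreach (KeyValuePair<string, bool> job in jobs)
        {
            // Keep only jobs that are both monitored and disabled.
            if (job.Value && MonitoredJobTitles.Contains(job.Key))
            {
                disabledTitles.Add(job.Key);
            }
        }
        return disabledTitles;
    }
}
```

In the rule itself, the pairs would be built from the Title and IsDisabled properties of each SPJobDefinition.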
The generic list is used in the Explanation property to inform the farm administrator exactly which job(s) aren't enabled. Code listing 6 overrides the Explanation property and returns an informative message including the titles of the jobs which aren’t enabled.
public override string Explanation
{
    get
    {
        // Join the titles of the disabled jobs, separated by " / ".
        string jobTitles = string.Join(" / ", jobTitleOfDisabledJobs.ToArray());

        return "The Microsoft SharePoint Foundation Sandboxed Code Service is started, " +
            "but the following timer jobs are not enabled: " + jobTitles;
    }
}
Listing 6: Display an informative message in the Explanation property of the rule
Besides explaining the issue in detail a message can be displayed to the administrator on how to act on the issue. Code listing 7 overrides the Remedy property.
public override string Remedy
{
    get
    {
        return "Enable the timer jobs: Solution Daily Resource Usage Update, " +
            "Solution Resource Usage Log Processing " +
            "and/or Solution Resource Usage Update.";
    }
}
Listing 7: Override the Remedy property
Code listing 8 overrides the ErrorLevel property and sets the severity of the issue to Warning.
public override SPHealthCheckErrorLevel ErrorLevel
{
    get
    {
        return SPHealthCheckErrorLevel.Warning;
    }
}
Listing 8: Override the ErrorLevel property
Deploy the rule
The rule can be deployed by adding a farm-scoped feature to the solution and adding an event receiver to the feature.
Code listing 9 overrides the FeatureInstalled and FeatureUninstalling methods to register and unregister the rule.
public override void FeatureInstalled(SPFeatureReceiverProperties properties)
{
    try
    {
        Assembly currentAssembly = Assembly.GetExecutingAssembly();
        IDictionary<Type, Exception> exceptions =
            SPHealthAnalyzer.RegisterRules(currentAssembly);

        if (exceptions != null && exceptions.Count > 0)
        {
            // Something went wrong, take appropriate action.
        }
    }
    catch (Exception ex)
    {
        // Keep the original exception as the inner exception.
        throw new Exception(
            "There was an error registering the health rule: " + ex.Message, ex);
    }
}

public override void FeatureUninstalling(SPFeatureReceiverProperties properties)
{
    try
    {
        Assembly currentAssembly = Assembly.GetExecutingAssembly();
        IDictionary<Type, Exception> exceptions =
            SPHealthAnalyzer.UnregisterRules(currentAssembly);

        if (exceptions != null && exceptions.Count > 0)
        {
            // Something went wrong, take appropriate action.
        }
    }
    catch (Exception ex)
    {
        // Keep the original exception as the inner exception.
        throw new Exception(
            "There was an error removing the health rule: " + ex.Message, ex);
    }
}
Listing 9: Register and unregister the rule
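Listing 9 leaves open what the 'appropriate action' is when the returned dictionary is not empty. One possible option, shown here as a sketch only, is to turn the dictionary into a readable message that can be logged or included in an exception. The class and method names (RuleRegistrationReport, Describe) are hypothetical; the code uses only the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper that formats the result of RegisterRules or UnregisterRules.
public static class RuleRegistrationReport
{
    // Returns one line per failed rule type, or an empty string when nothing failed.
    public static string Describe(IDictionary<Type, Exception> exceptions)
    {
        if (exceptions == null || exceptions.Count == 0)
        {
            return string.Empty;
        }

        StringBuilder message = new StringBuilder();
        foreach (KeyValuePair<Type, Exception> entry in exceptions)
        {
            // Key is the rule type, Value is the exception raised for that rule.
            message.AppendLine(entry.Key.FullName + ": " + entry.Value.Message);
        }
        return message.ToString();
    }
}
```

In the feature receiver, the result of Describe(exceptions) could be written to a log of choice whenever it is not empty.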
After the deployment, the rule can be found in the rule definition list: Central Administration, Monitoring, Review rule definitions. Details of the rule are shown in Figure 1.
Figure 1: The rule definition
The default view of the rule definition list shows all the defined rules except the rules defined with the category ‘System’. Of course it's possible to create an additional view to show all the rules.
Run and test the rule
To test that the rule fails, disable for example the ‘Solution Daily Resource Usage Update’ job and the ‘Solution Resource Usage Update’ job.
The rule is scheduled to run every hour as defined in code listing 3 and shown in Figure 1. To speed things up, the rule can be run directly by pressing Run Now. The timer job which actually runs the Health Analyzer rules depends on the schedule; in this case it is Health Analysis Job (Hourly, Microsoft SharePoint Foundation Timer, Any Server). The job can be found in Central Administration, Monitoring, Review Job Definitions. Selecting the job title shows the details of the job, and the job can be run immediately by selecting Run Now.
The rule fails and shows up in the Health Reports list as shown in Figure 2.
Figure 2: The rule fails
The little yellow sign at the bottom of the icon in Figure 2 shows a visual representation of the ErrorLevel set in code listing 8. The Summary of the rule set in code listing 2 is displayed as the title of the rule.
The details of the rule can be viewed by selecting the text in the Health Reports list. A dialog opens as shown in Figure 3. Note that the Explanation of the rule changes when different sandbox timer jobs are disabled or enabled and the rule is reanalyzed. The Explanation property displays exactly which timer jobs aren't enabled.
Keep in mind to attach the debugger to the timer service process, OWSTimer.exe, when debugging the rules.
Figure 3: Details of the failed rule
Settings are not updated
A property such as the category may change during development. Suppose a mistake was made and the rule has to move from the Configuration to the Performance category. Code listing 10 displays the minor change in code.
public override SPHealthCategory Category
{
    get
    {
        return SPHealthCategory.Performance;
    }
}
Listing 10: Changing the Category of the rule definition
After the deployment of the solution with the changed Category property the rule fails, but the rule still shows up under the 'old' category, Configuration.
This is one of the possible 'issues' that can occur when the Timer Service Recycle timer job has not run since the deployment. The job is like an iisreset for the timer service: it recycles the Timer Service to free resources.
Conclusion
It is good practice to implement health rules to monitor the farm, be informed about issues and take the appropriate action. The rules are not that hard to program, and they can be really useful for keeping an overview of issues.
Whenever possible, implement health rules that comply with the governance plan of the organization. It makes monitoring easier and may help to avoid chaos in the SharePoint environment.