Tuesday, February 17, 2015

What should I check with a Health Assessment?

When you perform a health assessment of a SharePoint farm, you need to check everything you have and compare it to established patterns and practices. In some cases you may come across limits (supported maximums) and boundaries (hard limits) for certain settings. Your goal should be to stay well within those limits and to have a plan in place to keep your settings within the standards and practices that apply to your farms.


The purpose of this blog post is to give you a guide to the physical attributes of your solution and what you need to check. I do not talk about tools in this post, but I suggest you use one for your health assessment, because a tool provides a consistent, repeatable approach to measuring your solution's health.


I will not be too verbose in this post. Instead I will concentrate on the areas that one of my colleagues, Kevin Cole (follow him on Twitter at ), a Microsoft Certified Master of SharePoint 2010 and a brilliant technical mind, and I came up with. I have broken the areas down into 11 sections and will briefly cover what you need to know in each of them, so let's get to it.


The Check Points

As I mentioned, you can check these things manually, but it will be time consuming. There are many tools available to perform these checks. We use PowerShell, which lets us create our health reports regularly and consistently. I have not gone into depth on any of these items, but I will add to or modify this list based on your feedback. This is a work in progress, but as far as I know it is the only checklist to date that covers the entire farm.


Servers

  1. Determine the servers being used in the farm: Server identification is needed to understand the resources you are working with and to identify gaps in the architecture.
  2. Determine the roles of each server in the farm: The role tells you what the server is doing and on which tier of the farm architecture the server resides.
  3. Draw the logical diagram of the farm: A list of servers and their roles is difficult for the average user to understand; a graphical representation makes it easier for everyone to understand.
  4. Gather the number and type of processors for each server and whether they are dedicated or shared (VM): Knowing the allocated processing power helps identify processing shortfalls that may cause performance issues.
  5. Gather the RAM for each server and whether it is dedicated or shared (VM): Knowing the allocated RAM helps identify when the server will start paging to disk and helps identify performance issues.
  6. Gather the total and available storage for each server (Physical and SAN): Understanding your storage and any limitations will ensure you don't run into a situation that has you scrambling to add storage.  In addition, configuration of swap drives, etc. can affect performance.
  7. Gather the type, current capacity, allocated and maximum capacity of the SAN: Knowing the SAN capacity will help with determining current capacity and planned growth. The type of SAN will help identify any RBS provider issues or determine what is needed to implement RBS, if it has not been implemented.
  8. Determine the hardware lifecycle for server infrastructure: Understanding how old each server is and when it is planned to be replaced allows for a proper perspective when identifying which servers are underpowered for the current environment or for future growth.
  9. Determine the patch levels of the server OS and all dependent services: Identifying any outstanding patches will identify any risks to the stability of the OS and the services SharePoint relies upon and may identify possible security exploits.
  10. Determine the patching schedule and outage windows for the solution: Patching schedules and outage windows are important to the health of the servers, allowing for proper maintenance without the risk of causing a disruption. Determine if and when patching is performed, when the outage window occurs and how long it lasts.
  11. Determine the SQL Server version and patch level: Knowing your SQL Server version and patch level will help you identify issues with performance and may identify security holes.  In addition, the SQL Server version affects some feature availability and limitations, depending on your farm.
  12. RBS SQL Server Configuration: Storing BLOBs in the database can consume large amounts of file space and expensive server resources. RBS efficiently transfers the BLOBs to a dedicated storage solution of your choosing, and stores references to them in the database. This frees server storage for structured data, and frees server resources for database operations.
  13. RBS BLOB Threshold: Setting the right size threshold will ensure a balance between processing needed to offload large files and your content database size.
  14. SAN Configuration: A misconfigured SAN can cause increased latency and other issues to RBS, SharePoint and SQL Server.
  15. Storage Provider Configuration: Using the correct storage provider (and correct version) for your SAN will improve performance. 
  16. SAN Capacity: Ensure your future storage needs do not exceed the current capacity, check for the current utilization and available storage as well as the ability to expand storage hardware if needed.
  17. SharePoint RBS Configuration: Ensure your farm is configured correctly for RBS.
  18. BLOB caching setup: Disk-based caching is extremely fast and eliminates the need for database round trips if it is configured properly.
  19. RAM Utilization: Ensure your farm servers are not overutilized.
  20. CPU Utilization: Ensure your farm servers are not overutilized.
  21. User Profile import filters:  Are service accounts and disabled accounts filtered out?
  22. User profile synchronization schedule: Find the right balance between how current profile data needs to be and the load the sync places on the farm.
  23. Portal super reader and super user accounts setup: Verify they are set properly and that the membership is correct. 
  24. Office web apps cache: It is recommended to isolate the content database used for the Office Web Apps cache, so that cached files do not contribute to the size of the "main" content database(s) for the Web application.
  25. OWA service apps: Ensure the service apps are running on the correct server roles.
  26. Web apps: Ensure Web apps are not running in ASP.NET debug mode in production.
  27. Farms: Record the number of Farms and purpose of each.
  28. Web Apps: Ensure Web apps are configured correctly.
  29. Content Databases: Ensure proper content database sizes and configuration.
  30. Site Collections: Ensure properly sized and organized site collections.
  31. Custom Features: Review and record the Custom Features, where they are used, their intended purpose and proper installation and activation.
  32. Custom Apps: Review and record all custom apps installed on the farm, their intended use and where they are being used.
  33. Custom Web Parts: Review and record where any custom web parts are being used and that they are working properly.
  34. Environments: Record and ensure the environments are synchronized and consistent with each other and that they are being used for their intended purpose.
  35. Environment Patching: Check for consistent patching (build numbers) across all environments.
  36. SQL Naming: Ensure SQL Servers are referenced through SQL aliases, not computer names or CNAMEs.
  37. DNS: Ensure host records are defined for the SQL aliases.
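
Item 35 boils down to comparing build numbers. We do this in PowerShell; the following is only a minimal Python sketch, assuming you have already collected one farm build number per environment (the environment names and build strings are made up for illustration):

```python
from typing import Dict, List, Tuple


def parse_build(build: str) -> Tuple[int, ...]:
    """Turn "15.0.4569.1000" into (15, 0, 4569, 1000) so builds compare numerically."""
    return tuple(int(part) for part in build.split("."))


def environments_behind(builds: Dict[str, str]) -> List[str]:
    """Return the environments whose build is lower than the newest build found."""
    newest = max(parse_build(b) for b in builds.values())
    return sorted(env for env, b in builds.items() if parse_build(b) < newest)


# Hypothetical inventory: environment name -> farm build number
builds = {"Dev": "15.0.4569.1000", "Test": "15.0.4569.1000", "Prod": "15.0.4481.1005"}
print(environments_behind(builds))  # ['Prod']
```

Comparing the parts as integers matters: a plain string comparison would rank 15.0.999.0 above 15.0.1000.0.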
     

Platform

  1. Page File on a separate drive from the OS, SharePoint and Logs
  2. Does storage meet the farm's needs (current vs. projected)?
  3. Are there large files being stored in document repositories?
  4. Record the number and size of files
  5. Is there a change management process involved?
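
For the "current vs. projected" question, even a rough projection helps. This Python sketch assumes storage grows at a steady compound monthly rate, which is a simplification; plug in growth figures you have actually measured:

```python
import math


def months_until_full(used_gb: float, capacity_gb: float, monthly_growth: float) -> float:
    """Months until used storage reaches capacity, assuming compound monthly growth.

    monthly_growth is a fraction, e.g. 0.05 for 5% per month.
    """
    if used_gb <= 0:
        raise ValueError("used_gb must be positive")
    if used_gb >= capacity_gb:
        return 0.0  # already at (or over) capacity
    if monthly_growth <= 0:
        return math.inf  # no growth: never fills up
    # Solve used * (1 + g) ** m = capacity for m
    return math.log(capacity_gb / used_gb) / math.log(1 + monthly_growth)


# 500 GB used of 1 TB, growing 5% a month (illustrative numbers)
print(round(months_until_full(500, 1000, 0.05), 1))  # 14.2
```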


Logs

  1. Check Application log for errors
  2. Check System log for errors
  3. Check ULS log for errors, critical events and warnings
  4. Check IIS logs for 503 error pages
  5. Check IIS logs for slow (>200ms) loading pages
  6. Check IIS logs for Active Directory Latency (304 not modified with excessive load times)
  7. Check IIS logs for dead links (404 errors)
  8. Check Requests per second count from IIS logs
  9. Check log locations (SharePoint/IIS should be on a secondary drive)
  10. Check for unrestricted log growth
  11. Check log drive capacity/utilization
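
Several of the IIS checks (503s, slow pages, 404s and requests per second) can be answered in a single pass over the logs. Here is a minimal Python sketch, assuming the W3C extended format, where a `#Fields:` directive names the columns and `time-taken` is in milliseconds; the 200 ms threshold is the one from item 5:

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarize_iis_log(lines: Iterable[str], slow_ms: int = 200) -> Dict[str, object]:
    """Count 503s and 404s, tally slow pages and find the peak requests per second."""
    fields: List[str] = []
    status: Counter = Counter()
    slow_pages: Counter = Counter()
    per_second: Counter = Counter()
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns for the lines that follow it
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # blank line, other directive, or no header seen yet
        values = line.split()  # IIS writes spaces inside field values as '+'
        if len(values) != len(fields):
            continue  # skip malformed lines
        row = dict(zip(fields, values))
        status[row["sc-status"]] += 1
        if int(row["time-taken"]) > slow_ms:
            slow_pages[row["cs-uri-stem"]] += 1
        per_second[(row["date"], row["time"])] += 1
    return {
        "503": status["503"],
        "404": status["404"],
        "slow_pages": dict(slow_pages),
        "peak_rps": max(per_second.values(), default=0),
    }


sample = [
    "#Fields: date time cs-uri-stem sc-status time-taken",
    "2015-02-17 10:00:01 /default.aspx 200 150",
    "2015-02-17 10:00:01 /missing.aspx 404 20",
    "2015-02-17 10:00:02 /slow.aspx 200 850",
    "2015-02-17 10:00:02 /default.aspx 503 10",
    "2015-02-17 10:00:02 /default.aspx 200 90",
]
print(summarize_iis_log(sample))
# {'503': 1, '404': 1, 'slow_pages': {'/slow.aspx': 1}, 'peak_rps': 3}
```

In practice you would pass in an open log file instead of the sample list. The logged fields are configurable in IIS, so check the `#Fields:` line in your own logs before relying on these column names.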


Solution Integrity

  1. Old SSP Site removed (for in place upgrades)
  2. Check Supported Limits for Managed path counts
  3. Check Supported Limits for Content DB sizes
  4. Check Supported Limits for List item counts
  5. Check for deleted pages in navigation
  6. Check for unused content sources in the search crawl
  7. Check Health Analyzer rules
  8. Check patch levels for all content databases
  9. Check for orphaned site collections
  10. Check for broken site collections
  11. Check for broken my sites
  12. Check for missing web part references (Error web part detected)
  13. Are any sites running in UI Compatibility Mode (2007 or 2010)?
  14. Check code quality process for stress testing
  15. Check code quality process for load testing
  16. Check code quality process for security testing (each role)
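
The supported-limit checks (items 2 to 4) work best when they warn early, in line with the goal of staying well within limits. The Python sketch below flags anything over a limit or within 80% of it. The threshold values are assumptions for illustration only; confirm each one against Microsoft's published boundaries and limits for your SharePoint version:

```python
from typing import Dict, Mapping

# ASSUMED example thresholds: verify against the published limits for your version.
ASSUMED_LIMITS: Dict[str, float] = {
    "managed_paths_per_web_app": 20,
    "content_db_size_gb": 200,
    "list_items_per_view": 5000,  # list view threshold
}


def check_limits(measured: Mapping[str, float],
                 limits: Mapping[str, float] = ASSUMED_LIMITS,
                 warn_ratio: float = 0.8) -> Dict[str, str]:
    """Flag measurements that are over a limit, or within warn_ratio of it."""
    findings: Dict[str, str] = {}
    for name, value in measured.items():
        limit = limits.get(name)
        if limit is None:
            continue  # no limit recorded for this measurement
        if value > limit:
            findings[name] = "over"
        elif value >= warn_ratio * limit:
            findings[name] = "near"
    return findings


print(check_limits({"managed_paths_per_web_app": 25,
                    "content_db_size_gb": 170,
                    "list_items_per_view": 1000}))
# {'managed_paths_per_web_app': 'over', 'content_db_size_gb': 'near'}
```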


Continuity

  1. Is backup being performed? 
  2. Review backup process
  3. Is the disaster recovery plan tested and reviewed annually? 
  4. Ensure Central Admin is redundant.
  5. Is disaster recovery farm on another site? 
  6. Virtual machines distributed properly across physical hosts for disaster protection?
  7. Check for role redundancy for Web front ends
  8. Check for role redundancy for Application Servers
  9. Check for role redundancy for Database
  10. Check for Service redundancy
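
Role redundancy and VM placement (items 6 to 9) can be checked together from a simple inventory. This Python sketch assumes a hypothetical list of (server, role, physical host) records; it flags roles with a single server and roles whose servers all sit on one host, since losing that host would take the whole role down:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Server = Tuple[str, str, str]  # (server name, farm role, physical host)


def redundancy_findings(servers: List[Server]) -> List[str]:
    """Flag roles served by a single server, or whose servers all share one physical host."""
    by_role: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, role, host in servers:
        by_role[role].append((name, host))
    findings: List[str] = []
    for role in sorted(by_role):
        members = by_role[role]
        if len(members) < 2:
            findings.append(f"{role}: only one server ({members[0][0]})")
        elif len({host for _, host in members}) < 2:
            findings.append(f"{role}: all servers on host {members[0][1]}")
    return findings


# Hypothetical farm inventory; for a physical server, use its own name as the host
farm = [
    ("WFE1", "Web", "HostA"),
    ("WFE2", "Web", "HostA"),
    ("APP1", "Application", "HostB"),
    ("SQL1", "Database", "HostC"),
    ("SQL2", "Database", "HostD"),
]
for finding in redundancy_findings(farm):
    print(finding)
# Application: only one server (APP1)
# Web: all servers on host HostA
```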

Security 

  1. Check for extra ISA Firewall rules.
  2. Check SSL and IPsec use.
  3. Are MySites hosted on a dedicated web application?
  4. Is the farm admin able to manage the service accounts?
  5. Ensure the farm account is not used for other services.
  6. The farm account should not be in the local Administrators group except during installation or patching.
  7. Ensure external access uses SSL.
  8. Kerberos configuration (SPNs configured properly).
  9. Ensure the proper number of service accounts:
    SP 2007: 3
    SP 2010: 5
    SP 2013: up to 16 service accounts and 3 server accounts.
  10. Ensure My Sites are configured with secondary site collection owners.
  11. Ensure farm admin and service accounts are not permitted interactive logon.
  12. Ensure the proper service accounts are used for the proper services:
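
Items 5 and 9 can be checked against an inventory of which account each service runs as. In the Python sketch below, the service names and accounts are hypothetical. Account names are compared case-insensitively because Windows treats DOMAIN\user that way. The farm account legitimately runs the timer service and Central Administration, so leave those out of the inventory before running the check:

```python
from typing import Dict, List


def services_using_farm_account(service_accounts: Dict[str, str], farm_account: str) -> List[str]:
    """Return the services whose identity is the farm account."""
    farm = farm_account.lower()
    return sorted(svc for svc, account in service_accounts.items() if account.lower() == farm)


def distinct_account_count(service_accounts: Dict[str, str]) -> int:
    """Number of distinct accounts in use, to compare with your planned account count."""
    return len({account.lower() for account in service_accounts.values()})


# Hypothetical inventory: service -> DOMAIN\account it runs as
accounts = {
    "Search": "CONTOSO\\sp_farm",
    "User Profile": "CONTOSO\\sp_ups",
    "Managed Metadata": "contoso\\SP_Farm",
}
print(services_using_farm_account(accounts, "CONTOSO\\sp_farm"))  # ['Managed Metadata', 'Search']
print(distinct_account_count(accounts))  # 2
```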

Database

  1. Check content databases within limits.
  2. Check transaction log sizes.
  3. Check for excessive free space (a candidate for shrinking the database).
  4. Trim audit logs to reduce content db size.
  5. Check the maximum degree of parallelism (MAXDOP) setting.
  6. Ensure database autogrowth sizes are set properly.
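
Free space and autogrowth (items 3 and 6) can be reviewed from per-file size figures pulled from SQL Server. The Python sketch below assumes you already have those figures; the `DbFile` record and the 25% free-space and 100 MB growth thresholds are illustrative assumptions, not official values. Percentage-based autogrowth is flagged because fixed MB increments are the commonly recommended setting for SharePoint databases:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DbFile:
    name: str
    size_mb: float
    used_mb: float
    growth: float      # autogrowth increment: MB, or percent when is_percent is True
    is_percent: bool


def db_file_findings(files: List[DbFile], max_free_ratio: float = 0.25,
                     min_growth_mb: float = 100) -> List[str]:
    """Flag excessive free space, percentage autogrowth and very small fixed increments."""
    findings: List[str] = []
    for f in files:
        free_ratio = (f.size_mb - f.used_mb) / f.size_mb if f.size_mb else 0.0
        if free_ratio > max_free_ratio:
            findings.append(f"{f.name}: {free_ratio:.0%} free space")
        if f.is_percent:
            findings.append(f"{f.name}: percentage autogrowth ({f.growth:g}%)")
        elif f.growth < min_growth_mb:
            findings.append(f"{f.name}: small autogrowth ({f.growth:g} MB)")
    return findings


# Made-up figures for illustration
files = [
    DbFile("WSS_Content", 100_000, 60_000, 1024, False),
    DbFile("WSS_Content_log", 5_000, 4_900, 10, True),
    DbFile("SharePoint_Config", 2_000, 1_900, 1, False),
]
for finding in db_file_findings(files):
    print(finding)
```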

Information Architecture

  1. Verify the universal site taxonomy.
  2. Check maximum site depth.
  3. Check maximum site width.
  4. Check for a high number of role assignments on individual items.
  5. Check for a high number of unique permissions.
  6. Check content growth projections.
  7. Check for a high number of sites sharing a content database.
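
Site depth and width (items 2 and 3) can be measured from a plain list of site URLs. In this Python sketch, depth is the number of path segments below the host, so a managed path such as /sites counts as one level. Width is the largest number of listed sites that share the same parent path. Both definitions are assumptions you may want to adjust:

```python
from collections import Counter
from typing import Iterable, Tuple
from urllib.parse import urlparse


def site_depth_and_width(site_urls: Iterable[str]) -> Tuple[int, int]:
    """Return (max depth, max number of child sites under one parent) for a set of site URLs."""
    sites = []
    for url in site_urls:
        parsed = urlparse(url)
        segments = tuple(s for s in parsed.path.split("/") if s)
        sites.append((parsed.netloc.lower(), segments))
    max_depth = max((len(segments) for _, segments in sites), default=0)
    # Group each site under its parent path (same host, one segment shorter)
    children = Counter((host, segments[:-1]) for host, segments in sites if segments)
    max_width = max(children.values(), default=0)
    return max_depth, max_width


# Hypothetical site URLs
urls = [
    "http://portal/sites/hr",
    "http://portal/sites/hr/benefits",
    "http://portal/sites/hr/payroll",
    "http://portal/sites/hr/payroll/2015",
]
print(site_depth_and_width(urls))  # (4, 2)
```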

Branding

  1. Are there any custom master pages?
  2. Are the custom master pages or page layouts working properly?
  3. Are all images, styles, etc. checked in and published?

Customization

  1. What WSP Solutions are deployed?
  2. Are any InfoPath forms deployed?
  3. Check counts of invalid or missing Features.
  4. Ensure assemblies are compiled in release mode not debug mode.
  5. Which solutions are 3rd party?
  6. Which solutions are in house?
  7. Check solution utilization (where they are used, activation locations and actual usage).
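
Solution utilization and missing Features (items 3 and 7) come down to cross-referencing what is deployed with where it is activated. This Python sketch is a simplification: it treats every activation as belonging to the solution that deployed it. The .wsp names and URLs are hypothetical:

```python
from typing import Iterable, List, Tuple


def solution_usage(deployed: Iterable[str],
                   activations: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Return (deployed solutions never activated, activated solutions that are not deployed)."""
    locations = {name: set() for name in deployed}
    orphaned = set()
    for solution, url in activations:
        if solution in locations:
            locations[solution].add(url)
        else:
            orphaned.add(solution)  # activation refers to a missing solution
    unused = sorted(name for name, urls in locations.items() if not urls)
    return unused, sorted(orphaned)


deployed = ["Contoso.Branding.wsp", "Contoso.WebParts.wsp", "Vendor.Forms.wsp"]
activations = [
    ("Contoso.WebParts.wsp", "http://portal/sites/hr"),
    ("Contoso.WebParts.wsp", "http://portal/sites/it"),
    ("Old.Solution.wsp", "http://portal"),
]
print(solution_usage(deployed, activations))
# (['Contoso.Branding.wsp', 'Vendor.Forms.wsp'], ['Old.Solution.wsp'])
```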

Search

  1. Check crawl logs for any errors or warnings.
  2. Check crawl schedules.
  3. Check crawl running time versus crawl interval.
  4. Check for successful crawls and crawl failures.
  5. Check search service account configuration.
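
Crawl running time versus crawl interval (item 3) is a simple comparison once you have crawl start and end times. This Python sketch flags any content source with a crawl that ran longer than its scheduled interval, which means the next scheduled crawl would start before the previous one finished. "Local SharePoint sites" is the default content source name; the other source, the times and the intervals are hypothetical:

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

Crawl = Tuple[str, datetime, datetime]  # (content source, start, end)


def overrunning_sources(crawls: Iterable[Crawl], intervals: Dict[str, timedelta]) -> List[str]:
    """Content sources with at least one crawl that ran longer than its schedule interval."""
    over = set()
    for source, start, end in crawls:
        interval = intervals.get(source)
        if interval is not None and end - start > interval:
            over.add(source)
    return sorted(over)


t = datetime(2015, 2, 17, 10, 0)
crawls = [
    ("Local SharePoint sites", t, t + timedelta(minutes=12)),
    ("Local SharePoint sites", t + timedelta(minutes=15), t + timedelta(minutes=40)),
    ("File shares", t, t + timedelta(minutes=5)),
]
intervals = {"Local SharePoint sites": timedelta(minutes=15), "File shares": timedelta(hours=4)}
print(overrunning_sources(crawls, intervals))  # ['Local SharePoint sites']
```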


I realize there may be some repetition above, but the purpose of this is to help you ensure a healthy environment. If you have any questions, additions or modifications, please comment and I will make updates. Please follow me on Twitter @DavidRMcMillan and @DevFactoPortals. I look forward to making this a resource any admin can use.
