SharePoint Server 2013 Issue: AppFabric Distributed Cache Service Crashes

Recently, I set up a SharePoint Server 2013 farm for a client. It is a small 3-server farm consisting of one web front end server, one application server, and one database server. After the installation completed successfully, there was a problem with the AppFabric Distributed Cache Service. The service couldn’t be started, either automatically or manually, and it kept crashing. There were several error messages in the event logs and the ULS logs:

Microsoft.Fabric.Common.OperationCompletedException: Operation completed with an exception ---> System.TimeoutException: The operation has timed out.

AppFabricCachingService.Crash Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. No user action required.

I tried reinstalling the service, the farm, and even the whole SharePoint software, but the problem persisted. To give some context, the following is the server and farm setup:

Virtualization: Yes, all 3 farm servers are virtual machines
Virtualization Software: VMware ESX 4.1
Operating System: Windows Server 2008 R2 SP1
SharePoint: SharePoint Server 2013 RTM
SQL Server: SQL Server 2008 R2 SP1

This issue didn’t happen in other environments, so it could be specific to the hardware or the virtualization system.

After many troubleshooting sessions with Microsoft Support, we managed to solve the issue with a workaround: altering the default distributed cache service configuration and manually setting a smaller cache size. Follow these steps to do it:

  1. Open SharePoint 2013 Management Shell as Administrator.
  2. Type Use-CacheCluster.
  3. Type Export-CacheClusterConfig .\afconfig.xml. You can change .\afconfig.xml to another location or filename if necessary.
  4. Open afconfig.xml with Notepad or any other text editor.
  5. Find <dataCache size="Medium"> and replace it with <dataCache size="Small">.
  6. Find <caches partitionCount="256"> and replace it with <caches partitionCount="32">.
  7. Save and close afconfig.xml.
  8. Go back to SharePoint 2013 Management Shell as Administrator.
  9. Type Stop-CacheCluster to ensure that AppFabric Distributed Cache Service is not running.
  10. Type Import-CacheClusterConfig .\afconfig.xml.
  11. Type Start-CacheCluster and ensure that all cache hosts are started.
  12. If the above command exits while some cache hosts are still starting, wait for a few minutes, then type Get-CacheHost to check their status again.
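
For those who prefer to script it, the steps above can be sketched roughly as follows. This is only a sketch under a few assumptions: $configPath is a variable name I made up for illustration, the exported file is assumed to be UTF-8, and it is assumed to contain exactly the two strings from steps 5 and 6. Review the exported file before importing it back.

```powershell
# Run in the SharePoint 2013 Management Shell, opened as Administrator.
# $configPath is an illustrative name; the file name is the one used in the steps above.
$configPath = '.\afconfig.xml'

# Steps 2-3: connect to the cache cluster and export its configuration.
Use-CacheCluster
Export-CacheClusterConfig $configPath

# Steps 5-6: plain text replacements, line by line.
# Assumption: the file contains these exact strings and is UTF-8 encoded.
$content = Get-Content $configPath
$content = $content -replace '<dataCache size="Medium">', '<dataCache size="Small">'
$content = $content -replace '<caches partitionCount="256">', '<caches partitionCount="32">'
$content | Set-Content $configPath -Encoding UTF8

# Steps 9-11: stop the cluster, import the edited configuration, start it again.
Stop-CacheCluster
Import-CacheClusterConfig $configPath
Start-CacheCluster

# Step 12: check the status of the cache hosts (repeat after a few minutes if needed).
Get-CacheHost
```

If the replacements don't match anything (for example, because the exported file uses different attribute values), the file is written back unchanged, so it is worth opening it once to confirm the edit before running the import.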

The above workaround solved the crashing issue of the AppFabric Distributed Cache Service. What’s weird, though, is that after I reverted the configuration to the Medium dataCache size and a partitionCount of 256, the AppFabric Distributed Cache Service still ran happily. I never found the root cause of the issue, but at least the farm is now up and running.

SSRS (SQL Server Reporting Services) Data Source losing the stored credential

I had a SharePoint-integrated SSRS instance. Lately, its data sources started losing their stored credentials every night. It happened to both the shared data sources and the custom data sources.

When I manually reentered the credential, it would be wiped again overnight.

It turned out that the problem was the service account: the reporting service was not running under a domain account. Following a certain security recommendation, I had configured it to use one of the built-in accounts, and that caused the issue.

Solution: Use a domain user account as the reporting service’s service account.

SharePoint error when activating publishing feature: the trial period for this product has expired

On some of my SharePoint 2007 farms, when I tried to activate the publishing feature, I got this error:

The trial period for this product has expired.

I was certain that my license had not expired, so I did some research on Google. I found that it was a known issue affecting Microsoft Office Server products with Service Pack 2.

My SharePoint farms had been patched to version (SP2 + December 09 cumulative update), so they were affected.

The solution was to install Microsoft KB971620. It required a server reboot after installation.
