Recently, I set up a SharePoint Server 2013 farm for a client. It is a small three-server farm consisting of one web front-end server, one application server, and one database server. The installation completed successfully, but the AppFabric Distributed Cache Service would not start, either automatically or manually, and kept crashing. The event logs and the ULS logs contained several error messages:
Microsoft.Fabric.Common.OperationCompletedException: Operation completed with an exception ---> System.TimeoutException: The operation has timed out.
AppFabricCachingService.Crash Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRService0001>:SubStatus<ES0001>:Service initialization failed. No user action required.
I tried reinstalling the service, the farm, and even SharePoint itself, but the problem persisted. To give some context, here is the server and farm setup:
|Component|Details|
|---|---|
|Virtualization|Yes, all 3 farm servers are virtual machines|
|Virtualization Software|VMware ESX 4.1|
|Operating System|Windows Server 2008 R2 SP1|
|SharePoint|SharePoint Server 2013 RTM|
|SQL Server|SQL Server 2008 R2 SP1|
This issue didn’t happen in other environments, so it could be specific to the hardware or the virtualization system.
After many troubleshooting sessions with Microsoft Support, we managed to work around the issue by altering the default Distributed Cache service configuration and manually setting a smaller cache size. Follow these steps:
- Open SharePoint 2013 Management Shell as Administrator.
- Type Use-CacheCluster.
- Type Export-CacheClusterConfig .\afconfig.xml. You can change .\afconfig.xml to another location or filename if necessary.
- Open afconfig.xml with Notepad or any other text editor.
- Find <dataCache size="Medium"> and replace it with <dataCache size="Small">.
- Find <caches partitionCount="256"> and replace it with <caches partitionCount="32">.
- Save and close afconfig.xml.
- Go back to SharePoint 2013 Management Shell as Administrator.
- Type Stop-CacheCluster to ensure that AppFabric Distributed Cache Service is not running.
- Type Import-CacheClusterConfig .\afconfig.xml.
- Type Start-CacheCluster and ensure that all cache hosts are started.
- If the above command returns before all cache hosts have started (some are still starting), wait a few minutes, then type Get-CacheHost to check their status again.
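For reference, the steps above can be sketched as a single script. This is only a rough outline, not something Microsoft Support gave us. The cmdlets are the ones named in the steps. The backup copy and the scripted find-and-replace (standing in for the manual Notepad edit) are my own additions, and the script assumes the exported file is plain ASCII or UTF-8 text.

```powershell
# Run in the SharePoint 2013 Management Shell as Administrator.
# Sketch only: the backup copy and the scripted edit are additions,
# not part of the original manual steps.

$configPath = ".\afconfig.xml"   # same file name as in the steps; change if needed

# Connect to the cache cluster and export its current configuration.
Use-CacheCluster
Export-CacheClusterConfig $configPath

# Keep an untouched copy so the original settings can be restored later.
Copy-Item $configPath "$configPath.bak"

# Apply the two edits from the steps (assumes the file is ASCII/UTF-8 text).
(Get-Content $configPath) `
    -replace '<dataCache size="Medium">', '<dataCache size="Small">' `
    -replace '<caches partitionCount="256">', '<caches partitionCount="32">' |
    Set-Content $configPath

# Stop the cluster, load the modified configuration, and start it again.
Stop-CacheCluster
Import-CacheClusterConfig $configPath
Start-CacheCluster

# If some hosts are still starting, wait a few minutes and run this again.
Get-CacheHost
```

If the replacements do not match anything (for example, because the exported file uses a different cache size), the file is written back unchanged, so it is worth opening afconfig.xml and checking the two values before running Import-CacheClusterConfig.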
The above workaround solved the crashing issue with the AppFabric Distributed Cache Service. What’s weird, though, is that after I reverted the configuration to the Medium dataCache size and a partitionCount of 256, the AppFabric Distributed Cache Service still ran happily. I never found the root cause of the issue, but at least the farm is now up and running.