Troubleshoot common Durable Task SDK issues

This article helps you diagnose and fix common issues when building applications with the portable Durable Task SDKs. These SDKs connect to the Durable Task Scheduler backend and run on any hosting platform, including Azure Container Apps, Kubernetes, and VMs. For issues specific to the Durable Task Scheduler service, see Troubleshoot the Durable Task Scheduler. For Durable Functions issues, see Durable Functions troubleshooting guide.

Tip

The Durable Task Scheduler monitoring dashboard is useful for inspecting orchestration status, viewing execution history, and identifying failures. Use it alongside this guide to speed up troubleshooting.

Find your issue

Error message or symptom → Section
"connection refused" or "failed to connect" at startup → Emulator isn't running or is unreachable
Connection string parse errors or authentication errors at startup → Connection string format is incorrect
Worker connects but orchestrations don't start → Task hub doesn't exist
401 Unauthorized or identity/role errors on Azure → Identity-based authentication failures on Azure
Orchestration stuck in "Pending" → Orchestration is stuck in the "Pending" state
Orchestration stuck in "Running" → Orchestration is stuck in the "Running" state
Replay failures, infinite loops, or unexpected behavior → Nondeterministic orchestrator code
Type mismatch or JSON serialization errors → Serialization and deserialization errors
"activity not found" → Activity not found
RESOURCE_EXHAUSTED or "message too large" → gRPC message size limit exceeded
"CANCELLED: Cancelled on client" during shutdown → Stream cancellation errors during shutdown
CS0419 / VSTHRD105 warnings break build → Source generator warnings break builds (C#)
OrchestratorBlockedException (Java) → OrchestratorBlockedException (Java)
Unhelpful error when using retry_policy (Python) → Retry policy requires max_retry_interval (Python)

Connection and setup issues

Emulator isn't running or is unreachable

If your app fails at startup with a connection error like "connection refused" or "failed to connect," check that the Durable Task Scheduler emulator is running and accessible.

  1. Check that the emulator Docker container is running:

    docker ps | grep durabletask
    
  2. Check the correct port mappings. The emulator exposes two ports:

    • 8080—gRPC endpoint (used by your app)
    • 8082—Dashboard UI

    If you're using a custom port mapping, update your connection string to match the host port mapped to container port 8080.

  3. Test connectivity to the gRPC endpoint:

    curl -v http://localhost:8080
    

    A connection refusal indicates that the container isn't running or that the port mapping is incorrect.
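
As a cross-platform alternative to curl, a plain TCP check confirms whether anything is listening on the gRPC port. This is a minimal sketch using only the Python standard library; it isn't part of the SDK:

```python
import socket

def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: check the emulator's gRPC endpoint
if can_connect("localhost", 8080):
    print("gRPC port is reachable")
else:
    print("Nothing is listening on 8080; check the container and port mapping")
```

A successful TCP connection only proves the port is open; the application-level gRPC handshake can still fail for other reasons covered in this article.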

Connection string format is incorrect

Connection string errors are a common cause of startup failures. Check that your connection string matches the expected format.

Local development (emulator):

Endpoint=http://localhost:8080;Authentication=None

Azure (managed identity):

Endpoint=https://<scheduler-name>.durabletask.io;Authentication=ManagedIdentity

Azure (user-assigned managed identity):

Endpoint=https://<scheduler-name>.durabletask.io;Authentication=ManagedIdentity;ClientID=<client-id>

Common mistakes:

  • Using https for the local emulator (the emulator uses http)
  • Using http for Azure endpoints (Azure requires https)
  • Omitting the Authentication parameter
  • Using the dashboard port (8082) instead of the gRPC port (8080)
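
The mistakes above can be caught with a quick sanity check at startup. The following sketch assumes the semicolon-delimited Key=Value format shown earlier; the helper names are illustrative and not part of the SDK:

```python
def parse_connection_string(cs: str) -> dict[str, str]:
    """Split a semicolon-delimited Key=Value connection string into a dict."""
    parts = [p for p in cs.split(";") if p.strip()]
    return dict(p.split("=", 1) for p in parts)

def validate_connection_string(cs: str) -> list[str]:
    """Return a list of problems matching the common mistakes above."""
    props = parse_connection_string(cs)
    problems = []
    endpoint = props.get("Endpoint", "")
    if "localhost" in endpoint and endpoint.startswith("https://"):
        problems.append("local emulator endpoints use http, not https")
    if "durabletask.io" in endpoint and endpoint.startswith("http://"):
        problems.append("Azure endpoints require https")
    if "Authentication" not in props:
        problems.append("missing the Authentication parameter")
    if endpoint.rstrip("/").endswith(":8082"):
        problems.append("port 8082 is the dashboard UI; the gRPC port is 8080")
    return problems

print(validate_connection_string("Endpoint=http://localhost:8080;Authentication=None"))  # []
```

Running the check against your configured connection string before constructing the client surfaces these mistakes with a clear message instead of a generic startup failure.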

Client or worker fails to connect

If your client or worker fails to connect, verify the following:

  • Connection string matches the expected format shown in Connection string format is incorrect.
  • Task hub name is the same for both the client and worker.
  • Endpoint URL uses http for the local emulator and https for Azure.

For full setup examples in each language, see Create an app with Durable Task SDKs.

Task hub doesn't exist

If your orchestrations fail to start or the worker connects but doesn't process work, the task hub might not exist on the scheduler. The emulator typically creates task hubs automatically using the DTS_TASK_HUB_NAMES environment variable.

Check that the emulator was started with the correct task hub name:

docker run -d -p 8080:8080 -p 8082:8082 \
  -e DTS_TASK_HUB_NAMES="my-taskhub" \
  mcr.microsoft.com/dts/dts-emulator:latest
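
A quick startup check can confirm that the task hub your app is configured to use is among those the emulator creates. This sketch assumes DTS_TASK_HUB_NAMES holds a comma-separated list of names; the helper is illustrative, not an SDK API:

```python
import os

def hub_configured(name: str) -> bool:
    """Check whether a task hub name appears in DTS_TASK_HUB_NAMES."""
    raw = os.environ.get("DTS_TASK_HUB_NAMES", "")
    hubs = [h.strip().lower() for h in raw.split(",") if h.strip()]
    return name.strip().lower() in hubs

# Simulated emulator configuration with two task hubs
os.environ["DTS_TASK_HUB_NAMES"] = "my-taskhub,orders-hub"
print(hub_configured("my-taskhub"))   # True
print(hub_configured("billing-hub"))  # False
```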

For Azure-hosted schedulers, create the task hub using the Azure CLI:

az durabletask taskhub create \
  --resource-group <resource-group> \
  --scheduler-name <scheduler-name> \
  --name <taskhub-name>

Identity-based authentication failures on Azure

If your app runs locally but fails when deployed to Azure, the issue is likely related to authentication:

  1. Check that the managed identity is assigned to your app (system-assigned or user-assigned).
  2. Check that the identity has the Durable Task Data Contributor role on the scheduler resource or specific task hub.
  3. Make sure the connection string uses the correct Authentication value (ManagedIdentity). In Python, pass a DefaultAzureCredential() instance as the token_credential parameter instead of using a connection string.
  4. For user-assigned identities, check that the ClientID in the connection string matches the identity's client ID.

For detailed instructions, see Identity-based access for Durable Task Scheduler.

Orchestration issues

Orchestration is stuck in the "Pending" state

An orchestration in "Pending" status indicates it was scheduled but a worker hasn't picked it up. Check the following items:

  • Worker is running. Ensure your worker process is running and connected to the same task hub where the orchestration was scheduled.
  • Task hub name matches. Check that the worker and client both reference the same task hub name. A mismatch causes the worker to poll a different task hub.
  • Orchestrator is registered. The orchestrator function or class referenced when scheduling must be registered with the worker.

Check that the orchestrator class is registered with the worker during startup. If you use source generators ([DurableTask] attribute), the registration is automatic. Otherwise, register manually:

builder.Services.AddDurableTaskWorker(builder =>
{
    builder.AddTasks(tasks =>
    {
        tasks.AddOrchestrator<MyOrchestrator>();
        tasks.AddActivity<MyActivity>();
    });
});

Orchestration is stuck in the "Running" state

An orchestration stuck in "Running" typically means it's waiting for a task that isn't complete. To diagnose, open the Durable Task Scheduler dashboard and inspect the orchestration's execution history. Look for the last completed event — the next event in the sequence is the one that's blocking.

Common causes:

  • Activity not registered. The orchestration calls an activity name that isn't registered with the worker. The dashboard shows a TaskScheduled event with no corresponding TaskCompleted. Check that the activity name matches between your orchestrator code and worker registration (see Activity not found).
  • Waiting on an external event. The orchestration calls waitForExternalEvent and the event hasn't been raised yet. The expected EventRaised event is missing from the dashboard history. Verify the event name and confirm that the sender is targeting the correct orchestration instance ID.
  • Waiting on a durable timer. The orchestration creates a timer that isn't expired yet. The dashboard shows a TimerCreated event. Wait for the timer to fire, or check if the timer duration is longer than expected.
  • Activity throws an unhandled exception. The dashboard shows a TaskFailed event. Check the failure details for the exception message and stack trace.
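
If you export the execution history, the first diagnosis above can be scripted. This sketch scans a list of history events for TaskScheduled entries that never got a matching TaskCompleted or TaskFailed; the events are represented here as plain dicts with an assumed taskId field, not an SDK type:

```python
def find_pending_tasks(history: list[dict]) -> list[int]:
    """Return task IDs that were scheduled but never completed or failed."""
    scheduled = {e["taskId"] for e in history if e["type"] == "TaskScheduled"}
    finished = {
        e["taskId"]
        for e in history
        if e["type"] in ("TaskCompleted", "TaskFailed")
    }
    return sorted(scheduled - finished)

history = [
    {"type": "TaskScheduled", "taskId": 0, "name": "SayHello"},
    {"type": "TaskCompleted", "taskId": 0},
    {"type": "TaskScheduled", "taskId": 1, "name": "MissingActivity"},
]
print(find_pending_tasks(history))  # → [1], the task that's blocking
```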

Nondeterministic orchestrator code

Orchestrator code must be deterministic. Nondeterministic code causes replay failures that result in unexpected behavior, infinite loops, or errors. Don't use current time, random numbers, GUIDs, or I/O (like HTTP calls) directly in orchestrator code. Use the context-provided alternatives or delegate to activities.

// ❌ Wrong - non-deterministic
var now = DateTime.UtcNow;
var id = Guid.NewGuid();
var data = await httpClient.GetAsync("https://example.com/api");

// ✅ Correct - deterministic
var now = context.CurrentUtcDateTime;
var id = context.NewGuid();
var data = await context.CallActivityAsync<string>("FetchData");

Serialization and deserialization errors

Serialization errors occur when the types used for orchestration inputs, outputs, or activity results don't match between caller and callee. These errors can appear as unexpected null values, JsonException, or type cast failures in your orchestration history.

How to diagnose:

  1. Open the Durable Task Scheduler dashboard and inspect the orchestration history. Look at the Input and Result fields for activities that failed.
  2. Verify the type expected by the orchestrator matches the type returned by the activity. For example, if the activity returns a string but the orchestrator expects an int, the deserialization fails.
  3. Check for non-serializable types. Custom types that can't be serialized to JSON (for example, types with circular references or no default constructor) fail silently or throw exceptions.

Known issue (Java): Passing a String directly to an activity can result in double-quoted strings (for example, "\"hello\"" instead of "hello"). This behavior is a known issue. Cast the result explicitly or use wrapper objects.
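
The double-quoting symptom comes from serializing an already-serialized string. The effect can be reproduced with the Python standard library's json module, which can help you recognize it in history payloads:

```python
import json

payload = "hello"
once = json.dumps(payload)   # '"hello"' - normal JSON encoding of a string
twice = json.dumps(once)     # the string was encoded a second time

# Deserializing once only peels off one layer, leaving embedded quotes:
print(json.loads(twice))  # '"hello"' instead of 'hello'
```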

Tip

Use simple data types (strings, numbers, arrays, and plain objects or POJOs/POCOs/dataclasses) for orchestration and activity inputs and outputs. Avoid complex types with custom serialization logic.

Activity issues

Activity not found

If an orchestration fails with an "activity not found" error, the activity name registered with the worker doesn't match the name used in the orchestration code.

In .NET, activities can be registered by class name or by using the [DurableTask] attribute with source generators. Verify that the activity class is included in the worker registration:

builder.Services.AddDurableTaskWorker(builder =>
{
    builder.AddTasks(tasks =>
    {
        tasks.AddActivity<SayHello>();
    });
});

When calling the activity from an orchestrator, use the class name:

string result = await context.CallActivityAsync<string>(nameof(SayHello), "Tokyo");

Activity failure handling

When an activity throws an exception, the orchestrator receives a TaskFailedException (or language equivalent). Catch this exception and inspect the inner error details to find the root cause. In C#, use ex.FailureDetails to access the error type and message, and IsCausedBy<T>() to check for specific exception types.

For detailed error handling and retry policy examples in each language, see Error handling and retries.

gRPC issues

gRPC message size limit exceeded

If you see a RESOURCE_EXHAUSTED or message too large error, an orchestration or activity input/output exceeds the gRPC default maximum message size of 4 MB.

Mitigations:

  • Reduce the size of inputs and outputs. Store large payloads in external storage, like Azure Blob Storage, and pass only references.
  • Break large fan-out results into smaller batches processed through sub-orchestrations.
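
The first mitigation is often called the claim-check pattern: store the payload externally and pass only a small reference through the orchestration. In this sketch an in-memory dict stands in for external storage such as Azure Blob Storage, and the helper names are illustrative:

```python
import json
import uuid

BLOB_STORE: dict[str, str] = {}  # stand-in for external storage

def put_payload(data: object) -> str:
    """Store a large payload externally and return a small reference key."""
    ref = str(uuid.uuid4())
    BLOB_STORE[ref] = json.dumps(data)
    return ref

def get_payload(ref: str) -> object:
    """Resolve the reference back into the original payload."""
    return json.loads(BLOB_STORE[ref])

# The orchestration input stays tiny regardless of the payload size:
large = {"rows": list(range(100_000))}
ref = put_payload(large)
print(len(ref))  # a short key, well under the 4 MB gRPC limit
assert get_payload(ref) == large
```

The orchestrator passes only the reference to activities, and each activity resolves it against the external store when it needs the data.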

Stream cancellation errors during shutdown

When stopping a worker, you might see CANCELLED: Cancelled on client errors. These errors are typically harmless and occur because the gRPC stream between the worker and the scheduler closes during shutdown. The .NET, Python, and Java SDKs handle these errors internally.

In JavaScript, the SDK might throw Stream error Error: 1 CANCELLED: Cancelled on client when calling worker.stop(). This error is a known issue. Wrap the stop call in a try-catch if the error affects your shutdown logic:

try {
  await worker.stop();
} catch (error) {
  // Ignore stream cancellation errors during shutdown
  if (!error.message.includes("CANCELLED")) {
    throw error;
  }
}

Logging and diagnostics

Verbose logging configuration

Increase log verbosity to get more details about SDK operations, including gRPC communication and orchestration replay events.

In your appsettings.json or logging configuration file:

{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Microsoft.DurableTask": "Debug"
    }
  }
}

Use replay-safe loggers to avoid duplicate log entries during orchestration replay:

public override async Task<string> RunAsync(
    TaskOrchestrationContext context, string input)
{
    ILogger logger = context.CreateReplaySafeLogger<MyOrchestrator>();
    logger.LogInformation("Processing input: {Input}", input);
    // ...
}

Application Insights integration

For production applications, configure Application Insights to collect telemetry from your Durable Task SDK application. The integration approach depends on your hosting platform:

Hosting platform → Setup instructions
Azure Container Apps → Monitor logs in Azure Container Apps with Log Analytics
Azure App Service → Enable diagnostic logging for apps in Azure App Service
Azure Kubernetes Service → Monitor Azure Kubernetes Service

For more information about diagnostics, see Diagnostics in Durable Task SDKs.

Language-specific issues

C#

Source generator warnings break builds

If you use <TreatWarningsAsErrors>true</TreatWarningsAsErrors> in your project, the Durable Task source generators might produce warnings (CS0419, VSTHRD105) that break your build. Suppress these specific warnings:

<PropertyGroup>
  <NoWarn>$(NoWarn);CS0419;VSTHRD105</NoWarn>
</PropertyGroup>

This known issue is tracked on GitHub and will be addressed in an upcoming release.

Roslyn analyzer throws in foreach loops

The Durable Task Roslyn analyzer might throw an ArgumentNullException when orchestrator lambda code is inside a foreach loop. This behavior is a known issue that doesn't affect runtime behavior. Update to the latest analyzer package version to get the fix.

Java

Gradle permission denied error

On macOS or Linux, running ./gradlew might fail with a "permission denied" error. Fix this error by making the file executable:

chmod +x gradlew

OrchestratorBlockedException

An OrchestratorBlockedException occurs when orchestrator code performs a blocking operation that the SDK detects as potentially nondeterministic. This exception is a safeguard that prevents violations of the orchestrator code constraints.

Common causes:

  • Calling a blocking external API in orchestrator code.
  • Using Thread.sleep() directly instead of ctx.createTimer().
  • Performing file or network I/O in orchestrator code.

Move all blocking or I/O operations into activities.

Python

Retry policy requires max_retry_interval

When you configure a retry_policy in Python, omitting the max_retry_interval parameter produces an error that doesn't clearly indicate the cause. Always specify max_retry_interval:

from datetime import timedelta
from durabletask import task

retry_policy = task.RetryPolicy(
    max_number_of_attempts=3,
    first_retry_interval=timedelta(seconds=5),
    max_retry_interval=timedelta(minutes=1),  # Required
)

WhenAllTask exception behavior

When you use when_all to run multiple tasks in parallel and one or more of them fail, the exception behavior might not match expectations: only the first exception is raised, and the remaining task exceptions might be lost. Inspect individual task results if you need complete error information:

tasks = [ctx.call_activity(process_item, input=item) for item in items]
try:
    results = yield task.when_all(tasks)
except TaskFailedError as e:
    # Only the first failure is raised
    # Check individual tasks for comprehensive error handling
    print(f"At least one task failed: {e}")

Get support

For questions and reporting bugs, open an issue in the GitHub repo for the relevant SDK. When you report a bug, include:

  • Affected orchestration instance IDs
  • Time range in UTC that shows the problem
  • Application name and deployment region (if relevant)
  • SDK version and hosting platform
  • Relevant logs or error messages

SDK → GitHub repository
.NET → microsoft/durabletask-dotnet
Java → microsoft/durabletask-java
JavaScript → microsoft/durabletask-js
Python → microsoft/durabletask-python

Next steps