Friday, January 20, 2012

FAQ on Task.Start

Recently I’ve heard a number of folks asking about Task.Start: when and when not to use it, how it behaves, and so forth. I thought I’d answer some of those questions here in an attempt to clarify what it is and what it does, and to put any misconceptions to rest.
1. Question: When can I use Task.Start?
The Start instance method may be used if and only if the Task is in the Created state (i.e. Task.Status returns TaskStatus.Created). And the only way a Task can be in the Created state is if the Task was instantiated using one of Task’s public constructors, e.g. "var t = new Task(someDelegate);".
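To make the state requirement concrete, here is a minimal sketch written as a C# top-level program (which assumes C# 9 or later). The variable names are mine and purely illustrative; it simply records Task.Status before Start and after the Task has finished.

```csharp
using System;
using System.Threading.Tasks;

// A Task built with a constructor begins life in the Created state.
var t = new Task(() => Console.WriteLine("running"));
TaskStatus before = t.Status;

t.Start();   // legal only because the Task is still Created
t.Wait();    // block until the scheduler has run it
TaskStatus after = t.Status;

Console.WriteLine($"{before} -> {after}");
```

Checking Status like this is handy for experiments, but in real code you normally know how a Task was created and shouldn’t need to inspect it before deciding whether to call Start.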
2. Question: Should I call Start on a Task created by Task.Run / Task.ContinueWith / Task.Factory.StartNew / TaskCompletionSource / async methods / …?
No. Not only shouldn’t you, but you simply can’t… it would fail with an exception (an InvalidOperationException). See question #1. The Start method is only applicable to a Task in the Created state. Tasks created by any of the mechanisms mentioned are already beyond the Created state, such that their Task.Status will not return TaskStatus.Created, but something else, like TaskStatus.WaitingForActivation, or TaskStatus.Running, or TaskStatus.RanToCompletion.
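A small sketch of that failure, again as a top-level program. StartThrows is a hypothetical helper introduced only for this illustration; it reports whether calling Start on a given Task throws InvalidOperationException.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: true if Start throws InvalidOperationException.
static bool StartThrows(Task task)
{
    try { task.Start(); return false; }
    catch (InvalidOperationException) { return true; }
}

Task fromRun = Task.Run(() => { });                   // already queued by Task.Run
Task fromTcs = new TaskCompletionSource<int>().Task;  // promise-style Task, never in Created
Task fromCtor = new Task(() => { });                  // Created, so Start is allowed

bool runThrows = StartThrows(fromRun);
bool tcsThrows = StartThrows(fromTcs);
bool ctorThrows = StartThrows(fromCtor);
fromCtor.Wait();

Console.WriteLine($"Task.Run: {runThrows}, TaskCompletionSource: {tcsThrows}, constructor: {ctorThrows}");
```

Only the constructor-created Task accepts the call; the other two reject it regardless of whether they have finished running.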
3. Question: What does Start actually do?
It queues the Task to the target TaskScheduler (the parameterless overload of Start targets TaskScheduler.Current). When you construct a Task with one of Task’s constructors, the Task is inactive: it has not been given to any scheduler yet, and thus there’s nothing to actually execute it. If you never Start a Task, it’ll never be queued, and so it’ll never complete. To get the Task to execute, it needs to be queued to a scheduler, so that the scheduler can execute it when and where the scheduler sees fit to do so. The act of calling Start on a Task will twiddle some bits in the Task (e.g. changing its state from Created to WaitingToRun) and will then pass the Task to the target scheduler via the TaskScheduler’s QueueTask method. At that point, the task’s future execution is in the hands of the scheduler, which should eventually execute the Task via the TaskScheduler’s TryExecuteTask method.
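The hand-off to the scheduler can be observed with a toy scheduler. LoggingScheduler below is a hypothetical class written for this illustration, not anything that ships with the framework: it counts calls to QueueTask and then runs each Task on a ThreadPool thread via TryExecuteTask. Treat it as a sketch rather than a production-quality scheduler.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var scheduler = new LoggingScheduler();
var t = new Task(() => { });

int queuedBeforeStart = scheduler.QueuedCount;  // the scheduler hasn't seen the Task yet
t.Start(scheduler);                             // Start passes the Task to QueueTask
t.Wait();
int queuedAfterStart = scheduler.QueuedCount;

Console.WriteLine($"QueueTask calls: {queuedBeforeStart} before Start, {queuedAfterStart} after");

// Hypothetical scheduler: counts QueueTask calls and executes each Task on the ThreadPool.
sealed class LoggingScheduler : TaskScheduler
{
    private int _queued;
    public int QueuedCount => Volatile.Read(ref _queued);

    protected override void QueueTask(Task task)
    {
        Interlocked.Increment(ref _queued);
        ThreadPool.QueueUserWorkItem(_ => TryExecuteTask(task));
    }

    // Decline inlining so every execution goes through QueueTask.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) => false;

    // Used only for debugger support; this sketch doesn't track pending tasks.
    protected override IEnumerable<Task> GetScheduledTasks() => Array.Empty<Task>();
}
```

Passing the scheduler to Start explicitly, rather than relying on the parameterless overload, makes it obvious which scheduler receives the Task.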
4. Question: Can I call Start more than once on the same Task?
No. A Task may only transition out of the Created state once, and Start transitions a Task out of the Created state: therefore, Start may only be used once. Any attempts to call Start on a Task not in the Created state will result in an exception. The Start method employs synchronization to ensure that the Task object remains in a consistent state even if Start is called multiple times concurrently… only one of those calls may succeed.
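Here is a sketch of several callers racing to start the same Task. The count of eight callers and the variable names are arbitrary choices of mine; the point is that exactly one call to Start succeeds and the rest see an InvalidOperationException.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var t = new Task(() => { });
int succeeded = 0, failed = 0;

// Eight callers try to start the same Task, possibly concurrently.
Parallel.For(0, 8, _ =>
{
    try
    {
        t.Start(TaskScheduler.Default);
        Interlocked.Increment(ref succeeded);
    }
    catch (InvalidOperationException)
    {
        Interlocked.Increment(ref failed);
    }
});
t.Wait();

Console.WriteLine($"succeeded: {succeeded}, failed: {failed}");
```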
5. Question: What’s the difference between using Task.Start and Task.Factory.StartNew?
Task.Factory.StartNew is shorthand for new’ing up a Task and Start’ing it. So, the following code:

var t = Task.Factory.StartNew(someDelegate);
is functionally equivalent to:
var t = new Task(someDelegate);
t.Start();
Performance-wise, the former is slightly more efficient. As mentioned in response to question #4, Start employs synchronization to ensure that the Task instance on which Start is being called hasn’t already been started, or isn’t concurrently being started. In contrast, the implementation of StartNew knows that no one else could be starting the Task concurrently, as it hasn’t given out that reference to anyone… so StartNew doesn’t need to employ that synchronization.
6. Question: I’ve heard that Task.Result may also start the Task. True?
False. There are only two ways that a Task in the Created state may transition out of that state:
  1. A CancellationToken was passed into the Task’s constructor, and that token either already had cancellation requested or has it requested later. If the Task is still in the Created state when that happens, it transitions into the Canceled state.
  2. Start is called on the Task.
That’s it, and notice that Result is not one of those two. If you use .Wait() or .Result on a Task in the Created state, the call will block; someone else would need to Start the Task so that it could then be queued to a scheduler, so that the scheduler could eventually execute it, and so that the Task could complete… the blocking call could then complete as well and wake up.
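Both exits from the Created state can be seen in a short sketch. I use Wait with a timeout rather than .Result for the unstarted Task so the demo can’t hang forever; the 200 ms timeout and the variable names are arbitrary assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

var t = new Task<int>(() => 42);

// Waiting does not start the Task: the wait times out and the Task is still Created.
bool completedWithoutStart = t.Wait(TimeSpan.FromMilliseconds(200));
TaskStatus statusAfterWait = t.Status;

// Exit #2 from the list: Start queues the Task, so Result can now complete.
t.Start();
int result = t.Result;

// Exit #1 from the list: cancel the token that was passed to the constructor.
var cts = new CancellationTokenSource();
var neverStarted = new Task(() => { }, cts.Token);
cts.Cancel();
TaskStatus canceledStatus = neverStarted.Status;

Console.WriteLine($"{completedWithoutStart}, {statusAfterWait}, {result}, {canceledStatus}");
```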
What you might be thinking of isn’t that .Result could start the task, but that it could potentially “inline” the task’s execution. If a Task has already been queued to a TaskScheduler, then that Task might still be sitting in whatever data structure the scheduler is using to store queued tasks. When you call .Result on a Task that’s been queued, the runtime can attempt to inline the Task’s execution (meaning to run the Task on the calling thread) rather than purely blocking and waiting for some other thread used by the scheduler to execute the Task at some time in the future. To do this, the call to .Result may end up calling the TaskScheduler’s TryExecuteTaskInline method, and it’s up to the TaskScheduler how it wants to handle the request.
7. Question: Should I return unstarted Tasks from public APIs?
The proper question is “Should I return Tasks in the Created state from public APIs?” And the answer is “No.” (I draw the distinction here because of questions #1 and #2 above… the majority of mechanisms for creating a Task don’t permit Start to be called, and I don’t want folks to get the impression that you must call Start on a Task in order to return it from a public API… that is not the case.)
The fundamental idea here is this. When you call a normal synchronous method, the invocation of that method begins as soon as you’ve invoked it. For a method that returns a Task, you can think of that Task as representing the eventual asynchronous completion of the method. But that doesn’t change the fact that invoking the method begins the relevant operation. Therefore, it would be quite odd if the Task returned from the method was in the Created state, which would mean it represents an operation that hasn’t yet begun.
So, if you have a public method that returns a Task, and if you create that Task using one of Task’s constructors, make sure you Start the Task before returning it. Otherwise, you’re likely to cause a deadlock or similar problem in the consuming application, as the consumer will expect the Task to eventually complete when the launched operation completes, and yet if such a Task hasn’t been started, it will never complete. Some frameworks that allow you to parameterize the framework with methods/delegates that return Tasks even validate the returned Task’s Status, throwing an exception if the Task is still Created.
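As a sketch of that guidance, the hypothetical ComputeAsync method below (the name and signature are mine, invented for illustration) builds its Task with a constructor and starts it before handing it back, so the caller never sees a Task in the Created state.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical Task-returning API: the Task is started before it is returned.
static Task<int> ComputeAsync(int a, int b)
{
    var task = new Task<int>(() => a + b);
    task.Start(TaskScheduler.Default);
    return task;
}

Task<int> pending = ComputeAsync(20, 22);
TaskStatus statusOnReturn = pending.Status;  // already past Created
int sum = pending.Result;

Console.WriteLine($"{statusOnReturn != TaskStatus.Created}, {sum}");
```

In a case this simple, Task.Run would be the shorter choice; the constructor is used here only to show where the Start call belongs.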
8. Question: So, should I use Task’s ctor and Task.Start?
In the majority of cases, you’re better off using some other mechanism. For example, if all you want to do is schedule a Task to run some delegate for you, you’re better off using Task.Run or Task.Factory.StartNew, rather than constructing the Task and then Start’ing it; not only will the former methods result in less code, but they’re also cheaper (see question #5 above), and you’re less likely to make a mistake with them, such as forgetting to Start the Task.
There are of course valid situations in which using the ctor + Start makes sense. For example, if you choose to derive from Task for some reason, then you’d need to use the Start method to actually queue it. A more advanced example is if you want the Task to get a reference to itself. Consider the following (buggy) code:
Task theTask = null;
theTask = Task.Run(() => Console.WriteLine("My ID is {0}.", theTask.Id));
Spot the flaw? There’s a race. During the call to Task.Run, a new Task object is created and is queued to the ThreadPool scheduler. If there’s not that much going on in the ThreadPool, a thread from the pool might pick it up almost instantly and start running it. That thread is now racing to access the variable ‘theTask’ with the main thread that called Task.Run and that needs to store the created Task into that ‘theTask’ variable. I can fix this race by separating the construction and scheduling:
Task theTask = null;
theTask = new Task(() => Console.WriteLine("My ID is {0}.", theTask.Id));
theTask.Start(TaskScheduler.Default);
Now I’m sure that the Task instance will have been stored into the ‘theTask’ variable before the ThreadPool processes the Task, because the ThreadPool won’t even get a reference to the Task until Start is called to queue it, and by that point, the reference has already been set (and for those of you familiar with memory models, the appropriate fences are put in place by Task to ensure this is safe).
