Thursday, January 26, 2012

Finding with Git

Git is an amazing version control system that never loses anything, but
sometimes it can be hard to find out where things are. Most of the time
git log is the friend we need, but not all the time.


Where is my file?



Sometimes you know that you have a file in your repository, but you
don't know exactly where it is. git ls-files is the answer.

# Find all files with the name security in the path.
$ git ls-files | grep security
lib/dynamo-db/security.js
test/security-test.js

Obviously, you can use whatever grep options you prefer, such as -i
to ignore case.
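
Since git ls-files accepts a pathspec glob directly, the external grep is optional. A runnable sketch on a throwaway repo (the file names are hypothetical, mirroring the layout above):

```shell
# Set up a scratch repo with the example layout.
cd "$(mktemp -d)"
git init -q
mkdir -p lib/dynamo-db test
touch lib/dynamo-db/security.js test/security-test.js
git add -A

# git ls-files also accepts a pathspec glob directly (the * matches
# across directory separators), so no external grep is needed:
git ls-files '*security*'

# The grep variant, case-insensitively:
git ls-files | grep -i SECURITY
```

Both commands list lib/dynamo-db/security.js and test/security-test.js.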


In what files does word exist?



If you want to find information inside files, git grep is your friend.
git grep works like a recursive grep (grep -r .), but it only
searches files that are tracked by Git.

# Find all lines matching *crypt* in current version.
$ git grep crypt
lib/dynamo-db/security.js:var crypto = require('crypto');
lib/dynamo-db/security.js:   var hmac = crypto.createHmac('sha256', key);

# Also give me the line numbers
$ git grep -n crypt
lib/dynamo-db/security.js:2:var crypto = require('crypto');
lib/dynamo-db/security.js:15:   var hmac = crypto.createHmac('sha256', key);

# List only the file names
$ git grep -l crypt
lib/dynamo-db/security.js

# Also list how many times (count) it matched.
$ git grep -c crypt
lib/dynamo-db/security.js:2

It is also possible to pass revisions to git grep to find out what has
changed between them.

# Find all files with lines matching *type* in revisions `master` and `8f0fb7f`.
$ git grep -l type master 8f0fb7f
master:lib/dynamo-db/index.js
master:lib/dynamo-db/security.js
master:package.json
8f0fb7f:lib/dynamo-db/index.js
8f0fb7f:package.json

Maybe this is not that impressive. Most of the above can be accomplished
with standard grep, find, ack, and friends. But Git is a version
control system. How do I find out about things that happened in the past?


Who deleted my file?



You know how it is: you are working on some project, your dog gets
sick, and you have to stay home from work. When you come back, someone
has deleted your file! Where is it, and who did it? git log to the rescue.


git log shows you the commit logs. It is your eye into the past.

# When (in what commit) was my file deleted?
$ git log --diff-filter=D -- test/create-table-test.js
commit ba6c4d8bc165b8fb8208979c3e5513bd53477d51
Author: Anders Janmyr <anders@janmyr.com>
Date:   Wed Jan 25 09:46:52 2012 +0100

  Removed the stupid failing test.

It looks like I found the culprit. Was I working from home? But is the
file really deleted here? To get some more information about the files,
add the --summary option.

# When (in what commit) was my file deleted?
$ git log --diff-filter=D --summary -- test/create-table-test.js
commit ba6c4d8bc165b8fb8208979c3e5513bd53477d51
Author: Anders Janmyr <anders@janmyr.com>
Date:   Wed Jan 25 09:46:52 2012 +0100

  Removed the stupid failing test.

delete mode 100644 test/create-table-test.js

Yes, it looks like the file really is deleted. Stupid bastard! Let me
break this command down, starting with the last part.



  • test/create-table-test.js - The filename has to be the relative path
    to the file (from your current directory).

  • -- - The double-dash is used to tell Git that this is not a branch
    or an option.

  • --summary - Show me what files were deleted or added.
    --name-status is similar.

  • --diff-filter - This is a real beauty; it allows me to limit the log
    to only the specified kind of change, in this case D, for
    Deleted. Other options are Added (A), Copied (C), Modified (M), and Renamed (R).
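
The same filter letters work for the other kinds of change too. For example, R finds the commit that renamed a file. A runnable sketch on a throwaway repo (file names are hypothetical):

```shell
# Set up a scratch repo and rename a file in the second commit.
cd "$(mktemp -d)"
git init -q
git config user.name me; git config user.email me@example.com
echo 'var x = 1;' > old-name.js
git add -A
git commit -qm 'Add file'
git mv old-name.js new-name.js
git commit -qm 'Rename file'

# -M enables rename detection; --summary prints the rename itself,
# e.g. "rename old-name.js => new-name.js (100%)".
git log -M --diff-filter=R --summary
```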


When was a file added?



This uses the same technique as above, but I will vary it since I don't
want to type the full path of the file.

# Find out when the integration tests were added.
$ git log --diff-filter=A --name-status | grep -C 6 integ
commit 09420cfea8c7b569cd47f690104750fec358a10a
Author: Anders Janmyr <anders@janmyr.com>
Date:   Tue Jan 24 16:23:52 2012 +0100

  Extracted integration test

A integration-test/sts-test.js

commit 205db3965dec6c2c4c7b2bb75387a591d49e1951
Author: Anders Janmyr <anders@janmyr.com>
Date:   Sat Jan 21 10:03:59 2012 +0100

As you can see, here I am using --name-status as a variation on
--summary; it uses the same letter codes as --diff-filter.


I am using grep -C 6 to get some context around the match, in
this case six lines before and after it. Very useful!


Who changed that line?



As you probably know, you can see who has changed what in a file by
using git blame.

$ git blame test/security-test.js
205db396 (Anders Janmyr 2012-01-21 10:03:59 +0100  1) var vows = require('vows'),
205db396 (Anders Janmyr 2012-01-21 10:03:59 +0100  2)     assert = require('assert');
205db396 (Anders Janmyr 2012-01-21 10:03:59 +0100  3)
09420cfe (Anders Janmyr 2012-01-24 16:23:52 +0100  4) var access = 'access';
09420cfe (Anders Janmyr 2012-01-24 16:23:52 +0100  5) var secret = 'secret';
90b65208 (Anders Janmyr 2012-01-21 11:58:21 +0100  6)
205db396 (Anders Janmyr 2012-01-21 10:03:59 +0100  7) var security = new Security({
90b65208 (Anders Janmyr 2012-01-21 11:58:21 +0100  8)   access: access,
90b65208 (Anders Janmyr 2012-01-21 11:58:21 +0100  9)   secret: secret
205db396 (Anders Janmyr 2012-01-21 10:03:59 +0100 10) });
...

Every line is annotated with the commit that introduced it and its author.
Very helpful.
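
If the file is large, git blame also accepts -L to annotate only a line range. A runnable sketch on a throwaway repo (the file is hypothetical):

```shell
# Set up a scratch repo with a three-line file.
cd "$(mktemp -d)"
git init -q
git config user.name me; git config user.email me@example.com
printf 'line one\nline two\nline three\n' > file.txt
git add file.txt
git commit -qm 'Add file'

# Annotate only lines 2 through 3 of the file:
git blame -L 2,3 file.txt
```

Only the two requested lines are printed, each with its commit and author.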


Who deleted that line?



Another feature, which is not as well known, is git blame --reverse.
It lets you see the file as it was at an earlier revision, annotated to
show where it has since been changed.

# Check what lines have been changed during the last 6 commits.
$ git blame --reverse HEAD~6..HEAD security-test.js
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100  1) var vows = require('vows'),
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100  2)     assert = require('assert');
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100  3)
093c13e9 (Anders Janmyr 2012-01-24 17:26:09 +0100  4) var Security = require('dynamo-db').Security;
ba6c4d8b (Anders Janmyr 2012-01-25 09:46:52 +0100  5)
^b96c68b (Anders Janmyr 2012-01-21 12:33:50 +0100  6) var access = process.env['S3_KEY'];
^b96c68b (Anders Janmyr 2012-01-21 12:33:50 +0100  7) var secret = process.env['S3_SECRET'];
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100  8)
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100  9) var security = new Security({
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100 10)   access: access,
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100 11)   secret: secret
558b8e7f (Anders Janmyr 2012-01-25 16:53:52 +0100 12) });

In the output you can see that most of the lines are still the same at
HEAD (558b8e7f). But the fourth and fifth lines (093c13e9 and ba6c4d8b)
do not exist anymore, and the sixth and seventh lines (^b96c68b) have
been changed after that commit.


What commits contain the string?



Another thing I find very useful is finding out when certain words or
sentences were removed or added. For this you can use git log -S<string> or
git log -G<regex>.

# Find commits that modified the string aws and display the full diff.
$ git log -Saws --diff-filter=M --patch
commit b96c68b839f204b310b79570bc3d27dc93cff588
Author: Anders Janmyr <anders@janmyr.com>
Date:   Sat Jan 21 12:33:50 2012 +0100

  We have a valid request, tjohoo

diff --git a/lib/dynamo-db/security.js b/lib/dynamo-db/security.js
index bee6936..8471527 100644
--- a/lib/dynamo-db/security.js
+++ b/lib/dynamo-db/security.js
@@ -2,6 +2,7 @@
var crypto = require('crypto');
var _ = require('underscore');
var request = require("request");
+var xml2js = require('xml2js');

function Security(options) {
 this.options = options;
@@ -23,7 +24,7 @@ mod.timestamp = function() {
mod.defaultParams = function() {
 return {
   AWSAccessKeyId: this.options.access,
-    Version: '2010-05-08',
+    Version: '2011-06-15',
   Timestamp: this.timestamp(),
   SignatureVersion: 2,
   SignatureMethod: 'HmacSHA256'
@@ -57,9 +58,10 @@ mod.url = function(host, path, params) {

mod.makeRequest = function(method, host, path, params, callback) {
 var extParams = _.extend({}, this.defaultParams(), params);
-  var signedParams = this.signedParams('GET', 'iam.amazonaws.com', '/', extParams);
-  console.log(extParams,signedParams);
-  return request({ method: method, url: this.url(host, path, signedParams) },
+  var signedParams = this.signedParams(method, host, path, extParams);
+  var url = this.url(host, path, signedParams);
+  console.log(url,signedParams);
+  return request({ method: method, url: url },
...

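The difference between the two is that -S looks at the number of occurrences of the string in the diff, while -G matches a regular expression against the changed lines themselves. A runnable sketch on a throwaway repo (file and content are hypothetical):

```shell
# Set up a scratch repo with a line containing "aws".
cd "$(mktemp -d)"
git init -q
git config user.name me; git config user.email me@example.com
printf 'aws_key=1\nother=2\n' > conf.txt
git add conf.txt
git commit -qm 'Add aws key'

# Re-indent the line: the number of occurrences of "aws" is unchanged.
printf '  aws_key=1\nother=2\n' > conf.txt
git commit -qam 'Re-indent'

# -S only lists commits where the occurrence count changed (the first one):
git log --oneline -Saws

# -G lists every commit whose diff text matches the regex (both commits):
git log --oneline -Gaws
```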
That's all folks!

Friday, January 20, 2012

FAQ on Task.Start

Recently I’ve heard a number of folks asking about Task.Start, when and when not to use it, how it behaves, and so forth. I thought I’d answer some of those questions here in an attempt to clarify and put to rest any misconceptions about what it is and what it does.
1. Question: When can I use Task.Start?
The Start instance method may be used if and only if the Task is in the Created state (i.e. Task.Status returns TaskStatus.Created). And the only way a Task can be in the Created state is if the Task were instantiated using one of Task’s public constructors, e.g. "var t = new Task(someDelegate);".
2. Question: Should I call Start on a Task created by Task.Run / Task.ContinueWith / Task.Factory.StartNew / TaskCompletionSource / async methods / …?
No. Not only shouldn’t you, but you simply can’t… it would fail with an exception. See question #1. The Start method is only applicable to a Task in the Created state. Tasks created by all of those mentioned means are already beyond the Created state, such that their Task.Status will not return TaskStatus.Created, but something else, like TaskStatus.WaitingForActivation, or TaskStatus.Running, or TaskStatus.RanToCompletion.
3. Question: What does Start actually do?
It queues the Task to the target TaskScheduler (the parameterless overload of Start targets TaskScheduler.Current). When you construct a Task with one of Task’s constructors, the Task is inactive: it has not been given to any scheduler yet, and thus there’s nothing to actually execute it. If you never Start a Task, it’ll never be queued, and so it’ll never complete. To get the Task to execute, it needs to be queued to a scheduler, so that the scheduler can execute it when and where the scheduler sees fit to do so. The act of calling Start on a Task will twiddle some bits in the Task (e.g. changing its state from Created to WaitingToRun) and will then pass the Task to the target scheduler via the TaskScheduler’s QueueTask method. At that point, the task’s future execution is in the hands of the scheduler, which should eventually execute the Task via the TaskScheduler’s TryExecuteTask method.
4. Question: Can I call Start more than once on the same Task?
No. A Task may only transition out of the Created state once, and Start transitions a Task out of the Created state: therefore, Start may only be used once. Any attempts to call Start on a Task not in the Created state will result in an exception. The Start method employs synchronization to ensure that the Task object remains in a consistent state even if Start is called multiple times concurrently… only one of those calls may succeed.
5. Question: What’s the difference between using Task.Start and Task.Factory.StartNew?
Task.Factory.StartNew is shorthand for new’ing up a Task and Start’ing it. So, the following code:
var t = Task.Factory.StartNew(someDelegate);
is functionally equivalent to:
var t = new Task(someDelegate);
t.Start();
Performance-wise, the former is slightly more efficient. As mentioned in response to question #3, Start employs synchronization to ensure that the Task instance on which Start is being called hasn’t already been started, or isn’t concurrently being started. In contrast, the implementation of StartNew knows that no one else could be starting the Task concurrently, as it hasn’t given out that reference to anyone… so StartNew doesn’t need to employ that synchronization.
6. Question: I’ve heard that Task.Result may also start the Task. True?
False. There are only two ways that a Task in the Created state may transition out of that state:
  1. A CancellationToken was passed into the Task’s constructor, and that token then had or then has cancellation requested. If the Task is still in the Created state when that happens, it would transition into the Canceled state.
  2. Start is called on the Task.
That’s it, and notice that Result is not one of those two. If you use .Wait() or .Result on a Task in the Created state, the call will block; someone else would need to Start the Task so that it could then be queued to a scheduler, so that the scheduler could eventually execute it, and so that the Task could complete… the blocking call could then complete as well and wake up.
What you might be thinking of isn’t that .Result could start the task, but that it could potentially “inline” the task’s execution. If a Task has already been queued to a TaskScheduler, then that Task might still be sitting in whatever data structure the scheduler is using to store queued tasks. When you call .Result on a Task that’s been queued, the runtime can attempt to inline the Task’s execution (meaning to run the Task on the calling thread) rather than purely blocking and waiting for some other thread used by the scheduler to execute the Task at some time in the future. To do this, the call to .Result may end up calling the TaskScheduler’s TryExecuteTaskInline method, and it’s up to the TaskScheduler how it wants to handle the request.
7. Question: Should I return unstarted Tasks from public APIs?
The proper question is “Should I return Tasks in the Created state from public APIs?” And the answer is “No.” (I draw the distinction in the question here due to questions #1 and #2 above… the majority of mechanisms for creating a Task don’t permit for Start to be called, and I don’t want folks to get the impression that you must call Start on a Task in order to allow it to be returned from a public API… that is not the case.)
The fundamental idea here is this. When you call a normal synchronous method, the invocation of that method begins as soon as you’ve invoked it. For a method that returns a Task, you can think of that Task as representing the eventual asynchronous completion of the method. But that doesn’t change the fact that invoking the method begins the relevant operation. Therefore, it would be quite odd if the Task returned from the method was in the Created state, which would mean it represents an operation that hasn’t yet begun.
So, if you have a public method that returns a Task, and if you create that Task using one of Task’s constructors, make sure you Start the Task before returning it. Otherwise, you’re likely to cause a deadlock or similar problem in the consuming application, as the consumer will expect the Task to eventually complete when the launched operation completes, and yet if such a Task hasn’t been started, it will never complete. Some frameworks that allow you to parameterize the framework with methods/delegates that return Tasks even validate the returned Task’s Status, throwing an exception if the Task is still Created.
8. Question: So, should I use Task’s ctor and Task.Start?
In the majority of cases, you’re better off using some other mechanism. For example, if all you want to do is schedule a Task to run some delegate for you, you’re better off using Task.Run or Task.Factory.StartNew, rather than constructing the Task and then Start’ing it; not only will the former methods result in less code, but they’re also cheaper (see question #5 above), and you’re less likely to make a mistake with them, such as forgetting to Start the Task.
There are of course valid situations in which using the ctor + Start makes sense. For example, if you choose to derive from Task for some reason, then you’d need to use the Start method to actually queue it. A more advanced example is if you want the Task to get a reference to itself. Consider the following (buggy) code:
Task theTask = null;
theTask = Task.Run(() => Console.WriteLine("My ID is {0}.", theTask.Id));
Spot the flaw? There’s a race. During the call to Task.Run, a new Task object is created and is queued to the ThreadPool scheduler. If there’s not that much going on in the ThreadPool, a thread from the pool might pick it up almost instantly and start running it. That thread is now racing to access the variable ‘theTask’ with the main thread that called Task.Run and that needs to store the created Task into that ‘theTask’ variable. I can fix this race by separating the construction and scheduling:
Task theTask = null;
theTask = new Task(() => Console.WriteLine("My ID is {0}.", theTask.Id));
theTask.Start(TaskScheduler.Default);
Now I’m sure that the Task instance will have been stored into the ‘theTask’ variable before the ThreadPool processes the Task, because the ThreadPool won’t even get a reference to the Task until Start is called to queue it, and by that point, the reference has already been set (and for those of you familiar with memory models, the appropriate fences are put in place by Task to ensure this is safe).

Choosing Client Technology Today for Tomorrow


(by Brian Noyes)


Choosing the "right" client technology today is one of the hottest and most contested questions we help IDesign customers address. I hope in this little missive to clarify our current thinking, in the face of the plethora of options, past decisions made, new technologies, misconceptions, and what we see our customers opting for. 


Basically, it comes down to this question: should we use Silverlight or WPF, or should we abandon them and go with an HTML solution? It turns out this question is actually multiple questions compounded. You really need some additional context of the kinds of client applications you are building, who your target audience is, what platforms you think you need to run on today and in the future, and where your team's skills are. 
Really, there are three primary paths that you should be choosing from today: build Windows clients with WPF or Silverlight and not worry about multi-platform; build multiple client applications, one for each target platform using the appropriate native technology, with common backing services; or build HTML applications for maximum reach. Each has its benefits and pitfalls and requires you to make tradeoffs. 
HTML5/JS applications may seem the most attractive because of the promise of multi-platform support. But you also have to take into account what you are giving up in terms of productivity and capabilities. There are still many things that business apps need today that will just not be possible or will be extremely difficult to achieve in the browser with an HTML5/JS application. You also need to fight the misperception that an HTML5/JS Metro application is any more multi-platform than a XAML based one - it is not. You just happen to be building a Windows application with those languages, but the implementation and even coding patterns you employ will be very different for a native Windows Metro application than they would be for a browser application. 
If you choose Silverlight because it is "multi-platform" (which really only means Windows and Mac), you are probably on a dead end path and probably should switch to HTML5/JS for the long term. It is pretty clear that Microsoft has no intentions (whether there is an SL6 or not) of continuing to invest in Silverlight as a cross-platform technology, if at all. The exception is that one definition of cross platform is Desktop/Browser/Phone/Xbox, which does appear to be a safe track for Silverlight in the near future. 
If you choose Silverlight or WPF because they are the best, most productive platforms for building rich business apps for the Windows platform, then you are on the right track. 

Choosing between Silverlight and WPF can be a little tougher. Silverlight is your best choice if you think you want to port your application to run in Windows 8 as a Metro style application some day. Silverlight and Metro require you to have the same kind of architecture, executing environment restrictions, adopt the same asynchronous patterns, and more. However, if your application is a desktop application now and you envision it staying that way for a long time to come and not becoming a Metro app, then WPF will still give you the greatest power, flexibility, and productivity for building rich desktop client applications for the Windows platform. 
What we have in SL5 might be it for the long term, as there may be no subsequent releases. But that doesn't necessarily mean you have to jump ship right away, especially for desktop applications. Running Silverlight applications as browser applications does have some risk because even though Microsoft has announced support for SL5 for 10 years, they have not said they will make sure it works with the next 10 years worth of browser versions and potential new browsers. Your users will probably want to keep their browsers moving forward, but a Silverlight browser application might keep you tied down at some point in the future. Silverlight desktop applications (Out of Browser mode) are just as safe as all the Windows Forms applications that have been running comfortably even though there have not been any substantial updates for Windows Forms in 6 years. 

At the end, it all boils down to who your target audience is. 

If your target audience is business employees on company computers, then Silverlight is a good path in the near term, with a possible migration to WinRT/Metro in the future for some of the functionality. If you have a good understanding of what a Metro application is and what the constraints of Metro are and you can't envision your application running as a Metro application, then stick to WPF. 
If your target audience is consumers, Silverlight is a terrible path to be on for the long term, unless it is part of a multi-native-client strategy where you will have native clients for Windows/iOS/Android/Mac backed by a common set of services. If so, Silverlight as a bridge technology to WinRT/Metro for the Windows clients is a good path to be on. Otherwise, HTML5/JS is the only safe path for a consumer facing app if you are really trying to build one and only one client app. 
If your target audience is business employees, but highly mobile ones, the choices are roughly the same as for consumers these days, and becoming more so all the time. The days of the dedicated company desktop computer that is there to perform one job function are quickly fading into the past. 
The above makes the choice of XAML technologies a tough one unless you are willing to commit to the multi-native-client strategy, which is the best choice from a user experience perspective, but certainly one that takes some more resources to pull off. Going HTML 5 means a significant productivity hit, especially depending on the skills of your team, and significantly different application architecture, technology choices, learning curve, etc. Eventually, I think many companies will choose the HTML 5 path. But as an architect, I think the right choice in most cases is a good service-based architecture with multiple native clients to cover the Windows-Mac-iOS-Android space.

Thursday, January 19, 2012

Asynchronous Programming with the Reactive Extensions (while waiting for async/await)

This article was originally published on the MVP Award Blog in December 2011.

Nowadays, with applications that use more and more services that are in the cloud, or simply perform actions that take a user-noticeable time to execute, it has become vital to program in an asynchronous way.

But we, as developers, feel at home when thinking sequentially. We like to send a request or execute a method, wait for the response, and then process it.

Unfortunately for us, an application just cannot wait synchronously for a call to end anymore. Reasons can be that the user expects the application to continue responding, or because the application joins the results of multiple operations, and it is necessary to perform all these operations simultaneously for good performance.

Frameworks that are heavily UI dependent (like Silverlight or Silverlight for Windows Phone) are trying to force the developer's hand into programming asynchronously by removing all synchronous APIs. This leaves the developer alone with either the Begin/End pattern, or the plain old C# events. Both patterns are not flexible, not easily composable, often lead to memory leaks, and are just plain difficult to use or, worse, to read.

C# 5.0 async/await


Taking a quick look at the not so distant future, Microsoft has taken the bold approach to augment its new .NET 4.5 to include asynchronous APIs and in the case of the Windows Runtime (WinRT), restrict some APIs to be asynchronous only. These are based on the Task class, and are backed by languages to ease asynchronous programming.

In the upcoming C# 5.0 implementation, the async/await pattern is trying to handle this asynchrony problem by making asynchronous code look synchronous. It makes asynchronous programming more "familiar" to developers.

If we take this example:

    static void Main(string[] args)
   {
       // Some initialization of the DB...
       Task<int> t = GetContentFromDatabase();

       // Execute some other code when the task is done
       t.ContinueWith(r => Console.WriteLine(r.Result));

       Console.ReadLine();
   }

   public static async Task<int> GetContentFromDatabase()
   {
       int source = 22;

       // Run starts the execution on another thread
       var result = (int) await Task.Run(
           () => {
               // Simulate DB access
               Thread.Sleep(1000);
               return 10;
           }
       );

       return source + result * 2;
   }

The code in GetContentFromDatabase looks synchronous, but under the hood it's actually split in half (or more) where the await keyword is used.

The compiler is applying a technique used many times in the C# language, known as syntactic sugar. The code is expanded to a form that is less readable, but is more of a plumbing code that is painful to write – and get right – each time. The using statement, iterators and more recently LINQ are very good examples of that syntactic sugar.

Using a plain old thread pool call, the code actually looks a lot more like this, once the compiler is done:

    public static void Main()
   {
       GetContentFromDatabase(result => Console.WriteLine(result));
       Console.ReadLine();
   }

   public static void GetContentFromDatabase (Action<int> continueWith)
   {
       // The first half of the async method (with QueueUserWorkItem)
       int source = 22;

       // The second half of the async method
       Action<int> onResult = result => {
           continueWith(source + result * 2);
       };

       ThreadPool.QueueUserWorkItem(
           _ => {
               // Simulate DB access
               Thread.Sleep(1000);

               onResult(10);
           }
       );
   }

This sample is somewhat more complex, and does not properly handle exceptions. But you probably get the idea.

Asynchronous Development now


Nonetheless, you may not want to, or will not be able to, use C# 5.0 soon enough. A lot of people are still using .NET 3.5 or even .NET 2.0, and new features like async will take a while to be deployed in the field. Even when the framework has been offering it for a long time, the awesome LINQ (a C# 3.0 feature) is still being adopted and is not that widely used.

The Reactive Extensions (Rx for friends) offer a framework that is available from .NET 3.5 onwards and provides functionality similar to C# 5.0's, but with a different, more functional approach to asynchronous programming. More functional means fewer variables to maintain state, and a more declarative approach to programming.

But don't be scared. Functional does not mean abstract concepts that are not useful for the mainstream developer. It just means (very roughly) that you're going to be more inclined to separate your concerns using functions instead of classes.

But let's dive into some code that is similar to the two previous examples:

    static void Main(string[] args)
   {
       IObservable<int> query = GetContentFromDatabase();

       // Subscribe to the result and display it
       query.Subscribe(r => Console.WriteLine(r));

       Console.ReadLine();
   }

   public static IObservable<int> GetContentFromDatabase()
   {
       int source = 22;

       // Start the work on another thread (using the ThreadPool)
       return Observable.Start<int>(
                  () => {
                     Thread.Sleep(1000);
                     return 10;
                  }
              )

              // Project the result when we get it
              .Select(result => source + result * 2);
   }

From the caller's perspective (the main), the GetContentFromDatabase method behaves almost the same way a .NET 4.5 Task would, and the Subscribe pretty much replaces the ContinueWith method.

But this simplistic approach only works well for an example. At this point, you could still choose to use the basic ThreadPool example shown earlier in this article.

A word on IObservable


An IObservable is generally considered a stream of data that can push zero or more values to its subscribers, followed by either an error or a completion message. This “Push”-based model allows the observation of a data source without blocking a thread, as opposed to the Pull model provided by IEnumerable, which performs a blocking observation of a data source. A very good video with Erik Meijer explains these concepts on Channel 9.

To match the .NET 4.5 Task model, an IObservable needs to provide at most one value, or an error, which is what the Observable.Start method is doing.

A more realistic example


Most of the time, scenarios include calls to multiple asynchronous methods. And if they're not called at the same time and joined, they're called one after the other.

Here is an example that does task chaining:

    public static void Main()
   {
       // Use the observable we've defined before
       var query = GetContentFromDatabase();

        // Once we get the token from the database, transform it first
       query.Select(r => "Token_" + r)

            // When we have the token, we can initiate the call to the web service
            .SelectMany(token => GetFromWebService(token))

            // Once we have the result from the web service, print it.
            .Subscribe(_ => Console.WriteLine(_));
   }

   public static IObservable<string> GetFromWebService(string token)
   {
       return Observable.Start(
           () => new WebClient().DownloadString("http://example.com/" + token)
       )
       .Select(s => Decrypt(s));
   }

The SelectMany operator is a bit strange when it comes to the semantics of an IObservable that behaves like a Task. It can then be thought of as a ContinueWith operator. GetContentFromDatabase only pushes one value, meaning that the lambda provided to SelectMany is only called once.

Taking the Async route


A peek at WinRT and the Build conference showed a very interesting rule used by Microsoft when moving to asynchronous APIs throughout the framework: if an API call nominally takes more than 50ms to execute, then it's an asynchronous API call.

This rule is easily applicable to existing .NET 3.5 and later frameworks by exposing IObservable instances that provide at most one value, as a way to simulate a .NET 4.5 Task.

Architecturally speaking, this is a way to enforce that the consumers of a service layer API will be less tempted to synchronously call methods and negatively impact the perceived or actual performance of an application.

For instance, a "favorites" service implemented in an application could look like this, using Rx:

    public interface IFavoritesService
   {
       IObservable<Unit> AddFavorite(string name, string value);
       IObservable<bool> RemoveFavorite(string name);
       IObservable<string[]> GetAllFavorites();
   }

All the operations, including the ones that alter content, are executed asynchronously. It is tempting to assume that only a select operation will take time, but an Add operation could easily take just as long.

A word on Unit: the name comes from functional languages, and it literally represents the void keyword. A deep .NET CLR limitation prevents the use of System.Void as a generic type parameter, so Unit was introduced to make it possible to express a void return value.
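
As a sketch (assuming the Rx assemblies are referenced), a void-like operation such as AddFavorite could be implemented by returning the IObservable&lt;Unit&gt; that Observable.Start produces when given an Action; the persistence call here is hypothetical:

```csharp
using System;
using System.Reactive;       // Unit lives here
using System.Reactive.Linq;  // Observable.Start, First

class UnitSketch
{
    // AddFavorite has nothing meaningful to return, so it yields a single
    // Unit value to signal that the asynchronous write completed.
    static IObservable<Unit> AddFavorite(string name, string value)
    {
        return Observable.Start(() =>
        {
            // hypothetical persistence call goes here
            Console.WriteLine("saved " + name + "=" + value);
        });
    }

    static void Main()
    {
        AddFavorite("color", "blue").First();  // block until the Unit arrives
        Console.WriteLine("done");
    }
}
```

The caller never inspects the Unit value itself; it only cares that the notification arrived.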

Wrap up


Much more can be achieved with Rx, but for starters, using it to perform single asynchronous method calls seems to be a good way to learn it.

Also, a note to Rx experts: shortcuts have been taken to explain this in the simplest form, and there are surely many tips and tricks to know to use Rx effectively, particularly when it is used across the board. The omission of the Completed event is one of them.

Finally, explaining the richness of the Reactive Extensions is a tricky task. Even the smart guys of the Rx team have a hard time doing so... I hope this quick start will help you dive into it!

Wednesday, January 11, 2012

DateTime.UtcNow is generally preferable to DateTime.Now

DateTime.UtcNow is generally preferable to DateTime.Now:
It seems to be a commonly known and accepted best practice to use DateTime.UtcNow for non-user-facing scenarios such as time interval and timeout measurement.
I’ve just done an audit of the Roslyn codebase and replaced most DateTime.Now calls with DateTime.UtcNow. I thought it’d be useful to post my changeset description here (although none of it is new – I just summarize some common knowledge readily available in the sources linked below).
====
Replacing DateTime.Now with DateTime.UtcNow in most cases.
We should be using DateTime.Now only in user-facing scenarios. It respects the time zone and Daylight Saving Time (DST), so the user feels comfortable when we show them a string that matches their wall clock time.
In all other scenarios though DateTime.UtcNow is preferable.
First, UtcNow is usually a couple of orders of magnitude faster, since Now basically calls UtcNow first and then makes a very expensive call to figure out the time zone and daylight saving time information. Here's a great chart from Keyvan's blog:

Second, Now can be a big problem because of a sudden 1-hour jump during DST adjustments twice a year. Imagine a waiting loop with a 5-second timeout that happens to run exactly at 2am during the DST transition. The operation that you expect to time out after 5 seconds will run for 1 hour and 5 seconds instead! That might be a surprise.
Hence, I'm replacing DateTime.Now with DateTime.UtcNow in most situations, especially polling/timeout/waiting and time interval measurement. Also, everywhere where we persist DateTime (file system/database) we should definitely be using UtcNow because by moving the storage into a different timezone, all sorts of confusion can occur (even "time travel", where a persisted file can appear with a future date). Granted, we don't often fly our storage with a supersonic jet across timezones, but hey.
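
A minimal sketch of the pattern being recommended; the Sleep call is just a stand-in for whatever work is being timed:

```csharp
using System;
using System.Threading;

class UtcIntervalSketch
{
    static void Main()
    {
        // DateTime.Now can jump by a whole hour at a DST transition;
        // DateTime.UtcNow never does, so differences stay meaningful.
        DateTime start = DateTime.UtcNow;
        Thread.Sleep(120);  // stand-in for the work being timed
        TimeSpan elapsed = DateTime.UtcNow - start;
        Console.WriteLine(elapsed >= TimeSpan.FromMilliseconds(100) ? "ok" : "short");
    }
}
```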
For precise time interval measurements we should be using System.Diagnostics.Stopwatch, which uses the high-resolution timer (QueryPerformanceCounter).
For a cheap timestamp, we should be generally using Environment.TickCount - it's even faster than DateTime.UtcNow.Ticks.
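
A sketch contrasting the two recommendations; the empty loop is only a placeholder for real work:

```csharp
using System;
using System.Diagnostics;

class TimingSketch
{
    static void Main()
    {
        // Precise interval measurement: Stopwatch uses QueryPerformanceCounter
        // when a high-resolution timer is available.
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++) { }  // placeholder for the work being measured
        sw.Stop();
        Console.WriteLine("elapsed: " + sw.Elapsed.TotalMilliseconds + " ms");

        // Cheap timestamp: Environment.TickCount is a single int read.
        // Note it wraps around after roughly 24.9 days of uptime, so
        // subtract in an unchecked context.
        int before = Environment.TickCount;
        // ... some work ...
        int delta = unchecked(Environment.TickCount - before);
        Console.WriteLine("ticks elapsed: " + delta);
    }
}
```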
I'm only leaving DateTime.Now usages in test code (where it's never executed), in the compiler Version parsing code (where it's used to generate a minor version) and in a couple of places (primarily test logging and perf test reports) where we want to output a string in a local timezone.
Sources:
* http://www.keyvan.ms/the-darkness-behind-datetime-now
* http://stackoverflow.com/questions/62151/datetime-now-vs-datetime-utcnow
* http://stackoverflow.com/questions/28637/is-datetime-now-the-best-way-to-measure-a-functions-performance

Tuesday, January 10, 2012

Free .NET decompiler: JustDecompile

Reflector no longer being free, there is a pretty good free decompiler for .NET: JustDecompile
http://www.telerik.com/products/decompiler.aspx
