Friday, May 11, 2007

LINQ to Google Desktop

Given the number of recent blog posts and articles about creating LINQ providers, I decided to go ahead and try to create a provider for Google Desktop.

The work in this post is based on the following posts/articles:



Also, the samples were created using the March 2007 CTP of Orcas.

The first thing to do is to figure out what a LINQ to Google Desktop query should look like.

As discussed in a previous post, a Google Desktop query can yield different kinds of results. For example, a query for the term "linq" could return references to files, emails, calendar items, etc.

In order to deal with this, the first thing to do was to create a class hierarchy for the kinds of elements that could result from a Google Desktop query. This class hierarchy will have GDResult as the base class (which will be almost the same as an Indexable in the Google Desktop schemas).

We also need to create a wrapper around the Google Desktop COM API to capture the results and instantiate the appropriate class for each result.

To keep this experiment simple, only GDFileResult and GDEmailResult will be implemented and supported by the provider.


public class GDResult
{
    public string Schema { ... }
    public string Content { ... }

    public bool Contains(string term)
    {
        ...
    }
}

public class GDFileResult : GDResult
{
    public string FileType { ... }
    public string Location { ... }
}

public class GDEmailResult : GDResult
{
    public string From { ... }
    public string To { ... }
    public string Cc { ... }
    public string Subject { ... }
}


Given this, we can imagine what the LINQ queries will look like. For example, if we want to get all the PDF files that contain the terms "statement" and "expression", we could write:



GDesktop gd = new GDesktop();
var results = from t in gd
              where t.Contains("statement") &&
                    t.Contains("expression") &&
                    t is GDFileResult &&
                    ((GDFileResult)t).FileType == "pdf"
              select t;




This LINQ query will be translated to the Google Desktop query:

" statement expression filetype:pdf "

The "t is GDFileResult" test tells the Google Desktop API that only file references should be retrieved.
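As a rough illustration of that translation, here is a minimal sketch of how collected terms and a file type modifier could be joined into a query string. The QueryInfoSketch class is a hypothetical stand-in: the real GDQueryInfo is not shown in this post, so everything except the AddTerm, FileType and CreateQueryString names used later is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for GDQueryInfo; the real class is not shown in the post.
public class QueryInfoSketch
{
    private readonly List<string> terms = new List<string>();

    // Maps to the "filetype:" query modifier when set.
    public string FileType { get; set; }

    public void AddTerm(string term)
    {
        terms.Add(term);
    }

    // Joins the collected terms and appends the filetype modifier if present.
    public string CreateQueryString()
    {
        var parts = new List<string>(terms);
        if (!string.IsNullOrEmpty(FileType))
        {
            parts.Add("filetype:" + FileType);
        }
        return string.Join(" ", parts);
    }
}

public static class QueryStringDemo
{
    public static void Main()
    {
        var qi = new QueryInfoSketch();
        qi.AddTerm("statement");
        qi.AddTerm("expression");
        qi.FileType = "pdf";
        Console.WriteLine(qi.CreateQueryString()); // statement expression filetype:pdf
    }
}
```

The element type (file or email) is kept out of the string on purpose, since it is passed to the API separately.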

The next step is to create the GDesktop class. This class represents our connection with Google Desktop, much like a database connection object in LINQ to SQL. GDesktop must implement the System.Linq.IQueryable<T> interface, which inherits from IEnumerable<T>. Since at this time there is no official documentation on how to implement this interface correctly, only the required elements were added.

This implementation of the Google Desktop provider looks like this:


public class GDesktop : IQueryable<GDResult>
{

    public IQueryable<T> CreateQuery<T>(Expression expression)
    {
        GDQuery q = new GDQuery();
        if (expression.NodeType == ExpressionType.Call)
        {
            MethodCallExpression mcExpression =
                (MethodCallExpression)expression;
            switch (mcExpression.Method.Name)
            {
                case "Where":
                    GDQueryInfo qi = ProcessWhereMethodCall(mcExpression);
                    q.QueryInfo = qi;
                    break;

                default:
                    throw new NotImplementedException(
                        "Could not handle method: " +
                        mcExpression.Method.Name);
            }
        }
        else
        {
            throw new NotSupportedException(expression.ToString());
        }
        return (IQueryable<T>)q;
    }


    public Expression Expression
    {
        get
        {
            return Expression.Constant(this);
        }
    }


    #region Expression tree processing
    ....
    #endregion


    #region Not implemented methods

    public TResult Execute<TResult>(Expression expression)
    {
        throw new Exception("The method or operation is not implemented.");
    }


    public IEnumerator<GDResult> GetEnumerator()
    {
        throw new Exception("The method or operation is not implemented.");
    }

    IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        throw new Exception("The method or operation is not implemented.");
    }

    public IQueryable CreateQuery(Expression expression)
    {
        throw new Exception("The method or operation is not implemented.");
    }

    public Type ElementType
    {
        get { throw new Exception("The method or operation is not implemented."); }
    }

    public object Execute(Expression expression)
    {
        throw new Exception("The method or operation is not implemented.");
    }

    #endregion
}



The CreateQuery method is very important: it is called with the expression tree representing the first part of the query. For example, the following query:


var e = from t in gd
        where t.Contains("linq")
        select t;


is interpreted by the compiler as:


var e = gd.Where(t => t.Contains("linq"));


Because gd is an IQueryable, the compiler binds this Where call to the queryable version of the operator, which uses the Expression property and calls the CreateQuery method.
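To see the shape of the tree that reaches CreateQuery, the following sketch inspects a Where call on the built-in in-memory queryable (obtained with AsQueryable) instead of GDesktop. It is only an approximation of what the provider receives, written against the released System.Linq API rather than the March 2007 CTP.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class WhereShapeDemo
{
    public static void Main()
    {
        // AsQueryable() returns the built-in in-memory IQueryable; it is used
        // here only so that the generated expression tree can be inspected.
        IQueryable<string> source = new[] { "linq", "sql" }.AsQueryable();
        IQueryable<string> query = source.Where(t => t.Contains("linq"));

        // The whole query is a call to Where(source, predicate).
        var call = (MethodCallExpression)query.Expression;
        Console.WriteLine(call.Method.Name);            // Where
        Console.WriteLine(call.Arguments[0].NodeType);  // Constant
        Console.WriteLine(call.Arguments[1].NodeType);  // Quote

        // The predicate lambda is wrapped in a Quote (a UnaryExpression).
        var lambda = (LambdaExpression)((UnaryExpression)call.Arguments[1]).Operand;
        Console.WriteLine(lambda.Body.NodeType);        // Call
    }
}
```

This matches the assumptions made by the provider code in this post: the first argument is a constant holding the source, and the second is a unary node whose operand is the lambda.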

The CreateQuery method of the GDesktop class returns an instance of GDQuery, which represents the specific query that is being created. A GDQueryInfo object is used to store the query terms and modifiers collected from the expression trees.

The only supported operation that can be applied to GDesktop is Where. This is very important: no other operation, such as Select, Take or Skip, can be applied directly, because there is no way (that I know of) to get all the elements of the Google Desktop index!

The ProcessWhereMethodCall method walks the where expression tree to collect the required query terms and query modifiers. The only supported elements in the where section are:


  1. Calls to the Contains method meaning that the document contains a term or phrase

  2. 'is' expressions to filter the type of element that must be returned

  3. Equality expressions on "query time" properties of the elements, such as 'FileType', 'From' or 'To'
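The three kinds of elements correspond to three expression node types. The sketch below builds one lambda of each kind and prints the node types the tree walker has to recognize. ResultSketch and FileResultSketch are hypothetical stand-ins for GDResult and GDFileResult, reduced to what is needed to compile.

```csharp
using System;
using System.Linq.Expressions;

// Minimal stand-ins for the post's result classes (assumptions, not the real types).
public class ResultSketch
{
    public bool Contains(string term) { return false; }
}

public class FileResultSketch : ResultSketch
{
    public string FileType { get; set; }
}

public static class NodeKindsDemo
{
    public static void Main()
    {
        Expression<Func<ResultSketch, bool>> contains = t => t.Contains("linq");
        Expression<Func<ResultSketch, bool>> typeTest = t => t is FileResultSketch;
        Expression<Func<ResultSketch, bool>> equality =
            t => ((FileResultSketch)t).FileType == "pdf";

        Console.WriteLine(contains.Body.NodeType);  // Call
        Console.WriteLine(typeTest.Body.NodeType);  // TypeIs
        Console.WriteLine(equality.Body.NodeType);  // Equal

        // The left side of the equality is a property access on a Convert node,
        // which is the pattern the provider checks for before reading the value.
        var left = (MemberExpression)((BinaryExpression)equality.Body).Left;
        Console.WriteLine(left.Expression.NodeType); // Convert
        Console.WriteLine(left.Member.Name);         // FileType
    }
}
```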



Processing for each of these elements is implemented this way:


#region Expression tree processing

private GDQueryInfo ProcessWhereMethodCall(MethodCallExpression mcExpression)
{
    // Arguments[0] is the query source, Arguments[1] is the where predicate.
    Expression theObjectArgument = mcExpression.Arguments[0];
    Expression whereExpressionTree = mcExpression.Arguments[1];

    GDesktop provider =
        (GDesktop)((ConstantExpression)theObjectArgument).Value;

    GDQueryInfo qi = new GDQueryInfo();
    CollectQueryInfo(whereExpressionTree, qi);
    return qi;
}

private void CollectQueryInfo(Expression whereExpressionTree, GDQueryInfo qi)
{
    // Unwrap the quoted lambda and continue with its body.
    if (whereExpressionTree is UnaryExpression &&
        ((UnaryExpression)whereExpressionTree).Operand
            is LambdaExpression)
    {
        UnaryExpression ue = (UnaryExpression)whereExpressionTree;
        CollectQueryInfo(((LambdaExpression)ue.Operand).Body, qi);
    }

    // "a && b": collect information from both sides.
    if (whereExpressionTree.NodeType == ExpressionType.AndAlso)
    {
        BinaryExpression be = (BinaryExpression)whereExpressionTree;
        CollectQueryInfo(be.Left, qi);
        CollectQueryInfo(be.Right, qi);
    }

    if (whereExpressionTree.NodeType == ExpressionType.Call)
    {
        ProcessMethodCall(whereExpressionTree, qi);
    }

    if (whereExpressionTree.NodeType == ExpressionType.Equal)
    {
        ProcessEqual(whereExpressionTree, qi);
    }

    if (whereExpressionTree.NodeType == ExpressionType.TypeIs)
    {
        ProcessTypeIs(whereExpressionTree, qi);
    }
}

private void ProcessTypeIs(Expression whereExpressionTree, GDQueryInfo qi)
{
    if (whereExpressionTree is TypeBinaryExpression)
    {
        TypeBinaryExpression be = (TypeBinaryExpression)whereExpressionTree;

        if (be.Expression is ParameterExpression &&
            ((ParameterExpression)be.Expression).Type == typeof(GDResult) &&
            be.TypeOperand.IsSubclassOf(typeof(GDResult)))
        {
            switch (be.TypeOperand.Name)
            {
                case "GDFileResult":
                    qi.ElementType = GDElementType.File;
                    break;
                case "GDEmailResult":
                    qi.ElementType = GDElementType.Email;
                    break;
                // Not reachable as written: IsSubclassOf is false for GDResult itself.
                case "GDResult":
                    qi.ElementType = null;
                    break;
                default:
                    throw new NotSupportedException("Element type not supported");
            }
        }
    }
    else
    {
        throw new NotSupportedException("TypeIs expression not supported");
    }
}

private void ProcessEqual(Expression whereExpressionTree, GDQueryInfo qi)
{
    if (whereExpressionTree is BinaryExpression)
    {
        BinaryExpression be = (BinaryExpression)whereExpressionTree;
        Expression leftExpression = be.Left;
        if (IsAPropertyAccessToSpecificResultItem(leftExpression, typeof(GDFileResult)))
        {
            string argumentValue = (string)GetObjectFromArgument(be.Right);

            switch (((MemberExpression)leftExpression).Member.Name)
            {
                case "FileType":
                    qi.FileType = argumentValue;
                    break;
            }
        }
        else
        {
            if (IsAPropertyAccessToSpecificResultItem(
                    leftExpression,
                    typeof(GDEmailResult)))
            {
                string argumentValue = (string)GetObjectFromArgument(be.Right);

                switch (((MemberExpression)leftExpression).Member.Name)
                {
                    case "From":
                        qi.From = argumentValue;
                        break;
                    case "To":
                        qi.To = argumentValue;
                        break;
                    case "Cc":
                        qi.Cc = argumentValue;
                        break;
                    case "Subject":
                        qi.Subject = argumentValue;
                        break;
                }
            }
            else
            {
                throw new NotSupportedException("Property not supported");
            }
        }
    }
    else
    {
        throw new NotSupportedException("Member access not supported");
    }
}


private void ProcessMethodCall(Expression whereExpressionTree, GDQueryInfo qi)
{
    MethodCallExpression mCall =
        (MethodCallExpression)whereExpressionTree;
    if (mCall.Method.Name == "Contains")
    {
        object o = GetObjectFromArgument(mCall.Arguments[0]);
        qi.AddTerm((string)o);
    }
    else
    {
        throw new NotSupportedException("Method call not supported " + mCall.ToString());
    }
}


private object GetObjectFromArgument(Expression e)
{
    return LambdaExpression.Lambda(e).Compile().DynamicInvoke();
}

// True for expressions of the form ((elementType)t).Property, where t is a GDResult.
private static bool IsAPropertyAccessToSpecificResultItem(Expression leftExpression, Type elementType)
{
    return leftExpression.NodeType == ExpressionType.MemberAccess &&
        ((MemberExpression)leftExpression).Expression.NodeType
            == ExpressionType.Convert &&
        ((UnaryExpression)((MemberExpression)leftExpression).Expression).Type
            == elementType &&
        ((UnaryExpression)((MemberExpression)leftExpression).Expression).Operand.Type
            == typeof(GDResult) &&
        ((MemberExpression)leftExpression).Member is PropertyInfo;
}

#endregion



Certainly this code is not pretty. In the future I'll try to improve it.

We also want to allow the user to pass variables, function calls, property references, literals, etc. as query arguments. For example:


var e2 = from t in gdProvider
         where t.Contains(args[0]) &&
               t.Contains(GetString(1)) &&
               t is GDFileResult &&
               ((GDFileResult)t).FileType == "pdf"
         select t;



Because of this we need to get the value from the expression tree representing the query argument. This is done in the GetObjectFromArgument method.


private object GetObjectFromArgument(Expression e)
{
    return LambdaExpression.Lambda(e).Compile().DynamicInvoke();
}


Note that the expression tree representing the query argument is compiled as the body of a lambda expression with no parameters. Then we call the generated delegate to get the value. This is a very good example showing that you have full control over the whole query, even its arguments.
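The same technique can be tried in isolation. The sketch below pulls the argument expressions out of a predicate and evaluates them by compiling a parameterless lambda. The Evaluate and GetString helpers and the sample values are made up for illustration. Note that this only works when the argument does not reference the lambda parameter itself, since that parameter would be unbound in the new lambda.

```csharp
using System;
using System.Linq.Expressions;

public static class ArgumentEvaluationDemo
{
    // Wraps the expression in a parameterless lambda, compiles it, and invokes it.
    public static object Evaluate(Expression e)
    {
        return Expression.Lambda(e).Compile().DynamicInvoke();
    }

    // Hypothetical helper standing in for the GetString call in the sample query.
    public static string GetString(int i)
    {
        return "term" + i;
    }

    public static void Main()
    {
        string[] args = { "linq" };
        Expression<Func<string, bool>> predicate =
            s => s.Contains(args[0]) && s.Contains(GetString(1));

        // The body is "call && call"; each call carries one argument expression.
        var and = (BinaryExpression)predicate.Body;
        var first = (MethodCallExpression)and.Left;
        var second = (MethodCallExpression)and.Right;

        Console.WriteLine(Evaluate(first.Arguments[0]));  // linq
        Console.WriteLine(Evaluate(second.Arguments[0])); // term1
    }
}
```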

Now that we have processed the where section of the query and collected the elements needed to build the query string, we need a place to put the call to Google Desktop. Looking back at the definition of GDesktop.CreateQuery, it is important to note that GDQuery also has to implement IQueryable. For now our implementation is very basic. The call to Google Desktop goes in the GetEnumerator method of the GDQuery class.


public class GDQuery : IQueryable
{

    public IEnumerator GetEnumerator()
    {
        GDesktopWrapper gd = new GDesktopWrapper();
        string qs = qInfo.CreateQueryString();
        return gd.Query(qs, qInfo.ElementType).GetEnumerator();
    }

    public Expression Expression
    {
        get
        {
            return Expression.Constant(this);
        }
    }

    private GDQueryInfo qInfo;
    public GDQueryInfo QueryInfo
    {
        get
        {
            return this.qInfo;
        }
        set
        {
            this.qInfo = value;
        }
    }



    #region Not implemented methods
    ...
    #endregion
}


Features like projections will require more work on this implementation.

Code for this experiment can be found here.

In future posts I'll continue working on this implementation in order to add new features, for example projections, joins, and other LINQ features.

3 comments:

Auxon said...

This is going to take all night ... :-D

Scott said...

I have added a Linq to Google application to codeplex. It currently supports querying Google Base for products. I think you will be interested in some of the ideas I use for query parsing.
www.codeplex.com\glinq

Anonymous said...

This is an awesome piece of work! Thanks!!!! You should submit this over to linqhelp.com