Thursday, December 22, 2011

C# LINQ To Objects: Using GroupBy with more control

The previous post demonstrated a simple use of GroupBy to compute aggregates on a property of an object. This post demonstrates another overload of the GroupBy method, one that lets us group the objects in a more flexible way.

For this example too, we will use the same List of Score objects that we used in the previous example. However, the objective of the grouping is quite different now.

So, the objective is to retrieve a table of the highest score in each subject, along with the name of the student who attained it. The structure of the output can be visualized as below-

Subject | Top Score | Top Scorer's Name

To achieve this form of grouping, we need a GroupBy overload that lets us define the shape of the result. One of the eight overloads of the GroupBy extension method accepts a result selector as an argument for exactly this purpose. Below is the overload we are looking for-

public static IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    Func<TKey, IEnumerable<TElement>, TResult> resultSelector)
Here are the arguments and their meanings:

The first argument, this IEnumerable<TSource> source, is the input sequence itself. Because GroupBy is an extension method, we never pass this argument explicitly; the compiler supplies the sequence on which the method is called.
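To make the point about the implicit first argument concrete, here is a minimal sketch that calls the simplest GroupBy overload both ways. The class name ExtensionCallDemo, its two helper methods and the sample words are made up for illustration; only Enumerable.GroupBy itself comes from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class; the names below are illustrative only.
public static class ExtensionCallDemo
{
    // Extension-method syntax: "words" becomes the hidden "source" argument.
    public static List<string> KeysViaExtensionSyntax(IEnumerable<string> words)
    {
        return words.GroupBy(w => w.Substring(0, 1)).Select(g => g.Key).ToList();
    }

    // Plain static call: the same method, with "source" passed explicitly.
    public static List<string> KeysViaStaticSyntax(IEnumerable<string> words)
    {
        return Enumerable.GroupBy(words, w => w.Substring(0, 1)).Select(g => g.Key).ToList();
    }

    public static void Main()
    {
        var words = new[] { "apple", "avocado", "banana", "cherry", "blueberry" };

        // Both calls group by first letter and yield keys in order of first appearance.
        Console.WriteLine(string.Join(",", KeysViaExtensionSyntax(words))); // prints "a,b,c"
        Console.WriteLine(string.Join(",", KeysViaStaticSyntax(words)));    // prints "a,b,c"
    }
}
```

Both forms compile to the same call, so the choice between them is purely a matter of readability.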

The second argument Func<TSource, TKey> keySelector, is a delegate to apply to each element in the input sequence to obtain a key (type TKey). The key is what decides which group the element is associated with. We want our final output to be grouped on SubjectName property of Score object. So, our obvious choice as keySelector will be-

groupingKey => groupingKey.SubjectName, //keySelector

The third argument, Func<TSource, TElement> elementSelector, is a delegate applied to each source element to obtain the value (type TElement) that should become part of the relevant group. Once we have grouped our list by SubjectName, we will need to identify the students with their marks in each group.

So, as an element selector we create an anonymous type with properties Name and Marks

elementSelector => new { Name = elementSelector.StudentName, Marks = elementSelector.MarksObtained },   //elementSelector 

Now the last argument, Func<TKey, IEnumerable<TElement>, TResult> resultSelector, is a delegate applied to each group to produce a final result (type TResult). As an output, we want a sequence of objects holding the subject name, the highest score and the name of the student who attained it.

So, we create another anonymous type with properties SubjectName, HighestScore and HighestScorerName. SubjectName is the key of the grouping operation. HighestScore of a subject can be determined from the elements of the corresponding group (the second lambda parameter, which this example also names elementSelector), and HighestScorerName is the Name of the student whose Marks equal HighestScore. So, here is how our resultSelector will look-

(groupingKey, elementSelector) => new { //resultSelector
                SubjectName = groupingKey,
                HighestScore = elementSelector.Max(t => t.Marks),
                HighestScorerName = elementSelector.Where(t => t.Marks == elementSelector.Max(f => f.Marks)).Select(t => t.Name).SingleOrDefault()}

This is it. Let's assemble the GroupBy call on the List<Score> instance from the previous example and see what we get-

var topScorers = examResult.GroupBy(
            groupingKey => groupingKey.SubjectName, //keySelector
            elementSelector => new { Name = elementSelector.StudentName, Marks = elementSelector.MarksObtained },   //elementSelector
            (groupingKey, elementSelector) => new { //resultSelector
                SubjectName = groupingKey,
                HighestScore = elementSelector.Max(t => t.Marks),
                //SingleOrDefault assumes exactly one top scorer per subject; it throws if two students tie
                HighestScorerName = elementSelector.Where(t => t.Marks == elementSelector.Max(f => f.Marks)).Select(t => t.Name).SingleOrDefault()
            }).ToList();

When you look at the output sequence, notice that this overload of GroupBy does not return an IEnumerable of IGrouping. Instead it returns an IEnumerable of the anonymous type produced by resultSelector, the last parameter of the method. That is what makes this overload so convenient.
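One thing the query above leaves open is what happens when two students share the top score in a subject. The following is a minimal sketch of one way to handle that with the same GroupBy overload, joining all tied names into one string. The class TopScorerDemo, the method TopScorerLines, the trimmed-down Score class and the sample marks (which include a deliberate tie) are assumptions made for this illustration, not part of the original example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down version of the Score class, enough for this sketch.
public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MarksObtained { get; set; }
}

// Hypothetical demo class; names are illustrative only.
public static class TopScorerDemo
{
    // Returns one "Subject | Top Score | Names" line per subject.
    public static List<string> TopScorerLines(IEnumerable<Score> scores)
    {
        return scores.GroupBy(
            s => s.SubjectName,                                         // keySelector
            s => new { Name = s.StudentName, Marks = s.MarksObtained }, // elementSelector
            (subject, entries) =>                                       // resultSelector
            {
                var list = entries.ToList();
                float top = list.Max(e => e.Marks);   // computed once per group
                var names = list.Where(e => e.Marks == top).Select(e => e.Name);
                return subject + " | " + top + " | " + string.Join(", ", names);
            }).ToList();
    }

    public static void Main()
    {
        // Sample data invented for this sketch; Maths contains a tie.
        var scores = new List<Score> {
            new Score { StudentName = "Steve", SubjectName = "Maths",   MarksObtained = 90 },
            new Score { StudentName = "Sarah", SubjectName = "Maths",   MarksObtained = 90 },
            new Score { StudentName = "David", SubjectName = "Maths",   MarksObtained = 74 },
            new Score { StudentName = "Steve", SubjectName = "Physics", MarksObtained = 86 },
            new Score { StudentName = "Sarah", SubjectName = "Physics", MarksObtained = 76 }
        };

        foreach (var line in TopScorerLines(scores)) {
            Console.WriteLine(line);
        }
        // prints:
        // Maths | 90 | Steve, Sarah
        // Physics | 86 | Steve
    }
}
```

Computing the maximum once per group also avoids re-evaluating Max for every element inside the Where clause.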

Saturday, October 22, 2011

C# LINQ To Objects: Simple Grouping using GroupBy

We face a number of scenarios in day-to-day programming where we need to group records by a key and calculate aggregates such as SUM, AVG, MAX or MIN on these groups. We do it more often in SQL, but we can also do it easily in C# using LINQ.

The static class Enumerable in the System.Linq namespace defines the extension method GroupBy. Here is the simplest of its overloads-
public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector) 
This method does to an IEnumerable much the same as GROUP BY in SQL does to a set of records. Let's explore it by means of an example. First we need a class whose instances we can group with GroupBy.
public class Score {
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}
Score is a simple class that represents the score of a student in a particular subject. It has two float properties, MaxMarks and MarksObtained, on which aggregates can be calculated. Let's suppose we have a collection of Score objects, each representing the score of one student in one subject, and we need to calculate the average marks obtained in each subject.
Let's first create a list of Score objects-
class Program {
    static void Main(string[] args) {
        List<Score> examResult = new List<Score>();
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 90 });
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 86 });
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Chemistry", MaxMarks = 100, MarksObtained = 72 });
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Computer Science", MaxMarks = 100, MarksObtained = 91 });

        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 85 });
        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 76 });
        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Chemistry", MaxMarks = 100, MarksObtained = 92 });
        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Computer Science", MaxMarks = 100, MarksObtained = 92 });


        examResult.Add(new Score() { StudentName = "David", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 74 });
        examResult.Add(new Score() { StudentName = "David", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 82 });
        examResult.Add(new Score() { StudentName = "David", SubjectName = "Chemistry", MaxMarks = 100, MarksObtained = 85 });
        examResult.Add(new Score() { StudentName = "David", SubjectName = "Computer Science", MaxMarks = 100, MarksObtained = 89 });

        Console.ReadLine();
    }
}
Now we have a list of 12 Score objects covering 3 students and 4 subjects, and we are all set to determine the average marks obtained by these students in each subject. So, our grouping key should be the SubjectName property, and we want to calculate the average of the MarksObtained property. Here is how it is done-
var avgResults = examResult.GroupBy(rec => rec.SubjectName).
            Select(rec => new { SubjectName = rec.Key, AVGMarks = rec.Average(t => t.MarksObtained) }).ToList();

foreach (var item in avgResults) {
     Console.WriteLine("Subject Name: {0},   Average Marks:{1}", item.SubjectName, item.AVGMarks);
}
Notice that in the Select method we create an anonymous type with two properties, SubjectName and AVGMarks. Finally we get a collection (avgResults) of this anonymous type.
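The same pattern extends to the other aggregates mentioned at the start of this post. Below is a minimal sketch that computes the average, maximum, minimum and count for each group in a single Select. The class AggregateDemo, the method SummaryLines, the trimmed-down Score class and the sample marks are assumptions made for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down version of the Score class, enough for this sketch.
public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MarksObtained { get; set; }
}

// Hypothetical demo class; names are illustrative only.
public static class AggregateDemo
{
    // Builds one summary line per subject with several aggregates.
    public static List<string> SummaryLines(IEnumerable<Score> scores)
    {
        return scores
            .GroupBy(s => s.SubjectName)
            .Select(g => string.Format(
                "{0}: avg={1}, max={2}, min={3}, count={4}",
                g.Key,
                g.Average(s => s.MarksObtained),
                g.Max(s => s.MarksObtained),
                g.Min(s => s.MarksObtained),
                g.Count()))
            .ToList();
    }

    public static void Main()
    {
        // Sample data invented for this sketch.
        var scores = new List<Score> {
            new Score { StudentName = "Steve", SubjectName = "Maths",   MarksObtained = 90 },
            new Score { StudentName = "Sarah", SubjectName = "Maths",   MarksObtained = 80 },
            new Score { StudentName = "David", SubjectName = "Physics", MarksObtained = 70 }
        };

        foreach (var line in SummaryLines(scores)) {
            Console.WriteLine(line);
        }
        // prints:
        // Maths: avg=85, max=90, min=80, count=2
        // Physics: avg=70, max=70, min=70, count=1
    }
}
```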

Sunday, October 16, 2011

Serializing .NET Classes into XML (C#)

Serialization: The process of converting an object into a stream of bytes. This stream can then be persisted as a physical file, for example an XML file.
In the .NET Framework, the namespace System.Xml.Serialization provides the functionality to convert a serializable object into a stream of bytes, and System.IO provides the tools to write that stream to a physical file.
Here is an example where we serialize an object into a stream and then save the stream to a physical XML file.

Model classes: First we will create a few simple classes that we want to represent as XML-

public class University {
    public University() { }

    public string Name { get; set; }
    public string Address { get; set; }
    public short Rating { get; set; }
    public List<Institute> AffiliatedInstitutes = new List<Institute>();
}

public class Institute {
    public Institute() { }

    public string Name { get; set; }
    public string Address { get; set; }
    public short Rating { get; set; }
    public List<Student> Students = new List<Student>();
}

public class Course {
    public Course() { }

    public string Name { get; set; }
    public short DurationInMonths { get; set; }
    public string CourseType { get; set; }
}

public class Student {
    public Student() { }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    public string EnrollmentNumber { get; set; }
    public Course Course { get; set; }
}

In the example above, we are going to serialize the University class, whose object graph includes all the other types (Institute, Student, Course). XmlSerializer can already handle public classes like these, which have a public parameterless constructor, but we can decorate them with attributes to control the output. Let's take a look at the decorated definitions of the same classes-
using System.Collections.Generic;
using System.Xml.Serialization;
using System.IO;
using System;
...
[XmlRoot()]
public class University {
    public University() { }

    public string Name { get; set; }
    public string Address { get; set; }
    public short Rating { get; set; }
    public List<Institute> AffiliatedInstitutes = new List<Institute>();
}

[XmlInclude(typeof(Institute))]
public class Institute {
    public Institute() { }

    public string Name { get; set; }
    public string Address { get; set; }
    public short Rating { get; set; }
    public List<Student> Students = new List<Student>();
}

[XmlInclude(typeof(Course))]
public class Course {
    public Course() { }

    public string Name { get; set; }
    public short DurationInMonths { get; set; }
    public string CourseType { get; set; }
}

[XmlInclude(typeof(Student))]
public class Student {
    public Student() { }

    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Address { get; set; }
    public string EnrollmentNumber { get; set; }
    public Course Course { get; set; }
}

In the class definitions above, there are a few things worth noticing.
XmlRoot() attribute: Marks the class that follows it as the root element of the serialized XML. It also lets you set the root element's name and namespace.
XmlInclude(TYPE) attribute: Tells XmlSerializer about an additional type, normally a derived class, that may appear where the declared type is expected. Here each class names its own type, which is harmless but not required: the serializer discovers Institute, Student and Course by walking the public members of University.
By default, all public properties and fields of a class are serialized as child elements of the parent object (XmlElement). If we want a property to appear as an attribute of its parent node instead, we need to decorate it with XmlAttribute("attributeName").
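As a small illustration of the XmlAttribute point, here is a minimal sketch in which one property is written as an XML attribute and the other stays a child element. The class CourseInfo, the helper XmlAttributeDemo.ToXml and the attribute name "type" are made up for this example; XmlRoot, XmlAttribute and XmlSerializer are the framework types discussed above.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical class for this sketch; it is serialized as a <Course> root element.
[XmlRoot("Course")]
public class CourseInfo
{
    // Written as an attribute on the root node: <Course type="...">
    [XmlAttribute("type")]
    public string CourseType { get; set; }

    // No attribute, so it is written as a child element: <Name>...</Name>
    public string Name { get; set; }
}

public static class XmlAttributeDemo
{
    // Serializes the object into an in-memory string instead of a file.
    public static string ToXml(CourseInfo course)
    {
        var serializer = new XmlSerializer(typeof(CourseInfo));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, course);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        var course = new CourseInfo {
            CourseType = "PG-Degree",
            Name = "Masters of Business Administration"
        };

        // The output shows CourseType as type="PG-Degree" on the <Course> node
        // and Name as a nested <Name> element.
        Console.WriteLine(ToXml(course));
    }
}
```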
Finally we define a method in our University class that does the actual job: it converts an object of its own type into a stream of bytes (i.e. serializes the object) and then writes those bytes to a physical file.
public class University {
    public University() {}

    public string Name { get; set; }
    public string Address { get; set; }
    public short Rating { get; set; }
    public List<Institute> AffiliatedInstitutes = new List<Institute>();

    public bool SaveToXML(string filePath) {
        try {
            // Create an XmlSerializer for the root object type (University)
            XmlSerializer serializer = new XmlSerializer(typeof(University));

            string data;

            // The memory stream is a container for the serialized bytes
            using (MemoryStream ms = new MemoryStream()) {
                // Serialize the current instance of University into the stream
                serializer.Serialize(ms, this);

                // Rewind and read the stream into a string. This step is optional;
                // it only lets us inspect the XML in the debugger before writing it.
                ms.Position = 0;
                using (StreamReader reader = new StreamReader(ms)) {
                    data = reader.ReadToEnd();
                }
            }

            // Now write the string to the specified path
            File.WriteAllText(filePath, data);
            return true;
        } catch (Exception) {
            // Any failure (serialization or file I/O) is reported as false
            return false;
        }
    }
}
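As the comment in SaveToXML notes, reading the stream into a string is only there for debugging. A minimal sketch of the shorter route, serializing straight into a file stream, might look like the following. The class Campus, its members and the temporary file name are made up for this illustration, and unlike SaveToXML this version lets exceptions propagate to the caller instead of returning false.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the University class, kept small for this sketch.
public class Campus
{
    public string Name { get; set; }
    public short Rating { get; set; }

    // Serializes this object directly into the target file, with no
    // intermediate MemoryStream or string.
    public void SaveToXml(string filePath)
    {
        var serializer = new XmlSerializer(typeof(Campus));
        using (FileStream stream = File.Create(filePath))
        {
            serializer.Serialize(stream, this);
        }
    }
}

public static class DirectSaveDemo
{
    public static void Main()
    {
        // The temp-folder path is an assumption made for this sketch.
        string path = Path.Combine(Path.GetTempPath(), "campus-sketch.xml");

        new Campus { Name = "Test Campus", Rating = 4 }.SaveToXml(path);

        // Read the file back only to show what was written.
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Letting the exception surface tells the caller why a save failed (bad path, missing permissions, a type that cannot be serialized), which a plain false cannot do.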

Now, let's test the code above-

      
static void Main(string[] args) {
    Course mastersBusiness = new Course() {
        CourseType = "PG-Degree",
        DurationInMonths = 24,
        Name = "Masters of Business Administration"
    };

    Course bachelorEngineering = new Course() {
        CourseType = "Graguate-Degree",
        DurationInMonths = 48,
        Name = "Bachelor of Engineering"
    };

    Student steveRichards = new Student() {
        FirstName = "Steve",
        LastName = "Richards",
        Course = bachelorEngineering,
        Address = string.Empty,
        EnrollmentNumber = "BE20111234"
    };

    Student davidBaker = new Student() {
        FirstName = "David",
        LastName = "Baker",
        Course = mastersBusiness,
        Address = string.Empty,
        EnrollmentNumber = "MB20111234"
    };

    Institute rafaelInstitute = new Institute() {
        Name = "St. Rafael Institute for Higher Studies",
        Address = "123, Orleans Dr., Santa Clara, CA 94902",
        Rating = 3
    };

    rafaelInstitute.Students.Add(steveRichards);
    rafaelInstitute.Students.Add(davidBaker);

    University testUniversity = new University() {
        Name = "State University of California",
        Address = "Palo Alto, CA, 92033",
        Rating = 5
    };

    testUniversity.AffiliatedInstitutes.Add(rafaelInstitute);
    testUniversity.SaveToXML(@"F:\" + testUniversity.Name + ".xml");
}

If all went well, here is the XML that should now be on your disk-

<?xml version="1.0"?>
<University xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <AffiliatedInstitutes>
    <Institute>
      <Students>
        <Student>
          <FirstName>Steve</FirstName>
          <LastName>Richards</LastName>
          <Address />
          <EnrollmentNumber>BE20111234</EnrollmentNumber>
          <Course>
            <Name>Bachelor of Engineering</Name>
            <DurationInMonths>48</DurationInMonths>
            <CourseType>Graguate-Degree</CourseType>
          </Course>
        </Student>
        <Student>
          <FirstName>David</FirstName>
          <LastName>Baker</LastName>
          <Address />
          <EnrollmentNumber>MB20111234</EnrollmentNumber>
          <Course>
            <Name>Masters of Business Administration</Name>
            <DurationInMonths>24</DurationInMonths>
            <CourseType>PG-Degree</CourseType>
          </Course>
        </Student>
      </Students>
      <Name>St. Rafael Institute for Higher Studies</Name>
      <Address>123, Orleans Dr., Santa Clara, CA 94902</Address>
      <Rating>3</Rating>
    </Institute>
  </AffiliatedInstitutes>
  <Name>State University of California</Name>
  <Address>Palo Alto, CA, 92033</Address>
  <Rating>5</Rating>
</University>