
Thursday, December 22, 2011

C# LINQ To Objects: Using GroupBy with more control

The previous post demonstrated a simple use of GroupBy to calculate aggregates on a property of an object. This post demonstrates another overload of the GroupBy method, one that gives us much more control over what each group turns into.

For this example we will again use the List of Score objects from the previous post, but the goal of the grouping is quite different.

The goal is to produce a table listing, for each subject, the highest score and the name of the student who achieved it. The output can be visualized as follows:

Subject | Top Score | Top Scorer's Name

To get this shape, we need a form of GroupBy that lets us define the result produced for each group. Of the eight overloads of the GroupBy extension method, one takes, in addition to a key selector and an element selector, a resultSelector delegate that defines what each group is turned into. Here is the overload we are looking for:

public static IEnumerable<TResult> GroupBy<TSource, TKey, TElement, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TElement> elementSelector,
    Func<TKey, IEnumerable<TElement>, TResult> resultSelector)

Here are the parameters and what they mean:

The first parameter, this IEnumerable<TSource> source, is the input sequence itself. Because GroupBy is an extension method, this argument is not written inside the parentheses: it is the object on which the method is called (examResult in our example).
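To make the point about the first parameter concrete, here is a minimal sketch showing that calling GroupBy on examResult is the same as passing examResult explicitly to the static Enumerable.GroupBy method. The class ExtensionCallDemo and the helper S are hypothetical names used only for illustration, and the data is a trimmed-down copy of the post's sample list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}

public static class ExtensionCallDemo
{
    // Hypothetical helper (not from the post) that keeps the sample data short.
    static Score S(string student, string subject, float marks) =>
        new Score { StudentName = student, SubjectName = subject, MaxMarks = 100, MarksObtained = marks };

    public static (List<string> ViaExtension, List<string> ViaStatic) GroupKeys()
    {
        var examResult = new List<Score>
        {
            S("Steve", "Maths", 90), S("Steve", "Physics", 86),
            S("Sarah", "Maths", 85), S("Sarah", "Physics", 76),
        };

        // Extension-method syntax: examResult is bound to the 'this IEnumerable<TSource> source' parameter.
        var viaExtension = examResult.GroupBy(s => s.SubjectName).Select(g => g.Key).ToList();

        // The same query written as ordinary static calls on System.Linq.Enumerable.
        var viaStatic = Enumerable.Select(
            Enumerable.GroupBy(examResult, s => s.SubjectName),
            g => g.Key).ToList();

        return (viaExtension, viaStatic);
    }

    public static void Main()
    {
        var (viaExtension, viaStatic) = GroupKeys();
        Console.WriteLine(string.Join(", ", viaExtension));        // Maths, Physics
        Console.WriteLine(viaExtension.SequenceEqual(viaStatic));  // True
    }
}
```

The compiler rewrites the first form into the second, which is why the source sequence never appears inside the parentheses.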

The second parameter, Func<TSource, TKey> keySelector, is a delegate applied to each element of the input sequence to obtain its key (of type TKey). The key decides which group an element belongs to. We want the output grouped on the SubjectName property of Score, so the obvious keySelector is:

groupingKey => groupingKey.SubjectName, //keySelector

The third parameter, Func<TSource, TElement> elementSelector, is a delegate applied to each source element to obtain the value (of type TElement) that is stored in its group. Once the list is grouped by SubjectName, we need to know each student's name and marks within every group.

So, as the element selector, we project each Score into an anonymous type with the properties Name and Marks:

elementSelector => new { Name = elementSelector.StudentName, Marks = elementSelector.MarksObtained },   //elementSelector 
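It can help to look at the groups before any resultSelector runs. The sketch below uses the three-argument GroupBy overload (keySelector plus elementSelector only) and prints each subject with its Name/Marks pairs. ElementSelectorDemo and the S helper that builds the post's 12 sample scores are hypothetical names introduced for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}

public static class ElementSelectorDemo
{
    // Hypothetical helper (not from the post) that keeps the sample data short.
    static Score S(string student, string subject, float marks) =>
        new Score { StudentName = student, SubjectName = subject, MaxMarks = 100, MarksObtained = marks };

    public static List<Score> SampleData() => new List<Score>
    {
        S("Steve", "Maths", 90), S("Steve", "Physics", 86), S("Steve", "Chemistry", 72), S("Steve", "Computer Science", 91),
        S("Sarah", "Maths", 85), S("Sarah", "Physics", 76), S("Sarah", "Chemistry", 92), S("Sarah", "Computer Science", 92),
        S("David", "Maths", 74), S("David", "Physics", 82), S("David", "Chemistry", 85), S("David", "Computer Science", 89),
    };

    // Subject -> list of "Name=Marks" strings built from the elementSelector's anonymous objects.
    public static Dictionary<string, List<string>> Groups(List<Score> examResult) =>
        examResult
            .GroupBy(
                s => s.SubjectName,                                         // keySelector
                s => new { Name = s.StudentName, Marks = s.MarksObtained }) // elementSelector
            // Each g is an IGrouping<string, (anonymous type)>; g.Key is the subject name.
            .ToDictionary(g => g.Key, g => g.Select(e => $"{e.Name}={e.Marks}").ToList());

    public static void Main()
    {
        foreach (var pair in Groups(SampleData()))
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value));
    }
}
```

Within each group the elements keep their original order in the source list, so the Maths group lists Steve, Sarah and David in that order.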

The last parameter, Func<TKey, IEnumerable<TElement>, TResult> resultSelector, is a delegate applied to each group to produce the final result (of type TResult). It receives the group's key and the sequence of elements produced by elementSelector for that group. As output, we want a sequence of objects holding the subject name, the highest score, and the name of the student who achieved it.

So we create another anonymous type with the properties SubjectName, HighestScore and HighestScorerName. SubjectName is simply the grouping key. HighestScore is the maximum Marks among the group's elements (the second lambda parameter, which the code below names elementSelector), and HighestScorerName is the Name of the student whose Marks equal HighestScore. Here is what our resultSelector looks like:

(groupingKey, elementSelector) => new { //resultSelector
                SubjectName = groupingKey,
                HighestScore = elementSelector.Max(t => t.Marks),
                HighestScorerName = elementSelector.Where(t => t.Marks == elementSelector.Max(f => f.Marks)).Select(t => t.Name).SingleOrDefault()}
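The resultSelector is invoked once per group with the group's key and its elements. As a sketch of that behaviour, the hypothetical OverloadEquivalence class below computes each subject's top mark twice: once with the four-argument overload, and once by grouping with the three-argument overload and then projecting each IGrouping with Select. For LINQ to Objects the two should produce the same sequence. To keep the example short, its elementSelector keeps only MarksObtained, and S is a hypothetical data helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}

public static class OverloadEquivalence
{
    // Hypothetical helper (not from the post) that keeps the sample data short.
    static Score S(string student, string subject, float marks) =>
        new Score { StudentName = student, SubjectName = subject, MaxMarks = 100, MarksObtained = marks };

    public static List<Score> SampleData() => new List<Score>
    {
        S("Steve", "Maths", 90), S("Steve", "Physics", 86), S("Steve", "Chemistry", 72), S("Steve", "Computer Science", 91),
        S("Sarah", "Maths", 85), S("Sarah", "Physics", 76), S("Sarah", "Chemistry", 92), S("Sarah", "Computer Science", 92),
        S("David", "Maths", 74), S("David", "Physics", 82), S("David", "Chemistry", 85), S("David", "Computer Science", 89),
    };

    // Four-argument overload: resultSelector builds each output item from (key, elements).
    public static List<string> WithResultSelector(IEnumerable<Score> examResult) =>
        examResult.GroupBy(
                s => s.SubjectName,                               // keySelector
                s => s.MarksObtained,                             // elementSelector
                (subject, marks) => $"{subject}: {marks.Max()}")  // resultSelector
            .ToList();

    // Two steps: group with key + element selectors, then project each IGrouping with Select.
    public static List<string> WithSelect(IEnumerable<Score> examResult) =>
        examResult.GroupBy(s => s.SubjectName, s => s.MarksObtained)
            .Select(g => $"{g.Key}: {g.Max()}")
            .ToList();

    public static void Main()
    {
        var data = SampleData();
        Console.WriteLine(string.Join(" | ", WithResultSelector(data)));
        Console.WriteLine(WithResultSelector(data).SequenceEqual(WithSelect(data)));
    }
}
```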

That's it. Let's assemble the complete GroupBy call on the examResult list from the previous post and see what we get:

var topScorers = examResult.GroupBy(
            groupingKey => groupingKey.SubjectName, //keySelector
            elementSelector => new { Name = elementSelector.StudentName, Marks = elementSelector.MarksObtained },   //elementSelector 
            (groupingKey, elementSelector) => new { //resultSelector
                SubjectName = groupingKey,
                HighestScore = elementSelector.Max(t => t.Marks),
                // SingleOrDefault throws InvalidOperationException if two students tie for the top mark
                HighestScorerName = elementSelector.Where(t => t.Marks == elementSelector.Max(f => f.Marks)).Select(t => t.Name).SingleOrDefault()
            }).ToList();

Notice that this overload of GroupBy does not return an IEnumerable of IGrouping; it returns an IEnumerable of the anonymous type created by the last argument (resultSelector). That is the beauty of it.
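One caveat about the resultSelector above: SingleOrDefault throws an InvalidOperationException when more than one element matches, so if two students tie for the top mark in a subject, the query fails. The post's sample data has no ties, but a minimal tie-tolerant sketch could compute the maximum once per group inside a statement lambda and join all matching names. TieSafeTopScorers, its named-tuple result and the tied Maths scores in SampleWithTie are hypothetical, introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}

public static class TieSafeTopScorers
{
    // Hypothetical data: Steve and Sarah tie on Maths, which would make SingleOrDefault() throw.
    public static List<Score> SampleWithTie() => new List<Score>
    {
        new Score { StudentName = "Steve", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 90 },
        new Score { StudentName = "Sarah", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 90 },
        new Score { StudentName = "David", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 74 },
        new Score { StudentName = "Steve", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 86 },
        new Score { StudentName = "Sarah", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 76 },
    };

    public static List<(string SubjectName, float HighestScore, string HighestScorerNames)> Compute(IEnumerable<Score> examResult) =>
        examResult.GroupBy(
                s => s.SubjectName,                                          // keySelector
                s => new { Name = s.StudentName, Marks = s.MarksObtained },  // elementSelector
                (subject, students) =>                                       // resultSelector
                {
                    // Compute the group's maximum once rather than once per element.
                    float highest = students.Max(t => t.Marks);
                    // Join every name that reaches the maximum, so a tie yields "A, B" instead of an exception.
                    string names = string.Join(", ", students.Where(t => t.Marks == highest).Select(t => t.Name));
                    return (SubjectName: subject, HighestScore: highest, HighestScorerNames: names);
                })
            .ToList();

    public static void Main()
    {
        foreach (var row in Compute(SampleWithTie()))
            Console.WriteLine($"{row.SubjectName}\t{row.HighestScore}\t{row.HighestScorerNames}");
    }
}
```

Computing the maximum once per group also avoids re-evaluating Max for every element, which the Where predicate in the original query does.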

Saturday, October 22, 2011

C# LINQ To Objects: Simple Grouping using GroupBy

In day-to-day programming we often need to group records by a key and calculate aggregates such as SUM, AVG, MAX and MIN on each group. We do this most often in SQL, but it is just as easy in C# with LINQ.

The static class Enumerable in the System.Linq namespace defines the GroupBy extension method. The simplest of its overloads is:
public static IEnumerable<IGrouping<TKey, TSource>> GroupBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector) 
This method does to an IEnumerable roughly what GROUP BY does to a set of records in SQL. Let's explore it with an example. First we need a class whose objects we can collect in an IEnumerable and group with GroupBy:
public class Score {
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}
Score is a simple class representing a student's score in a particular subject. Its two float properties, MaxMarks and MarksObtained, are the ones we can aggregate. Suppose we have a collection of Score objects, each holding one student's score in one subject, and we need the average marks obtained by students in each subject.
Let's first create a list of Score objects:
using System;
using System.Collections.Generic;
using System.Linq;

class Program {
    static void Main(string[] args) {
        List<Score> examResult = new List<Score>();
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 90 });
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 86 });
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Chemistry", MaxMarks = 100, MarksObtained = 72 });
        examResult.Add(new Score() { StudentName = "Steve", SubjectName = "Computer Science", MaxMarks = 100, MarksObtained = 91 });

        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 85 });
        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 76 });
        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Chemistry", MaxMarks = 100, MarksObtained = 92 });
        examResult.Add(new Score() { StudentName = "Sarah", SubjectName = "Computer Science", MaxMarks = 100, MarksObtained = 92 });


        examResult.Add(new Score() { StudentName = "David", SubjectName = "Maths", MaxMarks = 100, MarksObtained = 74 });
        examResult.Add(new Score() { StudentName = "David", SubjectName = "Physics", MaxMarks = 100, MarksObtained = 82 });
        examResult.Add(new Score() { StudentName = "David", SubjectName = "Chemistry", MaxMarks = 100, MarksObtained = 85 });
        examResult.Add(new Score() { StudentName = "David", SubjectName = "Computer Science", MaxMarks = 100, MarksObtained = 89 });

        // The GroupBy queries shown in these posts can be placed here, before Console.ReadLine().
        Console.ReadLine();
    }
}
Now we have a list of 12 Score objects covering 3 students and 4 subjects, and we are all set to determine the average marks obtained in each subject. The grouping key is therefore the SubjectName property, and we average the MarksObtained property. Here is how it is done:
var avgResults = examResult.GroupBy(rec => rec.SubjectName)
            .Select(grp => new { SubjectName = grp.Key, AVGMarks = grp.Average(t => t.MarksObtained) }).ToList();

foreach (var item in avgResults) {
     Console.WriteLine("Subject Name: {0},   Average Marks:{1}", item.SubjectName, item.AVGMarks);
}
Notice that in the Select call we create an anonymous type with two properties: SubjectName, taken from each group's Key, and AVGMarks. The result, avgResults, is a List of this anonymous type.
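The same pattern extends to the other aggregates mentioned at the start of this post. The sketch below first lists each group's students, which shows that, unlike a SQL GROUP BY row, an IGrouping still holds the original Score objects, and then computes COUNT, SUM, AVG, MAX and MIN per subject. It assumes a hypothetical SubjectStats record (C# 9 or later) and a hypothetical S helper for the sample data; neither comes from the post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Score
{
    public string StudentName { get; set; }
    public string SubjectName { get; set; }
    public float MaxMarks { get; set; }
    public float MarksObtained { get; set; }
}

// Hypothetical result type for the per-subject summary.
public record SubjectStats(string SubjectName, int Count, float Total, float Average, float Highest, float Lowest);

public static class SubjectAggregates
{
    // Hypothetical helper (not from the post) that keeps the sample data short.
    static Score S(string student, string subject, float marks) =>
        new Score { StudentName = student, SubjectName = subject, MaxMarks = 100, MarksObtained = marks };

    public static List<Score> SampleData() => new List<Score>
    {
        S("Steve", "Maths", 90), S("Steve", "Physics", 86), S("Steve", "Chemistry", 72), S("Steve", "Computer Science", 91),
        S("Sarah", "Maths", 85), S("Sarah", "Physics", 76), S("Sarah", "Chemistry", 92), S("Sarah", "Computer Science", 92),
        S("David", "Maths", 74), S("David", "Physics", 82), S("David", "Chemistry", 85), S("David", "Computer Science", 89),
    };

    // Each IGrouping keeps every original Score, so we can still list the students per subject.
    public static List<string> DescribeGroups(IEnumerable<Score> examResult) =>
        examResult.GroupBy(rec => rec.SubjectName)
            .Select(g => g.Key + ": " + string.Join(", ", g.Select(s => s.StudentName)))
            .ToList();

    // Roughly: SELECT SubjectName, COUNT(*), SUM(..), AVG(..), MAX(..), MIN(..) ... GROUP BY SubjectName
    public static List<SubjectStats> Compute(IEnumerable<Score> examResult) =>
        examResult.GroupBy(rec => rec.SubjectName)
            .Select(g => new SubjectStats(
                g.Key,
                g.Count(),
                g.Sum(t => t.MarksObtained),
                g.Average(t => t.MarksObtained),
                g.Max(t => t.MarksObtained),
                g.Min(t => t.MarksObtained)))
            .ToList();

    public static void Main()
    {
        var data = SampleData();
        foreach (var line in DescribeGroups(data))
            Console.WriteLine(line);
        foreach (var s in Compute(data))
            Console.WriteLine($"{s.SubjectName}: count={s.Count}, sum={s.Total}, avg={s.Average:F2}, max={s.Highest}, min={s.Lowest}");
    }
}
```

Note that Sum and Average with a float selector return float, matching the type of MarksObtained.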