Techy: Mergesort used in inversion counting

Note. In order to understand the inversion counting method in this blog, you need to be familiar with mergesort algorithm first. Especially the "divide and conquer paradigm" concept applied in mergesort.

I have explained that mergesort can be done in O(n*log(n)) in previous blog here

What is inversion

given an array A with length n, the number of inversion in this array equals to number of pairs (A[i], A[j]) with i < j and A[i] > A[j].

example: array = {1,3,5,2,4,6}, the inversions are (5,2) (5,4) (3,2)

Why study inversion

for example: to study the similarity of movie taste for two people.
identify 6 movies and give them to the two to rank according to their favors.
input result from 1st person as array A and result from 2nd person as array B, then calculate the number of inversions to quantify their movie favor differences.

The design of inversion counting
1.
it uses "divide and conquer paradigm" and algorithm as below:

count_inversion (array A)
count_inversion (left half of A i,j < n/2)
count_inversion (right half of A i,j > n/2)
count_split_inversion (i < n/2 < j)

at each level of count method, the operation carried out is "count_split_inversion". The key challenge is to make sure "count_split_inversion" method can be done in O(n) so that inversion counting can be done with O(nlog(n)).

2, (brilliant idea!)
Piggyback on mergesort!

sort_and_count_inversion (array A)
sort_and_count_inversion (left half of A i,j < n/2)
sort_and_count_inversion (right half of A i,j > n/2)
merge_and_count_split_inversion (i < n/2 < j)

The reason to add mergesort, which has O(nlog(n)) is that it will reduce the inversion count complexity from O(n^2) to O(nlog(n)), and we will see as below:

when do merge_and_count_split_inversion from "sorted left half A" and "sorted right half A", items are copied from them to an empty array (same as in mergesort). Now we can claim that: whenever we compare A[i] and A[j] if we found A[i] > A[j] (i<n/2<j) in this step , an inversion is found. this step is O(n).

Therefore, the computational complexity at each level is O(n) and the total complexity is O(nlog(n)).

C# Implementation

 using System;  
 using System.Linq;  
 namespace InversionCount  
 {  
   class Program  
   {  
     static void Main(string[] args)  
     {  
       Console.WriteLine("please give a integer for the size of array, system will generate internal values and do sorting for you");  
       int size = int.Parse(Console.ReadLine());  
       int[] count = new int[size];  
       RandomizeArray(ref count);  
       SortAndCountInversion(count);  
       Console.ReadLine();  
     }  
     public static int[] SortAndCountInversion(int[] A)  
     {  
       if (A.Length == 1)  
         return A;  
       int size = A.Length;  
       int halfSize = A.Length / 2;  
       int[] A1 = new int[halfSize];  
       int[] A2 = new int[A.Length - halfSize];  
       Split<int>(A, halfSize, out A1, out A2);  
       int[] ResultA = SortAndCountInversion(A1);  
       int[] ResultB = SortAndCountInversion(A2);  
       return MergeAndCoutInversion(ResultA, ResultB);  
     }  
     private static int[] MergeAndCoutInversion(int[] ResultA, int[] ResultB)  
     {  
       int resultSize = ResultA.Length + ResultB.Length;  
       int[] resultArray = new int[resultSize];  
       int i = 0, j = 0;  
       for (int k = 0; k < resultSize; k++)  
       {  
         if (i == ResultA.Length)  
           resultArray[k] = ResultB[j++];  
         else if (j == ResultB.Length)  
           resultArray[k] = ResultA[i++];  
         else if (ResultA[i] <= ResultB[j])  
           resultArray[k] = ResultA[i++];  
         else if (ResultA[i] > ResultB[j])  
         {  
           for (int s = i; s < ResultA.Length; s++)  
           {  
             Console.WriteLine("{0} - {1}", ResultA[s], ResultB[j]);  
           }  
           resultArray[k] = ResultB[j++];            
         }  
         else  
         {  
           Console.WriteLine("odd!");  
         }  
       }  
       return resultArray;  
     }  
     public static void Split<T>(T[] array, int index, out T[] first, out T[] second)  
     {  
       first = array.Take(index).ToArray();  
       second = array.Skip(index).ToArray();  
     }  
     private static void RandomizeArray(ref int[] count)  
     {  
       int size = count.Length;  
       Random random = new Random();  
       int randomInt;  
       for (int i = 0; i < size; i++)  
       {  
         randomInt = random.Next(0, size);  
         count[i] = randomInt;  
       }  
     }  
   }  
 }

Techy

Friday, January 16, 2015

Mergesort used in inversion counting

No comments:

Post a Comment