Project Euler problem – Multiples of 3 and 5; Small yet insightful (Part 1)

PERSPECTIVE

Programming may not come naturally to everybody, but a little perspective can change things.

Familiarity with the most basic C# programs is assumed. No object-oriented concepts are needed to understand Part 1. If you are absolutely new to C#, it would be beneficial to go through, at least, the following. That is all I ask:

Hello World in C#

This post if for those new to programming and looking for a little perspective. If you can make C# dance to your tunes, do not fret. You can still read and help me improve this post.

PerspectiveRatatouille

It is always wonderful to see a small and simple problem, when properly handled, reveal so much beauty that it is almost impossible to believe.

There was a time when programming made no sense to me. I had no idea how to do it. That is when I found a very simple problem and I was enlightened:

Find if a given number is prime or not

Now, there is not much substance in this problem. So it seems till you poke it and a legion of mathematical beasts are unleashed to challenge your intellect. Number theorists of all ages have struggled to tame these beasts. They still roam about in the wilderness of Mathematics.

THE PROBLEM

This Friday, I was looking for some fun. I created an account on Project Euler . The first question that was thrown at me was:

If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000.

THE SOLUTIONS

This problem seems simple. It is if we have great computing power at our disposal. It is not if we want to properly tame it.

THE NAIVE SOLUTION


using System;

namespace ProjectEuler_Question_1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Sum of all numbers below 1000 that are multiples of 3 or 5 are: ");
            Console.WriteLine(FindSumOfMultiplesOf3And5().ToString());
            Console.Read();
        }

        static double FindSumOfMultiplesOf3And5()
        {
            double sum = 0;
            for (double i = 0; i < 1000; i++)
            {
                /* % is the modulus operator */
                if ((i % 3 == 0) || (i % 5 == 0))
                {
                    sum = sum + i;
                }
            }
            return sum;
        }
    }
}

This is a solution. It works. The major elements of any computer programme are present:

  1. Data
    – do some operation on numbers below 1000
  2. Data Structure
    – data types used; double, in the code above, to store numbers
  3. Algorithm
    The loop, the modulus operator %, the summation

WhatIsProgramming

Why is this naive, then?
It is naive because of:

Inefficient use of hardware resources:

  • On a Windows 7 virtual machine with one CPU core and 8 GB RAM allotted, the results are given below:
Limit used Time taken
1000 sub-milliseconds range
100000 sub-milliseconds range
10000000 140.62 milliseconds
1000000000 13972.65 milliseconds ~= 14 seconds
100000000000 1599374.20 milliseconds ~= 26 minutes

Look at the CPU madness as witnessed by KSysGuard (KDE System Monitor) on my openSUSE host running on i7-4770 with 32 GB RAM:

CPUActivity

  • On an i5-4670 physical machine with 4 CPU cores and 32 GB RAM, the results are given below:
Limit used Time taken
100000000000 1407631.13 milliseconds ~= 23.5 minutes

There are a lot of variables that can affect the execution time. On of the most important is the hardware. A Pentium 4 PC would have made me wait for a few hours. So, do not assume that the algorithm would always run on great hardware. Design algorithms in such a way that they may run on low-end hardware too.

There is still one question here

Why would we bother about large inputs when the problem specifically speaks about 1000 as the upper limit?

This is a very important question. The answer is quite large. It will be answered in another part.

Inefficient use of intellectual resources:

If we had infinite computing power, data structures and algorithms would not have mattered at all, but we do not. So, it is imperative that we invest some time in finding a better algorithm for the job. This process can be slow and evolutionary in nature. It can even be sudden, but we have to make it happen.

There is another issue that we have not discussed:

How would we go about solving this problem if we had no computer at our disposal?

Just imagine the pain one would have to go through to find out all the multiples of 3 and 5 below 100000000000.

BETTER SOLUTION

A small investment of time can help us in breaking the problem into smaller pieces. These smaller pieces can be solved without any electronic computing power.

Realizing that we are dealing with a couple of Arithmetic Progressions can help us create better strategies. So, we are dealing with:

  1. Arithmetic Progression with 3 as the first number and a common difference of 3
  2. Arithmetic Progression with 5 as the first number and a common difference of 5
  3. Arithmetic Progression with 15 as the first number and a common difference of 15

Where did that Point 3 come from?

This needs some detailed explanation. Let us build a Truth Table for Logical OR to see where Point 3 comes from:

Multiple of 3 (A)
Multiple of 5 (B)
A OR B
T F T
F T T
T T T
F F F

Pictorially Logical OR could be represented as:

VennDiagram_ProjectEuler_Problem1The Naive Solution used Logical OR to filter the input:

if ((i % 3 == 0) || (i % 5 == 0))
{
    sum = sum + i;
}

What all this means is that:

An input can be a “Multiple of 3” OR a “Multiple of 5” OR both. In all these cases the input will be sieved into the conditional block. This happens ONLY once for any input. Thus, the sum is guaranteed to be the sum of unique numbers.

Now we have to take a look at another Truth Table. This time for Logical AND:

Multiple of 3 (A)
Multiple of 5 (B)
A AND B
T F F
F T F
T T T
F F F

When we break our problem into two Arithmetic Progressions, find their sums and add those sums together to get the final result, a small error creeps in. Imagine that we have two sets of numbers – one containing all the multiples of 3 and another that contains all multiples of 5. So, {3, 6, 9, 12, 15, …} is one set and {5, 10, 15, 20, 25…} another.

If we sum the sets individually, we have to do this:

S3 = 3 + 6 + 9 + 12 + 15 + ...

and

S5 = 5 + 10 + 15 + 20 + 25 + ...

As we can see, there are numbers that are in both the sets.

S3 and S5 are sums of unique numbers, but when we are dealing S3 + S5, certain numbers get added twice. 15, 30, 45 etcetera are examples of such numbers. These numbers form an Arithmetic Progression with 15 as the first number and a common difference of 15.

We may expect the answer of the problem to be

S3 and 5 = S3 + S5

The truth, however, is

S3 and 5 actual > S3 and 5 expected

So, to get rid of the error, we have to do the following:

VennDiagram_ProjectEuler_Problem1_3The above picture shows the filtering we need to do. We do that filtering implicitly when we do the following calculation:

S3 and 5 = S3 + S5 - S15

Now what we expect will be equal to the actual output:

S3 and 5 actual == S3 and 5 expected

Armed with so much knowledge, we are now in a position to attack the problem in a really profitable way. The code with the new algorithm is:

using System;

namespace ConsoleApplication5
{
    class Program
    {
        static double limit = 1000;

        static void Main(string[] args)
        {
            double a = FindSumOfArithmeticProgression(3);
            double b = FindSumOfArithmeticProgression(5);
            double c = FindSumOfArithmeticProgression(15);
            double sum = (a + b - c);
            Console.WriteLine(sum.ToString("N"));
            Console.Read();
        }

        static double FindSumOfArithmeticProgression(double multiplesOf)
        {
            /* We will use the following formula to calculate the sum of the Progression:
             * n(a + l)/2,
             * where a = first element, l = last element and n = total number of elements in the progression and 2 = well, 2
             * To find n, we have to use:
             * a + (n - 1)d = l => n = 1 + (l - a)/d
             */
            double a = multiplesOf, d = multiplesOf, l = 0;

            /*Find l */
            l = FindLastElement(multiplesOf);

            /* Find n */
            double n = 1 + (l - a) / d;

            /* Find l */

            /* Find sum */
            double sum = (double) (n * (a + l) / 2);
            return sum;
        }

        private static double FindLastElement(double multiplesOf)
        {
            /* Find the range in which we can find the last multiple
             * In our case, 1000 - 3 or 1000 - 5 is the answer
             */
            double rangeInWhichToFindTheLastMultiple = limit - multiplesOf;
            double l = 0;

            for (double i = rangeInWhichToFindTheLastMultiple; i < limit; i++)
            {
                if (i % multiplesOf == 0)
                {
                    l = i;
                    break;
                }
            }
            return l;
        }
    }
}

This is fast beyond belief. No input is big enough to make it sweat:

Limit used Time taken
1000 sub-milliseconds range
100000 sub-milliseconds range
10000000 sub-milliseconds range
1000000000 sub-milliseconds range
100000000000 sub-milliseconds range

From 26 minutes taken by the old naive algorithm to the sub-milliseconds taken by our new, shiny algorithm, for an input as large as 100000000000, the improvement is unbelievable.

Now that we have brought the execution time down, we are faced with another issue. Yes, we are not done yet. Far from it. The new problem is the output. It is not precise at all

23,333,333,331,666,700,000.00 while it should have been 23,333,333,331,666,666,668.00

What happened? Where did all the good digits go?

Why is the output not precise?

This is because we have not chosen the correct Data Structure. The data type double, chosen for the purpose of holding various numbers, has its limits. It is a 64-bit data type.

Type Approximate range Precision .NET Framework type
double ±5.0 × 10−324 to ±1.7 × 10308 15-16 digits System.Double

Read the System.Double documentation thoroughly to understand what limitations we are dealing with.

FROM double TO decimal

We have to get a better data type if we wish to solve the problem for very large numbers. Let us look at

Type Approximate Range Precision .NET Framework type
decimal (-7.9 x 1028 to 7.9 x 1028) / (100 to 28) 28-29 significant digits System.Decimal

Decimal loses steam when input is 1015.

For anything lesser than that, decimal will work. Decimal is a 128-bit data type. Read System.Decimal documentation for details.

Let us compare the output for double and decimal for 1014 as input

OUTPUT WHEN USING DOUBLE OUTPUT WHEN USING DECIMAL
2,333,333,333,333,320,000,000,000,000.00 2,333,333,333,333,316,666,666,666,668.00

Decimal can handle a really large input, but what if we have an input that is > 1014 ?

FROM decimal TO BigInteger

This is where we go to the data type that is the biggest of them all – The BigInteger.

BigInteger can hold arbitrarily large numbers with one thing to keep in mind

Because the BigInteger type is immutable (see Mutability and the BigInteger Structure) and because it has no upper or lower bounds, an OutOfMemoryException can be thrown for any operation that causes a BigInteger value to grow too large

Now that BigInteger has arrived to rescue us, there is another issue that crops up:

The biggest integer literal that can be defined in C# is ulong with max value of 18,446,744,073,709,551,615

Type Range Size .NET Framework type
ulong 0 to 18,446,744,073,709,551,615 Unsigned 64-bit integer System.UInt64

So anything larger than 18,446,744,073,709,551,615 will make the compiler writhe in pain. It lets us know by spewing out the following exception:

error CS1021: Integral constant is too large

BigInteger has us covered even in such situations. Turning the very large input into a string and using BigInteger.Parse function can help us overcome the issue.


using System;
using System.Numerics;

namespace ConsoleApplication5
{
    class Program
    {
        static BigInteger limit = BigInteger.Pow(10, 80);

        static void Main(string[] args)
        {
            BigInteger a = FindSumOfArithmeticProgression(3);
            BigInteger b = FindSumOfArithmeticProgression(5);
            BigInteger c = FindSumOfArithmeticProgression(15);
            BigInteger sum = (a + b - c);
            Console.WriteLine(sum.ToString("N"));
            Console.Read();
        }

        static BigInteger FindSumOfArithmeticProgression(BigInteger multiplesOf)
        {
            /* We will use the following formula to calculate the sum of the Progression:
             * n(a + l)/2,
             * where a = first element, l = last element and n = total number of elements in the progression and 2 = well, 2
             * To find n, we have to use:
             * a + (n - 1)d = l => n = 1 + (l - a)/d
             */
            BigInteger a = multiplesOf, d = multiplesOf, l = 0;

            /*Find l */
            l = FindLastElement(multiplesOf);

            /* Find n */
            BigInteger n = 1 + (l - a) / d;

            /* Find l */

            /* Find sum */
            BigInteger sum = (BigInteger) (n * (a + l) / 2);
            return sum;
        }

        private static BigInteger FindLastElement(BigInteger multiplesOf)
        {
            /* Find the range in which we can find the last multiple
             * In our case, 1000 - 3 or 1000 - 5 is the answer
             */
            BigInteger rangeInWhichToFindTheLastMultiple = limit - multiplesOf;
            BigInteger l = 0;

            for (BigInteger i = rangeInWhichToFindTheLastMultiple; i < limit; i++)
            {
                if (i % multiplesOf == 0)
                {
                    l = i;
                    break;
                }
            }
            return l;
        }
    }
}

The above code uses an arbitrarily large number to show how BigInteger when coupled with BigInteger.Parse can solve our issues.

CONCLUDING PART 1:

I have covered a lot in this part:

  • The basic structure of a program; data structures, algorithms etcetera
  • How execution time depends on hardware (there are other factors not covered)
  • How choice of data structure can affect the program and its output
  • How to choose proper data types for desired results

This is a work-in-progress and changes will be made. Next few parts will cover some advanced stuff.
There is a lot more to do so as to make the program beautifully generic. All that will be done in the next few posts. We have only breached the topmost layer of the beautiful world of Software Engineering or more precisely the Art of Software Engineering 🙂

When we had no computers, we had no programming problem either. When we had a few computers, we had a mild programming problem. Confronted with machines a million times as powerful, we are faced with a gigantic programming problem. – Edsger Dijkstra