Project Euler / programming problems

benjito · March 17, 2022, 9:57pm

Yeah that is a good point. And this problem comes up fairly quickly - even with 13, we are well out of 32-bit land.

antonTobi · March 17, 2022, 10:05pm

If we precomputed logarithms of the 9 digits we could switch to summing the logarithms, I guess that would start being faster if the input is long enough?

benjito · March 17, 2022, 10:07pm

Not sure if precision is a problem with this one, but yeah that does sound fast

At the very least, you’d get a close answer

antonTobi · March 17, 2022, 10:59pm

Problem 8, Python implementation of the log idea

Basic idea: step digit by digit and keep a running total of the sum of logs of last k digits.
(when a zero is encountered we reset the sum)

N = "73167176531330624919225119674426574742355349194934\
96983520312774506326239578318016984801869478851843\
85861560789112949495459501737958331952853208805511\
12540698747158523863050715693290963295227443043557\
66896648950445244523161731856403098711121722383113\
62229893423380308135336276614282806444486645238749\
30358907296290491560440772390713810515859307960866\
70172427121883998797908792274921901699720888093776\
65727333001053367881220235421809751254540594752243\
52584907711670556013604839586446706324415722155397\
53697817977846174064955149290862569321978468622482\
83972241375657056057490261407972968652414535100474\
82166370484403199890008895243450658541227588666881\
16427171479924442928230863465674813919123162824586\
17866458359124566529476545682848912883142607690042\
24219022671055626321111109370544217506941658960408\
07198403850962455444362981230987879927244284909188\
84580156166097919133875499200524063689912560717606\
05886116467109405077541002256983155200055935729725\
71636269561882670428252483600823257530420752963450";
k = 13

from math import log, e
logs = {str(d): log(d) for d in range(1, 10)}

length = 0
logSum = 0
maxLogSum = 0

for i in range(len(N)):
    if (N[i] == '0'):
        length = 0
        logSum = 0
    else:
        logSum += logs[N[i]]
        if (length == k):
            logSum -= logs[N[i-k]]
        else:
            length += 1
        if (length == k):
            maxLogSum = max(logSum, maxLogSum)

print(round(pow(e, maxLogSum)))

(this does get the correct answer, but I guess for some larger k the accuracy will start to become an issue?)

yebellz · March 18, 2022, 12:06am

Running @antonTobi’s log-sum-based script 500 times

real    0m10.819s
user    0m9.449s
sys     0m1.358s

Running a similar multiplication-based script 500 times

Code for problem 8

# modified from antonTobi's solution

N = "73167176531330624919225119674426574742355349194934\
96983520312774506326239578318016984801869478851843\
85861560789112949495459501737958331952853208805511\
12540698747158523863050715693290963295227443043557\
66896648950445244523161731856403098711121722383113\
62229893423380308135336276614282806444486645238749\
30358907296290491560440772390713810515859307960866\
70172427121883998797908792274921901699720888093776\
65727333001053367881220235421809751254540594752243\
52584907711670556013604839586446706324415722155397\
53697817977846174064955149290862569321978468622482\
83972241375657056057490261407972968652414535100474\
82166370484403199890008895243450658541227588666881\
16427171479924442928230863465674813919123162824586\
17866458359124566529476545682848912883142607690042\
24219022671055626321111109370544217506941658960408\
07198403850962455444362981230987879927244284909188\
84580156166097919133875499200524063689912560717606\
05886116467109405077541002256983155200055935729725\
71636269561882670428252483600823257530420752963450";
k = 13

length = 0
prod = 1
maxProd = 0

for i in range(len(N)):
    if (N[i] == '0'):
        length = 0
        prod = 1
    else:
        prod *= int(N[i])
        if (length == k):
            prod //= int(N[i-k])
        else:
            length += 1
        if (length == k):
            maxProd = max(prod, maxProd)

print(maxProd)

real    0m10.603s
user    0m9.159s
sys     0m1.436s

The numbers that we are handling here all fit within a long (integer) data type (no need for large int computation). I think the underlying machine will perform all of the basic integer arithmetic operations at the same rate per second, regardless of whether it is additions/subtractions vs multiplications/divisions.

Hence, there is minimal difference in the above comparison, besides a minor savings for avoiding the slight overheads of the log-based method.

antonTobi · March 18, 2022, 12:20am

Ah, I guess that means there still won’t be any gains even if N is very large.

How about if k is large enough that the product needs more than 64 bits? At a certain point we will need more precision in the logs, but it feels like if we’re dealing with something like N = 10^9 and k = 10^3 it would be worthwhile to compute the log to sufficient precision in the beginning, since addition should be faster than multiplication for numbers with lots of digits?

antonTobi · March 18, 2022, 12:27am

By the way, isn’t long = 32 bits, which is smaller than 9^13? We do stay comfortably within 64 bits though.

(k = 21 is where we first need more than 64 bits to represent 9^k)

benjito · March 18, 2022, 12:35am

In Java, long is 64 Primitive Data Types (The Java™ Tutorials > Learning the Java Language > Language Basics)

C++ is at least 32

antonTobi · March 18, 2022, 12:36am

Ah, I didn’t know of the top of my head, so I googled and stupidly assumed that there would be a consistent convention across languages

benjito · March 18, 2022, 12:36am

it’s not even consistent across various C++ implementations

yebellz · March 18, 2022, 12:42am

It seems plausible that there could be a point where it becomes advantageous. Maybe if we were also dealing with more than just single digits, but larger numbers as well. That could be an exercise in itself, design a puzzle where the log solution makes sense. However, one does have to be very careful about maintaining precision with the logarithm. With the automatic big integer handling of Python 3, you can seamlessly handle very large integers without worrying about precision loss. However, I’m not too sure how easy or computationally efficient it would be to manage higher precision floating point numbers in Python. I guess there must be libraries, but I’m not familiar with those. But, fundamentally, if you need more than 64-bits of integer range, you’d also need at least more than 64-bits of floating precision when working in the log-domain.

Another puzzle (#99 Largest Exponential - Project Euler) is actually a much better example of something better handled in the log-domain, and it’s quite reasonable since exact precision is not necessary (because we are only looking for the largest).

Sorry, I might be using the wrong terminology. I just mean whatever the underlying 64-bit int type is called, maybe “long long”?

antonTobi · March 18, 2022, 12:55am

But this is also true for this problem: the important part is comparing the sizes of the products. I did use the maxLogSum to recover the maxProduct in the end since there was more than enough precision, but one could also save the index and calculate the product normally in the end (at no significant cost if k is much smaller than N).

So the question is at what point a non-optimal product could have the largest logSum due to a precision error. Or inversely, to how much precision do we need to calculate the logs of the digits to guarantee this not happening for a given k?

yebellz · March 18, 2022, 1:03am

Sounds like a Project Euler question in itself…

So, I think this problem would depend heavily on the specific computing environment, including exactly how log and exp are implemented, and the corresponding floating precision. Let’s pose this question even more concretely:

Project Anton #1

The below Python 3 code demonstrates a log-sum-exp approximation for multiplication, which is precise for reasonably-sized integer values a and b. That is, the final expression often evaluates as True for “not too large” integers a and b.

import math
a, b = 6, 7
round(math.exp(math.log(a) + math.log(b))) == a * b

For large enough positive integer values of a and b, the above approximation becomes inaccurate. What is the smallest product a * b for which the last line of the above code evaluates as False?

Note: this question assumes Python 3.X and depends on the exact behavior of the math library. If the specific version makes a difference, please specify in your answer.

shinuito · March 18, 2022, 2:25am

benjito · March 18, 2022, 2:58am

Well I’m not sure how to narrow it down further, but double precision floats usually have 53 precision bits. So I think stuff starts failing somewhere in the neighborhood of 2^53…

Probably a bit earlier since all the 9s are squished together haha (log(999) and log(998) are closer together than log(100) and log(101))

antonTobi · March 18, 2022, 3:01am

As a theoretically inclined person who would rather think about beautiful Turing machines than the ugly real world with it’s annoying hardware and differing implementations, I’m almost offended having my name put on this question

benjito · March 18, 2022, 3:02am

If you want a theoretical framing, maybe use IEEE 754 and assume the log function works to all 53 bits of precision

Which… now that I’m thinking about it maybe this problem gets easier if we think about log base-2?

Chinitsu · March 19, 2022, 7:58am

Some one post the next problem pls

martin3141 · March 19, 2022, 10:41am

I believe the next problem can be solved by hand, using a bit of number theory. But I can’t guarantee it.

antonTobi · March 19, 2022, 11:05am