public abstract class Unknown<V> extends java.lang.Object implements Lazy<V>, java.util.concurrent.Callable<V>
Base class for lazy values.
Lazy values differ from proper values insofar as they need to know how to exactly evaluate themselves. This is the job of the _e() and _v() functions.
The functionality of _e()
is fixed through final
and cannot be changed.
It calls _v()
in a loop until the value is fully evaluated and then
caches it in a private instance variable. Hence, instances of
Unknown are not immutable and this in turn requires synchronization.
The caching of the evaluated value would not be strictly necessary, as repeated evaluation must always yield the same result. Experience shows, however, that re-evaluations are quite common, hence not caching the result would most likely cause even the simplest program to perform unacceptably bad.
The real work, then, is done by _v(). _v() may return a proper value or another lazy value, though it is required that the latter is somehow closer to the final result, so that evaluation of that value gets even closer and repeated evaluation of the indermediate lazy values eventually reaches the result.
Lazy values are introduced at 3 occasions:
Code for self referential values. Take the following example:
// the list of fibonacci numbers fibs = 0:1:zipWith (+) fibs (tail fibs)
One cannot just compile this thus:
static final TListfibs = DCons. inst.run(box(0), DCons. inst().run(box(1), Prelude. zipWith().run(Prelude.INum_Int._plus(), fibs, Prelude. tail().run(fibs))));
The problem is that the java variable fibs is not yet assigned
at the time it is passed to zipWith.
The solution is:
static final Lazy> fibs = new Unknown >() { final Lazy > _v() { return DCons. inst().run(box(0), DCons. inst().run(box(1), Prelude. zipWith().run(Prelude.INum_Int._plus(), fibs, Prelude. tail().run(fibs)))); } }
In this way, evaluation of fibs is avoided unless it is really needed and the reference is assigned a non-null value.
Here is a short explanation to clarify what a non-strict argument is.
Many functions have the property that they do not always need all their arguments to compute their return value. A classic example that should be familiar to Java programmers is the function performed by the && operator: if the left operand is false, there is no need to evaluate the right operand because the result is false no matter what the boolean value of the right operand is.
It is important to note that the guarantee to evaluate certain
operators such as &&, || or the ternary ?:
operator lazily
takes them semantically apart from all other operators and
methods in the otherwise strict evaluation model of java.
It is, for instance, not possible to wrap these short-circuit operators
in a java method. Consider the expression:
a == null ? "null" : a.foo()
Here, the method foo() can only be called when a is not null, but whether it is called or not is decided in the ?: function itself, not, as usual, before the evaluation of the function starts. Therefore, the following method has a completely different semantic than the operator:
static<T> T eitherOr(boolean cond, T t, T f) { if (cond) return t; return f; }
If one would replace all occurences of the ?: operator in a java program
with calls to the eitherOr method, the resulting program would behave
differently in those cases where the evaluation of the second
or the third argument to eitherOr would not deliver a result.
For example, in
eitherOr(a == null, "null", a.foo())
a NullPointerException would be thrown in the case where a was indeed null, where in the original program the result was the string "null".
In the context of the frege language, we say that evaluation of an expression diverges if it fails to deliver a value of the expected type. This subsumes, for example, endless recursion or looping, run time exceptions like division by zero and so on. Divergence of an evaluation is semantically equivalent to non-termination: the instance that caused the diverged evaluation will never receive an answer.
Because we cannot foresee if a particular expression actually will diverge or not when evaluated, we assume that (evaluation of) any expression can diverge. To formalize this, we imagine every type is lifted so that it contains an extra undefined value (sometimes called bottom). Thus, an expression of type Int can, at runtime, evaluate to an integer number or to the undefined value, one of type Bool to true, false or undefined, and so on. (In the frege runtime, when evaluation detects the undefined value, some excpetion will be thrown.)
Lazy evaluation means that expressions are evaluated at the latest possible
moment. This is what java does with its short-circuit operators.
For instance, whether the second or third operand of the ?: will be evaluated
depends on the boolean value of the first operand.
A lazy language is simply one where all functions and operators
are "short-circuit" and any
expression will be evaluated only when absolutely necessary, for instance
when it must be printed on the users screen.
Of course, lazy evaluation does not come for free. Instead of values one must pass around representations of expressions that evaluate to values unless they diverge. Let us call such descriptions of how to compute a value lazy values.
Construction of lazy values costs processing time and memory. The time needed in construction of the lazy value is not lost, however. If the value is actually not evaluated, the time that would have been spent evaluating it needlessly is, so to speak, payed back.
There are expression, though, where we know that they will be evaluated. For example, in the frege version of the eitherOr from above:
eitherOr true x y = x eitherOr false x y = y
we know for instance that if eitherOr a b c is evaluated, then a absolutely must be evaluated also. This opens opportunities for optimizations that remove unneeded lazyness without changing the behaviour of the program.
We say that a function is strict in (one of) its arguments if it is unavoidable that the function evaluates it. This assumes that the function itself is called only when their value is needed.
Now, if evaluation of an expression diverges (never returns) and if that expression is passed as argument to a function that is strict in that argument, evaluation of that function will also diverge.
Hence, it does not make a difference from a semantic point of view, whether we pass a strict argument in evaluated or lazy form: If the evaluation of the argument diverges, the evaluation of the function call will also diverge. If the argument evaluates to a proper value, the function call may or may not diverge, but if it does, it will not be the fault of the argument.
Conversely, it will make a big difference whether we evaluate a non strict argument before we actually call the function. This is precisely the semantic difference between the java method eitherOr and the java ternary operator ?:
It should be clear by now, that the first argument of the ?: java operator is strict, while the others are non-strict.
The frege compiler does a so called strictness analysis that helps it find most function arguments (and, in general) pattern bound variables that are strict.
In our running example the compiler would correctly infer that the first argument to eitherOr is strict, and the other two are non strict. When code for a call to eitherOr must be generated, it will make sure that expressions in the place of the second or third argument are passed lazily, while the expression given for the first argument will have been evaluated already to avoid the costs of creating a lazy value.
Non-recursive tail calls.
If the return value of a function is expressed as a function call, we call this a tail call. If a function tail-calls itself, this is tail recursion. Here are examples:
even 0 = true even n = odd (n-1) odd 0 = false odd n = even (n-1) factorial n | n >= 0 = helper 1 n where helper r 0 = r helper r n = helper (r*n) (n-1)
Function "helper" is tail recursive and will be compiled to a while loop. Functions "factorial", "even" and "odd", however, employ tail calls. In "factorial", this is not a problem, it just sets up parameters for the very efficient "helper" function, thus we can expect a maximum stack depth of 2 no matter what the value of "n" is.
Things are different with "even" and "odd" that would need stack space
proportional to the value of their argument if implemented naively.
But frege functions may simply return their tail call in the form of
an Unknown object instead of actually evaluating it themselves. This way, a long
series of tail calls can be converted to a while loop, see _e()
.
Unfortunately, this does not work in every case. Take the even/odd exmple in a slightly changed form:
even 0 = true even 1 = false even n = odd (n-1) odd n = !(even n)This will very soon run out of stack space with greater n. To see why, imagine even 5 had to be evaluated.
even 5 = odd 4 by 3rd rule of even odd 4 = !(even 4) by definition of odd !(even 4) = !(odd 3) by 3rd rule of even !(odd 3) = !(!(even 3)) !(!(even 3)) = !(!(odd 2)) !(!(odd 2)) = !(!(!(even 2))) !(!(!(even 2))) = !(!(!(odd 1))) !(!(!(odd 1))) = !(!(!(!(even 0)))) !(!(!(!(even 0)))) = !(!(!(!true)))
Because ! must first know the value to negate it must evaluate its argument before it returns. Hence, evaluation of (odd n) results in n nested negations, i.e. in a stack depth of n.
This effect happens always when the recursion is done indirectly in an argument of a strict function. Another well-known case is the naive implementation of a function that tells a lists's length:
naiveLength [] = 0 naiveLength (_:as) = 1 + naiveLength as
Here, the recursion takes place in the 2nd argument of the addition. To add 1 to something, one must first know to what, hence the whole list must be consulted before even the first addition can take place.
Compare the better implementation:
length xs = helper xs 0 where helper [] !count = count helper (_:as) !count = helper as (count+1)Here, only tail call recursion takes place and because counter is strict, the increment will be evaluated in each loop.
Constructor and Description |
---|
Unknown() |
Modifier and Type | Method and Description |
---|---|
V |
_e()
Evaluates a lazy value.
|
boolean |
_u()
Tells if this is a yet unevaluated value.
|
abstract Lazy<V> |
_v()
Do the real work in evaluating the result.
|
<T extends Lazy<T>> |
as() |
V |
call()
Implemenation of the
Callable interface |
Lazy<V> |
eval()
Backwards compatibility
|
boolean |
unknown()
Checks if this value is unevaluated.
|
protected volatile V result
public final V _e()
Evaluates a lazy value.
This is done by calling this._v() and, while the return value is still unevaluated, the _v() of the returned value and so on until the result is evaluated.
The evaluated value will be a subclass of FV
, such as
a boxed primitive value, a Box
java reference value or
even a function type.
e_() must be synchronized because parallel threads could try to _e() us at the same time.
Deadlocks could occur only when one manages to construct two different Unknown objects a and b, where during evaluation of a the evaluated b is needed and during evaluation of b the the evaluated a is needed and one thread tries to evaluate a, thereby locking a first and another thread tries to evaluate b, thereby locking b first and then the first thread waits for b while the second one waits for a.
This could happen, for instance, when the value of "blackhole" is computed:
a = b+1 b = a-1 blackhole = case a `par` b `par` (a,b) of (x,y) -> x+y
which should compile to something like this:
static Lazy<Box.Int> a = new Unknown<Box.Int> () { Lazy<Boxed.Int> _v() { return Box.Int.mk(b._e().j + 1); } } static Lazy<Box.Int> b = new Unknown<Box.Int> () { Lazy<Boxed.Int> _v() { return Box.Int.mk(a._e().j - 1); } } ...
This, of course, is a programming error and, ironically, no answer is in some sense the correct answer to the question what number a or b is exactly.
In a non parallel setting (i.e., when the `par` operator degenerates to the ordinary tuple constructor) the above equations would simply result in a stack overflow. From the stack trace one could easily deduce that there was a blackhole, for it would look like
called from Unknown$1._e called from Unknown$2._v called from Unknown$2._e called from Unknown$1._v called from Unknown$1._e ...
public abstract Lazy<V> _v()
Do the real work in evaluating the result.
From the type one can see that eval() is not required to return the final evaluated value. It can also return another unevaluated value, though the understanding is that the returned value must be somehow closer to the final result, so that, by calling that values eval() one gets even closer to the result and so forth until it is finally reached.
This is the method subclasses of Unknown must implement.
public final boolean unknown()
Checks if this value is unevaluated.
If this is not the case, the value is cached and can be
delivered via _e()
immediately.
If evaluation is in progress in another thread, this method will block and return false when evaluation completes.
public final boolean _u()
Lazy
Tells if this is a yet unevaluated value.
_u
in interface Lazy<V>
Unknown
and false in all other cases.
This function helps to avoid instanceof and explicit type conversions.
It is used by _e()
to decide if another invokation
of _v()
is needed.
Non-lazy values just return false.
public final V call()
Implemenation of the Callable
interface