Sunday, October 11, 2009

Normalization-By-Evaluation, Normalization-By-Focusing

So I've moved to Paris. I finally arrived here last week (after a bit of a summer vacation), to begin a post-doc at Université Paris 7. In addition to my host Paul-André Melliès, there is a ridiculous number of people based here doing exciting work in programming and proof theory, and also quite a few visitors. So I'm excited.

One of my goals with this post-doc is to better connect the general approach to logic and programming that I was trained in at CMU, which could be called a "syntactic" approach, to the more semantic approach prevalent in Europe. "Syntactic" is in scare quotes because it is sometimes used as a derogatory term, slightly gentler than the epithet "symbol-pushing", and I don't think it is really appropriate. Of course I wouldn't think so! But there's a burden of proof in explaining how this thing that looks like symbol-pushing to the outside world is actually just another way (and a useful way) of manipulating semantic content.

So this post is in that vein, and was prompted by hearing Danko Ilik (one of Hugo Herbelin's students) give a talk two days ago about Normalization-By-Evaluation (NBE), and its apparently well-known (but I guess not well-known enough, since I hadn't seen it before) connection to Kripke semantics for intuitionistic logic. The connection is extremely easy to state:

NBE = completeness ○ soundness

That is, constructively speaking, NBE for the lambda calculus (a.k.a. a proof system for intuitionistic logic) corresponds to applying the soundness theorem (if something is provable it is true in every Kripke model) followed by the completeness theorem (if something is true in every Kripke model then it is provable). More precisely, the soundness theorem is applied to bring a lambda term into the universal model of contexts ordered by inclusion, and then the completeness theorem takes it back into the lambda calculus, now in normal form. Darin Morrison recently posted some elegant code to the Agda mailing list, demonstrating this explicitly.

Okay, that's neat, now let's talk about something else: focusing. If you are reading this blog you have probably heard something about it. (If not, you can learn about it here or here or here or here or here or here.) Originally, focusing was conceived as a search strategy for linear logic sequent calculus, achieving efficiency by severely restricting when you could apply the different rules, based on the polarity of connectives (positive ⊗, ⊕, !, ∃, versus negative ⅋, &, ?, ∀). The surprising and important result was that this strategy is complete, i.e.:

THEOREM (Completeness of focusing) [Andreoli]: If the sequent Γ ⊢ Δ is provable in linear logic, then it has a focusing proof.

The converse, soundness, is of course also important, but was obvious in Andreoli's original formulation, where focusing was seen as a search strategy, and a focusing proof was just a special sort of sequent proof.

Nonetheless, we have soundness and completeness theorems. Great! Can we compose them?

Well, let's not go there quite yet...

First, let's move away from linear logic, which may seem exotic, to something more mundane: classical propositional logic. Is there a notion of focusing proofs in classical logic? In fact, there are too many. Whereas the polarities of the connectives of linear logic are all somehow fixed by their inference rules, in classical logic there are many different, equally valid possibilities for inference rules, and as a result the propositional connectives are fundamentally bipolar. The flip side is that the classical connectives can be explicitly polarized, e.g.., we can choose whether to treat conjunction ∧ positively as ⊗, or negatively as &, and similarly with disjunction, negation, and implication. And this act of disambiguation endows the classical connectives with constructive content.

How so?

Well, let's take another look at the soundness and completeness theorems for focusing. Since focusing proofs deal with explicitly polarized propositions, and ordinary (classical) proofs don't, we will relate the two notions via an operation |-| that "forgets" polarity, i.e., collapses ⊗ and & both to plain conjunction, ⅋ and & both to plain disjunction, etc. Then we can state the following:

THEOREM (Classical soundness of focusing): If the sequent Γ ⊢ Δ has a focusing proof, then |Γ| ⊢ |Δ| has an ordinary classical proof.

THEOREM (Classical completeness of focusing): If the sequent |Γ| ⊢ |Δ| has an ordinary classical proof, then Γ ⊢ Δ has focusing proof.

So now can we compose?

Again speaking more constructively, suppose we take as our starting point some version of classical natural deduction, e.g., natural deduction adjoined with reductio ad absurdum, or with Peirce's law, a.k.a. lambda calculus with control operators. And suppose we have a proof (program) establishing (inhabiting) some proposition (type). Now, to apply the completeness theorem to obtain a focusing proof/program, crucially, we must choose some polarization of the connectives. What we obtain by applying the completeness theorem can and will depend on the particular polarization. But then, once we have a focusing derivation, we can again obtain a classical derivation in a deterministic way, by applying the soundness theorem.

And what do we get as a result? Well, we can make two observations:

1. The result is in normal form. I.e., it corresponds to a β-reduced, η-expanded lambda term (or a sequent calculus proof without cuts and with only atomic initial axioms).
2. The result is in continuation-passing style (CPS).

And in fact, the choice we made when applying the completeness theorem—the way we polarized the connectives—is precisely the way we choose among the many different CPS translations, call-by-value vs call-by-name, etc.

[Aside: it is well-known that different CPS transformations correspond to different double-negation translations, proof-theoretically. This may seem at odds with the above statement, since the result of applying completeness followed by soundness is still a proof of the original sequent, not a double-negated sequent. But we can make the connection to double-negation translation more transparent by factoring the soundness theorem by way of minimal logic, i.e, the fragment of intuitionistic logic where negation is defined by ¬A = A ⊃ #, for some distinguished atom #. The soundness theorem is then stated as follows:

THEOREM (Minimal soundness of focusing): If the sequent Γ ⊢ Δ has a focusing proof, then Γ^t, Δ^f ⊢ # has an ordinary minimal proof.

Here, -^t is the operation that interprets positive connectives "directly" (i.e., conjunction as conjunction, disjunction as disjunction, etc.) and interprets negative connectives as the negation of their De Morgan duals, while -^f interprets negative connectives as their duals, and positive connectives as their negations. Now, the result of applying the classical completeness theorem followed by minimal soundness, followed by the inclusion of minimal logic into classical logic, is not necessarily the original sequent, but a (classically) equivalent one. For more details, see Chapters 3 and 4 of my thesis.]

In short, by composing the soundness and completeness theorems for focusing, we obtain a normalization procedure for a proof system/programming language, that moreover fixes an evaluation order based on the polarization. We could call this "Normalization-By-Focusing":

NBF = soundness ○ completeness

So how does this compare to the original equation, NBE = completeness ○ soundness? First, observe that the fact that the positions of soundness and completeness are swapped is purely a historical convention! Since focusing was originally conceived as a search strategy for sequent calculus, it made sense to ask that it be complete. But in any case, it is clear that the focusing completeness theorem is playing a role analogous to the Kripke soundness theorem, and the focusing soundness theorem a role analogous to the Kripke completeness theorem. In other words, in both cases we are going out of a formal system with some arbitrary collection of rules, into a very well-behaved system, and then back into a canonical fragment of the original system.

In other words, I think it is entirely accurate to say that focusing provides a semantics of proofs. And a proof-theoretic semantics at that.

But how does it really relate to other semantic approaches? Again, that's what I'm here to find out. I suspect, though, that there's more than a superficial resemblance between the NBE and NBF equations, and there might be a deeper connection between focusing and the technique of forcing in proof theory.


Rob said...

Great post! Frank has made the point, which I've agreed with, that focusing proofs are the "real proofs" - obviously we've gone pretty far down this road with the negative intuitionistic fragment where we generally think of the canonical forms being the only ones that we care about.

However, I also like the syntax vs. semantics bent you're taking here - it's not like those are the "real proofs," its that the focusing proofs embed the meaning and the unfocused proofs are only interesting insofar as computation is interesting to you at the moment.

This makes me ask how what you've said here is related to Miller's work on CutEliminate-Then-Focus versus Focus-Then-CutEliminate notions of normalization. Thoughts?

Dan said...

Two observations about two observations:

"1. The result is in normal form. I.e., it corresponds to a β-reduced, η-expanded lambda term (or a sequent calculus proof without cuts and with only atomic initial axioms)."

Doesn't this depend on "interpreting into" a cut-free focused sequent
calculus? I.e., could you tell the whole story for a cut-full focused
calculus, in which case you wouldn't read back a normal form?

"2. The result is in continuation-passing style (CPS)."

Doesn't NBE also fix an evaluation order? I think it's like the
definitional interpreter story, where if you write the code in the
obvious way, the eval. order you get depends on the eval. order of the
meta-language. E.g. in Shinwell et. al ICFP '03, when they're using NBE
as an example in FreshML, they through a (unit -> ) into the
interpretation of the function type, since they're working in a CBV

Noam said...

Rob, re the Miller-Saurin proof of focusing completeness: that's a good question. It seems they actually ask it in their paper: "Given a proof Π in MALL with cuts, two processes are available: focalization
and cut-reduction. Do the two processes commutes?" They don't give an answer, though. I think it's significant that they are starting with linear logic, where there is already a good notion of proof identity. With unpolarized classical logic it's not clear to me that you can really ask this question, because cut-elimination equates all proofs (in the sense of "Lafont's critical pair", i.e., you can construct a single derivation D that will reduce to either D1 or D2, for arbitrary proofs D1 and D2 of a sequent -- though maybe this is the wrong way to think of proof identity). I've never understood the Miller-Saurin approach from a "pragmatic" perspective of getting the focusing completeness theorem done with and out of the way (I think it's easier to do that by straightforward induction over derivations, and then it's a lot easier to prove cut-elimination for focusing proofs than for unfocused), but there's definitely something noble about their effort to get a more atomic explanation of focalization.

Dan, re 1: this is related to Rob's question. I think it's important that the cut-elimination argument is "self-evident" (modulo termination of the subformula ordering), so in some sense it doesn't matter whether you interpret the cut rule explicitly or as an admissible inference. But on the other hand, it does matter. Much like NBE, it seems that NBF does "too much too soon" -- it doesn't model realistic operational semantics. On the third hand, as you suggest re 2, John Reynolds and Olivier Danvy have given us techniques for extracting realistic operational semantics from definitional interpreters: CPS translation + closure conversion + defunctionalization. By polarizing classical logic, we already get the CPS translation out of the way. I'd like to understand what the other steps correspond to logically, if they do.

Neel Krishnaswami said...

Rob: they're only the real proofs if you equip the shift operators with exactly the right amount of computational effects. E.g., if your language is really pure and the shifts are just delay operators to control evaluation order then focused proofs will distinguish equivalent proofs.

Also, you can use other notions of semantics to get other NBE systems. E.g., Balat et al.'s POPL 04 paper actually used a form of NBE based on Beth models, not Kripke models, when they gave their coproduct equality algorithm. The covering relation in Beth models gives rise to a Grothendieck topology, which is why they needed Fiore and Simpson's Grothendieck logical relations to prove their algorithm correct.

I think this will ultimately be a very important ingredient in developing a full understanding of the positive connectives. (I've been thinking a little about how to use this to prove things about that weird "nuclear substitution" stuff I've occasionally muttered about.)

Rob said...

Neel: That's if you're thinking of computation as cut elimination. I'm also inclined to think of it as proof search.

That is, in CLF, if there are two not-equal focusing proofs, they're both "pure" but are (by adequacy) representations of different objects/computations.

Noam said...

Neel: I agree there's something about the interaction of the positive connectives with computationally pure implication that hasn't yet been explained in focusing land. But following this train of thought, maybe it's possible to reconstruct a better focusing syntax by looking at Beth models?

Neel Krishnaswami said...

Rob, you're right. The CLF monad is commutative, though, so maybe there's a further refinement to the type theory lurking there.

Noam -- that's a very interesting thought. I recall that Altenkirch and co. in their '01 paper used something a little reminiscent of the higher-order syntax like you came up with for focusing. I had completely forgotten that, because I still haven't fully internalized the idea of computationally higher-order representations of syntax.