Modulating the Permanent

Computing the permanent modulo an integer is an interesting topic in counting complexity 🙂

Gödel's Lost Letter and P=NP


Thomas Muir coined the term “permanent” as a noun in his treatise on determinants in 1882. He took it from Augustin Cauchy’s distinction in 1815 between symmetric functions that alternate when rows of a matrix are interchanged versus ones that “stay permanent.” To emphasize that all terms of the permanent have positive sign, he modified the contemporary notation $|A|$ for the determinant of a matrix $A$ into

$$\overset{+}{|}\, A \,\overset{+}{|}$$

for the permanent. Perhaps we should be glad that this notation did not become permanent.

Today Ken and I wish to highlight some interesting results on computing the permanent modulo some integer value.

Recall that the permanent of a square matrix $A$ is the function defined as follows by summing over all permutations of $\{1,\dots,n\}$, that is, over all members of the symmetric group $S_n$:

$$\mathrm{perm}(A)=\sum_{\sigma\in S_n}\prod_{i=1}^{n} a_{i,\sigma(i)}.$$
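As a concrete anchor, here is a minimal brute-force sketch of this definition in Python (my own illustration, not from the original post). It sums over all of $S_n$, so it is only feasible for tiny matrices; an optional modulus reduces the running sum, which is all that computing the permanent modulo an integer asks for. One fact worth recalling: modulo 2 the permanent equals the determinant, since the signs of the permutations vanish.

```python
from itertools import permutations

def permanent(a, mod=None):
    """Permanent of a square matrix by direct summation over S_n.

    O(n! * n) time: fine for illustration, hopeless beyond small n.
    If mod is given, the running sum is reduced modulo mod.
    """
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= a[i][sigma[i]]
        total += term
        if mod is not None:
            total %= mod
    return total

# The permanent of the all-ones 3x3 matrix is 3! = 6; modulo 2 it vanishes.
ones = [[1] * 3 for _ in range(3)]
assert permanent(ones) == 6
assert permanent(ones, mod=2) == 0
```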


Counting Edge Colorings Is Hard

Holants, Galois theorem, complexity dichotomy… Something extremely nice, and something we must learn.

Gödel's Lost Letter and P=NP


Jin-Yi Cai is one of the world’s experts on hardness of counting problems, especially those related to methods based on complex—pun intended—gadgets. He and his students have built a great theory of dichotomy, which we covered two years ago. This means giving conditions under which a counting problem in $\mathsf{\#P}$ must either be in $\mathsf{P}$ or be $\mathsf{\#P}$-complete, with no possibility in-between. The theory is built around a combinatorial algebraic quantity called the holant, which arose in Leslie Valiant’s theory of holographic algorithms.

Today Ken and I wish to discuss a recent paper on the hardness of counting the number of edge colorings, even for planar graphs.

This work is due to our dear friend Jin-Yi—who has been a colleague of each of us—with Heng Guo and Tyson Williams. It is entitled The Complexity of Counting Edge Colorings and a Dichotomy for Some Higher…


What to count? Prelude and fugue

Prelude

You do not have to understand it in detail to understand the main part of this post. Just enjoy the music of it 🙂

Meidanis and Feijao introduced the SCoJ model, the Single Cut or Join model in genome rearrangement. A model where the small parsimony problem is in P: wow, nice, isn’t it? Eric Tannier drew my attention to this paper and suggested we should work on approximate sampling of most parsimonious SCoJ scenarios on a rooted binary tree. It looked pretty similar to our DCJ sampling and counting work. In the case of DCJs, we had an auxiliary graph in which vertices represent AA and BB paths (W-shaped and M-shaped paths) and edges represent the joint sorting of these components. Each 1-factor naturally represents a subset of DCJ scenarios, namely which components are sorted together and which ones independently, and we defined a rapidly mixing Markov chain on the 1-factors converging to the distribution proportional to the number of corresponding DCJ scenarios. In the case of SCoJs, we can define a similar auxiliary graph in which vertices represent extremities and edges represent possible adjacencies in ancestral genomes. For a given 1-factor, it is easy to count the number of SCoJ scenarios, so we can define a Markov chain; the task is to prove its rapid mixing.

After a while we started scratching our heads, and later on it turned out that it is very unlikely that any such rapidly mixing Markov chain exists, since it would imply RP = NP, which we do not believe. It is a good thing I moved to theoretical computer science: the negative result is still a result in mathematics, even if it is not as groundbreaking as Galois's theorem…

But I hoped that at least we could have some positive results on medians. The SCoJ median of an odd number of genomes is actually unique, as Meidanis and Feijao already showed. It is not unique for an even number of genomes, but hopefully the solution space is not as complicated as for arbitrary binary trees. At least so I hoped, and I gave it as a summer project to Heather Smith, a PhD student at the University of South Carolina. Poor girl… But at least she had the opportunity to show how clever she was: she, too, started scratching her head when she could not adapt our Lemma 22 (in this paper), and indeed, she gave a counterexample showing that the analogous lemma is simply not true. And after some careful inference, it turned out that the problems start even if we drop the SCoJ model and consider only independently evolving characters, namely classical phylogeny.

So this is the Prelude, and now I will try to write down the rest in an understandable way…

Fugue

The fugue will be understandable, but something very peculiar, in its most complicated, complex, baroque style. If I wanted to introduce it in a rawer, more profane way, I would say: “what the bloody hell is going on???”. Or if you would like to learn some Hungarian cursing: “mi a halál redvás fasza folyik itt???”. Definitely the strangest thing I have ever seen in mathematical phylogeny.

Imagine a star tree whose leaves are labelled with sequences over a finite alphabet. For the sake of simplicity, we will work with the easy {0,1} alphabet until we say otherwise. All strings have the same length, and we are looking for their median, namely a string that minimizes the sum of Hamming distances to these strings. Trivial problem, isn’t it? Just take the majority of the characters in each position of the median string. We would like to label the center of the star with it, and consider how this median is transformed into the sequences at the leaves.
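As a minimal sketch (the function name and interface are mine), here is the majority rule in Python; it returns one median together with the tied positions, where both characters are optimal:

```python
def hamming_median(strings):
    """Position-wise majority vote yields a Hamming median of binary strings.

    Returns one median and the list of tied positions; flipping any subset
    of the tied positions gives another median.
    """
    n = len(strings)
    median, ties = [], []
    for column in zip(*strings):
        ones = column.count('1')
        if 2 * ones == n:                 # tied column: both characters optimal
            ties.append(len(median))
        median.append('1' if 2 * ones > n else '0')
    return ''.join(median), ties

m, ties = hamming_median(['0001', '0111', '0011', '0101'])
# m == '0001' (ties broken toward '0'), ties == [1, 2]
```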

If the number of strings is even, and the alphabet size is 2, as we fixed it, then there might be multiple optimal characters in some positions. No problem: the choices can be combined in an arbitrary way, so it is still easy to count how many medians we have. However, assume that we would like to count more: we also would like to count how many ways this median can be transformed into the sequences given at the leaves of the star tree. And here comes something that does not seem like a big deal but makes a real difference.

We can look at the star tree as an unresolved rooted tree, and count how many ways the mutations might occur in time. See this example:

[Figure: a rooted tree with six leaves; the six mutations are ordered in time]

We will call such a solution a most parsimonious scenario. There are 6 leaves, and six mutations are necessary to transform the median sequence at the root into the sequences at the leaves. These mutations might come in any order, so there are 6! possible most parsimonious scenarios. This is a general phenomenon: whatever the median is, the number of mutations is always the same, so the number of most parsimonious scenarios is the same for each median.

One can also imagine the usual star tree, and then count how many ways the mutations might occur on each edge of the star tree; see this:

[Figure: a star tree with the same leaf sequences; the six mutations fall on four edges]

The sequences at the leaves are the same as in the previous case, only the median is different. There are still 6 mutations, but now they fall onto 4 edges in a 2, 2, 1, 1 split, and hence the number of possible most parsimonious scenarios for this median is 2! 2! 1! 1! = 4.

Got it? So in the latter case, we distinguish the order in which the mutations occur on each edge, but we do not compare the times of mutations on different edges.

Also note that different medians have different numbers of most parsimonious scenarios. Consider the previous median, 1001: it would have only 2 most parsimonious scenarios, since its 6 mutations would fall onto 5 edges in a 2, 1, 1, 1, 1 split.

And this makes the difference when we would like to count the number of most parsimonious scenarios summed over all possible medians: in the first case it is simply 2^k · (kn/2 + c)!, where k is the number of positions with ambiguous ancestral characters, n is the number of leaves and c is the number of other mutations. In the second case… I really, really have no idea… Different medians have different numbers of most parsimonious scenarios, and it is unclear whether this sum can be computed quickly, or even approximately.
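To make the two counting models concrete, here is a brute-force sketch (my own encoding, exponential in the number of tied positions): for every median it returns the time-ordered count, (total number of mutations)!, which indeed comes out the same for all medians, and the edge-ordered count, the product of the per-edge factorials, which does not.

```python
from itertools import product
from math import factorial, prod

def scenario_counts(strings):
    """Map each median to (rooted_count, star_count):
    rooted_count = (total #mutations)!        -- mutations totally ordered in time
    star_count   = prod of (#mutations on e)! -- mutations ordered per edge only
    """
    n = len(strings)
    majority, tied = [], []
    for j, column in enumerate(zip(*strings)):
        ones = column.count('1')
        if 2 * ones == n:
            tied.append(j)
        majority.append('1' if 2 * ones > n else '0')
    counts = {}
    for choice in product('01', repeat=len(tied)):
        median = majority[:]
        for j, ch in zip(tied, choice):
            median[j] = ch
        median = ''.join(median)
        per_edge = [sum(a != b for a, b in zip(median, s)) for s in strings]
        counts[median] = (factorial(sum(per_edge)),
                          prod(factorial(m) for m in per_edge))
    return counts
```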

There are the following open questions:

  • What is the median that maximizes the number of most parsimonious scenarios?
  • Is it possible to calculate the number of most parsimonious scenarios in polynomial time, or is it a #P-complete counting problem?
  • Is it possible to count/sample the most parsimonious scenarios approximately in a stochastic manner? Namely, is there any FPRAS and/or FPAUS algorithm for them? Or is it a problem that cannot be approximated under some reasonable assumption (say, assuming that RP is not NP)?

What we do know is the mixing behavior of the simple Markov chain that walks on the possible medians by changing one position of the ancestral sequence at a time, and that converges to the distribution proportional to the number of most parsimonious scenarios belonging to the current median. It is torpidly mixing even when the number of leaves is a constant 4. Ouch… But this does not prove that there is no FPAUS for this problem…
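Here is one plausible concrete reading of that chain in Python (names and details are mine; it only shows the move and the acceptance rule, and of course says nothing about its torpid mixing). Only tied positions are flipped, since the other positions are forced in every median.

```python
import random
from math import factorial, prod

def star_count(median, strings):
    """Edge-ordered scenario count for a fixed median: prod of (mutations per edge)!."""
    return prod(factorial(sum(a != b for a, b in zip(median, s)))
                for s in strings)

def chain_step(median, strings, tied):
    """Flip one tied position and accept with the Metropolis ratio, so the
    stationary distribution is proportional to the scenario count of the median."""
    j = random.choice(tied)
    flipped = median[:j] + ('1' if median[j] == '0' else '0') + median[j + 1:]
    ratio = star_count(flipped, strings) / star_count(median, strings)
    return flipped if random.random() < min(1.0, ratio) else median
```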

And please note that if we go back from the second model to the first one, the problem is almost trivial: we have a closed form for the number of most parsimonious scenarios that is easy to calculate.

When we wrote our SCoJ paper, I gave a seminar on it with the title “How can an easy counting problem be transformed into a hopelessly hard one? Well, very easily…”. Well, indeed, and something similar happens here again: the difference between the two problems is an innocent-looking multinomial factor which stands for the number of ways we can merge the mutation events on the edges of the tree when we consider their order on a common time scale.

Have you ever seen anything like this?

The Braga-Stoye Markov chain and the Big Project

This is a pretty technical post seasoned with some personal memories. I hope I have added enough references that an eager reader without a background in genome rearrangement can understand this post, even if it takes quite a lot of effort.

The Braga-Stoye Markov chain

Sophia Yancopoulos is a very interesting person. She visited RECOMB-CG ’09 in Budapest literally for one coffee break (coming from the U.S.…). She arrived at the beginning of the break, pulling her cabin luggage and laughing, exchanged a few words with a few of us, and took a taxi to catch the flight back to the U.S. It was a very busy period of her life, but she couldn’t resist the temptation to attend this annual workshop, she explained.

Her double cut and join (DCJ) model revolutionized research on genome rearrangement. Although the DCJ model is not quite realistic from a biological point of view, it is similar to the more realistic reversal (or Hannenhalli-Pevzner) model, and computationally much simpler. And at least we know something about its counting complexity: in fact, we do know that counting the most parsimonious DCJ scenarios between two genomes is in FPRAS and FPAUS.

Having a trilateral grant with Eric Tannier and Haris Gavranovic, we had a meeting in Sarajevo in December 2009 to work on genome rearrangement problems. We were in a pub on Friday evening, the 11th of December, 2009, when I realized the fact that is now Theorem 15 in this paper. “It must be a rapidly mixing Markov chain,” I said enthusiastically. However, it took a while to work out the details, since we got stuck at the point that looked the simplest at first glance.

(Cultural intermezzo: Sarajevo is a beautiful city, a rendezvous of western and eastern culture. The coffee was served with sugar cubes during the coffee breaks, and the Bosnian students attending our brainstorming sessions offered the sugar cubes to each other with the question: “kocku?” Kocka means cube in Hungarian; here ‘c’ should not be pronounced as in ‘cigarette’, for example, but with a sound similar to a cricket's chirp. The change a -> u is the accusative case of feminine nouns in Slavic languages, if my weak memories of learning Russian serve me right. So I could understand ‘kocku?’, and although I am not much of a linguist, I can pretty much enjoy such discoveries.)

So the simplest case (on which we worked in Sarajevo between coffee breaks) is when the adjacency graph is a single cycle. We have a closed formula for the number of DCJ scenarios in this case, and we also know the structure of the solution space, thanks to Aïda Ouangraoua and Anne Bergeron. We know that small alterations are sufficient to get an irreducible Markov chain, i.e., to get from any solution to any other solution by a series of such small alterations. What we do not know is whether or not this Markov chain is rapidly mixing, even in this simplest case. That’s why we eventually came up with a different solution: we used the fact that the solutions of the simple cases can be counted and sampled from the exact uniform distribution in polynomial time, factorized (partitioned) the state space of an arbitrary case into such simple cases, and worked out a rapidly mixing Markov chain for the general case that walks on the partitions. This was sufficient to prove that the number of most parsimonious DCJ scenarios is in FPRAS and FPAUS.

Marilia Braga and Jens Stoye proved that whatever the two genomes are, small alterations are sufficient to transform any solution into any other solution. In fact, the small alterations change only two (!!!) consecutive steps. It is easy to engineer a Markov chain Monte Carlo converging to the uniform distribution over the solutions using the small alterations that Marilia and Jens described, and we call this Markov chain the Braga-Stoye Markov chain. Although we already know that the number of most parsimonious DCJ scenarios is in FPRAS and FPAUS (and any proof or disproof of the rapid mixing of the Braga-Stoye Markov chain will not change this fact), I would love to see a proof (or disproof…) that this Markov chain is rapidly mixing. In the next section, I will tell you why.

The big project

Timothy Brooks Paige was my summer student in 2005. We discovered the ‘big islands’ in the solution spaces of reversal sorting scenarios. Consider two hurdles separated by a small oriented cycle (for a definition of hurdles and oriented cycles, see this presentation). The possible first sorting steps fall into three categories:

  • merging the two hurdles
  • cutting a hurdle
  • sorting the oriented cycle

It is easy to show that any solution starting with the merge of the two hurdles must finish with the sorting of the oriented cycle. This oriented cycle cannot be sorted earlier after the hurdle merge: either it is an unoriented cycle, or sorting it would create a new hurdle. On the other hand, this cycle can be sorted at any time after a hurdle cut. What follows is that the solutions starting with a hurdle merge do not share any intermediate genome with the solutions starting with other sorting steps, except the last intermediate genome containing only the small oriented component. Namely, if we represent the solutions with a directed graph, in which the vertices are the start, the end and the possible intermediate genomes, and there is an edge from v to w iff there is a reversal transforming v into w, then there is a ‘big empty island’ between the two flows of solutions. In other words, if one would like to design an MCMC on the most parsimonious reversal sorting scenarios that cuts out a part of the scenario and draws a new path between the two intermediate genomes at the two ends of the resampled window, then this Markov chain will not be irreducible unless arbitrarily large windows can be cut out. Compare this with the Braga-Stoye chain, where resampling windows of size two is sufficient to get an irreducible Markov chain.
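To make the irreducibility question concrete, here is a small brute-force diagnostic (entirely my own sketch, not from any paper): given an explicit list of scenarios, each represented as the tuple of its intermediate genomes, it checks whether resampling windows of width at most w connects the whole solution space. On a solution space with a big empty island, it returns False for every small w.

```python
from collections import deque

def window_diff(s, t):
    """Length of the smallest window outside of which two equal-length
    scenarios (sequences of intermediate genomes) agree."""
    if s == t:
        return 0
    i = 0
    while s[i] == t[i]:
        i += 1
    j = len(s)
    while s[j - 1] == t[j - 1]:
        j -= 1
    return j - i

def irreducible_with_windows(scenarios, w):
    """BFS on the graph whose vertices are scenarios, with an edge whenever
    two scenarios agree outside a window of length at most w."""
    seen, queue = {0}, deque([0])
    while queue:
        u = queue.popleft()
        for v in range(len(scenarios)):
            if v not in seen and window_diff(scenarios[u], scenarios[v]) <= w:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(scenarios)
```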

Krister Swenson, a PhD student of Bernard Moret, visited me in autumn 2007, and he showed me a much simpler case having this big-empty-island property. Following his ideas, I was able to find the simplest possible case: the solution space consists of only two paths not sharing any intermediate genome. This was the key to our theorem proving the following. If one creates a Markov chain that walks on the most parsimonious reversal sorting scenarios by cutting out an arbitrary window and resampling it using sequential importance sampling, then this Markov chain will be torpidly mixing. And nobody has any better idea than using sequential importance sampling to resample a window…

So the only known simple (i.e., not applying parallel chains) and irreducible Markov chain walking on the most parsimonious reversal scenarios is also known to be torpidly mixing. The big project is to design an irreducible and rapidly mixing Markov chain on the most parsimonious reversal scenarios. I have a few ideas on how to design such Markov chains, and one of them is related to the Braga-Stoye Markov chain.

Cooling down DCJ sorting scenarios to Hannenhalli-Pevzner scenarios

When an MCMCer bumps into a problem of slow mixing, a handy solution is parallel tempering. But when the target distribution is the uniform one, there is no use for heating in the usual sense. However, we had the following idea: consider the DCJ sorting scenarios. Some of them will have many circular chromosomes in the intermediate genomes, and some of them will have none. These latter are the Hannenhalli-Pevzner solutions, having only reversals in the case of unichromosomal genomes, and only reversals, reciprocal translocations, end-chromosomal translocations, chromosome fusions and fissions, i.e., Hannenhalli-Pevzner type rearrangements, in the case of multichromosomal genomes. So we can work out an energy function that counts the number of circular chromosomes along the DCJ sorting path; then the corresponding Boltzmann distribution at low temperatures will be dominated by the HP solutions (and restricted to them it is the uniform distribution), and at infinite temperature it will be the uniform distribution over DCJ solutions. We proved the following (Andrew Wei Xu also made remarkable contributions to these proofs):

  1. With the topology defined by small alterations, the energy landscape is such that all local minima are global, i.e., Hannenhalli-Pevzner solutions. Namely, the cooling process cannot be trapped in a minimum which is not global.
  2. A polynomial number of chains is sufficient with the following properties:
  • the coldest chain is dominated by the HP solutions
  • the hottest chain has infinite temperature
  • the exchange probability (expressed as a Metropolis-Hastings ratio, as in the sketch after this list) is at least 1/2 between any two neighboring chains.
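For concreteness, this is the standard replica-exchange acceptance rule such a scheme would use (a generic sketch with my own names: E(x) would be the number of circular chromosomes along the DCJ scenario x, and inverse temperature beta = 0 is the hottest, uniform chain):

```python
import math
import random

def exchange_accept(beta_cold, beta_hot, e_cold, e_hot):
    """Swap the states of two neighboring tempered chains with stationary
    distributions pi_beta(x) proportional to exp(-beta * E(x)).
    The Metropolis-Hastings ratio of the swap reduces to
    exp((beta_cold - beta_hot) * (e_cold - e_hot))."""
    ratio = math.exp((beta_cold - beta_hot) * (e_cold - e_hot))
    return random.random() < min(1.0, ratio)
```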

I remember this was the first time we had a computer-aided proof. Eric Tannier did not believe the first property; he gave me some pathological cases, and I wrote a Monte Carlo program to search for local minima, which always happened to be global. Looking at the particular ways the global minimum can be reached, we managed to understand the underlying structure and work out a (by now purely mathematical) proof.

These are pretty promising properties, but they do not prove rapid mixing in any sense. Just one more thing: if we apply small perturbations in the individual chains, then the hottest chain will be the Braga-Stoye Markov chain. Shall I explain further why I am keen to know the mixing speed of the Braga-Stoye chain?

MCMCMCMCMCMCMCMCMC…

MCMC stands for Markov chain Monte Carlo. MCMCMC is simply parallel tempering, but funny bioinformaticians call it Metropolis Coupled Markov chain Monte Carlo (bioinformaticians love to give funny names). Together with Aaron Darling, we managed to sneak in one more MC to get MCMCMCMC: Model Changing Metropolis Coupled Markov chain Monte Carlo. It is a parallel Markov chain Monte Carlo, i.e., MCMCMC, with the additional complication that the Markov chains do not walk on the same state spaces, so states are not simply exchanged but transformed. In the particular case we developed for reversal sorting scenarios, the target distributions were the uniform distributions over k-long prefixes of most parsimonious reversal sorting scenarios, one chain for each k. An exchange between two Markov chains means the (random) elongation of the shorter prefix and the (deterministic) chopping of the longer prefix, applying the appropriate Metropolis-Hastings ratio. At least it is an idea. If all moves are exchanges, then this Markov chain remains torpidly mixing in spite of looking sophisticated; the same counterexample we developed for window sequential importance sampling suffices to prove it. However, with exchanges and small alterations in the individual chains, this MCMCMCMC is rapidly mixing on that same example. It is an open question whether it is always rapidly mixing.

And the famous four reversal conjecture

As I mentioned, small alterations do not make a Markov chain on the most parsimonious reversal scenarios irreducible, at least as long as small alterations mean changing a few consecutive steps and the solutions are represented by their series of intermediate genomes. I have a conjecture that small alterations might suffice in the following way:

Let Sigma be the alphabet representing the possible DCJ operations on k-long permutations in the following way: a character (a b | c d) in Sigma means that extremities a and b form an adjacency, c and d also form an adjacency, and the DCJ operation shuffles b and c. With this notation, (a b | c d) ~ (b a | d c) ~ (d c | b a) ~ (c d | a b). Equivalent characters are treated as the same character.

Let Pi be a k-long, hurdle-free signed permutation. Construct a graph whose vertices are the possible most parsimonious reversal scenarios represented as sequences over Sigma. Connect two vertices iff the length of their longest common subsequence is at least the length of the individual sequences minus four. The conjecture is that this graph is connected whatever Pi is. In other words, the conjecture is that the following Markov chain is irreducible: start with an arbitrary most parsimonious reversal scenario, represent it as a series of DCJ operations, remove 4 not necessarily consecutive steps, add 4 new DCJ operations, not necessarily in the same places and not necessarily in consecutive places, and keep the result if it is a meaningful, most parsimonious reversal scenario (with the appropriate Metropolis-Hastings ratio); otherwise the Markov chain remains in place. The permutation -1 -2 -3 -4 is a counterexample showing that changing fewer steps is not sufficient for irreducibility.
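Just to fix the notation, here is the character equivalence above as a canonicalization function (a trivial sketch; the 4-tuple encoding is mine):

```python
def canonical(ch):
    """Canonical spelling of a DCJ character (a b | c d) under the
    equivalences (a b | c d) ~ (b a | d c) ~ (d c | b a) ~ (c d | a b):
    the lexicographically smallest of the four spellings."""
    a, b, c, d = ch
    return min((a, b, c, d), (b, a, d, c), (d, c, b, a), (c, d, a, b))
```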

Good luck proving this conjecture… And even if the conjecture is true, we still do not know the mixing time of this Markov chain, so a proof of rapid mixing is also needed. Or a disproof, in which case this is another dead end in this project…

There is a simpler version of the four reversal conjecture on black-and-white graphs; so far we have partial results on linear graphs with two undergraduate students, Eliot Bixby and Toby Flint. Some hope.

Isn’t the Braga-Stoye Markov chain more promising?

Conclusions

So we need a proof (or disproof) of rapid mixing of the Braga-Stoye Markov chain.

Furthermore, let SBR denote the problem of sorting by reversals (finding a most parsimonious reversal scenario), and let #SBR denote the corresponding counting problem (counting the number of most parsimonious reversal scenarios). We are looking for the computational complexity of #SBR. The possibilities are:

  • It is in FP. Very unlikely: Adam Siepel has a paper on the algorithmics of looking one step ahead; it is already messy, and there is a clear combinatorial explosion as we go further, with no clear way to handle it.
  • It is in FPAUS (and thus in FPRAS, since the problem is self-reducible). I am keen to prove this, but you can see that it seems to be extremely hard.
  • It can happen that it is not in FPAUS (and thus not in FPRAS), at least under some reasonable computational complexity assumption, i.e., assuming that RP is not NP. I am planning a post speculating that there always seems to be some NP-complete problem in the background of a non-approximability result, and, well, there are plenty of NP-complete problems in genome rearrangement. It could be the case that all of our previous effort was just a waste of time and energy, but if that is the case, then at least I would like to see some mathematically rigorous proof of it…
  • Oh, and if somebody could prove that it is #P-complete, I would pretty much appreciate it…

Two nasty counting problems

OK, let’s get started. Here are two nasty counting problems, my favorite ones. They look pretty easy but they are actually extremely hard. That’s why they are nasty.

The first one

Given k different colors, there are n1, n2, …, nk balls of the respective colors (ni might be 0). Furthermore, there are k-1 gray balls; gray is not amongst the k colors. How many ways are there to order the balls into a sequence such that the number of gray balls in any prefix is at most the number of finished colors minus 1? A color is finished in a prefix if it does not appear in the corresponding suffix.

Prove or disprove that this counting problem is #P-complete.

There is a simple Markov chain which swaps two consecutive balls if the resulting sequence is also an allowed order. It is easy to show that this Markov chain is irreducible. Prove or disprove that this Markov chain is rapidly mixing. Note that rapid mixing would provide an FPRAS, as the counting problem is self-reducible.

Please measure the size of the problem in unary. Namely, the size of the problem is the number of balls, and not the number of digits needed to describe the problem.
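Since the statement leaves a couple of corner cases to interpretation, here it is as code, in my own reading (gray balls are encoded as 0, colors as 1..k; the condition is checked on every non-empty prefix, and a color with zero balls counts as finished from the start). The brute force is factorial time and only good for sanity checks.

```python
from itertools import permutations

def is_allowed(seq, k):
    """Check that every non-empty prefix has #gray <= #finished colors - 1,
    where a color is finished once all its balls have appeared."""
    remaining = {}
    for x in seq:
        if x != 0:
            remaining[x] = remaining.get(x, 0) + 1
    finished = k - len(remaining)   # colors with no balls at all
    gray = 0
    for x in seq:
        if x == 0:
            gray += 1
        else:
            remaining[x] -= 1
            if remaining[x] == 0:
                finished += 1
        if gray > finished - 1:
            return False
    return True

def count_allowed(ns):
    """Brute-force count of allowed orders for ball counts ns = [n1..nk]
    plus k-1 gray balls."""
    k = len(ns)
    balls = [0] * (k - 1)
    for color, n in enumerate(ns, start=1):
        balls += [color] * n
    return sum(is_allowed(p, k) for p in set(permutations(balls)))

# Under this reading: k = 2, counts (2, 1), one gray ball; the only allowed
# order is 2, 1, 1, gray, so the count is 1.
assert count_allowed([2, 1]) == 1
```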

The second one

Prove or disprove that counting the most parsimonious DCJ scenarios is #P-complete.

Why are they hard?

Good question. My opinion is that we have too little freedom here. To be able to prove #P-completeness, another #P-complete problem, like #2-SAT, should be reduced to it. However, a single series of numbers completely defines the first counting problem. How could a 2-CNF be transformed into this colored-balls problem? Unclear. Funnily, the suggested Markov chain is also nastily hard to prove rapidly mixing…

The other problem is at least known to be in FPRAS and FPAUS. But it also has too little freedom. We have a formula to calculate the exact number of solutions, which is permanent-like. The problem with it is not that it is only permanent-like, but that we have a very special matrix here: all the values are determined by the rows and columns, i.e., the number of degrees of freedom grows only linearly with the sum of the number of columns and rows. In other words, the number of degrees of freedom is approximately the square root of the size of the entire matrix. Could such a simple problem already be #P-complete?

Another possible way to solve these problems is to prove that they can be solved in polynomial time. Good luck! Did I mention that they were nasty? I mean, they are really, really…