A good library for directed evolution is both diverse and targeted.
The target of the library can range from a few residues to a whole region: the former is typically achieved with degenerate primers, the latter with error-prone PCR, and combinations of the two are sometimes used.
In terms of degenerate primers (more correctly, primers with degenerate nucleotides, also called QuickChange primers after the original kit), the app GlueIT can help in determining which degenerate codons to use and how many positions can reasonably be altered, while the app MutantPrimers can help by designing the primers. Unfortunately, the annealing temperature of the PCR reaction plays a large part in the outcome, so the app QCCC was made to determine the extent of randomisation actually achieved.
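To make the degenerate-codon idea concrete, here is a minimal sketch (not the GlueIT code itself, which does much more) of how a degenerate codon expands into a set of amino acids via the IUPAC ambiguity codes:

```python
# Expand a degenerate codon into the amino acids it encodes.
# Illustrative sketch only: the GlueIT app handles this (and the statistics) for you.
from itertools import product

IUPAC = {'A': 'A', 'C': 'C', 'G': 'G', 'T': 'T',
         'R': 'AG', 'Y': 'CT', 'S': 'CG', 'W': 'AT',
         'K': 'GT', 'M': 'AC', 'B': 'CGT', 'D': 'AGT',
         'H': 'ACT', 'V': 'ACG', 'N': 'ACGT'}

# standard codon table, in TCAG order
bases = 'TCAG'
aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {a + b + c: aa[i] for i, (a, b, c) in enumerate(product(bases, repeat=3))}

def expand(degenerate_codon):
    """Return the set of amino acids (and stops, '*') encoded by a degenerate codon."""
    return {CODON_TABLE[''.join(c)] for c in product(*(IUPAC[b] for b in degenerate_codon))}

print(sorted(expand('NNK')))  # all 20 amino acids plus the amber stop ('*')
```

NNK is the classic choice: 32 codons covering all 20 amino acids with only one stop codon, which is why it appears so often in saturation mutagenesis.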
In terms of error-prone PCR, the starting template and nucleotide balance can be altered to achieve different mutational loads and biases.
First, the amount of starting template controls the average mutational load of the library (less template means more doublings and thus more mutations). If the load is low, there will be a large redundancy in the library, including a significant fraction of wild-type sequences; if it is too high, the library will be dominated by deleterious mutations masking beneficial ones.
Second, the PCR method employed has a profound effect on the resulting mutational spectrum: the more biased the spectrum, the less diversity is present.
Ideally, the load is high enough that there is little redundancy, yet low enough that most single amino acid variants remain accessible via a single nucleotide mutation; this is somewhere around 5 mutations per kb (see PedelAA for your specific case).
The thing to remember is that the number of mutations per sequence follows a Poisson distribution: with an average mutational load of 1.0, about 37% of your library will have zero mutations and another 37% a single nucleotide mutation, as opposed to every sequence carrying exactly one. Also, only about two thirds of nucleotide mutations are missense, so a load of 1 mutation per kb on a 1 kb gene leaves the library just under half wild type at the amino acid level.
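These Poisson figures are quick to verify yourself (assuming, as above, a load expressed per sequence and the rough one-third-silent figure from the text):

```python
# Poisson distribution of mutations per sequence for a given average load.
from math import exp, factorial

def poisson(k, load):
    """Probability of exactly k mutations at a given average mutational load."""
    return load ** k * exp(-load) / factorial(k)

load = 1.0  # an average of one nucleotide mutation per sequence
print(f"zero mutations: {poisson(0, load):.0%}")  # ~37%
print(f"one mutation:   {poisson(1, load):.0%}")  # ~37%

# Roughly a third of nucleotide mutations are not missense (the text's figure),
# so the wild-type fraction at the protein level is higher still:
wt_protein = poisson(0, load) + poisson(1, load) / 3
print(f"wild type at the protein level: {wt_protein:.0%}")  # ~49%
```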
There are several papers that explore the fraction of mutations that are beneficial, neutral or deleterious. It depends on the protein and on whether one looks at it from a structural (ΔΔG) or a catalytic (ΔΔG‡) point of view, but roughly 1-10% are beneficial, 10-20% are deleterious and the rest neutral (to see an example or to calculate for your own protein, see the landscape app). Taking 20% deleterious (or whatever the value is for your protein) and two thirds of nucleotide mutations as missense, each extra nucleotide mutation multiplies the fraction of functional protein by about 0.8^0.7 ≈ 0.86, so at an average of 5 nucleotide mutations about 54% of the library is dead (1 − 0.8^(5×0.7) ≈ 0.54). But the rest are neutral, or neutral with beneficial mutations!
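This back-of-envelope survivor calculation can be written out explicitly, using the rough figures from the text (20% of amino acid substitutions deleterious, ~70% of nucleotide mutations missense) and ignoring the Poisson spread of the load for simplicity:

```python
# Rough fraction of "dead" (function-lost) clones at a given mutational load.
# Assumes independent mutations, ~20% of amino acid changes deleterious and
# ~70% of nucleotide mutations missense (the working values from the text).
def dead_fraction(nt_mutations, p_deleterious=0.2, missense=0.7):
    alive = (1 - p_deleterious) ** (nt_mutations * missense)
    return 1 - alive

print(f"{dead_fraction(5):.0%} of a 5-mutation library has lost function")  # ~54%
```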
It depends on the desired mutational load, on the importance of mutational biases... and how much time you are willing to troubleshoot.
The simplest and cleanest method is using the error-prone enzyme Mutazyme (Agilent GeneMorph kit; technically Pfu PolB-Sso7d D215A D473A), which has a low error rate (about 0.9 mutations/doubling/kb), is less biased and is less likely to give a smear or no PCR product on an agarose gel (although it is nowhere near as robust as, say, a normal Q5 reaction).
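Because the rate is expressed per doubling, the final load is set by how much template you start with. A hypothetical sketch of the arithmetic (the 0.9/doubling/kb figure is from the text; your kit manual will give the calibrated values):

```python
# Expected mutational load from an epPCR: rate per doubling per kb, times gene
# length, times the number of doublings (set by the template:product ratio).
from math import log2

def expected_load(rate_per_doubling_per_kb, gene_kb, template_ng, product_ng):
    doublings = log2(product_ng / template_ng)
    return rate_per_doubling_per_kb * gene_kb * doublings

# e.g. a Mutazyme-like rate (~0.9 mutations/doubling/kb) on a 1 kb gene,
# amplifying 1 ng of template up to ~1000 ng of product (~10 doublings):
print(f"{expected_load(0.9, 1.0, 1, 1000):.1f} mutations per sequence")
```

This is why the kit protocols dial the mutation frequency up or down simply by changing the amount of starting template.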
Another method is manganese mutagenesis, where up to 5 mM manganese is added to a Taq reaction, which results in a high error rate; however, the product is highly biased towards adenine mutations, so this is often counterbalanced by using unequal dNTP concentrations.
A third method is using mutagenic nucleotide analogues, such as 8-oxo-dGTP or dPTP (Jena Bioscience kit), which result in a very high mutational load.
Unfortunately, the latter two strongly reduce PCR yields, which means that one cannot mix and match, say, manganese with Mutazyme. Similarly, mutation-shuffling methods involve a sensitive PCR step, and neither DNA shuffling nor StEP works with epPCR products to a usable/satisfactory degree.
A recent development is ordering a synthetic library that scans all mutations or similar. These completely circumvent the above issues, but are expensive and require careful study of the details of the final product to make sure that its limitations don't interfere. For example, a scanning library, in which each variant carries only a single mutation, will not be able to reveal epistatic effects.
If you believe something is wrong with our code, please contact us. Here are some pointers:
Unfortunately, yes: mutational bias is both a strong phenomenon and has a strong effect on diversity. If the mutational spectrum greatly differs from an equiprobable scenario, the diversity is greatly reduced. Some methods of introducing random mutations favour certain mutations so much that certain desired amino acid changes, such as a glutamine to glutamate, become very unlikely. For more see: Matteo's blogpost on mutational biases
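The glutamine-to-glutamate example is easy to see at the codon level: a hypothetical sketch enumerating the amino acids reachable from a codon by one nucleotide change shows that Gln (CAA) reaches Glu (GAA) only via a specific C→G transversion, exactly the kind of change a transition-biased spectrum makes rare:

```python
# Amino acids reachable from a codon by a single nucleotide substitution.
# Gln (CAA) -> Glu (GAA) requires a C->G transversion in the first position,
# so transition-biased mutagenesis methods make this change very unlikely.
from itertools import product

bases = 'TCAG'
aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {a + b + c: aa[i] for i, (a, b, c) in enumerate(product(bases, repeat=3))}

def single_step_neighbours(codon):
    """Amino acids (and stops, '*') accessible by one nucleotide substitution."""
    out = set()
    for i in range(3):
        for b in bases:
            if b != codon[i]:
                out.add(CODON_TABLE[codon[:i] + b + codon[i + 1:]])
    return out

print(sorted(single_step_neighbours('CAA')))  # neighbours of a glutamine codon
```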
Library size is the number of variants counted or estimated after the library-construction transformation. This is not the population size of the culture, which merely contains repeats of the original transformants (unless using the MP6 plasmid): no unique variants are added by growth. Often the test strain is sickly and poorly competent, so two transformations are done. Namely, the library is assembled, transformed into a high-competency strain (e.g. DH5α, TOP10, etc.), plasmid-prepped and re-transformed into the test strain. In this case the library size is based on the original transformation, as no new variants are added; worse still, the re-transformation is a bottleneck event which will result in some skewness.
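The cost of that bottleneck can be estimated with a simple sampling argument (a sketch under the assumption of uniform sampling with replacement; PedelAA does the proper statistics for real libraries): if only N colonies come through the re-transformation, the expected number of the original V unique variants retained is V·(1 − (1 − 1/V)^N).

```python
# Expected number of distinct variants surviving a transformation bottleneck,
# assuming colonies are drawn uniformly at random (with replacement) from the
# original pool of unique variants.
def distinct_after_bottleneck(unique_variants, colonies):
    return unique_variants * (1 - (1 - 1 / unique_variants) ** colonies)

V = 1_000_000  # unique variants in the original transformation (assumed figure)
for n in (10**5, 10**6, 10**7):
    print(f"{n:>10,} colonies -> ~{distinct_after_bottleneck(V, n):,.0f} distinct variants")
```

Note that even re-transforming as many colonies as there are variants only retains about 63% of them (1 − 1/e), which is why the bottleneck matters.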