PEDEL comes in two flavours.
Given a library of L sequences, comprising variants of a sequence of N nucleotides into which random point mutations have been introduced, we wish to calculate the expected number of distinct sequences in the library (typically assuming L > 10, N > 5, and a mean number of mutations per sequence m < 0.1 x N).
Saab-Rincon et al. (2001, Protein Eng., 14, 149-155) constructed a library of 5 million clones with a single round of epPCR on a 700 bp gene. Sequencing 10 of these indicated an error rate of 3-4 nucleotide substitutions per daughter sequence. Entering L = 5000000, N = 700 and m = 3.5 into the base PEDEL server page, and clicking 'Calculate', shows that the expected number of distinct sequences in the library is 4.153 x 10^6, or about 4.2 million.
If you follow the link to 'detailed statistics' and, once again, enter L = 5000000, N = 700 and m = 3.5 and click 'Calculate', you get a breakdown of library statistics for each of the sublibraries comprising all those daughter sequences with exactly x base substitutions (x = 0, 1, 2, 3, ...).
For example, the first line of the table shows that Px = 3.02% of the library (i.e. Lx = 1.51 x 10^5 daughter sequences) have x = 0 base substitutions (i.e. they are identical to the parent sequence). The total number of possible variants with 0 base substitutions is, of course, Vx = 1 (just the parent sequence) and the total number of distinct sequences with 0 base substitutions present in the library is, similarly, Cx = 1. The completeness of the x = 0 sublibrary is Cx/Vx = 100%. The redundancy of this sublibrary (i.e. wasted duplication) is Lx - Cx = 1.51 x 10^5.
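The sublibrary statistics described above can be sketched in a few lines of Python. This is an illustrative sketch, not the server's code: the function name `pedel_sublibraries` is hypothetical, Vx is assumed to be C(N, x) x 3^x (x substituted positions, 3 alternative bases each), and Cx is taken as Vx(1 - exp(-Lx/Vx)) for every x.

```python
import math

def pedel_sublibraries(L, N, m, x_max=None):
    """Per-sublibrary statistics under a simple Poisson model (sketch)."""
    if x_max is None:
        # Assumption: cut the Poisson tail off well beyond the mean.
        x_max = min(N, int(m + 10 * math.sqrt(m) + 20))
    rows = []
    for x in range(x_max + 1):
        # Poisson probability of exactly x substitutions (log form for stability)
        Px = math.exp(-m + x * math.log(m) - math.lgamma(x + 1))
        Lx = L * Px                               # sublibrary size
        Vx = float(math.comb(N, x) * 3 ** x)      # assumed number of possible variants
        Cx = Vx * -math.expm1(-Lx / Vx)           # expected distinct variants
        rows.append({"x": x, "Px": Px, "Lx": Lx, "Vx": Vx, "Cx": Cx,
                     "completeness": Cx / Vx, "redundancy": Lx - Cx})
    return rows

rows = pedel_sublibraries(L=5_000_000, N=700, m=3.5)
total_distinct = sum(r["Cx"] for r in rows)
for r in rows[:4]:
    print(r["x"], f"{r['Px']:.2%}", f"{r['Lx']:.3g}", f"{r['Vx']:.3g}",
          f"{r['Cx']:.3g}", f"{r['completeness']:.1%}", f"{r['redundancy']:.3g}")
print(f"expected distinct sequences: {total_distinct:.4g}")
```

Under these assumptions the x = 0 row and the library total should land close to the figures quoted for the Saab-Rincon example.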
You also have the option to plot this data by following the 'Plot this data' link. Choose the statistic to plot and whether or not to use a log scale on the y-axis. For example, a plot of Px or Lx gives a Poisson distribution. A plot of Vx shows how the number of possible variants increases very rapidly as the number of base substitutions is increased. A plot of Cx shows how the expected number of distinct sequences in the sublibraries initially increases (limited by the number of possible variants, Vx) and then decreases (limited by the size of the sublibrary, Lx). A plot of Lx - Cx shows the extent of wasted duplication in the lower x-value sublibraries.
Returning to the base PEDEL server page, you can follow links to plot the expected number of distinct sequences in a library for a range of mutation rates, library sizes or sequence lengths. The third option probably won't be very useful, but the first two will help you to decide what library size to aim for in order to obtain a given diversity, and what mutation rate to use to maximize the diversity for a given library size.
For example, follow the 'mutation rates' link, enter L = 5000000, N = 700 and m = 0.2 - 20, and click 'Calculate'. From the plot, you can see that the expected number of distinct sequences increases rapidly with m until m ~ 5, and then levels off with < 10% redundancy in the library. On the other hand, if you chose m ~ 1.5, then the library would be about 60% redundant. After selecting an optimal mutation rate m, you can go back to the 'detailed statistics' page to check the expected completeness of the x = 0, 1, 2, 3, ... sublibraries.
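A scan over mutation rates can be sketched as follows. The helper `expected_distinct` is a hypothetical name, and the same simplifying assumptions apply as in a basic Poisson model: Vx = C(N, x) x 3^x and Cx = Vx(1 - exp(-Lx/Vx)), summed over x.

```python
import math

def expected_distinct(L, N, m):
    """Expected number of distinct sequences, summed over sublibraries (sketch)."""
    x_max = min(N, int(m + 10 * math.sqrt(m) + 20))   # assumed tail cut-off
    total = 0.0
    for x in range(x_max + 1):
        Lx = L * math.exp(-m + x * math.log(m) - math.lgamma(x + 1))
        Vx = float(math.comb(N, x) * 3 ** x)
        total += Vx * -math.expm1(-Lx / Vx)
    return total

L, N = 5_000_000, 700
# Scan m = 0.2, 0.4, ..., 20.0 and record the redundant fraction of the library.
redundancy = {}
for k in range(1, 101):
    m = round(0.2 * k, 1)
    redundancy[m] = 1 - expected_distinct(L, N, m) / L

for m in (1.4, 3.6, 5.0, 10.0):
    print(f"m = {m:4.1f}: redundant fraction = {redundancy[m]:.1%}")
```

The redundant fraction falls steeply at low m and flattens out at higher m, which is the behaviour read off the server's plot.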
PEDEL uses a generic Poisson model of sequence mutations. There are a couple of simplifications that you should be aware of:
A good review of the sources of bias in epPCR (and other directed evolution protocols) can be found in Neylon C., 2004, Chemical and biochemical strategies for the randomization of protein encoding DNA sequences: library construction methods for directed evolution, Nucleic Acids Res., 32, 1448-1459.
PEDEL-AA is an extension to amino acid sequences of the original nucleotide version of PEDEL (see links to publications and mathematics notes from the statistics home page). Due to the more complex problem of estimating diversity and completeness at the amino acid level (as opposed to nucleotides and codons), there are some major differences in the algorithms and a few extra approximations. A brief description of the procedure follows.
First, to recapitulate the nucleotide version of PEDEL: As discussed in the mathematics notes, for the nucleotide version, if the input library is conceptually divided into sublibraries Lx (x=0,1,2,...) where the sublibrary Lx comprises all variants in the library with exactly x nucleotide substitutions, then the PEDEL scenario divides into two regions:
In the transition region (where Lx ~ Vx) we can calculate Cx with the formula Cx ~ Vx(1 - exp(-Lx/Vx)). This is based on the assumption that all variants in Vx are equiprobable, so the mean number of occurrences in the sublibrary Lx of each variant in Vx is Lx/Vx and, assuming Poisson statistics, the probability that any given variant is present in the sublibrary is 1 - exp(-Lx/Vx), so the expected number of distinct variants present in the sublibrary is Cx ~ Vx(1 - exp(-Lx/Vx)).
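The formula Cx ~ Vx(1 - exp(-Lx/Vx)) can be checked with a small simulation: draw L items uniformly at random from V equiprobable variants and count how many distinct variants turn up. The sketch below uses hypothetical names and deliberately small numbers (V = 2000, L = 3000) so that it runs quickly. It only illustrates the equiprobable case.

```python
import math
import random

def simulate_distinct(V, L, trials=20, seed=0):
    """Average number of distinct variants seen when sampling L times from V."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Each library member is one of V equiprobable variants.
        total += len({rng.randrange(V) for _ in range(L)})
    return total / trials

V, L = 2000, 3000                       # illustrative sizes in the transition region
formula = V * (1 - math.exp(-L / V))    # Cx ~ Vx(1 - exp(-Lx/Vx))
simulated = simulate_distinct(V, L)
print(f"formula: {formula:.1f}   simulated: {simulated:.1f}")
```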
In the more complex scenario presented in PEDEL-AA, the assumption of equiprobable variants breaks down for two reasons: (i) we have introduced a full 4 x 4 nucleotide substitution matrix (in particular the transition:transversion ratio is not assumed to be unity), and (ii) even if nucleotide substitutions were equiprobable, the corresponding amino acid substitutions are not. However, we may still borrow some concepts from the equiprobable nucleotide version of PEDEL, namely: (1) when Lx is small compared with Vx, then Cx is approximately equal to Lx, and (2) when Lx is large compared with Vx, then Cx is approximately equal to Vx. However, these concepts need some refining, as follows.
The probabilities of variants being truncated (i.e. containing introduced stop codons) are then subtracted from the P(x_nt) distributions. Clearly this is an increasing function of x_nt and, by x_nt = 100, typically less than 0.6% of variants are stop-codon free. Note that the P(x_nt) will no longer sum up to unity; instead (after discarding indel-containing variants) they sum up to L_eff / L_tot, where L_eff is the 'effective' library size (i.e. the number of variants with no indels or stop codons) and L_tot is the total library size (albeit again excluding variants with indels).
Next, the Poisson and PCR P(x_nt) distributions are redistributed into amino acid P(x_aa) distributions. First the mean number, frac, of nonsynonymous amino acid substitutions per nucleotide substitution (given that the nucleotide substitution doesn't produce a stop codon) is calculated. Typically frac ~ 2/3. For each x_nt, the number of nonsynonymous amino acid substitutions resulting from exactly x_nt nucleotide substitutions is assumed to follow a binomial distribution, B(x_nt,frac) (i.e. x_nt 'trials'; probability of 'success' per 'trial' = frac). Summing up the binomial distributions, each weighted by P(x_nt), for x_nt = 0,1,2,...,100 gives the amino acid Poisson and PCR P(x_aa) distributions. Of course, Sum_{x_nt} P(x_nt) = Sum_{x_aa} P(x_aa) = L_eff / L_tot. The Poisson and PCR amino acid sublibrary sizes, Lx, are given by P(x_aa) x L_tot.
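The 'sum of binomial distributions' step can be sketched as below. This is a simplified illustration with hypothetical function names: it starts from a plain Poisson P(x_nt) with mean 3.5 (without the stop-codon subtraction described above) and uses the typical value frac = 2/3.

```python
import math

def poisson_pmf(mean, x_max):
    """P(x) for x = 0..x_max under a Poisson distribution."""
    return [math.exp(-mean + x * math.log(mean) - math.lgamma(x + 1))
            for x in range(x_max + 1)]

def nt_to_aa(p_nt, frac):
    """Redistribute P(x_nt) into P(x_aa) as a P(x_nt)-weighted sum of B(x_nt, frac)."""
    p_aa = [0.0] * len(p_nt)
    for x_nt, p in enumerate(p_nt):
        for x_aa in range(x_nt + 1):
            binom = math.comb(x_nt, x_aa) * frac ** x_aa * (1 - frac) ** (x_nt - x_aa)
            p_aa[x_aa] += p * binom
    return p_aa

frac = 2 / 3                       # typical value quoted in the text
p_nt = poisson_pmf(3.5, 100)       # x_nt = 0, 1, ..., 100
p_aa = nt_to_aa(p_nt, frac)
print(f"sum P(x_nt) = {sum(p_nt):.6f}, sum P(x_aa) = {sum(p_aa):.6f}")
```

The redistribution preserves the total probability, so Sum P(x_nt) = Sum P(x_aa). For a pure Poisson input, binomial thinning gives another Poisson distribution with mean frac x m, which makes a convenient sanity check.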
All these estimates rely on the mean number of nucleotide substitutions per variant, nsubst, being relatively small compared with the number of codons in the sequence, so that multiple substitutions in the same codon are not very common. In practice, we limit nsubst <= 0.1 x input sequence length (in nucleotides). In fact, for the Poisson case, we can calculate L0, L1 and L2 exactly (a sum over all possible variants with exactly 0, 1 or 2 amino acid substitutions, multiplied by their probabilities given by the input nucleotide substitution matrix and nsubst, multiplied by L_tot). These calculations agree very well with the 'sum of binomial distributions' method. For example, for the library presented in Volles & Lansbury (2005), we have
      'exact'      'binomial sum'
L0    3.763e+05    3.861e+05
L1    8.174e+05    8.205e+05
L2    8.795e+05    8.717e+05
Note that the introduced stop codon and indel statistics and graphs are exact calculations (based on the input substitution, indel and nucleotide matrix parameters) and do not use any of the above approximations (except Poisson statistics). The above approximations are only used for the library completeness statistics.
Property  Volles & Lansbury  Firth & Patrick  

    Poisson  PCR 
Truncations (%)  15  15.6  
# Fulllength clones  3.1 x 10^6  3.18 x 10^6  
Protein mutation freq. per aa  0.016  0.0160  
Mean # mutations per protein  2.1  2.12  
Unmutated sequences (%)*  14  10.1  14.0 
# of unique proteins  1.3 x 10^6  1.32 x 10^6  1.29 x 10^6 
# of unique point mutations  1990  1989  
# of unique single point mutations  1566  1618  1618 
The 'Lx < 0.1 Vx_1' criterion for deciding when to use the 'Cx ~ Lx' approximation is sometimes inaccurate, and can be refined as follows.
First consider a single nucleotide substitution in a single codon. There are 9 possible mutated codons. An amino acid mutation that can only be coded by a single codon out of the 9, and that requires a transversion, has only a 1 in 15 probability (assuming a transition:transversion ratio of 3): if p is the probability of a transversion, then 3p is the probability of a transition, and the total probability of the 9 mutated codons is 6(p) + 3(3p) = 15p.
For example, if the parent codon is GGG (Gly), then the 9 single-nucleotide-substitution codons are
codon  amino acid  relative probability 

AGG  Arg  3p 
CGG  Arg  p 
TGG  Trp  p 
GAG  Glu  3p 
GCG  Ala  p 
GTG  Val  p 
GGA  Gly  3p 
GGC  Gly  p 
GGT  Gly  p 
AA  Probabilities given the codon mutates  Probabilities given the amino acid mutates 

Gly  5/15  (wildtype) 
Arg  4/15  4/10 
Glu  3/15  3/10 
Trp  1/15  1/10 
Ala  1/15  1/10 
Val  1/15  1/10 
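The two GGG tables can be reproduced with a short script. This is a sketch under the same assumption as the text (each transition weighted 3, each transversion weighted 1), using the standard genetic code. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def codon_mutation_probs(codon, ts_weight=3):
    """Probability of each encoded amino acid, given one substitution in the codon."""
    weights = {}
    for pos in range(3):
        for base in "ACGT":
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            # Transition = purine<->purine or pyrimidine<->pyrimidine.
            is_transition = (base in PURINES) == (codon[pos] in PURINES)
            aa = CODON_TABLE[mutant]
            weights[aa] = weights.get(aa, 0) + (ts_weight if is_transition else 1)
    total = sum(weights.values())
    return {aa: Fraction(w, total) for aa, w in weights.items()}

def aa_mutation_probs(codon, ts_weight=3):
    """Same, but conditioned on the amino acid actually changing."""
    probs = codon_mutation_probs(codon, ts_weight)
    probs.pop(CODON_TABLE[codon], None)       # drop synonymous outcomes
    total = sum(probs.values())
    return {aa: p / total for aa, p in probs.items()}

given_codon = codon_mutation_probs("GGG")     # one-letter codes: G, R, E, W, A, V
given_aa = aa_mutation_probs("GGG")
for aa in sorted(given_codon, key=given_codon.get, reverse=True):
    print(aa, given_codon[aa], given_aa.get(aa, "(wildtype)"))
```

Fractions are printed in lowest terms, so 5/15 appears as 1/3 and 4/10 as 2/5.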
The 'Lx < 0.1 Vx_1' criterion assumes that all of the single-nucleotide-substitution nonsynonymous amino acid substitutions are equiprobable (i.e. 1 in 5 in the above example, but in general represented by the reciprocal of the 'A' factor described in the above section, where typically A ~ 5.8); whereas, in fact, the most common single-nucleotide-substitution amino acid substitution (GGG -> Arg) is 4 x as likely as the rarest (GGG -> Trp or Ala or Val). In cases where some nucleotide substitutions (as defined by the 4 x 4 nucleotide substitution matrix) are particularly rare, the probability difference between the rarest and the most common single-nucleotide-substitution amino acid substitutions at a given site can be much greater.
The 'Lx < 0.1 Vx_1' criterion for being in the 'Cx ~ Lx' region is basically to make sure that there are enough variants in Vx to 'absorb' all Lx sublibrary members so that (within a small error) at most one sublibrary member is equal to any given variant in Vx. In practice, it doesn't matter what the probability of the rarest variants is. What matters for the 'Cx ~ Lx' approximation is that the mean frequency in Lx of the most common variant is < 0.1. In fact the mean frequency of the most common variant in Lx, which we denote by Rx, is easy to calculate for x = 0, 1, 2, ..., 20, ..., and is shown in the PEDEL-AA output table of sublibrary statistics.
Using these Rx values, the 'Lx < 0.1 Vx_1' criterion would be replaced with the criterion 'Rx < 0.1'. In practice this means that if, in the table of sublibrary statistics, there are Rx values > 0.1 for which the 'Cx ~ Lx' approximation has been used (i.e. x >= 3 and Lx < 0.1 Vx_1), then the corresponding Cx values may be overestimates. A warning and HTML link are given in the table of sublibrary statistics whenever this occurs.
In-frame sequence that was mutagenised. Note that all symbols other than uppercase A, T, U, G and C are discarded, along with any FASTA header (e.g. '>T. maritima Cystathionine β-lyase'); for masked sequences, therefore, use lowercase.
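The filtering rule can be illustrated with a minimal sketch. The function name `clean_sequence` is hypothetical, and the sketch only mirrors the rule as stated here (drop FASTA header lines, keep uppercase A, T, U, G, C). It does not reproduce any further processing the server may apply.

```python
def clean_sequence(text):
    """Keep only uppercase A, T, U, G, C; skip FASTA header lines."""
    kept = []
    for line in text.splitlines():
        if line.startswith(">"):
            continue                      # FASTA header is discarded
        kept.extend(ch for ch in line if ch in "ATUGC")
    return "".join(kept)

example = ">example header\nATGcctNNGA\nTTa gC\n"
print(clean_sequence(example))            # lowercase (masked) bases are dropped
```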
(Non-negative numbers. Overall scaling is unimportant, as this is taken from the 'mean number of substitutions per daughter sequence' parameter.)
