Given a library of L sequences, where each sequence is chosen at random from a set of V equiprobable variants, we wish to calculate the expected number of distinct (i.e. unique) sequences represented in the library. Alternatively, given a set of V equiprobable variants, we wish to calculate the library size L necessary to obtain a given percentage completeness, or to have a given probability of being 100% complete. (Typically assuming V >> 1, e.g. V > 10.)
For the default values on the web server, there are a total of one million possible variants. This is roughly equivalent to an oligonucleotide directed randomization experiment involving four NNS codons (which gives V = 324 = 1048576).
In the first panel, the experimenter has constructed a library of three million transformants and wishes to estimate how many of the one million possible variants are represented in the library. The answer is ~95.02% or 950200 variants.
In the second panel, s/he knows that there are one million possible variants, and wants to know how big her/his library should be in order to ensure that ~95% of them (i.e. 950000 variants) are represented. The answer is that a library of ~2.996 million transformants is required.
In the ideal situation her/his library would contain all one million variants. In the third panel s/he calculates the library size required in order to be 95% sure of complete representation. The answer is ~16.79 million transformants.
Programme to find the expected amino acid completeness of a given library
(not counting any variants with introduced stop codons) where the sequences
in the library are chosen at random from a set of XYZ codon variants (where
X,Y,Z are standard
nucleotide codes chosen by the user, e.g. XYZ
= NNS [N = A/C/G/T; S = C/G; 32 possible equiprobable codon variants; 20
+ 1 possible non-equiprobable amino acid/stop codon variants]). Up to six
codons may be independently varied.
To calculate the library size required in order to obtain a given completeness, or to have a given probability of being 100% complete, just try entering different library sizes and check the resulting library statistics until you home in on your required values.