Genome Sizes

The genome of an organism is the complete set of genes specifying how its phenotype will develop (under a certain set of environmental conditions). In this sense, then, diploid organisms (like ourselves) contain two genomes, one inherited from our mother, the other from our father.

The table below presents a selection of representative genome sizes from the rapidly growing list of organisms whose genomes have been sequenced.



Table of Genome Sizes (haploid)

| Organism | Base pairs | Genes | Notes |
|---|---|---|---|
| Phi-X 174 | 5,386 | 10 | virus of E. coli |
| Human mitochondrion | 16,569 | 37 | |
| Epstein-Barr virus (EBV) | 172,282 | 80 | causes mononucleosis |
| Nanoarchaeum equitans | 490,885 | 552 | This parasitic member of the Archaea has one of the smallest genomes of any true organism yet found. |
| nucleomorph of Guillardia theta | 551,264 | 511 | all that remains of the nuclear genome of a red alga (eukaryote) engulfed long ago by another eukaryote |
| Mycoplasma genitalium | 580,073 | 485 | this and the next two species are three of the smallest true organisms |
| Ureaplasma urealyticum | 751,719 | 652 | |
| Mycoplasma pneumoniae | 816,394 | 680 | |
| Chlamydia trachomatis | 1,042,519 | 936 | the most common bacterial cause of sexually transmitted disease (STD) in the U.S. |
| Rickettsia prowazekii | 1,111,523 | 834 | bacterium that causes epidemic typhus |
| Treponema pallidum | 1,138,011 | 1,039 | bacterium that causes syphilis |
| Mimivirus | 1,181,404 | 1,262 | a virus (of an amoeba) with a genome larger than those of all the cellular organisms above it |
| Rickettsia conorii | 1,268,755 | 1,374 | causes Mediterranean spotted fever |
| Pelagibacter ubique | 1,308,759 | 1,354 | smallest genome yet found in a free-living organism (marine α-proteobacterium) |
| Borrelia burgdorferi | 1.44 × 10⁶ | 1,283 | bacterium that causes Lyme disease (see note below) |
| Aquifex aeolicus | 1,551,335 | 1,749 | bacterium isolated from a hot spring in Yellowstone National Park |
| Campylobacter jejuni | 1,641,481 | 1,708 | frequent cause of food poisoning |
| Helicobacter pylori | 1,667,867 | 1,589 | chief cause of stomach ulcers (not stress and diet) |
| Thermoplasma acidophilum | 1,564,905 | 1,509 | These unicellular microbes (this and the next four entries) look like typical bacteria, but their genes are so different from those of either bacteria or eukaryotes that they are classified in a third domain: Archaea. |
| Methanococcus jannaschii | 1,664,970 | 1,783 | |
| Aeropyrum pernix | 1,669,695 | 1,885 | |
| Pyrococcus horikoshii | 1,738,505 | 1,994 | |
| Methanobacterium thermoautotrophicum | 1,751,377 | 2,008 | |
| Haemophilus influenzae | 1,830,138 | 1,738 | bacterium that causes middle ear infections |
| Thermotoga maritima | 1,860,725 | 1,879 | marine bacterium |
| Streptococcus pneumoniae | 2,160,837 | 2,236 | the pneumococcus |
| Archaeoglobus fulgidus | 2,178,400 | 2,437 | another member of the Archaea |
| Neisseria meningitidis (Group A) | 2,184,406 | 2,185 | causes occasional epidemics of meningitis in less developed countries |
| Neisseria meningitidis (Group B) | 2,272,351 | 2,221 | the most frequent cause of meningitis in the U.S. |
| Encephalitozoon cuniculi | 2,507,519 | 1,997 | plus 69 RNA genes; a parasitic eukaryote |
| Propionibacterium acnes | 2,560,265 | 2,333 | causes acne |
| Listeria monocytogenes | 2,944,528 | 2,926 | 2,853 of these encode proteins; the rest encode RNAs |
| Deinococcus radiodurans | 3,284,156 | 3,187 | on 2 chromosomes and 2 plasmids; bacterium noted for its resistance to radiation damage |
| Synechocystis | 3,573,470 | 4,003 | a marine cyanobacterium ("blue-green alga") |
| Vibrio cholerae | 4,033,460 | 3,890 | on 2 chromosomes; causes cholera |
| Mycobacterium tuberculosis | 4,411,532 | 3,959 | causes tuberculosis |
| Mycobacterium leprae | 3,268,203 | 1,604 | causes leprosy |
| Bacillus subtilis | 4,214,814 | 4,779 | another bacterium |
| E. coli K-12 | 4,639,221 | 4,377 | 4,290 of these genes encode proteins; the rest encode RNAs |
| E. coli O157:H7 | 5.44 × 10⁶ | 5,416 | strain that is pathogenic for humans; has 1,346 genes not found in E. coli K-12 |
| Agrobacterium tumefaciens | 4,674,062 | 5,419 | useful vector for making transgenic plants; shares many genes with Sinorhizobium meliloti |
| Salmonella enterica serovar Typhi | 4,809,037 | 4,395 | plus 2 plasmids with 372 active genes; causes typhoid fever |
| Salmonella enterica serovar Typhimurium | 4,857,432 | 4,450 | plus 1 plasmid with 102 active genes |
| Yersinia pestis | 4,826,100 | 4,052 | on 1 chromosome + 3 plasmids; causes plague |
| Schizosaccharomyces pombe | 12,462,637 | 4,929 | Fission yeast. A eukaryote with fewer genes than several bacteria in this table, including the four listed just below it. |
| Ralstonia solanacearum | 5,810,922 | 5,129 | soil bacterium pathogenic for many plants; 1,681 of its genes are on a huge plasmid |
| Pseudomonas aeruginosa | 6.3 × 10⁶ | 5,570 | increasingly common cause of opportunistic infections in humans |
| Streptomyces coelicolor | 8,667,507 | 7,842 | an actinomycete whose relatives provide us with many antibiotics |
| Sinorhizobium meliloti | 6,691,694 | 6,204 | the rhizobial symbiont of alfalfa; genome consists of one chromosome and 2 large plasmids |
| Saccharomyces cerevisiae | 12,495,682 | 5,770 | Budding yeast. A eukaryote. |
| Cyanidioschyzon merolae | 16,520,305 | 5,331 | a unicellular red alga |
| Plasmodium falciparum | 22,853,764 | 5,268 | plus 53 RNA genes; causes the most dangerous form of malaria |
| Thalassiosira pseudonana | 34.5 × 10⁶ | 11,242 | a diatom; plus 144 chloroplast and 40 mitochondrial protein-encoding genes |
| Neurospora crassa | 38,639,769 | 10,082 | plus 498 RNA genes |
| Caenorhabditis elegans | 100,258,171 | 19,427 | the first multicellular eukaryote to be sequenced |
| Arabidopsis thaliana | 115,409,949 | ~28,000 | a flowering plant (angiosperm); see note below |
| Drosophila melanogaster | 122,653,977 | 13,379 | the "fruit fly" |
| Anopheles gambiae | 278,244,063 | 13,683 | mosquito vector of malaria |
| Tetraodon nigroviridis (a pufferfish) | 3.42 × 10⁸ | 27,918 | Although Tetraodon seems to have about the same number of genes as we do, it has much less "junk" DNA, so its total genome is about a tenth the size of ours. |
| Rice | 3.9 × 10⁸ | 37,544 | |
| Sea urchin | 8.14 × 10⁸ | ~23,300 | |
| Dogs | 2.4 × 10⁹ | 19,300 | |
| Humans | 3.3 × 10⁹ | ~20,500 | |
| Amphibians | 10⁹–10¹¹ | ? | |
| Psilotum nudum | 2.5 × 10¹¹ | ? | see note below |

Note: The gene total for Borrelia burgdorferi is based on 853 genes on its single chromosome (of 910,724 base pairs) plus 430 genes on 11 of the 17 plasmids it contains.
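As a quick check of the arithmetic in this note, the chromosomal and plasmid gene counts can simply be added. The sketch below also estimates how much DNA the plasmids carry by subtracting the chromosome from the rounded genome size of 1.44 × 10⁶ bp given in the table; because that total is rounded, the plasmid figure is only approximate. The variable names are illustrative, not taken from any source.

```python
# Counts for Borrelia burgdorferi, taken from the note above.
CHROMOSOME_GENES = 853
PLASMID_GENES = 430          # genes on 11 of its 17 plasmids
CHROMOSOME_BP = 910_724
TOTAL_BP = 1.44e6            # rounded genome size from the table

total_genes = CHROMOSOME_GENES + PLASMID_GENES
# Approximate only, because TOTAL_BP is itself a rounded figure.
plasmid_bp_estimate = TOTAL_BP - CHROMOSOME_BP

print(f"Total genes: {total_genes:,}")                      # Total genes: 1,283
print(f"Plasmid DNA: about {plasmid_bp_estimate:,.0f} bp")  # Plasmid DNA: about 529,276 bp
```

In other words, roughly a third of this bacterium's DNA, and about a third of its genes, lie outside its chromosome.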

Arabidopsis thaliana is a plant (in the mustard family) that has one of the smallest genomes known in the plant kingdom, and for this reason it has become a favorite of plant molecular biologists. The sequences of two of its five chromosomes (#2 and #4) were published in December 1999. The others were reported in December 2000.

Even though Psilotum nudum (sometimes called the "whisk fern") is a far simpler plant than Arabidopsis (it has no true leaves, flowers, or fruit), it has more than 2,000 times as much DNA. No one knows why, but 80% or more of it is repetitive DNA containing no genetic information. The same is true of some amphibians, which contain up to 30 times as much DNA as we do but are certainly not 30 times as complex.
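These fold differences come from dividing one genome size in the table by another. A minimal sketch, using the table's haploid sizes and, as an assumption made only for this comparison, taking 10¹¹ bp (the top of the range listed for amphibians) as the amphibian figure:

```python
# Haploid genome sizes in base pairs, copied from the table above.
GENOME_BP = {
    "Arabidopsis thaliana": 115_409_949,
    "Psilotum nudum": 2.5e11,
    "Humans": 3.3e9,
    "Amphibians (largest)": 1e11,  # assumption: upper end of the 10^9-10^11 range
}


def fold_difference(larger: str, smaller: str) -> float:
    """How many times more DNA `larger` has than `smaller`."""
    return GENOME_BP[larger] / GENOME_BP[smaller]


psilotum_vs_arabidopsis = fold_difference("Psilotum nudum", "Arabidopsis thaliana")
amphibian_vs_human = fold_difference("Amphibians (largest)", "Humans")

print(f"Psilotum / Arabidopsis: {psilotum_vs_arabidopsis:,.0f}x")  # Psilotum / Arabidopsis: 2,166x
print(f"Largest amphibian / human: {amphibian_vs_human:.0f}x")      # Largest amphibian / human: 30x
```

A different size estimate for either genome would change the ratio in proportion, so these figures are only as good as the table entries behind them.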

The total amount of DNA in the haploid genome is called its C value. The lack of a consistent relationship between the C value and the complexity of an organism (e.g., amphibians vs. mammals) is called the C value paradox.
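One way to see why the C value tracks complexity so poorly is to divide genome size by gene count. The sketch below does this for a few rows of the table; the selection is arbitrary, and the gene counts are the approximate figures given there. "Base pairs per gene" is a crude measure that lumps coding sequence together with introns, regulatory DNA, and repetitive DNA, so treat it only as a rough indicator.

```python
# (genome size in base pairs, approximate gene count), copied from the table above.
GENOMES = {
    "E. coli K-12": (4_639_221, 4_377),
    "Saccharomyces cerevisiae": (12_495_682, 5_770),
    "Caenorhabditis elegans": (100_258_171, 19_427),
    "Drosophila melanogaster": (122_653_977, 13_379),
    "Tetraodon nigroviridis": (3.42e8, 27_918),
    "Humans": (3.3e9, 20_500),
}


def bp_per_gene(bp: float, genes: int) -> float:
    """Average amount of DNA per gene: a crude index of how gene-dense a genome is."""
    return bp / genes


# Rank organisms from most to least gene-dense.
ranked = sorted(GENOMES, key=lambda name: bp_per_gene(*GENOMES[name]))
for name in ranked:
    bp, genes = GENOMES[name]
    print(f"{name:26s} {bp_per_gene(bp, genes):>10,.0f} bp per gene")
```

On these figures the human genome carries roughly 150 times as much DNA per gene as E. coli, and about 13 times as much as Tetraodon, which has a similar number of genes. Most of the difference in C value is therefore DNA that lies outside genes.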

How many genes does it take to make an organism?

The scientists at The Institute for Genomic Research (now known as the J. Craig Venter Institute) who determined the Mycoplasma genitalium sequence followed up this work by systematically disrupting its genes (by mutating them with insertions) to see which ones are essential to life and which are dispensable. Of the 485 protein-encoding genes, they concluded that only 381 are essential to life.
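The reasoning behind such a screen can be sketched as set arithmetic: any gene found disrupted in a mutant that still grows cannot be essential, and the genes never found disrupted in a viable mutant are the candidates for essential genes. The sketch below is a simplification under that assumption (a real screen must also consider, for example, insertions that leave part of a gene working), and the gene IDs in it are hypothetical toy data, not M. genitalium genes. The last lines apply the same bookkeeping to the counts quoted in the text.

```python
from typing import Set, Tuple


def classify_genes(all_genes: Set[str], hit_in_viable_mutants: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split genes into dispensable and putatively essential sets.

    A gene counts as dispensable if at least one viable mutant carried an
    insertion in it; every other gene is provisionally treated as essential.
    """
    dispensable = all_genes & hit_in_viable_mutants
    essential = all_genes - dispensable
    return dispensable, essential


# Hypothetical toy data: ten made-up gene IDs, three of which were disrupted
# in mutants that still grew.
genes = {f"gene{i:02d}" for i in range(1, 11)}
viable_hits = {"gene02", "gene05", "gene09"}
dispensable, essential = classify_genes(genes, viable_hits)
print(len(dispensable), len(essential))  # 3 7

# The M. genitalium counts quoted in the text.
TOTAL_PROTEIN_GENES = 485
ESSENTIAL_GENES = 381
print(f"{TOTAL_PROTEIN_GENES - ESSENTIAL_GENES} dispensable; "
      f"{ESSENTIAL_GENES / TOTAL_PROTEIN_GENES:.1%} essential")  # 104 dispensable; 78.6% essential
```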


27 May 2009