Regulating Information in Molecules: The Convention on Biological Diversity and Digital Sequence Information

The United Nations Convention on Biological Diversity and its subsequent Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity provide a framework to conserve biological diversity, sustainably use biodiversity components and fairly and equitably share their benefits. There is unresolved contention about treating information as a derivative of biological materials and a distinct commodity with a value that can be translated into definable benefits. This article addresses whether there is information in DNA sequences, finding that there is causal information but no intentional or semantic information, although the causal contribution remains difficult to determine. This article concludes that caution should be exercised in limiting access to information in DNA through regulation because of the perverse outcomes controlling potential uses and reducing incentives for others to use information in new and innovative ways.


Introduction
The United Nations Convention on Biological Diversity (CBD) and its subsequent Nagoya Protocol on Access to Genetic Resources and the Fair and Equitable Sharing of Benefits Arising from their Utilization to the Convention on Biological Diversity (Nagoya Protocol) proposed a framework to conserve biological diversity, sustainably use the components of biodiversity and fairly and equitably share the benefits from utilising genetic resources. 1 The basic scheme for fairly and equitably sharing the benefits from utilising genetic resources obliges Contracting Parties to the CBD and Parties to the Nagoya Protocol to consider implementing legislative, administrative and policy measures facilitating access to 'genetic resources' within their sovereign control with prior informed consent and mutually agreed terms (known as access and benefit-sharing (ABS)). 2 In this context, 'genetic resources' are defined as 'genetic material of actual or potential value' and 'genetic material' as 'any material of plant, animal, microbial or other origin containing functional units of heredity'. 3 In practice, however, the term has a very flexible meaning, and Contracting Parties implementing the CBD may apply the term broadly to include most biological materials and derivatives. 4 The Nagoya Protocol extends these obligations to include 'derivatives' 5 and to 'Traditional Knowledge associated with genetic resources'. 6 The CBD has attracted 196 Contracting Parties and the Nagoya Protocol 133 Parties. Most ABS schemes rely on a contractual arrangement between a resource holder and the party seeking access to the resource that incorporates the CBD and Nagoya Protocol obligations, including the sharing of monetary and non-monetary benefits. 7

1 CBD, Art. 1; Nagoya Protocol, Art. 1.
2 CBD, Art. 15; Nagoya Protocol, Arts. 5 and 6.
3 CBD, Art. 2. See also UNEP/CBD/WG-ABS/7/2, [18] and Annex ([3]).
4 See UNEP/CBD/WG-ABS/9/INF/1. See also UNEP/CBD/COP/3/20, [35]-[37].
5 A derivative is 'a naturally occurring biochemical compound resulting from the genetic expression or metabolism of biological or genetic resources': Nagoya Protocol, Art. 2. The interpretation is complicated because the term 'derivative' is included in the definition of 'biotechnology', and that term is then included in the definition of 'utilization of genetic resources' that is engaged in the fair and equitable benefit-sharing of the ABS obligations: Arts. 2, 5.1 and 5.2.

Despite almost three decades of operation, concerns remain about the potential for the CBD and Nagoya Protocol to deliver significant benefits. 8 As a result of these concerns, there has been a resurgent interest in extracting benefits from utilising information associated with genetic resources (e.g., downloading a DNA sequence from a publicly available database) as another source of potentially significant ABS benefits. However, failing to include or extend ABS to genetic information could undermine the existing ABS scheme because information can be utilised without the ABS arrangements, and specifically the benefit-sharing, that apply to physical genetic resources. 9 At the CBD and Nagoya Protocol forums, these concerns were captured by the term 'digital sequence information' (DSI). This term is a 'place holder, without prejudice to future consideration of alternative terms'. 10 The core of the contention is the ways and merits of treating DSI as a derivative of the materials within the ABS transaction itself, which becomes a distinct commodity with a value that the ABS scheme attempts to translate into definable benefits. 11

The purpose of this article is to address the concern that the informational language used in the context of genetic resources (e.g., 'transcription', 'translation', 'coding', 'editing', 'proofreading', 'copying', 'gene expression', 'signals', 'program' and 'book of life') 12 to describe morphological development and evolution, which potentially falls within the scope of the CBD and the Nagoya Protocol's ABS arrangements, is essentially flawed.
While these are not new concerns in the discussion on information metaphors in the biological sciences, they are pertinent because a bottom-up account of genetics based on the information flowing from DNA sequences implies a significance for a sequence that it might not have. In contrast, a top-down account traces an observable phenomenon to the products that result from the DNA sequence and other relevant causes. Put simply, as this article will demonstrate, the informational language used to describe molecular biology 13 'leads to a misleading picture of possible explanations in molecular biology' 14 and has been so pervasive in common understandings of genetics that it has probably limited the perspectives of policymakers addressing genetic resources. If this is correct, then this article argues that founding a legislative, administrative and policy scheme on this misleading picture of bottom-up information flowing from DNA sequences may perpetuate perverse outcomes and (further) 15 undermine the purpose and integrity of ABS schemes.
In addressing these matters, the article is structured as follows. The next part outlines the dimensions of the DSI issue in the CBD and Nagoya Protocol forums. The following part traces the developments of the use of informational language in genetics, detailing the current theoretical framework for information in philosophy and law and distinguishing between the ideals of classical and molecular genes so that simplistic and ultimately misleading conceptions of DNA sequence do not undermine the role and place of information in DNA sequences. The next part discusses the implications of these theoretical threads for the regulation of DSI in the context of ABS. The final part concludes that there already exists adequate potential in the current ABS scheme of CBD Contracting States and Nagoya Protocol Parties to implement ABS legislative, administrative and policy measures to regulate DSI as a genetic resource. Alternatively, those implementing existing national ABS can include terms and conditions as part of prior informed consent and mutually agreed terms addressing DSI (as some Contracting Parties have done already). While initiating such measures is possible, there are potential problems. The result will be a matrix of different laws, policies and practices among the CBD Contracting States and Nagoya Protocol Parties that will likely perpetuate perverse outcomes by controlling the potential uses of information and reducing the incentives for users of genetic resources to apply information in new and innovative ways. Consequently, this is likely to undermine the conservation and sustainable use of

DSI as a CBD and Nagoya Protocol Issue
Formally recorded concerns about ABS and DSI in the CBD and Nagoya Protocol forums emerged in 2016 16 with the decision to establish an Ad Hoc Technical Expert Group on Digital Sequence Information on Genetic Resources (AHTEG-DSI). 17 The AHTEG-DSI compiled and synthesised views about DSI and commissioned a fact-finding and scoping study to consider the concept and scope of DSI and how DSI was currently used. 18 Significantly, the indicative and contextual information 'that may be relevant to the utilization of genetic resources' considered by the AHTEG-DSI included: (a) the nucleic acid sequence reads and the associated data; (b) information on the sequence assembly, its annotation and genetic mapping (this information may describe whole genomes, individual genes or fragments thereof, barcodes, organelle genomes or single nucleotide polymorphisms); (c) information on gene expression; (d) data on macromolecules and cellular metabolites; (e) information on ecological relationships and abiotic factors of the environment; (f) information on function, such as behavioural data; (g) structure, including morphological data and phenotype; (h) information related to taxonomy; and (i) modalities of use. 19 The AHTEG-DSI concluded that more discussion about terminology was required to find a balance that could accommodate scientific, technological, market and other changes and provide legal certainty. 20 Recognising a lack of consensus and common ground about the scope of the CBD and Nagoya Protocol and the likely consequences for DSI on benefit-sharing through technology transfer, partnerships and collaboration, information exchange and capacity development, 21 the AHTEG-DSI continued together with an open-ended working group to develop modalities for sharing benefits from DSI 22 and reported their findings in 2018. 23 Reflecting the lack of agreement, the mandate of the AHTEG-DSI was extended, and it commissioned various additional studies on information traceability, sequence databases and domestic legal, administrative and policy ABS measures addressing DSI and benefit-sharing. 24 The outcomes of the AHTEG-DSI were to be considered by the Open-Ended Intersessional Working Group to Support the Preparation of the Post-2020 Global Biodiversity Framework and at the next Conference of the Parties (COP) to the CBD and the Meeting of the Parties (MOP) to the Nagoya Protocol in 2020. 25 However, the COP and MOP were postponed because of the global coronavirus pandemic.
The AHTEG-DSI has been a rich source of detail on DSI through its call for submissions and the commissioned peer-reviewed studies. The original commissioned scoping study identified the diversity of terms for DSI, including 'resources in silico, genetic sequence data, genetic sequence information, digital sequence data, genetic information, dematerialized genetic resources, in silico utilization, information on nucleic acid sequences, nucleic acid information, and natural information' (emphasis in original). 26 The study noted that more discussion of the terminology was required. It applied a basic conception of DSI as the order of nucleotides in a sequence that might be stored in a computer: '[DSI] is primarily the product of sequencing technologies that have become faster, cheaper, and more accurate in recent years. The aim of DNA sequencing is to determine the order in which each of the four DNA nucleotides is arranged in the molecule'. 27 More recently, and extending previous work, 28 another commissioned study considered the concept and scope of DSI and how DSI was currently used, concluding that the proximity of the information to the underlying physical genetic resource provided a logical basis to group information that could comprise DSI as follows: 'Group 1 - Narrow: DNA and RNA'; 'Group 2 - Intermediate: (DNA and RNA) + proteins'; 'Group 3 - Intermediate: (DNA, RNA and proteins) + metabolites'; and 'Group 4 - Broad: (DNA, RNA, protein, metabolites) + traditional knowledge, ecological interactions, [and so on]'. 29 This study framed its discussion around the information flows represented by the 'central dogma' (DNA to RNA to protein to metabolites). 30 The AHTEG-DSI also commissioned a combined study on databases and traceability of DSI 31 that essentially limited its consideration to databases holding nucleotide sequence data and traceability in the core database infrastructure, the International Nucleotide Sequence Database Collaboration (INSDC). 32 This included nucleic acid sequence reads (the sequence of nucleotides, such as CGAAAGACCGGC) and the associated data and information on the sequence assembly, annotation and genetic mapping. 33 The study found that some of these databases also included 'subsidiary information', which was broadly defined as information on gene expression, data on macromolecules and cellular metabolites, information on ecological relationships and other environmental data, functional data (e.g., behavioural data), structural data (e.g., morphological data and phenotype) and taxonomy data. 34 The study concluded that the INSDC's use of accession numbers (unique identifiers) facilitated database governance and traceability and that, at least in theory, this was a feasible mechanism for tracing nucleotide sequence data from a country of origin to a benefit-sharing user. 35 The other AHTEG-DSI commissioned study on domestic measures addressing the commercial and non-commercial use of DSI and benefit-sharing found four kinds of legislative, administrative and policy measures: regulating DSI as a distinct object of ABS and separate from the physical genetic resources; regulating DSI as a part of the utilisation of physical genetic resources; regulating DSI by requiring benefit-sharing (but not access) to cover the uses of DSI; and regulating DSI through other measures, such as compliance-related measures and monitoring mechanisms. 36 The study found that some jurisdictions explicitly included 'DSI' language like 'genetic information', 'genetic heritage', 'intangible components', 'gene sequences', 'sequence information', 'information' and 'information of genetic origin', while others interpreted their existing ABS legislative, administrative and policy measures as including 'DSI', such as 'genetic resources', 'genetic material', 'biological resources', 'associated knowledge', 'information of genetic origin', 'research results' and 'derivative'.
However, the distinction between explicit and interpretive coverage was not necessarily clear. 37 Where DSI was regulated as a distinct object of ABS and separate from the physical genetic resources, the ABS schemes extended broadly to include information associated with the genetic resources. For example, Malaysia's ABS laws apply broadly to 'biological resources', which include 'genetic resources, organisms, microorganisms, derivatives and parts of the genetic resources, organisms, microorganisms or derivatives', 'the populations and any other biotic component of an ecosystem with actual or potential use or value for humanity' and 'information relating to' these 'biological resources'. In addition, the definition of 'derivative' includes 'information in relation to derivatives'. 38 Kenya's ABS laws apply to 'access', which means 'obtaining, possessing and using genetic resources conserved, whether derived products and, where applicable, intangible components, for purposes of research, bio-prospecting, conservation, industrial application or commercial use', where 'intangible components' are 'any information held by persons that is associated with or regarding genetic resources'. 39 Others regulate DSI by requiring benefit-sharing (but not access) obligations to cover the uses of DSI. 
For example, under India's ABS laws, benefit-sharing obligations apply to 'biological resource occurring in India or knowledge associated thereto' for 'research or for commercial utilisation or for bio-survey and bio-utilisation', 40 where 'research' means the 'study or systematic investigation of any biological resource or technological application, that uses biological systems, living organisms or derivatives thereof to make or modify products or processes for any use'; 'commercial utilisation' means 'end uses of biological resources for commercial utilisation such as drugs, industrial enzymes, food flavours, fragrance, cosmetics, emulsifiers, oleoresins, colours, extracts and genes used for improving crops and livestock through genetic intervention, but does not include conventional breeding or traditional practices in use in any agriculture, horticulture, poultry, dairy farming, animal husbandry or bee keeping'; and 'bio-survey and bio-utilisation' means the 'survey or collection of species, subspecies, genes, components and extracts of biological resource for any purpose and includes characterisation, inventorisation and bioassay'. 41 The Indian law also provides that '[n]o person shall, without the previous approval of the National Biodiversity Authority, transfer the results of any research relating to any biological resources occurring in, or obtained from, India'. 42 While there is no consensus apparent in the existing practices about the best ways to regulate DSI, there are, as the examples demonstrate, various forms of genetic information already subject to regulation in implementing CBD and Nagoya Protocol-consistent ABS schemes.
In addition to these commissioned studies, submissions of views and information to clarify the concept of DSI and benefit-sharing arrangements from using DSI have been made by CBD Contracting Parties and others, including other governments, Indigenous Peoples and local communities, relevant organisations and stakeholders. 43 The responses fall essentially into three groupings: those arguing that DSI should not be a part of the CBD and Nagoya Protocol; 44 those favouring some accommodation of DSI; 45 and those favouring or already including DSI in their legal, policy and administrative ABS arrangements. 46 As a broad generalisation, technologically rich CBD Contracting Parties favour DSI not being a part of the CBD and Nagoya Protocol, and technologically poor CBD Contracting Parties favour DSI being accommodated or included in the CBD and Nagoya Protocol. As a useful summary of the way forward, the recent Open-Ended Working Group on the Post-2020 Global Biodiversity Framework considered a typology of possible regulatory options (although traditional knowledge associated with genetic resources was not addressed): 47 'Option 0: Status Quo', addressing DSI under the existing arrangements through domestic ABS laws, policies and processes; 'Option 1: DSI Fully Integrated into the [CBD] and the Nagoya Protocol', addressing DSI as a genetic resource under the CBD and Nagoya Protocol and as an obligation under those agreements and implemented in domestic ABS laws, policies and processes; 'Option 2: Standard [Mutually Agreed Terms]', addressing DSI through an obligation to share benefits from the uses of DSI without restricting access to DSI itself through some kind of agreement with standard terms and conditions; 'Option 3: No [Prior Informed Consent], No [Material Transfer Agreement]', addressing DSI by requiring a payment or contribution for access or use of the DSI into a multilateral fund without the need for prior informed consent or mutually agreed terms
and ABS contracts; 'Option 4: Enhanced Technical and Scientific Cooperation', democratising access and use of DSI so that each country has the capacity and opportunity to access and use DSI; and 'Option 5: No Benefit Sharing from DSI', no mechanisms are proposed and there is no benefit-sharing from the use of DSI. 48 Returning to the AHTEG-DSI and the indicative and contextual information 'that may be relevant to the utilization of genetic resources', 49 the commissioned study essentially considered four kinds of information according to 'the flow of information from a genetic resource, particularly the degree of biological processing and proximity to the underlying genetic resource, to provide a logical basis to group information that may comprise DSI'. 50 Underpinning this 'logical basis to group information' was a particular conception of information in genetics founded in an ideal of the 'central dogma', 51 which expresses genetic information as 'nucleotide sequence information associated with transcription', 'protein sequence' information associated with translation, 'information associated with transcription and translation' and 'metabolites and biochemical pathways, thus comprising information associated with transcription, translation and biosynthesis' and 'extends to behavioural data, information on ecological relationships and traditional knowledge, thus comprising information associated with transcription, translation and biosynthesis, as well as downstream subsidiary information concerning interactions with other genetic resources and the environment as well as its utilization, among other subsidiary information'. 52 Importantly, the study also addressed the 'degree of biological processing and proximity to the underlying genetic resource' to distinguish between 'data' and 'information', the latter information being processed data. 53 The issue of the broader concept of biological information is addressed next. 
However, first, it is important to make a distinction here between information about DNA sequences and information in DNA sequences.
Information about DNA sequences is the vast quantity of information produced, collected, stored, accessed, managed and manipulated, including the order of nucleotides in a sequence, how the sequencing was conducted, annotations and functional analysis. This information is the subject matter of the information science of bioinformatics applied to genetics:
What makes biology an information science in this sense is not anything about the nature of genes, but the fact that contemporary biology works with vast bodies of data that the unaided human mind is incapable of processing effectively. 54
The CBD already has an extensive mechanism to address this information about DNA sequences that is independent of the ABS obligations. 55 Essentially, the CBD has a general obligation to promote the exchange of information on the 'results of technical, scientific and socio-economic research', 'training and surveying programmes', 'specialized knowledge' and '[I]ndigenous and traditional knowledge as such and in combination with the technologies ['relevant to the conservation and sustainable use of biological diversity or make use of genetic resources']' 56 and 'where feasible, include repatriation of information'. 57 There is a clearing house mechanism 'to promote and facilitate technical and scientific cooperation' 58 realised through decentralised databases and websites (information hubs) and national government websites. 59 The Nagoya Protocol Access and Benefit Sharing Clearing House is a part of the CBD's clearing house mechanism and applies only to ABS arrangements and 'access to information made available by each Party relevant to the implementation of this [Nagoya] Protocol'. 60 The CBD's Clearing House Mechanism (including the Nagoya Protocol Access and Benefit Sharing Clearing House), linked sites and sites linked to those sites set out information about DNA sequences and genetic resources more broadly. 61 In contrast, information in DNA sequences is 'a theoretical entity which exists in the genome and explains biological phenomena'. 62 Information in DNA sequences is the information in genetic resources as opposed to the information about genetic resources.
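To make the category of information about DNA sequences more concrete, the following is a minimal sketch of the kind of record a sequence database might hold. The class name, field names and values (including the accession value) are hypothetical and chosen only for illustration; they do not reproduce the schema of the INSDC or any other database. The sketch deliberately has no field for information in DNA sequences, because that is a theoretical entity rather than a stored datum.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SequenceRecord:
    """Hypothetical record of information *about* a DNA sequence."""
    accession: str          # unique identifier (illustrative value only)
    nucleotides: str        # the order of nucleotides in the sequence
    sequencing_method: str  # how the sequencing was conducted
    annotations: List[str] = field(default_factory=list)  # annotations and functional analysis


def describe(rec: SequenceRecord) -> str:
    """Summarise what the record stores about the sequence."""
    return (f"{rec.accession}: {len(rec.nucleotides)} nucleotides, "
            f"{len(rec.annotations)} annotation(s)")


# Illustrative values; the nucleotide string is the example read quoted earlier in this part.
record = SequenceRecord(
    accession="EXAMPLE0001",
    nucleotides="CGAAAGACCGGC",
    sequencing_method="unspecified",
    annotations=["hypothetical annotation"],
)
print(describe(record))
```

Every field in this sketch is something produced, stored and managed by people and machines, which is the sense of information about DNA sequences used here.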

DNA Sequences as Information
It is uncontroversial that DNA is a linear sequence of molecules that can be presented as syntactic information in the language of the genetic code. 63 Letters of the alphabet making words that are joined into sentences, paragraphs and chapters represent syntactic information in language (here English). Similarly, photographs, music and computer programs are all, or can be rendered into, linear sequences of syntactic information in language (i.e., 0s and 1s of binary code). The proposition here is that because linear sequences in the form of words in sentences, paragraphs and chapters (and also photographs, music and computer programs) are information, then, similarly, DNA molecules in a linear sequence with a code represent information. 64 The question then is whether DNA molecules can actually be information.
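The syntactic point can be illustrated with a minimal sketch that renders a nucleotide sequence into the 0s and 1s of binary code and back again. The two-bit mapping is an arbitrary assumption made for illustration, not a standard encoding, and the function names are hypothetical. The sketch shows only that a linear sequence can be re-expressed in another symbol system without loss; it does not address whether the sequence carries causal, intentional or semantic information.

```python
# Arbitrary, illustrative two-bit code for the four DNA nucleotides (an assumption).
CODE = {"A": "00", "C": "01", "G": "10", "T": "11"}
DECODE = {bits: base for base, bits in CODE.items()}


def to_bits(sequence: str) -> str:
    """Render a nucleotide sequence as a string of 0s and 1s."""
    return "".join(CODE[base] for base in sequence.upper())


def from_bits(bits: str) -> str:
    """Recover the nucleotide sequence from its two-bit rendering."""
    return "".join(DECODE[bits[i:i + 2]] for i in range(0, len(bits), 2))


sequence = "CGAAAGACCGGC"
bits = to_bits(sequence)
print(bits)
print(from_bits(bits) == sequence)
```

The round trip succeeds for any string of As, Cs, Gs and Ts, which is all that the syntactic claim requires.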
The ideal of a DNA sequence as information traces back to at least 1953 65 and the 'central dogma' that information flows from DNA to RNA to proteins but not the other way out of proteins. 66 According to this account, the organism's genome accumulates information (through the mechanisms of evolution) for transmission to the next generations. 67 The organism itself is merely the reservoir and transmitter of information. 68 Taken literally, a DNA sequence as information means that the arrangements of As, Ts, Gs and Cs represent the raw data, and they are themselves information. This is consistent with the ideal of life as information and programmable Boolean switches. 69 This might appear intuitively correct given the explosion of bioinformatics as a technological discipline exploiting information. 70 Unfortunately, this notion overlooks the complexity of genetics and gives undue weight to a particular conception of genotype as the causative (or purposeful) explanation (as addressed in detail below). The trajectory of this debate is important because developing regulatory schemes based on a particular perspective or preference regarding an unsettled theoretical foundation is likely to lead to bad laws and unforeseen consequences. At the heart of this problem is finding a common understanding for the term 'information' in genetics and how this might be addressed by law because, very crudely, it is not clear whether 'information' is addressing these molecules literally or metaphorically. 71 A good entry point into this ongoing debate, and an obviously very brief account, 72 is Charles Darwin's 1859 theory of natural selection that set aside the idea that species were immutable, static and designed by a god. Darwin instead introduced the idea that species had adapted to their environments over many generations. 73 While Darwin posited that evolution and inheritance were linked, he also accepted that he was unable to explain the mechanism by which traits were inherited. 
74 In 1865, Gregor Mendel provided such an account, positing from his experiments with peas that dominant and recessive elements were inherited. He traced those elements through hybrids as different constitutions and groupings of elements ('Faktoren'). 75 Mendel's insight was to mediate the relationship between genotypes ('genes') and their phenotypes ('unit characters') 76 by assigning the unobservable genotype to a phenotype (i.e., traits, such as seed texture, seed colour, pollen texture and pollen colour) and tracking the 'dominant' and 'recessive' phenotypes across hybridising crosses. 77 In this sense, Mendel's elements were necessary for his explanation to work, 78 and as such, the Mendelian gene (albeit Mendel was no Mendelian) 79 refers to the unit of inheritance ('Zellelemente') 80 that predicted the apparent characters across generations. At its most simple, the Mendelian gene is an account of a mechanism for observed phenotypes from sexual crosses. This account posits a 'gene' (coined by Wilhelm Johannsen) 81 to be an undefined unit of inheritance transmitted across generations and links the phenotype to a genotype. 82 Here the genotype was the speculated and inferred cause of the observed phenotypes, with no explanation of the material and instrumental manifestation of the Mendelian gene itself. 83 Importantly, however, 'what were studied were character differences, not characters, and what explained them were differences in genes, not the genes themselves'. 84 As a unit of inheritance, the Mendelian gene was a theoretical explanation of two kinds: first, the heritable factors an offspring receives, one from each parent in sexual crosses, albeit not an observable entity but an explanation of the observations of segregation and independent assortment (the classical gene); and second, the material and instrumental entity of heredity (the molecular gene). 
85 Both forms of the Mendelian gene persist, 86 although modern genetic practitioners often conflate the two forms:
When molecular biologists focus on nucleotide sequences, they think of genes in molecular concept. But at earlier stages of investigation, when they have not gotten close to specifying nucleotide sequences, they tend to think of genes in terms of the rougher-grained classical concept. 87

68 Morange, "History of Molecular Biology," 2.
69 For an elegant account of this perspective, see Rosenberg, "Darwinian Reductionism."
70 See, for example, Ranganathan, "Encyclopedia of Bioinformatics."
71 Griffiths and Stotz, "Genetics and Philosophy," 146-147.
72 The history of genetics is well traversed. See, for example, Falk, "Genetic Analysis: A History"; Carlson, "Mendel's Legacy"; Keller, "The Century of the Gene"; Morange, "History of Molecular Biology."
73 Darwin, "On the Origin of Species." For an overview, see Bowler, "Evolution: The History of an Idea."
74 Darwin, "On the Origin of Species," 19-20.
75 Mendel, "Versuche über Pflanzenhybriden," 42.
76 For language developed by Wilhelm Johannsen, see Johannsen, "The Genotype Conception of Heredity." See also Roll-Hansen, "Sources of Wilhelm Johannsen's Genotype Theory."
77 This is the passing over of a more complicated and intriguing moment in the history of genetics: see Falk, "Mendel's Impact."
78 Griffiths and Stotz, "Genetics and Philosophy," 15.
79 Olby, "Mendel no Mendelian?"
80 Mendel, "Versuche über Pflanzenhybriden," 42.
81 See Johannsen, "Elemente der exakten Erblichkeitslehre."
82 See Wanscher, "Analysis of Wilhelm Johannsen's."
83 For an account of this perspective, see Waters, "Genes Made Molecular," 169-174 and the references therein.
84 Waters, "Genes Made Molecular," 172.
85 For an engaging discussion of these different uses of the gene concept leading to Gregor Mendel being characterised as a methodological reductionist, the later re-discovery of Mendel's work and Hugo de Vries applying the concept to the material and causal elements as a conceptual reductionist, see Falk, "Genetic Analysis: A History," 4. See also Kitcher, "1953 and all That," 336.
86 See Carlson, "Mendel's Legacy."
87 Waters, "Genes Made Molecular," 183.
Rather than thinking of classical genes and molecular genes as separate theories, 88 most genetics practitioners consider a continuous theory addressed at two levels of resolution, with the classical genes being an 'organic extension' of the molecular gene. 89 Conceived this way, genetics is a reductionist, bottom-up account based in a physical sciences methodology using numerical analyses, with the outcome that the methodological and conceptual understanding of heredity is reduced to the sum of the physical and chemical properties of its building blocks, and that is DNA as the molecular gene, so the genotypes are the causes of phenotypes. 90 For the present purposes, it is significant that the later biochemical and molecular biological account of the material and instrumental manifestation of the gene overlooked this classical account of the units of a genetic and nongenetic context that resulted in the observed phenotype. 91 Put simply, 'molecular biologists can now determine the exact molecular identity of the relevant differences and explain how in general such differences produce phenotypic difference within a genetic context' (emphasis added). 92 This becomes clear when tracing the ideal of the material and instrumental manifestation of the molecular gene as opposed to the classical gene. This is important because the molecular gene has taken precedence as the account of genetics and gained a popular appeal 93 that overlooks much of the intriguing complexity and the role of other non-genetic (epigenetic) 94 factors in the observed phenotype, such as environmental effects.
The key moments in tracing the primacy of the material and instrumental manifestation of the molecular gene might be, hopefully uncontroversially, 95 106 Seymour Benzer proposed the conception of genes as linear structures along chromosomes (rather than being like beads on a necklace, they are instead divisible into smaller units of mutation and recombination) in 1955; 107 and, the elucidation of the genetic code by Crick and others in 1961. 108 The end of this track is the ideal of a linear molecular gene where the DNA sequence is considered the genotype and causative agent for the observed phenotype.
Intriguingly, Watson and Crick, in proposing the double helix structure for DNA in 1953, speculated about the 'possible copying mechanism for the genetic material' 109 and 'the precise sequence of the bases in the code that carries the genetic information'. 110 This essentially entrenched the use of information language in molecular biology, which had started with terms such as 'words', 'codes', 'messages' and 'texts' in the 1930s and taken solid hold in the 1940s. 111 Crick's speculation later matured to the generalised rule for the informational transfer from one polymer to another (DNA to RNA, RNA to DNA, RNA to RNA, DNA to DNA and RNA to protein but not protein to protein, 112 protein to DNA and protein to RNA) 113 so that 'once "information" has passed into protein it cannot get out again', where 'information' means 'the precise determination of sequence, either of bases in the nucleic acid or of amino acid residues in the protein' (emphasis in original). 114 There are two parts to this claim that played out over the following decades. First, there is a coded sequence specificity between the DNA and the transcribed RNA and the translated polypeptide (sequence hypothesis). 115 Second, the expression of the DNA sequence determines the RNA or a protein product such that all products are informed (specified or caused) by the DNA sequences (central dogma). 116 Some of the details here matter: Crick accepted that protein synthesis involved 'the flow of energy, the flow of matter, and the flow of information', and his focus was the 'information'. 117 Crick later stated that: it was abundantly clear by that time that a protein had a well-defined three-dimensional structure, and that its activity depended crucially on this structure, it was necessary to put the folding-up process on one side, and postulate that, by and large, the polypeptide chain folded itself up. 
[118] This temporarily reduced the central problem from a three dimensional one to a one dimensional one … The principal problem could then be stated as the formulation of the general rules for information transfer from one polymer with a defined alphabet to another. 119 The key advances reinforcing these determinist and informational explanations of the now linear molecular gene, again hopefully uncontroversially, were: François Jacob, Jacques Monod, Sydney Brenner, François Gros, Francis Crick and others' discovery of mRNA in 1960; 120 the genetic code linking triplets of nucleotides (codons) to specific amino acids in 1961; 121 Jacob and Monod's explanation of a regulation mechanism (the lac operon) on a linear DNA molecule accounting for the relationship between DNA, RNA and proteins and pointing to a hierarchical network of regulation in 1962; 122 Jim Shapiro and others' isolation of a bacterial gene (the lac operon) in 1969; 123 Howard Temin and David Baltimore's discovery of the enzyme that reverses the transcription process, making DNA from an RNA template, in 1970; 124 David Jackson, Robert Symons and Paul Berg making recombinant DNA molecules in 1972; 125 Herbert Boyer and Stanley Cohen showing that engineered DNA molecules could be cloned in foreign cells in 1973; 126 and then the sequencing of various genomes, including the bacteriophage φX174 in 1978, 127 Haemophilus influenzae in 1995, 128 the yeast Saccharomyces cerevisiae in 1996, 129 the nematode Caenorhabditis elegans in 1998, 130 the fruit fly Drosophila melanogaster in 1999 131 and the human genome in 2000. 132 The end of this track of enquiry was to cement the ideal of a linear molecular gene where the DNA sequence was popularly considered the genotype and causative agent for all the observed phenotypes.
The appeal of this approach was the focus on the simple explanatory power 133 of the information comprised by DNA both as a store of evolutionary accumulated changes and as a master plan for cell development and performance. 134 This was also the logic of a reductive physics and chemistry account where the molecule is the semantics of the transmitted information: 'the polypeptide phenotype is determined by the polynucleotide genotype'. 135 This has been, as the tracing of key advances above illustrates, amazingly heuristically successful. The key point, however, is that framed this way, the DNA sequence is conceived as a repository of meaning, aboutness and content (intentional or semantic information) regarding complex phenotypes so that the genotypes are privileged in the causes of phenotypes and the ideal that there is information in the DNA sequence. 136 The problem remains, however: If there is information in the DNA sequence, what kind of information is it?

Information in DNA Sequences
Even when Watson and Crick were proposing their DNA structure, the information metaphors were known to be problematic. 137 The postgenomic era following the release of the draft human genome sequence in 2001 138 revealed that the relationship between the 20,000-25,000 structural genes and the approximately 1,000,000 polypeptides of the proteome 139 was a lot more complicated than a mere linear sequence of nucleotides corresponding to a linear order of the gene products (RNA and amino acids in a polypeptide). Since then, the disruption of information flows has been repeatedly demonstrated, revealing that the DNA sequence and other factors (including non-genetic factors) are the contributors to RNA and protein sequence specificity (both quality and quantity): genes comprising complex regulatory networks affected by non-genetic factors result in a range of stable and robust phenotypes; 140 epigenetic acetylation, phosphorylation and methylation markings can regulate gene transcription; 141 a range of cis-and trans-acting factors interact with the linear DNA sequence (e.g., transcription factors, promotors, activators, repressors, enhancers, silencers, and splicing factors), resulting in a diversity of RNAs (e.g., mRNA, rRNA, tRNA, lncRNA and RNAi) and protein forms/splice variants; 142 reordering of the linear DNA sequences through frameshifting, programme slippage or bypassing and codon redefinition; 143 and RNA editing resulting in a significantly larger transcriptome. 144 These examples confirm that there is not necessarily a consistent nexus between the DNA sequence as the sole source of information flowing from the DNA to an RNA and a protein. 
145 Put simply, the evidence now clearly shows that there is not always a direct correspondence between the DNA sequence and protein because there is processing and modification of that sequence on the way from DNA to protein, 146 and modification of that processing and function can be from outside the DNA sequences (and particularly from the environment). 147 This calls for a nuanced depiction of DNA as information because a DNA sequence alone (genotype) cannot cause the whole organism. There are other contributing causes, such as the environment, and thus, the classical gene (observed phenotype) is not necessarily only caused by the molecular gene (deterministic and reduced to a DNA sequence). There are potentially better accounts of the information in DNA sequences, as argued in the following paragraphs.
A useful starting point is Shannon information theory, which posits a simple quantitative framework for describing correlations: two variables are correlated in some sense where the output of the channel depends on the input. 148 Here the correlation termed information has a special and technical mathematical sense that treats both sense and non-sense input messages as the same. 149 Taking this a step further, the information might be considered to have some content in the sense of natural signs and indicators, 150 with a correlation between a DNA sequence and an observed phenotype 151 and information conveyed in the sense of its natural meaning. 152 Thus: The key currency in information theory is the entropy H(X) of a random variable X. The entropy is a measure of uncertainty in the realization of X. If X takes on value $x_i$ with probability $p_i$, the entropy $H(X) = -\sum_i p_i \log p_i$. The key statistic in information theory is the mutual information I(X;Y) between two random variables X and Y. The mutual information, defined as $I(X;Y) = H(X) - H(X \mid Y)$, measures how much we learn about the value of X by knowing Y. 153 If the DNA sequence is information in this sense, that is, the correspondence whereby the linear sequence of DNA nucleotides specifies the linear sequences of RNA ribonucleotides and of amino acids (protein), then Shannon information theory has some application with the input DNA sequence, the output amino acid sequence and the information transfer from DNA to amino acid modelled as the communication channel. 154 This accepts that DNA sequences do have a limited causal role 155 but not a broader intentional or semantic role. 156 Using Crick's words, this is 'the general rules for information transfer from one polymer with a defined alphabet to another', 157 where 'information was "merely a convenient shorthand for the underlying causal effect", namely the "precise determination of sequence"'. 
158 This may also be conceived as a causal specificity between the DNA, the RNA and the coded amino acids, 159 representation of the RNA and amino acids in the DNA sequence and the correlation information between the DNA sequence, RNA and amino acids 160 -the 'information that specifies the product is no longer carried in a three-dimensional structure but instead by the linear, one-dimensional order of elements in each sequence'. 161 This might usefully be termed 'Crick information'. 162 This Crick information also extends to regulatory DNA sequences (including non-coding sequences) and sequences of factors controlling transcription, splicing and editing and non-coding RNAs, where there is a correspondence between the DNA nucleotides and the RNA ribonucleotides and protein amino acids. 163 The next question, however, is whether DNA sequences can hold more information than just Crick information.
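The entropy and mutual information measures quoted above can be made concrete with a minimal sketch. It assumes a toy channel, not drawn from the cited literature, in which four codons are equally likely inputs and the output is the amino acid each specifies under the standard genetic code. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), skipping zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(pairs):
    """Estimate I(X;Y) = H(X) - H(X|Y) from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    h_x = entropy(c / n for c in x_counts.values())
    # H(X|Y) is the average, over outcomes y, of the entropy of X given Y = y.
    h_x_given_y = 0.0
    for y, y_count in y_counts.items():
        conditional = [joint[(x, y)] / y_count for x in x_counts]
        h_x_given_y += (y_count / n) * entropy(conditional)
    return h_x - h_x_given_y

# Toy channel (an illustrative assumption): X is the codon, Y the amino acid.
# GGC and GGA both specify glycine (G); AAG and AAA both specify lysine (K).
observations = [("GGC", "G"), ("GGA", "G"), ("AAG", "K"), ("AAA", "K")]
h_codon = entropy([0.25] * 4)
info = mutual_information(observations)
print(h_codon, info)
```

Under these assumptions the codon carries 2 bits of uncertainty and knowing the amino acid removes 1 bit of it. Nothing in the calculation depends on what the symbols mean, which is the purely quantitative sense of information described above.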
The main contribution from Mendel's genetics was to establish a process among scientists of an experimental tradition hybridising organisms and looking to draw inferences from the patterns of inheritance. 164 The quest for the causative agents shifted from the unobservable Mendelian gene to the molecular gene. This has privileged and preferenced DNA sequence 148 See Shannon, "Mathematical Theory of Communication." See also Fabris, 'Shannon Information Theory." 149 Shannon, "Mathematical Theory of Communication," 99. 150 Dretske, "Epistemology and Information," 31. 151 See Godfrey-Smith, "Information in Biology," 104. Noting, however, that a sign, an object and an interpretant may actually be required for there to be any meaningful correlation. Hence, 'while tree rings might be a source of quantitative information, they would not mean anything about a tree's age unless there were an agent present who understood how tree rings are produced and how they relate to yearly growth': Kumar, "Information, Meaning, and Error," 91. 152 Grice, "Meaning." 153 Bergstrom, "The Transmission Sense of Information," 161. 154 See, for example, Bynum, "Informational Metaphysics," 205; Román-Roldán, "Application of Information Theory to DNA Sequence Analysis," 1188. Noting, of course, this might all be a misunderstanding: see Ben-Naim, "Entropy and Information Theory," 1185-1189. 155 See, for example, Godfrey-Smith, "Information, Arbitrariness, and Selection," 204. For a contrary view, see Kitcher, "Battling the Undead" (arguing genetic coding has no explanatory weigh). 156 For other views, see Rosenberg, "Is Epigenetic Inheritance a Counterexample"; Fantini, "Of Arrows and Flows"; Weber, "The Central Dogma as a Thesis." 157 Crick, "Central Dogma of Molecular Biology," 561. 158 Stotz, "Biological Information, Causality and Specificity," 369. See also Griffiths, "Genetic, Epigenetic and Exogenetic Information," 2. 
159 See Griffiths, "Measuring Causal Specificity"; Pocheville, "Comparing Causes," 93-102. 160 See Shea, "Representation in the Genome," 314. 161 Stotz, "Biological Information, Causality and Specificity," 368. Note that three-dimensional structures can now be reliably predicted: see Jumper , "Highly Accurate Protein Structure Prediction." 162 Griffiths, "Genetics and Philosophy," 40-41. See also Sarkar, "Genes Encode Information," 261-262, 281-284; Godfrey-Smith, "Information, Arbitrariness, and Selection," 203; Godfrey-Smith, "On the Theoretical Role of 'Genetic Coding'," 35; Godfrey-Smith, "Genes and Codes," 325. In more recent work, Paul Griffiths and Karola Stotz have 'reserved the term "Crick information" for a measure of the intrinsic information content of a sequence, rather than for the measure of the relationship between a sequence and its causes': Griffiths, "Genetic, Epigenetic and Exogenetic Information," 7. See also Stotz, "Biological Information, Causality and Specificity," 367-370. 163 Stotz, "Biological Information, Causality and Specificity," 380; Stotz, "With 'Genes' Like That," 538. Noting that this is not uncontroversial, see Sarkar, "Decoding 'Coding'-Information." 164 Griffiths, "Genetics and Philosophy," 17. See also Falk, "Genetic Analysis: A History," 4. information with the presumption that DNA sequences are the cause of the subsequent transcription, translation and then protein action and the postgenomic events resulting in a phenotype that can be traced back to a causal DNA sequence. This is consistent with the early understanding of the central dogma that the linear order of nucleotides in DNA specifies the linear order of nucleotides in RNA and the linear order of amino acids in polypeptides (hence Crick information) and early analyses that identified DNA sequences as making the actual causal difference for the observed RNAs and proteins. 165 In addition to this is the regulation of linear genes. 
With the lac operon as the model in mind, for regulatory DNA sequences upstream of a DNA coding sequence for the observed RNAs and proteins, there was information about the conditions for expression and a programmed blueprint for the resulting cells, organs and organisms. The controversy is whether there is information in the DNA sequence that has a role in development (i.e., programs for individual organisms where the program carries the information for development) and evolution (i.e., inherited characters holding the accumulated information over evolutionary time). Putting this another way, DNA appears as a structure and mechanism with a purpose, function, end or goal as information for the programmed machine of the cell, organ, organism and so on, with mutation modifying the DNA and natural selection choosing fitter outcomes such that the DNA sequences direct the effects. 166 Information, in this sense, can generally be considered either causative or intentional or semantic. 167 This causative information, in addition to Crick information, is considered in this section, and the intentional or semantic information is considered in the next.

Causative Information
As set out above, causative information is about the quantity of information (i.e., order as opposed to disorder/entropy and not meaning) within a physical system so that information flows between a sender and a receiver through a channel about the correlations between the sender and receiver (hence Shannon information). The signal carries information about a source such that you can predict the state of the source from the signal: 168 'whenever Y is correlated with X, we can say that Y carries information about X', 169 and, hence, 'the disease phenotypes carry information about disease genes', 170 'genes carry information about phenotypes just as smoke carries information about fire' 171 and 'genes contain information about the proteins they make, and … about the whole-organism phenotype [in the same way as] there is an informational connection between smoke and fire, or between tree rings and a tree's age '. 172 In the context of DNA sequences, the sequences can be the source, the whole organism the receiver and the channel conditions are the resources needed for the organism's life cycle. 173 For a causal information account to be sufficient, holding the resources needed for the organism's life cycle constant would give information about the sequences. 174 As set out above, this causal information account is sufficient for Crick information because the DNA sequence is directly physically causally related to the RNA and amino acid sequence. 175 The DNA sequence CGAAAGACCGGC correlates (through the RNA sequence) with the amino acid sequence RKTG. Therefore, the DNA sequence CGAAAGACCGGC carries information about the amino acid sequence RKTG (Crick information). However, the causal information account is also sufficient where there is a correlation between a genotype and a phenotype as well as every other correlated non-genetic factor and the phenotype. 
176 For example, 'an individual with the gene for achondroplasia will have short arms and legs [and] we can equally well say that a baby's environment carries information about growth; if it is malnourished, it will be underweight'. 177 In this sense, DNA sequence information (the gene for achondroplasia) is qualitatively the same as any other kind of correlated information (malnourished and underweight). Importantly, this information is not being used for an explanatory account to say what the DNA sequence product does or how it does it. 178 The apparent weakness of this causal account, if it is taken as information in a richer sense, is that it makes information ubiquitous because it is not possible to distinguish genetic from non-genetic causes. 179 Put differently, DNA sequences do carry syntactic information (Shannon information) about RNA and amino acid sequences and some phenotypes, 180 but the DNA sequences cannot explain all the causal aspects of phenotypes, and every causal input, including non-genetic causes, will also be a source of information. 181 This is the 'parity thesis' (or 'parity of reasoning') that the informational causal factors 182 resulting in a phenotype are both the DNA sequences and other non-DNA elements, and as causal factors, the DNA sequences do not have any necessary priority or privilege. 183 The result is that, without being able to weigh the different genetic and non-genetic causes, this causal information can at best be about correlations, and it will be about any and all correlations.
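The correspondence between CGAAAGACCGGC and RKTG given earlier in this section can be checked with a short sketch. It assumes the sequence is a coding-strand sequence read from its first base under the standard genetic code; the function name and the second sequence are hypothetical, introduced only for illustration.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate a coding-strand DNA string codon by codon from position 0."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

article_sequence = "CGAAAGACCGGC"
# Hypothetical second sequence using synonymous codons for the same residues.
synonymous_sequence = "AGGAAAACTGGA"

print(translate(article_sequence))     # RKTG
print(translate(synonymous_sequence))  # RKTG
```

The second sequence shows that the correspondence is many-to-one: the DNA sequence fixes the amino acid sequence, but the amino acid sequence does not fix a unique DNA sequence.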
In attempts to bring more nuances to causal accounts, there have been further proposals. The starting point is accepting that phenotypes are the result of an interaction between DNA sequences (genes) and the environment (called the 'interactionist consensus') 184 and rejecting the ideal that DNA sequences (genes) have some necessarily superior causative role over the environment in determining a phenotype (called 'causal democracy'). 185 In this sense, DNA sequences have a differencemaking role when they actually affect the phenotype ('actual difference-makers'), 186 and this opens up the possibility of discriminating the contribution of a DNA sequence as 'causal relationships are relationships that are potentially exploitable for purposes of manipulation and control' from other genetic and environmental causes. 187 Using this ideal of 'causal specificity', it is then possible to distinguish the causal information from DNA sequences that provide information about their effects and quantify this information using Shannon information theory measures-effectively the reduction of uncertainty. 188 Essentially: There is a causal relationship between variables X and Y if it is possible to manipulate the value of Y by intervening to change the value of X. 'Intervention' here is a technical notion with various restrictions. For example, changing a third variable Z that simultaneously changes X and Y does not count as 'intervening' on X. Causal relationships between variables differ in how 'invariant' they are. Invariance is a measure of the range of values of X and Y across which the relationship between X and Y holds. But even relationships with very small ranges of invariance are causal relationships. 
189 The idea here is that the more specific a causal relationship is between a DNA sequence and the observed phenotype, the more informational the DNA sequence will be: 'The specificity of a causal variable is obtained by measuring how much mutual information interventions on that variable carry about the effect variable'. 190 As practical examples, RNA transcribed from DNA sequences in a cell involve the DNA sequence, RNA polymerase and several other proteins so that each is a cause of the resulting RNA molecule. However, it is only the DNA sequence that is a specific actual difference-maker because varying the DNA sequence varies the resulting RNA molecule. 191 Extending this further, modelling measuring the mutual information between causes and effects of significant causes for DNA sequences and the simple production of cis-spliced mRNAs in a cell at a specific time showed the separate contributions of the DNA sequence and the cis-spliced mRNA variants. 192 An example is the single Drosophila melanogaster Down syndrome cell adhesion molecule (DSCAM) coding sequence, with its 38,016 splice variant proteins and the same DSCAM coding sequence in Homo sapiens involving three splice variants. 193 The causes in this model are the DNA sequence and the trans-factors affecting the splicing so that the information between the RNA and splicing is the sum of the mutual information (causal specificity) between the DNA and RNA and between the RNA and splicing. 194 The contributions from the DNA sequence and the splicing can be decomposed so that a value can be assigned to the variation in RNA coming from the splicing and the number of splicing variants per DNA sequence. 195 The result is a value (in bits) that can be assigned to the contribution of the DNA sequence and the splicing so that their relative contributions (significant causes) can be assessed. 
196 Thus, the single DSCAM coding sequence in Drosophila is 15.2 bits for the information coming from the splicing processes and 0 bits for the amount of information originating in the DNA sequence and preserved in splicing processes. Therefore, causation is entirely accounted for by post-transcriptional processing. 197 The same DSCAM coding sequence in humans is 1.6 bits for the information coming from the splicing processes and 1 bit for the amount of information originating in the DNA sequence and preserved in splicing processes. Therefore, causation is partly accounted for by the DNA sequence and partly by post-transcriptional processing. 198 This analysis shows that DNA sequences are not necessarily the most significant causes and that the significant causes need to accommodate the different spatial (i.e., cells, tissues and organisms) and temporal diversity because the measured contributions do change across spaces and times. 199 Thus, highly causally specific relationships are informational, and these occur for DNA sequences such that '[o]rganisms reproduce with a high degree of fidelity though the informational specificity of nucleic acids for proteins and functional RNAs'. 200 These highly causally specific relationships, however, are also informational for all other biological systems and will apply to every fine-grained control over effects, such as antibody-mediated immune responses, enzymes for substrates and receptors for their ligands. 201 The consequence is that the contribution of the DNA sequence as a cause needs to be assessed for every instance, and only in some cases will the DNA sequence be the most significant cause and very, very rarely the only cause.
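One reading of the splicing figures above, offered as an assumption rather than as the cited model's own calculation, is that every splice variant of a coding sequence is treated as equally likely. On that assumption, the information attributable to splicing is the base-2 logarithm of the number of variants. The function name below is hypothetical.

```python
import math

def bits_for_equiprobable_variants(n_variants):
    """Entropy in bits of a uniform distribution over n_variants outcomes."""
    return math.log2(n_variants)

# Variant counts as given in the text; equal probability is an assumption here.
drosophila_bits = bits_for_equiprobable_variants(38016)
human_bits = bits_for_equiprobable_variants(3)

print(round(drosophila_bits, 1))  # 15.2
print(round(human_bits, 1))       # 1.6
```

Both values round to the figures reported for the splicing contribution. The separate 0 bit and 1 bit contributions attributed to the DNA sequence depend on details of the cited model that this sketch does not reproduce.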

Intentional Information
In the early responses to the idea that it was not possible to distinguish genetic from non-genetic causes, there have been attempts to posit DNA sequence information as intentional or semantic. 202 Intentional or semantic information is the information of human thoughts and linguistic representations (utterances). It is intentional in that it is about the genotype of the phenotype that the genotype is intended to produce, although not necessarily the one actually produced. 203 The key distinction between intentional information and causative information is that intentional information need not be true 204 (like 'phlogiston or Pope Joan'). 205 Molecular biology includes many terms that assert this intentionality, such as 'messenger molecules', 'recognition sites ', 'proofreading', 'editing capabilities' and 'positional information'. 206 The DNA sequence represents, so the argument goes, something that comes from natural selection because the DNA sequence evolved for the purpose of determining the phenotype in the surviving organism. 207 These accounts posit that the meaning or sense can be reduced to the biological function (teleosemantics). 208 Thus, a 'DNA molecule has a particular sequence because it specifies a particular protein … [t]his element of intentionality comes from natural selection'. 209 These DNA sequences are special in the sense that 'biologists draw a distinction between two types of causal chain, genetic and environmental, or "nature" and "nurture"'. 
210 In addition, '[f]luctuations in the environment are a source of noise in the system, not of information' ( e.g., a cake recipe will turn out slightly differently when baked in a different oven): 211 DNA contains information that has been programmed by natural selection; that this information codes for the amino acid sequence of proteins; that, in a much less well understood sense, the DNA and proteins carry instructions, or a program, for the development of the organism; that natural selection of organisms alters the information in the genome; and finally, that genomic information is 'meaningful' in that it generates an organism able to survive in the environment in which selection has acted. 212 There are problems with this approach. There must be a difference between the message in the DNA sequence and the physics and chemistry of the DNA sequence. Otherwise, 'we have chemistry or physics and not semantics'. 213 This difference is reputed to be 'arbitrariness' (or 'gratuity') in the sense that the form of the molecule is different to its meaning. 214 For example, Jacob and Monod demonstrated that the lac operon regulation was effected through inducers and repressors interacting with a regulatory protein. This changed their shape so that another part of the regulator protein bound to the DNA sequence, switching on or off any transcription of the DNA sequence open reading frame (the ß-galactosidase). 215 This suggests that 'there is no necessary connection between [the inducer and repressor] form (chemical composition) and meaning (genes switched on or off)'. 216 Therefore, like a symbolic language potentially conveying an indefinite number of meanings, it is 'the symbolic nature of molecular biology that makes possible an indefinite large number of biological forms'. 217 Reduced to the idea, the function is not specifically determined just by the chemistry of the molecules. 
218 This does not, however, seem credible because for DNA 'the structure of the "message" is too closely connected to the structure of the "signal"'; therefore, the 'arbitrariness' is not clear, 219 and with so many causal links, the salient ones cannot be distinguished from the others, suggesting that 'the distinction between arbitrary and non-arbitrary causal roles is just in the eye of the beholder'. 220 At best, 'arbitrariness' is a 'useful if elusive concept in biology'. 221 Another criticism is that if 'arbitrariness' is necessary for molecules to have meaning, then assuming DNA (and RNA) have information, they will have the relevant chemically arbitrary relations. 222 Translation may be relevantly arbitrary because any given mRNA is arbitrarily related to the proteins it specifies, but there is no arbitrariness in the 'supposedly informational processes' of transcribing DNA to RNA or replicating DNA to DNA. 223 Further, if DNA sequences contain intentional information, then so must many other biological entities, 224 and there is, again, parity between the different causes, 225 thus: Nucleic acid sequences and phospholipid membranes both have distinctive and essential roles in the chemistry of life and in both cases there seems no realistic substitute for them. But the facts of development do not justify assigning DNA the role of information and control while inherited membrane templates get the role of 'material support' for reading DNA. 226 In contrast with causal information that seeks to distinguish between genetic (nature) and environmental (nurture) causes (recall the gene for achondroplasia and malnourished and underweight children), 227 the intentional information seeks to distinguish between developmental genetic information and other causes. 228 Preferencing causal paths between genotypes to phenotypes as different to the other causal paths between non-genetic factors (like the environment) to phenotypes 229 cannot be an adequate characterisation. 
230 More sophisticated models positing a transmission sense of information (communications engineer's approach) 231 and teleosemantic information (infotel framework) 232 also fail. 233 The communications engineer's approach fails because there is no account of the teleosemantic content of these signals that have been designed by natural selection. 234 The infotel framework does, however, provide some insights, although it ultimately fails. The basic proposition is that genetic and environmental causes in development can carry inherited information because the organism has adapted a phenotype over evolutionary time, carrying information about the selection pressures in past environments. These selection pressures are 'represented' in the DNA sequences of the current population that correlate with those past events. 235 Thus, 'reading information carried by the genome, information that has been built up in phylogenetic time through the process of natural selection'. 236 According to this model, the intentional information in the DNA sequences is the expression of the represented adaptation (the phenotype) in the current environment. 237 This does not, however, adequately account for observations in practice. For example, the seed beetle Stator limbatus develops according to the survival rates posed by the different seed species on which it lays eggs. 238 Eggs laid on Acacia greggii seeds have very high rates of survival, while eggs laid on Cercidium floridum seeds face challenges. 239 To address these challenges, the seed beetle lays fewer larger eggs on the Cercidium floridum seeds. 240 Following the infotel theory, the outcome conflates teleosemantic information with a mechanistic role: Having detected which kind of seed it is depositing eggs upon, the mother signals to the offspring to adopt one growth strategy rather than another. 
Using the infotel theory, we can assign the larger egg mass the indicative content 'you are on Cercidium floridum' and the imperative content 'grow fast and get large' … this teleosemantic transmission information does not translate into a mechanistic explanation of development. If we ask the developmental question 'how does the egg mass produce faster growth and larger size' and answer 'by transmitting to the mechanisms of development the instruction to grow fast and get large' or 'by transmitting to the mechanisms of development the information that the egg has been laid on Cercidium floridum' it is evident how vacuous this is as an explanation. 241 This confounding teleosemantic information with a mechanistic role needs to clearly distinguish between developmental explanations and evolutionary explanations. 242 Thus, how an eye develops and how an eye works (focusing and transducing light) is different, and the DNA sequence cannot account for how an eye works as that is an evolutionary explanation for a preferred mechanism rather than a developmental mechanism. 243 In the context of Cercidium floridum, the information in the DNA sequences is not 'grow fast and get large' because this intentional 'grow fast and get large' is neither the specified order of amino acids in the proteins nor the adaptive history of natural selection. These 'are not mechanistic explanations of how phenotypes are constructed by the regulated expression of the genome, but evolutionary explanations of why development uses a particular mechanism to produce that outcome. They are evolutionary explanations of developmental phenotypes'. 
244 Alternatively, if the mother laying eggs on Cercidium floridum seeds is characterised without historical information, and the 'grow fast and get large' is specifically caused by the environmental variable and the state of the organism, then this might be adaptive information: 'just as something needs to be adaptive in the past to be an adaptation in the future, a representation needs to have contained adaptive information in the past if it is to contain inherited information in the future'. 245 But of course, this is really just reframing the claim for inherited intentional information as causal specificity of the DNA sequence as an actual difference-maker within the current functioning organism.
Another possibility is that DNA sequences might be considered as having a limited kind of intentional information: 'As templates for the synthesis of macromolecules, nucleic acids determine their products in a way that is constitutive for instructions in general. It is therefore legitimate to attribute instructional content to molecular templates'. 246 The argument here is essentially that DNA sequences are information about the linear order of the components of RNA and proteins. This might be expressed correctly or incorrectly, and the DNA sequence holds the information in the sense that the DNA sequence serves as a template for synthesising RNA and proteins. 247 This is the same as saying the DNA sequence is Crick information with the assertion that 'this is information in a semantic, rather than a purely correlational, sense' because there is 'something in the process of template-directed synthesis itself that motivates the attribution of semantic information'. 248 This account does not progress beyond Crick information and does not support the existence of intentional information beyond the mere information of RNA and protein sequence.

DNA Sequence and Information
The adoption of information language in molecular biology 249 coincided with the move of scientists from physics to biology 250 and the funding from the Rockefeller Foundation (and Caltech) to promote the application of physical science to biology. 251 Perhaps importantly, physicists framed the new molecular biology in expressly informational terms, such as Erwin Schrödinger asserting in 1944 that chromosomes 'contain in some kind of code-script the entire pattern of the individual's future development and of its functioning in the mature state'. 252 This appears to have had a significant influence on the emerging molecular biology community with a 'new vision of biology' and providing the conceptual framework for proposing, doing and interpreting experiments, as well as a preference for determinist and informational explanations. 253 The former physicist Crick's central dogma illustrates these developments with a focus on information flows of 'detailed, residue-by-residue, sequence information from one polymer molecule to another'. 254 This information language has been extraordinarily successful in opening up molecular biology to reveal the molecular basis of the classical gene and the molecular networks for functioning organelles, cells and organisms. While there was hope (and still is?) that biology would be an information science, 255 serious attempts at biosemiotics, 256 teleosemantics 257 and deflationary accounts 258 have not been successful. 259 It remains unclear whether using information language is just a heuristically useful fiction, 260 an illustrative metaphor 261 or something else. 
The analysis presented in this article shows that there are two related forms of information in DNA sequences, and these relate to the molecular gene rather than the classical gene: The outcome of the large DNA sequencing projects, such as the Human Genome Project in the early 2000s, 262 the Haplotype Mapping (HapMap) Project, 263 the 1000 Genomes Project 264 and the Arabidopsis Genome Initiative, 265 yielded very little insight into the functioning of organisms, demonstrating that the DNA sequence alone is not sufficient to improve our understanding of biology. 266 What was required was the integration of the DNA sequence with other information about the organisation, components and processes of biological organisms and an end of the ideal of DNA sequence as the ultimate reductionist biology. 267 The recent advances applying Shannon information theory to causal DNA sequence information have shown that the contribution of the DNA sequence needs to be assessed for every instance. Only in some cases will the DNA sequence be the most significant cause and very, very rarely the only cause. Recall the DSCAM coding sequence example, where the DSCAM DNA sequence is not the only causal information (and also provides no intentional information) and the specific context of the DNA sequence (Drosophila or human) was a critical determining factor. Put bluntly, the same DNA sequences have different amounts of information depending on their context, and this is only a limited kind of causal information.
A further limitation inherent in the DNA sequence as information is that the causal account fails to capture the directionality of information flows from DNA sequences (genotypes) to phenotypes. 268 Recall the central dogma that 'once "information" has passed into protein it cannot get out again' (emphasis in original). 269 The Shannon information theory, however, would posit that 'the amount of information that knowing the genotype G provides about the phenotype P is always exactly equal to the amount of information that knowing the phenotype P provides about the genotype G'. 270 For DNA sequences, by contrast, there is a privileged directional flow of information from DNA sequence to protein where the DNA sequence informs the protein but not vice versa. 271 There is a redundancy in the code so that from a protein, there is a range of possible RNAs and, consequently, DNA sequences. While this does not invalidate the application of Shannon information theory to causal information in DNA sequences, it suggests some caution in applying this abstracted theory to material DNA sequences.
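The contrast between the symmetry of Shannon's measure and the one-way redundancy of the genetic code can be illustrated with a minimal sketch. The genotype and phenotype observations below are hypothetical toy data invented for illustration, and the function names are assumptions of this sketch rather than anything drawn from the cited literature. The codon counts are those of the standard genetic code.

```python
from collections import Counter
from math import log2, prod

# Number of codons per amino acid (one-letter codes) in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "L": 6, "S": 6, "R": 6,
    "A": 4, "G": 4, "P": 4, "T": 4, "V": 4,
    "I": 3,
    "F": 2, "Y": 2, "H": 2, "Q": 2, "N": 2, "K": 2, "D": 2, "E": 2, "C": 2,
    "M": 1, "W": 1,
}


def mutual_information(pairs):
    """Shannon mutual information (in bits) between the two items of each pair."""
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(first for first, _ in pairs)
    right = Counter(second for _, second in pairs)
    return sum(
        (count / n) * log2((count / n) / ((left[a] / n) * (right[b] / n)))
        for (a, b), count in joint.items()
    )


def count_coding_rnas(peptide):
    """Number of distinct RNA coding sequences that translate to the given peptide."""
    return prod(CODONS_PER_AMINO_ACID[amino_acid] for amino_acid in peptide)


# Hypothetical (genotype, phenotype) observations; not real data.
observations = [
    ("A", "tall"), ("A", "tall"), ("A", "short"),
    ("a", "short"), ("a", "short"), ("a", "tall"),
]
swapped = [(phenotype, genotype) for genotype, phenotype in observations]

# Shannon's measure is symmetric: G tells us as much about P as P does about G.
info_g_about_p = mutual_information(observations)
info_p_about_g = mutual_information(swapped)

# Translation is many-to-one: each RNA gives one peptide, but the peptide
# Met-Leu-Ser could have come from 1 * 6 * 6 = 36 different coding RNAs.
print(count_coding_rnas("MLS"))  # prints 36
```

On this sketch the two mutual information values coincide, while the mapping from protein back to nucleic acid is ambiguous. The asymmetry of the central dogma therefore sits in the structure of the code, not in the Shannon measure itself.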
This analysis shows that causal and intentional information is most often a metaphor when discussing DNA sequences, except for the limited and special circumstances of Crick information and where the DNA sequence is the causal actual difference-maker. There is no doubt that information metaphors are useful, and they have expounded the molecular gene in both the scientific and popular imagination. These metaphors may, however, have a more subtle role that has framed the way many discussions around DNA sequence information are limited in the CBD and Nagoya Protocol forums, conflating the ideals of classical genes and molecular genes: The very structure of a typical genetics education endows the character-makers conception with independent life, by investing it with heuristic power. Begin your education in genetics with Mendel's peas, and you will learn not merely about a case where, you are told, binary characters are determined by genes for those characters, and by nothing else. You will learn too that many apparently more complicated cases can be made tractable by treating them in the first instance like Mendel's peas. (And if you don't learn that, you won't pass.) Of course, you will go on to learn about all sorts of exceptions to your rule of thumb, and the reasons why those exceptions are the way they are: the effects of other genes, epigenetic modifications, the interplay of development and environment, chance. By the end of your education, you will know, of course, that 'it's not all in the genes', and become annoyed with anyone who suggests that you think otherwise. But the Mendelian, treat-'em-like-the-peas rule of thumb will remain in place. It will guide your reasoning and even, in the way of heuristics, perhaps your unreasoned reflections and reactions too, with much reinforcement from the wider culture in the form of gene-personifying 23-and-Me ads, 'gene for' discovery stories, jargon talk of what is in an organization's DNA, and so on. 
You will affirm genes-for-characters determinism in your actions and attitudes while rejecting it if asked about it, because you know that it's false (footnotes excluded). 272 That this is problematic is perhaps best illustrated by a limited ABS scheme directed to human influenza viruses. The World Health Organisation of the United Nations' Pandemic Influenza Preparedness Framework (PIP Framework) 273 defines 'gene sequences' as 'the order of nucleotides found in a molecule of DNA or RNA. They contain the genetic information that determines the biological characteristics of an organism or a virus'. 274 As the analysis in this article clearly demonstrated, 'the order of nucleotides found in a molecule of DNA or RNA' does not necessarily 'contain the genetic information that determines the biological characteristics of an organism or a virus'. Where they do 'contain … genetic information', this is limited to the Crick information and other causal information that is an actual difference-maker that needs to be determined on a case-by-case basis accounting for spatial and temporal differences. Clearly distinguishing between the ideals of classical and molecular genes is critical to the DSI discussions so that simplistic and ultimately misleading conceptions of DNA sequence do not undermine the role and place of information in DNA sequences. The next question, with these clear limits on thinking about information in DNA sequences, is how these matters should be addressed in the context of the DSI debates at the CBD and Nagoya Protocol forums.

Regulating DSI
Information under existing CBD and Nagoya Protocol ABS obligations, however described and defined, has been addressed through:
1. The ABS contract as a term and condition of prior informed consent and/or mutually agreed terms. 275
The challenge for addressing DSI is whether these existing arrangements are suitable or whether further or different measures are necessary. This is the important distinction because the general information obligations promote the disclosure and exchange of information, while proposals to address DSI treat DSI as a derivative of the materials within the ABS transaction itself, which becomes a distinct commodity with a value that an ABS scheme attempts to translate into definable benefits. 284 The mischief that needs to be addressed here is using DSI without the physical genetic materials, potentially avoiding the ABS prior informed consent and/or mutually agreed terms requirements, including the benefit-sharing. 285 The preferable outcome is a simple, efficient and effective multilateral agreement balancing access and benefit-sharing that delivers fair and appropriate benefits from the access and utilization of DSI (whatever that might be). 286 The analysis in this article shows that DNA sequences only have Crick information and may have causative information that would need to be assessed for each and every sequence. Put simply, there is limited information in DNA sequences, and consequently, there is lots of information about DNA sequences. This distinction and the limitations of treating DNA sequences as information per se become readily apparent in applying the DSI groupings proposed by the AHTEG-DSI commissioned study on the concept and scope of DSI and how DSI was currently used. 287 This study's groupings were, noting that these are expressed in the sense of the molecular gene and not the classical gene ideal: 288
1. 'Group 1 -Narrow: DNA and RNA'-'a narrow scope or proximity to the genetic resource and is limited to nucleotide sequence information associated with transcription'. 289 This is Crick information and limited to the extent that the DNA sequence information is the precise determination of base sequence in the nucleic acids (DNA and RNA) but less than the full Crick information because it does not include the precise determination of amino acid residues sequence in the protein.
2. 'Group 2 -Intermediate: (DNA and RNA) + proteins'-'an intermediate scope and extends to protein sequences, thus comprising information associated with transcription and translation' 290 (but only in one direction DNA to protein and less precision or confidence for protein to DNA because of the redundant genetic code). This is Crick information as the DNA sequence information is the precise determination of base and amino acid residues sequence in the nucleic acids (DNA and RNA) and protein. This will also include information about the DNA sequence that is correlated with the DNA sequence but is not an actual difference-maker (recall the DSCAM example above).
3. 'Group 3 -Intermediate: (DNA, RNA and proteins) + metabolites'-'a wider intermediate scope or proximity to the genetic resource and extends to metabolites and biochemical pathways, thus comprising information associated with transcription, translation and biosynthesis'. 291 This is Crick information and additional causal information where the causal specificity of the DNA sequence is an actual difference-maker for the metabolites. A DNA sequence as an actual difference-maker, however, would need to be assessed on a case-by-case basis accounting for spatial and temporal differences. Again, this will also include information about the DNA sequence that is correlated with the DNA sequence but is not an actual difference-maker.
4. 'Group 4 -Broad: (DNA, RNA, protein, metabolites) + traditional knowledge, ecological interactions, [and so on]'-'the broadest scope or weakest proximity to the underlying genetic resource and extends to behavioural data, information on ecological relationships and traditional knowledge, thus comprising information associated with transcription, translation and biosynthesis, as well as downstream subsidiary information concerning interactions with other genetic resources and the environment as well as its utilization, among other subsidiary information'. 292 This is Crick information and additional causal information where the causal specificity of the DNA sequence is an actual difference-maker for the metabolites. Again, a DNA sequence as an actual difference-maker would need to be assessed on a case-by-case basis accounting for spatial and temporal differences. This grouping will also include information about the DNA sequence, such as traditional knowledge and ecological interactions, that is correlated with the DNA sequence but is not an actual difference-maker.
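The cumulative structure of the four groupings, where each group contains everything in the narrower group plus a further category, can be made explicit with a short sketch. The identifiers below are illustrative labels chosen for this sketch and are not terms defined by the study, and the Group 4 entries are not exhaustive because the study's description of that group is open-ended.

```python
# Cumulative DSI groupings as summarised in the list above.
# Labels are illustrative; Group 4 is open-ended in the study ("[and so on]").
DSI_GROUPS = {
    1: {"DNA", "RNA"},
    2: {"DNA", "RNA", "protein"},
    3: {"DNA", "RNA", "protein", "metabolites"},
    4: {
        "DNA", "RNA", "protein", "metabolites",
        "traditional knowledge", "ecological interactions",
    },
}


def narrowest_group(kind):
    """Return the narrowest group number that covers a kind of information, or None."""
    for number in sorted(DSI_GROUPS):
        if kind in DSI_GROUPS[number]:
            return number
    return None


# Each group is a superset of the one before it.
is_nested = all(DSI_GROUPS[n] <= DSI_GROUPS[n + 1] for n in (1, 2, 3))

print(narrowest_group("protein"))  # prints 2
```

The nesting shows why the choice of group matters for regulation: every step outward adds information that is about the DNA sequence without the sequence necessarily being an actual difference-maker for it.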
The analysis clearly shows that if the DNA sequence is treated as a derivative of the materials within the ABS transaction itself and becomes a distinct commodity with a value that the ABS scheme attempts to translate into definable benefits (as some countries have already done through laws 293 or as mandatory terms and conditions as part of prior informed consent and/or mutually agreed terms addressing DSI), 294 privileging the DNA sequence will undervalue the other non-DNA sequence contributions to phenotypes. This will potentially limit the kinds and values of research and development on the other causal contributions. 295 Put simply, and demonstrated by the AHTEG-DSI commissioned study, the common understanding of genetics privileging bottom-up information flowing from DNA sequences is pervasive but a misleading basis on which to found a legislative, administrative and policy ABS scheme. This will inevitably undermine the purpose of ABS schemes to deliver benefits for conservation and sustainable uses and the integrity of ABS schemes because the value of the DNA sequence that is the actual difference-maker is not being assessed (including accounting for spatial and temporal differences), and most other DNA sequences are being overvalued and tied up with complex law, policy and processes. Predictably, this leads to perverse outcomes by controlling the potential uses of information or reducing the incentives to use information in new and innovative ways, and consequently for the conservation and sustainable use of biodiversity. 296 Further, merely leaving DNA sequences as DSI to be resolved within the current ABS scheme is problematic because this will create a complicated matrix of different laws, policies and practices as each CBD Contracting Party and Nagoya Protocol Party implements their own approaches, perpetuating these perverse outcomes. 297 This suggests that DSI is not a suitable target for regulation because it both simplistically privileges the bottom-up information flowing from DNA sequences and presumes all DNA sequences and their extensions (e.g., information associated with transcription, translation and biosynthesis) are sufficiently valuable to warrant complex ABS negotiation and agreement-making.
287 CBD/DSI/AHTEG/2020/1/3. 288 CBD/DSI/AHTEG/2020/1/3, Annex (p. 32  293 See CBD/DSI/AHTEG/2020/1/5, 10-11. 294 See, for example, CBD/DSI/AHTEG/2020/1/5, 21. 295 This rejects the proposed 'Option 1: DSI fully integrated into the Convention on Biological Diversity and the Nagoya Protocol': CBD/WG2020/3/4, Annex II (p. 15).
The dilemma for the CBD Contracting States and Nagoya Protocol Parties is whether to persevere in crafting the regulation of DNA sequence information to the kinds of nuances identified in the presented analysis about Crick information and causal actual difference-maker information and the inherent problems of deciding thresholds for how much actual difference-making has value. This might be possible but will likely directly conflict with the ideal of a simple, efficient and effective multilateral agreement because of the difficult assessments about whether the information is actually valuable 298 and how and when to impose those obligations (e.g., agreed standard material transfer terms and conditions) and deliver the benefits. 299 The simplest, most efficient and most effective compromise is probably to accept that a more general multilateral solution will have some inefficiencies in capturing other kinds of information (e.g., correlated but not actual difference-maker information) but will also deliver certainty and predictability to those accessing DSI, externalise the value of benefit-sharing from the complexities of a matrix of different laws, policies and practices among the CBD Contracting States and Nagoya Protocol Parties and facilitate easy access to DNA sequences for any uses. 300 If the solution is to externalise the value of benefits in a multilateral agreement that is made separately from the ABS transaction (decoupling), this might include either a payment (e.g., a charge, levy or tax) 301 or other non-monetary benefits (e.g., research collaborations, training, knowledge platforms, technology transfer and technology co-development) 302 or a combination of these measures. 
The guiding principles for such a multilateral agreement are that users have certainty and predictability about their obligations (clear provenance) and facilitated, easy access to DNA sequences and are aware that regulatory options need to be considered and assessed in the context of both the classical gene and the molecular gene.

Conclusions
The starting point for working out whether there is information in DNA sequences was to distinguish between the classical gene and the molecular gene and appreciate the success of the reductionist, bottom-up account based in a physical sciences methodology that is DNA as the molecular gene, so the genotypes are the causes of phenotypes. Recall, however, that while classical genes and molecular genes might not be separate theories, conflating the two levels of resolution privileging the molecular gene overlooks the other factors (including non-genetic factors) that affect genotypes. This becomes important in appreciating the distinctions about the kinds of information in DNA sequences. There is no intentional or semantic information in DNA sequences. There is only causative information-Crick information in the DNA sequence specifying the linear order of RNA and proteins and other causal information where that DNA sequence makes an actual causal difference to the observed phenotype. These actual causal differences to the observed phenotype also apply to all other biological systems having an effect so that the quantum of the DNA sequence causation will vary from nothing to a lot depending on the particular spatial and temporal circumstances (recalling the DSCAM example). Privileging the DNA sequence undervalues all these other causes, limits the kinds and values of research and development about the other causal contributions and will likely reduce research and development using DNA sequences subject to ABS measures. The predictable consequence is to undermine the CBD's objectives of conservation and sustainable use of biodiversity. Perhaps most importantly, the analysis here cautions against both the 'genes-for-characters determinism in … actions and attitudes' 303 for those proposing DSI regulation and highlights the potential perverse outcomes from privileging the DNA sequence as a molecular gene rather than the more complicated classical