Synthetic Biology

Open-ended Online Forum on Synthetic Biology for additional information gathering


Topic 1: Trends and issues in synthetic biology identified for more detailed assessment

Forum closed. No more comments will be accepted on this forum.
4: Integration of artificial intelligence and machine learning [#3038]
To facilitate the gathering of additional information, participants are asked to consider the following points in relation to this topic:
1. Review the potential positive and negative impacts on each of the three objectives of the Convention that may arise through the increased integration of artificial intelligence and machine learning. Please use specific examples when possible.

2. What is the timeframe for release or potential impact of applications designed by artificial intelligence or machine learning? Please be specific with your response.

3. What are the potential gaps or challenges for risk assessment, risk management and regulation that may arise from this topic in synthetic biology? Evaluate the availability of tools to detect, identify and monitor the organisms, components and products of synthetic biology (if applicable).

4. Review the potential social, economic, cultural, ethical, political, human health and/or other relevant impacts that may arise from this trend and issue. What are the relevant considerations for indigenous peoples and local communities (IPLCs), women and youth?

5. Is the trend and issue attempting to address specific problems, and if so, what are these problems and their underlying causes? How else could these problems or causes be addressed?

6. What lessons can be learned from other domains addressing artificial intelligence? How might those lessons from elsewhere be relevant or shed insight in assessing this topic in the context of the aims of the Convention on Biological Diversity?

7. Where are limits of knowledge with respect to this trend and issue? Are there any other considerations that would be important to raise?
(edited on 2023-11-06 17:12 UTC by Mr Austein McLoughlin, Secretariat of the Convention on Biological Diversity)
posted on 2023-10-26 19:43 UTC by Ms. Melissa Willey, UNEP/SCBD/Biosafety
RE: 4: Integration of artificial intelligence and machine learning [#3052]
Dear colleagues,

It is my pleasure to welcome you to the Open-ended Online Forum on Synthetic Biology. The forum will be open from 6 to 15 November at 17.00 EST. I will be moderating the discussion and will provide support should the need arise. Please also bear in mind the forum guidelines that you can find on the website.

Thank you in advance for your engagement and I am looking forward to productive discussions!

Kind regards,
Florian Rabitz
posted on 2023-11-06 17:36 UTC by Mr. Florian Rabitz, Lithuania
RE: 4: Integration of artificial intelligence and machine learning [#3088]
My name is Christoph Then and I am a member of ENSSER (the European Network of Scientists for Social and Environmental Responsibility), representing Testbiotech in this discussion. In my contribution, I refer to points 2, 3 and 7 of the questions raised by the moderator.

Artificial Intelligence (AI) can radically enhance Synbio applications by converging knowledge from several disciplines, predicting the outcomes of genomic interventions, supporting the design of experiments and enabling automated searches of large data sources.

It can also be used to develop improved versions of Synbio tools (such as new versions of CRISPR/Cas) and to identify new target genes. Its application may result in a strong increase in the pace and scale of releases of Synbio organisms into the environment with regard to the number of organisms, the species involved and the receiving environments.

However, AI may be of only limited use in predicting the consequences for ecosystems and biodiversity, since it can only make choices based on an a priori established set of rules. While data and metrics to assess genetic functions and ‘omics’ (such as transcriptomics, proteomics and metabolomics) may, case by case, be available in sufficient quantity and quality, this may not be the case with regard to interactions with the environment and related biosafety questions. Therefore, the scales of certainty and uncertainty are likely to increase in parallel.

Furthermore, AI can play an important role in future applications of bio-hacking and enhance the possibilities for technology abuse, with relevance for biosecurity. This finding is supported by ongoing research for military purposes.

AI is already used by several companies to develop specific applications (such as the US company Inari) and is expected to become a major driving tool in Synbio releases in future. If AI succeeds in accelerating the development and release of Synbio organisms, these may have disruptive effects not only on production systems but also on the receiving environments. Comprehensive technology assessment is needed to control the scale of release, avoid unpredictable interactions and predict potential tipping points for the receiving environments.

In summary, the borders and realms of knowledge and unknowns are likely to be shifted by AI, and regulators will be faced with new dimensions of uncertainty.

See, for example, the following references:
Eslami M. et al. (2022) Artificial Intelligence for Synthetic Biology, Communications of the ACM, Vol. 65, No. 5, DOI: 10.1145/3500922

Hassoun S. et al. (2021) Artificial Intelligence for Biology, Integrative and Comparative Biology, Vol. 61, No. 6, pp. 2267–2275

Radivojević T. et al. (2020) A machine learning Automated Recommendation Tool for synthetic biology, Nature Communications 11:4879
posted on 2023-11-17 11:18 UTC by Mr. Christoph Then, Testbiotech
RE: 4: Integration of artificial intelligence and machine learning [#3095]
Greetings Colleagues,

I am Dr. Becky Mackelprang, and I work as the Associate Director for Security Programs at the Engineering Biology Research Consortium (EBRC). The EBRC is a US-based organization that brings together synthetic biology researchers in academia and industry with policymakers to help address global and national needs. I appreciate the opportunity to be part of this discussion. The following comments are my own and do not necessarily reflect the views of EBRC or its members.

Last week, EBRC published a white paper on “Security Considerations at the Intersection of Engineering Biology and Artificial Intelligence,”1 which considers AI-enhancement of biodesign capabilities, AI-facilitation of automated closed-loop laboratories, and the security implications of Large Language Models lowering barriers to knowledge acquisition. A key message is that AI/ML models can accelerate aspects of research and development in synthetic biology / engineering biology, not that AI/ML necessarily enables novel capabilities in the field. With that framing, I respond to specific questions posed by the Secretariat:

1) Positive and negative impacts on the objectives of the Convention:
AI/ML will accelerate the development of synthetic biology tools and products that can help mitigate the impacts of climate change and support diverse, healthy ecosystems. See:

“Machine learning-aided engineering of hydrolases for PET depolymerization,”2 which describes the use of ML to design improved enzymes for degrading plastic; and

“Microbial synthetic biology for plant metabolite production: a strategy to reconcile human health with the realization of the UN Sustainable Development Goals.”3

Recognizing that AI/ML can accelerate synthetic biology research and development, see the following sources describing potential applications of synthetic biology to challenges associated with climate change, sustainability, and biodiversity conservation:

EBRC Roadmap, Engineering Biology for Climate & Sustainability;4

Yang et al., 2017’s “Systems metabolic engineering as an enabling technology in accomplishing sustainable development goals,” which directly describes how metabolic engineering can help achieve UN Sustainable Development Goals;5

Jansson’s 2023 “Microorganisms, climate change, and the Sustainable Development Goals: progress and challenges,”6 which identifies opportunities to use synthetic biology to, among other things, create healthier ocean ecosystems.

I do not foresee that introducing AI/ML tools into the design process will complicate regulatory evaluation of resulting synthetic biology products; thus, I do not anticipate any additional negative impacts on the objectives of the Convention as a result of the use of those tools.

2) Timeframe: 
At present, AI/ML tools are accelerating basic research and development. I don’t anticipate a surge in applications designed for environmental release as a result of AI/ML in the coming few years.

3) Gaps or challenges for risk assessment, risk management, and regulation: 
The use of AI/ML to improve the design of engineered biological parts or systems does not pose novel risks nor significantly complicate risk assessment, risk management, or regulation of engineered products.

4) Other relevant impacts:
While there are important social considerations associated with the advancement of AI/ML broadly, and there are social considerations associated with some applications of synthetic biology, I do not see the social dimensions listed by the Secretariat or under the purview of the CBD as being influenced by the application of AI/ML to synthetic biology. Note, however, the white paper mentioned above does discuss potential biosecurity-related risks that merit further evaluation and tracking.

5) Specific problems addressed:
Researchers using AI/ML in their synthetic biology research aim to solve major challenges in health and medicine, energy, materials production, and climate change, sustainability, and biodiversity. Solutions and applications of synthetic biology are most likely to be pieces of larger sustainability, conservation, and/or preservation strategies. For example, in addition to synthetic biology-enabled water quality testing and/or purification, strategies should be employed to prevent water pollution and clean existing waterways.

7) Other considerations important to raise:
AI/ML-enabled or -facilitated designs exist only in the digital world until physical tools and techniques are used to build them. Thus, as discussed in EBRC’s new white paper on engineering biology and artificial intelligence, it is not useful to equate biodesign with the physical creation of a biological part or system, regardless of whether the design is deemed “good” or “bad.” Significant time, resources, and expertise for building, testing, and iterating upon biological parts or systems are still required.

(1) Engineering Biology Research Consortium; Compiled and edited by Charlie D. Johnson, Wilson Sinclair, Rebecca Mackelprang (2023): Security Considerations at the Intersection of Engineering Biology and Artificial Intelligence. Engineering Biology Research Consortium.

(2) Lu, H.; Diaz, D. J.; Czarnecki, N. J.; Zhu, C.; Kim, W.; Shroff, R.; Acosta, D. J.; Alexander, B. R.; Cole, H. O.; Zhang, Y.; Lynd, N. A.; Ellington, A. D.; Alper, H. S. Machine Learning-Aided Engineering of Hydrolases for PET Depolymerization. Nature 2022, 604 (7907), 662–667.

(3) Rojo, F. P.; Vuong, P.; Pillow, J. J.; Kaur, P. Microbial Synthetic Biology for Plant Metabolite Production: A Strategy to Reconcile Human Health with the Realization of the UN Sustainable Development Goals. Biofuels Bioprod. Biorefining 2023, 17 (6), 1485–1495.

(4) Engineering Biology Research Consortium (2022). Engineering Biology for Climate & Sustainability: A Research Roadmap for a Cleaner Future. Retrieved from doi: 10.25498/E4SG64.

(5) Yang, D.; Cho, J. S.; Choi, K. R.; Kim, H. U.; Lee, S. Y. Systems Metabolic Engineering as an Enabling Technology in Accomplishing Sustainable Development Goals. Microb. Biotechnol. 2017, 10 (5), 1254–1258.

(6) Jansson, J. K. Microorganisms, Climate Change, and the Sustainable Development Goals: Progress and Challenges. Nat. Rev. Microbiol. 2023, 21, 622–623.
posted on 2023-11-20 17:10 UTC by Ms. Rebecca Mackelprang, EBRC (Engineering Biology Research Consortium)
RE: 4: Integration of artificial intelligence and machine learning [#3097]
Dear colleagues, My name is Jim Thomas. I am a researcher and consultant for civil society organisations tracking the implications of new and emerging technologies. I am also a current member of the mAHTEG on Synthetic Biology. Thank you for the opportunity to make this submission in this open forum. I apologise in advance that this is a long submission, but I believe this topic is _incredibly_ urgent, and now is the right moment for the Parties to the CBD to proactively initiate the work necessary to understand, assess and engage with the implications of how Artificial Intelligence and Machine Learning are integrating with Synthetic Biology. The current speed, investment and shifts in technical and corporate developments in this arena should have us all sitting up in our seats, with great attention and focus on addressing this topic.

I'll start with a quote:
“Limits are now being breached. We are approaching an inflection point with the arrival of these higher order technologies, the most profound in history. The coming wave of technology is built primarily on two general-purpose technologies capable of operating at the grandest and most granular levels alike: artificial intelligence and synthetic biology” – Mustafa Suleyman, co-founder of DeepMind (now Google) and Inflection AI, The Coming Wave: Technology, Power and the 21st Century’s Greatest Dilemma, p. 55

‘Synthetic biology’ as a field was founded two decades ago on the use of large sets of genomic data (big data) and new computational tools for the rational design of biological parts, organisms and systems. As the underlying digital tools for data processing and design transform, so the abilities, scope and potential impact of synthetic biology as a field and industry (including the potential impact on the natural world and the aims of the Convention) are also now rapidly transforming before our eyes.

Generative AI as Syn Bio Gamechanger:

Regarding the integration of algorithmic processes and AI, some synthetic biology firms and practitioners (e.g. Amyris Biotechnologies, Zymergen, Gingko Bioworks) have used forms of artificial intelligence, including neural nets, for over a decade to sort through genomic data and select viable sequences as part of their genome-design processes, as well as to amass large genomic libraries. However, in the past year the release of massive transformer-based or diffusion-based AI ‘foundation models’ (such as OpenAI’s ChatGPT, Google’s Bard, Meta’s Llama or Stability AI’s Stable Diffusion) has effected a historic switch in the AI space that is now also creating a parallel switch in the Syn Bio space. This is a switch that Nvidia (the leading producer of AI chips) describes as a move from “discriminative AI to generative AI”, and in biotech it echoes the field’s own switch from genomics to a more mature synthetic biology.

Whereas the ‘discriminative AI’ of the past decade was about sorting differences and identifying clusters and outliers (i.e. high-throughput automated analysis), generative AI is about directly generating novel forms (i.e. high-throughput automated de novo design). So today’s large language models such as ChatGPT or DALL-E respond to human natural-language prompts by automatically stitching together complex and compelling texts, images or videos that appear (and essentially are) entirely new, and then outputting them into the world.

They do this by leveraging massive computational machine-learning analysis (training) of the relationships of elements within billions of ingested texts and images, and then fine-tuning the model on more specialized datasets to generate probabilistic algorithms. The generative systems then produce outputs coherent with the algorithmic rules the trained model has deduced. This creates new variations of text or image that satisfy the user’s request for a novel synthetic media product with certain parameters – at lightning speed, for hundreds of millions of users.
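By way of illustration only – not any specific company's system – the "train on broad data, fine-tune on specialized data, then generate" pattern described above can be sketched with a toy character-level model over genomic letters. Real foundation models are billion-parameter neural networks; this Markov-chain stand-in merely mirrors the data flow:

```python
import random
from collections import defaultdict

class ToyGenomicLM:
    """Toy character-level 'language model' over DNA letters (A, C, G, T).

    Illustrates the pretrain -> fine-tune -> generate pattern only;
    real generative models are large neural networks, not Markov chains.
    """

    def __init__(self, order=2):
        self.order = order
        # counts[context][next_letter] = how often next_letter followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sequences):
        # Count which letter follows each length-`order` context.
        for seq in sequences:
            for i in range(len(seq) - self.order):
                context = seq[i:i + self.order]
                self.counts[context][seq[i + self.order]] += 1

    def generate(self, seed, length, rng=None):
        rng = rng or random.Random(0)
        out = seed
        while len(out) < length:
            nxt = self.counts.get(out[-self.order:])
            if not nxt:  # unseen context: fall back to a uniform choice
                out += rng.choice("ACGT")
                continue
            letters, weights = zip(*nxt.items())
            out += rng.choices(letters, weights=weights)[0]
        return out

# "Pretraining" on a broad corpus, then "fine-tuning" on a specialized
# one: the extra training passes bias the model's statistics toward it.
model = ToyGenomicLM(order=2)
model.train(["ACGTACGTGACT", "TTGACCGTAACG"])  # broad corpus
model.train(["ACCCACCCACCC"] * 5)              # specialized corpus, upweighted
print(model.generate("AC", 20))
```

The generated string is novel yet statistically coherent with the training data, which is the essence of the "outputs coherent with the algorithmic rules the trained model has deduced" point above.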

Generative AI tools such as ChatGPT (which went into public use almost exactly a year ago) represent the fastest commercial take-up and development of a new technology in history, and as such are attracting astronomic sums of speculative investment. The high expectation of large and near-term commercial returns is driving the direction, nature and speed of real-world development. These tools and their results, arriving so fast into commercial use, stretch and break our governance mechanisms across many domains in highly challenging ways.

The next Big Tech investment? A rush to AI/Syn Bio:

Having mastered AI text, image, sound and video generation, the leading AI firms (which are also the world’s largest companies by market cap) are now training their attention on other commercially valuable forms of ‘language’ on which they can deploy their large language models (and similar). In particular, across the tech industry there is incredible excitement about making the biological ‘languages’ of genomics and proteomics (and by extension the field of synthetic biology) the next multi-trillion-dollar commercial frontier for the generative AI revolution to ‘disrupt’ and deliver investor payouts – having already spectacularly ‘disrupted’ graphic design, script writing, advertising, movies, the legal profession, journalism etc. The senior director of AI research at Nvidia, Anima Anandkumar, describes how AI corporate leaders are making this switch to applying their generative AI capabilities to synthetic biology by describing Nvidia’s GenSLM large language model:

“Rather than generating English language or any other natural language, why not think about the language of the genomes. You know, we took all the DNA data that is available – both DNA and RNA data, for viruses and bacteria that are known to us – about a hundred and ten million such genomes. We learnt a language model over that and then we can now ask it to generate new genomes” (source: Nvidia, The AI Podcast, “Anima Anandkumar on Using Generative AI to Tackle Global Challenges”, Ep. 203)

As a demonstration of GenSLM, Nvidia recently ‘fine-tuned’ its genomic language model (based on 110 million genomes) on a further 1.5 million COVID viral sequences in order to generate novel coronavirus variants, to aid in pandemic prediction and vaccine design – although this also raises dual-use concerns. Amongst the synthetic variants of COVID that Nvidia created digitally by AI are strains that closely match actual strains that have emerged in nature since those in the training set. This work is not just about COVID: Nvidia emphasizes that the underlying genomic large language model (GenSLM) could now be fine-tuned to create de novo genomes of other viruses or bacteria, enabling new Syn Bio microbes to be automatically generated. In effect, Nvidia has created a ‘ChatGPT for microbial design’. It should be noted that Nvidia – a chipmaker and the world’s sixth-largest corporation by market cap – was not previously seen as a life sciences company.

Nvidia is far from the only trillion-dollar tech giant moving at full speed into applying generative AI to Syn Bio entities and parts. Meta (Facebook), Microsoft (which now controls OpenAI), Alphabet (Google) and Stability AI are all investing heavily in developing generative AI tools for synthetic biology. The first three of these are also among the seven richest corporations in the world. The established corporate giants of the biotech world (e.g. Bayer, Syngenta, Corteva) are also using this generative AI approach, or contracting with smaller firms that employ it on their behalf. One recent report by a UK AI think tank, the Ada Lovelace Institute, suggests that the market for AI-driven genomics technologies could reach more than £19.5 billion by 2030, up from just half a billion in 2021.

While the visceral impact of generative AI is already being felt across many other economic sectors (e.g. entertainment, law, education, advertising), biotech and AI leaders alike are touting that applying generative AI to synthetic biology in particular is going to be a much more explosive act of “disruption” than in other fields so far – predicting it may precipitate a ‘splitting the atom’ moment for AI. In his recent bestseller The Coming Wave, DeepMind (now Google) co-founder Mustafa Suleyman names the current coming together of generative AI with Syn Bio as the most significant “superwave” that technologists have ever seen. Jason Kelly, CEO of Syn Bio ‘unicorn’ Gingko Bioworks, which recently inked a five-year partnership with Google to train large language models for synthetic biology, describes the exceptional commercial opportunity of applying generative AI to Syn Bio like this:

“Here’s why bio is particularly interesting for folks who are interested in AI: the idea of a foundation model plus fine-tuning with specialized data – people in AI understand that. Let’s try that with one of the categories of English – let’s say ‘legal’. That thing has to compete with a lawyer at Robeson Grey trained for 15 years, taught by other humans, writing contracts designed to be understood by human brains… in English (a language that co-evolved with our brains). That also gives us leverage from how our brains work – and so we are asking these computer brains – neural nets – to compete with us on our turf. It’s a pretty high bar that it’s got to compete with. Now let’s go over into biology. I remind you it runs on code (sequential letters), feels a lot like language – but it ain’t our language: we did not invent it, we do not speak it, we do not read it or write it. And so I feel like these computer brains are going to kick our ass a lot faster in this domain than they do in English… if you are looking to understand where AI is really going to flip the script – not be a low-level Clay Christensen disruption, which is what’s happening in English – but rather be like splitting the atom: it’s bio.”
(Jason Kelly speaking on the ‘No Priors’ podcast, Ep. 34, at 12:50)

Synthetic protein engineering:

Gingko’s AI collaboration with Google is initially focusing on using generative AI to design proteins, drawing on their in-house codebase of 2 billion protein sequences. Jason Kelly explains: “The new idea is: can I make a foundation model that … speaks ‘protein’ just like GPT-4 speaks English?” Indeed, following the success of DeepMind’s AlphaFold programme in ‘solving’ protein folding (a problem previously thought unsolvable), several of the first wave of generative AI models in synthetic biology are focusing exactly on generating de novo proteins never before seen in nature (“generative protein design”), as well as altering and ‘optimising’ existing natural proteins.
Some of these generative AI tools for protein engineering have already been highlighted in the note for this online open forum – e.g. ProtGPT2, ProteinDT and Chroma – but there are also a number of startups (beyond Gingko) focused entirely on using generative AI to create a range of de novo proteins for commercial markets, including enzymes, catalysts, food ingredients, pharmaceuticals, biomaterials, coatings, gene therapy and more. In another example of how AI is bringing unusual tech entrants into Syn Bio, the global cloud data company Salesforce has developed ProGen, yet another large language model for generating novel proteins. This model was trained by feeding the amino acid sequences of 280 million different proteins into a machine learning model. Salesforce then fine-tuned the model by priming it with 56,000 sequences from just one class of protein, lysozymes, in order to initially generate functional novel lysozymes (used for food ingredients). A report on this work in Science Daily emphasises just how huge the protein design space for novel variation is within this one class of proteins:

“With proteins, the design choices were almost limitless. Lysozymes are small as proteins go, with up to about 300 amino acids. But with 20 possible amino acids, there are an enormous number (20 to the power of 300) of possible combinations. That's greater than taking all the humans who lived throughout time, multiplied by the number of grains of sand on Earth, multiplied by the number of atoms in the universe. Given the limitless possibilities, it's remarkable that the model can so easily generate working enzymes.”
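The arithmetic in the quoted passage is easy to check: 20 amino acids at each of 300 positions gives 20^300 possible sequences, a 391-digit number, which dwarfs the product of the quoted comparison quantities (roughly 10^11 humans ever, ~10^19 grains of sand, ~10^80 atoms – order-of-magnitude estimates only):

```python
# Back-of-envelope check of the design-space claim. The comparison
# figures are rough order-of-magnitude estimates, not exact values.
design_space = 20 ** 300          # 20 amino acids at each of 300 positions
humans_ever = 10 ** 11            # ~100 billion humans across history
sand_grains = 10 ** 19            # rough estimate for Earth's sand
atoms_universe = 10 ** 80         # standard order-of-magnitude estimate

comparison = humans_ever * sand_grains * atoms_universe  # ~10^110
print(len(str(design_space)))     # number of digits in 20**300: 391
print(design_space > comparison)  # True: the quote's claim holds
```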

Proteins as food ingredients are just one slice of the future engineered-protein market that companies like Salesforce or Gingko are chasing. Syn Bio companies are developing engineered proteins as coatings, sweeteners, pesticides, packaging etc. – including several uses that will involve environmental release of these novel protein entities.

Planetary boundaries for novel entities:

By itself, this new AI-powered ability to generate a wider range of de novo proteins ever faster for industrial use should be regarded as a significant industrial shift in production patterns for the CBD to consider in relation to its mandate – with potentially huge impacts on biodiversity in the longer term once a greater variety of engineered proteins makes it into the biosphere. This will place high demands on monitoring, assessment and reporting, as well as on the development of systems of recall, clean-up and liability. A historical point of comparison might be the advent of synthetic chemistry techniques and the establishment of the accompanying petrochemical-based synthetic chemical industry in the late 19th and early 20th century, which flowed from new techniques to ‘crack’ hydrocarbons. That ability to generate de novo chemical molecules before proper oversight and regulation were in place led to the creation and dispersal of many thousands of different synthetic chemicals into the biosphere – many of which are now subject to complicated and difficult global efforts at clean-up or mitigation (or attempts at industrial replacement) because of the unexpected biological and health effects of those synthetic compounds interacting with the natural world. It is estimated that there are between 140,000 and 350,000 different types of manufactured chemicals being released into the biosphere at approximately 220 billion tonnes per year, and that the USA alone adds approximately 15,000 new synthetic chemicals to the inventory every year. Most of these are new-to-nature and many are toxic at some concentration (see the review article by Naidu et al.). In early 2022, scientists reported that humans had breached the safe ‘planetary boundary’ for novel chemical entities in the biosphere.

The prospect of unleashing a new generative synthetic protein industry, undergirded with massive speculative capital and intended to artificially generate an array of de novo proteins for market profits ahead of deliberate international discussion and rule-setting, should by itself raise significant red flags. That it is supercharged by the current investment hype around AI is doubly worrying. Proteins have been described as intricate nanomachines whose ongoing and multiple interactions govern most life processes at the molecular level. Synthetic proteins as a class of complex molecules are therefore likely to be more biologically active (and disruptive) than simple synthetic chemical compounds – indeed, they may be deliberately designed to speed up, slow down, transform or otherwise alter the molecular biological processes at the basis of life for industrial purposes, requiring more complex safety assessment. Observers have noted, for example, that synthetically engineered proteins appear to have different stability from naturally evolved proteins – which may raise comparisons with the persistence problems of certain classes of synthetic chemicals (e.g. POPs).

It was out of the enormous challenge of trying to deal with the negative effects of unassessed, poorly understood synthetic chemicals that the precautionary principle was established in environmental governance; it is enshrined in the preamble to the CBD itself, as well as in the objective (Article 1) of the Cartagena Protocol on Biosafety. This time we have the chance to apply it before the number of novel protein entities entering the biosphere starts to mimic the trajectory of synthetic chemicals.

“Text-to-protein” may mean greater distribution and dual use:

Even more concerning is that the industrial generation of de novo proteins through AI-generated Syn Bio under current commercial directions may become a widely distributed, automated and difficult-to-manage production industry far faster than the capital-intensive industrial chemistry industry did – as a result of new AI protein-engineering tools. Just as ChatGPT or DALL-E almost overnight enabled millions of ordinary users with just a web browser to enter natural-language text descriptions in order to generate synthetic media, so new foundation models are being developed for web-based natural-language “text-to-protein” discovery. In a system like ProteinDT, a user can write in natural language (such as English) the characteristics that they want to see in a synthetic protein (e.g. high thermal stability), and the generative AI model will then generate multiple viable synthetic protein sequences that can be selected and created from synthetic RNA strands (e.g. expressed by an engineered microbe or in a cell-free system) – equipment that is itself becoming more distributed.
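To make the described workflow concrete, here is a deliberately simplified sketch of the "natural-language prompt → candidate sequences → ranked shortlist" loop that such text-to-protein tools expose. The generator and the scorer below are hypothetical stand-ins (random sampling and a crude hydrophobicity proxy), not ProteinDT's actual interface:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def generate_candidates(prompt, n=5, length=60, rng=None):
    """Stand-in for a generative model: returns random sequences.

    A real text-to-protein model would condition on `prompt`; here the
    prompt is ignored, which is exactly the simplification being made.
    """
    rng = rng or random.Random(42)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length))
            for _ in range(n)]

def stability_proxy(seq):
    # Crude stand-in 'score': fraction of hydrophobic residues.
    hydrophobic = set("AILMFWVY")
    return sum(aa in hydrophobic for aa in seq) / len(seq)

# The user workflow described above: natural-language request in,
# ranked candidate sequences out, top hits sent on to a wet lab.
prompt = "a small enzyme with high thermal stability"
candidates = generate_candidates(prompt, n=10)
ranked = sorted(candidates, key=stability_proxy, reverse=True)
for seq in ranked[:3]:
    print(f"{stability_proxy(seq):.2f}  {seq}")
```

The biosafety-relevant point is visible even in this toy: the design step is cheap, fast and fully digital; only the final wet-lab synthesis step touches the physical world.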

This distributed “text-to-protein” model could make oversight more difficult. For example, one paper on text-to-protein generation acknowledges: “Although text-based protein design has many potential positive applications in agriculture, bioengineering, and therapeutics, it can be considered a dual-use technology. Much like generative models for small molecules (Urbina et al., 2022), ProteinDT could be applied to generate toxic or otherwise harmful protein sequences. Even though acting on these designs would require a wet lab, synthesizing custom amino acid sequences is typically straightforward compared to synthesizing novel small molecules.” The paper further notes that the authors’ own model allows the generation of venomous and dangerous proteins, and that “Future efforts to expand the training dataset and modeling improvements could increase the dual-use risk.”

Synthetic biology firms such as Gingko argue that in this way it will be possible to replace existing petrochemical-based production in large capital-intensive facilities with fast and lighter biological production methods. While replacing petroleum-derived chemicals may be one outcome, it will be only one amongst many commercial drivers of the technology. Other commercial uses may replace natural products grown by small farmers or displace forest- or marine-derived commodities – changing land and ocean use patterns. If synthetic engineered proteins become a rapidly expanding, structurally diverse and widely distributed class of novel synthetic entities, they will enter the biosphere, and the new engineered-proteins industry will necessitate new forms of biosafety assessment and oversight. It may further worsen the overshoot of the planetary boundary on novel entities. In view of this, the precautionary approach should be followed to put a halt to these developments pending a more serious review by Parties. This is called for under Article 7(c) of the Convention and Article 1 of the Cartagena Protocol, since this novel threat to biodiversity follows (in part) from the use of living modified organisms.

Further areas of consideration from AI/Synthetic Biology integration:

• Bio-computation/biointelligence: Beyond accelerating synthetic biology, efforts are focusing on applying Syn Bio to transform AI and computation itself, toward biology-based computing, DNA data storage and molecular circuits. This use of living engineered elements to enable computation is sometimes called ‘biointelligence’. Using biological elements as a substrate for computing is posited as a potential solution to resource limitations in silicon-based computation. Engineered organisms, e.g. E. coli, demonstrate remarkable computing capacity, distributed across a vast number of cells. Synthetic biologists in Kolkata, India, have designed E. coli bacteria to computationally solve maze problems. The Australian start-up Cortical Labs has even grown a sort of brain-in-a-vat (an in-vitro neural cell culture) as a ‘biological processing unit’ and taught it to play the computer game Pong. While we are some years away from a synthetic biology-based form of AI being in common use, it is quite feasible to see a biological turn in computing in the coming decade. It would be helpful for the CBD to consider how this new use of genetic resources and genetic processes challenges the mandate of the Convention.

• Sensing and Signaling for AI Agriculture: Organisms are being engineered as sensing/signaling components in cyberphysical systems that mix AI with synthetic biology. Specifically, companies are now developing engineered crops that respond to stressors like drought or pests by emitting fluorescence that can be sensed for AI-guided precision agriculture. For example, InnerPlant is one synthetic biology company that genetically engineers plants to emit signals when stressed. These signals can then be monitored by satellites and by cameras on John Deere's 'See & Spray' systems.
This amounts to integration with artificial intelligence systems in a different way. Just as seed and chemical companies previously engineered crops to be 'Roundup-ready' to fit with their proprietary chemicals, the same firms now want crops to be 'robot-ready' to fit with their proprietary precision agriculture systems. In the case of Roundup-ready crops this led, not surprisingly, to those chemical companies establishing overbearing market dominance through technical lock-in. The same may occur through 'robot-ready' lock-ins too.

Impacts on Conservation, Sustainable Use, and Equitable Sharing:
 Conservation: AI's application in mass-designing proteins, genetic parts and organisms accelerates the industrialization of synthetic biology across various sectors. This rapid development raises concerns about biosafety, governance, and the entry of diverse synthetic organisms into markets and the environment. Text-to-DNA and text-to-protein applications lower barriers for amateurs and small players, posing challenges in detection, monitoring, governance and containment.

 Sustainable Use: AI-driven text-to-protein generation may disrupt industries like flavors, fragrances and cosmetics. Sustainable use of biodiversity, largely by small-scale farmers in the global South, is threatened as AI offers biosynthetic alternatives for key compounds. For example, Arzeda is a leading engineered protein company (they call themselves "The Protein Design Company") using generative AI tools to make commercial proteins. Their first product is a de novo engineered enzyme that upgrades common but bitter components in stevia (e.g. Reb A) to rarer, sweeter components (e.g. Reb D and Reb M). This significantly changes the economics of stevia growing, giving incredible market power to the holder or licensee of the patent on the enzyme. If enzymes are used to upgrade low-quality biomass into high-quality flavors, fragrances and sweeteners, that too will disrupt sustainable use and natural markets. For example, Israeli start-up Ambrosia Bio is working with Ginkgo Bioworks' AI platform to develop engineered enzymes that convert cheap industrial feedstocks (e.g. sugar and starch) into rare sugars and specialized ingredients such as allulose. Allulose is found naturally only in small quantities in plant foods such as brown sugar, maple syrup, wheat and dried fruits like figs and raisins. Being able to enzymatically convert it from cheap sugars and starches may, for example, take away market opportunities from maple syrup or fig producers.

 Equitable Sharing of Benefits: AI's extensive use of biological data complicates governance and fair sharing of benefits from digital sequence information (DSI). Generative AI systems rely on remixing collected DSI without proper attribution or benefit sharing, potentially leading to digital biopiracy. See more below under 'Theft and Biopiracy' in the lessons-to-learn section.

Regarding what lessons can be learned from other domains addressing artificial intelligence:

Fields of critical scholarship on AI (including AI ethics and AI safety) are foregrounding some of the following questions about generative (and other) AI systems that are also directly relevant to addressing AI/synthetic biology integration:

1. Bias and other problems in training data.
There is an extensive literature, and acknowledgement of concern among AI developers, that the design, nature and selection of training datasets for foundation models (and for fine-tuning) can introduce highly problematic biases in outcomes. High-profile examples of AI's bias problem include racial biases, where AI programs trained on public data scraped from the internet were either unable to recognize darker skin tones or labelled black and brown people with racial slurs and demeaning associations. AI systems used for recruitment were found to be strongly biased against women and ethnic minorities, as were AI systems for legal sentencing. Scholars have suggested that AI systems for biodiversity data may also introduce problematic biases, for example if the training set is sourced from databases that are skewed towards urban or northern settings, where a greater amount of data is contributed by industry, or where indigenous or locally gathered data is absent. Certain types of knowledge count as 'data' while other types of knowledge may not be encoded, or may be 'cleaned up' (all data undergoes layers of cleaning).

In the case of datasets for genomic design it will be necessary to reflect carefully on what biases, absences and other warping effects the underlying training datasets may create. For example, many crop genomic datasets are compiled by commercial and other research institutions that have focused for several decades on breeding very particular traits for industrial monocrop agriculture systems. These datasets may claim to represent genomic diversity within a crop (e.g. maize or beans) but may actually massively over-represent genomic sequences from a narrow band of varieties optimized for commercial, northern, large-scale agriculture systems. Microbial or viral genomic collections may also have been collected and digitized for very specific purposes or under specific conditions that make them non-representative or build bias into the training data.

2. Black box effects and hallucinations:
One of the hardest phenomena to deal with in AI systems based on machine learning is the problem of the black box. In its simplest form this refers to the fact that machine learning is done by the system itself without supervision, so the 'lessons' and associations that it formulates into algorithms are not known to the humans operating the system (and may not be knowable). That is: the logic by which the AI makes its decisions is an unexplainable 'black box'. This leads to unexpected and unexplainable behaviours and outputs, which may be the result of the system mis-associating, or failing to make key associations that would be obvious to a human. One of the deadliest examples of black box problems playing out has been when AI-guided 'self-driving' automobiles fail to recognize pedestrians, cyclists, children and other elements in their vicinity and drive into or over them.

Related to this is the phenomenon of AI 'hallucinations' by generative AI systems. Large language models such as ChatGPT routinely incorporate elements in their output that appear compelling but are factually inaccurate or bizarre: living people are described as deceased, dates are given wrongly, generative AI images of people develop additional body parts, mangled unreadable signage is created, and so on. While such hallucinations and black box failures can be problematic enough in the two-dimensional, electronic domains of text, image, video or sound, they could be highly problematic if incorporated into the genomic design of four-dimensional living organisms or of active biological proteins released into the body or the biosphere. Genetic engineers already face problems of unexpected and emergent effects from small genomic changes (even as small as a single base pair). If an AI-designed genome were to begin to behave unpredictably or have significant side effects, it may be impossible to understand why those changes have happened, or to locate the cause, until long after the organism or protein has entered the biosphere. It is not clear how biosafety assessment of AI-designed organisms or proteins can proceed when the system is not even able to explain its own design rationale. In response to the wicked problems of the AI black box, the European Union is now prioritizing development of 'explainable AI'. The CBD may wish to insist that any AI system used to design organisms, proteins or other biological components be able to provide strong explanations of its design decisions.

3. Theft and Biopiracy
While AI-generated art, video, music and other synthetic media are presented as de novo pieces of work, they are in fact only new assemblages of existing material arranged according to probability by an algorithm. They depend entirely upon an underlying corpus of work created by human beings and turned into 'tokens' by the foundation model. In the domain of generative cultural media, the recent frontline of public policy and legal battles has been artists, musicians and other cultural workers bringing AI companies to court for theft and piracy of their creative labour. Visual artists are sometimes able to recognize parts of their own work in outputs mixed together by generative platforms such as DALL-E, Midjourney or ChatGPT because their work was part of the original training dataset. There are now software tools through which artists can upload their own images to discover whether their material has been incorporated without their permission into AI training datasets.

Just as theft and unauthorized use of cultural materials has emerged as a major legal and political fight in AI, so it is likely that the use of digital genomic sequences and other sorts of digital biological sequences in AI training datasets will further deepen and intensify the conflicts over biopiracy and digital sequence information (DSI). When Nvidia boasts that their GenSLM model contains "all the DNA data that is available - both DNA and RNA data - for virus and bacteria", that comment indicates that they have likely not received prior informed consent to use all of that collected digital sequence information to train their generative AI model, since the use of derivative DSI for generative AI platforms is not currently written into material transfer agreements (nor was it when the original bioprospecting activity took place). Nor does there appear to be any mechanism by which the contribution of a particular digital sequence to creating a synthetic genomic or proteomic sequence can be recognized and fairly recompensed. There is not even an equivalent tool by which the original providers and stewards of genetic diversity can discover whether their genomic sequences have been incorporated into generative AI synthetic biology designs, even though the resultant organisms and proteins may be reaping actual commercial outcomes (such as Arzeda's steviol conversion enzyme or Ambrosia Bio/Ginkgo's allulose enzyme).

The process within the CBD and under the Nagoya Protocol that is addressing the thorny problems of DSI urgently needs to consider and investigate the further thorniness of how the arrival of generative AI into the synthetic biology space depends upon mass unacknowledged theft of genomic sequences and other DSI, without even attribution, let alone consent or benefit sharing.
(edited on 2023-11-21 20:00 UTC by Mr. Jim Thomas, Friends of the Earth U.S.)
posted on 2023-11-21 18:48 UTC by Mr. Jim Thomas, Friends of the Earth U.S.
RE: 4: Integration of artificial intelligence and machine learning [#3130]
Dear Participants of the Open-ended Online Forum on Synthetic Biology,

Thank you for your interventions and active engagement.
The forum is now closed for comments.

Thank you,
The Secretariat
posted on 2023-11-22 22:00 UTC by Mr Austein McLoughlin, Secretariat of the Convention on Biological Diversity