Invited Speakers



Talk Title: Learning and using self-supervised phenotypic features in small molecule discovery

Start Time: 11:00 AM PDT
Speaker: Paula A. Marin Zapata, Data Scientist, Bayer Berlin.

Abstract: Small molecule phenotypic screens test the effects of hundreds of thousands of compounds using miniaturized assays and automated image acquisition. These screens generate vast datasets that can be used to understand compounds’ mode of action and toxicity, characterize disease phenotypes, uncover new biology, and more. Image featurization is a prerequisite for using the data, and self-supervised learning (SSL) is particularly suitable for this task given the scarcity of phenotypic annotations or biological labels. In this talk, I will show how we use SSL to extract image representations in small molecule discovery at Bayer AG. Our models provide powerful representations of plant phenotypes in herbicide screens, as well as of cellular phenotypes in Cell Painting data, which outperform the commonly used software CellProfiler. I will introduce examples of how image features can be leveraged in multiple applications, including phenotype clustering, mode of action identification, prediction of mitochondrial toxicity, and de novo molecular design with generative models, among others.

Speaker Bio: Paula is a senior scientist in the Machine Learning Research group at Bayer AG in Berlin and a board member of the CytoData Society. She obtained a PhD in biology from the German Cancer Research Center (DKFZ), an MSc in Applied Mathematics from Eindhoven University of Technology, and a BSc in Biological Engineering from the National University of Colombia in Medellin. Her research focuses on computer vision applications in drug discovery, with an emphasis on phenotypic profiling and Cell Painting.



Talk Title: Building Large-Scale Foundation Models for Digital Pathology with Millions of Whole Slides and Multi-Modal Generative AI: from Virchow to PRISM

Start Time: 1:00 PM PDT
Speaker: Siqi Liu, Ph.D., Director of AI Science, Paige AI.

Abstract: The application of artificial intelligence (AI) in precision medicine and decision support systems, particularly through the analysis of pathology images, holds immense potential to transform cancer diagnosis and treatment. However, implementing AI in computational pathology presents several challenges, including data heterogeneity, the need for large annotated datasets, huge compute costs, and the alignment between image tiles and whole-slide information across multiple magnifications. To address these challenges, we propose a dual approach: (1) developing a family of large foundation models trained on millions of whole-slide images using self-supervised learning techniques to capture essential features and patterns across diverse pathology images; (2) pretraining an aggregator model that leverages vision and language through multimodal generative AI learning to incorporate clinical information typically reported in medical settings, thus enhancing the slide-level foundation model with richer contextual understanding. Our combined approach, developed through a collaboration between Paige and Microsoft, not only achieves state-of-the-art performance across various benchmarks but also demonstrates generative AI’s potential to unlock new possibilities for next-generation computational pathology, promising more diverse applications and reduced development costs for better cancer diagnosis and treatment.

Speaker Bio: Siqi Liu is the Director of AI Science at Paige AI, based in New York City, US. Siqi oversees all AI product development and research initiatives at Paige. Paige AI is a leading company in digital pathology, leveraging advanced AI to revolutionize cancer diagnosis and treatment. Siqi's team aims to apply cutting-edge AI technology to real-world clinical and life sciences settings. Prior to joining Paige, Siqi worked at Siemens Healthineers in Princeton, US, as a staff scientist, where they applied AI to various radiology image modalities, including CT, MRI, X-ray, and ultrasound. Siqi earned a PhD in computer science from the University of Sydney, Australia, in 2018, under the supervision of Weidong Cai.



Talk Title: Protein Data Bank: From Two Epidemics to the Global Pandemic to mRNA Vaccines and Paxlovid

Start Time: 9:20 AM PDT

Speaker: Stephen K. Burley, University Professor & Henry Rutgers Chair, Rutgers University

Abstract: Structural biologists around the world and the Protein Data Bank (PDB) played decisive roles in combating the COVID-19 pandemic. This talk will explain how global three-dimensional (3D) biostructure data was turned into global knowledge, allowing scientists and engineers around the world to understand the inner workings of coronaviruses and develop effective countermeasures against SARS-CoV-2. State-of-the-art mRNA vaccines, initially designed with guidance from single-particle cryo-electron microscopy structures of the SARS-CoV and MERS-CoV Spike Proteins, benefited more than five billion individuals around the world by preventing viral infections entirely or significantly reducing morbidity and mortality. Structure-guided drug discovery efforts at Pfizer, first initiated in the 2000s in response to the SARS-CoV epidemic and reactivated in 2020 early in the global pandemic, yielded nirmatrelvir, a potent, orally bioavailable, covalently acting, peptidomimetic inhibitor of the SARS-CoV-2 Main Protease. This targeted antiviral drug received Emergency Use Authorization from the United States Food and Drug Administration in December 2021, less than two years after public release of the viral genome sequence. It is used clinically for the treatment of acute SARS-CoV-2 infections in a fixed-dose combination with ritonavir and sold under the brand name Paxlovid. Bolstered by open access to research data generated with public and private monies, particularly 3D structures of coronavirus proteins archived in the PDB, basic and applied researchers made a difference when the world desperately needed them to succeed. To underscore the importance of these contributions, I quote Dr. Anthony Fauci, former head of the National Institute of Allergy and Infectious Diseases: “Show me a person who’s vaccinated, got infected, took Paxlovid and died. I can’t find anybody.”

Speaker Bio: Stephen Kevin Burley is an expert in data science and bioinformatics, structural biology, and structure-guided drug discovery for oncology. He is the Director of the RCSB Protein Data Bank (RCSB.org). Within Rutgers, The State University of New Jersey, he serves as University Professor and Henry Rutgers Chair, Founding Director of the Institute for Quantitative Biomedicine, and Cancer Pharmacology Research Program Co-Leader within the Rutgers Cancer Institute of New Jersey. Burley’s previous roles were Distinguished Lilly Research Scholar, Eli Lilly and Co.; Chief Scientific Officer and Senior Vice President for Research, SGX Pharmaceuticals, Inc.; Richard M. and Isabel P. Furlaud Professor, The Rockefeller University; and Investigator, Howard Hughes Medical Institute. His degrees include M.D. - Harvard Medical School; D.Phil. - Oxford University; and B.Sc. (physics) and Doctor of Science (honoris causa) - Western University. Burley has published extensively in data science and bioinformatics, artificial intelligence/machine learning, structural biology, and clinical oncology.



Talk Title: High-throughput mapping of 3D reconstructed neurons at whole-brain scale using petavoxel-computing

Start Time: 4:10 PM PDT
Speaker: Hanchuan Peng, Ph.D., Allen Institute for Brain Science.

Abstract: In this talk, I will discuss our large-scale study of whole-brain neuron morphometry, which analyzed 3.7 peta-voxels of mouse brain images at single-cell resolution and produced one of the largest multi-morphometry databases of mammalian brains to date. We annotated 3D locations of cell bodies of over 182,000 neurons, modeled more than 15,000 dendritic microenvironments, characterized the full morphology of over 1,800 neurons along with their axonal motifs, and detected over 2.6 million axonal varicosities that indicate potential synaptic sites. Our analysis covers six levels of information related to neuronal populations, dendritic microenvironments, single-cell full morphology, sub-neuronal dendritic and axonal arborization, axonal varicosities, and sub-neuronal structural motifs, along with a quantification of the diversity and stereotypy of patterns at each level. Overall, our study provides an integrative description of key anatomical structures of neurons and their types, covering a wide range of scales and features, and contributes a large-scale resource to understanding neuronal diversity in the mammalian brain. With this dataset, we start to formulate a possible whole-brain-scale connectome at single-neuron resolution for mouse brains.

Speaker Bio: Hanchuan Peng joined the Allen Institute in 2012 to build a computational neuroanatomy and smart imaging group for the Institute’s new initiatives in neural coding and cell types. His current research focuses on bioimage analysis, large-scale informatics, machine learning, and computational biology. Before joining the Allen Institute, Peng was the head of a computational bioimage analysis lab at Howard Hughes Medical Institute, Janelia Farm Research Campus. His recent work includes developing novel algorithms for 3-D+ image analysis and data mining, building single-neuron whole-brain level 3-D digital atlases for model animals, and Vaa3D, a high-performance visualization-assisted analysis system for large 3-D+ biological and biomedical image datasets. He is also the inventor of the widely cited minimum-redundancy maximum-relevance (mRMR) feature selection algorithm in machine learning. Peng received his Ph.D. in biomedical engineering from Southeast University, China. He held postdoctoral positions at the Lawrence Berkeley National Laboratory at the University of California, Berkeley (computational biology, bioinformatics, and high-performance data mining with a particular focus on gene expression analysis) and at Johns Hopkins University Medical School (human brain imaging and analysis). He has won several awards, including a Cozzarelli Prize (2013) for his collaborative research on dragonfly neurons; the prize “recognizes outstanding contributions to the scientific disciplines represented by the National Academy of Sciences (USA)”.



Talk Title: Microscopy, foundation models, and the scaling hypothesis: a phenomenal step forward for image-based profiling

Start Time: 8:40 AM PDT
Speaker: Berton Earnshaw, Ph.D., Machine Learning Fellow, Recursion.

Abstract: The use of morphological profiles of cellular microscopy images is by now a widespread method of investigating the functional effects of perturbations and treatments on cellular models of disease, yet the insights gained from such analyses are only as good as the features extracted from these unstructured data. In this talk, I will describe how Recursion leveraged its phenomic datasets, computational resources, and a self-supervised learning objective to build Phenom-1, a foundation model of cellular morphology whose performance on downstream tasks like recall of known biological relationships appears to scale linearly in the logarithm of the total computational cost used to train it, a phenomenon known as the scaling hypothesis. A smaller version of this foundation model, called Phenom-Beta, was recently released under a non-commercial license on NVIDIA’s BioNeMo platform. I will also briefly describe the role that foundation models like Phenom-1 play in a vision of the future of drug discovery, where AI agents generate and test hypotheses inferred from such models, and give a demo of LOWE, a first step towards this vision in which the cognitive capabilities of LLMs are leveraged to reason about and execute typical tasks involved in drug discovery: retrieval and analysis of data, design and execution of experiments, generation of compounds and prediction of their properties, etc.

Speaker Bio: Berton Earnshaw is a Founding Fellow at Recursion, a leading clinical-stage TechBio company, and Scientific Director at Valence Labs, an AI research lab within Recursion whose mission is to industrialize scientific discovery to radically improve lives. Berton earned a PhD in mathematics from the University of Utah in its mathematical biology group, and was a postdoc at both the University of Utah and Michigan State University. Berton has worked in many scientific and leadership roles in industry, including CTO of Perfect Pitch (now Boomsourcing), Director of Data Science and Operations at Red Brain Labs (acquired by Savvysherpa), and Principal and Senior Scientist at Savvysherpa (acquired by UnitedHealth Group). While at Recursion, Berton has led the development and deployment of many of the machine learning capabilities employed in its drug discovery workflows, and currently directs multiple research programs across Recursion and Valence Labs.



Talk Title: Predicting Patient Treatment Outcomes using (Diffusion) Generative Models

Start Time: 2:20 PM PDT
Speaker: Charlotte Bunne, Assistant Professor, EPFL.

Abstract: As the smallest functioning living units, cells are key to understanding health and disease. To predict a patient’s responses to molecular drugs and design efficient treatments, it is vital to recover the underlying dynamics that cells follow when a drug is administered. Biologists have long sought to simulate the state and functioning of a cell in order to understand and control its core processes. In this talk, I will discuss how to use and design artificial intelligence tools combined with large biomedical datasets to infer such cellular behavior. I will cover our work on (diffusion and flow matching) generative models that robustly predict treatment responses of biopsied cells from metastatic melanoma patients. As an integral part of an observational clinical cohort study, they are able to reveal otherwise hidden patterns of signaling pathway modulation associated with driver mutations and metastasis sites upon cancer treatment. Lastly, I will provide a perspective on how to develop biological foundation models and realize the vision of a virtual cell powered by artificial intelligence that will shape the future of treatment design and personalized therapies.

Speaker Bio: Charlotte Bunne is an assistant professor at EPFL in the Computer Science and Life-Sciences Department. Previously, she was a postdoc at Genentech and Stanford, and she completed a PhD in Computer Science at ETH Zurich working with Andreas Krause and Marco Cuturi. During her graduate studies, she was a visiting researcher at the Broad Institute of MIT and Harvard, hosted by Anne Carpenter and Shantanu Singh, and worked with Stefanie Jegelka at MIT. Her research aims to advance personalized medicine by utilizing machine learning and large-scale biomedical data. Charlotte has been a Fellow of the German National Academic Foundation and is a recipient of the ETH Medal.