(SPEECH) [MUSIC PLAYING] (DESCRIPTION) Text, Westat, Advancing Clinical Research from Data to Discovery. Hashtag Data 2 discovery Scott Royal at a podium (SPEECH) Good afternoon, and welcome to Westat. I am Scott Royal and president of Westat, and I really thank all of you for attending the Data to Discovery Conference today. I'm really glad to have you here. I will say that when we planned this event months ago, we were recalling how usually glorious the falls are here in the DC area. And as you notice, we have a beautiful terrace outside. And so we had hoped that our cocktail hour at the end was going to be this lovely space. We're not sure we can guarantee that, but at least it's not a hurricane yet here. Let me take a minute to tell you a little bit about Westat. We are an innovative, professional services firm focused on helping our clients improve outcomes in health, education, social policy, and transportation. Our goal is very simple. We are dedicated to improving lives through research. Research starts with data, and at Westat, in case you didn't know, we collect a lot of data, so data is very important to us. But data alone doesn't provide all of the answers. By coupling data with the tools of data science, innovative statistical methods, and cutting edge technology, we are able to explore, track, and uncover meaning and value to help solve real-world problems. That's what this conference is about today-- using innovative tools and techniques and technologies to discover answers to our research challenges. (DESCRIPTION) Slide, Welcome (SPEECH) Collectively, we are pushing forward to find ways to answer puzzling questions and even experiment with new ways to find answers. Today's event is a great opportunity to open up a conversation on these challenges around big data. It's an opportunity for us to collaborate from different places, to think outside the box, and to access data that informs research and improves outcomes. 
When I think about innovation, I think about when we put this conference together. When we really want to challenge this idea of thinking outside the box, I'm thinking about an interesting inventor at 3M. Do you all know who Arthur Fry is? Many of you probably don't. But Arthur Fry worked for 3M back in the '60s and '70s. And he had a colleague named Spencer Silver. And Spencer and his colleagues had come together and found an adhesive that didn't stick very hard on things, but it stuck. Arthur had seen this presentation. They didn't know what in the world to do with this adhesive. I mean, people generally want adhesives to be pretty sticky. But Arthur sang in a church choir, and every time he tried to make little bookmarks for his hymnal, he'd open them up, and the bookmarks would fall out on the floor. (DESCRIPTION) Mimics bookmarks falling everywhere (SPEECH) And so Arthur had this great idea. He said, maybe this presentation I saw from my colleagues, I could put some of this little adhesive on the paper, and it would hold the paper in place. And hence, Post-it Notes came about. So Arthur and Spencer are credited with some controversy, I will say, for the Post-it Note creation. But it comes from thinking out of the box. Because sometimes, we already have solutions for problems that face us every single day. And some of what we're going to be talking about today is, do we already have data, already have information that's relevant to research questions that we face every day? Nancy Dianis, vice president of clinical trials research group here, has spearheaded this event. Nancy came to my office early after I started here at Westat and said, I want to do this event. And I said last night, when I met with some of our presenters, I said, I hadn't been here long enough to say no. And I'm glad I didn't. Nancy leads an interdisciplinary clinical trials team of epidemiologists, and biostatisticians, and other clinical research associates. 
Their expertise encompasses infectious disease, and chronic disease, international health, and a breadth of other capabilities. As many of you know, Westat is a flexible and collaborative place. Our culture allows us to pull in expertise across the firm, including IT, systems, statistics, and other capabilities in support of research. We work with our government clients and our commercial clients to develop and manage all aspects of rigorous clinical studies. For example, we're currently assessing diabetes risk by looking at already collected data and diving into this data from electronic health records. We're examining how social networks help disseminate health risk information. We're also examining how to improve breast cancer detection in Ghana and supporting the research of Zika in Mexico. Our experience includes implementing, managing, and monitoring complicated clinical trials and clinical research. Today, we have a challenge to explore how to harness what we call big data often in biomedicine and do it faster and cheaper than ever before and to help transform medicine and health care. Westat's goal is to leverage our expertise and to develop novel ways of combining statistical methods and technology, including natural language processing, machine learning, and neural networks to transform data into insights that help our clients predict disease and develop effective therapies. Speaking of transformation and innovation, our distinguished keynote speaker, Dr. Atul Butte, is transforming the way we think about and use biomedical data. He and his colleagues are uncovering innovative ways to harness a variety of data sources and to understand the immune system, leverage public genomic data to reveal novel insights on cancer, and predict whether drugs can be repurposed. Dr. Butte is the Priscilla Chan and Mark Zuckerberg distinguished professor and inaugural director of the Institute for Computational Health Sciences at the University of California San Francisco. 
He has an impressive array of accomplishments. So I invite you to read more about him in the bio that we've provided. And also, you can go online and watch a lot of other presentations that he's done. It's very fascinating and really informative. So I look forward to the chance to talk to all of you or at least many of you, I hope, at the reception following our conference. But now, let's welcome Dr. Butte. [APPLAUSE] (DESCRIPTION) Dr. Butte (SPEECH) Thank you. (DESCRIPTION) They shake hands as Dr. Butte takes the podium (SPEECH) Thank you so much. Thank you. Thank you very much. I know I'm competing with the Apple event going on, so I'll try to put on a show. (DESCRIPTION) Dr. Atul Butte, MD, PhD, UCSF, chief data scientist (SPEECH) There we go. So far, they've announced a new watch if you're keeping score. Nothing more than that. It'll go on for hours, and I won't, so I'm sure we can catch up. Well, it's great to see a lot of colleagues here in the audience, though. Thank you for coming. And thanks to Nancy for really shepherding this. It's been a year almost here in the preparation to finally get to this state and to this event. So I'm really glad, and it's really great to be here. And a lot of folks from NIH and other groups that we've been working with for a long time here, so it's great to see you all. First of all, I'm a medical doctor, so I got to start with my conflicts of interest. I have just a few. (DESCRIPTION) From a trillion points of data into discoveries, diagnostics, and new insights in health and disease. Atul Butte, director, Bakar Computational HealthSciences Institute, distinguished professor of pediatrics, UCSF, chief data scientist, University of California Health. a t u l dot b u t t e @ u c s f dot e d u, twitter at atul butte. 
Three long columns of conflicts of interest, including scientific founder and advisory board memberships, honoraria for talks, past or present consultancy, corporate relationships, and companies started by students (SPEECH) But suffice it to say, I've started a couple of companies. I consult for a bunch of companies. You might not want to believe another word I say over the next hour or so. I wouldn't blame you. But I'm most proud of the bottom right. Those are all the companies started by my students. (DESCRIPTION) Carmenta, Serendipity, Stimulomics, NunaHealth, Praedicat, My Time, Flipora, Tumbl dot i.n. (SPEECH) More than half my graduate students now start companies, even if they go into academia. And they do this with the most amazing platform in the world. It's just simply data, and it's often big data, and it's often open big data. So I'm going to show you how they do it. I'm going to show you how I do it. And maybe I'll convince you, this is still actually the most amazing time to be in biomedical innovation and entrepreneurship. So a lot of physicians end up having to use a slide like this as a slide of shame. This is my slide of significance. Because if you want to change the world, you just can't keep writing papers about it. But if you've discovered something, it's up to us to file intellectual property. If no one licenses it, then it's up to you to start the company. If you want to change the world, you just can't keep writing papers. And so I'm a believer in this, and you're going to hear a lot about companies. It's not taboo to talk about it. All right, so if you haven't heard, we're in the middle of this data deluge. Obviously, a lot of different magazines and covers. My favorite is this one from The Economist magazine about five years ago. (DESCRIPTION) The Economist, The data deluge and how to handle it, a 14 page special report. A man holds an upside down umbrella to collect ones and zeros as he waters a flower with a watering can. 
(SPEECH) This guy's collecting ones and zeros with the umbrella. (DESCRIPTION) A graph showing global information created and available storage over time. Information created grows exponentially, and there is a wide gap between the two numbers (SPEECH) And this is the issue of The Economist where they announced that the human species generates 2 zettabytes of data every year-- 2 zettabytes. So if you forget your metric prefixes, it goes kilo, mega, giga, terabyte, petabyte, exabyte, zettabyte. And of course, the following year, it's 4 zettabytes, and 8 zettabytes, and 16 zettabytes. It's doubling every year or two. I'll be the first to admit, most of these zettabytes of data are YouTube videos of kittens playing with pianos and yarn. Entertainment value perhaps, but no scientific value. But there are scientific data sets in those zettabytes. So for example, the Large Hadron Collider in Europe generated petabytes of data for those scientists to find new subatomic particles. But my favorite example is NASA. Everyone knows NASA. NASA announced about two years ago-- and it's still speculative-- but NASA announced, that with the new James Webb telescope coming online, and a lot of ground-based telescopes coming online, that they might, by the end of this decade-- not too far from now-- they might be able to generate an exabyte of data every day just in pictures. Now, there's something already interesting about NASA's exabytes. NASA already admits they generate so many pictures of the sky, there are not enough astronomers on this planet to look at all the pictures. OK, it's true. How do you know this? Because they've started something called GalaxyZoo. GalaxyZoo.org, you can go sign up right now. And if you pass the test, then you become the citizen scientist that decides, is that blob in the sky a star or a galaxy? They've run out of scientists. So they crowdsourced this now to citizen scientists to actually make the call. What is that spot in the sky? 
And it's something very magical when you think about it that in certain fields, when you've run out of scientists, you must bring in a new community of thinkers, of doers, of innovators beyond traditional science. And of course, I'm going to tell you, we're already there in biomedicine. Now, we have so much scientific data, you've got articles like this. This is in Wired by Chris Anderson. Not the most scientific of journals, very influential article to me though. (DESCRIPTION) Wired Magazine, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete. at c.h.r.1.s.a, bit dot l.y. slash end science (SPEECH) Chris Anderson was Editor in Chief 2008, saying science itself is obsolete. No, right? Now, how do you say science is obsolete? Actually, it's not science. It's the scientific method that they're proposing is going to be obsolete. Everyone remembers the scientific method. You're taught this in second or third grade. You come up with a question first. We call it a hypothesis. And then you go gather data to answer the question. Kind of a no-brainer-- question first, then go get the measurements. And in third grade, it's like measuring pea pods and their growth and stuff. But in today's world, we already have the data. 99% of the hard part now is figuring out, what's the question I want to ask? What's the killer question here that everyone's been dying to know the answer for? And no one's even realized we have the data to ask and answer it today. That is the hard part. It's not cloud computing, Hadoop, any of these programs. It's about the question that drives this. What's the right question to ask next? And it's not easy. Computer scientists don't know what the unmet need is. And biomedical folks don't know how to code it. You need a new hybrid team or hybrid individuals now of the future, of the present that really know that these questions are important, and they can be answered today. Now, of course, we're in a data deluge in biomedicine as well. 
I could show you all sorts of nifty gadgets and blinking lights. I could show you an Illumina sequencer with the blinking lights. I could show you a mass spectrometer. I could show you a proteomics setup. I love showing these. On the left, there is an Affymetrix gene chip. This is how I got my start roughly 20 years ago now. (DESCRIPTION) Affymetrix gene chip, a handheld chip about two inches long with a purple window. Next to it, a handheld microarray grid about five inches square. at affymetrix (SPEECH) We had the NCI 60, the famous cancer cell line panel, on these microarrays. These were priceless back then. And now, they're so cheap, I carry one in every suit. And this is what they actually look like here. (DESCRIPTION) He pulls out a gene chip and waves it around for the audience to see (SPEECH) I'm holding one in my hand. And it's amazing, for a couple hundred dollars now, you get a read-out of every gene in the genome. For the aficionados, we're talking about RNA measurements here. Doesn't really matter. Again, I'm using this as an example of these omic technologies. $200, you get a read-out like that. And you know, it's amazing. We got tired of measuring them one by one. And then on the right there, you see, and also in my hand, we switched to the 96-well plate format. And that's so cheap, I carry that in my bag. (DESCRIPTION) Pulls out a clear plate with tiny black wells, about the size of a cell phone (SPEECH) And then we got tired of measuring them 96 at a time. Then we switched to the 384-well plate format. And that's so cheap now, I carry that in my bag. (DESCRIPTION) Holds up black plate of the same size (SPEECH) I cannot more clearly illustrate exponential growth in biomedical data than to show you these plates. Right when I got my start, one by one-- we don't even use this one anymore. Because we switched to something called RNA-seq using sequencers to do this stuff. This company, Affymetrix, doesn't even exist anymore. 
They got purchased by Thermo Fisher or something. This is what we do. We just keep making more and more and more measurements. And what does a well-funded lab do by NIH? Well, they get a big grant to go make measurements or write a couple of papers. And what do they do next? They get another grant to go make more measurements. And all these measurements are just accumulating. And what happened in this field-- again, we'll talk about RNA and microarrays for a second-- in early 2000, 2001, 2002, everyone started using gene chips. Here's a list of genes that [INAUDIBLE] to this cancer, or heart failure, or diabetes. And here's a list of genes. And there's a list of genes. And then it started with the journals, the top two journals first. They said, you know what? We're getting too many of these papers with gene chips. No more papers with gene chips. That's it. That's it. Unless you put this data into an international repository, where the reviewers can double check their math. It's probably getting important to do that getting closer to the clinic. And maybe others can use these samples for something else. The precedent for this goes back to GenBank. Everyone's heard of GenBank. It's next-door neighbors to PubMed. But hardly anyone uses GenBank, but everyone's heard of it. It's, like, 49 years old. The first papers come out with a DNA sequence. Nobody wants to type in AGGC again. They were sending tapes to each other back then. You remember three-inch disks, five-inch disks. Anyone remember eight-inch disks? This is before all that. They were sending tapes to each other. Forget about internet. But the precedent has been set that if enough people use a modality for measurements, you got to share the data. At some point, you've got to share the data. So fast forward in this field to August of 2012. This (DESCRIPTION) Gene data to hit milestone. bit dot l.y. slash gene data (SPEECH) article came out of Nature. 
It featured my lab and a bunch of others, where we hit this milestone of 1 million samples publicly available-- a million samples publicly available. (DESCRIPTION) Data dump, a graph showing exponential growth of gene expression data sets in publicly available databases. By 2012, ArrayExpress has around 0.2 million, GEO has 0.8 million (SPEECH) Now, here's-- ArrayExpress is the European one. GEO is the NCBI one. It doesn't really matter. They all have the same samples. Now, I just got started here at 0 up to a million like that. Now, it's amazing. But if you look really carefully here, it's sad. It's a little bit sad here. It's slowing down. Maybe you could see it's slowing down here and at the end. Then you realize, they only counted half this year. This came out in August of 2012. We're still doubling every three to four years. In fact, right now, if I looked maybe a week or two ago, we're at 2.2 million samples up here. It's a straight up to here-- 2.2 million. 2.2 million biopsies, animal models, cell lines, operating room samples. 2.2 million samples already open to any researcher. So what does that mean exactly? Any researcher on my campus, or in your campus, or at NIH, intramural, extramural, who wants to do a research project, they don't have to start with an IRB and getting patients recruited. They can start by downloading data or, as I really love to say, even the high school kid today who needs to do a science fair project-- Let's say she wants to do a science fair project on breast cancer. She can go to these websites, literally type in breast cancer, hit search, and find and download nearly 90,000 samples of breast cancer about as easily as she can find a song on iTunes today. (DESCRIPTION) Screenshot of NCBI search of GEO datasets. In the search bar, "breast cancer." Results, 184 data sets, 3,749 series, 89,365 samples, and 51 platforms (SPEECH) Breast cancer search-- 89,000. Let's call it 90,000 samples. 
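The search described here can also be expressed as a programmatic query. What follows is a minimal sketch, assuming the public NCBI E-utilities ESearch endpoint and its GEO DataSets database (db=gds); the function name build_geo_search_url and the default of 20 results are illustrative choices, not anything shown in the talk. The sketch only builds the query URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (ESearch).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term: str, max_results: int = 20) -> str:
    """Build an ESearch URL that looks up `term` in GEO DataSets.

    The function name and the default of 20 results are hypothetical,
    chosen for illustration only.
    """
    params = {
        "db": "gds",            # Entrez database name for GEO DataSets
        "term": term,           # free-text query, e.g. "breast cancer"
        "retmax": max_results,  # maximum number of record IDs to return
        "retmode": "json",      # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL; fetching it would return matching GEO record IDs.
    print(build_geo_search_url("breast cancer"))
```

Opening the printed URL in a browser, or fetching it with any HTTP client, returns the matching record identifiers along with a total count. The count will differ from the figures on the slide, because the repository keeps growing.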
So what's so magical about having 90,000 samples of breast cancer? If you haven't figured it out yet, that high school kid now has more samples of breast cancer than any breast cancer researcher in the world. Because every one of them eventually has got to submit their data into the middle. They want to get a journal out, they got to comply with NIH rules. And whoever now has the middle database has more data than any researcher in the world. Maybe it's not breast cancer. It's colon cancer. It's prostate cancer. Man, so many diseases have been studied in the last 20 years. All of that's sitting there, waiting for you. Because the number of people that deposit data greatly exceeds the number of people that withdraw data. It's kind of like a bank, where everyone is taught how to deposit money. No one's been taught how to withdraw money. Hell, I'd steal money from that bank. It's just sitting there. It's sitting there, waiting for you. Now, the number I like actually is 3,749. Let's call that 3,700. Let's round it down. 3,700-- so that number roughly means 3,700 different research groups and labs have already contributed data on breast cancer. Let me tell you what a ridiculous number that is. Let's pretend I'm writing an NIH grant, and I'm a newbie researcher. I know nothing about breast cancer. And here's the grant. You're all in the study section, right? And I propose to you that somehow, as a newbie researcher, I can get 3,700 of the best breast cancer labs in the world to get their best characterized patients, to run the latest measurement technologies on them. I propose to you, I can get all 3,700 labs to share their data with me for free, for no money. If I actually wrote that in an NIH grant, you would laugh me out of the room. Why would 3,700 of the best breast cancer labs share any data with me for free? Yet, here it is, sitting there, waiting for you. 
So when I think of open, big data like this, the phrase I like to use-- instead of public big data, I call it retroactive crowdsourcing. What does that mean? You kind of get an idea. 3,700 labs are there to help you, and they don't even know they're helping you. Actually, 3,700 labs are there to help you, and you don't even know they're helping you. And I don't know if 90,000 samples of breast cancer are still not enough for you. Just wait another two months. It will be 100,000. And this hour, we will add 200 samples to GEO. Just sitting here, someone in the world is uploading data, just sitting there, waiting for you. What an amazing treasure of data that's out there. (DESCRIPTION) Yes, even a high-school student can use public data to design a new diagnostic test! (SPEECH) Now, if you have any doubts a high school kid can do it, here's a school child that did it. (DESCRIPTION) Headlines, 17 year old programs artificial brain to diagnose breast cancer. Teen develops algorithm to diagnose leukemia (SPEECH) This is Brittany Wenger from Sarasota, Florida. I've yet to meet her. I talk about her in all my talks. (DESCRIPTION) Photo of Brittany Wenger next to other winners with yellow and green trophies (SPEECH) At age 17, she won the top Google Science Fair competition. She's got the white trophy there. Because she built an artificial brain-- read that as neural networks-- to diagnose breast cancer. She literally downloaded immunohistochemistry data, qPCR data, and she built a diagnostic out of it-- enough to win. I guess she was bored, because the following year, she did it all over again for leukemia. Of course, what I mean here is if a high school kid can do this, then every researcher in the field needs to know how to do this now. These are the kids. These are the ones who are actually going to be here to do all of this. What an amazing time. But this is what an enabled high school kid can do. 
Now, just to put out a couple of really key highlights, which I love, the data sets that we really use a lot, it's just an example. It's The Cancer Genome Atlas. It's very easily abbreviated TCGA. (DESCRIPTION) The Cancer Genome Atlas, 14 thousand cases, 39 types of cancers, 13 types of data: molecular, clinical, sequencing (SPEECH) Those four letters are really overlaying an incredible treasure trove of data-- 14,000 cancer cases of 39 different kinds. And here, it's not just the RNA. 13 types of data-- molecular, so it's oncogene sequencing, certainly RNA-seq. There's histone modifications and methylation. But even the clinical data-- what drugs were used, survival, not survival, and even, in some cases, the mammography images and the H&E images. All that data is publicly available. There's a beautiful one. (DESCRIPTION) N I H Lincs program. LINCS aims to create a network-based understanding of biology by cataloging changes in gene expression and other cellular processes that occur when cells are exposed to a variety of perturbing agents. (SPEECH) The LINCS Program, which I love, this is on the drug side. And we'll talk about both of these data sets quite a bit in a moment here. (DESCRIPTION) 5,178 compounds, 1300 off-patent FDA approved drugs, 700 bioactive tool compounds, 2000 plus screening hits (MLCPN and others). 3,172 genes, shRNA and cDNA, targets and pathways of FDA-approved drugs, community nominations. (SPEECH) 5,000 different compounds, including a lot of off-patent ones, have been tested across 15 different cell lines. (DESCRIPTION) Banked primary cell types, cancer cell lines, primary hTERT immortalized, patient derived iPS cells, 5 community nominated (SPEECH) And literally, measurements before and after just to see what happens when those drugs are applied to ES cells, cancer cells, well-known cancer cell lines, and all sorts of different cell types. 
And it's not just the drugs, but also these other perturbagens, overexpression and knock-down constructs. See what happens in all of these cell lines, if we actually knock out this gene or overexpress that gene to get an idea of mechanism. Beautiful data set here. Just this is a million samples. Just that is a million samples publicly available. And then PubChem is another one. How many of you know of PubChem? Just a show of hands. (DESCRIPTION) Pub Chem website. 227 million substances, 1.3 million assays. More than a billion measurements within a grid of 300 trillion cells. 71 million meet Lipinsky 5. 1.2 million active substances (SPEECH) Yeah, very few hands. Maybe 10 in this audience here. PubChem, so this came out of the Molecular Libraries Initiative. This is back in the Zerhouni era. And everyone said pharma is going to hate this, because it's academics taking over drug discovery. That never happened. But PubChem, we can get these cancer cells to glow if they're living or glow if they're dying. But the minute you have a colorimetric assay like that, you could put it in front of robots. And robots can test 1,000 or a million chemicals one by one to see do any of these chemicals make a difference for those cells. But if NIH funded you to do this, you've got to drop all that data into PubChem. So what is PubChem? Now, think of it as a big grid, a big matrix. A quarter billion substances as the columns, 1.3 million screening tests as the rows. So if you do that math in that grid, there are 300 trillion little boxes there of which a billion of them have been tested. This compound in this screen, this compound in that screen. So a billion of them had been tested by someone somewhere, and the measurements are there. Of the quarter billion substances, 71 million meet Lipinski's rule of five. That means they could be orally available if you made a pill out of the thing. And of the billion measurements, 1.2 million of them were active. 
Meaning the investigator thought, this drug seems to work. It's not just a measurement, it's actually working. Now, I can easily bet you a beer this beats the screening data of any pharmaceutical company. Actually, I can bet you a bottle of wine this beats all the pharma companies combined. Because this is still doubling every three to four years. And very little is doubling in early stage discovery in pharma anymore. And they give this away without a user ID and password. It's just sitting there in NCBI. All of this data is sitting there, waiting for you. Who knows how many compounds are just sitting there? The next big drug for whatever is probably just very simply a column in [INAUDIBLE] today. And it's just sitting there, waiting for you. What an amazing time to be doing this kind of science. Now, we know what the high school kids are trying to do with this data. What we are trying to do with this, among other things, is get to precision medicine. We're actually not that far from the All of Us Research Program. My hotel was right across the street from there. (DESCRIPTION) The Precision Medicine Initiative. The time is right because of sequencing the human genome, improved technologies for biomedical analysis, new tools for using large datasets. Fact sheet, President Obama's Precision Medicine Initiative. (SPEECH) And of course, we had President Obama talking about precision medicine, launching the All of Us Research cohort. A lot of different aspects, but there's also the NIH MATCH trial. There's FDA support. What is precision medicine? A lot of people have definitions that are different. Mine is kind of simple. We're going to customize the health care. We're going to deliver it to an individual based on the measurements we're going to get from that individual. Of course, we're getting biological measurements, like DNA and proteins and things like that. But we need to get behavioral measurements. Maybe you love this kind of side effect. 
You don't really like the side effect at all. We got to get your preferences as part of this. We need to get your environment. So we're going to get a lot of measurements from you. But actually, we can't just do it with measurements from you. We need those kind of measurements on a lot of people. And even more importantly, we need to figure out, what medical care do we give to all of these people and what worked and didn't work? And now, how do we all apply it to you? It can't just be personalized. We need a whole community. We need a large database of what's working and not working [INAUDIBLE] with all of those measurements. And then when you show up, we know what to do. And so if you look at the seminal report, 2011, the National Academies, it comes down to three challenges. They're kind of a no-brainer, these three ovals here. (DESCRIPTION) Diagram showing new taxonomy leading to clinical medicine ovals, accurate diagnosis, targeted treatment, and improved health outcomes. Arrow leads from those to observational studies during normal course of clinical care. Biomedical research arrow joins that one to point to Knowledge Network. A final arrow, marked validated, leads back to new taxonomy. http://www.nap.edu slash catalog dot P.H.P. question mark record underscore I.D. equals 13284 (SPEECH) We need better diagnostics. We need better treatments. We need to figure out our health outcomes. So for the rest of this talk, I'm just going to give you some anecdotes, just examples of what you can do with data to get to diagnostics treatments and outcomes here. Now, precision medicine makes a lot of physicians nervous. Because by definition, if we're moving into an era of precision medicine, that means the last thing we were doing was not precise medicine by definition, right? You're also like, how do you say that was imprecise and this is precise? How do you measure the precision of a medicine? 
So to me, one way I love to do it is to think about those little kind of boxes we put patients into-- disease categories. There's a word for this called nosology. Nosology is a systematic classification of diseases, and it goes all the way back to this guy, Linnaeus. Linnaeus is the first guy to put species into a tree. Remember kingdom, phylum, class, order, family, genus, species? You all had mnemonics for that. Some were clean. Some were dirty. Don't repeat them here. You all had that on a test at one point in your life. But Linnaeus was actually the first guy to put diseases into a tree as well, the genera morborum. He's not remembered for this, because he got it horribly wrong. He got the species so right. Let's put the mental diseases over here and the fevers with urinary pain over here. That's the best they can do in the late 1600s. This is a little bit after the 1600s, the 1700s. Galileo was under house arrest for saying the sun is the center of the solar system. This is just 50, 60 years after that. Remarkable-- you could see how they're trying to use the fonts to kind of make a tree here. But you can get these books on Google Books. (DESCRIPTION) Scans of Genera Morborum and Clavis Classium (SPEECH) But fast forward into what we use now today. The modern version of this is the International Classification of Diseases, ICD-9 or ICD-10. So let's talk about ICD for a moment. So ICD literally started with William Farr and the Royal College of Physicians a little after the US Civil War. So we're using the 10th edition of something that started after the Civil War. Now, I love to think about these classifications a lot here. So we're at ICD-10 now. We'll talk about that in a moment. Here's ICD-2. This is a second edition. In fact, I carry this one in my bag here, so you can see what this actually looks like. First of all, you can see how thin it is. This is the second edition. This one's second revision-- Paris, 1909. Some of you remember European history. 
This was not a WHO standard. This became a League of Nations standard, because the United Nations was still 50 years after this. And so it's amazing what these books look like. If you have time at the break, I'll show you what this looks like. But it's written like our current code books. Let's use code 39 for cancer of the buccal cavity, code 40 for cancer of the stomach, liver, intestines, cancer of the skin. And then this book 50, 100 years ago, did what we still do today. We're running out of numbers. Put all the other cancers in 45-- not otherwise specified, including-- and I kid you not, it's in the book-- lung cancer, brain cancer. We just don't even bother distinguishing these from each other. Isn't lung cancer the number one cancer killer today in the United States? Was cancer not otherwise specified back 100 years ago. In fact, it is amazing what we put in not otherwise specified. It's all the embarrassing stuff in medicine we don't know how to diagnose, and we don't know how to treat. The minute we get some treatment, we kind of rescue it out of not otherwise specified. It's unbelievable. But if you're a researcher looking for something to do, go after not otherwise specified. Those are things we don't know what to do. That's where you should be going. But this book, 100 years ago-- literally, I kid you not-- it this international standard literally had to say that if you run into a patient that died because of a visitation from God, be sure to use code 189. I flagged the page. Now, the same way you're laughing at this, can you imagine how they're going to be laughing at us in 20 years, in 10 years? Oh, those silly fools didn't know this joint disease and this bowel disease were the same thing. Just because two different kinds of doctors take care of that doesn't mean they should be in different parts of the tree. This has lost all connection to science in some ways. It's just for billing. 
But this is where you are laughing at this, they're going to be laughing at us too. You know that's going to be happening. And so that narrative I want you to think about is, let's replace this with this-- the molecular view of these diseases here. Now, look, we've moved on. Obviously, we had ICD-9 for 20, 30 years. We moved on to ICD-10 just about two years ago-- the 10th edition of this book. And this is the second again. And it's amazing the physicians are in revolt about this, because there's too many codes. Oh, my goodness, we've got a code for if you're an astronaut hit by a micro meteorite or if you're a surfer eaten by a whale or hit by a whale. We've got codes for that. People were making fun of these. They had all the memes up there on the internet. This was my favorite. I don't always get sucked into a jet engine, but when I do, I use ICD-10 Code V97.33XD. (DESCRIPTION) A picture of the Most Interesting Man in the World. Text, I don't always get sucked into a jet engine, but when I do, I use ICD-10 CODE: V97 DOT 33XD (SPEECH) So people were making fun of this. Now, look, I have a lot of friends in San Diego. We have a great Naval base in San Diego. UCSD is not too far from there. And I've talked to my friends. And they say, indeed, if you are serving on an aircraft carrier, you can get sucked into a jet engine. It is possible. it is possible. But this is the chronic form of the disease. So I can imagine getting sucked in once. I'm really not sure how you get sucked in again. But if it happens, we're ready with a code. But the same we were laughing at this, we shouldn't be laughing at more codes. What we should have been laughing at was ICD-9, which we had for decades. So let's look at lung cancer again. (DESCRIPTION) Slide showing ICD-9 codes (SPEECH) Here's ICD-9, 2012, clinical modification. What's the code for lung cancer? 162.9. Here are all the diseases that are covered in 162.9. 
(DESCRIPTION) Adenocarcinoma of lung, stages 1 through 4, bronchioloalveolar carcinoma, broncoalveolar cancer, C.A. of lung, cancer of the left lung, large cell, cancer of the left lung, squamous (SPEECH) Did we take advantage of the latest in molecular testing, EGFR sequencing to distinguish disease? No, of course not. Did we take into account even basic histopathology to tell squamous cell from large cell from small cell? No, of course not. All of those were in one code. Do you know how we distinguished lung cancer in ICD-9? Left lung or right lung. That's how we coded lung cancer. We haven't even fixed that much in ICD-10. The first thing we got to do is to stop making fun of these codes. We got to get physicians and everyone involved with care to believe in these codes, want better codes. We have to not be afraid to tell each other what we really think a patient has and, more importantly now, tell the computer what we think this patient has. Because we're going to need the computer's help as we go forward to really get the right diagnoses and the right thing to do. Medicine is getting complicated. So we can't keep making fun of codes like this. We got to move these forward. ICD-11 has got to be even better as we go. Those are codes. So let's move from codes to diagnostics. So one of the first things I set out doing when I got all this data-- and back then, when I started at Stanford, I guess, 14 years ago, we had a whopping 25,000 publicly-available samples. I'm thinking, what are we going to do with 25,000? Now, we got two million. Let's just go get them all. So my first big NIH grant, my first R01, was to go collect every human disease studied by microarrays. Why pick cancer? Why pick diabetes? Just go get them all. Thousands of diseases already studied. And we started to collect them in this kind of format, where someone looked at disease and healthy at the same time. 
(DESCRIPTION) Graph showing disease individuals and healthy controls, Marina Sirota (SPEECH) A lot of researchers look at metastatic cancer and nonmetastatic cancer. I don't even use that data, unless they had healthy normals in there too. You've got to have the normal controls. It's kind of funny, though, you're all researchers. We have so many different words for normal. The top six are normal, vehicle, wild type, control, time zero, and margins. We had to write a paper on the 200 words we use in medicine for normal. But we figured it all out with text [INAUDIBLE] and NLP. We started collecting. Each box here is a gene. Maybe it's more of a gene, less of a gene. The [? complete ?] signatures. People have been doing this for decades. (DESCRIPTION) Slide showing several of the same graphs in a jumbled up pile. (SPEECH) Now, remember that's one experiment. But I can pick any disease, and probably hundreds or thousands of people have already studied this. So if I only have one data set, I'll skip that disease this year. I'll wait for two next year or four the next year. I could just pick and choose. So first thing we're going to do is make diagnostics from this data. And what's a diagnostic? A diagnostic helps a physician or a health professional know if you have a disease. There's imaging diagnostics. I love protein diagnostics. I love proteins. Because I still remember my way-back days as a pediatrician. If I had a toddler screaming their head off, with a fever and sore throat, somehow, you could stick this Q-tip back there and put some drops. You could tell if they have strep throat or not. I'm sure you've had this done to yourself as well. That is a rapid strep test. And in fact, that's something called an ELISA, an Enzyme-Linked Immunosorbent Assay. That's 1970s proteomics. I loved ELISAs. Because you don't even need a refrigerator for that stuff to work. There's no cold chain involved. And I've got a whole bunch of RNA measurements here. 
How am I going to get to proteins? Well, it's always dangerous to mention the central dogma. But DNA codes for RNA codes for protein. I've got a whole bunch of these measurements. I really want to make these into tests. And I got this little arrow here, which means if the RNA is changing, maybe the protein is. Maybe I can pick that up in the blood or something accessible. So we got a whole bunch of these diagnostics, so I'm not going to spend any time on these. Cancer markers was one. This was an interesting one. These little kids come in with medulloblastoma-- brain cancer-- a horrible cancer to have. And the pediatric brain surgeons were telling us, in a little kid, it's really hard to tell, where is the cancer ending and the normal brain beginning? Because obviously, you want to keep as much normal, healthy brain in a kid as you can. You say, well, can't you just come up with some kind of paint? We just want to paint the operative field. Turn off the lights. Just make it glow where the cancer is. How the hell do you make a paint like that? Well, you just download all the public data you can on medulloblastoma. And then download normal neuron, normal glial cells, normal astrocytes. Subtract out the normal from the cancer. Figure out which genes code for cell surface proteins. Maybe a paint would stick to that. And here are two markers that we showed that actually are darker in medulloblastoma compared to the normal tissue around it. So here is how you do it. With public data, you make a paint. We also have one for transplant rejection-- patients coming in with new organs. We transplant hearts, livers, kidneys, and sometimes, the body wakes up one morning and says, you know what? This isn't part of me. I got to get rid of this thing. That's acute rejection. It's not painful. The patient doesn't know it's happening. And because of that, we've got to stick these needles in every once in a while to figure out, is there a rejection? 
That's a pretty good painful way to look. Why don't we make a blood test for this? And you know what? If we're going to make a blood test for rejection, let's make it work for any organ not just kidney or heart. I want it to work for all of them. So how do you do this? Well, you download all the publicly-available data you can on transplant rejection. There's 50 experiments now. Imagine a big Venn diagram. We're going after what's in common across all of them. And here's a marker that not only just worked in the blood, but also worked in immunohistopathology for rejection. I won't say anything more about those. Those have been published. The one I'll spend a little bit more time on is this disease-- preeclampsia. Show of hands if you've heard of this condition preeclampsia. Yes, so it's a larger audience than most, obviously, this being a biomedical community. Usually, it's half of the audience that actually watches Downton Abbey, because one of the characters dies of this condition. Spoiler alert if you haven't seen the series. This is Sybil who dies of preeclampsia. And so when you're pregnant, when a woman is pregnant, and the blood pressure goes shooting high, if it's not treated, it can lead to seizures. It kills the baby, kills the mom. This is still a major expense. A lot of morbidity and mortality around the world. (DESCRIPTION) 5-8% of pregnancies in US and worldwide, 4.1 million US births in 2009, up to 300,000 cases of preeclampsia annually. Responsible for 18% of US maternal deaths, maternal death in 56 out of 100,000 live births and neonatal death in 71 out of 100,000 live births. $20 billion in direct costs and an average hospital stay of 3.5 days. Linda Liu, Bruce Ling, Matt Cooper (SPEECH) You can see the billions that we spend, because then what you gotta do is take the baby as fast as you can in many cases. And we pediatricians have to keep these little tiny little creatures alive and healthy as long as we can. 
There are four drugs in trials in the United States for preeclampsia. But a diagnostic we're using is ancient. We look for urine protein-- not even a specific one, just urine protein-- probably one of the most nonspecific tests we have in today's obese America, where many people now have protein in the urine. So we wanted to make a blood test for preeclampsia. And so Linda Liu was a grad student. Bruce helped with the proteomics. Matt got involved in a moment you'll see towards the end. And how do you start? Well, you don't start with the IRB. You start with just typing in preeclampsia or pre-term birth. 266 experiments. Here's prematurity-- 299 pre-term birth. Preeclampsia-- 100 samples, 94 samples. I'm not saying there's 266 chips. I'm saying there's 266 collections of chips. Here's 300, 249, thousands of samples. So many investigators already funded to do the same experiment in their own samples. And what do we do is we put them all together. Here's a funny part of how I do my science. I love the fact that these researchers are following the rules. I love the fact they deposit their data. I don't trust a single one of them. But I trust what I see in common-- wisdom of the crowd here. If half of them, if 90% of them see the same marker going up-- and we love up tests in medicine. We don't like test going down. We like up tests for some reason in medicine. They all see the same markers going up, maybe there's something real there. I don't care if one person saw it. I want a whole bunch to see it. As we chase them down, we find a dozen of these markers, including here's hemopexin. (DESCRIPTION) New blood markers for preeclampsia (SPEECH) It's up-regulated in preeclampsia compared to normal, healthy pregnant women, no matter what the gestational age. (DESCRIPTION) At March of Dimes, bit dot L.Y. slash preeclamp (SPEECH) We had March of Dimes fund this work initially. Spark is a CTSA-funded seed award at Stanford. We brought a whole bunch of papers at this link here. 
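The "wisdom of the crowd" step described above can be sketched as a simple vote count: keep only the markers that a large fraction of independent experiments report as going up. This is a minimal illustration, not the published pipeline. The function name, the threshold, and the toy gene sets (including the `GENE_*` placeholders) are assumptions made up for the example, and real meta-analyses typically weight studies by sample size and effect size rather than counting votes.

```python
from collections import Counter


def consensus_up_markers(experiments, min_fraction=0.9):
    """Return genes reported as up-regulated in at least
    `min_fraction` of the experiments (simple vote counting)."""
    counts = Counter()
    for up_genes in experiments:
        # Each experiment gets one vote per gene, however it was listed.
        counts.update(set(up_genes))
    n = len(experiments)
    return sorted(g for g, c in counts.items() if c / n >= min_fraction)


# Hypothetical toy input: each set holds the genes one experiment
# reported as up-regulated in disease versus healthy controls.
experiments = [
    {"HPX", "GENE_A", "GENE_B"},
    {"HPX", "GENE_A"},
    {"HPX", "GENE_C"},
    {"HPX", "GENE_A", "GENE_D"},
]

print(consensus_up_markers(experiments, min_fraction=0.75))
```

A marker seen in only one experiment never clears the threshold, which is the point being made in the talk: no single data set is trusted, only what the data sets share.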
By the way, all these slides are on SlideShare. I give them all away in the YouTube videos of the stuff. All the papers are at this link here. What do you do next in Silicon Valley? You start a company on this. (DESCRIPTION) Headlines, Carmenta Bioscience Secures over $2 million in oversubscribed seed financing. Progenity Acquires Carmenta Bioscience for proprietary preeclampsia technology, appoints Matthew Cooper Chief Scientific Officer. At Carmenta bio, progenity.com, bit dot L.Y. slash carm underscore prog (SPEECH) And this became Carmenta. (DESCRIPTION) Flow chart, need a diagnostic for preeclampsia. Public big data available. March of Dimes center for Prematurity Research. Data analyzed, diagnostic designed. SPARK grant, $50,000. Life Science angels, other seed investors, $2 million (SPEECH) Now, here's the arrows. You can see a couple of these arrows here. You could see it raised $2 million in seed financing. So let me be crystal clear to this academic audience for a moment here. I'm not showing you $2 million to brag about it. Indeed, it's not a lot of money to raise in Silicon Valley if you've seen any of the headlines. I am showing you the $2 million to maybe convince you this is the new way science has to continue out of the lab. The next experiment we need to run now is a prospective multi-center trial. But that science experiment will now be run using private dollars in a startup company. The science continues in the startup company funded by private dollars. Or to put it another way-- do you know how hard it is to get a brand new $2 million NIH grant today? The science can and will continue in startup companies. But because it's relevant to patients now, it's funded by private dollars. So here's the arrows. We start with the unmet need-- public data available. March of Dimes funded the initial work. We designed the diagnostic, the Spark, the CTSA grant. 15,000 bucks to get samples tested. It seems to work. Launched a company. $2 million seed funding. 
I love this company, because as you can see on the bottom here, it's already been acquired. Some of you know what that means. So we went from public data, which people think is as valueless as kitten videos, to designing the diagnostic, testing samples in the freezer, launching the company, selling the company, start to finish-- 24 months. Inventor is happy. Investor is happy. Of course, we're going to do more of these, and I'm giving away the secrets here. Because every one of you could pick a different disease to make a diagnostic for. We'd never step on each other's toes. That's how many diagnostics we still need in medicine. Every aspect of what we did here is in all the articles and all the links. All the papers describe everything here, lots of news articles here. I'm giving this stuff away to hope maybe some of you can get some others to do this, too, because we need those diagnostics. It's just sitting there, waiting for you. Let's talk about therapeutics for a second-- way harder, as some of you know. If I were to ask the lay audience how much does it cost to develop a new drug, usually the answer people give is that it's a billion dollars and 10 years. You've heard those numbers, right? It takes a billion dollars and 10 years to make a new drug. That's advertised on the Super Bowl, I think, is why people know that number. Boy, that's an underestimate, according to Matthew Herper, who does this math every couple of years in Forbes. Now, it's really simple math, even after lunch. (DESCRIPTION) How much does Pharmaceutical Innovation cost? A look at 100 companies. at Matthew Herper, bit dot l.y. slash new drug 1 (SPEECH) What's the right number? How much has pharma spent on R&D for drugs, divided by the number of drugs you got. It's really simple math. What did you spend divided by how much did you get. 
(DESCRIPTION) Table with columns, company, ticker, number of drugs approved, R&D spending per drug in million dollars, total R&D spending, 1997 to 2011 in millions (SPEECH) If you do that math, for these top 12 companies, it costs between $4 billion and $12 billion per drug. Probably not sustainable. But that's a really negative way to paint the pharma industry. Let me paint this problem in a positive way. Even if every pharma and biotech company on the planet was 100% successful in developing all the drugs they are working on, there are not enough of them on the planet to develop all the drugs we need for precision medicine. It's the NASA astronomer problem all over again. There are just not enough companies. How many do we have? 200? 300? That's it. How many are in the new codebook? We just don't have enough companies. We got to get to more efficient ways to do this, not just to spare and save pharma, because we just need the drugs. That random subtype of a cancer is never going to get a therapy unless we have a better way to do this. Now I'm not saying I got the world's solution to drug discovery, but can't we use all this data to do this? And indeed, we can. (DESCRIPTION) Differences in Disease individuals and healthy controls leads to disease gene expression signature. Differences in treated samples and untreated samples leads to drug gene expression profile (SPEECH) So as we're starting to collect all this data on the diseases, people just started dumping drug data onto the internet, including the Whitehead and Broad Institute with the Connectivity Map, which became LINCS, and now many others contributing drug data. So many samples before and after the drugs. So here on the left, we got samples with and without disease. And on the right, we got samples with and without the drugs. They might not even be the same tissue. It doesn't matter. We're going to put these two together. The reporters for this called this match.com for drugs. Some of you know match.com. 
It's how you find a date, maybe a spouse or mate if you're lucky. What's the age-old saying when you're trying to find a spouse? Opposites attract. You've heard of opposites attract. We use the same methodology. I'm going to show you all the bioinformatics with my arms. (DESCRIPTION) Holds arms even, then puts his right arm up and left arm down (SPEECH) If I've got a disease, where this gene goes up and this gene goes down, and I can find a drug that knows how to make this one go down and this one go up, maybe there's a match there. That's with two genes. Imagine 20,000. Imagine Kolmogorov-Smirnov tests and Pearson correlation coefficients. I'm not going to show you any of that-- a lot of ways to do this. We're naive looking for a drug that reverses the effect of the disease-- really simple. Turn our crank, we got a lot of these, a lot of them. This could be a drug. That could be a drug. But where we get a lot of traction aren't just the new drugs. It's the new uses of the old drugs-- drug repositioning. Now, of course, we didn't invent the concept of drug repositioning. These are drugs that are already being used. Everyone knows that one of those first early cardiac drugs that had a very interesting side effect of hair growth. That's minoxidil. Everyone knows the other cardiac drug that had its own interesting side effect, of course, which is Viagra. That's what you thought I was going to say first. And Viagra, it turns out, did a double reposition. So we also have Revatio. Revatio is the same molecule, specifically indicated for pediatric pulmonary hypertension. So when the blood vessels from the heart to lungs get constricted, Revatio opens those up. Everyone was making fun of babies on Viagra back then, but it also works in that too. But you know what? Instead of finding these by accident, how about we find them on purpose using publicly-available data. That was the concept. Now, we've done dozens of these. 
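The "opposites attract" matching described above can be sketched with a plain Pearson correlation: score each drug by how strongly its expression profile anti-correlates with the disease signature, then rank the most negative first. This is only an illustration under stated assumptions. The talk mentions Kolmogorov-Smirnov tests and Pearson correlations without giving the actual method, so the function names, the gene and drug labels, and the numbers below are all hypothetical, and every drug profile is assumed to cover the same genes as the disease signature.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_reversing_drugs(disease_sig, drug_sigs):
    """Rank drugs from most reversing (most negative correlation
    with the disease signature) to most mimicking."""
    genes = sorted(disease_sig)
    disease = [disease_sig[g] for g in genes]
    scores = {
        name: pearson(disease, [sig[g] for g in genes])
        for name, sig in drug_sigs.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical log fold changes: disease vs. healthy, and treated vs. untreated.
disease_sig = {"G1": 2.0, "G2": -1.5, "G3": 0.5, "G4": -0.2}
drug_sigs = {
    "drug_X": {"G1": -1.8, "G2": 1.2, "G3": -0.4, "G4": 0.1},  # pushes genes the other way
    "drug_Y": {"G1": 1.9, "G2": -1.0, "G3": 0.6, "G4": -0.3},  # moves genes like the disease
}

for name, score in rank_reversing_drugs(disease_sig, drug_sigs):
    print(f"{name}: {score:+.2f}")
```

A strongly negative score only nominates a candidate. As the talk goes on to show, each prediction still has to be tested in cell lines and animal models.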
The one that got us on the front page of the Wall Street Journal was this one. Video is not working, but that's OK. This is an anti-epileptic drug. (DESCRIPTION) Text, Anti-epileptic drug topiramate works against a rat model of inflammatory bowel disease. Science Translational Medicine, 2011, bit dot L.Y. slash sci T.M. top, Joel Dudley and Marina Sirota (SPEECH) So this is topiramate, a seizure drug. We predicted it would work on inflammatory bowel disease, an autoimmune condition. And so these are my famous rat colonoscopy videos. Not enough of you are laughing, so let me illustrate the geometry involved. This is a colonoscope. This is a rat. Give me some credit for this experiment here. [LAUGHTER] (DESCRIPTION) Puts fists together (SPEECH) This is what a normal rat colon looks like. It's anesthetized, of course, but still alive. And you can kind of see some redness here. We use a chemical called TNBS. It gives a very severe form of inflammatory bowel disease. All better with the topiramate. 40 megs of video at the Science Translational Medicine paper. We've done a bunch of these now. (DESCRIPTION) Text, Psychiatric drug imipramine shows significant activity against small cell lung cancer. (SPEECH) Joel and Marina did this one. This one came out about three years ago-- it's longer-- five years ago now. This is one of my favorites. We predicted imipramine, an anti-depressant, would work on small cell lung cancer. Imipramine, it's a great anti-depressant. But we've moved on to newer ones-- serotonin reuptake inhibitors, Prozac, and all the rest. And here, we predicted small cell lung cancer. (DESCRIPTION) Vehicle control covered in tumors. With imipramine, no tumors appear. Mice dosed after tumor formation. (SPEECH) This is a triple knockout of these three genes. This mouse gets mouse lung cancer. So it's not a xenograft. This is a genetically engineered mouse model. You have to wait a couple of weeks, and the mouse gets mouse lung cancer. 
And here it is treated with imipramine. So again, imipramine is a tricyclic anti-depressant. We have newer antidepressants. We don't use this one anymore, because it has some side effects. It can make you sleepy. And in some people, it might set you off for a QT prolongation. So in some people, imipramine might set you up for an arrhythmia. Actually, neither of those two side effects sound as bad as having lung cancer-- a 5% survival rate in five years. And the cancer is melted away here. It's gone in this mouse. So why do I show this one? I love this one. We start with computational prediction, public data. This drug is going to work. We test cell lines in a dish. I'm not even going to show you that. We get this amazing mouse lung picture 15 months after the computer prediction. We got IRB approval. We launched a trial on this one. Patients dosed on this drug. From public data to IRB approval in 15 months, trial launched. Total cost of the clinical trial-- $50,000. This is a phase 2 AAAAA study. Is there any biology at all here? Now, I'm financially connected to this molecule. I'm not in the trial. I'm not running the trial. I don't care if it works or not. We got to do so many more of these. If we don't tackle small cell lung cancer, nobody is. Nobody's working on small cell lung cancer today. Maybe one biotech somewhere in Europe. Nobody else is working on this. If we don't do this, nobody's going to do this. We got to do so many more. OK, here's one that hasn't been published yet. I love this one. It's coming now. I think we're finally close. Martin Peter Martin did this work, unpublished. So this is a disease called psoriasis. You've heard of psoriasis-- autoimmune condition. The skin gets thicker, plaquey, can be on your face. It can lead to other problems, like arthritis-- psoriasis. Now, this mouse has been genetically engineered to have psoriasis. It's an over-expression of a particular transcription factor. 
I am neither a mouse person nor a psoriasis person. But the hair kind of looks disheveled. I can't really see the fingers. It's all better. It looks cuter to me. I can see the fingers there, the digits. So this mouse had been treated with a 1960s diuretic. So in other words, we used to use this drug in the 1960s to help you pee. We have much better ways to make you pee in medicine. So we don't use this drug anymore, but it was the topic of a New England Journal of Medicine paper in the '60s. And whether you give it topically or systemically, this mouse does better. Now, look, you look at psoriasis and you say, well, if you're working at a biotech company, why are you working on psoriasis? We already have monoclonal antibodies. We have anti-IL-12/23 monoclonal antibodies-- $50,000 a year-- that treats psoriasis. They're immunosuppressants. You knock out a good component of the immune system, an autoimmune disease like psoriasis gets better. Why would you want to work on psoriasis? We already got these drugs, monoclonal antibodies for them. Well, let me show you the cellular data. Here's what normal mouse skin looks like. These pink are the keratinocytes. In psoriasis, they go crazy. So instead of staying at the top, they go deep. And you can see them kind of growing along a hair follicle or vessel here. This is half the dose. This is a double dose. It's a dose ranging study. And you can see all the pink are all back to where they're supposed to be at the edge here. But actually, the most important part of this slide are the blue. Those are all the immune cells. (DESCRIPTION) Many blue cells in picture (SPEECH) This is the first small molecule that fixes the keratinocytes in psoriasis. And it's not an immunosuppressant. Nobody has a molecule like this, and it's sitting there in public data-- a 1960s diuretic. Can I more convince you the value of open data? If we don't look, we will never find. 
How many other molecules are just sitting there as a column in someone's grid? Sitting there, waiting for you. So many more. (DESCRIPTION) Headlines, Digital drug development company NuMedii snags $3.5 million. Wall Street Journal. Astellas Hooks up with NuMedii to continue drug repurposing deal drive. (SPEECH) What do you do in Silicon Valley? You start a company on this. And this became NuMedii. This is a team of four people-- four full-time people-- in a biotech company. So I call these garage biotechs, because we worship garages in Silicon Valley. Apple and HP started in a garage. They're like national landmarks. And dorm rooms-- Yahoo, Google, Facebook started in dorm rooms. What's the smallest possible biotech company you can start in a garage? Because when you go home, you already got two million samples and mouse models up the wazoo that you can order from your garage today. I know it's a lot of hard work after that. What's the minimum viable biotech company you need that you can get started with in a garage? This little team now is working on an idiopathic pulmonary fibrosis. We got two drugs. Neither really work well for those patients. If they don't work on this, nobody is going to work on this. That's the whole point of these kind of companies. We need so many more. And we have plenty of diseases to go after. We don't have to step on people's toes to do it. All right, I got time. So let's go further here. What's going to be the next big open data? The hockey puck-- where is it going? Where am I skating to? What's half my lab invested in? Clinical trials data. (DESCRIPTION) Text, ImmPort, immport dot org. The next big open data, clinical trials. Download 300 plus studies today. Drug repositioning, new patient subsets, digital comparative effectiveness, more (SPEECH) What's a clinical trial? The most expensive experiment in the world. And half of them fail. And then when they fail, we barely write papers about it. Forget about releasing the data. 
Oh, boy, is that going to change. EMA-- European FDA-- EMA is requiring raw clinical trials data release. FDA might get there. All these three letter acronym organizations are fighting one way or the other. Everything from pharma to National Academies, everyone's got a stake in this. I think clinical trials data is going to be publicly available. I run this repository now with our good friends and colleagues at Northrop Grumman called ImmPort, funded by NIAID, where we give out hundreds of raw clinical trials to the public. By the way, I don't mean that summary table you got put in clinical trials [INAUDIBLE]. That's whatever. I mean the raw data-- every patient, every encounter, every arm, every dose, de-identified of course, every CRF released to the public. So we do this for hundreds of data sets. And I'm going to show you one example of what we did with it. What are we going to do? Why do you think that drug failed, or why do you think it worked? Just open the eyes to clinical trials data, like we've already been doing in every other data aspect. (DESCRIPTION) Text ImmPort redistributes data from major N I A I D funded programs and more (SPEECH) So we run ImmPort now, and we also get data from a lot of different consortia. In fact, all of these consortia give us data-- Immune Tolerance Network, CTOT transplantation. Bill and Melinda Gates will start to give us data now, March of Dimes. (DESCRIPTION) Graph showing increases in numbers of studies (SPEECH) It turns out premature birth and inflammation have a lot to do with each other. And then the Accelerating Medicines Partnership, the AMP, with the NIAMS, for lupus and RA, we're getting that data as well. Hundreds of data sets now. We still have 120 that will be released. So there's our embargo period, where researchers get to publish before it's available. Over 300 available. And now, it's routine for us to get around 1,000 downloaders a month. I think when we started seven, eight years ago, we had 30 in one month. 
And now, we got thousands. You show people what you can do with the data, and it's just out there. That's de-identified raw data. 50 types of data-- flows cytometry data and, of course, the clinical trials data. So let me show you one example. We've written a paper about this. You can cite and more information there. A lot of people now are releasing their data. As a repository holder now, I love it when people say, here's our great paper. Data available under the accession number. That's great to see that. There's not a lot of PLOS One data available. So here's the paper. Go download the data right away. All right, let's look at a one real-world example of what are you going to do with clinical trials. This might be the most relevant to folks in the audience here. (DESCRIPTION) Text Reanalyzing RAVE. Trial of new approach to the induction of remission (SPEECH) This is a paper came out in 2010 called the RAVE study-- rituximab in ANCA-associated vasculitis. So vasculitis, what we're talking about here is the inflammation of blood vessels that go to the brain. And as you can imagine, the brain needs blood. So that's a medical emergency. You don't want to end up with a stroke. And we have an old way of doing this-- cyclophosphamide of treating this-- and then they're proposing a new way-- rituximab. And like many studies today from pharma, they're not trying to say it's better. They're just trying to say that it's not worse, non-inferior. That's the goal most people try to get to. This is a randomized, double blind, double dummy, active-controlled non-inferiority study. And this is the study here. Paper came out, New England Journal of Medicine. You could tell it's New England Journal based on the font. New England Journal of Medicine in 2010, paper came out. 
But what they also had were all of these data elements, all of the time points, all the CRFs for everyone, including entire whole blood flow cytometry, ANCA levels, which drug they were on, rituximab placebo, and then the switchovers. Just under flow cytometry, they ran all of these panels. We're just looking at one of them-- the B-cell panels here. And all this data released to the public through our repository, through ImmPort here. OK, what's the first thing you're going to do with raw clinical trials data? You're going to try to reproduce the findings. That's a no-brainer. Do we have a quote, unquote, "reproducibility crisis?" Some people say, I don't believe we're in one. But we can argue about that over cocktails. Much more interesting venue to argue those kinds of things. (DESCRIPTION) Text reproduce CD19+ B-cell depletion using publicly released clinical trials data (SPEECH) And here is the original figure from the New England Journal of Medicine. When you give rituximab, the B cells drop. That is a no-brainer. We get the same figure. But now, this is with R and ggplot, and we added colors. And you can see, obviously, we do this to show yes, we got the data in one piece. We can exactly reproduce the picture here. That is interesting yet boring. The story of the future isn't about reproducing. It's what's the new science that you can do with this data. Now, as we read this paper, we realized there's still a killer question they didn't answer here. Let's say the paper gets accepted. Let's say the community gets this. Let's pretend you're a rheumatologist now. You see one of these patients. Well, which one of the drugs do I use? (DESCRIPTION) RAVE reanalysis. In retrospect, do any measured factors predict response? (SPEECH) Because the old drug worked 50/50. The new drug worked basically 60/40. So now, here's my choice. 
I got this old drug that's pretty cheap, but it has short-term side effects versus this new drug, which is super expensive, might have long-term side effects, but no short-term side effects. Well, which of the two do I use? Do I let the insurance company dictate this? How do I choose? 50/50, 60/40. So we just asked a really simple, stupid question. Was there anything in any of these measurements that could tell us which drug worked better? (DESCRIPTION) Text Granularity index higher in rituximab-treated subjects with remission (SPEECH) And that's what we did. Do any of the measured factors predict response? In fact, if you turn the crank, it all comes down to what we fancily called the granularity index, but actually, it's the count of neutrophils. That's kind of odd, because these aren't the lymphocytes here. But turns out, the higher the neutrophil count, the more likely rituximab was going to work, to lead to remission compared to cyclophosphamide. And these are the kind of graphs here. So if your neutrophil count was high, go for the rituximab instead of cyclophosphamide. In fact, in this paper, we proposed a new treatment paradigm. Instead of just picking by chance-- instead of letting the insurance company dictate this-- go for profile therapy. Instead of 50/50, 60/40, you might be able to get to an 80% remission rate, because you picked the right drug intelligently here. Now, here's where it's interesting. I'm not saying this is a done story here. All I'm saying is the next trial that looks at rituximab should at least look at this measurement to see if we can get patients to a higher count. Now, it gets even more fascinating. Because here I am. I'm a neutral third party. I had nothing to do with this trial. But enough is documented now that I can understand all the elements here, and I could actually go talk to the people who are involved. I'm citing their paper. I now have this test, which I can call a granularity index. 
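As a rough illustration of the question asked here, whether a baseline measurement predicts which drug works, this is a minimal Python sketch of a stratified response analysis. It splits subjects by treatment arm and by whether the measurement is above a cutoff, then compares remission rates per stratum. The records, the field names (`arm`, `granularity_index`, `remission`), and the cutoff are all invented for illustration; this is not the RAVE data and not the analysis from the paper.

```python
from collections import defaultdict

# Hypothetical toy records; every value here is made up, not trial data.
subjects = [
    {"arm": "rituximab", "granularity_index": 0.9, "remission": True},
    {"arm": "rituximab", "granularity_index": 0.8, "remission": True},
    {"arm": "rituximab", "granularity_index": 0.3, "remission": False},
    {"arm": "rituximab", "granularity_index": 0.2, "remission": True},
    {"arm": "cyclophosphamide", "granularity_index": 0.9, "remission": False},
    {"arm": "cyclophosphamide", "granularity_index": 0.7, "remission": True},
    {"arm": "cyclophosphamide", "granularity_index": 0.4, "remission": True},
    {"arm": "cyclophosphamide", "granularity_index": 0.1, "remission": False},
]


def remission_rates(records, cutoff):
    """Return {(arm, 'high' or 'low'): fraction of subjects in remission}."""
    counts = defaultdict(lambda: [0, 0])  # [remissions, total] per stratum
    for r in records:
        level = "high" if r["granularity_index"] >= cutoff else "low"
        stratum = counts[(r["arm"], level)]
        stratum[0] += int(r["remission"])
        stratum[1] += 1
    return {key: rem / total for key, (rem, total) in counts.items()}


# The cutoff of 0.5 is an arbitrary assumption for this toy example.
rates = remission_rates(subjects, cutoff=0.5)
for key in sorted(rates):
    print(key, round(rates[key], 2))
```

Any cutoff found this way would still need to be checked in a separate trial before anyone relied on it, which is the point made above about the next rituximab study.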
Whatever I call it, the pharma industry will call that a companion diagnostic. I know it's a pretty simple one. Boy, I could even file intellectual property on this. I might even start a company on this. And guess who would want to buy this companion diagnostic? Boy, you got rituximab right there. This is just one of many, many, many examples of what we're going to do with trials data. Don't think that the pharma sponsor or anyone at the trial looked at all the interesting questions in the data set here. Yes, OK, reproducibility-- whatever. But maybe the killer question is still waiting for them. And maybe, actually, they'll never ask this question, because they don't want people to know. I'm not saying this in an evil way. But they want it to be thought of as equal. Maybe you, as the third party, need to go in and say, this is where you should be using this drug. You can come up with a companion diagnostic, and you're not even part of the trial. We got dozens of these, and this is one of many examples. I see a future in clinical trials data. Here's another one. Boy, we have so many data sets now at ImmPort. All these trials and all of them are placebo controlled or time zeros, all those control groups. We realized, if you just take all the control individuals-- just the control groups-- they've been studying their immune systems like crazy, we already got more than 10,000 people with a normal, healthy immune system. They weren't treated with anything. They don't have any disease that we know of. Why don't we gather them all together, harmonize all those data? And now, we call that the 10,000 Immunomes Project. We literally have the immunomes of 10,000 people. (DESCRIPTION) From the control groups of 242 manually curated experiments. bit dot ly slash 10k immu, http colon slash slash 10k immunomes dot org (SPEECH) And so it's CyTOF, flow cytometry, secreted immune proteins. 
Those are cytokines, like Luminex, all sorts of other assays, flu levels, titer levels, gene expression in some. It's not a complete matrix. We don't have everything on every one. But for thousands of people, we have many of these measurements that totals up to 10,000 people. To me, this is the control group of the future. Maybe we can start to launch trials and think of this as a common control group, because it covers many ages, 50/50 male, female, and all the different ethnicities as well. This isn't just some convenience sample. This represents the United States in many ways. We give this away. The bioRxiv, the pre-prints-- there's a pre-print revolution now. I think it's already been downloaded 3,000 times. And it's finally been accepted to a journal. This is the new world. Who cares if it gets accepted to a journal, because 3,000 people already started to use this. And this is a website where we give this away. It's all tied to ImmPort. This is what you can do. It's all that trial data. Yeah, the disease is interesting. What about the controls? Because people don't have this kind of resource for regular immunology. I love the precision medicine issue. I love All of Us. They're not even looking at the immune system. You're not even going to get it from there. So until someone else does this, this is what we got. It's freely available, because we figured out how to do this with the data. All right, in the last few minutes-- oh, yeah, I can't just take full credit. I got to point out some of the others too. We got Vivli, which launched two months ago, which is like an index of a lot of these open data resources, more than 3,000. YODA, Harlan over at Yale. Harlan Krumholz, great other person to follow on Twitter. He's been running YODA, which is another one of these clinical study requests. ClinicalStudyDataRequest.com-- each has their own weird way of doing this. Clinical study, you've got to use their portal. 
So you upload SAS code, and it runs on their portal. ImmPort, I love, because you download the data onto your laptop. You do whatever you want with this. There's no restrictions. It's not a virtual portal. But there are other ways to do this as well. All right, so in the last few minutes, I'm just thrilled to be at UCSF. Some of you know I've been at Stanford. I was at Stanford for 10 years. I moved to UCSF about three years ago. It's an amazing time at University of California. If you've been to San Francisco recently, you know how much building and construction is going on, especially in Mission Bay, this area south of the ballpark. (DESCRIPTION) Bakar Computational Health Science Institute. Two modern, gleaming buildings (SPEECH) We're slumming it out in this brand new building until they build our brand new building opening in late 2019. We're going to be on the second floor here. And just to orient you here, this is a Third Avenue tram. This is the new Warriors Stadium complex. We're next door to the Golden State Warriors. And people say, what about traffic? And we have no answers for any of that. But it's got to be next door. Maybe the restaurant scene will be better. Back here is Chan-Zuckerberg. Illumina was there. They're moving out. Chan-Zuckerberg is moving in, the bio hub. And then our new children's hospital, adult and cancer over here, a lot of different buildings. We're building a lot at UC and UCSF right now. I loved every day at Stanford. But I got to really take on the leadership of this new Institute. And we got launched with a $10 million gift from Priscilla Chan and Mark Zuckerberg. We've got more resources now from the Bakar family, which is why we named it after them. And I'll show you what we've been doing with all this money here. First of all, it's great a lot of universities have this type of phenotype of a faculty member, a compute person or a data person. And they are usually scattered in various different departments and schools. 
Indeed, that was the case at UCSF. You put them all together, we got 50 faculty. You could see all their credentials here. (DESCRIPTION) 5 in National Academy of Medicine, 1 in National Academy of Science, 2 in American Society for Clinical Investigation, 3 N.I.H. Director's Awards, 2 Sloan Foundation Fellows, 1 HHMI faculty scholar, 1 MacArthur Foundation fellow, 1 Chan/Zuckerberg faculty fellow (SPEECH) And our goal is to really be the academic home for this type of faculty in research, R&D, and spin-out companies, of course. Educational plans-- let me just tell you a little bit about that. So of course, like any major university, we have a graduate program in biomedical informatics. That's cool. Where I'm going after it with educational plans is everyone else. So it turns out, what we did was we partnered with a nonprofit called Software Carpentry. You might have heard of them. They're nationally known. And Software Carpentry just teaches people how to write code. And what we've done is now, we've paid for enough seats with Software Carpentry to make sure every single student at UCSF can now learn how to write code. I do not care if you're a dental student, a nursing student, a medical student. If you want to learn how to write code, we've paid for the seat now. We will pay you-- we will bring you pizza, like programming and pizza. We're only teaching them two things-- either R or Python or both. And it's not for credit. You can fail any number of times. Come back. It's not for credit. It's run by our libraries. R or Python. Note, SAS is not on the list. It's R or Python-- open languages here. Now, it turns out we ran out of seats in the city of San Francisco with Software Carpentry. 1,000 students have already taken this class. In fact, we have department chairs taking this class. So what we do now is we pay Berkeley students across the Bay to become instructors with Software Carpentry, then teach our folks across the Bay in San Francisco. 
So that's what we're doing with this nonprofit. Love it. But I want to be the first university to make sure every single student at our university learns how to write code. Maybe you don't know this. In California, you know how, in third grade, you learn cursive? We're getting rid of that and replacing it with coding. Those are the third graders coming right behind. It's about time our graduate students learn how to write code if a third grader's learning how to write code at this point. So I feel passionately about that one. (DESCRIPTION) Text U C S F Institute for computational health sciences, build the strongest team in the world in biomedical computation and health data analytics (SPEECH) Of course, we're going to recruit like crazy. That's the endless problem-- trying to get the very best people to your organization and build these new data assets for precision medicine. And let me just take a step back. Why did I move? UCSF is amazing, but UCSF is one of six medical schools in the University of California system. So we have 10 campuses. (DESCRIPTION) Slide showing a list of campuses in California (SPEECH) Six are medical, four are not. And we have three national labs, including three supercomputer centers. We have Lawrence Berkeley, Lawrence Livermore, and San Diego super computer center. So we got some compute power there. 18 health professional schools-- we train half the docs and residents in California. Two billion NIH funding. Roughly, a tenth of the NIH budget comes to UC. You can see all the rest here-- 5,000 doctors. (DESCRIPTION) 11.4 billion clinical operating revenue, 5 NCI comprehensive cancer centers, 5 NIH CTSA (SPEECH) And why I moved is, because for some reason, they said I get to build the central data warehouse for all patient data for the entire University of California. And I'll show you why that's so much fun now. I (DESCRIPTION) Combining healthcare data from across the six University of California medical schools and systems. 
UC Health from different campuses pointing to central UC Health Health Data Warehouse (SPEECH) wouldn't be an IT guy if I didn't show boxes pointing to boxes. So here is a box pointing to a box. What are we talking about here? Of course, UCSF and UCLA are both top 10-- US News-- plus UC Davis, Irvine, Riverside, and San Diego. All that Epic record data and images to come and all the text data will now be in a central health data warehouse. So this is proof that it's actually working here. (DESCRIPTION) UC Health Patients since January 2012 (SPEECH) These five million patients are our rough numbers. We've seen five million patients since 2012. But if we go back with all our record data, we have data on 15 million patients. So that's 5% of the US population we actually have record data on. This is proof that it works. This is just the location of every patient in University of California, UCSF, UC Davis, UCLA, UC Irvine. In teal-- I don't know what that color is-- UC San Diego. Riverside is out here. They're tiny. They're a brand new medical school. I don't want to leave them out. We got five major ones. And those five have CTSAs. Five have NCI comprehensive cancer centers. And then Riverside is tiny. By the way, this is Las Vegas. A bunch of their sick patients come to us. Hawaiian Islands are covered with our colors. When you're sick, you come to UC, especially in this part of the world. All the demographics that you can imagine here. (DESCRIPTION) The next big data, clinical data. Text Search 15 million patient records from the University of California with the U C Rex data explorer. (SPEECH) Yeah, 15 million patients. Why are we doing this? Look, I'm an academic. I write papers and grants. Some of you fund my work. Academics can only say so much. We've got the political will to do this, because we have a business reason to do this. 
(DESCRIPTION) Text U C Health, United Healthcare Form New A C O and Clinically Integrated Networks (SPEECH) What we've publicly stated is working with the UnitedHealth Group, love them or not, we're working with UnitedHealth Group to make a single accountable care organization for the state of California. The entire University of California will be one ACO in five to 10 years. This is a new world for academic medical centers. You're going to see these new kinds of mergers and acquisitions. And they're not M&As like you're used to here. We are going to partner together to actually work as one. That's why we're doing this-- to build a new ACO in five to 10 years. Let me run through what we're going to do with our common collected data here. Yeah, I still got some time. Let's pick one example of what we can do with data. And that is what the pharma industry would call real-world evidence. And some of you are in the space, so this is going to be relevant to you. I'm technically an endocrinologist, barely, at this point. But type 2 diabetes, a very interesting disorder. It's a plague in some ways in the United States, around the world, and it's a super expensive one at that. And try to think about why. So this is the American Diabetes Association standards. This is the guideline. Look, give them some credit here. This is one of the lucky diseases where we have a guideline like this. There are a lot of diseases where we don't have this. And boy, it's in pastel colors. It's made for California this kind of diagram. And how do you read this diagram? It says, well, if you've got a patient with type 2 diabetes, lose weight and exercise. Well, it's a no-brainer. Then if that fails, metformin. And if that fails, try one of these six drugs. If those fail, add the other five drugs. If that fails, just go to insulin and metformin. That's how you read this thing. So we're looking at this, and we're thinking, boy, you know, these six kind of categories, well, which one do you use? 
It says, the choice depends on patient and disease factors. In other words, we have no clue. Why is this interesting? It's because some of these boxes cost 200 times more than the other box. But they're all in the same size and kind of similar colors here. So a simple stupid thing we did was, well, what do our doctors actually do in University of California? So we started to look. So you're going to see these diagrams here. We used to call these diabetes donuts. But we realized that would be inappropriate. So now, we call these lifesavers. (DESCRIPTION) Medication strategies for first-time type 2 diabetes patients (SPEECH) And so a way to think about this is a pie chart, and pie is also inappropriate for type 2 diabetes. Think of this as a pie chart. This is the first drug that we can see that we started a type 2 diabetic on. 12,000 patients with diabetes here. A third of them started metformin. OK, good. That was on the list. A third start with insulin. And then a smaller group starts with sulfonylureas. And then you got people starting at double therapy, triple therapy. And the black means there's so many little slices there, I can't even show you all the colors. So you've got a lot of variation here. Now, the more I look at all this data-- and I've been looking at this for three years now-- the more I realize medicine is like a game. And I'm not saying this to belittle having a disease. It's rotten having a disorder. But a lot of how we practice medicine is like a game. We make a move, and then we wait to see what the patient and the disease does, and then we make our next move. It might be on morning rounds. Let's see what happens. Let's make the next move the next morning. It could be clinic encounters. Let's see what happens. Come back to the clinic. So we made this move. Patient goes home, comes back in 90 days. Here's the next move. The way to read this is, everyone here was happy on that first dose. They never had to change it again. 
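One way to picture these "moves" is to store each patient's regimen at every 90-day visit and tally what comes out. The Python sketch below is a hypothetical example with invented patients and drug names kept as plain strings. It counts first-line choices, the number of distinct move sequences, and how many patients never changed regimen. It is only a sketch of the idea, not the University of California pipeline.

```python
from collections import Counter

# Hypothetical regimens per patient, one entry per 90-day visit.
# Patient IDs and drug sequences are invented for illustration.
pathways = {
    "p1": ["metformin", "metformin", "metformin"],
    "p2": ["metformin", "metformin+sulfonylurea", "metformin+sulfonylurea"],
    "p3": ["insulin", "insulin", "insulin+metformin"],
    "p4": ["metformin", "metformin", "metformin"],
    "p5": ["sulfonylurea", "metformin", "metformin"],
}

# First move: which regimen each patient started on (the pie-chart view).
first_line = Counter(visits[0] for visits in pathways.values())

# A "strategy" is the whole ordered sequence of moves for one patient.
strategies = Counter(tuple(visits) for visits in pathways.values())

# Patients whose regimen never changed after the first move.
unchanged = sum(1 for visits in pathways.values() if len(set(visits)) == 1)

print("first-line choices:", dict(first_line))
print("distinct strategies:", len(strategies))
print("never changed regimen:", unchanged)
```

With outcome values such as hemoglobin A1C attached to each visit, the same grouping by strategy would be the starting point for comparing strategies against each other.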
Yellow means they're all on metformin, but they had to change the dose. Any other color switch means we added a drug, subtracted a drug, switched a drug. They go. They come back another 90 days. Here's the next move. They go. They come back in another 90 days. Here is the fourth move. And here are four moves out in this chess game of sorts, and you realize we have 1,600 different ways to treat type 2 diabetes at UCSF-- probably ridiculous. Can we get this down to 1,000, 100, maybe 10 right ways? Because remember now, I got hemoglobin A1Cs up and down these grids. I can do comparative effectiveness of strategies here that I might not have to even believe what the American Diabetes Association says, because I have all this data. And by the way, once you get this at UCSF, and 48 hours later, we can do this across the entire University of California. 71,000 patients have diabetes here. Just like that, we have all the data in one place. By the way, it's already done. We can do this. And then you realize, we have 6,500 different ways to do this in the UC. That's too many, probably ridiculous. And by the way, anywhere you see purple, that was 200 times the cost for not really a good reason. Now, we got five years of follow-up on 70,000 patients of diabetes. You can guess where we're going with all of these studies. We're just going to show wink wink, nudge, nudge. When the pharma industry says it's better for long-term outcomes, is it really better? Well, here's our data. Publish it or not, we're going to just practice what we see now. That's the future here. (DESCRIPTION) Graph showing how alcohol induced mental disorders and other nonorganic psychoses are related (SPEECH) But to close it out here, in the end, I'm going to build maps-- maps of death and disease in California. I love maps. I use Google Maps. Some of you use Google Maps. Maybe some of you use Waze. 
A striking few of you probably still use Apple Maps, which is still terrible, but maybe better today if Apple announces anything decent. But I use maps. Google Maps might take you to pleasant destinations. Here's a map of how you get diseases and die in my state of California. This is real California data. It's opposite of Google Maps. So here's how to read this map. This was generated from real California data. Patients show up with alcoholism. A year after that, they show up with liver disease and cirrhosis. A year after that, they have liver abscess, so you can come straight down this way. And the squares mean that you die of these conditions. So it's really, really hard to die of alcoholism, but it's easier to die of liver disease and cirrhosis. Does that make sense? That's Hannah and Jae who've done all this work. The maps get more complicated. Here, patients are [INAUDIBLE] heart attack. Everyone knows what a heart attack is. You can die right away, or you end up with heart failure in a year. Heart failure is a really big problem, where the heart is supposed to be a pump. It's too weak. It's not pumping the blood like the body needs. The fluid backs up into the lung, and then the patients die of sepsis three years later. Sepsis is a body-wide bacterial infection. It's a 50% mortality rate. The minute you get that diagnosis, you got a 50/50 chance in the United States. Now, look, I was a pediatrician. I admit, I didn't take care of many patients with a heart attack. But I always thought it was the heart that killed you after you had a heart attack. But I'm amazed three years later. It's actually the infection that's killing you. And I'm not really sure we're screening people really well for sepsis with that fever. And why it's so big is because if you don't take the northern route, you can kill your kidneys, and take the southern route and have sepsis a year earlier. So that's why it's so big. You have two ways to get there. 
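A map like this can be thought of as a directed graph whose edges count how often one diagnosis follows another in a patient's record. Here is a minimal Python sketch under that assumption. The patient histories and diagnosis labels (including "kidney failure" for the southern route) are invented for illustration and are not California data, and the real maps may well be built differently.

```python
from collections import Counter

# Hypothetical diagnosis histories, in time order, one list per patient.
histories = [
    ["heart attack", "heart failure", "sepsis", "death"],
    ["heart attack", "kidney failure", "sepsis", "death"],
    ["heart attack", "death"],
    ["alcoholism", "liver disease and cirrhosis", "death"],
    ["alcoholism", "liver disease and cirrhosis", "liver abscess"],
]


def transition_counts(sequences):
    """Count each (earlier diagnosis, next diagnosis) pair across patients."""
    edges = Counter()
    for seq in sequences:
        for earlier, later in zip(seq, seq[1:]):
            edges[(earlier, later)] += 1
    return edges


def routes_into(target, edges):
    """List the diagnoses that directly precede `target` on the map."""
    return sorted(a for (a, b) in edges if b == target)


edges = transition_counts(histories)
print(routes_into("sepsis", edges))
for (a, b), n in edges.most_common(3):
    print(f"{a} -> {b}: {n}")
```

In this toy data there are two routes into sepsis, one through heart failure and one through the kidneys, which mirrors the "two ways to get there" point above.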
So it's not always the heart that kills you. The cardiologist keeps seeing you, but maybe it's the sepsis. So I love these maps of death and disease. I love Google Maps. But there's still one thing Google Maps can't do. Google Maps cannot show you where all the cars are on the map. I know it shows you the traffic-- red, yellow, green. But there's no way for Google Maps to show you all the little itty-bitty cars on the map. But when we open our new building with a conference room, with a wall of monitors, we're not just going to put the maps up there. We're actually going to show where all the patients are on the map. This is real California data. This is a prototype. As patients are getting older, the colors are getting brighter. This is literally how Californians move from disease to disease to disease to death. (DESCRIPTION) Computer-generated graphic showing large points of light with smaller points of light emerging from them (SPEECH) A whole bunch of them are about to get sepsis and die. There they go. Keep going. And now, we can start to predict what's going to happen the next 90 days, what's going to happen the next year. What are we going to do about it? And that, to me, is going to be the new definition of an accountable care organization-- one that simply accounts for the care of all 15 million of its participants. That is what we're building at University of California with all of this real-world data. All right, so what is big data in biomedicine? Is it about algorithms, programmers, Hadoop, cloud computing, high performance computer models? Yeah, it's about all of these things. I didn't show you any of these things, right? It's about predicting diseases before they strike. Explaining rare diseases that defy experts. Finding drugs for diseases that lack attention. There are simply not enough pharma companies on the planet. We need to make sure we do the right, safe, cost-effective thing for patients. 
And all of this is an amazing platform for biomedical innovation, entrepreneurship. You've got to follow all the rules. Medical schools have rules. We've got to act honorably, above the table. You saw my transparency slide there. But when you do, it's an amazing platform. For all of this, I could sum this up with one word-- hope. The patients, the families, the docs are depending on us to come up with solutions, not just more complaints about how bad the system is. That data is sitting there. Are you going to be the one to take this data? If you don't do this, get your kids to do this. Get your cousins to do this. Get your nephews to do this. The kids today got to do this. It's really important. This is what the world needs here. I have to thank an enormous number of people. 50 people get all that data warehouse stuff together. I used to color code these by UC campuses. Now, I realize we're just one giant university just in different zip codes. I have to thank a whole bunch of collaborators at NIH, at Stanford, at UCSF. We got a lot of work done. I have to thank UCSF and Priscilla Chan, Mark Zuckerberg for the endowment support. My lab has been blessed with 20 NIH grants from these 11 institutes of NIH. The six on the left give me more money. The five on the right give me less money, but I do still love you. The governor's office gives me money to run precision medicine. March of Dimes, JDRF, HHMI-- stem cell money-- a lot of disease-specific foundations giving us money to work on their disorders. I always thank my admin and tech staff. I'd never get a grant or paper out the door without them. [INAUDIBLE] at Harvard Med school has been my friend and mentor for life. Sam and Keith [INAUDIBLE] at UCSF. And Talmadge and Mark support me now running the medical school and the medical system. I always mention my daughter here, Kimi. She's a 10th grader now. Why I mentioned her at this point, too, she inspires me, but at the same time, she teaches me well. 
Why is it that the first introduction to biology and medicine for a high-schooler is dissecting things with that smell? This is how we should be teaching biology to high school kids and not turning them off. And then when they get turned off, boy, they make another app to share pictures with each other. We do not need more apps to share pictures with each other. We need some of them making drugs in the garage. And I don't mean making meth in your garage. I mean, cancer drugs, diabetes drugs. That's what kids need to do. We got to teach them the right way. Get them inspired with the data here, not just dissecting things. And I always thank my wife, Tarangini Deshpande, a molecular biochemist. Reads every major paper, grant that comes out of my collaborative with my lab, starts companies with me now, and lets me go all over the world to give talks like this. Thank you very much. [APPLAUSE]