
R_www.technology.org 2015 13740.txt.txt

# Raising Computers to Be Good Scientists

Making sense of the new scientific data published every year, including well over a million cancer-related journal articles, is a tall order for the contemporary scientist. Even if a scientist were capable of reading every article and memorizing its content, drawing connections to answer real-world questions would require supernatural cognition. Figuring out how to actually read hundreds of thousands of scientific papers and apply their findings to real challenges, such as the treatment of cancer patients, is an arduous, uphill battle.

But an associate professor in the University of Arizona School of Information, Clayton Morrison, is doing just that, one algorithm at a time. He wonders, as many others in his field do, if the solutions to big problems are already there, in extant data, but no one has been able to put it all together yet. Morrison, as the co-principal investigator, and a team of collaborators are using a research grant of more than $3.6 million to investigate.

Funded by the Defense Advanced Research Projects Agency, REACH: Reading and Assembling Contextual and Holistic Mechanisms From Text will create a computer system that reads papers, extracts information on biochemical pathways, and plugs it all into large-scale, interactive models. REACH researchers are laying the foundation for interactive software that would allow drug developers, or maybe even doctors, to provide lots of information, such as a patient's genome. In turn, it could model how a specific treatment would interact with the patient.

"They'll be the Microsofts and Googles of biomedicine," Morrison said. Its potential has mass appeal and big implications: fast, individualized and precise biomedical care.

"The REACH project is applied to cancer biology, but we have an even bigger vision than that, although cancer biology is big enough," Morrison said.

If big data is a two-part challenge, Morrison said, then storing it and moving it around is the first part. The second part is understanding it. REACH works on the understanding part in three phases: extraction, assembly and inference.

Extraction was put to the test this summer. Over the course of a year, researchers led by Mihai Surdeanu, associate professor in the School of Information and REACH principal investigator, trained a computer system to read papers using hundreds of algorithms. One, for example, allows it to understand that "mouse," "mice" and "Mus musculus" all refer to the same thing. Others on the UA research team include Ryan Gutenkunst, assistant professor of molecular and cellular biology; Guang Yao, assistant professor of molecular and cellular biology; and Kobus Barnard, professor of computer science.

Morrison, who also has a strong academic background in developmental psychology, said, "I think that collaborative computers are going to be like children, and we'll have to raise them, in a way. They'll be as smart as we're able to teach them, and we need them to be able to communicate with us."

In the recent evaluation of this first phase of REACH, the system was able to process 1,000 papers on RAS-related cancers in a matter of hours, yielding results that exceeded those of state-of-the-art predecessors, all by relying on algorithms. Asking a human scientist to do the same would be outrageous.

Focusing their efforts on modeling how RAS functions in cancer cells was an easy choice, for a couple of reasons. First, RAS proteins control the chemical pathways responsible for growth, migration and survival within a cell. Basically, they've got a big job. Secondly, RAS oncogenes are mutated in 33 percent of all human cancers, making them one of the most highly researched classes of oncogenes. And when you need thousands of papers on one subject, highly researched is important.

Now that the REACH system knows how to read, it needs context. Morrison is currently building that in, by teaching it to differentiate between species (a yeast cell is different from a mouse). As of now, REACH is already familiar with 30 different species affected by RAS-related cancers. It also will need to understand differences among cell types, organs and tissue types. This is all part of the project's assembly phase.

By the end of the four-year project, REACH should be able to make inferences. In other words, it will hypothesize much as a scientist or a doctor might.

"I would like to see this usher in computers understanding complex things at a level that we just can't," Morrison said. "It's awesome. I can't tell you how excited and passionate I get that I'm able to take things I've developed and apply them to something that could potentially, directly improve people's lives."
