DATA SCIENCE SEMINAR SERIES
The Data Science Seminar Series, organized by the DePaul Center for Data Science at DePaul University, is an ongoing series of talks and presentations on various aspects of data science. The series is designed to give students an understanding of the principles and practices of data science, as well as the skills required to apply these principles in practical settings.
The series features speakers from various industries, including education, healthcare, and technology, who share their expertise and insights on topics such as data visualization, machine learning, and big data analytics. The seminars are interactive and engaging, providing students with opportunities to ask questions and engage in discussions with the speakers and their peers.
Through this seminar series, students gain exposure to the latest trends, tools, and techniques in the field of data science, preparing them for successful careers in this rapidly growing industry. The seminar series is open to all students, regardless of their level of experience with data science, and provides a valuable opportunity for networking and building connections with other students and professionals in the field.
List of Events:
In this talk, the speaker describes several of the projects developed at TEAPOT (The Educational and Professional Online Training) Lab on a diverse range of topics: psychology research methods, the Lisp programming language, and deep understanding of argumentative texts. He also describes some of the Natural Language Processing techniques that support these systems and other systems that could be developed in the future. Because testing the performance of these systems is important, and because this is a data science talk, he also describes the data used to develop these systems and to test how well they help learners learn.
When: Thursday, Feb 9, 2023, at 4 pm CST
Speaker:
Peter Hastings has been at DePaul since 2001. For his PhD dissertation at the University of Michigan, he studied how people could learn the meanings of words from context. Afterwards, he was a postdoc at the University of Memphis, working on interactive educational systems: one for helping 5th graders write stories, and another for college students learning the basics of computing. Before coming to DePaul, he taught Cognitive Science and Artificial Intelligence for two years at the University of Edinburgh. He teaches courses in Human-Computer Interaction, Artificial Intelligence, and, sometimes, Computational Neuroscience.
This talk covers a number of different applications of machine learning as applied to medical informatics. It also introduces the problem of diagnosing chronic fatigue syndrome from survey data, lung nodule malignancy from radiologist-annotated computed tomography imaging, traumatic brain injury from magnetic resonance imaging and, separately, from free text data, and, finally, breast cancer malignancy from pathology slides. The talk discusses commonalities and dissimilarities, and tailoring a machine learning strategy for each of the different tasks.
When: Thursday, March 13, 2023, at 4 pm CST
Speaker:
Jacob Furst is a Professor in the Jarvis College of Computing and Digital Media (JCDM) at DePaul University. His research interests are in medical informatics, with applications of machine learning and data mining to medical image processing and computer vision. Dr. Furst earned his PhD in computer science from UNC Chapel Hill; he also has a master’s degree in education and a bachelor’s degree in English literature.
Scientific publications constitute an extremely valuable repository of knowledge and collection of facts crucial to the advancement of science and development of applications, which grows as researchers learn from previous works and scientists use results in the literature to design and create. With the exponential growth of available publications, reading and extracting this wealth of information has become impractical for humans. Despite great progress in natural language processing, machine-learned solutions require large amounts of carefully annotated data for good performance. This is especially true in the context of accurately labeling and extracting complex scientific data. Towards our ultimate goal of extracting scientific facts from the literature, we first aim to identify blobs of text that contain all of the facts in a publication to be later automatically extracted or scrutinized by experts. Our previous work identified some facts missed by experts yet missed others due to the assumption that the target relation—here, a polymer and its glass transition temperature—would be contained within the same sentence. We set out to enhance our approximate labeling system to look back and ahead for missing information and successfully achieved 100% recall of scientific facts while reducing the full-text publication to 6% of its original size. Moreover, we assign confidence scores to sentences to further assist expert curators in identifying important sentences and facts locked in unstructured text.
When: Thursday, April , 2023, at 4 pm CST.
Speaker:
Roselyne Tchoua is an Assistant Professor in the School of Computing, DePaul University. Her interests have always gravitated around making seemingly inaccessible technology or unmanageable amounts of data more reachable. She joined the DePaul Center for Data Science to continue working in the fascinating space between data science and other science fields (e.g., medicine), extracting insight from data using machine learning and natural language processing techniques. She received her PhD in computer science from the University of Chicago, focusing on Hybrid Human Machine Scientific Information Extraction and working with materials science data specifically. During her graduate studies, she collaborated with scientists at the UChicago Institute of Molecular Engineering and the National Institute of Standards and Technology (NIST) to extract polymer names and properties from the literature. Before going to the University of Chicago, she was a scientist at Oak Ridge National Laboratory.
Working with data has become a critical skill for today’s workforce as the cohort of businesses and organizations looking to cash in on the promise of big data grows. Despite their efforts, CEOs report failing to fulfill this promise. One reason is the unfair expectation that someone with a lifetime of expertise in another subject should quickly be able to master data science, a field that requires a lifetime to master in its own right. Instead, the tooling should come to the analysts where they already are. This goal means different things to different domains, but there are common threads. Careful design and implementation can leverage the best of human and machine, each performing its role. Human decision makers remain in control of interfaces curated to support their insight workflow, while behind-the-scenes machine learning augments human efforts. In this talk, the speaker discusses his work on these interactive machine learning (IML) systems, including work with domain collaborators from diverse fields such as biotechnology, medical informatics, and journalism. In addition, he explains his recent drive to bring this technology to a suite of common data science problems.
When: Thursday, April 27, 2023, at 4 pm CST.
Speaker:
Eli T. Brown is an Associate Professor in the College of Computing and Digital Media (CDM) at DePaul University. He earned his B.A. in Computer Science and Math from Cornell University, and his Ph.D. and M.S. in Computer Science from Tufts University. His teaching is focused in the Data Science Program, where he teaches data visualization and machine learning. His research revolves around integrating the two disciplines for more effective data analytics. He directs the Laboratory for Interactive Human-Computer Analytics (LIHCA; lihca.io), which develops new interactive machine learning technology to bring the power of machine learning to a wider audience with diverse problems to solve, including through collaborators in fields such as biomedicine, biotechnology, and journalism.