Health Informatics
Description: Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. Many more individuals contemplate suicide. Understanding the attributes, characteristics, and exposures correlated with suicide remains an urgent and significant problem. As social networking sites have become more common, users have adopted these sites to talk about intensely personal topics, among them their thoughts about suicide. Such data has previously been evaluated by analyzing the language features of social media posts and using factors derived by domain experts to identify at-risk users. In this project, we automatically extract informal, latent, recurring topics of suicidal ideation found in social media posts. Our evaluation has demonstrated that we are able to automatically reproduce many of the expertly determined risk factors for suicide. Moreover, we have identified many informal latent topics related to suicidal ideation, such as concerns over health, work, self-image, and financial issues. Current projects include 1) expanding this work to other mental health issues, 2) testing additional feature extraction techniques, and 3) designing procedures to acquire a robust and reliable ground truth.
Faculty Contact: For more information, please contact Dr. Jonathan Gemmell
Machine Learning
Established in 2000, the Sinai Urban Health Institute (SUHI) is the research arm of Sinai Chicago. SUHI is a nationally recognized community research center that works in partnership with community members and organizations to identify and address health inequities in some of the most underserved communities in the city. SUHI was an early adopter of the Community Health Worker (CHW) model. CHWs act as a liaison between patients and the hospital and help address inequities by attending to patients’ needs and connecting them with resources. Patients are referred to CHWs and asked to fill out a Social Determinants of Health (SDoH) survey. Dr. Tchoua is working with SUHI data to provide data-driven evidence of the positive impact of CHWs on the Emergency Department (ED) 30-day readmission rate, an important indicator of patient health and an important metric of hospital quality of care. This study also aims to highlight important aspects of the program and make recommendations that can improve the CHW program. The ultimate goal of the project is to promote the use of SDoH data in more Sinai clinics and other hospitals.
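As a rough illustration of the readmission metric mentioned above, the sketch below computes a 30-day readmission rate from a list of visits. The function name, the record layout, and the rule that every visit counts as an index visit are assumptions made for this example, and the data is synthetic; none of it reflects SUHI data or the study's actual definition.

```python
from datetime import date


def readmission_rate_30d(visits):
    """Fraction of visits followed by another visit by the same patient
    within 30 days.

    visits: list of (patient_id, visit_date) tuples (hypothetical layout).
    Assumption: every visit is treated as an index visit.
    """
    by_patient = {}
    for patient_id, visit_date in visits:
        by_patient.setdefault(patient_id, []).append(visit_date)

    index_visits = 0
    readmitted = 0
    for dates in by_patient.values():
        dates.sort()
        for i, current in enumerate(dates):
            index_visits += 1
            # A readmission is the next visit occurring within 30 days.
            if i + 1 < len(dates) and (dates[i + 1] - current).days <= 30:
                readmitted += 1
    return readmitted / index_visits if index_visits else 0.0


# Synthetic example data, not real patient records.
group_one = [
    ("a", date(2023, 1, 1)),
    ("a", date(2023, 3, 15)),
    ("b", date(2023, 2, 1)),
]
group_two = [
    ("c", date(2023, 1, 1)),
    ("c", date(2023, 1, 20)),
    ("d", date(2023, 2, 1)),
    ("d", date(2023, 4, 1)),
]

rate_one = readmission_rate_30d(group_one)
rate_two = readmission_rate_30d(group_two)
```

Computing the same rate for patients with and without CHW support would be one simple starting point for a comparison, although a real analysis would also need to account for differences between the two groups.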
For more information please contact Dr. Roselyne Tchoua
Leveraging a collaboration with Rush University Hospital, this project will look at syndemics, i.e., subgroups of diseases that occur together in patients and together contribute to excess burden of disease in a certain population. The problem of identifying syndemics can be formulated as a complex application of clustering. Syndemics, or synergistic epidemics, are the aggregation of two or more concurrent or sequential epidemics or disease clusters in a population with biological interactions, which exacerbate the prognosis and burden of disease. For example, the SAVA syndemic comprises substance abuse, violence, and AIDS, three conditions that disproportionately afflict those living in poverty in US cities. The problem of syndemics can be placed under the larger umbrella of personalized medicine in the sense that identifying subgroups of patients that are similar – that have a similar set of characteristics and/or diagnoses – enables physicians to fine-tune treatment for individual patients. Instead of prescribing treatments for the “average patient”, physicians can use the context from clusters closely related to the patient to prescribe personalized treatment.
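To make the clustering formulation concrete, here is a minimal sketch that groups conditions by how often they co-occur in the same patients. Jaccard similarity with threshold-based single linkage is only a stand-in chosen for illustration, not the project's method; the function names, the threshold, and the patient records are all hypothetical.

```python
from itertools import combinations


def condition_jaccard(patients):
    """Jaccard similarity between every pair of conditions.

    patients: list of sets of condition labels, one set per patient.
    Returns {(a, b): similarity} with a < b alphabetically.
    """
    carriers = {}
    for idx, conditions in enumerate(patients):
        for condition in conditions:
            carriers.setdefault(condition, set()).add(idx)

    scores = {}
    for a, b in combinations(sorted(carriers), 2):
        shared = len(carriers[a] & carriers[b])
        either = len(carriers[a] | carriers[b])
        scores[(a, b)] = shared / either
    return scores


def group_conditions(scores, threshold):
    """Merge conditions linked by any pair with similarity >= threshold."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        root_a, root_b = find(a), find(b)
        if score >= threshold and root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for condition in parent:
        groups.setdefault(find(condition), set()).add(condition)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Synthetic records for illustration only.
patients = [
    {"substance_abuse", "violence", "aids"},
    {"substance_abuse", "violence"},
    {"violence", "aids", "substance_abuse"},
    {"asthma"},
    {"asthma", "diabetes"},
    {"diabetes"},
]

scores = condition_jaccard(patients)
groups = group_conditions(scores, threshold=0.5)
```

On this toy data the three SAVA conditions end up in one group while asthma and diabetes stay separate. Co-occurrence alone does not establish the interaction between conditions that defines a syndemic, so a grouping like this would only be a first screening step.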
For more information please contact Dr. Roselyne Tchoua
ASPirin in Reducing Events in the Elderly (ASPREE) is a landmark research project to advance better health outcomes for older adults. ASPREE comprises an aspirin trial and a follow-up health and aging study. Results for the “average” patient were that aspirin had no significant impact on disability-free longevity in older adults. The effects on minority health, however, are unclear due to the limited representation of minorities in the clinical trial. The goal of personalized medicine is to tailor disease prevention, diagnosis, and treatment to each individual, while considering their genes, environment, and lifestyle. This emerging field recognizes the limitations of the one-size-fits-all approach to treating the “average patient”, which ignores the numerous, sometimes subtle differences between patients. Our team at DePaul is using the ASPREE data and exploring machine learning techniques to study the heterogeneity of treatment effects, i.e., potentially different health outcomes across groups of patients, specifically underrepresented groups.
For more information please contact Dr. Roselyne Tchoua
Materials informatics is an emerging field that has the potential to dramatically reduce the development time and time-to-market for new materials; computational models scan large datasets to identify candidates for new materials. As such methods rely on access to large, machine-readable databases, the traditional text-based physical handbooks will not suffice. However, there are few examples of these scientific digital databases, and constructing new databases is a monumental and costly task requiring years of expert labor, as the data that populate these databases must often be extracted manually from free-text publications. While machine learning efforts have begun in materials science, the lack of annotated text hinders attempts to leverage approaches developed for other fields, such as biomedicine. In this project, we will build on previous work that leverages human and automated approaches to extract scientific named entities from text. We will enhance this work to tackle scientific entity relation extraction. Specifically, we will explore comparable human-in-the-loop extraction approaches to continue to contribute to existing datasets of annotated materials entities and properties.
For more information please contact Dr. Roselyne Tchoua
Medical Informatics
Accurate diagnosis of lung lesions in computed tomography (CT) depends on many factors, including the radiologists’ ability to detect and correctly interpret these lesions. Computer-aided diagnosis (CAD) systems can be used to measurably increase the accuracy of radiologists in this task. Various CAD systems have been developed over the years for the detection and classification of pulmonary nodules. Most of these systems mimic domain knowledge in order to extract image content and use a comparison with ground truth for evaluation. However, these systems work in an algorithmic fashion that is only tenuously related to human perception and characterization of image features. In the image retrieval community, this is known as the semantic gap problem – the lack of coincidence between the quantitative information that may be extracted computationally from the image data and the visual interpretation of this data by human observers. Students working on these projects will 1) establish the link between computer-based image features of lung nodules in CT scans and visual descriptors defined by human experts in the Lung Image Database Consortium (LIDC) terminology and 2) integrate these links into content-based lung nodule image retrieval (CBIR) systems.
PI: Dr. Daniela Raicu
Chronic fatigue syndrome is a prevalent disease with little known about its probable cause. Some research has shown a link to the Epstein-Barr virus, implicated in mononucleosis. In this project, we will perform complex statistical analysis of a data set gathered by the DePaul Psychology department: a data set of blood proteins measured in students at three stages (as healthy volunteers, after developing mono, and at a six-month follow-up). We will analyze individual proteins in an attempt to predict whether individuals are likely to develop chronic fatigue syndrome after getting mono, and also use correlational analysis to determine whether there are patterns of protein co-activation that distinguish healthy controls from individuals with chronic fatigue syndrome. A truly multi-disciplinary project, this work will also include regular meetings with the research group at the Psychology department.
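One simple form the correlational analysis could take is sketched below: compute pairwise Pearson correlations between proteins within each group, then look at which pairs differ most between groups. The protein names, values, and function names are invented for illustration; this is not the project's data or its actual analysis plan.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pairwise_correlations(samples):
    """samples: {protein_name: [value per individual]} (hypothetical layout).

    Returns {(p, q): correlation} for each protein pair with p < q.
    """
    names = sorted(samples)
    return {
        (p, q): pearson(samples[p], samples[q])
        for i, p in enumerate(names)
        for q in names[i + 1:]
    }


# Synthetic measurements for illustration only.
controls = {
    "protein_a": [1, 2, 3, 4],
    "protein_b": [2, 4, 6, 8],
    "protein_c": [4, 3, 2, 1],
}
cases = {
    "protein_a": [1, 2, 3, 4],
    "protein_b": [8, 6, 4, 2],
    "protein_c": [4, 3, 2, 1],
}

control_corr = pairwise_correlations(controls)
case_corr = pairwise_correlations(cases)

# Pairs whose co-activation differs most between the two groups.
differences = {pair: case_corr[pair] - control_corr[pair] for pair in control_corr}
most_changed = max(differences, key=lambda pair: abs(differences[pair]))
```

With many proteins and few individuals, differences like these can easily arise by chance, so any real comparison would need appropriate significance testing and correction for multiple comparisons.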
PI: Dr. Jacob Furst
Traumatic brain injury (TBI) is well known to be related to intimate partner violence (IPV). However, the exact relationship is hard to quantify given the stigma and shame associated with IPV and given the difficulty in assessing TBI. This project aims to better understand this relationship by investigating the intake reports on patients entering the emergency department because of IPV and by investigating any subsequent reports to assess TBI. Further, we are interested in understanding how the Covid-19 lockdown further affected the complex relationship between IPV and TBI.
Bioinformatics
Biofilms refer to microbial life on surfaces. Microorganisms attach to surfaces and develop biofilms. Scientific imaging has proved to be an accepted research technique for investigating, analyzing, and understanding biofilm systems. Due to the complexity and diversity of biofilms, as well as the surrounding habitat, different types of data formats exist to assess biofilm structure and composition. One of the most common ways to resolve structural aspects of biofilms as well as structure-function relationships is laser-based two- and three-dimensional imaging. Our goal is to develop tools that will rapidly identify biofilm regions of interest in images from these microscopes, and machine learning techniques to gather, for human interpretation, information, objects, and key features that are difficult to recognize in biofilm-associated images. We will also focus on developing applications that will enable managing large volumes of biofilm-specific images.
PI: Dr. Thiru Ramaraj
Connecting genomic regions to phenotypes is critical in many biological fields, from medicine to conservation to agriculture and beyond. But it requires large numbers of genomes and associated phenotype data in order to capture diversity and provide enough samples for testing and training, making pangenomics difficult to scale in eukaryotic organisms with their large, complex genomes. This is complicated by heterogeneity resulting from assemblies of differing quality, which affects pangenomic graphs. The best strategies for working with heterogeneous datasets and quantifying any resulting uncertainty in phenotype prediction have not been well studied in pangenomics. Genomes have the added issue of being related by descent, and this evolutionary relatedness can lead to issues such as false-positive connections between genomic regions and phenotypes.
PI: Dr. Thiru Ramaraj
Most pangenomic graphs are created within single species. Expanding across evolutionary distance in order to capture variation contained in a larger clade is difficult because the nucleotide divergence levels make the number of paths in the graph greatly expand, and many regions that are functionally equivalent between genomes don’t have enough sequence conservation to be recognized. However, being able to recognize and access genetic diversity from more distantly related organisms is valuable because important traits that don’t exist in the species of interest, such as disease resistance, can be identified and brought in from these relatives or, once recognized, can be edited into the genome of the species of interest directly.
PI: Dr. Thiru Ramaraj
Neurons in the brainstem medullary reticular formation govern vital motor behaviors, such as breathing, vocalization, swallowing, and chewing. Critical for understanding these neural circuits is identification and localization of reticular subpopulations controlling these myriad functions. A major obstacle to identifying these nuclei is the lack of clear cytoarchitectonic boundaries and molecular markers within the reticular formation delineating functional nuclei. This project will investigate the three‐dimensional proximity and similarities in neuronal gene expression profiles to determine a mapping for motor behaviors in the brainstem.
PI: Dr. Thiru Ramaraj
Recommender Systems
Description: Recommender systems assist users in navigating complex information spaces and focus their attention on the content most relevant to their needs. Often these systems rely on user activity or descriptions of the content. Social annotation systems, in which users collaboratively assign tags to items, provide another means to capture information about users and items. Each of these data sources provides unique benefits, capturing different relationships. We propose leveraging multiple sources of data: ratings data as users report their affinity toward an item, tagging data as users assign annotations to items, and item data collected from an online database. Taken together, these datasets provide the opportunity to learn rich distributed representations by exploiting recent advances in neural network architectures.
For more information, please contact Dr. Jonathan Gemmell.
Computational Reproducibility
Description: Sciunits are efficient, lightweight, self-contained packages of computational experiments that can be guaranteed to repeat or reproduce regardless of deployment issues. Sciunit answers the call for a reusable research object that containerizes and stores applications simply and efficiently, facilitates sharing and collaboration, and eases the task of executing, understanding, and building on shared work. Explore Sciunits at: http://sciunit.run
Faculty Contact: For more information, please contact Dr. Tanu Malik.
Transportation Analytics
Description: This research area includes analyzing how people move from point A to point B. This can include congestion and speed on transportation networks, how traffic patterns change over time, how short-term or long-term events affect these changes (construction, large social gatherings, the addition of new roads or public transport, etc.), choice of transportation mode, and the impact of large external factors such as Covid-19 on transportation. Transportation analytics can be combined with other data sources such as demographic, social, health, economic, and employment data to make the analysis richer and more meaningful.
Transportation analytics is an area rich in data visualization, geospatial analysis, and time series analysis. Other possibilities include deep learning applications in connected and autonomous vehicles, safe driving analysis (safety belt use, distracted driving, etc.), bike transport and bike infrastructure, and road safety and vehicle crash analysis.
Project details: The project can be carried out by a single student or group of students. The process will start with deciding on a problem statement and the dataset. The data science project can be done in R or Python and can be supplemented with other data analytics tools.
Data: There are many online open data resources from cities, states, or countries. The datasets to work on will be decided together with the student.
Skills required:
• Good knowledge of Python or R
• Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning
• Experience in web data scraping is a plus
• Geospatial data analysis experience is a plus
Faculty Contact: For more information, please contact Dr. Ilyas Ustun
Opportunities for Student Involvement: Undergraduate and graduate students are welcome to participate in these projects. Both undergraduate and master’s students have the chance to engage in the projects for their capstone, independent study, or to gain valuable experience. For specific project details and availability, prospective students should initiate discussions with the faculty members associated with each project.
If the contact cannot be found, please reach out to Dr. Ilyas Ustun at iustun@depaul.edu