Projects

Medical Informatics

I3RIS: Interactive, Iterative, Integrated Radiology Image Search

Description: Advances in medical imaging technologies have generated billions of images that are digitally stored and indexed in different data repositories worldwide. Current search mechanisms and query tools used to access these images in clinical practice are text-based only and are not sophisticated enough to handle the types of queries that clinicians need. Leveraging the richness of the medical data, the long-term objective of this interdisciplinary effort between DePaul University and the University of Chicago is to provide the most useful information, the best images, and the most relevant data sources to clinicians at the point of care. Our specific goals are to design, develop, and evaluate a hybrid search engine that unlocks valuable information from onsite and online radiology data sources (in-house proprietary teaching files and publicly available online peer-reviewed teaching files, radiology journals, and imaging-related textbooks) to provide radiologists with the most relevant information needed at the time of patient care. Our central hypothesis is that a search mechanism that maps naturally from the user’s limited internal memory of observed cases to a wealth of examples available onsite and online would allow clinicians to make faster, more confident, and more accurate diagnoses by removing the innate error caused by the limits of human memory. To test the central hypothesis, we propose to 1) create a hybrid text and image distributed database by integrating radiology teaching files, textbooks, and journals, 2) extract knowledge from integrated data sources to augment medical decision making, and 3) develop a domain-specific interactive user interface with iterative query refinement.

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Daniela Raicu.

Medical Health Informatics

Description: Suicide is an alarming public health problem accounting for a considerable number of deaths each year worldwide. Many more individuals contemplate suicide. Understanding the attributes, characteristics, and exposures correlated with suicide remains an urgent and significant problem. As social networking sites have become more common, users have adopted these sites to talk about intensely personal topics, among them their thoughts about suicide. Such data has previously been evaluated by analyzing the language features of social media posts and using factors derived by domain experts to identify at-risk users. In this project, we automatically extract informal latent recurring topics of suicidal ideation found in social media posts. Our evaluation has demonstrated that we are able to automatically reproduce many of the expertly determined risk factors for suicide. Moreover, we have identified many informal latent topics related to suicidal ideation, such as concerns over health, work, self-image, and financial issues. Current projects include 1) expanding this work to other mental health issues, 2) testing additional feature extraction techniques, and 3) designing procedures to acquire a robust and reliable ground truth.

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Jonathan Gemmell.

Revolutionizing Medicine with Machine Learning

Description: Machine learning is on the cusp of revolutionizing medical diagnosis. Diagnostic applications of machine learning are rapidly transitioning from the theoretical to the real world. The transformational potential of diagnostic applications cannot be overstated, from an at-home tool for early detection to an instant “second opinion” for a complex diagnostic case. Machine learning as a diagnostic tool will generate incredible efficiencies and cost savings for patients, doctors, and hospitals, and, most importantly of all, it will save lives. In a quest to build more trustworthy Computer-Aided Diagnosis (CAD) systems for lung cancer, the CDM Medical Informatics Lab and the Imaging Institute at the University of Chicago have been collaborating for over a decade to build the next-generation CAD system with advanced imaging analytics and reasoning capabilities that can assist in the clinical decision-making process. The collaboration involves three stages of research: 1) predictive modeling for high-level diagnostic interpretation derived from low-level image data, 2) learning the human visual perception of similarity using low-level image features and expert-in-the-loop feedback, and 3) evaluating the effects of smart capabilities on traditional CAD systems and medical experts’ performance.

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Daniela Raicu.

Computer-Aided Prognosis of Age-Related Macular Degeneration

Description: The advanced form of age-related macular degeneration (AMD) is a major health burden that can lead to irreversible vision loss in the elderly population. For early preventative interventions, there is a lack of effective tools to predict the prognosis of advanced AMD because of the similar visual appearance of retinal image scans in the early stage and the variability of prognosis paths among patients. The existing prognosis models have several limitations. First, previous studies assume constant time intervals between doctor visits; however, in real-world clinical settings, the visits may happen at irregular time intervals. The assumption of constant time intervals leads to over-optimistic prediction results on specific training data sets while failing to produce generalizable results on new patient data sets. Second, current studies predict only one form of advanced AMD at a time. Third, computer-based prognosis results are typically not validated on new patients, and therefore it is difficult to evaluate the generalizability of the proposed approaches. Lastly, the models lack interpretability, and there is little explanation of how a computer-based prognosis determination has been made. The overall objective of this project is to design, develop, and evaluate AMD prognosis prediction models that can detect the most relevant images containing AMD biomarkers, manage unevenly spaced sequential optical coherence tomography (OCT) images, and predict all advanced AMD forms, in a way that supports the interpretation and explainability of computer-aided prognosis models.
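One simple way to make irregular visit spacing visible to a sequence model is to attach the elapsed time since the previous visit to each scan. The sketch below is only an illustration of that idea, not the project's method; the function name `visit_intervals` and the visit dates are hypothetical.

```python
from datetime import date


def visit_intervals(visit_dates):
    """Return the gap in days between each visit and the previous one.

    The first visit has no predecessor, so its gap is 0.
    """
    ordered = sorted(visit_dates)
    if not ordered:
        return []
    gaps = [0]
    for previous, current in zip(ordered, ordered[1:]):
        gaps.append((current - previous).days)
    return gaps


# Hypothetical visit dates for one patient (illustrative only).
visits = [date(2020, 1, 10), date(2020, 2, 9), date(2020, 6, 8)]

# Pair each visit with its elapsed-time feature so that a model can see
# how much time separates consecutive scans instead of assuming equal spacing.
features = list(zip(sorted(visits), visit_intervals(visits)))
```

A model that receives these gaps alongside the image features no longer has to assume that consecutive scans are equally spaced.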

Faculty Contact: For more information, please contact Dr. Daniela Raicu.

Recommender Systems

Recommender Systems

Description: Recommender systems assist users in navigating complex information spaces and focus their attention on the content most relevant to their needs. Often these systems rely on user activity or descriptions of the content. Social annotation systems, in which users collaboratively assign tags to items, provide another means to capture information about users and items. Each of these data sources provides unique benefits, capturing different relationships. We propose leveraging multiple sources of data: ratings data as users report their affinity toward an item, tagging data as users assign annotations to items, and item data collected from an online database. Taken together, these datasets provide the opportunity to learn rich distributed representations by exploiting recent advances in neural network architectures.
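To make the idea of combining data sources concrete, the sketch below blends two item-to-item similarities, one computed from ratings and one from tags. It is a deliberately simple stand-in (a weighted cosine similarity) rather than the neural representation learning the project proposes; the function names, the `weight` parameter, and the sample items are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_similarity(item_a, item_b, weight=0.5):
    """Blend rating-based and tag-based similarity.

    weight=1.0 uses ratings only; weight=0.0 uses tags only.
    """
    rating_sim = cosine(item_a["ratings"], item_b["ratings"])
    tag_sim = cosine(item_a["tags"], item_b["tags"])
    return weight * rating_sim + (1 - weight) * tag_sim


# Hypothetical items: ratings keyed by user, tags keyed by tag with counts.
item_a = {"ratings": {"u1": 5, "u2": 3}, "tags": {"space": 2, "classic": 1}}
item_b = {"ratings": {"u1": 4, "u2": 2}, "tags": {"space": 1}}

score = hybrid_similarity(item_a, item_b)
```

Shifting `weight` toward 0 or 1 shows how much each data source contributes to the final ranking.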

Student Involvement: Now looking for Master’s and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Jonathan Gemmell.

Computational Reproducibility

Sciunits: Tools for conducting Reproducible Science

Description: Sciunits are efficient, lightweight, self-contained packages of computational experiments that can be guaranteed to repeat or reproduce regardless of deployment issues. Sciunit answers the call for a reusable research object that containerizes and stores applications simply and efficiently, facilitates sharing and collaboration, and eases the task of executing, understanding, and building on shared work. Explore Sciunits at: http://sciunit.run

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Tanu Malik.

Transportation Analytics

Transportation Data Analytics

Description: This research area involves analyzing how people move from point A to point B. Topics include congestion and speed on transportation networks, how traffic patterns change over time, how short-term or long-term events affect these changes (construction, large social gatherings, the addition of new roads or public transport, etc.), choice of transportation mode, and the effect of large external factors such as COVID-19 on transportation. Transportation analytics can be combined with other data sources, such as demographic, social, health, economic, and employment data, to make the analysis richer and more meaningful.

Transportation analytics is an area rich in data visualization, geospatial analysis, and time series analysis. Other possibilities include deep learning applications in connected and autonomous vehicles, safe driving analysis (safety belts, distracted driving, etc.), bike transport and bike infrastructure, and road safety and vehicle crash analysis.

Project details: The project can be carried out by a single student or group of students. The process will start with deciding on a problem statement and the dataset. The data science project can be done in R or Python and can be supplemented with other data analytics tools.

Data: There are many online open data resources from cities, states, or countries. The datasets to work on will be decided together with the student.

Skills required:

• Good knowledge of Python or R

• Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning

• Experience in web data scraping is a plus

• Geospatial data analysis experience is a plus

Student Involvement: Looking for Undergraduate and Graduate students. Both Undergraduate and Master’s students can work on this project for their Capstone or Independent Study.

Faculty Contact: For more information, please contact Dr. Ilyas Ustun.

Apply: https://forms.gle/ufR9jVarHGkkCMSi6

Traffic Crash Analysis

Description: Traffic crashes have a significant impact on the economy, both in property damage and in lost time. The most vulnerable road users in traffic crashes are pedestrians and cyclists. Identifying crash-prone locations will help traffic safety, transportation planning, and law enforcement agencies prioritize their efforts and resources to minimize the risk of accidents.

Project details: The project can be carried out by a single student or a group of students. The process will start with deciding on a problem statement. The data science project can be done in R or Python and can be supplemented with other data analytics tools. Analyzing this data will provide insight into many aspects of traffic crashes. Joining with other data sources can improve the quality of the data. Several paths can be explored here, including how pedestrians or cyclists are affected in a traffic crash and the role of drug use in the severity of a traffic crash.

Data: The Michigan traffic crash data is rich and contains detailed information about each incident, including severity, time, and the involvement of pedestrians or bikes, along with other factors. There are 300K rows for each of the past 10 years.

There are many cities and states providing crash data, so analysis of those can be pursued as well if so desired by the student.
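As one possible starting point for identifying crash-prone locations, crashes can be counted per cell of a coarse latitude/longitude grid. The sketch below assumes each crash record provides a coordinate pair; the function name `crash_hotspots`, the cell size, and the sample coordinates are illustrative assumptions and are not taken from the Michigan data.

```python
import math
from collections import Counter


def crash_hotspots(crashes, cell_size=0.01, top_n=3):
    """Count crashes per grid cell and return the busiest cells.

    crashes: iterable of (latitude, longitude) pairs in decimal degrees.
    cell_size: cell width in degrees (0.01 degrees of latitude is roughly 1 km).
    Returns a list of ((row, column), count) pairs, busiest first.
    """
    counts = Counter()
    for lat, lon in crashes:
        cell = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        counts[cell] += 1
    return counts.most_common(top_n)


# Made-up coordinates: three crashes close together and one far away.
sample_crashes = [
    (42.3314, -83.0458),
    (42.3318, -83.0451),
    (42.3311, -83.0455),
    (42.7325, -84.5555),
]

busiest = crash_hotspots(sample_crashes, top_n=1)
```

A fixed grid is crude; proper geospatial clustering or road-segment matching would be a natural refinement once the problem statement is set.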

Skills required:

• Good knowledge of Python or R

• Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning

• Geospatial data analysis experience is a plus

Student Involvement: Looking for Undergraduate and Graduate students. Both Undergraduate and Master’s students can work on this project for their Capstone or Independent Study.

Faculty Contact: For more information, please contact Dr. Ilyas Ustun.

Apply: https://forms.gle/ufR9jVarHGkkCMSi6

Materials

Human-in-the-loop Image Pattern Detection

Description: New materials can provide solutions for key challenges in sustainability, e.g., in energy, new catalysts for more efficient fuel cell technology. One of the several challenges in new materials discovery is the identification of the crystalline phases of inorganic compounds based on an analysis of high-intensity X-ray patterns. Identifying these phases is equivalent to finding the crystal structures (arrangement of atoms) of new compounds, which then leads to determining their properties. Fully automated phase identification is challenging because the images generated with the X-ray instrument can be noisy and the patterns (or series of matching peaks) have to be identified across sets of multiple samples of materials. In this project, we explore a combination of visualization techniques, guided by humans, to accelerate crystalline phase identification. We will build on previous work to develop automated pattern detection and investigate opportunities to integrate expert and non-expert feedback.
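The notion of a "series of matching peaks" across samples can be illustrated with a toy example on one-dimensional intensity profiles. This is only a sketch under simplifying assumptions (noise-free lists of numbers, a fixed threshold, index-based matching) and does not represent the project's detection method; all names and values are hypothetical.

```python
def find_peaks(intensities, threshold):
    """Return indices of local maxima that rise above the threshold."""
    peaks = []
    for i in range(1, len(intensities) - 1):
        value = intensities[i]
        if (
            value > threshold
            and value > intensities[i - 1]
            and value >= intensities[i + 1]
        ):
            peaks.append(i)
    return peaks


def shared_peaks(samples, threshold, tolerance=1):
    """Return peaks of the first sample that recur in every other sample.

    Two peaks match when their positions differ by at most `tolerance`
    index steps.
    """
    peak_lists = [find_peaks(sample, threshold) for sample in samples]
    if not peak_lists:
        return []
    reference, others = peak_lists[0], peak_lists[1:]
    return [
        p
        for p in reference
        if all(any(abs(p - q) <= tolerance for q in other) for other in others)
    ]


# Toy intensity profiles: both contain a strong peak near index 2-3,
# but only the first has a second peak at index 6.
sample_a = [0, 1, 9, 1, 0, 0, 7, 0, 0]
sample_b = [0, 0, 1, 8, 1, 0, 0, 0, 0]

common = shared_peaks([sample_a, sample_b], threshold=5)
```

Real diffraction data is noisy, which is why the project pairs automated detection of this kind with human feedback.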

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Roselyne Tchoua.

Hybrid Human-Machine Information Extraction

Description: Materials informatics is an emerging field that has the potential to dramatically reduce the development time and time-to-market for new materials; computational models scan large datasets to identify candidates for new materials. Because such methods rely on access to large, machine-readable databases, the traditional text-based physical handbooks will not suffice. However, there are few examples of these scientific digital databases, and constructing new databases is a monumental and costly task requiring years of expert labor, as the data that populate these databases must often be extracted manually from free-text publications. While machine learning efforts have begun in materials science, the lack of annotated text hinders attempts to leverage approaches developed for other fields, such as biomedicine. In this project, we will build on previous work that leverages human and automated approaches to extract scientific named entities from text. We will extend this work to tackle scientific entity relation extraction. Specifically, we will explore comparable human-in-the-loop extraction approaches to continue to contribute to existing datasets of annotated materials entities and properties.

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Roselyne Tchoua.

Bioinformatics

Functional Neural Mapping for Behavior Modeling Using Big Data Computing

Description: A major goal in neuroscience research is to understand behavior at the level of neural networks. While many studies have attempted to tackle this goal, their resolution is not at the single-neuron level or their scope is not extensive enough to make a concrete connection between behavior and neural networks. Caenorhabditis elegans provides clear advantages for overcoming both of these challenges due to its simple nervous system and completely deciphered anatomical neural map. Moreover, C. elegans exhibits behaviors found in higher organisms, including food search behavior. In this interdisciplinary collaborative project between DePaul University and Rosalind Franklin University Medical School, we will use C. elegans to build functional networks of interneurons for food search behavior. We propose to perform in-depth research and develop new, powerful, and scalable image processing, indexing, and data mining methods for efficient and effective analysis-based mapping of neural networks to locomotory search behaviors. Our proposed study will work on neuron-ablated C. elegans image datasets and focus on (1) extracting representations of movement characteristics, (2) discovering and indexing behavior patterns in large sequential image data, (3) modeling search behavior similarity based on the discovered patterns, and (4) learning functional neural networks from combinations of behavioral models. The amount of data generated by this research study will be in the petabyte range, making it crucial to employ cutting-edge big data computing techniques on advanced large-scale distributed systems to make this study tractable.

Student Involvement: Now looking for Undergraduate, Master’s, and PhD students. Master’s students can work on this project for their Capstone.

Faculty Contact: For more information, please contact Dr. Daniela Raicu.

Building Ontologies

Description: Successful development of a biofilms information system requires a framework for representing and communicating information about this highly complex domain. To meet this requirement we will develop a biofilms ontology that captures the concepts used in biofilms research, the attributes of these concepts, and the relationships among them. This ontological framework will directly inform the data model that will support the information system, and its database implementation. To the extent possible, we will reuse existing ontologies in domains that overlap with biofilms research.

Student Involvement: Now looking for senior-level Undergraduate or Master’s students.

Technical Skills Required: (1) Learn to use PROTÉGÉ/WEBPROTEGE to implement the ontology, (2) Python, (3) SQL

Project Completion deadline: End of June 2020.

Faculty Contact: For more information, please contact Dr. Thiru Ramaraj.

Business Analytics

Analyzing Trade and Operational Trends of Businesses

Description: Fortune 500 companies typically have multi-national operations that require the shipping of goods across international boundaries. Such shipping is essential for global trade but has impacts on energy consumption and emissions. By analyzing these companies and their trade patterns, we are able to assess each company’s impact in these areas. We are also able to conduct scenario analyses to forecast trade and shipping decisions in the future. However, while various data sources are available regarding total domestic and international trade volumes, there is a gap in company-specific data. The goal of this project is to develop company-specific trade data and to visualize insights using the data. This is a joint project between DePaul University and Argonne National Laboratory.

Project details: In this project, 2-3 students will investigate the following research question: what are the sales volumes for each company in various regions, such as domestically and internationally? The project focuses on Fortune 500 companies in freight-intensive sectors. A first step would be mining the “10-K” reports that companies file annually with the US Securities and Exchange Commission. The collection of reports and starter pre-processing code in Python will be made available. The selected student(s) will develop methods to scrape the available information from the reports. Company websites or industry journals may also be scraped (the student(s) would develop this process). All datasets are public. The resulting data on company trade statistics will be applied in Argonne’s agent-based freight model to study energy consumption and emissions due to goods movement. The project will provide value by improving the model’s forecast of trade patterns for multi-national companies.

Data: The dataset is to be developed using publicly available text reports. The student(s) will develop text mining routines to extract the data. About 200 reports have already been downloaded and have had some pre-processing. These will be transmitted via Box to the center. Additional reports from online sources may be scraped to obtain additional information.

Skills required:

  • Good knowledge of Python or R
  • Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning
  • Experience in web data scraping is a plus

Student Involvement: Looking for Undergraduate and Graduate students. Both Undergraduate and Master’s students can work on this project for their Capstone or Independent Study.

Faculty Contact: For more information, please contact Dr. Ilyas Ustun.

Apply: https://forms.gle/ufR9jVarHGkkCMSi6

Emerging Trends in Food Delivery

Description: COVID has dramatically impacted how people shop. There is more e-commerce activity now than ever before. The goal of this work is to develop a quantitative understanding of how COVID has impacted last-mile delivery in urban areas: how much more activity is there, and what traffic patterns result from this activity in different areas of the city? The work should focus on Chicago, and it should focus on changes in either (1) delivery services from large parcel carriers, including UPS, FedEx, and Amazon; or (2) delivery service by contractors, that is, people like you and me who pick up groceries, food, or other items and deliver these goods to homes (e.g., GrubHub, Instacart, and others). Option (2) is preferred. This is a joint project between DePaul University and Argonne National Laboratory.

Project details: In this project, 2-3 students will investigate the following research question: How much daily traffic (say, on an average weekday in June) is due to delivery service by independent contractors and couriers (Instacart, etc.)? We want to find out how much the drivers buy, how many deliveries they make, and how long their trips are. The resulting data will be applied in Argonne’s agent-based freight model to study energy consumption and emissions due to last-mile goods delivery. The data will provide value by improving the model’s forecast of e-commerce and its impacts on urban freight.

Data: The dataset is to be developed, and as such, this is a good project for creative, resourceful students who have an interest in the topic. Successful students will gain a working knowledge of how to research an environment that often has many unknowns. The student will need to find relevant data by searching the internet and sales reports, contacting companies, working with professors, and using creativity. We are working with the business school to determine whether contacts are available. Other ideas for finding data include writing a short anonymous survey and disseminating it through social media, checking social media platforms for insights, or finding and leveraging sources that are similar to the “Rideshare Guy” (for Uber/Lyft drivers).

Skills required:

  • Good knowledge of Python or R
  • Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning
  • Experience in web data scraping is a plus

Student Involvement: Looking for Undergraduate and Graduate students. Both Undergraduate and Master’s students can work on this project for their Capstone or Independent Study.

Faculty Contact: For more information, please contact Dr. Ilyas Ustun.

Apply: https://forms.gle/ufR9jVarHGkkCMSi6

Changes in Trucking Fleets Over Time

Description: The Federal Motor Carrier Safety Administration (FMCSA) is the agency of the Department of Transportation (DOT) responsible for monitoring and developing safety standards for commercial motor vehicles operating in interstate commerce. The Motor Carrier Management Information System (MCMIS) is a computerized system whereby the FMCSA maintains a comprehensive record of the safety performance of the motor carriers (truck and bus) and hazardous materials shippers who are subject to the Federal Motor Carrier Safety Regulations (FMCSR) or Hazardous Materials Regulations (HMR). This is a joint project between DePaul University and Argonne National Laboratory.

Project details: In this project, 2-3 students will (1) develop an interactive dashboard with company fleet data, and (2) analyze the changes in fleets for all companies over time. Using the MCMIS data, the dashboard should show historical data as well as the most recent data downloaded from the online MCMIS Census File. The student(s) should write a script to download the data automatically from that source and update the dashboard or database. The student(s) should also conduct statistical analysis, such as the growth of companies over time, using the number of drivers and powered units from the data. As a starting point, there is a simple example of an RShiny dashboard that shows the data in tabular format, with the source code provided. Data analysis possibilities include (a) time series analysis and (b) merging MCMIS data with public economic data and conducting regression for more insights, among others. The project will provide value by allowing energy researchers to see trends in fleet ownership over time.
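The growth analysis mentioned above could begin with a year-over-year calculation on a carrier's fleet counts. The sketch below is only an illustration; the function name `year_over_year_growth` and the counts are hypothetical and do not reflect the actual layout of the MCMIS data.

```python
def year_over_year_growth(counts_by_year):
    """Return {year: fractional change from the previous recorded year}.

    counts_by_year maps a year to a count, such as the number of drivers
    or powered units reported by one carrier.
    """
    years = sorted(counts_by_year)
    growth = {}
    for previous, current in zip(years, years[1:]):
        base = counts_by_year[previous]
        if base == 0:
            continue  # growth is undefined when the earlier count is zero
        growth[current] = (counts_by_year[current] - base) / base
    return growth


# Hypothetical powered-unit counts for one carrier (illustrative only).
unit_counts = {2018: 100, 2019: 110, 2020: 99}

growth = year_over_year_growth(unit_counts)
```

The same function can be applied per carrier to the driver counts, and the resulting series can feed the time series or regression analyses listed among the possibilities.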

Data: The online MCMIS Census File contains records for a steadily growing number of active entities, i.e., motor carriers, hazardous materials shippers, entities that are a carrier and a shipper, or registrants (entities who register vehicles but are not carriers). To identify each entity, MCMIS assigns a unique number to each entity record. This number is referred to as the record census number. This is also the number supplied to an entity as the USDOT number.

Skills required:

• Good knowledge of Python or R

• Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning

• Experience in web data scraping is a plus

• Experience in dashboard creation using Python or R is a plus.

• The preferred analytic tools are R and RShiny.

Student Involvement: Looking for Undergraduate and Graduate students. Both Undergraduate and Master’s students can work on this project for their Capstone or Independent Study.

Faculty Contact: For more information, please contact Dr. Ilyas Ustun.

Apply: https://forms.gle/ufR9jVarHGkkCMSi6

Image Analysis and Classification

Species Classification in Images from Motion Triggered Cameras

Description: This project focuses on developing models that can detect animals in images and identify the species. Since 2010, Lincoln Park Zoo researchers have set up motion-triggered cameras four times per year (spring, summer, fall, and winter) for 28 consecutive days at roughly 100 sites (city parks, cemeteries, golf courses, and forest preserves). These sites fall along a gradient of urbanization and range from downtown Chicago to about 23 miles outside of the city. The researchers use these data to understand where species prefer to live in Chicago and to determine how urban land cover influences the species’ choices. However, the rate at which the images are collected far surpasses the rate at which they are labeled, and to date, there is a huge backlog of unlabeled images. Thus, automated machine learning and deep learning models are needed to minimize the need for a human to label the images.

Data: The entire dataset contains roughly 3.25 million images (labeled + unlabeled). Image detection and classification are some of the main tasks to be performed.

Project details: The project can be carried out by a single student or group of students. The process will start with a significant literature survey. The project can be done in R or Python and can be supplemented with other data analytics tools.

Challenges: Because these images are obtained through motion-triggered cameras, there will be many false positives in which no animal is present at all. Many of these will contain a human. As a matter of fact, the cameras end up sampling people just as often as (if not more often than) the species they set out to study! Thus, a significant part of the project will be identifying the presence of a human or an animal in the images. Many of the images might contain nothing at all, because rain, snow, or wind might have triggered the camera to take a picture. There can be other artifacts as well, such as image quality problems. Significant data preparation and cleaning are part of the project. By participating in this project, the student will learn how to perform in-depth data analysis and apply different machine learning and deep learning models to a real-life data set.

Skills required:

• Experience in creating a data science project from start to end including data cleaning, exploratory data analysis, statistical analysis, data mining, and machine learning using Python or R

• Significant knowledge in Neural Nets, Deep Learning, and Convolutional Neural Networks

• Experience in creating a deep learning project

Student Involvement: Looking for Undergraduate and Graduate students. Both Undergraduate and Master’s students can work on this project for their Capstone or Independent Study.

Faculty Contact: For more information, please contact Dr. Ilyas Ustun.

Apply: https://forms.gle/ufR9jVarHGkkCMSi6

Interested?

If you are interested in working on any of these projects as part of your capstone project or independent study, please contact Dr. Daniela Raicu, Dr. Raffaella Settimi, or the faculty listed for the project.