June 3rd

9:00

Video: Opening Address: Bias, Equity, and Anti-racism in Data Science
Emily Hadley – Data Scientist – RTI International
As statisticians and data scientists, we collect data, store data, transform data, visualize data, and ultimately impact how data are used. With this responsibility, it is imperative that we confront the ways in which data and algorithms have been used to perpetuate racism, and that we eliminate racist decisions and algorithms in our own work. Join a conversation where we discuss the landscape of bias, equity, and anti-racism in data science and machine learning. Consider questions to ask throughout the data lifecycle and walk away with anti-racist tools to incorporate into data science and data-driven decision-making.

10:00

Video: Beyond Bias Audits: Bringing Equity to the Entire Machine Learning Pipeline
Slides
Irene Chen – PhD Student – MIT
Machine learning has demonstrated the potential to fundamentally improve healthcare because of its ability to find latent patterns in large observational datasets and scale insights rapidly. However, the use of ML in healthcare also raises numerous ethical concerns, often analyzed through bias audits. How can we address algorithmic inequities once bias has been detected? In this talk, we consider the pipeline for ethical machine learning in health and focus on two case studies. First, cost-based metrics of discrimination in supervised learning can be decomposed into bias, variance, and noise terms, with actionable steps for estimating and reducing each term. Second, deep generative models can address left-censorship arising from unequal access to care in disease phenotyping. The talk will conclude with a discussion of directions for further research along the entire model development pipeline, including problem selection and data collection.
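As a rough illustration of the first case study's idea (not the speaker's code), the bias and variance components of a model's error can be estimated per demographic group by retraining on bootstrap resamples of the training set; the noise term additionally requires repeated labels, so it is omitted here. A minimal Python sketch with hypothetical names, assuming binary 0/1 labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils import resample

def group_error_decomposition(X_train, y_train, X_test, y_test, groups_test, n_boot=50):
    """Rough per-group split of zero-one error into bias and variance terms."""
    preds = np.zeros((n_boot, len(y_test)), dtype=int)
    for b in range(n_boot):
        # Retrain on a bootstrap resample to see how predictions fluctuate.
        Xb, yb = resample(X_train, y_train, random_state=b)
        preds[b] = LogisticRegression(max_iter=1000).fit(Xb, yb).predict(X_test)
    main_pred = (preds.mean(axis=0) >= 0.5).astype(int)  # majority vote across resamples
    bias = (main_pred != y_test).astype(float)            # systematic error of the ensemble
    variance = (preds != main_pred).mean(axis=0)          # disagreement with the majority vote
    return {
        g: {"bias": bias[groups_test == g].mean(),
            "variance": variance[groups_test == g].mean()}
        for g in np.unique(groups_test)
    }
```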

11:00

Video: Workshop: Model Agnostic Interpretability
Slides
Xin Hunt, PhD – Senior Machine Learning Developer – SAS
This workshop provides an entry-level introduction to model interpretability. Traditionally, trust in complex models was developed by carefully testing their accuracy on unseen data, with no attempt to understand the underlying model. Interpretability methods aim to facilitate the understanding of the behavior of complex models both globally (across the entire input space) and locally (at each observation).
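As a concrete example of one such method (the workshop's specific tools may differ), the sketch below uses permutation feature importance from scikit-learn: shuffle one feature at a time on held-out data and measure how much the black-box model's score drops, which gives a global, model-agnostic ranking of features.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)  # treated as a black box

# Permute each feature on the held-out set and record the drop in accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
top = sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1])[:5]
for name, score in top:
    print(f"{name}: {score:.3f}")
```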

12:15

Book Buffet
Moderated by Elena Snavely – Manager of Corporate Analytics and Insights – SAS
No time to read the latest books on the wide world of bias in ML and AI? Our panelists have got you covered! Join us for lightning reviews of some of our favorite books on the ways data and analytics are affecting our world. Panelists will offer tips on how they stay informed in this ever-changing field and offer insights into how ethics is affecting their career aspirations.

1:00

Video: Data Cistems: Transphobia in Automated Systems and How to Be More Inclusive
Kelsey Campbell – Founder – Gayta Science
Society’s rampant cissexism is all too present in the algorithms and technology that are increasingly part of our daily lives. This talk introduces gender diversity and the many ways automated gender classification models are a danger to people who are transgender, gender nonconforming, and/or intersex. Learn why we should, and how we can, do better to capture the complexity of gender in our data systems and be more inclusive.

2:00

Video: Keynote: Layers of Fairness
Kristian Lum, PhD – Asst. Research Professor – Univ. of Pennsylvania

3:00

Video: Synthetic Populations
Caroline Kery – Data Scientist – RTI International
Synthetic populations are a broad category of datasets that can be used to great effect to model impacts on population groups. However, the individual-level and often geospatial nature of these datasets means that there are vital fairness and privacy considerations that data scientists need to be mindful of when using them. In this talk, we will cover the concept of synthetic populations and provide context for the different ways they can be used (and misused) to explore, model, and present data.
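For readers new to the idea, here is a toy, hypothetical sketch of drawing a synthetic population by sampling individual records to match published marginal counts; it is not RTI's methodology, and it deliberately ignores correlations between attributes, one reason real synthetic-population methods are considerably more involved.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical marginal distributions for a single small area.
age_dist = {"0-17": 0.22, "18-64": 0.60, "65+": 0.18}
sex_dist = {"female": 0.51, "male": 0.49}
n_people = 4_000

synthetic = pd.DataFrame({
    "age_group": rng.choice(list(age_dist), size=n_people, p=list(age_dist.values())),
    "sex": rng.choice(list(sex_dist), size=n_people, p=list(sex_dist.values())),
})

# Each row is a synthetic person: aggregates approximate the marginals, but no
# row corresponds to a real individual. That is the privacy appeal, and also a
# risk if users treat synthetic records as if they were real people.
print(synthetic.value_counts(normalize=True))
```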

4:00

Slides: Responsible Storytelling Strategies
Sarah Egan Warren, PhD – Teaching Asst. Professor – Institute for Advanced Analytics
Making responsible decisions about storytelling strategies should happen at the beginning of any data project, not as an afterthought at the end. Your data story is your deliverable. This talk will introduce three storytelling strategies that you can implement to reduce bias in your communication and increase representation, fairness, and equity for all your audience members.

June 4th

9:00

Workshop: Modeling with GAMs
Eugenia Anello – Data Science Intern – Statwolf
Learn the basics of modeling with Generalized Additive Models (GAMs), which provide nonlinear flexibility while maintaining the explainability of their predictions.
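A minimal example of fitting a GAM in Python, assuming the pyGAM library (the workshop's tooling may differ):

```python
from pygam import LinearGAM, s
from pygam.datasets import wage

# Small example dataset bundled with pyGAM: year, age, education -> wage.
X, y = wage(return_X_y=True)

# One smooth (spline) term per feature; the model stays an additive sum of
# per-feature functions, which is what keeps its predictions explainable.
gam = LinearGAM(s(0) + s(1) + s(2)).gridsearch(X, y)
gam.summary()

# Inspect the learned (possibly nonlinear) effect of age, the second feature.
XX = gam.generate_X_grid(term=1)
print(gam.partial_dependence(term=1, X=XX)[:5])
```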

10:00

Slides: Keynote: Searching for Alternative Facts
Francesca Tripodi, PhD – Assistant Professor – UNC
While many have studied the problems of political polarization, algorithms, filter bubbles, and media manipulation, few have considered how epistemological frameworks shape information silos. My talk addresses this gap, providing an ethnographic account of how the way we see the world impacts media literacy practices. While our quests for truth may start in good faith, my research examines the risks of “doing your own research,” considering how Google is changing the way it orders information.

11:00

Video: Keynote: Optimization of Optimal Sparse Decision Trees
Slides
Cynthia Rudin, PhD – Professor – Duke
With widespread use of machine learning, there have been serious societal consequences from using black box models for high-stakes decisions, including flawed bail and parole decisions in criminal justice, flawed models in healthcare, and black box loan decisions in finance. Transparency and interpretability of machine learning models are critical in high-stakes decisions. In this talk, I will focus on a fundamental and important problem in the field of interpretable machine learning: optimal sparse decision trees. We would like to find trees that maximize accuracy and minimize the number of leaves in the tree (sparsity). This is an NP-hard optimization problem with no polynomial-time approximation. I will present the first practical algorithm for solving this problem, which uses a highly customized dynamic-programming-with-bounds procedure, computational reuse, specialized data structures, analytical bounds, and bit-vector computations. We can sometimes produce provably optimal sparse trees in about the same amount of time that CART produces a (non-optimal, greedy) decision tree.
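For contrast with the optimal approach described above, the sketch below fits the greedy baseline (CART) with a hard cap on the number of leaves using scikit-learn; it is not the speaker's algorithm, only the kind of non-optimal, greedy tree it improves upon.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Greedy, top-down splitting; sparsity is enforced only by capping the leaves,
# with no guarantee that this is the most accurate 8-leaf tree.
cart = DecisionTreeClassifier(max_leaf_nodes=8, random_state=0).fit(X_train, y_train)
print(f"held-out accuracy: {cart.score(X_test, y_test):.3f}")
print(export_text(cart, feature_names=list(X.columns)))
```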

12:15

Engage Trivia
Moderated by Jim Box – Principal Data Scientist – SAS Cyber Analytics
Come test your current knowledge of responsible machine learning topics and compete against others in this fun trivia session. Questions will be pulled from applied research, analytical topics, and news headlines around the world. Prizes will be given to the top 3 performers!

1:00

Video: Keynote: Trustworthy AI
Jeannette M. Wing, PhD – Avanessians Director of the Data Science Institute and Professor of Computer Science – Columbia University
Recent years have seen an astounding growth in the deployment of AI systems in critical domains such as autonomous vehicles, criminal justice, healthcare, hiring, housing, human resource management, law enforcement, and public safety, where decisions taken by AI agents directly impact human lives. Consequently, there is increasing concern about whether these decisions can be trusted to be correct, reliable, fair, and safe, especially under adversarial attacks. How, then, can we deliver on the promise of the benefits of AI while addressing these scenarios that have life-critical consequences for people and society? In short, how can we achieve trustworthy AI?

Under the umbrella of trustworthy computing, there is a long-established framework employing formal methods and verification techniques for ensuring trust properties like reliability, security, and privacy of traditional software and hardware systems. Just as for trustworthy computing, formal verification could be an effective approach for building trust in AI-based systems. However, the set of properties needs to be extended beyond reliability, security, and privacy to include fairness, robustness, probabilistic accuracy under uncertainty, and other properties yet to be identified and defined. Further, there is a need for new property specifications and verification techniques to handle new kinds of artifacts, e.g., data distributions, probabilistic programs, and machine-learning-based models that may learn and adapt automatically over time. This talk will pose a new research agenda, from a formal methods perspective, for increasing trust in AI systems.

2:00

Video: Hate-speech detection is not as easy as it seems: A tale of bias, overfitting and experimental errors
Barbara Poblete, PhD – Associate Professor – University of Chile
Hate speech is an important problem that is seriously affecting the dynamics and usefulness of online social communities. Large-scale social platforms are currently investing significant resources into automatically detecting and classifying hateful content, without much success. On the other hand, the results reported by state-of-the-art systems indicate that supervised approaches achieve almost perfect performance, but only within specific datasets. This talk explores this apparent contradiction between the existing literature and actual applications, and presents evidence of methodological issues as well as a significant dataset bias.
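The within-dataset versus cross-dataset gap the speaker describes can be probed with a simple protocol like the hypothetical sketch below: train on one labeled corpus, then compare held-out performance on that corpus with performance on a second corpus collected independently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

def within_vs_cross_dataset(texts_a, labels_a, texts_b, labels_b):
    """Train on corpus A; compare held-out F1 on A with F1 on corpus B."""
    a_train, a_test, ya_train, ya_test = train_test_split(
        texts_a, labels_a, test_size=0.2, random_state=0)
    clf = make_pipeline(TfidfVectorizer(min_df=2), LogisticRegression(max_iter=1000))
    clf.fit(a_train, ya_train)
    return {
        "within-dataset F1": f1_score(ya_test, clf.predict(a_test)),
        "cross-dataset F1": f1_score(labels_b, clf.predict(texts_b)),
    }
```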

3:00

Video: Workshop: Interpretable Image Recognition
Alina Barnett – PhD Student – Duke
Machine learning models are being deployed in high-stakes settings including the medical, financial and criminal justice domains. Existing models are often “black box,” meaning that the reasoning used to make decisions is not easily understood by humans. We would like to deploy models that make accurate predictions consistent with known medical science, government regulations and ethical considerations. Inherently interpretable models address this need by explaining the rationale behind each decision while maintaining equal or higher accuracy compared to black box models.
We developed a novel interpretable neural network algorithm that uses case-based reasoning to classify images. Using only image-level labels, the model learns a set of ‘prototypical’ features for each image class that correspond to parts of the training data. Thus, the model is able to yield helpful explanations that are qualitatively similar to the way ornithologists, physicians, and others explain challenging image classification tasks. In this workshop, we will go over how this model works and how you can use the publicly available code base to develop your own interpretable image-classification models.
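A conceptual sketch of the prototype idea (not the authors' released code base): convolutional feature patches are compared to learned prototype vectors, and the class prediction is a linear function of the resulting similarity scores. Sizes below are toy values, and the backbone choice is an assumption.

```python
import torch
from torch import nn
from torchvision import models

class PrototypeClassifier(nn.Module):
    def __init__(self, n_classes=10, protos_per_class=5, proto_dim=512):
        super().__init__()
        n_protos = n_classes * protos_per_class
        # Truncated backbone keeps a spatial grid of 512-d feature "patches".
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.prototypes = nn.Parameter(torch.randn(n_protos, proto_dim))
        self.classifier = nn.Linear(n_protos, n_classes, bias=False)

    def forward(self, x):
        feats = self.features(x)                                   # (B, 512, H, W)
        # Squared L2 distance from every spatial patch to every prototype.
        diffs = feats.unsqueeze(1) - self.prototypes.view(1, -1, feats.size(1), 1, 1)
        min_dists = diffs.pow(2).sum(dim=2).amin(dim=(2, 3))       # (B, n_protos)
        # Similarity is large when the closest patch is near the prototype.
        sims = torch.log((min_dists + 1) / (min_dists + 1e-4))
        return self.classifier(sims)                               # (B, n_classes)

logits = PrototypeClassifier()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```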