TY - JOUR
T1 - Learning to increase the power of conditional randomization tests
AU - Shaer, Shalev
AU - Romano, Yaniv
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature.
PY - 2023/7
Y1 - 2023/7
N2 - The model-X conditional randomization test is a generic framework for conditional independence testing, unlocking new possibilities to discover features that are conditionally associated with a response of interest while controlling type I error rates. An appealing advantage of this test is that it can work with any machine learning model to design a powerful test statistic. In turn, the common practice in the model-X literature is to form a test statistic using machine learning models, trained to maximize predictive accuracy in the hope of attaining a test with good power. However, the ideal goal here is to drive the model (during training) to maximize the power of the test, not merely the predictive accuracy. In this paper, we bridge this gap by introducing novel model-fitting schemes that are designed to explicitly improve the power of model-X tests. This is done by introducing a new cost function that aims to maximize the test statistic used to measure violations of conditional independence. Using synthetic and real data sets, we demonstrate that the combination of our proposed loss function with various base predictive models (lasso, elastic net, and deep neural networks) consistently increases the number of correct discoveries obtained, while keeping type I error rates under control.
AB - The model-X conditional randomization test is a generic framework for conditional independence testing, unlocking new possibilities to discover features that are conditionally associated with a response of interest while controlling type I error rates. An appealing advantage of this test is that it can work with any machine learning model to design a powerful test statistic. In turn, the common practice in the model-X literature is to form a test statistic using machine learning models, trained to maximize predictive accuracy in the hope of attaining a test with good power. However, the ideal goal here is to drive the model (during training) to maximize the power of the test, not merely the predictive accuracy. In this paper, we bridge this gap by introducing novel model-fitting schemes that are designed to explicitly improve the power of model-X tests. This is done by introducing a new cost function that aims to maximize the test statistic used to measure violations of conditional independence. Using synthetic and real data sets, we demonstrate that the combination of our proposed loss function with various base predictive models (lasso, elastic net, and deep neural networks) consistently increases the number of correct discoveries obtained, while keeping type I error rates under control.
KW - Conditional independence testing
KW - Conditional randomization test
KW - Controlled feature selection
KW - False discovery rate
KW - Model-X Knockoffs
UR - http://www.scopus.com/inward/record.url?scp=85147303625&partnerID=8YFLogxK
U2 - 10.1007/s10994-023-06302-3
DO - 10.1007/s10994-023-06302-3
M3 - Article
AN - SCOPUS:85147303625
SN - 0885-6125
VL - 112
SP - 2317
EP - 2357
JO - Machine Learning
JF - Machine Learning
IS - 7
ER -