**Dr. Juan Meza** is appointed Division Director for the NSF Division of Mathematical Sciences (DMS) effective February 20, 2018. "Dr. Meza joins the NSF from the University of California, Merced where he served most recently as the Dean of Natural Sciences. Dr. Meza’s distinguished career spans more than 30 years, with leadership and management experiences working in academia, national laboratories, and industry. He served as Department Head and Senior Scientist of High Performance Computing Research at Lawrence Berkeley National Laboratory and was a Distinguished Member of the Technical Staff at Sandia National Laboratories. Dr. Meza has also been a member of the NSF Mathematical and Physical Sciences Advisory Committee and multiple Committees of Visitors for DMS as well as a member of the NSF CISE Advisory Committee for Cyberinfrastructure. He has also served on the board of directors of the Society for the Advancement of Chicanos and Native Americans in the Sciences (SACNAS), the National Academies Board on Mathematical Sciences and its Applications, and the American Association for the Advancement of Science (AAAS) Council, representing the Section on Mathematics. Dr. Meza holds bachelor's and master’s degrees in electrical engineering, and master's and doctoral degrees in computational and applied mathematics, from Rice University."

Meza is a member of the AMS and has been involved in several AMS programs. He co-organized the Mathematics Research Communities session, *Scientific Computing and Advanced Computation*, in 2008, served on an AMS-NSF Mathematical Congress of the Americas (MCA) Travel Grants Committee, and gave a podcast interview on the many ways math is used to understand climate change. Meza was also featured in "Lathisms: Latin@s and Hispanics in Mathematical Sciences," in *Notices of the AMS* (October 2016).

See also: "NSF Tabs Former Dean Juan Meza to Lead Division of Mathematical Sciences," by Jason Alvarez, *UC Merced News*, February 21, 2018. (Photo courtesy of UC Merced.)

Categories: Math and Stats

Former IMS President, *Annals of Statistics* and *IMS Bulletin* editor Bernard Silverman was knighted in the UK’s 2018 New Year Honours List for public service and services to Science. Bee Gees singer Barry Gibb and Beatles drummer Ringo Starr (among others) also received the same honour.

Sir Bernard Silverman’s research has ranged widely across theoretical and practical aspects of statistics, and he is recognized as a pioneer of computational statistics. He has published extensively, covering everything from the fundamental mathematical properties of new methods to computer packages for their implementation, and has also collaborated with researchers in many other scientific fields and provided statistical consultancy in industry, commerce and Government.

Following the award of a Gold Medal at the 1970 International Mathematical Olympiad, Bernard studied Mathematics and then Statistics at Cambridge University. In parallel with his doctoral research into computational statistics, he co-designed the first pocket programmable calculator, the Sinclair Cambridge Programmable. He went on to senior academic and leadership posts at the Universities of Bath, Bristol and Oxford, and also spent substantial time as a visitor at Stanford and other universities. From 2010 to 2017 he served as Chief Scientific Adviser to the UK Government’s Home Office. He now works freelance, including research, charity trusteeship, consultancy, and advice to Government.

His main current research activity is in modern slavery, building on his work for the Home Office in producing the first scientific estimate of the prevalence of modern slavery in the UK. His estimate of 10,000 to 13,000 victims played a pivotal role in the launch of the strategy leading to the Modern Slavery Act 2015, and he is now involved in developing the methodology further and in applying it world-wide.

His other main interest is in security, as chair of the panel set up to give specialist advice to the senior judges who provide independent oversight of the use of investigatory powers by intelligence agencies, police forces and other public authorities. In addition, his concerns include the modernization of the census, research integrity, scientific matters relevant to public policy generally, and diversity and equality issues.

As well as being an IMS Fellow, Bernard (we should now say Sir Bernard) is a Fellow of the UK Royal Society and the Academy of Social Sciences, and a recipient of the COPSS Presidents’ Award and the RSS Guy Medals in Silver and Bronze. He has been awarded the honorary degree of Doctor of Science by the Universities of St Andrews, Lancaster, Bath and Bristol.

Categories: Math and Stats

The Statistical and Applied Mathematical Sciences Institute (SAMSI) has announced the hiring of a new director. David Banks, Duke University, has assumed the role, replacing Richard Smith. SAMSI is one of eight mathematical sciences institutes created by the National Science Foundation, and it is the only one in which statistics plays a large role.

David Banks lays out his initial agenda below and asks for your feedback and suggestions about programs on subjects that interest you. He says, “SAMSI’s future plan is to continue doing what it does well, which is fostering research and new careers at the interface of mathematics and statistics. But it will also move in some new directions. First, we will place greater emphasis upon data science, and reach out to partner more closely with researchers in computer science and related fields. Second, we will explore moving towards shorter programs—instead of the year-long programs that are the current practice, SAMSI intends to pursue some semester-long programs (similar to the Isaac Newton Institute). One consequence of this is that the number of programs will increase, creating more opportunities for scientists to propose and lead these initiatives.

“Currently, SAMSI is finishing two programs: one on Mathematical and Statistical Methods for Climate and the Earth System, and one on Quasi-Monte Carlo and High Dimensional Sampling Methods for Applied Mathematics. Next year, there will be two nine-month programs, one on Statistical, Mathematical and Computational Methods for Precision Medicine, and the other on Model Uncertainty: Mathematical and Statistical. After that, SAMSI will move towards shorter programs, and is currently entertaining proposals for programs on Causal Inference and on Games, Risk, and Decision Theory.

“Creating a good program requires significant forward planning. One needs to line up a small core of prominent researchers (often on sabbatical) who are willing to visit SAMSI and the three local universities (North Carolina State University, UNC–Chapel Hill, and Duke University) for extended periods of time, and to work with the SAMSI postdoctoral fellows. One also needs to enroll a large number of researchers who are willing to attend and present at SAMSI workshops, to help frame the research agenda and to get the appropriate conversations started. This is substantial work, but it can help build a career and be professionally gratifying. And SAMSI generally provides travel support to the workshops and a limited amount of support for long-term visitors, which helps with the recruitment.”

He added, “But the real reasons to lead a research program are that, one, it is a unique opportunity to personally shape the future of the discipline, and two, it is a lot of fun!”

If you have any tentative ideas for a future program, please contact the SAMSI directorate: dbanks@samsi.info or go to www.samsi.info.

Categories: Math and Stats

The Alan Turing Institute, the UK’s national institute for data science, has appointed Professor Sir Adrian Smith FRS as Institute Director. Professor Smith, currently Vice-Chancellor at the University of London, will take up his new role later this year.

Howard Covington, Chair of The Alan Turing Institute, said: “I am delighted that Adrian has agreed to lead the Institute. He not only has a formidable academic record and a deep commitment to advancing scientific excellence but also a huge breadth of experience leading world-class research organisations and working with and within government.”

Adrian Smith said: “The Alan Turing Institute has a unique role to play in ensuring that the UK fully exploits the potential of advances in data science and AI to transform business and social systems for the benefit of society.”

Categories: Math and Stats

“The Institute of Mathematical Statistics (IMS) is a society committed to the freedom of professional expression. The society wishes to foster a productive environment for the exchange of ideas and values participation of all members of the statistical community. The society, therefore, considers it essential that professional conduct is observed at all its functions. Accordingly, all attendees of IMS sponsored and co-sponsored events are expected to show respect and courtesy to other attendees. The society is currently devising specific rules of conduct and institutional mechanisms for enforcement of these rules. In the meantime, IMS members and attendees of IMS functions are advised that the society can and will take steps to guarantee a professional atmosphere and, in particular, will not tolerate harassment in any form.”

Categories: Math and Stats

Sanghamitra Bandyopadhyay, the Director of the Indian Statistical Institute, has been awarded the 2017 Infosys Science Foundation prize and the 2017 TWAS prize in engineering sciences for the impact of her work in the field of computer and engineering sciences.

The Infosys Prize is awarded annually to recognize outstanding scientists and scholars in honor of their current work. She was specifically recognized for her work on algorithmic optimization with applications in marker identification in genetics, Alzheimer’s, HIV, and cancer research. The award carries $100,000, a gold medal, and a citation. Nobel Laureate Kip Thorne presided over the award ceremony in Bangalore in January 2018.

The TWAS Prize (from The World Academy of Sciences) is given in recognition of excellence in research in the global South. The prize carries a plaque and a $15,000 cash award. Dr. Bandyopadhyay is the first scientist from the Indian Statistical Institute to be elected a TWAS Fellow since the inception of the award in 1985. She will be presented with the award at the academy’s 2018 annual general conference.

Categories: Math and Stats

*We introduce the first two in a series of IMS special lecture previews for 2018. Richard Samworth and Thomas Mikosch are two of this year’s Medallion Lecturers. Both will give their lectures at the IMS Annual Meeting in Vilnius, Lithuania, July 2–6, 2018. The program will be announced soon. We’ll bring you more lecture previews in the next issues.*

Medallion preview: Richard Samworth

**Richard Samworth** is Professor of Statistical Science and Director of the Statistical Laboratory at the University of Cambridge. He obtained his PhD in Statistics, also from the University of Cambridge, in 2004. His main research interests are in nonparametric and high-dimensional statistical inference. Particular topics include nonparametric function estimation problems (including under shape constraints), nonparametric classification, high-dimensional variable selection and dimension reduction. Richard serves as an Associate Editor for the *Annals of Statistics* and *Statistical Science*, as well as the *Journal of the American Statistical Association*. He has been awarded the Adams Prize (2017, joint with Graham Cormode), a Leverhulme Prize (2014), the Royal Statistical Society’s Guy Medal in Bronze (2012) and Research Prize (2008), and is an ASA Fellow (2015) and IMS Fellow (2014). Richard’s Medallion Lecture will be given at the IMS Vilnius meeting, July 2–6, 2018.

**Efficient entropy estimation, independence testing and more… all with k-nearest neighbour distances**

Nearest neighbour methods are most commonly associated with classification problems, but in fact they are very flexible and can be applied in a wide variety of statistical tasks. They are conceptually simple, can be computed easily even in multivariate problems, and we will argue in this talk that they can lead to methods with very attractive statistical properties. Our main focus is on entropy estimation [1] and independence testing [2], though if time permits, we may discuss other applications.

It was the founding father of information theory, Claude Shannon, who recognised the importance, as a measure of unpredictability, of the density functional

$H(f) = -\int f \log f.$

The polymath John von Neumann advised him to call it “entropy” for two reasons: “In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage”! In statistical contexts, it is often the estimation of entropy from a random sample that is of main interest, e.g. in goodness-of-fit tests of normality or uniformity, independent component analysis and feature selection in classification.

Kozachenko and Leonenko [3] proposed an intriguing closed-form estimator of entropy based on *k*th nearest neighbour distances; it also involves both the volume of the unit ball in *d* dimensions and the digamma function. Remarkably, under appropriate conditions, it turns out that a weighted generalisation of this estimator is efficient in arbitrary dimensions.
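For concreteness, the Kozachenko–Leonenko estimator can be sketched in a few lines. This is a brute-force illustration only: the function name and the naive pairwise-distance search are our choices, not taken from [1] or [3], and a serious implementation would use an efficient nearest-neighbour search.

```python
import numpy as np
from math import pi, lgamma, log

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-nearest-neighbour entropy estimate (in nats).

    x: (n, d) array of samples. Brute-force sketch: computes the full
    pairwise distance matrix rather than using a spatial index.
    """
    n, d = x.shape
    # Euclidean distance from each point to every other point
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)          # exclude each point itself
    rho = np.sort(dist, axis=1)[:, k - 1]   # distance to k-th nearest neighbour
    # digamma at integer k: -gamma + sum_{j=1}^{k-1} 1/j
    psi_k = -0.5772156649015329 + sum(1.0 / j for j in range(1, k))
    # log-volume of the unit ball in d dimensions
    log_vd = (d / 2) * log(pi) - lgamma(d / 2 + 1)
    return log(n - 1) - psi_k + log_vd + d * np.mean(np.log(rho))
```

On a standard normal sample in one dimension, the estimate should come out close to the true entropy ½ log(2πe) ≈ 1.419 nats.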

Testing independence and estimating dependence are well-established areas of statistics, with the related idea of correlation dating back to Francis Galton’s 19th-century work, which was subsequently expanded upon by Karl Pearson. Mutual information, a close cousin of entropy, characterises the dependence between two random vectors *X* and *Y* in a particularly convenient way. We can therefore adapt our entropy estimator to propose a new test of independence, which we call MINT, short for **M**utual **IN**formation **T**est. As well as having guaranteed nominal size, our test is powerful in the sense that it can detect alternatives whose mutual information is surprisingly small. We will also show how modifications of these ideas can be used to provide a new goodness-of-fit test for normal linear models.
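As a rough, self-contained illustration of the idea (not the calibrated MINT procedure of [2]), the Kozachenko–Leonenko entropy estimator described above can be turned into an independence test by estimating I(X;Y) = H(X) + H(Y) − H(X,Y) and calibrating the statistic by permuting one sample. All function names here are ours, and the permutation calibration is a generic stand-in for the paper's analysis.

```python
import numpy as np
from math import pi, lgamma, log

def kl_entropy(x, k=3):
    # Kozachenko-Leonenko k-NN entropy estimate (nats); brute-force sketch
    n, d = x.shape
    dist = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)
    rho = np.sort(dist, axis=1)[:, k - 1]
    psi_k = -0.5772156649015329 + sum(1.0 / j for j in range(1, k))
    log_vd = (d / 2) * log(pi) - lgamma(d / 2 + 1)
    return log(n - 1) - psi_k + log_vd + d * np.mean(np.log(rho))

def mi_perm_test(x, y, k=3, n_perm=99, seed=0):
    """Permutation test of independence via estimated mutual information.

    Uses I(X;Y) = H(X) + H(Y) - H(X,Y); permuting y breaks any dependence,
    so the permuted statistics approximate the null distribution.
    """
    rng = np.random.default_rng(seed)
    hx, hy = kl_entropy(x, k), kl_entropy(y, k)
    stat = hx + hy - kl_entropy(np.hstack([x, y]), k)
    perm_stats = []
    for _ in range(n_perm):
        y_perm = y[rng.permutation(len(y))]   # only H(X,Y) changes
        perm_stats.append(hx + hy - kl_entropy(np.hstack([x, y_perm]), k))
    # p-value: proportion of permuted statistics at least as large
    return (1 + sum(s >= stat for s in perm_stats)) / (n_perm + 1)
```

With strongly dependent data the observed statistic should exceed essentially every permuted one, giving a p-value near 1/(n_perm + 1).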

**References:**

[1] Berrett, T.B., Samworth, R.J. and Yuan, M. (2018) Efficient multivariate entropy estimation via *k*-nearest neighbour distances. *Ann. Statist.*, to appear.

[2] Berrett, T.B. and Samworth, R.J. (2017) Nonparametric independence testing via mutual information. https://arxiv.org/abs/1711.06642.

[3] Kozachenko, L.F. and Leonenko, N.N. (1987) Sample estimate of the entropy of a random vector. *Probl. Inform. Transm.*, **23**, 95–101.

Medallion preview: Thomas Mikosch

**Thomas Mikosch** received his PhD in Probability Theory at the University of Leningrad (St. Petersburg) in 1984. He is Professor of Actuarial Science at the University of Copenhagen. His scientific interests are at the interface of applied probability and mathematical statistics. In particular, he is interested in heavy-tail phenomena, extreme value theory, time series analysis, and random matrix theory. He has published about 130 scientific articles and five books. Thomas is a member of the Bernoulli Society (BS), the Danish Statistical Association, the Danish Association of Actuaries, and the Danish Royal Society of Sciences and Letters, and is an IMS Fellow. He has (co-)organized numerous conferences, workshops and PhD schools. Currently, he is Associate Editor of various journals and Editor of *Bernoulli* and the *European Actuarial Journal*; he is Editor-in-Chief of *Extremes*, and was Editor-in-Chief of *Stochastic Processes and their Applications* in 2009–2012. He is one of the editors of the Springer book series *Operations Research and Financial Engineering*. He has served on the Itô Prize Committee since 2009. In the BS he chairs the Publications Committee, is Publications Secretary and a member of the Executive Council. In 2018 he was awarded the Alexander von Humboldt Research Prize. Thomas will also deliver his Medallion Lecture at the IMS Vilnius meeting, July 2–6, 2018.

**Regular variation and heavy-tail large deviations for time series**

The goal of this lecture is to present some of the recent results on heavy-tail modeling for time series and the analysis of their extremes.

Over the last 10–15 years, research in extreme value theory has focused on the interplay between the serial extremal dependence structure and the tails of time series. In this context, heavy-tailed time series (as appearing in finance, climate research, hydrology, and telecommunications) have been studied in detail, leading to an elegant probabilistic theory and statistical applications.

Heavy tails of the finite-dimensional distributions are well described by multivariate regular variation: it combines power-law tails of the marginal distributions with a flexible dependence structure which describes the directions in which extremes are most likely to occur; see Resnick (2007) for an introduction to multivariate regular variation.

A second line of research has continued through the years but attracted less attention: heavy-tail large deviations. In the 1960s and 1970s A.V. and S.V. Nagaev started studying the probability of the rare event that a random walk with iid heavy-tailed step sizes would exceed a very high threshold far beyond the normalization prescribed by the central limit theorem. In the case of subexponential (in particular regularly varying) distributions the tail of the random walk above high thresholds is essentially determined by the maximum step size. Later, related results were derived for time series models by Davis and Hsing (1995), Mikosch and Wintenberger (2014, 2016), among others. Here, the main difficulty is to take into account clustering effects of the random walk above high thresholds.
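The "single big jump" behaviour described here is easy to see in simulation. The Pareto index, walk length, and threshold below are arbitrary choices for illustration, not taken from the cited papers:

```python
import numpy as np

rng = np.random.default_rng(0)

# Monte Carlo illustration of the Nagaev-type heavy-tail large deviation:
# for iid regularly varying steps, a random walk exceeds a high threshold
# essentially because its single largest step does.
alpha, n, x, reps = 1.5, 10, 200.0, 500_000

# classical Pareto(x_m = 1, alpha) steps: 1 + Lomax(alpha)
steps = 1.0 + rng.pareto(alpha, size=(reps, n))
sums = steps.sum(axis=1)
maxima = steps.max(axis=1)

exceed = sums > x
p_sum = exceed.mean()          # Monte Carlo estimate of P(S_n > x)
p_tail = n * x ** (-alpha)     # first-order approximation n * P(X_1 > x)
# among walks that crossed the threshold, how often one step dominates the sum
dominant = (maxima[exceed] / sums[exceed] > 0.5).mean()
```

Conditionally on the rare event {S_n > x}, the largest step typically accounts for most of the sum, and n·P(X_1 > x) is a reasonable approximation to P(S_n > x) at this threshold.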

Regular variation and heavy-tail large deviations are two aspects of dependence modeling in an extreme world. They are similar in the sense that they are closely related to the weak convergence of suitable point processes. Indeed, both regular variation and heavy-tail large deviations are defined via the vague convergence of suitably scaled probability measures whose (infinite) limit measure can be interpreted as the intensity measure of a Poisson process. In the heavy-tailed time series world this relationship opens the door to the Poisson approximation of extreme objects, such as the upper order statistics of a univariate sample and the largest eigenvalues of the sample covariance matrix of a very high-dimensional time series, and to functionals acting on them.

**References:**

[1] Davis, R. A. and Hsing, T. (1995) Point process and partial sum convergence for weakly dependent random variables with infinite variance. *Ann. Prob.,* **23**, 879–917.

[2] Mikosch, T. and Wintenberger, O. (2014) The cluster index of regularly varying sequences with applications to limit theory for functions of multivariate Markov chains. *Probab. Theory Rel. Fields*, **159**, 157–196.

[3] Mikosch, T. and Wintenberger, O. (2016) A large deviations approach to limit theory for heavy-tailed time series. *Probab. Theory Rel. Fields*, **166**, 233–269.

[4] Nagaev, A. V. (1969) Integral limit theorems for large deviations when Cramér’s condition is not fulfilled I, II. *Theory Probab. Appl.,* **14**, 51–64 and 193–208.

[5] Nagaev, S. V. (1979) Large deviations of sums of independent random variables. *Ann. Probab.*, **7**, 745–789.

[6] Resnick, S. I. (2007) *Heavy-Tail Phenomena: Probabilistic and Statistical Modeling. *Springer, New York.

Categories: Math and Stats

*Observational Studies*, an IMS-affiliated journal, has an updated website. Check it out at https://obsstudies.org/. *Observational Studies* is a peer-reviewed journal that publishes manuscripts on all aspects of observational studies. The journal is open access and has no publication charges. Papers are posted to the website rapidly when accepted.

Categories: Math and Stats

Yoram Gat’s third column considers whether democracy would be better served by sortition:

For about 2,500 years, statistical sampling was closely linked with democracy. “Selection by lot is natural to democracy, as that by choice [i.e., elections] is to aristocracy,” asserted Aristotle in the 4th century BC, following his own first-hand experience at Athens and the conventional wisdom of his time. Montesquieu concurred in the first half of the 18th century. It was only in the last 200 years, as democracy displaced aristocracy as the legitimate organizing principle of politics, that *sortition*—the delegation of power by statistical sampling—had to be air-brushed out of history and political science. And so today, it is commonly claimed that delegation of power was unknown to the Athenians and that their government was a “direct democracy”, governed solely by the mass body of the Athenian Assembly. Delegation of power, it is said, is a modern innovation that was necessitated by the much larger size of the modern polity.

This version of history is not only false (as the testimony of Aristotle shows), but *must be* false. A city with tens of thousands of citizens, as Athens was, could no more run its business without delegation than a country of millions can. Like the modern electorate, the Athenian Assembly could vote, but it could not write the proposals it voted on. Law-writing, as well as many other functions of government, cannot be “crowdsourced” and the only question is how those few who carry out those functions are selected. Some Greek cities, like Sparta, used the familiar selection mechanism of elections, but, as Aristotle indicates, those were considered oligarchical cities. Athens and other democratic cities had their law-writers statistically sampled (i.e., selected by lot) from the citizen body.

The idea that sortition is democratic while elections are oligarchical was so conventional, it seems, that despite being mentioned by multiple extant ancient texts, it is nowhere explicitly rationalized. As part of the attempt to dismiss sampling as a political device it is sometimes claimed today that its use in Athens was motivated by the superstition that randomization allowed the gods to make the selection. However, the historical record indicates that the main motivation behind the practice was the law of large numbers. It was expected that sortition would produce a group that would mirror the population in important respects. This was often stated as an expectation of resemblance between the population and the sample in terms of wealth and social status (i.e., that most members would be poor commoners) but it was taken for granted that these characteristics would be correlated with certain interests and beliefs.
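The law-of-large-numbers intuition behind this expectation is easy to check numerically. The population size and subgroup share below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical citizen body: 30% of 40,000 citizens belong to some subgroup
population = rng.random(40_000) < 0.30

# Draw a council of 500 by lot, without replacement
council = rng.choice(population, size=500, replace=False)

pop_share = population.mean()
council_share = council.mean()
# binomial standard error for a sample of 500
se = np.sqrt(pop_share * (1 - pop_share) / 500)
```

For a council of 500, the sampling error of a 30% subgroup share is about √(0.3 · 0.7 / 500) ≈ 2 percentage points, so a body selected by lot mirrors the population closely in any characteristic of this kind.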

In modern Western political ideology, there is significant equivocation regarding the desirability of having political power held by a statistically representative sample. The American founding fathers explicitly rejected democracy as nothing but mob rule. Their elections-based system was not advertised as being a democracy but a republic, where government is for the people, but not by the people. Jefferson put it this way: “[T]here is a natural aristocracy among men. The grounds of this are virtue and talents. […] May we not even say that that form of government is the best which provides the most effectually for a pure selection of these natural *aristoi* into the offices of government?” Thus, quite realistically, elections were not offered as a way to put in power average citizens, but instead, rather optimistically, as the way to select that “citizen whose merit may recommend him to the esteem and confidence of his country.”

Over time, optimism about the ability and willingness of the natural aristocracy to hold power and to use it for the benefit of the people at large became harder to sustain, and explicitly paternalistic positions such as Jefferson’s were rejected. The term “republic” gave way to “democracy”, and conventional political ideology has come to hold that each person is the best judge of their own interest. But while ideology progressed, political institutions remained largely unchanged, and the same system of government that was explicitly designed to be non-democratic was rebranded as the quintessence of democracy.

Today it is accepted that certain groups—such as those defined by gender, ethnicity and sexual orientation—should be present in a democratic government according to their proportion in the population. Again, it is taken for granted that those characteristics are correlated with certain interests and beliefs and those should be represented in government proportionally. And yet gross distortions in terms of other characteristics—for example, age, wealth, profession and education—are matter-of-factly accepted as natural, and possibly desirable. Undoubtedly, those characteristics are correlated with interests and beliefs as well, and unless Jefferson’s premise—that some people are better off being represented by those who are naturally their betters—is accepted, then it is hard to understand how such a government could be considered democratic. Furthermore, since such distortions are unavoidable in any electoral system, and indeed in any deterministic selection system, it is hard to understand how any system in which representation is not based on statistical sampling could be considered democratic.

*Yoram would be happy to have a critical and skeptical conversation about the topics he discusses in this column. He invites readers to comment on this column, or to email us at* bulletin@imstat.org.

Categories: Math and Stats

**Xiao-Li Meng writes:**

My sabbatical orientation at Lugano (see the last *XL-Files*) boosted my over-confidence into double digits. Anyone who asked about my sabbatical plan would get an ambitious answer: that I would complete 14 articles during my sabbatical year. The year is now (at the time of writing) 58.33% over. My accomplishment, you guessed it, is significantly lower—39.29% to be exact. In addition to the usual non-linear path of research progress, what slows me down are the never-ending errors I manage to create. Every morning I promised myself that this would be the day for the final proofreading. Yet I would retire in the evening with another 20–30 red circles on the draft. This happened on Dec 21, Dec 22, and Dec 23, a replay of *Groundhog Day,* undoubtedly pleasing card-carrying frequentists. I took a deep breath on Christmas Eve, forcing my fingers to plunge into the submission system faster than the rising temptation for yet another final proofreading. Finally, I could have a proofreading-free Christmas day.

Most of my errors are of a writing nature. Spellcheck has saved me thousands of times, but it cannot save me from confusing “a/an” with “the”, or mistaking “crispy” for “crisp”. It’s extremely frustrating as a non-native speaker, as I simply do not possess the kind of this-does-not-sound-right gut feeling. Far more time consuming, however, is seeking an enticing flow for both novice and expert readers. I almost never get the flow right on the first few tries, and sometimes a “final” proofreading compels a major reorganization. It’s always an internal struggle between the impulse to have a fast publication and the desire to make it a well-written, long-lasting article. The mantra *“It’s hard to publish, but impossible to unpublish”* can be very helpful when conducting this internal dialogue.

Indeed, I wish I had understood this mantra when I was publishing my thesis work. I managed to publish quite a few papers out of my thesis, but at least one of them I wish now I could unpublish. To be sure, it contains no technical error that I am aware of, nor can it have many writing errors—after all, it was published in a top journal. I was proud because it represented the first idea for which I could claim full credit and genuine novelty simultaneously. Before that work, all hypothesis testing procedures with multiply imputed data sets were based on Wald-type test statistics. One day, I just had this cute idea of manipulating complete-data likelihood ratio functions to compute the multiple-imputation likelihood tests almost as effortlessly as the Wald-type tests. I established theoretical validity and demonstrated its satisfactory performance on a real dataset, which apparently convinced the reviewers.

Over the years, the procedure got into a software package, and then inquiries came in. Why did the software produce negative test statistic values, when the reference distribution is an F distribution? I knew the answer. The test was built on an asymptotic equivalence between Wald and likelihood-ratio statistics, and how soon the asymptotics kick in would depend on the parametrization. It thus came as no surprise that it could fail badly with small datasets.

I then asked a wonderful student, Keith Chan, to seek the optimal parametrization. Soon he reported back that the problem was much worse than I realized. The asymptotic equivalence I relied on is guaranteed only under the null hypothesis. But the procedure I proposed uses this equivalence to estimate a key nuisance parameter, the fraction of missing information (FMI). When the null fails, which we typically hope for, the FMI can be so badly estimated that the test may have essentially zero power!

How on earth did I not check for power? A consequence of rushing for publication? Carried away by one cute idea? A sign of research immaturity? All of the above! What depresses me the most is that all the defects of my proposal were automatically fixed by Keith’s “test of average” guided by the likelihood principle. In contrast, my cute idea relies on “average of tests”, guided by a computational trick rather than statistical principles. Computational convenience should always be an important consideration. But when it becomes the driving force, we must keep in mind that computationally convenient bad procedures can do more harm than computationally inconvenient bad (and good) procedures.

Apparently, I had not learned this lesson well when I set my sabbatical goal of completing 14 papers. It should have been to produce at least one paper that will still have positive impact in 140 years. Surely our professional reward systems cannot possibly rely on such long-term qualitative measures. But that is exactly the reason that we need to remind ourselves constantly of the impossibility of unpublishing, to combat the tendency to pursue quantity over quality. Read and revise eight times before submitting.

Categories: Math and Stats

Bhramar Mukherjee from University of Michigan writes: Congratulations to **Tyler J. VanderWeele** of the Harvard T.H. Chan School of Public Health, the recipient of the 2017 COPSS Presidents’ Award. This award is presented annually to a young member of one of the participating societies of the Committee of Presidents of Statistical Societies (COPSS) in recognition of outstanding contributions to the profession of statistics. The award citation recognized Professor VanderWeele: “for fundamental contributions to causal inference and the understanding of causal mechanisms; for profound advancement of epidemiologic theory and methods and the application of statistics throughout medical and social sciences; and for excellent service to the profession including exceptional contributions to teaching, mentoring, and bridging many academic disciplines with statistics.”

Tyler VanderWeele was born in Chicago, Illinois, and raised in San Jose, Costa Rica, Bulgaria and Austria. He received his BA in Mathematics from the University of Oxford in 2000 and also completed there the requirements for a second BA in Philosophy and Theology. He received an MA in Finance from the Wharton School, University of Pennsylvania in 2002, and his PhD in Biostatistics at Harvard University in 2006, with his dissertation entitled *Contributions to the Theory of Causal Directed Acyclic Graphs,* with James Robins as dissertation advisor. He began as an Assistant Professor of Biostatistics at the University of Chicago, Department of Health Studies (now Public Health Sciences) in 2006, returning to Harvard as Associate Professor of Epidemiology in the Departments of Epidemiology and Biostatistics in 2009. He was promoted to Full Professor with tenure at Harvard University in 2013, and was just appointed the John L. Loeb and Frances Lehman Loeb Professor of Epidemiology.

His research concerns methodology for distinguishing between association and causation in observational studies, and the use of statistical and counterfactual ideas to formalize and advance epidemiologic theory and methods. Within causal inference, he has made important contributions to theory and methods for mediation, interaction, and spillover effects; theory for causal directed acyclic graphs; methodologies for sensitivity analysis for unmeasured confounding; and philosophical foundations for causal inference. He has also made contributions to measurement error and misclassification, to the formalization of epidemiologic concepts, and to study design. His empirical research spans psychiatric, perinatal, and social epidemiology; the science of happiness and flourishing; and the study of religion and health, including both religion and population health and the role of religion and spirituality in end-of-life care. In the twelve years following the receipt of his PhD, he has published over 250 papers in peer-reviewed journals, including 140 first- or sole-author papers in premier statistics, biomedical, and social science journals; he is author of the book *Explanation in Causal Inference: Methods for Mediation and Interaction* (Oxford University Press). He has served on the editorial boards of *Annals of Statistics*, *Journal of the Royal Statistical Society Series B*, *Epidemiology*, *American Journal of Epidemiology*, and *Sociological Methods and Research*. He is co-founder and editor-in-chief of the journal *Epidemiologic Methods*. He also serves as Co-Director of the Initiative on Health, Religion and Spirituality, as a faculty affiliate of the Harvard Institute for Quantitative Social Science, and as Director of the Program on Integrative Knowledge and Human Flourishing at Harvard University.
In addition to being the recipient of the 2017 COPSS Presidents’ Award from the Committee of Presidents of Statistical Societies, he was the recipient of the 2013 Bradford Hill Memorial Lecture, the 2014 Mortimer Spiegelman Award, the 2015 Causality in Statistics Education Award, and the 2017 John Snow Award. He lives in Cambridge, Massachusetts, with his wife Elisabeth and their son Jonathan.

Read an interview with Tyler below.

You can watch last year's COPSS Awards presentation ceremony and the 2017 R.A. Fisher Lecture at JSM Baltimore at:

http://ww2.amstat.org/meetings/jsm/2017/webcasts/index.cfm.

**Tyler VanderWeele: Q&A**

*Tyler VanderWeele, 2017 COPSS Presidents’ Award recipient, graciously agreed to respond to Bhramar Mukherjee’s questions:*

*What was your reaction to winning the prestigious COPSS Presidents’ award?*

I was delighted, and in a state of shock! My wife jumped for joy. A happy, almost mindless, daze set in. It was a Sunday afternoon and we went on a beautiful walk with our son through Cambridge and Harvard Yard. It was a very happy afternoon and evening. As it turned out, however, I had also contracted norovirus the night before, so I will perhaps never know how much of the mindless daze was from COPSS or from… well, we won't go into the aftermath!

*Which part of your job do you like the most?*

It would be a toss-up between having long stretches of time to think and to write (now sadly less frequent) and having such wonderful colleagues and students to work with. On the one hand, little makes me happier or more at peace than having an empty day to read, think, scribble out mathematics, or write. On the other hand, much of the deepest joy comes from the sharing of ideas, and developing them with colleagues and students. Unfortunately, the two increasingly seem to come into conflict due to limited time! I often wish there were 36 hours in a day.

*What advice would you give to young people who are entering the profession as PhD students and assistant professors at this time?*

My doctoral dissertation advisor, Jamie Robins, has consistently said to just pursue what you love and are interested in. I think that was very good advice and I would offer the same. In soft money environments especially (which is what many biostatisticians at least have to deal with), it is all too easy for one’s time and effort and creativity to be devoted to what is funded rather than what is important. I think it is essential not to confuse the means with the ends. The grants are meant to support research and the pursuit of knowledge; the pursuit of knowledge is not done for the sake of the grant! I think it is important to always be working on research questions that are significant and of interest and not just what happens to be around. I think it is also important to block out time to read broadly, to think deeply, to ponder the structure of our discipline and its relation to others. These things are essential in the choice of research questions. I have come to believe more and more strongly over my career that a substantial amount of time should be devoted to thinking about what is worthwhile pursuing and why. My hope is that universities and departments would do whatever they can to provide protected time for junior faculty (and all faculty!) to engage in deep reflective thought on important questions, whether those topics are funded or not.

*Who are your most significant mentors? How did/do they impact your career?*

I have had a number of wonderful mentors throughout life, and I am very grateful to them. Charles Batty, my Analysis tutor in Mathematics at St. John's College, Oxford, was an important mentor in encouraging careful, rigorous thought and probing the boundaries of concepts. Also at Oxford, my philosophy tutor Peter Hacker, an expert on Wittgenstein, taught me about the philosophy of language and about the drawing of distinctions between concepts, paying careful attention to how language is used. Believe it or not, that mentoring has been of tremendous value in trying to mathematically formalize and make more rigorous various epidemiologic concepts. At Harvard, Jamie Robins was a wonderful guide as I carried out original methodological research projects, and he has constantly challenged me to think clearly and deeply about ideas and concepts, to focus on what seems most central and important. I have had many other important mentors throughout the years, but in terms of my work in statistics, biostatistics and epidemiology, these would be the most important.

*Why were you drawn to causal inference?*

Before I began studies in biostatistics, I was actually in a doctoral program in finance. We would fit regression models and then would seem to interpret all of the regression coefficients the same way, often with some vague notion that the interpretation might be causal. It made me very uncomfortable. I felt that we were not really justified in interpreting the regression coefficient as we did, but I also felt that I lacked the technical vocabulary to express my concerns. After a while, I decided to leave finance and took a course in epidemiology and came across the concept of “confounding” and realized immediately that this was the concept that I had wanted to employ in my critique of what we had been doing in empirical finance. The next semester I began doctoral studies in biostatistics at Harvard and my very first semester there I took a course with Donald Rubin on Causal Inference and was introduced to the potential outcomes notation, and immediately saw the concept of confounding could be mathematically formalized by using such potential outcomes notation. I knew at that point that I wanted to pursue causal inference. The next year I took another, more advanced, course on causal inference with Jamie Robins at the School of Public Health at Harvard, and was introduced to causal inference with time-varying exposures, causal diagrams, and questions of mediation, which have subsequently become some of the topics of my own methodological research, much of which is summarized in my book *Explanation in Causal Inference: Methods for Mediation and Interaction*. I think having a formal framework to distinguish between association and causation is central. It is extremely important in the biomedical and social sciences. It is helpful, but perhaps not absolutely essential, when we are talking about the effects of a single exposure since, in that case, many of our intuitions and traditions that have been built up over the years work reasonably well. 
However, once we come to more nuanced inquiries concerning exposures that vary over time, or questions of mediation and mechanisms, or how we think about the causal effects on some secondary outcome in the presence of death that may precede our outcome measurement, it becomes extremely difficult to make progress in thinking about causality without a more formal framework. Counterfactuals and the potential outcomes model provide the necessary framework. The framework’s capacity to clarify and evaluate assumptions and to provide much more precise and nuanced interpretation to our estimands is extraordinary. A lot of work, however, still needs to be done in making these approaches standard practice in empirical research. For example, methods for sensitivity analysis for unmeasured confounding have been around for decades but are still rarely used in practice. In thinking about how to encourage broader use, a few months ago, in a paper in the *Annals of Internal Medicine*, I introduced a new metric called the E-value to assess the robustness of associations to potential unmeasured confounding (essentially related to the evidence for causality) that I hope will help standardize and promote the use of sensitivity analysis throughout the biomedical and social sciences. The formal work in causal inference using counterfactuals has constituted a massive advance in our capacity to reason about causality, and in understanding our limits in being able to do so. It has been a joy to be able to contribute to this important field.
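For readers curious about the E-value mentioned here: for an observed risk ratio $RR \ge 1$, the formula published in the *Annals of Internal Medicine* paper is $E = RR + \sqrt{RR(RR-1)}$. A minimal sketch (the function name is ours, not from the paper):

```python
import math

def e_value(rr):
    """E-value for an observed risk ratio RR: the minimum strength of
    association, on the risk-ratio scale, that an unmeasured confounder
    would need to have with both exposure and outcome to fully explain
    away the observed association."""
    if rr < 1:            # for apparently protective effects, invert first
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

print(e_value(2.0))  # an observed RR of 2 requires a confounder of strength ~3.41
```

An association with a larger E-value is more robust: a weak unmeasured confounder cannot account for it.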

*Anything else you’d like to share about our profession?*

I think that statistics as a discipline is under-appreciated in the university. It really provides the methodological foundation for so many other disciplines. It is interesting to go down the list of Departments in a university and think about how many of them use regression models, for example. Statistics has become one of academia's major epistemologies, one of the ways we come to knowledge. I think it needs to be better acknowledged as such throughout the university. At the same time, I think that the use of statistics is often not adequately scrutinized. In many disciplines, even in statistics itself, we will often blindly accept the interpretation of some analysis without thinking critically about the interpretation, the degrees of evidence, and the assumptions that underlie the conclusions. The field of causal inference is of course helpful in this regard. But I think that the concerns are even broader. How do our statistical analyses relate to the pursuit of knowledge? When are we willing as a community to say that we know something on the grounds of statistical analyses? When is it the case that the evidence is such that it seems impossible that it will be overturned? The much-discussed "replication crisis" has, I think, helped bring these important issues up quite dramatically. I also think it is possible that we sometimes overuse and over-rely upon statistics. I am sometimes surprised how in some papers, a policy conclusion is thought to immediately follow from a particular statistical analysis, when a number of ethical and value-related questions must also go into decision-making. Because statistical analyses are quantitative they seem more objective, and we have perhaps become too weak at other forms of ethical and practical reasoning, so that we, at times, perhaps over-rely on statistics in our thinking. In my view, statistics, as a discipline, is thus paradoxically under-appreciated, over-utilized, and under-scrutinized.
I think additional reflection, and also education in the broader academic community on how statistical analyses are ultimately related to knowledge, would help increase the appreciation of our discipline and at the same time lead to better and more appropriate interpretation. I hope to spend a fair bit of time thinking further about this task in the years ahead, and hope that other statisticians will do the same.

*Finally, what are your hobbies/interests beyond statistics?*

I very much enjoy classical music and playing the piano, and I try to attend concerts whenever possible, though with a two-year-old now that has become a little less frequent. More and more time has been devoted to my family life, which I have thoroughly enjoyed. I enjoy food and wine… perhaps too much! And I also very much enjoy tennis and, in times past (and hopefully future), skiing. I’ve been fairly involved in various church communities throughout my life and this has been an important part of the way I think about and understand the world, and more recently this has also been part of my academic work with empirical studies on religion and health. I still enjoy opportunities to read more in philosophy and theology and some of my more recent work has also been thinking about how ideas in philosophy and theology might inform empirical statistical research in the social and biomedical sciences and vice versa… but now I am talking about work once again. Probably more balance with other interests and hobbies, family and friends would be good!

*Congratulations, Tyler VanderWeele, on behalf of the community. We wish you continued success with your amazingly creative scholastic career in the years to come!*

Categories: Math and Stats

Following “guest puzzler” Stanislav Volkov’s rotating wheel probability puzzle (solution below), Anirban DasGupta sets a statistics puzzle:

This is one of those quick-and-dirty methods, popularized by John Tukey, one that makes some intuitive sense, and can be very quickly implemented. This issue’s problem is about testing the equality of two absolutely continuous distributions on the real line. You may not have seen this *pocket test* before. Here is the exact problem.

Based on iid picks $X_1, \dots, X_n$ from an absolutely continuous distribution $F$ and independent iid picks $Y_1, \dots, Y_n$ from a possibly different absolutely continuous distribution $G$, we propose a test statistic for testing $H_0: F = G$; as stated above, $F$ and $G$ are distributions on the real line. Arrange the combined sample in ascending order and suppose the overall sample maximum comes from the $F$ sample and the overall sample minimum comes from the $G$ sample. Count the number of $X$-values larger than the largest $Y$-value and the number of $Y$-values smaller than the smallest $X$-value. The test statistic $T_n$ is the sum of these two extreme run counts. If the overall sample maximum and the overall sample minimum both come from the same sample, define $T_n$ to be zero.

a) Give theoretical values or theoretical approximate values for the mean and the variance of $T_n$ under the null.

b) Give theoretical approximations to cut-off values for rejecting the null based on the test statistic $T_n$. This is close to asking what are theoretically justified approximations to the null distribution of $T_n$.

c) Is this test distribution-free in the usual sense?

d) What would be the approximate power of this test at level $.05$ if $F = N(1, 1)$, $G = N(0, 1)$, $n = 100$? Be careful about the rejection region.
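A hedged sketch of how one might compute $T_n$ and explore its null distribution by simulation; the function name, and the symmetric treatment of whichever sample happens to hold each extreme, are our reading of the definition above:

```python
import numpy as np

def pocket_statistic(x, y):
    """Pocket-test statistic T_n: the count of values in the sample holding
    the overall maximum that exceed the other sample's maximum, plus the
    count of values in the sample holding the overall minimum that fall
    below the other sample's minimum; T_n = 0 when both extremes come
    from the same sample."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if (x.max() > y.max()) == (x.min() < y.min()):
        return 0  # overall max and overall min lie in the same sample
    top, bottom = (x, y) if x.max() > y.max() else (y, x)
    return int((top > bottom.max()).sum() + (bottom < top.min()).sum())

# Monte Carlo machinery for the null distribution (F = G = N(0,1), n = 100),
# handy for checking one's answers to parts (a), (b), and (d).
rng = np.random.default_rng(0)
null_draws = [pocket_statistic(rng.normal(size=100), rng.normal(size=100))
              for _ in range(2000)]
```

Because the statistic depends only on the ranks of the combined sample, such a simulation under any continuous $F = G$ bears on part (c) as well.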

**Solution to Student Puzzle 19**

**We received correct solutions to Stanislav Volkov’s puzzle from Mirza Uzair Baig from the University of Hawai’i at Mānoa, Jiashen Lu from the University of Pittsburgh, and Benjamin Stokell, University of Cambridge. Well done!**

**Stanislav explains:**

Observe that the required probability equals

\begin{align*}
x:= \mathbb{P}(Y_\infty=0 \mid Y_0=0)
&=\sum_{k=0}^\infty \mathbb{P}(\text{the wheel rotates } 4k \text{ times})\\
&=\sum_{k=0}^\infty \sum_{j_1,\dots,j_{4k}}
p_{j_1}\cdots p_{j_{4k}}\prod_{\ell\notin \{j_1,\dots,j_{4k}\}} (1-p_\ell)
\end{align*}

where the $j_n$ are distinct positive integers and $\ell$ runs over the remaining positive integers; additionally, we take the “empty” term (when $k=0$) to equal $1$. This can be somewhat simplified by observing that

\begin{align*}
\frac{x}{\prod_{j=1}^\infty (1-p_j)}
=\sum_{k=0}^\infty \sum_{j_1,\dots,j_{4k}\ge 1}
\rho_{j_1}\cdots \rho_{j_{4k}}=:S
\end{align*}

where $\rho_k=p_k/(1-p_k)$ is the odds corresponding to $p_k$.

Now we are going to use a little trick, namely that

\begin{align*}
\prod_{j=1}^\infty (1 + \nu \rho_j) &=
\sum_{k=0}^\infty \nu^k \sum_{j_1,\dots,j_k} \rho_{j_1}\cdots\rho_{j_k}.
\end{align*}

Summing the above expression for $\nu=1$, $i$, $i^2=-1$, and $i^3=-i$ respectively, where $i=\sqrt{-1}$, we get

\begin{align*}
\prod_{j=1}^\infty (1 + \rho_j)
+\prod_{j=1}^\infty (1 + i \rho_j)
+\prod_{j=1}^\infty (1 - \rho_j)
+\prod_{j=1}^\infty (1 - i \rho_j)
=4S
\end{align*}

since

$$
1^k+i^k+(-1)^k+(-i)^k=\begin{cases}
4,&\text{if } k \bmod 4=0,\\
0,&\text{otherwise}.
\end{cases}
$$

Consequently,

$$
x=\frac 14
\prod_{n=1}^\infty (1-p_n)
\left[
\prod_{n=1}^\infty (1 + \rho_n)
+\prod_{n=1}^\infty (1 + i \rho_n)
+\prod_{n=1}^\infty (1 - \rho_n)
+\prod_{n=1}^\infty (1 - i \rho_n)\right].
$$

Finally, in the case $p_n=\frac 1{2n^2+1}$, one can evaluate these products using, e.g., formulae 4.5.68–69 from the *Handbook of Mathematical Functions* by Abramowitz and Stegun.

Note that this method generalizes easily to a wheel with any number of positions $M\ge 2$: simply replace $i=\sqrt[4]{1}$ with the primitive $M$th root of unity $e^{2\pi i/M}$.
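Assuming the setup implicit in the derivation (position $n$ fires independently with probability $p_n$, each firing rotates the wheel one step, and the wheel returns to $0$ iff the total number of firings is a multiple of $4$), the closed form can be cross-checked numerically; this sketch is our own check, not part of the published solution:

```python
import numpy as np

# Truncate the infinite products at N positions, with p_n = 1/(2n^2 + 1).
N = 400
n = np.arange(1, N + 1)
p = 1.0 / (2 * n**2 + 1)
rho = p / (1 - p)

# x = (1/4) * prod(1 - p_n) * sum over nu in {1, i, -1, -i} of prod(1 + nu*rho_n);
# the imaginary parts of the conjugate terms cancel, so take the real part.
x_formula = (0.25 * np.prod(1 - p)
             * sum(np.prod(1 + nu * rho) for nu in (1, 1j, -1, -1j))).real

# Direct simulation: count the firings in each trial and check divisibility by 4.
rng = np.random.default_rng(0)
trials = 100_000
counts = np.zeros(trials, dtype=np.int64)
for pn in p:
    counts += rng.random(trials) < pn
x_sim = np.mean(counts % 4 == 0)
```

The two estimates should agree to within Monte Carlo error, which gives some reassurance that the roots-of-unity filter was applied correctly.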

Categories: Math and Stats

The IMS Elections this year are for the IMS President and five places on Council. The nominees are:

For IMS President:

- Susan Murphy

For IMS Council:

- Vivek S. Borkar
- Vanja Dukic
- Christina Goldschmidt
- Ruth Heller
- Susan Holmes
- Xihong Lin
- Richard Lockhart
- Gabor Lugosi
- Nicolai Meinshausen
- Kerrie Mengersen

You can read their profiles and statements on pages 14–20 of the March 2018 IMS Bulletin, or on the IMS website.

Voting will be open soon (at the time of writing) and closes May 20.

Categories: Math and Stats

The 2018 Wolf Prize for Mathematics has been awarded to **Alexander Beilinson** and **Vladimir Drinfeld**, both of the University of Chicago, "for their groundbreaking work in algebraic geometry, representation theory, and mathematical physics." The prize, which carries with it a cash award of US$100,000, was announced at a special event hosted by the President of Israel, Mr. Reuven Rivlin. Beilinson and Drinfeld are the authors of the 2004 AMS title *Chiral Algebras*. Read more about their work.

The Wolf Prizes have been awarded annually since 1978 to renowned scientists and artists for their achievements for humanity and for friendly relations between peoples, regardless of nationality, race, color, religion, gender, or political outlook, in fields including agriculture, chemistry, mathematics, medicine, physics, and the arts. Award winners are selected by international awards committees of three members, all of whom are scholars and experts in their fields.

Categories: Math and Stats


**Drawing Voting Districts and Partisan Gerrymandering: Preparing for 2020**
**A Statement endorsed by the American Statistical Association and the Council of the American Mathematical Society**

*Providence, RI:* The American Statistical Association (ASA) and the Council of the American Mathematical Society (AMS) have issued a joint statement to inform discussions and planning around the drawing of voting districts as we approach the 2020 census. This marks the first time in recent history that the two organizations have issued a joint statement of broad interest to the American public.

AMS President Ken Ribet said, "Our community is poised to play a central role in ongoing discussions about methods for creating voting districts and the evaluation of existing and proposed district maps. It has been a pleasure for me to observe the recent explosion in interest in this topic among colleagues and students in mathematics and statistics. I anticipate that the new statement by the ASA and AMS Council will lead to increasing transparency in the evaluation of districting methods."

The statement is organized around the following three facts:

1. Existing requirements on districts do not prevent gerrymandering.

2. It has become easier to design district plans that favor partisan outcomes with greater confidence.

3. Modern mathematical, statistical and computing methods can identify district plans that favor partisan outcomes.

"While these points may be common knowledge in some circles, it's important they be stated by objective and respected authorities like the AMS and the ASA and for them to be more widely known in the redistricting discussions around the 2020 Census," noted 2018 ASA President Lisa LaVange. "Having lived in both Maryland and North Carolina in the last few years, I sincerely hope policymakers will accept our offer of help to ensure a healthy and vibrant democracy."

The statement, while discussing Fact 2, cites "the growing use of big data and the increased role of predictive modeling of voting outcomes by election campaigns," and asserts, "Using these tools, legislators easily can draw district plans that satisfy political and legal criteria, yet also are highly likely to result in one party winning a disproportionate share of the elections relative to the number of people who voted for that party."

To help identify voting district plans that give one party an unfair advantage, the two societies say a key step is to specify and calculate metrics that illuminate the partisan nature of proposed plans, and they briefly describe general principles and approaches for doing so.
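The statement does not endorse any particular metric, but one widely discussed example of the kind of metric meant here is the *efficiency gap*, which compares the two parties' "wasted" votes. A minimal sketch under a simplified two-party model; the function and its conventions are illustrative, not drawn from the statement:

```python
def efficiency_gap(district_votes):
    """Efficiency gap for a two-party election.

    district_votes: list of (votes_A, votes_B) pairs, one per district.
    A vote is 'wasted' if cast for the losing side, or cast for the winning
    side in excess of the simple majority needed to win that district.
    Returns (wasted_A - wasted_B) / total votes; positive values suggest
    the map disadvantages party A.
    """
    wasted_a = wasted_b = total = 0.0
    for a, b in district_votes:
        total += a + b
        need = (a + b) / 2  # simplified winning threshold
        if a > b:
            wasted_a += a - need   # A's surplus votes are wasted
            wasted_b += b          # all of B's losing votes are wasted
        else:
            wasted_a += a
            wasted_b += b - need
    return (wasted_a - wasted_b) / total

# A symmetric map wastes equally: gap 0. A map packing party A wastes more A votes.
print(efficiency_gap([(60, 40), (40, 60)]))
print(efficiency_gap([(75, 25), (75, 25), (40, 60), (40, 60)]))
```

On the symmetric two-district map the gap is zero; packing and cracking one party's voters pushes it away from zero, which is the asymmetry such metrics are designed to expose.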

"Statistical and mathematical standards and methods can be very helpful to inform decision-makers and the public about partisan gerrymandering," remarked the statement's main architect, Jerry Reiter, 2015-2017 chair of the ASA Scientific and Public Affairs Advisory Committee. "The statement acknowledges the value of partisan asymmetry as a standard, and it highlights some methods for measuring partisan asymmetry. The statement does not endorse any one method, as ultimately this issue is determined by policymakers and the courts."

Finally, the statement notes "that open and transparent research practices have facilitated more robust, reliable and accepted findings involving mathematics and statistical science" and suggests "such openness and transparency could benefit the processes for evaluating and drawing voter districts."

In issuing the statement, the two societies also offer to connect decision-makers and policymakers with mathematical and statistical experts.


Steve Pierson

ASA Director of Science Policy

703-302-1841

spierson@amstat.org

# # # #

**About the American Mathematical Society**

Founded in 1888 to further mathematical research and scholarship, the AMS fulfills its mission through programs and services that promote mathematical research and its uses, strengthen mathematical education and foster awareness and appreciation of mathematics and its connections to other disciplines and everyday life.

**About the American Statistical Association**

The ASA is the world’s largest community of statisticians and the oldest continuously operating professional science society in the United States. Its members serve in industry, government and academia in more than 90 countries, advancing research and promoting sound statistical practice to inform public policy and improve human welfare.

Categories: Math and Stats