
College athletes made CNN headlines two days after the big event, the last BCS college football championship game. In its investigation into the academic performance of college athletes, CNN found

public universities across the country where many students in the basketball and football programs could read only up to an eighth-grade level. The data obtained through open records requests also showed a staggering achievement gap between college athletes and their peers at the same institution.
This is not an exhaustive survey of all universities with major sports programs; CNN chose a sampling of public universities where open records laws apply. We sought data from a total of 37 institutions, of which 21 schools responded. The others denied our request for entrance exam or aptitude test scores, some saying the information did not exist and others citing privacy rules. Some simply did not provide it in time.
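A quick back-of-the-envelope check using only the counts in the quote (the variable names are just illustrative): a little under 57% of the schools CNN asked actually returned data, which is worth keeping in mind before generalizing from the 21 respondents.

```python
# Counts quoted in the CNN excerpt above.
requested = 37  # public universities CNN asked for records
responded = 21  # universities that sent data back

nonresponse = requested - responded
response_rate = responded / requested

print(f"{responded} of {requested} responded ({response_rate:.1%}); "
      f"{nonresponse} did not")
```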

Details of the data are animated here. The list of schools to which CNN sent requests is shown on the CNN webpage:

Credit: CNN

Wait a second! I cannot believe my eyes.

There are 38 logos for the 37 schools CNN investigated. Are you serious? Is it me who cannot count, or what? Then, when I read the “details” more carefully, I found the following statement above the picture:

CNN sought records from nearly 40 public universities, and got data back from 21.

which seems to be a less precise but more accurate description of their data collection procedure. OK, enough joking; on to the more serious issue.

The point is that students who cannot read well were put in a situation where they had little chance of finishing their degrees, or even of meeting the NCAA academic standards, unless they or someone else cheated! More importantly, how many such cases are there? What percentage of student athletes are in this situation? No one is telling…

What a system!

It has been inspiring to watch Hans Rosling give impressive talks about numbers and statistics. If you haven’t seen any of his great presentations, here is one example:

Chances are you haven’t seen his wild side before. I just came across the article “Hans Rosling: the man who makes statistics sing”, in which he is referred to as “the ‘Jedi master’ of data”, and not just because of his magical power with data: the professor’s main hobby is sword swallowing.

What? Sword swallowing? Yes! There is a video on YouTube showing him doing it (at around 8 min 30 sec).

Wow! This is eye-opening.

Big congratulations to Terry Speed, who won the 2013 Prime Minister’s Prize for Science.


On ABC News Breakfast, he talks about the prize, the pride, and the role statistics played in the O.J. Simpson murder case, in which he testified as an expert witness.

Listen to him and you will also figure out where the prize money, or at least half of it, will go :)

Here is an old post about Terry’s Stuff.

With the 2013-2014 NFL preseason games underway, the experts’ game-prediction business is about to start again. I cannot wait…

Here are ESPN’s expert picks for week 1.

CBS Sports also joined the expert-pick business this year with its own panel of experts:

Fans must be eager to know who the best expert in this NFL prediction game is, and there are already questions about it in the comment section of ESPN’s expert picks page.

Based on the records I collected from ESPN over the last two years, we clearly have a winner: Seth Wickersham, who correctly predicted 69.9% and 65.2% of games in the last two NFL seasons, respectively, the best among ESPN experts in both seasons (tied with Schlereth in 2012). Here are the overall prediction accuracy records of each expert for the last two seasons, with more details here.

Picks         2013    2012
Allen         60.2%   65.0%
Golic         63.3%   62.9%
Hoge          66.8%   63.5%
Jaworski      65.6%   64.6%
Mortensen     69.5%   60.5%
Schefter      62.5%   61.7%
Schlereth     64.5%   65.2%
Wickersham    69.9%   65.2%
Jackson       62.9%   N/A
Johnson       60.2%   N/A
Ditka         64.8%   N/A
Carter        66.0%   N/A
Accuscore     64.1%   68.0%
Pick’em       65.9%   68.0%

Adam Schefter has the worst two-season average (62.1%) among the experts who made picks in both seasons, and Keyshawn Johnson tied Allen for the worst record last season (60.2%).
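The rankings above can be re-derived directly from the table. This is a small sketch, not how the records were originally tallied: it averages the two seasons for each expert who picked in both, and lists everyone tied for the lowest 2013 figure. Accuscore and Pick’em are left out on the assumption that only individual experts are being ranked.

```python
# Overall pick accuracy (%) from the table: name -> (2013, 2012).
# None marks N/A (no picks that season). Accuscore and Pick'em are
# excluded because they are not individual experts.
records = {
    "Allen": (60.2, 65.0),
    "Golic": (63.3, 62.9),
    "Hoge": (66.8, 63.5),
    "Jaworski": (65.6, 64.6),
    "Mortensen": (69.5, 60.5),
    "Schefter": (62.5, 61.7),
    "Schlereth": (64.5, 65.2),
    "Wickersham": (69.9, 65.2),
    "Jackson": (62.9, None),
    "Johnson": (60.2, None),
    "Ditka": (64.8, None),
    "Carter": (66.0, None),
}

# Two-season average for experts who picked in both seasons.
both = {name: (y13 + y12) / 2
        for name, (y13, y12) in records.items() if y12 is not None}
best_two = max(both, key=both.get)
worst_two = min(both, key=both.get)

# Everyone sharing the lowest 2013 accuracy (ties included).
low_2013 = min(y13 for y13, _ in records.values())
worst_2013 = sorted(name for name, (y13, _) in records.items() if y13 == low_2013)

for name in sorted(both, key=both.get, reverse=True):
    print(f"{name:<11} {both[name]:.2f}%")
print("Best two-season average:", best_two)
print("Worst two-season average:", worst_two)
print("Worst in 2013:", ", ".join(worst_2013))
```

Keeping ties in a list rather than taking a single `min` is what surfaces the Allen/Johnson tie at 60.2%.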

Chris Mortensen’s results are the most curious ones, making him the winner of the most improved expert award in 2013. He did really well in the 2013 season (69.5%), but his predictions were the worst of the worst in 2012 (60.5%). Large variability? Let’s see if he keeps it up this year :)
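The “most improved” label can be checked the same way. This sketch recomputes each expert’s change between the two seasons from the table; it only measures the size of the swing and does not by itself answer whether that reflects variability or a real change.

```python
# Accuracy (%) for experts who picked in both seasons: name -> (2012, 2013).
records = {
    "Allen": (65.0, 60.2),
    "Golic": (62.9, 63.3),
    "Hoge": (63.5, 66.8),
    "Jaworski": (64.6, 65.6),
    "Mortensen": (60.5, 69.5),
    "Schefter": (61.7, 62.5),
    "Schlereth": (65.2, 64.5),
    "Wickersham": (65.2, 69.9),
}

# Change from 2012 to 2013, in percentage points.
change = {name: y13 - y12 for name, (y12, y13) in records.items()}
most_improved = max(change, key=change.get)
lowest_2012 = min(records, key=lambda name: records[name][0])

for name in sorted(change, key=change.get, reverse=True):
    print(f"{name:<11} {change[name]:+.1f} points")
print("Most improved:", most_improved)
print("Lowest in 2012:", lowest_2012)
```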

Some additional background information: Accuscore is based on simulations (algorithms and data) by accuscore.com and Pick’em is the average of all predictions by NFL fans who submitted their picks on ESPN.com before the game (kind of a “crowd prediction” by non-experts).
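ESPN does not say exactly how fan picks are combined into Pick’em, so the sketch below assumes the simplest version of a crowd prediction: for each game, the crowd picks whichever team most fans chose, and the crowd is then scored like any expert. The team names and picks are made up purely for illustration and are not ESPN data.

```python
from collections import Counter

def crowd_pick(fan_picks):
    """Return the team chosen by the most fans.

    Ties go to the team that appears first in fan_picks
    (Counter.most_common keeps first-seen order for equal counts).
    """
    return Counter(fan_picks).most_common(1)[0][0]

def accuracy(games):
    """Fraction of games where the crowd pick matched the actual winner."""
    correct = sum(crowd_pick(picks) == winner for picks, winner in games)
    return correct / len(games)

# Hypothetical games: (fan picks, actual winner).
games = [
    (["Team A", "Team A", "Team B", "Team A"], "Team A"),  # crowd right
    (["Team C", "Team D", "Team D"], "Team C"),            # crowd wrong
    (["Team E", "Team E", "Team F"], "Team E"),            # crowd right
]

print(f"Crowd accuracy: {accuracy(games):.1%}")
```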

Unlike in the last two years, the ESPN expert pick page shows that the Accuscore prediction is no longer included this year. I wish ESPN still included this algorithm-based (statistical) prediction in the prediction game.

We also had fun comparing expert picks, algorithmic predictions, and crowd predictions for the 2011-2012 season.

For this year: more experts, more fun! Now, let the games begin! Are you ready for football (and those experts)?

The Crimson Tide and the Buckeyes are ranked first and second in the Associated Press preseason poll for the 2013 college football season. Alabama is going for its third consecutive national championship, something no team has done in the modern poll era. Meanwhile, Ohio State has never posted consecutive undefeated, untied seasons. Are they going to beat the odds?
Prof. Mark Berliner and Prof. Bill Notz share their thoughts in Rob Oller’s commentary, “Let’s crunch numbers for college football”, published in The Columbus Dispatch today.

The Crimson Tide is going for its third consecutive national championship. The last time a team went back-to-back-to-back during the modern poll era (1936-present) was never. [......]

“The law of averages doesn’t mean something becomes more likely as time goes on,” said Mark Berliner, a professor in the Department of Statistics at Ohio State. “Just because you’ve never seen three (national titles) in a row doesn’t make it more probable. In my mind it becomes even less likely. It’s an indication of how hard it is to do.” [......]
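Berliner’s first point is the gambler’s fallacy: if each season were an independent trial with the same probability p, a long drought would not make the event any more likely next time. The simulation below is a sketch under exactly that simplifying assumption (independent seasons, a constant p = 0.1 chosen arbitrarily); real seasons are not independent coin flips, and it says nothing about his further argument that a long absence hints at a small probability.

```python
import random

def rate_after_drought(p, drought, trials, seed=0):
    """Monte Carlo estimate of P(event this season | no event in the previous
    `drought` seasons), assuming every season is an independent trial with
    probability p (a deliberate simplification)."""
    rng = random.Random(seed)
    hits = eligible = 0
    for _ in range(trials):
        past = [rng.random() < p for _ in range(drought)]
        now = rng.random() < p
        if not any(past):  # keep only histories with a full drought
            eligible += 1
            hits += now
    return hits / eligible

p = 0.1  # hypothetical per-season probability
for drought in (0, 5, 20):
    est = rate_after_drought(p, drought, trials=50_000)
    print(f"after {drought:>2} misses: {est:.3f} (p = {p})")
```

All three estimates hover around p: under independence, the drought carries no information about the next season.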

Ohio State, meanwhile, has never posted consecutive undefeated/untied seasons. The Buckeyes finished 12-0 last year. Are they due for another round of perfection? It might help if the football program was birthed 10 million years ago.

“If an event has some positive probability of occurring, if given enough opportunities, eventually it will occur,” said Bill Notz, who also works in the OSU Department of Statistics. “So it’s possible, but the chance is very small.”
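Notz’s remark is the standard result that, for independent opportunities each with probability p > 0, P(at least one occurrence in n tries) = 1 - (1 - p)^n, which tends to 1 as n grows. The sketch below evaluates it; the 1% per-season chance is a made-up number, and independence across seasons is an assumption.

```python
import math

def prob_at_least_once(p, n):
    """P(event occurs at least once in n independent opportunities, each with probability p)."""
    return 1 - (1 - p) ** n

def opportunities_needed(p, target):
    """Smallest n with prob_at_least_once(p, n) >= target, for 0 < p < 1 and 0 < target < 1."""
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("need 0 < p < 1 and 0 < target < 1")
    n = math.ceil(math.log(1 - target) / math.log(1 - p))
    # Nudge n to correct any floating-point rounding at the boundary.
    while prob_at_least_once(p, n) < target:
        n += 1
    while n > 1 and prob_at_least_once(p, n - 1) >= target:
        n -= 1
    return n

p = 0.01  # hypothetical chance per season
for n in (1, 10, 100, 1000):
    print(f"{n:>5} seasons: {prob_at_least_once(p, n):.4f}")
print("Seasons for a 50% chance:", opportunities_needed(p, 0.5))
```

So it is “possible, but the chance is very small” in any given season, while over enough seasons the event becomes close to certain.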

My thoughts: Alabama winning a third straight title and Ohio State going undefeated both have a positive probability of happening, and one of them might indeed happen. However, they are surely not going to happen together :)

Go Bucks!

Old Statistics Books

Thanks to the retirement of some of our faculty members, a large collection of old statistics books appeared in the lunch room.

All right, can you guess the oldest book I found among them? Of course, it is called “Mathematical Statistics”, written by Henry Lewis Rietz (Professor of Mathematics, The University of Iowa).

Let’s make another guess: when was this book published? It was 19xy, where x is one more than the first digit (1) and y is two less than the second digit (9). The price marked on the front page at the time was $2.00.

I was curious about how Doug Wolfe, the former department chair and previous owner of the book, got it into his collection. He told me that he picked it up from the used-book section of a bookstore when he was a graduate student at the University of Iowa. Oh! That was a long time ago :)

Out of respect for history, I read the whole book. It puzzles me how similar its topics are to those of most “modern mathematical statistics” books. The chapter titles read:

  • I. THE NATURE OF THE PROBLEMS AND UNDERLYING CONCEPTS OF MATHEMATICAL STATISTICS
  • II. RELATIVE FREQUENCIES IN SIMPLE SAMPLING
  • III. FREQUENCY FUNCTION OF ONE VARIABLE
  • IV. CORRELATION
  • V. ON RANDOM SAMPLING FLUCTUATIONS
  • VI. THE LEXIS THEORY
  • VII. A DEVELOPMENT OF GRAM-CHARLIER SERIES

The first five chapters are pretty much what we teach in MathStat classes these days, and the last two chapters reflect the emphasis of the field at the time. On one hand, we could say we are still teaching very old stuff in our classes. On the other hand, it also shows that these concepts are fundamental and long-lasting, just like the concept of the integral in calculus.

From an educational point of view, delivering these concepts to the public and getting people used to thinking in these frameworks are valuable contributions from statisticians.

From a research point of view, the field has evolved enormously with the recent explosion of data collection and computing power. Where will new math help us in this data age? The law of large numbers, asymptotic upper bounds, or just building deeper and deeper networks (deep learning)?

Anyway, if we get tired of thinking, we can put the discussion aside and enjoy some neat book cover designs:

Introduction to Statistical Analysis by Wilfrid J. Dixon and Frank J. Massey Jr. (1951)

A Sampler on Sampling by Bill Williams (1978)

Applied Statistics, Principles and Examples by D.R. Cox and E.J. Snell (1981)

By the way, in case you are curious about how much Professor Henry Lewis Rietz’s “Mathematical Statistics” is worth now, you can find it on Amazon.com. When I checked, it was being sold at


The latest issue of the ICSA Bulletin published my interview, “A conversation with Professor Bin Yu”. It is quite long but informative. Here are some short passages I picked out, based on my personal bias.

[Before College]

A math book from a cousin gave me my first boost into math when I was in 3rd and 4th grade. I enjoyed computing exponentials and logarithms using a table in the book. I believe doing the math problems provided a refuge of certainty and safety for me during a very turbulent time in China.

Another big boost in my interest in mathematics occurred when I was in the Lab School of Normal University in Harbin. There I had a wonderful and extremely talented substitute math teacher, Jianye Chen (陈建业), in my second year of junior high. [......] Under his strong influence and, in some sense, fulfilling his unrealized dream of going to the math department at Peking University, I chose to do math at Peking University after receiving a very good score on the national college entrance examination in 1980.

[PKU]

The first math analysis discussion class was hard for me since I didn’t know how to do the problems. But you know, I really liked math and we had good professors. We didn’t interact a lot with the professors, because that was not the norm.

In the entrance exam to graduate school in Peking University, I came first in the math subject exams. However, the professor I wanted to work with did not take me after the oral exam. So I switched into Probability and Statistics, although I originally wanted to do Functional Analysis. That was actually a very good move, a forced one, but it has benefited me tremendously.

[Qualifying Exam at Berkeley]

Shi: Is it the same format as when we took it? 10 questions?

Yu: Yes. If you do three, I think, you pass.

[Marriage]

In the summer of 1987, I went back to China and got married to my boyfriend who went to graduate school in China in 1985 in architectural history. He was able to join me a year later in Berkeley and went to Berkeley’s School of Architecture. My American friends were a bit shocked to hear that I married someone that I hadn’t seen for two years. It was a bit risky, but looking back, it was the best decision in my life.

[Suggestion for Young Researchers]

So I would say to junior people who are just starting their careers: take more risks instead of being more careful. If you work in a very desirable field like Statistics, you cannot go too wrong. Ultimately, whether you enjoy your life depends on whether you are happy, not on whether you make the system happy. And the system actually becomes happy because you are happy.

[Current Status of Statistics]

I think we are in a golden era for Statistics as an intellectual field. But the field has to be broadly interpreted: a lot of people trained in other fields are also doing the type of work we do.

I think if we rise to the challenge, we will be the leading data scientists, bringing our great tradition of critical thinking with us while embracing machine learning, database, and computing challenges.

You take some risks, and you cannot really “fail” too much. You have a safety net: you have a Ph.D. in Statistics. How wrong could it go, right?

[Statistics in China]

Shi: From talking with people in China, I do feel that industry, especially high-tech companies, has a huge need for people who can analyze their growing volume of data. Meanwhile, in more scientific areas like biology and physics, there is the same need for people who can help design and analyze experiments and do better science. Is there anything universities in China can do to help foster this type of collaboration?

Yu: I think it is kind of happening already. Peking University is talking about a data science center. You have to have cross-discipline centers. Any culture change is going to be a slow process, but when there is a need, especially for economic reasons, things just happen in the end. The statistics majors in China, and here too, have to get on top of computing. At the senior level, it is easy to find collaborators because you have ideas and a record. If you are a beginner and you cannot even touch the data, who’s going to hire a statistics undergraduate to give advice to a CS undergraduate? It is a constant struggle to keep up with the computing training of our students. Eventually I hope they will be just as good as computer science majors. That would be the goal; then we will have both critical thinking and computing skills. I’m not as worried about the mathematical part, not because it is unimportant, but because we have been giving our students that, so it is not the urgent need. The weaker points are cross-field critical thinking and computing for statistics students.

[Statistics and Data Science]

Yu: [......] Lots of people think of statistics as counting numbers, but they don’t know all the exciting things we do. That’s a misconception. Either we go all the way out as a community to change it, which is an uphill battle, or we just embrace data science. Just start saying that we do data science. It is psychology. This is a personal opinion, not representing the view of IMS. I’m just wondering and I think it is a discussion worth having because of the popular unfavorable misconception of statistics.

Shi: Yes. I have colleagues who seldom read the Annals of Statistics. They think the journal is mainly about theoretical results, mostly asymptotics, but that is not the case.

Yu: It is a dilemma in China. Statistics (统计) is a first-level discipline (一级学科). Data science is not yet one of the disciplines (学科). But on certain occasions, we can say that we do data science. We are statisticians and we do data science. At least we should go that far.

[Statistics and Critical Thinking]

Yu: That’s a gradual process. As I see it, being the chair means confronting different opinions. As you said, you cannot form critical thinking without people countering you, even if they are just playing devil’s advocate. If it is all “great”, it is not critical thinking. Critical thinking is not the most natural thing in Chinese culture because we tend to want to agree with each other, which is a strength in lots of situations, but not in science. It is something where I think Western culture has an edge. In Chinese culture, there are things called “思辨” (speculative reasoning) and “承传” (passing down tradition), but it is more about listening to others than questioning.

I’m not disapproving when I critique, but some students might take it that way. So the challenge for me is how to train those students to become critical thinkers. It is almost as if they have to establish confidence first somehow.

[Data Collection and Quality in China]

Shi: I find it amazing that, on the Internet, comments about any data or article released by the Bureau of Statistics of China usually show that people don’t trust any of it. It doesn’t seem to matter what the report is about: when it says something is good, they don’t trust it; when it says something is bad, they don’t trust it either.

Yu: Yeah, that’s a big problem you bring up: data quality. It is not unrelated to plagiarism in research at every level. For statistics, if we cannot trust the data, we are done. Maybe theoretical statistics will develop further first, before data analysis or data science. But companies care a lot more about the quality of their data. They cannot fake their data as much because it is tied to their revenue. That’s why I say industry will play a huge role in pushing the development of statistics or data science, whatever it is called, in China.

Again, the full interview can be found here: A conversation with Professor Bin Yu
