Crunching the Numbers
Vassar Takes Part in Nationwide Data Fest

Emily Janoski ’20 and three of her friends recently spent an entire weekend crunching data about the Canadian Rugby 7s Senior Women’s Team—and they weren’t alone in this curious pursuit. At DataFest, an annual event for statistics and data science aficionados in colleges around the country, about 60 Vassar students and more than 2,000 others analyzed the same information.

The first DataFest took place on the UCLA campus in 2011, when about 30 students analyzed five years of arrest records compiled by the Los Angeles Police Department. Since then, it has become a nationwide event at dozens of colleges and universities. Vassar has participated every year since 2016.

To ensure a level playing field, participants aren’t given the data until the competition begins at each site. Then they spend the next two days competing for prizes in three categories: Best Analysis, Best Use of External Data, and Best Visualization. “Secrecy of the data is part of the challenge and fun for DataFest participants,” said Associate Professor of Mathematics and Statistics Ming-Wen An, who co-organized the Vassar event with Assistant Professor of Mathematics and Statistics Jingchen Hu.

Assistant Professor of Biology Leroy Cooper (standing) mentors Carl Cao ’22 (left) and Anish Kumthekar ’22, members of Team I Have No Idea.

Despite knowing virtually nothing about the sport of rugby, Janoski and teammates Sooyeon Baek ’20, Elle McKenzie ’20 and Jaein Kim ’20 won the Best Visualization prize in the Vassar competition. Using data supplied by the rugby players and their coaches, Janoski’s team determined that the Canadian players’ self-reported motivation significantly influenced the outcome of their games. “One part of our analysis found that for every 0.1 point increase in the team’s degree of motivation, there was a predicted increase of 2.7 net points in the game outcome,” she said. “We concluded that motivation is a crucial element in surpassing physical limitations, a testament to their incredible resilience.”

Janoski’s team, Linest, competed against 12 other teams from Vassar and two from Marist College, huddling over laptop computers in classrooms in Kenyon Hall for two days before making their presentations to a panel of judges in Rockefeller Hall on the afternoon of April 14. Predictably, most of the participants were majoring in Computer Science or in Mathematics and Statistics, but the field also included five Economics majors, three Applied Mathematics majors, three Political Science majors, three Biology majors, two Psychological Science majors, and one major each in Biochemistry, Chemistry, Math Education, Urban Studies, and Science, Technology and Society.

Judges (left to right): Hanna Ginzburg, Marc Smith, Brendan Flanagan, and Eric Hepp with Best Visualization winners Sooyeon Baek, Elle McKenzie, Emily Janoski, and Jaein Kim of Team Linest.

Other winners at Vassar’s DataFest were Kai Matheson ’19, Jordan Buhmann ’19 and Charles Hooghkirk ’19 of Team Akatsuki, for Best Use of External Data; and Team 456 of Marist College, for Best Analysis.

While the team members made the final decisions on how to analyze the data, they were assisted by roving consultants—Vassar faculty and staff and professionals in the field recruited by An and Hu. Joseph Phillips, a Statistics instructor at Manhattanville College, said he was impressed with Janoski and her teammates. “The students were really quick to find patterns in the data,” Phillips said. “I’m not only interested in helping the students in this competition but my own students as well, and being part of this event gave me new ideas on approaches I can take back to my own classes.”

Hu said she was pleased with the competition. “I was impressed by the students’ curiosity and perseverance of tackling a seemingly simple but actually challenging task in front of them for an entire weekend,” she said. “It was also encouraging to hear many students express their interest in data-related courses on campus. I look forward to seeing some of them in class in the near future, and of course, at the next DataFest!”