The art and science of big data analysis

Recent discussion has focused on the existence of several data gaps across economic, social and political divisions, deficits that we leave unaddressed at our own peril. But there is another deficit that I would argue has gone relatively unnoticed and is no less important: Canada’s data analysis skills gap.

More collaborative learning and engagement between data science and the arts is needed if Canada’s data deficit is to be eliminated. Much of the problem could be addressed by ensuring that people know not only how to search for data, but also how to interpret it.

I am an economics professor and director of the Master of Public Service program at the University of Waterloo, where I have been conducting policy-oriented research using large datasets for over two decades.

Analyzing Big Data

We are living in the Big Data era, in which massive amounts of information are gathered at ever-decreasing cost. Each post on Facebook, Twitter and Instagram is a data point that can be archived and become part of a historical dataset. In an age when governments are making data more open and accessible, there is significant demand for employees who can aggregate such large sets of information in a meaningful way and provide key insights.

In response, many Big Data analytics and data science undergraduate and graduate programs have emerged at universities across the country. Typically, these are housed in departments of computer science, mathematics, statistics and engineering.

A humanities approach to data

From a policy perspective, limited exposure to social science and humanities courses is a key missing ingredient of many of these programs. This may seem puzzling: why should data science programs require arts courses?

The social sciences and humanities train students in theories of behaviour, which are needed to explain data trends and draw insightful narratives from them. This allows students of the arts to be an integral part of any model-building process that aims to predict human behaviour and choices.

Given Facebook’s recent controversy over data collection practices, data science students would also benefit from courses in data ethics and governance. Students need to understand the importance of personal privacy and data protection, which take precedence over actual data analysis.

On the flip side, students of the arts should also be encouraged to take challenging courses offering contemporary methods for analyzing big data. These courses may include machine learning, which is not yet common in the curriculum of social sciences or humanities.

Datafests and hackathons

Furthermore, ground-level, scalable ideas that have the potential to bridge the data divide between the sciences and the arts are required. On many university campuses, data festivals and hackathons are becoming increasingly common. Typically, students at these events are organized into teams and have about two days to analyze data and prepare a summary of findings or recommendations. The American Statistical Association has organized many datafests at various universities.

Typically, however, the emphasis is on mining private-sector data. Canada’s first policy datafest, by contrast, gave graduate students in the arts public datasets to work with. The University of Waterloo hosted the event, in which students analyzed various open datasets, in partnership with Innovation, Science and Economic Development Canada, the Government of Ontario and the Royal Bank of Canada.

Datafests demonstrate the data analytics expertise of students of the humanities and social sciences. Their creativity, critical thinking and diversity of thought, all of which are key components of these disciplines, are important for the development, research and analysis of policy issues.

Most of the research presented at the 2019 University of Waterloo Datafest was based on publicly available open data from various government websites. Using open data allows datafests to be low-cost ventures with significant returns, including greater awareness of information that is free and easily accessible. In addition, datafests are an opportunity for experiential education, where students are encouraged to build relevant skills and must work in teams to analyze contemporary policy issues.

In this way, the seeds are planted for a critical mass of people who are skilled in sophisticated data analysis and who can identify data deficits and offer recommendations on how to eliminate them. And that’s an important point. By posting more information on public websites, the government can try to eliminate the national data deficit. However, if the ability to analyze data and make correct inferences is not widespread, this will not be effective or useful.

Reducing the skills gap

Of course, Statistics Canada, the provinces and municipalities should be given priority for increased resources to ensure that dedicated offices and staff are able to assess data needs across departments. This is a short-term perspective, however, as it does not address the data analysis skills gap.

A long-term strategy to ensure a mix of training across different disciplines and encourage public, private and university partnerships should result in a significantly lower data deficit for Canada by reducing the data analytics skills gap and encouraging statistical literacy.

Steve Jobs summed up his business strategy by saying: “It is in Apple’s DNA that technology alone is not enough. It’s technology married with liberal arts, married with the humanities, that yields us the results that make our heart sing.” While this was specific to the intersection between technology and the arts, it resonates deeply when considering how the arts and sciences can advance society further by encouraging cooperative data science learning.
