Data and its Applications

“Big Data” is, as the name suggests, a term for lots and lots of information (data) that can be used to draw conclusions about a topic of interest. Collections of data like this are often referred to as datasets. One example is the health data of the UK, which could be looked at (analysed) for patterns or trends that could then be used to make decisions about health care in general.

There are many challenges facing Big Data projects. They include not only the practical challenges of collecting, storing, and analysing such vast amounts of data, but also the need to ensure information privacy. Data anonymisation tools can help by removing identifiable information from datasets. One such tool is the Cambridgeshire and Peterborough Foundation Trust (CPFT) Research Database. You can find more information about this from the home page of this website.
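To make the idea of removing identifiable information a little more concrete, here is a minimal Python sketch. It is not how the CPFT Research Database works: the record layout, the field names and the list of "identifying" fields are all hypothetical. Real anonymisation has to go further than simply deleting names and addresses, because combinations of the details that remain (for example, a rare condition in a small village) can still point to one person.

```python
from typing import Any

# Hypothetical set of fields assumed, for this sketch only, to identify a person directly.
DIRECT_IDENTIFIERS = {"name", "address", "nhs_number", "date_of_birth"}


def remove_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record without the directly identifying fields.

    The original record is left unchanged.
    """
    return {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}


# An invented participant record; every value here is made up.
patient = {
    "name": "A. Person",
    "address": "1 Example Street",
    "nhs_number": "000 000 0000",
    "date_of_birth": "1980-01-01",
    "age_group": "40-49",
    "exercises_regularly": True,
    "has_ntyh": False,
}

print(remove_identifiers(patient))
# {'age_group': '40-49', 'exercises_regularly': True, 'has_ntyh': False}
```

The non-identifying details (age group, lifestyle, whether the person has NTYH) are kept, because those are what a study actually needs to look for patterns.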

To show you the effect that the size of a dataset can have, we have constructed the imaginary example below.

Our study is looking at the illness “No-Time-on-Your-Hands-itis” (NTYH). This is a terrible disease in which people have absolutely no time on their hands to enjoy themselves! We are keen to find out more about potential risk factors for this disease. A risk factor is something about you that increases your chance of getting NTYH. It could be something about you personally, your lifestyle or where you live (e.g. whether you are male or female, whether you take regular exercise or not, whether you live in a town or a village, etc.).

So our first step is to recruit people to take part.

When the study begins, everyone is tested for NTYH. They are then all asked questions to find out which potential risk factors they have. The information gathered is displayed in the table below, labelled ‘RISK FACTORS’. To recruit more people, move the SLIDER labelled ‘number of participants’, which sits in the table above the ‘RISK FACTORS’ table. Men, women, girls and boys will all be recruited.
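The effect of moving the slider can be imitated with a short simulation. The minimal Python sketch below runs imaginary NTYH studies of different sizes and, for each one, estimates how much a single risk factor changes the chance of getting NTYH (the risk for people with the factor divided by the risk for people without it). Everything about it is an assumption made for illustration: the function name `simulate_study`, the even split between people with and without the factor, and the "true" chances of 10% without the factor and 20% with it, which make the true value 2.

```python
import random


def simulate_study(n_participants: int, seed: int = 0) -> float:
    """Simulate one imaginary study and return the estimated risk ratio.

    Hypothetical assumptions: every second participant has the risk factor,
    the true chance of NTYH is 10% without the factor and 20% with it,
    so the true risk ratio is 2.
    """
    rng = random.Random(seed)     # seeded so each run is repeatable
    cases = {True: 0, False: 0}   # people with NTYH, keyed by "has the factor?"
    totals = {True: 0, False: 0}  # people recruited, keyed by "has the factor?"
    for i in range(n_participants):
        has_factor = i % 2 == 0   # alternate, so the two groups are the same size
        chance = 0.20 if has_factor else 0.10
        totals[has_factor] += 1
        if rng.random() < chance:
            cases[has_factor] += 1
    # With too few people (or nobody without the factor getting NTYH),
    # the ratio cannot be calculated, so report "not a number".
    if totals[True] == 0 or totals[False] == 0 or cases[False] == 0:
        return float("nan")
    risk_with = cases[True] / totals[True]
    risk_without = cases[False] / totals[False]
    return risk_with / risk_without


# Three repeat studies at each size, like moving the slider and starting again.
for n in (20, 200, 2_000, 200_000):
    print(n, [round(simulate_study(n, seed), 2) for seed in range(3)])
```

Typically, the smallest studies give estimates that jump around a lot (or come out as `nan`, when nobody without the factor happened to get NTYH), while the largest studies settle close to the true value of 2. This is the point of the example: with only a few participants, chance alone can make a factor look much more or much less important than it really is.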

To determine which factors may increase or decrease the risk of developing NTYH, we will then look at the graph of all the factors and interpret the results together. On the graph:


  • A value of 1 means that the factor has no effect on the risk of developing NTYH

  • A value of 2 indicates there is twice the risk of developing NTYH

  • A value of 0.5 indicates there is half the risk of developing NTYH
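The three values in the list can be checked with simple arithmetic. The Python sketch below assumes the graph shows a risk ratio (also called a relative risk): the proportion of people with a factor who have NTYH, divided by the proportion of people without it who have NTYH. The page does not say exactly how its values are calculated, so treat this as an illustration; the function name and all the counts are made up.

```python
def relative_risk(cases_with: int, total_with: int,
                  cases_without: int, total_without: int) -> float:
    """Risk of NTYH among people with a factor divided by the risk among people without it."""
    risk_with = cases_with / total_with            # e.g. 20 out of 100 -> 0.2
    risk_without = cases_without / total_without   # e.g. 10 out of 100 -> 0.1
    return risk_with / risk_without


# Invented counts, chosen only to reproduce the three values in the list above.
print(relative_risk(10, 100, 10, 100))  # 1.0 -> no effect
print(relative_risk(20, 100, 10, 100))  # 2.0 -> twice the risk
print(relative_risk(5, 100, 10, 100))   # 0.5 -> half the risk
```

Because it is a ratio, the value only says how much the factor changes the risk compared with people who do not have it, not how common NTYH is overall: 20 in 100 versus 10 in 100 and 2 in 1,000 versus 1 in 1,000 both give a value of 2.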

© CLIMB Project, University of Cambridge