§You have been approached by a client who analyses atmospheric science data and climate models.
§They have developed a new analysis technique, but it takes too long to run for them to use it.
§They have asked you to investigate the use of big data techniques to reduce the processing time.
§They have a large volume of data to process, and the analysis needs to be repeated frequently.
§They have the following basic requirements:
§The current analysis takes approximately 2.5 hours to process the climate model output data for a 1-hour time period.
§They wish to complete the analysis of the data for a 24-hour period (25 data sets) in under 2 hours.
§The data for a single day of model output is approximately 250MB.
§However, they have over 100 years' worth of data to analyse, making a total of over 9 TB.
§It is not possible to hold all of this in memory at one time, so the new process should load only 1 hour of data for processing at a time.
§If parallel processing is to occur, then 1 hour of data per worker can be loaded as needed.
§The data has been provided by the European Centre for Medium-Range Weather Forecasts (ECMWF).
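§The figures in the requirements above can be turned into a rough target before any code is written. The sketch below is only back-of-envelope arithmetic: it assumes ideal linear scaling (no parallel overhead), 365.25 days per year, and decimal units (1 TB = 1,000,000 MB), none of which are stated in the brief. Note also that a single data set currently takes 2.5 hours, which is already longer than the 2-hour target, so handing whole data sets to workers at the current per-data-set speed could not meet the target on its own; the work within each hour would also need to be divided or sped up.

```python
import math

# Figures taken from the brief.
HOURS_PER_DATASET = 2.5   # current analysis time for one hourly data set
DATASETS_PER_DAY = 25     # data sets covering one 24-hour period
TARGET_HOURS = 2.0        # one day must be analysed in under this time
MB_PER_DAY = 250          # approximate model output for a single day
YEARS_OF_DATA = 100

# ASSUMPTIONS (not in the brief): 365.25 days per year, decimal terabytes.
DAYS_PER_YEAR = 365.25
MB_PER_TB = 1_000_000

# Total sequential cost of one day of model output.
sequential_hours = HOURS_PER_DATASET * DATASETS_PER_DAY

# Speed-up needed to bring that down to the target time.
required_speedup = sequential_hours / TARGET_HOURS

# ASSUMPTION: ideal linear scaling, so N workers give a speed-up of N.
# "Under 2 hours" is strict, so take the next whole number above the ratio.
min_workers_ideal = math.floor(required_speedup) + 1

# Sanity check on the stated archive size.
total_tb = MB_PER_DAY * DAYS_PER_YEAR * YEARS_OF_DATA / MB_PER_TB

print(f"sequential time for one day: {sequential_hours} h")
print(f"required speed-up: {required_speedup}x")
print(f"minimum workers (ideal scaling): {min_workers_ideal}")
print(f"approximate archive size: {total_tb:.2f} TB")
```

§Real scaling is rarely ideal, so the worker count from this calculation is a lower bound to be checked against measured timings rather than an answer in itself.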
§You have been tasked with investigating the use of parallel processing to achieve the target analysis time, with the following expectations:
§Test and compare the processing speed of sequential and parallel processing
§Extrapolate your findings to indicate the number of processors required
§Test how your code responds to common errors, e.g.
§data that is text instead of numeric
§use of NaN in the data as an error code.
§Run automated tests that allow your client to set the code running and return later to see the results, without intervention.
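§The error-handling and unattended-run expectations above could be approached along the following lines. This is a minimal sketch, not the client's code: the names `DataError`, `validate_values`, `run_unattended` and the logger name are hypothetical, and each data set is assumed to arrive as a flat list of values, which the brief does not specify.

```python
import logging
import math

logger = logging.getLogger("hourly_analysis")  # hypothetical logger name


class DataError(ValueError):
    """Raised when an hourly data set fails validation."""


def validate_values(values):
    """Return the values as floats, or raise DataError on text or NaN entries."""
    cleaned = []
    for index, value in enumerate(values):
        # Text (or any other non-numeric type) where a number is expected.
        if not isinstance(value, (int, float)):
            raise DataError(f"non-numeric value {value!r} at index {index}")
        # NaN used as an error code in the data.
        if math.isnan(value):
            raise DataError(f"NaN at index {index}")
        cleaned.append(float(value))
    return cleaned


def run_unattended(datasets, analyse):
    """Validate and analyse every data set, recording failures instead of stopping."""
    results, failures = {}, {}
    for name, values in datasets.items():
        try:
            results[name] = analyse(validate_values(values))
        except DataError as exc:
            # Log the problem and carry on so the run needs no intervention.
            logger.error("skipping %s: %s", name, exc)
            failures[name] = str(exc)
    return results, failures


# Illustrative input only: one clean data set, one with text, one with NaN.
datasets = {
    "hour_00": [1.0, 2.0, 3.0],
    "hour_01": [1.0, "n/a", 3.0],
    "hour_02": [1.0, float("nan")],
}
results, failures = run_unattended(datasets, analyse=sum)
```

§Here `sum` stands in for the real analysis. The bad data sets end up in `failures` with a message describing the problem, so the client can review them after the run has finished.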
§Working code that demonstrates:
§Loading of only the data required for the processing taking place
§Sequential processing of the data
§Parallel processing of the data
§Plots of the comparisons between sequential processing and parallel processing with different numbers of workers
§Automated testing of your code to deal with pre-defined data error types.
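§One possible shape for the sequential and parallel demonstrations listed above is a small timing harness. The sketch below uses Python's standard `concurrent.futures` module. The function names are hypothetical, and `load_hour` and `analyse` are placeholders for the client's own loading and analysis routines, which are not described in the brief. Each task loads its own hour of data inside the worker, so only the data being processed is held in memory.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def process_hour(hour, load_hour, analyse):
    """Load a single hour of data inside the worker, then analyse it."""
    return analyse(load_hour(hour))


def run_sequential(hours, load_hour, analyse):
    """Process the hours one after another in the current process."""
    return [process_hour(hour, load_hour, analyse) for hour in hours]


def run_parallel(hours, load_hour, analyse, workers,
                 executor_cls=ProcessPoolExecutor):
    """Process the hours on a pool of workers, keeping results in input order."""
    # With the default process pool, load_hour and analyse must be
    # module-level functions so that they can be sent to the workers.
    with executor_cls(max_workers=workers) as pool:
        futures = [pool.submit(process_hour, hour, load_hour, analyse)
                   for hour in hours]
        return [future.result() for future in futures]


def time_call(func, *args, **kwargs):
    """Return (result, elapsed wall-clock seconds) for one call."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def compare(hours, load_hour, analyse, worker_counts,
            executor_cls=ProcessPoolExecutor):
    """Time the sequential run and one parallel run per worker count."""
    expected, sequential_seconds = time_call(
        run_sequential, hours, load_hour, analyse)
    parallel_seconds = {}
    for workers in worker_counts:
        got, seconds = time_call(
            run_parallel, hours, load_hour, analyse, workers, executor_cls)
        # A timing is only meaningful if the parallel answer is unchanged.
        if got != expected:
            raise RuntimeError(f"results differ with {workers} workers")
        parallel_seconds[workers] = seconds
    return sequential_seconds, parallel_seconds
```

§Dividing the sequential time by each parallel time gives the measured speed-up for that worker count, and these values are what the comparison plots would show. A process pool is the default here on the assumption that the analysis is CPU-bound; `executor_cls` can be swapped for another executor if that assumption does not hold.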