Please read the whole instruction.
You will need to complete Task 2, cells C to L. In each of those cells, fill in every code snippet marked [TODO] or TO_BE_COMPLETED so that the cell produces the expected output shown after it. You might not see the exact numbers shown in the expected output, but the format and structure have to be the same. I have provided 2 data files needed to run all the code; you can download them by running cell block B. At the end of Task 2, the notebook will generate 9 output CSV files. Plot those CSV files using the code provided in the cell "Line Plot". Your 9 output files should generate 9 plots at the end of the notebook. Your plots should be similar to the one shown at the end.
Please use Apache Spark RDDs with Python. Most of the code is provided; you only need to complete the parts marked TO_BE_COMPLETED. This is a big data analysis problem, so try to optimize your code so that when I take all your scripts and run them on a server with a 10 GB CSV file, they run in 2 minutes.
We still use the SafeGraph data to better understand how NYC responded to the COVID-19 pandemic. If you have any doubts about the data, please consult SafeGraph's documentation for the Places Schema and Weekly Patterns.
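To illustrate the kind of RDD-friendly code the performance requirement calls for, here is a minimal sketch of a per-partition parser. It is not the notebook's solution. The function name extract_daily_visits and the column positions (place id in column 0, week start date in column 1, a JSON list of daily visits in column 2) are assumptions made for illustration; check the real positions against the SafeGraph documentation. It also assumes the CSV header is the first line of partition 0 and that no record spans multiple lines.

```python
import csv
import json
from datetime import date, timedelta
from itertools import islice


def extract_daily_visits(partition_index, lines):
    """Parse one partition of CSV lines into ((place_id, 'YYYY-MM-DD'), visits) pairs.

    Written as a generator so Spark can stream records without building
    intermediate lists. Column positions below are hypothetical.
    """
    if partition_index == 0:
        # Assumption: the header row is the first line of the first partition.
        lines = islice(lines, 1, None)
    # csv.reader handles quoted fields such as "[3,0,5]" that contain commas.
    for row in csv.reader(lines):
        place_id, week_start, visits_json = row[0], row[1], row[2]
        first_day = date.fromisoformat(week_start[:10])
        for offset, visits in enumerate(json.loads(visits_json)):
            if visits:  # skip zero-visit days to reduce shuffle volume
                day = first_day + timedelta(days=offset)
                yield ((place_id, day.isoformat()), visits)


# Possible use with Spark (not executed here; assumes a SparkContext `sc`
# and a hypothetical file name):
# daily = sc.textFile('weekly_pattern.csv') \
#           .mapPartitionsWithIndex(extract_daily_visits) \
#           .reduceByKey(lambda a, b: a + b)
```

Doing the header skip and parsing inside a single mapPartitionsWithIndex call avoids an extra pass over the data, and reduceByKey combines values on each partition before shuffling, which generally matters more than micro-optimizations when the input is around 10 GB.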
Problem Description
To assess the food access problem in NYC before and during the COVID-19 pandemic, we would like to plot the visit patterns for all food stores (including restaurants, groceries, delis, etc.), as in the plot shown below.