Create a MapReduce program called MovieRatings that will print all the average ratings for a movie based on user feedback

INSTRUCTIONS TO CANDIDATES

ANSWER ALL QUESTIONS

Assignment 5

The goal of this assignment is to learn to write a more complicated MapReduce program that processes big data sets in parallel using the MapReduce framework.

Create a MapReduce program called MovieRatings that will print all the average ratings for a movie based on user feedback
The program should work on the MovieLens data set, which can be downloaded from http://www.grouplens.org/node/73. Download the 100k data set (ml-100k.zip) into your Downloads folder and extract it into the ~/Downloads folder itself. For this assignment, the program uses the u.data and u.item files under the extracted folder. You can skip this step if you already completed assignment 3, because it was the data preparation step for that assignment.
When run, the MovieRatings MapReduce program should process the u.data file, which contains a list of movie ratings by users, and the u.item file, which holds movie-related information. In the u.data file, each user has rated at least 20 movies, and users and items are numbered consecutively from 1. The u.data file is randomly ordered; its fields are tab separated, in the order user id, item id, rating, timestamp. The u.item file contains several pieces of information about each movie and is pipe (|) separated. The README file states the wrong field separator for u.item, but its list of fields and their order is correct. Please look at both the actual file and the README file.
The program should take 2 arguments. The first argument should be the directory where the MovieLens user data (u.data) and movie information (u.item) files are placed (called /movie-and-ratings in HDFS), and the second argument is the name of the directory where the results will be placed in HDFS (called /movie-rating-result).
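
As an illustration of the two input formats described above, here is a minimal sketch of line parsers in plain Java, without any Hadoop classes. The class and method names (MovieLensParsers, parseRating, parseMovie) are hypothetical, and the u.item field positions assume the README's field order (movie id, title, release date, video release date, IMDb URL), which should be checked against the actual file.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; names are illustrative, not part of the assignment or of Hadoop.
class MovieLensParsers {

    /** One u.data row: user id, item id, rating, timestamp (tab separated). */
    static final class Rating {
        final int userId;
        final int movieId;
        final int rating;
        final long timestamp;

        Rating(int userId, int movieId, int rating, long timestamp) {
            this.userId = userId;
            this.movieId = movieId;
            this.rating = rating;
            this.timestamp = timestamp;
        }
    }

    /** The leading fields of one u.item row (pipe separated). */
    static final class Movie {
        final int movieId;
        final String title;
        final String releaseDate;
        final String imdbUrl;

        Movie(int movieId, String title, String releaseDate, String imdbUrl) {
            this.movieId = movieId;
            this.title = title;
            this.releaseDate = releaseDate;
            this.imdbUrl = imdbUrl;
        }
    }

    static Rating parseRating(String line) {
        String[] f = line.split("\t");
        if (f.length < 4) {
            throw new IllegalArgumentException("Bad u.data line: " + line);
        }
        return new Rating(Integer.parseInt(f[0].trim()), Integer.parseInt(f[1].trim()),
                Integer.parseInt(f[2].trim()), Long.parseLong(f[3].trim()));
    }

    static Movie parseMovie(String line) {
        // The pipe is a regex metacharacter, so it is quoted; the -1 limit keeps empty
        // fields, so positions stay stable when a field is blank.
        String[] f = line.split(Pattern.quote("|"), -1);
        if (f.length < 5) {
            throw new IllegalArgumentException("Bad u.item line: " + line);
        }
        // Assumed positions: 0 = id, 1 = title, 2 = release date, 3 = video release date, 4 = URL.
        return new Movie(Integer.parseInt(f[0].trim()), f[1], f[2], f[4]);
    }

    public static void main(String[] args) {
        Rating r = parseRating("196\t242\t3\t881250949");
        System.out.println(r.userId + " rated movie " + r.movieId + " with " + r.rating);
    }
}
```

In a real job, a mapper could call such parsers on each input line and decide which one to use from the name of the input file.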

The result of the program should be a list of all rated movies, with the following information per line: movie id, movie title, release date, IMDB URL, average rating, total number of unique users who rated the movie, and total number of ratings for the movie.
Use 2 reducers by setting the number of reducers to 2.
Write a combiner for efficiency, if possible
Define a custom counter that has a running count of total records processed in map, total number of unique movies in
Before you begin, please split the input file into 5 files called u1.data, u2.data, u3.data, u4.data and u5.data, each having 20000 records. Use the Unix split command. Reference: https://kb.iu.edu/d/afar
All the input files (the 5 user ratings files and the movie information file) should be gzip compressed before processing.
Reducer output files should be
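
One point about the combiner requirement is worth spelling out: an average of averages is generally not the overall average, so a combiner cannot simply emit a per-split average. A common workaround is to pass partial aggregates (sum, count, and the set of user ids) and compute the average only at the end. The sketch below shows that idea in plain Java without Hadoop classes. The RatingStats class, its method names, and the tab-delimited output layout are all assumptions made for illustration, not something the assignment prescribes.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical partial aggregate for one movie; names and output layout are illustrative.
class RatingStats {
    long sum;                                      // sum of all ratings seen so far
    long count;                                    // number of ratings seen so far
    final Set<Integer> users = new HashSet<>();    // distinct user ids seen so far

    /** Partial aggregate for a single rating, as a mapper might emit it. */
    static RatingStats of(int userId, int rating) {
        RatingStats s = new RatingStats();
        s.sum = rating;
        s.count = 1;
        s.users.add(userId);
        return s;
    }

    /** Merging is associative and commutative, which is what makes it combiner-friendly. */
    RatingStats merge(RatingStats other) {
        sum += other.sum;
        count += other.count;
        users.addAll(other.users);
        return this;
    }

    /** The average is computed only from the fully merged totals. */
    double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    /** Assumed layout: the seven requested fields separated by tabs. */
    String toOutputLine(int movieId, String title, String releaseDate, String imdbUrl) {
        return String.format(Locale.ROOT, "%d\t%s\t%s\t%s\t%.2f\t%d\t%d",
                movieId, title, releaseDate, imdbUrl, average(), users.size(), count);
    }

    public static void main(String[] args) {
        // Two partials, as if produced by combiners on different splits.
        RatingStats a = RatingStats.of(1, 4).merge(RatingStats.of(2, 5));
        RatingStats b = RatingStats.of(3, 3);
        System.out.println(a.merge(b).toOutputLine(1, "Toy Story (1995)", "01-Jan-1995",
                "http://example.invalid/toy"));
    }
}
```

In an actual Hadoop job the same partial aggregate would have to be serialized as the map output value, since a combiner's output types must match the mapper's output types.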

What to turn in?

Jar file that can be run at command line
A zip of NetBeans or Eclipse project
A sample run with output as a Word document. I have given you a template for the assignment 4 solution. Follow the guidelines, fill in the document, and submit it.
