When you sign into Facebook, it suggests friends. In this assignment, you will write a program that reads Facebook data and makes friend recommendations.
Facebook suggests people you may be (or should be) friends with. Netflix suggests movies you might like. Amazon suggests products to buy. How do they do that? In this assignment, you will learn one simple way to make such suggestions, called “collaborative filtering”. The actual algorithms used by these companies are closely guarded trade secrets.
A computer system that makes suggestions is called a recommender system. As background, there are two general approaches: collaborative filtering and content-based filtering.
In this assignment, you will implement a collaborative filtering recommendation system for suggesting friends on Facebook.
A graph or network represents relationships among things. The things are represented as nodes
or vertices, and the relationships are represented as edges.
One common use for a graph is to represent travel possibilities, such as on a road map or airline map. The nodes of the graph are cities, and the edges show which cities are directly connected. Then, you can use the graph to plan travel.
Another common use for a graph is to represent friendship among people in a social network. For example, here is the friendship graph for some of the characters of “Romeo and Juliet”:
An edge between person A and person B means that A considers B a friend, and also B
considers A a friend.
This graph is unable to represent certain information. For example, Count Paris wishes to wed Juliet, but she does not reciprocate his affection. You do not need to worry about this information, because Facebook does not represent this information either. (Some other social networking sites, such as Twitter and Google+, do permit one-way links.)
In the image above, ignore the gray background and the labels for the families ("houses"); those are there just to help you interpret the graph but are not part of the social network itself.
You will implement two mechanisms for recommending a new friend in a social network. A simple way to state this question is, “For user X, who is the best person to recommend as a friend?”
You will answer a more comprehensive question: “For user X, list some non-friends in order, starting with the best friend recommendation and ending with the worst.” A non-friend is a user who is not X and is not a friend of X. Depending on the recommendation algorithm, the list may include all non-friends or some of them.
For example, for Mercutio the list might be: Capulet
Montague
Benvolio
Friar Laurence Juliet
Further note that the recommendations might not be symmetric: the best friend recommendation for Montague might be Mercutio, but the best friend recommendation for Mercutio might be Capulet.
Your task will be to write code that, given a user U in the social network, produces friend recommendations for U, in order from best to worst. You will do this by assigning each potential friend a number called a score, where higher scores indicate a better match. Then you can sort your list according to the score. Given user X, if two people Y and Z would be equally good as new friends for X (they have the same score), then they should be listed in alphabetical order (for names) or numerical order (for numerical user IDs).
If non-friend Y is your friend's friend, then maybe Y should be your friend too. If person Y is the friend of many of your friends, then Y is an even better recommendation. The best friend recommendation is the person with whom you have the largest number of mutual friends. You will implement this heuristic.
As a concrete example, consider Mercutio in the Romeo and Juliet graph.
Mercutio has two friends in common with Capulet (Escalus and Paris). Mercutio has two friends in common with Montague (Escalus and Romeo). Mercutio has one friend in common with Benvolio (Romeo).
Mercutio has one friend in common with Friar Laurence (Romeo). Mercutio has one friend in common with Juliet (Romeo).
Mercutio has no friends in common with the Nurse. Mercutio has no friends in common with Tybalt.
Therefore, Capulet and Montague are the best friend recommendations for Mercutio, and the Nurse and Tybalt are the worst friend recommendations.
You will create a graph from the Facebook data in file facebook-links.txt (https://drive.google.com/file/d/1a_-DrgWw9gmSboJn_WgloZJXGowu4S8I/view?usp=sharing). You can see code for reading data from a file from Assignment 2 skeleton file. The facebook- links.txt file is courtesy of the Max Planck Institute for Software Systems. It contains a list of all of the user-to-user links from the Facebook New Orleans networks. These links are undirected on Facebook. Each line contains two numeric user identifiers, meaning the second user appeared in the first user's friend list, and the first user appeared in the second user's friend list.
Write the function recommend_by_number_of_common_friends. This function should take the friendship graph and a user ID as arguments and return a list of friend recommendations for the given user ordered by the number of common friends and in case of same number of friends, numerically. Note that unlike the example in class, we are working with integer IDs rather than string names.
Now, for every Facebook user with an id that is a multiple of 1000, print a list containing the first 10 friend recommendations, as determined by number of common friends. If there are fewer than 10 recommendations, print all the recommendations.
Print output format:
...
// 28000: 17125, 7033, 15462, 33049, 51105, 16424, 23, 7996, 1539, 17420,
// 29000: 28606,
// 30000: 14473, 14495, 17951, 19611, 22749, 23259, 30002, 3154, 8269, 862,
...
(The above is actual output for Problem 2. Your output for this Problem will differ but will use the same general formatting.)
Paste the output in comments inside your code file.
Note that throughout this assignment, you are permitted and encouraged, but not required, to define additional functions beyond the required ones.
Also, there is no skeleton code to start, but you may use the code from Lecture 14 and from Assignment 2 as help.
We will now give a different algorithm for computing a friendship score.
Consider the following hypothetical situation.
Two of your friends are J.D. Salinger and Tim Kinsella.
J.D. Salinger has only two friends (you and one other person). Tim Kinsella has 7 billion friends.
J.D. and Tim have no friends in common (besides you).
Since J.D. is highly selective in terms of friendship, and is a friend of yours, you are likely to have a lot in common with J.D.'s other friend. On the other hand, Tim is indiscriminate and there is little reason to believe that you should be friendly with any particular one of Tim's other friends.
Incorporate the above idea into your friend recommendation algorithm. Here is the concrete way that you will do so. We call the technique “influence scoring”.
Suppose that user1 and user2 have three friends in common: f1, f2, and f3. In Problem 2, the score for user2 as a friend of user1 is 1+1+1: each common friend contributes 1 to the score. In this problem, the score for user2 as a friend of user1 is 1/numfriends(f1) + 1/numfriends(f2) + 1/numfriends(f3), where numfriends(f) is the number of friends that f has. In other words, each friend F of user1 has a total influence score of 1 to contribute and divides it equally among all of F's friends.
In the example above, J.D. Salinger's other friend would have a score of 1/2, and each of Tim Kinsella's friends would have a score of 1/7000000000.
You have to implement a function recommend_by_influence that takes the friendship graph and a user ID as arguments and returns the top 10 friend suggestions ordered by influence and then numerically. You may find that the implementation is quite similar to code that you have already written in Problem 1; that is OK.
Do not change the code that you wrote for Problem 1. However, you can reuse most of it.
For every Facebook user with an id that is a multiple of 1000, print a list containing the first 10 friend recommendations, as determined by influence score. If there are fewer than 10 recommendations, print all the recommendations.
Output format is the same as in Problem 1. Paste the output into your code file.
Does the change of recommendation algorithm make any difference? Maybe not: you can see that Mercutio gets the same friend recommendations with both recommendation approaches. Does everyone get identical results with the two recommendation approaches?
If you try this on the Romeo Juliet graph, you will find that there are 5 people for whom the recommendations are the same, and 6 people for whom the recommendations are different.
Considering only those 63 Facebook users with an id that is a multiple of 1000, compute and print the number of Facebook users who have the same first 10 friend recommendations under both recommendation systems, and the number of Facebook users who have different first 10 friend recommendations under the two recommendation systems. This program will take some time to compute (at least a couple of minutes). Note that order of the 10 recommendations should not matter in this Problem.
Print output format:
// Same:
// Different:
Paste the output into your code file.
In Problem 1, every friend recommendation had a score of 1 or more. In Problem 2, every friend recommendation had a score of less than 1. Will that always be the case?
In comments within your code file, state whether each influence score will always be less than
We have seen that the two recommendation systems give different results. Which one is better?
You will test the two recommendation systems in the following way:
If either of these does not exist (e.g., F1 is not recommended as one of F2's friends), discard the F1-F2 pair from your experiment.
Otherwise, average these two numbers.
The "rank" is also known as the "index" or "position". It starts counting at 1, not 0.
For a perfect recommendation system, the first recommendation for F1 would be F2, and the first recommendation for F2 would be F1. In general, the closer to the front of the list these recommendations are, the better the recommendation system.
In your code file, give the average index for each recommendation system. State which recommendation system is better for the facebook graph.
Print output format:
// Average rank of influence method:
// Average rank of number of friends in common method:
// method is better.
Hint: The average indices should be in the 100-400 range. This is because the social network is very large, you are using very impoverished data (just existing connections) and you are looking for one specific “right” answer (even though in real life many of the recommendations are probably useful).
Note: You will evaluate the recommendation systems on 100 randomly-chosen edges out of the 817,090 edges in the Facebook graph. If you were to choose 100 different randomly-chosen edges, you would get different results. The answers are likely to be informative nonetheless. If you choose 100 edges at random, it is unlikely that you will make an unlucky choice, such that technique 1 is better than technique 2 for those 100 edges, but technique 2 is better than technique 1 overall. It's still a possibility, of course. The more edges you choose at random, the less likely you are to be unlucky in that way. We asked you to do the evaluation on 100 randomly-chosen edges because your code would be very slow if you used (say) 1000 choices.
To prevent different random choices from skewing your results, use the same random choices for both recommendation systems. Another way of saying this is that each time you make a random choice, you should evaluate both recommendation systems using that choice. Then go on to the next choice. Every run of your program will produce slightly different average ranks, but your program should be consistent in terms of which method is better.
Hint: To select a random element from a list, use the index “rand() % list.size()” and include
at the top.
DescriptionIn this final assignment, the students will demonstrate their ability to apply two ma
Path finding involves finding a path from A to B. Typically we want the path to have certain properties,such as being the shortest or to avoid going t
Develop a program to emulate a purchase transaction at a retail store. Thisprogram will have two classes, a LineItem class and a Transaction class. Th
1 Project 1 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of
1 Project 2 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of