

Assignment 1 

In this assignment you will demonstrate your understanding of arrays, strings, functions, and the typedef facility. You must not make any use of malloc() (Chapter 10) or file operations (Chapter 11) in this project. You may use struct types (Chapter 8) if you wish, but are not required to.

 

Web Search

We have all used web search tools. In response to a query, a search engine result page (SERP) is generated, containing (among many other elements) a list of URLs and snippets, with each snippet a short extract from the corresponding web page. A typical snippet contains around 20–30 words and is selected from the corresponding document to show the context in which query terms appear.[1] For example, the query “web search algorithms” at a commercial web search service resulted (August 2022) in the following URLs (in blue), page titles (in italics), and snippets (with matching key words in bold) being offered:

 

Note how the fourth snippet is actually composed of two shorter segments, indicated by the “. . . ” in the middle of it. In the other four snippets the generator decided that a single contiguous segment of text was the best representation.

 

[1] In early search services a snippet was generated for each document at the time it was indexed, and was independent of any query. Query-biased snippets involve rather more computation and add to the time taken to evaluate queries, but are generally agreed to be more intuitive for users to digest and react to.

 

Your Mission... In this assignment you will build a program that reads paragraphs of text from stdin, builds a snippet for each paragraph according to certain rules, and then writes those snippets to stdout. You can either use Grok and the “terminal” facility that it provides, or develop your program outside of Grok. You should make use of functions in ctype.h and string.h. There is a handout linked from the LMS that provides guidance on the use of the Grok terminal.

Submission is via upload to the LMS. You cannot submit via Grok. Note that to be eligible for maximum marks in each stage you must exactly match the required output, including the formatting.

 

Stage 0 – Getting Started (maximum: 0/20 marks)

Copy the skeleton program ass1-skel.c and sample input files from the Assignment 1 LMS page, and check that you can compile the program via either Grok or gcc. The skeleton program contains a simple main() function and a version of the get_word() function (make careful note of how it is different), and when executed it simply copies the input to the output, one word per line, sometimes including trailing punctuation. Try compiling and running the skeleton program before you do anything else.

The skeleton file also includes a very important authorship declaration. Substantial mark penalties apply if you do not include it in your submission, or if you do not sign it. This is an absolute requirement, and no excuses of any sort will be accepted.

 

Stage 1 – Reading Text (maximum: 12/20 marks)

Now alter function get_word() so that it returns one of three status values: WORD_FND if it identified a word; PARA_END if, instead of a word, a paragraph boundary was identified; and EOF if the end of the input file was reached. Paragraphs are identified in the input text by the combination "\n\n", that is, two consecutive newline characters, or by reaching the end of file after one or more alphabetic words have been found since the last paragraph boundary. A suitable #define appears in the skeleton code.
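The three-way return value described above might be sketched as follows. This is not the skeleton's code: the values of WORD_FND and PARA_END, the TERM_PUNCT set, and the choice to treat a word as a run of letters and digits followed by at most one trailing punctuation character are all assumptions made for illustration. The sketch also pushes the character that ended a word back with ungetc(), so that a newline directly after a word still counts toward "\n\n"; check whether that is acceptable under the rules of the assignment before relying on it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define WORD_FND 1            /* assumed value; the skeleton has its own */
#define PARA_END 2            /* assumed value; the skeleton has its own */
#define TERM_PUNCT ".,;:!?"   /* assumed set of trailing punctuation */

/* Read the next word from stdin into W, which must hold limit+1 bytes.
   Returns WORD_FND, PARA_END, or EOF. */
int get_word(char W[], int limit) {
    int c, len = 0, newlines = 0;

    assert(limit >= 1);

    /* Skip separators, counting consecutive newline characters. */
    while ((c = getchar()) != EOF && !isalnum(c)) {
        newlines = (c == '\n') ? newlines + 1 : 0;
        if (newlines == 2) {
            return PARA_END;
        }
    }
    if (c == EOF) {
        return EOF;
    }

    /* c starts a word; collect the rest of it. */
    W[len++] = (char)c;
    while (len < limit) {
        c = getchar();
        if (c == EOF || !isalnum(c)) {
            break;
        }
        W[len++] = (char)c;
    }

    /* If the loop stopped on a non-word character, either keep it as
       trailing punctuation or push it back for the next call. */
    if (len < limit && c != EOF) {
        if (c != '\0' && strchr(TERM_PUNCT, c) != NULL) {
            W[len++] = (char)c;
        } else {
            ungetc(c, stdin);
        }
    }
    W[len] = '\0';
    return WORD_FND;
}
```

With this design the caller decides what to do when EOF arrives after some words have been read, which is the second way a paragraph can end. Leading or repeated blank lines produce PARA_END with no words in between, so the caller should be prepared to skip empty paragraphs.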

Once that is done, alter the processing loop in main() so that it processes one paragraph at a time rather than one word at a time. To accomplish this step you will probably want to introduce a new function (perhaps called get_paragraph()) that calls get_word() repeatedly, to build up a paragraph of words in an array. You may assume that each paragraph has at most MAX_PARA_LEN words in it, each of at most MAX_WORD_LEN characters.

Output: The required output from this stage is a message for each paragraph that indicates the paragraph number, and its length in words. See the LMS for examples of what is required.

 

Stage 2 – Matching Words and Printing Text (maximum: 16/20 marks)

The second task is to write an output function that writes each paragraph’s words across the page, making sure that no line contains more than 72 characters. Each “word” that is output should be separated from the previous word in the same line by a single blank character. You’ll need to keep track of how many bytes have been written so far in each output line, and insert a newline character (instead of a blank) if the next word needing to be written won’t fit within the current line.
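One way to picture the line-filling rule is the sketch below. The function name print_wrapped(), the word_t typedef, and the value given to MAX_WORD_LEN are illustrative assumptions; only the 72-character limit comes from the text above. The return value (the number of lines written) is there purely to make the behaviour easy to check.

```c
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 23   /* assumed value; use the skeleton's #define */
#define MAX_LINE_LEN 72   /* line limit stated in the text */

typedef char word_t[MAX_WORD_LEN + 1];

/* Print words[0..nwords-1] separated by single blanks, starting a new
   line whenever the next word would take the line past MAX_LINE_LEN.
   Returns the number of output lines written. */
int print_wrapped(word_t words[], int nwords) {
    int i, len, used = 0;
    int lines = (nwords > 0) ? 1 : 0;

    for (i = 0; i < nwords; i++) {
        len = (int)strlen(words[i]);
        if (used > 0) {
            if (used + 1 + len <= MAX_LINE_LEN) {
                /* the blank and the word both fit on this line */
                putchar(' ');
                used += 1;
            } else {
                /* a newline takes the place of the blank */
                putchar('\n');
                used = 0;
                lines += 1;
            }
        }
        fputs(words[i], stdout);
        used += len;
    }
    if (nwords > 0) {
        putchar('\n');
    }
    return lines;
}
```

For example, three 23-character words occupy 23 + 1 + 23 + 1 + 23 = 71 characters and share one line, while a fourth would need 95 and so starts a second line.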

Then alter your program so that any words in each paragraph that match any words supplied on the command-line (use argc and argv to access these) in a case-insensitive manner (look at the library function strncasecmp()) are “bolded” in the output text by putting “**” before and after them. Be sure to handle the punctuated words properly; the punctuation should be retained, but outside the “**”. For example, if “web” is supplied on the command-line as one of the query terms, then the word “Web;” in the input text must place “**Web**;” into the output text. (Yes, in a real program we’d generate HTML tags to make the selected words bold, but for simplicity we’ll stay with plain text.)
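The “Web;” example can be sketched as below. The helper names core_len(), matches_term(), and bold_if_match() are hypothetical, and treating every trailing ispunct() character as punctuation is an assumption; strncasecmp() is the POSIX function mentioned in the text and is declared in strings.h.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>   /* strncasecmp() is declared here on POSIX systems */

/* Length of word once any trailing punctuation is ignored. */
size_t core_len(const char *word) {
    size_t n = strlen(word);
    while (n > 0 && ispunct((unsigned char)word[n - 1])) {
        n--;
    }
    return n;
}

/* Non-zero if word equals term, ignoring case and trailing punctuation. */
int matches_term(const char *word, const char *term) {
    size_t n = core_len(word);
    return n > 0 && n == strlen(term) && strncasecmp(word, term, n) == 0;
}

/* Copy word into out, wrapping its non-punctuation part in "**" if it
   matches any of the nterms query terms. */
void bold_if_match(const char *word, char *terms[], int nterms,
                   char *out, size_t outsz) {
    size_t n = core_len(word);
    int i;

    for (i = 0; i < nterms; i++) {
        if (matches_term(word, terms[i])) {
            /* punctuation stays outside the closing "**" */
            snprintf(out, outsz, "**%.*s**%s", (int)n, word, word + n);
            return;
        }
    }
    snprintf(out, outsz, "%s", word);
}
```

In a full program the query terms would come from the command line, for instance by passing argv + 1 and argc - 1 as terms and nterms. Comparing lengths as well as characters is what stops “web” from matching “website”.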

Output: Examples showing the required output are linked from the LMS page.

 

Stage 3 – Building Snippets (maximum: 20/20 marks)

Suppose that the score of a snippet relative to a set of query terms is calculated as:

Add 15/(s + 10) points, where s is the start word of the snippet, counting from the first word of the paragraph as word number zero; plus

Add l/2 points for each different term that appears in the snippet, where l is the length of that query term; plus

Add 1.0 points for every other repetition of any query term that appears multiple times in the snippet; plus

Add 0.6 points if the snippet starts with the word immediately following a punctuated word (taking word “−1” to also be punctuated); plus

Add 0.3 points if the snippet ends with a punctuated word; and then

Subtract 0.1 points for each word that the snippet exceeds MIN_SNIPPET_LEN in length.
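The scoring rules above could be turned into a function along the following lines. Several points are interpretations rather than facts from the specification: “every other repetition” is read as “each occurrence after the first”, l/2 is computed as a floating-point value, a “punctuated word” is taken to be one whose last character satisfies ispunct(), and a query term repeated on the command line is only counted once. The values of MAX_WORD_LEN and MIN_SNIPPET_LEN and all function names are assumptions for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>   /* strcasecmp(), strncasecmp() (POSIX) */

#define MAX_WORD_LEN 23      /* assumed value; use the skeleton's */
#define MIN_SNIPPET_LEN 20   /* assumed value; use the skeleton's */

typedef char word_t[MAX_WORD_LEN + 1];

/* Length of w once trailing punctuation is ignored. */
size_t core_len(const char *w) {
    size_t n = strlen(w);
    while (n > 0 && ispunct((unsigned char)w[n - 1])) {
        n--;
    }
    return n;
}

int is_punctuated(const char *w) {
    return core_len(w) < strlen(w);
}

int matches_term(const char *w, const char *term) {
    size_t n = core_len(w);
    return n > 0 && n == strlen(term) && strncasecmp(w, term, n) == 0;
}

/* Score of the snippet para[start..start+len-1] against the terms. */
double snippet_score(word_t para[], int start, int len,
                     char *terms[], int nterms) {
    double score = 15.0 / (start + 10);
    int t, u, i, count, seen;

    assert(start >= 0 && len > 0);
    for (t = 0; t < nterms; t++) {
        /* skip a term already handled under an earlier argument */
        seen = 0;
        for (u = 0; u < t; u++) {
            seen = seen || strcasecmp(terms[u], terms[t]) == 0;
        }
        if (seen) {
            continue;
        }
        count = 0;
        for (i = start; i < start + len; i++) {
            count += matches_term(para[i], terms[t]);
        }
        if (count > 0) {
            score += (double)strlen(terms[t]) / 2.0;  /* l/2, once */
            score += 1.0 * (count - 1);               /* repetitions */
        }
    }
    if (start == 0 || is_punctuated(para[start - 1])) {
        score += 0.6;   /* word "-1" counts as punctuated */
    }
    if (is_punctuated(para[start + len - 1])) {
        score += 0.3;
    }
    if (len > MIN_SNIPPET_LEN) {
        score -= 0.1 * (len - MIN_SNIPPET_LEN);
    }
    return score;
}
```

As a worked case under these assumptions, take the five words “Web search, is web fun.” with the single query term “web”, scored as a snippet starting at word 0: 15/10 = 1.5 for the start position, 3/2 = 1.5 for the term, 1.0 for its second occurrence, 0.6 for starting the paragraph, and 0.3 for ending on “fun.”, giving 4.9 in total.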

Snippets cannot be longer than MAX_SNIPPET_LEN words. Similarly, snippets cannot be shorter than MIN_SNIPPET_LEN words, except if the input paragraph is also shorter than MIN_SNIPPET_LEN words, in which case the whole input paragraph always becomes the snippet. If a snippet ends with a punctuated word that has a period “.” or a question mark “?” or an exclamation mark “!” attached, then no “. . . ” should be added. All other cases should have dots added immediately after the last word in the snippet, even if it takes the last output line past 72 characters.
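The rule for the trailing dots reduces to a test on the last character of the final word. A minimal sketch, with the hypothetical name needs_dots(), is shown here; it should be applied to the word as read from the input, before any “**” markers are added.

```c
#include <string.h>

/* Non-zero if dots should follow a snippet whose final word is last,
   that is, unless the word ends in '.', '?' or '!'. */
int needs_dots(const char *last) {
    size_t n = strlen(last);
    return n == 0 || strchr(".?!", last[n - 1]) == NULL;
}
```

Note that a word ending in a comma or semicolon is punctuated but still gets dots, since only the three sentence-ending characters suppress them.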

Note that in the web search example on the first page the query term “algorithm” also matches against the word “algorithms”; this is a process known as stemming. Whole books have been written about stemming rules for English, so that “compute”, “computer”, “computed”, “computation”, “computational”, “computing”, “computability”, “computably”, and so on, are all taken as being the same underlying word. Stemming is definitely outside the scope of this assignment, and we will just locate exact (except for case, and for trailing punctuation) matches against the query terms.

Extend your Stage 2 program so that the highest-scoring contiguous snippet is identified for each input paragraph. In the case of scores that are equal, select the snippet that starts closest to the beginning of the paragraph. And if there are still ties, print out the shortest legal maximum-score snippet starting at that point. An exhaustive search approach to finding the best snippet in each paragraph is perfectly acceptable, and there won’t be marks allocated to efficiency (unless your code is so bad it can’t run in a reasonable time on typical non-extreme paragraphs of text). You should use standard functions such as strncasecmp() to compare words; there is no need for KMP or BMH in this task.
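The exhaustive search and its two tie-breaking rules can be sketched as a pair of nested loops. To keep the example standalone, the scoring is passed in as a function pointer and exercised with demo_score(), a placeholder that has nothing to do with the assignment's formula; the names best_snippet() and score_fn and the two length limits are likewise assumptions. Visiting start positions in increasing order, then lengths in increasing order, and replacing the best only on a strictly greater score gives the earliest start and then the shortest length among equal scores.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_SNIPPET_LEN 20   /* assumed value; use the skeleton's */
#define MAX_SNIPPET_LEN 30   /* assumed value; use the skeleton's */

typedef double (*score_fn)(int start, int len);

/* Placeholder scorer used only to exercise the search. */
double demo_score(int start, int len) {
    (void)len;
    return (start >= 3) ? 2.0 : 1.0;
}

/* Try every legal (start, len) pair in a paragraph of para_len words
   and report the best one through best_start and best_len. */
void best_snippet(int para_len, score_fn score,
                  int *best_start, int *best_len) {
    int start, len, min_len, max_len, found = 0;
    double best = 0.0, s;

    assert(score != NULL);
    *best_start = 0;
    *best_len = 0;
    if (para_len <= 0) {
        return;
    }
    /* a short paragraph becomes the snippet in its entirety */
    min_len = (para_len < MIN_SNIPPET_LEN) ? para_len : MIN_SNIPPET_LEN;

    for (start = 0; start + min_len <= para_len; start++) {
        max_len = para_len - start;
        if (max_len > MAX_SNIPPET_LEN) {
            max_len = MAX_SNIPPET_LEN;
        }
        for (len = min_len; len <= max_len; len++) {
            s = score(start, len);
            if (!found || s > best) {   /* strict: earlier wins ties */
                best = s;
                *best_start = start;
                *best_len = len;
                found = 1;
            }
        }
    }
}
```

Because the scores are floating-point sums, two snippets that are equal on paper may differ in the last bit when computed; if that matters for matching the expected output, comparing against a small tolerance is one option to consider.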

Output: Your output for this stage should show the required maximal-score snippet for each input paragraph, with query terms highlighted. Lines should again be broken so that they are at most 72 characters long (by calling the same function, I hope). See the LMS for examples.

 

 
