Hello, I am looking for help for my project for my operating systems course.
For the project, I need to create a file system analysis tool that will analyze partitions. Attached is a zipped file with the project assignment with all the instructions and other resources used for the project.
This program must be programmed in ubuntu and coded in C
1
Objectives:
• Learn internal organization of a file system
• Practice reading bits and bytes
• Practice reading/utilizing Linux kernel code, man pages, and file system libraries/data structures
1 Project Description
In this project, you will write a file system analysis tool called fsa, which can analyze partitions
formatted with the ext2 file system.
Since this program can cause harm to the file system of the storage device to be analyzed, you
will be provided a disk image file (in zipped format due to large size, so unzip before using) where
the content of this file will be the clone of an ext2 formatted disk’s binary content (content from the
operating system’s point of view - sequence of blocks). We will call this file a virtual disk. Your
program will read the given virtual disk and report some general file system statistics, individual
group statistics, and the root directory entries. The virtual disk file will be passed as a command line
argument to your program as follows, where ext2disk_100mb.img is the name of the virtual
disk file to be analyzed:
./fsa ext2disk_100mb.img
As a result, the following information should be printed ("..."s will be filled by your program):
--General File System Information--
Block Size in Bytes: ...
Total Number of Blocks: ...
Disk Size in Bytes: ...
Maximum Number of Blocks Per Group: ...
Inode Size in Bytes: ...
Number of Inodes Per Group: ...
Number of Inode Blocks Per Group: ...
Number of Groups: ...
--Individual Group Information--
-Group ...-
Block IDs: ...
Block Bitmap Block ID: ...
Inode Bitmap Block ID: ...
Inode Table Block ID: ...
Number of Free Blocks: ...
Number of Free Inodes: ...
Number of Directories: ...
Free Block IDs: ...
Free Inode IDs: ...
2
.
.
.
--Root Directory Entries--
Inode: ...
Entry Length: ...
Name Length: ...
File Type: ...
Name: ...
.
.
.
2 Project Details
In order to complete this project, you need to learn ext2 file system details. We have covered the basics
of ext2 in class. Some resources documenting the internals of ext2 can also be found in blackboard as
part of this assignment. If you need more information, there are plenty other resources available on
the Internet related to ext2’s design.
A big help will be installing the e2fslibs-dev package in your Ubuntu machine, if not already
installed, and using the structs defined by this library in your program for some in-memory structures
of the ext2 file system including superblock, inode, group descriptor and directory entry structures.
You can install the e2fslibs-dev package using the following command:
sudo apt-get install e2fslibs-dev
Another important issue is that you should be able to read the content of the provided virtual disk
in raw mode. Below is a sample program called sb.c (also provided in blackboard) showing possible
ways to access the bytes of the provided virtual disk:
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <fcntl.h>
4 #include <unistd.h>
5 #include <ext2fs/ext2fs.h>
6
7 int main(int argc, char *argv[]) {
8
9 int i, rv, fd;
10 unsigned char byte;
11 char *sb1 = NULL;
12 struct ext2_super_block *sb2 = NULL;
13
14 if (argc != 2) {
15 fprintf (stderr, "%s: Usage: ./sb disk_image_file\n", "sb");
16 exit(1);
17 }
18
19 fd = open(argv[1], O_RDONLY);
20 if (fd == -1) {
21 perror("disk_image_file open failed");
22 exit(1);
23 }
3
24
25 /*skip the boot info - the first 1024 bytes - using the lseek*/
26 if (lseek(fd, 1024, SEEK_CUR) != 1024) {
27 perror("File seek failed");
28 exit(1);
29 }
30
31 sb1 = malloc(1024);
32 sb2 = malloc(sizeof(struct ext2_super_block));
33
34 if (sb1 == NULL || sb2 == NULL) {
35 fprintf (stderr, "%s: Error in malloc\n", "sb");
36 exit(1);
37 }
38
39 /*read the superblock byte by byte, print each byte, and store in sb1*/
40 for (i = 0; i < 1024; i++) {
41 rv = read(fd, &byte, 1);
42 if (rv == -1) {
43 perror("File read failed");
44 exit(1);
45 }
46 if (rv == 1) {
47 printf("byte[%d]: 0x%02X\n", i+1024, byte);
48 sb1[i] = byte;
49 }
50 }
51
52 printf ("Total Number of Inodes: %u\n", *(unsigned int *)sb1);
53
54
printf ("Number of Free Inodes: %u\n", *(unsigned int *)(sb1+16));
55
56 /*set the file offset to byte 1024 again*/
57 if (lseek(fd, 1024, SEEK_SET) != 1024) {
58 perror("File seek failed");
59 exit(1);
60 }
61
62 /*read the whole superblock and load into the ext2_super_block struct*/
63 /*assumes the struct fields are laid out in the order they are defined*/
64 rv = read(fd, sb2, sizeof(struct ext2_super_block));
65 if (rv == -1) {
66 perror("File read failed");
67 exit(1);
68 }
69 if (rv == 1024) {
70 printf ("Total Number of Inodes: %u\n", sb2->s_inodes_count);
71 printf ("Number of Free Inodes: %u\n", sb2->s_free_inodes_count);
72 }
73
74 free(sb1);
75 free(sb2);
76 close(fd);
77
78 return 0;
79 }
4
The program above opens the virtual disk in read only mode in line 19, skips the first 1024 bytes
in line 26 (you should know why we skip the first 1024 bytes - hint: read comments), and reads the
superblock info byte by byte printing and storing the bytes being read into sb1 in lines 39-50. This is
one way to reach the superblock info. Another (much easier) way is using the ext2_super_block
struct defined in the ext2_fs.h file. To be able to use this struct, you should include the ext2fs
library as in line 5 (#include <ext2fs/ext2fs.h>) assuming you have already installed the
e2fslibs-dev package. Line 57 shows how to reposition the file offset back to byte 1024, where the
superblock info starts. Then line 64 reads the whole superblock info into the sb2 struct. This assumes
that in-memory layout of the struct is not reordered by the compiler and the struct you use does not
require any padding bytes to be inserted by the compiler. I verified that this approach works in my
64-bit Ubuntu machine using the gcc compiler, but remember that it might not work in different
configurations! You can assume that it will work in the machine that we use for grading.
As a result, "Total Number of Inodes" and "Number of Free Inodes" values are printed to the screen
using two different approaches explained above. Both approaches should print the same values. Also,
each byte of the superblock info is printed to the screen using the first approach. You can redirect the
output of this program into a file as follows:
./sb ext2disk_100mb.img > out.txt
and analyse the output file (out.txt) to see the hexadecimal values of each byte of the superblock.
2.1 Descriptions and Tips
-General File System Information- should only be printed once and you can utilize the
following tips to find out necessary information:
• "Block Size in Bytes" indicates the block size used by the file system in bytes, which is available
in the superblock but not directly. You need to make some calculations to achieve the actual
value in bytes. Check page 4 (1.1.7. s_log_block_size part) of the provided Resource1.pdf
document in blackboard for more information about this calculation.
• "Total Number of Blocks" indicates the total number of blocks available in this partition, which
is directly available in the superblock.
• "Disk Size in Bytes" can be found by multiplying the "Total Number of Blocks" and the "Block
Size in Bytes".
• "Maximum Number of Blocks Per Group" is the maximum amount of blocks that each group
has and it is directly available in the superblock. Note that the last group will most probably
not have this many blocks but all other groups except the last one will definitely have this many
blocks.
• "Inode Size in Bytes" indicates the amount of bytes each inode occupies and it is directly available in the superblock.
• "Number of Inodes Per Group" indicates the amount of inodes created for each group, which is
directly available in the superblock.
• "Number of Inode Blocks Per Group" indicates the number of blocks used to store all the inodes
for each group. You already know the "Number of Inodes Per Group" and the "Inode Size in
Bytes". Using these two values you can calculate the total amount of bytes that the inodes
occupy for each group. Then using this value and the "Block Size in Bytes" value, you can
easily calculate the amount of disk blocks used to store all the inodes for each group.
5
• "Number of Groups" can be calculated using the "Total Number of Blocks" and the "Maximum
Number of Blocks Per Group" values. However, it might get tricky if the last group does not
have enough blocks to fit all necessary data structures and some data blocks. For simplicity,
we assume that the virtual disk we provided has enough blocks in its last group to fit all the
necessary data structures and some data blocks.
-Individual Group Information- should print group-specific information for each group
and you can utilize the following tips to find out these values:
• "-Group ...-" should include the group ID starting with 0 for the first group.
• "Block IDs" indicates the IDs of the blocks that are assigned for this group. Therefore, you will
provide a range here starting with the first block ID of the group and ending with the last block
ID of the group. For this calculation, you can utilize the ID of the first data block given in the
superblock info, the "Maximum Number of Blocks Per Group" value, and the "Total Number of
Blocks" value since the last block might have less blocks than the others.
• "Block Bitmap Block ID" indicates the ID of the first block holding the block bitmap for this
group and it is directly available in the group descriptor of this group.
• "Inode Bitmap Block ID" indicates the ID of the first block holding the inode bitmap for this
group and it is directly available in the group descriptor of this group.
• "Inode Table Block ID" indicates the ID of the first block holding the inode table for this group
and it is directly available in the group descriptor of this group.
• "Number of Free Blocks" indicates the total number of free blocks for this group and it is directly
available in the group descriptor of this group.
• "Number of Free Inodes" indicates the total number of free inodes for this group and it is directly
available in the group descriptor of this group.
• "Number of Directories" indicates the total number of directories for this group and it is directly
available in the group descriptor of this group.
• "Free Block IDs" indicates the IDs of the blocks that are free for this group. In order to find the
free blocks, you need to check the block bitmap bit by bit for every block ID included in this
group. You will not print all the IDs one by one, you will print them using ranges. For example,
if the free block IDs are 1,2,3,4,7,9,10,11,12,13,14,15; then you will only print 1-4, 7, 9-15.
• "Free Inode IDs" indicates the IDs of the inodes that are free for this group. In order to find the
free inodes, you need to check the inode bitmap bit by bit for every inode ID included in this
group and print ranges as in the "Free Block IDs" case. One important issue here is that the
first inode of the filesystem has the ID 1, not 0.
• You can use unsigned char arrays to store a bitmaps.
-Root Directory Entries- should print the directory entries for the root directory only (not
for the sub-directories) and you can utilize the following tips to find out these values:
• Root directory is always stored in the inode ID 2 (remember, inode IDs start from 1 instead of
0). You need to retrieve this inode and check the content of its data blocks to reach the directory
entries of the root directory. Checking only the first data block of the root inode (i_block[0])
will be sufficient for this project since our virtual disk does not have so many files/subdirectories
in the root directory. In the first data block of the root inode, you can find all necessary directory
entries structured in the ext2_dir_entry2 format and considering their length (rec_len),
you can iterate through them.
• "Inode" indicates the inode number of the entry and it is directly available in the related directory
entry for every iteration.
6
• "Entry Length" indicates the length of this directory entry in bytes and it is directly available in
the related directory entry. This length will be used to determine the starting position of the next
directory entry.
• "Name Length" indicates the length of the file name in characters and it is directly available in
the related directory entry.
• "File Type" indicates the type of the file (1: Regular, 2: Directory etc.) and it is directly available
in the related directory entry. It is sufficient to print the integer value.
• "Name" indicates the name of the file/subdirectory and it is directly available in the related
directory entry.
2.2 Extensions
First, you can also extend your program so that it can print the directory entries for the entire filesystem, not only for the root directory. In addition, you can add the functionality of printing the actual
content of the file given a file name. Finally, you can extend your program to work with other file
systems as well, such as ext3, ext4, xfs, etc. Note that these extensions are not mandatory and will
not be tested.
Instead of using a virtual disk, you can also work on a real disk and directly open a partition and
analyse the file system sitting on that partition using the read system call. For instance, line 19 of the
above code can be replaced with:
fd = open("/dev/sda", O_RDONLY);
to analyse the partition "sda". However, you should make sure that the device file you opened is not
modified. Modifying the device file can cause file system corruption. Alternatively, you can create
the image of a partition into a binary file using the dd command and work on the virtual disk that you
created as we do in this project. Please read the man page of the dd command carefully if you want
to use it.
3 Development
You will develop your program in a Unix environment using the C programming language. gcc will
be used as the compiler. You will be provided a Makefile and your program should compile without
any errors/warnings using this Makefile. Black-box testing will be applied but we will not provide
a sample black-box testing script this time. You are free to write your own testing scripts if you
find them useful; however, you will not submit any testing code that you wrote. In order to check
the correctness of the values that your program generates, you can use dumpe2fs command of
Linux, which will dump the superblock and the block group information given the virtual disk file as
a parameter as follows:
dumpe2fs ext2disk_100mb.img
Besides the superblock and the group information, if you want to learn the content of the virtual disk
to be able to check the root directory entries, you can mount it to your file system using the mount
command as follows:"
sudo mount ext2disk_100mb.img /mnt/ext2disk_100mb -t ext2 -o loop,ro,noexec
Before mounting, make sure that you created a mount point:
7
sudo mkdir /mnt/ext2disk_100mb
In order to unmount, you can use the umount command as follows:
sudo umount /mnt/ext2disk_100mb
and after unmounting, you can delete the mount point that you previously created:
sudo rmdir /mnt/ext2disk_100mb
Note that for this project, it is very easy to hard-code the correct values to be printed into
your program and pass our tests without even analyzing the virtual disk. However, you should
understand that this will be considered as plagiarism, and you will receive 0 from this project.
You are also not allowed to use ext2 analyzing/parsing tools (such as dumpe2fs, etc.) and their
libraries inside your program. You will walk through the ext2 on-disk structures with your own
C code. You can only use such tools to compare your program output, test, and debug your
program. You are allowed and encouraged to use the ext2 memory data structures provided by
the e2fslibs library.
4 Submission
Work will be penalized 5 points out of 100 if the submission instructions are not followed. In
addition, memory leaks will be penalized with a total of 5 points without depending on the amount
of the leak. Similarly, compilation warnings will also be penalized with a total of 5 points without
depending onthe type or the number of warnings. You can check the compilation warnings using the
-Wall flag ofgcc.
4.1 What to Submit
1. fsa.c: including the source code of your program.
2. README.txt including the following information:
• Did you fully implement the project as described? If not, what parts are not implemented
at all or not implemented by following the specified implementation rules? Please note
that for this project, it is very easy to print out the required output to the screen without
analyzing the virtual disk yourself. This will be considered as plagiarism!
8
5 Grading
Grading of your program will be done by an automated process. Your programs will also be passed
through a copy-checker program which will examine the source codes if they are original or copied.
We will also examine your source file(s) manually to check if you followed the specified implementation rules, if any.
DescriptionIn this final assignment, the students will demonstrate their ability to apply two ma
Path finding involves finding a path from A to B. Typically we want the path to have certain properties,such as being the shortest or to avoid going t
Develop a program to emulate a purchase transaction at a retail store. Thisprogram will have two classes, a LineItem class and a Transaction class. Th
1 Project 1 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of
1 Project 2 Introduction - the SeaPort Project series For this set of projects for the course, we wish to simulate some of the aspects of a number of
Sun | Mon | Tue | Wed | Thu | Fri | Sat |
---|---|---|---|---|---|---|
30 | 31 | 1 | 2 | 3 | 4 | 5 |
6 | 7 | 8 | 9 | 10 | 11 | 12 |
13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 22 | 23 | 24 | 25 | 26 |
27 | 28 | 29 | 30 | 1 | 2 | 3 |