Undergraduate

Materials for OSTEP (Operating Systems: Three Easy Pieces) book       

  • This page contains the slide sets and lectures for EE415@KAIST (Intro. to Operating System).

Materials for Pintos projects              

  • This page contains the slide sets and lecture videos for Pintos Operating Systems. Pintos is an educational operating system developed at Stanford.

System Programming

  • Developing a system program is an exciting experience. Programming becomes much easier when you use the right set of tools. The following are some of the essential tools for writing a system program.
    • shell: You should be able to use shell commands. It is even better if you can write shell scripts. (notes)
    • vim/emacs: These are text editors. Their lineage goes back to the 1970s, but they are still among the most popular tools for writing code. (notes)
    • gdb/kgdb: These are debuggers. Debugging is an essential part of software development. You may need to trace the values of variables during program execution, or you may want to know the contents of the user stack when the program calls a certain function. gdb/kgdb perform these tasks. (notes)
    • cscope/ctags/etags: These are code navigation tools. A software project consists of tens or even hundreds of source files, possibly written in several different programming languages. You may want to find the places where a certain function is called, or the definition of a given structure. cscope/ctags/etags allow you to navigate through hundreds of source files very quickly. (notes)
    • doxygen: This is a documentation generator. Writing documentation for C/C++ code is a boring exercise. A variety of software tools generate the documentation automatically from a given set of source files.
    • compiler: A compiler is a computer program that translates computer code written in one programming language (the source language) into another programming language (the target language). The name compiler is primarily used for programs that translate source code from a high-level programming language to a lower-level language (e.g., assembly language, object code, or machine code) to create an executable program. (notes)
    • awk/perl, etc.

EE415: Introduction to Operating System (운영체제 개론)

  • Textbook (Korean edition): “운영체제 아주 쉬운 세가지 이야기”, translated by 원유집, 박민규, and 이성진 (홍릉과학출판사)
  • Textbook: “Operating Systems: Three Easy Pieces” by Remzi Arpaci-Dusseau and Andrea Arpaci-Dusseau
  • Lecture slides (These slides were developed with the help of a group of graduate students at ESOSLab @ Hanyang University.)
  • Homepage for EE415: Operating System Spring 2020

EE488: Unix Kernel Design (유닉스 커널 설계)

In this class, we learn the engineering aspects of operating system design. Students are exposed to the internal mechanisms of the Unix operating system. We read the code of the xv6 operating system. xv6 is a teaching operating system written by a group of people at MIT (https://pdos.csail.mit.edu/6.828/2012/xv6.html). xv6 is based on Unix Version 6, aka v6, which was developed for the PDP-11/40 in the mid-1970s. This course is implementation intensive.

Suggested Topics for Independent Study in Operating System

The best way to understand an OS is to read real code. The Linux kernel code is large, running to tens of millions of lines, so you have to lay out a careful plan to digest it. Even reading a 600-page book from beginning to end is not a trivial task. I suggest that you choose one small topic and read the code associated with that specific topic. If you can identify issues in the existing code and provide a better solution than the existing approach, that is a big win. If you are bold enough, you could plan to contribute your code to the open source community. It is more fun to work as a team: you can exchange ideas and discuss the pros and cons of a proposed idea. The following are selected topics you could work on by yourself.

  • Understanding the physical memory management scheme in Linux
    • These days, it is common for a server to be loaded with multiple terabytes of DRAM. As the physical memory size grows, a TLB with a fixed number of entries covers a smaller fraction of it, and the TLB miss rate increases. To reduce the TLB miss rate, Linux provides transparent huge pages (THP). With huge pages, Linux can support page sizes from 2 MB to 1 GB (on x86-64, THP typically uses 2 MB pages, while 1 GB pages are available through hugetlbfs). THP increases the TLB coverage and thereby reduces the TLB miss rate. However, THP comes with internal fragmentation and page allocation/initialization overhead. In this topic, we review the physical memory management policy of Linux and analyze the issues in its physical memory allocation policies.
  • Buffered Write Overhead in Linux
      In some cases, a 4 KB buffered write can take as long as 1 second. This is due to the page allocation overhead (ext4_write_begin():fs/ext4/inode.c) and the overhead of finishing the buffered write (ext4_write_end():fs/ext4/inode.c). In this topic, we study the EXT4 filesystem and analyze the behavior of these two functions, which are the root cause of the anomalous delay in buffered writes.
  • fsync() mechanism in modern filesystems
      In this topic, we examine the fsync() algorithms of EXT4, F2FS, and XFS and compare the pros and cons of each. fsync() is an essential component of key-value engines and database logging. It is one of the most time-consuming system calls. Through this analysis, we come to understand the behavior of filesystem journaling and of the dirty page management module in the page cache.
  • Segment Cleaning in F2FS
      In this topic, we analyze the segment cleaning algorithm of the F2FS filesystem. F2FS is an open-source filesystem developed by Samsung. Segment cleaning is the most essential activity in a log-structured filesystem. We study the concept of the log-structured filesystem and review the F2FS code. A log-structured filesystem organizes a filesystem partition as an array of segments. Segment cleaning consolidates the valid filesystem blocks from a segment and resets the segment so that it can host incoming write requests. To avoid frequent consolidation, it is critical that filesystem blocks with similar life expectancy are clustered in the same segment. In this work, we examine the block classification algorithm of F2FS and analyze the pros and cons of the proposed approaches.
  • Multi-queue block device in modern Linux
      Recent NVMe-based SSDs export multiple command queues. The host's CPUs can dispatch multiple commands in parallel, and the commands in these queues are serviced concurrently. In this topic, we analyze the block layer for multiple command queues and understand how an SSD with multiple command queues interacts with the host. We read the code (block/blk-mq*, drivers/nvme/*). We run the IOZone, Mobibench, and Filebench benchmark programs to characterize the behavior of the multi-queue block layer.
  • Linux memory manager vs. Android workload
      We examine the memory manager of Linux, including its allocation, deallocation, and internal and external fragmentation characteristics. The current Linux memory manager is not well aligned with the memory access characteristics of Android apps because of fragmentation. Memory fragmentation is a critical issue in modern computing platforms, from smartphones to large-scale servers. In this topic, you can devise a new memory allocation algorithm and evaluate its performance.
  • Understanding the Log-Structured Merge Tree in modern NoSQL DBs
      This topic lies between the OS and the application. The log-structured merge tree is designed for write performance: it turns random writes into sequential ones. It is at the heart of many key-value storage engines, such as LevelDB and RocksDB, and is available as an option in WiredTiger, the storage engine of MongoDB. MongoDB is one of the most popular NoSQL DBMSs. It is used to maintain game items in game servers, blog posts in SNS, etc. In this topic, we examine the IO behavior of the log-structured merge tree, including split and merge.
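
For the physical memory management topic, the trade-off between TLB coverage and internal fragmentation can be worked out with simple arithmetic. The sketch below is only an illustration: the TLB size of 1536 entries and the function names are assumptions chosen for the example, not properties of any particular CPU or of the Linux code.

```python
KB, MB, GB = 1 << 10, 1 << 20, 1 << 30

ENTRIES = 1536  # hypothetical TLB size, chosen only for illustration


def tlb_coverage(entries, page_size):
    """Bytes of memory that can be translated without a TLB miss."""
    return entries * page_size


def internal_fragmentation(request, page_size):
    """Bytes wasted when a request is rounded up to whole pages."""
    pages = -(-request // page_size)  # ceiling division
    return pages * page_size - request


if __name__ == "__main__":
    for name, size in [("4 KB", 4 * KB), ("2 MB", 2 * MB), ("1 GB", 1 * GB)]:
        covered = tlb_coverage(ENTRIES, size) / MB
        wasted = internal_fragmentation(2 * MB + 4 * KB, size) / KB
        print(f"{name} pages: {covered:.0f} MB covered, {wasted:.0f} KB wasted")
```

With the same number of entries, larger pages cover far more memory, but a request slightly larger than one huge page wastes almost a whole extra huge page.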
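
For the buffered write and fsync() topics, a first experiment is to time a 4 KB append with and without fsync(). The sketch below is a minimal user-level measurement, assuming a temporary file on whatever filesystem backs the temporary directory; the helper names timed_append and compare are hypothetical, and the numbers it produces depend entirely on the machine and filesystem.

```python
import os
import tempfile
import time


def timed_append(path, payload, sync):
    """Append payload to path; return elapsed seconds for write (+ fsync)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        start = time.perf_counter()
        os.write(fd, payload)      # buffered write: data lands in the page cache
        if sync:
            os.fsync(fd)           # force the dirty data down to storage
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(payload=b"x" * 4096):
    """Return (buffered seconds, synced seconds, final file size in bytes)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "log.bin")  # hypothetical file name
        buffered = timed_append(path, payload, sync=False)
        synced = timed_append(path, payload, sync=True)
        size = os.path.getsize(path)
    return buffered, synced, size


if __name__ == "__main__":
    buffered, synced, _ = compare()
    print(f"buffered: {buffered * 1e6:.0f} us, with fsync: {synced * 1e6:.0f} us")
```

Repeating the measurement many times and looking at the tail of the latency distribution, rather than a single run, is what exposes the anomalous delays these topics study.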
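
For the segment cleaning topic, the reason blocks with similar life expectancy should share a segment can be shown with a toy model. The sketch below uses a simple greedy victim policy (pick the segment with the fewest valid blocks). It is not the F2FS implementation: the segment size, the dictionary layout, and the function names are all assumptions made for illustration.

```python
SEGMENT_SIZE = 4  # blocks per segment (hypothetical; real segments are far larger)


def pick_victim(valid_counts):
    """Greedy policy: id of the dirty segment with the fewest valid blocks."""
    return min(valid_counts, key=valid_counts.get)


def clean(segments):
    """Clean one segment.

    segments maps segment id -> list of still-valid block ids.
    Returns (victim id, blocks that must be rewritten, blocks reclaimed).
    """
    valid_counts = {seg: len(blocks) for seg, blocks in segments.items()}
    victim = pick_victim(valid_counts)
    to_copy = segments.pop(victim)            # valid blocks are rewritten elsewhere
    reclaimed = SEGMENT_SIZE - len(to_copy)   # space actually gained
    return victim, to_copy, reclaimed


# Hot blocks were overwritten and are no longer valid; cold blocks (c*) remain.
mixed = {0: ["c0", "c1"], 1: ["c2", "c3"]}         # hot and cold shared segments
separated = {0: [], 1: ["c0", "c1", "c2", "c3"]}   # hot and cold kept apart

if __name__ == "__main__":
    print(clean(dict(mixed)))      # copies two blocks to reclaim two
    print(clean(dict(separated)))  # copies nothing and reclaims a whole segment
```

In the mixed layout every cleaning pass has to copy cold blocks, while in the separated layout the segment that held only hot blocks can be reset for free.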
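
For the log-structured merge tree topic, the way random writes become sequential ones can be sketched in a few lines. This is a toy in-memory model, not the code of any real storage engine: the class name TinyLSM, the memtable_limit threshold, and the lists standing in for on-disk sorted files are assumptions made for illustration.

```python
import heapq


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus a list of sorted runs."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # absorbs writes in any key order
        self.runs = []                        # sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Emit the memtable as one sorted run (one sequential write on disk)."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:                 # the newest run wins
            for k, v in run:
                if k == key:
                    return v
        return None

    def compact(self):
        """Merge all runs into one, keeping only the newest value per key."""
        # Tag each entry with its run's age so that newer entries sort first.
        tagged = [[(k, age, v) for k, v in run]
                  for age, run in enumerate(self.runs)]
        merged, last_key = [], object()
        for k, _age, v in heapq.merge(*tagged):
            if k != last_key:
                merged.append((k, v))
                last_key = k
        self.runs = [merged] if merged else []
```

Writes only ever touch the memtable, and the disk would see whole sorted runs; the price is paid later in compact(), which rereads and rewrites data, and that merge traffic is the IO behavior this topic examines.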