Linux 2.6/3.0 Changes

This was written by Chris Verges and David Shea and given on Wed Apr 02 2003.


1. Introduction

Linux was first written in the early 1990s by Linus Torvalds. It began as a POSIX-compliant Unix clone for x86 machines, but has grown into the friendly penguin OS we all know and love. Strictly speaking, "Linux" refers only to the kernel of the operating system.

Definition: kernel
\Ker'nel\, n. (1) the inner and usually edible part of a seed or grain or nut or fruit stone; (2) the choicest or most essential or most vital part of some idea or experience; (3) (operating system) the essential part of Unix or other operating systems, responsible for resource allocation, low-level hardware interfaces, security, etc.
http://dictionary.reference.co

Linux kernels generally fall into two categories: stable and unstable. Stable kernels have been tested extensively, are the production versions, and have an even minor number. Unstable (development) kernels have an odd minor number.

Kernel versioning schema:
   x.y.z                  2.4.21 (Stable)
   ^ ^ ^                  2.5.66 (Unstable)
   | | |                  2.6.1  (Stable)
   | | +--- release number
   | +----- minor version
   +------- major version
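The even/odd rule above is easy to express in code. Below is a minimal Python sketch, assuming a plain x.y.z version string like the examples in the diagram; the function name classify_kernel_version is hypothetical, and only the minor number is examined.

```python
def classify_kernel_version(version: str) -> str:
    """Classify an x.y.z kernel version using the even/odd minor-number rule."""
    _major, minor, _release = (int(part) for part in version.split("."))
    # Even minor number -> stable (production); odd -> unstable (development).
    return "stable" if minor % 2 == 0 else "unstable"


for v in ("2.4.21", "2.5.66", "2.6.1"):
    print(v, classify_kernel_version(v))
```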

As a rule, NEW USERS SHOULD NOT MESS WITH UNSTABLE KERNELS! :) However, if you feel adventurous and have good backups (or don't care about your data), go for it. The official kernel website is http://www.kernel.org/. If you don't feel like waiting, point your browser to http://www.kernel.org/pub/linu for a full listing of kernel-related source. This archive contains everything from the original 1.0 kernel up to the latest unstable release, including some obscure patches perhaps not found elsewhere.

2. Installing 2.5

The latest unstable kernel as of this writing is 2.5.66. Soon (i.e. within my lifetime) the unstable branch will be frozen, accepting no further changes, and release-candidate testing will begin. Since some things have already been unofficially frozen, we can start discussing those. One of these is the build and install procedure.

Since kernel 2.2, the build-and-install sequence has been:

make mrproper          # clean the source tree (also deletes any old .config)
make menuconfig        # choose the options you want
make dep               # figure out all dependencies
make bzImage           # build the kernel proper
make modules           # compile the modules
make modules_install   # install the modules

Starting with 2.5, the sequence has been cut back significantly:

make menuconfig
make

By default, 'make' builds the kernel proper for your architecture and compiles all configured modules; installing the modules is still a separate 'make modules_install' step. It supports -jN for parallel builds (running up to N compile jobs at the same time). For the kernel hackers out there, the build system can also compile individual files: type 'make path/to/file.o'. For the graphically inclined, 'make xconfig' can replace 'make menuconfig', using the latest in Qt graphical components. (It is slow, but bearable.)

Rumors started a while back about the kbuild system being used in 2.5; for those who don't know, kbuild is an alternative build system for large projects. It has NOT been included in this release.

3. Major changes in 2.5
  • /proc/stat format changed
  • in-kernel module loader now frees module memory marked __init or __initdata
  • kernel build system (see above section)
  • I/O subsystem reworked
    • faster due to new memory layers
    • 512-byte granularity on O_DIRECT I/O calls
    • access up to 16TB on 32-bit architectures, 8EB on 64-bit
  • /proc/sys/vm/swappiness now allows users to set preference for page cache over mapped memory
  • Ingo Molnar's O(1) scheduler
    • sched_yield() behavior change (see section 6)
  • preemptive patches included kernel-wide
  • futexes (Fast Userspace Mutexes) ( http://ds9a.nl/futex-manpages )
  • kernel threading improvements
    • ptrace functionality
    • /proc updates for threading
  • core dumps with style: configurable core file names (/proc/sys/kernel/core_pattern)
  • ALSA included as standard (entered late into 2.4)
    • replaces OSS but provides backwards-compatibility
  • AGP 3.0 supported by overhauled agpgart
  • Faster system calls for chips that support the SYSENTER instruction
    • Intel Pentium Pro/AMD Athlon and higher
    • need updated glibc (>= 2.3.1) for this to work
  • SCSI is almost completely broken
  • quotas have been completely rewritten
  • CD writing/reading overhaul
  • New filesystems (JFS, XFS, NFSv4, sysfs, CIFS, etc.)
  • CPU Frequency Scaling (like SpeedStep(tm) technology)
  • IPsec included in the mainline kernel
  • Number of architecture ports expanded
4. Deprecated in 2.5
  • khttpd (kernel-based webserver)
  • DRM for XFree86 4.0 (replaced by DRM for 4.1.0)
  • system call table no longer exported
  • ham radio support moved to userspace
  • must boot from a bootloader (i.e. no more booting a bare kernel image straight from floppy)
  • version 0 swap partitions (only v1 and later are supported)
  • Compressed VFAT removed
    • remember DoubleSpace/DriveSpace from MS-DOS 6.x?
  • usbdevfs
  • elvtune
5. The two R's of CDs

Beginning with the 2.5 kernel, CD writing and ripping can be performed in DMA mode. For the hardware illiterate, DMA stands for Direct Memory Access: a device is handed a request for a block of data and transfers that data directly into preallocated memory, leaving the CPU free to do other work. Hard drives and network cards are the two best-known DMA devices. Without DMA, a computer must use PIO (Programmed Input/Output), in which the CPU itself must move individual bytes of data from the device's buffer into RAM. The advent of DMA brought speed increases to hard drive and compact disc technologies. Until now, however, CD writing and audio ripping were still limited to PIO operation.

This contribution to the kernel was made by Jens Axboe (axboe at suse dot de). It has been available in patch form for the 2.4 kernel for some time, though it was never fully accepted until 2.5. His work tends to center on multimedia block devices (CDs and DVDs), and his website has great documentation about his speed enhancements in this area.

http://kernel.org/pub/linux/kernel/people/axboe/
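The PIO/DMA difference described above can be modeled as a count of the work the CPU does. This is a toy Python model, not driver code: the function names and the one-operation-per-byte cost for PIO are simplifying assumptions taken from the description above, and real hardware details differ.

```python
def pio_transfer(device_buffer: bytes, ram: bytearray) -> int:
    """PIO: the CPU copies every byte itself. Returns CPU operations used."""
    cpu_ops = 0
    for i, byte in enumerate(device_buffer):
        ram[i] = byte  # one CPU-driven move per byte
        cpu_ops += 1
    return cpu_ops


def dma_transfer(device_buffer: bytes, ram: bytearray) -> int:
    """DMA: the CPU issues one request; the device fills preallocated RAM."""
    cpu_ops = 1  # CPU hands the device a request for the whole block
    ram[: len(device_buffer)] = device_buffer  # the device does the copy
    cpu_ops += 1  # CPU handles the single "transfer done" notification
    return cpu_ops


block = bytes(range(256)) * 8  # a hypothetical 2048-byte block from the drive
print(pio_transfer(block, bytearray(len(block))))  # prints 2048
print(dma_transfer(block, bytearray(len(block))))  # prints 2
```

Both functions leave the same data in RAM; the point is how much of the CPU's time each one consumes along the way.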

6. The (infamous CS3210) O(1) Scheduler

One fateful day, Ingo sat at his computer, a case of Jolt at his feet and the stench of 1000 long coding sessions all around. As his fingers began to type, he wrote ...

Okay, what actually happened was an attempt to improve scheduler latency. Since the 1.0 kernel, the Linux scheduler has always been O(n) in the number of runnable processes. The reason lies in the data structure used to represent active processes in Linux.

6.1. Brief History

A process in Linux is represented by a struct (struct task_struct). This struct contains things like the process id (PID), the nice value (a user-adjustable static priority), etc. When a process wanted processor time, its struct would be added to the scheduler's "active processes" list.

Barring special real-time processes, each process would be given a quantum of time per epoch. What's an epoch? Imagine the point in the course of human history where caffeine can no longer be produced, i.e. the end of time as we know it. To the scheduler, an epoch ends when every process that could be scheduled has been scheduled, that is, when every runnable process has used up its quantum on the CPU. (A quantum is a small slice of something.)

The scheduler in 2.4 and earlier simply maintained a linked list of processes and scanned the whole list every time it ran to decide which process would go next. Scheduling priority in Linux allows some processes (interactive ones) to be scheduled before non-interactive processes that simply hog the CPU. You can adjust a process's priority with the 'nice' command. A higher niceness means a lower priority: the process is being "nicer" to everyone else. (Nice ranges from -20, the highest priority, to 19, the lowest.)
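As a rough illustration of the O(n) behavior, here is a Python sketch of a 2.4-style selection loop. The weight function is a simplified assumption loosely modeled on 2.4's goodness() (remaining quantum plus a nice-based bonus, ignoring real-time and SMP bonuses); the Task fields and function names are illustrative, not kernel code.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    nice: int     # -20 (highest priority) .. 19 (lowest)
    counter: int  # ticks left in this epoch's quantum


def goodness(task: Task) -> int:
    """Simplified weight: a task with no quantum left cannot run this epoch."""
    if task.counter == 0:
        return 0
    return task.counter + (20 - task.nice)


def pick_next(runqueue: list[Task]) -> Task | None:
    """O(n): every scheduling decision walks the entire runqueue."""
    best, best_weight = None, 0
    for task in runqueue:
        weight = goodness(task)
        if weight > best_weight:
            best, best_weight = task, weight
    return best  # None: every quantum is used up, so the epoch is over


runqueue = [Task(1, 0, 5), Task(2, -5, 3), Task(3, 10, 6)]
print(pick_next(runqueue).pid)  # prints 2 (weight 3 + 25 = 28)
```

Doubling the number of runnable tasks doubles the work in pick_next, and that loop runs on every scheduling decision.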

6.2. Current History

Scanning this linked list EVERY SINGLE TIME the scheduler ran kept the computer in kernel space far too long, adding latency. In attempting to reduce this latency, Ingo Molnar reasoned that the highest-priority available process could be found with a priority queue. (Again, we're not considering real-time processes yet!) His kernel patch replaces the original linked list with two priority queues: one of active processes and one of expired processes.

The highest-priority process is removed from the "active" priority queue and runs for some part of its quantum. If it is preempted or yields control with quantum left, it is placed back on the active queue. If it has no quantum left, however, its quantum is reset and the process is thrown onto the "expired" queue. This repeats until the active queue is completely empty. At that point the two queues swap roles: the expired queue becomes the active queue, and the empty one becomes the new expired queue. Then the show continues....
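The active/expired cycle described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: lower numbers mean higher priority (as inside the kernel), there are only 8 priority levels instead of the kernel's 140, and every task uses its whole quantum each time it is picked, so the preempted-with-quantum-left path is omitted. PrioArray and simulate are hypothetical names.

```python
from collections import deque


class PrioArray:
    """One priority array: a FIFO queue of tasks for each priority level."""

    def __init__(self, levels: int):
        self.queues = [deque() for _ in range(levels)]
        self.count = 0

    def enqueue(self, prio: int, name: str) -> None:
        self.queues[prio].append(name)
        self.count += 1

    def dequeue_highest(self):
        # The number of levels is fixed, so this scan does not grow with the
        # number of tasks (the real scheduler replaces it with a bitmap lookup).
        for prio, queue in enumerate(self.queues):
            if queue:
                self.count -= 1
                return prio, queue.popleft()
        return None


def simulate(tasks, slots, levels=8):
    """Return the order in which tasks, given as (name, prio), get the CPU."""
    active, expired = PrioArray(levels), PrioArray(levels)
    for name, prio in tasks:
        active.enqueue(prio, name)
    order = []
    for _ in range(slots):
        if active.count == 0:
            # Swapping the two arrays just exchanges two references: O(1).
            active, expired = expired, active
        picked = active.dequeue_highest()
        if picked is None:
            break  # no runnable tasks at all
        prio, name = picked
        order.append(name)
        # Quantum used up: reset it and move the task to the expired array.
        expired.enqueue(prio, name)
    return order


print(simulate([("editor", 1), ("compiler", 3), ("daemon", 3)], slots=6))
```

Note how the high-priority "editor" must wait for both other tasks before it runs again; this is the lag described in problem 1 below.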

Removing the highest-priority process in his scheduler is O(1): it takes the same time no matter how many processes are runnable. It relies on a per-priority bitmap and a fast find-first-set-bit operation, details beyond the scope of this presentation. Feel free to ask any questions afterwards, however. There are a few problems with it:

  1. Interactive processes run for their quantum and then do not get scheduled again until ALL OTHER PROCESSES (including non-interactive ones) have used up their quanta. This can cause noticeable lag when the system is under high load.
  2. sched_yield() now causes processes to sleep for quite some time because of the dual-queue approach. Well-behaved programs should not depend on sched_yield() timing, yet some (like OpenOffice) were written to take advantage of the old scheduler's behavior. These programs will seem to respond more sluggishly until their developers recode those portions.
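The bitmap trick mentioned above can be sketched as follows. This Python model assumes the O(1) scheduler's layout of 140 priority levels (0-99 for real-time tasks, 100-139 for normal ones, lower meaning higher priority), with one bit per level that is set while that level's queue is non-empty; the class and method names are hypothetical. In the kernel, finding the first set bit maps to a single instruction on many CPUs (bsf on x86); Python's arbitrary-size integers only imitate that here.

```python
NUM_PRIOS = 140  # 0-99 real-time, 100-139 normal; lower number = higher priority


class PrioBitmap:
    """Bit p is set while the queue for priority p holds at least one task."""

    def __init__(self):
        self.bits = 0

    def mark_nonempty(self, prio: int) -> None:
        self._check(prio)
        self.bits |= 1 << prio

    def mark_empty(self, prio: int) -> None:
        self._check(prio)
        self.bits &= ~(1 << prio)

    def first_set(self):
        """Highest-priority non-empty level (lowest set bit), or None."""
        if self.bits == 0:
            return None
        lowest_bit = self.bits & -self.bits  # isolate the lowest set bit
        return lowest_bit.bit_length() - 1

    @staticmethod
    def _check(prio: int) -> None:
        if not 0 <= prio < NUM_PRIOS:
            raise ValueError(f"priority {prio} out of range")


bitmap = PrioBitmap()
bitmap.mark_nonempty(120)  # a normal task is waiting
bitmap.mark_nonempty(105)  # a higher-priority normal task is waiting
print(bitmap.first_set())  # prints 105
```

Because the number of levels is fixed, the lookup cost does not depend on how many processes are queued, which is where the O(1) comes from.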
7. Excuse me, I need to interrupt you

The preemptive patches in the kernel are quite a leap forward in the overall responsiveness of a Linux system. Combined with Ingo's O(1) scheduler, they give an amazing decrease in lag time. Note, however, that the O(1) scheduler and the preemptive patches were developed independently, i.e. you can patch a 2.4 kernel with the O(1) scheduler, the preemptive patches, or both.

If a process is preemptible, it can be interrupted mid-execution to allow another process (usually one with higher priority, for example one woken up by an interrupt handler) to run. Code running inside the Linux kernel, however, has never been preemptible. That is, once you make a system call, you cannot be taken off the processor until the system call has completed or voluntarily sleeps. (On the way out of the system call, the kernel checks whether the process needs to be rescheduled.)

The preempt kernel patch, maintained by Robert Love, allows 99.9% of the kernel to be preempted. A few areas cannot be preempted (the scheduler and some SMP synchronization code, for instance); these disable preemption for the duration of their execution. More information about the preempt patches can be found at their website:

http://kpreempt.sourceforge.ne
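The way the preempt patch protects those non-preemptible regions can be pictured as a nesting counter: code entering a critical region bumps it, and a pending reschedule is honored only once it drops back to zero. The Python model below is a sketch under that assumption; the method names echo the kernel's preempt_disable()/preempt_enable(), but the class itself is hypothetical, and switches simply counts where schedule() would be called.

```python
class PreemptModel:
    """Toy model of a per-task preemption counter (hypothetical class)."""

    def __init__(self):
        self.preempt_count = 0     # > 0 means "not safe to preempt right now"
        self.need_resched = False  # a higher-priority task is waiting
        self.switches = 0          # times schedule() would have been called

    def preempt_disable(self) -> None:
        self.preempt_count += 1    # nests: e.g. one increment per lock held

    def preempt_enable(self) -> None:
        self.preempt_count -= 1
        self._maybe_preempt()      # a deferred reschedule may happen now

    def wake_higher_priority_task(self) -> None:
        self.need_resched = True
        self._maybe_preempt()      # preempt immediately if it is safe

    def _maybe_preempt(self) -> None:
        if self.preempt_count == 0 and self.need_resched:
            self.need_resched = False
            self.switches += 1


cpu = PreemptModel()
cpu.preempt_disable()            # enter a non-preemptible region
cpu.wake_higher_priority_task()  # deferred: the count is 1
print(cpu.switches)              # prints 0
cpu.preempt_enable()             # leaving the region triggers the switch
print(cpu.switches)              # prints 1
```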

8. User-mode Linux

Run Linux inside Linux! Wait a minute ... has the penguin gotten to you again?!

UML is actually a port of the Linux kernel that runs as an ordinary user-space process, in effect a virtual machine. It is designed to help developers poke around in the more sensitive internals without crashing machines and rebooting servers. Some people have gone so far as to run UML as their "main" system (David Coulson and http://usermodelinux.org/ being the most notable).

With UML, a developer can test kernel code using normal tools like Electric Fence, gdb/dbx, etc. The "real" kernel keeps the user-mode kernel separate from the hardware unless you grant it access; even so, some things just cannot be done because of the abstraction and handling routines in between. All in all, however, it is a great tool for the kernel savvy and non-savvy alike.

9. Filesystems

Filesystems added in the 2.5 kernel include NFSv4, XFS, JFS, and sysfs.

NFSv4 is the new reimplementation of NFS, and is to include support for ACLs, a "pseudo filesystem", client caching of directories and data, state management, and file locking.

sysfs, not to be confused with the system call of the same name, is a memory-based filesystem for representing and modifying kernel objects (kobjects). This is similar to the functionality currently provided by /proc, but everything is better defined, and sysfs does not contain process information.

JFS is IBM's journaling filesystem (from AIX) and XFS is Silicon Graphics' journaling filesystem (from IRIX). Filesystems were covered more thoroughly by Ben McMillan at:
http://lugatgt.org/articles/filesystems/
10. Device Mapper

The kernel device-mapper is a driver that allows the definition of new block devices made up of ranges of sectors from existing block devices. LVM2 uses it to define logical volumes.
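To make the idea concrete, here is a Python sketch of how a linear mapping translates sectors. It assumes a table in the "<start> <length> linear <device> <offset>" form used by device-mapper tables (sizes in 512-byte sectors); the device names /dev/hda5 and /dev/hdb1 and the function names are hypothetical, and only the linear target is modeled.

```python
from typing import NamedTuple


class LinearTarget(NamedTuple):
    start: int   # first sector of this segment on the new (mapped) device
    length: int  # number of sectors in the segment
    device: str  # existing block device that holds the data
    offset: int  # first sector of the segment on that device


def parse_table(text: str) -> list:
    """Parse lines of the form '<start> <length> linear <device> <offset>'."""
    targets = []
    for line in text.strip().splitlines():
        start, length, kind, device, offset = line.split()
        if kind != "linear":
            raise ValueError(f"only the linear target is modeled, got {kind!r}")
        targets.append(LinearTarget(int(start), int(length), device, int(offset)))
    return targets


def map_sector(targets: list, sector: int) -> tuple:
    """Translate a sector of the mapped device into (device, sector)."""
    for t in targets:
        if t.start <= sector < t.start + t.length:
            return t.device, t.offset + (sector - t.start)
    raise ValueError(f"sector {sector} is not mapped")


# A hypothetical logical volume stitched together from two partitions.
table = parse_table("""
0    2048 linear /dev/hda5 0
2048 4096 linear /dev/hdb1 384
""")
print(map_sector(table, 100))   # prints ('/dev/hda5', 100)
print(map_sector(table, 3000))  # prints ('/dev/hdb1', 1336)
```

A logical volume in LVM2 is, at this level, just such a table handed to the kernel: the new device's sectors are contiguous even though the data lives on two different disks.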

11. Quota

The 2.5 kernel uses a new quota format that allows for 32-bit UIDs and GIDs, needed for filesystems such as ReiserFS and XFS.

12. CPU Frequency Scaling

CPU frequency scaling allows the user to change the clock speed of the CPU while the computer is running, which is useful for laptops because a lower clock speed saves power. Support for this exists in 2.5 proper.

13. IPSec

2.5 adds support for a new protocol family, PF_KEY (the key management socket interface), and IPsec network encryption. IPsec support was ported from KAME and is used by VPNs and the like.

14. Resources