

Linux 2.6/3.0 Changes

Presented by Chris Verges & David Shea on April 2, 2002



1. Introduction

Linus Torvalds began writing Linux in 1991. It aimed to be a
POSIX-compliant Unix clone for x86 machines, but has grown into the
friendly penguin OS we all know and love. Strictly speaking, "Linux"
refers only to the kernel of the operating system.

Definition: kernel


\Ker’nel\, n. (1) the inner and usually edible part of a
seed or grain or nut or fruit stone; (2) the choicest or
most essential or most vital part of some idea or
experience; (3) (operating system) the essential part of
Unix or other operating systems, responsible for resource
allocation, low-level hardware interfaces, security, etc.



http://dictionary.reference.com/search?q=kernel

When dealing with the Linux kernel, there are two categories one
generally falls into: stable and unstable. Stable kernels, meaning
they have been tested extensively and are the production version, have
an even minor number. Unstable versions have an odd minor number.

Kernel versioning schema:

    x.y.z        2.4.21 (Stable)
    ^ ^ ^        2.5.66 (Unstable)
    | | |        2.6.1  (Stable)
    | | +--- release number
    | +----- minor version
    +------- major version
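
The even/odd rule is easy to check mechanically. Here is a minimal
sketch in Python; the function name classify_kernel is made up for
this example and is not part of any kernel tooling. It only handles
plain x.y.z version strings.

```python
def classify_kernel(version):
    """Split an x.y.z kernel version and label its series.

    Hypothetical helper for illustration: an even minor number means a
    stable series, an odd minor number means an unstable series.
    """
    major, minor, release = (int(part) for part in version.split("."))
    series = "stable" if minor % 2 == 0 else "unstable"
    return major, minor, release, series


if __name__ == "__main__":
    for v in ("2.4.21", "2.5.66", "2.6.1"):
        print(v, classify_kernel(v)[3])
```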

As a rule, NEW USERS SHOULD NOT MESS WITH UNSTABLE KERNELS! :)
However, if you feel adventurous and have wonderful backups/don’t care
about your data, go for it. The official kernel website can be found
at http://www.kernel.org/. If you don’t feel like waiting, point your
browser to http://www.kernel.org/pub/linux/kernel/ for a full listing
of kernel-related source. This archive contains everything from the
original 1.0 kernel up to the latest unstable release, including some
obscure patches perhaps not found elsewhere.


2. Installing 2.5

The latest unstable kernel as of this writing is 2.5.66. Soon
(i.e. within my lifetime) the unstable branch will be locked,
accepting no further changes, and release candidate testing will
begin. Since some things have already been unofficially locked, we
can start discussing those. One of these is the install methodology.

From kernel 2.2, the install script has been:

    make mrproper
    make menuconfig        --- choose the options you want
    make dep               --- figure out all dependencies
    make bzImage           --- make the kernel proper
    make modules           --- compile the modules
    make modules_install   --- install the modules

Starting with 2.5, the script has been significantly cut back:

make menuconfig
make

By default ‘make’ builds the kernel proper for your architecture and
compiles all modules; installing them still takes a separate ‘make
modules_install’. It supports -jN for parallel make operation (running
up to N build jobs at the same time). For the kernel hackers out
there, the build system will also compile an individual file when you
type ‘make filename’. For the graphically inclined, ‘make xconfig’ can
replace ‘make menuconfig’, using the latest in Qt graphic components.
(It is slow, but bearable.)

Rumors started a while back about the kbuild system being used in 2.5;
for those who don’t know, kbuild is an alternative build system for
large projects. It has NOT been included in this release.


3. Major changes in 2.5

  • /proc/stat format changed
  • the in-kernel module loader now frees memory marked __init or __initdata
  • kernel build system (see above section)
  • I/O subsystem reworked
    • faster due to new memory layers
    • 512 byte granularity on O_DIRECT I/O calls
    • access up to 16TB on 32-bit architectures, 8EB on 64-bit
  • /proc/sys/vm/swappiness now allows users to set preference for page
    cache over mapped memory
  • Ingo Molnar’s O(1) scheduler
    • sched_yield() problem
  • preemptive patches included kernel-wide
  • futexes (Fast Userspace Mutexes) (http://ds9a.nl/futex-manpages)
  • kernel threads improvements
    • ptrace functionality
    • /proc updates for threading now
  • core dump with style (/proc/sys/kernel/core_pattern)
  • ALSA included as standard (entered late into 2.4)
    • replaces OSS but provides backwards-compatibility
  • AGP 3.0 supported by overhauled agpgart
  • Faster system calls for chips that support the SYSENTER extension
    • Intel Pentium Pro/AMD Athlon and higher
    • need updated glibc (>= 2.3.1) for this to work
  • SCSI is almost completely broken
  • quotas have been completely rewritten
  • CD writing/reading overhaul
  • New filesystems (JFS, XFS, NFSv4, sysfs, CIFS, etc.)
  • CPU Frequency Scaling (like SpeedStep(tm) technology)
  • IPSec included in the mainstream
  • Number of ports expanded
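
The 16TB and 8EB figures in the I/O bullet above fall out of simple
arithmetic. The sketch below rests on assumptions that are not stated
in this presentation: 4096-byte pages addressed by a 32-bit page
number on 32-bit machines, and a signed 64-bit byte offset on 64-bit
machines. Treat it as a back-of-the-envelope check, not a statement
of how the kernel computes its limits.

```python
PAGE_SIZE = 4096   # assumed page size in bytes (typical for x86)
TIB = 2 ** 40      # one terabyte, binary flavor
EIB = 2 ** 60      # one exabyte, binary flavor


def max_bytes_32bit(page_size=PAGE_SIZE):
    # Assumption: 2**32 distinct page numbers, one page each.
    return 2 ** 32 * page_size


def max_bytes_64bit():
    # Assumption: byte offsets are signed 64-bit, so 63 usable bits.
    return 2 ** 63


if __name__ == "__main__":
    print(max_bytes_32bit() // TIB, "TB on 32-bit")
    print(max_bytes_64bit() // EIB, "EB on 64-bit")
```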


4. Deprecated in 2.5

  • khttpd (kernel-based webserver)
  • DRM for XFree86 4.0 (upgraded for 4.1.0)
  • system call table no longer exported
  • ham radio support moved to userspace
  • must boot from bootloader (i.e. no more straight floppy-based
    booting)
  • swap partitions using version 0 (only supports >= v1)
  • Compressed VFAT removed
    • remember the old DriveSpace from DOS 6.2?
  • usbdevfs
  • elvtune


5. The two R’s of CDs

Beginning with the 2.5 kernel, CD writing and ripping can be performed
under DMA mode. For the hardware illiterate, DMA stands for Direct
Memory Access; it allows certain devices to get a request for a block
of information and fill that request directly to preallocated memory,
leaving the CPU free to do other work. Hard drives and network cards
are the two most well-known DMA devices. Without DMA, a computer must
use PIO (Programmed Input/Output). Under PIO, the CPU must handle
the task of moving individual bytes of data from the device’s buffer
to RAM. The advent of DMA brought speed increases in hard drive and
compact disc technologies. Until now, however, CD writing and audio
ripping were still limited to PIO operation only.

This contribution to the kernel was made by Jens Axboe
(axboe at suse dot de). It has actually been available in patch form
for the 2.4 kernel for some time now, though never fully accepted
until 2.5.
His work generally tends to center around multimedia block devices
(CDs and DVDs), so his website has great documentation about his speed
enhancements in this area.

http://kernel.org/pub/linux/kernel/people/axboe/


6. The (infamous CS3210) O(1) Scheduler

One fateful day, Ingo sat at his computer, a case of Jolt at his feet
and the stench of 1000 long coding sessions all around. As his
fingers began to type, he wrote …

Okay, what actually happened was an attempt to improve the scheduler
latency. Since the 1.0 kernel, the scheduler in Linux has always been
O(n). The reason for this is the data structures used to represent
active processes in Linux.


6.1. Brief History

A process in Linux is actually represented by a struct
(task_struct). This struct contains things like the process id
(PID), nice value (a user-adjustable value that biases the
process’s scheduling priority), etc. When a process wanted to be
given processor time, it would tell the scheduler to add its struct
to the "active processes" list.

Barring special real-time processes, each process would be given a
quantum of time per epoch. What’s an epoch? Imagine the point in
the course of human history where caffeine can no longer be
produced, i.e. the end of time as we know it. To the scheduler,
an epoch ends when all the processes that could be scheduled have
been scheduled. That is, every runnable process has run on the
CPU for its full quantum of time. (A quantum is a small slice
of something; the plural is quanta.)

The scheduler in 2.4 simply maintained a linked list of processes
and would scan the list every time it was run to determine which
process would go next. The scheduling priority in Linux allows for
some processes (interactive ones) to be scheduled before
non-interactive processes that simply hog the CPU. You can set
this level with the ‘nice’ command. A lower nice value gives a
process a higher priority. (Nice values range from -20 to 19.)
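
To make the cost of that scan concrete, here is a toy model in Python.
It is only a sketch of the idea, not the 2.4 code: the Task class,
the base_quantum() mapping from nice value to time slice, and the
"lowest nice value wins" rule are all simplifications invented for
this example, and it uses the conventional -20 to 19 nice range.

```python
from dataclasses import dataclass


@dataclass
class Task:
    pid: int
    nice: int          # -20 (highest priority) .. 19 (lowest priority)
    quantum: int = 0   # ticks left in the current epoch


def base_quantum(nice):
    # Hypothetical mapping: a lower nice value earns a longer slice.
    return 20 - nice


def pick_next(tasks):
    """Pick the next task by scanning every task: O(n) per decision."""
    if all(t.quantum == 0 for t in tasks):
        # Every task has used its quantum, so a new epoch starts.
        for t in tasks:
            t.quantum = base_quantum(t.nice)
    best = None
    for t in tasks:  # the linear scan the text complains about
        if t.quantum > 0 and (best is None or t.nice < best.nice):
            best = t
    return best


if __name__ == "__main__":
    tasks = [Task(1, 0), Task(2, -5), Task(3, 10)]
    chosen = pick_next(tasks)
    print("first to run: pid", chosen.pid)
```

Every call walks the whole list, so the time spent deciding grows
with the number of runnable processes.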


6.2. Current History

Scanning this linked list EVERY SINGLE TIME the scheduler was run
caused the computer to stay in kernelspace far too long, which shows
up as latency. In attempting to improve this latency, Ingo Molnar
reasoned that determining the highest priority process available
could be done via a priority queue. (Again, we’re not considering
real-time processes yet!) His kernel patch takes the original
linked list and transforms it into two priority queues, one of
expired processes and one of active processes.

The highest priority process is removed from the "active" priority
queue. It runs for some portion of its quantum. If it is preempted
or yields control and has time left in its quantum, it is placed
back on the active queue. If it has no time left, however, its
quantum is reset and the process is thrown onto the "expired"
queue. This is repeated until the active queue is completely
empty. At that point, the expired queue becomes the active queue,
and vice versa. Then the show continues….
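
The active/expired dance can be sketched in a few lines of Python.
This is a toy model under stated assumptions, not Ingo’s code: the
class names, the choice of 40 priority levels, and the use of one
FIFO queue per level plus a bitmap of non-empty levels are
illustrative choices. The bitmap is there so that finding the highest
priority level is a fixed amount of work no matter how many tasks
are queued.

```python
from collections import deque

NUM_PRIOS = 40  # assumed number of priority levels; 0 is the highest


class PrioArray:
    """One FIFO queue per priority plus a bitmap of non-empty queues."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIOS)]
        self.bitmap = 0  # bit p is set while queues[p] holds a task

    def enqueue(self, prio, task):
        self.queues[prio].append(task)
        self.bitmap |= 1 << prio

    def dequeue_highest(self):
        if self.bitmap == 0:
            return None
        # Isolate the lowest set bit; real hardware has a single
        # "find first set" style instruction for this step.
        prio = (self.bitmap & -self.bitmap).bit_length() - 1
        task = self.queues[prio].popleft()
        if not self.queues[prio]:
            self.bitmap &= ~(1 << prio)
        return prio, task


class Scheduler:
    def __init__(self):
        self.active = PrioArray()
        self.expired = PrioArray()

    def add(self, prio, task):
        self.active.enqueue(prio, task)

    def pick_next(self):
        if self.active.bitmap == 0:
            # Active queue drained: swap the roles of the two queues.
            self.active, self.expired = self.expired, self.active
        return self.active.dequeue_highest()

    def put_back(self, prio, task, quantum_left):
        # Time left: back on the active queue. None left: expired.
        target = self.active if quantum_left else self.expired
        target.enqueue(prio, task)


if __name__ == "__main__":
    s = Scheduler()
    s.add(5, "editor")
    s.add(20, "batch")
    print(s.pick_next())
```

Neither pick_next() nor put_back() ever walks a list of tasks, which
is the whole point of the design.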

Priority queue removal in his scheduler runs in O(1) time. It uses
slight optimization techniques and hardware bitmapping far beyond
the scope of this presentation. Feel free to ask any questions
afterwards, however. There are a few problems with the new
scheduler:

  1. Interactive processes run for their quantum and then do not get
    scheduled again until ALL OTHER PROCESSES (including
    non-interactive ones) have used up their quanta. This leaves
    an interval for lag when the system has a high load.
  2. sched_yield() now causes processes to sleep for quite some
    time due to the dual-queue approach. This should not affect
    programs, yet some (like OpenOffice) were written to take
    advantage of the benefits in the old version of the
    scheduler. These programs will seem to respond more
    sluggishly until their designers recode those portions.


7. Excuse me, I need to interrupt you

The preemptive patches in the kernel are quite a leap forward in
increasing the responsiveness of the Linux system overall. Combined
with Ingo’s O(1) scheduler, there is an amazing decrease in lag time.
It should be noted, however, that the O(1) scheduler and the
preemptive patches are developed independently; i.e. you can patch
a 2.4 kernel with the O(1) scheduler, the preemptive patches, or
both, because neither depends on the other.

If a process is preemptible, it means it can be interrupted
mid-execution to allow another process (usually with higher priority,
like an interrupt handler) to run. Inside the Linux kernel, however,
code has never been preemptible. That is, once you make a system
call, you cannot be taken off the processor in favor of another
process until the system call has completed or voluntarily gives up
the CPU (for example by sleeping). (On the way out of the system
call, the kernel checks to see if the process needs to be
rescheduled.)

The preempt kernel patch, maintained by Robert Love, allows 99.9% of
the kernel to be preempted. There are a few areas that cannot be
preempted (the scheduler and some SMP synchronization code, for
instance), but they disable the preemption mechanism for the duration
of their execution. More information about the preempt patches can be
found at their website:

http://kpreempt.sourceforge.net/


8. User-mode Linux

Run Linux inside Linux! Wait a minute … has the penguin gotten to
you again?!

UML is actually a port of the Linux kernel that runs as an ordinary
user-space process on top of a host Linux kernel.
It is designed to help developers poke around with the more sensitive
internals without having to crash machines and reboot servers. Some
people have gone so far as to run UML as their "main" system (David
Coulson and http://usermodelinux.org/ being the most notable).

With UML, a developer can test kernel code using normal tools like
Electric Fence, gdb/dbx, etc. The "real" kernel keeps the user-mode
kernel separate from the hardware unless you allow it access; even so,
there are quite a few things that just cannot be done because of
abstraction and handling routines. All in all, however, it is a great
tool for the kernel (non) savvy.


9. Filesystems

Filesystems added in the 2.5 kernel include NFSv4, XFS, JFS, and sysfs.

NFSv4 is the new version of NFS, and is to include
support for ACLs, a "pseudo filesystem" for client caching of
directories and data, state management, and file locking.

sysfs, not to be confused with the system call of the same name, is
a memory-based filesystem for the representation and modification
of kernel objects (kobjects). This is similar to the functionality
currently provided by /proc, but everything is now a bit better
defined, and sysfs doesn’t contain process information.

JFS is IBM’s journaling filesystem (from AIX) and XFS is Silicon
Graphics’ journaling filesystem (from IRIX). Filesystems were
covered more thoroughly by Ben McMillan at:
http://lugatgt.org/articles/filesystems/


10. Device Mapper

The kernel device-mapper is a driver that allows the definition of new
block devices composed of ranges of sectors from existing block
devices. LVM2 uses this to define its logical volumes.
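
As a rough picture of what "a block device built from pieces of other
block devices" means, here is a small Python model of a purely linear
mapping. It is a sketch, not the device-mapper interface: the
LinearTable class, its (length, device, start) tuples, and the device
names in the usage lines are all hypothetical.

```python
from bisect import bisect_right


class LinearTable:
    """Toy model: logical sectors laid end to end over device extents."""

    def __init__(self, extents):
        # extents: list of (length_in_sectors, device_name, start_sector)
        self.starts = []    # first logical sector of each extent
        self.extents = []
        position = 0
        for length, device, device_start in extents:
            self.starts.append(position)
            self.extents.append((length, device, device_start))
            position += length
        self.size = position  # total logical sectors

    def map_sector(self, sector):
        """Translate a logical sector to (device, sector on device)."""
        if not 0 <= sector < self.size:
            raise ValueError("sector out of range")
        index = bisect_right(self.starts, sector) - 1
        _, device, device_start = self.extents[index]
        return device, device_start + (sector - self.starts[index])


if __name__ == "__main__":
    # Hypothetical volume: 1000 sectors from one disk, 500 from another.
    volume = LinearTable([(1000, "/dev/sda1", 0), (500, "/dev/sdb1", 2048)])
    print(volume.size, volume.map_sector(1000))
```

A logical volume manager keeps a table like this per volume and can
grow the volume simply by appending another extent.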


11. Quota

The 2.5 kernel uses a new quota format that allows for 32-bit UIDs
and GIDs, needed for filesystems such as ReiserFS and XFS.


12. CPU Frequency Scaling

CPU frequency scaling allows the user to change the clock speed of the
CPU while the computer is running, which is useful for laptops.
Support for this exists in 2.5 proper.


13. IPSec

2.5 adds support for a new protocol family type, PF_KEY, and IPsec
network encryption. IPsec support was ported from KAME, and is used
by VPNs and the like.


14. Resources
