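- Compute resources (SMP): multiple CPUs, each with multiple cores; the SMP architecture is used almost universally
- Compute tasks (process management): multiple tasks, each made up of several processes (LWPs in Linux), which preempt one another according to priority
- Compute data (memory management): multiple memory descriptors (MM); each process has its own MM, there is also a shared global one, and a process uses its own first
- The hard part (kernel synchronization): keeping all data correct under this much parallelism
- The details (booting, source layout, virtual filesystem): how the system starts and stops, and how its internals work
- Scaling out (object-based filesystems): decoupling storage so that larger workloads can be handled
- Moving computation (Android introduction, power management): how to compute on a phone, where battery power is scarce
- Parallelism and shared resources; better support for multiple processes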
Process Descriptor: task_struct
- 各种信息:state, thread_info, mm, tty, fs, signal, ...
- PID 16-bit
Process Kernel Stack
- Holds thread_info for the current process; thread_info and the process descriptor point to each other, so either can be found from the other
Process List
- All processes
- Doubly linked list
Wait Queue
- All sleeping processes
- Doubly linked list
Run Queue
- All runnable processes
PID Hash Table \(\times\) 4
- pid, tgid, pgid, sid
Process Resource Limits
Process Switch
Process switch, task switch, context switch
- Hardware context switch: a far jmp (in older Linux)
- Software context switch: a sequence of mov
Performing the Process Switch
- Switching the Page Global Directory
- Switching the Kernel Mode stack and the hardware context
Process Type
Kernel Process
- Process 0 (swapper process)
- Process 1 (init process)
- Others: keventd, kapm, kswapd, kflushd (also bdflush), kupdated, ksoftirqd, ...
- User Process
Creating Processes
- clone(), fork(), and vfork()
- kernel_thread(): to create a kernel thread
Destroying Processes
- exit() library function
- Process removal: Releasing the process descriptor of a zombie process by release_task()
Scheduling Policy
- Based on time-sharing: Time slice
- Based on priority ranking: Dynamic
- Classification of processes: Interactive processes, Batch processes, Real-time processes
System Calls
nice() // change the priority
Process Preemption
- Priority is computed from a formula
- Real time priority: 1-99
- Conventional processes: 100-139
Active and Expired Processes
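- Distinguishes processes that are still within their time slice (active) from those that have used it up (expired)
The schedule() Function
- The concrete steps of a process switch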
- Direct invocation
- Lazy invocation
Motivation of Completely Fair Scheduler (CFS)
- Red-Black Tree
- Per-CPU variables
- Atomic operation
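- Memory barrier: operations issued before the barrier complete before any operation issued after it starts
- Read-Copy Update (RCU): a kernel control path inside an RCU-protected section must not sleep
- Spin Lock: the basic form admits only one holder at a time; the read/write variant lets many readers in together but only one writer (MP further protection)
- Semaphore: serves a similar purpose but is more elaborate than a spin lock; a waiting process sleeps instead of spinning
- Seqlocks: a writer may proceed even while readers are active
- Keep certain interrupts from being delivered so they cannot cause trouble; note that this is local (per CPU)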
- Local Interrupt disabling
- Local Softirq disabling
Rule of thumb for kernel developers:
- Always keep the concurrency level as high as possible in the system
Two factors:
- The number of I/O devices that operate concurrently
- The number of CPUs that do productive work
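- Several cores, one memory, shared I/O and devices
- Many memories distributed among the cores, some local and some shared; local memory is faster (access time depends on distance)
- We have multiple cores and multiple tasks
- When each task uses a single core, tasks can keep being handed to whichever core is idle
- When a task can use several cores, several cores can run the same task at once
- Several cores may operate on the same region of memory at the same time
- Hence many locks are needed to deal with this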
- Page Frame Management
- Memory Area Management
- Noncontiguous Memory Area Management
- (No notes for this part; the material was not understood.)
Role of VFS
- A common interface to several kinds of filesystems
Filesystems supported by the VFS
- Disk-based filesystems
- Network filesystems
Special filesystems
- eg: /proc
VFS Data Structures
- Superblock objects: super_block structure
- Inode objects: inode structure
- File objects: file structure (Table 12-4)
- Dentry objects: (Table 12-5)
Dentry cache
- A set of dentry objects
- A hash table
Filesystem Types
Files Associated with a Process
- fs field: fs_struct structure
files field: files_struct structure
- fd: file descriptors
- fd[0]: stdin
- fd[1]: stdout
- fd[2]: stderr
NR_OPEN: max # of file descriptors for a process
- Usually 1,048,576
Special Filesystems
- /dev/pts: pseudo terminal support
- /proc: general access point to kernel data structures
- /sys: general access point to system data
- /proc/bus/usb: USB devices
- ...
Filesystem Type Registration
- file_system_type object
- fs_flags
Filesystem Handling
Pathname Lookup
Implementation of VFS System Calls
- open()
- read()
- write()
- close()
File Locking
- Code structure
- Activity
- Service
- Broadcast Receivers
- Content Provider
Advanced Power Management (APM)
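- Controlled entirely by the BIOS
- A timeout decides whether a device is powered down
Advanced Configuration and Power Interface (ACPI)
- Works with the BIOS, but the operations are decided by the OS
- Power-down is driven in software through a state machine
- Global State(4) and Sleeping State(6)
- Legacy State: if the system is in this state, ACPI is either unsupported or not enabled
Wake Lock
- If an app holds a wake lock, the system stays awake
Main Lock
- Released when the system goes dark
Early Suspend
- Work the system must do before the kernel suspends
Resume Late
- The counterpart on resume
Battery Service
- Battery management
- 10: participation
- 20: project
- 20: report (reading kernel code, or contributing documentation)
- 50: final exam
- With the project and the report both fully done, 30+ is more or less assured, so the final needs another 30+ to reach 60 points.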
- Lightweight process (LWP)
Process Descriptor
task_struct data structure
- Note: tty is the console terminal; ttyx is a serial terminal
PID(Process ID)、TGID(Thread Group ID)
- getpid()
The Process List
- ???
Pidhash Table and Chained Lists
- pid, tgid, pgid, sid
- ???
Wait Queues
- ???
- Process Resource Limits
- Process Switch
Creating Processes
- clone(), fork(), and vfork()
Destroying Processes
- exit() library function: _exit(), exit_group()
- Process removal: Releasing the process descriptor of a zombie process by release_task()
Process Preemption
- Conventional processes: 100-139
- Real time priority: 1-99
Runqueue Balancing in Multiprocessor Systems
3 types of multiprocessor systems
- Classic multiprocessor architecture
- Hyper-threading
- NUMA
Motivation of Completely Fair Scheduler (CFS)
Red-Black Tree
- Value = fair_clock - wait_runtime + nice (smaller value \(\rightarrow\) higher priority)
- nice(): change the priority
Kernel Control Paths
- a sequence of instructions executed in kernel mode on behalf of current process
Three CPU states are considered
- Running a process in User Mode (User)
- Running an exception or a system call handler (Excp)
- Running an interrupt handler (Intr)
Kernel Preemption
- The main motivation for making a kernel preemptive is to reduce the dispatch latency of the user mode processes: Delay between the time they become runnable and the time they actually begin running
When Synchronization is Necessary
- A race condition can occur when the outcome of a computation depends on how two or more interleaved kernel control paths are nested
To identify and protect the critical regions in exception handlers, interrupt handlers, deferrable functions, and kernel threads
- On single CPU, critical region can be implemented by disabling interrupts while accessing shared data
- If the same data is shared only by the service routines of system calls, critical region can be implemented by disabling kernel preemption while accessing shared data
Things are more complicated on multiprocessor systems
- Different synchronization techniques are necessary
When Synchronization is not Necessary
- The same interrupt cannot occur until the handler terminates
- Interrupt handlers and softirqs are non-preemptable, non-blocking
- A kernel control path performing interrupt handling cannot be interrupted by a kernel control path executing a deferrable function or a system call service routine
- Softirqs cannot be interleaved
Synchronization Primitives
- Synchronizing Accesses to Kernel Data Structures
- Examples of Race Condition Prevention
Introduction on SMP
Categories of Computer Systems
- Single Instruction Single Data (SISD) stream
- Single Instruction Multiple Data (SIMD) stream
- Multiple Instruction Single Data (MISD) stream (Never implemented)
Multiple Instruction Multiple Data (MIMD)
- Master/Slave
- Symmetric Multiprocessors (SMP)
- Clusters
Typical SMP Organization
Symmetric multiprocessing (SMP) involves
- a multiprocessor computer hardware and software architecture where two or more identical processors connect to a single, shared main memory, have full access to all I/O devices, and are controlled by a single OS instance that treats all processors equally, reserving none for special purposes.
- Most multiprocessor systems today use an SMP architecture. In the case of multi-core processors, the SMP architecture applies to the cores, treating them as separate processors.
Non-uniform memory access (NUMA) is
- a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor.
- Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors).
- The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users
Process Scheduling with SMP
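- Only two figures here; not understood
Synchronization Problem with SMP
- Lock
- Probably nothing to know here?
- BSP (also BP): Bootstrap Processor, the CPU that boots the system
- AP: Application Processor
- Local APIC
- IPI: inter-processor interrupt \(\leftarrow\) inter-processor communication
- The BSP boots the operating system; in the final stage of booting it wakes each AP with an IPI. During normal operation the BSP and the APs are essentially equivalent.
- BIOS initialization (APs masked, configuration tables built)
- The initial program in the MBR (GRUB, LILO, etc.) loads the kernel into memory
- Run startup_32 in head.S (which ends by calling start_kernel)
- Run start_kernel (formerly main)
- smp_init() // start the APs
- rest_init() // create process 1; the caller becomes process 0 \(\leftarrow\) cpu_idle()
- Process 1 (the init process) does the remaining work
- Once started by the BSP, an AP runs startup_32 in head.S and enters initialize_secondary()
- cpu_init()
- smp_callin()
- ...
- return cpu_idle()
- The main difference from a UP system is that after a process switch, the process that was switched out may resume on a different CPU. When priority is computed, a process whose last CPU is the current CPU gets a modest boost, which makes better use of the cache.
- SMP support requires the APIC interrupt controller in hardware. Linux defines interrupt vectors for the various IPIs and functions for sending them.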
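- Author: Linus Torvalds
- For more history, see the Wikipedia article on Linux
- User mode: Application software, C standard library
- Kernel mode: System Calls, Linux kernel, Hardware
- Performance: efficiency, speed
- Stability: robustness, adaptability
- Capability: versatility, flexibility, compatibility
- Security
- Portability
- Extensibility
- Applications
- System libraries (libc)
- System call interface
- Filesystems
- Networking
- Device drivers
- Scheduling
- Memory management
- Architecture-independent code
- Hardware
- Monolithic
- Layered
- Modular
- Microkernel
- Virtual machine
== Source Code Overview ==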
- root - The home directory for the root user.
- home - Contains the users' home directories along with directories for services.
- bin - Commands needed during booting up that might be needed by normal users
- sbin - Like bin but commands are not intended for normal users. Commands run by LINUX.
- proc - This filesystem is not on a disk. It is a virtual filesystem that exists only in the kernel's imagination, which is memory.
usr - Contains all commands, libraries, man pages, games and static files for normal operation.
- bin - Almost all user commands. Some commands are in /bin or /usr/local/bin.
- sbin - System admin commands not needed on the root filesystem, e.g., most server programs.
- include - Header files for the C programming language. Should be below /usr/lib for consistency.
- lib - Unchanging data files for programs and subsystems.
- local - The place for locally installed software and other files.
- man - Manual pages.
- info - Info documents.
- doc - Documentation.
- tmp
- X11R6 - The X windows system files. There is a directory similar to usr below this directory.
- X386 - Like X11R6 but for X11 release 5.
- boot - Files used by the bootstrap loader. Kernel images are often kept here.
- lib - Shared libraries needed by the programs on the root filesystem.
- modules - Loadable kernel modules, especially those needed to boot the system after disasters.
- dev - Device files.
- etc - Configuration files specific to the machine.
- skel - When a home directory is created it is initialized with files from this directory.
- sysconfig - Files that configure the linux system for devices.
var - Contains files that change for mail, news, printers log files, man pages, temp files.
- file
- lib - Files that change while the system is running normally.
- local - Variable data for programs installed in /usr/local.
- lock - Lock files. Used by a program to indicate it is using a particular device or file.
- log - Log files from programs such as login and syslog which logs all logins and logouts.
- run - Files that contain information about the system that is valid until the system is next booted.
- spool - Directories for mail, printer spools, news and other spooled work.
- tmp - Temporary files that are large or need to exist for longer than they should in /tmp.
- catman - A cache for man pages that are formatted on demand.
- mnt - Mount points for temporary mounts by the system administrator.
- tmp - Temporary files. Programs running after bootup should use /var/tmp.
- Source code specific to each supported architecture
- Each contains kernel, lib, mm, boot and other directories whose contents override code stubs in architecture independent code.
- lib contains highly-optimized common utility routines such as memcpy, checksums, etc.
- All Linux device drivers; these make up the bulk of the source tree (about 1.5M)
- device, bus, platform and general directories.
- char – n_tty.c is the default line discipline.
- block – elevator.c, genhd.c, linear.c, ll_rw_blk.c, raidN.c.
- net – specific drivers and general routines Space.c and net_init.c.
scsi – scsi_*.c files are generic;
- sd.c (disk)
- sr.c (CD-ROM)
- st.c (tape)
- sg.c (generic)
- cdrom
- ide
- isdn
- parport
- pcmcia
- pnp
- sound
- telephony
- video
- bus – fc4, i2c, nubus, pci, sbus, tc, usb.
- platform – acorn, macintosh, s390, sgi.
- Virtual File System (VFS) framework
- Subdirectories for the actual filesystems
- exec.c, binfmt_*.c - files for mapping new process images.
- devices.c, blk_dev.c – device registration, block device support.
- super.c, filesystems.c.
- inode.c, dcache.c, namei.c, buffer.c, file_table.c.
- open.c, read_write.c, select.c, pipe.c, fifo.c.
- fcntl.c, ioctl.c, locks.c, dquot.c, stat.c.
- What was this directory for, again?
- asm-generic - Architecture-dependent include subdirectories.
- Header info needed both by the kernel and user apps.
- Usually linked to /usr/include/linux.
Kernel-only portions guarded by
#ifdef __KERNEL__
- math-emu
- net
- pcmcia
- scsi
- video
- version.c – contains the version banner that prints at boot.
- main.c – architecture-independent boot code.
- start_kernel is the primary entry point.
- System-level inter-process communication facilities
- If disabled at compile-time, util.c exports stubs that simply return -ENOSYS.
One file for each facility:
- sem.c – semaphores.
- shm.c – shared memory.
- msg.c – message queues.
- Core code of the Linux kernel
sched.c – "the main kernel file":
- scheduler
- wait queues
- timers
- alarms
- task queues
- fork.c, exec.c, signal.c, exit.c etc...
Kernel module support:
- kmod.c, ksyms.c, module.c.
Other operations:
- time.c, resource.c, dma.c, softirq.c, itimer.c.
- printk.c, info.c, panic.c, sysctl.c, sys.c.
- Kernel code cannot call the standard C library; this directory holds the kernel's own replacements
- brlock.c – “Big Reader” spinlocks.
- cmdline.c – kernel command line parsing routines.
- errno.c – global definition of errno.
- inflate.c – “gunzip” part of gzip.c used during boot.
string.c – portable string code.
- Usually replaced by optimized, architecture-dependent routines.
- vsprintf.c – libc replacement.
- Memory
Paging and swapping:
- swap.c, swapfile.c (paging devices), swap_state.c (cache).
- vmscan.c – paging policies, kswapd.
- page_io.c – low-level page transfer.
Allocation and deallocation:
- slab.c – slab allocator.
- page_alloc.c – page-based allocator.
- vmalloc.c – kernel virtual-memory allocator.
Memory mapping:
- memory.c – paging, fault-handling, page table code.
- filemap.c – file mapping.
- mmap.c, mremap.c, mlock.c, mprotect.c.
- Menu-based kernel configuration.
- Kernel patching.
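- Generating kernel documentation.
== Booting (what it is, in detail: BIOS, MBR, GRUB, LILO, Init Process) ==
- Booting
- The bootstrapping process that starts the operating system when the user turns on the computer
- Booting Sequence
- The sequence of operations the computer performs while loading the operating system
- 1. Turn on
- 2. CPU jump to address of BIOS (0xFFFF0)
- 3. BIOS runs POST (Power-On Self Test)
- 4. Find bootable devices
- 5. Load and execute boot sector from MBR
- 6. Load OS
- BIOS, Basic Input/Output System
- BIOS is the code that runs when the computer is first powered on
- At its most basic, the BIOS is code stored on a chip that can identify and control the computer's many devices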
- MBR, Master Boot Record
- OS is booted from a hard disk, where the Master Boot Record (MBR) contains the primary boot loader
- The MBR is a 512-byte sector, located in the first sector on the disk (sector 1 of cylinder 0, head 0)
- After the MBR is loaded into RAM, the BIOS yields control to it.
- The first 446 bytes are the primary boot loader, which contains both executable code and error message text
- The next 64 bytes are the partition table, which contains a record for each of four partitions
- The MBR ends with two bytes that are defined as the magic number (0xAA55). The magic number serves as a validation check of the MBR
- To see the contents of MBR, use this command:
- # dd if=/dev/hda of=mbr.bin bs=512 count=1
- # od -xa mbr.bin
- The dd command, which needs to be run from root, reads the first 512 bytes from /dev/hda (the first Integrated Drive Electronics, or IDE drive) and writes them to the mbr.bin file.
- The od command prints the binary file in hex and ASCII formats.
- Boot Loader
- Strictly speaking a kernel loader: its job is to load the Linux kernel
- Optional, initial RAM disk
- GRUB and LILO are the two most popular Linux boot loaders
- GRUB, GRand Unified Bootloader
- GRUB is an operating system independent boot loader
- A multiboot software packet from GNU
- GRUB boot process
- 1. The BIOS finds a bootable device (hard disk) and transfers control to the master boot record
- 2. The MBR contains GRUB Stage 1. Given the small size of the MBR, Stage 1 just loads the next stage of GRUB
- 3. GRUB Stage 1.5 is located in the first 30 kilobytes of hard disk immediately following the MBR. Stage 1.5 loads Stage 2.
- 4. GRUB Stage 2 receives control, and displays to the user the GRUB boot menu (where the user can manually specify the boot parameters).
- 5. GRUB loads the user-selected (or default) kernel into memory and passes control on to the kernel.
- LILO, LInux LOader
- Does not depend on a specific file system
- Can boot from harddisk and floppy
- Up to 16 different images
- Must change LILO when kernel image file or config file is changed
- Kernel
- The core of most computer operating systems
- The kernel stays resident in memory until power-off
- Tasks
- 1. Process management
- 2. Memory management
- 3. Device management
- 4. System call
- Kernel Image
- A compressed kernel image
- zImage size less than 512 KB
- bzImage size greater than 512 KB
- Major functions flow for Linux kernel boot
- Init Process
- The first process the kernel starts, and the ancestor of every other process in Linux
- The first process that init starts is the script /etc/rc.d/rc.sysinit
- Based on the appropriate run-level, scripts are executed to start various processes to run the system and make it functional
- Process Id = 1
- Init is responsible for starting system processes as defined in the /etc/inittab file
- Init typically will start multiple instances of "getty" which waits for console logins which spawn one's user shell process
- Upon shutdown, init controls the sequence and processes for shutdown
- Inittab file
- The inittab file describes which processes are started at bootup and during normal operation
- /etc/init.d/boot
- /etc/init.d/rc
- The computer will be booted to the runlevel as defined by the initdefault directive in the /etc/inittab file
- id:5:initdefault:
- RunLevels
- A runlevel is a software configuration of the system which allows only a selected group of processes to exist
- The processes spawned by init for each of these runlevels are defined in the /etc/inittab file
- Init can be in one of eight runlevels: 0-6 and S (or s)
- rc#.d files
- rc#.d files are the scripts for a given run level that run during boot and shutdown
- The scripts are found in the directory /etc/rc.d/rc#.d/ where the symbol # represents the run level
- init.d
- A daemon is a background process
- init.d is the directory through which an admin can start and stop individual daemons
- /etc/rc.d/init.d/ (Red Hat/Fedora )
- /etc/init.d/ (S.u.s.e.)
- /etc/init.d/ (Debian)
- Start/stop a daemon
- The admin issues the daemon's script with one of the start, stop, status, restart or reload options
- e.g. to stop the web server:
- cd /etc/rc.d/init.d/
- (or /etc/init.d/ for S.u.s.e. and Debian)
- httpd stop
OS management
- Process management, memory management
- File systems
Types of devices
- (Char, Block, SCSI, Net)-based devices \(\rightarrow\) device drivers
- Loaded as modules or static in the kernel
- Portability
- IPC (Inter-Process Communication)
- Hardware Management
- Interface Stability
- Module, Kernel Module
- (wiki) An object file that contains code to extend the running kernel;
- (RedHat) Modules are pieces of code that can be loaded and unloaded into the kernel upon demand.
- Advantages
- 1. Allowing the dynamic insertion and removal of code from the kernel at run-time.
- 2. Save memory cost
- Disadvantages
- 1. Fragmentation Penalty \(\rightarrow\) decrease memory performance
# cd /lib/modules/<kernel-version>/
# find . -name "*.ko"
# lsmod
# cat /proc/modules
- Covered in the project; omitted for now
- Covered in the project; omitted for now
- Covered in the project; omitted for now
/proc is a virtual filesystem
- Real time, resides in the virtual memory
- Tracks the processes running on the machine and the state of the system
- A new /proc file system is created every time your Linux machine reboots
- Highly dynamic. The size of the proc directory is 0 and the last time of modification is the last bootup time.
- /proc file system doesn't exist on any particular media.
- The contents of the /proc file system can be read by anyone who has the requisite permissions.
- Certain parts of the /proc file system can be read only by the owner of the process and of course root. (and some not even by root!!)
- The contents of the /proc are used by many utilities which grab the data from the particular /proc directory and display it.
- eg : top, ps, lspci, dmesg etc
/proc/sys: changing the contents of files in this directory changes kernel variables on the fly
- allows you to make configuration changes to a running kernel
- Changing a value within a /proc/sys file is done by the 'echo' command
- Any configuration changes made thus will disappear when the system is restarted
/proc/sys/dev : provides parameters for particular devices on the system
- cdrom/info : many important CD-ROM parameters
- /proc/sys/fs
- acct — Controls the suspension of process accounting based on the percentage of free space available on the filesystem containing the log
- ctrl-alt-del — Controls whether [Ctrl]-[Alt]-[Delete] will gracefully restart the computer using init (value 0) or force an immediate reboot without syncing the dirty buffers to disk (value 1).
- domainname — Allows you to configure the system's domain name, such as domain.com.
- hostname — Allows you to configure the system's host name, such as host.domain.com.
- threads-max — Sets the maximum number of threads to be used by the kernel, with a default value of 4095.
- random : this directory holds data related to generating random numbers for the kernel.
- panic — Defines the number of seconds the kernel will postpone rebooting the system when a kernel panic is experienced. By default, the value is set to 0, which disables automatic rebooting after a panic.
eg : /proc/sys/net/ipv4/ip_forward
- It has default value of "0" which can be seen using 'cat'.
- This can be changed in real time by just changing the value stored in this file from "0" to "1", thus allowing IP forwarding
- /proc/sys/vm : facilitates the configuration of the Linux kernel's virtual memory (VM) subsystem
- buddyinfo : Contains the number of free areas of each order for the kernel buddy system
- cmdline : Kernel command line
- cpuinfo : Information about the processor(s).(Human readable)
- devices : List of device drivers configured into the currently running kernel (block and character).
- dma : Shows which DMA channels are being used at the moment.
- execdomains : Execdomains, related to security
- fb : Frame Buffer devices.
- filesystems : Filesystems configured/supported into/by the kernel.
- interrupts : Number of interrupts per IRQ on the x86 architecture.
- iomem : This file shows the current map of the system's memory for its various devices
- ioports : provides a list of currently registered port regions used for input or output communication with a device
- kcore : This file represents the physical memory of the system and is stored in the core file format.
- Unlike most /proc files, kcore does display a size. This value is given in bytes and is equal to the size of physical memory (RAM) used plus 4KB.
- Its contents are designed to be examined by a debugger, such as gdb, the GNU Debugger.
- Only the root user has the rights to view this file.
- kmsg : Used to hold messages generated by the kernel. These messages are then picked up by other programs, such as klogd
- loadavg : Provides a look at the load average
- The first three columns give the load average over the last 1, 5, and 15 minute periods.
- The fourth column shows the number of currently running processes and the total number of processes.
- The last column displays the last process ID used.
- locks : Displays the files currently locked by the kernel
- mdstat : contains the current information for multiple-disk, RAID configurations
- meminfo : One of the more commonly used /proc files
- It reports back plenty of valuable information about the current utilization of RAM on the system
- misc : This file lists miscellaneous drivers registered on the miscellaneous major device, which is number 10
- modules : Displays a list of all modules that have been loaded by the system
- mounts : This file provides a quick list of all mounts in use by the system
- mtrr : This file refers to the current Memory Type Range Registers (MTRRs) in use with the system
- partitions : Very detailed information on the various partitions currently available to the system
- pci : Full listing of every PCI device on your system
- slabinfo : Information about memory usage on the slab level
- stat : Keeps track of a variety of different statistics about the system since it was last restarted
- swaps : Measures swap space and its utilization
- uptime : Contains information about how long the system has been on since its last restart
- version : Tells the versions of the Linux kernel and gcc, as well as the version of Red Hat Linux installed on the system.
/proc/<number>: the numbers are process IDs; each directory represents one process
- The contents of all the directories are the same as these directories contain the various parameters and the status of the corresponding process.
- You have full access only to the processes that you have started.
- cmdline : it contains the whole command line used to invoke the process. The contents of this file are the command line arguments with all the parameters (without formatting/spaces).
- cwd : symbolic link to the current working directory
- environ : contains all the process-specific environment variables
- exe : symbolic link of the executable
- maps : parts of the process' address space mapped to a file.
- fd : this directory contains the list file descriptors as opened by the particular process.
- root : symbolic link pointing to the directory which is the root file system for the particular process
- status : information about the process
- /proc/self : link to the currently running process
/proc/bus : contains information specific to the various buses available on the system
- eg : for ISA, PCI, and USB buses, current data on each is available in /proc/bus/<bus type directory>
- Individual bus directories, signified with numbers, contains binary files that refer to the various devices available on that bus
- devices file : USB root hub on the motherboard:
/proc/driver : specific drivers in use by kernel
- rtc : output from the driver for the Real Time Clock
- /proc/fs : specific filesystem, file handle, inode, dentry and quota information
/proc/ide : information about IDE devices
- Each IDE channel is represented as a separate directory, such as /proc/ide/ide0 and /proc/ide/ide1
- drivers file : version number of the various drivers
- Device directories : data like cache, capacity, driver, geometry, media, model, settings
/proc/irq : used to set IRQ to CPU affinity
- smp_affinity : which CPUs handle that specific IRQ
/proc/net : networking parameters and statistics
- arp — kernel's ARP table. Useful for connecting hardware address to an IP address on a system.
- dev — Lists the network devices along with transmit and receive statistics.
- route — Displays the kernel's routing table.
- /proc/scsi : like /proc/ide it gives info about scsi devices
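- Linux is written mainly in GNU C
- Borrows "inline" and "const" from C++
- Supports attribute specifiers (attribute)
- Adds the basic data type "long long int" to support 64-bit CPUs
Suffix - assembly code embedded in C code
Unlike ordinary 386 assembly, which follows the Intel syntax, it uses the AT&T syntax. The main differences are:
- Intel syntax mostly uses upper-case letters, whereas lower-case is mostly used here
- Register names take a "%" prefix
- The order of source and destination operands is the reverse of Intel's: in AT&T syntax the source comes first and the destination second
- The operand size of a memory-access instruction is given by the last letter of the opcode name: b (8 bits), w (16 bits), l (32 bits), e.g. movb
- Immediate operands take a "$" prefix; Intel syntax has none
- Basic form -- asm("assembly statements" : output registers : input registers : clobbered registers);
- Output and input registers are numbered together in order, starting at %0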
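- bootsect.S: source code of the Linux boot sector
- setup.S: one part of the helper (setup) program
- video.S: another part of the helper program, used for screen output during boot
- The subdirectory compressed also holds two source files, head.S and misc.c, used to decompress the kernel image. They too belong to the helper program.
- After compiling, assembling and linking, three parts result: the boot sector image bootsect, the helper program setup, and the kernel image itself.
- A kernel boot image no larger than 508 KB is called a small image (zImage); otherwise it is a big kernel image (bzImage)
- After power-on, an Intel CPU runs in real mode and can use only the low 640 KB of memory (below 0xA0000) (why?)
- The ROM BIOS or LILO loads the first sector of the boot disk (the boot sector) at address 0x7c00 and then jumps to 0x7c00 to run the boot sector code
- The code in that boot sector is the binary produced by assembling bootsect.S
- This code (bootsect.S) moves itself to 0x90000 and jumps there to continue, reads setup and the kernel image from disk through the BIOS "int 0x13" call, and then jumps into setup to prepare for running the kernel image
- This code moves the boot sector code from 0x7C00 to 0x90000. Linux calls the code segment at 0x90000 INITSEG. It then jumps to the label go and sets up a stack whose base is at $INITSEG:0x4000-12
- This code uses the BIOS disk-read call "int 0x13" to load setup.S from disk to 9000:0200 (the segment Linux calls SETUPSEG), that is, right after bootsect.S, four sectors in total
- If loading fails, it retries in a loop; unless some attempt succeeds, the only way out is a system reboot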
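- LILO (Linux Loader) also lives in the boot sector and lets the user choose which operating system to start after power-on
- At installation time LILO builds a map of the disk blocks occupied by the kernel code. At boot LILO uses this map to direct the BIOS to load the chosen operating system
- LILO stores the commands and parameters the user types at boot in the second half of empty_zero_page (0x5000), for use by setup_arch() in arch/i386/kernel/setup.c
- When LILO is done it jumps to setup.S, which carries on with system initialization in real mode
- setup.S is loaded by bootsect.S together with the kernel image. setup.S obtains system parameters from the BIOS and stores them in the memory parameter area, still in real mode
- During setup the CPU switches to 32-bit protected mode with segmented addressing
- The helper program setup prepares for the kernel image and then jumps to 0x100000 to start the kernel proper; kernel initialization follows
- Check the signature "55AA5A5A" at the end of the setup.S code segment to determine whether the setup program was loaded completely
- Determine whether the kernel is a BIG_KERNEL
- Set parameters
- Disable interrupts
- Check whether setup.S itself is at SETUPSEG
- Set the idt (interrupt descriptor table) to empty and set up the gdt (global descriptor table)
- Actually enter protected mode
The protected-mode kernel initialization module starts executing at 0x10000. It checks the data area, initializes the idt, the page tables and the registers, performs some necessary state checks, and finally enters start_kernel(). If the kernel is stored compressed, it is decompressed first. Initialization in protected mode mainly consists of:
- Setting up the stack and registers
- Checking that the A20 line is enabled
- Zeroing the BSS data area
- Entering the kernel decompression step
head.S in arch/i386/boot/compressed is assembly code for segmented protected mode; it sets up the stack first and then calls decompress_kernel() in misc.c in the same directory to decompress the kernel
Calling decompress_kernel in misc.c starts the decompression. The steps are:
- Set output_buffer
- makecrc: build a CRC (checksum) table (lib/inflate.c)
- Call gunzip() to decompress while checking against the CRC table; a mismatch means decompression failed.
- Check the size of the kernel
- The decompressed kernel image is placed at 0x100000, and execution jumps there.
After decompression, the kernel entry point is head.S in arch/i386/kernel. The system first initializes the registers and the data area and then performs the following steps:
- Initialize the ds, es, fs and gs registers to the value of __KERNEL_DS
- Partially initialize the two-level page tables. The first level, swapper_pg_dir, is the page directory; the second-level page table pointed to by its first entry is called pg0
- Clear the BSS (uninitialized data) area
- Jump to setup_idt to initialize the idt
- Copy the boot parameters to empty_zero_page
- Check the CPU type
- First zero swapper_pg_dir
- pg0 is registered in entries 0 and 768 of the page directory, so linear addresses 0 and 3 GB both map through pg0
- Initialize the second-level page tables pg0 and pg1
- Initialize the CPU control registers: point CR3 at swapper_pg_dir and set the PG bit of CR0 (the bit that enables paging). The CPU's paging unit then takes effect
- (Figure: initialization of the two-level page tables)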
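- Number of GDT entries = 2 kernel-mode segments + 2 user-mode segments + 4 unused entries + 4 APM segments + 2×NR_TASKS segments describing the LDTs and TSSs.
- (Figure: GDT initialization code)
- 2. Load the idt register with the current value of idt_descr, which points to the idt (256 entries); interrupts are not yet allowed (no interrupt gates are set)
- 3. Under the new paging setup, reload the stack, the segment selector registers and the descriptor registers.
- The steps above initialized the CPU and enabled protected mode
- Interrupt management
- Process management
- Memory management
- Device management
- The data structures are many and intricate; each part needs to be analyzed in turn
- After entering protected mode, execution continues at start_kernel; start_kernel() becomes process 0 and never returns
- start_kernel prints the version banner and calls setup_arch() (arch/i386/kernel/setup.c) to initialize the core data structures
- Finally it calls kernel_thread() to create the init process, which configures the system
- This code is in init/main.c
- Call paging_init() to initialize the page tables
- Call mem_init() to initialize the page descriptors
- Call trap_init() and init_IRQ() to finish initializing the IDT
- Call kmem_cache_init() and kmem_cache_sizes_init() to initialize the slab allocator
- Call time_init() to initialize the system date and time
- Call kernel_thread() to create the kernel thread for process 1
- After creating the init child, the parent goes on to run cpu_idle
- The init process (process 1) first creates some background processes that maintain the system, then configures the system and runs the initialization programs written as shell scripts. It then switches to user mode
- It first calls do_basic_setup() for system initialization (before this point only the CPU, memory and some process-management work has been brought up)
- It calls free_initmem() to release the pages between __init_begin and __init_end, which are used only during initialization, to the free page list
- Open a console device
- If "/sbin/init" exists, execute "/sbin/init"
- If "/etc/init" exists, execute "/etc/init"
- If "/bin/init" exists, execute "/bin/init"
- If "/bin/sh" exists, execute "/bin/sh"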
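- On i386, can the start address of the kernel's executable code in memory be chosen freely? Why?
- Where on the hard disk is the master boot sector? If a disk's master boot sector is damaged, can the disk still be used?
- How does the system boot without LILO?
- Why must the A20 address line be enabled on entering protected mode?
- What does the Linux kernel's real-mode initialization accomplish?
- What are the main tasks of process 0 and the init process?
- I lost the thread for everything after this point; it all feels like fragments. Help wanted!
- Just run the commands below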
cd ~/Downloads/linux-4.5
make
make modules_install
make install
cd /boot
mkinitramfs -o /boot/initrd.img-4.5.0 4.5.0
insmod <file> [<param_name>=<value>] ...
rmmod <module_name>
modinfo *.ko
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
To work with /proc, you also need:
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
static int __init hello_init(void)
{
    /* Use KERN_INFO rather than a literal "<6>": since Linux 3.6
     * printk() no longer parses the "<N>" text prefix. */
    printk(KERN_INFO "Greeting from a linux kernel module.\n");
    printk(KERN_INFO "whom=%s,howmany=%d\n", whom, howmany);
    proc_create("hello_proc", 0, NULL, &hello_proc_fops);
    return 0;
}
module_init(hello_init);

static void __exit hello_exit(void)
{
    remove_proc_entry("hello_proc", NULL);
    printk(KERN_INFO "Bye.\n");
}
module_exit(hello_exit);
Adding a parameter: module_param(name, type, perm)
static char *whom = "world";
static int howmany = 1;
module_param(howmany, int, S_IRUGO);
module_param(whom, charp, S_IRUGO);
Supported values for type:
- byte (unsigned char)
- short
- ushort
- int
- uint
- long
- ulong
- charp (char *, a string of at most 1024 bytes)
- bool (int; accepts y, Y, 1 or n, N, 0)
- invbool (int; same as bool but with the meaning inverted)
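Values of perm: if perm is non-zero, then once the module is loaded a file is created for each module parameter under /sys/module/<module_name>/parameters/.
#define S_IRUSR 00400 // readable by the file owner
#define S_IWUSR 00200 // writable by the file owner
#define S_IXUSR 00100 // executable by the file owner
#define S_IRGRP 00040 // readable by users in the owner's group
#define S_IWGRP 00020
#define S_IXGRP 00010
#define S_IROTH 00004 // readable by users outside the owner's group
#define S_IWOTH 00002
#define S_IXOTH 00001
// In C, combine the bits above with the | operator to get the permission you want.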
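- Adding an array parameter: module_param_array(name, type, num, perm)
- num is a pointer to an integer (int *); after the module has loaded successfully, the number of array elements supplied is stored in *num.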
#define KERN_EMERG   "<0>" /* system is unusable */
#define KERN_ALERT   "<1>" /* action must be taken immediately */
#define KERN_CRIT    "<2>" /* critical conditions */
#define KERN_ERR     "<3>" /* error conditions */
#define KERN_WARNING "<4>" /* warning conditions */
#define KERN_NOTICE  "<5>" /* normal but significant condition */
#define KERN_INFO    "<6>" /* informational */
#define KERN_DEBUG   "<7>" /* debug-level messages */
Operating on a /proc file from a module:
/* Produces the content returned when the /proc file is read */
static int hello_proc_show(struct seq_file *m, void *v)
{
    seq_printf(m, "Hello proc!\n");
    return 0;
}

static int hello_proc_open(struct inode *inode, struct file *file)
{
    return single_open(file, hello_proc_show, NULL);
}

static const struct file_operations hello_proc_fops = {
    .owner   = THIS_MODULE,
    .open    = hello_proc_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>

//struct proc_dir_entry *entry;
static char *whom = "world";
static int howmany = 1;

/* Produces the content returned when /proc/hello_proc is read */
static int hello_proc_show(struct seq_file *m, void *v)
{
    seq_printf(m, "Hello proc!\n");
    return 0;
}

static int hello_proc_open(struct inode *inode, struct file *file)
{
    return single_open(file, hello_proc_show, NULL);
}

static const struct file_operations hello_proc_fops = {
    .owner   = THIS_MODULE,
    .open    = hello_proc_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};

static int __init hello_init(void)
{
    /* Use KERN_INFO rather than a literal "<6>": since Linux 3.6
     * printk() no longer parses the "<N>" text prefix. */
    printk(KERN_INFO "Greeting from a linux kernel module.\n");
    printk(KERN_INFO "whom=%s,howmany=%d\n", whom, howmany);
    proc_create("hello_proc", 0, NULL, &hello_proc_fops);
    return 0;
}

static void __exit hello_exit(void)
{
    remove_proc_entry("hello_proc", NULL);
    printk(KERN_INFO "Bye.\n");
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
module_param(howmany, int, S_IRUGO);
module_param(whom, charp, S_IRUGO);
- Everything is covered in this blog post; you only need to sort out the version differences.
cd ~/Downloads/linux-4.5
make clean      # add this line
make mrproper   # add this line
make
make modules_install
make install
cd /boot
mkinitramfs -o /boot/initrd.img-4.5.0 4.5.0
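- Install the mtest module you wrote
echo listvma > mtest
- Lists all virtual memory areas of the current process
echo findpage <addr> > mtest
- Finds the physical address that a given virtual address maps to
echo writeval <addr> <value> > mtest
- Writes the given value to a given virtual address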
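- Install the romfs module you modified
apt-get install genromfs
- Create a directory containing a variety of files (including, of course, one named NULL) and remove the execute permission from all of them
genromfs -f <xx.img>
- Create an empty directory
mount -o loop <xx.img> <empty_dir>
- ls -l <empty_dir>: NULL should not appear in the listing
- ls -l <empty_dir>/NULL: NULL should be visible here, and it should have the x permission
- cat <empty_dir>/NULL: the contents of NULL should not be shown; only ******* appears
- cat <origin_dir>/NULL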
Instructor: 吴晨涛
- Email
- Office: SEIEE 3-513
- Position: Associate Professor, Dept. of CSE, SJTU
Education: Dual Ph.D.
- 2012, Electrical and Computer Engineering, Virginia Commonwealth University (VCU), Richmond, VA, USA
- 2010, Computer Architecture, Huazhong University of Science and Technology (HUST), Wuhan, China
Research area: Data Storage Systems
- Storage management for Big Data
- Cloud storage, Green storage
- Reliable storage systems (e.g., disk arrays)
- Semantic file systems (e.g., object-based storage sys.)
- Cache Algorithms in storage systems
- Group leader: Prof. Minyi Guo (Dean of CSE Dept.)
Research area: Parallel and Distributed Computing
- Parallel and Distributed Systems/Networks
- High Performance Computing
- Cloud Computing
- Big Data
Download the slides and upload projects here:
- ftp
- User: wuct
- Password: wuct123456
Teaching assistant: 谭超
- Email
- Mobile: 15821274485
- No required textbook
- Understanding the Linux Kernel 3rd Edition
- Linux Device Drivers 3rd Edition
- Linux Kernel Development 3rd Edition
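- Computer organization, operating systems
- C/C++ programming
- Understand C programming inside the Linux kernel (that is, module programming)
- Understand the core topics of an operating system kernel (processes, threads, synchronization, virtual memory management, file management)
- Learn what the kernel of a Linux machine actually does, from the chip all the way up to applications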
Source Insight 3.5
- Download source code from http://www.kernel.org
- Web site:
- One reference book