User:Chillygs:jilu09

From Trusted Cloud Group

From suspend to resume (with TOI patch)

Background

STR & STD

STR = Suspend to RAM
STD = Suspend to disk

ACPI

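ACPI defines six system states, S0 through S5. Their meanings are:
  S0: the normal working state; all devices are on, and power draw is typically above 80 W.
  S1: also called POS (Power on Suspend); the CPU is stopped through the CPU clock controller while the other components keep working; power draw is typically below 30 W.
  S2: the CPU is stopped and the bus clock is also turned off, but the remaining devices keep running.
  S3: the familiar STR (Suspend to RAM); power draw is no more than 10 W.
  S4: also called STD (Suspend to Disk); main system power is off and the pre-S4 state is stored on the hard disk, so S4 saves more power than S3.
  S5: the most drastic state; all devices, including the power supply, are turned off, i.e. shutdown, and power draw is 0.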
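
The list above can be restated as a small lookup table. This is only an illustrative sketch: the struct, field, and function names are invented for this page and do not come from the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table describing the ACPI states listed above. */
struct sleep_state_info {
    const char *name;     /* "S0" .. "S5" */
    const char *alias;    /* common name, or "" if none */
    int context_in_ram;   /* 1 if RAM still holds the system context */
};

static const struct sleep_state_info sleep_states[] = {
    { "S0", "working",               1 },
    { "S1", "POS (Power on Suspend)", 1 },
    { "S2", "",                      1 },
    { "S3", "STR (Suspend to RAM)",  1 },
    { "S4", "STD (Suspend to Disk)", 0 },
    { "S5", "shutdown",              0 },
};

/* Return the entry for state n (0..5), or NULL if n is out of range. */
const struct sleep_state_info *sleep_state_lookup(int n)
{
    int count = (int)(sizeof(sleep_states) / sizeof(sleep_states[0]));

    if (n < 0 || n >= count)
        return NULL;
    return &sleep_states[n];
}
```

The only distinction the table encodes beyond the names is whether RAM keeps the context, which is what separates STR (S3) from STD (S4).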

about TOI

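TOI = TuxOnIce
TuxOnIce is a software project dedicated to hibernation; support for the Linux 2.6 kernel is added by applying its code as a patch. Its main advantage is that it can save an image of more than half of memory at suspend time while keeping the image small.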
The suspend steps described in the TOI documentation:
a. freezing system activity
b. eating memory
c. allocation of storage
d. save pageset2 (active & inactive list, page list)
e. suspend drivers, store processor context
f. atomic copy(in this stage, everything halt except Tuxonice)
g. save pageset1 (including kernel code)
h. write the image header
i. power down

The first entry point in the kernel

For this part, see the system reboot section of the book "Linux源代码情景分析".
File: kernel/sys.c
Function: SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd, void __user *, arg)

 #ifdef CONFIG_HIBERNATION
 case LINUX_REBOOT_CMD_SW_SUSPEND:
 ret = hibernate();
 break;
 #endif
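
To make the dispatch concrete, here is a user-space stand-in for the magic check and the switch on cmd. It is a sketch under assumptions: mock_reboot and enum mock_action are invented names, the constants are copied by value from include/linux/reboot.h, only one of the accepted magic2 values is checked, and the capability check of the real syscall is left out. It never calls the real reboot(2).

```c
#include <errno.h>
#include <stdint.h>

/* Values as found in include/linux/reboot.h. */
#define LINUX_REBOOT_MAGIC1         0xfee1deadu
#define LINUX_REBOOT_MAGIC2         672274793u
#define LINUX_REBOOT_CMD_RESTART    0x01234567u
#define LINUX_REBOOT_CMD_SW_SUSPEND 0xD000FCE2u

/* Hypothetical result type: what the mock would have asked the kernel to do. */
enum mock_action { ACT_NONE, ACT_RESTART, ACT_HIBERNATE };

/* Mock of the dispatch: validate the magic numbers, then switch on cmd. */
int mock_reboot(uint32_t magic1, uint32_t magic2, uint32_t cmd,
                enum mock_action *out)
{
    *out = ACT_NONE;

    if (magic1 != LINUX_REBOOT_MAGIC1 || magic2 != LINUX_REBOOT_MAGIC2)
        return -EINVAL;

    switch (cmd) {
    case LINUX_REBOOT_CMD_RESTART:
        *out = ACT_RESTART;
        return 0;
    case LINUX_REBOOT_CMD_SW_SUSPEND:
        *out = ACT_HIBERNATE;   /* the real code calls hibernate() here */
        return 0;
    default:
        return -EINVAL;
    }
}
```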

The line of interest is ret = hibernate();
File: kernel/power/hibernate.c
Function: int hibernate(void)

 /* Condensed call sequence; error handling is omitted. */
 test_action_state(TOI_REPLACE_SWSUSP);
 mutex_lock(&pm_mutex);
 atomic_add_unless(&snapshot_device_available, -1, 0);
 pm_prepare_console();
 pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
 usermodehelper_disable();
 create_basic_memory_bitmaps();
 sys_sync();
 prepare_processes();  /* actually calls freeze_processes(); on failure, thaw_processes() */
 hibernation_snapshot(hibernation_mode == HIBERNATION_PLATFORM);
 swsusp_write(flags);
 swsusp_free();
 power_down();
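
The listing hides the error paths. A common way to structure such a sequence is "acquire in order, release in reverse via goto labels". The sketch below shows that pattern with made-up step names recorded in a trace string; mock_hibernate and its fail_at parameter are invented for illustration and are not the kernel code.

```c
#include <string.h>

static char trace[128];

/* Append one step name to the trace. */
static void log_step(const char *s)
{
    strcat(trace, s);
    strcat(trace, " ");
}

/*
 * fail_at: 0 = nothing fails, 1 = preparation fails,
 *          2 = freezing processes fails, 3 = snapshot fails.
 */
int mock_hibernate(int fail_at)
{
    int error = 0;

    trace[0] = '\0';
    log_step("lock");
    if (fail_at == 1) { error = -1; goto unlock; }
    log_step("prepare");
    if (fail_at == 2) { error = -2; goto finish; }
    log_step("freeze");
    if (fail_at == 3) { error = -3; goto thaw; }
    log_step("snapshot");
    log_step("write");
    log_step("power_down");   /* on real hardware this normally does not return */
thaw:
    log_step("thaw");
finish:
    log_step("finish");
unlock:
    log_step("unlock");
    return error;
}
```

Each failure point jumps to the label that undoes exactly the steps completed so far, so cleanup code is written once.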

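From here on we look only at power_down(), because everything before it (allocating memory and saving the resume image) is done for both STR and STD. STR does this as well so that the system can still resume if the power runs out.
File: kernel/power/hibernate.c
Function: static void power_down(void)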

 switch (hibernation_mode) {
 case HIBERNATION_TEST:
 case HIBERNATION_TESTPROC:
 	break;
 case HIBERNATION_REBOOT:
 	kernel_restart(NULL);
 	break;
 case HIBERNATION_PLATFORM:
 	hibernation_platform_enter();  /* needs platform (ACPI) support */
 	/* no break: falls through to power off if the call returns */
 case HIBERNATION_SHUTDOWN:
 	kernel_power_off();  /* ACPI is not necessary */
 	break;
 }
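
The missing break after hibernation_platform_enter() matters: if the platform path returns, control drops into the shutdown case. The mock below demonstrates that fallthrough in user space. All names (mock_power_down, the MOCK_ constants, platform_ok) are invented for this sketch.

```c
#include <string.h>

/* Arbitrary values; only the distinct cases matter for this sketch. */
enum { MOCK_TEST, MOCK_REBOOT, MOCK_PLATFORM, MOCK_SHUTDOWN };

static char pd_trace[64];

static void pd_log(const char *s)
{
    strcat(pd_trace, s);
}

/*
 * platform_ok != 0 models "the platform powered the machine off",
 * platform_ok == 0 models "the platform call failed and returned".
 */
void mock_power_down(int mode, int platform_ok)
{
    pd_trace[0] = '\0';

    switch (mode) {
    case MOCK_TEST:
        break;
    case MOCK_REBOOT:
        pd_log("restart;");
        break;
    case MOCK_PLATFORM:
        pd_log("platform_enter;");
        if (platform_ok)
            return;          /* stand-in for "the machine is now off" */
        /* fall through */
    case MOCK_SHUTDOWN:
        pd_log("power_off;");
        break;
    }
}
```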

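This shows that TuxOnIce does not have to depend on the platform when powering down. When the BIOS supports ACPI, hibernation_platform_enter() is executed; when it does not, kernel_power_off() is executed. Some forums debate whether TuxOnIce needs ACPI or APM support; judging from the code, it does not. Whether TuxOnIce can suspend to RAM when the BIOS offers no power management support at all is unknown, though (suspend to disk is certainly fine).
Addendum: STR does require platform support.
If the HIBERNATION_SHUTDOWN case is taken, kernel_power_off() is called directly to shut the machine down.
File: kernel/sys.c
Function: void kernel_power_off(void)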

 kernel_shutdown_prepare(SYSTEM_POWER_OFF);
 pm_power_off_prepare();
 disable_nonboot_cpus();
 sysdev_shutdown();
 machine_power_off();

If HIBERNATION_PLATFORM is selected instead, preparation for platform hibernation begins.
File: kernel/power/hibernate.c
Function: int hibernation_platform_enter(void)

 /*
  * We have cancelled the power transition by running
  * hibernation_ops->finish() before saving the image, so we should let
  * the firmware know that we're going to enter the sleep state after all
  */
 hibernation_ops->begin();
 entering_platform_hibernation = true;
 suspend_console();
 dpm_suspend_start(PMSG_HIBERNATE);
 dpm_suspend_noirq(PMSG_HIBERNATE);
 disable_nonboot_cpus();
 local_irq_disable();
 sysdev_suspend(PMSG_HIBERNATE);
 hibernation_ops->enter();
 /* We should never get here */
 while (1);


The platform-dependent suspend path (STR)

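hibernation_ops abstracts the functionality provided by the platform; it holds a set of function entry points.
It is declared as static struct platform_hibernation_ops *hibernation_ops; the structure is defined in:
include/linux/suspend.h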

 struct platform_hibernation_ops {
 	int (*begin)(void);
 	void (*end)(void);
 	int (*pre_snapshot)(void);
 	void (*finish)(void);
 	int (*prepare)(void);
 	int (*enter)(void);
 	void (*leave)(void);
 	int (*pre_restore)(void);
 	void (*restore_cleanup)(void);
 	void (*recover)(void);
 };

The two functions of interest are int (*begin)(void) and int (*enter)(void).
The comments in the header briefly describe what each function does:

 * @begin: Tell the platform driver that we're starting hibernation.
 *	Called right after shrinking memory and before freezing devices.
 *
 * @enter: Put the system into the low power state after the hibernation image
 *	has been saved to disk.
 *	Called after the nonboot CPUs have been disabled and all of the low
 *	level devices have been shut down (runs with IRQs off).

include/linux/suspend.h also declares TOI-related functions:
extern int try_tuxonice_hibernate(void);
extern void try_tuxonice_resume(void);
Noted for later.

kernel/power/hibernate.c contains this function:
void hibernation_set_ops(struct platform_hibernation_ops *ops)
It installs the set of functions used as hibernation_ops: acpi_hibernation_ops_old is chosen for pre-ACPI 2.0 systems and acpi_hibernation_ops for ACPI 2.0.
In drivers/acpi/sleep.c, acpi_hibernation_ops is defined as follows:

 static struct platform_hibernation_ops acpi_hibernation_ops = {
 	.begin = acpi_hibernation_begin,
 	.end = acpi_pm_end,
 	.pre_snapshot = acpi_hibernation_pre_snapshot,
 	.finish = acpi_hibernation_finish,
 	.prepare = acpi_pm_prepare,
 	.enter = acpi_hibernation_enter,
 	.leave = acpi_hibernation_leave,
 	.pre_restore = acpi_pm_disable_gpes,
 	.restore_cleanup = acpi_pm_enable_gpes,
 };

In other words, hibernation_ops->begin() actually runs acpi_hibernation_begin,
and hibernation_ops->enter() actually runs acpi_hibernation_enter.
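
The indirection can be sketched in user space as a table of function pointers plus a registration function. This is a reduced illustration, not the kernel code: the mock_ names are invented, only begin and enter are modelled, and the rule that a partially filled table is rejected is an assumption made for the sketch.

```c
#include <errno.h>
#include <stddef.h>

/* Reduced, hypothetical version of the ops table: two callbacks only. */
struct mock_hibernation_ops {
    int (*begin)(void);
    int (*enter)(void);
};

static const struct mock_hibernation_ops *mock_ops;   /* currently installed backend */
static int calls_begin, calls_enter;

static int mock_acpi_begin(void) { calls_begin++; return 0; }
static int mock_acpi_enter(void) { calls_enter++; return 0; }

static const struct mock_hibernation_ops mock_acpi_ops = {
    .begin = mock_acpi_begin,
    .enter = mock_acpi_enter,
};

/* Install a backend; NULL uninstalls. Incomplete tables are refused. */
int mock_set_ops(const struct mock_hibernation_ops *ops)
{
    if (ops && !(ops->begin && ops->enter))
        return -EINVAL;
    mock_ops = ops;
    return 0;
}

/* Generic code: it only knows the table, not which backend is behind it. */
int mock_platform_enter(void)
{
    int error;

    if (!mock_ops)
        return -ENOSYS;
    error = mock_ops->begin();
    if (error)
        return error;
    return mock_ops->enter();
}
```

The generic caller never names an ACPI function; swapping the backend only requires a different table.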
For STR, acpi_suspend_enter is the function that actually carries out the suspend.
File: drivers/acpi/sleep.c

 static int acpi_suspend_enter(suspend_state_t pm_state)
 {
 	ACPI_FLUSH_CPU_CACHE();
 	/* Do arch specific saving of state. */
 	if (acpi_state == ACPI_STATE_S3) {
 		int error = acpi_save_state_mem();
 		if (error)
 			return error;
 	}
 		……

Getting ready for resume

TOI saves memory in order to store the image, whereas the ACPI "save mem" step sets up the GDT and related state needed at resume.
File: arch/x86/kernel/acpi/sleep.c

 /**
  * acpi_save_state_mem - save kernel state
  *
  * Create an identity mapped page table and copy the wakeup routine to
  * low memory.
  *
  * Note that this is too late to change acpi_wakeup_address.
  */

According to the comment, this function mainly sets up the page table and the wakeup routine rather than simply saving memory.

 int acpi_save_state_mem(void)
 {
 	struct wakeup_header *header;
  
 	if (!acpi_realmode) {
 		printk(KERN_ERR "Could not allocate memory during boot, "
 		       "S3 disabled\n");
 		return -ENOMEM;
 	}

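The wakeup_header structure is defined in arch/x86/kernel/acpi/realmode/wakeup.h, which also defines HEADER_OFFSET = 0x3f00 and WAKEUP_SIZE = 0x4000.
Because the comment mentions wakeup.S, I looked at that file too: it lays out this structure in assembly and also contains wakeup_code, which, in case the BIOS does not do the job, performs the switch from real mode to protected mode.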

 /* This must match data at wakeup.S */
 struct wakeup_header {
 	u16 video_mode;		/* Video mode number */
 	u16 _jmp1;		/* ljmpl opcode, 32-bit only */
 	u32 pmode_entry;	/* Protected mode resume point, 32-bit only */
 	u16 _jmp2;		/* CS value, 32-bit only */
 	u32 pmode_cr0;		/* Protected mode cr0 */
 	u32 pmode_cr3;		/* Protected mode cr3 */
 	u32 pmode_cr4;		/* Protected mode cr4 */
 	u32 pmode_efer_low;	/* Protected mode EFER */
 	u32 pmode_efer_high;
 	u64 pmode_gdt;
 	u32 realmode_flags;
 	u32 real_magic;
 	u16 trampoline_segment;	/* segment with trampoline code, 64-bit only */
 	u8  _pad1;
 	u8  wakeup_jmp;
 	u16 wakeup_jmp_off;
 	u16 wakeup_jmp_seg;
 	u64 wakeup_gdt[3];
 	u32 signature;		/* To check we have correct structure */
 } __attribute__((__packed__));
 
 extern struct wakeup_header wakeup_header;
 #endif /* closes a preprocessor conditional opened before this excerpt */
 
 #define HEADER_OFFSET 0x3f00
 #define WAKEUP_SIZE   0x4000
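
As a quick sanity check on the numbers, the sketch below mirrors the structure with fixed-width types and verifies that a header placed at HEADER_OFFSET fits inside WAKEUP_SIZE. The name wakeup_header_sketch and the helper header_fits are invented here; the packed attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of the structure above, using <stdint.h> types. */
struct wakeup_header_sketch {
    uint16_t video_mode;
    uint16_t _jmp1;
    uint32_t pmode_entry;
    uint16_t _jmp2;
    uint32_t pmode_cr0;
    uint32_t pmode_cr3;
    uint32_t pmode_cr4;
    uint32_t pmode_efer_low;
    uint32_t pmode_efer_high;
    uint64_t pmode_gdt;
    uint32_t realmode_flags;
    uint32_t real_magic;
    uint16_t trampoline_segment;
    uint8_t  _pad1;
    uint8_t  wakeup_jmp;
    uint16_t wakeup_jmp_off;
    uint16_t wakeup_jmp_seg;
    uint64_t wakeup_gdt[3];
    uint32_t signature;
} __attribute__((__packed__));

#define HEADER_OFFSET 0x3f00
#define WAKEUP_SIZE   0x4000

/* Return 1 if a header starting at HEADER_OFFSET ends within WAKEUP_SIZE. */
int header_fits(void)
{
    return HEADER_OFFSET + sizeof(struct wakeup_header_sketch) <= WAKEUP_SIZE;
}
```

With packing, the fields add up to 82 bytes, well within the 0x100 bytes left between 0x3f00 and 0x4000.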

Continuing with acpi_save_state_mem():

 memcpy((void *)acpi_realmode, &wakeup_code_start, WAKEUP_SIZE);
 header = (struct wakeup_header *)(acpi_realmode + HEADER_OFFSET);

acpi_realmode is defined in the same file:

 /* address in low memory of the wakeup routine. */
 static unsigned long acpi_realmode;

wakeup_code_start is in arch/x86/kernel/acpi/realmode/wakeup_rm.S:

 /*
  * Wrapper script for the realmode binary as a transport object
  * before copying to low memory.
  */
 	.section ".rodata","a"
 	.globl	wakeup_code_start, wakeup_code_end
 wakeup_code_start:
 	.incbin	"arch/x86/kernel/acpi/realmode/wakeup.bin"

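For now I do not understand the purpose of this copy: 0x4000 bytes are copied, and the header pointer is then set to offset 0x3f00 within the copy. The file wakeup.bin is 16K in size; I am not sure at what point it gets generated.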

 if (header->signature != 0x51ee1111) {
 		printk(KERN_ERR "wakeup header does not match\n");
 		return -EINVAL;
 	}
 	header->video_mode = saved_video_mode;
 	header->wakeup_jmp_seg = acpi_wakeup_address >> 4;

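This shows that the header structure sits at offset 0x3f00 inside wakeup.bin; after the memcpy the header should already carry the signature. At wakeup the CPU should be in real mode, where addressing is segment based, which is why acpi_wakeup_address is shifted right by 4 to obtain a segment value.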
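
The copy, the signature check, and the segment computation can be tried out in user space. The sketch below uses two ordinary buffers in place of wakeup.bin and the low-memory area, and a reduced header with only the two fields it touches; every name in it (mini_header, build_blob, save_state_sketch, patched_segment) is invented, and the example address 0x9a000 is an arbitrary paragraph-aligned value below 1 MB.

```c
#include <stdint.h>
#include <string.h>

#define HEADER_OFFSET    0x3f00
#define WAKEUP_SIZE      0x4000
#define WAKEUP_SIGNATURE 0x51ee1111u

/* Reduced, hypothetical header: only the fields used in this sketch. */
struct mini_header {
    uint16_t wakeup_jmp_seg;
    uint32_t signature;
} __attribute__((__packed__));

static uint8_t blob[WAKEUP_SIZE];     /* stands in for wakeup.bin */
static uint8_t low_mem[WAKEUP_SIZE];  /* stands in for the acpi_realmode area */

/* Build the fake binary: the signature is baked in at "build time". */
void build_blob(void)
{
    struct mini_header h = { 0, WAKEUP_SIGNATURE };

    memset(blob, 0, sizeof(blob));
    memcpy(blob + HEADER_OFFSET, &h, sizeof(h));
}

/* Copy the blob, validate the header, then patch in the real-mode segment. */
int save_state_sketch(uint32_t wakeup_address)
{
    struct mini_header h;

    memcpy(low_mem, blob, WAKEUP_SIZE);
    memcpy(&h, low_mem + HEADER_OFFSET, sizeof(h));
    if (h.signature != WAKEUP_SIGNATURE)
        return -1;
    h.wakeup_jmp_seg = (uint16_t)(wakeup_address >> 4);  /* segment = address / 16 */
    memcpy(low_mem + HEADER_OFFSET, &h, sizeof(h));
    return 0;
}

/* Read back the segment value stored in the copied header. */
uint16_t patched_segment(void)
{
    struct mini_header h;

    memcpy(&h, low_mem + HEADER_OFFSET, sizeof(h));
    return h.wakeup_jmp_seg;
}
```

The point of the pattern is that the binary ships with placeholders, and the values that are only known at suspend time are patched into the copy, not into the original.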

 	/*
 	 * Set up the wakeup GDT.  We set these up as Big Real Mode,
 	 * that is, with limits set to 4 GB.  At least the Lenovo
 	 * Thinkpad X61 is known to need this for the video BIOS
 	 * initialization quirk to work; this is likely to also
 	 * be the case for other laptops or integrated video devices.
 	 */
 
 	/* GDT[0]: GDT self-pointer */
 	header->wakeup_gdt[0] =
 		(u64)(sizeof(header->wakeup_gdt) - 1) +
 		((u64)(acpi_wakeup_address +
 			((char *)&header->wakeup_gdt - (char *)acpi_realmode))
 				<< 16);
 	/* GDT[1]: big real mode-like code segment */
 	header->wakeup_gdt[1] =
 		GDT_ENTRY(0x809b, acpi_wakeup_address, 0xfffff);
 	/* GDT[2]: big real mode-like data segment */
 	header->wakeup_gdt[2] =
 		GDT_ENTRY(0x8093, acpi_wakeup_address, 0xfffff);

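Big Real Mode: real mode, but with segment limits of 4 GB. What is the point? The code comment gives one reason: the video BIOS initialization quirk on machines such as the Lenovo ThinkPad X61 needs it.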
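
To see what the two flag values mean, a descriptor can be packed and decoded by hand. The helper gdt_entry below is a sketch modelled on the usual x86 descriptor layout (it is written for this page, not copied from the kernel macro), the decoding helpers are likewise invented, and 0x9a000 is an arbitrary example base address.

```c
#include <stdint.h>

/* Pack flags, base and limit into one 8-byte segment descriptor. */
uint64_t gdt_entry(uint64_t flags, uint64_t base, uint64_t limit)
{
    return ((base  & 0xff000000ULL) << 32) |  /* base 31..24  -> bits 56..63 */
           ((flags & 0x0000f0ffULL) << 40) |  /* access + flags -> bits 40..55 */
           ((limit & 0x000f0000ULL) << 32) |  /* limit 19..16 -> bits 48..51 */
           ((base  & 0x00ffffffULL) << 16) |  /* base 23..0   -> bits 16..39 */
           (limit & 0x0000ffffULL);           /* limit 15..0  -> bits 0..15  */
}

/* Segment base address encoded in descriptor d. */
uint32_t gdt_base(uint64_t d)
{
    return (uint32_t)((d >> 16) & 0x00ffffff) |
           (uint32_t)((d >> 32) & 0xff000000);
}

/* Last valid offset in the segment, honouring the granularity bit (bit 55). */
uint32_t gdt_effective_limit(uint64_t d)
{
    uint32_t limit = (uint32_t)(d & 0xffff) | (uint32_t)((d >> 32) & 0xf0000);
    int page_granular = (int)((d >> 55) & 1);

    return page_granular ? ((limit << 12) | 0xfff) : limit;
}

/* 1 if the D/B bit (bit 54) marks a 32-bit segment, 0 for a 16-bit one. */
int gdt_is_32bit(uint64_t d)
{
    return (int)((d >> 54) & 1);
}
```

For flags 0x809b and limit 0xfffff, the granularity bit is set while the D/B bit is clear: the segment stays 16-bit, as in real mode, but its limit covers the full 4 GB. That combination is what "big real mode-like" refers to.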

 	store_gdt((struct desc_ptr *)&header->pmode_gdt);
 	header->pmode_efer_low = nx_enabled;
 	if (header->pmode_efer_low & 1) {
 		/* This is strange, why not save efer, always? */
 		rdmsr(MSR_EFER, header->pmode_efer_low,
 			header->pmode_efer_high);
 	}
 	header->pmode_cr0 = read_cr0();
 	header->pmode_cr4 = read_cr4_safe();
 	header->realmode_flags = acpi_realmode_flags;
 	header->real_magic = 0x12345678;

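EFER (Extended Feature Enable Register) is a register that was first added in the AMD K6 processor.
The remaining code only covers the 64-bit and SMP configurations. In summary, acpi_save_state_mem() stores the GDT needed at wakeup, together with the current protected-mode GDT pointer and control registers (cr0, cr4, EFER), in the header structure at offset 0x3f00 of the low-memory wakeup area.
Back in acpi_suspend_enter(): the cache has been flushed and the GDT and registers needed for wakeup have been saved. Next:
File: drivers/acpi/sleep.c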

 local_irq_save(flags);
 acpi_enable_wakeup_device(acpi_state);

This saves the IRQ state and then, depending on acpi_state, enables the devices that are allowed to wake the system.

 switch (acpi_state) {
 	case ACPI_STATE_S1:
 		barrier();
 		status = acpi_enter_sleep_state(acpi_state);
 		break;
 	case ACPI_STATE_S3:
 		do_suspend_lowlevel();
 		break;
 	}

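This is where the actual suspend happens.

I could not find do_suspend_lowlevel() at first; it is written in assembly (arch/x86/kernel/acpi/wakeup_32.S and wakeup_64.S), so a search of the C sources misses it.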

 /* If ACPI is not enabled by the BIOS, we need to enable it here. */
 	if (set_sci_en_on_resume)
 		acpi_write_bit_register(ACPI_BITREG_SCI_ENABLE, 1);
 	else
 		acpi_enable();

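On the division of labour between ACPI and the operating system, one explanation found online is that the BIOS collects hardware information and defines the power management scheme, while the operating system carries it out.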

 /* Reprogram control registers and execute _BFS */
 	acpi_leave_sleep_state_prep(acpi_state);
 	/* ACPI 3.0 specs (P62) says that it's the responsibility
 	 * of the OSPM to clear the status bit [ implying that the
 	 * POWER_BUTTON event should not reach userspace ]
 	 */
 	if (ACPI_SUCCESS(status) && (acpi_state == ACPI_STATE_S3))
 		acpi_clear_event(ACPI_EVENT_POWER_BUTTON);
 	/*
 	 * Disable and clear GPE status before interrupt is enabled. Some GPEs
 	 * (like wakeup GPE) haven't handler, this can avoid such GPE misfire.
 	 * acpi_leave_sleep_state will reenable specific GPEs later
 	 */
 	acpi_disable_all_gpes();
 	local_irq_restore(flags);
 	printk(KERN_DEBUG "Back to C!\n");

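I do not fully understand this long block. GPE stands for General Purpose Event, the ACPI mechanism through which hardware signals events such as wakeup to the OS.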

 /* restore processor state */
 	if (acpi_state == ACPI_STATE_S3)
 		acpi_restore_state_mem();
 	return ACPI_SUCCESS(status) ? 0 : -EFAULT;

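acpi_restore_state_mem() turns out to be an empty function.
At this point we are inside the enter() callback, which then returns; in hibernation_platform_enter() the return is followed by an infinite loop. My guess is that the step that really suspends the system is inside do_suspend_lowlevel(), yet after that function finishes the code can still clear the power button event. Why? I assumed that cutting power was the last step, but apparently it is not. A plausible reading is that everything after do_suspend_lowlevel() runs on the resume path, after wakeup, which would fit the "Back to C!" message.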
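
One way to picture code that "continues after the machine was asleep" is setjmp/longjmp: a context is saved, control leaves, and a later jump lands back at the saved point so that the statements after it run. The sketch below is only an analogy under that assumption; it says nothing about how do_suspend_lowlevel() is implemented, and all function names in it are invented.

```c
#include <setjmp.h>
#include <string.h>

static jmp_buf saved_context;   /* plays the role of the saved CPU state */
static char sr_trace[128];

static void sr_log(const char *s)
{
    strcat(sr_trace, s);
}

/* Stand-in for the wakeup path: jump back to the saved context. */
static void firmware_wakeup(void)
{
    sr_log("wakeup;");
    longjmp(saved_context, 1);
}

/* Stand-in for entering the sleep state: it never returns normally. */
static void enter_sleep_state(void)
{
    sr_log("sleep;");
    firmware_wakeup();
}

int suspend_enter_sketch(void)
{
    sr_trace[0] = '\0';
    sr_log("save_state;");

    if (setjmp(saved_context) == 0) {
        enter_sleep_state();
        return -1;              /* not reached in this sketch */
    }

    /* Execution resumes here after the "wakeup". */
    sr_log("back_to_c;");
    sr_log("clear_power_button;");
    return 0;
}
```

Seen this way, the statements written after the suspend call are not executed before power is cut; they are the first C code to run once the system is awake again.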
