liwen01 2024.06.09 前言 Linux系统中的ext2、ext3、ext4 文件系统,它们都有很强的向后和向前兼容性,可以在数据不丢失的情况下进行文件系统的升级。目前ext4是一个相对较成熟、稳定且高效的文件系统,适用于绝大部分规模和需求的Linux环境。 ext4它突出的特点有:数
liwen01 2024.06.09
Linux系统中的ext2、ext3、ext4 文件系统,它们都有很强的向后和向前兼容性,可以在数据不丢失的情况下进行文件系统的升级。目前ext4是一个相对较成熟、稳定且高效的文件系统,适用于绝大部分规模和需求的Linux环境。
ext4它突出的特点有: 数据分段管理、多块分配、延迟分配、持久预分配、日志校验、支持更大的文件系统和文件大小。
ext4文件系统的具体实现比较复杂,本文尝试用比较简单的方式用一篇文章的篇幅来简单地介绍一下它的工作原理。
为了分析ext4 文件系统的内部结构和原理,这里我们在Linux中创建一个ext4文件系统镜像,然后通过loop虚拟设备将ext4镜像文件挂载到某个目录上。具体实现步骤如下:
dd if=/dev/zero of=./ext4_image.img bs=1M count=1024
mkfs.ext4 ext4_image.img
sudo mount -o loop ext4_image.img /home/biao/test/ext4/ext4_simulator
dumpe2fs ext4_image.img
输出内容信息(中间省略了部分内容):
dumpe2fs 1.44.1 (24-Mar-2018)
Filesystem volume name:
Last mounted on: /home/biao/test/ext4/ext4_simulator
Filesystem UUID: 0169498e-f5f7-4fb8-9e9e-532088e41333
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file dir_nlink extra_isize metadata_csum
Filesystem flags: signed_directory_hash
Default mount options: user_xattr acl
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 65536
Block count: 262144
Reserved block count: 13107
Free blocks: 247703
Free inodes: 65517
First block: 0
Block size: 4096
Fragment size: 4096
Group descriptor size: 64
Reserved GDT blocks: 127
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
Flex block group size: 16
Filesystem created: Fri May 24 17:18:57 2024
Last mount time: Wed Jun 5 19:15:36 2024
Last write time: Wed Jun 5 19:15:36 2024
Mount count: 3
Maximum mount count: -1
Last checked: Fri May 24 17:18:57 2024
Check interval: 0 ()
Lifetime writes: 6997 kB
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 32
Desired extra isize: 32
Journal inode: 8
Default directory hash: half_md4
Directory Hash Seed: 0faf0e8c-f385-4ecd-b3a4-db2a3329e121
Journal backup: inode blocks
Checksum type: crc32c
Checksum: 0x32dc1b70
Journal features: journal_64bit journal_checksum_v3
Journal size: 32M
Journal length: 8192
Journal sequence: 0x00000017
Journal start: 1
Journal checksum type: crc32c
Journal checksum: 0xa3c1b983
Group 0: (Blocks 0-32767) csum 0xf19b [ITABLE_ZEROED]
Primary superblock at 0, Group descriptors at 1-1
Reserved GDT blocks at 2-128
Block bitmap at 129 (+129), csum 0x8efc34cf
Inode bitmap at 137 (+137), csum 0x49f91ed6
Inode table at 145-656 (+145)
28517 free blocks, 8176 free inodes, 3 directories, 8176 unused inodes
Free blocks: 4251-32767
Free inodes: 17-8192
..........
..........
..........
Group 7: (Blocks 229376-262143) csum 0x7daa [INODE_UNINIT, ITABLE_ZEROED]
Backup superblock at 229376, Group descriptors at 229377-229377
Reserved GDT blocks at 229378-229504
Block bitmap at 136 (bg #0 + 136), csum 0x5bd8cca0
Inode bitmap at 144 (bg #0 + 144), csum 0x00000000
Inode table at 3729-4240 (bg #0 + 3729)
32639 free blocks, 8192 free inodes, 0 directories, 8192 unused inodes
Free blocks: 229505-262143
Free inodes: 57345-65536
从上面dumpe2fs的数据上我们可以看出,一个1GB大小的空间,ext4 文件系统将它分隔成了0~7的8个Group。
ext4 的总体磁盘布局如下:
从上图可以看出:
为什么需要这样设计?这个下面稍晚点再介绍
从上面《1.1 ext4文件系统信息表》中可以知道Primary superblock在第0号block,每个block的大小为4096Byte。
用hexdump 命令查看超级块的数据
biao@ubuntu:~/test/ext4$ hexdump -s 0 -n 4096 -C ext4_image.img
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000400 00 00 01 00 00 00 04 00 33 33 00 00 97 c7 03 00 |........33......|
00000410 ed ff 00 00 00 00 00 00 02 00 00 00 02 00 00 00 |................|
00000420 00 80 00 00 00 80 00 00 00 20 00 00 9c c1 5d 66 |......... ....]f|
00000430 00 d0 5f 66 02 00 ff ff 53 ef 01 00 01 00 00 00 |.._f....S.......|
00000440 81 5b 50 66 00 00 00 00 00 00 00 00 01 00 00 00 |.[Pf............|
00000450 00 00 00 00 0b 00 00 00 00 01 00 00 3c 00 00 00 |............<...|
00000460 c2 02 00 00 6b 04 00 00 01 69 49 8e f5 f7 4f b8 |....k....iI...O.|
00000470 9e 9e 53 20 88 e4 13 33 00 00 00 00 00 00 00 00 |..S ...3........|
00000480 00 00 00 00 00 00 00 00 2f 68 6f 6d 65 2f 62 69 |......../home/bi|
00000490 61 6f 2f 74 65 73 74 2f 65 78 74 34 2f 65 78 74 |ao/test/ext4/ext|
000004a0 34 5f 73 69 6d 75 6c 61 74 6f 72 00 00 00 00 00 |4_simulator.....|
000004b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 7f 00 |................|
000004d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000004e0 08 00 00 00 00 00 00 00 00 00 00 00 0f af 0e 8c |................|
000004f0 f3 85 4e cd b3 a4 db 2a 33 29 e1 21 01 01 40 00 |..N....*3).!..@.|
00000500 0c 00 00 00 00 00 00 00 81 5b 50 66 0a f3 01 00 |.........[Pf....|
........
biao@ubuntu:~/test/ext4$
对超级块的部分数据进行解析:
从上表可以看出superblock的主要内容有:
文件系统信息、块大小和块组信息、Inode 相关信息、文件系统大小和使用情况、日志相关信息、挂载信息、校验和和备份信息
。
其实使用dumpe2fs命令查看的ext4文件系统信息就是从superblock上的数据解析而来。
除了Primary superblock,还在不同的group中有备份superblock,其内容与Primary superblock原始数据相同,Primary superblock损坏的时候可以从备份区恢复回来。
在 ext4 文件系统中,Group Descriptor(块组描述符)是一个关键的结构,用于描述和管理文件系统的块组(Block Group)。每个块组包含文件系统中的一部分数据块和 inode,并且有自己的元数据来管理这些资源。Group Descriptor 在超级块之后紧随其后,是文件系统的组织和管理的核心部分
从上面《1.1 ext4文件系统信息表》中可以知道group0 的 Group descriptors 在第1个数据块中,其大小为1个block
group 0 中 Group descriptors 的数据如下:
biao@ubuntu:~/test/ext4$ hexdump -s 4096 -n 4096 -C ext4_image.img
00001000 81 00 00 00 89 00 00 00 91 00 00 00 65 6f f0 1f |............eo..|
00001010 03 00 04 00 00 00 00 00 cf 34 d6 1e f0 1f 9b f1 |.........4......|
00001020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001030 00 00 00 00 00 00 00 00 fc 8e f9 49 00 00 00 00 |...........I....|
00001040 82 00 00 00 8a 00 00 00 91 02 00 00 b5 79 fd 1f |.............y..|
00001050 03 00 04 00 00 00 00 00 c2 fd 0a 43 fd 1f c2 4a |...........C...J|
00001060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00001070 00 00 00 00 00 00 00 00 8e a7 8c 58 00 00 00 00 |...........X....|
.........
biao@ubuntu:~/test/ext4$
对Group descriptors 的数据进行解析,可以看到详细当前group的详细信息。
一个Group descriptors 占用一个block,它不仅仅记录自己Group上的信息,还包括了其它group的Group descriptors
Block bitmap 块位图用于管理块组(Block Group)中的数据块,Block Bitmap 记录了块组中每个块的使用状态,标识哪些块是已使用的,哪些块是空闲的,里面数据是按位标记,为1表示该块已经被使用。
查看Block bitmap中的数据
biao@ubuntu:~/test/ext4$ hexdump -s 528384 -n 4096 -C ext4_image.img
00081000 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
00081210 ff ff ff 07 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00081220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00082000
biao@ubuntu:~/test/ext4$
与Block bitmap工作原理类似,Inode bitmap 是用于管理块组(Block Group)中的inode。Inode Bitmap记录了块组中每个inode的使用状态,标识哪些inode是已使用的,哪些inode是空闲的。
biao@ubuntu:~/test/ext4$ hexdump -s 561152 -n 4096 -C ext4_image.img
00089000 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00089010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00089400 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff |................|
*
0008a000
biao@ubuntu:~/test/ext4$
索引节点表是相对比较复杂的一个元文件,从上面《1.1 ext4文件系统信息表》我们可以知道:
Inode size: 256
Inode table at 145-656 (+145)
查看索引节点信息:
biao@ubuntu:~/test/ext4$ hexdump -s 593920 -n 4096 -C ext4_image.img
00091000 00 00 00 00 00 00 00 00 81 5b 50 66 81 5b 50 66 |.........[Pf.[Pf|
00091010 81 5b 50 66 00 00 00 00 00 00 00 00 00 00 00 00 |.[Pf............|
00091020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00091070 00 00 00 00 00 00 00 00 00 00 00 00 6f 16 00 00 |............o...|
00091080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00091100 ed 41 00 00 00 10 00 00 78 15 61 66 e5 5d 50 66 |.A......x.af.]Pf|
00091110 e5 5d 50 66 00 00 00 00 00 00 07 00 08 00 00 00 |.]Pf............|
00091120 00 00 08 00 04 00 00 00 0a f3 01 00 04 00 00 00 |................|
00091130 00 00 00 00 00 00 00 00 01 00 00 00 91 10 00 00 |................|
00091140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00091170 00 00 00 00 00 00 00 00 00 00 00 00 fa d3 00 00 |................|
00091180 20 00 98 7a 60 ea ef 8e 60 ea ef 8e 78 f5 3f a0 | ..z`...`...x.?.|
00091190 81 5b 50 66 00 00 00 00 00 00 00 00 00 00 00 00 |.[Pf............|
000911a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00091270 00 00 00 00 00 00 00 00 00 00 00 00 8d 16 00 00 |................|
00091280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
对第2个索引节点的参数进行解析:
在ext4文件系统中,0~11号索引是特殊定义的索引节点:
在 ext4 文件系统中,inode 是一个数据结构,代表文件系统中的每个文件和目录。每个 inode 包含了有关文件的元数据,例如文件大小、权限、所有者信息等。inode.i_block 是 inode 结构中用于指向文件数据块的字段,是文件系统如何找到并访问文件内容的核心部分.
inode.i_block 是 ext4 文件系统中确保文件数据高效存储和访问的关键组件,i_block里的数据类型,需要根据i_flags中的参数来确认,上面《图7.1 Inode table参数解析》i_flags 的值是0x080000,同使用的是 Inode uses extents (EXT4_EXTENTS_FL)
iblock的长度是60字节,我们下面通过iblock里的参数找到该inode对应文件所在的block。
文件系统中文件信息如下:
root@ubuntu:/home/biao/test/ext4/ext4_simulator# tree
.
├── lost+found
├── test1
│?? └── 0000.media
├── test2
│?? └── 0011.media
├── test3
│?? └── 0022.media
└── test4
└── 0033.media
5 directories, 4 files
root@ubuntu:/home/biao/test/ext4/ext4_simulator#
如果我们要找到0033.media文件所在block,我们先通过stat 查看0033.media 的inode节点
biao@ubuntu:~/test/ext4/ext4_simulator/test4$ stat 0033.media
File: 0033.media
Size: 1662591 Blocks: 3248 IO Block: 4096 regular file
Device: 719h/1817d Inode: 16 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/ biao) Gid: ( 1000/ biao)
Access: 2024-06-05 10:39:09.000000000 +0800
Modify: 2024-05-14 01:01:26.000000000 +0800
Change: 2024-06-05 10:39:09.423416410 +0800
Birth: -
biao@ubuntu:~/test/ext4/ext4_simulator/test4$
定位到索引所在的位置:
145 * 4096 +(16-1)*256 = 593,920 + 3,840 = 597,760 = 0x91F00
索引节点数据
*
00091f00 a4 81 e8 03 7f 5e 19 00 cd cf 5f 66 cd cf 5f 66 |.....^...._f.._f|
00091f10 66 47 42 66 00 00 00 00 e8 03 01 00 b0 0c 00 00 |fGBf............|
00091f20 00 00 08 00 01 00 00 00 0a f3 01 00 04 00 00 00 |................|
00091f30 00 00 00 00 00 00 00 00 96 01 00 00 b5 84 00 00 |................|
00091f40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
i_block 的偏移量是0x28,对i_block的数据进行解析:
将逻辑块0开始的0x196个block映射到物理0x84b5开始的0x196个物理块中
0x84b5 = 33973
33973 * 4096 = 139,153,408 = 0x84B 5000
查看文件系统的0x84b5 block数据,与0033.media文件的数据是相同的
第 0x84b5 block
biao@ubuntu:~/test/ext4$ hexdump -s 139153408 -n 4096 -C ext4_image.img
084b5000 01 00 00 00 25 25 01 00 7a 34 9e 74 8f 01 00 00 |....%%..z4.t....|
084b5010 8c d1 0f f2 ff ff ff ff 00 00 00 01 40 01 0c 01 |............@...|
084b5020 ff ff 01 40 00 00 03 00 90 00 00 03 00 00 03 00 |...@............|
084b5030 96 bc 09 00 00 00 01 42 01 01 01 40 00 00 03 00 |.......B...@....|
084b5040 90 00 00 03 00 00 03 00 96 a0 01 20 20 05 11 67 |........... ..g|
084b5050 be e4 4a 17 25 05 05 05 e1 00 00 03 00 01 00 00 |..J.%...........|
084b5060 03 00 14 2f 84 02 08 00 00 00 01 44 01 c0 73 c0 |.../.......D..s.|
084b5070 c6 d9 00 00 00 01 26 01 ac 39 80 1f cd 51 b5 b2 |......&..9...Q..|
084b5080 70 02 84 80 26 99 cd b5 f6 00 cf a3 06 b7 71 6b |p...&.........qk|
0033.media
biao@ubuntu:~/test/ext4/ext4_simulator/test4$ hexdump -s 0 -n 4096 -C 0033.media
00000000 01 00 00 00 25 25 01 00 7a 34 9e 74 8f 01 00 00 |....%%..z4.t....|
00000010 8c d1 0f f2 ff ff ff ff 00 00 00 01 40 01 0c 01 |............@...|
00000020 ff ff 01 40 00 00 03 00 90 00 00 03 00 00 03 00 |...@............|
00000030 96 bc 09 00 00 00 01 42 01 01 01 40 00 00 03 00 |.......B...@....|
00000040 90 00 00 03 00 00 03 00 96 a0 01 20 20 05 11 67 |........... ..g|
00000050 be e4 4a 17 25 05 05 05 e1 00 00 03 00 01 00 00 |..J.%...........|
00000060 03 00 14 2f 84 02 08 00 00 00 01 44 01 c0 73 c0 |.../.......D..s.|
00000070 c6 d9 00 00 00 01 26 01 ac 39 80 1f cd 51 b5 b2 |......&..9...Q..|
00000080 70 02 84 80 26 99 cd b5 f6 00 cf a3 06 b7 71 6b |p...&.........qk|
通过上面《图7.2 特殊索引节点》我们知道根目录的inode是2,查看根目录的索引节点位置:
根目录 inode 位置
145 * 4096 +(2-1)*256 = 593,920 + 256 = 594,176 = 0x91100
根目录 inode 数据
*
00091100 ed 41 00 00 00 10 00 00 77 be 5f 66 e5 5d 50 66 |.A......w._f.]Pf|
00091110 e5 5d 50 66 00 00 00 00 00 00 07 00 08 00 00 00 |.]Pf............|
00091120 00 00 08 00 04 00 00 00 0a f3 01 00 04 00 00 00 |................|
00091130 00 00 00 00 00 00 00 00 01 00 00 00 91 10 00 00 |................|
00091140 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
0x1091 = 4,241
4,241 * 4096 = 17,371,136 = 0x109 1000
biao@ubuntu:~/test/ext4$ hexdump -s 17371136 -n 4096 -C ext4_image.img
01091000 02 00 00 00 0c 00 01 02 2e 00 00 00 02 00 00 00 |................|
01091010 0c 00 02 02 2e 2e 00 00 0b 00 00 00 14 00 0a 02 |................|
01091020 6c 6f 73 74 2b 66 6f 75 6e 64 00 00 0c 00 00 00 |lost+found......|
01091030 10 00 05 02 74 65 73 74 31 00 00 00 01 20 00 00 |....test1.... ..|
01091040 10 00 05 02 74 65 73 74 32 00 00 00 02 20 00 00 |....test2.... ..|
01091050 10 00 05 02 74 65 73 74 33 00 00 00 03 20 00 00 |....test3.... ..|
01091060 98 0f 05 02 74 65 73 74 34 00 00 00 00 00 00 00 |....test4.......|
01091070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
01091ff0 00 00 00 00 00 00 00 00 0c 00 00 de 67 85 5b 11 |............g.[.|
01092000
biao@ubuntu:~/test/ext4$
可以看到根目录上的所有信息,下面是对根目录的目录项进行解析
同样的方法,可以定位到各子目录上的信息。
fsck
工具利用超级块、块组描述符、块位图和 inode 位图来检查文件系统的一致性。
fsck
能够更快地进行一致性检查,减少系统恢复时间。
上面只是简单的介绍了ext4文件系统的基础内容,一些更加详细的内容,比如日志、碎片整理、软连接与硬连接等等都还没有介绍,受篇幅限制,这些以后再介绍吧。