<pre><code>
/*
 * Per-process file information.
 */
typedef struct uf_info {
	kmutex_t	fi_lock;	/* see below */
	int		fi_badfd;	/* bad file descriptor # */
	int		fi_action;	/* action to take on bad fd use */
	int		fi_nfiles;	/* number of entries in fi_list[] */
	uf_entry_t *volatile fi_list;	/* current file list */
	uf_rlist_t	*fi_rlist;	/* retired file lists */
} uf_info_t;
</code></pre>
<p>The file lists are indexed by the integer file descriptor. Now let’s see how we normally get to this information, the current file list. In usr/src/uts/common/sys/user.h the user_t structure is defined, which contains all the per-process data related to a user. Through user_t, by accessing its field u_finfo (of type uf_info_t), we reach the per-process file information and the current file list. A good example is in the DTrace code (dtrace.c): uf_info_t *finfo = &curthread->t_procp->p_user.u_finfo. There are tools, such as pfiles, that let you inspect the fi_list of a process given its process ID.</p>
<p>The file list contains elements of type uf_entry_t. The definition of uf_entry_t is in the same file, usr/src/uts/common/sys/user.h.</p>
<pre><code>
/*
 * Entry in the per-process list of open files.
 * Note: only certain fields are copied in flist_grow() and flist_fork().
 * This is indicated in brackets in the structure member comments.
 */
typedef struct uf_entry {
	kmutex_t	uf_lock;	/* per-fd lock [never copied] */
	struct file	*uf_file;	/* file pointer [grow, fork] */
	struct fpollinfo *uf_fpollinfo;	/* poll state [grow] */
	int		uf_refcnt;	/* LWPs accessing this file [grow] */
	int		uf_alloc;	/* right subtree allocs [grow, fork] */
	short		uf_flag;	/* fcntl F_GETFD flags [grow, fork] */
	short		uf_busy;	/* file is allocated [grow, fork] */
	kcondvar_t	uf_wanted_cv;	/* waiting for setf() [never copied] */
	kcondvar_t	uf_closing_cv;	/* waiting for close() [never copied] */
	struct portfd 	*uf_portfd;	/* associated with port [grow] */
	/* Avoid false sharing - pad to coherency granularity (64 bytes) */
	char		uf_pad[64 - sizeof (kmutex_t) - 2 * sizeof (void*) -
		2 * sizeof (int) - 2 * sizeof (short) -
		2 * sizeof (kcondvar_t) - sizeof (struct portfd *)];
} uf_entry_t;
</code></pre>
<p>As we can see, uf_entry_t holds the pointer to the file in its uf_file field. Let’s look further down the path, at the link between the file pointer and its attached vnode. In Illumos the definition of struct file resides in usr/src/uts/common/sys/file.h.</p>
<pre><code>
/*
 * One file structure is allocated for each open/creat/pipe call.
 * Main use is to hold the read/write pointer associated with
 * each open file.
 */
typedef struct file {
	kmutex_t	f_tlock;	/* short term lock */
	ushort_t	f_flag;
	ushort_t	f_flag2;	/* extra flags (FSEARCH, FEXEC) */
	struct vnode	*f_vnode;	/* pointer to vnode structure */
	offset_t	f_offset;	/* read/write character pointer */
	struct cred	*f_cred;	/* credentials of user who opened it */
	struct f_audit_data	*f_audit_data;	/* file audit data */
	int		f_count;	/* reference count */
} file_t;
</code></pre>
<p>From the file pointer we can now reach the system-wide attached vnode through the f_vnode field. Multiple processes can hold references to the same vnode. We are almost at the end of the virtual filesystem level here, as we move towards the concrete filesystem implementation. The last barrier is basically the vnode. The vnode definition resides in usr/src/uts/common/sys/vnode.h.</p>
<pre><code>
typedef struct vnode {
	kmutex_t	v_lock;		/* protects vnode fields */
	uint_t		v_flag;		/* vnode flags (see below) */
	uint_t		v_count;	/* reference count */
	void		*v_data;	/* private data for fs */
	struct vfs	*v_vfsp;	/* ptr to containing VFS */
	struct stdata	*v_stream;	/* associated stream */
	enum vtype	v_type;		/* vnode type */
	dev_t		v_rdev;		/* device (VCHR, VBLK) */
	...
} vnode_t;
</code></pre>
<p>The entry point into the filesystem-specific implementation is the v_data field within the vnode_t structure. With ZFS underneath, v_data points to a znode. In ZFS the znode is the equivalent of the UFS inode. The v_data pointer is cast to a znode_t structure through the macro: #define VTOZ(VP) ((znode_t *)(VP)->v_data). More about the znode and the ZFS POSIX operations implementation in the following posts.</p>
<p>Let’s make a short recap of the code path that takes us from process-level user data down to the ZFS POSIX Layer implementation. Note that each step dereferences a pointer or embedded structure:</p>
<pre><code>
proc_t *p = curproc;			/* the process */
user_t *up = &p->p_user;		/* the user structure */
uf_info_t *fip = &up->u_finfo;		/* per-process file information */
uf_entry_t *ufp = &fip->fi_list[0];	/* the entry for fd 0 */
file_t *fp = ufp->uf_file;		/* pointer to the file struct */
vnode_t *vp = fp->f_vnode;		/* the vnode */
znode_t *zp = VTOZ(vp);			/* reached the ZFS znode */
</code></pre>
<p>The next post will dive into the ZFS POSIX layer and into the code path of certain syscalls.</p>
<p>Happy coding.</p>
<footer>A dive into filesystems – ZFS – part 1, Syneto. Written by Vadim Comanescu.</footer>