| ==================================================== |
| IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT |
| ==================================================== |
| |
| By: David Howells <dhowells@redhat.com> |
| |
| Contents: |
| |
| (*) Representation |
| |
| (*) Object management state machine. |
| |
| - Provision of cpu time. |
| - Locking simplification. |
| |
| (*) The set of states. |
| |
| (*) The set of events. |
| |
| |
| ============== |
| REPRESENTATION |
| ============== |
| |
| FS-Cache maintains an in-kernel representation of each object that a netfs is |
| currently interested in. Such objects are represented by the fscache_cookie |
| struct and are referred to as cookies. |
| |
| FS-Cache also maintains a separate in-kernel representation of the objects that |
| a cache backend is currently actively caching. Such objects are represented by |
| the fscache_object struct. The cache backends allocate these upon request, and |
| are expected to embed them in their own representations. These are referred to |
| as objects. |
| |
| There is a 1:N relationship between cookies and objects. A cookie may be |
| represented by multiple objects - an index may exist in more than one cache - |
| or even by no objects (it may not be cached). |
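The 1:N relationship can be pictured with a minimal userspace sketch. The
rel_cookie and rel_object structs and both helpers below are hypothetical
stand-ins, not the kernel's fscache_cookie and fscache_object definitions; they
only show one cookie being backed by zero, one or several objects.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a cache backend's object. */
struct rel_object {
    int cache_id;              /* which cache holds this object */
    struct rel_object *next;   /* next object backing the same cookie */
};

/* Hypothetical, simplified stand-in for a netfs cookie. */
struct rel_cookie {
    struct rel_object *backing; /* NULL if the cookie is not cached */
};

/* Attach an object to a cookie: one cookie, any number of objects. */
static void rel_attach(struct rel_cookie *cookie, struct rel_object *object)
{
    object->next = cookie->backing;
    cookie->backing = object;
}

/* Count how many objects currently represent this cookie. */
static size_t rel_count_objects(const struct rel_cookie *cookie)
{
    const struct rel_object *o;
    size_t n = 0;

    for (o = cookie->backing; o != NULL; o = o->next)
        n++;
    return n;
}
```

A cookie with an empty list corresponds to the "not cached" case; attaching
objects from two different caches gives the index-in-two-caches case.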
| |
| Furthermore, both cookies and objects are hierarchical. The two hierarchies |
| correspond, but the cookies tree is a superset of the union of the object trees |
| of multiple caches: |
| |
|     NETFS INDEX TREE               :      CACHE 1     :      CACHE 2    |
|                                    :                  :                 |
|                                    :   +-----------+  :                 |
|                           +----------->|  IObject  |  :                 |
|       +-----------+       |        :   +-----------+  :                 |
|       |  ICookie  |-------+        :         |        :                 |
|       +-----------+       |        :         |        :   +-----------+ |
|             |             +------------------------------>|  IObject  | |
|             |                      :         |        :   +-----------+ |
|             |                      :         V        :         |       |
|             |                      :   +-----------+  :         |       |
|             V             +----------->|  IObject  |  :         |       |
|       +-----------+       |        :   +-----------+  :         |       |
|       |  ICookie  |-------+        :         |        :         V       |
|       +-----------+       |        :         |        :   +-----------+ |
|             |             +------------------------------>|  IObject  | |
|       +-----+-----+                :         |        :   +-----------+ |
|       |           |                :         |        :         |       |
|       V           |                :         V        :         |       |
| +-----------+     |                :   +-----------+  :         |       |
| |  ICookie  |------------------------->|  IObject  |  :         |       |
| +-----------+     |                :   +-----------+  :         |       |
|       |           V                :         |        :         V       |
|       |     +-----------+          :         |        :   +-----------+ |
|       |     |  ICookie  |-------------------------------->|  IObject  | |
|       |     +-----------+          :         |        :   +-----------+ |
|       V           |                :         V        :         |       |
| +-----------+     |                :   +-----------+  :         |       |
| |  DCookie  |------------------------->|  DObject  |  :         |       |
| +-----------+     |                :   +-----------+  :         |       |
|                   |                :                  :         |       |
|           +-------+-------+        :                  :         |       |
|           |               |        :                  :         |       |
|           V               V        :                  :         V       |
|     +-----------+   +-----------+  :                  :   +-----------+ |
|     |  DCookie  |   |  DCookie  |------------------------>|  DObject  | |
|     +-----------+   +-----------+  :                  :   +-----------+ |
|                                    :                  :                 |
| |
| In the above illustration, ICookie and IObject represent indices and DCookie |
| and DObject represent data storage objects. Indices may have representation in |
| multiple caches, but currently, non-index objects may not. Objects of any type |
| may also be entirely unrepresented. |
| |
| As far as the netfs API goes, the netfs is only actually permitted to see |
| pointers to the cookies. The cookies themselves and any objects attached to |
| those cookies are hidden from it. |
| |
| |
| =============================== |
| OBJECT MANAGEMENT STATE MACHINE |
| =============================== |
| |
| Within FS-Cache, each active object is managed by its own individual state |
| machine. The state for an object is kept in the fscache_object struct, in |
| object->state. A cookie may point to a set of objects that are in different |
| states. |
| |
| Each state has an action associated with it that is invoked when the machine |
| wakes up in that state. There are four logical sets of states: |
| |
| (1) Preparation: states that wait for the parent objects to become ready. The |
| representations are hierarchical, and it is expected that an object must |
| be created or accessed with respect to its parent object. |
| |
| (2) Initialisation: states that perform lookups in the cache and validate |
| what's found and that create on disk any missing metadata. |
| |
| (3) Normal running: states that allow netfs operations on objects to proceed |
| and that update the state of objects. |
| |
| (4) Termination: states that detach objects from their netfs cookies, that |
| delete objects from disk, that handle disk and system errors and that free |
| up in-memory resources. |
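The idea of one action per state, run each time the machine wakes up, can be
sketched as a table of function pointers. This is a three-state toy written
for illustration only: the mini_* names are hypothetical, lookup is assumed to
always succeed, and it does not reproduce the kernel's dispatch code.

```c
/* A cut-down set of states: one preparation, one initialisation, one running. */
enum mini_state {
    MINI_INIT,
    MINI_LOOKING_UP,
    MINI_ACTIVE,
    MINI_NR_STATES
};

struct mini_object {
    enum mini_state state;
    int parent_ready;      /* non-zero once the parent allows access */
};

/* Each action does its state's work and returns the state to move to. */
typedef enum mini_state (*mini_action_t)(struct mini_object *);

static enum mini_state mini_init(struct mini_object *obj)
{
    /* Preparation: stay put until the parent is ready. */
    return obj->parent_ready ? MINI_LOOKING_UP : MINI_INIT;
}

static enum mini_state mini_look_up(struct mini_object *obj)
{
    (void)obj;             /* assumption: the lookup always succeeds */
    return MINI_ACTIVE;
}

static enum mini_state mini_active(struct mini_object *obj)
{
    (void)obj;             /* normal running: nothing to change here */
    return MINI_ACTIVE;
}

static const mini_action_t mini_actions[MINI_NR_STATES] = {
    [MINI_INIT]       = mini_init,
    [MINI_LOOKING_UP] = mini_look_up,
    [MINI_ACTIVE]     = mini_active,
};

/* One wake-up: run the current state's action and adopt its result. */
static void mini_wake(struct mini_object *obj)
{
    obj->state = mini_actions[obj->state](obj);
}
```

Calling mini_wake() repeatedly walks an object from preparation through
initialisation to normal running once its parent is marked ready.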
| |
| |
| In most cases, transitioning between states is in response to signalled events. |
| When a state has finished processing, it will usually set the mask of events in |
| which it is interested (object->event_mask) and relinquish the worker thread. |
| Then when an event is raised (by calling fscache_raise_event()), if the event |
| is not masked, the object will be queued for processing (by calling |
| fscache_enqueue_object()). |
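The interplay between the event mask and queueing can be sketched as follows.
The ev_* names are hypothetical userspace stand-ins for the functions named
above, and the struct holds only the fields this paragraph mentions; the real
kernel functions take different arguments and do more work.

```c
#include <stdbool.h>

/* Hypothetical miniature of an object. */
struct ev_object {
    unsigned long events;      /* events raised and not yet processed */
    unsigned long event_mask;  /* events the current state is interested in */
    bool queued;               /* true once handed over for processing */
};

/* Stand-in for fscache_enqueue_object(): just record the request. */
static void ev_enqueue_object(struct ev_object *obj)
{
    obj->queued = true;
}

/* Stand-in for fscache_raise_event(): record the event and queue the
 * object only if the current state has asked to hear about it. */
static void ev_raise_event(struct ev_object *obj, unsigned int event)
{
    unsigned long bit = 1UL << event;

    obj->events |= bit;
    if (obj->event_mask & bit)
        ev_enqueue_object(obj);
}
```

An event whose bit is absent from event_mask is still remembered in events,
but it does not cause the object to be queued.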
| |
| |
| PROVISION OF CPU TIME |
| --------------------- |
| |
| The work to be done by the various states was given CPU time by the threads of |
| the slow work facility. This was used in preference to the workqueue facility |
| because: |
| |
| (1) Threads may be completely occupied for very long periods of time by a |
| particular work item. These state actions may be doing sequences of |
| synchronous, journalled disk accesses (lookup, mkdir, create, setxattr, |
| getxattr, truncate, unlink, rmdir, rename). |
| |
| (2) Threads may do little actual work, but may rather spend a lot of time |
| sleeping on I/O. This means that single-threaded and 1-per-CPU-threaded |
| workqueues don't necessarily have the right numbers of threads. |
| |
| |
| LOCKING SIMPLIFICATION |
| ---------------------- |
| |
| Because only one worker thread may be operating on any particular object's |
| state machine at once, this simplifies the locking, particularly with respect |
| to disconnecting the netfs's representation of a cache object (fscache_cookie) |
| from the cache backend's representation (fscache_object) - which may be |
| requested from either end. |
| |
| |
| ================= |
| THE SET OF STATES |
| ================= |
| |
| The object state machine has a set of states that it can be in. There are |
| preparation states in which the object sets itself up and waits for its parent |
| object to transit to a state that allows access to its children: |
| |
| (1) State FSCACHE_OBJECT_INIT. |
| |
| Initialise the object and wait for the parent object to become active. It |
| is expected that an object cannot be looked up in the cache from its |
| parent object until that parent object itself has been looked up. |
| |
| There are initialisation states in which the object sets itself up and accesses |
| disk for the object metadata: |
| |
| (2) State FSCACHE_OBJECT_LOOKING_UP. |
| |
| Look up the object on disk, using the parent as a starting point. |
| FS-Cache expects the cache backend to probe the cache to see whether this |
| object is represented there, and if it is, to see if it's valid (coherency |
| management). |
| |
| The cache should call fscache_object_lookup_negative() to indicate lookup |
| failure for whatever reason, and should call fscache_obtained_object() to |
| indicate success. |
| |
| At the completion of lookup, FS-Cache will let the netfs go ahead with |
| read operations, whether or not the file is yet cached. If it is not yet |
| cached, read operations will be immediately rejected with ENODATA until |
| the first known page is uncached, since up to that point there can be no |
| data to be read out of the cache for that file that isn't currently also |
| held in the pagecache. |
| |
| (3) State FSCACHE_OBJECT_CREATING. |
| |
| Create an object on disk, using the parent as a starting point. This |
| happens if the lookup failed to find the object, or if the object's |
| coherency data indicated what's on disk is out of date. In this state, |
| FS-Cache expects the cache to create the object. |
| |
| The cache should call fscache_obtained_object() if creation completes |
| successfully, fscache_object_lookup_negative() otherwise. |
| |
| At the completion of creation, FS-Cache will start processing write |
| operations the netfs has queued for an object. If creation failed, the |
| write ops will be transparently discarded, and nothing recorded in the |
| cache. |
| |
| There are some normal running states in which the object spends its time |
| servicing netfs requests: |
| |
| (4) State FSCACHE_OBJECT_AVAILABLE. |
| |
| A transient state in which pending operations are started, child objects |
| are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary |
| lookup data is freed. |
| |
| (5) State FSCACHE_OBJECT_ACTIVE. |
| |
| The normal running state. In this state, requests the netfs makes will be |
| passed on to the cache. |
| |
| (6) State FSCACHE_OBJECT_INVALIDATING. |
| |
| The object is undergoing invalidation. When the state machine comes here, |
| it discards all pending read, write and attribute change operations as it |
| is going to clear out the cache entirely and reinitialise it. It will |
| then continue to the FSCACHE_OBJECT_UPDATING state. |
| |
| (7) State FSCACHE_OBJECT_UPDATING. |
| |
| The state machine comes here to update the object in the cache from the |
| netfs's records. This involves updating the auxiliary data that is used |
| to maintain coherency. |
| |
| And there are terminal states in which an object cleans itself up, deallocates |
| memory and potentially deletes stuff from disk: |
| |
| (8) State FSCACHE_OBJECT_LC_DYING. |
| |
| The object comes here if it is dying because of a lookup or creation |
| error. This would be due to a disk error or system error of some sort. |
| Temporary data is cleaned up, and the parent is released. |
| |
| (9) State FSCACHE_OBJECT_DYING. |
| |
| The object comes here if it is dying due to an error, because its parent |
| cookie has been relinquished by the netfs or because the cache is being |
| withdrawn. |
| |
| Any child objects waiting on this one are given CPU time so that they too |
| can destroy themselves. This object waits for all its children to go away |
| before advancing to the next state. |
| |
| (10) State FSCACHE_OBJECT_ABORT_INIT. |
| |
| The object comes to this state if it was waiting on its parent in |
| FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself |
| so that the parent may proceed from the FSCACHE_OBJECT_DYING state. |
| |
| (11) State FSCACHE_OBJECT_RELEASING. |
| (12) State FSCACHE_OBJECT_RECYCLING. |
| |
| The object comes to one of these two states when dying once it is rid of |
| all its children, if it is dying because the netfs relinquished its |
| cookie. In the first state, the cached data is expected to persist, and |
| in the second it will be deleted. |
| |
| (13) State FSCACHE_OBJECT_WITHDRAWING. |
| |
| The object transits to this state if the cache decides it wants to |
| withdraw the object from service, perhaps to make space, but also due to |
| error or just because the whole cache is being withdrawn. |
| |
| (14) State FSCACHE_OBJECT_DEAD. |
| |
| The object transits to this state when the in-memory object record is |
| ready to be deleted. The object processor shouldn't ever see an object in |
| this state. |
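The fourteen states listed above fall into the four logical sets described
earlier. The sketch below records that grouping. The state names are taken
from the list, but the enum order, the numeric values and the sketch_* names
are assumptions of this illustration rather than the kernel's definitions.

```c
/* State names as listed above; order and values are assumed. */
enum sketch_state {
    FSCACHE_OBJECT_INIT,
    FSCACHE_OBJECT_LOOKING_UP,
    FSCACHE_OBJECT_CREATING,
    FSCACHE_OBJECT_AVAILABLE,
    FSCACHE_OBJECT_ACTIVE,
    FSCACHE_OBJECT_INVALIDATING,
    FSCACHE_OBJECT_UPDATING,
    FSCACHE_OBJECT_LC_DYING,
    FSCACHE_OBJECT_DYING,
    FSCACHE_OBJECT_ABORT_INIT,
    FSCACHE_OBJECT_RELEASING,
    FSCACHE_OBJECT_RECYCLING,
    FSCACHE_OBJECT_WITHDRAWING,
    FSCACHE_OBJECT_DEAD
};

/* The four logical sets of states (hypothetical names). */
enum sketch_group {
    SKETCH_PREPARATION,
    SKETCH_INITIALISATION,
    SKETCH_NORMAL_RUNNING,
    SKETCH_TERMINATION
};

/* Map a state to the logical set it is documented under. */
static enum sketch_group sketch_group_of(enum sketch_state state)
{
    switch (state) {
    case FSCACHE_OBJECT_INIT:
        return SKETCH_PREPARATION;
    case FSCACHE_OBJECT_LOOKING_UP:
    case FSCACHE_OBJECT_CREATING:
        return SKETCH_INITIALISATION;
    case FSCACHE_OBJECT_AVAILABLE:
    case FSCACHE_OBJECT_ACTIVE:
    case FSCACHE_OBJECT_INVALIDATING:
    case FSCACHE_OBJECT_UPDATING:
        return SKETCH_NORMAL_RUNNING;
    default:
        /* Everything from LC_DYING to DEAD is a terminal state. */
        return SKETCH_TERMINATION;
    }
}
```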
| |
| |
| ================= |
| THE SET OF EVENTS |
| ================= |
| |
| There are a number of events that can be raised to an object state machine: |
| |
| (*) FSCACHE_OBJECT_EV_UPDATE |
| |
| The netfs requested that an object be updated. The state machine will ask |
| the cache backend to update the object, and the cache backend will ask the |
| netfs for details of the change through its cookie definition ops. |
| |
| (*) FSCACHE_OBJECT_EV_CLEARED |
| |
| This is signalled in two circumstances: |
| |
| (a) when an object's last child object is dropped and |
| |
| (b) when the last operation outstanding on an object is completed. |
| |
| This is used to proceed from the dying state. |
| |
| (*) FSCACHE_OBJECT_EV_ERROR |
| |
| This is signalled when an I/O error occurs during the processing of some |
| object. |
| |
| (*) FSCACHE_OBJECT_EV_RELEASE |
| (*) FSCACHE_OBJECT_EV_RETIRE |
| |
| These are signalled when the netfs relinquishes a cookie it was using. |
| The event selected depends on whether the netfs asks for the backing |
| object to be retired (deleted) or retained. |
| |
| (*) FSCACHE_OBJECT_EV_WITHDRAW |
| |
| This is signalled when the cache backend wants to withdraw an object. |
| This means that the object will have to be detached from the netfs's |
| cookie. |
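The two circumstances in which FSCACHE_OBJECT_EV_CLEARED is signalled can be
sketched with a pair of counters. The clr_* names and the boolean flag are
hypothetical; the flag merely stands in for raising the event, and the real
bookkeeping in the kernel is assumed to be more involved.

```c
#include <stdbool.h>

/* Hypothetical miniature of an object that is waiting to be cleared. */
struct clr_object {
    unsigned int n_children;  /* child objects still attached */
    unsigned int n_ops;       /* operations still outstanding */
    bool cleared;             /* stands in for raising EV_CLEARED */
};

/* Circumstance (a): the last child object is dropped. */
static void clr_drop_child(struct clr_object *obj)
{
    if (obj->n_children > 0 && --obj->n_children == 0)
        obj->cleared = true;
}

/* Circumstance (b): the last outstanding operation completes. */
static void clr_complete_op(struct clr_object *obj)
{
    if (obj->n_ops > 0 && --obj->n_ops == 0)
        obj->cleared = true;
}
```

In this sketch, an object in the dying state would wait for the flag before
advancing, which mirrors how the event lets it proceed.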
| |
| Because the withdrawing, releasing and retiring events are all handled by the object |
| state machine, it doesn't matter if there's a collision with both ends trying |
| to sever the connection at the same time. The state machine can just pick |
| which one it wants to honour, and that effects the other. |
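Picking one event out of several colliding ones might look like the sketch
below. The priority order (withdraw, then retire, then release) is purely an
assumption made for illustration, as are the pick_* names; the sketch also
skips the intermediate dying state and jumps straight to the final choice.

```c
/* Hypothetical event bits for the three disconnection events. */
enum {
    PICK_EV_RELEASE  = 1 << 0,
    PICK_EV_RETIRE   = 1 << 1,
    PICK_EV_WITHDRAW = 1 << 2
};

/* Where the object ends up once one event has been honoured. */
enum pick_next {
    PICK_NONE,
    PICK_RELEASING,    /* cached data is expected to persist */
    PICK_RECYCLING,    /* cached data will be deleted */
    PICK_WITHDRAWING   /* the cache takes the object out of service */
};

/* Honour exactly one of the pending events (assumed priority order). */
static enum pick_next pick_event(unsigned int pending)
{
    if (pending & PICK_EV_WITHDRAW)
        return PICK_WITHDRAWING;
    if (pending & PICK_EV_RETIRE)
        return PICK_RECYCLING;
    if (pending & PICK_EV_RELEASE)
        return PICK_RELEASING;
    return PICK_NONE;
}
```

Whatever order is chosen, only one path is taken, so both ends asking to
sever the connection at once does not need any extra locking in this model.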