Merge branch 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup updates from Tejun Heo:
- xattr support added. The implementation is shared with tmpfs. Usage
  is restricted and intended for managing per-cgroup metadata by system
  software; a brief usage sketch follows below. The tmpfs changes are
  routed through this branch with Hugh's permission.
- cgroup subsystem ID handling simplified.
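A rough usage sketch (illustrative only, not taken from this merge): once a
hierarchy is mounted with the new "xattr" option, system software holding
CAP_SYS_ADMIN can tag a cgroup directory via setxattr(2). The mount point,
attribute name and value below are made-up examples:

    #include <stdio.h>
    #include <string.h>
    #include <sys/xattr.h>

    int main(void)
    {
        /* hypothetical cgroup directory and attribute name */
        const char *cgrp = "/sys/fs/cgroup/systemd/my.service";
        const char *pid = "1234";   /* e.g. the service's main PID */

        /* only trusted.* and security.* names are accepted; both need CAP_SYS_ADMIN */
        if (setxattr(cgrp, "trusted.main-pid", pid, strlen(pid), 0) < 0) {
            perror("setxattr");
            return 1;
        }
        return 0;
    }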
* 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Define CGROUP_SUBSYS_COUNT according to the configuration
cgroup: Assign subsystem IDs during compile time
cgroup: Do not depend on a given order when populating the subsys array
cgroup: Wrap subsystem selection macro
cgroup: Remove CGROUP_BUILTIN_SUBSYS_COUNT
cgroup: net_prio: Do not define task_netprioidx() when not selected
cgroup: net_cls: Do not define task_cls_classid() when not selected
cgroup: net_cls: Move sock_update_classid() declaration to cls_cgroup.h
cgroup: trivial fixes for Documentation/cgroups/cgroups.txt
xattr: mark variable as uninitialized to make both gcc and smatch happy
fs: add missing documentation to simple_xattr functions
cgroup: add documentation on extended attributes usage
cgroup: rename subsys_bits to subsys_mask
cgroup: add xattr support
cgroup: revise how we re-populate root directory
xattr: extract simple_xattr code from tmpfs
diff --git a/Documentation/cgroups/cgroups.txt b/Documentation/cgroups/cgroups.txt
index 4a0b64c..9e04196c 100644
--- a/Documentation/cgroups/cgroups.txt
+++ b/Documentation/cgroups/cgroups.txt
@@ -29,7 +29,8 @@
3.1 Overview
3.2 Synchronization
3.3 Subsystem API
-4. Questions
+4. Extended attribute usage
+5. Questions
1. Control Groups
=================
@@ -62,9 +63,9 @@
At any one time there may be multiple active hierarchies of task
cgroups. Each hierarchy is a partition of all tasks in the system.
-User level code may create and destroy cgroups by name in an
+User-level code may create and destroy cgroups by name in an
instance of the cgroup virtual file system, specify and query to
-which cgroup a task is assigned, and list the task pids assigned to
+which cgroup a task is assigned, and list the task PIDs assigned to
a cgroup. Those creations and assignments only affect the hierarchy
associated with that instance of the cgroup file system.
@@ -72,7 +73,7 @@
tracking. The intention is that other subsystems hook into the generic
cgroup support to provide new attributes for cgroups, such as
accounting/limiting the resources which processes in a cgroup can
-access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allows
+access. For example, cpusets (see Documentation/cgroups/cpusets.txt) allow
you to associate a set of CPUs and a set of memory nodes with the
tasks in each cgroup.
@@ -80,11 +81,11 @@
----------------------------
There are multiple efforts to provide process aggregations in the
-Linux kernel, mainly for resource tracking purposes. Such efforts
+Linux kernel, mainly for resource-tracking purposes. Such efforts
include cpusets, CKRM/ResGroups, UserBeanCounters, and virtual server
namespaces. These all require the basic notion of a
grouping/partitioning of processes, with newly forked processes ending
-in the same group (cgroup) as their parent process.
+up in the same group (cgroup) as their parent process.
The kernel cgroup patch provides the minimum essential kernel
mechanisms required to efficiently implement such groups. It has
@@ -127,14 +128,14 @@
/ \
Professors (15%) students (5%)
-Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd go
-into NFS network class.
+Browsers like Firefox/Lynx go into the WWW network class, while (k)nfsd goes
+into the NFS network class.
At the same time Firefox/Lynx will share an appropriate CPU/Memory class
depending on who launched it (prof/student).
With the ability to classify tasks differently for different resources
-(by putting those resource subsystems in different hierarchies) then
+(by putting those resource subsystems in different hierarchies),
the admin can easily set up a script which receives exec notifications
and depending on who is launching the browser he can
@@ -145,19 +146,19 @@
appropriate network and other resource class. This may lead to
proliferation of such cgroups.
-Also lets say that the administrator would like to give enhanced network
+Also let's say that the administrator would like to give enhanced network
access temporarily to a student's browser (since it is night and the user
-wants to do online gaming :)) OR give one of the students simulation
-apps enhanced CPU power,
+wants to do online gaming :)) OR give one of the student's simulation
+apps enhanced CPU power.
-With ability to write pids directly to resource classes, it's just a
-matter of :
+With the ability to write PIDs directly to resource classes, it's just a
+matter of:
# echo pid > /sys/fs/cgroup/network/<new_class>/tasks
(after some time)
# echo pid > /sys/fs/cgroup/network/<orig_class>/tasks
-Without this ability, he would have to split the cgroup into
+Without this ability, the administrator would have to split the cgroup into
multiple separate ones and then associate the new cgroups with the
new resource classes.
@@ -184,20 +185,20 @@
field of each task_struct using the css_set, anchored at
css_set->tasks.
- - A cgroup hierarchy filesystem can be mounted for browsing and
+ - A cgroup hierarchy filesystem can be mounted for browsing and
manipulation from user space.
- - You can list all the tasks (by pid) attached to any cgroup.
+ - You can list all the tasks (by PID) attached to any cgroup.
The implementation of cgroups requires a few, simple hooks
-into the rest of the kernel, none in performance critical paths:
+into the rest of the kernel, none in performance-critical paths:
- in init/main.c, to initialize the root cgroups and initial
css_set at system boot.
- in fork and exit, to attach and detach a task from its css_set.
-In addition a new file system, of type "cgroup" may be mounted, to
+In addition, a new file system of type "cgroup" may be mounted, to
enable browsing and modifying the cgroups presently known to the
kernel. When mounting a cgroup hierarchy, you may specify a
comma-separated list of subsystems to mount as the filesystem mount
@@ -230,13 +231,13 @@
Each cgroup is represented by a directory in the cgroup file system
containing the following files describing that cgroup:
- - tasks: list of tasks (by pid) attached to that cgroup. This list
- is not guaranteed to be sorted. Writing a thread id into this file
+ - tasks: list of tasks (by PID) attached to that cgroup. This list
+ is not guaranteed to be sorted. Writing a thread ID into this file
moves the thread into this cgroup.
- - cgroup.procs: list of tgids in the cgroup. This list is not
- guaranteed to be sorted or free of duplicate tgids, and userspace
+ - cgroup.procs: list of thread group IDs in the cgroup. This list is
+ not guaranteed to be sorted or free of duplicate TGIDs, and userspace
should sort/uniquify the list if this property is required.
- Writing a thread group id into this file moves all threads in that
+ Writing a thread group ID into this file moves all threads in that
group into this cgroup.
- notify_on_release flag: run the release agent on exit?
- release_agent: the path to use for release notifications (this file
@@ -261,7 +262,7 @@
When a task is moved from one cgroup to another, it gets a new
css_set pointer - if there's an already existing css_set with the
-desired collection of cgroups then that group is reused, else a new
+desired collection of cgroups then that group is reused, otherwise a new
css_set is allocated. The appropriate existing css_set is located by
looking into a hash table.
@@ -292,7 +293,7 @@
removal of abandoned cgroups. The default value of
notify_on_release in the root cgroup at system boot is disabled
(0). The default value of other cgroups at creation is the current
-value of their parents notify_on_release setting. The default value of
+value of their parents' notify_on_release settings. The default value of
a cgroup hierarchy's release_agent path is empty.
1.5 What does clone_children do ?
@@ -316,7 +317,7 @@
4) Create the new cgroup by doing mkdir's and write's (or echo's) in
the /sys/fs/cgroup virtual file system.
5) Start a task that will be the "founding father" of the new job.
- 6) Attach that task to the new cgroup by writing its pid to the
+ 6) Attach that task to the new cgroup by writing its PID to the
/sys/fs/cgroup/cpuset/tasks file for that cgroup.
7) fork, exec or clone the job tasks from this founding father task.
@@ -344,7 +345,7 @@
2.1 Basic Usage
---------------
-Creating, modifying, using the cgroups can be done through the cgroup
+Creating, modifying, and using cgroups can be done through the cgroup
virtual filesystem.
To mount a cgroup hierarchy with all available subsystems, type:
@@ -441,7 +442,7 @@
# echo 0 > tasks
You can use the cgroup.procs file instead of the tasks file to move all
-threads in a threadgroup at once. Echoing the pid of any task in a
+threads in a threadgroup at once. Echoing the PID of any task in a
threadgroup to cgroup.procs causes all tasks in that threadgroup to be
be attached to the cgroup. Writing 0 to cgroup.procs moves all tasks
in the writing task's threadgroup.
@@ -479,7 +480,7 @@
There is mechanism which allows to get notifications about changing
status of a cgroup.
-To register new notification handler you need:
+To register a new notification handler you need to:
- create a file descriptor for event notification using eventfd(2);
- open a control file to be monitored (e.g. memory.usage_in_bytes);
- write "<event_fd> <control_fd> <args>" to cgroup.event_control.
@@ -488,7 +489,7 @@
eventfd will be woken up by control file implementation or when the
cgroup is removed.
-To unregister notification handler just close eventfd.
+To unregister a notification handler just close eventfd.
NOTE: Support of notifications should be implemented for the control
file. See documentation for the subsystem.
@@ -502,7 +503,7 @@
Each kernel subsystem that wants to hook into the generic cgroup
system needs to create a cgroup_subsys object. This contains
various methods, which are callbacks from the cgroup system, along
-with a subsystem id which will be assigned by the cgroup system.
+with a subsystem ID which will be assigned by the cgroup system.
Other fields in the cgroup_subsys object include:
@@ -516,7 +517,7 @@
at system boot.
Each cgroup object created by the system has an array of pointers,
-indexed by subsystem id; this pointer is entirely managed by the
+indexed by subsystem ID; this pointer is entirely managed by the
subsystem; the generic cgroup code will never touch this pointer.
3.2 Synchronization
@@ -639,7 +640,7 @@
Called during cgroup_create() to do any parameter
initialization which might be required before a task could attach. For
-example in cpusets, no task may attach before 'cpus' and 'mems' are set
+example, in cpusets, no task may attach before 'cpus' and 'mems' are set
up.
void bind(struct cgroup *root)
@@ -650,7 +651,26 @@
the default hierarchy (which never has sub-cgroups) and a hierarchy
that is being created/destroyed (and hence has no sub-cgroups).
-4. Questions
+4. Extended attribute usage
+===========================
+
+The cgroup filesystem supports certain types of extended attributes in
+its directories and files. The currently supported types are:
+ - Trusted (XATTR_TRUSTED)
+ - Security (XATTR_SECURITY)
+
+Both require CAP_SYS_ADMIN capability to set.
+
+As in tmpfs, the extended attributes in the cgroup filesystem are stored
+in kernel memory, so it is advised to keep usage to a minimum. This is
+also why user-defined extended attributes are not supported: any user
+could set them and there is no limit on the value size.
+
+The currently known users of this feature are SELinux, to limit cgroup
+usage in containers, and systemd, for assorted metadata such as the main
+PID of a cgroup (systemd creates a cgroup per service).
+
+5. Questions
============
Q: what's up with this '/bin/echo' ?
@@ -660,5 +680,5 @@
Q: When I attach processes, only the first of the line gets really attached !
A: We can only return one error code per call to write(). So you should also
- put only ONE pid.
+ put only ONE PID.
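The eventfd-based notification flow in section 2.4 above is easier to follow
with a concrete sequence. A minimal userspace sketch, assuming a memory cgroup
and using a made-up 100 MiB threshold as the <args> field (event support
depends on the control file's subsystem):

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
        /* hypothetical cgroup paths */
        int efd = eventfd(0, 0);
        int cfd = open("/sys/fs/cgroup/memory/mygrp/memory.usage_in_bytes",
                       O_RDONLY);
        int ecfd = open("/sys/fs/cgroup/memory/mygrp/cgroup.event_control",
                        O_WRONLY);
        unsigned long long thresh = 100ULL << 20;   /* made-up threshold */
        char cmd[64];
        uint64_t cnt;

        if (efd < 0 || cfd < 0 || ecfd < 0) {
            perror("eventfd/open");
            return 1;
        }

        /* register the handler: "<event_fd> <control_fd> <args>" */
        snprintf(cmd, sizeof(cmd), "%d %d %llu", efd, cfd, thresh);
        if (write(ecfd, cmd, strlen(cmd)) < 0) {
            perror("cgroup.event_control");
            return 1;
        }

        /* wakes up when the control file fires or the cgroup is removed */
        if (read(efd, &cnt, sizeof(cnt)) == sizeof(cnt))
            printf("event fired %llu time(s)\n", (unsigned long long)cnt);
        return 0;
    }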
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 3a16d4f..9336b82 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -68,6 +68,7 @@
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
#include <net/sock.h>
+#include <net/cls_cgroup.h>
#include <asm/uaccess.h>
diff --git a/fs/xattr.c b/fs/xattr.c
index 4d45b71..014f113 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -791,3 +791,183 @@
EXPORT_SYMBOL(generic_listxattr);
EXPORT_SYMBOL(generic_setxattr);
EXPORT_SYMBOL(generic_removexattr);
+
+/*
+ * Allocate new xattr and copy in the value; but leave the name to callers.
+ */
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size)
+{
+ struct simple_xattr *new_xattr;
+ size_t len;
+
+ /* wrap around? */
+ len = sizeof(*new_xattr) + size;
+ if (len <= sizeof(*new_xattr))
+ return NULL;
+
+ new_xattr = kmalloc(len, GFP_KERNEL);
+ if (!new_xattr)
+ return NULL;
+
+ new_xattr->size = size;
+ memcpy(new_xattr->value, value, size);
+ return new_xattr;
+}
+
+/*
+ * xattr GET operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+ void *buffer, size_t size)
+{
+ struct simple_xattr *xattr;
+ int ret = -ENODATA;
+
+ spin_lock(&xattrs->lock);
+ list_for_each_entry(xattr, &xattrs->head, list) {
+ if (strcmp(name, xattr->name))
+ continue;
+
+ ret = xattr->size;
+ if (buffer) {
+ if (size < xattr->size)
+ ret = -ERANGE;
+ else
+ memcpy(buffer, xattr->value, xattr->size);
+ }
+ break;
+ }
+ spin_unlock(&xattrs->lock);
+ return ret;
+}
+
+static int __simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+ const void *value, size_t size, int flags)
+{
+ struct simple_xattr *xattr;
+ struct simple_xattr *uninitialized_var(new_xattr);
+ int err = 0;
+
+ /* value == NULL means remove */
+ if (value) {
+ new_xattr = simple_xattr_alloc(value, size);
+ if (!new_xattr)
+ return -ENOMEM;
+
+ new_xattr->name = kstrdup(name, GFP_KERNEL);
+ if (!new_xattr->name) {
+ kfree(new_xattr);
+ return -ENOMEM;
+ }
+ }
+
+ spin_lock(&xattrs->lock);
+ list_for_each_entry(xattr, &xattrs->head, list) {
+ if (!strcmp(name, xattr->name)) {
+ if (flags & XATTR_CREATE) {
+ xattr = new_xattr;
+ err = -EEXIST;
+ } else if (new_xattr) {
+ list_replace(&xattr->list, &new_xattr->list);
+ } else {
+ list_del(&xattr->list);
+ }
+ goto out;
+ }
+ }
+ if (flags & XATTR_REPLACE) {
+ xattr = new_xattr;
+ err = -ENODATA;
+ } else {
+ list_add(&new_xattr->list, &xattrs->head);
+ xattr = NULL;
+ }
+out:
+ spin_unlock(&xattrs->lock);
+ if (xattr) {
+ kfree(xattr->name);
+ kfree(xattr);
+ }
+ return err;
+
+}
+
+/**
+ * simple_xattr_set - xattr SET operation for in-memory/pseudo filesystems
+ * @xattrs: target simple_xattr list
+ * @name: name of the new extended attribute
+ * @value: value of the new xattr. If %NULL, will remove the attribute
+ * @size: size of the new xattr
+ * @flags: %XATTR_{CREATE|REPLACE}
+ *
+ * If %XATTR_CREATE is set, the xattr shouldn't exist already; otherwise fails
+ * with -EEXIST. If %XATTR_REPLACE is set, the xattr should exist;
+ * otherwise, fails with -ENODATA.
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+ const void *value, size_t size, int flags)
+{
+ if (size == 0)
+ value = ""; /* empty EA, do not remove */
+ return __simple_xattr_set(xattrs, name, value, size, flags);
+}
+
+/*
+ * xattr REMOVE operation for in-memory/pseudo filesystems
+ */
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name)
+{
+ return __simple_xattr_set(xattrs, name, NULL, 0, XATTR_REPLACE);
+}
+
+static bool xattr_is_trusted(const char *name)
+{
+ return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+}
+
+/*
+ * xattr LIST operation for in-memory/pseudo filesystems
+ */
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer,
+ size_t size)
+{
+ bool trusted = capable(CAP_SYS_ADMIN);
+ struct simple_xattr *xattr;
+ size_t used = 0;
+
+ spin_lock(&xattrs->lock);
+ list_for_each_entry(xattr, &xattrs->head, list) {
+ size_t len;
+
+ /* skip "trusted." attributes for unprivileged callers */
+ if (!trusted && xattr_is_trusted(xattr->name))
+ continue;
+
+ len = strlen(xattr->name) + 1;
+ used += len;
+ if (buffer) {
+ if (size < used) {
+ used = -ERANGE;
+ break;
+ }
+ memcpy(buffer, xattr->name, len);
+ buffer += len;
+ }
+ }
+ spin_unlock(&xattrs->lock);
+
+ return used;
+}
+
+/*
+ * Adds an extended attribute to the list
+ */
+void simple_xattr_list_add(struct simple_xattrs *xattrs,
+ struct simple_xattr *new_xattr)
+{
+ spin_lock(&xattrs->lock);
+ list_add(&new_xattr->list, &xattrs->head);
+ spin_unlock(&xattrs->lock);
+}
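The helpers above are meant to be embedded by in-memory/pseudo filesystems
(tmpfs and cgroupfs in this series). A minimal sketch of how a hypothetical
filesystem might wire them into its inode lifetime -- the myfs_* names and the
inode layout are assumptions, not part of this patch:

    #include <linux/fs.h>
    #include <linux/xattr.h>

    /* hypothetical per-inode state embedding the simple_xattrs list */
    struct myfs_inode_info {
        struct simple_xattrs xattrs;    /* name/value pairs + spinlock */
        struct inode vfs_inode;
    };

    static inline struct myfs_inode_info *MYFS_I(struct inode *inode)
    {
        return container_of(inode, struct myfs_inode_info, vfs_inode);
    }

    static void myfs_init_inode(struct myfs_inode_info *info)
    {
        simple_xattrs_init(&info->xattrs);    /* empty list, lock initialized */
    }

    static int myfs_setxattr(struct dentry *dentry, const char *name,
                             const void *value, size_t size, int flags)
    {
        /* size == 0 stores an empty EA; XATTR_CREATE/XATTR_REPLACE honoured */
        return simple_xattr_set(&MYFS_I(dentry->d_inode)->xattrs,
                                name, value, size, flags);
    }

    static ssize_t myfs_listxattr(struct dentry *dentry, char *buf, size_t size)
    {
        return simple_xattr_list(&MYFS_I(dentry->d_inode)->xattrs, buf, size);
    }

    static void myfs_evict_inode(struct myfs_inode_info *info)
    {
        simple_xattrs_free(&info->xattrs);    /* frees every name and value */
    }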
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index c90eaa8..df354ae 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -17,6 +17,7 @@
#include <linux/rwsem.h>
#include <linux/idr.h>
#include <linux/workqueue.h>
+#include <linux/xattr.h>
#ifdef CONFIG_CGROUPS
@@ -45,17 +46,13 @@
/* Define the enumeration of all builtin cgroup subsystems */
#define SUBSYS(_x) _x ## _subsys_id,
+#define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
enum cgroup_subsys_id {
#include <linux/cgroup_subsys.h>
- CGROUP_BUILTIN_SUBSYS_COUNT
+ CGROUP_SUBSYS_COUNT,
};
+#undef IS_SUBSYS_ENABLED
#undef SUBSYS
-/*
- * This define indicates the maximum number of subsystems that can be loaded
- * at once. We limit to this many since cgroupfs_root has subsys_bits to keep
- * track of all of them.
- */
-#define CGROUP_SUBSYS_COUNT (BITS_PER_BYTE*sizeof(unsigned long))
/* Per-subsystem/per-cgroup state maintained by the system. */
struct cgroup_subsys_state {
@@ -216,6 +213,9 @@
/* List of events which userspace want to receive */
struct list_head event_list;
spinlock_t event_list_lock;
+
+ /* directory xattrs */
+ struct simple_xattrs xattrs;
};
/*
@@ -309,6 +309,9 @@
/* CFTYPE_* flags */
unsigned int flags;
+ /* file xattrs */
+ struct simple_xattrs xattrs;
+
int (*open)(struct inode *inode, struct file *file);
ssize_t (*read)(struct cgroup *cgrp, struct cftype *cft,
struct file *file,
@@ -394,7 +397,7 @@
*/
struct cftype_set {
struct list_head node; /* chained at subsys->cftsets */
- const struct cftype *cfts;
+ struct cftype *cfts;
};
struct cgroup_scanner {
@@ -406,8 +409,8 @@
void *data;
};
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts);
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts);
int cgroup_is_removed(const struct cgroup *cgrp);
@@ -521,7 +524,9 @@
};
#define SUBSYS(_x) extern struct cgroup_subsys _x ## _subsys;
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
#include <linux/cgroup_subsys.h>
+#undef IS_SUBSYS_ENABLED
#undef SUBSYS
static inline struct cgroup_subsys_state *cgroup_subsys_state(
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index dfae957..f204a7a 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -7,73 +7,73 @@
/* */
-#ifdef CONFIG_CPUSETS
+#if IS_SUBSYS_ENABLED(CONFIG_CPUSETS)
SUBSYS(cpuset)
#endif
/* */
-#ifdef CONFIG_CGROUP_DEBUG
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEBUG)
SUBSYS(debug)
#endif
/* */
-#ifdef CONFIG_CGROUP_SCHED
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_SCHED)
SUBSYS(cpu_cgroup)
#endif
/* */
-#ifdef CONFIG_CGROUP_CPUACCT
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_CPUACCT)
SUBSYS(cpuacct)
#endif
/* */
-#ifdef CONFIG_MEMCG
+#if IS_SUBSYS_ENABLED(CONFIG_MEMCG)
SUBSYS(mem_cgroup)
#endif
/* */
-#ifdef CONFIG_CGROUP_DEVICE
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_DEVICE)
SUBSYS(devices)
#endif
/* */
-#ifdef CONFIG_CGROUP_FREEZER
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_FREEZER)
SUBSYS(freezer)
#endif
/* */
-#ifdef CONFIG_NET_CLS_CGROUP
+#if IS_SUBSYS_ENABLED(CONFIG_NET_CLS_CGROUP)
SUBSYS(net_cls)
#endif
/* */
-#ifdef CONFIG_BLK_CGROUP
+#if IS_SUBSYS_ENABLED(CONFIG_BLK_CGROUP)
SUBSYS(blkio)
#endif
/* */
-#ifdef CONFIG_CGROUP_PERF
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_PERF)
SUBSYS(perf)
#endif
/* */
-#ifdef CONFIG_NETPRIO_CGROUP
+#if IS_SUBSYS_ENABLED(CONFIG_NETPRIO_CGROUP)
SUBSYS(net_prio)
#endif
/* */
-#ifdef CONFIG_CGROUP_HUGETLB
+#if IS_SUBSYS_ENABLED(CONFIG_CGROUP_HUGETLB)
SUBSYS(hugetlb)
#endif
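The combined effect of the IS_SUBSYS_ENABLED() wrappers is that the enum in
cgroup.h gets an ID for every subsystem configured either built-in or modular,
while the subsys[] array in kernel/cgroup.c is only pre-filled for built-in
ones. Assuming, purely for illustration, a config with only CONFIG_CPUSETS=y
and CONFIG_NET_CLS_CGROUP=m, the preprocessed enum would roughly become:

    enum cgroup_subsys_id {
        cpuset_subsys_id,   /* =y: subsys[] slot filled at compile time */
        net_cls_subsys_id,  /* =m: slot stays NULL until the module calls
                               cgroup_load_subsys() */
        CGROUP_SUBSYS_COUNT,
    };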
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index bef2cf0..30aa0dc 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -5,6 +5,7 @@
#include <linux/mempolicy.h>
#include <linux/pagemap.h>
#include <linux/percpu_counter.h>
+#include <linux/xattr.h>
/* inode in-kernel data */
@@ -18,7 +19,7 @@
};
struct shared_policy policy; /* NUMA memory alloc policy */
struct list_head swaplist; /* chain of maybes on swap */
- struct list_head xattr_list; /* list of shmem_xattr */
+ struct simple_xattrs xattrs; /* list of xattrs */
struct inode vfs_inode;
};
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index e5d12203..2ace7a6 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -59,7 +59,9 @@
#ifdef __KERNEL__
+#include <linux/slab.h>
#include <linux/types.h>
+#include <linux/spinlock.h>
struct inode;
struct dentry;
@@ -96,6 +98,52 @@
char **xattr_value, size_t size, gfp_t flags);
int vfs_xattr_cmp(struct dentry *dentry, const char *xattr_name,
const char *value, size_t size, gfp_t flags);
+
+struct simple_xattrs {
+ struct list_head head;
+ spinlock_t lock;
+};
+
+struct simple_xattr {
+ struct list_head list;
+ char *name;
+ size_t size;
+ char value[0];
+};
+
+/*
+ * initialize the simple_xattrs structure
+ */
+static inline void simple_xattrs_init(struct simple_xattrs *xattrs)
+{
+ INIT_LIST_HEAD(&xattrs->head);
+ spin_lock_init(&xattrs->lock);
+}
+
+/*
+ * free all the xattrs
+ */
+static inline void simple_xattrs_free(struct simple_xattrs *xattrs)
+{
+ struct simple_xattr *xattr, *node;
+
+ list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
+ kfree(xattr->name);
+ kfree(xattr);
+ }
+}
+
+struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
+int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
+ void *buffer, size_t size);
+int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
+ const void *value, size_t size, int flags);
+int simple_xattr_remove(struct simple_xattrs *xattrs, const char *name);
+ssize_t simple_xattr_list(struct simple_xattrs *xattrs, char *buffer,
+ size_t size);
+void simple_xattr_list_add(struct simple_xattrs *xattrs,
+ struct simple_xattr *new_xattr);
+
#endif /* __KERNEL__ */
#endif /* _LINUX_XATTR_H */
diff --git a/include/net/cls_cgroup.h b/include/net/cls_cgroup.h
index a4dc5b0..b6a6eeb 100644
--- a/include/net/cls_cgroup.h
+++ b/include/net/cls_cgroup.h
@@ -17,14 +17,16 @@
#include <linux/hardirq.h>
#include <linux/rcupdate.h>
-#ifdef CONFIG_CGROUPS
+#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
struct cgroup_cls_state
{
struct cgroup_subsys_state css;
u32 classid;
};
-#ifdef CONFIG_NET_CLS_CGROUP
+extern void sock_update_classid(struct sock *sk);
+
+#if IS_BUILTIN(CONFIG_NET_CLS_CGROUP)
static inline u32 task_cls_classid(struct task_struct *p)
{
int classid;
@@ -39,32 +41,33 @@
return classid;
}
-#else
-extern int net_cls_subsys_id;
-
+#elif IS_MODULE(CONFIG_NET_CLS_CGROUP)
static inline u32 task_cls_classid(struct task_struct *p)
{
- int id;
+ struct cgroup_subsys_state *css;
u32 classid = 0;
if (in_interrupt())
return 0;
rcu_read_lock();
- id = rcu_dereference_index_check(net_cls_subsys_id,
- rcu_read_lock_held());
- if (id >= 0)
- classid = container_of(task_subsys_state(p, id),
+ css = task_subsys_state(p, net_cls_subsys_id);
+ if (css)
+ classid = container_of(css,
struct cgroup_cls_state, css)->classid;
rcu_read_unlock();
return classid;
}
#endif
-#else
+#else /* !CONFIG_NET_CLS_CGROUP */
+static inline void sock_update_classid(struct sock *sk)
+{
+}
+
static inline u32 task_cls_classid(struct task_struct *p)
{
return 0;
}
-#endif
+#endif /* CONFIG_NET_CLS_CGROUP */
#endif /* _NET_CLS_CGROUP_H */
diff --git a/include/net/netprio_cgroup.h b/include/net/netprio_cgroup.h
index 2719dec..2760f4f 100644
--- a/include/net/netprio_cgroup.h
+++ b/include/net/netprio_cgroup.h
@@ -18,23 +18,18 @@
#include <linux/rcupdate.h>
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
struct netprio_map {
struct rcu_head rcu;
u32 priomap_len;
u32 priomap[];
};
-#ifdef CONFIG_CGROUPS
-
struct cgroup_netprio_state {
struct cgroup_subsys_state css;
u32 prioidx;
};
-#ifndef CONFIG_NETPRIO_CGROUP
-extern int net_prio_subsys_id;
-#endif
-
extern void sock_update_netprioidx(struct sock *sk, struct task_struct *task);
#if IS_BUILTIN(CONFIG_NETPRIO_CGROUP)
@@ -56,33 +51,28 @@
static inline u32 task_netprioidx(struct task_struct *p)
{
- struct cgroup_netprio_state *state;
- int subsys_id;
+ struct cgroup_subsys_state *css;
u32 idx = 0;
rcu_read_lock();
- subsys_id = rcu_dereference_index_check(net_prio_subsys_id,
- rcu_read_lock_held());
- if (subsys_id >= 0) {
- state = container_of(task_subsys_state(p, subsys_id),
- struct cgroup_netprio_state, css);
- idx = state->prioidx;
- }
+ css = task_subsys_state(p, net_prio_subsys_id);
+ if (css)
+ idx = container_of(css,
+ struct cgroup_netprio_state, css)->prioidx;
rcu_read_unlock();
return idx;
}
+#endif
-#else
+#else /* !CONFIG_NETPRIO_CGROUP */
static inline u32 task_netprioidx(struct task_struct *p)
{
return 0;
}
-#endif /* CONFIG_NETPRIO_CGROUP */
-
-#else
#define sock_update_netprioidx(sk, task)
-#endif
+
+#endif /* CONFIG_NETPRIO_CGROUP */
#endif /* _NET_CLS_CGROUP_H */
diff --git a/include/net/sock.h b/include/net/sock.h
index adb7da2..6e6ec18 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1486,14 +1486,6 @@
extern void sock_kfree_s(struct sock *sk, void *mem, int size);
extern void sk_send_sigurg(struct sock *sk);
-#ifdef CONFIG_CGROUPS
-extern void sock_update_classid(struct sock *sk);
-#else
-static inline void sock_update_classid(struct sock *sk)
-{
-}
-#endif
-
/*
* Functions to fill in entries in struct proto_ops when a protocol
* does not implement a particular function.
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7981850..485cc14 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -88,11 +88,12 @@
/*
* Generate an array of cgroup subsystem pointers. At boot time, this is
- * populated up to CGROUP_BUILTIN_SUBSYS_COUNT, and modular subsystems are
+ * populated with the built in subsystems, and modular subsystems are
* registered after that. The mutable section of this array is protected by
* cgroup_mutex.
*/
-#define SUBSYS(_x) &_x ## _subsys,
+#define SUBSYS(_x) [_x ## _subsys_id] = &_x ## _subsys,
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
static struct cgroup_subsys *subsys[CGROUP_SUBSYS_COUNT] = {
#include <linux/cgroup_subsys.h>
};
@@ -111,13 +112,13 @@
* The bitmask of subsystems intended to be attached to this
* hierarchy
*/
- unsigned long subsys_bits;
+ unsigned long subsys_mask;
/* Unique id for this hierarchy. */
int hierarchy_id;
/* The bitmask of subsystems currently attached to this hierarchy */
- unsigned long actual_subsys_bits;
+ unsigned long actual_subsys_mask;
/* A list running through the attached subsystems */
struct list_head subsys_list;
@@ -276,7 +277,8 @@
/* bits in struct cgroupfs_root flags field */
enum {
- ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+ ROOT_NOPREFIX, /* mounted subsystems have no named prefix */
+ ROOT_XATTR, /* supports extended attributes */
};
static int cgroup_is_releasable(const struct cgroup *cgrp)
@@ -556,7 +558,7 @@
* won't change, so no need for locking.
*/
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
- if (root->subsys_bits & (1UL << i)) {
+ if (root->subsys_mask & (1UL << i)) {
/* Subsystem is in this hierarchy. So we want
* the subsystem state from the new
* cgroup */
@@ -824,7 +826,8 @@
static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode);
static struct dentry *cgroup_lookup(struct inode *, struct dentry *, unsigned int);
static int cgroup_rmdir(struct inode *unused_dir, struct dentry *dentry);
-static int cgroup_populate_dir(struct cgroup *cgrp);
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+ unsigned long subsys_mask);
static const struct inode_operations cgroup_dir_inode_operations;
static const struct file_operations proc_cgroupstats_operations;
@@ -912,15 +915,19 @@
*/
BUG_ON(!list_empty(&cgrp->pidlists));
+ simple_xattrs_free(&cgrp->xattrs);
+
kfree_rcu(cgrp, rcu_head);
} else {
struct cfent *cfe = __d_cfe(dentry);
struct cgroup *cgrp = dentry->d_parent->d_fsdata;
+ struct cftype *cft = cfe->type;
WARN_ONCE(!list_empty(&cfe->node) &&
cgrp != &cgrp->root->top_cgroup,
"cfe still linked for %s\n", cfe->type->name);
kfree(cfe);
+ simple_xattrs_free(&cft->xattrs);
}
iput(inode);
}
@@ -963,12 +970,29 @@
return -ENOENT;
}
-static void cgroup_clear_directory(struct dentry *dir)
+/**
+ * cgroup_clear_directory - selective removal of base and subsystem files
+ * @dir: directory containing the files
+ * @base_files: true if the base files should be removed
+ * @subsys_mask: mask of the subsystem ids whose files should be removed
+ */
+static void cgroup_clear_directory(struct dentry *dir, bool base_files,
+ unsigned long subsys_mask)
{
struct cgroup *cgrp = __d_cgrp(dir);
+ struct cgroup_subsys *ss;
- while (!list_empty(&cgrp->files))
- cgroup_rm_file(cgrp, NULL);
+ for_each_subsys(cgrp->root, ss) {
+ struct cftype_set *set;
+ if (!test_bit(ss->subsys_id, &subsys_mask))
+ continue;
+ list_for_each_entry(set, &ss->cftsets, node)
+ cgroup_rm_file(cgrp, set->cfts);
+ }
+ if (base_files) {
+ while (!list_empty(&cgrp->files))
+ cgroup_rm_file(cgrp, NULL);
+ }
}
/*
@@ -977,8 +1001,9 @@
static void cgroup_d_remove_dir(struct dentry *dentry)
{
struct dentry *parent;
+ struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
- cgroup_clear_directory(dentry);
+ cgroup_clear_directory(dentry, true, root->subsys_mask);
parent = dentry->d_parent;
spin_lock(&parent->d_lock);
@@ -1022,22 +1047,22 @@
* returns an error, no reference counts are touched.
*/
static int rebind_subsystems(struct cgroupfs_root *root,
- unsigned long final_bits)
+ unsigned long final_subsys_mask)
{
- unsigned long added_bits, removed_bits;
+ unsigned long added_mask, removed_mask;
struct cgroup *cgrp = &root->top_cgroup;
int i;
BUG_ON(!mutex_is_locked(&cgroup_mutex));
BUG_ON(!mutex_is_locked(&cgroup_root_mutex));
- removed_bits = root->actual_subsys_bits & ~final_bits;
- added_bits = final_bits & ~root->actual_subsys_bits;
+ removed_mask = root->actual_subsys_mask & ~final_subsys_mask;
+ added_mask = final_subsys_mask & ~root->actual_subsys_mask;
/* Check that any added subsystems are currently free */
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
unsigned long bit = 1UL << i;
struct cgroup_subsys *ss = subsys[i];
- if (!(bit & added_bits))
+ if (!(bit & added_mask))
continue;
/*
* Nobody should tell us to do a subsys that doesn't exist:
@@ -1062,7 +1087,7 @@
for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
unsigned long bit = 1UL << i;
- if (bit & added_bits) {
+ if (bit & added_mask) {
/* We're binding this subsystem to this hierarchy */
BUG_ON(ss == NULL);
BUG_ON(cgrp->subsys[i]);
@@ -1075,7 +1100,7 @@
if (ss->bind)
ss->bind(cgrp);
/* refcount was already taken, and we're keeping it */
- } else if (bit & removed_bits) {
+ } else if (bit & removed_mask) {
/* We're removing this subsystem */
BUG_ON(ss == NULL);
BUG_ON(cgrp->subsys[i] != dummytop->subsys[i]);
@@ -1088,7 +1113,7 @@
list_move(&ss->sibling, &rootnode.subsys_list);
/* subsystem is now free - drop reference on module */
module_put(ss->module);
- } else if (bit & final_bits) {
+ } else if (bit & final_subsys_mask) {
/* Subsystem state should already exist */
BUG_ON(ss == NULL);
BUG_ON(!cgrp->subsys[i]);
@@ -1105,7 +1130,7 @@
BUG_ON(cgrp->subsys[i]);
}
}
- root->subsys_bits = root->actual_subsys_bits = final_bits;
+ root->subsys_mask = root->actual_subsys_mask = final_subsys_mask;
synchronize_rcu();
return 0;
@@ -1121,6 +1146,8 @@
seq_printf(seq, ",%s", ss->name);
if (test_bit(ROOT_NOPREFIX, &root->flags))
seq_puts(seq, ",noprefix");
+ if (test_bit(ROOT_XATTR, &root->flags))
+ seq_puts(seq, ",xattr");
if (strlen(root->release_agent_path))
seq_printf(seq, ",release_agent=%s", root->release_agent_path);
if (clone_children(&root->top_cgroup))
@@ -1132,7 +1159,7 @@
}
struct cgroup_sb_opts {
- unsigned long subsys_bits;
+ unsigned long subsys_mask;
unsigned long flags;
char *release_agent;
bool clone_children;
@@ -1189,6 +1216,10 @@
opts->clone_children = true;
continue;
}
+ if (!strcmp(token, "xattr")) {
+ set_bit(ROOT_XATTR, &opts->flags);
+ continue;
+ }
if (!strncmp(token, "release_agent=", 14)) {
/* Specifying two release agents is forbidden */
if (opts->release_agent)
@@ -1237,7 +1268,7 @@
/* Mutually exclusive option 'all' + subsystem name */
if (all_ss)
return -EINVAL;
- set_bit(i, &opts->subsys_bits);
+ set_bit(i, &opts->subsys_mask);
one_ss = true;
break;
@@ -1258,7 +1289,7 @@
continue;
if (ss->disabled)
continue;
- set_bit(i, &opts->subsys_bits);
+ set_bit(i, &opts->subsys_mask);
}
}
@@ -1270,19 +1301,19 @@
* the cpuset subsystem.
*/
if (test_bit(ROOT_NOPREFIX, &opts->flags) &&
- (opts->subsys_bits & mask))
+ (opts->subsys_mask & mask))
return -EINVAL;
/* Can't specify "none" and some subsystems */
- if (opts->subsys_bits && opts->none)
+ if (opts->subsys_mask && opts->none)
return -EINVAL;
/*
* We either have to specify by name or by subsystems. (So all
* empty hierarchies must have a name).
*/
- if (!opts->subsys_bits && !opts->name)
+ if (!opts->subsys_mask && !opts->name)
return -EINVAL;
/*
@@ -1291,10 +1322,10 @@
* take duplicate reference counts on a subsystem that's already used,
* but rebind_subsystems handles this case.
*/
- for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
unsigned long bit = 1UL << i;
- if (!(bit & opts->subsys_bits))
+ if (!(bit & opts->subsys_mask))
continue;
if (!try_module_get(subsys[i]->module)) {
module_pin_failed = true;
@@ -1307,11 +1338,11 @@
* raced with a module_delete call, and to the user this is
* essentially a "subsystem doesn't exist" case.
*/
- for (i--; i >= CGROUP_BUILTIN_SUBSYS_COUNT; i--) {
+ for (i--; i >= 0; i--) {
/* drop refcounts only on the ones we took */
unsigned long bit = 1UL << i;
- if (!(bit & opts->subsys_bits))
+ if (!(bit & opts->subsys_mask))
continue;
module_put(subsys[i]->module);
}
@@ -1321,13 +1352,13 @@
return 0;
}
-static void drop_parsed_module_refcounts(unsigned long subsys_bits)
+static void drop_parsed_module_refcounts(unsigned long subsys_mask)
{
int i;
- for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
unsigned long bit = 1UL << i;
- if (!(bit & subsys_bits))
+ if (!(bit & subsys_mask))
continue;
module_put(subsys[i]->module);
}
@@ -1339,6 +1370,7 @@
struct cgroupfs_root *root = sb->s_fs_info;
struct cgroup *cgrp = &root->top_cgroup;
struct cgroup_sb_opts opts;
+ unsigned long added_mask, removed_mask;
mutex_lock(&cgrp->dentry->d_inode->i_mutex);
mutex_lock(&cgroup_mutex);
@@ -1350,27 +1382,31 @@
goto out_unlock;
/* See feature-removal-schedule.txt */
- if (opts.subsys_bits != root->actual_subsys_bits || opts.release_agent)
+ if (opts.subsys_mask != root->actual_subsys_mask || opts.release_agent)
pr_warning("cgroup: option changes via remount are deprecated (pid=%d comm=%s)\n",
task_tgid_nr(current), current->comm);
+ added_mask = opts.subsys_mask & ~root->subsys_mask;
+ removed_mask = root->subsys_mask & ~opts.subsys_mask;
+
/* Don't allow flags or name to change at remount */
if (opts.flags != root->flags ||
(opts.name && strcmp(opts.name, root->name))) {
ret = -EINVAL;
- drop_parsed_module_refcounts(opts.subsys_bits);
+ drop_parsed_module_refcounts(opts.subsys_mask);
goto out_unlock;
}
- ret = rebind_subsystems(root, opts.subsys_bits);
+ ret = rebind_subsystems(root, opts.subsys_mask);
if (ret) {
- drop_parsed_module_refcounts(opts.subsys_bits);
+ drop_parsed_module_refcounts(opts.subsys_mask);
goto out_unlock;
}
/* clear out any existing files and repopulate subsystem files */
- cgroup_clear_directory(cgrp->dentry);
- cgroup_populate_dir(cgrp);
+ cgroup_clear_directory(cgrp->dentry, false, removed_mask);
+ /* re-populate subsystem files */
+ cgroup_populate_dir(cgrp, false, added_mask);
if (opts.release_agent)
strcpy(root->release_agent_path, opts.release_agent);
@@ -1401,6 +1437,7 @@
mutex_init(&cgrp->pidlist_mutex);
INIT_LIST_HEAD(&cgrp->event_list);
spin_lock_init(&cgrp->event_list_lock);
+ simple_xattrs_init(&cgrp->xattrs);
}
static void init_cgroup_root(struct cgroupfs_root *root)
@@ -1455,8 +1492,8 @@
* If we asked for subsystems (or explicitly for no
* subsystems) then they must match
*/
- if ((opts->subsys_bits || opts->none)
- && (opts->subsys_bits != root->subsys_bits))
+ if ((opts->subsys_mask || opts->none)
+ && (opts->subsys_mask != root->subsys_mask))
return 0;
return 1;
@@ -1466,7 +1503,7 @@
{
struct cgroupfs_root *root;
- if (!opts->subsys_bits && !opts->none)
+ if (!opts->subsys_mask && !opts->none)
return NULL;
root = kzalloc(sizeof(*root), GFP_KERNEL);
@@ -1479,7 +1516,7 @@
}
init_cgroup_root(root);
- root->subsys_bits = opts->subsys_bits;
+ root->subsys_mask = opts->subsys_mask;
root->flags = opts->flags;
if (opts->release_agent)
strcpy(root->release_agent_path, opts->release_agent);
@@ -1511,7 +1548,7 @@
if (!opts->new_root)
return -EINVAL;
- BUG_ON(!opts->subsys_bits && !opts->none);
+ BUG_ON(!opts->subsys_mask && !opts->none);
ret = set_anon_super(sb, NULL);
if (ret)
@@ -1629,7 +1666,7 @@
if (ret)
goto unlock_drop;
- ret = rebind_subsystems(root, root->subsys_bits);
+ ret = rebind_subsystems(root, root->subsys_mask);
if (ret == -EBUSY) {
free_cg_links(&tmp_cg_links);
goto unlock_drop;
@@ -1669,7 +1706,7 @@
BUG_ON(root->number_of_cgroups != 1);
cred = override_creds(&init_cred);
- cgroup_populate_dir(root_cgrp);
+ cgroup_populate_dir(root_cgrp, true, root->subsys_mask);
revert_creds(cred);
mutex_unlock(&cgroup_root_mutex);
mutex_unlock(&cgroup_mutex);
@@ -1681,7 +1718,7 @@
*/
cgroup_drop_root(opts.new_root);
/* no subsys rebinding, so refcounts don't change */
- drop_parsed_module_refcounts(opts.subsys_bits);
+ drop_parsed_module_refcounts(opts.subsys_mask);
}
kfree(opts.release_agent);
@@ -1695,7 +1732,7 @@
drop_new_super:
deactivate_locked_super(sb);
drop_modules:
- drop_parsed_module_refcounts(opts.subsys_bits);
+ drop_parsed_module_refcounts(opts.subsys_mask);
out_err:
kfree(opts.release_agent);
kfree(opts.name);
@@ -1745,6 +1782,8 @@
mutex_unlock(&cgroup_root_mutex);
mutex_unlock(&cgroup_mutex);
+ simple_xattrs_free(&cgrp->xattrs);
+
kill_litter_super(sb);
cgroup_drop_root(root);
}
@@ -2551,6 +2590,64 @@
return simple_rename(old_dir, old_dentry, new_dir, new_dentry);
}
+static struct simple_xattrs *__d_xattrs(struct dentry *dentry)
+{
+ if (S_ISDIR(dentry->d_inode->i_mode))
+ return &__d_cgrp(dentry)->xattrs;
+ else
+ return &__d_cft(dentry)->xattrs;
+}
+
+static inline int xattr_enabled(struct dentry *dentry)
+{
+ struct cgroupfs_root *root = dentry->d_sb->s_fs_info;
+ return test_bit(ROOT_XATTR, &root->flags);
+}
+
+static bool is_valid_xattr(const char *name)
+{
+ if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN) ||
+ !strncmp(name, XATTR_SECURITY_PREFIX, XATTR_SECURITY_PREFIX_LEN))
+ return true;
+ return false;
+}
+
+static int cgroup_setxattr(struct dentry *dentry, const char *name,
+ const void *val, size_t size, int flags)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ if (!is_valid_xattr(name))
+ return -EINVAL;
+ return simple_xattr_set(__d_xattrs(dentry), name, val, size, flags);
+}
+
+static int cgroup_removexattr(struct dentry *dentry, const char *name)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ if (!is_valid_xattr(name))
+ return -EINVAL;
+ return simple_xattr_remove(__d_xattrs(dentry), name);
+}
+
+static ssize_t cgroup_getxattr(struct dentry *dentry, const char *name,
+ void *buf, size_t size)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ if (!is_valid_xattr(name))
+ return -EINVAL;
+ return simple_xattr_get(__d_xattrs(dentry), name, buf, size);
+}
+
+static ssize_t cgroup_listxattr(struct dentry *dentry, char *buf, size_t size)
+{
+ if (!xattr_enabled(dentry))
+ return -EOPNOTSUPP;
+ return simple_xattr_list(__d_xattrs(dentry), buf, size);
+}
+
static const struct file_operations cgroup_file_operations = {
.read = cgroup_file_read,
.write = cgroup_file_write,
@@ -2559,11 +2656,22 @@
.release = cgroup_file_release,
};
+static const struct inode_operations cgroup_file_inode_operations = {
+ .setxattr = cgroup_setxattr,
+ .getxattr = cgroup_getxattr,
+ .listxattr = cgroup_listxattr,
+ .removexattr = cgroup_removexattr,
+};
+
static const struct inode_operations cgroup_dir_inode_operations = {
.lookup = cgroup_lookup,
.mkdir = cgroup_mkdir,
.rmdir = cgroup_rmdir,
.rename = cgroup_rename,
+ .setxattr = cgroup_setxattr,
+ .getxattr = cgroup_getxattr,
+ .listxattr = cgroup_listxattr,
+ .removexattr = cgroup_removexattr,
};
static struct dentry *cgroup_lookup(struct inode *dir, struct dentry *dentry, unsigned int flags)
@@ -2611,6 +2719,7 @@
} else if (S_ISREG(mode)) {
inode->i_size = 0;
inode->i_fop = &cgroup_file_operations;
+ inode->i_op = &cgroup_file_inode_operations;
}
d_instantiate(dentry, inode);
dget(dentry); /* Extra count - pin the dentry in core */
@@ -2671,7 +2780,7 @@
}
static int cgroup_add_file(struct cgroup *cgrp, struct cgroup_subsys *subsys,
- const struct cftype *cft)
+ struct cftype *cft)
{
struct dentry *dir = cgrp->dentry;
struct cgroup *parent = __d_cgrp(dir);
@@ -2681,6 +2790,8 @@
umode_t mode;
char name[MAX_CGROUP_TYPE_NAMELEN + MAX_CFTYPE_NAME + 2] = { 0 };
+ simple_xattrs_init(&cft->xattrs);
+
/* does @cft->flags tell us to skip creation on @cgrp? */
if ((cft->flags & CFTYPE_NOT_ON_ROOT) && !cgrp->parent)
return 0;
@@ -2721,9 +2832,9 @@
}
static int cgroup_addrm_files(struct cgroup *cgrp, struct cgroup_subsys *subsys,
- const struct cftype cfts[], bool is_add)
+ struct cftype cfts[], bool is_add)
{
- const struct cftype *cft;
+ struct cftype *cft;
int err, ret = 0;
for (cft = cfts; cft->name[0] != '\0'; cft++) {
@@ -2757,7 +2868,7 @@
}
static void cgroup_cfts_commit(struct cgroup_subsys *ss,
- const struct cftype *cfts, bool is_add)
+ struct cftype *cfts, bool is_add)
__releases(&cgroup_mutex) __releases(&cgroup_cft_mutex)
{
LIST_HEAD(pending);
@@ -2808,7 +2919,7 @@
* function currently returns 0 as long as @cfts registration is successful
* even if some file creation attempts on existing cgroups fail.
*/
-int cgroup_add_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_add_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
struct cftype_set *set;
@@ -2838,7 +2949,7 @@
* Returns 0 on successful unregistration, -ENOENT if @cfts is not
* registered with @ss.
*/
-int cgroup_rm_cftypes(struct cgroup_subsys *ss, const struct cftype *cfts)
+int cgroup_rm_cftypes(struct cgroup_subsys *ss, struct cftype *cfts)
{
struct cftype_set *set;
@@ -3843,18 +3954,29 @@
{ } /* terminate */
};
-static int cgroup_populate_dir(struct cgroup *cgrp)
+/**
+ * cgroup_populate_dir - selective creation of files in a directory
+ * @cgrp: target cgroup
+ * @base_files: true if the base files should be added
+ * @subsys_mask: mask of the subsystem ids whose files should be added
+ */
+static int cgroup_populate_dir(struct cgroup *cgrp, bool base_files,
+ unsigned long subsys_mask)
{
int err;
struct cgroup_subsys *ss;
- err = cgroup_addrm_files(cgrp, NULL, files, true);
- if (err < 0)
- return err;
+ if (base_files) {
+ err = cgroup_addrm_files(cgrp, NULL, files, true);
+ if (err < 0)
+ return err;
+ }
/* process cftsets of each subsystem */
for_each_subsys(cgrp->root, ss) {
struct cftype_set *set;
+ if (!test_bit(ss->subsys_id, &subsys_mask))
+ continue;
list_for_each_entry(set, &ss->cftsets, node)
cgroup_addrm_files(cgrp, ss, set->cfts, true);
@@ -3988,7 +4110,7 @@
list_add_tail(&cgrp->allcg_node, &root->allcg_list);
- err = cgroup_populate_dir(cgrp);
+ err = cgroup_populate_dir(cgrp, true, root->subsys_mask);
/* If err < 0, we have a half-filled directory - oh well ;) */
mutex_unlock(&cgroup_mutex);
@@ -4321,8 +4443,7 @@
* since cgroup_init_subsys will have already taken care of it.
*/
if (ss->module == NULL) {
- /* a few sanity checks */
- BUG_ON(ss->subsys_id >= CGROUP_BUILTIN_SUBSYS_COUNT);
+ /* a sanity check */
BUG_ON(subsys[ss->subsys_id] != ss);
return 0;
}
@@ -4330,24 +4451,8 @@
/* init base cftset */
cgroup_init_cftsets(ss);
- /*
- * need to register a subsys id before anything else - for example,
- * init_cgroup_css needs it.
- */
mutex_lock(&cgroup_mutex);
- /* find the first empty slot in the array */
- for (i = CGROUP_BUILTIN_SUBSYS_COUNT; i < CGROUP_SUBSYS_COUNT; i++) {
- if (subsys[i] == NULL)
- break;
- }
- if (i == CGROUP_SUBSYS_COUNT) {
- /* maximum number of subsystems already registered! */
- mutex_unlock(&cgroup_mutex);
- return -EBUSY;
- }
- /* assign ourselves the subsys_id */
- ss->subsys_id = i;
- subsys[i] = ss;
+ subsys[ss->subsys_id] = ss;
/*
* no ss->create seems to need anything important in the ss struct, so
@@ -4356,7 +4461,7 @@
css = ss->create(dummytop);
if (IS_ERR(css)) {
/* failure case - need to deassign the subsys[] slot. */
- subsys[i] = NULL;
+ subsys[ss->subsys_id] = NULL;
mutex_unlock(&cgroup_mutex);
return PTR_ERR(css);
}
@@ -4372,7 +4477,7 @@
if (ret) {
dummytop->subsys[ss->subsys_id] = NULL;
ss->destroy(dummytop);
- subsys[i] = NULL;
+ subsys[ss->subsys_id] = NULL;
mutex_unlock(&cgroup_mutex);
return ret;
}
@@ -4439,7 +4544,6 @@
mutex_lock(&cgroup_mutex);
/* deassign the subsys_id */
- BUG_ON(ss->subsys_id < CGROUP_BUILTIN_SUBSYS_COUNT);
subsys[ss->subsys_id] = NULL;
/* remove subsystem from rootnode's list of subsystems */
@@ -4502,10 +4606,13 @@
for (i = 0; i < CSS_SET_TABLE_SIZE; i++)
INIT_HLIST_HEAD(&css_set_table[i]);
- /* at bootup time, we don't worry about modular subsystems */
- for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ /* at bootup time, we don't worry about modular subsystems */
+ if (!ss || ss->module)
+ continue;
+
BUG_ON(!ss->name);
BUG_ON(strlen(ss->name) > MAX_CGROUP_TYPE_NAMELEN);
BUG_ON(!ss->create);
@@ -4538,9 +4645,12 @@
if (err)
return err;
- /* at bootup time, we don't worry about modular subsystems */
- for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+
+ /* at bootup time, we don't worry about modular subsystems */
+ if (!ss || ss->module)
+ continue;
if (!ss->early_init)
cgroup_init_subsys(ss);
if (ss->use_id)
@@ -4735,13 +4845,16 @@
{
if (need_forkexit_callback) {
int i;
- /*
- * forkexit callbacks are only supported for builtin
- * subsystems, and the builtin section of the subsys array is
- * immutable, so we don't need to lock the subsys array here.
- */
- for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+
+ /*
+ * forkexit callbacks are only supported for
+ * builtin subsystems.
+ */
+ if (!ss || ss->module)
+ continue;
+
if (ss->fork)
ss->fork(child);
}
@@ -4846,12 +4959,13 @@
tsk->cgroups = &init_css_set;
if (run_callbacks && need_forkexit_callback) {
- /*
- * modular subsystems can't use callbacks, so no need to lock
- * the subsys array
- */
- for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+
+ /* modular subsystems can't use callbacks */
+ if (!ss || ss->module)
+ continue;
+
if (ss->exit) {
struct cgroup *old_cgrp =
rcu_dereference_raw(cg->subsys[i])->cgroup;
@@ -5037,13 +5151,17 @@
while ((token = strsep(&str, ",")) != NULL) {
if (!*token)
continue;
- /*
- * cgroup_disable, being at boot time, can't know about module
- * subsystems, so we don't worry about them.
- */
- for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
+ for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
struct cgroup_subsys *ss = subsys[i];
+ /*
+ * cgroup_disable, being at boot time, can't
+ * know about module subsystems, so we don't
+ * worry about them.
+ */
+ if (!ss || ss->module)
+ continue;
+
if (!strcmp(token, ss->name)) {
ss->disabled = 1;
printk(KERN_INFO "Disabling %s control group"
diff --git a/mm/shmem.c b/mm/shmem.c
index d4e184e..d375211 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -77,13 +77,6 @@
/* Symlink up to this size is kmalloc'ed instead of using a swappable page */
#define SHORT_SYMLINK_LEN 128
-struct shmem_xattr {
- struct list_head list; /* anchored by shmem_inode_info->xattr_list */
- char *name; /* xattr name */
- size_t size;
- char value[0];
-};
-
/*
* shmem_fallocate and shmem_writepage communicate via inode->i_private
* (with i_mutex making sure that it has only one user at a time):
@@ -636,7 +629,6 @@
static void shmem_evict_inode(struct inode *inode)
{
struct shmem_inode_info *info = SHMEM_I(inode);
- struct shmem_xattr *xattr, *nxattr;
if (inode->i_mapping->a_ops == &shmem_aops) {
shmem_unacct_size(info->flags, inode->i_size);
@@ -650,10 +642,7 @@
} else
kfree(info->symlink);
- list_for_each_entry_safe(xattr, nxattr, &info->xattr_list, list) {
- kfree(xattr->name);
- kfree(xattr);
- }
+ simple_xattrs_free(&info->xattrs);
BUG_ON(inode->i_blocks);
shmem_free_inode(inode->i_sb);
clear_inode(inode);
@@ -1377,7 +1366,7 @@
spin_lock_init(&info->lock);
info->flags = flags & VM_NORESERVE;
INIT_LIST_HEAD(&info->swaplist);
- INIT_LIST_HEAD(&info->xattr_list);
+ simple_xattrs_init(&info->xattrs);
cache_no_acl(inode);
switch (mode & S_IFMT) {
@@ -2060,28 +2049,6 @@
*/
/*
- * Allocate new xattr and copy in the value; but leave the name to callers.
- */
-static struct shmem_xattr *shmem_xattr_alloc(const void *value, size_t size)
-{
- struct shmem_xattr *new_xattr;
- size_t len;
-
- /* wrap around? */
- len = sizeof(*new_xattr) + size;
- if (len <= sizeof(*new_xattr))
- return NULL;
-
- new_xattr = kmalloc(len, GFP_KERNEL);
- if (!new_xattr)
- return NULL;
-
- new_xattr->size = size;
- memcpy(new_xattr->value, value, size);
- return new_xattr;
-}
-
-/*
* Callback for security_inode_init_security() for acquiring xattrs.
*/
static int shmem_initxattrs(struct inode *inode,
@@ -2090,11 +2057,11 @@
{
struct shmem_inode_info *info = SHMEM_I(inode);
const struct xattr *xattr;
- struct shmem_xattr *new_xattr;
+ struct simple_xattr *new_xattr;
size_t len;
for (xattr = xattr_array; xattr->name != NULL; xattr++) {
- new_xattr = shmem_xattr_alloc(xattr->value, xattr->value_len);
+ new_xattr = simple_xattr_alloc(xattr->value, xattr->value_len);
if (!new_xattr)
return -ENOMEM;
@@ -2111,91 +2078,12 @@
memcpy(new_xattr->name + XATTR_SECURITY_PREFIX_LEN,
xattr->name, len);
- spin_lock(&info->lock);
- list_add(&new_xattr->list, &info->xattr_list);
- spin_unlock(&info->lock);
+ simple_xattr_list_add(&info->xattrs, new_xattr);
}
return 0;
}
-static int shmem_xattr_get(struct dentry *dentry, const char *name,
- void *buffer, size_t size)
-{
- struct shmem_inode_info *info;
- struct shmem_xattr *xattr;
- int ret = -ENODATA;
-
- info = SHMEM_I(dentry->d_inode);
-
- spin_lock(&info->lock);
- list_for_each_entry(xattr, &info->xattr_list, list) {
- if (strcmp(name, xattr->name))
- continue;
-
- ret = xattr->size;
- if (buffer) {
- if (size < xattr->size)
- ret = -ERANGE;
- else
- memcpy(buffer, xattr->value, xattr->size);
- }
- break;
- }
- spin_unlock(&info->lock);
- return ret;
-}
-
-static int shmem_xattr_set(struct inode *inode, const char *name,
- const void *value, size_t size, int flags)
-{
- struct shmem_inode_info *info = SHMEM_I(inode);
- struct shmem_xattr *xattr;
- struct shmem_xattr *new_xattr = NULL;
- int err = 0;
-
- /* value == NULL means remove */
- if (value) {
- new_xattr = shmem_xattr_alloc(value, size);
- if (!new_xattr)
- return -ENOMEM;
-
- new_xattr->name = kstrdup(name, GFP_KERNEL);
- if (!new_xattr->name) {
- kfree(new_xattr);
- return -ENOMEM;
- }
- }
-
- spin_lock(&info->lock);
- list_for_each_entry(xattr, &info->xattr_list, list) {
- if (!strcmp(name, xattr->name)) {
- if (flags & XATTR_CREATE) {
- xattr = new_xattr;
- err = -EEXIST;
- } else if (new_xattr) {
- list_replace(&xattr->list, &new_xattr->list);
- } else {
- list_del(&xattr->list);
- }
- goto out;
- }
- }
- if (flags & XATTR_REPLACE) {
- xattr = new_xattr;
- err = -ENODATA;
- } else {
- list_add(&new_xattr->list, &info->xattr_list);
- xattr = NULL;
- }
-out:
- spin_unlock(&info->lock);
- if (xattr)
- kfree(xattr->name);
- kfree(xattr);
- return err;
-}
-
static const struct xattr_handler *shmem_xattr_handlers[] = {
#ifdef CONFIG_TMPFS_POSIX_ACL
&generic_acl_access_handler,
@@ -2226,6 +2114,7 @@
static ssize_t shmem_getxattr(struct dentry *dentry, const char *name,
void *buffer, size_t size)
{
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
int err;
/*
@@ -2240,12 +2129,13 @@
if (err)
return err;
- return shmem_xattr_get(dentry, name, buffer, size);
+ return simple_xattr_get(&info->xattrs, name, buffer, size);
}
static int shmem_setxattr(struct dentry *dentry, const char *name,
const void *value, size_t size, int flags)
{
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
int err;
/*
@@ -2260,15 +2150,12 @@
if (err)
return err;
- if (size == 0)
- value = ""; /* empty EA, do not remove */
-
- return shmem_xattr_set(dentry->d_inode, name, value, size, flags);
-
+ return simple_xattr_set(&info->xattrs, name, value, size, flags);
}
static int shmem_removexattr(struct dentry *dentry, const char *name)
{
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
int err;
/*
@@ -2283,45 +2170,13 @@
if (err)
return err;
- return shmem_xattr_set(dentry->d_inode, name, NULL, 0, XATTR_REPLACE);
-}
-
-static bool xattr_is_trusted(const char *name)
-{
- return !strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN);
+ return simple_xattr_remove(&info->xattrs, name);
}
static ssize_t shmem_listxattr(struct dentry *dentry, char *buffer, size_t size)
{
- bool trusted = capable(CAP_SYS_ADMIN);
- struct shmem_xattr *xattr;
- struct shmem_inode_info *info;
- size_t used = 0;
-
- info = SHMEM_I(dentry->d_inode);
-
- spin_lock(&info->lock);
- list_for_each_entry(xattr, &info->xattr_list, list) {
- size_t len;
-
- /* skip "trusted." attributes for unprivileged callers */
- if (!trusted && xattr_is_trusted(xattr->name))
- continue;
-
- len = strlen(xattr->name) + 1;
- used += len;
- if (buffer) {
- if (size < used) {
- used = -ERANGE;
- break;
- }
- memcpy(buffer, xattr->name, len);
- buffer += len;
- }
- }
- spin_unlock(&info->lock);
-
- return used;
+ struct shmem_inode_info *info = SHMEM_I(dentry->d_inode);
+ return simple_xattr_list(&info->xattrs, buffer, size);
}
#endif /* CONFIG_TMPFS_XATTR */
diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
index c75e3f9..6bc460c 100644
--- a/net/core/netprio_cgroup.c
+++ b/net/core/netprio_cgroup.c
@@ -326,9 +326,7 @@
.create = cgrp_create,
.destroy = cgrp_destroy,
.attach = net_prio_attach,
-#ifdef CONFIG_NETPRIO_CGROUP
.subsys_id = net_prio_subsys_id,
-#endif
.base_cftypes = ss_files,
.module = THIS_MODULE
};
@@ -366,10 +364,6 @@
ret = cgroup_load_subsys(&net_prio_subsys);
if (ret)
goto out;
-#ifndef CONFIG_NETPRIO_CGROUP
- smp_wmb();
- net_prio_subsys_id = net_prio_subsys.subsys_id;
-#endif
register_netdevice_notifier(&netprio_device_notifier);
@@ -386,11 +380,6 @@
cgroup_unload_subsys(&net_prio_subsys);
-#ifndef CONFIG_NETPRIO_CGROUP
- net_prio_subsys_id = -1;
- synchronize_rcu();
-#endif
-
rtnl_lock();
for_each_netdev(&init_net, dev) {
old = rtnl_dereference(dev->priomap);
diff --git a/net/core/sock.c b/net/core/sock.c
index a6000fb..341fa1c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -326,17 +326,6 @@
}
EXPORT_SYMBOL(__sk_backlog_rcv);
-#if defined(CONFIG_CGROUPS)
-#if !defined(CONFIG_NET_CLS_CGROUP)
-int net_cls_subsys_id = -1;
-EXPORT_SYMBOL_GPL(net_cls_subsys_id);
-#endif
-#if !defined(CONFIG_NETPRIO_CGROUP)
-int net_prio_subsys_id = -1;
-EXPORT_SYMBOL_GPL(net_prio_subsys_id);
-#endif
-#endif
-
static int sock_set_timeout(long *timeo_p, char __user *optval, int optlen)
{
struct timeval tv;
@@ -1224,6 +1213,7 @@
}
#ifdef CONFIG_CGROUPS
+#if IS_ENABLED(CONFIG_NET_CLS_CGROUP)
void sock_update_classid(struct sock *sk)
{
u32 classid;
@@ -1235,7 +1225,9 @@
sk->sk_classid = classid;
}
EXPORT_SYMBOL(sock_update_classid);
+#endif
+#if IS_ENABLED(CONFIG_NETPRIO_CGROUP)
void sock_update_netprioidx(struct sock *sk, struct task_struct *task)
{
if (in_interrupt())
@@ -1245,6 +1237,7 @@
}
EXPORT_SYMBOL_GPL(sock_update_netprioidx);
#endif
+#endif
/**
* sk_alloc - All socket objects are allocated here
diff --git a/net/sched/cls_cgroup.c b/net/sched/cls_cgroup.c
index 7743ea8..67cf90d 100644
--- a/net/sched/cls_cgroup.c
+++ b/net/sched/cls_cgroup.c
@@ -77,9 +77,7 @@
.name = "net_cls",
.create = cgrp_create,
.destroy = cgrp_destroy,
-#ifdef CONFIG_NET_CLS_CGROUP
.subsys_id = net_cls_subsys_id,
-#endif
.base_cftypes = ss_files,
.module = THIS_MODULE,
};
@@ -283,12 +281,6 @@
if (ret)
goto out;
-#ifndef CONFIG_NET_CLS_CGROUP
- /* We can't use rcu_assign_pointer because this is an int. */
- smp_wmb();
- net_cls_subsys_id = net_cls_subsys.subsys_id;
-#endif
-
ret = register_tcf_proto_ops(&cls_cgroup_ops);
if (ret)
cgroup_unload_subsys(&net_cls_subsys);
@@ -301,11 +293,6 @@
{
unregister_tcf_proto_ops(&cls_cgroup_ops);
-#ifndef CONFIG_NET_CLS_CGROUP
- net_cls_subsys_id = -1;
- synchronize_rcu();
-#endif
-
cgroup_unload_subsys(&net_cls_subsys);
}