netlink: Convert netlink_lookup() to use RCU protected hash table
Heavy Netlink users such as Open vSwitch spend a considerable amount of
time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
RCU relieves the lock contention.
Makes use of the new resizable hash table to avoid locking on the
lookup.
The hash table will grow if entries exceeds 75% of table size up to a
total table size of 64K. It will automatically shrink if usage falls
below 30%.
Also splits nl_table_lock into a separate mutex to protect hash table
mutations and allow synchronize_rcu() to sleep while waiting for readers
during expansion and shrinking.
Before:
9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup
6.42% kpktgend_0 [pktgen] [k] mod_cur_headers
6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker
6.23% kpktgend_0 [kernel.kallsyms] [k] memset
4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup
4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy
3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract
2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2
After:
15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup
8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker
7.92% kpktgend_0 [pktgen] [k] mod_cur_headers
5.11% kpktgend_0 [kernel.kallsyms] [k] memset
4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract
4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock
3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2
[...]
0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Reviewed-by: Nikolay Aleksandrov <nikolay@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
diff --git a/net/netlink/diag.c b/net/netlink/diag.c
index 1af29624..7301850 100644
--- a/net/netlink/diag.c
+++ b/net/netlink/diag.c
@@ -4,6 +4,7 @@
#include <linux/netlink.h>
#include <linux/sock_diag.h>
#include <linux/netlink_diag.h>
+#include <linux/rhashtable.h>
#include "af_netlink.h"
@@ -101,16 +102,20 @@
int protocol, int s_num)
{
struct netlink_table *tbl = &nl_table[protocol];
- struct nl_portid_hash *hash = &tbl->hash;
+ struct rhashtable *ht = &tbl->hash;
+ const struct bucket_table *htbl = rht_dereference(ht->tbl, ht);
struct net *net = sock_net(skb->sk);
struct netlink_diag_req *req;
+ struct netlink_sock *nlsk;
struct sock *sk;
int ret = 0, num = 0, i;
req = nlmsg_data(cb->nlh);
- for (i = 0; i <= hash->mask; i++) {
- sk_for_each(sk, &hash->table[i]) {
+ for (i = 0; i < htbl->size; i++) {
+ rht_for_each_entry(nlsk, htbl->buckets[i], ht, node) {
+ sk = (struct sock *)nlsk;
+
if (!net_eq(sock_net(sk), net))
continue;
if (num < s_num) {