Merge 4.14.71 into android-4.14
Changes in 4.14.71
i2c: xiic: Make the start and the byte count write atomic
i2c: i801: fix DNV's SMBCTRL register offset
scsi: lpfc: Correct MDS diag and nvmet configuration
nbd: don't allow invalid blocksize settings
block: bfq: swap puts in bfqg_and_blkg_put
android: binder: fix the race mmap and alloc_new_buf_locked
MIPS: VDSO: Match data page cache colouring when D$ aliases
SMB3: Backup intent flag missing for directory opens with backupuid mounts
smb3: check for and properly advertise directory lease support
Btrfs: fix data corruption when deduplicating between different files
KVM: s390: vsie: copy wrapping keys to right place
KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr
ALSA: hda - Fix cancel_work_sync() stall from jackpoll work
cpu/hotplug: Adjust misplaced smb() in cpuhp_thread_fun()
cpu/hotplug: Prevent state corruption on error rollback
x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
x86/microcode: Update the new microcode revision unconditionally
switchtec: Fix Spectre v1 vulnerability
crypto: aes-generic - fix aes-generic regression on powerpc
tpm: separate cmd_ready/go_idle from runtime_pm
ARC: [plat-axs*]: Enable SWAP
misc: mic: SCIF Fix scif_get_new_port() error handling
ethtool: Remove trailing semicolon for static inline
i2c: aspeed: Add an explicit type casting for *get_clk_reg_val
Bluetooth: h5: Fix missing dependency on BT_HCIUART_SERDEV
gpio: tegra: Move driver registration to subsys_init level
powerpc/powernv: Fix concurrency issue with npu->mmio_atsd_usage
selftests/bpf: fix a typo in map in map test
media: davinci: vpif_display: Mix memory leak on probe error path
media: dw2102: Fix memleak on sequence of probes
net: phy: Fix the register offsets in Broadcom iProc mdio mux driver
blk-mq: fix updating tags depth
scsi: target: fix __transport_register_session locking
md/raid5: fix data corruption of replacements after originals dropped
timers: Clear timer_base::must_forward_clk with timer_base::lock held
media: camss: csid: Configure data type and decode format properly
gpu: ipu-v3: default to id 0 on missing OF alias
misc: ti-st: Fix memory leak in the error path of probe()
uio: potential double frees if __uio_register_device() fails
firmware: vpd: Fix section enabled flag on vpd_section_destroy
Drivers: hv: vmbus: Cleanup synic memory free path
tty: rocket: Fix possible buffer overwrite on register_PCI
f2fs: fix to active page in lru list for read path
f2fs: do not set free of current section
f2fs: fix defined but not used build warnings
perf tools: Allow overriding MAX_NR_CPUS at compile time
NFSv4.0 fix client reference leak in callback
perf c2c report: Fix crash for empty browser
perf evlist: Fix error out while applying initial delay and LBR
macintosh/via-pmu: Add missing mmio accessors
ath9k: report tx status on EOSP
ath9k_hw: fix channel maximum power level test
ath10k: prevent active scans on potential unusable channels
wlcore: Set rx_status boottime_ns field on rx
rpmsg: core: add support to power domains for devices
MIPS: Fix ISA virt/bus conversion for non-zero PHYS_OFFSET
ata: libahci: Allow reconfigure of DEVSLP register
ata: libahci: Correct setting of DEVSLP register
scsi: 3ware: fix return 0 on the error path of probe
tools/testing/nvdimm: kaddr and pfn can be NULL to ->direct_access()
ath10k: disable bundle mgmt tx completion event support
Bluetooth: hidp: Fix handling of strncpy for hid->name information
x86/mm: Remove in_nmi() warning from vmalloc_fault()
pinctrl: imx: off by one in imx_pinconf_group_dbg_show()
gpio: ml-ioh: Fix buffer underwrite on probe error path
pinctrl/amd: only handle irq if it is pending and unmasked
net: mvneta: fix mtu change on port without link
f2fs: try grabbing node page lock aggressively in sync scenario
pktcdvd: Fix possible Spectre-v1 for pkt_devs
f2fs: fix to skip GC if type in SSA and SIT is inconsistent
tpm_tis_spi: Pass the SPI IRQ down to the driver
tpm/tpm_i2c_infineon: switch to i2c_lock_bus(..., I2C_LOCK_SEGMENT)
f2fs: fix to do sanity check with reserved blkaddr of inline inode
MIPS: Octeon: add missing of_node_put()
MIPS: generic: fix missing of_node_put()
net: dcb: For wild-card lookups, use priority -1, not 0
dm cache: only allow a single io_mode cache feature to be requested
Input: atmel_mxt_ts - only use first T9 instance
media: s5p-mfc: Fix buffer look up in s5p_mfc_handle_frame_{new, copy_time} functions
partitions/aix: append null character to print data from disk
partitions/aix: fix usage of uninitialized lv_info and lvname structures
media: helene: fix xtal frequency setting at power on
f2fs: fix to wait on page writeback before updating page
f2fs: Fix uninitialized return in f2fs_ioc_shutdown()
iommu/ipmmu-vmsa: Fix allocation in atomic context
mfd: ti_am335x_tscadc: Fix struct clk memory leak
f2fs: fix to do sanity check with {sit,nat}_ver_bitmap_bytesize
NFSv4.1: Fix a potential layoutget/layoutrecall deadlock
MIPS: WARN_ON invalid DMA cache maintenance, not BUG_ON
RDMA/cma: Do not ignore net namespace for unbound cm_id
drm/i915: set DP Main Stream Attribute for color range on DDI platforms
inet: frags: change inet_frags_init_net() return value
inet: frags: add a pointer to struct netns_frags
inet: frags: refactor ipfrag_init()
inet: frags: Convert timers to use timer_setup()
inet: frags: refactor ipv6_frag_init()
inet: frags: refactor lowpan_net_frag_init()
ipv6: export ip6 fragments sysctl to unprivileged users
rhashtable: add schedule points
inet: frags: use rhashtables for reassembly units
inet: frags: remove some helpers
inet: frags: get rif of inet_frag_evicting()
inet: frags: remove inet_frag_maybe_warn_overflow()
inet: frags: break the 2GB limit for frags storage
inet: frags: do not clone skb in ip_expire()
ipv6: frags: rewrite ip6_expire_frag_queue()
rhashtable: reorganize struct rhashtable layout
inet: frags: reorganize struct netns_frags
inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
inet: frags: fix ip6frag_low_thresh boundary
ip: discard IPv4 datagrams with overlapping segments.
net: speed up skb_rbtree_purge()
net: modify skb_rbtree_purge to return the truesize of all purged skbs.
ipv6: defrag: drop non-last frags smaller than min mtu
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
net: add rb_to_skb() and other rb tree helpers
net: sk_buff rbnode reorg
ipv4: frags: precedence bug in ip_expire()
ip: add helpers to process in-order fragments faster.
ip: process in-order fragments efficiently
ip: frags: fix crash in ip_do_fragment()
mtd: ubi: wl: Fix error return code in ubi_wl_init()
tun: fix use after free for ptr_ring
tuntap: fix use after free during release
autofs: fix autofs_sbi() does not check super block type
mm: get rid of vmacache_flush_all() entirely
Linux 4.14.71
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
diff --git a/.gitignore b/.gitignore
index f6050b8..be92dfa 100644
--- a/.gitignore
+++ b/.gitignore
@@ -122,3 +122,6 @@
# Kdevelop4
*.kdev4
+
+# fetched Android config fragments
+kernel/configs/android-*.cfg
diff --git a/Documentation/ABI/testing/sysfs-class-dual-role-usb b/Documentation/ABI/testing/sysfs-class-dual-role-usb
new file mode 100644
index 0000000..a900fd7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-dual-role-usb
@@ -0,0 +1,71 @@
+What: /sys/class/dual_role_usb/.../
+Date: June 2015
+Contact: Badhri Jagan Sridharan<badhri@google.com>
+Description:
+ Provide a generic interface to monitor and change
+ the state of dual role usb ports. The name here
+ refers to the name mentioned in the
+ dual_role_phy_desc that is passed while registering
+ the dual_role_phy_intstance through
+ devm_dual_role_instance_register.
+
+What: /sys/class/dual_role_usb/.../supported_modes
+Date: June 2015
+Contact: Badhri Jagan Sridharan<badhri@google.com>
+Description:
+ This is a static node, once initialized this
+ is not expected to change during runtime. "dfp"
+ refers to "downstream facing port" i.e. port can
+ only act as host. "ufp" refers to "upstream
+ facing port" i.e. port can only act as device.
+ "dfp ufp" refers to "dual role port" i.e. the port
+ can either be a host port or a device port.
+
+What: /sys/class/dual_role_usb/.../mode
+Date: June 2015
+Contact: Badhri Jagan Sridharan<badhri@google.com>
+Description:
+ The mode node refers to the current mode in which the
+ port is operating. "dfp" for host ports. "ufp" for device
+ ports and "none" when cable is not connected.
+
+ On devices where the USB mode is software-controllable,
+ userspace can change the mode by writing "dfp" or "ufp".
+ On devices where the USB mode is fixed in hardware,
+ this attribute is read-only.
+
+What: /sys/class/dual_role_usb/.../power_role
+Date: June 2015
+Contact: Badhri Jagan Sridharan<badhri@google.com>
+Description:
+ The power_role node mentions whether the port
+ is "sink"ing or "source"ing power. "none" if
+ they are not connected.
+
+ On devices implementing USB Power Delivery,
+ userspace can control the power role by writing "sink" or
+ "source". On devices without USB-PD, this attribute is
+ read-only.
+
+What: /sys/class/dual_role_usb/.../data_role
+Date: June 2015
+Contact: Badhri Jagan Sridharan<badhri@google.com>
+Description:
+ The data_role node mentions whether the port
+ is acting as "host" or "device" for USB data connection.
+ "none" if there is no active data link.
+
+ On devices implementing USB Power Delivery, userspace
+ can control the data role by writing "host" or "device".
+ On devices without USB-PD, this attribute is read-only
+
+What: /sys/class/dual_role_usb/.../powers_vconn
+Date: June 2015
+Contact: Badhri Jagan Sridharan<badhri@google.com>
+Description:
+ The powers_vconn node mentions whether the port
+ is supplying power for VCONN pin.
+
+ On devices with software control of VCONN,
+ userspace can disable the power supply to VCONN by writing "n",
+ or enable the power supply by writing "y".
diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs
index 11b7f4e..372b88f 100644
--- a/Documentation/ABI/testing/sysfs-fs-f2fs
+++ b/Documentation/ABI/testing/sysfs-fs-f2fs
@@ -51,6 +51,18 @@
Controls the dirty page count condition for the in-place-update
policies.
+What: /sys/fs/f2fs/<disk>/min_hot_blocks
+Date: March 2017
+Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
+Description:
+ Controls the dirty page count condition for redefining hot data.
+
+What: /sys/fs/f2fs/<disk>/min_ssr_sections
+Date: October 2017
+Contact: "Chao Yu" <yuchao0@huawei.com>
+Description:
+ Controls the fee section threshold to trigger SSR allocation.
+
What: /sys/fs/f2fs/<disk>/max_small_discards
Date: November 2013
Contact: "Jaegeuk Kim" <jaegeuk.kim@samsung.com>
@@ -89,6 +101,7 @@
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Controls the trimming rate in batch mode.
+ <deprecated>
What: /sys/fs/f2fs/<disk>/cp_interval
Date: October 2015
@@ -102,6 +115,12 @@
Description:
Controls the idle timing.
+What: /sys/fs/f2fs/<disk>/iostat_enable
+Date: August 2017
+Contact: "Chao Yu" <yuchao0@huawei.com>
+Description:
+ Controls to enable/disable IO stat.
+
What: /sys/fs/f2fs/<disk>/ra_nid_pages
Date: October 2015
Contact: "Chao Yu" <chao2.yu@samsung.com>
@@ -122,6 +141,12 @@
Description:
Shows total written kbytes issued to disk.
+What: /sys/fs/f2fs/<disk>/feature
+Date: July 2017
+Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
+Description:
+ Shows all enabled features in current device.
+
What: /sys/fs/f2fs/<disk>/inject_rate
Date: May 2016
Contact: "Sheng Yong" <shengyong1@huawei.com>
@@ -138,7 +163,18 @@
Date: June 2017
Contact: "Chao Yu" <yuchao0@huawei.com>
Description:
- Controls current reserved blocks in system.
+ Controls target reserved blocks in system, the threshold
+ is soft, it could exceed current available user space.
+
+What: /sys/fs/f2fs/<disk>/current_reserved_blocks
+Date: October 2017
+Contact: "Yunlong Song" <yunlong.song@huawei.com>
+Contact: "Chao Yu" <yuchao0@huawei.com>
+Description:
+ Shows current reserved blocks in system, it may be temporarily
+ smaller than target_reserved_blocks, but will gradually
+ increase to target_reserved_blocks when more free blocks are
+ freed by user later.
What: /sys/fs/f2fs/<disk>/gc_urgent
Date: August 2017
@@ -151,3 +187,20 @@
Contact: "Jaegeuk Kim" <jaegeuk@kernel.org>
Description:
Controls sleep time of GC urgent mode
+
+What: /sys/fs/f2fs/<disk>/readdir_ra
+Date: November 2017
+Contact: "Sheng Yong" <shengyong1@huawei.com>
+Description:
+ Controls readahead inode block in readdir.
+
+What: /sys/fs/f2fs/<disk>/extension_list
+Date: Feburary 2018
+Contact: "Chao Yu" <yuchao0@huawei.com>
+Description:
+ Used to control configure extension list:
+ - Query: cat /sys/fs/f2fs/<disk>/extension_list
+ - Add: echo '[h/c]extension' > /sys/fs/f2fs/<disk>/extension_list
+ - Del: echo '[h/c]!extension' > /sys/fs/f2fs/<disk>/extension_list
+ - [h] means add/del hot file extension
+ - [c] means add/del cold file extension
diff --git a/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons b/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons
new file mode 100644
index 0000000..acb19b9
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-wakeup_reasons
@@ -0,0 +1,16 @@
+What: /sys/kernel/wakeup_reasons/last_resume_reason
+Date: February 2014
+Contact: Ruchi Kandoi <kandoiruchi@google.com>
+Description:
+ The /sys/kernel/wakeup_reasons/last_resume_reason is
+ used to report wakeup reasons after system exited suspend.
+
+What: /sys/kernel/wakeup_reasons/last_suspend_time
+Date: March 2015
+Contact: jinqian <jinqian@google.com>
+Description:
+ The /sys/kernel/wakeup_reasons/last_suspend_time is
+ used to report time spent in last suspend cycle. It contains
+ two numbers (in seconds) separated by space. First number is
+ the time spent in suspend and resume processes. Second number
+ is the time spent in sleep state.
\ No newline at end of file
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9841bad..00492110 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -837,6 +837,9 @@
dis_ucode_ldr [X86] Disable the microcode loader.
+ dm= [DM] Allows early creation of a device-mapper device.
+ See Documentation/device-mapper/boot.txt.
+
dma_debug=off If the kernel is compiled with DMA_API_DEBUG support,
this option disables the debugging code at boot.
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 257e657..875b2b5 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -218,6 +218,7 @@
same_pages the number of same element filled pages written to this disk.
No memory is allocated for such pages.
pages_compacted the number of pages freed during compaction
+ huge_pages the number of incompressible pages
9) Deactivate:
swapoff /dev/zram0
@@ -242,5 +243,29 @@
User should set up backing device via /sys/block/zramX/backing_dev
before disksize setting.
+= memory tracking
+
+With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
+zram block. It could be useful to catch cold or incompressible
+pages of the process with*pagemap.
+If you enable the feature, you could see block state via
+/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
+
+ 300 75.033841 .wh
+ 301 63.806904 s..
+ 302 63.806919 ..h
+
+First column is zram's block index.
+Second column is access time since the system was booted
+Third column is state of the block.
+(s: same page
+w: written page to backing store
+h: huge page)
+
+First line of above example says 300th block is accessed at 75.033841sec
+and the block's state is huge so it is written back to the backing
+storage. It's a debugging feature so anyone shouldn't rely on it to work
+properly.
+
Nitin Gupta
ngupta@vflare.org
diff --git a/Documentation/device-mapper/boot.txt b/Documentation/device-mapper/boot.txt
new file mode 100644
index 0000000..adcaad5
--- /dev/null
+++ b/Documentation/device-mapper/boot.txt
@@ -0,0 +1,42 @@
+Boot time creation of mapped devices
+===================================
+
+It is possible to configure a device mapper device to act as the root
+device for your system in two ways.
+
+The first is to build an initial ramdisk which boots to a minimal
+userspace which configures the device, then pivot_root(8) in to it.
+
+For simple device mapper configurations, it is possible to boot directly
+using the following kernel command line:
+
+dm="<name> <uuid> <ro>,table line 1,...,table line n"
+
+name = the name to associate with the device
+ after boot, udev, if used, will use that name to label
+ the device node.
+uuid = may be 'none' or the UUID desired for the device.
+ro = may be "ro" or "rw". If "ro", the device and device table will be
+ marked read-only.
+
+Each table line may be as normal when using the dmsetup tool except for
+two variations:
+1. Any use of commas will be interpreted as a newline
+2. Quotation marks cannot be escaped and cannot be used without
+ terminating the dm= argument.
+
+Unless renamed by udev, the device node created will be dm-0 as the
+first minor number for the device-mapper is used during early creation.
+
+Example
+=======
+
+- Booting to a linear array made up of user-mode linux block devices:
+
+ dm="lroot none 0, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" \
+ root=/dev/dm-0
+
+Will boot to a rw dm-linear target of 8192 sectors split across two
+block devices identified by their major:minor numbers. After boot, udev
+will rename this target to /dev/mapper/lroot (depending on the rules).
+No uuid was assigned.
diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt
index 89fd8f9..b3d2e4a 100644
--- a/Documentation/device-mapper/verity.txt
+++ b/Documentation/device-mapper/verity.txt
@@ -109,6 +109,17 @@
This is the offset, in <data_block_size> blocks, from the start of the
FEC device to the beginning of the encoding data.
+check_at_most_once
+ Verify data blocks only the first time they are read from the data device,
+ rather than every time. This reduces the overhead of dm-verity so that it
+ can be used on systems that are memory and/or CPU constrained. However, it
+ provides a reduced level of security because only offline tampering of the
+ data device's content will be detected, not online tampering.
+
+ Hash blocks are still verified each time they are read from the hash device,
+ since verification of hash blocks is less performance critical than data
+ blocks, and a hash block will not be verified any more after all the data
+ blocks it covers have been verified anyway.
Theory of operation
===================
diff --git a/Documentation/devicetree/bindings/misc/memory-state-time.txt b/Documentation/devicetree/bindings/misc/memory-state-time.txt
new file mode 100644
index 0000000..c99a506
--- /dev/null
+++ b/Documentation/devicetree/bindings/misc/memory-state-time.txt
@@ -0,0 +1,8 @@
+Memory bandwidth and frequency state tracking
+
+Required properties:
+- compatible : should be:
+ "memory-state-time"
+- freq-tbl: Should contain entries with each frequency in Hz.
+- bw-buckets: Should contain upper-bound limits for each bandwidth bucket in Mbps.
+ Must match the framework power_profile.xml for the device.
diff --git a/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt b/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
new file mode 100644
index 0000000..447cc32
--- /dev/null
+++ b/Documentation/devicetree/bindings/scheduler/sched-energy-costs.txt
@@ -0,0 +1,383 @@
+===========================================================
+Energy cost bindings for Energy Aware Scheduling
+===========================================================
+
+===========================================================
+1 - Introduction
+===========================================================
+
+This note specifies bindings required for energy-aware scheduling
+(EAS)[1]. Historically, the scheduler's primary objective has been
+performance. EAS aims to provide an alternative objective - energy
+efficiency. EAS relies on a simple platform energy cost model to
+guide scheduling decisions. The model only considers the CPU
+subsystem.
+
+This note is aligned with the definition of the layout of physical
+CPUs in the system as described in the ARM topology binding
+description [2]. The concept is applicable to any system so long as
+the cost model data is provided for those processing elements in
+that system's topology that EAS is required to service.
+
+Processing elements refer to hardware threads, CPUs and clusters of
+related CPUs in increasing order of hierarchy.
+
+EAS requires two key cost metrics - busy costs and idle costs. Busy
+costs comprise of a list of compute capacities for the processing
+element in question and the corresponding power consumption at that
+capacity. Idle costs comprise of a list of power consumption values
+for each idle state [C-state] that the processing element supports.
+For a detailed description of these metrics, their derivation and
+their use see [3].
+
+These cost metrics are required for processing elements in all
+scheduling domain levels that EAS is required to service.
+
+===========================================================
+2 - energy-costs node
+===========================================================
+
+Energy costs for the processing elements in scheduling domains that
+EAS is required to service are defined in the energy-costs node
+which acts as a container for the actual per processing element cost
+nodes. A single energy-costs node is required for a given system.
+
+- energy-costs node
+
+ Usage: Required
+
+ Description: The energy-costs node is a container node and
+ it's sub-nodes describe costs for each processing element at
+ all scheduling domain levels that EAS is required to
+ service.
+
+ Node name must be "energy-costs".
+
+ The energy-costs node's parent node must be the cpus node.
+
+ The energy-costs node's child nodes can be:
+
+ - one or more cost nodes.
+
+ Any other configuration is considered invalid.
+
+The energy-costs node can only contain a single type of child node
+whose bindings are described in paragraph 4.
+
+===========================================================
+3 - energy-costs node child nodes naming convention
+===========================================================
+
+energy-costs child nodes must follow a naming convention where the
+node name must be "thread-costN", "core-costN", "cluster-costN"
+depending on whether the costs in the node are for a thread, core or
+cluster. N (where N = {0, 1, ...}) is the node number and has no
+bearing to the OS' logical thread, core or cluster index.
+
+===========================================================
+4 - cost node bindings
+===========================================================
+
+Bindings for cost nodes are defined as follows:
+
+- system-cost node
+
+ Description: Optional. Must be declared within an energy-costs
+ node. A system should contain no more than one system-cost node.
+
+ Systems with no modelled system cost should not provide this
+ node.
+
+ The system-cost node name must be "system-costN" as
+ described in 3 above.
+
+ A system-cost node must be a leaf node with no children.
+
+ Properties for system-cost nodes are described in paragraph
+ 5 below.
+
+ Any other configuration is considered invalid.
+
+- cluster-cost node
+
+ Description: must be declared within an energy-costs node. A
+ system can contain multiple clusters and each cluster
+ serviced by EAS must have a corresponding cluster-costs
+ node.
+
+ The cluster-cost node name must be "cluster-costN" as
+ described in 3 above.
+
+ A cluster-cost node must be a leaf node with no children.
+
+ Properties for cluster-cost nodes are described in paragraph
+ 5 below.
+
+ Any other configuration is considered invalid.
+
+- core-cost node
+
+ Description: must be declared within an energy-costs node. A
+ system can contain multiple cores and each core serviced by
+ EAS must have a corresponding core-cost node.
+
+ The core-cost node name must be "core-costN" as described in
+ 3 above.
+
+ A core-cost node must be a leaf node with no children.
+
+ Properties for core-cost nodes are described in paragraph
+ 5 below.
+
+ Any other configuration is considered invalid.
+
+- thread-cost node
+
+ Description: must be declared within an energy-costs node. A
+ system can contain cores with multiple hardware threads and
+ each thread serviced by EAS must have a corresponding
+ thread-cost node.
+
+ The core-cost node name must be "core-costN" as described in
+ 3 above.
+
+ A core-cost node must be a leaf node with no children.
+
+ Properties for thread-cost nodes are described in paragraph
+ 5 below.
+
+ Any other configuration is considered invalid.
+
+===========================================================
+5 - Cost node properties
+==========================================================
+
+All cost node types must have only the following properties:
+
+- busy-cost-data
+
+ Usage: required
+ Value type: An array of 2-item tuples. Each item is of type
+ u32.
+ Definition: The first item in the tuple is the capacity
+ value as described in [3]. The second item in the tuple is
+ the energy cost value as described in [3].
+
+- idle-cost-data
+
+ Usage: required
+ Value type: An array of 1-item tuples. The item is of type
+ u32.
+ Definition: The item in the tuple is the energy cost value
+ as described in [3].
+
+===========================================================
+4 - Extensions to the cpu node
+===========================================================
+
+The cpu node is extended with a property that establishes the
+connection between the processing element represented by the cpu
+node and the cost-nodes associated with this processing element.
+
+The connection is expressed in line with the topological hierarchy
+that this processing element belongs to starting with the level in
+the hierarchy that this processing element itself belongs to through
+to the highest level that EAS is required to service. The
+connection cannot be sparse and must be contiguous from the
+processing element's level through to the highest desired level. The
+highest desired level must be the same for all processing elements.
+
+Example: Given that a cpu node may represent a thread that is a part
+of a core, this property may contain multiple elements which
+associate the thread with cost nodes describing the costs for the
+thread itself, the core the thread belongs to, the cluster the core
+belongs to and so on. The elements must be ordered from the lowest
+level nodes to the highest desired level that EAS must service. The
+highest desired level must be the same for all cpu nodes. The
+elements must not be sparse: there must be elements for the current
+thread, the next level of hierarchy (core) and so on without any
+'holes'.
+
+Example: Given that a cpu node may represent a core that is a part
+of a cluster of related cpus this property may contain multiple
+elements which associate the core with cost nodes describing the
+costs for the core itself, the cluster the core belongs to and so
+on. The elements must be ordered from the lowest level nodes to the
+highest desired level that EAS must service. The highest desired
+level must be the same for all cpu nodes. The elements must not be
+sparse: there must be elements for the current thread, the next
+level of hierarchy (core) and so on without any 'holes'.
+
+If the system comprises of hierarchical clusters of clusters, this
+property will contain multiple associations with the relevant number
+of cluster elements in hierarchical order.
+
+Property added to the cpu node:
+
+- sched-energy-costs
+
+ Usage: required
+ Value type: List of phandles
+ Definition: a list of phandles to specific cost nodes in the
+ energy-costs parent node that correspond to the processing
+ element represented by this cpu node in hierarchical order
+ of topology.
+
+ The order of phandles in the list is significant. The first
+ phandle is to the current processing element's own cost
+ node. Subsequent phandles are to higher hierarchical level
+ cost nodes up until the maximum level that EAS is to
+ service.
+
+ All cpu nodes must have the same highest level cost node.
+
+ The phandle list must not be sparsely populated with handles
+ to non-contiguous hierarchical levels. See commentary above
+ for clarity.
+
+ Any other configuration is invalid.
+
+- freq-energy-model
+ Description: Optional. Must be declared if the energy model
+ represents frequency/power values. If absent, energy model is
+ by default considered as capacity/power.
+
+===========================================================
+5 - Example dts
+===========================================================
+
+Example 1 (ARM 64-bit, 6-cpu system, two clusters of cpus, one
+cluster of 2 Cortex-A57 cpus, one cluster of 4 Cortex-A53 cpus):
+
+cpus {
+ #address-cells = <2>;
+ #size-cells = <0>;
+ .
+ .
+ .
+ A57_0: cpu@0 {
+ compatible = "arm,cortex-a57","arm,armv8";
+ reg = <0x0 0x0>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A57_L2>;
+ clocks = <&scpi_dvfs 0>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>;
+ };
+
+ A57_1: cpu@1 {
+ compatible = "arm,cortex-a57","arm,armv8";
+ reg = <0x0 0x1>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A57_L2>;
+ clocks = <&scpi_dvfs 0>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_0 &CLUSTER_COST_0>;
+ };
+
+ A53_0: cpu@100 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x100>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
+ };
+
+ A53_1: cpu@101 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x101>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
+ };
+
+ A53_2: cpu@102 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x102>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
+ };
+
+ A53_3: cpu@103 {
+ compatible = "arm,cortex-a53","arm,armv8";
+ reg = <0x0 0x103>;
+ device_type = "cpu";
+ enable-method = "psci";
+ next-level-cache = <&A53_L2>;
+ clocks = <&scpi_dvfs 1>;
+ cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_1 &CLUSTER_COST_1>;
+ };
+
+ energy-costs {
+ CPU_COST_0: core-cost0 {
+ busy-cost-data = <
+ 417 168
+ 579 251
+ 744 359
+ 883 479
+ 1024 616
+ >;
+ idle-cost-data = <
+ 15
+ 0
+ >;
+ };
+ CPU_COST_1: core-cost1 {
+ busy-cost-data = <
+ 235 33
+ 302 46
+ 368 61
+ 406 76
+ 447 93
+ >;
+ idle-cost-data = <
+ 6
+ 0
+ >;
+ };
+ CLUSTER_COST_0: cluster-cost0 {
+ busy-cost-data = <
+ 417 24
+ 579 32
+ 744 43
+ 883 49
+ 1024 64
+ >;
+ idle-cost-data = <
+ 65
+ 24
+ >;
+ };
+ CLUSTER_COST_1: cluster-cost1 {
+ busy-cost-data = <
+ 235 26
+ 303 30
+ 368 39
+ 406 47
+ 447 57
+ >;
+ idle-cost-data = <
+ 56
+ 17
+ >;
+ };
+ };
+};
+
+===============================================================================
+[1] https://lkml.org/lkml/2015/5/12/728
+[2] Documentation/devicetree/bindings/topology.txt
+[3] Documentation/scheduler/sched-energy.txt
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
index 13c2ff0..37d8698 100644
--- a/Documentation/filesystems/f2fs.txt
+++ b/Documentation/filesystems/f2fs.txt
@@ -174,6 +174,25 @@
offprjjquota Turn off project journelled quota.
quota Enable plain user disk quota accounting.
noquota Disable all plain disk quota option.
+whint_mode=%s Control which write hints are passed down to block
+ layer. This supports "off", "user-based", and
+ "fs-based". In "off" mode (default), f2fs does not pass
+ down hints. In "user-based" mode, f2fs tries to pass
+ down hints given by users. And in "fs-based" mode, f2fs
+ passes down hints with its policy.
+alloc_mode=%s Adjust block allocation policy, which supports "reuse"
+ and "default".
+fsync_mode=%s Control the policy of fsync. Currently supports "posix",
+ "strict", and "nobarrier". In "posix" mode, which is
+ default, fsync will follow POSIX semantics and does a
+ light operation to improve the filesystem performance.
+ In "strict" mode, fsync will be heavy and behaves in line
+ with xfs, ext4 and btrfs, where xfstest generic/342 will
+ pass, but the performance will regress. "nobarrier" is
+ based on "posix", but doesn't issue flush command for
+ non-atomic files likewise "nobarrier" mount option.
+test_dummy_encryption Enable dummy encryption, which provides a fake fscrypt
+ context. The fake fscrypt context is used by xfstests.
================================================================================
DEBUGFS ENTRIES
diff --git a/Documentation/filesystems/fscrypt.rst b/Documentation/filesystems/fscrypt.rst
new file mode 100644
index 0000000..48b424d
--- /dev/null
+++ b/Documentation/filesystems/fscrypt.rst
@@ -0,0 +1,626 @@
+=====================================
+Filesystem-level encryption (fscrypt)
+=====================================
+
+Introduction
+============
+
+fscrypt is a library which filesystems can hook into to support
+transparent encryption of files and directories.
+
+Note: "fscrypt" in this document refers to the kernel-level portion,
+implemented in ``fs/crypto/``, as opposed to the userspace tool
+`fscrypt <https://github.com/google/fscrypt>`_. This document only
+covers the kernel-level portion. For command-line examples of how to
+use encryption, see the documentation for the userspace tool `fscrypt
+<https://github.com/google/fscrypt>`_. Also, it is recommended to use
+the fscrypt userspace tool, or other existing userspace tools such as
+`fscryptctl <https://github.com/google/fscryptctl>`_ or `Android's key
+management system
+<https://source.android.com/security/encryption/file-based>`_, over
+using the kernel's API directly. Using existing tools reduces the
+chance of introducing your own security bugs. (Nevertheless, for
+completeness this documentation covers the kernel's API anyway.)
+
+Unlike dm-crypt, fscrypt operates at the filesystem level rather than
+at the block device level. This allows it to encrypt different files
+with different keys and to have unencrypted files on the same
+filesystem. This is useful for multi-user systems where each user's
+data-at-rest needs to be cryptographically isolated from the others.
+However, except for filenames, fscrypt does not encrypt filesystem
+metadata.
+
+Unlike eCryptfs, which is a stacked filesystem, fscrypt is integrated
+directly into supported filesystems --- currently ext4, F2FS, and
+UBIFS. This allows encrypted files to be read and written without
+caching both the decrypted and encrypted pages in the pagecache,
+thereby nearly halving the memory used and bringing it in line with
+unencrypted files. Similarly, half as many dentries and inodes are
+needed. eCryptfs also limits encrypted filenames to 143 bytes,
+causing application compatibility issues; fscrypt allows the full 255
+bytes (NAME_MAX). Finally, unlike eCryptfs, the fscrypt API can be
+used by unprivileged users, with no need to mount anything.
+
+fscrypt does not support encrypting files in-place. Instead, it
+supports marking an empty directory as encrypted. Then, after
+userspace provides the key, all regular files, directories, and
+symbolic links created in that directory tree are transparently
+encrypted.
+
+Threat model
+============
+
+Offline attacks
+---------------
+
+Provided that userspace chooses a strong encryption key, fscrypt
+protects the confidentiality of file contents and filenames in the
+event of a single point-in-time permanent offline compromise of the
+block device content. fscrypt does not protect the confidentiality of
+non-filename metadata, e.g. file sizes, file permissions, file
+timestamps, and extended attributes. Also, the existence and location
+of holes (unallocated blocks which logically contain all zeroes) in
+files is not protected.
+
+fscrypt is not guaranteed to protect confidentiality or authenticity
+if an attacker is able to manipulate the filesystem offline prior to
+an authorized user later accessing the filesystem.
+
+Online attacks
+--------------
+
+fscrypt (and storage encryption in general) can only provide limited
+protection, if any at all, against online attacks. In detail:
+
+fscrypt is only resistant to side-channel attacks, such as timing or
+electromagnetic attacks, to the extent that the underlying Linux
+Cryptographic API algorithms are. If a vulnerable algorithm is used,
+such as a table-based implementation of AES, it may be possible for an
+attacker to mount a side channel attack against the online system.
+Side channel attacks may also be mounted against applications
+consuming decrypted data.
+
+After an encryption key has been provided, fscrypt is not designed to
+hide the plaintext file contents or filenames from other users on the
+same system, regardless of the visibility of the keyring key.
+Instead, existing access control mechanisms such as file mode bits,
+POSIX ACLs, LSMs, or mount namespaces should be used for this purpose.
+Also note that as long as the encryption keys are *anywhere* in
+memory, an online attacker can necessarily compromise them by mounting
+a physical attack or by exploiting any kernel security vulnerability
+which provides an arbitrary memory read primitive.
+
+While it is ostensibly possible to "evict" keys from the system,
+recently accessed encrypted files will remain accessible at least
+until the filesystem is unmounted or the VFS caches are dropped, e.g.
+using ``echo 2 > /proc/sys/vm/drop_caches``. Even after that, if the
+RAM is compromised before being powered off, it will likely still be
+possible to recover portions of the plaintext file contents, if not
+some of the encryption keys as well. (Since Linux v4.12, all
+in-kernel keys related to fscrypt are sanitized before being freed.
+However, userspace would need to do its part as well.)
+
+Currently, fscrypt does not prevent a user from maliciously providing
+an incorrect key for another user's existing encrypted files. A
+protection against this is planned.
+
+Key hierarchy
+=============
+
+Master Keys
+-----------
+
+Each encrypted directory tree is protected by a *master key*. Master
+keys can be up to 64 bytes long, and must be at least as long as the
+greater of the key length needed by the contents and filenames
+encryption modes being used. For example, if AES-256-XTS is used for
+contents encryption, the master key must be 64 bytes (512 bits). Note
+that the XTS mode is defined to require a key twice as long as that
+required by the underlying block cipher.
+
+To "unlock" an encrypted directory tree, userspace must provide the
+appropriate master key. There can be any number of master keys, each
+of which protects any number of directory trees on any number of
+filesystems.
+
+Userspace should generate master keys either using a cryptographically
+secure random number generator, or by using a KDF (Key Derivation
+Function). Note that whenever a KDF is used to "stretch" a
+lower-entropy secret such as a passphrase, it is critical that a KDF
+designed for this purpose be used, such as scrypt, PBKDF2, or Argon2.
+
+Per-file keys
+-------------
+
+Master keys are not used to encrypt file contents or names directly.
+Instead, a unique key is derived for each encrypted file, including
+each regular file, directory, and symbolic link. This has several
+advantages:
+
+- In cryptosystems, the same key material should never be used for
+ different purposes. Using the master key as both an XTS key for
+ contents encryption and as a CTS-CBC key for filenames encryption
+ would violate this rule.
+- Per-file keys simplify the choice of IVs (Initialization Vectors)
+ for contents encryption. Without per-file keys, to ensure IV
+ uniqueness both the inode and logical block number would need to be
+ encoded in the IVs. This would make it impossible to renumber
+ inodes, which e.g. ``resize2fs`` can do when resizing an ext4
+ filesystem. With per-file keys, it is sufficient to encode just the
+ logical block number in the IVs.
+- Per-file keys strengthen the encryption of filenames, where IVs are
+ reused out of necessity. With a unique key per directory, IV reuse
+ is limited to within a single directory.
+- Per-file keys allow individual files to be securely erased simply by
+ securely erasing their keys. (Not yet implemented.)
+
+A KDF (Key Derivation Function) is used to derive per-file keys from
+the master key. This is done instead of wrapping a randomly-generated
+key for each file because it reduces the size of the encryption xattr,
+which for some filesystems makes the xattr more likely to fit in-line
+in the filesystem's inode table. With a KDF, only a 16-byte nonce is
+required --- long enough to make key reuse extremely unlikely. A
+wrapped key, on the other hand, would need to be up to 64 bytes ---
+the length of an AES-256-XTS key. Furthermore, currently there is no
+requirement to support unlocking a file with multiple alternative
+master keys or to support rotating master keys. Instead, the master
+keys may be wrapped in userspace, e.g. as done by the `fscrypt
+<https://github.com/google/fscrypt>`_ tool.
+
+The current KDF encrypts the master key using the 16-byte nonce as an
+AES-128-ECB key. The output is used as the derived key. If the
+output is longer than needed, then it is truncated to the needed
+length. Truncation is the norm for directories and symlinks, since
+those use the CTS-CBC encryption mode which requires a key half as
+long as that required by the XTS encryption mode.
+
+Note: this KDF meets the primary security requirement, which is to
+produce unique derived keys that preserve the entropy of the master
+key, assuming that the master key is already a good pseudorandom key.
+However, it is nonstandard and has some problems such as being
+reversible, so it is generally considered to be a mistake! It may be
+replaced with HKDF or another more standard KDF in the future.
+
+Encryption modes and usage
+==========================
+
+fscrypt allows one encryption mode to be specified for file contents
+and one encryption mode to be specified for filenames. Different
+directory trees are permitted to use different encryption modes.
+Currently, the following pairs of encryption modes are supported:
+
+- AES-256-XTS for contents and AES-256-CTS-CBC for filenames
+- AES-128-CBC for contents and AES-128-CTS-CBC for filenames
+- Speck128/256-XTS for contents and Speck128/256-CTS-CBC for filenames
+
+It is strongly recommended to use AES-256-XTS for contents encryption.
+AES-128-CBC was added only for low-powered embedded devices with
+crypto accelerators such as CAAM or CESA that do not support XTS.
+
+Similarly, Speck128/256 support was only added for older or low-end
+CPUs which cannot do AES fast enough -- especially ARM CPUs which have
+NEON instructions but not the Cryptography Extensions -- and for which
+it would not otherwise be feasible to use encryption at all. It is
+not recommended to use Speck on CPUs that have AES instructions.
+Speck support is only available if it has been enabled in the crypto
+API via CONFIG_CRYPTO_SPECK. Also, on ARM platforms, to get
+acceptable performance CONFIG_CRYPTO_SPECK_NEON must be enabled.
+
+New encryption modes can be added relatively easily, without changes
+to individual filesystems. However, authenticated encryption (AE)
+modes are not currently supported because of the difficulty of dealing
+with ciphertext expansion.
+
+For file contents, each filesystem block is encrypted independently.
+Currently, only the case where the filesystem block size is equal to
+the system's page size (usually 4096 bytes) is supported. With the
+XTS mode of operation (recommended), the logical block number within
+the file is used as the IV. With the CBC mode of operation (not
+recommended), ESSIV is used; specifically, the IV for CBC is the
+logical block number encrypted with AES-256, where the AES-256 key is
+the SHA-256 hash of the inode's data encryption key.
+
+For filenames, the full filename is encrypted at once. Because of the
+requirements to retain support for efficient directory lookups and
+filenames of up to 255 bytes, a constant initialization vector (IV) is
+used. However, each encrypted directory uses a unique key, which
+limits IV reuse to within a single directory. Note that IV reuse in
+the context of CTS-CBC encryption means that when the original
+filenames share a common prefix at least as long as the cipher block
+size (16 bytes for AES), the corresponding encrypted filenames will
+also share a common prefix. This is undesirable; it may be fixed in
+the future by switching to an encryption mode that is a strong
+pseudorandom permutation on arbitrary-length messages, e.g. the HEH
+(Hash-Encrypt-Hash) mode.
+
+Since filenames are encrypted with the CTS-CBC mode of operation, the
+plaintext and ciphertext filenames need not be multiples of the AES
+block size, i.e. 16 bytes. However, the minimum size that can be
+encrypted is 16 bytes, so shorter filenames are NUL-padded to 16 bytes
+before being encrypted. In addition, to reduce leakage of filename
+lengths via their ciphertexts, all filenames are NUL-padded to the
+next 4, 8, 16, or 32-byte boundary (configurable). 32 is recommended
+since this provides the best confidentiality, at the cost of making
+directory entries consume slightly more space. Note that since NUL
+(``\0``) is not otherwise a valid character in filenames, the padding
+will never produce duplicate plaintexts.
+
+Symbolic link targets are considered a type of filename and are
+encrypted in the same way as filenames in directory entries. Each
+symlink also uses a unique key; hence, the hardcoded IV is not a
+problem for symlinks.
+
+User API
+========
+
+Setting an encryption policy
+----------------------------
+
+The FS_IOC_SET_ENCRYPTION_POLICY ioctl sets an encryption policy on an
+empty directory or verifies that a directory or regular file already
+has the specified encryption policy. It takes in a pointer to a
+:c:type:`struct fscrypt_policy`, defined as follows::
+
+ #define FS_KEY_DESCRIPTOR_SIZE 8
+
+ struct fscrypt_policy {
+ __u8 version;
+ __u8 contents_encryption_mode;
+ __u8 filenames_encryption_mode;
+ __u8 flags;
+ __u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
+ };
+
+This structure must be initialized as follows:
+
+- ``version`` must be 0.
+
+- ``contents_encryption_mode`` and ``filenames_encryption_mode`` must
+ be set to constants from ``<linux/fs.h>`` which identify the
+ encryption modes to use. If unsure, use
+ FS_ENCRYPTION_MODE_AES_256_XTS (1) for ``contents_encryption_mode``
+ and FS_ENCRYPTION_MODE_AES_256_CTS (4) for
+ ``filenames_encryption_mode``.
+
+- ``flags`` must be set to a value from ``<linux/fs.h>`` which
+ identifies the amount of NUL-padding to use when encrypting
+ filenames. If unsure, use FS_POLICY_FLAGS_PAD_32 (0x3).
+
+- ``master_key_descriptor`` specifies how to find the master key in
+ the keyring; see `Adding keys`_. It is up to userspace to choose a
+ unique ``master_key_descriptor`` for each master key. The e4crypt
+ and fscrypt tools use the first 8 bytes of
+ ``SHA-512(SHA-512(master_key))``, but this particular scheme is not
+ required. Also, the master key need not be in the keyring yet when
+ FS_IOC_SET_ENCRYPTION_POLICY is executed. However, it must be added
+ before any files can be created in the encrypted directory.
+
+If the file is not yet encrypted, then FS_IOC_SET_ENCRYPTION_POLICY
+verifies that the file is an empty directory. If so, the specified
+encryption policy is assigned to the directory, turning it into an
+encrypted directory. After that, and after providing the
+corresponding master key as described in `Adding keys`_, all regular
+files, directories (recursively), and symlinks created in the
+directory will be encrypted, inheriting the same encryption policy.
+The filenames in the directory's entries will be encrypted as well.
+
+Alternatively, if the file is already encrypted, then
+FS_IOC_SET_ENCRYPTION_POLICY validates that the specified encryption
+policy exactly matches the actual one. If they match, then the ioctl
+returns 0. Otherwise, it fails with EEXIST. This works on both
+regular files and directories, including nonempty directories.
+
+Note that the ext4 filesystem does not allow the root directory to be
+encrypted, even if it is empty. Users who want to encrypt an entire
+filesystem with one key should consider using dm-crypt instead.
+
+FS_IOC_SET_ENCRYPTION_POLICY can fail with the following errors:
+
+- ``EACCES``: the file is not owned by the process's uid, nor does the
+ process have the CAP_FOWNER capability in a namespace with the file
+ owner's uid mapped
+- ``EEXIST``: the file is already encrypted with an encryption policy
+ different from the one specified
+- ``EINVAL``: an invalid encryption policy was specified (invalid
+ version, mode(s), or flags)
+- ``ENOTDIR``: the file is unencrypted and is a regular file, not a
+ directory
+- ``ENOTEMPTY``: the file is unencrypted and is a nonempty directory
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for this filesystem, or the filesystem superblock has not
+ had encryption enabled on it. (For example, to use encryption on an
+ ext4 filesystem, CONFIG_EXT4_ENCRYPTION must be enabled in the
+ kernel config, and the superblock must have had the "encrypt"
+ feature flag enabled using ``tune2fs -O encrypt`` or ``mkfs.ext4 -O
+ encrypt``.)
+- ``EPERM``: this directory may not be encrypted, e.g. because it is
+ the root directory of an ext4 filesystem
+- ``EROFS``: the filesystem is readonly
+
+Getting an encryption policy
+----------------------------
+
+The FS_IOC_GET_ENCRYPTION_POLICY ioctl retrieves the :c:type:`struct
+fscrypt_policy`, if any, for a directory or regular file. See above
+for the struct definition. No additional permissions are required
+beyond the ability to open the file.
+
+FS_IOC_GET_ENCRYPTION_POLICY can fail with the following errors:
+
+- ``EINVAL``: the file is encrypted, but it uses an unrecognized
+ encryption context format
+- ``ENODATA``: the file is not encrypted
+- ``ENOTTY``: this type of filesystem does not implement encryption
+- ``EOPNOTSUPP``: the kernel was not configured with encryption
+ support for this filesystem
+
+Note: if you only need to know whether a file is encrypted or not, on
+most filesystems it is also possible to use the FS_IOC_GETFLAGS ioctl
+and check for FS_ENCRYPT_FL, or to use the statx() system call and
+check for STATX_ATTR_ENCRYPTED in stx_attributes.
+
+Getting the per-filesystem salt
+-------------------------------
+
+Some filesystems, such as ext4 and F2FS, also support the deprecated
+ioctl FS_IOC_GET_ENCRYPTION_PWSALT. This ioctl retrieves a randomly
+generated 16-byte value stored in the filesystem superblock. This
+value is intended to used as a salt when deriving an encryption key
+from a passphrase or other low-entropy user credential.
+
+FS_IOC_GET_ENCRYPTION_PWSALT is deprecated. Instead, prefer to
+generate and manage any needed salt(s) in userspace.
+
+Adding keys
+-----------
+
+To provide a master key, userspace must add it to an appropriate
+keyring using the add_key() system call (see:
+``Documentation/security/keys/core.rst``). The key type must be
+"logon"; keys of this type are kept in kernel memory and cannot be
+read back by userspace. The key description must be "fscrypt:"
+followed by the 16-character lower case hex representation of the
+``master_key_descriptor`` that was set in the encryption policy. The
+key payload must conform to the following structure::
+
+ #define FS_MAX_KEY_SIZE 64
+
+ struct fscrypt_key {
+ u32 mode;
+ u8 raw[FS_MAX_KEY_SIZE];
+ u32 size;
+ };
+
+``mode`` is ignored; just set it to 0. The actual key is provided in
+``raw`` with ``size`` indicating its size in bytes. That is, the
+bytes ``raw[0..size-1]`` (inclusive) are the actual key.
+
+The key description prefix "fscrypt:" may alternatively be replaced
+with a filesystem-specific prefix such as "ext4:". However, the
+filesystem-specific prefixes are deprecated and should not be used in
+new programs.
+
+There are several different types of keyrings in which encryption keys
+may be placed, such as a session keyring, a user session keyring, or a
+user keyring. Each key must be placed in a keyring that is "attached"
+to all processes that might need to access files encrypted with it, in
+the sense that request_key() will find the key. Generally, if only
+processes belonging to a specific user need to access a given
+encrypted directory and no session keyring has been installed, then
+that directory's key should be placed in that user's user session
+keyring or user keyring. Otherwise, a session keyring should be
+installed if needed, and the key should be linked into that session
+keyring, or in a keyring linked into that session keyring.
+
+Note: introducing the complex visibility semantics of keyrings here
+was arguably a mistake --- especially given that by design, after any
+process successfully opens an encrypted file (thereby setting up the
+per-file key), possessing the keyring key is not actually required for
+any process to read/write the file until its in-memory inode is
+evicted. In the future there probably should be a way to provide keys
+directly to the filesystem instead, which would make the intended
+semantics clearer.
+
+Access semantics
+================
+
+With the key
+------------
+
+With the encryption key, encrypted regular files, directories, and
+symlinks behave very similarly to their unencrypted counterparts ---
+after all, the encryption is intended to be transparent. However,
+astute users may notice some differences in behavior:
+
+- Unencrypted files, or files encrypted with a different encryption
+ policy (i.e. different key, modes, or flags), cannot be renamed or
+ linked into an encrypted directory; see `Encryption policy
+ enforcement`_. Attempts to do so will fail with EPERM. However,
+ encrypted files can be renamed within an encrypted directory, or
+ into an unencrypted directory.
+
+- Direct I/O is not supported on encrypted files. Attempts to use
+ direct I/O on such files will fall back to buffered I/O.
+
+- The fallocate operations FALLOC_FL_COLLAPSE_RANGE,
+ FALLOC_FL_INSERT_RANGE, and FALLOC_FL_ZERO_RANGE are not supported
+ on encrypted files and will fail with EOPNOTSUPP.
+
+- Online defragmentation of encrypted files is not supported. The
+ EXT4_IOC_MOVE_EXT and F2FS_IOC_MOVE_RANGE ioctls will fail with
+ EOPNOTSUPP.
+
+- The ext4 filesystem does not support data journaling with encrypted
+ regular files. It will fall back to ordered data mode instead.
+
+- DAX (Direct Access) is not supported on encrypted files.
+
+- The st_size of an encrypted symlink will not necessarily give the
+ length of the symlink target as required by POSIX. It will actually
+ give the length of the ciphertext, which will be slightly longer
+ than the plaintext due to NUL-padding and an extra 2-byte overhead.
+
+- The maximum length of an encrypted symlink is 2 bytes shorter than
+ the maximum length of an unencrypted symlink. For example, on an
+ EXT4 filesystem with a 4K block size, unencrypted symlinks can be up
+ to 4095 bytes long, while encrypted symlinks can only be up to 4093
+ bytes long (both lengths excluding the terminating null).
+
+Note that mmap *is* supported. This is possible because the pagecache
+for an encrypted file contains the plaintext, not the ciphertext.
+
+Without the key
+---------------
+
+Some filesystem operations may be performed on encrypted regular
+files, directories, and symlinks even before their encryption key has
+been provided:
+
+- File metadata may be read, e.g. using stat().
+
+- Directories may be listed, in which case the filenames will be
+ listed in an encoded form derived from their ciphertext. The
+ current encoding algorithm is described in `Filename hashing and
+ encoding`_. The algorithm is subject to change, but it is
+ guaranteed that the presented filenames will be no longer than
+ NAME_MAX bytes, will not contain the ``/`` or ``\0`` characters, and
+ will uniquely identify directory entries.
+
+ The ``.`` and ``..`` directory entries are special. They are always
+ present and are not encrypted or encoded.
+
+- Files may be deleted. That is, nondirectory files may be deleted
+ with unlink() as usual, and empty directories may be deleted with
+ rmdir() as usual. Therefore, ``rm`` and ``rm -r`` will work as
+ expected.
+
+- Symlink targets may be read and followed, but they will be presented
+ in encrypted form, similar to filenames in directories. Hence, they
+ are unlikely to point to anywhere useful.
+
+Without the key, regular files cannot be opened or truncated.
+Attempts to do so will fail with ENOKEY. This implies that any
+regular file operations that require a file descriptor, such as
+read(), write(), mmap(), fallocate(), and ioctl(), are also forbidden.
+
+Also without the key, files of any type (including directories) cannot
+be created or linked into an encrypted directory, nor can a name in an
+encrypted directory be the source or target of a rename, nor can an
+O_TMPFILE temporary file be created in an encrypted directory. All
+such operations will fail with ENOKEY.
+
+It is not currently possible to backup and restore encrypted files
+without the encryption key. This would require special APIs which
+have not yet been implemented.
+
+Encryption policy enforcement
+=============================
+
+After an encryption policy has been set on a directory, all regular
+files, directories, and symbolic links created in that directory
+(recursively) will inherit that encryption policy. Special files ---
+that is, named pipes, device nodes, and UNIX domain sockets --- will
+not be encrypted.
+
+Except for those special files, it is forbidden to have unencrypted
+files, or files encrypted with a different encryption policy, in an
+encrypted directory tree. Attempts to link or rename such a file into
+an encrypted directory will fail with EPERM. This is also enforced
+during ->lookup() to provide limited protection against offline
+attacks that try to disable or downgrade encryption in known locations
+where applications may later write sensitive data. It is recommended
+that systems implementing a form of "verified boot" take advantage of
+this by validating all top-level encryption policies prior to access.
+
+Implementation details
+======================
+
+Encryption context
+------------------
+
+An encryption policy is represented on-disk by a :c:type:`struct
+fscrypt_context`. It is up to individual filesystems to decide where
+to store it, but normally it would be stored in a hidden extended
+attribute. It should *not* be exposed by the xattr-related system
+calls such as getxattr() and setxattr() because of the special
+semantics of the encryption xattr. (In particular, there would be
+much confusion if an encryption policy were to be added to or removed
+from anything other than an empty directory.) The struct is defined
+as follows::
+
+ #define FS_KEY_DESCRIPTOR_SIZE 8
+ #define FS_KEY_DERIVATION_NONCE_SIZE 16
+
+ struct fscrypt_context {
+ u8 format;
+ u8 contents_encryption_mode;
+ u8 filenames_encryption_mode;
+ u8 flags;
+ u8 master_key_descriptor[FS_KEY_DESCRIPTOR_SIZE];
+ u8 nonce[FS_KEY_DERIVATION_NONCE_SIZE];
+ };
+
+Note that :c:type:`struct fscrypt_context` contains the same
+information as :c:type:`struct fscrypt_policy` (see `Setting an
+encryption policy`_), except that :c:type:`struct fscrypt_context`
+also contains a nonce. The nonce is randomly generated by the kernel
+and is used to derive the inode's encryption key as described in
+`Per-file keys`_.
+
+Data path changes
+-----------------
+
+For the read path (->readpage()) of regular files, filesystems can
+read the ciphertext into the page cache and decrypt it in-place. The
+page lock must be held until decryption has finished, to prevent the
+page from becoming visible to userspace prematurely.
+
+For the write path (->writepage()) of regular files, filesystems
+cannot encrypt data in-place in the page cache, since the cached
+plaintext must be preserved. Instead, filesystems must encrypt into a
+temporary buffer or "bounce page", then write out the temporary
+buffer. Some filesystems, such as UBIFS, already use temporary
+buffers regardless of encryption. Other filesystems, such as ext4 and
+F2FS, have to allocate bounce pages specially for encryption.
+
+Filename hashing and encoding
+-----------------------------
+
+Modern filesystems accelerate directory lookups by using indexed
+directories. An indexed directory is organized as a tree keyed by
+filename hashes. When a ->lookup() is requested, the filesystem
+normally hashes the filename being looked up so that it can quickly
+find the corresponding directory entry, if any.
+
+With encryption, lookups must be supported and efficient both with and
+without the encryption key. Clearly, it would not work to hash the
+plaintext filenames, since the plaintext filenames are unavailable
+without the key. (Hashing the plaintext filenames would also make it
+impossible for the filesystem's fsck tool to optimize encrypted
+directories.) Instead, filesystems hash the ciphertext filenames,
+i.e. the bytes actually stored on-disk in the directory entries. When
+asked to do a ->lookup() with the key, the filesystem just encrypts
+the user-supplied name to get the ciphertext.
+
+Lookups without the key are more complicated. The raw ciphertext may
+contain the ``\0`` and ``/`` characters, which are illegal in
+filenames. Therefore, readdir() must base64-encode the ciphertext for
+presentation. For most filenames, this works fine; on ->lookup(), the
+filesystem just base64-decodes the user-supplied name to get back to
+the raw ciphertext.
+
+However, for very long filenames, base64 encoding would cause the
+filename length to exceed NAME_MAX. To prevent this, readdir()
+actually presents long filenames in an abbreviated form which encodes
+a strong "hash" of the ciphertext filename, along with the optional
+filesystem-specific hash(es) needed for directory lookups. This
+allows the filesystem to still, with a high degree of confidence, map
+the filename given in ->lookup() back to a particular directory entry
+that was previously listed by readdir(). See :c:type:`struct
+fscrypt_digested_name` in the source for more details.
+
+Note that the precise way that filenames are presented to userspace
+without the key is subject to change in the future. It is only meant
+as a way to temporarily present valid filenames so that commands like
+``rm -r`` work as expected on encrypted directories.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index adba21b..99ca8e3 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -396,6 +396,8 @@
[stack] = the stack of the main process
[vdso] = the "virtual dynamic shared object",
the kernel system call handler
+ [anon:<name>] = an anonymous mapping that has been
+ named by userspace
or if empty, the mapping is anonymous.
@@ -424,6 +426,7 @@
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd ex mr mw me dw
+Name: name from userspace
the first of these lines shows the same information as is displayed for the
mapping in /proc/PID/maps. The remaining lines show the size of the mapping
@@ -496,6 +499,9 @@
be present in all further kernel releases. Things get changed, the flags may
be vanished or the reverse -- new added.
+The "Name" field will only be present on a mapping that has been named by
+userspace, and will show the name passed in by userspace.
+
This file is only present if the CONFIG_MMU kernel configuration option is
enabled.
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index a054b5a..6a98ad3 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -605,6 +605,16 @@
initial value when the blackhole issue goes away.
By default, it is set to 1hr.
+tcp_fwmark_accept - BOOLEAN
+ If set, incoming connections to listening sockets that do not have a
+ socket mark will set the mark of the accepting socket to the fwmark of
+ the incoming SYN packet. This will cause all packets on that connection
+ (starting from the first SYNACK) to be sent with that fwmark. The
+ listening socket's mark is unchanged. Listening sockets that already
+ have a fwmark set via setsockopt(SOL_SOCKET, SO_MARK, ...) are
+ unaffected.
+ Default: 0
+
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 127. Default value
diff --git a/Documentation/scheduler/sched-energy.txt b/Documentation/scheduler/sched-energy.txt
new file mode 100644
index 0000000..dab2f90
--- /dev/null
+++ b/Documentation/scheduler/sched-energy.txt
@@ -0,0 +1,362 @@
+Energy cost model for energy-aware scheduling (EXPERIMENTAL)
+
+Introduction
+=============
+
+The basic energy model uses platform energy data stored in sched_group_energy
+data structures attached to the sched_groups in the sched_domain hierarchy. The
+energy cost model offers two functions that can be used to guide scheduling
+decisions:
+
+1. static unsigned int sched_group_energy(struct energy_env *eenv)
+2. static int energy_diff(struct energy_env *eenv)
+
+sched_group_energy() estimates the energy consumed by all cpus in a specific
+sched_group including any shared resources owned exclusively by this group of
+cpus. Resources shared with other cpus are excluded (e.g. later level caches).
+
+energy_diff() estimates the total energy impact of a utilization change. That
+is, adding, removing, or migrating utilization (tasks).
+
+Both functions use a struct energy_env to specify the scenario to be evaluated:
+
+ struct energy_env {
+ struct sched_group *sg_top;
+ struct sched_group *sg_cap;
+ int cap_idx;
+ int util_delta;
+ int src_cpu;
+ int dst_cpu;
+ int energy;
+ };
+
+sg_top: sched_group to be evaluated. Not used by energy_diff().
+
+sg_cap: sched_group covering the cpus in the same frequency domain. Set by
+sched_group_energy().
+
+cap_idx: Capacity state to be used for energy calculations. Set by
+find_new_capacity().
+
+util_delta: Amount of utilization to be added, removed, or migrated.
+
+src_cpu: Source cpu from where 'util_delta' utilization is removed. Should be
+-1 if no source (e.g. task wake-up).
+
+dst_cpu: Destination cpu where 'util_delta' utilization is added. Should be -1
+if utilization is removed (e.g. terminating tasks).
+
+energy: Result of sched_group_energy().
+
+The metric used to represent utilization is the actual per-entity running time
+averaged over time using a geometric series. Very similar to the existing
+per-entity load-tracking, but _not_ scaled by task priority and capped by the
+capacity of the cpu. The latter property does mean that utilization may
+underestimate the compute requirements for task on fully/over utilized cpus.
+The greatest potential for energy savings without affecting performance too much
+is scenarios where the system isn't fully utilized. If the system is deemed
+fully utilized load-balancing should be done with task load (includes task
+priority) instead in the interest of fairness and performance.
+
+
+Background and Terminology
+===========================
+
+To make it clear from the start:
+
+energy = [joule] (resource like a battery on powered devices)
+power = energy/time = [joule/second] = [watt]
+
+The goal of energy-aware scheduling is to minimize energy, while still getting
+the job done. That is, we want to maximize:
+
+ performance [inst/s]
+ --------------------
+ power [W]
+
+which is equivalent to minimizing:
+
+ energy [J]
+ -----------
+ instruction
+
+while still getting 'good' performance. It is essentially an alternative
+optimization objective to the current performance-only objective for the
+scheduler. This alternative considers two objectives: energy-efficiency and
+performance. Hence, there needs to be a user controllable knob to switch the
+objective. Since it is early days, this is currently a sched_feature
+(ENERGY_AWARE).
+
+The idea behind introducing an energy cost model is to allow the scheduler to
+evaluate the implications of its decisions rather than applying energy-saving
+techniques blindly that may only have positive effects on some platforms. At
+the same time, the energy cost model must be as simple as possible to minimize
+the scheduler latency impact.
+
+Platform topology
+------------------
+
+The system topology (cpus, caches, and NUMA information, not peripherals) is
+represented in the scheduler by the sched_domain hierarchy which has
+sched_groups attached at each level that covers one or more cpus (see
+sched-domains.txt for more details). To add energy awareness to the scheduler
+we need to consider power and frequency domains.
+
+Power domain:
+
+A power domain is a part of the system that can be powered on/off
+independently. Power domains are typically organized in a hierarchy where you
+may be able to power down just a cpu or a group of cpus along with any
+associated resources (e.g. shared caches). Powering up a cpu means that all
+power domains it is a part of in the hierarchy must be powered up. Hence, it is
+more expensive to power up the first cpu that belongs to a higher level power
+domain than powering up additional cpus in the same high level domain. Two
+level power domain hierarchy example:
+
+ Power source
+ +-------------------------------+----...
+per group PD G G
+ | +----------+ |
+ +--------+-------| Shared | (other groups)
+per-cpu PD G G | resource |
+ | | +----------+
+ +-------+ +-------+
+ | CPU 0 | | CPU 1 |
+ +-------+ +-------+
+
+Frequency domain:
+
+Frequency domains (P-states) typically cover the same group of cpus as one of
+the power domain levels. That is, there might be several smaller power domains
+sharing the same frequency (P-state) or there might be a power domain spanning
+multiple frequency domains.
+
+From a scheduling point of view there is no need to know the actual frequencies
+[Hz]. All the scheduler cares about is the compute capacity available at the
+current state (P-state) the cpu is in and any other available states. For that
+reason, and to also factor in any cpu micro-architecture differences, compute
+capacity scaling states are called 'capacity states' in this document. For SMP
+systems this is equivalent to P-states. For mixed micro-architecture systems
+(like ARM big.LITTLE) it is P-states scaled according to the micro-architecture
+performance relative to the other cpus in the system.
+
+Energy modelling:
+------------------
+
+Due to the hierarchical nature of the power domains, the most obvious way to
+model energy costs is therefore to associate power and energy costs with
+domains (groups of cpus). Energy costs of shared resources are associated with
+the group of cpus that share the resources, only the cost of powering the
+cpu itself and any private resources (e.g. private L1 caches) is associated
+with the per-cpu groups (lowest level).
+
+For example, for an SMP system with per-cpu power domains and a cluster level
+(group of cpus) power domain we get the overall energy costs to be:
+
+ energy = energy_cluster + n * energy_cpu
+
+where 'n' is the number of cpus powered up and energy_cluster is the cost paid
+as soon as any cpu in the cluster is powered up.
+
+The power and frequency domains can naturally be mapped onto the existing
+sched_domain hierarchy and sched_groups by adding the necessary data to the
+existing data structures.
+
+The energy model considers energy consumption from two contributors (shown in
+the illustration below):
+
+1. Busy energy: Energy consumed while a cpu and the higher level groups that it
+belongs to are busy running tasks. Busy energy is associated with the state of
+the cpu, not an event. The time the cpu spends in this state varies. Thus, the
+most obvious platform parameter for this contribution is busy power
+(energy/time).
+
+2. Idle energy: Energy consumed while a cpu and higher level groups that it
+belongs to are idle (in a C-state). Like busy energy, idle energy is associated
+with the state of the cpu. Thus, the platform parameter for this contribution
+is idle power (energy/time).
+
+Energy consumed during transitions from an idle-state (C-state) to a busy state
+(P-state) or going the other way is ignored by the model to simplify the energy
+model calculations.
+
+
+ Power
+ ^
+ | busy->idle idle->busy
+ | transition transition
+ |
+ | _ __
+ | / \ / \__________________
+ |______________/ \ /
+ | \ /
+ | Busy \ Idle / Busy
+ | low P-state \____________/ high P-state
+ |
+ +------------------------------------------------------------> time
+
+Busy |--------------| |-----------------|
+
+Wakeup |------| |------|
+
+Idle |------------|
+
+
+The basic algorithm
+====================
+
+The basic idea is to determine the total energy impact when utilization is
+added or removed by estimating the impact at each level in the sched_domain
+hierarchy starting from the bottom (sched_group contains just a single cpu).
+The energy cost comes from busy time (sched_group is awake because one or more
+cpus are busy) and idle time (in an idle-state). Energy model numbers account
+for energy costs associated with all cpus in the sched_group as a group.
+
+ for_each_domain(cpu, sd) {
+ sg = sched_group_of(cpu)
+ energy_before = curr_util(sg) * busy_power(sg)
+ + (1-curr_util(sg)) * idle_power(sg)
+ energy_after = new_util(sg) * busy_power(sg)
+ + (1-new_util(sg)) * idle_power(sg)
+ energy_diff += energy_before - energy_after
+
+ }
+
+ return energy_diff
+
+{curr, new}_util: The cpu utilization at the lowest level and the overall
+non-idle time for the entire group for higher levels. Utilization is in the
+range 0.0 to 1.0 in the pseudo-code.
+
+busy_power: The power consumption of the sched_group.
+
+idle_power: The power consumption of the sched_group when idle.
+
+Note: It is a fundamental assumption that the utilization is (roughly) scale
+invariant. Task utilization tracking factors in any frequency scaling and
+performance scaling differences due to difference cpu microarchitectures such
+that task utilization can be used across the entire system.
+
+
+Platform energy data
+=====================
+
+struct sched_group_energy can be attached to sched_groups in the sched_domain
+hierarchy and has the following members:
+
+cap_states:
+ List of struct capacity_state representing the supported capacity states
+ (P-states). struct capacity_state has two members: cap and power, which
+ represents the compute capacity and the busy_power of the state. The
+ list must be ordered by capacity low->high.
+
+nr_cap_states:
+ Number of capacity states in cap_states list.
+
+idle_states:
+ List of struct idle_state containing idle_state power cost for each
+ idle-state supported by the system orderd by shallowest state first.
+ All states must be included at all level in the hierarchy, i.e. a
+ sched_group spanning just a single cpu must also include coupled
+ idle-states (cluster states). In addition to the cpuidle idle-states,
+ the list must also contain an entry for the idling using the arch
+ default idle (arch_idle_cpu()). Despite this state may not be a true
+ hardware idle-state it is considered the shallowest idle-state in the
+ energy model and must be the first entry. cpus may enter this state
+ (possibly 'active idling') if cpuidle decides not enter a cpuidle
+ idle-state. Default idle may not be used when cpuidle is enabled.
+ In this case, it should just be a copy of the first cpuidle idle-state.
+
+nr_idle_states:
+ Number of idle states in idle_states list.
+
+There are no unit requirements for the energy cost data. Data can be normalized
+with any reference, however, the normalization must be consistent across all
+energy cost data. That is, one bogo-joule/watt must be the same quantity for
+data, but we don't care what it is.
+
+A recipe for platform characterization
+=======================================
+
+Obtaining the actual model data for a particular platform requires some way of
+measuring power/energy. There isn't a tool to help with this (yet). This
+section provides a recipe for use as reference. It covers the steps used to
+characterize the ARM TC2 development platform. This sort of measurements is
+expected to be done anyway when tuning cpuidle and cpufreq for a given
+platform.
+
+The energy model needs two types of data (struct sched_group_energy holds
+these) for each sched_group where energy costs should be taken into account:
+
+1. Capacity state information
+
+A list containing the compute capacity and power consumption when fully
+utilized attributed to the group as a whole for each available capacity state.
+At the lowest level (group contains just a single cpu) this is the power of the
+cpu alone without including power consumed by resources shared with other cpus.
+It basically needs to fit the basic modelling approach described in "Background
+and Terminology" section:
+
+ energy_system = energy_shared + n * energy_cpu
+
+for a system containing 'n' busy cpus. Only 'energy_cpu' should be included at
+the lowest level. 'energy_shared' is included at the next level which
+represents the group of cpus among which the resources are shared.
+
+This model is, of course, a simplification of reality. Thus, power/energy
+attributions might not always exactly represent how the hardware is designed.
+Also, busy power is likely to depend on the workload. It is therefore
+recommended to use a representative mix of workloads when characterizing the
+capacity states.
+
+If the group has no capacity scaling support, the list will contain a single
+state where power is the busy power attributed to the group. The capacity
+should be set to a default value (1024).
+
+When frequency domains include multiple power domains, the group representing
+the frequency domain and all child groups share capacity states. This must be
+indicated by setting the SD_SHARE_CAP_STATES sched_domain flag. All groups at
+all levels that share the capacity state must have the list of capacity states
+with the power set to the contribution of the individual group.
+
+2. Idle power information
+
+Stored in the idle_states list. The power number is the group idle power
+consumption in each idle state as well when the group is idle but has not
+entered an idle-state ('active idle' as mentioned earlier). Due to the way the
+energy model is defined, the idle power of the deepest group idle state can
+alternatively be accounted for in the parent group busy power. In that case the
+group idle state power values are offset such that the idle power of the
+deepest state is zero. It is less intuitive, but it is easier to measure as
+idle power consumed by the group and the busy/idle power of the parent group
+cannot be distinguished without per group measurement points.
+
+Measuring capacity states and idle power:
+
+The capacity states' capacity and power can be estimated by running a benchmark
+workload at each available capacity state. By restricting the benchmark to run
+on subsets of cpus it is possible to extrapolate the power consumption of
+shared resources.
+
+ARM TC2 has two clusters of two and three cpus respectively. Each cluster has a
+shared L2 cache. TC2 has on-chip energy counters per cluster. Running a
+benchmark workload on just one cpu in a cluster means that power is consumed in
+the cluster (higher level group) and a single cpu (lowest level group). Adding
+another benchmark task to another cpu increases the power consumption by the
+amount consumed by the additional cpu. Hence, it is possible to extrapolate the
+cluster busy power.
+
+For platforms that don't have energy counters or equivalent instrumentation
+built-in, it may be possible to use an external DAQ to acquire similar data.
+
+If the benchmark includes some performance score (for example sysbench cpu
+benchmark), this can be used to record the compute capacity.
+
+Measuring idle power requires insight into the idle state implementation on the
+particular platform. Specifically, if the platform has coupled idle-states (or
+package states). To measure non-coupled per-cpu idle-states it is necessary to
+keep one cpu busy to keep any shared resources alive to isolate the idle power
+of the cpu from idle/busy power of the shared resources. The cpu can be tricked
+into different per-cpu idle states by disabling the other states. Based on
+various combinations of measurements with specific cpus busy and disabling
+idle-states it is possible to extrapolate the idle-state power.
diff --git a/Documentation/scheduler/sched-pelt.c b/Documentation/scheduler/sched-pelt.c
index e421913..cd3e1fe 100644
--- a/Documentation/scheduler/sched-pelt.c
+++ b/Documentation/scheduler/sched-pelt.c
@@ -10,21 +10,21 @@
#include <math.h>
#include <stdio.h>
-#define HALFLIFE 32
+#define HALFLIFE { 32, 16, 8 }
#define SHIFT 32
double y;
-void calc_runnable_avg_yN_inv(void)
+void calc_runnable_avg_yN_inv(const int halflife)
{
int i;
unsigned int x;
printf("static const u32 runnable_avg_yN_inv[] = {");
- for (i = 0; i < HALFLIFE; i++) {
+ for (i = 0; i < halflife; i++) {
x = ((1UL<<32)-1)*pow(y, i);
- if (i % 6 == 0) printf("\n\t");
+ if (i % 4 == 0) printf("\n\t");
printf("0x%8x, ", x);
}
printf("\n};\n\n");
@@ -32,12 +32,12 @@
int sum = 1024;
-void calc_runnable_avg_yN_sum(void)
+void calc_runnable_avg_yN_sum(const int halflife)
{
int i;
printf("static const u32 runnable_avg_yN_sum[] = {\n\t 0,");
- for (i = 1; i <= HALFLIFE; i++) {
+ for (i = 1; i <= halflife; i++) {
if (i == 1)
sum *= y;
else
@@ -55,7 +55,7 @@
/* first period */
long max = 1024;
-void calc_converged_max(void)
+void calc_converged_max(const int halflife)
{
long last = 0, y_inv = ((1UL<<32)-1)*y;
@@ -73,17 +73,17 @@
last = max;
}
n--;
- printf("#define LOAD_AVG_PERIOD %d\n", HALFLIFE);
+ printf("#define LOAD_AVG_PERIOD %d\n", halflife);
printf("#define LOAD_AVG_MAX %ld\n", max);
-// printf("#define LOAD_AVG_MAX_N %d\n\n", n);
+ printf("#define LOAD_AVG_MAX_N %d\n\n", n);
}
-void calc_accumulated_sum_32(void)
+void calc_accumulated_sum_32(const int halflife)
{
int i, x = sum;
printf("static const u32 __accumulated_sum_N32[] = {\n\t 0,");
- for (i = 1; i <= n/HALFLIFE+1; i++) {
+ for (i = 1; i <= n/halflife+1; i++) {
if (i > 1)
x = x/2 + sum;
@@ -97,12 +97,22 @@
void main(void)
{
+ int hl_value[] = HALFLIFE;
+ int hl_count = sizeof(hl_value) / sizeof(int);
+ int hl_idx, halflife;
+
printf("/* Generated by Documentation/scheduler/sched-pelt; do not modify. */\n\n");
- y = pow(0.5, 1/(double)HALFLIFE);
+ for (hl_idx = 0; hl_idx < hl_count; ++hl_idx) {
+ halflife = hl_value[hl_idx];
- calc_runnable_avg_yN_inv();
-// calc_runnable_avg_yN_sum();
- calc_converged_max();
-// calc_accumulated_sum_32();
+ y = pow(0.5, 1/(double)halflife);
+
+ printf("#if CONFIG_PELT_UTIL_HALFLIFE_%d\n", halflife);
+ calc_runnable_avg_yN_inv(halflife);
+ calc_runnable_avg_yN_sum(halflife);
+ calc_converged_max(halflife);
+ calc_accumulated_sum_32(halflife);
+ printf("#endif\n\n");
+ }
}
diff --git a/Documentation/scheduler/sched-tune.txt b/Documentation/scheduler/sched-tune.txt
new file mode 100644
index 0000000..5df0ea3
--- /dev/null
+++ b/Documentation/scheduler/sched-tune.txt
@@ -0,0 +1,413 @@
+ Central, scheduler-driven, power-performance control
+ (EXPERIMENTAL)
+
+Abstract
+========
+
+The topic of a single simple power-performance tunable, that is wholly
+scheduler centric, and has well defined and predictable properties has come up
+on several occasions in the past [1,2]. With techniques such as a scheduler
+driven DVFS [3], we now have a good framework for implementing such a tunable.
+This document describes the overall ideas behind its design and implementation.
+
+
+Table of Contents
+=================
+
+1. Motivation
+2. Introduction
+3. Signal Boosting Strategy
+4. OPP selection using boosted CPU utilization
+5. Per task group boosting
+6. Per-task wakeup-placement-strategy Selection
+7. Question and Answers
+ - What about "auto" mode?
+ - What about boosting on a congested system?
+ - How CPUs are boosted when we have tasks with multiple boost values?
+8. References
+
+
+1. Motivation
+=============
+
+Sched-DVFS [3] was a new event-driven cpufreq governor which allows the
+scheduler to select the optimal DVFS operating point (OPP) for running a task
+allocated to a CPU. Later, the cpufreq maintainers introduced a similar
+governor, schedutil. The introduction of schedutil also enables running
+workloads at the most energy efficient OPPs.
+
+However, sometimes it may be desired to intentionally boost the performance of
+a workload even if that could imply a reasonable increase in energy
+consumption. For example, in order to reduce the response time of a task, we
+may want to run the task at a higher OPP than the one that is actually required
+by it's CPU bandwidth demand.
+
+This last requirement is especially important if we consider that one of the
+main goals of the utilization-driven governor component is to replace all
+currently available CPUFreq policies. Since sched-DVFS and schedutil are event
+based, as opposed to the sampling driven governors we currently have, they are
+already more responsive at selecting the optimal OPP to run tasks allocated to
+a CPU. However, just tracking the actual task load demand may not be enough
+from a performance standpoint. For example, it is not possible to get
+behaviors similar to those provided by the "performance" and "interactive"
+CPUFreq governors.
+
+This document describes an implementation of a tunable, stacked on top of the
+utilization-driven governors which extends their functionality to support task
+performance boosting.
+
+By "performance boosting" we mean the reduction of the time required to
+complete a task activation, i.e. the time elapsed from a task wakeup to its
+next deactivation (e.g. because it goes back to sleep or it terminates). For
+example, if we consider a simple periodic task which executes the same workload
+for 5[s] every 20[s] while running at a certain OPP, a boosted execution of
+that task must complete each of its activations in less than 5[s].
+
+A previous attempt [5] to introduce such a boosting feature has not been
+successful mainly because of the complexity of the proposed solution. Previous
+versions of the approach described in this document exposed a single simple
+interface to user-space. This single tunable knob allowed the tuning of
+system wide scheduler behaviours ranging from energy efficiency at one end
+through to incremental performance boosting at the other end. This first
+tunable affects all tasks. However, that is not useful for Android products
+so in this version only a more advanced extension of the concept is provided
+which uses CGroups to boost the performance of only selected tasks while using
+the energy efficient default for all others.
+
+The rest of this document introduces in more details the proposed solution
+which has been named SchedTune.
+
+
+2. Introduction
+===============
+
+SchedTune exposes a simple user-space interface provided through a new
+CGroup controller 'stune' which provides two power-performance tunables
+per group:
+
+ /<stune cgroup mount point>/schedtune.prefer_idle
+ /<stune cgroup mount point>/schedtune.boost
+
+The CGroup implementation permits arbitrary user-space defined task
+classification to tune the scheduler for different goals depending on the
+specific nature of the task, e.g. background vs interactive vs low-priority.
+
+More details are given in section 5.
+
+2.1 Boosting
+============
+
+The boost value is expressed as an integer in the range [-100..0..100].
+
+A value of 0 (default) configures the CFS scheduler for maximum energy
+efficiency. This means that sched-DVFS runs the tasks at the minimum OPP
+required to satisfy their workload demand.
+
+A value of 100 configures scheduler for maximum performance, which translates
+to the selection of the maximum OPP on that CPU.
+
+A value of -100 configures scheduler for minimum performance, which translates
+to the selection of the minimum OPP on that CPU.
+
+The range between -100, 0 and 100 can be set to satisfy other scenarios suitably.
+For example to satisfy interactive response or depending on other system events
+(battery level etc).
+
+The overall design of the SchedTune module is built on top of "Per-Entity Load
+Tracking" (PELT) signals and sched-DVFS by introducing a bias on the Operating
+Performance Point (OPP) selection.
+
+Each time a task is allocated on a CPU, cpufreq is given the opportunity to tune
+the operating frequency of that CPU to better match the workload demand. The
+selection of the actual OPP being activated is influenced by the boost value
+for the task CGroup.
+
+This simple biasing approach leverages existing frameworks, which means minimal
+modifications to the scheduler, and yet it allows to achieve a range of
+different behaviours all from a single simple tunable knob.
+
+In EAS schedulers, we use boosted task and CPU utilization for energy
+calculation and energy-aware task placement.
+
+2.2 prefer_idle
+===============
+
+This is a flag which indicates to the scheduler that userspace would like
+the scheduler to focus on energy or to focus on performance.
+
+A value of 0 (default) signals to the CFS scheduler that tasks in this group
+can be placed according to the energy-aware wakeup strategy.
+
+A value of 1 signals to the CFS scheduler that tasks in this group should be
+placed to minimise wakeup latency.
+
+The value is combined with the boost value - task placement will not be
+boost aware however CPU OPP selection is still boost aware.
+
+Android platforms typically use this flag for application tasks which the
+user is currently interacting with.
+
+
+3. Signal Boosting Strategy
+===========================
+
+The whole PELT machinery works based on the value of a few load tracking signals
+which basically track the CPU bandwidth requirements for tasks and the capacity
+of CPUs. The basic idea behind the SchedTune knob is to artificially inflate
+some of these load tracking signals to make a task or RQ appears more demanding
+that it actually is.
+
+Which signals have to be inflated depends on the specific "consumer". However,
+independently from the specific (signal, consumer) pair, it is important to
+define a simple and possibly consistent strategy for the concept of boosting a
+signal.
+
+A boosting strategy defines how the "abstract" user-space defined
+sched_cfs_boost value is translated into an internal "margin" value to be added
+to a signal to get its inflated value:
+
+ margin := boosting_strategy(sched_cfs_boost, signal)
+ boosted_signal := signal + margin
+
+Different boosting strategies were identified and analyzed before selecting the
+one found to be most effective.
+
+Signal Proportional Compensation (SPC)
+--------------------------------------
+
+In this boosting strategy the sched_cfs_boost value is used to compute a
+margin which is proportional to the complement of the original signal.
+When a signal has a maximum possible value, its complement is defined as
+the delta from the actual value and its possible maximum.
+
+Since the tunable implementation uses signals which have SCHED_LOAD_SCALE as
+the maximum possible value, the margin becomes:
+
+ margin := sched_cfs_boost * (SCHED_LOAD_SCALE - signal)
+
+Using this boosting strategy:
+- a 100% sched_cfs_boost means that the signal is scaled to the maximum value
+- each value in the range of sched_cfs_boost effectively inflates the signal in
+ question by a quantity which is proportional to the maximum value.
+
+For example, by applying the SPC boosting strategy to the selection of the OPP
+to run a task it is possible to achieve these behaviors:
+
+- 0% boosting: run the task at the minimum OPP required by its workload
+- 100% boosting: run the task at the maximum OPP available for the CPU
+- 50% boosting: run at the half-way OPP between minimum and maximum
+
+Which means that, at 50% boosting, a task will be scheduled to run at half of
+the maximum theoretically achievable performance on the specific target
+platform.
+
+A graphical representation of an SPC boosted signal is represented in the
+following figure where:
+ a) "-" represents the original signal
+ b) "b" represents a 50% boosted signal
+ c) "p" represents a 100% boosted signal
+
+
+ ^
+ | SCHED_LOAD_SCALE
+ +-----------------------------------------------------------------+
+ |pppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppppp
+ |
+ | boosted_signal
+ | bbbbbbbbbbbbbbbbbbbbbbbb
+ |
+ | original signal
+ | bbbbbbbbbbbbbbbbbbbbbbbb+----------------------+
+ | |
+ |bbbbbbbbbbbbbbbbbb |
+ | |
+ | |
+ | |
+ | +-----------------------+
+ | |
+ | |
+ | |
+ |------------------+
+ |
+ |
+ +----------------------------------------------------------------------->
+
+The plot above shows a ramped load signal (titled 'original_signal') and it's
+boosted equivalent. For each step of the original signal the boosted signal
+corresponding to a 50% boost is midway from the original signal and the upper
+bound. Boosting by 100% generates a boosted signal which is always saturated to
+the upper bound.
+
+
+4. OPP selection using boosted CPU utilization
+==============================================
+
+It is worth calling out that the implementation does not introduce any new load
+signals. Instead, it provides an API to tune existing signals. This tuning is
+done on demand and only in scheduler code paths where it is sensible to do so.
+The new API calls are defined to return either the default signal or a boosted
+one, depending on the value of sched_cfs_boost. This is a clean an non invasive
+modification of the existing existing code paths.
+
+The signal representing a CPU's utilization is boosted according to the
+previously described SPC boosting strategy. To sched-DVFS, this allows a CPU
+(ie CFS run-queue) to appear more used then it actually is.
+
+Thus, with the sched_cfs_boost enabled we have the following main functions to
+get the current utilization of a CPU:
+
+ cpu_util()
+ boosted_cpu_util()
+
+The new boosted_cpu_util() is similar to the first but returns a boosted
+utilization signal which is a function of the sched_cfs_boost value.
+
+This function is used in the CFS scheduler code paths where sched-DVFS needs to
+decide the OPP to run a CPU at.
+For example, this allows selecting the highest OPP for a CPU which has
+the boost value set to 100%.
+
+
+5. Per task group boosting
+==========================
+
+On battery powered devices there usually are many background services which are
+long running and need energy efficient scheduling. On the other hand, some
+applications are more performance sensitive and require an interactive
+response and/or maximum performance, regardless of the energy cost.
+
+To better service such scenarios, the SchedTune implementation has an extension
+that provides a more fine grained boosting interface.
+
+A new CGroup controller, namely "schedtune", can be enabled which allows to
+defined and configure task groups with different boosting values.
+Tasks that require special performance can be put into separate CGroups.
+The value of the boost associated with the tasks in this group can be specified
+using a single knob exposed by the CGroup controller:
+
+ schedtune.boost
+
+This knob allows the definition of a boost value that is to be used for
+SPC boosting of all tasks attached to this group.
+
+The current schedtune controller implementation is really simple and has these
+main characteristics:
+
+ 1) It is only possible to create 1 level depth hierarchies
+
+ The root control groups define the system-wide boost value to be applied
+ by default to all tasks. Its direct subgroups are named "boost groups" and
+ they define the boost value for specific set of tasks.
+ Further nested subgroups are not allowed since they do not have a sensible
+ meaning from a user-space standpoint.
+
+ 2) It is possible to define only a limited number of "boost groups"
+
+ This number is defined at compile time and by default configured to 16.
+ This is a design decision motivated by two main reasons:
+ a) In a real system we do not expect utilization scenarios with more then few
+ boost groups. For example, a reasonable collection of groups could be
+ just "background", "interactive" and "performance".
+ b) It simplifies the implementation considerably, especially for the code
+ which has to compute the per CPU boosting once there are multiple
+ RUNNABLE tasks with different boost values.
+
+Such a simple design should allow servicing the main utilization scenarios identified
+so far. It provides a simple interface which can be used to manage the
+power-performance of all tasks or only selected tasks.
+Moreover, this interface can be easily integrated by user-space run-times (e.g.
+Android, ChromeOS) to implement a QoS solution for task boosting based on tasks
+classification, which has been a long standing requirement.
+
+Setup and usage
+---------------
+
+0. Use a kernel with CONFIG_SCHED_TUNE support enabled
+
+1. Check that the "schedtune" CGroup controller is available:
+
+ root@linaro-nano:~# cat /proc/cgroups
+ #subsys_name hierarchy num_cgroups enabled
+ cpuset 0 1 1
+ cpu 0 1 1
+ schedtune 0 1 1
+
+2. Mount a tmpfs to create the CGroups mount point (Optional)
+
+ root@linaro-nano:~# sudo mount -t tmpfs cgroups /sys/fs/cgroup
+
+3. Mount the "schedtune" controller
+
+ root@linaro-nano:~# mkdir /sys/fs/cgroup/stune
+ root@linaro-nano:~# sudo mount -t cgroup -o schedtune stune /sys/fs/cgroup/stune
+
+4. Create task groups and configure their specific boost value (Optional)
+
+ For example here we create a "performance" boost group configure to boost
+ all its tasks to 100%
+
+ root@linaro-nano:~# mkdir /sys/fs/cgroup/stune/performance
+ root@linaro-nano:~# echo 100 > /sys/fs/cgroup/stune/performance/schedtune.boost
+
+5. Move tasks into the boost group
+
+ For example, the following moves the tasks with PID $TASKPID (and all its
+ threads) into the "performance" boost group.
+
+ root@linaro-nano:~# echo "TASKPID > /sys/fs/cgroup/stune/performance/cgroup.procs
+
+This simple configuration allows only the threads of the $TASKPID task to run,
+when needed, at the highest OPP in the most capable CPU of the system.
+
+
+6. Per-task wakeup-placement-strategy Selection
+===============================================
+
+Many devices have a number of CFS tasks in use which require an absolute
+minimum wakeup latency, and many tasks for which wakeup latency is not
+important.
+
+For touch-driven environments, removing additional wakeup latency can be
+critical.
+
+When you use the Schedtume CGroup controller, you have access to a second
+parameter which allows a group to be marked such that energy_aware task
+placement is bypassed for tasks belonging to that group.
+
+prefer_idle=0 (default - use energy-aware task placement if available)
+prefer_idle=1 (never use energy-aware task placement for these tasks)
+
+Since the regular wakeup task placement algorithm in CFS is biased for
+performance, this has the effect of restoring minimum wakeup latency
+for the desired tasks whilst still allowing energy-aware wakeup placement
+to save energy for other tasks.
+
+
+7. Question and Answers
+=======================
+
+What about "auto" mode?
+-----------------------
+
+The 'auto' mode as described in [5] can be implemented by interfacing SchedTune
+with some suitable user-space element. This element could use the exposed
+system-wide or cgroup based interface.
+
+How are multiple groups of tasks with different boost values managed?
+---------------------------------------------------------------------
+
+The current SchedTune implementation keeps track of the boosted RUNNABLE tasks
+on a CPU. The CPU utilization seen by the scheduler-driven cpufreq governors
+(and used to select an appropriate OPP) is boosted with a value which is the
+maximum of the boost values of the currently RUNNABLE tasks in its RQ.
+
+This allows cpufreq to boost a CPU only while there are boosted tasks ready
+to run and switch back to the energy efficient mode as soon as the last boosted
+task is dequeued.
+
+
+8. References
+=============
+[1] http://lwn.net/Articles/552889
+[2] http://lkml.org/lkml/2012/5/18/91
+[3] http://lkml.org/lkml/2015/6/26/620
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 694968c..b757d6e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -653,7 +653,8 @@
perf_event_paranoid:
Controls use of the performance events system by unprivileged
-users (without CAP_SYS_ADMIN). The default value is 2.
+users (without CAP_SYS_ADMIN). The default value is 3 if
+CONFIG_SECURITY_PERF_EVENTS_RESTRICT is set, or 2 otherwise.
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
@@ -661,6 +662,7 @@
Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
+>=3: Disallow all event access by users without CAP_SYS_ADMIN
==============================================================
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9baf66a..1d1f2cb 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -30,6 +30,7 @@
- dirty_writeback_centisecs
- drop_caches
- extfrag_threshold
+- extra_free_kbytes
- hugepages_treat_as_movable
- hugetlb_shm_group
- laptop_mode
@@ -260,6 +261,21 @@
==============================================================
+extra_free_kbytes
+
+This parameter tells the VM to keep extra free memory between the threshold
+where background reclaim (kswapd) kicks in, and the threshold where direct
+reclaim (by allocating processes) kicks in.
+
+This is useful for workloads that require low latency memory allocations
+and have a bounded burstiness in memory allocations, for example a
+realtime application that receives and transmits network traffic
+(causing in-kernel memory allocations) with a maximum total message burst
+size of 200MB may need 200MB of extra free memory to avoid direct reclaim
+related latencies.
+
+==============================================================
+
hugepages_treat_as_movable
This parameter controls whether we can allocate hugepages from ZONE_MOVABLE
diff --git a/Documentation/trace/events-power.txt b/Documentation/trace/events-power.txt
index 21d514c..4d817d5 100644
--- a/Documentation/trace/events-power.txt
+++ b/Documentation/trace/events-power.txt
@@ -25,6 +25,7 @@
cpu_idle "state=%lu cpu_id=%lu"
cpu_frequency "state=%lu cpu_id=%lu"
+cpu_frequency_limits "min=%lu max=%lu cpu_id=%lu"
A suspend event is used to indicate the system going in and out of the
suspend mode:
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index d4601df..f2fcbb7 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -2407,6 +2407,35 @@
1) 1.449 us | }
+You can disable the hierarchical function call formatting and instead print a
+flat list of function entry and return events. This uses the format described
+in the Output Formatting section and respects all the trace options that
+control that formatting. Hierarchical formatting is the default.
+
+ hierachical: echo nofuncgraph-flat > trace_options
+ flat: echo funcgraph-flat > trace_options
+
+ ie:
+
+ # tracer: function_graph
+ #
+ # entries-in-buffer/entries-written: 68355/68355 #P:2
+ #
+ # _-----=> irqs-off
+ # / _----=> need-resched
+ # | / _---=> hardirq/softirq
+ # || / _--=> preempt-depth
+ # ||| / delay
+ # TASK-PID CPU# |||| TIMESTAMP FUNCTION
+ # | | | |||| | |
+ sh-1806 [001] d... 198.843443: graph_ent: func=_raw_spin_lock
+ sh-1806 [001] d... 198.843445: graph_ent: func=__raw_spin_lock
+ sh-1806 [001] d..1 198.843447: graph_ret: func=__raw_spin_lock
+ sh-1806 [001] d..1 198.843449: graph_ret: func=_raw_spin_lock
+ sh-1806 [001] d..1 198.843451: graph_ent: func=_raw_spin_unlock_irqrestore
+ sh-1806 [001] d... 198.843453: graph_ret: func=_raw_spin_unlock_irqrestore
+
+
You might find other useful features for this tracer in the
following "dynamic ftrace" section such as tracing only specific
functions or tasks.
diff --git a/Makefile b/Makefile
index dd4eaee..9a9fc25 100644
--- a/Makefile
+++ b/Makefile
@@ -372,6 +372,7 @@
# Make variables (CC, etc...)
AS = $(CROSS_COMPILE)as
LD = $(CROSS_COMPILE)ld
+LDGOLD = $(CROSS_COMPILE)ld.gold
CC = $(CROSS_COMPILE)gcc
CPP = $(CC) -E
AR = $(CROSS_COMPILE)ar
@@ -479,7 +480,8 @@
ifeq ($(cc-name),clang)
ifneq ($(CROSS_COMPILE),)
-CLANG_TARGET := --target=$(notdir $(CROSS_COMPILE:%-=%))
+CLANG_TRIPLE ?= $(CROSS_COMPILE)
+CLANG_TARGET := --target=$(notdir $(CLANG_TRIPLE:%-=%))
GCC_TOOLCHAIN := $(realpath $(dir $(shell which $(LD)))/..)
endif
ifneq ($(GCC_TOOLCHAIN),)
@@ -634,6 +636,20 @@
CFLAGS_KCOV := $(call cc-option,-fsanitize-coverage=trace-pc,)
export CFLAGS_GCOV CFLAGS_KCOV
+# Make toolchain changes before including arch/$(SRCARCH)/Makefile to ensure
+# ar/cc/ld-* macros return correct values.
+ifdef CONFIG_LTO_CLANG
+# use GNU gold with LLVMgold for LTO linking, and LD for vmlinux_link
+LDFINAL_vmlinux := $(LD)
+LD := $(LDGOLD)
+LDFLAGS += -plugin LLVMgold.so
+# use llvm-ar for building symbol tables from IR files, and llvm-dis instead
+# of objdump for processing symbol versions and exports
+LLVM_AR := llvm-ar
+LLVM_DIS := llvm-dis
+export LLVM_AR LLVM_DIS
+endif
+
# The arch Makefile can set ARCH_{CPP,A,C}FLAGS to override the default
# values of the respective KBUILD_* variables
ARCH_CPPFLAGS :=
@@ -714,6 +730,7 @@
KBUILD_CFLAGS += $(call cc-disable-warning, format-invalid-specifier)
KBUILD_CFLAGS += $(call cc-disable-warning, gnu)
KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
+KBUILD_CFLAGS += $(call cc-disable-warning, duplicate-decl-specifier)
# Quiet clang warning: comparison of unsigned expression < 0 is always false
KBUILD_CFLAGS += $(call cc-disable-warning, tautological-compare)
# CLANG uses a _MergedGlobals as optimization, but this breaks modpost, as the
@@ -791,6 +808,53 @@
KBUILD_CFLAGS += $(call cc-option,-fdata-sections,)
endif
+ifdef CONFIG_LTO_CLANG
+lto-clang-flags := -flto -fvisibility=hidden
+
+# allow disabling only clang LTO where needed
+DISABLE_LTO_CLANG := -fno-lto -fvisibility=default
+export DISABLE_LTO_CLANG
+endif
+
+ifdef CONFIG_LTO
+lto-flags := $(lto-clang-flags)
+KBUILD_CFLAGS += $(lto-flags)
+
+DISABLE_LTO := $(DISABLE_LTO_CLANG)
+export DISABLE_LTO
+
+# LDFINAL_vmlinux and LDFLAGS_FINAL_vmlinux can be set to override
+# the linker and flags for vmlinux_link.
+export LDFINAL_vmlinux LDFLAGS_FINAL_vmlinux
+endif
+
+ifdef CONFIG_CFI_CLANG
+cfi-clang-flags += -fsanitize=cfi
+DISABLE_CFI_CLANG := -fno-sanitize=cfi
+ifdef CONFIG_MODULES
+cfi-clang-flags += -fsanitize-cfi-cross-dso
+DISABLE_CFI_CLANG += -fno-sanitize-cfi-cross-dso
+endif
+ifdef CONFIG_CFI_PERMISSIVE
+cfi-clang-flags += -fsanitize-recover=cfi -fno-sanitize-trap=cfi
+endif
+
+# also disable CFI when LTO is disabled
+DISABLE_LTO_CLANG += $(DISABLE_CFI_CLANG)
+# allow disabling only clang CFI where needed
+export DISABLE_CFI_CLANG
+endif
+
+ifdef CONFIG_CFI
+# cfi-flags are re-tested in prepare-compiler-check
+cfi-flags := $(cfi-clang-flags)
+KBUILD_CFLAGS += $(cfi-flags)
+
+DISABLE_CFI := $(DISABLE_CFI_CLANG)
+DISABLE_LTO += $(DISABLE_CFI)
+export DISABLE_CFI
+endif
+
# arch Makefile may override CC so keep this after arch Makefile is included
NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
CHECKFLAGS += $(NOSTDINC_FLAGS)
@@ -1108,6 +1172,22 @@
# CC_STACKPROTECTOR_STRONG! Why did it build with _REGULAR?!")
PHONY += prepare-compiler-check
prepare-compiler-check: FORCE
+# Make sure we're using a supported toolchain with LTO_CLANG
+ifdef CONFIG_LTO_CLANG
+ ifneq ($(call clang-ifversion, -ge, 0500, y), y)
+ @echo Cannot use CONFIG_LTO_CLANG: requires clang 5.0 or later >&2 && exit 1
+ endif
+ ifneq ($(call gold-ifversion, -ge, 112000000, y), y)
+ @echo Cannot use CONFIG_LTO_CLANG: requires GNU gold 1.12 or later >&2 && exit 1
+ endif
+endif
+# Make sure compiler supports LTO flags
+ifdef lto-flags
+ ifeq ($(call cc-option, $(lto-flags)),)
+ @echo Cannot use CONFIG_LTO: $(lto-flags) not supported by compiler \
+ >&2 && exit 1
+ endif
+endif
# Make sure compiler supports requested stack protector flag.
ifdef stackp-name
ifeq ($(call cc-option, $(stackp-flag)),)
@@ -1122,6 +1202,11 @@
$(stackp-flag) available but compiler is broken >&2 && exit 1
endif
endif
+ifdef cfi-flags
+ ifeq ($(call cc-option, $(cfi-flags)),)
+ @echo Cannot use CONFIG_CFI: $(cfi-flags) not supported by compiler >&2 && exit 1
+ endif
+endif
@:
# Generate some files
@@ -1581,7 +1666,8 @@
-o -name modules.builtin -o -name '.tmp_*.o.*' \
-o -name '*.c.[012]*.*' \
-o -name '*.ll' \
- -o -name '*.gcno' \) -type f -print | xargs rm -f
+ -o -name '*.gcno' \
+ -o -name '*.*.symversions' \) -type f -print | xargs rm -f
# Generate tags for editors
# ---------------------------------------------------------------------------
diff --git a/arch/Kconfig b/arch/Kconfig
index 40dc31f..6f1ea52 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -611,6 +611,74 @@
sections (e.g., '.text.init'). Typically '.' in section names
is used to distinguish them from label names / C identifiers.
+config LTO
+ def_bool n
+
+config ARCH_SUPPORTS_LTO_CLANG
+ bool
+ help
+ An architecture should select this option it supports:
+ - compiling with clang,
+ - compiling inline assembly with clang's integrated assembler,
+ - and linking with either lld or GNU gold w/ LLVMgold.
+
+choice
+ prompt "Link-Time Optimization (LTO) (EXPERIMENTAL)"
+ default LTO_NONE
+ help
+ This option turns on Link-Time Optimization (LTO).
+
+config LTO_NONE
+ bool "None"
+
+config LTO_CLANG
+ bool "Use clang Link Time Optimization (LTO) (EXPERIMENTAL)"
+ depends on ARCH_SUPPORTS_LTO_CLANG
+ depends on !FTRACE_MCOUNT_RECORD || HAVE_C_RECORDMCOUNT
+ select LTO
+ select THIN_ARCHIVES
+ select LD_DEAD_CODE_DATA_ELIMINATION
+ help
+ This option enables clang's Link Time Optimization (LTO), which allows
+ the compiler to optimize the kernel globally at link time. If you
+ enable this option, the compiler generates LLVM IR instead of object
+ files, and the actual compilation from IR occurs at the LTO link step,
+ which may take several minutes.
+
+ If you select this option, you must compile the kernel with clang >=
+ 5.0 (make CC=clang) and GNU gold from binutils >= 2.27, and have the
+ LLVMgold plug-in in LD_LIBRARY_PATH.
+
+endchoice
+
+config CFI
+ bool
+
+config CFI_PERMISSIVE
+ bool "Use CFI in permissive mode"
+ depends on CFI
+ help
+ When selected, Control Flow Integrity (CFI) violations result in a
+ warning instead of a kernel panic. This option is useful for finding
+ CFI violations in drivers during development.
+
+config CFI_CLANG
+ bool "Use clang Control Flow Integrity (CFI) (EXPERIMENTAL)"
+ depends on LTO_CLANG
+ depends on KALLSYMS
+ select CFI
+ help
+ This option enables clang Control Flow Integrity (CFI), which adds
+ runtime checking for indirect function calls.
+
+config CFI_CLANG_SHADOW
+ bool "Use CFI shadow to speed up cross-module checks"
+ default y
+ depends on CFI_CLANG
+ help
+ If you select this option, the kernel builds a fast look-up table of
+ CFI check functions in loaded modules to reduce overhead.
+
config HAVE_ARCH_WITHIN_STACK_FRAMES
bool
help
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index d1346a1..8586381 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1826,6 +1826,15 @@
help
Say Y if you want to run Linux in a Virtual Machine on Xen on ARM.
+config ARM_FLUSH_CONSOLE_ON_RESTART
+ bool "Force flush the console on restart"
+ help
+ If the console is locked while the system is rebooted, the messages
+ in the temporary logbuffer would not have propogated to all the
+ console drivers. This option forces the console lock to be
+ released if it failed to be acquired, which will cause all the
+ pending messages to be flushed.
+
endmenu
menu "Boot options"
@@ -1854,6 +1863,21 @@
This was deprecated in 2001 and announced to live on for 5 years.
Some old boot loaders still use this way.
+config BUILD_ARM_APPENDED_DTB_IMAGE
+ bool "Build a concatenated zImage/dtb by default"
+ depends on OF
+ help
+ Enabling this option will cause a concatenated zImage and list of
+ DTBs to be built by default (instead of a standalone zImage.)
+ The image will built in arch/arm/boot/zImage-dtb
+
+config BUILD_ARM_APPENDED_DTB_IMAGE_NAMES
+ string "Default dtb names"
+ depends on BUILD_ARM_APPENDED_DTB_IMAGE
+ help
+ Space separated list of names of dtbs to append when
+ building a concatenated zImage-dtb.
+
# Compressed boot loader in ROM. Yes, we really want to ask about
# TEXT and BSS so we preserve their values in the config files.
config ZBOOT_ROM_TEXT
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 36ae445..bc805d7 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -303,6 +303,8 @@
boot := arch/arm/boot
ifeq ($(CONFIG_XIP_KERNEL),y)
KBUILD_IMAGE := $(boot)/xipImage
+else ifeq ($(CONFIG_BUILD_ARM_APPENDED_DTB_IMAGE),y)
+KBUILD_IMAGE := $(boot)/zImage-dtb
else
KBUILD_IMAGE := $(boot)/zImage
endif
@@ -356,6 +358,9 @@
$(Q)$(MAKE) $(build)=arch/arm/vdso $@
endif
+zImage-dtb: vmlinux scripts dtbs
+ $(Q)$(MAKE) $(build)=$(boot) MACHINE=$(MACHINE) $(boot)/$@
+
# We use MRPROPER_FILES and CLEAN_FILES now
archclean:
$(Q)$(MAKE) $(clean)=$(boot)
diff --git a/arch/arm/boot/Makefile b/arch/arm/boot/Makefile
index 50f8d1b..da75630 100644
--- a/arch/arm/boot/Makefile
+++ b/arch/arm/boot/Makefile
@@ -16,6 +16,7 @@
ifneq ($(MACHINE),)
include $(MACHINE)/Makefile.boot
endif
+include $(srctree)/arch/arm/boot/dts/Makefile
# Note: the following conditions must always be true:
# ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET)
@@ -29,6 +30,14 @@
targets := Image zImage xipImage bootpImage uImage
+DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM_APPENDED_DTB_IMAGE_NAMES))
+ifneq ($(DTB_NAMES),)
+DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES))
+else
+DTB_LIST := $(dtb-y)
+endif
+DTB_OBJS := $(addprefix $(obj)/dts/,$(DTB_LIST))
+
ifeq ($(CONFIG_XIP_KERNEL),y)
$(obj)/xipImage: vmlinux FORCE
@@ -55,6 +64,10 @@
$(obj)/zImage: $(obj)/compressed/vmlinux FORCE
$(call if_changed,objcopy)
+$(obj)/zImage-dtb: $(obj)/zImage $(DTB_OBJS) FORCE
+ $(call if_changed,cat)
+ @echo ' Kernel: $@ is ready'
+
endif
ifneq ($(LOADADDR),)
diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
index 5f687ba..03c1227 100644
--- a/arch/arm/boot/compressed/head.S
+++ b/arch/arm/boot/compressed/head.S
@@ -792,6 +792,8 @@
bic r6, r6, #1 << 31 @ 32-bit translation system
bic r6, r6, #(7 << 0) | (1 << 4) @ use only ttbr0
mcrne p15, 0, r3, c2, c0, 0 @ load page table pointer
+ mcrne p15, 0, r0, c8, c7, 0 @ flush I,D TLBs
+ mcr p15, 0, r0, c7, c5, 4 @ ISB
mcrne p15, 0, r1, c3, c0, 0 @ load domain access control
mcrne p15, 0, r6, c2, c0, 2 @ load ttb control
#endif
diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
index eff87a3..86e591c 100644
--- a/arch/arm/boot/dts/Makefile
+++ b/arch/arm/boot/dts/Makefile
@@ -1074,5 +1074,15 @@
dtstree := $(srctree)/$(src)
dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(wildcard $(dtstree)/*.dts))
-always := $(dtb-y)
+DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM_APPENDED_DTB_IMAGE_NAMES))
+ifneq ($(DTB_NAMES),)
+DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES))
+else
+DTB_LIST := $(dtb-y)
+endif
+
+targets += dtbs dtbs_install
+targets += $(DTB_LIST)
+
+always := $(DTB_LIST)
clean-files := *.dtb
diff --git a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
index a4c7713..c5c365f 100644
--- a/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
+++ b/arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts
@@ -41,6 +41,7 @@
cci-control-port = <&cci_control1>;
cpu-idle-states = <&CLUSTER_SLEEP_BIG>;
capacity-dmips-mhz = <1024>;
+ sched-energy-costs = <&CPU_COST_A15 &CLUSTER_COST_A15>;
};
cpu1: cpu@1 {
@@ -50,6 +51,7 @@
cci-control-port = <&cci_control1>;
cpu-idle-states = <&CLUSTER_SLEEP_BIG>;
capacity-dmips-mhz = <1024>;
+ sched-energy-costs = <&CPU_COST_A15 &CLUSTER_COST_A15>;
};
cpu2: cpu@2 {
@@ -59,6 +61,7 @@
cci-control-port = <&cci_control2>;
cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
capacity-dmips-mhz = <516>;
+ sched-energy-costs = <&CPU_COST_A7 &CLUSTER_COST_A7>;
};
cpu3: cpu@3 {
@@ -68,6 +71,7 @@
cci-control-port = <&cci_control2>;
cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
capacity-dmips-mhz = <516>;
+ sched-energy-costs = <&CPU_COST_A7 &CLUSTER_COST_A7>;
};
cpu4: cpu@4 {
@@ -77,6 +81,7 @@
cci-control-port = <&cci_control2>;
cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
capacity-dmips-mhz = <516>;
+ sched-energy-costs = <&CPU_COST_A7 &CLUSTER_COST_A7>;
};
idle-states {
@@ -96,6 +101,77 @@
min-residency-us = <2500>;
};
};
+
+ energy-costs {
+ CPU_COST_A15: core-cost0 {
+ busy-cost-data = <
+ 426 2021
+ 512 2312
+ 597 2756
+ 682 3125
+ 768 3524
+ 853 3846
+ 938 5177
+ 1024 6997
+ >;
+ idle-cost-data = <
+ 0
+ 0
+ 0
+ >;
+ };
+ CPU_COST_A7: core-cost1 {
+ busy-cost-data = <
+ 150 187
+ 172 275
+ 215 334
+ 258 407
+ 301 447
+ 344 549
+ 387 761
+ 430 1024
+ >;
+ idle-cost-data = <
+ 0
+ 0
+ 0
+ >;
+ };
+ CLUSTER_COST_A15: cluster-cost0 {
+ busy-cost-data = <
+ 426 7920
+ 512 8165
+ 597 8172
+ 682 8195
+ 768 8265
+ 853 8446
+ 938 11426
+ 1024 15200
+ >;
+ idle-cost-data = <
+ 70
+ 70
+ 25
+ >;
+ };
+ CLUSTER_COST_A7: cluster-cost1 {
+ busy-cost-data = <
+ 150 2967
+ 172 2792
+ 215 2810
+ 258 2815
+ 301 2919
+ 344 2847
+ 387 3917
+ 430 4905
+ >;
+ idle-cost-data = <
+ 25
+ 25
+ 10
+ >;
+ };
+ };
};
memory@80000000 {
diff --git a/arch/arm/common/Kconfig b/arch/arm/common/Kconfig
index e5ad070..f07f479 100644
--- a/arch/arm/common/Kconfig
+++ b/arch/arm/common/Kconfig
@@ -15,3 +15,7 @@
config SHARP_SCOOP
bool
+
+config FIQ_GLUE
+ bool
+ select FIQ
diff --git a/arch/arm/common/Makefile b/arch/arm/common/Makefile
index 70b4a14..10b5064 100644
--- a/arch/arm/common/Makefile
+++ b/arch/arm/common/Makefile
@@ -5,6 +5,7 @@
obj-y += firmware.o
+obj-$(CONFIG_FIQ_GLUE) += fiq_glue.o fiq_glue_setup.o
obj-$(CONFIG_SA1111) += sa1111.o
obj-$(CONFIG_DMABOUNCE) += dmabounce.o
obj-$(CONFIG_SHARP_LOCOMO) += locomo.o
diff --git a/arch/arm/common/fiq_glue.S b/arch/arm/common/fiq_glue.S
new file mode 100644
index 0000000..24b42ce
--- /dev/null
+++ b/arch/arm/common/fiq_glue.S
@@ -0,0 +1,118 @@
+/*
+ * Copyright (C) 2008 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+ .text
+
+ .global fiq_glue_end
+
+ /* fiq stack: r0-r15,cpsr,spsr of interrupted mode */
+
+ENTRY(fiq_glue)
+ /* store pc, cpsr from previous mode, reserve space for spsr */
+ mrs r12, spsr
+ sub lr, lr, #4
+ subs r10, #1
+ bne nested_fiq
+
+ str r12, [sp, #-8]!
+ str lr, [sp, #-4]!
+
+ /* store r8-r14 from previous mode */
+ sub sp, sp, #(7 * 4)
+ stmia sp, {r8-r14}^
+ nop
+
+ /* store r0-r7 from previous mode */
+ stmfd sp!, {r0-r7}
+
+ /* setup func(data,regs) arguments */
+ mov r0, r9
+ mov r1, sp
+ mov r3, r8
+
+ mov r7, sp
+
+ /* Get sp and lr from non-user modes */
+ and r4, r12, #MODE_MASK
+ cmp r4, #USR_MODE
+ beq fiq_from_usr_mode
+
+ mov r7, sp
+ orr r4, r4, #(PSR_I_BIT | PSR_F_BIT)
+ msr cpsr_c, r4
+ str sp, [r7, #(4 * 13)]
+ str lr, [r7, #(4 * 14)]
+ mrs r5, spsr
+ str r5, [r7, #(4 * 17)]
+
+ cmp r4, #(SVC_MODE | PSR_I_BIT | PSR_F_BIT)
+ /* use fiq stack if we reenter this mode */
+ subne sp, r7, #(4 * 3)
+
+fiq_from_usr_mode:
+ msr cpsr_c, #(SVC_MODE | PSR_I_BIT | PSR_F_BIT)
+ mov r2, sp
+ sub sp, r7, #12
+ stmfd sp!, {r2, ip, lr}
+ /* call func(data,regs) */
+ blx r3
+ ldmfd sp, {r2, ip, lr}
+ mov sp, r2
+
+ /* restore/discard saved state */
+ cmp r4, #USR_MODE
+ beq fiq_from_usr_mode_exit
+
+ msr cpsr_c, r4
+ ldr sp, [r7, #(4 * 13)]
+ ldr lr, [r7, #(4 * 14)]
+ msr spsr_cxsf, r5
+
+fiq_from_usr_mode_exit:
+ msr cpsr_c, #(FIQ_MODE | PSR_I_BIT | PSR_F_BIT)
+
+ ldmfd sp!, {r0-r7}
+ ldr lr, [sp, #(4 * 7)]
+ ldr r12, [sp, #(4 * 8)]
+ add sp, sp, #(10 * 4)
+exit_fiq:
+ msr spsr_cxsf, r12
+ add r10, #1
+ cmp r11, #0
+ moveqs pc, lr
+ bx r11 /* jump to custom fiq return function */
+
+nested_fiq:
+ orr r12, r12, #(PSR_F_BIT)
+ b exit_fiq
+
+fiq_glue_end:
+
+ENTRY(fiq_glue_setup) /* func, data, sp, smc call number */
+ stmfd sp!, {r4}
+ mrs r4, cpsr
+ msr cpsr_c, #(FIQ_MODE | PSR_I_BIT | PSR_F_BIT)
+ movs r8, r0
+ mov r9, r1
+ mov sp, r2
+ mov r11, r3
+ moveq r10, #0
+ movne r10, #1
+ msr cpsr_c, r4
+ ldmfd sp!, {r4}
+ bx lr
+
diff --git a/arch/arm/common/fiq_glue_setup.c b/arch/arm/common/fiq_glue_setup.c
new file mode 100644
index 0000000..8cb1b61
--- /dev/null
+++ b/arch/arm/common/fiq_glue_setup.c
@@ -0,0 +1,147 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/percpu.h>
+#include <linux/slab.h>
+#include <asm/fiq.h>
+#include <asm/fiq_glue.h>
+
+extern unsigned char fiq_glue, fiq_glue_end;
+extern void fiq_glue_setup(void *func, void *data, void *sp,
+ fiq_return_handler_t fiq_return_handler);
+
+static struct fiq_handler fiq_debbuger_fiq_handler = {
+ .name = "fiq_glue",
+};
+DEFINE_PER_CPU(void *, fiq_stack);
+static struct fiq_glue_handler *current_handler;
+static fiq_return_handler_t fiq_return_handler;
+static DEFINE_MUTEX(fiq_glue_lock);
+
+static void fiq_glue_setup_helper(void *info)
+{
+ struct fiq_glue_handler *handler = info;
+ fiq_glue_setup(handler->fiq, handler,
+ __get_cpu_var(fiq_stack) + THREAD_START_SP,
+ fiq_return_handler);
+}
+
+int fiq_glue_register_handler(struct fiq_glue_handler *handler)
+{
+ int ret;
+ int cpu;
+
+ if (!handler || !handler->fiq)
+ return -EINVAL;
+
+ mutex_lock(&fiq_glue_lock);
+ if (fiq_stack) {
+ ret = -EBUSY;
+ goto err_busy;
+ }
+
+ for_each_possible_cpu(cpu) {
+ void *stack;
+ stack = (void *)__get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
+ if (WARN_ON(!stack)) {
+ ret = -ENOMEM;
+ goto err_alloc_fiq_stack;
+ }
+ per_cpu(fiq_stack, cpu) = stack;
+ }
+
+ ret = claim_fiq(&fiq_debbuger_fiq_handler);
+ if (WARN_ON(ret))
+ goto err_claim_fiq;
+
+ current_handler = handler;
+ on_each_cpu(fiq_glue_setup_helper, handler, true);
+ set_fiq_handler(&fiq_glue, &fiq_glue_end - &fiq_glue);
+
+ mutex_unlock(&fiq_glue_lock);
+ return 0;
+
+err_claim_fiq:
+err_alloc_fiq_stack:
+ for_each_possible_cpu(cpu) {
+ __free_pages(per_cpu(fiq_stack, cpu), THREAD_SIZE_ORDER);
+ per_cpu(fiq_stack, cpu) = NULL;
+ }
+err_busy:
+ mutex_unlock(&fiq_glue_lock);
+ return ret;
+}
+
+static void fiq_glue_update_return_handler(void (*fiq_return)(void))
+{
+ fiq_return_handler = fiq_return;
+ if (current_handler)
+ on_each_cpu(fiq_glue_setup_helper, current_handler, true);
+}
+
+int fiq_glue_set_return_handler(void (*fiq_return)(void))
+{
+ int ret;
+
+ mutex_lock(&fiq_glue_lock);
+ if (fiq_return_handler) {
+ ret = -EBUSY;
+ goto err_busy;
+ }
+ fiq_glue_update_return_handler(fiq_return);
+ ret = 0;
+err_busy:
+ mutex_unlock(&fiq_glue_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL(fiq_glue_set_return_handler);
+
+int fiq_glue_clear_return_handler(void (*fiq_return)(void))
+{
+ int ret;
+
+ mutex_lock(&fiq_glue_lock);
+ if (WARN_ON(fiq_return_handler != fiq_return)) {
+ ret = -EINVAL;
+ goto err_inval;
+ }
+ fiq_glue_update_return_handler(NULL);
+ ret = 0;
+err_inval:
+ mutex_unlock(&fiq_glue_lock);
+
+ return ret;
+}
+EXPORT_SYMBOL(fiq_glue_clear_return_handler);
+
+/**
+ * fiq_glue_resume - Restore fiqs after suspend or low power idle states
+ *
+ * This must be called before calling local_fiq_enable after returning from a
+ * power state where the fiq mode registers were lost. If a driver provided
+ * a resume hook when it registered the handler it will be called.
+ */
+
+void fiq_glue_resume(void)
+{
+ if (!current_handler)
+ return;
+ fiq_glue_setup(current_handler->fiq, current_handler,
+ __get_cpu_var(fiq_stack) + THREAD_START_SP,
+ fiq_return_handler);
+ if (current_handler->resume)
+ current_handler->resume(current_handler);
+}
+
diff --git a/arch/arm/configs/ranchu_defconfig b/arch/arm/configs/ranchu_defconfig
new file mode 100644
index 0000000..69157c4
--- /dev/null
+++ b/arch/arm/configs/ranchu_defconfig
@@ -0,0 +1,313 @@
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=14
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_DEBUG=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CPUSETS=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+CONFIG_PROFILING=y
+CONFIG_OPROFILE=y
+CONFIG_ARCH_MMAP_RND_BITS=16
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+# CONFIG_IOSCHED_CFQ is not set
+CONFIG_ARCH_VIRT=y
+CONFIG_ARM_KERNMEM_PERMS=y
+CONFIG_SMP=y
+CONFIG_PREEMPT=y
+CONFIG_AEABI=y
+CONFIG_HIGHMEM=y
+CONFIG_KSM=y
+CONFIG_SECCOMP=y
+CONFIG_CMDLINE="console=ttyAMA0"
+CONFIG_VFP=y
+CONFIG_NEON=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_PM_AUTOSLEEP=y
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+CONFIG_PM_DEBUG=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_INET_ESP=y
+# CONFIG_INET_LRO is not set
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CT_PROTO_DCCP=y
+CONFIG_NF_CT_PROTO_SCTP=y
+CONFIG_NF_CT_PROTO_UDPLITE=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QTAGUID=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_NF_CONNTRACK_IPV4=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_NF_CONNTRACK_IPV6=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_BRIDGE=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_CLS_U32=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_CLS_ACT=y
+# CONFIG_WIRELESS is not set
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_MTD=y
+CONFIG_MTD_CMDLINE_PARTS=y
+CONFIG_MTD_BLOCK=y
+CONFIG_MTD_CFI=y
+CONFIG_MTD_CFI_INTELEXT=y
+CONFIG_MTD_CFI_AMDSTD=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_VIRTIO_BLK=y
+CONFIG_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_NETDEVICES=y
+CONFIG_TUN=y
+CONFIG_VIRTIO_NET=y
+CONFIG_SMSC911X=y
+CONFIG_PPP=y
+CONFIG_PPP_BSDCOMP=y
+CONFIG_PPP_DEFLATE=y
+CONFIG_PPP_MPPE=y
+CONFIG_USB_USBNET=y
+# CONFIG_WLAN is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_KEYRESET=y
+CONFIG_KEYBOARD_GOLDFISH_EVENTS=y
+# CONFIG_INPUT_MOUSE is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_JOYSTICK_XPAD_FF=y
+CONFIG_JOYSTICK_XPAD_LEDS=y
+CONFIG_INPUT_TABLET=y
+CONFIG_TABLET_USB_ACECAD=y
+CONFIG_TABLET_USB_AIPTEK=y
+CONFIG_TABLET_USB_GTCO=y
+CONFIG_TABLET_USB_HANWANG=y
+CONFIG_TABLET_USB_KBTAB=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_UINPUT=y
+CONFIG_INPUT_GPIO=y
+# CONFIG_SERIO_SERPORT is not set
+CONFIG_SERIO_AMBAKMI=y
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVMEM is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_AMBA_PL011=y
+CONFIG_SERIAL_AMBA_PL011_CONSOLE=y
+CONFIG_VIRTIO_CONSOLE=y
+# CONFIG_HW_RANDOM is not set
+# CONFIG_HWMON is not set
+CONFIG_MEDIA_SUPPORT=y
+CONFIG_FB=y
+CONFIG_FB_GOLDFISH=y
+CONFIG_FB_SIMPLE=y
+CONFIG_BACKLIGHT_LCD_SUPPORT=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_ACRUX=y
+CONFIG_HID_ACRUX_FF=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_DRAGONRISE=y
+CONFIG_DRAGONRISE_FF=y
+CONFIG_HID_EMS_FF=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_HOLTEK=y
+CONFIG_HID_KEYTOUCH=y
+CONFIG_HID_KYE=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_WALTOP=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_TWINHAN=y
+CONFIG_HID_KENSINGTON=y
+CONFIG_HID_LCPOWER=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_LOGITECH_FF=y
+CONFIG_LOGIRUMBLEPAD2_FF=y
+CONFIG_LOGIG940_FF=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_ORTEK=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PRIMAX=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SAITEK=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SPEEDLINK=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HID_GREENASIA=y
+CONFIG_GREENASIA_FF=y
+CONFIG_HID_SMARTJOYPLUS=y
+CONFIG_SMARTJOYPLUS_FF=y
+CONFIG_HID_TIVO=y
+CONFIG_HID_TOPSEED=y
+CONFIG_HID_THRUSTMASTER=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_HID_ZEROPLUS=y
+CONFIG_HID_ZYDACRON=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_OTG_WAKELOCK=y
+CONFIG_RTC_CLASS=y
+CONFIG_RTC_DRV_PL031=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_ANDROID_LOW_MEMORY_KILLER=y
+CONFIG_SYNC=y
+CONFIG_SW_SYNC=y
+CONFIG_SW_SYNC_USER=y
+CONFIG_ION=y
+CONFIG_GOLDFISH_AUDIO=y
+CONFIG_GOLDFISH=y
+CONFIG_GOLDFISH_PIPE=y
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_QUOTA=y
+CONFIG_FUSE_FS=y
+CONFIG_CUSE=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y
+CONFIG_NFS_FS=y
+CONFIG_ROOT_NFS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_DEBUG_INFO=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DETECT_HUNG_TASK=y
+CONFIG_PANIC_TIMEOUT=5
+# CONFIG_SCHED_DEBUG is not set
+CONFIG_SCHEDSTATS=y
+CONFIG_TIMER_STATS=y
+CONFIG_ENABLE_DEFAULT_TRACERS=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_SELINUX=y
+CONFIG_VIRTUALIZATION=y
diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index b8e69fe..925d136 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -121,4 +121,10 @@
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
+config CRYPTO_SPECK_NEON
+ tristate "NEON accelerated Speck cipher algorithms"
+ depends on KERNEL_MODE_NEON
+ select CRYPTO_BLKCIPHER
+ select CRYPTO_SPECK
+
endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index c9919c2..3304e67 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -10,6 +10,7 @@
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -53,6 +54,7 @@
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+speck-neon-y := speck-neon-core.o speck-neon-glue.o
ifdef REGENERATE_ARM_CRYPTO
quiet_cmd_perl = PERL $@
diff --git a/arch/arm/crypto/speck-neon-core.S b/arch/arm/crypto/speck-neon-core.S
new file mode 100644
index 0000000..3c1e203
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-core.S
@@ -0,0 +1,432 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Author: Eric Biggers <ebiggers@google.com>
+ */
+
+#include <linux/linkage.h>
+
+ .text
+ .fpu neon
+
+ // arguments
+ ROUND_KEYS .req r0 // const {u64,u32} *round_keys
+ NROUNDS .req r1 // int nrounds
+ DST .req r2 // void *dst
+ SRC .req r3 // const void *src
+ NBYTES .req r4 // unsigned int nbytes
+ TWEAK .req r5 // void *tweak
+
+ // registers which hold the data being encrypted/decrypted
+ X0 .req q0
+ X0_L .req d0
+ X0_H .req d1
+ Y0 .req q1
+ Y0_H .req d3
+ X1 .req q2
+ X1_L .req d4
+ X1_H .req d5
+ Y1 .req q3
+ Y1_H .req d7
+ X2 .req q4
+ X2_L .req d8
+ X2_H .req d9
+ Y2 .req q5
+ Y2_H .req d11
+ X3 .req q6
+ X3_L .req d12
+ X3_H .req d13
+ Y3 .req q7
+ Y3_H .req d15
+
+ // the round key, duplicated in all lanes
+ ROUND_KEY .req q8
+ ROUND_KEY_L .req d16
+ ROUND_KEY_H .req d17
+
+ // index vector for vtbl-based 8-bit rotates
+ ROTATE_TABLE .req d18
+
+ // multiplication table for updating XTS tweaks
+ GF128MUL_TABLE .req d19
+ GF64MUL_TABLE .req d19
+
+ // current XTS tweak value(s)
+ TWEAKV .req q10
+ TWEAKV_L .req d20
+ TWEAKV_H .req d21
+
+ TMP0 .req q12
+ TMP0_L .req d24
+ TMP0_H .req d25
+ TMP1 .req q13
+ TMP2 .req q14
+ TMP3 .req q15
+
+ .align 4
+.Lror64_8_table:
+ .byte 1, 2, 3, 4, 5, 6, 7, 0
+.Lror32_8_table:
+ .byte 1, 2, 3, 0, 5, 6, 7, 4
+.Lrol64_8_table:
+ .byte 7, 0, 1, 2, 3, 4, 5, 6
+.Lrol32_8_table:
+ .byte 3, 0, 1, 2, 7, 4, 5, 6
+.Lgf128mul_table:
+ .byte 0, 0x87
+ .fill 14
+.Lgf64mul_table:
+ .byte 0, 0x1b, (0x1b << 1), (0x1b << 1) ^ 0x1b
+ .fill 12
+
+/*
+ * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
+ *
+ * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
+ * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
+ * of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64.
+ *
+ * The 8-bit rotates are implemented using vtbl instead of vshr + vsli because
+ * the vtbl approach is faster on some processors and the same speed on others.
+ */
+.macro _speck_round_128bytes n
+
+ // x = ror(x, 8)
+ vtbl.8 X0_L, {X0_L}, ROTATE_TABLE
+ vtbl.8 X0_H, {X0_H}, ROTATE_TABLE
+ vtbl.8 X1_L, {X1_L}, ROTATE_TABLE
+ vtbl.8 X1_H, {X1_H}, ROTATE_TABLE
+ vtbl.8 X2_L, {X2_L}, ROTATE_TABLE
+ vtbl.8 X2_H, {X2_H}, ROTATE_TABLE
+ vtbl.8 X3_L, {X3_L}, ROTATE_TABLE
+ vtbl.8 X3_H, {X3_H}, ROTATE_TABLE
+
+ // x += y
+ vadd.u\n X0, Y0
+ vadd.u\n X1, Y1
+ vadd.u\n X2, Y2
+ vadd.u\n X3, Y3
+
+ // x ^= k
+ veor X0, ROUND_KEY
+ veor X1, ROUND_KEY
+ veor X2, ROUND_KEY
+ veor X3, ROUND_KEY
+
+ // y = rol(y, 3)
+ vshl.u\n TMP0, Y0, #3
+ vshl.u\n TMP1, Y1, #3
+ vshl.u\n TMP2, Y2, #3
+ vshl.u\n TMP3, Y3, #3
+ vsri.u\n TMP0, Y0, #(\n - 3)
+ vsri.u\n TMP1, Y1, #(\n - 3)
+ vsri.u\n TMP2, Y2, #(\n - 3)
+ vsri.u\n TMP3, Y3, #(\n - 3)
+
+ // y ^= x
+ veor Y0, TMP0, X0
+ veor Y1, TMP1, X1
+ veor Y2, TMP2, X2
+ veor Y3, TMP3, X3
+.endm
+
+/*
+ * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
+ *
+ * This is the inverse of _speck_round_128bytes().
+ */
+.macro _speck_unround_128bytes n
+
+ // y ^= x
+ veor TMP0, Y0, X0
+ veor TMP1, Y1, X1
+ veor TMP2, Y2, X2
+ veor TMP3, Y3, X3
+
+ // y = ror(y, 3)
+ vshr.u\n Y0, TMP0, #3
+ vshr.u\n Y1, TMP1, #3
+ vshr.u\n Y2, TMP2, #3
+ vshr.u\n Y3, TMP3, #3
+ vsli.u\n Y0, TMP0, #(\n - 3)
+ vsli.u\n Y1, TMP1, #(\n - 3)
+ vsli.u\n Y2, TMP2, #(\n - 3)
+ vsli.u\n Y3, TMP3, #(\n - 3)
+
+ // x ^= k
+ veor X0, ROUND_KEY
+ veor X1, ROUND_KEY
+ veor X2, ROUND_KEY
+ veor X3, ROUND_KEY
+
+ // x -= y
+ vsub.u\n X0, Y0
+ vsub.u\n X1, Y1
+ vsub.u\n X2, Y2
+ vsub.u\n X3, Y3
+
+ // x = rol(x, 8);
+ vtbl.8 X0_L, {X0_L}, ROTATE_TABLE
+ vtbl.8 X0_H, {X0_H}, ROTATE_TABLE
+ vtbl.8 X1_L, {X1_L}, ROTATE_TABLE
+ vtbl.8 X1_H, {X1_H}, ROTATE_TABLE
+ vtbl.8 X2_L, {X2_L}, ROTATE_TABLE
+ vtbl.8 X2_H, {X2_H}, ROTATE_TABLE
+ vtbl.8 X3_L, {X3_L}, ROTATE_TABLE
+ vtbl.8 X3_H, {X3_H}, ROTATE_TABLE
+.endm
+
+.macro _xts128_precrypt_one dst_reg, tweak_buf, tmp
+
+ // Load the next source block
+ vld1.8 {\dst_reg}, [SRC]!
+
+ // Save the current tweak in the tweak buffer
+ vst1.8 {TWEAKV}, [\tweak_buf:128]!
+
+ // XOR the next source block with the current tweak
+ veor \dst_reg, TWEAKV
+
+ /*
+ * Calculate the next tweak by multiplying the current one by x,
+ * modulo p(x) = x^128 + x^7 + x^2 + x + 1.
+ */
+ vshr.u64 \tmp, TWEAKV, #63
+ vshl.u64 TWEAKV, #1
+ veor TWEAKV_H, \tmp\()_L
+ vtbl.8 \tmp\()_H, {GF128MUL_TABLE}, \tmp\()_H
+ veor TWEAKV_L, \tmp\()_H
+.endm
+
+.macro _xts64_precrypt_two dst_reg, tweak_buf, tmp
+
+ // Load the next two source blocks
+ vld1.8 {\dst_reg}, [SRC]!
+
+ // Save the current two tweaks in the tweak buffer
+ vst1.8 {TWEAKV}, [\tweak_buf:128]!
+
+ // XOR the next two source blocks with the current two tweaks
+ veor \dst_reg, TWEAKV
+
+ /*
+ * Calculate the next two tweaks by multiplying the current ones by x^2,
+ * modulo p(x) = x^64 + x^4 + x^3 + x + 1.
+ */
+ vshr.u64 \tmp, TWEAKV, #62
+ vshl.u64 TWEAKV, #2
+ vtbl.8 \tmp\()_L, {GF64MUL_TABLE}, \tmp\()_L
+ vtbl.8 \tmp\()_H, {GF64MUL_TABLE}, \tmp\()_H
+ veor TWEAKV, \tmp
+.endm
+
+/*
+ * _speck_xts_crypt() - Speck-XTS encryption/decryption
+ *
+ * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
+ * using Speck-XTS, specifically the variant with a block size of '2n' and round
+ * count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and
+ * the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a
+ * nonzero multiple of 128.
+ */
+.macro _speck_xts_crypt n, decrypting
+ push {r4-r7}
+ mov r7, sp
+
+ /*
+ * The first four parameters were passed in registers r0-r3. Load the
+ * additional parameters, which were passed on the stack.
+ */
+ ldr NBYTES, [sp, #16]
+ ldr TWEAK, [sp, #20]
+
+ /*
+ * If decrypting, modify the ROUND_KEYS parameter to point to the last
+ * round key rather than the first, since for decryption the round keys
+ * are used in reverse order.
+ */
+.if \decrypting
+.if \n == 64
+ add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #3
+ sub ROUND_KEYS, #8
+.else
+ add ROUND_KEYS, ROUND_KEYS, NROUNDS, lsl #2
+ sub ROUND_KEYS, #4
+.endif
+.endif
+
+ // Load the index vector for vtbl-based 8-bit rotates
+.if \decrypting
+ ldr r12, =.Lrol\n\()_8_table
+.else
+ ldr r12, =.Lror\n\()_8_table
+.endif
+ vld1.8 {ROTATE_TABLE}, [r12:64]
+
+ // One-time XTS preparation
+
+ /*
+ * Allocate stack space to store 128 bytes worth of tweaks. For
+ * performance, this space is aligned to a 16-byte boundary so that we
+ * can use the load/store instructions that declare 16-byte alignment.
+ */
+ sub sp, #128
+ bic sp, #0xf
+
+.if \n == 64
+ // Load first tweak
+ vld1.8 {TWEAKV}, [TWEAK]
+
+ // Load GF(2^128) multiplication table
+ ldr r12, =.Lgf128mul_table
+ vld1.8 {GF128MUL_TABLE}, [r12:64]
+.else
+ // Load first tweak
+ vld1.8 {TWEAKV_L}, [TWEAK]
+
+ // Load GF(2^64) multiplication table
+ ldr r12, =.Lgf64mul_table
+ vld1.8 {GF64MUL_TABLE}, [r12:64]
+
+ // Calculate second tweak, packing it together with the first
+ vshr.u64 TMP0_L, TWEAKV_L, #63
+ vtbl.u8 TMP0_L, {GF64MUL_TABLE}, TMP0_L
+ vshl.u64 TWEAKV_H, TWEAKV_L, #1
+ veor TWEAKV_H, TMP0_L
+.endif
+
+.Lnext_128bytes_\@:
+
+ /*
+ * Load the source blocks into {X,Y}[0-3], XOR them with their XTS tweak
+ * values, and save the tweaks on the stack for later. Then
+ * de-interleave the 'x' and 'y' elements of each block, i.e. make it so
+ * that the X[0-3] registers contain only the second halves of blocks,
+ * and the Y[0-3] registers contain only the first halves of blocks.
+ * (Speck uses the order (y, x) rather than the more intuitive (x, y).)
+ */
+ mov r12, sp
+.if \n == 64
+ _xts128_precrypt_one X0, r12, TMP0
+ _xts128_precrypt_one Y0, r12, TMP0
+ _xts128_precrypt_one X1, r12, TMP0
+ _xts128_precrypt_one Y1, r12, TMP0
+ _xts128_precrypt_one X2, r12, TMP0
+ _xts128_precrypt_one Y2, r12, TMP0
+ _xts128_precrypt_one X3, r12, TMP0
+ _xts128_precrypt_one Y3, r12, TMP0
+ vswp X0_L, Y0_H
+ vswp X1_L, Y1_H
+ vswp X2_L, Y2_H
+ vswp X3_L, Y3_H
+.else
+ _xts64_precrypt_two X0, r12, TMP0
+ _xts64_precrypt_two Y0, r12, TMP0
+ _xts64_precrypt_two X1, r12, TMP0
+ _xts64_precrypt_two Y1, r12, TMP0
+ _xts64_precrypt_two X2, r12, TMP0
+ _xts64_precrypt_two Y2, r12, TMP0
+ _xts64_precrypt_two X3, r12, TMP0
+ _xts64_precrypt_two Y3, r12, TMP0
+ vuzp.32 Y0, X0
+ vuzp.32 Y1, X1
+ vuzp.32 Y2, X2
+ vuzp.32 Y3, X3
+.endif
+
+ // Do the cipher rounds
+
+ mov r12, ROUND_KEYS
+ mov r6, NROUNDS
+
+.Lnext_round_\@:
+.if \decrypting
+.if \n == 64
+ vld1.64 ROUND_KEY_L, [r12]
+ sub r12, #8
+ vmov ROUND_KEY_H, ROUND_KEY_L
+.else
+ vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]
+ sub r12, #4
+.endif
+ _speck_unround_128bytes \n
+.else
+.if \n == 64
+ vld1.64 ROUND_KEY_L, [r12]!
+ vmov ROUND_KEY_H, ROUND_KEY_L
+.else
+ vld1.32 {ROUND_KEY_L[],ROUND_KEY_H[]}, [r12]!
+.endif
+ _speck_round_128bytes \n
+.endif
+ subs r6, r6, #1
+ bne .Lnext_round_\@
+
+ // Re-interleave the 'x' and 'y' elements of each block
+.if \n == 64
+ vswp X0_L, Y0_H
+ vswp X1_L, Y1_H
+ vswp X2_L, Y2_H
+ vswp X3_L, Y3_H
+.else
+ vzip.32 Y0, X0
+ vzip.32 Y1, X1
+ vzip.32 Y2, X2
+ vzip.32 Y3, X3
+.endif
+
+ // XOR the encrypted/decrypted blocks with the tweaks we saved earlier
+ mov r12, sp
+ vld1.8 {TMP0, TMP1}, [r12:128]!
+ vld1.8 {TMP2, TMP3}, [r12:128]!
+ veor X0, TMP0
+ veor Y0, TMP1
+ veor X1, TMP2
+ veor Y1, TMP3
+ vld1.8 {TMP0, TMP1}, [r12:128]!
+ vld1.8 {TMP2, TMP3}, [r12:128]!
+ veor X2, TMP0
+ veor Y2, TMP1
+ veor X3, TMP2
+ veor Y3, TMP3
+
+ // Store the ciphertext in the destination buffer
+ vst1.8 {X0, Y0}, [DST]!
+ vst1.8 {X1, Y1}, [DST]!
+ vst1.8 {X2, Y2}, [DST]!
+ vst1.8 {X3, Y3}, [DST]!
+
+ // Continue if there are more 128-byte chunks remaining, else return
+ subs NBYTES, #128
+ bne .Lnext_128bytes_\@
+
+ // Store the next tweak
+.if \n == 64
+ vst1.8 {TWEAKV}, [TWEAK]
+.else
+ vst1.8 {TWEAKV_L}, [TWEAK]
+.endif
+
+ mov sp, r7
+ pop {r4-r7}
+ bx lr
+.endm
+
+ENTRY(speck128_xts_encrypt_neon)
+ _speck_xts_crypt n=64, decrypting=0
+ENDPROC(speck128_xts_encrypt_neon)
+
+ENTRY(speck128_xts_decrypt_neon)
+ _speck_xts_crypt n=64, decrypting=1
+ENDPROC(speck128_xts_decrypt_neon)
+
+ENTRY(speck64_xts_encrypt_neon)
+ _speck_xts_crypt n=32, decrypting=0
+ENDPROC(speck64_xts_encrypt_neon)
+
+ENTRY(speck64_xts_decrypt_neon)
+ _speck_xts_crypt n=32, decrypting=1
+ENDPROC(speck64_xts_decrypt_neon)
diff --git a/arch/arm/crypto/speck-neon-glue.c b/arch/arm/crypto/speck-neon-glue.c
new file mode 100644
index 0000000..f012c3e
--- /dev/null
+++ b/arch/arm/crypto/speck-neon-glue.c
@@ -0,0 +1,288 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Note: the NIST recommendation for XTS only specifies a 128-bit block size,
+ * but a 64-bit version (needed for Speck64) is fairly straightforward; the math
+ * is just done in GF(2^64) instead of GF(2^128), with the reducing polynomial
+ * x^64 + x^4 + x^3 + x + 1 from the original XEX paper (Rogaway, 2004:
+ * "Efficient Instantiations of Tweakable Blockciphers and Refinements to Modes
+ * OCB and PMAC"), represented as 0x1B.
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <crypto/algapi.h>
+#include <crypto/gf128mul.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/speck.h>
+#include <crypto/xts.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* The assembly functions only handle multiples of 128 bytes */
+#define SPECK_NEON_CHUNK_SIZE 128
+
+/* Speck128 */
+
+struct speck128_xts_tfm_ctx {
+ struct speck128_tfm_ctx main_key;
+ struct speck128_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
+ u8 *, const u8 *);
+typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
+ const void *, unsigned int, void *);
+
+static __always_inline int
+__speck128_xts_crypt(struct skcipher_request *req,
+ speck128_crypt_one_t crypt_one,
+ speck128_xts_crypt_many_t crypt_many)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ le128 tweak;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+ while (walk.nbytes > 0) {
+ unsigned int nbytes = walk.nbytes;
+ u8 *dst = walk.dst.virt.addr;
+ const u8 *src = walk.src.virt.addr;
+
+ if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+ unsigned int count;
+
+ count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+ kernel_neon_begin();
+ (*crypt_many)(ctx->main_key.round_keys,
+ ctx->main_key.nrounds,
+ dst, src, count, &tweak);
+ kernel_neon_end();
+ dst += count;
+ src += count;
+ nbytes -= count;
+ }
+
+ /* Handle any remainder with generic code */
+ while (nbytes >= sizeof(tweak)) {
+ le128_xor((le128 *)dst, (const le128 *)src, &tweak);
+ (*crypt_one)(&ctx->main_key, dst, dst);
+ le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
+ gf128mul_x_ble(&tweak, &tweak);
+
+ dst += sizeof(tweak);
+ src += sizeof(tweak);
+ nbytes -= sizeof(tweak);
+ }
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+
+ return err;
+}
+
+static int speck128_xts_encrypt(struct skcipher_request *req)
+{
+ return __speck128_xts_crypt(req, crypto_speck128_encrypt,
+ speck128_xts_encrypt_neon);
+}
+
+static int speck128_xts_decrypt(struct skcipher_request *req)
+{
+ return __speck128_xts_crypt(req, crypto_speck128_decrypt,
+ speck128_xts_decrypt_neon);
+}
+
+static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
+ if (err)
+ return err;
+
+ return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+/* Speck64 */
+
+struct speck64_xts_tfm_ctx {
+ struct speck64_tfm_ctx main_key;
+ struct speck64_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
+ u8 *, const u8 *);
+typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
+ const void *, unsigned int, void *);
+
+static __always_inline int
+__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
+ speck64_xts_crypt_many_t crypt_many)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ __le64 tweak;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+ while (walk.nbytes > 0) {
+ unsigned int nbytes = walk.nbytes;
+ u8 *dst = walk.dst.virt.addr;
+ const u8 *src = walk.src.virt.addr;
+
+ if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+ unsigned int count;
+
+ count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+ kernel_neon_begin();
+ (*crypt_many)(ctx->main_key.round_keys,
+ ctx->main_key.nrounds,
+ dst, src, count, &tweak);
+ kernel_neon_end();
+ dst += count;
+ src += count;
+ nbytes -= count;
+ }
+
+ /* Handle any remainder with generic code */
+ while (nbytes >= sizeof(tweak)) {
+ *(__le64 *)dst = *(__le64 *)src ^ tweak;
+ (*crypt_one)(&ctx->main_key, dst, dst);
+ *(__le64 *)dst ^= tweak;
+ tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
+ ((tweak & cpu_to_le64(1ULL << 63)) ?
+ 0x1B : 0));
+ dst += sizeof(tweak);
+ src += sizeof(tweak);
+ nbytes -= sizeof(tweak);
+ }
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+
+ return err;
+}
+
+static int speck64_xts_encrypt(struct skcipher_request *req)
+{
+ return __speck64_xts_crypt(req, crypto_speck64_encrypt,
+ speck64_xts_encrypt_neon);
+}
+
+static int speck64_xts_decrypt(struct skcipher_request *req)
+{
+ return __speck64_xts_crypt(req, crypto_speck64_decrypt,
+ speck64_xts_decrypt_neon);
+}
+
+static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
+ if (err)
+ return err;
+
+ return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+static struct skcipher_alg speck_algs[] = {
+ {
+ .base.cra_name = "xts(speck128)",
+ .base.cra_driver_name = "xts-speck128-neon",
+ .base.cra_priority = 300,
+ .base.cra_blocksize = SPECK128_BLOCK_SIZE,
+ .base.cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx),
+ .base.cra_alignmask = 7,
+ .base.cra_module = THIS_MODULE,
+ .min_keysize = 2 * SPECK128_128_KEY_SIZE,
+ .max_keysize = 2 * SPECK128_256_KEY_SIZE,
+ .ivsize = SPECK128_BLOCK_SIZE,
+ .walksize = SPECK_NEON_CHUNK_SIZE,
+ .setkey = speck128_xts_setkey,
+ .encrypt = speck128_xts_encrypt,
+ .decrypt = speck128_xts_decrypt,
+ }, {
+ .base.cra_name = "xts(speck64)",
+ .base.cra_driver_name = "xts-speck64-neon",
+ .base.cra_priority = 300,
+ .base.cra_blocksize = SPECK64_BLOCK_SIZE,
+ .base.cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx),
+ .base.cra_alignmask = 7,
+ .base.cra_module = THIS_MODULE,
+ .min_keysize = 2 * SPECK64_96_KEY_SIZE,
+ .max_keysize = 2 * SPECK64_128_KEY_SIZE,
+ .ivsize = SPECK64_BLOCK_SIZE,
+ .walksize = SPECK_NEON_CHUNK_SIZE,
+ .setkey = speck64_xts_setkey,
+ .encrypt = speck64_xts_encrypt,
+ .decrypt = speck64_xts_decrypt,
+ }
+};
+
+static int __init speck_neon_module_init(void)
+{
+ if (!(elf_hwcap & HWCAP_NEON))
+ return -ENODEV;
+ return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_neon_module_exit(void)
+{
+ crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_neon_module_init);
+module_exit(speck_neon_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("xts(speck128)");
+MODULE_ALIAS_CRYPTO("xts-speck128-neon");
+MODULE_ALIAS_CRYPTO("xts(speck64)");
+MODULE_ALIAS_CRYPTO("xts-speck64-neon");
diff --git a/arch/arm/include/asm/elf.h b/arch/arm/include/asm/elf.h
index 8c5ca92..51f0d24 100644
--- a/arch/arm/include/asm/elf.h
+++ b/arch/arm/include/asm/elf.h
@@ -113,8 +113,12 @@
#define CORE_DUMP_USE_REGSET
#define ELF_EXEC_PAGESIZE 4096
-/* This is the base location for PIE (ET_DYN with INTERP) loads. */
-#define ELF_ET_DYN_BASE 0x400000UL
+/* This is the location that an ET_DYN program is loaded if exec'ed. Typical
+ use of this is to invoke "./ld.so someprog" to test out a new version of
+ the loader. We need to make sure that it is out of the way of the program
+ that it will "exec", and that there is sufficient room for the brk. */
+
+#define ELF_ET_DYN_BASE (TASK_SIZE / 3 * 2)
/* When the program starts, a1 contains a pointer to a function to be
registered with atexit, as per the SVR4 ABI. A value of 0 means we
diff --git a/arch/arm/include/asm/fiq_glue.h b/arch/arm/include/asm/fiq_glue.h
new file mode 100644
index 0000000..a9e244f9
--- /dev/null
+++ b/arch/arm/include/asm/fiq_glue.h
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __ASM_FIQ_GLUE_H
+#define __ASM_FIQ_GLUE_H
+
+struct fiq_glue_handler {
+ void (*fiq)(struct fiq_glue_handler *h, void *regs, void *svc_sp);
+ void (*resume)(struct fiq_glue_handler *h);
+};
+typedef void (*fiq_return_handler_t)(void);
+
+int fiq_glue_register_handler(struct fiq_glue_handler *handler);
+int fiq_glue_set_return_handler(fiq_return_handler_t fiq_return);
+int fiq_glue_clear_return_handler(fiq_return_handler_t fiq_return);
+
+#ifdef CONFIG_FIQ_GLUE
+void fiq_glue_resume(void);
+#else
+static inline void fiq_glue_resume(void) {}
+#endif
+
+#endif
diff --git a/arch/arm/include/asm/topology.h b/arch/arm/include/asm/topology.h
index f59ab9b..201dc20 100644
--- a/arch/arm/include/asm/topology.h
+++ b/arch/arm/include/asm/topology.h
@@ -25,6 +25,20 @@
void store_cpu_topology(unsigned int cpuid);
const struct cpumask *cpu_coregroup_mask(int cpu);
+#include <linux/arch_topology.h>
+
+/* Replace task scheduler's default frequency-invariant accounting */
+#define arch_scale_freq_capacity topology_get_freq_scale
+
+/* Replace task scheduler's default max-frequency-invariant accounting */
+#define arch_scale_max_freq_capacity topology_get_max_freq_scale
+
+/* Replace task scheduler's default cpu-invariant accounting */
+#define arch_scale_cpu_capacity topology_get_cpu_scale
+
+/* Enable topology flag updates */
+#define arch_update_cpu_topology topology_update_cpu_topology
+
#else
static inline void init_cpu_topology(void) { }
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe..923a725 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -141,6 +141,8 @@
static int kgdb_brk_fn(struct pt_regs *regs, unsigned int instr)
{
+ if (user_mode(regs))
+ return -1;
kgdb_handle_exception(1, SIGTRAP, 0, regs);
return 0;
@@ -148,6 +150,8 @@
static int kgdb_compiled_brk_fn(struct pt_regs *regs, unsigned int instr)
{
+ if (user_mode(regs))
+ return -1;
compiled_break = 1;
kgdb_handle_exception(1, SIGTRAP, 0, regs);
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index d96714e..4b675a8 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -94,6 +94,77 @@
ledtrig_cpu(CPU_LED_IDLE_END);
}
+/*
+ * dump a block of kernel memory from around the given address
+ */
+static void show_data(unsigned long addr, int nbytes, const char *name)
+{
+ int i, j;
+ int nlines;
+ u32 *p;
+
+ /*
+ * don't attempt to dump non-kernel addresses or
+ * values that are probably just small negative numbers
+ */
+ if (addr < PAGE_OFFSET || addr > -256UL)
+ return;
+
+ printk("\n%s: %#lx:\n", name, addr);
+
+ /*
+ * round address down to a 32 bit boundary
+ * and always dump a multiple of 32 bytes
+ */
+ p = (u32 *)(addr & ~(sizeof(u32) - 1));
+ nbytes += (addr & (sizeof(u32) - 1));
+ nlines = (nbytes + 31) / 32;
+
+
+ for (i = 0; i < nlines; i++) {
+ /*
+ * just display low 16 bits of address to keep
+ * each line of the dump < 80 characters
+ */
+ printk("%04lx ", (unsigned long)p & 0xffff);
+ for (j = 0; j < 8; j++) {
+ u32 data;
+ if (probe_kernel_address(p, data)) {
+ printk(" ********");
+ } else {
+ printk(" %08x", data);
+ }
+ ++p;
+ }
+ printk("\n");
+ }
+}
+
+static void show_extra_register_data(struct pt_regs *regs, int nbytes)
+{
+ mm_segment_t fs;
+
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+ show_data(regs->ARM_pc - nbytes, nbytes * 2, "PC");
+ show_data(regs->ARM_lr - nbytes, nbytes * 2, "LR");
+ show_data(regs->ARM_sp - nbytes, nbytes * 2, "SP");
+ show_data(regs->ARM_ip - nbytes, nbytes * 2, "IP");
+ show_data(regs->ARM_fp - nbytes, nbytes * 2, "FP");
+ show_data(regs->ARM_r0 - nbytes, nbytes * 2, "R0");
+ show_data(regs->ARM_r1 - nbytes, nbytes * 2, "R1");
+ show_data(regs->ARM_r2 - nbytes, nbytes * 2, "R2");
+ show_data(regs->ARM_r3 - nbytes, nbytes * 2, "R3");
+ show_data(regs->ARM_r4 - nbytes, nbytes * 2, "R4");
+ show_data(regs->ARM_r5 - nbytes, nbytes * 2, "R5");
+ show_data(regs->ARM_r6 - nbytes, nbytes * 2, "R6");
+ show_data(regs->ARM_r7 - nbytes, nbytes * 2, "R7");
+ show_data(regs->ARM_r8 - nbytes, nbytes * 2, "R8");
+ show_data(regs->ARM_r9 - nbytes, nbytes * 2, "R9");
+ show_data(regs->ARM_r10 - nbytes, nbytes * 2, "R10");
+ set_fs(fs);
+}
+
void __show_regs(struct pt_regs *regs)
{
unsigned long flags;
@@ -185,6 +256,8 @@
printk("Control: %08x%s\n", ctrl, buf);
}
#endif
+
+ show_extra_register_data(regs, 128);
}
void show_regs(struct pt_regs * regs)
diff --git a/arch/arm/kernel/reboot.c b/arch/arm/kernel/reboot.c
index 3b2aa9a..c742491 100644
--- a/arch/arm/kernel/reboot.c
+++ b/arch/arm/kernel/reboot.c
@@ -6,6 +6,7 @@
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
+#include <linux/console.h>
#include <linux/cpu.h>
#include <linux/delay.h>
#include <linux/reboot.h>
@@ -125,6 +126,31 @@
pm_power_off();
}
+#ifdef CONFIG_ARM_FLUSH_CONSOLE_ON_RESTART
+void arm_machine_flush_console(void)
+{
+ printk("\n");
+ pr_emerg("Restarting %s\n", linux_banner);
+ if (console_trylock()) {
+ console_unlock();
+ return;
+ }
+
+ mdelay(50);
+
+ local_irq_disable();
+ if (!console_trylock())
+ pr_emerg("arm_restart: Console was locked! Busting\n");
+ else
+ pr_emerg("arm_restart: Console was locked!\n");
+ console_unlock();
+}
+#else
+void arm_machine_flush_console(void)
+{
+}
+#endif
+
/*
* Restart requires that the secondary CPUs stop performing any activity
* while the primary CPU resets the system. Systems with a single CPU can
@@ -141,6 +167,10 @@
local_irq_disable();
smp_send_stop();
+ /* Flush the console to make sure all the relevant messages make it
+ * out to the console drivers */
+ arm_machine_flush_console();
+
if (arm_pm_restart)
arm_pm_restart(reboot_mode, cmd);
else
diff --git a/arch/arm/kernel/topology.c b/arch/arm/kernel/topology.c
index 24ac3ca..28ca164 100644
--- a/arch/arm/kernel/topology.c
+++ b/arch/arm/kernel/topology.c
@@ -23,6 +23,7 @@
#include <linux/of.h>
#include <linux/sched.h>
#include <linux/sched/topology.h>
+#include <linux/sched/energy.h>
#include <linux/slab.h>
#include <linux/string.h>
@@ -30,6 +31,18 @@
#include <asm/cputype.h>
#include <asm/topology.h>
+static inline
+const struct sched_group_energy * const cpu_core_energy(int cpu)
+{
+ return sge_array[cpu][SD_LEVEL0];
+}
+
+static inline
+const struct sched_group_energy * const cpu_cluster_energy(int cpu)
+{
+ return sge_array[cpu][SD_LEVEL1];
+}
+
/*
* cpu capacity scale management
*/
@@ -278,23 +291,37 @@
update_cpu_capacity(cpuid);
+ topology_detect_flags();
+
pr_info("CPU%u: thread %d, cpu %d, socket %d, mpidr %x\n",
cpuid, cpu_topology[cpuid].thread_id,
cpu_topology[cpuid].core_id,
cpu_topology[cpuid].socket_id, mpidr);
}
+#ifdef CONFIG_SCHED_MC
+static int core_flags(void)
+{
+ return cpu_core_flags() | topology_core_flags();
+}
+
static inline int cpu_corepower_flags(void)
{
- return SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN;
+ return topology_core_flags()
+ | SD_SHARE_PKG_RESOURCES | SD_SHARE_POWERDOMAIN;
+}
+#endif
+
+static int cpu_flags(void)
+{
+ return topology_cpu_flags();
}
static struct sched_domain_topology_level arm_topology[] = {
#ifdef CONFIG_SCHED_MC
- { cpu_corepower_mask, cpu_corepower_flags, SD_INIT_NAME(GMC) },
- { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
+ { cpu_coregroup_mask, core_flags, cpu_core_energy, SD_INIT_NAME(MC) },
#endif
- { cpu_cpu_mask, SD_INIT_NAME(DIE) },
+ { cpu_cpu_mask, cpu_flags, cpu_cluster_energy, SD_INIT_NAME(DIE) },
{ NULL, },
};
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 2465995..11da0f5 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -270,6 +270,11 @@
* - end - virtual end address of region
*/
ENTRY(v6_dma_flush_range)
+#ifdef CONFIG_CACHE_FLUSH_RANGE_LIMIT
+ sub r2, r1, r0
+ cmp r2, #CONFIG_CACHE_FLUSH_RANGE_LIMIT
+ bhi v6_dma_flush_dcache_all
+#endif
#ifdef CONFIG_DMA_CACHE_RWFO
ldrb r2, [r0] @ read for ownership
strb r2, [r0] @ write for ownership
@@ -292,6 +297,18 @@
mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
ret lr
+#ifdef CONFIG_CACHE_FLUSH_RANGE_LIMIT
+v6_dma_flush_dcache_all:
+ mov r0, #0
+#ifdef HARVARD_CACHE
+ mcr p15, 0, r0, c7, c14, 0 @ D cache clean+invalidate
+#else
+ mcr p15, 0, r0, c7, c15, 0 @ Cache clean+invalidate
+#endif
+ mcr p15, 0, r0, c7, c10, 4 @ drain write buffer
+ mov pc, lr
+#endif
+
/*
* dma_map_area(start, size, dir)
* - start - kernel virtual start address
diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
index 42f5853..6123d12 100644
--- a/arch/arm/mm/fault.c
+++ b/arch/arm/mm/fault.c
@@ -274,10 +274,10 @@
local_irq_enable();
/*
- * If we're in an interrupt or have no user
+ * If we're in an interrupt, or have no irqs, or have no user
* context, we must not take the fault..
*/
- if (faulthandler_disabled() || !mm)
+ if (faulthandler_disabled() || irqs_disabled() || !mm)
goto no_context;
if (user_mode(regs))
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index c30cd78..71c5559 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -24,6 +24,7 @@
select ARCH_HAVE_NMI_SAFE_CMPXCHG if ACPI_APEI_SEA
select ARCH_USE_CMPXCHG_LOCKREF
select ARCH_SUPPORTS_MEMORY_FAILURE
+ select ARCH_SUPPORTS_LTO_CLANG
select ARCH_SUPPORTS_ATOMIC_RMW
select ARCH_SUPPORTS_NUMA_BALANCING
select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
@@ -433,7 +434,7 @@
config ARM64_ERRATUM_843419
bool "Cortex-A53: 843419: A load or store might access an incorrect address"
- default y
+ default y if !LTO_CLANG
select ARM64_MODULE_CMODEL_LARGE if MODULES
help
This option links the kernel with '--fix-cortex-a53-843419' and
@@ -1069,7 +1070,7 @@
config RANDOMIZE_MODULE_REGION_FULL
bool "Randomize the module region independently from the core kernel"
- depends on RANDOMIZE_BASE
+ depends on RANDOMIZE_BASE && !LTO_CLANG
default y
help
Randomizes the location of the module region without considering the
@@ -1103,6 +1104,23 @@
entering them here. As a minimum, you should specify the the
root device (e.g. root=/dev/nfs).
+choice
+ prompt "Kernel command line type" if CMDLINE != ""
+ default CMDLINE_FROM_BOOTLOADER
+
+config CMDLINE_FROM_BOOTLOADER
+ bool "Use bootloader kernel arguments if available"
+ help
+ Uses the command-line options passed by the boot loader. If
+ the boot loader doesn't provide any, the default kernel command
+ string provided in CMDLINE will be used.
+
+config CMDLINE_EXTEND
+ bool "Extend bootloader kernel arguments"
+ help
+ The command-line arguments provided by the boot loader will be
+ appended to the default kernel command string.
+
config CMDLINE_FORCE
bool "Always use the default kernel command string"
help
@@ -1110,6 +1128,7 @@
loader passes other arguments to the kernel.
This is useful if you cannot or don't want to change the
command-line options your boot loader passes to the kernel.
+endchoice
config EFI_STUB
bool
@@ -1142,6 +1161,41 @@
However, even with this option, the resultant kernel should
continue to boot on existing non-UEFI platforms.
+config BUILD_ARM64_APPENDED_DTB_IMAGE
+ bool "Build a concatenated Image.gz/dtb by default"
+ depends on OF
+ help
+ Enabling this option will cause a concatenated Image.gz and list of
+ DTBs to be built by default (instead of a standalone Image.gz.)
+ The image will built in arch/arm64/boot/Image.gz-dtb
+
+choice
+ prompt "Appended DTB Kernel Image name"
+ depends on BUILD_ARM64_APPENDED_DTB_IMAGE
+ help
+ Enabling this option will cause a specific kernel image Image or
+ Image.gz to be used for final image creation.
+ The image will built in arch/arm64/boot/IMAGE-NAME-dtb
+
+ config IMG_GZ_DTB
+ bool "Image.gz-dtb"
+ config IMG_DTB
+ bool "Image-dtb"
+endchoice
+
+config BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME
+ string
+ depends on BUILD_ARM64_APPENDED_DTB_IMAGE
+ default "Image.gz-dtb" if IMG_GZ_DTB
+ default "Image-dtb" if IMG_DTB
+
+config BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES
+ string "Default dtb names"
+ depends on BUILD_ARM64_APPENDED_DTB_IMAGE
+ help
+ Space separated list of names of dtbs to append when
+ building a concatenated Image.gz-dtb.
+
endmenu
menu "Userspace binary formats"
diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index 7318165..e64f5f0 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -26,8 +26,17 @@
ifeq ($(call ld-option, --fix-cortex-a53-843419),)
$(warning ld does not support --fix-cortex-a53-843419; kernel may be susceptible to erratum)
else
+ ifeq ($(call gold-ifversion, -lt, 114000000, y), y)
+$(warning This version of GNU gold may generate incorrect code with --fix-cortex-a53-843419;\
+ see https://sourceware.org/bugzilla/show_bug.cgi?id=21491)
+ endif
LDFLAGS_vmlinux += --fix-cortex-a53-843419
endif
+else
+ ifeq ($(ld-name),gold)
+# Pass --no-fix-cortex-a53-843419 to ensure the erratum fix is disabled
+LDFLAGS += --no-fix-cortex-a53-843419
+ endif
endif
KBUILD_DEFCONFIG := defconfig
@@ -49,9 +58,17 @@
endif
endif
-KBUILD_CFLAGS += -mgeneral-regs-only $(lseinstr) $(brokengasinst)
+ifeq ($(cc-name),clang)
+# This is a workaround for https://bugs.llvm.org/show_bug.cgi?id=30792.
+# TODO: revert when this is fixed in LLVM.
+KBUILD_CFLAGS += -mno-implicit-float
+else
+KBUILD_CFLAGS += -mgeneral-regs-only
+endif
+KBUILD_CFLAGS += $(lseinstr) $(brokengasinst)
KBUILD_CFLAGS += -fno-asynchronous-unwind-tables
KBUILD_CFLAGS += $(call cc-option, -mpc-relative-literal-loads)
+KBUILD_CFLAGS += -fno-pic
KBUILD_AFLAGS += $(lseinstr) $(brokengasinst)
KBUILD_CFLAGS += $(call cc-option,-mabi=lp64)
@@ -62,14 +79,22 @@
CHECKFLAGS += -D__AARCH64EB__
AS += -EB
LD += -EB
+ifeq ($(ld-name),gold)
+LDFLAGS += -maarch64_elf64_be_vec
+else
LDFLAGS += -maarch64linuxb
+endif
UTS_MACHINE := aarch64_be
else
KBUILD_CPPFLAGS += -mlittle-endian
CHECKFLAGS += -D__AARCH64EL__
AS += -EL
LD += -EL
+ifeq ($(ld-name),gold)
+LDFLAGS += -maarch64_elf64_le_vec
+else
LDFLAGS += -maarch64linux
+endif
UTS_MACHINE := aarch64
endif
@@ -77,6 +102,10 @@
ifeq ($(CONFIG_ARM64_MODULE_CMODEL_LARGE), y)
KBUILD_CFLAGS_MODULE += -mcmodel=large
+ifeq ($(CONFIG_LTO_CLANG), y)
+# Code model is not stored in LLVM IR, so we need to pass it also to LLVMgold
+LDFLAGS += -plugin-opt=-code-model=large
+endif
endif
ifeq ($(CONFIG_ARM64_MODULE_PLTS),y)
@@ -95,6 +124,10 @@
TEXT_OFFSET := 0x00080000
endif
+ifeq ($(cc-name),clang)
+KBUILD_CFLAGS += $(call cc-disable-warning, asm-operand-widths)
+endif
+
# KASAN_SHADOW_OFFSET = VA_START + (1 << (VA_BITS - 3)) - (1 << 61)
# in 32-bit arithmetic
KASAN_SHADOW_OFFSET := $(shell printf "0x%08x00000000\n" $$(( \
@@ -114,10 +147,15 @@
# Default target when executing plain make
boot := arch/arm64/boot
+ifeq ($(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE),y)
+KBUILD_IMAGE := $(boot)/$(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME))
+else
KBUILD_IMAGE := $(boot)/Image.gz
+endif
+
KBUILD_DTBS := dtbs
-all: Image.gz $(KBUILD_DTBS)
+all: Image.gz $(KBUILD_DTBS) $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_KERNEL_IMAGE_NAME))
Image: vmlinux
@@ -140,6 +178,12 @@
dtbs_install:
$(Q)$(MAKE) $(dtbinst)=$(boot)/dts
+Image-dtb: vmlinux scripts dtbs
+ $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
+
+Image.gz-dtb: vmlinux scripts dtbs Image.gz
+ $(Q)$(MAKE) $(build)=$(boot) $(boot)/$@
+
PHONY += vdso_install
vdso_install:
$(Q)$(MAKE) $(build)=arch/arm64/kernel/vdso $@
diff --git a/arch/arm64/boot/.gitignore b/arch/arm64/boot/.gitignore
index 8dab0bb..34e3520 100644
--- a/arch/arm64/boot/.gitignore
+++ b/arch/arm64/boot/.gitignore
@@ -1,2 +1,4 @@
Image
+Image-dtb
Image.gz
+Image.gz-dtb
diff --git a/arch/arm64/boot/Makefile b/arch/arm64/boot/Makefile
index 1f012c5..2c8cb86 100644
--- a/arch/arm64/boot/Makefile
+++ b/arch/arm64/boot/Makefile
@@ -14,16 +14,29 @@
# Based on the ia64 boot/Makefile.
#
+include $(srctree)/arch/arm64/boot/dts/Makefile
+
OBJCOPYFLAGS_Image :=-O binary -R .note -R .note.gnu.build-id -R .comment -S
targets := Image Image.gz
+DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES))
+ifneq ($(DTB_NAMES),)
+DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES))
+else
+DTB_LIST := $(dtb-y)
+endif
+DTB_OBJS := $(addprefix $(obj)/dts/,$(DTB_LIST))
+
$(obj)/Image: vmlinux FORCE
$(call if_changed,objcopy)
$(obj)/Image.bz2: $(obj)/Image FORCE
$(call if_changed,bzip2)
+$(obj)/Image-dtb: $(obj)/Image $(DTB_OBJS) FORCE
+ $(call if_changed,cat)
+
$(obj)/Image.gz: $(obj)/Image FORCE
$(call if_changed,gzip)
@@ -36,6 +49,9 @@
$(obj)/Image.lzo: $(obj)/Image FORCE
$(call if_changed,lzo)
+$(obj)/Image.gz-dtb: $(obj)/Image.gz $(DTB_OBJS) FORCE
+ $(call if_changed,cat)
+
install:
$(CONFIG_SHELL) $(srctree)/$(src)/install.sh $(KERNELRELEASE) \
$(obj)/Image System.map "$(INSTALL_PATH)"
diff --git a/arch/arm64/boot/dts/Makefile b/arch/arm64/boot/dts/Makefile
index c6684ab..db5a708 100644
--- a/arch/arm64/boot/dts/Makefile
+++ b/arch/arm64/boot/dts/Makefile
@@ -32,3 +32,17 @@
dtb-$(CONFIG_OF_ALL_DTBS) := $(patsubst $(dtstree)/%.dts,%.dtb, $(foreach d,$(dts-dirs), $(wildcard $(dtstree)/$(d)/*.dts)))
always := $(dtb-y)
+
+targets += dtbs
+
+DTB_NAMES := $(subst $\",,$(CONFIG_BUILD_ARM64_APPENDED_DTB_IMAGE_NAMES))
+ifneq ($(DTB_NAMES),)
+DTB_LIST := $(addsuffix .dtb,$(DTB_NAMES))
+else
+DTB_LIST := $(dtb-y)
+endif
+targets += $(DTB_LIST)
+
+dtbs: $(addprefix $(obj)/, $(DTB_LIST))
+
+clean-files := dts/*.dtb *.dtb
diff --git a/arch/arm64/boot/dts/arm/juno-r2.dts b/arch/arm64/boot/dts/arm/juno-r2.dts
index b39b6d6..d2467e4 100644
--- a/arch/arm64/boot/dts/arm/juno-r2.dts
+++ b/arch/arm64/boot/dts/arm/juno-r2.dts
@@ -98,6 +98,7 @@
next-level-cache = <&A72_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A72 &CLUSTER_COST_A72>;
capacity-dmips-mhz = <1024>;
};
@@ -115,6 +116,7 @@
next-level-cache = <&A72_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A72 &CLUSTER_COST_A72>;
capacity-dmips-mhz = <1024>;
};
@@ -132,6 +134,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53R2 &CLUSTER_COST_A53R2>;
capacity-dmips-mhz = <485>;
};
@@ -149,6 +152,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53R2 &CLUSTER_COST_A53R2>;
capacity-dmips-mhz = <485>;
};
@@ -166,6 +170,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53R2 &CLUSTER_COST_A53R2>;
capacity-dmips-mhz = <485>;
};
@@ -183,6 +188,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53R2 &CLUSTER_COST_A53R2>;
capacity-dmips-mhz = <485>;
};
@@ -199,6 +205,7 @@
cache-line-size = <64>;
cache-sets = <1024>;
};
+ /include/ "juno-sched-energy.dtsi"
};
pmu_a72 {
diff --git a/arch/arm64/boot/dts/arm/juno-sched-energy.dtsi b/arch/arm64/boot/dts/arm/juno-sched-energy.dtsi
new file mode 100644
index 0000000..221196e
--- /dev/null
+++ b/arch/arm64/boot/dts/arm/juno-sched-energy.dtsi
@@ -0,0 +1,123 @@
+/*
+ * ARM JUNO specific energy cost model data. There are no unit requirements for
+ * the data. Data can be normalized to any reference point, but the
+ * normalization must be consistent. That is, one bogo-joule/watt must be the
+ * same quantity for all data, but we don't care what it is.
+ */
+
+energy-costs {
+ /* Juno r0 Energy */
+ CPU_COST_A57: core-cost0 {
+ busy-cost-data = <
+ 417 168
+ 579 251
+ 744 359
+ 883 479
+ 1024 616
+ >;
+ idle-cost-data = <
+ 15
+ 15
+ 0
+ 0
+ >;
+ };
+ CPU_COST_A53: core-cost1 {
+ busy-cost-data = <
+ 235 33
+ 302 46
+ 368 61
+ 406 76
+ 446 93
+ >;
+ idle-cost-data = <
+ 6
+ 6
+ 0
+ 0
+ >;
+ };
+ CLUSTER_COST_A57: cluster-cost0 {
+ busy-cost-data = <
+ 417 24
+ 579 32
+ 744 43
+ 883 49
+ 1024 64
+ >;
+ idle-cost-data = <
+ 65
+ 65
+ 65
+ 24
+ >;
+ };
+ CLUSTER_COST_A53: cluster-cost1 {
+ busy-cost-data = <
+ 235 26
+ 302 30
+ 368 39
+ 406 47
+ 446 57
+ >;
+ idle-cost-data = <
+ 56
+ 56
+ 56
+ 17
+ >;
+ };
+ /* Juno r2 Energy */
+ CPU_COST_A72: core-cost2 {
+ busy-cost-data = <
+ 501 174
+ 849 344
+ 1024 526
+ >;
+ idle-cost-data = <
+ 48
+ 48
+ 0
+ 0
+ >;
+ };
+ CPU_COST_A53R2: core-cost3 {
+ busy-cost-data = <
+ 276 37
+ 501 59
+ 593 117
+ >;
+ idle-cost-data = <
+ 33
+ 33
+ 0
+ 0
+ >;
+ };
+ CLUSTER_COST_A72: cluster-cost2 {
+ busy-cost-data = <
+ 501 48
+ 849 73
+ 1024 107
+ >;
+ idle-cost-data = <
+ 48
+ 48
+ 48
+ 18
+ >;
+ };
+ CLUSTER_COST_A53R2: cluster-cost3 {
+ busy-cost-data = <
+ 276 41
+ 501 86
+ 593 107
+ >;
+ idle-cost-data = <
+ 41
+ 41
+ 41
+ 14
+ >;
+ };
+};
diff --git a/arch/arm64/boot/dts/arm/juno.dts b/arch/arm64/boot/dts/arm/juno.dts
index c9236c4..ae5306a 100644
--- a/arch/arm64/boot/dts/arm/juno.dts
+++ b/arch/arm64/boot/dts/arm/juno.dts
@@ -97,6 +97,7 @@
next-level-cache = <&A57_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A57 &CLUSTER_COST_A57>;
capacity-dmips-mhz = <1024>;
};
@@ -114,6 +115,7 @@
next-level-cache = <&A57_L2>;
clocks = <&scpi_dvfs 0>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A57 &CLUSTER_COST_A57>;
capacity-dmips-mhz = <1024>;
};
@@ -131,6 +133,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>;
capacity-dmips-mhz = <578>;
};
@@ -148,6 +151,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>;
capacity-dmips-mhz = <578>;
};
@@ -165,6 +169,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>;
capacity-dmips-mhz = <578>;
};
@@ -182,6 +187,7 @@
next-level-cache = <&A53_L2>;
clocks = <&scpi_dvfs 1>;
cpu-idle-states = <&CPU_SLEEP_0 &CLUSTER_SLEEP_0>;
+ sched-energy-costs = <&CPU_COST_A53 &CLUSTER_COST_A53>;
capacity-dmips-mhz = <578>;
};
@@ -198,6 +204,7 @@
cache-line-size = <64>;
cache-sets = <1024>;
};
+ /include/ "juno-sched-energy.dtsi"
};
pmu_a57 {
diff --git a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
index ff1dc89..66d48e3 100644
--- a/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
+++ b/arch/arm64/boot/dts/hisilicon/hi6220.dtsi
@@ -92,7 +92,9 @@
cooling-max-level = <0>;
#cooling-cells = <2>; /* min followed by max */
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ sched-energy-costs = <&CPU_COST &CLUSTER_COST &SYSTEM_COST>;
dynamic-power-coefficient = <311>;
+ capacity-dmips-mhz = <1024>;
};
cpu1: cpu@1 {
@@ -103,6 +105,8 @@
next-level-cache = <&CLUSTER0_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ sched-energy-costs = <&CPU_COST &CLUSTER_COST &SYSTEM_COST>;
+ capacity-dmips-mhz = <1024>;
};
cpu2: cpu@2 {
@@ -113,6 +117,7 @@
next-level-cache = <&CLUSTER0_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ capacity-dmips-mhz = <1024>;
};
cpu3: cpu@3 {
@@ -123,6 +128,8 @@
next-level-cache = <&CLUSTER0_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ sched-energy-costs = <&CPU_COST &CLUSTER_COST &SYSTEM_COST>;
+ capacity-dmips-mhz = <1024>;
};
cpu4: cpu@100 {
@@ -133,6 +140,7 @@
next-level-cache = <&CLUSTER1_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ capacity-dmips-mhz = <1024>;
};
cpu5: cpu@101 {
@@ -143,6 +151,8 @@
next-level-cache = <&CLUSTER1_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ sched-energy-costs = <&CPU_COST &CLUSTER_COST &SYSTEM_COST>;
+ capacity-dmips-mhz = <1024>;
};
cpu6: cpu@102 {
@@ -153,6 +163,8 @@
next-level-cache = <&CLUSTER1_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ sched-energy-costs = <&CPU_COST &CLUSTER_COST &SYSTEM_COST>;
+ capacity-dmips-mhz = <1024>;
};
cpu7: cpu@103 {
@@ -163,6 +175,8 @@
next-level-cache = <&CLUSTER1_L2>;
operating-points-v2 = <&cpu_opp_table>;
cpu-idle-states = <&CPU_SLEEP &CLUSTER_SLEEP>;
+ sched-energy-costs = <&CPU_COST &CLUSTER_COST &SYSTEM_COST>;
+ capacity-dmips-mhz = <1024>;
};
CLUSTER0_L2: l2-cache0 {
@@ -172,6 +186,50 @@
CLUSTER1_L2: l2-cache1 {
compatible = "cache";
};
+
+ energy-costs {
+ SYSTEM_COST: system-cost0 {
+ busy-cost-data = <
+ 1024 0
+ >;
+ idle-cost-data = <
+ 0
+ 0
+ 0
+ 0
+ >;
+ };
+ CLUSTER_COST: cluster-cost0 {
+ busy-cost-data = <
+ 178 16
+ 369 29
+ 622 47
+ 819 75
+ 1024 112
+ >;
+ idle-cost-data = <
+ 107
+ 107
+ 47
+ 0
+ >;
+ };
+ CPU_COST: core-cost0 {
+ busy-cost-data = <
+ 178 69
+ 369 125
+ 622 224
+ 819 367
+ 1024 670
+ >;
+ idle-cost-data = <
+ 15
+ 15
+ 0
+ 0
+ >;
+ };
+ };
};
cpu_opp_table: cpu_opp_table {
diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index b057965..14f170f 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -4,6 +4,7 @@
CONFIG_NO_HZ_IDLE=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_IRQ_TIME_ACCOUNTING=y
+CONFIG_SCHED_WALT=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
CONFIG_TASKSTATS=y
@@ -24,8 +25,9 @@
CONFIG_CGROUP_PERF=y
CONFIG_USER_NS=y
CONFIG_SCHED_AUTOGROUP=y
+CONFIG_SCHED_TUNE=y
+CONFIG_DEFAULT_USE_ENERGY_AWARE=y
CONFIG_BLK_DEV_INITRD=y
-CONFIG_KALLSYMS_ALL=y
# CONFIG_COMPAT_BRK is not set
CONFIG_PROFILING=y
CONFIG_JUMP_LABEL=y
@@ -69,13 +71,13 @@
CONFIG_PCI_LAYERSCAPE=y
CONFIG_PCI_HISI=y
CONFIG_PCIE_QCOM=y
-CONFIG_PCIE_KIRIN=y
CONFIG_PCIE_ARMADA_8K=y
+CONFIG_PCIE_KIRIN=y
CONFIG_PCI_AARDVARK=y
CONFIG_PCIE_RCAR=y
-CONFIG_PCIE_ROCKCHIP=m
CONFIG_PCI_HOST_GENERIC=y
CONFIG_PCI_XGENE=y
+CONFIG_PCIE_ROCKCHIP=m
CONFIG_ARM64_VA_BITS_48=y
CONFIG_SCHED_MC=y
CONFIG_NUMA=y
@@ -93,6 +95,12 @@
CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
CONFIG_ARM_CPUIDLE=y
CONFIG_CPU_FREQ=y
+CONFIG_CPU_FREQ_STAT=y
+CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL=y
+CONFIG_CPU_FREQ_GOV_POWERSAVE=y
+CONFIG_CPU_FREQ_GOV_USERSPACE=y
+CONFIG_CPU_FREQ_GOV_ONDEMAND=y
+CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y
CONFIG_CPUFREQ_DT=y
CONFIG_ARM_BIG_LITTLE_CPUFREQ=y
CONFIG_ARM_SCPI_CPUFREQ=y
@@ -140,11 +148,10 @@
CONFIG_BT_LEDS=y
# CONFIG_BT_DEBUGFS is not set
CONFIG_BT_HCIUART=m
-CONFIG_BT_HCIUART_LL=y
-CONFIG_CFG80211=m
-CONFIG_MAC80211=m
+CONFIG_CFG80211=y
+CONFIG_MAC80211=y
CONFIG_MAC80211_LEDS=y
-CONFIG_RFKILL=m
+CONFIG_RFKILL=y
CONFIG_NET_9P=y
CONFIG_NET_9P_VIRTIO=y
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
@@ -210,21 +217,16 @@
CONFIG_ROCKCHIP_PHY=y
CONFIG_USB_PEGASUS=m
CONFIG_USB_RTL8150=m
-CONFIG_USB_RTL8152=m
-CONFIG_USB_USBNET=m
-CONFIG_USB_NET_DM9601=m
-CONFIG_USB_NET_SR9800=m
-CONFIG_USB_NET_SMSC75XX=m
-CONFIG_USB_NET_SMSC95XX=m
-CONFIG_USB_NET_PLUSB=m
-CONFIG_USB_NET_MCS7830=m
+CONFIG_USB_RTL8152=y
CONFIG_BRCMFMAC=m
+CONFIG_RTL_CARDS=m
CONFIG_WL18XX=m
CONFIG_WLCORE_SDIO=m
+CONFIG_USB_NET_RNDIS_WLAN=y
CONFIG_INPUT_EVDEV=y
CONFIG_KEYBOARD_ADC=m
-CONFIG_KEYBOARD_CROS_EC=y
CONFIG_KEYBOARD_GPIO=y
+CONFIG_KEYBOARD_CROS_EC=y
CONFIG_INPUT_MISC=y
CONFIG_INPUT_PM8941_PWRKEY=y
CONFIG_INPUT_HISI_POWERKEY=y
@@ -275,20 +277,20 @@
CONFIG_I2C_RCAR=y
CONFIG_I2C_CROS_EC_TUNNEL=y
CONFIG_SPI=y
-CONFIG_SPI_MESON_SPICC=m
-CONFIG_SPI_MESON_SPIFC=m
CONFIG_SPI_BCM2835=m
CONFIG_SPI_BCM2835AUX=m
+CONFIG_SPI_MESON_SPICC=m
+CONFIG_SPI_MESON_SPIFC=m
CONFIG_SPI_ORION=y
CONFIG_SPI_PL022=y
-CONFIG_SPI_QUP=y
CONFIG_SPI_ROCKCHIP=y
+CONFIG_SPI_QUP=y
CONFIG_SPI_S3C64XX=y
CONFIG_SPI_SPIDEV=m
CONFIG_SPMI=y
-CONFIG_PINCTRL_IPQ8074=y
CONFIG_PINCTRL_SINGLE=y
CONFIG_PINCTRL_MAX77620=y
+CONFIG_PINCTRL_IPQ8074=y
CONFIG_PINCTRL_MSM8916=y
CONFIG_PINCTRL_MSM8994=y
CONFIG_PINCTRL_MSM8996=y
@@ -315,9 +317,8 @@
CONFIG_THERMAL_GOV_POWER_ALLOCATOR=y
CONFIG_CPU_THERMAL=y
CONFIG_THERMAL_EMULATION=y
-CONFIG_BRCMSTB_THERMAL=m
-CONFIG_EXYNOS_THERMAL=y
CONFIG_ROCKCHIP_THERMAL=m
+CONFIG_EXYNOS_THERMAL=y
CONFIG_WATCHDOG=y
CONFIG_S3C2410_WATCHDOG=y
CONFIG_MESON_GXBB_WATCHDOG=m
@@ -336,9 +337,9 @@
CONFIG_MFD_SPMI_PMIC=y
CONFIG_MFD_RK808=y
CONFIG_MFD_SEC_CORE=y
+CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_AXP20X=y
CONFIG_REGULATOR_FAN53555=y
-CONFIG_REGULATOR_FIXED_VOLTAGE=y
CONFIG_REGULATOR_GPIO=y
CONFIG_REGULATOR_HI6421V530=y
CONFIG_REGULATOR_HI655X=y
@@ -348,16 +349,13 @@
CONFIG_REGULATOR_QCOM_SPMI=y
CONFIG_REGULATOR_RK808=y
CONFIG_REGULATOR_S2MPS11=y
+CONFIG_RC_DEVICES=y
+CONFIG_IR_MESON=m
CONFIG_MEDIA_SUPPORT=m
CONFIG_MEDIA_CAMERA_SUPPORT=y
CONFIG_MEDIA_ANALOG_TV_SUPPORT=y
CONFIG_MEDIA_DIGITAL_TV_SUPPORT=y
CONFIG_MEDIA_CONTROLLER=y
-CONFIG_MEDIA_RC_SUPPORT=y
-CONFIG_RC_CORE=m
-CONFIG_RC_DEVICES=y
-CONFIG_RC_DECODERS=y
-CONFIG_IR_MESON=m
CONFIG_VIDEO_V4L2_SUBDEV_API=y
# CONFIG_DVB_NET is not set
CONFIG_V4L_MEM2MEM_DRIVERS=y
@@ -395,7 +393,6 @@
CONFIG_BACKLIGHT_GENERIC=m
CONFIG_BACKLIGHT_PWM=m
CONFIG_BACKLIGHT_LP855X=m
-CONFIG_FRAMEBUFFER_CONSOLE=y
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
@@ -494,7 +491,6 @@
CONFIG_COMMON_CLK_RK808=y
CONFIG_COMMON_CLK_SCPI=y
CONFIG_COMMON_CLK_CS2000_CP=y
-CONFIG_COMMON_CLK_S2MPS11=y
CONFIG_CLK_QORIQ=y
CONFIG_COMMON_CLK_PWM=y
CONFIG_COMMON_CLK_QCOM=y
@@ -533,13 +529,13 @@
CONFIG_PWM_ROCKCHIP=y
CONFIG_PWM_SAMSUNG=y
CONFIG_PWM_TEGRA=m
-CONFIG_PHY_RCAR_GEN3_USB2=y
-CONFIG_PHY_HI6220_USB=y
-CONFIG_PHY_SUN4I_USB=y
-CONFIG_PHY_ROCKCHIP_INNO_USB2=y
-CONFIG_PHY_ROCKCHIP_EMMC=y
-CONFIG_PHY_ROCKCHIP_PCIE=m
CONFIG_PHY_XGENE=y
+CONFIG_PHY_SUN4I_USB=y
+CONFIG_PHY_HI6220_USB=y
+CONFIG_PHY_RCAR_GEN3_USB2=y
+CONFIG_PHY_ROCKCHIP_EMMC=y
+CONFIG_PHY_ROCKCHIP_INNO_USB2=y
+CONFIG_PHY_ROCKCHIP_PCIE=m
CONFIG_PHY_TEGRA_XUSB=y
CONFIG_QCOM_L2_PMU=y
CONFIG_QCOM_L3_PMU=y
@@ -581,29 +577,27 @@
CONFIG_KVM=y
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_INFO=y
-CONFIG_DEBUG_FS=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
-CONFIG_LOCKUP_DETECTOR=y
-# CONFIG_SCHED_DEBUG is not set
+CONFIG_SCHEDSTATS=y
# CONFIG_DEBUG_PREEMPT is not set
-# CONFIG_FTRACE is not set
+CONFIG_PROVE_LOCKING=y
+CONFIG_FUNCTION_TRACER=y
+CONFIG_IRQSOFF_TRACER=y
+CONFIG_PREEMPT_TRACER=y
+CONFIG_SCHED_TRACER=y
CONFIG_MEMTEST=y
CONFIG_SECURITY=y
CONFIG_CRYPTO_ECHAINIV=y
CONFIG_CRYPTO_ANSI_CPRNG=y
CONFIG_ARM64_CRYPTO=y
-CONFIG_CRYPTO_SHA256_ARM64=m
CONFIG_CRYPTO_SHA512_ARM64=m
CONFIG_CRYPTO_SHA1_ARM64_CE=y
CONFIG_CRYPTO_SHA2_ARM64_CE=y
CONFIG_CRYPTO_GHASH_ARM64_CE=y
CONFIG_CRYPTO_CRCT10DIF_ARM64_CE=m
CONFIG_CRYPTO_CRC32_ARM64_CE=m
-CONFIG_CRYPTO_AES_ARM64=m
-CONFIG_CRYPTO_AES_ARM64_CE=m
CONFIG_CRYPTO_AES_ARM64_CE_CCM=y
CONFIG_CRYPTO_AES_ARM64_CE_BLK=y
-CONFIG_CRYPTO_AES_ARM64_NEON_BLK=m
CONFIG_CRYPTO_CHACHA20_NEON=m
CONFIG_CRYPTO_AES_ARM64_BS=m
diff --git a/arch/arm64/configs/ranchu64_defconfig b/arch/arm64/configs/ranchu64_defconfig
new file mode 100644
index 0000000..3d2eb32
--- /dev/null
+++ b/arch/arm64/configs/ranchu64_defconfig
@@ -0,0 +1,309 @@
+# CONFIG_LOCALVERSION_AUTO is not set
+# CONFIG_SWAP is not set
+CONFIG_POSIX_MQUEUE=y
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_BSD_PROCESS_ACCT_V3=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_LOG_BUF_SHIFT=14
+CONFIG_CGROUP_DEBUG=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_SCHED_AUTOGROUP=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_PROFILING=y
+CONFIG_ARCH_MMAP_RND_BITS=24
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS=16
+# CONFIG_BLK_DEV_BSG is not set
+# CONFIG_IOSCHED_DEADLINE is not set
+CONFIG_ARCH_VEXPRESS=y
+CONFIG_NR_CPUS=4
+CONFIG_PREEMPT=y
+CONFIG_KSM=y
+CONFIG_SECCOMP=y
+CONFIG_ARMV8_DEPRECATED=y
+CONFIG_SWP_EMULATION=y
+CONFIG_CP15_BARRIER_EMULATION=y
+CONFIG_SETEND_EMULATION=y
+CONFIG_CMDLINE="console=ttyAMA0"
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_COMPAT=y
+CONFIG_PM_AUTOSLEEP=y
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+CONFIG_PM_DEBUG=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_INET_ESP=y
+# CONFIG_INET_LRO is not set
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CT_PROTO_DCCP=y
+CONFIG_NF_CT_PROTO_SCTP=y
+CONFIG_NF_CT_PROTO_UDPLITE=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QTAGUID=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_NF_CONNTRACK_IPV4=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_RPFILTER=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_TARGET_ECN=y
+CONFIG_IP_NF_TARGET_TTL=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_NF_CONNTRACK_IPV6=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_MATCH_AH=y
+CONFIG_IP6_NF_MATCH_EUI64=y
+CONFIG_IP6_NF_MATCH_FRAG=y
+CONFIG_IP6_NF_MATCH_OPTS=y
+CONFIG_IP6_NF_MATCH_HL=y
+CONFIG_IP6_NF_MATCH_IPV6HEADER=y
+CONFIG_IP6_NF_MATCH_MH=y
+CONFIG_IP6_NF_MATCH_RT=y
+CONFIG_IP6_NF_TARGET_HL=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_BRIDGE=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_CLS_U32=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_CLS_ACT=y
+# CONFIG_WIRELESS is not set
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_VIRTIO_BLK=y
+CONFIG_SCSI=y
+# CONFIG_SCSI_PROC_FS is not set
+CONFIG_BLK_DEV_SD=y
+# CONFIG_SCSI_LOWLEVEL is not set
+CONFIG_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_NETDEVICES=y
+CONFIG_TUN=y
+CONFIG_VIRTIO_NET=y
+CONFIG_SMC91X=y
+CONFIG_PPP=y
+CONFIG_PPP_BSDCOMP=y
+CONFIG_PPP_DEFLATE=y
+CONFIG_PPP_MPPE=y
+# CONFIG_WLAN is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_KEYRESET=y
+CONFIG_KEYBOARD_GOLDFISH_EVENTS=y
+# CONFIG_INPUT_MOUSE is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_INPUT_TABLET=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_UINPUT=y
+CONFIG_INPUT_GPIO=y
+# CONFIG_SERIO_SERPORT is not set
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVMEM is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_AMBA_PL011=y
+CONFIG_SERIAL_AMBA_PL011_CONSOLE=y
+CONFIG_VIRTIO_CONSOLE=y
+# CONFIG_HW_RANDOM is not set
+CONFIG_BATTERY_GOLDFISH=y
+# CONFIG_HWMON is not set
+CONFIG_MEDIA_SUPPORT=y
+CONFIG_FB=y
+CONFIG_FB_GOLDFISH=y
+CONFIG_FB_SIMPLE=y
+CONFIG_BACKLIGHT_LCD_SUPPORT=y
+CONFIG_LOGO=y
+# CONFIG_LOGO_LINUX_MONO is not set
+# CONFIG_LOGO_LINUX_VGA16 is not set
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_ACRUX=y
+CONFIG_HID_ACRUX_FF=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_DRAGONRISE=y
+CONFIG_DRAGONRISE_FF=y
+CONFIG_HID_EMS_FF=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_KEYTOUCH=y
+CONFIG_HID_KYE=y
+CONFIG_HID_WALTOP=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_TWINHAN=y
+CONFIG_HID_KENSINGTON=y
+CONFIG_HID_LCPOWER=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_LOGITECH_FF=y
+CONFIG_LOGIRUMBLEPAD2_FF=y
+CONFIG_LOGIG940_FF=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_ORTEK=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PRIMAX=y
+CONFIG_HID_SAITEK=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SPEEDLINK=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HID_GREENASIA=y
+CONFIG_GREENASIA_FF=y
+CONFIG_HID_SMARTJOYPLUS=y
+CONFIG_SMARTJOYPLUS_FF=y
+CONFIG_HID_TIVO=y
+CONFIG_HID_TOPSEED=y
+CONFIG_HID_THRUSTMASTER=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_HID_ZEROPLUS=y
+CONFIG_HID_ZYDACRON=y
+# CONFIG_USB_SUPPORT is not set
+CONFIG_RTC_CLASS=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_ANDROID_TIMED_GPIO=y
+CONFIG_ANDROID_LOW_MEMORY_KILLER=y
+CONFIG_SYNC=y
+CONFIG_SW_SYNC=y
+CONFIG_SW_SYNC_USER=y
+CONFIG_ION=y
+CONFIG_GOLDFISH_AUDIO=y
+CONFIG_GOLDFISH=y
+CONFIG_GOLDFISH_PIPE=y
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_EXT2_FS=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_QUOTA=y
+CONFIG_FUSE_FS=y
+CONFIG_CUSE=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+# CONFIG_MISC_FILESYSTEMS is not set
+CONFIG_NFS_FS=y
+CONFIG_ROOT_NFS=y
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_DEBUG_INFO=y
+CONFIG_DEBUG_FS=y
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_PANIC_TIMEOUT=5
+# CONFIG_SCHED_DEBUG is not set
+CONFIG_SCHEDSTATS=y
+CONFIG_TIMER_STATS=y
+# CONFIG_FTRACE is not set
+CONFIG_ATOMIC64_SELFTEST=y
+CONFIG_DEBUG_RODATA=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_SELINUX=y
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 70c517a..f856628 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -95,4 +95,10 @@
select CRYPTO_AES_ARM64
select CRYPTO_SIMD
+config CRYPTO_SPECK_NEON
+ tristate "NEON accelerated Speck cipher algorithms"
+ depends on KERNEL_MODE_NEON
+ select CRYPTO_BLKCIPHER
+ select CRYPTO_SPECK
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index 93dbf48..e761c0a 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -44,6 +44,9 @@
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+obj-$(CONFIG_CRYPTO_SPECK_NEON) += speck-neon.o
+speck-neon-y := speck-neon-core.o speck-neon-glue.o
+
obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o
diff --git a/arch/arm64/crypto/sha1-ce-glue.c b/arch/arm64/crypto/sha1-ce-glue.c
index efbeb3e..656b959 100644
--- a/arch/arm64/crypto/sha1-ce-glue.c
+++ b/arch/arm64/crypto/sha1-ce-glue.c
@@ -29,6 +29,14 @@
asmlinkage void sha1_ce_transform(struct sha1_ce_state *sst, u8 const *src,
int blocks);
+#ifdef CONFIG_CFI_CLANG
+static inline void __cfi_sha1_ce_transform(struct sha1_state *sst,
+ u8 const *src, int blocks)
+{
+ sha1_ce_transform((struct sha1_ce_state *)sst, src, blocks);
+}
+#define sha1_ce_transform __cfi_sha1_ce_transform
+#endif
const u32 sha1_ce_offsetof_count = offsetof(struct sha1_ce_state, sst.count);
const u32 sha1_ce_offsetof_finalize = offsetof(struct sha1_ce_state, finalize);
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index fd1ff2b..ddda748 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -29,6 +29,14 @@
asmlinkage void sha2_ce_transform(struct sha256_ce_state *sst, u8 const *src,
int blocks);
+#ifdef CONFIG_CFI_CLANG
+static inline void __cfi_sha2_ce_transform(struct sha256_state *sst,
+ u8 const *src, int blocks)
+{
+ sha2_ce_transform((struct sha256_ce_state *)sst, src, blocks);
+}
+#define sha2_ce_transform __cfi_sha2_ce_transform
+#endif
const u32 sha256_ce_offsetof_count = offsetof(struct sha256_ce_state,
sst.count);
diff --git a/arch/arm64/crypto/speck-neon-core.S b/arch/arm64/crypto/speck-neon-core.S
new file mode 100644
index 0000000..b144634
--- /dev/null
+++ b/arch/arm64/crypto/speck-neon-core.S
@@ -0,0 +1,352 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM64 NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Author: Eric Biggers <ebiggers@google.com>
+ */
+
+#include <linux/linkage.h>
+
+ .text
+
+ // arguments
+ ROUND_KEYS .req x0 // const {u64,u32} *round_keys
+ NROUNDS .req w1 // int nrounds
+ NROUNDS_X .req x1
+ DST .req x2 // void *dst
+ SRC .req x3 // const void *src
+ NBYTES .req w4 // unsigned int nbytes
+ TWEAK .req x5 // void *tweak
+
+ // registers which hold the data being encrypted/decrypted
+ // (underscores avoid a naming collision with ARM64 registers x0-x3)
+ X_0 .req v0
+ Y_0 .req v1
+ X_1 .req v2
+ Y_1 .req v3
+ X_2 .req v4
+ Y_2 .req v5
+ X_3 .req v6
+ Y_3 .req v7
+
+ // the round key, duplicated in all lanes
+ ROUND_KEY .req v8
+
+ // index vector for tbl-based 8-bit rotates
+ ROTATE_TABLE .req v9
+ ROTATE_TABLE_Q .req q9
+
+ // temporary registers
+ TMP0 .req v10
+ TMP1 .req v11
+ TMP2 .req v12
+ TMP3 .req v13
+
+ // multiplication table for updating XTS tweaks
+ GFMUL_TABLE .req v14
+ GFMUL_TABLE_Q .req q14
+
+ // next XTS tweak value(s)
+ TWEAKV_NEXT .req v15
+
+ // XTS tweaks for the blocks currently being encrypted/decrypted
+ TWEAKV0 .req v16
+ TWEAKV1 .req v17
+ TWEAKV2 .req v18
+ TWEAKV3 .req v19
+ TWEAKV4 .req v20
+ TWEAKV5 .req v21
+ TWEAKV6 .req v22
+ TWEAKV7 .req v23
+
+ .align 4
+.Lror64_8_table:
+ .octa 0x080f0e0d0c0b0a090007060504030201
+.Lror32_8_table:
+ .octa 0x0c0f0e0d080b0a090407060500030201
+.Lrol64_8_table:
+ .octa 0x0e0d0c0b0a09080f0605040302010007
+.Lrol32_8_table:
+ .octa 0x0e0d0c0f0a09080b0605040702010003
+.Lgf128mul_table:
+ .octa 0x00000000000000870000000000000001
+.Lgf64mul_table:
+ .octa 0x0000000000000000000000002d361b00
+
+/*
+ * _speck_round_128bytes() - Speck encryption round on 128 bytes at a time
+ *
+ * Do one Speck encryption round on the 128 bytes (8 blocks for Speck128, 16 for
+ * Speck64) stored in X0-X3 and Y0-Y3, using the round key stored in all lanes
+ * of ROUND_KEY. 'n' is the lane size: 64 for Speck128, or 32 for Speck64.
+ * 'lanes' is the lane specifier: "2d" for Speck128 or "4s" for Speck64.
+ */
+.macro _speck_round_128bytes n, lanes
+
+ // x = ror(x, 8)
+ tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b
+ tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b
+ tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b
+ tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b
+
+ // x += y
+ add X_0.\lanes, X_0.\lanes, Y_0.\lanes
+ add X_1.\lanes, X_1.\lanes, Y_1.\lanes
+ add X_2.\lanes, X_2.\lanes, Y_2.\lanes
+ add X_3.\lanes, X_3.\lanes, Y_3.\lanes
+
+ // x ^= k
+ eor X_0.16b, X_0.16b, ROUND_KEY.16b
+ eor X_1.16b, X_1.16b, ROUND_KEY.16b
+ eor X_2.16b, X_2.16b, ROUND_KEY.16b
+ eor X_3.16b, X_3.16b, ROUND_KEY.16b
+
+ // y = rol(y, 3)
+ shl TMP0.\lanes, Y_0.\lanes, #3
+ shl TMP1.\lanes, Y_1.\lanes, #3
+ shl TMP2.\lanes, Y_2.\lanes, #3
+ shl TMP3.\lanes, Y_3.\lanes, #3
+ sri TMP0.\lanes, Y_0.\lanes, #(\n - 3)
+ sri TMP1.\lanes, Y_1.\lanes, #(\n - 3)
+ sri TMP2.\lanes, Y_2.\lanes, #(\n - 3)
+ sri TMP3.\lanes, Y_3.\lanes, #(\n - 3)
+
+ // y ^= x
+ eor Y_0.16b, TMP0.16b, X_0.16b
+ eor Y_1.16b, TMP1.16b, X_1.16b
+ eor Y_2.16b, TMP2.16b, X_2.16b
+ eor Y_3.16b, TMP3.16b, X_3.16b
+.endm
+
+/*
+ * _speck_unround_128bytes() - Speck decryption round on 128 bytes at a time
+ *
+ * This is the inverse of _speck_round_128bytes().
+ */
+.macro _speck_unround_128bytes n, lanes
+
+ // y ^= x
+ eor TMP0.16b, Y_0.16b, X_0.16b
+ eor TMP1.16b, Y_1.16b, X_1.16b
+ eor TMP2.16b, Y_2.16b, X_2.16b
+ eor TMP3.16b, Y_3.16b, X_3.16b
+
+ // y = ror(y, 3)
+ ushr Y_0.\lanes, TMP0.\lanes, #3
+ ushr Y_1.\lanes, TMP1.\lanes, #3
+ ushr Y_2.\lanes, TMP2.\lanes, #3
+ ushr Y_3.\lanes, TMP3.\lanes, #3
+ sli Y_0.\lanes, TMP0.\lanes, #(\n - 3)
+ sli Y_1.\lanes, TMP1.\lanes, #(\n - 3)
+ sli Y_2.\lanes, TMP2.\lanes, #(\n - 3)
+ sli Y_3.\lanes, TMP3.\lanes, #(\n - 3)
+
+ // x ^= k
+ eor X_0.16b, X_0.16b, ROUND_KEY.16b
+ eor X_1.16b, X_1.16b, ROUND_KEY.16b
+ eor X_2.16b, X_2.16b, ROUND_KEY.16b
+ eor X_3.16b, X_3.16b, ROUND_KEY.16b
+
+ // x -= y
+ sub X_0.\lanes, X_0.\lanes, Y_0.\lanes
+ sub X_1.\lanes, X_1.\lanes, Y_1.\lanes
+ sub X_2.\lanes, X_2.\lanes, Y_2.\lanes
+ sub X_3.\lanes, X_3.\lanes, Y_3.\lanes
+
+ // x = rol(x, 8)
+ tbl X_0.16b, {X_0.16b}, ROTATE_TABLE.16b
+ tbl X_1.16b, {X_1.16b}, ROTATE_TABLE.16b
+ tbl X_2.16b, {X_2.16b}, ROTATE_TABLE.16b
+ tbl X_3.16b, {X_3.16b}, ROTATE_TABLE.16b
+.endm
+
+.macro _next_xts_tweak next, cur, tmp, n
+.if \n == 64
+ /*
+ * Calculate the next tweak by multiplying the current one by x,
+ * modulo p(x) = x^128 + x^7 + x^2 + x + 1.
+ */
+ sshr \tmp\().2d, \cur\().2d, #63
+ and \tmp\().16b, \tmp\().16b, GFMUL_TABLE.16b
+ shl \next\().2d, \cur\().2d, #1
+ ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8
+ eor \next\().16b, \next\().16b, \tmp\().16b
+.else
+ /*
+ * Calculate the next two tweaks by multiplying the current ones by x^2,
+ * modulo p(x) = x^64 + x^4 + x^3 + x + 1.
+ */
+ ushr \tmp\().2d, \cur\().2d, #62
+ shl \next\().2d, \cur\().2d, #2
+ tbl \tmp\().16b, {GFMUL_TABLE.16b}, \tmp\().16b
+ eor \next\().16b, \next\().16b, \tmp\().16b
+.endif
+.endm
+
+/*
+ * _speck_xts_crypt() - Speck-XTS encryption/decryption
+ *
+ * Encrypt or decrypt NBYTES bytes of data from the SRC buffer to the DST buffer
+ * using Speck-XTS, specifically the variant with a block size of '2n' and round
+ * count given by NROUNDS. The expanded round keys are given in ROUND_KEYS, and
+ * the current XTS tweak value is given in TWEAK. It's assumed that NBYTES is a
+ * nonzero multiple of 128.
+ */
+.macro _speck_xts_crypt n, lanes, decrypting
+
+ /*
+ * If decrypting, modify the ROUND_KEYS parameter to point to the last
+ * round key rather than the first, since for decryption the round keys
+ * are used in reverse order.
+ */
+.if \decrypting
+ mov NROUNDS, NROUNDS /* zero the high 32 bits */
+.if \n == 64
+ add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #3
+ sub ROUND_KEYS, ROUND_KEYS, #8
+.else
+ add ROUND_KEYS, ROUND_KEYS, NROUNDS_X, lsl #2
+ sub ROUND_KEYS, ROUND_KEYS, #4
+.endif
+.endif
+
+ // Load the index vector for tbl-based 8-bit rotates
+.if \decrypting
+ ldr ROTATE_TABLE_Q, .Lrol\n\()_8_table
+.else
+ ldr ROTATE_TABLE_Q, .Lror\n\()_8_table
+.endif
+
+ // One-time XTS preparation
+.if \n == 64
+ // Load first tweak
+ ld1 {TWEAKV0.16b}, [TWEAK]
+
+ // Load GF(2^128) multiplication table
+ ldr GFMUL_TABLE_Q, .Lgf128mul_table
+.else
+ // Load first tweak
+ ld1 {TWEAKV0.8b}, [TWEAK]
+
+ // Load GF(2^64) multiplication table
+ ldr GFMUL_TABLE_Q, .Lgf64mul_table
+
+ // Calculate second tweak, packing it together with the first
+ ushr TMP0.2d, TWEAKV0.2d, #63
+ shl TMP1.2d, TWEAKV0.2d, #1
+ tbl TMP0.8b, {GFMUL_TABLE.16b}, TMP0.8b
+ eor TMP0.8b, TMP0.8b, TMP1.8b
+ mov TWEAKV0.d[1], TMP0.d[0]
+.endif
+
+.Lnext_128bytes_\@:
+
+ // Calculate XTS tweaks for next 128 bytes
+ _next_xts_tweak TWEAKV1, TWEAKV0, TMP0, \n
+ _next_xts_tweak TWEAKV2, TWEAKV1, TMP0, \n
+ _next_xts_tweak TWEAKV3, TWEAKV2, TMP0, \n
+ _next_xts_tweak TWEAKV4, TWEAKV3, TMP0, \n
+ _next_xts_tweak TWEAKV5, TWEAKV4, TMP0, \n
+ _next_xts_tweak TWEAKV6, TWEAKV5, TMP0, \n
+ _next_xts_tweak TWEAKV7, TWEAKV6, TMP0, \n
+ _next_xts_tweak TWEAKV_NEXT, TWEAKV7, TMP0, \n
+
+ // Load the next source blocks into {X,Y}[0-3]
+ ld1 {X_0.16b-Y_1.16b}, [SRC], #64
+ ld1 {X_2.16b-Y_3.16b}, [SRC], #64
+
+ // XOR the source blocks with their XTS tweaks
+ eor TMP0.16b, X_0.16b, TWEAKV0.16b
+ eor Y_0.16b, Y_0.16b, TWEAKV1.16b
+ eor TMP1.16b, X_1.16b, TWEAKV2.16b
+ eor Y_1.16b, Y_1.16b, TWEAKV3.16b
+ eor TMP2.16b, X_2.16b, TWEAKV4.16b
+ eor Y_2.16b, Y_2.16b, TWEAKV5.16b
+ eor TMP3.16b, X_3.16b, TWEAKV6.16b
+ eor Y_3.16b, Y_3.16b, TWEAKV7.16b
+
+ /*
+ * De-interleave the 'x' and 'y' elements of each block, i.e. make it so
+ * that the X[0-3] registers contain only the second halves of blocks,
+ * and the Y[0-3] registers contain only the first halves of blocks.
+ * (Speck uses the order (y, x) rather than the more intuitive (x, y).)
+ */
+ uzp2 X_0.\lanes, TMP0.\lanes, Y_0.\lanes
+ uzp1 Y_0.\lanes, TMP0.\lanes, Y_0.\lanes
+ uzp2 X_1.\lanes, TMP1.\lanes, Y_1.\lanes
+ uzp1 Y_1.\lanes, TMP1.\lanes, Y_1.\lanes
+ uzp2 X_2.\lanes, TMP2.\lanes, Y_2.\lanes
+ uzp1 Y_2.\lanes, TMP2.\lanes, Y_2.\lanes
+ uzp2 X_3.\lanes, TMP3.\lanes, Y_3.\lanes
+ uzp1 Y_3.\lanes, TMP3.\lanes, Y_3.\lanes
+
+ // Do the cipher rounds
+ mov x6, ROUND_KEYS
+ mov w7, NROUNDS
+.Lnext_round_\@:
+.if \decrypting
+ ld1r {ROUND_KEY.\lanes}, [x6]
+ sub x6, x6, #( \n / 8 )
+ _speck_unround_128bytes \n, \lanes
+.else
+ ld1r {ROUND_KEY.\lanes}, [x6], #( \n / 8 )
+ _speck_round_128bytes \n, \lanes
+.endif
+ subs w7, w7, #1
+ bne .Lnext_round_\@
+
+ // Re-interleave the 'x' and 'y' elements of each block
+ zip1 TMP0.\lanes, Y_0.\lanes, X_0.\lanes
+ zip2 Y_0.\lanes, Y_0.\lanes, X_0.\lanes
+ zip1 TMP1.\lanes, Y_1.\lanes, X_1.\lanes
+ zip2 Y_1.\lanes, Y_1.\lanes, X_1.\lanes
+ zip1 TMP2.\lanes, Y_2.\lanes, X_2.\lanes
+ zip2 Y_2.\lanes, Y_2.\lanes, X_2.\lanes
+ zip1 TMP3.\lanes, Y_3.\lanes, X_3.\lanes
+ zip2 Y_3.\lanes, Y_3.\lanes, X_3.\lanes
+
+ // XOR the encrypted/decrypted blocks with the tweaks calculated earlier
+ eor X_0.16b, TMP0.16b, TWEAKV0.16b
+ eor Y_0.16b, Y_0.16b, TWEAKV1.16b
+ eor X_1.16b, TMP1.16b, TWEAKV2.16b
+ eor Y_1.16b, Y_1.16b, TWEAKV3.16b
+ eor X_2.16b, TMP2.16b, TWEAKV4.16b
+ eor Y_2.16b, Y_2.16b, TWEAKV5.16b
+ eor X_3.16b, TMP3.16b, TWEAKV6.16b
+ eor Y_3.16b, Y_3.16b, TWEAKV7.16b
+ mov TWEAKV0.16b, TWEAKV_NEXT.16b
+
+ // Store the ciphertext in the destination buffer
+ st1 {X_0.16b-Y_1.16b}, [DST], #64
+ st1 {X_2.16b-Y_3.16b}, [DST], #64
+
+ // Continue if there are more 128-byte chunks remaining
+ subs NBYTES, NBYTES, #128
+ bne .Lnext_128bytes_\@
+
+ // Store the next tweak and return
+.if \n == 64
+ st1 {TWEAKV_NEXT.16b}, [TWEAK]
+.else
+ st1 {TWEAKV_NEXT.8b}, [TWEAK]
+.endif
+ ret
+.endm
+
+ENTRY(speck128_xts_encrypt_neon)
+ _speck_xts_crypt n=64, lanes=2d, decrypting=0
+ENDPROC(speck128_xts_encrypt_neon)
+
+ENTRY(speck128_xts_decrypt_neon)
+ _speck_xts_crypt n=64, lanes=2d, decrypting=1
+ENDPROC(speck128_xts_decrypt_neon)
+
+ENTRY(speck64_xts_encrypt_neon)
+ _speck_xts_crypt n=32, lanes=4s, decrypting=0
+ENDPROC(speck64_xts_encrypt_neon)
+
+ENTRY(speck64_xts_decrypt_neon)
+ _speck_xts_crypt n=32, lanes=4s, decrypting=1
+ENDPROC(speck64_xts_decrypt_neon)
diff --git a/arch/arm64/crypto/speck-neon-glue.c b/arch/arm64/crypto/speck-neon-glue.c
new file mode 100644
index 0000000..6e233ae
--- /dev/null
+++ b/arch/arm64/crypto/speck-neon-glue.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NEON-accelerated implementation of Speck128-XTS and Speck64-XTS
+ * (64-bit version; based on the 32-bit version)
+ *
+ * Copyright (c) 2018 Google, Inc
+ */
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+#include <crypto/algapi.h>
+#include <crypto/gf128mul.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/speck.h>
+#include <crypto/xts.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+/* The assembly functions only handle multiples of 128 bytes */
+#define SPECK_NEON_CHUNK_SIZE 128
+
+/* Speck128 */
+
+struct speck128_xts_tfm_ctx {
+ struct speck128_tfm_ctx main_key;
+ struct speck128_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck128_xts_encrypt_neon(const u64 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+asmlinkage void speck128_xts_decrypt_neon(const u64 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+typedef void (*speck128_crypt_one_t)(const struct speck128_tfm_ctx *,
+ u8 *, const u8 *);
+typedef void (*speck128_xts_crypt_many_t)(const u64 *, int, void *,
+ const void *, unsigned int, void *);
+
+static __always_inline int
+__speck128_xts_crypt(struct skcipher_request *req,
+ speck128_crypt_one_t crypt_one,
+ speck128_xts_crypt_many_t crypt_many)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ le128 tweak;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ crypto_speck128_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+ while (walk.nbytes > 0) {
+ unsigned int nbytes = walk.nbytes;
+ u8 *dst = walk.dst.virt.addr;
+ const u8 *src = walk.src.virt.addr;
+
+ if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+ unsigned int count;
+
+ count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+ kernel_neon_begin();
+ (*crypt_many)(ctx->main_key.round_keys,
+ ctx->main_key.nrounds,
+ dst, src, count, &tweak);
+ kernel_neon_end();
+ dst += count;
+ src += count;
+ nbytes -= count;
+ }
+
+ /* Handle any remainder with generic code */
+ while (nbytes >= sizeof(tweak)) {
+ le128_xor((le128 *)dst, (const le128 *)src, &tweak);
+ (*crypt_one)(&ctx->main_key, dst, dst);
+ le128_xor((le128 *)dst, (const le128 *)dst, &tweak);
+ gf128mul_x_ble(&tweak, &tweak);
+
+ dst += sizeof(tweak);
+ src += sizeof(tweak);
+ nbytes -= sizeof(tweak);
+ }
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+
+ return err;
+}
+
+static int speck128_xts_encrypt(struct skcipher_request *req)
+{
+ return __speck128_xts_crypt(req, crypto_speck128_encrypt,
+ speck128_xts_encrypt_neon);
+}
+
+static int speck128_xts_decrypt(struct skcipher_request *req)
+{
+ return __speck128_xts_crypt(req, crypto_speck128_decrypt,
+ speck128_xts_decrypt_neon);
+}
+
+static int speck128_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct speck128_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ err = crypto_speck128_setkey(&ctx->main_key, key, keylen);
+ if (err)
+ return err;
+
+ return crypto_speck128_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+/* Speck64 */
+
+struct speck64_xts_tfm_ctx {
+ struct speck64_tfm_ctx main_key;
+ struct speck64_tfm_ctx tweak_key;
+};
+
+asmlinkage void speck64_xts_encrypt_neon(const u32 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+asmlinkage void speck64_xts_decrypt_neon(const u32 *round_keys, int nrounds,
+ void *dst, const void *src,
+ unsigned int nbytes, void *tweak);
+
+typedef void (*speck64_crypt_one_t)(const struct speck64_tfm_ctx *,
+ u8 *, const u8 *);
+typedef void (*speck64_xts_crypt_many_t)(const u32 *, int, void *,
+ const void *, unsigned int, void *);
+
+static __always_inline int
+__speck64_xts_crypt(struct skcipher_request *req, speck64_crypt_one_t crypt_one,
+ speck64_xts_crypt_many_t crypt_many)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ const struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ __le64 tweak;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ crypto_speck64_encrypt(&ctx->tweak_key, (u8 *)&tweak, walk.iv);
+
+ while (walk.nbytes > 0) {
+ unsigned int nbytes = walk.nbytes;
+ u8 *dst = walk.dst.virt.addr;
+ const u8 *src = walk.src.virt.addr;
+
+ if (nbytes >= SPECK_NEON_CHUNK_SIZE && may_use_simd()) {
+ unsigned int count;
+
+ count = round_down(nbytes, SPECK_NEON_CHUNK_SIZE);
+ kernel_neon_begin();
+ (*crypt_many)(ctx->main_key.round_keys,
+ ctx->main_key.nrounds,
+ dst, src, count, &tweak);
+ kernel_neon_end();
+ dst += count;
+ src += count;
+ nbytes -= count;
+ }
+
+ /* Handle any remainder with generic code */
+ while (nbytes >= sizeof(tweak)) {
+ *(__le64 *)dst = *(__le64 *)src ^ tweak;
+ (*crypt_one)(&ctx->main_key, dst, dst);
+ *(__le64 *)dst ^= tweak;
+ tweak = cpu_to_le64((le64_to_cpu(tweak) << 1) ^
+ ((tweak & cpu_to_le64(1ULL << 63)) ?
+ 0x1B : 0));
+ dst += sizeof(tweak);
+ src += sizeof(tweak);
+ nbytes -= sizeof(tweak);
+ }
+ err = skcipher_walk_done(&walk, nbytes);
+ }
+
+ return err;
+}
+
+static int speck64_xts_encrypt(struct skcipher_request *req)
+{
+ return __speck64_xts_crypt(req, crypto_speck64_encrypt,
+ speck64_xts_encrypt_neon);
+}
+
+static int speck64_xts_decrypt(struct skcipher_request *req)
+{
+ return __speck64_xts_crypt(req, crypto_speck64_decrypt,
+ speck64_xts_decrypt_neon);
+}
+
+static int speck64_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ struct speck64_xts_tfm_ctx *ctx = crypto_skcipher_ctx(tfm);
+ int err;
+
+ err = xts_verify_key(tfm, key, keylen);
+ if (err)
+ return err;
+
+ keylen /= 2;
+
+ err = crypto_speck64_setkey(&ctx->main_key, key, keylen);
+ if (err)
+ return err;
+
+ return crypto_speck64_setkey(&ctx->tweak_key, key + keylen, keylen);
+}
+
+static struct skcipher_alg speck_algs[] = {
+ {
+ .base.cra_name = "xts(speck128)",
+ .base.cra_driver_name = "xts-speck128-neon",
+ .base.cra_priority = 300,
+ .base.cra_blocksize = SPECK128_BLOCK_SIZE,
+ .base.cra_ctxsize = sizeof(struct speck128_xts_tfm_ctx),
+ .base.cra_alignmask = 7,
+ .base.cra_module = THIS_MODULE,
+ .min_keysize = 2 * SPECK128_128_KEY_SIZE,
+ .max_keysize = 2 * SPECK128_256_KEY_SIZE,
+ .ivsize = SPECK128_BLOCK_SIZE,
+ .walksize = SPECK_NEON_CHUNK_SIZE,
+ .setkey = speck128_xts_setkey,
+ .encrypt = speck128_xts_encrypt,
+ .decrypt = speck128_xts_decrypt,
+ }, {
+ .base.cra_name = "xts(speck64)",
+ .base.cra_driver_name = "xts-speck64-neon",
+ .base.cra_priority = 300,
+ .base.cra_blocksize = SPECK64_BLOCK_SIZE,
+ .base.cra_ctxsize = sizeof(struct speck64_xts_tfm_ctx),
+ .base.cra_alignmask = 7,
+ .base.cra_module = THIS_MODULE,
+ .min_keysize = 2 * SPECK64_96_KEY_SIZE,
+ .max_keysize = 2 * SPECK64_128_KEY_SIZE,
+ .ivsize = SPECK64_BLOCK_SIZE,
+ .walksize = SPECK_NEON_CHUNK_SIZE,
+ .setkey = speck64_xts_setkey,
+ .encrypt = speck64_xts_encrypt,
+ .decrypt = speck64_xts_decrypt,
+ }
+};
+
+static int __init speck_neon_module_init(void)
+{
+ if (!(elf_hwcap & HWCAP_ASIMD))
+ return -ENODEV;
+ return crypto_register_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_neon_module_exit(void)
+{
+ crypto_unregister_skciphers(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_neon_module_init);
+module_exit(speck_neon_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (NEON-accelerated)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("xts(speck128)");
+MODULE_ALIAS_CRYPTO("xts-speck128-neon");
+MODULE_ALIAS_CRYPTO("xts(speck64)");
+MODULE_ALIAS_CRYPTO("xts-speck64-neon");
diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 33be513..36d9863 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -173,7 +173,7 @@
#ifdef CONFIG_COMPAT
/* PIE load location for compat arm. Must match ARM ELF_ET_DYN_BASE. */
-#define COMPAT_ELF_ET_DYN_BASE 0x000400000UL
+#define COMPAT_ELF_ET_DYN_BASE (2 * TASK_SIZE_32 / 3)
/* AArch32 registers. */
#define COMPAT_ELF_NGREG 18
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 4572a9b..20bfb8e 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -29,7 +29,9 @@
({ \
u64 reg; \
asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\
- "mrs_s %0, " __stringify(r##vh),\
+ DEFINE_MRS_S \
+ "mrs_s %0, " __stringify(r##vh) "\n"\
+ UNDEFINE_MRS_S, \
ARM64_HAS_VIRT_HOST_EXTN) \
: "=r" (reg)); \
reg; \
@@ -39,7 +41,9 @@
do { \
u64 __val = (u64)(v); \
asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\
- "msr_s " __stringify(r##vh) ", %x0",\
+ DEFINE_MSR_S \
+ "msr_s " __stringify(r##vh) ", %x0\n"\
+ UNDEFINE_MSR_S, \
ARM64_HAS_VIRT_HOST_EXTN) \
: : "rZ" (__val)); \
} while (0)
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 779d7a2..f7ff065 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -132,7 +132,7 @@
* Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
* avoiding the possibility of conflicting TLB entries being allocated.
*/
-static inline void cpu_replace_ttbr1(pgd_t *pgd)
+static inline void __nocfi cpu_replace_ttbr1(pgd_t *pgd)
{
typedef void (ttbr_replace_func)(phys_addr_t);
extern ttbr_replace_func idmap_cpu_replace_ttbr1;
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index ede80d4..3bdec2f 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -465,20 +465,39 @@
#include <linux/types.h>
-asm(
-" .irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n"
-" .equ .L__reg_num_x\\num, \\num\n"
-" .endr\n"
+#define __DEFINE_MRS_MSR_S_REGNUM \
+" .irp num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \
+" .equ .L__reg_num_x\\num, \\num\n" \
+" .endr\n" \
" .equ .L__reg_num_xzr, 31\n"
-"\n"
-" .macro mrs_s, rt, sreg\n"
- __emit_inst(0xd5200000|(\\sreg)|(.L__reg_num_\\rt))
+
+#define DEFINE_MRS_S \
+ __DEFINE_MRS_MSR_S_REGNUM \
+" .macro mrs_s, rt, sreg\n" \
+" .inst 0xd5200000|(\\sreg)|(.L__reg_num_\\rt)\n" \
" .endm\n"
-"\n"
-" .macro msr_s, sreg, rt\n"
- __emit_inst(0xd5000000|(\\sreg)|(.L__reg_num_\\rt))
+
+#define DEFINE_MSR_S \
+ __DEFINE_MRS_MSR_S_REGNUM \
+" .macro msr_s, sreg, rt\n" \
+" .inst 0xd5000000|(\\sreg)|(.L__reg_num_\\rt)\n" \
" .endm\n"
-);
+
+#define UNDEFINE_MRS_S \
+" .purgem mrs_s\n"
+
+#define UNDEFINE_MSR_S \
+" .purgem msr_s\n"
+
+#define __mrs_s(r, v) \
+ DEFINE_MRS_S \
+" mrs_s %0, " __stringify(r) "\n" \
+ UNDEFINE_MRS_S : "=r" (v)
+
+#define __msr_s(r, v) \
+ DEFINE_MSR_S \
+" msr_s " __stringify(r) ", %x0\n" \
+ UNDEFINE_MSR_S : : "rZ" (v)
/*
* Unlike read_cpuid, calls to read_sysreg are never expected to be
@@ -504,15 +523,15 @@
* For registers without architectural names, or simply unsupported by
* GAS.
*/
-#define read_sysreg_s(r) ({ \
- u64 __val; \
- asm volatile("mrs_s %0, " __stringify(r) : "=r" (__val)); \
- __val; \
+#define read_sysreg_s(r) ({ \
+ u64 __val; \
+ asm volatile(__mrs_s(r, __val)); \
+ __val; \
})
-#define write_sysreg_s(v, r) do { \
- u64 __val = (u64)(v); \
- asm volatile("msr_s " __stringify(r) ", %x0" : : "rZ" (__val)); \
+#define write_sysreg_s(v, r) do { \
+ u64 __val = (u64)(v); \
+ asm volatile(__msr_s(r, __val)); \
} while (0)
static inline void config_sctlr_el1(u32 clear, u32 set)
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index b320228..b7bd705 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -33,6 +33,20 @@
#endif /* CONFIG_NUMA */
+#include <linux/arch_topology.h>
+
+/* Replace task scheduler's default frequency-invariant accounting */
+#define arch_scale_freq_capacity topology_get_freq_scale
+
+/* Replace task scheduler's default max-frequency-invariant accounting */
+#define arch_scale_max_freq_capacity topology_get_max_freq_scale
+
+/* Replace task scheduler's default cpu-invariant accounting */
+#define arch_scale_cpu_capacity topology_get_cpu_scale
+
+/* Enable topology flag updates */
+#define arch_update_cpu_topology topology_update_cpu_topology
+
#include <asm-generic/topology.h>
#endif /* _ASM_ARM_TOPOLOGY_H */
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 003dd39..f389c9a 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -842,7 +842,7 @@
ID_AA64PFR0_CSV3_SHIFT);
}
-static int kpti_install_ng_mappings(void *__unused)
+static int __nocfi kpti_install_ng_mappings(void *__unused)
{
typedef void (kpti_remap_fn)(int, int, phys_addr_t);
extern kpti_remap_fn idmap_kpti_install_ng_mappings;
diff --git a/arch/arm64/kernel/io.c b/arch/arm64/kernel/io.c
index 354be2a..79b1738 100644
--- a/arch/arm64/kernel/io.c
+++ b/arch/arm64/kernel/io.c
@@ -25,8 +25,7 @@
*/
void __memcpy_fromio(void *to, const volatile void __iomem *from, size_t count)
{
- while (count && (!IS_ALIGNED((unsigned long)from, 8) ||
- !IS_ALIGNED((unsigned long)to, 8))) {
+ while (count && !IS_ALIGNED((unsigned long)from, 8)) {
*(u8 *)to = __raw_readb(from);
from++;
to++;
@@ -54,23 +53,22 @@
*/
void __memcpy_toio(volatile void __iomem *to, const void *from, size_t count)
{
- while (count && (!IS_ALIGNED((unsigned long)to, 8) ||
- !IS_ALIGNED((unsigned long)from, 8))) {
- __raw_writeb(*(volatile u8 *)from, to);
+ while (count && !IS_ALIGNED((unsigned long)to, 8)) {
+ __raw_writeb(*(u8 *)from, to);
from++;
to++;
count--;
}
while (count >= 8) {
- __raw_writeq(*(volatile u64 *)from, to);
+ __raw_writeq(*(u64 *)from, to);
from += 8;
to += 8;
count -= 8;
}
while (count) {
- __raw_writeb(*(volatile u8 *)from, to);
+ __raw_writeb(*(u8 *)from, to);
from++;
to++;
count--;
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
index 22e36a2..99eb7c2 100644
--- a/arch/arm64/kernel/module.lds
+++ b/arch/arm64/kernel/module.lds
@@ -1,5 +1,5 @@
SECTIONS {
- .plt (NOLOAD) : { BYTE(0) }
- .init.plt (NOLOAD) : { BYTE(0) }
- .text.ftrace_trampoline (NOLOAD) : { BYTE(0) }
+ .plt : { BYTE(0) }
+ .init.plt : { BYTE(0) }
+ .text.ftrace_trampoline : { BYTE(0) }
}
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 9e77373..e5d670a 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -170,6 +170,70 @@
while (1);
}
+/*
+ * dump a block of kernel memory from around the given address
+ */
+static void show_data(unsigned long addr, int nbytes, const char *name)
+{
+ int i, j;
+ int nlines;
+ u32 *p;
+
+ /*
+ * don't attempt to dump non-kernel addresses or
+ * values that are probably just small negative numbers
+ */
+ if (addr < PAGE_OFFSET || addr > -256UL)
+ return;
+
+ printk("\n%s: %#lx:\n", name, addr);
+
+ /*
+ * round address down to a 32 bit boundary
+ * and always dump a multiple of 32 bytes
+ */
+ p = (u32 *)(addr & ~(sizeof(u32) - 1));
+ nbytes += (addr & (sizeof(u32) - 1));
+ nlines = (nbytes + 31) / 32;
+
+
+ for (i = 0; i < nlines; i++) {
+ /*
+ * just display low 16 bits of address to keep
+ * each line of the dump < 80 characters
+ */
+ printk("%04lx ", (unsigned long)p & 0xffff);
+ for (j = 0; j < 8; j++) {
+ u32 data;
+ if (probe_kernel_address(p, data)) {
+ pr_cont(" ********");
+ } else {
+ pr_cont(" %08x", data);
+ }
+ ++p;
+ }
+ pr_cont("\n");
+ }
+}
+
+static void show_extra_register_data(struct pt_regs *regs, int nbytes)
+{
+ mm_segment_t fs;
+ unsigned int i;
+
+ fs = get_fs();
+ set_fs(KERNEL_DS);
+ show_data(regs->pc - nbytes, nbytes * 2, "PC");
+ show_data(regs->regs[30] - nbytes, nbytes * 2, "LR");
+ show_data(regs->sp - nbytes, nbytes * 2, "SP");
+ for (i = 0; i < 30; i++) {
+ char name[4];
+ snprintf(name, sizeof(name), "X%u", i);
+ show_data(regs->regs[i] - nbytes, nbytes * 2, name);
+ }
+ set_fs(fs);
+}
+
void __show_regs(struct pt_regs *regs)
{
int i, top_reg;
@@ -205,6 +269,9 @@
pr_cont("\n");
}
+ if (!user_mode(regs))
+ show_extra_register_data(regs, 128);
+ printk("\n");
}
void show_regs(struct pt_regs * regs)
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index 8d48b23..6eb9350 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -21,6 +21,7 @@
#include <linux/of.h>
#include <linux/sched.h>
#include <linux/sched/topology.h>
+#include <linux/sched/energy.h>
#include <linux/slab.h>
#include <linux/string.h>
@@ -280,8 +281,58 @@
topology_populated:
update_siblings_masks(cpuid);
+ topology_detect_flags();
}
+#ifdef CONFIG_SCHED_SMT
+static int smt_flags(void)
+{
+ return cpu_smt_flags() | topology_smt_flags();
+}
+#endif
+
+#ifdef CONFIG_SCHED_MC
+static int core_flags(void)
+{
+ return cpu_core_flags() | topology_core_flags();
+}
+#endif
+
+static int cpu_flags(void)
+{
+ return topology_cpu_flags();
+}
+
+static inline
+const struct sched_group_energy * const cpu_core_energy(int cpu)
+{
+ return sge_array[cpu][SD_LEVEL0];
+}
+
+static inline
+const struct sched_group_energy * const cpu_cluster_energy(int cpu)
+{
+ return sge_array[cpu][SD_LEVEL1];
+}
+
+static inline
+const struct sched_group_energy * const cpu_system_energy(int cpu)
+{
+ return sge_array[cpu][SD_LEVEL2];
+}
+
+static struct sched_domain_topology_level arm64_topology[] = {
+#ifdef CONFIG_SCHED_SMT
+ { cpu_smt_mask, smt_flags, SD_INIT_NAME(SMT) },
+#endif
+#ifdef CONFIG_SCHED_MC
+ { cpu_coregroup_mask, core_flags, cpu_core_energy, SD_INIT_NAME(MC) },
+#endif
+ { cpu_cpu_mask, cpu_flags, cpu_cluster_energy, SD_INIT_NAME(DIE) },
+ { cpu_cpu_mask, NULL, cpu_system_energy, SD_INIT_NAME(SYS) },
+ { NULL, }
+};
+
static void __init reset_cpu_topology(void)
{
unsigned int cpu;
@@ -310,4 +361,6 @@
*/
if (of_have_populated_dt() && parse_dt_topology())
reset_cpu_topology();
+ else
+ set_sched_topology(arm64_topology);
}
diff --git a/arch/arm64/kernel/vdso/Makefile b/arch/arm64/kernel/vdso/Makefile
index b215c71..ef3f9d9 100644
--- a/arch/arm64/kernel/vdso/Makefile
+++ b/arch/arm64/kernel/vdso/Makefile
@@ -15,6 +15,7 @@
ccflags-y := -shared -fno-common -fno-builtin
ccflags-y += -nostdlib -Wl,-soname=linux-vdso.so.1 \
$(call cc-ldoption, -Wl$(comma)--hash-style=sysv)
+ccflags-y += $(DISABLE_LTO)
# Disable gcov profiling for VDSO code
GCOV_PROFILE := n
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index 76320e9..c39872a 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -309,7 +309,7 @@
b.ne 4f
ldr x2, 6f
2:
- cbz w1, 3f
+ cbz x1, 3f
stp xzr, x2, [x1]
3: /* res == NULL. */
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index ddfd3c0..cb3e439 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -61,7 +61,7 @@
#define TRAMP_TEXT \
. = ALIGN(PAGE_SIZE); \
VMLINUX_SYMBOL(__entry_tramp_text_start) = .; \
- *(.entry.tramp.text) \
+ KEEP(*(.entry.tramp.text)) \
. = ALIGN(PAGE_SIZE); \
VMLINUX_SYMBOL(__entry_tramp_text_end) = .;
#else
@@ -150,11 +150,11 @@
. = ALIGN(4);
.altinstructions : {
__alt_instructions = .;
- *(.altinstructions)
+ KEEP(*(.altinstructions))
__alt_instructions_end = .;
}
.altinstr_replacement : {
- *(.altinstr_replacement)
+ KEEP(*(.altinstr_replacement))
}
. = ALIGN(PAGE_SIZE);
diff --git a/arch/arm64/kvm/hyp/Makefile b/arch/arm64/kvm/hyp/Makefile
index f04400d..bfa00a9 100644
--- a/arch/arm64/kvm/hyp/Makefile
+++ b/arch/arm64/kvm/hyp/Makefile
@@ -3,7 +3,11 @@
# Makefile for Kernel-based Virtual Machine module, HYP part
#
-ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING
+ccflags-y += -fno-stack-protector -DDISABLE_BRANCH_PROFILING $(DISABLE_CFI)
+
+ifeq ($(cc-name),clang)
+ccflags-y += -fno-jump-tables
+endif
KVM=../../../../virt/kvm
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 58470b1..5c172ba 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -166,7 +166,7 @@
/* create a coherent mapping */
page = virt_to_page(ptr);
coherent_ptr = dma_common_contiguous_remap(page, size, VM_USERMAP,
- prot, NULL);
+ prot, __builtin_return_address(0));
if (!coherent_ptr)
goto no_map;
diff --git a/arch/powerpc/platforms/pseries/cmm.c b/arch/powerpc/platforms/pseries/cmm.c
index 4ac419c..560aefd 100644
--- a/arch/powerpc/platforms/pseries/cmm.c
+++ b/arch/powerpc/platforms/pseries/cmm.c
@@ -742,7 +742,7 @@
* Return value:
* 0 on success / other on failure
**/
-static int cmm_set_disable(const char *val, struct kernel_param *kp)
+static int cmm_set_disable(const char *val, const struct kernel_param *kp)
{
int disable = simple_strtoul(val, NULL, 10);
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 1c4d012..9451ea0 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -111,6 +111,8 @@
KBUILD_CFLAGS += $(call cc-option,-mno-80387)
KBUILD_CFLAGS += $(call cc-option,-mno-fp-ret-in-387)
+ KBUILD_CFLAGS += -fno-pic
+
# By default gcc and clang use a stack alignment of 16 bytes for x86.
# However the standard kernel entry on x86-64 leaves the stack on an
# 8-byte boundary. If the compiler isn't informed about the actual
diff --git a/arch/x86/configs/i386_ranchu_defconfig b/arch/x86/configs/i386_ranchu_defconfig
new file mode 100644
index 0000000..4e9dc7d
--- /dev/null
+++ b/arch/x86/configs/i386_ranchu_defconfig
@@ -0,0 +1,421 @@
+# CONFIG_64BIT is not set
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_POSIX_MQUEUE=y
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_DEBUG=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_SYSCTL_SYSCALL=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_ARCH_MMAP_RND_BITS=16
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_OSF_PARTITION=y
+CONFIG_AMIGA_PARTITION=y
+CONFIG_MAC_PARTITION=y
+CONFIG_BSD_DISKLABEL=y
+CONFIG_MINIX_SUBPARTITION=y
+CONFIG_SOLARIS_X86_PARTITION=y
+CONFIG_UNIXWARE_DISKLABEL=y
+CONFIG_SGI_PARTITION=y
+CONFIG_SUN_PARTITION=y
+CONFIG_KARMA_PARTITION=y
+CONFIG_SMP=y
+CONFIG_X86_BIGSMP=y
+CONFIG_MCORE2=y
+CONFIG_X86_GENERIC=y
+CONFIG_HPET_TIMER=y
+CONFIG_NR_CPUS=512
+CONFIG_PREEMPT=y
+# CONFIG_X86_MCE is not set
+CONFIG_X86_REBOOTFIXUPS=y
+CONFIG_X86_MSR=y
+CONFIG_X86_CPUID=y
+CONFIG_KSM=y
+CONFIG_CMA=y
+# CONFIG_MTRR_SANITIZER is not set
+CONFIG_EFI=y
+CONFIG_EFI_STUB=y
+CONFIG_HZ_100=y
+CONFIG_PHYSICAL_START=0x100000
+CONFIG_PM_AUTOSLEEP=y
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+CONFIG_PM_DEBUG=y
+CONFIG_CPU_FREQ=y
+# CONFIG_CPU_FREQ_STAT is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+CONFIG_CPU_FREQ_GOV_USERSPACE=y
+CONFIG_PCIEPORTBUS=y
+# CONFIG_PCIEASPM is not set
+CONFIG_PCCARD=y
+CONFIG_YENTA=y
+CONFIG_HOTPLUG_PCI=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_BINFMT_MISC=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_ROUTE_MULTIPATH=y
+CONFIG_IP_ROUTE_VERBOSE=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+CONFIG_IP_MROUTE=y
+CONFIG_IP_PIMSM_V1=y
+CONFIG_IP_PIMSM_V2=y
+CONFIG_SYN_COOKIES=y
+CONFIG_INET_ESP=y
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_INET_LRO is not set
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NETLABEL=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CT_PROTO_DCCP=y
+CONFIG_NF_CT_PROTO_SCTP=y
+CONFIG_NF_CT_PROTO_UDPLITE=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QTAGUID=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_NF_CONNTRACK_IPV4=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_NF_CONNTRACK_IPV6=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_CLS_U32=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_CLS_ACT=y
+CONFIG_CFG80211=y
+CONFIG_MAC80211=y
+CONFIG_MAC80211_LEDS=y
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DMA_CMA=y
+CONFIG_CMA_SIZE_MBYTES=16
+CONFIG_CONNECTOR=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_VIRTIO_BLK=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_BLK_DEV_SR=y
+CONFIG_BLK_DEV_SR_VENDOR=y
+CONFIG_CHR_DEV_SG=y
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_SPI_ATTRS=y
+CONFIG_SCSI_ISCSI_ATTRS=y
+# CONFIG_SCSI_LOWLEVEL is not set
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_ATA_PIIX=y
+CONFIG_PATA_AMD=y
+CONFIG_PATA_OLDPIIX=y
+CONFIG_PATA_SCH=y
+CONFIG_PATA_MPIIX=y
+CONFIG_ATA_GENERIC=y
+CONFIG_MD=y
+CONFIG_BLK_DEV_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_DEBUG=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_MIRROR=y
+CONFIG_DM_ZERO=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_NETDEVICES=y
+CONFIG_NETCONSOLE=y
+CONFIG_TUN=y
+CONFIG_VIRTIO_NET=y
+CONFIG_BNX2=y
+CONFIG_TIGON3=y
+CONFIG_NET_TULIP=y
+CONFIG_E100=y
+CONFIG_E1000=y
+CONFIG_E1000E=y
+CONFIG_SKY2=y
+CONFIG_NE2K_PCI=y
+CONFIG_FORCEDETH=y
+CONFIG_8139TOO=y
+# CONFIG_8139TOO_PIO is not set
+CONFIG_R8169=y
+CONFIG_FDDI=y
+CONFIG_PPP=y
+CONFIG_PPP_BSDCOMP=y
+CONFIG_PPP_DEFLATE=y
+CONFIG_PPP_MPPE=y
+CONFIG_USB_USBNET=y
+CONFIG_INPUT_POLLDEV=y
+# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_KEYRESET=y
+# CONFIG_KEYBOARD_ATKBD is not set
+CONFIG_KEYBOARD_GOLDFISH_EVENTS=y
+# CONFIG_INPUT_MOUSE is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_JOYSTICK_XPAD_FF=y
+CONFIG_JOYSTICK_XPAD_LEDS=y
+CONFIG_INPUT_TABLET=y
+CONFIG_TABLET_USB_ACECAD=y
+CONFIG_TABLET_USB_AIPTEK=y
+CONFIG_TABLET_USB_GTCO=y
+CONFIG_TABLET_USB_HANWANG=y
+CONFIG_TABLET_USB_KBTAB=y
+CONFIG_INPUT_TOUCHSCREEN=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_UINPUT=y
+CONFIG_INPUT_GPIO=y
+# CONFIG_SERIO is not set
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+CONFIG_SERIAL_NONSTANDARD=y
+# CONFIG_DEVMEM is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_NVRAM=y
+CONFIG_I2C_I801=y
+CONFIG_BATTERY_GOLDFISH=y
+CONFIG_WATCHDOG=y
+CONFIG_MEDIA_SUPPORT=y
+CONFIG_AGP=y
+CONFIG_AGP_AMD64=y
+CONFIG_AGP_INTEL=y
+CONFIG_DRM=y
+CONFIG_FB_MODE_HELPERS=y
+CONFIG_FB_TILEBLITTING=y
+CONFIG_FB_EFI=y
+CONFIG_FB_GOLDFISH=y
+CONFIG_BACKLIGHT_LCD_SUPPORT=y
+# CONFIG_LCD_CLASS_DEVICE is not set
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_ACRUX=y
+CONFIG_HID_ACRUX_FF=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_DRAGONRISE=y
+CONFIG_DRAGONRISE_FF=y
+CONFIG_HID_EMS_FF=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_HOLTEK=y
+CONFIG_HID_KEYTOUCH=y
+CONFIG_HID_KYE=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_WALTOP=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_TWINHAN=y
+CONFIG_HID_KENSINGTON=y
+CONFIG_HID_LCPOWER=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_LOGITECH_FF=y
+CONFIG_LOGIRUMBLEPAD2_FF=y
+CONFIG_LOGIG940_FF=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_ORTEK=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PRIMAX=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SAITEK=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SPEEDLINK=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HID_GREENASIA=y
+CONFIG_GREENASIA_FF=y
+CONFIG_HID_SMARTJOYPLUS=y
+CONFIG_SMARTJOYPLUS_FF=y
+CONFIG_HID_TIVO=y
+CONFIG_HID_TOPSEED=y
+CONFIG_HID_THRUSTMASTER=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_HID_ZEROPLUS=y
+CONFIG_HID_ZYDACRON=y
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_MON=y
+CONFIG_USB_EHCI_HCD=y
+# CONFIG_USB_EHCI_TT_NEWSCHED is not set
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_UHCI_HCD=y
+CONFIG_USB_PRINTER=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB_OTG_WAKELOCK=y
+CONFIG_EDAC=y
+CONFIG_RTC_CLASS=y
+# CONFIG_RTC_HCTOSYS is not set
+CONFIG_DMADEVICES=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_ANDROID_LOW_MEMORY_KILLER=y
+CONFIG_SYNC=y
+CONFIG_SW_SYNC=y
+CONFIG_SYNC_FILE=y
+CONFIG_ION=y
+CONFIG_GOLDFISH_AUDIO=y
+CONFIG_SND_HDA_INTEL=y
+CONFIG_GOLDFISH=y
+CONFIG_GOLDFISH_PIPE=y
+CONFIG_GOLDFISH_SYNC=y
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_ISCSI_IBFT_FIND=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_QUOTA=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
+# CONFIG_PRINT_QUOTA_WARNING is not set
+CONFIG_FUSE_FS=y
+CONFIG_ISO9660_FS=y
+CONFIG_JOLIET=y
+CONFIG_ZISOFS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_PROC_KCORE=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_HUGETLBFS=y
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y
+# CONFIG_NETWORK_FILESYSTEMS is not set
+CONFIG_NLS_DEFAULT="utf8"
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_UTF8=y
+CONFIG_PRINTK_TIME=y
+CONFIG_DEBUG_INFO=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_FRAME_WARN=2048
+# CONFIG_UNUSED_SYMBOLS is not set
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DEBUG_MEMORY_INIT=y
+CONFIG_PANIC_TIMEOUT=5
+CONFIG_SCHEDSTATS=y
+CONFIG_TIMER_STATS=y
+CONFIG_SCHED_TRACER=y
+CONFIG_BLK_DEV_IO_TRACE=y
+CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
+CONFIG_KEYS=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_SELINUX=y
+CONFIG_CRYPTO_AES_586=y
+CONFIG_CRYPTO_TWOFISH=y
+CONFIG_ASYMMETRIC_KEY_TYPE=y
+CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
+CONFIG_X509_CERTIFICATE_PARSER=y
+CONFIG_PKCS7_MESSAGE_PARSER=y
+CONFIG_PKCS7_TEST_KEY=y
+# CONFIG_VIRTUALIZATION is not set
+CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/configs/x86_64_cuttlefish_defconfig b/arch/x86/configs/x86_64_cuttlefish_defconfig
new file mode 100644
index 0000000..db63c91
--- /dev/null
+++ b/arch/x86/configs/x86_64_cuttlefish_defconfig
@@ -0,0 +1,464 @@
+CONFIG_POSIX_MQUEUE=y
+# CONFIG_FHANDLE is not set
+# CONFIG_USELIB is not set
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_IKCONFIG=y
+CONFIG_IKCONFIG_PROC=y
+CONFIG_CGROUPS=y
+CONFIG_MEMCG=y
+CONFIG_MEMCG_SWAP=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_BPF=y
+CONFIG_NAMESPACES=y
+CONFIG_BLK_DEV_INITRD=y
+# CONFIG_RD_LZ4 is not set
+CONFIG_KALLSYMS_ALL=y
+# CONFIG_PCSPKR_PLATFORM is not set
+CONFIG_BPF_SYSCALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_PROFILING=y
+CONFIG_OPROFILE=y
+CONFIG_KPROBES=y
+CONFIG_JUMP_LABEL=y
+CONFIG_CC_STACKPROTECTOR_STRONG=y
+CONFIG_REFCOUNT_FULL=y
+CONFIG_MODULES=y
+CONFIG_MODULE_UNLOAD=y
+CONFIG_MODVERSIONS=y
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_SMP=y
+CONFIG_HYPERVISOR_GUEST=y
+CONFIG_PARAVIRT=y
+CONFIG_PARAVIRT_SPINLOCKS=y
+CONFIG_MCORE2=y
+CONFIG_PROCESSOR_SELECT=y
+# CONFIG_CPU_SUP_CENTAUR is not set
+CONFIG_NR_CPUS=8
+CONFIG_PREEMPT=y
+# CONFIG_MICROCODE is not set
+CONFIG_X86_MSR=y
+CONFIG_X86_CPUID=y
+CONFIG_KSM=y
+CONFIG_DEFAULT_MMAP_MIN_ADDR=65536
+CONFIG_TRANSPARENT_HUGEPAGE=y
+CONFIG_ZSMALLOC=y
+# CONFIG_MTRR is not set
+CONFIG_HZ_100=y
+CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
+CONFIG_PHYSICAL_START=0x200000
+CONFIG_PHYSICAL_ALIGN=0x1000000
+CONFIG_CMDLINE_BOOL=y
+CONFIG_CMDLINE="console=ttyS0 reboot=p"
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+CONFIG_PM_DEBUG=y
+CONFIG_ACPI_PROCFS_POWER=y
+# CONFIG_ACPI_FAN is not set
+# CONFIG_ACPI_THERMAL is not set
+# CONFIG_X86_PM_TIMER is not set
+CONFIG_CPU_FREQ_GOV_ONDEMAND=y
+CONFIG_X86_ACPI_CPUFREQ=y
+CONFIG_PCI_MMCONFIG=y
+CONFIG_PCI_MSI=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_BINFMT_MISC=y
+CONFIG_IA32_EMULATION=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_ROUTE_MULTIPATH=y
+CONFIG_IP_ROUTE_VERBOSE=y
+CONFIG_IP_MROUTE=y
+CONFIG_IP_PIMSM_V1=y
+CONFIG_IP_PIMSM_V2=y
+CONFIG_SYN_COOKIES=y
+CONFIG_NET_IPVTI=y
+CONFIG_INET_ESP=y
+# CONFIG_INET_XFRM_MODE_BEET is not set
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_TCP_CONG_ADVANCED=y
+# CONFIG_TCP_CONG_BIC is not set
+# CONFIG_TCP_CONG_WESTWOOD is not set
+# CONFIG_TCP_CONG_HTCP is not set
+CONFIG_TCP_MD5SIG=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_VTI=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NETLABEL=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_BPF=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QTAGUID=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_NF_CONNTRACK_IPV4=y
+CONFIG_NF_SOCKET_IPV4=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_NAT=y
+CONFIG_IP_NF_TARGET_MASQUERADE=y
+CONFIG_IP_NF_TARGET_NETMAP=y
+CONFIG_IP_NF_TARGET_REDIRECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_NF_CONNTRACK_IPV6=y
+CONFIG_NF_SOCKET_IPV6=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_MATCH_IPV6HEADER=y
+CONFIG_IP6_NF_MATCH_RPFILTER=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_CLS_U32=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_CLS_ACT=y
+CONFIG_CFG80211=y
+CONFIG_MAC80211=y
+CONFIG_RFKILL=y
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DEVTMPFS=y
+CONFIG_DEBUG_DEVRES=y
+CONFIG_OF=y
+CONFIG_OF_UNITTEST=y
+# CONFIG_PNP_DEBUG_MESSAGES is not set
+CONFIG_ZRAM=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_VIRTIO_BLK=y
+CONFIG_UID_SYS_STATS=y
+CONFIG_MEMORY_STATE_TIME=y
+CONFIG_SCSI=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_BLK_DEV_SR=y
+CONFIG_BLK_DEV_SR_VENDOR=y
+CONFIG_CHR_DEV_SG=y
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_SPI_ATTRS=y
+CONFIG_SCSI_VIRTIO=y
+CONFIG_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_MIRROR=y
+CONFIG_DM_ZERO=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_HASH_PREFETCH_MIN_SIZE=1
+CONFIG_DM_VERITY_FEC=y
+CONFIG_DM_ANDROID_VERITY=y
+CONFIG_NETDEVICES=y
+CONFIG_NETCONSOLE=y
+CONFIG_NETCONSOLE_DYNAMIC=y
+CONFIG_TUN=y
+CONFIG_VIRTIO_NET=y
+# CONFIG_ETHERNET is not set
+CONFIG_PPP=y
+CONFIG_PPP_BSDCOMP=y
+CONFIG_PPP_DEFLATE=y
+CONFIG_PPP_MPPE=y
+CONFIG_USB_USBNET=y
+# CONFIG_USB_NET_AX8817X is not set
+# CONFIG_USB_NET_AX88179_178A is not set
+# CONFIG_USB_NET_CDCETHER is not set
+# CONFIG_USB_NET_CDC_NCM is not set
+# CONFIG_USB_NET_NET1080 is not set
+# CONFIG_USB_NET_CDC_SUBSET is not set
+# CONFIG_USB_NET_ZAURUS is not set
+# CONFIG_WLAN_VENDOR_ADMTEK is not set
+# CONFIG_WLAN_VENDOR_ATH is not set
+# CONFIG_WLAN_VENDOR_ATMEL is not set
+# CONFIG_WLAN_VENDOR_BROADCOM is not set
+# CONFIG_WLAN_VENDOR_CISCO is not set
+# CONFIG_WLAN_VENDOR_INTEL is not set
+# CONFIG_WLAN_VENDOR_INTERSIL is not set
+# CONFIG_WLAN_VENDOR_MARVELL is not set
+# CONFIG_WLAN_VENDOR_MEDIATEK is not set
+# CONFIG_WLAN_VENDOR_RALINK is not set
+# CONFIG_WLAN_VENDOR_REALTEK is not set
+# CONFIG_WLAN_VENDOR_RSI is not set
+# CONFIG_WLAN_VENDOR_ST is not set
+# CONFIG_WLAN_VENDOR_TI is not set
+# CONFIG_WLAN_VENDOR_ZYDAS is not set
+# CONFIG_WLAN_VENDOR_QUANTENNA is not set
+CONFIG_MAC80211_HWSIM=y
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_KEYRESET=y
+# CONFIG_INPUT_KEYBOARD is not set
+# CONFIG_INPUT_MOUSE is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_JOYSTICK_XPAD_FF=y
+CONFIG_JOYSTICK_XPAD_LEDS=y
+CONFIG_INPUT_TABLET=y
+CONFIG_TABLET_USB_ACECAD=y
+CONFIG_TABLET_USB_AIPTEK=y
+CONFIG_TABLET_USB_GTCO=y
+CONFIG_TABLET_USB_HANWANG=y
+CONFIG_TABLET_USB_KBTAB=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_KEYCHORD=y
+CONFIG_INPUT_UINPUT=y
+CONFIG_INPUT_GPIO=y
+# CONFIG_SERIO_I8042 is not set
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+# CONFIG_DEVMEM is not set
+CONFIG_SERIAL_8250=y
+# CONFIG_SERIAL_8250_DEPRECATED_OPTIONS is not set
+CONFIG_SERIAL_8250_CONSOLE=y
+# CONFIG_SERIAL_8250_EXAR is not set
+CONFIG_SERIAL_8250_NR_UARTS=48
+CONFIG_SERIAL_8250_EXTENDED=y
+CONFIG_SERIAL_8250_MANY_PORTS=y
+CONFIG_SERIAL_8250_SHARE_IRQ=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_HW_RANDOM=y
+# CONFIG_HW_RANDOM_INTEL is not set
+# CONFIG_HW_RANDOM_AMD is not set
+# CONFIG_HW_RANDOM_VIA is not set
+CONFIG_HW_RANDOM_VIRTIO=y
+CONFIG_HPET=y
+# CONFIG_HPET_MMAP_DEFAULT is not set
+# CONFIG_DEVPORT is not set
+# CONFIG_ACPI_I2C_OPREGION is not set
+# CONFIG_I2C_COMPAT is not set
+# CONFIG_I2C_HELPER_AUTO is not set
+CONFIG_PTP_1588_CLOCK=y
+# CONFIG_HWMON is not set
+# CONFIG_X86_PKG_TEMP_THERMAL is not set
+CONFIG_WATCHDOG=y
+CONFIG_SOFT_WATCHDOG=y
+CONFIG_MEDIA_SUPPORT=y
+# CONFIG_VGA_ARB is not set
+CONFIG_DRM=y
+# CONFIG_DRM_FBDEV_EMULATION is not set
+CONFIG_DRM_VIRTIO_GPU=y
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_ACRUX=y
+CONFIG_HID_ACRUX_FF=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_DRAGONRISE=y
+CONFIG_DRAGONRISE_FF=y
+CONFIG_HID_EMS_FF=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_HOLTEK=y
+CONFIG_HID_KEYTOUCH=y
+CONFIG_HID_KYE=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_WALTOP=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_TWINHAN=y
+CONFIG_HID_KENSINGTON=y
+CONFIG_HID_LCPOWER=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_LOGITECH_FF=y
+CONFIG_LOGIRUMBLEPAD2_FF=y
+CONFIG_LOGIG940_FF=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_ORTEK=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PRIMAX=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SAITEK=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SPEEDLINK=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HID_GREENASIA=y
+CONFIG_GREENASIA_FF=y
+CONFIG_HID_SMARTJOYPLUS=y
+CONFIG_SMARTJOYPLUS_FF=y
+CONFIG_HID_TIVO=y
+CONFIG_HID_TOPSEED=y
+CONFIG_HID_THRUSTMASTER=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_HID_ZEROPLUS=y
+CONFIG_HID_ZYDACRON=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_EHCI_HCD=y
+CONFIG_USB_GADGET=y
+CONFIG_USB_DUMMY_HCD=y
+CONFIG_USB_CONFIGFS=y
+CONFIG_USB_CONFIGFS_F_FS=y
+CONFIG_USB_CONFIGFS_F_ACC=y
+CONFIG_USB_CONFIGFS_F_AUDIO_SRC=y
+CONFIG_USB_CONFIGFS_UEVENT=y
+CONFIG_USB_CONFIGFS_F_MIDI=y
+CONFIG_RTC_CLASS=y
+# CONFIG_RTC_HCTOSYS is not set
+CONFIG_SW_SYNC=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_VIRTIO_BALLOON=y
+CONFIG_VIRTIO_MMIO=y
+CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_ANDROID_VSOC=y
+CONFIG_ION=y
+# CONFIG_X86_PLATFORM_DEVICES is not set
+# CONFIG_IOMMU_SUPPORT is not set
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDER_IPC=y
+# CONFIG_FIRMWARE_MEMMAP is not set
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_POSIX_ACL=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_EXT4_ENCRYPTION=y
+CONFIG_F2FS_FS=y
+CONFIG_F2FS_FS_SECURITY=y
+CONFIG_F2FS_FS_ENCRYPTION=y
+CONFIG_QUOTA=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
+# CONFIG_PRINT_QUOTA_WARNING is not set
+CONFIG_QFMT_V2=y
+CONFIG_AUTOFS4_FS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_PROC_KCORE=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_HUGETLBFS=y
+CONFIG_SDCARD_FS=y
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y
+CONFIG_NLS_DEFAULT="utf8"
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_UTF8=y
+CONFIG_PRINTK_TIME=y
+CONFIG_DEBUG_INFO=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+CONFIG_FRAME_WARN=1024
+# CONFIG_UNUSED_SYMBOLS is not set
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DEBUG_STACK_USAGE=y
+CONFIG_DEBUG_MEMORY_INIT=y
+CONFIG_DEBUG_STACKOVERFLOW=y
+CONFIG_HARDLOCKUP_DETECTOR=y
+CONFIG_PANIC_TIMEOUT=5
+CONFIG_SCHEDSTATS=y
+CONFIG_RCU_CPU_STALL_TIMEOUT=60
+CONFIG_ENABLE_DEFAULT_TRACERS=y
+CONFIG_IO_DELAY_NONE=y
+CONFIG_DEBUG_BOOT_PARAMS=y
+CONFIG_OPTIMIZE_INLINING=y
+CONFIG_UNWINDER_FRAME_POINTER=y
+CONFIG_SECURITY_PERF_EVENTS_RESTRICT=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_PATH=y
+CONFIG_HARDENED_USERCOPY=y
+CONFIG_SECURITY_SELINUX=y
+CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE=1
+CONFIG_CRYPTO_RSA=y
+# CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is not set
+CONFIG_CRYPTO_SHA512=y
+CONFIG_CRYPTO_LZ4=y
+CONFIG_CRYPTO_ZSTD=y
+CONFIG_CRYPTO_DEV_VIRTIO=y
+CONFIG_ASYMMETRIC_KEY_TYPE=y
+CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
+CONFIG_X509_CERTIFICATE_PARSER=y
+CONFIG_SYSTEM_TRUSTED_KEYRING=y
+CONFIG_SYSTEM_TRUSTED_KEYS="verity_dev_keys.x509"
diff --git a/arch/x86/configs/x86_64_ranchu_defconfig b/arch/x86/configs/x86_64_ranchu_defconfig
new file mode 100644
index 0000000..81202e3
--- /dev/null
+++ b/arch/x86/configs/x86_64_ranchu_defconfig
@@ -0,0 +1,416 @@
+# CONFIG_LOCALVERSION_AUTO is not set
+CONFIG_POSIX_MQUEUE=y
+CONFIG_AUDIT=y
+CONFIG_NO_HZ=y
+CONFIG_HIGH_RES_TIMERS=y
+CONFIG_BSD_PROCESS_ACCT=y
+CONFIG_TASKSTATS=y
+CONFIG_TASK_DELAY_ACCT=y
+CONFIG_TASK_XACCT=y
+CONFIG_TASK_IO_ACCOUNTING=y
+CONFIG_CGROUPS=y
+CONFIG_CGROUP_DEBUG=y
+CONFIG_CGROUP_FREEZER=y
+CONFIG_CGROUP_CPUACCT=y
+CONFIG_CGROUP_SCHED=y
+CONFIG_RT_GROUP_SCHED=y
+CONFIG_BLK_DEV_INITRD=y
+CONFIG_CC_OPTIMIZE_FOR_SIZE=y
+CONFIG_SYSCTL_SYSCALL=y
+CONFIG_KALLSYMS_ALL=y
+CONFIG_EMBEDDED=y
+# CONFIG_COMPAT_BRK is not set
+CONFIG_ARCH_MMAP_RND_BITS=32
+CONFIG_ARCH_MMAP_RND_COMPAT_BITS=16
+CONFIG_PARTITION_ADVANCED=y
+CONFIG_OSF_PARTITION=y
+CONFIG_AMIGA_PARTITION=y
+CONFIG_MAC_PARTITION=y
+CONFIG_BSD_DISKLABEL=y
+CONFIG_MINIX_SUBPARTITION=y
+CONFIG_SOLARIS_X86_PARTITION=y
+CONFIG_UNIXWARE_DISKLABEL=y
+CONFIG_SGI_PARTITION=y
+CONFIG_SUN_PARTITION=y
+CONFIG_KARMA_PARTITION=y
+CONFIG_SMP=y
+CONFIG_MCORE2=y
+CONFIG_MAXSMP=y
+CONFIG_PREEMPT=y
+# CONFIG_X86_MCE is not set
+CONFIG_X86_MSR=y
+CONFIG_X86_CPUID=y
+CONFIG_KSM=y
+CONFIG_CMA=y
+# CONFIG_MTRR_SANITIZER is not set
+CONFIG_EFI=y
+CONFIG_EFI_STUB=y
+CONFIG_HZ_100=y
+CONFIG_PHYSICAL_START=0x100000
+CONFIG_PM_AUTOSLEEP=y
+CONFIG_PM_WAKELOCKS=y
+CONFIG_PM_WAKELOCKS_LIMIT=0
+# CONFIG_PM_WAKELOCKS_GC is not set
+CONFIG_PM_DEBUG=y
+CONFIG_CPU_FREQ=y
+# CONFIG_CPU_FREQ_STAT is not set
+CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
+CONFIG_CPU_FREQ_GOV_USERSPACE=y
+CONFIG_PCI_MMCONFIG=y
+CONFIG_PCIEPORTBUS=y
+# CONFIG_PCIEASPM is not set
+CONFIG_PCCARD=y
+CONFIG_YENTA=y
+CONFIG_HOTPLUG_PCI=y
+# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
+CONFIG_BINFMT_MISC=y
+CONFIG_IA32_EMULATION=y
+CONFIG_NET=y
+CONFIG_PACKET=y
+CONFIG_UNIX=y
+CONFIG_XFRM_USER=y
+CONFIG_NET_KEY=y
+CONFIG_INET=y
+CONFIG_IP_MULTICAST=y
+CONFIG_IP_ADVANCED_ROUTER=y
+CONFIG_IP_MULTIPLE_TABLES=y
+CONFIG_IP_ROUTE_MULTIPATH=y
+CONFIG_IP_ROUTE_VERBOSE=y
+CONFIG_IP_PNP=y
+CONFIG_IP_PNP_DHCP=y
+CONFIG_IP_PNP_BOOTP=y
+CONFIG_IP_PNP_RARP=y
+CONFIG_IP_MROUTE=y
+CONFIG_IP_PIMSM_V1=y
+CONFIG_IP_PIMSM_V2=y
+CONFIG_SYN_COOKIES=y
+CONFIG_INET_ESP=y
+# CONFIG_INET_XFRM_MODE_BEET is not set
+# CONFIG_INET_LRO is not set
+CONFIG_INET_DIAG_DESTROY=y
+CONFIG_IPV6_ROUTER_PREF=y
+CONFIG_IPV6_ROUTE_INFO=y
+CONFIG_IPV6_OPTIMISTIC_DAD=y
+CONFIG_INET6_AH=y
+CONFIG_INET6_ESP=y
+CONFIG_INET6_IPCOMP=y
+CONFIG_IPV6_MIP6=y
+CONFIG_IPV6_MULTIPLE_TABLES=y
+CONFIG_NETLABEL=y
+CONFIG_NETFILTER=y
+CONFIG_NF_CONNTRACK=y
+CONFIG_NF_CONNTRACK_SECMARK=y
+CONFIG_NF_CONNTRACK_EVENTS=y
+CONFIG_NF_CT_PROTO_DCCP=y
+CONFIG_NF_CT_PROTO_SCTP=y
+CONFIG_NF_CT_PROTO_UDPLITE=y
+CONFIG_NF_CONNTRACK_AMANDA=y
+CONFIG_NF_CONNTRACK_FTP=y
+CONFIG_NF_CONNTRACK_H323=y
+CONFIG_NF_CONNTRACK_IRC=y
+CONFIG_NF_CONNTRACK_NETBIOS_NS=y
+CONFIG_NF_CONNTRACK_PPTP=y
+CONFIG_NF_CONNTRACK_SANE=y
+CONFIG_NF_CONNTRACK_TFTP=y
+CONFIG_NF_CT_NETLINK=y
+CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
+CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
+CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
+CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
+CONFIG_NETFILTER_XT_TARGET_MARK=y
+CONFIG_NETFILTER_XT_TARGET_NFLOG=y
+CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
+CONFIG_NETFILTER_XT_TARGET_TPROXY=y
+CONFIG_NETFILTER_XT_TARGET_TRACE=y
+CONFIG_NETFILTER_XT_TARGET_SECMARK=y
+CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
+CONFIG_NETFILTER_XT_MATCH_COMMENT=y
+CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
+CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
+CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
+CONFIG_NETFILTER_XT_MATCH_HELPER=y
+CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
+CONFIG_NETFILTER_XT_MATCH_LENGTH=y
+CONFIG_NETFILTER_XT_MATCH_LIMIT=y
+CONFIG_NETFILTER_XT_MATCH_MAC=y
+CONFIG_NETFILTER_XT_MATCH_MARK=y
+CONFIG_NETFILTER_XT_MATCH_POLICY=y
+CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
+CONFIG_NETFILTER_XT_MATCH_QTAGUID=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA=y
+CONFIG_NETFILTER_XT_MATCH_QUOTA2=y
+CONFIG_NETFILTER_XT_MATCH_SOCKET=y
+CONFIG_NETFILTER_XT_MATCH_STATE=y
+CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
+CONFIG_NETFILTER_XT_MATCH_STRING=y
+CONFIG_NETFILTER_XT_MATCH_TIME=y
+CONFIG_NETFILTER_XT_MATCH_U32=y
+CONFIG_NF_CONNTRACK_IPV4=y
+CONFIG_IP_NF_IPTABLES=y
+CONFIG_IP_NF_MATCH_AH=y
+CONFIG_IP_NF_MATCH_ECN=y
+CONFIG_IP_NF_MATCH_TTL=y
+CONFIG_IP_NF_FILTER=y
+CONFIG_IP_NF_TARGET_REJECT=y
+CONFIG_IP_NF_MANGLE=y
+CONFIG_IP_NF_RAW=y
+CONFIG_IP_NF_SECURITY=y
+CONFIG_IP_NF_ARPTABLES=y
+CONFIG_IP_NF_ARPFILTER=y
+CONFIG_IP_NF_ARP_MANGLE=y
+CONFIG_NF_CONNTRACK_IPV6=y
+CONFIG_IP6_NF_IPTABLES=y
+CONFIG_IP6_NF_FILTER=y
+CONFIG_IP6_NF_TARGET_REJECT=y
+CONFIG_IP6_NF_MANGLE=y
+CONFIG_IP6_NF_RAW=y
+CONFIG_NET_SCHED=y
+CONFIG_NET_SCH_HTB=y
+CONFIG_NET_CLS_U32=y
+CONFIG_NET_EMATCH=y
+CONFIG_NET_EMATCH_U32=y
+CONFIG_NET_CLS_ACT=y
+CONFIG_CFG80211=y
+CONFIG_MAC80211=y
+CONFIG_MAC80211_LEDS=y
+CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
+CONFIG_DMA_CMA=y
+CONFIG_CONNECTOR=y
+CONFIG_BLK_DEV_LOOP=y
+CONFIG_BLK_DEV_RAM=y
+CONFIG_BLK_DEV_RAM_SIZE=8192
+CONFIG_VIRTIO_BLK=y
+CONFIG_BLK_DEV_SD=y
+CONFIG_BLK_DEV_SR=y
+CONFIG_BLK_DEV_SR_VENDOR=y
+CONFIG_CHR_DEV_SG=y
+CONFIG_SCSI_CONSTANTS=y
+CONFIG_SCSI_SPI_ATTRS=y
+CONFIG_SCSI_ISCSI_ATTRS=y
+# CONFIG_SCSI_LOWLEVEL is not set
+CONFIG_ATA=y
+CONFIG_SATA_AHCI=y
+CONFIG_ATA_PIIX=y
+CONFIG_PATA_AMD=y
+CONFIG_PATA_OLDPIIX=y
+CONFIG_PATA_SCH=y
+CONFIG_PATA_MPIIX=y
+CONFIG_ATA_GENERIC=y
+CONFIG_MD=y
+CONFIG_BLK_DEV_MD=y
+CONFIG_BLK_DEV_DM=y
+CONFIG_DM_DEBUG=y
+CONFIG_DM_CRYPT=y
+CONFIG_DM_MIRROR=y
+CONFIG_DM_ZERO=y
+CONFIG_DM_UEVENT=y
+CONFIG_DM_VERITY=y
+CONFIG_DM_VERITY_FEC=y
+CONFIG_NETDEVICES=y
+CONFIG_NETCONSOLE=y
+CONFIG_TUN=y
+CONFIG_VIRTIO_NET=y
+CONFIG_BNX2=y
+CONFIG_TIGON3=y
+CONFIG_NET_TULIP=y
+CONFIG_E100=y
+CONFIG_E1000=y
+CONFIG_E1000E=y
+CONFIG_SKY2=y
+CONFIG_NE2K_PCI=y
+CONFIG_FORCEDETH=y
+CONFIG_8139TOO=y
+# CONFIG_8139TOO_PIO is not set
+CONFIG_R8169=y
+CONFIG_FDDI=y
+CONFIG_PPP=y
+CONFIG_PPP_BSDCOMP=y
+CONFIG_PPP_DEFLATE=y
+CONFIG_PPP_MPPE=y
+CONFIG_USB_USBNET=y
+CONFIG_INPUT_POLLDEV=y
+# CONFIG_INPUT_MOUSEDEV_PSAUX is not set
+CONFIG_INPUT_EVDEV=y
+CONFIG_INPUT_KEYRESET=y
+# CONFIG_KEYBOARD_ATKBD is not set
+CONFIG_KEYBOARD_GOLDFISH_EVENTS=y
+# CONFIG_INPUT_MOUSE is not set
+CONFIG_INPUT_JOYSTICK=y
+CONFIG_JOYSTICK_XPAD=y
+CONFIG_JOYSTICK_XPAD_FF=y
+CONFIG_JOYSTICK_XPAD_LEDS=y
+CONFIG_INPUT_TABLET=y
+CONFIG_TABLET_USB_ACECAD=y
+CONFIG_TABLET_USB_AIPTEK=y
+CONFIG_TABLET_USB_GTCO=y
+CONFIG_TABLET_USB_HANWANG=y
+CONFIG_TABLET_USB_KBTAB=y
+CONFIG_INPUT_TOUCHSCREEN=y
+CONFIG_INPUT_MISC=y
+CONFIG_INPUT_UINPUT=y
+CONFIG_INPUT_GPIO=y
+# CONFIG_SERIO is not set
+# CONFIG_VT is not set
+# CONFIG_LEGACY_PTYS is not set
+CONFIG_SERIAL_NONSTANDARD=y
+# CONFIG_DEVMEM is not set
+# CONFIG_DEVKMEM is not set
+CONFIG_SERIAL_8250=y
+CONFIG_SERIAL_8250_CONSOLE=y
+CONFIG_VIRTIO_CONSOLE=y
+CONFIG_NVRAM=y
+CONFIG_I2C_I801=y
+CONFIG_BATTERY_GOLDFISH=y
+CONFIG_WATCHDOG=y
+CONFIG_MEDIA_SUPPORT=y
+CONFIG_AGP=y
+CONFIG_AGP_AMD64=y
+CONFIG_AGP_INTEL=y
+CONFIG_DRM=y
+CONFIG_FB_MODE_HELPERS=y
+CONFIG_FB_TILEBLITTING=y
+CONFIG_FB_EFI=y
+CONFIG_FB_GOLDFISH=y
+CONFIG_BACKLIGHT_LCD_SUPPORT=y
+# CONFIG_LCD_CLASS_DEVICE is not set
+CONFIG_SOUND=y
+CONFIG_SND=y
+CONFIG_HIDRAW=y
+CONFIG_UHID=y
+CONFIG_HID_A4TECH=y
+CONFIG_HID_ACRUX=y
+CONFIG_HID_ACRUX_FF=y
+CONFIG_HID_APPLE=y
+CONFIG_HID_BELKIN=y
+CONFIG_HID_CHERRY=y
+CONFIG_HID_CHICONY=y
+CONFIG_HID_PRODIKEYS=y
+CONFIG_HID_CYPRESS=y
+CONFIG_HID_DRAGONRISE=y
+CONFIG_DRAGONRISE_FF=y
+CONFIG_HID_EMS_FF=y
+CONFIG_HID_ELECOM=y
+CONFIG_HID_EZKEY=y
+CONFIG_HID_HOLTEK=y
+CONFIG_HID_KEYTOUCH=y
+CONFIG_HID_KYE=y
+CONFIG_HID_UCLOGIC=y
+CONFIG_HID_WALTOP=y
+CONFIG_HID_GYRATION=y
+CONFIG_HID_TWINHAN=y
+CONFIG_HID_KENSINGTON=y
+CONFIG_HID_LCPOWER=y
+CONFIG_HID_LOGITECH=y
+CONFIG_HID_LOGITECH_DJ=y
+CONFIG_LOGITECH_FF=y
+CONFIG_LOGIRUMBLEPAD2_FF=y
+CONFIG_LOGIG940_FF=y
+CONFIG_HID_MAGICMOUSE=y
+CONFIG_HID_MICROSOFT=y
+CONFIG_HID_MONTEREY=y
+CONFIG_HID_MULTITOUCH=y
+CONFIG_HID_NTRIG=y
+CONFIG_HID_ORTEK=y
+CONFIG_HID_PANTHERLORD=y
+CONFIG_PANTHERLORD_FF=y
+CONFIG_HID_PETALYNX=y
+CONFIG_HID_PICOLCD=y
+CONFIG_HID_PRIMAX=y
+CONFIG_HID_ROCCAT=y
+CONFIG_HID_SAITEK=y
+CONFIG_HID_SAMSUNG=y
+CONFIG_HID_SONY=y
+CONFIG_HID_SPEEDLINK=y
+CONFIG_HID_SUNPLUS=y
+CONFIG_HID_GREENASIA=y
+CONFIG_GREENASIA_FF=y
+CONFIG_HID_SMARTJOYPLUS=y
+CONFIG_SMARTJOYPLUS_FF=y
+CONFIG_HID_TIVO=y
+CONFIG_HID_TOPSEED=y
+CONFIG_HID_THRUSTMASTER=y
+CONFIG_HID_WACOM=y
+CONFIG_HID_WIIMOTE=y
+CONFIG_HID_ZEROPLUS=y
+CONFIG_HID_ZYDACRON=y
+CONFIG_HID_PID=y
+CONFIG_USB_HIDDEV=y
+CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
+CONFIG_USB_MON=y
+CONFIG_USB_EHCI_HCD=y
+# CONFIG_USB_EHCI_TT_NEWSCHED is not set
+CONFIG_USB_OHCI_HCD=y
+CONFIG_USB_UHCI_HCD=y
+CONFIG_USB_PRINTER=y
+CONFIG_USB_STORAGE=y
+CONFIG_USB_OTG_WAKELOCK=y
+CONFIG_EDAC=y
+CONFIG_RTC_CLASS=y
+# CONFIG_RTC_HCTOSYS is not set
+CONFIG_DMADEVICES=y
+CONFIG_VIRTIO_PCI=y
+CONFIG_STAGING=y
+CONFIG_ASHMEM=y
+CONFIG_ANDROID_LOW_MEMORY_KILLER=y
+CONFIG_SYNC=y
+CONFIG_SW_SYNC=y
+CONFIG_SYNC_FILE=y
+CONFIG_ION=y
+CONFIG_GOLDFISH_AUDIO=y
+CONFIG_SND_HDA_INTEL=y
+CONFIG_GOLDFISH=y
+CONFIG_GOLDFISH_PIPE=y
+CONFIG_GOLDFISH_SYNC=y
+CONFIG_ANDROID=y
+CONFIG_ANDROID_BINDER_IPC=y
+CONFIG_ISCSI_IBFT_FIND=y
+CONFIG_EXT4_FS=y
+CONFIG_EXT4_FS_SECURITY=y
+CONFIG_QUOTA=y
+CONFIG_QUOTA_NETLINK_INTERFACE=y
+# CONFIG_PRINT_QUOTA_WARNING is not set
+CONFIG_FUSE_FS=y
+CONFIG_ISO9660_FS=y
+CONFIG_JOLIET=y
+CONFIG_ZISOFS=y
+CONFIG_MSDOS_FS=y
+CONFIG_VFAT_FS=y
+CONFIG_PROC_KCORE=y
+CONFIG_TMPFS=y
+CONFIG_TMPFS_POSIX_ACL=y
+CONFIG_HUGETLBFS=y
+CONFIG_PSTORE=y
+CONFIG_PSTORE_CONSOLE=y
+CONFIG_PSTORE_RAM=y
+# CONFIG_NETWORK_FILESYSTEMS is not set
+CONFIG_NLS_DEFAULT="utf8"
+CONFIG_NLS_CODEPAGE_437=y
+CONFIG_NLS_ASCII=y
+CONFIG_NLS_ISO8859_1=y
+CONFIG_NLS_UTF8=y
+CONFIG_PRINTK_TIME=y
+CONFIG_DEBUG_INFO=y
+# CONFIG_ENABLE_WARN_DEPRECATED is not set
+# CONFIG_ENABLE_MUST_CHECK is not set
+# CONFIG_UNUSED_SYMBOLS is not set
+CONFIG_MAGIC_SYSRQ=y
+CONFIG_DEBUG_MEMORY_INIT=y
+CONFIG_PANIC_TIMEOUT=5
+CONFIG_SCHEDSTATS=y
+CONFIG_TIMER_STATS=y
+CONFIG_SCHED_TRACER=y
+CONFIG_BLK_DEV_IO_TRACE=y
+CONFIG_PROVIDE_OHCI1394_DMA_INIT=y
+CONFIG_KEYS=y
+CONFIG_SECURITY=y
+CONFIG_SECURITY_NETWORK=y
+CONFIG_SECURITY_SELINUX=y
+CONFIG_CRYPTO_TWOFISH=y
+CONFIG_ASYMMETRIC_KEY_TYPE=y
+CONFIG_ASYMMETRIC_PUBLIC_KEY_SUBTYPE=y
+CONFIG_X509_CERTIFICATE_PARSER=y
+CONFIG_PKCS7_MESSAGE_PARSER=y
+CONFIG_PKCS7_TEST_KEY=y
+# CONFIG_VIRTUALIZATION is not set
+CONFIG_CRC_T10DIF=y
diff --git a/arch/x86/entry/vdso/Makefile b/arch/x86/entry/vdso/Makefile
index b545bf9..7faa3ba 100644
--- a/arch/x86/entry/vdso/Makefile
+++ b/arch/x86/entry/vdso/Makefile
@@ -174,7 +174,8 @@
sh $(srctree)/$(src)/checkundef.sh '$(NM)' '$@'
VDSO_LDFLAGS = -fPIC -shared $(call cc-ldoption, -Wl$(comma)--hash-style=both) \
- $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS)
+ $(call cc-ldoption, -Wl$(comma)--build-id) -Wl,-Bsymbolic $(LTO_CFLAGS) \
+ $(filter --target=% --gcc-toolchain=%,$(KBUILD_CFLAGS))
GCOV_PROFILE := n
#
diff --git a/arch/x86/oprofile/nmi_int.c b/arch/x86/oprofile/nmi_int.c
index abff76b..a7a7677 100644
--- a/arch/x86/oprofile/nmi_int.c
+++ b/arch/x86/oprofile/nmi_int.c
@@ -592,7 +592,7 @@
static int force_cpu_type;
-static int set_cpu_type(const char *str, struct kernel_param *kp)
+static int set_cpu_type(const char *str, const struct kernel_param *kp)
{
if (!strcmp(str, "timer")) {
force_cpu_type = timer;
diff --git a/arch/x86/xen/smp_pv.c b/arch/x86/xen/smp_pv.c
index db6d90e..e3b18ad 100644
--- a/arch/x86/xen/smp_pv.c
+++ b/arch/x86/xen/smp_pv.c
@@ -430,6 +430,7 @@
* data back is to call:
*/
tick_nohz_idle_enter();
+ tick_nohz_idle_stop_tick_protected();
cpuhp_online_idle(CPUHP_AP_ONLINE_IDLE);
}
diff --git a/build.config.cuttlefish.x86_64 b/build.config.cuttlefish.x86_64
new file mode 100644
index 0000000..694ed56
--- /dev/null
+++ b/build.config.cuttlefish.x86_64
@@ -0,0 +1,16 @@
+ARCH=x86_64
+BRANCH=android-4.14
+CLANG_TRIPLE=x86_64-linux-gnu-
+CROSS_COMPILE=x86_64-linux-androidkernel-
+DEFCONFIG=x86_64_cuttlefish_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+POST_DEFCONFIG_CMDS="check_defconfig"
+CLANG_PREBUILT_BIN=prebuilts-master/clang/host/linux-x86/clang-r328903/bin
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/bin
+FILES="
+arch/x86/boot/bzImage
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.arm b/build.config.goldfish.arm
new file mode 100644
index 0000000..ff5646a
--- /dev/null
+++ b/build.config.goldfish.arm
@@ -0,0 +1,13 @@
+ARCH=arm
+BRANCH=android-4.4
+CROSS_COMPILE=arm-linux-androidkernel-
+DEFCONFIG=ranchu_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.9/bin
+FILES="
+arch/arm/boot/zImage
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.arm64 b/build.config.goldfish.arm64
new file mode 100644
index 0000000..4c896a6
--- /dev/null
+++ b/build.config.goldfish.arm64
@@ -0,0 +1,13 @@
+ARCH=arm64
+BRANCH=android-4.4
+CROSS_COMPILE=aarch64-linux-android-
+DEFCONFIG=ranchu64_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/aarch64/aarch64-linux-android-4.9/bin
+FILES="
+arch/arm64/boot/Image
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.mips b/build.config.goldfish.mips
new file mode 100644
index 0000000..9a14a44
--- /dev/null
+++ b/build.config.goldfish.mips
@@ -0,0 +1,12 @@
+ARCH=mips
+BRANCH=android-4.4
+CROSS_COMPILE=mips64el-linux-android-
+DEFCONFIG=ranchu_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/mips/mips64el-linux-android-4.9/bin
+FILES="
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.mips64 b/build.config.goldfish.mips64
new file mode 100644
index 0000000..6ad9759
--- /dev/null
+++ b/build.config.goldfish.mips64
@@ -0,0 +1,12 @@
+ARCH=mips
+BRANCH=android-4.4
+CROSS_COMPILE=mips64el-linux-android-
+DEFCONFIG=ranchu64_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/mips/mips64el-linux-android-4.9/bin
+FILES="
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.x86 b/build.config.goldfish.x86
new file mode 100644
index 0000000..2266c62
--- /dev/null
+++ b/build.config.goldfish.x86
@@ -0,0 +1,13 @@
+ARCH=x86
+BRANCH=android-4.4
+CROSS_COMPILE=x86_64-linux-android-
+DEFCONFIG=i386_ranchu_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/bin
+FILES="
+arch/x86/boot/bzImage
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/build.config.goldfish.x86_64 b/build.config.goldfish.x86_64
new file mode 100644
index 0000000..08c42c2
--- /dev/null
+++ b/build.config.goldfish.x86_64
@@ -0,0 +1,13 @@
+ARCH=x86_64
+BRANCH=android-4.4
+CROSS_COMPILE=x86_64-linux-android-
+DEFCONFIG=x86_64_ranchu_defconfig
+EXTRA_CMDS=''
+KERNEL_DIR=common
+LINUX_GCC_CROSS_COMPILE_PREBUILTS_BIN=prebuilts/gcc/linux-x86/x86/x86_64-linux-android-4.9/bin
+FILES="
+arch/x86/boot/bzImage
+vmlinux
+System.map
+"
+STOP_SHIP_TRACEPRINTK=1
diff --git a/certs/system_keyring.c b/certs/system_keyring.c
index 8172871..4ba922f 100644
--- a/certs/system_keyring.c
+++ b/certs/system_keyring.c
@@ -264,5 +264,46 @@
return ret;
}
EXPORT_SYMBOL_GPL(verify_pkcs7_signature);
-
#endif /* CONFIG_SYSTEM_DATA_VERIFICATION */
+
+/**
+ * verify_signature_one - Verify a signature with keys from given keyring
+ * @sig: The signature to be verified
+ * @trusted_keys: Trusted keys to use (NULL for builtin trusted keys only,
+ * (void *)1UL for all trusted keys).
+ * @keyid: key description (not partial)
+ */
+int verify_signature_one(const struct public_key_signature *sig,
+ struct key *trusted_keys, const char *keyid)
+{
+ key_ref_t ref;
+ struct key *key;
+ int ret;
+
+ if (!sig)
+ return -EBADMSG;
+ if (!trusted_keys) {
+ trusted_keys = builtin_trusted_keys;
+ } else if (trusted_keys == (void *)1UL) {
+#ifdef CONFIG_SECONDARY_TRUSTED_KEYRING
+ trusted_keys = secondary_trusted_keys;
+#else
+ trusted_keys = builtin_trusted_keys;
+#endif
+ }
+
+ ref = keyring_search(make_key_ref(trusted_keys, 1),
+ &key_type_asymmetric, keyid);
+ if (IS_ERR(ref)) {
+ pr_err("Asymmetric key (%s) not found in keyring(%s)\n",
+ keyid, trusted_keys->description);
+ return -ENOKEY;
+ }
+
+ key = key_ref_to_ptr(ref);
+ ret = verify_signature(key, sig);
+ key_put(key);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(verify_signature_one);
+
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 5579eb8..f7bc7e3 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1468,6 +1468,20 @@
See also:
<http://www.cl.cam.ac.uk/~rja14/serpent.html>
+config CRYPTO_SPECK
+ tristate "Speck cipher algorithm"
+ select CRYPTO_ALGAPI
+ help
+ Speck is a lightweight block cipher that is tuned for optimal
+ performance in software (rather than hardware).
+
+ Speck may not be as secure as AES, and should only be used on systems
+ where AES is not fast enough.
+
+ See also: <https://eprint.iacr.org/2013/404.pdf>
+
+ If unsure, say N.
+
config CRYPTO_TEA
tristate "TEA, XTEA and XETA cipher algorithms"
select CRYPTO_ALGAPI
@@ -1637,6 +1651,15 @@
help
This is the LZ4 high compression mode algorithm.
+config CRYPTO_ZSTD
+ tristate "Zstd compression algorithm"
+ select CRYPTO_ALGAPI
+ select CRYPTO_ACOMP2
+ select ZSTD_COMPRESS
+ select ZSTD_DECOMPRESS
+ help
+ This is the zstd algorithm.
+
comment "Random Number Generation"
config CRYPTO_ANSI_CPRNG
diff --git a/crypto/Makefile b/crypto/Makefile
index 56282e2..76d889a 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -109,6 +109,7 @@
obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
obj-$(CONFIG_CRYPTO_SEED) += seed.o
+obj-$(CONFIG_CRYPTO_SPECK) += speck.o
obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
@@ -135,6 +136,7 @@
obj-$(CONFIG_CRYPTO_USER_API_SKCIPHER) += algif_skcipher.o
obj-$(CONFIG_CRYPTO_USER_API_RNG) += algif_rng.o
obj-$(CONFIG_CRYPTO_USER_API_AEAD) += algif_aead.o
+obj-$(CONFIG_CRYPTO_ZSTD) += zstd.o
ecdh_generic-y := ecc.o
ecdh_generic-y += ecdh.o
diff --git a/crypto/api.c b/crypto/api.c
index 941cd4c..2a2479d 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -24,6 +24,7 @@
#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/completion.h>
#include "internal.h"
LIST_HEAD(crypto_alg_list);
@@ -595,5 +596,17 @@
}
EXPORT_SYMBOL_GPL(crypto_has_alg);
+void crypto_req_done(struct crypto_async_request *req, int err)
+{
+ struct crypto_wait *wait = req->data;
+
+ if (err == -EINPROGRESS)
+ return;
+
+ wait->err = err;
+ complete(&wait->completion);
+}
+EXPORT_SYMBOL_GPL(crypto_req_done);
+
MODULE_DESCRIPTION("Cryptographic core API");
MODULE_LICENSE("GPL");
diff --git a/crypto/speck.c b/crypto/speck.c
new file mode 100644
index 0000000..58aa9f7
--- /dev/null
+++ b/crypto/speck.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Speck: a lightweight block cipher
+ *
+ * Copyright (c) 2018 Google, Inc
+ *
+ * Speck has 10 variants, including 5 block sizes. For now we only implement
+ * the variants Speck128/128, Speck128/192, Speck128/256, Speck64/96, and
+ * Speck64/128. Speck${B}/${K} denotes the variant with a block size of B bits
+ * and a key size of K bits. The Speck128 variants are believed to be the most
+ * secure variants, and they use the same block size and key sizes as AES. The
+ * Speck64 variants are less secure, but on 32-bit processors are usually
+ * faster. The remaining variants (Speck32, Speck48, and Speck96) are even less
+ * secure and/or not as well suited for implementation on either 32-bit or
+ * 64-bit processors, so are omitted.
+ *
+ * Reference: "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * In a correspondence, the Speck designers have also clarified that the words
+ * should be interpreted in little-endian format, and the words should be
+ * ordered such that the first word of each block is 'y' rather than 'x', and
+ * the first key word (rather than the last) becomes the first round key.
+ */
+
+#include <asm/unaligned.h>
+#include <crypto/speck.h>
+#include <linux/bitops.h>
+#include <linux/crypto.h>
+#include <linux/init.h>
+#include <linux/module.h>
+
+/* Speck128 */
+
+static __always_inline void speck128_round(u64 *x, u64 *y, u64 k)
+{
+ *x = ror64(*x, 8);
+ *x += *y;
+ *x ^= k;
+ *y = rol64(*y, 3);
+ *y ^= *x;
+}
+
+static __always_inline void speck128_unround(u64 *x, u64 *y, u64 k)
+{
+ *y ^= *x;
+ *y = ror64(*y, 3);
+ *x ^= k;
+ *x -= *y;
+ *x = rol64(*x, 8);
+}
+
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+ u8 *out, const u8 *in)
+{
+ u64 y = get_unaligned_le64(in);
+ u64 x = get_unaligned_le64(in + 8);
+ int i;
+
+ for (i = 0; i < ctx->nrounds; i++)
+ speck128_round(&x, &y, ctx->round_keys[i]);
+
+ put_unaligned_le64(y, out);
+ put_unaligned_le64(x, out + 8);
+}
+EXPORT_SYMBOL_GPL(crypto_speck128_encrypt);
+
+static void speck128_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+ crypto_speck128_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+ u8 *out, const u8 *in)
+{
+ u64 y = get_unaligned_le64(in);
+ u64 x = get_unaligned_le64(in + 8);
+ int i;
+
+ for (i = ctx->nrounds - 1; i >= 0; i--)
+ speck128_unround(&x, &y, ctx->round_keys[i]);
+
+ put_unaligned_le64(y, out);
+ put_unaligned_le64(x, out + 8);
+}
+EXPORT_SYMBOL_GPL(crypto_speck128_decrypt);
+
+static void speck128_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+ crypto_speck128_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
+ unsigned int keylen)
+{
+ u64 l[3];
+ u64 k;
+ int i;
+
+ switch (keylen) {
+ case SPECK128_128_KEY_SIZE:
+ k = get_unaligned_le64(key);
+ l[0] = get_unaligned_le64(key + 8);
+ ctx->nrounds = SPECK128_128_NROUNDS;
+ for (i = 0; i < ctx->nrounds; i++) {
+ ctx->round_keys[i] = k;
+ speck128_round(&l[0], &k, i);
+ }
+ break;
+ case SPECK128_192_KEY_SIZE:
+ k = get_unaligned_le64(key);
+ l[0] = get_unaligned_le64(key + 8);
+ l[1] = get_unaligned_le64(key + 16);
+ ctx->nrounds = SPECK128_192_NROUNDS;
+ for (i = 0; i < ctx->nrounds; i++) {
+ ctx->round_keys[i] = k;
+ speck128_round(&l[i % 2], &k, i);
+ }
+ break;
+ case SPECK128_256_KEY_SIZE:
+ k = get_unaligned_le64(key);
+ l[0] = get_unaligned_le64(key + 8);
+ l[1] = get_unaligned_le64(key + 16);
+ l[2] = get_unaligned_le64(key + 24);
+ ctx->nrounds = SPECK128_256_NROUNDS;
+ for (i = 0; i < ctx->nrounds; i++) {
+ ctx->round_keys[i] = k;
+ speck128_round(&l[i % 3], &k, i);
+ }
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_speck128_setkey);
+
+static int speck128_setkey(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ return crypto_speck128_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
+
+/* Speck64 */
+
+static __always_inline void speck64_round(u32 *x, u32 *y, u32 k)
+{
+ *x = ror32(*x, 8);
+ *x += *y;
+ *x ^= k;
+ *y = rol32(*y, 3);
+ *y ^= *x;
+}
+
+static __always_inline void speck64_unround(u32 *x, u32 *y, u32 k)
+{
+ *y ^= *x;
+ *y = ror32(*y, 3);
+ *x ^= k;
+ *x -= *y;
+ *x = rol32(*x, 8);
+}
+
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+ u8 *out, const u8 *in)
+{
+ u32 y = get_unaligned_le32(in);
+ u32 x = get_unaligned_le32(in + 4);
+ int i;
+
+ for (i = 0; i < ctx->nrounds; i++)
+ speck64_round(&x, &y, ctx->round_keys[i]);
+
+ put_unaligned_le32(y, out);
+ put_unaligned_le32(x, out + 4);
+}
+EXPORT_SYMBOL_GPL(crypto_speck64_encrypt);
+
+static void speck64_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+ crypto_speck64_encrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+ u8 *out, const u8 *in)
+{
+ u32 y = get_unaligned_le32(in);
+ u32 x = get_unaligned_le32(in + 4);
+ int i;
+
+ for (i = ctx->nrounds - 1; i >= 0; i--)
+ speck64_unround(&x, &y, ctx->round_keys[i]);
+
+ put_unaligned_le32(y, out);
+ put_unaligned_le32(x, out + 4);
+}
+EXPORT_SYMBOL_GPL(crypto_speck64_decrypt);
+
+static void speck64_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
+{
+ crypto_speck64_decrypt(crypto_tfm_ctx(tfm), out, in);
+}
+
+int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key,
+ unsigned int keylen)
+{
+ u32 l[3];
+ u32 k;
+ int i;
+
+ switch (keylen) {
+ case SPECK64_96_KEY_SIZE:
+ k = get_unaligned_le32(key);
+ l[0] = get_unaligned_le32(key + 4);
+ l[1] = get_unaligned_le32(key + 8);
+ ctx->nrounds = SPECK64_96_NROUNDS;
+ for (i = 0; i < ctx->nrounds; i++) {
+ ctx->round_keys[i] = k;
+ speck64_round(&l[i % 2], &k, i);
+ }
+ break;
+ case SPECK64_128_KEY_SIZE:
+ k = get_unaligned_le32(key);
+ l[0] = get_unaligned_le32(key + 4);
+ l[1] = get_unaligned_le32(key + 8);
+ l[2] = get_unaligned_le32(key + 12);
+ ctx->nrounds = SPECK64_128_NROUNDS;
+ for (i = 0; i < ctx->nrounds; i++) {
+ ctx->round_keys[i] = k;
+ speck64_round(&l[i % 3], &k, i);
+ }
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(crypto_speck64_setkey);
+
+static int speck64_setkey(struct crypto_tfm *tfm, const u8 *key,
+ unsigned int keylen)
+{
+ return crypto_speck64_setkey(crypto_tfm_ctx(tfm), key, keylen);
+}
+
+/* Algorithm definitions */
+
+static struct crypto_alg speck_algs[] = {
+ {
+ .cra_name = "speck128",
+ .cra_driver_name = "speck128-generic",
+ .cra_priority = 100,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = SPECK128_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct speck128_tfm_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_u = {
+ .cipher = {
+ .cia_min_keysize = SPECK128_128_KEY_SIZE,
+ .cia_max_keysize = SPECK128_256_KEY_SIZE,
+ .cia_setkey = speck128_setkey,
+ .cia_encrypt = speck128_encrypt,
+ .cia_decrypt = speck128_decrypt
+ }
+ }
+ }, {
+ .cra_name = "speck64",
+ .cra_driver_name = "speck64-generic",
+ .cra_priority = 100,
+ .cra_flags = CRYPTO_ALG_TYPE_CIPHER,
+ .cra_blocksize = SPECK64_BLOCK_SIZE,
+ .cra_ctxsize = sizeof(struct speck64_tfm_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_u = {
+ .cipher = {
+ .cia_min_keysize = SPECK64_96_KEY_SIZE,
+ .cia_max_keysize = SPECK64_128_KEY_SIZE,
+ .cia_setkey = speck64_setkey,
+ .cia_encrypt = speck64_encrypt,
+ .cia_decrypt = speck64_decrypt
+ }
+ }
+ }
+};
+
+static int __init speck_module_init(void)
+{
+ return crypto_register_algs(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+static void __exit speck_module_exit(void)
+{
+ crypto_unregister_algs(speck_algs, ARRAY_SIZE(speck_algs));
+}
+
+module_init(speck_module_init);
+module_exit(speck_module_exit);
+
+MODULE_DESCRIPTION("Speck block cipher (generic)");
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
+MODULE_ALIAS_CRYPTO("speck128");
+MODULE_ALIAS_CRYPTO("speck128-generic");
+MODULE_ALIAS_CRYPTO("speck64");
+MODULE_ALIAS_CRYPTO("speck64-generic");
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 7125ba3..ec5d003 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -3034,6 +3034,24 @@
}
}
}, {
+ .alg = "ecb(speck128)",
+ .test = alg_test_skcipher,
+ .suite = {
+ .cipher = {
+ .enc = __VECS(speck128_enc_tv_template),
+ .dec = __VECS(speck128_dec_tv_template)
+ }
+ }
+ }, {
+ .alg = "ecb(speck64)",
+ .test = alg_test_skcipher,
+ .suite = {
+ .cipher = {
+ .enc = __VECS(speck64_enc_tv_template),
+ .dec = __VECS(speck64_dec_tv_template)
+ }
+ }
+ }, {
.alg = "ecb(tea)",
.test = alg_test_skcipher,
.suite = {
@@ -3585,6 +3603,24 @@
}
}
}, {
+ .alg = "xts(speck128)",
+ .test = alg_test_skcipher,
+ .suite = {
+ .cipher = {
+ .enc = __VECS(speck128_xts_enc_tv_template),
+ .dec = __VECS(speck128_xts_dec_tv_template)
+ }
+ }
+ }, {
+ .alg = "xts(speck64)",
+ .test = alg_test_skcipher,
+ .suite = {
+ .cipher = {
+ .enc = __VECS(speck64_xts_enc_tv_template),
+ .dec = __VECS(speck64_xts_dec_tv_template)
+ }
+ }
+ }, {
.alg = "xts(twofish)",
.test = alg_test_skcipher,
.suite = {
@@ -3603,6 +3639,16 @@
.decomp = __VECS(zlib_deflate_decomp_tv_template)
}
}
+ }, {
+ .alg = "zstd",
+ .test = alg_test_comp,
+ .fips_allowed = 1,
+ .suite = {
+ .comp = {
+ .comp = __VECS(zstd_comp_tv_template),
+ .decomp = __VECS(zstd_decomp_tv_template)
+ }
+ }
}
};
diff --git a/crypto/testmgr.h b/crypto/testmgr.h
index fbc0fab..5d634ec 100644
--- a/crypto/testmgr.h
+++ b/crypto/testmgr.h
@@ -13706,6 +13706,1492 @@
},
};
+/*
+ * Speck test vectors taken from the original paper:
+ * "The Simon and Speck Families of Lightweight Block Ciphers"
+ * https://eprint.iacr.org/2013/404.pdf
+ *
+ * Note that the paper does not make byte and word order clear. But it was
+ * confirmed with the authors that the intended orders are little endian byte
+ * order and (y, x) word order. Equivalently, the printed test vectors, when
+ * looking at only the bytes (ignoring the whitespace that divides them into
+ * words), are backwards: the left-most byte is actually the one with the
+ * highest memory address, while the right-most byte is actually the one with
+ * the lowest memory address.
+ */
+
+static const struct cipher_testvec speck128_enc_tv_template[] = {
+ { /* Speck128/128 */
+ .key = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f",
+ .klen = 16,
+ .input = "\x20\x6d\x61\x64\x65\x20\x69\x74"
+ "\x20\x65\x71\x75\x69\x76\x61\x6c",
+ .ilen = 16,
+ .result = "\x18\x0d\x57\x5c\xdf\xfe\x60\x78"
+ "\x65\x32\x78\x79\x51\x98\x5d\xa6",
+ .rlen = 16,
+ }, { /* Speck128/192 */
+ .key = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17",
+ .klen = 24,
+ .input = "\x65\x6e\x74\x20\x74\x6f\x20\x43"
+ "\x68\x69\x65\x66\x20\x48\x61\x72",
+ .ilen = 16,
+ .result = "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9"
+ "\x66\x55\x13\x13\x3a\xcf\xe4\x1b",
+ .rlen = 16,
+ }, { /* Speck128/256 */
+ .key = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f",
+ .klen = 32,
+ .input = "\x70\x6f\x6f\x6e\x65\x72\x2e\x20"
+ "\x49\x6e\x20\x74\x68\x6f\x73\x65",
+ .ilen = 16,
+ .result = "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e"
+ "\x3e\xf5\xc0\x05\x04\x01\x09\x41",
+ .rlen = 16,
+ },
+};
+
+static const struct cipher_testvec speck128_dec_tv_template[] = {
+ { /* Speck128/128 */
+ .key = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f",
+ .klen = 16,
+ .input = "\x18\x0d\x57\x5c\xdf\xfe\x60\x78"
+ "\x65\x32\x78\x79\x51\x98\x5d\xa6",
+ .ilen = 16,
+ .result = "\x20\x6d\x61\x64\x65\x20\x69\x74"
+ "\x20\x65\x71\x75\x69\x76\x61\x6c",
+ .rlen = 16,
+ }, { /* Speck128/192 */
+ .key = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17",
+ .klen = 24,
+ .input = "\x86\x18\x3c\xe0\x5d\x18\xbc\xf9"
+ "\x66\x55\x13\x13\x3a\xcf\xe4\x1b",
+ .ilen = 16,
+ .result = "\x65\x6e\x74\x20\x74\x6f\x20\x43"
+ "\x68\x69\x65\x66\x20\x48\x61\x72",
+ .rlen = 16,
+ }, { /* Speck128/256 */
+ .key = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f",
+ .klen = 32,
+ .input = "\x43\x8f\x18\x9c\x8d\xb4\xee\x4e"
+ "\x3e\xf5\xc0\x05\x04\x01\x09\x41",
+ .ilen = 16,
+ .result = "\x70\x6f\x6f\x6e\x65\x72\x2e\x20"
+ "\x49\x6e\x20\x74\x68\x6f\x73\x65",
+ .rlen = 16,
+ },
+};
+
+/*
+ * Speck128-XTS test vectors, taken from the AES-XTS test vectors with the
+ * result recomputed with Speck128 as the cipher
+ */
+
+static const struct cipher_testvec speck128_xts_enc_tv_template[] = {
+ {
+ .key = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .klen = 32,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .ilen = 32,
+ .result = "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+ "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+ "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+ "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+ .rlen = 32,
+ }, {
+ .key = "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 32,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .ilen = 32,
+ .result = "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+ "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+ "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+ "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+ .rlen = 32,
+ }, {
+ .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 32,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .ilen = 32,
+ .result = "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+ "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+ "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+ "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+ .rlen = 32,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93"
+ "\x23\x84\x62\x64\x33\x83\x27\x95",
+ .klen = 32,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .ilen = 512,
+ .result = "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82"
+ "\x53\xd0\xed\x2d\x30\xc1\x20\xef"
+ "\x70\x67\x5e\xff\x09\x70\xbb\xc1"
+ "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48"
+ "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7"
+ "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9"
+ "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44"
+ "\x19\xc5\x58\x84\x63\xb9\x12\x68"
+ "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c"
+ "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd"
+ "\x74\x79\x2e\xb4\x44\xd7\x69\xc4"
+ "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d"
+ "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb"
+ "\x6d\x13\x65\xa0\xf9\x31\x12\xe2"
+ "\x26\xd1\xec\x2b\x0a\x8b\x59\x99"
+ "\xa7\x49\xa0\x0e\x09\x33\x85\x50"
+ "\xc3\x23\xca\x7a\xdd\x13\x45\x5f"
+ "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f"
+ "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6"
+ "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f"
+ "\x79\x91\x8d\x36\x13\x7b\xd0\x4a"
+ "\x6c\x39\xfb\x53\xb8\x6f\x02\x51"
+ "\xa5\x20\xac\x24\x1c\x73\x59\x73"
+ "\x58\x61\x3a\x87\x58\xb3\x20\x56"
+ "\x39\x06\x2b\x4d\xd3\x20\x2b\x89"
+ "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd"
+ "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91"
+ "\x09\x35\x71\x50\x65\xac\x92\xe3"
+ "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92"
+ "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9"
+ "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d"
+ "\x77\x04\x80\xa9\xbf\x38\xb5\xbd"
+ "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8"
+ "\x2a\x26\xcc\x49\x14\x6d\x55\x01"
+ "\x06\x94\xd8\xb2\x2d\x53\x83\x1b"
+ "\x8f\xd4\xdd\x57\x12\x7e\x18\xba"
+ "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d"
+ "\x24\xa9\x60\xa4\x97\x85\x86\x2a"
+ "\x01\x00\x09\xf1\xcb\x4a\x24\x1c"
+ "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4"
+ "\x97\x1c\x10\xc6\x4d\x66\x4f\x98"
+ "\x87\x30\xac\xd5\xea\x73\x49\x10"
+ "\x80\xea\xe5\x5f\x4d\x5f\x03\x33"
+ "\x66\x02\x35\x3d\x60\x06\x36\x4f"
+ "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8"
+ "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28"
+ "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93"
+ "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30"
+ "\xcc\x75\xcf\x16\x26\xa9\x26\x3b"
+ "\xe7\x68\x2f\x15\x21\x5b\xe4\x00"
+ "\xbd\x48\x50\xcd\x75\x70\xc4\x62"
+ "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b"
+ "\x51\x66\x02\x69\x04\x97\x36\xd4"
+ "\x75\xae\x0b\xa3\x42\xf8\xca\x79"
+ "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2"
+ "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd"
+ "\xea\x15\x5a\xa0\x85\x7e\x81\x0d"
+ "\x03\xe7\x05\x39\xf5\x05\x26\xee"
+ "\xec\xaa\x1f\x3d\xc9\x98\x76\x01"
+ "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4"
+ "\x50\x65\x50\x6d\x04\x1f\xdf\x5a"
+ "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca"
+ "\x47\x26\xef\x39\xb8\xb4\xf2\xd1"
+ "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf",
+ .rlen = 512,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x62\x49\x77\x57\x24\x70\x93\x69"
+ "\x99\x59\x57\x49\x66\x96\x76\x27"
+ "\x31\x41\x59\x26\x53\x58\x97\x93"
+ "\x23\x84\x62\x64\x33\x83\x27\x95"
+ "\x02\x88\x41\x97\x16\x93\x99\x37"
+ "\x51\x05\x82\x09\x74\x94\x45\x92",
+ .klen = 64,
+ .iv = "\xff\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .ilen = 512,
+ .result = "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1"
+ "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb"
+ "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73"
+ "\x92\x99\xde\xd3\x76\xed\xcd\x63"
+ "\x64\x3a\x22\x57\xc1\x43\x49\xd4"
+ "\x79\x36\x31\x19\x62\xae\x10\x7e"
+ "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa"
+ "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0"
+ "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00"
+ "\xfc\x81\x99\x8a\x14\x62\xf5\x7e"
+ "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec"
+ "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6"
+ "\x62\x62\x37\xfe\x0a\x4c\x4a\x37"
+ "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e"
+ "\x85\x3c\x4f\x26\x64\x85\xbc\x68"
+ "\xb0\xe0\x86\x5e\x26\x41\xce\x11"
+ "\x50\xda\x97\x14\xe9\x9e\xc7\x6d"
+ "\x3b\xdc\x43\xde\x2b\x27\x69\x7d"
+ "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31"
+ "\x14\x4d\xf0\x74\x37\xfd\x07\x25"
+ "\x96\x55\xe5\xfc\x9e\x27\x2a\x74"
+ "\x1b\x83\x4d\x15\x83\xac\x57\xa0"
+ "\xac\xa5\xd0\x38\xef\x19\x56\x53"
+ "\x25\x4b\xfc\xce\x04\x23\xe5\x6b"
+ "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5"
+ "\xed\x22\x34\x1c\x5d\xed\x17\x06"
+ "\x36\xa3\xe6\x77\xb9\x97\x46\xb8"
+ "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc"
+ "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82"
+ "\x35\x91\x3d\x1b\xe4\x97\x9f\x92"
+ "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1"
+ "\x8d\x39\xfc\x42\xfb\x38\x80\xb9"
+ "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1"
+ "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7"
+ "\xa1\xbf\xf7\xda\x95\x93\x4b\x78"
+ "\x19\xf5\x94\xf9\xd2\x00\x33\x37"
+ "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee"
+ "\x42\xb2\x9e\x2c\x5f\x48\x23\x26"
+ "\x15\x25\x17\x03\x3d\xfe\x2c\xfc"
+ "\xeb\xba\xda\xe0\x00\x05\xb6\xa6"
+ "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf"
+ "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a"
+ "\x49\xa1\xc3\xfa\x10\x52\xb9\x14"
+ "\xad\xb7\x73\xf8\x78\x12\xc8\x59"
+ "\x17\x80\x4c\x57\x39\xf1\x6d\x80"
+ "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21"
+ "\xec\xce\xb7\xc8\x02\x8a\xed\x53"
+ "\x2c\x25\x68\x2e\x1f\x85\x5e\x67"
+ "\xd1\x07\x7a\x3a\x89\x08\xe0\x34"
+ "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40"
+ "\x31\x15\x72\xa0\xf0\x73\xd9\x3b"
+ "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2"
+ "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8"
+ "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6"
+ "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58"
+ "\xcc\x1f\x48\x49\x65\x47\x75\xe9"
+ "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07"
+ "\xf2\xec\x76\xd8\x8f\x09\xf3\x16"
+ "\xa1\x51\x89\x3b\xeb\x96\x42\xac"
+ "\x65\xe0\x67\x63\x29\xdc\xb4\x7d"
+ "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb"
+ "\x66\x8d\x13\xca\xe0\x59\x2a\x00"
+ "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5"
+ "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c",
+ .rlen = 512,
+ .also_non_np = 1,
+ .np = 3,
+ .tap = { 512 - 20, 4, 16 },
+ }
+};
+
+static const struct cipher_testvec speck128_xts_dec_tv_template[] = {
+ {
+ .key = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .klen = 32,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\xbe\xa0\xe7\x03\xd7\xfe\xab\x62"
+ "\x3b\x99\x4a\x64\x74\x77\xac\xed"
+ "\xd8\xf4\xa6\xcf\xae\xb9\x07\x42"
+ "\x51\xd9\xb6\x1d\xe0\x5e\xbc\x54",
+ .ilen = 32,
+ .result = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .rlen = 32,
+ }, {
+ .key = "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 32,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\xfb\x53\x81\x75\x6f\x9f\x34\xad"
+ "\x7e\x01\xed\x7b\xcc\xda\x4e\x4a"
+ "\xd4\x84\xa4\x53\xd5\x88\x73\x1b"
+ "\xfd\xcb\xae\x0d\xf3\x04\xee\xe6",
+ .ilen = 32,
+ .result = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .rlen = 32,
+ }, {
+ .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 32,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x21\x52\x84\x15\xd1\xf7\x21\x55"
+ "\xd9\x75\x4a\xd3\xc5\xdb\x9f\x7d"
+ "\xda\x63\xb2\xf1\x82\xb0\x89\x59"
+ "\x86\xd4\xaa\xaa\xdd\xff\x4f\x92",
+ .ilen = 32,
+ .result = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .rlen = 32,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93"
+ "\x23\x84\x62\x64\x33\x83\x27\x95",
+ .klen = 32,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x57\xb5\xf8\x71\x6e\x6d\xdd\x82"
+ "\x53\xd0\xed\x2d\x30\xc1\x20\xef"
+ "\x70\x67\x5e\xff\x09\x70\xbb\xc1"
+ "\x3a\x7b\x48\x26\xd9\x0b\xf4\x48"
+ "\xbe\xce\xb1\xc7\xb2\x67\xc4\xa7"
+ "\x76\xf8\x36\x30\xb7\xb4\x9a\xd9"
+ "\xf5\x9d\xd0\x7b\xc1\x06\x96\x44"
+ "\x19\xc5\x58\x84\x63\xb9\x12\x68"
+ "\x68\xc7\xaa\x18\x98\xf2\x1f\x5c"
+ "\x39\xa6\xd8\x32\x2b\xc3\x51\xfd"
+ "\x74\x79\x2e\xb4\x44\xd7\x69\xc4"
+ "\xfc\x29\xe6\xed\x26\x1e\xa6\x9d"
+ "\x1c\xbe\x00\x0e\x7f\x3a\xca\xfb"
+ "\x6d\x13\x65\xa0\xf9\x31\x12\xe2"
+ "\x26\xd1\xec\x2b\x0a\x8b\x59\x99"
+ "\xa7\x49\xa0\x0e\x09\x33\x85\x50"
+ "\xc3\x23\xca\x7a\xdd\x13\x45\x5f"
+ "\xde\x4c\xa7\xcb\x00\x8a\x66\x6f"
+ "\xa2\xb6\xb1\x2e\xe1\xa0\x18\xf6"
+ "\xad\xf3\xbd\xeb\xc7\xef\x55\x4f"
+ "\x79\x91\x8d\x36\x13\x7b\xd0\x4a"
+ "\x6c\x39\xfb\x53\xb8\x6f\x02\x51"
+ "\xa5\x20\xac\x24\x1c\x73\x59\x73"
+ "\x58\x61\x3a\x87\x58\xb3\x20\x56"
+ "\x39\x06\x2b\x4d\xd3\x20\x2b\x89"
+ "\x3f\xa2\xf0\x96\xeb\x7f\xa4\xcd"
+ "\x11\xae\xbd\xcb\x3a\xb4\xd9\x91"
+ "\x09\x35\x71\x50\x65\xac\x92\xe3"
+ "\x7b\x32\xc0\x7a\xdd\xd4\xc3\x92"
+ "\x6f\xeb\x79\xde\x6f\xd3\x25\xc9"
+ "\xcd\x63\xf5\x1e\x7a\x3b\x26\x9d"
+ "\x77\x04\x80\xa9\xbf\x38\xb5\xbd"
+ "\xb8\x05\x07\xbd\xfd\xab\x7b\xf8"
+ "\x2a\x26\xcc\x49\x14\x6d\x55\x01"
+ "\x06\x94\xd8\xb2\x2d\x53\x83\x1b"
+ "\x8f\xd4\xdd\x57\x12\x7e\x18\xba"
+ "\x8e\xe2\x4d\x80\xef\x7e\x6b\x9d"
+ "\x24\xa9\x60\xa4\x97\x85\x86\x2a"
+ "\x01\x00\x09\xf1\xcb\x4a\x24\x1c"
+ "\xd8\xf6\xe6\x5b\xe7\x5d\xf2\xc4"
+ "\x97\x1c\x10\xc6\x4d\x66\x4f\x98"
+ "\x87\x30\xac\xd5\xea\x73\x49\x10"
+ "\x80\xea\xe5\x5f\x4d\x5f\x03\x33"
+ "\x66\x02\x35\x3d\x60\x06\x36\x4f"
+ "\x14\x1c\xd8\x07\x1f\x78\xd0\xf8"
+ "\x4f\x6c\x62\x7c\x15\xa5\x7c\x28"
+ "\x7c\xcc\xeb\x1f\xd1\x07\x90\x93"
+ "\x7e\xc2\xa8\x3a\x80\xc0\xf5\x30"
+ "\xcc\x75\xcf\x16\x26\xa9\x26\x3b"
+ "\xe7\x68\x2f\x15\x21\x5b\xe4\x00"
+ "\xbd\x48\x50\xcd\x75\x70\xc4\x62"
+ "\xbb\x41\xfb\x89\x4a\x88\x3b\x3b"
+ "\x51\x66\x02\x69\x04\x97\x36\xd4"
+ "\x75\xae\x0b\xa3\x42\xf8\xca\x79"
+ "\x8f\x93\xe9\xcc\x38\xbd\xd6\xd2"
+ "\xf9\x70\x4e\xc3\x6a\x8e\x25\xbd"
+ "\xea\x15\x5a\xa0\x85\x7e\x81\x0d"
+ "\x03\xe7\x05\x39\xf5\x05\x26\xee"
+ "\xec\xaa\x1f\x3d\xc9\x98\x76\x01"
+ "\x2c\xf4\xfc\xa3\x88\x77\x38\xc4"
+ "\x50\x65\x50\x6d\x04\x1f\xdf\x5a"
+ "\xaa\xf2\x01\xa9\xc1\x8d\xee\xca"
+ "\x47\x26\xef\x39\xb8\xb4\xf2\xd1"
+ "\xd6\xbb\x1b\x2a\xc1\x34\x14\xcf",
+ .ilen = 512,
+ .result = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .rlen = 512,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x62\x49\x77\x57\x24\x70\x93\x69"
+ "\x99\x59\x57\x49\x66\x96\x76\x27"
+ "\x31\x41\x59\x26\x53\x58\x97\x93"
+ "\x23\x84\x62\x64\x33\x83\x27\x95"
+ "\x02\x88\x41\x97\x16\x93\x99\x37"
+ "\x51\x05\x82\x09\x74\x94\x45\x92",
+ .klen = 64,
+ .iv = "\xff\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\xc5\x85\x2a\x4b\x73\xe4\xf6\xf1"
+ "\x7e\xf9\xf6\xe9\xa3\x73\x36\xcb"
+ "\xaa\xb6\x22\xb0\x24\x6e\x3d\x73"
+ "\x92\x99\xde\xd3\x76\xed\xcd\x63"
+ "\x64\x3a\x22\x57\xc1\x43\x49\xd4"
+ "\x79\x36\x31\x19\x62\xae\x10\x7e"
+ "\x7d\xcf\x7a\xe2\x6b\xce\x27\xfa"
+ "\xdc\x3d\xd9\x83\xd3\x42\x4c\xe0"
+ "\x1b\xd6\x1d\x1a\x6f\xd2\x03\x00"
+ "\xfc\x81\x99\x8a\x14\x62\xf5\x7e"
+ "\x0d\xe7\x12\xe8\x17\x9d\x0b\xec"
+ "\xe2\xf7\xc9\xa7\x63\xd1\x79\xb6"
+ "\x62\x62\x37\xfe\x0a\x4c\x4a\x37"
+ "\x70\xc7\x5e\x96\x5f\xbc\x8e\x9e"
+ "\x85\x3c\x4f\x26\x64\x85\xbc\x68"
+ "\xb0\xe0\x86\x5e\x26\x41\xce\x11"
+ "\x50\xda\x97\x14\xe9\x9e\xc7\x6d"
+ "\x3b\xdc\x43\xde\x2b\x27\x69\x7d"
+ "\xfc\xb0\x28\xbd\x8f\xb1\xc6\x31"
+ "\x14\x4d\xf0\x74\x37\xfd\x07\x25"
+ "\x96\x55\xe5\xfc\x9e\x27\x2a\x74"
+ "\x1b\x83\x4d\x15\x83\xac\x57\xa0"
+ "\xac\xa5\xd0\x38\xef\x19\x56\x53"
+ "\x25\x4b\xfc\xce\x04\x23\xe5\x6b"
+ "\xf6\xc6\x6c\x32\x0b\xb3\x12\xc5"
+ "\xed\x22\x34\x1c\x5d\xed\x17\x06"
+ "\x36\xa3\xe6\x77\xb9\x97\x46\xb8"
+ "\xe9\x3f\x7e\xc7\xbc\x13\x5c\xdc"
+ "\x6e\x3f\x04\x5e\xd1\x59\xa5\x82"
+ "\x35\x91\x3d\x1b\xe4\x97\x9f\x92"
+ "\x1c\x5e\x5f\x6f\x41\xd4\x62\xa1"
+ "\x8d\x39\xfc\x42\xfb\x38\x80\xb9"
+ "\x0a\xe3\xcc\x6a\x93\xd9\x7a\xb1"
+ "\xe9\x69\xaf\x0a\x6b\x75\x38\xa7"
+ "\xa1\xbf\xf7\xda\x95\x93\x4b\x78"
+ "\x19\xf5\x94\xf9\xd2\x00\x33\x37"
+ "\xcf\xf5\x9e\x9c\xf3\xcc\xa6\xee"
+ "\x42\xb2\x9e\x2c\x5f\x48\x23\x26"
+ "\x15\x25\x17\x03\x3d\xfe\x2c\xfc"
+ "\xeb\xba\xda\xe0\x00\x05\xb6\xa6"
+ "\x07\xb3\xe8\x36\x5b\xec\x5b\xbf"
+ "\xd6\x5b\x00\x74\xc6\x97\xf1\x6a"
+ "\x49\xa1\xc3\xfa\x10\x52\xb9\x14"
+ "\xad\xb7\x73\xf8\x78\x12\xc8\x59"
+ "\x17\x80\x4c\x57\x39\xf1\x6d\x80"
+ "\x25\x77\x0f\x5e\x7d\xf0\xaf\x21"
+ "\xec\xce\xb7\xc8\x02\x8a\xed\x53"
+ "\x2c\x25\x68\x2e\x1f\x85\x5e\x67"
+ "\xd1\x07\x7a\x3a\x89\x08\xe0\x34"
+ "\xdc\xdb\x26\xb4\x6b\x77\xfc\x40"
+ "\x31\x15\x72\xa0\xf0\x73\xd9\x3b"
+ "\xd5\xdb\xfe\xfc\x8f\xa9\x44\xa2"
+ "\x09\x9f\xc6\x33\xe5\xe2\x88\xe8"
+ "\xf3\xf0\x1a\xf4\xce\x12\x0f\xd6"
+ "\xf7\x36\xe6\xa4\xf4\x7a\x10\x58"
+ "\xcc\x1f\x48\x49\x65\x47\x75\xe9"
+ "\x28\xe1\x65\x7b\xf2\xc4\xb5\x07"
+ "\xf2\xec\x76\xd8\x8f\x09\xf3\x16"
+ "\xa1\x51\x89\x3b\xeb\x96\x42\xac"
+ "\x65\xe0\x67\x63\x29\xdc\xb4\x7d"
+ "\xf2\x41\x51\x6a\xcb\xde\x3c\xfb"
+ "\x66\x8d\x13\xca\xe0\x59\x2a\x00"
+ "\xc9\x53\x4c\xe6\x9e\xe2\x73\xd5"
+ "\x67\x19\xb2\xbd\x9a\x63\xd7\x5c",
+ .ilen = 512,
+ .result = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .rlen = 512,
+ .also_non_np = 1,
+ .np = 3,
+ .tap = { 512 - 20, 4, 16 },
+ }
+};
+
+static const struct cipher_testvec speck64_enc_tv_template[] = {
+ { /* Speck64/96 */
+ .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+ "\x10\x11\x12\x13",
+ .klen = 12,
+ .input = "\x65\x61\x6e\x73\x20\x46\x61\x74",
+ .ilen = 8,
+ .result = "\x6c\x94\x75\x41\xec\x52\x79\x9f",
+ .rlen = 8,
+ }, { /* Speck64/128 */
+ .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+ "\x10\x11\x12\x13\x18\x19\x1a\x1b",
+ .klen = 16,
+ .input = "\x2d\x43\x75\x74\x74\x65\x72\x3b",
+ .ilen = 8,
+ .result = "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c",
+ .rlen = 8,
+ },
+};
+
+static const struct cipher_testvec speck64_dec_tv_template[] = {
+ { /* Speck64/96 */
+ .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+ "\x10\x11\x12\x13",
+ .klen = 12,
+ .input = "\x6c\x94\x75\x41\xec\x52\x79\x9f",
+ .ilen = 8,
+ .result = "\x65\x61\x6e\x73\x20\x46\x61\x74",
+ .rlen = 8,
+ }, { /* Speck64/128 */
+ .key = "\x00\x01\x02\x03\x08\x09\x0a\x0b"
+ "\x10\x11\x12\x13\x18\x19\x1a\x1b",
+ .klen = 16,
+ .input = "\x8b\x02\x4e\x45\x48\xa5\x6f\x8c",
+ .ilen = 8,
+ .result = "\x2d\x43\x75\x74\x74\x65\x72\x3b",
+ .rlen = 8,
+ },
+};
+
+/*
+ * Speck64-XTS test vectors, taken from the AES-XTS test vectors with the result
+ * recomputed with Speck64 as the cipher, and key lengths adjusted
+ */
+
+static const struct cipher_testvec speck64_xts_enc_tv_template[] = {
+ {
+ .key = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .klen = 24,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .ilen = 32,
+ .result = "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+ "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+ "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+ "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+ .rlen = 32,
+ }, {
+ .key = "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 24,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .ilen = 32,
+ .result = "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+ "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+ "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+ "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+ .rlen = 32,
+ }, {
+ .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 24,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .ilen = 32,
+ .result = "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+ "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+ "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+ "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+ .rlen = 32,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93",
+ .klen = 24,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .ilen = 512,
+ .result = "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e"
+ "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09"
+ "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3"
+ "\x11\xc7\x39\x96\xd0\x95\xf4\x56"
+ "\xf4\xdd\x03\x38\x01\x44\x2c\xcf"
+ "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66"
+ "\xfe\x3d\xc6\xfb\x01\x23\x51\x43"
+ "\xd5\xd2\x13\x86\x94\x34\xe9\x62"
+ "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef"
+ "\x76\x35\x04\x3f\xdb\x23\x9d\x0b"
+ "\x85\x42\xb9\x02\xd6\xcc\xdb\x96"
+ "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d"
+ "\xae\xd2\x04\xd5\xda\xc1\x7e\x24"
+ "\x8c\x73\xbe\x48\x7e\xcf\x65\x28"
+ "\x29\xe5\xbe\x54\x30\xcb\x46\x95"
+ "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe"
+ "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69"
+ "\xa1\x09\x95\x71\x26\xe9\xc4\xdf"
+ "\xe6\x31\xc3\x46\xda\xaf\x0b\x41"
+ "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3"
+ "\x82\xc0\x37\x27\xfc\x91\xa7\x05"
+ "\xfb\xc5\xdc\x2b\x74\x96\x48\x43"
+ "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f"
+ "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a"
+ "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c"
+ "\x07\xff\xf3\x72\x74\x48\xb5\x40"
+ "\x50\xb5\xdd\x90\x43\x31\x18\x15"
+ "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a"
+ "\x29\x93\x90\x8b\xda\x07\xf0\x35"
+ "\x6d\x90\x88\x09\x4e\x83\xf5\x5b"
+ "\x94\x12\xbb\x33\x27\x1d\x3f\x23"
+ "\x51\xa8\x7c\x07\xa2\xae\x77\xa6"
+ "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f"
+ "\x66\xdd\xcd\x75\x24\x8b\x33\xf7"
+ "\x20\xdb\x83\x9b\x4f\x11\x63\x6e"
+ "\xcf\x37\xef\xc9\x11\x01\x5c\x45"
+ "\x32\x99\x7c\x3c\x9e\x42\x89\xe3"
+ "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05"
+ "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc"
+ "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d"
+ "\xa0\xa8\x89\x3b\x73\x39\xa5\x94"
+ "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89"
+ "\x10\xff\xaf\xef\xca\xdd\x4f\x80"
+ "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7"
+ "\x33\xca\x00\x8b\x8b\x3f\xea\xec"
+ "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f"
+ "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5"
+ "\x64\xa3\xf1\x1a\x76\x28\xcc\x35"
+ "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b"
+ "\xc7\x1b\x53\x17\x02\xea\xd1\xad"
+ "\x13\x51\x73\xc0\xa0\xb2\x05\x32"
+ "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19"
+ "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d"
+ "\x59\xda\xee\x1a\x22\x18\xda\x0d"
+ "\x88\x0f\x55\x8b\x72\x62\xfd\xc1"
+ "\x69\x13\xcd\x0d\x5f\xc1\x09\x52"
+ "\xee\xd6\xe3\x84\x4d\xee\xf6\x88"
+ "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f"
+ "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54"
+ "\x7d\x69\x8d\x00\x62\x77\x0d\x14"
+ "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3"
+ "\x50\xf7\x5f\xf4\xc2\xca\x41\x97"
+ "\x37\xbe\x75\x74\xcd\xf0\x75\x6e"
+ "\x25\x23\x94\xbd\xda\x8d\xb0\xd4",
+ .rlen = 512,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x62\x49\x77\x57\x24\x70\x93\x69"
+ "\x99\x59\x57\x49\x66\x96\x76\x27",
+ .klen = 32,
+ .iv = "\xff\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .ilen = 512,
+ .result = "\x55\xed\x71\xd3\x02\x8e\x15\x3b"
+ "\xc6\x71\x29\x2d\x3e\x89\x9f\x59"
+ "\x68\x6a\xcc\x8a\x56\x97\xf3\x95"
+ "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c"
+ "\x78\x16\xea\x80\xdb\x33\x75\x94"
+ "\xf9\x29\xc4\x2b\x76\x75\x97\xc7"
+ "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b"
+ "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee"
+ "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a"
+ "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c"
+ "\xf5\xec\x32\x74\xa3\xb8\x03\x88"
+ "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f"
+ "\x84\x5e\x46\xed\x20\x89\xb6\x44"
+ "\x8d\xd0\xed\x54\x47\x16\xbe\x95"
+ "\x8a\xb3\x6b\x72\xc4\x32\x52\x13"
+ "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6"
+ "\x44\x18\xdd\x8c\x6e\xca\x6e\x45"
+ "\x8f\x1e\x10\x07\x57\x25\x98\x7b"
+ "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8"
+ "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb"
+ "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff"
+ "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e"
+ "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d"
+ "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65"
+ "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a"
+ "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a"
+ "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78"
+ "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3"
+ "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e"
+ "\x35\x10\x30\x82\x0d\xe7\xc5\x9b"
+ "\xde\x44\x18\xbd\x9f\xd1\x45\xa9"
+ "\x7b\x7a\x4a\xad\x35\x65\x27\xca"
+ "\xb2\xc3\xd4\x9b\x71\x86\x70\xee"
+ "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf"
+ "\xfc\x42\xc8\x31\x59\xbe\x16\x60"
+ "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14"
+ "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef"
+ "\x52\x7f\x29\x51\x94\x20\x67\x3c"
+ "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63"
+ "\xe7\xff\x73\x25\xd1\xdd\x96\x8a"
+ "\x98\x52\x6d\xf3\xac\x3e\xf2\x18"
+ "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed"
+ "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e"
+ "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad"
+ "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa"
+ "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81"
+ "\x65\x53\x0f\x41\x11\xbd\x98\x99"
+ "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d"
+ "\x84\x98\xf9\x34\xed\x33\x2a\x1f"
+ "\x82\xed\xc1\x73\x98\xd3\x02\xdc"
+ "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76"
+ "\x63\x51\x34\x9d\x96\x12\xae\xce"
+ "\x83\xc9\x76\x5e\xa4\x1b\x53\x37"
+ "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d"
+ "\x54\x27\x74\xbb\x10\x86\x57\x46"
+ "\x68\xe1\xed\x14\xe7\x9d\xfc\x84"
+ "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf"
+ "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d"
+ "\x7b\x4f\x38\x55\x36\x71\x64\xc1"
+ "\xfc\x5c\x75\x52\x33\x02\x18\xf8"
+ "\x17\xe1\x2b\xc2\x43\x39\xbd\x76"
+ "\x9b\x63\x76\x32\x2f\x19\x72\x10"
+ "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5"
+ "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c",
+ .rlen = 512,
+ .also_non_np = 1,
+ .np = 3,
+ .tap = { 512 - 20, 4, 16 },
+ }
+};
+
+static const struct cipher_testvec speck64_xts_dec_tv_template[] = {
+ {
+ .key = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .klen = 24,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x84\xaf\x54\x07\x19\xd4\x7c\xa6"
+ "\xe4\xfe\xdf\xc4\x1f\x34\xc3\xc2"
+ "\x80\xf5\x72\xe7\xcd\xf0\x99\x22"
+ "\x35\xa7\x2f\x06\xef\xdc\x51\xaa",
+ .ilen = 32,
+ .result = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .rlen = 32,
+ }, {
+ .key = "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x11\x11\x11\x11\x11\x11\x11\x11"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 24,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x12\x56\x73\xcd\x15\x87\xa8\x59"
+ "\xcf\x84\xae\xd9\x1c\x66\xd6\x9f"
+ "\xb3\x12\x69\x7e\x36\xeb\x52\xff"
+ "\x62\xdd\xba\x90\xb3\xe1\xee\x99",
+ .ilen = 32,
+ .result = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .rlen = 32,
+ }, {
+ .key = "\xff\xfe\xfd\xfc\xfb\xfa\xf9\xf8"
+ "\xf7\xf6\xf5\xf4\xf3\xf2\xf1\xf0"
+ "\x22\x22\x22\x22\x22\x22\x22\x22",
+ .klen = 24,
+ .iv = "\x33\x33\x33\x33\x33\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x15\x1b\xe4\x2c\xa2\x5a\x2d\x2c"
+ "\x27\x36\xc0\xbf\x5d\xea\x36\x37"
+ "\x2d\x1a\x88\xbc\x66\xb5\xd0\x0b"
+ "\xa1\xbc\x19\xb2\x0f\x3b\x75\x34",
+ .ilen = 32,
+ .result = "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44"
+ "\x44\x44\x44\x44\x44\x44\x44\x44",
+ .rlen = 32,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x31\x41\x59\x26\x53\x58\x97\x93",
+ .klen = 24,
+ .iv = "\x00\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\xaf\xa1\x81\xa6\x32\xbb\x15\x8e"
+ "\xf8\x95\x2e\xd3\xe6\xee\x7e\x09"
+ "\x0c\x1a\xf5\x02\x97\x8b\xe3\xb3"
+ "\x11\xc7\x39\x96\xd0\x95\xf4\x56"
+ "\xf4\xdd\x03\x38\x01\x44\x2c\xcf"
+ "\x88\xae\x8e\x3c\xcd\xe7\xaa\x66"
+ "\xfe\x3d\xc6\xfb\x01\x23\x51\x43"
+ "\xd5\xd2\x13\x86\x94\x34\xe9\x62"
+ "\xf9\x89\xe3\xd1\x7b\xbe\xf8\xef"
+ "\x76\x35\x04\x3f\xdb\x23\x9d\x0b"
+ "\x85\x42\xb9\x02\xd6\xcc\xdb\x96"
+ "\xa7\x6b\x27\xb6\xd4\x45\x8f\x7d"
+ "\xae\xd2\x04\xd5\xda\xc1\x7e\x24"
+ "\x8c\x73\xbe\x48\x7e\xcf\x65\x28"
+ "\x29\xe5\xbe\x54\x30\xcb\x46\x95"
+ "\x4f\x2e\x8a\x36\xc8\x27\xc5\xbe"
+ "\xd0\x1a\xaf\xab\x26\xcd\x9e\x69"
+ "\xa1\x09\x95\x71\x26\xe9\xc4\xdf"
+ "\xe6\x31\xc3\x46\xda\xaf\x0b\x41"
+ "\x1f\xab\xb1\x8e\xd6\xfc\x0b\xb3"
+ "\x82\xc0\x37\x27\xfc\x91\xa7\x05"
+ "\xfb\xc5\xdc\x2b\x74\x96\x48\x43"
+ "\x5d\x9c\x19\x0f\x60\x63\x3a\x1f"
+ "\x6f\xf0\x03\xbe\x4d\xfd\xc8\x4a"
+ "\xc6\xa4\x81\x6d\xc3\x12\x2a\x5c"
+ "\x07\xff\xf3\x72\x74\x48\xb5\x40"
+ "\x50\xb5\xdd\x90\x43\x31\x18\x15"
+ "\x7b\xf2\xa6\xdb\x83\xc8\x4b\x4a"
+ "\x29\x93\x90\x8b\xda\x07\xf0\x35"
+ "\x6d\x90\x88\x09\x4e\x83\xf5\x5b"
+ "\x94\x12\xbb\x33\x27\x1d\x3f\x23"
+ "\x51\xa8\x7c\x07\xa2\xae\x77\xa6"
+ "\x50\xfd\xcc\xc0\x4f\x80\x7a\x9f"
+ "\x66\xdd\xcd\x75\x24\x8b\x33\xf7"
+ "\x20\xdb\x83\x9b\x4f\x11\x63\x6e"
+ "\xcf\x37\xef\xc9\x11\x01\x5c\x45"
+ "\x32\x99\x7c\x3c\x9e\x42\x89\xe3"
+ "\x70\x6d\x15\x9f\xb1\xe6\xb6\x05"
+ "\xfe\x0c\xb9\x49\x2d\x90\x6d\xcc"
+ "\x5d\x3f\xc1\xfe\x89\x0a\x2e\x2d"
+ "\xa0\xa8\x89\x3b\x73\x39\xa5\x94"
+ "\x4c\xa4\xa6\xbb\xa7\x14\x46\x89"
+ "\x10\xff\xaf\xef\xca\xdd\x4f\x80"
+ "\xb3\xdf\x3b\xab\xd4\xe5\x5a\xc7"
+ "\x33\xca\x00\x8b\x8b\x3f\xea\xec"
+ "\x68\x8a\xc2\x6d\xfd\xd4\x67\x0f"
+ "\x22\x31\xe1\x0e\xfe\x5a\x04\xd5"
+ "\x64\xa3\xf1\x1a\x76\x28\xcc\x35"
+ "\x36\xa7\x0a\x74\xf7\x1c\x44\x9b"
+ "\xc7\x1b\x53\x17\x02\xea\xd1\xad"
+ "\x13\x51\x73\xc0\xa0\xb2\x05\x32"
+ "\xa8\xa2\x37\x2e\xe1\x7a\x3a\x19"
+ "\x26\xb4\x6c\x62\x5d\xb3\x1a\x1d"
+ "\x59\xda\xee\x1a\x22\x18\xda\x0d"
+ "\x88\x0f\x55\x8b\x72\x62\xfd\xc1"
+ "\x69\x13\xcd\x0d\x5f\xc1\x09\x52"
+ "\xee\xd6\xe3\x84\x4d\xee\xf6\x88"
+ "\xaf\x83\xdc\x76\xf4\xc0\x93\x3f"
+ "\x4a\x75\x2f\xb0\x0b\x3e\xc4\x54"
+ "\x7d\x69\x8d\x00\x62\x77\x0d\x14"
+ "\xbe\x7c\xa6\x7d\xc5\x24\x4f\xf3"
+ "\x50\xf7\x5f\xf4\xc2\xca\x41\x97"
+ "\x37\xbe\x75\x74\xcd\xf0\x75\x6e"
+ "\x25\x23\x94\xbd\xda\x8d\xb0\xd4",
+ .ilen = 512,
+ .result = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .rlen = 512,
+ }, {
+ .key = "\x27\x18\x28\x18\x28\x45\x90\x45"
+ "\x23\x53\x60\x28\x74\x71\x35\x26"
+ "\x62\x49\x77\x57\x24\x70\x93\x69"
+ "\x99\x59\x57\x49\x66\x96\x76\x27",
+ .klen = 32,
+ .iv = "\xff\x00\x00\x00\x00\x00\x00\x00"
+ "\x00\x00\x00\x00\x00\x00\x00\x00",
+ .input = "\x55\xed\x71\xd3\x02\x8e\x15\x3b"
+ "\xc6\x71\x29\x2d\x3e\x89\x9f\x59"
+ "\x68\x6a\xcc\x8a\x56\x97\xf3\x95"
+ "\x4e\x51\x08\xda\x2a\xf8\x6f\x3c"
+ "\x78\x16\xea\x80\xdb\x33\x75\x94"
+ "\xf9\x29\xc4\x2b\x76\x75\x97\xc7"
+ "\xf2\x98\x2c\xf9\xff\xc8\xd5\x2b"
+ "\x18\xf1\xaf\xcf\x7c\xc5\x0b\xee"
+ "\xad\x3c\x76\x7c\xe6\x27\xa2\x2a"
+ "\xe4\x66\xe1\xab\xa2\x39\xfc\x7c"
+ "\xf5\xec\x32\x74\xa3\xb8\x03\x88"
+ "\x52\xfc\x2e\x56\x3f\xa1\xf0\x9f"
+ "\x84\x5e\x46\xed\x20\x89\xb6\x44"
+ "\x8d\xd0\xed\x54\x47\x16\xbe\x95"
+ "\x8a\xb3\x6b\x72\xc4\x32\x52\x13"
+ "\x1b\xb0\x82\xbe\xac\xf9\x70\xa6"
+ "\x44\x18\xdd\x8c\x6e\xca\x6e\x45"
+ "\x8f\x1e\x10\x07\x57\x25\x98\x7b"
+ "\x17\x8c\x78\xdd\x80\xa7\xd9\xd8"
+ "\x63\xaf\xb9\x67\x57\xfd\xbc\xdb"
+ "\x44\xe9\xc5\x65\xd1\xc7\x3b\xff"
+ "\x20\xa0\x80\x1a\xc3\x9a\xad\x5e"
+ "\x5d\x3b\xd3\x07\xd9\xf5\xfd\x3d"
+ "\x4a\x8b\xa8\xd2\x6e\x7a\x51\x65"
+ "\x6c\x8e\x95\xe0\x45\xc9\x5f\x4a"
+ "\x09\x3c\x3d\x71\x7f\x0c\x84\x2a"
+ "\xc8\x48\x52\x1a\xc2\xd5\xd6\x78"
+ "\x92\x1e\xa0\x90\x2e\xea\xf0\xf3"
+ "\xdc\x0f\xb1\xaf\x0d\x9b\x06\x2e"
+ "\x35\x10\x30\x82\x0d\xe7\xc5\x9b"
+ "\xde\x44\x18\xbd\x9f\xd1\x45\xa9"
+ "\x7b\x7a\x4a\xad\x35\x65\x27\xca"
+ "\xb2\xc3\xd4\x9b\x71\x86\x70\xee"
+ "\xf1\x89\x3b\x85\x4b\x5b\xaa\xaf"
+ "\xfc\x42\xc8\x31\x59\xbe\x16\x60"
+ "\x4f\xf9\xfa\x12\xea\xd0\xa7\x14"
+ "\xf0\x7a\xf3\xd5\x8d\xbd\x81\xef"
+ "\x52\x7f\x29\x51\x94\x20\x67\x3c"
+ "\xd1\xaf\x77\x9f\x22\x5a\x4e\x63"
+ "\xe7\xff\x73\x25\xd1\xdd\x96\x8a"
+ "\x98\x52\x6d\xf3\xac\x3e\xf2\x18"
+ "\x6d\xf6\x0a\x29\xa6\x34\x3d\xed"
+ "\xe3\x27\x0d\x9d\x0a\x02\x44\x7e"
+ "\x5a\x7e\x67\x0f\x0a\x9e\xd6\xad"
+ "\x91\xe6\x4d\x81\x8c\x5c\x59\xaa"
+ "\xfb\xeb\x56\x53\xd2\x7d\x4c\x81"
+ "\x65\x53\x0f\x41\x11\xbd\x98\x99"
+ "\xf9\xc6\xfa\x51\x2e\xa3\xdd\x8d"
+ "\x84\x98\xf9\x34\xed\x33\x2a\x1f"
+ "\x82\xed\xc1\x73\x98\xd3\x02\xdc"
+ "\xe6\xc2\x33\x1d\xa2\xb4\xca\x76"
+ "\x63\x51\x34\x9d\x96\x12\xae\xce"
+ "\x83\xc9\x76\x5e\xa4\x1b\x53\x37"
+ "\x17\xd5\xc0\x80\x1d\x62\xf8\x3d"
+ "\x54\x27\x74\xbb\x10\x86\x57\x46"
+ "\x68\xe1\xed\x14\xe7\x9d\xfc\x84"
+ "\x47\xbc\xc2\xf8\x19\x4b\x99\xcf"
+ "\x7a\xe9\xc4\xb8\x8c\x82\x72\x4d"
+ "\x7b\x4f\x38\x55\x36\x71\x64\xc1"
+ "\xfc\x5c\x75\x52\x33\x02\x18\xf8"
+ "\x17\xe1\x2b\xc2\x43\x39\xbd\x76"
+ "\x9b\x63\x76\x32\x2f\x19\x72\x10"
+ "\x9f\x21\x0c\xf1\x66\x50\x7f\xa5"
+ "\x0d\x1f\x46\xe0\xba\xd3\x2f\x3c",
+ .ilen = 512,
+ .result = "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff"
+ "\x00\x01\x02\x03\x04\x05\x06\x07"
+ "\x08\x09\x0a\x0b\x0c\x0d\x0e\x0f"
+ "\x10\x11\x12\x13\x14\x15\x16\x17"
+ "\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f"
+ "\x20\x21\x22\x23\x24\x25\x26\x27"
+ "\x28\x29\x2a\x2b\x2c\x2d\x2e\x2f"
+ "\x30\x31\x32\x33\x34\x35\x36\x37"
+ "\x38\x39\x3a\x3b\x3c\x3d\x3e\x3f"
+ "\x40\x41\x42\x43\x44\x45\x46\x47"
+ "\x48\x49\x4a\x4b\x4c\x4d\x4e\x4f"
+ "\x50\x51\x52\x53\x54\x55\x56\x57"
+ "\x58\x59\x5a\x5b\x5c\x5d\x5e\x5f"
+ "\x60\x61\x62\x63\x64\x65\x66\x67"
+ "\x68\x69\x6a\x6b\x6c\x6d\x6e\x6f"
+ "\x70\x71\x72\x73\x74\x75\x76\x77"
+ "\x78\x79\x7a\x7b\x7c\x7d\x7e\x7f"
+ "\x80\x81\x82\x83\x84\x85\x86\x87"
+ "\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f"
+ "\x90\x91\x92\x93\x94\x95\x96\x97"
+ "\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f"
+ "\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7"
+ "\xa8\xa9\xaa\xab\xac\xad\xae\xaf"
+ "\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7"
+ "\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf"
+ "\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7"
+ "\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf"
+ "\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7"
+ "\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf"
+ "\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7"
+ "\xe8\xe9\xea\xeb\xec\xed\xee\xef"
+ "\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7"
+ "\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff",
+ .rlen = 512,
+ .also_non_np = 1,
+ .np = 3,
+ .tap = { 512 - 20, 4, 16 },
+ }
+};
+
/* Cast6 test vectors from RFC 2612 */
static const struct cipher_testvec cast6_enc_tv_template[] = {
{
@@ -34638,4 +36124,75 @@
},
};
+static const struct comp_testvec zstd_comp_tv_template[] = {
+ {
+ .inlen = 68,
+ .outlen = 39,
+ .input = "The algorithm is zstd. "
+ "The algorithm is zstd. "
+ "The algorithm is zstd.",
+ .output = "\x28\xb5\x2f\xfd\x00\x50\xf5\x00\x00\xb8\x54\x68\x65"
+ "\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d\x20\x69\x73"
+ "\x20\x7a\x73\x74\x64\x2e\x20\x01\x00\x55\x73\x36\x01"
+ ,
+ },
+ {
+ .inlen = 244,
+ .outlen = 151,
+ .input = "zstd, short for Zstandard, is a fast lossless "
+ "compression algorithm, targeting real-time "
+ "compression scenarios at zlib-level and better "
+ "compression ratios. The zstd compression library "
+ "provides in-memory compression and decompression "
+ "functions.",
+ .output = "\x28\xb5\x2f\xfd\x00\x50\x75\x04\x00\x42\x4b\x1e\x17"
+ "\x90\x81\x31\x00\xf2\x2f\xe4\x36\xc9\xef\x92\x88\x32"
+ "\xc9\xf2\x24\x94\xd8\x68\x9a\x0f\x00\x0c\xc4\x31\x6f"
+ "\x0d\x0c\x38\xac\x5c\x48\x03\xcd\x63\x67\xc0\xf3\xad"
+ "\x4e\x90\xaa\x78\xa0\xa4\xc5\x99\xda\x2f\xb6\x24\x60"
+ "\xe2\x79\x4b\xaa\xb6\x6b\x85\x0b\xc9\xc6\x04\x66\x86"
+ "\xe2\xcc\xe2\x25\x3f\x4f\x09\xcd\xb8\x9d\xdb\xc1\x90"
+ "\xa9\x11\xbc\x35\x44\x69\x2d\x9c\x64\x4f\x13\x31\x64"
+ "\xcc\xfb\x4d\x95\x93\x86\x7f\x33\x7f\x1a\xef\xe9\x30"
+ "\xf9\x67\xa1\x94\x0a\x69\x0f\x60\xcd\xc3\xab\x99\xdc"
+ "\x42\xed\x97\x05\x00\x33\xc3\x15\x95\x3a\x06\xa0\x0e"
+ "\x20\xa9\x0e\x82\xb9\x43\x45\x01",
+ },
+};
+
+static const struct comp_testvec zstd_decomp_tv_template[] = {
+ {
+ .inlen = 43,
+ .outlen = 68,
+ .input = "\x28\xb5\x2f\xfd\x04\x50\xf5\x00\x00\xb8\x54\x68\x65"
+ "\x20\x61\x6c\x67\x6f\x72\x69\x74\x68\x6d\x20\x69\x73"
+ "\x20\x7a\x73\x74\x64\x2e\x20\x01\x00\x55\x73\x36\x01"
+ "\x6b\xf4\x13\x35",
+ .output = "The algorithm is zstd. "
+ "The algorithm is zstd. "
+ "The algorithm is zstd.",
+ },
+ {
+ .inlen = 155,
+ .outlen = 244,
+ .input = "\x28\xb5\x2f\xfd\x04\x50\x75\x04\x00\x42\x4b\x1e\x17"
+ "\x90\x81\x31\x00\xf2\x2f\xe4\x36\xc9\xef\x92\x88\x32"
+ "\xc9\xf2\x24\x94\xd8\x68\x9a\x0f\x00\x0c\xc4\x31\x6f"
+ "\x0d\x0c\x38\xac\x5c\x48\x03\xcd\x63\x67\xc0\xf3\xad"
+ "\x4e\x90\xaa\x78\xa0\xa4\xc5\x99\xda\x2f\xb6\x24\x60"
+ "\xe2\x79\x4b\xaa\xb6\x6b\x85\x0b\xc9\xc6\x04\x66\x86"
+ "\xe2\xcc\xe2\x25\x3f\x4f\x09\xcd\xb8\x9d\xdb\xc1\x90"
+ "\xa9\x11\xbc\x35\x44\x69\x2d\x9c\x64\x4f\x13\x31\x64"
+ "\xcc\xfb\x4d\x95\x93\x86\x7f\x33\x7f\x1a\xef\xe9\x30"
+ "\xf9\x67\xa1\x94\x0a\x69\x0f\x60\xcd\xc3\xab\x99\xdc"
+ "\x42\xed\x97\x05\x00\x33\xc3\x15\x95\x3a\x06\xa0\x0e"
+ "\x20\xa9\x0e\x82\xb9\x43\x45\x01\xaa\x6d\xda\x0d",
+ .output = "zstd, short for Zstandard, is a fast lossless "
+ "compression algorithm, targeting real-time "
+ "compression scenarios at zlib-level and better "
+ "compression ratios. The zstd compression library "
+ "provides in-memory compression and decompression "
+ "functions.",
+ },
+};
#endif /* _CRYPTO_TESTMGR_H */
diff --git a/crypto/zstd.c b/crypto/zstd.c
new file mode 100644
index 0000000..9a76b3e
--- /dev/null
+++ b/crypto/zstd.c
@@ -0,0 +1,265 @@
+/*
+ * Cryptographic API.
+ *
+ * Copyright (c) 2017-present, Facebook, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ */
+#include <linux/crypto.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/net.h>
+#include <linux/vmalloc.h>
+#include <linux/zstd.h>
+#include <crypto/internal/scompress.h>
+
+
+#define ZSTD_DEF_LEVEL 3
+
+struct zstd_ctx {
+ ZSTD_CCtx *cctx;
+ ZSTD_DCtx *dctx;
+ void *cwksp;
+ void *dwksp;
+};
+
+static ZSTD_parameters zstd_params(void)
+{
+ return ZSTD_getParams(ZSTD_DEF_LEVEL, 0, 0);
+}
+
+static int zstd_comp_init(struct zstd_ctx *ctx)
+{
+ int ret = 0;
+ const ZSTD_parameters params = zstd_params();
+ const size_t wksp_size = ZSTD_CCtxWorkspaceBound(params.cParams);
+
+ ctx->cwksp = vzalloc(wksp_size);
+ if (!ctx->cwksp) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ctx->cctx = ZSTD_initCCtx(ctx->cwksp, wksp_size);
+ if (!ctx->cctx) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+out:
+ return ret;
+out_free:
+ vfree(ctx->cwksp);
+ goto out;
+}
+
+static int zstd_decomp_init(struct zstd_ctx *ctx)
+{
+ int ret = 0;
+ const size_t wksp_size = ZSTD_DCtxWorkspaceBound();
+
+ ctx->dwksp = vzalloc(wksp_size);
+ if (!ctx->dwksp) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ ctx->dctx = ZSTD_initDCtx(ctx->dwksp, wksp_size);
+ if (!ctx->dctx) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+out:
+ return ret;
+out_free:
+ vfree(ctx->dwksp);
+ goto out;
+}
+
+static void zstd_comp_exit(struct zstd_ctx *ctx)
+{
+ vfree(ctx->cwksp);
+ ctx->cwksp = NULL;
+ ctx->cctx = NULL;
+}
+
+static void zstd_decomp_exit(struct zstd_ctx *ctx)
+{
+ vfree(ctx->dwksp);
+ ctx->dwksp = NULL;
+ ctx->dctx = NULL;
+}
+
+static int __zstd_init(void *ctx)
+{
+ int ret;
+
+ ret = zstd_comp_init(ctx);
+ if (ret)
+ return ret;
+ ret = zstd_decomp_init(ctx);
+ if (ret)
+ zstd_comp_exit(ctx);
+ return ret;
+}
+
+static void *zstd_alloc_ctx(struct crypto_scomp *tfm)
+{
+ int ret;
+ struct zstd_ctx *ctx;
+
+ ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+ if (!ctx)
+ return ERR_PTR(-ENOMEM);
+
+ ret = __zstd_init(ctx);
+ if (ret) {
+ kfree(ctx);
+ return ERR_PTR(ret);
+ }
+
+ return ctx;
+}
+
+static int zstd_init(struct crypto_tfm *tfm)
+{
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __zstd_init(ctx);
+}
+
+static void __zstd_exit(void *ctx)
+{
+ zstd_comp_exit(ctx);
+ zstd_decomp_exit(ctx);
+}
+
+static void zstd_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+ __zstd_exit(ctx);
+ kzfree(ctx);
+}
+
+static void zstd_exit(struct crypto_tfm *tfm)
+{
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ __zstd_exit(ctx);
+}
+
+static int __zstd_compress(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen, void *ctx)
+{
+ size_t out_len;
+ struct zstd_ctx *zctx = ctx;
+ const ZSTD_parameters params = zstd_params();
+
+ out_len = ZSTD_compressCCtx(zctx->cctx, dst, *dlen, src, slen, params);
+ if (ZSTD_isError(out_len))
+ return -EINVAL;
+ *dlen = out_len;
+ return 0;
+}
+
+static int zstd_compress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __zstd_compress(src, slen, dst, dlen, ctx);
+}
+
+static int zstd_scompress(struct crypto_scomp *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ return __zstd_compress(src, slen, dst, dlen, ctx);
+}
+
+static int __zstd_decompress(const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen, void *ctx)
+{
+ size_t out_len;
+ struct zstd_ctx *zctx = ctx;
+
+ out_len = ZSTD_decompressDCtx(zctx->dctx, dst, *dlen, src, slen);
+ if (ZSTD_isError(out_len))
+ return -EINVAL;
+ *dlen = out_len;
+ return 0;
+}
+
+static int zstd_decompress(struct crypto_tfm *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+ struct zstd_ctx *ctx = crypto_tfm_ctx(tfm);
+
+ return __zstd_decompress(src, slen, dst, dlen, ctx);
+}
+
+static int zstd_sdecompress(struct crypto_scomp *tfm, const u8 *src,
+ unsigned int slen, u8 *dst, unsigned int *dlen,
+ void *ctx)
+{
+ return __zstd_decompress(src, slen, dst, dlen, ctx);
+}
+
+static struct crypto_alg alg = {
+ .cra_name = "zstd",
+ .cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
+ .cra_ctxsize = sizeof(struct zstd_ctx),
+ .cra_module = THIS_MODULE,
+ .cra_init = zstd_init,
+ .cra_exit = zstd_exit,
+ .cra_u = { .compress = {
+ .coa_compress = zstd_compress,
+ .coa_decompress = zstd_decompress } }
+};
+
+static struct scomp_alg scomp = {
+ .alloc_ctx = zstd_alloc_ctx,
+ .free_ctx = zstd_free_ctx,
+ .compress = zstd_scompress,
+ .decompress = zstd_sdecompress,
+ .base = {
+ .cra_name = "zstd",
+ .cra_driver_name = "zstd-scomp",
+ .cra_module = THIS_MODULE,
+ }
+};
+
+static int __init zstd_mod_init(void)
+{
+ int ret;
+
+ ret = crypto_register_alg(&alg);
+ if (ret)
+ return ret;
+
+ ret = crypto_register_scomp(&scomp);
+ if (ret)
+ crypto_unregister_alg(&alg);
+
+ return ret;
+}
+
+static void __exit zstd_mod_fini(void)
+{
+ crypto_unregister_alg(&alg);
+ crypto_unregister_scomp(&scomp);
+}
+
+module_init(zstd_mod_init);
+module_exit(zstd_mod_fini);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Zstd Compression Algorithm");
+MODULE_ALIAS_CRYPTO("zstd");
diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 9e80802..a329730 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -557,7 +557,8 @@
return 0;
}
-static int param_set_lid_init_state(const char *val, struct kernel_param *kp)
+static int param_set_lid_init_state(const char *val,
+ const struct kernel_param *kp)
{
int result = 0;
@@ -575,7 +576,8 @@
return result;
}
-static int param_get_lid_init_state(char *buffer, struct kernel_param *kp)
+static int param_get_lid_init_state(char *buffer,
+ const struct kernel_param *kp)
{
switch (lid_init_state) {
case ACPI_BUTTON_LID_INIT_OPEN:
diff --git a/drivers/acpi/ec.c b/drivers/acpi/ec.c
index 3d624c7..70a0f8b 100644
--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -1962,7 +1962,8 @@
SET_SYSTEM_SLEEP_PM_OPS(acpi_ec_suspend, acpi_ec_resume)
};
-static int param_set_event_clearing(const char *val, struct kernel_param *kp)
+static int param_set_event_clearing(const char *val,
+ const struct kernel_param *kp)
{
int result = 0;
@@ -1980,7 +1981,8 @@
return result;
}
-static int param_get_event_clearing(char *buffer, struct kernel_param *kp)
+static int param_get_event_clearing(char *buffer,
+ const struct kernel_param *kp)
{
switch (ec_event_clearing) {
case ACPI_EC_EVT_TIMING_STATUS:
diff --git a/drivers/acpi/sysfs.c b/drivers/acpi/sysfs.c
index 0fd57bf..9e728a1 100644
--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -230,7 +230,8 @@
module_param_cb(trace_debug_layer, ¶m_ops_trace_attrib, &acpi_gbl_trace_dbg_layer, 0644);
module_param_cb(trace_debug_level, ¶m_ops_trace_attrib, &acpi_gbl_trace_dbg_level, 0644);
-static int param_set_trace_state(const char *val, struct kernel_param *kp)
+static int param_set_trace_state(const char *val,
+ const struct kernel_param *kp)
{
acpi_status status;
const char *method = trace_method_name;
@@ -266,7 +267,7 @@
return 0;
}
-static int param_get_trace_state(char *buffer, struct kernel_param *kp)
+static int param_get_trace_state(char *buffer, const struct kernel_param *kp)
{
if (!(acpi_gbl_trace_flags & ACPI_TRACE_ENABLED))
return sprintf(buffer, "disable");
@@ -295,7 +296,8 @@
"To enable/disable the ACPI Debug Object output.");
/* /sys/module/acpi/parameters/acpica_version */
-static int param_get_acpica_version(char *buffer, struct kernel_param *kp)
+static int param_get_acpica_version(char *buffer,
+ const struct kernel_param *kp)
{
int result;
diff --git a/drivers/android/Kconfig b/drivers/android/Kconfig
index 7dce379..ee4880b 100644
--- a/drivers/android/Kconfig
+++ b/drivers/android/Kconfig
@@ -10,7 +10,7 @@
config ANDROID_BINDER_IPC
bool "Android Binder IPC Driver"
- depends on MMU
+ depends on MMU && !M68K
default n
---help---
Binder is used in Android for both communication between processes,
@@ -32,19 +32,6 @@
created. Each binder device has its own context manager, and is
therefore logically separated from the other devices.
-config ANDROID_BINDER_IPC_32BIT
- bool "Use old (Android 4.4 and earlier) 32-bit binder API"
- depends on !64BIT && ANDROID_BINDER_IPC
- default y
- ---help---
- The Binder API has been changed to support both 32 and 64bit
- applications in a mixed environment.
-
- Enable this to support an old 32-bit Android user-space (v4.4 and
- earlier).
-
- Note that enabling this will break newer Android user-space.
-
config ANDROID_BINDER_IPC_SELFTEST
bool "Android Binder IPC Driver Selftest"
depends on ANDROID_BINDER_IPC
diff --git a/drivers/android/binder.c b/drivers/android/binder.c
index a86c279..9c06e7f 100644
--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -72,11 +72,8 @@
#include <linux/security.h>
#include <linux/spinlock.h>
-#ifdef CONFIG_ANDROID_BINDER_IPC_32BIT
-#define BINDER_IPC_32BIT 1
-#endif
-
#include <uapi/linux/android/binder.h>
+#include <uapi/linux/sched/types.h>
#include "binder_alloc.h"
#include "binder_trace.h"
@@ -141,7 +138,7 @@
};
static uint32_t binder_debug_mask = BINDER_DEBUG_USER_ERROR |
BINDER_DEBUG_FAILED_TRANSACTION | BINDER_DEBUG_DEAD_TRANSACTION;
-module_param_named(debug_mask, binder_debug_mask, uint, S_IWUSR | S_IRUGO);
+module_param_named(debug_mask, binder_debug_mask, uint, 0644);
static char *binder_devices_param = CONFIG_ANDROID_BINDER_DEVICES;
module_param_named(devices, binder_devices_param, charp, 0444);
@@ -150,7 +147,7 @@
static int binder_stop_on_user_error;
static int binder_set_stop_on_user_error(const char *val,
- struct kernel_param *kp)
+ const struct kernel_param *kp)
{
int ret;
@@ -160,7 +157,7 @@
return ret;
}
module_param_call(stop_on_user_error, binder_set_stop_on_user_error,
- param_get_int, &binder_stop_on_user_error, S_IWUSR | S_IRUGO);
+ param_get_int, &binder_stop_on_user_error, 0644);
#define binder_debug(mask, x...) \
do { \
@@ -249,7 +246,7 @@
unsigned int cur = atomic_inc_return(&log->cur);
if (cur >= ARRAY_SIZE(log->entry))
- log->full = 1;
+ log->full = true;
e = &log->entry[cur % ARRAY_SIZE(log->entry)];
WRITE_ONCE(e->debug_id_done, 0);
/*
@@ -351,10 +348,14 @@
* and by @lock)
* @has_async_transaction: async transaction to node in progress
* (protected by @lock)
+ * @sched_policy: minimum scheduling policy for node
+ * (invariant after initialized)
* @accept_fds: file descriptor operations supported for node
* (invariant after initialized)
* @min_priority: minimum scheduling priority
* (invariant after initialized)
+ * @inherit_rt: inherit RT scheduling policy from caller
+ * (invariant after initialized)
* @async_todo: list of async work items
* (protected by @proc->inner_lock)
*
@@ -390,6 +391,8 @@
/*
* invariant after initialization
*/
+ u8 sched_policy:2;
+ u8 inherit_rt:1;
u8 accept_fds:1;
u8 min_priority;
};
@@ -464,6 +467,22 @@
};
/**
+ * struct binder_priority - scheduler policy and priority
+ * @sched_policy scheduler policy
+ * @prio [100..139] for SCHED_NORMAL, [0..99] for FIFO/RT
+ *
+ * The binder driver supports inheriting the following scheduler policies:
+ * SCHED_NORMAL
+ * SCHED_BATCH
+ * SCHED_FIFO
+ * SCHED_RR
+ */
+struct binder_priority {
+ unsigned int sched_policy;
+ int prio;
+};
+
+/**
* struct binder_proc - binder process bookkeeping
* @proc_node: element for binder_procs list
* @threads: rbtree of binder_threads in this proc
@@ -493,8 +512,6 @@
* (protected by @inner_lock)
* @todo: list of work for this process
* (protected by @inner_lock)
- * @wait: wait queue head to wait for proc work
- * (invariant after initialized)
* @stats: per-process binder statistics
* (atomics, no lock needed)
* @delivered_death: list of delivered death notification
@@ -537,14 +554,13 @@
bool is_dead;
struct list_head todo;
- wait_queue_head_t wait;
struct binder_stats stats;
struct list_head delivered_death;
int max_threads;
int requested_threads;
int requested_threads_started;
int tmp_ref;
- long default_priority;
+ struct binder_priority default_priority;
struct dentry *debugfs_entry;
struct binder_alloc alloc;
struct binder_context *context;
@@ -579,6 +595,8 @@
* (protected by @proc->inner_lock)
* @todo: list of work to do for this thread
* (protected by @proc->inner_lock)
+ * @process_todo: whether work in @todo should be processed
+ * (protected by @proc->inner_lock)
* @return_error: transaction errors reported by this thread
* (only accessed by this thread)
* @reply_error: transaction errors reported by target thread
@@ -592,6 +610,7 @@
* @is_dead: thread is dead and awaiting free
* when outstanding transactions are cleaned up
* (protected by @proc->inner_lock)
+ * @task: struct task_struct for this thread
*
* Bookkeeping structure for binder threads.
*/
@@ -604,12 +623,14 @@
bool looper_need_return; /* can be written by other thread */
struct binder_transaction *transaction_stack;
struct list_head todo;
+ bool process_todo;
struct binder_error return_error;
struct binder_error reply_error;
wait_queue_head_t wait;
struct binder_stats stats;
atomic_t tmp_ref;
bool is_dead;
+ struct task_struct *task;
};
struct binder_transaction {
@@ -626,8 +647,9 @@
struct binder_buffer *buffer;
unsigned int code;
unsigned int flags;
- long priority;
- long saved_priority;
+ struct binder_priority priority;
+ struct binder_priority saved_priority;
+ bool set_priority_called;
kuid_t sender_euid;
/**
* @lock: protects @from, @to_proc, and @to_thread
@@ -789,6 +811,16 @@
return ret;
}
+/**
+ * binder_enqueue_work_ilocked() - Add an item to the work list
+ * @work: struct binder_work to add to list
+ * @target_list: list to add work to
+ *
+ * Adds the work to the specified list. Asserts that work
+ * is not already on a list.
+ *
+ * Requires the proc->inner_lock to be held.
+ */
static void
binder_enqueue_work_ilocked(struct binder_work *work,
struct list_head *target_list)
@@ -799,22 +831,56 @@
}
/**
- * binder_enqueue_work() - Add an item to the work list
- * @proc: binder_proc associated with list
+ * binder_enqueue_deferred_thread_work_ilocked() - Add deferred thread work
+ * @thread: thread to queue work to
* @work: struct binder_work to add to list
- * @target_list: list to add work to
*
- * Adds the work to the specified list. Asserts that work
- * is not already on a list.
+ * Adds the work to the todo list of the thread. Doesn't set the process_todo
+ * flag, which means that (if it wasn't already set) the thread will go to
+ * sleep without handling this work when it calls read.
+ *
+ * Requires the proc->inner_lock to be held.
*/
static void
-binder_enqueue_work(struct binder_proc *proc,
- struct binder_work *work,
- struct list_head *target_list)
+binder_enqueue_deferred_thread_work_ilocked(struct binder_thread *thread,
+ struct binder_work *work)
{
- binder_inner_proc_lock(proc);
- binder_enqueue_work_ilocked(work, target_list);
- binder_inner_proc_unlock(proc);
+ binder_enqueue_work_ilocked(work, &thread->todo);
+}
+
+/**
+ * binder_enqueue_thread_work_ilocked() - Add an item to the thread work list
+ * @thread: thread to queue work to
+ * @work: struct binder_work to add to list
+ *
+ * Adds the work to the todo list of the thread, and enables processing
+ * of the todo queue.
+ *
+ * Requires the proc->inner_lock to be held.
+ */
+static void
+binder_enqueue_thread_work_ilocked(struct binder_thread *thread,
+ struct binder_work *work)
+{
+ binder_enqueue_work_ilocked(work, &thread->todo);
+ thread->process_todo = true;
+}
+
+/**
+ * binder_enqueue_thread_work() - Add an item to the thread work list
+ * @thread: thread to queue work to
+ * @work: struct binder_work to add to list
+ *
+ * Adds the work to the todo list of the thread, and enables processing
+ * of the todo queue.
+ */
+static void
+binder_enqueue_thread_work(struct binder_thread *thread,
+ struct binder_work *work)
+{
+ binder_inner_proc_lock(thread->proc);
+ binder_enqueue_thread_work_ilocked(thread, work);
+ binder_inner_proc_unlock(thread->proc);
}
static void
@@ -940,7 +1006,7 @@
static bool binder_has_work_ilocked(struct binder_thread *thread,
bool do_proc_work)
{
- return !binder_worklist_empty_ilocked(&thread->todo) ||
+ return thread->process_todo ||
thread->looper_need_return ||
(do_proc_work &&
!binder_worklist_empty_ilocked(&thread->proc->todo));
@@ -1064,22 +1130,145 @@
binder_wakeup_thread_ilocked(proc, thread, /* sync = */false);
}
-static void binder_set_nice(long nice)
+static bool is_rt_policy(int policy)
{
- long min_nice;
+ return policy == SCHED_FIFO || policy == SCHED_RR;
+}
- if (can_nice(current, nice)) {
- set_user_nice(current, nice);
+static bool is_fair_policy(int policy)
+{
+ return policy == SCHED_NORMAL || policy == SCHED_BATCH;
+}
+
+static bool binder_supported_policy(int policy)
+{
+ return is_fair_policy(policy) || is_rt_policy(policy);
+}
+
+static int to_userspace_prio(int policy, int kernel_priority)
+{
+ if (is_fair_policy(policy))
+ return PRIO_TO_NICE(kernel_priority);
+ else
+ return MAX_USER_RT_PRIO - 1 - kernel_priority;
+}
+
+static int to_kernel_prio(int policy, int user_priority)
+{
+ if (is_fair_policy(policy))
+ return NICE_TO_PRIO(user_priority);
+ else
+ return MAX_USER_RT_PRIO - 1 - user_priority;
+}
+
+static void binder_do_set_priority(struct task_struct *task,
+ struct binder_priority desired,
+ bool verify)
+{
+ int priority; /* user-space prio value */
+ bool has_cap_nice;
+ unsigned int policy = desired.sched_policy;
+
+ if (task->policy == policy && task->normal_prio == desired.prio)
return;
+
+ has_cap_nice = has_capability_noaudit(task, CAP_SYS_NICE);
+
+ priority = to_userspace_prio(policy, desired.prio);
+
+ if (verify && is_rt_policy(policy) && !has_cap_nice) {
+ long max_rtprio = task_rlimit(task, RLIMIT_RTPRIO);
+
+ if (max_rtprio == 0) {
+ policy = SCHED_NORMAL;
+ priority = MIN_NICE;
+ } else if (priority > max_rtprio) {
+ priority = max_rtprio;
+ }
}
- min_nice = rlimit_to_nice(rlimit(RLIMIT_NICE));
- binder_debug(BINDER_DEBUG_PRIORITY_CAP,
- "%d: nice value %ld not allowed use %ld instead\n",
- current->pid, nice, min_nice);
- set_user_nice(current, min_nice);
- if (min_nice <= MAX_NICE)
+
+ if (verify && is_fair_policy(policy) && !has_cap_nice) {
+ long min_nice = rlimit_to_nice(task_rlimit(task, RLIMIT_NICE));
+
+ if (min_nice > MAX_NICE) {
+ binder_user_error("%d RLIMIT_NICE not set\n",
+ task->pid);
+ return;
+ } else if (priority < min_nice) {
+ priority = min_nice;
+ }
+ }
+
+ if (policy != desired.sched_policy ||
+ to_kernel_prio(policy, priority) != desired.prio)
+ binder_debug(BINDER_DEBUG_PRIORITY_CAP,
+ "%d: priority %d not allowed, using %d instead\n",
+ task->pid, desired.prio,
+ to_kernel_prio(policy, priority));
+
+ trace_binder_set_priority(task->tgid, task->pid, task->normal_prio,
+ to_kernel_prio(policy, priority),
+ desired.prio);
+
+ /* Set the actual priority */
+ if (task->policy != policy || is_rt_policy(policy)) {
+ struct sched_param params;
+
+ params.sched_priority = is_rt_policy(policy) ? priority : 0;
+
+ sched_setscheduler_nocheck(task,
+ policy | SCHED_RESET_ON_FORK,
+ ¶ms);
+ }
+ if (is_fair_policy(policy))
+ set_user_nice(task, priority);
+}
+
+static void binder_set_priority(struct task_struct *task,
+ struct binder_priority desired)
+{
+ binder_do_set_priority(task, desired, /* verify = */ true);
+}
+
+static void binder_restore_priority(struct task_struct *task,
+ struct binder_priority desired)
+{
+ binder_do_set_priority(task, desired, /* verify = */ false);
+}
+
+static void binder_transaction_priority(struct task_struct *task,
+ struct binder_transaction *t,
+ struct binder_priority node_prio,
+ bool inherit_rt)
+{
+ struct binder_priority desired_prio = t->priority;
+
+ if (t->set_priority_called)
return;
- binder_user_error("%d RLIMIT_NICE not set\n", current->pid);
+
+ t->set_priority_called = true;
+ t->saved_priority.sched_policy = task->policy;
+ t->saved_priority.prio = task->normal_prio;
+
+ if (!inherit_rt && is_rt_policy(desired_prio.sched_policy)) {
+ desired_prio.prio = NICE_TO_PRIO(0);
+ desired_prio.sched_policy = SCHED_NORMAL;
+ }
+
+ if (node_prio.prio < t->priority.prio ||
+ (node_prio.prio == t->priority.prio &&
+ node_prio.sched_policy == SCHED_FIFO)) {
+ /*
+ * In case the minimum priority on the node is
+ * higher (lower value), use that priority. If
+ * the priority is the same, but the node uses
+ * SCHED_FIFO, prefer SCHED_FIFO, since it can
+ * run unbounded, unlike SCHED_RR.
+ */
+ desired_prio = node_prio;
+ }
+
+ binder_set_priority(task, desired_prio);
}
static struct binder_node *binder_get_node_ilocked(struct binder_proc *proc,
@@ -1132,6 +1321,7 @@
binder_uintptr_t ptr = fp ? fp->binder : 0;
binder_uintptr_t cookie = fp ? fp->cookie : 0;
__u32 flags = fp ? fp->flags : 0;
+ s8 priority;
assert_spin_locked(&proc->inner_lock);
@@ -1164,8 +1354,12 @@
node->ptr = ptr;
node->cookie = cookie;
node->work.type = BINDER_WORK_NODE;
- node->min_priority = flags & FLAT_BINDER_FLAG_PRIORITY_MASK;
+ priority = flags & FLAT_BINDER_FLAG_PRIORITY_MASK;
+ node->sched_policy = (flags & FLAT_BINDER_FLAG_SCHED_POLICY_MASK) >>
+ FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT;
+ node->min_priority = to_kernel_prio(node->sched_policy, priority);
node->accept_fds = !!(flags & FLAT_BINDER_FLAG_ACCEPTS_FDS);
+ node->inherit_rt = !!(flags & FLAT_BINDER_FLAG_INHERIT_RT);
spin_lock_init(&node->lock);
INIT_LIST_HEAD(&node->work.entry);
INIT_LIST_HEAD(&node->async_todo);
@@ -1228,6 +1422,17 @@
node->local_strong_refs++;
if (!node->has_strong_ref && target_list) {
binder_dequeue_work_ilocked(&node->work);
+ /*
+ * Note: this function is the only place where we queue
+ * directly to a thread->todo without using the
+ * corresponding binder_enqueue_thread_work() helper
+ * functions; in this case it's ok to not set the
+ * process_todo flag, since we know this node work will
+ * always be followed by other work that starts queue
+ * processing: in case of synchronous transactions, a
+ * BR_REPLY or BR_ERROR; in case of oneway
+ * transactions, a BR_TRANSACTION_COMPLETE.
+ */
binder_enqueue_work_ilocked(&node->work, target_list);
}
} else {
@@ -1239,6 +1444,9 @@
node->debug_id);
return -EINVAL;
}
+ /*
+ * See comment above
+ */
binder_enqueue_work_ilocked(&node->work, target_list);
}
}
@@ -1928,9 +2136,9 @@
binder_pop_transaction_ilocked(target_thread, t);
if (target_thread->reply_error.cmd == BR_OK) {
target_thread->reply_error.cmd = error_code;
- binder_enqueue_work_ilocked(
- &target_thread->reply_error.work,
- &target_thread->todo);
+ binder_enqueue_thread_work_ilocked(
+ target_thread,
+ &target_thread->reply_error.work);
wake_up_interruptible(&target_thread->wait);
} else {
/*
@@ -2000,8 +2208,8 @@
struct binder_object_header *hdr;
size_t object_size = 0;
- if (offset > buffer->data_size - sizeof(*hdr) ||
- buffer->data_size < sizeof(*hdr) ||
+ if (buffer->data_size < sizeof(*hdr) ||
+ offset > buffer->data_size - sizeof(*hdr) ||
!IS_ALIGNED(offset, sizeof(u32)))
return 0;
@@ -2575,20 +2783,22 @@
struct binder_proc *proc,
struct binder_thread *thread)
{
- struct list_head *target_list = NULL;
struct binder_node *node = t->buffer->target_node;
+ struct binder_priority node_prio;
bool oneway = !!(t->flags & TF_ONE_WAY);
- bool wakeup = true;
+ bool pending_async = false;
BUG_ON(!node);
binder_node_lock(node);
+ node_prio.prio = node->min_priority;
+ node_prio.sched_policy = node->sched_policy;
+
if (oneway) {
BUG_ON(thread);
if (node->has_async_transaction) {
- target_list = &node->async_todo;
- wakeup = false;
+ pending_async = true;
} else {
- node->has_async_transaction = 1;
+ node->has_async_transaction = true;
}
}
@@ -2600,19 +2810,20 @@
return false;
}
- if (!thread && !target_list)
+ if (!thread && !pending_async)
thread = binder_select_thread_ilocked(proc);
- if (thread)
- target_list = &thread->todo;
- else if (!target_list)
- target_list = &proc->todo;
- else
- BUG_ON(target_list != &node->async_todo);
+ if (thread) {
+ binder_transaction_priority(thread->task, t, node_prio,
+ node->inherit_rt);
+ binder_enqueue_thread_work_ilocked(thread, &t->work);
+ } else if (!pending_async) {
+ binder_enqueue_work_ilocked(&t->work, &proc->todo);
+ } else {
+ binder_enqueue_work_ilocked(&t->work, &node->async_todo);
+ }
- binder_enqueue_work_ilocked(&t->work, target_list);
-
- if (wakeup)
+ if (!pending_async)
binder_wakeup_thread_ilocked(proc, thread, !oneway /* sync */);
binder_inner_proc_unlock(proc);
@@ -2727,7 +2938,6 @@
}
thread->transaction_stack = in_reply_to->to_parent;
binder_inner_proc_unlock(proc);
- binder_set_nice(in_reply_to->saved_priority);
target_thread = binder_get_txn_from_and_acq_inner(in_reply_to);
if (target_thread == NULL) {
return_error = BR_DEAD_REPLY;
@@ -2900,7 +3110,15 @@
t->to_thread = target_thread;
t->code = tr->code;
t->flags = tr->flags;
- t->priority = task_nice(current);
+ if (!(t->flags & TF_ONE_WAY) &&
+ binder_supported_policy(current->policy)) {
+ /* Inherit supported policies for synchronous transactions */
+ t->priority.sched_policy = current->policy;
+ t->priority.prio = current->normal_prio;
+ } else {
+ /* Otherwise, fall back to the default priority */
+ t->priority = target_proc->default_priority;
+ }
trace_binder_transaction(reply, t, target_node);
@@ -3115,10 +3333,10 @@
}
}
tcomplete->type = BINDER_WORK_TRANSACTION_COMPLETE;
- binder_enqueue_work(proc, tcomplete, &thread->todo);
t->work.type = BINDER_WORK_TRANSACTION;
if (reply) {
+ binder_enqueue_thread_work(thread, tcomplete);
binder_inner_proc_lock(target_proc);
if (target_thread->is_dead) {
binder_inner_proc_unlock(target_proc);
@@ -3126,13 +3344,22 @@
}
BUG_ON(t->buffer->async_transaction != 0);
binder_pop_transaction_ilocked(target_thread, in_reply_to);
- binder_enqueue_work_ilocked(&t->work, &target_thread->todo);
+ binder_enqueue_thread_work_ilocked(target_thread, &t->work);
binder_inner_proc_unlock(target_proc);
wake_up_interruptible_sync(&target_thread->wait);
+ binder_restore_priority(current, in_reply_to->saved_priority);
binder_free_transaction(in_reply_to);
} else if (!(t->flags & TF_ONE_WAY)) {
BUG_ON(t->buffer->async_transaction != 0);
binder_inner_proc_lock(proc);
+ /*
+ * Defer the TRANSACTION_COMPLETE, so we don't return to
+ * userspace immediately; this allows the target process to
+ * immediately start processing this transaction, reducing
+ * latency. We will then return the TRANSACTION_COMPLETE when
+ * the target replies (or there is an error).
+ */
+ binder_enqueue_deferred_thread_work_ilocked(thread, tcomplete);
t->need_reply = 1;
t->from_parent = thread->transaction_stack;
thread->transaction_stack = t;
@@ -3146,6 +3373,7 @@
} else {
BUG_ON(target_node == NULL);
BUG_ON(t->buffer->async_transaction != 1);
+ binder_enqueue_thread_work(thread, tcomplete);
if (!binder_proc_transaction(t, target_proc, NULL))
goto err_dead_proc_or_thread;
}
@@ -3223,16 +3451,13 @@
BUG_ON(thread->return_error.cmd != BR_OK);
if (in_reply_to) {
+ binder_restore_priority(current, in_reply_to->saved_priority);
thread->return_error.cmd = BR_TRANSACTION_COMPLETE;
- binder_enqueue_work(thread->proc,
- &thread->return_error.work,
- &thread->todo);
+ binder_enqueue_thread_work(thread, &thread->return_error.work);
binder_send_failed_reply(in_reply_to, return_error);
} else {
thread->return_error.cmd = return_error;
- binder_enqueue_work(thread->proc,
- &thread->return_error.work,
- &thread->todo);
+ binder_enqueue_thread_work(thread, &thread->return_error.work);
}
}
@@ -3438,7 +3663,7 @@
w = binder_dequeue_work_head_ilocked(
&buf_node->async_todo);
if (!w) {
- buf_node->has_async_transaction = 0;
+ buf_node->has_async_transaction = false;
} else {
binder_enqueue_work_ilocked(
w, &proc->todo);
@@ -3536,10 +3761,9 @@
WARN_ON(thread->return_error.cmd !=
BR_OK);
thread->return_error.cmd = BR_ERROR;
- binder_enqueue_work(
- thread->proc,
- &thread->return_error.work,
- &thread->todo);
+ binder_enqueue_thread_work(
+ thread,
+ &thread->return_error.work);
binder_debug(
BINDER_DEBUG_FAILED_TRANSACTION,
"%d:%d BC_REQUEST_DEATH_NOTIFICATION failed\n",
@@ -3619,9 +3843,9 @@
if (thread->looper &
(BINDER_LOOPER_STATE_REGISTERED |
BINDER_LOOPER_STATE_ENTERED))
- binder_enqueue_work_ilocked(
- &death->work,
- &thread->todo);
+ binder_enqueue_thread_work_ilocked(
+ thread,
+ &death->work);
else {
binder_enqueue_work_ilocked(
&death->work,
@@ -3676,8 +3900,8 @@
if (thread->looper &
(BINDER_LOOPER_STATE_REGISTERED |
BINDER_LOOPER_STATE_ENTERED))
- binder_enqueue_work_ilocked(
- &death->work, &thread->todo);
+ binder_enqueue_thread_work_ilocked(
+ thread, &death->work);
else {
binder_enqueue_work_ilocked(
&death->work,
@@ -3808,7 +4032,7 @@
wait_event_interruptible(binder_user_error_wait,
binder_stop_on_user_error < 2);
}
- binder_set_nice(proc->default_priority);
+ binder_restore_priority(current, proc->default_priority);
}
if (non_block) {
@@ -3851,6 +4075,8 @@
break;
}
w = binder_dequeue_work_head_ilocked(list);
+ if (binder_worklist_empty_ilocked(&thread->todo))
+ thread->process_todo = false;
switch (w->type) {
case BINDER_WORK_TRANSACTION: {
@@ -3865,6 +4091,7 @@
binder_inner_proc_unlock(proc);
if (put_user(e->cmd, (uint32_t __user *)ptr))
return -EFAULT;
+ cmd = e->cmd;
e->cmd = BR_OK;
ptr += sizeof(uint32_t);
@@ -4020,16 +4247,14 @@
BUG_ON(t->buffer == NULL);
if (t->buffer->target_node) {
struct binder_node *target_node = t->buffer->target_node;
+ struct binder_priority node_prio;
tr.target.ptr = target_node->ptr;
tr.cookie = target_node->cookie;
- t->saved_priority = task_nice(current);
- if (t->priority < target_node->min_priority &&
- !(t->flags & TF_ONE_WAY))
- binder_set_nice(t->priority);
- else if (!(t->flags & TF_ONE_WAY) ||
- t->saved_priority > target_node->min_priority)
- binder_set_nice(target_node->min_priority);
+ node_prio.sched_policy = target_node->sched_policy;
+ node_prio.prio = target_node->min_priority;
+ binder_transaction_priority(current, t, node_prio,
+ target_node->inherit_rt);
cmd = BR_TRANSACTION;
} else {
tr.target.ptr = 0;
@@ -4207,6 +4432,8 @@
binder_stats_created(BINDER_STAT_THREAD);
thread->proc = proc;
thread->pid = current->pid;
+ get_task_struct(current);
+ thread->task = current;
atomic_set(&thread->tmp_ref, 0);
init_waitqueue_head(&thread->wait);
INIT_LIST_HEAD(&thread->todo);
@@ -4257,6 +4484,7 @@
BUG_ON(!list_empty(&thread->todo));
binder_stats_deleted(BINDER_STAT_THREAD);
binder_proc_dec_tmpref(thread->proc);
+ put_task_struct(thread->task);
kfree(thread);
}
@@ -4670,7 +4898,9 @@
failure_string = "bad vm_flags";
goto err_bad_arg;
}
- vma->vm_flags = (vma->vm_flags | VM_DONTCOPY) & ~VM_MAYWRITE;
+ vma->vm_flags |= VM_DONTCOPY | VM_MIXEDMAP;
+ vma->vm_flags &= ~VM_MAYWRITE;
+
vma->vm_ops = &binder_vm_ops;
vma->vm_private_data = proc;
@@ -4683,7 +4913,7 @@
return 0;
err_bad_arg:
- pr_err("binder_mmap: %d %lx-%lx %s failed %d\n",
+ pr_err("%s: %d %lx-%lx %s failed %d\n", __func__,
proc->pid, vma->vm_start, vma->vm_end, failure_string, ret);
return ret;
}
@@ -4693,7 +4923,7 @@
struct binder_proc *proc;
struct binder_device *binder_dev;
- binder_debug(BINDER_DEBUG_OPEN_CLOSE, "binder_open: %d:%d\n",
+ binder_debug(BINDER_DEBUG_OPEN_CLOSE, "%s: %d:%d\n", __func__,
current->group_leader->pid, current->pid);
proc = kzalloc(sizeof(*proc), GFP_KERNEL);
@@ -4705,7 +4935,14 @@
proc->tsk = current->group_leader;
mutex_init(&proc->files_lock);
INIT_LIST_HEAD(&proc->todo);
- proc->default_priority = task_nice(current);
+ if (binder_supported_policy(current->policy)) {
+ proc->default_priority.sched_policy = current->policy;
+ proc->default_priority.prio = current->normal_prio;
+ } else {
+ proc->default_priority.sched_policy = SCHED_NORMAL;
+ proc->default_priority.prio = NICE_TO_PRIO(0);
+ }
+
binder_dev = container_of(filp->private_data, struct binder_device,
miscdev);
proc->context = &binder_dev->context;
@@ -4732,7 +4969,7 @@
* anyway print all contexts that a given PID has, so this
* is not a problem.
*/
- proc->debugfs_entry = debugfs_create_file(strbuf, S_IRUGO,
+ proc->debugfs_entry = debugfs_create_file(strbuf, 0444,
binder_debugfs_dir_entry_proc,
(void *)(unsigned long)proc->pid,
&binder_proc_fops);
@@ -4999,13 +5236,14 @@
spin_lock(&t->lock);
to_proc = t->to_proc;
seq_printf(m,
- "%s %d: %pK from %d:%d to %d:%d code %x flags %x pri %ld r%d",
+ "%s %d: %pK from %d:%d to %d:%d code %x flags %x pri %d:%d r%d",
prefix, t->debug_id, t,
t->from ? t->from->proc->pid : 0,
t->from ? t->from->pid : 0,
to_proc ? to_proc->pid : 0,
t->to_thread ? t->to_thread->pid : 0,
- t->code, t->flags, t->priority, t->need_reply);
+ t->code, t->flags, t->priority.sched_policy,
+ t->priority.prio, t->need_reply);
spin_unlock(&t->lock);
if (proc != to_proc) {
@@ -5123,8 +5361,9 @@
hlist_for_each_entry(ref, &node->refs, node_entry)
count++;
- seq_printf(m, " node %d: u%016llx c%016llx hs %d hw %d ls %d lw %d is %d iw %d tr %d",
+ seq_printf(m, " node %d: u%016llx c%016llx pri %d:%d hs %d hw %d ls %d lw %d is %d iw %d tr %d",
node->debug_id, (u64)node->ptr, (u64)node->cookie,
+ node->sched_policy, node->min_priority,
node->has_strong_ref, node->has_weak_ref,
node->local_strong_refs, node->local_weak_refs,
node->internal_strong_refs, count, node->tmp_refs);
@@ -5561,7 +5800,9 @@
struct binder_device *device;
struct hlist_node *tmp;
- binder_alloc_shrinker_init();
+ ret = binder_alloc_shrinker_init();
+ if (ret)
+ return ret;
atomic_set(&binder_transaction_log.cur, ~0U);
atomic_set(&binder_transaction_log_failed.cur, ~0U);
@@ -5573,27 +5814,27 @@
if (binder_debugfs_dir_entry_root) {
debugfs_create_file("state",
- S_IRUGO,
+ 0444,
binder_debugfs_dir_entry_root,
NULL,
&binder_state_fops);
debugfs_create_file("stats",
- S_IRUGO,
+ 0444,
binder_debugfs_dir_entry_root,
NULL,
&binder_stats_fops);
debugfs_create_file("transactions",
- S_IRUGO,
+ 0444,
binder_debugfs_dir_entry_root,
NULL,
&binder_transactions_fops);
debugfs_create_file("transaction_log",
- S_IRUGO,
+ 0444,
binder_debugfs_dir_entry_root,
&binder_transaction_log,
&binder_transaction_log_fops);
debugfs_create_file("failed_transaction_log",
- S_IRUGO,
+ 0444,
binder_debugfs_dir_entry_root,
&binder_transaction_log_failed,
&binder_transaction_log_fops);
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c
index 58e4658..54debce 100644
--- a/drivers/android/binder_alloc.c
+++ b/drivers/android/binder_alloc.c
@@ -186,12 +186,12 @@
}
static int binder_update_page_range(struct binder_alloc *alloc, int allocate,
- void *start, void *end,
- struct vm_area_struct *vma)
+ void *start, void *end)
{
void *page_addr;
unsigned long user_page_addr;
struct binder_lru_page *page;
+ struct vm_area_struct *vma = NULL;
struct mm_struct *mm = NULL;
bool need_mm = false;
@@ -215,11 +215,11 @@
}
}
- if (!vma && need_mm && mmget_not_zero(alloc->vma_vm_mm))
+ if (need_mm && mmget_not_zero(alloc->vma_vm_mm))
mm = alloc->vma_vm_mm;
if (mm) {
- down_write(&mm->mmap_sem);
+ down_read(&mm->mmap_sem);
vma = alloc->vma;
}
@@ -281,11 +281,14 @@
goto err_vm_insert_page_failed;
}
+ if (index + 1 > alloc->pages_high)
+ alloc->pages_high = index + 1;
+
trace_binder_alloc_page_end(alloc, index);
/* vm_insert_page does not seem to increment the refcount */
}
if (mm) {
- up_write(&mm->mmap_sem);
+ up_read(&mm->mmap_sem);
mmput(mm);
}
return 0;
@@ -318,7 +321,7 @@
}
err_no_vma:
if (mm) {
- up_write(&mm->mmap_sem);
+ up_read(&mm->mmap_sem);
mmput(mm);
}
return vma ? -ENOMEM : -ESRCH;
@@ -352,11 +355,12 @@
return vma;
}
-struct binder_buffer *binder_alloc_new_buf_locked(struct binder_alloc *alloc,
- size_t data_size,
- size_t offsets_size,
- size_t extra_buffers_size,
- int is_async)
+static struct binder_buffer *binder_alloc_new_buf_locked(
+ struct binder_alloc *alloc,
+ size_t data_size,
+ size_t offsets_size,
+ size_t extra_buffers_size,
+ int is_async)
{
struct rb_node *n = alloc->free_buffers.rb_node;
struct binder_buffer *buffer;
@@ -465,7 +469,7 @@
if (end_page_addr > has_page_addr)
end_page_addr = has_page_addr;
ret = binder_update_page_range(alloc, 1,
- (void *)PAGE_ALIGN((uintptr_t)buffer->data), end_page_addr, NULL);
+ (void *)PAGE_ALIGN((uintptr_t)buffer->data), end_page_addr);
if (ret)
return ERR_PTR(ret);
@@ -506,7 +510,7 @@
err_alloc_buf_struct_failed:
binder_update_page_range(alloc, 0,
(void *)PAGE_ALIGN((uintptr_t)buffer->data),
- end_page_addr, NULL);
+ end_page_addr);
return ERR_PTR(-ENOMEM);
}
@@ -590,8 +594,7 @@
alloc->pid, buffer->data,
prev->data, next ? next->data : NULL);
binder_update_page_range(alloc, 0, buffer_start_page(buffer),
- buffer_start_page(buffer) + PAGE_SIZE,
- NULL);
+ buffer_start_page(buffer) + PAGE_SIZE);
}
list_del(&buffer->entry);
kfree(buffer);
@@ -628,8 +631,7 @@
binder_update_page_range(alloc, 0,
(void *)PAGE_ALIGN((uintptr_t)buffer->data),
- (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK),
- NULL);
+ (void *)(((uintptr_t)buffer->data + buffer_size) & PAGE_MASK));
rb_erase(&buffer->rb_node, &alloc->allocated_buffers);
buffer->free = 1;
@@ -881,6 +883,7 @@
}
mutex_unlock(&alloc->mutex);
seq_printf(m, " pages: %d:%d:%d\n", active, lru, free);
+ seq_printf(m, " pages high watermark: %zu\n", alloc->pages_high);
}
/**
@@ -1010,7 +1013,7 @@
return ret;
}
-struct shrinker binder_shrinker = {
+static struct shrinker binder_shrinker = {
.count_objects = binder_shrink_count,
.scan_objects = binder_shrink_scan,
.seeks = DEFAULT_SEEKS,
@@ -1030,8 +1033,14 @@
INIT_LIST_HEAD(&alloc->buffers);
}
-void binder_alloc_shrinker_init(void)
+int binder_alloc_shrinker_init(void)
{
- list_lru_init(&binder_alloc_lru);
- register_shrinker(&binder_shrinker);
+ int ret = list_lru_init(&binder_alloc_lru);
+
+ if (ret == 0) {
+ ret = register_shrinker(&binder_shrinker);
+ if (ret)
+ list_lru_destroy(&binder_alloc_lru);
+ }
+ return ret;
}
diff --git a/drivers/android/binder_alloc.h b/drivers/android/binder_alloc.h
index 2dd33b6..9ef64e5 100644
--- a/drivers/android/binder_alloc.h
+++ b/drivers/android/binder_alloc.h
@@ -92,6 +92,7 @@
* @pages: array of binder_lru_page
* @buffer_size: size of address space specified via mmap
* @pid: pid for associated binder_proc (invariant after init)
+ * @pages_high: high watermark of offset in @pages
*
* Bookkeeping structure for per-proc address space management for binder
* buffers. It is normally initialized during binder_init() and binder_mmap()
@@ -112,6 +113,7 @@
size_t buffer_size;
uint32_t buffer_free;
int pid;
+ size_t pages_high;
};
#ifdef CONFIG_ANDROID_BINDER_IPC_SELFTEST
@@ -128,7 +130,7 @@
size_t extra_buffers_size,
int is_async);
extern void binder_alloc_init(struct binder_alloc *alloc);
-void binder_alloc_shrinker_init(void);
+extern int binder_alloc_shrinker_init(void);
extern void binder_alloc_vma_close(struct binder_alloc *alloc);
extern struct binder_buffer *
binder_alloc_prepare_to_free(struct binder_alloc *alloc,
diff --git a/drivers/android/binder_trace.h b/drivers/android/binder_trace.h
index 76e3b9c..b11dffc 100644
--- a/drivers/android/binder_trace.h
+++ b/drivers/android/binder_trace.h
@@ -85,6 +85,30 @@
DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_write_done);
DEFINE_BINDER_FUNCTION_RETURN_EVENT(binder_read_done);
+TRACE_EVENT(binder_set_priority,
+ TP_PROTO(int proc, int thread, unsigned int old_prio,
+ unsigned int desired_prio, unsigned int new_prio),
+ TP_ARGS(proc, thread, old_prio, new_prio, desired_prio),
+
+ TP_STRUCT__entry(
+ __field(int, proc)
+ __field(int, thread)
+ __field(unsigned int, old_prio)
+ __field(unsigned int, new_prio)
+ __field(unsigned int, desired_prio)
+ ),
+ TP_fast_assign(
+ __entry->proc = proc;
+ __entry->thread = thread;
+ __entry->old_prio = old_prio;
+ __entry->new_prio = new_prio;
+ __entry->desired_prio = desired_prio;
+ ),
+ TP_printk("proc=%d thread=%d old=%d => new=%d desired=%d",
+ __entry->proc, __entry->thread, __entry->old_prio,
+ __entry->new_prio, __entry->desired_prio)
+);
+
TRACE_EVENT(binder_wait_for_work,
TP_PROTO(bool proc_work, bool transaction_stack, bool thread_todo),
TP_ARGS(proc_work, transaction_stack, thread_todo),
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 41be9ff..d00e7c4 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -21,14 +21,48 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/sched/topology.h>
+#include <linux/sched/energy.h>
+#include <linux/cpuset.h>
+
+DEFINE_PER_CPU(unsigned long, freq_scale) = SCHED_CAPACITY_SCALE;
+DEFINE_PER_CPU(unsigned long, max_cpu_freq);
+DEFINE_PER_CPU(unsigned long, max_freq_scale) = SCHED_CAPACITY_SCALE;
+
+void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
+ unsigned long max_freq)
+{
+ unsigned long scale;
+ int i;
+
+ scale = (cur_freq << SCHED_CAPACITY_SHIFT) / max_freq;
+
+ for_each_cpu(i, cpus) {
+ per_cpu(freq_scale, i) = scale;
+ per_cpu(max_cpu_freq, i) = max_freq;
+ }
+}
+
+void arch_set_max_freq_scale(struct cpumask *cpus,
+ unsigned long policy_max_freq)
+{
+ unsigned long scale, max_freq;
+ int cpu = cpumask_first(cpus);
+
+ if (cpu > nr_cpu_ids)
+ return;
+
+ max_freq = per_cpu(max_cpu_freq, cpu);
+ if (!max_freq)
+ return;
+
+ scale = (policy_max_freq << SCHED_CAPACITY_SHIFT) / max_freq;
+
+ for_each_cpu(cpu, cpus)
+ per_cpu(max_freq_scale, cpu) = scale;
+}
static DEFINE_MUTEX(cpu_scale_mutex);
-static DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
-
-unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
-{
- return per_cpu(cpu_scale, cpu);
-}
+DEFINE_PER_CPU(unsigned long, cpu_scale) = SCHED_CAPACITY_SCALE;
void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity)
{
@@ -44,6 +78,9 @@
return sprintf(buf, "%lu\n", topology_get_cpu_scale(NULL, cpu->dev.id));
}
+static void update_topology_flags_workfn(struct work_struct *work);
+static DECLARE_WORK(update_topology_flags_work, update_topology_flags_workfn);
+
static ssize_t cpu_capacity_store(struct device *dev,
struct device_attribute *attr,
const char *buf,
@@ -54,6 +91,7 @@
int i;
unsigned long new_capacity;
ssize_t ret;
+ cpumask_var_t mask;
if (!count)
return 0;
@@ -65,10 +103,41 @@
return -EINVAL;
mutex_lock(&cpu_scale_mutex);
- for_each_cpu(i, &cpu_topology[this_cpu].core_sibling)
+
+ if (new_capacity < SCHED_CAPACITY_SCALE) {
+ int highest_score_cpu = 0;
+
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL)) {
+ mutex_unlock(&cpu_scale_mutex);
+ return -ENOMEM;
+ }
+
+ cpumask_andnot(mask, cpu_online_mask,
+ topology_core_cpumask(this_cpu));
+
+ for_each_cpu(i, mask) {
+ if (topology_get_cpu_scale(NULL, i) ==
+ SCHED_CAPACITY_SCALE) {
+ highest_score_cpu = 1;
+ break;
+ }
+ }
+
+ free_cpumask_var(mask);
+
+ if (!highest_score_cpu) {
+ mutex_unlock(&cpu_scale_mutex);
+ return -EINVAL;
+ }
+ }
+
+ for_each_cpu(i, topology_core_cpumask(this_cpu))
topology_set_cpu_scale(i, new_capacity);
mutex_unlock(&cpu_scale_mutex);
+ if (topology_detect_flags())
+ schedule_work(&update_topology_flags_work);
+
return count;
}
@@ -93,6 +162,185 @@
}
subsys_initcall(register_cpu_capacity_sysctl);
+enum asym_cpucap_type { no_asym, asym_thread, asym_core, asym_die };
+static enum asym_cpucap_type asym_cpucap = no_asym;
+enum share_cap_type { no_share_cap, share_cap_thread, share_cap_core, share_cap_die};
+static enum share_cap_type share_cap = no_share_cap;
+
+#ifdef CONFIG_CPU_FREQ
+int detect_share_cap_flag(void)
+{
+ int cpu;
+ enum share_cap_type share_cap_level = no_share_cap;
+ struct cpufreq_policy *policy;
+
+ for_each_possible_cpu(cpu) {
+ policy = cpufreq_cpu_get(cpu);
+
+ if (!policy)
+ return 0;
+
+ if (cpumask_equal(topology_sibling_cpumask(cpu),
+ policy->related_cpus)) {
+ share_cap_level = share_cap_thread;
+ continue;
+ }
+
+ if (cpumask_equal(topology_core_cpumask(cpu),
+ policy->related_cpus)) {
+ share_cap_level = share_cap_core;
+ continue;
+ }
+
+ if (cpumask_equal(cpu_cpu_mask(cpu),
+ policy->related_cpus)) {
+ share_cap_level = share_cap_die;
+ continue;
+ }
+ }
+
+ if (share_cap != share_cap_level) {
+ share_cap = share_cap_level;
+ return 1;
+ }
+
+ return 0;
+}
+#else
+int detect_share_cap_flag(void) { return 0; }
+#endif
+
+/*
+ * Walk cpu topology to determine sched_domain flags.
+ *
+ * SD_ASYM_CPUCAPACITY: Indicates the lowest level that spans all cpu
+ * capacities found in the system for all cpus, i.e. the flag is set
+ * at the same level for all systems. The current algorithm implements
+ * this by looking for higher capacities, which doesn't work for all
+ * conceivable topology, but don't complicate things until it is
+ * necessary.
+ */
+int topology_detect_flags(void)
+{
+ unsigned long max_capacity, capacity;
+ enum asym_cpucap_type asym_level = no_asym;
+ int cpu, die_cpu, core, thread, flags_changed = 0;
+
+ for_each_possible_cpu(cpu) {
+ max_capacity = 0;
+
+ if (asym_level >= asym_thread)
+ goto check_core;
+
+ for_each_cpu(thread, topology_sibling_cpumask(cpu)) {
+ capacity = topology_get_cpu_scale(NULL, thread);
+
+ if (capacity > max_capacity) {
+ if (max_capacity != 0)
+ asym_level = asym_thread;
+
+ max_capacity = capacity;
+ }
+ }
+
+check_core:
+ if (asym_level >= asym_core)
+ goto check_die;
+
+ for_each_cpu(core, topology_core_cpumask(cpu)) {
+ capacity = topology_get_cpu_scale(NULL, core);
+
+ if (capacity > max_capacity) {
+ if (max_capacity != 0)
+ asym_level = asym_core;
+
+ max_capacity = capacity;
+ }
+ }
+check_die:
+ for_each_possible_cpu(die_cpu) {
+ capacity = topology_get_cpu_scale(NULL, die_cpu);
+
+ if (capacity > max_capacity) {
+ if (max_capacity != 0) {
+ asym_level = asym_die;
+ goto done;
+ }
+ }
+ }
+ }
+
+done:
+ if (asym_cpucap != asym_level) {
+ asym_cpucap = asym_level;
+ flags_changed = 1;
+ pr_debug("topology flag change detected\n");
+ }
+
+ if (detect_share_cap_flag())
+ flags_changed = 1;
+
+ return flags_changed;
+}
+
+int topology_smt_flags(void)
+{
+ int flags = 0;
+
+ if (asym_cpucap == asym_thread)
+ flags |= SD_ASYM_CPUCAPACITY;
+
+ if (share_cap == share_cap_thread)
+ flags |= SD_SHARE_CAP_STATES;
+
+ return flags;
+}
+
+int topology_core_flags(void)
+{
+ int flags = 0;
+
+ if (asym_cpucap == asym_core)
+ flags |= SD_ASYM_CPUCAPACITY;
+
+ if (share_cap == share_cap_core)
+ flags |= SD_SHARE_CAP_STATES;
+
+ return flags;
+}
+
+int topology_cpu_flags(void)
+{
+ int flags = 0;
+
+ if (asym_cpucap == asym_die)
+ flags |= SD_ASYM_CPUCAPACITY;
+
+ if (share_cap == share_cap_die)
+ flags |= SD_SHARE_CAP_STATES;
+
+ return flags;
+}
+
+static int update_topology = 0;
+
+int topology_update_cpu_topology(void)
+{
+ return update_topology;
+}
+
+/*
+ * Updating the sched_domains can't be done directly from cpufreq callbacks
+ * due to locking, so queue the work for later.
+ */
+static void update_topology_flags_workfn(struct work_struct *work)
+{
+ update_topology = 1;
+ rebuild_sched_domains();
+ pr_debug("sched_domain hierarchy rebuilt, flags updated\n");
+ update_topology = 0;
+}
+
static u32 capacity_scale;
static u32 *raw_capacity;
@@ -115,13 +363,12 @@
pr_debug("cpu_capacity: capacity_scale=%u\n", capacity_scale);
mutex_lock(&cpu_scale_mutex);
for_each_possible_cpu(cpu) {
- pr_debug("cpu_capacity: cpu=%d raw_capacity=%u\n",
- cpu, raw_capacity[cpu]);
capacity = (raw_capacity[cpu] << SCHED_CAPACITY_SHIFT)
/ capacity_scale;
topology_set_cpu_scale(cpu, capacity);
- pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu\n",
- cpu, topology_get_cpu_scale(NULL, cpu));
+ pr_debug("cpu_capacity: CPU%d cpu_capacity=%lu raw_capacity=%u\n",
+ cpu, topology_get_cpu_scale(NULL, cpu),
+ raw_capacity[cpu]);
}
mutex_unlock(&cpu_scale_mutex);
}
@@ -198,6 +445,9 @@
if (cpumask_empty(cpus_to_visit)) {
topology_normalize_cpu_scale();
+ init_sched_energy_costs();
+ if (topology_detect_flags())
+ schedule_work(&update_topology_flags_work);
free_raw_capacity();
pr_debug("cpu_capacity: parsing done\n");
schedule_work(&parsing_done_work);
@@ -212,6 +462,8 @@
static int __init register_cpufreq_notifier(void)
{
+ int ret;
+
/*
* on ACPI-based systems we need to use the default cpu capacity
* until we have the necessary code to parse the cpu capacity, so
@@ -227,8 +479,13 @@
cpumask_copy(cpus_to_visit, cpu_possible_mask);
- return cpufreq_register_notifier(&init_cpu_capacity_notifier,
- CPUFREQ_POLICY_NOTIFIER);
+ ret = cpufreq_register_notifier(&init_cpu_capacity_notifier,
+ CPUFREQ_POLICY_NOTIFIER);
+
+ if (ret)
+ free_cpumask_var(cpus_to_visit);
+
+ return ret;
}
core_initcall(register_cpufreq_notifier);
@@ -236,6 +493,7 @@
{
cpufreq_unregister_notifier(&init_cpu_capacity_notifier,
CPUFREQ_POLICY_NOTIFIER);
+ free_cpumask_var(cpus_to_visit);
}
#else
diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 770b153..e24f6500 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -34,6 +34,7 @@
#include <linux/cpufreq.h>
#include <linux/cpuidle.h>
#include <linux/timer.h>
+#include <linux/wakeup_reason.h>
#include "../base.h"
#include "power.h"
@@ -1455,6 +1456,7 @@
pm_callback_t callback = NULL;
const char *info = NULL;
int error = 0;
+ char suspend_abort[MAX_SUSPEND_ABORT_LEN];
DECLARE_DPM_WATCHDOG_ON_STACK(wd);
TRACE_DEVICE(dev);
@@ -1475,6 +1477,9 @@
pm_wakeup_event(dev, 0);
if (pm_wakeup_pending()) {
+ pm_get_active_wakeup_sources(suspend_abort,
+ MAX_SUSPEND_ABORT_LEN);
+ log_suspend_abort_reason(suspend_abort);
async_error = -EBUSY;
goto Complete;
}
diff --git a/drivers/base/power/wakeup.c b/drivers/base/power/wakeup.c
index cdd6f25..b932d7f 100644
--- a/drivers/base/power/wakeup.c
+++ b/drivers/base/power/wakeup.c
@@ -15,6 +15,7 @@
#include <linux/seq_file.h>
#include <linux/debugfs.h>
#include <linux/pm_wakeirq.h>
+#include <linux/types.h>
#include <trace/events/power.h>
#include "power.h"
@@ -805,6 +806,37 @@
}
EXPORT_SYMBOL_GPL(pm_wakeup_dev_event);
+void pm_get_active_wakeup_sources(char *pending_wakeup_source, size_t max)
+{
+ struct wakeup_source *ws, *last_active_ws = NULL;
+ int len = 0;
+ bool active = false;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(ws, &wakeup_sources, entry) {
+ if (ws->active && len < max) {
+ if (!active)
+ len += scnprintf(pending_wakeup_source, max,
+ "Pending Wakeup Sources: ");
+ len += scnprintf(pending_wakeup_source + len, max - len,
+ "%s ", ws->name);
+ active = true;
+ } else if (!active &&
+ (!last_active_ws ||
+ ktime_to_ns(ws->last_time) >
+ ktime_to_ns(last_active_ws->last_time))) {
+ last_active_ws = ws;
+ }
+ }
+ if (!active && last_active_ws) {
+ scnprintf(pending_wakeup_source, max,
+ "Last active Wakeup Source: %s",
+ last_active_ws->name);
+ }
+ rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(pm_get_active_wakeup_sources);
+
void pm_print_active_wakeup_sources(void)
{
struct wakeup_source *ws;
@@ -1018,7 +1050,7 @@
active_time = 0;
}
- seq_printf(m, "%-12s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
+ seq_printf(m, "%-32s\t%lu\t\t%lu\t\t%lu\t\t%lu\t\t%lld\t\t%lld\t\t%lld\t\t%lld\t\t%lld\n",
ws->name, active_count, ws->event_count,
ws->wakeup_count, ws->expire_count,
ktime_to_ms(active_time), ktime_to_ms(total_time),
@@ -1039,7 +1071,7 @@
struct wakeup_source *ws;
int srcuidx;
- seq_puts(m, "name\t\tactive_count\tevent_count\twakeup_count\t"
+ seq_puts(m, "name\t\t\t\t\tactive_count\tevent_count\twakeup_count\t"
"expire_count\tactive_since\ttotal_time\tmax_time\t"
"last_change\tprevent_suspend_time\n");
diff --git a/drivers/base/syscore.c b/drivers/base/syscore.c
index 8d98a32..96c34a9 100644
--- a/drivers/base/syscore.c
+++ b/drivers/base/syscore.c
@@ -11,6 +11,7 @@
#include <linux/module.h>
#include <linux/suspend.h>
#include <trace/events/power.h>
+#include <linux/wakeup_reason.h>
static LIST_HEAD(syscore_ops_list);
static DEFINE_MUTEX(syscore_ops_lock);
@@ -75,6 +76,8 @@
return 0;
err_out:
+ log_suspend_abort_reason("System core suspend callback %pF failed",
+ ops->suspend);
pr_err("PM: System core suspend callback %pF failed.\n", ops->suspend);
list_for_each_entry_continue(ops, &syscore_ops_list, node)
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index ac3a31d..6352357 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -13,7 +13,7 @@
It has several use cases, for example: /tmp storage, use as swap
disks and maybe many more.
- See zram.txt for more information.
+ See Documentation/blockdev/zram.txt for more information.
config ZRAM_WRITEBACK
bool "Write back incompressible page to backing device"
@@ -25,4 +25,14 @@
For this feature, admin should set up backing device via
/sys/block/zramX/backing_dev.
- See zram.txt for more infomration.
+ See Documentation/blockdev/zram.txt for more information.
+
+config ZRAM_MEMORY_TRACKING
+ bool "Track zRam block status"
+ depends on ZRAM && DEBUG_FS
+ help
+ With this feature, admin can track the state of allocated blocks
+ of zRAM. Admin could see the information via
+ /sys/kernel/debug/zram/zramX/block_state.
+
+ See Documentation/blockdev/zram.txt for more information.
diff --git a/drivers/block/zram/zcomp.c b/drivers/block/zram/zcomp.c
index 5b8992b..cc66dae 100644
--- a/drivers/block/zram/zcomp.c
+++ b/drivers/block/zram/zcomp.c
@@ -32,6 +32,9 @@
#if IS_ENABLED(CONFIG_CRYPTO_842)
"842",
#endif
+#if IS_ENABLED(CONFIG_CRYPTO_ZSTD)
+ "zstd",
+#endif
NULL
};
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 1e2648e..06f2623 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -31,6 +31,7 @@
#include <linux/err.h>
#include <linux/idr.h>
#include <linux/sysfs.h>
+#include <linux/debugfs.h>
#include <linux/cpuhotplug.h>
#include "zram_drv.h"
@@ -44,14 +45,36 @@
/* Module params (documentation at end) */
static unsigned int num_devices = 1;
+/*
+ * Pages that compress to sizes equals or greater than this are stored
+ * uncompressed in memory.
+ */
+static size_t huge_class_size;
static void zram_free_page(struct zram *zram, size_t index);
+static void zram_slot_lock(struct zram *zram, u32 index)
+{
+ bit_spin_lock(ZRAM_LOCK, &zram->table[index].value);
+}
+
+static void zram_slot_unlock(struct zram *zram, u32 index)
+{
+ bit_spin_unlock(ZRAM_LOCK, &zram->table[index].value);
+}
+
static inline bool init_done(struct zram *zram)
{
return zram->disksize;
}
+static inline bool zram_allocated(struct zram *zram, u32 index)
+{
+
+ return (zram->table[index].value >> (ZRAM_FLAG_SHIFT + 1)) ||
+ zram->table[index].handle;
+}
+
static inline struct zram *dev_to_zram(struct device *dev)
{
return (struct zram *)dev_to_disk(dev)->private_data;
@@ -68,7 +91,7 @@
}
/* flag operations require table entry bit_spin_lock() being held */
-static int zram_test_flag(struct zram *zram, u32 index,
+static bool zram_test_flag(struct zram *zram, u32 index,
enum zram_pageflags flag)
{
return zram->table[index].value & BIT(flag);
@@ -122,14 +145,6 @@
}
#endif
-static void zram_revalidate_disk(struct zram *zram)
-{
- revalidate_disk(zram->disk);
- /* revalidate_disk reset the BDI_CAP_STABLE_WRITES so set again */
- zram->disk->queue->backing_dev_info->capabilities |=
- BDI_CAP_STABLE_WRITES;
-}
-
/*
* Check if request is within bounds and aligned on zram logical blocks.
*/
@@ -441,7 +456,7 @@
WARN_ON_ONCE(!was_set);
}
-void zram_page_end_io(struct bio *bio)
+static void zram_page_end_io(struct bio *bio)
{
struct page *page = bio->bi_io_vec[0].bv_page;
@@ -608,6 +623,114 @@
static void zram_wb_clear(struct zram *zram, u32 index) {}
#endif
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+
+static struct dentry *zram_debugfs_root;
+
+static void zram_debugfs_create(void)
+{
+ zram_debugfs_root = debugfs_create_dir("zram", NULL);
+}
+
+static void zram_debugfs_destroy(void)
+{
+ debugfs_remove_recursive(zram_debugfs_root);
+}
+
+static void zram_accessed(struct zram *zram, u32 index)
+{
+ zram->table[index].ac_time = ktime_get_boottime();
+}
+
+static void zram_reset_access(struct zram *zram, u32 index)
+{
+ zram->table[index].ac_time = 0;
+}
+
+static ssize_t read_block_state(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ char *kbuf;
+ ssize_t index, written = 0;
+ struct zram *zram = file->private_data;
+ unsigned long nr_pages = zram->disksize >> PAGE_SHIFT;
+ struct timespec64 ts;
+
+ kbuf = kvmalloc(count, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ down_read(&zram->init_lock);
+ if (!init_done(zram)) {
+ up_read(&zram->init_lock);
+ kvfree(kbuf);
+ return -EINVAL;
+ }
+
+ for (index = *ppos; index < nr_pages; index++) {
+ int copied;
+
+ zram_slot_lock(zram, index);
+ if (!zram_allocated(zram, index))
+ goto next;
+
+ ts = ktime_to_timespec64(zram->table[index].ac_time);
+ copied = snprintf(kbuf + written, count,
+ "%12zd %12lld.%06lu %c%c%c\n",
+ index, (s64)ts.tv_sec,
+ ts.tv_nsec / NSEC_PER_USEC,
+ zram_test_flag(zram, index, ZRAM_SAME) ? 's' : '.',
+ zram_test_flag(zram, index, ZRAM_WB) ? 'w' : '.',
+ zram_test_flag(zram, index, ZRAM_HUGE) ? 'h' : '.');
+
+ if (count < copied) {
+ zram_slot_unlock(zram, index);
+ break;
+ }
+ written += copied;
+ count -= copied;
+next:
+ zram_slot_unlock(zram, index);
+ *ppos += 1;
+ }
+
+ up_read(&zram->init_lock);
+ if (copy_to_user(buf, kbuf, written))
+ written = -EFAULT;
+ kvfree(kbuf);
+
+ return written;
+}
+
+static const struct file_operations proc_zram_block_state_op = {
+ .open = simple_open,
+ .read = read_block_state,
+ .llseek = default_llseek,
+};
+
+static void zram_debugfs_register(struct zram *zram)
+{
+ if (!zram_debugfs_root)
+ return;
+
+ zram->debugfs_dir = debugfs_create_dir(zram->disk->disk_name,
+ zram_debugfs_root);
+ debugfs_create_file("block_state", 0400, zram->debugfs_dir,
+ zram, &proc_zram_block_state_op);
+}
+
+static void zram_debugfs_unregister(struct zram *zram)
+{
+ debugfs_remove_recursive(zram->debugfs_dir);
+}
+#else
+static void zram_debugfs_create(void) {};
+static void zram_debugfs_destroy(void) {};
+static void zram_accessed(struct zram *zram, u32 index) {};
+static void zram_reset_access(struct zram *zram, u32 index) {};
+static void zram_debugfs_register(struct zram *zram) {};
+static void zram_debugfs_unregister(struct zram *zram) {};
+#endif
/*
* We switched to per-cpu streams and this attr is not needed anymore.
@@ -727,14 +850,15 @@
max_used = atomic_long_read(&zram->stats.max_used_pages);
ret = scnprintf(buf, PAGE_SIZE,
- "%8llu %8llu %8llu %8lu %8ld %8llu %8lu\n",
+ "%8llu %8llu %8llu %8lu %8ld %8llu %8lu %8llu\n",
orig_size << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.compr_data_size),
mem_used << PAGE_SHIFT,
zram->limit_pages << PAGE_SHIFT,
max_used << PAGE_SHIFT,
(u64)atomic64_read(&zram->stats.same_pages),
- pool_stats.pages_compacted);
+ pool_stats.pages_compacted,
+ (u64)atomic64_read(&zram->stats.huge_pages));
up_read(&zram->init_lock);
return ret;
@@ -761,16 +885,6 @@
static DEVICE_ATTR_RO(mm_stat);
static DEVICE_ATTR_RO(debug_stat);
-static void zram_slot_lock(struct zram *zram, u32 index)
-{
- bit_spin_lock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
-static void zram_slot_unlock(struct zram *zram, u32 index)
-{
- bit_spin_unlock(ZRAM_ACCESS, &zram->table[index].value);
-}
-
static void zram_meta_free(struct zram *zram, u64 disksize)
{
size_t num_pages = disksize >> PAGE_SHIFT;
@@ -799,6 +913,8 @@
return false;
}
+ if (!huge_class_size)
+ huge_class_size = zs_huge_class_size(zram->mem_pool);
return true;
}
@@ -811,6 +927,13 @@
{
unsigned long handle;
+ zram_reset_access(zram, index);
+
+ if (zram_test_flag(zram, index, ZRAM_HUGE)) {
+ zram_clear_flag(zram, index, ZRAM_HUGE);
+ atomic64_dec(&zram->stats.huge_pages);
+ }
+
if (zram_wb_enabled(zram) && zram_test_flag(zram, index, ZRAM_WB)) {
zram_wb_clear(zram, index);
atomic64_dec(&zram->stats.pages_stored);
@@ -978,7 +1101,7 @@
return ret;
}
- if (unlikely(comp_len > max_zpage_size)) {
+ if (unlikely(comp_len >= huge_class_size)) {
if (zram_wb_enabled(zram) && allow_wb) {
zcomp_stream_put(zram->comp);
ret = write_to_bdev(zram, bvec, index, bio, &element);
@@ -990,7 +1113,6 @@
allow_wb = false;
goto compress_again;
}
- comp_len = PAGE_SIZE;
}
/*
@@ -1052,6 +1174,11 @@
zram_slot_lock(zram, index);
zram_free_page(zram, index);
+ if (comp_len == PAGE_SIZE) {
+ zram_set_flag(zram, index, ZRAM_HUGE);
+ atomic64_inc(&zram->stats.huge_pages);
+ }
+
if (flags) {
zram_set_flag(zram, index, flags);
zram_set_element(zram, index, element);
@@ -1172,6 +1299,10 @@
generic_end_io_acct(q, rw_acct, &zram->disk->part0, start_time);
+ zram_slot_lock(zram, index);
+ zram_accessed(zram, index);
+ zram_slot_unlock(zram, index);
+
if (unlikely(ret < 0)) {
if (!is_write)
atomic64_inc(&zram->stats.failed_reads);
@@ -1378,7 +1509,8 @@
zram->comp = comp;
zram->disksize = disksize;
set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
- zram_revalidate_disk(zram);
+
+ revalidate_disk(zram->disk);
up_write(&zram->init_lock);
return len;
@@ -1425,7 +1557,7 @@
/* Make sure all the pending I/O are finished */
fsync_bdev(bdev);
zram_reset_device(zram);
- zram_revalidate_disk(zram);
+ revalidate_disk(zram->disk);
bdput(bdev);
mutex_lock(&bdev->bd_mutex);
@@ -1544,6 +1676,7 @@
/* zram devices sort of resembles non-rotational disks */
queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, zram->disk->queue);
+
/*
* To ensure that we always get PAGE_SIZE aligned
* and n*PAGE_SIZED sized I/O requests.
@@ -1568,6 +1701,8 @@
if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
+ zram->disk->queue->backing_dev_info->capabilities |=
+ BDI_CAP_STABLE_WRITES;
add_disk(zram->disk);
ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
@@ -1579,6 +1714,7 @@
}
strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
+ zram_debugfs_register(zram);
pr_info("Added device: %s\n", zram->disk->disk_name);
return device_id;
@@ -1612,6 +1748,7 @@
zram->claim = true;
mutex_unlock(&bdev->bd_mutex);
+ zram_debugfs_unregister(zram);
/*
* Remove sysfs first, so no one will perform a disksize
* store while we destroy the devices. This also helps during
@@ -1629,8 +1766,8 @@
pr_info("Removed device: %s\n", zram->disk->disk_name);
- blk_cleanup_queue(zram->disk->queue);
del_gendisk(zram->disk);
+ blk_cleanup_queue(zram->disk->queue);
put_disk(zram->disk);
kfree(zram);
return 0;
@@ -1714,6 +1851,7 @@
{
class_unregister(&zram_control_class);
idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
+ zram_debugfs_destroy();
idr_destroy(&zram_index_idr);
unregister_blkdev(zram_major, "zram");
cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
@@ -1735,6 +1873,7 @@
return ret;
}
+ zram_debugfs_create();
zram_major = register_blkdev(0, "zram");
if (zram_major <= 0) {
pr_err("Unable to get major number\n");
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 31762db..3a1cac4 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -21,22 +21,6 @@
#include "zcomp.h"
-/*-- Configurable parameters */
-
-/*
- * Pages that compress to size greater than this are stored
- * uncompressed in memory.
- */
-static const size_t max_zpage_size = PAGE_SIZE / 4 * 3;
-
-/*
- * NOTE: max_zpage_size must be less than or equal to:
- * ZS_MAX_ALLOC_SIZE. Otherwise, zs_malloc() would
- * always return failure.
- */
-
-/*-- End of configurable params */
-
#define SECTOR_SHIFT 9
#define SECTORS_PER_PAGE_SHIFT (PAGE_SHIFT - SECTOR_SHIFT)
#define SECTORS_PER_PAGE (1 << SECTORS_PER_PAGE_SHIFT)
@@ -60,10 +44,11 @@
/* Flags for zram pages (table[page_no].value) */
enum zram_pageflags {
- /* Page consists the same element */
- ZRAM_SAME = ZRAM_FLAG_SHIFT,
- ZRAM_ACCESS, /* page is now accessed */
+ /* zram slot is locked */
+ ZRAM_LOCK = ZRAM_FLAG_SHIFT,
+ ZRAM_SAME, /* Page consists the same element */
ZRAM_WB, /* page is stored on backing_device */
+ ZRAM_HUGE, /* Incompressible page */
__NR_ZRAM_PAGEFLAGS,
};
@@ -77,6 +62,9 @@
unsigned long element;
};
unsigned long value;
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+ ktime_t ac_time;
+#endif
};
struct zram_stats {
@@ -88,6 +76,7 @@
atomic64_t invalid_io; /* non-page-aligned I/O requests */
atomic64_t notify_free; /* no. of swap slot free notifications */
atomic64_t same_pages; /* no. of same element filled pages */
+ atomic64_t huge_pages; /* no. of huge pages */
atomic64_t pages_stored; /* no. of pages currently stored */
atomic_long_t max_used_pages; /* no. of maximum pages stored */
atomic64_t writestall; /* no. of write slow paths */
@@ -124,5 +113,8 @@
unsigned long nr_pages;
spinlock_t bitmap_lock;
#endif
+#ifdef CONFIG_ZRAM_MEMORY_TRACKING
+ struct dentry *debugfs_dir;
+#endif
};
#endif
diff --git a/drivers/char/ipmi/ipmi_poweroff.c b/drivers/char/ipmi/ipmi_poweroff.c
index 9f2e3be..676c910 100644
--- a/drivers/char/ipmi/ipmi_poweroff.c
+++ b/drivers/char/ipmi/ipmi_poweroff.c
@@ -66,7 +66,7 @@
/* Holds the old poweroff function so we can restore it on removal. */
static void (*old_poweroff_func)(void);
-static int set_param_ifnum(const char *val, struct kernel_param *kp)
+static int set_param_ifnum(const char *val, const struct kernel_param *kp)
{
int rv = param_set_int(val, kp);
if (rv)
diff --git a/drivers/char/ipmi/ipmi_si_intf.c b/drivers/char/ipmi/ipmi_si_intf.c
index c04aa11..9abc067 100644
--- a/drivers/char/ipmi/ipmi_si_intf.c
+++ b/drivers/char/ipmi/ipmi_si_intf.c
@@ -1345,7 +1345,7 @@
#define IPMI_MEM_ADDR_SPACE 1
static const char * const addr_space_to_str[] = { "i/o", "mem" };
-static int hotmod_handler(const char *val, struct kernel_param *kp);
+static int hotmod_handler(const char *val, const struct kernel_param *kp);
module_param_call(hotmod, hotmod_handler, NULL, NULL, 0200);
MODULE_PARM_DESC(hotmod, "Add and remove interfaces. See"
@@ -1811,7 +1811,7 @@
return info;
}
-static int hotmod_handler(const char *val, struct kernel_param *kp)
+static int hotmod_handler(const char *val, const struct kernel_param *kp)
{
char *str = kstrdup(val, GFP_KERNEL);
int rv;
diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
index d8addbc..b374515 100644
--- a/drivers/cpufreq/Kconfig
+++ b/drivers/cpufreq/Kconfig
@@ -37,6 +37,13 @@
If in doubt, say N.
+config CPU_FREQ_TIMES
+ bool "CPU frequency time-in-state statistics"
+ help
+ Export CPU time-in-state information through procfs.
+
+ If in doubt, say N.
+
choice
prompt "Default CPUFreq governor"
default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
diff --git a/drivers/cpufreq/Makefile b/drivers/cpufreq/Makefile
index 812f9e0..3ad8aeb 100644
--- a/drivers/cpufreq/Makefile
+++ b/drivers/cpufreq/Makefile
@@ -5,7 +5,10 @@
# CPUfreq stats
obj-$(CONFIG_CPU_FREQ_STAT) += cpufreq_stats.o
-# CPUfreq governors
+# CPUfreq times
+obj-$(CONFIG_CPU_FREQ_TIMES) += cpufreq_times.o
+
+# CPUfreq governors
obj-$(CONFIG_CPU_FREQ_GOV_PERFORMANCE) += cpufreq_performance.o
obj-$(CONFIG_CPU_FREQ_GOV_POWERSAVE) += cpufreq_powersave.o
obj-$(CONFIG_CPU_FREQ_GOV_USERSPACE) += cpufreq_userspace.o
diff --git a/drivers/cpufreq/arm_big_little.c b/drivers/cpufreq/arm_big_little.c
index 1750412..0c41ab3 100644
--- a/drivers/cpufreq/arm_big_little.c
+++ b/drivers/cpufreq/arm_big_little.c
@@ -213,6 +213,7 @@
{
u32 cpu = policy->cpu, cur_cluster, new_cluster, actual_cluster;
unsigned int freqs_new;
+ int ret;
cur_cluster = cpu_to_cluster(cpu);
new_cluster = actual_cluster = per_cpu(physical_cluster, cpu);
@@ -229,7 +230,14 @@
}
}
- return bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new);
+ ret = bL_cpufreq_set_rate(cpu, actual_cluster, new_cluster, freqs_new);
+
+ if (!ret) {
+ arch_set_freq_scale(policy->related_cpus, freqs_new,
+ policy->cpuinfo.max_freq);
+ }
+
+ return ret;
}
static inline u32 get_table_count(struct cpufreq_frequency_table *table)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index d83ab94..545946a 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -43,9 +43,17 @@
static int set_target(struct cpufreq_policy *policy, unsigned int index)
{
struct private_data *priv = policy->driver_data;
+ unsigned long freq = policy->freq_table[index].frequency;
+ int ret;
- return dev_pm_opp_set_rate(priv->cpu_dev,
- policy->freq_table[index].frequency * 1000);
+ ret = dev_pm_opp_set_rate(priv->cpu_dev, freq * 1000);
+
+ if (!ret) {
+ arch_set_freq_scale(policy->related_cpus, freq,
+ policy->cpuinfo.max_freq);
+ }
+
+ return ret;
}
/*
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 9375430..9273994 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -19,6 +19,7 @@
#include <linux/cpu.h>
#include <linux/cpufreq.h>
+#include <linux/cpufreq_times.h>
#include <linux/delay.h>
#include <linux/device.h>
#include <linux/init.h>
@@ -339,6 +340,7 @@
(unsigned long)freqs->new, (unsigned long)freqs->cpu);
trace_cpu_frequency(freqs->new, freqs->cpu);
cpufreq_stats_record_transition(policy, freqs->new);
+ cpufreq_times_record_transition(freqs);
srcu_notifier_call_chain(&cpufreq_transition_notifier_list,
CPUFREQ_POSTCHANGE, freqs);
if (likely(policy) && likely(policy->cpu == freqs->cpu))
@@ -1289,6 +1291,7 @@
goto out_exit_policy;
cpufreq_stats_create_table(policy);
+ cpufreq_times_create_policy(policy);
write_lock_irqsave(&cpufreq_driver_lock, flags);
list_add(&policy->policy_list, &cpufreq_policy_list);
@@ -2235,6 +2238,10 @@
policy->min = new_policy->min;
policy->max = new_policy->max;
+ arch_set_max_freq_scale(policy->cpus, policy->max);
+
+ trace_cpu_frequency_limits(policy->max, policy->min, policy->cpu);
+
policy->cached_target_freq = UINT_MAX;
pr_debug("new min and max freqs are %u - %u kHz\n",
@@ -2433,6 +2440,23 @@
EXPORT_SYMBOL_GPL(cpufreq_boost_enabled);
/*********************************************************************
+ * FREQUENCY INVARIANT ACCOUNTING SUPPORT *
+ *********************************************************************/
+
+__weak void arch_set_freq_scale(struct cpumask *cpus,
+ unsigned long cur_freq,
+ unsigned long max_freq)
+{
+}
+EXPORT_SYMBOL_GPL(arch_set_freq_scale);
+
+__weak void arch_set_max_freq_scale(struct cpumask *cpus,
+ unsigned long policy_max_freq)
+{
+}
+EXPORT_SYMBOL_GPL(arch_set_max_freq_scale);
+
+/*********************************************************************
* REGISTER / UNREGISTER CPUFREQ DRIVER *
*********************************************************************/
static enum cpuhp_state hp_online;
diff --git a/drivers/cpufreq/cpufreq_times.c b/drivers/cpufreq/cpufreq_times.c
new file mode 100644
index 0000000..a43eeee
--- /dev/null
+++ b/drivers/cpufreq/cpufreq_times.c
@@ -0,0 +1,464 @@
+/* drivers/cpufreq/cpufreq_times.c
+ *
+ * Copyright (C) 2018 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/cpufreq.h>
+#include <linux/cpufreq_times.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/jiffies.h>
+#include <linux/proc_fs.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+#include <linux/threads.h>
+
+#define UID_HASH_BITS 10
+
+static DECLARE_HASHTABLE(uid_hash_table, UID_HASH_BITS);
+
+static DEFINE_SPINLOCK(task_time_in_state_lock); /* task->time_in_state */
+static DEFINE_SPINLOCK(uid_lock); /* uid_hash_table */
+
+struct uid_entry {
+ uid_t uid;
+ unsigned int max_state;
+ struct hlist_node hash;
+ struct rcu_head rcu;
+ u64 time_in_state[0];
+};
+
+/**
+ * struct cpu_freqs - per-cpu frequency information
+ * @offset: start of these freqs' stats in task time_in_state array
+ * @max_state: number of entries in freq_table
+ * @last_index: index in freq_table of last frequency switched to
+ * @freq_table: list of available frequencies
+ */
+struct cpu_freqs {
+ unsigned int offset;
+ unsigned int max_state;
+ unsigned int last_index;
+ unsigned int freq_table[0];
+};
+
+static struct cpu_freqs *all_freqs[NR_CPUS];
+
+static unsigned int next_offset;
+
+
+/* Caller must hold rcu_read_lock() */
+static struct uid_entry *find_uid_entry_rcu(uid_t uid)
+{
+ struct uid_entry *uid_entry;
+
+ hash_for_each_possible_rcu(uid_hash_table, uid_entry, hash, uid) {
+ if (uid_entry->uid == uid)
+ return uid_entry;
+ }
+ return NULL;
+}
+
+/* Caller must hold uid lock */
+static struct uid_entry *find_uid_entry_locked(uid_t uid)
+{
+ struct uid_entry *uid_entry;
+
+ hash_for_each_possible(uid_hash_table, uid_entry, hash, uid) {
+ if (uid_entry->uid == uid)
+ return uid_entry;
+ }
+ return NULL;
+}
+
+/* Caller must hold uid lock */
+static struct uid_entry *find_or_register_uid_locked(uid_t uid)
+{
+ struct uid_entry *uid_entry, *temp;
+ unsigned int max_state = READ_ONCE(next_offset);
+ size_t alloc_size = sizeof(*uid_entry) + max_state *
+ sizeof(uid_entry->time_in_state[0]);
+
+ uid_entry = find_uid_entry_locked(uid);
+ if (uid_entry) {
+ if (uid_entry->max_state == max_state)
+ return uid_entry;
+ /* uid_entry->time_in_state is too small to track all freqs, so
+ * expand it.
+ */
+ temp = __krealloc(uid_entry, alloc_size, GFP_ATOMIC);
+ if (!temp)
+ return uid_entry;
+ temp->max_state = max_state;
+ memset(temp->time_in_state + uid_entry->max_state, 0,
+ (max_state - uid_entry->max_state) *
+ sizeof(uid_entry->time_in_state[0]));
+ if (temp != uid_entry) {
+ hlist_replace_rcu(&uid_entry->hash, &temp->hash);
+ kfree_rcu(uid_entry, rcu);
+ }
+ return temp;
+ }
+
+ uid_entry = kzalloc(alloc_size, GFP_ATOMIC);
+ if (!uid_entry)
+ return NULL;
+
+ uid_entry->uid = uid;
+ uid_entry->max_state = max_state;
+
+ hash_add_rcu(uid_hash_table, &uid_entry->hash, uid);
+
+ return uid_entry;
+}
+
+static bool freq_index_invalid(unsigned int index)
+{
+ unsigned int cpu;
+ struct cpu_freqs *freqs;
+
+ for_each_possible_cpu(cpu) {
+ freqs = all_freqs[cpu];
+ if (!freqs || index < freqs->offset ||
+ freqs->offset + freqs->max_state <= index)
+ continue;
+ return freqs->freq_table[index - freqs->offset] ==
+ CPUFREQ_ENTRY_INVALID;
+ }
+ return true;
+}
+
+static int single_uid_time_in_state_show(struct seq_file *m, void *ptr)
+{
+ struct uid_entry *uid_entry;
+ unsigned int i;
+ u64 time;
+ uid_t uid = from_kuid_munged(current_user_ns(), *(kuid_t *)m->private);
+
+ if (uid == overflowuid)
+ return -EINVAL;
+
+ rcu_read_lock();
+
+ uid_entry = find_uid_entry_rcu(uid);
+ if (!uid_entry) {
+ rcu_read_unlock();
+ return 0;
+ }
+
+ for (i = 0; i < uid_entry->max_state; ++i) {
+ if (freq_index_invalid(i))
+ continue;
+ time = nsec_to_clock_t(uid_entry->time_in_state[i]);
+ seq_write(m, &time, sizeof(time));
+ }
+
+ rcu_read_unlock();
+
+ return 0;
+}
+
+static void *uid_seq_start(struct seq_file *seq, loff_t *pos)
+{
+ if (*pos >= HASH_SIZE(uid_hash_table))
+ return NULL;
+
+ return &uid_hash_table[*pos];
+}
+
+static void *uid_seq_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ (*pos)++;
+
+ if (*pos >= HASH_SIZE(uid_hash_table))
+ return NULL;
+
+ return &uid_hash_table[*pos];
+}
+
+static void uid_seq_stop(struct seq_file *seq, void *v) { }
+
+static int uid_time_in_state_seq_show(struct seq_file *m, void *v)
+{
+ struct uid_entry *uid_entry;
+ struct cpu_freqs *freqs, *last_freqs = NULL;
+ int i, cpu;
+
+ if (v == uid_hash_table) {
+ seq_puts(m, "uid:");
+ for_each_possible_cpu(cpu) {
+ freqs = all_freqs[cpu];
+ if (!freqs || freqs == last_freqs)
+ continue;
+ last_freqs = freqs;
+ for (i = 0; i < freqs->max_state; i++) {
+ if (freqs->freq_table[i] ==
+ CPUFREQ_ENTRY_INVALID)
+ continue;
+ seq_printf(m, " %d", freqs->freq_table[i]);
+ }
+ }
+ seq_putc(m, '\n');
+ }
+
+ rcu_read_lock();
+
+ hlist_for_each_entry_rcu(uid_entry, (struct hlist_head *)v, hash) {
+ if (uid_entry->max_state)
+ seq_printf(m, "%d:", uid_entry->uid);
+ for (i = 0; i < uid_entry->max_state; ++i) {
+ if (freq_index_invalid(i))
+ continue;
+ seq_printf(m, " %lu", (unsigned long)nsec_to_clock_t(
+ uid_entry->time_in_state[i]));
+ }
+ if (uid_entry->max_state)
+ seq_putc(m, '\n');
+ }
+
+ rcu_read_unlock();
+ return 0;
+}
+
+void cpufreq_task_times_init(struct task_struct *p)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&task_time_in_state_lock, flags);
+ p->time_in_state = NULL;
+ spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+ p->max_state = 0;
+}
+
+void cpufreq_task_times_alloc(struct task_struct *p)
+{
+ void *temp;
+ unsigned long flags;
+ unsigned int max_state = READ_ONCE(next_offset);
+
+ /* We use one array to avoid multiple allocs per task */
+ temp = kcalloc(max_state, sizeof(p->time_in_state[0]), GFP_ATOMIC);
+ if (!temp)
+ return;
+
+ spin_lock_irqsave(&task_time_in_state_lock, flags);
+ p->time_in_state = temp;
+ spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+ p->max_state = max_state;
+}
+
+/* Caller must hold task_time_in_state_lock */
+static int cpufreq_task_times_realloc_locked(struct task_struct *p)
+{
+ void *temp;
+ unsigned int max_state = READ_ONCE(next_offset);
+
+ temp = krealloc(p->time_in_state, max_state * sizeof(u64), GFP_ATOMIC);
+ if (!temp)
+ return -ENOMEM;
+ p->time_in_state = temp;
+ memset(p->time_in_state + p->max_state, 0,
+ (max_state - p->max_state) * sizeof(u64));
+ p->max_state = max_state;
+ return 0;
+}
+
+void cpufreq_task_times_exit(struct task_struct *p)
+{
+ unsigned long flags;
+ void *temp;
+
+ if (!p->time_in_state)
+ return;
+
+ spin_lock_irqsave(&task_time_in_state_lock, flags);
+ temp = p->time_in_state;
+ p->time_in_state = NULL;
+ spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+ kfree(temp);
+}
+
+int proc_time_in_state_show(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *p)
+{
+ unsigned int cpu, i;
+ u64 cputime;
+ unsigned long flags;
+ struct cpu_freqs *freqs;
+ struct cpu_freqs *last_freqs = NULL;
+
+ spin_lock_irqsave(&task_time_in_state_lock, flags);
+ for_each_possible_cpu(cpu) {
+ freqs = all_freqs[cpu];
+ if (!freqs || freqs == last_freqs)
+ continue;
+ last_freqs = freqs;
+
+ seq_printf(m, "cpu%u\n", cpu);
+ for (i = 0; i < freqs->max_state; i++) {
+ if (freqs->freq_table[i] == CPUFREQ_ENTRY_INVALID)
+ continue;
+ cputime = 0;
+ if (freqs->offset + i < p->max_state &&
+ p->time_in_state)
+ cputime = p->time_in_state[freqs->offset + i];
+ seq_printf(m, "%u %lu\n", freqs->freq_table[i],
+ (unsigned long)nsec_to_clock_t(cputime));
+ }
+ }
+ spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+ return 0;
+}
+
+void cpufreq_acct_update_power(struct task_struct *p, u64 cputime)
+{
+ unsigned long flags;
+ unsigned int state;
+ struct uid_entry *uid_entry;
+ struct cpu_freqs *freqs = all_freqs[task_cpu(p)];
+ uid_t uid = from_kuid_munged(current_user_ns(), task_uid(p));
+
+ if (!freqs || p->flags & PF_EXITING)
+ return;
+
+ state = freqs->offset + READ_ONCE(freqs->last_index);
+
+ spin_lock_irqsave(&task_time_in_state_lock, flags);
+ if ((state < p->max_state || !cpufreq_task_times_realloc_locked(p)) &&
+ p->time_in_state)
+ p->time_in_state[state] += cputime;
+ spin_unlock_irqrestore(&task_time_in_state_lock, flags);
+
+ spin_lock_irqsave(&uid_lock, flags);
+ uid_entry = find_or_register_uid_locked(uid);
+ if (uid_entry && state < uid_entry->max_state)
+ uid_entry->time_in_state[state] += cputime;
+ spin_unlock_irqrestore(&uid_lock, flags);
+}
+
+void cpufreq_times_create_policy(struct cpufreq_policy *policy)
+{
+ int cpu, index;
+ unsigned int count = 0;
+ struct cpufreq_frequency_table *pos, *table;
+ struct cpu_freqs *freqs;
+ void *tmp;
+
+ if (all_freqs[policy->cpu])
+ return;
+
+ table = policy->freq_table;
+ if (!table)
+ return;
+
+ cpufreq_for_each_entry(pos, table)
+ count++;
+
+ tmp = kzalloc(sizeof(*freqs) + sizeof(freqs->freq_table[0]) * count,
+ GFP_KERNEL);
+ if (!tmp)
+ return;
+
+ freqs = tmp;
+ freqs->max_state = count;
+
+ index = cpufreq_frequency_table_get_index(policy, policy->cur);
+ if (index >= 0)
+ WRITE_ONCE(freqs->last_index, index);
+
+ cpufreq_for_each_entry(pos, table)
+ freqs->freq_table[pos - table] = pos->frequency;
+
+ freqs->offset = next_offset;
+ WRITE_ONCE(next_offset, freqs->offset + count);
+ for_each_cpu(cpu, policy->related_cpus)
+ all_freqs[cpu] = freqs;
+}
+
+void cpufreq_task_times_remove_uids(uid_t uid_start, uid_t uid_end)
+{
+ struct uid_entry *uid_entry;
+ struct hlist_node *tmp;
+ unsigned long flags;
+
+ spin_lock_irqsave(&uid_lock, flags);
+
+ for (; uid_start <= uid_end; uid_start++) {
+ hash_for_each_possible_safe(uid_hash_table, uid_entry, tmp,
+ hash, uid_start) {
+ if (uid_start == uid_entry->uid) {
+ hash_del_rcu(&uid_entry->hash);
+ kfree_rcu(uid_entry, rcu);
+ }
+ }
+ }
+
+ spin_unlock_irqrestore(&uid_lock, flags);
+}
+
+void cpufreq_times_record_transition(struct cpufreq_freqs *freq)
+{
+ int index;
+ struct cpu_freqs *freqs = all_freqs[freq->cpu];
+ struct cpufreq_policy *policy;
+
+ if (!freqs)
+ return;
+
+ policy = cpufreq_cpu_get(freq->cpu);
+ if (!policy)
+ return;
+
+ index = cpufreq_frequency_table_get_index(policy, freq->new);
+ if (index >= 0)
+ WRITE_ONCE(freqs->last_index, index);
+
+ cpufreq_cpu_put(policy);
+}
+
+static const struct seq_operations uid_time_in_state_seq_ops = {
+ .start = uid_seq_start,
+ .next = uid_seq_next,
+ .stop = uid_seq_stop,
+ .show = uid_time_in_state_seq_show,
+};
+
+static int uid_time_in_state_open(struct inode *inode, struct file *file)
+{
+ return seq_open(file, &uid_time_in_state_seq_ops);
+}
+
+int single_uid_time_in_state_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, single_uid_time_in_state_show,
+ &(inode->i_uid));
+}
+
+static const struct file_operations uid_time_in_state_fops = {
+ .open = uid_time_in_state_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release,
+};
+
+static int __init cpufreq_times_init(void)
+{
+ proc_create_data("uid_time_in_state", 0444, NULL,
+ &uid_time_in_state_fops, NULL);
+
+ return 0;
+}
+
+early_initcall(cpufreq_times_init);
diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index ed4df58..7c33193 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -212,7 +212,7 @@
}
/* Take note of the planned idle state. */
- sched_idle_set_state(target_state);
+ sched_idle_set_state(target_state, index);
trace_cpu_idle_rcuidle(index, dev->cpu);
time_start = ns_to_ktime(local_clock());
@@ -226,7 +226,7 @@
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, dev->cpu);
/* The cpu is no longer idle or about to enter idle. */
- sched_idle_set_state(NULL);
+ sched_idle_set_state(NULL, -1);
if (broadcast) {
if (WARN_ON_ONCE(!irqs_disabled()))
@@ -263,12 +263,18 @@
*
* @drv: the cpuidle driver
* @dev: the cpuidle device
+ * @stop_tick: indication on whether or not to stop the tick
*
* Returns the index of the idle state. The return value must not be negative.
+ *
+ * The memory location pointed to by @stop_tick is expected to be written the
+ * 'false' boolean value if the scheduler tick should not be stopped before
+ * entering the returned state.
*/
-int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+ bool *stop_tick)
{
- return cpuidle_curr_governor->select(drv, dev);
+ return cpuidle_curr_governor->select(drv, dev, stop_tick);
}
/**
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index ce1a2ff..0213e07 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -62,9 +62,10 @@
* ladder_select_state - selects the next state to enter
* @drv: cpuidle driver
* @dev: the CPU
+ * @dummy: not used
*/
static int ladder_select_state(struct cpuidle_driver *drv,
- struct cpuidle_device *dev)
+ struct cpuidle_device *dev, bool *dummy)
{
struct ladder_device *ldev = this_cpu_ptr(&ladder_devices);
struct ladder_device_state *last_state;
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index 48eaf28..58c103b 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -123,6 +123,7 @@
struct menu_device {
int last_state_idx;
int needs_update;
+ int tick_wakeup;
unsigned int next_timer_us;
unsigned int predicted_us;
@@ -180,7 +181,12 @@
/* for higher loadavg, we are more reluctant */
- mult += 2 * get_loadavg(load);
+ /*
+ * this doesn't work as intended - it is almost always 0, but can
+ * sometimes, depending on workload, spike very high into the hundreds
+ * even when the average cpu load is under 10%.
+ */
+ /* mult += 2 * get_loadavg(); */
/* for IO wait tasks (per cpu!) we add 5x each */
mult += 10 * nr_iowaiters;
@@ -279,8 +285,10 @@
* menu_select - selects the next idle state to enter
* @drv: cpuidle driver containing state data
* @dev: the CPU
+ * @stop_tick: indication on whether or not to stop the tick
*/
-static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
+static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
+ bool *stop_tick)
{
struct menu_device *data = this_cpu_ptr(&menu_devices);
struct device *device = get_cpu_device(dev->cpu);
@@ -292,6 +300,7 @@
unsigned int expected_interval;
unsigned long nr_iowaiters, cpu_load;
int resume_latency = dev_pm_qos_raw_read_value(device);
+ ktime_t delta_next;
if (data->needs_update) {
menu_update(drv, dev);
@@ -303,11 +312,13 @@
latency_req = resume_latency;
/* Special case when user has set very strict latency requirement */
- if (unlikely(latency_req == 0))
+ if (unlikely(latency_req == 0)) {
+ *stop_tick = false;
return 0;
+ }
/* determine the expected residency time, round up */
- data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length());
+ data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length(&delta_next));
get_iowait_load(&nr_iowaiters, &cpu_load);
data->bucket = which_bucket(data->next_timer_us, nr_iowaiters);
@@ -346,14 +357,30 @@
*/
data->predicted_us = min(data->predicted_us, expected_interval);
- /*
- * Use the performance multiplier and the user-configurable
- * latency_req to determine the maximum exit latency.
- */
- interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
- if (latency_req > interactivity_req)
- latency_req = interactivity_req;
+ if (tick_nohz_tick_stopped()) {
+ /*
+ * If the tick is already stopped, the cost of possible short
+ * idle duration misprediction is much higher, because the CPU
+ * may be stuck in a shallow idle state for a long time as a
+ * result of it. In that case say we might mispredict and try
+ * to force the CPU into a state for which we would have stopped
+ * the tick, unless a timer is going to expire really soon
+ * anyway.
+ */
+ if (data->predicted_us < TICK_USEC)
+ data->predicted_us = min_t(unsigned int, TICK_USEC,
+ ktime_to_us(delta_next));
+ } else {
+ /*
+ * Use the performance multiplier and the user-configurable
+ * latency_req to determine the maximum exit latency.
+ */
+ interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
+ if (latency_req > interactivity_req)
+ latency_req = interactivity_req;
+ }
+ expected_interval = data->predicted_us;
/*
* Find the idle state with the lowest power while satisfying
* our constraints.
@@ -369,15 +396,52 @@
idx = i; /* first enabled state */
if (s->target_residency > data->predicted_us)
break;
- if (s->exit_latency > latency_req)
+ if (s->exit_latency > latency_req) {
+ /*
+ * If we break out of the loop for latency reasons, use
+ * the target residency of the selected state as the
+ * expected idle duration so that the tick is retained
+ * as long as that target residency is low enough.
+ */
+ expected_interval = drv->states[idx].target_residency;
break;
-
+ }
idx = i;
}
if (idx == -1)
idx = 0; /* No states enabled. Must use 0. */
+ /*
+ * Don't stop the tick if the selected state is a polling one or if the
+ * expected idle duration is shorter than the tick period length.
+ */
+ if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
+ expected_interval < TICK_USEC) {
+ unsigned int delta_next_us = ktime_to_us(delta_next);
+
+ *stop_tick = false;
+
+ if (!tick_nohz_tick_stopped() && idx > 0 &&
+ drv->states[idx].target_residency > delta_next_us) {
+ /*
+ * The tick is not going to be stopped and the target
+ * residency of the state to be returned is not within
+ * the time until the next timer event including the
+ * tick, so try to correct that.
+ */
+ for (i = idx - 1; i >= 0; i--) {
+ if (drv->states[i].disabled ||
+ dev->states_usage[i].disable)
+ continue;
+
+ idx = i;
+ if (drv->states[i].target_residency <= delta_next_us)
+ break;
+ }
+ }
+ }
+
data->last_state_idx = idx;
return data->last_state_idx;
@@ -397,6 +461,7 @@
data->last_state_idx = index;
data->needs_update = 1;
+ data->tick_wakeup = tick_nohz_idle_got_tick();
}
/**
@@ -427,14 +492,27 @@
* assume the state was never reached and the exit latency is 0.
*/
- /* measured value */
- measured_us = cpuidle_get_last_residency(dev);
+ if (data->tick_wakeup && data->next_timer_us > TICK_USEC) {
+ /*
+ * The nohz code said that there wouldn't be any events within
+ * the tick boundary (if the tick was stopped), but the idle
+ * duration predictor had a differing opinion. Since the CPU
+ * was woken up by a tick (that wasn't stopped after all), the
+ * predictor was not quite right, so assume that the CPU could
+ * have been idle long (but not forever) to help the idle
+ * duration predictor do a better job next time.
+ */
+ measured_us = 9 * MAX_INTERESTING / 10;
+ } else {
+ /* measured value */
+ measured_us = cpuidle_get_last_residency(dev);
- /* Deduct exit latency */
- if (measured_us > 2 * target->exit_latency)
- measured_us -= target->exit_latency;
- else
- measured_us /= 2;
+ /* Deduct exit latency */
+ if (measured_us > 2 * target->exit_latency)
+ measured_us -= target->exit_latency;
+ else
+ measured_us /= 2;
+ }
/* Make sure our coefficients do not exceed unity */
if (measured_us > data->next_timer_us)
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 9a30279..33ea68b 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -329,8 +329,12 @@
spin_lock_irqsave(fence->lock, flags);
ret = !list_empty(&cb->node);
- if (ret)
+ if (ret) {
list_del_init(&cb->node);
+ if (list_empty(&fence->cb_list))
+ if (fence->ops->disable_signaling)
+ fence->ops->disable_signaling(fence);
+ }
spin_unlock_irqrestore(fence->lock, flags);
diff --git a/drivers/dma-buf/sw_sync.c b/drivers/dma-buf/sw_sync.c
index 24f83f9..ebc1196 100644
--- a/drivers/dma-buf/sw_sync.c
+++ b/drivers/dma-buf/sw_sync.c
@@ -169,6 +169,13 @@
return true;
}
+static void timeline_fence_disable_signaling(struct dma_fence *fence)
+{
+ struct sync_pt *pt = dma_fence_to_sync_pt(fence);
+
+ list_del_init(&pt->link);
+}
+
static void timeline_fence_value_str(struct dma_fence *fence,
char *str, int size)
{
@@ -187,6 +194,7 @@
.get_driver_name = timeline_fence_get_driver_name,
.get_timeline_name = timeline_fence_get_timeline_name,
.enable_signaling = timeline_fence_enable_signaling,
+ .disable_signaling = timeline_fence_disable_signaling,
.signaled = timeline_fence_signaled,
.wait = dma_fence_default_wait,
.release = timeline_fence_release,
diff --git a/drivers/edac/edac_mc_sysfs.c b/drivers/edac/edac_mc_sysfs.c
index e4fcfa8..c70ea82 100644
--- a/drivers/edac/edac_mc_sysfs.c
+++ b/drivers/edac/edac_mc_sysfs.c
@@ -50,7 +50,7 @@
return edac_mc_poll_msec;
}
-static int edac_set_poll_msec(const char *val, struct kernel_param *kp)
+static int edac_set_poll_msec(const char *val, const struct kernel_param *kp)
{
unsigned long l;
int ret;
diff --git a/drivers/edac/edac_module.c b/drivers/edac/edac_module.c
index 172598a..32a931d 100644
--- a/drivers/edac/edac_module.c
+++ b/drivers/edac/edac_module.c
@@ -19,7 +19,8 @@
#ifdef CONFIG_EDAC_DEBUG
-static int edac_set_debug_level(const char *buf, struct kernel_param *kp)
+static int edac_set_debug_level(const char *buf,
+ const struct kernel_param *kp)
{
unsigned long val;
int ret;
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index adaa4a9..69b3fbf 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -20,7 +20,8 @@
KBUILD_CFLAGS := $(cflags-y) -DDISABLE_BRANCH_PROFILING \
-D__NO_FORTIFY \
$(call cc-option,-ffreestanding) \
- $(call cc-option,-fno-stack-protector)
+ $(call cc-option,-fno-stack-protector) \
+ $(DISABLE_LTO)
GCOV_PROFILE := n
KASAN_SANITIZE := n
diff --git a/drivers/hid/hid-magicmouse.c b/drivers/hid/hid-magicmouse.c
index 20b40ad..42ed887 100644
--- a/drivers/hid/hid-magicmouse.c
+++ b/drivers/hid/hid-magicmouse.c
@@ -34,7 +34,8 @@
MODULE_PARM_DESC(emulate_scroll_wheel, "Emulate a scroll wheel");
static unsigned int scroll_speed = 32;
-static int param_set_scroll_speed(const char *val, struct kernel_param *kp) {
+static int param_set_scroll_speed(const char *val,
+ const struct kernel_param *kp) {
unsigned long speed;
if (!val || kstrtoul(val, 0, &speed) || speed > 63)
return -EINVAL;
diff --git a/drivers/ide/ide.c b/drivers/ide/ide.c
index d127ace..6ee866f 100644
--- a/drivers/ide/ide.c
+++ b/drivers/ide/ide.c
@@ -244,7 +244,7 @@
static unsigned int ide_disks;
static struct chs_geom ide_disks_chs[MAX_HWIFS * MAX_DRIVES];
-static int ide_set_disk_chs(const char *str, struct kernel_param *kp)
+static int ide_set_disk_chs(const char *str, const struct kernel_param *kp)
{
unsigned int a, b, c = 0, h = 0, s = 0, i, j = 1;
@@ -328,7 +328,7 @@
static unsigned int ide_ignore_cable;
-static int ide_set_ignore_cable(const char *s, struct kernel_param *kp)
+static int ide_set_ignore_cable(const char *s, const struct kernel_param *kp)
{
int i, j = 1;
diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c b/drivers/infiniband/hw/qib/qib_iba7322.c
index 14cadf6..a45e460 100644
--- a/drivers/infiniband/hw/qib/qib_iba7322.c
+++ b/drivers/infiniband/hw/qib/qib_iba7322.c
@@ -150,7 +150,7 @@
.string = txselect_list,
.maxlen = MAX_ATTEN_LEN
};
-static int setup_txselect(const char *, struct kernel_param *);
+static int setup_txselect(const char *, const struct kernel_param *);
module_param_call(txselect, setup_txselect, param_get_string,
&kp_txselect, S_IWUSR | S_IRUGO);
MODULE_PARM_DESC(txselect,
@@ -6169,7 +6169,7 @@
}
/* handle the txselect parameter changing */
-static int setup_txselect(const char *str, struct kernel_param *kp)
+static int setup_txselect(const char *str, const struct kernel_param *kp)
{
struct qib_devdata *dd;
unsigned long val;
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 60105ba..7dbc0d9 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -80,7 +80,7 @@
MODULE_PARM_DESC(srpt_srq_size,
"Shared receive queue (SRQ) size.");
-static int srpt_get_u64_x(char *buffer, struct kernel_param *kp)
+static int srpt_get_u64_x(char *buffer, const struct kernel_param *kp)
{
return sprintf(buffer, "0x%016llx", *(u64 *)kp->arg);
}
diff --git a/drivers/input/Kconfig b/drivers/input/Kconfig
index ff80377..724715e 100644
--- a/drivers/input/Kconfig
+++ b/drivers/input/Kconfig
@@ -184,6 +184,19 @@
To compile this driver as a module, choose M here: the
module will be called apm-power.
+config INPUT_KEYRESET
+ bool "Reset key"
+ depends on INPUT
+ select INPUT_KEYCOMBO
+ ---help---
+ Say Y here if you want to reboot when some keys are pressed;
+
+config INPUT_KEYCOMBO
+ bool "Key combo"
+ depends on INPUT
+ ---help---
+ Say Y here if you want to take action when some keys are pressed;
+
comment "Input Device Drivers"
source "drivers/input/keyboard/Kconfig"
diff --git a/drivers/input/Makefile b/drivers/input/Makefile
index 40de6a7..f0351af 100644
--- a/drivers/input/Makefile
+++ b/drivers/input/Makefile
@@ -27,5 +27,7 @@
obj-$(CONFIG_INPUT_MISC) += misc/
obj-$(CONFIG_INPUT_APMPOWER) += apm-power.o
+obj-$(CONFIG_INPUT_KEYRESET) += keyreset.o
+obj-$(CONFIG_INPUT_KEYCOMBO) += keycombo.o
obj-$(CONFIG_RMI4_CORE) += rmi4/
diff --git a/drivers/input/keyboard/goldfish_events.c b/drivers/input/keyboard/goldfish_events.c
index f6e643b..c877e56 100644
--- a/drivers/input/keyboard/goldfish_events.c
+++ b/drivers/input/keyboard/goldfish_events.c
@@ -17,6 +17,7 @@
#include <linux/interrupt.h>
#include <linux/types.h>
#include <linux/input.h>
+#include <linux/input/mt.h>
#include <linux/kernel.h>
#include <linux/platform_device.h>
#include <linux/slab.h>
@@ -24,6 +25,8 @@
#include <linux/io.h>
#include <linux/acpi.h>
+#define GOLDFISH_MAX_FINGERS 5
+
enum {
REG_READ = 0x00,
REG_SET_PAGE = 0x00,
@@ -52,7 +55,21 @@
value = __raw_readl(edev->addr + REG_READ);
input_event(edev->input, type, code, value);
- input_sync(edev->input);
+ // Send an extra (EV_SYN, SYN_REPORT, 0x0) event
+ // if a key was pressed. Some keyboard device
+ // drivers may only send the EV_KEY event and
+ // not EV_SYN.
+ // Note that sending an extra SYN_REPORT is not
+ // necessary nor correct protocol with other
+ // devices such as touchscreens, which will send
+ // their own SYN_REPORT's when sufficient event
+ // information has been collected (e.g., for
+ // touchscreens, when pressure and X/Y coordinates
+ // have been received). Hence, we will only send
+ // this extra SYN_REPORT if type == EV_KEY.
+ if (type == EV_KEY) {
+ input_sync(edev->input);
+ }
return IRQ_HANDLED;
}
@@ -154,6 +171,15 @@
input_dev->name = edev->name;
input_dev->id.bustype = BUS_HOST;
+ // Set the Goldfish Device to be multi-touch.
+ // In the Ranchu kernel, there is multi-touch-specific
+ // code for handling ABS_MT_SLOT events.
+ // See drivers/input/input.c:input_handle_abs_event.
+ // If we do not issue input_mt_init_slots,
+ // the kernel will filter out needed ABS_MT_SLOT
+ // events when we touch the screen in more than one place,
+ // preventing multi-touch with more than one finger from working.
+ input_mt_init_slots(input_dev, GOLDFISH_MAX_FINGERS, 0);
events_import_bits(edev, input_dev->evbit, EV_SYN, EV_MAX);
events_import_bits(edev, input_dev->keybit, EV_KEY, KEY_MAX);
diff --git a/drivers/input/keycombo.c b/drivers/input/keycombo.c
new file mode 100644
index 0000000..2fba451
--- /dev/null
+++ b/drivers/input/keycombo.c
@@ -0,0 +1,261 @@
+/* drivers/input/keycombo.c
+ *
+ * Copyright (C) 2014 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/input.h>
+#include <linux/keycombo.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/reboot.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+struct keycombo_state {
+ struct input_handler input_handler;
+ unsigned long keybit[BITS_TO_LONGS(KEY_CNT)];
+ unsigned long upbit[BITS_TO_LONGS(KEY_CNT)];
+ unsigned long key[BITS_TO_LONGS(KEY_CNT)];
+ spinlock_t lock;
+ struct workqueue_struct *wq;
+ int key_down_target;
+ int key_down;
+ int key_up;
+ struct delayed_work key_down_work;
+ int delay;
+ struct work_struct key_up_work;
+ void (*key_up_fn)(void *);
+ void (*key_down_fn)(void *);
+ void *priv;
+ int key_is_down;
+ struct wakeup_source combo_held_wake_source;
+ struct wakeup_source combo_up_wake_source;
+};
+
+static void do_key_down(struct work_struct *work)
+{
+ struct delayed_work *dwork = container_of(work, struct delayed_work,
+ work);
+ struct keycombo_state *state = container_of(dwork,
+ struct keycombo_state, key_down_work);
+ if (state->key_down_fn)
+ state->key_down_fn(state->priv);
+}
+
+static void do_key_up(struct work_struct *work)
+{
+ struct keycombo_state *state = container_of(work, struct keycombo_state,
+ key_up_work);
+ if (state->key_up_fn)
+ state->key_up_fn(state->priv);
+ __pm_relax(&state->combo_up_wake_source);
+}
+
+static void keycombo_event(struct input_handle *handle, unsigned int type,
+ unsigned int code, int value)
+{
+ unsigned long flags;
+ struct keycombo_state *state = handle->private;
+
+ if (type != EV_KEY)
+ return;
+
+ if (code >= KEY_MAX)
+ return;
+
+ if (!test_bit(code, state->keybit))
+ return;
+
+ spin_lock_irqsave(&state->lock, flags);
+ if (!test_bit(code, state->key) == !value)
+ goto done;
+ __change_bit(code, state->key);
+ if (test_bit(code, state->upbit)) {
+ if (value)
+ state->key_up++;
+ else
+ state->key_up--;
+ } else {
+ if (value)
+ state->key_down++;
+ else
+ state->key_down--;
+ }
+ if (state->key_down == state->key_down_target && state->key_up == 0) {
+ __pm_stay_awake(&state->combo_held_wake_source);
+ state->key_is_down = 1;
+ if (queue_delayed_work(state->wq, &state->key_down_work,
+ state->delay))
+ pr_debug("Key down work already queued!");
+ } else if (state->key_is_down) {
+ if (!cancel_delayed_work(&state->key_down_work)) {
+ __pm_stay_awake(&state->combo_up_wake_source);
+ queue_work(state->wq, &state->key_up_work);
+ }
+ __pm_relax(&state->combo_held_wake_source);
+ state->key_is_down = 0;
+ }
+done:
+ spin_unlock_irqrestore(&state->lock, flags);
+}
+
+static int keycombo_connect(struct input_handler *handler,
+ struct input_dev *dev,
+ const struct input_device_id *id)
+{
+ int i;
+ int ret;
+ struct input_handle *handle;
+ struct keycombo_state *state =
+ container_of(handler, struct keycombo_state, input_handler);
+ for (i = 0; i < KEY_MAX; i++) {
+ if (test_bit(i, state->keybit) && test_bit(i, dev->keybit))
+ break;
+ }
+ if (i == KEY_MAX)
+ return -ENODEV;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle)
+ return -ENOMEM;
+
+ handle->dev = dev;
+ handle->handler = handler;
+ handle->name = KEYCOMBO_NAME;
+ handle->private = state;
+
+ ret = input_register_handle(handle);
+ if (ret)
+ goto err_input_register_handle;
+
+ ret = input_open_device(handle);
+ if (ret)
+ goto err_input_open_device;
+
+ return 0;
+
+err_input_open_device:
+ input_unregister_handle(handle);
+err_input_register_handle:
+ kfree(handle);
+ return ret;
+}
+
+static void keycombo_disconnect(struct input_handle *handle)
+{
+ input_close_device(handle);
+ input_unregister_handle(handle);
+ kfree(handle);
+}
+
+static const struct input_device_id keycombo_ids[] = {
+ {
+ .flags = INPUT_DEVICE_ID_MATCH_EVBIT,
+ .evbit = { BIT_MASK(EV_KEY) },
+ },
+ { },
+};
+MODULE_DEVICE_TABLE(input, keycombo_ids);
+
+static int keycombo_probe(struct platform_device *pdev)
+{
+ int ret;
+ int key, *keyp;
+ struct keycombo_state *state;
+ struct keycombo_platform_data *pdata = pdev->dev.platform_data;
+
+ if (!pdata)
+ return -EINVAL;
+
+ state = kzalloc(sizeof(*state), GFP_KERNEL);
+ if (!state)
+ return -ENOMEM;
+
+ spin_lock_init(&state->lock);
+ keyp = pdata->keys_down;
+ while ((key = *keyp++)) {
+ if (key >= KEY_MAX)
+ continue;
+ state->key_down_target++;
+ __set_bit(key, state->keybit);
+ }
+ if (pdata->keys_up) {
+ keyp = pdata->keys_up;
+ while ((key = *keyp++)) {
+ if (key >= KEY_MAX)
+ continue;
+ __set_bit(key, state->keybit);
+ __set_bit(key, state->upbit);
+ }
+ }
+
+ state->wq = alloc_ordered_workqueue("keycombo", 0);
+ if (!state->wq)
+ return -ENOMEM;
+
+ state->priv = pdata->priv;
+
+ if (pdata->key_down_fn)
+ state->key_down_fn = pdata->key_down_fn;
+ INIT_DELAYED_WORK(&state->key_down_work, do_key_down);
+
+ if (pdata->key_up_fn)
+ state->key_up_fn = pdata->key_up_fn;
+ INIT_WORK(&state->key_up_work, do_key_up);
+
+ wakeup_source_init(&state->combo_held_wake_source, "key combo");
+ wakeup_source_init(&state->combo_up_wake_source, "key combo up");
+ state->delay = msecs_to_jiffies(pdata->key_down_delay);
+
+ state->input_handler.event = keycombo_event;
+ state->input_handler.connect = keycombo_connect;
+ state->input_handler.disconnect = keycombo_disconnect;
+ state->input_handler.name = KEYCOMBO_NAME;
+ state->input_handler.id_table = keycombo_ids;
+ ret = input_register_handler(&state->input_handler);
+ if (ret) {
+ kfree(state);
+ return ret;
+ }
+ platform_set_drvdata(pdev, state);
+ return 0;
+}
+
+int keycombo_remove(struct platform_device *pdev)
+{
+ struct keycombo_state *state = platform_get_drvdata(pdev);
+ input_unregister_handler(&state->input_handler);
+ destroy_workqueue(state->wq);
+ kfree(state);
+ return 0;
+}
+
+
+struct platform_driver keycombo_driver = {
+ .driver.name = KEYCOMBO_NAME,
+ .probe = keycombo_probe,
+ .remove = keycombo_remove,
+};
+
+static int __init keycombo_init(void)
+{
+ return platform_driver_register(&keycombo_driver);
+}
+
+static void __exit keycombo_exit(void)
+{
+ return platform_driver_unregister(&keycombo_driver);
+}
+
+module_init(keycombo_init);
+module_exit(keycombo_exit);
diff --git a/drivers/input/keyreset.c b/drivers/input/keyreset.c
new file mode 100644
index 0000000..7e5222a
--- /dev/null
+++ b/drivers/input/keyreset.c
@@ -0,0 +1,144 @@
+/* drivers/input/keyreset.c
+ *
+ * Copyright (C) 2014 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/input.h>
+#include <linux/keyreset.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/reboot.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/syscalls.h>
+#include <linux/keycombo.h>
+
+struct keyreset_state {
+ int restart_requested;
+ int (*reset_fn)(void);
+ struct platform_device *pdev_child;
+ struct work_struct restart_work;
+};
+
+static void do_restart(struct work_struct *unused)
+{
+ orderly_reboot();
+}
+
+static void do_reset_fn(void *priv)
+{
+ struct keyreset_state *state = priv;
+ if (state->restart_requested)
+ panic("keyboard reset failed, %d", state->restart_requested);
+ if (state->reset_fn) {
+ state->restart_requested = state->reset_fn();
+ } else {
+ pr_info("keyboard reset\n");
+ schedule_work(&state->restart_work);
+ state->restart_requested = 1;
+ }
+}
+
+static int keyreset_probe(struct platform_device *pdev)
+{
+ int ret = -ENOMEM;
+ struct keycombo_platform_data *pdata_child;
+ struct keyreset_platform_data *pdata = pdev->dev.platform_data;
+ int up_size = 0, down_size = 0, size;
+ int key, *keyp;
+ struct keyreset_state *state;
+
+ if (!pdata)
+ return -EINVAL;
+ state = devm_kzalloc(&pdev->dev, sizeof(*state), GFP_KERNEL);
+ if (!state)
+ return -ENOMEM;
+
+ state->pdev_child = platform_device_alloc(KEYCOMBO_NAME,
+ PLATFORM_DEVID_AUTO);
+ if (!state->pdev_child)
+ return -ENOMEM;
+ state->pdev_child->dev.parent = &pdev->dev;
+ INIT_WORK(&state->restart_work, do_restart);
+
+ keyp = pdata->keys_down;
+ while ((key = *keyp++)) {
+ if (key >= KEY_MAX)
+ continue;
+ down_size++;
+ }
+ if (pdata->keys_up) {
+ keyp = pdata->keys_up;
+ while ((key = *keyp++)) {
+ if (key >= KEY_MAX)
+ continue;
+ up_size++;
+ }
+ }
+ size = sizeof(struct keycombo_platform_data)
+ + sizeof(int) * (down_size + 1);
+ pdata_child = devm_kzalloc(&pdev->dev, size, GFP_KERNEL);
+ if (!pdata_child)
+ goto error;
+ memcpy(pdata_child->keys_down, pdata->keys_down,
+ sizeof(int) * down_size);
+ if (up_size > 0) {
+ pdata_child->keys_up = devm_kzalloc(&pdev->dev, up_size + 1,
+ GFP_KERNEL);
+ if (!pdata_child->keys_up)
+ goto error;
+ memcpy(pdata_child->keys_up, pdata->keys_up,
+ sizeof(int) * up_size);
+ if (!pdata_child->keys_up)
+ goto error;
+ }
+ state->reset_fn = pdata->reset_fn;
+ pdata_child->key_down_fn = do_reset_fn;
+ pdata_child->priv = state;
+ pdata_child->key_down_delay = pdata->key_down_delay;
+ ret = platform_device_add_data(state->pdev_child, pdata_child, size);
+ if (ret)
+ goto error;
+ platform_set_drvdata(pdev, state);
+ return platform_device_add(state->pdev_child);
+error:
+ platform_device_put(state->pdev_child);
+ return ret;
+}
+
+int keyreset_remove(struct platform_device *pdev)
+{
+ struct keyreset_state *state = platform_get_drvdata(pdev);
+ platform_device_put(state->pdev_child);
+ return 0;
+}
+
+
+struct platform_driver keyreset_driver = {
+ .driver.name = KEYRESET_NAME,
+ .probe = keyreset_probe,
+ .remove = keyreset_remove,
+};
+
+static int __init keyreset_init(void)
+{
+ return platform_driver_register(&keyreset_driver);
+}
+
+static void __exit keyreset_exit(void)
+{
+ return platform_driver_unregister(&keyreset_driver);
+}
+
+module_init(keyreset_init);
+module_exit(keyreset_exit);
diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig
index 9f082a3..4b269b3 100644
--- a/drivers/input/misc/Kconfig
+++ b/drivers/input/misc/Kconfig
@@ -367,6 +367,17 @@
To compile this driver as a module, choose M here: the module will be
called ati_remote2.
+config INPUT_KEYCHORD
+ tristate "Key chord input driver support"
+ help
+ Say Y here if you want to enable the key chord driver
+ accessible at /dev/keychord. This driver can be used
+ for receiving notifications when client specified key
+ combinations are pressed.
+
+ To compile this driver as a module, choose M here: the
+ module will be called keychord.
+
config INPUT_KEYSPAN_REMOTE
tristate "Keyspan DMR USB remote control"
depends on USB_ARCH_HAS_HCD
@@ -535,6 +546,11 @@
To compile this driver as a module, choose M here: the
module will be called sgi_btns.
+config INPUT_GPIO
+ tristate "GPIO driver support"
+ help
+ Say Y here if you want to support gpio based keys, wheels etc...
+
config HP_SDC_RTC
tristate "HP SDC Real Time Clock"
depends on (GSC || HP300) && SERIO
diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile
index 4b6118d..2ecb686 100644
--- a/drivers/input/misc/Makefile
+++ b/drivers/input/misc/Makefile
@@ -38,10 +38,12 @@
obj-$(CONFIG_INPUT_GPIO_BEEPER) += gpio-beeper.o
obj-$(CONFIG_INPUT_GPIO_TILT_POLLED) += gpio_tilt_polled.o
obj-$(CONFIG_INPUT_GPIO_DECODER) += gpio_decoder.o
+obj-$(CONFIG_INPUT_GPIO) += gpio_event.o gpio_matrix.o gpio_input.o gpio_output.o gpio_axis.o
obj-$(CONFIG_INPUT_HISI_POWERKEY) += hisi_powerkey.o
obj-$(CONFIG_HP_SDC_RTC) += hp_sdc_rtc.o
obj-$(CONFIG_INPUT_IMS_PCU) += ims-pcu.o
obj-$(CONFIG_INPUT_IXP4XX_BEEPER) += ixp4xx-beeper.o
+obj-$(CONFIG_INPUT_KEYCHORD) += keychord.o
obj-$(CONFIG_INPUT_KEYSPAN_REMOTE) += keyspan_remote.o
obj-$(CONFIG_INPUT_KXTJ9) += kxtj9.o
obj-$(CONFIG_INPUT_M68K_BEEP) += m68kspkr.o
diff --git a/drivers/input/misc/gpio_axis.c b/drivers/input/misc/gpio_axis.c
new file mode 100644
index 0000000..0acf4a5
--- /dev/null
+++ b/drivers/input/misc/gpio_axis.c
@@ -0,0 +1,192 @@
+/* drivers/input/misc/gpio_axis.c
+ *
+ * Copyright (C) 2007 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/gpio.h>
+#include <linux/gpio_event.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+
+struct gpio_axis_state {
+ struct gpio_event_input_devs *input_devs;
+ struct gpio_event_axis_info *info;
+ uint32_t pos;
+};
+
+uint16_t gpio_axis_4bit_gray_map_table[] = {
+ [0x0] = 0x0, [0x1] = 0x1, /* 0000 0001 */
+ [0x3] = 0x2, [0x2] = 0x3, /* 0011 0010 */
+ [0x6] = 0x4, [0x7] = 0x5, /* 0110 0111 */
+ [0x5] = 0x6, [0x4] = 0x7, /* 0101 0100 */
+ [0xc] = 0x8, [0xd] = 0x9, /* 1100 1101 */
+ [0xf] = 0xa, [0xe] = 0xb, /* 1111 1110 */
+ [0xa] = 0xc, [0xb] = 0xd, /* 1010 1011 */
+ [0x9] = 0xe, [0x8] = 0xf, /* 1001 1000 */
+};
+uint16_t gpio_axis_4bit_gray_map(struct gpio_event_axis_info *info, uint16_t in)
+{
+ return gpio_axis_4bit_gray_map_table[in];
+}
+
+uint16_t gpio_axis_5bit_singletrack_map_table[] = {
+ [0x10] = 0x00, [0x14] = 0x01, [0x1c] = 0x02, /* 10000 10100 11100 */
+ [0x1e] = 0x03, [0x1a] = 0x04, [0x18] = 0x05, /* 11110 11010 11000 */
+ [0x08] = 0x06, [0x0a] = 0x07, [0x0e] = 0x08, /* 01000 01010 01110 */
+ [0x0f] = 0x09, [0x0d] = 0x0a, [0x0c] = 0x0b, /* 01111 01101 01100 */
+ [0x04] = 0x0c, [0x05] = 0x0d, [0x07] = 0x0e, /* 00100 00101 00111 */
+ [0x17] = 0x0f, [0x16] = 0x10, [0x06] = 0x11, /* 10111 10110 00110 */
+ [0x02] = 0x12, [0x12] = 0x13, [0x13] = 0x14, /* 00010 10010 10011 */
+ [0x1b] = 0x15, [0x0b] = 0x16, [0x03] = 0x17, /* 11011 01011 00011 */
+ [0x01] = 0x18, [0x09] = 0x19, [0x19] = 0x1a, /* 00001 01001 11001 */
+ [0x1d] = 0x1b, [0x15] = 0x1c, [0x11] = 0x1d, /* 11101 10101 10001 */
+};
+uint16_t gpio_axis_5bit_singletrack_map(
+ struct gpio_event_axis_info *info, uint16_t in)
+{
+ return gpio_axis_5bit_singletrack_map_table[in];
+}
+
+static void gpio_event_update_axis(struct gpio_axis_state *as, int report)
+{
+ struct gpio_event_axis_info *ai = as->info;
+ int i;
+ int change;
+ uint16_t state = 0;
+ uint16_t pos;
+ uint16_t old_pos = as->pos;
+ for (i = ai->count - 1; i >= 0; i--)
+ state = (state << 1) | gpio_get_value(ai->gpio[i]);
+ pos = ai->map(ai, state);
+ if (ai->flags & GPIOEAF_PRINT_RAW)
+ pr_info("axis %d-%d raw %x, pos %d -> %d\n",
+ ai->type, ai->code, state, old_pos, pos);
+ if (report && pos != old_pos) {
+ if (ai->type == EV_REL) {
+ change = (ai->decoded_size + pos - old_pos) %
+ ai->decoded_size;
+ if (change > ai->decoded_size / 2)
+ change -= ai->decoded_size;
+ if (change == ai->decoded_size / 2) {
+ if (ai->flags & GPIOEAF_PRINT_EVENT)
+ pr_info("axis %d-%d unknown direction, "
+ "pos %d -> %d\n", ai->type,
+ ai->code, old_pos, pos);
+ change = 0; /* no closest direction */
+ }
+ if (ai->flags & GPIOEAF_PRINT_EVENT)
+ pr_info("axis %d-%d change %d\n",
+ ai->type, ai->code, change);
+ input_report_rel(as->input_devs->dev[ai->dev],
+ ai->code, change);
+ } else {
+ if (ai->flags & GPIOEAF_PRINT_EVENT)
+ pr_info("axis %d-%d now %d\n",
+ ai->type, ai->code, pos);
+ input_event(as->input_devs->dev[ai->dev],
+ ai->type, ai->code, pos);
+ }
+ input_sync(as->input_devs->dev[ai->dev]);
+ }
+ as->pos = pos;
+}
+
+static irqreturn_t gpio_axis_irq_handler(int irq, void *dev_id)
+{
+ struct gpio_axis_state *as = dev_id;
+ gpio_event_update_axis(as, 1);
+ return IRQ_HANDLED;
+}
+
+int gpio_event_axis_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func)
+{
+ int ret;
+ int i;
+ int irq;
+ struct gpio_event_axis_info *ai;
+ struct gpio_axis_state *as;
+
+ ai = container_of(info, struct gpio_event_axis_info, info);
+ if (func == GPIO_EVENT_FUNC_SUSPEND) {
+ for (i = 0; i < ai->count; i++)
+ disable_irq(gpio_to_irq(ai->gpio[i]));
+ return 0;
+ }
+ if (func == GPIO_EVENT_FUNC_RESUME) {
+ for (i = 0; i < ai->count; i++)
+ enable_irq(gpio_to_irq(ai->gpio[i]));
+ return 0;
+ }
+
+ if (func == GPIO_EVENT_FUNC_INIT) {
+ *data = as = kmalloc(sizeof(*as), GFP_KERNEL);
+ if (as == NULL) {
+ ret = -ENOMEM;
+ goto err_alloc_axis_state_failed;
+ }
+ as->input_devs = input_devs;
+ as->info = ai;
+ if (ai->dev >= input_devs->count) {
+ pr_err("gpio_event_axis: bad device index %d >= %d "
+ "for %d:%d\n", ai->dev, input_devs->count,
+ ai->type, ai->code);
+ ret = -EINVAL;
+ goto err_bad_device_index;
+ }
+
+ input_set_capability(input_devs->dev[ai->dev],
+ ai->type, ai->code);
+ if (ai->type == EV_ABS) {
+ input_set_abs_params(input_devs->dev[ai->dev], ai->code,
+ 0, ai->decoded_size - 1, 0, 0);
+ }
+ for (i = 0; i < ai->count; i++) {
+ ret = gpio_request(ai->gpio[i], "gpio_event_axis");
+ if (ret < 0)
+ goto err_request_gpio_failed;
+ ret = gpio_direction_input(ai->gpio[i]);
+ if (ret < 0)
+ goto err_gpio_direction_input_failed;
+ ret = irq = gpio_to_irq(ai->gpio[i]);
+ if (ret < 0)
+ goto err_get_irq_num_failed;
+ ret = request_irq(irq, gpio_axis_irq_handler,
+ IRQF_TRIGGER_RISING |
+ IRQF_TRIGGER_FALLING,
+ "gpio_event_axis", as);
+ if (ret < 0)
+ goto err_request_irq_failed;
+ }
+ gpio_event_update_axis(as, 0);
+ return 0;
+ }
+
+ ret = 0;
+ as = *data;
+ for (i = ai->count - 1; i >= 0; i--) {
+ free_irq(gpio_to_irq(ai->gpio[i]), as);
+err_request_irq_failed:
+err_get_irq_num_failed:
+err_gpio_direction_input_failed:
+ gpio_free(ai->gpio[i]);
+err_request_gpio_failed:
+ ;
+ }
+err_bad_device_index:
+ kfree(as);
+ *data = NULL;
+err_alloc_axis_state_failed:
+ return ret;
+}
diff --git a/drivers/input/misc/gpio_event.c b/drivers/input/misc/gpio_event.c
new file mode 100644
index 0000000..90f07eb
--- /dev/null
+++ b/drivers/input/misc/gpio_event.c
@@ -0,0 +1,228 @@
+/* drivers/input/misc/gpio_event.c
+ *
+ * Copyright (C) 2007 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/input.h>
+#include <linux/gpio_event.h>
+#include <linux/hrtimer.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+
+struct gpio_event {
+ struct gpio_event_input_devs *input_devs;
+ const struct gpio_event_platform_data *info;
+ void *state[0];
+};
+
+static int gpio_input_event(
+ struct input_dev *dev, unsigned int type, unsigned int code, int value)
+{
+ int i;
+ int devnr;
+ int ret = 0;
+ int tmp_ret;
+ struct gpio_event_info **ii;
+ struct gpio_event *ip = input_get_drvdata(dev);
+
+ for (devnr = 0; devnr < ip->input_devs->count; devnr++)
+ if (ip->input_devs->dev[devnr] == dev)
+ break;
+ if (devnr == ip->input_devs->count) {
+ pr_err("gpio_input_event: unknown device %p\n", dev);
+ return -EIO;
+ }
+
+ for (i = 0, ii = ip->info->info; i < ip->info->info_count; i++, ii++) {
+ if ((*ii)->event) {
+ tmp_ret = (*ii)->event(ip->input_devs, *ii,
+ &ip->state[i],
+ devnr, type, code, value);
+ if (tmp_ret)
+ ret = tmp_ret;
+ }
+ }
+ return ret;
+}
+
+static int gpio_event_call_all_func(struct gpio_event *ip, int func)
+{
+ int i;
+ int ret;
+ struct gpio_event_info **ii;
+
+ if (func == GPIO_EVENT_FUNC_INIT || func == GPIO_EVENT_FUNC_RESUME) {
+ ii = ip->info->info;
+ for (i = 0; i < ip->info->info_count; i++, ii++) {
+ if ((*ii)->func == NULL) {
+ ret = -ENODEV;
+ pr_err("gpio_event_probe: Incomplete pdata, "
+ "no function\n");
+ goto err_no_func;
+ }
+ if (func == GPIO_EVENT_FUNC_RESUME && (*ii)->no_suspend)
+ continue;
+ ret = (*ii)->func(ip->input_devs, *ii, &ip->state[i],
+ func);
+ if (ret) {
+ pr_err("gpio_event_probe: function failed\n");
+ goto err_func_failed;
+ }
+ }
+ return 0;
+ }
+
+ ret = 0;
+ i = ip->info->info_count;
+ ii = ip->info->info + i;
+ while (i > 0) {
+ i--;
+ ii--;
+ if ((func & ~1) == GPIO_EVENT_FUNC_SUSPEND && (*ii)->no_suspend)
+ continue;
+ (*ii)->func(ip->input_devs, *ii, &ip->state[i], func & ~1);
+err_func_failed:
+err_no_func:
+ ;
+ }
+ return ret;
+}
+
+static void __maybe_unused gpio_event_suspend(struct gpio_event *ip)
+{
+ gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_SUSPEND);
+ if (ip->info->power)
+ ip->info->power(ip->info, 0);
+}
+
+static void __maybe_unused gpio_event_resume(struct gpio_event *ip)
+{
+ if (ip->info->power)
+ ip->info->power(ip->info, 1);
+ gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_RESUME);
+}
+
+static int gpio_event_probe(struct platform_device *pdev)
+{
+ int err;
+ struct gpio_event *ip;
+ struct gpio_event_platform_data *event_info;
+ int dev_count = 1;
+ int i;
+ int registered = 0;
+
+ event_info = pdev->dev.platform_data;
+ if (event_info == NULL) {
+ pr_err("gpio_event_probe: No pdata\n");
+ return -ENODEV;
+ }
+ if ((!event_info->name && !event_info->names[0]) ||
+ !event_info->info || !event_info->info_count) {
+ pr_err("gpio_event_probe: Incomplete pdata\n");
+ return -ENODEV;
+ }
+ if (!event_info->name)
+ while (event_info->names[dev_count])
+ dev_count++;
+ ip = kzalloc(sizeof(*ip) +
+ sizeof(ip->state[0]) * event_info->info_count +
+ sizeof(*ip->input_devs) +
+ sizeof(ip->input_devs->dev[0]) * dev_count, GFP_KERNEL);
+ if (ip == NULL) {
+ err = -ENOMEM;
+ pr_err("gpio_event_probe: Failed to allocate private data\n");
+ goto err_kp_alloc_failed;
+ }
+ ip->input_devs = (void*)&ip->state[event_info->info_count];
+ platform_set_drvdata(pdev, ip);
+
+ for (i = 0; i < dev_count; i++) {
+ struct input_dev *input_dev = input_allocate_device();
+ if (input_dev == NULL) {
+ err = -ENOMEM;
+ pr_err("gpio_event_probe: "
+ "Failed to allocate input device\n");
+ goto err_input_dev_alloc_failed;
+ }
+ input_set_drvdata(input_dev, ip);
+ input_dev->name = event_info->name ?
+ event_info->name : event_info->names[i];
+ input_dev->event = gpio_input_event;
+ ip->input_devs->dev[i] = input_dev;
+ }
+ ip->input_devs->count = dev_count;
+ ip->info = event_info;
+ if (event_info->power)
+ ip->info->power(ip->info, 1);
+
+ err = gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_INIT);
+ if (err)
+ goto err_call_all_func_failed;
+
+ for (i = 0; i < dev_count; i++) {
+ err = input_register_device(ip->input_devs->dev[i]);
+ if (err) {
+ pr_err("gpio_event_probe: Unable to register %s "
+ "input device\n", ip->input_devs->dev[i]->name);
+ goto err_input_register_device_failed;
+ }
+ registered++;
+ }
+
+ return 0;
+
+err_input_register_device_failed:
+ gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_UNINIT);
+err_call_all_func_failed:
+ if (event_info->power)
+ ip->info->power(ip->info, 0);
+ for (i = 0; i < registered; i++)
+ input_unregister_device(ip->input_devs->dev[i]);
+ for (i = dev_count - 1; i >= registered; i--) {
+ input_free_device(ip->input_devs->dev[i]);
+err_input_dev_alloc_failed:
+ ;
+ }
+ kfree(ip);
+err_kp_alloc_failed:
+ return err;
+}
+
+static int gpio_event_remove(struct platform_device *pdev)
+{
+ struct gpio_event *ip = platform_get_drvdata(pdev);
+ int i;
+
+ gpio_event_call_all_func(ip, GPIO_EVENT_FUNC_UNINIT);
+ if (ip->info->power)
+ ip->info->power(ip->info, 0);
+ for (i = 0; i < ip->input_devs->count; i++)
+ input_unregister_device(ip->input_devs->dev[i]);
+ kfree(ip);
+ return 0;
+}
+
+static struct platform_driver gpio_event_driver = {
+ .probe = gpio_event_probe,
+ .remove = gpio_event_remove,
+ .driver = {
+ .name = GPIO_EVENT_DEV_NAME,
+ },
+};
+
+module_platform_driver(gpio_event_driver);
+
+MODULE_DESCRIPTION("GPIO Event Driver");
+MODULE_LICENSE("GPL");
+
diff --git a/drivers/input/misc/gpio_input.c b/drivers/input/misc/gpio_input.c
new file mode 100644
index 0000000..5875d73
--- /dev/null
+++ b/drivers/input/misc/gpio_input.c
@@ -0,0 +1,390 @@
+/* drivers/input/misc/gpio_input.c
+ *
+ * Copyright (C) 2007 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/gpio.h>
+#include <linux/gpio_event.h>
+#include <linux/hrtimer.h>
+#include <linux/input.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+#include <linux/pm_wakeup.h>
+
+enum {
+ DEBOUNCE_UNSTABLE = BIT(0), /* Got irq, while debouncing */
+ DEBOUNCE_PRESSED = BIT(1),
+ DEBOUNCE_NOTPRESSED = BIT(2),
+ DEBOUNCE_WAIT_IRQ = BIT(3), /* Stable irq state */
+ DEBOUNCE_POLL = BIT(4), /* Stable polling state */
+
+ DEBOUNCE_UNKNOWN =
+ DEBOUNCE_PRESSED | DEBOUNCE_NOTPRESSED,
+};
+
+struct gpio_key_state {
+ struct gpio_input_state *ds;
+ uint8_t debounce;
+};
+
+struct gpio_input_state {
+ struct gpio_event_input_devs *input_devs;
+ const struct gpio_event_input_info *info;
+ struct hrtimer timer;
+ int use_irq;
+ int debounce_count;
+ spinlock_t irq_lock;
+ struct wakeup_source *ws;
+ struct gpio_key_state key_state[0];
+};
+
+static enum hrtimer_restart gpio_event_input_timer_func(struct hrtimer *timer)
+{
+ int i;
+ int pressed;
+ struct gpio_input_state *ds =
+ container_of(timer, struct gpio_input_state, timer);
+ unsigned gpio_flags = ds->info->flags;
+ unsigned npolarity;
+ int nkeys = ds->info->keymap_size;
+ const struct gpio_event_direct_entry *key_entry;
+ struct gpio_key_state *key_state;
+ unsigned long irqflags;
+ uint8_t debounce;
+ bool sync_needed;
+
+#if 0
+ key_entry = kp->keys_info->keymap;
+ key_state = kp->key_state;
+ for (i = 0; i < nkeys; i++, key_entry++, key_state++)
+ pr_info("gpio_read_detect_status %d %d\n", key_entry->gpio,
+ gpio_read_detect_status(key_entry->gpio));
+#endif
+ key_entry = ds->info->keymap;
+ key_state = ds->key_state;
+ sync_needed = false;
+ spin_lock_irqsave(&ds->irq_lock, irqflags);
+ for (i = 0; i < nkeys; i++, key_entry++, key_state++) {
+ debounce = key_state->debounce;
+ if (debounce & DEBOUNCE_WAIT_IRQ)
+ continue;
+ if (key_state->debounce & DEBOUNCE_UNSTABLE) {
+ debounce = key_state->debounce = DEBOUNCE_UNKNOWN;
+ enable_irq(gpio_to_irq(key_entry->gpio));
+ if (gpio_flags & GPIOEDF_PRINT_KEY_UNSTABLE)
+ pr_info("gpio_keys_scan_keys: key %x-%x, %d "
+ "(%d) continue debounce\n",
+ ds->info->type, key_entry->code,
+ i, key_entry->gpio);
+ }
+ npolarity = !(gpio_flags & GPIOEDF_ACTIVE_HIGH);
+ pressed = gpio_get_value(key_entry->gpio) ^ npolarity;
+ if (debounce & DEBOUNCE_POLL) {
+ if (pressed == !(debounce & DEBOUNCE_PRESSED)) {
+ ds->debounce_count++;
+ key_state->debounce = DEBOUNCE_UNKNOWN;
+ if (gpio_flags & GPIOEDF_PRINT_KEY_DEBOUNCE)
+ pr_info("gpio_keys_scan_keys: key %x-"
+ "%x, %d (%d) start debounce\n",
+ ds->info->type, key_entry->code,
+ i, key_entry->gpio);
+ }
+ continue;
+ }
+ if (pressed && (debounce & DEBOUNCE_NOTPRESSED)) {
+ if (gpio_flags & GPIOEDF_PRINT_KEY_DEBOUNCE)
+ pr_info("gpio_keys_scan_keys: key %x-%x, %d "
+ "(%d) debounce pressed 1\n",
+ ds->info->type, key_entry->code,
+ i, key_entry->gpio);
+ key_state->debounce = DEBOUNCE_PRESSED;
+ continue;
+ }
+ if (!pressed && (debounce & DEBOUNCE_PRESSED)) {
+ if (gpio_flags & GPIOEDF_PRINT_KEY_DEBOUNCE)
+ pr_info("gpio_keys_scan_keys: key %x-%x, %d "
+ "(%d) debounce pressed 0\n",
+ ds->info->type, key_entry->code,
+ i, key_entry->gpio);
+ key_state->debounce = DEBOUNCE_NOTPRESSED;
+ continue;
+ }
+ /* key is stable */
+ ds->debounce_count--;
+ if (ds->use_irq)
+ key_state->debounce |= DEBOUNCE_WAIT_IRQ;
+ else
+ key_state->debounce |= DEBOUNCE_POLL;
+ if (gpio_flags & GPIOEDF_PRINT_KEYS)
+ pr_info("gpio_keys_scan_keys: key %x-%x, %d (%d) "
+ "changed to %d\n", ds->info->type,
+ key_entry->code, i, key_entry->gpio, pressed);
+ input_event(ds->input_devs->dev[key_entry->dev], ds->info->type,
+ key_entry->code, pressed);
+ sync_needed = true;
+ }
+ if (sync_needed) {
+ for (i = 0; i < ds->input_devs->count; i++)
+ input_sync(ds->input_devs->dev[i]);
+ }
+
+#if 0
+ key_entry = kp->keys_info->keymap;
+ key_state = kp->key_state;
+ for (i = 0; i < nkeys; i++, key_entry++, key_state++) {
+ pr_info("gpio_read_detect_status %d %d\n", key_entry->gpio,
+ gpio_read_detect_status(key_entry->gpio));
+ }
+#endif
+
+ if (ds->debounce_count)
+ hrtimer_start(timer, ds->info->debounce_time, HRTIMER_MODE_REL);
+ else if (!ds->use_irq)
+ hrtimer_start(timer, ds->info->poll_time, HRTIMER_MODE_REL);
+ else
+ __pm_relax(ds->ws);
+
+ spin_unlock_irqrestore(&ds->irq_lock, irqflags);
+
+ return HRTIMER_NORESTART;
+}
+
+static irqreturn_t gpio_event_input_irq_handler(int irq, void *dev_id)
+{
+ struct gpio_key_state *ks = dev_id;
+ struct gpio_input_state *ds = ks->ds;
+ int keymap_index = ks - ds->key_state;
+ const struct gpio_event_direct_entry *key_entry;
+ unsigned long irqflags;
+ int pressed;
+
+ if (!ds->use_irq)
+ return IRQ_HANDLED;
+
+ key_entry = &ds->info->keymap[keymap_index];
+
+ if (ds->info->debounce_time) {
+ spin_lock_irqsave(&ds->irq_lock, irqflags);
+ if (ks->debounce & DEBOUNCE_WAIT_IRQ) {
+ ks->debounce = DEBOUNCE_UNKNOWN;
+ if (ds->debounce_count++ == 0) {
+ __pm_stay_awake(ds->ws);
+ hrtimer_start(
+ &ds->timer, ds->info->debounce_time,
+ HRTIMER_MODE_REL);
+ }
+ if (ds->info->flags & GPIOEDF_PRINT_KEY_DEBOUNCE)
+ pr_info("gpio_event_input_irq_handler: "
+ "key %x-%x, %d (%d) start debounce\n",
+ ds->info->type, key_entry->code,
+ keymap_index, key_entry->gpio);
+ } else {
+ disable_irq_nosync(irq);
+ ks->debounce = DEBOUNCE_UNSTABLE;
+ }
+ spin_unlock_irqrestore(&ds->irq_lock, irqflags);
+ } else {
+ pressed = gpio_get_value(key_entry->gpio) ^
+ !(ds->info->flags & GPIOEDF_ACTIVE_HIGH);
+ if (ds->info->flags & GPIOEDF_PRINT_KEYS)
+ pr_info("gpio_event_input_irq_handler: key %x-%x, %d "
+ "(%d) changed to %d\n",
+ ds->info->type, key_entry->code, keymap_index,
+ key_entry->gpio, pressed);
+ input_event(ds->input_devs->dev[key_entry->dev], ds->info->type,
+ key_entry->code, pressed);
+ input_sync(ds->input_devs->dev[key_entry->dev]);
+ }
+ return IRQ_HANDLED;
+}
+
+static int gpio_event_input_request_irqs(struct gpio_input_state *ds)
+{
+ int i;
+ int err;
+ unsigned int irq;
+ unsigned long req_flags = IRQF_TRIGGER_RISING | IRQF_TRIGGER_FALLING;
+
+ for (i = 0; i < ds->info->keymap_size; i++) {
+ err = irq = gpio_to_irq(ds->info->keymap[i].gpio);
+ if (err < 0)
+ goto err_gpio_get_irq_num_failed;
+ err = request_irq(irq, gpio_event_input_irq_handler,
+ req_flags, "gpio_keys", &ds->key_state[i]);
+ if (err) {
+ pr_err("gpio_event_input_request_irqs: request_irq "
+ "failed for input %d, irq %d\n",
+ ds->info->keymap[i].gpio, irq);
+ goto err_request_irq_failed;
+ }
+ if (ds->info->info.no_suspend) {
+ err = enable_irq_wake(irq);
+ if (err) {
+ pr_err("gpio_event_input_request_irqs: "
+ "enable_irq_wake failed for input %d, "
+ "irq %d\n",
+ ds->info->keymap[i].gpio, irq);
+ goto err_enable_irq_wake_failed;
+ }
+ }
+ }
+ return 0;
+
+ for (i = ds->info->keymap_size - 1; i >= 0; i--) {
+ irq = gpio_to_irq(ds->info->keymap[i].gpio);
+ if (ds->info->info.no_suspend)
+ disable_irq_wake(irq);
+err_enable_irq_wake_failed:
+ free_irq(irq, &ds->key_state[i]);
+err_request_irq_failed:
+err_gpio_get_irq_num_failed:
+ ;
+ }
+ return err;
+}
+
+int gpio_event_input_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func)
+{
+ int ret;
+ int i;
+ unsigned long irqflags;
+ struct gpio_event_input_info *di;
+ struct gpio_input_state *ds = *data;
+ char *wlname;
+
+ di = container_of(info, struct gpio_event_input_info, info);
+
+ if (func == GPIO_EVENT_FUNC_SUSPEND) {
+ if (ds->use_irq)
+ for (i = 0; i < di->keymap_size; i++)
+ disable_irq(gpio_to_irq(di->keymap[i].gpio));
+ hrtimer_cancel(&ds->timer);
+ return 0;
+ }
+ if (func == GPIO_EVENT_FUNC_RESUME) {
+ spin_lock_irqsave(&ds->irq_lock, irqflags);
+ if (ds->use_irq)
+ for (i = 0; i < di->keymap_size; i++)
+ enable_irq(gpio_to_irq(di->keymap[i].gpio));
+ hrtimer_start(&ds->timer, ktime_set(0, 0), HRTIMER_MODE_REL);
+ spin_unlock_irqrestore(&ds->irq_lock, irqflags);
+ return 0;
+ }
+
+ if (func == GPIO_EVENT_FUNC_INIT) {
+ if (ktime_to_ns(di->poll_time) <= 0)
+ di->poll_time = ktime_set(0, 20 * NSEC_PER_MSEC);
+
+ *data = ds = kzalloc(sizeof(*ds) + sizeof(ds->key_state[0]) *
+ di->keymap_size, GFP_KERNEL);
+ if (ds == NULL) {
+ ret = -ENOMEM;
+ pr_err("gpio_event_input_func: "
+ "Failed to allocate private data\n");
+ goto err_ds_alloc_failed;
+ }
+ ds->debounce_count = di->keymap_size;
+ ds->input_devs = input_devs;
+ ds->info = di;
+ wlname = kasprintf(GFP_KERNEL, "gpio_input:%s%s",
+ input_devs->dev[0]->name,
+ (input_devs->count > 1) ? "..." : "");
+
+ ds->ws = wakeup_source_register(wlname);
+ kfree(wlname);
+ if (!ds->ws) {
+ ret = -ENOMEM;
+ pr_err("gpio_event_input_func: "
+ "Failed to allocate wakeup source\n");
+ goto err_ws_failed;
+ }
+
+ spin_lock_init(&ds->irq_lock);
+
+ for (i = 0; i < di->keymap_size; i++) {
+ int dev = di->keymap[i].dev;
+ if (dev >= input_devs->count) {
+ pr_err("gpio_event_input_func: bad device "
+ "index %d >= %d for key code %d\n",
+ dev, input_devs->count,
+ di->keymap[i].code);
+ ret = -EINVAL;
+ goto err_bad_keymap;
+ }
+ input_set_capability(input_devs->dev[dev], di->type,
+ di->keymap[i].code);
+ ds->key_state[i].ds = ds;
+ ds->key_state[i].debounce = DEBOUNCE_UNKNOWN;
+ }
+
+ for (i = 0; i < di->keymap_size; i++) {
+ ret = gpio_request(di->keymap[i].gpio, "gpio_kp_in");
+ if (ret) {
+ pr_err("gpio_event_input_func: gpio_request "
+ "failed for %d\n", di->keymap[i].gpio);
+ goto err_gpio_request_failed;
+ }
+ ret = gpio_direction_input(di->keymap[i].gpio);
+ if (ret) {
+ pr_err("gpio_event_input_func: "
+ "gpio_direction_input failed for %d\n",
+ di->keymap[i].gpio);
+ goto err_gpio_configure_failed;
+ }
+ }
+
+ ret = gpio_event_input_request_irqs(ds);
+
+ spin_lock_irqsave(&ds->irq_lock, irqflags);
+ ds->use_irq = ret == 0;
+
+ pr_info("GPIO Input Driver: Start gpio inputs for %s%s in %s "
+ "mode\n", input_devs->dev[0]->name,
+ (input_devs->count > 1) ? "..." : "",
+ ret == 0 ? "interrupt" : "polling");
+
+ hrtimer_init(&ds->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ ds->timer.function = gpio_event_input_timer_func;
+ hrtimer_start(&ds->timer, ktime_set(0, 0), HRTIMER_MODE_REL);
+ spin_unlock_irqrestore(&ds->irq_lock, irqflags);
+ return 0;
+ }
+
+ ret = 0;
+ spin_lock_irqsave(&ds->irq_lock, irqflags);
+ hrtimer_cancel(&ds->timer);
+ if (ds->use_irq) {
+ for (i = di->keymap_size - 1; i >= 0; i--) {
+ int irq = gpio_to_irq(di->keymap[i].gpio);
+ if (ds->info->info.no_suspend)
+ disable_irq_wake(irq);
+ free_irq(irq, &ds->key_state[i]);
+ }
+ }
+ spin_unlock_irqrestore(&ds->irq_lock, irqflags);
+
+ for (i = di->keymap_size - 1; i >= 0; i--) {
+err_gpio_configure_failed:
+ gpio_free(di->keymap[i].gpio);
+err_gpio_request_failed:
+ ;
+ }
+err_bad_keymap:
+ wakeup_source_unregister(ds->ws);
+err_ws_failed:
+ kfree(ds);
+err_ds_alloc_failed:
+ return ret;
+}
diff --git a/drivers/input/misc/gpio_matrix.c b/drivers/input/misc/gpio_matrix.c
new file mode 100644
index 0000000..08769dd
--- /dev/null
+++ b/drivers/input/misc/gpio_matrix.c
@@ -0,0 +1,440 @@
+/* drivers/input/misc/gpio_matrix.c
+ *
+ * Copyright (C) 2007 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/gpio.h>
+#include <linux/gpio_event.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
+#include <linux/slab.h>
+
+struct gpio_kp {
+ struct gpio_event_input_devs *input_devs;
+ struct gpio_event_matrix_info *keypad_info;
+ struct hrtimer timer;
+ struct wakeup_source wake_src;
+ int current_output;
+ unsigned int use_irq:1;
+ unsigned int key_state_changed:1;
+ unsigned int last_key_state_changed:1;
+ unsigned int some_keys_pressed:2;
+ unsigned int disabled_irq:1;
+ unsigned long keys_pressed[0];
+};
+
+static void clear_phantom_key(struct gpio_kp *kp, int out, int in)
+{
+ struct gpio_event_matrix_info *mi = kp->keypad_info;
+ int key_index = out * mi->ninputs + in;
+ unsigned short keyentry = mi->keymap[key_index];
+ unsigned short keycode = keyentry & MATRIX_KEY_MASK;
+ unsigned short dev = keyentry >> MATRIX_CODE_BITS;
+
+ if (!test_bit(keycode, kp->input_devs->dev[dev]->key)) {
+ if (mi->flags & GPIOKPF_PRINT_PHANTOM_KEYS)
+ pr_info("gpiomatrix: phantom key %x, %d-%d (%d-%d) "
+ "cleared\n", keycode, out, in,
+ mi->output_gpios[out], mi->input_gpios[in]);
+ __clear_bit(key_index, kp->keys_pressed);
+ } else {
+ if (mi->flags & GPIOKPF_PRINT_PHANTOM_KEYS)
+ pr_info("gpiomatrix: phantom key %x, %d-%d (%d-%d) "
+ "not cleared\n", keycode, out, in,
+ mi->output_gpios[out], mi->input_gpios[in]);
+ }
+}
+
+static int restore_keys_for_input(struct gpio_kp *kp, int out, int in)
+{
+ int rv = 0;
+ int key_index;
+
+ key_index = out * kp->keypad_info->ninputs + in;
+ while (out < kp->keypad_info->noutputs) {
+ if (test_bit(key_index, kp->keys_pressed)) {
+ rv = 1;
+ clear_phantom_key(kp, out, in);
+ }
+ key_index += kp->keypad_info->ninputs;
+ out++;
+ }
+ return rv;
+}
+
+static void remove_phantom_keys(struct gpio_kp *kp)
+{
+ int out, in, inp;
+ int key_index;
+
+ if (kp->some_keys_pressed < 3)
+ return;
+
+ for (out = 0; out < kp->keypad_info->noutputs; out++) {
+ inp = -1;
+ key_index = out * kp->keypad_info->ninputs;
+ for (in = 0; in < kp->keypad_info->ninputs; in++, key_index++) {
+ if (test_bit(key_index, kp->keys_pressed)) {
+ if (inp == -1) {
+ inp = in;
+ continue;
+ }
+ if (inp >= 0) {
+ if (!restore_keys_for_input(kp, out + 1,
+ inp))
+ break;
+ clear_phantom_key(kp, out, inp);
+ inp = -2;
+ }
+ restore_keys_for_input(kp, out, in);
+ }
+ }
+ }
+}
+
+static void report_key(struct gpio_kp *kp, int key_index, int out, int in)
+{
+ struct gpio_event_matrix_info *mi = kp->keypad_info;
+ int pressed = test_bit(key_index, kp->keys_pressed);
+ unsigned short keyentry = mi->keymap[key_index];
+ unsigned short keycode = keyentry & MATRIX_KEY_MASK;
+ unsigned short dev = keyentry >> MATRIX_CODE_BITS;
+
+ if (pressed != test_bit(keycode, kp->input_devs->dev[dev]->key)) {
+ if (keycode == KEY_RESERVED) {
+ if (mi->flags & GPIOKPF_PRINT_UNMAPPED_KEYS)
+ pr_info("gpiomatrix: unmapped key, %d-%d "
+ "(%d-%d) changed to %d\n",
+ out, in, mi->output_gpios[out],
+ mi->input_gpios[in], pressed);
+ } else {
+ if (mi->flags & GPIOKPF_PRINT_MAPPED_KEYS)
+ pr_info("gpiomatrix: key %x, %d-%d (%d-%d) "
+ "changed to %d\n", keycode,
+ out, in, mi->output_gpios[out],
+ mi->input_gpios[in], pressed);
+ input_report_key(kp->input_devs->dev[dev], keycode, pressed);
+ }
+ }
+}
+
+static void report_sync(struct gpio_kp *kp)
+{
+ int i;
+
+ for (i = 0; i < kp->input_devs->count; i++)
+ input_sync(kp->input_devs->dev[i]);
+}
+
+static enum hrtimer_restart gpio_keypad_timer_func(struct hrtimer *timer)
+{
+ int out, in;
+ int key_index;
+ int gpio;
+ struct gpio_kp *kp = container_of(timer, struct gpio_kp, timer);
+ struct gpio_event_matrix_info *mi = kp->keypad_info;
+ unsigned gpio_keypad_flags = mi->flags;
+ unsigned polarity = !!(gpio_keypad_flags & GPIOKPF_ACTIVE_HIGH);
+
+ out = kp->current_output;
+ if (out == mi->noutputs) {
+ out = 0;
+ kp->last_key_state_changed = kp->key_state_changed;
+ kp->key_state_changed = 0;
+ kp->some_keys_pressed = 0;
+ } else {
+ key_index = out * mi->ninputs;
+ for (in = 0; in < mi->ninputs; in++, key_index++) {
+ gpio = mi->input_gpios[in];
+ if (gpio_get_value(gpio) ^ !polarity) {
+ if (kp->some_keys_pressed < 3)
+ kp->some_keys_pressed++;
+ kp->key_state_changed |= !__test_and_set_bit(
+ key_index, kp->keys_pressed);
+ } else
+ kp->key_state_changed |= __test_and_clear_bit(
+ key_index, kp->keys_pressed);
+ }
+ gpio = mi->output_gpios[out];
+ if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE)
+ gpio_set_value(gpio, !polarity);
+ else
+ gpio_direction_input(gpio);
+ out++;
+ }
+ kp->current_output = out;
+ if (out < mi->noutputs) {
+ gpio = mi->output_gpios[out];
+ if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE)
+ gpio_set_value(gpio, polarity);
+ else
+ gpio_direction_output(gpio, polarity);
+ hrtimer_start(timer, mi->settle_time, HRTIMER_MODE_REL);
+ return HRTIMER_NORESTART;
+ }
+ if (gpio_keypad_flags & GPIOKPF_DEBOUNCE) {
+ if (kp->key_state_changed) {
+ hrtimer_start(&kp->timer, mi->debounce_delay,
+ HRTIMER_MODE_REL);
+ return HRTIMER_NORESTART;
+ }
+ kp->key_state_changed = kp->last_key_state_changed;
+ }
+ if (kp->key_state_changed) {
+ if (gpio_keypad_flags & GPIOKPF_REMOVE_SOME_PHANTOM_KEYS)
+ remove_phantom_keys(kp);
+ key_index = 0;
+ for (out = 0; out < mi->noutputs; out++)
+ for (in = 0; in < mi->ninputs; in++, key_index++)
+ report_key(kp, key_index, out, in);
+ report_sync(kp);
+ }
+ if (!kp->use_irq || kp->some_keys_pressed) {
+ hrtimer_start(timer, mi->poll_time, HRTIMER_MODE_REL);
+ return HRTIMER_NORESTART;
+ }
+
+ /* No keys are pressed, reenable interrupt */
+ for (out = 0; out < mi->noutputs; out++) {
+ if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE)
+ gpio_set_value(mi->output_gpios[out], polarity);
+ else
+ gpio_direction_output(mi->output_gpios[out], polarity);
+ }
+ for (in = 0; in < mi->ninputs; in++)
+ enable_irq(gpio_to_irq(mi->input_gpios[in]));
+ __pm_relax(&kp->wake_src);
+ return HRTIMER_NORESTART;
+}
+
+static irqreturn_t gpio_keypad_irq_handler(int irq_in, void *dev_id)
+{
+ int i;
+ struct gpio_kp *kp = dev_id;
+ struct gpio_event_matrix_info *mi = kp->keypad_info;
+ unsigned gpio_keypad_flags = mi->flags;
+
+ if (!kp->use_irq) {
+ /* ignore interrupt while registering the handler */
+ kp->disabled_irq = 1;
+ disable_irq_nosync(irq_in);
+ return IRQ_HANDLED;
+ }
+
+ for (i = 0; i < mi->ninputs; i++)
+ disable_irq_nosync(gpio_to_irq(mi->input_gpios[i]));
+ for (i = 0; i < mi->noutputs; i++) {
+ if (gpio_keypad_flags & GPIOKPF_DRIVE_INACTIVE)
+ gpio_set_value(mi->output_gpios[i],
+ !(gpio_keypad_flags & GPIOKPF_ACTIVE_HIGH));
+ else
+ gpio_direction_input(mi->output_gpios[i]);
+ }
+ __pm_stay_awake(&kp->wake_src);
+ hrtimer_start(&kp->timer, ktime_set(0, 0), HRTIMER_MODE_REL);
+ return IRQ_HANDLED;
+}
+
+static int gpio_keypad_request_irqs(struct gpio_kp *kp)
+{
+ int i;
+ int err;
+ unsigned int irq;
+ unsigned long request_flags;
+ struct gpio_event_matrix_info *mi = kp->keypad_info;
+
+ switch (mi->flags & (GPIOKPF_ACTIVE_HIGH|GPIOKPF_LEVEL_TRIGGERED_IRQ)) {
+ default:
+ request_flags = IRQF_TRIGGER_FALLING;
+ break;
+ case GPIOKPF_ACTIVE_HIGH:
+ request_flags = IRQF_TRIGGER_RISING;
+ break;
+ case GPIOKPF_LEVEL_TRIGGERED_IRQ:
+ request_flags = IRQF_TRIGGER_LOW;
+ break;
+ case GPIOKPF_LEVEL_TRIGGERED_IRQ | GPIOKPF_ACTIVE_HIGH:
+ request_flags = IRQF_TRIGGER_HIGH;
+ break;
+ }
+
+ for (i = 0; i < mi->ninputs; i++) {
+ err = irq = gpio_to_irq(mi->input_gpios[i]);
+ if (err < 0)
+ goto err_gpio_get_irq_num_failed;
+ err = request_irq(irq, gpio_keypad_irq_handler, request_flags,
+ "gpio_kp", kp);
+ if (err) {
+ pr_err("gpiomatrix: request_irq failed for input %d, "
+ "irq %d\n", mi->input_gpios[i], irq);
+ goto err_request_irq_failed;
+ }
+ err = enable_irq_wake(irq);
+ if (err) {
+ pr_err("gpiomatrix: set_irq_wake failed for input %d, "
+ "irq %d\n", mi->input_gpios[i], irq);
+ }
+ disable_irq(irq);
+ if (kp->disabled_irq) {
+ kp->disabled_irq = 0;
+ enable_irq(irq);
+ }
+ }
+ return 0;
+
+ for (i = mi->noutputs - 1; i >= 0; i--) {
+ free_irq(gpio_to_irq(mi->input_gpios[i]), kp);
+err_request_irq_failed:
+err_gpio_get_irq_num_failed:
+ ;
+ }
+ return err;
+}
+
+int gpio_event_matrix_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func)
+{
+ int i;
+ int err;
+ int key_count;
+ struct gpio_kp *kp;
+ struct gpio_event_matrix_info *mi;
+
+ mi = container_of(info, struct gpio_event_matrix_info, info);
+ if (func == GPIO_EVENT_FUNC_SUSPEND || func == GPIO_EVENT_FUNC_RESUME) {
+ /* TODO: disable scanning */
+ return 0;
+ }
+
+ if (func == GPIO_EVENT_FUNC_INIT) {
+ if (mi->keymap == NULL ||
+ mi->input_gpios == NULL ||
+ mi->output_gpios == NULL) {
+ err = -ENODEV;
+ pr_err("gpiomatrix: Incomplete pdata\n");
+ goto err_invalid_platform_data;
+ }
+ key_count = mi->ninputs * mi->noutputs;
+
+ *data = kp = kzalloc(sizeof(*kp) + sizeof(kp->keys_pressed[0]) *
+ BITS_TO_LONGS(key_count), GFP_KERNEL);
+ if (kp == NULL) {
+ err = -ENOMEM;
+ pr_err("gpiomatrix: Failed to allocate private data\n");
+ goto err_kp_alloc_failed;
+ }
+ kp->input_devs = input_devs;
+ kp->keypad_info = mi;
+ for (i = 0; i < key_count; i++) {
+ unsigned short keyentry = mi->keymap[i];
+ unsigned short keycode = keyentry & MATRIX_KEY_MASK;
+ unsigned short dev = keyentry >> MATRIX_CODE_BITS;
+ if (dev >= input_devs->count) {
+ pr_err("gpiomatrix: bad device index %d >= "
+ "%d for key code %d\n",
+ dev, input_devs->count, keycode);
+ err = -EINVAL;
+ goto err_bad_keymap;
+ }
+ if (keycode && keycode <= KEY_MAX)
+ input_set_capability(input_devs->dev[dev],
+ EV_KEY, keycode);
+ }
+
+ for (i = 0; i < mi->noutputs; i++) {
+ err = gpio_request(mi->output_gpios[i], "gpio_kp_out");
+ if (err) {
+ pr_err("gpiomatrix: gpio_request failed for "
+ "output %d\n", mi->output_gpios[i]);
+ goto err_request_output_gpio_failed;
+ }
+ if (gpio_cansleep(mi->output_gpios[i])) {
+ pr_err("gpiomatrix: unsupported output gpio %d,"
+ " can sleep\n", mi->output_gpios[i]);
+ err = -EINVAL;
+ goto err_output_gpio_configure_failed;
+ }
+ if (mi->flags & GPIOKPF_DRIVE_INACTIVE)
+ err = gpio_direction_output(mi->output_gpios[i],
+ !(mi->flags & GPIOKPF_ACTIVE_HIGH));
+ else
+ err = gpio_direction_input(mi->output_gpios[i]);
+ if (err) {
+ pr_err("gpiomatrix: gpio_configure failed for "
+ "output %d\n", mi->output_gpios[i]);
+ goto err_output_gpio_configure_failed;
+ }
+ }
+ for (i = 0; i < mi->ninputs; i++) {
+ err = gpio_request(mi->input_gpios[i], "gpio_kp_in");
+ if (err) {
+ pr_err("gpiomatrix: gpio_request failed for "
+ "input %d\n", mi->input_gpios[i]);
+ goto err_request_input_gpio_failed;
+ }
+ err = gpio_direction_input(mi->input_gpios[i]);
+ if (err) {
+ pr_err("gpiomatrix: gpio_direction_input failed"
+ " for input %d\n", mi->input_gpios[i]);
+ goto err_gpio_direction_input_failed;
+ }
+ }
+ kp->current_output = mi->noutputs;
+ kp->key_state_changed = 1;
+
+ hrtimer_init(&kp->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ kp->timer.function = gpio_keypad_timer_func;
+ wakeup_source_init(&kp->wake_src, "gpio_kp");
+ err = gpio_keypad_request_irqs(kp);
+ kp->use_irq = err == 0;
+
+ pr_info("GPIO Matrix Keypad Driver: Start keypad matrix for "
+ "%s%s in %s mode\n", input_devs->dev[0]->name,
+ (input_devs->count > 1) ? "..." : "",
+ kp->use_irq ? "interrupt" : "polling");
+
+ if (kp->use_irq)
+ __pm_stay_awake(&kp->wake_src);
+ hrtimer_start(&kp->timer, ktime_set(0, 0), HRTIMER_MODE_REL);
+
+ return 0;
+ }
+
+ err = 0;
+ kp = *data;
+
+ if (kp->use_irq)
+ for (i = mi->noutputs - 1; i >= 0; i--)
+ free_irq(gpio_to_irq(mi->input_gpios[i]), kp);
+
+ hrtimer_cancel(&kp->timer);
+ wakeup_source_trash(&kp->wake_src);
+ for (i = mi->noutputs - 1; i >= 0; i--) {
+err_gpio_direction_input_failed:
+ gpio_free(mi->input_gpios[i]);
+err_request_input_gpio_failed:
+ ;
+ }
+ for (i = mi->noutputs - 1; i >= 0; i--) {
+err_output_gpio_configure_failed:
+ gpio_free(mi->output_gpios[i]);
+err_request_output_gpio_failed:
+ ;
+ }
+err_bad_keymap:
+ kfree(kp);
+err_kp_alloc_failed:
+err_invalid_platform_data:
+ return err;
+}
diff --git a/drivers/input/misc/gpio_output.c b/drivers/input/misc/gpio_output.c
new file mode 100644
index 0000000..2aac2fa
--- /dev/null
+++ b/drivers/input/misc/gpio_output.c
@@ -0,0 +1,97 @@
+/* drivers/input/misc/gpio_output.c
+ *
+ * Copyright (C) 2007 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/gpio.h>
+#include <linux/gpio_event.h>
+
+int gpio_event_output_event(
+ struct gpio_event_input_devs *input_devs, struct gpio_event_info *info,
+ void **data, unsigned int dev, unsigned int type,
+ unsigned int code, int value)
+{
+ int i;
+ struct gpio_event_output_info *oi;
+ oi = container_of(info, struct gpio_event_output_info, info);
+ if (type != oi->type)
+ return 0;
+ if (!(oi->flags & GPIOEDF_ACTIVE_HIGH))
+ value = !value;
+ for (i = 0; i < oi->keymap_size; i++)
+ if (dev == oi->keymap[i].dev && code == oi->keymap[i].code)
+ gpio_set_value(oi->keymap[i].gpio, value);
+ return 0;
+}
+
+int gpio_event_output_func(
+ struct gpio_event_input_devs *input_devs, struct gpio_event_info *info,
+ void **data, int func)
+{
+ int ret;
+ int i;
+ struct gpio_event_output_info *oi;
+ oi = container_of(info, struct gpio_event_output_info, info);
+
+ if (func == GPIO_EVENT_FUNC_SUSPEND || func == GPIO_EVENT_FUNC_RESUME)
+ return 0;
+
+ if (func == GPIO_EVENT_FUNC_INIT) {
+ int output_level = !(oi->flags & GPIOEDF_ACTIVE_HIGH);
+
+ for (i = 0; i < oi->keymap_size; i++) {
+ int dev = oi->keymap[i].dev;
+ if (dev >= input_devs->count) {
+ pr_err("gpio_event_output_func: bad device "
+ "index %d >= %d for key code %d\n",
+ dev, input_devs->count,
+ oi->keymap[i].code);
+ ret = -EINVAL;
+ goto err_bad_keymap;
+ }
+ input_set_capability(input_devs->dev[dev], oi->type,
+ oi->keymap[i].code);
+ }
+
+ for (i = 0; i < oi->keymap_size; i++) {
+ ret = gpio_request(oi->keymap[i].gpio,
+ "gpio_event_output");
+ if (ret) {
+ pr_err("gpio_event_output_func: gpio_request "
+ "failed for %d\n", oi->keymap[i].gpio);
+ goto err_gpio_request_failed;
+ }
+ ret = gpio_direction_output(oi->keymap[i].gpio,
+ output_level);
+ if (ret) {
+ pr_err("gpio_event_output_func: "
+ "gpio_direction_output failed for %d\n",
+ oi->keymap[i].gpio);
+ goto err_gpio_direction_output_failed;
+ }
+ }
+ return 0;
+ }
+
+ ret = 0;
+ for (i = oi->keymap_size - 1; i >= 0; i--) {
+err_gpio_direction_output_failed:
+ gpio_free(oi->keymap[i].gpio);
+err_gpio_request_failed:
+ ;
+ }
+err_bad_keymap:
+ return ret;
+}
+
diff --git a/drivers/input/misc/keychord.c b/drivers/input/misc/keychord.c
new file mode 100644
index 0000000..791f285
--- /dev/null
+++ b/drivers/input/misc/keychord.c
@@ -0,0 +1,467 @@
+/*
+ * drivers/input/misc/keychord.c
+ *
+ * Copyright (C) 2008 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+*/
+
+#include <linux/poll.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/keychord.h>
+#include <linux/sched.h>
+
+#define KEYCHORD_NAME "keychord"
+#define BUFFER_SIZE 16
+
+MODULE_AUTHOR("Mike Lockwood <lockwood@android.com>");
+MODULE_DESCRIPTION("Key chord input driver");
+MODULE_SUPPORTED_DEVICE("keychord");
+MODULE_LICENSE("GPL");
+
+#define NEXT_KEYCHORD(kc) ((struct input_keychord *) \
+ ((char *)kc + sizeof(struct input_keychord) + \
+ kc->count * sizeof(kc->keycodes[0])))
+
+struct keychord_device {
+ struct input_handler input_handler;
+ int registered;
+
+ /* list of keychords to monitor */
+ struct input_keychord *keychords;
+ int keychord_count;
+
+ /* bitmask of keys contained in our keychords */
+ unsigned long keybit[BITS_TO_LONGS(KEY_CNT)];
+ /* current state of the keys */
+ unsigned long keystate[BITS_TO_LONGS(KEY_CNT)];
+ /* number of keys that are currently pressed */
+ int key_down;
+
+ /* second input_device_id is needed for null termination */
+ struct input_device_id device_ids[2];
+
+ spinlock_t lock;
+ wait_queue_head_t waitq;
+ unsigned char head;
+ unsigned char tail;
+ __u16 buff[BUFFER_SIZE];
+ /* Bit to serialize writes to this device */
+#define KEYCHORD_BUSY 0x01
+ unsigned long flags;
+ wait_queue_head_t write_waitq;
+};
+
+static int check_keychord(struct keychord_device *kdev,
+ struct input_keychord *keychord)
+{
+ int i;
+
+ if (keychord->count != kdev->key_down)
+ return 0;
+
+ for (i = 0; i < keychord->count; i++) {
+ if (!test_bit(keychord->keycodes[i], kdev->keystate))
+ return 0;
+ }
+
+ /* we have a match */
+ return 1;
+}
+
+static void keychord_event(struct input_handle *handle, unsigned int type,
+ unsigned int code, int value)
+{
+ struct keychord_device *kdev = handle->private;
+ struct input_keychord *keychord;
+ unsigned long flags;
+ int i, got_chord = 0;
+
+ if (type != EV_KEY || code >= KEY_MAX)
+ return;
+
+ spin_lock_irqsave(&kdev->lock, flags);
+ /* do nothing if key state did not change */
+ if (!test_bit(code, kdev->keystate) == !value)
+ goto done;
+ __change_bit(code, kdev->keystate);
+ if (value)
+ kdev->key_down++;
+ else
+ kdev->key_down--;
+
+ /* don't notify on key up */
+ if (!value)
+ goto done;
+ /* ignore this event if it is not one of the keys we are monitoring */
+ if (!test_bit(code, kdev->keybit))
+ goto done;
+
+ keychord = kdev->keychords;
+ if (!keychord)
+ goto done;
+
+ /* check to see if the keyboard state matches any keychords */
+ for (i = 0; i < kdev->keychord_count; i++) {
+ if (check_keychord(kdev, keychord)) {
+ kdev->buff[kdev->head] = keychord->id;
+ kdev->head = (kdev->head + 1) % BUFFER_SIZE;
+ got_chord = 1;
+ break;
+ }
+ /* skip to next keychord */
+ keychord = NEXT_KEYCHORD(keychord);
+ }
+
+done:
+ spin_unlock_irqrestore(&kdev->lock, flags);
+
+ if (got_chord) {
+ pr_info("keychord: got keychord id %d. Any tasks: %d\n",
+ keychord->id,
+ !list_empty_careful(&kdev->waitq.head));
+ wake_up_interruptible(&kdev->waitq);
+ }
+}
+
+static int keychord_connect(struct input_handler *handler,
+ struct input_dev *dev,
+ const struct input_device_id *id)
+{
+ int i, ret;
+ struct input_handle *handle;
+ struct keychord_device *kdev =
+ container_of(handler, struct keychord_device, input_handler);
+
+ /*
+ * ignore this input device if it does not contain any keycodes
+ * that we are monitoring
+ */
+ for (i = 0; i < KEY_MAX; i++) {
+ if (test_bit(i, kdev->keybit) && test_bit(i, dev->keybit))
+ break;
+ }
+ if (i == KEY_MAX)
+ return -ENODEV;
+
+ handle = kzalloc(sizeof(*handle), GFP_KERNEL);
+ if (!handle)
+ return -ENOMEM;
+
+ handle->dev = dev;
+ handle->handler = handler;
+ handle->name = KEYCHORD_NAME;
+ handle->private = kdev;
+
+ ret = input_register_handle(handle);
+ if (ret)
+ goto err_input_register_handle;
+
+ ret = input_open_device(handle);
+ if (ret)
+ goto err_input_open_device;
+
+ pr_info("keychord: using input dev %s for fevent\n", dev->name);
+ return 0;
+
+err_input_open_device:
+ input_unregister_handle(handle);
+err_input_register_handle:
+ kfree(handle);
+ return ret;
+}
+
+static void keychord_disconnect(struct input_handle *handle)
+{
+ input_close_device(handle);
+ input_unregister_handle(handle);
+ kfree(handle);
+}
+
+/*
+ * keychord_read is used to read keychord events from the driver
+ */
+static ssize_t keychord_read(struct file *file, char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct keychord_device *kdev = file->private_data;
+ __u16 id;
+ int retval;
+ unsigned long flags;
+
+ if (count < sizeof(id))
+ return -EINVAL;
+ count = sizeof(id);
+
+ if (kdev->head == kdev->tail && (file->f_flags & O_NONBLOCK))
+ return -EAGAIN;
+
+ retval = wait_event_interruptible(kdev->waitq,
+ kdev->head != kdev->tail);
+ if (retval)
+ return retval;
+
+ spin_lock_irqsave(&kdev->lock, flags);
+ /* pop a keychord ID off the queue */
+ id = kdev->buff[kdev->tail];
+ kdev->tail = (kdev->tail + 1) % BUFFER_SIZE;
+ spin_unlock_irqrestore(&kdev->lock, flags);
+
+ if (copy_to_user(buffer, &id, count))
+ return -EFAULT;
+
+ return count;
+}
+
+/*
+ * serializes writes on a device. can use mutex_lock_interruptible()
+ * for this particular use case as well - a matter of preference.
+ */
+static int
+keychord_write_lock(struct keychord_device *kdev)
+{
+ int ret;
+ unsigned long flags;
+
+ spin_lock_irqsave(&kdev->lock, flags);
+ while (kdev->flags & KEYCHORD_BUSY) {
+ spin_unlock_irqrestore(&kdev->lock, flags);
+ ret = wait_event_interruptible(kdev->write_waitq,
+ ((kdev->flags & KEYCHORD_BUSY) == 0));
+ if (ret)
+ return ret;
+ spin_lock_irqsave(&kdev->lock, flags);
+ }
+ kdev->flags |= KEYCHORD_BUSY;
+ spin_unlock_irqrestore(&kdev->lock, flags);
+ return 0;
+}
+
+static void
+keychord_write_unlock(struct keychord_device *kdev)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&kdev->lock, flags);
+ kdev->flags &= ~KEYCHORD_BUSY;
+ spin_unlock_irqrestore(&kdev->lock, flags);
+ wake_up_interruptible(&kdev->write_waitq);
+}
+
+/*
+ * keychord_write is used to configure the driver
+ */
+static ssize_t keychord_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *ppos)
+{
+ struct keychord_device *kdev = file->private_data;
+ struct input_keychord *keychords = 0;
+ struct input_keychord *keychord;
+ int ret, i, key;
+ unsigned long flags;
+ size_t resid = count;
+ size_t key_bytes;
+
+ if (count < sizeof(struct input_keychord) || count > PAGE_SIZE)
+ return -EINVAL;
+ keychords = kzalloc(count, GFP_KERNEL);
+ if (!keychords)
+ return -ENOMEM;
+
+ /* read list of keychords from userspace */
+ if (copy_from_user(keychords, buffer, count)) {
+ kfree(keychords);
+ return -EFAULT;
+ }
+
+ /*
+ * Serialize writes to this device to prevent various races.
+ * 1) writers racing here could do duplicate input_unregister_handler()
+ * calls, resulting in attempting to unlink a node from a list that
+ * does not exist.
+ * 2) writers racing here could do duplicate input_register_handler() calls
+ * below, resulting in a duplicate insertion of a node into the list.
+ * 3) a double kfree of keychords can occur (in the event that
+ * input_register_handler() fails below.
+ */
+ ret = keychord_write_lock(kdev);
+ if (ret) {
+ kfree(keychords);
+ return ret;
+ }
+
+ /* unregister handler before changing configuration */
+ if (kdev->registered) {
+ input_unregister_handler(&kdev->input_handler);
+ kdev->registered = 0;
+ }
+
+ spin_lock_irqsave(&kdev->lock, flags);
+ /* clear any existing configuration */
+ kfree(kdev->keychords);
+ kdev->keychords = 0;
+ kdev->keychord_count = 0;
+ kdev->key_down = 0;
+ memset(kdev->keybit, 0, sizeof(kdev->keybit));
+ memset(kdev->keystate, 0, sizeof(kdev->keystate));
+ kdev->head = kdev->tail = 0;
+
+ keychord = keychords;
+
+ while (resid > 0) {
+ /* Is the entire keychord entry header present ? */
+ if (resid < sizeof(struct input_keychord)) {
+ pr_err("keychord: Insufficient bytes present for header %zu\n",
+ resid);
+ goto err_unlock_return;
+ }
+ resid -= sizeof(struct input_keychord);
+ if (keychord->count <= 0) {
+ pr_err("keychord: invalid keycode count %d\n",
+ keychord->count);
+ goto err_unlock_return;
+ }
+ key_bytes = keychord->count * sizeof(keychord->keycodes[0]);
+ /* Do we have all the expected keycodes ? */
+ if (resid < key_bytes) {
+ pr_err("keychord: Insufficient bytes present for keycount %zu\n",
+ resid);
+ goto err_unlock_return;
+ }
+ resid -= key_bytes;
+
+ if (keychord->version != KEYCHORD_VERSION) {
+ pr_err("keychord: unsupported version %d\n",
+ keychord->version);
+ goto err_unlock_return;
+ }
+
+ /* keep track of the keys we are monitoring in keybit */
+ for (i = 0; i < keychord->count; i++) {
+ key = keychord->keycodes[i];
+ if (key < 0 || key >= KEY_CNT) {
+ pr_err("keychord: keycode %d out of range\n",
+ key);
+ goto err_unlock_return;
+ }
+ __set_bit(key, kdev->keybit);
+ }
+
+ kdev->keychord_count++;
+ keychord = NEXT_KEYCHORD(keychord);
+ }
+
+ kdev->keychords = keychords;
+ spin_unlock_irqrestore(&kdev->lock, flags);
+
+ ret = input_register_handler(&kdev->input_handler);
+ if (ret) {
+ kfree(keychords);
+ kdev->keychords = 0;
+ keychord_write_unlock(kdev);
+ return ret;
+ }
+ kdev->registered = 1;
+
+ keychord_write_unlock(kdev);
+
+ return count;
+
+err_unlock_return:
+ spin_unlock_irqrestore(&kdev->lock, flags);
+ kfree(keychords);
+ keychord_write_unlock(kdev);
+ return -EINVAL;
+}
+
+static unsigned int keychord_poll(struct file *file, poll_table *wait)
+{
+ struct keychord_device *kdev = file->private_data;
+
+ poll_wait(file, &kdev->waitq, wait);
+
+ if (kdev->head != kdev->tail)
+ return POLLIN | POLLRDNORM;
+
+ return 0;
+}
+
+static int keychord_open(struct inode *inode, struct file *file)
+{
+ struct keychord_device *kdev;
+
+ kdev = kzalloc(sizeof(struct keychord_device), GFP_KERNEL);
+ if (!kdev)
+ return -ENOMEM;
+
+ spin_lock_init(&kdev->lock);
+ init_waitqueue_head(&kdev->waitq);
+ init_waitqueue_head(&kdev->write_waitq);
+
+ kdev->input_handler.event = keychord_event;
+ kdev->input_handler.connect = keychord_connect;
+ kdev->input_handler.disconnect = keychord_disconnect;
+ kdev->input_handler.name = KEYCHORD_NAME;
+ kdev->input_handler.id_table = kdev->device_ids;
+
+ kdev->device_ids[0].flags = INPUT_DEVICE_ID_MATCH_EVBIT;
+ __set_bit(EV_KEY, kdev->device_ids[0].evbit);
+
+ file->private_data = kdev;
+
+ return 0;
+}
+
+static int keychord_release(struct inode *inode, struct file *file)
+{
+ struct keychord_device *kdev = file->private_data;
+
+ if (kdev->registered)
+ input_unregister_handler(&kdev->input_handler);
+ kfree(kdev->keychords);
+ kfree(kdev);
+
+ return 0;
+}
+
+static const struct file_operations keychord_fops = {
+ .owner = THIS_MODULE,
+ .open = keychord_open,
+ .release = keychord_release,
+ .read = keychord_read,
+ .write = keychord_write,
+ .poll = keychord_poll,
+};
+
+static struct miscdevice keychord_misc = {
+ .fops = &keychord_fops,
+ .name = KEYCHORD_NAME,
+ .minor = MISC_DYNAMIC_MINOR,
+};
+
+static int __init keychord_init(void)
+{
+ return misc_register(&keychord_misc);
+}
+
+static void __exit keychord_exit(void)
+{
+ misc_deregister(&keychord_misc);
+}
+
+module_init(keychord_init);
+module_exit(keychord_exit);
diff --git a/drivers/isdn/hardware/mISDN/avmfritz.c b/drivers/isdn/hardware/mISDN/avmfritz.c
index dce6632..ae2b266 100644
--- a/drivers/isdn/hardware/mISDN/avmfritz.c
+++ b/drivers/isdn/hardware/mISDN/avmfritz.c
@@ -156,7 +156,7 @@
}
static int
-set_debug(const char *val, struct kernel_param *kp)
+set_debug(const char *val, const struct kernel_param *kp)
{
int ret;
struct fritzcard *card;
diff --git a/drivers/isdn/hardware/mISDN/mISDNinfineon.c b/drivers/isdn/hardware/mISDN/mISDNinfineon.c
index d5bdbaf..1fc2906 100644
--- a/drivers/isdn/hardware/mISDN/mISDNinfineon.c
+++ b/drivers/isdn/hardware/mISDN/mISDNinfineon.c
@@ -244,7 +244,7 @@
}
static int
-set_debug(const char *val, struct kernel_param *kp)
+set_debug(const char *val, const struct kernel_param *kp)
{
int ret;
struct inf_hw *card;
diff --git a/drivers/isdn/hardware/mISDN/netjet.c b/drivers/isdn/hardware/mISDN/netjet.c
index 6a6d848..89d9ba8e 100644
--- a/drivers/isdn/hardware/mISDN/netjet.c
+++ b/drivers/isdn/hardware/mISDN/netjet.c
@@ -111,7 +111,7 @@
}
static int
-set_debug(const char *val, struct kernel_param *kp)
+set_debug(const char *val, const struct kernel_param *kp)
{
int ret;
struct tiger_hw *card;
diff --git a/drivers/isdn/hardware/mISDN/speedfax.c b/drivers/isdn/hardware/mISDN/speedfax.c
index 9815bb4..1f1446e 100644
--- a/drivers/isdn/hardware/mISDN/speedfax.c
+++ b/drivers/isdn/hardware/mISDN/speedfax.c
@@ -94,7 +94,7 @@
}
static int
-set_debug(const char *val, struct kernel_param *kp)
+set_debug(const char *val, const struct kernel_param *kp)
{
int ret;
struct sfax_hw *card;
diff --git a/drivers/isdn/hardware/mISDN/w6692.c b/drivers/isdn/hardware/mISDN/w6692.c
index d80072f..209036a 100644
--- a/drivers/isdn/hardware/mISDN/w6692.c
+++ b/drivers/isdn/hardware/mISDN/w6692.c
@@ -101,7 +101,7 @@
}
static int
-set_debug(const char *val, struct kernel_param *kp)
+set_debug(const char *val, const struct kernel_param *kp)
{
int ret;
struct w6692_hw *card;
diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
index 4a249ee..ec359e1 100644
--- a/drivers/md/Kconfig
+++ b/drivers/md/Kconfig
@@ -460,6 +460,21 @@
If unsure, say N.
+config DM_VERITY_HASH_PREFETCH_MIN_SIZE_128
+ bool "Prefetch size 128"
+
+config DM_VERITY_HASH_PREFETCH_MIN_SIZE
+ int "Verity hash prefetch minimum size"
+ depends on DM_VERITY
+ range 1 4096
+ default 128 if DM_VERITY_HASH_PREFETCH_MIN_SIZE_128
+ default 1
+ ---help---
+ This sets minimum number of hash blocks to prefetch for dm-verity.
+ For devices like eMMC, having larger prefetch size like 128 can improve
+ performance with increased memory consumption for keeping more hashes
+ in RAM.
+
config DM_VERITY_FEC
bool "Verity forward error correction support"
depends on DM_VERITY
@@ -540,4 +555,51 @@
If unsure, say N.
+config DM_VERITY_AVB
+ tristate "Support AVB specific verity error behavior"
+ depends on DM_VERITY
+ ---help---
+ Enables Android Verified Boot platform-specific error
+ behavior. In particular, it will modify the vbmeta partition
+ specified on the kernel command-line when non-transient error
+ occurs (followed by a panic).
+
+config DM_ANDROID_VERITY
+ bool "Android verity target support"
+ depends on BLK_DEV_DM=y
+ depends on DM_VERITY=y
+ depends on X509_CERTIFICATE_PARSER
+ depends on SYSTEM_TRUSTED_KEYRING
+ depends on CRYPTO_RSA
+ depends on KEYS
+ depends on ASYMMETRIC_KEY_TYPE
+ depends on ASYMMETRIC_PUBLIC_KEY_SUBTYPE
+ select DM_VERITY_HASH_PREFETCH_MIN_SIZE_128
+ ---help---
+ This device-mapper target is virtually a VERITY target. This
+ target is setup by reading the metadata contents piggybacked
+ to the actual data blocks in the block device. The signature
+ of the metadata contents are verified against the key included
+ in the system keyring. Upon success, the underlying verity
+ target is setup.
+
+config DM_ANDROID_VERITY_AT_MOST_ONCE_DEFAULT_ENABLED
+ bool "Verity will validate blocks at most once"
+ depends on DM_VERITY
+ ---help---
+ Default enables at_most_once option for dm-verity
+
+ Verify data blocks only the first time they are read from the
+ data device, rather than every time. This reduces the overhead
+ of dm-verity so that it can be used on systems that are memory
+ and/or CPU constrained. However, it provides a reduced level
+ of security because only offline tampering of the data device's
+ content will be detected, not online tampering.
+
+ Hash blocks are still verified each time they are read from the
+ hash device, since verification of hash blocks is less performance
+ critical than data blocks, and a hash block will not be verified
+ any more after all the data blocks it covers have been verified anyway.
+
+ If unsure, say N.
endif # MD
diff --git a/drivers/md/Makefile b/drivers/md/Makefile
index e94b6f9..f98717f 100644
--- a/drivers/md/Makefile
+++ b/drivers/md/Makefile
@@ -71,3 +71,11 @@
ifeq ($(CONFIG_DM_VERITY_FEC),y)
dm-verity-objs += dm-verity-fec.o
endif
+
+ifeq ($(CONFIG_DM_VERITY_AVB),y)
+dm-verity-objs += dm-verity-avb.o
+endif
+
+ifeq ($(CONFIG_DM_ANDROID_VERITY),y)
+dm-verity-objs += dm-android-verity.o
+endif
diff --git a/drivers/md/dm-android-verity.c b/drivers/md/dm-android-verity.c
new file mode 100644
index 0000000..20e0593
--- /dev/null
+++ b/drivers/md/dm-android-verity.c
@@ -0,0 +1,925 @@
+/*
+ * Copyright (C) 2015 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/buffer_head.h>
+#include <linux/debugfs.h>
+#include <linux/delay.h>
+#include <linux/device.h>
+#include <linux/device-mapper.h>
+#include <linux/errno.h>
+#include <linux/fs.h>
+#include <linux/fcntl.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/key.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/of.h>
+#include <linux/reboot.h>
+#include <linux/string.h>
+#include <linux/vmalloc.h>
+
+#include <asm/setup.h>
+#include <crypto/hash.h>
+#include <crypto/hash_info.h>
+#include <crypto/public_key.h>
+#include <crypto/sha.h>
+#include <keys/asymmetric-type.h>
+#include <keys/system_keyring.h>
+
+#include "dm-verity.h"
+#include "dm-android-verity.h"
+
+static char verifiedbootstate[VERITY_COMMANDLINE_PARAM_LENGTH];
+static char veritymode[VERITY_COMMANDLINE_PARAM_LENGTH];
+static char veritykeyid[VERITY_DEFAULT_KEY_ID_LENGTH];
+static char buildvariant[BUILD_VARIANT];
+
+static bool target_added;
+static bool verity_enabled = true;
+struct dentry *debug_dir;
+static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv);
+
+static struct target_type android_verity_target = {
+ .name = "android-verity",
+ .version = {1, 0, 0},
+ .module = THIS_MODULE,
+ .ctr = android_verity_ctr,
+ .dtr = verity_dtr,
+ .map = verity_map,
+ .status = verity_status,
+ .prepare_ioctl = verity_prepare_ioctl,
+ .iterate_devices = verity_iterate_devices,
+ .io_hints = verity_io_hints,
+};
+
+static int __init verified_boot_state_param(char *line)
+{
+ strlcpy(verifiedbootstate, line, sizeof(verifiedbootstate));
+ return 1;
+}
+
+__setup("androidboot.verifiedbootstate=", verified_boot_state_param);
+
+static int __init verity_mode_param(char *line)
+{
+ strlcpy(veritymode, line, sizeof(veritymode));
+ return 1;
+}
+
+__setup("androidboot.veritymode=", verity_mode_param);
+
+static int __init verity_keyid_param(char *line)
+{
+ strlcpy(veritykeyid, line, sizeof(veritykeyid));
+ return 1;
+}
+
+__setup("veritykeyid=", verity_keyid_param);
+
+static int __init verity_buildvariant(char *line)
+{
+ strlcpy(buildvariant, line, sizeof(buildvariant));
+ return 1;
+}
+
+__setup("buildvariant=", verity_buildvariant);
+
+static inline bool default_verity_key_id(void)
+{
+ return veritykeyid[0] != '\0';
+}
+
+static inline bool is_eng(void)
+{
+ static const char typeeng[] = "eng";
+
+ return !strncmp(buildvariant, typeeng, sizeof(typeeng));
+}
+
+static inline bool is_userdebug(void)
+{
+ static const char typeuserdebug[] = "userdebug";
+
+ return !strncmp(buildvariant, typeuserdebug, sizeof(typeuserdebug));
+}
+
+static inline bool is_unlocked(void)
+{
+ static const char unlocked[] = "orange";
+
+ return !strncmp(verifiedbootstate, unlocked, sizeof(unlocked));
+}
+
+static int read_block_dev(struct bio_read *payload, struct block_device *bdev,
+ sector_t offset, int length)
+{
+ struct bio *bio;
+ int err = 0, i;
+
+ payload->number_of_pages = DIV_ROUND_UP(length, PAGE_SIZE);
+
+ bio = bio_alloc(GFP_KERNEL, payload->number_of_pages);
+ if (!bio) {
+ DMERR("Error while allocating bio");
+ return -ENOMEM;
+ }
+
+ bio_set_dev(bio, bdev);
+ bio->bi_iter.bi_sector = offset;
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+
+ payload->page_io = kzalloc(sizeof(struct page *) *
+ payload->number_of_pages, GFP_KERNEL);
+ if (!payload->page_io) {
+ DMERR("page_io array alloc failed");
+ err = -ENOMEM;
+ goto free_bio;
+ }
+
+ for (i = 0; i < payload->number_of_pages; i++) {
+ payload->page_io[i] = alloc_page(GFP_KERNEL);
+ if (!payload->page_io[i]) {
+ DMERR("alloc_page failed");
+ err = -ENOMEM;
+ goto free_pages;
+ }
+ if (!bio_add_page(bio, payload->page_io[i], PAGE_SIZE, 0)) {
+ DMERR("bio_add_page error");
+ err = -EIO;
+ goto free_pages;
+ }
+ }
+
+ if (!submit_bio_wait(bio))
+ /* success */
+ goto free_bio;
+ DMERR("bio read failed");
+ err = -EIO;
+
+free_pages:
+ for (i = 0; i < payload->number_of_pages; i++)
+ if (payload->page_io[i])
+ __free_page(payload->page_io[i]);
+ kfree(payload->page_io);
+free_bio:
+ bio_put(bio);
+ return err;
+}
+
+static inline u64 fec_div_round_up(u64 x, u64 y)
+{
+ u64 remainder;
+
+ return div64_u64_rem(x, y, &remainder) +
+ (remainder > 0 ? 1 : 0);
+}
+
+static inline void populate_fec_metadata(struct fec_header *header,
+ struct fec_ecc_metadata *ecc)
+{
+ ecc->blocks = fec_div_round_up(le64_to_cpu(header->inp_size),
+ FEC_BLOCK_SIZE);
+ ecc->roots = le32_to_cpu(header->roots);
+ ecc->start = le64_to_cpu(header->inp_size);
+}
+
+static inline int validate_fec_header(struct fec_header *header, u64 offset)
+{
+ /* move offset to make the sanity check work for backup header
+ * as well. */
+ offset -= offset % FEC_BLOCK_SIZE;
+ if (le32_to_cpu(header->magic) != FEC_MAGIC ||
+ le32_to_cpu(header->version) != FEC_VERSION ||
+ le32_to_cpu(header->size) != sizeof(struct fec_header) ||
+ le32_to_cpu(header->roots) == 0 ||
+ le32_to_cpu(header->roots) >= FEC_RSM)
+ return -EINVAL;
+
+ return 0;
+}
+
+static int extract_fec_header(dev_t dev, struct fec_header *fec,
+ struct fec_ecc_metadata *ecc)
+{
+ u64 device_size;
+ struct bio_read payload;
+ int i, err = 0;
+ struct block_device *bdev;
+
+ bdev = blkdev_get_by_dev(dev, FMODE_READ, NULL);
+
+ if (IS_ERR_OR_NULL(bdev)) {
+ DMERR("bdev get error");
+ return PTR_ERR(bdev);
+ }
+
+ device_size = i_size_read(bdev->bd_inode);
+
+ /* fec metadata size is a power of 2 and PAGE_SIZE
+ * is a power of 2 as well.
+ */
+ BUG_ON(FEC_BLOCK_SIZE > PAGE_SIZE);
+ /* 512 byte sector alignment */
+ BUG_ON(((device_size - FEC_BLOCK_SIZE) % (1 << SECTOR_SHIFT)) != 0);
+
+ err = read_block_dev(&payload, bdev, (device_size -
+ FEC_BLOCK_SIZE) / (1 << SECTOR_SHIFT), FEC_BLOCK_SIZE);
+ if (err) {
+ DMERR("Error while reading verity metadata");
+ goto error;
+ }
+
+ BUG_ON(sizeof(struct fec_header) > PAGE_SIZE);
+ memcpy(fec, page_address(payload.page_io[0]),
+ sizeof(*fec));
+
+ ecc->valid = true;
+ if (validate_fec_header(fec, device_size - FEC_BLOCK_SIZE)) {
+ /* Try the backup header */
+ memcpy(fec, page_address(payload.page_io[0]) + FEC_BLOCK_SIZE
+ - sizeof(*fec) ,
+ sizeof(*fec));
+ if (validate_fec_header(fec, device_size -
+ sizeof(struct fec_header)))
+ ecc->valid = false;
+ }
+
+ if (ecc->valid)
+ populate_fec_metadata(fec, ecc);
+
+ for (i = 0; i < payload.number_of_pages; i++)
+ __free_page(payload.page_io[i]);
+ kfree(payload.page_io);
+
+error:
+ blkdev_put(bdev, FMODE_READ);
+ return err;
+}
+static void find_metadata_offset(struct fec_header *fec,
+ struct block_device *bdev, u64 *metadata_offset)
+{
+ u64 device_size;
+
+ device_size = i_size_read(bdev->bd_inode);
+
+ if (le32_to_cpu(fec->magic) == FEC_MAGIC)
+ *metadata_offset = le64_to_cpu(fec->inp_size) -
+ VERITY_METADATA_SIZE;
+ else
+ *metadata_offset = device_size - VERITY_METADATA_SIZE;
+}
+
+static int find_size(dev_t dev, u64 *device_size)
+{
+ struct block_device *bdev;
+
+ bdev = blkdev_get_by_dev(dev, FMODE_READ, NULL);
+ if (IS_ERR_OR_NULL(bdev)) {
+ DMERR("blkdev_get_by_dev failed");
+ return PTR_ERR(bdev);
+ }
+
+ *device_size = i_size_read(bdev->bd_inode);
+ *device_size >>= SECTOR_SHIFT;
+
+ DMINFO("blkdev size in sectors: %llu", *device_size);
+ blkdev_put(bdev, FMODE_READ);
+ return 0;
+}
+
+static int verify_header(struct android_metadata_header *header)
+{
+ int retval = -EINVAL;
+
+ if (is_userdebug() && le32_to_cpu(header->magic_number) ==
+ VERITY_METADATA_MAGIC_DISABLE)
+ return VERITY_STATE_DISABLE;
+
+ if (!(le32_to_cpu(header->magic_number) ==
+ VERITY_METADATA_MAGIC_NUMBER) ||
+ (le32_to_cpu(header->magic_number) ==
+ VERITY_METADATA_MAGIC_DISABLE)) {
+ DMERR("Incorrect magic number");
+ return retval;
+ }
+
+ if (le32_to_cpu(header->protocol_version) !=
+ VERITY_METADATA_VERSION) {
+ DMERR("Unsupported version %u",
+ le32_to_cpu(header->protocol_version));
+ return retval;
+ }
+
+ return 0;
+}
+
+static int extract_metadata(dev_t dev, struct fec_header *fec,
+ struct android_metadata **metadata,
+ bool *verity_enabled)
+{
+ struct block_device *bdev;
+ struct android_metadata_header *header;
+ int i;
+ u32 table_length, copy_length, offset;
+ u64 metadata_offset;
+ struct bio_read payload;
+ int err = 0;
+
+ bdev = blkdev_get_by_dev(dev, FMODE_READ, NULL);
+
+ if (IS_ERR_OR_NULL(bdev)) {
+ DMERR("blkdev_get_by_dev failed");
+ return -ENODEV;
+ }
+
+ find_metadata_offset(fec, bdev, &metadata_offset);
+
+ /* Verity metadata size is a power of 2 and PAGE_SIZE
+ * is a power of 2 as well.
+ * PAGE_SIZE is also a multiple of 512 bytes.
+ */
+ if (VERITY_METADATA_SIZE > PAGE_SIZE)
+ BUG_ON(VERITY_METADATA_SIZE % PAGE_SIZE != 0);
+ /* 512 byte sector alignment */
+ BUG_ON(metadata_offset % (1 << SECTOR_SHIFT) != 0);
+
+ err = read_block_dev(&payload, bdev, metadata_offset /
+ (1 << SECTOR_SHIFT), VERITY_METADATA_SIZE);
+ if (err) {
+ DMERR("Error while reading verity metadata");
+ goto blkdev_release;
+ }
+
+ header = kzalloc(sizeof(*header), GFP_KERNEL);
+ if (!header) {
+ DMERR("kzalloc failed for header");
+ err = -ENOMEM;
+ goto free_payload;
+ }
+
+ memcpy(header, page_address(payload.page_io[0]),
+ sizeof(*header));
+
+ DMINFO("bio magic_number:%u protocol_version:%d table_length:%u",
+ le32_to_cpu(header->magic_number),
+ le32_to_cpu(header->protocol_version),
+ le32_to_cpu(header->table_length));
+
+ err = verify_header(header);
+
+ if (err == VERITY_STATE_DISABLE) {
+ DMERR("Mounting root with verity disabled");
+ *verity_enabled = false;
+ /* we would still have to read the metadata to figure out
+ * the data blocks size. Or may be could map the entire
+ * partition similar to mounting the device.
+ *
+ * Reset error as well as the verity_enabled flag is changed.
+ */
+ err = 0;
+ } else if (err)
+ goto free_header;
+
+ *metadata = kzalloc(sizeof(**metadata), GFP_KERNEL);
+ if (!*metadata) {
+ DMERR("kzalloc for metadata failed");
+ err = -ENOMEM;
+ goto free_header;
+ }
+
+ (*metadata)->header = header;
+ table_length = le32_to_cpu(header->table_length);
+
+ if (table_length == 0 ||
+ table_length > (VERITY_METADATA_SIZE -
+ sizeof(struct android_metadata_header))) {
+ DMERR("table_length too long");
+ err = -EINVAL;
+ goto free_metadata;
+ }
+
+ (*metadata)->verity_table = kzalloc(table_length + 1, GFP_KERNEL);
+
+ if (!(*metadata)->verity_table) {
+ DMERR("kzalloc verity_table failed");
+ err = -ENOMEM;
+ goto free_metadata;
+ }
+
+ if (sizeof(struct android_metadata_header) +
+ table_length <= PAGE_SIZE) {
+ memcpy((*metadata)->verity_table,
+ page_address(payload.page_io[0])
+ + sizeof(struct android_metadata_header),
+ table_length);
+ } else {
+ copy_length = PAGE_SIZE -
+ sizeof(struct android_metadata_header);
+ memcpy((*metadata)->verity_table,
+ page_address(payload.page_io[0])
+ + sizeof(struct android_metadata_header),
+ copy_length);
+ table_length -= copy_length;
+ offset = copy_length;
+ i = 1;
+ while (table_length != 0) {
+ if (table_length > PAGE_SIZE) {
+ memcpy((*metadata)->verity_table + offset,
+ page_address(payload.page_io[i]),
+ PAGE_SIZE);
+ offset += PAGE_SIZE;
+ table_length -= PAGE_SIZE;
+ } else {
+ memcpy((*metadata)->verity_table + offset,
+ page_address(payload.page_io[i]),
+ table_length);
+ table_length = 0;
+ }
+ i++;
+ }
+ }
+ (*metadata)->verity_table[table_length] = '\0';
+
+ DMINFO("verity_table: %s", (*metadata)->verity_table);
+ goto free_payload;
+
+free_metadata:
+ kfree(*metadata);
+free_header:
+ kfree(header);
+free_payload:
+ for (i = 0; i < payload.number_of_pages; i++)
+ if (payload.page_io[i])
+ __free_page(payload.page_io[i]);
+ kfree(payload.page_io);
+blkdev_release:
+ blkdev_put(bdev, FMODE_READ);
+ return err;
+}
+
+/* helper functions to extract properties from dts */
+const char *find_dt_value(const char *name)
+{
+ struct device_node *firmware;
+ const char *value;
+
+ firmware = of_find_node_by_path("/firmware/android");
+ if (!firmware)
+ return NULL;
+ value = of_get_property(firmware, name, NULL);
+ of_node_put(firmware);
+
+ return value;
+}
+
+static int verity_mode(void)
+{
+ static const char enforcing[] = "enforcing";
+ static const char verified_mode_prop[] = "veritymode";
+ const char *value;
+
+ value = find_dt_value(verified_mode_prop);
+ if (!value)
+ value = veritymode;
+ if (!strncmp(value, enforcing, sizeof(enforcing) - 1))
+ return DM_VERITY_MODE_RESTART;
+
+ return DM_VERITY_MODE_EIO;
+}
+
+static void handle_error(void)
+{
+ int mode = verity_mode();
+ if (mode == DM_VERITY_MODE_RESTART) {
+ DMERR("triggering restart");
+ kernel_restart("dm-verity device corrupted");
+ } else {
+ DMERR("Mounting verity root failed");
+ }
+}
+
+static struct public_key_signature *table_make_digest(
+ enum hash_algo hash,
+ const void *table,
+ unsigned long table_len)
+{
+ struct public_key_signature *pks = NULL;
+ struct crypto_shash *tfm;
+ struct shash_desc *desc;
+ size_t digest_size, desc_size;
+ int ret;
+
+ /* Allocate the hashing algorithm we're going to need and find out how
+ * big the hash operational data will be.
+ */
+ tfm = crypto_alloc_shash(hash_algo_name[hash], 0, 0);
+ if (IS_ERR(tfm))
+ return ERR_CAST(tfm);
+
+ desc_size = crypto_shash_descsize(tfm) + sizeof(*desc);
+ digest_size = crypto_shash_digestsize(tfm);
+
+ /* We allocate the hash operational data storage on the end of out
+ * context data and the digest output buffer on the end of that.
+ */
+ ret = -ENOMEM;
+ pks = kzalloc(digest_size + sizeof(*pks) + desc_size, GFP_KERNEL);
+ if (!pks)
+ goto error;
+
+ pks->pkey_algo = "rsa";
+ pks->hash_algo = hash_algo_name[hash];
+ pks->digest = (u8 *)pks + sizeof(*pks) + desc_size;
+ pks->digest_size = digest_size;
+
+ desc = (struct shash_desc *)(pks + 1);
+ desc->tfm = tfm;
+ desc->flags = CRYPTO_TFM_REQ_MAY_SLEEP;
+
+ ret = crypto_shash_init(desc);
+ if (ret < 0)
+ goto error;
+
+ ret = crypto_shash_finup(desc, table, table_len, pks->digest);
+ if (ret < 0)
+ goto error;
+
+ crypto_free_shash(tfm);
+ return pks;
+
+error:
+ kfree(pks);
+ crypto_free_shash(tfm);
+ return ERR_PTR(ret);
+}
+
+
+static int verify_verity_signature(char *key_id,
+ struct android_metadata *metadata)
+{
+ struct public_key_signature *pks = NULL;
+ int retval = -EINVAL;
+
+ if (!key_id)
+ goto error;
+
+ pks = table_make_digest(HASH_ALGO_SHA256,
+ (const void *)metadata->verity_table,
+ le32_to_cpu(metadata->header->table_length));
+ if (IS_ERR(pks)) {
+ DMERR("hashing failed");
+ retval = PTR_ERR(pks);
+ pks = NULL;
+ goto error;
+ }
+
+ pks->s = kmemdup(&metadata->header->signature[0], RSANUMBYTES, GFP_KERNEL);
+ if (!pks->s) {
+ DMERR("Error allocating memory for signature");
+ goto error;
+ }
+ pks->s_size = RSANUMBYTES;
+
+ retval = verify_signature_one(pks, NULL, key_id);
+ kfree(pks->s);
+error:
+ kfree(pks);
+ return retval;
+}
+
+static inline bool test_mult_overflow(sector_t a, u32 b)
+{
+ sector_t r = (sector_t)~0ULL;
+
+ sector_div(r, b);
+ return a > r;
+}
+
+static int add_as_linear_device(struct dm_target *ti, char *dev)
+{
+ /*Move to linear mapping defines*/
+ char *linear_table_args[DM_LINEAR_ARGS] = {dev,
+ DM_LINEAR_TARGET_OFFSET};
+ int err = 0;
+
+ android_verity_target.dtr = dm_linear_dtr,
+ android_verity_target.map = dm_linear_map,
+ android_verity_target.status = dm_linear_status,
+ android_verity_target.end_io = dm_linear_end_io,
+ android_verity_target.prepare_ioctl = dm_linear_prepare_ioctl,
+ android_verity_target.iterate_devices = dm_linear_iterate_devices,
+ android_verity_target.direct_access = dm_linear_dax_direct_access,
+ android_verity_target.dax_copy_from_iter = dm_linear_dax_copy_from_iter,
+ android_verity_target.io_hints = NULL;
+
+ set_disk_ro(dm_disk(dm_table_get_md(ti->table)), 0);
+
+ err = dm_linear_ctr(ti, DM_LINEAR_ARGS, linear_table_args);
+
+ if (!err) {
+ DMINFO("Added android-verity as a linear target");
+ target_added = true;
+ } else
+ DMERR("Failed to add android-verity as linear target");
+
+ return err;
+}
+
+static int create_linear_device(struct dm_target *ti, dev_t dev,
+ char *target_device)
+{
+ u64 device_size = 0;
+ int err = find_size(dev, &device_size);
+
+ if (err) {
+ DMERR("error finding bdev size");
+ handle_error();
+ return err;
+ }
+
+ ti->len = device_size;
+ err = add_as_linear_device(ti, target_device);
+ if (err) {
+ handle_error();
+ return err;
+ }
+ verity_enabled = false;
+ return 0;
+}
+
+/*
+ * Target parameters:
+ * <key id> Key id of the public key in the system keyring.
+ * Verity metadata's signature would be verified against
+ * this. If the key id contains spaces, replace them
+ * with '#'.
+ * <block device> The block device for which dm-verity is being setup.
+ */
+static int android_verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
+{
+ dev_t uninitialized_var(dev);
+ struct android_metadata *metadata = NULL;
+ int err = 0, i, mode;
+ char *key_id = NULL, *table_ptr, dummy, *target_device;
+ char *verity_table_args[VERITY_TABLE_ARGS + 2 + VERITY_TABLE_OPT_FEC_ARGS];
+ /* One for specifying number of opt args and one for mode */
+ sector_t data_sectors;
+ u32 data_block_size;
+ unsigned int no_of_args = VERITY_TABLE_ARGS + 2 + VERITY_TABLE_OPT_FEC_ARGS;
+ struct fec_header uninitialized_var(fec);
+ struct fec_ecc_metadata uninitialized_var(ecc);
+ char buf[FEC_ARG_LENGTH], *buf_ptr;
+ unsigned long long tmpll;
+
+ if (argc == 1) {
+ /* Use the default keyid */
+ if (default_verity_key_id())
+ key_id = veritykeyid;
+ else if (!is_eng()) {
+ DMERR("veritykeyid= is not set");
+ handle_error();
+ return -EINVAL;
+ }
+ target_device = argv[0];
+ } else if (argc == 2) {
+ key_id = argv[0];
+ target_device = argv[1];
+ } else {
+ DMERR("Incorrect number of arguments");
+ handle_error();
+ return -EINVAL;
+ }
+
+ dev = name_to_dev_t(target_device);
+ if (!dev) {
+ DMERR("no dev found for %s", target_device);
+ handle_error();
+ return -EINVAL;
+ }
+
+ if (is_eng())
+ return create_linear_device(ti, dev, target_device);
+
+ strreplace(key_id, '#', ' ');
+
+ DMINFO("key:%s dev:%s", key_id, target_device);
+
+ if (extract_fec_header(dev, &fec, &ecc)) {
+ DMERR("Error while extracting fec header");
+ handle_error();
+ return -EINVAL;
+ }
+
+ err = extract_metadata(dev, &fec, &metadata, &verity_enabled);
+
+ if (err) {
+ /* Allow invalid metadata when the device is unlocked */
+ if (is_unlocked()) {
+ DMWARN("Allow invalid metadata when unlocked");
+ return create_linear_device(ti, dev, target_device);
+ }
+ DMERR("Error while extracting metadata");
+ handle_error();
+ goto free_metadata;
+ }
+
+ if (verity_enabled) {
+ err = verify_verity_signature(key_id, metadata);
+
+ if (err) {
+ DMERR("Signature verification failed");
+ handle_error();
+ goto free_metadata;
+ } else
+ DMINFO("Signature verification success");
+ }
+
+ table_ptr = metadata->verity_table;
+
+ for (i = 0; i < VERITY_TABLE_ARGS; i++) {
+ verity_table_args[i] = strsep(&table_ptr, " ");
+ if (verity_table_args[i] == NULL)
+ break;
+ }
+
+ if (i != VERITY_TABLE_ARGS) {
+ DMERR("Verity table not in the expected format");
+ err = -EINVAL;
+ handle_error();
+ goto free_metadata;
+ }
+
+ if (sscanf(verity_table_args[5], "%llu%c", &tmpll, &dummy)
+ != 1) {
+ DMERR("Verity table not in the expected format");
+ handle_error();
+ err = -EINVAL;
+ goto free_metadata;
+ }
+
+ if (tmpll > ULONG_MAX) {
+ DMERR("<num_data_blocks> too large. Forgot to turn on CONFIG_LBDAF?");
+ handle_error();
+ err = -EINVAL;
+ goto free_metadata;
+ }
+
+ data_sectors = tmpll;
+
+ if (sscanf(verity_table_args[3], "%u%c", &data_block_size, &dummy)
+ != 1) {
+ DMERR("Verity table not in the expected format");
+ handle_error();
+ err = -EINVAL;
+ goto free_metadata;
+ }
+
+ if (test_mult_overflow(data_sectors, data_block_size >>
+ SECTOR_SHIFT)) {
+ DMERR("data_sectors too large");
+ handle_error();
+ err = -EOVERFLOW;
+ goto free_metadata;
+ }
+
+ data_sectors *= data_block_size >> SECTOR_SHIFT;
+ DMINFO("Data sectors %llu", (unsigned long long)data_sectors);
+
+ /* update target length */
+ ti->len = data_sectors;
+
+ /* Setup linear target and free */
+ if (!verity_enabled) {
+ err = add_as_linear_device(ti, target_device);
+ goto free_metadata;
+ }
+
+ /*substitute data_dev and hash_dev*/
+ verity_table_args[1] = target_device;
+ verity_table_args[2] = target_device;
+
+ mode = verity_mode();
+
+ if (ecc.valid && IS_BUILTIN(CONFIG_DM_VERITY_FEC)) {
+ if (mode) {
+ err = snprintf(buf, FEC_ARG_LENGTH,
+ "%u %s " VERITY_TABLE_OPT_FEC_FORMAT,
+ 1 + VERITY_TABLE_OPT_FEC_ARGS,
+ mode == DM_VERITY_MODE_RESTART ?
+ VERITY_TABLE_OPT_RESTART :
+ VERITY_TABLE_OPT_LOGGING,
+ target_device,
+ ecc.start / FEC_BLOCK_SIZE, ecc.blocks,
+ ecc.roots);
+ } else {
+ err = snprintf(buf, FEC_ARG_LENGTH,
+ "%u " VERITY_TABLE_OPT_FEC_FORMAT,
+ VERITY_TABLE_OPT_FEC_ARGS, target_device,
+ ecc.start / FEC_BLOCK_SIZE, ecc.blocks,
+ ecc.roots);
+ }
+ } else if (mode) {
+ err = snprintf(buf, FEC_ARG_LENGTH,
+ "2 " VERITY_TABLE_OPT_IGNZERO " %s",
+ mode == DM_VERITY_MODE_RESTART ?
+ VERITY_TABLE_OPT_RESTART : VERITY_TABLE_OPT_LOGGING);
+ } else {
+ err = snprintf(buf, FEC_ARG_LENGTH, "1 %s",
+ "ignore_zero_blocks");
+ }
+
+ if (err < 0 || err >= FEC_ARG_LENGTH)
+ goto free_metadata;
+
+ buf_ptr = buf;
+
+ for (i = VERITY_TABLE_ARGS; i < (VERITY_TABLE_ARGS +
+ VERITY_TABLE_OPT_FEC_ARGS + 2); i++) {
+ verity_table_args[i] = strsep(&buf_ptr, " ");
+ if (verity_table_args[i] == NULL) {
+ no_of_args = i;
+ break;
+ }
+ }
+
+ err = verity_ctr(ti, no_of_args, verity_table_args);
+ if (err) {
+ DMERR("android-verity failed to create a verity target");
+ } else {
+ target_added = true;
+ DMINFO("android-verity created as verity target");
+ }
+
+free_metadata:
+ if (metadata) {
+ kfree(metadata->header);
+ kfree(metadata->verity_table);
+ }
+ kfree(metadata);
+ return err;
+}
+
+static int __init dm_android_verity_init(void)
+{
+ int r;
+ struct dentry *file;
+
+ r = dm_register_target(&android_verity_target);
+ if (r < 0)
+ DMERR("register failed %d", r);
+
+ /* Tracks the status of the last added target */
+ debug_dir = debugfs_create_dir("android_verity", NULL);
+
+ if (IS_ERR_OR_NULL(debug_dir)) {
+ DMERR("Cannot create android_verity debugfs directory: %ld",
+ PTR_ERR(debug_dir));
+ goto end;
+ }
+
+ file = debugfs_create_bool("target_added", S_IRUGO, debug_dir,
+ &target_added);
+
+ if (IS_ERR_OR_NULL(file)) {
+ DMERR("Cannot create android_verity debugfs directory: %ld",
+ PTR_ERR(debug_dir));
+ debugfs_remove_recursive(debug_dir);
+ goto end;
+ }
+
+ file = debugfs_create_bool("verity_enabled", S_IRUGO, debug_dir,
+ &verity_enabled);
+
+ if (IS_ERR_OR_NULL(file)) {
+ DMERR("Cannot create android_verity debugfs directory: %ld",
+ PTR_ERR(debug_dir));
+ debugfs_remove_recursive(debug_dir);
+ }
+
+end:
+ return r;
+}
+
+static void __exit dm_android_verity_exit(void)
+{
+ if (!IS_ERR_OR_NULL(debug_dir))
+ debugfs_remove_recursive(debug_dir);
+
+ dm_unregister_target(&android_verity_target);
+}
+
+module_init(dm_android_verity_init);
+module_exit(dm_android_verity_exit);
diff --git a/drivers/md/dm-android-verity.h b/drivers/md/dm-android-verity.h
new file mode 100644
index 0000000..ef406c1
--- /dev/null
+++ b/drivers/md/dm-android-verity.h
@@ -0,0 +1,128 @@
+/*
+ * Copyright (C) 2015 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef DM_ANDROID_VERITY_H
+#define DM_ANDROID_VERITY_H
+
+#include <crypto/sha.h>
+
+#define RSANUMBYTES 256
+#define VERITY_METADATA_MAGIC_NUMBER 0xb001b001
+#define VERITY_METADATA_MAGIC_DISABLE 0x46464f56
+#define VERITY_METADATA_VERSION 0
+#define VERITY_STATE_DISABLE 1
+#define DATA_BLOCK_SIZE (4 * 1024)
+#define VERITY_METADATA_SIZE (8 * DATA_BLOCK_SIZE)
+#define VERITY_TABLE_ARGS 10
+#define VERITY_COMMANDLINE_PARAM_LENGTH 20
+#define BUILD_VARIANT 20
+
+/*
+ * <subject>:<sha1-id> is the format for the identifier.
+ * subject can either be the Common Name(CN) + Organization Name(O) or
+ * just the CN if the it is prefixed with O
+ * From https://tools.ietf.org/html/rfc5280#appendix-A
+ * ub-organization-name-length INTEGER ::= 64
+ * ub-common-name-length INTEGER ::= 64
+ *
+ * http://lxr.free-electrons.com/source/crypto/asymmetric_keys/x509_cert_parser.c?v=3.9#L278
+ * ctx->o_size + 2 + ctx->cn_size + 1
+ * + 41 characters for ":" and sha1 id
+ * 64 + 2 + 64 + 1 + 1 + 40 (172)
+ * setting VERITY_DEFAULT_KEY_ID_LENGTH to 200 characters.
+ */
+#define VERITY_DEFAULT_KEY_ID_LENGTH 200
+
+#define FEC_MAGIC 0xFECFECFE
+#define FEC_BLOCK_SIZE (4 * 1024)
+#define FEC_VERSION 0
+#define FEC_RSM 255
+#define FEC_ARG_LENGTH 300
+
+#define VERITY_TABLE_OPT_RESTART "restart_on_corruption"
+#define VERITY_TABLE_OPT_LOGGING "ignore_corruption"
+#define VERITY_TABLE_OPT_IGNZERO "ignore_zero_blocks"
+
+#define VERITY_TABLE_OPT_FEC_FORMAT \
+ "use_fec_from_device %s fec_start %llu fec_blocks %llu fec_roots %u ignore_zero_blocks"
+#define VERITY_TABLE_OPT_FEC_ARGS 9
+
+#define VERITY_DEBUG 0
+
+#define DM_MSG_PREFIX "android-verity"
+
+#define DM_LINEAR_ARGS 2
+#define DM_LINEAR_TARGET_OFFSET "0"
+
+/*
+ * There can be two formats.
+ * if fec is present
+ * <data_blocks> <verity_tree> <verity_metdata_32K><fec_data><fec_data_4K>
+ * if fec is not present
+ * <data_blocks> <verity_tree> <verity_metdata_32K>
+ */
+struct fec_header {
+ __le32 magic;
+ __le32 version;
+ __le32 size;
+ __le32 roots;
+ __le32 fec_size;
+ __le64 inp_size;
+ u8 hash[SHA256_DIGEST_SIZE];
+} __attribute__((packed));
+
+struct android_metadata_header {
+ __le32 magic_number;
+ __le32 protocol_version;
+ char signature[RSANUMBYTES];
+ __le32 table_length;
+};
+
+struct android_metadata {
+ struct android_metadata_header *header;
+ char *verity_table;
+};
+
+struct fec_ecc_metadata {
+ bool valid;
+ u32 roots;
+ u64 blocks;
+ u64 rounds;
+ u64 start;
+};
+
+struct bio_read {
+ struct page **page_io;
+ int number_of_pages;
+};
+
+extern struct target_type linear_target;
+
+extern void dm_linear_dtr(struct dm_target *ti);
+extern int dm_linear_map(struct dm_target *ti, struct bio *bio);
+extern int dm_linear_end_io(struct dm_target *ti, struct bio *bio,
+ blk_status_t *error);
+extern void dm_linear_status(struct dm_target *ti, status_type_t type,
+ unsigned status_flags, char *result, unsigned maxlen);
+extern int dm_linear_prepare_ioctl(struct dm_target *ti,
+ struct block_device **bdev, fmode_t *mode);
+extern int dm_linear_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data);
+extern int dm_linear_ctr(struct dm_target *ti, unsigned int argc, char **argv);
+extern long dm_linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+ long nr_pages, void **kaddr,
+ pfn_t *pfn);
+extern size_t dm_linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
+ void *addr, size_t bytes, struct iov_iter *i);
+#endif /* DM_ANDROID_VERITY_H */
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index e52676f..4a94d51 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1992,6 +1992,45 @@
dm_hash_exit();
}
+
+/**
+ * dm_ioctl_export - Permanently export a mapped device via the ioctl interface
+ * @md: Pointer to mapped_device
+ * @name: Buffer (size DM_NAME_LEN) for name
+ * @uuid: Buffer (size DM_UUID_LEN) for uuid or NULL if not desired
+ */
+int dm_ioctl_export(struct mapped_device *md, const char *name,
+ const char *uuid)
+{
+ int r = 0;
+ struct hash_cell *hc;
+
+ if (!md) {
+ r = -ENXIO;
+ goto out;
+ }
+
+ /* The name and uuid can only be set once. */
+ mutex_lock(&dm_hash_cells_mutex);
+ hc = dm_get_mdptr(md);
+ mutex_unlock(&dm_hash_cells_mutex);
+ if (hc) {
+ DMERR("%s: already exported", dm_device_name(md));
+ r = -ENXIO;
+ goto out;
+ }
+
+ r = dm_hash_insert(name, uuid, md);
+ if (r) {
+ DMERR("%s: could not bind to '%s'", dm_device_name(md), name);
+ goto out;
+ }
+
+ /* Let udev know we've changed. */
+ dm_kobject_uevent(md, KOBJ_CHANGE, dm_get_event_nr(md));
+out:
+ return r;
+}
/**
* dm_copy_name_and_uuid - Copy mapped device name & uuid into supplied buffers
* @md: Pointer to mapped_device
diff --git a/drivers/md/dm-linear.c b/drivers/md/dm-linear.c
index d5f8eff..e6fd31b 100644
--- a/drivers/md/dm-linear.c
+++ b/drivers/md/dm-linear.c
@@ -26,7 +26,7 @@
/*
* Construct a linear mapping: <dev_path> <offset>
*/
-static int linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
+int dm_linear_ctr(struct dm_target *ti, unsigned int argc, char **argv)
{
struct linear_c *lc;
unsigned long long tmp;
@@ -69,7 +69,7 @@
return ret;
}
-static void linear_dtr(struct dm_target *ti)
+void dm_linear_dtr(struct dm_target *ti)
{
struct linear_c *lc = (struct linear_c *) ti->private;
@@ -94,14 +94,14 @@
linear_map_sector(ti, bio->bi_iter.bi_sector);
}
-static int linear_map(struct dm_target *ti, struct bio *bio)
+int dm_linear_map(struct dm_target *ti, struct bio *bio)
{
linear_map_bio(ti, bio);
return DM_MAPIO_REMAPPED;
}
-static int linear_end_io(struct dm_target *ti, struct bio *bio,
+int dm_linear_end_io(struct dm_target *ti, struct bio *bio,
blk_status_t *error)
{
struct linear_c *lc = ti->private;
@@ -111,8 +111,9 @@
return DM_ENDIO_DONE;
}
+EXPORT_SYMBOL_GPL(dm_linear_end_io);
-static void linear_status(struct dm_target *ti, status_type_t type,
+void dm_linear_status(struct dm_target *ti, status_type_t type,
unsigned status_flags, char *result, unsigned maxlen)
{
struct linear_c *lc = (struct linear_c *) ti->private;
@@ -129,7 +130,7 @@
}
}
-static int linear_prepare_ioctl(struct dm_target *ti,
+int dm_linear_prepare_ioctl(struct dm_target *ti,
struct block_device **bdev, fmode_t *mode)
{
struct linear_c *lc = (struct linear_c *) ti->private;
@@ -146,7 +147,7 @@
return 0;
}
-static int linear_iterate_devices(struct dm_target *ti,
+int dm_linear_iterate_devices(struct dm_target *ti,
iterate_devices_callout_fn fn, void *data)
{
struct linear_c *lc = ti->private;
@@ -154,7 +155,7 @@
return fn(ti, lc->dev, lc->start, ti->len, data);
}
-static long linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
+long dm_linear_dax_direct_access(struct dm_target *ti, pgoff_t pgoff,
long nr_pages, void **kaddr, pfn_t *pfn)
{
long ret;
@@ -169,8 +170,9 @@
return ret;
return dax_direct_access(dax_dev, pgoff, nr_pages, kaddr, pfn);
}
+EXPORT_SYMBOL_GPL(dm_linear_dax_direct_access);
-static size_t linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
+size_t dm_linear_dax_copy_from_iter(struct dm_target *ti, pgoff_t pgoff,
void *addr, size_t bytes, struct iov_iter *i)
{
struct linear_c *lc = ti->private;
@@ -183,21 +185,22 @@
return 0;
return dax_copy_from_iter(dax_dev, pgoff, addr, bytes, i);
}
+EXPORT_SYMBOL_GPL(dm_linear_dax_copy_from_iter);
static struct target_type linear_target = {
.name = "linear",
.version = {1, 4, 0},
.features = DM_TARGET_PASSES_INTEGRITY | DM_TARGET_ZONED_HM,
.module = THIS_MODULE,
- .ctr = linear_ctr,
- .dtr = linear_dtr,
- .map = linear_map,
- .end_io = linear_end_io,
- .status = linear_status,
- .prepare_ioctl = linear_prepare_ioctl,
- .iterate_devices = linear_iterate_devices,
- .direct_access = linear_dax_direct_access,
- .dax_copy_from_iter = linear_dax_copy_from_iter,
+ .ctr = dm_linear_ctr,
+ .dtr = dm_linear_dtr,
+ .map = dm_linear_map,
+ .status = dm_linear_status,
+ .end_io = dm_linear_end_io,
+ .prepare_ioctl = dm_linear_prepare_ioctl,
+ .iterate_devices = dm_linear_iterate_devices,
+ .direct_access = dm_linear_dax_direct_access,
+ .dax_copy_from_iter = dm_linear_dax_copy_from_iter,
};
int __init dm_linear_init(void)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f9cd813..f20a8dd 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -11,6 +11,7 @@
#include <linux/vmalloc.h>
#include <linux/blkdev.h>
#include <linux/namei.h>
+#include <linux/mount.h>
#include <linux/ctype.h>
#include <linux/string.h>
#include <linux/slab.h>
@@ -547,14 +548,14 @@
* On the other hand, dm-switch needs to process bulk data using messages and
* excessive use of GFP_NOIO could cause trouble.
*/
-static char **realloc_argv(unsigned *array_size, char **old_argv)
+static char **realloc_argv(unsigned *size, char **old_argv)
{
char **argv;
unsigned new_size;
gfp_t gfp;
- if (*array_size) {
- new_size = *array_size * 2;
+ if (*size) {
+ new_size = *size * 2;
gfp = GFP_KERNEL;
} else {
new_size = 8;
@@ -562,8 +563,8 @@
}
argv = kmalloc(new_size * sizeof(*argv), gfp);
if (argv) {
- memcpy(argv, old_argv, *array_size * sizeof(*argv));
- *array_size = new_size;
+ memcpy(argv, old_argv, *size * sizeof(*argv));
+ *size = new_size;
}
kfree(old_argv);
diff --git a/drivers/md/dm-verity-avb.c b/drivers/md/dm-verity-avb.c
new file mode 100644
index 0000000..a9f102a
--- /dev/null
+++ b/drivers/md/dm-verity-avb.c
@@ -0,0 +1,229 @@
+/*
+ * Copyright (C) 2017 Google.
+ *
+ * This file is released under the GPLv2.
+ *
+ * Based on drivers/md/dm-verity-chromeos.c
+ */
+
+#include <linux/device-mapper.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+
+#define DM_MSG_PREFIX "verity-avb"
+
+/* Set via module parameters. */
+static char avb_vbmeta_device[64];
+static char avb_invalidate_on_error[4];
+
+static void invalidate_vbmeta_endio(struct bio *bio)
+{
+ if (bio->bi_status)
+ DMERR("invalidate_vbmeta_endio: error %d", bio->bi_status);
+ complete(bio->bi_private);
+}
+
+static int invalidate_vbmeta_submit(struct bio *bio,
+ struct block_device *bdev,
+ int op, int access_last_sector,
+ struct page *page)
+{
+ DECLARE_COMPLETION_ONSTACK(wait);
+
+ bio->bi_private = &wait;
+ bio->bi_end_io = invalidate_vbmeta_endio;
+ bio_set_dev(bio, bdev);
+ bio_set_op_attrs(bio, op, REQ_SYNC);
+
+ bio->bi_iter.bi_sector = 0;
+ if (access_last_sector) {
+ sector_t last_sector;
+
+ last_sector = (i_size_read(bdev->bd_inode)>>SECTOR_SHIFT) - 1;
+ bio->bi_iter.bi_sector = last_sector;
+ }
+ if (!bio_add_page(bio, page, PAGE_SIZE, 0)) {
+ DMERR("invalidate_vbmeta_submit: bio_add_page error");
+ return -EIO;
+ }
+
+ submit_bio(bio);
+ /* Wait up to 2 seconds for completion or fail. */
+ if (!wait_for_completion_timeout(&wait, msecs_to_jiffies(2000)))
+ return -EIO;
+ return 0;
+}
+
+static int invalidate_vbmeta(dev_t vbmeta_devt)
+{
+ int ret = 0;
+ struct block_device *bdev;
+ struct bio *bio;
+ struct page *page;
+ fmode_t dev_mode;
+ /* Ensure we do synchronous unblocked I/O. We may also need
+ * sync_bdev() on completion, but it really shouldn't.
+ */
+ int access_last_sector = 0;
+
+ DMINFO("invalidate_vbmeta: acting on device %d:%d",
+ MAJOR(vbmeta_devt), MINOR(vbmeta_devt));
+
+ /* First we open the device for reading. */
+ dev_mode = FMODE_READ | FMODE_EXCL;
+ bdev = blkdev_get_by_dev(vbmeta_devt, dev_mode,
+ invalidate_vbmeta);
+ if (IS_ERR(bdev)) {
+ DMERR("invalidate_kernel: could not open device for reading");
+ dev_mode = 0;
+ ret = -ENOENT;
+ goto failed_to_read;
+ }
+
+ bio = bio_alloc(GFP_NOIO, 1);
+ if (!bio) {
+ ret = -ENOMEM;
+ goto failed_bio_alloc;
+ }
+
+ page = alloc_page(GFP_NOIO);
+ if (!page) {
+ ret = -ENOMEM;
+ goto failed_to_alloc_page;
+ }
+
+ access_last_sector = 0;
+ ret = invalidate_vbmeta_submit(bio, bdev, REQ_OP_READ,
+ access_last_sector, page);
+ if (ret) {
+ DMERR("invalidate_vbmeta: error reading");
+ goto failed_to_submit_read;
+ }
+
+ /* We have a page. Let's make sure it looks right. */
+ if (memcmp("AVB0", page_address(page), 4) == 0) {
+ /* Stamp it. */
+ memcpy(page_address(page), "AVE0", 4);
+ DMINFO("invalidate_vbmeta: found vbmeta partition");
+ } else {
+ /* Could be this is on a AVB footer, check. Also, since the
+ * AVB footer is in the last 64 bytes, adjust for the fact that
+ * we're dealing with 512-byte sectors.
+ */
+ size_t offset = (1<<SECTOR_SHIFT) - 64;
+
+ access_last_sector = 1;
+ ret = invalidate_vbmeta_submit(bio, bdev, REQ_OP_READ,
+ access_last_sector, page);
+ if (ret) {
+ DMERR("invalidate_vbmeta: error reading");
+ goto failed_to_submit_read;
+ }
+ if (memcmp("AVBf", page_address(page) + offset, 4) != 0) {
+ DMERR("invalidate_vbmeta on non-vbmeta partition");
+ ret = -EINVAL;
+ goto invalid_header;
+ }
+ /* Stamp it. */
+ memcpy(page_address(page) + offset, "AVE0", 4);
+ DMINFO("invalidate_vbmeta: found vbmeta footer partition");
+ }
+
+ /* Now rewrite the changed page - the block dev was being
+ * changed on read. Let's reopen here.
+ */
+ blkdev_put(bdev, dev_mode);
+ dev_mode = FMODE_WRITE | FMODE_EXCL;
+ bdev = blkdev_get_by_dev(vbmeta_devt, dev_mode,
+ invalidate_vbmeta);
+ if (IS_ERR(bdev)) {
+ DMERR("invalidate_vbmeta: could not open device for writing");
+ dev_mode = 0;
+ ret = -ENOENT;
+ goto failed_to_write;
+ }
+
+ /* We re-use the same bio to do the write after the read. Need to reset
+ * it to initialize bio->bi_remaining.
+ */
+ bio_reset(bio);
+
+ ret = invalidate_vbmeta_submit(bio, bdev, REQ_OP_WRITE,
+ access_last_sector, page);
+ if (ret) {
+ DMERR("invalidate_vbmeta: error writing");
+ goto failed_to_submit_write;
+ }
+
+ DMERR("invalidate_vbmeta: completed.");
+ ret = 0;
+failed_to_submit_write:
+failed_to_write:
+invalid_header:
+ __free_page(page);
+failed_to_submit_read:
+ /* Technically, we'll leak a page with the pending bio, but
+ * we're about to reboot anyway.
+ */
+failed_to_alloc_page:
+ bio_put(bio);
+failed_bio_alloc:
+ if (dev_mode)
+ blkdev_put(bdev, dev_mode);
+failed_to_read:
+ return ret;
+}
+
+void dm_verity_avb_error_handler(void)
+{
+ dev_t dev;
+
+ DMINFO("AVB error handler called for %s", avb_vbmeta_device);
+
+ if (strcmp(avb_invalidate_on_error, "yes") != 0) {
+ DMINFO("Not configured to invalidate");
+ return;
+ }
+
+ if (avb_vbmeta_device[0] == '\0') {
+ DMERR("avb_vbmeta_device parameter not set");
+ goto fail_no_dev;
+ }
+
+ dev = name_to_dev_t(avb_vbmeta_device);
+ if (!dev) {
+ DMERR("No matching partition for device: %s",
+ avb_vbmeta_device);
+ goto fail_no_dev;
+ }
+
+ invalidate_vbmeta(dev);
+
+fail_no_dev:
+ ;
+}
+
+static int __init dm_verity_avb_init(void)
+{
+ DMINFO("AVB error handler initialized with vbmeta device: %s",
+ avb_vbmeta_device);
+ return 0;
+}
+
+static void __exit dm_verity_avb_exit(void)
+{
+}
+
+module_init(dm_verity_avb_init);
+module_exit(dm_verity_avb_exit);
+
+MODULE_AUTHOR("David Zeuthen <zeuthen@google.com>");
+MODULE_DESCRIPTION("AVB-specific error handler for dm-verity");
+MODULE_LICENSE("GPL");
+
+/* Declare parameter with no module prefix */
+#undef MODULE_PARAM_PREFIX
+#define MODULE_PARAM_PREFIX "androidboot.vbmeta."
+module_param_string(device, avb_vbmeta_device, sizeof(avb_vbmeta_device), 0);
+module_param_string(invalidate_on_error, avb_invalidate_on_error,
+ sizeof(avb_invalidate_on_error), 0);
diff --git a/drivers/md/dm-verity-fec.c b/drivers/md/dm-verity-fec.c
index e13f908..776a4f7 100644
--- a/drivers/md/dm-verity-fec.c
+++ b/drivers/md/dm-verity-fec.c
@@ -11,6 +11,7 @@
#include "dm-verity-fec.h"
#include <linux/math64.h>
+#include <linux/sysfs.h>
#define DM_MSG_PREFIX "verity-fec"
@@ -175,9 +176,11 @@
if (r < 0 && neras)
DMERR_LIMIT("%s: FEC %llu: failed to correct: %d",
v->data_dev->name, (unsigned long long)rsb, r);
- else if (r > 0)
+ else if (r > 0) {
DMWARN_LIMIT("%s: FEC %llu: corrected %d errors",
v->data_dev->name, (unsigned long long)rsb, r);
+ atomic_add_unless(&v->fec->corrected, 1, INT_MAX);
+ }
return r;
}
@@ -545,6 +548,7 @@
void verity_fec_dtr(struct dm_verity *v)
{
struct dm_verity_fec *f = v->fec;
+ struct kobject *kobj = &f->kobj_holder.kobj;
if (!verity_fec_is_enabled(v))
goto out;
@@ -561,6 +565,12 @@
if (f->dev)
dm_put_device(v->ti, f->dev);
+
+ if (kobj->state_initialized) {
+ kobject_put(kobj);
+ wait_for_completion(dm_get_completion_from_kobject(kobj));
+ }
+
out:
kfree(f);
v->fec = NULL;
@@ -649,6 +659,28 @@
return 0;
}
+static ssize_t corrected_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ struct dm_verity_fec *f = container_of(kobj, struct dm_verity_fec,
+ kobj_holder.kobj);
+
+ return sprintf(buf, "%d\n", atomic_read(&f->corrected));
+}
+
+static struct kobj_attribute attr_corrected = __ATTR_RO(corrected);
+
+static struct attribute *fec_attrs[] = {
+ &attr_corrected.attr,
+ NULL
+};
+
+static struct kobj_type fec_ktype = {
+ .sysfs_ops = &kobj_sysfs_ops,
+ .default_attrs = fec_attrs,
+ .release = dm_kobject_release
+};
+
/*
* Allocate dm_verity_fec for v->fec. Must be called before verity_fec_ctr.
*/
@@ -672,8 +704,10 @@
*/
int verity_fec_ctr(struct dm_verity *v)
{
+ int r;
struct dm_verity_fec *f = v->fec;
struct dm_target *ti = v->ti;
+ struct mapped_device *md = dm_table_get_md(ti->table);
u64 hash_blocks;
if (!verity_fec_is_enabled(v)) {
@@ -681,6 +715,16 @@
return 0;
}
+ /* Create a kobject and sysfs attributes */
+ init_completion(&f->kobj_holder.completion);
+
+ r = kobject_init_and_add(&f->kobj_holder.kobj, &fec_ktype,
+ &disk_to_dev(dm_disk(md))->kobj, "%s", "fec");
+ if (r) {
+ ti->error = "Cannot create kobject";
+ return r;
+ }
+
/*
* FEC is computed over data blocks, possible metadata, and
* hash blocks. In other words, FEC covers total of fec_blocks
diff --git a/drivers/md/dm-verity-fec.h b/drivers/md/dm-verity-fec.h
index bb31ce8..4db0cae 100644
--- a/drivers/md/dm-verity-fec.h
+++ b/drivers/md/dm-verity-fec.h
@@ -12,6 +12,8 @@
#ifndef DM_VERITY_FEC_H
#define DM_VERITY_FEC_H
+#include "dm.h"
+#include "dm-core.h"
#include "dm-verity.h"
#include <linux/rslib.h>
@@ -51,6 +53,8 @@
mempool_t *extra_pool; /* mempool for extra buffers */
mempool_t *output_pool; /* mempool for output */
struct kmem_cache *cache; /* cache for buffers */
+ atomic_t corrected; /* corrected errors */
+ struct dm_kobject_holder kobj_holder; /* for sysfs attributes */
};
/* per-bio data */
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index bda3cac..be1e888 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -32,6 +32,7 @@
#define DM_VERITY_OPT_LOGGING "ignore_corruption"
#define DM_VERITY_OPT_RESTART "restart_on_corruption"
#define DM_VERITY_OPT_IGN_ZEROES "ignore_zero_blocks"
+#define DM_VERITY_OPT_AT_MOST_ONCE "check_at_most_once"
#define DM_VERITY_OPTS_MAX (2 + DM_VERITY_OPTS_FEC)
@@ -275,8 +276,12 @@
if (v->mode == DM_VERITY_MODE_LOGGING)
return 0;
- if (v->mode == DM_VERITY_MODE_RESTART)
+ if (v->mode == DM_VERITY_MODE_RESTART) {
+#ifdef CONFIG_DM_VERITY_AVB
+ dm_verity_avb_error_handler();
+#endif
kernel_restart("dm-verity device corrupted");
+ }
return 1;
}
@@ -474,6 +479,18 @@
}
/*
+ * Moves the bio iter one data block forward.
+ */
+static inline void verity_bv_skip_block(struct dm_verity *v,
+ struct dm_verity_io *io,
+ struct bvec_iter *iter)
+{
+ struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
+
+ bio_advance_iter(bio, iter, 1 << v->data_dev_block_bits);
+}
+
+/*
* Verify one "dm_verity_io" structure.
*/
static int verity_verify_io(struct dm_verity_io *io)
@@ -486,9 +503,16 @@
for (b = 0; b < io->n_blocks; b++) {
int r;
+ sector_t cur_block = io->block + b;
struct ahash_request *req = verity_io_hash_req(v, io);
- r = verity_hash_for_block(v, io, io->block + b,
+ if (v->validated_blocks &&
+ likely(test_bit(cur_block, v->validated_blocks))) {
+ verity_bv_skip_block(v, io, &io->iter);
+ continue;
+ }
+
+ r = verity_hash_for_block(v, io, cur_block,
verity_io_want_digest(v, io),
&is_zero);
if (unlikely(r < 0))
@@ -522,13 +546,16 @@
return r;
if (likely(memcmp(verity_io_real_digest(v, io),
- verity_io_want_digest(v, io), v->digest_size) == 0))
+ verity_io_want_digest(v, io), v->digest_size) == 0)) {
+ if (v->validated_blocks)
+ set_bit(cur_block, v->validated_blocks);
continue;
+ }
else if (verity_fec_decode(v, io, DM_VERITY_BLOCK_TYPE_DATA,
- io->block + b, NULL, &start) == 0)
+ cur_block, NULL, &start) == 0)
continue;
else if (verity_handle_err(v, DM_VERITY_BLOCK_TYPE_DATA,
- io->block + b))
+ cur_block))
return -EIO;
}
@@ -582,6 +609,7 @@
container_of(work, struct dm_verity_prefetch_work, work);
struct dm_verity *v = pw->v;
int i;
+ sector_t prefetch_size;
for (i = v->levels - 2; i >= 0; i--) {
sector_t hash_block_start;
@@ -604,8 +632,14 @@
hash_block_end = v->hash_blocks - 1;
}
no_prefetch_cluster:
+ // for emmc, it is more efficient to send bigger read
+ prefetch_size = max((sector_t)CONFIG_DM_VERITY_HASH_PREFETCH_MIN_SIZE,
+ hash_block_end - hash_block_start + 1);
+ if ((hash_block_start + prefetch_size) >= (v->hash_start + v->hash_blocks)) {
+ prefetch_size = hash_block_end - hash_block_start + 1;
+ }
dm_bufio_prefetch(v->bufio, hash_block_start,
- hash_block_end - hash_block_start + 1);
+ prefetch_size);
}
kfree(pw);
@@ -632,7 +666,7 @@
* Bio map function. It allocates dm_verity_io structure and bio vector and
* fills them. Then it issues prefetches and the I/O.
*/
-static int verity_map(struct dm_target *ti, struct bio *bio)
+int verity_map(struct dm_target *ti, struct bio *bio)
{
struct dm_verity *v = ti->private;
struct dm_verity_io *io;
@@ -677,7 +711,7 @@
/*
* Status: V (valid) or C (corruption found)
*/
-static void verity_status(struct dm_target *ti, status_type_t type,
+void verity_status(struct dm_target *ti, status_type_t type,
unsigned status_flags, char *result, unsigned maxlen)
{
struct dm_verity *v = ti->private;
@@ -714,6 +748,8 @@
args += DM_VERITY_OPTS_FEC;
if (v->zero_digest)
args++;
+ if (v->validated_blocks)
+ args++;
if (!args)
return;
DMEMIT(" %u", args);
@@ -732,12 +768,14 @@
}
if (v->zero_digest)
DMEMIT(" " DM_VERITY_OPT_IGN_ZEROES);
+ if (v->validated_blocks)
+ DMEMIT(" " DM_VERITY_OPT_AT_MOST_ONCE);
sz = verity_fec_status_table(v, sz, result, maxlen);
break;
}
}
-static int verity_prepare_ioctl(struct dm_target *ti,
+int verity_prepare_ioctl(struct dm_target *ti,
struct block_device **bdev, fmode_t *mode)
{
struct dm_verity *v = ti->private;
@@ -750,7 +788,7 @@
return 0;
}
-static int verity_iterate_devices(struct dm_target *ti,
+int verity_iterate_devices(struct dm_target *ti,
iterate_devices_callout_fn fn, void *data)
{
struct dm_verity *v = ti->private;
@@ -758,7 +796,7 @@
return fn(ti, v->data_dev, v->data_start, ti->len, data);
}
-static void verity_io_hints(struct dm_target *ti, struct queue_limits *limits)
+void verity_io_hints(struct dm_target *ti, struct queue_limits *limits)
{
struct dm_verity *v = ti->private;
@@ -771,7 +809,7 @@
blk_limits_io_min(limits, limits->logical_block_size);
}
-static void verity_dtr(struct dm_target *ti)
+void verity_dtr(struct dm_target *ti)
{
struct dm_verity *v = ti->private;
@@ -781,6 +819,7 @@
if (v->bufio)
dm_bufio_client_destroy(v->bufio);
+ kvfree(v->validated_blocks);
kfree(v->salt);
kfree(v->root_digest);
kfree(v->zero_digest);
@@ -801,6 +840,26 @@
kfree(v);
}
+static int verity_alloc_most_once(struct dm_verity *v)
+{
+ struct dm_target *ti = v->ti;
+
+ /* the bitset can only handle INT_MAX blocks */
+ if (v->data_blocks > INT_MAX) {
+ ti->error = "device too large to use check_at_most_once";
+ return -E2BIG;
+ }
+
+ v->validated_blocks = kvzalloc(BITS_TO_LONGS(v->data_blocks) *
+ sizeof(unsigned long), GFP_KERNEL);
+ if (!v->validated_blocks) {
+ ti->error = "failed to allocate bitset for check_at_most_once";
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
static int verity_alloc_zero_digest(struct dm_verity *v)
{
int r = -ENOMEM;
@@ -870,6 +929,12 @@
}
continue;
+ } else if (!strcasecmp(arg_name, DM_VERITY_OPT_AT_MOST_ONCE)) {
+ r = verity_alloc_most_once(v);
+ if (r)
+ return r;
+ continue;
+
} else if (verity_is_fec_opt_arg(arg_name)) {
r = verity_fec_parse_opt_args(as, v, &argc, arg_name);
if (r)
@@ -898,7 +963,7 @@
* <digest>
* <salt> Hex string or "-" if no salt.
*/
-static int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
+int verity_ctr(struct dm_target *ti, unsigned argc, char **argv)
{
struct dm_verity *v;
struct dm_arg_set as;
@@ -1062,6 +1127,14 @@
goto bad;
}
+#ifdef CONFIG_DM_ANDROID_VERITY_AT_MOST_ONCE_DEFAULT_ENABLED
+ if (!v->validated_blocks) {
+ r = verity_alloc_most_once(v);
+ if (r)
+ goto bad;
+ }
+#endif
+
v->hash_per_block_bits =
__fls((1 << v->hash_dev_block_bits) / v->digest_size);
@@ -1137,7 +1210,7 @@
static struct target_type verity_target = {
.name = "verity",
- .version = {1, 3, 0},
+ .version = {1, 4, 0},
.module = THIS_MODULE,
.ctr = verity_ctr,
.dtr = verity_dtr,
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
index a59e0ad..116e91f 100644
--- a/drivers/md/dm-verity.h
+++ b/drivers/md/dm-verity.h
@@ -63,6 +63,7 @@
sector_t hash_level_block[DM_VERITY_MAX_LEVELS];
struct dm_verity_fec *fec; /* forward error correction */
+ unsigned long *validated_blocks; /* bitset blocks validated */
};
struct dm_verity_io {
@@ -131,4 +132,15 @@
extern int verity_hash_for_block(struct dm_verity *v, struct dm_verity_io *io,
sector_t block, u8 *digest, bool *is_zero);
+extern void verity_status(struct dm_target *ti, status_type_t type,
+ unsigned status_flags, char *result, unsigned maxlen);
+extern int verity_prepare_ioctl(struct dm_target *ti,
+ struct block_device **bdev, fmode_t *mode);
+extern int verity_iterate_devices(struct dm_target *ti,
+ iterate_devices_callout_fn fn, void *data);
+extern void verity_io_hints(struct dm_target *ti, struct queue_limits *limits);
+extern void verity_dtr(struct dm_target *ti);
+extern int verity_ctr(struct dm_target *ti, unsigned argc, char **argv);
+extern int verity_map(struct dm_target *ti, struct bio *bio);
+extern void dm_verity_avb_error_handler(void);
#endif /* DM_VERITY_H */
diff --git a/drivers/md/dm.h b/drivers/md/dm.h
index 38c84c0..ab289ce 100644
--- a/drivers/md/dm.h
+++ b/drivers/md/dm.h
@@ -80,8 +80,6 @@
enum dm_queue_mode dm_get_md_type(struct mapped_device *md);
struct target_type *dm_get_immutable_target_type(struct mapped_device *md);
-int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t);
-
/*
* To check the return value from dm_table_find_target().
*/
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 5599712..24f0f7c 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5371,7 +5371,7 @@
return NULL;
}
-static int add_named_array(const char *val, struct kernel_param *kp)
+static int add_named_array(const char *val, const struct kernel_param *kp)
{
/*
* val must be "md_*" or "mdNNN".
@@ -9346,11 +9346,11 @@
subsys_initcall(md_init);
module_exit(md_exit)
-static int get_ro(char *buffer, struct kernel_param *kp)
+static int get_ro(char *buffer, const struct kernel_param *kp)
{
return sprintf(buffer, "%d", start_readonly);
}
-static int set_ro(const char *val, struct kernel_param *kp)
+static int set_ro(const char *val, const struct kernel_param *kp)
{
return kstrtouint(val, 10, (unsigned int *)&start_readonly);
}
diff --git a/drivers/media/pci/tw686x/tw686x-core.c b/drivers/media/pci/tw686x/tw686x-core.c
index 336e2f9..b762e5f 100644
--- a/drivers/media/pci/tw686x/tw686x-core.c
+++ b/drivers/media/pci/tw686x/tw686x-core.c
@@ -72,12 +72,12 @@
}
}
-static int tw686x_dma_mode_get(char *buffer, struct kernel_param *kp)
+static int tw686x_dma_mode_get(char *buffer, const struct kernel_param *kp)
{
return sprintf(buffer, "%s", dma_mode_name(dma_mode));
}
-static int tw686x_dma_mode_set(const char *val, struct kernel_param *kp)
+static int tw686x_dma_mode_set(const char *val, const struct kernel_param *kp)
{
if (!strcasecmp(val, dma_mode_name(TW686X_DMA_MODE_MEMCPY)))
dma_mode = TW686X_DMA_MODE_MEMCPY;
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index 6d22b22..28b91b7 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -2230,7 +2230,7 @@
* Module parameters
*/
-static int uvc_clock_param_get(char *buffer, struct kernel_param *kp)
+static int uvc_clock_param_get(char *buffer, const struct kernel_param *kp)
{
if (uvc_clock_param == CLOCK_MONOTONIC)
return sprintf(buffer, "CLOCK_MONOTONIC");
@@ -2238,7 +2238,7 @@
return sprintf(buffer, "CLOCK_REALTIME");
}
-static int uvc_clock_param_set(const char *val, struct kernel_param *kp)
+static int uvc_clock_param_set(const char *val, const struct kernel_param *kp)
{
if (strncasecmp(val, "clock_", strlen("clock_")) == 0)
val += strlen("clock_");
diff --git a/drivers/media/v4l2-core/v4l2-ioctl.c b/drivers/media/v4l2-core/v4l2-ioctl.c
index d06941c..9a3a724 100644
--- a/drivers/media/v4l2-core/v4l2-ioctl.c
+++ b/drivers/media/v4l2-core/v4l2-ioctl.c
@@ -2480,11 +2480,8 @@
unsigned int ioctl;
u32 flags;
const char * const name;
- union {
- u32 offset;
- int (*func)(const struct v4l2_ioctl_ops *ops,
- struct file *file, void *fh, void *p);
- } u;
+ int (*func)(const struct v4l2_ioctl_ops *ops, struct file *file,
+ void *fh, void *p);
void (*debug)(const void *arg, bool write_only);
};
@@ -2492,27 +2489,23 @@
#define INFO_FL_PRIO (1 << 0)
/* This control can be valid if the filehandle passes a control handler. */
#define INFO_FL_CTRL (1 << 1)
-/* This is a standard ioctl, no need for special code */
-#define INFO_FL_STD (1 << 2)
/* This is ioctl has its own function */
-#define INFO_FL_FUNC (1 << 3)
+#define INFO_FL_FUNC (1 << 2)
/* Queuing ioctl */
-#define INFO_FL_QUEUE (1 << 4)
+#define INFO_FL_QUEUE (1 << 3)
/* Always copy back result, even on error */
-#define INFO_FL_ALWAYS_COPY (1 << 5)
+#define INFO_FL_ALWAYS_COPY (1 << 4)
/* Zero struct from after the field to the end */
#define INFO_FL_CLEAR(v4l2_struct, field) \
((offsetof(struct v4l2_struct, field) + \
sizeof(((struct v4l2_struct *)0)->field)) << 16)
#define INFO_FL_CLEAR_MASK (_IOC_SIZEMASK << 16)
-#define IOCTL_INFO_STD(_ioctl, _vidioc, _debug, _flags) \
- [_IOC_NR(_ioctl)] = { \
- .ioctl = _ioctl, \
- .flags = _flags | INFO_FL_STD, \
- .name = #_ioctl, \
- .u.offset = offsetof(struct v4l2_ioctl_ops, _vidioc), \
- .debug = _debug, \
+#define DEFINE_IOCTL_STD_FNC(_vidioc) \
+ static int __v4l_ ## _vidioc ## _fnc( \
+ const struct v4l2_ioctl_ops *ops, \
+ struct file *file, void *fh, void *p) { \
+ return ops->_vidioc(file, fh, p); \
}
#define IOCTL_INFO_FNC(_ioctl, _func, _debug, _flags) \
@@ -2520,10 +2513,42 @@
.ioctl = _ioctl, \
.flags = _flags | INFO_FL_FUNC, \
.name = #_ioctl, \
- .u.func = _func, \
+ .func = _func, \
.debug = _debug, \
}
+#define IOCTL_INFO_STD(_ioctl, _vidioc, _debug, _flags) \
+ IOCTL_INFO_FNC(_ioctl, __v4l_ ## _vidioc ## _fnc, _debug, _flags)
+
+DEFINE_IOCTL_STD_FNC(vidioc_g_fbuf)
+DEFINE_IOCTL_STD_FNC(vidioc_s_fbuf)
+DEFINE_IOCTL_STD_FNC(vidioc_expbuf)
+DEFINE_IOCTL_STD_FNC(vidioc_g_std)
+DEFINE_IOCTL_STD_FNC(vidioc_g_audio)
+DEFINE_IOCTL_STD_FNC(vidioc_s_audio)
+DEFINE_IOCTL_STD_FNC(vidioc_g_input)
+DEFINE_IOCTL_STD_FNC(vidioc_g_edid)
+DEFINE_IOCTL_STD_FNC(vidioc_s_edid)
+DEFINE_IOCTL_STD_FNC(vidioc_g_output)
+DEFINE_IOCTL_STD_FNC(vidioc_g_audout)
+DEFINE_IOCTL_STD_FNC(vidioc_s_audout)
+DEFINE_IOCTL_STD_FNC(vidioc_g_jpegcomp)
+DEFINE_IOCTL_STD_FNC(vidioc_s_jpegcomp)
+DEFINE_IOCTL_STD_FNC(vidioc_enumaudio)
+DEFINE_IOCTL_STD_FNC(vidioc_enumaudout)
+DEFINE_IOCTL_STD_FNC(vidioc_enum_framesizes)
+DEFINE_IOCTL_STD_FNC(vidioc_enum_frameintervals)
+DEFINE_IOCTL_STD_FNC(vidioc_g_enc_index)
+DEFINE_IOCTL_STD_FNC(vidioc_encoder_cmd)
+DEFINE_IOCTL_STD_FNC(vidioc_try_encoder_cmd)
+DEFINE_IOCTL_STD_FNC(vidioc_decoder_cmd)
+DEFINE_IOCTL_STD_FNC(vidioc_try_decoder_cmd)
+DEFINE_IOCTL_STD_FNC(vidioc_s_dv_timings)
+DEFINE_IOCTL_STD_FNC(vidioc_g_dv_timings)
+DEFINE_IOCTL_STD_FNC(vidioc_enum_dv_timings)
+DEFINE_IOCTL_STD_FNC(vidioc_query_dv_timings)
+DEFINE_IOCTL_STD_FNC(vidioc_dv_timings_cap)
+
static struct v4l2_ioctl_info v4l2_ioctls[] = {
IOCTL_INFO_FNC(VIDIOC_QUERYCAP, v4l_querycap, v4l_print_querycap, 0),
IOCTL_INFO_FNC(VIDIOC_ENUM_FMT, v4l_enum_fmt, v4l_print_fmtdesc, INFO_FL_CLEAR(v4l2_fmtdesc, type)),
@@ -2708,14 +2733,8 @@
}
write_only = _IOC_DIR(cmd) == _IOC_WRITE;
- if (info->flags & INFO_FL_STD) {
- typedef int (*vidioc_op)(struct file *file, void *fh, void *p);
- const void *p = vfd->ioctl_ops;
- const vidioc_op *vidioc = p + info->u.offset;
-
- ret = (*vidioc)(file, fh, arg);
- } else if (info->flags & INFO_FL_FUNC) {
- ret = info->u.func(ops, file, fh, arg);
+ if (info->flags & INFO_FL_FUNC) {
+ ret = info->func(ops, file, fh, arg);
} else if (!ops->vidioc_default) {
ret = -ENOTTY;
} else {
diff --git a/drivers/message/fusion/mptbase.c b/drivers/message/fusion/mptbase.c
index 84eab28..7a93400 100644
--- a/drivers/message/fusion/mptbase.c
+++ b/drivers/message/fusion/mptbase.c
@@ -99,7 +99,7 @@
MODULE_PARM_DESC(mpt_channel_mapping, " Mapping id's to channels (default=0)");
static int mpt_debug_level;
-static int mpt_set_debug_level(const char *val, struct kernel_param *kp);
+static int mpt_set_debug_level(const char *val, const struct kernel_param *kp);
module_param_call(mpt_debug_level, mpt_set_debug_level, param_get_int,
&mpt_debug_level, 0600);
MODULE_PARM_DESC(mpt_debug_level,
@@ -242,7 +242,7 @@
pci_write_config_word(pdev, PCI_COMMAND, command_reg);
}
-static int mpt_set_debug_level(const char *val, struct kernel_param *kp)
+static int mpt_set_debug_level(const char *val, const struct kernel_param *kp)
{
int ret = param_set_int(val, kp);
MPT_ADAPTER *ioc;
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index 8136dc7..ce2449a 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -506,6 +506,27 @@
Enable this configuration option to enable the host side test driver
for PCI Endpoint.
+config UID_SYS_STATS
+ bool "Per-UID statistics"
+ depends on PROFILING && TASK_XACCT && TASK_IO_ACCOUNTING
+ help
+ Per UID based cpu time statistics exported to /proc/uid_cputime
+ Per UID based io statistics exported to /proc/uid_io
+ Per UID based procstat control in /proc/uid_procstat
+
+config UID_SYS_STATS_DEBUG
+ bool "Per-TASK statistics"
+ depends on UID_SYS_STATS
+ default n
+ help
+ Per TASK based io statistics exported to /proc/uid_io
+
+config MEMORY_STATE_TIME
+ tristate "Memory freq/bandwidth time statistics"
+ depends on PROFILING
+ help
+ Memory time statistics exported to /sys/kernel/memory_state_time
+
source "drivers/misc/c2port/Kconfig"
source "drivers/misc/eeprom/Kconfig"
source "drivers/misc/cb710/Kconfig"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index ad0e64f..a4ccc7d 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -57,6 +57,9 @@
obj-$(CONFIG_ASPEED_LPC_SNOOP) += aspeed-lpc-snoop.o
obj-$(CONFIG_PCI_ENDPOINT_TEST) += pci_endpoint_test.o
+obj-$(CONFIG_UID_SYS_STATS) += uid_sys_stats.o
+obj-$(CONFIG_MEMORY_STATE_TIME) += memory_state_time.o
+
lkdtm-$(CONFIG_LKDTM) += lkdtm_core.o
lkdtm-$(CONFIG_LKDTM) += lkdtm_bugs.o
lkdtm-$(CONFIG_LKDTM) += lkdtm_heap.o
@@ -66,6 +69,7 @@
lkdtm-$(CONFIG_LKDTM) += lkdtm_usercopy.o
KCOV_INSTRUMENT_lkdtm_rodata.o := n
+CFLAGS_lkdtm_rodata.o += $(DISABLE_LTO)
OBJCOPYFLAGS :=
OBJCOPYFLAGS_lkdtm_rodata_objcopy.o := \
diff --git a/drivers/misc/kgdbts.c b/drivers/misc/kgdbts.c
index fc7efed..24108bf 100644
--- a/drivers/misc/kgdbts.c
+++ b/drivers/misc/kgdbts.c
@@ -1132,7 +1132,8 @@
ts.run_test(0, chr);
}
-static int param_set_kgdbts_var(const char *kmessage, struct kernel_param *kp)
+static int param_set_kgdbts_var(const char *kmessage,
+ const struct kernel_param *kp)
{
int len = strlen(kmessage);
diff --git a/drivers/misc/memory_state_time.c b/drivers/misc/memory_state_time.c
new file mode 100644
index 0000000..ba94dcf
--- /dev/null
+++ b/drivers/misc/memory_state_time.c
@@ -0,0 +1,462 @@
+/* drivers/misc/memory_state_time.c
+ *
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/errno.h>
+#include <linux/hashtable.h>
+#include <linux/kconfig.h>
+#include <linux/kernel.h>
+#include <linux/kobject.h>
+#include <linux/memory-state-time.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/of_platform.h>
+#include <linux/slab.h>
+#include <linux/sysfs.h>
+#include <linux/time.h>
+#include <linux/timekeeping.h>
+#include <linux/workqueue.h>
+
+#define KERNEL_ATTR_RO(_name) \
+static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
+
+#define KERNEL_ATTR_RW(_name) \
+static struct kobj_attribute _name##_attr = \
+ __ATTR(_name, 0644, _name##_show, _name##_store)
+
+#define FREQ_HASH_BITS 4
+DECLARE_HASHTABLE(freq_hash_table, FREQ_HASH_BITS);
+
+static DEFINE_MUTEX(mem_lock);
+
+#define TAG "memory_state_time"
+#define BW_NODE "/soc/memory-state-time"
+#define FREQ_TBL "freq-tbl"
+#define BW_TBL "bw-buckets"
+#define NUM_SOURCES "num-sources"
+
+#define LOWEST_FREQ 2
+
+static int curr_bw;
+static int curr_freq;
+static u32 *bw_buckets;
+static u32 *freq_buckets;
+static int num_freqs;
+static int num_buckets;
+static int registered_bw_sources;
+static u64 last_update;
+static bool init_success;
+static struct workqueue_struct *memory_wq;
+static u32 num_sources = 10;
+static int *bandwidths;
+
+struct freq_entry {
+ int freq;
+ u64 *buckets; /* Bandwidth buckets. */
+ struct hlist_node hash;
+};
+
+struct queue_container {
+ struct work_struct update_state;
+ int value;
+ u64 time_now;
+ int id;
+ struct mutex *lock;
+};
+
+static int find_bucket(int bw)
+{
+ int i;
+
+ if (bw_buckets != NULL) {
+ for (i = 0; i < num_buckets; i++) {
+ if (bw_buckets[i] > bw) {
+ pr_debug("Found bucket %d for bandwidth %d\n",
+ i, bw);
+ return i;
+ }
+ }
+ return num_buckets - 1;
+ }
+ return 0;
+}
+
+static u64 get_time_diff(u64 time_now)
+{
+ u64 ms;
+
+ ms = time_now - last_update;
+ last_update = time_now;
+ return ms;
+}
+
+static ssize_t show_stat_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ int i, j;
+ int len = 0;
+ struct freq_entry *freq_entry;
+
+ for (i = 0; i < num_freqs; i++) {
+ hash_for_each_possible(freq_hash_table, freq_entry, hash,
+ freq_buckets[i]) {
+ if (freq_entry->freq == freq_buckets[i]) {
+ len += scnprintf(buf + len, PAGE_SIZE - len,
+ "%d ", freq_buckets[i]);
+ if (len >= PAGE_SIZE)
+ break;
+ for (j = 0; j < num_buckets; j++) {
+ len += scnprintf(buf + len,
+ PAGE_SIZE - len,
+ "%llu ",
+ freq_entry->buckets[j]);
+ }
+ len += scnprintf(buf + len, PAGE_SIZE - len,
+ "\n");
+ }
+ }
+ }
+ pr_debug("Current Time: %llu\n", ktime_get_boot_ns());
+ return len;
+}
+KERNEL_ATTR_RO(show_stat);
+
+static void update_table(u64 time_now)
+{
+ struct freq_entry *freq_entry;
+
+ pr_debug("Last known bw %d freq %d\n", curr_bw, curr_freq);
+ hash_for_each_possible(freq_hash_table, freq_entry, hash, curr_freq) {
+ if (curr_freq == freq_entry->freq) {
+ freq_entry->buckets[find_bucket(curr_bw)]
+ += get_time_diff(time_now);
+ break;
+ }
+ }
+}
+
+static bool freq_exists(int freq)
+{
+ int i;
+
+ for (i = 0; i < num_freqs; i++) {
+ if (freq == freq_buckets[i])
+ return true;
+ }
+ return false;
+}
+
+static int calculate_total_bw(int bw, int index)
+{
+ int i;
+ int total_bw = 0;
+
+ pr_debug("memory_state_time New bw %d for id %d\n", bw, index);
+ bandwidths[index] = bw;
+ for (i = 0; i < registered_bw_sources; i++)
+ total_bw += bandwidths[i];
+ return total_bw;
+}
+
+static void freq_update_do_work(struct work_struct *work)
+{
+ struct queue_container *freq_state_update
+ = container_of(work, struct queue_container,
+ update_state);
+ if (freq_state_update) {
+ mutex_lock(&mem_lock);
+ update_table(freq_state_update->time_now);
+ curr_freq = freq_state_update->value;
+ mutex_unlock(&mem_lock);
+ kfree(freq_state_update);
+ }
+}
+
+static void bw_update_do_work(struct work_struct *work)
+{
+ struct queue_container *bw_state_update
+ = container_of(work, struct queue_container,
+ update_state);
+ if (bw_state_update) {
+ mutex_lock(&mem_lock);
+ update_table(bw_state_update->time_now);
+ curr_bw = calculate_total_bw(bw_state_update->value,
+ bw_state_update->id);
+ mutex_unlock(&mem_lock);
+ kfree(bw_state_update);
+ }
+}
+
+static void memory_state_freq_update(struct memory_state_update_block *ub,
+ int value)
+{
+ if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) {
+ if (freq_exists(value) && init_success) {
+ struct queue_container *freq_container
+ = kmalloc(sizeof(struct queue_container),
+ GFP_KERNEL);
+ if (!freq_container)
+ return;
+ INIT_WORK(&freq_container->update_state,
+ freq_update_do_work);
+ freq_container->time_now = ktime_get_boot_ns();
+ freq_container->value = value;
+ pr_debug("Scheduling freq update in work queue\n");
+ queue_work(memory_wq, &freq_container->update_state);
+ } else {
+ pr_debug("Freq does not exist.\n");
+ }
+ }
+}
+
+static void memory_state_bw_update(struct memory_state_update_block *ub,
+ int value)
+{
+ if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) {
+ if (init_success) {
+ struct queue_container *bw_container
+ = kmalloc(sizeof(struct queue_container),
+ GFP_KERNEL);
+ if (!bw_container)
+ return;
+ INIT_WORK(&bw_container->update_state,
+ bw_update_do_work);
+ bw_container->time_now = ktime_get_boot_ns();
+ bw_container->value = value;
+ bw_container->id = ub->id;
+ pr_debug("Scheduling bandwidth update in work queue\n");
+ queue_work(memory_wq, &bw_container->update_state);
+ }
+ }
+}
+
+struct memory_state_update_block *memory_state_register_frequency_source(void)
+{
+ struct memory_state_update_block *block;
+
+ if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) {
+ pr_debug("Allocating frequency source\n");
+ block = kmalloc(sizeof(struct memory_state_update_block),
+ GFP_KERNEL);
+ if (!block)
+ return NULL;
+ block->update_call = memory_state_freq_update;
+ return block;
+ }
+ pr_err("Config option disabled.\n");
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(memory_state_register_frequency_source);
+
+struct memory_state_update_block *memory_state_register_bandwidth_source(void)
+{
+ struct memory_state_update_block *block;
+
+ if (IS_ENABLED(CONFIG_MEMORY_STATE_TIME)) {
+ pr_debug("Allocating bandwidth source %d\n",
+ registered_bw_sources);
+ block = kmalloc(sizeof(struct memory_state_update_block),
+ GFP_KERNEL);
+ if (!block)
+ return NULL;
+ block->update_call = memory_state_bw_update;
+ if (registered_bw_sources < num_sources) {
+ block->id = registered_bw_sources++;
+ } else {
+ pr_err("Unable to allocate source; max number reached\n");
+ kfree(block);
+ return NULL;
+ }
+ return block;
+ }
+ pr_err("Config option disabled.\n");
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(memory_state_register_bandwidth_source);
+
+/* Buckets are designated by their maximum.
+ * Returns the buckets decided by the capability of the device.
+ */
+static int get_bw_buckets(struct device *dev)
+{
+ int ret, lenb;
+ struct device_node *node = dev->of_node;
+
+ of_property_read_u32(node, NUM_SOURCES, &num_sources);
+ if (!of_find_property(node, BW_TBL, &lenb)) {
+ pr_err("Missing %s property\n", BW_TBL);
+ return -ENODATA;
+ }
+
+ bandwidths = devm_kzalloc(dev,
+ sizeof(*bandwidths) * num_sources, GFP_KERNEL);
+ if (!bandwidths)
+ return -ENOMEM;
+ lenb /= sizeof(*bw_buckets);
+ bw_buckets = devm_kzalloc(dev, lenb * sizeof(*bw_buckets),
+ GFP_KERNEL);
+ if (!bw_buckets) {
+ devm_kfree(dev, bandwidths);
+ return -ENOMEM;
+ }
+ ret = of_property_read_u32_array(node, BW_TBL, bw_buckets,
+ lenb);
+ if (ret < 0) {
+ devm_kfree(dev, bandwidths);
+ devm_kfree(dev, bw_buckets);
+ pr_err("Unable to read bandwidth table from device tree.\n");
+ return ret;
+ }
+
+ curr_bw = 0;
+ num_buckets = lenb;
+ return 0;
+}
+
+/* Adds struct freq_entry nodes to the hashtable for each compatible frequency.
+ * Returns the supported number of frequencies.
+ */
+static int freq_buckets_init(struct device *dev)
+{
+ struct freq_entry *freq_entry;
+ int i;
+ int ret, lenf;
+ struct device_node *node = dev->of_node;
+
+ if (!of_find_property(node, FREQ_TBL, &lenf)) {
+ pr_err("Missing %s property\n", FREQ_TBL);
+ return -ENODATA;
+ }
+
+ lenf /= sizeof(*freq_buckets);
+ freq_buckets = devm_kzalloc(dev, lenf * sizeof(*freq_buckets),
+ GFP_KERNEL);
+ if (!freq_buckets)
+ return -ENOMEM;
+ pr_debug("freqs found len %d\n", lenf);
+ ret = of_property_read_u32_array(node, FREQ_TBL, freq_buckets,
+ lenf);
+ if (ret < 0) {
+ devm_kfree(dev, freq_buckets);
+ pr_err("Unable to read frequency table from device tree.\n");
+ return ret;
+ }
+ pr_debug("ret freq %d\n", ret);
+
+ num_freqs = lenf;
+ curr_freq = freq_buckets[LOWEST_FREQ];
+
+ for (i = 0; i < num_freqs; i++) {
+ freq_entry = devm_kzalloc(dev, sizeof(struct freq_entry),
+ GFP_KERNEL);
+ if (!freq_entry)
+ return -ENOMEM;
+ freq_entry->buckets = devm_kzalloc(dev, sizeof(u64)*num_buckets,
+ GFP_KERNEL);
+ if (!freq_entry->buckets) {
+ devm_kfree(dev, freq_entry);
+ return -ENOMEM;
+ }
+ pr_debug("memory_state_time Adding freq to ht %d\n",
+ freq_buckets[i]);
+ freq_entry->freq = freq_buckets[i];
+ hash_add(freq_hash_table, &freq_entry->hash, freq_buckets[i]);
+ }
+ return 0;
+}
+
+struct kobject *memory_kobj;
+EXPORT_SYMBOL_GPL(memory_kobj);
+
+static struct attribute *memory_attrs[] = {
+ &show_stat_attr.attr,
+ NULL
+};
+
+static struct attribute_group memory_attr_group = {
+ .attrs = memory_attrs,
+};
+
+static int memory_state_time_probe(struct platform_device *pdev)
+{
+ int error;
+
+ error = get_bw_buckets(&pdev->dev);
+ if (error)
+ return error;
+ error = freq_buckets_init(&pdev->dev);
+ if (error)
+ return error;
+ last_update = ktime_get_boot_ns();
+ init_success = true;
+
+ pr_debug("memory_state_time initialized with num_freqs %d\n",
+ num_freqs);
+ return 0;
+}
+
+static const struct of_device_id match_table[] = {
+ { .compatible = "memory-state-time" },
+ {}
+};
+
+static struct platform_driver memory_state_time_driver = {
+ .probe = memory_state_time_probe,
+ .driver = {
+ .name = "memory-state-time",
+ .of_match_table = match_table,
+ .owner = THIS_MODULE,
+ },
+};
+
+static int __init memory_state_time_init(void)
+{
+ int error;
+
+ hash_init(freq_hash_table);
+ memory_wq = create_singlethread_workqueue("memory_wq");
+ if (!memory_wq) {
+ pr_err("Unable to create workqueue.\n");
+ return -EINVAL;
+ }
+ /*
+ * Create sys/kernel directory for memory_state_time.
+ */
+ memory_kobj = kobject_create_and_add(TAG, kernel_kobj);
+ if (!memory_kobj) {
+ pr_err("Unable to allocate memory_kobj for sysfs directory.\n");
+ error = -ENOMEM;
+ goto wq;
+ }
+ error = sysfs_create_group(memory_kobj, &memory_attr_group);
+ if (error) {
+ pr_err("Unable to create sysfs folder.\n");
+ goto kobj;
+ }
+
+ error = platform_driver_register(&memory_state_time_driver);
+ if (error) {
+ pr_err("Unable to register memory_state_time platform driver.\n");
+ goto group;
+ }
+ return 0;
+
+group: sysfs_remove_group(memory_kobj, &memory_attr_group);
+kobj: kobject_put(memory_kobj);
+wq: destroy_workqueue(memory_wq);
+ return error;
+}
+module_init(memory_state_time_init);
diff --git a/drivers/misc/uid_sys_stats.c b/drivers/misc/uid_sys_stats.c
new file mode 100644
index 0000000..88dc1cd
--- /dev/null
+++ b/drivers/misc/uid_sys_stats.c
@@ -0,0 +1,703 @@
+/* drivers/misc/uid_sys_stats.c
+ *
+ * Copyright (C) 2014 - 2015 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/atomic.h>
+#include <linux/cpufreq_times.h>
+#include <linux/err.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/profile.h>
+#include <linux/rtmutex.h>
+#include <linux/sched/cputime.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+
+
+#define UID_HASH_BITS 10
+DECLARE_HASHTABLE(hash_table, UID_HASH_BITS);
+
+static DEFINE_RT_MUTEX(uid_lock);
+static struct proc_dir_entry *cpu_parent;
+static struct proc_dir_entry *io_parent;
+static struct proc_dir_entry *proc_parent;
+
+struct io_stats {
+ u64 read_bytes;
+ u64 write_bytes;
+ u64 rchar;
+ u64 wchar;
+ u64 fsync;
+};
+
+#define UID_STATE_FOREGROUND 0
+#define UID_STATE_BACKGROUND 1
+#define UID_STATE_BUCKET_SIZE 2
+
+#define UID_STATE_TOTAL_CURR 2
+#define UID_STATE_TOTAL_LAST 3
+#define UID_STATE_DEAD_TASKS 4
+#define UID_STATE_SIZE 5
+
+#define MAX_TASK_COMM_LEN 256
+
+struct task_entry {
+ char comm[MAX_TASK_COMM_LEN];
+ pid_t pid;
+ struct io_stats io[UID_STATE_SIZE];
+ struct hlist_node hash;
+};
+
+struct uid_entry {
+ uid_t uid;
+ u64 utime;
+ u64 stime;
+ u64 active_utime;
+ u64 active_stime;
+ int state;
+ struct io_stats io[UID_STATE_SIZE];
+ struct hlist_node hash;
+#ifdef CONFIG_UID_SYS_STATS_DEBUG
+ DECLARE_HASHTABLE(task_entries, UID_HASH_BITS);
+#endif
+};
+
+static u64 compute_write_bytes(struct task_struct *task)
+{
+ if (task->ioac.write_bytes <= task->ioac.cancelled_write_bytes)
+ return 0;
+
+ return task->ioac.write_bytes - task->ioac.cancelled_write_bytes;
+}
+
+static void compute_io_bucket_stats(struct io_stats *io_bucket,
+ struct io_stats *io_curr,
+ struct io_stats *io_last,
+ struct io_stats *io_dead)
+{
+ /* tasks could switch to another uid group, but its io_last in the
+ * previous uid group could still be positive.
+ * therefore before each update, do an overflow check first
+ */
+ int64_t delta;
+
+ delta = io_curr->read_bytes + io_dead->read_bytes -
+ io_last->read_bytes;
+ io_bucket->read_bytes += delta > 0 ? delta : 0;
+ delta = io_curr->write_bytes + io_dead->write_bytes -
+ io_last->write_bytes;
+ io_bucket->write_bytes += delta > 0 ? delta : 0;
+ delta = io_curr->rchar + io_dead->rchar - io_last->rchar;
+ io_bucket->rchar += delta > 0 ? delta : 0;
+ delta = io_curr->wchar + io_dead->wchar - io_last->wchar;
+ io_bucket->wchar += delta > 0 ? delta : 0;
+ delta = io_curr->fsync + io_dead->fsync - io_last->fsync;
+ io_bucket->fsync += delta > 0 ? delta : 0;
+
+ io_last->read_bytes = io_curr->read_bytes;
+ io_last->write_bytes = io_curr->write_bytes;
+ io_last->rchar = io_curr->rchar;
+ io_last->wchar = io_curr->wchar;
+ io_last->fsync = io_curr->fsync;
+
+ memset(io_dead, 0, sizeof(struct io_stats));
+}
+
+#ifdef CONFIG_UID_SYS_STATS_DEBUG
+static void get_full_task_comm(struct task_entry *task_entry,
+ struct task_struct *task)
+{
+ int i = 0, offset = 0, len = 0;
+ /* save one byte for terminating null character */
+ int unused_len = MAX_TASK_COMM_LEN - TASK_COMM_LEN - 1;
+ char buf[unused_len];
+ struct mm_struct *mm = task->mm;
+
+ /* fill the first TASK_COMM_LEN bytes with thread name */
+ __get_task_comm(task_entry->comm, TASK_COMM_LEN, task);
+ i = strlen(task_entry->comm);
+ while (i < TASK_COMM_LEN)
+ task_entry->comm[i++] = ' ';
+
+ /* next the executable file name */
+ if (mm) {
+ down_read(&mm->mmap_sem);
+ if (mm->exe_file) {
+ char *pathname = d_path(&mm->exe_file->f_path, buf,
+ unused_len);
+
+ if (!IS_ERR(pathname)) {
+ len = strlcpy(task_entry->comm + i, pathname,
+ unused_len);
+ i += len;
+ task_entry->comm[i++] = ' ';
+ unused_len--;
+ }
+ }
+ up_read(&mm->mmap_sem);
+ }
+ unused_len -= len;
+
+ /* fill the rest with command line argument
+ * replace each null or new line character
+ * between args in argv with whitespace */
+ len = get_cmdline(task, buf, unused_len);
+ while (offset < len) {
+ if (buf[offset] != '\0' && buf[offset] != '\n')
+ task_entry->comm[i++] = buf[offset];
+ else
+ task_entry->comm[i++] = ' ';
+ offset++;
+ }
+
+ /* get rid of trailing whitespaces in case when arg is memset to
+ * zero before being reset in userspace
+ */
+ while (task_entry->comm[i-1] == ' ')
+ i--;
+ task_entry->comm[i] = '\0';
+}
+
+static struct task_entry *find_task_entry(struct uid_entry *uid_entry,
+ struct task_struct *task)
+{
+ struct task_entry *task_entry;
+
+ hash_for_each_possible(uid_entry->task_entries, task_entry, hash,
+ task->pid) {
+ if (task->pid == task_entry->pid) {
+ /* if thread name changed, update the entire command */
+ int len = strnchr(task_entry->comm, ' ', TASK_COMM_LEN)
+ - task_entry->comm;
+
+ if (strncmp(task_entry->comm, task->comm, len))
+ get_full_task_comm(task_entry, task);
+ return task_entry;
+ }
+ }
+ return NULL;
+}
+
+static struct task_entry *find_or_register_task(struct uid_entry *uid_entry,
+ struct task_struct *task)
+{
+ struct task_entry *task_entry;
+ pid_t pid = task->pid;
+
+ task_entry = find_task_entry(uid_entry, task);
+ if (task_entry)
+ return task_entry;
+
+ task_entry = kzalloc(sizeof(struct task_entry), GFP_ATOMIC);
+ if (!task_entry)
+ return NULL;
+
+ get_full_task_comm(task_entry, task);
+
+ task_entry->pid = pid;
+ hash_add(uid_entry->task_entries, &task_entry->hash, (unsigned int)pid);
+
+ return task_entry;
+}
+
+static void remove_uid_tasks(struct uid_entry *uid_entry)
+{
+ struct task_entry *task_entry;
+ unsigned long bkt_task;
+ struct hlist_node *tmp_task;
+
+ hash_for_each_safe(uid_entry->task_entries, bkt_task,
+ tmp_task, task_entry, hash) {
+ hash_del(&task_entry->hash);
+ kfree(task_entry);
+ }
+}
+
+static void set_io_uid_tasks_zero(struct uid_entry *uid_entry)
+{
+ struct task_entry *task_entry;
+ unsigned long bkt_task;
+
+ hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) {
+ memset(&task_entry->io[UID_STATE_TOTAL_CURR], 0,
+ sizeof(struct io_stats));
+ }
+}
+
+static void add_uid_tasks_io_stats(struct uid_entry *uid_entry,
+ struct task_struct *task, int slot)
+{
+ struct task_entry *task_entry = find_or_register_task(uid_entry, task);
+ struct io_stats *task_io_slot = &task_entry->io[slot];
+
+ task_io_slot->read_bytes += task->ioac.read_bytes;
+ task_io_slot->write_bytes += compute_write_bytes(task);
+ task_io_slot->rchar += task->ioac.rchar;
+ task_io_slot->wchar += task->ioac.wchar;
+ task_io_slot->fsync += task->ioac.syscfs;
+}
+
+static void compute_io_uid_tasks(struct uid_entry *uid_entry)
+{
+ struct task_entry *task_entry;
+ unsigned long bkt_task;
+
+ hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) {
+ compute_io_bucket_stats(&task_entry->io[uid_entry->state],
+ &task_entry->io[UID_STATE_TOTAL_CURR],
+ &task_entry->io[UID_STATE_TOTAL_LAST],
+ &task_entry->io[UID_STATE_DEAD_TASKS]);
+ }
+}
+
+static void show_io_uid_tasks(struct seq_file *m, struct uid_entry *uid_entry)
+{
+ struct task_entry *task_entry;
+ unsigned long bkt_task;
+
+ hash_for_each(uid_entry->task_entries, bkt_task, task_entry, hash) {
+ /* Separated by comma because space exists in task comm */
+ seq_printf(m, "task,%s,%lu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu,%llu\n",
+ task_entry->comm,
+ (unsigned long)task_entry->pid,
+ task_entry->io[UID_STATE_FOREGROUND].rchar,
+ task_entry->io[UID_STATE_FOREGROUND].wchar,
+ task_entry->io[UID_STATE_FOREGROUND].read_bytes,
+ task_entry->io[UID_STATE_FOREGROUND].write_bytes,
+ task_entry->io[UID_STATE_BACKGROUND].rchar,
+ task_entry->io[UID_STATE_BACKGROUND].wchar,
+ task_entry->io[UID_STATE_BACKGROUND].read_bytes,
+ task_entry->io[UID_STATE_BACKGROUND].write_bytes,
+ task_entry->io[UID_STATE_FOREGROUND].fsync,
+ task_entry->io[UID_STATE_BACKGROUND].fsync);
+ }
+}
+#else
+static void remove_uid_tasks(struct uid_entry *uid_entry) {};
+static void set_io_uid_tasks_zero(struct uid_entry *uid_entry) {};
+static void add_uid_tasks_io_stats(struct uid_entry *uid_entry,
+ struct task_struct *task, int slot) {};
+static void compute_io_uid_tasks(struct uid_entry *uid_entry) {};
+static void show_io_uid_tasks(struct seq_file *m,
+ struct uid_entry *uid_entry) {}
+#endif
+
+static struct uid_entry *find_uid_entry(uid_t uid)
+{
+ struct uid_entry *uid_entry;
+ hash_for_each_possible(hash_table, uid_entry, hash, uid) {
+ if (uid_entry->uid == uid)
+ return uid_entry;
+ }
+ return NULL;
+}
+
+static struct uid_entry *find_or_register_uid(uid_t uid)
+{
+ struct uid_entry *uid_entry;
+
+ uid_entry = find_uid_entry(uid);
+ if (uid_entry)
+ return uid_entry;
+
+ uid_entry = kzalloc(sizeof(struct uid_entry), GFP_ATOMIC);
+ if (!uid_entry)
+ return NULL;
+
+ uid_entry->uid = uid;
+#ifdef CONFIG_UID_SYS_STATS_DEBUG
+ hash_init(uid_entry->task_entries);
+#endif
+ hash_add(hash_table, &uid_entry->hash, uid);
+
+ return uid_entry;
+}
+
+static int uid_cputime_show(struct seq_file *m, void *v)
+{
+ struct uid_entry *uid_entry = NULL;
+ struct task_struct *task, *temp;
+ struct user_namespace *user_ns = current_user_ns();
+ u64 utime;
+ u64 stime;
+ unsigned long bkt;
+ uid_t uid;
+
+ rt_mutex_lock(&uid_lock);
+
+ hash_for_each(hash_table, bkt, uid_entry, hash) {
+ uid_entry->active_stime = 0;
+ uid_entry->active_utime = 0;
+ }
+
+ rcu_read_lock();
+ do_each_thread(temp, task) {
+ uid = from_kuid_munged(user_ns, task_uid(task));
+ if (!uid_entry || uid_entry->uid != uid)
+ uid_entry = find_or_register_uid(uid);
+ if (!uid_entry) {
+ rcu_read_unlock();
+ rt_mutex_unlock(&uid_lock);
+ pr_err("%s: failed to find the uid_entry for uid %d\n",
+ __func__, uid);
+ return -ENOMEM;
+ }
+ task_cputime_adjusted(task, &utime, &stime);
+ uid_entry->active_utime += utime;
+ uid_entry->active_stime += stime;
+ } while_each_thread(temp, task);
+ rcu_read_unlock();
+
+ hash_for_each(hash_table, bkt, uid_entry, hash) {
+ u64 total_utime = uid_entry->utime +
+ uid_entry->active_utime;
+ u64 total_stime = uid_entry->stime +
+ uid_entry->active_stime;
+ seq_printf(m, "%d: %llu %llu\n", uid_entry->uid,
+ ktime_to_ms(total_utime), ktime_to_ms(total_stime));
+ }
+
+ rt_mutex_unlock(&uid_lock);
+ return 0;
+}
+
+static int uid_cputime_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, uid_cputime_show, PDE_DATA(inode));
+}
+
+static const struct file_operations uid_cputime_fops = {
+ .open = uid_cputime_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int uid_remove_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, NULL, NULL);
+}
+
+static ssize_t uid_remove_write(struct file *file,
+ const char __user *buffer, size_t count, loff_t *ppos)
+{
+ struct uid_entry *uid_entry;
+ struct hlist_node *tmp;
+ char uids[128];
+ char *start_uid, *end_uid = NULL;
+ long int uid_start = 0, uid_end = 0;
+
+ if (count >= sizeof(uids))
+ count = sizeof(uids) - 1;
+
+ if (copy_from_user(uids, buffer, count))
+ return -EFAULT;
+
+ uids[count] = '\0';
+ end_uid = uids;
+ start_uid = strsep(&end_uid, "-");
+
+ if (!start_uid || !end_uid)
+ return -EINVAL;
+
+ if (kstrtol(start_uid, 10, &uid_start) != 0 ||
+ kstrtol(end_uid, 10, &uid_end) != 0) {
+ return -EINVAL;
+ }
+
+ /* Also remove uids from /proc/uid_time_in_state */
+ cpufreq_task_times_remove_uids(uid_start, uid_end);
+
+ rt_mutex_lock(&uid_lock);
+
+ for (; uid_start <= uid_end; uid_start++) {
+ hash_for_each_possible_safe(hash_table, uid_entry, tmp,
+ hash, (uid_t)uid_start) {
+ if (uid_start == uid_entry->uid) {
+ remove_uid_tasks(uid_entry);
+ hash_del(&uid_entry->hash);
+ kfree(uid_entry);
+ }
+ }
+ }
+
+ rt_mutex_unlock(&uid_lock);
+ return count;
+}
+
+static const struct file_operations uid_remove_fops = {
+ .open = uid_remove_open,
+ .release = single_release,
+ .write = uid_remove_write,
+};
+
+
+static void add_uid_io_stats(struct uid_entry *uid_entry,
+ struct task_struct *task, int slot)
+{
+ struct io_stats *io_slot = &uid_entry->io[slot];
+
+ io_slot->read_bytes += task->ioac.read_bytes;
+ io_slot->write_bytes += compute_write_bytes(task);
+ io_slot->rchar += task->ioac.rchar;
+ io_slot->wchar += task->ioac.wchar;
+ io_slot->fsync += task->ioac.syscfs;
+
+ add_uid_tasks_io_stats(uid_entry, task, slot);
+}
+
+static void update_io_stats_all_locked(void)
+{
+ struct uid_entry *uid_entry = NULL;
+ struct task_struct *task, *temp;
+ struct user_namespace *user_ns = current_user_ns();
+ unsigned long bkt;
+ uid_t uid;
+
+ hash_for_each(hash_table, bkt, uid_entry, hash) {
+ memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0,
+ sizeof(struct io_stats));
+ set_io_uid_tasks_zero(uid_entry);
+ }
+
+ rcu_read_lock();
+ do_each_thread(temp, task) {
+ uid = from_kuid_munged(user_ns, task_uid(task));
+ if (!uid_entry || uid_entry->uid != uid)
+ uid_entry = find_or_register_uid(uid);
+ if (!uid_entry)
+ continue;
+ add_uid_io_stats(uid_entry, task, UID_STATE_TOTAL_CURR);
+ } while_each_thread(temp, task);
+ rcu_read_unlock();
+
+ hash_for_each(hash_table, bkt, uid_entry, hash) {
+ compute_io_bucket_stats(&uid_entry->io[uid_entry->state],
+ &uid_entry->io[UID_STATE_TOTAL_CURR],
+ &uid_entry->io[UID_STATE_TOTAL_LAST],
+ &uid_entry->io[UID_STATE_DEAD_TASKS]);
+ compute_io_uid_tasks(uid_entry);
+ }
+}
+
+static void update_io_stats_uid_locked(struct uid_entry *uid_entry)
+{
+ struct task_struct *task, *temp;
+ struct user_namespace *user_ns = current_user_ns();
+
+ memset(&uid_entry->io[UID_STATE_TOTAL_CURR], 0,
+ sizeof(struct io_stats));
+ set_io_uid_tasks_zero(uid_entry);
+
+ rcu_read_lock();
+ do_each_thread(temp, task) {
+ if (from_kuid_munged(user_ns, task_uid(task)) != uid_entry->uid)
+ continue;
+ add_uid_io_stats(uid_entry, task, UID_STATE_TOTAL_CURR);
+ } while_each_thread(temp, task);
+ rcu_read_unlock();
+
+ compute_io_bucket_stats(&uid_entry->io[uid_entry->state],
+ &uid_entry->io[UID_STATE_TOTAL_CURR],
+ &uid_entry->io[UID_STATE_TOTAL_LAST],
+ &uid_entry->io[UID_STATE_DEAD_TASKS]);
+ compute_io_uid_tasks(uid_entry);
+}
+
+
+static int uid_io_show(struct seq_file *m, void *v)
+{
+ struct uid_entry *uid_entry;
+ unsigned long bkt;
+
+ rt_mutex_lock(&uid_lock);
+
+ update_io_stats_all_locked();
+
+ hash_for_each(hash_table, bkt, uid_entry, hash) {
+ seq_printf(m, "%d %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu\n",
+ uid_entry->uid,
+ uid_entry->io[UID_STATE_FOREGROUND].rchar,
+ uid_entry->io[UID_STATE_FOREGROUND].wchar,
+ uid_entry->io[UID_STATE_FOREGROUND].read_bytes,
+ uid_entry->io[UID_STATE_FOREGROUND].write_bytes,
+ uid_entry->io[UID_STATE_BACKGROUND].rchar,
+ uid_entry->io[UID_STATE_BACKGROUND].wchar,
+ uid_entry->io[UID_STATE_BACKGROUND].read_bytes,
+ uid_entry->io[UID_STATE_BACKGROUND].write_bytes,
+ uid_entry->io[UID_STATE_FOREGROUND].fsync,
+ uid_entry->io[UID_STATE_BACKGROUND].fsync);
+
+ show_io_uid_tasks(m, uid_entry);
+ }
+
+ rt_mutex_unlock(&uid_lock);
+ return 0;
+}
+
+static int uid_io_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, uid_io_show, PDE_DATA(inode));
+}
+
+static const struct file_operations uid_io_fops = {
+ .open = uid_io_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+static int uid_procstat_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, NULL, NULL);
+}
+
+static ssize_t uid_procstat_write(struct file *file,
+ const char __user *buffer, size_t count, loff_t *ppos)
+{
+ struct uid_entry *uid_entry;
+ uid_t uid;
+ int argc, state;
+ char input[128];
+
+ if (count >= sizeof(input))
+ return -EINVAL;
+
+ if (copy_from_user(input, buffer, count))
+ return -EFAULT;
+
+ input[count] = '\0';
+
+ argc = sscanf(input, "%u %d", &uid, &state);
+ if (argc != 2)
+ return -EINVAL;
+
+ if (state != UID_STATE_BACKGROUND && state != UID_STATE_FOREGROUND)
+ return -EINVAL;
+
+ rt_mutex_lock(&uid_lock);
+
+ uid_entry = find_or_register_uid(uid);
+ if (!uid_entry) {
+ rt_mutex_unlock(&uid_lock);
+ return -EINVAL;
+ }
+
+ if (uid_entry->state == state) {
+ rt_mutex_unlock(&uid_lock);
+ return count;
+ }
+
+ update_io_stats_uid_locked(uid_entry);
+
+ uid_entry->state = state;
+
+ rt_mutex_unlock(&uid_lock);
+
+ return count;
+}
+
+static const struct file_operations uid_procstat_fops = {
+ .open = uid_procstat_open,
+ .release = single_release,
+ .write = uid_procstat_write,
+};
+
+static int process_notifier(struct notifier_block *self,
+ unsigned long cmd, void *v)
+{
+ struct task_struct *task = v;
+ struct uid_entry *uid_entry;
+ u64 utime, stime;
+ uid_t uid;
+
+ if (!task)
+ return NOTIFY_OK;
+
+ rt_mutex_lock(&uid_lock);
+ uid = from_kuid_munged(current_user_ns(), task_uid(task));
+ uid_entry = find_or_register_uid(uid);
+ if (!uid_entry) {
+ pr_err("%s: failed to find uid %d\n", __func__, uid);
+ goto exit;
+ }
+
+ task_cputime_adjusted(task, &utime, &stime);
+ uid_entry->utime += utime;
+ uid_entry->stime += stime;
+
+ add_uid_io_stats(uid_entry, task, UID_STATE_DEAD_TASKS);
+
+exit:
+ rt_mutex_unlock(&uid_lock);
+ return NOTIFY_OK;
+}
+
+static struct notifier_block process_notifier_block = {
+ .notifier_call = process_notifier,
+};
+
+static int __init proc_uid_sys_stats_init(void)
+{
+ hash_init(hash_table);
+
+ cpu_parent = proc_mkdir("uid_cputime", NULL);
+ if (!cpu_parent) {
+ pr_err("%s: failed to create uid_cputime proc entry\n",
+ __func__);
+ goto err;
+ }
+
+ proc_create_data("remove_uid_range", 0222, cpu_parent,
+ &uid_remove_fops, NULL);
+ proc_create_data("show_uid_stat", 0444, cpu_parent,
+ &uid_cputime_fops, NULL);
+
+ io_parent = proc_mkdir("uid_io", NULL);
+ if (!io_parent) {
+ pr_err("%s: failed to create uid_io proc entry\n",
+ __func__);
+ goto err;
+ }
+
+ proc_create_data("stats", 0444, io_parent,
+ &uid_io_fops, NULL);
+
+ proc_parent = proc_mkdir("uid_procstat", NULL);
+ if (!proc_parent) {
+ pr_err("%s: failed to create uid_procstat proc entry\n",
+ __func__);
+ goto err;
+ }
+
+ proc_create_data("set", 0222, proc_parent,
+ &uid_procstat_fops, NULL);
+
+ profile_event_register(PROFILE_TASK_EXIT, &process_notifier_block);
+
+ return 0;
+
+err:
+ remove_proc_subtree("uid_cputime", NULL);
+ remove_proc_subtree("uid_io", NULL);
+ remove_proc_subtree("uid_procstat", NULL);
+ return -ENOMEM;
+}
+
+early_initcall(proc_uid_sys_stats_init);
diff --git a/drivers/mmc/core/Kconfig b/drivers/mmc/core/Kconfig
index 42e8906..b3faf10 100644
--- a/drivers/mmc/core/Kconfig
+++ b/drivers/mmc/core/Kconfig
@@ -80,3 +80,17 @@
This driver is only of interest to those developing or
testing a host driver. Most people should say N here.
+config MMC_EMBEDDED_SDIO
+ boolean "MMC embedded SDIO device support"
+ help
+ If you say Y here, support will be added for embedded SDIO
+ devices which do not contain the necessary enumeration
+ support in hardware to be properly detected.
+
+config MMC_PARANOID_SD_INIT
+ bool "Enable paranoid SD card initialization"
+ help
+ If you say Y here, the MMC layer will be extra paranoid
+ about re-trying SD init requests. This can be a useful
+ work-around for buggy controllers and hardware. Enable
+ if you are experiencing issues with SD detection.
diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c
index 29bfff2..e71e6070 100644
--- a/drivers/mmc/core/core.c
+++ b/drivers/mmc/core/core.c
@@ -2810,6 +2810,22 @@
init_waitqueue_head(&host->context_info.wait);
}
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+void mmc_set_embedded_sdio_data(struct mmc_host *host,
+ struct sdio_cis *cis,
+ struct sdio_cccr *cccr,
+ struct sdio_embedded_func *funcs,
+ int num_funcs)
+{
+ host->embedded_sdio_data.cis = cis;
+ host->embedded_sdio_data.cccr = cccr;
+ host->embedded_sdio_data.funcs = funcs;
+ host->embedded_sdio_data.num_funcs = num_funcs;
+}
+
+EXPORT_SYMBOL(mmc_set_embedded_sdio_data);
+#endif
+
static int __init mmc_init(void)
{
int ret;
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index ad88deb..f469254 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -429,7 +429,8 @@
#endif
mmc_start_host(host);
- mmc_register_pm_notifier(host);
+ if (!(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY))
+ mmc_register_pm_notifier(host);
return 0;
}
@@ -446,7 +447,8 @@
*/
void mmc_remove_host(struct mmc_host *host)
{
- mmc_unregister_pm_notifier(host);
+ if (!(host->pm_flags & MMC_PM_IGNORE_PM_NOTIFY))
+ mmc_unregister_pm_notifier(host);
mmc_stop_host(host);
#ifdef CONFIG_DEBUG_FS
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index bad5c1b..13871e2 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -780,6 +780,7 @@
MMC_DEV_ATTR(name, "%s\n", card->cid.prod_name);
MMC_DEV_ATTR(oemid, "0x%04x\n", card->cid.oemid);
MMC_DEV_ATTR(prv, "0x%x\n", card->cid.prv);
+MMC_DEV_ATTR(rev, "0x%x\n", card->ext_csd.rev);
MMC_DEV_ATTR(pre_eol_info, "0x%02x\n", card->ext_csd.pre_eol_info);
MMC_DEV_ATTR(life_time, "0x%02x 0x%02x\n",
card->ext_csd.device_life_time_est_typ_a,
@@ -838,6 +839,7 @@
&dev_attr_name.attr,
&dev_attr_oemid.attr,
&dev_attr_prv.attr,
+ &dev_attr_rev.attr,
&dev_attr_pre_eol_info.attr,
&dev_attr_life_time.attr,
&dev_attr_serial.attr,
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 0a4e77a..8c4721f 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -17,6 +17,8 @@
#include <linux/mmc/card.h>
#include <linux/mmc/host.h>
+#include <linux/sched/rt.h>
+#include <uapi/linux/sched/types.h>
#include "queue.h"
#include "block.h"
@@ -43,6 +45,11 @@
struct mmc_queue *mq = d;
struct request_queue *q = mq->queue;
struct mmc_context_info *cntx = &mq->card->host->context_info;
+ struct sched_param scheduler_params = {0};
+
+ scheduler_params.sched_priority = 1;
+
+ sched_setscheduler(current, SCHED_FIFO, &scheduler_params);
current->flags |= PF_MEMALLOC;
diff --git a/drivers/mmc/core/sd.c b/drivers/mmc/core/sd.c
index eb9de21..4bbdfa3 100644
--- a/drivers/mmc/core/sd.c
+++ b/drivers/mmc/core/sd.c
@@ -834,6 +834,9 @@
bool reinit)
{
int err;
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ int retries;
+#endif
if (!reinit) {
/*
@@ -860,7 +863,26 @@
/*
* Fetch switch information from card.
*/
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ for (retries = 1; retries <= 3; retries++) {
+ err = mmc_read_switch(card);
+ if (!err) {
+ if (retries > 1) {
+ printk(KERN_WARNING
+ "%s: recovered\n",
+ mmc_hostname(host));
+ }
+ break;
+ } else {
+ printk(KERN_WARNING
+ "%s: read switch failed (attempt %d)\n",
+ mmc_hostname(host), retries);
+ }
+ }
+#else
err = mmc_read_switch(card);
+#endif
+
if (err)
return err;
}
@@ -1054,14 +1076,33 @@
*/
static void mmc_sd_detect(struct mmc_host *host)
{
- int err;
+ int err = 0;
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ int retries = 5;
+#endif
mmc_get_card(host->card);
/*
* Just check if our card has been removed.
*/
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ while(retries) {
+ err = mmc_send_status(host->card, NULL);
+ if (err) {
+ retries--;
+ udelay(5);
+ continue;
+ }
+ break;
+ }
+ if (!retries) {
+ printk(KERN_ERR "%s(%s): Unable to re-detect card (%d)\n",
+ __func__, mmc_hostname(host), err);
+ }
+#else
err = _mmc_detect_card_removed(host);
+#endif
mmc_put_card(host->card);
@@ -1120,6 +1161,9 @@
static int _mmc_sd_resume(struct mmc_host *host)
{
int err = 0;
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ int retries;
+#endif
mmc_claim_host(host);
@@ -1127,7 +1171,23 @@
goto out;
mmc_power_up(host, host->card->ocr);
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ retries = 5;
+ while (retries) {
+ err = mmc_sd_init_card(host, host->card->ocr, host->card);
+
+ if (err) {
+ printk(KERN_ERR "%s: Re-init card rc = %d (retries = %d)\n",
+ mmc_hostname(host), err, retries);
+ mdelay(5);
+ retries--;
+ continue;
+ }
+ break;
+ }
+#else
err = mmc_sd_init_card(host, host->card->ocr, host->card);
+#endif
mmc_card_clr_suspended(host->card);
out:
@@ -1202,6 +1262,9 @@
{
int err;
u32 ocr, rocr;
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ int retries;
+#endif
WARN_ON(!host->claimed);
@@ -1237,9 +1300,27 @@
/*
* Detect and init the card.
*/
+#ifdef CONFIG_MMC_PARANOID_SD_INIT
+ retries = 5;
+ while (retries) {
+ err = mmc_sd_init_card(host, rocr, NULL);
+ if (err) {
+ retries--;
+ continue;
+ }
+ break;
+ }
+
+ if (!retries) {
+ printk(KERN_ERR "%s: mmc_sd_init_card() failure (err = %d)\n",
+ mmc_hostname(host), err);
+ goto err;
+ }
+#else
err = mmc_sd_init_card(host, rocr, NULL);
if (err)
goto err;
+#endif
mmc_release_host(host);
err = mmc_add_card(host->card);
diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c
index cc43687..c42e3cd 100644
--- a/drivers/mmc/core/sdio.c
+++ b/drivers/mmc/core/sdio.c
@@ -31,6 +31,10 @@
#include "sdio_ops.h"
#include "sdio_cis.h"
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+#include <linux/mmc/sdio_ids.h>
+#endif
+
static int sdio_read_fbr(struct sdio_func *func)
{
int ret;
@@ -706,28 +710,44 @@
goto finish;
}
- /*
- * Read the common registers. Note that we should try to
- * validate whether UHS would work or not.
- */
- err = sdio_read_cccr(card, ocr);
- if (err) {
- mmc_sdio_resend_if_cond(host, card);
- if (ocr & R4_18V_PRESENT) {
- /* Retry init sequence, but without R4_18V_PRESENT. */
- retries = 0;
- goto try_again;
- } else {
- goto remove;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ if (host->embedded_sdio_data.cccr)
+ memcpy(&card->cccr, host->embedded_sdio_data.cccr, sizeof(struct sdio_cccr));
+ else {
+#endif
+ /*
+ * Read the common registers. Note that we should try to
+ * validate whether UHS would work or not.
+ */
+ err = sdio_read_cccr(card, ocr);
+ if (err) {
+ mmc_sdio_resend_if_cond(host, card);
+ if (ocr & R4_18V_PRESENT) {
+ /* Retry init sequence, but without R4_18V_PRESENT. */
+ retries = 0;
+ goto try_again;
+ } else {
+ goto remove;
+ }
}
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
}
+#endif
- /*
- * Read the common CIS tuples.
- */
- err = sdio_read_common_cis(card);
- if (err)
- goto remove;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ if (host->embedded_sdio_data.cis)
+ memcpy(&card->cis, host->embedded_sdio_data.cis, sizeof(struct sdio_cis));
+ else {
+#endif
+ /*
+ * Read the common CIS tuples.
+ */
+ err = sdio_read_common_cis(card);
+ if (err)
+ goto remove;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ }
+#endif
if (oldcard) {
int same = (card->cis.vendor == oldcard->cis.vendor &&
@@ -1129,14 +1149,36 @@
funcs = (ocr & 0x70000000) >> 28;
card->sdio_funcs = 0;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ if (host->embedded_sdio_data.funcs)
+ card->sdio_funcs = funcs = host->embedded_sdio_data.num_funcs;
+#endif
+
/*
* Initialize (but don't add) all present functions.
*/
for (i = 0; i < funcs; i++, card->sdio_funcs++) {
- err = sdio_init_func(host->card, i + 1);
- if (err)
- goto remove;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ if (host->embedded_sdio_data.funcs) {
+ struct sdio_func *tmp;
+ tmp = sdio_alloc_func(host->card);
+ if (IS_ERR(tmp))
+ goto remove;
+ tmp->num = (i + 1);
+ card->sdio_func[i] = tmp;
+ tmp->class = host->embedded_sdio_data.funcs[i].f_class;
+ tmp->max_blksize = host->embedded_sdio_data.funcs[i].f_maxblksize;
+ tmp->vendor = card->cis.vendor;
+ tmp->device = card->cis.device;
+ } else {
+#endif
+ err = sdio_init_func(host->card, i + 1);
+ if (err)
+ goto remove;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ }
+#endif
/*
* Enable Runtime PM for this func (if supported)
*/
diff --git a/drivers/mmc/core/sdio_bus.c b/drivers/mmc/core/sdio_bus.c
index 2b32b88..997d556 100644
--- a/drivers/mmc/core/sdio_bus.c
+++ b/drivers/mmc/core/sdio_bus.c
@@ -29,6 +29,10 @@
#include "sdio_cis.h"
#include "sdio_bus.h"
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+#include <linux/mmc/host.h>
+#endif
+
#define to_sdio_driver(d) container_of(d, struct sdio_driver, drv)
/* show configuration fields */
@@ -264,7 +268,14 @@
{
struct sdio_func *func = dev_to_sdio_func(dev);
- sdio_free_func_cis(func);
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ /*
+ * If this device is embedded then we never allocated
+ * cis tables for this func
+ */
+ if (!func->card->host->embedded_sdio_data.funcs)
+#endif
+ sdio_free_func_cis(func);
kfree(func->info);
kfree(func->tmpbuf);
diff --git a/drivers/mtd/devices/block2mtd.c b/drivers/mtd/devices/block2mtd.c
index 7c887f1..62fd690 100644
--- a/drivers/mtd/devices/block2mtd.c
+++ b/drivers/mtd/devices/block2mtd.c
@@ -431,7 +431,7 @@
}
-static int block2mtd_setup(const char *val, struct kernel_param *kp)
+static int block2mtd_setup(const char *val, const struct kernel_param *kp)
{
#ifdef MODULE
return block2mtd_setup2(val);
diff --git a/drivers/mtd/devices/phram.c b/drivers/mtd/devices/phram.c
index 8b66e52..7287696 100644
--- a/drivers/mtd/devices/phram.c
+++ b/drivers/mtd/devices/phram.c
@@ -266,7 +266,7 @@
return ret;
}
-static int phram_param_call(const char *val, struct kernel_param *kp)
+static int phram_param_call(const char *val, const struct kernel_param *kp)
{
#ifdef MODULE
return phram_setup(val);
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index 18a72da..21d316f 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -1348,7 +1348,7 @@
* This function returns zero in case of success and a negative error code in
* case of error.
*/
-static int ubi_mtd_param_parse(const char *val, struct kernel_param *kp)
+static int ubi_mtd_param_parse(const char *val, const struct kernel_param *kp)
{
int i, len;
struct mtd_dev_param *p;
diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 3df872f..3702647 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -376,7 +376,7 @@
* because generally mcdi responses are fast. After that, back off
* and poll once a jiffy (approximately)
*/
- spins = TICK_USEC;
+ spins = USER_TICK_USEC;
finish = jiffies + MCDI_RPC_TIMEOUT;
while (1) {
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e0baea2d..cb8448a 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2277,6 +2277,12 @@
int le;
int ret;
+#ifdef CONFIG_ANDROID_PARANOID_NETWORK
+ if (cmd != TUNGETIFF && !capable(CAP_NET_ADMIN)) {
+ return -EPERM;
+ }
+#endif
+
if (cmd == TUNSETIFF || cmd == TUNSETQUEUE || _IOC_TYPE(cmd) == SOCK_IOC_TYPE) {
if (copy_from_user(&ifr, argp, ifreq_len))
return -EFAULT;
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 083e5ce..0d63555 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -2841,7 +2841,6 @@
struct brcmf_bss_info_le *bi)
{
struct wiphy *wiphy = cfg_to_wiphy(cfg);
- struct ieee80211_channel *notify_channel;
struct cfg80211_bss *bss;
struct ieee80211_supported_band *band;
struct brcmu_chan ch;
@@ -2851,7 +2850,7 @@
u16 notify_interval;
u8 *notify_ie;
size_t notify_ielen;
- s32 notify_signal;
+ struct cfg80211_inform_bss bss_data = {};
if (le32_to_cpu(bi->length) > WL_BSS_INFO_MAX) {
brcmf_err("Bss info is larger than buffer. Discarding\n");
@@ -2871,27 +2870,28 @@
band = wiphy->bands[NL80211_BAND_5GHZ];
freq = ieee80211_channel_to_frequency(channel, band->band);
- notify_channel = ieee80211_get_channel(wiphy, freq);
+ bss_data.chan = ieee80211_get_channel(wiphy, freq);
+ bss_data.scan_width = NL80211_BSS_CHAN_WIDTH_20;
+ bss_data.boottime_ns = ktime_to_ns(ktime_get_boottime());
notify_capability = le16_to_cpu(bi->capability);
notify_interval = le16_to_cpu(bi->beacon_period);
notify_ie = (u8 *)bi + le16_to_cpu(bi->ie_offset);
notify_ielen = le32_to_cpu(bi->ie_length);
- notify_signal = (s16)le16_to_cpu(bi->RSSI) * 100;
+ bss_data.signal = (s16)le16_to_cpu(bi->RSSI) * 100;
brcmf_dbg(CONN, "bssid: %pM\n", bi->BSSID);
brcmf_dbg(CONN, "Channel: %d(%d)\n", channel, freq);
brcmf_dbg(CONN, "Capability: %X\n", notify_capability);
brcmf_dbg(CONN, "Beacon interval: %d\n", notify_interval);
- brcmf_dbg(CONN, "Signal: %d\n", notify_signal);
+ brcmf_dbg(CONN, "Signal: %d\n", bss_data.signal);
- bss = cfg80211_inform_bss(wiphy, notify_channel,
- CFG80211_BSS_FTYPE_UNKNOWN,
- (const u8 *)bi->BSSID,
- 0, notify_capability,
- notify_interval, notify_ie,
- notify_ielen, notify_signal,
- GFP_KERNEL);
+ bss = cfg80211_inform_bss_data(wiphy, &bss_data,
+ CFG80211_BSS_FTYPE_UNKNOWN,
+ (const u8 *)bi->BSSID,
+ 0, notify_capability,
+ notify_interval, notify_ie,
+ notify_ielen, GFP_KERNEL);
if (!bss)
return -ENOMEM;
diff --git a/drivers/net/wireless/ti/wlcore/init.c b/drivers/net/wireless/ti/wlcore/init.c
index 58898b9..145e10a 100644
--- a/drivers/net/wireless/ti/wlcore/init.c
+++ b/drivers/net/wireless/ti/wlcore/init.c
@@ -549,6 +549,11 @@
{
int ret;
+ /* Disable filtering */
+ ret = wl1271_acx_group_address_tbl(wl, wlvif, false, NULL, 0);
+ if (ret < 0)
+ return ret;
+
ret = wl1271_acx_ap_max_tx_retry(wl, wlvif);
if (ret < 0)
return ret;
diff --git a/drivers/nfc/fdp/i2c.c b/drivers/nfc/fdp/i2c.c
index c4da50e..08a4f82 100644
--- a/drivers/nfc/fdp/i2c.c
+++ b/drivers/nfc/fdp/i2c.c
@@ -176,6 +176,16 @@
/* Packet that contains a length */
if (tmp[0] == 0 && tmp[1] == 0) {
phy->next_read_size = (tmp[2] << 8) + tmp[3] + 3;
+ /*
+ * Ensure next_read_size does not exceed sizeof(tmp)
+ * for reading that many bytes during next iteration
+ */
+ if (phy->next_read_size > FDP_NCI_I2C_MAX_PAYLOAD) {
+ dev_dbg(&client->dev, "%s: corrupted packet\n",
+ __func__);
+ phy->next_read_size = 5;
+ goto flush;
+ }
} else {
phy->next_read_size = FDP_NCI_I2C_MIN_PAYLOAD;
diff --git a/drivers/nfc/st21nfca/dep.c b/drivers/nfc/st21nfca/dep.c
index fd08be2..3420c51 100644
--- a/drivers/nfc/st21nfca/dep.c
+++ b/drivers/nfc/st21nfca/dep.c
@@ -217,7 +217,8 @@
atr_req = (struct st21nfca_atr_req *)skb->data;
- if (atr_req->length < sizeof(struct st21nfca_atr_req)) {
+ if (atr_req->length < sizeof(struct st21nfca_atr_req) ||
+ atr_req->length > skb->len) {
r = -EPROTO;
goto exit;
}
diff --git a/drivers/nfc/st21nfca/se.c b/drivers/nfc/st21nfca/se.c
index 3a98563..6e84e12 100644
--- a/drivers/nfc/st21nfca/se.c
+++ b/drivers/nfc/st21nfca/se.c
@@ -320,23 +320,33 @@
* AID 81 5 to 16
* PARAMETERS 82 0 to 255
*/
- if (skb->len < NFC_MIN_AID_LENGTH + 2 &&
+ if (skb->len < NFC_MIN_AID_LENGTH + 2 ||
skb->data[0] != NFC_EVT_TRANSACTION_AID_TAG)
return -EPROTO;
+ /*
+ * Buffer should have enough space for at least
+ * two tag fields + two length fields + aid_len (skb->data[1])
+ */
+ if (skb->len < skb->data[1] + 4)
+ return -EPROTO;
+
transaction = (struct nfc_evt_transaction *)devm_kzalloc(dev,
skb->len - 2, GFP_KERNEL);
transaction->aid_len = skb->data[1];
memcpy(transaction->aid, &skb->data[2],
transaction->aid_len);
-
- /* Check next byte is PARAMETERS tag (82) */
- if (skb->data[transaction->aid_len + 2] !=
- NFC_EVT_TRANSACTION_PARAMS_TAG)
- return -EPROTO;
-
transaction->params_len = skb->data[transaction->aid_len + 3];
+
+ /* Check next byte is PARAMETERS tag (82) and the length field */
+ if (skb->data[transaction->aid_len + 2] !=
+ NFC_EVT_TRANSACTION_PARAMS_TAG ||
+ skb->len < transaction->aid_len + transaction->params_len + 4) {
+ devm_kfree(dev, transaction);
+ return -EPROTO;
+ }
+
memcpy(transaction->params, skb->data +
transaction->aid_len + 4, transaction->params_len);
diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
index 6337c39..b1103e5 100644
--- a/drivers/of/fdt.c
+++ b/drivers/of/fdt.c
@@ -1112,42 +1112,66 @@
return 0;
}
+/*
+ * Convert configs to something easy to use in C code
+ */
+#if defined(CONFIG_CMDLINE_FORCE)
+static const int overwrite_incoming_cmdline = 1;
+static const int read_dt_cmdline;
+static const int concat_cmdline;
+#elif defined(CONFIG_CMDLINE_EXTEND)
+static const int overwrite_incoming_cmdline;
+static const int read_dt_cmdline = 1;
+static const int concat_cmdline = 1;
+#else /* CMDLINE_FROM_BOOTLOADER */
+static const int overwrite_incoming_cmdline;
+static const int read_dt_cmdline = 1;
+static const int concat_cmdline;
+#endif
+
+#ifdef CONFIG_CMDLINE
+static const char *config_cmdline = CONFIG_CMDLINE;
+#else
+static const char *config_cmdline = "";
+#endif
+
int __init early_init_dt_scan_chosen(unsigned long node, const char *uname,
int depth, void *data)
{
- int l;
- const char *p;
+ int l = 0;
+ const char *p = NULL;
+ char *cmdline = data;
pr_debug("search \"chosen\", depth: %d, uname: %s\n", depth, uname);
- if (depth != 1 || !data ||
+ if (depth != 1 || !cmdline ||
(strcmp(uname, "chosen") != 0 && strcmp(uname, "chosen@0") != 0))
return 0;
early_init_dt_check_for_initrd(node);
- /* Retrieve command line */
- p = of_get_flat_dt_prop(node, "bootargs", &l);
- if (p != NULL && l > 0)
- strlcpy(data, p, min((int)l, COMMAND_LINE_SIZE));
+ /* Put CONFIG_CMDLINE in if forced or if data had nothing in it to start */
+ if (overwrite_incoming_cmdline || !cmdline[0])
+ strlcpy(cmdline, config_cmdline, COMMAND_LINE_SIZE);
- /*
- * CONFIG_CMDLINE is meant to be a default in case nothing else
- * managed to set the command line, unless CONFIG_CMDLINE_FORCE
- * is set in which case we override whatever was found earlier.
- */
-#ifdef CONFIG_CMDLINE
-#if defined(CONFIG_CMDLINE_EXTEND)
- strlcat(data, " ", COMMAND_LINE_SIZE);
- strlcat(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#elif defined(CONFIG_CMDLINE_FORCE)
- strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#else
- /* No arguments from boot loader, use kernel's cmdl*/
- if (!((char *)data)[0])
- strlcpy(data, CONFIG_CMDLINE, COMMAND_LINE_SIZE);
-#endif
-#endif /* CONFIG_CMDLINE */
+ /* Retrieve command line unless forcing */
+ if (read_dt_cmdline)
+ p = of_get_flat_dt_prop(node, "bootargs", &l);
+
+ if (p != NULL && l > 0) {
+ if (concat_cmdline) {
+ int cmdline_len;
+ int copy_len;
+ strlcat(cmdline, " ", COMMAND_LINE_SIZE);
+ cmdline_len = strlen(cmdline);
+ copy_len = COMMAND_LINE_SIZE - cmdline_len - 1;
+ copy_len = min((int)l, copy_len);
+ strncpy(cmdline + cmdline_len, p, copy_len);
+ cmdline[cmdline_len + copy_len] = '\0';
+ } else {
+ strlcpy(cmdline, p, min((int)l, COMMAND_LINE_SIZE));
+ }
+ }
pr_debug("Command line is: %s\n", (char*)data);
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 633e55c..bcb96af 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -1065,7 +1065,8 @@
}
EXPORT_SYMBOL(pci_disable_link_state);
-static int pcie_aspm_set_policy(const char *val, struct kernel_param *kp)
+static int pcie_aspm_set_policy(const char *val,
+ const struct kernel_param *kp)
{
int i;
struct pcie_link_state *link;
@@ -1092,7 +1093,7 @@
return 0;
}
-static int pcie_aspm_get_policy(char *buffer, struct kernel_param *kp)
+static int pcie_aspm_get_policy(char *buffer, const struct kernel_param *kp)
{
int i, cnt = 0;
for (i = 0; i < ARRAY_SIZE(policy_str); i++)
diff --git a/drivers/platform/x86/thinkpad_acpi.c b/drivers/platform/x86/thinkpad_acpi.c
index c407d52..c66efa0 100644
--- a/drivers/platform/x86/thinkpad_acpi.c
+++ b/drivers/platform/x86/thinkpad_acpi.c
@@ -9553,7 +9553,7 @@
},
};
-static int __init set_ibm_param(const char *val, struct kernel_param *kp)
+static int __init set_ibm_param(const char *val, const struct kernel_param *kp)
{
unsigned int i;
struct ibm_struct *ibm;
diff --git a/drivers/power/supply/power_supply_sysfs.c b/drivers/power/supply/power_supply_sysfs.c
index 5204f11..da589c3 100644
--- a/drivers/power/supply/power_supply_sysfs.c
+++ b/drivers/power/supply/power_supply_sysfs.c
@@ -121,7 +121,10 @@
else if (off >= POWER_SUPPLY_PROP_MODEL_NAME)
return sprintf(buf, "%s\n", value.strval);
- return sprintf(buf, "%d\n", value.intval);
+ if (off == POWER_SUPPLY_PROP_CHARGE_COUNTER_EXT)
+ return sprintf(buf, "%lld\n", value.int64val);
+ else
+ return sprintf(buf, "%d\n", value.intval);
}
static ssize_t power_supply_store_property(struct device *dev,
@@ -245,6 +248,12 @@
POWER_SUPPLY_ATTR(precharge_current),
POWER_SUPPLY_ATTR(charge_term_current),
POWER_SUPPLY_ATTR(calibrate),
+ /* Local extensions */
+ POWER_SUPPLY_ATTR(usb_hc),
+ POWER_SUPPLY_ATTR(usb_otg),
+ POWER_SUPPLY_ATTR(charge_enabled),
+ /* Local extensions of type int64_t */
+ POWER_SUPPLY_ATTR(charge_counter_ext),
/* Properties of type `const char *' */
POWER_SUPPLY_ATTR(model_name),
POWER_SUPPLY_ATTR(manufacturer),
diff --git a/drivers/rtc/rtc-palmas.c b/drivers/rtc/rtc-palmas.c
index 4bcfb88..34aea38 100644
--- a/drivers/rtc/rtc-palmas.c
+++ b/drivers/rtc/rtc-palmas.c
@@ -45,6 +45,42 @@
/* Total number of RTC registers needed to set time*/
#define PALMAS_NUM_TIME_REGS (PALMAS_YEARS_REG - PALMAS_SECONDS_REG + 1)
+/*
+ * Special bin2bcd mapping to deal with bcd storage of year.
+ *
+ * 0-69 -> 0xD0
+ * 70-99 (1970 - 1999) -> 0xD0 - 0xF9 (correctly rolls to 0x00)
+ * 100-199 (2000 - 2099) -> 0x00 - 0x99 (does not roll to 0xA0 :-( )
+ * 200-229 (2100 - 2129) -> 0xA0 - 0xC9 (really for completeness)
+ * 230- -> 0xC9
+ *
+ * Confirmed: the only transition that does not work correctly for this rtc
+ * clock is the transition from 2099 to 2100, it proceeds to 2000. We will
+ * accept this issue since the clock retains and transitions the year correctly
+ * in all other conditions.
+ */
+static unsigned char year_bin2bcd(int val)
+{
+ if (val < 70)
+ return 0xD0;
+ if (val < 100)
+ return bin2bcd(val - 20) | 0x80; /* KISS leverage of bin2bcd */
+ if (val >= 230)
+ return 0xC9;
+ if (val >= 200)
+ return bin2bcd(val - 180) | 0x80;
+ return bin2bcd(val - 100);
+}
+
+static int year_bcd2bin(unsigned char val)
+{
+ if (val >= 0xD0)
+ return bcd2bin(val & 0x7F) + 20;
+ if (val >= 0xA0)
+ return bcd2bin(val & 0x7F) + 180;
+ return bcd2bin(val) + 100;
+}
+
static int palmas_rtc_read_time(struct device *dev, struct rtc_time *tm)
{
unsigned char rtc_data[PALMAS_NUM_TIME_REGS];
@@ -71,7 +107,7 @@
tm->tm_hour = bcd2bin(rtc_data[2]);
tm->tm_mday = bcd2bin(rtc_data[3]);
tm->tm_mon = bcd2bin(rtc_data[4]) - 1;
- tm->tm_year = bcd2bin(rtc_data[5]) + 100;
+ tm->tm_year = year_bcd2bin(rtc_data[5]);
return ret;
}
@@ -87,7 +123,7 @@
rtc_data[2] = bin2bcd(tm->tm_hour);
rtc_data[3] = bin2bcd(tm->tm_mday);
rtc_data[4] = bin2bcd(tm->tm_mon + 1);
- rtc_data[5] = bin2bcd(tm->tm_year - 100);
+ rtc_data[5] = year_bin2bcd(tm->tm_year);
/* Stop RTC while updating the RTC time registers */
ret = palmas_update_bits(palmas, PALMAS_RTC_BASE, PALMAS_RTC_CTRL_REG,
@@ -142,7 +178,7 @@
alm->time.tm_hour = bcd2bin(alarm_data[2]);
alm->time.tm_mday = bcd2bin(alarm_data[3]);
alm->time.tm_mon = bcd2bin(alarm_data[4]) - 1;
- alm->time.tm_year = bcd2bin(alarm_data[5]) + 100;
+ alm->time.tm_year = year_bcd2bin(alarm_data[5]);
ret = palmas_read(palmas, PALMAS_RTC_BASE, PALMAS_RTC_INTERRUPTS_REG,
&int_val);
@@ -173,7 +209,7 @@
alarm_data[2] = bin2bcd(alm->time.tm_hour);
alarm_data[3] = bin2bcd(alm->time.tm_mday);
alarm_data[4] = bin2bcd(alm->time.tm_mon + 1);
- alarm_data[5] = bin2bcd(alm->time.tm_year - 100);
+ alarm_data[5] = year_bin2bcd(alm->time.tm_year);
ret = palmas_bulk_write(palmas, PALMAS_RTC_BASE,
PALMAS_ALARM_SECONDS_REG, alarm_data, PALMAS_NUM_TIME_REGS);
diff --git a/drivers/scsi/fcoe/fcoe_transport.c b/drivers/scsi/fcoe/fcoe_transport.c
index 375c536..c5eb0c4 100644
--- a/drivers/scsi/fcoe/fcoe_transport.c
+++ b/drivers/scsi/fcoe/fcoe_transport.c
@@ -32,13 +32,13 @@
MODULE_DESCRIPTION("FIP discovery protocol and FCoE transport for FCoE HBAs");
MODULE_LICENSE("GPL v2");
-static int fcoe_transport_create(const char *, struct kernel_param *);
-static int fcoe_transport_destroy(const char *, struct kernel_param *);
+static int fcoe_transport_create(const char *, const struct kernel_param *);
+static int fcoe_transport_destroy(const char *, const struct kernel_param *);
static int fcoe_transport_show(char *buffer, const struct kernel_param *kp);
static struct fcoe_transport *fcoe_transport_lookup(struct net_device *device);
static struct fcoe_transport *fcoe_netdev_map_lookup(struct net_device *device);
-static int fcoe_transport_enable(const char *, struct kernel_param *);
-static int fcoe_transport_disable(const char *, struct kernel_param *);
+static int fcoe_transport_enable(const char *, const struct kernel_param *);
+static int fcoe_transport_disable(const char *, const struct kernel_param *);
static int libfcoe_device_notification(struct notifier_block *notifier,
ulong event, void *ptr);
@@ -865,7 +865,8 @@
*
* Returns: 0 for success
*/
-static int fcoe_transport_create(const char *buffer, struct kernel_param *kp)
+static int fcoe_transport_create(const char *buffer,
+ const struct kernel_param *kp)
{
int rc = -ENODEV;
struct net_device *netdev = NULL;
@@ -930,7 +931,8 @@
*
* Returns: 0 for success
*/
-static int fcoe_transport_destroy(const char *buffer, struct kernel_param *kp)
+static int fcoe_transport_destroy(const char *buffer,
+ const struct kernel_param *kp)
{
int rc = -ENODEV;
struct net_device *netdev = NULL;
@@ -974,7 +976,8 @@
*
* Returns: 0 for success
*/
-static int fcoe_transport_disable(const char *buffer, struct kernel_param *kp)
+static int fcoe_transport_disable(const char *buffer,
+ const struct kernel_param *kp)
{
int rc = -ENODEV;
struct net_device *netdev = NULL;
@@ -1008,7 +1011,8 @@
*
* Returns: 0 for success
*/
-static int fcoe_transport_enable(const char *buffer, struct kernel_param *kp)
+static int fcoe_transport_enable(const char *buffer,
+ const struct kernel_param *kp)
{
int rc = -ENODEV;
struct net_device *netdev = NULL;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 9b716c8..66a7982 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -105,7 +105,7 @@
*
*/
static int
-_scsih_set_fwfault_debug(const char *val, struct kernel_param *kp)
+_scsih_set_fwfault_debug(const char *val, const struct kernel_param *kp)
{
int ret = param_set_int(val, kp);
struct MPT3SAS_ADAPTER *ioc;
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index ae5e579..decf9d5 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -281,7 +281,7 @@
* Note: The logging levels are defined in mpt3sas_debug.h.
*/
static int
-_scsih_set_debug_level(const char *val, struct kernel_param *kp)
+_scsih_set_debug_level(const char *val, const struct kernel_param *kp)
{
int ret = param_set_int(val, kp);
struct MPT3SAS_ADAPTER *ioc;
diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 71a50b9..4630dc8 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,8 +14,19 @@
It is, in theory, a good memory allocator for low-memory devices,
because it can discard shared memory units when under memory pressure.
+config ANDROID_VSOC
+ tristate "Android Virtual SoC support"
+ default n
+ depends on PCI_MSI
+ ---help---
+ This option adds support for the Virtual SoC driver needed to boot
+ a 'cuttlefish' Android image inside QEmu. The driver interacts with
+ a QEmu ivshmem device. If built as a module, it will be called vsoc.
+
source "drivers/staging/android/ion/Kconfig"
+source "drivers/staging/android/fiq_debugger/Kconfig"
+
endif # if ANDROID
endmenu
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7cf1564..2638b4a 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -1,5 +1,7 @@
ccflags-y += -I$(src) # needed for trace events
obj-y += ion/
+obj-$(CONFIG_FIQ_DEBUGGER) += fiq_debugger/
obj-$(CONFIG_ASHMEM) += ashmem.o
+obj-$(CONFIG_ANDROID_VSOC) += vsoc.o
diff --git a/drivers/staging/android/TODO b/drivers/staging/android/TODO
index 5f14247..ebd6ba3 100644
--- a/drivers/staging/android/TODO
+++ b/drivers/staging/android/TODO
@@ -12,5 +12,14 @@
- Split /dev/ion up into multiple nodes (e.g. /dev/ion/heap0)
- Better test framework (integration with VGEM was suggested)
+vsoc.c, uapi/vsoc_shm.h
+ - The current driver uses the same wait queue for all of the futexes in a
+ region. This will cause false wakeups in regions with a large number of
+ waiting threads. We should eventually use multiple queues and select the
+ queue based on the region.
+ - Add debugfs support for examining the permissions of regions.
+ - Remove VSOC_WAIT_FOR_INCOMING_INTERRUPT ioctl. This functionality has been
+ superseded by the futex and is there for legacy reasons.
+
Please send patches to Greg Kroah-Hartman <greg@kroah.com> and Cc:
Arve Hjønnevåg <arve@android.com> and Riley Andrews <riandrews@android.com>
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 893b283..69df278 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -400,22 +400,14 @@
}
get_file(asma->file);
- /*
- * XXX - Reworked to use shmem_zero_setup() instead of
- * shmem_set_file while we're in staging. -jstultz
- */
- if (vma->vm_flags & VM_SHARED) {
- ret = shmem_zero_setup(vma);
- if (ret) {
- fput(asma->file);
- goto out;
- }
+ if (vma->vm_flags & VM_SHARED)
+ shmem_set_file(vma, asma->file);
+ else {
+ if (vma->vm_file)
+ fput(vma->vm_file);
+ vma->vm_file = asma->file;
}
- if (vma->vm_file)
- fput(vma->vm_file);
- vma->vm_file = asma->file;
-
out:
mutex_unlock(&ashmem_mutex);
return ret;
@@ -452,9 +444,9 @@
loff_t start = range->pgstart * PAGE_SIZE;
loff_t end = (range->pgend + 1) * PAGE_SIZE;
- vfs_fallocate(range->asma->file,
- FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
- start, end - start);
+ range->asma->file->f_op->fallocate(range->asma->file,
+ FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+ start, end - start);
range->purged = ASHMEM_WAS_PURGED;
lru_del(range);
diff --git a/drivers/staging/android/fiq_debugger/Kconfig b/drivers/staging/android/fiq_debugger/Kconfig
new file mode 100644
index 0000000..60fc224
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/Kconfig
@@ -0,0 +1,58 @@
+config FIQ_DEBUGGER
+ bool "FIQ Mode Serial Debugger"
+ default n
+ depends on ARM || ARM64
+ help
+ The FIQ serial debugger can accept commands even when the
+ kernel is unresponsive due to being stuck with interrupts
+ disabled.
+
+config FIQ_DEBUGGER_NO_SLEEP
+ bool "Keep serial debugger active"
+ depends on FIQ_DEBUGGER
+ default n
+ help
+ Enables the serial debugger at boot. Passing
+ fiq_debugger.no_sleep on the kernel commandline will
+ override this config option.
+
+config FIQ_DEBUGGER_WAKEUP_IRQ_ALWAYS_ON
+ bool "Don't disable wakeup IRQ when debugger is active"
+ depends on FIQ_DEBUGGER
+ default n
+ help
+ Don't disable the wakeup irq when enabling the uart clock. This will
+ cause extra interrupts, but it makes the serial debugger usable with
+ on some MSM radio builds that ignore the uart clock request in power
+ collapse.
+
+config FIQ_DEBUGGER_CONSOLE
+ bool "Console on FIQ Serial Debugger port"
+ depends on FIQ_DEBUGGER
+ default n
+ help
+ Enables a console so that printk messages are displayed on
+ the debugger serial port as the occur.
+
+config FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE
+ bool "Put the FIQ debugger into console mode by default"
+ depends on FIQ_DEBUGGER_CONSOLE
+ default n
+ help
+ If enabled, this puts the fiq debugger into console mode by default.
+ Otherwise, the fiq debugger will start out in debug mode.
+
+config FIQ_DEBUGGER_UART_OVERLAY
+ bool "Install uart DT overlay"
+ depends on FIQ_DEBUGGER
+ select OF_OVERLAY
+ default n
+ help
+ If enabled, fiq debugger is calling fiq_debugger_uart_overlay()
+ that will apply overlay uart_overlay@0 to disable proper uart.
+
+config FIQ_WATCHDOG
+ bool
+ select FIQ_DEBUGGER
+ select PSTORE_RAM
+ default n
diff --git a/drivers/staging/android/fiq_debugger/Makefile b/drivers/staging/android/fiq_debugger/Makefile
new file mode 100644
index 0000000..a7ca487
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/Makefile
@@ -0,0 +1,4 @@
+obj-y += fiq_debugger.o
+obj-$(CONFIG_ARM) += fiq_debugger_arm.o
+obj-$(CONFIG_ARM64) += fiq_debugger_arm64.o
+obj-$(CONFIG_FIQ_WATCHDOG) += fiq_watchdog.o
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger.c b/drivers/staging/android/fiq_debugger/fiq_debugger.c
new file mode 100644
index 0000000..f6a8062
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_debugger.c
@@ -0,0 +1,1246 @@
+/*
+ * drivers/staging/android/fiq_debugger.c
+ *
+ * Serial Debugger Interface accessed through an FIQ interrupt.
+ *
+ * Copyright (C) 2008 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <stdarg.h>
+#include <linux/module.h>
+#include <linux/io.h>
+#include <linux/console.h>
+#include <linux/interrupt.h>
+#include <linux/clk.h>
+#include <linux/platform_device.h>
+#include <linux/kernel_stat.h>
+#include <linux/kmsg_dump.h>
+#include <linux/irq.h>
+#include <linux/delay.h>
+#include <linux/reboot.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/timer.h>
+#include <linux/tty.h>
+#include <linux/tty_flip.h>
+
+#ifdef CONFIG_FIQ_GLUE
+#include <asm/fiq_glue.h>
+#endif
+
+#ifdef CONFIG_FIQ_DEBUGGER_UART_OVERLAY
+#include <linux/of.h>
+#endif
+
+#include <linux/uaccess.h>
+
+#include "fiq_debugger.h"
+#include "fiq_debugger_priv.h"
+#include "fiq_debugger_ringbuf.h"
+
+#define DEBUG_MAX 64
+#define MAX_UNHANDLED_FIQ_COUNT 1000000
+
+#define MAX_FIQ_DEBUGGER_PORTS 4
+
+struct fiq_debugger_state {
+#ifdef CONFIG_FIQ_GLUE
+ struct fiq_glue_handler handler;
+#endif
+ struct fiq_debugger_output output;
+
+ int fiq;
+ int uart_irq;
+ int signal_irq;
+ int wakeup_irq;
+ bool wakeup_irq_no_set_wake;
+ struct clk *clk;
+ struct fiq_debugger_pdata *pdata;
+ struct platform_device *pdev;
+
+ char debug_cmd[DEBUG_MAX];
+ int debug_busy;
+ int debug_abort;
+
+ char debug_buf[DEBUG_MAX];
+ int debug_count;
+
+ bool no_sleep;
+ bool debug_enable;
+ bool ignore_next_wakeup_irq;
+ struct timer_list sleep_timer;
+ spinlock_t sleep_timer_lock;
+ bool uart_enabled;
+ struct wakeup_source debugger_wake_src;
+ bool console_enable;
+ int current_cpu;
+ atomic_t unhandled_fiq_count;
+ bool in_fiq;
+
+ struct work_struct work;
+ spinlock_t work_lock;
+ char work_cmd[DEBUG_MAX];
+
+#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE
+ spinlock_t console_lock;
+ struct console console;
+ struct tty_port tty_port;
+ struct fiq_debugger_ringbuf *tty_rbuf;
+ bool syslog_dumping;
+#endif
+
+ unsigned int last_irqs[NR_IRQS];
+ unsigned int last_local_timer_irqs[NR_CPUS];
+};
+
+#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE
+struct tty_driver *fiq_tty_driver;
+#endif
+
+#ifdef CONFIG_FIQ_DEBUGGER_NO_SLEEP
+static bool initial_no_sleep = true;
+#else
+static bool initial_no_sleep;
+#endif
+
+#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE_DEFAULT_ENABLE
+static bool initial_debug_enable = true;
+static bool initial_console_enable = true;
+#else
+static bool initial_debug_enable;
+static bool initial_console_enable;
+#endif
+
+static bool fiq_kgdb_enable;
+static bool fiq_debugger_disable;
+
+module_param_named(no_sleep, initial_no_sleep, bool, 0644);
+module_param_named(debug_enable, initial_debug_enable, bool, 0644);
+module_param_named(console_enable, initial_console_enable, bool, 0644);
+module_param_named(kgdb_enable, fiq_kgdb_enable, bool, 0644);
+module_param_named(disable, fiq_debugger_disable, bool, 0644);
+
+#ifdef CONFIG_FIQ_DEBUGGER_WAKEUP_IRQ_ALWAYS_ON
+static inline
+void fiq_debugger_enable_wakeup_irq(struct fiq_debugger_state *state) {}
+static inline
+void fiq_debugger_disable_wakeup_irq(struct fiq_debugger_state *state) {}
+#else
+static inline
+void fiq_debugger_enable_wakeup_irq(struct fiq_debugger_state *state)
+{
+ if (state->wakeup_irq < 0)
+ return;
+ enable_irq(state->wakeup_irq);
+ if (!state->wakeup_irq_no_set_wake)
+ enable_irq_wake(state->wakeup_irq);
+}
+static inline
+void fiq_debugger_disable_wakeup_irq(struct fiq_debugger_state *state)
+{
+ if (state->wakeup_irq < 0)
+ return;
+ disable_irq_nosync(state->wakeup_irq);
+ if (!state->wakeup_irq_no_set_wake)
+ disable_irq_wake(state->wakeup_irq);
+}
+#endif
+
+static inline bool fiq_debugger_have_fiq(struct fiq_debugger_state *state)
+{
+ return (state->fiq >= 0);
+}
+
+#ifdef CONFIG_FIQ_GLUE
+static void fiq_debugger_force_irq(struct fiq_debugger_state *state)
+{
+ unsigned int irq = state->signal_irq;
+
+ if (WARN_ON(!fiq_debugger_have_fiq(state)))
+ return;
+ if (state->pdata->force_irq) {
+ state->pdata->force_irq(state->pdev, irq);
+ } else {
+ struct irq_chip *chip = irq_get_chip(irq);
+ if (chip && chip->irq_retrigger)
+ chip->irq_retrigger(irq_get_irq_data(irq));
+ }
+}
+#endif
+
+static void fiq_debugger_uart_enable(struct fiq_debugger_state *state)
+{
+ if (state->clk)
+ clk_enable(state->clk);
+ if (state->pdata->uart_enable)
+ state->pdata->uart_enable(state->pdev);
+}
+
+static void fiq_debugger_uart_disable(struct fiq_debugger_state *state)
+{
+ if (state->pdata->uart_disable)
+ state->pdata->uart_disable(state->pdev);
+ if (state->clk)
+ clk_disable(state->clk);
+}
+
+static void fiq_debugger_uart_flush(struct fiq_debugger_state *state)
+{
+ if (state->pdata->uart_flush)
+ state->pdata->uart_flush(state->pdev);
+}
+
+static void fiq_debugger_putc(struct fiq_debugger_state *state, char c)
+{
+ state->pdata->uart_putc(state->pdev, c);
+}
+
+static void fiq_debugger_puts(struct fiq_debugger_state *state, char *s)
+{
+ unsigned c;
+ while ((c = *s++)) {
+ if (c == '\n')
+ fiq_debugger_putc(state, '\r');
+ fiq_debugger_putc(state, c);
+ }
+}
+
+static void fiq_debugger_prompt(struct fiq_debugger_state *state)
+{
+ fiq_debugger_puts(state, "debug> ");
+}
+
+static void fiq_debugger_dump_kernel_log(struct fiq_debugger_state *state)
+{
+ char buf[512];
+ size_t len;
+ struct kmsg_dumper dumper = { .active = true };
+
+
+ kmsg_dump_rewind_nolock(&dumper);
+ while (kmsg_dump_get_line_nolock(&dumper, true, buf,
+ sizeof(buf) - 1, &len)) {
+ buf[len] = 0;
+ fiq_debugger_puts(state, buf);
+ }
+}
+
+static void fiq_debugger_printf(struct fiq_debugger_output *output,
+ const char *fmt, ...)
+{
+ struct fiq_debugger_state *state;
+ char buf[256];
+ va_list ap;
+
+ state = container_of(output, struct fiq_debugger_state, output);
+ va_start(ap, fmt);
+ vsnprintf(buf, sizeof(buf), fmt, ap);
+ va_end(ap);
+
+ fiq_debugger_puts(state, buf);
+}
+
+/* Safe outside fiq context */
+static int fiq_debugger_printf_nfiq(void *cookie, const char *fmt, ...)
+{
+ struct fiq_debugger_state *state = cookie;
+ char buf[256];
+ va_list ap;
+ unsigned long irq_flags;
+
+ va_start(ap, fmt);
+ vsnprintf(buf, 128, fmt, ap);
+ va_end(ap);
+
+ local_irq_save(irq_flags);
+ fiq_debugger_puts(state, buf);
+ fiq_debugger_uart_flush(state);
+ local_irq_restore(irq_flags);
+ return state->debug_abort;
+}
+
+static void fiq_debugger_dump_irqs(struct fiq_debugger_state *state)
+{
+ int n;
+ struct irq_desc *desc;
+
+ fiq_debugger_printf(&state->output,
+ "irqnr total since-last status name\n");
+ for_each_irq_desc(n, desc) {
+ struct irqaction *act = desc->action;
+ if (!act && !kstat_irqs(n))
+ continue;
+ fiq_debugger_printf(&state->output, "%5d: %10u %11u %8x %s\n", n,
+ kstat_irqs(n),
+ kstat_irqs(n) - state->last_irqs[n],
+ desc->status_use_accessors,
+ (act && act->name) ? act->name : "???");
+ state->last_irqs[n] = kstat_irqs(n);
+ }
+}
+
+static void fiq_debugger_do_ps(struct fiq_debugger_state *state)
+{
+ struct task_struct *g;
+ struct task_struct *p;
+ unsigned task_state;
+ static const char stat_nam[] = "RSDTtZX";
+
+ fiq_debugger_printf(&state->output, "pid ppid prio task pc\n");
+ read_lock(&tasklist_lock);
+ do_each_thread(g, p) {
+ task_state = p->state ? __ffs(p->state) + 1 : 0;
+ fiq_debugger_printf(&state->output,
+ "%5d %5d %4d ", p->pid, p->parent->pid, p->prio);
+ fiq_debugger_printf(&state->output, "%-13.13s %c", p->comm,
+ task_state >= sizeof(stat_nam) ? '?' : stat_nam[task_state]);
+ if (task_state == TASK_RUNNING)
+ fiq_debugger_printf(&state->output, " running\n");
+ else
+ fiq_debugger_printf(&state->output, " %08lx\n",
+ thread_saved_pc(p));
+ } while_each_thread(g, p);
+ read_unlock(&tasklist_lock);
+}
+
+#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE
+static void fiq_debugger_begin_syslog_dump(struct fiq_debugger_state *state)
+{
+ state->syslog_dumping = true;
+}
+
+static void fiq_debugger_end_syslog_dump(struct fiq_debugger_state *state)
+{
+ state->syslog_dumping = false;
+}
+#else
+extern int do_syslog(int type, char __user *bug, int count);
+static void fiq_debugger_begin_syslog_dump(struct fiq_debugger_state *state)
+{
+ do_syslog(5 /* clear */, NULL, 0);
+}
+
+static void fiq_debugger_end_syslog_dump(struct fiq_debugger_state *state)
+{
+ fiq_debugger_dump_kernel_log(state);
+}
+#endif
+
+static void fiq_debugger_do_sysrq(struct fiq_debugger_state *state, char rq)
+{
+ if ((rq == 'g' || rq == 'G') && !fiq_kgdb_enable) {
+ fiq_debugger_printf(&state->output, "sysrq-g blocked\n");
+ return;
+ }
+ fiq_debugger_begin_syslog_dump(state);
+ handle_sysrq(rq);
+ fiq_debugger_end_syslog_dump(state);
+}
+
+#ifdef CONFIG_KGDB
+static void fiq_debugger_do_kgdb(struct fiq_debugger_state *state)
+{
+ if (!fiq_kgdb_enable) {
+ fiq_debugger_printf(&state->output, "kgdb through fiq debugger not enabled\n");
+ return;
+ }
+
+ fiq_debugger_printf(&state->output, "enabling console and triggering kgdb\n");
+ state->console_enable = true;
+ handle_sysrq('g');
+}
+#endif
+
+static void fiq_debugger_schedule_work(struct fiq_debugger_state *state,
+ char *cmd)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&state->work_lock, flags);
+ if (state->work_cmd[0] != '\0') {
+ fiq_debugger_printf(&state->output, "work command processor busy\n");
+ spin_unlock_irqrestore(&state->work_lock, flags);
+ return;
+ }
+
+ strlcpy(state->work_cmd, cmd, sizeof(state->work_cmd));
+ spin_unlock_irqrestore(&state->work_lock, flags);
+
+ schedule_work(&state->work);
+}
+
+static void fiq_debugger_work(struct work_struct *work)
+{
+ struct fiq_debugger_state *state;
+ char work_cmd[DEBUG_MAX];
+ char *cmd;
+ unsigned long flags;
+
+ state = container_of(work, struct fiq_debugger_state, work);
+
+ spin_lock_irqsave(&state->work_lock, flags);
+
+ strlcpy(work_cmd, state->work_cmd, sizeof(work_cmd));
+ state->work_cmd[0] = '\0';
+
+ spin_unlock_irqrestore(&state->work_lock, flags);
+
+ cmd = work_cmd;
+ if (!strncmp(cmd, "reboot", 6)) {
+ cmd += 6;
+ while (*cmd == ' ')
+ cmd++;
+ if (*cmd != '\0')
+ kernel_restart(cmd);
+ else
+ kernel_restart(NULL);
+ } else {
+ fiq_debugger_printf(&state->output, "unknown work command '%s'\n",
+ work_cmd);
+ }
+}
+
+/* This function CANNOT be called in FIQ context */
+static void fiq_debugger_irq_exec(struct fiq_debugger_state *state, char *cmd)
+{
+ if (!strcmp(cmd, "ps"))
+ fiq_debugger_do_ps(state);
+ if (!strcmp(cmd, "sysrq"))
+ fiq_debugger_do_sysrq(state, 'h');
+ if (!strncmp(cmd, "sysrq ", 6))
+ fiq_debugger_do_sysrq(state, cmd[6]);
+#ifdef CONFIG_KGDB
+ if (!strcmp(cmd, "kgdb"))
+ fiq_debugger_do_kgdb(state);
+#endif
+ if (!strncmp(cmd, "reboot", 6))
+ fiq_debugger_schedule_work(state, cmd);
+}
+
+static void fiq_debugger_help(struct fiq_debugger_state *state)
+{
+ fiq_debugger_printf(&state->output,
+ "FIQ Debugger commands:\n"
+ " pc PC status\n"
+ " regs Register dump\n"
+ " allregs Extended Register dump\n"
+ " bt Stack trace\n"
+ " reboot [<c>] Reboot with command <c>\n"
+ " reset [<c>] Hard reset with command <c>\n"
+ " irqs Interupt status\n"
+ " kmsg Kernel log\n"
+ " version Kernel version\n");
+ fiq_debugger_printf(&state->output,
+ " sleep Allow sleep while in FIQ\n"
+ " nosleep Disable sleep while in FIQ\n"
+ " console Switch terminal to console\n"
+ " cpu Current CPU\n"
+ " cpu <number> Switch to CPU<number>\n");
+ fiq_debugger_printf(&state->output,
+ " ps Process list\n"
+ " sysrq sysrq options\n"
+ " sysrq <param> Execute sysrq with <param>\n");
+#ifdef CONFIG_KGDB
+ fiq_debugger_printf(&state->output,
+ " kgdb Enter kernel debugger\n");
+#endif
+}
+
+static void fiq_debugger_take_affinity(void *info)
+{
+ struct fiq_debugger_state *state = info;
+ struct cpumask cpumask;
+
+ cpumask_clear(&cpumask);
+ cpumask_set_cpu(get_cpu(), &cpumask);
+
+ irq_set_affinity(state->uart_irq, &cpumask);
+}
+
+static void fiq_debugger_switch_cpu(struct fiq_debugger_state *state, int cpu)
+{
+ if (!fiq_debugger_have_fiq(state))
+ smp_call_function_single(cpu, fiq_debugger_take_affinity, state,
+ false);
+ state->current_cpu = cpu;
+}
+
+static bool fiq_debugger_fiq_exec(struct fiq_debugger_state *state,
+ const char *cmd, const struct pt_regs *regs,
+ void *svc_sp)
+{
+ bool signal_helper = false;
+
+ if (!strcmp(cmd, "help") || !strcmp(cmd, "?")) {
+ fiq_debugger_help(state);
+ } else if (!strcmp(cmd, "pc")) {
+ fiq_debugger_dump_pc(&state->output, regs);
+ } else if (!strcmp(cmd, "regs")) {
+ fiq_debugger_dump_regs(&state->output, regs);
+ } else if (!strcmp(cmd, "allregs")) {
+ fiq_debugger_dump_allregs(&state->output, regs);
+ } else if (!strcmp(cmd, "bt")) {
+ fiq_debugger_dump_stacktrace(&state->output, regs, 100, svc_sp);
+ } else if (!strncmp(cmd, "reset", 5)) {
+ cmd += 5;
+ while (*cmd == ' ')
+ cmd++;
+ if (*cmd) {
+ char tmp_cmd[32];
+ strlcpy(tmp_cmd, cmd, sizeof(tmp_cmd));
+ machine_restart(tmp_cmd);
+ } else {
+ machine_restart(NULL);
+ }
+ } else if (!strcmp(cmd, "irqs")) {
+ fiq_debugger_dump_irqs(state);
+ } else if (!strcmp(cmd, "kmsg")) {
+ fiq_debugger_dump_kernel_log(state);
+ } else if (!strcmp(cmd, "version")) {
+ fiq_debugger_printf(&state->output, "%s\n", linux_banner);
+ } else if (!strcmp(cmd, "sleep")) {
+ state->no_sleep = false;
+ fiq_debugger_printf(&state->output, "enabling sleep\n");
+ } else if (!strcmp(cmd, "nosleep")) {
+ state->no_sleep = true;
+ fiq_debugger_printf(&state->output, "disabling sleep\n");
+ } else if (!strcmp(cmd, "console")) {
+ fiq_debugger_printf(&state->output, "console mode\n");
+ fiq_debugger_uart_flush(state);
+ state->console_enable = true;
+ } else if (!strcmp(cmd, "cpu")) {
+ fiq_debugger_printf(&state->output, "cpu %d\n", state->current_cpu);
+ } else if (!strncmp(cmd, "cpu ", 4)) {
+ unsigned long cpu = 0;
+ if (kstrtoul(cmd + 4, 10, &cpu) == 0)
+ fiq_debugger_switch_cpu(state, cpu);
+ else
+ fiq_debugger_printf(&state->output, "invalid cpu\n");
+ fiq_debugger_printf(&state->output, "cpu %d\n", state->current_cpu);
+ } else {
+ if (state->debug_busy) {
+ fiq_debugger_printf(&state->output,
+ "command processor busy. trying to abort.\n");
+ state->debug_abort = -1;
+ } else {
+ strcpy(state->debug_cmd, cmd);
+ state->debug_busy = 1;
+ }
+
+ return true;
+ }
+ if (!state->console_enable)
+ fiq_debugger_prompt(state);
+
+ return signal_helper;
+}
+
+static void fiq_debugger_sleep_timer_expired(unsigned long data)
+{
+ struct fiq_debugger_state *state = (struct fiq_debugger_state *)data;
+ unsigned long flags;
+
+ spin_lock_irqsave(&state->sleep_timer_lock, flags);
+ if (state->uart_enabled && !state->no_sleep) {
+ if (state->debug_enable && !state->console_enable) {
+ state->debug_enable = false;
+ fiq_debugger_printf_nfiq(state,
+ "suspending fiq debugger\n");
+ }
+ state->ignore_next_wakeup_irq = true;
+ fiq_debugger_uart_disable(state);
+ state->uart_enabled = false;
+ fiq_debugger_enable_wakeup_irq(state);
+ }
+ __pm_relax(&state->debugger_wake_src);
+ spin_unlock_irqrestore(&state->sleep_timer_lock, flags);
+}
+
+static void fiq_debugger_handle_wakeup(struct fiq_debugger_state *state)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&state->sleep_timer_lock, flags);
+ if (state->wakeup_irq >= 0 && state->ignore_next_wakeup_irq) {
+ state->ignore_next_wakeup_irq = false;
+ } else if (!state->uart_enabled) {
+ __pm_stay_awake(&state->debugger_wake_src);
+ fiq_debugger_uart_enable(state);
+ state->uart_enabled = true;
+ fiq_debugger_disable_wakeup_irq(state);
+ mod_timer(&state->sleep_timer, jiffies + HZ / 2);
+ }
+ spin_unlock_irqrestore(&state->sleep_timer_lock, flags);
+}
+
+static irqreturn_t fiq_debugger_wakeup_irq_handler(int irq, void *dev)
+{
+ struct fiq_debugger_state *state = dev;
+
+ if (!state->no_sleep)
+ fiq_debugger_puts(state, "WAKEUP\n");
+ fiq_debugger_handle_wakeup(state);
+
+ return IRQ_HANDLED;
+}
+
+static
+void fiq_debugger_handle_console_irq_context(struct fiq_debugger_state *state)
+{
+#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE)
+ if (state->tty_port.ops) {
+ int i;
+ int count = fiq_debugger_ringbuf_level(state->tty_rbuf);
+ for (i = 0; i < count; i++) {
+ int c = fiq_debugger_ringbuf_peek(state->tty_rbuf, 0);
+ tty_insert_flip_char(&state->tty_port, c, TTY_NORMAL);
+ if (!fiq_debugger_ringbuf_consume(state->tty_rbuf, 1))
+ pr_warn("fiq tty failed to consume byte\n");
+ }
+ tty_flip_buffer_push(&state->tty_port);
+ }
+#endif
+}
+
+static void fiq_debugger_handle_irq_context(struct fiq_debugger_state *state)
+{
+ if (!state->no_sleep) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&state->sleep_timer_lock, flags);
+ __pm_stay_awake(&state->debugger_wake_src);
+ mod_timer(&state->sleep_timer, jiffies + HZ * 5);
+ spin_unlock_irqrestore(&state->sleep_timer_lock, flags);
+ }
+ fiq_debugger_handle_console_irq_context(state);
+ if (state->debug_busy) {
+ fiq_debugger_irq_exec(state, state->debug_cmd);
+ if (!state->console_enable)
+ fiq_debugger_prompt(state);
+ state->debug_busy = 0;
+ }
+}
+
+static int fiq_debugger_getc(struct fiq_debugger_state *state)
+{
+ return state->pdata->uart_getc(state->pdev);
+}
+
+static bool fiq_debugger_handle_uart_interrupt(struct fiq_debugger_state *state,
+ int this_cpu, const struct pt_regs *regs, void *svc_sp)
+{
+ int c;
+ static int last_c;
+ int count = 0;
+ bool signal_helper = false;
+
+ if (this_cpu != state->current_cpu) {
+ if (state->in_fiq)
+ return false;
+
+ if (atomic_inc_return(&state->unhandled_fiq_count) !=
+ MAX_UNHANDLED_FIQ_COUNT)
+ return false;
+
+ fiq_debugger_printf(&state->output,
+ "fiq_debugger: cpu %d not responding, "
+ "reverting to cpu %d\n", state->current_cpu,
+ this_cpu);
+
+ atomic_set(&state->unhandled_fiq_count, 0);
+ fiq_debugger_switch_cpu(state, this_cpu);
+ return false;
+ }
+
+ state->in_fiq = true;
+
+ while ((c = fiq_debugger_getc(state)) != FIQ_DEBUGGER_NO_CHAR) {
+ count++;
+ if (!state->debug_enable) {
+ if ((c == 13) || (c == 10)) {
+ state->debug_enable = true;
+ state->debug_count = 0;
+ fiq_debugger_prompt(state);
+ }
+ } else if (c == FIQ_DEBUGGER_BREAK) {
+ state->console_enable = false;
+ fiq_debugger_puts(state, "fiq debugger mode\n");
+ state->debug_count = 0;
+ fiq_debugger_prompt(state);
+#ifdef CONFIG_FIQ_DEBUGGER_CONSOLE
+ } else if (state->console_enable && state->tty_rbuf) {
+ fiq_debugger_ringbuf_push(state->tty_rbuf, c);
+ signal_helper = true;
+#endif
+ } else if ((c >= ' ') && (c < 127)) {
+ if (state->debug_count < (DEBUG_MAX - 1)) {
+ state->debug_buf[state->debug_count++] = c;
+ fiq_debugger_putc(state, c);
+ }
+ } else if ((c == 8) || (c == 127)) {
+ if (state->debug_count > 0) {
+ state->debug_count--;
+ fiq_debugger_putc(state, 8);
+ fiq_debugger_putc(state, ' ');
+ fiq_debugger_putc(state, 8);
+ }
+ } else if ((c == 13) || (c == 10)) {
+ if (c == '\r' || (c == '\n' && last_c != '\r')) {
+ fiq_debugger_putc(state, '\r');
+ fiq_debugger_putc(state, '\n');
+ }
+ if (state->debug_count) {
+ state->debug_buf[state->debug_count] = 0;
+ state->debug_count = 0;
+ signal_helper |=
+ fiq_debugger_fiq_exec(state,
+ state->debug_buf,
+ regs, svc_sp);
+ } else {
+ fiq_debugger_prompt(state);
+ }
+ }
+ last_c = c;
+ }
+ if (!state->console_enable)
+ fiq_debugger_uart_flush(state);
+ if (state->pdata->fiq_ack)
+ state->pdata->fiq_ack(state->pdev, state->fiq);
+
+ /* poke sleep timer if necessary */
+ if (state->debug_enable && !state->no_sleep)
+ signal_helper = true;
+
+ atomic_set(&state->unhandled_fiq_count, 0);
+ state->in_fiq = false;
+
+ return signal_helper;
+}
+
+#ifdef CONFIG_FIQ_GLUE
+static void fiq_debugger_fiq(struct fiq_glue_handler *h,
+ const struct pt_regs *regs, void *svc_sp)
+{
+ struct fiq_debugger_state *state =
+ container_of(h, struct fiq_debugger_state, handler);
+ unsigned int this_cpu = THREAD_INFO(svc_sp)->cpu;
+ bool need_irq;
+
+ need_irq = fiq_debugger_handle_uart_interrupt(state, this_cpu, regs,
+ svc_sp);
+ if (need_irq)
+ fiq_debugger_force_irq(state);
+}
+#endif
+
+/*
+ * When not using FIQs, we only use this single interrupt as an entry point.
+ * This just effectively takes over the UART interrupt and does all the work
+ * in this context.
+ */
+static irqreturn_t fiq_debugger_uart_irq(int irq, void *dev)
+{
+ struct fiq_debugger_state *state = dev;
+ bool not_done;
+
+ fiq_debugger_handle_wakeup(state);
+
+ /* handle the debugger irq in regular context */
+ not_done = fiq_debugger_handle_uart_interrupt(state, smp_processor_id(),
+ get_irq_regs(),
+ current_thread_info());
+ if (not_done)
+ fiq_debugger_handle_irq_context(state);
+
+ return IRQ_HANDLED;
+}
+
+/*
+ * If FIQs are used, not everything can happen in fiq context.
+ * FIQ handler does what it can and then signals this interrupt to finish the
+ * job in irq context.
+ */
+static irqreturn_t fiq_debugger_signal_irq(int irq, void *dev)
+{
+ struct fiq_debugger_state *state = dev;
+
+ if (state->pdata->force_irq_ack)
+ state->pdata->force_irq_ack(state->pdev, state->signal_irq);
+
+ fiq_debugger_handle_irq_context(state);
+
+ return IRQ_HANDLED;
+}
+
+#ifdef CONFIG_FIQ_GLUE
+static void fiq_debugger_resume(struct fiq_glue_handler *h)
+{
+ struct fiq_debugger_state *state =
+ container_of(h, struct fiq_debugger_state, handler);
+ if (state->pdata->uart_resume)
+ state->pdata->uart_resume(state->pdev);
+}
+#endif
+
+#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE)
+struct tty_driver *fiq_debugger_console_device(struct console *co, int *index)
+{
+ *index = co->index;
+ return fiq_tty_driver;
+}
+
+static void fiq_debugger_console_write(struct console *co,
+ const char *s, unsigned int count)
+{
+ struct fiq_debugger_state *state;
+ unsigned long flags;
+
+ state = container_of(co, struct fiq_debugger_state, console);
+
+ if (!state->console_enable && !state->syslog_dumping)
+ return;
+
+ fiq_debugger_uart_enable(state);
+ spin_lock_irqsave(&state->console_lock, flags);
+ while (count--) {
+ if (*s == '\n')
+ fiq_debugger_putc(state, '\r');
+ fiq_debugger_putc(state, *s++);
+ }
+ fiq_debugger_uart_flush(state);
+ spin_unlock_irqrestore(&state->console_lock, flags);
+ fiq_debugger_uart_disable(state);
+}
+
+static struct console fiq_debugger_console = {
+ .name = "ttyFIQ",
+ .device = fiq_debugger_console_device,
+ .write = fiq_debugger_console_write,
+ .flags = CON_PRINTBUFFER | CON_ANYTIME | CON_ENABLED,
+};
+
+int fiq_tty_open(struct tty_struct *tty, struct file *filp)
+{
+ int line = tty->index;
+ struct fiq_debugger_state **states = tty->driver->driver_state;
+ struct fiq_debugger_state *state = states[line];
+
+ return tty_port_open(&state->tty_port, tty, filp);
+}
+
+void fiq_tty_close(struct tty_struct *tty, struct file *filp)
+{
+ tty_port_close(tty->port, tty, filp);
+}
+
+int fiq_tty_write(struct tty_struct *tty, const unsigned char *buf, int count)
+{
+ int i;
+ int line = tty->index;
+ struct fiq_debugger_state **states = tty->driver->driver_state;
+ struct fiq_debugger_state *state = states[line];
+
+ if (!state->console_enable)
+ return count;
+
+ fiq_debugger_uart_enable(state);
+ spin_lock_irq(&state->console_lock);
+ for (i = 0; i < count; i++)
+ fiq_debugger_putc(state, *buf++);
+ spin_unlock_irq(&state->console_lock);
+ fiq_debugger_uart_disable(state);
+
+ return count;
+}
+
+int fiq_tty_write_room(struct tty_struct *tty)
+{
+ return 16;
+}
+
+#ifdef CONFIG_CONSOLE_POLL
+static int fiq_tty_poll_init(struct tty_driver *driver, int line, char *options)
+{
+ return 0;
+}
+
+static int fiq_tty_poll_get_char(struct tty_driver *driver, int line)
+{
+ struct fiq_debugger_state **states = driver->driver_state;
+ struct fiq_debugger_state *state = states[line];
+ int c = NO_POLL_CHAR;
+
+ fiq_debugger_uart_enable(state);
+ if (fiq_debugger_have_fiq(state)) {
+ int count = fiq_debugger_ringbuf_level(state->tty_rbuf);
+ if (count > 0) {
+ c = fiq_debugger_ringbuf_peek(state->tty_rbuf, 0);
+ fiq_debugger_ringbuf_consume(state->tty_rbuf, 1);
+ }
+ } else {
+ c = fiq_debugger_getc(state);
+ if (c == FIQ_DEBUGGER_NO_CHAR)
+ c = NO_POLL_CHAR;
+ }
+ fiq_debugger_uart_disable(state);
+
+ return c;
+}
+
+static void fiq_tty_poll_put_char(struct tty_driver *driver, int line, char ch)
+{
+ struct fiq_debugger_state **states = driver->driver_state;
+ struct fiq_debugger_state *state = states[line];
+ fiq_debugger_uart_enable(state);
+ fiq_debugger_putc(state, ch);
+ fiq_debugger_uart_disable(state);
+}
+#endif
+
+static const struct tty_port_operations fiq_tty_port_ops;
+
+static const struct tty_operations fiq_tty_driver_ops = {
+ .write = fiq_tty_write,
+ .write_room = fiq_tty_write_room,
+ .open = fiq_tty_open,
+ .close = fiq_tty_close,
+#ifdef CONFIG_CONSOLE_POLL
+ .poll_init = fiq_tty_poll_init,
+ .poll_get_char = fiq_tty_poll_get_char,
+ .poll_put_char = fiq_tty_poll_put_char,
+#endif
+};
+
+static int fiq_debugger_tty_init(void)
+{
+ int ret;
+ struct fiq_debugger_state **states = NULL;
+
+ states = kzalloc(sizeof(*states) * MAX_FIQ_DEBUGGER_PORTS, GFP_KERNEL);
+ if (!states) {
+ pr_err("Failed to allocate fiq debugger state structres\n");
+ return -ENOMEM;
+ }
+
+ fiq_tty_driver = alloc_tty_driver(MAX_FIQ_DEBUGGER_PORTS);
+ if (!fiq_tty_driver) {
+ pr_err("Failed to allocate fiq debugger tty\n");
+ ret = -ENOMEM;
+ goto err_free_state;
+ }
+
+ fiq_tty_driver->owner = THIS_MODULE;
+ fiq_tty_driver->driver_name = "fiq-debugger";
+ fiq_tty_driver->name = "ttyFIQ";
+ fiq_tty_driver->type = TTY_DRIVER_TYPE_SERIAL;
+ fiq_tty_driver->subtype = SERIAL_TYPE_NORMAL;
+ fiq_tty_driver->init_termios = tty_std_termios;
+ fiq_tty_driver->flags = TTY_DRIVER_REAL_RAW |
+ TTY_DRIVER_DYNAMIC_DEV;
+ fiq_tty_driver->driver_state = states;
+
+ fiq_tty_driver->init_termios.c_cflag =
+ B115200 | CS8 | CREAD | HUPCL | CLOCAL;
+ fiq_tty_driver->init_termios.c_ispeed = 115200;
+ fiq_tty_driver->init_termios.c_ospeed = 115200;
+
+ tty_set_operations(fiq_tty_driver, &fiq_tty_driver_ops);
+
+ ret = tty_register_driver(fiq_tty_driver);
+ if (ret) {
+ pr_err("Failed to register fiq tty: %d\n", ret);
+ goto err_free_tty;
+ }
+
+ pr_info("Registered FIQ tty driver\n");
+ return 0;
+
+err_free_tty:
+ put_tty_driver(fiq_tty_driver);
+ fiq_tty_driver = NULL;
+err_free_state:
+ kfree(states);
+ return ret;
+}
+
+static int fiq_debugger_tty_init_one(struct fiq_debugger_state *state)
+{
+ int ret;
+ struct device *tty_dev;
+ struct fiq_debugger_state **states = fiq_tty_driver->driver_state;
+
+ states[state->pdev->id] = state;
+
+ state->tty_rbuf = fiq_debugger_ringbuf_alloc(1024);
+ if (!state->tty_rbuf) {
+ pr_err("Failed to allocate fiq debugger ringbuf\n");
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ tty_port_init(&state->tty_port);
+ state->tty_port.ops = &fiq_tty_port_ops;
+
+ tty_dev = tty_port_register_device(&state->tty_port, fiq_tty_driver,
+ state->pdev->id, &state->pdev->dev);
+ if (IS_ERR(tty_dev)) {
+ pr_err("Failed to register fiq debugger tty device\n");
+ ret = PTR_ERR(tty_dev);
+ goto err;
+ }
+
+ device_set_wakeup_capable(tty_dev, 1);
+
+ pr_info("Registered fiq debugger ttyFIQ%d\n", state->pdev->id);
+
+ return 0;
+
+err:
+ fiq_debugger_ringbuf_free(state->tty_rbuf);
+ state->tty_rbuf = NULL;
+ return ret;
+}
+#endif
+
+static int fiq_debugger_dev_suspend(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ struct fiq_debugger_state *state = platform_get_drvdata(pdev);
+
+ if (state->pdata->uart_dev_suspend)
+ return state->pdata->uart_dev_suspend(pdev);
+ return 0;
+}
+
+static int fiq_debugger_dev_resume(struct device *dev)
+{
+ struct platform_device *pdev = to_platform_device(dev);
+ struct fiq_debugger_state *state = platform_get_drvdata(pdev);
+
+ if (state->pdata->uart_dev_resume)
+ return state->pdata->uart_dev_resume(pdev);
+ return 0;
+}
+
+static int fiq_debugger_probe(struct platform_device *pdev)
+{
+ int ret;
+ struct fiq_debugger_pdata *pdata = dev_get_platdata(&pdev->dev);
+ struct fiq_debugger_state *state;
+ int fiq;
+ int uart_irq;
+
+ if (pdev->id >= MAX_FIQ_DEBUGGER_PORTS)
+ return -EINVAL;
+
+ if (!pdata->uart_getc || !pdata->uart_putc)
+ return -EINVAL;
+ if ((pdata->uart_enable && !pdata->uart_disable) ||
+ (!pdata->uart_enable && pdata->uart_disable))
+ return -EINVAL;
+
+ fiq = platform_get_irq_byname(pdev, "fiq");
+ uart_irq = platform_get_irq_byname(pdev, "uart_irq");
+
+ /* uart_irq mode and fiq mode are mutually exclusive, but one of them
+ * is required */
+ if ((uart_irq < 0 && fiq < 0) || (uart_irq >= 0 && fiq >= 0))
+ return -EINVAL;
+ if (fiq >= 0 && !pdata->fiq_enable)
+ return -EINVAL;
+
+ state = kzalloc(sizeof(*state), GFP_KERNEL);
+ state->output.printf = fiq_debugger_printf;
+ setup_timer(&state->sleep_timer, fiq_debugger_sleep_timer_expired,
+ (unsigned long)state);
+ state->pdata = pdata;
+ state->pdev = pdev;
+ state->no_sleep = initial_no_sleep;
+ state->debug_enable = initial_debug_enable;
+ state->console_enable = initial_console_enable;
+
+ state->fiq = fiq;
+ state->uart_irq = uart_irq;
+ state->signal_irq = platform_get_irq_byname(pdev, "signal");
+ state->wakeup_irq = platform_get_irq_byname(pdev, "wakeup");
+
+ INIT_WORK(&state->work, fiq_debugger_work);
+ spin_lock_init(&state->work_lock);
+
+ platform_set_drvdata(pdev, state);
+
+ spin_lock_init(&state->sleep_timer_lock);
+
+ if (state->wakeup_irq < 0 && fiq_debugger_have_fiq(state))
+ state->no_sleep = true;
+ state->ignore_next_wakeup_irq = !state->no_sleep;
+
+ wakeup_source_init(&state->debugger_wake_src, "serial-debug");
+
+ state->clk = clk_get(&pdev->dev, NULL);
+ if (IS_ERR(state->clk))
+ state->clk = NULL;
+
+ /* do not call pdata->uart_enable here since uart_init may still
+ * need to do some initialization before uart_enable can work.
+ * So, only try to manage the clock during init.
+ */
+ if (state->clk)
+ clk_enable(state->clk);
+
+ if (pdata->uart_init) {
+ ret = pdata->uart_init(pdev);
+ if (ret)
+ goto err_uart_init;
+ }
+
+ fiq_debugger_printf_nfiq(state,
+ "<hit enter %sto activate fiq debugger>\n",
+ state->no_sleep ? "" : "twice ");
+
+#ifdef CONFIG_FIQ_GLUE
+ if (fiq_debugger_have_fiq(state)) {
+ state->handler.fiq = fiq_debugger_fiq;
+ state->handler.resume = fiq_debugger_resume;
+ ret = fiq_glue_register_handler(&state->handler);
+ if (ret) {
+ pr_err("%s: could not install fiq handler\n", __func__);
+ goto err_register_irq;
+ }
+
+ pdata->fiq_enable(pdev, state->fiq, 1);
+ } else
+#endif
+ {
+ ret = request_irq(state->uart_irq, fiq_debugger_uart_irq,
+ IRQF_NO_SUSPEND, "debug", state);
+ if (ret) {
+ pr_err("%s: could not install irq handler\n", __func__);
+ goto err_register_irq;
+ }
+
+ /* for irq-only mode, we want this irq to wake us up, if it
+ * can.
+ */
+ enable_irq_wake(state->uart_irq);
+ }
+
+ if (state->clk)
+ clk_disable(state->clk);
+
+ if (state->signal_irq >= 0) {
+ ret = request_irq(state->signal_irq, fiq_debugger_signal_irq,
+ IRQF_TRIGGER_RISING, "debug-signal", state);
+ if (ret)
+ pr_err("serial_debugger: could not install signal_irq");
+ }
+
+ if (state->wakeup_irq >= 0) {
+ ret = request_irq(state->wakeup_irq,
+ fiq_debugger_wakeup_irq_handler,
+ IRQF_TRIGGER_FALLING,
+ "debug-wakeup", state);
+ if (ret) {
+ pr_err("serial_debugger: "
+ "could not install wakeup irq\n");
+ state->wakeup_irq = -1;
+ } else {
+ ret = enable_irq_wake(state->wakeup_irq);
+ if (ret) {
+ pr_err("serial_debugger: "
+ "could not enable wakeup\n");
+ state->wakeup_irq_no_set_wake = true;
+ }
+ }
+ }
+ if (state->no_sleep)
+ fiq_debugger_handle_wakeup(state);
+
+#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE)
+ spin_lock_init(&state->console_lock);
+ state->console = fiq_debugger_console;
+ state->console.index = pdev->id;
+ if (!console_set_on_cmdline)
+ add_preferred_console(state->console.name,
+ state->console.index, NULL);
+ register_console(&state->console);
+ fiq_debugger_tty_init_one(state);
+#endif
+ return 0;
+
+err_register_irq:
+ if (pdata->uart_free)
+ pdata->uart_free(pdev);
+err_uart_init:
+ if (state->clk)
+ clk_disable(state->clk);
+ if (state->clk)
+ clk_put(state->clk);
+ wakeup_source_trash(&state->debugger_wake_src);
+ platform_set_drvdata(pdev, NULL);
+ kfree(state);
+ return ret;
+}
+
+static const struct dev_pm_ops fiq_debugger_dev_pm_ops = {
+ .suspend = fiq_debugger_dev_suspend,
+ .resume = fiq_debugger_dev_resume,
+};
+
+static struct platform_driver fiq_debugger_driver = {
+ .probe = fiq_debugger_probe,
+ .driver = {
+ .name = "fiq_debugger",
+ .pm = &fiq_debugger_dev_pm_ops,
+ },
+};
+
+#if defined(CONFIG_FIQ_DEBUGGER_UART_OVERLAY)
+int fiq_debugger_uart_overlay(void)
+{
+ struct device_node *onp = of_find_node_by_path("/uart_overlay@0");
+ int ret;
+
+ if (!onp) {
+ pr_err("serial_debugger: uart overlay not found\n");
+ return -ENODEV;
+ }
+
+ ret = of_overlay_create(onp);
+ if (ret < 0) {
+ pr_err("serial_debugger: fail to create overlay: %d\n", ret);
+ of_node_put(onp);
+ return ret;
+ }
+
+ pr_info("serial_debugger: uart overlay applied\n");
+ return 0;
+}
+#endif
+
+static int __init fiq_debugger_init(void)
+{
+ if (fiq_debugger_disable) {
+ pr_err("serial_debugger: disabled\n");
+ return -ENODEV;
+ }
+#if defined(CONFIG_FIQ_DEBUGGER_CONSOLE)
+ fiq_debugger_tty_init();
+#endif
+#if defined(CONFIG_FIQ_DEBUGGER_UART_OVERLAY)
+ fiq_debugger_uart_overlay();
+#endif
+ return platform_driver_register(&fiq_debugger_driver);
+}
+
+postcore_initcall(fiq_debugger_init);
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger.h b/drivers/staging/android/fiq_debugger/fiq_debugger.h
new file mode 100644
index 0000000..c9ec4f8
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_debugger.h
@@ -0,0 +1,64 @@
+/*
+ * drivers/staging/android/fiq_debugger/fiq_debugger.h
+ *
+ * Copyright (C) 2010 Google, Inc.
+ * Author: Colin Cross <ccross@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _ARCH_ARM_MACH_TEGRA_FIQ_DEBUGGER_H_
+#define _ARCH_ARM_MACH_TEGRA_FIQ_DEBUGGER_H_
+
+#include <linux/serial_core.h>
+
+#define FIQ_DEBUGGER_NO_CHAR NO_POLL_CHAR
+#define FIQ_DEBUGGER_BREAK 0x00ff0100
+
+#define FIQ_DEBUGGER_FIQ_IRQ_NAME "fiq"
+#define FIQ_DEBUGGER_SIGNAL_IRQ_NAME "signal"
+#define FIQ_DEBUGGER_WAKEUP_IRQ_NAME "wakeup"
+
+/**
+ * struct fiq_debugger_pdata - fiq debugger platform data
+ * @uart_resume: used to restore uart state right before enabling
+ * the fiq.
+ * @uart_enable: Do the work necessary to communicate with the uart
+ * hw (enable clocks, etc.). This must be ref-counted.
+ * @uart_disable: Do the work necessary to disable the uart hw
+ * (disable clocks, etc.). This must be ref-counted.
+ * @uart_dev_suspend: called during PM suspend, generally not needed
+ * for real fiq mode debugger.
+ * @uart_dev_resume: called during PM resume, generally not needed
+ * for real fiq mode debugger.
+ */
+struct fiq_debugger_pdata {
+ int (*uart_init)(struct platform_device *pdev);
+ void (*uart_free)(struct platform_device *pdev);
+ int (*uart_resume)(struct platform_device *pdev);
+ int (*uart_getc)(struct platform_device *pdev);
+ void (*uart_putc)(struct platform_device *pdev, unsigned int c);
+ void (*uart_flush)(struct platform_device *pdev);
+ void (*uart_enable)(struct platform_device *pdev);
+ void (*uart_disable)(struct platform_device *pdev);
+
+ int (*uart_dev_suspend)(struct platform_device *pdev);
+ int (*uart_dev_resume)(struct platform_device *pdev);
+
+ void (*fiq_enable)(struct platform_device *pdev, unsigned int fiq,
+ bool enable);
+ void (*fiq_ack)(struct platform_device *pdev, unsigned int fiq);
+
+ void (*force_irq)(struct platform_device *pdev, unsigned int irq);
+ void (*force_irq_ack)(struct platform_device *pdev, unsigned int irq);
+};
+
+#endif
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_arm.c b/drivers/staging/android/fiq_debugger/fiq_debugger_arm.c
new file mode 100644
index 0000000..8b3e013
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_debugger_arm.c
@@ -0,0 +1,240 @@
+/*
+ * Copyright (C) 2014 Google, Inc.
+ * Author: Colin Cross <ccross@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/ptrace.h>
+#include <linux/uaccess.h>
+
+#include <asm/stacktrace.h>
+
+#include "fiq_debugger_priv.h"
+
+static char *mode_name(unsigned cpsr)
+{
+ switch (cpsr & MODE_MASK) {
+ case USR_MODE: return "USR";
+ case FIQ_MODE: return "FIQ";
+ case IRQ_MODE: return "IRQ";
+ case SVC_MODE: return "SVC";
+ case ABT_MODE: return "ABT";
+ case UND_MODE: return "UND";
+ case SYSTEM_MODE: return "SYS";
+ default: return "???";
+ }
+}
+
+void fiq_debugger_dump_pc(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ output->printf(output, " pc %08x cpsr %08x mode %s\n",
+ regs->ARM_pc, regs->ARM_cpsr, mode_name(regs->ARM_cpsr));
+}
+
+void fiq_debugger_dump_regs(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ output->printf(output,
+ " r0 %08x r1 %08x r2 %08x r3 %08x\n",
+ regs->ARM_r0, regs->ARM_r1, regs->ARM_r2, regs->ARM_r3);
+ output->printf(output,
+ " r4 %08x r5 %08x r6 %08x r7 %08x\n",
+ regs->ARM_r4, regs->ARM_r5, regs->ARM_r6, regs->ARM_r7);
+ output->printf(output,
+ " r8 %08x r9 %08x r10 %08x r11 %08x mode %s\n",
+ regs->ARM_r8, regs->ARM_r9, regs->ARM_r10, regs->ARM_fp,
+ mode_name(regs->ARM_cpsr));
+ output->printf(output,
+ " ip %08x sp %08x lr %08x pc %08x cpsr %08x\n",
+ regs->ARM_ip, regs->ARM_sp, regs->ARM_lr, regs->ARM_pc,
+ regs->ARM_cpsr);
+}
+
+struct mode_regs {
+ unsigned long sp_svc;
+ unsigned long lr_svc;
+ unsigned long spsr_svc;
+
+ unsigned long sp_abt;
+ unsigned long lr_abt;
+ unsigned long spsr_abt;
+
+ unsigned long sp_und;
+ unsigned long lr_und;
+ unsigned long spsr_und;
+
+ unsigned long sp_irq;
+ unsigned long lr_irq;
+ unsigned long spsr_irq;
+
+ unsigned long r8_fiq;
+ unsigned long r9_fiq;
+ unsigned long r10_fiq;
+ unsigned long r11_fiq;
+ unsigned long r12_fiq;
+ unsigned long sp_fiq;
+ unsigned long lr_fiq;
+ unsigned long spsr_fiq;
+};
+
+static void __naked get_mode_regs(struct mode_regs *regs)
+{
+ asm volatile (
+ "mrs r1, cpsr\n"
+ "msr cpsr_c, #0xd3 @(SVC_MODE | PSR_I_BIT | PSR_F_BIT)\n"
+ "stmia r0!, {r13 - r14}\n"
+ "mrs r2, spsr\n"
+ "msr cpsr_c, #0xd7 @(ABT_MODE | PSR_I_BIT | PSR_F_BIT)\n"
+ "stmia r0!, {r2, r13 - r14}\n"
+ "mrs r2, spsr\n"
+ "msr cpsr_c, #0xdb @(UND_MODE | PSR_I_BIT | PSR_F_BIT)\n"
+ "stmia r0!, {r2, r13 - r14}\n"
+ "mrs r2, spsr\n"
+ "msr cpsr_c, #0xd2 @(IRQ_MODE | PSR_I_BIT | PSR_F_BIT)\n"
+ "stmia r0!, {r2, r13 - r14}\n"
+ "mrs r2, spsr\n"
+ "msr cpsr_c, #0xd1 @(FIQ_MODE | PSR_I_BIT | PSR_F_BIT)\n"
+ "stmia r0!, {r2, r8 - r14}\n"
+ "mrs r2, spsr\n"
+ "stmia r0!, {r2}\n"
+ "msr cpsr_c, r1\n"
+ "bx lr\n");
+}
+
+
+void fiq_debugger_dump_allregs(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ struct mode_regs mode_regs;
+ unsigned long mode = regs->ARM_cpsr & MODE_MASK;
+
+ fiq_debugger_dump_regs(output, regs);
+ get_mode_regs(&mode_regs);
+
+ output->printf(output,
+ "%csvc: sp %08x lr %08x spsr %08x\n",
+ mode == SVC_MODE ? '*' : ' ',
+ mode_regs.sp_svc, mode_regs.lr_svc, mode_regs.spsr_svc);
+ output->printf(output,
+ "%cabt: sp %08x lr %08x spsr %08x\n",
+ mode == ABT_MODE ? '*' : ' ',
+ mode_regs.sp_abt, mode_regs.lr_abt, mode_regs.spsr_abt);
+ output->printf(output,
+ "%cund: sp %08x lr %08x spsr %08x\n",
+ mode == UND_MODE ? '*' : ' ',
+ mode_regs.sp_und, mode_regs.lr_und, mode_regs.spsr_und);
+ output->printf(output,
+ "%cirq: sp %08x lr %08x spsr %08x\n",
+ mode == IRQ_MODE ? '*' : ' ',
+ mode_regs.sp_irq, mode_regs.lr_irq, mode_regs.spsr_irq);
+ output->printf(output,
+ "%cfiq: r8 %08x r9 %08x r10 %08x r11 %08x r12 %08x\n",
+ mode == FIQ_MODE ? '*' : ' ',
+ mode_regs.r8_fiq, mode_regs.r9_fiq, mode_regs.r10_fiq,
+ mode_regs.r11_fiq, mode_regs.r12_fiq);
+ output->printf(output,
+ " fiq: sp %08x lr %08x spsr %08x\n",
+ mode_regs.sp_fiq, mode_regs.lr_fiq, mode_regs.spsr_fiq);
+}
+
+struct stacktrace_state {
+ struct fiq_debugger_output *output;
+ unsigned int depth;
+};
+
+static int report_trace(struct stackframe *frame, void *d)
+{
+ struct stacktrace_state *sts = d;
+
+ if (sts->depth) {
+ sts->output->printf(sts->output,
+ " pc: %p (%pF), lr %p (%pF), sp %p, fp %p\n",
+ frame->pc, frame->pc, frame->lr, frame->lr,
+ frame->sp, frame->fp);
+ sts->depth--;
+ return 0;
+ }
+ sts->output->printf(sts->output, " ...\n");
+
+ return sts->depth == 0;
+}
+
+struct frame_tail {
+ struct frame_tail *fp;
+ unsigned long sp;
+ unsigned long lr;
+} __attribute__((packed));
+
+static struct frame_tail *user_backtrace(struct fiq_debugger_output *output,
+ struct frame_tail *tail)
+{
+ struct frame_tail buftail[2];
+
+ /* Also check accessibility of one struct frame_tail beyond */
+ if (!access_ok(VERIFY_READ, tail, sizeof(buftail))) {
+ output->printf(output, " invalid frame pointer %p\n",
+ tail);
+ return NULL;
+ }
+ if (__copy_from_user_inatomic(buftail, tail, sizeof(buftail))) {
+ output->printf(output,
+ " failed to copy frame pointer %p\n", tail);
+ return NULL;
+ }
+
+ output->printf(output, " %p\n", buftail[0].lr);
+
+ /* frame pointers should strictly progress back up the stack
+ * (towards higher addresses) */
+ if (tail >= buftail[0].fp)
+ return NULL;
+
+ return buftail[0].fp-1;
+}
+
+void fiq_debugger_dump_stacktrace(struct fiq_debugger_output *output,
+ const struct pt_regs *regs, unsigned int depth, void *ssp)
+{
+ struct frame_tail *tail;
+ struct thread_info *real_thread_info = THREAD_INFO(ssp);
+ struct stacktrace_state sts;
+
+ sts.depth = depth;
+ sts.output = output;
+ *current_thread_info() = *real_thread_info;
+
+ if (!current)
+ output->printf(output, "current NULL\n");
+ else
+ output->printf(output, "pid: %d comm: %s\n",
+ current->pid, current->comm);
+ fiq_debugger_dump_regs(output, regs);
+
+ if (!user_mode(regs)) {
+ struct stackframe frame;
+ frame.fp = regs->ARM_fp;
+ frame.sp = regs->ARM_sp;
+ frame.lr = regs->ARM_lr;
+ frame.pc = regs->ARM_pc;
+ output->printf(output,
+ " pc: %p (%pF), lr %p (%pF), sp %p, fp %p\n",
+ regs->ARM_pc, regs->ARM_pc, regs->ARM_lr, regs->ARM_lr,
+ regs->ARM_sp, regs->ARM_fp);
+ walk_stackframe(&frame, report_trace, &sts);
+ return;
+ }
+
+ tail = ((struct frame_tail *) regs->ARM_fp) - 1;
+ while (depth-- && tail && !((unsigned long) tail & 3))
+ tail = user_backtrace(output, tail);
+}
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_arm64.c b/drivers/staging/android/fiq_debugger/fiq_debugger_arm64.c
new file mode 100644
index 0000000..c53f498
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_debugger_arm64.c
@@ -0,0 +1,201 @@
+/*
+ * Copyright (C) 2014 Google, Inc.
+ * Author: Colin Cross <ccross@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/ptrace.h>
+#include <asm/stacktrace.h>
+
+#include "fiq_debugger_priv.h"
+
+static char *mode_name(const struct pt_regs *regs)
+{
+ if (compat_user_mode(regs)) {
+ return "USR";
+ } else {
+ switch (processor_mode(regs)) {
+ case PSR_MODE_EL0t: return "EL0t";
+ case PSR_MODE_EL1t: return "EL1t";
+ case PSR_MODE_EL1h: return "EL1h";
+ case PSR_MODE_EL2t: return "EL2t";
+ case PSR_MODE_EL2h: return "EL2h";
+ default: return "???";
+ }
+ }
+}
+
+void fiq_debugger_dump_pc(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ output->printf(output, " pc %016lx cpsr %08lx mode %s\n",
+ regs->pc, regs->pstate, mode_name(regs));
+}
+
+void fiq_debugger_dump_regs_aarch32(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ output->printf(output, " r0 %08x r1 %08x r2 %08x r3 %08x\n",
+ regs->compat_usr(0), regs->compat_usr(1),
+ regs->compat_usr(2), regs->compat_usr(3));
+ output->printf(output, " r4 %08x r5 %08x r6 %08x r7 %08x\n",
+ regs->compat_usr(4), regs->compat_usr(5),
+ regs->compat_usr(6), regs->compat_usr(7));
+ output->printf(output, " r8 %08x r9 %08x r10 %08x r11 %08x\n",
+ regs->compat_usr(8), regs->compat_usr(9),
+ regs->compat_usr(10), regs->compat_usr(11));
+ output->printf(output, " ip %08x sp %08x lr %08x pc %08x\n",
+ regs->compat_usr(12), regs->compat_sp,
+ regs->compat_lr, regs->pc);
+ output->printf(output, " cpsr %08x (%s)\n",
+ regs->pstate, mode_name(regs));
+}
+
+void fiq_debugger_dump_regs_aarch64(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+
+ output->printf(output, " x0 %016lx x1 %016lx\n",
+ regs->regs[0], regs->regs[1]);
+ output->printf(output, " x2 %016lx x3 %016lx\n",
+ regs->regs[2], regs->regs[3]);
+ output->printf(output, " x4 %016lx x5 %016lx\n",
+ regs->regs[4], regs->regs[5]);
+ output->printf(output, " x6 %016lx x7 %016lx\n",
+ regs->regs[6], regs->regs[7]);
+ output->printf(output, " x8 %016lx x9 %016lx\n",
+ regs->regs[8], regs->regs[9]);
+ output->printf(output, " x10 %016lx x11 %016lx\n",
+ regs->regs[10], regs->regs[11]);
+ output->printf(output, " x12 %016lx x13 %016lx\n",
+ regs->regs[12], regs->regs[13]);
+ output->printf(output, " x14 %016lx x15 %016lx\n",
+ regs->regs[14], regs->regs[15]);
+ output->printf(output, " x16 %016lx x17 %016lx\n",
+ regs->regs[16], regs->regs[17]);
+ output->printf(output, " x18 %016lx x19 %016lx\n",
+ regs->regs[18], regs->regs[19]);
+ output->printf(output, " x20 %016lx x21 %016lx\n",
+ regs->regs[20], regs->regs[21]);
+ output->printf(output, " x22 %016lx x23 %016lx\n",
+ regs->regs[22], regs->regs[23]);
+ output->printf(output, " x24 %016lx x25 %016lx\n",
+ regs->regs[24], regs->regs[25]);
+ output->printf(output, " x26 %016lx x27 %016lx\n",
+ regs->regs[26], regs->regs[27]);
+ output->printf(output, " x28 %016lx x29 %016lx\n",
+ regs->regs[28], regs->regs[29]);
+ output->printf(output, " x30 %016lx sp %016lx\n",
+ regs->regs[30], regs->sp);
+ output->printf(output, " pc %016lx cpsr %08x (%s)\n",
+ regs->pc, regs->pstate, mode_name(regs));
+}
+
+void fiq_debugger_dump_regs(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ if (compat_user_mode(regs))
+ fiq_debugger_dump_regs_aarch32(output, regs);
+ else
+ fiq_debugger_dump_regs_aarch64(output, regs);
+}
+
+#define READ_SPECIAL_REG(x) ({ \
+ u64 val; \
+ asm volatile ("mrs %0, " # x : "=r"(val)); \
+ val; \
+})
+
+void fiq_debugger_dump_allregs(struct fiq_debugger_output *output,
+ const struct pt_regs *regs)
+{
+ u32 pstate = READ_SPECIAL_REG(CurrentEl);
+ bool in_el2 = (pstate & PSR_MODE_MASK) >= PSR_MODE_EL2t;
+
+ fiq_debugger_dump_regs(output, regs);
+
+ output->printf(output, " sp_el0 %016lx\n",
+ READ_SPECIAL_REG(sp_el0));
+
+ if (in_el2)
+ output->printf(output, " sp_el1 %016lx\n",
+ READ_SPECIAL_REG(sp_el1));
+
+ output->printf(output, " elr_el1 %016lx\n",
+ READ_SPECIAL_REG(elr_el1));
+
+ output->printf(output, " spsr_el1 %08lx\n",
+ READ_SPECIAL_REG(spsr_el1));
+
+ if (in_el2) {
+ output->printf(output, " spsr_irq %08lx\n",
+ READ_SPECIAL_REG(spsr_irq));
+ output->printf(output, " spsr_abt %08lx\n",
+ READ_SPECIAL_REG(spsr_abt));
+ output->printf(output, " spsr_und %08lx\n",
+ READ_SPECIAL_REG(spsr_und));
+ output->printf(output, " spsr_fiq %08lx\n",
+ READ_SPECIAL_REG(spsr_fiq));
+ output->printf(output, " spsr_el2 %08lx\n",
+ READ_SPECIAL_REG(elr_el2));
+ output->printf(output, " spsr_el2 %08lx\n",
+ READ_SPECIAL_REG(spsr_el2));
+ }
+}
+
+struct stacktrace_state {
+ struct fiq_debugger_output *output;
+ unsigned int depth;
+};
+
+static int report_trace(struct stackframe *frame, void *d)
+{
+ struct stacktrace_state *sts = d;
+
+ if (sts->depth) {
+ sts->output->printf(sts->output, "%pF:\n", frame->pc);
+ sts->output->printf(sts->output,
+ " pc %016lx fp %016lx\n",
+ frame->pc, frame->fp);
+ sts->depth--;
+ return 0;
+ }
+ sts->output->printf(sts->output, " ...\n");
+
+ return sts->depth == 0;
+}
+
+void fiq_debugger_dump_stacktrace(struct fiq_debugger_output *output,
+ const struct pt_regs *regs, unsigned int depth, void *ssp)
+{
+ struct thread_info *real_thread_info = THREAD_INFO(ssp);
+ struct stacktrace_state sts;
+
+ sts.depth = depth;
+ sts.output = output;
+ *current_thread_info() = *real_thread_info;
+
+ if (!current)
+ output->printf(output, "current NULL\n");
+ else
+ output->printf(output, "pid: %d comm: %s\n",
+ current->pid, current->comm);
+ fiq_debugger_dump_regs(output, regs);
+
+ if (!user_mode(regs)) {
+ struct stackframe frame;
+ frame.fp = regs->regs[29];
+ frame.pc = regs->pc;
+ output->printf(output, "\n");
+ walk_stackframe(current, &frame, report_trace, &sts);
+ }
+}
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_priv.h b/drivers/staging/android/fiq_debugger/fiq_debugger_priv.h
new file mode 100644
index 0000000..d5d051f
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_debugger_priv.h
@@ -0,0 +1,37 @@
+/*
+ * Copyright (C) 2014 Google, Inc.
+ * Author: Colin Cross <ccross@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _FIQ_DEBUGGER_PRIV_H_
+#define _FIQ_DEBUGGER_PRIV_H_
+
+#define THREAD_INFO(sp) ((struct thread_info *) \
+ ((unsigned long)(sp) & ~(THREAD_SIZE - 1)))
+
+struct fiq_debugger_output {
+ void (*printf)(struct fiq_debugger_output *output, const char *fmt, ...);
+};
+
+struct pt_regs;
+
+void fiq_debugger_dump_pc(struct fiq_debugger_output *output,
+ const struct pt_regs *regs);
+void fiq_debugger_dump_regs(struct fiq_debugger_output *output,
+ const struct pt_regs *regs);
+void fiq_debugger_dump_allregs(struct fiq_debugger_output *output,
+ const struct pt_regs *regs);
+void fiq_debugger_dump_stacktrace(struct fiq_debugger_output *output,
+ const struct pt_regs *regs, unsigned int depth, void *ssp);
+
+#endif
diff --git a/drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h b/drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h
new file mode 100644
index 0000000..10c3c5d
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h
@@ -0,0 +1,94 @@
+/*
+ * drivers/staging/android/fiq_debugger/fiq_debugger_ringbuf.h
+ *
+ * simple lockless ringbuffer
+ *
+ * Copyright (C) 2010 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+struct fiq_debugger_ringbuf {
+ int len;
+ int head;
+ int tail;
+ u8 buf[];
+};
+
+
+static inline struct fiq_debugger_ringbuf *fiq_debugger_ringbuf_alloc(int len)
+{
+ struct fiq_debugger_ringbuf *rbuf;
+
+ rbuf = kzalloc(sizeof(*rbuf) + len, GFP_KERNEL);
+ if (rbuf == NULL)
+ return NULL;
+
+ rbuf->len = len;
+ rbuf->head = 0;
+ rbuf->tail = 0;
+ smp_mb();
+
+ return rbuf;
+}
+
+static inline void fiq_debugger_ringbuf_free(struct fiq_debugger_ringbuf *rbuf)
+{
+ kfree(rbuf);
+}
+
+static inline int fiq_debugger_ringbuf_level(struct fiq_debugger_ringbuf *rbuf)
+{
+ int level = rbuf->head - rbuf->tail;
+
+ if (level < 0)
+ level = rbuf->len + level;
+
+ return level;
+}
+
+static inline int fiq_debugger_ringbuf_room(struct fiq_debugger_ringbuf *rbuf)
+{
+ return rbuf->len - fiq_debugger_ringbuf_level(rbuf) - 1;
+}
+
+static inline u8
+fiq_debugger_ringbuf_peek(struct fiq_debugger_ringbuf *rbuf, int i)
+{
+ return rbuf->buf[(rbuf->tail + i) % rbuf->len];
+}
+
+static inline int
+fiq_debugger_ringbuf_consume(struct fiq_debugger_ringbuf *rbuf, int count)
+{
+ count = min(count, fiq_debugger_ringbuf_level(rbuf));
+
+ rbuf->tail = (rbuf->tail + count) % rbuf->len;
+ smp_mb();
+
+ return count;
+}
+
+static inline int
+fiq_debugger_ringbuf_push(struct fiq_debugger_ringbuf *rbuf, u8 datum)
+{
+ if (fiq_debugger_ringbuf_room(rbuf) == 0)
+ return 0;
+
+ rbuf->buf[rbuf->head] = datum;
+ smp_mb();
+ rbuf->head = (rbuf->head + 1) % rbuf->len;
+ smp_mb();
+
+ return 1;
+}
diff --git a/drivers/staging/android/fiq_debugger/fiq_watchdog.c b/drivers/staging/android/fiq_debugger/fiq_watchdog.c
new file mode 100644
index 0000000..194b541
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_watchdog.c
@@ -0,0 +1,56 @@
+/*
+ * Copyright (C) 2014 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/spinlock.h>
+#include <linux/pstore_ram.h>
+
+#include "fiq_watchdog.h"
+#include "fiq_debugger_priv.h"
+
+static DEFINE_RAW_SPINLOCK(fiq_watchdog_lock);
+
+static void fiq_watchdog_printf(struct fiq_debugger_output *output,
+ const char *fmt, ...)
+{
+ char buf[256];
+ va_list ap;
+ int len;
+
+ va_start(ap, fmt);
+ len = vscnprintf(buf, sizeof(buf), fmt, ap);
+ va_end(ap);
+
+ ramoops_console_write_buf(buf, len);
+}
+
+struct fiq_debugger_output fiq_watchdog_output = {
+ .printf = fiq_watchdog_printf,
+};
+
+void fiq_watchdog_triggered(const struct pt_regs *regs, void *svc_sp)
+{
+ char msg[24];
+ int len;
+
+ raw_spin_lock(&fiq_watchdog_lock);
+
+ len = scnprintf(msg, sizeof(msg), "watchdog fiq cpu %d\n",
+ THREAD_INFO(svc_sp)->cpu);
+ ramoops_console_write_buf(msg, len);
+
+ fiq_debugger_dump_stacktrace(&fiq_watchdog_output, regs, 100, svc_sp);
+
+ raw_spin_unlock(&fiq_watchdog_lock);
+}
diff --git a/drivers/staging/android/fiq_debugger/fiq_watchdog.h b/drivers/staging/android/fiq_debugger/fiq_watchdog.h
new file mode 100644
index 0000000..c6b507f
--- /dev/null
+++ b/drivers/staging/android/fiq_debugger/fiq_watchdog.h
@@ -0,0 +1,20 @@
+/*
+ * Copyright (C) 2014 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _FIQ_WATCHDOG_H_
+#define _FIQ_WATCHDOG_H_
+
+void fiq_watchdog_triggered(const struct pt_regs *regs, void *svc_sp);
+
+#endif
diff --git a/drivers/staging/android/uapi/vsoc_shm.h b/drivers/staging/android/uapi/vsoc_shm.h
new file mode 100644
index 0000000..741b138
--- /dev/null
+++ b/drivers/staging/android/uapi/vsoc_shm.h
@@ -0,0 +1,303 @@
+/*
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_VSOC_SHM_H
+#define _UAPI_LINUX_VSOC_SHM_H
+
+#include <linux/types.h>
+
+/**
+ * A permission is a token that permits a receiver to read and/or write an area
+ * of memory within a Vsoc region.
+ *
+ * An fd_scoped permission grants both read and write access, and can be
+ * attached to a file description (see open(2)).
+ * Ownership of the area can then be shared by passing a file descriptor
+ * among processes.
+ *
+ * begin_offset and end_offset define the area of memory that is controlled by
+ * the permission. owner_offset points to a word, also in shared memory, that
+ * controls ownership of the area.
+ *
+ * ownership of the region expires when the associated file description is
+ * released.
+ *
+ * At most one permission can be attached to each file description.
+ *
+ * This is useful when implementing HALs like gralloc that scope and pass
+ * ownership of shared resources via file descriptors.
+ *
+ * The caller is responsibe for doing any fencing.
+ *
+ * The calling process will normally identify a currently free area of
+ * memory. It will construct a proposed fd_scoped_permission_arg structure:
+ *
+ * begin_offset and end_offset describe the area being claimed
+ *
+ * owner_offset points to the location in shared memory that indicates the
+ * owner of the area.
+ *
+ * owned_value is the value that will be stored in owner_offset iff the
+ * permission can be granted. It must be different than VSOC_REGION_FREE.
+ *
+ * Two fd_scoped_permission structures are compatible if they vary only by
+ * their owned_value fields.
+ *
+ * The driver ensures that, for any group of simultaneous callers proposing
+ * compatible fd_scoped_permissions, it will accept exactly one of the
+ * propopsals. The other callers will get a failure with errno of EAGAIN.
+ *
+ * A process receiving a file descriptor can identify the region being
+ * granted using the VSOC_GET_FD_SCOPED_PERMISSION ioctl.
+ */
+struct fd_scoped_permission {
+ __u32 begin_offset;
+ __u32 end_offset;
+ __u32 owner_offset;
+ __u32 owned_value;
+};
+
+/*
+ * This value represents a free area of memory. The driver expects to see this
+ * value at owner_offset when creating a permission otherwise it will not do it,
+ * and will write this value back once the permission is no longer needed.
+ */
+#define VSOC_REGION_FREE ((__u32)0)
+
+/**
+ * ioctl argument for VSOC_CREATE_FD_SCOPE_PERMISSION
+ */
+struct fd_scoped_permission_arg {
+ struct fd_scoped_permission perm;
+ __s32 managed_region_fd;
+};
+
+#define VSOC_NODE_FREE ((__u32)0)
+
+/*
+ * Describes a signal table in shared memory. Each non-zero entry in the
+ * table indicates that the receiver should signal the futex at the given
+ * offset. Offsets are relative to the region, not the shared memory window.
+ *
+ * interrupt_signalled_offset is used to reliably signal interrupts across the
+ * vmm boundary. There are two roles: transmitter and receiver. For example,
+ * in the host_to_guest_signal_table the host is the transmitter and the
+ * guest is the receiver. The protocol is as follows:
+ *
+ * 1. The transmitter should convert the offset of the futex to an offset
+ * in the signal table [0, (1 << num_nodes_lg2))
+ * The transmitter can choose any appropriate hashing algorithm, including
+ * hash = futex_offset & ((1 << num_nodes_lg2) - 1)
+ *
+ * 3. The transmitter should atomically compare and swap futex_offset with 0
+ * at hash. There are 3 possible outcomes
+ * a. The swap fails because the futex_offset is already in the table.
+ * The transmitter should stop.
+ * b. Some other offset is in the table. This is a hash collision. The
+ * transmitter should move to another table slot and try again. One
+ * possible algorithm:
+ * hash = (hash + 1) & ((1 << num_nodes_lg2) - 1)
+ * c. The swap worked. Continue below.
+ *
+ * 3. The transmitter atomically swaps 1 with the value at the
+ * interrupt_signalled_offset. There are two outcomes:
+ * a. The prior value was 1. In this case an interrupt has already been
+ * posted. The transmitter is done.
+ * b. The prior value was 0, indicating that the receiver may be sleeping.
+ * The transmitter will issue an interrupt.
+ *
+ * 4. On waking the receiver immediately exchanges a 0 with the
+ * interrupt_signalled_offset. If it receives a 0 then this a spurious
+ * interrupt. That may occasionally happen in the current protocol, but
+ * should be rare.
+ *
+ * 5. The receiver scans the signal table by atomicaly exchanging 0 at each
+ * location. If a non-zero offset is returned from the exchange the
+ * receiver wakes all sleepers at the given offset:
+ * futex((int*)(region_base + old_value), FUTEX_WAKE, MAX_INT);
+ *
+ * 6. The receiver thread then does a conditional wait, waking immediately
+ * if the value at interrupt_signalled_offset is non-zero. This catches cases
+ * here additional signals were posted while the table was being scanned.
+ * On the guest the wait is handled via the VSOC_WAIT_FOR_INCOMING_INTERRUPT
+ * ioctl.
+ */
+struct vsoc_signal_table_layout {
+ /* log_2(Number of signal table entries) */
+ __u32 num_nodes_lg2;
+ /*
+ * Offset to the first signal table entry relative to the start of the
+ * region
+ */
+ __u32 futex_uaddr_table_offset;
+ /*
+ * Offset to an atomic_t / atomic uint32_t. A non-zero value indicates
+ * that one or more offsets are currently posted in the table.
+ * semi-unique access to an entry in the table
+ */
+ __u32 interrupt_signalled_offset;
+};
+
+#define VSOC_REGION_WHOLE ((__s32)0)
+#define VSOC_DEVICE_NAME_SZ 16
+
+/**
+ * Each HAL would (usually) talk to a single device region
+ * Mulitple entities care about these regions:
+ * - The ivshmem_server will populate the regions in shared memory
+ * - The guest kernel will read the region, create minor device nodes, and
+ * allow interested parties to register for FUTEX_WAKE events in the region
+ * - HALs will access via the minor device nodes published by the guest kernel
+ * - Host side processes will access the region via the ivshmem_server:
+ * 1. Pass name to ivshmem_server at a UNIX socket
+ * 2. ivshmemserver will reply with 2 fds:
+ * - host->guest doorbell fd
+ * - guest->host doorbell fd
+ * - fd for the shared memory region
+ * - region offset
+ * 3. Start a futex receiver thread on the doorbell fd pointed at the
+ * signal_nodes
+ */
+struct vsoc_device_region {
+ __u16 current_version;
+ __u16 min_compatible_version;
+ __u32 region_begin_offset;
+ __u32 region_end_offset;
+ __u32 offset_of_region_data;
+ struct vsoc_signal_table_layout guest_to_host_signal_table;
+ struct vsoc_signal_table_layout host_to_guest_signal_table;
+ /* Name of the device. Must always be terminated with a '\0', so
+ * the longest supported device name is 15 characters.
+ */
+ char device_name[VSOC_DEVICE_NAME_SZ];
+ /* There are two ways that permissions to access regions are handled:
+ * - When subdivided_by is VSOC_REGION_WHOLE, any process that can
+ * open the device node for the region gains complete access to it.
+ * - When subdivided is set processes that open the region cannot
+ * access it. Access to a sub-region must be established by invoking
+ * the VSOC_CREATE_FD_SCOPE_PERMISSION ioctl on the region
+ * referenced in subdivided_by, providing a fileinstance
+ * (represented by a fd) opened on this region.
+ */
+ __u32 managed_by;
+};
+
+/*
+ * The vsoc layout descriptor.
+ * The first 4K should be reserved for the shm header and region descriptors.
+ * The regions should be page aligned.
+ */
+
+struct vsoc_shm_layout_descriptor {
+ __u16 major_version;
+ __u16 minor_version;
+
+ /* size of the shm. This may be redundant but nice to have */
+ __u32 size;
+
+ /* number of shared memory regions */
+ __u32 region_count;
+
+ /* The offset to the start of region descriptors */
+ __u32 vsoc_region_desc_offset;
+};
+
+/*
+ * This specifies the current version that should be stored in
+ * vsoc_shm_layout_descriptor.major_version and
+ * vsoc_shm_layout_descriptor.minor_version.
+ * It should be updated only if the vsoc_device_region and
+ * vsoc_shm_layout_descriptor structures have changed.
+ * Versioning within each region is transferred
+ * via the min_compatible_version and current_version fields in
+ * vsoc_device_region. The driver does not consult these fields: they are left
+ * for the HALs and host processes and will change independently of the layout
+ * version.
+ */
+#define CURRENT_VSOC_LAYOUT_MAJOR_VERSION 2
+#define CURRENT_VSOC_LAYOUT_MINOR_VERSION 0
+
+#define VSOC_CREATE_FD_SCOPED_PERMISSION \
+ _IOW(0xF5, 0, struct fd_scoped_permission)
+#define VSOC_GET_FD_SCOPED_PERMISSION _IOR(0xF5, 1, struct fd_scoped_permission)
+
+/*
+ * This is used to signal the host to scan the guest_to_host_signal_table
+ * for new futexes to wake. This sends an interrupt if one is not already
+ * in flight.
+ */
+#define VSOC_MAYBE_SEND_INTERRUPT_TO_HOST _IO(0xF5, 2)
+
+/*
+ * When this returns the guest will scan host_to_guest_signal_table to
+ * check for new futexes to wake.
+ */
+/* TODO(ghartman): Consider moving this to the bottom half */
+#define VSOC_WAIT_FOR_INCOMING_INTERRUPT _IO(0xF5, 3)
+
+/*
+ * Guest HALs will use this to retrieve the region description after
+ * opening their device node.
+ */
+#define VSOC_DESCRIBE_REGION _IOR(0xF5, 4, struct vsoc_device_region)
+
+/*
+ * Wake any threads that may be waiting for a host interrupt on this region.
+ * This is mostly used during shutdown.
+ */
+#define VSOC_SELF_INTERRUPT _IO(0xF5, 5)
+
+/*
+ * This is used to signal the host to scan the guest_to_host_signal_table
+ * for new futexes to wake. This sends an interrupt unconditionally.
+ */
+#define VSOC_SEND_INTERRUPT_TO_HOST _IO(0xF5, 6)
+
+enum wait_types {
+ VSOC_WAIT_UNDEFINED = 0,
+ VSOC_WAIT_IF_EQUAL = 1,
+ VSOC_WAIT_IF_EQUAL_TIMEOUT = 2
+};
+
+/*
+ * Wait for a condition to be true
+ *
+ * Note, this is sized and aligned so the 32 bit and 64 bit layouts are
+ * identical.
+ */
+struct vsoc_cond_wait {
+ /* Input: Offset of the 32 bit word to check */
+ __u32 offset;
+ /* Input: Value that will be compared with the offset */
+ __u32 value;
+ /* Monotonic time to wake at in seconds */
+ __u64 wake_time_sec;
+ /* Input: Monotonic time to wait in nanoseconds */
+ __u32 wake_time_nsec;
+ /* Input: Type of wait */
+ __u32 wait_type;
+ /* Output: Number of times the thread woke before returning. */
+ __u32 wakes;
+ /* Ensure that we're 8-byte aligned and 8 byte length for 32/64 bit
+ * compatibility.
+ */
+ __u32 reserved_1;
+};
+
+#define VSOC_COND_WAIT _IOWR(0xF5, 7, struct vsoc_cond_wait)
+
+/* Wake any local threads waiting at the offset given in arg */
+#define VSOC_COND_WAKE _IO(0xF5, 8)
+
+#endif /* _UAPI_LINUX_VSOC_SHM_H */
diff --git a/drivers/staging/android/vsoc.c b/drivers/staging/android/vsoc.c
new file mode 100644
index 0000000..954ed2c
--- /dev/null
+++ b/drivers/staging/android/vsoc.c
@@ -0,0 +1,1165 @@
+/*
+ * drivers/android/staging/vsoc.c
+ *
+ * Android Virtual System on a Chip (VSoC) driver
+ *
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * Author: ghartman@google.com
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ *
+ * Based on drivers/char/kvm_ivshmem.c - driver for KVM Inter-VM shared memory
+ * Copyright 2009 Cam Macdonell <cam@cs.ualberta.ca>
+ *
+ * Based on cirrusfb.c and 8139cp.c:
+ * Copyright 1999-2001 Jeff Garzik
+ * Copyright 2001-2004 Jeff Garzik
+ */
+
+#include <linux/dma-mapping.h>
+#include <linux/freezer.h>
+#include <linux/futex.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/proc_fs.h>
+#include <linux/sched.h>
+#include <linux/syscalls.h>
+#include <linux/uaccess.h>
+#include <linux/interrupt.h>
+#include <linux/mutex.h>
+#include <linux/cdev.h>
+#include <linux/file.h>
+#include "uapi/vsoc_shm.h"
+
+#define VSOC_DEV_NAME "vsoc"
+
+/*
+ * Description of the ivshmem-doorbell PCI device used by QEmu. These
+ * constants follow docs/specs/ivshmem-spec.txt, which can be found in
+ * the QEmu repository. This was last reconciled with the version that
+ * came out with 2.8
+ */
+
+/*
+ * These constants are determined KVM Inter-VM shared memory device
+ * register offsets
+ */
+enum {
+ INTR_MASK = 0x00, /* Interrupt Mask */
+ INTR_STATUS = 0x04, /* Interrupt Status */
+ IV_POSITION = 0x08, /* VM ID */
+ DOORBELL = 0x0c, /* Doorbell */
+};
+
+static const int REGISTER_BAR; /* Equal to 0 */
+static const int MAX_REGISTER_BAR_LEN = 0x100;
+/*
+ * The MSI-x BAR is not used directly.
+ *
+ * static const int MSI_X_BAR = 1;
+ */
+static const int SHARED_MEMORY_BAR = 2;
+
+struct vsoc_region_data {
+ char name[VSOC_DEVICE_NAME_SZ + 1];
+ wait_queue_head_t interrupt_wait_queue;
+ /* TODO(b/73664181): Use multiple futex wait queues */
+ wait_queue_head_t futex_wait_queue;
+ /* Flag indicating that an interrupt has been signalled by the host. */
+ atomic_t *incoming_signalled;
+ /* Flag indicating the guest has signalled the host. */
+ atomic_t *outgoing_signalled;
+ bool irq_requested;
+ bool device_created;
+};
+
+struct vsoc_device {
+ /* Kernel virtual address of REGISTER_BAR. */
+ void __iomem *regs;
+ /* Physical address of SHARED_MEMORY_BAR. */
+ phys_addr_t shm_phys_start;
+ /* Kernel virtual address of SHARED_MEMORY_BAR. */
+ void __iomem *kernel_mapped_shm;
+ /* Size of the entire shared memory window in bytes. */
+ size_t shm_size;
+ /*
+ * Pointer to the virtual address of the shared memory layout structure.
+ * This is probably identical to kernel_mapped_shm, but saving this
+ * here saves a lot of annoying casts.
+ */
+ struct vsoc_shm_layout_descriptor *layout;
+ /*
+ * Points to a table of region descriptors in the kernel's virtual
+ * address space. Calculated from
+ * vsoc_shm_layout_descriptor.vsoc_region_desc_offset
+ */
+ struct vsoc_device_region *regions;
+ /* Head of a list of permissions that have been granted. */
+ struct list_head permissions;
+ struct pci_dev *dev;
+ /* Per-region (and therefore per-interrupt) information. */
+ struct vsoc_region_data *regions_data;
+ /*
+ * Table of msi-x entries. This has to be separated from struct
+ * vsoc_region_data because the kernel deals with them as an array.
+ */
+ struct msix_entry *msix_entries;
+ /* Mutex that protectes the permission list */
+ struct mutex mtx;
+ /* Major number assigned by the kernel */
+ int major;
+ /* Character device assigned by the kernel */
+ struct cdev cdev;
+ /* Device class assigned by the kernel */
+ struct class *class;
+ /*
+ * Flags that indicate what we've initialized. These are used to do an
+ * orderly cleanup of the device.
+ */
+ bool enabled_device;
+ bool requested_regions;
+ bool cdev_added;
+ bool class_added;
+ bool msix_enabled;
+};
+
+static struct vsoc_device vsoc_dev;
+
+/*
+ * TODO(ghartman): Add a /sys filesystem entry that summarizes the permissions.
+ */
+
+struct fd_scoped_permission_node {
+ struct fd_scoped_permission permission;
+ struct list_head list;
+};
+
+struct vsoc_private_data {
+ struct fd_scoped_permission_node *fd_scoped_permission_node;
+};
+
+static long vsoc_ioctl(struct file *, unsigned int, unsigned long);
+static int vsoc_mmap(struct file *, struct vm_area_struct *);
+static int vsoc_open(struct inode *, struct file *);
+static int vsoc_release(struct inode *, struct file *);
+static ssize_t vsoc_read(struct file *, char __user *, size_t, loff_t *);
+static ssize_t vsoc_write(struct file *, const char __user *, size_t, loff_t *);
+static loff_t vsoc_lseek(struct file *filp, loff_t offset, int origin);
+static int do_create_fd_scoped_permission(
+ struct vsoc_device_region *region_p,
+ struct fd_scoped_permission_node *np,
+ struct fd_scoped_permission_arg __user *arg);
+static void do_destroy_fd_scoped_permission(
+ struct vsoc_device_region *owner_region_p,
+ struct fd_scoped_permission *perm);
+static long do_vsoc_describe_region(struct file *,
+ struct vsoc_device_region __user *);
+static ssize_t vsoc_get_area(struct file *filp, __u32 *perm_off);
+
+/**
+ * Validate arguments on entry points to the driver.
+ */
+inline int vsoc_validate_inode(struct inode *inode)
+{
+ if (iminor(inode) >= vsoc_dev.layout->region_count) {
+ dev_err(&vsoc_dev.dev->dev,
+ "describe_region: invalid region %d\n", iminor(inode));
+ return -ENODEV;
+ }
+ return 0;
+}
+
+inline int vsoc_validate_filep(struct file *filp)
+{
+ int ret = vsoc_validate_inode(file_inode(filp));
+
+ if (ret)
+ return ret;
+ if (!filp->private_data) {
+ dev_err(&vsoc_dev.dev->dev,
+ "No private data on fd, region %d\n",
+ iminor(file_inode(filp)));
+ return -EBADFD;
+ }
+ return 0;
+}
+
+/* Converts from shared memory offset to virtual address */
+static inline void *shm_off_to_virtual_addr(__u32 offset)
+{
+ return (void __force *)vsoc_dev.kernel_mapped_shm + offset;
+}
+
+/* Converts from shared memory offset to physical address */
+static inline phys_addr_t shm_off_to_phys_addr(__u32 offset)
+{
+ return vsoc_dev.shm_phys_start + offset;
+}
+
+/**
+ * Convenience functions to obtain the region from the inode or file.
+ * Dangerous to call before validating the inode/file.
+ */
+static inline struct vsoc_device_region *vsoc_region_from_inode(
+ struct inode *inode)
+{
+ return &vsoc_dev.regions[iminor(inode)];
+}
+
+static inline struct vsoc_device_region *vsoc_region_from_filep(
+ struct file *inode)
+{
+ return vsoc_region_from_inode(file_inode(inode));
+}
+
+static inline uint32_t vsoc_device_region_size(struct vsoc_device_region *r)
+{
+ return r->region_end_offset - r->region_begin_offset;
+}
+
+static const struct file_operations vsoc_ops = {
+ .owner = THIS_MODULE,
+ .open = vsoc_open,
+ .mmap = vsoc_mmap,
+ .read = vsoc_read,
+ .unlocked_ioctl = vsoc_ioctl,
+ .compat_ioctl = vsoc_ioctl,
+ .write = vsoc_write,
+ .llseek = vsoc_lseek,
+ .release = vsoc_release,
+};
+
+static struct pci_device_id vsoc_id_table[] = {
+ {0x1af4, 0x1110, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
+ {0},
+};
+
+MODULE_DEVICE_TABLE(pci, vsoc_id_table);
+
+static void vsoc_remove_device(struct pci_dev *pdev);
+static int vsoc_probe_device(struct pci_dev *pdev,
+ const struct pci_device_id *ent);
+
+static struct pci_driver vsoc_pci_driver = {
+ .name = "vsoc",
+ .id_table = vsoc_id_table,
+ .probe = vsoc_probe_device,
+ .remove = vsoc_remove_device,
+};
+
+static int do_create_fd_scoped_permission(
+ struct vsoc_device_region *region_p,
+ struct fd_scoped_permission_node *np,
+ struct fd_scoped_permission_arg __user *arg)
+{
+ struct file *managed_filp;
+ s32 managed_fd;
+ atomic_t *owner_ptr = NULL;
+ struct vsoc_device_region *managed_region_p;
+
+ if (copy_from_user(&np->permission, &arg->perm, sizeof(*np)) ||
+ copy_from_user(&managed_fd,
+ &arg->managed_region_fd, sizeof(managed_fd))) {
+ return -EFAULT;
+ }
+ managed_filp = fdget(managed_fd).file;
+ /* Check that it's a valid fd, */
+ if (!managed_filp || vsoc_validate_filep(managed_filp))
+ return -EPERM;
+ /* EEXIST if the given fd already has a permission. */
+ if (((struct vsoc_private_data *)managed_filp->private_data)->
+ fd_scoped_permission_node)
+ return -EEXIST;
+ managed_region_p = vsoc_region_from_filep(managed_filp);
+ /* Check that the provided region is managed by this one */
+ if (&vsoc_dev.regions[managed_region_p->managed_by] != region_p)
+ return -EPERM;
+ /* The area must be well formed and have non-zero size */
+ if (np->permission.begin_offset >= np->permission.end_offset)
+ return -EINVAL;
+ /* The area must fit in the memory window */
+ if (np->permission.end_offset >
+ vsoc_device_region_size(managed_region_p))
+ return -ERANGE;
+ /* The area must be in the region data section */
+ if (np->permission.begin_offset <
+ managed_region_p->offset_of_region_data)
+ return -ERANGE;
+ /* The area must be page aligned */
+ if (!PAGE_ALIGNED(np->permission.begin_offset) ||
+ !PAGE_ALIGNED(np->permission.end_offset))
+ return -EINVAL;
+ /* Owner offset must be naturally aligned in the window */
+ if (np->permission.owner_offset &
+ (sizeof(np->permission.owner_offset) - 1))
+ return -EINVAL;
+ /* The owner flag must reside in the owner memory */
+ if (np->permission.owner_offset + sizeof(np->permission.owner_offset) >
+ vsoc_device_region_size(region_p))
+ return -ERANGE;
+ /* The owner flag must reside in the data section */
+ if (np->permission.owner_offset < region_p->offset_of_region_data)
+ return -EINVAL;
+ /* The owner value must change to claim the memory */
+ if (np->permission.owned_value == VSOC_REGION_FREE)
+ return -EINVAL;
+ owner_ptr =
+ (atomic_t *)shm_off_to_virtual_addr(region_p->region_begin_offset +
+ np->permission.owner_offset);
+ /* We've already verified that this is in the shared memory window, so
+ * it should be safe to write to this address.
+ */
+ if (atomic_cmpxchg(owner_ptr,
+ VSOC_REGION_FREE,
+ np->permission.owned_value) != VSOC_REGION_FREE) {
+ return -EBUSY;
+ }
+ ((struct vsoc_private_data *)managed_filp->private_data)->
+ fd_scoped_permission_node = np;
+ /* The file offset needs to be adjusted if the calling
+ * process did any read/write operations on the fd
+ * before creating the permission.
+ */
+ if (managed_filp->f_pos) {
+ if (managed_filp->f_pos > np->permission.end_offset) {
+ /* If the offset is beyond the permission end, set it
+ * to the end.
+ */
+ managed_filp->f_pos = np->permission.end_offset;
+ } else {
+ /* If the offset is within the permission interval
+ * keep it there otherwise reset it to zero.
+ */
+ if (managed_filp->f_pos < np->permission.begin_offset) {
+ managed_filp->f_pos = 0;
+ } else {
+ managed_filp->f_pos -=
+ np->permission.begin_offset;
+ }
+ }
+ }
+ return 0;
+}
+
+static void do_destroy_fd_scoped_permission_node(
+ struct vsoc_device_region *owner_region_p,
+ struct fd_scoped_permission_node *node)
+{
+ if (node) {
+ do_destroy_fd_scoped_permission(owner_region_p,
+ &node->permission);
+ mutex_lock(&vsoc_dev.mtx);
+ list_del(&node->list);
+ mutex_unlock(&vsoc_dev.mtx);
+ kfree(node);
+ }
+}
+
+static void do_destroy_fd_scoped_permission(
+ struct vsoc_device_region *owner_region_p,
+ struct fd_scoped_permission *perm)
+{
+ atomic_t *owner_ptr = NULL;
+ int prev = 0;
+
+ if (!perm)
+ return;
+ owner_ptr = (atomic_t *)shm_off_to_virtual_addr(
+ owner_region_p->region_begin_offset + perm->owner_offset);
+ prev = atomic_xchg(owner_ptr, VSOC_REGION_FREE);
+ if (prev != perm->owned_value)
+ dev_err(&vsoc_dev.dev->dev,
+ "%x-%x: owner (%s) %x: expected to be %x was %x",
+ perm->begin_offset, perm->end_offset,
+ owner_region_p->device_name, perm->owner_offset,
+ perm->owned_value, prev);
+}
+
+static long do_vsoc_describe_region(struct file *filp,
+ struct vsoc_device_region __user *dest)
+{
+ struct vsoc_device_region *region_p;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ region_p = vsoc_region_from_filep(filp);
+ if (copy_to_user(dest, region_p, sizeof(*region_p)))
+ return -EFAULT;
+ return 0;
+}
+
+/**
+ * Implements the inner logic of cond_wait. Copies to and from userspace are
+ * done in the helper function below.
+ */
+static int handle_vsoc_cond_wait(struct file *filp, struct vsoc_cond_wait *arg)
+{
+ DEFINE_WAIT(wait);
+ u32 region_number = iminor(file_inode(filp));
+ struct vsoc_region_data *data = vsoc_dev.regions_data + region_number;
+ struct hrtimer_sleeper timeout, *to = NULL;
+ int ret = 0;
+ struct vsoc_device_region *region_p = vsoc_region_from_filep(filp);
+ atomic_t *address = NULL;
+ struct timespec ts;
+
+ /* Ensure that the offset is aligned */
+ if (arg->offset & (sizeof(uint32_t) - 1))
+ return -EADDRNOTAVAIL;
+ /* Ensure that the offset is within shared memory */
+ if (((uint64_t)arg->offset) + region_p->region_begin_offset +
+ sizeof(uint32_t) > region_p->region_end_offset)
+ return -E2BIG;
+ address = shm_off_to_virtual_addr(region_p->region_begin_offset +
+ arg->offset);
+
+ /* Ensure that the type of wait is valid */
+ switch (arg->wait_type) {
+ case VSOC_WAIT_IF_EQUAL:
+ break;
+ case VSOC_WAIT_IF_EQUAL_TIMEOUT:
+ to = &timeout;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (to) {
+ /* Copy the user-supplied timesec into the kernel structure.
+ * We do things this way to flatten differences between 32 bit
+ * and 64 bit timespecs.
+ */
+ ts.tv_sec = arg->wake_time_sec;
+ ts.tv_nsec = arg->wake_time_nsec;
+
+ if (!timespec_valid(&ts))
+ return -EINVAL;
+ hrtimer_init_on_stack(&to->timer, CLOCK_MONOTONIC,
+ HRTIMER_MODE_ABS);
+ hrtimer_set_expires_range_ns(&to->timer, timespec_to_ktime(ts),
+ current->timer_slack_ns);
+
+ hrtimer_init_sleeper(to, current);
+ }
+
+ while (1) {
+ prepare_to_wait(&data->futex_wait_queue, &wait,
+ TASK_INTERRUPTIBLE);
+ /*
+ * Check the sentinel value after prepare_to_wait. If the value
+ * changes after this check the writer will call signal,
+ * changing the task state from INTERRUPTIBLE to RUNNING. That
+ * will ensure that schedule() will eventually schedule this
+ * task.
+ */
+ if (atomic_read(address) != arg->value) {
+ ret = 0;
+ break;
+ }
+ if (to) {
+ hrtimer_start_expires(&to->timer, HRTIMER_MODE_ABS);
+ if (likely(to->task))
+ freezable_schedule();
+ hrtimer_cancel(&to->timer);
+ if (!to->task) {
+ ret = -ETIMEDOUT;
+ break;
+ }
+ } else {
+ freezable_schedule();
+ }
+ /* Count the number of times that we woke up. This is useful
+ * for unit testing.
+ */
+ ++arg->wakes;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ break;
+ }
+ }
+ finish_wait(&data->futex_wait_queue, &wait);
+ if (to)
+ destroy_hrtimer_on_stack(&to->timer);
+ return ret;
+}
+
+/**
+ * Handles the details of copying from/to userspace to ensure that the copies
+ * happen on all of the return paths of cond_wait.
+ */
+static int do_vsoc_cond_wait(struct file *filp,
+ struct vsoc_cond_wait __user *untrusted_in)
+{
+ struct vsoc_cond_wait arg;
+ int rval = 0;
+
+ if (copy_from_user(&arg, untrusted_in, sizeof(arg)))
+ return -EFAULT;
+ /* wakes is an out parameter. Initialize it to something sensible. */
+ arg.wakes = 0;
+ rval = handle_vsoc_cond_wait(filp, &arg);
+ if (copy_to_user(untrusted_in, &arg, sizeof(arg)))
+ return -EFAULT;
+ return rval;
+}
+
+static int do_vsoc_cond_wake(struct file *filp, uint32_t offset)
+{
+ struct vsoc_device_region *region_p = vsoc_region_from_filep(filp);
+ u32 region_number = iminor(file_inode(filp));
+ struct vsoc_region_data *data = vsoc_dev.regions_data + region_number;
+ /* Ensure that the offset is aligned */
+ if (offset & (sizeof(uint32_t) - 1))
+ return -EADDRNOTAVAIL;
+ /* Ensure that the offset is within shared memory */
+ if (((uint64_t)offset) + region_p->region_begin_offset +
+ sizeof(uint32_t) > region_p->region_end_offset)
+ return -E2BIG;
+ /*
+ * TODO(b/73664181): Use multiple futex wait queues.
+ * We need to wake every sleeper when the condition changes. Typically
+ * only a single thread will be waiting on the condition, but there
+ * are exceptions. The worst case is about 10 threads.
+ */
+ wake_up_interruptible_all(&data->futex_wait_queue);
+ return 0;
+}
+
+static long vsoc_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+ int rv = 0;
+ struct vsoc_device_region *region_p;
+ u32 reg_num;
+ struct vsoc_region_data *reg_data;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ region_p = vsoc_region_from_filep(filp);
+ reg_num = iminor(file_inode(filp));
+ reg_data = vsoc_dev.regions_data + reg_num;
+ switch (cmd) {
+ case VSOC_CREATE_FD_SCOPED_PERMISSION:
+ {
+ struct fd_scoped_permission_node *node = NULL;
+
+ node = kzalloc(sizeof(*node), GFP_KERNEL);
+ /* We can't allocate memory for the permission */
+ if (!node)
+ return -ENOMEM;
+ INIT_LIST_HEAD(&node->list);
+ rv = do_create_fd_scoped_permission(
+ region_p,
+ node,
+ (struct fd_scoped_permission_arg __user *)arg);
+ if (!rv) {
+ mutex_lock(&vsoc_dev.mtx);
+ list_add(&node->list, &vsoc_dev.permissions);
+ mutex_unlock(&vsoc_dev.mtx);
+ } else {
+ kfree(node);
+ return rv;
+ }
+ }
+ break;
+
+ case VSOC_GET_FD_SCOPED_PERMISSION:
+ {
+ struct fd_scoped_permission_node *node =
+ ((struct vsoc_private_data *)filp->private_data)->
+ fd_scoped_permission_node;
+ if (!node)
+ return -ENOENT;
+ if (copy_to_user
+ ((struct fd_scoped_permission __user *)arg,
+ &node->permission, sizeof(node->permission)))
+ return -EFAULT;
+ }
+ break;
+
+ case VSOC_MAYBE_SEND_INTERRUPT_TO_HOST:
+ if (!atomic_xchg(
+ reg_data->outgoing_signalled,
+ 1)) {
+ writel(reg_num, vsoc_dev.regs + DOORBELL);
+ return 0;
+ } else {
+ return -EBUSY;
+ }
+ break;
+
+ case VSOC_SEND_INTERRUPT_TO_HOST:
+ writel(reg_num, vsoc_dev.regs + DOORBELL);
+ return 0;
+
+ case VSOC_WAIT_FOR_INCOMING_INTERRUPT:
+ wait_event_interruptible(
+ reg_data->interrupt_wait_queue,
+ (atomic_read(reg_data->incoming_signalled) != 0));
+ break;
+
+ case VSOC_DESCRIBE_REGION:
+ return do_vsoc_describe_region(
+ filp,
+ (struct vsoc_device_region __user *)arg);
+
+ case VSOC_SELF_INTERRUPT:
+ atomic_set(reg_data->incoming_signalled, 1);
+ wake_up_interruptible(®_data->interrupt_wait_queue);
+ break;
+
+ case VSOC_COND_WAIT:
+ return do_vsoc_cond_wait(filp,
+ (struct vsoc_cond_wait __user *)arg);
+ case VSOC_COND_WAKE:
+ return do_vsoc_cond_wake(filp, arg);
+
+ default:
+ return -EINVAL;
+ }
+ return 0;
+}
+
+static ssize_t vsoc_read(struct file *filp, char __user *buffer, size_t len,
+ loff_t *poffset)
+{
+ __u32 area_off;
+ const void *area_p;
+ ssize_t area_len;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ area_len = vsoc_get_area(filp, &area_off);
+ area_p = shm_off_to_virtual_addr(area_off);
+ area_p += *poffset;
+ area_len -= *poffset;
+ if (area_len <= 0)
+ return 0;
+ if (area_len < len)
+ len = area_len;
+ if (copy_to_user(buffer, area_p, len))
+ return -EFAULT;
+ *poffset += len;
+ return len;
+}
+
+static loff_t vsoc_lseek(struct file *filp, loff_t offset, int origin)
+{
+ ssize_t area_len = 0;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ area_len = vsoc_get_area(filp, NULL);
+ switch (origin) {
+ case SEEK_SET:
+ break;
+
+ case SEEK_CUR:
+ if (offset > 0 && offset + filp->f_pos < 0)
+ return -EOVERFLOW;
+ offset += filp->f_pos;
+ break;
+
+ case SEEK_END:
+ if (offset > 0 && offset + area_len < 0)
+ return -EOVERFLOW;
+ offset += area_len;
+ break;
+
+ case SEEK_DATA:
+ if (offset >= area_len)
+ return -EINVAL;
+ if (offset < 0)
+ offset = 0;
+ break;
+
+ case SEEK_HOLE:
+ /* Next hole is always the end of the region, unless offset is
+ * beyond that
+ */
+ if (offset < area_len)
+ offset = area_len;
+ break;
+
+ default:
+ return -EINVAL;
+ }
+
+ if (offset < 0 || offset > area_len)
+ return -EINVAL;
+ filp->f_pos = offset;
+
+ return offset;
+}
+
+static ssize_t vsoc_write(struct file *filp, const char __user *buffer,
+ size_t len, loff_t *poffset)
+{
+ __u32 area_off;
+ void *area_p;
+ ssize_t area_len;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ area_len = vsoc_get_area(filp, &area_off);
+ area_p = shm_off_to_virtual_addr(area_off);
+ area_p += *poffset;
+ area_len -= *poffset;
+ if (area_len <= 0)
+ return 0;
+ if (area_len < len)
+ len = area_len;
+ if (copy_from_user(area_p, buffer, len))
+ return -EFAULT;
+ *poffset += len;
+ return len;
+}
+
+static irqreturn_t vsoc_interrupt(int irq, void *region_data_v)
+{
+ struct vsoc_region_data *region_data =
+ (struct vsoc_region_data *)region_data_v;
+ int reg_num = region_data - vsoc_dev.regions_data;
+
+ if (unlikely(!region_data))
+ return IRQ_NONE;
+
+ if (unlikely(reg_num < 0 ||
+ reg_num >= vsoc_dev.layout->region_count)) {
+ dev_err(&vsoc_dev.dev->dev,
+ "invalid irq @%p reg_num=0x%04x\n",
+ region_data, reg_num);
+ return IRQ_NONE;
+ }
+ if (unlikely(vsoc_dev.regions_data + reg_num != region_data)) {
+ dev_err(&vsoc_dev.dev->dev,
+ "irq not aligned @%p reg_num=0x%04x\n",
+ region_data, reg_num);
+ return IRQ_NONE;
+ }
+ wake_up_interruptible(®ion_data->interrupt_wait_queue);
+ return IRQ_HANDLED;
+}
+
+static int vsoc_probe_device(struct pci_dev *pdev,
+ const struct pci_device_id *ent)
+{
+ int result;
+ int i;
+ resource_size_t reg_size;
+ dev_t devt;
+
+ vsoc_dev.dev = pdev;
+ result = pci_enable_device(pdev);
+ if (result) {
+ dev_err(&pdev->dev,
+ "pci_enable_device failed %s: error %d\n",
+ pci_name(pdev), result);
+ return result;
+ }
+ vsoc_dev.enabled_device = true;
+ result = pci_request_regions(pdev, "vsoc");
+ if (result < 0) {
+ dev_err(&pdev->dev, "pci_request_regions failed\n");
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+ vsoc_dev.requested_regions = true;
+ /* Set up the control registers in BAR 0 */
+ reg_size = pci_resource_len(pdev, REGISTER_BAR);
+ if (reg_size > MAX_REGISTER_BAR_LEN)
+ vsoc_dev.regs =
+ pci_iomap(pdev, REGISTER_BAR, MAX_REGISTER_BAR_LEN);
+ else
+ vsoc_dev.regs = pci_iomap(pdev, REGISTER_BAR, reg_size);
+
+ if (!vsoc_dev.regs) {
+ dev_err(&pdev->dev,
+ "cannot map registers of size %zu\n",
+ (size_t)reg_size);
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+
+ /* Map the shared memory in BAR 2 */
+ vsoc_dev.shm_phys_start = pci_resource_start(pdev, SHARED_MEMORY_BAR);
+ vsoc_dev.shm_size = pci_resource_len(pdev, SHARED_MEMORY_BAR);
+
+ dev_info(&pdev->dev, "shared memory @ DMA %pa size=0x%zx\n",
+ &vsoc_dev.shm_phys_start, vsoc_dev.shm_size);
+ vsoc_dev.kernel_mapped_shm = pci_iomap_wc(pdev, SHARED_MEMORY_BAR, 0);
+ if (!vsoc_dev.kernel_mapped_shm) {
+ dev_err(&vsoc_dev.dev->dev, "cannot iomap region\n");
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+
+ vsoc_dev.layout = (struct vsoc_shm_layout_descriptor __force *)
+ vsoc_dev.kernel_mapped_shm;
+ dev_info(&pdev->dev, "major_version: %d\n",
+ vsoc_dev.layout->major_version);
+ dev_info(&pdev->dev, "minor_version: %d\n",
+ vsoc_dev.layout->minor_version);
+ dev_info(&pdev->dev, "size: 0x%x\n", vsoc_dev.layout->size);
+ dev_info(&pdev->dev, "regions: %d\n", vsoc_dev.layout->region_count);
+ if (vsoc_dev.layout->major_version !=
+ CURRENT_VSOC_LAYOUT_MAJOR_VERSION) {
+ dev_err(&vsoc_dev.dev->dev,
+ "driver supports only major_version %d\n",
+ CURRENT_VSOC_LAYOUT_MAJOR_VERSION);
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+ result = alloc_chrdev_region(&devt, 0, vsoc_dev.layout->region_count,
+ VSOC_DEV_NAME);
+ if (result) {
+ dev_err(&vsoc_dev.dev->dev, "alloc_chrdev_region failed\n");
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+ vsoc_dev.major = MAJOR(devt);
+ cdev_init(&vsoc_dev.cdev, &vsoc_ops);
+ vsoc_dev.cdev.owner = THIS_MODULE;
+ result = cdev_add(&vsoc_dev.cdev, devt, vsoc_dev.layout->region_count);
+ if (result) {
+ dev_err(&vsoc_dev.dev->dev, "cdev_add error\n");
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+ vsoc_dev.cdev_added = true;
+ vsoc_dev.class = class_create(THIS_MODULE, VSOC_DEV_NAME);
+ if (IS_ERR(vsoc_dev.class)) {
+ dev_err(&vsoc_dev.dev->dev, "class_create failed\n");
+ vsoc_remove_device(pdev);
+ return PTR_ERR(vsoc_dev.class);
+ }
+ vsoc_dev.class_added = true;
+ vsoc_dev.regions = (struct vsoc_device_region __force *)
+ ((void *)vsoc_dev.layout +
+ vsoc_dev.layout->vsoc_region_desc_offset);
+ vsoc_dev.msix_entries = kcalloc(
+ vsoc_dev.layout->region_count,
+ sizeof(vsoc_dev.msix_entries[0]), GFP_KERNEL);
+ if (!vsoc_dev.msix_entries) {
+ dev_err(&vsoc_dev.dev->dev,
+ "unable to allocate msix_entries\n");
+ vsoc_remove_device(pdev);
+ return -ENOSPC;
+ }
+ vsoc_dev.regions_data = kcalloc(
+ vsoc_dev.layout->region_count,
+ sizeof(vsoc_dev.regions_data[0]), GFP_KERNEL);
+ if (!vsoc_dev.regions_data) {
+ dev_err(&vsoc_dev.dev->dev,
+ "unable to allocate regions' data\n");
+ vsoc_remove_device(pdev);
+ return -ENOSPC;
+ }
+ for (i = 0; i < vsoc_dev.layout->region_count; ++i)
+ vsoc_dev.msix_entries[i].entry = i;
+
+ result = pci_enable_msix_exact(vsoc_dev.dev, vsoc_dev.msix_entries,
+ vsoc_dev.layout->region_count);
+ if (result) {
+ dev_info(&pdev->dev, "pci_enable_msix failed: %d\n", result);
+ vsoc_remove_device(pdev);
+ return -ENOSPC;
+ }
+ /* Check that all regions are well formed */
+ for (i = 0; i < vsoc_dev.layout->region_count; ++i) {
+ const struct vsoc_device_region *region = vsoc_dev.regions + i;
+
+ if (!PAGE_ALIGNED(region->region_begin_offset) ||
+ !PAGE_ALIGNED(region->region_end_offset)) {
+ dev_err(&vsoc_dev.dev->dev,
+ "region %d not aligned (%x:%x)", i,
+ region->region_begin_offset,
+ region->region_end_offset);
+ vsoc_remove_device(pdev);
+ return -EFAULT;
+ }
+ if (region->region_begin_offset >= region->region_end_offset ||
+ region->region_end_offset > vsoc_dev.shm_size) {
+ dev_err(&vsoc_dev.dev->dev,
+ "region %d offsets are wrong: %x %x %zx",
+ i, region->region_begin_offset,
+ region->region_end_offset, vsoc_dev.shm_size);
+ vsoc_remove_device(pdev);
+ return -EFAULT;
+ }
+ if (region->managed_by >= vsoc_dev.layout->region_count) {
+ dev_err(&vsoc_dev.dev->dev,
+ "region %d has invalid owner: %u",
+ i, region->managed_by);
+ vsoc_remove_device(pdev);
+ return -EFAULT;
+ }
+ }
+ vsoc_dev.msix_enabled = true;
+ for (i = 0; i < vsoc_dev.layout->region_count; ++i) {
+ const struct vsoc_device_region *region = vsoc_dev.regions + i;
+ size_t name_sz = sizeof(vsoc_dev.regions_data[i].name) - 1;
+ const struct vsoc_signal_table_layout *h_to_g_signal_table =
+ ®ion->host_to_guest_signal_table;
+ const struct vsoc_signal_table_layout *g_to_h_signal_table =
+ ®ion->guest_to_host_signal_table;
+
+ vsoc_dev.regions_data[i].name[name_sz] = '\0';
+ memcpy(vsoc_dev.regions_data[i].name, region->device_name,
+ name_sz);
+ dev_info(&pdev->dev, "region %d name=%s\n",
+ i, vsoc_dev.regions_data[i].name);
+ init_waitqueue_head(
+ &vsoc_dev.regions_data[i].interrupt_wait_queue);
+ init_waitqueue_head(&vsoc_dev.regions_data[i].futex_wait_queue);
+ vsoc_dev.regions_data[i].incoming_signalled =
+ shm_off_to_virtual_addr(region->region_begin_offset) +
+ h_to_g_signal_table->interrupt_signalled_offset;
+ vsoc_dev.regions_data[i].outgoing_signalled =
+ shm_off_to_virtual_addr(region->region_begin_offset) +
+ g_to_h_signal_table->interrupt_signalled_offset;
+ result = request_irq(
+ vsoc_dev.msix_entries[i].vector,
+ vsoc_interrupt, 0,
+ vsoc_dev.regions_data[i].name,
+ vsoc_dev.regions_data + i);
+ if (result) {
+ dev_info(&pdev->dev,
+ "request_irq failed irq=%d vector=%d\n",
+ i, vsoc_dev.msix_entries[i].vector);
+ vsoc_remove_device(pdev);
+ return -ENOSPC;
+ }
+ vsoc_dev.regions_data[i].irq_requested = true;
+ if (!device_create(vsoc_dev.class, NULL,
+ MKDEV(vsoc_dev.major, i),
+ NULL, vsoc_dev.regions_data[i].name)) {
+ dev_err(&vsoc_dev.dev->dev, "device_create failed\n");
+ vsoc_remove_device(pdev);
+ return -EBUSY;
+ }
+ vsoc_dev.regions_data[i].device_created = true;
+ }
+ return 0;
+}
+
+/*
+ * This should undo all of the allocations in the probe function in reverse
+ * order.
+ *
+ * Notes:
+ *
+ * The device may have been partially initialized, so double check
+ * that the allocations happened.
+ *
+ * This function may be called multiple times, so mark resources as freed
+ * as they are deallocated.
+ */
+static void vsoc_remove_device(struct pci_dev *pdev)
+{
+ int i;
+ /*
+ * pdev is the first thing to be set on probe and the last thing
+ * to be cleared here. If it's NULL then there is no cleanup.
+ */
+ if (!pdev || !vsoc_dev.dev)
+ return;
+ dev_info(&pdev->dev, "remove_device\n");
+ if (vsoc_dev.regions_data) {
+ for (i = 0; i < vsoc_dev.layout->region_count; ++i) {
+ if (vsoc_dev.regions_data[i].device_created) {
+ device_destroy(vsoc_dev.class,
+ MKDEV(vsoc_dev.major, i));
+ vsoc_dev.regions_data[i].device_created = false;
+ }
+ if (vsoc_dev.regions_data[i].irq_requested)
+ free_irq(vsoc_dev.msix_entries[i].vector, NULL);
+ vsoc_dev.regions_data[i].irq_requested = false;
+ }
+ kfree(vsoc_dev.regions_data);
+ vsoc_dev.regions_data = NULL;
+ }
+ if (vsoc_dev.msix_enabled) {
+ pci_disable_msix(pdev);
+ vsoc_dev.msix_enabled = false;
+ }
+ kfree(vsoc_dev.msix_entries);
+ vsoc_dev.msix_entries = NULL;
+ vsoc_dev.regions = NULL;
+ if (vsoc_dev.class_added) {
+ class_destroy(vsoc_dev.class);
+ vsoc_dev.class_added = false;
+ }
+ if (vsoc_dev.cdev_added) {
+ cdev_del(&vsoc_dev.cdev);
+ vsoc_dev.cdev_added = false;
+ }
+ if (vsoc_dev.major && vsoc_dev.layout) {
+ unregister_chrdev_region(MKDEV(vsoc_dev.major, 0),
+ vsoc_dev.layout->region_count);
+ vsoc_dev.major = 0;
+ }
+ vsoc_dev.layout = NULL;
+ if (vsoc_dev.kernel_mapped_shm) {
+ pci_iounmap(pdev, vsoc_dev.kernel_mapped_shm);
+ vsoc_dev.kernel_mapped_shm = NULL;
+ }
+ if (vsoc_dev.regs) {
+ pci_iounmap(pdev, vsoc_dev.regs);
+ vsoc_dev.regs = NULL;
+ }
+ if (vsoc_dev.requested_regions) {
+ pci_release_regions(pdev);
+ vsoc_dev.requested_regions = false;
+ }
+ if (vsoc_dev.enabled_device) {
+ pci_disable_device(pdev);
+ vsoc_dev.enabled_device = false;
+ }
+ /* Do this last: it indicates that the device is not initialized. */
+ vsoc_dev.dev = NULL;
+}
+
+static void __exit vsoc_cleanup_module(void)
+{
+ vsoc_remove_device(vsoc_dev.dev);
+ pci_unregister_driver(&vsoc_pci_driver);
+}
+
+static int __init vsoc_init_module(void)
+{
+ int err = -ENOMEM;
+
+ INIT_LIST_HEAD(&vsoc_dev.permissions);
+ mutex_init(&vsoc_dev.mtx);
+
+ err = pci_register_driver(&vsoc_pci_driver);
+ if (err < 0)
+ return err;
+ return 0;
+}
+
+static int vsoc_open(struct inode *inode, struct file *filp)
+{
+ /* Can't use vsoc_validate_filep because filp is still incomplete */
+ int ret = vsoc_validate_inode(inode);
+
+ if (ret)
+ return ret;
+ filp->private_data =
+ kzalloc(sizeof(struct vsoc_private_data), GFP_KERNEL);
+ if (!filp->private_data)
+ return -ENOMEM;
+ return 0;
+}
+
+static int vsoc_release(struct inode *inode, struct file *filp)
+{
+ struct vsoc_private_data *private_data = NULL;
+ struct fd_scoped_permission_node *node = NULL;
+ struct vsoc_device_region *owner_region_p = NULL;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ private_data = (struct vsoc_private_data *)filp->private_data;
+ if (!private_data)
+ return 0;
+
+ node = private_data->fd_scoped_permission_node;
+ if (node) {
+ owner_region_p = vsoc_region_from_inode(inode);
+ if (owner_region_p->managed_by != VSOC_REGION_WHOLE) {
+ owner_region_p =
+ &vsoc_dev.regions[owner_region_p->managed_by];
+ }
+ do_destroy_fd_scoped_permission_node(owner_region_p, node);
+ private_data->fd_scoped_permission_node = NULL;
+ }
+ kfree(private_data);
+ filp->private_data = NULL;
+
+ return 0;
+}
+
+/*
+ * Returns the device relative offset and length of the area specified by the
+ * fd scoped permission. If there is no fd scoped permission set, a default
+ * permission covering the entire region is assumed, unless the region is owned
+ * by another one, in which case the default is a permission with zero size.
+ */
+static ssize_t vsoc_get_area(struct file *filp, __u32 *area_offset)
+{
+ __u32 off = 0;
+ ssize_t length = 0;
+ struct vsoc_device_region *region_p;
+ struct fd_scoped_permission *perm;
+
+ region_p = vsoc_region_from_filep(filp);
+ off = region_p->region_begin_offset;
+ perm = &((struct vsoc_private_data *)filp->private_data)->
+ fd_scoped_permission_node->permission;
+ if (perm) {
+ off += perm->begin_offset;
+ length = perm->end_offset - perm->begin_offset;
+ } else if (region_p->managed_by == VSOC_REGION_WHOLE) {
+ /* No permission set and the regions is not owned by another,
+ * default to full region access.
+ */
+ length = vsoc_device_region_size(region_p);
+ } else {
+ /* return zero length, access is denied. */
+ length = 0;
+ }
+ if (area_offset)
+ *area_offset = off;
+ return length;
+}
+
+static int vsoc_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+ unsigned long len = vma->vm_end - vma->vm_start;
+ __u32 area_off;
+ phys_addr_t mem_off;
+ ssize_t area_len;
+ int retval = vsoc_validate_filep(filp);
+
+ if (retval)
+ return retval;
+ area_len = vsoc_get_area(filp, &area_off);
+ /* Add the requested offset */
+ area_off += (vma->vm_pgoff << PAGE_SHIFT);
+ area_len -= (vma->vm_pgoff << PAGE_SHIFT);
+ if (area_len < len)
+ return -EINVAL;
+ vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+ mem_off = shm_off_to_phys_addr(area_off);
+ if (io_remap_pfn_range(vma, vma->vm_start, mem_off >> PAGE_SHIFT,
+ len, vma->vm_page_prot))
+ return -EAGAIN;
+ return 0;
+}
+
+module_init(vsoc_init_module);
+module_exit(vsoc_cleanup_module);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Greg Hartman <ghartman@google.com>");
+MODULE_DESCRIPTION("VSoC interpretation of QEmu's ivshmem device");
+MODULE_VERSION("1.0");
diff --git a/drivers/staging/goldfish/Kconfig b/drivers/staging/goldfish/Kconfig
index 4e09460..d293bbc 100644
--- a/drivers/staging/goldfish/Kconfig
+++ b/drivers/staging/goldfish/Kconfig
@@ -4,6 +4,14 @@
---help---
Emulated audio channel for the Goldfish Android Virtual Device
+config GOLDFISH_SYNC
+ tristate "Goldfish AVD Sync Driver"
+ depends on GOLDFISH
+ depends on SW_SYNC
+ depends on SYNC_FILE
+ ---help---
+ Emulated sync fences for the Goldfish Android Virtual Device
+
config MTD_GOLDFISH_NAND
tristate "Goldfish NAND device"
depends on GOLDFISH
diff --git a/drivers/staging/goldfish/Makefile b/drivers/staging/goldfish/Makefile
index dec34ad..3313fce 100644
--- a/drivers/staging/goldfish/Makefile
+++ b/drivers/staging/goldfish/Makefile
@@ -4,3 +4,9 @@
obj-$(CONFIG_GOLDFISH_AUDIO) += goldfish_audio.o
obj-$(CONFIG_MTD_GOLDFISH_NAND) += goldfish_nand.o
+
+# and sync
+
+ccflags-y := -Idrivers/staging/android
+goldfish_sync-objs := goldfish_sync_timeline_fence.o goldfish_sync_timeline.o
+obj-$(CONFIG_GOLDFISH_SYNC) += goldfish_sync.o
diff --git a/drivers/staging/goldfish/goldfish_audio.c b/drivers/staging/goldfish/goldfish_audio.c
index bd55995..0bb0ee2 100644
--- a/drivers/staging/goldfish/goldfish_audio.c
+++ b/drivers/staging/goldfish/goldfish_audio.c
@@ -28,6 +28,7 @@
#include <linux/uaccess.h>
#include <linux/slab.h>
#include <linux/goldfish.h>
+#include <linux/acpi.h>
MODULE_AUTHOR("Google, Inc.");
MODULE_DESCRIPTION("Android QEMU Audio Driver");
@@ -116,6 +117,7 @@
size_t count, loff_t *pos)
{
struct goldfish_audio *data = fp->private_data;
+ unsigned long irq_flags;
int length;
int result = 0;
@@ -129,6 +131,10 @@
wait_event_interruptible(data->wait, data->buffer_status &
AUDIO_INT_READ_BUFFER_FULL);
+ spin_lock_irqsave(&data->lock, irq_flags);
+ data->buffer_status &= ~AUDIO_INT_READ_BUFFER_FULL;
+ spin_unlock_irqrestore(&data->lock, irq_flags);
+
length = AUDIO_READ(data, AUDIO_READ_BUFFER_AVAILABLE);
/* copy data to user space */
@@ -351,12 +357,19 @@
};
MODULE_DEVICE_TABLE(of, goldfish_audio_of_match);
+static const struct acpi_device_id goldfish_audio_acpi_match[] = {
+ { "GFSH0005", 0 },
+ { },
+};
+MODULE_DEVICE_TABLE(acpi, goldfish_audio_acpi_match);
+
static struct platform_driver goldfish_audio_driver = {
.probe = goldfish_audio_probe,
.remove = goldfish_audio_remove,
.driver = {
.name = "goldfish_audio",
.of_match_table = goldfish_audio_of_match,
+ .acpi_match_table = ACPI_PTR(goldfish_audio_acpi_match),
}
};
diff --git a/drivers/staging/goldfish/goldfish_sync_timeline.c b/drivers/staging/goldfish/goldfish_sync_timeline.c
new file mode 100644
index 0000000..880d6e2
--- /dev/null
+++ b/drivers/staging/goldfish/goldfish_sync_timeline.c
@@ -0,0 +1,962 @@
+/*
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/fdtable.h>
+#include <linux/file.h>
+#include <linux/init.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/platform_device.h>
+
+#include <linux/interrupt.h>
+#include <linux/kref.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#include <linux/io.h>
+#include <linux/mm.h>
+#include <linux/acpi.h>
+
+#include <linux/string.h>
+
+#include <linux/fs.h>
+#include <linux/syscalls.h>
+#include <linux/sync_file.h>
+#include <linux/dma-fence.h>
+
+#include "goldfish_sync_timeline_fence.h"
+
+#define ERR(...) printk(KERN_ERR __VA_ARGS__);
+
+#define INFO(...) printk(KERN_INFO __VA_ARGS__);
+
+#define DPRINT(...) pr_debug(__VA_ARGS__);
+
+#define DTRACE() DPRINT("%s: enter", __func__)
+
+/* The Goldfish sync driver is designed to provide a interface
+ * between the underlying host's sync device and the kernel's
+ * fence sync framework..
+ * The purpose of the device/driver is to enable lightweight
+ * creation and signaling of timelines and fences
+ * in order to synchronize the guest with host-side graphics events.
+ *
+ * Each time the interrupt trips, the driver
+ * may perform a sync operation.
+ */
+
+/* The operations are: */
+
+/* Ready signal - used to mark when irq should lower */
+#define CMD_SYNC_READY 0
+
+/* Create a new timeline. writes timeline handle */
+#define CMD_CREATE_SYNC_TIMELINE 1
+
+/* Create a fence object. reads timeline handle and time argument.
+ * Writes fence fd to the SYNC_REG_HANDLE register. */
+#define CMD_CREATE_SYNC_FENCE 2
+
+/* Increments timeline. reads timeline handle and time argument */
+#define CMD_SYNC_TIMELINE_INC 3
+
+/* Destroys a timeline. reads timeline handle */
+#define CMD_DESTROY_SYNC_TIMELINE 4
+
+/* Starts a wait on the host with
+ * the given glsync object and sync thread handle. */
+#define CMD_TRIGGER_HOST_WAIT 5
+
+/* The register layout is: */
+
+#define SYNC_REG_BATCH_COMMAND 0x00 /* host->guest batch commands */
+#define SYNC_REG_BATCH_GUESTCOMMAND 0x04 /* guest->host batch commands */
+#define SYNC_REG_BATCH_COMMAND_ADDR 0x08 /* communicate physical address of host->guest batch commands */
+#define SYNC_REG_BATCH_COMMAND_ADDR_HIGH 0x0c /* 64-bit part */
+#define SYNC_REG_BATCH_GUESTCOMMAND_ADDR 0x10 /* communicate physical address of guest->host commands */
+#define SYNC_REG_BATCH_GUESTCOMMAND_ADDR_HIGH 0x14 /* 64-bit part */
+#define SYNC_REG_INIT 0x18 /* signals that the device has been probed */
+
+/* There is an ioctl associated with goldfish sync driver.
+ * Make it conflict with ioctls that are not likely to be used
+ * in the emulator.
+ *
+ * '@' 00-0F linux/radeonfb.h conflict!
+ * '@' 00-0F drivers/video/aty/aty128fb.c conflict!
+ */
+#define GOLDFISH_SYNC_IOC_MAGIC '@'
+
+#define GOLDFISH_SYNC_IOC_QUEUE_WORK _IOWR(GOLDFISH_SYNC_IOC_MAGIC, 0, struct goldfish_sync_ioctl_info)
+
+/* The above definitions (command codes, register layout, ioctl definitions)
+ * need to be in sync with the following files:
+ *
+ * Host-side (emulator):
+ * external/qemu/android/emulation/goldfish_sync.h
+ * external/qemu-android/hw/misc/goldfish_sync.c
+ *
+ * Guest-side (system image):
+ * device/generic/goldfish-opengl/system/egl/goldfish_sync.h
+ * device/generic/goldfish/ueventd.ranchu.rc
+ * platform/build/target/board/generic/sepolicy/file_contexts
+ */
+struct goldfish_sync_hostcmd {
+ /* sorted for alignment */
+ uint64_t handle;
+ uint64_t hostcmd_handle;
+ uint32_t cmd;
+ uint32_t time_arg;
+};
+
+struct goldfish_sync_guestcmd {
+ uint64_t host_command; /* uint64_t for alignment */
+ uint64_t glsync_handle;
+ uint64_t thread_handle;
+ uint64_t guest_timeline_handle;
+};
+
+#define GOLDFISH_SYNC_MAX_CMDS 32
+
+struct goldfish_sync_state {
+ char __iomem *reg_base;
+ int irq;
+
+ /* Spinlock protects |to_do| / |to_do_end|. */
+ spinlock_t lock;
+ /* |mutex_lock| protects all concurrent access
+ * to timelines for both kernel and user space. */
+ struct mutex mutex_lock;
+
+ /* Buffer holding commands issued from host. */
+ struct goldfish_sync_hostcmd to_do[GOLDFISH_SYNC_MAX_CMDS];
+ uint32_t to_do_end;
+
+ /* Addresses for the reading or writing
+ * of individual commands. The host can directly write
+ * to |batch_hostcmd| (and then this driver immediately
+ * copies contents to |to_do|). This driver either replies
+ * through |batch_hostcmd| or simply issues a
+ * guest->host command through |batch_guestcmd|.
+ */
+ struct goldfish_sync_hostcmd *batch_hostcmd;
+ struct goldfish_sync_guestcmd *batch_guestcmd;
+
+ /* Used to give this struct itself to a work queue
+ * function for executing actual sync commands. */
+ struct work_struct work_item;
+};
+
+static struct goldfish_sync_state global_sync_state[1];
+
+struct goldfish_sync_timeline_obj {
+ struct goldfish_sync_timeline *sync_tl;
+ uint32_t current_time;
+ /* We need to be careful about when we deallocate
+ * this |goldfish_sync_timeline_obj| struct.
+ * In order to ensure proper cleanup, we need to
+ * consider the triggered host-side wait that may
+ * still be in flight when the guest close()'s a
+ * goldfish_sync device's sync context fd (and
+ * destroys the |sync_tl| field above).
+ * The host-side wait may raise IRQ
+ * and tell the kernel to increment the timeline _after_
+ * the |sync_tl| has already been set to null.
+ *
+ * From observations on OpenGL apps and CTS tests, this
+ * happens at some very low probability upon context
+ * destruction or process close, but it does happen
+ * and it needs to be handled properly. Otherwise,
+ * if we clean up the surrounding |goldfish_sync_timeline_obj|
+ * too early, any |handle| field of any host->guest command
+ * might not even point to a null |sync_tl| field,
+ * but to garbage memory or even a reclaimed |sync_tl|.
+ * If we do not count such "pending waits" and kfree the object
+ * immediately upon |goldfish_sync_timeline_destroy|,
+ * we might get mysterous RCU stalls after running a long
+ * time because the garbage memory that is being read
+ * happens to be interpretable as a |spinlock_t| struct
+ * that is currently in the locked state.
+ *
+ * To track when to free the |goldfish_sync_timeline_obj|
+ * itself, we maintain a kref.
+ * The kref essentially counts the timeline itself plus
+ * the number of waits in flight. kref_init/kref_put
+ * are issued on
+ * |goldfish_sync_timeline_create|/|goldfish_sync_timeline_destroy|
+ * and kref_get/kref_put are issued on
+ * |goldfish_sync_fence_create|/|goldfish_sync_timeline_inc|.
+ *
+ * The timeline is destroyed after reference count
+ * reaches zero, which would happen after
+ * |goldfish_sync_timeline_destroy| and all pending
+ * |goldfish_sync_timeline_inc|'s are fulfilled.
+ *
+ * NOTE (1): We assume that |fence_create| and
+ * |timeline_inc| calls are 1:1, otherwise the kref scheme
+ * will not work. This is a valid assumption as long
+ * as the host-side virtual device implementation
+ * does not insert any timeline increments
+ * that we did not trigger from here.
+ *
+ * NOTE (2): The use of kref by itself requires no locks,
+ * but this does not mean everything works without locks.
+ * Related timeline operations do require a lock of some sort,
+ * or at least are not proven to work without it.
+ * In particualr, we assume that all the operations
+ * done on the |kref| field above are done in contexts where
+ * |global_sync_state->mutex_lock| is held. Do not
+ * remove that lock until everything is proven to work
+ * without it!!! */
+ struct kref kref;
+};
+
+/* We will call |delete_timeline_obj| when the last reference count
+ * of the kref is decremented. This deletes the sync
+ * timeline object along with the wrapper itself. */
+static void delete_timeline_obj(struct kref* kref) {
+ struct goldfish_sync_timeline_obj* obj =
+ container_of(kref, struct goldfish_sync_timeline_obj, kref);
+
+ goldfish_sync_timeline_put_internal(obj->sync_tl);
+ obj->sync_tl = NULL;
+ kfree(obj);
+}
+
+static uint64_t gensym_ctr;
+static void gensym(char *dst)
+{
+ sprintf(dst, "goldfish_sync:gensym:%llu", gensym_ctr);
+ gensym_ctr++;
+}
+
+/* |goldfish_sync_timeline_create| assumes that |global_sync_state->mutex_lock|
+ * is held. */
+static struct goldfish_sync_timeline_obj*
+goldfish_sync_timeline_create(void)
+{
+
+ char timeline_name[256];
+ struct goldfish_sync_timeline *res_sync_tl = NULL;
+ struct goldfish_sync_timeline_obj *res;
+
+ DTRACE();
+
+ gensym(timeline_name);
+
+ res_sync_tl = goldfish_sync_timeline_create_internal(timeline_name);
+ if (!res_sync_tl) {
+ ERR("Failed to create goldfish_sw_sync timeline.");
+ return NULL;
+ }
+
+ res = kzalloc(sizeof(struct goldfish_sync_timeline_obj), GFP_KERNEL);
+ res->sync_tl = res_sync_tl;
+ res->current_time = 0;
+ kref_init(&res->kref);
+
+ DPRINT("new timeline_obj=0x%p", res);
+ return res;
+}
+
+/* |goldfish_sync_fence_create| assumes that |global_sync_state->mutex_lock|
+ * is held. */
+static int
+goldfish_sync_fence_create(struct goldfish_sync_timeline_obj *obj,
+ uint32_t val)
+{
+
+ int fd;
+ char fence_name[256];
+ struct sync_pt *syncpt = NULL;
+ struct sync_file *sync_file_obj = NULL;
+ struct goldfish_sync_timeline *tl;
+
+ DTRACE();
+
+ if (!obj) return -1;
+
+ tl = obj->sync_tl;
+
+ syncpt = goldfish_sync_pt_create_internal(
+ tl, sizeof(struct sync_pt) + 4, val);
+ if (!syncpt) {
+ ERR("could not create sync point! "
+ "goldfish_sync_timeline=0x%p val=%d",
+ tl, val);
+ return -1;
+ }
+
+ fd = get_unused_fd_flags(O_CLOEXEC);
+ if (fd < 0) {
+ ERR("could not get unused fd for sync fence. "
+ "errno=%d", fd);
+ goto err_cleanup_pt;
+ }
+
+ gensym(fence_name);
+
+ sync_file_obj = sync_file_create(&syncpt->base);
+ if (!sync_file_obj) {
+ ERR("could not create sync fence! "
+ "goldfish_sync_timeline=0x%p val=%d sync_pt=0x%p",
+ tl, val, syncpt);
+ goto err_cleanup_fd_pt;
+ }
+
+ DPRINT("installing sync fence into fd %d sync_file_obj=0x%p",
+ fd, sync_file_obj);
+ fd_install(fd, sync_file_obj->file);
+ kref_get(&obj->kref);
+
+ return fd;
+
+err_cleanup_fd_pt:
+ put_unused_fd(fd);
+err_cleanup_pt:
+ dma_fence_put(&syncpt->base);
+ return -1;
+}
+
+/* |goldfish_sync_timeline_inc| assumes that |global_sync_state->mutex_lock|
+ * is held. */
+static void
+goldfish_sync_timeline_inc(struct goldfish_sync_timeline_obj *obj, uint32_t inc)
+{
+ DTRACE();
+ /* Just give up if someone else nuked the timeline.
+ * Whoever it was won't care that it doesn't get signaled. */
+ if (!obj) return;
+
+ DPRINT("timeline_obj=0x%p", obj);
+ goldfish_sync_timeline_signal_internal(obj->sync_tl, inc);
+ DPRINT("incremented timeline. increment max_time");
+ obj->current_time += inc;
+
+ /* Here, we will end up deleting the timeline object if it
+ * turns out that this call was a pending increment after
+ * |goldfish_sync_timeline_destroy| was called. */
+ kref_put(&obj->kref, delete_timeline_obj);
+ DPRINT("done");
+}
+
+/* |goldfish_sync_timeline_destroy| assumes
+ * that |global_sync_state->mutex_lock| is held. */
+static void
+goldfish_sync_timeline_destroy(struct goldfish_sync_timeline_obj *obj)
+{
+ DTRACE();
+ /* See description of |goldfish_sync_timeline_obj| for why we
+ * should not immediately destroy |obj| */
+ kref_put(&obj->kref, delete_timeline_obj);
+}
+
+static inline void
+goldfish_sync_cmd_queue(struct goldfish_sync_state *sync_state,
+ uint32_t cmd,
+ uint64_t handle,
+ uint32_t time_arg,
+ uint64_t hostcmd_handle)
+{
+ struct goldfish_sync_hostcmd *to_add;
+
+ DTRACE();
+
+ BUG_ON(sync_state->to_do_end == GOLDFISH_SYNC_MAX_CMDS);
+
+ to_add = &sync_state->to_do[sync_state->to_do_end];
+
+ to_add->cmd = cmd;
+ to_add->handle = handle;
+ to_add->time_arg = time_arg;
+ to_add->hostcmd_handle = hostcmd_handle;
+
+ sync_state->to_do_end += 1;
+}
+
+static inline void
+goldfish_sync_hostcmd_reply(struct goldfish_sync_state *sync_state,
+ uint32_t cmd,
+ uint64_t handle,
+ uint32_t time_arg,
+ uint64_t hostcmd_handle)
+{
+ unsigned long irq_flags;
+ struct goldfish_sync_hostcmd *batch_hostcmd =
+ sync_state->batch_hostcmd;
+
+ DTRACE();
+
+ spin_lock_irqsave(&sync_state->lock, irq_flags);
+
+ batch_hostcmd->cmd = cmd;
+ batch_hostcmd->handle = handle;
+ batch_hostcmd->time_arg = time_arg;
+ batch_hostcmd->hostcmd_handle = hostcmd_handle;
+ writel(0, sync_state->reg_base + SYNC_REG_BATCH_COMMAND);
+
+ spin_unlock_irqrestore(&sync_state->lock, irq_flags);
+}
+
+static inline void
+goldfish_sync_send_guestcmd(struct goldfish_sync_state *sync_state,
+ uint32_t cmd,
+ uint64_t glsync_handle,
+ uint64_t thread_handle,
+ uint64_t timeline_handle)
+{
+ unsigned long irq_flags;
+ struct goldfish_sync_guestcmd *batch_guestcmd =
+ sync_state->batch_guestcmd;
+
+ DTRACE();
+
+ spin_lock_irqsave(&sync_state->lock, irq_flags);
+
+ batch_guestcmd->host_command = (uint64_t)cmd;
+ batch_guestcmd->glsync_handle = (uint64_t)glsync_handle;
+ batch_guestcmd->thread_handle = (uint64_t)thread_handle;
+ batch_guestcmd->guest_timeline_handle = (uint64_t)timeline_handle;
+ writel(0, sync_state->reg_base + SYNC_REG_BATCH_GUESTCOMMAND);
+
+ spin_unlock_irqrestore(&sync_state->lock, irq_flags);
+}
+
+/* |goldfish_sync_interrupt| handles IRQ raises from the virtual device.
+ * In the context of OpenGL, this interrupt will fire whenever we need
+ * to signal a fence fd in the guest, with the command
+ * |CMD_SYNC_TIMELINE_INC|.
+ * However, because this function will be called in an interrupt context,
+ * it is necessary to do the actual work of signaling off of interrupt context.
+ * The shared work queue is used for this purpose. At the end when
+ * all pending commands are intercepted by the interrupt handler,
+ * we call |schedule_work|, which will later run the actual
+ * desired sync command in |goldfish_sync_work_item_fn|.
+ */
+static irqreturn_t goldfish_sync_interrupt(int irq, void *dev_id)
+{
+
+ struct goldfish_sync_state *sync_state = dev_id;
+
+ uint32_t nextcmd;
+ uint32_t command_r;
+ uint64_t handle_rw;
+ uint32_t time_r;
+ uint64_t hostcmd_handle_rw;
+
+ int count = 0;
+
+ DTRACE();
+
+ sync_state = dev_id;
+
+ spin_lock(&sync_state->lock);
+
+ for (;;) {
+
+ readl(sync_state->reg_base + SYNC_REG_BATCH_COMMAND);
+ nextcmd = sync_state->batch_hostcmd->cmd;
+
+ if (nextcmd == 0)
+ break;
+
+ command_r = nextcmd;
+ handle_rw = sync_state->batch_hostcmd->handle;
+ time_r = sync_state->batch_hostcmd->time_arg;
+ hostcmd_handle_rw = sync_state->batch_hostcmd->hostcmd_handle;
+
+ goldfish_sync_cmd_queue(
+ sync_state,
+ command_r,
+ handle_rw,
+ time_r,
+ hostcmd_handle_rw);
+
+ count++;
+ }
+
+ spin_unlock(&sync_state->lock);
+
+ schedule_work(&sync_state->work_item);
+
+ return (count == 0) ? IRQ_NONE : IRQ_HANDLED;
+}
+
+/* |goldfish_sync_work_item_fn| does the actual work of servicing
+ * host->guest sync commands. This function is triggered whenever
+ * the IRQ for the goldfish sync device is raised. Once it starts
+ * running, it grabs the contents of the buffer containing the
+ * commands it needs to execute (there may be multiple, because
+ * our IRQ is active high and not edge triggered), and then
+ * runs all of them one after the other.
+ */
+static void goldfish_sync_work_item_fn(struct work_struct *input)
+{
+
+ struct goldfish_sync_state *sync_state;
+ int sync_fence_fd;
+
+ struct goldfish_sync_timeline_obj *timeline;
+ uint64_t timeline_ptr;
+
+ uint64_t hostcmd_handle;
+
+ uint32_t cmd;
+ uint64_t handle;
+ uint32_t time_arg;
+
+ struct goldfish_sync_hostcmd *todo;
+ uint32_t todo_end;
+
+ unsigned long irq_flags;
+
+ struct goldfish_sync_hostcmd to_run[GOLDFISH_SYNC_MAX_CMDS];
+ uint32_t i = 0;
+
+ sync_state = container_of(input, struct goldfish_sync_state, work_item);
+
+ mutex_lock(&sync_state->mutex_lock);
+
+ spin_lock_irqsave(&sync_state->lock, irq_flags); {
+
+ todo_end = sync_state->to_do_end;
+
+ DPRINT("num sync todos: %u", sync_state->to_do_end);
+
+ for (i = 0; i < todo_end; i++)
+ to_run[i] = sync_state->to_do[i];
+
+ /* We expect that commands will come in at a slow enough rate
+ * so that incoming items will not be more than
+ * GOLDFISH_SYNC_MAX_CMDS.
+ *
+ * This is because the way the sync device is used,
+ * it's only for managing buffer data transfers per frame,
+ * with a sequential dependency between putting things in
+ * to_do and taking them out. Once a set of commands is
+ * queued up in to_do, the user of the device waits for
+ * them to be processed before queuing additional commands,
+ * which limits the rate at which commands come in
+ * to the rate at which we take them out here.
+ *
+ * We also don't expect more than MAX_CMDS to be issued
+ * at once; there is a correspondence between
+ * which buffers need swapping to the (display / buffer queue)
+ * to particular commands, and we don't expect there to be
+ * enough display or buffer queues in operation at once
+ * to overrun GOLDFISH_SYNC_MAX_CMDS.
+ */
+ sync_state->to_do_end = 0;
+
+ } spin_unlock_irqrestore(&sync_state->lock, irq_flags);
+
+ for (i = 0; i < todo_end; i++) {
+ DPRINT("todo index: %u", i);
+
+ todo = &to_run[i];
+
+ cmd = todo->cmd;
+
+ handle = (uint64_t)todo->handle;
+ time_arg = todo->time_arg;
+ hostcmd_handle = (uint64_t)todo->hostcmd_handle;
+
+ DTRACE();
+
+ timeline = (struct goldfish_sync_timeline_obj *)(uintptr_t)handle;
+
+ switch (cmd) {
+ case CMD_SYNC_READY:
+ break;
+ case CMD_CREATE_SYNC_TIMELINE:
+ DPRINT("exec CMD_CREATE_SYNC_TIMELINE: "
+ "handle=0x%llx time_arg=%d",
+ handle, time_arg);
+ timeline = goldfish_sync_timeline_create();
+ timeline_ptr = (uintptr_t)timeline;
+ goldfish_sync_hostcmd_reply(sync_state, CMD_CREATE_SYNC_TIMELINE,
+ timeline_ptr,
+ 0,
+ hostcmd_handle);
+ DPRINT("sync timeline created: %p", timeline);
+ break;
+ case CMD_CREATE_SYNC_FENCE:
+ DPRINT("exec CMD_CREATE_SYNC_FENCE: "
+ "handle=0x%llx time_arg=%d",
+ handle, time_arg);
+ sync_fence_fd = goldfish_sync_fence_create(timeline, time_arg);
+ goldfish_sync_hostcmd_reply(sync_state, CMD_CREATE_SYNC_FENCE,
+ sync_fence_fd,
+ 0,
+ hostcmd_handle);
+ break;
+ case CMD_SYNC_TIMELINE_INC:
+ DPRINT("exec CMD_SYNC_TIMELINE_INC: "
+ "handle=0x%llx time_arg=%d",
+ handle, time_arg);
+ goldfish_sync_timeline_inc(timeline, time_arg);
+ break;
+ case CMD_DESTROY_SYNC_TIMELINE:
+ DPRINT("exec CMD_DESTROY_SYNC_TIMELINE: "
+ "handle=0x%llx time_arg=%d",
+ handle, time_arg);
+ goldfish_sync_timeline_destroy(timeline);
+ break;
+ }
+ DPRINT("Done executing sync command");
+ }
+ mutex_unlock(&sync_state->mutex_lock);
+}
+
+/* Guest-side interface: file operations */
+
+/* Goldfish sync context and ioctl info.
+ *
+ * When a sync context is created by open()-ing the goldfish sync device, we
+ * create a sync context (|goldfish_sync_context|).
+ *
+ * Currently, the only data required to track is the sync timeline itself
+ * along with the current time, which are all packed up in the
+ * |goldfish_sync_timeline_obj| field. We use a |goldfish_sync_context|
+ * as the filp->private_data.
+ *
+ * Next, when a sync context user requests that work be queued and a fence
+ * fd provided, we use the |goldfish_sync_ioctl_info| struct, which holds
+ * information about which host handles to touch for this particular
+ * queue-work operation. We need to know about the host-side sync thread
+ * and the particular host-side GLsync object. We also possibly write out
+ * a file descriptor.
+ */
+struct goldfish_sync_context {
+ struct goldfish_sync_timeline_obj *timeline;
+};
+
+struct goldfish_sync_ioctl_info {
+ uint64_t host_glsync_handle_in;
+ uint64_t host_syncthread_handle_in;
+ int fence_fd_out;
+};
+
+static int goldfish_sync_open(struct inode *inode, struct file *file)
+{
+
+ struct goldfish_sync_context *sync_context;
+
+ DTRACE();
+
+ mutex_lock(&global_sync_state->mutex_lock);
+
+ sync_context = kzalloc(sizeof(struct goldfish_sync_context), GFP_KERNEL);
+
+ if (sync_context == NULL) {
+ ERR("Creation of goldfish sync context failed!");
+ mutex_unlock(&global_sync_state->mutex_lock);
+ return -ENOMEM;
+ }
+
+ sync_context->timeline = NULL;
+
+ file->private_data = sync_context;
+
+ DPRINT("successfully create a sync context @0x%p", sync_context);
+
+ mutex_unlock(&global_sync_state->mutex_lock);
+
+ return 0;
+}
+
+static int goldfish_sync_release(struct inode *inode, struct file *file)
+{
+
+ struct goldfish_sync_context *sync_context;
+
+ DTRACE();
+
+ mutex_lock(&global_sync_state->mutex_lock);
+
+ sync_context = file->private_data;
+
+ if (sync_context->timeline)
+ goldfish_sync_timeline_destroy(sync_context->timeline);
+
+ sync_context->timeline = NULL;
+
+ kfree(sync_context);
+
+ mutex_unlock(&global_sync_state->mutex_lock);
+
+ return 0;
+}
+
+/* |goldfish_sync_ioctl| is the guest-facing interface of goldfish sync
+ * and is used in conjunction with eglCreateSyncKHR to queue up the
+ * actual work of waiting for the EGL sync command to complete,
+ * possibly returning a fence fd to the guest.
+ */
+static long goldfish_sync_ioctl(struct file *file,
+ unsigned int cmd,
+ unsigned long arg)
+{
+ struct goldfish_sync_context *sync_context_data;
+ struct goldfish_sync_timeline_obj *timeline;
+ int fd_out;
+ struct goldfish_sync_ioctl_info ioctl_data;
+
+ DTRACE();
+
+ sync_context_data = file->private_data;
+ fd_out = -1;
+
+ switch (cmd) {
+ case GOLDFISH_SYNC_IOC_QUEUE_WORK:
+
+ DPRINT("exec GOLDFISH_SYNC_IOC_QUEUE_WORK");
+
+ mutex_lock(&global_sync_state->mutex_lock);
+
+ if (copy_from_user(&ioctl_data,
+ (void __user *)arg,
+ sizeof(ioctl_data))) {
+ ERR("Failed to copy memory for ioctl_data from user.");
+ mutex_unlock(&global_sync_state->mutex_lock);
+ return -EFAULT;
+ }
+
+ if (ioctl_data.host_syncthread_handle_in == 0) {
+ DPRINT("Error: zero host syncthread handle!!!");
+ mutex_unlock(&global_sync_state->mutex_lock);
+ return -EFAULT;
+ }
+
+ if (!sync_context_data->timeline) {
+ DPRINT("no timeline yet, create one.");
+ sync_context_data->timeline = goldfish_sync_timeline_create();
+ DPRINT("timeline: 0x%p", &sync_context_data->timeline);
+ }
+
+ timeline = sync_context_data->timeline;
+ fd_out = goldfish_sync_fence_create(timeline,
+ timeline->current_time + 1);
+ DPRINT("Created fence with fd %d and current time %u (timeline: 0x%p)",
+ fd_out,
+ sync_context_data->timeline->current_time + 1,
+ sync_context_data->timeline);
+
+ ioctl_data.fence_fd_out = fd_out;
+
+ if (copy_to_user((void __user *)arg,
+ &ioctl_data,
+ sizeof(ioctl_data))) {
+ DPRINT("Error, could not copy to user!!!");
+
+ sys_close(fd_out);
+ /* We won't be doing an increment, kref_put immediately. */
+ kref_put(&timeline->kref, delete_timeline_obj);
+ mutex_unlock(&global_sync_state->mutex_lock);
+ return -EFAULT;
+ }
+
+ /* We are now about to trigger a host-side wait;
+ * accumulate on |pending_waits|. */
+ goldfish_sync_send_guestcmd(global_sync_state,
+ CMD_TRIGGER_HOST_WAIT,
+ ioctl_data.host_glsync_handle_in,
+ ioctl_data.host_syncthread_handle_in,
+ (uint64_t)(uintptr_t)(sync_context_data->timeline));
+
+ mutex_unlock(&global_sync_state->mutex_lock);
+ return 0;
+ default:
+ return -ENOTTY;
+ }
+}
+
+static const struct file_operations goldfish_sync_fops = {
+ .owner = THIS_MODULE,
+ .open = goldfish_sync_open,
+ .release = goldfish_sync_release,
+ .unlocked_ioctl = goldfish_sync_ioctl,
+ .compat_ioctl = goldfish_sync_ioctl,
+};
+
+static struct miscdevice goldfish_sync_device = {
+ .name = "goldfish_sync",
+ .fops = &goldfish_sync_fops,
+};
+
+
+static bool setup_verify_batch_cmd_addr(struct goldfish_sync_state *sync_state,
+ void *batch_addr,
+ uint32_t addr_offset,
+ uint32_t addr_offset_high)
+{
+ uint64_t batch_addr_phys;
+ uint32_t batch_addr_phys_test_lo;
+ uint32_t batch_addr_phys_test_hi;
+
+ if (!batch_addr) {
+ ERR("Could not use batch command address!");
+ return false;
+ }
+
+ batch_addr_phys = virt_to_phys(batch_addr);
+ writel((uint32_t)(batch_addr_phys),
+ sync_state->reg_base + addr_offset);
+ writel((uint32_t)(batch_addr_phys >> 32),
+ sync_state->reg_base + addr_offset_high);
+
+ batch_addr_phys_test_lo =
+ readl(sync_state->reg_base + addr_offset);
+ batch_addr_phys_test_hi =
+ readl(sync_state->reg_base + addr_offset_high);
+
+ if (virt_to_phys(batch_addr) !=
+ (((uint64_t)batch_addr_phys_test_hi << 32) |
+ batch_addr_phys_test_lo)) {
+ ERR("Invalid batch command address!");
+ return false;
+ }
+
+ return true;
+}
+
+int goldfish_sync_probe(struct platform_device *pdev)
+{
+ struct resource *ioresource;
+ struct goldfish_sync_state *sync_state = global_sync_state;
+ int status;
+
+ DTRACE();
+
+ sync_state->to_do_end = 0;
+
+ spin_lock_init(&sync_state->lock);
+ mutex_init(&sync_state->mutex_lock);
+
+ platform_set_drvdata(pdev, sync_state);
+
+ ioresource = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ if (ioresource == NULL) {
+ ERR("platform_get_resource failed");
+ return -ENODEV;
+ }
+
+ sync_state->reg_base =
+ devm_ioremap(&pdev->dev, ioresource->start, PAGE_SIZE);
+ if (sync_state->reg_base == NULL) {
+ ERR("Could not ioremap");
+ return -ENOMEM;
+ }
+
+ sync_state->irq = platform_get_irq(pdev, 0);
+ if (sync_state->irq < 0) {
+ ERR("Could not platform_get_irq");
+ return -ENODEV;
+ }
+
+ status = devm_request_irq(&pdev->dev,
+ sync_state->irq,
+ goldfish_sync_interrupt,
+ IRQF_SHARED,
+ pdev->name,
+ sync_state);
+ if (status) {
+ ERR("request_irq failed");
+ return -ENODEV;
+ }
+
+ INIT_WORK(&sync_state->work_item,
+ goldfish_sync_work_item_fn);
+
+ misc_register(&goldfish_sync_device);
+
+ /* Obtain addresses for batch send/recv of commands. */
+ {
+ struct goldfish_sync_hostcmd *batch_addr_hostcmd;
+ struct goldfish_sync_guestcmd *batch_addr_guestcmd;
+
+ batch_addr_hostcmd =
+ devm_kzalloc(&pdev->dev, sizeof(struct goldfish_sync_hostcmd),
+ GFP_KERNEL);
+ batch_addr_guestcmd =
+ devm_kzalloc(&pdev->dev, sizeof(struct goldfish_sync_guestcmd),
+ GFP_KERNEL);
+
+ if (!setup_verify_batch_cmd_addr(sync_state,
+ batch_addr_hostcmd,
+ SYNC_REG_BATCH_COMMAND_ADDR,
+ SYNC_REG_BATCH_COMMAND_ADDR_HIGH)) {
+ ERR("goldfish_sync: Could not setup batch command address");
+ return -ENODEV;
+ }
+
+ if (!setup_verify_batch_cmd_addr(sync_state,
+ batch_addr_guestcmd,
+ SYNC_REG_BATCH_GUESTCOMMAND_ADDR,
+ SYNC_REG_BATCH_GUESTCOMMAND_ADDR_HIGH)) {
+ ERR("goldfish_sync: Could not setup batch guest command address");
+ return -ENODEV;
+ }
+
+ sync_state->batch_hostcmd = batch_addr_hostcmd;
+ sync_state->batch_guestcmd = batch_addr_guestcmd;
+ }
+
+ INFO("goldfish_sync: Initialized goldfish sync device");
+
+ writel(0, sync_state->reg_base + SYNC_REG_INIT);
+
+ return 0;
+}
+
+static int goldfish_sync_remove(struct platform_device *pdev)
+{
+ struct goldfish_sync_state *sync_state = global_sync_state;
+
+ DTRACE();
+
+ misc_deregister(&goldfish_sync_device);
+ memset(sync_state, 0, sizeof(struct goldfish_sync_state));
+ return 0;
+}
+
+static const struct of_device_id goldfish_sync_of_match[] = {
+ { .compatible = "google,goldfish-sync", },
+ {},
+};
+MODULE_DEVICE_TABLE(of, goldfish_sync_of_match);
+
+static const struct acpi_device_id goldfish_sync_acpi_match[] = {
+ { "GFSH0006", 0 },
+ { },
+};
+
+MODULE_DEVICE_TABLE(acpi, goldfish_sync_acpi_match);
+
+static struct platform_driver goldfish_sync = {
+ .probe = goldfish_sync_probe,
+ .remove = goldfish_sync_remove,
+ .driver = {
+ .name = "goldfish_sync",
+ .of_match_table = goldfish_sync_of_match,
+ .acpi_match_table = ACPI_PTR(goldfish_sync_acpi_match),
+ }
+};
+
+module_platform_driver(goldfish_sync);
+
+MODULE_AUTHOR("Google, Inc.");
+MODULE_DESCRIPTION("Android QEMU Sync Driver");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1.0");
diff --git a/drivers/staging/goldfish/goldfish_sync_timeline_fence.c b/drivers/staging/goldfish/goldfish_sync_timeline_fence.c
new file mode 100644
index 0000000..a5bc2de
--- /dev/null
+++ b/drivers/staging/goldfish/goldfish_sync_timeline_fence.c
@@ -0,0 +1,254 @@
+#include <linux/slab.h>
+#include <linux/fs.h>
+#include <linux/syscalls.h>
+#include <linux/sync_file.h>
+#include <linux/dma-fence.h>
+
+#include "goldfish_sync_timeline_fence.h"
+
+/*
+ * Timeline-based sync for Goldfish Sync
+ * Based on "Sync File validation framework"
+ * (drivers/dma-buf/sw_sync.c)
+ *
+ * Copyright (C) 2017 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+/**
+ * struct goldfish_sync_timeline - sync object
+ * @kref: reference count on fence.
+ * @name: name of the goldfish_sync_timeline. Useful for debugging
+ * @child_list_head: list of children sync_pts for this goldfish_sync_timeline
+ * @child_list_lock: lock protecting @child_list_head and fence.status
+ * @active_list_head: list of active (unsignaled/errored) sync_pts
+ */
+struct goldfish_sync_timeline {
+ struct kref kref;
+ char name[32];
+
+ /* protected by child_list_lock */
+ u64 context;
+ int value;
+
+ struct list_head child_list_head;
+ spinlock_t child_list_lock;
+
+ struct list_head active_list_head;
+};
+
+static inline struct goldfish_sync_timeline *goldfish_dma_fence_parent(struct dma_fence *fence)
+{
+ return container_of(fence->lock, struct goldfish_sync_timeline,
+ child_list_lock);
+}
+
+static const struct dma_fence_ops goldfish_sync_timeline_fence_ops;
+
+static inline struct sync_pt *goldfish_sync_fence_to_sync_pt(struct dma_fence *fence)
+{
+ if (fence->ops != &goldfish_sync_timeline_fence_ops)
+ return NULL;
+ return container_of(fence, struct sync_pt, base);
+}
+
+/**
+ * goldfish_sync_timeline_create_internal() - creates a sync object
+ * @name: sync_timeline name
+ *
+ * Creates a new sync_timeline. Returns the sync_timeline object or NULL in
+ * case of error.
+ */
+struct goldfish_sync_timeline
+*goldfish_sync_timeline_create_internal(const char *name)
+{
+ struct goldfish_sync_timeline *obj;
+
+ obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+ if (!obj)
+ return NULL;
+
+ kref_init(&obj->kref);
+ obj->context = dma_fence_context_alloc(1);
+ strlcpy(obj->name, name, sizeof(obj->name));
+
+ INIT_LIST_HEAD(&obj->child_list_head);
+ INIT_LIST_HEAD(&obj->active_list_head);
+ spin_lock_init(&obj->child_list_lock);
+
+ return obj;
+}
+
+static void goldfish_sync_timeline_free_internal(struct kref *kref)
+{
+ struct goldfish_sync_timeline *obj =
+ container_of(kref, struct goldfish_sync_timeline, kref);
+
+ kfree(obj);
+}
+
+static void goldfish_sync_timeline_get_internal(
+ struct goldfish_sync_timeline *obj)
+{
+ kref_get(&obj->kref);
+}
+
+void goldfish_sync_timeline_put_internal(struct goldfish_sync_timeline *obj)
+{
+ kref_put(&obj->kref, goldfish_sync_timeline_free_internal);
+}
+
+/**
+ * goldfish_sync_timeline_signal() -
+ * signal a status change on a goldfish_sync_timeline
+ * @obj: sync_timeline to signal
+ * @inc: num to increment on timeline->value
+ *
+ * A sync implementation should call this any time one of it's fences
+ * has signaled or has an error condition.
+ */
+void goldfish_sync_timeline_signal_internal(struct goldfish_sync_timeline *obj,
+ unsigned int inc)
+{
+ unsigned long flags;
+ struct sync_pt *pt, *next;
+
+ spin_lock_irqsave(&obj->child_list_lock, flags);
+
+ obj->value += inc;
+
+ list_for_each_entry_safe(pt, next, &obj->active_list_head,
+ active_list) {
+ if (dma_fence_is_signaled_locked(&pt->base))
+ list_del_init(&pt->active_list);
+ }
+
+ spin_unlock_irqrestore(&obj->child_list_lock, flags);
+}
+
+/**
+ * goldfish_sync_pt_create_internal() - creates a sync pt
+ * @parent: fence's parent sync_timeline
+ * @size: size to allocate for this pt
+ * @inc: value of the fence
+ *
+ * Creates a new sync_pt as a child of @parent. @size bytes will be
+ * allocated allowing for implementation specific data to be kept after
+ * the generic sync_timeline struct. Returns the sync_pt object or
+ * NULL in case of error.
+ */
+struct sync_pt *goldfish_sync_pt_create_internal(
+ struct goldfish_sync_timeline *obj, int size,
+ unsigned int value)
+{
+ unsigned long flags;
+ struct sync_pt *pt;
+
+ if (size < sizeof(*pt))
+ return NULL;
+
+ pt = kzalloc(size, GFP_KERNEL);
+ if (!pt)
+ return NULL;
+
+ spin_lock_irqsave(&obj->child_list_lock, flags);
+ goldfish_sync_timeline_get_internal(obj);
+ dma_fence_init(&pt->base, &goldfish_sync_timeline_fence_ops, &obj->child_list_lock,
+ obj->context, value);
+ list_add_tail(&pt->child_list, &obj->child_list_head);
+ INIT_LIST_HEAD(&pt->active_list);
+ spin_unlock_irqrestore(&obj->child_list_lock, flags);
+ return pt;
+}
+
+static const char *goldfish_sync_timeline_fence_get_driver_name(
+ struct dma_fence *fence)
+{
+ return "sw_sync";
+}
+
+static const char *goldfish_sync_timeline_fence_get_timeline_name(
+ struct dma_fence *fence)
+{
+ struct goldfish_sync_timeline *parent = goldfish_dma_fence_parent(fence);
+
+ return parent->name;
+}
+
+static void goldfish_sync_timeline_fence_release(struct dma_fence *fence)
+{
+ struct sync_pt *pt = goldfish_sync_fence_to_sync_pt(fence);
+ struct goldfish_sync_timeline *parent = goldfish_dma_fence_parent(fence);
+ unsigned long flags;
+
+ spin_lock_irqsave(fence->lock, flags);
+ list_del(&pt->child_list);
+ if (!list_empty(&pt->active_list))
+ list_del(&pt->active_list);
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ goldfish_sync_timeline_put_internal(parent);
+ dma_fence_free(fence);
+}
+
+static bool goldfish_sync_timeline_fence_signaled(struct dma_fence *fence)
+{
+ struct goldfish_sync_timeline *parent = goldfish_dma_fence_parent(fence);
+
+ return (fence->seqno > parent->value) ? false : true;
+}
+
+static bool goldfish_sync_timeline_fence_enable_signaling(struct dma_fence *fence)
+{
+ struct sync_pt *pt = goldfish_sync_fence_to_sync_pt(fence);
+ struct goldfish_sync_timeline *parent = goldfish_dma_fence_parent(fence);
+
+ if (goldfish_sync_timeline_fence_signaled(fence))
+ return false;
+
+ list_add_tail(&pt->active_list, &parent->active_list_head);
+ return true;
+}
+
+static void goldfish_sync_timeline_fence_disable_signaling(struct dma_fence *fence)
+{
+ struct sync_pt *pt = container_of(fence, struct sync_pt, base);
+
+ list_del_init(&pt->active_list);
+}
+
+static void goldfish_sync_timeline_fence_value_str(struct dma_fence *fence,
+ char *str, int size)
+{
+ snprintf(str, size, "%d", fence->seqno);
+}
+
+static void goldfish_sync_timeline_fence_timeline_value_str(
+ struct dma_fence *fence,
+ char *str, int size)
+{
+ struct goldfish_sync_timeline *parent = goldfish_dma_fence_parent(fence);
+
+ snprintf(str, size, "%d", parent->value);
+}
+
+static const struct dma_fence_ops goldfish_sync_timeline_fence_ops = {
+ .get_driver_name = goldfish_sync_timeline_fence_get_driver_name,
+ .get_timeline_name = goldfish_sync_timeline_fence_get_timeline_name,
+ .enable_signaling = goldfish_sync_timeline_fence_enable_signaling,
+ .disable_signaling = goldfish_sync_timeline_fence_disable_signaling,
+ .signaled = goldfish_sync_timeline_fence_signaled,
+ .wait = dma_fence_default_wait,
+ .release = goldfish_sync_timeline_fence_release,
+ .fence_value_str = goldfish_sync_timeline_fence_value_str,
+ .timeline_value_str = goldfish_sync_timeline_fence_timeline_value_str,
+};
diff --git a/drivers/staging/goldfish/goldfish_sync_timeline_fence.h b/drivers/staging/goldfish/goldfish_sync_timeline_fence.h
new file mode 100644
index 0000000..638c6fb
--- /dev/null
+++ b/drivers/staging/goldfish/goldfish_sync_timeline_fence.h
@@ -0,0 +1,58 @@
+#include <linux/sync_file.h>
+#include <linux/dma-fence.h>
+
+/**
+ * struct sync_pt - sync_pt object
+ * @base: base dma_fence object
+ * @child_list: sync timeline child's list
+ * @active_list: sync timeline active child's list
+ */
+struct sync_pt {
+ struct dma_fence base;
+ struct list_head child_list;
+ struct list_head active_list;
+};
+
+/**
+ * goldfish_sync_timeline_create_internal() - creates a sync object
+ * @name: goldfish_sync_timeline name
+ *
+ * Creates a new goldfish_sync_timeline.
+ * Returns the goldfish_sync_timeline object or NULL in case of error.
+ */
+struct goldfish_sync_timeline
+*goldfish_sync_timeline_create_internal(const char *name);
+
+/**
+ * goldfish_sync_pt_create_internal() - creates a sync pt
+ * @parent: fence's parent goldfish_sync_timeline
+ * @size: size to allocate for this pt
+ * @inc: value of the fence
+ *
+ * Creates a new sync_pt as a child of @parent. @size bytes will be
+ * allocated allowing for implementation specific data to be kept after
+ * the generic sync_timeline struct. Returns the sync_pt object or
+ * NULL in case of error.
+ */
+struct sync_pt
+*goldfish_sync_pt_create_internal(struct goldfish_sync_timeline *obj,
+ int size, unsigned int value);
+
+/**
+ * goldfish_sync_timeline_signal_internal() -
+ * signal a status change on a sync_timeline
+ * @obj: goldfish_sync_timeline to signal
+ * @inc: num to increment on timeline->value
+ *
+ * A sync implementation should call this any time one of it's fences
+ * has signaled or has an error condition.
+ */
+void goldfish_sync_timeline_signal_internal(struct goldfish_sync_timeline *obj,
+ unsigned int inc);
+
+/**
+ * goldfish_sync_timeline_put_internal() - dec refcount of a sync_timeline
+ * and clean up memory if it was the last ref.
+ * @obj: goldfish_sync_timeline to decref
+ */
+void goldfish_sync_timeline_put_internal(struct goldfish_sync_timeline *obj);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 6ef8dde..8c97acd 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -1121,9 +1121,9 @@
* in PAGE_SIZE (if PAGE_SIZE greater than LU_PAGE_SIZE), and the
* lu_dirpage for this integrated page will be adjusted.
**/
-static int mdc_read_page_remote(void *data, struct page *page0)
+static int mdc_read_page_remote(struct file *data, struct page *page0)
{
- struct readpage_param *rp = data;
+ struct readpage_param *rp = (struct readpage_param *)data;
struct page **page_pool;
struct page *page;
struct lu_dirpage *dp;
diff --git a/drivers/thermal/hisi_thermal.c b/drivers/thermal/hisi_thermal.c
index 6d8906d..2d855a9 100644
--- a/drivers/thermal/hisi_thermal.c
+++ b/drivers/thermal/hisi_thermal.c
@@ -23,207 +23,423 @@
#include <linux/module.h>
#include <linux/platform_device.h>
#include <linux/io.h>
+#include <linux/of_device.h>
#include "thermal_core.h"
-#define TEMP0_TH (0x4)
-#define TEMP0_RST_TH (0x8)
-#define TEMP0_CFG (0xC)
-#define TEMP0_EN (0x10)
-#define TEMP0_INT_EN (0x14)
-#define TEMP0_INT_CLR (0x18)
-#define TEMP0_RST_MSK (0x1C)
-#define TEMP0_VALUE (0x28)
+#define HI6220_TEMP0_LAG (0x0)
+#define HI6220_TEMP0_TH (0x4)
+#define HI6220_TEMP0_RST_TH (0x8)
+#define HI6220_TEMP0_CFG (0xC)
+#define HI6220_TEMP0_CFG_SS_MSK (0xF000)
+#define HI6220_TEMP0_CFG_HDAK_MSK (0x30)
+#define HI6220_TEMP0_EN (0x10)
+#define HI6220_TEMP0_INT_EN (0x14)
+#define HI6220_TEMP0_INT_CLR (0x18)
+#define HI6220_TEMP0_RST_MSK (0x1C)
+#define HI6220_TEMP0_VALUE (0x28)
-#define HISI_TEMP_BASE (-60000)
-#define HISI_TEMP_RESET (100000)
-#define HISI_TEMP_STEP (784)
+#define HI3660_OFFSET(chan) ((chan) * 0x40)
+#define HI3660_TEMP(chan) (HI3660_OFFSET(chan) + 0x1C)
+#define HI3660_TH(chan) (HI3660_OFFSET(chan) + 0x20)
+#define HI3660_LAG(chan) (HI3660_OFFSET(chan) + 0x28)
+#define HI3660_INT_EN(chan) (HI3660_OFFSET(chan) + 0x2C)
+#define HI3660_INT_CLR(chan) (HI3660_OFFSET(chan) + 0x30)
-#define HISI_MAX_SENSORS 4
+#define HI6220_TEMP_BASE (-60000)
+#define HI6220_TEMP_RESET (100000)
+#define HI6220_TEMP_STEP (785)
+#define HI6220_TEMP_LAG (3500)
+
+#define HI3660_TEMP_BASE (-63780)
+#define HI3660_TEMP_STEP (205)
+#define HI3660_TEMP_LAG (4000)
+
+#define HI6220_DEFAULT_SENSOR 2
+#define HI3660_DEFAULT_SENSOR 1
struct hisi_thermal_sensor {
- struct hisi_thermal_data *thermal;
struct thermal_zone_device *tzd;
-
- long sensor_temp;
uint32_t id;
uint32_t thres_temp;
};
struct hisi_thermal_data {
- struct mutex thermal_lock; /* protects register data */
+ int (*get_temp)(struct hisi_thermal_data *data);
+ int (*enable_sensor)(struct hisi_thermal_data *data);
+ int (*disable_sensor)(struct hisi_thermal_data *data);
+ int (*irq_handler)(struct hisi_thermal_data *data);
struct platform_device *pdev;
struct clk *clk;
- struct hisi_thermal_sensor sensors[HISI_MAX_SENSORS];
-
- int irq, irq_bind_sensor;
- bool irq_enabled;
-
+ struct hisi_thermal_sensor sensor;
void __iomem *regs;
+ int irq;
};
/*
* The temperature computation on the tsensor is as follow:
* Unit: millidegree Celsius
- * Step: 255/200 (0.7843)
+ * Step: 200/255 (0.7843)
* Temperature base: -60°C
*
- * The register is programmed in temperature steps, every step is 784
+ * The register is programmed in temperature steps, every step is 785
* millidegree and begins at -60 000 m°C
*
* The temperature from the steps:
*
- * Temp = TempBase + (steps x 784)
+ * Temp = TempBase + (steps x 785)
*
* and the steps from the temperature:
*
- * steps = (Temp - TempBase) / 784
+ * steps = (Temp - TempBase) / 785
*
*/
-static inline int hisi_thermal_step_to_temp(int step)
+static inline int hi6220_thermal_step_to_temp(int step)
{
- return HISI_TEMP_BASE + (step * HISI_TEMP_STEP);
+ return HI6220_TEMP_BASE + (step * HI6220_TEMP_STEP);
}
-static inline long hisi_thermal_temp_to_step(long temp)
+static inline int hi6220_thermal_temp_to_step(int temp)
{
- return (temp - HISI_TEMP_BASE) / HISI_TEMP_STEP;
+ return DIV_ROUND_UP(temp - HI6220_TEMP_BASE, HI6220_TEMP_STEP);
}
-static inline long hisi_thermal_round_temp(int temp)
+/*
+ * for Hi3660,
+ * Step: 189/922 (0.205)
+ * Temperature base: -63.780°C
+ *
+ * The register is programmed in temperature steps, every step is 205
+ * millidegree and begins at -63 780 m°C
+ */
+static inline int hi3660_thermal_step_to_temp(int step)
{
- return hisi_thermal_step_to_temp(
- hisi_thermal_temp_to_step(temp));
+ return HI3660_TEMP_BASE + step * HI3660_TEMP_STEP;
}
-static long hisi_thermal_get_sensor_temp(struct hisi_thermal_data *data,
- struct hisi_thermal_sensor *sensor)
+static inline int hi3660_thermal_temp_to_step(int temp)
{
- long val;
+ return DIV_ROUND_UP(temp - HI3660_TEMP_BASE, HI3660_TEMP_STEP);
+}
- mutex_lock(&data->thermal_lock);
+/*
+ * The lag register contains 5 bits encoding the temperature in steps.
+ *
+ * Each time the temperature crosses the threshold boundary, an
+ * interrupt is raised. It could be when the temperature is going
+ * above the threshold or below. However, if the temperature is
+ * fluctuating around this value due to the load, we can receive
+ * several interrupts which may not desired.
+ *
+ * We can setup a temperature representing the delta between the
+ * threshold and the current temperature when the temperature is
+ * decreasing.
+ *
+ * For instance: the lag register is 5°C, the threshold is 65°C, when
+ * the temperature reaches 65°C an interrupt is raised and when the
+ * temperature decrease to 65°C - 5°C another interrupt is raised.
+ *
+ * A very short lag can lead to an interrupt storm, a long lag
+ * increase the latency to react to the temperature changes. In our
+ * case, that is not really a problem as we are polling the
+ * temperature.
+ *
+ * [0:4] : lag register
+ *
+ * The temperature is coded in steps, cf. HI6220_TEMP_STEP.
+ *
+ * Min : 0x00 : 0.0 °C
+ * Max : 0x1F : 24.3 °C
+ *
+ * The 'value' parameter is in milliCelsius.
+ */
+static inline void hi6220_thermal_set_lag(void __iomem *addr, int value)
+{
+ writel(DIV_ROUND_UP(value, HI6220_TEMP_STEP) & 0x1F,
+ addr + HI6220_TEMP0_LAG);
+}
- /* disable interrupt */
- writel(0x0, data->regs + TEMP0_INT_EN);
- writel(0x1, data->regs + TEMP0_INT_CLR);
+static inline void hi6220_thermal_alarm_clear(void __iomem *addr, int value)
+{
+ writel(value, addr + HI6220_TEMP0_INT_CLR);
+}
+
+static inline void hi6220_thermal_alarm_enable(void __iomem *addr, int value)
+{
+ writel(value, addr + HI6220_TEMP0_INT_EN);
+}
+
+static inline void hi6220_thermal_alarm_set(void __iomem *addr, int temp)
+{
+ writel(hi6220_thermal_temp_to_step(temp) | 0x0FFFFFF00,
+ addr + HI6220_TEMP0_TH);
+}
+
+static inline void hi6220_thermal_reset_set(void __iomem *addr, int temp)
+{
+ writel(hi6220_thermal_temp_to_step(temp), addr + HI6220_TEMP0_RST_TH);
+}
+
+static inline void hi6220_thermal_reset_enable(void __iomem *addr, int value)
+{
+ writel(value, addr + HI6220_TEMP0_RST_MSK);
+}
+
+static inline void hi6220_thermal_enable(void __iomem *addr, int value)
+{
+ writel(value, addr + HI6220_TEMP0_EN);
+}
+
+static inline int hi6220_thermal_get_temperature(void __iomem *addr)
+{
+ return hi6220_thermal_step_to_temp(readl(addr + HI6220_TEMP0_VALUE));
+}
+
+/*
+ * [0:6] lag register
+ *
+ * The temperature is coded in steps, cf. HI3660_TEMP_STEP.
+ *
+ * Min : 0x00 : 0.0 °C
+ * Max : 0x7F : 26.0 °C
+ *
+ */
+static inline void hi3660_thermal_set_lag(void __iomem *addr,
+ int id, int value)
+{
+ writel(DIV_ROUND_UP(value, HI3660_TEMP_STEP) & 0x7F,
+ addr + HI3660_LAG(id));
+}
+
+static inline void hi3660_thermal_alarm_clear(void __iomem *addr,
+ int id, int value)
+{
+ writel(value, addr + HI3660_INT_CLR(id));
+}
+
+static inline void hi3660_thermal_alarm_enable(void __iomem *addr,
+ int id, int value)
+{
+ writel(value, addr + HI3660_INT_EN(id));
+}
+
+static inline void hi3660_thermal_alarm_set(void __iomem *addr,
+ int id, int value)
+{
+ writel(value, addr + HI3660_TH(id));
+}
+
+static inline int hi3660_thermal_get_temperature(void __iomem *addr, int id)
+{
+ return hi3660_thermal_step_to_temp(readl(addr + HI3660_TEMP(id)));
+}
+
+/*
+ * Temperature configuration register - Sensor selection
+ *
+ * Bits [19:12]
+ *
+ * 0x0: local sensor (default)
+ * 0x1: remote sensor 1 (ACPU cluster 1)
+ * 0x2: remote sensor 2 (ACPU cluster 0)
+ * 0x3: remote sensor 3 (G3D)
+ */
+static inline void hi6220_thermal_sensor_select(void __iomem *addr, int sensor)
+{
+ writel((readl(addr + HI6220_TEMP0_CFG) & ~HI6220_TEMP0_CFG_SS_MSK) |
+ (sensor << 12), addr + HI6220_TEMP0_CFG);
+}
+
+/*
+ * Temperature configuration register - Hdak conversion polling interval
+ *
+ * Bits [5:4]
+ *
+ * 0x0 : 0.768 ms
+ * 0x1 : 6.144 ms
+ * 0x2 : 49.152 ms
+ * 0x3 : 393.216 ms
+ */
+static inline void hi6220_thermal_hdak_set(void __iomem *addr, int value)
+{
+ writel((readl(addr + HI6220_TEMP0_CFG) & ~HI6220_TEMP0_CFG_HDAK_MSK) |
+ (value << 4), addr + HI6220_TEMP0_CFG);
+}
+
+static int hi6220_thermal_irq_handler(struct hisi_thermal_data *data)
+{
+ hi6220_thermal_alarm_clear(data->regs, 1);
+ return 0;
+}
+
+static int hi3660_thermal_irq_handler(struct hisi_thermal_data *data)
+{
+ hi3660_thermal_alarm_clear(data->regs, data->sensor.id, 1);
+ return 0;
+}
+
+static int hi6220_thermal_get_temp(struct hisi_thermal_data *data)
+{
+ return hi6220_thermal_get_temperature(data->regs);
+}
+
+static int hi3660_thermal_get_temp(struct hisi_thermal_data *data)
+{
+ return hi3660_thermal_get_temperature(data->regs, data->sensor.id);
+}
+
+static int hi6220_thermal_disable_sensor(struct hisi_thermal_data *data)
+{
+ /* disable sensor module */
+ hi6220_thermal_enable(data->regs, 0);
+ hi6220_thermal_alarm_enable(data->regs, 0);
+ hi6220_thermal_reset_enable(data->regs, 0);
+
+ clk_disable_unprepare(data->clk);
+
+ return 0;
+}
+
+static int hi3660_thermal_disable_sensor(struct hisi_thermal_data *data)
+{
+ /* disable sensor module */
+ hi3660_thermal_alarm_enable(data->regs, data->sensor.id, 0);
+ return 0;
+}
+
+static int hi6220_thermal_enable_sensor(struct hisi_thermal_data *data)
+{
+ struct hisi_thermal_sensor *sensor = &data->sensor;
+ int ret;
+
+ /* enable clock for tsensor */
+ ret = clk_prepare_enable(data->clk);
+ if (ret)
+ return ret;
/* disable module firstly */
- writel(0x0, data->regs + TEMP0_EN);
+ hi6220_thermal_reset_enable(data->regs, 0);
+ hi6220_thermal_enable(data->regs, 0);
/* select sensor id */
- writel((sensor->id << 12), data->regs + TEMP0_CFG);
-
- /* enable module */
- writel(0x1, data->regs + TEMP0_EN);
-
- usleep_range(3000, 5000);
-
- val = readl(data->regs + TEMP0_VALUE);
- val = hisi_thermal_step_to_temp(val);
-
- mutex_unlock(&data->thermal_lock);
-
- return val;
-}
-
-static void hisi_thermal_enable_bind_irq_sensor
- (struct hisi_thermal_data *data)
-{
- struct hisi_thermal_sensor *sensor;
-
- mutex_lock(&data->thermal_lock);
-
- sensor = &data->sensors[data->irq_bind_sensor];
+ hi6220_thermal_sensor_select(data->regs, sensor->id);
/* setting the hdak time */
- writel(0x0, data->regs + TEMP0_CFG);
+ hi6220_thermal_hdak_set(data->regs, 0);
- /* disable module firstly */
- writel(0x0, data->regs + TEMP0_RST_MSK);
- writel(0x0, data->regs + TEMP0_EN);
-
- /* select sensor id */
- writel((sensor->id << 12), data->regs + TEMP0_CFG);
+ /* setting lag value between current temp and the threshold */
+ hi6220_thermal_set_lag(data->regs, HI6220_TEMP_LAG);
/* enable for interrupt */
- writel(hisi_thermal_temp_to_step(sensor->thres_temp) | 0x0FFFFFF00,
- data->regs + TEMP0_TH);
+ hi6220_thermal_alarm_set(data->regs, sensor->thres_temp);
- writel(hisi_thermal_temp_to_step(HISI_TEMP_RESET),
- data->regs + TEMP0_RST_TH);
+ hi6220_thermal_reset_set(data->regs, HI6220_TEMP_RESET);
/* enable module */
- writel(0x1, data->regs + TEMP0_RST_MSK);
- writel(0x1, data->regs + TEMP0_EN);
+ hi6220_thermal_reset_enable(data->regs, 1);
+ hi6220_thermal_enable(data->regs, 1);
- writel(0x0, data->regs + TEMP0_INT_CLR);
- writel(0x1, data->regs + TEMP0_INT_EN);
+ hi6220_thermal_alarm_clear(data->regs, 0);
+ hi6220_thermal_alarm_enable(data->regs, 1);
- usleep_range(3000, 5000);
-
- mutex_unlock(&data->thermal_lock);
+ return 0;
}
-static void hisi_thermal_disable_sensor(struct hisi_thermal_data *data)
+static int hi3660_thermal_enable_sensor(struct hisi_thermal_data *data)
{
- mutex_lock(&data->thermal_lock);
+ unsigned int value;
+ struct hisi_thermal_sensor *sensor = &data->sensor;
- /* disable sensor module */
- writel(0x0, data->regs + TEMP0_INT_EN);
- writel(0x0, data->regs + TEMP0_RST_MSK);
- writel(0x0, data->regs + TEMP0_EN);
+ /* disable interrupt */
+ hi3660_thermal_alarm_enable(data->regs, sensor->id, 0);
- mutex_unlock(&data->thermal_lock);
+ /* setting lag value between current temp and the threshold */
+ hi3660_thermal_set_lag(data->regs, sensor->id, HI3660_TEMP_LAG);
+
+ /* set interrupt threshold */
+ value = hi3660_thermal_temp_to_step(sensor->thres_temp);
+ hi3660_thermal_alarm_set(data->regs, sensor->id, value);
+
+ /* enable interrupt */
+ hi3660_thermal_alarm_clear(data->regs, sensor->id, 1);
+ hi3660_thermal_alarm_enable(data->regs, sensor->id, 1);
+
+ return 0;
}
-static int hisi_thermal_get_temp(void *_sensor, int *temp)
+static int hi6220_thermal_probe(struct hisi_thermal_data *data)
{
- struct hisi_thermal_sensor *sensor = _sensor;
- struct hisi_thermal_data *data = sensor->thermal;
+ struct platform_device *pdev = data->pdev;
+ struct device *dev = &pdev->dev;
+ struct resource *res;
+ int ret;
- int sensor_id = -1, i;
- long max_temp = 0;
+ data->get_temp = hi6220_thermal_get_temp;
+ data->enable_sensor = hi6220_thermal_enable_sensor;
+ data->disable_sensor = hi6220_thermal_disable_sensor;
+ data->irq_handler = hi6220_thermal_irq_handler;
- *temp = hisi_thermal_get_sensor_temp(data, sensor);
-
- sensor->sensor_temp = *temp;
-
- for (i = 0; i < HISI_MAX_SENSORS; i++) {
- if (!data->sensors[i].tzd)
- continue;
-
- if (data->sensors[i].sensor_temp >= max_temp) {
- max_temp = data->sensors[i].sensor_temp;
- sensor_id = i;
- }
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ data->regs = devm_ioremap_resource(dev, res);
+ if (IS_ERR(data->regs)) {
+ dev_err(dev, "failed to get io address\n");
+ return PTR_ERR(data->regs);
}
- /* If no sensor has been enabled, then skip to enable irq */
- if (sensor_id == -1)
- return 0;
-
- mutex_lock(&data->thermal_lock);
- data->irq_bind_sensor = sensor_id;
- mutex_unlock(&data->thermal_lock);
-
- dev_dbg(&data->pdev->dev, "id=%d, irq=%d, temp=%d, thres=%d\n",
- sensor->id, data->irq_enabled, *temp, sensor->thres_temp);
- /*
- * Bind irq to sensor for two cases:
- * Reenable alarm IRQ if temperature below threshold;
- * if irq has been enabled, always set it;
- */
- if (data->irq_enabled) {
- hisi_thermal_enable_bind_irq_sensor(data);
- return 0;
+ data->clk = devm_clk_get(dev, "thermal_clk");
+ if (IS_ERR(data->clk)) {
+ ret = PTR_ERR(data->clk);
+ if (ret != -EPROBE_DEFER)
+ dev_err(dev, "failed to get thermal clk: %d\n", ret);
+ return ret;
}
- if (max_temp < sensor->thres_temp) {
- data->irq_enabled = true;
- hisi_thermal_enable_bind_irq_sensor(data);
- enable_irq(data->irq);
+ data->irq = platform_get_irq(pdev, 0);
+ if (data->irq < 0)
+ return data->irq;
+
+ data->sensor.id = HI6220_DEFAULT_SENSOR;
+
+ return 0;
+}
+
+static int hi3660_thermal_probe(struct hisi_thermal_data *data)
+{
+ struct platform_device *pdev = data->pdev;
+ struct device *dev = &pdev->dev;
+ struct resource *res;
+
+ data->get_temp = hi3660_thermal_get_temp;
+ data->enable_sensor = hi3660_thermal_enable_sensor;
+ data->disable_sensor = hi3660_thermal_disable_sensor;
+ data->irq_handler = hi3660_thermal_irq_handler;
+
+ res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ data->regs = devm_ioremap_resource(dev, res);
+ if (IS_ERR(data->regs)) {
+ dev_err(dev, "failed to get io address\n");
+ return PTR_ERR(data->regs);
}
+ data->irq = platform_get_irq(pdev, 0);
+ if (data->irq < 0)
+ return data->irq;
+
+ data->sensor.id = HI3660_DEFAULT_SENSOR;
+
+ return 0;
+}
+
+static int hisi_thermal_get_temp(void *__data, int *temp)
+{
+ struct hisi_thermal_data *data = __data;
+ struct hisi_thermal_sensor *sensor = &data->sensor;
+
+ *temp = data->get_temp(data);
+
+ dev_dbg(&data->pdev->dev, "id=%d, temp=%d, thres=%d\n",
+ sensor->id, *temp, sensor->thres_temp);
+
return 0;
}
@@ -231,35 +447,26 @@
.get_temp = hisi_thermal_get_temp,
};
-static irqreturn_t hisi_thermal_alarm_irq(int irq, void *dev)
-{
- struct hisi_thermal_data *data = dev;
-
- disable_irq_nosync(irq);
- data->irq_enabled = false;
-
- return IRQ_WAKE_THREAD;
-}
-
static irqreturn_t hisi_thermal_alarm_irq_thread(int irq, void *dev)
{
struct hisi_thermal_data *data = dev;
- struct hisi_thermal_sensor *sensor;
- int i;
+ struct hisi_thermal_sensor *sensor = &data->sensor;
+ int temp = 0;
- mutex_lock(&data->thermal_lock);
- sensor = &data->sensors[data->irq_bind_sensor];
+ data->irq_handler(data);
- dev_crit(&data->pdev->dev, "THERMAL ALARM: T > %d\n",
- sensor->thres_temp);
- mutex_unlock(&data->thermal_lock);
+ hisi_thermal_get_temp(data, &temp);
- for (i = 0; i < HISI_MAX_SENSORS; i++) {
- if (!data->sensors[i].tzd)
- continue;
+ if (temp >= sensor->thres_temp) {
+ dev_crit(&data->pdev->dev, "THERMAL ALARM: %d > %d\n",
+ temp, sensor->thres_temp);
- thermal_zone_device_update(data->sensors[i].tzd,
+ thermal_zone_device_update(data->sensor.tzd,
THERMAL_EVENT_UNSPECIFIED);
+
+ } else {
+ dev_crit(&data->pdev->dev, "THERMAL ALARM stopped: %d < %d\n",
+ temp, sensor->thres_temp);
}
return IRQ_HANDLED;
@@ -267,17 +474,14 @@
static int hisi_thermal_register_sensor(struct platform_device *pdev,
struct hisi_thermal_data *data,
- struct hisi_thermal_sensor *sensor,
- int index)
+ struct hisi_thermal_sensor *sensor)
{
int ret, i;
const struct thermal_trip *trip;
- sensor->id = index;
- sensor->thermal = data;
-
sensor->tzd = devm_thermal_zone_of_sensor_register(&pdev->dev,
- sensor->id, sensor, &hisi_of_thermal_ops);
+ sensor->id, data,
+ &hisi_of_thermal_ops);
if (IS_ERR(sensor->tzd)) {
ret = PTR_ERR(sensor->tzd);
sensor->tzd = NULL;
@@ -290,7 +494,7 @@
for (i = 0; i < of_thermal_get_ntrips(sensor->tzd); i++) {
if (trip[i].type == THERMAL_TRIP_PASSIVE) {
- sensor->thres_temp = hisi_thermal_round_temp(trip[i].temperature);
+ sensor->thres_temp = trip[i].temperature;
break;
}
}
@@ -299,7 +503,14 @@
}
static const struct of_device_id of_hisi_thermal_match[] = {
- { .compatible = "hisilicon,tsensor" },
+ {
+ .compatible = "hisilicon,tsensor",
+ .data = hi6220_thermal_probe
+ },
+ {
+ .compatible = "hisilicon,hi3660-tsensor",
+ .data = hi3660_thermal_probe
+ },
{ /* end */ }
};
MODULE_DEVICE_TABLE(of, of_hisi_thermal_match);
@@ -316,69 +527,51 @@
static int hisi_thermal_probe(struct platform_device *pdev)
{
struct hisi_thermal_data *data;
- struct resource *res;
- int i;
+ int const (*platform_probe)(struct hisi_thermal_data *);
+ struct device *dev = &pdev->dev;
int ret;
- data = devm_kzalloc(&pdev->dev, sizeof(*data), GFP_KERNEL);
+ data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
- mutex_init(&data->thermal_lock);
data->pdev = pdev;
-
- res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- data->regs = devm_ioremap_resource(&pdev->dev, res);
- if (IS_ERR(data->regs)) {
- dev_err(&pdev->dev, "failed to get io address\n");
- return PTR_ERR(data->regs);
- }
-
- data->irq = platform_get_irq(pdev, 0);
- if (data->irq < 0)
- return data->irq;
-
platform_set_drvdata(pdev, data);
- data->clk = devm_clk_get(&pdev->dev, "thermal_clk");
- if (IS_ERR(data->clk)) {
- ret = PTR_ERR(data->clk);
- if (ret != -EPROBE_DEFER)
- dev_err(&pdev->dev,
- "failed to get thermal clk: %d\n", ret);
- return ret;
+ platform_probe = of_device_get_match_data(dev);
+ if (!platform_probe) {
+ dev_err(dev, "failed to get probe func\n");
+ return -EINVAL;
}
- /* enable clock for thermal */
- ret = clk_prepare_enable(data->clk);
+ ret = platform_probe(data);
+ if (ret)
+ return ret;
+
+ ret = hisi_thermal_register_sensor(pdev, data,
+ &data->sensor);
if (ret) {
- dev_err(&pdev->dev, "failed to enable thermal clk: %d\n", ret);
+ dev_err(dev, "failed to register thermal sensor: %d\n", ret);
return ret;
}
- hisi_thermal_enable_bind_irq_sensor(data);
- data->irq_enabled = true;
-
- for (i = 0; i < HISI_MAX_SENSORS; ++i) {
- ret = hisi_thermal_register_sensor(pdev, data,
- &data->sensors[i], i);
- if (ret)
- dev_err(&pdev->dev,
- "failed to register thermal sensor: %d\n", ret);
- else
- hisi_thermal_toggle_sensor(&data->sensors[i], true);
- }
-
- ret = devm_request_threaded_irq(&pdev->dev, data->irq,
- hisi_thermal_alarm_irq,
- hisi_thermal_alarm_irq_thread,
- 0, "hisi_thermal", data);
- if (ret < 0) {
- dev_err(&pdev->dev, "failed to request alarm irq: %d\n", ret);
+ ret = data->enable_sensor(data);
+ if (ret) {
+ dev_err(dev, "Failed to setup the sensor: %d\n", ret);
return ret;
}
- enable_irq(data->irq);
+ if (data->irq) {
+ ret = devm_request_threaded_irq(dev, data->irq, NULL,
+ hisi_thermal_alarm_irq_thread,
+ IRQF_ONESHOT, "hisi_thermal", data);
+ if (ret < 0) {
+ dev_err(dev, "failed to request alarm irq: %d\n", ret);
+ return ret;
+ }
+ }
+
+ hisi_thermal_toggle_sensor(&data->sensor, true);
return 0;
}
@@ -386,19 +579,11 @@
static int hisi_thermal_remove(struct platform_device *pdev)
{
struct hisi_thermal_data *data = platform_get_drvdata(pdev);
- int i;
+ struct hisi_thermal_sensor *sensor = &data->sensor;
- for (i = 0; i < HISI_MAX_SENSORS; i++) {
- struct hisi_thermal_sensor *sensor = &data->sensors[i];
+ hisi_thermal_toggle_sensor(sensor, false);
- if (!sensor->tzd)
- continue;
-
- hisi_thermal_toggle_sensor(sensor, false);
- }
-
- hisi_thermal_disable_sensor(data);
- clk_disable_unprepare(data->clk);
+ data->disable_sensor(data);
return 0;
}
@@ -408,10 +593,7 @@
{
struct hisi_thermal_data *data = dev_get_drvdata(dev);
- hisi_thermal_disable_sensor(data);
- data->irq_enabled = false;
-
- clk_disable_unprepare(data->clk);
+ data->disable_sensor(data);
return 0;
}
@@ -419,16 +601,8 @@
static int hisi_thermal_resume(struct device *dev)
{
struct hisi_thermal_data *data = dev_get_drvdata(dev);
- int ret;
- ret = clk_prepare_enable(data->clk);
- if (ret)
- return ret;
-
- data->irq_enabled = true;
- hisi_thermal_enable_bind_irq_sensor(data);
-
- return 0;
+ return data->enable_sensor(data);
}
#endif
diff --git a/drivers/tty/serial/kgdboc.c b/drivers/tty/serial/kgdboc.c
index a260cde..5532c44 100644
--- a/drivers/tty/serial/kgdboc.c
+++ b/drivers/tty/serial/kgdboc.c
@@ -245,7 +245,8 @@
kgdb_tty_line, chr);
}
-static int param_set_kgdboc_var(const char *kmessage, struct kernel_param *kp)
+static int param_set_kgdboc_var(const char *kmessage,
+ const struct kernel_param *kp)
{
int len = strlen(kmessage);
diff --git a/drivers/usb/gadget/Kconfig b/drivers/usb/gadget/Kconfig
index 31cce78..f572b64 100644
--- a/drivers/usb/gadget/Kconfig
+++ b/drivers/usb/gadget/Kconfig
@@ -215,6 +215,12 @@
config USB_F_TCM
tristate
+config USB_F_AUDIO_SRC
+ tristate
+
+config USB_F_ACC
+ tristate
+
# this first set of drivers all depend on bulk-capable hardware.
config USB_CONFIGFS
@@ -368,6 +374,30 @@
implemented in kernel space (for instance Ethernet, serial or
mass storage) and other are implemented in user space.
+config USB_CONFIGFS_F_ACC
+ boolean "Accessory gadget"
+ depends on USB_CONFIGFS
+ select USB_F_ACC
+ help
+ USB gadget Accessory support
+
+config USB_CONFIGFS_F_AUDIO_SRC
+ boolean "Audio Source gadget"
+ depends on USB_CONFIGFS && USB_CONFIGFS_F_ACC
+ depends on SND
+ select SND_PCM
+ select USB_F_AUDIO_SRC
+ help
+ USB gadget Audio Source support
+
+config USB_CONFIGFS_UEVENT
+ boolean "Uevent notification of Gadget state"
+ depends on USB_CONFIGFS
+ help
+ Enable uevent notifications to userspace when the gadget
+ state changes. The gadget can be in any of the following
+ three states: "CONNECTED/DISCONNECTED/CONFIGURED"
+
config USB_CONFIGFS_F_UAC1
bool "Audio Class 1.0"
depends on USB_CONFIGFS
diff --git a/drivers/usb/gadget/composite.c b/drivers/usb/gadget/composite.c
index b805962..b356719 100644
--- a/drivers/usb/gadget/composite.c
+++ b/drivers/usb/gadget/composite.c
@@ -2004,6 +2004,12 @@
struct usb_composite_dev *cdev = get_gadget_data(gadget);
unsigned long flags;
+ if (cdev == NULL) {
+ WARN(1, "%s: Calling disconnect on a Gadget that is \
+ not connected\n", __func__);
+ return;
+ }
+
/* REVISIT: should we have config and device level
* disconnect callbacks?
*/
diff --git a/drivers/usb/gadget/configfs.c b/drivers/usb/gadget/configfs.c
index aeb9f3c..bf2d0ce 100644
--- a/drivers/usb/gadget/configfs.c
+++ b/drivers/usb/gadget/configfs.c
@@ -9,6 +9,31 @@
#include "u_f.h"
#include "u_os_desc.h"
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+#include <linux/platform_device.h>
+#include <linux/kdev_t.h>
+#include <linux/usb/ch9.h>
+
+#ifdef CONFIG_USB_CONFIGFS_F_ACC
+extern int acc_ctrlrequest(struct usb_composite_dev *cdev,
+ const struct usb_ctrlrequest *ctrl);
+void acc_disconnect(void);
+#endif
+static struct class *android_class;
+static struct device *android_device;
+static int index;
+
+struct device *create_function_device(char *name)
+{
+ if (android_device && !IS_ERR(android_device))
+ return device_create(android_class, android_device,
+ MKDEV(0, index++), NULL, name);
+ else
+ return ERR_PTR(-EINVAL);
+}
+EXPORT_SYMBOL_GPL(create_function_device);
+#endif
+
int check_user_usb_string(const char *name,
struct usb_gadget_strings *stringtab_dev)
{
@@ -60,6 +85,12 @@
bool use_os_desc;
char b_vendor_code;
char qw_sign[OS_STRING_QW_SIGN_LEN];
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+ bool connected;
+ bool sw_connected;
+ struct work_struct work;
+ struct device *dev;
+#endif
};
static inline struct gadget_info *to_gadget_info(struct config_item *item)
@@ -265,7 +296,7 @@
mutex_lock(&gi->lock);
- if (!strlen(name)) {
+ if (!strlen(name) || strcmp(name, "none") == 0) {
ret = unregister_gadget(gi);
if (ret)
goto err;
@@ -1371,6 +1402,60 @@
return ret;
}
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+static void android_work(struct work_struct *data)
+{
+ struct gadget_info *gi = container_of(data, struct gadget_info, work);
+ struct usb_composite_dev *cdev = &gi->cdev;
+ char *disconnected[2] = { "USB_STATE=DISCONNECTED", NULL };
+ char *connected[2] = { "USB_STATE=CONNECTED", NULL };
+ char *configured[2] = { "USB_STATE=CONFIGURED", NULL };
+ /* 0-connected 1-configured 2-disconnected*/
+ bool status[3] = { false, false, false };
+ unsigned long flags;
+ bool uevent_sent = false;
+
+ spin_lock_irqsave(&cdev->lock, flags);
+ if (cdev->config)
+ status[1] = true;
+
+ if (gi->connected != gi->sw_connected) {
+ if (gi->connected)
+ status[0] = true;
+ else
+ status[2] = true;
+ gi->sw_connected = gi->connected;
+ }
+ spin_unlock_irqrestore(&cdev->lock, flags);
+
+ if (status[0]) {
+ kobject_uevent_env(&android_device->kobj,
+ KOBJ_CHANGE, connected);
+ pr_info("%s: sent uevent %s\n", __func__, connected[0]);
+ uevent_sent = true;
+ }
+
+ if (status[1]) {
+ kobject_uevent_env(&android_device->kobj,
+ KOBJ_CHANGE, configured);
+ pr_info("%s: sent uevent %s\n", __func__, configured[0]);
+ uevent_sent = true;
+ }
+
+ if (status[2]) {
+ kobject_uevent_env(&android_device->kobj,
+ KOBJ_CHANGE, disconnected);
+ pr_info("%s: sent uevent %s\n", __func__, disconnected[0]);
+ uevent_sent = true;
+ }
+
+ if (!uevent_sent) {
+ pr_info("%s: did not send uevent (%d %d %p)\n", __func__,
+ gi->connected, gi->sw_connected, cdev->config);
+ }
+}
+#endif
+
static void configfs_composite_unbind(struct usb_gadget *gadget)
{
struct usb_composite_dev *cdev;
@@ -1390,14 +1475,91 @@
set_gadget_data(gadget, NULL);
}
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+static int android_setup(struct usb_gadget *gadget,
+ const struct usb_ctrlrequest *c)
+{
+ struct usb_composite_dev *cdev = get_gadget_data(gadget);
+ unsigned long flags;
+ struct gadget_info *gi = container_of(cdev, struct gadget_info, cdev);
+ int value = -EOPNOTSUPP;
+ struct usb_function_instance *fi;
+
+ spin_lock_irqsave(&cdev->lock, flags);
+ if (!gi->connected) {
+ gi->connected = 1;
+ schedule_work(&gi->work);
+ }
+ spin_unlock_irqrestore(&cdev->lock, flags);
+ list_for_each_entry(fi, &gi->available_func, cfs_list) {
+ if (fi != NULL && fi->f != NULL && fi->f->setup != NULL) {
+ value = fi->f->setup(fi->f, c);
+ if (value >= 0)
+ break;
+ }
+ }
+
+#ifdef CONFIG_USB_CONFIGFS_F_ACC
+ if (value < 0)
+ value = acc_ctrlrequest(cdev, c);
+#endif
+
+ if (value < 0)
+ value = composite_setup(gadget, c);
+
+ spin_lock_irqsave(&cdev->lock, flags);
+ if (c->bRequest == USB_REQ_SET_CONFIGURATION &&
+ cdev->config) {
+ schedule_work(&gi->work);
+ }
+ spin_unlock_irqrestore(&cdev->lock, flags);
+
+ return value;
+}
+
+static void android_disconnect(struct usb_gadget *gadget)
+{
+ struct usb_composite_dev *cdev = get_gadget_data(gadget);
+ struct gadget_info *gi = container_of(cdev, struct gadget_info, cdev);
+
+ /* FIXME: There's a race between usb_gadget_udc_stop() which is likely
+ * to set the gadget driver to NULL in the udc driver and this drivers
+ * gadget disconnect fn which likely checks for the gadget driver to
+ * be a null ptr. It happens that unbind (doing set_gadget_data(NULL))
+ * is called before the gadget driver is set to NULL and the udc driver
+ * calls disconnect fn which results in cdev being a null ptr.
+ */
+ if (cdev == NULL) {
+ WARN(1, "%s: gadget driver already disconnected\n", __func__);
+ return;
+ }
+
+ /* accessory HID support can be active while the
+ accessory function is not actually enabled,
+ so we need to inform it when we are disconnected.
+ */
+
+#ifdef CONFIG_USB_CONFIGFS_F_ACC
+ acc_disconnect();
+#endif
+ gi->connected = 0;
+ schedule_work(&gi->work);
+ composite_disconnect(gadget);
+}
+#endif
+
static const struct usb_gadget_driver configfs_driver_template = {
.bind = configfs_composite_bind,
.unbind = configfs_composite_unbind,
-
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+ .setup = android_setup,
+ .reset = android_disconnect,
+ .disconnect = android_disconnect,
+#else
.setup = composite_setup,
.reset = composite_disconnect,
.disconnect = composite_disconnect,
-
+#endif
.suspend = composite_suspend,
.resume = composite_resume,
@@ -1409,6 +1571,89 @@
.match_existing_only = 1,
};
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+static ssize_t state_show(struct device *pdev, struct device_attribute *attr,
+ char *buf)
+{
+ struct gadget_info *dev = dev_get_drvdata(pdev);
+ struct usb_composite_dev *cdev;
+ char *state = "DISCONNECTED";
+ unsigned long flags;
+
+ if (!dev)
+ goto out;
+
+ cdev = &dev->cdev;
+
+ if (!cdev)
+ goto out;
+
+ spin_lock_irqsave(&cdev->lock, flags);
+ if (cdev->config)
+ state = "CONFIGURED";
+ else if (dev->connected)
+ state = "CONNECTED";
+ spin_unlock_irqrestore(&cdev->lock, flags);
+out:
+ return sprintf(buf, "%s\n", state);
+}
+
+static DEVICE_ATTR(state, S_IRUGO, state_show, NULL);
+
+static struct device_attribute *android_usb_attributes[] = {
+ &dev_attr_state,
+ NULL
+};
+
+static int android_device_create(struct gadget_info *gi)
+{
+ struct device_attribute **attrs;
+ struct device_attribute *attr;
+
+ INIT_WORK(&gi->work, android_work);
+ android_device = device_create(android_class, NULL,
+ MKDEV(0, 0), NULL, "android0");
+ if (IS_ERR(android_device))
+ return PTR_ERR(android_device);
+
+ dev_set_drvdata(android_device, gi);
+
+ attrs = android_usb_attributes;
+ while ((attr = *attrs++)) {
+ int err;
+
+ err = device_create_file(android_device, attr);
+ if (err) {
+ device_destroy(android_device->class,
+ android_device->devt);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+static void android_device_destroy(void)
+{
+ struct device_attribute **attrs;
+ struct device_attribute *attr;
+
+ attrs = android_usb_attributes;
+ while ((attr = *attrs++))
+ device_remove_file(android_device, attr);
+ device_destroy(android_device->class, android_device->devt);
+}
+#else
+static inline int android_device_create(struct gadget_info *gi)
+{
+ return 0;
+}
+
+static inline void android_device_destroy(void)
+{
+}
+#endif
+
static struct config_group *gadgets_make(
struct config_group *group,
const char *name)
@@ -1460,7 +1705,11 @@
if (!gi->composite.gadget_driver.function)
goto err;
+ if (android_device_create(gi) < 0)
+ goto err;
+
return &gi->group;
+
err:
kfree(gi);
return ERR_PTR(-ENOMEM);
@@ -1469,6 +1718,7 @@
static void gadgets_drop(struct config_group *group, struct config_item *item)
{
config_item_put(item);
+ android_device_destroy();
}
static struct configfs_group_operations gadgets_ops = {
@@ -1508,6 +1758,13 @@
config_group_init(&gadget_subsys.su_group);
ret = configfs_register_subsystem(&gadget_subsys);
+
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+ android_class = class_create(THIS_MODULE, "android_usb");
+ if (IS_ERR(android_class))
+ return PTR_ERR(android_class);
+#endif
+
return ret;
}
module_init(gadget_cfs_init);
@@ -1515,5 +1772,10 @@
static void __exit gadget_cfs_exit(void)
{
configfs_unregister_subsystem(&gadget_subsys);
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+ if (!IS_ERR(android_class))
+ class_destroy(android_class);
+#endif
+
}
module_exit(gadget_cfs_exit);
diff --git a/drivers/usb/gadget/function/Makefile b/drivers/usb/gadget/function/Makefile
index 5d3a6cf..d7d5673 100644
--- a/drivers/usb/gadget/function/Makefile
+++ b/drivers/usb/gadget/function/Makefile
@@ -50,3 +50,7 @@
obj-$(CONFIG_USB_F_PRINTER) += usb_f_printer.o
usb_f_tcm-y := f_tcm.o
obj-$(CONFIG_USB_F_TCM) += usb_f_tcm.o
+usb_f_audio_source-y := f_audio_source.o
+obj-$(CONFIG_USB_F_AUDIO_SRC) += usb_f_audio_source.o
+usb_f_accessory-y := f_accessory.o
+obj-$(CONFIG_USB_F_ACC) += usb_f_accessory.o
diff --git a/drivers/usb/gadget/function/f_accessory.c b/drivers/usb/gadget/function/f_accessory.c
new file mode 100644
index 0000000..7aa2656
--- /dev/null
+++ b/drivers/usb/gadget/function/f_accessory.c
@@ -0,0 +1,1352 @@
+/*
+ * Gadget Function Driver for Android USB accessories
+ *
+ * Copyright (C) 2011 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+/* #define DEBUG */
+/* #define VERBOSE_DEBUG */
+
+#include <linux/module.h>
+#include <linux/init.h>
+#include <linux/poll.h>
+#include <linux/delay.h>
+#include <linux/wait.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+
+#include <linux/types.h>
+#include <linux/file.h>
+#include <linux/device.h>
+#include <linux/miscdevice.h>
+
+#include <linux/hid.h>
+#include <linux/hiddev.h>
+#include <linux/usb.h>
+#include <linux/usb/ch9.h>
+#include <linux/usb/f_accessory.h>
+
+#include <linux/configfs.h>
+#include <linux/usb/composite.h>
+
+#define MAX_INST_NAME_LEN 40
+#define BULK_BUFFER_SIZE 16384
+#define ACC_STRING_SIZE 256
+
+#define PROTOCOL_VERSION 2
+
+/* String IDs */
+#define INTERFACE_STRING_INDEX 0
+
+/* number of tx and rx requests to allocate */
+#define TX_REQ_MAX 4
+#define RX_REQ_MAX 2
+
+struct acc_hid_dev {
+ struct list_head list;
+ struct hid_device *hid;
+ struct acc_dev *dev;
+ /* accessory defined ID */
+ int id;
+ /* HID report descriptor */
+ u8 *report_desc;
+ /* length of HID report descriptor */
+ int report_desc_len;
+ /* number of bytes of report_desc we have received so far */
+ int report_desc_offset;
+};
+
+struct acc_dev {
+ struct usb_function function;
+ struct usb_composite_dev *cdev;
+ spinlock_t lock;
+
+ struct usb_ep *ep_in;
+ struct usb_ep *ep_out;
+
+ /* online indicates state of function_set_alt & function_unbind
+ * set to 1 when we connect
+ */
+ int online:1;
+
+ /* disconnected indicates state of open & release
+ * Set to 1 when we disconnect.
+ * Not cleared until our file is closed.
+ */
+ int disconnected:1;
+
+ /* strings sent by the host */
+ char manufacturer[ACC_STRING_SIZE];
+ char model[ACC_STRING_SIZE];
+ char description[ACC_STRING_SIZE];
+ char version[ACC_STRING_SIZE];
+ char uri[ACC_STRING_SIZE];
+ char serial[ACC_STRING_SIZE];
+
+ /* for acc_complete_set_string */
+ int string_index;
+
+ /* set to 1 if we have a pending start request */
+ int start_requested;
+
+ int audio_mode;
+
+ /* synchronize access to our device file */
+ atomic_t open_excl;
+
+ struct list_head tx_idle;
+
+ wait_queue_head_t read_wq;
+ wait_queue_head_t write_wq;
+ struct usb_request *rx_req[RX_REQ_MAX];
+ int rx_done;
+
+ /* delayed work for handling ACCESSORY_START */
+ struct delayed_work start_work;
+
+ /* worker for registering and unregistering hid devices */
+ struct work_struct hid_work;
+
+ /* list of active HID devices */
+ struct list_head hid_list;
+
+ /* list of new HID devices to register */
+ struct list_head new_hid_list;
+
+ /* list of dead HID devices to unregister */
+ struct list_head dead_hid_list;
+};
+
+static struct usb_interface_descriptor acc_interface_desc = {
+ .bLength = USB_DT_INTERFACE_SIZE,
+ .bDescriptorType = USB_DT_INTERFACE,
+ .bInterfaceNumber = 0,
+ .bNumEndpoints = 2,
+ .bInterfaceClass = USB_CLASS_VENDOR_SPEC,
+ .bInterfaceSubClass = USB_SUBCLASS_VENDOR_SPEC,
+ .bInterfaceProtocol = 0,
+};
+
+static struct usb_endpoint_descriptor acc_highspeed_in_desc = {
+ .bLength = USB_DT_ENDPOINT_SIZE,
+ .bDescriptorType = USB_DT_ENDPOINT,
+ .bEndpointAddress = USB_DIR_IN,
+ .bmAttributes = USB_ENDPOINT_XFER_BULK,
+ .wMaxPacketSize = __constant_cpu_to_le16(512),
+};
+
+static struct usb_endpoint_descriptor acc_highspeed_out_desc = {
+ .bLength = USB_DT_ENDPOINT_SIZE,
+ .bDescriptorType = USB_DT_ENDPOINT,
+ .bEndpointAddress = USB_DIR_OUT,
+ .bmAttributes = USB_ENDPOINT_XFER_BULK,
+ .wMaxPacketSize = __constant_cpu_to_le16(512),
+};
+
+static struct usb_endpoint_descriptor acc_fullspeed_in_desc = {
+ .bLength = USB_DT_ENDPOINT_SIZE,
+ .bDescriptorType = USB_DT_ENDPOINT,
+ .bEndpointAddress = USB_DIR_IN,
+ .bmAttributes = USB_ENDPOINT_XFER_BULK,
+};
+
+static struct usb_endpoint_descriptor acc_fullspeed_out_desc = {
+ .bLength = USB_DT_ENDPOINT_SIZE,
+ .bDescriptorType = USB_DT_ENDPOINT,
+ .bEndpointAddress = USB_DIR_OUT,
+ .bmAttributes = USB_ENDPOINT_XFER_BULK,
+};
+
+static struct usb_descriptor_header *fs_acc_descs[] = {
+ (struct usb_descriptor_header *) &acc_interface_desc,
+ (struct usb_descriptor_header *) &acc_fullspeed_in_desc,
+ (struct usb_descriptor_header *) &acc_fullspeed_out_desc,
+ NULL,
+};
+
+static struct usb_descriptor_header *hs_acc_descs[] = {
+ (struct usb_descriptor_header *) &acc_interface_desc,
+ (struct usb_descriptor_header *) &acc_highspeed_in_desc,
+ (struct usb_descriptor_header *) &acc_highspeed_out_desc,
+ NULL,
+};
+
+static struct usb_string acc_string_defs[] = {
+ [INTERFACE_STRING_INDEX].s = "Android Accessory Interface",
+ { }, /* end of list */
+};
+
+static struct usb_gadget_strings acc_string_table = {
+ .language = 0x0409, /* en-US */
+ .strings = acc_string_defs,
+};
+
+static struct usb_gadget_strings *acc_strings[] = {
+ &acc_string_table,
+ NULL,
+};
+
+/* temporary variable used between acc_open() and acc_gadget_bind() */
+static struct acc_dev *_acc_dev;
+
+struct acc_instance {
+ struct usb_function_instance func_inst;
+ const char *name;
+};
+
+static inline struct acc_dev *func_to_dev(struct usb_function *f)
+{
+ return container_of(f, struct acc_dev, function);
+}
+
+static struct usb_request *acc_request_new(struct usb_ep *ep, int buffer_size)
+{
+ struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL);
+
+ if (!req)
+ return NULL;
+
+ /* now allocate buffers for the requests */
+ req->buf = kmalloc(buffer_size, GFP_KERNEL);
+ if (!req->buf) {
+ usb_ep_free_request(ep, req);
+ return NULL;
+ }
+
+ return req;
+}
+
+static void acc_request_free(struct usb_request *req, struct usb_ep *ep)
+{
+ if (req) {
+ kfree(req->buf);
+ usb_ep_free_request(ep, req);
+ }
+}
+
+/* add a request to the tail of a list */
+static void req_put(struct acc_dev *dev, struct list_head *head,
+ struct usb_request *req)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&dev->lock, flags);
+ list_add_tail(&req->list, head);
+ spin_unlock_irqrestore(&dev->lock, flags);
+}
+
+/* remove a request from the head of a list */
+static struct usb_request *req_get(struct acc_dev *dev, struct list_head *head)
+{
+ unsigned long flags;
+ struct usb_request *req;
+
+ spin_lock_irqsave(&dev->lock, flags);
+ if (list_empty(head)) {
+ req = 0;
+ } else {
+ req = list_first_entry(head, struct usb_request, list);
+ list_del(&req->list);
+ }
+ spin_unlock_irqrestore(&dev->lock, flags);
+ return req;
+}
+
+static void acc_set_disconnected(struct acc_dev *dev)
+{
+ dev->disconnected = 1;
+}
+
+static void acc_complete_in(struct usb_ep *ep, struct usb_request *req)
+{
+ struct acc_dev *dev = _acc_dev;
+
+ if (req->status == -ESHUTDOWN) {
+ pr_debug("acc_complete_in set disconnected");
+ acc_set_disconnected(dev);
+ }
+
+ req_put(dev, &dev->tx_idle, req);
+
+ wake_up(&dev->write_wq);
+}
+
+static void acc_complete_out(struct usb_ep *ep, struct usb_request *req)
+{
+ struct acc_dev *dev = _acc_dev;
+
+ dev->rx_done = 1;
+ if (req->status == -ESHUTDOWN) {
+ pr_debug("acc_complete_out set disconnected");
+ acc_set_disconnected(dev);
+ }
+
+ wake_up(&dev->read_wq);
+}
+
+static void acc_complete_set_string(struct usb_ep *ep, struct usb_request *req)
+{
+ struct acc_dev *dev = ep->driver_data;
+ char *string_dest = NULL;
+ int length = req->actual;
+
+ if (req->status != 0) {
+ pr_err("acc_complete_set_string, err %d\n", req->status);
+ return;
+ }
+
+ switch (dev->string_index) {
+ case ACCESSORY_STRING_MANUFACTURER:
+ string_dest = dev->manufacturer;
+ break;
+ case ACCESSORY_STRING_MODEL:
+ string_dest = dev->model;
+ break;
+ case ACCESSORY_STRING_DESCRIPTION:
+ string_dest = dev->description;
+ break;
+ case ACCESSORY_STRING_VERSION:
+ string_dest = dev->version;
+ break;
+ case ACCESSORY_STRING_URI:
+ string_dest = dev->uri;
+ break;
+ case ACCESSORY_STRING_SERIAL:
+ string_dest = dev->serial;
+ break;
+ }
+ if (string_dest) {
+ unsigned long flags;
+
+ if (length >= ACC_STRING_SIZE)
+ length = ACC_STRING_SIZE - 1;
+
+ spin_lock_irqsave(&dev->lock, flags);
+ memcpy(string_dest, req->buf, length);
+ /* ensure zero termination */
+ string_dest[length] = 0;
+ spin_unlock_irqrestore(&dev->lock, flags);
+ } else {
+ pr_err("unknown accessory string index %d\n",
+ dev->string_index);
+ }
+}
+
+static void acc_complete_set_hid_report_desc(struct usb_ep *ep,
+ struct usb_request *req)
+{
+ struct acc_hid_dev *hid = req->context;
+ struct acc_dev *dev = hid->dev;
+ int length = req->actual;
+
+ if (req->status != 0) {
+ pr_err("acc_complete_set_hid_report_desc, err %d\n",
+ req->status);
+ return;
+ }
+
+ memcpy(hid->report_desc + hid->report_desc_offset, req->buf, length);
+ hid->report_desc_offset += length;
+ if (hid->report_desc_offset == hid->report_desc_len) {
+ /* After we have received the entire report descriptor
+ * we schedule work to initialize the HID device
+ */
+ schedule_work(&dev->hid_work);
+ }
+}
+
+static void acc_complete_send_hid_event(struct usb_ep *ep,
+ struct usb_request *req)
+{
+ struct acc_hid_dev *hid = req->context;
+ int length = req->actual;
+
+ if (req->status != 0) {
+ pr_err("acc_complete_send_hid_event, err %d\n", req->status);
+ return;
+ }
+
+ hid_report_raw_event(hid->hid, HID_INPUT_REPORT, req->buf, length, 1);
+}
+
+static int acc_hid_parse(struct hid_device *hid)
+{
+ struct acc_hid_dev *hdev = hid->driver_data;
+
+ hid_parse_report(hid, hdev->report_desc, hdev->report_desc_len);
+ return 0;
+}
+
+static int acc_hid_start(struct hid_device *hid)
+{
+ return 0;
+}
+
+static void acc_hid_stop(struct hid_device *hid)
+{
+}
+
+static int acc_hid_open(struct hid_device *hid)
+{
+ return 0;
+}
+
+static void acc_hid_close(struct hid_device *hid)
+{
+}
+
+static int acc_hid_raw_request(struct hid_device *hid, unsigned char reportnum,
+ __u8 *buf, size_t len, unsigned char rtype, int reqtype)
+{
+ return 0;
+}
+
+static struct hid_ll_driver acc_hid_ll_driver = {
+ .parse = acc_hid_parse,
+ .start = acc_hid_start,
+ .stop = acc_hid_stop,
+ .open = acc_hid_open,
+ .close = acc_hid_close,
+ .raw_request = acc_hid_raw_request,
+};
+
+static struct acc_hid_dev *acc_hid_new(struct acc_dev *dev,
+ int id, int desc_len)
+{
+ struct acc_hid_dev *hdev;
+
+ hdev = kzalloc(sizeof(*hdev), GFP_ATOMIC);
+ if (!hdev)
+ return NULL;
+ hdev->report_desc = kzalloc(desc_len, GFP_ATOMIC);
+ if (!hdev->report_desc) {
+ kfree(hdev);
+ return NULL;
+ }
+ hdev->dev = dev;
+ hdev->id = id;
+ hdev->report_desc_len = desc_len;
+
+ return hdev;
+}
+
+static struct acc_hid_dev *acc_hid_get(struct list_head *list, int id)
+{
+ struct acc_hid_dev *hid;
+
+ list_for_each_entry(hid, list, list) {
+ if (hid->id == id)
+ return hid;
+ }
+ return NULL;
+}
+
+static int acc_register_hid(struct acc_dev *dev, int id, int desc_length)
+{
+ struct acc_hid_dev *hid;
+ unsigned long flags;
+
+ /* report descriptor length must be > 0 */
+ if (desc_length <= 0)
+ return -EINVAL;
+
+ spin_lock_irqsave(&dev->lock, flags);
+ /* replace HID if one already exists with this ID */
+ hid = acc_hid_get(&dev->hid_list, id);
+ if (!hid)
+ hid = acc_hid_get(&dev->new_hid_list, id);
+ if (hid)
+ list_move(&hid->list, &dev->dead_hid_list);
+
+ hid = acc_hid_new(dev, id, desc_length);
+ if (!hid) {
+ spin_unlock_irqrestore(&dev->lock, flags);
+ return -ENOMEM;
+ }
+
+ list_add(&hid->list, &dev->new_hid_list);
+ spin_unlock_irqrestore(&dev->lock, flags);
+
+ /* schedule work to register the HID device */
+ schedule_work(&dev->hid_work);
+ return 0;
+}
+
+static int acc_unregister_hid(struct acc_dev *dev, int id)
+{
+ struct acc_hid_dev *hid;
+ unsigned long flags;
+
+ spin_lock_irqsave(&dev->lock, flags);
+ hid = acc_hid_get(&dev->hid_list, id);
+ if (!hid)
+ hid = acc_hid_get(&dev->new_hid_list, id);
+ if (!hid) {
+ spin_unlock_irqrestore(&dev->lock, flags);
+ return -EINVAL;
+ }
+
+ list_move(&hid->list, &dev->dead_hid_list);
+ spin_unlock_irqrestore(&dev->lock, flags);
+
+ schedule_work(&dev->hid_work);
+ return 0;
+}
+
+static int create_bulk_endpoints(struct acc_dev *dev,
+ struct usb_endpoint_descriptor *in_desc,
+ struct usb_endpoint_descriptor *out_desc)
+{
+ struct usb_composite_dev *cdev = dev->cdev;
+ struct usb_request *req;
+ struct usb_ep *ep;
+ int i;
+
+ DBG(cdev, "create_bulk_endpoints dev: %p\n", dev);
+
+ ep = usb_ep_autoconfig(cdev->gadget, in_desc);
+ if (!ep) {
+ DBG(cdev, "usb_ep_autoconfig for ep_in failed\n");
+ return -ENODEV;
+ }
+ DBG(cdev, "usb_ep_autoconfig for ep_in got %s\n", ep->name);
+ ep->driver_data = dev; /* claim the endpoint */
+ dev->ep_in = ep;
+
+ ep = usb_ep_autoconfig(cdev->gadget, out_desc);
+ if (!ep) {
+ DBG(cdev, "usb_ep_autoconfig for ep_out failed\n");
+ return -ENODEV;
+ }
+ DBG(cdev, "usb_ep_autoconfig for ep_out got %s\n", ep->name);
+ ep->driver_data = dev; /* claim the endpoint */
+ dev->ep_out = ep;
+
+ /* now allocate requests for our endpoints */
+ for (i = 0; i < TX_REQ_MAX; i++) {
+ req = acc_request_new(dev->ep_in, BULK_BUFFER_SIZE);
+ if (!req)
+ goto fail;
+ req->complete = acc_complete_in;
+ req_put(dev, &dev->tx_idle, req);
+ }
+ for (i = 0; i < RX_REQ_MAX; i++) {
+ req = acc_request_new(dev->ep_out, BULK_BUFFER_SIZE);
+ if (!req)
+ goto fail;
+ req->complete = acc_complete_out;
+ dev->rx_req[i] = req;
+ }
+
+ return 0;
+
+fail:
+ pr_err("acc_bind() could not allocate requests\n");
+ while ((req = req_get(dev, &dev->tx_idle)))
+ acc_request_free(req, dev->ep_in);
+ for (i = 0; i < RX_REQ_MAX; i++)
+ acc_request_free(dev->rx_req[i], dev->ep_out);
+ return -1;
+}
+
+static ssize_t acc_read(struct file *fp, char __user *buf,
+ size_t count, loff_t *pos)
+{
+ struct acc_dev *dev = fp->private_data;
+ struct usb_request *req;
+ ssize_t r = count;
+ unsigned xfer;
+ int ret = 0;
+
+ pr_debug("acc_read(%zu)\n", count);
+
+ if (dev->disconnected) {
+ pr_debug("acc_read disconnected");
+ return -ENODEV;
+ }
+
+ if (count > BULK_BUFFER_SIZE)
+ count = BULK_BUFFER_SIZE;
+
+ /* we will block until we're online */
+ pr_debug("acc_read: waiting for online\n");
+ ret = wait_event_interruptible(dev->read_wq, dev->online);
+ if (ret < 0) {
+ r = ret;
+ goto done;
+ }
+
+ if (dev->rx_done) {
+ // last req cancelled. try to get it.
+ req = dev->rx_req[0];
+ goto copy_data;
+ }
+
+requeue_req:
+ /* queue a request */
+ req = dev->rx_req[0];
+ req->length = count;
+ dev->rx_done = 0;
+ ret = usb_ep_queue(dev->ep_out, req, GFP_KERNEL);
+ if (ret < 0) {
+ r = -EIO;
+ goto done;
+ } else {
+ pr_debug("rx %p queue\n", req);
+ }
+
+ /* wait for a request to complete */
+ ret = wait_event_interruptible(dev->read_wq, dev->rx_done);
+ if (ret < 0) {
+ r = ret;
+ ret = usb_ep_dequeue(dev->ep_out, req);
+ if (ret != 0) {
+ // cancel failed. There can be a data already received.
+ // it will be retrieved in the next read.
+ pr_debug("acc_read: cancelling failed %d", ret);
+ }
+ goto done;
+ }
+
+copy_data:
+ dev->rx_done = 0;
+ if (dev->online) {
+ /* If we got a 0-len packet, throw it back and try again. */
+ if (req->actual == 0)
+ goto requeue_req;
+
+ pr_debug("rx %p %u\n", req, req->actual);
+ xfer = (req->actual < count) ? req->actual : count;
+ r = xfer;
+ if (copy_to_user(buf, req->buf, xfer))
+ r = -EFAULT;
+ } else
+ r = -EIO;
+
+done:
+ pr_debug("acc_read returning %zd\n", r);
+ return r;
+}
+
+static ssize_t acc_write(struct file *fp, const char __user *buf,
+ size_t count, loff_t *pos)
+{
+ struct acc_dev *dev = fp->private_data;
+ struct usb_request *req = 0;
+ ssize_t r = count;
+ unsigned xfer;
+ int ret;
+
+ pr_debug("acc_write(%zu)\n", count);
+
+ if (!dev->online || dev->disconnected) {
+ pr_debug("acc_write disconnected or not online");
+ return -ENODEV;
+ }
+
+ while (count > 0) {
+ if (!dev->online) {
+ pr_debug("acc_write dev->error\n");
+ r = -EIO;
+ break;
+ }
+
+ /* get an idle tx request to use */
+ req = 0;
+ ret = wait_event_interruptible(dev->write_wq,
+ ((req = req_get(dev, &dev->tx_idle)) || !dev->online));
+ if (!req) {
+ r = ret;
+ break;
+ }
+
+ if (count > BULK_BUFFER_SIZE) {
+ xfer = BULK_BUFFER_SIZE;
+ /* ZLP, They will be more TX requests so not yet. */
+ req->zero = 0;
+ } else {
+ xfer = count;
+ /* If the data length is a multple of the
+ * maxpacket size then send a zero length packet(ZLP).
+ */
+ req->zero = ((xfer % dev->ep_in->maxpacket) == 0);
+ }
+ if (copy_from_user(req->buf, buf, xfer)) {
+ r = -EFAULT;
+ break;
+ }
+
+ req->length = xfer;
+ ret = usb_ep_queue(dev->ep_in, req, GFP_KERNEL);
+ if (ret < 0) {
+ pr_debug("acc_write: xfer error %d\n", ret);
+ r = -EIO;
+ break;
+ }
+
+ buf += xfer;
+ count -= xfer;
+
+ /* zero this so we don't try to free it on error exit */
+ req = 0;
+ }
+
+ if (req)
+ req_put(dev, &dev->tx_idle, req);
+
+ pr_debug("acc_write returning %zd\n", r);
+ return r;
+}
+
+static long acc_ioctl(struct file *fp, unsigned code, unsigned long value)
+{
+ struct acc_dev *dev = fp->private_data;
+ char *src = NULL;
+ int ret;
+
+ switch (code) {
+ case ACCESSORY_GET_STRING_MANUFACTURER:
+ src = dev->manufacturer;
+ break;
+ case ACCESSORY_GET_STRING_MODEL:
+ src = dev->model;
+ break;
+ case ACCESSORY_GET_STRING_DESCRIPTION:
+ src = dev->description;
+ break;
+ case ACCESSORY_GET_STRING_VERSION:
+ src = dev->version;
+ break;
+ case ACCESSORY_GET_STRING_URI:
+ src = dev->uri;
+ break;
+ case ACCESSORY_GET_STRING_SERIAL:
+ src = dev->serial;
+ break;
+ case ACCESSORY_IS_START_REQUESTED:
+ return dev->start_requested;
+ case ACCESSORY_GET_AUDIO_MODE:
+ return dev->audio_mode;
+ }
+ if (!src)
+ return -EINVAL;
+
+ ret = strlen(src) + 1;
+ if (copy_to_user((void __user *)value, src, ret))
+ ret = -EFAULT;
+ return ret;
+}
+
+static int acc_open(struct inode *ip, struct file *fp)
+{
+ printk(KERN_INFO "acc_open\n");
+ if (atomic_xchg(&_acc_dev->open_excl, 1))
+ return -EBUSY;
+
+ _acc_dev->disconnected = 0;
+ fp->private_data = _acc_dev;
+ return 0;
+}
+
+static int acc_release(struct inode *ip, struct file *fp)
+{
+ printk(KERN_INFO "acc_release\n");
+
+ WARN_ON(!atomic_xchg(&_acc_dev->open_excl, 0));
+ /* indicate that we are disconnected
+ * still could be online so don't touch online flag
+ */
+ _acc_dev->disconnected = 1;
+ return 0;
+}
+
+/* file operations for /dev/usb_accessory */
+static const struct file_operations acc_fops = {
+ .owner = THIS_MODULE,
+ .read = acc_read,
+ .write = acc_write,
+ .unlocked_ioctl = acc_ioctl,
+ .open = acc_open,
+ .release = acc_release,
+};
+
+static int acc_hid_probe(struct hid_device *hdev,
+ const struct hid_device_id *id)
+{
+ int ret;
+
+ ret = hid_parse(hdev);
+ if (ret)
+ return ret;
+ return hid_hw_start(hdev, HID_CONNECT_DEFAULT);
+}
+
+static struct miscdevice acc_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = "usb_accessory",
+ .fops = &acc_fops,
+};
+
+static const struct hid_device_id acc_hid_table[] = {
+ { HID_USB_DEVICE(HID_ANY_ID, HID_ANY_ID) },
+ { }
+};
+
+static struct hid_driver acc_hid_driver = {
+ .name = "USB accessory",
+ .id_table = acc_hid_table,
+ .probe = acc_hid_probe,
+};
+
+static void acc_complete_setup_noop(struct usb_ep *ep, struct usb_request *req)
+{
+ /*
+ * Default no-op function when nothing needs to be done for the
+ * setup request
+ */
+}
+
+int acc_ctrlrequest(struct usb_composite_dev *cdev,
+ const struct usb_ctrlrequest *ctrl)
+{
+ struct acc_dev *dev = _acc_dev;
+ int value = -EOPNOTSUPP;
+ struct acc_hid_dev *hid;
+ int offset;
+ u8 b_requestType = ctrl->bRequestType;
+ u8 b_request = ctrl->bRequest;
+ u16 w_index = le16_to_cpu(ctrl->wIndex);
+ u16 w_value = le16_to_cpu(ctrl->wValue);
+ u16 w_length = le16_to_cpu(ctrl->wLength);
+ unsigned long flags;
+
+/*
+ printk(KERN_INFO "acc_ctrlrequest "
+ "%02x.%02x v%04x i%04x l%u\n",
+ b_requestType, b_request,
+ w_value, w_index, w_length);
+*/
+
+ if (b_requestType == (USB_DIR_OUT | USB_TYPE_VENDOR)) {
+ if (b_request == ACCESSORY_START) {
+ dev->start_requested = 1;
+ schedule_delayed_work(
+ &dev->start_work, msecs_to_jiffies(10));
+ value = 0;
+ cdev->req->complete = acc_complete_setup_noop;
+ } else if (b_request == ACCESSORY_SEND_STRING) {
+ dev->string_index = w_index;
+ cdev->gadget->ep0->driver_data = dev;
+ cdev->req->complete = acc_complete_set_string;
+ value = w_length;
+ } else if (b_request == ACCESSORY_SET_AUDIO_MODE &&
+ w_index == 0 && w_length == 0) {
+ dev->audio_mode = w_value;
+ cdev->req->complete = acc_complete_setup_noop;
+ value = 0;
+ } else if (b_request == ACCESSORY_REGISTER_HID) {
+ cdev->req->complete = acc_complete_setup_noop;
+ value = acc_register_hid(dev, w_value, w_index);
+ } else if (b_request == ACCESSORY_UNREGISTER_HID) {
+ cdev->req->complete = acc_complete_setup_noop;
+ value = acc_unregister_hid(dev, w_value);
+ } else if (b_request == ACCESSORY_SET_HID_REPORT_DESC) {
+ spin_lock_irqsave(&dev->lock, flags);
+ hid = acc_hid_get(&dev->new_hid_list, w_value);
+ spin_unlock_irqrestore(&dev->lock, flags);
+ if (!hid) {
+ value = -EINVAL;
+ goto err;
+ }
+ offset = w_index;
+ if (offset != hid->report_desc_offset
+ || offset + w_length > hid->report_desc_len) {
+ value = -EINVAL;
+ goto err;
+ }
+ cdev->req->context = hid;
+ cdev->req->complete = acc_complete_set_hid_report_desc;
+ value = w_length;
+ } else if (b_request == ACCESSORY_SEND_HID_EVENT) {
+ spin_lock_irqsave(&dev->lock, flags);
+ hid = acc_hid_get(&dev->hid_list, w_value);
+ spin_unlock_irqrestore(&dev->lock, flags);
+ if (!hid) {
+ value = -EINVAL;
+ goto err;
+ }
+ cdev->req->context = hid;
+ cdev->req->complete = acc_complete_send_hid_event;
+ value = w_length;
+ }
+ } else if (b_requestType == (USB_DIR_IN | USB_TYPE_VENDOR)) {
+ if (b_request == ACCESSORY_GET_PROTOCOL) {
+ *((u16 *)cdev->req->buf) = PROTOCOL_VERSION;
+ value = sizeof(u16);
+ cdev->req->complete = acc_complete_setup_noop;
+ /* clear any string left over from a previous session */
+ memset(dev->manufacturer, 0, sizeof(dev->manufacturer));
+ memset(dev->model, 0, sizeof(dev->model));
+ memset(dev->description, 0, sizeof(dev->description));
+ memset(dev->version, 0, sizeof(dev->version));
+ memset(dev->uri, 0, sizeof(dev->uri));
+ memset(dev->serial, 0, sizeof(dev->serial));
+ dev->start_requested = 0;
+ dev->audio_mode = 0;
+ }
+ }
+
+ if (value >= 0) {
+ cdev->req->zero = 0;
+ cdev->req->length = value;
+ value = usb_ep_queue(cdev->gadget->ep0, cdev->req, GFP_ATOMIC);
+ if (value < 0)
+ ERROR(cdev, "%s setup response queue error\n",
+ __func__);
+ }
+
+err:
+ if (value == -EOPNOTSUPP)
+ VDBG(cdev,
+ "unknown class-specific control req "
+ "%02x.%02x v%04x i%04x l%u\n",
+ ctrl->bRequestType, ctrl->bRequest,
+ w_value, w_index, w_length);
+ return value;
+}
+EXPORT_SYMBOL_GPL(acc_ctrlrequest);
+
+static int
+__acc_function_bind(struct usb_configuration *c,
+ struct usb_function *f, bool configfs)
+{
+ struct usb_composite_dev *cdev = c->cdev;
+ struct acc_dev *dev = func_to_dev(f);
+ int id;
+ int ret;
+
+ DBG(cdev, "acc_function_bind dev: %p\n", dev);
+
+ if (configfs) {
+ if (acc_string_defs[INTERFACE_STRING_INDEX].id == 0) {
+ ret = usb_string_id(c->cdev);
+ if (ret < 0)
+ return ret;
+ acc_string_defs[INTERFACE_STRING_INDEX].id = ret;
+ acc_interface_desc.iInterface = ret;
+ }
+ dev->cdev = c->cdev;
+ }
+ ret = hid_register_driver(&acc_hid_driver);
+ if (ret)
+ return ret;
+
+ dev->start_requested = 0;
+
+ /* allocate interface ID(s) */
+ id = usb_interface_id(c, f);
+ if (id < 0)
+ return id;
+ acc_interface_desc.bInterfaceNumber = id;
+
+ /* allocate endpoints */
+ ret = create_bulk_endpoints(dev, &acc_fullspeed_in_desc,
+ &acc_fullspeed_out_desc);
+ if (ret)
+ return ret;
+
+ /* support high speed hardware */
+ if (gadget_is_dualspeed(c->cdev->gadget)) {
+ acc_highspeed_in_desc.bEndpointAddress =
+ acc_fullspeed_in_desc.bEndpointAddress;
+ acc_highspeed_out_desc.bEndpointAddress =
+ acc_fullspeed_out_desc.bEndpointAddress;
+ }
+
+ DBG(cdev, "%s speed %s: IN/%s, OUT/%s\n",
+ gadget_is_dualspeed(c->cdev->gadget) ? "dual" : "full",
+ f->name, dev->ep_in->name, dev->ep_out->name);
+ return 0;
+}
+
+static int
+acc_function_bind_configfs(struct usb_configuration *c,
+ struct usb_function *f) {
+ return __acc_function_bind(c, f, true);
+}
+
+static void
+kill_all_hid_devices(struct acc_dev *dev)
+{
+ struct acc_hid_dev *hid;
+ struct list_head *entry, *temp;
+ unsigned long flags;
+
+ /* do nothing if usb accessory device doesn't exist */
+ if (!dev)
+ return;
+
+ spin_lock_irqsave(&dev->lock, flags);
+ list_for_each_safe(entry, temp, &dev->hid_list) {
+ hid = list_entry(entry, struct acc_hid_dev, list);
+ list_del(&hid->list);
+ list_add(&hid->list, &dev->dead_hid_list);
+ }
+ list_for_each_safe(entry, temp, &dev->new_hid_list) {
+ hid = list_entry(entry, struct acc_hid_dev, list);
+ list_del(&hid->list);
+ list_add(&hid->list, &dev->dead_hid_list);
+ }
+ spin_unlock_irqrestore(&dev->lock, flags);
+
+ schedule_work(&dev->hid_work);
+}
+
+static void
+acc_hid_unbind(struct acc_dev *dev)
+{
+ hid_unregister_driver(&acc_hid_driver);
+ kill_all_hid_devices(dev);
+}
+
+static void
+acc_function_unbind(struct usb_configuration *c, struct usb_function *f)
+{
+ struct acc_dev *dev = func_to_dev(f);
+ struct usb_request *req;
+ int i;
+
+ dev->online = 0; /* clear online flag */
+ wake_up(&dev->read_wq); /* unblock reads on closure */
+ wake_up(&dev->write_wq); /* likewise for writes */
+
+ while ((req = req_get(dev, &dev->tx_idle)))
+ acc_request_free(req, dev->ep_in);
+ for (i = 0; i < RX_REQ_MAX; i++)
+ acc_request_free(dev->rx_req[i], dev->ep_out);
+
+ acc_hid_unbind(dev);
+}
+
+static void acc_start_work(struct work_struct *data)
+{
+ char *envp[2] = { "ACCESSORY=START", NULL };
+
+ kobject_uevent_env(&acc_device.this_device->kobj, KOBJ_CHANGE, envp);
+}
+
+static int acc_hid_init(struct acc_hid_dev *hdev)
+{
+ struct hid_device *hid;
+ int ret;
+
+ hid = hid_allocate_device();
+ if (IS_ERR(hid))
+ return PTR_ERR(hid);
+
+ hid->ll_driver = &acc_hid_ll_driver;
+ hid->dev.parent = acc_device.this_device;
+
+ hid->bus = BUS_USB;
+ hid->vendor = HID_ANY_ID;
+ hid->product = HID_ANY_ID;
+ hid->driver_data = hdev;
+ ret = hid_add_device(hid);
+ if (ret) {
+ pr_err("can't add hid device: %d\n", ret);
+ hid_destroy_device(hid);
+ return ret;
+ }
+
+ hdev->hid = hid;
+ return 0;
+}
+
+static void acc_hid_delete(struct acc_hid_dev *hid)
+{
+ kfree(hid->report_desc);
+ kfree(hid);
+}
+
+static void acc_hid_work(struct work_struct *data)
+{
+ struct acc_dev *dev = _acc_dev;
+ struct list_head *entry, *temp;
+ struct acc_hid_dev *hid;
+ struct list_head new_list, dead_list;
+ unsigned long flags;
+
+ INIT_LIST_HEAD(&new_list);
+
+ spin_lock_irqsave(&dev->lock, flags);
+
+ /* copy hids that are ready for initialization to new_list */
+ list_for_each_safe(entry, temp, &dev->new_hid_list) {
+ hid = list_entry(entry, struct acc_hid_dev, list);
+ if (hid->report_desc_offset == hid->report_desc_len)
+ list_move(&hid->list, &new_list);
+ }
+
+ if (list_empty(&dev->dead_hid_list)) {
+ INIT_LIST_HEAD(&dead_list);
+ } else {
+ /* move all of dev->dead_hid_list to dead_list */
+ dead_list.prev = dev->dead_hid_list.prev;
+ dead_list.next = dev->dead_hid_list.next;
+ dead_list.next->prev = &dead_list;
+ dead_list.prev->next = &dead_list;
+ INIT_LIST_HEAD(&dev->dead_hid_list);
+ }
+
+ spin_unlock_irqrestore(&dev->lock, flags);
+
+ /* register new HID devices */
+ list_for_each_safe(entry, temp, &new_list) {
+ hid = list_entry(entry, struct acc_hid_dev, list);
+ if (acc_hid_init(hid)) {
+ pr_err("can't add HID device %p\n", hid);
+ acc_hid_delete(hid);
+ } else {
+ spin_lock_irqsave(&dev->lock, flags);
+ list_move(&hid->list, &dev->hid_list);
+ spin_unlock_irqrestore(&dev->lock, flags);
+ }
+ }
+
+ /* remove dead HID devices */
+ list_for_each_safe(entry, temp, &dead_list) {
+ hid = list_entry(entry, struct acc_hid_dev, list);
+ list_del(&hid->list);
+ if (hid->hid)
+ hid_destroy_device(hid->hid);
+ acc_hid_delete(hid);
+ }
+}
+
+static int acc_function_set_alt(struct usb_function *f,
+ unsigned intf, unsigned alt)
+{
+ struct acc_dev *dev = func_to_dev(f);
+ struct usb_composite_dev *cdev = f->config->cdev;
+ int ret;
+
+ DBG(cdev, "acc_function_set_alt intf: %d alt: %d\n", intf, alt);
+
+ ret = config_ep_by_speed(cdev->gadget, f, dev->ep_in);
+ if (ret)
+ return ret;
+
+ ret = usb_ep_enable(dev->ep_in);
+ if (ret)
+ return ret;
+
+ ret = config_ep_by_speed(cdev->gadget, f, dev->ep_out);
+ if (ret)
+ return ret;
+
+ ret = usb_ep_enable(dev->ep_out);
+ if (ret) {
+ usb_ep_disable(dev->ep_in);
+ return ret;
+ }
+
+ dev->online = 1;
+ dev->disconnected = 0; /* if online then not disconnected */
+
+ /* readers may be blocked waiting for us to go online */
+ wake_up(&dev->read_wq);
+ return 0;
+}
+
+static void acc_function_disable(struct usb_function *f)
+{
+ struct acc_dev *dev = func_to_dev(f);
+ struct usb_composite_dev *cdev = dev->cdev;
+
+ DBG(cdev, "acc_function_disable\n");
+ acc_set_disconnected(dev); /* this now only sets disconnected */
+ dev->online = 0; /* so now need to clear online flag here too */
+ usb_ep_disable(dev->ep_in);
+ usb_ep_disable(dev->ep_out);
+
+ /* readers may be blocked waiting for us to go online */
+ wake_up(&dev->read_wq);
+
+ VDBG(cdev, "%s disabled\n", dev->function.name);
+}
+
+static int acc_setup(void)
+{
+ struct acc_dev *dev;
+ int ret;
+
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev)
+ return -ENOMEM;
+
+ spin_lock_init(&dev->lock);
+ init_waitqueue_head(&dev->read_wq);
+ init_waitqueue_head(&dev->write_wq);
+ atomic_set(&dev->open_excl, 0);
+ INIT_LIST_HEAD(&dev->tx_idle);
+ INIT_LIST_HEAD(&dev->hid_list);
+ INIT_LIST_HEAD(&dev->new_hid_list);
+ INIT_LIST_HEAD(&dev->dead_hid_list);
+ INIT_DELAYED_WORK(&dev->start_work, acc_start_work);
+ INIT_WORK(&dev->hid_work, acc_hid_work);
+
+ /* _acc_dev must be set before calling usb_gadget_register_driver */
+ _acc_dev = dev;
+
+ ret = misc_register(&acc_device);
+ if (ret)
+ goto err;
+
+ return 0;
+
+err:
+ kfree(dev);
+ pr_err("USB accessory gadget driver failed to initialize\n");
+ return ret;
+}
+
+void acc_disconnect(void)
+{
+ /* unregister all HID devices if USB is disconnected */
+ kill_all_hid_devices(_acc_dev);
+}
+EXPORT_SYMBOL_GPL(acc_disconnect);
+
+static void acc_cleanup(void)
+{
+ misc_deregister(&acc_device);
+ kfree(_acc_dev);
+ _acc_dev = NULL;
+}
+static struct acc_instance *to_acc_instance(struct config_item *item)
+{
+ return container_of(to_config_group(item), struct acc_instance,
+ func_inst.group);
+}
+
+static void acc_attr_release(struct config_item *item)
+{
+ struct acc_instance *fi_acc = to_acc_instance(item);
+
+ usb_put_function_instance(&fi_acc->func_inst);
+}
+
+static struct configfs_item_operations acc_item_ops = {
+ .release = acc_attr_release,
+};
+
+static struct config_item_type acc_func_type = {
+ .ct_item_ops = &acc_item_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+static struct acc_instance *to_fi_acc(struct usb_function_instance *fi)
+{
+ return container_of(fi, struct acc_instance, func_inst);
+}
+
+static int acc_set_inst_name(struct usb_function_instance *fi, const char *name)
+{
+ struct acc_instance *fi_acc;
+ char *ptr;
+ int name_len;
+
+ name_len = strlen(name) + 1;
+ if (name_len > MAX_INST_NAME_LEN)
+ return -ENAMETOOLONG;
+
+ ptr = kstrndup(name, name_len, GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
+
+ fi_acc = to_fi_acc(fi);
+ fi_acc->name = ptr;
+ return 0;
+}
+
+static void acc_free_inst(struct usb_function_instance *fi)
+{
+ struct acc_instance *fi_acc;
+
+ fi_acc = to_fi_acc(fi);
+ kfree(fi_acc->name);
+ acc_cleanup();
+}
+
+static struct usb_function_instance *acc_alloc_inst(void)
+{
+ struct acc_instance *fi_acc;
+ struct acc_dev *dev;
+ int err;
+
+ fi_acc = kzalloc(sizeof(*fi_acc), GFP_KERNEL);
+ if (!fi_acc)
+ return ERR_PTR(-ENOMEM);
+ fi_acc->func_inst.set_inst_name = acc_set_inst_name;
+ fi_acc->func_inst.free_func_inst = acc_free_inst;
+
+ err = acc_setup();
+ if (err) {
+ kfree(fi_acc);
+ pr_err("Error setting ACCESSORY\n");
+ return ERR_PTR(err);
+ }
+
+ config_group_init_type_name(&fi_acc->func_inst.group,
+ "", &acc_func_type);
+ dev = _acc_dev;
+ return &fi_acc->func_inst;
+}
+
+static void acc_free(struct usb_function *f)
+{
+/*NO-OP: no function specific resource allocation in mtp_alloc*/
+}
+
+int acc_ctrlrequest_configfs(struct usb_function *f,
+ const struct usb_ctrlrequest *ctrl) {
+ if (f->config != NULL && f->config->cdev != NULL)
+ return acc_ctrlrequest(f->config->cdev, ctrl);
+ else
+ return -1;
+}
+
+static struct usb_function *acc_alloc(struct usb_function_instance *fi)
+{
+ struct acc_dev *dev = _acc_dev;
+
+ pr_info("acc_alloc\n");
+
+ dev->function.name = "accessory";
+ dev->function.strings = acc_strings,
+ dev->function.fs_descriptors = fs_acc_descs;
+ dev->function.hs_descriptors = hs_acc_descs;
+ dev->function.bind = acc_function_bind_configfs;
+ dev->function.unbind = acc_function_unbind;
+ dev->function.set_alt = acc_function_set_alt;
+ dev->function.disable = acc_function_disable;
+ dev->function.free_func = acc_free;
+ dev->function.setup = acc_ctrlrequest_configfs;
+
+ return &dev->function;
+}
+DECLARE_USB_FUNCTION_INIT(accessory, acc_alloc_inst, acc_alloc);
+MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_audio_source.c b/drivers/usb/gadget/function/f_audio_source.c
new file mode 100644
index 0000000..8124af3
--- /dev/null
+++ b/drivers/usb/gadget/function/f_audio_source.c
@@ -0,0 +1,1071 @@
+/*
+ * Gadget Function Driver for USB audio source device
+ *
+ * Copyright (C) 2012 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/device.h>
+#include <linux/usb/audio.h>
+#include <linux/wait.h>
+#include <linux/pm_qos.h>
+#include <sound/core.h>
+#include <sound/initval.h>
+#include <sound/pcm.h>
+
+#include <linux/usb.h>
+#include <linux/usb_usual.h>
+#include <linux/usb/ch9.h>
+#include <linux/configfs.h>
+#include <linux/usb/composite.h>
+#include <linux/module.h>
+#include <linux/moduleparam.h>
+#define SAMPLE_RATE 44100
+#define FRAMES_PER_MSEC (SAMPLE_RATE / 1000)
+
+#define IN_EP_MAX_PACKET_SIZE 256
+
+/* Number of requests to allocate */
+#define IN_EP_REQ_COUNT 4
+
+#define AUDIO_AC_INTERFACE 0
+#define AUDIO_AS_INTERFACE 1
+#define AUDIO_NUM_INTERFACES 2
+#define MAX_INST_NAME_LEN 40
+
+/* B.3.1 Standard AC Interface Descriptor */
+static struct usb_interface_descriptor ac_interface_desc = {
+ .bLength = USB_DT_INTERFACE_SIZE,
+ .bDescriptorType = USB_DT_INTERFACE,
+ .bNumEndpoints = 0,
+ .bInterfaceClass = USB_CLASS_AUDIO,
+ .bInterfaceSubClass = USB_SUBCLASS_AUDIOCONTROL,
+};
+
+DECLARE_UAC_AC_HEADER_DESCRIPTOR(2);
+
+#define UAC_DT_AC_HEADER_LENGTH UAC_DT_AC_HEADER_SIZE(AUDIO_NUM_INTERFACES)
+/* 1 input terminal, 1 output terminal and 1 feature unit */
+#define UAC_DT_TOTAL_LENGTH (UAC_DT_AC_HEADER_LENGTH \
+ + UAC_DT_INPUT_TERMINAL_SIZE + UAC_DT_OUTPUT_TERMINAL_SIZE \
+ + UAC_DT_FEATURE_UNIT_SIZE(0))
+/* B.3.2 Class-Specific AC Interface Descriptor */
+static struct uac1_ac_header_descriptor_2 ac_header_desc = {
+ .bLength = UAC_DT_AC_HEADER_LENGTH,
+ .bDescriptorType = USB_DT_CS_INTERFACE,
+ .bDescriptorSubtype = UAC_HEADER,
+ .bcdADC = __constant_cpu_to_le16(0x0100),
+ .wTotalLength = __constant_cpu_to_le16(UAC_DT_TOTAL_LENGTH),
+ .bInCollection = AUDIO_NUM_INTERFACES,
+ .baInterfaceNr = {
+ [0] = AUDIO_AC_INTERFACE,
+ [1] = AUDIO_AS_INTERFACE,
+ }
+};
+
+#define INPUT_TERMINAL_ID 1
+static struct uac_input_terminal_descriptor input_terminal_desc = {
+ .bLength = UAC_DT_INPUT_TERMINAL_SIZE,
+ .bDescriptorType = USB_DT_CS_INTERFACE,
+ .bDescriptorSubtype = UAC_INPUT_TERMINAL,
+ .bTerminalID = INPUT_TERMINAL_ID,
+ .wTerminalType = UAC_INPUT_TERMINAL_MICROPHONE,
+ .bAssocTerminal = 0,
+ .wChannelConfig = 0x3,
+};
+
+DECLARE_UAC_FEATURE_UNIT_DESCRIPTOR(0);
+
+#define FEATURE_UNIT_ID 2
+static struct uac_feature_unit_descriptor_0 feature_unit_desc = {
+ .bLength = UAC_DT_FEATURE_UNIT_SIZE(0),
+ .bDescriptorType = USB_DT_CS_INTERFACE,
+ .bDescriptorSubtype = UAC_FEATURE_UNIT,
+ .bUnitID = FEATURE_UNIT_ID,
+ .bSourceID = INPUT_TERMINAL_ID,
+ .bControlSize = 2,
+};
+
+#define OUTPUT_TERMINAL_ID 3
+static struct uac1_output_terminal_descriptor output_terminal_desc = {
+ .bLength = UAC_DT_OUTPUT_TERMINAL_SIZE,
+ .bDescriptorType = USB_DT_CS_INTERFACE,
+ .bDescriptorSubtype = UAC_OUTPUT_TERMINAL,
+ .bTerminalID = OUTPUT_TERMINAL_ID,
+ .wTerminalType = UAC_TERMINAL_STREAMING,
+ .bAssocTerminal = FEATURE_UNIT_ID,
+ .bSourceID = FEATURE_UNIT_ID,
+};
+
+/* B.4.1 Standard AS Interface Descriptor */
+static struct usb_interface_descriptor as_interface_alt_0_desc = {
+ .bLength = USB_DT_INTERFACE_SIZE,
+ .bDescriptorType = USB_DT_INTERFACE,
+ .bAlternateSetting = 0,
+ .bNumEndpoints = 0,
+ .bInterfaceClass = USB_CLASS_AUDIO,
+ .bInterfaceSubClass = USB_SUBCLASS_AUDIOSTREAMING,
+};
+
+static struct usb_interface_descriptor as_interface_alt_1_desc = {
+ .bLength = USB_DT_INTERFACE_SIZE,
+ .bDescriptorType = USB_DT_INTERFACE,
+ .bAlternateSetting = 1,
+ .bNumEndpoints = 1,
+ .bInterfaceClass = USB_CLASS_AUDIO,
+ .bInterfaceSubClass = USB_SUBCLASS_AUDIOSTREAMING,
+};
+
+/* B.4.2 Class-Specific AS Interface Descriptor */
+static struct uac1_as_header_descriptor as_header_desc = {
+ .bLength = UAC_DT_AS_HEADER_SIZE,
+ .bDescriptorType = USB_DT_CS_INTERFACE,
+ .bDescriptorSubtype = UAC_AS_GENERAL,
+ .bTerminalLink = INPUT_TERMINAL_ID,
+ .bDelay = 1,
+ .wFormatTag = UAC_FORMAT_TYPE_I_PCM,
+};
+
+DECLARE_UAC_FORMAT_TYPE_I_DISCRETE_DESC(1);
+
+static struct uac_format_type_i_discrete_descriptor_1 as_type_i_desc = {
+ .bLength = UAC_FORMAT_TYPE_I_DISCRETE_DESC_SIZE(1),
+ .bDescriptorType = USB_DT_CS_INTERFACE,
+ .bDescriptorSubtype = UAC_FORMAT_TYPE,
+ .bFormatType = UAC_FORMAT_TYPE_I,
+ .bSubframeSize = 2,
+ .bBitResolution = 16,
+ .bSamFreqType = 1,
+};
+
+/* Standard ISO IN Endpoint Descriptor for highspeed */
+static struct usb_endpoint_descriptor hs_as_in_ep_desc = {
+ .bLength = USB_DT_ENDPOINT_AUDIO_SIZE,
+ .bDescriptorType = USB_DT_ENDPOINT,
+ .bEndpointAddress = USB_DIR_IN,
+ .bmAttributes = USB_ENDPOINT_SYNC_SYNC
+ | USB_ENDPOINT_XFER_ISOC,
+ .wMaxPacketSize = __constant_cpu_to_le16(IN_EP_MAX_PACKET_SIZE),
+ .bInterval = 4, /* poll 1 per millisecond */
+};
+
+/* Standard ISO IN Endpoint Descriptor for highspeed */
+static struct usb_endpoint_descriptor fs_as_in_ep_desc = {
+ .bLength = USB_DT_ENDPOINT_AUDIO_SIZE,
+ .bDescriptorType = USB_DT_ENDPOINT,
+ .bEndpointAddress = USB_DIR_IN,
+ .bmAttributes = USB_ENDPOINT_SYNC_SYNC
+ | USB_ENDPOINT_XFER_ISOC,
+ .wMaxPacketSize = __constant_cpu_to_le16(IN_EP_MAX_PACKET_SIZE),
+ .bInterval = 1, /* poll 1 per millisecond */
+};
+
+/* Class-specific AS ISO OUT Endpoint Descriptor */
+static struct uac_iso_endpoint_descriptor as_iso_in_desc = {
+ .bLength = UAC_ISO_ENDPOINT_DESC_SIZE,
+ .bDescriptorType = USB_DT_CS_ENDPOINT,
+ .bDescriptorSubtype = UAC_EP_GENERAL,
+ .bmAttributes = 1,
+ .bLockDelayUnits = 1,
+ .wLockDelay = __constant_cpu_to_le16(1),
+};
+
+static struct usb_descriptor_header *hs_audio_desc[] = {
+ (struct usb_descriptor_header *)&ac_interface_desc,
+ (struct usb_descriptor_header *)&ac_header_desc,
+
+ (struct usb_descriptor_header *)&input_terminal_desc,
+ (struct usb_descriptor_header *)&output_terminal_desc,
+ (struct usb_descriptor_header *)&feature_unit_desc,
+
+ (struct usb_descriptor_header *)&as_interface_alt_0_desc,
+ (struct usb_descriptor_header *)&as_interface_alt_1_desc,
+ (struct usb_descriptor_header *)&as_header_desc,
+
+ (struct usb_descriptor_header *)&as_type_i_desc,
+
+ (struct usb_descriptor_header *)&hs_as_in_ep_desc,
+ (struct usb_descriptor_header *)&as_iso_in_desc,
+ NULL,
+};
+
+static struct usb_descriptor_header *fs_audio_desc[] = {
+ (struct usb_descriptor_header *)&ac_interface_desc,
+ (struct usb_descriptor_header *)&ac_header_desc,
+
+ (struct usb_descriptor_header *)&input_terminal_desc,
+ (struct usb_descriptor_header *)&output_terminal_desc,
+ (struct usb_descriptor_header *)&feature_unit_desc,
+
+ (struct usb_descriptor_header *)&as_interface_alt_0_desc,
+ (struct usb_descriptor_header *)&as_interface_alt_1_desc,
+ (struct usb_descriptor_header *)&as_header_desc,
+
+ (struct usb_descriptor_header *)&as_type_i_desc,
+
+ (struct usb_descriptor_header *)&fs_as_in_ep_desc,
+ (struct usb_descriptor_header *)&as_iso_in_desc,
+ NULL,
+};
+
+static struct snd_pcm_hardware audio_hw_info = {
+ .info = SNDRV_PCM_INFO_MMAP |
+ SNDRV_PCM_INFO_MMAP_VALID |
+ SNDRV_PCM_INFO_BATCH |
+ SNDRV_PCM_INFO_INTERLEAVED |
+ SNDRV_PCM_INFO_BLOCK_TRANSFER,
+
+ .formats = SNDRV_PCM_FMTBIT_S16_LE,
+ .channels_min = 2,
+ .channels_max = 2,
+ .rate_min = SAMPLE_RATE,
+ .rate_max = SAMPLE_RATE,
+
+ .buffer_bytes_max = 1024 * 1024,
+ .period_bytes_min = 64,
+ .period_bytes_max = 512 * 1024,
+ .periods_min = 2,
+ .periods_max = 1024,
+};
+
+/*-------------------------------------------------------------------------*/
+
+struct audio_source_config {
+ int card;
+ int device;
+};
+
+struct audio_dev {
+ struct usb_function func;
+ struct snd_card *card;
+ struct snd_pcm *pcm;
+ struct snd_pcm_substream *substream;
+
+ struct list_head idle_reqs;
+ struct usb_ep *in_ep;
+
+ spinlock_t lock;
+
+ /* beginning, end and current position in our buffer */
+ void *buffer_start;
+ void *buffer_end;
+ void *buffer_pos;
+
+ /* byte size of a "period" */
+ unsigned int period;
+ /* bytes sent since last call to snd_pcm_period_elapsed */
+ unsigned int period_offset;
+ /* time we started playing */
+ ktime_t start_time;
+ /* number of frames sent since start_time */
+ s64 frames_sent;
+ struct audio_source_config *config;
+ /* for creating and issuing QoS requests */
+ struct pm_qos_request pm_qos;
+};
+
+static inline struct audio_dev *func_to_audio(struct usb_function *f)
+{
+ return container_of(f, struct audio_dev, func);
+}
+
+/*-------------------------------------------------------------------------*/
+
+struct audio_source_instance {
+ struct usb_function_instance func_inst;
+ const char *name;
+ struct audio_source_config *config;
+ struct device *audio_device;
+};
+
+static void audio_source_attr_release(struct config_item *item);
+
+static struct configfs_item_operations audio_source_item_ops = {
+ .release = audio_source_attr_release,
+};
+
+static struct config_item_type audio_source_func_type = {
+ .ct_item_ops = &audio_source_item_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+static ssize_t audio_source_pcm_show(struct device *dev,
+ struct device_attribute *attr, char *buf);
+
+static DEVICE_ATTR(pcm, S_IRUGO, audio_source_pcm_show, NULL);
+
+static struct device_attribute *audio_source_function_attributes[] = {
+ &dev_attr_pcm,
+ NULL
+};
+
+/*--------------------------------------------------------------------------*/
+
+static struct usb_request *audio_request_new(struct usb_ep *ep, int buffer_size)
+{
+ struct usb_request *req = usb_ep_alloc_request(ep, GFP_KERNEL);
+
+ if (!req)
+ return NULL;
+
+ req->buf = kmalloc(buffer_size, GFP_KERNEL);
+ if (!req->buf) {
+ usb_ep_free_request(ep, req);
+ return NULL;
+ }
+ req->length = buffer_size;
+ return req;
+}
+
+static void audio_request_free(struct usb_request *req, struct usb_ep *ep)
+{
+ if (req) {
+ kfree(req->buf);
+ usb_ep_free_request(ep, req);
+ }
+}
+
+static void audio_req_put(struct audio_dev *audio, struct usb_request *req)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&audio->lock, flags);
+ list_add_tail(&req->list, &audio->idle_reqs);
+ spin_unlock_irqrestore(&audio->lock, flags);
+}
+
+static struct usb_request *audio_req_get(struct audio_dev *audio)
+{
+ unsigned long flags;
+ struct usb_request *req;
+
+ spin_lock_irqsave(&audio->lock, flags);
+ if (list_empty(&audio->idle_reqs)) {
+ req = 0;
+ } else {
+ req = list_first_entry(&audio->idle_reqs, struct usb_request,
+ list);
+ list_del(&req->list);
+ }
+ spin_unlock_irqrestore(&audio->lock, flags);
+ return req;
+}
+
+/* send the appropriate number of packets to match our bitrate */
+static void audio_send(struct audio_dev *audio)
+{
+ struct snd_pcm_runtime *runtime;
+ struct usb_request *req;
+ int length, length1, length2, ret;
+ s64 msecs;
+ s64 frames;
+ ktime_t now;
+
+ /* audio->substream will be null if we have been closed */
+ if (!audio->substream)
+ return;
+ /* audio->buffer_pos will be null if we have been stopped */
+ if (!audio->buffer_pos)
+ return;
+
+ runtime = audio->substream->runtime;
+
+ /* compute number of frames to send */
+ now = ktime_get();
+ msecs = div_s64((ktime_to_ns(now) - ktime_to_ns(audio->start_time)),
+ 1000000);
+ frames = div_s64((msecs * SAMPLE_RATE), 1000);
+
+ /* Readjust our frames_sent if we fall too far behind.
+ * If we get too far behind it is better to drop some frames than
+ * to keep sending data too fast in an attempt to catch up.
+ */
+ if (frames - audio->frames_sent > 10 * FRAMES_PER_MSEC)
+ audio->frames_sent = frames - FRAMES_PER_MSEC;
+
+ frames -= audio->frames_sent;
+
+ /* We need to send something to keep the pipeline going */
+ if (frames <= 0)
+ frames = FRAMES_PER_MSEC;
+
+ while (frames > 0) {
+ req = audio_req_get(audio);
+ if (!req)
+ break;
+
+ length = frames_to_bytes(runtime, frames);
+ if (length > IN_EP_MAX_PACKET_SIZE)
+ length = IN_EP_MAX_PACKET_SIZE;
+
+ if (audio->buffer_pos + length > audio->buffer_end)
+ length1 = audio->buffer_end - audio->buffer_pos;
+ else
+ length1 = length;
+ memcpy(req->buf, audio->buffer_pos, length1);
+ if (length1 < length) {
+ /* Wrap around and copy remaining length
+ * at beginning of buffer.
+ */
+ length2 = length - length1;
+ memcpy(req->buf + length1, audio->buffer_start,
+ length2);
+ audio->buffer_pos = audio->buffer_start + length2;
+ } else {
+ audio->buffer_pos += length1;
+ if (audio->buffer_pos >= audio->buffer_end)
+ audio->buffer_pos = audio->buffer_start;
+ }
+
+ req->length = length;
+ ret = usb_ep_queue(audio->in_ep, req, GFP_ATOMIC);
+ if (ret < 0) {
+ pr_err("usb_ep_queue failed ret: %d\n", ret);
+ audio_req_put(audio, req);
+ break;
+ }
+
+ frames -= bytes_to_frames(runtime, length);
+ audio->frames_sent += bytes_to_frames(runtime, length);
+ }
+}
+
+static void audio_control_complete(struct usb_ep *ep, struct usb_request *req)
+{
+ /* nothing to do here */
+}
+
+static void audio_data_complete(struct usb_ep *ep, struct usb_request *req)
+{
+ struct audio_dev *audio = req->context;
+
+ pr_debug("audio_data_complete req->status %d req->actual %d\n",
+ req->status, req->actual);
+
+ audio_req_put(audio, req);
+
+ if (!audio->buffer_start || req->status)
+ return;
+
+ audio->period_offset += req->actual;
+ if (audio->period_offset >= audio->period) {
+ snd_pcm_period_elapsed(audio->substream);
+ audio->period_offset = 0;
+ }
+ audio_send(audio);
+}
+
+static int audio_set_endpoint_req(struct usb_function *f,
+ const struct usb_ctrlrequest *ctrl)
+{
+ int value = -EOPNOTSUPP;
+ u16 ep = le16_to_cpu(ctrl->wIndex);
+ u16 len = le16_to_cpu(ctrl->wLength);
+ u16 w_value = le16_to_cpu(ctrl->wValue);
+
+ pr_debug("bRequest 0x%x, w_value 0x%04x, len %d, endpoint %d\n",
+ ctrl->bRequest, w_value, len, ep);
+
+ switch (ctrl->bRequest) {
+ case UAC_SET_CUR:
+ case UAC_SET_MIN:
+ case UAC_SET_MAX:
+ case UAC_SET_RES:
+ value = len;
+ break;
+ default:
+ break;
+ }
+
+ return value;
+}
+
+static int audio_get_endpoint_req(struct usb_function *f,
+ const struct usb_ctrlrequest *ctrl)
+{
+ struct usb_composite_dev *cdev = f->config->cdev;
+ int value = -EOPNOTSUPP;
+ u8 ep = ((le16_to_cpu(ctrl->wIndex) >> 8) & 0xFF);
+ u16 len = le16_to_cpu(ctrl->wLength);
+ u16 w_value = le16_to_cpu(ctrl->wValue);
+ u8 *buf = cdev->req->buf;
+
+ pr_debug("bRequest 0x%x, w_value 0x%04x, len %d, endpoint %d\n",
+ ctrl->bRequest, w_value, len, ep);
+
+ if (w_value == UAC_EP_CS_ATTR_SAMPLE_RATE << 8) {
+ switch (ctrl->bRequest) {
+ case UAC_GET_CUR:
+ case UAC_GET_MIN:
+ case UAC_GET_MAX:
+ case UAC_GET_RES:
+ /* return our sample rate */
+ buf[0] = (u8)SAMPLE_RATE;
+ buf[1] = (u8)(SAMPLE_RATE >> 8);
+ buf[2] = (u8)(SAMPLE_RATE >> 16);
+ value = 3;
+ break;
+ default:
+ break;
+ }
+ }
+
+ return value;
+}
+
+static int
+audio_setup(struct usb_function *f, const struct usb_ctrlrequest *ctrl)
+{
+ struct usb_composite_dev *cdev = f->config->cdev;
+ struct usb_request *req = cdev->req;
+ int value = -EOPNOTSUPP;
+ u16 w_index = le16_to_cpu(ctrl->wIndex);
+ u16 w_value = le16_to_cpu(ctrl->wValue);
+ u16 w_length = le16_to_cpu(ctrl->wLength);
+
+ /* composite driver infrastructure handles everything; interface
+ * activation uses set_alt().
+ */
+ switch (ctrl->bRequestType) {
+ case USB_DIR_OUT | USB_TYPE_CLASS | USB_RECIP_ENDPOINT:
+ value = audio_set_endpoint_req(f, ctrl);
+ break;
+
+ case USB_DIR_IN | USB_TYPE_CLASS | USB_RECIP_ENDPOINT:
+ value = audio_get_endpoint_req(f, ctrl);
+ break;
+ }
+
+ /* respond with data transfer or status phase? */
+ if (value >= 0) {
+ pr_debug("audio req%02x.%02x v%04x i%04x l%d\n",
+ ctrl->bRequestType, ctrl->bRequest,
+ w_value, w_index, w_length);
+ req->zero = 0;
+ req->length = value;
+ req->complete = audio_control_complete;
+ value = usb_ep_queue(cdev->gadget->ep0, req, GFP_ATOMIC);
+ if (value < 0)
+ pr_err("audio response on err %d\n", value);
+ }
+
+ /* device either stalls (value < 0) or reports success */
+ return value;
+}
+
+static int audio_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
+{
+ struct audio_dev *audio = func_to_audio(f);
+ struct usb_composite_dev *cdev = f->config->cdev;
+ int ret;
+
+ pr_debug("audio_set_alt intf %d, alt %d\n", intf, alt);
+
+ ret = config_ep_by_speed(cdev->gadget, f, audio->in_ep);
+ if (ret)
+ return ret;
+
+ usb_ep_enable(audio->in_ep);
+ return 0;
+}
+
+static void audio_disable(struct usb_function *f)
+{
+ struct audio_dev *audio = func_to_audio(f);
+
+ pr_debug("audio_disable\n");
+ usb_ep_disable(audio->in_ep);
+}
+
+static void audio_free_func(struct usb_function *f)
+{
+ /* no-op */
+}
+
+/*-------------------------------------------------------------------------*/
+
+static void audio_build_desc(struct audio_dev *audio)
+{
+ u8 *sam_freq;
+ int rate;
+
+ /* Set channel numbers */
+ input_terminal_desc.bNrChannels = 2;
+ as_type_i_desc.bNrChannels = 2;
+
+ /* Set sample rates */
+ rate = SAMPLE_RATE;
+ sam_freq = as_type_i_desc.tSamFreq[0];
+ memcpy(sam_freq, &rate, 3);
+}
+
+
+static int snd_card_setup(struct usb_configuration *c,
+ struct audio_source_config *config);
+static struct audio_source_instance *to_fi_audio_source(
+ const struct usb_function_instance *fi);
+
+
+/* audio function driver setup/binding */
+static int
+audio_bind(struct usb_configuration *c, struct usb_function *f)
+{
+ struct usb_composite_dev *cdev = c->cdev;
+ struct audio_dev *audio = func_to_audio(f);
+ int status;
+ struct usb_ep *ep;
+ struct usb_request *req;
+ int i;
+ int err;
+
+ if (IS_ENABLED(CONFIG_USB_CONFIGFS)) {
+ struct audio_source_instance *fi_audio =
+ to_fi_audio_source(f->fi);
+ struct audio_source_config *config =
+ fi_audio->config;
+
+ err = snd_card_setup(c, config);
+ if (err)
+ return err;
+ }
+
+ audio_build_desc(audio);
+
+ /* allocate instance-specific interface IDs, and patch descriptors */
+ status = usb_interface_id(c, f);
+ if (status < 0)
+ goto fail;
+ ac_interface_desc.bInterfaceNumber = status;
+
+ /* AUDIO_AC_INTERFACE */
+ ac_header_desc.baInterfaceNr[0] = status;
+
+ status = usb_interface_id(c, f);
+ if (status < 0)
+ goto fail;
+ as_interface_alt_0_desc.bInterfaceNumber = status;
+ as_interface_alt_1_desc.bInterfaceNumber = status;
+
+ /* AUDIO_AS_INTERFACE */
+ ac_header_desc.baInterfaceNr[1] = status;
+
+ status = -ENODEV;
+
+ /* allocate our endpoint */
+ ep = usb_ep_autoconfig(cdev->gadget, &fs_as_in_ep_desc);
+ if (!ep)
+ goto fail;
+ audio->in_ep = ep;
+ ep->driver_data = audio; /* claim */
+
+ if (gadget_is_dualspeed(c->cdev->gadget))
+ hs_as_in_ep_desc.bEndpointAddress =
+ fs_as_in_ep_desc.bEndpointAddress;
+
+ f->fs_descriptors = fs_audio_desc;
+ f->hs_descriptors = hs_audio_desc;
+
+ for (i = 0, status = 0; i < IN_EP_REQ_COUNT && status == 0; i++) {
+ req = audio_request_new(ep, IN_EP_MAX_PACKET_SIZE);
+ if (req) {
+ req->context = audio;
+ req->complete = audio_data_complete;
+ audio_req_put(audio, req);
+ } else
+ status = -ENOMEM;
+ }
+
+fail:
+ return status;
+}
+
+static void
+audio_unbind(struct usb_configuration *c, struct usb_function *f)
+{
+ struct audio_dev *audio = func_to_audio(f);
+ struct usb_request *req;
+
+ while ((req = audio_req_get(audio)))
+ audio_request_free(req, audio->in_ep);
+
+ snd_card_free_when_closed(audio->card);
+ audio->card = NULL;
+ audio->pcm = NULL;
+ audio->substream = NULL;
+ audio->in_ep = NULL;
+
+ if (IS_ENABLED(CONFIG_USB_CONFIGFS)) {
+ struct audio_source_instance *fi_audio =
+ to_fi_audio_source(f->fi);
+ struct audio_source_config *config =
+ fi_audio->config;
+
+ config->card = -1;
+ config->device = -1;
+ }
+}
+
+static void audio_pcm_playback_start(struct audio_dev *audio)
+{
+ audio->start_time = ktime_get();
+ audio->frames_sent = 0;
+ audio_send(audio);
+}
+
+static void audio_pcm_playback_stop(struct audio_dev *audio)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&audio->lock, flags);
+ audio->buffer_start = 0;
+ audio->buffer_end = 0;
+ audio->buffer_pos = 0;
+ spin_unlock_irqrestore(&audio->lock, flags);
+}
+
+static int audio_pcm_open(struct snd_pcm_substream *substream)
+{
+ struct snd_pcm_runtime *runtime = substream->runtime;
+ struct audio_dev *audio = substream->private_data;
+
+ runtime->private_data = audio;
+ runtime->hw = audio_hw_info;
+ snd_pcm_limit_hw_rates(runtime);
+ runtime->hw.channels_max = 2;
+
+ audio->substream = substream;
+
+ /* Add the QoS request and set the latency to 0 */
+ pm_qos_add_request(&audio->pm_qos, PM_QOS_CPU_DMA_LATENCY, 0);
+
+ return 0;
+}
+
+static int audio_pcm_close(struct snd_pcm_substream *substream)
+{
+ struct audio_dev *audio = substream->private_data;
+ unsigned long flags;
+
+ spin_lock_irqsave(&audio->lock, flags);
+
+ /* Remove the QoS request */
+ pm_qos_remove_request(&audio->pm_qos);
+
+ audio->substream = NULL;
+ spin_unlock_irqrestore(&audio->lock, flags);
+
+ return 0;
+}
+
+static int audio_pcm_hw_params(struct snd_pcm_substream *substream,
+ struct snd_pcm_hw_params *params)
+{
+ unsigned int channels = params_channels(params);
+ unsigned int rate = params_rate(params);
+
+ if (rate != SAMPLE_RATE)
+ return -EINVAL;
+ if (channels != 2)
+ return -EINVAL;
+
+ return snd_pcm_lib_alloc_vmalloc_buffer(substream,
+ params_buffer_bytes(params));
+}
+
+static int audio_pcm_hw_free(struct snd_pcm_substream *substream)
+{
+ return snd_pcm_lib_free_vmalloc_buffer(substream);
+}
+
+static int audio_pcm_prepare(struct snd_pcm_substream *substream)
+{
+ struct snd_pcm_runtime *runtime = substream->runtime;
+ struct audio_dev *audio = runtime->private_data;
+
+ audio->period = snd_pcm_lib_period_bytes(substream);
+ audio->period_offset = 0;
+ audio->buffer_start = runtime->dma_area;
+ audio->buffer_end = audio->buffer_start
+ + snd_pcm_lib_buffer_bytes(substream);
+ audio->buffer_pos = audio->buffer_start;
+
+ return 0;
+}
+
+static snd_pcm_uframes_t audio_pcm_pointer(struct snd_pcm_substream *substream)
+{
+ struct snd_pcm_runtime *runtime = substream->runtime;
+ struct audio_dev *audio = runtime->private_data;
+ ssize_t bytes = audio->buffer_pos - audio->buffer_start;
+
+ /* return offset of next frame to fill in our buffer */
+ return bytes_to_frames(runtime, bytes);
+}
+
+static int audio_pcm_playback_trigger(struct snd_pcm_substream *substream,
+ int cmd)
+{
+ struct audio_dev *audio = substream->runtime->private_data;
+ int ret = 0;
+
+ switch (cmd) {
+ case SNDRV_PCM_TRIGGER_START:
+ case SNDRV_PCM_TRIGGER_RESUME:
+ audio_pcm_playback_start(audio);
+ break;
+
+ case SNDRV_PCM_TRIGGER_STOP:
+ case SNDRV_PCM_TRIGGER_SUSPEND:
+ audio_pcm_playback_stop(audio);
+ break;
+
+ default:
+ ret = -EINVAL;
+ }
+
+ return ret;
+}
+
+static struct audio_dev _audio_dev = {
+ .func = {
+ .name = "audio_source",
+ .bind = audio_bind,
+ .unbind = audio_unbind,
+ .set_alt = audio_set_alt,
+ .setup = audio_setup,
+ .disable = audio_disable,
+ .free_func = audio_free_func,
+ },
+ .lock = __SPIN_LOCK_UNLOCKED(_audio_dev.lock),
+ .idle_reqs = LIST_HEAD_INIT(_audio_dev.idle_reqs),
+};
+
+static struct snd_pcm_ops audio_playback_ops = {
+ .open = audio_pcm_open,
+ .close = audio_pcm_close,
+ .ioctl = snd_pcm_lib_ioctl,
+ .hw_params = audio_pcm_hw_params,
+ .hw_free = audio_pcm_hw_free,
+ .prepare = audio_pcm_prepare,
+ .trigger = audio_pcm_playback_trigger,
+ .pointer = audio_pcm_pointer,
+};
+
+int audio_source_bind_config(struct usb_configuration *c,
+ struct audio_source_config *config)
+{
+ struct audio_dev *audio;
+ int err;
+
+ config->card = -1;
+ config->device = -1;
+
+ audio = &_audio_dev;
+
+ err = snd_card_setup(c, config);
+ if (err)
+ return err;
+
+ err = usb_add_function(c, &audio->func);
+ if (err)
+ goto add_fail;
+
+ return 0;
+
+add_fail:
+ snd_card_free(audio->card);
+ return err;
+}
+
+static int snd_card_setup(struct usb_configuration *c,
+ struct audio_source_config *config)
+{
+ struct audio_dev *audio;
+ struct snd_card *card;
+ struct snd_pcm *pcm;
+ int err;
+
+ audio = &_audio_dev;
+
+ err = snd_card_new(&c->cdev->gadget->dev,
+ SNDRV_DEFAULT_IDX1, SNDRV_DEFAULT_STR1,
+ THIS_MODULE, 0, &card);
+ if (err)
+ return err;
+
+ err = snd_pcm_new(card, "USB audio source", 0, 1, 0, &pcm);
+ if (err)
+ goto pcm_fail;
+
+ pcm->private_data = audio;
+ pcm->info_flags = 0;
+ audio->pcm = pcm;
+
+ strlcpy(pcm->name, "USB gadget audio", sizeof(pcm->name));
+
+ snd_pcm_set_ops(pcm, SNDRV_PCM_STREAM_PLAYBACK, &audio_playback_ops);
+ snd_pcm_lib_preallocate_pages_for_all(pcm, SNDRV_DMA_TYPE_DEV,
+ NULL, 0, 64 * 1024);
+
+ strlcpy(card->driver, "audio_source", sizeof(card->driver));
+ strlcpy(card->shortname, card->driver, sizeof(card->shortname));
+ strlcpy(card->longname, "USB accessory audio source",
+ sizeof(card->longname));
+
+ err = snd_card_register(card);
+ if (err)
+ goto register_fail;
+
+ config->card = pcm->card->number;
+ config->device = pcm->device;
+ audio->card = card;
+ return 0;
+
+register_fail:
+pcm_fail:
+ snd_card_free(audio->card);
+ return err;
+}
+
+static struct audio_source_instance *to_audio_source_instance(
+ struct config_item *item)
+{
+ return container_of(to_config_group(item), struct audio_source_instance,
+ func_inst.group);
+}
+
+static struct audio_source_instance *to_fi_audio_source(
+ const struct usb_function_instance *fi)
+{
+ return container_of(fi, struct audio_source_instance, func_inst);
+}
+
+static void audio_source_attr_release(struct config_item *item)
+{
+ struct audio_source_instance *fi_audio = to_audio_source_instance(item);
+
+ usb_put_function_instance(&fi_audio->func_inst);
+}
+
+static int audio_source_set_inst_name(struct usb_function_instance *fi,
+ const char *name)
+{
+ struct audio_source_instance *fi_audio;
+ char *ptr;
+ int name_len;
+
+ name_len = strlen(name) + 1;
+ if (name_len > MAX_INST_NAME_LEN)
+ return -ENAMETOOLONG;
+
+ ptr = kstrndup(name, name_len, GFP_KERNEL);
+ if (!ptr)
+ return -ENOMEM;
+
+ fi_audio = to_fi_audio_source(fi);
+ fi_audio->name = ptr;
+
+ return 0;
+}
+
+static void audio_source_free_inst(struct usb_function_instance *fi)
+{
+ struct audio_source_instance *fi_audio;
+
+ fi_audio = to_fi_audio_source(fi);
+ device_destroy(fi_audio->audio_device->class,
+ fi_audio->audio_device->devt);
+ kfree(fi_audio->name);
+ kfree(fi_audio->config);
+}
+
+static ssize_t audio_source_pcm_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct audio_source_instance *fi_audio = dev_get_drvdata(dev);
+ struct audio_source_config *config = fi_audio->config;
+
+ /* print PCM card and device numbers */
+ return sprintf(buf, "%d %d\n", config->card, config->device);
+}
+
+struct device *create_function_device(char *name);
+
+static struct usb_function_instance *audio_source_alloc_inst(void)
+{
+ struct audio_source_instance *fi_audio;
+ struct device_attribute **attrs;
+ struct device_attribute *attr;
+ struct device *dev;
+ void *err_ptr;
+ int err = 0;
+
+ fi_audio = kzalloc(sizeof(*fi_audio), GFP_KERNEL);
+ if (!fi_audio)
+ return ERR_PTR(-ENOMEM);
+
+ fi_audio->func_inst.set_inst_name = audio_source_set_inst_name;
+ fi_audio->func_inst.free_func_inst = audio_source_free_inst;
+
+ fi_audio->config = kzalloc(sizeof(struct audio_source_config),
+ GFP_KERNEL);
+ if (!fi_audio->config) {
+ err_ptr = ERR_PTR(-ENOMEM);
+ goto fail_audio;
+ }
+
+ config_group_init_type_name(&fi_audio->func_inst.group, "",
+ &audio_source_func_type);
+ dev = create_function_device("f_audio_source");
+
+ if (IS_ERR(dev)) {
+ err_ptr = dev;
+ goto fail_audio_config;
+ }
+
+ fi_audio->config->card = -1;
+ fi_audio->config->device = -1;
+ fi_audio->audio_device = dev;
+
+ attrs = audio_source_function_attributes;
+ if (attrs) {
+ while ((attr = *attrs++) && !err)
+ err = device_create_file(dev, attr);
+ if (err) {
+ err_ptr = ERR_PTR(-EINVAL);
+ goto fail_device;
+ }
+ }
+
+ dev_set_drvdata(dev, fi_audio);
+ _audio_dev.config = fi_audio->config;
+
+ return &fi_audio->func_inst;
+
+fail_device:
+ device_destroy(dev->class, dev->devt);
+fail_audio_config:
+ kfree(fi_audio->config);
+fail_audio:
+ kfree(fi_audio);
+ return err_ptr;
+
+}
+
+static struct usb_function *audio_source_alloc(struct usb_function_instance *fi)
+{
+ return &_audio_dev.func;
+}
+
+DECLARE_USB_FUNCTION_INIT(audio_source, audio_source_alloc_inst,
+ audio_source_alloc);
+MODULE_LICENSE("GPL");
diff --git a/drivers/usb/gadget/function/f_midi.c b/drivers/usb/gadget/function/f_midi.c
index 71cf552..f0a8e66 100644
--- a/drivers/usb/gadget/function/f_midi.c
+++ b/drivers/usb/gadget/function/f_midi.c
@@ -1208,6 +1208,65 @@
kfree(opts);
}
+#ifdef CONFIG_USB_CONFIGFS_UEVENT
+extern struct device *create_function_device(char *name);
+static ssize_t alsa_show(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ struct usb_function_instance *fi_midi = dev_get_drvdata(dev);
+ struct f_midi *midi;
+
+ if (!fi_midi->f)
+ dev_warn(dev, "f_midi: function not set\n");
+
+ if (fi_midi && fi_midi->f) {
+ midi = func_to_midi(fi_midi->f);
+ if (midi->rmidi && midi->rmidi->card)
+ return sprintf(buf, "%d %d\n",
+ midi->rmidi->card->number, midi->rmidi->device);
+ }
+
+ /* print PCM card and device numbers */
+ return sprintf(buf, "%d %d\n", -1, -1);
+}
+
+static DEVICE_ATTR(alsa, S_IRUGO, alsa_show, NULL);
+
+static struct device_attribute *alsa_function_attributes[] = {
+ &dev_attr_alsa,
+ NULL
+};
+
+static int create_alsa_device(struct usb_function_instance *fi)
+{
+ struct device *dev;
+ struct device_attribute **attrs;
+ struct device_attribute *attr;
+ int err = 0;
+
+ dev = create_function_device("f_midi");
+ if (IS_ERR(dev))
+ return PTR_ERR(dev);
+
+ attrs = alsa_function_attributes;
+ if (attrs) {
+ while ((attr = *attrs++) && !err)
+ err = device_create_file(dev, attr);
+ if (err) {
+ device_destroy(dev->class, dev->devt);
+ return -EINVAL;
+ }
+ }
+ dev_set_drvdata(dev, fi);
+ return 0;
+}
+#else
+static int create_alsa_device(struct usb_function_instance *fi)
+{
+ return 0;
+}
+#endif
+
static struct usb_function_instance *f_midi_alloc_inst(void)
{
struct f_midi_opts *opts;
@@ -1225,6 +1284,11 @@
opts->in_ports = 1;
opts->out_ports = 1;
+ if (create_alsa_device(&opts->func_inst)) {
+ kfree(opts);
+ return ERR_PTR(-ENODEV);
+ }
+
config_group_init_type_name(&opts->func_inst.group, "",
&midi_func_type);
@@ -1243,6 +1307,7 @@
kfree(midi->id);
kfifo_free(&midi->in_req_fifo);
kfree(midi);
+ opts->func_inst.f = NULL;
--opts->refcnt;
}
mutex_unlock(&opts->lock);
@@ -1329,6 +1394,7 @@
midi->func.disable = f_midi_disable;
midi->func.free_func = f_midi_free;
+ fi->f = &midi->func;
return &midi->func;
setup_fail:
diff --git a/drivers/usb/misc/sisusbvga/sisusb_con.c b/drivers/usb/misc/sisusbvga/sisusb_con.c
index f019d80..dc45cfc 100644
--- a/drivers/usb/misc/sisusbvga/sisusb_con.c
+++ b/drivers/usb/misc/sisusbvga/sisusb_con.c
@@ -1216,7 +1216,7 @@
/* Interface routine */
static int
sisusbcon_font_set(struct vc_data *c, struct console_font *font,
- unsigned flags)
+ unsigned int flags)
{
struct sisusb_usb_data *sisusb;
unsigned charcount = font->charcount;
@@ -1337,29 +1337,65 @@
vc_resize(vc, 80, 25);
}
-static int sisusbdummycon_dummy(void)
+static void sisusbdummycon_deinit(struct vc_data *vc) { }
+static void sisusbdummycon_clear(struct vc_data *vc, int sy, int sx,
+ int height, int width) { }
+static void sisusbdummycon_putc(struct vc_data *vc, int c, int ypos,
+ int xpos) { }
+static void sisusbdummycon_putcs(struct vc_data *vc, const unsigned short *s,
+ int count, int ypos, int xpos) { }
+static void sisusbdummycon_cursor(struct vc_data *vc, int mode) { }
+
+static bool sisusbdummycon_scroll(struct vc_data *vc, unsigned int top,
+ unsigned int bottom, enum con_scroll dir,
+ unsigned int lines)
{
- return 0;
+ return false;
}
-#define SISUSBCONDUMMY (void *)sisusbdummycon_dummy
+static int sisusbdummycon_switch(struct vc_data *vc)
+{
+ return 0;
+}
+
+static int sisusbdummycon_blank(struct vc_data *vc, int blank, int mode_switch)
+{
+ return 0;
+}
+
+static int sisusbdummycon_font_set(struct vc_data *vc,
+ struct console_font *font,
+ unsigned int flags)
+{
+ return 0;
+}
+
+static int sisusbdummycon_font_default(struct vc_data *vc,
+ struct console_font *font, char *name)
+{
+ return 0;
+}
+
+static int sisusbdummycon_font_copy(struct vc_data *vc, int con)
+{
+ return 0;
+}
static const struct consw sisusb_dummy_con = {
.owner = THIS_MODULE,
.con_startup = sisusbdummycon_startup,
.con_init = sisusbdummycon_init,
- .con_deinit = SISUSBCONDUMMY,
- .con_clear = SISUSBCONDUMMY,
- .con_putc = SISUSBCONDUMMY,
- .con_putcs = SISUSBCONDUMMY,
- .con_cursor = SISUSBCONDUMMY,
- .con_scroll = SISUSBCONDUMMY,
- .con_switch = SISUSBCONDUMMY,
- .con_blank = SISUSBCONDUMMY,
- .con_font_set = SISUSBCONDUMMY,
- .con_font_get = SISUSBCONDUMMY,
- .con_font_default = SISUSBCONDUMMY,
- .con_font_copy = SISUSBCONDUMMY,
+ .con_deinit = sisusbdummycon_deinit,
+ .con_clear = sisusbdummycon_clear,
+ .con_putc = sisusbdummycon_putc,
+ .con_putcs = sisusbdummycon_putcs,
+ .con_cursor = sisusbdummycon_cursor,
+ .con_scroll = sisusbdummycon_scroll,
+ .con_switch = sisusbdummycon_switch,
+ .con_blank = sisusbdummycon_blank,
+ .con_font_set = sisusbdummycon_font_set,
+ .con_font_default = sisusbdummycon_font_default,
+ .con_font_copy = sisusbdummycon_font_copy,
};
int
diff --git a/drivers/usb/phy/Kconfig b/drivers/usb/phy/Kconfig
index aff702c..15e822e 100644
--- a/drivers/usb/phy/Kconfig
+++ b/drivers/usb/phy/Kconfig
@@ -7,6 +7,14 @@
select EXTCON
def_bool n
+config USB_OTG_WAKELOCK
+ bool "Hold a wakelock when USB connected"
+ depends on PM_WAKELOCKS
+ select USB_OTG_UTILS
+ help
+ Select this to automatically hold a wakelock when USB is
+ connected, preventing suspend.
+
#
# USB Transceiver Drivers
#
@@ -202,4 +210,13 @@
Provides read/write operations to the ULPI phy register set for
controllers with a viewport register (e.g. Chipidea/ARC controllers).
+config DUAL_ROLE_USB_INTF
+ bool "Generic DUAL ROLE sysfs interface"
+ depends on SYSFS && USB_PHY
+ help
+ A generic sysfs interface to track and change the state of
+ dual role usb phys. The usb phy drivers can register to
+ this interface to expose it capabilities to the userspace
+ and thereby allowing userspace to change the port mode.
+
endmenu
diff --git a/drivers/usb/phy/Makefile b/drivers/usb/phy/Makefile
index 0c40ccc..68867d6 100644
--- a/drivers/usb/phy/Makefile
+++ b/drivers/usb/phy/Makefile
@@ -4,6 +4,8 @@
#
obj-$(CONFIG_USB_PHY) += phy.o
obj-$(CONFIG_OF) += of.o
+obj-$(CONFIG_USB_OTG_WAKELOCK) += otg-wakelock.o
+obj-$(CONFIG_DUAL_ROLE_USB_INTF) += class-dual-role.o
# transceiver drivers, keep the list sorted
diff --git a/drivers/usb/phy/class-dual-role.c b/drivers/usb/phy/class-dual-role.c
new file mode 100644
index 0000000..51fcb54
--- /dev/null
+++ b/drivers/usb/phy/class-dual-role.c
@@ -0,0 +1,529 @@
+/*
+ * class-dual-role.c
+ *
+ * Copyright (C) 2015 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/ctype.h>
+#include <linux/device.h>
+#include <linux/usb/class-dual-role.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <linux/stat.h>
+#include <linux/types.h>
+
+#define DUAL_ROLE_NOTIFICATION_TIMEOUT 2000
+
+static ssize_t dual_role_store_property(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count);
+static ssize_t dual_role_show_property(struct device *dev,
+ struct device_attribute *attr,
+ char *buf);
+
+#define DUAL_ROLE_ATTR(_name) \
+{ \
+ .attr = { .name = #_name }, \
+ .show = dual_role_show_property, \
+ .store = dual_role_store_property, \
+}
+
+static struct device_attribute dual_role_attrs[] = {
+ DUAL_ROLE_ATTR(supported_modes),
+ DUAL_ROLE_ATTR(mode),
+ DUAL_ROLE_ATTR(power_role),
+ DUAL_ROLE_ATTR(data_role),
+ DUAL_ROLE_ATTR(powers_vconn),
+};
+
+struct class *dual_role_class;
+EXPORT_SYMBOL_GPL(dual_role_class);
+
+static struct device_type dual_role_dev_type;
+
+static char *kstrdupcase(const char *str, gfp_t gfp, bool to_upper)
+{
+ char *ret, *ustr;
+
+ ustr = ret = kmalloc(strlen(str) + 1, gfp);
+
+ if (!ret)
+ return NULL;
+
+ while (*str)
+ *ustr++ = to_upper ? toupper(*str++) : tolower(*str++);
+
+ *ustr = 0;
+
+ return ret;
+}
+
+static void dual_role_changed_work(struct work_struct *work)
+{
+ struct dual_role_phy_instance *dual_role =
+ container_of(work, struct dual_role_phy_instance,
+ changed_work);
+
+ dev_dbg(&dual_role->dev, "%s\n", __func__);
+ kobject_uevent(&dual_role->dev.kobj, KOBJ_CHANGE);
+}
+
+void dual_role_instance_changed(struct dual_role_phy_instance *dual_role)
+{
+ dev_dbg(&dual_role->dev, "%s\n", __func__);
+ pm_wakeup_event(&dual_role->dev, DUAL_ROLE_NOTIFICATION_TIMEOUT);
+ schedule_work(&dual_role->changed_work);
+}
+EXPORT_SYMBOL_GPL(dual_role_instance_changed);
+
+int dual_role_get_property(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop,
+ unsigned int *val)
+{
+ return dual_role->desc->get_property(dual_role, prop, val);
+}
+EXPORT_SYMBOL_GPL(dual_role_get_property);
+
+int dual_role_set_property(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop,
+ const unsigned int *val)
+{
+ if (!dual_role->desc->set_property)
+ return -ENODEV;
+
+ return dual_role->desc->set_property(dual_role, prop, val);
+}
+EXPORT_SYMBOL_GPL(dual_role_set_property);
+
+int dual_role_property_is_writeable(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop)
+{
+ if (!dual_role->desc->property_is_writeable)
+ return -ENODEV;
+
+ return dual_role->desc->property_is_writeable(dual_role, prop);
+}
+EXPORT_SYMBOL_GPL(dual_role_property_is_writeable);
+
+static void dual_role_dev_release(struct device *dev)
+{
+ struct dual_role_phy_instance *dual_role =
+ container_of(dev, struct dual_role_phy_instance, dev);
+ pr_debug("device: '%s': %s\n", dev_name(dev), __func__);
+ kfree(dual_role);
+}
+
+static struct dual_role_phy_instance *__must_check
+__dual_role_register(struct device *parent,
+ const struct dual_role_phy_desc *desc)
+{
+ struct device *dev;
+ struct dual_role_phy_instance *dual_role;
+ int rc;
+
+ dual_role = kzalloc(sizeof(*dual_role), GFP_KERNEL);
+ if (!dual_role)
+ return ERR_PTR(-ENOMEM);
+
+ dev = &dual_role->dev;
+
+ device_initialize(dev);
+
+ dev->class = dual_role_class;
+ dev->type = &dual_role_dev_type;
+ dev->parent = parent;
+ dev->release = dual_role_dev_release;
+ dev_set_drvdata(dev, dual_role);
+ dual_role->desc = desc;
+
+ rc = dev_set_name(dev, "%s", desc->name);
+ if (rc)
+ goto dev_set_name_failed;
+
+ INIT_WORK(&dual_role->changed_work, dual_role_changed_work);
+
+ rc = device_init_wakeup(dev, true);
+ if (rc)
+ goto wakeup_init_failed;
+
+ rc = device_add(dev);
+ if (rc)
+ goto device_add_failed;
+
+ dual_role_instance_changed(dual_role);
+
+ return dual_role;
+
+device_add_failed:
+ device_init_wakeup(dev, false);
+wakeup_init_failed:
+dev_set_name_failed:
+ put_device(dev);
+ kfree(dual_role);
+
+ return ERR_PTR(rc);
+}
+
+static void dual_role_instance_unregister(struct dual_role_phy_instance
+ *dual_role)
+{
+ cancel_work_sync(&dual_role->changed_work);
+ device_init_wakeup(&dual_role->dev, false);
+ device_unregister(&dual_role->dev);
+}
+
+static void devm_dual_role_release(struct device *dev, void *res)
+{
+ struct dual_role_phy_instance **dual_role = res;
+
+ dual_role_instance_unregister(*dual_role);
+}
+
+struct dual_role_phy_instance *__must_check
+devm_dual_role_instance_register(struct device *parent,
+ const struct dual_role_phy_desc *desc)
+{
+ struct dual_role_phy_instance **ptr, *dual_role;
+
+ ptr = devres_alloc(devm_dual_role_release, sizeof(*ptr), GFP_KERNEL);
+
+ if (!ptr)
+ return ERR_PTR(-ENOMEM);
+ dual_role = __dual_role_register(parent, desc);
+ if (IS_ERR(dual_role)) {
+ devres_free(ptr);
+ } else {
+ *ptr = dual_role;
+ devres_add(parent, ptr);
+ }
+ return dual_role;
+}
+EXPORT_SYMBOL_GPL(devm_dual_role_instance_register);
+
+static int devm_dual_role_match(struct device *dev, void *res, void *data)
+{
+ struct dual_role_phy_instance **r = res;
+
+ if (WARN_ON(!r || !*r))
+ return 0;
+
+ return *r == data;
+}
+
+void devm_dual_role_instance_unregister(struct device *dev,
+ struct dual_role_phy_instance
+ *dual_role)
+{
+ int rc;
+
+ rc = devres_release(dev, devm_dual_role_release,
+ devm_dual_role_match, dual_role);
+ WARN_ON(rc);
+}
+EXPORT_SYMBOL_GPL(devm_dual_role_instance_unregister);
+
+void *dual_role_get_drvdata(struct dual_role_phy_instance *dual_role)
+{
+ return dual_role->drv_data;
+}
+EXPORT_SYMBOL_GPL(dual_role_get_drvdata);
+
+/***************** Device attribute functions **************************/
+
+/* port type */
+static char *supported_modes_text[] = {
+ "ufp dfp", "dfp", "ufp"
+};
+
+/* current mode */
+static char *mode_text[] = {
+ "ufp", "dfp", "none"
+};
+
+/* Power role */
+static char *pr_text[] = {
+ "source", "sink", "none"
+};
+
+/* Data role */
+static char *dr_text[] = {
+ "host", "device", "none"
+};
+
+/* Vconn supply */
+static char *vconn_supply_text[] = {
+ "n", "y"
+};
+
+static ssize_t dual_role_show_property(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ ssize_t ret = 0;
+ struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev);
+ const ptrdiff_t off = attr - dual_role_attrs;
+ unsigned int value;
+
+ if (off == DUAL_ROLE_PROP_SUPPORTED_MODES) {
+ value = dual_role->desc->supported_modes;
+ } else {
+ ret = dual_role_get_property(dual_role, off, &value);
+
+ if (ret < 0) {
+ if (ret == -ENODATA)
+ dev_dbg(dev,
+ "driver has no data for `%s' property\n",
+ attr->attr.name);
+ else if (ret != -ENODEV)
+ dev_err(dev,
+ "driver failed to report `%s' property: %zd\n",
+ attr->attr.name, ret);
+ return ret;
+ }
+ }
+
+ if (off == DUAL_ROLE_PROP_SUPPORTED_MODES) {
+ BUILD_BUG_ON(DUAL_ROLE_PROP_SUPPORTED_MODES_TOTAL !=
+ ARRAY_SIZE(supported_modes_text));
+ if (value < DUAL_ROLE_PROP_SUPPORTED_MODES_TOTAL)
+ return snprintf(buf, PAGE_SIZE, "%s\n",
+ supported_modes_text[value]);
+ else
+ return -EIO;
+ } else if (off == DUAL_ROLE_PROP_MODE) {
+ BUILD_BUG_ON(DUAL_ROLE_PROP_MODE_TOTAL !=
+ ARRAY_SIZE(mode_text));
+ if (value < DUAL_ROLE_PROP_MODE_TOTAL)
+ return snprintf(buf, PAGE_SIZE, "%s\n",
+ mode_text[value]);
+ else
+ return -EIO;
+ } else if (off == DUAL_ROLE_PROP_PR) {
+ BUILD_BUG_ON(DUAL_ROLE_PROP_PR_TOTAL != ARRAY_SIZE(pr_text));
+ if (value < DUAL_ROLE_PROP_PR_TOTAL)
+ return snprintf(buf, PAGE_SIZE, "%s\n",
+ pr_text[value]);
+ else
+ return -EIO;
+ } else if (off == DUAL_ROLE_PROP_DR) {
+ BUILD_BUG_ON(DUAL_ROLE_PROP_DR_TOTAL != ARRAY_SIZE(dr_text));
+ if (value < DUAL_ROLE_PROP_DR_TOTAL)
+ return snprintf(buf, PAGE_SIZE, "%s\n",
+ dr_text[value]);
+ else
+ return -EIO;
+ } else if (off == DUAL_ROLE_PROP_VCONN_SUPPLY) {
+ BUILD_BUG_ON(DUAL_ROLE_PROP_VCONN_SUPPLY_TOTAL !=
+ ARRAY_SIZE(vconn_supply_text));
+ if (value < DUAL_ROLE_PROP_VCONN_SUPPLY_TOTAL)
+ return snprintf(buf, PAGE_SIZE, "%s\n",
+ vconn_supply_text[value]);
+ else
+ return -EIO;
+ } else
+ return -EIO;
+}
+
+static ssize_t dual_role_store_property(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ ssize_t ret;
+ struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev);
+ const ptrdiff_t off = attr - dual_role_attrs;
+ unsigned int value;
+ int total, i;
+ char *dup_buf, **text_array;
+ bool result = false;
+
+ dup_buf = kstrdupcase(buf, GFP_KERNEL, false);
+ switch (off) {
+ case DUAL_ROLE_PROP_MODE:
+ total = DUAL_ROLE_PROP_MODE_TOTAL;
+ text_array = mode_text;
+ break;
+ case DUAL_ROLE_PROP_PR:
+ total = DUAL_ROLE_PROP_PR_TOTAL;
+ text_array = pr_text;
+ break;
+ case DUAL_ROLE_PROP_DR:
+ total = DUAL_ROLE_PROP_DR_TOTAL;
+ text_array = dr_text;
+ break;
+ case DUAL_ROLE_PROP_VCONN_SUPPLY:
+ ret = strtobool(dup_buf, &result);
+ value = result;
+ if (!ret)
+ goto setprop;
+ default:
+ ret = -EINVAL;
+ goto error;
+ }
+
+ for (i = 0; i <= total; i++) {
+ if (i == total) {
+ ret = -ENOTSUPP;
+ goto error;
+ }
+ if (!strncmp(*(text_array + i), dup_buf,
+ strlen(*(text_array + i)))) {
+ value = i;
+ break;
+ }
+ }
+
+setprop:
+ ret = dual_role->desc->set_property(dual_role, off, &value);
+
+error:
+ kfree(dup_buf);
+
+ if (ret < 0)
+ return ret;
+
+ return count;
+}
+
+static umode_t dual_role_attr_is_visible(struct kobject *kobj,
+ struct attribute *attr, int attrno)
+{
+ struct device *dev = container_of(kobj, struct device, kobj);
+ struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev);
+ umode_t mode = S_IRUSR | S_IRGRP | S_IROTH;
+ int i;
+
+ if (attrno == DUAL_ROLE_PROP_SUPPORTED_MODES)
+ return mode;
+
+ for (i = 0; i < dual_role->desc->num_properties; i++) {
+ int property = dual_role->desc->properties[i];
+
+ if (property == attrno) {
+ if (dual_role->desc->property_is_writeable &&
+ dual_role_property_is_writeable(dual_role, property)
+ > 0)
+ mode |= S_IWUSR;
+
+ return mode;
+ }
+ }
+
+ return 0;
+}
+
+static struct attribute *__dual_role_attrs[ARRAY_SIZE(dual_role_attrs) + 1];
+
+static struct attribute_group dual_role_attr_group = {
+ .attrs = __dual_role_attrs,
+ .is_visible = dual_role_attr_is_visible,
+};
+
+static const struct attribute_group *dual_role_attr_groups[] = {
+ &dual_role_attr_group,
+ NULL,
+};
+
+void dual_role_init_attrs(struct device_type *dev_type)
+{
+ int i;
+
+ dev_type->groups = dual_role_attr_groups;
+
+ for (i = 0; i < ARRAY_SIZE(dual_role_attrs); i++)
+ __dual_role_attrs[i] = &dual_role_attrs[i].attr;
+}
+
+int dual_role_uevent(struct device *dev, struct kobj_uevent_env *env)
+{
+ struct dual_role_phy_instance *dual_role = dev_get_drvdata(dev);
+ int ret = 0, j;
+ char *prop_buf;
+ char *attrname;
+
+ dev_dbg(dev, "uevent\n");
+
+ if (!dual_role || !dual_role->desc) {
+ dev_dbg(dev, "No dual_role phy yet\n");
+ return ret;
+ }
+
+ dev_dbg(dev, "DUAL_ROLE_NAME=%s\n", dual_role->desc->name);
+
+ ret = add_uevent_var(env, "DUAL_ROLE_NAME=%s", dual_role->desc->name);
+ if (ret)
+ return ret;
+
+ prop_buf = (char *)get_zeroed_page(GFP_KERNEL);
+ if (!prop_buf)
+ return -ENOMEM;
+
+ for (j = 0; j < dual_role->desc->num_properties; j++) {
+ struct device_attribute *attr;
+ char *line;
+
+ attr = &dual_role_attrs[dual_role->desc->properties[j]];
+
+ ret = dual_role_show_property(dev, attr, prop_buf);
+ if (ret == -ENODEV || ret == -ENODATA) {
+ ret = 0;
+ continue;
+ }
+
+ if (ret < 0)
+ goto out;
+ line = strnchr(prop_buf, PAGE_SIZE, '\n');
+ if (line)
+ *line = 0;
+
+ attrname = kstrdupcase(attr->attr.name, GFP_KERNEL, true);
+ if (!attrname)
+ ret = -ENOMEM;
+
+ dev_dbg(dev, "prop %s=%s\n", attrname, prop_buf);
+
+ ret = add_uevent_var(env, "DUAL_ROLE_%s=%s", attrname,
+ prop_buf);
+ kfree(attrname);
+ if (ret)
+ goto out;
+ }
+
+out:
+ free_page((unsigned long)prop_buf);
+
+ return ret;
+}
+
+/******************* Module Init ***********************************/
+
+static int __init dual_role_class_init(void)
+{
+ dual_role_class = class_create(THIS_MODULE, "dual_role_usb");
+
+ if (IS_ERR(dual_role_class))
+ return PTR_ERR(dual_role_class);
+
+ dual_role_class->dev_uevent = dual_role_uevent;
+ dual_role_init_attrs(&dual_role_dev_type);
+
+ return 0;
+}
+
+static void __exit dual_role_class_exit(void)
+{
+ class_destroy(dual_role_class);
+}
+
+subsys_initcall(dual_role_class_init);
+module_exit(dual_role_class_exit);
diff --git a/drivers/usb/phy/otg-wakelock.c b/drivers/usb/phy/otg-wakelock.c
new file mode 100644
index 0000000..ecd7410
--- /dev/null
+++ b/drivers/usb/phy/otg-wakelock.c
@@ -0,0 +1,170 @@
+/*
+ * otg-wakelock.c
+ *
+ * Copyright (C) 2011 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/notifier.h>
+#include <linux/spinlock.h>
+#include <linux/usb/otg.h>
+
+#define TEMPORARY_HOLD_TIME 2000
+
+static bool enabled = true;
+static struct usb_phy *otgwl_xceiv;
+static struct notifier_block otgwl_nb;
+
+/*
+ * otgwl_spinlock is held while the VBUS lock is grabbed or dropped and the
+ * held field is updated to match.
+ */
+
+static DEFINE_SPINLOCK(otgwl_spinlock);
+
+/*
+ * Only one lock, but since these 3 fields are associated with each other...
+ */
+
+struct otgwl_lock {
+ char name[40];
+ struct wakeup_source wakesrc;
+ bool held;
+};
+
+/*
+ * VBUS present lock. Also used as a timed lock on charger
+ * connect/disconnect and USB host disconnect, to allow the system
+ * to react to the change in power.
+ */
+
+static struct otgwl_lock vbus_lock;
+
+static void otgwl_hold(struct otgwl_lock *lock)
+{
+ if (!lock->held) {
+ __pm_stay_awake(&lock->wakesrc);
+ lock->held = true;
+ }
+}
+
+static void otgwl_temporary_hold(struct otgwl_lock *lock)
+{
+ __pm_wakeup_event(&lock->wakesrc, TEMPORARY_HOLD_TIME);
+ lock->held = false;
+}
+
+static void otgwl_drop(struct otgwl_lock *lock)
+{
+ if (lock->held) {
+ __pm_relax(&lock->wakesrc);
+ lock->held = false;
+ }
+}
+
+static void otgwl_handle_event(unsigned long event)
+{
+ unsigned long irqflags;
+
+ spin_lock_irqsave(&otgwl_spinlock, irqflags);
+
+ if (!enabled) {
+ otgwl_drop(&vbus_lock);
+ spin_unlock_irqrestore(&otgwl_spinlock, irqflags);
+ return;
+ }
+
+ switch (event) {
+ case USB_EVENT_VBUS:
+ case USB_EVENT_ENUMERATED:
+ otgwl_hold(&vbus_lock);
+ break;
+
+ case USB_EVENT_NONE:
+ case USB_EVENT_ID:
+ case USB_EVENT_CHARGER:
+ otgwl_temporary_hold(&vbus_lock);
+ break;
+
+ default:
+ break;
+ }
+
+ spin_unlock_irqrestore(&otgwl_spinlock, irqflags);
+}
+
+static int otgwl_otg_notifications(struct notifier_block *nb,
+ unsigned long event, void *unused)
+{
+ otgwl_handle_event(event);
+ return NOTIFY_OK;
+}
+
+static int set_enabled(const char *val, const struct kernel_param *kp)
+{
+ int rv = param_set_bool(val, kp);
+
+ if (rv)
+ return rv;
+
+ if (otgwl_xceiv)
+ otgwl_handle_event(otgwl_xceiv->last_event);
+
+ return 0;
+}
+
+static struct kernel_param_ops enabled_param_ops = {
+ .set = set_enabled,
+ .get = param_get_bool,
+};
+
+module_param_cb(enabled, &enabled_param_ops, &enabled, 0644);
+MODULE_PARM_DESC(enabled, "enable wakelock when VBUS present");
+
+static int __init otg_wakelock_init(void)
+{
+ int ret;
+ struct usb_phy *phy;
+
+ phy = usb_get_phy(USB_PHY_TYPE_USB2);
+
+ if (IS_ERR(phy)) {
+ pr_err("%s: No USB transceiver found\n", __func__);
+ return PTR_ERR(phy);
+ }
+ otgwl_xceiv = phy;
+
+ snprintf(vbus_lock.name, sizeof(vbus_lock.name), "vbus-%s",
+ dev_name(otgwl_xceiv->dev));
+ wakeup_source_init(&vbus_lock.wakesrc, vbus_lock.name);
+
+ otgwl_nb.notifier_call = otgwl_otg_notifications;
+ ret = usb_register_notifier(otgwl_xceiv, &otgwl_nb);
+
+ if (ret) {
+ pr_err("%s: usb_register_notifier on transceiver %s"
+ " failed\n", __func__,
+ dev_name(otgwl_xceiv->dev));
+ otgwl_xceiv = NULL;
+ wakeup_source_trash(&vbus_lock.wakesrc);
+ return ret;
+ }
+
+ otgwl_handle_event(otgwl_xceiv->last_event);
+ return ret;
+}
+
+late_initcall(otg_wakelock_init);
diff --git a/drivers/video/console/dummycon.c b/drivers/video/console/dummycon.c
index b90ef96..f2eafe2 100644
--- a/drivers/video/console/dummycon.c
+++ b/drivers/video/console/dummycon.c
@@ -41,12 +41,47 @@
vc_resize(vc, DUMMY_COLUMNS, DUMMY_ROWS);
}
-static int dummycon_dummy(void)
+static void dummycon_deinit(struct vc_data *vc) { }
+static void dummycon_clear(struct vc_data *vc, int sy, int sx, int height,
+ int width) { }
+static void dummycon_putc(struct vc_data *vc, int c, int ypos, int xpos) { }
+static void dummycon_putcs(struct vc_data *vc, const unsigned short *s,
+ int count, int ypos, int xpos) { }
+static void dummycon_cursor(struct vc_data *vc, int mode) { }
+
+static bool dummycon_scroll(struct vc_data *vc, unsigned int top,
+ unsigned int bottom, enum con_scroll dir,
+ unsigned int lines)
{
- return 0;
+ return false;
}
-#define DUMMY (void *)dummycon_dummy
+static int dummycon_switch(struct vc_data *vc)
+{
+ return 0;
+}
+
+static int dummycon_blank(struct vc_data *vc, int blank, int mode_switch)
+{
+ return 0;
+}
+
+static int dummycon_font_set(struct vc_data *vc, struct console_font *font,
+ unsigned int flags)
+{
+ return 0;
+}
+
+static int dummycon_font_default(struct vc_data *vc,
+ struct console_font *font, char *name)
+{
+ return 0;
+}
+
+static int dummycon_font_copy(struct vc_data *vc, int con)
+{
+ return 0;
+}
/*
* The console `switch' structure for the dummy console
@@ -55,19 +90,19 @@
*/
const struct consw dummy_con = {
- .owner = THIS_MODULE,
- .con_startup = dummycon_startup,
- .con_init = dummycon_init,
- .con_deinit = DUMMY,
- .con_clear = DUMMY,
- .con_putc = DUMMY,
- .con_putcs = DUMMY,
- .con_cursor = DUMMY,
- .con_scroll = DUMMY,
- .con_switch = DUMMY,
- .con_blank = DUMMY,
- .con_font_set = DUMMY,
- .con_font_default = DUMMY,
- .con_font_copy = DUMMY,
+ .owner = THIS_MODULE,
+ .con_startup = dummycon_startup,
+ .con_init = dummycon_init,
+ .con_deinit = dummycon_deinit,
+ .con_clear = dummycon_clear,
+ .con_putc = dummycon_putc,
+ .con_putcs = dummycon_putcs,
+ .con_cursor = dummycon_cursor,
+ .con_scroll = dummycon_scroll,
+ .con_switch = dummycon_switch,
+ .con_blank = dummycon_blank,
+ .con_font_set = dummycon_font_set,
+ .con_font_default = dummycon_font_default,
+ .con_font_copy = dummycon_font_copy,
};
EXPORT_SYMBOL_GPL(dummy_con);
diff --git a/drivers/video/console/newport_con.c b/drivers/video/console/newport_con.c
index 42d02a2..7f2526b 100644
--- a/drivers/video/console/newport_con.c
+++ b/drivers/video/console/newport_con.c
@@ -673,12 +673,12 @@
return true;
}
-static int newport_dummy(struct vc_data *c)
+static int newport_set_origin(struct vc_data *vc)
{
return 0;
}
-#define DUMMY (void *) newport_dummy
+static void newport_save_screen(struct vc_data *vc) { }
const struct consw newport_con = {
.owner = THIS_MODULE,
@@ -694,8 +694,8 @@
.con_blank = newport_blank,
.con_font_set = newport_font_set,
.con_font_default = newport_font_default,
- .con_set_origin = DUMMY,
- .con_save_screen = DUMMY
+ .con_set_origin = newport_set_origin,
+ .con_save_screen = newport_save_screen
};
static int newport_probe(struct gio_device *dev,
diff --git a/drivers/video/console/vgacon.c b/drivers/video/console/vgacon.c
index a17ba14..f09e17b 100644
--- a/drivers/video/console/vgacon.c
+++ b/drivers/video/console/vgacon.c
@@ -1272,7 +1272,8 @@
return 0;
}
-static int vgacon_font_set(struct vc_data *c, struct console_font *font, unsigned flags)
+static int vgacon_font_set(struct vc_data *c, struct console_font *font,
+ unsigned int flags)
{
unsigned charcount = font->charcount;
int rc;
@@ -1407,21 +1408,20 @@
* The console `switch' structure for the VGA based console
*/
-static int vgacon_dummy(struct vc_data *c)
-{
- return 0;
-}
-
-#define DUMMY (void *) vgacon_dummy
+static void vgacon_clear(struct vc_data *vc, int sy, int sx, int height,
+ int width) { }
+static void vgacon_putc(struct vc_data *vc, int c, int ypos, int xpos) { }
+static void vgacon_putcs(struct vc_data *vc, const unsigned short *s,
+ int count, int ypos, int xpos) { }
const struct consw vga_con = {
.owner = THIS_MODULE,
.con_startup = vgacon_startup,
.con_init = vgacon_init,
.con_deinit = vgacon_deinit,
- .con_clear = DUMMY,
- .con_putc = DUMMY,
- .con_putcs = DUMMY,
+ .con_clear = vgacon_clear,
+ .con_putc = vgacon_putc,
+ .con_putcs = vgacon_putcs,
.con_cursor = vgacon_cursor,
.con_scroll = vgacon_scroll,
.con_switch = vgacon_switch,
diff --git a/drivers/video/fbdev/core/fbcon.c b/drivers/video/fbdev/core/fbcon.c
index 04612f9..235d549 100644
--- a/drivers/video/fbdev/core/fbcon.c
+++ b/drivers/video/fbdev/core/fbcon.c
@@ -2588,7 +2588,8 @@
* is ever implemented.
*/
-static int fbcon_set_font(struct vc_data *vc, struct console_font *font, unsigned flags)
+static int fbcon_set_font(struct vc_data *vc, struct console_font *font,
+ unsigned int flags)
{
struct fb_info *info = registered_fb[con2fb_map[vc->vc_num]];
unsigned charcount = font->charcount;
diff --git a/drivers/video/fbdev/goldfishfb.c b/drivers/video/fbdev/goldfishfb.c
index 7f6c9e6..1e56b50 100644
--- a/drivers/video/fbdev/goldfishfb.c
+++ b/drivers/video/fbdev/goldfishfb.c
@@ -26,6 +26,7 @@
#include <linux/interrupt.h>
#include <linux/ioport.h>
#include <linux/platform_device.h>
+#include <linux/acpi.h>
enum {
FB_GET_WIDTH = 0x00,
@@ -234,7 +235,7 @@
fb->fb.var.activate = FB_ACTIVATE_NOW;
fb->fb.var.height = readl(fb->reg_base + FB_GET_PHYS_HEIGHT);
fb->fb.var.width = readl(fb->reg_base + FB_GET_PHYS_WIDTH);
- fb->fb.var.pixclock = 10000;
+ fb->fb.var.pixclock = 0;
fb->fb.var.red.offset = 11;
fb->fb.var.red.length = 5;
@@ -304,12 +305,25 @@
return 0;
}
+static const struct of_device_id goldfish_fb_of_match[] = {
+ { .compatible = "google,goldfish-fb", },
+ {},
+};
+MODULE_DEVICE_TABLE(of, goldfish_fb_of_match);
+
+static const struct acpi_device_id goldfish_fb_acpi_match[] = {
+ { "GFSH0004", 0 },
+ { },
+};
+MODULE_DEVICE_TABLE(acpi, goldfish_fb_acpi_match);
static struct platform_driver goldfish_fb_driver = {
.probe = goldfish_fb_probe,
.remove = goldfish_fb_remove,
.driver = {
- .name = "goldfish_fb"
+ .name = "goldfish_fb",
+ .of_match_table = goldfish_fb_of_match,
+ .acpi_match_table = ACPI_PTR(goldfish_fb_acpi_match),
}
};
diff --git a/fs/Kconfig b/fs/Kconfig
index 7aee6d6..121fabf 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -227,6 +227,7 @@
source "fs/adfs/Kconfig"
source "fs/affs/Kconfig"
source "fs/ecryptfs/Kconfig"
+source "fs/sdcardfs/Kconfig"
source "fs/hfs/Kconfig"
source "fs/hfsplus/Kconfig"
source "fs/befs/Kconfig"
diff --git a/fs/Makefile b/fs/Makefile
index ef772f1..1e34e4b 100644
--- a/fs/Makefile
+++ b/fs/Makefile
@@ -4,7 +4,7 @@
#
# 14 Sep 2000, Christoph Hellwig <hch@infradead.org>
# Rewritten to use lists instead of if-statements.
-#
+#
obj-y := open.o read_write.o file_table.o super.o \
char_dev.o stat.o exec.o pipe.o namei.o fcntl.o \
@@ -62,7 +62,7 @@
obj-$(CONFIG_PROFILING) += dcookies.o
obj-$(CONFIG_DLM) += dlm/
-
+
# Do not add any filesystems before this line
obj-$(CONFIG_FSCACHE) += fscache/
obj-$(CONFIG_REISERFS_FS) += reiserfs/
@@ -84,6 +84,7 @@
obj-$(CONFIG_HFSPLUS_FS) += hfsplus/ # Before hfs to find wrapped HFS+
obj-$(CONFIG_HFS_FS) += hfs/
obj-$(CONFIG_ECRYPT_FS) += ecryptfs/
+obj-$(CONFIG_SDCARD_FS) += sdcardfs/
obj-$(CONFIG_VXFS_FS) += freevxfs/
obj-$(CONFIG_NFS_FS) += nfs/
obj-$(CONFIG_EXPORTFS) += exportfs/
diff --git a/fs/afs/file.c b/fs/afs/file.c
index 510cba1..e6e949c 100644
--- a/fs/afs/file.c
+++ b/fs/afs/file.c
@@ -140,12 +140,11 @@
/*
* read page from file, directory or symlink, given a key to use
*/
-int afs_page_filler(void *data, struct page *page)
+static int __afs_page_filler(struct key *key, struct page *page)
{
struct inode *inode = page->mapping->host;
struct afs_vnode *vnode = AFS_FS_I(inode);
struct afs_read *req;
- struct key *key = data;
int ret;
_enter("{%x},{%lu},{%lu}", key_serial(key), inode->i_ino, page->index);
@@ -249,6 +248,13 @@
return ret;
}
+int afs_page_filler(struct file *data, struct page *page)
+{
+ struct key *key = (struct key *)data;
+
+ return __afs_page_filler(key, page);
+}
+
/*
* read page from file, directory or symlink, given a file to nominate the key
* to be used
@@ -261,14 +267,14 @@
if (file) {
key = file->private_data;
ASSERT(key != NULL);
- ret = afs_page_filler(key, page);
+ ret = __afs_page_filler(key, page);
} else {
struct inode *inode = page->mapping->host;
key = afs_request_key(AFS_FS_S(inode->i_sb)->volume->cell);
if (IS_ERR(key)) {
ret = PTR_ERR(key);
} else {
- ret = afs_page_filler(key, page);
+ ret = __afs_page_filler(key, page);
key_put(key);
}
}
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 82e1655..ed42f21 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -485,7 +485,7 @@
extern int afs_open(struct inode *, struct file *);
extern int afs_release(struct inode *, struct file *);
-extern int afs_page_filler(void *, struct page *);
+extern int afs_page_filler(struct file *, struct page *);
extern void afs_put_read(struct afs_read *);
/*
diff --git a/fs/attr.c b/fs/attr.c
index 12ffdb6..f5510d9 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -202,7 +202,7 @@
* the file open for write, as there can be no conflicting delegation in
* that case.
*/
-int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
+int notify_change2(struct vfsmount *mnt, struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
{
struct inode *inode = dentry->d_inode;
umode_t mode = inode->i_mode;
@@ -226,7 +226,7 @@
return -EPERM;
if (!inode_owner_or_capable(inode)) {
- error = inode_permission(inode, MAY_WRITE);
+ error = inode_permission2(mnt, inode, MAY_WRITE);
if (error)
return error;
}
@@ -309,7 +309,9 @@
if (error)
return error;
- if (inode->i_op->setattr)
+ if (mnt && inode->i_op->setattr2)
+ error = inode->i_op->setattr2(mnt, dentry, attr);
+ else if (inode->i_op->setattr)
error = inode->i_op->setattr(dentry, attr);
else
error = simple_setattr(dentry, attr);
@@ -322,4 +324,10 @@
return error;
}
+EXPORT_SYMBOL(notify_change2);
+
+int notify_change(struct dentry * dentry, struct iattr * attr, struct inode **delegated_inode)
+{
+ return notify_change2(NULL, dentry, attr, delegated_inode);
+}
EXPORT_SYMBOL(notify_change);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5b62e06..6d0af1f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3819,8 +3819,8 @@
if (wbc->sync_mode == WB_SYNC_ALL)
tag_pages_for_writeback(mapping, index, end);
while (!done && !nr_to_write_done && (index <= end) &&
- (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
+ (nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+ tag))) {
unsigned i;
scanned = 1;
@@ -3830,11 +3830,6 @@
if (!PagePrivate(page))
continue;
- if (!wbc->range_cyclic && page->index > end) {
- done = 1;
- break;
- }
-
spin_lock(&mapping->private_lock);
if (!PagePrivate(page)) {
spin_unlock(&mapping->private_lock);
@@ -3966,8 +3961,8 @@
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && !nr_to_write_done && (index <= end) &&
- (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1))) {
+ (nr_pages = pagevec_lookup_range_tag(&pvec, mapping,
+ &index, end, tag))) {
unsigned i;
scanned = 1;
@@ -3992,12 +3987,6 @@
continue;
}
- if (!wbc->range_cyclic && page->index > end) {
- done = 1;
- unlock_page(page);
- continue;
- }
-
if (wbc->sync_mode != WB_SYNC_NONE) {
if (PageWriteback(page))
flush_fn(data);
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index 4d62265..26c682d 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -870,15 +870,10 @@
max_pages = wsize >> PAGE_SHIFT;
get_more_pages:
- pvec_pages = min_t(unsigned, PAGEVEC_SIZE,
- max_pages - locked_pages);
- if (end - index < (u64)(pvec_pages - 1))
- pvec_pages = (unsigned)(end - index) + 1;
-
- pvec_pages = pagevec_lookup_tag(&pvec, mapping, &index,
- PAGECACHE_TAG_DIRTY,
- pvec_pages);
- dout("pagevec_lookup_tag got %d\n", pvec_pages);
+ pvec_pages = pagevec_lookup_range_nr_tag(&pvec, mapping, &index,
+ end, PAGECACHE_TAG_DIRTY,
+ max_pages - locked_pages);
+ dout("pagevec_lookup_range_tag got %d\n", pvec_pages);
if (!pvec_pages && !locked_pages)
break;
for (i = 0; i < pvec_pages && locked_pages < max_pages; i++) {
@@ -896,16 +891,6 @@
unlock_page(page);
continue;
}
- if (page->index > end) {
- dout("end of range %p\n", page);
- /* can't be range_cyclic (1st pass) because
- * end == -1 in that case. */
- stop = true;
- if (ceph_wbc.head_snapc)
- done = true;
- unlock_page(page);
- break;
- }
if (strip_unit_end && (page->index > strip_unit_end)) {
dout("end of strip unit %p\n", page);
unlock_page(page);
@@ -1177,8 +1162,7 @@
index = 0;
while ((index <= end) &&
(nr = pagevec_lookup_tag(&pvec, mapping, &index,
- PAGECACHE_TAG_WRITEBACK,
- PAGEVEC_SIZE))) {
+ PAGECACHE_TAG_WRITEBACK))) {
for (i = 0; i < nr; i++) {
page = pvec.pages[i];
if (page_snap_context(page) != snapc)
diff --git a/fs/coredump.c b/fs/coredump.c
index 52c63d6..4b15f40 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -747,7 +747,7 @@
goto close_fail;
if (!(cprm.file->f_mode & FMODE_CAN_WRITE))
goto close_fail;
- if (do_truncate(cprm.file->f_path.dentry, 0, 0, cprm.file))
+ if (do_truncate2(cprm.file->f_path.mnt, cprm.file->f_path.dentry, 0, 0, cprm.file))
goto close_fail;
}
diff --git a/fs/crypto/Makefile b/fs/crypto/Makefile
index 9f6607f..cb49698 100644
--- a/fs/crypto/Makefile
+++ b/fs/crypto/Makefile
@@ -1,4 +1,4 @@
obj-$(CONFIG_FS_ENCRYPTION) += fscrypto.o
-fscrypto-y := crypto.o fname.o policy.o keyinfo.o
+fscrypto-y := crypto.o fname.o hooks.o keyinfo.o policy.o
fscrypto-$(CONFIG_BLOCK) += bio.o
diff --git a/fs/crypto/bio.c b/fs/crypto/bio.c
index 0d5e6a5..0959044 100644
--- a/fs/crypto/bio.c
+++ b/fs/crypto/bio.c
@@ -26,15 +26,8 @@
#include <linux/namei.h>
#include "fscrypt_private.h"
-/*
- * Call fscrypt_decrypt_page on every single page, reusing the encryption
- * context.
- */
-static void completion_pages(struct work_struct *work)
+static void __fscrypt_decrypt_bio(struct bio *bio, bool done)
{
- struct fscrypt_ctx *ctx =
- container_of(work, struct fscrypt_ctx, r.work);
- struct bio *bio = ctx->r.bio;
struct bio_vec *bv;
int i;
@@ -46,22 +39,38 @@
if (ret) {
WARN_ON_ONCE(1);
SetPageError(page);
- } else {
+ } else if (done) {
SetPageUptodate(page);
}
- unlock_page(page);
+ if (done)
+ unlock_page(page);
}
+}
+
+void fscrypt_decrypt_bio(struct bio *bio)
+{
+ __fscrypt_decrypt_bio(bio, false);
+}
+EXPORT_SYMBOL(fscrypt_decrypt_bio);
+
+static void completion_pages(struct work_struct *work)
+{
+ struct fscrypt_ctx *ctx =
+ container_of(work, struct fscrypt_ctx, r.work);
+ struct bio *bio = ctx->r.bio;
+
+ __fscrypt_decrypt_bio(bio, true);
fscrypt_release_ctx(ctx);
bio_put(bio);
}
-void fscrypt_decrypt_bio_pages(struct fscrypt_ctx *ctx, struct bio *bio)
+void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx, struct bio *bio)
{
INIT_WORK(&ctx->r.work, completion_pages);
ctx->r.bio = bio;
- queue_work(fscrypt_read_workqueue, &ctx->r.work);
+ fscrypt_enqueue_decrypt_work(&ctx->r.work);
}
-EXPORT_SYMBOL(fscrypt_decrypt_bio_pages);
+EXPORT_SYMBOL(fscrypt_enqueue_decrypt_bio);
void fscrypt_pullback_bio_page(struct page **page, bool restore)
{
diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index daf2683..0f46cf5 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -27,6 +27,7 @@
#include <linux/dcache.h>
#include <linux/namei.h>
#include <crypto/aes.h>
+#include <crypto/skcipher.h>
#include "fscrypt_private.h"
static unsigned int num_prealloc_crypto_pages = 32;
@@ -44,12 +45,18 @@
static LIST_HEAD(fscrypt_free_ctxs);
static DEFINE_SPINLOCK(fscrypt_ctx_lock);
-struct workqueue_struct *fscrypt_read_workqueue;
+static struct workqueue_struct *fscrypt_read_workqueue;
static DEFINE_MUTEX(fscrypt_init_mutex);
static struct kmem_cache *fscrypt_ctx_cachep;
struct kmem_cache *fscrypt_info_cachep;
+void fscrypt_enqueue_decrypt_work(struct work_struct *work)
+{
+ queue_work(fscrypt_read_workqueue, work);
+}
+EXPORT_SYMBOL(fscrypt_enqueue_decrypt_work);
+
/**
* fscrypt_release_ctx() - Releases an encryption context
* @ctx: The encryption context to release.
@@ -126,21 +133,6 @@
}
EXPORT_SYMBOL(fscrypt_get_ctx);
-/**
- * page_crypt_complete() - completion callback for page crypto
- * @req: The asynchronous cipher request context
- * @res: The result of the cipher operation
- */
-static void page_crypt_complete(struct crypto_async_request *req, int res)
-{
- struct fscrypt_completion_result *ecr = req->data;
-
- if (res == -EINPROGRESS)
- return;
- ecr->res = res;
- complete(&ecr->completion);
-}
-
int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
u64 lblk_num, struct page *src_page,
struct page *dest_page, unsigned int len,
@@ -151,7 +143,7 @@
u8 padding[FS_IV_SIZE - sizeof(__le64)];
} iv;
struct skcipher_request *req = NULL;
- DECLARE_FS_COMPLETION_RESULT(ecr);
+ DECLARE_CRYPTO_WAIT(wait);
struct scatterlist dst, src;
struct fscrypt_info *ci = inode->i_crypt_info;
struct crypto_skcipher *tfm = ci->ci_ctfm;
@@ -170,16 +162,12 @@
}
req = skcipher_request_alloc(tfm, gfp_flags);
- if (!req) {
- printk_ratelimited(KERN_ERR
- "%s: crypto_request_alloc() failed\n",
- __func__);
+ if (!req)
return -ENOMEM;
- }
skcipher_request_set_callback(
req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
- page_crypt_complete, &ecr);
+ crypto_req_done, &wait);
sg_init_table(&dst, 1);
sg_set_page(&dst, dest_page, len, offs);
@@ -187,19 +175,15 @@
sg_set_page(&src, src_page, len, offs);
skcipher_request_set_crypt(req, &src, &dst, len, &iv);
if (rw == FS_DECRYPT)
- res = crypto_skcipher_decrypt(req);
+ res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
else
- res = crypto_skcipher_encrypt(req);
- if (res == -EINPROGRESS || res == -EBUSY) {
- BUG_ON(req->base.data != &ecr);
- wait_for_completion(&ecr.completion);
- res = ecr.res;
- }
+ res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
skcipher_request_free(req);
if (res) {
- printk_ratelimited(KERN_ERR
- "%s: crypto_skcipher_encrypt() returned %d\n",
- __func__, res);
+ fscrypt_err(inode->i_sb,
+ "%scryption failed for inode %lu, block %llu: %d",
+ (rw == FS_DECRYPT ? "de" : "en"),
+ inode->i_ino, lblk_num, res);
return res;
}
return 0;
@@ -340,12 +324,11 @@
return -ECHILD;
dir = dget_parent(dentry);
- if (!d_inode(dir)->i_sb->s_cop->is_encrypted(d_inode(dir))) {
+ if (!IS_ENCRYPTED(d_inode(dir))) {
dput(dir);
return 0;
}
- /* this should eventually be an flag in d_flags */
spin_lock(&dentry->d_lock);
cached_with_key = dentry->d_flags & DCACHE_ENCRYPTED_WITH_KEY;
spin_unlock(&dentry->d_lock);
@@ -372,7 +355,6 @@
const struct dentry_operations fscrypt_d_ops = {
.d_revalidate = fscrypt_d_revalidate,
};
-EXPORT_SYMBOL(fscrypt_d_ops);
void fscrypt_restore_control_page(struct page *page)
{
@@ -441,6 +423,27 @@
return res;
}
+void fscrypt_msg(struct super_block *sb, const char *level,
+ const char *fmt, ...)
+{
+ static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+ struct va_format vaf;
+ va_list args;
+
+ if (!__ratelimit(&rs))
+ return;
+
+ va_start(args, fmt);
+ vaf.fmt = fmt;
+ vaf.va = &args;
+ if (sb)
+ printk("%sfscrypt (%s): %pV\n", level, sb->s_id, &vaf);
+ else
+ printk("%sfscrypt: %pV\n", level, &vaf);
+ va_end(args);
+}
+
/**
* fscrypt_init() - Set up for fs encryption.
*/
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index 8606da1..d7a0f68 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -13,89 +13,70 @@
#include <linux/scatterlist.h>
#include <linux/ratelimit.h>
+#include <crypto/skcipher.h>
#include "fscrypt_private.h"
-/**
- * fname_crypt_complete() - completion callback for filename crypto
- * @req: The asynchronous cipher request context
- * @res: The result of the cipher operation
- */
-static void fname_crypt_complete(struct crypto_async_request *req, int res)
+static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
{
- struct fscrypt_completion_result *ecr = req->data;
+ if (str->len == 1 && str->name[0] == '.')
+ return true;
- if (res == -EINPROGRESS)
- return;
- ecr->res = res;
- complete(&ecr->completion);
+ if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.')
+ return true;
+
+ return false;
}
/**
* fname_encrypt() - encrypt a filename
*
- * The caller must have allocated sufficient memory for the @oname string.
+ * The output buffer must be at least as large as the input buffer.
+ * Any extra space is filled with NUL padding before encryption.
*
* Return: 0 on success, -errno on failure
*/
-static int fname_encrypt(struct inode *inode,
- const struct qstr *iname, struct fscrypt_str *oname)
+int fname_encrypt(struct inode *inode, const struct qstr *iname,
+ u8 *out, unsigned int olen)
{
struct skcipher_request *req = NULL;
- DECLARE_FS_COMPLETION_RESULT(ecr);
- struct fscrypt_info *ci = inode->i_crypt_info;
- struct crypto_skcipher *tfm = ci->ci_ctfm;
+ DECLARE_CRYPTO_WAIT(wait);
+ struct crypto_skcipher *tfm = inode->i_crypt_info->ci_ctfm;
int res = 0;
char iv[FS_CRYPTO_BLOCK_SIZE];
struct scatterlist sg;
- int padding = 4 << (ci->ci_flags & FS_POLICY_FLAGS_PAD_MASK);
- unsigned int lim;
- unsigned int cryptlen;
-
- lim = inode->i_sb->s_cop->max_namelen(inode);
- if (iname->len <= 0 || iname->len > lim)
- return -EIO;
/*
* Copy the filename to the output buffer for encrypting in-place and
* pad it with the needed number of NUL bytes.
*/
- cryptlen = max_t(unsigned int, iname->len, FS_CRYPTO_BLOCK_SIZE);
- cryptlen = round_up(cryptlen, padding);
- cryptlen = min(cryptlen, lim);
- memcpy(oname->name, iname->name, iname->len);
- memset(oname->name + iname->len, 0, cryptlen - iname->len);
+ if (WARN_ON(olen < iname->len))
+ return -ENOBUFS;
+ memcpy(out, iname->name, iname->len);
+ memset(out + iname->len, 0, olen - iname->len);
/* Initialize the IV */
memset(iv, 0, FS_CRYPTO_BLOCK_SIZE);
/* Set up the encryption request */
req = skcipher_request_alloc(tfm, GFP_NOFS);
- if (!req) {
- printk_ratelimited(KERN_ERR
- "%s: skcipher_request_alloc() failed\n", __func__);
+ if (!req)
return -ENOMEM;
- }
skcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
- fname_crypt_complete, &ecr);
- sg_init_one(&sg, oname->name, cryptlen);
- skcipher_request_set_crypt(req, &sg, &sg, cryptlen, iv);
+ crypto_req_done, &wait);
+ sg_init_one(&sg, out, olen);
+ skcipher_request_set_crypt(req, &sg, &sg, olen, iv);
/* Do the encryption */
- res = crypto_skcipher_encrypt(req);
- if (res == -EINPROGRESS || res == -EBUSY) {
- /* Request is being completed asynchronously; wait for it */
- wait_for_completion(&ecr.completion);
- res = ecr.res;
- }
+ res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
skcipher_request_free(req);
if (res < 0) {
- printk_ratelimited(KERN_ERR
- "%s: Error (error code %d)\n", __func__, res);
+ fscrypt_err(inode->i_sb,
+ "Filename encryption failed for inode %lu: %d",
+ inode->i_ino, res);
return res;
}
- oname->len = cryptlen;
return 0;
}
@@ -111,28 +92,19 @@
struct fscrypt_str *oname)
{
struct skcipher_request *req = NULL;
- DECLARE_FS_COMPLETION_RESULT(ecr);
+ DECLARE_CRYPTO_WAIT(wait);
struct scatterlist src_sg, dst_sg;
- struct fscrypt_info *ci = inode->i_crypt_info;
- struct crypto_skcipher *tfm = ci->ci_ctfm;
+ struct crypto_skcipher *tfm = inode->i_crypt_info->ci_ctfm;
int res = 0;
char iv[FS_CRYPTO_BLOCK_SIZE];
- unsigned lim;
-
- lim = inode->i_sb->s_cop->max_namelen(inode);
- if (iname->len <= 0 || iname->len > lim)
- return -EIO;
/* Allocate request */
req = skcipher_request_alloc(tfm, GFP_NOFS);
- if (!req) {
- printk_ratelimited(KERN_ERR
- "%s: crypto_request_alloc() failed\n", __func__);
+ if (!req)
return -ENOMEM;
- }
skcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
- fname_crypt_complete, &ecr);
+ crypto_req_done, &wait);
/* Initialize IV */
memset(iv, 0, FS_CRYPTO_BLOCK_SIZE);
@@ -141,15 +113,12 @@
sg_init_one(&src_sg, iname->name, iname->len);
sg_init_one(&dst_sg, oname->name, oname->len);
skcipher_request_set_crypt(req, &src_sg, &dst_sg, iname->len, iv);
- res = crypto_skcipher_decrypt(req);
- if (res == -EINPROGRESS || res == -EBUSY) {
- wait_for_completion(&ecr.completion);
- res = ecr.res;
- }
+ res = crypto_wait_req(crypto_skcipher_decrypt(req), &wait);
skcipher_request_free(req);
if (res < 0) {
- printk_ratelimited(KERN_ERR
- "%s: Error (error code %d)\n", __func__, res);
+ fscrypt_err(inode->i_sb,
+ "Filename decryption failed for inode %lu: %d",
+ inode->i_ino, res);
return res;
}
@@ -212,50 +181,52 @@
return cp - dst;
}
-u32 fscrypt_fname_encrypted_size(const struct inode *inode, u32 ilen)
+bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len,
+ u32 max_len, u32 *encrypted_len_ret)
{
- int padding = 32;
- struct fscrypt_info *ci = inode->i_crypt_info;
+ int padding = 4 << (inode->i_crypt_info->ci_flags &
+ FS_POLICY_FLAGS_PAD_MASK);
+ u32 encrypted_len;
- if (ci)
- padding = 4 << (ci->ci_flags & FS_POLICY_FLAGS_PAD_MASK);
- ilen = max(ilen, (u32)FS_CRYPTO_BLOCK_SIZE);
- return round_up(ilen, padding);
+ if (orig_len > max_len)
+ return false;
+ encrypted_len = max(orig_len, (u32)FS_CRYPTO_BLOCK_SIZE);
+ encrypted_len = round_up(encrypted_len, padding);
+ *encrypted_len_ret = min(encrypted_len, max_len);
+ return true;
}
-EXPORT_SYMBOL(fscrypt_fname_encrypted_size);
/**
- * fscrypt_fname_crypto_alloc_obuff() -
+ * fscrypt_fname_alloc_buffer - allocate a buffer for presented filenames
*
- * Allocates an output buffer that is sufficient for the crypto operation
- * specified by the context and the direction.
+ * Allocate a buffer that is large enough to hold any decrypted or encoded
+ * filename (null-terminated), for the given maximum encrypted filename length.
+ *
+ * Return: 0 on success, -errno on failure
*/
int fscrypt_fname_alloc_buffer(const struct inode *inode,
- u32 ilen, struct fscrypt_str *crypto_str)
+ u32 max_encrypted_len,
+ struct fscrypt_str *crypto_str)
{
- u32 olen = fscrypt_fname_encrypted_size(inode, ilen);
const u32 max_encoded_len =
max_t(u32, BASE64_CHARS(FSCRYPT_FNAME_MAX_UNDIGESTED_SIZE),
1 + BASE64_CHARS(sizeof(struct fscrypt_digested_name)));
+ u32 max_presented_len;
- crypto_str->len = olen;
- olen = max(olen, max_encoded_len);
+ max_presented_len = max(max_encoded_len, max_encrypted_len);
- /*
- * Allocated buffer can hold one more character to null-terminate the
- * string
- */
- crypto_str->name = kmalloc(olen + 1, GFP_NOFS);
- if (!(crypto_str->name))
+ crypto_str->name = kmalloc(max_presented_len + 1, GFP_NOFS);
+ if (!crypto_str->name)
return -ENOMEM;
+ crypto_str->len = max_presented_len;
return 0;
}
EXPORT_SYMBOL(fscrypt_fname_alloc_buffer);
/**
- * fscrypt_fname_crypto_free_buffer() -
+ * fscrypt_fname_free_buffer - free the buffer for presented filenames
*
- * Frees the buffer allocated for crypto operation.
+ * Free the buffer allocated by fscrypt_fname_alloc_buffer().
*/
void fscrypt_fname_free_buffer(struct fscrypt_str *crypto_str)
{
@@ -322,35 +293,6 @@
EXPORT_SYMBOL(fscrypt_fname_disk_to_usr);
/**
- * fscrypt_fname_usr_to_disk() - converts a filename from user space to disk
- * space
- *
- * The caller must have allocated sufficient memory for the @oname string.
- *
- * Return: 0 on success, -errno on failure
- */
-int fscrypt_fname_usr_to_disk(struct inode *inode,
- const struct qstr *iname,
- struct fscrypt_str *oname)
-{
- if (fscrypt_is_dot_dotdot(iname)) {
- oname->name[0] = '.';
- oname->name[iname->len - 1] = '.';
- oname->len = iname->len;
- return 0;
- }
- if (inode->i_crypt_info)
- return fname_encrypt(inode, iname, oname);
- /*
- * Without a proper key, a user is not allowed to modify the filenames
- * in a directory. Consequently, a user space name cannot be mapped to
- * a disk-space name
- */
- return -ENOKEY;
-}
-EXPORT_SYMBOL(fscrypt_fname_usr_to_disk);
-
-/**
* fscrypt_setup_filename() - prepare to search a possibly encrypted directory
* @dir: the directory that will be searched
* @iname: the user-provided filename being searched for
@@ -383,22 +325,27 @@
memset(fname, 0, sizeof(struct fscrypt_name));
fname->usr_fname = iname;
- if (!dir->i_sb->s_cop->is_encrypted(dir) ||
- fscrypt_is_dot_dotdot(iname)) {
+ if (!IS_ENCRYPTED(dir) || fscrypt_is_dot_dotdot(iname)) {
fname->disk_name.name = (unsigned char *)iname->name;
fname->disk_name.len = iname->len;
return 0;
}
ret = fscrypt_get_encryption_info(dir);
- if (ret && ret != -EOPNOTSUPP)
+ if (ret)
return ret;
if (dir->i_crypt_info) {
- ret = fscrypt_fname_alloc_buffer(dir, iname->len,
- &fname->crypto_buf);
- if (ret)
- return ret;
- ret = fname_encrypt(dir, iname, &fname->crypto_buf);
+ if (!fscrypt_fname_encrypted_size(dir, iname->len,
+ dir->i_sb->s_cop->max_namelen,
+ &fname->crypto_buf.len))
+ return -ENAMETOOLONG;
+ fname->crypto_buf.name = kmalloc(fname->crypto_buf.len,
+ GFP_NOFS);
+ if (!fname->crypto_buf.name)
+ return -ENOMEM;
+
+ ret = fname_encrypt(dir, iname, fname->crypto_buf.name,
+ fname->crypto_buf.len);
if (ret)
goto errout;
fname->disk_name.name = fname->crypto_buf.name;
@@ -450,7 +397,7 @@
return 0;
errout:
- fscrypt_fname_free_buffer(&fname->crypto_buf);
+ kfree(fname->crypto_buf.name);
return ret;
}
EXPORT_SYMBOL(fscrypt_setup_filename);
diff --git a/fs/crypto/fscrypt_private.h b/fs/crypto/fscrypt_private.h
index 092e9da..39c20ef 100644
--- a/fs/crypto/fscrypt_private.h
+++ b/fs/crypto/fscrypt_private.h
@@ -12,20 +12,13 @@
#ifndef _FSCRYPT_PRIVATE_H
#define _FSCRYPT_PRIVATE_H
-#include <linux/fscrypt_supp.h>
+#define __FS_HAS_ENCRYPTION 1
+#include <linux/fscrypt.h>
#include <crypto/hash.h>
/* Encryption parameters */
#define FS_IV_SIZE 16
-#define FS_AES_128_ECB_KEY_SIZE 16
-#define FS_AES_128_CBC_KEY_SIZE 16
-#define FS_AES_128_CTS_KEY_SIZE 16
-#define FS_AES_256_GCM_KEY_SIZE 32
-#define FS_AES_256_CBC_KEY_SIZE 32
-#define FS_AES_256_CTS_KEY_SIZE 32
-#define FS_AES_256_XTS_KEY_SIZE 64
-
-#define FS_KEY_DERIVATION_NONCE_SIZE 16
+#define FS_KEY_DERIVATION_NONCE_SIZE 16
/**
* Encryption context for inode
@@ -49,6 +42,15 @@
#define FS_ENCRYPTION_CONTEXT_FORMAT_V1 1
+/**
+ * For encrypted symlinks, the ciphertext length is stored at the beginning
+ * of the string in little-endian format.
+ */
+struct fscrypt_symlink_data {
+ __le16 len;
+ char encrypted_path[1];
+} __packed;
+
/*
* A pointer to this structure is stored in the file system's in-core
* representation of an inode.
@@ -70,19 +72,27 @@
#define FS_CTX_REQUIRES_FREE_ENCRYPT_FL 0x00000001
#define FS_CTX_HAS_BOUNCE_BUFFER_FL 0x00000002
-struct fscrypt_completion_result {
- struct completion completion;
- int res;
-};
+static inline bool fscrypt_valid_enc_modes(u32 contents_mode,
+ u32 filenames_mode)
+{
+ if (contents_mode == FS_ENCRYPTION_MODE_AES_128_CBC &&
+ filenames_mode == FS_ENCRYPTION_MODE_AES_128_CTS)
+ return true;
-#define DECLARE_FS_COMPLETION_RESULT(ecr) \
- struct fscrypt_completion_result ecr = { \
- COMPLETION_INITIALIZER_ONSTACK((ecr).completion), 0 }
+ if (contents_mode == FS_ENCRYPTION_MODE_AES_256_XTS &&
+ filenames_mode == FS_ENCRYPTION_MODE_AES_256_CTS)
+ return true;
+ if (contents_mode == FS_ENCRYPTION_MODE_SPECK128_256_XTS &&
+ filenames_mode == FS_ENCRYPTION_MODE_SPECK128_256_CTS)
+ return true;
+
+ return false;
+}
/* crypto.c */
+extern struct kmem_cache *fscrypt_info_cachep;
extern int fscrypt_initialize(unsigned int cop_flags);
-extern struct workqueue_struct *fscrypt_read_workqueue;
extern int fscrypt_do_page_crypto(const struct inode *inode,
fscrypt_direction_t rw, u64 lblk_num,
struct page *src_page,
@@ -91,6 +101,22 @@
gfp_t gfp_flags);
extern struct page *fscrypt_alloc_bounce_page(struct fscrypt_ctx *ctx,
gfp_t gfp_flags);
+extern const struct dentry_operations fscrypt_d_ops;
+
+extern void __printf(3, 4) __cold
+fscrypt_msg(struct super_block *sb, const char *level, const char *fmt, ...);
+
+#define fscrypt_warn(sb, fmt, ...) \
+ fscrypt_msg(sb, KERN_WARNING, fmt, ##__VA_ARGS__)
+#define fscrypt_err(sb, fmt, ...) \
+ fscrypt_msg(sb, KERN_ERR, fmt, ##__VA_ARGS__)
+
+/* fname.c */
+extern int fname_encrypt(struct inode *inode, const struct qstr *iname,
+ u8 *out, unsigned int olen);
+extern bool fscrypt_fname_encrypted_size(const struct inode *inode,
+ u32 orig_len, u32 max_len,
+ u32 *encrypted_len_ret);
/* keyinfo.c */
extern void __exit fscrypt_essiv_cleanup(void);
diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c
new file mode 100644
index 0000000..926e5df
--- /dev/null
+++ b/fs/crypto/hooks.c
@@ -0,0 +1,271 @@
+/*
+ * fs/crypto/hooks.c
+ *
+ * Encryption hooks for higher-level filesystem operations.
+ */
+
+#include <linux/ratelimit.h>
+#include "fscrypt_private.h"
+
+/**
+ * fscrypt_file_open - prepare to open a possibly-encrypted regular file
+ * @inode: the inode being opened
+ * @filp: the struct file being set up
+ *
+ * Currently, an encrypted regular file can only be opened if its encryption key
+ * is available; access to the raw encrypted contents is not supported.
+ * Therefore, we first set up the inode's encryption key (if not already done)
+ * and return an error if it's unavailable.
+ *
+ * We also verify that if the parent directory (from the path via which the file
+ * is being opened) is encrypted, then the inode being opened uses the same
+ * encryption policy. This is needed as part of the enforcement that all files
+ * in an encrypted directory tree use the same encryption policy, as a
+ * protection against certain types of offline attacks. Note that this check is
+ * needed even when opening an *unencrypted* file, since it's forbidden to have
+ * an unencrypted file in an encrypted directory.
+ *
+ * Return: 0 on success, -ENOKEY if the key is missing, or another -errno code
+ */
+int fscrypt_file_open(struct inode *inode, struct file *filp)
+{
+ int err;
+ struct dentry *dir;
+
+ err = fscrypt_require_key(inode);
+ if (err)
+ return err;
+
+ dir = dget_parent(file_dentry(filp));
+ if (IS_ENCRYPTED(d_inode(dir)) &&
+ !fscrypt_has_permitted_context(d_inode(dir), inode)) {
+ fscrypt_warn(inode->i_sb,
+ "inconsistent encryption contexts: %lu/%lu",
+ d_inode(dir)->i_ino, inode->i_ino);
+ err = -EPERM;
+ }
+ dput(dir);
+ return err;
+}
+EXPORT_SYMBOL_GPL(fscrypt_file_open);
+
+int __fscrypt_prepare_link(struct inode *inode, struct inode *dir)
+{
+ int err;
+
+ err = fscrypt_require_key(dir);
+ if (err)
+ return err;
+
+ if (!fscrypt_has_permitted_context(dir, inode))
+ return -EPERM;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__fscrypt_prepare_link);
+
+int __fscrypt_prepare_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
+{
+ int err;
+
+ err = fscrypt_require_key(old_dir);
+ if (err)
+ return err;
+
+ err = fscrypt_require_key(new_dir);
+ if (err)
+ return err;
+
+ if (old_dir != new_dir) {
+ if (IS_ENCRYPTED(new_dir) &&
+ !fscrypt_has_permitted_context(new_dir,
+ d_inode(old_dentry)))
+ return -EPERM;
+
+ if ((flags & RENAME_EXCHANGE) &&
+ IS_ENCRYPTED(old_dir) &&
+ !fscrypt_has_permitted_context(old_dir,
+ d_inode(new_dentry)))
+ return -EPERM;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__fscrypt_prepare_rename);
+
+int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry)
+{
+ int err = fscrypt_get_encryption_info(dir);
+
+ if (err)
+ return err;
+
+ if (fscrypt_has_encryption_key(dir)) {
+ spin_lock(&dentry->d_lock);
+ dentry->d_flags |= DCACHE_ENCRYPTED_WITH_KEY;
+ spin_unlock(&dentry->d_lock);
+ }
+
+ d_set_d_op(dentry, &fscrypt_d_ops);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__fscrypt_prepare_lookup);
+
+int __fscrypt_prepare_symlink(struct inode *dir, unsigned int len,
+ unsigned int max_len,
+ struct fscrypt_str *disk_link)
+{
+ int err;
+
+ /*
+ * To calculate the size of the encrypted symlink target we need to know
+ * the amount of NUL padding, which is determined by the flags set in
+ * the encryption policy which will be inherited from the directory.
+ * The easiest way to get access to this is to just load the directory's
+ * fscrypt_info, since we'll need it to create the dir_entry anyway.
+ *
+ * Note: in test_dummy_encryption mode, @dir may be unencrypted.
+ */
+ err = fscrypt_get_encryption_info(dir);
+ if (err)
+ return err;
+ if (!fscrypt_has_encryption_key(dir))
+ return -ENOKEY;
+
+ /*
+ * Calculate the size of the encrypted symlink and verify it won't
+ * exceed max_len. Note that for historical reasons, encrypted symlink
+ * targets are prefixed with the ciphertext length, despite this
+ * actually being redundant with i_size. This decreases by 2 bytes the
+ * longest symlink target we can accept.
+ *
+ * We could recover 1 byte by not counting a null terminator, but
+ * counting it (even though it is meaningless for ciphertext) is simpler
+ * for now since filesystems will assume it is there and subtract it.
+ */
+ if (!fscrypt_fname_encrypted_size(dir, len,
+ max_len - sizeof(struct fscrypt_symlink_data),
+ &disk_link->len))
+ return -ENAMETOOLONG;
+ disk_link->len += sizeof(struct fscrypt_symlink_data);
+
+ disk_link->name = NULL;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__fscrypt_prepare_symlink);
+
+int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
+ unsigned int len, struct fscrypt_str *disk_link)
+{
+ int err;
+ struct qstr iname = QSTR_INIT(target, len);
+ struct fscrypt_symlink_data *sd;
+ unsigned int ciphertext_len;
+
+ err = fscrypt_require_key(inode);
+ if (err)
+ return err;
+
+ if (disk_link->name) {
+ /* filesystem-provided buffer */
+ sd = (struct fscrypt_symlink_data *)disk_link->name;
+ } else {
+ sd = kmalloc(disk_link->len, GFP_NOFS);
+ if (!sd)
+ return -ENOMEM;
+ }
+ ciphertext_len = disk_link->len - sizeof(*sd);
+ sd->len = cpu_to_le16(ciphertext_len);
+
+ err = fname_encrypt(inode, &iname, sd->encrypted_path, ciphertext_len);
+ if (err) {
+ if (!disk_link->name)
+ kfree(sd);
+ return err;
+ }
+ /*
+ * Null-terminating the ciphertext doesn't make sense, but we still
+ * count the null terminator in the length, so we might as well
+ * initialize it just in case the filesystem writes it out.
+ */
+ sd->encrypted_path[ciphertext_len] = '\0';
+
+ if (!disk_link->name)
+ disk_link->name = (unsigned char *)sd;
+ return 0;
+}
+EXPORT_SYMBOL_GPL(__fscrypt_encrypt_symlink);
+
+/**
+ * fscrypt_get_symlink - get the target of an encrypted symlink
+ * @inode: the symlink inode
+ * @caddr: the on-disk contents of the symlink
+ * @max_size: size of @caddr buffer
+ * @done: if successful, will be set up to free the returned target
+ *
+ * If the symlink's encryption key is available, we decrypt its target.
+ * Otherwise, we encode its target for presentation.
+ *
+ * This may sleep, so the filesystem must have dropped out of RCU mode already.
+ *
+ * Return: the presentable symlink target or an ERR_PTR()
+ */
+const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
+ unsigned int max_size,
+ struct delayed_call *done)
+{
+ const struct fscrypt_symlink_data *sd;
+ struct fscrypt_str cstr, pstr;
+ int err;
+
+ /* This is for encrypted symlinks only */
+ if (WARN_ON(!IS_ENCRYPTED(inode)))
+ return ERR_PTR(-EINVAL);
+
+ /*
+ * Try to set up the symlink's encryption key, but we can continue
+ * regardless of whether the key is available or not.
+ */
+ err = fscrypt_get_encryption_info(inode);
+ if (err)
+ return ERR_PTR(err);
+
+ /*
+ * For historical reasons, encrypted symlink targets are prefixed with
+ * the ciphertext length, even though this is redundant with i_size.
+ */
+
+ if (max_size < sizeof(*sd))
+ return ERR_PTR(-EUCLEAN);
+ sd = caddr;
+ cstr.name = (unsigned char *)sd->encrypted_path;
+ cstr.len = le16_to_cpu(sd->len);
+
+ if (cstr.len == 0)
+ return ERR_PTR(-EUCLEAN);
+
+ if (cstr.len + sizeof(*sd) - 1 > max_size)
+ return ERR_PTR(-EUCLEAN);
+
+ err = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr);
+ if (err)
+ return ERR_PTR(err);
+
+ err = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr);
+ if (err)
+ goto err_kfree;
+
+ err = -EUCLEAN;
+ if (pstr.name[0] == '\0')
+ goto err_kfree;
+
+ pstr.name[pstr.len] = '\0';
+ set_delayed_call(done, kfree_link, pstr.name);
+ return pstr.name;
+
+err_kfree:
+ kfree(pstr.name);
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(fscrypt_get_symlink);
diff --git a/fs/crypto/keyinfo.c b/fs/crypto/keyinfo.c
index a386302..e997ca5 100644
--- a/fs/crypto/keyinfo.c
+++ b/fs/crypto/keyinfo.c
@@ -14,36 +14,25 @@
#include <linux/ratelimit.h>
#include <crypto/aes.h>
#include <crypto/sha.h>
+#include <crypto/skcipher.h>
#include "fscrypt_private.h"
static struct crypto_shash *essiv_hash_tfm;
-static void derive_crypt_complete(struct crypto_async_request *req, int rc)
-{
- struct fscrypt_completion_result *ecr = req->data;
-
- if (rc == -EINPROGRESS)
- return;
-
- ecr->res = rc;
- complete(&ecr->completion);
-}
-
-/**
- * derive_key_aes() - Derive a key using AES-128-ECB
- * @deriving_key: Encryption key used for derivation.
- * @source_key: Source key to which to apply derivation.
- * @derived_raw_key: Derived raw key.
+/*
+ * Key derivation function. This generates the derived key by encrypting the
+ * master key with AES-128-ECB using the inode's nonce as the AES key.
*
- * Return: Zero on success; non-zero otherwise.
+ * The master key must be at least as long as the derived key. If the master
+ * key is longer, then only the first 'derived_keysize' bytes are used.
*/
-static int derive_key_aes(u8 deriving_key[FS_AES_128_ECB_KEY_SIZE],
- const struct fscrypt_key *source_key,
- u8 derived_raw_key[FS_MAX_KEY_SIZE])
+static int derive_key_aes(const u8 *master_key,
+ const struct fscrypt_context *ctx,
+ u8 *derived_key, unsigned int derived_keysize)
{
int res = 0;
struct skcipher_request *req = NULL;
- DECLARE_FS_COMPLETION_RESULT(ecr);
+ DECLARE_CRYPTO_WAIT(wait);
struct scatterlist src_sg, dst_sg;
struct crypto_skcipher *tfm = crypto_alloc_skcipher("ecb(aes)", 0, 0);
@@ -60,122 +49,163 @@
}
skcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
- derive_crypt_complete, &ecr);
- res = crypto_skcipher_setkey(tfm, deriving_key,
- FS_AES_128_ECB_KEY_SIZE);
+ crypto_req_done, &wait);
+ res = crypto_skcipher_setkey(tfm, ctx->nonce, sizeof(ctx->nonce));
if (res < 0)
goto out;
- sg_init_one(&src_sg, source_key->raw, source_key->size);
- sg_init_one(&dst_sg, derived_raw_key, source_key->size);
- skcipher_request_set_crypt(req, &src_sg, &dst_sg, source_key->size,
+ sg_init_one(&src_sg, master_key, derived_keysize);
+ sg_init_one(&dst_sg, derived_key, derived_keysize);
+ skcipher_request_set_crypt(req, &src_sg, &dst_sg, derived_keysize,
NULL);
- res = crypto_skcipher_encrypt(req);
- if (res == -EINPROGRESS || res == -EBUSY) {
- wait_for_completion(&ecr.completion);
- res = ecr.res;
- }
+ res = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);
out:
skcipher_request_free(req);
crypto_free_skcipher(tfm);
return res;
}
-static int validate_user_key(struct fscrypt_info *crypt_info,
- struct fscrypt_context *ctx, u8 *raw_key,
- const char *prefix, int min_keysize)
+/*
+ * Search the current task's subscribed keyrings for a "logon" key with
+ * description prefix:descriptor, and if found acquire a read lock on it and
+ * return a pointer to its validated payload in *payload_ret.
+ */
+static struct key *
+find_and_lock_process_key(const char *prefix,
+ const u8 descriptor[FS_KEY_DESCRIPTOR_SIZE],
+ unsigned int min_keysize,
+ const struct fscrypt_key **payload_ret)
{
char *description;
- struct key *keyring_key;
- struct fscrypt_key *master_key;
+ struct key *key;
const struct user_key_payload *ukp;
- int res;
+ const struct fscrypt_key *payload;
description = kasprintf(GFP_NOFS, "%s%*phN", prefix,
- FS_KEY_DESCRIPTOR_SIZE,
- ctx->master_key_descriptor);
+ FS_KEY_DESCRIPTOR_SIZE, descriptor);
if (!description)
- return -ENOMEM;
+ return ERR_PTR(-ENOMEM);
- keyring_key = request_key(&key_type_logon, description, NULL);
+ key = request_key(&key_type_logon, description, NULL);
kfree(description);
- if (IS_ERR(keyring_key))
- return PTR_ERR(keyring_key);
- down_read(&keyring_key->sem);
+ if (IS_ERR(key))
+ return key;
- if (keyring_key->type != &key_type_logon) {
- printk_once(KERN_WARNING
- "%s: key type must be logon\n", __func__);
- res = -ENOKEY;
- goto out;
- }
- ukp = user_key_payload_locked(keyring_key);
- if (!ukp) {
- /* key was revoked before we acquired its semaphore */
- res = -EKEYREVOKED;
- goto out;
- }
- if (ukp->datalen != sizeof(struct fscrypt_key)) {
- res = -EINVAL;
- goto out;
- }
- master_key = (struct fscrypt_key *)ukp->data;
- BUILD_BUG_ON(FS_AES_128_ECB_KEY_SIZE != FS_KEY_DERIVATION_NONCE_SIZE);
+ down_read(&key->sem);
+ ukp = user_key_payload_locked(key);
- if (master_key->size < min_keysize || master_key->size > FS_MAX_KEY_SIZE
- || master_key->size % AES_BLOCK_SIZE != 0) {
- printk_once(KERN_WARNING
- "%s: key size incorrect: %d\n",
- __func__, master_key->size);
- res = -ENOKEY;
- goto out;
+ if (!ukp) /* was the key revoked before we acquired its semaphore? */
+ goto invalid;
+
+ payload = (const struct fscrypt_key *)ukp->data;
+
+ if (ukp->datalen != sizeof(struct fscrypt_key) ||
+ payload->size < 1 || payload->size > FS_MAX_KEY_SIZE) {
+ fscrypt_warn(NULL,
+ "key with description '%s' has invalid payload",
+ key->description);
+ goto invalid;
}
- res = derive_key_aes(ctx->nonce, master_key, raw_key);
-out:
- up_read(&keyring_key->sem);
- key_put(keyring_key);
- return res;
+
+ if (payload->size < min_keysize) {
+ fscrypt_warn(NULL,
+ "key with description '%s' is too short (got %u bytes, need %u+ bytes)",
+ key->description, payload->size, min_keysize);
+ goto invalid;
+ }
+
+ *payload_ret = payload;
+ return key;
+
+invalid:
+ up_read(&key->sem);
+ key_put(key);
+ return ERR_PTR(-ENOKEY);
}
-static const struct {
+/* Find the master key, then derive the inode's actual encryption key */
+static int find_and_derive_key(const struct inode *inode,
+ const struct fscrypt_context *ctx,
+ u8 *derived_key, unsigned int derived_keysize)
+{
+ struct key *key;
+ const struct fscrypt_key *payload;
+ int err;
+
+ key = find_and_lock_process_key(FS_KEY_DESC_PREFIX,
+ ctx->master_key_descriptor,
+ derived_keysize, &payload);
+ if (key == ERR_PTR(-ENOKEY) && inode->i_sb->s_cop->key_prefix) {
+ key = find_and_lock_process_key(inode->i_sb->s_cop->key_prefix,
+ ctx->master_key_descriptor,
+ derived_keysize, &payload);
+ }
+ if (IS_ERR(key))
+ return PTR_ERR(key);
+ err = derive_key_aes(payload->raw, ctx, derived_key, derived_keysize);
+ up_read(&key->sem);
+ key_put(key);
+ return err;
+}
+
+static struct fscrypt_mode {
+ const char *friendly_name;
const char *cipher_str;
int keysize;
+ bool logged_impl_name;
} available_modes[] = {
- [FS_ENCRYPTION_MODE_AES_256_XTS] = { "xts(aes)",
- FS_AES_256_XTS_KEY_SIZE },
- [FS_ENCRYPTION_MODE_AES_256_CTS] = { "cts(cbc(aes))",
- FS_AES_256_CTS_KEY_SIZE },
- [FS_ENCRYPTION_MODE_AES_128_CBC] = { "cbc(aes)",
- FS_AES_128_CBC_KEY_SIZE },
- [FS_ENCRYPTION_MODE_AES_128_CTS] = { "cts(cbc(aes))",
- FS_AES_128_CTS_KEY_SIZE },
+ [FS_ENCRYPTION_MODE_AES_256_XTS] = {
+ .friendly_name = "AES-256-XTS",
+ .cipher_str = "xts(aes)",
+ .keysize = 64,
+ },
+ [FS_ENCRYPTION_MODE_AES_256_CTS] = {
+ .friendly_name = "AES-256-CTS-CBC",
+ .cipher_str = "cts(cbc(aes))",
+ .keysize = 32,
+ },
+ [FS_ENCRYPTION_MODE_AES_128_CBC] = {
+ .friendly_name = "AES-128-CBC",
+ .cipher_str = "cbc(aes)",
+ .keysize = 16,
+ },
+ [FS_ENCRYPTION_MODE_AES_128_CTS] = {
+ .friendly_name = "AES-128-CTS-CBC",
+ .cipher_str = "cts(cbc(aes))",
+ .keysize = 16,
+ },
+ [FS_ENCRYPTION_MODE_SPECK128_256_XTS] = {
+ .friendly_name = "Speck128/256-XTS",
+ .cipher_str = "xts(speck128)",
+ .keysize = 64,
+ },
+ [FS_ENCRYPTION_MODE_SPECK128_256_CTS] = {
+ .friendly_name = "Speck128/256-CTS-CBC",
+ .cipher_str = "cts(cbc(speck128))",
+ .keysize = 32,
+ },
};
-static int determine_cipher_type(struct fscrypt_info *ci, struct inode *inode,
- const char **cipher_str_ret, int *keysize_ret)
+static struct fscrypt_mode *
+select_encryption_mode(const struct fscrypt_info *ci, const struct inode *inode)
{
- u32 mode;
-
if (!fscrypt_valid_enc_modes(ci->ci_data_mode, ci->ci_filename_mode)) {
- pr_warn_ratelimited("fscrypt: inode %lu uses unsupported encryption modes (contents mode %d, filenames mode %d)\n",
- inode->i_ino,
- ci->ci_data_mode, ci->ci_filename_mode);
- return -EINVAL;
+ fscrypt_warn(inode->i_sb,
+ "inode %lu uses unsupported encryption modes (contents mode %d, filenames mode %d)",
+ inode->i_ino, ci->ci_data_mode,
+ ci->ci_filename_mode);
+ return ERR_PTR(-EINVAL);
}
- if (S_ISREG(inode->i_mode)) {
- mode = ci->ci_data_mode;
- } else if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) {
- mode = ci->ci_filename_mode;
- } else {
- WARN_ONCE(1, "fscrypt: filesystem tried to load encryption info for inode %lu, which is not encryptable (file type %d)\n",
- inode->i_ino, (inode->i_mode & S_IFMT));
- return -EINVAL;
- }
+ if (S_ISREG(inode->i_mode))
+ return &available_modes[ci->ci_data_mode];
- *cipher_str_ret = available_modes[mode].cipher_str;
- *keysize_ret = available_modes[mode].keysize;
- return 0;
+ if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
+ return &available_modes[ci->ci_filename_mode];
+
+ WARN_ONCE(1, "fscrypt: filesystem tried to load encryption info for inode %lu, which is not encryptable (file type %d)\n",
+ inode->i_ino, (inode->i_mode & S_IFMT));
+ return ERR_PTR(-EINVAL);
}
static void put_crypt_info(struct fscrypt_info *ci)
@@ -198,8 +228,9 @@
tfm = crypto_alloc_shash("sha256", 0, 0);
if (IS_ERR(tfm)) {
- pr_warn_ratelimited("fscrypt: error allocating SHA-256 transform: %ld\n",
- PTR_ERR(tfm));
+ fscrypt_warn(NULL,
+ "error allocating SHA-256 transform: %ld",
+ PTR_ERR(tfm));
return PTR_ERR(tfm);
}
prev_tfm = cmpxchg(&essiv_hash_tfm, NULL, tfm);
@@ -259,8 +290,7 @@
struct fscrypt_info *crypt_info;
struct fscrypt_context ctx;
struct crypto_skcipher *ctfm;
- const char *cipher_str;
- int keysize;
+ struct fscrypt_mode *mode;
u8 *raw_key = NULL;
int res;
@@ -274,7 +304,7 @@
res = inode->i_sb->s_cop->get_context(inode, &ctx, sizeof(ctx));
if (res < 0) {
if (!fscrypt_dummy_context_enabled(inode) ||
- inode->i_sb->s_cop->is_encrypted(inode))
+ IS_ENCRYPTED(inode))
return res;
/* Fake up a context for an unencrypted directory */
memset(&ctx, 0, sizeof(ctx));
@@ -304,57 +334,59 @@
memcpy(crypt_info->ci_master_key, ctx.master_key_descriptor,
sizeof(crypt_info->ci_master_key));
- res = determine_cipher_type(crypt_info, inode, &cipher_str, &keysize);
- if (res)
+ mode = select_encryption_mode(crypt_info, inode);
+ if (IS_ERR(mode)) {
+ res = PTR_ERR(mode);
goto out;
+ }
/*
* This cannot be a stack buffer because it is passed to the scatterlist
* crypto API as part of key derivation.
*/
res = -ENOMEM;
- raw_key = kmalloc(FS_MAX_KEY_SIZE, GFP_NOFS);
+ raw_key = kmalloc(mode->keysize, GFP_NOFS);
if (!raw_key)
goto out;
- res = validate_user_key(crypt_info, &ctx, raw_key, FS_KEY_DESC_PREFIX,
- keysize);
- if (res && inode->i_sb->s_cop->key_prefix) {
- int res2 = validate_user_key(crypt_info, &ctx, raw_key,
- inode->i_sb->s_cop->key_prefix,
- keysize);
- if (res2) {
- if (res2 == -ENOKEY)
- res = -ENOKEY;
- goto out;
- }
- } else if (res) {
+ res = find_and_derive_key(inode, &ctx, raw_key, mode->keysize);
+ if (res)
+ goto out;
+
+ ctfm = crypto_alloc_skcipher(mode->cipher_str, 0, 0);
+ if (IS_ERR(ctfm)) {
+ res = PTR_ERR(ctfm);
+ fscrypt_warn(inode->i_sb,
+ "error allocating '%s' transform for inode %lu: %d",
+ mode->cipher_str, inode->i_ino, res);
goto out;
}
- ctfm = crypto_alloc_skcipher(cipher_str, 0, 0);
- if (!ctfm || IS_ERR(ctfm)) {
- res = ctfm ? PTR_ERR(ctfm) : -ENOMEM;
- pr_debug("%s: error %d (inode %lu) allocating crypto tfm\n",
- __func__, res, inode->i_ino);
- goto out;
+ if (unlikely(!mode->logged_impl_name)) {
+ /*
+ * fscrypt performance can vary greatly depending on which
+ * crypto algorithm implementation is used. Help people debug
+ * performance problems by logging the ->cra_driver_name the
+ * first time a mode is used. Note that multiple threads can
+ * race here, but it doesn't really matter.
+ */
+ mode->logged_impl_name = true;
+ pr_info("fscrypt: %s using implementation \"%s\"\n",
+ mode->friendly_name,
+ crypto_skcipher_alg(ctfm)->base.cra_driver_name);
}
crypt_info->ci_ctfm = ctfm;
- crypto_skcipher_clear_flags(ctfm, ~0);
crypto_skcipher_set_flags(ctfm, CRYPTO_TFM_REQ_WEAK_KEY);
- /*
- * if the provided key is longer than keysize, we use the first
- * keysize bytes of the derived key only
- */
- res = crypto_skcipher_setkey(ctfm, raw_key, keysize);
+ res = crypto_skcipher_setkey(ctfm, raw_key, mode->keysize);
if (res)
goto out;
if (S_ISREG(inode->i_mode) &&
crypt_info->ci_data_mode == FS_ENCRYPTION_MODE_AES_128_CBC) {
- res = init_essiv_generator(crypt_info, raw_key, keysize);
+ res = init_essiv_generator(crypt_info, raw_key, mode->keysize);
if (res) {
- pr_debug("%s: error %d (inode %lu) allocating essiv tfm\n",
- __func__, res, inode->i_ino);
+ fscrypt_warn(inode->i_sb,
+ "error initializing ESSIV generator for inode %lu: %d",
+ inode->i_ino, res);
goto out;
}
}
@@ -369,19 +401,9 @@
}
EXPORT_SYMBOL(fscrypt_get_encryption_info);
-void fscrypt_put_encryption_info(struct inode *inode, struct fscrypt_info *ci)
+void fscrypt_put_encryption_info(struct inode *inode)
{
- struct fscrypt_info *prev;
-
- if (ci == NULL)
- ci = ACCESS_ONCE(inode->i_crypt_info);
- if (ci == NULL)
- return;
-
- prev = cmpxchg(&inode->i_crypt_info, ci, NULL);
- if (prev != ci)
- return;
-
- put_crypt_info(ci);
+ put_crypt_info(inode->i_crypt_info);
+ inode->i_crypt_info = NULL;
}
EXPORT_SYMBOL(fscrypt_put_encryption_info);
diff --git a/fs/crypto/policy.c b/fs/crypto/policy.c
index a120649b..c6d431a 100644
--- a/fs/crypto/policy.c
+++ b/fs/crypto/policy.c
@@ -110,7 +110,7 @@
struct fscrypt_policy policy;
int res;
- if (!inode->i_sb->s_cop->is_encrypted(inode))
+ if (!IS_ENCRYPTED(inode))
return -ENODATA;
res = inode->i_sb->s_cop->get_context(inode, &ctx, sizeof(ctx));
@@ -167,11 +167,11 @@
return 1;
/* No restrictions if the parent directory is unencrypted */
- if (!cops->is_encrypted(parent))
+ if (!IS_ENCRYPTED(parent))
return 1;
/* Encrypted directories must not contain unencrypted files */
- if (!cops->is_encrypted(child))
+ if (!IS_ENCRYPTED(child))
return 0;
/*
diff --git a/fs/dcache.c b/fs/dcache.c
index c1a7c17..765ea65 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -3263,6 +3263,7 @@
return ERR_PTR(error);
return res;
}
+EXPORT_SYMBOL(d_absolute_path);
/*
* same as __d_path but appends "(deleted)" for unlinked files.
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 2fabd19..a7d3947 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -34,6 +34,7 @@
#include <linux/mutex.h>
#include <linux/anon_inodes.h>
#include <linux/device.h>
+#include <linux/freezer.h>
#include <linux/uaccess.h>
#include <asm/io.h>
#include <asm/mman.h>
@@ -1826,7 +1827,8 @@
}
spin_unlock_irqrestore(&ep->lock, flags);
- if (!schedule_hrtimeout_range(to, slack, HRTIMER_MODE_ABS))
+ if (!freezable_schedule_hrtimeout_range(to, slack,
+ HRTIMER_MODE_ABS))
timed_out = 1;
spin_lock_irqsave(&ep->lock, flags);
diff --git a/fs/exec.c b/fs/exec.c
index 0da4d74..062159d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1303,7 +1303,7 @@
void would_dump(struct linux_binprm *bprm, struct file *file)
{
struct inode *inode = file_inode(file);
- if (inode_permission(inode, MAY_READ) < 0) {
+ if (inode_permission2(file->f_path.mnt, inode, MAY_READ) < 0) {
struct user_namespace *old, *user_ns;
bprm->interp_flags |= BINPRM_FLAGS_ENFORCE_NONDUMP;
diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
index 0ac6281..f17715d 100644
--- a/fs/exofs/inode.c
+++ b/fs/exofs/inode.c
@@ -377,9 +377,8 @@
* and will start a new collection. Eventually caller must submit the last
* segment if present.
*/
-static int readpage_strip(void *data, struct page *page)
+static int __readpage_strip(struct page_collect *pcol, struct page *page)
{
- struct page_collect *pcol = data;
struct inode *inode = pcol->inode;
struct exofs_i_info *oi = exofs_i(inode);
loff_t i_size = i_size_read(inode);
@@ -470,6 +469,13 @@
return ret;
}
+static int readpage_strip(struct file *data, struct page *page)
+{
+ struct page_collect *pcol = (struct page_collect *)data;
+
+ return __readpage_strip(pcol, page);
+}
+
static int exofs_readpages(struct file *file, struct address_space *mapping,
struct list_head *pages, unsigned nr_pages)
{
@@ -499,7 +505,7 @@
_pcol_init(&pcol, 1, page->mapping->host);
pcol.read_4_write = read_4_write;
- ret = readpage_strip(&pcol, page);
+ ret = __readpage_strip(&pcol, page);
if (ret) {
EXOFS_ERR("_readpage => %d\n", ret);
return ret;
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 0abb30d..69e83cf 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -34,17 +34,15 @@
#include <linux/percpu_counter.h>
#include <linux/ratelimit.h>
#include <crypto/hash.h>
-#ifdef CONFIG_EXT4_FS_ENCRYPTION
-#include <linux/fscrypt_supp.h>
-#else
-#include <linux/fscrypt_notsupp.h>
-#endif
#include <linux/falloc.h>
#include <linux/percpu-rwsem.h>
#ifdef __KERNEL__
#include <linux/compat.h>
#endif
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_EXT4_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
+
/*
* The fourth extended filesystem constants/structures
*/
diff --git a/fs/ext4/inline.c b/fs/ext4/inline.c
index b549cfd..a2deb66 100644
--- a/fs/ext4/inline.c
+++ b/fs/ext4/inline.c
@@ -18,6 +18,7 @@
#include "ext4.h"
#include "xattr.h"
#include "truncate.h"
+#include <trace/events/android_fs.h>
#define EXT4_XATTR_SYSTEM_DATA "data"
#define EXT4_MIN_INLINE_DATA_SIZE ((sizeof(__le32) * EXT4_N_BLOCKS))
@@ -511,6 +512,17 @@
return -EAGAIN;
}
+ if (trace_android_fs_dataread_start_enabled()) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_dataread_start(inode, page_offset(page),
+ PAGE_SIZE, current->pid,
+ path, current->comm);
+ }
+
/*
* Current inline data can only exist in the 1st page,
* So for all the other pages, just set them uptodate.
@@ -522,6 +534,8 @@
SetPageUptodate(page);
}
+ trace_android_fs_dataread_end(inode, page_offset(page), PAGE_SIZE);
+
up_read(&EXT4_I(inode)->xattr_sem);
unlock_page(page);
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index f9baa59..e7da1b6 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -46,6 +46,7 @@
#include "truncate.h"
#include <trace/events/ext4.h>
+#include <trace/events/android_fs.h>
#define MPAGE_DA_EXTENT_TAIL 0x01
@@ -1252,6 +1253,16 @@
if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
return -EIO;
+ if (trace_android_fs_datawrite_start_enabled()) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_datawrite_start(inode, pos, len,
+ current->pid, path,
+ current->comm);
+ }
trace_ext4_write_begin(inode, pos, len, flags);
/*
* Reserve one block more for addition to orphan list in case
@@ -1390,6 +1401,7 @@
int i_size_changed = 0;
int inline_data = ext4_has_inline_data(inode);
+ trace_android_fs_datawrite_end(inode, pos, len);
trace_ext4_write_end(inode, pos, len, copied);
if (inline_data) {
ret = ext4_write_inline_data_end(inode, pos, len,
@@ -1495,6 +1507,7 @@
int size_changed = 0;
int inline_data = ext4_has_inline_data(inode);
+ trace_android_fs_datawrite_end(inode, pos, len);
trace_ext4_journalled_write_end(inode, pos, len, copied);
from = pos & (PAGE_SIZE - 1);
to = from + len;
@@ -2627,8 +2640,8 @@
mpd->map.m_len = 0;
mpd->next_page = index;
while (index <= end) {
- nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
+ nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+ tag);
if (nr_pages == 0)
goto out;
@@ -2636,16 +2649,6 @@
struct page *page = pvec.pages[i];
/*
- * At this point, the page may be truncated or
- * invalidated (changing page->mapping to NULL), or
- * even swizzled back from swapper_space to tmpfs file
- * mapping. However, page->index will not change
- * because we have a reference on the page.
- */
- if (page->index > end)
- goto out;
-
- /*
* Accumulated enough dirty pages? This doesn't apply
* to WB_SYNC_ALL mode. For integrity sync we have to
* keep going because someone may be concurrently
@@ -3031,6 +3034,16 @@
len, flags, pagep, fsdata);
}
*fsdata = (void *)0;
+ if (trace_android_fs_datawrite_start_enabled()) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_datawrite_start(inode, pos, len,
+ current->pid,
+ path, current->comm);
+ }
trace_ext4_da_write_begin(inode, pos, len, flags);
if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) {
@@ -3149,6 +3162,7 @@
return ext4_write_end(file, mapping, pos,
len, copied, page, fsdata);
+ trace_android_fs_datawrite_end(inode, pos, len);
trace_ext4_da_write_end(inode, pos, len, copied);
start = pos & (PAGE_SIZE - 1);
end = start + copied - 1;
@@ -3794,6 +3808,7 @@
size_t count = iov_iter_count(iter);
loff_t offset = iocb->ki_pos;
ssize_t ret;
+ int rw = iov_iter_rw(iter);
#ifdef CONFIG_EXT4_FS_ENCRYPTION
if (ext4_encrypted_inode(inode) && S_ISREG(inode->i_mode))
@@ -3814,12 +3829,42 @@
if (WARN_ON_ONCE(IS_DAX(inode)))
return 0;
+ if (trace_android_fs_dataread_start_enabled() &&
+ (rw == READ)) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_dataread_start(inode, offset, count,
+ current->pid, path,
+ current->comm);
+ }
+ if (trace_android_fs_datawrite_start_enabled() &&
+ (rw == WRITE)) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_datawrite_start(inode, offset, count,
+ current->pid, path,
+ current->comm);
+ }
trace_ext4_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
if (iov_iter_rw(iter) == READ)
ret = ext4_direct_IO_read(iocb, iter);
else
ret = ext4_direct_IO_write(iocb, iter);
trace_ext4_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), ret);
+
+ if (trace_android_fs_dataread_start_enabled() &&
+ (rw == READ))
+ trace_android_fs_dataread_end(inode, offset, count);
+ if (trace_android_fs_datawrite_start_enabled() &&
+ (rw == WRITE))
+ trace_android_fs_datawrite_end(inode, offset, count);
+
return ret;
}
@@ -4605,10 +4650,13 @@
new_fl |= S_DIRSYNC;
if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode) &&
!ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
- !ext4_encrypted_inode(inode))
+ !(flags & EXT4_ENCRYPT_FL))
new_fl |= S_DAX;
+ if (flags & EXT4_ENCRYPT_FL)
+ new_fl |= S_ENCRYPTED;
inode_set_flags(inode, new_fl,
- S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
+ S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX|
+ S_ENCRYPTED);
}
static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1db39e1..7431e1b 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1544,24 +1544,14 @@
struct inode *inode;
struct ext4_dir_entry_2 *de;
struct buffer_head *bh;
+ int err;
- if (ext4_encrypted_inode(dir)) {
- int res = fscrypt_get_encryption_info(dir);
+ err = fscrypt_prepare_lookup(dir, dentry, flags);
+ if (err)
+ return ERR_PTR(err);
- /*
- * DCACHE_ENCRYPTED_WITH_KEY is set if the dentry is
- * created while the directory was encrypted and we
- * have access to the key.
- */
- if (fscrypt_has_encryption_key(dir))
- fscrypt_set_encrypted_dentry(dentry);
- fscrypt_set_d_op(dentry);
- if (res && res != -ENOKEY)
- return ERR_PTR(res);
- }
-
- if (dentry->d_name.len > EXT4_NAME_LEN)
- return ERR_PTR(-ENAMETOOLONG);
+ if (dentry->d_name.len > EXT4_NAME_LEN)
+ return ERR_PTR(-ENAMETOOLONG);
bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL);
if (IS_ERR(bh))
@@ -3065,39 +3055,19 @@
struct inode *inode;
int err, len = strlen(symname);
int credits;
- bool encryption_required;
struct fscrypt_str disk_link;
- struct fscrypt_symlink_data *sd = NULL;
if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb))))
return -EIO;
- disk_link.len = len + 1;
- disk_link.name = (char *) symname;
-
- encryption_required = (ext4_encrypted_inode(dir) ||
- DUMMY_ENCRYPTION_ENABLED(EXT4_SB(dir->i_sb)));
- if (encryption_required) {
- err = fscrypt_get_encryption_info(dir);
- if (err)
- return err;
- if (!fscrypt_has_encryption_key(dir))
- return -ENOKEY;
- disk_link.len = (fscrypt_fname_encrypted_size(dir, len) +
- sizeof(struct fscrypt_symlink_data));
- sd = kzalloc(disk_link.len, GFP_KERNEL);
- if (!sd)
- return -ENOMEM;
- }
-
- if (disk_link.len > dir->i_sb->s_blocksize) {
- err = -ENAMETOOLONG;
- goto err_free_sd;
- }
+ err = fscrypt_prepare_symlink(dir, symname, len, dir->i_sb->s_blocksize,
+ &disk_link);
+ if (err)
+ return err;
err = dquot_initialize(dir);
if (err)
- goto err_free_sd;
+ return err;
if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
/*
@@ -3126,27 +3096,18 @@
if (IS_ERR(inode)) {
if (handle)
ext4_journal_stop(handle);
- err = PTR_ERR(inode);
- goto err_free_sd;
+ return PTR_ERR(inode);
}
- if (encryption_required) {
- struct qstr istr;
- struct fscrypt_str ostr =
- FSTR_INIT(sd->encrypted_path, disk_link.len);
-
- istr.name = (const unsigned char *) symname;
- istr.len = len;
- err = fscrypt_fname_usr_to_disk(inode, &istr, &ostr);
+ if (IS_ENCRYPTED(inode)) {
+ err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link);
if (err)
goto err_drop_inode;
- sd->len = cpu_to_le16(ostr.len);
- disk_link.name = (char *) sd;
inode->i_op = &ext4_encrypted_symlink_inode_operations;
}
if ((disk_link.len > EXT4_N_BLOCKS * 4)) {
- if (!encryption_required)
+ if (!IS_ENCRYPTED(inode))
inode->i_op = &ext4_symlink_inode_operations;
inode_nohighmem(inode);
ext4_set_aops(inode);
@@ -3188,7 +3149,7 @@
} else {
/* clear the extent format for fast symlink */
ext4_clear_inode_flag(inode, EXT4_INODE_EXTENTS);
- if (!encryption_required) {
+ if (!IS_ENCRYPTED(inode)) {
inode->i_op = &ext4_fast_symlink_inode_operations;
inode->i_link = (char *)&EXT4_I(inode)->i_data;
}
@@ -3203,16 +3164,17 @@
if (handle)
ext4_journal_stop(handle);
- kfree(sd);
- return err;
+ goto out_free_encrypted_link;
+
err_drop_inode:
if (handle)
ext4_journal_stop(handle);
clear_nlink(inode);
unlock_new_inode(inode);
iput(inode);
-err_free_sd:
- kfree(sd);
+out_free_encrypted_link:
+ if (disk_link.name != (unsigned char *)symname)
+ kfree(disk_link.name);
return err;
}
diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 9ffa6fa..039aa69 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -46,6 +46,7 @@
#include <linux/cleancache.h>
#include "ext4.h"
+#include <trace/events/android_fs.h>
static inline bool ext4_bio_encrypted(struct bio *bio)
{
@@ -56,6 +57,17 @@
#endif
}
+static void
+ext4_trace_read_completion(struct bio *bio)
+{
+ struct page *first_page = bio->bi_io_vec[0].bv_page;
+
+ if (first_page != NULL)
+ trace_android_fs_dataread_end(first_page->mapping->host,
+ page_offset(first_page),
+ bio->bi_iter.bi_size);
+}
+
/*
* I/O completion handler for multipage BIOs.
*
@@ -73,11 +85,14 @@
struct bio_vec *bv;
int i;
+ if (trace_android_fs_dataread_start_enabled())
+ ext4_trace_read_completion(bio);
+
if (ext4_bio_encrypted(bio)) {
if (bio->bi_status) {
fscrypt_release_ctx(bio->bi_private);
} else {
- fscrypt_decrypt_bio_pages(bio->bi_private, bio);
+ fscrypt_enqueue_decrypt_bio(bio->bi_private, bio);
return;
}
}
@@ -96,6 +111,30 @@
bio_put(bio);
}
+static void
+ext4_submit_bio_read(struct bio *bio)
+{
+ if (trace_android_fs_dataread_start_enabled()) {
+ struct page *first_page = bio->bi_io_vec[0].bv_page;
+
+ if (first_page != NULL) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ first_page->mapping->host);
+ trace_android_fs_dataread_start(
+ first_page->mapping->host,
+ page_offset(first_page),
+ bio->bi_iter.bi_size,
+ current->pid,
+ path,
+ current->comm);
+ }
+ }
+ submit_bio(bio);
+}
+
int ext4_mpage_readpages(struct address_space *mapping,
struct list_head *pages, struct page *page,
unsigned nr_pages)
@@ -236,7 +275,7 @@
*/
if (bio && (last_block_in_bio != blocks[0] - 1)) {
submit_and_realloc:
- submit_bio(bio);
+ ext4_submit_bio_read(bio);
bio = NULL;
}
if (bio == NULL) {
@@ -269,14 +308,14 @@
if (((map.m_flags & EXT4_MAP_BOUNDARY) &&
(relative_block == map.m_len)) ||
(first_hole != blocks_per_page)) {
- submit_bio(bio);
+ ext4_submit_bio_read(bio);
bio = NULL;
} else
last_block_in_bio = blocks[blocks_per_page - 1];
goto next_page;
confused:
if (bio) {
- submit_bio(bio);
+ ext4_submit_bio_read(bio);
bio = NULL;
}
if (!PageUptodate(page))
@@ -289,6 +328,6 @@
}
BUG_ON(pages && !list_empty(pages));
if (bio)
- submit_bio(bio);
+ ext4_submit_bio_read(bio);
return 0;
}
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b4fb085..8619163 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1070,9 +1070,7 @@
jbd2_free_inode(EXT4_I(inode)->jinode);
EXT4_I(inode)->jinode = NULL;
}
-#ifdef CONFIG_EXT4_FS_ENCRYPTION
- fscrypt_put_encryption_info(inode, NULL);
-#endif
+ fscrypt_put_encryption_info(inode);
}
static struct inode *ext4_nfs_get_inode(struct super_block *sb,
@@ -1182,7 +1180,8 @@
ext4_clear_inode_state(inode,
EXT4_STATE_MAY_INLINE_DATA);
/*
- * Update inode->i_flags - e.g. S_DAX may get disabled
+ * Update inode->i_flags - S_ENCRYPTED will be enabled,
+ * S_DAX may be disabled
*/
ext4_set_inode_flags(inode);
}
@@ -1207,7 +1206,10 @@
ctx, len, 0);
if (!res) {
ext4_set_inode_flag(inode, EXT4_INODE_ENCRYPT);
- /* Update inode->i_flags - e.g. S_DAX may get disabled */
+ /*
+ * Update inode->i_flags - S_ENCRYPTED will be enabled,
+ * S_DAX may be disabled
+ */
ext4_set_inode_flags(inode);
res = ext4_mark_inode_dirty(handle, inode);
if (res)
@@ -1227,24 +1229,13 @@
return DUMMY_ENCRYPTION_ENABLED(EXT4_SB(inode->i_sb));
}
-static unsigned ext4_max_namelen(struct inode *inode)
-{
- return S_ISLNK(inode->i_mode) ? inode->i_sb->s_blocksize :
- EXT4_NAME_LEN;
-}
-
static const struct fscrypt_operations ext4_cryptops = {
.key_prefix = "ext4:",
.get_context = ext4_get_context,
.set_context = ext4_set_context,
.dummy_context = ext4_dummy_context,
- .is_encrypted = ext4_encrypted_inode,
.empty_dir = ext4_empty_dir,
- .max_namelen = ext4_max_namelen,
-};
-#else
-static const struct fscrypt_operations ext4_cryptops = {
- .is_encrypted = ext4_encrypted_inode,
+ .max_namelen = EXT4_NAME_LEN,
};
#endif
@@ -4064,7 +4055,9 @@
sb->s_op = &ext4_sops;
sb->s_export_op = &ext4_export_ops;
sb->s_xattr = ext4_xattr_handlers;
+#ifdef CONFIG_EXT4_FS_ENCRYPTION
sb->s_cop = &ext4_cryptops;
+#endif
#ifdef CONFIG_QUOTA
sb->dq_op = &ext4_quota_operations;
if (ext4_has_feature_quota(sb))
diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c
index a2006c9..dd05af9 100644
--- a/fs/ext4/symlink.c
+++ b/fs/ext4/symlink.c
@@ -28,59 +28,28 @@
struct delayed_call *done)
{
struct page *cpage = NULL;
- char *caddr, *paddr = NULL;
- struct fscrypt_str cstr, pstr;
- struct fscrypt_symlink_data *sd;
- int res;
- u32 max_size = inode->i_sb->s_blocksize;
+ const void *caddr;
+ unsigned int max_size;
+ const char *paddr;
if (!dentry)
return ERR_PTR(-ECHILD);
- res = fscrypt_get_encryption_info(inode);
- if (res)
- return ERR_PTR(res);
-
if (ext4_inode_is_fast_symlink(inode)) {
- caddr = (char *) EXT4_I(inode)->i_data;
+ caddr = EXT4_I(inode)->i_data;
max_size = sizeof(EXT4_I(inode)->i_data);
} else {
cpage = read_mapping_page(inode->i_mapping, 0, NULL);
if (IS_ERR(cpage))
return ERR_CAST(cpage);
caddr = page_address(cpage);
+ max_size = inode->i_sb->s_blocksize;
}
- /* Symlink is encrypted */
- sd = (struct fscrypt_symlink_data *)caddr;
- cstr.name = sd->encrypted_path;
- cstr.len = le16_to_cpu(sd->len);
- if ((cstr.len + sizeof(struct fscrypt_symlink_data) - 1) > max_size) {
- /* Symlink data on the disk is corrupted */
- res = -EFSCORRUPTED;
- goto errout;
- }
-
- res = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr);
- if (res)
- goto errout;
- paddr = pstr.name;
-
- res = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr);
- if (res)
- goto errout;
-
- /* Null-terminate the name */
- paddr[pstr.len] = '\0';
+ paddr = fscrypt_get_symlink(inode, caddr, max_size, done);
if (cpage)
put_page(cpage);
- set_delayed_call(done, kfree_link, paddr);
return paddr;
-errout:
- if (cpage)
- put_page(cpage);
- kfree(paddr);
- return ERR_PTR(res);
}
const struct inode_operations ext4_encrypted_symlink_inode_operations = {
diff --git a/fs/f2fs/acl.c b/fs/f2fs/acl.c
index 436b3a1..1118241 100644
--- a/fs/f2fs/acl.c
+++ b/fs/f2fs/acl.c
@@ -250,6 +250,9 @@
int f2fs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
{
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
+ return -EIO;
+
return __f2fs_set_acl(inode, type, acl, NULL);
}
@@ -267,7 +270,7 @@
sizeof(struct posix_acl_entry);
clone = kmemdup(acl, size, flags);
if (clone)
- atomic_set(&clone->a_refcount, 1);
+ refcount_set(&clone->a_refcount, 1);
}
return clone;
}
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index c282e21..d7a53c6 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -24,12 +24,11 @@
#include <trace/events/f2fs.h>
static struct kmem_cache *ino_entry_slab;
-struct kmem_cache *inode_entry_slab;
+struct kmem_cache *f2fs_inode_entry_slab;
void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io)
{
set_ckpt_flags(sbi, CP_ERROR_FLAG);
- sbi->sb->s_flags |= MS_RDONLY;
if (!end_io)
f2fs_flush_merged_writes(sbi);
}
@@ -37,7 +36,7 @@
/*
* We guarantee no failure on the returned page.
*/
-struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
+struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
{
struct address_space *mapping = META_MAPPING(sbi);
struct page *page = NULL;
@@ -69,6 +68,7 @@
.old_blkaddr = index,
.new_blkaddr = index,
.encrypted_page = NULL,
+ .is_meta = is_meta,
};
if (unlikely(!is_meta))
@@ -100,24 +100,27 @@
* readonly and make sure do not write checkpoint with non-uptodate
* meta page.
*/
- if (unlikely(!PageUptodate(page)))
+ if (unlikely(!PageUptodate(page))) {
+ memset(page_address(page), 0, PAGE_SIZE);
f2fs_stop_checkpoint(sbi, false);
+ }
out:
return page;
}
-struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
+struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index)
{
return __get_meta_page(sbi, index, true);
}
/* for POR only */
-struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index)
+struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index)
{
return __get_meta_page(sbi, index, false);
}
-bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type)
+bool f2fs_is_valid_meta_blkaddr(struct f2fs_sb_info *sbi,
+ block_t blkaddr, int type)
{
switch (type) {
case META_NAT:
@@ -151,7 +154,7 @@
/*
* Readahead CP/NAT/SIT/SSA pages
*/
-int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
+int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
int type, bool sync)
{
struct page *page;
@@ -163,6 +166,7 @@
.op_flags = sync ? (REQ_META | REQ_PRIO) : REQ_RAHEAD,
.encrypted_page = NULL,
.in_list = false,
+ .is_meta = (type != META_POR),
};
struct blk_plug plug;
@@ -172,7 +176,7 @@
blk_start_plug(&plug);
for (; nrpages-- > 0; blkno++) {
- if (!is_valid_blkaddr(sbi, blkno, type))
+ if (!f2fs_is_valid_meta_blkaddr(sbi, blkno, type))
goto out;
switch (type) {
@@ -216,7 +220,7 @@
return blkno - start;
}
-void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index)
+void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index)
{
struct page *page;
bool readahead = false;
@@ -227,7 +231,7 @@
f2fs_put_page(page, 0);
if (readahead)
- ra_meta_pages(sbi, index, BIO_MAX_PAGES, META_POR, true);
+ f2fs_ra_meta_pages(sbi, index, BIO_MAX_PAGES, META_POR, true);
}
static int __f2fs_write_meta_page(struct page *page,
@@ -238,14 +242,17 @@
trace_f2fs_writepage(page, META);
+ if (unlikely(f2fs_cp_error(sbi))) {
+ dec_page_count(sbi, F2FS_DIRTY_META);
+ unlock_page(page);
+ return 0;
+ }
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
goto redirty_out;
if (wbc->for_reclaim && page->index < GET_SUM_BLOCK(sbi, 0))
goto redirty_out;
- if (unlikely(f2fs_cp_error(sbi)))
- goto redirty_out;
- write_meta_page(sbi, page, io_type);
+ f2fs_do_write_meta_page(sbi, page, io_type);
dec_page_count(sbi, F2FS_DIRTY_META);
if (wbc->for_reclaim)
@@ -290,7 +297,7 @@
trace_f2fs_writepages(mapping->host, wbc, META);
diff = nr_pages_to_write(sbi, META, wbc);
- written = sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO);
+ written = f2fs_sync_meta_pages(sbi, META, wbc->nr_to_write, FS_META_IO);
mutex_unlock(&sbi->cp_mutex);
wbc->nr_to_write = max((long)0, wbc->nr_to_write - written - diff);
return 0;
@@ -301,13 +308,14 @@
return 0;
}
-long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
+long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
long nr_to_write, enum iostat_type io_type)
{
struct address_space *mapping = META_MAPPING(sbi);
- pgoff_t index = 0, end = ULONG_MAX, prev = ULONG_MAX;
+ pgoff_t index = 0, prev = ULONG_MAX;
struct pagevec pvec;
long nwritten = 0;
+ int nr_pages;
struct writeback_control wbc = {
.for_reclaim = 0,
};
@@ -317,13 +325,9 @@
blk_start_plug(&plug);
- while (index <= end) {
- int i, nr_pages;
- nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
- PAGECACHE_TAG_DIRTY,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
- if (unlikely(nr_pages == 0))
- break;
+ while ((nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
+ PAGECACHE_TAG_DIRTY))) {
+ int i;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
@@ -381,7 +385,7 @@
if (!PageUptodate(page))
SetPageUptodate(page);
if (!PageDirty(page)) {
- f2fs_set_page_dirty_nobuffers(page);
+ __set_page_dirty_nobuffers(page);
inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_META);
SetPagePrivate(page);
f2fs_trace_pid(page);
@@ -401,24 +405,23 @@
#endif
};
-static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
+static void __add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino,
+ unsigned int devidx, int type)
{
struct inode_management *im = &sbi->im[type];
struct ino_entry *e, *tmp;
tmp = f2fs_kmem_cache_alloc(ino_entry_slab, GFP_NOFS);
-retry:
+
radix_tree_preload(GFP_NOFS | __GFP_NOFAIL);
spin_lock(&im->ino_lock);
e = radix_tree_lookup(&im->ino_root, ino);
if (!e) {
e = tmp;
- if (radix_tree_insert(&im->ino_root, ino, e)) {
- spin_unlock(&im->ino_lock);
- radix_tree_preload_end();
- goto retry;
- }
+ if (unlikely(radix_tree_insert(&im->ino_root, ino, e)))
+ f2fs_bug_on(sbi, 1);
+
memset(e, 0, sizeof(struct ino_entry));
e->ino = ino;
@@ -426,6 +429,10 @@
if (type != ORPHAN_INO)
im->ino_num++;
}
+
+ if (type == FLUSH_INO)
+ f2fs_set_bit(devidx, (char *)&e->dirty_device);
+
spin_unlock(&im->ino_lock);
radix_tree_preload_end();
@@ -451,20 +458,20 @@
spin_unlock(&im->ino_lock);
}
-void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
+void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
{
/* add new dirty ino entry into list */
- __add_ino_entry(sbi, ino, type);
+ __add_ino_entry(sbi, ino, 0, type);
}
-void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
+void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type)
{
/* remove dirty ino entry from list */
__remove_ino_entry(sbi, ino, type);
}
/* mode should be APPEND_INO or UPDATE_INO */
-bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode)
+bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode)
{
struct inode_management *im = &sbi->im[mode];
struct ino_entry *e;
@@ -475,12 +482,12 @@
return e ? true : false;
}
-void release_ino_entry(struct f2fs_sb_info *sbi, bool all)
+void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all)
{
struct ino_entry *e, *tmp;
int i;
- for (i = all ? ORPHAN_INO: APPEND_INO; i <= UPDATE_INO; i++) {
+ for (i = all ? ORPHAN_INO : APPEND_INO; i < MAX_INO_ENTRY; i++) {
struct inode_management *im = &sbi->im[i];
spin_lock(&im->ino_lock);
@@ -494,7 +501,28 @@
}
}
-int acquire_orphan_inode(struct f2fs_sb_info *sbi)
+void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
+ unsigned int devidx, int type)
+{
+ __add_ino_entry(sbi, ino, devidx, type);
+}
+
+bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
+ unsigned int devidx, int type)
+{
+ struct inode_management *im = &sbi->im[type];
+ struct ino_entry *e;
+ bool is_dirty = false;
+
+ spin_lock(&im->ino_lock);
+ e = radix_tree_lookup(&im->ino_root, ino);
+ if (e && f2fs_test_bit(devidx, (char *)&e->dirty_device))
+ is_dirty = true;
+ spin_unlock(&im->ino_lock);
+ return is_dirty;
+}
+
+int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi)
{
struct inode_management *im = &sbi->im[ORPHAN_INO];
int err = 0;
@@ -517,7 +545,7 @@
return err;
}
-void release_orphan_inode(struct f2fs_sb_info *sbi)
+void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi)
{
struct inode_management *im = &sbi->im[ORPHAN_INO];
@@ -527,14 +555,14 @@
spin_unlock(&im->ino_lock);
}
-void add_orphan_inode(struct inode *inode)
+void f2fs_add_orphan_inode(struct inode *inode)
{
/* add new orphan ino entry into list */
- __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, ORPHAN_INO);
- update_inode_page(inode);
+ __add_ino_entry(F2FS_I_SB(inode), inode->i_ino, 0, ORPHAN_INO);
+ f2fs_update_inode_page(inode);
}
-void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
+void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino)
{
/* remove orphan entry from orphan list */
__remove_ino_entry(sbi, ino, ORPHAN_INO);
@@ -544,17 +572,12 @@
{
struct inode *inode;
struct node_info ni;
- int err = acquire_orphan_inode(sbi);
+ int err = f2fs_acquire_orphan_inode(sbi);
- if (err) {
- set_sbi_flag(sbi, SBI_NEED_FSCK);
- f2fs_msg(sbi->sb, KERN_WARNING,
- "%s: orphan failed (ino=%x), run fsck to fix.",
- __func__, ino);
- return err;
- }
+ if (err)
+ goto err_out;
- __add_ino_entry(sbi, ino, ORPHAN_INO);
+ __add_ino_entry(sbi, ino, 0, ORPHAN_INO);
inode = f2fs_iget_retry(sbi->sb, ino);
if (IS_ERR(inode)) {
@@ -566,30 +589,43 @@
return PTR_ERR(inode);
}
+ err = dquot_initialize(inode);
+ if (err) {
+ iput(inode);
+ goto err_out;
+ }
+
clear_nlink(inode);
/* truncate all the data during iput */
iput(inode);
- get_node_info(sbi, ino, &ni);
+ f2fs_get_node_info(sbi, ino, &ni);
/* ENOMEM was fully retried in f2fs_evict_inode. */
if (ni.blk_addr != NULL_ADDR) {
- set_sbi_flag(sbi, SBI_NEED_FSCK);
- f2fs_msg(sbi->sb, KERN_WARNING,
- "%s: orphan failed (ino=%x) by kernel, retry mount.",
- __func__, ino);
- return -EIO;
+ err = -EIO;
+ goto err_out;
}
__remove_ino_entry(sbi, ino, ORPHAN_INO);
return 0;
+
+err_out:
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
+ f2fs_msg(sbi->sb, KERN_WARNING,
+ "%s: orphan failed (ino=%x), run fsck to fix.",
+ __func__, ino);
+ return err;
}
-int recover_orphan_inodes(struct f2fs_sb_info *sbi)
+int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi)
{
block_t start_blk, orphan_blocks, i, j;
unsigned int s_flags = sbi->sb->s_flags;
int err = 0;
+#ifdef CONFIG_QUOTA
+ int quota_enabled;
+#endif
if (!is_set_ckpt_flags(sbi, CP_ORPHAN_PRESENT_FLAG))
return 0;
@@ -602,17 +638,18 @@
#ifdef CONFIG_QUOTA
/* Needed for iput() to work correctly and not trash data */
sbi->sb->s_flags |= MS_ACTIVE;
+
/* Turn on quotas so that they are updated correctly */
- f2fs_enable_quota_files(sbi);
+ quota_enabled = f2fs_enable_quota_files(sbi, s_flags & MS_RDONLY);
#endif
start_blk = __start_cp_addr(sbi) + 1 + __cp_payload(sbi);
orphan_blocks = __start_sum_addr(sbi) - 1 - __cp_payload(sbi);
- ra_meta_pages(sbi, start_blk, orphan_blocks, META_CP, true);
+ f2fs_ra_meta_pages(sbi, start_blk, orphan_blocks, META_CP, true);
for (i = 0; i < orphan_blocks; i++) {
- struct page *page = get_meta_page(sbi, start_blk + i);
+ struct page *page = f2fs_get_meta_page(sbi, start_blk + i);
struct f2fs_orphan_block *orphan_blk;
orphan_blk = (struct f2fs_orphan_block *)page_address(page);
@@ -631,7 +668,8 @@
out:
#ifdef CONFIG_QUOTA
/* Turn quotas off */
- f2fs_quota_off_umount(sbi->sb);
+ if (quota_enabled)
+ f2fs_quota_off_umount(sbi->sb);
#endif
sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */
@@ -661,7 +699,7 @@
/* loop for each orphan inode entry and write them in Jornal block */
list_for_each_entry(orphan, head, list) {
if (!page) {
- page = grab_meta_page(sbi, start_blk++);
+ page = f2fs_grab_meta_page(sbi, start_blk++);
orphan_blk =
(struct f2fs_orphan_block *)page_address(page);
memset(orphan_blk, 0, sizeof(*orphan_blk));
@@ -703,7 +741,7 @@
size_t crc_offset = 0;
__u32 crc = 0;
- *cp_page = get_meta_page(sbi, cp_addr);
+ *cp_page = f2fs_get_meta_page(sbi, cp_addr);
*cp_block = (struct f2fs_checkpoint *)page_address(*cp_page);
crc_offset = le32_to_cpu((*cp_block)->checksum_offset);
@@ -756,7 +794,7 @@
return NULL;
}
-int get_valid_checkpoint(struct f2fs_sb_info *sbi)
+int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi)
{
struct f2fs_checkpoint *cp_block;
struct f2fs_super_block *fsb = sbi->raw_super;
@@ -768,7 +806,8 @@
block_t cp_blk_no;
int i;
- sbi->ckpt = kzalloc(cp_blks * blk_size, GFP_KERNEL);
+ sbi->ckpt = f2fs_kzalloc(sbi, array_size(blk_size, cp_blks),
+ GFP_KERNEL);
if (!sbi->ckpt)
return -ENOMEM;
/*
@@ -800,7 +839,7 @@
memcpy(sbi->ckpt, cp_block, blk_size);
/* Sanity checking of checkpoint */
- if (sanity_check_ckpt(sbi))
+ if (f2fs_sanity_check_ckpt(sbi))
goto free_fail_no_cp;
if (cur_page == cp1)
@@ -819,7 +858,7 @@
void *sit_bitmap_ptr;
unsigned char *ckpt = (unsigned char *)sbi->ckpt;
- cur_page = get_meta_page(sbi, cp_blk_no + i);
+ cur_page = f2fs_get_meta_page(sbi, cp_blk_no + i);
sit_bitmap_ptr = page_address(cur_page);
memcpy(ckpt + i * blk_size, sit_bitmap_ptr, blk_size);
f2fs_put_page(cur_page, 1);
@@ -864,7 +903,7 @@
stat_dec_dirty_inode(F2FS_I_SB(inode), type);
}
-void update_dirty_page(struct inode *inode, struct page *page)
+void f2fs_update_dirty_page(struct inode *inode, struct page *page)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE;
@@ -883,7 +922,7 @@
f2fs_trace_pid(page);
}
-void remove_dirty_inode(struct inode *inode)
+void f2fs_remove_dirty_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
enum inode_type type = S_ISDIR(inode->i_mode) ? DIR_INODE : FILE_INODE;
@@ -900,7 +939,7 @@
spin_unlock(&sbi->inode_lock[type]);
}
-int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type)
+int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type)
{
struct list_head *head;
struct inode *inode;
@@ -983,10 +1022,10 @@
/* it's on eviction */
if (is_inode_flag_set(inode, FI_DIRTY_INODE))
- update_inode_page(inode);
+ f2fs_update_inode_page(inode);
iput(inode);
}
- };
+ }
return 0;
}
@@ -1023,7 +1062,7 @@
/* write all the dirty dentry pages */
if (get_pages(sbi, F2FS_DIRTY_DENTS)) {
f2fs_unlock_all(sbi);
- err = sync_dirty_inodes(sbi, DIR_INODE);
+ err = f2fs_sync_dirty_inodes(sbi, DIR_INODE);
if (err)
goto out;
cond_resched();
@@ -1051,7 +1090,9 @@
if (get_pages(sbi, F2FS_DIRTY_NODES)) {
up_write(&sbi->node_write);
- err = sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
+ atomic_inc(&sbi->wb_sync_req[NODE]);
+ err = f2fs_sync_node_pages(sbi, &wbc, false, FS_CP_NODE_IO);
+ atomic_dec(&sbi->wb_sync_req[NODE]);
if (err) {
up_write(&sbi->node_change);
f2fs_unlock_all(sbi);
@@ -1131,10 +1172,44 @@
/* set this flag to activate crc|cp_ver for recovery */
__set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG);
+ __clear_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG);
spin_unlock_irqrestore(&sbi->cp_lock, flags);
}
+static void commit_checkpoint(struct f2fs_sb_info *sbi,
+ void *src, block_t blk_addr)
+{
+ struct writeback_control wbc = {
+ .for_reclaim = 0,
+ };
+
+ /*
+ * pagevec_lookup_tag and lock_page again will take
+ * some extra time. Therefore, f2fs_update_meta_pages and
+ * f2fs_sync_meta_pages are combined in this function.
+ */
+ struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
+ int err;
+
+ memcpy(page_address(page), src, PAGE_SIZE);
+ set_page_dirty(page);
+
+ f2fs_wait_on_page_writeback(page, META, true);
+ f2fs_bug_on(sbi, PageWriteback(page));
+ if (unlikely(!clear_page_dirty_for_io(page)))
+ f2fs_bug_on(sbi, 1);
+
+ /* writeout cp pack 2 page */
+ err = __f2fs_write_meta_page(page, &wbc, FS_CP_META_IO);
+ f2fs_bug_on(sbi, err);
+
+ f2fs_put_page(page, 0);
+
+ /* submit checkpoint (with barrier if NOBARRIER is not set) */
+ f2fs_submit_merged_write(sbi, META_FLUSH);
+}
+
static int do_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
@@ -1148,10 +1223,11 @@
struct super_block *sb = sbi->sb;
struct curseg_info *seg_i = CURSEG_I(sbi, CURSEG_HOT_NODE);
u64 kbytes_written;
+ int err;
/* Flush all the NAT/SIT pages */
while (get_pages(sbi, F2FS_DIRTY_META)) {
- sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+ f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
if (unlikely(f2fs_cp_error(sbi)))
return -EIO;
}
@@ -1160,7 +1236,7 @@
* modify checkpoint
* version number is already updated
*/
- ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi));
+ ckpt->elapsed_time = cpu_to_le64(get_mtime(sbi, true));
ckpt->free_segment_count = cpu_to_le32(free_segments(sbi));
for (i = 0; i < NR_CURSEG_NODE_TYPE; i++) {
ckpt->cur_node_segno[i] =
@@ -1180,7 +1256,7 @@
}
/* 2 cp + n data seg summary + orphan inode blocks */
- data_sum_blocks = npages_for_summary_flush(sbi, false);
+ data_sum_blocks = f2fs_npages_for_summary_flush(sbi, false);
spin_lock_irqsave(&sbi->cp_lock, flags);
if (data_sum_blocks < NR_CURSEG_DATA_TYPE)
__set_ckpt_flags(ckpt, CP_COMPACT_SUM_FLAG);
@@ -1225,27 +1301,23 @@
blk = start_blk + sbi->blocks_per_seg - nm_i->nat_bits_blocks;
for (i = 0; i < nm_i->nat_bits_blocks; i++)
- update_meta_page(sbi, nm_i->nat_bits +
+ f2fs_update_meta_page(sbi, nm_i->nat_bits +
(i << F2FS_BLKSIZE_BITS), blk + i);
/* Flush all the NAT BITS pages */
while (get_pages(sbi, F2FS_DIRTY_META)) {
- sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
+ f2fs_sync_meta_pages(sbi, META, LONG_MAX,
+ FS_CP_META_IO);
if (unlikely(f2fs_cp_error(sbi)))
return -EIO;
}
}
- /* need to wait for end_io results */
- wait_on_all_pages_writeback(sbi);
- if (unlikely(f2fs_cp_error(sbi)))
- return -EIO;
-
/* write out checkpoint buffer at block 0 */
- update_meta_page(sbi, ckpt, start_blk++);
+ f2fs_update_meta_page(sbi, ckpt, start_blk++);
for (i = 1; i < 1 + cp_payload_blks; i++)
- update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE,
+ f2fs_update_meta_page(sbi, (char *)ckpt + i * F2FS_BLKSIZE,
start_blk++);
if (orphan_num) {
@@ -1253,7 +1325,7 @@
start_blk += orphan_blocks;
}
- write_data_summaries(sbi, start_blk);
+ f2fs_write_data_summaries(sbi, start_blk);
start_blk += data_sum_blocks;
/* Record write statistics in the hot node summary */
@@ -1264,33 +1336,33 @@
seg_i->journal->info.kbytes_written = cpu_to_le64(kbytes_written);
if (__remain_node_summaries(cpc->reason)) {
- write_node_summaries(sbi, start_blk);
+ f2fs_write_node_summaries(sbi, start_blk);
start_blk += NR_CURSEG_NODE_TYPE;
}
- /* writeout checkpoint block */
- update_meta_page(sbi, ckpt, start_blk);
-
- /* wait for previous submitted node/meta pages writeback */
- wait_on_all_pages_writeback(sbi);
-
- if (unlikely(f2fs_cp_error(sbi)))
- return -EIO;
-
- filemap_fdatawait_range(NODE_MAPPING(sbi), 0, LLONG_MAX);
- filemap_fdatawait_range(META_MAPPING(sbi), 0, LLONG_MAX);
-
/* update user_block_counts */
sbi->last_valid_block_count = sbi->total_valid_block_count;
percpu_counter_set(&sbi->alloc_valid_block_count, 0);
- /* Here, we only have one bio having CP pack */
- sync_meta_pages(sbi, META_FLUSH, LONG_MAX, FS_CP_META_IO);
+ /* Here, we have one bio having CP pack except cp pack 2 page */
+ f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_CP_META_IO);
/* wait for previous submitted meta pages writeback */
wait_on_all_pages_writeback(sbi);
- release_ino_entry(sbi, false);
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
+ /* flush all device cache */
+ err = f2fs_flush_device_cache(sbi);
+ if (err)
+ return err;
+
+ /* barrier and flush checkpoint cp pack 2 page if it can */
+ commit_checkpoint(sbi, ckpt, start_blk);
+ wait_on_all_pages_writeback(sbi);
+
+ f2fs_release_ino_entry(sbi, false);
if (unlikely(f2fs_cp_error(sbi)))
return -EIO;
@@ -1315,7 +1387,7 @@
/*
* We guarantee that this checkpoint procedure will not fail.
*/
-int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
+int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
unsigned long long ckpt_ver;
@@ -1348,7 +1420,7 @@
/* this is the case of multiple fstrims without any changes */
if (cpc->reason & CP_DISCARD) {
- if (!exist_trim_candidates(sbi, cpc)) {
+ if (!f2fs_exist_trim_candidates(sbi, cpc)) {
unblock_operations(sbi);
goto out;
}
@@ -1356,8 +1428,8 @@
if (NM_I(sbi)->dirty_nat_cnt == 0 &&
SIT_I(sbi)->dirty_sentries == 0 &&
prefree_segments(sbi) == 0) {
- flush_sit_entries(sbi, cpc);
- clear_prefree_segments(sbi, cpc);
+ f2fs_flush_sit_entries(sbi, cpc);
+ f2fs_clear_prefree_segments(sbi, cpc);
unblock_operations(sbi);
goto out;
}
@@ -1372,15 +1444,15 @@
ckpt->checkpoint_ver = cpu_to_le64(++ckpt_ver);
/* write cached NAT/SIT entries to NAT/SIT area */
- flush_nat_entries(sbi, cpc);
- flush_sit_entries(sbi, cpc);
+ f2fs_flush_nat_entries(sbi, cpc);
+ f2fs_flush_sit_entries(sbi, cpc);
/* unlock all the fs_lock[] in do_checkpoint() */
err = do_checkpoint(sbi, cpc);
if (err)
- release_discard_addrs(sbi);
+ f2fs_release_discard_addrs(sbi);
else
- clear_prefree_segments(sbi, cpc);
+ f2fs_clear_prefree_segments(sbi, cpc);
unblock_operations(sbi);
stat_inc_cp_count(sbi->stat_info);
@@ -1397,7 +1469,7 @@
return err;
}
-void init_ino_entry_info(struct f2fs_sb_info *sbi)
+void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi)
{
int i;
@@ -1415,23 +1487,23 @@
F2FS_ORPHANS_PER_BLOCK;
}
-int __init create_checkpoint_caches(void)
+int __init f2fs_create_checkpoint_caches(void)
{
ino_entry_slab = f2fs_kmem_cache_create("f2fs_ino_entry",
sizeof(struct ino_entry));
if (!ino_entry_slab)
return -ENOMEM;
- inode_entry_slab = f2fs_kmem_cache_create("f2fs_inode_entry",
+ f2fs_inode_entry_slab = f2fs_kmem_cache_create("f2fs_inode_entry",
sizeof(struct inode_entry));
- if (!inode_entry_slab) {
+ if (!f2fs_inode_entry_slab) {
kmem_cache_destroy(ino_entry_slab);
return -ENOMEM;
}
return 0;
}
-void destroy_checkpoint_caches(void)
+void f2fs_destroy_checkpoint_caches(void)
{
kmem_cache_destroy(ino_entry_slab);
- kmem_cache_destroy(inode_entry_slab);
+ kmem_cache_destroy(f2fs_inode_entry_slab);
}
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index e10bd73..e5936df 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -19,8 +19,6 @@
#include <linux/bio.h>
#include <linux/prefetch.h>
#include <linux/uio.h>
-#include <linux/mm.h>
-#include <linux/memcontrol.h>
#include <linux/cleancache.h>
#include <linux/sched/signal.h>
@@ -29,6 +27,12 @@
#include "segment.h"
#include "trace.h"
#include <trace/events/f2fs.h>
+#include <trace/events/android_fs.h>
+
+#define NUM_PREALLOC_POST_READ_CTXS 128
+
+static struct kmem_cache *bio_post_read_ctx_cache;
+static mempool_t *bio_post_read_ctx_pool;
static bool __is_cp_guaranteed(struct page *page)
{
@@ -45,16 +49,84 @@
if (inode->i_ino == F2FS_META_INO(sbi) ||
inode->i_ino == F2FS_NODE_INO(sbi) ||
S_ISDIR(inode->i_mode) ||
+ (S_ISREG(inode->i_mode) &&
+ is_inode_flag_set(inode, FI_ATOMIC_FILE)) ||
is_cold_data(page))
return true;
return false;
}
-static void f2fs_read_end_io(struct bio *bio)
+/* postprocessing steps for read bios */
+enum bio_post_read_step {
+ STEP_INITIAL = 0,
+ STEP_DECRYPT,
+};
+
+struct bio_post_read_ctx {
+ struct bio *bio;
+ struct work_struct work;
+ unsigned int cur_step;
+ unsigned int enabled_steps;
+};
+
+static void __read_end_io(struct bio *bio)
{
- struct bio_vec *bvec;
+ struct page *page;
+ struct bio_vec *bv;
int i;
+ bio_for_each_segment_all(bv, bio, i) {
+ page = bv->bv_page;
+
+ /* PG_error was set if any post_read step failed */
+ if (bio->bi_status || PageError(page)) {
+ ClearPageUptodate(page);
+ SetPageError(page);
+ } else {
+ SetPageUptodate(page);
+ }
+ unlock_page(page);
+ }
+ if (bio->bi_private)
+ mempool_free(bio->bi_private, bio_post_read_ctx_pool);
+ bio_put(bio);
+}
+
+static void bio_post_read_processing(struct bio_post_read_ctx *ctx);
+
+static void decrypt_work(struct work_struct *work)
+{
+ struct bio_post_read_ctx *ctx =
+ container_of(work, struct bio_post_read_ctx, work);
+
+ fscrypt_decrypt_bio(ctx->bio);
+
+ bio_post_read_processing(ctx);
+}
+
+static void bio_post_read_processing(struct bio_post_read_ctx *ctx)
+{
+ switch (++ctx->cur_step) {
+ case STEP_DECRYPT:
+ if (ctx->enabled_steps & (1 << STEP_DECRYPT)) {
+ INIT_WORK(&ctx->work, decrypt_work);
+ fscrypt_enqueue_decrypt_work(&ctx->work);
+ return;
+ }
+ ctx->cur_step++;
+ /* fall-through */
+ default:
+ __read_end_io(ctx->bio);
+ }
+}
+
+static bool f2fs_bio_post_read_required(struct bio *bio)
+{
+ return bio->bi_private && !bio->bi_status;
+}
+
+static void f2fs_read_end_io(struct bio *bio)
+{
#ifdef CONFIG_F2FS_FAULT_INJECTION
if (time_to_inject(F2FS_P_SB(bio->bi_io_vec->bv_page), FAULT_IO)) {
f2fs_show_injection_info(FAULT_IO);
@@ -62,28 +134,15 @@
}
#endif
- if (f2fs_bio_encrypted(bio)) {
- if (bio->bi_status) {
- fscrypt_release_ctx(bio->bi_private);
- } else {
- fscrypt_decrypt_bio_pages(bio->bi_private, bio);
- return;
- }
+ if (f2fs_bio_post_read_required(bio)) {
+ struct bio_post_read_ctx *ctx = bio->bi_private;
+
+ ctx->cur_step = STEP_INITIAL;
+ bio_post_read_processing(ctx);
+ return;
}
- bio_for_each_segment_all(bvec, bio, i) {
- struct page *page = bvec->bv_page;
-
- if (!bio->bi_status) {
- if (!PageUptodate(page))
- SetPageUptodate(page);
- } else {
- ClearPageUptodate(page);
- SetPageError(page);
- }
- unlock_page(page);
- }
- bio_put(bio);
+ __read_end_io(bio);
}
static void f2fs_write_end_io(struct bio *bio)
@@ -111,8 +170,13 @@
if (unlikely(bio->bi_status)) {
mapping_set_error(page->mapping, -EIO);
- f2fs_stop_checkpoint(sbi, true);
+ if (type == F2FS_WB_CP_DATA)
+ f2fs_stop_checkpoint(sbi, true);
}
+
+ f2fs_bug_on(sbi, page->mapping == NODE_MAPPING(sbi) &&
+ page->index != nid_of_node(page));
+
dec_page_count(sbi, type);
clear_cold_data(page);
end_page_writeback(page);
@@ -169,15 +233,25 @@
* Low-level block read/write IO operations.
*/
static struct bio *__bio_alloc(struct f2fs_sb_info *sbi, block_t blk_addr,
- int npages, bool is_read)
+ struct writeback_control *wbc,
+ int npages, bool is_read,
+ enum page_type type, enum temp_type temp)
{
struct bio *bio;
- bio = f2fs_bio_alloc(npages);
+ bio = f2fs_bio_alloc(sbi, npages, true);
f2fs_target_device(sbi, blk_addr, bio);
- bio->bi_end_io = is_read ? f2fs_read_end_io : f2fs_write_end_io;
- bio->bi_private = is_read ? NULL : sbi;
+ if (is_read) {
+ bio->bi_end_io = f2fs_read_end_io;
+ bio->bi_private = NULL;
+ } else {
+ bio->bi_end_io = f2fs_write_end_io;
+ bio->bi_private = sbi;
+ bio->bi_write_hint = f2fs_io_type_to_rw_hint(sbi, type, temp);
+ }
+ if (wbc)
+ wbc_init_bio(wbc, bio);
return bio;
}
@@ -188,13 +262,12 @@
if (!is_read_io(bio_op(bio))) {
unsigned int start;
- if (f2fs_sb_mounted_blkzoned(sbi->sb) &&
- current->plug && (type == DATA || type == NODE))
- blk_finish_plug(current->plug);
-
if (type != DATA && type != NODE)
goto submit_io;
+ if (f2fs_sb_has_blkzoned(sbi->sb) && current->plug)
+ blk_finish_plug(current->plug);
+
start = bio->bi_iter.bi_size >> F2FS_BLKSIZE_BITS;
start %= F2FS_IO_SIZE(sbi);
@@ -369,11 +442,13 @@
struct page *page = fio->encrypted_page ?
fio->encrypted_page : fio->page;
+ verify_block_addr(fio, fio->new_blkaddr);
trace_f2fs_submit_page_bio(page, fio);
f2fs_trace_ios(fio, 0);
/* Allocate a new bio */
- bio = __bio_alloc(fio->sbi, fio->new_blkaddr, 1, is_read_io(fio->op));
+ bio = __bio_alloc(fio->sbi, fio->new_blkaddr, fio->io_wbc,
+ 1, is_read_io(fio->op), fio->type, fio->temp);
if (bio_add_page(bio, page, PAGE_SIZE, 0) < PAGE_SIZE) {
bio_put(bio);
@@ -388,13 +463,12 @@
return 0;
}
-int f2fs_submit_page_write(struct f2fs_io_info *fio)
+void f2fs_submit_page_write(struct f2fs_io_info *fio)
{
struct f2fs_sb_info *sbi = fio->sbi;
enum page_type btype = PAGE_TYPE_OF_BIO(fio->type);
struct f2fs_bio_info *io = sbi->write_io[btype] + fio->temp;
struct page *bio_page;
- int err = 0;
f2fs_bug_on(sbi, is_read_io(fio->op));
@@ -404,7 +478,7 @@
spin_lock(&io->io_lock);
if (list_empty(&io->io_list)) {
spin_unlock(&io->io_lock);
- goto out_fail;
+ goto out;
}
fio = list_first_entry(&io->io_list,
struct f2fs_io_info, list);
@@ -412,14 +486,14 @@
spin_unlock(&io->io_lock);
}
- if (fio->old_blkaddr != NEW_ADDR)
- verify_block_addr(sbi, fio->old_blkaddr);
- verify_block_addr(sbi, fio->new_blkaddr);
+ if (is_valid_blkaddr(fio->old_blkaddr))
+ verify_block_addr(fio, fio->old_blkaddr);
+ verify_block_addr(fio, fio->new_blkaddr);
bio_page = fio->encrypted_page ? fio->encrypted_page : fio->page;
- /* set submitted = 1 as a return value */
- fio->submitted = 1;
+ /* set submitted = true as a return value */
+ fio->submitted = true;
inc_page_count(sbi, WB_DATA_TYPE(bio_page));
@@ -431,12 +505,13 @@
if (io->bio == NULL) {
if ((fio->type == DATA || fio->type == NODE) &&
fio->new_blkaddr & F2FS_IO_SIZE_MASK(sbi)) {
- err = -EAGAIN;
dec_page_count(sbi, WB_DATA_TYPE(bio_page));
- goto out_fail;
+ fio->retry = true;
+ goto skip;
}
- io->bio = __bio_alloc(sbi, fio->new_blkaddr,
- BIO_MAX_PAGES, false);
+ io->bio = __bio_alloc(sbi, fio->new_blkaddr, fio->io_wbc,
+ BIO_MAX_PAGES, false,
+ fio->type, fio->temp);
io->fio = *fio;
}
@@ -445,45 +520,51 @@
goto alloc_new;
}
+ if (fio->io_wbc)
+ wbc_account_io(fio->io_wbc, bio_page, PAGE_SIZE);
+
io->last_block_in_bio = fio->new_blkaddr;
f2fs_trace_ios(fio, 0);
trace_f2fs_submit_page_write(fio->page, fio);
-
+skip:
if (fio->in_list)
goto next;
-out_fail:
+out:
up_write(&io->io_rwsem);
- return err;
}
static struct bio *f2fs_grab_read_bio(struct inode *inode, block_t blkaddr,
unsigned nr_pages)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct fscrypt_ctx *ctx = NULL;
struct bio *bio;
+ struct bio_post_read_ctx *ctx;
+ unsigned int post_read_steps = 0;
- if (f2fs_encrypted_file(inode)) {
- ctx = fscrypt_get_ctx(inode, GFP_NOFS);
- if (IS_ERR(ctx))
- return ERR_CAST(ctx);
+ bio = f2fs_bio_alloc(sbi, min_t(int, nr_pages, BIO_MAX_PAGES), false);
+ if (!bio)
+ return ERR_PTR(-ENOMEM);
+ f2fs_target_device(sbi, blkaddr, bio);
+ bio->bi_end_io = f2fs_read_end_io;
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+
+ if (f2fs_encrypted_file(inode))
+ post_read_steps |= 1 << STEP_DECRYPT;
+ if (post_read_steps) {
+ ctx = mempool_alloc(bio_post_read_ctx_pool, GFP_NOFS);
+ if (!ctx) {
+ bio_put(bio);
+ return ERR_PTR(-ENOMEM);
+ }
+ ctx->bio = bio;
+ ctx->enabled_steps = post_read_steps;
+ bio->bi_private = ctx;
/* wait the page to be moved by cleaning */
f2fs_wait_on_block_writeback(sbi, blkaddr);
}
- bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
- if (!bio) {
- if (ctx)
- fscrypt_release_ctx(ctx);
- return ERR_PTR(-ENOMEM);
- }
- f2fs_target_device(sbi, blkaddr, bio);
- bio->bi_end_io = f2fs_read_end_io;
- bio->bi_private = ctx;
- bio_set_op_attrs(bio, REQ_OP_READ, 0);
-
return bio;
}
@@ -524,7 +605,7 @@
* ->node_page
* update block addresses in the node page
*/
-void set_data_blkaddr(struct dnode_of_data *dn)
+void f2fs_set_data_blkaddr(struct dnode_of_data *dn)
{
f2fs_wait_on_page_writeback(dn->node_page, NODE, true);
__set_data_blkaddr(dn);
@@ -535,12 +616,12 @@
void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr)
{
dn->data_blkaddr = blkaddr;
- set_data_blkaddr(dn);
+ f2fs_set_data_blkaddr(dn);
f2fs_update_extent_cache(dn);
}
/* dn->ofs_in_node will be returned with up-to-date last block pointer */
-int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count)
+int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
int err;
@@ -574,12 +655,12 @@
}
/* Should keep dn->ofs_in_node unchanged */
-int reserve_new_block(struct dnode_of_data *dn)
+int f2fs_reserve_new_block(struct dnode_of_data *dn)
{
unsigned int ofs_in_node = dn->ofs_in_node;
int ret;
- ret = reserve_new_blocks(dn, 1);
+ ret = f2fs_reserve_new_blocks(dn, 1);
dn->ofs_in_node = ofs_in_node;
return ret;
}
@@ -589,12 +670,12 @@
bool need_put = dn->inode_page ? false : true;
int err;
- err = get_dnode_of_data(dn, index, ALLOC_NODE);
+ err = f2fs_get_dnode_of_data(dn, index, ALLOC_NODE);
if (err)
return err;
if (dn->data_blkaddr == NULL_ADDR)
- err = reserve_new_block(dn);
+ err = f2fs_reserve_new_block(dn);
if (err || need_put)
f2fs_put_dnode(dn);
return err;
@@ -613,7 +694,7 @@
return f2fs_reserve_block(dn, index);
}
-struct page *get_read_data_page(struct inode *inode, pgoff_t index,
+struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
int op_flags, bool for_write)
{
struct address_space *mapping = inode->i_mapping;
@@ -632,7 +713,7 @@
}
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
if (err)
goto put_err;
f2fs_put_dnode(&dn);
@@ -651,7 +732,8 @@
* A new dentry page is allocated but not able to be written, since its
* new inode page couldn't be allocated due to -ENOSPC.
* In such the case, its blkaddr can be remained as NEW_ADDR.
- * see, f2fs_add_link -> get_new_data_page -> init_inode_metadata.
+ * see, f2fs_add_link -> f2fs_get_new_data_page ->
+ * f2fs_init_inode_metadata.
*/
if (dn.data_blkaddr == NEW_ADDR) {
zero_user_segment(page, 0, PAGE_SIZE);
@@ -671,7 +753,7 @@
return ERR_PTR(err);
}
-struct page *find_data_page(struct inode *inode, pgoff_t index)
+struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index)
{
struct address_space *mapping = inode->i_mapping;
struct page *page;
@@ -681,7 +763,7 @@
return page;
f2fs_put_page(page, 0);
- page = get_read_data_page(inode, index, 0, false);
+ page = f2fs_get_read_data_page(inode, index, 0, false);
if (IS_ERR(page))
return page;
@@ -701,13 +783,13 @@
* Because, the callers, functions in dir.c and GC, should be able to know
* whether this page exists or not.
*/
-struct page *get_lock_data_page(struct inode *inode, pgoff_t index,
+struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
bool for_write)
{
struct address_space *mapping = inode->i_mapping;
struct page *page;
repeat:
- page = get_read_data_page(inode, index, 0, for_write);
+ page = f2fs_get_read_data_page(inode, index, 0, for_write);
if (IS_ERR(page))
return page;
@@ -733,7 +815,7 @@
* Note that, ipage is set only by make_empty_dir, and if any error occur,
* ipage should be released by this function.
*/
-struct page *get_new_data_page(struct inode *inode,
+struct page *f2fs_get_new_data_page(struct inode *inode,
struct page *ipage, pgoff_t index, bool new_i_size)
{
struct address_space *mapping = inode->i_mapping;
@@ -772,7 +854,7 @@
/* if ipage exists, blkaddr should be NEW_ADDR */
f2fs_bug_on(F2FS_I_SB(inode), ipage);
- page = get_lock_data_page(inode, index, true);
+ page = f2fs_get_lock_data_page(inode, index, true);
if (IS_ERR(page))
return page;
}
@@ -783,7 +865,7 @@
return page;
}
-static int __allocate_data_block(struct dnode_of_data *dn)
+static int __allocate_data_block(struct dnode_of_data *dn, int seg_type)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct f2fs_summary sum;
@@ -804,15 +886,15 @@
return err;
alloc:
- get_node_info(sbi, dn->nid, &ni);
+ f2fs_get_node_info(sbi, dn->nid, &ni);
set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
- allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr,
- &sum, CURSEG_WARM_DATA, NULL, false);
- set_data_blkaddr(dn);
+ f2fs_allocate_data_block(sbi, NULL, dn->data_blkaddr, &dn->data_blkaddr,
+ &sum, seg_type, NULL, false);
+ f2fs_set_data_blkaddr(dn);
/* update i_size */
- fofs = start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) +
+ fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) +
dn->ofs_in_node;
if (i_size_read(dn->inode) < ((loff_t)(fofs + 1) << PAGE_SHIFT))
f2fs_i_size_write(dn->inode,
@@ -820,18 +902,20 @@
return 0;
}
-static inline bool __force_buffered_io(struct inode *inode, int rw)
-{
- return (f2fs_encrypted_file(inode) ||
- (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
- F2FS_I_SB(inode)->s_ndevs);
-}
-
int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from)
{
struct inode *inode = file_inode(iocb->ki_filp);
struct f2fs_map_blocks map;
+ int flag;
int err = 0;
+ bool direct_io = iocb->ki_flags & IOCB_DIRECT;
+
+ /* convert inline data for Direct I/O*/
+ if (direct_io) {
+ err = f2fs_convert_inline_inode(inode);
+ if (err)
+ return err;
+ }
if (is_inode_flag_set(inode, FI_NO_PREALLOC))
return 0;
@@ -844,23 +928,33 @@
map.m_len = 0;
map.m_next_pgofs = NULL;
+ map.m_next_extent = NULL;
+ map.m_seg_type = NO_CHECK_TYPE;
- if (iocb->ki_flags & IOCB_DIRECT) {
- err = f2fs_convert_inline_inode(inode);
- if (err)
- return err;
- return f2fs_map_blocks(inode, &map, 1,
- __force_buffered_io(inode, WRITE) ?
- F2FS_GET_BLOCK_PRE_AIO :
- F2FS_GET_BLOCK_PRE_DIO);
+ if (direct_io) {
+ map.m_seg_type = f2fs_rw_hint_to_seg_type(iocb->ki_hint);
+ flag = f2fs_force_buffered_io(inode, WRITE) ?
+ F2FS_GET_BLOCK_PRE_AIO :
+ F2FS_GET_BLOCK_PRE_DIO;
+ goto map_blocks;
}
if (iocb->ki_pos + iov_iter_count(from) > MAX_INLINE_DATA(inode)) {
err = f2fs_convert_inline_inode(inode);
if (err)
return err;
}
- if (!f2fs_has_inline_data(inode))
- return f2fs_map_blocks(inode, &map, 1, F2FS_GET_BLOCK_PRE_AIO);
+ if (f2fs_has_inline_data(inode))
+ return err;
+
+ flag = F2FS_GET_BLOCK_PRE_AIO;
+
+map_blocks:
+ err = f2fs_map_blocks(inode, &map, 1, flag);
+ if (map.m_len > 0 && err == -ENOSPC) {
+ if (!direct_io)
+ set_inode_flag(inode, FI_NO_PREALLOC);
+ err = 0;
+ }
return err;
}
@@ -901,6 +995,7 @@
blkcnt_t prealloc;
struct extent_info ei = {0,0,0};
block_t blkaddr;
+ unsigned int start_pgofs;
if (!maxblocks)
return 0;
@@ -916,6 +1011,8 @@
map->m_pblk = ei.blk + pgofs - ei.fofs;
map->m_len = min((pgoff_t)maxblocks, ei.fofs + ei.len - pgofs);
map->m_flags = F2FS_MAP_MAPPED;
+ if (map->m_next_extent)
+ *map->m_next_extent = pgofs + map->m_len;
goto out;
}
@@ -925,7 +1022,7 @@
/* When reading holes, we need its node page */
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, pgofs, mode);
+ err = f2fs_get_dnode_of_data(&dn, pgofs, mode);
if (err) {
if (flag == F2FS_GET_BLOCK_BMAP)
map->m_pblk = 0;
@@ -933,11 +1030,15 @@
err = 0;
if (map->m_next_pgofs)
*map->m_next_pgofs =
- get_next_page_offset(&dn, pgofs);
+ f2fs_get_next_page_offset(&dn, pgofs);
+ if (map->m_next_extent)
+ *map->m_next_extent =
+ f2fs_get_next_page_offset(&dn, pgofs);
}
goto unlock_out;
}
+ start_pgofs = pgofs;
prealloc = 0;
last_ofs_in_node = ofs_in_node = dn.ofs_in_node;
end_offset = ADDRS_PER_PAGE(dn.node_page, inode);
@@ -945,7 +1046,7 @@
next_block:
blkaddr = datablock_addr(dn.inode, dn.node_page, dn.ofs_in_node);
- if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) {
+ if (!is_valid_blkaddr(blkaddr)) {
if (create) {
if (unlikely(f2fs_cp_error(sbi))) {
err = -EIO;
@@ -957,7 +1058,8 @@
last_ofs_in_node = dn.ofs_in_node;
}
} else {
- err = __allocate_data_block(&dn);
+ err = __allocate_data_block(&dn,
+ map->m_seg_type);
if (!err)
set_inode_flag(inode, FI_APPEND_WRITE);
}
@@ -970,14 +1072,20 @@
map->m_pblk = 0;
goto sync_out;
}
+ if (flag == F2FS_GET_BLOCK_PRECACHE)
+ goto sync_out;
if (flag == F2FS_GET_BLOCK_FIEMAP &&
blkaddr == NULL_ADDR) {
if (map->m_next_pgofs)
*map->m_next_pgofs = pgofs + 1;
- }
- if (flag != F2FS_GET_BLOCK_FIEMAP ||
- blkaddr != NEW_ADDR)
goto sync_out;
+ }
+ if (flag != F2FS_GET_BLOCK_FIEMAP) {
+ /* for defragment case */
+ if (map->m_next_pgofs)
+ *map->m_next_pgofs = pgofs + 1;
+ goto sync_out;
+ }
}
}
@@ -1011,7 +1119,7 @@
(pgofs == end || dn.ofs_in_node == end_offset)) {
dn.ofs_in_node = ofs_in_node;
- err = reserve_new_blocks(&dn, prealloc);
+ err = f2fs_reserve_new_blocks(&dn, prealloc);
if (err)
goto sync_out;
@@ -1028,6 +1136,16 @@
else if (dn.ofs_in_node < end_offset)
goto next_block;
+ if (flag == F2FS_GET_BLOCK_PRECACHE) {
+ if (map->m_flags & F2FS_MAP_MAPPED) {
+ unsigned int ofs = start_pgofs - map->m_lblk;
+
+ f2fs_update_extent_cache_range(&dn,
+ start_pgofs, map->m_pblk + ofs,
+ map->m_len - ofs);
+ }
+ }
+
f2fs_put_dnode(&dn);
if (create) {
@@ -1037,6 +1155,17 @@
goto next_dnode;
sync_out:
+ if (flag == F2FS_GET_BLOCK_PRECACHE) {
+ if (map->m_flags & F2FS_MAP_MAPPED) {
+ unsigned int ofs = start_pgofs - map->m_lblk;
+
+ f2fs_update_extent_cache_range(&dn,
+ start_pgofs, map->m_pblk + ofs,
+ map->m_len - ofs);
+ }
+ if (map->m_next_extent)
+ *map->m_next_extent = pgofs + 1;
+ }
f2fs_put_dnode(&dn);
unlock_out:
if (create) {
@@ -1048,9 +1177,34 @@
return err;
}
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len)
+{
+ struct f2fs_map_blocks map;
+ block_t last_lblk;
+ int err;
+
+ if (pos + len > i_size_read(inode))
+ return false;
+
+ map.m_lblk = F2FS_BYTES_TO_BLK(pos);
+ map.m_next_pgofs = NULL;
+ map.m_next_extent = NULL;
+ map.m_seg_type = NO_CHECK_TYPE;
+ last_lblk = F2FS_BLK_ALIGN(pos + len);
+
+ while (map.m_lblk < last_lblk) {
+ map.m_len = last_lblk - map.m_lblk;
+ err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_DEFAULT);
+ if (err || map.m_len == 0)
+ return false;
+ map.m_lblk += map.m_len;
+ }
+ return true;
+}
+
static int __get_data_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh, int create, int flag,
- pgoff_t *next_pgofs)
+ pgoff_t *next_pgofs, int seg_type)
{
struct f2fs_map_blocks map;
int err;
@@ -1058,6 +1212,8 @@
map.m_lblk = iblock;
map.m_len = bh->b_size >> inode->i_blkbits;
map.m_next_pgofs = next_pgofs;
+ map.m_next_extent = NULL;
+ map.m_seg_type = seg_type;
err = f2fs_map_blocks(inode, &map, create, flag);
if (!err) {
@@ -1073,14 +1229,17 @@
pgoff_t *next_pgofs)
{
return __get_data_block(inode, iblock, bh_result, create,
- flag, next_pgofs);
+ flag, next_pgofs,
+ NO_CHECK_TYPE);
}
static int get_data_block_dio(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
return __get_data_block(inode, iblock, bh_result, create,
- F2FS_GET_BLOCK_DEFAULT, NULL);
+ F2FS_GET_BLOCK_DEFAULT, NULL,
+ f2fs_rw_hint_to_seg_type(
+ inode->i_write_hint));
}
static int get_data_block_bmap(struct inode *inode, sector_t iblock,
@@ -1091,7 +1250,8 @@
return -EFBIG;
return __get_data_block(inode, iblock, bh_result, create,
- F2FS_GET_BLOCK_BMAP, NULL);
+ F2FS_GET_BLOCK_BMAP, NULL,
+ NO_CHECK_TYPE);
}
static inline sector_t logical_to_blk(struct inode *inode, loff_t offset)
@@ -1104,6 +1264,68 @@
return (blk << inode->i_blkbits);
}
+static int f2fs_xattr_fiemap(struct inode *inode,
+ struct fiemap_extent_info *fieinfo)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct page *page;
+ struct node_info ni;
+ __u64 phys = 0, len;
+ __u32 flags;
+ nid_t xnid = F2FS_I(inode)->i_xattr_nid;
+ int err = 0;
+
+ if (f2fs_has_inline_xattr(inode)) {
+ int offset;
+
+ page = f2fs_grab_cache_page(NODE_MAPPING(sbi),
+ inode->i_ino, false);
+ if (!page)
+ return -ENOMEM;
+
+ f2fs_get_node_info(sbi, inode->i_ino, &ni);
+
+ phys = (__u64)blk_to_logical(inode, ni.blk_addr);
+ offset = offsetof(struct f2fs_inode, i_addr) +
+ sizeof(__le32) * (DEF_ADDRS_PER_INODE -
+ get_inline_xattr_addrs(inode));
+
+ phys += offset;
+ len = inline_xattr_size(inode);
+
+ f2fs_put_page(page, 1);
+
+ flags = FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_NOT_ALIGNED;
+
+ if (!xnid)
+ flags |= FIEMAP_EXTENT_LAST;
+
+ err = fiemap_fill_next_extent(fieinfo, 0, phys, len, flags);
+ if (err || err == 1)
+ return err;
+ }
+
+ if (xnid) {
+ page = f2fs_grab_cache_page(NODE_MAPPING(sbi), xnid, false);
+ if (!page)
+ return -ENOMEM;
+
+ f2fs_get_node_info(sbi, xnid, &ni);
+
+ phys = (__u64)blk_to_logical(inode, ni.blk_addr);
+ len = inode->i_sb->s_blocksize;
+
+ f2fs_put_page(page, 1);
+
+ flags = FIEMAP_EXTENT_LAST;
+ }
+
+ if (phys)
+ err = fiemap_fill_next_extent(fieinfo, 0, phys, len, flags);
+
+ return (err < 0 ? err : 0);
+}
+
int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
{
@@ -1114,18 +1336,29 @@
u32 flags = 0;
int ret = 0;
- ret = fiemap_check_flags(fieinfo, FIEMAP_FLAG_SYNC);
+ if (fieinfo->fi_flags & FIEMAP_FLAG_CACHE) {
+ ret = f2fs_precache_extents(inode);
+ if (ret)
+ return ret;
+ }
+
+ ret = fiemap_check_flags(fieinfo, FIEMAP_FLAG_SYNC | FIEMAP_FLAG_XATTR);
if (ret)
return ret;
+ inode_lock(inode);
+
+ if (fieinfo->fi_flags & FIEMAP_FLAG_XATTR) {
+ ret = f2fs_xattr_fiemap(inode, fieinfo);
+ goto out;
+ }
+
if (f2fs_has_inline_data(inode)) {
ret = f2fs_inline_data_fiemap(inode, fieinfo, start, len);
if (ret != -EAGAIN)
- return ret;
+ goto out;
}
- inode_lock(inode);
-
if (logical_to_blk(inode, len) == 0)
len = blk_to_logical(inode, 1);
@@ -1195,7 +1428,6 @@
unsigned nr_pages)
{
struct bio *bio = NULL;
- unsigned page_idx;
sector_t last_block_in_bio = 0;
struct inode *inode = mapping->host;
const unsigned blkbits = inode->i_blkbits;
@@ -1211,9 +1443,10 @@
map.m_len = 0;
map.m_flags = 0;
map.m_next_pgofs = NULL;
+ map.m_next_extent = NULL;
+ map.m_seg_type = NO_CHECK_TYPE;
- for (page_idx = 0; nr_pages; page_idx++, nr_pages--) {
-
+ for (; nr_pages; nr_pages--) {
if (pages) {
page = list_last_entry(pages, struct page, lru);
@@ -1334,7 +1567,7 @@
struct address_space *mapping,
struct list_head *pages, unsigned nr_pages)
{
- struct inode *inode = file->f_mapping->host;
+ struct inode *inode = mapping->host;
struct page *page = list_last_entry(pages, struct page, lru);
trace_f2fs_readpages(inode, page, nr_pages);
@@ -1354,7 +1587,7 @@
if (!f2fs_encrypted_file(inode))
return 0;
- /* wait for GCed encrypted page writeback */
+ /* wait for GCed page writeback via META_MAPPING */
f2fs_wait_on_block_writeback(fio->sbi, fio->old_blkaddr);
retry_encrypt:
@@ -1373,30 +1606,82 @@
return PTR_ERR(fio->encrypted_page);
}
+static inline bool check_inplace_update_policy(struct inode *inode,
+ struct f2fs_io_info *fio)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ unsigned int policy = SM_I(sbi)->ipu_policy;
+
+ if (policy & (0x1 << F2FS_IPU_FORCE))
+ return true;
+ if (policy & (0x1 << F2FS_IPU_SSR) && f2fs_need_SSR(sbi))
+ return true;
+ if (policy & (0x1 << F2FS_IPU_UTIL) &&
+ utilization(sbi) > SM_I(sbi)->min_ipu_util)
+ return true;
+ if (policy & (0x1 << F2FS_IPU_SSR_UTIL) && f2fs_need_SSR(sbi) &&
+ utilization(sbi) > SM_I(sbi)->min_ipu_util)
+ return true;
+
+ /*
+ * IPU for rewrite async pages
+ */
+ if (policy & (0x1 << F2FS_IPU_ASYNC) &&
+ fio && fio->op == REQ_OP_WRITE &&
+ !(fio->op_flags & REQ_SYNC) &&
+ !f2fs_encrypted_inode(inode))
+ return true;
+
+ /* this is only set during fdatasync */
+ if (policy & (0x1 << F2FS_IPU_FSYNC) &&
+ is_inode_flag_set(inode, FI_NEED_IPU))
+ return true;
+
+ return false;
+}
+
+bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio)
+{
+ if (f2fs_is_pinned_file(inode))
+ return true;
+
+ /* if this is cold file, we should overwrite to avoid fragmentation */
+ if (file_is_cold(inode))
+ return true;
+
+ return check_inplace_update_policy(inode, fio);
+}
+
+bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (test_opt(sbi, LFS))
+ return true;
+ if (S_ISDIR(inode->i_mode))
+ return true;
+ if (f2fs_is_atomic_file(inode))
+ return true;
+ if (fio) {
+ if (is_cold_data(fio->page))
+ return true;
+ if (IS_ATOMIC_WRITTEN_PAGE(fio->page))
+ return true;
+ }
+ return false;
+}
+
static inline bool need_inplace_update(struct f2fs_io_info *fio)
{
struct inode *inode = fio->page->mapping->host;
- if (S_ISDIR(inode->i_mode) || f2fs_is_atomic_file(inode))
- return false;
- if (is_cold_data(fio->page))
- return false;
- if (IS_ATOMIC_WRITTEN_PAGE(fio->page))
+ if (f2fs_should_update_outplace(inode, fio))
return false;
- return need_inplace_update_policy(inode, fio);
+ return f2fs_should_update_inplace(inode, fio);
}
-static inline bool valid_ipu_blkaddr(struct f2fs_io_info *fio)
-{
- if (fio->old_blkaddr == NEW_ADDR)
- return false;
- if (fio->old_blkaddr == NULL_ADDR)
- return false;
- return true;
-}
-
-int do_write_data_page(struct f2fs_io_info *fio)
+int f2fs_do_write_data_page(struct f2fs_io_info *fio)
{
struct page *page = fio->page;
struct inode *inode = page->mapping->host;
@@ -1410,7 +1695,7 @@
f2fs_lookup_extent_cache(inode, page->index, &ei)) {
fio->old_blkaddr = ei.blk + page->index - ei.fofs;
- if (valid_ipu_blkaddr(fio)) {
+ if (is_valid_blkaddr(fio->old_blkaddr)) {
ipu_force = true;
fio->need_lock = LOCK_DONE;
goto got_it;
@@ -1421,7 +1706,7 @@
if (fio->need_lock == LOCK_REQ && !f2fs_trylock_op(fio->sbi))
return -EAGAIN;
- err = get_dnode_of_data(&dn, page->index, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, page->index, LOOKUP_NODE);
if (err)
goto out;
@@ -1437,16 +1722,18 @@
* If current allocation needs SSR,
* it had better in-place writes for updated data.
*/
- if (ipu_force || (valid_ipu_blkaddr(fio) && need_inplace_update(fio))) {
+ if (ipu_force || (is_valid_blkaddr(fio->old_blkaddr) &&
+ need_inplace_update(fio))) {
err = encrypt_one_page(fio);
if (err)
goto out_writepage;
set_page_writeback(page);
+ ClearPageError(page);
f2fs_put_dnode(&dn);
if (fio->need_lock == LOCK_REQ)
f2fs_unlock_op(fio->sbi);
- err = rewrite_data_page(fio);
+ err = f2fs_inplace_write_data(fio);
trace_f2fs_do_write_data_page(fio->page, IPU);
set_inode_flag(inode, FI_UPDATE_WRITE);
return err;
@@ -1465,9 +1752,10 @@
goto out_writepage;
set_page_writeback(page);
+ ClearPageError(page);
/* LFS mode write path */
- write_data_page(&dn, fio);
+ f2fs_outplace_write_data(&dn, fio);
trace_f2fs_do_write_data_page(page, OPU);
set_inode_flag(inode, FI_APPEND_WRITE);
if (page->index == 0)
@@ -1495,6 +1783,7 @@
int err = 0;
struct f2fs_io_info fio = {
.sbi = sbi,
+ .ino = inode->i_ino,
.type = DATA,
.op = REQ_OP_WRITE,
.op_flags = wbc_to_write_flags(wbc),
@@ -1504,10 +1793,23 @@
.submitted = false,
.need_lock = LOCK_RETRY,
.io_type = io_type,
+ .io_wbc = wbc,
};
trace_f2fs_writepage(page, DATA);
+ /* we should bypass data pages to proceed the kworkder jobs */
+ if (unlikely(f2fs_cp_error(sbi))) {
+ mapping_set_error(page->mapping, -EIO);
+ /*
+ * don't drop any dirty dentry pages for keeping lastest
+ * directory structure.
+ */
+ if (S_ISDIR(inode->i_mode))
+ goto redirty_out;
+ goto out;
+ }
+
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
goto redirty_out;
@@ -1529,19 +1831,13 @@
/* we should not write 0'th page having journal header */
if (f2fs_is_volatile_file(inode) && (!page->index ||
(!wbc->for_reclaim &&
- available_free_memory(sbi, BASE_CHECK))))
+ f2fs_available_free_memory(sbi, BASE_CHECK))))
goto redirty_out;
- /* we should bypass data pages to proceed the kworkder jobs */
- if (unlikely(f2fs_cp_error(sbi))) {
- mapping_set_error(page->mapping, -EIO);
- goto out;
- }
-
/* Dentry blocks are controlled by checkpoint */
if (S_ISDIR(inode->i_mode)) {
fio.need_lock = LOCK_DONE;
- err = do_write_data_page(&fio);
+ err = f2fs_do_write_data_page(&fio);
goto done;
}
@@ -1560,14 +1856,21 @@
}
if (err == -EAGAIN) {
- err = do_write_data_page(&fio);
+ err = f2fs_do_write_data_page(&fio);
if (err == -EAGAIN) {
fio.need_lock = LOCK_REQ;
- err = do_write_data_page(&fio);
+ err = f2fs_do_write_data_page(&fio);
}
}
- if (F2FS_I(inode)->last_disk_size < psize)
- F2FS_I(inode)->last_disk_size = psize;
+
+ if (err) {
+ file_set_keep_isize(inode);
+ } else {
+ down_write(&F2FS_I(inode)->i_sem);
+ if (F2FS_I(inode)->last_disk_size < psize)
+ F2FS_I(inode)->last_disk_size = psize;
+ up_write(&F2FS_I(inode)->i_sem);
+ }
done:
if (err && err != -ENOENT)
@@ -1581,7 +1884,7 @@
if (wbc->for_reclaim) {
f2fs_submit_merged_write_cond(sbi, inode, 0, page->index, DATA);
clear_inode_flag(inode, FI_HOT_DATA);
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
submitted = NULL;
}
@@ -1631,6 +1934,7 @@
int ret = 0;
int done = 0;
struct pagevec pvec;
+ struct f2fs_sb_info *sbi = F2FS_M_SB(mapping);
int nr_pages;
pgoff_t uninitialized_var(writeback_index);
pgoff_t index;
@@ -1675,8 +1979,8 @@
while (!done && (index <= end)) {
int i;
- nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
- min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1);
+ nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+ tag);
if (nr_pages == 0)
break;
@@ -1684,7 +1988,9 @@
struct page *page = pvec.pages[i];
bool submitted = false;
- if (page->index > end) {
+ /* give a priority to WB_SYNC threads */
+ if (atomic_read(&sbi->wb_sync_req[DATA]) &&
+ wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
}
@@ -1743,9 +2049,7 @@
last_idx = page->index;
}
- /* give a priority to WB_SYNC threads */
- if ((atomic_read(&F2FS_M_SB(mapping)->wb_sync_req) ||
- --wbc->nr_to_write <= 0) &&
+ if (--wbc->nr_to_write <= 0 &&
wbc->sync_mode == WB_SYNC_NONE) {
done = 1;
break;
@@ -1771,7 +2075,7 @@
return ret;
}
-int __f2fs_write_data_pages(struct address_space *mapping,
+static int __f2fs_write_data_pages(struct address_space *mapping,
struct writeback_control *wbc,
enum iostat_type io_type)
{
@@ -1794,7 +2098,7 @@
if (S_ISDIR(inode->i_mode) && wbc->sync_mode == WB_SYNC_NONE &&
get_dirty_pages(inode) < nr_pages_to_skip(sbi, DATA) &&
- available_free_memory(sbi, DIRTY_DENTS))
+ f2fs_available_free_memory(sbi, DIRTY_DENTS))
goto skip_write;
/* skip writing during file defragment */
@@ -1805,8 +2109,8 @@
/* to avoid spliting IOs due to mixed WB_SYNC_ALL and WB_SYNC_NONE */
if (wbc->sync_mode == WB_SYNC_ALL)
- atomic_inc(&sbi->wb_sync_req);
- else if (atomic_read(&sbi->wb_sync_req))
+ atomic_inc(&sbi->wb_sync_req[DATA]);
+ else if (atomic_read(&sbi->wb_sync_req[DATA]))
goto skip_write;
blk_start_plug(&plug);
@@ -1814,13 +2118,13 @@
blk_finish_plug(&plug);
if (wbc->sync_mode == WB_SYNC_ALL)
- atomic_dec(&sbi->wb_sync_req);
+ atomic_dec(&sbi->wb_sync_req[DATA]);
/*
* if some pages were truncated, we cannot guarantee its mapping->host
* to detect pending bios.
*/
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
return ret;
skip_write:
@@ -1847,7 +2151,7 @@
if (to > i_size) {
down_write(&F2FS_I(inode)->i_mmap_sem);
truncate_pagecache(inode, i_size);
- truncate_blocks(inode, i_size, true);
+ f2fs_truncate_blocks(inode, i_size, true);
up_write(&F2FS_I(inode)->i_mmap_sem);
}
}
@@ -1879,7 +2183,7 @@
}
restart:
/* check inline_data */
- ipage = get_node_page(sbi, inode->i_ino);
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(ipage)) {
err = PTR_ERR(ipage);
goto unlock_out;
@@ -1889,7 +2193,7 @@
if (f2fs_has_inline_data(inode)) {
if (pos + len <= MAX_INLINE_DATA(inode)) {
- read_inline_data(page, ipage);
+ f2fs_do_read_inline_data(page, ipage);
set_inode_flag(inode, FI_DATA_EXIST);
if (inode->i_nlink)
set_inline_node(ipage);
@@ -1907,7 +2211,7 @@
dn.data_blkaddr = ei.blk + index - ei.fofs;
} else {
/* hole case */
- err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, index, LOOKUP_NODE);
if (err || dn.data_blkaddr == NULL_ADDR) {
f2fs_put_dnode(&dn);
__do_map_lock(sbi, F2FS_GET_BLOCK_PRE_AIO,
@@ -1937,12 +2241,29 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *page = NULL;
pgoff_t index = ((unsigned long long) pos) >> PAGE_SHIFT;
- bool need_balance = false;
+ bool need_balance = false, drop_atomic = false;
block_t blkaddr = NULL_ADDR;
int err = 0;
+ if (trace_android_fs_datawrite_start_enabled()) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_datawrite_start(inode, pos, len,
+ current->pid, path,
+ current->comm);
+ }
trace_f2fs_write_begin(inode, pos, len, flags);
+ if (f2fs_is_atomic_file(inode) &&
+ !f2fs_available_free_memory(sbi, INMEM_PAGES)) {
+ err = -ENOMEM;
+ drop_atomic = true;
+ goto fail;
+ }
+
/*
* We should check this at this moment to avoid deadlock on inode page
* and #0 page. The locking rule for inline_data conversion should be:
@@ -1958,7 +2279,7 @@
* Do not use grab_cache_page_write_begin() to avoid deadlock due to
* wait_for_stable_page. Will wait that below with our IO control.
*/
- page = pagecache_get_page(mapping, index,
+ page = f2fs_pagecache_get_page(mapping, index,
FGP_LOCK | FGP_WRITE | FGP_CREAT, GFP_NOFS);
if (!page) {
err = -ENOMEM;
@@ -1985,8 +2306,8 @@
f2fs_wait_on_page_writeback(page, DATA, false);
- /* wait for GCed encrypted page writeback */
- if (f2fs_encrypted_file(inode))
+ /* wait for GCed page writeback via META_MAPPING */
+ if (f2fs_post_read_required(inode))
f2fs_wait_on_block_writeback(sbi, blkaddr);
if (len == PAGE_SIZE || PageUptodate(page))
@@ -2020,6 +2341,8 @@
fail:
f2fs_put_page(page, 1);
f2fs_write_failed(mapping, pos + len);
+ if (drop_atomic)
+ f2fs_drop_inmem_pages_all(sbi, false);
return err;
}
@@ -2030,6 +2353,7 @@
{
struct inode *inode = page->mapping->host;
+ trace_android_fs_datawrite_end(inode, pos, len);
trace_f2fs_write_end(inode, pos, len, copied);
/*
@@ -2074,25 +2398,63 @@
{
struct address_space *mapping = iocb->ki_filp->f_mapping;
struct inode *inode = mapping->host;
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
size_t count = iov_iter_count(iter);
loff_t offset = iocb->ki_pos;
int rw = iov_iter_rw(iter);
int err;
+ enum rw_hint hint = iocb->ki_hint;
+ int whint_mode = F2FS_OPTION(sbi).whint_mode;
err = check_direct_IO(inode, iter, offset);
if (err)
return err;
- if (__force_buffered_io(inode, rw))
+ if (f2fs_force_buffered_io(inode, rw))
return 0;
trace_f2fs_direct_IO_enter(inode, offset, count, rw);
- down_read(&F2FS_I(inode)->dio_rwsem[rw]);
+ if (trace_android_fs_dataread_start_enabled() &&
+ (rw == READ)) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_dataread_start(inode, offset,
+ count, current->pid, path,
+ current->comm);
+ }
+ if (trace_android_fs_datawrite_start_enabled() &&
+ (rw == WRITE)) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_datawrite_start(inode, offset, count,
+ current->pid, path,
+ current->comm);
+ }
+ if (rw == WRITE && whint_mode == WHINT_MODE_OFF)
+ iocb->ki_hint = WRITE_LIFE_NOT_SET;
+
+ if (!down_read_trylock(&F2FS_I(inode)->i_gc_rwsem[rw])) {
+ if (iocb->ki_flags & IOCB_NOWAIT) {
+ iocb->ki_hint = hint;
+ err = -EAGAIN;
+ goto out;
+ }
+ down_read(&F2FS_I(inode)->i_gc_rwsem[rw]);
+ }
+
err = blockdev_direct_IO(iocb, inode, iter, get_data_block_dio);
- up_read(&F2FS_I(inode)->dio_rwsem[rw]);
+ up_read(&F2FS_I(inode)->i_gc_rwsem[rw]);
if (rw == WRITE) {
+ if (whint_mode == WHINT_MODE_OFF)
+ iocb->ki_hint = hint;
if (err > 0) {
f2fs_update_iostat(F2FS_I_SB(inode), APP_DIRECT_IO,
err);
@@ -2101,6 +2463,13 @@
f2fs_write_failed(mapping, offset + count);
}
}
+out:
+ if (trace_android_fs_dataread_start_enabled() &&
+ (rw == READ))
+ trace_android_fs_dataread_end(inode, offset, count);
+ if (trace_android_fs_datawrite_start_enabled() &&
+ (rw == WRITE))
+ trace_android_fs_datawrite_end(inode, offset, count);
trace_f2fs_direct_IO_exit(inode, offset, count, rw, err);
@@ -2124,13 +2493,13 @@
dec_page_count(sbi, F2FS_DIRTY_NODES);
} else {
inode_dec_dirty_pages(inode);
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
}
}
/* This is atomic written page, keep Private */
if (IS_ATOMIC_WRITTEN_PAGE(page))
- return drop_inmem_page(inode, page);
+ return f2fs_drop_inmem_page(inode, page);
set_page_private(page, 0);
ClearPagePrivate(page);
@@ -2151,35 +2520,6 @@
return 1;
}
-/*
- * This was copied from __set_page_dirty_buffers which gives higher performance
- * in very high speed storages. (e.g., pmem)
- */
-void f2fs_set_page_dirty_nobuffers(struct page *page)
-{
- struct address_space *mapping = page->mapping;
- unsigned long flags;
-
- if (unlikely(!mapping))
- return;
-
- spin_lock(&mapping->private_lock);
- lock_page_memcg(page);
- SetPageDirty(page);
- spin_unlock(&mapping->private_lock);
-
- spin_lock_irqsave(&mapping->tree_lock, flags);
- WARN_ON_ONCE(!PageUptodate(page));
- account_page_dirtied(page, mapping);
- radix_tree_tag_set(&mapping->page_tree,
- page_index(page), PAGECACHE_TAG_DIRTY);
- spin_unlock_irqrestore(&mapping->tree_lock, flags);
- unlock_page_memcg(page);
-
- __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
- return;
-}
-
static int f2fs_set_data_page_dirty(struct page *page)
{
struct address_space *mapping = page->mapping;
@@ -2196,7 +2536,7 @@
if (f2fs_is_atomic_file(inode) && !f2fs_is_commit_atomic_write(inode)) {
if (!IS_ATOMIC_WRITTEN_PAGE(page)) {
- register_inmem_page(inode, page);
+ f2fs_register_inmem_page(inode, page);
return 1;
}
/*
@@ -2207,8 +2547,8 @@
}
if (!PageDirty(page)) {
- f2fs_set_page_dirty_nobuffers(page);
- update_dirty_page(inode, page);
+ __set_page_dirty_nobuffers(page);
+ f2fs_update_dirty_page(inode, page);
return 1;
}
return 0;
@@ -2303,3 +2643,38 @@
.migratepage = f2fs_migrate_page,
#endif
};
+
+void f2fs_clear_radix_tree_dirty_tag(struct page *page)
+{
+ struct address_space *mapping = page_mapping(page);
+ unsigned long flags;
+
+ spin_lock_irqsave(&mapping->tree_lock, flags);
+ radix_tree_tag_clear(&mapping->page_tree, page_index(page),
+ PAGECACHE_TAG_DIRTY);
+ spin_unlock_irqrestore(&mapping->tree_lock, flags);
+}
+
+int __init f2fs_init_post_read_processing(void)
+{
+ bio_post_read_ctx_cache = KMEM_CACHE(bio_post_read_ctx, 0);
+ if (!bio_post_read_ctx_cache)
+ goto fail;
+ bio_post_read_ctx_pool =
+ mempool_create_slab_pool(NUM_PREALLOC_POST_READ_CTXS,
+ bio_post_read_ctx_cache);
+ if (!bio_post_read_ctx_pool)
+ goto fail_free_cache;
+ return 0;
+
+fail_free_cache:
+ kmem_cache_destroy(bio_post_read_ctx_cache);
+fail:
+ return -ENOMEM;
+}
+
+void __exit f2fs_destroy_post_read_processing(void)
+{
+ mempool_destroy(bio_post_read_ctx_pool);
+ kmem_cache_destroy(bio_post_read_ctx_cache);
+}
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index 87f4498..2d65e77 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -45,9 +45,11 @@
si->ndirty_dent = get_pages(sbi, F2FS_DIRTY_DENTS);
si->ndirty_meta = get_pages(sbi, F2FS_DIRTY_META);
si->ndirty_data = get_pages(sbi, F2FS_DIRTY_DATA);
+ si->ndirty_qdata = get_pages(sbi, F2FS_DIRTY_QDATA);
si->ndirty_imeta = get_pages(sbi, F2FS_DIRTY_IMETA);
si->ndirty_dirs = sbi->ndirty_inode[DIR_INODE];
si->ndirty_files = sbi->ndirty_inode[FILE_INODE];
+ si->nquota_files = sbi->nquota_files;
si->ndirty_all = sbi->ndirty_inode[DIRTY_META];
si->inmem_pages = get_pages(sbi, F2FS_INMEM_PAGES);
si->aw_cnt = atomic_read(&sbi->aw_cnt);
@@ -61,6 +63,8 @@
atomic_read(&SM_I(sbi)->fcc_info->issued_flush);
si->nr_flushing =
atomic_read(&SM_I(sbi)->fcc_info->issing_flush);
+ si->flush_list_empty =
+ llist_empty(&SM_I(sbi)->fcc_info->issue_list);
}
if (SM_I(sbi) && SM_I(sbi)->dcc_info) {
si->nr_discarded =
@@ -96,10 +100,12 @@
si->dirty_nats = NM_I(sbi)->dirty_nat_cnt;
si->sits = MAIN_SEGS(sbi);
si->dirty_sits = SIT_I(sbi)->dirty_sentries;
- si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID_LIST];
+ si->free_nids = NM_I(sbi)->nid_cnt[FREE_NID];
si->avail_nids = NM_I(sbi)->available_nids;
- si->alloc_nids = NM_I(sbi)->nid_cnt[ALLOC_NID_LIST];
+ si->alloc_nids = NM_I(sbi)->nid_cnt[PREALLOC_NID];
si->bg_gc = sbi->bg_gc;
+ si->skipped_atomic_files[BG_GC] = sbi->skipped_atomic_files[BG_GC];
+ si->skipped_atomic_files[FG_GC] = sbi->skipped_atomic_files[FG_GC];
si->util_free = (int)(free_user_blocks(sbi) >> sbi->log_blocks_per_seg)
* 100 / (int)(sbi->user_block_count >> sbi->log_blocks_per_seg)
/ 2;
@@ -175,7 +181,6 @@
si->base_mem += sizeof(struct f2fs_sb_info) + sbi->sb->s_blocksize;
si->base_mem += 2 * sizeof(struct f2fs_inode_info);
si->base_mem += sizeof(*sbi->ckpt);
- si->base_mem += sizeof(struct percpu_counter) * NR_COUNT_TYPE;
/* build sm */
si->base_mem += sizeof(struct f2fs_sm_info);
@@ -231,14 +236,14 @@
}
/* free nids */
- si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID_LIST] +
- NM_I(sbi)->nid_cnt[ALLOC_NID_LIST]) *
+ si->cache_mem += (NM_I(sbi)->nid_cnt[FREE_NID] +
+ NM_I(sbi)->nid_cnt[PREALLOC_NID]) *
sizeof(struct free_nid);
si->cache_mem += NM_I(sbi)->nat_cnt * sizeof(struct nat_entry);
si->cache_mem += NM_I(sbi)->dirty_nat_cnt *
sizeof(struct nat_entry_set);
si->cache_mem += si->inmem_pages * sizeof(struct inmem_pages);
- for (i = 0; i <= ORPHAN_INO; i++)
+ for (i = 0; i < MAX_INO_ENTRY; i++)
si->cache_mem += sbi->im[i].ino_num * sizeof(struct ino_entry);
si->cache_mem += atomic_read(&sbi->total_ext_tree) *
sizeof(struct extent_tree);
@@ -262,9 +267,10 @@
list_for_each_entry(si, &f2fs_stat_list, stat_list) {
update_general_status(si->sbi);
- seq_printf(s, "\n=====[ partition info(%pg). #%d, %s]=====\n",
+ seq_printf(s, "\n=====[ partition info(%pg). #%d, %s, CP: %s]=====\n",
si->sbi->sb->s_bdev, i++,
- f2fs_readonly(si->sbi->sb) ? "RO": "RW");
+ f2fs_readonly(si->sbi->sb) ? "RO": "RW",
+ f2fs_cp_error(si->sbi) ? "Error": "Good");
seq_printf(s, "[SB: 1] [CP: 2] [SIT: %d] [NAT: %d] ",
si->sit_area_segs, si->nat_area_segs);
seq_printf(s, "[SSA: %d] [MAIN: %d",
@@ -338,6 +344,10 @@
si->bg_data_blks);
seq_printf(s, " - node blocks : %d (%d)\n", si->node_blks,
si->bg_node_blks);
+ seq_printf(s, "Skipped : atomic write %llu (%llu)\n",
+ si->skipped_atomic_files[BG_GC] +
+ si->skipped_atomic_files[FG_GC],
+ si->skipped_atomic_files[BG_GC]);
seq_puts(s, "\nExtent Cache:\n");
seq_printf(s, " - Hit Count: L1-1:%llu L1-2:%llu L2:%llu\n",
si->hit_largest, si->hit_cached,
@@ -349,10 +359,11 @@
seq_printf(s, " - Inner Struct Count: tree: %d(%d), node: %d\n",
si->ext_tree, si->zombie_tree, si->ext_node);
seq_puts(s, "\nBalancing F2FS Async:\n");
- seq_printf(s, " - IO (CP: %4d, Data: %4d, Flush: (%4d %4d), "
+ seq_printf(s, " - IO (CP: %4d, Data: %4d, Flush: (%4d %4d %4d), "
"Discard: (%4d %4d)) cmd: %4d undiscard:%4u\n",
si->nr_wb_cp_data, si->nr_wb_data,
si->nr_flushing, si->nr_flushed,
+ si->flush_list_empty,
si->nr_discarding, si->nr_discarded,
si->nr_discard_cmd, si->undiscard_blks);
seq_printf(s, " - inmem: %4d, atomic IO: %4d (Max. %4d), "
@@ -365,6 +376,8 @@
si->ndirty_dent, si->ndirty_dirs, si->ndirty_all);
seq_printf(s, " - datas: %4d in files:%4d\n",
si->ndirty_data, si->ndirty_files);
+ seq_printf(s, " - quota datas: %4d in quota files:%4d\n",
+ si->ndirty_qdata, si->nquota_files);
seq_printf(s, " - meta: %4d in %4d\n",
si->ndirty_meta, si->meta_pages);
seq_printf(s, " - imeta: %4d\n",
@@ -432,7 +445,7 @@
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
struct f2fs_stat_info *si;
- si = kzalloc(sizeof(struct f2fs_stat_info), GFP_KERNEL);
+ si = f2fs_kzalloc(sbi, sizeof(struct f2fs_stat_info), GFP_KERNEL);
if (!si)
return -ENOMEM;
diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index c0c933ad..7f955c4 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -10,10 +10,12 @@
*/
#include <linux/fs.h>
#include <linux/f2fs_fs.h>
+#include <linux/sched/signal.h>
#include "f2fs.h"
#include "node.h"
#include "acl.h"
#include "xattr.h"
+#include <trace/events/f2fs.h>
static unsigned long dir_blocks(struct inode *inode)
{
@@ -58,12 +60,12 @@
[S_IFLNK >> S_SHIFT] = F2FS_FT_SYMLINK,
};
-void set_de_type(struct f2fs_dir_entry *de, umode_t mode)
+static void set_de_type(struct f2fs_dir_entry *de, umode_t mode)
{
de->file_type = f2fs_type_by_mode[(mode & S_IFMT) >> S_SHIFT];
}
-unsigned char get_de_type(struct f2fs_dir_entry *de)
+unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de)
{
if (de->file_type < F2FS_FT_MAX)
return f2fs_filetype_table[de->file_type];
@@ -92,19 +94,17 @@
struct f2fs_dir_entry *de;
struct f2fs_dentry_ptr d;
- dentry_blk = (struct f2fs_dentry_block *)kmap(dentry_page);
+ dentry_blk = (struct f2fs_dentry_block *)page_address(dentry_page);
make_dentry_ptr_block(NULL, &d, dentry_blk);
- de = find_target_dentry(fname, namehash, max_slots, &d);
+ de = f2fs_find_target_dentry(fname, namehash, max_slots, &d);
if (de)
*res_page = dentry_page;
- else
- kunmap(dentry_page);
return de;
}
-struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname,
+struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname,
f2fs_hash_t namehash, int *max_slots,
struct f2fs_dentry_ptr *d)
{
@@ -171,7 +171,7 @@
for (; bidx < end_block; bidx++) {
/* no need to allocate new dentry pages to all the indices */
- dentry_page = find_data_page(dir, bidx);
+ dentry_page = f2fs_find_data_page(dir, bidx);
if (IS_ERR(dentry_page)) {
if (PTR_ERR(dentry_page) == -ENOENT) {
room = true;
@@ -210,7 +210,7 @@
if (f2fs_has_inline_dentry(dir)) {
*res_page = NULL;
- de = find_in_inline_dir(dir, fname, res_page);
+ de = f2fs_find_in_inline_dir(dir, fname, res_page);
goto out;
}
@@ -285,7 +285,6 @@
de = f2fs_find_entry(dir, qstr, page);
if (de) {
res = le32_to_cpu(de->ino);
- f2fs_dentry_kunmap(dir, *page);
f2fs_put_page(*page, 0);
}
@@ -300,7 +299,6 @@
f2fs_wait_on_page_writeback(page, type, true);
de->ino = cpu_to_le32(inode->i_ino);
set_de_type(de, inode->i_mode);
- f2fs_dentry_kunmap(dir, page);
set_page_dirty(page);
dir->i_mtime = dir->i_ctime = current_time(dir);
@@ -321,7 +319,7 @@
set_page_dirty(ipage);
}
-void do_make_empty_dir(struct inode *inode, struct inode *parent,
+void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent,
struct f2fs_dentry_ptr *d)
{
struct qstr dot = QSTR_INIT(".", 1);
@@ -342,33 +340,32 @@
struct f2fs_dentry_ptr d;
if (f2fs_has_inline_dentry(inode))
- return make_empty_inline_dir(inode, parent, page);
+ return f2fs_make_empty_inline_dir(inode, parent, page);
- dentry_page = get_new_data_page(inode, page, 0, true);
+ dentry_page = f2fs_get_new_data_page(inode, page, 0, true);
if (IS_ERR(dentry_page))
return PTR_ERR(dentry_page);
- dentry_blk = kmap_atomic(dentry_page);
+ dentry_blk = page_address(dentry_page);
make_dentry_ptr_block(NULL, &d, dentry_blk);
- do_make_empty_dir(inode, parent, &d);
-
- kunmap_atomic(dentry_blk);
+ f2fs_do_make_empty_dir(inode, parent, &d);
set_page_dirty(dentry_page);
f2fs_put_page(dentry_page, 1);
return 0;
}
-struct page *init_inode_metadata(struct inode *inode, struct inode *dir,
+struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir,
const struct qstr *new_name, const struct qstr *orig_name,
struct page *dpage)
{
struct page *page;
+ int dummy_encrypt = DUMMY_ENCRYPTION_ENABLED(F2FS_I_SB(dir));
int err;
if (is_inode_flag_set(inode, FI_NEW_INODE)) {
- page = new_inode_page(inode);
+ page = f2fs_new_inode_page(inode);
if (IS_ERR(page))
return page;
@@ -391,17 +388,16 @@
if (err)
goto put_error;
- if (f2fs_encrypted_inode(dir) && f2fs_may_encrypt(inode)) {
+ if ((f2fs_encrypted_inode(dir) || dummy_encrypt) &&
+ f2fs_may_encrypt(inode)) {
err = fscrypt_inherit_context(dir, inode, page, false);
if (err)
goto put_error;
}
} else {
- page = get_node_page(F2FS_I_SB(dir), inode->i_ino);
+ page = f2fs_get_node_page(F2FS_I_SB(dir), inode->i_ino);
if (IS_ERR(page))
return page;
-
- set_cold_node(inode, page);
}
if (new_name) {
@@ -422,19 +418,19 @@
* we should remove this inode from orphan list.
*/
if (inode->i_nlink == 0)
- remove_orphan_inode(F2FS_I_SB(dir), inode->i_ino);
+ f2fs_remove_orphan_inode(F2FS_I_SB(dir), inode->i_ino);
f2fs_i_links_write(inode, true);
}
return page;
put_error:
clear_nlink(inode);
- update_inode(inode, page);
+ f2fs_update_inode(inode, page);
f2fs_put_page(page, 1);
return ERR_PTR(err);
}
-void update_parent_metadata(struct inode *dir, struct inode *inode,
+void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode,
unsigned int current_depth)
{
if (inode && is_inode_flag_set(inode, FI_NEW_INODE)) {
@@ -452,7 +448,7 @@
clear_inode_flag(inode, FI_INC_LINK);
}
-int room_for_filename(const void *bitmap, int slots, int max_slots)
+int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots)
{
int bit_start = 0;
int zero_start, zero_end;
@@ -541,17 +537,16 @@
(le32_to_cpu(dentry_hash) % nbucket));
for (block = bidx; block <= (bidx + nblock - 1); block++) {
- dentry_page = get_new_data_page(dir, NULL, block, true);
+ dentry_page = f2fs_get_new_data_page(dir, NULL, block, true);
if (IS_ERR(dentry_page))
return PTR_ERR(dentry_page);
- dentry_blk = kmap(dentry_page);
- bit_pos = room_for_filename(&dentry_blk->dentry_bitmap,
+ dentry_blk = page_address(dentry_page);
+ bit_pos = f2fs_room_for_filename(&dentry_blk->dentry_bitmap,
slots, NR_DENTRY_IN_BLOCK);
if (bit_pos < NR_DENTRY_IN_BLOCK)
goto add_dentry;
- kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
}
@@ -563,7 +558,7 @@
if (inode) {
down_write(&F2FS_I(inode)->i_sem);
- page = init_inode_metadata(inode, dir, new_name,
+ page = f2fs_init_inode_metadata(inode, dir, new_name,
orig_name, NULL);
if (IS_ERR(page)) {
err = PTR_ERR(page);
@@ -581,18 +576,17 @@
f2fs_put_page(page, 1);
}
- update_parent_metadata(dir, inode, current_depth);
+ f2fs_update_parent_metadata(dir, inode, current_depth);
fail:
if (inode)
up_write(&F2FS_I(inode)->i_sem);
- kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
return err;
}
-int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname,
+int f2fs_add_dentry(struct inode *dir, struct fscrypt_name *fname,
struct inode *inode, nid_t ino, umode_t mode)
{
struct qstr new_name;
@@ -616,7 +610,7 @@
* Caller should grab and release a rwsem by calling f2fs_lock_op() and
* f2fs_unlock_op().
*/
-int __f2fs_add_link(struct inode *dir, const struct qstr *name,
+int f2fs_do_add_link(struct inode *dir, const struct qstr *name,
struct inode *inode, nid_t ino, umode_t mode)
{
struct fscrypt_name fname;
@@ -640,13 +634,12 @@
F2FS_I(dir)->task = NULL;
}
if (de) {
- f2fs_dentry_kunmap(dir, page);
f2fs_put_page(page, 0);
err = -EEXIST;
} else if (IS_ERR(page)) {
err = PTR_ERR(page);
} else {
- err = __f2fs_do_add_link(dir, &fname, inode, ino, mode);
+ err = f2fs_add_dentry(dir, &fname, inode, ino, mode);
}
fscrypt_free_filename(&fname);
return err;
@@ -658,7 +651,7 @@
int err = 0;
down_write(&F2FS_I(inode)->i_sem);
- page = init_inode_metadata(inode, dir, NULL, NULL, NULL);
+ page = f2fs_init_inode_metadata(inode, dir, NULL, NULL, NULL);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto fail;
@@ -690,9 +683,9 @@
up_write(&F2FS_I(inode)->i_sem);
if (inode->i_nlink == 0)
- add_orphan_inode(inode);
+ f2fs_add_orphan_inode(inode);
else
- release_orphan_inode(sbi);
+ f2fs_release_orphan_inode(sbi);
}
/*
@@ -705,12 +698,13 @@
struct f2fs_dentry_block *dentry_blk;
unsigned int bit_pos;
int slots = GET_DENTRY_SLOTS(le16_to_cpu(dentry->name_len));
- struct address_space *mapping = page_mapping(page);
- unsigned long flags;
int i;
f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
+ if (F2FS_OPTION(F2FS_I_SB(dir)).fsync_mode == FSYNC_MODE_STRICT)
+ f2fs_add_ino_entry(F2FS_I_SB(dir), dir->i_ino, TRANS_DIR_INO);
+
if (f2fs_has_inline_dentry(dir))
return f2fs_delete_inline_entry(dentry, page, dir, inode);
@@ -726,7 +720,6 @@
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
0);
- kunmap(page); /* kunmap - pair of f2fs_find_entry */
set_page_dirty(page);
dir->i_ctime = dir->i_mtime = current_time(dir);
@@ -736,17 +729,13 @@
f2fs_drop_nlink(dir, inode);
if (bit_pos == NR_DENTRY_IN_BLOCK &&
- !truncate_hole(dir, page->index, page->index + 1)) {
- spin_lock_irqsave(&mapping->tree_lock, flags);
- radix_tree_tag_clear(&mapping->page_tree, page_index(page),
- PAGECACHE_TAG_DIRTY);
- spin_unlock_irqrestore(&mapping->tree_lock, flags);
-
+ !f2fs_truncate_hole(dir, page->index, page->index + 1)) {
+ f2fs_clear_radix_tree_dirty_tag(page);
clear_page_dirty_for_io(page);
ClearPagePrivate(page);
ClearPageUptodate(page);
inode_dec_dirty_pages(dir);
- remove_dirty_inode(dir);
+ f2fs_remove_dirty_inode(dir);
}
f2fs_put_page(page, 1);
}
@@ -763,7 +752,7 @@
return f2fs_empty_inline_dir(dir);
for (bidx = 0; bidx < nblock; bidx++) {
- dentry_page = get_lock_data_page(dir, bidx, false);
+ dentry_page = f2fs_get_lock_data_page(dir, bidx, false);
if (IS_ERR(dentry_page)) {
if (PTR_ERR(dentry_page) == -ENOENT)
continue;
@@ -771,7 +760,7 @@
return false;
}
- dentry_blk = kmap_atomic(dentry_page);
+ dentry_blk = page_address(dentry_page);
if (bidx == 0)
bit_pos = 2;
else
@@ -779,7 +768,6 @@
bit_pos = find_next_bit_le(&dentry_blk->dentry_bitmap,
NR_DENTRY_IN_BLOCK,
bit_pos);
- kunmap_atomic(dentry_blk);
f2fs_put_page(dentry_page, 1);
@@ -796,6 +784,7 @@
unsigned int bit_pos;
struct f2fs_dir_entry *de = NULL;
struct fscrypt_str de_name = FSTR_INIT(NULL, 0);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(d->inode);
bit_pos = ((unsigned long)ctx->pos % d->max);
@@ -811,7 +800,7 @@
continue;
}
- d_type = get_de_type(de);
+ d_type = f2fs_get_de_type(de);
de_name.name = d->filename[bit_pos];
de_name.len = le16_to_cpu(de->name_len);
@@ -834,6 +823,9 @@
le32_to_cpu(de->ino), d_type))
return 1;
+ if (sbi->readdir_ra == 1)
+ f2fs_ra_node_page(sbi, le32_to_cpu(de->ino));
+
bit_pos += GET_DENTRY_SLOTS(le16_to_cpu(de->name_len));
ctx->pos = start_pos + bit_pos;
}
@@ -847,6 +839,7 @@
struct f2fs_dentry_block *dentry_blk = NULL;
struct page *dentry_page = NULL;
struct file_ra_state *ra = &file->f_ra;
+ loff_t start_pos = ctx->pos;
unsigned int n = ((unsigned long)ctx->pos / NR_DENTRY_IN_BLOCK);
struct f2fs_dentry_ptr d;
struct fscrypt_str fstr = FSTR_INIT(NULL, 0);
@@ -855,53 +848,60 @@
if (f2fs_encrypted_inode(inode)) {
err = fscrypt_get_encryption_info(inode);
if (err && err != -ENOKEY)
- return err;
+ goto out;
err = fscrypt_fname_alloc_buffer(inode, F2FS_NAME_LEN, &fstr);
if (err < 0)
- return err;
+ goto out;
}
if (f2fs_has_inline_dentry(inode)) {
err = f2fs_read_inline_dir(file, ctx, &fstr);
- goto out;
+ goto out_free;
}
- /* readahead for multi pages of dir */
- if (npages - n > 1 && !ra_has_index(ra, n))
- page_cache_sync_readahead(inode->i_mapping, ra, file, n,
+ for (; n < npages; n++, ctx->pos = n * NR_DENTRY_IN_BLOCK) {
+
+ /* allow readdir() to be interrupted */
+ if (fatal_signal_pending(current)) {
+ err = -ERESTARTSYS;
+ goto out_free;
+ }
+ cond_resched();
+
+ /* readahead for multi pages of dir */
+ if (npages - n > 1 && !ra_has_index(ra, n))
+ page_cache_sync_readahead(inode->i_mapping, ra, file, n,
min(npages - n, (pgoff_t)MAX_DIR_RA_PAGES));
- for (; n < npages; n++) {
- dentry_page = get_lock_data_page(inode, n, false);
+ dentry_page = f2fs_get_lock_data_page(inode, n, false);
if (IS_ERR(dentry_page)) {
err = PTR_ERR(dentry_page);
if (err == -ENOENT) {
err = 0;
continue;
} else {
- goto out;
+ goto out_free;
}
}
- dentry_blk = kmap(dentry_page);
+ dentry_blk = page_address(dentry_page);
make_dentry_ptr_block(inode, &d, dentry_blk);
err = f2fs_fill_dentries(ctx, &d,
n * NR_DENTRY_IN_BLOCK, &fstr);
if (err) {
- kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
break;
}
- ctx->pos = (n + 1) * NR_DENTRY_IN_BLOCK;
- kunmap(dentry_page);
f2fs_put_page(dentry_page, 1);
}
-out:
+out_free:
fscrypt_fname_free_buffer(&fstr);
+out:
+ trace_f2fs_readdir(inode, start_pos, ctx->pos, err);
return err < 0 ? err : 0;
}
diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index aff6c2e..231b77e 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -49,7 +49,7 @@
return NULL;
}
-struct rb_entry *__lookup_rb_tree(struct rb_root *root,
+struct rb_entry *f2fs_lookup_rb_tree(struct rb_root *root,
struct rb_entry *cached_re, unsigned int ofs)
{
struct rb_entry *re;
@@ -61,7 +61,7 @@
return re;
}
-struct rb_node **__lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
+struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
struct rb_root *root, struct rb_node **parent,
unsigned int ofs)
{
@@ -92,7 +92,7 @@
* in order to simpfy the insertion after.
* tree must stay unchanged between lookup and insertion.
*/
-struct rb_entry *__lookup_rb_tree_ret(struct rb_root *root,
+struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root *root,
struct rb_entry *cached_re,
unsigned int ofs,
struct rb_entry **prev_entry,
@@ -159,7 +159,7 @@
return re;
}
-bool __check_rb_tree_consistence(struct f2fs_sb_info *sbi,
+bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
struct rb_root *root)
{
#ifdef CONFIG_F2FS_CHECK_FS
@@ -390,7 +390,7 @@
goto out;
}
- en = (struct extent_node *)__lookup_rb_tree(&et->root,
+ en = (struct extent_node *)f2fs_lookup_rb_tree(&et->root,
(struct rb_entry *)et->cached_en, pgofs);
if (!en)
goto out;
@@ -460,7 +460,7 @@
struct rb_node *insert_parent)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct rb_node **p = &et->root.rb_node;
+ struct rb_node **p;
struct rb_node *parent = NULL;
struct extent_node *en = NULL;
@@ -470,7 +470,7 @@
goto do_insert;
}
- p = __lookup_rb_tree_for_insert(sbi, &et->root, &parent, ei->fofs);
+ p = f2fs_lookup_rb_tree_for_insert(sbi, &et->root, &parent, ei->fofs);
do_insert:
en = __attach_extent_node(sbi, et, ei, parent, p);
if (!en)
@@ -520,7 +520,7 @@
__drop_largest_extent(inode, fofs, len);
/* 1. lookup first extent node in range [fofs, fofs + len - 1] */
- en = (struct extent_node *)__lookup_rb_tree_ret(&et->root,
+ en = (struct extent_node *)f2fs_lookup_rb_tree_ret(&et->root,
(struct rb_entry *)et->cached_en, fofs,
(struct rb_entry **)&prev_en,
(struct rb_entry **)&next_en,
@@ -773,7 +773,7 @@
else
blkaddr = dn->data_blkaddr;
- fofs = start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) +
+ fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page), dn->inode) +
dn->ofs_in_node;
f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, 1);
}
@@ -788,7 +788,7 @@
f2fs_update_extent_tree_range(dn->inode, fofs, blkaddr, len);
}
-void init_extent_cache_info(struct f2fs_sb_info *sbi)
+void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi)
{
INIT_RADIX_TREE(&sbi->extent_tree_root, GFP_NOIO);
mutex_init(&sbi->extent_tree_lock);
@@ -800,7 +800,7 @@
atomic_set(&sbi->total_ext_node, 0);
}
-int __init create_extent_cache(void)
+int __init f2fs_create_extent_cache(void)
{
extent_tree_slab = f2fs_kmem_cache_create("f2fs_extent_tree",
sizeof(struct extent_tree));
@@ -815,7 +815,7 @@
return 0;
}
-void destroy_extent_cache(void)
+void f2fs_destroy_extent_cache(void)
{
kmem_cache_destroy(extent_node_slab);
kmem_cache_destroy(extent_tree_slab);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 54f8520..6ac53c5 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -19,16 +19,16 @@
#include <linux/magic.h>
#include <linux/kobject.h>
#include <linux/sched.h>
+#include <linux/cred.h>
#include <linux/vmalloc.h>
#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/quotaops.h>
-#ifdef CONFIG_F2FS_FS_ENCRYPTION
-#include <linux/fscrypt_supp.h>
-#else
-#include <linux/fscrypt_notsupp.h>
-#endif
#include <crypto/hash.h>
+#include <linux/overflow.h>
+
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_F2FS_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
#ifdef CONFIG_F2FS_CHECK_FS
#define f2fs_bug_on(sbi, condition) BUG_ON(condition)
@@ -45,7 +45,10 @@
#ifdef CONFIG_F2FS_FAULT_INJECTION
enum {
FAULT_KMALLOC,
+ FAULT_KVMALLOC,
FAULT_PAGE_ALLOC,
+ FAULT_PAGE_GET,
+ FAULT_ALLOC_BIO,
FAULT_ALLOC_NID,
FAULT_ORPHAN,
FAULT_BLOCK,
@@ -93,10 +96,13 @@
#define F2FS_MOUNT_GRPQUOTA 0x00100000
#define F2FS_MOUNT_PRJQUOTA 0x00200000
#define F2FS_MOUNT_QUOTA 0x00400000
+#define F2FS_MOUNT_INLINE_XATTR_SIZE 0x00800000
+#define F2FS_MOUNT_RESERVE_ROOT 0x01000000
-#define clear_opt(sbi, option) ((sbi)->mount_opt.opt &= ~F2FS_MOUNT_##option)
-#define set_opt(sbi, option) ((sbi)->mount_opt.opt |= F2FS_MOUNT_##option)
-#define test_opt(sbi, option) ((sbi)->mount_opt.opt & F2FS_MOUNT_##option)
+#define F2FS_OPTION(sbi) ((sbi)->mount_opt)
+#define clear_opt(sbi, option) (F2FS_OPTION(sbi).opt &= ~F2FS_MOUNT_##option)
+#define set_opt(sbi, option) (F2FS_OPTION(sbi).opt |= F2FS_MOUNT_##option)
+#define test_opt(sbi, option) (F2FS_OPTION(sbi).opt & F2FS_MOUNT_##option)
#define ver_after(a, b) (typecheck(unsigned long long, a) && \
typecheck(unsigned long long, b) && \
@@ -109,7 +115,26 @@
typedef u32 nid_t;
struct f2fs_mount_info {
- unsigned int opt;
+ unsigned int opt;
+ int write_io_size_bits; /* Write IO size bits */
+ block_t root_reserved_blocks; /* root reserved blocks */
+ kuid_t s_resuid; /* reserved blocks for uid */
+ kgid_t s_resgid; /* reserved blocks for gid */
+ int active_logs; /* # of active logs */
+ int inline_xattr_size; /* inline xattr size */
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+ struct f2fs_fault_info fault_info; /* For fault injection */
+#endif
+#ifdef CONFIG_QUOTA
+ /* Names of quota files with journalled quota */
+ char *s_qf_names[MAXQUOTAS];
+ int s_jquota_fmt; /* Format of quota to use */
+#endif
+ /* For which write hints are passed down to block layer */
+ int whint_mode;
+ int alloc_mode; /* segment allocation policy */
+ int fsync_mode; /* fsync policy */
+ bool test_dummy_encryption; /* test dummy encryption */
};
#define F2FS_FEATURE_ENCRYPT 0x0001
@@ -118,6 +143,11 @@
#define F2FS_FEATURE_EXTRA_ATTR 0x0008
#define F2FS_FEATURE_PRJQUOTA 0x0010
#define F2FS_FEATURE_INODE_CHKSUM 0x0020
+#define F2FS_FEATURE_FLEXIBLE_INLINE_XATTR 0x0040
+#define F2FS_FEATURE_QUOTA_INO 0x0080
+#define F2FS_FEATURE_INODE_CRTIME 0x0100
+#define F2FS_FEATURE_LOST_FOUND 0x0200
+#define F2FS_FEATURE_VERITY 0x0400 /* reserved */
#define F2FS_HAS_FEATURE(sb, mask) \
((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
@@ -127,6 +157,12 @@
(F2FS_SB(sb)->raw_super->feature &= ~cpu_to_le32(mask))
/*
+ * Default values for user and/or group using reserved blocks
+ */
+#define F2FS_DEF_RESUID 0
+#define F2FS_DEF_RESGID 0
+
+/*
* For checkpoint manager
*/
enum {
@@ -141,15 +177,13 @@
#define CP_DISCARD 0x00000010
#define CP_TRIMMED 0x00000020
-#define DEF_BATCHED_TRIM_SECTIONS 2048
-#define BATCHED_TRIM_SEGMENTS(sbi) \
- (GET_SEG_FROM_SEC(sbi, SM_I(sbi)->trim_sections))
-#define BATCHED_TRIM_BLOCKS(sbi) \
- (BATCHED_TRIM_SEGMENTS(sbi) << (sbi)->log_blocks_per_seg)
#define MAX_DISCARD_BLOCKS(sbi) BLKS_PER_SEC(sbi)
-#define DISCARD_ISSUE_RATE 8
+#define DEF_MAX_DISCARD_REQUEST 8 /* issue 8 discards per round */
+#define DEF_MAX_DISCARD_LEN 512 /* Max. 2MB per discard */
#define DEF_MIN_DISCARD_ISSUE_TIME 50 /* 50 ms, if exists */
+#define DEF_MID_DISCARD_ISSUE_TIME 500 /* 500 ms, if device busy */
#define DEF_MAX_DISCARD_ISSUE_TIME 60000 /* 60 s, if no candidates */
+#define DEF_DISCARD_URGENT_UTIL 80 /* do more discard over 80% */
#define DEF_CP_INTERVAL 60 /* 60 secs */
#define DEF_IDLE_INTERVAL 5 /* 5 secs */
@@ -158,7 +192,6 @@
__u64 trim_start;
__u64 trim_end;
__u64 trim_minlen;
- __u64 trimmed;
};
/*
@@ -177,12 +210,15 @@
ORPHAN_INO, /* for orphan ino list */
APPEND_INO, /* for append ino list */
UPDATE_INO, /* for update ino list */
+ TRANS_DIR_INO, /* for trasactions dir ino list */
+ FLUSH_INO, /* for multiple device flushing */
MAX_INO_ENTRY, /* max. list */
};
struct ino_entry {
- struct list_head list; /* list head */
- nid_t ino; /* inode number */
+ struct list_head list; /* list head */
+ nid_t ino; /* inode number */
+ unsigned int dirty_device; /* dirty device bitmap */
};
/* for the list of inodes to be GCed */
@@ -206,10 +242,6 @@
#define plist_idx(blk_num) ((blk_num) >= MAX_PLIST_NUM ? \
(MAX_PLIST_NUM - 1) : (blk_num - 1))
-#define P_ACTIVE 0x01
-#define P_TRIM 0x02
-#define plist_issue(tag) (((tag) & P_ACTIVE) || ((tag) & P_TRIM))
-
enum {
D_PREP,
D_SUBMIT,
@@ -241,12 +273,32 @@
int error; /* bio error */
};
+enum {
+ DPOLICY_BG,
+ DPOLICY_FORCE,
+ DPOLICY_FSTRIM,
+ DPOLICY_UMOUNT,
+ MAX_DPOLICY,
+};
+
+struct discard_policy {
+ int type; /* type of discard */
+ unsigned int min_interval; /* used for candidates exist */
+ unsigned int mid_interval; /* used for device busy */
+ unsigned int max_interval; /* used for candidates not exist */
+ unsigned int max_requests; /* # of discards issued per round */
+ unsigned int io_aware_gran; /* minimum granularity discard not be aware of I/O */
+ bool io_aware; /* issue discard in idle time */
+ bool sync; /* submit discard with REQ_SYNC flag */
+ unsigned int granularity; /* discard granularity */
+};
+
struct discard_cmd_control {
struct task_struct *f2fs_issue_discard; /* discard thread */
struct list_head entry_list; /* 4KB discard entry list */
struct list_head pend_list[MAX_PLIST_NUM];/* store pending entries */
- unsigned char pend_list_tag[MAX_PLIST_NUM];/* tag for pending entries */
struct list_head wait_list; /* store on-flushing entries */
+ struct list_head fstrim_list; /* in-flight discard from fstrim */
wait_queue_head_t discard_wait_queue; /* waiting queue for wake-up */
unsigned int discard_wake; /* to wake up discard thread */
struct mutex cmd_lock;
@@ -327,6 +379,9 @@
#define F2FS_IOC_GARBAGE_COLLECT_RANGE _IOW(F2FS_IOCTL_MAGIC, 11, \
struct f2fs_gc_range)
#define F2FS_IOC_GET_FEATURES _IOR(F2FS_IOCTL_MAGIC, 12, __u32)
+#define F2FS_IOC_SET_PIN_FILE _IOW(F2FS_IOCTL_MAGIC, 13, __u32)
+#define F2FS_IOC_GET_PIN_FILE _IOR(F2FS_IOCTL_MAGIC, 14, __u32)
+#define F2FS_IOC_PRECACHE_EXTENTS _IO(F2FS_IOCTL_MAGIC, 15)
#define F2FS_IOC_SET_ENCRYPTION_POLICY FS_IOC_SET_ENCRYPTION_POLICY
#define F2FS_IOC_GET_ENCRYPTION_POLICY FS_IOC_GET_ENCRYPTION_POLICY
@@ -379,11 +434,13 @@
/* for inline stuff */
#define DEF_INLINE_RESERVED_SIZE 1
+#define DEF_MIN_INLINE_SIZE 1
static inline int get_extra_isize(struct inode *inode);
-#define MAX_INLINE_DATA(inode) (sizeof(__le32) * \
- (CUR_ADDRS_PER_INODE(inode) - \
- DEF_INLINE_RESERVED_SIZE - \
- F2FS_INLINE_XATTR_ADDRS))
+static inline int get_inline_xattr_addrs(struct inode *inode);
+#define MAX_INLINE_DATA(inode) (sizeof(__le32) * \
+ (CUR_ADDRS_PER_INODE(inode) - \
+ get_inline_xattr_addrs(inode) - \
+ DEF_INLINE_RESERVED_SIZE))
/* for inline dir */
#define NR_INLINE_DENTRY(inode) (MAX_INLINE_DATA(inode) * BITS_PER_BYTE / \
@@ -415,7 +472,7 @@
d->inode = inode;
d->max = NR_DENTRY_IN_BLOCK;
d->nr_bitmap = SIZE_OF_DENTRY_BITMAP;
- d->bitmap = &t->dentry_bitmap;
+ d->bitmap = t->dentry_bitmap;
d->dentry = t->dentry;
d->filename = t->filename;
}
@@ -519,6 +576,8 @@
unsigned int m_len;
unsigned int m_flags;
pgoff_t *m_next_pgofs; /* point next possible non-hole pgofs */
+ pgoff_t *m_next_extent; /* point to next possible extent */
+ int m_seg_type;
};
/* for flag in get_data_block */
@@ -528,6 +587,7 @@
F2FS_GET_BLOCK_BMAP,
F2FS_GET_BLOCK_PRE_DIO,
F2FS_GET_BLOCK_PRE_AIO,
+ F2FS_GET_BLOCK_PRECACHE,
};
/*
@@ -538,6 +598,8 @@
#define FADVISE_ENCRYPT_BIT 0x04
#define FADVISE_ENC_NAME_BIT 0x08
#define FADVISE_KEEP_SIZE_BIT 0x10
+#define FADVISE_HOT_BIT 0x20
+#define FADVISE_VERITY_BIT 0x40 /* reserved */
#define file_is_cold(inode) is_file(inode, FADVISE_COLD_BIT)
#define file_wrong_pino(inode) is_file(inode, FADVISE_LOST_PINO_BIT)
@@ -552,15 +614,26 @@
#define file_set_enc_name(inode) set_file(inode, FADVISE_ENC_NAME_BIT)
#define file_keep_isize(inode) is_file(inode, FADVISE_KEEP_SIZE_BIT)
#define file_set_keep_isize(inode) set_file(inode, FADVISE_KEEP_SIZE_BIT)
+#define file_is_hot(inode) is_file(inode, FADVISE_HOT_BIT)
+#define file_set_hot(inode) set_file(inode, FADVISE_HOT_BIT)
+#define file_clear_hot(inode) clear_file(inode, FADVISE_HOT_BIT)
#define DEF_DIR_LEVEL 0
+enum {
+ GC_FAILURE_PIN,
+ GC_FAILURE_ATOMIC,
+ MAX_GC_FAILURE
+};
+
struct f2fs_inode_info {
struct inode vfs_inode; /* serve a vfs inode */
unsigned long i_flags; /* keep an inode flags for ioctl */
unsigned char i_advise; /* use to give file attribute hints */
unsigned char i_dir_level; /* use for dentry level for large dir */
- unsigned int i_current_depth; /* use only in directory structure */
+ unsigned int i_current_depth; /* only for directory depth */
+ /* for gc failure statistic */
+ unsigned int i_gc_failures[MAX_GC_FAILURE];
unsigned int i_pino; /* parent inode number */
umode_t i_acl_mode; /* keep file acl mode temporarily */
@@ -583,16 +656,22 @@
#endif
struct list_head dirty_list; /* dirty list for dirs and files */
struct list_head gdirty_list; /* linked in global dirty list */
+ struct list_head inmem_ilist; /* list for inmem inodes */
struct list_head inmem_pages; /* inmemory pages managed by f2fs */
struct task_struct *inmem_task; /* store inmemory task */
struct mutex inmem_lock; /* lock for inmemory pages */
struct extent_tree *extent_tree; /* cached extent_tree entry */
- struct rw_semaphore dio_rwsem[2];/* avoid racing between dio and gc */
+
+ /* avoid racing between foreground op and gc */
+ struct rw_semaphore i_gc_rwsem[2];
struct rw_semaphore i_mmap_sem;
struct rw_semaphore i_xattr_sem; /* avoid racing between reading and changing EAs */
int i_extra_isize; /* size of extra space located in i_addr */
kprojid_t i_projid; /* id for project quota */
+ int i_inline_xattr_size; /* inline xattr size */
+ struct timespec i_crtime; /* inode creation time */
+ struct timespec i_disk_time[4]; /* inode disk times */
};
static inline void get_extent_info(struct extent_info *ext,
@@ -622,7 +701,8 @@
static inline bool __is_discard_mergeable(struct discard_info *back,
struct discard_info *front)
{
- return back->lstart + back->len == front->lstart;
+ return (back->lstart + back->len == front->lstart) &&
+ (back->len + front->len < DEF_MAX_DISCARD_LEN);
}
static inline bool __is_discard_back_mergeable(struct discard_info *cur,
@@ -666,10 +746,13 @@
}
}
-enum nid_list {
- FREE_NID_LIST,
- ALLOC_NID_LIST,
- MAX_NID_LIST,
+/*
+ * For free nid management
+ */
+enum nid_state {
+ FREE_NID, /* newly added to free nid list */
+ PREALLOC_NID, /* it is preallocated */
+ MAX_NID_STATE,
};
struct f2fs_nm_info {
@@ -692,11 +775,11 @@
/* free node ids management */
struct radix_tree_root free_nid_root;/* root of the free_nid cache */
- struct list_head nid_list[MAX_NID_LIST];/* lists for free nids */
- unsigned int nid_cnt[MAX_NID_LIST]; /* the number of free node id */
+ struct list_head free_nid_list; /* list for free nids excluding preallocated nids */
+ unsigned int nid_cnt[MAX_NID_STATE]; /* the number of free node id */
spinlock_t nid_list_lock; /* protect nid lists ops */
struct mutex build_lock; /* lock for build free nids */
- unsigned char (*free_nid_bitmap)[NAT_ENTRY_BITMAP_SIZE];
+ unsigned char **free_nid_bitmap;
unsigned char *nat_block_bitmap;
unsigned short *free_nid_count; /* free nid count of NAT block */
@@ -771,6 +854,7 @@
struct flush_cmd {
struct completion wait;
struct llist_node llnode;
+ nid_t ino;
int ret;
};
@@ -789,6 +873,8 @@
struct dirty_seglist_info *dirty_info; /* dirty segment information */
struct curseg_info *curseg_array; /* active segment information */
+ struct rw_semaphore curseg_lock; /* for preventing curseg change */
+
block_t seg0_blkaddr; /* block address of 0'th segment */
block_t main_blkaddr; /* start block address of main area */
block_t ssa_blkaddr; /* start block address of SSA area */
@@ -810,6 +896,7 @@
unsigned int min_ipu_util; /* in-place-update threshold */
unsigned int min_fsync_blocks; /* threshold for fsync */
unsigned int min_hot_blocks; /* threshold for hot block allocation */
+ unsigned int min_ssr_sections; /* threshold to trigger SSR allocation */
/* for flush command control */
struct flush_cmd_control *fcc_info;
@@ -831,6 +918,7 @@
enum count_type {
F2FS_DIRTY_DENTS,
F2FS_DIRTY_DATA,
+ F2FS_DIRTY_QDATA,
F2FS_DIRTY_NODES,
F2FS_DIRTY_META,
F2FS_INMEM_PAGES,
@@ -879,6 +967,19 @@
LOCK_RETRY,
};
+enum cp_reason_type {
+ CP_NO_NEEDED,
+ CP_NON_REGULAR,
+ CP_HARDLINK,
+ CP_SB_NEED_CP,
+ CP_WRONG_PINO,
+ CP_NO_SPC_ROLL,
+ CP_NODE_NEED_CP,
+ CP_FASTBOOT_MODE,
+ CP_SPEC_LOG_NUM,
+ CP_RECOVER_DIR,
+};
+
enum iostat_type {
APP_DIRECT_IO, /* app direct IOs */
APP_BUFFERED_IO, /* app buffered IOs */
@@ -898,6 +999,7 @@
struct f2fs_io_info {
struct f2fs_sb_info *sbi; /* f2fs_sb_info pointer */
+ nid_t ino; /* inode number */
enum page_type type; /* contains DATA/NODE/META/META_FLUSH */
enum temp_type temp; /* contains HOT/WARM/COLD */
int op; /* contains REQ_OP_ */
@@ -910,7 +1012,10 @@
bool submitted; /* indicate IO submission */
int need_lock; /* indicate we need to lock cp_rwsem */
bool in_list; /* indicate fio is in io_list */
+ bool is_meta; /* indicate borrow meta inode mapping or not */
+ bool retry; /* need to reallocate block address */
enum iostat_type io_type; /* io type */
+ struct writeback_control *io_wbc; /* writeback control */
};
#define is_read_io(rw) ((rw) == READ)
@@ -942,6 +1047,7 @@
DIR_INODE, /* for dirty dir inode */
FILE_INODE, /* for dirty regular/symlink inode */
DIRTY_META, /* for all dirtied inode metadata */
+ ATOMIC_FILE, /* for all atomic files */
NR_INODE_TYPE,
};
@@ -969,10 +1075,42 @@
MAX_TIME,
};
+enum {
+ GC_NORMAL,
+ GC_IDLE_CB,
+ GC_IDLE_GREEDY,
+ GC_URGENT,
+};
+
+enum {
+ WHINT_MODE_OFF, /* not pass down write hints */
+ WHINT_MODE_USER, /* try to pass down hints given by users */
+ WHINT_MODE_FS, /* pass down hints with F2FS policy */
+};
+
+enum {
+ ALLOC_MODE_DEFAULT, /* stay default */
+ ALLOC_MODE_REUSE, /* reuse segments as much as possible */
+};
+
+enum fsync_mode {
+ FSYNC_MODE_POSIX, /* fsync follows posix semantics */
+ FSYNC_MODE_STRICT, /* fsync behaves in line with ext4 */
+ FSYNC_MODE_NOBARRIER, /* fsync behaves nobarrier based on posix */
+};
+
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+#define DUMMY_ENCRYPTION_ENABLED(sbi) \
+ (unlikely(F2FS_OPTION(sbi).test_dummy_encryption))
+#else
+#define DUMMY_ENCRYPTION_ENABLED(sbi) (0)
+#endif
+
struct f2fs_sb_info {
struct super_block *sb; /* pointer to VFS super block */
struct proc_dir_entry *s_proc; /* proc entry */
struct f2fs_super_block *raw_super; /* raw super block pointer */
+ struct rw_semaphore sb_lock; /* lock for raw super block */
int valid_super_block; /* valid super block no */
unsigned long s_flag; /* flags for sbi */
@@ -992,7 +1130,8 @@
struct f2fs_bio_info *write_io[NR_PAGE_TYPE]; /* for write bios */
struct mutex wio_mutex[NR_PAGE_TYPE - 1][NR_TEMP_TYPE];
/* bio ordering for NODE/DATA */
- int write_io_size_bits; /* Write IO size bits */
+ /* keep migration IO order for LFS mode */
+ struct rw_semaphore io_order_lock;
mempool_t *write_io_dummy; /* Dummy pages */
/* for checkpoint */
@@ -1042,14 +1181,18 @@
unsigned int total_node_count; /* total node block count */
unsigned int total_valid_node_count; /* valid node block count */
loff_t max_file_blocks; /* max block index of file */
- int active_logs; /* # of active logs */
int dir_level; /* directory level */
+ unsigned int trigger_ssr_threshold; /* threshold to trigger ssr */
+ int readdir_ra; /* readahead inode in readdir */
block_t user_block_count; /* # of user blocks */
block_t total_valid_block_count; /* # of valid blocks */
block_t discard_blks; /* discard command candidats */
block_t last_valid_block_count; /* for recovery */
block_t reserved_blocks; /* configurable reserved blocks */
+ block_t current_reserved_blocks; /* current reserved blocks */
+
+ unsigned int nquota_files; /* # of quota sysfile */
u32 s_next_generation; /* for NFS support */
@@ -1059,7 +1202,7 @@
struct percpu_counter alloc_valid_block_count;
/* writeback control */
- atomic_t wb_sync_req; /* count # of WB_SYNC threads */
+ atomic_t wb_sync_req[META]; /* count # of WB_SYNC threads */
/* valid inode count */
struct percpu_counter total_valid_inode_count;
@@ -1070,9 +1213,12 @@
struct mutex gc_mutex; /* mutex for GC */
struct f2fs_gc_kthread *gc_thread; /* GC thread */
unsigned int cur_victim_sec; /* current victim section num */
+ unsigned int gc_mode; /* current GC state */
+ /* for skip statistic */
+ unsigned long long skipped_atomic_files[2]; /* FG_GC and BG_GC */
- /* threshold for converting bg victims for fg */
- u64 fggc_threshold;
+ /* threshold for gc trials on pinned files */
+ u64 gc_pin_file_threshold;
/* maximum # of trials to find a victim segment for SSR and GC */
unsigned int max_victim_search;
@@ -1115,6 +1261,8 @@
struct list_head s_list;
int s_ndevs; /* number of devices */
struct f2fs_dev_info *devs; /* for device list */
+ unsigned int dirty_device; /* for checkpoint data flush */
+ spinlock_t dev_lock; /* protect dirty_device */
struct mutex umount_mutex;
unsigned int shrinker_run_no;
@@ -1127,17 +1275,6 @@
/* Precomputed FS UUID checksum for seeding other checksums */
__u32 s_chksum_seed;
-
- /* For fault injection */
-#ifdef CONFIG_F2FS_FAULT_INJECTION
- struct f2fs_fault_info fault_info;
-#endif
-
-#ifdef CONFIG_QUOTA
- /* Names of quota files with journalled quota */
- char *s_qf_names[MAXQUOTAS];
- int s_jquota_fmt; /* Format of quota to use */
-#endif
};
#ifdef CONFIG_F2FS_FAULT_INJECTION
@@ -1147,7 +1284,7 @@
__func__, __builtin_return_address(0))
static inline bool time_to_inject(struct f2fs_sb_info *sbi, int type)
{
- struct f2fs_fault_info *ffi = &sbi->fault_info;
+ struct f2fs_fault_info *ffi = &F2FS_OPTION(sbi).fault_info;
if (!ffi->inject_rate)
return false;
@@ -1178,8 +1315,7 @@
static inline bool f2fs_time_over(struct f2fs_sb_info *sbi, int type)
{
- struct timespec ts = {sbi->interval_time[type], 0};
- unsigned long interval = timespec_to_jiffies(&ts);
+ unsigned long interval = sbi->interval_time[type] * HZ;
return time_after(jiffies, sbi->last_time[type] + interval);
}
@@ -1199,33 +1335,7 @@
/*
* Inline functions
*/
-static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, const void *address,
- unsigned int length)
-{
- SHASH_DESC_ON_STACK(shash, sbi->s_chksum_driver);
- u32 *ctx = (u32 *)shash_desc_ctx(shash);
- u32 retval;
- int err;
-
- shash->tfm = sbi->s_chksum_driver;
- shash->flags = 0;
- *ctx = F2FS_SUPER_MAGIC;
-
- err = crypto_shash_update(shash, address, length);
- BUG_ON(err);
-
- retval = *ctx;
- barrier_data(ctx);
- return retval;
-}
-
-static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc,
- void *buf, size_t buf_size)
-{
- return f2fs_crc32(sbi, buf, buf_size) == blk_crc;
-}
-
-static inline u32 f2fs_chksum(struct f2fs_sb_info *sbi, u32 crc,
+static inline u32 __f2fs_crc32(struct f2fs_sb_info *sbi, u32 crc,
const void *address, unsigned int length)
{
struct {
@@ -1246,6 +1356,24 @@
return *(u32 *)desc.ctx;
}
+static inline u32 f2fs_crc32(struct f2fs_sb_info *sbi, const void *address,
+ unsigned int length)
+{
+ return __f2fs_crc32(sbi, F2FS_SUPER_MAGIC, address, length);
+}
+
+static inline bool f2fs_crc_valid(struct f2fs_sb_info *sbi, __u32 blk_crc,
+ void *buf, size_t buf_size)
+{
+ return f2fs_crc32(sbi, buf, buf_size) == blk_crc;
+}
+
+static inline u32 f2fs_chksum(struct f2fs_sb_info *sbi, u32 crc,
+ const void *address, unsigned int length)
+{
+ return __f2fs_crc32(sbi, crc, address, length);
+}
+
static inline struct f2fs_inode_info *F2FS_I(struct inode *inode)
{
return container_of(inode, struct f2fs_inode_info, vfs_inode);
@@ -1346,6 +1474,13 @@
return le64_to_cpu(cp->checkpoint_ver);
}
+static inline unsigned long f2fs_qf_ino(struct super_block *sb, int type)
+{
+ if (type < F2FS_MAX_QUOTAS)
+ return le32_to_cpu(F2FS_SB(sb)->raw_super->qf_ino[type]);
+ return 0;
+}
+
static inline __u64 cur_cp_crc(struct f2fs_checkpoint *cp)
{
size_t crc_offset = le32_to_cpu(cp->checksum_offset);
@@ -1485,6 +1620,25 @@
return ofs == XATTR_NODE_OFFSET;
}
+static inline bool __allow_reserved_blocks(struct f2fs_sb_info *sbi,
+ struct inode *inode, bool cap)
+{
+ if (!inode)
+ return true;
+ if (!test_opt(sbi, RESERVE_ROOT))
+ return false;
+ if (IS_NOQUOTA(inode))
+ return true;
+ if (uid_eq(F2FS_OPTION(sbi).s_resuid, current_fsuid()))
+ return true;
+ if (!gid_eq(F2FS_OPTION(sbi).s_resgid, GLOBAL_ROOT_GID) &&
+ in_group_p(F2FS_OPTION(sbi).s_resgid))
+ return true;
+ if (cap && capable(CAP_SYS_RESOURCE))
+ return true;
+ return false;
+}
+
static inline void f2fs_i_blocks_write(struct inode *, block_t, bool, bool);
static inline int inc_valid_block_count(struct f2fs_sb_info *sbi,
struct inode *inode, blkcnt_t *count)
@@ -1512,12 +1666,19 @@
spin_lock(&sbi->stat_lock);
sbi->total_valid_block_count += (block_t)(*count);
- avail_user_block_count = sbi->user_block_count - sbi->reserved_blocks;
+ avail_user_block_count = sbi->user_block_count -
+ sbi->current_reserved_blocks;
+
+ if (!__allow_reserved_blocks(sbi, inode, true))
+ avail_user_block_count -= F2FS_OPTION(sbi).root_reserved_blocks;
+
if (unlikely(sbi->total_valid_block_count > avail_user_block_count)) {
diff = sbi->total_valid_block_count - avail_user_block_count;
+ if (diff > *count)
+ diff = *count;
*count -= diff;
release = diff;
- sbi->total_valid_block_count = avail_user_block_count;
+ sbi->total_valid_block_count -= diff;
if (!*count) {
spin_unlock(&sbi->stat_lock);
percpu_counter_sub(&sbi->alloc_valid_block_count, diff);
@@ -1526,7 +1687,7 @@
}
spin_unlock(&sbi->stat_lock);
- if (release)
+ if (unlikely(release))
dquot_release_reservation_block(inode, release);
f2fs_i_blocks_write(inode, *count, true, true);
return 0;
@@ -1546,6 +1707,10 @@
f2fs_bug_on(sbi, sbi->total_valid_block_count < (block_t) count);
f2fs_bug_on(sbi, inode->i_blocks < sectors);
sbi->total_valid_block_count -= (block_t)count;
+ if (sbi->reserved_blocks &&
+ sbi->current_reserved_blocks < sbi->reserved_blocks)
+ sbi->current_reserved_blocks = min(sbi->reserved_blocks,
+ sbi->current_reserved_blocks + count);
spin_unlock(&sbi->stat_lock);
f2fs_i_blocks_write(inode, count, false, true);
}
@@ -1566,6 +1731,8 @@
atomic_inc(&F2FS_I(inode)->dirty_pages);
inc_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
+ if (IS_NOQUOTA(inode))
+ inc_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
}
static inline void dec_page_count(struct f2fs_sb_info *sbi, int count_type)
@@ -1582,6 +1749,8 @@
atomic_dec(&F2FS_I(inode)->dirty_pages);
dec_page_count(F2FS_I_SB(inode), S_ISDIR(inode->i_mode) ?
F2FS_DIRTY_DENTS : F2FS_DIRTY_DATA);
+ if (IS_NOQUOTA(inode))
+ dec_page_count(F2FS_I_SB(inode), F2FS_DIRTY_QDATA);
}
static inline s64 get_pages(struct f2fs_sb_info *sbi, int count_type)
@@ -1636,6 +1805,12 @@
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
int offset;
+ if (is_set_ckpt_flags(sbi, CP_LARGE_NAT_BITMAP_FLAG)) {
+ offset = (flag == SIT_BITMAP) ?
+ le32_to_cpu(ckpt->nat_ver_bitmap_bytesize) : 0;
+ return &ckpt->sit_nat_version_bitmap + offset;
+ }
+
if (__cp_payload(sbi) > 0) {
if (flag == NAT_BITMAP)
return &ckpt->sit_nat_version_bitmap;
@@ -1689,11 +1864,22 @@
return ret;
}
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+ if (time_to_inject(sbi, FAULT_BLOCK)) {
+ f2fs_show_injection_info(FAULT_BLOCK);
+ goto enospc;
+ }
+#endif
+
spin_lock(&sbi->stat_lock);
- valid_block_count = sbi->total_valid_block_count + 1;
- if (unlikely(valid_block_count + sbi->reserved_blocks >
- sbi->user_block_count)) {
+ valid_block_count = sbi->total_valid_block_count +
+ sbi->current_reserved_blocks + 1;
+
+ if (!__allow_reserved_blocks(sbi, inode, false))
+ valid_block_count += F2FS_OPTION(sbi).root_reserved_blocks;
+
+ if (unlikely(valid_block_count > sbi->user_block_count)) {
spin_unlock(&sbi->stat_lock);
goto enospc;
}
@@ -1735,6 +1921,9 @@
sbi->total_valid_node_count--;
sbi->total_valid_block_count--;
+ if (sbi->reserved_blocks &&
+ sbi->current_reserved_blocks < sbi->reserved_blocks)
+ sbi->current_reserved_blocks++;
spin_unlock(&sbi->stat_lock);
@@ -1786,6 +1975,19 @@
return grab_cache_page_write_begin(mapping, index, AOP_FLAG_NOFS);
}
+static inline struct page *f2fs_pagecache_get_page(
+ struct address_space *mapping, pgoff_t index,
+ int fgp_flags, gfp_t gfp_mask)
+{
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+ if (time_to_inject(F2FS_M_SB(mapping), FAULT_PAGE_GET)) {
+ f2fs_show_injection_info(FAULT_PAGE_GET);
+ return NULL;
+ }
+#endif
+ return pagecache_get_page(mapping, index, fgp_flags, gfp_mask);
+}
+
static inline void f2fs_copy_page(struct page *src, struct page *dst)
{
char *src_kaddr = kmap(src);
@@ -1835,15 +2037,25 @@
return entry;
}
-static inline struct bio *f2fs_bio_alloc(int npages)
+static inline struct bio *f2fs_bio_alloc(struct f2fs_sb_info *sbi,
+ int npages, bool no_fail)
{
struct bio *bio;
- /* No failure on bio allocation */
- bio = bio_alloc(GFP_NOIO, npages);
- if (!bio)
- bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages);
- return bio;
+ if (no_fail) {
+ /* No failure on bio allocation */
+ bio = bio_alloc(GFP_NOIO, npages);
+ if (!bio)
+ bio = bio_alloc(GFP_NOIO | __GFP_NOFAIL, npages);
+ return bio;
+ }
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+ if (time_to_inject(sbi, FAULT_ALLOC_BIO)) {
+ f2fs_show_injection_info(FAULT_ALLOC_BIO);
+ return NULL;
+ }
+#endif
+ return bio_alloc(GFP_KERNEL, npages);
}
static inline void f2fs_radix_tree_insert(struct radix_tree_root *root,
@@ -1885,11 +2097,11 @@
raw_node = F2FS_NODE(node_page);
/* from GC path only */
- if (!inode) {
- if (is_inode)
+ if (is_inode) {
+ if (!inode)
base = offset_in_addr(&raw_node->i);
- } else if (f2fs_has_extra_attr(inode) && is_inode) {
- base = get_extra_isize(inode);
+ else if (f2fs_has_extra_attr(inode))
+ base = get_extra_isize(inode);
}
addr_array = blkaddr_in_node(raw_node);
@@ -1956,9 +2168,60 @@
*addr ^= mask;
}
-#define F2FS_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL))
-#define F2FS_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL)
-#define F2FS_FL_INHERITED (FS_PROJINHERIT_FL)
+/*
+ * Inode flags
+ */
+#define F2FS_SECRM_FL 0x00000001 /* Secure deletion */
+#define F2FS_UNRM_FL 0x00000002 /* Undelete */
+#define F2FS_COMPR_FL 0x00000004 /* Compress file */
+#define F2FS_SYNC_FL 0x00000008 /* Synchronous updates */
+#define F2FS_IMMUTABLE_FL 0x00000010 /* Immutable file */
+#define F2FS_APPEND_FL 0x00000020 /* writes to file may only append */
+#define F2FS_NODUMP_FL 0x00000040 /* do not dump file */
+#define F2FS_NOATIME_FL 0x00000080 /* do not update atime */
+/* Reserved for compression usage... */
+#define F2FS_DIRTY_FL 0x00000100
+#define F2FS_COMPRBLK_FL 0x00000200 /* One or more compressed clusters */
+#define F2FS_NOCOMPR_FL 0x00000400 /* Don't compress */
+#define F2FS_ENCRYPT_FL 0x00000800 /* encrypted file */
+/* End compression flags --- maybe not all used */
+#define F2FS_INDEX_FL 0x00001000 /* hash-indexed directory */
+#define F2FS_IMAGIC_FL 0x00002000 /* AFS directory */
+#define F2FS_JOURNAL_DATA_FL 0x00004000 /* file data should be journaled */
+#define F2FS_NOTAIL_FL 0x00008000 /* file tail should not be merged */
+#define F2FS_DIRSYNC_FL 0x00010000 /* dirsync behaviour (directories only) */
+#define F2FS_TOPDIR_FL 0x00020000 /* Top of directory hierarchies*/
+#define F2FS_HUGE_FILE_FL 0x00040000 /* Set to each huge file */
+#define F2FS_EXTENTS_FL 0x00080000 /* Inode uses extents */
+#define F2FS_EA_INODE_FL 0x00200000 /* Inode used for large EA */
+#define F2FS_EOFBLOCKS_FL 0x00400000 /* Blocks allocated beyond EOF */
+#define F2FS_INLINE_DATA_FL 0x10000000 /* Inode has inline data. */
+#define F2FS_PROJINHERIT_FL 0x20000000 /* Create with parents projid */
+#define F2FS_RESERVED_FL 0x80000000 /* reserved for ext4 lib */
+
+#define F2FS_FL_USER_VISIBLE 0x304BDFFF /* User visible flags */
+#define F2FS_FL_USER_MODIFIABLE 0x204BC0FF /* User modifiable flags */
+
+/* Flags we can manipulate with through F2FS_IOC_FSSETXATTR */
+#define F2FS_FL_XFLAG_VISIBLE (F2FS_SYNC_FL | \
+ F2FS_IMMUTABLE_FL | \
+ F2FS_APPEND_FL | \
+ F2FS_NODUMP_FL | \
+ F2FS_NOATIME_FL | \
+ F2FS_PROJINHERIT_FL)
+
+/* Flags that should be inherited by new inodes from their parent. */
+#define F2FS_FL_INHERITED (F2FS_SECRM_FL | F2FS_UNRM_FL | F2FS_COMPR_FL |\
+ F2FS_SYNC_FL | F2FS_NODUMP_FL | F2FS_NOATIME_FL |\
+ F2FS_NOCOMPR_FL | F2FS_JOURNAL_DATA_FL |\
+ F2FS_NOTAIL_FL | F2FS_DIRSYNC_FL |\
+ F2FS_PROJINHERIT_FL)
+
+/* Flags that are appropriate for regular files (all but dir-specific ones). */
+#define F2FS_REG_FLMASK (~(F2FS_DIRSYNC_FL | F2FS_TOPDIR_FL))
+
+/* Flags that are appropriate for non-directories/regular files. */
+#define F2FS_OTHER_FLMASK (F2FS_NODUMP_FL | F2FS_NOATIME_FL)
static inline __u32 f2fs_mask_flags(umode_t mode, __u32 flags)
{
@@ -2000,6 +2263,8 @@
FI_HOT_DATA, /* indicate file is hot */
FI_EXTRA_ATTR, /* indicate file has extra attribute */
FI_PROJ_INHERIT, /* indicate file inherits projectid */
+ FI_PIN_FILE, /* indicate file should not be gced */
+ FI_ATOMIC_REVOKE_REQUEST, /* request to drop atomic data */
};
static inline void __mark_inode_dirty_flag(struct inode *inode,
@@ -2009,10 +2274,12 @@
case FI_INLINE_XATTR:
case FI_INLINE_DATA:
case FI_INLINE_DENTRY:
+ case FI_NEW_INODE:
if (set)
return;
case FI_DATA_EXIST:
case FI_INLINE_DOTS:
+ case FI_PIN_FILE:
f2fs_mark_inode_dirty_sync(inode, true);
}
}
@@ -2093,6 +2360,13 @@
f2fs_mark_inode_dirty_sync(inode, true);
}
+static inline void f2fs_i_gc_failures_write(struct inode *inode,
+ unsigned int count)
+{
+ F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = count;
+ f2fs_mark_inode_dirty_sync(inode, true);
+}
+
static inline void f2fs_i_xnid_write(struct inode *inode, nid_t xnid)
{
F2FS_I(inode)->i_xattr_nid = xnid;
@@ -2121,6 +2395,8 @@
set_bit(FI_INLINE_DOTS, &fi->flags);
if (ri->i_inline & F2FS_EXTRA_ATTR)
set_bit(FI_EXTRA_ATTR, &fi->flags);
+ if (ri->i_inline & F2FS_PIN_FILE)
+ set_bit(FI_PIN_FILE, &fi->flags);
}
static inline void set_raw_inline(struct inode *inode, struct f2fs_inode *ri)
@@ -2139,6 +2415,8 @@
ri->i_inline |= F2FS_INLINE_DOTS;
if (is_inode_flag_set(inode, FI_EXTRA_ATTR))
ri->i_inline |= F2FS_EXTRA_ATTR;
+ if (is_inode_flag_set(inode, FI_PIN_FILE))
+ ri->i_inline |= F2FS_PIN_FILE;
}
static inline int f2fs_has_extra_attr(struct inode *inode)
@@ -2153,25 +2431,20 @@
static inline unsigned int addrs_per_inode(struct inode *inode)
{
- if (f2fs_has_inline_xattr(inode))
- return CUR_ADDRS_PER_INODE(inode) - F2FS_INLINE_XATTR_ADDRS;
- return CUR_ADDRS_PER_INODE(inode);
+ return CUR_ADDRS_PER_INODE(inode) - get_inline_xattr_addrs(inode);
}
-static inline void *inline_xattr_addr(struct page *page)
+static inline void *inline_xattr_addr(struct inode *inode, struct page *page)
{
struct f2fs_inode *ri = F2FS_INODE(page);
return (void *)&(ri->i_addr[DEF_ADDRS_PER_INODE -
- F2FS_INLINE_XATTR_ADDRS]);
+ get_inline_xattr_addrs(inode)]);
}
static inline int inline_xattr_size(struct inode *inode)
{
- if (f2fs_has_inline_xattr(inode))
- return F2FS_INLINE_XATTR_ADDRS << 2;
- else
- return 0;
+ return get_inline_xattr_addrs(inode) * sizeof(__le32);
}
static inline int f2fs_has_inline_data(struct inode *inode)
@@ -2189,6 +2462,11 @@
return is_inode_flag_set(inode, FI_INLINE_DOTS);
}
+static inline bool f2fs_is_pinned_file(struct inode *inode)
+{
+ return is_inode_flag_set(inode, FI_PIN_FILE);
+}
+
static inline bool f2fs_is_atomic_file(struct inode *inode)
{
return is_inode_flag_set(inode, FI_ATOMIC_FILE);
@@ -2227,12 +2505,6 @@
return is_inode_flag_set(inode, FI_INLINE_DENTRY);
}
-static inline void f2fs_dentry_kunmap(struct inode *dir, struct page *page)
-{
- if (!f2fs_has_inline_dentry(dir))
- kunmap(page);
-}
-
static inline int is_file(struct inode *inode, int type)
{
return F2FS_I(inode)->i_advise & type;
@@ -2252,9 +2524,10 @@
static inline bool f2fs_skip_inode_update(struct inode *inode, int dsync)
{
+ bool ret;
+
if (dsync) {
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- bool ret;
spin_lock(&sbi->inode_lock[DIRTY_META]);
ret = list_empty(&F2FS_I(inode)->gdirty_list);
@@ -2263,14 +2536,29 @@
}
if (!is_inode_flag_set(inode, FI_AUTO_RECOVER) ||
file_keep_isize(inode) ||
- i_size_read(inode) & PAGE_MASK)
+ i_size_read(inode) & ~PAGE_MASK)
return false;
- return F2FS_I(inode)->last_disk_size == i_size_read(inode);
+
+ if (!timespec_equal(F2FS_I(inode)->i_disk_time, &inode->i_atime))
+ return false;
+ if (!timespec_equal(F2FS_I(inode)->i_disk_time + 1, &inode->i_ctime))
+ return false;
+ if (!timespec_equal(F2FS_I(inode)->i_disk_time + 2, &inode->i_mtime))
+ return false;
+ if (!timespec_equal(F2FS_I(inode)->i_disk_time + 3,
+ &F2FS_I(inode)->i_crtime))
+ return false;
+
+ down_read(&F2FS_I(inode)->i_sem);
+ ret = F2FS_I(inode)->last_disk_size == i_size_read(inode);
+ up_read(&F2FS_I(inode)->i_sem);
+
+ return ret;
}
-static inline int f2fs_readonly(struct super_block *sb)
+static inline bool f2fs_readonly(struct super_block *sb)
{
- return sb->s_flags & MS_RDONLY;
+ return sb_rdonly(sb);
}
static inline bool f2fs_cp_error(struct f2fs_sb_info *sbi)
@@ -2310,12 +2598,41 @@
return kmalloc(size, flags);
}
+static inline void *f2fs_kzalloc(struct f2fs_sb_info *sbi,
+ size_t size, gfp_t flags)
+{
+ return f2fs_kmalloc(sbi, size, flags | __GFP_ZERO);
+}
+
+static inline void *f2fs_kvmalloc(struct f2fs_sb_info *sbi,
+ size_t size, gfp_t flags)
+{
+#ifdef CONFIG_F2FS_FAULT_INJECTION
+ if (time_to_inject(sbi, FAULT_KVMALLOC)) {
+ f2fs_show_injection_info(FAULT_KVMALLOC);
+ return NULL;
+ }
+#endif
+ return kvmalloc(size, flags);
+}
+
+static inline void *f2fs_kvzalloc(struct f2fs_sb_info *sbi,
+ size_t size, gfp_t flags)
+{
+ return f2fs_kvmalloc(sbi, size, flags | __GFP_ZERO);
+}
+
static inline int get_extra_isize(struct inode *inode)
{
return F2FS_I(inode)->i_extra_isize / sizeof(__le32);
}
-#define get_inode_mode(i) \
+static inline int get_inline_xattr_addrs(struct inode *inode)
+{
+ return F2FS_I(inode)->i_inline_xattr_size;
+}
+
+#define f2fs_get_inode_mode(i) \
((is_inode_flag_set(i, FI_ACL_MODE)) ? \
(F2FS_I(i)->i_acl_mode) : ((i)->i_mode))
@@ -2354,20 +2671,29 @@
spin_unlock(&sbi->iostat_lock);
}
+static inline bool is_valid_blkaddr(block_t blkaddr)
+{
+ if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR)
+ return false;
+ return true;
+}
+
/*
* file.c
*/
int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync);
-void truncate_data_blocks(struct dnode_of_data *dn);
-int truncate_blocks(struct inode *inode, u64 from, bool lock);
+void f2fs_truncate_data_blocks(struct dnode_of_data *dn);
+int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock);
int f2fs_truncate(struct inode *inode);
int f2fs_getattr(const struct path *path, struct kstat *stat,
u32 request_mask, unsigned int flags);
int f2fs_setattr(struct dentry *dentry, struct iattr *attr);
-int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end);
-int truncate_data_blocks_range(struct dnode_of_data *dn, int count);
+int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end);
+void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count);
+int f2fs_precache_extents(struct inode *inode);
long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg);
long f2fs_compat_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
+int f2fs_pin_file_control(struct inode *inode, bool inc);
/*
* inode.c
@@ -2377,36 +2703,37 @@
void f2fs_inode_chksum_set(struct f2fs_sb_info *sbi, struct page *page);
struct inode *f2fs_iget(struct super_block *sb, unsigned long ino);
struct inode *f2fs_iget_retry(struct super_block *sb, unsigned long ino);
-int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink);
-int update_inode(struct inode *inode, struct page *node_page);
-int update_inode_page(struct inode *inode);
+int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink);
+void f2fs_update_inode(struct inode *inode, struct page *node_page);
+void f2fs_update_inode_page(struct inode *inode);
int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc);
void f2fs_evict_inode(struct inode *inode);
-void handle_failed_inode(struct inode *inode);
+void f2fs_handle_failed_inode(struct inode *inode);
/*
* namei.c
*/
+int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
+ bool hot, bool set);
struct dentry *f2fs_get_parent(struct dentry *child);
/*
* dir.c
*/
-void set_de_type(struct f2fs_dir_entry *de, umode_t mode);
-unsigned char get_de_type(struct f2fs_dir_entry *de);
-struct f2fs_dir_entry *find_target_dentry(struct fscrypt_name *fname,
+unsigned char f2fs_get_de_type(struct f2fs_dir_entry *de);
+struct f2fs_dir_entry *f2fs_find_target_dentry(struct fscrypt_name *fname,
f2fs_hash_t namehash, int *max_slots,
struct f2fs_dentry_ptr *d);
int f2fs_fill_dentries(struct dir_context *ctx, struct f2fs_dentry_ptr *d,
unsigned int start_pos, struct fscrypt_str *fstr);
-void do_make_empty_dir(struct inode *inode, struct inode *parent,
+void f2fs_do_make_empty_dir(struct inode *inode, struct inode *parent,
struct f2fs_dentry_ptr *d);
-struct page *init_inode_metadata(struct inode *inode, struct inode *dir,
+struct page *f2fs_init_inode_metadata(struct inode *inode, struct inode *dir,
const struct qstr *new_name,
const struct qstr *orig_name, struct page *dpage);
-void update_parent_metadata(struct inode *dir, struct inode *inode,
+void f2fs_update_parent_metadata(struct inode *dir, struct inode *inode,
unsigned int current_depth);
-int room_for_filename(const void *bitmap, int slots, int max_slots);
+int f2fs_room_for_filename(const void *bitmap, int slots, int max_slots);
void f2fs_drop_nlink(struct inode *dir, struct inode *inode);
struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir,
struct fscrypt_name *fname, struct page **res_page);
@@ -2423,9 +2750,9 @@
int f2fs_add_regular_entry(struct inode *dir, const struct qstr *new_name,
const struct qstr *orig_name,
struct inode *inode, nid_t ino, umode_t mode);
-int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname,
+int f2fs_add_dentry(struct inode *dir, struct fscrypt_name *fname,
struct inode *inode, nid_t ino, umode_t mode);
-int __f2fs_add_link(struct inode *dir, const struct qstr *name,
+int f2fs_do_add_link(struct inode *dir, const struct qstr *name,
struct inode *inode, nid_t ino, umode_t mode);
void f2fs_delete_entry(struct f2fs_dir_entry *dentry, struct page *page,
struct inode *dir, struct inode *inode);
@@ -2434,7 +2761,7 @@
static inline int f2fs_add_link(struct dentry *dentry, struct inode *inode)
{
- return __f2fs_add_link(d_inode(dentry->d_parent), &dentry->d_name,
+ return f2fs_do_add_link(d_inode(dentry->d_parent), &dentry->d_name,
inode, inode->i_ino, inode->i_mode);
}
@@ -2443,13 +2770,13 @@
*/
int f2fs_inode_dirtied(struct inode *inode, bool sync);
void f2fs_inode_synced(struct inode *inode);
-void f2fs_enable_quota_files(struct f2fs_sb_info *sbi);
+int f2fs_enable_quota_files(struct f2fs_sb_info *sbi, bool rdonly);
void f2fs_quota_off_umount(struct super_block *sb);
int f2fs_commit_super(struct f2fs_sb_info *sbi, bool recover);
int f2fs_sync_fs(struct super_block *sb, int sync);
extern __printf(3, 4)
void f2fs_msg(struct super_block *sb, const char *level, const char *fmt, ...);
-int sanity_check_ckpt(struct f2fs_sb_info *sbi);
+int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi);
/*
* hash.c
@@ -2463,168 +2790,183 @@
struct dnode_of_data;
struct node_info;
-int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid);
-bool available_free_memory(struct f2fs_sb_info *sbi, int type);
-int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid);
-bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid);
-bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino);
-void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni);
-pgoff_t get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs);
-int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode);
-int truncate_inode_blocks(struct inode *inode, pgoff_t from);
-int truncate_xattr_node(struct inode *inode, struct page *page);
-int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino);
-int remove_inode_page(struct inode *inode);
-struct page *new_inode_page(struct inode *inode);
-struct page *new_node_page(struct dnode_of_data *dn, unsigned int ofs);
-void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
-struct page *get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
-struct page *get_node_page_ra(struct page *parent, int start);
-void move_node_page(struct page *node_page, int gc_type);
-int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
+int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid);
+bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type);
+int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid);
+bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid);
+bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino);
+void f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
+ struct node_info *ni);
+pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs);
+int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode);
+int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from);
+int f2fs_truncate_xattr_node(struct inode *inode);
+int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino);
+int f2fs_remove_inode_page(struct inode *inode);
+struct page *f2fs_new_inode_page(struct inode *inode);
+struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs);
+void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid);
+struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid);
+struct page *f2fs_get_node_page_ra(struct page *parent, int start);
+void f2fs_move_node_page(struct page *node_page, int gc_type);
+int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
struct writeback_control *wbc, bool atomic);
-int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc,
+int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
+ struct writeback_control *wbc,
bool do_balance, enum iostat_type io_type);
-void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
-bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
-void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
-void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid);
-int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink);
-void recover_inline_xattr(struct inode *inode, struct page *page);
-int recover_xattr_data(struct inode *inode, struct page *page,
- block_t blkaddr);
-int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page);
-int restore_node_summary(struct f2fs_sb_info *sbi,
+void f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount);
+bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid);
+void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid);
+void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid);
+int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink);
+void f2fs_recover_inline_xattr(struct inode *inode, struct page *page);
+int f2fs_recover_xattr_data(struct inode *inode, struct page *page);
+int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page);
+void f2fs_restore_node_summary(struct f2fs_sb_info *sbi,
unsigned int segno, struct f2fs_summary_block *sum);
-void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
-int build_node_manager(struct f2fs_sb_info *sbi);
-void destroy_node_manager(struct f2fs_sb_info *sbi);
-int __init create_node_manager_caches(void);
-void destroy_node_manager_caches(void);
+void f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
+int f2fs_build_node_manager(struct f2fs_sb_info *sbi);
+void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi);
+int __init f2fs_create_node_manager_caches(void);
+void f2fs_destroy_node_manager_caches(void);
/*
* segment.c
*/
-bool need_SSR(struct f2fs_sb_info *sbi);
-void register_inmem_page(struct inode *inode, struct page *page);
-void drop_inmem_pages(struct inode *inode);
-void drop_inmem_page(struct inode *inode, struct page *page);
-int commit_inmem_pages(struct inode *inode);
+bool f2fs_need_SSR(struct f2fs_sb_info *sbi);
+void f2fs_register_inmem_page(struct inode *inode, struct page *page);
+void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure);
+void f2fs_drop_inmem_pages(struct inode *inode);
+void f2fs_drop_inmem_page(struct inode *inode, struct page *page);
+int f2fs_commit_inmem_pages(struct inode *inode);
void f2fs_balance_fs(struct f2fs_sb_info *sbi, bool need);
void f2fs_balance_fs_bg(struct f2fs_sb_info *sbi);
-int f2fs_issue_flush(struct f2fs_sb_info *sbi);
-int create_flush_cmd_control(struct f2fs_sb_info *sbi);
-void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free);
-void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr);
-bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr);
-void refresh_sit_entry(struct f2fs_sb_info *sbi, block_t old, block_t new);
-void stop_discard_thread(struct f2fs_sb_info *sbi);
-void f2fs_wait_discard_bios(struct f2fs_sb_info *sbi, bool umount);
-void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc);
-void release_discard_addrs(struct f2fs_sb_info *sbi);
-int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
-void allocate_new_segments(struct f2fs_sb_info *sbi);
+int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino);
+int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi);
+int f2fs_flush_device_cache(struct f2fs_sb_info *sbi);
+void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free);
+void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr);
+bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr);
+void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi);
+void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi);
+bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi);
+void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
+ struct cp_control *cpc);
+void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi);
+int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra);
+void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi);
int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range);
-bool exist_trim_candidates(struct f2fs_sb_info *sbi, struct cp_control *cpc);
-struct page *get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno);
-void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr);
-void write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
+bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
+ struct cp_control *cpc);
+struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno);
+void f2fs_update_meta_page(struct f2fs_sb_info *sbi, void *src,
+ block_t blk_addr);
+void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
enum iostat_type io_type);
-void write_node_page(unsigned int nid, struct f2fs_io_info *fio);
-void write_data_page(struct dnode_of_data *dn, struct f2fs_io_info *fio);
-int rewrite_data_page(struct f2fs_io_info *fio);
-void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
+void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio);
+void f2fs_outplace_write_data(struct dnode_of_data *dn,
+ struct f2fs_io_info *fio);
+int f2fs_inplace_write_data(struct f2fs_io_info *fio);
+void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
block_t old_blkaddr, block_t new_blkaddr,
bool recover_curseg, bool recover_newaddr);
void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
block_t old_addr, block_t new_addr,
unsigned char version, bool recover_curseg,
bool recover_newaddr);
-void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
+void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
block_t old_blkaddr, block_t *new_blkaddr,
struct f2fs_summary *sum, int type,
struct f2fs_io_info *fio, bool add_list);
void f2fs_wait_on_page_writeback(struct page *page,
enum page_type type, bool ordered);
void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr);
-void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
-void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
-int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
+void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
+void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
+int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
unsigned int val, int alloc);
-void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
-int build_segment_manager(struct f2fs_sb_info *sbi);
-void destroy_segment_manager(struct f2fs_sb_info *sbi);
-int __init create_segment_manager_caches(void);
-void destroy_segment_manager_caches(void);
+void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc);
+int f2fs_build_segment_manager(struct f2fs_sb_info *sbi);
+void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi);
+int __init f2fs_create_segment_manager_caches(void);
+void f2fs_destroy_segment_manager_caches(void);
+int f2fs_rw_hint_to_seg_type(enum rw_hint hint);
+enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
+ enum page_type type, enum temp_type temp);
/*
* checkpoint.c
*/
void f2fs_stop_checkpoint(struct f2fs_sb_info *sbi, bool end_io);
-struct page *grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
-struct page *get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
-struct page *get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index);
-bool is_valid_blkaddr(struct f2fs_sb_info *sbi, block_t blkaddr, int type);
-int ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
+struct page *f2fs_grab_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
+struct page *f2fs_get_meta_page(struct f2fs_sb_info *sbi, pgoff_t index);
+struct page *f2fs_get_tmp_page(struct f2fs_sb_info *sbi, pgoff_t index);
+bool f2fs_is_valid_meta_blkaddr(struct f2fs_sb_info *sbi,
+ block_t blkaddr, int type);
+int f2fs_ra_meta_pages(struct f2fs_sb_info *sbi, block_t start, int nrpages,
int type, bool sync);
-void ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index);
-long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
+void f2fs_ra_meta_pages_cond(struct f2fs_sb_info *sbi, pgoff_t index);
+long f2fs_sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
long nr_to_write, enum iostat_type io_type);
-void add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
-void remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
-void release_ino_entry(struct f2fs_sb_info *sbi, bool all);
-bool exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode);
+void f2fs_add_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
+void f2fs_remove_ino_entry(struct f2fs_sb_info *sbi, nid_t ino, int type);
+void f2fs_release_ino_entry(struct f2fs_sb_info *sbi, bool all);
+bool f2fs_exist_written_data(struct f2fs_sb_info *sbi, nid_t ino, int mode);
+void f2fs_set_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
+ unsigned int devidx, int type);
+bool f2fs_is_dirty_device(struct f2fs_sb_info *sbi, nid_t ino,
+ unsigned int devidx, int type);
int f2fs_sync_inode_meta(struct f2fs_sb_info *sbi);
-int acquire_orphan_inode(struct f2fs_sb_info *sbi);
-void release_orphan_inode(struct f2fs_sb_info *sbi);
-void add_orphan_inode(struct inode *inode);
-void remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino);
-int recover_orphan_inodes(struct f2fs_sb_info *sbi);
-int get_valid_checkpoint(struct f2fs_sb_info *sbi);
-void update_dirty_page(struct inode *inode, struct page *page);
-void remove_dirty_inode(struct inode *inode);
-int sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type);
-int write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc);
-void init_ino_entry_info(struct f2fs_sb_info *sbi);
-int __init create_checkpoint_caches(void);
-void destroy_checkpoint_caches(void);
+int f2fs_acquire_orphan_inode(struct f2fs_sb_info *sbi);
+void f2fs_release_orphan_inode(struct f2fs_sb_info *sbi);
+void f2fs_add_orphan_inode(struct inode *inode);
+void f2fs_remove_orphan_inode(struct f2fs_sb_info *sbi, nid_t ino);
+int f2fs_recover_orphan_inodes(struct f2fs_sb_info *sbi);
+int f2fs_get_valid_checkpoint(struct f2fs_sb_info *sbi);
+void f2fs_update_dirty_page(struct inode *inode, struct page *page);
+void f2fs_remove_dirty_inode(struct inode *inode);
+int f2fs_sync_dirty_inodes(struct f2fs_sb_info *sbi, enum inode_type type);
+int f2fs_write_checkpoint(struct f2fs_sb_info *sbi, struct cp_control *cpc);
+void f2fs_init_ino_entry_info(struct f2fs_sb_info *sbi);
+int __init f2fs_create_checkpoint_caches(void);
+void f2fs_destroy_checkpoint_caches(void);
/*
* data.c
*/
+int f2fs_init_post_read_processing(void);
+void f2fs_destroy_post_read_processing(void);
void f2fs_submit_merged_write(struct f2fs_sb_info *sbi, enum page_type type);
void f2fs_submit_merged_write_cond(struct f2fs_sb_info *sbi,
struct inode *inode, nid_t ino, pgoff_t idx,
enum page_type type);
void f2fs_flush_merged_writes(struct f2fs_sb_info *sbi);
int f2fs_submit_page_bio(struct f2fs_io_info *fio);
-int f2fs_submit_page_write(struct f2fs_io_info *fio);
+void f2fs_submit_page_write(struct f2fs_io_info *fio);
struct block_device *f2fs_target_device(struct f2fs_sb_info *sbi,
block_t blk_addr, struct bio *bio);
int f2fs_target_device_index(struct f2fs_sb_info *sbi, block_t blkaddr);
-void set_data_blkaddr(struct dnode_of_data *dn);
+void f2fs_set_data_blkaddr(struct dnode_of_data *dn);
void f2fs_update_data_blkaddr(struct dnode_of_data *dn, block_t blkaddr);
-int reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
-int reserve_new_block(struct dnode_of_data *dn);
+int f2fs_reserve_new_blocks(struct dnode_of_data *dn, blkcnt_t count);
+int f2fs_reserve_new_block(struct dnode_of_data *dn);
int f2fs_get_block(struct dnode_of_data *dn, pgoff_t index);
int f2fs_preallocate_blocks(struct kiocb *iocb, struct iov_iter *from);
int f2fs_reserve_block(struct dnode_of_data *dn, pgoff_t index);
-struct page *get_read_data_page(struct inode *inode, pgoff_t index,
+struct page *f2fs_get_read_data_page(struct inode *inode, pgoff_t index,
int op_flags, bool for_write);
-struct page *find_data_page(struct inode *inode, pgoff_t index);
-struct page *get_lock_data_page(struct inode *inode, pgoff_t index,
+struct page *f2fs_find_data_page(struct inode *inode, pgoff_t index);
+struct page *f2fs_get_lock_data_page(struct inode *inode, pgoff_t index,
bool for_write);
-struct page *get_new_data_page(struct inode *inode,
+struct page *f2fs_get_new_data_page(struct inode *inode,
struct page *ipage, pgoff_t index, bool new_i_size);
-int do_write_data_page(struct f2fs_io_info *fio);
+int f2fs_do_write_data_page(struct f2fs_io_info *fio);
int f2fs_map_blocks(struct inode *inode, struct f2fs_map_blocks *map,
int create, int flag);
int f2fs_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len);
-void f2fs_set_page_dirty_nobuffers(struct page *page);
-int __f2fs_write_data_pages(struct address_space *mapping,
- struct writeback_control *wbc,
- enum iostat_type io_type);
+bool f2fs_should_update_inplace(struct inode *inode, struct f2fs_io_info *fio);
+bool f2fs_should_update_outplace(struct inode *inode, struct f2fs_io_info *fio);
void f2fs_invalidate_page(struct page *page, unsigned int offset,
unsigned int length);
int f2fs_release_page(struct page *page, gfp_t wait);
@@ -2632,22 +2974,24 @@
int f2fs_migrate_page(struct address_space *mapping, struct page *newpage,
struct page *page, enum migrate_mode mode);
#endif
+bool f2fs_overwrite_io(struct inode *inode, loff_t pos, size_t len);
+void f2fs_clear_radix_tree_dirty_tag(struct page *page);
/*
* gc.c
*/
-int start_gc_thread(struct f2fs_sb_info *sbi);
-void stop_gc_thread(struct f2fs_sb_info *sbi);
-block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
+int f2fs_start_gc_thread(struct f2fs_sb_info *sbi);
+void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi);
+block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode);
int f2fs_gc(struct f2fs_sb_info *sbi, bool sync, bool background,
unsigned int segno);
-void build_gc_manager(struct f2fs_sb_info *sbi);
+void f2fs_build_gc_manager(struct f2fs_sb_info *sbi);
/*
* recovery.c
*/
-int recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only);
-bool space_for_roll_forward(struct f2fs_sb_info *sbi);
+int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only);
+bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi);
/*
* debug.c
@@ -2661,14 +3005,16 @@
unsigned long long hit_largest, hit_cached, hit_rbtree;
unsigned long long hit_total, total_ext;
int ext_tree, zombie_tree, ext_node;
- int ndirty_node, ndirty_dent, ndirty_meta, ndirty_data, ndirty_imeta;
+ int ndirty_node, ndirty_dent, ndirty_meta, ndirty_imeta;
+ int ndirty_data, ndirty_qdata;
int inmem_pages;
- unsigned int ndirty_dirs, ndirty_files, ndirty_all;
+ unsigned int ndirty_dirs, ndirty_files, nquota_files, ndirty_all;
int nats, dirty_nats, sits, dirty_sits;
int free_nids, avail_nids, alloc_nids;
int total_count, utilization;
int bg_gc, nr_wb_cp_data, nr_wb_data;
- int nr_flushing, nr_flushed, nr_discarding, nr_discarded;
+ int nr_flushing, nr_flushed, flush_list_empty;
+ int nr_discarding, nr_discarded;
int nr_discard_cmd;
unsigned int undiscard_blks;
int inline_xattr, inline_inode, inline_dir, append, update, orphans;
@@ -2683,6 +3029,7 @@
int bg_node_segs, bg_data_segs;
int tot_blks, data_blks, node_blks;
int bg_data_blks, bg_node_blks;
+ unsigned long long skipped_atomic_files[2];
int curseg[NR_CURSEG_TYPE];
int cursec[NR_CURSEG_TYPE];
int curzone[NR_CURSEG_TYPE];
@@ -2849,29 +3196,31 @@
extern const struct inode_operations f2fs_symlink_inode_operations;
extern const struct inode_operations f2fs_encrypted_symlink_inode_operations;
extern const struct inode_operations f2fs_special_inode_operations;
-extern struct kmem_cache *inode_entry_slab;
+extern struct kmem_cache *f2fs_inode_entry_slab;
/*
* inline.c
*/
bool f2fs_may_inline_data(struct inode *inode);
bool f2fs_may_inline_dentry(struct inode *inode);
-void read_inline_data(struct page *page, struct page *ipage);
-void truncate_inline_inode(struct inode *inode, struct page *ipage, u64 from);
+void f2fs_do_read_inline_data(struct page *page, struct page *ipage);
+void f2fs_truncate_inline_inode(struct inode *inode,
+ struct page *ipage, u64 from);
int f2fs_read_inline_data(struct inode *inode, struct page *page);
int f2fs_convert_inline_page(struct dnode_of_data *dn, struct page *page);
int f2fs_convert_inline_inode(struct inode *inode);
int f2fs_write_inline_data(struct inode *inode, struct page *page);
-bool recover_inline_data(struct inode *inode, struct page *npage);
-struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir,
+bool f2fs_recover_inline_data(struct inode *inode, struct page *npage);
+struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir,
struct fscrypt_name *fname, struct page **res_page);
-int make_empty_inline_dir(struct inode *inode, struct inode *parent,
+int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent,
struct page *ipage);
int f2fs_add_inline_entry(struct inode *dir, const struct qstr *new_name,
const struct qstr *orig_name,
struct inode *inode, nid_t ino, umode_t mode);
-void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry, struct page *page,
- struct inode *dir, struct inode *inode);
+void f2fs_delete_inline_entry(struct f2fs_dir_entry *dentry,
+ struct page *page, struct inode *dir,
+ struct inode *inode);
bool f2fs_empty_inline_dir(struct inode *dir);
int f2fs_read_inline_dir(struct file *file, struct dir_context *ctx,
struct fscrypt_str *fstr);
@@ -2892,17 +3241,17 @@
/*
* extent_cache.c
*/
-struct rb_entry *__lookup_rb_tree(struct rb_root *root,
+struct rb_entry *f2fs_lookup_rb_tree(struct rb_root *root,
struct rb_entry *cached_re, unsigned int ofs);
-struct rb_node **__lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
+struct rb_node **f2fs_lookup_rb_tree_for_insert(struct f2fs_sb_info *sbi,
struct rb_root *root, struct rb_node **parent,
unsigned int ofs);
-struct rb_entry *__lookup_rb_tree_ret(struct rb_root *root,
+struct rb_entry *f2fs_lookup_rb_tree_ret(struct rb_root *root,
struct rb_entry *cached_re, unsigned int ofs,
struct rb_entry **prev_entry, struct rb_entry **next_entry,
struct rb_node ***insert_p, struct rb_node **insert_parent,
bool force);
-bool __check_rb_tree_consistence(struct f2fs_sb_info *sbi,
+bool f2fs_check_rb_tree_consistence(struct f2fs_sb_info *sbi,
struct rb_root *root);
unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info *sbi, int nr_shrink);
bool f2fs_init_extent_tree(struct inode *inode, struct f2fs_extent *i_ext);
@@ -2914,9 +3263,9 @@
void f2fs_update_extent_cache(struct dnode_of_data *dn);
void f2fs_update_extent_cache_range(struct dnode_of_data *dn,
pgoff_t fofs, block_t blkaddr, unsigned int len);
-void init_extent_cache_info(struct f2fs_sb_info *sbi);
-int __init create_extent_cache(void);
-void destroy_extent_cache(void);
+void f2fs_init_extent_cache_info(struct f2fs_sb_info *sbi);
+int __init f2fs_create_extent_cache(void);
+void f2fs_destroy_extent_cache(void);
/*
* sysfs.c
@@ -2943,38 +3292,34 @@
{
#ifdef CONFIG_F2FS_FS_ENCRYPTION
file_set_encrypt(inode);
+ inode->i_flags |= S_ENCRYPTED;
#endif
}
-static inline bool f2fs_bio_encrypted(struct bio *bio)
+/*
+ * Returns true if the reads of the inode's data need to undergo some
+ * postprocessing step, like decryption or authenticity verification.
+ */
+static inline bool f2fs_post_read_required(struct inode *inode)
{
- return bio->bi_private != NULL;
+ return f2fs_encrypted_file(inode);
}
-static inline int f2fs_sb_has_crypto(struct super_block *sb)
-{
- return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_ENCRYPT);
+#define F2FS_FEATURE_FUNCS(name, flagname) \
+static inline int f2fs_sb_has_##name(struct super_block *sb) \
+{ \
+ return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_##flagname); \
}
-static inline int f2fs_sb_mounted_blkzoned(struct super_block *sb)
-{
- return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_BLKZONED);
-}
-
-static inline int f2fs_sb_has_extra_attr(struct super_block *sb)
-{
- return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_EXTRA_ATTR);
-}
-
-static inline int f2fs_sb_has_project_quota(struct super_block *sb)
-{
- return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_PRJQUOTA);
-}
-
-static inline int f2fs_sb_has_inode_chksum(struct super_block *sb)
-{
- return F2FS_HAS_FEATURE(sb, F2FS_FEATURE_INODE_CHKSUM);
-}
+F2FS_FEATURE_FUNCS(encrypt, ENCRYPT);
+F2FS_FEATURE_FUNCS(blkzoned, BLKZONED);
+F2FS_FEATURE_FUNCS(extra_attr, EXTRA_ATTR);
+F2FS_FEATURE_FUNCS(project_quota, PRJQUOTA);
+F2FS_FEATURE_FUNCS(inode_chksum, INODE_CHKSUM);
+F2FS_FEATURE_FUNCS(flexible_inline_xattr, FLEXIBLE_INLINE_XATTR);
+F2FS_FEATURE_FUNCS(quota_ino, QUOTA_INO);
+F2FS_FEATURE_FUNCS(inode_crtime, INODE_CRTIME);
+F2FS_FEATURE_FUNCS(lost_found, LOST_FOUND);
#ifdef CONFIG_BLK_DEV_ZONED
static inline int get_blkz_type(struct f2fs_sb_info *sbi,
@@ -2994,7 +3339,7 @@
{
struct request_queue *q = bdev_get_queue(sbi->sb->s_bdev);
- return blk_queue_discard(q) || f2fs_sb_mounted_blkzoned(sbi->sb);
+ return blk_queue_discard(q) || f2fs_sb_has_blkzoned(sbi->sb);
}
static inline void set_opt_mode(struct f2fs_sb_info *sbi, unsigned int mt)
@@ -3023,4 +3368,11 @@
#endif
}
+static inline bool f2fs_force_buffered_io(struct inode *inode, int rw)
+{
+ return (f2fs_post_read_required(inode) ||
+ (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
+ F2FS_I_SB(inode)->s_ndevs);
+}
+
#endif
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 6f58973..55b8964 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -53,6 +53,11 @@
struct dnode_of_data dn;
int err;
+ if (unlikely(f2fs_cp_error(sbi))) {
+ err = -EIO;
+ goto err;
+ }
+
sb_start_pagefault(inode->i_sb);
f2fs_bug_on(sbi, f2fs_has_inline_data(inode));
@@ -90,7 +95,8 @@
/* page is wholly or partially inside EOF */
if (((loff_t)(page->index + 1) << PAGE_SHIFT) >
i_size_read(inode)) {
- unsigned offset;
+ loff_t offset;
+
offset = i_size_read(inode) & ~PAGE_MASK;
zero_user_segment(page, offset, PAGE_SIZE);
}
@@ -105,8 +111,8 @@
/* fill the page */
f2fs_wait_on_page_writeback(page, DATA, false);
- /* wait for GCed encrypted page writeback */
- if (f2fs_encrypted_file(inode))
+ /* wait for GCed page writeback via META_MAPPING */
+ if (f2fs_post_read_required(inode))
f2fs_wait_on_block_writeback(sbi, dn.data_blkaddr);
out_sem:
@@ -114,6 +120,7 @@
out:
sb_end_pagefault(inode->i_sb);
f2fs_update_time(sbi, REQ_TIME);
+err:
return block_page_mkwrite_return(err);
}
@@ -138,27 +145,34 @@
return 1;
}
-static inline bool need_do_checkpoint(struct inode *inode)
+static inline enum cp_reason_type need_do_checkpoint(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- bool need_cp = false;
+ enum cp_reason_type cp_reason = CP_NO_NEEDED;
- if (!S_ISREG(inode->i_mode) || inode->i_nlink != 1)
- need_cp = true;
+ if (!S_ISREG(inode->i_mode))
+ cp_reason = CP_NON_REGULAR;
+ else if (inode->i_nlink != 1)
+ cp_reason = CP_HARDLINK;
else if (is_sbi_flag_set(sbi, SBI_NEED_CP))
- need_cp = true;
+ cp_reason = CP_SB_NEED_CP;
else if (file_wrong_pino(inode))
- need_cp = true;
- else if (!space_for_roll_forward(sbi))
- need_cp = true;
- else if (!is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
- need_cp = true;
+ cp_reason = CP_WRONG_PINO;
+ else if (!f2fs_space_for_roll_forward(sbi))
+ cp_reason = CP_NO_SPC_ROLL;
+ else if (!f2fs_is_checkpointed_node(sbi, F2FS_I(inode)->i_pino))
+ cp_reason = CP_NODE_NEED_CP;
else if (test_opt(sbi, FASTBOOT))
- need_cp = true;
- else if (sbi->active_logs == 2)
- need_cp = true;
+ cp_reason = CP_FASTBOOT_MODE;
+ else if (F2FS_OPTION(sbi).active_logs == 2)
+ cp_reason = CP_SPEC_LOG_NUM;
+ else if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT &&
+ f2fs_need_dentry_mark(sbi, inode->i_ino) &&
+ f2fs_exist_written_data(sbi, F2FS_I(inode)->i_pino,
+ TRANS_DIR_INO))
+ cp_reason = CP_RECOVER_DIR;
- return need_cp;
+ return cp_reason;
}
static bool need_inode_page_update(struct f2fs_sb_info *sbi, nid_t ino)
@@ -166,7 +180,7 @@
struct page *i = find_get_page(NODE_MAPPING(sbi), ino);
bool ret = false;
/* But we need to avoid that there are some inode updates */
- if ((i && PageDirty(i)) || need_inode_block_update(sbi, ino))
+ if ((i && PageDirty(i)) || f2fs_need_inode_block_update(sbi, ino))
ret = true;
f2fs_put_page(i, 0);
return ret;
@@ -193,7 +207,7 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
nid_t ino = inode->i_ino;
int ret = 0;
- bool need_cp = false;
+ enum cp_reason_type cp_reason = 0;
struct writeback_control wbc = {
.sync_mode = WB_SYNC_ALL,
.nr_to_write = LONG_MAX,
@@ -212,7 +226,7 @@
clear_inode_flag(inode, FI_NEED_IPU);
if (ret) {
- trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
+ trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret);
return ret;
}
@@ -226,14 +240,14 @@
* if there is no written data, don't waste time to write recovery info.
*/
if (!is_inode_flag_set(inode, FI_APPEND_WRITE) &&
- !exist_written_data(sbi, ino, APPEND_INO)) {
+ !f2fs_exist_written_data(sbi, ino, APPEND_INO)) {
/* it may call write_inode just prior to fsync */
if (need_inode_page_update(sbi, ino))
goto go_write;
if (is_inode_flag_set(inode, FI_UPDATE_WRITE) ||
- exist_written_data(sbi, ino, UPDATE_INO))
+ f2fs_exist_written_data(sbi, ino, UPDATE_INO))
goto flush_out;
goto out;
}
@@ -243,10 +257,10 @@
* sudden-power-off.
*/
down_read(&F2FS_I(inode)->i_sem);
- need_cp = need_do_checkpoint(inode);
+ cp_reason = need_do_checkpoint(inode);
up_read(&F2FS_I(inode)->i_sem);
- if (need_cp) {
+ if (cp_reason) {
/* all the dirty node pages should be flushed for POR */
ret = f2fs_sync_fs(inode->i_sb, 1);
@@ -260,7 +274,9 @@
goto out;
}
sync_nodes:
- ret = fsync_node_pages(sbi, inode, &wbc, atomic);
+ atomic_inc(&sbi->wb_sync_req[NODE]);
+ ret = f2fs_fsync_node_pages(sbi, inode, &wbc, atomic);
+ atomic_dec(&sbi->wb_sync_req[NODE]);
if (ret)
goto out;
@@ -270,7 +286,7 @@
goto out;
}
- if (need_inode_block_update(sbi, ino)) {
+ if (f2fs_need_inode_block_update(sbi, ino)) {
f2fs_mark_inode_dirty_sync(inode, true);
f2fs_write_inode(inode, NULL);
goto sync_nodes;
@@ -285,46 +301,52 @@
* given fsync mark.
*/
if (!atomic) {
- ret = wait_on_node_pages_writeback(sbi, ino);
+ ret = f2fs_wait_on_node_pages_writeback(sbi, ino);
if (ret)
goto out;
}
/* once recovery info is written, don't need to tack this */
- remove_ino_entry(sbi, ino, APPEND_INO);
+ f2fs_remove_ino_entry(sbi, ino, APPEND_INO);
clear_inode_flag(inode, FI_APPEND_WRITE);
flush_out:
- remove_ino_entry(sbi, ino, UPDATE_INO);
- clear_inode_flag(inode, FI_UPDATE_WRITE);
- if (!atomic)
- ret = f2fs_issue_flush(sbi);
+ if (!atomic && F2FS_OPTION(sbi).fsync_mode != FSYNC_MODE_NOBARRIER)
+ ret = f2fs_issue_flush(sbi, inode->i_ino);
+ if (!ret) {
+ f2fs_remove_ino_entry(sbi, ino, UPDATE_INO);
+ clear_inode_flag(inode, FI_UPDATE_WRITE);
+ f2fs_remove_ino_entry(sbi, ino, FLUSH_INO);
+ }
f2fs_update_time(sbi, REQ_TIME);
out:
- trace_f2fs_sync_file_exit(inode, need_cp, datasync, ret);
+ trace_f2fs_sync_file_exit(inode, cp_reason, datasync, ret);
f2fs_trace_ios(NULL, 1);
return ret;
}
int f2fs_sync_file(struct file *file, loff_t start, loff_t end, int datasync)
{
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(file)))))
+ return -EIO;
return f2fs_do_sync_file(file, start, end, datasync, false);
}
static pgoff_t __get_first_dirty_index(struct address_space *mapping,
pgoff_t pgofs, int whence)
{
- struct pagevec pvec;
+ struct page *page;
int nr_pages;
if (whence != SEEK_DATA)
return 0;
/* find first dirty page index */
- pagevec_init(&pvec, 0);
- nr_pages = pagevec_lookup_tag(&pvec, mapping, &pgofs,
- PAGECACHE_TAG_DIRTY, 1);
- pgofs = nr_pages ? pvec.pages[0]->index : ULONG_MAX;
- pagevec_release(&pvec);
+ nr_pages = find_get_pages_tag(mapping, &pgofs, PAGECACHE_TAG_DIRTY,
+ 1, &page);
+ if (!nr_pages)
+ return ULONG_MAX;
+ pgofs = page->index;
+ put_page(page);
return pgofs;
}
@@ -334,7 +356,7 @@
switch (whence) {
case SEEK_DATA:
if ((blkaddr == NEW_ADDR && dirty == pgofs) ||
- (blkaddr != NEW_ADDR && blkaddr != NULL_ADDR))
+ is_valid_blkaddr(blkaddr))
return true;
break;
case SEEK_HOLE:
@@ -374,13 +396,13 @@
for (; data_ofs < isize; data_ofs = (loff_t)pgofs << PAGE_SHIFT) {
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, pgofs, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, pgofs, LOOKUP_NODE);
if (err && err != -ENOENT) {
goto fail;
} else if (err == -ENOENT) {
/* direct node does not exists */
if (whence == SEEK_DATA) {
- pgofs = get_next_page_offset(&dn, pgofs);
+ pgofs = f2fs_get_next_page_offset(&dn, pgofs);
continue;
} else {
goto found;
@@ -394,6 +416,7 @@
dn.ofs_in_node++, pgofs++,
data_ofs = (loff_t)pgofs << PAGE_SHIFT) {
block_t blkaddr;
+
blkaddr = datablock_addr(dn.inode,
dn.node_page, dn.ofs_in_node);
@@ -443,6 +466,9 @@
struct inode *inode = file_inode(file);
int err;
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
+ return -EIO;
+
/* we don't need to use inline_data strictly */
err = f2fs_convert_inline_inode(inode);
if (err)
@@ -455,26 +481,17 @@
static int f2fs_file_open(struct inode *inode, struct file *filp)
{
- struct dentry *dir;
+ int err = fscrypt_file_open(inode, filp);
- if (f2fs_encrypted_inode(inode)) {
- int ret = fscrypt_get_encryption_info(inode);
- if (ret)
- return -EACCES;
- if (!fscrypt_has_encryption_key(inode))
- return -ENOKEY;
- }
- dir = dget_parent(file_dentry(filp));
- if (f2fs_encrypted_inode(d_inode(dir)) &&
- !fscrypt_has_permitted_context(d_inode(dir), inode)) {
- dput(dir);
- return -EPERM;
- }
- dput(dir);
+ if (err)
+ return err;
+
+ filp->f_mode |= FMODE_NOWAIT;
+
return dquot_file_open(inode, filp);
}
-int truncate_data_blocks_range(struct dnode_of_data *dn, int count)
+void f2fs_truncate_data_blocks_range(struct dnode_of_data *dn, int count)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct f2fs_node *raw_node;
@@ -490,12 +507,13 @@
for (; count > 0; count--, addr++, dn->ofs_in_node++) {
block_t blkaddr = le32_to_cpu(*addr);
+
if (blkaddr == NULL_ADDR)
continue;
dn->data_blkaddr = NULL_ADDR;
- set_data_blkaddr(dn);
- invalidate_blocks(sbi, blkaddr);
+ f2fs_set_data_blkaddr(dn);
+ f2fs_invalidate_blocks(sbi, blkaddr);
if (dn->ofs_in_node == 0 && IS_INODE(dn->node_page))
clear_inode_flag(dn->inode, FI_FIRST_BLOCK_WRITTEN);
nr_free++;
@@ -507,7 +525,7 @@
* once we invalidate valid blkaddr in range [ofs, ofs + count],
* we will invalidate all blkaddr in the whole range.
*/
- fofs = start_bidx_of_node(ofs_of_node(dn->node_page),
+ fofs = f2fs_start_bidx_of_node(ofs_of_node(dn->node_page),
dn->inode) + ofs;
f2fs_update_extent_cache_range(dn, fofs, 0, len);
dec_valid_block_count(sbi, dn->inode, nr_free);
@@ -517,18 +535,17 @@
f2fs_update_time(sbi, REQ_TIME);
trace_f2fs_truncate_data_blocks_range(dn->inode, dn->nid,
dn->ofs_in_node, nr_free);
- return nr_free;
}
-void truncate_data_blocks(struct dnode_of_data *dn)
+void f2fs_truncate_data_blocks(struct dnode_of_data *dn)
{
- truncate_data_blocks_range(dn, ADDRS_PER_BLOCK);
+ f2fs_truncate_data_blocks_range(dn, ADDRS_PER_BLOCK);
}
static int truncate_partial_data_page(struct inode *inode, u64 from,
bool cache_only)
{
- unsigned offset = from & (PAGE_SIZE - 1);
+ loff_t offset = from & (PAGE_SIZE - 1);
pgoff_t index = from >> PAGE_SHIFT;
struct address_space *mapping = inode->i_mapping;
struct page *page;
@@ -544,7 +561,7 @@
return 0;
}
- page = get_lock_data_page(inode, index, true);
+ page = f2fs_get_lock_data_page(inode, index, true);
if (IS_ERR(page))
return PTR_ERR(page) == -ENOENT ? 0 : PTR_ERR(page);
truncate_out:
@@ -559,10 +576,9 @@
return 0;
}
-int truncate_blocks(struct inode *inode, u64 from, bool lock)
+int f2fs_truncate_blocks(struct inode *inode, u64 from, bool lock)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- unsigned int blocksize = inode->i_sb->s_blocksize;
struct dnode_of_data dn;
pgoff_t free_from;
int count = 0, err = 0;
@@ -571,7 +587,7 @@
trace_f2fs_truncate_blocks_enter(inode, from);
- free_from = (pgoff_t)F2FS_BYTES_TO_BLK(from + blocksize - 1);
+ free_from = (pgoff_t)F2FS_BLK_ALIGN(from);
if (free_from >= sbi->max_file_blocks)
goto free_partial;
@@ -579,21 +595,21 @@
if (lock)
f2fs_lock_op(sbi);
- ipage = get_node_page(sbi, inode->i_ino);
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(ipage)) {
err = PTR_ERR(ipage);
goto out;
}
if (f2fs_has_inline_data(inode)) {
- truncate_inline_inode(inode, ipage, from);
+ f2fs_truncate_inline_inode(inode, ipage, from);
f2fs_put_page(ipage, 1);
truncate_page = true;
goto out;
}
set_new_dnode(&dn, inode, ipage, NULL, 0);
- err = get_dnode_of_data(&dn, free_from, LOOKUP_NODE_RA);
+ err = f2fs_get_dnode_of_data(&dn, free_from, LOOKUP_NODE_RA);
if (err) {
if (err == -ENOENT)
goto free_next;
@@ -606,13 +622,13 @@
f2fs_bug_on(sbi, count < 0);
if (dn.ofs_in_node || IS_INODE(dn.node_page)) {
- truncate_data_blocks_range(&dn, count);
+ f2fs_truncate_data_blocks_range(&dn, count);
free_from += count;
}
f2fs_put_dnode(&dn);
free_next:
- err = truncate_inode_blocks(inode, free_from);
+ err = f2fs_truncate_inode_blocks(inode, free_from);
out:
if (lock)
f2fs_unlock_op(sbi);
@@ -629,6 +645,9 @@
{
int err;
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
+ return -EIO;
+
if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
S_ISLNK(inode->i_mode)))
return 0;
@@ -648,7 +667,7 @@
return err;
}
- err = truncate_blocks(inode, i_size_read(inode), true);
+ err = f2fs_truncate_blocks(inode, i_size_read(inode), true);
if (err)
return err;
@@ -662,18 +681,27 @@
{
struct inode *inode = d_inode(path->dentry);
struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_inode *ri;
unsigned int flags;
- flags = fi->i_flags & (FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL);
- if (flags & FS_APPEND_FL)
+ if (f2fs_has_extra_attr(inode) &&
+ f2fs_sb_has_inode_crtime(inode->i_sb) &&
+ F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) {
+ stat->result_mask |= STATX_BTIME;
+ stat->btime.tv_sec = fi->i_crtime.tv_sec;
+ stat->btime.tv_nsec = fi->i_crtime.tv_nsec;
+ }
+
+ flags = fi->i_flags & F2FS_FL_USER_VISIBLE;
+ if (flags & F2FS_APPEND_FL)
stat->attributes |= STATX_ATTR_APPEND;
- if (flags & FS_COMPR_FL)
+ if (flags & F2FS_COMPR_FL)
stat->attributes |= STATX_ATTR_COMPRESSED;
if (f2fs_encrypted_inode(inode))
stat->attributes |= STATX_ATTR_ENCRYPTED;
- if (flags & FS_IMMUTABLE_FL)
+ if (flags & F2FS_IMMUTABLE_FL)
stat->attributes |= STATX_ATTR_IMMUTABLE;
- if (flags & FS_NODUMP_FL)
+ if (flags & F2FS_NODUMP_FL)
stat->attributes |= STATX_ATTR_NODUMP;
stat->attributes_mask |= (STATX_ATTR_APPEND |
@@ -728,10 +756,17 @@
int err;
bool size_changed = false;
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
+ return -EIO;
+
err = setattr_prepare(dentry, attr);
if (err)
return err;
+ err = fscrypt_prepare_setattr(dentry, attr);
+ if (err)
+ return err;
+
if (is_quota_modification(inode, attr)) {
err = dquot_initialize(inode);
if (err)
@@ -747,14 +782,6 @@
}
if (attr->ia_valid & ATTR_SIZE) {
- if (f2fs_encrypted_inode(inode)) {
- err = fscrypt_get_encryption_info(inode);
- if (err)
- return err;
- if (!fscrypt_has_encryption_key(inode))
- return -ENOKEY;
- }
-
if (attr->ia_size <= i_size_read(inode)) {
down_write(&F2FS_I(inode)->i_mmap_sem);
truncate_setsize(inode, attr->ia_size);
@@ -780,13 +807,17 @@
inode->i_mtime = inode->i_ctime = current_time(inode);
}
+ down_write(&F2FS_I(inode)->i_sem);
+ F2FS_I(inode)->last_disk_size = i_size_read(inode);
+ up_write(&F2FS_I(inode)->i_sem);
+
size_changed = true;
}
__setattr_copy(inode, attr);
if (attr->ia_valid & ATTR_MODE) {
- err = posix_acl_chmod(inode, get_inode_mode(inode));
+ err = posix_acl_chmod(inode, f2fs_get_inode_mode(inode));
if (err || is_inode_flag_set(inode, FI_ACL_MODE)) {
inode->i_mode = F2FS_I(inode)->i_acl_mode;
clear_inode_flag(inode, FI_ACL_MODE);
@@ -825,7 +856,7 @@
f2fs_balance_fs(sbi, true);
f2fs_lock_op(sbi);
- page = get_new_data_page(inode, NULL, index, false);
+ page = f2fs_get_new_data_page(inode, NULL, index, false);
f2fs_unlock_op(sbi);
if (IS_ERR(page))
@@ -838,7 +869,7 @@
return 0;
}
-int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end)
+int f2fs_truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end)
{
int err;
@@ -847,10 +878,11 @@
pgoff_t end_offset, count;
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, pg_start, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, pg_start, LOOKUP_NODE);
if (err) {
if (err == -ENOENT) {
- pg_start++;
+ pg_start = f2fs_get_next_page_offset(&dn,
+ pg_start);
continue;
}
return err;
@@ -861,7 +893,7 @@
f2fs_bug_on(F2FS_I_SB(inode), count == 0 || count > end_offset);
- truncate_data_blocks_range(&dn, count);
+ f2fs_truncate_data_blocks_range(&dn, count);
f2fs_put_dnode(&dn);
pg_start += count;
@@ -917,7 +949,7 @@
blk_end - 1);
f2fs_lock_op(sbi);
- ret = truncate_hole(inode, pg_start, pg_end);
+ ret = f2fs_truncate_hole(inode, pg_start, pg_end);
f2fs_unlock_op(sbi);
up_write(&F2FS_I(inode)->i_mmap_sem);
}
@@ -935,7 +967,7 @@
next_dnode:
set_new_dnode(&dn, inode, NULL, NULL, 0);
- ret = get_dnode_of_data(&dn, off, LOOKUP_NODE_RA);
+ ret = f2fs_get_dnode_of_data(&dn, off, LOOKUP_NODE_RA);
if (ret && ret != -ENOENT) {
return ret;
} else if (ret == -ENOENT) {
@@ -952,7 +984,7 @@
for (i = 0; i < done; i++, blkaddr++, do_replace++, dn.ofs_in_node++) {
*blkaddr = datablock_addr(dn.inode,
dn.node_page, dn.ofs_in_node);
- if (!is_checkpointed_data(sbi, *blkaddr)) {
+ if (!f2fs_is_checkpointed_data(sbi, *blkaddr)) {
if (test_opt(sbi, LFS)) {
f2fs_put_dnode(&dn);
@@ -985,10 +1017,10 @@
continue;
set_new_dnode(&dn, inode, NULL, NULL, 0);
- ret = get_dnode_of_data(&dn, off + i, LOOKUP_NODE_RA);
+ ret = f2fs_get_dnode_of_data(&dn, off + i, LOOKUP_NODE_RA);
if (ret) {
dec_valid_block_count(sbi, inode, 1);
- invalidate_blocks(sbi, *blkaddr);
+ f2fs_invalidate_blocks(sbi, *blkaddr);
} else {
f2fs_update_data_blkaddr(&dn, *blkaddr);
}
@@ -1018,18 +1050,18 @@
pgoff_t ilen;
set_new_dnode(&dn, dst_inode, NULL, NULL, 0);
- ret = get_dnode_of_data(&dn, dst + i, ALLOC_NODE);
+ ret = f2fs_get_dnode_of_data(&dn, dst + i, ALLOC_NODE);
if (ret)
return ret;
- get_node_info(sbi, dn.nid, &ni);
+ f2fs_get_node_info(sbi, dn.nid, &ni);
ilen = min((pgoff_t)
ADDRS_PER_PAGE(dn.node_page, dst_inode) -
dn.ofs_in_node, len - i);
do {
dn.data_blkaddr = datablock_addr(dn.inode,
dn.node_page, dn.ofs_in_node);
- truncate_data_blocks_range(&dn, 1);
+ f2fs_truncate_data_blocks_range(&dn, 1);
if (do_replace[i]) {
f2fs_i_blocks_write(src_inode,
@@ -1052,10 +1084,11 @@
} else {
struct page *psrc, *pdst;
- psrc = get_lock_data_page(src_inode, src + i, true);
+ psrc = f2fs_get_lock_data_page(src_inode,
+ src + i, true);
if (IS_ERR(psrc))
return PTR_ERR(psrc);
- pdst = get_new_data_page(dst_inode, NULL, dst + i,
+ pdst = f2fs_get_new_data_page(dst_inode, NULL, dst + i,
true);
if (IS_ERR(pdst)) {
f2fs_put_page(psrc, 1);
@@ -1066,7 +1099,8 @@
f2fs_put_page(pdst, 1);
f2fs_put_page(psrc, 1);
- ret = truncate_hole(src_inode, src + i, src + i + 1);
+ ret = f2fs_truncate_hole(src_inode,
+ src + i, src + i + 1);
if (ret)
return ret;
i++;
@@ -1087,11 +1121,15 @@
while (len) {
olen = min((pgoff_t)4 * ADDRS_PER_BLOCK, len);
- src_blkaddr = kvzalloc(sizeof(block_t) * olen, GFP_KERNEL);
+ src_blkaddr = f2fs_kvzalloc(F2FS_I_SB(src_inode),
+ array_size(olen, sizeof(block_t)),
+ GFP_KERNEL);
if (!src_blkaddr)
return -ENOMEM;
- do_replace = kvzalloc(sizeof(int) * olen, GFP_KERNEL);
+ do_replace = f2fs_kvzalloc(F2FS_I_SB(src_inode),
+ array_size(olen, sizeof(int)),
+ GFP_KERNEL);
if (!do_replace) {
kvfree(src_blkaddr);
return -ENOMEM;
@@ -1117,7 +1155,7 @@
return 0;
roll_back:
- __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, len);
+ __roll_back_blkaddrs(src_inode, src_blkaddr, do_replace, src, olen);
kvfree(src_blkaddr);
kvfree(do_replace);
return ret;
@@ -1159,17 +1197,20 @@
pg_start = offset >> PAGE_SHIFT;
pg_end = (offset + len) >> PAGE_SHIFT;
+ /* avoid gc operation during block exchange */
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+
down_write(&F2FS_I(inode)->i_mmap_sem);
/* write out all dirty pages from offset */
ret = filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX);
if (ret)
- goto out;
+ goto out_unlock;
truncate_pagecache(inode, offset);
ret = f2fs_do_collapse(inode, pg_start, pg_end);
if (ret)
- goto out;
+ goto out_unlock;
/* write out all moved pages, if possible */
filemap_write_and_wait_range(inode->i_mapping, offset, LLONG_MAX);
@@ -1178,12 +1219,12 @@
new_size = i_size_read(inode) - len;
truncate_pagecache(inode, new_size);
- ret = truncate_blocks(inode, new_size, true);
+ ret = f2fs_truncate_blocks(inode, new_size, true);
if (!ret)
f2fs_i_size_write(inode, new_size);
-
-out:
+out_unlock:
up_write(&F2FS_I(inode)->i_mmap_sem);
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
return ret;
}
@@ -1203,7 +1244,7 @@
}
dn->ofs_in_node = ofs_in_node;
- ret = reserve_new_blocks(dn, count);
+ ret = f2fs_reserve_new_blocks(dn, count);
if (ret)
return ret;
@@ -1212,7 +1253,7 @@
dn->data_blkaddr = datablock_addr(dn->inode,
dn->node_page, dn->ofs_in_node);
/*
- * reserve_new_blocks will not guarantee entire block
+ * f2fs_reserve_new_blocks will not guarantee entire block
* allocation.
*/
if (dn->data_blkaddr == NULL_ADDR) {
@@ -1220,9 +1261,9 @@
break;
}
if (dn->data_blkaddr != NEW_ADDR) {
- invalidate_blocks(sbi, dn->data_blkaddr);
+ f2fs_invalidate_blocks(sbi, dn->data_blkaddr);
dn->data_blkaddr = NEW_ADDR;
- set_data_blkaddr(dn);
+ f2fs_set_data_blkaddr(dn);
}
}
@@ -1288,7 +1329,7 @@
f2fs_lock_op(sbi);
set_new_dnode(&dn, inode, NULL, NULL, 0);
- ret = get_dnode_of_data(&dn, index, ALLOC_NODE);
+ ret = f2fs_get_dnode_of_data(&dn, index, ALLOC_NODE);
if (ret) {
f2fs_unlock_op(sbi);
goto out;
@@ -1358,8 +1399,11 @@
f2fs_balance_fs(sbi, true);
+ /* avoid gc operation during block exchange */
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+
down_write(&F2FS_I(inode)->i_mmap_sem);
- ret = truncate_blocks(inode, i_size_read(inode), true);
+ ret = f2fs_truncate_blocks(inode, i_size_read(inode), true);
if (ret)
goto out;
@@ -1397,6 +1441,7 @@
f2fs_i_size_write(inode, new_size);
out:
up_write(&F2FS_I(inode)->i_mmap_sem);
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
return ret;
}
@@ -1404,7 +1449,8 @@
loff_t len, int mode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct f2fs_map_blocks map = { .m_next_pgofs = NULL };
+ struct f2fs_map_blocks map = { .m_next_pgofs = NULL,
+ .m_next_extent = NULL, .m_seg_type = NO_CHECK_TYPE };
pgoff_t pg_end;
loff_t new_size = i_size_read(inode);
loff_t off_end;
@@ -1438,14 +1484,18 @@
last_off = map.m_lblk + map.m_len - 1;
/* update new size to the failed position */
- new_size = (last_off == pg_end) ? offset + len:
+ new_size = (last_off == pg_end) ? offset + len :
(loff_t)(last_off + 1) << PAGE_SHIFT;
} else {
new_size = ((loff_t)pg_end << PAGE_SHIFT) + off_end;
}
- if (!(mode & FALLOC_FL_KEEP_SIZE) && i_size_read(inode) < new_size)
- f2fs_i_size_write(inode, new_size);
+ if (new_size > i_size_read(inode)) {
+ if (mode & FALLOC_FL_KEEP_SIZE)
+ file_set_keep_isize(inode);
+ else
+ f2fs_i_size_write(inode, new_size);
+ }
return err;
}
@@ -1456,6 +1506,9 @@
struct inode *inode = file_inode(file);
long ret = 0;
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
+ return -EIO;
+
/* f2fs only support ->fallocate for regular file */
if (!S_ISREG(inode->i_mode))
return -EINVAL;
@@ -1489,8 +1542,6 @@
if (!ret) {
inode->i_mtime = inode->i_ctime = current_time(inode);
f2fs_mark_inode_dirty_sync(inode, false);
- if (mode & FALLOC_FL_KEEP_SIZE)
- file_set_keep_isize(inode);
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
}
@@ -1513,13 +1564,13 @@
/* some remained atomic pages should discarded */
if (f2fs_is_atomic_file(inode))
- drop_inmem_pages(inode);
+ f2fs_drop_inmem_pages(inode);
if (f2fs_is_volatile_file(inode)) {
- clear_inode_flag(inode, FI_VOLATILE_FILE);
- stat_dec_volatile_write(inode);
set_inode_flag(inode, FI_DROP_CACHE);
filemap_fdatawrite(inode->i_mapping);
clear_inode_flag(inode, FI_DROP_CACHE);
+ clear_inode_flag(inode, FI_VOLATILE_FILE);
+ stat_dec_volatile_write(inode);
}
return 0;
}
@@ -1536,7 +1587,7 @@
*/
if (f2fs_is_atomic_file(inode) &&
F2FS_I(inode)->inmem_task == current)
- drop_inmem_pages(inode);
+ f2fs_drop_inmem_pages(inode);
return 0;
}
@@ -1544,8 +1595,15 @@
{
struct inode *inode = file_inode(filp);
struct f2fs_inode_info *fi = F2FS_I(inode);
- unsigned int flags = fi->i_flags &
- (FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL);
+ unsigned int flags = fi->i_flags;
+
+ if (file_is_encrypt(inode))
+ flags |= F2FS_ENCRYPT_FL;
+ if (f2fs_has_inline_data(inode) || f2fs_has_inline_dentry(inode))
+ flags |= F2FS_INLINE_DATA_FL;
+
+ flags &= F2FS_FL_USER_VISIBLE;
+
return put_user(flags, (int __user *)arg);
}
@@ -1562,15 +1620,15 @@
oldflags = fi->i_flags;
- if ((flags ^ oldflags) & (FS_APPEND_FL | FS_IMMUTABLE_FL))
+ if ((flags ^ oldflags) & (F2FS_APPEND_FL | F2FS_IMMUTABLE_FL))
if (!capable(CAP_LINUX_IMMUTABLE))
return -EPERM;
- flags = flags & (FS_FL_USER_MODIFIABLE | FS_PROJINHERIT_FL);
- flags |= oldflags & ~(FS_FL_USER_MODIFIABLE | FS_PROJINHERIT_FL);
+ flags = flags & F2FS_FL_USER_MODIFIABLE;
+ flags |= oldflags & ~F2FS_FL_USER_MODIFIABLE;
fi->i_flags = flags;
- if (fi->i_flags & FS_PROJINHERIT_FL)
+ if (fi->i_flags & F2FS_PROJINHERIT_FL)
set_inode_flag(inode, FI_PROJ_INHERIT);
else
clear_inode_flag(inode, FI_PROJ_INHERIT);
@@ -1630,7 +1688,7 @@
inode_lock(inode);
- down_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
if (f2fs_is_atomic_file(inode))
goto out;
@@ -1639,29 +1697,25 @@
if (ret)
goto out;
- set_inode_flag(inode, FI_ATOMIC_FILE);
- set_inode_flag(inode, FI_HOT_DATA);
- f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
-
if (!get_dirty_pages(inode))
- goto inc_stat;
+ goto skip_flush;
f2fs_msg(F2FS_I_SB(inode)->sb, KERN_WARNING,
"Unexpected flush for atomic writes: ino=%lu, npages=%u",
inode->i_ino, get_dirty_pages(inode));
ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
- if (ret) {
- clear_inode_flag(inode, FI_ATOMIC_FILE);
- clear_inode_flag(inode, FI_HOT_DATA);
+ if (ret)
goto out;
- }
+skip_flush:
+ set_inode_flag(inode, FI_ATOMIC_FILE);
+ clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+ f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
-inc_stat:
F2FS_I(inode)->inmem_task = current;
stat_inc_atomic_write(inode);
stat_update_max_atomic_write(inode);
out:
- up_write(&F2FS_I(inode)->dio_rwsem[WRITE]);
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
inode_unlock(inode);
mnt_drop_write_file(filp);
return ret;
@@ -1681,24 +1735,33 @@
inode_lock(inode);
- if (f2fs_is_volatile_file(inode))
+ down_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
+
+ if (f2fs_is_volatile_file(inode)) {
+ ret = -EINVAL;
goto err_out;
+ }
if (f2fs_is_atomic_file(inode)) {
- ret = commit_inmem_pages(inode);
+ ret = f2fs_commit_inmem_pages(inode);
if (ret)
goto err_out;
ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 0, true);
if (!ret) {
clear_inode_flag(inode, FI_ATOMIC_FILE);
- clear_inode_flag(inode, FI_HOT_DATA);
+ F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
stat_dec_atomic_write(inode);
}
} else {
ret = f2fs_do_sync_file(filp, 0, LLONG_MAX, 1, false);
}
err_out:
+ if (is_inode_flag_set(inode, FI_ATOMIC_REVOKE_REQUEST)) {
+ clear_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+ ret = -EINVAL;
+ }
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
inode_unlock(inode);
mnt_drop_write_file(filp);
return ret;
@@ -1783,7 +1846,7 @@
inode_lock(inode);
if (f2fs_is_atomic_file(inode))
- drop_inmem_pages(inode);
+ f2fs_drop_inmem_pages(inode);
if (f2fs_is_volatile_file(inode)) {
clear_inode_flag(inode, FI_VOLATILE_FILE);
stat_dec_volatile_write(inode);
@@ -1820,27 +1883,40 @@
switch (in) {
case F2FS_GOING_DOWN_FULLSYNC:
sb = freeze_bdev(sb->s_bdev);
- if (sb && !IS_ERR(sb)) {
+ if (IS_ERR(sb)) {
+ ret = PTR_ERR(sb);
+ goto out;
+ }
+ if (sb) {
f2fs_stop_checkpoint(sbi, false);
thaw_bdev(sb->s_bdev, sb);
}
break;
case F2FS_GOING_DOWN_METASYNC:
/* do checkpoint only */
- f2fs_sync_fs(sb, 1);
+ ret = f2fs_sync_fs(sb, 1);
+ if (ret)
+ goto out;
f2fs_stop_checkpoint(sbi, false);
break;
case F2FS_GOING_DOWN_NOSYNC:
f2fs_stop_checkpoint(sbi, false);
break;
case F2FS_GOING_DOWN_METAFLUSH:
- sync_meta_pages(sbi, META, LONG_MAX, FS_META_IO);
+ f2fs_sync_meta_pages(sbi, META, LONG_MAX, FS_META_IO);
f2fs_stop_checkpoint(sbi, false);
break;
default:
ret = -EINVAL;
goto out;
}
+
+ f2fs_stop_gc_thread(sbi);
+ f2fs_stop_discard_thread(sbi);
+
+ f2fs_drop_discard_cmd(sbi);
+ clear_opt(sbi, DISCARD);
+
f2fs_update_time(sbi, REQ_TIME);
out:
if (in != F2FS_GOING_DOWN_FULLSYNC)
@@ -1898,6 +1974,9 @@
{
struct inode *inode = file_inode(filp);
+ if (!f2fs_sb_has_encrypt(inode->i_sb))
+ return -EOPNOTSUPP;
+
f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
return fscrypt_ioctl_set_policy(filp, (const void __user *)arg);
@@ -1905,6 +1984,8 @@
static int f2fs_ioc_get_encryption_policy(struct file *filp, unsigned long arg)
{
+ if (!f2fs_sb_has_encrypt(file_inode(filp)->i_sb))
+ return -EOPNOTSUPP;
return fscrypt_ioctl_get_policy(filp, (void __user *)arg);
}
@@ -1914,16 +1995,18 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int err;
- if (!f2fs_sb_has_crypto(inode->i_sb))
+ if (!f2fs_sb_has_encrypt(inode->i_sb))
return -EOPNOTSUPP;
- if (uuid_is_nonzero(sbi->raw_super->encrypt_pw_salt))
- goto got_it;
-
err = mnt_want_write_file(filp);
if (err)
return err;
+ down_write(&sbi->sb_lock);
+
+ if (uuid_is_nonzero(sbi->raw_super->encrypt_pw_salt))
+ goto got_it;
+
/* update superblock with uuid */
generate_random_uuid(sbi->raw_super->encrypt_pw_salt);
@@ -1931,15 +2014,16 @@
if (err) {
/* undo new data */
memset(sbi->raw_super->encrypt_pw_salt, 0, 16);
- mnt_drop_write_file(filp);
- return err;
+ goto out_err;
}
- mnt_drop_write_file(filp);
got_it:
if (copy_to_user((__u8 __user *)arg, sbi->raw_super->encrypt_pw_salt,
16))
- return -EFAULT;
- return 0;
+ err = -EFAULT;
+out_err:
+ up_write(&sbi->sb_lock);
+ mnt_drop_write_file(filp);
+ return err;
}
static int f2fs_ioc_gc(struct file *filp, unsigned long arg)
@@ -1995,13 +2079,15 @@
if (f2fs_readonly(sbi->sb))
return -EROFS;
+ end = range.start + range.len;
+ if (range.start < MAIN_BLKADDR(sbi) || end >= MAX_BLKADDR(sbi)) {
+ return -EINVAL;
+ }
+
ret = mnt_want_write_file(filp);
if (ret)
return ret;
- end = range.start + range.len;
- if (range.start < MAIN_BLKADDR(sbi) || end >= MAX_BLKADDR(sbi))
- return -EINVAL;
do_more:
if (!range.sync) {
if (!mutex_trylock(&sbi->gc_mutex)) {
@@ -2021,7 +2107,7 @@
return ret;
}
-static int f2fs_ioc_write_checkpoint(struct file *filp, unsigned long arg)
+static int f2fs_ioc_f2fs_write_checkpoint(struct file *filp, unsigned long arg)
{
struct inode *inode = file_inode(filp);
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -2048,9 +2134,10 @@
struct f2fs_defragment *range)
{
struct inode *inode = file_inode(filp);
- struct f2fs_map_blocks map = { .m_next_pgofs = NULL };
- struct extent_info ei = {0,0,0};
- pgoff_t pg_start, pg_end;
+ struct f2fs_map_blocks map = { .m_next_extent = NULL,
+ .m_seg_type = NO_CHECK_TYPE };
+ struct extent_info ei = {0, 0, 0};
+ pgoff_t pg_start, pg_end, next_pgofs;
unsigned int blk_per_seg = sbi->blocks_per_seg;
unsigned int total = 0, sec_num;
block_t blk_end = 0;
@@ -2058,7 +2145,7 @@
int err;
/* if in-place-update policy is enabled, don't waste time here */
- if (need_inplace_update_policy(inode, NULL))
+ if (f2fs_should_update_inplace(inode, NULL))
return -EINVAL;
pg_start = range->start >> PAGE_SHIFT;
@@ -2084,6 +2171,7 @@
}
map.m_lblk = pg_start;
+ map.m_next_pgofs = &next_pgofs;
/*
* lookup mapping info in dnode page cache, skip defragmenting if all
@@ -2097,14 +2185,16 @@
goto out;
if (!(map.m_flags & F2FS_MAP_FLAGS)) {
- map.m_lblk++;
+ map.m_lblk = next_pgofs;
continue;
}
- if (blk_end && blk_end != map.m_pblk) {
+ if (blk_end && blk_end != map.m_pblk)
fragmented = true;
- break;
- }
+
+ /* record total count of block that we're going to move */
+ total += map.m_len;
+
blk_end = map.m_pblk + map.m_len;
map.m_lblk += map.m_len;
@@ -2113,10 +2203,7 @@
if (!fragmented)
goto out;
- map.m_lblk = pg_start;
- map.m_len = pg_end - pg_start;
-
- sec_num = (map.m_len + BLKS_PER_SEC(sbi) - 1) / BLKS_PER_SEC(sbi);
+ sec_num = (total + BLKS_PER_SEC(sbi) - 1) / BLKS_PER_SEC(sbi);
/*
* make sure there are enough free section for LFS allocation, this can
@@ -2128,6 +2215,10 @@
goto out;
}
+ map.m_lblk = pg_start;
+ map.m_len = pg_end - pg_start;
+ total = 0;
+
while (map.m_lblk < pg_end) {
pgoff_t idx;
int cnt = 0;
@@ -2139,7 +2230,7 @@
goto clear_out;
if (!(map.m_flags & F2FS_MAP_FLAGS)) {
- map.m_lblk++;
+ map.m_lblk = next_pgofs;
continue;
}
@@ -2149,7 +2240,7 @@
while (idx < map.m_lblk + map.m_len && cnt < blk_per_seg) {
struct page *page;
- page = get_lock_data_page(inode, idx, true);
+ page = f2fs_get_lock_data_page(inode, idx, true);
if (IS_ERR(page)) {
err = PTR_ERR(page);
goto clear_out;
@@ -2260,9 +2351,13 @@
}
inode_lock(src);
+ down_write(&F2FS_I(src)->i_gc_rwsem[WRITE]);
if (src != dst) {
- if (!inode_trylock(dst)) {
- ret = -EBUSY;
+ ret = -EBUSY;
+ if (!inode_trylock(dst))
+ goto out;
+ if (!down_write_trylock(&F2FS_I(dst)->i_gc_rwsem[WRITE])) {
+ inode_unlock(dst);
goto out;
}
}
@@ -2322,9 +2417,12 @@
}
f2fs_unlock_op(sbi);
out_unlock:
- if (src != dst)
+ if (src != dst) {
+ up_write(&F2FS_I(dst)->i_gc_rwsem[WRITE]);
inode_unlock(dst);
+ }
out:
+ up_write(&F2FS_I(src)->i_gc_rwsem[WRITE]);
inode_unlock(src);
return ret;
}
@@ -2482,7 +2580,7 @@
if (IS_NOQUOTA(inode))
goto out_unlock;
- ipage = get_node_page(sbi, inode->i_ino);
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(ipage)) {
err = PTR_ERR(ipage);
goto out_unlock;
@@ -2531,17 +2629,17 @@
{
__u32 xflags = 0;
- if (iflags & FS_SYNC_FL)
+ if (iflags & F2FS_SYNC_FL)
xflags |= FS_XFLAG_SYNC;
- if (iflags & FS_IMMUTABLE_FL)
+ if (iflags & F2FS_IMMUTABLE_FL)
xflags |= FS_XFLAG_IMMUTABLE;
- if (iflags & FS_APPEND_FL)
+ if (iflags & F2FS_APPEND_FL)
xflags |= FS_XFLAG_APPEND;
- if (iflags & FS_NODUMP_FL)
+ if (iflags & F2FS_NODUMP_FL)
xflags |= FS_XFLAG_NODUMP;
- if (iflags & FS_NOATIME_FL)
+ if (iflags & F2FS_NOATIME_FL)
xflags |= FS_XFLAG_NOATIME;
- if (iflags & FS_PROJINHERIT_FL)
+ if (iflags & F2FS_PROJINHERIT_FL)
xflags |= FS_XFLAG_PROJINHERIT;
return xflags;
}
@@ -2550,31 +2648,23 @@
FS_XFLAG_APPEND | FS_XFLAG_NODUMP | \
FS_XFLAG_NOATIME | FS_XFLAG_PROJINHERIT)
-/* Flags we can manipulate with through EXT4_IOC_FSSETXATTR */
-#define F2FS_FL_XFLAG_VISIBLE (FS_SYNC_FL | \
- FS_IMMUTABLE_FL | \
- FS_APPEND_FL | \
- FS_NODUMP_FL | \
- FS_NOATIME_FL | \
- FS_PROJINHERIT_FL)
-
/* Transfer xflags flags to internal */
static inline unsigned long f2fs_xflags_to_iflags(__u32 xflags)
{
unsigned long iflags = 0;
if (xflags & FS_XFLAG_SYNC)
- iflags |= FS_SYNC_FL;
+ iflags |= F2FS_SYNC_FL;
if (xflags & FS_XFLAG_IMMUTABLE)
- iflags |= FS_IMMUTABLE_FL;
+ iflags |= F2FS_IMMUTABLE_FL;
if (xflags & FS_XFLAG_APPEND)
- iflags |= FS_APPEND_FL;
+ iflags |= F2FS_APPEND_FL;
if (xflags & FS_XFLAG_NODUMP)
- iflags |= FS_NODUMP_FL;
+ iflags |= F2FS_NODUMP_FL;
if (xflags & FS_XFLAG_NOATIME)
- iflags |= FS_NOATIME_FL;
+ iflags |= F2FS_NOATIME_FL;
if (xflags & FS_XFLAG_PROJINHERIT)
- iflags |= FS_PROJINHERIT_FL;
+ iflags |= F2FS_PROJINHERIT_FL;
return iflags;
}
@@ -2587,7 +2677,7 @@
memset(&fa, 0, sizeof(struct fsxattr));
fa.fsx_xflags = f2fs_iflags_to_xflags(fi->i_flags &
- (FS_FL_USER_VISIBLE | FS_PROJINHERIT_FL));
+ F2FS_FL_USER_VISIBLE);
if (f2fs_sb_has_project_quota(inode->i_sb))
fa.fsx_projid = (__u32)from_kprojid(&init_user_ns,
@@ -2640,8 +2730,132 @@
return 0;
}
+int f2fs_pin_file_control(struct inode *inode, bool inc)
+{
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ /* Use i_gc_failures for normal file as a risk signal. */
+ if (inc)
+ f2fs_i_gc_failures_write(inode,
+ fi->i_gc_failures[GC_FAILURE_PIN] + 1);
+
+ if (fi->i_gc_failures[GC_FAILURE_PIN] > sbi->gc_pin_file_threshold) {
+ f2fs_msg(sbi->sb, KERN_WARNING,
+ "%s: Enable GC = ino %lx after %x GC trials\n",
+ __func__, inode->i_ino,
+ fi->i_gc_failures[GC_FAILURE_PIN]);
+ clear_inode_flag(inode, FI_PIN_FILE);
+ return -EAGAIN;
+ }
+ return 0;
+}
+
+static int f2fs_ioc_set_pin_file(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ __u32 pin;
+ int ret = 0;
+
+ if (!inode_owner_or_capable(inode))
+ return -EACCES;
+
+ if (get_user(pin, (__u32 __user *)arg))
+ return -EFAULT;
+
+ if (!S_ISREG(inode->i_mode))
+ return -EINVAL;
+
+ if (f2fs_readonly(F2FS_I_SB(inode)->sb))
+ return -EROFS;
+
+ ret = mnt_want_write_file(filp);
+ if (ret)
+ return ret;
+
+ inode_lock(inode);
+
+ if (f2fs_should_update_outplace(inode, NULL)) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ if (!pin) {
+ clear_inode_flag(inode, FI_PIN_FILE);
+ F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN] = 1;
+ goto done;
+ }
+
+ if (f2fs_pin_file_control(inode, false)) {
+ ret = -EAGAIN;
+ goto out;
+ }
+ ret = f2fs_convert_inline_inode(inode);
+ if (ret)
+ goto out;
+
+ set_inode_flag(inode, FI_PIN_FILE);
+ ret = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN];
+done:
+ f2fs_update_time(F2FS_I_SB(inode), REQ_TIME);
+out:
+ inode_unlock(inode);
+ mnt_drop_write_file(filp);
+ return ret;
+}
+
+static int f2fs_ioc_get_pin_file(struct file *filp, unsigned long arg)
+{
+ struct inode *inode = file_inode(filp);
+ __u32 pin = 0;
+
+ if (is_inode_flag_set(inode, FI_PIN_FILE))
+ pin = F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN];
+ return put_user(pin, (u32 __user *)arg);
+}
+
+int f2fs_precache_extents(struct inode *inode)
+{
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ struct f2fs_map_blocks map;
+ pgoff_t m_next_extent;
+ loff_t end;
+ int err;
+
+ if (is_inode_flag_set(inode, FI_NO_EXTENT))
+ return -EOPNOTSUPP;
+
+ map.m_lblk = 0;
+ map.m_next_pgofs = NULL;
+ map.m_next_extent = &m_next_extent;
+ map.m_seg_type = NO_CHECK_TYPE;
+ end = F2FS_I_SB(inode)->max_file_blocks;
+
+ while (map.m_lblk < end) {
+ map.m_len = end - map.m_lblk;
+
+ down_write(&fi->i_gc_rwsem[WRITE]);
+ err = f2fs_map_blocks(inode, &map, 0, F2FS_GET_BLOCK_PRECACHE);
+ up_write(&fi->i_gc_rwsem[WRITE]);
+ if (err)
+ return err;
+
+ map.m_lblk = m_next_extent;
+ }
+
+ return err;
+}
+
+static int f2fs_ioc_precache_extents(struct file *filp, unsigned long arg)
+{
+ return f2fs_precache_extents(file_inode(filp));
+}
+
long f2fs_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
{
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(file_inode(filp)))))
+ return -EIO;
+
switch (cmd) {
case F2FS_IOC_GETFLAGS:
return f2fs_ioc_getflags(filp, arg);
@@ -2674,7 +2888,7 @@
case F2FS_IOC_GARBAGE_COLLECT_RANGE:
return f2fs_ioc_gc_range(filp, arg);
case F2FS_IOC_WRITE_CHECKPOINT:
- return f2fs_ioc_write_checkpoint(filp, arg);
+ return f2fs_ioc_f2fs_write_checkpoint(filp, arg);
case F2FS_IOC_DEFRAGMENT:
return f2fs_ioc_defragment(filp, arg);
case F2FS_IOC_MOVE_RANGE:
@@ -2687,6 +2901,12 @@
return f2fs_ioc_fsgetxattr(filp, arg);
case F2FS_IOC_FSSETXATTR:
return f2fs_ioc_fssetxattr(filp, arg);
+ case F2FS_IOC_GET_PIN_FILE:
+ return f2fs_ioc_get_pin_file(filp, arg);
+ case F2FS_IOC_SET_PIN_FILE:
+ return f2fs_ioc_set_pin_file(filp, arg);
+ case F2FS_IOC_PRECACHE_EXTENTS:
+ return f2fs_ioc_precache_extents(filp, arg);
default:
return -ENOTTY;
}
@@ -2696,10 +2916,20 @@
{
struct file *file = iocb->ki_filp;
struct inode *inode = file_inode(file);
- struct blk_plug plug;
ssize_t ret;
- inode_lock(inode);
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(inode))))
+ return -EIO;
+
+ if ((iocb->ki_flags & IOCB_NOWAIT) && !(iocb->ki_flags & IOCB_DIRECT))
+ return -EINVAL;
+
+ if (!inode_trylock(inode)) {
+ if (iocb->ki_flags & IOCB_NOWAIT)
+ return -EAGAIN;
+ inode_lock(inode);
+ }
+
ret = generic_write_checks(iocb, from);
if (ret > 0) {
bool preallocated = false;
@@ -2709,18 +2939,30 @@
if (iov_iter_fault_in_readable(from, iov_iter_count(from)))
set_inode_flag(inode, FI_NO_PREALLOC);
- preallocated = true;
- target_size = iocb->ki_pos + iov_iter_count(from);
+ if ((iocb->ki_flags & IOCB_NOWAIT) &&
+ (iocb->ki_flags & IOCB_DIRECT)) {
+ if (!f2fs_overwrite_io(inode, iocb->ki_pos,
+ iov_iter_count(from)) ||
+ f2fs_has_inline_data(inode) ||
+ f2fs_force_buffered_io(inode, WRITE)) {
+ clear_inode_flag(inode,
+ FI_NO_PREALLOC);
+ inode_unlock(inode);
+ return -EAGAIN;
+ }
- err = f2fs_preallocate_blocks(iocb, from);
- if (err) {
- clear_inode_flag(inode, FI_NO_PREALLOC);
- inode_unlock(inode);
- return err;
+ } else {
+ preallocated = true;
+ target_size = iocb->ki_pos + iov_iter_count(from);
+
+ err = f2fs_preallocate_blocks(iocb, from);
+ if (err) {
+ clear_inode_flag(inode, FI_NO_PREALLOC);
+ inode_unlock(inode);
+ return err;
+ }
}
- blk_start_plug(&plug);
ret = __generic_file_write_iter(iocb, from);
- blk_finish_plug(&plug);
clear_inode_flag(inode, FI_NO_PREALLOC);
/* if we couldn't write data, we should deallocate blocks. */
@@ -2768,6 +3010,9 @@
case F2FS_IOC_GET_FEATURES:
case F2FS_IOC_FSGETXATTR:
case F2FS_IOC_FSSETXATTR:
+ case F2FS_IOC_GET_PIN_FILE:
+ case F2FS_IOC_SET_PIN_FILE:
+ case F2FS_IOC_PRECACHE_EXTENTS:
break;
default:
return -ENOIOCTLCMD;
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index f228844..9be5b50 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -76,14 +76,15 @@
* invalidated soon after by user update or deletion.
* So, I'd like to wait some time to collect dirty segments.
*/
- if (!mutex_trylock(&sbi->gc_mutex))
- goto next;
-
- if (gc_th->gc_urgent) {
+ if (sbi->gc_mode == GC_URGENT) {
wait_ms = gc_th->urgent_sleep_time;
+ mutex_lock(&sbi->gc_mutex);
goto do_gc;
}
+ if (!mutex_trylock(&sbi->gc_mutex))
+ goto next;
+
if (!is_idle(sbi)) {
increase_sleep_time(gc_th, &wait_ms);
mutex_unlock(&sbi->gc_mutex);
@@ -113,7 +114,7 @@
return 0;
}
-int start_gc_thread(struct f2fs_sb_info *sbi)
+int f2fs_start_gc_thread(struct f2fs_sb_info *sbi)
{
struct f2fs_gc_kthread *gc_th;
dev_t dev = sbi->sb->s_bdev->bd_dev;
@@ -130,8 +131,6 @@
gc_th->max_sleep_time = DEF_GC_THREAD_MAX_SLEEP_TIME;
gc_th->no_gc_sleep_time = DEF_GC_THREAD_NOGC_SLEEP_TIME;
- gc_th->gc_idle = 0;
- gc_th->gc_urgent = 0;
gc_th->gc_wake= 0;
sbi->gc_thread = gc_th;
@@ -147,7 +146,7 @@
return err;
}
-void stop_gc_thread(struct f2fs_sb_info *sbi)
+void f2fs_stop_gc_thread(struct f2fs_sb_info *sbi)
{
struct f2fs_gc_kthread *gc_th = sbi->gc_thread;
if (!gc_th)
@@ -157,15 +156,18 @@
sbi->gc_thread = NULL;
}
-static int select_gc_type(struct f2fs_gc_kthread *gc_th, int gc_type)
+static int select_gc_type(struct f2fs_sb_info *sbi, int gc_type)
{
int gc_mode = (gc_type == BG_GC) ? GC_CB : GC_GREEDY;
- if (gc_th && gc_th->gc_idle) {
- if (gc_th->gc_idle == 1)
- gc_mode = GC_CB;
- else if (gc_th->gc_idle == 2)
- gc_mode = GC_GREEDY;
+ switch (sbi->gc_mode) {
+ case GC_IDLE_CB:
+ gc_mode = GC_CB;
+ break;
+ case GC_IDLE_GREEDY:
+ case GC_URGENT:
+ gc_mode = GC_GREEDY;
+ break;
}
return gc_mode;
}
@@ -181,14 +183,16 @@
p->max_search = dirty_i->nr_dirty[type];
p->ofs_unit = 1;
} else {
- p->gc_mode = select_gc_type(sbi->gc_thread, gc_type);
+ p->gc_mode = select_gc_type(sbi, gc_type);
p->dirty_segmap = dirty_i->dirty_segmap[DIRTY];
p->max_search = dirty_i->nr_dirty[DIRTY];
p->ofs_unit = sbi->segs_per_sec;
}
/* we need to check every dirty segments in the FG_GC case */
- if (gc_type != FG_GC && p->max_search > sbi->max_victim_search)
+ if (gc_type != FG_GC &&
+ (sbi->gc_mode != GC_URGENT) &&
+ p->max_search > sbi->max_victim_search)
p->max_search = sbi->max_victim_search;
/* let's select beginning hot/small space first in no_heap mode*/
@@ -226,10 +230,6 @@
for_each_set_bit(secno, dirty_i->victim_secmap, MAIN_SECS(sbi)) {
if (sec_usage_check(sbi, secno))
continue;
-
- if (no_fggc_candidate(sbi, secno))
- continue;
-
clear_bit(secno, dirty_i->victim_secmap);
return GET_SEG_FROM_SEC(sbi, secno);
}
@@ -268,16 +268,6 @@
return UINT_MAX - ((100 * (100 - u) * age) / (100 + u));
}
-static unsigned int get_greedy_cost(struct f2fs_sb_info *sbi,
- unsigned int segno)
-{
- unsigned int valid_blocks =
- get_valid_blocks(sbi, segno, true);
-
- return IS_DATASEG(get_seg_entry(sbi, segno)->type) ?
- valid_blocks * 2 : valid_blocks;
-}
-
static inline unsigned int get_gc_cost(struct f2fs_sb_info *sbi,
unsigned int segno, struct victim_sel_policy *p)
{
@@ -286,7 +276,7 @@
/* alloc_mode == LFS */
if (p->gc_mode == GC_GREEDY)
- return get_greedy_cost(sbi, segno);
+ return get_valid_blocks(sbi, segno, true);
else
return get_cb_cost(sbi, segno);
}
@@ -379,9 +369,6 @@
goto next;
if (gc_type == BG_GC && test_bit(secno, dirty_i->victim_secmap))
goto next;
- if (gc_type == FG_GC && p.alloc_mode == LFS &&
- no_fggc_candidate(sbi, secno))
- goto next;
cost = get_gc_cost(sbi, segno, &p);
@@ -442,7 +429,7 @@
iput(inode);
return;
}
- new_ie = f2fs_kmem_cache_alloc(inode_entry_slab, GFP_NOFS);
+ new_ie = f2fs_kmem_cache_alloc(f2fs_inode_entry_slab, GFP_NOFS);
new_ie->inode = inode;
f2fs_radix_tree_insert(&gc_list->iroot, inode->i_ino, new_ie);
@@ -456,7 +443,7 @@
radix_tree_delete(&gc_list->iroot, ie->inode->i_ino);
iput(ie->inode);
list_del(&ie->list);
- kmem_cache_free(inode_entry_slab, ie);
+ kmem_cache_free(f2fs_inode_entry_slab, ie);
}
}
@@ -467,10 +454,10 @@
struct seg_entry *sentry;
int ret;
- mutex_lock(&sit_i->sentry_lock);
+ down_read(&sit_i->sentry_lock);
sentry = get_seg_entry(sbi, segno);
ret = f2fs_test_bit(offset, sentry->cur_valid_map);
- mutex_unlock(&sit_i->sentry_lock);
+ up_read(&sit_i->sentry_lock);
return ret;
}
@@ -486,12 +473,16 @@
block_t start_addr;
int off;
int phase = 0;
+ bool fggc = (gc_type == FG_GC);
start_addr = START_BLOCK(sbi, segno);
next_step:
entry = sum;
+ if (fggc && phase == 2)
+ atomic_inc(&sbi->wb_sync_req[NODE]);
+
for (off = 0; off < sbi->blocks_per_seg; off++, entry++) {
nid_t nid = le32_to_cpu(entry->nid);
struct page *node_page;
@@ -505,39 +496,42 @@
continue;
if (phase == 0) {
- ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
+ f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
META_NAT, true);
continue;
}
if (phase == 1) {
- ra_node_page(sbi, nid);
+ f2fs_ra_node_page(sbi, nid);
continue;
}
/* phase == 2 */
- node_page = get_node_page(sbi, nid);
+ node_page = f2fs_get_node_page(sbi, nid);
if (IS_ERR(node_page))
continue;
- /* block may become invalid during get_node_page */
+ /* block may become invalid during f2fs_get_node_page */
if (check_valid_map(sbi, segno, off) == 0) {
f2fs_put_page(node_page, 1);
continue;
}
- get_node_info(sbi, nid, &ni);
+ f2fs_get_node_info(sbi, nid, &ni);
if (ni.blk_addr != start_addr + off) {
f2fs_put_page(node_page, 1);
continue;
}
- move_node_page(node_page, gc_type);
+ f2fs_move_node_page(node_page, gc_type);
stat_inc_node_blk_count(sbi, 1, gc_type);
}
if (++phase < 3)
goto next_step;
+
+ if (fggc)
+ atomic_dec(&sbi->wb_sync_req[NODE]);
}
/*
@@ -547,7 +541,7 @@
* as indirect or double indirect node blocks, are given, it must be a caller's
* bug.
*/
-block_t start_bidx_of_node(unsigned int node_ofs, struct inode *inode)
+block_t f2fs_start_bidx_of_node(unsigned int node_ofs, struct inode *inode)
{
unsigned int indirect_blks = 2 * NIDS_PER_BLOCK + 4;
unsigned int bidx;
@@ -578,11 +572,11 @@
nid = le32_to_cpu(sum->nid);
ofs_in_node = le16_to_cpu(sum->ofs_in_node);
- node_page = get_node_page(sbi, nid);
+ node_page = f2fs_get_node_page(sbi, nid);
if (IS_ERR(node_page))
return false;
- get_node_info(sbi, nid, dni);
+ f2fs_get_node_info(sbi, nid, dni);
if (sum->version != dni->version) {
f2fs_msg(sbi->sb, KERN_WARNING,
@@ -605,16 +599,18 @@
* This can be used to move blocks, aka LBAs, directly on disk.
*/
static void move_data_block(struct inode *inode, block_t bidx,
- unsigned int segno, int off)
+ int gc_type, unsigned int segno, int off)
{
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
+ .ino = inode->i_ino,
.type = DATA,
.temp = COLD,
.op = REQ_OP_READ,
.op_flags = 0,
.encrypted_page = NULL,
.in_list = false,
+ .retry = false,
};
struct dnode_of_data dn;
struct f2fs_summary sum;
@@ -622,6 +618,7 @@
struct page *page;
block_t newaddr;
int err;
+ bool lfs_mode = test_opt(fio.sbi, LFS);
/* do not read out */
page = f2fs_grab_cache_page(inode->i_mapping, bidx, false);
@@ -631,11 +628,19 @@
if (!check_valid_map(F2FS_I_SB(inode), segno, off))
goto out;
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_is_atomic_file(inode)) {
+ F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
+ F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
goto out;
+ }
+
+ if (f2fs_is_pinned_file(inode)) {
+ f2fs_pin_file_control(inode, true);
+ goto out;
+ }
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, bidx, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, bidx, LOOKUP_NODE);
if (err)
goto out;
@@ -650,18 +655,21 @@
*/
f2fs_wait_on_page_writeback(page, DATA, true);
- get_node_info(fio.sbi, dn.nid, &ni);
+ f2fs_get_node_info(fio.sbi, dn.nid, &ni);
set_summary(&sum, dn.nid, dn.ofs_in_node, ni.version);
/* read page */
fio.page = page;
fio.new_blkaddr = fio.old_blkaddr = dn.data_blkaddr;
- allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr,
+ if (lfs_mode)
+ down_write(&fio.sbi->io_order_lock);
+
+ f2fs_allocate_data_block(fio.sbi, NULL, fio.old_blkaddr, &newaddr,
&sum, CURSEG_COLD_DATA, NULL, false);
- fio.encrypted_page = pagecache_get_page(META_MAPPING(fio.sbi), newaddr,
- FGP_LOCK | FGP_CREAT, GFP_NOFS);
+ fio.encrypted_page = f2fs_pagecache_get_page(META_MAPPING(fio.sbi),
+ newaddr, FGP_LOCK | FGP_CREAT, GFP_NOFS);
if (!fio.encrypted_page) {
err = -ENOMEM;
goto recover_block;
@@ -689,6 +697,7 @@
dec_page_count(fio.sbi, F2FS_DIRTY_META);
set_page_writeback(fio.encrypted_page);
+ ClearPageError(page);
/* allocate block address */
f2fs_wait_on_page_writeback(dn.node_page, NODE, true);
@@ -696,8 +705,8 @@
fio.op = REQ_OP_WRITE;
fio.op_flags = REQ_SYNC;
fio.new_blkaddr = newaddr;
- err = f2fs_submit_page_write(&fio);
- if (err) {
+ f2fs_submit_page_write(&fio);
+ if (fio.retry) {
if (PageWriteback(fio.encrypted_page))
end_page_writeback(fio.encrypted_page);
goto put_page_out;
@@ -712,8 +721,10 @@
put_page_out:
f2fs_put_page(fio.encrypted_page, 1);
recover_block:
+ if (lfs_mode)
+ up_write(&fio.sbi->io_order_lock);
if (err)
- __f2fs_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr,
+ f2fs_do_replace_block(fio.sbi, &sum, newaddr, fio.old_blkaddr,
true, true);
put_out:
f2fs_put_dnode(&dn);
@@ -726,15 +737,23 @@
{
struct page *page;
- page = get_lock_data_page(inode, bidx, true);
+ page = f2fs_get_lock_data_page(inode, bidx, true);
if (IS_ERR(page))
return;
if (!check_valid_map(F2FS_I_SB(inode), segno, off))
goto out;
- if (f2fs_is_atomic_file(inode))
+ if (f2fs_is_atomic_file(inode)) {
+ F2FS_I(inode)->i_gc_failures[GC_FAILURE_ATOMIC]++;
+ F2FS_I_SB(inode)->skipped_atomic_files[gc_type]++;
goto out;
+ }
+ if (f2fs_is_pinned_file(inode)) {
+ if (gc_type == FG_GC)
+ f2fs_pin_file_control(inode, true);
+ goto out;
+ }
if (gc_type == BG_GC) {
if (PageWriteback(page))
@@ -744,6 +763,7 @@
} else {
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(inode),
+ .ino = inode->i_ino,
.type = DATA,
.temp = COLD,
.op = REQ_OP_WRITE,
@@ -762,12 +782,12 @@
f2fs_wait_on_page_writeback(page, DATA, true);
if (clear_page_dirty_for_io(page)) {
inode_dec_dirty_pages(inode);
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
}
set_cold_data(page);
- err = do_write_data_page(&fio);
+ err = f2fs_do_write_data_page(&fio);
if (err) {
clear_cold_data(page);
if (err == -ENOMEM) {
@@ -819,13 +839,13 @@
continue;
if (phase == 0) {
- ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
+ f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), 1,
META_NAT, true);
continue;
}
if (phase == 1) {
- ra_node_page(sbi, nid);
+ f2fs_ra_node_page(sbi, nid);
continue;
}
@@ -834,7 +854,7 @@
continue;
if (phase == 2) {
- ra_node_page(sbi, dni.ino);
+ f2fs_ra_node_page(sbi, dni.ino);
continue;
}
@@ -845,16 +865,23 @@
if (IS_ERR(inode) || is_bad_inode(inode))
continue;
- /* if encrypted inode, let's go phase 3 */
- if (f2fs_encrypted_file(inode)) {
+ /* if inode uses special I/O path, let's go phase 3 */
+ if (f2fs_post_read_required(inode)) {
add_gc_inode(gc_list, inode);
continue;
}
- start_bidx = start_bidx_of_node(nofs, inode);
- data_page = get_read_data_page(inode,
+ if (!down_write_trylock(
+ &F2FS_I(inode)->i_gc_rwsem[WRITE])) {
+ iput(inode);
+ continue;
+ }
+
+ start_bidx = f2fs_start_bidx_of_node(nofs, inode);
+ data_page = f2fs_get_read_data_page(inode,
start_bidx + ofs_in_node, REQ_RAHEAD,
true);
+ up_write(&F2FS_I(inode)->i_gc_rwsem[WRITE]);
if (IS_ERR(data_page)) {
iput(inode);
continue;
@@ -872,11 +899,11 @@
bool locked = false;
if (S_ISREG(inode->i_mode)) {
- if (!down_write_trylock(&fi->dio_rwsem[READ]))
+ if (!down_write_trylock(&fi->i_gc_rwsem[READ]))
continue;
if (!down_write_trylock(
- &fi->dio_rwsem[WRITE])) {
- up_write(&fi->dio_rwsem[READ]);
+ &fi->i_gc_rwsem[WRITE])) {
+ up_write(&fi->i_gc_rwsem[READ]);
continue;
}
locked = true;
@@ -885,17 +912,18 @@
inode_dio_wait(inode);
}
- start_bidx = start_bidx_of_node(nofs, inode)
+ start_bidx = f2fs_start_bidx_of_node(nofs, inode)
+ ofs_in_node;
- if (f2fs_encrypted_file(inode))
- move_data_block(inode, start_bidx, segno, off);
+ if (f2fs_post_read_required(inode))
+ move_data_block(inode, start_bidx, gc_type,
+ segno, off);
else
move_data_page(inode, start_bidx, gc_type,
segno, off);
if (locked) {
- up_write(&fi->dio_rwsem[WRITE]);
- up_write(&fi->dio_rwsem[READ]);
+ up_write(&fi->i_gc_rwsem[WRITE]);
+ up_write(&fi->i_gc_rwsem[READ]);
}
stat_inc_data_blk_count(sbi, 1, gc_type);
@@ -912,10 +940,10 @@
struct sit_info *sit_i = SIT_I(sbi);
int ret;
- mutex_lock(&sit_i->sentry_lock);
+ down_write(&sit_i->sentry_lock);
ret = DIRTY_I(sbi)->v_ops->get_victim(sbi, victim, gc_type,
NO_CHECK_TYPE, LFS);
- mutex_unlock(&sit_i->sentry_lock);
+ up_write(&sit_i->sentry_lock);
return ret;
}
@@ -934,12 +962,12 @@
/* readahead multi ssa blocks those have contiguous address */
if (sbi->segs_per_sec > 1)
- ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno),
+ f2fs_ra_meta_pages(sbi, GET_SUM_BLOCK(sbi, segno),
sbi->segs_per_sec, META_SSA, true);
/* reference all summary page */
while (segno < end_segno) {
- sum_page = get_sum_page(sbi, segno++);
+ sum_page = f2fs_get_sum_page(sbi, segno++);
unlock_page(sum_page);
}
@@ -969,8 +997,8 @@
/*
* this is to avoid deadlock:
* - lock_page(sum_page) - f2fs_replace_block
- * - check_valid_map() - mutex_lock(sentry_lock)
- * - mutex_lock(sentry_lock) - change_curseg()
+ * - check_valid_map() - down_write(sentry_lock)
+ * - down_read(sentry_lock) - change_curseg()
* - lock_page(sum_page)
*/
if (type == SUM_TYPE_NODE)
@@ -1011,6 +1039,8 @@
.ilist = LIST_HEAD_INIT(gc_list.ilist),
.iroot = RADIX_TREE_INIT(GFP_NOFS),
};
+ unsigned long long last_skipped = sbi->skipped_atomic_files[FG_GC];
+ unsigned int skipped_round = 0, round = 0;
trace_f2fs_gc_begin(sbi->sb, sync, background,
get_pages(sbi, F2FS_DIRTY_NODES),
@@ -1039,7 +1069,7 @@
* secure free segments which doesn't need fggc any more.
*/
if (prefree_segments(sbi)) {
- ret = write_checkpoint(sbi, &cpc);
+ ret = f2fs_write_checkpoint(sbi, &cpc);
if (ret)
goto stop;
}
@@ -1062,17 +1092,27 @@
sec_freed++;
total_freed += seg_freed;
+ if (gc_type == FG_GC) {
+ if (sbi->skipped_atomic_files[FG_GC] > last_skipped)
+ skipped_round++;
+ last_skipped = sbi->skipped_atomic_files[FG_GC];
+ round++;
+ }
+
if (gc_type == FG_GC)
sbi->cur_victim_sec = NULL_SEGNO;
if (!sync) {
if (has_not_enough_free_secs(sbi, sec_freed, 0)) {
+ if (skipped_round > MAX_SKIP_ATOMIC_COUNT &&
+ skipped_round * 2 >= round)
+ f2fs_drop_inmem_pages_all(sbi, true);
segno = NULL_SEGNO;
goto gc_more;
}
if (gc_type == FG_GC)
- ret = write_checkpoint(sbi, &cpc);
+ ret = f2fs_write_checkpoint(sbi, &cpc);
}
stop:
SIT_I(sbi)->last_victim[ALLOC_NEXT] = 0;
@@ -1096,19 +1136,11 @@
return ret;
}
-void build_gc_manager(struct f2fs_sb_info *sbi)
+void f2fs_build_gc_manager(struct f2fs_sb_info *sbi)
{
- u64 main_count, resv_count, ovp_count;
-
DIRTY_I(sbi)->v_ops = &default_v_ops;
- /* threshold of # of valid blocks in a section for victims of FG_GC */
- main_count = SM_I(sbi)->main_segments << sbi->log_blocks_per_seg;
- resv_count = SM_I(sbi)->reserved_segments << sbi->log_blocks_per_seg;
- ovp_count = SM_I(sbi)->ovp_segments << sbi->log_blocks_per_seg;
-
- sbi->fggc_threshold = div64_u64((main_count - ovp_count) *
- BLKS_PER_SEC(sbi), (main_count - resv_count));
+ sbi->gc_pin_file_threshold = DEF_GC_FAILED_PINNED_FILES;
/* give warm/cold data area from slower device */
if (sbi->s_ndevs && sbi->segs_per_sec == 1)
diff --git a/fs/f2fs/gc.h b/fs/f2fs/gc.h
index 9325191..c8619e4 100644
--- a/fs/f2fs/gc.h
+++ b/fs/f2fs/gc.h
@@ -20,6 +20,8 @@
#define LIMIT_INVALID_BLOCK 40 /* percentage over total user space */
#define LIMIT_FREE_BLOCK 40 /* percentage over invalid + free space */
+#define DEF_GC_FAILED_PINNED_FILES 2048
+
/* Search max. number of dirty segments to select a victim segment */
#define DEF_MAX_VICTIM_SEARCH 4096 /* covers 8GB */
@@ -34,8 +36,6 @@
unsigned int no_gc_sleep_time;
/* for changing gc mode */
- unsigned int gc_idle;
- unsigned int gc_urgent;
unsigned int gc_wake;
};
diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index 888a9dc..edbc542 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -13,6 +13,7 @@
#include "f2fs.h"
#include "node.h"
+#include <trace/events/android_fs.h>
bool f2fs_may_inline_data(struct inode *inode)
{
@@ -25,7 +26,7 @@
if (i_size_read(inode) > MAX_INLINE_DATA(inode))
return false;
- if (f2fs_encrypted_file(inode))
+ if (f2fs_post_read_required(inode))
return false;
return true;
@@ -42,7 +43,7 @@
return true;
}
-void read_inline_data(struct page *page, struct page *ipage)
+void f2fs_do_read_inline_data(struct page *page, struct page *ipage)
{
struct inode *inode = page->mapping->host;
void *src_addr, *dst_addr;
@@ -64,7 +65,8 @@
SetPageUptodate(page);
}
-void truncate_inline_inode(struct inode *inode, struct page *ipage, u64 from)
+void f2fs_truncate_inline_inode(struct inode *inode,
+ struct page *ipage, u64 from)
{
void *addr;
@@ -85,25 +87,42 @@
{
struct page *ipage;
- ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino);
+ if (trace_android_fs_dataread_start_enabled()) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ inode);
+ trace_android_fs_dataread_start(inode, page_offset(page),
+ PAGE_SIZE, current->pid,
+ path, current->comm);
+ }
+
+ ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino);
if (IS_ERR(ipage)) {
+ trace_android_fs_dataread_end(inode, page_offset(page),
+ PAGE_SIZE);
unlock_page(page);
return PTR_ERR(ipage);
}
if (!f2fs_has_inline_data(inode)) {
f2fs_put_page(ipage, 1);
+ trace_android_fs_dataread_end(inode, page_offset(page),
+ PAGE_SIZE);
return -EAGAIN;
}
if (page->index)
zero_user_segment(page, 0, PAGE_SIZE);
else
- read_inline_data(page, ipage);
+ f2fs_do_read_inline_data(page, ipage);
if (!PageUptodate(page))
SetPageUptodate(page);
f2fs_put_page(ipage, 1);
+ trace_android_fs_dataread_end(inode, page_offset(page),
+ PAGE_SIZE);
unlock_page(page);
return 0;
}
@@ -112,6 +131,7 @@
{
struct f2fs_io_info fio = {
.sbi = F2FS_I_SB(dn->inode),
+ .ino = dn->inode->i_ino,
.type = DATA,
.op = REQ_OP_WRITE,
.op_flags = REQ_SYNC | REQ_PRIO,
@@ -140,7 +160,7 @@
f2fs_bug_on(F2FS_P_SB(page), PageWriteback(page));
- read_inline_data(page, dn->inode_page);
+ f2fs_do_read_inline_data(page, dn->inode_page);
set_page_dirty(page);
/* clear dirty state */
@@ -148,20 +168,21 @@
/* write data page to try to make data consistent */
set_page_writeback(page);
+ ClearPageError(page);
fio.old_blkaddr = dn->data_blkaddr;
set_inode_flag(dn->inode, FI_HOT_DATA);
- write_data_page(dn, &fio);
+ f2fs_outplace_write_data(dn, &fio);
f2fs_wait_on_page_writeback(page, DATA, true);
if (dirty) {
inode_dec_dirty_pages(dn->inode);
- remove_dirty_inode(dn->inode);
+ f2fs_remove_dirty_inode(dn->inode);
}
/* this converted inline_data should be recovered. */
set_inode_flag(dn->inode, FI_APPEND_WRITE);
/* clear inline data and flag after data writeback */
- truncate_inline_inode(dn->inode, dn->inode_page, 0);
+ f2fs_truncate_inline_inode(dn->inode, dn->inode_page, 0);
clear_inline_node(dn->inode_page);
clear_out:
stat_dec_inline_inode(dn->inode);
@@ -186,7 +207,7 @@
f2fs_lock_op(sbi);
- ipage = get_node_page(sbi, inode->i_ino);
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(ipage)) {
err = PTR_ERR(ipage);
goto out;
@@ -212,12 +233,10 @@
{
void *src_addr, *dst_addr;
struct dnode_of_data dn;
- struct address_space *mapping = page_mapping(page);
- unsigned long flags;
int err;
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, 0, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, 0, LOOKUP_NODE);
if (err)
return err;
@@ -235,10 +254,7 @@
kunmap_atomic(src_addr);
set_page_dirty(dn.inode_page);
- spin_lock_irqsave(&mapping->tree_lock, flags);
- radix_tree_tag_clear(&mapping->page_tree, page_index(page),
- PAGECACHE_TAG_DIRTY);
- spin_unlock_irqrestore(&mapping->tree_lock, flags);
+ f2fs_clear_radix_tree_dirty_tag(page);
set_inode_flag(inode, FI_APPEND_WRITE);
set_inode_flag(inode, FI_DATA_EXIST);
@@ -248,7 +264,7 @@
return 0;
}
-bool recover_inline_data(struct inode *inode, struct page *npage)
+bool f2fs_recover_inline_data(struct inode *inode, struct page *npage)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode *ri = NULL;
@@ -269,7 +285,7 @@
if (f2fs_has_inline_data(inode) &&
ri && (ri->i_inline & F2FS_INLINE_DATA)) {
process_inline:
- ipage = get_node_page(sbi, inode->i_ino);
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
f2fs_bug_on(sbi, IS_ERR(ipage));
f2fs_wait_on_page_writeback(ipage, NODE, true);
@@ -287,20 +303,20 @@
}
if (f2fs_has_inline_data(inode)) {
- ipage = get_node_page(sbi, inode->i_ino);
+ ipage = f2fs_get_node_page(sbi, inode->i_ino);
f2fs_bug_on(sbi, IS_ERR(ipage));
- truncate_inline_inode(inode, ipage, 0);
+ f2fs_truncate_inline_inode(inode, ipage, 0);
clear_inode_flag(inode, FI_INLINE_DATA);
f2fs_put_page(ipage, 1);
} else if (ri && (ri->i_inline & F2FS_INLINE_DATA)) {
- if (truncate_blocks(inode, 0, false))
+ if (f2fs_truncate_blocks(inode, 0, false))
return false;
goto process_inline;
}
return false;
}
-struct f2fs_dir_entry *find_in_inline_dir(struct inode *dir,
+struct f2fs_dir_entry *f2fs_find_in_inline_dir(struct inode *dir,
struct fscrypt_name *fname, struct page **res_page)
{
struct f2fs_sb_info *sbi = F2FS_SB(dir->i_sb);
@@ -311,7 +327,7 @@
void *inline_dentry;
f2fs_hash_t namehash;
- ipage = get_node_page(sbi, dir->i_ino);
+ ipage = f2fs_get_node_page(sbi, dir->i_ino);
if (IS_ERR(ipage)) {
*res_page = ipage;
return NULL;
@@ -322,7 +338,7 @@
inline_dentry = inline_data_addr(dir, ipage);
make_dentry_ptr_inline(dir, &d, inline_dentry);
- de = find_target_dentry(fname, namehash, NULL, &d);
+ de = f2fs_find_target_dentry(fname, namehash, NULL, &d);
unlock_page(ipage);
if (de)
*res_page = ipage;
@@ -332,7 +348,7 @@
return de;
}
-int make_empty_inline_dir(struct inode *inode, struct inode *parent,
+int f2fs_make_empty_inline_dir(struct inode *inode, struct inode *parent,
struct page *ipage)
{
struct f2fs_dentry_ptr d;
@@ -341,7 +357,7 @@
inline_dentry = inline_data_addr(inode, ipage);
make_dentry_ptr_inline(inode, &d, inline_dentry);
- do_make_empty_dir(inode, parent, &d);
+ f2fs_do_make_empty_dir(inode, parent, &d);
set_page_dirty(ipage);
@@ -387,9 +403,8 @@
}
f2fs_wait_on_page_writeback(page, DATA, true);
- zero_user_segment(page, MAX_INLINE_DATA(dir), PAGE_SIZE);
- dentry_blk = kmap_atomic(page);
+ dentry_blk = page_address(page);
make_dentry_ptr_inline(dir, &src, inline_dentry);
make_dentry_ptr_block(dir, &dst, dentry_blk);
@@ -406,13 +421,12 @@
memcpy(dst.dentry, src.dentry, SIZE_OF_DIR_ENTRY * src.max);
memcpy(dst.filename, src.filename, src.max * F2FS_SLOT_LEN);
- kunmap_atomic(dentry_blk);
if (!PageUptodate(page))
SetPageUptodate(page);
set_page_dirty(page);
/* clear inline dir and flag after data writeback */
- truncate_inline_inode(dir, ipage, 0);
+ f2fs_truncate_inline_inode(dir, ipage, 0);
stat_dec_inline_dir(dir);
clear_inode_flag(dir, FI_INLINE_DENTRY);
@@ -455,7 +469,7 @@
new_name.len = le16_to_cpu(de->name_len);
ino = le32_to_cpu(de->ino);
- fake_mode = get_de_type(de) << S_SHIFT;
+ fake_mode = f2fs_get_de_type(de) << S_SHIFT;
err = f2fs_add_regular_entry(dir, &new_name, NULL, NULL,
ino, fake_mode);
@@ -467,8 +481,8 @@
return 0;
punch_dentry_pages:
truncate_inode_pages(&dir->i_data, 0);
- truncate_blocks(dir, 0, false);
- remove_dirty_inode(dir);
+ f2fs_truncate_blocks(dir, 0, false);
+ f2fs_remove_dirty_inode(dir);
return err;
}
@@ -486,7 +500,7 @@
}
memcpy(backup_dentry, inline_dentry, MAX_INLINE_DATA(dir));
- truncate_inline_inode(dir, ipage, 0);
+ f2fs_truncate_inline_inode(dir, ipage, 0);
unlock_page(ipage);
@@ -536,14 +550,14 @@
struct page *page = NULL;
int err = 0;
- ipage = get_node_page(sbi, dir->i_ino);
+ ipage = f2fs_get_node_page(sbi, dir->i_ino);
if (IS_ERR(ipage))
return PTR_ERR(ipage);
inline_dentry = inline_data_addr(dir, ipage);
make_dentry_ptr_inline(dir, &d, inline_dentry);
- bit_pos = room_for_filename(d.bitmap, slots, d.max);
+ bit_pos = f2fs_room_for_filename(d.bitmap, slots, d.max);
if (bit_pos >= d.max) {
err = f2fs_convert_inline_dir(dir, ipage, inline_dentry);
if (err)
@@ -554,7 +568,7 @@
if (inode) {
down_write(&F2FS_I(inode)->i_sem);
- page = init_inode_metadata(inode, dir, new_name,
+ page = f2fs_init_inode_metadata(inode, dir, new_name,
orig_name, ipage);
if (IS_ERR(page)) {
err = PTR_ERR(page);
@@ -575,7 +589,7 @@
f2fs_put_page(page, 1);
}
- update_parent_metadata(dir, inode, 0);
+ f2fs_update_parent_metadata(dir, inode, 0);
fail:
if (inode)
up_write(&F2FS_I(inode)->i_sem);
@@ -621,7 +635,7 @@
void *inline_dentry;
struct f2fs_dentry_ptr d;
- ipage = get_node_page(sbi, dir->i_ino);
+ ipage = f2fs_get_node_page(sbi, dir->i_ino);
if (IS_ERR(ipage))
return false;
@@ -652,7 +666,7 @@
if (ctx->pos == d.max)
return 0;
- ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino);
+ ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino);
if (IS_ERR(ipage))
return PTR_ERR(ipage);
@@ -678,7 +692,7 @@
struct page *ipage;
int err = 0;
- ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino);
+ ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino);
if (IS_ERR(ipage))
return PTR_ERR(ipage);
@@ -694,7 +708,7 @@
ilen = start + len;
ilen -= start;
- get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni);
+ f2fs_get_node_info(F2FS_I_SB(inode), inode->i_ino, &ni);
byteaddr = (__u64)ni.blk_addr << inode->i_sb->s_blocksize_bits;
byteaddr += (char *)inline_data_addr(inode, ipage) -
(char *)F2FS_INODE(ipage);
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 259b0aa..30a7773 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -22,6 +22,9 @@
void f2fs_mark_inode_dirty_sync(struct inode *inode, bool sync)
{
+ if (is_inode_flag_set(inode, FI_NEW_INODE))
+ return;
+
if (f2fs_inode_dirtied(inode, sync))
return;
@@ -33,18 +36,21 @@
unsigned int flags = F2FS_I(inode)->i_flags;
unsigned int new_fl = 0;
- if (flags & FS_SYNC_FL)
+ if (flags & F2FS_SYNC_FL)
new_fl |= S_SYNC;
- if (flags & FS_APPEND_FL)
+ if (flags & F2FS_APPEND_FL)
new_fl |= S_APPEND;
- if (flags & FS_IMMUTABLE_FL)
+ if (flags & F2FS_IMMUTABLE_FL)
new_fl |= S_IMMUTABLE;
- if (flags & FS_NOATIME_FL)
+ if (flags & F2FS_NOATIME_FL)
new_fl |= S_NOATIME;
- if (flags & FS_DIRSYNC_FL)
+ if (flags & F2FS_DIRSYNC_FL)
new_fl |= S_DIRSYNC;
+ if (f2fs_encrypted_inode(inode))
+ new_fl |= S_ENCRYPTED;
inode_set_flags(inode, new_fl,
- S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC);
+ S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|
+ S_ENCRYPTED);
}
static void __get_inode_rdev(struct inode *inode, struct f2fs_inode *ri)
@@ -66,7 +72,7 @@
{
block_t addr = le32_to_cpu(ri->i_addr[offset_in_addr(ri)]);
- if (addr != NEW_ADDR && addr != NULL_ADDR)
+ if (is_valid_blkaddr(addr))
return true;
return false;
}
@@ -111,7 +117,6 @@
static bool f2fs_enable_inode_chksum(struct f2fs_sb_info *sbi, struct page *page)
{
struct f2fs_inode *ri = &F2FS_NODE(page)->i;
- int extra_isize = le32_to_cpu(ri->i_extra_isize);
if (!f2fs_sb_has_inode_chksum(sbi->sb))
return false;
@@ -119,7 +124,8 @@
if (!RAW_IS_INODE(F2FS_NODE(page)) || !(ri->i_inline & F2FS_EXTRA_ATTR))
return false;
- if (!F2FS_FITS_IN_INODE(ri, extra_isize, i_inode_checksum))
+ if (!F2FS_FITS_IN_INODE(ri, le16_to_cpu(ri->i_extra_isize),
+ i_inode_checksum))
return false;
return true;
@@ -179,6 +185,21 @@
ri->i_inode_checksum = cpu_to_le32(f2fs_inode_chksum(sbi, page));
}
+static bool sanity_check_inode(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)
+ && !f2fs_has_extra_attr(inode)) {
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
+ f2fs_msg(sbi->sb, KERN_WARNING,
+ "%s: corrupted inode ino=%lx, run fsck to fix.",
+ __func__, inode->i_ino);
+ return false;
+ }
+ return true;
+}
+
static int do_read_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -188,10 +209,10 @@
projid_t i_projid;
/* Check if ino is within scope */
- if (check_nid_range(sbi, inode->i_ino))
+ if (f2fs_check_nid_range(sbi, inode->i_ino))
return -EINVAL;
- node_page = get_node_page(sbi, inode->i_ino);
+ node_page = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page))
return PTR_ERR(node_page);
@@ -211,8 +232,11 @@
inode->i_ctime.tv_nsec = le32_to_cpu(ri->i_ctime_nsec);
inode->i_mtime.tv_nsec = le32_to_cpu(ri->i_mtime_nsec);
inode->i_generation = le32_to_cpu(ri->i_generation);
-
- fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+ if (S_ISDIR(inode->i_mode))
+ fi->i_current_depth = le32_to_cpu(ri->i_current_depth);
+ else if (S_ISREG(inode->i_mode))
+ fi->i_gc_failures[GC_FAILURE_PIN] =
+ le16_to_cpu(ri->i_gc_failures);
fi->i_xattr_nid = le32_to_cpu(ri->i_xattr_nid);
fi->i_flags = le32_to_cpu(ri->i_flags);
fi->flags = 0;
@@ -228,6 +252,22 @@
fi->i_extra_isize = f2fs_has_extra_attr(inode) ?
le16_to_cpu(ri->i_extra_isize) : 0;
+ if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) {
+ fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
+ } else if (f2fs_has_inline_xattr(inode) ||
+ f2fs_has_inline_dentry(inode)) {
+ fi->i_inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
+ } else {
+
+ /*
+ * Previous inline data or directory always reserved 200 bytes
+ * in inode layout, even if inline_xattr is disabled. In order
+ * to keep inline_dentry's structure for backward compatibility,
+ * we get the space back only from inline_data.
+ */
+ fi->i_inline_xattr_size = 0;
+ }
+
/* check data exist */
if (f2fs_has_inline_data(inode) && !f2fs_exist_data(inode))
__recover_inline_status(inode, node_page);
@@ -238,10 +278,10 @@
if (__written_first_block(ri))
set_inode_flag(inode, FI_FIRST_BLOCK_WRITTEN);
- if (!need_inode_block_update(sbi, inode->i_ino))
+ if (!f2fs_need_inode_block_update(sbi, inode->i_ino))
fi->last_disk_size = inode->i_size;
- if (fi->i_flags & FS_PROJINHERIT_FL)
+ if (fi->i_flags & F2FS_PROJINHERIT_FL)
set_inode_flag(inode, FI_PROJ_INHERIT);
if (f2fs_has_extra_attr(inode) && f2fs_sb_has_project_quota(sbi->sb) &&
@@ -251,6 +291,16 @@
i_projid = F2FS_DEF_PROJID;
fi->i_projid = make_kprojid(&init_user_ns, i_projid);
+ if (f2fs_has_extra_attr(inode) && f2fs_sb_has_inode_crtime(sbi->sb) &&
+ F2FS_FITS_IN_INODE(ri, fi->i_extra_isize, i_crtime)) {
+ fi->i_crtime.tv_sec = le64_to_cpu(ri->i_crtime);
+ fi->i_crtime.tv_nsec = le32_to_cpu(ri->i_crtime_nsec);
+ }
+
+ F2FS_I(inode)->i_disk_time[0] = inode->i_atime;
+ F2FS_I(inode)->i_disk_time[1] = inode->i_ctime;
+ F2FS_I(inode)->i_disk_time[2] = inode->i_mtime;
+ F2FS_I(inode)->i_disk_time[3] = F2FS_I(inode)->i_crtime;
f2fs_put_page(node_page, 1);
stat_inc_inline_xattr(inode);
@@ -280,13 +330,17 @@
ret = do_read_inode(inode);
if (ret)
goto bad_inode;
+ if (!sanity_check_inode(inode)) {
+ ret = -EINVAL;
+ goto bad_inode;
+ }
make_now:
if (ino == F2FS_NODE_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_node_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (ino == F2FS_META_INO(sbi)) {
inode->i_mapping->a_ops = &f2fs_meta_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_ZERO);
+ mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
} else if (S_ISREG(inode->i_mode)) {
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
@@ -295,7 +349,7 @@
inode->i_op = &f2fs_dir_inode_operations;
inode->i_fop = &f2fs_dir_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_HIGH_ZERO);
+ inode_nohighmem(inode);
} else if (S_ISLNK(inode->i_mode)) {
if (f2fs_encrypted_inode(inode))
inode->i_op = &f2fs_encrypted_symlink_inode_operations;
@@ -336,14 +390,15 @@
return inode;
}
-int update_inode(struct inode *inode, struct page *node_page)
+void f2fs_update_inode(struct inode *inode, struct page *node_page)
{
struct f2fs_inode *ri;
struct extent_tree *et = F2FS_I(inode)->extent_tree;
- f2fs_inode_synced(inode);
-
f2fs_wait_on_page_writeback(node_page, NODE, true);
+ set_page_dirty(node_page);
+
+ f2fs_inode_synced(inode);
ri = F2FS_INODE(node_page);
@@ -370,7 +425,12 @@
ri->i_atime_nsec = cpu_to_le32(inode->i_atime.tv_nsec);
ri->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
ri->i_mtime_nsec = cpu_to_le32(inode->i_mtime.tv_nsec);
- ri->i_current_depth = cpu_to_le32(F2FS_I(inode)->i_current_depth);
+ if (S_ISDIR(inode->i_mode))
+ ri->i_current_depth =
+ cpu_to_le32(F2FS_I(inode)->i_current_depth);
+ else if (S_ISREG(inode->i_mode))
+ ri->i_gc_failures =
+ cpu_to_le16(F2FS_I(inode)->i_gc_failures[GC_FAILURE_PIN]);
ri->i_xattr_nid = cpu_to_le32(F2FS_I(inode)->i_xattr_nid);
ri->i_flags = cpu_to_le32(F2FS_I(inode)->i_flags);
ri->i_pino = cpu_to_le32(F2FS_I(inode)->i_pino);
@@ -380,6 +440,10 @@
if (f2fs_has_extra_attr(inode)) {
ri->i_extra_isize = cpu_to_le16(F2FS_I(inode)->i_extra_isize);
+ if (f2fs_sb_has_flexible_inline_xattr(F2FS_I_SB(inode)->sb))
+ ri->i_inline_xattr_size =
+ cpu_to_le16(F2FS_I(inode)->i_inline_xattr_size);
+
if (f2fs_sb_has_project_quota(F2FS_I_SB(inode)->sb) &&
F2FS_FITS_IN_INODE(ri, F2FS_I(inode)->i_extra_isize,
i_projid)) {
@@ -389,25 +453,35 @@
F2FS_I(inode)->i_projid);
ri->i_projid = cpu_to_le32(i_projid);
}
+
+ if (f2fs_sb_has_inode_crtime(F2FS_I_SB(inode)->sb) &&
+ F2FS_FITS_IN_INODE(ri, F2FS_I(inode)->i_extra_isize,
+ i_crtime)) {
+ ri->i_crtime =
+ cpu_to_le64(F2FS_I(inode)->i_crtime.tv_sec);
+ ri->i_crtime_nsec =
+ cpu_to_le32(F2FS_I(inode)->i_crtime.tv_nsec);
+ }
}
__set_inode_rdev(inode, ri);
- set_cold_node(inode, node_page);
/* deleted inode */
if (inode->i_nlink == 0)
clear_inline_node(node_page);
- return set_page_dirty(node_page);
+ F2FS_I(inode)->i_disk_time[0] = inode->i_atime;
+ F2FS_I(inode)->i_disk_time[1] = inode->i_ctime;
+ F2FS_I(inode)->i_disk_time[2] = inode->i_mtime;
+ F2FS_I(inode)->i_disk_time[3] = F2FS_I(inode)->i_crtime;
}
-int update_inode_page(struct inode *inode)
+void f2fs_update_inode_page(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct page *node_page;
- int ret = 0;
retry:
- node_page = get_node_page(sbi, inode->i_ino);
+ node_page = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(node_page)) {
int err = PTR_ERR(node_page);
if (err == -ENOMEM) {
@@ -416,11 +490,10 @@
} else if (err != -ENOENT) {
f2fs_stop_checkpoint(sbi, false);
}
- return 0;
+ return;
}
- ret = update_inode(inode, node_page);
+ f2fs_update_inode(inode, node_page);
f2fs_put_page(node_page, 1);
- return ret;
}
int f2fs_write_inode(struct inode *inode, struct writeback_control *wbc)
@@ -438,7 +511,7 @@
* We need to balance fs here to prevent from producing dirty node pages
* during the urgent cleaning time when runing out of free sections.
*/
- update_inode_page(inode);
+ f2fs_update_inode_page(inode);
if (wbc && wbc->nr_to_write)
f2fs_balance_fs(sbi, true);
return 0;
@@ -455,7 +528,7 @@
/* some remained atomic pages should discarded */
if (f2fs_is_atomic_file(inode))
- drop_inmem_pages(inode);
+ f2fs_drop_inmem_pages(inode);
trace_f2fs_evict_inode(inode);
truncate_inode_pages_final(&inode->i_data);
@@ -465,7 +538,7 @@
goto out_clear;
f2fs_bug_on(sbi, get_dirty_pages(inode));
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
f2fs_destroy_extent_tree(inode);
@@ -474,8 +547,9 @@
dquot_initialize(inode);
- remove_ino_entry(sbi, inode->i_ino, APPEND_INO);
- remove_ino_entry(sbi, inode->i_ino, UPDATE_INO);
+ f2fs_remove_ino_entry(sbi, inode->i_ino, APPEND_INO);
+ f2fs_remove_ino_entry(sbi, inode->i_ino, UPDATE_INO);
+ f2fs_remove_ino_entry(sbi, inode->i_ino, FLUSH_INO);
sb_start_intwrite(inode->i_sb);
set_inode_flag(inode, FI_NO_ALLOC);
@@ -492,7 +566,7 @@
#endif
if (!err) {
f2fs_lock_op(sbi);
- err = remove_inode_page(inode);
+ err = f2fs_remove_inode_page(inode);
f2fs_unlock_op(sbi);
if (err == -ENOENT)
err = 0;
@@ -505,7 +579,7 @@
}
if (err)
- update_inode_page(inode);
+ f2fs_update_inode_page(inode);
dquot_free_inode(inode);
sb_end_intwrite(inode->i_sb);
no_delete:
@@ -515,8 +589,10 @@
stat_dec_inline_dir(inode);
stat_dec_inline_inode(inode);
- if (!is_set_ckpt_flags(sbi, CP_ERROR_FLAG))
+ if (likely(!is_set_ckpt_flags(sbi, CP_ERROR_FLAG)))
f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));
+ else
+ f2fs_inode_synced(inode);
/* ino == 0, if f2fs_new_inode() was failed t*/
if (inode->i_ino)
@@ -526,27 +602,27 @@
invalidate_mapping_pages(NODE_MAPPING(sbi), xnid, xnid);
if (inode->i_nlink) {
if (is_inode_flag_set(inode, FI_APPEND_WRITE))
- add_ino_entry(sbi, inode->i_ino, APPEND_INO);
+ f2fs_add_ino_entry(sbi, inode->i_ino, APPEND_INO);
if (is_inode_flag_set(inode, FI_UPDATE_WRITE))
- add_ino_entry(sbi, inode->i_ino, UPDATE_INO);
+ f2fs_add_ino_entry(sbi, inode->i_ino, UPDATE_INO);
}
if (is_inode_flag_set(inode, FI_FREE_NID)) {
- alloc_nid_failed(sbi, inode->i_ino);
+ f2fs_alloc_nid_failed(sbi, inode->i_ino);
clear_inode_flag(inode, FI_FREE_NID);
} else {
/*
* If xattr nid is corrupted, we can reach out error condition,
- * err & !exist_written_data(sbi, inode->i_ino, ORPHAN_INO)).
- * In that case, check_nid_range() is enough to give a clue.
+ * err & !f2fs_exist_written_data(sbi, inode->i_ino, ORPHAN_INO)).
+ * In that case, f2fs_check_nid_range() is enough to give a clue.
*/
}
out_clear:
- fscrypt_put_encryption_info(inode, NULL);
+ fscrypt_put_encryption_info(inode);
clear_inode(inode);
}
/* caller should call f2fs_lock_op() */
-void handle_failed_inode(struct inode *inode)
+void f2fs_handle_failed_inode(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct node_info ni;
@@ -561,7 +637,7 @@
* we must call this to avoid inode being remained as dirty, resulting
* in a panic when flushing dirty inodes in gdirty_list.
*/
- update_inode_page(inode);
+ f2fs_update_inode_page(inode);
f2fs_inode_synced(inode);
/* don't make bad inode, since it becomes a regular file. */
@@ -572,18 +648,18 @@
* so we can prevent losing this orphan when encoutering checkpoint
* and following suddenly power-off.
*/
- get_node_info(sbi, inode->i_ino, &ni);
+ f2fs_get_node_info(sbi, inode->i_ino, &ni);
if (ni.blk_addr != NULL_ADDR) {
- int err = acquire_orphan_inode(sbi);
+ int err = f2fs_acquire_orphan_inode(sbi);
if (err) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
f2fs_msg(sbi->sb, KERN_WARNING,
"Too many orphan inodes, run fsck to fix.");
} else {
- add_orphan_inode(inode);
+ f2fs_add_orphan_inode(inode);
}
- alloc_nid_done(sbi, inode->i_ino);
+ f2fs_alloc_nid_done(sbi, inode->i_ino);
} else {
set_inode_flag(inode, FI_FREE_NID);
}
diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c
index b80e7db..64050c8 100644
--- a/fs/f2fs/namei.c
+++ b/fs/f2fs/namei.c
@@ -29,6 +29,7 @@
nid_t ino;
struct inode *inode;
bool nid_free = false;
+ int xattr_size = 0;
int err;
inode = new_inode(dir->i_sb);
@@ -36,7 +37,7 @@
return ERR_PTR(-ENOMEM);
f2fs_lock_op(sbi);
- if (!alloc_nid(sbi, &ino)) {
+ if (!f2fs_alloc_nid(sbi, &ino)) {
f2fs_unlock_op(sbi);
err = -ENOSPC;
goto fail;
@@ -49,9 +50,13 @@
inode->i_ino = ino;
inode->i_blocks = 0;
- inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
+ inode->i_mtime = inode->i_atime = inode->i_ctime =
+ F2FS_I(inode)->i_crtime = current_time(inode);
inode->i_generation = sbi->s_next_generation++;
+ if (S_ISDIR(inode->i_mode))
+ F2FS_I(inode)->i_current_depth = 1;
+
err = insert_inode_locked(inode);
if (err) {
err = -EINVAL;
@@ -59,7 +64,7 @@
}
if (f2fs_sb_has_project_quota(sbi->sb) &&
- (F2FS_I(dir)->i_flags & FS_PROJINHERIT_FL))
+ (F2FS_I(dir)->i_flags & F2FS_PROJINHERIT_FL))
F2FS_I(inode)->i_projid = F2FS_I(dir)->i_projid;
else
F2FS_I(inode)->i_projid = make_kprojid(&init_user_ns,
@@ -73,12 +78,13 @@
if (err)
goto fail_drop;
- /* If the directory encrypted, then we should encrypt the inode. */
- if (f2fs_encrypted_inode(dir) && f2fs_may_encrypt(inode))
- f2fs_set_encrypted_inode(inode);
-
set_inode_flag(inode, FI_NEW_INODE);
+ /* If the directory encrypted, then we should encrypt the inode. */
+ if ((f2fs_encrypted_inode(dir) || DUMMY_ENCRYPTION_ENABLED(sbi)) &&
+ f2fs_may_encrypt(inode))
+ f2fs_set_encrypted_inode(inode);
+
if (f2fs_sb_has_extra_attr(sbi->sb)) {
set_inode_flag(inode, FI_EXTRA_ATTR);
F2FS_I(inode)->i_extra_isize = F2FS_TOTAL_EXTRA_ATTR_SIZE;
@@ -86,11 +92,23 @@
if (test_opt(sbi, INLINE_XATTR))
set_inode_flag(inode, FI_INLINE_XATTR);
+
if (test_opt(sbi, INLINE_DATA) && f2fs_may_inline_data(inode))
set_inode_flag(inode, FI_INLINE_DATA);
if (f2fs_may_inline_dentry(inode))
set_inode_flag(inode, FI_INLINE_DENTRY);
+ if (f2fs_sb_has_flexible_inline_xattr(sbi->sb)) {
+ f2fs_bug_on(sbi, !f2fs_has_extra_attr(inode));
+ if (f2fs_has_inline_xattr(inode))
+ xattr_size = F2FS_OPTION(sbi).inline_xattr_size;
+ /* Otherwise, will be 0 */
+ } else if (f2fs_has_inline_xattr(inode) ||
+ f2fs_has_inline_dentry(inode)) {
+ xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
+ }
+ F2FS_I(inode)->i_inline_xattr_size = xattr_size;
+
f2fs_init_extent_tree(inode, NULL);
stat_inc_inline_xattr(inode);
@@ -101,9 +119,9 @@
f2fs_mask_flags(mode, F2FS_I(dir)->i_flags & F2FS_FL_INHERITED);
if (S_ISDIR(inode->i_mode))
- F2FS_I(inode)->i_flags |= FS_INDEX_FL;
+ F2FS_I(inode)->i_flags |= F2FS_INDEX_FL;
- if (F2FS_I(inode)->i_flags & FS_PROJINHERIT_FL)
+ if (F2FS_I(inode)->i_flags & F2FS_PROJINHERIT_FL)
set_inode_flag(inode, FI_PROJ_INHERIT);
trace_f2fs_new_inode(inode, 0);
@@ -128,7 +146,7 @@
return ERR_PTR(err);
}
-static int is_multimedia_file(const unsigned char *s, const char *sub)
+static int is_extension_exist(const unsigned char *s, const char *sub)
{
size_t slen = strlen(s);
size_t sublen = strlen(sub);
@@ -154,19 +172,94 @@
/*
* Set multimedia files as cold files for hot/cold data separation
*/
-static inline void set_cold_files(struct f2fs_sb_info *sbi, struct inode *inode,
+static inline void set_file_temperature(struct f2fs_sb_info *sbi, struct inode *inode,
const unsigned char *name)
{
- int i;
- __u8 (*extlist)[8] = sbi->raw_super->extension_list;
+ __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list;
+ int i, cold_count, hot_count;
- int count = le32_to_cpu(sbi->raw_super->extension_count);
- for (i = 0; i < count; i++) {
- if (is_multimedia_file(name, extlist[i])) {
+ down_read(&sbi->sb_lock);
+
+ cold_count = le32_to_cpu(sbi->raw_super->extension_count);
+ hot_count = sbi->raw_super->hot_ext_count;
+
+ for (i = 0; i < cold_count + hot_count; i++) {
+ if (!is_extension_exist(name, extlist[i]))
+ continue;
+ if (i < cold_count)
file_set_cold(inode);
- break;
- }
+ else
+ file_set_hot(inode);
+ break;
}
+
+ up_read(&sbi->sb_lock);
+}
+
+int f2fs_update_extension_list(struct f2fs_sb_info *sbi, const char *name,
+ bool hot, bool set)
+{
+ __u8 (*extlist)[F2FS_EXTENSION_LEN] = sbi->raw_super->extension_list;
+ int cold_count = le32_to_cpu(sbi->raw_super->extension_count);
+ int hot_count = sbi->raw_super->hot_ext_count;
+ int total_count = cold_count + hot_count;
+ int start, count;
+ int i;
+
+ if (set) {
+ if (total_count == F2FS_MAX_EXTENSION)
+ return -EINVAL;
+ } else {
+ if (!hot && !cold_count)
+ return -EINVAL;
+ if (hot && !hot_count)
+ return -EINVAL;
+ }
+
+ if (hot) {
+ start = cold_count;
+ count = total_count;
+ } else {
+ start = 0;
+ count = cold_count;
+ }
+
+ for (i = start; i < count; i++) {
+ if (strcmp(name, extlist[i]))
+ continue;
+
+ if (set)
+ return -EINVAL;
+
+ memcpy(extlist[i], extlist[i + 1],
+ F2FS_EXTENSION_LEN * (total_count - i - 1));
+ memset(extlist[total_count - 1], 0, F2FS_EXTENSION_LEN);
+ if (hot)
+ sbi->raw_super->hot_ext_count = hot_count - 1;
+ else
+ sbi->raw_super->extension_count =
+ cpu_to_le32(cold_count - 1);
+ return 0;
+ }
+
+ if (!set)
+ return -EINVAL;
+
+ if (hot) {
+ strncpy(extlist[count], name, strlen(name));
+ sbi->raw_super->hot_ext_count = hot_count + 1;
+ } else {
+ char buf[F2FS_MAX_EXTENSION][F2FS_EXTENSION_LEN];
+
+ memcpy(buf, &extlist[cold_count],
+ F2FS_EXTENSION_LEN * hot_count);
+ memset(extlist[cold_count], 0, F2FS_EXTENSION_LEN);
+ strncpy(extlist[cold_count], name, strlen(name));
+ memcpy(&extlist[cold_count + 1], buf,
+ F2FS_EXTENSION_LEN * hot_count);
+ sbi->raw_super->extension_count = cpu_to_le32(cold_count + 1);
+ }
+ return 0;
}
static int f2fs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
@@ -177,6 +270,9 @@
nid_t ino = 0;
int err;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
err = dquot_initialize(dir);
if (err)
return err;
@@ -186,7 +282,7 @@
return PTR_ERR(inode);
if (!test_opt(sbi, DISABLE_EXT_IDENTIFY))
- set_cold_files(sbi, inode, dentry->d_name.name);
+ set_file_temperature(sbi, inode, dentry->d_name.name);
inode->i_op = &f2fs_file_inode_operations;
inode->i_fop = &f2fs_file_operations;
@@ -199,7 +295,7 @@
goto out;
f2fs_unlock_op(sbi);
- alloc_nid_done(sbi, ino);
+ f2fs_alloc_nid_done(sbi, ino);
d_instantiate_new(dentry, inode);
@@ -209,7 +305,7 @@
f2fs_balance_fs(sbi, true);
return 0;
out:
- handle_failed_inode(inode);
+ f2fs_handle_failed_inode(inode);
return err;
}
@@ -220,9 +316,12 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
int err;
- if (f2fs_encrypted_inode(dir) &&
- !fscrypt_has_permitted_context(dir, inode))
- return -EPERM;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
+ err = fscrypt_prepare_link(old_dentry, dir, dentry);
+ if (err)
+ return err;
if (is_inode_flag_set(dir, FI_PROJ_INHERIT) &&
(!projid_eq(F2FS_I(dir)->i_projid,
@@ -296,26 +395,23 @@
de = f2fs_find_entry(dir, &dot, &page);
if (de) {
- f2fs_dentry_kunmap(dir, page);
f2fs_put_page(page, 0);
} else if (IS_ERR(page)) {
err = PTR_ERR(page);
goto out;
} else {
- err = __f2fs_add_link(dir, &dot, NULL, dir->i_ino, S_IFDIR);
+ err = f2fs_do_add_link(dir, &dot, NULL, dir->i_ino, S_IFDIR);
if (err)
goto out;
}
de = f2fs_find_entry(dir, &dotdot, &page);
- if (de) {
- f2fs_dentry_kunmap(dir, page);
+ if (de)
f2fs_put_page(page, 0);
- } else if (IS_ERR(page)) {
+ else if (IS_ERR(page))
err = PTR_ERR(page);
- } else {
- err = __f2fs_add_link(dir, &dotdot, NULL, pino, S_IFDIR);
- }
+ else
+ err = f2fs_do_add_link(dir, &dotdot, NULL, pino, S_IFDIR);
out:
if (!err)
clear_inode_flag(dir, FI_INLINE_DOTS);
@@ -330,53 +426,50 @@
struct inode *inode = NULL;
struct f2fs_dir_entry *de;
struct page *page;
- nid_t ino;
+ struct dentry *new;
+ nid_t ino = -1;
int err = 0;
unsigned int root_ino = F2FS_ROOT_INO(F2FS_I_SB(dir));
- if (f2fs_encrypted_inode(dir)) {
- int res = fscrypt_get_encryption_info(dir);
+ trace_f2fs_lookup_start(dir, dentry, flags);
- /*
- * DCACHE_ENCRYPTED_WITH_KEY is set if the dentry is
- * created while the directory was encrypted and we
- * don't have access to the key.
- */
- if (fscrypt_has_encryption_key(dir))
- fscrypt_set_encrypted_dentry(dentry);
- fscrypt_set_d_op(dentry);
- if (res && res != -ENOKEY)
- return ERR_PTR(res);
+ err = fscrypt_prepare_lookup(dir, dentry, flags);
+ if (err)
+ goto out;
+
+ if (dentry->d_name.len > F2FS_NAME_LEN) {
+ err = -ENAMETOOLONG;
+ goto out;
}
- if (dentry->d_name.len > F2FS_NAME_LEN)
- return ERR_PTR(-ENAMETOOLONG);
-
de = f2fs_find_entry(dir, &dentry->d_name, &page);
if (!de) {
- if (IS_ERR(page))
- return (struct dentry *)page;
- return d_splice_alias(inode, dentry);
+ if (IS_ERR(page)) {
+ err = PTR_ERR(page);
+ goto out;
+ }
+ goto out_splice;
}
ino = le32_to_cpu(de->ino);
- f2fs_dentry_kunmap(dir, page);
f2fs_put_page(page, 0);
inode = f2fs_iget(dir->i_sb, ino);
- if (IS_ERR(inode))
- return ERR_CAST(inode);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto out;
+ }
if ((dir->i_ino == root_ino) && f2fs_has_inline_dots(dir)) {
err = __recover_dot_dentries(dir, root_ino);
if (err)
- goto err_out;
+ goto out_iput;
}
if (f2fs_has_inline_dots(inode)) {
err = __recover_dot_dentries(inode, dir->i_ino);
if (err)
- goto err_out;
+ goto out_iput;
}
if (f2fs_encrypted_inode(dir) &&
(S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode)) &&
@@ -385,12 +478,18 @@
"Inconsistent encryption contexts: %lu/%lu",
dir->i_ino, inode->i_ino);
err = -EPERM;
- goto err_out;
+ goto out_iput;
}
- return d_splice_alias(inode, dentry);
-
-err_out:
+out_splice:
+ new = d_splice_alias(inode, dentry);
+ if (IS_ERR(new))
+ err = PTR_ERR(new);
+ trace_f2fs_lookup_end(dir, dentry, ino, err);
+ return new;
+out_iput:
iput(inode);
+out:
+ trace_f2fs_lookup_end(dir, dentry, ino, err);
return ERR_PTR(err);
}
@@ -404,9 +503,15 @@
trace_f2fs_unlink_enter(dir, dentry);
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
err = dquot_initialize(dir);
if (err)
return err;
+ err = dquot_initialize(inode);
+ if (err)
+ return err;
de = f2fs_find_entry(dir, &dentry->d_name, &page);
if (!de) {
@@ -418,10 +523,9 @@
f2fs_balance_fs(sbi, true);
f2fs_lock_op(sbi);
- err = acquire_orphan_inode(sbi);
+ err = f2fs_acquire_orphan_inode(sbi);
if (err) {
f2fs_unlock_op(sbi);
- f2fs_dentry_kunmap(dir, page);
f2fs_put_page(page, 0);
goto fail;
}
@@ -455,24 +559,16 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
struct inode *inode;
size_t len = strlen(symname);
- struct fscrypt_str disk_link = FSTR_INIT((char *)symname, len + 1);
- struct fscrypt_symlink_data *sd = NULL;
+ struct fscrypt_str disk_link;
int err;
- if (f2fs_encrypted_inode(dir)) {
- err = fscrypt_get_encryption_info(dir);
- if (err)
- return err;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
- if (!fscrypt_has_encryption_key(dir))
- return -ENOKEY;
-
- disk_link.len = (fscrypt_fname_encrypted_size(dir, len) +
- sizeof(struct fscrypt_symlink_data));
- }
-
- if (disk_link.len > dir->i_sb->s_blocksize)
- return -ENAMETOOLONG;
+ err = fscrypt_prepare_symlink(dir, symname, len, dir->i_sb->s_blocksize,
+ &disk_link);
+ if (err)
+ return err;
err = dquot_initialize(dir);
if (err)
@@ -482,7 +578,7 @@
if (IS_ERR(inode))
return PTR_ERR(inode);
- if (f2fs_encrypted_inode(inode))
+ if (IS_ENCRYPTED(inode))
inode->i_op = &f2fs_encrypted_symlink_inode_operations;
else
inode->i_op = &f2fs_symlink_inode_operations;
@@ -492,38 +588,13 @@
f2fs_lock_op(sbi);
err = f2fs_add_link(dentry, inode);
if (err)
- goto out;
+ goto out_f2fs_handle_failed_inode;
f2fs_unlock_op(sbi);
- alloc_nid_done(sbi, inode->i_ino);
+ f2fs_alloc_nid_done(sbi, inode->i_ino);
- if (f2fs_encrypted_inode(inode)) {
- struct qstr istr = QSTR_INIT(symname, len);
- struct fscrypt_str ostr;
-
- sd = kzalloc(disk_link.len, GFP_NOFS);
- if (!sd) {
- err = -ENOMEM;
- goto err_out;
- }
-
- err = fscrypt_get_encryption_info(inode);
- if (err)
- goto err_out;
-
- if (!fscrypt_has_encryption_key(inode)) {
- err = -ENOKEY;
- goto err_out;
- }
-
- ostr.name = sd->encrypted_path;
- ostr.len = disk_link.len;
- err = fscrypt_fname_usr_to_disk(inode, &istr, &ostr);
- if (err)
- goto err_out;
-
- sd->len = cpu_to_le16(ostr.len);
- disk_link.name = (char *)sd;
- }
+ err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link);
+ if (err)
+ goto err_out;
err = page_symlink(inode, disk_link.name, disk_link.len);
@@ -549,12 +620,14 @@
f2fs_unlink(dir, dentry);
}
- kfree(sd);
-
f2fs_balance_fs(sbi, true);
- return err;
-out:
- handle_failed_inode(inode);
+ goto out_free_encrypted_link;
+
+out_f2fs_handle_failed_inode:
+ f2fs_handle_failed_inode(inode);
+out_free_encrypted_link:
+ if (disk_link.name != (unsigned char *)symname)
+ kfree(disk_link.name);
return err;
}
@@ -564,6 +637,9 @@
struct inode *inode;
int err;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
err = dquot_initialize(dir);
if (err)
return err;
@@ -575,7 +651,7 @@
inode->i_op = &f2fs_dir_inode_operations;
inode->i_fop = &f2fs_dir_operations;
inode->i_mapping->a_ops = &f2fs_dblock_aops;
- mapping_set_gfp_mask(inode->i_mapping, GFP_F2FS_HIGH_ZERO);
+ inode_nohighmem(inode);
set_inode_flag(inode, FI_INC_LINK);
f2fs_lock_op(sbi);
@@ -584,7 +660,7 @@
goto out_fail;
f2fs_unlock_op(sbi);
- alloc_nid_done(sbi, inode->i_ino);
+ f2fs_alloc_nid_done(sbi, inode->i_ino);
d_instantiate_new(dentry, inode);
@@ -596,7 +672,7 @@
out_fail:
clear_inode_flag(inode, FI_INC_LINK);
- handle_failed_inode(inode);
+ f2fs_handle_failed_inode(inode);
return err;
}
@@ -615,6 +691,9 @@
struct inode *inode;
int err = 0;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
err = dquot_initialize(dir);
if (err)
return err;
@@ -632,7 +711,7 @@
goto out;
f2fs_unlock_op(sbi);
- alloc_nid_done(sbi, inode->i_ino);
+ f2fs_alloc_nid_done(sbi, inode->i_ino);
d_instantiate_new(dentry, inode);
@@ -642,7 +721,7 @@
f2fs_balance_fs(sbi, true);
return 0;
out:
- handle_failed_inode(inode);
+ f2fs_handle_failed_inode(inode);
return err;
}
@@ -671,7 +750,7 @@
}
f2fs_lock_op(sbi);
- err = acquire_orphan_inode(sbi);
+ err = f2fs_acquire_orphan_inode(sbi);
if (err)
goto out;
@@ -683,8 +762,8 @@
* add this non-linked tmpfile to orphan list, in this way we could
* remove all unused data of tmpfile after abnormal power-off.
*/
- add_orphan_inode(inode);
- alloc_nid_done(sbi, inode->i_ino);
+ f2fs_add_orphan_inode(inode);
+ f2fs_alloc_nid_done(sbi, inode->i_ino);
if (whiteout) {
f2fs_i_links_write(inode, false);
@@ -700,15 +779,20 @@
return 0;
release_out:
- release_orphan_inode(sbi);
+ f2fs_release_orphan_inode(sbi);
out:
- handle_failed_inode(inode);
+ f2fs_handle_failed_inode(inode);
return err;
}
static int f2fs_tmpfile(struct inode *dir, struct dentry *dentry, umode_t mode)
{
- if (f2fs_encrypted_inode(dir)) {
+ struct f2fs_sb_info *sbi = F2FS_I_SB(dir);
+
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
+
+ if (f2fs_encrypted_inode(dir) || DUMMY_ENCRYPTION_ENABLED(sbi)) {
int err = fscrypt_get_encryption_info(dir);
if (err)
return err;
@@ -719,6 +803,9 @@
static int f2fs_create_whiteout(struct inode *dir, struct inode **whiteout)
{
+ if (unlikely(f2fs_cp_error(F2FS_I_SB(dir))))
+ return -EIO;
+
return __f2fs_tmpfile(dir, NULL, S_IFCHR | WHITEOUT_MODE, whiteout);
}
@@ -738,17 +825,8 @@
bool is_old_inline = f2fs_has_inline_dentry(old_dir);
int err = -ENOENT;
- if ((f2fs_encrypted_inode(old_dir) &&
- !fscrypt_has_encryption_key(old_dir)) ||
- (f2fs_encrypted_inode(new_dir) &&
- !fscrypt_has_encryption_key(new_dir)))
- return -ENOKEY;
-
- if ((old_dir != new_dir) && f2fs_encrypted_inode(new_dir) &&
- !fscrypt_has_permitted_context(new_dir, old_inode)) {
- err = -EPERM;
- goto out;
- }
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
if (is_inode_flag_set(new_dir, FI_PROJ_INHERIT) &&
(!projid_eq(F2FS_I(new_dir)->i_projid,
@@ -763,6 +841,12 @@
if (err)
goto out;
+ if (new_inode) {
+ err = dquot_initialize(new_inode);
+ if (err)
+ goto out;
+ }
+
old_entry = f2fs_find_entry(old_dir, &old_dentry->d_name, &old_page);
if (!old_entry) {
if (IS_ERR(old_page))
@@ -804,7 +888,7 @@
f2fs_lock_op(sbi);
- err = acquire_orphan_inode(sbi);
+ err = f2fs_acquire_orphan_inode(sbi);
if (err)
goto put_out_dir;
@@ -818,9 +902,9 @@
up_write(&F2FS_I(new_inode)->i_sem);
if (!new_inode->i_nlink)
- add_orphan_inode(new_inode);
+ f2fs_add_orphan_inode(new_inode);
else
- release_orphan_inode(sbi);
+ f2fs_release_orphan_inode(sbi);
} else {
f2fs_balance_fs(sbi, true);
@@ -881,15 +965,19 @@
}
if (old_dir_entry) {
- if (old_dir != new_dir && !whiteout) {
+ if (old_dir != new_dir && !whiteout)
f2fs_set_link(old_inode, old_dir_entry,
old_dir_page, new_dir);
- } else {
- f2fs_dentry_kunmap(old_inode, old_dir_page);
+ else
f2fs_put_page(old_dir_page, 0);
- }
f2fs_i_links_write(old_dir, false);
}
+ if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) {
+ f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO);
+ if (S_ISDIR(old_inode->i_mode))
+ f2fs_add_ino_entry(sbi, old_inode->i_ino,
+ TRANS_DIR_INO);
+ }
f2fs_unlock_op(sbi);
@@ -899,20 +987,15 @@
put_out_dir:
f2fs_unlock_op(sbi);
- if (new_page) {
- f2fs_dentry_kunmap(new_dir, new_page);
+ if (new_page)
f2fs_put_page(new_page, 0);
- }
out_whiteout:
if (whiteout)
iput(whiteout);
out_dir:
- if (old_dir_entry) {
- f2fs_dentry_kunmap(old_inode, old_dir_page);
+ if (old_dir_entry)
f2fs_put_page(old_dir_page, 0);
- }
out_old:
- f2fs_dentry_kunmap(old_dir, old_page);
f2fs_put_page(old_page, 0);
out:
return err;
@@ -931,17 +1014,8 @@
int old_nlink = 0, new_nlink = 0;
int err = -ENOENT;
- if ((f2fs_encrypted_inode(old_dir) &&
- !fscrypt_has_encryption_key(old_dir)) ||
- (f2fs_encrypted_inode(new_dir) &&
- !fscrypt_has_encryption_key(new_dir)))
- return -ENOKEY;
-
- if ((f2fs_encrypted_inode(old_dir) || f2fs_encrypted_inode(new_dir)) &&
- (old_dir != new_dir) &&
- (!fscrypt_has_permitted_context(new_dir, old_inode) ||
- !fscrypt_has_permitted_context(old_dir, new_inode)))
- return -EPERM;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
if ((is_inode_flag_set(new_dir, FI_PROJ_INHERIT) &&
!projid_eq(F2FS_I(new_dir)->i_projid,
@@ -1053,6 +1127,11 @@
}
f2fs_mark_inode_dirty_sync(new_dir, false);
+ if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT) {
+ f2fs_add_ino_entry(sbi, old_dir->i_ino, TRANS_DIR_INO);
+ f2fs_add_ino_entry(sbi, new_dir->i_ino, TRANS_DIR_INO);
+ }
+
f2fs_unlock_op(sbi);
if (IS_DIRSYNC(old_dir) || IS_DIRSYNC(new_dir))
@@ -1060,19 +1139,15 @@
return 0;
out_new_dir:
if (new_dir_entry) {
- f2fs_dentry_kunmap(new_inode, new_dir_page);
f2fs_put_page(new_dir_page, 0);
}
out_old_dir:
if (old_dir_entry) {
- f2fs_dentry_kunmap(old_inode, old_dir_page);
f2fs_put_page(old_dir_page, 0);
}
out_new:
- f2fs_dentry_kunmap(new_dir, new_page);
f2fs_put_page(new_page, 0);
out_old:
- f2fs_dentry_kunmap(old_dir, old_page);
f2fs_put_page(old_page, 0);
out:
return err;
@@ -1082,9 +1157,16 @@
struct inode *new_dir, struct dentry *new_dentry,
unsigned int flags)
{
+ int err;
+
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
return -EINVAL;
+ err = fscrypt_prepare_rename(old_dir, old_dentry, new_dir, new_dentry,
+ flags);
+ if (err)
+ return err;
+
if (flags & RENAME_EXCHANGE) {
return f2fs_cross_rename(old_dir, old_dentry,
new_dir, new_dentry);
@@ -1100,68 +1182,20 @@
struct inode *inode,
struct delayed_call *done)
{
- struct page *cpage = NULL;
- char *caddr, *paddr = NULL;
- struct fscrypt_str cstr = FSTR_INIT(NULL, 0);
- struct fscrypt_str pstr = FSTR_INIT(NULL, 0);
- struct fscrypt_symlink_data *sd;
- u32 max_size = inode->i_sb->s_blocksize;
- int res;
+ struct page *page;
+ const char *target;
if (!dentry)
return ERR_PTR(-ECHILD);
- res = fscrypt_get_encryption_info(inode);
- if (res)
- return ERR_PTR(res);
+ page = read_mapping_page(inode->i_mapping, 0, NULL);
+ if (IS_ERR(page))
+ return ERR_CAST(page);
- cpage = read_mapping_page(inode->i_mapping, 0, NULL);
- if (IS_ERR(cpage))
- return ERR_CAST(cpage);
- caddr = page_address(cpage);
-
- /* Symlink is encrypted */
- sd = (struct fscrypt_symlink_data *)caddr;
- cstr.name = sd->encrypted_path;
- cstr.len = le16_to_cpu(sd->len);
-
- /* this is broken symlink case */
- if (unlikely(cstr.len == 0)) {
- res = -ENOENT;
- goto errout;
- }
-
- if ((cstr.len + sizeof(struct fscrypt_symlink_data) - 1) > max_size) {
- /* Symlink data on the disk is corrupted */
- res = -EIO;
- goto errout;
- }
- res = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr);
- if (res)
- goto errout;
-
- res = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr);
- if (res)
- goto errout;
-
- /* this is broken symlink case */
- if (unlikely(pstr.name[0] == 0)) {
- res = -ENOENT;
- goto errout;
- }
-
- paddr = pstr.name;
-
- /* Null-terminate the name */
- paddr[pstr.len] = '\0';
-
- put_page(cpage);
- set_delayed_call(done, kfree_link, paddr);
- return paddr;
-errout:
- fscrypt_fname_free_buffer(&pstr);
- put_page(cpage);
- return ERR_PTR(res);
+ target = fscrypt_get_symlink(inode, page_address(page),
+ inode->i_sb->s_blocksize, done);
+ put_page(page);
+ return target;
}
const struct inode_operations f2fs_encrypted_symlink_inode_operations = {
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 712505e..aab2dd2 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -23,7 +23,7 @@
#include "trace.h"
#include <trace/events/f2fs.h>
-#define on_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock)
+#define on_f2fs_build_free_nids(nmi) mutex_is_locked(&(nm_i)->build_lock)
static struct kmem_cache *nat_entry_slab;
static struct kmem_cache *free_nid_slab;
@@ -32,7 +32,7 @@
/*
* Check whether the given nid is within node id range.
*/
-int check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
+int f2fs_check_nid_range(struct f2fs_sb_info *sbi, nid_t nid)
{
if (unlikely(nid < F2FS_ROOT_INO(sbi) || nid >= NM_I(sbi)->max_nid)) {
set_sbi_flag(sbi, SBI_NEED_FSCK);
@@ -44,7 +44,7 @@
return 0;
}
-bool available_free_memory(struct f2fs_sb_info *sbi, int type)
+bool f2fs_available_free_memory(struct f2fs_sb_info *sbi, int type)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct sysinfo val;
@@ -61,7 +61,7 @@
* give 25%, 25%, 50%, 50%, 50% memory for each components respectively
*/
if (type == FREE_NIDS) {
- mem_size = (nm_i->nid_cnt[FREE_NID_LIST] *
+ mem_size = (nm_i->nid_cnt[FREE_NID] *
sizeof(struct free_nid)) >> PAGE_SHIFT;
res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 2);
} else if (type == NAT_ENTRIES) {
@@ -78,7 +78,7 @@
} else if (type == INO_ENTRIES) {
int i;
- for (i = 0; i <= UPDATE_INO; i++)
+ for (i = 0; i < MAX_INO_ENTRY; i++)
mem_size += sbi->im[i].ino_num *
sizeof(struct ino_entry);
mem_size >>= PAGE_SHIFT;
@@ -89,6 +89,10 @@
atomic_read(&sbi->total_ext_node) *
sizeof(struct extent_node)) >> PAGE_SHIFT;
res = mem_size < ((avail_ram * nm_i->ram_thresh / 100) >> 1);
+ } else if (type == INMEM_PAGES) {
+ /* it allows 20% / total_ram for inmemory pages */
+ mem_size = get_pages(sbi, F2FS_INMEM_PAGES);
+ res = mem_size < (val.totalram / 5);
} else {
if (!sbi->sb->s_bdi->wb.dirty_exceeded)
return true;
@@ -98,18 +102,10 @@
static void clear_node_page_dirty(struct page *page)
{
- struct address_space *mapping = page->mapping;
- unsigned int long flags;
-
if (PageDirty(page)) {
- spin_lock_irqsave(&mapping->tree_lock, flags);
- radix_tree_tag_clear(&mapping->page_tree,
- page_index(page),
- PAGECACHE_TAG_DIRTY);
- spin_unlock_irqrestore(&mapping->tree_lock, flags);
-
+ f2fs_clear_radix_tree_dirty_tag(page);
clear_page_dirty_for_io(page);
- dec_page_count(F2FS_M_SB(mapping), F2FS_DIRTY_NODES);
+ dec_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES);
}
ClearPageUptodate(page);
}
@@ -117,7 +113,7 @@
static struct page *get_current_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
{
pgoff_t index = current_nat_addr(sbi, nid);
- return get_meta_page(sbi, index);
+ return f2fs_get_meta_page(sbi, index);
}
static struct page *get_next_nat_page(struct f2fs_sb_info *sbi, nid_t nid)
@@ -134,8 +130,8 @@
dst_off = next_nat_addr(sbi, src_off);
/* get current nat block page with lock */
- src_page = get_meta_page(sbi, src_off);
- dst_page = grab_meta_page(sbi, dst_off);
+ src_page = f2fs_get_meta_page(sbi, src_off);
+ dst_page = f2fs_grab_meta_page(sbi, dst_off);
f2fs_bug_on(sbi, PageDirty(src_page));
src_addr = page_address(src_page);
@@ -149,6 +145,42 @@
return dst_page;
}
+static struct nat_entry *__alloc_nat_entry(nid_t nid, bool no_fail)
+{
+ struct nat_entry *new;
+
+ if (no_fail)
+ new = f2fs_kmem_cache_alloc(nat_entry_slab, GFP_F2FS_ZERO);
+ else
+ new = kmem_cache_alloc(nat_entry_slab, GFP_F2FS_ZERO);
+ if (new) {
+ nat_set_nid(new, nid);
+ nat_reset_flag(new);
+ }
+ return new;
+}
+
+static void __free_nat_entry(struct nat_entry *e)
+{
+ kmem_cache_free(nat_entry_slab, e);
+}
+
+/* must be locked by nat_tree_lock */
+static struct nat_entry *__init_nat_entry(struct f2fs_nm_info *nm_i,
+ struct nat_entry *ne, struct f2fs_nat_entry *raw_ne, bool no_fail)
+{
+ if (no_fail)
+ f2fs_radix_tree_insert(&nm_i->nat_root, nat_get_nid(ne), ne);
+ else if (radix_tree_insert(&nm_i->nat_root, nat_get_nid(ne), ne))
+ return NULL;
+
+ if (raw_ne)
+ node_info_from_raw_nat(&ne->ni, raw_ne);
+ list_add_tail(&ne->list, &nm_i->nat_entries);
+ nm_i->nat_cnt++;
+ return ne;
+}
+
static struct nat_entry *__lookup_nat_cache(struct f2fs_nm_info *nm_i, nid_t n)
{
return radix_tree_lookup(&nm_i->nat_root, n);
@@ -165,11 +197,11 @@
list_del(&e->list);
radix_tree_delete(&nm_i->nat_root, nat_get_nid(e));
nm_i->nat_cnt--;
- kmem_cache_free(nat_entry_slab, e);
+ __free_nat_entry(e);
}
-static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
- struct nat_entry *ne)
+static struct nat_entry_set *__grab_nat_entry_set(struct f2fs_nm_info *nm_i,
+ struct nat_entry *ne)
{
nid_t set = NAT_BLOCK_OFFSET(ne->ni.nid);
struct nat_entry_set *head;
@@ -184,15 +216,36 @@
head->entry_cnt = 0;
f2fs_radix_tree_insert(&nm_i->nat_set_root, set, head);
}
+ return head;
+}
+
+static void __set_nat_cache_dirty(struct f2fs_nm_info *nm_i,
+ struct nat_entry *ne)
+{
+ struct nat_entry_set *head;
+ bool new_ne = nat_get_blkaddr(ne) == NEW_ADDR;
+
+ if (!new_ne)
+ head = __grab_nat_entry_set(nm_i, ne);
+
+ /*
+ * update entry_cnt in below condition:
+ * 1. update NEW_ADDR to valid block address;
+ * 2. update old block address to new one;
+ */
+ if (!new_ne && (get_nat_flag(ne, IS_PREALLOC) ||
+ !get_nat_flag(ne, IS_DIRTY)))
+ head->entry_cnt++;
+
+ set_nat_flag(ne, IS_PREALLOC, new_ne);
if (get_nat_flag(ne, IS_DIRTY))
goto refresh_list;
nm_i->dirty_nat_cnt++;
- head->entry_cnt++;
set_nat_flag(ne, IS_DIRTY, true);
refresh_list:
- if (nat_get_blkaddr(ne) == NEW_ADDR)
+ if (new_ne)
list_del_init(&ne->list);
else
list_move_tail(&ne->list, &head->entry_list);
@@ -214,7 +267,7 @@
start, nr);
}
-int need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid)
+int f2fs_need_dentry_mark(struct f2fs_sb_info *sbi, nid_t nid)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e;
@@ -231,7 +284,7 @@
return need;
}
-bool is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
+bool f2fs_is_checkpointed_node(struct f2fs_sb_info *sbi, nid_t nid)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e;
@@ -245,7 +298,7 @@
return is_cp;
}
-bool need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino)
+bool f2fs_need_inode_block_update(struct f2fs_sb_info *sbi, nid_t ino)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e;
@@ -261,49 +314,29 @@
return need_update;
}
-static struct nat_entry *grab_nat_entry(struct f2fs_nm_info *nm_i, nid_t nid,
- bool no_fail)
-{
- struct nat_entry *new;
-
- if (no_fail) {
- new = f2fs_kmem_cache_alloc(nat_entry_slab, GFP_NOFS);
- f2fs_radix_tree_insert(&nm_i->nat_root, nid, new);
- } else {
- new = kmem_cache_alloc(nat_entry_slab, GFP_NOFS);
- if (!new)
- return NULL;
- if (radix_tree_insert(&nm_i->nat_root, nid, new)) {
- kmem_cache_free(nat_entry_slab, new);
- return NULL;
- }
- }
-
- memset(new, 0, sizeof(struct nat_entry));
- nat_set_nid(new, nid);
- nat_reset_flag(new);
- list_add_tail(&new->list, &nm_i->nat_entries);
- nm_i->nat_cnt++;
- return new;
-}
-
+/* must be locked by nat_tree_lock */
static void cache_nat_entry(struct f2fs_sb_info *sbi, nid_t nid,
struct f2fs_nat_entry *ne)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
- struct nat_entry *e;
+ struct nat_entry *new, *e;
+ new = __alloc_nat_entry(nid, false);
+ if (!new)
+ return;
+
+ down_write(&nm_i->nat_tree_lock);
e = __lookup_nat_cache(nm_i, nid);
- if (!e) {
- e = grab_nat_entry(nm_i, nid, false);
- if (e)
- node_info_from_raw_nat(&e->ni, ne);
- } else {
+ if (!e)
+ e = __init_nat_entry(nm_i, new, ne, false);
+ else
f2fs_bug_on(sbi, nat_get_ino(e) != le32_to_cpu(ne->ino) ||
nat_get_blkaddr(e) !=
le32_to_cpu(ne->block_addr) ||
nat_get_version(e) != ne->version);
- }
+ up_write(&nm_i->nat_tree_lock);
+ if (e != new)
+ __free_nat_entry(new);
}
static void set_node_addr(struct f2fs_sb_info *sbi, struct node_info *ni,
@@ -311,11 +344,12 @@
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct nat_entry *e;
+ struct nat_entry *new = __alloc_nat_entry(ni->nid, true);
down_write(&nm_i->nat_tree_lock);
e = __lookup_nat_cache(nm_i, ni->nid);
if (!e) {
- e = grab_nat_entry(nm_i, ni->nid, true);
+ e = __init_nat_entry(nm_i, new, NULL, true);
copy_node_info(&e->ni, ni);
f2fs_bug_on(sbi, ni->blk_addr == NEW_ADDR);
} else if (new_blkaddr == NEW_ADDR) {
@@ -327,6 +361,9 @@
copy_node_info(&e->ni, ni);
f2fs_bug_on(sbi, ni->blk_addr != NULL_ADDR);
}
+ /* let's free early to reduce memory consumption */
+ if (e != new)
+ __free_nat_entry(new);
/* sanity check */
f2fs_bug_on(sbi, nat_get_blkaddr(e) != ni->blk_addr);
@@ -334,23 +371,18 @@
new_blkaddr == NULL_ADDR);
f2fs_bug_on(sbi, nat_get_blkaddr(e) == NEW_ADDR &&
new_blkaddr == NEW_ADDR);
- f2fs_bug_on(sbi, nat_get_blkaddr(e) != NEW_ADDR &&
- nat_get_blkaddr(e) != NULL_ADDR &&
+ f2fs_bug_on(sbi, is_valid_blkaddr(nat_get_blkaddr(e)) &&
new_blkaddr == NEW_ADDR);
/* increment version no as node is removed */
if (nat_get_blkaddr(e) != NEW_ADDR && new_blkaddr == NULL_ADDR) {
unsigned char version = nat_get_version(e);
nat_set_version(e, inc_node_version(version));
-
- /* in order to reuse the nid */
- if (nm_i->next_scan_nid > ni->nid)
- nm_i->next_scan_nid = ni->nid;
}
/* change address */
nat_set_blkaddr(e, new_blkaddr);
- if (new_blkaddr == NEW_ADDR || new_blkaddr == NULL_ADDR)
+ if (!is_valid_blkaddr(new_blkaddr))
set_nat_flag(e, IS_CHECKPOINTED, false);
__set_nat_cache_dirty(nm_i, e);
@@ -365,7 +397,7 @@
up_write(&nm_i->nat_tree_lock);
}
-int try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink)
+int f2fs_try_to_free_nats(struct f2fs_sb_info *sbi, int nr_shrink)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
int nr = nr_shrink;
@@ -387,7 +419,8 @@
/*
* This function always returns success
*/
-void get_node_info(struct f2fs_sb_info *sbi, nid_t nid, struct node_info *ni)
+void f2fs_get_node_info(struct f2fs_sb_info *sbi, nid_t nid,
+ struct node_info *ni)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
@@ -417,7 +450,7 @@
/* Check current segment summary */
down_read(&curseg->journal_rwsem);
- i = lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0);
+ i = f2fs_lookup_journal_in_cursum(journal, NAT_JOURNAL, nid, 0);
if (i >= 0) {
ne = nat_in_journal(journal, i);
node_info_from_raw_nat(ni, &ne);
@@ -432,22 +465,20 @@
index = current_nat_addr(sbi, nid);
up_read(&nm_i->nat_tree_lock);
- page = get_meta_page(sbi, index);
+ page = f2fs_get_meta_page(sbi, index);
nat_blk = (struct f2fs_nat_block *)page_address(page);
ne = nat_blk->entries[nid - start_nid];
node_info_from_raw_nat(ni, &ne);
f2fs_put_page(page, 1);
cache:
/* cache nat entry */
- down_write(&nm_i->nat_tree_lock);
cache_nat_entry(sbi, nid, &ne);
- up_write(&nm_i->nat_tree_lock);
}
/*
* readahead MAX_RA_NODE number of node pages.
*/
-static void ra_node_pages(struct page *parent, int start, int n)
+static void f2fs_ra_node_pages(struct page *parent, int start, int n)
{
struct f2fs_sb_info *sbi = F2FS_P_SB(parent);
struct blk_plug plug;
@@ -461,13 +492,13 @@
end = min(end, NIDS_PER_BLOCK);
for (i = start; i < end; i++) {
nid = get_nid(parent, i, false);
- ra_node_page(sbi, nid);
+ f2fs_ra_node_page(sbi, nid);
}
blk_finish_plug(&plug);
}
-pgoff_t get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs)
+pgoff_t f2fs_get_next_page_offset(struct dnode_of_data *dn, pgoff_t pgofs)
{
const long direct_index = ADDRS_PER_INODE(dn->inode);
const long direct_blks = ADDRS_PER_BLOCK;
@@ -582,7 +613,7 @@
* f2fs_unlock_op() only if ro is not set RDONLY_NODE.
* In the case of RDONLY_NODE, we don't need to care about mutex.
*/
-int get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode)
+int f2fs_get_dnode_of_data(struct dnode_of_data *dn, pgoff_t index, int mode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct page *npage[4];
@@ -601,7 +632,7 @@
npage[0] = dn->inode_page;
if (!npage[0]) {
- npage[0] = get_node_page(sbi, nids[0]);
+ npage[0] = f2fs_get_node_page(sbi, nids[0]);
if (IS_ERR(npage[0]))
return PTR_ERR(npage[0]);
}
@@ -625,24 +656,24 @@
if (!nids[i] && mode == ALLOC_NODE) {
/* alloc new node */
- if (!alloc_nid(sbi, &(nids[i]))) {
+ if (!f2fs_alloc_nid(sbi, &(nids[i]))) {
err = -ENOSPC;
goto release_pages;
}
dn->nid = nids[i];
- npage[i] = new_node_page(dn, noffset[i]);
+ npage[i] = f2fs_new_node_page(dn, noffset[i]);
if (IS_ERR(npage[i])) {
- alloc_nid_failed(sbi, nids[i]);
+ f2fs_alloc_nid_failed(sbi, nids[i]);
err = PTR_ERR(npage[i]);
goto release_pages;
}
set_nid(parent, offset[i - 1], nids[i], i == 1);
- alloc_nid_done(sbi, nids[i]);
+ f2fs_alloc_nid_done(sbi, nids[i]);
done = true;
} else if (mode == LOOKUP_NODE_RA && i == level && level > 1) {
- npage[i] = get_node_page_ra(parent, offset[i - 1]);
+ npage[i] = f2fs_get_node_page_ra(parent, offset[i - 1]);
if (IS_ERR(npage[i])) {
err = PTR_ERR(npage[i]);
goto release_pages;
@@ -657,7 +688,7 @@
}
if (!done) {
- npage[i] = get_node_page(sbi, nids[i]);
+ npage[i] = f2fs_get_node_page(sbi, nids[i]);
if (IS_ERR(npage[i])) {
err = PTR_ERR(npage[i]);
f2fs_put_page(npage[0], 0);
@@ -696,16 +727,15 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct node_info ni;
- get_node_info(sbi, dn->nid, &ni);
- f2fs_bug_on(sbi, ni.blk_addr == NULL_ADDR);
+ f2fs_get_node_info(sbi, dn->nid, &ni);
/* Deallocate node address */
- invalidate_blocks(sbi, ni.blk_addr);
+ f2fs_invalidate_blocks(sbi, ni.blk_addr);
dec_valid_node_count(sbi, dn->inode, dn->nid == dn->inode->i_ino);
set_node_addr(sbi, &ni, NULL_ADDR, false);
if (dn->nid == dn->inode->i_ino) {
- remove_orphan_inode(sbi, dn->nid);
+ f2fs_remove_orphan_inode(sbi, dn->nid);
dec_valid_inode_count(sbi);
f2fs_inode_synced(dn->inode);
}
@@ -730,7 +760,7 @@
return 1;
/* get direct node */
- page = get_node_page(F2FS_I_SB(dn->inode), dn->nid);
+ page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid);
if (IS_ERR(page) && PTR_ERR(page) == -ENOENT)
return 1;
else if (IS_ERR(page))
@@ -739,7 +769,7 @@
/* Make dnode_of_data for parameter */
dn->node_page = page;
dn->ofs_in_node = 0;
- truncate_data_blocks(dn);
+ f2fs_truncate_data_blocks(dn);
truncate_node(dn);
return 1;
}
@@ -760,13 +790,13 @@
trace_f2fs_truncate_nodes_enter(dn->inode, dn->nid, dn->data_blkaddr);
- page = get_node_page(F2FS_I_SB(dn->inode), dn->nid);
+ page = f2fs_get_node_page(F2FS_I_SB(dn->inode), dn->nid);
if (IS_ERR(page)) {
trace_f2fs_truncate_nodes_exit(dn->inode, PTR_ERR(page));
return PTR_ERR(page);
}
- ra_node_pages(page, ofs, NIDS_PER_BLOCK);
+ f2fs_ra_node_pages(page, ofs, NIDS_PER_BLOCK);
rn = F2FS_NODE(page);
if (depth < 3) {
@@ -836,7 +866,7 @@
/* get indirect nodes in the path */
for (i = 0; i < idx + 1; i++) {
/* reference count'll be increased */
- pages[i] = get_node_page(F2FS_I_SB(dn->inode), nid[i]);
+ pages[i] = f2fs_get_node_page(F2FS_I_SB(dn->inode), nid[i]);
if (IS_ERR(pages[i])) {
err = PTR_ERR(pages[i]);
idx = i - 1;
@@ -845,7 +875,7 @@
nid[i + 1] = get_nid(pages[i], offset[i + 1], false);
}
- ra_node_pages(pages[idx], offset[idx + 1], NIDS_PER_BLOCK);
+ f2fs_ra_node_pages(pages[idx], offset[idx + 1], NIDS_PER_BLOCK);
/* free direct nodes linked to a partial indirect node */
for (i = offset[idx + 1]; i < NIDS_PER_BLOCK; i++) {
@@ -882,7 +912,7 @@
/*
* All the block addresses of data and nodes should be nullified.
*/
-int truncate_inode_blocks(struct inode *inode, pgoff_t from)
+int f2fs_truncate_inode_blocks(struct inode *inode, pgoff_t from)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int err = 0, cont = 1;
@@ -898,7 +928,7 @@
if (level < 0)
return level;
- page = get_node_page(sbi, inode->i_ino);
+ page = f2fs_get_node_page(sbi, inode->i_ino);
if (IS_ERR(page)) {
trace_f2fs_truncate_inode_blocks_exit(inode, PTR_ERR(page));
return PTR_ERR(page);
@@ -977,7 +1007,8 @@
return err > 0 ? 0 : err;
}
-int truncate_xattr_node(struct inode *inode, struct page *page)
+/* caller must lock inode page */
+int f2fs_truncate_xattr_node(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
nid_t nid = F2FS_I(inode)->i_xattr_nid;
@@ -987,16 +1018,13 @@
if (!nid)
return 0;
- npage = get_node_page(sbi, nid);
+ npage = f2fs_get_node_page(sbi, nid);
if (IS_ERR(npage))
return PTR_ERR(npage);
f2fs_i_xnid_write(inode, 0);
- set_new_dnode(&dn, inode, page, npage, nid);
-
- if (page)
- dn.inode_page_locked = true;
+ set_new_dnode(&dn, inode, NULL, npage, nid);
truncate_node(&dn);
return 0;
}
@@ -1005,17 +1033,17 @@
* Caller should grab and release a rwsem by calling f2fs_lock_op() and
* f2fs_unlock_op().
*/
-int remove_inode_page(struct inode *inode)
+int f2fs_remove_inode_page(struct inode *inode)
{
struct dnode_of_data dn;
int err;
set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino);
- err = get_dnode_of_data(&dn, 0, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, 0, LOOKUP_NODE);
if (err)
return err;
- err = truncate_xattr_node(inode, dn.inode_page);
+ err = f2fs_truncate_xattr_node(inode);
if (err) {
f2fs_put_dnode(&dn);
return err;
@@ -1024,7 +1052,7 @@
/* remove potential inline_data blocks */
if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) ||
S_ISLNK(inode->i_mode))
- truncate_data_blocks_range(&dn, 1);
+ f2fs_truncate_data_blocks_range(&dn, 1);
/* 0 is possible, after f2fs_new_inode() has failed */
f2fs_bug_on(F2FS_I_SB(inode),
@@ -1035,7 +1063,7 @@
return 0;
}
-struct page *new_inode_page(struct inode *inode)
+struct page *f2fs_new_inode_page(struct inode *inode)
{
struct dnode_of_data dn;
@@ -1043,10 +1071,10 @@
set_new_dnode(&dn, inode, NULL, NULL, inode->i_ino);
/* caller should f2fs_put_page(page, 1); */
- return new_node_page(&dn, 0);
+ return f2fs_new_node_page(&dn, 0);
}
-struct page *new_node_page(struct dnode_of_data *dn, unsigned int ofs)
+struct page *f2fs_new_node_page(struct dnode_of_data *dn, unsigned int ofs)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(dn->inode);
struct node_info new_ni;
@@ -1064,7 +1092,7 @@
goto fail;
#ifdef CONFIG_F2FS_CHECK_FS
- get_node_info(sbi, dn->nid, &new_ni);
+ f2fs_get_node_info(sbi, dn->nid, &new_ni);
f2fs_bug_on(sbi, new_ni.blk_addr != NULL_ADDR);
#endif
new_ni.nid = dn->nid;
@@ -1076,7 +1104,7 @@
f2fs_wait_on_page_writeback(page, NODE, true);
fill_node_footer(page, dn->nid, dn->inode->i_ino, ofs, true);
- set_cold_node(dn->inode, page);
+ set_cold_node(page, S_ISDIR(dn->inode->i_mode));
if (!PageUptodate(page))
SetPageUptodate(page);
if (set_page_dirty(page))
@@ -1116,7 +1144,7 @@
if (PageUptodate(page))
return LOCKED_PAGE;
- get_node_info(sbi, page->index, &ni);
+ f2fs_get_node_info(sbi, page->index, &ni);
if (unlikely(ni.blk_addr == NULL_ADDR)) {
ClearPageUptodate(page);
@@ -1130,14 +1158,14 @@
/*
* Readahead a node page
*/
-void ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
+void f2fs_ra_node_page(struct f2fs_sb_info *sbi, nid_t nid)
{
struct page *apage;
int err;
if (!nid)
return;
- if (check_nid_range(sbi, nid))
+ if (f2fs_check_nid_range(sbi, nid))
return;
rcu_read_lock();
@@ -1162,7 +1190,7 @@
if (!nid)
return ERR_PTR(-ENOENT);
- if (check_nid_range(sbi, nid))
+ if (f2fs_check_nid_range(sbi, nid))
return ERR_PTR(-EINVAL);
repeat:
page = f2fs_grab_cache_page(NODE_MAPPING(sbi), nid, false);
@@ -1179,7 +1207,7 @@
}
if (parent)
- ra_node_pages(parent, start + 1, MAX_RA_NODE);
+ f2fs_ra_node_pages(parent, start + 1, MAX_RA_NODE);
lock_page(page);
@@ -1213,12 +1241,12 @@
return page;
}
-struct page *get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid)
+struct page *f2fs_get_node_page(struct f2fs_sb_info *sbi, pgoff_t nid)
{
return __get_node_page(sbi, nid, NULL, 0);
}
-struct page *get_node_page_ra(struct page *parent, int start)
+struct page *f2fs_get_node_page_ra(struct page *parent, int start)
{
struct f2fs_sb_info *sbi = F2FS_P_SB(parent);
nid_t nid = get_nid(parent, start, false);
@@ -1237,7 +1265,8 @@
if (!inode)
return;
- page = pagecache_get_page(inode->i_mapping, 0, FGP_LOCK|FGP_NOWAIT, 0);
+ page = f2fs_pagecache_get_page(inode->i_mapping, 0,
+ FGP_LOCK|FGP_NOWAIT, 0);
if (!page)
goto iput_out;
@@ -1252,7 +1281,7 @@
ret = f2fs_write_inline_data(inode, page);
inode_dec_dirty_pages(inode);
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
if (ret)
set_page_dirty(page);
page_out:
@@ -1261,54 +1290,19 @@
iput(inode);
}
-void move_node_page(struct page *node_page, int gc_type)
-{
- if (gc_type == FG_GC) {
- struct f2fs_sb_info *sbi = F2FS_P_SB(node_page);
- struct writeback_control wbc = {
- .sync_mode = WB_SYNC_ALL,
- .nr_to_write = 1,
- .for_reclaim = 0,
- };
-
- set_page_dirty(node_page);
- f2fs_wait_on_page_writeback(node_page, NODE, true);
-
- f2fs_bug_on(sbi, PageWriteback(node_page));
- if (!clear_page_dirty_for_io(node_page))
- goto out_page;
-
- if (NODE_MAPPING(sbi)->a_ops->writepage(node_page, &wbc))
- unlock_page(node_page);
- goto release_page;
- } else {
- /* set page dirty and write it */
- if (!PageWriteback(node_page))
- set_page_dirty(node_page);
- }
-out_page:
- unlock_page(node_page);
-release_page:
- f2fs_put_page(node_page, 0);
-}
-
static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino)
{
- pgoff_t index, end;
+ pgoff_t index;
struct pagevec pvec;
struct page *last_page = NULL;
+ int nr_pages;
pagevec_init(&pvec, 0);
index = 0;
- end = ULONG_MAX;
- while (index <= end) {
- int i, nr_pages;
- nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
- PAGECACHE_TAG_DIRTY,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
- if (nr_pages == 0)
- break;
+ while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
+ PAGECACHE_TAG_DIRTY))) {
+ int i;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
@@ -1361,6 +1355,7 @@
struct node_info ni;
struct f2fs_io_info fio = {
.sbi = sbi,
+ .ino = ino_of_node(page),
.type = NODE,
.op = REQ_OP_WRITE,
.op_flags = wbc_to_write_flags(wbc),
@@ -1368,15 +1363,17 @@
.encrypted_page = NULL,
.submitted = false,
.io_type = io_type,
+ .io_wbc = wbc,
};
trace_f2fs_writepage(page, NODE);
- if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
- goto redirty_out;
if (unlikely(f2fs_cp_error(sbi)))
goto redirty_out;
+ if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
+ goto redirty_out;
+
/* get old block addr of this node page */
nid = nid_of_node(page);
f2fs_bug_on(sbi, page->index != nid);
@@ -1388,7 +1385,7 @@
down_read(&sbi->node_write);
}
- get_node_info(sbi, nid, &ni);
+ f2fs_get_node_info(sbi, nid, &ni);
/* This page is already truncated */
if (unlikely(ni.blk_addr == NULL_ADDR)) {
@@ -1403,8 +1400,9 @@
fio.op_flags |= REQ_PREFLUSH | REQ_FUA;
set_page_writeback(page);
+ ClearPageError(page);
fio.old_blkaddr = ni.blk_addr;
- write_node_page(nid, &fio);
+ f2fs_do_write_node_page(nid, &fio);
set_node_addr(sbi, &ni, fio.new_blkaddr, is_fsync_dnode(page));
dec_page_count(sbi, F2FS_DIRTY_NODES);
up_read(&sbi->node_write);
@@ -1433,22 +1431,54 @@
return AOP_WRITEPAGE_ACTIVATE;
}
+void f2fs_move_node_page(struct page *node_page, int gc_type)
+{
+ if (gc_type == FG_GC) {
+ struct writeback_control wbc = {
+ .sync_mode = WB_SYNC_ALL,
+ .nr_to_write = 1,
+ .for_reclaim = 0,
+ };
+
+ set_page_dirty(node_page);
+ f2fs_wait_on_page_writeback(node_page, NODE, true);
+
+ f2fs_bug_on(F2FS_P_SB(node_page), PageWriteback(node_page));
+ if (!clear_page_dirty_for_io(node_page))
+ goto out_page;
+
+ if (__write_node_page(node_page, false, NULL,
+ &wbc, false, FS_GC_NODE_IO))
+ unlock_page(node_page);
+ goto release_page;
+ } else {
+ /* set page dirty and write it */
+ if (!PageWriteback(node_page))
+ set_page_dirty(node_page);
+ }
+out_page:
+ unlock_page(node_page);
+release_page:
+ f2fs_put_page(node_page, 0);
+}
+
static int f2fs_write_node_page(struct page *page,
struct writeback_control *wbc)
{
return __write_node_page(page, false, NULL, wbc, false, FS_NODE_IO);
}
-int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
+int f2fs_fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
struct writeback_control *wbc, bool atomic)
{
- pgoff_t index, end;
+ pgoff_t index;
pgoff_t last_idx = ULONG_MAX;
struct pagevec pvec;
int ret = 0;
struct page *last_page = NULL;
bool marked = false;
nid_t ino = inode->i_ino;
+ int nr_pages;
if (atomic) {
last_page = last_fsync_dnode(sbi, ino);
@@ -1458,15 +1488,10 @@
retry:
pagevec_init(&pvec, 0);
index = 0;
- end = ULONG_MAX;
- while (index <= end) {
- int i, nr_pages;
- nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
- PAGECACHE_TAG_DIRTY,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
- if (nr_pages == 0)
- break;
+ while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
+ PAGECACHE_TAG_DIRTY))) {
+ int i;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
@@ -1510,9 +1535,9 @@
if (IS_INODE(page)) {
if (is_inode_flag_set(inode,
FI_DIRTY_INODE))
- update_inode(inode, page);
+ f2fs_update_inode(inode, page);
set_dentry_mark(page,
- need_dentry_mark(sbi, ino));
+ f2fs_need_dentry_mark(sbi, ino));
}
/* may be written by other thread */
if (!PageDirty(page))
@@ -1562,37 +1587,35 @@
return ret ? -EIO: 0;
}
-int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc,
+int f2fs_sync_node_pages(struct f2fs_sb_info *sbi,
+ struct writeback_control *wbc,
bool do_balance, enum iostat_type io_type)
{
- pgoff_t index, end;
+ pgoff_t index;
struct pagevec pvec;
int step = 0;
int nwritten = 0;
int ret = 0;
+ int nr_pages, done = 0;
pagevec_init(&pvec, 0);
next_step:
index = 0;
- end = ULONG_MAX;
- while (index <= end) {
- int i, nr_pages;
- nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
- PAGECACHE_TAG_DIRTY,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
- if (nr_pages == 0)
- break;
+ while (!done && (nr_pages = pagevec_lookup_tag(&pvec,
+ NODE_MAPPING(sbi), &index, PAGECACHE_TAG_DIRTY))) {
+ int i;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
bool submitted = false;
- if (unlikely(f2fs_cp_error(sbi))) {
- pagevec_release(&pvec);
- ret = -EIO;
- goto out;
+ /* give a priority to WB_SYNC threads */
+ if (atomic_read(&sbi->wb_sync_req[NODE]) &&
+ wbc->sync_mode == WB_SYNC_NONE) {
+ done = 1;
+ break;
}
/*
@@ -1666,35 +1689,31 @@
step++;
goto next_step;
}
-out:
+
if (nwritten)
f2fs_submit_merged_write(sbi, NODE);
+
+ if (unlikely(f2fs_cp_error(sbi)))
+ return -EIO;
return ret;
}
-int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
+int f2fs_wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
{
- pgoff_t index = 0, end = ULONG_MAX;
+ pgoff_t index = 0;
struct pagevec pvec;
int ret2, ret = 0;
+ int nr_pages;
pagevec_init(&pvec, 0);
- while (index <= end) {
- int i, nr_pages;
- nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
- PAGECACHE_TAG_WRITEBACK,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
- if (nr_pages == 0)
- break;
+ while ((nr_pages = pagevec_lookup_tag(&pvec, NODE_MAPPING(sbi), &index,
+ PAGECACHE_TAG_WRITEBACK))) {
+ int i;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
- /* until radix tree lookup accepts end_index */
- if (unlikely(page->index > end))
- continue;
-
if (ino && ino_of_node(page) == ino) {
f2fs_wait_on_page_writeback(page, NODE, true);
if (TestClearPageError(page))
@@ -1728,14 +1747,21 @@
if (get_pages(sbi, F2FS_DIRTY_NODES) < nr_pages_to_skip(sbi, NODE))
goto skip_write;
+ if (wbc->sync_mode == WB_SYNC_ALL)
+ atomic_inc(&sbi->wb_sync_req[NODE]);
+ else if (atomic_read(&sbi->wb_sync_req[NODE]))
+ goto skip_write;
+
trace_f2fs_writepages(mapping->host, wbc, NODE);
diff = nr_pages_to_write(sbi, NODE, wbc);
- wbc->sync_mode = WB_SYNC_NONE;
blk_start_plug(&plug);
- sync_node_pages(sbi, wbc, true, FS_NODE_IO);
+ f2fs_sync_node_pages(sbi, wbc, true, FS_NODE_IO);
blk_finish_plug(&plug);
wbc->nr_to_write = max((long)0, wbc->nr_to_write - diff);
+
+ if (wbc->sync_mode == WB_SYNC_ALL)
+ atomic_dec(&sbi->wb_sync_req[NODE]);
return 0;
skip_write:
@@ -1751,7 +1777,7 @@
if (!PageUptodate(page))
SetPageUptodate(page);
if (!PageDirty(page)) {
- f2fs_set_page_dirty_nobuffers(page);
+ __set_page_dirty_nobuffers(page);
inc_page_count(F2FS_P_SB(page), F2FS_DIRTY_NODES);
SetPagePrivate(page);
f2fs_trace_pid(page);
@@ -1780,39 +1806,83 @@
return radix_tree_lookup(&nm_i->free_nid_root, n);
}
-static int __insert_nid_to_list(struct f2fs_sb_info *sbi,
- struct free_nid *i, enum nid_list list, bool new)
+static int __insert_free_nid(struct f2fs_sb_info *sbi,
+ struct free_nid *i, enum nid_state state)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
- if (new) {
- int err = radix_tree_insert(&nm_i->free_nid_root, i->nid, i);
- if (err)
- return err;
- }
+ int err = radix_tree_insert(&nm_i->free_nid_root, i->nid, i);
+ if (err)
+ return err;
- f2fs_bug_on(sbi, list == FREE_NID_LIST ? i->state != NID_NEW :
- i->state != NID_ALLOC);
- nm_i->nid_cnt[list]++;
- list_add_tail(&i->list, &nm_i->nid_list[list]);
+ f2fs_bug_on(sbi, state != i->state);
+ nm_i->nid_cnt[state]++;
+ if (state == FREE_NID)
+ list_add_tail(&i->list, &nm_i->free_nid_list);
return 0;
}
-static void __remove_nid_from_list(struct f2fs_sb_info *sbi,
- struct free_nid *i, enum nid_list list, bool reuse)
+static void __remove_free_nid(struct f2fs_sb_info *sbi,
+ struct free_nid *i, enum nid_state state)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
- f2fs_bug_on(sbi, list == FREE_NID_LIST ? i->state != NID_NEW :
- i->state != NID_ALLOC);
- nm_i->nid_cnt[list]--;
- list_del(&i->list);
- if (!reuse)
- radix_tree_delete(&nm_i->free_nid_root, i->nid);
+ f2fs_bug_on(sbi, state != i->state);
+ nm_i->nid_cnt[state]--;
+ if (state == FREE_NID)
+ list_del(&i->list);
+ radix_tree_delete(&nm_i->free_nid_root, i->nid);
+}
+
+static void __move_free_nid(struct f2fs_sb_info *sbi, struct free_nid *i,
+ enum nid_state org_state, enum nid_state dst_state)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+
+ f2fs_bug_on(sbi, org_state != i->state);
+ i->state = dst_state;
+ nm_i->nid_cnt[org_state]--;
+ nm_i->nid_cnt[dst_state]++;
+
+ switch (dst_state) {
+ case PREALLOC_NID:
+ list_del(&i->list);
+ break;
+ case FREE_NID:
+ list_add_tail(&i->list, &nm_i->free_nid_list);
+ break;
+ default:
+ BUG_ON(1);
+ }
+}
+
+static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
+ bool set, bool build)
+{
+ struct f2fs_nm_info *nm_i = NM_I(sbi);
+ unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
+ unsigned int nid_ofs = nid - START_NID(nid);
+
+ if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
+ return;
+
+ if (set) {
+ if (test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+ return;
+ __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+ nm_i->free_nid_count[nat_ofs]++;
+ } else {
+ if (!test_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]))
+ return;
+ __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
+ if (!build)
+ nm_i->free_nid_count[nat_ofs]--;
+ }
}
/* return if the nid is recognized as free */
-static bool add_free_nid(struct f2fs_sb_info *sbi, nid_t nid, bool build)
+static bool add_free_nid(struct f2fs_sb_info *sbi,
+ nid_t nid, bool build, bool update)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i, *e;
@@ -1826,10 +1896,9 @@
i = f2fs_kmem_cache_alloc(free_nid_slab, GFP_NOFS);
i->nid = nid;
- i->state = NID_NEW;
+ i->state = FREE_NID;
- if (radix_tree_preload(GFP_NOFS))
- goto err;
+ radix_tree_preload(GFP_NOFS | __GFP_NOFAIL);
spin_lock(&nm_i->nid_list_lock);
@@ -1838,22 +1907,22 @@
* Thread A Thread B
* - f2fs_create
* - f2fs_new_inode
- * - alloc_nid
- * - __insert_nid_to_list(ALLOC_NID_LIST)
+ * - f2fs_alloc_nid
+ * - __insert_nid_to_list(PREALLOC_NID)
* - f2fs_balance_fs_bg
- * - build_free_nids
- * - __build_free_nids
+ * - f2fs_build_free_nids
+ * - __f2fs_build_free_nids
* - scan_nat_page
* - add_free_nid
* - __lookup_nat_cache
* - f2fs_add_link
- * - init_inode_metadata
- * - new_inode_page
- * - new_node_page
+ * - f2fs_init_inode_metadata
+ * - f2fs_new_inode_page
+ * - f2fs_new_node_page
* - set_node_addr
- * - alloc_nid_done
- * - __remove_nid_from_list(ALLOC_NID_LIST)
- * - __insert_nid_to_list(FREE_NID_LIST)
+ * - f2fs_alloc_nid_done
+ * - __remove_nid_from_list(PREALLOC_NID)
+ * - __insert_nid_to_list(FREE_NID)
*/
ne = __lookup_nat_cache(nm_i, nid);
if (ne && (!get_nat_flag(ne, IS_CHECKPOINTED) ||
@@ -1862,17 +1931,22 @@
e = __lookup_free_nid_list(nm_i, nid);
if (e) {
- if (e->state == NID_NEW)
+ if (e->state == FREE_NID)
ret = true;
goto err_out;
}
}
ret = true;
- err = __insert_nid_to_list(sbi, i, FREE_NID_LIST, true);
+ err = __insert_free_nid(sbi, i, FREE_NID);
err_out:
+ if (update) {
+ update_free_nid_bitmap(sbi, nid, ret, build);
+ if (!build)
+ nm_i->available_nids++;
+ }
spin_unlock(&nm_i->nid_list_lock);
radix_tree_preload_end();
-err:
+
if (err)
kmem_cache_free(free_nid_slab, i);
return ret;
@@ -1886,8 +1960,8 @@
spin_lock(&nm_i->nid_list_lock);
i = __lookup_free_nid_list(nm_i, nid);
- if (i && i->state == NID_NEW) {
- __remove_nid_from_list(sbi, i, FREE_NID_LIST, false);
+ if (i && i->state == FREE_NID) {
+ __remove_free_nid(sbi, i, FREE_NID);
need_free = true;
}
spin_unlock(&nm_i->nid_list_lock);
@@ -1896,27 +1970,6 @@
kmem_cache_free(free_nid_slab, i);
}
-static void update_free_nid_bitmap(struct f2fs_sb_info *sbi, nid_t nid,
- bool set, bool build)
-{
- struct f2fs_nm_info *nm_i = NM_I(sbi);
- unsigned int nat_ofs = NAT_BLOCK_OFFSET(nid);
- unsigned int nid_ofs = nid - START_NID(nid);
-
- if (!test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
- return;
-
- if (set)
- __set_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
- else
- __clear_bit_le(nid_ofs, nm_i->free_nid_bitmap[nat_ofs]);
-
- if (set)
- nm_i->free_nid_count[nat_ofs]++;
- else if (!build)
- nm_i->free_nid_count[nat_ofs]--;
-}
-
static void scan_nat_page(struct f2fs_sb_info *sbi,
struct page *nat_page, nid_t start_nid)
{
@@ -1926,35 +1979,52 @@
unsigned int nat_ofs = NAT_BLOCK_OFFSET(start_nid);
int i;
- if (test_bit_le(nat_ofs, nm_i->nat_block_bitmap))
- return;
-
__set_bit_le(nat_ofs, nm_i->nat_block_bitmap);
i = start_nid % NAT_ENTRY_PER_BLOCK;
for (; i < NAT_ENTRY_PER_BLOCK; i++, start_nid++) {
- bool freed = false;
-
if (unlikely(start_nid >= nm_i->max_nid))
break;
blk_addr = le32_to_cpu(nat_blk->entries[i].block_addr);
f2fs_bug_on(sbi, blk_addr == NEW_ADDR);
- if (blk_addr == NULL_ADDR)
- freed = add_free_nid(sbi, start_nid, true);
- spin_lock(&NM_I(sbi)->nid_list_lock);
- update_free_nid_bitmap(sbi, start_nid, freed, true);
- spin_unlock(&NM_I(sbi)->nid_list_lock);
+ if (blk_addr == NULL_ADDR) {
+ add_free_nid(sbi, start_nid, true, true);
+ } else {
+ spin_lock(&NM_I(sbi)->nid_list_lock);
+ update_free_nid_bitmap(sbi, start_nid, false, true);
+ spin_unlock(&NM_I(sbi)->nid_list_lock);
+ }
}
}
+static void scan_curseg_cache(struct f2fs_sb_info *sbi)
+{
+ struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
+ struct f2fs_journal *journal = curseg->journal;
+ int i;
+
+ down_read(&curseg->journal_rwsem);
+ for (i = 0; i < nats_in_cursum(journal); i++) {
+ block_t addr;
+ nid_t nid;
+
+ addr = le32_to_cpu(nat_in_journal(journal, i).block_addr);
+ nid = le32_to_cpu(nid_in_journal(journal, i));
+ if (addr == NULL_ADDR)
+ add_free_nid(sbi, nid, true, false);
+ else
+ remove_free_nid(sbi, nid);
+ }
+ up_read(&curseg->journal_rwsem);
+}
+
static void scan_free_nid_bits(struct f2fs_sb_info *sbi)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
- struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
- struct f2fs_journal *journal = curseg->journal;
unsigned int i, idx;
+ nid_t nid;
down_read(&nm_i->nat_tree_lock);
@@ -1964,40 +2034,28 @@
if (!nm_i->free_nid_count[i])
continue;
for (idx = 0; idx < NAT_ENTRY_PER_BLOCK; idx++) {
- nid_t nid;
-
- if (!test_bit_le(idx, nm_i->free_nid_bitmap[i]))
- continue;
+ idx = find_next_bit_le(nm_i->free_nid_bitmap[i],
+ NAT_ENTRY_PER_BLOCK, idx);
+ if (idx >= NAT_ENTRY_PER_BLOCK)
+ break;
nid = i * NAT_ENTRY_PER_BLOCK + idx;
- add_free_nid(sbi, nid, true);
+ add_free_nid(sbi, nid, true, false);
- if (nm_i->nid_cnt[FREE_NID_LIST] >= MAX_FREE_NIDS)
+ if (nm_i->nid_cnt[FREE_NID] >= MAX_FREE_NIDS)
goto out;
}
}
out:
- down_read(&curseg->journal_rwsem);
- for (i = 0; i < nats_in_cursum(journal); i++) {
- block_t addr;
- nid_t nid;
+ scan_curseg_cache(sbi);
- addr = le32_to_cpu(nat_in_journal(journal, i).block_addr);
- nid = le32_to_cpu(nid_in_journal(journal, i));
- if (addr == NULL_ADDR)
- add_free_nid(sbi, nid, true);
- else
- remove_free_nid(sbi, nid);
- }
- up_read(&curseg->journal_rwsem);
up_read(&nm_i->nat_tree_lock);
}
-static void __build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
+static void __f2fs_build_free_nids(struct f2fs_sb_info *sbi,
+ bool sync, bool mount)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
- struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
- struct f2fs_journal *journal = curseg->journal;
int i = 0;
nid_t nid = nm_i->next_scan_nid;
@@ -2005,31 +2063,34 @@
nid = 0;
/* Enough entries */
- if (nm_i->nid_cnt[FREE_NID_LIST] >= NAT_ENTRY_PER_BLOCK)
+ if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK)
return;
- if (!sync && !available_free_memory(sbi, FREE_NIDS))
+ if (!sync && !f2fs_available_free_memory(sbi, FREE_NIDS))
return;
if (!mount) {
/* try to find free nids in free_nid_bitmap */
scan_free_nid_bits(sbi);
- if (nm_i->nid_cnt[FREE_NID_LIST])
+ if (nm_i->nid_cnt[FREE_NID] >= NAT_ENTRY_PER_BLOCK)
return;
}
/* readahead nat pages to be scanned */
- ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES,
+ f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nid), FREE_NID_PAGES,
META_NAT, true);
down_read(&nm_i->nat_tree_lock);
while (1) {
- struct page *page = get_current_nat_page(sbi, nid);
+ if (!test_bit_le(NAT_BLOCK_OFFSET(nid),
+ nm_i->nat_block_bitmap)) {
+ struct page *page = get_current_nat_page(sbi, nid);
- scan_nat_page(sbi, page, nid);
- f2fs_put_page(page, 1);
+ scan_nat_page(sbi, page, nid);
+ f2fs_put_page(page, 1);
+ }
nid += (NAT_ENTRY_PER_BLOCK - (nid % NAT_ENTRY_PER_BLOCK));
if (unlikely(nid >= nm_i->max_nid))
@@ -2043,28 +2104,18 @@
nm_i->next_scan_nid = nid;
/* find free nids from current sum_pages */
- down_read(&curseg->journal_rwsem);
- for (i = 0; i < nats_in_cursum(journal); i++) {
- block_t addr;
+ scan_curseg_cache(sbi);
- addr = le32_to_cpu(nat_in_journal(journal, i).block_addr);
- nid = le32_to_cpu(nid_in_journal(journal, i));
- if (addr == NULL_ADDR)
- add_free_nid(sbi, nid, true);
- else
- remove_free_nid(sbi, nid);
- }
- up_read(&curseg->journal_rwsem);
up_read(&nm_i->nat_tree_lock);
- ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid),
+ f2fs_ra_meta_pages(sbi, NAT_BLOCK_OFFSET(nm_i->next_scan_nid),
nm_i->ra_nid_pages, META_NAT, false);
}
-void build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
+void f2fs_build_free_nids(struct f2fs_sb_info *sbi, bool sync, bool mount)
{
mutex_lock(&NM_I(sbi)->build_lock);
- __build_free_nids(sbi, sync, mount);
+ __f2fs_build_free_nids(sbi, sync, mount);
mutex_unlock(&NM_I(sbi)->build_lock);
}
@@ -2073,7 +2124,7 @@
* from second parameter of this function.
* The returned nid could be used ino as well as nid when inode is created.
*/
-bool alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
+bool f2fs_alloc_nid(struct f2fs_sb_info *sbi, nid_t *nid)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i = NULL;
@@ -2091,16 +2142,14 @@
return false;
}
- /* We should not use stale free nids created by build_free_nids */
- if (nm_i->nid_cnt[FREE_NID_LIST] && !on_build_free_nids(nm_i)) {
- f2fs_bug_on(sbi, list_empty(&nm_i->nid_list[FREE_NID_LIST]));
- i = list_first_entry(&nm_i->nid_list[FREE_NID_LIST],
+ /* We should not use stale free nids created by f2fs_build_free_nids */
+ if (nm_i->nid_cnt[FREE_NID] && !on_f2fs_build_free_nids(nm_i)) {
+ f2fs_bug_on(sbi, list_empty(&nm_i->free_nid_list));
+ i = list_first_entry(&nm_i->free_nid_list,
struct free_nid, list);
*nid = i->nid;
- __remove_nid_from_list(sbi, i, FREE_NID_LIST, true);
- i->state = NID_ALLOC;
- __insert_nid_to_list(sbi, i, ALLOC_NID_LIST, false);
+ __move_free_nid(sbi, i, FREE_NID, PREALLOC_NID);
nm_i->available_nids--;
update_free_nid_bitmap(sbi, *nid, false, false);
@@ -2111,14 +2160,14 @@
spin_unlock(&nm_i->nid_list_lock);
/* Let's scan nat pages and its caches to get free nids */
- build_free_nids(sbi, true, false);
+ f2fs_build_free_nids(sbi, true, false);
goto retry;
}
/*
- * alloc_nid() should be called prior to this function.
+ * f2fs_alloc_nid() should be called prior to this function.
*/
-void alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid)
+void f2fs_alloc_nid_done(struct f2fs_sb_info *sbi, nid_t nid)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i;
@@ -2126,16 +2175,16 @@
spin_lock(&nm_i->nid_list_lock);
i = __lookup_free_nid_list(nm_i, nid);
f2fs_bug_on(sbi, !i);
- __remove_nid_from_list(sbi, i, ALLOC_NID_LIST, false);
+ __remove_free_nid(sbi, i, PREALLOC_NID);
spin_unlock(&nm_i->nid_list_lock);
kmem_cache_free(free_nid_slab, i);
}
/*
- * alloc_nid() should be called prior to this function.
+ * f2fs_alloc_nid() should be called prior to this function.
*/
-void alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid)
+void f2fs_alloc_nid_failed(struct f2fs_sb_info *sbi, nid_t nid)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i;
@@ -2148,13 +2197,11 @@
i = __lookup_free_nid_list(nm_i, nid);
f2fs_bug_on(sbi, !i);
- if (!available_free_memory(sbi, FREE_NIDS)) {
- __remove_nid_from_list(sbi, i, ALLOC_NID_LIST, false);
+ if (!f2fs_available_free_memory(sbi, FREE_NIDS)) {
+ __remove_free_nid(sbi, i, PREALLOC_NID);
need_free = true;
} else {
- __remove_nid_from_list(sbi, i, ALLOC_NID_LIST, true);
- i->state = NID_NEW;
- __insert_nid_to_list(sbi, i, FREE_NID_LIST, false);
+ __move_free_nid(sbi, i, PREALLOC_NID, FREE_NID);
}
nm_i->available_nids++;
@@ -2167,26 +2214,25 @@
kmem_cache_free(free_nid_slab, i);
}
-int try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
+int f2fs_try_to_free_nids(struct f2fs_sb_info *sbi, int nr_shrink)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i, *next;
int nr = nr_shrink;
- if (nm_i->nid_cnt[FREE_NID_LIST] <= MAX_FREE_NIDS)
+ if (nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
return 0;
if (!mutex_trylock(&nm_i->build_lock))
return 0;
spin_lock(&nm_i->nid_list_lock);
- list_for_each_entry_safe(i, next, &nm_i->nid_list[FREE_NID_LIST],
- list) {
+ list_for_each_entry_safe(i, next, &nm_i->free_nid_list, list) {
if (nr_shrink <= 0 ||
- nm_i->nid_cnt[FREE_NID_LIST] <= MAX_FREE_NIDS)
+ nm_i->nid_cnt[FREE_NID] <= MAX_FREE_NIDS)
break;
- __remove_nid_from_list(sbi, i, FREE_NID_LIST, false);
+ __remove_free_nid(sbi, i, FREE_NID);
kmem_cache_free(free_nid_slab, i);
nr_shrink--;
}
@@ -2196,34 +2242,36 @@
return nr - nr_shrink;
}
-void recover_inline_xattr(struct inode *inode, struct page *page)
+void f2fs_recover_inline_xattr(struct inode *inode, struct page *page)
{
void *src_addr, *dst_addr;
size_t inline_size;
struct page *ipage;
struct f2fs_inode *ri;
- ipage = get_node_page(F2FS_I_SB(inode), inode->i_ino);
+ ipage = f2fs_get_node_page(F2FS_I_SB(inode), inode->i_ino);
f2fs_bug_on(F2FS_I_SB(inode), IS_ERR(ipage));
ri = F2FS_INODE(page);
- if (!(ri->i_inline & F2FS_INLINE_XATTR)) {
+ if (ri->i_inline & F2FS_INLINE_XATTR) {
+ set_inode_flag(inode, FI_INLINE_XATTR);
+ } else {
clear_inode_flag(inode, FI_INLINE_XATTR);
goto update_inode;
}
- dst_addr = inline_xattr_addr(ipage);
- src_addr = inline_xattr_addr(page);
+ dst_addr = inline_xattr_addr(inode, ipage);
+ src_addr = inline_xattr_addr(inode, page);
inline_size = inline_xattr_size(inode);
f2fs_wait_on_page_writeback(ipage, NODE, true);
memcpy(dst_addr, src_addr, inline_size);
update_inode:
- update_inode(inode, ipage);
+ f2fs_update_inode(inode, ipage);
f2fs_put_page(ipage, 1);
}
-int recover_xattr_data(struct inode *inode, struct page *page, block_t blkaddr)
+int f2fs_recover_xattr_data(struct inode *inode, struct page *page)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
nid_t prev_xnid = F2FS_I(inode)->i_xattr_nid;
@@ -2236,26 +2284,25 @@
goto recover_xnid;
/* 1: invalidate the previous xattr nid */
- get_node_info(sbi, prev_xnid, &ni);
- f2fs_bug_on(sbi, ni.blk_addr == NULL_ADDR);
- invalidate_blocks(sbi, ni.blk_addr);
+ f2fs_get_node_info(sbi, prev_xnid, &ni);
+ f2fs_invalidate_blocks(sbi, ni.blk_addr);
dec_valid_node_count(sbi, inode, false);
set_node_addr(sbi, &ni, NULL_ADDR, false);
recover_xnid:
/* 2: update xattr nid in inode */
- if (!alloc_nid(sbi, &new_xnid))
+ if (!f2fs_alloc_nid(sbi, &new_xnid))
return -ENOSPC;
set_new_dnode(&dn, inode, NULL, NULL, new_xnid);
- xpage = new_node_page(&dn, XATTR_NODE_OFFSET);
+ xpage = f2fs_new_node_page(&dn, XATTR_NODE_OFFSET);
if (IS_ERR(xpage)) {
- alloc_nid_failed(sbi, new_xnid);
+ f2fs_alloc_nid_failed(sbi, new_xnid);
return PTR_ERR(xpage);
}
- alloc_nid_done(sbi, new_xnid);
- update_inode_page(inode);
+ f2fs_alloc_nid_done(sbi, new_xnid);
+ f2fs_update_inode_page(inode);
/* 3: update and set xattr node page dirty */
memcpy(F2FS_NODE(xpage), F2FS_NODE(page), VALID_XATTR_BLOCK_SIZE);
@@ -2266,14 +2313,14 @@
return 0;
}
-int recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
+int f2fs_recover_inode_page(struct f2fs_sb_info *sbi, struct page *page)
{
struct f2fs_inode *src, *dst;
nid_t ino = ino_of_node(page);
struct node_info old_ni, new_ni;
struct page *ipage;
- get_node_info(sbi, ino, &old_ni);
+ f2fs_get_node_info(sbi, ino, &old_ni);
if (unlikely(old_ni.blk_addr != NULL_ADDR))
return -EINVAL;
@@ -2290,6 +2337,7 @@
if (!PageUptodate(ipage))
SetPageUptodate(ipage);
fill_node_footer(ipage, ino, ino, 0, true);
+ set_cold_node(page, false);
src = F2FS_INODE(page);
dst = F2FS_INODE(ipage);
@@ -2302,6 +2350,12 @@
dst->i_inline = src->i_inline & (F2FS_INLINE_XATTR | F2FS_EXTRA_ATTR);
if (dst->i_inline & F2FS_EXTRA_ATTR) {
dst->i_extra_isize = src->i_extra_isize;
+
+ if (f2fs_sb_has_flexible_inline_xattr(sbi->sb) &&
+ F2FS_FITS_IN_INODE(src, le16_to_cpu(src->i_extra_isize),
+ i_inline_xattr_size))
+ dst->i_inline_xattr_size = src->i_inline_xattr_size;
+
if (f2fs_sb_has_project_quota(sbi->sb) &&
F2FS_FITS_IN_INODE(src, le16_to_cpu(src->i_extra_isize),
i_projid))
@@ -2320,7 +2374,7 @@
return 0;
}
-int restore_node_summary(struct f2fs_sb_info *sbi,
+void f2fs_restore_node_summary(struct f2fs_sb_info *sbi,
unsigned int segno, struct f2fs_summary_block *sum)
{
struct f2fs_node *rn;
@@ -2337,10 +2391,10 @@
nrpages = min(last_offset - i, BIO_MAX_PAGES);
/* readahead node pages */
- ra_meta_pages(sbi, addr, nrpages, META_POR, true);
+ f2fs_ra_meta_pages(sbi, addr, nrpages, META_POR, true);
for (idx = addr; idx < addr + nrpages; idx++) {
- struct page *page = get_tmp_page(sbi, idx);
+ struct page *page = f2fs_get_tmp_page(sbi, idx);
rn = F2FS_NODE(page);
sum_entry->nid = rn->footer.nid;
@@ -2353,7 +2407,6 @@
invalidate_mapping_pages(META_MAPPING(sbi), addr,
addr + nrpages);
}
- return 0;
}
static void remove_nats_in_journal(struct f2fs_sb_info *sbi)
@@ -2373,8 +2426,8 @@
ne = __lookup_nat_cache(nm_i, nid);
if (!ne) {
- ne = grab_nat_entry(nm_i, nid, true);
- node_info_from_raw_nat(&ne->ni, &raw_ne);
+ ne = __alloc_nat_entry(nid, true);
+ __init_nat_entry(nm_i, ne, &raw_ne, true);
}
/*
@@ -2420,15 +2473,17 @@
unsigned int nat_index = start_nid / NAT_ENTRY_PER_BLOCK;
struct f2fs_nat_block *nat_blk = page_address(page);
int valid = 0;
- int i;
+ int i = 0;
if (!enabled_nat_bits(sbi, NULL))
return;
- for (i = 0; i < NAT_ENTRY_PER_BLOCK; i++) {
- if (start_nid == 0 && i == 0)
- valid++;
- if (nat_blk->entries[i].block_addr)
+ if (nat_index == 0) {
+ valid = 1;
+ i = 1;
+ }
+ for (; i < NAT_ENTRY_PER_BLOCK; i++) {
+ if (nat_blk->entries[i].block_addr != NULL_ADDR)
valid++;
}
if (valid == 0) {
@@ -2481,7 +2536,7 @@
f2fs_bug_on(sbi, nat_get_blkaddr(ne) == NEW_ADDR);
if (to_journal) {
- offset = lookup_journal_in_cursum(journal,
+ offset = f2fs_lookup_journal_in_cursum(journal,
NAT_JOURNAL, nid, 1);
f2fs_bug_on(sbi, offset < 0);
raw_ne = &nat_in_journal(journal, offset);
@@ -2493,11 +2548,7 @@
nat_reset_flag(ne);
__clear_nat_cache_dirty(NM_I(sbi), set, ne);
if (nat_get_blkaddr(ne) == NULL_ADDR) {
- add_free_nid(sbi, nid, false);
- spin_lock(&NM_I(sbi)->nid_list_lock);
- NM_I(sbi)->available_nids++;
- update_free_nid_bitmap(sbi, nid, true, false);
- spin_unlock(&NM_I(sbi)->nid_list_lock);
+ add_free_nid(sbi, nid, false, true);
} else {
spin_lock(&NM_I(sbi)->nid_list_lock);
update_free_nid_bitmap(sbi, nid, false, false);
@@ -2522,7 +2573,7 @@
/*
* This function is called during the checkpointing process.
*/
-void flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
+void f2fs_flush_nat_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_HOT_DATA);
@@ -2576,17 +2627,16 @@
if (!enabled_nat_bits(sbi, NULL))
return 0;
- nm_i->nat_bits_blocks = F2FS_BYTES_TO_BLK((nat_bits_bytes << 1) + 8 +
- F2FS_BLKSIZE - 1);
- nm_i->nat_bits = kzalloc(nm_i->nat_bits_blocks << F2FS_BLKSIZE_BITS,
- GFP_KERNEL);
+ nm_i->nat_bits_blocks = F2FS_BLK_ALIGN((nat_bits_bytes << 1) + 8);
+ nm_i->nat_bits = f2fs_kzalloc(sbi,
+ nm_i->nat_bits_blocks << F2FS_BLKSIZE_BITS, GFP_KERNEL);
if (!nm_i->nat_bits)
return -ENOMEM;
nat_bits_addr = __start_cp_addr(sbi) + sbi->blocks_per_seg -
nm_i->nat_bits_blocks;
for (i = 0; i < nm_i->nat_bits_blocks; i++) {
- struct page *page = get_meta_page(sbi, nat_bits_addr++);
+ struct page *page = f2fs_get_meta_page(sbi, nat_bits_addr++);
memcpy(nm_i->nat_bits + (i << F2FS_BLKSIZE_BITS),
page_address(page), F2FS_BLKSIZE);
@@ -2623,7 +2673,7 @@
__set_bit_le(i, nm_i->nat_block_bitmap);
nid = i * NAT_ENTRY_PER_BLOCK;
- last_nid = (i + 1) * NAT_ENTRY_PER_BLOCK;
+ last_nid = nid + NAT_ENTRY_PER_BLOCK;
spin_lock(&NM_I(sbi)->nid_list_lock);
for (; nid < last_nid; nid++)
@@ -2657,17 +2707,16 @@
/* not used nids: 0, node, meta, (and root counted as valid node) */
nm_i->available_nids = nm_i->max_nid - sbi->total_valid_node_count -
- F2FS_RESERVED_NODE_NUM;
- nm_i->nid_cnt[FREE_NID_LIST] = 0;
- nm_i->nid_cnt[ALLOC_NID_LIST] = 0;
+ sbi->nquota_files - F2FS_RESERVED_NODE_NUM;
+ nm_i->nid_cnt[FREE_NID] = 0;
+ nm_i->nid_cnt[PREALLOC_NID] = 0;
nm_i->nat_cnt = 0;
nm_i->ram_thresh = DEF_RAM_THRESHOLD;
nm_i->ra_nid_pages = DEF_RA_NID_PAGES;
nm_i->dirty_nats_ratio = DEF_DIRTY_NAT_RATIO_THRESHOLD;
INIT_RADIX_TREE(&nm_i->free_nid_root, GFP_ATOMIC);
- INIT_LIST_HEAD(&nm_i->nid_list[FREE_NID_LIST]);
- INIT_LIST_HEAD(&nm_i->nid_list[ALLOC_NID_LIST]);
+ INIT_LIST_HEAD(&nm_i->free_nid_list);
INIT_RADIX_TREE(&nm_i->nat_root, GFP_NOIO);
INIT_RADIX_TREE(&nm_i->nat_set_root, GFP_NOIO);
INIT_LIST_HEAD(&nm_i->nat_entries);
@@ -2704,29 +2753,42 @@
static int init_free_nid_cache(struct f2fs_sb_info *sbi)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
+ int i;
- nm_i->free_nid_bitmap = kvzalloc(nm_i->nat_blocks *
- NAT_ENTRY_BITMAP_SIZE, GFP_KERNEL);
+ nm_i->free_nid_bitmap =
+ f2fs_kzalloc(sbi, array_size(sizeof(unsigned char *),
+ nm_i->nat_blocks),
+ GFP_KERNEL);
if (!nm_i->free_nid_bitmap)
return -ENOMEM;
- nm_i->nat_block_bitmap = kvzalloc(nm_i->nat_blocks / 8,
+ for (i = 0; i < nm_i->nat_blocks; i++) {
+ nm_i->free_nid_bitmap[i] = f2fs_kvzalloc(sbi,
+ NAT_ENTRY_BITMAP_SIZE_ALIGNED, GFP_KERNEL);
+ if (!nm_i->free_nid_bitmap)
+ return -ENOMEM;
+ }
+
+ nm_i->nat_block_bitmap = f2fs_kvzalloc(sbi, nm_i->nat_blocks / 8,
GFP_KERNEL);
if (!nm_i->nat_block_bitmap)
return -ENOMEM;
- nm_i->free_nid_count = kvzalloc(nm_i->nat_blocks *
- sizeof(unsigned short), GFP_KERNEL);
+ nm_i->free_nid_count =
+ f2fs_kvzalloc(sbi, array_size(sizeof(unsigned short),
+ nm_i->nat_blocks),
+ GFP_KERNEL);
if (!nm_i->free_nid_count)
return -ENOMEM;
return 0;
}
-int build_node_manager(struct f2fs_sb_info *sbi)
+int f2fs_build_node_manager(struct f2fs_sb_info *sbi)
{
int err;
- sbi->nm_info = kzalloc(sizeof(struct f2fs_nm_info), GFP_KERNEL);
+ sbi->nm_info = f2fs_kzalloc(sbi, sizeof(struct f2fs_nm_info),
+ GFP_KERNEL);
if (!sbi->nm_info)
return -ENOMEM;
@@ -2741,11 +2803,11 @@
/* load free nid status from nat_bits table */
load_free_nid_bitmap(sbi);
- build_free_nids(sbi, true, true);
+ f2fs_build_free_nids(sbi, true, true);
return 0;
}
-void destroy_node_manager(struct f2fs_sb_info *sbi)
+void f2fs_destroy_node_manager(struct f2fs_sb_info *sbi)
{
struct f2fs_nm_info *nm_i = NM_I(sbi);
struct free_nid *i, *next_i;
@@ -2759,16 +2821,15 @@
/* destroy free nid list */
spin_lock(&nm_i->nid_list_lock);
- list_for_each_entry_safe(i, next_i, &nm_i->nid_list[FREE_NID_LIST],
- list) {
- __remove_nid_from_list(sbi, i, FREE_NID_LIST, false);
+ list_for_each_entry_safe(i, next_i, &nm_i->free_nid_list, list) {
+ __remove_free_nid(sbi, i, FREE_NID);
spin_unlock(&nm_i->nid_list_lock);
kmem_cache_free(free_nid_slab, i);
spin_lock(&nm_i->nid_list_lock);
}
- f2fs_bug_on(sbi, nm_i->nid_cnt[FREE_NID_LIST]);
- f2fs_bug_on(sbi, nm_i->nid_cnt[ALLOC_NID_LIST]);
- f2fs_bug_on(sbi, !list_empty(&nm_i->nid_list[ALLOC_NID_LIST]));
+ f2fs_bug_on(sbi, nm_i->nid_cnt[FREE_NID]);
+ f2fs_bug_on(sbi, nm_i->nid_cnt[PREALLOC_NID]);
+ f2fs_bug_on(sbi, !list_empty(&nm_i->free_nid_list));
spin_unlock(&nm_i->nid_list_lock);
/* destroy nat cache */
@@ -2800,7 +2861,13 @@
up_write(&nm_i->nat_tree_lock);
kvfree(nm_i->nat_block_bitmap);
- kvfree(nm_i->free_nid_bitmap);
+ if (nm_i->free_nid_bitmap) {
+ int i;
+
+ for (i = 0; i < nm_i->nat_blocks; i++)
+ kvfree(nm_i->free_nid_bitmap[i]);
+ kfree(nm_i->free_nid_bitmap);
+ }
kvfree(nm_i->free_nid_count);
kfree(nm_i->nat_bitmap);
@@ -2812,7 +2879,7 @@
kfree(nm_i);
}
-int __init create_node_manager_caches(void)
+int __init f2fs_create_node_manager_caches(void)
{
nat_entry_slab = f2fs_kmem_cache_create("nat_entry",
sizeof(struct nat_entry));
@@ -2838,7 +2905,7 @@
return -ENOMEM;
}
-void destroy_node_manager_caches(void)
+void f2fs_destroy_node_manager_caches(void)
{
kmem_cache_destroy(nat_entry_set_slab);
kmem_cache_destroy(free_nid_slab);
diff --git a/fs/f2fs/node.h b/fs/f2fs/node.h
index bb53e99..b95e49e 100644
--- a/fs/f2fs/node.h
+++ b/fs/f2fs/node.h
@@ -44,6 +44,7 @@
HAS_FSYNCED_INODE, /* is the inode fsynced before? */
HAS_LAST_FSYNC, /* has the latest node fsync mark? */
IS_DIRTY, /* this nat entry is dirty? */
+ IS_PREALLOC, /* nat entry is preallocated */
};
/*
@@ -140,6 +141,7 @@
DIRTY_DENTS, /* indicates dirty dentry pages */
INO_ENTRIES, /* indicates inode entries */
EXTENT_CACHE, /* indicates extent cache */
+ INMEM_PAGES, /* indicates inmemory pages */
BASE_CHECK, /* check kernel status */
};
@@ -150,18 +152,10 @@
unsigned int entry_cnt; /* the # of nat entries in set */
};
-/*
- * For free nid mangement
- */
-enum nid_state {
- NID_NEW, /* newly added to free nid list */
- NID_ALLOC /* it is allocated */
-};
-
struct free_nid {
struct list_head list; /* for free node id list */
nid_t nid; /* node id */
- int state; /* in use or not: NID_NEW or NID_ALLOC */
+ int state; /* in use or not: FREE_NID or PREALLOC_NID */
};
static inline void next_free_nid(struct f2fs_sb_info *sbi, nid_t *nid)
@@ -170,12 +164,11 @@
struct free_nid *fnid;
spin_lock(&nm_i->nid_list_lock);
- if (nm_i->nid_cnt[FREE_NID_LIST] <= 0) {
+ if (nm_i->nid_cnt[FREE_NID] <= 0) {
spin_unlock(&nm_i->nid_list_lock);
return;
}
- fnid = list_first_entry(&nm_i->nid_list[FREE_NID_LIST],
- struct free_nid, list);
+ fnid = list_first_entry(&nm_i->free_nid_list, struct free_nid, list);
*nid = fnid->nid;
spin_unlock(&nm_i->nid_list_lock);
}
@@ -313,6 +306,10 @@
struct f2fs_checkpoint *ckpt = F2FS_CKPT(F2FS_P_SB(page));
__u64 cp_ver = cur_cp_version(ckpt);
+ /* Don't care crc part, if fsck.f2fs sets it. */
+ if (__is_set_ckpt_flags(ckpt, CP_NOCRC_RECOVERY_FLAG))
+ return (cp_ver << 32) == (cpver_of_node(page) << 32);
+
if (__is_set_ckpt_flags(ckpt, CP_CRC_RECOVERY_FLAG))
cp_ver |= (cur_cp_crc(ckpt) << 32);
@@ -426,12 +423,12 @@
ClearPageChecked(page);
}
-static inline void set_cold_node(struct inode *inode, struct page *page)
+static inline void set_cold_node(struct page *page, bool is_dir)
{
struct f2fs_node *rn = F2FS_NODE(page);
unsigned int flag = le32_to_cpu(rn->footer.flag);
- if (S_ISDIR(inode->i_mode))
+ if (is_dir)
flag &= ~(0x1 << COLD_BIT_SHIFT);
else
flag |= (0x1 << COLD_BIT_SHIFT);
diff --git a/fs/f2fs/recovery.c b/fs/f2fs/recovery.c
index 9626758..daf81d4 100644
--- a/fs/f2fs/recovery.c
+++ b/fs/f2fs/recovery.c
@@ -47,7 +47,7 @@
static struct kmem_cache *fsync_entry_slab;
-bool space_for_roll_forward(struct f2fs_sb_info *sbi)
+bool f2fs_space_for_roll_forward(struct f2fs_sb_info *sbi)
{
s64 nalloc = percpu_counter_sum_positive(&sbi->alloc_valid_block_count);
@@ -144,7 +144,7 @@
retry:
de = __f2fs_find_entry(dir, &fname, &page);
if (de && inode->i_ino == le32_to_cpu(de->ino))
- goto out_unmap_put;
+ goto out_put;
if (de) {
einode = f2fs_iget_retry(inode->i_sb, le32_to_cpu(de->ino));
@@ -153,19 +153,19 @@
err = PTR_ERR(einode);
if (err == -ENOENT)
err = -EEXIST;
- goto out_unmap_put;
+ goto out_put;
}
err = dquot_initialize(einode);
if (err) {
iput(einode);
- goto out_unmap_put;
+ goto out_put;
}
- err = acquire_orphan_inode(F2FS_I_SB(inode));
+ err = f2fs_acquire_orphan_inode(F2FS_I_SB(inode));
if (err) {
iput(einode);
- goto out_unmap_put;
+ goto out_put;
}
f2fs_delete_entry(de, page, dir, einode);
iput(einode);
@@ -173,15 +173,14 @@
} else if (IS_ERR(page)) {
err = PTR_ERR(page);
} else {
- err = __f2fs_do_add_link(dir, &fname, inode,
+ err = f2fs_add_dentry(dir, &fname, inode,
inode->i_ino, inode->i_mode);
}
if (err == -ENOMEM)
goto retry;
goto out;
-out_unmap_put:
- f2fs_dentry_kunmap(dir, page);
+out_put:
f2fs_put_page(page, 0);
out:
if (file_enc_name(inode))
@@ -195,6 +194,18 @@
return err;
}
+static void recover_inline_flags(struct inode *inode, struct f2fs_inode *ri)
+{
+ if (ri->i_inline & F2FS_PIN_FILE)
+ set_inode_flag(inode, FI_PIN_FILE);
+ else
+ clear_inode_flag(inode, FI_PIN_FILE);
+ if (ri->i_inline & F2FS_DATA_EXIST)
+ set_inode_flag(inode, FI_DATA_EXIST);
+ else
+ clear_inode_flag(inode, FI_DATA_EXIST);
+}
+
static void recover_inode(struct inode *inode, struct page *page)
{
struct f2fs_inode *raw = F2FS_INODE(page);
@@ -211,13 +222,16 @@
F2FS_I(inode)->i_advise = raw->i_advise;
+ recover_inline_flags(inode, raw);
+
if (file_enc_name(inode))
name = "<encrypted>";
else
name = F2FS_INODE(page)->i_name;
- f2fs_msg(inode->i_sb, KERN_NOTICE, "recover_inode: ino = %x, name = %s",
- ino_of_node(page), name);
+ f2fs_msg(inode->i_sb, KERN_NOTICE,
+ "recover_inode: ino = %x, name = %s, inline = %x",
+ ino_of_node(page), name, raw->i_inline);
}
static int find_fsync_dnodes(struct f2fs_sb_info *sbi, struct list_head *head,
@@ -226,6 +240,9 @@
struct curseg_info *curseg;
struct page *page = NULL;
block_t blkaddr;
+ unsigned int loop_cnt = 0;
+ unsigned int free_blocks = sbi->user_block_count -
+ valid_user_blocks(sbi);
int err = 0;
/* get node pages in the current segment */
@@ -235,10 +252,10 @@
while (1) {
struct fsync_inode_entry *entry;
- if (!is_valid_blkaddr(sbi, blkaddr, META_POR))
+ if (!f2fs_is_valid_meta_blkaddr(sbi, blkaddr, META_POR))
return 0;
- page = get_tmp_page(sbi, blkaddr);
+ page = f2fs_get_tmp_page(sbi, blkaddr);
if (!is_recoverable_dnode(page))
break;
@@ -252,7 +269,7 @@
if (!check_only &&
IS_INODE(page) && is_dent_dnode(page)) {
- err = recover_inode_page(sbi, page);
+ err = f2fs_recover_inode_page(sbi, page);
if (err)
break;
quota_inode = true;
@@ -278,11 +295,22 @@
if (IS_INODE(page) && is_dent_dnode(page))
entry->last_dentry = blkaddr;
next:
+ /* sanity check in order to detect looped node chain */
+ if (++loop_cnt >= free_blocks ||
+ blkaddr == next_blkaddr_of_node(page)) {
+ f2fs_msg(sbi->sb, KERN_NOTICE,
+ "%s: detect looped node chain, "
+ "blkaddr:%u, next:%u",
+ __func__, blkaddr, next_blkaddr_of_node(page));
+ err = -EINVAL;
+ break;
+ }
+
/* check next segment */
blkaddr = next_blkaddr_of_node(page);
f2fs_put_page(page, 1);
- ra_meta_pages_cond(sbi, blkaddr);
+ f2fs_ra_meta_pages_cond(sbi, blkaddr);
}
f2fs_put_page(page, 1);
return err;
@@ -325,7 +353,7 @@
}
}
- sum_page = get_sum_page(sbi, segno);
+ sum_page = f2fs_get_sum_page(sbi, segno);
sum_node = (struct f2fs_summary_block *)page_address(sum_page);
sum = sum_node->entries[blkoff];
f2fs_put_page(sum_page, 1);
@@ -345,7 +373,7 @@
}
/* Get the node page */
- node_page = get_node_page(sbi, nid);
+ node_page = f2fs_get_node_page(sbi, nid);
if (IS_ERR(node_page))
return PTR_ERR(node_page);
@@ -370,7 +398,8 @@
inode = dn->inode;
}
- bidx = start_bidx_of_node(offset, inode) + le16_to_cpu(sum.ofs_in_node);
+ bidx = f2fs_start_bidx_of_node(offset, inode) +
+ le16_to_cpu(sum.ofs_in_node);
/*
* if inode page is locked, unlock temporarily, but its reference
@@ -380,11 +409,11 @@
unlock_page(dn->inode_page);
set_new_dnode(&tdn, inode, NULL, NULL, 0);
- if (get_dnode_of_data(&tdn, bidx, LOOKUP_NODE))
+ if (f2fs_get_dnode_of_data(&tdn, bidx, LOOKUP_NODE))
goto out;
if (tdn.data_blkaddr == blkaddr)
- truncate_data_blocks_range(&tdn, 1);
+ f2fs_truncate_data_blocks_range(&tdn, 1);
f2fs_put_dnode(&tdn);
out:
@@ -397,14 +426,14 @@
truncate_out:
if (datablock_addr(tdn.inode, tdn.node_page,
tdn.ofs_in_node) == blkaddr)
- truncate_data_blocks_range(&tdn, 1);
+ f2fs_truncate_data_blocks_range(&tdn, 1);
if (dn->inode->i_ino == nid && !dn->inode_page_locked)
unlock_page(dn->inode_page);
return 0;
}
static int do_recover_data(struct f2fs_sb_info *sbi, struct inode *inode,
- struct page *page, block_t blkaddr)
+ struct page *page)
{
struct dnode_of_data dn;
struct node_info ni;
@@ -413,25 +442,25 @@
/* step 1: recover xattr */
if (IS_INODE(page)) {
- recover_inline_xattr(inode, page);
+ f2fs_recover_inline_xattr(inode, page);
} else if (f2fs_has_xattr_block(ofs_of_node(page))) {
- err = recover_xattr_data(inode, page, blkaddr);
+ err = f2fs_recover_xattr_data(inode, page);
if (!err)
recovered++;
goto out;
}
/* step 2: recover inline data */
- if (recover_inline_data(inode, page))
+ if (f2fs_recover_inline_data(inode, page))
goto out;
/* step 3: recover data indices */
- start = start_bidx_of_node(ofs_of_node(page), inode);
+ start = f2fs_start_bidx_of_node(ofs_of_node(page), inode);
end = start + ADDRS_PER_PAGE(page, inode);
set_new_dnode(&dn, inode, NULL, NULL, 0);
retry_dn:
- err = get_dnode_of_data(&dn, start, ALLOC_NODE);
+ err = f2fs_get_dnode_of_data(&dn, start, ALLOC_NODE);
if (err) {
if (err == -ENOMEM) {
congestion_wait(BLK_RW_ASYNC, HZ/50);
@@ -442,7 +471,7 @@
f2fs_wait_on_page_writeback(dn.node_page, NODE, true);
- get_node_info(sbi, dn.nid, &ni);
+ f2fs_get_node_info(sbi, dn.nid, &ni);
f2fs_bug_on(sbi, ni.ino != ino_of_node(page));
f2fs_bug_on(sbi, ofs_of_node(dn.node_page) != ofs_of_node(page));
@@ -458,7 +487,7 @@
/* dest is invalid, just invalidate src block */
if (dest == NULL_ADDR) {
- truncate_data_blocks_range(&dn, 1);
+ f2fs_truncate_data_blocks_range(&dn, 1);
continue;
}
@@ -472,19 +501,19 @@
* and then reserve one new block in dnode page.
*/
if (dest == NEW_ADDR) {
- truncate_data_blocks_range(&dn, 1);
- reserve_new_block(&dn);
+ f2fs_truncate_data_blocks_range(&dn, 1);
+ f2fs_reserve_new_block(&dn);
continue;
}
/* dest is valid block, try to recover from src to dest */
- if (is_valid_blkaddr(sbi, dest, META_POR)) {
+ if (f2fs_is_valid_meta_blkaddr(sbi, dest, META_POR)) {
if (src == NULL_ADDR) {
- err = reserve_new_block(&dn);
+ err = f2fs_reserve_new_block(&dn);
#ifdef CONFIG_F2FS_FAULT_INJECTION
while (err)
- err = reserve_new_block(&dn);
+ err = f2fs_reserve_new_block(&dn);
#endif
/* We should not get -ENOSPC */
f2fs_bug_on(sbi, err);
@@ -539,12 +568,12 @@
while (1) {
struct fsync_inode_entry *entry;
- if (!is_valid_blkaddr(sbi, blkaddr, META_POR))
+ if (!f2fs_is_valid_meta_blkaddr(sbi, blkaddr, META_POR))
break;
- ra_meta_pages_cond(sbi, blkaddr);
+ f2fs_ra_meta_pages_cond(sbi, blkaddr);
- page = get_tmp_page(sbi, blkaddr);
+ page = f2fs_get_tmp_page(sbi, blkaddr);
if (!is_recoverable_dnode(page)) {
f2fs_put_page(page, 1);
@@ -568,7 +597,7 @@
break;
}
}
- err = do_recover_data(sbi, entry->inode, page, blkaddr);
+ err = do_recover_data(sbi, entry->inode, page);
if (err) {
f2fs_put_page(page, 1);
break;
@@ -582,11 +611,11 @@
f2fs_put_page(page, 1);
}
if (!err)
- allocate_new_segments(sbi);
+ f2fs_allocate_new_segments(sbi);
return err;
}
-int recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
+int f2fs_recover_fsync_data(struct f2fs_sb_info *sbi, bool check_only)
{
struct list_head inode_list;
struct list_head dir_list;
@@ -594,6 +623,9 @@
int ret = 0;
unsigned long s_flags = sbi->sb->s_flags;
bool need_writecp = false;
+#ifdef CONFIG_QUOTA
+ int quota_enabled;
+#endif
if (s_flags & MS_RDONLY) {
f2fs_msg(sbi->sb, KERN_INFO, "orphan cleanup on readonly fs");
@@ -604,7 +636,7 @@
/* Needed for iput() to work correctly and not trash data */
sbi->sb->s_flags |= MS_ACTIVE;
/* Turn on quotas so that they are updated correctly */
- f2fs_enable_quota_files(sbi);
+ quota_enabled = f2fs_enable_quota_files(sbi, s_flags & MS_RDONLY);
#endif
fsync_entry_slab = f2fs_kmem_cache_create("f2fs_fsync_inode_entry",
@@ -658,14 +690,15 @@
struct cp_control cpc = {
.reason = CP_RECOVERY,
};
- err = write_checkpoint(sbi, &cpc);
+ err = f2fs_write_checkpoint(sbi, &cpc);
}
kmem_cache_destroy(fsync_entry_slab);
out:
#ifdef CONFIG_QUOTA
/* Turn quotas off */
- f2fs_quota_off_umount(sbi->sb);
+ if (quota_enabled)
+ f2fs_quota_off_umount(sbi->sb);
#endif
sbi->sb->s_flags = s_flags; /* Restore MS_RDONLY status */
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 3c7bbba..9efce17 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -169,7 +169,7 @@
return result - size + __reverse_ffz(tmp);
}
-bool need_SSR(struct f2fs_sb_info *sbi)
+bool f2fs_need_SSR(struct f2fs_sb_info *sbi)
{
int node_secs = get_blocktype_secs(sbi, F2FS_DIRTY_NODES);
int dent_secs = get_blocktype_secs(sbi, F2FS_DIRTY_DENTS);
@@ -177,15 +177,16 @@
if (test_opt(sbi, LFS))
return false;
- if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
+ if (sbi->gc_mode == GC_URGENT)
return true;
return free_sections(sbi) <= (node_secs + 2 * dent_secs + imeta_secs +
- 2 * reserved_sections(sbi));
+ SM_I(sbi)->min_ssr_sections + reserved_sections(sbi));
}
-void register_inmem_page(struct inode *inode, struct page *page)
+void f2fs_register_inmem_page(struct inode *inode, struct page *page)
{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct inmem_pages *new;
@@ -204,6 +205,10 @@
mutex_lock(&fi->inmem_lock);
get_page(page);
list_add_tail(&new->list, &fi->inmem_pages);
+ spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+ if (list_empty(&fi->inmem_ilist))
+ list_add_tail(&fi->inmem_ilist, &sbi->inode_list[ATOMIC_FILE]);
+ spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
inc_page_count(F2FS_I_SB(inode), F2FS_INMEM_PAGES);
mutex_unlock(&fi->inmem_lock);
@@ -234,7 +239,8 @@
trace_f2fs_commit_inmem_page(page, INMEM_REVOKE);
retry:
set_new_dnode(&dn, inode, NULL, NULL, 0);
- err = get_dnode_of_data(&dn, page->index, LOOKUP_NODE);
+ err = f2fs_get_dnode_of_data(&dn, page->index,
+ LOOKUP_NODE);
if (err) {
if (err == -ENOMEM) {
congestion_wait(BLK_RW_ASYNC, HZ/50);
@@ -244,8 +250,12 @@
err = -EAGAIN;
goto next;
}
- get_node_info(sbi, dn.nid, &ni);
- f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
+ f2fs_get_node_info(sbi, dn.nid, &ni);
+ if (cur->old_addr == NEW_ADDR) {
+ f2fs_invalidate_blocks(sbi, dn.data_blkaddr);
+ f2fs_update_data_blkaddr(&dn, NEW_ADDR);
+ } else
+ f2fs_replace_block(sbi, &dn, dn.data_blkaddr,
cur->old_addr, ni.version, true, true);
f2fs_put_dnode(&dn);
}
@@ -264,20 +274,57 @@
return err;
}
-void drop_inmem_pages(struct inode *inode)
+void f2fs_drop_inmem_pages_all(struct f2fs_sb_info *sbi, bool gc_failure)
{
+ struct list_head *head = &sbi->inode_list[ATOMIC_FILE];
+ struct inode *inode;
+ struct f2fs_inode_info *fi;
+next:
+ spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+ if (list_empty(head)) {
+ spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+ return;
+ }
+ fi = list_first_entry(head, struct f2fs_inode_info, inmem_ilist);
+ inode = igrab(&fi->vfs_inode);
+ spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
+
+ if (inode) {
+ if (gc_failure) {
+ if (fi->i_gc_failures[GC_FAILURE_ATOMIC])
+ goto drop;
+ goto skip;
+ }
+drop:
+ set_inode_flag(inode, FI_ATOMIC_REVOKE_REQUEST);
+ f2fs_drop_inmem_pages(inode);
+ iput(inode);
+ }
+skip:
+ congestion_wait(BLK_RW_ASYNC, HZ/50);
+ cond_resched();
+ goto next;
+}
+
+void f2fs_drop_inmem_pages(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode_info *fi = F2FS_I(inode);
mutex_lock(&fi->inmem_lock);
__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
+ spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+ if (!list_empty(&fi->inmem_ilist))
+ list_del_init(&fi->inmem_ilist);
+ spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
mutex_unlock(&fi->inmem_lock);
clear_inode_flag(inode, FI_ATOMIC_FILE);
- clear_inode_flag(inode, FI_HOT_DATA);
+ fi->i_gc_failures[GC_FAILURE_ATOMIC] = 0;
stat_dec_atomic_write(inode);
}
-void drop_inmem_page(struct inode *inode, struct page *page)
+void f2fs_drop_inmem_page(struct inode *inode, struct page *page)
{
struct f2fs_inode_info *fi = F2FS_I(inode);
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
@@ -292,7 +339,7 @@
break;
}
- f2fs_bug_on(sbi, !cur || cur->page != page);
+ f2fs_bug_on(sbi, list_empty(head) || cur->page != page);
list_del(&cur->list);
mutex_unlock(&fi->inmem_lock);
@@ -307,22 +354,25 @@
trace_f2fs_commit_inmem_page(page, INMEM_INVALIDATE);
}
-static int __commit_inmem_pages(struct inode *inode,
- struct list_head *revoke_list)
+static int __f2fs_commit_inmem_pages(struct inode *inode)
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_inode_info *fi = F2FS_I(inode);
struct inmem_pages *cur, *tmp;
struct f2fs_io_info fio = {
.sbi = sbi,
+ .ino = inode->i_ino,
.type = DATA,
.op = REQ_OP_WRITE,
.op_flags = REQ_SYNC | REQ_PRIO,
.io_type = FS_DATA_IO,
};
+ struct list_head revoke_list;
pgoff_t last_idx = ULONG_MAX;
int err = 0;
+ INIT_LIST_HEAD(&revoke_list);
+
list_for_each_entry_safe(cur, tmp, &fi->inmem_pages, list) {
struct page *page = cur->page;
@@ -334,14 +384,14 @@
f2fs_wait_on_page_writeback(page, DATA, true);
if (clear_page_dirty_for_io(page)) {
inode_dec_dirty_pages(inode);
- remove_dirty_inode(inode);
+ f2fs_remove_dirty_inode(inode);
}
retry:
fio.page = page;
fio.old_blkaddr = NULL_ADDR;
fio.encrypted_page = NULL;
fio.need_lock = LOCK_DONE;
- err = do_write_data_page(&fio);
+ err = f2fs_do_write_data_page(&fio);
if (err) {
if (err == -ENOMEM) {
congestion_wait(BLK_RW_ASYNC, HZ/50);
@@ -356,35 +406,13 @@
last_idx = page->index;
}
unlock_page(page);
- list_move_tail(&cur->list, revoke_list);
+ list_move_tail(&cur->list, &revoke_list);
}
if (last_idx != ULONG_MAX)
f2fs_submit_merged_write_cond(sbi, inode, 0, last_idx, DATA);
- if (!err)
- __revoke_inmem_pages(inode, revoke_list, false, false);
-
- return err;
-}
-
-int commit_inmem_pages(struct inode *inode)
-{
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- struct f2fs_inode_info *fi = F2FS_I(inode);
- struct list_head revoke_list;
- int err;
-
- INIT_LIST_HEAD(&revoke_list);
- f2fs_balance_fs(sbi, true);
- f2fs_lock_op(sbi);
-
- set_inode_flag(inode, FI_ATOMIC_COMMIT);
-
- mutex_lock(&fi->inmem_lock);
- err = __commit_inmem_pages(inode, &revoke_list);
if (err) {
- int ret;
/*
* try to revoke all committed pages, but still we could fail
* due to no memory or other reason, if that happened, EAGAIN
@@ -393,13 +421,35 @@
* recovery or rewrite & commit last transaction. For other
* error number, revoking was done by filesystem itself.
*/
- ret = __revoke_inmem_pages(inode, &revoke_list, false, true);
- if (ret)
- err = ret;
+ err = __revoke_inmem_pages(inode, &revoke_list, false, true);
/* drop all uncommitted pages */
__revoke_inmem_pages(inode, &fi->inmem_pages, true, false);
+ } else {
+ __revoke_inmem_pages(inode, &revoke_list, false, false);
}
+
+ return err;
+}
+
+int f2fs_commit_inmem_pages(struct inode *inode)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ struct f2fs_inode_info *fi = F2FS_I(inode);
+ int err;
+
+ f2fs_balance_fs(sbi, true);
+ f2fs_lock_op(sbi);
+
+ set_inode_flag(inode, FI_ATOMIC_COMMIT);
+
+ mutex_lock(&fi->inmem_lock);
+ err = __f2fs_commit_inmem_pages(inode);
+
+ spin_lock(&sbi->inode_lock[ATOMIC_FILE]);
+ if (!list_empty(&fi->inmem_ilist))
+ list_del_init(&fi->inmem_ilist);
+ spin_unlock(&sbi->inode_lock[ATOMIC_FILE]);
mutex_unlock(&fi->inmem_lock);
clear_inode_flag(inode, FI_ATOMIC_COMMIT);
@@ -441,24 +491,24 @@
return;
/* try to shrink extent cache when there is no enough memory */
- if (!available_free_memory(sbi, EXTENT_CACHE))
+ if (!f2fs_available_free_memory(sbi, EXTENT_CACHE))
f2fs_shrink_extent_tree(sbi, EXTENT_CACHE_SHRINK_NUMBER);
/* check the # of cached NAT entries */
- if (!available_free_memory(sbi, NAT_ENTRIES))
- try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK);
+ if (!f2fs_available_free_memory(sbi, NAT_ENTRIES))
+ f2fs_try_to_free_nats(sbi, NAT_ENTRY_PER_BLOCK);
- if (!available_free_memory(sbi, FREE_NIDS))
- try_to_free_nids(sbi, MAX_FREE_NIDS);
+ if (!f2fs_available_free_memory(sbi, FREE_NIDS))
+ f2fs_try_to_free_nids(sbi, MAX_FREE_NIDS);
else
- build_free_nids(sbi, false, false);
+ f2fs_build_free_nids(sbi, false, false);
if (!is_idle(sbi) && !excess_dirty_nats(sbi))
return;
/* checkpoint is the only way to shrink partial cached entries */
- if (!available_free_memory(sbi, NAT_ENTRIES) ||
- !available_free_memory(sbi, INO_ENTRIES) ||
+ if (!f2fs_available_free_memory(sbi, NAT_ENTRIES) ||
+ !f2fs_available_free_memory(sbi, INO_ENTRIES) ||
excess_prefree_segs(sbi) ||
excess_dirty_nats(sbi) ||
f2fs_time_over(sbi, CP_TIME)) {
@@ -466,7 +516,7 @@
struct blk_plug plug;
blk_start_plug(&plug);
- sync_dirty_inodes(sbi, FILE_INODE);
+ f2fs_sync_dirty_inodes(sbi, FILE_INODE);
blk_finish_plug(&plug);
}
f2fs_sync_fs(sbi->sb, true);
@@ -477,7 +527,7 @@
static int __submit_flush_wait(struct f2fs_sb_info *sbi,
struct block_device *bdev)
{
- struct bio *bio = f2fs_bio_alloc(0);
+ struct bio *bio = f2fs_bio_alloc(sbi, 0, true);
int ret;
bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
@@ -490,15 +540,17 @@
return ret;
}
-static int submit_flush_wait(struct f2fs_sb_info *sbi)
+static int submit_flush_wait(struct f2fs_sb_info *sbi, nid_t ino)
{
- int ret = __submit_flush_wait(sbi, sbi->sb->s_bdev);
+ int ret = 0;
int i;
- if (!sbi->s_ndevs || ret)
- return ret;
+ if (!sbi->s_ndevs)
+ return __submit_flush_wait(sbi, sbi->sb->s_bdev);
- for (i = 1; i < sbi->s_ndevs; i++) {
+ for (i = 0; i < sbi->s_ndevs; i++) {
+ if (!f2fs_is_dirty_device(sbi, ino, i, FLUSH_INO))
+ continue;
ret = __submit_flush_wait(sbi, FDEV(i).bdev);
if (ret)
break;
@@ -524,7 +576,9 @@
fcc->dispatch_list = llist_del_all(&fcc->issue_list);
fcc->dispatch_list = llist_reverse_order(fcc->dispatch_list);
- ret = submit_flush_wait(sbi);
+ cmd = llist_entry(fcc->dispatch_list, struct flush_cmd, llnode);
+
+ ret = submit_flush_wait(sbi, cmd->ino);
atomic_inc(&fcc->issued_flush);
llist_for_each_entry_safe(cmd, next,
@@ -542,7 +596,7 @@
goto repeat;
}
-int f2fs_issue_flush(struct f2fs_sb_info *sbi)
+int f2fs_issue_flush(struct f2fs_sb_info *sbi, nid_t ino)
{
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
struct flush_cmd cmd;
@@ -552,19 +606,20 @@
return 0;
if (!test_opt(sbi, FLUSH_MERGE)) {
- ret = submit_flush_wait(sbi);
+ ret = submit_flush_wait(sbi, ino);
atomic_inc(&fcc->issued_flush);
return ret;
}
- if (atomic_inc_return(&fcc->issing_flush) == 1) {
- ret = submit_flush_wait(sbi);
+ if (atomic_inc_return(&fcc->issing_flush) == 1 || sbi->s_ndevs > 1) {
+ ret = submit_flush_wait(sbi, ino);
atomic_dec(&fcc->issing_flush);
atomic_inc(&fcc->issued_flush);
return ret;
}
+ cmd.ino = ino;
init_completion(&cmd.wait);
llist_add(&cmd.llnode, &fcc->issue_list);
@@ -588,7 +643,7 @@
} else {
struct flush_cmd *tmp, *next;
- ret = submit_flush_wait(sbi);
+ ret = submit_flush_wait(sbi, ino);
llist_for_each_entry_safe(tmp, next, list, llnode) {
if (tmp == &cmd) {
@@ -605,7 +660,7 @@
return cmd.ret;
}
-int create_flush_cmd_control(struct f2fs_sb_info *sbi)
+int f2fs_create_flush_cmd_control(struct f2fs_sb_info *sbi)
{
dev_t dev = sbi->sb->s_bdev->bd_dev;
struct flush_cmd_control *fcc;
@@ -618,7 +673,7 @@
goto init_thread;
}
- fcc = kzalloc(sizeof(struct flush_cmd_control), GFP_KERNEL);
+ fcc = f2fs_kzalloc(sbi, sizeof(struct flush_cmd_control), GFP_KERNEL);
if (!fcc)
return -ENOMEM;
atomic_set(&fcc->issued_flush, 0);
@@ -642,7 +697,7 @@
return err;
}
-void destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free)
+void f2fs_destroy_flush_cmd_control(struct f2fs_sb_info *sbi, bool free)
{
struct flush_cmd_control *fcc = SM_I(sbi)->fcc_info;
@@ -658,6 +713,28 @@
}
}
+int f2fs_flush_device_cache(struct f2fs_sb_info *sbi)
+{
+ int ret = 0, i;
+
+ if (!sbi->s_ndevs)
+ return 0;
+
+ for (i = 1; i < sbi->s_ndevs; i++) {
+ if (!f2fs_test_bit(i, (char *)&sbi->dirty_device))
+ continue;
+ ret = __submit_flush_wait(sbi, FDEV(i).bdev);
+ if (ret)
+ break;
+
+ spin_lock(&sbi->dev_lock);
+ f2fs_clear_bit(i, (char *)&sbi->dirty_device);
+ spin_unlock(&sbi->dev_lock);
+ }
+
+ return ret;
+}
+
static void __locate_dirty_segment(struct f2fs_sb_info *sbi, unsigned int segno,
enum dirty_type dirty_type)
{
@@ -799,6 +876,8 @@
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
+ trace_f2fs_remove_discard(dc->bdev, dc->start, dc->len);
+
f2fs_bug_on(sbi, dc->ref);
if (dc->error == -EOPNOTSUPP)
@@ -821,7 +900,7 @@
bio_put(bio);
}
-void __check_sit_bitmap(struct f2fs_sb_info *sbi,
+static void __check_sit_bitmap(struct f2fs_sb_info *sbi,
block_t start, block_t end)
{
#ifdef CONFIG_F2FS_CHECK_FS
@@ -848,16 +927,59 @@
#endif
}
+static void __init_discard_policy(struct f2fs_sb_info *sbi,
+ struct discard_policy *dpolicy,
+ int discard_type, unsigned int granularity)
+{
+ /* common policy */
+ dpolicy->type = discard_type;
+ dpolicy->sync = true;
+ dpolicy->granularity = granularity;
+
+ dpolicy->max_requests = DEF_MAX_DISCARD_REQUEST;
+ dpolicy->io_aware_gran = MAX_PLIST_NUM;
+
+ if (discard_type == DPOLICY_BG) {
+ dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
+ dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME;
+ dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
+ dpolicy->io_aware = true;
+ dpolicy->sync = false;
+ if (utilization(sbi) > DEF_DISCARD_URGENT_UTIL) {
+ dpolicy->granularity = 1;
+ dpolicy->max_interval = DEF_MIN_DISCARD_ISSUE_TIME;
+ }
+ } else if (discard_type == DPOLICY_FORCE) {
+ dpolicy->min_interval = DEF_MIN_DISCARD_ISSUE_TIME;
+ dpolicy->mid_interval = DEF_MID_DISCARD_ISSUE_TIME;
+ dpolicy->max_interval = DEF_MAX_DISCARD_ISSUE_TIME;
+ dpolicy->io_aware = false;
+ } else if (discard_type == DPOLICY_FSTRIM) {
+ dpolicy->io_aware = false;
+ } else if (discard_type == DPOLICY_UMOUNT) {
+ dpolicy->max_requests = UINT_MAX;
+ dpolicy->io_aware = false;
+ }
+}
+
+
/* this function is copied from blkdev_issue_discard from block/blk-lib.c */
static void __submit_discard_cmd(struct f2fs_sb_info *sbi,
- struct discard_cmd *dc)
+ struct discard_policy *dpolicy,
+ struct discard_cmd *dc)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
+ struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
+ &(dcc->fstrim_list) : &(dcc->wait_list);
struct bio *bio = NULL;
+ int flag = dpolicy->sync ? REQ_SYNC : 0;
if (dc->state != D_PREP)
return;
+ if (is_sbi_flag_set(sbi, SBI_NEED_FSCK))
+ return;
+
trace_f2fs_issue_discard(dc->bdev, dc->start, dc->len);
dc->error = __blkdev_issue_discard(dc->bdev,
@@ -872,9 +994,9 @@
if (bio) {
bio->bi_private = dc;
bio->bi_end_io = f2fs_submit_discard_endio;
- bio->bi_opf |= REQ_SYNC;
+ bio->bi_opf |= flag;
submit_bio(bio);
- list_move_tail(&dc->list, &dcc->wait_list);
+ list_move_tail(&dc->list, wait_list);
__check_sit_bitmap(sbi, dc->start, dc->start + dc->len);
f2fs_update_iostat(sbi, FS_DISCARD, 1);
@@ -891,7 +1013,7 @@
struct rb_node *insert_parent)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
- struct rb_node **p = &dcc->root.rb_node;
+ struct rb_node **p;
struct rb_node *parent = NULL;
struct discard_cmd *dc = NULL;
@@ -901,7 +1023,7 @@
goto do_insert;
}
- p = __lookup_rb_tree_for_insert(sbi, &dcc->root, &parent, lstart);
+ p = f2fs_lookup_rb_tree_for_insert(sbi, &dcc->root, &parent, lstart);
do_insert:
dc = __attach_discard_cmd(sbi, bdev, lstart, start, len, parent, p);
if (!dc)
@@ -966,7 +1088,7 @@
mutex_lock(&dcc->cmd_lock);
- dc = (struct discard_cmd *)__lookup_rb_tree_ret(&dcc->root,
+ dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root,
NULL, lstart,
(struct rb_entry **)&prev_dc,
(struct rb_entry **)&next_dc,
@@ -1059,58 +1181,49 @@
return 0;
}
-static int __issue_discard_cmd(struct f2fs_sb_info *sbi, bool issue_cond)
+static int __issue_discard_cmd(struct f2fs_sb_info *sbi,
+ struct discard_policy *dpolicy)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
struct list_head *pend_list;
struct discard_cmd *dc, *tmp;
struct blk_plug plug;
- int iter = 0, issued = 0;
- int i;
+ int i, iter = 0, issued = 0;
bool io_interrupted = false;
- mutex_lock(&dcc->cmd_lock);
- f2fs_bug_on(sbi,
- !__check_rb_tree_consistence(sbi, &dcc->root));
- blk_start_plug(&plug);
- for (i = MAX_PLIST_NUM - 1;
- i >= 0 && plist_issue(dcc->pend_list_tag[i]); i--) {
+ for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
+ if (i + 1 < dpolicy->granularity)
+ break;
pend_list = &dcc->pend_list[i];
+
+ mutex_lock(&dcc->cmd_lock);
+ if (list_empty(pend_list))
+ goto next;
+ f2fs_bug_on(sbi,
+ !f2fs_check_rb_tree_consistence(sbi, &dcc->root));
+ blk_start_plug(&plug);
list_for_each_entry_safe(dc, tmp, pend_list, list) {
f2fs_bug_on(sbi, dc->state != D_PREP);
- /* Hurry up to finish fstrim */
- if (dcc->pend_list_tag[i] & P_TRIM) {
- __submit_discard_cmd(sbi, dc);
- issued++;
-
- if (fatal_signal_pending(current))
- break;
- continue;
- }
-
- if (!issue_cond) {
- __submit_discard_cmd(sbi, dc);
- issued++;
- continue;
- }
-
- if (is_idle(sbi)) {
- __submit_discard_cmd(sbi, dc);
- issued++;
- } else {
+ if (dpolicy->io_aware && i < dpolicy->io_aware_gran &&
+ !is_idle(sbi)) {
io_interrupted = true;
+ goto skip;
}
- if (++iter >= DISCARD_ISSUE_RATE)
- goto out;
+ __submit_discard_cmd(sbi, dpolicy, dc);
+ issued++;
+skip:
+ if (++iter >= dpolicy->max_requests)
+ break;
}
- if (list_empty(pend_list) && dcc->pend_list_tag[i] & P_TRIM)
- dcc->pend_list_tag[i] &= (~P_TRIM);
+ blk_finish_plug(&plug);
+next:
+ mutex_unlock(&dcc->cmd_lock);
+
+ if (iter >= dpolicy->max_requests)
+ break;
}
-out:
- blk_finish_plug(&plug);
- mutex_unlock(&dcc->cmd_lock);
if (!issued && io_interrupted)
issued = -1;
@@ -1118,12 +1231,13 @@
return issued;
}
-static void __drop_discard_cmd(struct f2fs_sb_info *sbi)
+static bool __drop_discard_cmd(struct f2fs_sb_info *sbi)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
struct list_head *pend_list;
struct discard_cmd *dc, *tmp;
int i;
+ bool dropped = false;
mutex_lock(&dcc->cmd_lock);
for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
@@ -1131,39 +1245,63 @@
list_for_each_entry_safe(dc, tmp, pend_list, list) {
f2fs_bug_on(sbi, dc->state != D_PREP);
__remove_discard_cmd(sbi, dc);
+ dropped = true;
}
}
mutex_unlock(&dcc->cmd_lock);
+
+ return dropped;
}
-static void __wait_one_discard_bio(struct f2fs_sb_info *sbi,
+void f2fs_drop_discard_cmd(struct f2fs_sb_info *sbi)
+{
+ __drop_discard_cmd(sbi);
+}
+
+static unsigned int __wait_one_discard_bio(struct f2fs_sb_info *sbi,
struct discard_cmd *dc)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
+ unsigned int len = 0;
wait_for_completion_io(&dc->wait);
mutex_lock(&dcc->cmd_lock);
f2fs_bug_on(sbi, dc->state != D_DONE);
dc->ref--;
- if (!dc->ref)
+ if (!dc->ref) {
+ if (!dc->error)
+ len = dc->len;
__remove_discard_cmd(sbi, dc);
+ }
mutex_unlock(&dcc->cmd_lock);
+
+ return len;
}
-static void __wait_discard_cmd(struct f2fs_sb_info *sbi, bool wait_cond)
+static unsigned int __wait_discard_cmd_range(struct f2fs_sb_info *sbi,
+ struct discard_policy *dpolicy,
+ block_t start, block_t end)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
- struct list_head *wait_list = &(dcc->wait_list);
+ struct list_head *wait_list = (dpolicy->type == DPOLICY_FSTRIM) ?
+ &(dcc->fstrim_list) : &(dcc->wait_list);
struct discard_cmd *dc, *tmp;
bool need_wait;
+ unsigned int trimmed = 0;
next:
need_wait = false;
mutex_lock(&dcc->cmd_lock);
list_for_each_entry_safe(dc, tmp, wait_list, list) {
- if (!wait_cond || (dc->state == D_DONE && !dc->ref)) {
+ if (dc->lstart + dc->len <= start || end <= dc->lstart)
+ continue;
+ if (dc->len < dpolicy->granularity)
+ continue;
+ if (dc->state == D_DONE && !dc->ref) {
wait_for_completion_io(&dc->wait);
+ if (!dc->error)
+ trimmed += dc->len;
__remove_discard_cmd(sbi, dc);
} else {
dc->ref++;
@@ -1174,20 +1312,40 @@
mutex_unlock(&dcc->cmd_lock);
if (need_wait) {
- __wait_one_discard_bio(sbi, dc);
+ trimmed += __wait_one_discard_bio(sbi, dc);
goto next;
}
+
+ return trimmed;
+}
+
+static void __wait_all_discard_cmd(struct f2fs_sb_info *sbi,
+ struct discard_policy *dpolicy)
+{
+ struct discard_policy dp;
+
+ if (dpolicy) {
+ __wait_discard_cmd_range(sbi, dpolicy, 0, UINT_MAX);
+ return;
+ }
+
+ /* wait all */
+ __init_discard_policy(sbi, &dp, DPOLICY_FSTRIM, 1);
+ __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
+ __init_discard_policy(sbi, &dp, DPOLICY_UMOUNT, 1);
+ __wait_discard_cmd_range(sbi, &dp, 0, UINT_MAX);
}
/* This should be covered by global mutex, &sit_i->sentry_lock */
-void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
+static void f2fs_wait_discard_bio(struct f2fs_sb_info *sbi, block_t blkaddr)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
struct discard_cmd *dc;
bool need_wait = false;
mutex_lock(&dcc->cmd_lock);
- dc = (struct discard_cmd *)__lookup_rb_tree(&dcc->root, NULL, blkaddr);
+ dc = (struct discard_cmd *)f2fs_lookup_rb_tree(&dcc->root,
+ NULL, blkaddr);
if (dc) {
if (dc->state == D_PREP) {
__punch_discard_cmd(sbi, dc, blkaddr);
@@ -1202,7 +1360,7 @@
__wait_one_discard_bio(sbi, dc);
}
-void stop_discard_thread(struct f2fs_sb_info *sbi)
+void f2fs_stop_discard_thread(struct f2fs_sb_info *sbi)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
@@ -1214,23 +1372,21 @@
}
}
-/* This comes from f2fs_put_super and f2fs_trim_fs */
-void f2fs_wait_discard_bios(struct f2fs_sb_info *sbi, bool umount)
-{
- __issue_discard_cmd(sbi, false);
- __drop_discard_cmd(sbi);
- __wait_discard_cmd(sbi, !umount);
-}
-
-static void mark_discard_range_all(struct f2fs_sb_info *sbi)
+/* This comes from f2fs_put_super */
+bool f2fs_wait_discard_bios(struct f2fs_sb_info *sbi)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
- int i;
+ struct discard_policy dpolicy;
+ bool dropped;
- mutex_lock(&dcc->cmd_lock);
- for (i = 0; i < MAX_PLIST_NUM; i++)
- dcc->pend_list_tag[i] |= P_TRIM;
- mutex_unlock(&dcc->cmd_lock);
+ __init_discard_policy(sbi, &dpolicy, DPOLICY_UMOUNT,
+ dcc->discard_granularity);
+ __issue_discard_cmd(sbi, &dpolicy);
+ dropped = __drop_discard_cmd(sbi);
+
+ /* just to make sure there is no pending discard commands */
+ __wait_all_discard_cmd(sbi, NULL);
+ return dropped;
}
static int issue_discard_thread(void *data)
@@ -1238,35 +1394,48 @@
struct f2fs_sb_info *sbi = data;
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
wait_queue_head_t *q = &dcc->discard_wait_queue;
+ struct discard_policy dpolicy;
unsigned int wait_ms = DEF_MIN_DISCARD_ISSUE_TIME;
int issued;
set_freezable();
do {
+ __init_discard_policy(sbi, &dpolicy, DPOLICY_BG,
+ dcc->discard_granularity);
+
wait_event_interruptible_timeout(*q,
kthread_should_stop() || freezing(current) ||
dcc->discard_wake,
msecs_to_jiffies(wait_ms));
+
+ if (dcc->discard_wake)
+ dcc->discard_wake = 0;
+
if (try_to_freeze())
continue;
+ if (f2fs_readonly(sbi->sb))
+ continue;
if (kthread_should_stop())
return 0;
-
- if (dcc->discard_wake) {
- dcc->discard_wake = 0;
- if (sbi->gc_thread && sbi->gc_thread->gc_urgent)
- mark_discard_range_all(sbi);
+ if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
+ wait_ms = dpolicy.max_interval;
+ continue;
}
+ if (sbi->gc_mode == GC_URGENT)
+ __init_discard_policy(sbi, &dpolicy, DPOLICY_FORCE, 1);
+
sb_start_intwrite(sbi->sb);
- issued = __issue_discard_cmd(sbi, true);
- if (issued) {
- __wait_discard_cmd(sbi, true);
- wait_ms = DEF_MIN_DISCARD_ISSUE_TIME;
+ issued = __issue_discard_cmd(sbi, &dpolicy);
+ if (issued > 0) {
+ __wait_all_discard_cmd(sbi, &dpolicy);
+ wait_ms = dpolicy.min_interval;
+ } else if (issued == -1){
+ wait_ms = dpolicy.mid_interval;
} else {
- wait_ms = DEF_MAX_DISCARD_ISSUE_TIME;
+ wait_ms = dpolicy.max_interval;
}
sb_end_intwrite(sbi->sb);
@@ -1326,7 +1495,7 @@
struct block_device *bdev, block_t blkstart, block_t blklen)
{
#ifdef CONFIG_BLK_DEV_ZONED
- if (f2fs_sb_mounted_blkzoned(sbi->sb) &&
+ if (f2fs_sb_has_blkzoned(sbi->sb) &&
bdev_zoned_model(bdev) != BLK_ZONED_NONE)
return __f2fs_issue_discard_zone(sbi, bdev, blkstart, blklen);
#endif
@@ -1433,20 +1602,24 @@
return false;
}
-void release_discard_addrs(struct f2fs_sb_info *sbi)
+static void release_discard_addr(struct discard_entry *entry)
+{
+ list_del(&entry->list);
+ kmem_cache_free(discard_entry_slab, entry);
+}
+
+void f2fs_release_discard_addrs(struct f2fs_sb_info *sbi)
{
struct list_head *head = &(SM_I(sbi)->dcc_info->entry_list);
struct discard_entry *entry, *this;
/* drop caches */
- list_for_each_entry_safe(entry, this, head, list) {
- list_del(&entry->list);
- kmem_cache_free(discard_entry_slab, entry);
- }
+ list_for_each_entry_safe(entry, this, head, list)
+ release_discard_addr(entry);
}
/*
- * Should call clear_prefree_segments after checkpoint is done.
+ * Should call f2fs_clear_prefree_segments after checkpoint is done.
*/
static void set_prefree_as_free_segments(struct f2fs_sb_info *sbi)
{
@@ -1459,7 +1632,8 @@
mutex_unlock(&dirty_i->seglist_lock);
}
-void clear_prefree_segments(struct f2fs_sb_info *sbi, struct cp_control *cpc)
+void f2fs_clear_prefree_segments(struct f2fs_sb_info *sbi,
+ struct cp_control *cpc)
{
struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
struct list_head *head = &dcc->entry_list;
@@ -1524,13 +1698,12 @@
sbi->blocks_per_seg, cur_pos);
len = next_pos - cur_pos;
- if (f2fs_sb_mounted_blkzoned(sbi->sb) ||
+ if (f2fs_sb_has_blkzoned(sbi->sb) ||
(force && len < cpc->trim_minlen))
goto skip;
f2fs_issue_discard(sbi, entry->start_blkaddr + cur_pos,
len);
- cpc->trimmed += len;
total_len += len;
} else {
next_pos = find_next_bit_le(entry->discard_map,
@@ -1543,9 +1716,8 @@
if (cur_pos < sbi->blocks_per_seg)
goto find_next;
- list_del(&entry->list);
+ release_discard_addr(entry);
dcc->nr_discards -= total_len;
- kmem_cache_free(discard_entry_slab, entry);
}
wake_up_discard_thread(sbi, false);
@@ -1562,18 +1734,16 @@
goto init_thread;
}
- dcc = kzalloc(sizeof(struct discard_cmd_control), GFP_KERNEL);
+ dcc = f2fs_kzalloc(sbi, sizeof(struct discard_cmd_control), GFP_KERNEL);
if (!dcc)
return -ENOMEM;
dcc->discard_granularity = DEFAULT_DISCARD_GRANULARITY;
INIT_LIST_HEAD(&dcc->entry_list);
- for (i = 0; i < MAX_PLIST_NUM; i++) {
+ for (i = 0; i < MAX_PLIST_NUM; i++)
INIT_LIST_HEAD(&dcc->pend_list[i]);
- if (i >= dcc->discard_granularity - 1)
- dcc->pend_list_tag[i] |= P_ACTIVE;
- }
INIT_LIST_HEAD(&dcc->wait_list);
+ INIT_LIST_HEAD(&dcc->fstrim_list);
mutex_init(&dcc->cmd_lock);
atomic_set(&dcc->issued_discard, 0);
atomic_set(&dcc->issing_discard, 0);
@@ -1605,7 +1775,7 @@
if (!dcc)
return;
- stop_discard_thread(sbi);
+ f2fs_stop_discard_thread(sbi);
kfree(dcc);
SM_I(sbi)->dcc_info = NULL;
@@ -1652,8 +1822,9 @@
(new_vblocks > sbi->blocks_per_seg)));
se->valid_blocks = new_vblocks;
- se->mtime = get_mtime(sbi);
- SIT_I(sbi)->max_mtime = se->mtime;
+ se->mtime = get_mtime(sbi, false);
+ if (se->mtime > SIT_I(sbi)->max_mtime)
+ SIT_I(sbi)->max_mtime = se->mtime;
/* Update valid block bitmap */
if (del > 0) {
@@ -1681,7 +1852,7 @@
sbi->discard_blks--;
/* don't overwrite by SSR to keep node chain */
- if (se->type == CURSEG_WARM_NODE) {
+ if (IS_NODESEG(se->type)) {
if (!f2fs_test_and_set_bit(offset, se->ckpt_valid_map))
se->ckpt_valid_blocks++;
}
@@ -1721,17 +1892,7 @@
get_sec_entry(sbi, segno)->valid_blocks += del;
}
-void refresh_sit_entry(struct f2fs_sb_info *sbi, block_t old, block_t new)
-{
- update_sit_entry(sbi, new, 1);
- if (GET_SEGNO(sbi, old) != NULL_SEGNO)
- update_sit_entry(sbi, old, -1);
-
- locate_dirty_segment(sbi, GET_SEGNO(sbi, old));
- locate_dirty_segment(sbi, GET_SEGNO(sbi, new));
-}
-
-void invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr)
+void f2fs_invalidate_blocks(struct f2fs_sb_info *sbi, block_t addr)
{
unsigned int segno = GET_SEGNO(sbi, addr);
struct sit_info *sit_i = SIT_I(sbi);
@@ -1741,27 +1902,27 @@
return;
/* add it into sit main buffer */
- mutex_lock(&sit_i->sentry_lock);
+ down_write(&sit_i->sentry_lock);
update_sit_entry(sbi, addr, -1);
/* add it into dirty seglist */
locate_dirty_segment(sbi, segno);
- mutex_unlock(&sit_i->sentry_lock);
+ up_write(&sit_i->sentry_lock);
}
-bool is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr)
+bool f2fs_is_checkpointed_data(struct f2fs_sb_info *sbi, block_t blkaddr)
{
struct sit_info *sit_i = SIT_I(sbi);
unsigned int segno, offset;
struct seg_entry *se;
bool is_cp = false;
- if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR)
+ if (!is_valid_blkaddr(blkaddr))
return true;
- mutex_lock(&sit_i->sentry_lock);
+ down_read(&sit_i->sentry_lock);
segno = GET_SEGNO(sbi, blkaddr);
se = get_seg_entry(sbi, segno);
@@ -1770,7 +1931,7 @@
if (f2fs_test_bit(offset, se->ckpt_valid_map))
is_cp = true;
- mutex_unlock(&sit_i->sentry_lock);
+ up_read(&sit_i->sentry_lock);
return is_cp;
}
@@ -1790,7 +1951,7 @@
/*
* Calculate the number of current summary pages for writing
*/
-int npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra)
+int f2fs_npages_for_summary_flush(struct f2fs_sb_info *sbi, bool for_ra)
{
int valid_sum_count = 0;
int i, sum_in_page;
@@ -1820,20 +1981,17 @@
/*
* Caller should put this summary page
*/
-struct page *get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
+struct page *f2fs_get_sum_page(struct f2fs_sb_info *sbi, unsigned int segno)
{
- return get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno));
+ return f2fs_get_meta_page(sbi, GET_SUM_BLOCK(sbi, segno));
}
-void update_meta_page(struct f2fs_sb_info *sbi, void *src, block_t blk_addr)
+void f2fs_update_meta_page(struct f2fs_sb_info *sbi,
+ void *src, block_t blk_addr)
{
- struct page *page = grab_meta_page(sbi, blk_addr);
- void *dst = page_address(page);
+ struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
- if (src)
- memcpy(dst, src, PAGE_SIZE);
- else
- memset(dst, 0, PAGE_SIZE);
+ memcpy(page_address(page), src, PAGE_SIZE);
set_page_dirty(page);
f2fs_put_page(page, 1);
}
@@ -1841,18 +1999,19 @@
static void write_sum_page(struct f2fs_sb_info *sbi,
struct f2fs_summary_block *sum_blk, block_t blk_addr)
{
- update_meta_page(sbi, (void *)sum_blk, blk_addr);
+ f2fs_update_meta_page(sbi, (void *)sum_blk, blk_addr);
}
static void write_current_sum_page(struct f2fs_sb_info *sbi,
int type, block_t blk_addr)
{
struct curseg_info *curseg = CURSEG_I(sbi, type);
- struct page *page = grab_meta_page(sbi, blk_addr);
+ struct page *page = f2fs_grab_meta_page(sbi, blk_addr);
struct f2fs_summary_block *src = curseg->sum_blk;
struct f2fs_summary_block *dst;
dst = (struct f2fs_summary_block *)page_address(page);
+ memset(dst, 0, PAGE_SIZE);
mutex_lock(&curseg->curseg_mutex);
@@ -1932,7 +2091,6 @@
}
secno = left_start;
skip_left:
- hint = secno;
segno = GET_SEG_FROM_SEC(sbi, secno);
zoneno = GET_ZONE_FROM_SEC(sbi, secno);
@@ -2003,6 +2161,11 @@
if (SIT_I(sbi)->last_victim[ALLOC_NEXT])
return SIT_I(sbi)->last_victim[ALLOC_NEXT];
+
+ /* find segments from 0 to reuse freed segments */
+ if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_REUSE)
+ return 0;
+
return CURSEG_I(sbi, type)->segno;
}
@@ -2088,7 +2251,7 @@
curseg->alloc_type = SSR;
__next_free_blkoff(sbi, curseg, 0);
- sum_page = get_sum_page(sbi, new_segno);
+ sum_page = f2fs_get_sum_page(sbi, new_segno);
sum_node = (struct f2fs_summary_block *)page_address(sum_page);
memcpy(curseg->sum_blk, sum_node, SUM_ENTRY_SIZE);
f2fs_put_page(sum_page, 1);
@@ -2102,7 +2265,7 @@
int i, cnt;
bool reversed = false;
- /* need_SSR() already forces to do this */
+ /* f2fs_need_SSR() already forces to do this */
if (v_ops->get_victim(sbi, &segno, BG_GC, type, SSR)) {
curseg->next_segno = segno;
return 1;
@@ -2154,7 +2317,7 @@
new_curseg(sbi, type, false);
else if (curseg->alloc_type == LFS && is_next_segment_free(sbi, type))
new_curseg(sbi, type, false);
- else if (need_SSR(sbi) && get_ssr_segment(sbi, type))
+ else if (f2fs_need_SSR(sbi) && get_ssr_segment(sbi, type))
change_curseg(sbi, type);
else
new_curseg(sbi, type, false);
@@ -2162,97 +2325,168 @@
stat_inc_seg_type(sbi, curseg);
}
-void allocate_new_segments(struct f2fs_sb_info *sbi)
+void f2fs_allocate_new_segments(struct f2fs_sb_info *sbi)
{
struct curseg_info *curseg;
unsigned int old_segno;
int i;
+ down_write(&SIT_I(sbi)->sentry_lock);
+
for (i = CURSEG_HOT_DATA; i <= CURSEG_COLD_DATA; i++) {
curseg = CURSEG_I(sbi, i);
old_segno = curseg->segno;
SIT_I(sbi)->s_ops->allocate_segment(sbi, i, true);
locate_dirty_segment(sbi, old_segno);
}
+
+ up_write(&SIT_I(sbi)->sentry_lock);
}
static const struct segment_allocation default_salloc_ops = {
.allocate_segment = allocate_segment_by_default,
};
-bool exist_trim_candidates(struct f2fs_sb_info *sbi, struct cp_control *cpc)
+bool f2fs_exist_trim_candidates(struct f2fs_sb_info *sbi,
+ struct cp_control *cpc)
{
__u64 trim_start = cpc->trim_start;
bool has_candidate = false;
- mutex_lock(&SIT_I(sbi)->sentry_lock);
+ down_write(&SIT_I(sbi)->sentry_lock);
for (; cpc->trim_start <= cpc->trim_end; cpc->trim_start++) {
if (add_discard_addrs(sbi, cpc, true)) {
has_candidate = true;
break;
}
}
- mutex_unlock(&SIT_I(sbi)->sentry_lock);
+ up_write(&SIT_I(sbi)->sentry_lock);
cpc->trim_start = trim_start;
return has_candidate;
}
+static void __issue_discard_cmd_range(struct f2fs_sb_info *sbi,
+ struct discard_policy *dpolicy,
+ unsigned int start, unsigned int end)
+{
+ struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
+ struct discard_cmd *prev_dc = NULL, *next_dc = NULL;
+ struct rb_node **insert_p = NULL, *insert_parent = NULL;
+ struct discard_cmd *dc;
+ struct blk_plug plug;
+ int issued;
+
+next:
+ issued = 0;
+
+ mutex_lock(&dcc->cmd_lock);
+ f2fs_bug_on(sbi, !f2fs_check_rb_tree_consistence(sbi, &dcc->root));
+
+ dc = (struct discard_cmd *)f2fs_lookup_rb_tree_ret(&dcc->root,
+ NULL, start,
+ (struct rb_entry **)&prev_dc,
+ (struct rb_entry **)&next_dc,
+ &insert_p, &insert_parent, true);
+ if (!dc)
+ dc = next_dc;
+
+ blk_start_plug(&plug);
+
+ while (dc && dc->lstart <= end) {
+ struct rb_node *node;
+
+ if (dc->len < dpolicy->granularity)
+ goto skip;
+
+ if (dc->state != D_PREP) {
+ list_move_tail(&dc->list, &dcc->fstrim_list);
+ goto skip;
+ }
+
+ __submit_discard_cmd(sbi, dpolicy, dc);
+
+ if (++issued >= dpolicy->max_requests) {
+ start = dc->lstart + dc->len;
+
+ blk_finish_plug(&plug);
+ mutex_unlock(&dcc->cmd_lock);
+ __wait_all_discard_cmd(sbi, NULL);
+ congestion_wait(BLK_RW_ASYNC, HZ/50);
+ goto next;
+ }
+skip:
+ node = rb_next(&dc->rb_node);
+ dc = rb_entry_safe(node, struct discard_cmd, rb_node);
+
+ if (fatal_signal_pending(current))
+ break;
+ }
+
+ blk_finish_plug(&plug);
+ mutex_unlock(&dcc->cmd_lock);
+}
+
int f2fs_trim_fs(struct f2fs_sb_info *sbi, struct fstrim_range *range)
{
__u64 start = F2FS_BYTES_TO_BLK(range->start);
__u64 end = start + F2FS_BYTES_TO_BLK(range->len) - 1;
unsigned int start_segno, end_segno;
+ block_t start_block, end_block;
struct cp_control cpc;
+ struct discard_policy dpolicy;
+ unsigned long long trimmed = 0;
int err = 0;
if (start >= MAX_BLKADDR(sbi) || range->len < sbi->blocksize)
return -EINVAL;
- cpc.trimmed = 0;
if (end <= MAIN_BLKADDR(sbi))
- goto out;
+ return -EINVAL;
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK)) {
f2fs_msg(sbi->sb, KERN_WARNING,
"Found FS corruption, run fsck to fix.");
- goto out;
+ return -EIO;
}
/* start/end segment number in main_area */
start_segno = (start <= MAIN_BLKADDR(sbi)) ? 0 : GET_SEGNO(sbi, start);
end_segno = (end >= MAX_BLKADDR(sbi)) ? MAIN_SEGS(sbi) - 1 :
GET_SEGNO(sbi, end);
+
cpc.reason = CP_DISCARD;
cpc.trim_minlen = max_t(__u64, 1, F2FS_BYTES_TO_BLK(range->minlen));
+ cpc.trim_start = start_segno;
+ cpc.trim_end = end_segno;
- /* do checkpoint to issue discard commands safely */
- for (; start_segno <= end_segno; start_segno = cpc.trim_end + 1) {
- cpc.trim_start = start_segno;
+ if (sbi->discard_blks == 0)
+ goto out;
- if (sbi->discard_blks == 0)
- break;
- else if (sbi->discard_blks < BATCHED_TRIM_BLOCKS(sbi))
- cpc.trim_end = end_segno;
- else
- cpc.trim_end = min_t(unsigned int,
- rounddown(start_segno +
- BATCHED_TRIM_SEGMENTS(sbi),
- sbi->segs_per_sec) - 1, end_segno);
+ mutex_lock(&sbi->gc_mutex);
+ err = f2fs_write_checkpoint(sbi, &cpc);
+ mutex_unlock(&sbi->gc_mutex);
+ if (err)
+ goto out;
- mutex_lock(&sbi->gc_mutex);
- err = write_checkpoint(sbi, &cpc);
- mutex_unlock(&sbi->gc_mutex);
- if (err)
- break;
+ start_block = START_BLOCK(sbi, start_segno);
+ end_block = START_BLOCK(sbi, end_segno + 1);
- schedule();
+ __init_discard_policy(sbi, &dpolicy, DPOLICY_FSTRIM, cpc.trim_minlen);
+ __issue_discard_cmd_range(sbi, &dpolicy, start_block, end_block);
+
+ /*
+ * We filed discard candidates, but actually we don't need to wait for
+ * all of them, since they'll be issued in idle time along with runtime
+ * discard option. User configuration looks like using runtime discard
+ * or periodic fstrim instead of it.
+ */
+ if (!test_opt(sbi, DISCARD)) {
+ trimmed = __wait_discard_cmd_range(sbi, &dpolicy,
+ start_block, end_block);
+ range->len = F2FS_BLK_TO_BYTES(trimmed);
}
- /* It's time to issue all the filed discards */
- mark_discard_range_all(sbi);
- f2fs_wait_discard_bios(sbi, false);
out:
- range->len = F2FS_BLK_TO_BYTES(cpc.trimmed);
return err;
}
@@ -2264,6 +2498,113 @@
return false;
}
+int f2fs_rw_hint_to_seg_type(enum rw_hint hint)
+{
+ switch (hint) {
+ case WRITE_LIFE_SHORT:
+ return CURSEG_HOT_DATA;
+ case WRITE_LIFE_EXTREME:
+ return CURSEG_COLD_DATA;
+ default:
+ return CURSEG_WARM_DATA;
+ }
+}
+
+/* This returns write hints for each segment type. This hints will be
+ * passed down to block layer. There are mapping tables which depend on
+ * the mount option 'whint_mode'.
+ *
+ * 1) whint_mode=off. F2FS only passes down WRITE_LIFE_NOT_SET.
+ *
+ * 2) whint_mode=user-based. F2FS tries to pass down hints given by users.
+ *
+ * User F2FS Block
+ * ---- ---- -----
+ * META WRITE_LIFE_NOT_SET
+ * HOT_NODE "
+ * WARM_NODE "
+ * COLD_NODE "
+ * ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
+ * extension list " "
+ *
+ * -- buffered io
+ * WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE " "
+ * WRITE_LIFE_MEDIUM " "
+ * WRITE_LIFE_LONG " "
+ *
+ * -- direct io
+ * WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE " WRITE_LIFE_NONE
+ * WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
+ * WRITE_LIFE_LONG " WRITE_LIFE_LONG
+ *
+ * 3) whint_mode=fs-based. F2FS passes down hints with its policy.
+ *
+ * User F2FS Block
+ * ---- ---- -----
+ * META WRITE_LIFE_MEDIUM;
+ * HOT_NODE WRITE_LIFE_NOT_SET
+ * WARM_NODE "
+ * COLD_NODE WRITE_LIFE_NONE
+ * ioctl(COLD) COLD_DATA WRITE_LIFE_EXTREME
+ * extension list " "
+ *
+ * -- buffered io
+ * WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_LONG
+ * WRITE_LIFE_NONE " "
+ * WRITE_LIFE_MEDIUM " "
+ * WRITE_LIFE_LONG " "
+ *
+ * -- direct io
+ * WRITE_LIFE_EXTREME COLD_DATA WRITE_LIFE_EXTREME
+ * WRITE_LIFE_SHORT HOT_DATA WRITE_LIFE_SHORT
+ * WRITE_LIFE_NOT_SET WARM_DATA WRITE_LIFE_NOT_SET
+ * WRITE_LIFE_NONE " WRITE_LIFE_NONE
+ * WRITE_LIFE_MEDIUM " WRITE_LIFE_MEDIUM
+ * WRITE_LIFE_LONG " WRITE_LIFE_LONG
+ */
+
+enum rw_hint f2fs_io_type_to_rw_hint(struct f2fs_sb_info *sbi,
+ enum page_type type, enum temp_type temp)
+{
+ if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER) {
+ if (type == DATA) {
+ if (temp == WARM)
+ return WRITE_LIFE_NOT_SET;
+ else if (temp == HOT)
+ return WRITE_LIFE_SHORT;
+ else if (temp == COLD)
+ return WRITE_LIFE_EXTREME;
+ } else {
+ return WRITE_LIFE_NOT_SET;
+ }
+ } else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS) {
+ if (type == DATA) {
+ if (temp == WARM)
+ return WRITE_LIFE_LONG;
+ else if (temp == HOT)
+ return WRITE_LIFE_SHORT;
+ else if (temp == COLD)
+ return WRITE_LIFE_EXTREME;
+ } else if (type == NODE) {
+ if (temp == WARM || temp == HOT)
+ return WRITE_LIFE_NOT_SET;
+ else if (temp == COLD)
+ return WRITE_LIFE_NONE;
+ } else if (type == META) {
+ return WRITE_LIFE_MEDIUM;
+ }
+ }
+ return WRITE_LIFE_NOT_SET;
+}
+
static int __get_segment_type_2(struct f2fs_io_info *fio)
{
if (fio->type == DATA)
@@ -2296,9 +2637,12 @@
if (is_cold_data(fio->page) || file_is_cold(inode))
return CURSEG_COLD_DATA;
- if (is_inode_flag_set(inode, FI_HOT_DATA))
+ if (file_is_hot(inode) ||
+ is_inode_flag_set(inode, FI_HOT_DATA) ||
+ is_inode_flag_set(inode, FI_ATOMIC_FILE) ||
+ is_inode_flag_set(inode, FI_VOLATILE_FILE))
return CURSEG_HOT_DATA;
- return CURSEG_WARM_DATA;
+ return f2fs_rw_hint_to_seg_type(inode->i_write_hint);
} else {
if (IS_DNODE(fio->page))
return is_cold_node(fio->page) ? CURSEG_WARM_NODE :
@@ -2311,7 +2655,7 @@
{
int type = 0;
- switch (fio->sbi->active_logs) {
+ switch (F2FS_OPTION(fio->sbi).active_logs) {
case 2:
type = __get_segment_type_2(fio);
break;
@@ -2334,7 +2678,7 @@
return type;
}
-void allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
+void f2fs_allocate_data_block(struct f2fs_sb_info *sbi, struct page *page,
block_t old_blkaddr, block_t *new_blkaddr,
struct f2fs_summary *sum, int type,
struct f2fs_io_info *fio, bool add_list)
@@ -2342,8 +2686,10 @@
struct sit_info *sit_i = SIT_I(sbi);
struct curseg_info *curseg = CURSEG_I(sbi, type);
+ down_read(&SM_I(sbi)->curseg_lock);
+
mutex_lock(&curseg->curseg_mutex);
- mutex_lock(&sit_i->sentry_lock);
+ down_write(&sit_i->sentry_lock);
*new_blkaddr = NEXT_FREE_BLKADDR(sbi, curseg);
@@ -2360,15 +2706,26 @@
stat_inc_block_count(sbi, curseg);
+ /*
+ * SIT information should be updated before segment allocation,
+ * since SSR needs latest valid block information.
+ */
+ update_sit_entry(sbi, *new_blkaddr, 1);
+ if (GET_SEGNO(sbi, old_blkaddr) != NULL_SEGNO)
+ update_sit_entry(sbi, old_blkaddr, -1);
+
if (!__has_curseg_space(sbi, type))
sit_i->s_ops->allocate_segment(sbi, type, false);
- /*
- * SIT information should be updated after segment allocation,
- * since we need to keep dirty segments precisely under SSR.
- */
- refresh_sit_entry(sbi, old_blkaddr, *new_blkaddr);
- mutex_unlock(&sit_i->sentry_lock);
+ /*
+ * segment dirty status should be updated after segment allocation,
+ * so we just need to update status only one time after previous
+ * segment being closed.
+ */
+ locate_dirty_segment(sbi, GET_SEGNO(sbi, old_blkaddr));
+ locate_dirty_segment(sbi, GET_SEGNO(sbi, *new_blkaddr));
+
+ up_write(&sit_i->sentry_lock);
if (page && IS_NODESEG(type)) {
fill_node_footer_blkaddr(page, NEXT_FREE_BLKADDR(sbi, curseg));
@@ -2381,6 +2738,7 @@
INIT_LIST_HEAD(&fio->list);
fio->in_list = true;
+ fio->retry = false;
io = sbi->write_io[fio->type] + fio->temp;
spin_lock(&io->io_lock);
list_add_tail(&fio->list, &io->io_list);
@@ -2388,31 +2746,62 @@
}
mutex_unlock(&curseg->curseg_mutex);
+
+ up_read(&SM_I(sbi)->curseg_lock);
+}
+
+static void update_device_state(struct f2fs_io_info *fio)
+{
+ struct f2fs_sb_info *sbi = fio->sbi;
+ unsigned int devidx;
+
+ if (!sbi->s_ndevs)
+ return;
+
+ devidx = f2fs_target_device_index(sbi, fio->new_blkaddr);
+
+ /* update device state for fsync */
+ f2fs_set_dirty_device(sbi, fio->ino, devidx, FLUSH_INO);
+
+ /* update device state for checkpoint */
+ if (!f2fs_test_bit(devidx, (char *)&sbi->dirty_device)) {
+ spin_lock(&sbi->dev_lock);
+ f2fs_set_bit(devidx, (char *)&sbi->dirty_device);
+ spin_unlock(&sbi->dev_lock);
+ }
}
static void do_write_page(struct f2fs_summary *sum, struct f2fs_io_info *fio)
{
int type = __get_segment_type(fio);
- int err;
+ bool keep_order = (test_opt(fio->sbi, LFS) && type == CURSEG_COLD_DATA);
+ if (keep_order)
+ down_read(&fio->sbi->io_order_lock);
reallocate:
- allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
+ f2fs_allocate_data_block(fio->sbi, fio->page, fio->old_blkaddr,
&fio->new_blkaddr, sum, type, fio, true);
/* writeout dirty page into bdev */
- err = f2fs_submit_page_write(fio);
- if (err == -EAGAIN) {
+ f2fs_submit_page_write(fio);
+ if (fio->retry) {
fio->old_blkaddr = fio->new_blkaddr;
goto reallocate;
}
+
+ update_device_state(fio);
+
+ if (keep_order)
+ up_read(&fio->sbi->io_order_lock);
}
-void write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
+void f2fs_do_write_meta_page(struct f2fs_sb_info *sbi, struct page *page,
enum iostat_type io_type)
{
struct f2fs_io_info fio = {
.sbi = sbi,
.type = META,
+ .temp = HOT,
.op = REQ_OP_WRITE,
.op_flags = REQ_SYNC | REQ_META | REQ_PRIO,
.old_blkaddr = page->index,
@@ -2426,12 +2815,13 @@
fio.op_flags &= ~REQ_META;
set_page_writeback(page);
+ ClearPageError(page);
f2fs_submit_page_write(&fio);
f2fs_update_iostat(sbi, io_type, F2FS_BLKSIZE);
}
-void write_node_page(unsigned int nid, struct f2fs_io_info *fio)
+void f2fs_do_write_node_page(unsigned int nid, struct f2fs_io_info *fio)
{
struct f2fs_summary sum;
@@ -2441,14 +2831,15 @@
f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE);
}
-void write_data_page(struct dnode_of_data *dn, struct f2fs_io_info *fio)
+void f2fs_outplace_write_data(struct dnode_of_data *dn,
+ struct f2fs_io_info *fio)
{
struct f2fs_sb_info *sbi = fio->sbi;
struct f2fs_summary sum;
struct node_info ni;
f2fs_bug_on(sbi, dn->data_blkaddr == NULL_ADDR);
- get_node_info(sbi, dn->nid, &ni);
+ f2fs_get_node_info(sbi, dn->nid, &ni);
set_summary(&sum, dn->nid, dn->ofs_in_node, ni.version);
do_write_page(&sum, fio);
f2fs_update_data_blkaddr(dn, fio->new_blkaddr);
@@ -2456,21 +2847,42 @@
f2fs_update_iostat(sbi, fio->io_type, F2FS_BLKSIZE);
}
-int rewrite_data_page(struct f2fs_io_info *fio)
+int f2fs_inplace_write_data(struct f2fs_io_info *fio)
{
int err;
+ struct f2fs_sb_info *sbi = fio->sbi;
fio->new_blkaddr = fio->old_blkaddr;
+ /* i/o temperature is needed for passing down write hints */
+ __get_segment_type(fio);
+
+ f2fs_bug_on(sbi, !IS_DATASEG(get_seg_entry(sbi,
+ GET_SEGNO(sbi, fio->new_blkaddr))->type));
+
stat_inc_inplace_blocks(fio->sbi);
err = f2fs_submit_page_bio(fio);
+ if (!err)
+ update_device_state(fio);
f2fs_update_iostat(fio->sbi, fio->io_type, F2FS_BLKSIZE);
return err;
}
-void __f2fs_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
+static inline int __f2fs_get_curseg(struct f2fs_sb_info *sbi,
+ unsigned int segno)
+{
+ int i;
+
+ for (i = CURSEG_HOT_DATA; i < NO_CHECK_TYPE; i++) {
+ if (CURSEG_I(sbi, i)->segno == segno)
+ break;
+ }
+ return i;
+}
+
+void f2fs_do_replace_block(struct f2fs_sb_info *sbi, struct f2fs_summary *sum,
block_t old_blkaddr, block_t new_blkaddr,
bool recover_curseg, bool recover_newaddr)
{
@@ -2485,6 +2897,8 @@
se = get_seg_entry(sbi, segno);
type = se->type;
+ down_write(&SM_I(sbi)->curseg_lock);
+
if (!recover_curseg) {
/* for recovery flow */
if (se->valid_blocks == 0 && !IS_CURSEG(sbi, segno)) {
@@ -2494,14 +2908,20 @@
type = CURSEG_WARM_DATA;
}
} else {
- if (!IS_CURSEG(sbi, segno))
+ if (IS_CURSEG(sbi, segno)) {
+ /* se->type is volatile as SSR allocation */
+ type = __f2fs_get_curseg(sbi, segno);
+ f2fs_bug_on(sbi, type == NO_CHECK_TYPE);
+ } else {
type = CURSEG_WARM_DATA;
+ }
}
+ f2fs_bug_on(sbi, !IS_DATASEG(type));
curseg = CURSEG_I(sbi, type);
mutex_lock(&curseg->curseg_mutex);
- mutex_lock(&sit_i->sentry_lock);
+ down_write(&sit_i->sentry_lock);
old_cursegno = curseg->segno;
old_blkoff = curseg->next_blkoff;
@@ -2533,8 +2953,9 @@
curseg->next_blkoff = old_blkoff;
}
- mutex_unlock(&sit_i->sentry_lock);
+ up_write(&sit_i->sentry_lock);
mutex_unlock(&curseg->curseg_mutex);
+ up_write(&SM_I(sbi)->curseg_lock);
}
void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data *dn,
@@ -2546,7 +2967,7 @@
set_summary(&sum, dn->nid, dn->ofs_in_node, version);
- __f2fs_replace_block(sbi, &sum, old_addr, new_addr,
+ f2fs_do_replace_block(sbi, &sum, old_addr, new_addr,
recover_curseg, recover_newaddr);
f2fs_update_data_blkaddr(dn, new_addr);
@@ -2571,7 +2992,7 @@
{
struct page *cpage;
- if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR)
+ if (!is_valid_blkaddr(blkaddr))
return;
cpage = find_lock_page(META_MAPPING(sbi), blkaddr);
@@ -2581,7 +3002,7 @@
}
}
-static int read_compacted_summaries(struct f2fs_sb_info *sbi)
+static void read_compacted_summaries(struct f2fs_sb_info *sbi)
{
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
struct curseg_info *seg_i;
@@ -2592,7 +3013,7 @@
start = start_sum_block(sbi);
- page = get_meta_page(sbi, start++);
+ page = f2fs_get_meta_page(sbi, start++);
kaddr = (unsigned char *)page_address(page);
/* Step 1: restore nat cache */
@@ -2632,13 +3053,12 @@
f2fs_put_page(page, 1);
page = NULL;
- page = get_meta_page(sbi, start++);
+ page = f2fs_get_meta_page(sbi, start++);
kaddr = (unsigned char *)page_address(page);
offset = 0;
}
}
f2fs_put_page(page, 1);
- return 0;
}
static int read_normal_summaries(struct f2fs_sb_info *sbi, int type)
@@ -2672,7 +3092,7 @@
blk_addr = GET_SUM_BLOCK(sbi, segno);
}
- new = get_meta_page(sbi, blk_addr);
+ new = f2fs_get_meta_page(sbi, blk_addr);
sum = (struct f2fs_summary_block *)page_address(new);
if (IS_NODESEG(type)) {
@@ -2684,13 +3104,7 @@
ns->ofs_in_node = 0;
}
} else {
- int err;
-
- err = restore_node_summary(sbi, segno, sum);
- if (err) {
- f2fs_put_page(new, 1);
- return err;
- }
+ f2fs_restore_node_summary(sbi, segno, sum);
}
}
@@ -2722,20 +3136,19 @@
int err;
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG)) {
- int npages = npages_for_summary_flush(sbi, true);
+ int npages = f2fs_npages_for_summary_flush(sbi, true);
if (npages >= 2)
- ra_meta_pages(sbi, start_sum_block(sbi), npages,
+ f2fs_ra_meta_pages(sbi, start_sum_block(sbi), npages,
META_CP, true);
/* restore for compacted data summary */
- if (read_compacted_summaries(sbi))
- return -EINVAL;
+ read_compacted_summaries(sbi);
type = CURSEG_HOT_NODE;
}
if (__exist_node_summaries(sbi))
- ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type),
+ f2fs_ra_meta_pages(sbi, sum_blk_addr(sbi, NR_CURSEG_TYPE, type),
NR_CURSEG_TYPE - type, META_CP, true);
for (; type <= CURSEG_COLD_NODE; type++) {
@@ -2761,8 +3174,9 @@
int written_size = 0;
int i, j;
- page = grab_meta_page(sbi, blkaddr++);
+ page = f2fs_grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
/* Step 1: write nat cache */
seg_i = CURSEG_I(sbi, CURSEG_HOT_DATA);
@@ -2785,8 +3199,9 @@
for (j = 0; j < blkoff; j++) {
if (!page) {
- page = grab_meta_page(sbi, blkaddr++);
+ page = f2fs_grab_meta_page(sbi, blkaddr++);
kaddr = (unsigned char *)page_address(page);
+ memset(kaddr, 0, PAGE_SIZE);
written_size = 0;
}
summary = (struct f2fs_summary *)(kaddr + written_size);
@@ -2821,7 +3236,7 @@
write_current_sum_page(sbi, i, blkaddr + (i - type));
}
-void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
+void f2fs_write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
{
if (is_set_ckpt_flags(sbi, CP_COMPACT_SUM_FLAG))
write_compacted_summaries(sbi, start_blk);
@@ -2829,12 +3244,12 @@
write_normal_summaries(sbi, start_blk, CURSEG_HOT_DATA);
}
-void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
+void f2fs_write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk)
{
write_normal_summaries(sbi, start_blk, CURSEG_HOT_NODE);
}
-int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
+int f2fs_lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
unsigned int val, int alloc)
{
int i;
@@ -2859,35 +3274,26 @@
static struct page *get_current_sit_page(struct f2fs_sb_info *sbi,
unsigned int segno)
{
- return get_meta_page(sbi, current_sit_addr(sbi, segno));
+ return f2fs_get_meta_page(sbi, current_sit_addr(sbi, segno));
}
static struct page *get_next_sit_page(struct f2fs_sb_info *sbi,
unsigned int start)
{
struct sit_info *sit_i = SIT_I(sbi);
- struct page *src_page, *dst_page;
+ struct page *page;
pgoff_t src_off, dst_off;
- void *src_addr, *dst_addr;
src_off = current_sit_addr(sbi, start);
dst_off = next_sit_addr(sbi, src_off);
- /* get current sit block page without lock */
- src_page = get_meta_page(sbi, src_off);
- dst_page = grab_meta_page(sbi, dst_off);
- f2fs_bug_on(sbi, PageDirty(src_page));
+ page = f2fs_grab_meta_page(sbi, dst_off);
+ seg_info_to_sit_page(sbi, page, start);
- src_addr = page_address(src_page);
- dst_addr = page_address(dst_page);
- memcpy(dst_addr, src_addr, PAGE_SIZE);
-
- set_page_dirty(dst_page);
- f2fs_put_page(src_page, 1);
-
+ set_page_dirty(page);
set_to_next_sit(sit_i, start);
- return dst_page;
+ return page;
}
static struct sit_entry_set *grab_sit_entry_set(void)
@@ -2977,7 +3383,7 @@
* CP calls this function, which flushes SIT entries including sit_journal,
* and moves prefree segs to free segs.
*/
-void flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
+void f2fs_flush_sit_entries(struct f2fs_sb_info *sbi, struct cp_control *cpc)
{
struct sit_info *sit_i = SIT_I(sbi);
unsigned long *bitmap = sit_i->dirty_sentries_bitmap;
@@ -2988,7 +3394,7 @@
bool to_journal = true;
struct seg_entry *se;
- mutex_lock(&sit_i->sentry_lock);
+ down_write(&sit_i->sentry_lock);
if (!sit_i->dirty_sentries)
goto out;
@@ -3036,6 +3442,11 @@
int offset, sit_offset;
se = get_seg_entry(sbi, segno);
+#ifdef CONFIG_F2FS_CHECK_FS
+ if (memcmp(se->cur_valid_map, se->cur_valid_map_mir,
+ SIT_VBLOCK_MAP_SIZE))
+ f2fs_bug_on(sbi, 1);
+#endif
/* add discard candidates */
if (!(cpc->reason & CP_DISCARD)) {
@@ -3044,17 +3455,21 @@
}
if (to_journal) {
- offset = lookup_journal_in_cursum(journal,
+ offset = f2fs_lookup_journal_in_cursum(journal,
SIT_JOURNAL, segno, 1);
f2fs_bug_on(sbi, offset < 0);
segno_in_journal(journal, offset) =
cpu_to_le32(segno);
seg_info_to_raw_sit(se,
&sit_in_journal(journal, offset));
+ check_block_count(sbi, segno,
+ &sit_in_journal(journal, offset));
} else {
sit_offset = SIT_ENTRY_OFFSET(sit_i, segno);
seg_info_to_raw_sit(se,
&raw_sit->entries[sit_offset]);
+ check_block_count(sbi, segno,
+ &raw_sit->entries[sit_offset]);
}
__clear_bit(segno, bitmap);
@@ -3082,7 +3497,7 @@
cpc->trim_start = trim_start;
}
- mutex_unlock(&sit_i->sentry_lock);
+ up_write(&sit_i->sentry_lock);
set_prefree_as_free_segments(sbi);
}
@@ -3096,53 +3511,59 @@
unsigned int bitmap_size;
/* allocate memory for SIT information */
- sit_i = kzalloc(sizeof(struct sit_info), GFP_KERNEL);
+ sit_i = f2fs_kzalloc(sbi, sizeof(struct sit_info), GFP_KERNEL);
if (!sit_i)
return -ENOMEM;
SM_I(sbi)->sit_info = sit_i;
- sit_i->sentries = kvzalloc(MAIN_SEGS(sbi) *
- sizeof(struct seg_entry), GFP_KERNEL);
+ sit_i->sentries =
+ f2fs_kvzalloc(sbi, array_size(sizeof(struct seg_entry),
+ MAIN_SEGS(sbi)),
+ GFP_KERNEL);
if (!sit_i->sentries)
return -ENOMEM;
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
- sit_i->dirty_sentries_bitmap = kvzalloc(bitmap_size, GFP_KERNEL);
+ sit_i->dirty_sentries_bitmap = f2fs_kvzalloc(sbi, bitmap_size,
+ GFP_KERNEL);
if (!sit_i->dirty_sentries_bitmap)
return -ENOMEM;
for (start = 0; start < MAIN_SEGS(sbi); start++) {
sit_i->sentries[start].cur_valid_map
- = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
sit_i->sentries[start].ckpt_valid_map
- = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
if (!sit_i->sentries[start].cur_valid_map ||
!sit_i->sentries[start].ckpt_valid_map)
return -ENOMEM;
#ifdef CONFIG_F2FS_CHECK_FS
sit_i->sentries[start].cur_valid_map_mir
- = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
if (!sit_i->sentries[start].cur_valid_map_mir)
return -ENOMEM;
#endif
if (f2fs_discard_en(sbi)) {
sit_i->sentries[start].discard_map
- = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE,
+ GFP_KERNEL);
if (!sit_i->sentries[start].discard_map)
return -ENOMEM;
}
}
- sit_i->tmp_map = kzalloc(SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
+ sit_i->tmp_map = f2fs_kzalloc(sbi, SIT_VBLOCK_MAP_SIZE, GFP_KERNEL);
if (!sit_i->tmp_map)
return -ENOMEM;
if (sbi->segs_per_sec > 1) {
- sit_i->sec_entries = kvzalloc(MAIN_SECS(sbi) *
- sizeof(struct sec_entry), GFP_KERNEL);
+ sit_i->sec_entries =
+ f2fs_kvzalloc(sbi, array_size(sizeof(struct sec_entry),
+ MAIN_SECS(sbi)),
+ GFP_KERNEL);
if (!sit_i->sec_entries)
return -ENOMEM;
}
@@ -3175,7 +3596,7 @@
sit_i->sents_per_block = SIT_ENTRY_PER_BLOCK;
sit_i->elapsed_time = le64_to_cpu(sbi->ckpt->elapsed_time);
sit_i->mounted_time = ktime_get_real_seconds();
- mutex_init(&sit_i->sentry_lock);
+ init_rwsem(&sit_i->sentry_lock);
return 0;
}
@@ -3185,19 +3606,19 @@
unsigned int bitmap_size, sec_bitmap_size;
/* allocate memory for free segmap information */
- free_i = kzalloc(sizeof(struct free_segmap_info), GFP_KERNEL);
+ free_i = f2fs_kzalloc(sbi, sizeof(struct free_segmap_info), GFP_KERNEL);
if (!free_i)
return -ENOMEM;
SM_I(sbi)->free_info = free_i;
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
- free_i->free_segmap = kvmalloc(bitmap_size, GFP_KERNEL);
+ free_i->free_segmap = f2fs_kvmalloc(sbi, bitmap_size, GFP_KERNEL);
if (!free_i->free_segmap)
return -ENOMEM;
sec_bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
- free_i->free_secmap = kvmalloc(sec_bitmap_size, GFP_KERNEL);
+ free_i->free_secmap = f2fs_kvmalloc(sbi, sec_bitmap_size, GFP_KERNEL);
if (!free_i->free_secmap)
return -ENOMEM;
@@ -3218,7 +3639,8 @@
struct curseg_info *array;
int i;
- array = kcalloc(NR_CURSEG_TYPE, sizeof(*array), GFP_KERNEL);
+ array = f2fs_kzalloc(sbi, array_size(NR_CURSEG_TYPE, sizeof(*array)),
+ GFP_KERNEL);
if (!array)
return -ENOMEM;
@@ -3226,12 +3648,12 @@
for (i = 0; i < NR_CURSEG_TYPE; i++) {
mutex_init(&array[i].curseg_mutex);
- array[i].sum_blk = kzalloc(PAGE_SIZE, GFP_KERNEL);
+ array[i].sum_blk = f2fs_kzalloc(sbi, PAGE_SIZE, GFP_KERNEL);
if (!array[i].sum_blk)
return -ENOMEM;
init_rwsem(&array[i].journal_rwsem);
- array[i].journal = kzalloc(sizeof(struct f2fs_journal),
- GFP_KERNEL);
+ array[i].journal = f2fs_kzalloc(sbi,
+ sizeof(struct f2fs_journal), GFP_KERNEL);
if (!array[i].journal)
return -ENOMEM;
array[i].segno = NULL_SEGNO;
@@ -3254,7 +3676,7 @@
block_t total_node_blocks = 0;
do {
- readed = ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES,
+ readed = f2fs_ra_meta_pages(sbi, start_blk, BIO_MAX_PAGES,
META_SIT, true);
start = start_blk * sit_i->sents_per_block;
@@ -3304,6 +3726,15 @@
unsigned int old_valid_blocks;
start = le32_to_cpu(segno_in_journal(journal, i));
+ if (start >= MAIN_SEGS(sbi)) {
+ f2fs_msg(sbi->sb, KERN_ERR,
+ "Wrong journal entry on segno %u",
+ start);
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
+ err = -EINVAL;
+ break;
+ }
+
se = &sit_i->sentries[start];
sit = sit_in_journal(journal, i);
@@ -3325,14 +3756,17 @@
} else {
memcpy(se->discard_map, se->cur_valid_map,
SIT_VBLOCK_MAP_SIZE);
- sbi->discard_blks += old_valid_blocks -
- se->valid_blocks;
+ sbi->discard_blks += old_valid_blocks;
+ sbi->discard_blks -= se->valid_blocks;
}
}
- if (sbi->segs_per_sec > 1)
+ if (sbi->segs_per_sec > 1) {
get_sec_entry(sbi, start)->valid_blocks +=
- se->valid_blocks - old_valid_blocks;
+ se->valid_blocks;
+ get_sec_entry(sbi, start)->valid_blocks -=
+ old_valid_blocks;
+ }
}
up_read(&curseg->journal_rwsem);
@@ -3399,7 +3833,7 @@
struct dirty_seglist_info *dirty_i = DIRTY_I(sbi);
unsigned int bitmap_size = f2fs_bitmap_size(MAIN_SECS(sbi));
- dirty_i->victim_secmap = kvzalloc(bitmap_size, GFP_KERNEL);
+ dirty_i->victim_secmap = f2fs_kvzalloc(sbi, bitmap_size, GFP_KERNEL);
if (!dirty_i->victim_secmap)
return -ENOMEM;
return 0;
@@ -3411,7 +3845,8 @@
unsigned int bitmap_size, i;
/* allocate memory for dirty segments list information */
- dirty_i = kzalloc(sizeof(struct dirty_seglist_info), GFP_KERNEL);
+ dirty_i = f2fs_kzalloc(sbi, sizeof(struct dirty_seglist_info),
+ GFP_KERNEL);
if (!dirty_i)
return -ENOMEM;
@@ -3421,7 +3856,8 @@
bitmap_size = f2fs_bitmap_size(MAIN_SEGS(sbi));
for (i = 0; i < NR_DIRTY_TYPE; i++) {
- dirty_i->dirty_segmap[i] = kvzalloc(bitmap_size, GFP_KERNEL);
+ dirty_i->dirty_segmap[i] = f2fs_kvzalloc(sbi, bitmap_size,
+ GFP_KERNEL);
if (!dirty_i->dirty_segmap[i])
return -ENOMEM;
}
@@ -3438,9 +3874,9 @@
struct sit_info *sit_i = SIT_I(sbi);
unsigned int segno;
- mutex_lock(&sit_i->sentry_lock);
+ down_write(&sit_i->sentry_lock);
- sit_i->min_mtime = LLONG_MAX;
+ sit_i->min_mtime = ULLONG_MAX;
for (segno = 0; segno < MAIN_SEGS(sbi); segno += sbi->segs_per_sec) {
unsigned int i;
@@ -3454,18 +3890,18 @@
if (sit_i->min_mtime > mtime)
sit_i->min_mtime = mtime;
}
- sit_i->max_mtime = get_mtime(sbi);
- mutex_unlock(&sit_i->sentry_lock);
+ sit_i->max_mtime = get_mtime(sbi, false);
+ up_write(&sit_i->sentry_lock);
}
-int build_segment_manager(struct f2fs_sb_info *sbi)
+int f2fs_build_segment_manager(struct f2fs_sb_info *sbi)
{
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
struct f2fs_checkpoint *ckpt = F2FS_CKPT(sbi);
struct f2fs_sm_info *sm_info;
int err;
- sm_info = kzalloc(sizeof(struct f2fs_sm_info), GFP_KERNEL);
+ sm_info = f2fs_kzalloc(sbi, sizeof(struct f2fs_sm_info), GFP_KERNEL);
if (!sm_info)
return -ENOMEM;
@@ -3488,13 +3924,14 @@
sm_info->min_ipu_util = DEF_MIN_IPU_UTIL;
sm_info->min_fsync_blocks = DEF_MIN_FSYNC_BLOCKS;
sm_info->min_hot_blocks = DEF_MIN_HOT_BLOCKS;
-
- sm_info->trim_sections = DEF_BATCHED_TRIM_SECTIONS;
+ sm_info->min_ssr_sections = reserved_sections(sbi);
INIT_LIST_HEAD(&sm_info->sit_entry_set);
+ init_rwsem(&sm_info->curseg_lock);
+
if (!f2fs_readonly(sbi->sb)) {
- err = create_flush_cmd_control(sbi);
+ err = f2fs_create_flush_cmd_control(sbi);
if (err)
return err;
}
@@ -3619,13 +4056,13 @@
kfree(sit_i);
}
-void destroy_segment_manager(struct f2fs_sb_info *sbi)
+void f2fs_destroy_segment_manager(struct f2fs_sb_info *sbi)
{
struct f2fs_sm_info *sm_info = SM_I(sbi);
if (!sm_info)
return;
- destroy_flush_cmd_control(sbi, true);
+ f2fs_destroy_flush_cmd_control(sbi, true);
destroy_discard_cmd_control(sbi);
destroy_dirty_segmap(sbi);
destroy_curseg(sbi);
@@ -3635,7 +4072,7 @@
kfree(sm_info);
}
-int __init create_segment_manager_caches(void)
+int __init f2fs_create_segment_manager_caches(void)
{
discard_entry_slab = f2fs_kmem_cache_create("discard_entry",
sizeof(struct discard_entry));
@@ -3668,7 +4105,7 @@
return -ENOMEM;
}
-void destroy_segment_manager_caches(void)
+void f2fs_destroy_segment_manager_caches(void)
{
kmem_cache_destroy(sit_entry_set_slab);
kmem_cache_destroy(discard_cmd_slab);
diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
index 4dfb508..38c549d 100644
--- a/fs/f2fs/segment.h
+++ b/fs/f2fs/segment.h
@@ -53,13 +53,19 @@
((secno) == CURSEG_I(sbi, CURSEG_COLD_NODE)->segno / \
(sbi)->segs_per_sec)) \
-#define MAIN_BLKADDR(sbi) (SM_I(sbi)->main_blkaddr)
-#define SEG0_BLKADDR(sbi) (SM_I(sbi)->seg0_blkaddr)
+#define MAIN_BLKADDR(sbi) \
+ (SM_I(sbi) ? SM_I(sbi)->main_blkaddr : \
+ le32_to_cpu(F2FS_RAW_SUPER(sbi)->main_blkaddr))
+#define SEG0_BLKADDR(sbi) \
+ (SM_I(sbi) ? SM_I(sbi)->seg0_blkaddr : \
+ le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment0_blkaddr))
#define MAIN_SEGS(sbi) (SM_I(sbi)->main_segments)
#define MAIN_SECS(sbi) ((sbi)->total_sections)
-#define TOTAL_SEGS(sbi) (SM_I(sbi)->segment_count)
+#define TOTAL_SEGS(sbi) \
+ (SM_I(sbi) ? SM_I(sbi)->segment_count : \
+ le32_to_cpu(F2FS_RAW_SUPER(sbi)->segment_count))
#define TOTAL_BLKS(sbi) (TOTAL_SEGS(sbi) << (sbi)->log_blocks_per_seg)
#define MAX_BLKADDR(sbi) (SEG0_BLKADDR(sbi) + TOTAL_BLKS(sbi))
@@ -79,7 +85,7 @@
(GET_SEGOFF_FROM_SEG0(sbi, blk_addr) & ((sbi)->blocks_per_seg - 1))
#define GET_SEGNO(sbi, blk_addr) \
- ((((blk_addr) == NULL_ADDR) || ((blk_addr) == NEW_ADDR)) ? \
+ ((!is_valid_blkaddr(blk_addr)) ? \
NULL_SEGNO : GET_L2R_SEGNO(FREE_I(sbi), \
GET_SEGNO_FROM_SEG0(sbi, blk_addr)))
#define BLKS_PER_SEC(sbi) \
@@ -209,6 +215,8 @@
#define IS_DUMMY_WRITTEN_PAGE(page) \
(page_private(page) == (unsigned long)DUMMY_WRITTEN_PAGE)
+#define MAX_SKIP_ATOMIC_COUNT 16
+
struct inmem_pages {
struct list_head list;
struct page *page;
@@ -231,7 +239,7 @@
unsigned long *dirty_sentries_bitmap; /* bitmap for dirty sentries */
unsigned int dirty_sentries; /* # of dirty sentries */
unsigned int sents_per_block; /* # of SIT entries per block */
- struct mutex sentry_lock; /* to protect SIT cache */
+ struct rw_semaphore sentry_lock; /* to protect SIT cache */
struct seg_entry *sentries; /* SIT segment-level cache */
struct sec_entry *sec_entries; /* SIT section-level cache */
@@ -348,16 +356,42 @@
se->mtime = le64_to_cpu(rs->mtime);
}
-static inline void seg_info_to_raw_sit(struct seg_entry *se,
+static inline void __seg_info_to_raw_sit(struct seg_entry *se,
struct f2fs_sit_entry *rs)
{
unsigned short raw_vblocks = (se->type << SIT_VBLOCKS_SHIFT) |
se->valid_blocks;
rs->vblocks = cpu_to_le16(raw_vblocks);
memcpy(rs->valid_map, se->cur_valid_map, SIT_VBLOCK_MAP_SIZE);
+ rs->mtime = cpu_to_le64(se->mtime);
+}
+
+static inline void seg_info_to_sit_page(struct f2fs_sb_info *sbi,
+ struct page *page, unsigned int start)
+{
+ struct f2fs_sit_block *raw_sit;
+ struct seg_entry *se;
+ struct f2fs_sit_entry *rs;
+ unsigned int end = min(start + SIT_ENTRY_PER_BLOCK,
+ (unsigned long)MAIN_SEGS(sbi));
+ int i;
+
+ raw_sit = (struct f2fs_sit_block *)page_address(page);
+ memset(raw_sit, 0, PAGE_SIZE);
+ for (i = 0; i < end - start; i++) {
+ rs = &raw_sit->entries[i];
+ se = get_seg_entry(sbi, start + i);
+ __seg_info_to_raw_sit(se, rs);
+ }
+}
+
+static inline void seg_info_to_raw_sit(struct seg_entry *se,
+ struct f2fs_sit_entry *rs)
+{
+ __seg_info_to_raw_sit(se, rs);
+
memcpy(se->ckpt_valid_map, rs->valid_map, SIT_VBLOCK_MAP_SIZE);
se->ckpt_valid_blocks = se->valid_blocks;
- rs->mtime = cpu_to_le64(se->mtime);
}
static inline unsigned int find_next_inuse(struct free_segmap_info *free_i,
@@ -500,6 +534,33 @@
return GET_SEC_FROM_SEG(sbi, (unsigned int)reserved_segments(sbi));
}
+static inline bool has_curseg_enough_space(struct f2fs_sb_info *sbi)
+{
+ unsigned int node_blocks = get_pages(sbi, F2FS_DIRTY_NODES) +
+ get_pages(sbi, F2FS_DIRTY_DENTS);
+ unsigned int dent_blocks = get_pages(sbi, F2FS_DIRTY_DENTS);
+ unsigned int segno, left_blocks;
+ int i;
+
+ /* check current node segment */
+ for (i = CURSEG_HOT_NODE; i <= CURSEG_COLD_NODE; i++) {
+ segno = CURSEG_I(sbi, i)->segno;
+ left_blocks = sbi->blocks_per_seg -
+ get_seg_entry(sbi, segno)->ckpt_valid_blocks;
+
+ if (node_blocks > left_blocks)
+ return false;
+ }
+
+ /* check current data segment */
+ segno = CURSEG_I(sbi, CURSEG_HOT_DATA)->segno;
+ left_blocks = sbi->blocks_per_seg -
+ get_seg_entry(sbi, segno)->ckpt_valid_blocks;
+ if (dent_blocks > left_blocks)
+ return false;
+ return true;
+}
+
static inline bool has_not_enough_free_secs(struct f2fs_sb_info *sbi,
int freed, int needed)
{
@@ -510,6 +571,9 @@
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
return false;
+ if (free_sections(sbi) + freed == reserved_sections(sbi) + needed &&
+ has_curseg_enough_space(sbi))
+ return false;
return (free_sections(sbi) + freed) <=
(node_secs + 2 * dent_secs + imeta_secs +
reserved_sections(sbi) + needed);
@@ -544,6 +608,8 @@
#define DEF_MIN_FSYNC_BLOCKS 8
#define DEF_MIN_HOT_BLOCKS 16
+#define SMALL_VOLUME_SEGMENTS (16 * 512) /* 16GB */
+
enum {
F2FS_IPU_FORCE,
F2FS_IPU_SSR,
@@ -553,47 +619,6 @@
F2FS_IPU_ASYNC,
};
-static inline bool need_inplace_update_policy(struct inode *inode,
- struct f2fs_io_info *fio)
-{
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
- unsigned int policy = SM_I(sbi)->ipu_policy;
-
- if (test_opt(sbi, LFS))
- return false;
-
- /* if this is cold file, we should overwrite to avoid fragmentation */
- if (file_is_cold(inode))
- return true;
-
- if (policy & (0x1 << F2FS_IPU_FORCE))
- return true;
- if (policy & (0x1 << F2FS_IPU_SSR) && need_SSR(sbi))
- return true;
- if (policy & (0x1 << F2FS_IPU_UTIL) &&
- utilization(sbi) > SM_I(sbi)->min_ipu_util)
- return true;
- if (policy & (0x1 << F2FS_IPU_SSR_UTIL) && need_SSR(sbi) &&
- utilization(sbi) > SM_I(sbi)->min_ipu_util)
- return true;
-
- /*
- * IPU for rewrite async pages
- */
- if (policy & (0x1 << F2FS_IPU_ASYNC) &&
- fio && fio->op == REQ_OP_WRITE &&
- !(fio->op_flags & REQ_SYNC) &&
- !f2fs_encrypted_inode(inode))
- return true;
-
- /* this is only set during fdatasync */
- if (policy & (0x1 << F2FS_IPU_FSYNC) &&
- is_inode_flag_set(inode, FI_NEED_IPU))
- return true;
-
- return false;
-}
-
static inline unsigned int curseg_segno(struct f2fs_sb_info *sbi,
int type)
{
@@ -619,10 +644,17 @@
f2fs_bug_on(sbi, segno > TOTAL_SEGS(sbi) - 1);
}
-static inline void verify_block_addr(struct f2fs_sb_info *sbi, block_t blk_addr)
+static inline void verify_block_addr(struct f2fs_io_info *fio, block_t blk_addr)
{
- BUG_ON(blk_addr < SEG0_BLKADDR(sbi)
- || blk_addr >= MAX_BLKADDR(sbi));
+ struct f2fs_sb_info *sbi = fio->sbi;
+
+ if (PAGE_TYPE_OF_BIO(fio->type) == META &&
+ (!is_read_io(fio->op) || fio->is_meta))
+ BUG_ON(blk_addr < SEG0_BLKADDR(sbi) ||
+ blk_addr >= MAIN_BLKADDR(sbi));
+ else
+ BUG_ON(blk_addr < MAIN_BLKADDR(sbi) ||
+ blk_addr >= MAX_BLKADDR(sbi));
}
/*
@@ -716,12 +748,23 @@
#endif
}
-static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi)
+static inline unsigned long long get_mtime(struct f2fs_sb_info *sbi,
+ bool base_time)
{
struct sit_info *sit_i = SIT_I(sbi);
- time64_t now = ktime_get_real_seconds();
+ time64_t diff, now = ktime_get_real_seconds();
- return sit_i->elapsed_time + now - sit_i->mounted_time;
+ if (now >= sit_i->mounted_time)
+ return sit_i->elapsed_time + now - sit_i->mounted_time;
+
+ /* system time is set to the past */
+ if (!base_time) {
+ diff = sit_i->mounted_time - now;
+ if (sit_i->elapsed_time >= diff)
+ return sit_i->elapsed_time - diff;
+ return 0;
+ }
+ return sit_i->elapsed_time;
}
static inline void set_summary(struct f2fs_summary *sum, nid_t nid,
@@ -745,15 +788,6 @@
- (base + 1) + type;
}
-static inline bool no_fggc_candidate(struct f2fs_sb_info *sbi,
- unsigned int secno)
-{
- if (get_valid_blocks(sbi, GET_SEG_FROM_SEC(sbi, secno), true) >=
- sbi->fggc_threshold)
- return true;
- return false;
-}
-
static inline bool sec_usage_check(struct f2fs_sb_info *sbi, unsigned int secno)
{
if (IS_CURSEC(sbi, secno) || (sbi->cur_victim_sec == secno))
@@ -813,8 +847,9 @@
goto wake_up;
mutex_lock(&dcc->cmd_lock);
- for (i = MAX_PLIST_NUM - 1;
- i >= 0 && plist_issue(dcc->pend_list_tag[i]); i--) {
+ for (i = MAX_PLIST_NUM - 1; i >= 0; i--) {
+ if (i + 1 < dcc->discard_granularity)
+ break;
if (!list_empty(&dcc->pend_list[i])) {
wakeup = true;
break;
diff --git a/fs/f2fs/shrinker.c b/fs/f2fs/shrinker.c
index 5c60fc2..36cfd81 100644
--- a/fs/f2fs/shrinker.c
+++ b/fs/f2fs/shrinker.c
@@ -28,7 +28,7 @@
static unsigned long __count_free_nids(struct f2fs_sb_info *sbi)
{
- long count = NM_I(sbi)->nid_cnt[FREE_NID_LIST] - MAX_FREE_NIDS;
+ long count = NM_I(sbi)->nid_cnt[FREE_NID] - MAX_FREE_NIDS;
return count > 0 ? count : 0;
}
@@ -109,11 +109,11 @@
/* shrink clean nat cache entries */
if (freed < nr)
- freed += try_to_free_nats(sbi, nr - freed);
+ freed += f2fs_try_to_free_nats(sbi, nr - freed);
/* shrink free nids cache entries */
if (freed < nr)
- freed += try_to_free_nids(sbi, nr - freed);
+ freed += f2fs_try_to_free_nids(sbi, nr - freed);
spin_lock(&f2fs_list_lock);
p = p->next;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index eae3590..fb1948e 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -43,7 +43,10 @@
char *fault_name[FAULT_MAX] = {
[FAULT_KMALLOC] = "kmalloc",
+ [FAULT_KVMALLOC] = "kvmalloc",
[FAULT_PAGE_ALLOC] = "page alloc",
+ [FAULT_PAGE_GET] = "page get",
+ [FAULT_ALLOC_BIO] = "alloc bio",
[FAULT_ALLOC_NID] = "alloc nid",
[FAULT_ORPHAN] = "orphan",
[FAULT_BLOCK] = "no more block",
@@ -57,7 +60,7 @@
static void f2fs_build_fault_attr(struct f2fs_sb_info *sbi,
unsigned int rate)
{
- struct f2fs_fault_info *ffi = &sbi->fault_info;
+ struct f2fs_fault_info *ffi = &F2FS_OPTION(sbi).fault_info;
if (rate) {
atomic_set(&ffi->inject_ops, 0);
@@ -92,6 +95,7 @@
Opt_disable_ext_identify,
Opt_inline_xattr,
Opt_noinline_xattr,
+ Opt_inline_xattr_size,
Opt_inline_data,
Opt_inline_dentry,
Opt_noinline_dentry,
@@ -103,6 +107,9 @@
Opt_noextent_cache,
Opt_noinline_data,
Opt_data_flush,
+ Opt_reserve_root,
+ Opt_resgid,
+ Opt_resuid,
Opt_mode,
Opt_io_size_bits,
Opt_fault_injection,
@@ -122,6 +129,10 @@
Opt_jqfmt_vfsold,
Opt_jqfmt_vfsv0,
Opt_jqfmt_vfsv1,
+ Opt_whint,
+ Opt_alloc,
+ Opt_fsync,
+ Opt_test_dummy_encryption,
Opt_err,
};
@@ -141,6 +152,7 @@
{Opt_disable_ext_identify, "disable_ext_identify"},
{Opt_inline_xattr, "inline_xattr"},
{Opt_noinline_xattr, "noinline_xattr"},
+ {Opt_inline_xattr_size, "inline_xattr_size=%u"},
{Opt_inline_data, "inline_data"},
{Opt_inline_dentry, "inline_dentry"},
{Opt_noinline_dentry, "noinline_dentry"},
@@ -152,6 +164,9 @@
{Opt_noextent_cache, "noextent_cache"},
{Opt_noinline_data, "noinline_data"},
{Opt_data_flush, "data_flush"},
+ {Opt_reserve_root, "reserve_root=%u"},
+ {Opt_resgid, "resgid=%u"},
+ {Opt_resuid, "resuid=%u"},
{Opt_mode, "mode=%s"},
{Opt_io_size_bits, "io_bits=%u"},
{Opt_fault_injection, "fault_injection=%u"},
@@ -171,6 +186,10 @@
{Opt_jqfmt_vfsold, "jqfmt=vfsold"},
{Opt_jqfmt_vfsv0, "jqfmt=vfsv0"},
{Opt_jqfmt_vfsv1, "jqfmt=vfsv1"},
+ {Opt_whint, "whint_mode=%s"},
+ {Opt_alloc, "alloc_mode=%s"},
+ {Opt_fsync, "fsync_mode=%s"},
+ {Opt_test_dummy_encryption, "test_dummy_encryption"},
{Opt_err, NULL},
};
@@ -186,6 +205,31 @@
va_end(args);
}
+static inline void limit_reserve_root(struct f2fs_sb_info *sbi)
+{
+ block_t limit = (sbi->user_block_count << 1) / 1000;
+
+ /* limit is 0.2% */
+ if (test_opt(sbi, RESERVE_ROOT) &&
+ F2FS_OPTION(sbi).root_reserved_blocks > limit) {
+ F2FS_OPTION(sbi).root_reserved_blocks = limit;
+ f2fs_msg(sbi->sb, KERN_INFO,
+ "Reduce reserved blocks for root = %u",
+ F2FS_OPTION(sbi).root_reserved_blocks);
+ }
+ if (!test_opt(sbi, RESERVE_ROOT) &&
+ (!uid_eq(F2FS_OPTION(sbi).s_resuid,
+ make_kuid(&init_user_ns, F2FS_DEF_RESUID)) ||
+ !gid_eq(F2FS_OPTION(sbi).s_resgid,
+ make_kgid(&init_user_ns, F2FS_DEF_RESGID))))
+ f2fs_msg(sbi->sb, KERN_INFO,
+ "Ignore s_resuid=%u, s_resgid=%u w/o reserve_root",
+ from_kuid_munged(&init_user_ns,
+ F2FS_OPTION(sbi).s_resuid),
+ from_kgid_munged(&init_user_ns,
+ F2FS_OPTION(sbi).s_resgid));
+}
+
static void init_once(void *foo)
{
struct f2fs_inode_info *fi = (struct f2fs_inode_info *) foo;
@@ -203,20 +247,26 @@
char *qname;
int ret = -EINVAL;
- if (sb_any_quota_loaded(sb) && !sbi->s_qf_names[qtype]) {
+ if (sb_any_quota_loaded(sb) && !F2FS_OPTION(sbi).s_qf_names[qtype]) {
f2fs_msg(sb, KERN_ERR,
"Cannot change journaled "
"quota options when quota turned on");
return -EINVAL;
}
+ if (f2fs_sb_has_quota_ino(sb)) {
+ f2fs_msg(sb, KERN_INFO,
+ "QUOTA feature is enabled, so ignore qf_name");
+ return 0;
+ }
+
qname = match_strdup(args);
if (!qname) {
f2fs_msg(sb, KERN_ERR,
"Not enough memory for storing quotafile name");
return -EINVAL;
}
- if (sbi->s_qf_names[qtype]) {
- if (strcmp(sbi->s_qf_names[qtype], qname) == 0)
+ if (F2FS_OPTION(sbi).s_qf_names[qtype]) {
+ if (strcmp(F2FS_OPTION(sbi).s_qf_names[qtype], qname) == 0)
ret = 0;
else
f2fs_msg(sb, KERN_ERR,
@@ -229,7 +279,7 @@
"quotafile must be on filesystem root");
goto errout;
}
- sbi->s_qf_names[qtype] = qname;
+ F2FS_OPTION(sbi).s_qf_names[qtype] = qname;
set_opt(sbi, QUOTA);
return 0;
errout:
@@ -241,13 +291,13 @@
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
- if (sb_any_quota_loaded(sb) && sbi->s_qf_names[qtype]) {
+ if (sb_any_quota_loaded(sb) && F2FS_OPTION(sbi).s_qf_names[qtype]) {
f2fs_msg(sb, KERN_ERR, "Cannot change journaled quota options"
" when quota turned on");
return -EINVAL;
}
- kfree(sbi->s_qf_names[qtype]);
- sbi->s_qf_names[qtype] = NULL;
+ kfree(F2FS_OPTION(sbi).s_qf_names[qtype]);
+ F2FS_OPTION(sbi).s_qf_names[qtype] = NULL;
return 0;
}
@@ -263,15 +313,19 @@
"Cannot enable project quota enforcement.");
return -1;
}
- if (sbi->s_qf_names[USRQUOTA] || sbi->s_qf_names[GRPQUOTA] ||
- sbi->s_qf_names[PRJQUOTA]) {
- if (test_opt(sbi, USRQUOTA) && sbi->s_qf_names[USRQUOTA])
+ if (F2FS_OPTION(sbi).s_qf_names[USRQUOTA] ||
+ F2FS_OPTION(sbi).s_qf_names[GRPQUOTA] ||
+ F2FS_OPTION(sbi).s_qf_names[PRJQUOTA]) {
+ if (test_opt(sbi, USRQUOTA) &&
+ F2FS_OPTION(sbi).s_qf_names[USRQUOTA])
clear_opt(sbi, USRQUOTA);
- if (test_opt(sbi, GRPQUOTA) && sbi->s_qf_names[GRPQUOTA])
+ if (test_opt(sbi, GRPQUOTA) &&
+ F2FS_OPTION(sbi).s_qf_names[GRPQUOTA])
clear_opt(sbi, GRPQUOTA);
- if (test_opt(sbi, PRJQUOTA) && sbi->s_qf_names[PRJQUOTA])
+ if (test_opt(sbi, PRJQUOTA) &&
+ F2FS_OPTION(sbi).s_qf_names[PRJQUOTA])
clear_opt(sbi, PRJQUOTA);
if (test_opt(sbi, GRPQUOTA) || test_opt(sbi, USRQUOTA) ||
@@ -281,12 +335,24 @@
return -1;
}
- if (!sbi->s_jquota_fmt) {
+ if (!F2FS_OPTION(sbi).s_jquota_fmt) {
f2fs_msg(sbi->sb, KERN_ERR, "journaled quota format "
"not specified");
return -1;
}
}
+
+ if (f2fs_sb_has_quota_ino(sbi->sb) && F2FS_OPTION(sbi).s_jquota_fmt) {
+ f2fs_msg(sbi->sb, KERN_INFO,
+ "QUOTA feature is enabled, so ignore jquota_fmt");
+ F2FS_OPTION(sbi).s_jquota_fmt = 0;
+ }
+ if (f2fs_sb_has_quota_ino(sbi->sb) && f2fs_readonly(sbi->sb)) {
+ f2fs_msg(sbi->sb, KERN_INFO,
+ "Filesystem with quota feature cannot be mounted RDWR "
+ "without CONFIG_QUOTA");
+ return -1;
+ }
return 0;
}
#endif
@@ -298,6 +364,8 @@
substring_t args[MAX_OPT_ARGS];
char *p, *name;
int arg = 0;
+ kuid_t uid;
+ kgid_t gid;
#ifdef CONFIG_QUOTA
int ret;
#endif
@@ -350,14 +418,14 @@
q = bdev_get_queue(sb->s_bdev);
if (blk_queue_discard(q)) {
set_opt(sbi, DISCARD);
- } else if (!f2fs_sb_mounted_blkzoned(sb)) {
+ } else if (!f2fs_sb_has_blkzoned(sb)) {
f2fs_msg(sb, KERN_WARNING,
"mounting with \"discard\" option, but "
"the device does not support discard");
}
break;
case Opt_nodiscard:
- if (f2fs_sb_mounted_blkzoned(sb)) {
+ if (f2fs_sb_has_blkzoned(sb)) {
f2fs_msg(sb, KERN_WARNING,
"discard is required for zoned block devices");
return -EINVAL;
@@ -383,6 +451,12 @@
case Opt_noinline_xattr:
clear_opt(sbi, INLINE_XATTR);
break;
+ case Opt_inline_xattr_size:
+ if (args->from && match_int(args, &arg))
+ return -EINVAL;
+ set_opt(sbi, INLINE_XATTR_SIZE);
+ F2FS_OPTION(sbi).inline_xattr_size = arg;
+ break;
#else
case Opt_user_xattr:
f2fs_msg(sb, KERN_INFO,
@@ -421,7 +495,7 @@
return -EINVAL;
if (arg != 2 && arg != 4 && arg != NR_CURSEG_TYPE)
return -EINVAL;
- sbi->active_logs = arg;
+ F2FS_OPTION(sbi).active_logs = arg;
break;
case Opt_disable_ext_identify:
set_opt(sbi, DISABLE_EXT_IDENTIFY);
@@ -459,6 +533,40 @@
case Opt_data_flush:
set_opt(sbi, DATA_FLUSH);
break;
+ case Opt_reserve_root:
+ if (args->from && match_int(args, &arg))
+ return -EINVAL;
+ if (test_opt(sbi, RESERVE_ROOT)) {
+ f2fs_msg(sb, KERN_INFO,
+ "Preserve previous reserve_root=%u",
+ F2FS_OPTION(sbi).root_reserved_blocks);
+ } else {
+ F2FS_OPTION(sbi).root_reserved_blocks = arg;
+ set_opt(sbi, RESERVE_ROOT);
+ }
+ break;
+ case Opt_resuid:
+ if (args->from && match_int(args, &arg))
+ return -EINVAL;
+ uid = make_kuid(current_user_ns(), arg);
+ if (!uid_valid(uid)) {
+ f2fs_msg(sb, KERN_ERR,
+ "Invalid uid value %d", arg);
+ return -EINVAL;
+ }
+ F2FS_OPTION(sbi).s_resuid = uid;
+ break;
+ case Opt_resgid:
+ if (args->from && match_int(args, &arg))
+ return -EINVAL;
+ gid = make_kgid(current_user_ns(), arg);
+ if (!gid_valid(gid)) {
+ f2fs_msg(sb, KERN_ERR,
+ "Invalid gid value %d", arg);
+ return -EINVAL;
+ }
+ F2FS_OPTION(sbi).s_resgid = gid;
+ break;
case Opt_mode:
name = match_strdup(&args[0]);
@@ -466,7 +574,7 @@
return -ENOMEM;
if (strlen(name) == 8 &&
!strncmp(name, "adaptive", 8)) {
- if (f2fs_sb_mounted_blkzoned(sb)) {
+ if (f2fs_sb_has_blkzoned(sb)) {
f2fs_msg(sb, KERN_WARNING,
"adaptive mode is not allowed with "
"zoned block device feature");
@@ -492,7 +600,7 @@
1 << arg, BIO_MAX_PAGES);
return -EINVAL;
}
- sbi->write_io_size_bits = arg;
+ F2FS_OPTION(sbi).write_io_size_bits = arg;
break;
case Opt_fault_injection:
if (args->from && match_int(args, &arg))
@@ -553,13 +661,13 @@
return ret;
break;
case Opt_jqfmt_vfsold:
- sbi->s_jquota_fmt = QFMT_VFS_OLD;
+ F2FS_OPTION(sbi).s_jquota_fmt = QFMT_VFS_OLD;
break;
case Opt_jqfmt_vfsv0:
- sbi->s_jquota_fmt = QFMT_VFS_V0;
+ F2FS_OPTION(sbi).s_jquota_fmt = QFMT_VFS_V0;
break;
case Opt_jqfmt_vfsv1:
- sbi->s_jquota_fmt = QFMT_VFS_V1;
+ F2FS_OPTION(sbi).s_jquota_fmt = QFMT_VFS_V1;
break;
case Opt_noquota:
clear_opt(sbi, QUOTA);
@@ -586,6 +694,77 @@
"quota operations not supported");
break;
#endif
+ case Opt_whint:
+ name = match_strdup(&args[0]);
+ if (!name)
+ return -ENOMEM;
+ if (strlen(name) == 10 &&
+ !strncmp(name, "user-based", 10)) {
+ F2FS_OPTION(sbi).whint_mode = WHINT_MODE_USER;
+ } else if (strlen(name) == 3 &&
+ !strncmp(name, "off", 3)) {
+ F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
+ } else if (strlen(name) == 8 &&
+ !strncmp(name, "fs-based", 8)) {
+ F2FS_OPTION(sbi).whint_mode = WHINT_MODE_FS;
+ } else {
+ kfree(name);
+ return -EINVAL;
+ }
+ kfree(name);
+ break;
+ case Opt_alloc:
+ name = match_strdup(&args[0]);
+ if (!name)
+ return -ENOMEM;
+
+ if (strlen(name) == 7 &&
+ !strncmp(name, "default", 7)) {
+ F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT;
+ } else if (strlen(name) == 5 &&
+ !strncmp(name, "reuse", 5)) {
+ F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
+ } else {
+ kfree(name);
+ return -EINVAL;
+ }
+ kfree(name);
+ break;
+ case Opt_fsync:
+ name = match_strdup(&args[0]);
+ if (!name)
+ return -ENOMEM;
+ if (strlen(name) == 5 &&
+ !strncmp(name, "posix", 5)) {
+ F2FS_OPTION(sbi).fsync_mode = FSYNC_MODE_POSIX;
+ } else if (strlen(name) == 6 &&
+ !strncmp(name, "strict", 6)) {
+ F2FS_OPTION(sbi).fsync_mode = FSYNC_MODE_STRICT;
+ } else if (strlen(name) == 9 &&
+ !strncmp(name, "nobarrier", 9)) {
+ F2FS_OPTION(sbi).fsync_mode =
+ FSYNC_MODE_NOBARRIER;
+ } else {
+ kfree(name);
+ return -EINVAL;
+ }
+ kfree(name);
+ break;
+ case Opt_test_dummy_encryption:
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+ if (!f2fs_sb_has_encrypt(sb)) {
+ f2fs_msg(sb, KERN_ERR, "Encrypt feature is off");
+ return -EINVAL;
+ }
+
+ F2FS_OPTION(sbi).test_dummy_encryption = true;
+ f2fs_msg(sb, KERN_INFO,
+ "Test dummy encryption mode enabled");
+#else
+ f2fs_msg(sb, KERN_INFO,
+ "Test dummy encryption mount option ignored");
+#endif
+ break;
default:
f2fs_msg(sb, KERN_ERR,
"Unrecognized mount option \"%s\" or missing value",
@@ -604,6 +783,38 @@
F2FS_IO_SIZE_KB(sbi));
return -EINVAL;
}
+
+ if (test_opt(sbi, INLINE_XATTR_SIZE)) {
+ if (!f2fs_sb_has_extra_attr(sb) ||
+ !f2fs_sb_has_flexible_inline_xattr(sb)) {
+ f2fs_msg(sb, KERN_ERR,
+ "extra_attr or flexible_inline_xattr "
+ "feature is off");
+ return -EINVAL;
+ }
+ if (!test_opt(sbi, INLINE_XATTR)) {
+ f2fs_msg(sb, KERN_ERR,
+ "inline_xattr_size option should be "
+ "set with inline_xattr option");
+ return -EINVAL;
+ }
+ if (!F2FS_OPTION(sbi).inline_xattr_size ||
+ F2FS_OPTION(sbi).inline_xattr_size >=
+ DEF_ADDRS_PER_INODE -
+ F2FS_TOTAL_EXTRA_ATTR_SIZE -
+ DEF_INLINE_RESERVED_SIZE -
+ DEF_MIN_INLINE_SIZE) {
+ f2fs_msg(sb, KERN_ERR,
+ "inline xattr size is out of range");
+ return -EINVAL;
+ }
+ }
+
+ /* Not pass down write hints if the number of active logs is lesser
+ * than NR_CURSEG_TYPE.
+ */
+ if (F2FS_OPTION(sbi).active_logs != NR_CURSEG_TYPE)
+ F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
return 0;
}
@@ -618,24 +829,18 @@
init_once((void *) fi);
/* Initialize f2fs-specific inode info */
- fi->vfs_inode.i_version = 1;
atomic_set(&fi->dirty_pages, 0);
- fi->i_current_depth = 1;
- fi->i_advise = 0;
init_rwsem(&fi->i_sem);
INIT_LIST_HEAD(&fi->dirty_list);
INIT_LIST_HEAD(&fi->gdirty_list);
+ INIT_LIST_HEAD(&fi->inmem_ilist);
INIT_LIST_HEAD(&fi->inmem_pages);
mutex_init(&fi->inmem_lock);
- init_rwsem(&fi->dio_rwsem[READ]);
- init_rwsem(&fi->dio_rwsem[WRITE]);
+ init_rwsem(&fi->i_gc_rwsem[READ]);
+ init_rwsem(&fi->i_gc_rwsem[WRITE]);
init_rwsem(&fi->i_mmap_sem);
init_rwsem(&fi->i_xattr_sem);
-#ifdef CONFIG_QUOTA
- memset(&fi->i_dquot, 0, sizeof(fi->i_dquot));
- fi->i_reserved_quota = 0;
-#endif
/* Will be used by directory only */
fi->i_dir_level = F2FS_SB(sb)->dir_level;
@@ -660,7 +865,7 @@
/* some remained atomic pages should discarded */
if (f2fs_is_atomic_file(inode))
- drop_inmem_pages(inode);
+ f2fs_drop_inmem_pages(inode);
/* should remain fi->extent_tree for writepage */
f2fs_destroy_extent_node(inode);
@@ -673,7 +878,6 @@
sb_end_intwrite(inode->i_sb);
- fscrypt_put_encryption_info(inode, NULL);
spin_lock(&inode->i_lock);
atomic_dec(&inode->i_count);
}
@@ -781,6 +985,7 @@
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
int i;
+ bool dropped;
f2fs_quota_off_umount(sb);
@@ -797,27 +1002,27 @@
struct cp_control cpc = {
.reason = CP_UMOUNT,
};
- write_checkpoint(sbi, &cpc);
+ f2fs_write_checkpoint(sbi, &cpc);
}
/* be sure to wait for any on-going discard commands */
- f2fs_wait_discard_bios(sbi, true);
+ dropped = f2fs_wait_discard_bios(sbi);
- if (f2fs_discard_en(sbi) && !sbi->discard_blks) {
+ if (f2fs_discard_en(sbi) && !sbi->discard_blks && !dropped) {
struct cp_control cpc = {
.reason = CP_UMOUNT | CP_TRIMMED,
};
- write_checkpoint(sbi, &cpc);
+ f2fs_write_checkpoint(sbi, &cpc);
}
- /* write_checkpoint can update stat informaion */
+ /* f2fs_write_checkpoint can update stat informaion */
f2fs_destroy_stats(sbi);
/*
* normally superblock is clean, so we need to release this.
* In addition, EIO will skip do checkpoint, we need this as well.
*/
- release_ino_entry(sbi, true);
+ f2fs_release_ino_entry(sbi, true);
f2fs_leave_shrinker(sbi);
mutex_unlock(&sbi->umount_mutex);
@@ -829,8 +1034,8 @@
iput(sbi->meta_inode);
/* destroy f2fs internal modules */
- destroy_node_manager(sbi);
- destroy_segment_manager(sbi);
+ f2fs_destroy_node_manager(sbi);
+ f2fs_destroy_segment_manager(sbi);
kfree(sbi->ckpt);
@@ -845,7 +1050,7 @@
mempool_destroy(sbi->write_io_dummy);
#ifdef CONFIG_QUOTA
for (i = 0; i < MAXQUOTAS; i++)
- kfree(sbi->s_qf_names[i]);
+ kfree(F2FS_OPTION(sbi).s_qf_names[i]);
#endif
destroy_percpu_info(sbi);
for (i = 0; i < NR_PAGE_TYPE; i++)
@@ -858,6 +1063,9 @@
struct f2fs_sb_info *sbi = F2FS_SB(sb);
int err = 0;
+ if (unlikely(f2fs_cp_error(sbi)))
+ return 0;
+
trace_f2fs_sync_fs(sb, sync);
if (unlikely(is_sbi_flag_set(sbi, SBI_POR_DOING)))
@@ -869,7 +1077,7 @@
cpc.reason = __get_cp_reason(sbi);
mutex_lock(&sbi->gc_mutex);
- err = write_checkpoint(sbi, &cpc);
+ err = f2fs_write_checkpoint(sbi, &cpc);
mutex_unlock(&sbi->gc_mutex);
}
f2fs_trace_ios(NULL, 1);
@@ -944,22 +1152,26 @@
struct super_block *sb = dentry->d_sb;
struct f2fs_sb_info *sbi = F2FS_SB(sb);
u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
- block_t total_count, user_block_count, start_count, ovp_count;
+ block_t total_count, user_block_count, start_count;
u64 avail_node_count;
total_count = le64_to_cpu(sbi->raw_super->block_count);
user_block_count = sbi->user_block_count;
start_count = le32_to_cpu(sbi->raw_super->segment0_blkaddr);
- ovp_count = SM_I(sbi)->ovp_segments << sbi->log_blocks_per_seg;
buf->f_type = F2FS_SUPER_MAGIC;
buf->f_bsize = sbi->blocksize;
buf->f_blocks = total_count - start_count;
- buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
- buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
- sbi->reserved_blocks;
+ buf->f_bfree = user_block_count - valid_user_blocks(sbi) -
+ sbi->current_reserved_blocks;
+ if (buf->f_bfree > F2FS_OPTION(sbi).root_reserved_blocks)
+ buf->f_bavail = buf->f_bfree -
+ F2FS_OPTION(sbi).root_reserved_blocks;
+ else
+ buf->f_bavail = 0;
- avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
+ avail_node_count = sbi->total_node_count - sbi->nquota_files -
+ F2FS_RESERVED_NODE_NUM;
if (avail_node_count > user_block_count) {
buf->f_files = user_block_count;
@@ -989,10 +1201,10 @@
#ifdef CONFIG_QUOTA
struct f2fs_sb_info *sbi = F2FS_SB(sb);
- if (sbi->s_jquota_fmt) {
+ if (F2FS_OPTION(sbi).s_jquota_fmt) {
char *fmtname = "";
- switch (sbi->s_jquota_fmt) {
+ switch (F2FS_OPTION(sbi).s_jquota_fmt) {
case QFMT_VFS_OLD:
fmtname = "vfsold";
break;
@@ -1006,14 +1218,17 @@
seq_printf(seq, ",jqfmt=%s", fmtname);
}
- if (sbi->s_qf_names[USRQUOTA])
- seq_show_option(seq, "usrjquota", sbi->s_qf_names[USRQUOTA]);
+ if (F2FS_OPTION(sbi).s_qf_names[USRQUOTA])
+ seq_show_option(seq, "usrjquota",
+ F2FS_OPTION(sbi).s_qf_names[USRQUOTA]);
- if (sbi->s_qf_names[GRPQUOTA])
- seq_show_option(seq, "grpjquota", sbi->s_qf_names[GRPQUOTA]);
+ if (F2FS_OPTION(sbi).s_qf_names[GRPQUOTA])
+ seq_show_option(seq, "grpjquota",
+ F2FS_OPTION(sbi).s_qf_names[GRPQUOTA]);
- if (sbi->s_qf_names[PRJQUOTA])
- seq_show_option(seq, "prjjquota", sbi->s_qf_names[PRJQUOTA]);
+ if (F2FS_OPTION(sbi).s_qf_names[PRJQUOTA])
+ seq_show_option(seq, "prjjquota",
+ F2FS_OPTION(sbi).s_qf_names[PRJQUOTA]);
#endif
}
@@ -1046,6 +1261,9 @@
seq_puts(seq, ",inline_xattr");
else
seq_puts(seq, ",noinline_xattr");
+ if (test_opt(sbi, INLINE_XATTR_SIZE))
+ seq_printf(seq, ",inline_xattr_size=%u",
+ F2FS_OPTION(sbi).inline_xattr_size);
#endif
#ifdef CONFIG_F2FS_FS_POSIX_ACL
if (test_opt(sbi, POSIX_ACL))
@@ -1081,13 +1299,20 @@
seq_puts(seq, "adaptive");
else if (test_opt(sbi, LFS))
seq_puts(seq, "lfs");
- seq_printf(seq, ",active_logs=%u", sbi->active_logs);
+ seq_printf(seq, ",active_logs=%u", F2FS_OPTION(sbi).active_logs);
+ if (test_opt(sbi, RESERVE_ROOT))
+ seq_printf(seq, ",reserve_root=%u,resuid=%u,resgid=%u",
+ F2FS_OPTION(sbi).root_reserved_blocks,
+ from_kuid_munged(&init_user_ns,
+ F2FS_OPTION(sbi).s_resuid),
+ from_kgid_munged(&init_user_ns,
+ F2FS_OPTION(sbi).s_resgid));
if (F2FS_IO_SIZE_BITS(sbi))
seq_printf(seq, ",io_size=%uKB", F2FS_IO_SIZE_KB(sbi));
#ifdef CONFIG_F2FS_FAULT_INJECTION
if (test_opt(sbi, FAULT_INJECTION))
seq_printf(seq, ",fault_injection=%u",
- sbi->fault_info.inject_rate);
+ F2FS_OPTION(sbi).fault_info.inject_rate);
#endif
#ifdef CONFIG_QUOTA
if (test_opt(sbi, QUOTA))
@@ -1100,14 +1325,37 @@
seq_puts(seq, ",prjquota");
#endif
f2fs_show_quota_options(seq, sbi->sb);
+ if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_USER)
+ seq_printf(seq, ",whint_mode=%s", "user-based");
+ else if (F2FS_OPTION(sbi).whint_mode == WHINT_MODE_FS)
+ seq_printf(seq, ",whint_mode=%s", "fs-based");
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
+ if (F2FS_OPTION(sbi).test_dummy_encryption)
+ seq_puts(seq, ",test_dummy_encryption");
+#endif
+ if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_DEFAULT)
+ seq_printf(seq, ",alloc_mode=%s", "default");
+ else if (F2FS_OPTION(sbi).alloc_mode == ALLOC_MODE_REUSE)
+ seq_printf(seq, ",alloc_mode=%s", "reuse");
+
+ if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_POSIX)
+ seq_printf(seq, ",fsync_mode=%s", "posix");
+ else if (F2FS_OPTION(sbi).fsync_mode == FSYNC_MODE_STRICT)
+ seq_printf(seq, ",fsync_mode=%s", "strict");
return 0;
}
static void default_options(struct f2fs_sb_info *sbi)
{
/* init some FS parameters */
- sbi->active_logs = NR_CURSEG_TYPE;
+ F2FS_OPTION(sbi).active_logs = NR_CURSEG_TYPE;
+ F2FS_OPTION(sbi).inline_xattr_size = DEFAULT_INLINE_XATTR_ADDRS;
+ F2FS_OPTION(sbi).whint_mode = WHINT_MODE_OFF;
+ F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_DEFAULT;
+ F2FS_OPTION(sbi).fsync_mode = FSYNC_MODE_POSIX;
+ F2FS_OPTION(sbi).test_dummy_encryption = false;
+ sbi->readdir_ra = 1;
set_opt(sbi, BG_GC);
set_opt(sbi, INLINE_XATTR);
@@ -1117,7 +1365,7 @@
set_opt(sbi, NOHEAP);
sbi->sb->s_flags |= MS_LAZYTIME;
set_opt(sbi, FLUSH_MERGE);
- if (f2fs_sb_mounted_blkzoned(sbi->sb)) {
+ if (f2fs_sb_has_blkzoned(sbi->sb)) {
set_opt_mode(sbi, F2FS_MOUNT_LFS);
set_opt(sbi, DISCARD);
} else {
@@ -1136,21 +1384,19 @@
#endif
}
+#ifdef CONFIG_QUOTA
+static int f2fs_enable_quotas(struct super_block *sb);
+#endif
static int f2fs_remount(struct super_block *sb, int *flags, char *data)
{
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct f2fs_mount_info org_mount_opt;
unsigned long old_sb_flags;
- int err, active_logs;
+ int err;
bool need_restart_gc = false;
bool need_stop_gc = false;
bool no_extent_cache = !test_opt(sbi, EXTENT_CACHE);
-#ifdef CONFIG_F2FS_FAULT_INJECTION
- struct f2fs_fault_info ffi = sbi->fault_info;
-#endif
#ifdef CONFIG_QUOTA
- int s_jquota_fmt;
- char *s_qf_names[MAXQUOTAS];
int i, j;
#endif
@@ -1160,21 +1406,21 @@
*/
org_mount_opt = sbi->mount_opt;
old_sb_flags = sb->s_flags;
- active_logs = sbi->active_logs;
#ifdef CONFIG_QUOTA
- s_jquota_fmt = sbi->s_jquota_fmt;
+ org_mount_opt.s_jquota_fmt = F2FS_OPTION(sbi).s_jquota_fmt;
for (i = 0; i < MAXQUOTAS; i++) {
- if (sbi->s_qf_names[i]) {
- s_qf_names[i] = kstrdup(sbi->s_qf_names[i],
- GFP_KERNEL);
- if (!s_qf_names[i]) {
+ if (F2FS_OPTION(sbi).s_qf_names[i]) {
+ org_mount_opt.s_qf_names[i] =
+ kstrdup(F2FS_OPTION(sbi).s_qf_names[i],
+ GFP_KERNEL);
+ if (!org_mount_opt.s_qf_names[i]) {
for (j = 0; j < i; j++)
- kfree(s_qf_names[j]);
+ kfree(org_mount_opt.s_qf_names[j]);
return -ENOMEM;
}
} else {
- s_qf_names[i] = NULL;
+ org_mount_opt.s_qf_names[i] = NULL;
}
}
#endif
@@ -1202,16 +1448,23 @@
if (f2fs_readonly(sb) && (*flags & MS_RDONLY))
goto skip;
+#ifdef CONFIG_QUOTA
if (!f2fs_readonly(sb) && (*flags & MS_RDONLY)) {
err = dquot_suspend(sb, -1);
if (err < 0)
goto restore_opts;
- } else {
+ } else if (f2fs_readonly(sb) && !(*flags & MS_RDONLY)) {
/* dquot_resume needs RW */
sb->s_flags &= ~MS_RDONLY;
- dquot_resume(sb, -1);
+ if (sb_any_quota_suspended(sb)) {
+ dquot_resume(sb, -1);
+ } else if (f2fs_sb_has_quota_ino(sb)) {
+ err = f2fs_enable_quotas(sb);
+ if (err)
+ goto restore_opts;
+ }
}
-
+#endif
/* disallow enable/disable extent_cache dynamically */
if (no_extent_cache == !!test_opt(sbi, EXTENT_CACHE)) {
err = -EINVAL;
@@ -1227,17 +1480,18 @@
*/
if ((*flags & MS_RDONLY) || !test_opt(sbi, BG_GC)) {
if (sbi->gc_thread) {
- stop_gc_thread(sbi);
+ f2fs_stop_gc_thread(sbi);
need_restart_gc = true;
}
} else if (!sbi->gc_thread) {
- err = start_gc_thread(sbi);
+ err = f2fs_start_gc_thread(sbi);
if (err)
goto restore_opts;
need_stop_gc = true;
}
- if (*flags & MS_RDONLY) {
+ if (*flags & MS_RDONLY ||
+ F2FS_OPTION(sbi).whint_mode != org_mount_opt.whint_mode) {
writeback_inodes_sb(sb, WB_REASON_SYNC);
sync_inodes_sb(sb);
@@ -1253,9 +1507,9 @@
*/
if ((*flags & MS_RDONLY) || !test_opt(sbi, FLUSH_MERGE)) {
clear_opt(sbi, FLUSH_MERGE);
- destroy_flush_cmd_control(sbi, false);
+ f2fs_destroy_flush_cmd_control(sbi, false);
} else {
- err = create_flush_cmd_control(sbi);
+ err = f2fs_create_flush_cmd_control(sbi);
if (err)
goto restore_gc;
}
@@ -1263,35 +1517,32 @@
#ifdef CONFIG_QUOTA
/* Release old quota file names */
for (i = 0; i < MAXQUOTAS; i++)
- kfree(s_qf_names[i]);
+ kfree(org_mount_opt.s_qf_names[i]);
#endif
/* Update the POSIXACL Flag */
sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
(test_opt(sbi, POSIX_ACL) ? MS_POSIXACL : 0);
+ limit_reserve_root(sbi);
return 0;
restore_gc:
if (need_restart_gc) {
- if (start_gc_thread(sbi))
+ if (f2fs_start_gc_thread(sbi))
f2fs_msg(sbi->sb, KERN_WARNING,
"background gc thread has stopped");
} else if (need_stop_gc) {
- stop_gc_thread(sbi);
+ f2fs_stop_gc_thread(sbi);
}
restore_opts:
#ifdef CONFIG_QUOTA
- sbi->s_jquota_fmt = s_jquota_fmt;
+ F2FS_OPTION(sbi).s_jquota_fmt = org_mount_opt.s_jquota_fmt;
for (i = 0; i < MAXQUOTAS; i++) {
- kfree(sbi->s_qf_names[i]);
- sbi->s_qf_names[i] = s_qf_names[i];
+ kfree(F2FS_OPTION(sbi).s_qf_names[i]);
+ F2FS_OPTION(sbi).s_qf_names[i] = org_mount_opt.s_qf_names[i];
}
#endif
sbi->mount_opt = org_mount_opt;
- sbi->active_logs = active_logs;
sb->s_flags = old_sb_flags;
-#ifdef CONFIG_F2FS_FAULT_INJECTION
- sbi->fault_info = ffi;
-#endif
return err;
}
@@ -1319,9 +1570,14 @@
while (toread > 0) {
tocopy = min_t(unsigned long, sb->s_blocksize - offset, toread);
repeat:
- page = read_mapping_page(mapping, blkidx, NULL);
- if (IS_ERR(page))
+ page = read_cache_page_gfp(mapping, blkidx, GFP_NOFS);
+ if (IS_ERR(page)) {
+ if (PTR_ERR(page) == -ENOMEM) {
+ congestion_wait(BLK_RW_ASYNC, HZ/50);
+ goto repeat;
+ }
return PTR_ERR(page);
+ }
lock_page(page);
@@ -1364,11 +1620,16 @@
while (towrite > 0) {
tocopy = min_t(unsigned long, sb->s_blocksize - offset,
towrite);
-
+retry:
err = a_ops->write_begin(NULL, mapping, off, tocopy, 0,
&page, NULL);
- if (unlikely(err))
+ if (unlikely(err)) {
+ if (err == -ENOMEM) {
+ congestion_wait(BLK_RW_ASYNC, HZ/50);
+ goto retry;
+ }
break;
+ }
kaddr = kmap_atomic(page);
memcpy(kaddr + offset, data, tocopy);
@@ -1385,8 +1646,7 @@
}
if (len == towrite)
- return 0;
- inode->i_version++;
+ return err;
inode->i_mtime = inode->i_ctime = current_time(inode);
f2fs_mark_inode_dirty_sync(inode, false);
return len - towrite;
@@ -1404,23 +1664,95 @@
static int f2fs_quota_on_mount(struct f2fs_sb_info *sbi, int type)
{
- return dquot_quota_on_mount(sbi->sb, sbi->s_qf_names[type],
- sbi->s_jquota_fmt, type);
+ return dquot_quota_on_mount(sbi->sb, F2FS_OPTION(sbi).s_qf_names[type],
+ F2FS_OPTION(sbi).s_jquota_fmt, type);
}
-void f2fs_enable_quota_files(struct f2fs_sb_info *sbi)
+int f2fs_enable_quota_files(struct f2fs_sb_info *sbi, bool rdonly)
{
- int i, ret;
+ int enabled = 0;
+ int i, err;
+
+ if (f2fs_sb_has_quota_ino(sbi->sb) && rdonly) {
+ err = f2fs_enable_quotas(sbi->sb);
+ if (err) {
+ f2fs_msg(sbi->sb, KERN_ERR,
+ "Cannot turn on quota_ino: %d", err);
+ return 0;
+ }
+ return 1;
+ }
for (i = 0; i < MAXQUOTAS; i++) {
- if (sbi->s_qf_names[i]) {
- ret = f2fs_quota_on_mount(sbi, i);
- if (ret < 0)
- f2fs_msg(sbi->sb, KERN_ERR,
- "Cannot turn on journaled "
- "quota: error %d", ret);
+ if (F2FS_OPTION(sbi).s_qf_names[i]) {
+ err = f2fs_quota_on_mount(sbi, i);
+ if (!err) {
+ enabled = 1;
+ continue;
+ }
+ f2fs_msg(sbi->sb, KERN_ERR,
+ "Cannot turn on quotas: %d on %d", err, i);
}
}
+ return enabled;
+}
+
+static int f2fs_quota_enable(struct super_block *sb, int type, int format_id,
+ unsigned int flags)
+{
+ struct inode *qf_inode;
+ unsigned long qf_inum;
+ int err;
+
+ BUG_ON(!f2fs_sb_has_quota_ino(sb));
+
+ qf_inum = f2fs_qf_ino(sb, type);
+ if (!qf_inum)
+ return -EPERM;
+
+ qf_inode = f2fs_iget(sb, qf_inum);
+ if (IS_ERR(qf_inode)) {
+ f2fs_msg(sb, KERN_ERR,
+ "Bad quota inode %u:%lu", type, qf_inum);
+ return PTR_ERR(qf_inode);
+ }
+
+ /* Don't account quota for quota files to avoid recursion */
+ qf_inode->i_flags |= S_NOQUOTA;
+ err = dquot_enable(qf_inode, type, format_id, flags);
+ iput(qf_inode);
+ return err;
+}
+
+static int f2fs_enable_quotas(struct super_block *sb)
+{
+ int type, err = 0;
+ unsigned long qf_inum;
+ bool quota_mopt[MAXQUOTAS] = {
+ test_opt(F2FS_SB(sb), USRQUOTA),
+ test_opt(F2FS_SB(sb), GRPQUOTA),
+ test_opt(F2FS_SB(sb), PRJQUOTA),
+ };
+
+ sb_dqopt(sb)->flags |= DQUOT_QUOTA_SYS_FILE | DQUOT_NOLIST_DIRTY;
+ for (type = 0; type < MAXQUOTAS; type++) {
+ qf_inum = f2fs_qf_ino(sb, type);
+ if (qf_inum) {
+ err = f2fs_quota_enable(sb, type, QFMT_VFS_V1,
+ DQUOT_USAGE_ENABLED |
+ (quota_mopt[type] ? DQUOT_LIMITS_ENABLED : 0));
+ if (err) {
+ f2fs_msg(sb, KERN_ERR,
+ "Failed to enable quota tracking "
+ "(type=%d, err=%d). Please run "
+ "fsck to fix.", type, err);
+ for (type--; type >= 0; type--)
+ dquot_quota_off(sb, type);
+ return err;
+ }
+ }
+ }
+ return 0;
}
static int f2fs_quota_sync(struct super_block *sb, int type)
@@ -1471,7 +1803,7 @@
inode = d_inode(path->dentry);
inode_lock(inode);
- F2FS_I(inode)->i_flags |= FS_NOATIME_FL | FS_IMMUTABLE_FL;
+ F2FS_I(inode)->i_flags |= F2FS_NOATIME_FL | F2FS_IMMUTABLE_FL;
inode_set_flags(inode, S_NOATIME | S_IMMUTABLE,
S_NOATIME | S_IMMUTABLE);
inode_unlock(inode);
@@ -1491,11 +1823,11 @@
f2fs_quota_sync(sb, type);
err = dquot_quota_off(sb, type);
- if (err)
+ if (err || f2fs_sb_has_quota_ino(sb))
goto out_put;
inode_lock(inode);
- F2FS_I(inode)->i_flags &= ~(FS_NOATIME_FL | FS_IMMUTABLE_FL);
+ F2FS_I(inode)->i_flags &= ~(F2FS_NOATIME_FL | F2FS_IMMUTABLE_FL);
inode_set_flags(inode, 0, S_NOATIME | S_IMMUTABLE);
inode_unlock(inode);
f2fs_mark_inode_dirty_sync(inode, false);
@@ -1512,7 +1844,7 @@
f2fs_quota_off(sb, type);
}
-int f2fs_get_projid(struct inode *inode, kprojid_t *projid)
+static int f2fs_get_projid(struct inode *inode, kprojid_t *projid)
{
*projid = F2FS_I(inode)->i_projid;
return 0;
@@ -1579,28 +1911,35 @@
static int f2fs_set_context(struct inode *inode, const void *ctx, size_t len,
void *fs_data)
{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+
+ /*
+ * Encrypting the root directory is not allowed because fsck
+ * expects lost+found directory to exist and remain unencrypted
+ * if LOST_FOUND feature is enabled.
+ *
+ */
+ if (f2fs_sb_has_lost_found(sbi->sb) &&
+ inode->i_ino == F2FS_ROOT_INO(sbi))
+ return -EPERM;
+
return f2fs_setxattr(inode, F2FS_XATTR_INDEX_ENCRYPTION,
F2FS_XATTR_NAME_ENCRYPTION_CONTEXT,
ctx, len, fs_data, XATTR_CREATE);
}
-static unsigned f2fs_max_namelen(struct inode *inode)
+static bool f2fs_dummy_context(struct inode *inode)
{
- return S_ISLNK(inode->i_mode) ?
- inode->i_sb->s_blocksize : F2FS_NAME_LEN;
+ return DUMMY_ENCRYPTION_ENABLED(F2FS_I_SB(inode));
}
static const struct fscrypt_operations f2fs_cryptops = {
.key_prefix = "f2fs:",
.get_context = f2fs_get_context,
.set_context = f2fs_set_context,
- .is_encrypted = f2fs_encrypted_inode,
+ .dummy_context = f2fs_dummy_context,
.empty_dir = f2fs_empty_dir,
- .max_namelen = f2fs_max_namelen,
-};
-#else
-static const struct fscrypt_operations f2fs_cryptops = {
- .is_encrypted = f2fs_encrypted_inode,
+ .max_namelen = F2FS_NAME_LEN,
};
#endif
@@ -1610,7 +1949,7 @@
struct f2fs_sb_info *sbi = F2FS_SB(sb);
struct inode *inode;
- if (check_nid_range(sbi, ino))
+ if (f2fs_check_nid_range(sbi, ino))
return ERR_PTR(-ESTALE);
/*
@@ -1656,7 +1995,7 @@
/*
* note: previously, result is equal to (DEF_ADDRS_PER_INODE -
- * F2FS_INLINE_XATTR_ADDRS), but now f2fs try to reserve more
+ * DEFAULT_INLINE_XATTR_ADDRS), but now f2fs try to reserve more
* space in inode.i_addr, it will be more safe to reassign
* result as zero.
*/
@@ -1681,7 +2020,6 @@
lock_buffer(bh);
if (super)
memcpy(bh->b_data + F2FS_SUPER_OFFSET, super, sizeof(*super));
- set_buffer_uptodate(bh);
set_buffer_dirty(bh);
unlock_buffer(bh);
@@ -1794,6 +2132,8 @@
static int sanity_check_raw_super(struct f2fs_sb_info *sbi,
struct buffer_head *bh)
{
+ block_t segment_count, segs_per_sec, secs_per_zone;
+ block_t total_sections, blocks_per_seg;
struct f2fs_super_block *raw_super = (struct f2fs_super_block *)
(bh->b_data + F2FS_SUPER_OFFSET);
struct super_block *sb = sbi->sb;
@@ -1850,6 +2190,72 @@
return 1;
}
+ segment_count = le32_to_cpu(raw_super->segment_count);
+ segs_per_sec = le32_to_cpu(raw_super->segs_per_sec);
+ secs_per_zone = le32_to_cpu(raw_super->secs_per_zone);
+ total_sections = le32_to_cpu(raw_super->section_count);
+
+ /* blocks_per_seg should be 512, given the above check */
+ blocks_per_seg = 1 << le32_to_cpu(raw_super->log_blocks_per_seg);
+
+ if (segment_count > F2FS_MAX_SEGMENT ||
+ segment_count < F2FS_MIN_SEGMENTS) {
+ f2fs_msg(sb, KERN_INFO,
+ "Invalid segment count (%u)",
+ segment_count);
+ return 1;
+ }
+
+ if (total_sections > segment_count ||
+ total_sections < F2FS_MIN_SEGMENTS ||
+ segs_per_sec > segment_count || !segs_per_sec) {
+ f2fs_msg(sb, KERN_INFO,
+ "Invalid segment/section count (%u, %u x %u)",
+ segment_count, total_sections, segs_per_sec);
+ return 1;
+ }
+
+ if ((segment_count / segs_per_sec) < total_sections) {
+ f2fs_msg(sb, KERN_INFO,
+ "Small segment_count (%u < %u * %u)",
+ segment_count, segs_per_sec, total_sections);
+ return 1;
+ }
+
+ if (segment_count > (le32_to_cpu(raw_super->block_count) >> 9)) {
+ f2fs_msg(sb, KERN_INFO,
+ "Wrong segment_count / block_count (%u > %u)",
+ segment_count, le32_to_cpu(raw_super->block_count));
+ return 1;
+ }
+
+ if (secs_per_zone > total_sections) {
+ f2fs_msg(sb, KERN_INFO,
+ "Wrong secs_per_zone (%u > %u)",
+ secs_per_zone, total_sections);
+ return 1;
+ }
+ if (le32_to_cpu(raw_super->extension_count) > F2FS_MAX_EXTENSION ||
+ raw_super->hot_ext_count > F2FS_MAX_EXTENSION ||
+ (le32_to_cpu(raw_super->extension_count) +
+ raw_super->hot_ext_count) > F2FS_MAX_EXTENSION) {
+ f2fs_msg(sb, KERN_INFO,
+ "Corrupted extension count (%u + %u > %u)",
+ le32_to_cpu(raw_super->extension_count),
+ raw_super->hot_ext_count,
+ F2FS_MAX_EXTENSION);
+ return 1;
+ }
+
+ if (le32_to_cpu(raw_super->cp_payload) >
+ (blocks_per_seg - F2FS_CP_PACKS)) {
+ f2fs_msg(sb, KERN_INFO,
+ "Insane cp_payload (%u > %u)",
+ le32_to_cpu(raw_super->cp_payload),
+ blocks_per_seg - F2FS_CP_PACKS);
+ return 1;
+ }
+
/* check reserved ino info */
if (le32_to_cpu(raw_super->node_ino) != 1 ||
le32_to_cpu(raw_super->meta_ino) != 2 ||
@@ -1862,13 +2268,6 @@
return 1;
}
- if (le32_to_cpu(raw_super->segment_count) > F2FS_MAX_SEGMENT) {
- f2fs_msg(sb, KERN_INFO,
- "Invalid segment count (%u)",
- le32_to_cpu(raw_super->segment_count));
- return 1;
- }
-
/* check CP/SIT/NAT/SSA/MAIN_AREA area boundary */
if (sanity_check_area_boundary(sbi, bh))
return 1;
@@ -1876,7 +2275,7 @@
return 0;
}
-int sanity_check_ckpt(struct f2fs_sb_info *sbi)
+int f2fs_sanity_check_ckpt(struct f2fs_sb_info *sbi)
{
unsigned int total, fsmeta;
struct f2fs_super_block *raw_super = F2FS_RAW_SUPER(sbi);
@@ -1974,14 +2373,21 @@
for (i = 0; i < NR_COUNT_TYPE; i++)
atomic_set(&sbi->nr_pages[i], 0);
- atomic_set(&sbi->wb_sync_req, 0);
+ for (i = 0; i < META; i++)
+ atomic_set(&sbi->wb_sync_req[i], 0);
INIT_LIST_HEAD(&sbi->s_list);
mutex_init(&sbi->umount_mutex);
for (i = 0; i < NR_PAGE_TYPE - 1; i++)
for (j = HOT; j < NR_TEMP_TYPE; j++)
mutex_init(&sbi->wio_mutex[i][j]);
+ init_rwsem(&sbi->io_order_lock);
spin_lock_init(&sbi->cp_lock);
+
+ sbi->dirty_device = 0;
+ spin_lock_init(&sbi->dev_lock);
+
+ init_rwsem(&sbi->sb_lock);
}
static int init_percpu_info(struct f2fs_sb_info *sbi)
@@ -2007,7 +2413,7 @@
unsigned int n = 0;
int err = -EIO;
- if (!f2fs_sb_mounted_blkzoned(sbi->sb))
+ if (!f2fs_sb_has_blkzoned(sbi->sb))
return 0;
if (sbi->blocks_per_blkz && sbi->blocks_per_blkz !=
@@ -2023,14 +2429,17 @@
if (nr_sectors & (bdev_zone_sectors(bdev) - 1))
FDEV(devi).nr_blkz++;
- FDEV(devi).blkz_type = kmalloc(FDEV(devi).nr_blkz, GFP_KERNEL);
+ FDEV(devi).blkz_type = f2fs_kmalloc(sbi, FDEV(devi).nr_blkz,
+ GFP_KERNEL);
if (!FDEV(devi).blkz_type)
return -ENOMEM;
#define F2FS_REPORT_NR_ZONES 4096
- zones = kcalloc(F2FS_REPORT_NR_ZONES, sizeof(struct blk_zone),
- GFP_KERNEL);
+ zones = f2fs_kzalloc(sbi,
+ array_size(F2FS_REPORT_NR_ZONES,
+ sizeof(struct blk_zone)),
+ GFP_KERNEL);
if (!zones)
return -ENOMEM;
@@ -2134,7 +2543,7 @@
}
/* write back-up superblock first */
- bh = sb_getblk(sbi->sb, sbi->valid_super_block ? 0: 1);
+ bh = sb_bread(sbi->sb, sbi->valid_super_block ? 0 : 1);
if (!bh)
return -EIO;
err = __f2fs_commit_super(bh, F2FS_RAW_SUPER(sbi));
@@ -2145,7 +2554,7 @@
return err;
/* write current valid superblock */
- bh = sb_getblk(sbi->sb, sbi->valid_super_block);
+ bh = sb_bread(sbi->sb, sbi->valid_super_block);
if (!bh)
return -EIO;
err = __f2fs_commit_super(bh, F2FS_RAW_SUPER(sbi));
@@ -2170,8 +2579,10 @@
* Initialize multiple devices information, or single
* zoned block device information.
*/
- sbi->devs = kcalloc(max_devices, sizeof(struct f2fs_dev_info),
- GFP_KERNEL);
+ sbi->devs = f2fs_kzalloc(sbi,
+ array_size(max_devices,
+ sizeof(struct f2fs_dev_info)),
+ GFP_KERNEL);
if (!sbi->devs)
return -ENOMEM;
@@ -2213,7 +2624,7 @@
#ifdef CONFIG_BLK_DEV_ZONED
if (bdev_zoned_model(FDEV(i).bdev) == BLK_ZONED_HM &&
- !f2fs_sb_mounted_blkzoned(sbi->sb)) {
+ !f2fs_sb_has_blkzoned(sbi->sb)) {
f2fs_msg(sbi->sb, KERN_ERR,
"Zoned block device feature not enabled\n");
return -EINVAL;
@@ -2247,6 +2658,18 @@
return 0;
}
+static void f2fs_tuning_parameters(struct f2fs_sb_info *sbi)
+{
+ struct f2fs_sm_info *sm_i = SM_I(sbi);
+
+ /* adjust parameters according to the volume size */
+ if (sm_i->main_segments <= SMALL_VOLUME_SEGMENTS) {
+ F2FS_OPTION(sbi).alloc_mode = ALLOC_MODE_REUSE;
+ sm_i->dcc_info->discard_granularity = 1;
+ sm_i->ipu_policy = 1 << F2FS_IPU_FORCE;
+ }
+}
+
static int f2fs_fill_super(struct super_block *sb, void *data, int silent)
{
struct f2fs_sb_info *sbi;
@@ -2294,6 +2717,9 @@
sb->s_fs_info = sbi;
sbi->raw_super = raw_super;
+ F2FS_OPTION(sbi).s_resuid = make_kuid(&init_user_ns, F2FS_DEF_RESUID);
+ F2FS_OPTION(sbi).s_resgid = make_kgid(&init_user_ns, F2FS_DEF_RESGID);
+
/* precompute checksum seed for metadata */
if (f2fs_sb_has_inode_chksum(sb))
sbi->s_chksum_seed = f2fs_chksum(sbi, ~0, raw_super->uuid,
@@ -2305,7 +2731,7 @@
* devices, but mandatory for host-managed zoned block devices.
*/
#ifndef CONFIG_BLK_DEV_ZONED
- if (f2fs_sb_mounted_blkzoned(sb)) {
+ if (f2fs_sb_has_blkzoned(sb)) {
f2fs_msg(sb, KERN_ERR,
"Zoned block device support is not enabled\n");
err = -EOPNOTSUPP;
@@ -2332,12 +2758,24 @@
#ifdef CONFIG_QUOTA
sb->dq_op = &f2fs_quota_operations;
- sb->s_qcop = &f2fs_quotactl_ops;
+ if (f2fs_sb_has_quota_ino(sb))
+ sb->s_qcop = &dquot_quotactl_sysfile_ops;
+ else
+ sb->s_qcop = &f2fs_quotactl_ops;
sb->s_quota_types = QTYPE_MASK_USR | QTYPE_MASK_GRP | QTYPE_MASK_PRJ;
+
+ if (f2fs_sb_has_quota_ino(sbi->sb)) {
+ for (i = 0; i < MAXQUOTAS; i++) {
+ if (f2fs_qf_ino(sbi->sb, i))
+ sbi->nquota_files++;
+ }
+ }
#endif
sb->s_op = &f2fs_sops;
+#ifdef CONFIG_F2FS_FS_ENCRYPTION
sb->s_cop = &f2fs_cryptops;
+#endif
sb->s_xattr = f2fs_xattr_handlers;
sb->s_export_op = &f2fs_export_ops;
sb->s_magic = F2FS_SUPER_MAGIC;
@@ -2345,6 +2783,7 @@
sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
(test_opt(sbi, POSIX_ACL) ? MS_POSIXACL : 0);
memcpy(&sb->s_uuid, raw_super->uuid, sizeof(raw_super->uuid));
+ sb->s_iflags |= SB_I_CGROUPWB;
/* init f2fs-specific super block info */
sbi->valid_super_block = valid_super_block;
@@ -2365,8 +2804,11 @@
int n = (i == META) ? 1: NR_TEMP_TYPE;
int j;
- sbi->write_io[i] = kmalloc(n * sizeof(struct f2fs_bio_info),
- GFP_KERNEL);
+ sbi->write_io[i] =
+ f2fs_kmalloc(sbi,
+ array_size(n,
+ sizeof(struct f2fs_bio_info)),
+ GFP_KERNEL);
if (!sbi->write_io[i]) {
err = -ENOMEM;
goto free_options;
@@ -2387,14 +2829,14 @@
err = init_percpu_info(sbi);
if (err)
- goto free_options;
+ goto free_bio_info;
if (F2FS_IO_SIZE(sbi) > 1) {
sbi->write_io_dummy =
mempool_create_page_pool(2 * (F2FS_IO_SIZE(sbi) - 1), 0);
if (!sbi->write_io_dummy) {
err = -ENOMEM;
- goto free_options;
+ goto free_percpu;
}
}
@@ -2406,7 +2848,7 @@
goto free_io_dummy;
}
- err = get_valid_checkpoint(sbi);
+ err = f2fs_get_valid_checkpoint(sbi);
if (err) {
f2fs_msg(sb, KERN_ERR, "Failed to get valid F2FS checkpoint");
goto free_meta_inode;
@@ -2428,24 +2870,26 @@
le64_to_cpu(sbi->ckpt->valid_block_count);
sbi->last_valid_block_count = sbi->total_valid_block_count;
sbi->reserved_blocks = 0;
+ sbi->current_reserved_blocks = 0;
+ limit_reserve_root(sbi);
for (i = 0; i < NR_INODE_TYPE; i++) {
INIT_LIST_HEAD(&sbi->inode_list[i]);
spin_lock_init(&sbi->inode_lock[i]);
}
- init_extent_cache_info(sbi);
+ f2fs_init_extent_cache_info(sbi);
- init_ino_entry_info(sbi);
+ f2fs_init_ino_entry_info(sbi);
/* setup f2fs internal modules */
- err = build_segment_manager(sbi);
+ err = f2fs_build_segment_manager(sbi);
if (err) {
f2fs_msg(sb, KERN_ERR,
"Failed to initialize F2FS segment manager");
goto free_sm;
}
- err = build_node_manager(sbi);
+ err = f2fs_build_node_manager(sbi);
if (err) {
f2fs_msg(sb, KERN_ERR,
"Failed to initialize F2FS node manager");
@@ -2463,7 +2907,7 @@
sbi->kbytes_written =
le64_to_cpu(seg_i->journal->info.kbytes_written);
- build_gc_manager(sbi);
+ f2fs_build_gc_manager(sbi);
/* get an inode for node space */
sbi->node_inode = f2fs_iget(sb, F2FS_NODE_INO(sbi));
@@ -2473,18 +2917,16 @@
goto free_nm;
}
- f2fs_join_shrinker(sbi);
-
err = f2fs_build_stats(sbi);
if (err)
- goto free_nm;
+ goto free_node_inode;
/* read root inode and dentry */
root = f2fs_iget(sb, F2FS_ROOT_INO(sbi));
if (IS_ERR(root)) {
f2fs_msg(sb, KERN_ERR, "Failed to read root inode");
err = PTR_ERR(root);
- goto free_node_inode;
+ goto free_stats;
}
if (!S_ISDIR(root->i_mode) || !root->i_blocks || !root->i_size) {
iput(root);
@@ -2502,10 +2944,24 @@
if (err)
goto free_root_inode;
+#ifdef CONFIG_QUOTA
+ /*
+ * Turn on quotas which were not enabled for read-only mounts if
+ * filesystem has quota feature, so that they are updated correctly.
+ */
+ if (f2fs_sb_has_quota_ino(sb) && !f2fs_readonly(sb)) {
+ err = f2fs_enable_quotas(sb);
+ if (err) {
+ f2fs_msg(sb, KERN_ERR,
+ "Cannot turn on quotas: error %d", err);
+ goto free_sysfs;
+ }
+ }
+#endif
/* if there are nt orphan nodes free them */
- err = recover_orphan_inodes(sbi);
+ err = f2fs_recover_orphan_inodes(sbi);
if (err)
- goto free_sysfs;
+ goto free_meta;
/* recover fsynced data */
if (!test_opt(sbi, DISABLE_ROLL_FORWARD)) {
@@ -2525,7 +2981,7 @@
if (!retry)
goto skip_recovery;
- err = recover_fsync_data(sbi, false);
+ err = f2fs_recover_fsync_data(sbi, false);
if (err < 0) {
need_fsck = true;
f2fs_msg(sb, KERN_ERR,
@@ -2533,17 +2989,17 @@
goto free_meta;
}
} else {
- err = recover_fsync_data(sbi, true);
+ err = f2fs_recover_fsync_data(sbi, true);
if (!f2fs_readonly(sb) && err > 0) {
err = -EINVAL;
f2fs_msg(sb, KERN_ERR,
"Need to recover fsync data");
- goto free_sysfs;
+ goto free_meta;
}
}
skip_recovery:
- /* recover_fsync_data() cleared this already */
+ /* f2fs_recover_fsync_data() cleared this already */
clear_sbi_flag(sbi, SBI_POR_DOING);
/*
@@ -2552,7 +3008,7 @@
*/
if (test_opt(sbi, BG_GC) && !f2fs_readonly(sb)) {
/* After POR, we can run background GC thread.*/
- err = start_gc_thread(sbi);
+ err = f2fs_start_gc_thread(sbi);
if (err)
goto free_meta;
}
@@ -2566,6 +3022,10 @@
sbi->valid_super_block ? 1 : 2, err);
}
+ f2fs_join_shrinker(sbi);
+
+ f2fs_tuning_parameters(sbi);
+
f2fs_msg(sbi->sb, KERN_NOTICE, "Mounted with checkpoint version = %llx",
cur_cp_version(F2FS_CKPT(sbi)));
f2fs_update_time(sbi, CP_TIME);
@@ -2573,31 +3033,35 @@
return 0;
free_meta:
+#ifdef CONFIG_QUOTA
+ if (f2fs_sb_has_quota_ino(sb) && !f2fs_readonly(sb))
+ f2fs_quota_off_umount(sbi->sb);
+#endif
f2fs_sync_inode_meta(sbi);
/*
- * Some dirty meta pages can be produced by recover_orphan_inodes()
+ * Some dirty meta pages can be produced by f2fs_recover_orphan_inodes()
* failed by EIO. Then, iput(node_inode) can trigger balance_fs_bg()
- * followed by write_checkpoint() through f2fs_write_node_pages(), which
- * falls into an infinite loop in sync_meta_pages().
+ * followed by f2fs_write_checkpoint() through f2fs_write_node_pages(), which
+ * falls into an infinite loop in f2fs_sync_meta_pages().
*/
truncate_inode_pages_final(META_MAPPING(sbi));
+#ifdef CONFIG_QUOTA
free_sysfs:
+#endif
f2fs_unregister_sysfs(sbi);
free_root_inode:
dput(sb->s_root);
sb->s_root = NULL;
-free_node_inode:
- truncate_inode_pages_final(NODE_MAPPING(sbi));
- mutex_lock(&sbi->umount_mutex);
- release_ino_entry(sbi, true);
- f2fs_leave_shrinker(sbi);
- iput(sbi->node_inode);
- mutex_unlock(&sbi->umount_mutex);
+free_stats:
f2fs_destroy_stats(sbi);
+free_node_inode:
+ f2fs_release_ino_entry(sbi, true);
+ truncate_inode_pages_final(NODE_MAPPING(sbi));
+ iput(sbi->node_inode);
free_nm:
- destroy_node_manager(sbi);
+ f2fs_destroy_node_manager(sbi);
free_sm:
- destroy_segment_manager(sbi);
+ f2fs_destroy_segment_manager(sbi);
free_devices:
destroy_device_list(sbi);
kfree(sbi->ckpt);
@@ -2606,13 +3070,15 @@
iput(sbi->meta_inode);
free_io_dummy:
mempool_destroy(sbi->write_io_dummy);
-free_options:
+free_percpu:
+ destroy_percpu_info(sbi);
+free_bio_info:
for (i = 0; i < NR_PAGE_TYPE; i++)
kfree(sbi->write_io[i]);
- destroy_percpu_info(sbi);
+free_options:
#ifdef CONFIG_QUOTA
for (i = 0; i < MAXQUOTAS; i++)
- kfree(sbi->s_qf_names[i]);
+ kfree(F2FS_OPTION(sbi).s_qf_names[i]);
#endif
kfree(options);
free_sb_buf:
@@ -2641,8 +3107,8 @@
{
if (sb->s_root) {
set_sbi_flag(F2FS_SB(sb), SBI_IS_CLOSE);
- stop_gc_thread(F2FS_SB(sb));
- stop_discard_thread(F2FS_SB(sb));
+ f2fs_stop_gc_thread(F2FS_SB(sb));
+ f2fs_stop_discard_thread(F2FS_SB(sb));
}
kill_block_super(sb);
}
@@ -2691,16 +3157,16 @@
err = init_inodecache();
if (err)
goto fail;
- err = create_node_manager_caches();
+ err = f2fs_create_node_manager_caches();
if (err)
goto free_inodecache;
- err = create_segment_manager_caches();
+ err = f2fs_create_segment_manager_caches();
if (err)
goto free_node_manager_caches;
- err = create_checkpoint_caches();
+ err = f2fs_create_checkpoint_caches();
if (err)
goto free_segment_manager_caches;
- err = create_extent_cache();
+ err = f2fs_create_extent_cache();
if (err)
goto free_checkpoint_caches;
err = f2fs_init_sysfs();
@@ -2715,8 +3181,13 @@
err = f2fs_create_root_stats();
if (err)
goto free_filesystem;
+ err = f2fs_init_post_read_processing();
+ if (err)
+ goto free_root_stats;
return 0;
+free_root_stats:
+ f2fs_destroy_root_stats();
free_filesystem:
unregister_filesystem(&f2fs_fs_type);
free_shrinker:
@@ -2724,13 +3195,13 @@
free_sysfs:
f2fs_exit_sysfs();
free_extent_cache:
- destroy_extent_cache();
+ f2fs_destroy_extent_cache();
free_checkpoint_caches:
- destroy_checkpoint_caches();
+ f2fs_destroy_checkpoint_caches();
free_segment_manager_caches:
- destroy_segment_manager_caches();
+ f2fs_destroy_segment_manager_caches();
free_node_manager_caches:
- destroy_node_manager_caches();
+ f2fs_destroy_node_manager_caches();
free_inodecache:
destroy_inodecache();
fail:
@@ -2739,14 +3210,15 @@
static void __exit exit_f2fs_fs(void)
{
+ f2fs_destroy_post_read_processing();
f2fs_destroy_root_stats();
unregister_filesystem(&f2fs_fs_type);
unregister_shrinker(&f2fs_shrinker_info);
f2fs_exit_sysfs();
- destroy_extent_cache();
- destroy_checkpoint_caches();
- destroy_segment_manager_caches();
- destroy_node_manager_caches();
+ f2fs_destroy_extent_cache();
+ f2fs_destroy_checkpoint_caches();
+ f2fs_destroy_segment_manager_caches();
+ f2fs_destroy_node_manager_caches();
destroy_inodecache();
f2fs_destroy_trace_ios();
}
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index 93af9d7..780416f 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -31,7 +31,7 @@
FAULT_INFO_RATE, /* struct f2fs_fault_info */
FAULT_INFO_TYPE, /* struct f2fs_fault_info */
#endif
- RESERVED_BLOCKS,
+ RESERVED_BLOCKS, /* struct f2fs_sb_info */
};
struct f2fs_attr {
@@ -59,11 +59,18 @@
#ifdef CONFIG_F2FS_FAULT_INJECTION
else if (struct_type == FAULT_INFO_RATE ||
struct_type == FAULT_INFO_TYPE)
- return (unsigned char *)&sbi->fault_info;
+ return (unsigned char *)&F2FS_OPTION(sbi).fault_info;
#endif
return NULL;
}
+static ssize_t dirty_segments_show(struct f2fs_attr *a,
+ struct f2fs_sb_info *sbi, char *buf)
+{
+ return snprintf(buf, PAGE_SIZE, "%llu\n",
+ (unsigned long long)(dirty_segments(sbi)));
+}
+
static ssize_t lifetime_write_kbytes_show(struct f2fs_attr *a,
struct f2fs_sb_info *sbi, char *buf)
{
@@ -86,10 +93,10 @@
if (!sb->s_bdev->bd_part)
return snprintf(buf, PAGE_SIZE, "0\n");
- if (f2fs_sb_has_crypto(sb))
+ if (f2fs_sb_has_encrypt(sb))
len += snprintf(buf, PAGE_SIZE - len, "%s",
"encryption");
- if (f2fs_sb_mounted_blkzoned(sb))
+ if (f2fs_sb_has_blkzoned(sb))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "blkzoned");
if (f2fs_sb_has_extra_attr(sb))
@@ -101,10 +108,28 @@
if (f2fs_sb_has_inode_chksum(sb))
len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
len ? ", " : "", "inode_checksum");
+ if (f2fs_sb_has_flexible_inline_xattr(sb))
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+ len ? ", " : "", "flexible_inline_xattr");
+ if (f2fs_sb_has_quota_ino(sb))
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+ len ? ", " : "", "quota_ino");
+ if (f2fs_sb_has_inode_crtime(sb))
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+ len ? ", " : "", "inode_crtime");
+ if (f2fs_sb_has_lost_found(sb))
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s%s",
+ len ? ", " : "", "lost_found");
len += snprintf(buf + len, PAGE_SIZE - len, "\n");
return len;
}
+static ssize_t current_reserved_blocks_show(struct f2fs_attr *a,
+ struct f2fs_sb_info *sbi, char *buf)
+{
+ return snprintf(buf, PAGE_SIZE, "%u\n", sbi->current_reserved_blocks);
+}
+
static ssize_t f2fs_sbi_show(struct f2fs_attr *a,
struct f2fs_sb_info *sbi, char *buf)
{
@@ -115,12 +140,33 @@
if (!ptr)
return -EINVAL;
+ if (!strcmp(a->attr.name, "extension_list")) {
+ __u8 (*extlist)[F2FS_EXTENSION_LEN] =
+ sbi->raw_super->extension_list;
+ int cold_count = le32_to_cpu(sbi->raw_super->extension_count);
+ int hot_count = sbi->raw_super->hot_ext_count;
+ int len = 0, i;
+
+ len += snprintf(buf + len, PAGE_SIZE - len,
+ "cold file extension:\n");
+ for (i = 0; i < cold_count; i++)
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s\n",
+ extlist[i]);
+
+ len += snprintf(buf + len, PAGE_SIZE - len,
+ "hot file extension:\n");
+ for (i = cold_count; i < cold_count + hot_count; i++)
+ len += snprintf(buf + len, PAGE_SIZE - len, "%s\n",
+ extlist[i]);
+ return len;
+ }
+
ui = (unsigned int *)(ptr + a->offset);
return snprintf(buf, PAGE_SIZE, "%u\n", *ui);
}
-static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
+static ssize_t __sbi_store(struct f2fs_attr *a,
struct f2fs_sb_info *sbi,
const char *buf, size_t count)
{
@@ -133,6 +179,41 @@
if (!ptr)
return -EINVAL;
+ if (!strcmp(a->attr.name, "extension_list")) {
+ const char *name = strim((char *)buf);
+ bool set = true, hot;
+
+ if (!strncmp(name, "[h]", 3))
+ hot = true;
+ else if (!strncmp(name, "[c]", 3))
+ hot = false;
+ else
+ return -EINVAL;
+
+ name += 3;
+
+ if (*name == '!') {
+ name++;
+ set = false;
+ }
+
+ if (strlen(name) >= F2FS_EXTENSION_LEN)
+ return -EINVAL;
+
+ down_write(&sbi->sb_lock);
+
+ ret = f2fs_update_extension_list(sbi, name, hot, set);
+ if (ret)
+ goto out;
+
+ ret = f2fs_commit_super(sbi, false);
+ if (ret)
+ f2fs_update_extension_list(sbi, name, hot, !set);
+out:
+ up_write(&sbi->sb_lock);
+ return ret ? ret : count;
+ }
+
ui = (unsigned int *)(ptr + a->offset);
ret = kstrtoul(skip_spaces(buf), 0, &t);
@@ -144,51 +225,77 @@
#endif
if (a->struct_type == RESERVED_BLOCKS) {
spin_lock(&sbi->stat_lock);
- if ((unsigned long)sbi->total_valid_block_count + t >
- (unsigned long)sbi->user_block_count) {
+ if (t > (unsigned long)(sbi->user_block_count -
+ F2FS_OPTION(sbi).root_reserved_blocks)) {
spin_unlock(&sbi->stat_lock);
return -EINVAL;
}
*ui = t;
+ sbi->current_reserved_blocks = min(sbi->reserved_blocks,
+ sbi->user_block_count - valid_user_blocks(sbi));
spin_unlock(&sbi->stat_lock);
return count;
}
if (!strcmp(a->attr.name, "discard_granularity")) {
- struct discard_cmd_control *dcc = SM_I(sbi)->dcc_info;
- int i;
-
if (t == 0 || t > MAX_PLIST_NUM)
return -EINVAL;
if (t == *ui)
return count;
-
- mutex_lock(&dcc->cmd_lock);
- for (i = 0; i < MAX_PLIST_NUM; i++) {
- if (i >= t - 1)
- dcc->pend_list_tag[i] |= P_ACTIVE;
- else
- dcc->pend_list_tag[i] &= (~P_ACTIVE);
- }
- mutex_unlock(&dcc->cmd_lock);
-
*ui = t;
return count;
}
+ if (!strcmp(a->attr.name, "trim_sections"))
+ return -EINVAL;
+
+ if (!strcmp(a->attr.name, "gc_urgent")) {
+ if (t >= 1) {
+ sbi->gc_mode = GC_URGENT;
+ if (sbi->gc_thread) {
+ wake_up_interruptible_all(
+ &sbi->gc_thread->gc_wait_queue_head);
+ wake_up_discard_thread(sbi, true);
+ }
+ } else {
+ sbi->gc_mode = GC_NORMAL;
+ }
+ return count;
+ }
+ if (!strcmp(a->attr.name, "gc_idle")) {
+ if (t == GC_IDLE_CB)
+ sbi->gc_mode = GC_IDLE_CB;
+ else if (t == GC_IDLE_GREEDY)
+ sbi->gc_mode = GC_IDLE_GREEDY;
+ else
+ sbi->gc_mode = GC_NORMAL;
+ return count;
+ }
+
*ui = t;
if (!strcmp(a->attr.name, "iostat_enable") && *ui == 0)
f2fs_reset_iostat(sbi);
- if (!strcmp(a->attr.name, "gc_urgent") && t == 1 && sbi->gc_thread) {
- sbi->gc_thread->gc_wake = 1;
- wake_up_interruptible_all(&sbi->gc_thread->gc_wait_queue_head);
- wake_up_discard_thread(sbi, true);
- }
-
return count;
}
+static ssize_t f2fs_sbi_store(struct f2fs_attr *a,
+ struct f2fs_sb_info *sbi,
+ const char *buf, size_t count)
+{
+ ssize_t ret;
+ bool gc_entry = (!strcmp(a->attr.name, "gc_urgent") ||
+ a->struct_type == GC_THREAD);
+
+ if (gc_entry)
+ down_read(&sbi->sb->s_umount);
+ ret = __sbi_store(a, sbi, buf, count);
+ if (gc_entry)
+ up_read(&sbi->sb->s_umount);
+
+ return ret;
+}
+
static ssize_t f2fs_attr_show(struct kobject *kobj,
struct attribute *attr, char *buf)
{
@@ -223,6 +330,10 @@
FEAT_EXTRA_ATTR,
FEAT_PROJECT_QUOTA,
FEAT_INODE_CHECKSUM,
+ FEAT_FLEXIBLE_INLINE_XATTR,
+ FEAT_QUOTA_INO,
+ FEAT_INODE_CRTIME,
+ FEAT_LOST_FOUND,
};
static ssize_t f2fs_feature_show(struct f2fs_attr *a,
@@ -235,6 +346,10 @@
case FEAT_EXTRA_ATTR:
case FEAT_PROJECT_QUOTA:
case FEAT_INODE_CHECKSUM:
+ case FEAT_FLEXIBLE_INLINE_XATTR:
+ case FEAT_QUOTA_INO:
+ case FEAT_INODE_CRTIME:
+ case FEAT_LOST_FOUND:
return snprintf(buf, PAGE_SIZE, "supported\n");
}
return 0;
@@ -269,8 +384,8 @@
F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_min_sleep_time, min_sleep_time);
F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_max_sleep_time, max_sleep_time);
F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_no_gc_sleep_time, no_gc_sleep_time);
-F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_idle, gc_idle);
-F2FS_RW_ATTR(GC_THREAD, f2fs_gc_kthread, gc_urgent, gc_urgent);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_idle, gc_mode);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_urgent, gc_mode);
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, reclaim_segments, rec_prefree_segments);
F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, max_small_discards, max_discards);
F2FS_RW_ATTR(DCC_INFO, discard_cmd_control, discard_granularity, discard_granularity);
@@ -280,6 +395,7 @@
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ipu_util, min_ipu_util);
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_fsync_blocks, min_fsync_blocks);
F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_hot_blocks, min_hot_blocks);
+F2FS_RW_ATTR(SM_INFO, f2fs_sm_info, min_ssr_sections, min_ssr_sections);
F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ram_thresh, ram_thresh);
F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, ra_nid_pages, ra_nid_pages);
F2FS_RW_ATTR(NM_INFO, f2fs_nm_info, dirty_nats_ratio, dirty_nats_ratio);
@@ -288,12 +404,17 @@
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, cp_interval, interval_time[CP_TIME]);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, idle_interval, interval_time[REQ_TIME]);
F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, iostat_enable, iostat_enable);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, readdir_ra, readdir_ra);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_sb_info, gc_pin_file_thresh, gc_pin_file_threshold);
+F2FS_RW_ATTR(F2FS_SBI, f2fs_super_block, extension_list, extension_list);
#ifdef CONFIG_F2FS_FAULT_INJECTION
F2FS_RW_ATTR(FAULT_INFO_RATE, f2fs_fault_info, inject_rate, inject_rate);
F2FS_RW_ATTR(FAULT_INFO_TYPE, f2fs_fault_info, inject_type, inject_type);
#endif
+F2FS_GENERAL_RO_ATTR(dirty_segments);
F2FS_GENERAL_RO_ATTR(lifetime_write_kbytes);
F2FS_GENERAL_RO_ATTR(features);
+F2FS_GENERAL_RO_ATTR(current_reserved_blocks);
#ifdef CONFIG_F2FS_FS_ENCRYPTION
F2FS_FEATURE_RO_ATTR(encryption, FEAT_CRYPTO);
@@ -305,6 +426,10 @@
F2FS_FEATURE_RO_ATTR(extra_attr, FEAT_EXTRA_ATTR);
F2FS_FEATURE_RO_ATTR(project_quota, FEAT_PROJECT_QUOTA);
F2FS_FEATURE_RO_ATTR(inode_checksum, FEAT_INODE_CHECKSUM);
+F2FS_FEATURE_RO_ATTR(flexible_inline_xattr, FEAT_FLEXIBLE_INLINE_XATTR);
+F2FS_FEATURE_RO_ATTR(quota_ino, FEAT_QUOTA_INO);
+F2FS_FEATURE_RO_ATTR(inode_crtime, FEAT_INODE_CRTIME);
+F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
#define ATTR_LIST(name) (&f2fs_attr_##name.attr)
static struct attribute *f2fs_attrs[] = {
@@ -322,6 +447,7 @@
ATTR_LIST(min_ipu_util),
ATTR_LIST(min_fsync_blocks),
ATTR_LIST(min_hot_blocks),
+ ATTR_LIST(min_ssr_sections),
ATTR_LIST(max_victim_search),
ATTR_LIST(dir_level),
ATTR_LIST(ram_thresh),
@@ -330,13 +456,18 @@
ATTR_LIST(cp_interval),
ATTR_LIST(idle_interval),
ATTR_LIST(iostat_enable),
+ ATTR_LIST(readdir_ra),
+ ATTR_LIST(gc_pin_file_thresh),
+ ATTR_LIST(extension_list),
#ifdef CONFIG_F2FS_FAULT_INJECTION
ATTR_LIST(inject_rate),
ATTR_LIST(inject_type),
#endif
+ ATTR_LIST(dirty_segments),
ATTR_LIST(lifetime_write_kbytes),
ATTR_LIST(features),
ATTR_LIST(reserved_blocks),
+ ATTR_LIST(current_reserved_blocks),
NULL,
};
@@ -351,6 +482,10 @@
ATTR_LIST(extra_attr),
ATTR_LIST(project_quota),
ATTR_LIST(inode_checksum),
+ ATTR_LIST(flexible_inline_xattr),
+ ATTR_LIST(quota_ino),
+ ATTR_LIST(inode_crtime),
+ ATTR_LIST(lost_found),
NULL,
};
diff --git a/fs/f2fs/trace.c b/fs/f2fs/trace.c
index bccbbf2..a1fcd00 100644
--- a/fs/f2fs/trace.c
+++ b/fs/f2fs/trace.c
@@ -17,7 +17,7 @@
#include "trace.h"
static RADIX_TREE(pids, GFP_ATOMIC);
-static spinlock_t pids_lock;
+static struct mutex pids_lock;
static struct last_io_info last_io;
static inline void __print_last_io(void)
@@ -64,7 +64,7 @@
if (radix_tree_preload(GFP_NOFS))
return;
- spin_lock(&pids_lock);
+ mutex_lock(&pids_lock);
p = radix_tree_lookup(&pids, pid);
if (p == current)
goto out;
@@ -77,7 +77,7 @@
MAJOR(inode->i_sb->s_dev), MINOR(inode->i_sb->s_dev),
pid, current->comm);
out:
- spin_unlock(&pids_lock);
+ mutex_unlock(&pids_lock);
radix_tree_preload_end();
}
@@ -122,7 +122,7 @@
void f2fs_build_trace_ios(void)
{
- spin_lock_init(&pids_lock);
+ mutex_init(&pids_lock);
}
#define PIDVEC_SIZE 128
@@ -150,7 +150,7 @@
pid_t next_pid = 0;
unsigned int found;
- spin_lock(&pids_lock);
+ mutex_lock(&pids_lock);
while ((found = gang_lookup_pids(pid, next_pid, PIDVEC_SIZE))) {
unsigned idx;
@@ -158,5 +158,5 @@
for (idx = 0; idx < found; idx++)
radix_tree_delete(&pids, pid[idx]);
}
- spin_unlock(&pids_lock);
+ mutex_unlock(&pids_lock);
}
diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index 7c65540..7082718 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -217,12 +217,12 @@
return entry;
}
-static struct f2fs_xattr_entry *__find_inline_xattr(void *base_addr,
- void **last_addr, int index,
- size_t len, const char *name)
+static struct f2fs_xattr_entry *__find_inline_xattr(struct inode *inode,
+ void *base_addr, void **last_addr, int index,
+ size_t len, const char *name)
{
struct f2fs_xattr_entry *entry;
- unsigned int inline_size = F2FS_INLINE_XATTR_ADDRS << 2;
+ unsigned int inline_size = inline_xattr_size(inode);
list_for_each_xattr(entry, base_addr) {
if ((void *)entry + sizeof(__u32) > base_addr + inline_size ||
@@ -241,12 +241,54 @@
return entry;
}
+static int read_inline_xattr(struct inode *inode, struct page *ipage,
+ void *txattr_addr)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ unsigned int inline_size = inline_xattr_size(inode);
+ struct page *page = NULL;
+ void *inline_addr;
+
+ if (ipage) {
+ inline_addr = inline_xattr_addr(inode, ipage);
+ } else {
+ page = f2fs_get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(page))
+ return PTR_ERR(page);
+
+ inline_addr = inline_xattr_addr(inode, page);
+ }
+ memcpy(txattr_addr, inline_addr, inline_size);
+ f2fs_put_page(page, 1);
+
+ return 0;
+}
+
+static int read_xattr_block(struct inode *inode, void *txattr_addr)
+{
+ struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
+ nid_t xnid = F2FS_I(inode)->i_xattr_nid;
+ unsigned int inline_size = inline_xattr_size(inode);
+ struct page *xpage;
+ void *xattr_addr;
+
+ /* The inode already has an extended attribute block. */
+ xpage = f2fs_get_node_page(sbi, xnid);
+ if (IS_ERR(xpage))
+ return PTR_ERR(xpage);
+
+ xattr_addr = page_address(xpage);
+ memcpy(txattr_addr + inline_size, xattr_addr, VALID_XATTR_BLOCK_SIZE);
+ f2fs_put_page(xpage, 1);
+
+ return 0;
+}
+
static int lookup_all_xattrs(struct inode *inode, struct page *ipage,
unsigned int index, unsigned int len,
const char *name, struct f2fs_xattr_entry **xe,
void **base_addr)
{
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
void *cur_addr, *txattr_addr, *last_addr = NULL;
nid_t xnid = F2FS_I(inode)->i_xattr_nid;
unsigned int size = xnid ? VALID_XATTR_BLOCK_SIZE : 0;
@@ -256,30 +298,18 @@
if (!size && !inline_size)
return -ENODATA;
- txattr_addr = kzalloc(inline_size + size + XATTR_PADDING_SIZE,
- GFP_F2FS_ZERO);
+ txattr_addr = f2fs_kzalloc(F2FS_I_SB(inode),
+ inline_size + size + XATTR_PADDING_SIZE, GFP_NOFS);
if (!txattr_addr)
return -ENOMEM;
/* read from inline xattr */
if (inline_size) {
- struct page *page = NULL;
- void *inline_addr;
+ err = read_inline_xattr(inode, ipage, txattr_addr);
+ if (err)
+ goto out;
- if (ipage) {
- inline_addr = inline_xattr_addr(ipage);
- } else {
- page = get_node_page(sbi, inode->i_ino);
- if (IS_ERR(page)) {
- err = PTR_ERR(page);
- goto out;
- }
- inline_addr = inline_xattr_addr(page);
- }
- memcpy(txattr_addr, inline_addr, inline_size);
- f2fs_put_page(page, 1);
-
- *xe = __find_inline_xattr(txattr_addr, &last_addr,
+ *xe = __find_inline_xattr(inode, txattr_addr, &last_addr,
index, len, name);
if (*xe)
goto check;
@@ -287,19 +317,9 @@
/* read from xattr node block */
if (xnid) {
- struct page *xpage;
- void *xattr_addr;
-
- /* The inode already has an extended attribute block. */
- xpage = get_node_page(sbi, xnid);
- if (IS_ERR(xpage)) {
- err = PTR_ERR(xpage);
+ err = read_xattr_block(inode, txattr_addr);
+ if (err)
goto out;
- }
-
- xattr_addr = page_address(xpage);
- memcpy(txattr_addr + inline_size, xattr_addr, size);
- f2fs_put_page(xpage, 1);
}
if (last_addr)
@@ -324,7 +344,6 @@
static int read_all_xattrs(struct inode *inode, struct page *ipage,
void **base_addr)
{
- struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
struct f2fs_xattr_header *header;
nid_t xnid = F2FS_I(inode)->i_xattr_nid;
unsigned int size = VALID_XATTR_BLOCK_SIZE;
@@ -332,45 +351,23 @@
void *txattr_addr;
int err;
- txattr_addr = kzalloc(inline_size + size + XATTR_PADDING_SIZE,
- GFP_F2FS_ZERO);
+ txattr_addr = f2fs_kzalloc(F2FS_I_SB(inode),
+ inline_size + size + XATTR_PADDING_SIZE, GFP_NOFS);
if (!txattr_addr)
return -ENOMEM;
/* read from inline xattr */
if (inline_size) {
- struct page *page = NULL;
- void *inline_addr;
-
- if (ipage) {
- inline_addr = inline_xattr_addr(ipage);
- } else {
- page = get_node_page(sbi, inode->i_ino);
- if (IS_ERR(page)) {
- err = PTR_ERR(page);
- goto fail;
- }
- inline_addr = inline_xattr_addr(page);
- }
- memcpy(txattr_addr, inline_addr, inline_size);
- f2fs_put_page(page, 1);
+ err = read_inline_xattr(inode, ipage, txattr_addr);
+ if (err)
+ goto fail;
}
/* read from xattr node block */
if (xnid) {
- struct page *xpage;
- void *xattr_addr;
-
- /* The inode already has an extended attribute block. */
- xpage = get_node_page(sbi, xnid);
- if (IS_ERR(xpage)) {
- err = PTR_ERR(xpage);
+ err = read_xattr_block(inode, txattr_addr);
+ if (err)
goto fail;
- }
-
- xattr_addr = page_address(xpage);
- memcpy(txattr_addr + inline_size, xattr_addr, size);
- f2fs_put_page(xpage, 1);
}
header = XATTR_HDR(txattr_addr);
@@ -392,70 +389,81 @@
{
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
size_t inline_size = inline_xattr_size(inode);
+ struct page *in_page = NULL;
void *xattr_addr;
+ void *inline_addr = NULL;
struct page *xpage;
nid_t new_nid = 0;
- int err;
+ int err = 0;
if (hsize > inline_size && !F2FS_I(inode)->i_xattr_nid)
- if (!alloc_nid(sbi, &new_nid))
+ if (!f2fs_alloc_nid(sbi, &new_nid))
return -ENOSPC;
/* write to inline xattr */
if (inline_size) {
- struct page *page = NULL;
- void *inline_addr;
-
if (ipage) {
- inline_addr = inline_xattr_addr(ipage);
- f2fs_wait_on_page_writeback(ipage, NODE, true);
- set_page_dirty(ipage);
+ inline_addr = inline_xattr_addr(inode, ipage);
} else {
- page = get_node_page(sbi, inode->i_ino);
- if (IS_ERR(page)) {
- alloc_nid_failed(sbi, new_nid);
- return PTR_ERR(page);
+ in_page = f2fs_get_node_page(sbi, inode->i_ino);
+ if (IS_ERR(in_page)) {
+ f2fs_alloc_nid_failed(sbi, new_nid);
+ return PTR_ERR(in_page);
}
- inline_addr = inline_xattr_addr(page);
- f2fs_wait_on_page_writeback(page, NODE, true);
+ inline_addr = inline_xattr_addr(inode, in_page);
}
- memcpy(inline_addr, txattr_addr, inline_size);
- f2fs_put_page(page, 1);
+ f2fs_wait_on_page_writeback(ipage ? ipage : in_page,
+ NODE, true);
/* no need to use xattr node block */
if (hsize <= inline_size) {
- err = truncate_xattr_node(inode, ipage);
- alloc_nid_failed(sbi, new_nid);
- return err;
+ err = f2fs_truncate_xattr_node(inode);
+ f2fs_alloc_nid_failed(sbi, new_nid);
+ if (err) {
+ f2fs_put_page(in_page, 1);
+ return err;
+ }
+ memcpy(inline_addr, txattr_addr, inline_size);
+ set_page_dirty(ipage ? ipage : in_page);
+ goto in_page_out;
}
}
/* write to xattr node block */
if (F2FS_I(inode)->i_xattr_nid) {
- xpage = get_node_page(sbi, F2FS_I(inode)->i_xattr_nid);
+ xpage = f2fs_get_node_page(sbi, F2FS_I(inode)->i_xattr_nid);
if (IS_ERR(xpage)) {
- alloc_nid_failed(sbi, new_nid);
- return PTR_ERR(xpage);
+ err = PTR_ERR(xpage);
+ f2fs_alloc_nid_failed(sbi, new_nid);
+ goto in_page_out;
}
f2fs_bug_on(sbi, new_nid);
f2fs_wait_on_page_writeback(xpage, NODE, true);
} else {
struct dnode_of_data dn;
set_new_dnode(&dn, inode, NULL, NULL, new_nid);
- xpage = new_node_page(&dn, XATTR_NODE_OFFSET);
+ xpage = f2fs_new_node_page(&dn, XATTR_NODE_OFFSET);
if (IS_ERR(xpage)) {
- alloc_nid_failed(sbi, new_nid);
- return PTR_ERR(xpage);
+ err = PTR_ERR(xpage);
+ f2fs_alloc_nid_failed(sbi, new_nid);
+ goto in_page_out;
}
- alloc_nid_done(sbi, new_nid);
+ f2fs_alloc_nid_done(sbi, new_nid);
}
-
xattr_addr = page_address(xpage);
- memcpy(xattr_addr, txattr_addr + inline_size, VALID_XATTR_BLOCK_SIZE);
- set_page_dirty(xpage);
- f2fs_put_page(xpage, 1);
- return 0;
+ if (inline_size)
+ memcpy(inline_addr, txattr_addr, inline_size);
+ memcpy(xattr_addr, txattr_addr + inline_size, VALID_XATTR_BLOCK_SIZE);
+
+ if (inline_size)
+ set_page_dirty(ipage ? ipage : in_page);
+ set_page_dirty(xpage);
+
+ f2fs_put_page(xpage, 1);
+in_page_out:
+ f2fs_put_page(in_page, 1);
+ return err;
}
int f2fs_getxattr(struct inode *inode, int index, const char *name,
@@ -592,7 +600,7 @@
goto exit;
}
- if (f2fs_xattr_value_same(here, value, size))
+ if (value && f2fs_xattr_value_same(here, value, size))
goto exit;
} else if ((flags & XATTR_REPLACE)) {
error = -ENODATA;
@@ -681,7 +689,11 @@
struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
int err;
- /* this case is only from init_inode_metadata */
+ err = dquot_initialize(inode);
+ if (err)
+ return err;
+
+ /* this case is only from f2fs_init_inode_metadata */
if (ipage)
return __f2fs_setxattr(inode, index, name, value,
size, ipage, flags);
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 3244932f..8e1b9b1 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2112,7 +2112,7 @@
(dirtytime && (inode->i_state & I_DIRTY_INODE)))
return;
- if (unlikely(block_dump))
+ if (unlikely(block_dump > 1))
block_dump___mark_inode_dirty(inode);
spin_lock(&inode->i_lock);
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index be02507..987c95b 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -45,6 +45,7 @@
if (old_pwd.dentry)
path_put(&old_pwd);
}
+EXPORT_SYMBOL(set_fs_pwd);
static inline int replace_path(struct path *p, const struct path *old, const struct path *new)
{
@@ -90,6 +91,7 @@
path_put(&fs->pwd);
kmem_cache_free(fs_cachep, fs);
}
+EXPORT_SYMBOL(free_fs_struct);
void exit_fs(struct task_struct *tsk)
{
@@ -128,6 +130,7 @@
}
return fs;
}
+EXPORT_SYMBOL_GPL(copy_fs_struct);
int unshare_fs_struct(void)
{
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index ee8105a..d563ec1 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -14,6 +14,7 @@
#include <linux/sched/signal.h>
#include <linux/uio.h>
#include <linux/miscdevice.h>
+#include <linux/namei.h>
#include <linux/pagemap.h>
#include <linux/file.h>
#include <linux/slab.h>
@@ -21,6 +22,7 @@
#include <linux/swap.h>
#include <linux/splice.h>
#include <linux/sched.h>
+#include <linux/freezer.h>
MODULE_ALIAS_MISCDEV(FUSE_MINOR);
MODULE_ALIAS("devname:fuse");
@@ -464,7 +466,9 @@
* Either request is already in userspace, or it was forced.
* Wait it out.
*/
- wait_event(req->waitq, test_bit(FR_FINISHED, &req->flags));
+ while (!test_bit(FR_FINISHED, &req->flags))
+ wait_event_freezable(req->waitq,
+ test_bit(FR_FINISHED, &req->flags));
}
static void __fuse_request_send(struct fuse_conn *fc, struct fuse_req *req)
@@ -1897,6 +1901,12 @@
cs->move_pages = 0;
err = copy_out_args(cs, &req->out, nbytes);
+ if (req->in.h.opcode == FUSE_CANONICAL_PATH) {
+ char *path = (char *)req->out.args[0].value;
+
+ path[req->out.args[0].size - 1] = 0;
+ req->out.h.error = kern_path(path, 0, req->canonical_path);
+ }
fuse_copy_finish(cs);
spin_lock(&fpq->lock);
diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 29868c3..72ea877 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -262,6 +262,50 @@
goto out;
}
+/*
+ * Get the canonical path. Since we must translate to a path, this must be done
+ * in the context of the userspace daemon, however, the userspace daemon cannot
+ * look up paths on its own. Instead, we handle the lookup as a special case
+ * inside of the write request.
+ */
+static void fuse_dentry_canonical_path(const struct path *path, struct path *canonical_path) {
+ struct inode *inode = path->dentry->d_inode;
+ struct fuse_conn *fc = get_fuse_conn(inode);
+ struct fuse_req *req;
+ int err;
+ char *path_name;
+
+ req = fuse_get_req(fc, 1);
+ err = PTR_ERR(req);
+ if (IS_ERR(req))
+ goto default_path;
+
+ path_name = (char*)__get_free_page(GFP_KERNEL);
+ if (!path_name) {
+ fuse_put_request(fc, req);
+ goto default_path;
+ }
+
+ req->in.h.opcode = FUSE_CANONICAL_PATH;
+ req->in.h.nodeid = get_node_id(inode);
+ req->in.numargs = 0;
+ req->out.numargs = 1;
+ req->out.args[0].size = PATH_MAX;
+ req->out.args[0].value = path_name;
+ req->canonical_path = canonical_path;
+ req->out.argvar = 1;
+ fuse_request_send(fc, req);
+ err = req->out.h.error;
+ fuse_put_request(fc, req);
+ free_page((unsigned long)path_name);
+ if (!err)
+ return;
+default_path:
+ canonical_path->dentry = path->dentry;
+ canonical_path->mnt = path->mnt;
+ path_get(canonical_path);
+}
+
static int invalid_nodeid(u64 nodeid)
{
return !nodeid || nodeid == FUSE_ROOT_ID;
@@ -284,11 +328,13 @@
.d_revalidate = fuse_dentry_revalidate,
.d_init = fuse_dentry_init,
.d_release = fuse_dentry_release,
+ .d_canonical_path = fuse_dentry_canonical_path,
};
const struct dentry_operations fuse_root_dentry_operations = {
.d_init = fuse_dentry_init,
.d_release = fuse_dentry_release,
+ .d_canonical_path = fuse_dentry_canonical_path,
};
int fuse_valid_type(int m)
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index fb4738e..d10f340 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -837,9 +837,9 @@
unsigned nr_pages;
};
-static int fuse_readpages_fill(void *_data, struct page *page)
+static int fuse_readpages_fill(struct file *_data, struct page *page)
{
- struct fuse_fill_data *data = _data;
+ struct fuse_fill_data *data = (struct fuse_fill_data *)_data;
struct fuse_req *req = data->req;
struct inode *inode = data->inode;
struct fuse_conn *fc = get_fuse_conn(inode);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index e10564015..078ce45 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -370,6 +370,9 @@
/** Inode used in the request or NULL */
struct inode *inode;
+ /** Path used for completing d_canonical_path */
+ struct path *canonical_path;
+
/** AIO control block */
struct fuse_io_priv *io;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ffb6178..9713898 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -31,7 +31,7 @@
struct list_head fuse_conn_list;
DEFINE_MUTEX(fuse_mutex);
-static int set_global_limit(const char *val, struct kernel_param *kp);
+static int set_global_limit(const char *val, const struct kernel_param *kp);
unsigned max_user_bgreq;
module_param_call(max_user_bgreq, set_global_limit, param_get_uint,
@@ -826,7 +826,7 @@
*limit = (1 << 16) - 1;
}
-static int set_global_limit(const char *val, struct kernel_param *kp)
+static int set_global_limit(const char *val, const struct kernel_param *kp)
{
int rv;
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 68ed069..ec3147e 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -280,22 +280,6 @@
for(i = 0; i < nr_pages; i++) {
struct page *page = pvec->pages[i];
- /*
- * At this point, the page may be truncated or
- * invalidated (changing page->mapping to NULL), or
- * even swizzled back from swapper_space to tmpfs file
- * mapping. However, page->index will not change
- * because we have a reference on the page.
- */
- if (page->index > end) {
- /*
- * can't be range_cyclic (1st pass) because
- * end == -1 in that case.
- */
- ret = 1;
- break;
- }
-
*done_index = page->index;
lock_page(page);
@@ -413,8 +397,8 @@
tag_pages_for_writeback(mapping, index, end);
done_index = index;
while (!done && (index <= end)) {
- nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
+ nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+ tag);
if (nr_pages == 0)
break;
@@ -523,7 +507,7 @@
*
*/
-static int __gfs2_readpage(void *file, struct page *page)
+static int __gfs2_readpage(struct file *file, struct page *page)
{
struct gfs2_inode *ip = GFS2_I(page->mapping->host);
struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host);
diff --git a/fs/inode.c b/fs/inode.c
index cfc36d1..602e28f 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -1794,7 +1794,7 @@
return mask;
}
-static int __remove_privs(struct dentry *dentry, int kill)
+static int __remove_privs(struct vfsmount *mnt, struct dentry *dentry, int kill)
{
struct iattr newattrs;
@@ -1803,7 +1803,7 @@
* Note we call this on write, so notify_change will not
* encounter any conflicting delegations:
*/
- return notify_change(dentry, &newattrs, NULL);
+ return notify_change2(mnt, dentry, &newattrs, NULL);
}
/*
@@ -1825,7 +1825,7 @@
if (kill < 0)
return kill;
if (kill)
- error = __remove_privs(dentry, kill);
+ error = __remove_privs(file->f_path.mnt, dentry, kill);
if (!error)
inode_has_no_xattr(inode);
diff --git a/fs/internal.h b/fs/internal.h
index 48cee21..b433845 100644
--- a/fs/internal.h
+++ b/fs/internal.h
@@ -90,9 +90,11 @@
* super.c
*/
extern int do_remount_sb(struct super_block *, int, void *, int);
+extern int do_remount_sb2(struct vfsmount *, struct super_block *, int,
+ void *, int);
extern bool trylock_super(struct super_block *sb);
extern struct dentry *mount_fs(struct file_system_type *,
- int, const char *, void *);
+ int, const char *, struct vfsmount *, void *);
extern struct super_block *user_get_super(dev_t);
/*
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index 809cbcc..da2825a 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -614,7 +614,7 @@
*/
#define param_set_min_max(name, type, which_strtol, min, max) \
-static int param_set_##name(const char *val, struct kernel_param *kp) \
+static int param_set_##name(const char *val, const struct kernel_param *kp) \
{ \
char *endp; \
__typeof__(type) num = which_strtol(val, &endp, 0); \
diff --git a/fs/mpage.c b/fs/mpage.c
index b7e7f57..93e3cf7 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -32,6 +32,14 @@
#include <linux/cleancache.h>
#include "internal.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/android_fs.h>
+
+EXPORT_TRACEPOINT_SYMBOL(android_fs_datawrite_start);
+EXPORT_TRACEPOINT_SYMBOL(android_fs_datawrite_end);
+EXPORT_TRACEPOINT_SYMBOL(android_fs_dataread_start);
+EXPORT_TRACEPOINT_SYMBOL(android_fs_dataread_end);
+
/*
* I/O completion handler for multipage BIOs.
*
@@ -49,6 +57,16 @@
struct bio_vec *bv;
int i;
+ if (trace_android_fs_dataread_end_enabled() &&
+ (bio_data_dir(bio) == READ)) {
+ struct page *first_page = bio->bi_io_vec[0].bv_page;
+
+ if (first_page != NULL)
+ trace_android_fs_dataread_end(first_page->mapping->host,
+ page_offset(first_page),
+ bio->bi_iter.bi_size);
+ }
+
bio_for_each_segment_all(bv, bio, i) {
struct page *page = bv->bv_page;
page_endio(page, op_is_write(bio_op(bio)),
@@ -60,6 +78,24 @@
static struct bio *mpage_bio_submit(int op, int op_flags, struct bio *bio)
{
+ if (trace_android_fs_dataread_start_enabled() && (op == REQ_OP_READ)) {
+ struct page *first_page = bio->bi_io_vec[0].bv_page;
+
+ if (first_page != NULL) {
+ char *path, pathbuf[MAX_TRACE_PATHBUF_LEN];
+
+ path = android_fstrace_get_pathname(pathbuf,
+ MAX_TRACE_PATHBUF_LEN,
+ first_page->mapping->host);
+ trace_android_fs_dataread_start(
+ first_page->mapping->host,
+ page_offset(first_page),
+ bio->bi_iter.bi_size,
+ current->pid,
+ path,
+ current->comm);
+ }
+ }
bio->bi_end_io = mpage_end_io;
bio_set_op_attrs(bio, op, op_flags);
guard_bio_eod(op, bio);
diff --git a/fs/namei.c b/fs/namei.c
index 0b46b85..d0a390a 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -377,9 +377,11 @@
* flag in inode->i_opflags, that says "this has not special
* permission function, use the fast case".
*/
-static inline int do_inode_permission(struct inode *inode, int mask)
+static inline int do_inode_permission(struct vfsmount *mnt, struct inode *inode, int mask)
{
if (unlikely(!(inode->i_opflags & IOP_FASTPERM))) {
+ if (likely(mnt && inode->i_op->permission2))
+ return inode->i_op->permission2(mnt, inode, mask);
if (likely(inode->i_op->permission))
return inode->i_op->permission(inode, mask);
@@ -403,7 +405,7 @@
* This does not check for a read-only file system. You probably want
* inode_permission().
*/
-int __inode_permission(struct inode *inode, int mask)
+int __inode_permission2(struct vfsmount *mnt, struct inode *inode, int mask)
{
int retval;
@@ -423,7 +425,7 @@
return -EACCES;
}
- retval = do_inode_permission(inode, mask);
+ retval = do_inode_permission(mnt, inode, mask);
if (retval)
return retval;
@@ -431,7 +433,14 @@
if (retval)
return retval;
- return security_inode_permission(inode, mask);
+ retval = security_inode_permission(inode, mask);
+ return retval;
+}
+EXPORT_SYMBOL(__inode_permission2);
+
+int __inode_permission(struct inode *inode, int mask)
+{
+ return __inode_permission2(NULL, inode, mask);
}
EXPORT_SYMBOL(__inode_permission);
@@ -466,14 +475,20 @@
*
* When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
*/
-int inode_permission(struct inode *inode, int mask)
+int inode_permission2(struct vfsmount *mnt, struct inode *inode, int mask)
{
int retval;
retval = sb_permission(inode->i_sb, inode, mask);
if (retval)
return retval;
- return __inode_permission(inode, mask);
+ return __inode_permission2(mnt, inode, mask);
+}
+EXPORT_SYMBOL(inode_permission2);
+
+int inode_permission(struct inode *inode, int mask)
+{
+ return inode_permission2(NULL, inode, mask);
}
EXPORT_SYMBOL(inode_permission);
@@ -1666,13 +1681,13 @@
static inline int may_lookup(struct nameidata *nd)
{
if (nd->flags & LOOKUP_RCU) {
- int err = inode_permission(nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
+ int err = inode_permission2(nd->path.mnt, nd->inode, MAY_EXEC|MAY_NOT_BLOCK);
if (err != -ECHILD)
return err;
if (unlazy_walk(nd))
return -ECHILD;
}
- return inode_permission(nd->inode, MAY_EXEC);
+ return inode_permission2(nd->path.mnt, nd->inode, MAY_EXEC);
}
static inline int handle_dots(struct nameidata *nd, int type)
@@ -2448,6 +2463,7 @@
/**
* lookup_one_len - filesystem helper to lookup single pathname component
* @name: pathname component to lookup
+ * @mnt: mount we are looking up on
* @base: base directory to lookup from
* @len: maximum length @len should be interpreted to
*
@@ -2456,7 +2472,7 @@
*
* The caller must hold base->i_mutex.
*/
-struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
+struct dentry *lookup_one_len2(const char *name, struct vfsmount *mnt, struct dentry *base, int len)
{
struct qstr this;
unsigned int c;
@@ -2490,12 +2506,18 @@
return ERR_PTR(err);
}
- err = inode_permission(base->d_inode, MAY_EXEC);
+ err = inode_permission2(mnt, base->d_inode, MAY_EXEC);
if (err)
return ERR_PTR(err);
return __lookup_hash(&this, base, 0);
}
+EXPORT_SYMBOL(lookup_one_len2);
+
+struct dentry *lookup_one_len(const char *name, struct dentry *base, int len)
+{
+ return lookup_one_len2(name, NULL, base, len);
+}
EXPORT_SYMBOL(lookup_one_len);
/**
@@ -2773,7 +2795,7 @@
* 11. We don't allow removal of NFS sillyrenamed files; it's handled by
* nfs_async_unlink().
*/
-static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
+static int may_delete(struct vfsmount *mnt, struct inode *dir, struct dentry *victim, bool isdir)
{
struct inode *inode = d_backing_inode(victim);
int error;
@@ -2785,7 +2807,7 @@
BUG_ON(victim->d_parent->d_inode != dir);
audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
- error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ error = inode_permission2(mnt, dir, MAY_WRITE | MAY_EXEC);
if (error)
return error;
if (IS_APPEND(dir))
@@ -2817,7 +2839,7 @@
* 4. We should have write and exec permissions on dir
* 5. We can't do it if dir is immutable (done in permission())
*/
-static inline int may_create(struct inode *dir, struct dentry *child)
+static inline int may_create(struct vfsmount *mnt, struct inode *dir, struct dentry *child)
{
struct user_namespace *s_user_ns;
audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
@@ -2829,7 +2851,7 @@
if (!kuid_has_mapping(s_user_ns, current_fsuid()) ||
!kgid_has_mapping(s_user_ns, current_fsgid()))
return -EOVERFLOW;
- return inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ return inode_permission2(mnt, dir, MAY_WRITE | MAY_EXEC);
}
/*
@@ -2876,10 +2898,10 @@
}
EXPORT_SYMBOL(unlock_rename);
-int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
- bool want_excl)
+int vfs_create2(struct vfsmount *mnt, struct inode *dir, struct dentry *dentry,
+ umode_t mode, bool want_excl)
{
- int error = may_create(dir, dentry);
+ int error = may_create(mnt, dir, dentry);
if (error)
return error;
@@ -2895,6 +2917,13 @@
fsnotify_create(dir, dentry);
return error;
}
+EXPORT_SYMBOL(vfs_create2);
+
+int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
+ bool want_excl)
+{
+ return vfs_create2(NULL, dir, dentry, mode, want_excl);
+}
EXPORT_SYMBOL(vfs_create);
bool may_open_dev(const struct path *path)
@@ -2906,6 +2935,7 @@
static int may_open(const struct path *path, int acc_mode, int flag)
{
struct dentry *dentry = path->dentry;
+ struct vfsmount *mnt = path->mnt;
struct inode *inode = dentry->d_inode;
int error;
@@ -2930,7 +2960,7 @@
break;
}
- error = inode_permission(inode, MAY_OPEN | acc_mode);
+ error = inode_permission2(mnt, inode, MAY_OPEN | acc_mode);
if (error)
return error;
@@ -2965,7 +2995,7 @@
if (!error)
error = security_path_truncate(path);
if (!error) {
- error = do_truncate(path->dentry, 0,
+ error = do_truncate2(path->mnt, path->dentry, 0,
ATTR_MTIME|ATTR_CTIME|ATTR_OPEN,
filp);
}
@@ -2992,7 +3022,7 @@
!kgid_has_mapping(s_user_ns, current_fsgid()))
return -EOVERFLOW;
- error = inode_permission(dir->dentry->d_inode, MAY_WRITE | MAY_EXEC);
+ error = inode_permission2(dir->mnt, dir->dentry->d_inode, MAY_WRITE | MAY_EXEC);
if (error)
return error;
@@ -3405,7 +3435,8 @@
int error;
/* we want directory to be writable */
- error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
+ error = inode_permission2(ERR_PTR(-EOPNOTSUPP), dir,
+ MAY_WRITE | MAY_EXEC);
if (error)
goto out_err;
error = -EOPNOTSUPP;
@@ -3683,9 +3714,9 @@
}
EXPORT_SYMBOL(user_path_create);
-int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+int vfs_mknod2(struct vfsmount *mnt, struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
{
- int error = may_create(dir, dentry);
+ int error = may_create(mnt, dir, dentry);
if (error)
return error;
@@ -3709,6 +3740,12 @@
fsnotify_create(dir, dentry);
return error;
}
+EXPORT_SYMBOL(vfs_mknod2);
+
+int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
+{
+ return vfs_mknod2(NULL, dir, dentry, mode, dev);
+}
EXPORT_SYMBOL(vfs_mknod);
static int may_mknod(umode_t mode)
@@ -3751,12 +3788,12 @@
goto out;
switch (mode & S_IFMT) {
case 0: case S_IFREG:
- error = vfs_create(path.dentry->d_inode,dentry,mode,true);
+ error = vfs_create2(path.mnt, path.dentry->d_inode,dentry,mode,true);
if (!error)
ima_post_path_mknod(dentry);
break;
case S_IFCHR: case S_IFBLK:
- error = vfs_mknod(path.dentry->d_inode,dentry,mode,
+ error = vfs_mknod2(path.mnt, path.dentry->d_inode,dentry,mode,
new_decode_dev(dev));
break;
case S_IFIFO: case S_IFSOCK:
@@ -3777,9 +3814,9 @@
return sys_mknodat(AT_FDCWD, filename, mode, dev);
}
-int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+int vfs_mkdir2(struct vfsmount *mnt, struct inode *dir, struct dentry *dentry, umode_t mode)
{
- int error = may_create(dir, dentry);
+ int error = may_create(mnt, dir, dentry);
unsigned max_links = dir->i_sb->s_max_links;
if (error)
@@ -3801,6 +3838,12 @@
fsnotify_mkdir(dir, dentry);
return error;
}
+EXPORT_SYMBOL(vfs_mkdir2);
+
+int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+ return vfs_mkdir2(NULL, dir, dentry, mode);
+}
EXPORT_SYMBOL(vfs_mkdir);
SYSCALL_DEFINE3(mkdirat, int, dfd, const char __user *, pathname, umode_t, mode)
@@ -3819,7 +3862,7 @@
mode &= ~current_umask();
error = security_path_mkdir(&path, dentry, mode);
if (!error)
- error = vfs_mkdir(path.dentry->d_inode, dentry, mode);
+ error = vfs_mkdir2(path.mnt, path.dentry->d_inode, dentry, mode);
done_path_create(&path, dentry);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -3833,9 +3876,9 @@
return sys_mkdirat(AT_FDCWD, pathname, mode);
}
-int vfs_rmdir(struct inode *dir, struct dentry *dentry)
+int vfs_rmdir2(struct vfsmount *mnt, struct inode *dir, struct dentry *dentry)
{
- int error = may_delete(dir, dentry, 1);
+ int error = may_delete(mnt, dir, dentry, 1);
if (error)
return error;
@@ -3870,6 +3913,12 @@
d_delete(dentry);
return error;
}
+EXPORT_SYMBOL(vfs_rmdir2);
+
+int vfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ return vfs_rmdir2(NULL, dir, dentry);
+}
EXPORT_SYMBOL(vfs_rmdir);
static long do_rmdir(int dfd, const char __user *pathname)
@@ -3915,7 +3964,7 @@
error = security_path_rmdir(&path, dentry);
if (error)
goto exit3;
- error = vfs_rmdir(path.dentry->d_inode, dentry);
+ error = vfs_rmdir2(path.mnt, path.dentry->d_inode, dentry);
exit3:
dput(dentry);
exit2:
@@ -3954,10 +4003,10 @@
* be appropriate for callers that expect the underlying filesystem not
* to be NFS exported.
*/
-int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
+int vfs_unlink2(struct vfsmount *mnt, struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
{
struct inode *target = dentry->d_inode;
- int error = may_delete(dir, dentry, 0);
+ int error = may_delete(mnt, dir, dentry, 0);
if (error)
return error;
@@ -3992,6 +4041,12 @@
return error;
}
+EXPORT_SYMBOL(vfs_unlink2);
+
+int vfs_unlink(struct inode *dir, struct dentry *dentry, struct inode **delegated_inode)
+{
+ return vfs_unlink2(NULL, dir, dentry, delegated_inode);
+}
EXPORT_SYMBOL(vfs_unlink);
/*
@@ -4039,7 +4094,7 @@
error = security_path_unlink(&path, dentry);
if (error)
goto exit2;
- error = vfs_unlink(path.dentry->d_inode, dentry, &delegated_inode);
+ error = vfs_unlink2(path.mnt, path.dentry->d_inode, dentry, &delegated_inode);
exit2:
dput(dentry);
}
@@ -4089,9 +4144,9 @@
return do_unlinkat(AT_FDCWD, pathname);
}
-int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
+int vfs_symlink2(struct vfsmount *mnt, struct inode *dir, struct dentry *dentry, const char *oldname)
{
- int error = may_create(dir, dentry);
+ int error = may_create(mnt, dir, dentry);
if (error)
return error;
@@ -4108,6 +4163,12 @@
fsnotify_create(dir, dentry);
return error;
}
+EXPORT_SYMBOL(vfs_symlink2);
+
+int vfs_symlink(struct inode *dir, struct dentry *dentry, const char *oldname)
+{
+ return vfs_symlink2(NULL, dir, dentry, oldname);
+}
EXPORT_SYMBOL(vfs_symlink);
SYSCALL_DEFINE3(symlinkat, const char __user *, oldname,
@@ -4130,7 +4191,7 @@
error = security_path_symlink(&path, dentry, from->name);
if (!error)
- error = vfs_symlink(path.dentry->d_inode, dentry, from->name);
+ error = vfs_symlink2(path.mnt, path.dentry->d_inode, dentry, from->name);
done_path_create(&path, dentry);
if (retry_estale(error, lookup_flags)) {
lookup_flags |= LOOKUP_REVAL;
@@ -4165,7 +4226,7 @@
* be appropriate for callers that expect the underlying filesystem not
* to be NFS exported.
*/
-int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
+int vfs_link2(struct vfsmount *mnt, struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
{
struct inode *inode = old_dentry->d_inode;
unsigned max_links = dir->i_sb->s_max_links;
@@ -4174,7 +4235,7 @@
if (!inode)
return -ENOENT;
- error = may_create(dir, new_dentry);
+ error = may_create(mnt, dir, new_dentry);
if (error)
return error;
@@ -4224,6 +4285,12 @@
fsnotify_link(dir, inode, new_dentry);
return error;
}
+EXPORT_SYMBOL(vfs_link2);
+
+int vfs_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry, struct inode **delegated_inode)
+{
+ return vfs_link2(NULL, old_dentry, dir, new_dentry, delegated_inode);
+}
EXPORT_SYMBOL(vfs_link);
/*
@@ -4279,7 +4346,7 @@
error = security_path_link(old_path.dentry, &new_path, new_dentry);
if (error)
goto out_dput;
- error = vfs_link(old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
+ error = vfs_link2(old_path.mnt, old_path.dentry, new_path.dentry->d_inode, new_dentry, &delegated_inode);
out_dput:
done_path_create(&new_path, new_dentry);
if (delegated_inode) {
@@ -4355,7 +4422,8 @@
* ->i_mutex on parents, which works but leads to some truly excessive
* locking].
*/
-int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+int vfs_rename2(struct vfsmount *mnt,
+ struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry,
struct inode **delegated_inode, unsigned int flags)
{
@@ -4370,19 +4438,19 @@
if (source == target)
return 0;
- error = may_delete(old_dir, old_dentry, is_dir);
+ error = may_delete(mnt, old_dir, old_dentry, is_dir);
if (error)
return error;
if (!target) {
- error = may_create(new_dir, new_dentry);
+ error = may_create(mnt, new_dir, new_dentry);
} else {
new_is_dir = d_is_dir(new_dentry);
if (!(flags & RENAME_EXCHANGE))
- error = may_delete(new_dir, new_dentry, is_dir);
+ error = may_delete(mnt, new_dir, new_dentry, is_dir);
else
- error = may_delete(new_dir, new_dentry, new_is_dir);
+ error = may_delete(mnt, new_dir, new_dentry, new_is_dir);
}
if (error)
return error;
@@ -4396,12 +4464,12 @@
*/
if (new_dir != old_dir) {
if (is_dir) {
- error = inode_permission(source, MAY_WRITE);
+ error = inode_permission2(mnt, source, MAY_WRITE);
if (error)
return error;
}
if ((flags & RENAME_EXCHANGE) && new_is_dir) {
- error = inode_permission(target, MAY_WRITE);
+ error = inode_permission2(mnt, target, MAY_WRITE);
if (error)
return error;
}
@@ -4478,6 +4546,14 @@
return error;
}
+EXPORT_SYMBOL(vfs_rename2);
+
+int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ struct inode **delegated_inode, unsigned int flags)
+{
+ return vfs_rename2(NULL, old_dir, old_dentry, new_dir, new_dentry, delegated_inode, flags);
+}
EXPORT_SYMBOL(vfs_rename);
SYSCALL_DEFINE5(renameat2, int, olddfd, const char __user *, oldname,
@@ -4591,7 +4667,7 @@
&new_path, new_dentry, flags);
if (error)
goto exit5;
- error = vfs_rename(old_path.dentry->d_inode, old_dentry,
+ error = vfs_rename2(old_path.mnt, old_path.dentry->d_inode, old_dentry,
new_path.dentry->d_inode, new_dentry,
&delegated_inode, flags);
exit5:
@@ -4636,7 +4712,7 @@
int vfs_whiteout(struct inode *dir, struct dentry *dentry)
{
- int error = may_create(dir, dentry);
+ int error = may_create(NULL, dir, dentry);
if (error)
return error;
diff --git a/fs/namespace.c b/fs/namespace.c
index 9dc146e7..ace65a4 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -226,6 +226,7 @@
mnt->mnt_count = 1;
mnt->mnt_writers = 0;
#endif
+ mnt->mnt.data = NULL;
INIT_HLIST_NODE(&mnt->mnt_hash);
INIT_LIST_HEAD(&mnt->mnt_child);
@@ -637,6 +638,7 @@
static void free_vfsmnt(struct mount *mnt)
{
+ kfree(mnt->mnt.data);
kfree_const(mnt->mnt_devname);
#ifdef CONFIG_SMP
free_percpu(mnt->mnt_pcp);
@@ -1040,10 +1042,18 @@
if (!mnt)
return ERR_PTR(-ENOMEM);
+ if (type->alloc_mnt_data) {
+ mnt->mnt.data = type->alloc_mnt_data();
+ if (!mnt->mnt.data) {
+ mnt_free_id(mnt);
+ free_vfsmnt(mnt);
+ return ERR_PTR(-ENOMEM);
+ }
+ }
if (flags & SB_KERNMOUNT)
mnt->mnt.mnt_flags = MNT_INTERNAL;
- root = mount_fs(type, flags, name, data);
+ root = mount_fs(type, flags, name, &mnt->mnt, data);
if (IS_ERR(root)) {
mnt_free_id(mnt);
free_vfsmnt(mnt);
@@ -1087,6 +1097,14 @@
if (!mnt)
return ERR_PTR(-ENOMEM);
+ if (sb->s_op->clone_mnt_data) {
+ mnt->mnt.data = sb->s_op->clone_mnt_data(old->mnt.data);
+ if (!mnt->mnt.data) {
+ err = -ENOMEM;
+ goto out_free;
+ }
+ }
+
if (flag & (CL_SLAVE | CL_PRIVATE | CL_SHARED_TO_SLAVE))
mnt->mnt_group_id = 0; /* not a peer of original */
else
@@ -2354,8 +2372,14 @@
err = change_mount_flags(path->mnt, ms_flags);
else if (!capable(CAP_SYS_ADMIN))
err = -EPERM;
- else
- err = do_remount_sb(sb, sb_flags, data, 0);
+ else {
+ err = do_remount_sb2(path->mnt, sb, sb_flags, data, 0);
+ namespace_lock();
+ lock_mount_hash();
+ propagate_remount(mnt);
+ unlock_mount_hash();
+ namespace_unlock();
+ }
if (!err) {
lock_mount_hash();
mnt_flags |= mnt->mnt.mnt_flags & ~MNT_USER_SETTABLE_MASK;
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index bf2c436..64dc8ea 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -671,8 +671,9 @@
* We only need to convert from xdr once so future lookups are much simpler
*/
static
-int nfs_readdir_filler(nfs_readdir_descriptor_t *desc, struct page* page)
+int nfs_readdir_filler(struct file *file, struct page* page)
{
+ nfs_readdir_descriptor_t *desc = (nfs_readdir_descriptor_t *)file;
struct inode *inode = file_inode(desc->file);
int ret;
@@ -705,7 +706,7 @@
struct page *get_cache_page(nfs_readdir_descriptor_t *desc)
{
return read_cache_page(desc->file->f_mapping,
- desc->page_index, (filler_t *)nfs_readdir_filler, desc);
+ desc->page_index, nfs_readdir_filler, desc);
}
/*
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 48d7277..42dbf4f 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -354,7 +354,7 @@
};
static int
-readpage_async_filler(void *data, struct page *page)
+readpage_async_filler(struct file *data, struct page *page)
{
struct nfs_readdesc *desc = (struct nfs_readdesc *)data;
struct nfs_page *new;
diff --git a/fs/nfs/symlink.c b/fs/nfs/symlink.c
index 06eb44b..220d5ba 100644
--- a/fs/nfs/symlink.c
+++ b/fs/nfs/symlink.c
@@ -26,8 +26,9 @@
* and straight-forward than readdir caching.
*/
-static int nfs_symlink_filler(struct inode *inode, struct page *page)
+static int nfs_symlink_filler(struct file *file, struct page *page)
{
+ struct inode *inode = (struct inode *)file;
int error;
error = NFS_PROTO(inode)->readlink(inode, page, 0, PAGE_SIZE);
@@ -66,7 +67,7 @@
if (err)
return err;
page = read_cache_page(&inode->i_data, 0,
- (filler_t *)nfs_symlink_filler, inode);
+ nfs_symlink_filler, inode);
if (IS_ERR(page))
return ERR_CAST(page);
}
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index 06ffa13..35989c7 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -2158,8 +2158,8 @@
pagevec_init(&pvec, 0);
- while (pagevec_lookup_tag(&pvec, btcache, &index, PAGECACHE_TAG_DIRTY,
- PAGEVEC_SIZE)) {
+ while (pagevec_lookup_tag(&pvec, btcache, &index,
+ PAGECACHE_TAG_DIRTY)) {
for (i = 0; i < pagevec_count(&pvec); i++) {
bh = head = page_buffers(pvec.pages[i]);
do {
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 8616c46..1c16726 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -257,8 +257,7 @@
pagevec_init(&pvec, 0);
repeat:
- if (!pagevec_lookup_tag(&pvec, smap, &index, PAGECACHE_TAG_DIRTY,
- PAGEVEC_SIZE))
+ if (!pagevec_lookup_tag(&pvec, smap, &index, PAGECACHE_TAG_DIRTY))
return 0;
for (i = 0; i < pagevec_count(&pvec); i++) {
@@ -376,8 +375,8 @@
pagevec_init(&pvec, 0);
- while (pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY,
- PAGEVEC_SIZE)) {
+ while (pagevec_lookup_tag(&pvec, mapping, &index,
+ PAGECACHE_TAG_DIRTY)) {
for (i = 0; i < pagevec_count(&pvec); i++) {
struct page *page = pvec.pages[i];
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 50e1295..5f229f5 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -711,18 +711,14 @@
pagevec_init(&pvec, 0);
repeat:
if (unlikely(index > last) ||
- !pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY,
- min_t(pgoff_t, last - index,
- PAGEVEC_SIZE - 1) + 1))
+ !pagevec_lookup_range_tag(&pvec, mapping, &index, last,
+ PAGECACHE_TAG_DIRTY))
return ndirties;
for (i = 0; i < pagevec_count(&pvec); i++) {
struct buffer_head *bh, *head;
struct page *page = pvec.pages[i];
- if (unlikely(page->index > last))
- break;
-
lock_page(page);
if (!page_has_buffers(page))
create_empty_buffers(page, i_blocksize(inode), 0);
@@ -759,8 +755,8 @@
pagevec_init(&pvec, 0);
- while (pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY,
- PAGEVEC_SIZE)) {
+ while (pagevec_lookup_tag(&pvec, mapping, &index,
+ PAGECACHE_TAG_DIRTY)) {
for (i = 0; i < pagevec_count(&pvec); i++) {
bh = head = page_buffers(pvec.pages[i]);
do {
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9752e72..685e39e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -495,7 +495,7 @@
}
/* you can only watch an inode if you have read permissions on it */
- ret = inode_permission(path->dentry->d_inode, MAY_READ);
+ ret = inode_permission2(path->mnt, path->dentry->d_inode, MAY_READ);
if (ret)
path_put(path);
out:
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 7cc7d3f..5c3caea 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -335,7 +335,7 @@
if (error)
return error;
/* you can only watch an inode if you have read permissions on it */
- error = inode_permission(path->dentry->d_inode, MAY_READ);
+ error = inode_permission2(path->mnt, path->dentry->d_inode, MAY_READ);
if (error)
path_put(path);
return error;
@@ -671,6 +671,8 @@
struct fsnotify_group *group;
struct inode *inode;
struct path path;
+ struct path alteredpath;
+ struct path *canonical_path = &path;
struct fd f;
int ret;
unsigned flags = 0;
@@ -710,13 +712,22 @@
if (ret)
goto fput_and_out;
+ /* support stacked filesystems */
+ if(path.dentry && path.dentry->d_op) {
+ if (path.dentry->d_op->d_canonical_path) {
+ path.dentry->d_op->d_canonical_path(&path, &alteredpath);
+ canonical_path = &alteredpath;
+ path_put(&path);
+ }
+ }
+
/* inode held in place by reference to path; group by fget on fd */
- inode = path.dentry->d_inode;
+ inode = canonical_path->dentry->d_inode;
group = f.file->private_data;
/* create/update an inode mark */
ret = inotify_update_watch(group, inode, mask);
- path_put(&path);
+ path_put(canonical_path);
fput_and_out:
fdput(f);
return ret;
diff --git a/fs/ocfs2/dlmfs/dlmfs.c b/fs/ocfs2/dlmfs/dlmfs.c
index 9ab9e18..988137d 100644
--- a/fs/ocfs2/dlmfs/dlmfs.c
+++ b/fs/ocfs2/dlmfs/dlmfs.c
@@ -88,13 +88,13 @@
*/
#define DLMFS_CAPABILITIES "bast stackglue"
static int param_set_dlmfs_capabilities(const char *val,
- struct kernel_param *kp)
+ const struct kernel_param *kp)
{
printk(KERN_ERR "%s: readonly parameter\n", kp->name);
return -EINVAL;
}
static int param_get_dlmfs_capabilities(char *buffer,
- struct kernel_param *kp)
+ const struct kernel_param *kp)
{
return strlcpy(buffer, DLMFS_CAPABILITIES,
strlen(DLMFS_CAPABILITIES) + 1);
diff --git a/fs/open.c b/fs/open.c
index 7ea1184..39c24f1 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -34,8 +34,8 @@
#include "internal.h"
-int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
- struct file *filp)
+int do_truncate2(struct vfsmount *mnt, struct dentry *dentry, loff_t length,
+ unsigned int time_attrs, struct file *filp)
{
int ret;
struct iattr newattrs;
@@ -60,18 +60,25 @@
inode_lock(dentry->d_inode);
/* Note any delegations or leases have already been broken: */
- ret = notify_change(dentry, &newattrs, NULL);
+ ret = notify_change2(mnt, dentry, &newattrs, NULL);
inode_unlock(dentry->d_inode);
return ret;
}
+int do_truncate(struct dentry *dentry, loff_t length, unsigned int time_attrs,
+ struct file *filp)
+{
+ return do_truncate2(NULL, dentry, length, time_attrs, filp);
+}
long vfs_truncate(const struct path *path, loff_t length)
{
struct inode *inode;
+ struct vfsmount *mnt;
struct dentry *upperdentry;
long error;
inode = path->dentry->d_inode;
+ mnt = path->mnt;
/* For directories it's -EISDIR, for other non-regulars - -EINVAL */
if (S_ISDIR(inode->i_mode))
@@ -83,7 +90,7 @@
if (error)
goto out;
- error = inode_permission(inode, MAY_WRITE);
+ error = inode_permission2(mnt, inode, MAY_WRITE);
if (error)
goto mnt_drop_write_and_out;
@@ -117,7 +124,7 @@
if (!error)
error = security_path_truncate(path);
if (!error)
- error = do_truncate(path->dentry, length, 0, NULL);
+ error = do_truncate2(mnt, path->dentry, length, 0, NULL);
put_write_and_out:
put_write_access(upperdentry->d_inode);
@@ -166,6 +173,7 @@
{
struct inode *inode;
struct dentry *dentry;
+ struct vfsmount *mnt;
struct fd f;
int error;
@@ -182,6 +190,7 @@
small = 0;
dentry = f.file->f_path.dentry;
+ mnt = f.file->f_path.mnt;
inode = dentry->d_inode;
error = -EINVAL;
if (!S_ISREG(inode->i_mode) || !(f.file->f_mode & FMODE_WRITE))
@@ -202,7 +211,7 @@
if (!error)
error = security_path_truncate(&f.file->f_path);
if (!error)
- error = do_truncate(dentry, length, ATTR_MTIME|ATTR_CTIME, f.file);
+ error = do_truncate2(mnt, dentry, length, ATTR_MTIME|ATTR_CTIME, f.file);
sb_end_write(inode->i_sb);
out_putf:
fdput(f);
@@ -356,6 +365,7 @@
struct cred *override_cred;
struct path path;
struct inode *inode;
+ struct vfsmount *mnt;
int res;
unsigned int lookup_flags = LOOKUP_FOLLOW;
@@ -386,6 +396,7 @@
goto out;
inode = d_backing_inode(path.dentry);
+ mnt = path.mnt;
if ((mode & MAY_EXEC) && S_ISREG(inode->i_mode)) {
/*
@@ -397,7 +408,7 @@
goto out_path_release;
}
- res = inode_permission(inode, mode | MAY_ACCESS);
+ res = inode_permission2(mnt, inode, mode | MAY_ACCESS);
/* SuS v2 requires we report a read only fs too */
if (res || !(mode & S_IWOTH) || special_file(inode->i_mode))
goto out_path_release;
@@ -441,7 +452,7 @@
if (error)
goto out;
- error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+ error = inode_permission2(path.mnt, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
if (error)
goto dput_and_out;
@@ -470,7 +481,8 @@
if (!d_can_lookup(f.file->f_path.dentry))
goto out_putf;
- error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR);
+ error = inode_permission2(f.file->f_path.mnt, file_inode(f.file),
+ MAY_EXEC | MAY_CHDIR);
if (!error)
set_fs_pwd(current->fs, &f.file->f_path);
out_putf:
@@ -489,7 +501,7 @@
if (error)
goto out;
- error = inode_permission(path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
+ error = inode_permission2(path.mnt, path.dentry->d_inode, MAY_EXEC | MAY_CHDIR);
if (error)
goto dput_and_out;
@@ -529,7 +541,7 @@
goto out_unlock;
newattrs.ia_mode = (mode & S_IALLUGO) | (inode->i_mode & ~S_IALLUGO);
newattrs.ia_valid = ATTR_MODE | ATTR_CTIME;
- error = notify_change(path->dentry, &newattrs, &delegated_inode);
+ error = notify_change2(path->mnt, path->dentry, &newattrs, &delegated_inode);
out_unlock:
inode_unlock(inode);
if (delegated_inode) {
@@ -609,7 +621,7 @@
inode_lock(inode);
error = security_path_chown(path, uid, gid);
if (!error)
- error = notify_change(path->dentry, &newattrs, &delegated_inode);
+ error = notify_change2(path->mnt, path->dentry, &newattrs, &delegated_inode);
inode_unlock(inode);
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
diff --git a/fs/pnode.c b/fs/pnode.c
index 53d411a..56f9a28 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -607,3 +607,37 @@
return 0;
}
+
+/*
+ * Iterates over all slaves, and slaves of slaves.
+ */
+static struct mount *next_descendent(struct mount *root, struct mount *cur)
+{
+ if (!IS_MNT_NEW(cur) && !list_empty(&cur->mnt_slave_list))
+ return first_slave(cur);
+ do {
+ struct mount *master = cur->mnt_master;
+
+ if (!master || cur->mnt_slave.next != &master->mnt_slave_list) {
+ struct mount *next = next_slave(cur);
+
+ return (next == root) ? NULL : next;
+ }
+ cur = master;
+ } while (cur != root);
+ return NULL;
+}
+
+void propagate_remount(struct mount *mnt)
+{
+ struct mount *m = mnt;
+ struct super_block *sb = mnt->mnt.mnt_sb;
+
+ if (sb->s_op->copy_mnt_data) {
+ m = next_descendent(mnt, m);
+ while (m) {
+ sb->s_op->copy_mnt_data(m->mnt.data, mnt->mnt.data);
+ m = next_descendent(mnt, m);
+ }
+ }
+}
diff --git a/fs/pnode.h b/fs/pnode.h
index dc87e65..a9a6576 100644
--- a/fs/pnode.h
+++ b/fs/pnode.h
@@ -44,6 +44,7 @@
int propagate_umount(struct list_head *);
int propagate_mount_busy(struct mount *, int);
void propagate_mount_unlock(struct mount *);
+void propagate_remount(struct mount *);
void mnt_release_group_id(struct mount *);
int get_dominating_id(struct mount *mnt, const struct path *root);
unsigned int mnt_get_count(struct mount *mnt);
diff --git a/fs/posix_acl.c b/fs/posix_acl.c
index eebf5f6..2fd0fde 100644
--- a/fs/posix_acl.c
+++ b/fs/posix_acl.c
@@ -43,7 +43,7 @@
rcu_read_lock();
acl = rcu_dereference(*p);
if (!acl || is_uncached_acl(acl) ||
- atomic_inc_not_zero(&acl->a_refcount))
+ refcount_inc_not_zero(&acl->a_refcount))
break;
rcu_read_unlock();
cpu_relax();
@@ -164,7 +164,7 @@
void
posix_acl_init(struct posix_acl *acl, int count)
{
- atomic_set(&acl->a_refcount, 1);
+ refcount_set(&acl->a_refcount, 1);
acl->a_count = count;
}
EXPORT_SYMBOL(posix_acl_init);
@@ -197,7 +197,7 @@
sizeof(struct posix_acl_entry);
clone = kmemdup(acl, size, flags);
if (clone)
- atomic_set(&clone->a_refcount, 1);
+ refcount_set(&clone->a_refcount, 1);
}
return clone;
}
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index 1ade120..08dce22 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -81,3 +81,10 @@
Say Y if you are running any user-space software which takes benefit from
this interface. For example, rkt is such a piece of software.
+
+config PROC_UID
+ bool "Include /proc/uid/ files"
+ default y
+ depends on PROC_FS && RT_MUTEXES
+ help
+ Provides aggregated per-uid information under /proc/uid.
diff --git a/fs/proc/Makefile b/fs/proc/Makefile
index f7456c4..d8dcb18 100644
--- a/fs/proc/Makefile
+++ b/fs/proc/Makefile
@@ -26,6 +26,7 @@
proc-y += namespaces.o
proc-y += self.o
proc-y += thread_self.o
+proc-$(CONFIG_PROC_UID) += uid.o
proc-$(CONFIG_PROC_SYSCTL) += proc_sysctl.o
proc-$(CONFIG_NET) += proc_net.o
proc-$(CONFIG_PROC_KCORE) += kcore.o
diff --git a/fs/proc/base.c b/fs/proc/base.c
index c5c42f3..154c5d9 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -93,6 +93,7 @@
#include <linux/sched/stat.h>
#include <linux/flex_array.h>
#include <linux/posix-timers.h>
+#include <linux/cpufreq_times.h>
#ifdef CONFIG_HARDWALL
#include <asm/hardwall.h>
#endif
@@ -3022,6 +3023,9 @@
#ifdef CONFIG_LIVEPATCH
ONE("patch_state", S_IRUSR, proc_pid_patch_state),
#endif
+#ifdef CONFIG_CPU_FREQ_TIMES
+ ONE("time_in_state", 0444, proc_time_in_state_show),
+#endif
};
static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx)
@@ -3409,6 +3413,9 @@
#ifdef CONFIG_LIVEPATCH
ONE("patch_state", S_IRUSR, proc_pid_patch_state),
#endif
+#ifdef CONFIG_CPU_FREQ_TIMES
+ ONE("time_in_state", 0444, proc_time_in_state_show),
+#endif
};
static int proc_tid_base_readdir(struct file *file, struct dir_context *ctx)
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index a34195e..5e55186 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -249,6 +249,15 @@
#endif
/*
+ * uid.c
+ */
+#ifdef CONFIG_PROC_UID
+extern int proc_uid_init(void);
+#else
+static inline void proc_uid_init(void) { }
+#endif
+
+/*
* proc_tty.c
*/
#ifdef CONFIG_TTY
diff --git a/fs/proc/root.c b/fs/proc/root.c
index 4e42aba..eafe87e 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -136,7 +136,7 @@
proc_symlink("mounts", NULL, "self/mounts");
proc_net_init();
-
+ proc_uid_init();
#ifdef CONFIG_SYSVIPC
proc_mkdir("sysvipc", NULL);
#endif
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 519522d..69e4b04 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -130,6 +130,56 @@
}
#endif
+static void seq_print_vma_name(struct seq_file *m, struct vm_area_struct *vma)
+{
+ const char __user *name = vma_get_anon_name(vma);
+ struct mm_struct *mm = vma->vm_mm;
+
+ unsigned long page_start_vaddr;
+ unsigned long page_offset;
+ unsigned long num_pages;
+ unsigned long max_len = NAME_MAX;
+ int i;
+
+ page_start_vaddr = (unsigned long)name & PAGE_MASK;
+ page_offset = (unsigned long)name - page_start_vaddr;
+ num_pages = DIV_ROUND_UP(page_offset + max_len, PAGE_SIZE);
+
+ seq_puts(m, "[anon:");
+
+ for (i = 0; i < num_pages; i++) {
+ int len;
+ int write_len;
+ const char *kaddr;
+ long pages_pinned;
+ struct page *page;
+
+ pages_pinned = get_user_pages_remote(current, mm,
+ page_start_vaddr, 1, 0, &page, NULL, NULL);
+ if (pages_pinned < 1) {
+ seq_puts(m, "<fault>]");
+ return;
+ }
+
+ kaddr = (const char *)kmap(page);
+ len = min(max_len, PAGE_SIZE - page_offset);
+ write_len = strnlen(kaddr + page_offset, len);
+ seq_write(m, kaddr + page_offset, write_len);
+ kunmap(page);
+ put_page(page);
+
+ /* if strnlen hit a null terminator then we're done */
+ if (write_len != len)
+ break;
+
+ max_len -= len;
+ page_offset = 0;
+ page_start_vaddr += PAGE_SIZE;
+ }
+
+ seq_putc(m, ']');
+}
+
static void vma_stop(struct proc_maps_private *priv)
{
struct mm_struct *mm = priv->mm;
@@ -349,8 +399,15 @@
goto done;
}
- if (is_stack(vma))
+ if (is_stack(vma)) {
name = "[stack]";
+ goto done;
+ }
+
+ if (vma_get_anon_name(vma)) {
+ seq_pad(m, ' ');
+ seq_print_vma_name(m, vma);
+ }
}
done:
@@ -798,6 +855,11 @@
if (!rollup_mode) {
show_map_vma(m, vma, is_pid);
+ if (vma_get_anon_name(vma)) {
+ seq_puts(m, "Name: ");
+ seq_print_vma_name(m, vma);
+ seq_putc(m, '\n');
+ }
} else if (last_vma) {
show_vma_header_prefix(
m, mss->first_vma_start, vma->vm_end, 0, 0, 0, 0);
diff --git a/fs/proc/uid.c b/fs/proc/uid.c
new file mode 100644
index 0000000..6a096d2
--- /dev/null
+++ b/fs/proc/uid.c
@@ -0,0 +1,295 @@
+/*
+ * /proc/uid support
+ */
+
+#include <linux/cpufreq_times.h>
+#include <linux/fs.h>
+#include <linux/hashtable.h>
+#include <linux/init.h>
+#include <linux/proc_fs.h>
+#include <linux/rtmutex.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include "internal.h"
+
+static struct proc_dir_entry *proc_uid;
+
+#define UID_HASH_BITS 10
+
+static DECLARE_HASHTABLE(proc_uid_hash_table, UID_HASH_BITS);
+
+/*
+ * use rt_mutex here to avoid priority inversion between high-priority readers
+ * of these files and tasks calling proc_register_uid().
+ */
+static DEFINE_RT_MUTEX(proc_uid_lock); /* proc_uid_hash_table */
+
+struct uid_hash_entry {
+ uid_t uid;
+ struct hlist_node hash;
+};
+
+/* Caller must hold proc_uid_lock */
+static bool uid_hash_entry_exists_locked(uid_t uid)
+{
+ struct uid_hash_entry *entry;
+
+ hash_for_each_possible(proc_uid_hash_table, entry, hash, uid) {
+ if (entry->uid == uid)
+ return true;
+ }
+ return false;
+}
+
+void proc_register_uid(kuid_t kuid)
+{
+ struct uid_hash_entry *entry;
+ bool exists;
+ uid_t uid = from_kuid_munged(current_user_ns(), kuid);
+
+ rt_mutex_lock(&proc_uid_lock);
+ exists = uid_hash_entry_exists_locked(uid);
+ rt_mutex_unlock(&proc_uid_lock);
+ if (exists)
+ return;
+
+ entry = kzalloc(sizeof(struct uid_hash_entry), GFP_KERNEL);
+ if (!entry)
+ return;
+ entry->uid = uid;
+
+ rt_mutex_lock(&proc_uid_lock);
+ if (uid_hash_entry_exists_locked(uid))
+ kfree(entry);
+ else
+ hash_add(proc_uid_hash_table, &entry->hash, uid);
+ rt_mutex_unlock(&proc_uid_lock);
+}
+
+struct uid_entry {
+ const char *name;
+ int len;
+ umode_t mode;
+ const struct inode_operations *iop;
+ const struct file_operations *fop;
+};
+
+#define NOD(NAME, MODE, IOP, FOP) { \
+ .name = (NAME), \
+ .len = sizeof(NAME) - 1, \
+ .mode = MODE, \
+ .iop = IOP, \
+ .fop = FOP, \
+}
+
+#ifdef CONFIG_CPU_FREQ_TIMES
+static const struct file_operations proc_uid_time_in_state_operations = {
+ .open = single_uid_time_in_state_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+#endif
+
+static const struct uid_entry uid_base_stuff[] = {
+#ifdef CONFIG_CPU_FREQ_TIMES
+ NOD("time_in_state", 0444, NULL, &proc_uid_time_in_state_operations),
+#endif
+};
+
+static const struct inode_operations proc_uid_def_inode_operations = {
+ .setattr = proc_setattr,
+};
+
+static struct inode *proc_uid_make_inode(struct super_block *sb, kuid_t kuid)
+{
+ struct inode *inode;
+
+ inode = new_inode(sb);
+ if (!inode)
+ return NULL;
+
+ inode->i_ino = get_next_ino();
+ inode->i_mtime = inode->i_atime = inode->i_ctime = current_time(inode);
+ inode->i_op = &proc_uid_def_inode_operations;
+ inode->i_uid = kuid;
+
+ return inode;
+}
+
+static int proc_uident_instantiate(struct inode *dir, struct dentry *dentry,
+ struct task_struct *unused, const void *ptr)
+{
+ const struct uid_entry *u = ptr;
+ struct inode *inode;
+
+ inode = proc_uid_make_inode(dir->i_sb, dir->i_uid);
+ if (!inode)
+ return -ENOENT;
+
+ inode->i_mode = u->mode;
+ if (S_ISDIR(inode->i_mode))
+ set_nlink(inode, 2);
+ if (u->iop)
+ inode->i_op = u->iop;
+ if (u->fop)
+ inode->i_fop = u->fop;
+ d_add(dentry, inode);
+ return 0;
+}
+
+static struct dentry *proc_uid_base_lookup(struct inode *dir,
+ struct dentry *dentry,
+ unsigned int flags)
+{
+ const struct uid_entry *u, *last;
+ unsigned int nents = ARRAY_SIZE(uid_base_stuff);
+
+ if (nents == 0)
+ return ERR_PTR(-ENOENT);
+
+ last = &uid_base_stuff[nents - 1];
+ for (u = uid_base_stuff; u <= last; u++) {
+ if (u->len != dentry->d_name.len)
+ continue;
+ if (!memcmp(dentry->d_name.name, u->name, u->len))
+ break;
+ }
+ if (u > last)
+ return ERR_PTR(-ENOENT);
+
+ return ERR_PTR(proc_uident_instantiate(dir, dentry, NULL, u));
+}
+
+static int proc_uid_base_readdir(struct file *file, struct dir_context *ctx)
+{
+ unsigned int nents = ARRAY_SIZE(uid_base_stuff);
+ const struct uid_entry *u;
+
+ if (!dir_emit_dots(file, ctx))
+ return 0;
+
+ if (ctx->pos >= nents + 2)
+ return 0;
+
+ for (u = uid_base_stuff + (ctx->pos - 2);
+ u < uid_base_stuff + nents; u++) {
+ if (!proc_fill_cache(file, ctx, u->name, u->len,
+ proc_uident_instantiate, NULL, u))
+ break;
+ ctx->pos++;
+ }
+
+ return 0;
+}
+
+static const struct inode_operations proc_uid_base_inode_operations = {
+ .lookup = proc_uid_base_lookup,
+ .setattr = proc_setattr,
+};
+
+static const struct file_operations proc_uid_base_operations = {
+ .read = generic_read_dir,
+ .iterate = proc_uid_base_readdir,
+ .llseek = default_llseek,
+};
+
+static int proc_uid_instantiate(struct inode *dir, struct dentry *dentry,
+ struct task_struct *unused, const void *ptr)
+{
+ unsigned int i, len;
+ nlink_t nlinks;
+ kuid_t *kuid = (kuid_t *)ptr;
+ struct inode *inode = proc_uid_make_inode(dir->i_sb, *kuid);
+
+ if (!inode)
+ return -ENOENT;
+
+ inode->i_mode = S_IFDIR | 0555;
+ inode->i_op = &proc_uid_base_inode_operations;
+ inode->i_fop = &proc_uid_base_operations;
+ inode->i_flags |= S_IMMUTABLE;
+
+ nlinks = 2;
+ len = ARRAY_SIZE(uid_base_stuff);
+ for (i = 0; i < len; ++i) {
+ if (S_ISDIR(uid_base_stuff[i].mode))
+ ++nlinks;
+ }
+ set_nlink(inode, nlinks);
+
+ d_add(dentry, inode);
+
+ return 0;
+}
+
+static int proc_uid_readdir(struct file *file, struct dir_context *ctx)
+{
+ int last_shown, i;
+ unsigned long bkt;
+ struct uid_hash_entry *entry;
+
+ if (!dir_emit_dots(file, ctx))
+ return 0;
+
+ i = 0;
+ last_shown = ctx->pos - 2;
+ rt_mutex_lock(&proc_uid_lock);
+ hash_for_each(proc_uid_hash_table, bkt, entry, hash) {
+ int len;
+ char buf[PROC_NUMBUF];
+
+ if (i < last_shown)
+ continue;
+ len = snprintf(buf, sizeof(buf), "%u", entry->uid);
+ if (!proc_fill_cache(file, ctx, buf, len,
+ proc_uid_instantiate, NULL, &entry->uid))
+ break;
+ i++;
+ ctx->pos++;
+ }
+ rt_mutex_unlock(&proc_uid_lock);
+ return 0;
+}
+
+static struct dentry *proc_uid_lookup(struct inode *dir, struct dentry *dentry,
+ unsigned int flags)
+{
+ int result = -ENOENT;
+
+ uid_t uid = name_to_int(&dentry->d_name);
+ bool uid_exists;
+
+ rt_mutex_lock(&proc_uid_lock);
+ uid_exists = uid_hash_entry_exists_locked(uid);
+ rt_mutex_unlock(&proc_uid_lock);
+ if (uid_exists) {
+ kuid_t kuid = make_kuid(current_user_ns(), uid);
+
+ result = proc_uid_instantiate(dir, dentry, NULL, &kuid);
+ }
+ return ERR_PTR(result);
+}
+
+static const struct file_operations proc_uid_operations = {
+ .read = generic_read_dir,
+ .iterate = proc_uid_readdir,
+ .llseek = default_llseek,
+};
+
+static const struct inode_operations proc_uid_inode_operations = {
+ .lookup = proc_uid_lookup,
+ .setattr = proc_setattr,
+};
+
+int __init proc_uid_init(void)
+{
+ proc_uid = proc_mkdir("uid", NULL);
+ if (!proc_uid)
+ return -ENOMEM;
+ proc_uid->proc_iops = &proc_uid_inode_operations;
+ proc_uid->proc_fops = &proc_uid_operations;
+
+ return 0;
+}
diff --git a/fs/proc_namespace.c b/fs/proc_namespace.c
index 7626ee1..b859aae 100644
--- a/fs/proc_namespace.c
+++ b/fs/proc_namespace.c
@@ -121,7 +121,9 @@
if (err)
goto out;
show_mnt_opts(m, mnt);
- if (sb->s_op->show_options)
+ if (sb->s_op->show_options2)
+ err = sb->s_op->show_options2(mnt, m, mnt_path.dentry);
+ else if (sb->s_op->show_options)
err = sb->s_op->show_options(m, mnt_path.dentry);
seq_puts(m, " 0 0\n");
out:
@@ -183,7 +185,9 @@
err = show_sb_opts(m, sb);
if (err)
goto out;
- if (sb->s_op->show_options)
+ if (sb->s_op->show_options2) {
+ err = sb->s_op->show_options2(mnt, m, mnt->mnt_root);
+ } else if (sb->s_op->show_options)
err = sb->s_op->show_options(m, mnt->mnt_root);
seq_putc(m, '\n');
out:
diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c
index 7125b39..379b535 100644
--- a/fs/pstore/ram.c
+++ b/fs/pstore/ram.c
@@ -707,6 +707,12 @@
return 0;
}
+void notrace ramoops_console_write_buf(const char *buf, size_t size)
+{
+ struct ramoops_context *cxt = &oops_cxt;
+ persistent_ram_write(cxt->cprz, buf, size);
+}
+
static int ramoops_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
diff --git a/fs/read_write.c b/fs/read_write.c
index 0046d72..62b9c34 100644
--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -455,6 +455,8 @@
return ret;
}
+EXPORT_SYMBOL(vfs_read);
+
static ssize_t new_sync_write(struct file *filp, const char __user *buf, size_t len, loff_t *ppos)
{
struct iovec iov = { .iov_base = (void __user *)buf, .iov_len = len };
@@ -553,6 +555,8 @@
return ret;
}
+EXPORT_SYMBOL(vfs_write);
+
static inline loff_t file_pos_read(struct file *file)
{
return file->f_pos;
diff --git a/fs/sdcardfs/Kconfig b/fs/sdcardfs/Kconfig
new file mode 100644
index 0000000..a1c1033
--- /dev/null
+++ b/fs/sdcardfs/Kconfig
@@ -0,0 +1,13 @@
+config SDCARD_FS
+ tristate "sdcard file system"
+ depends on CONFIGFS_FS
+ default n
+ help
+ Sdcardfs is based on Wrapfs file system.
+
+config SDCARD_FS_FADV_NOACTIVE
+ bool "sdcardfs fadvise noactive support"
+ depends on FADV_NOACTIVE
+ default y
+ help
+ Sdcardfs supports fadvise noactive mode.
diff --git a/fs/sdcardfs/Makefile b/fs/sdcardfs/Makefile
new file mode 100644
index 0000000..b84fbb2
--- /dev/null
+++ b/fs/sdcardfs/Makefile
@@ -0,0 +1,7 @@
+SDCARDFS_VERSION="0.1"
+
+EXTRA_CFLAGS += -DSDCARDFS_VERSION=\"$(SDCARDFS_VERSION)\"
+
+obj-$(CONFIG_SDCARD_FS) += sdcardfs.o
+
+sdcardfs-y := dentry.o file.o inode.o main.o super.o lookup.o mmap.o packagelist.o derived_perm.o
diff --git a/fs/sdcardfs/dentry.c b/fs/sdcardfs/dentry.c
new file mode 100644
index 0000000..776d549
--- /dev/null
+++ b/fs/sdcardfs/dentry.c
@@ -0,0 +1,189 @@
+/*
+ * fs/sdcardfs/dentry.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+#include "linux/ctype.h"
+
+/*
+ * returns: -ERRNO if error (returned to user)
+ * 0: tell VFS to invalidate dentry
+ * 1: dentry is valid
+ */
+static int sdcardfs_d_revalidate(struct dentry *dentry, unsigned int flags)
+{
+ int err = 1;
+ struct path parent_lower_path, lower_path;
+ struct dentry *parent_dentry = NULL;
+ struct dentry *parent_lower_dentry = NULL;
+ struct dentry *lower_cur_parent_dentry = NULL;
+ struct dentry *lower_dentry = NULL;
+ struct inode *inode;
+ struct sdcardfs_inode_data *data;
+
+ if (flags & LOOKUP_RCU)
+ return -ECHILD;
+
+ spin_lock(&dentry->d_lock);
+ if (IS_ROOT(dentry)) {
+ spin_unlock(&dentry->d_lock);
+ return 1;
+ }
+ spin_unlock(&dentry->d_lock);
+
+ /* check uninitialized obb_dentry and
+ * whether the base obbpath has been changed or not
+ */
+ if (is_obbpath_invalid(dentry)) {
+ return 0;
+ }
+
+ parent_dentry = dget_parent(dentry);
+ sdcardfs_get_lower_path(parent_dentry, &parent_lower_path);
+ sdcardfs_get_real_lower(dentry, &lower_path);
+ parent_lower_dentry = parent_lower_path.dentry;
+ lower_dentry = lower_path.dentry;
+ lower_cur_parent_dentry = dget_parent(lower_dentry);
+
+ if ((lower_dentry->d_flags & DCACHE_OP_REVALIDATE)) {
+ err = lower_dentry->d_op->d_revalidate(lower_dentry, flags);
+ if (err == 0) {
+ goto out;
+ }
+ }
+
+ spin_lock(&lower_dentry->d_lock);
+ if (d_unhashed(lower_dentry)) {
+ spin_unlock(&lower_dentry->d_lock);
+ err = 0;
+ goto out;
+ }
+ spin_unlock(&lower_dentry->d_lock);
+
+ if (parent_lower_dentry != lower_cur_parent_dentry) {
+ err = 0;
+ goto out;
+ }
+
+ if (dentry < lower_dentry) {
+ spin_lock(&dentry->d_lock);
+ spin_lock_nested(&lower_dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ } else {
+ spin_lock(&lower_dentry->d_lock);
+ spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
+ }
+
+ if (!qstr_case_eq(&dentry->d_name, &lower_dentry->d_name)) {
+ err = 0;
+ }
+
+ if (dentry < lower_dentry) {
+ spin_unlock(&lower_dentry->d_lock);
+ spin_unlock(&dentry->d_lock);
+ } else {
+ spin_unlock(&dentry->d_lock);
+ spin_unlock(&lower_dentry->d_lock);
+ }
+ if (!err)
+ goto out;
+
+ /* If our top's inode is gone, we may be out of date */
+ inode = igrab(d_inode(dentry));
+ if (inode) {
+ data = top_data_get(SDCARDFS_I(inode));
+ if (!data || data->abandoned) {
+ err = 0;
+ }
+ if (data)
+ data_put(data);
+ iput(inode);
+ }
+
+out:
+ dput(parent_dentry);
+ dput(lower_cur_parent_dentry);
+ sdcardfs_put_lower_path(parent_dentry, &parent_lower_path);
+ sdcardfs_put_real_lower(dentry, &lower_path);
+ return err;
+}
+
+static void sdcardfs_d_release(struct dentry *dentry)
+{
+ if (!dentry || !dentry->d_fsdata)
+ return;
+ /* release and reset the lower paths */
+ if (has_graft_path(dentry))
+ sdcardfs_put_reset_orig_path(dentry);
+ sdcardfs_put_reset_lower_path(dentry);
+ free_dentry_private_data(dentry);
+}
+
+static int sdcardfs_hash_ci(const struct dentry *dentry,
+ struct qstr *qstr)
+{
+ /*
+ * This function is copy of vfat_hashi.
+ * FIXME Should we support national language?
+ * Refer to vfat_hashi()
+ * struct nls_table *t = MSDOS_SB(dentry->d_sb)->nls_io;
+ */
+ const unsigned char *name;
+ unsigned int len;
+ unsigned long hash;
+
+ name = qstr->name;
+ len = qstr->len;
+
+ hash = init_name_hash(dentry);
+ while (len--)
+ hash = partial_name_hash(tolower(*name++), hash);
+ qstr->hash = end_name_hash(hash);
+
+ return 0;
+}
+
+/*
+ * Case insensitive compare of two vfat names.
+ */
+static int sdcardfs_cmp_ci(const struct dentry *dentry,
+ unsigned int len, const char *str, const struct qstr *name)
+{
+ /* FIXME Should we support national language? */
+
+ if (name->len == len) {
+ if (str_n_case_eq(name->name, str, len))
+ return 0;
+ }
+ return 1;
+}
+
+static void sdcardfs_canonical_path(const struct path *path,
+ struct path *actual_path)
+{
+ sdcardfs_get_real_lower(path->dentry, actual_path);
+}
+
+const struct dentry_operations sdcardfs_ci_dops = {
+ .d_revalidate = sdcardfs_d_revalidate,
+ .d_release = sdcardfs_d_release,
+ .d_hash = sdcardfs_hash_ci,
+ .d_compare = sdcardfs_cmp_ci,
+ .d_canonical_path = sdcardfs_canonical_path,
+};
+
diff --git a/fs/sdcardfs/derived_perm.c b/fs/sdcardfs/derived_perm.c
new file mode 100644
index 0000000..0b3b223
--- /dev/null
+++ b/fs/sdcardfs/derived_perm.c
@@ -0,0 +1,472 @@
+/*
+ * fs/sdcardfs/derived_perm.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+
+/* copy derived state from parent inode */
+static void inherit_derived_state(struct inode *parent, struct inode *child)
+{
+ struct sdcardfs_inode_info *pi = SDCARDFS_I(parent);
+ struct sdcardfs_inode_info *ci = SDCARDFS_I(child);
+
+ ci->data->perm = PERM_INHERIT;
+ ci->data->userid = pi->data->userid;
+ ci->data->d_uid = pi->data->d_uid;
+ ci->data->under_android = pi->data->under_android;
+ ci->data->under_cache = pi->data->under_cache;
+ ci->data->under_obb = pi->data->under_obb;
+}
+
+/* helper function for derived state */
+void setup_derived_state(struct inode *inode, perm_t perm, userid_t userid,
+ uid_t uid)
+{
+ struct sdcardfs_inode_info *info = SDCARDFS_I(inode);
+
+ info->data->perm = perm;
+ info->data->userid = userid;
+ info->data->d_uid = uid;
+ info->data->under_android = false;
+ info->data->under_cache = false;
+ info->data->under_obb = false;
+}
+
+/* While renaming, there is a point where we want the path from dentry,
+ * but the name from newdentry
+ */
+void get_derived_permission_new(struct dentry *parent, struct dentry *dentry,
+ const struct qstr *name)
+{
+ struct sdcardfs_inode_info *info = SDCARDFS_I(d_inode(dentry));
+ struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent));
+ struct sdcardfs_inode_data *parent_data = parent_info->data;
+ appid_t appid;
+ unsigned long user_num;
+ int err;
+ struct qstr q_Android = QSTR_LITERAL("Android");
+ struct qstr q_data = QSTR_LITERAL("data");
+ struct qstr q_obb = QSTR_LITERAL("obb");
+ struct qstr q_media = QSTR_LITERAL("media");
+ struct qstr q_cache = QSTR_LITERAL("cache");
+
+ /* By default, each inode inherits from its parent.
+ * the properties are maintained on its private fields
+ * because the inode attributes will be modified with that of
+ * its lower inode.
+ * These values are used by our custom permission call instead
+ * of using the inode permissions.
+ */
+
+ inherit_derived_state(d_inode(parent), d_inode(dentry));
+
+ /* Files don't get special labels */
+ if (!S_ISDIR(d_inode(dentry)->i_mode)) {
+ set_top(info, parent_info);
+ return;
+ }
+ /* Derive custom permissions based on parent and current node */
+ switch (parent_data->perm) {
+ case PERM_INHERIT:
+ case PERM_ANDROID_PACKAGE_CACHE:
+ set_top(info, parent_info);
+ break;
+ case PERM_PRE_ROOT:
+ /* Legacy internal layout places users at top level */
+ info->data->perm = PERM_ROOT;
+ err = kstrtoul(name->name, 10, &user_num);
+ if (err)
+ info->data->userid = 0;
+ else
+ info->data->userid = user_num;
+ break;
+ case PERM_ROOT:
+ /* Assume masked off by default. */
+ if (qstr_case_eq(name, &q_Android)) {
+ /* App-specific directories inside; let anyone traverse */
+ info->data->perm = PERM_ANDROID;
+ info->data->under_android = true;
+ } else {
+ set_top(info, parent_info);
+ }
+ break;
+ case PERM_ANDROID:
+ if (qstr_case_eq(name, &q_data)) {
+ /* App-specific directories inside; let anyone traverse */
+ info->data->perm = PERM_ANDROID_DATA;
+ } else if (qstr_case_eq(name, &q_obb)) {
+ /* App-specific directories inside; let anyone traverse */
+ info->data->perm = PERM_ANDROID_OBB;
+ info->data->under_obb = true;
+ /* Single OBB directory is always shared */
+ } else if (qstr_case_eq(name, &q_media)) {
+ /* App-specific directories inside; let anyone traverse */
+ info->data->perm = PERM_ANDROID_MEDIA;
+ } else {
+ set_top(info, parent_info);
+ }
+ break;
+ case PERM_ANDROID_OBB:
+ case PERM_ANDROID_DATA:
+ case PERM_ANDROID_MEDIA:
+ info->data->perm = PERM_ANDROID_PACKAGE;
+ appid = get_appid(name->name);
+ if (appid != 0 && !is_excluded(name->name, parent_data->userid))
+ info->data->d_uid =
+ multiuser_get_uid(parent_data->userid, appid);
+ break;
+ case PERM_ANDROID_PACKAGE:
+ if (qstr_case_eq(name, &q_cache)) {
+ info->data->perm = PERM_ANDROID_PACKAGE_CACHE;
+ info->data->under_cache = true;
+ }
+ set_top(info, parent_info);
+ break;
+ }
+}
+
+void get_derived_permission(struct dentry *parent, struct dentry *dentry)
+{
+ get_derived_permission_new(parent, dentry, &dentry->d_name);
+}
+
+static appid_t get_type(const char *name)
+{
+ const char *ext = strrchr(name, '.');
+ appid_t id;
+
+ if (ext && ext[0]) {
+ ext = &ext[1];
+ id = get_ext_gid(ext);
+ return id?:AID_MEDIA_RW;
+ }
+ return AID_MEDIA_RW;
+}
+
+void fixup_lower_ownership(struct dentry *dentry, const char *name)
+{
+ struct path path;
+ struct inode *inode;
+ struct inode *delegated_inode = NULL;
+ int error;
+ struct sdcardfs_inode_info *info;
+ struct sdcardfs_inode_data *info_d;
+ struct sdcardfs_inode_data *info_top;
+ perm_t perm;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+ uid_t uid = sbi->options.fs_low_uid;
+ gid_t gid = sbi->options.fs_low_gid;
+ struct iattr newattrs;
+
+ if (!sbi->options.gid_derivation)
+ return;
+
+ info = SDCARDFS_I(d_inode(dentry));
+ info_d = info->data;
+ perm = info_d->perm;
+ if (info_d->under_obb) {
+ perm = PERM_ANDROID_OBB;
+ } else if (info_d->under_cache) {
+ perm = PERM_ANDROID_PACKAGE_CACHE;
+ } else if (perm == PERM_INHERIT) {
+ info_top = top_data_get(info);
+ perm = info_top->perm;
+ data_put(info_top);
+ }
+
+ switch (perm) {
+ case PERM_ROOT:
+ case PERM_ANDROID:
+ case PERM_ANDROID_DATA:
+ case PERM_ANDROID_MEDIA:
+ case PERM_ANDROID_PACKAGE:
+ case PERM_ANDROID_PACKAGE_CACHE:
+ uid = multiuser_get_uid(info_d->userid, uid);
+ break;
+ case PERM_ANDROID_OBB:
+ uid = AID_MEDIA_OBB;
+ break;
+ case PERM_PRE_ROOT:
+ default:
+ break;
+ }
+ switch (perm) {
+ case PERM_ROOT:
+ case PERM_ANDROID:
+ case PERM_ANDROID_DATA:
+ case PERM_ANDROID_MEDIA:
+ if (S_ISDIR(d_inode(dentry)->i_mode))
+ gid = multiuser_get_uid(info_d->userid, AID_MEDIA_RW);
+ else
+ gid = multiuser_get_uid(info_d->userid, get_type(name));
+ break;
+ case PERM_ANDROID_OBB:
+ gid = AID_MEDIA_OBB;
+ break;
+ case PERM_ANDROID_PACKAGE:
+ if (uid_is_app(info_d->d_uid))
+ gid = multiuser_get_ext_gid(info_d->d_uid);
+ else
+ gid = multiuser_get_uid(info_d->userid, AID_MEDIA_RW);
+ break;
+ case PERM_ANDROID_PACKAGE_CACHE:
+ if (uid_is_app(info_d->d_uid))
+ gid = multiuser_get_ext_cache_gid(info_d->d_uid);
+ else
+ gid = multiuser_get_uid(info_d->userid, AID_MEDIA_RW);
+ break;
+ case PERM_PRE_ROOT:
+ default:
+ break;
+ }
+
+ sdcardfs_get_lower_path(dentry, &path);
+ inode = d_inode(path.dentry);
+ if (d_inode(path.dentry)->i_gid.val != gid || d_inode(path.dentry)->i_uid.val != uid) {
+retry_deleg:
+ newattrs.ia_valid = ATTR_GID | ATTR_UID | ATTR_FORCE;
+ newattrs.ia_uid = make_kuid(current_user_ns(), uid);
+ newattrs.ia_gid = make_kgid(current_user_ns(), gid);
+ if (!S_ISDIR(inode->i_mode))
+ newattrs.ia_valid |=
+ ATTR_KILL_SUID | ATTR_KILL_SGID | ATTR_KILL_PRIV;
+ inode_lock(inode);
+ error = security_path_chown(&path, newattrs.ia_uid, newattrs.ia_gid);
+ if (!error)
+ error = notify_change2(path.mnt, path.dentry, &newattrs, &delegated_inode);
+ inode_unlock(inode);
+ if (delegated_inode) {
+ error = break_deleg_wait(&delegated_inode);
+ if (!error)
+ goto retry_deleg;
+ }
+ if (error)
+ pr_debug("sdcardfs: Failed to touch up lower fs gid/uid for %s\n", name);
+ }
+ sdcardfs_put_lower_path(dentry, &path);
+}
+
+static int descendant_may_need_fixup(struct sdcardfs_inode_data *data,
+ struct limit_search *limit)
+{
+ if (data->perm == PERM_ROOT)
+ return (limit->flags & BY_USERID) ?
+ data->userid == limit->userid : 1;
+ if (data->perm == PERM_PRE_ROOT || data->perm == PERM_ANDROID)
+ return 1;
+ return 0;
+}
+
+static int needs_fixup(perm_t perm)
+{
+ if (perm == PERM_ANDROID_DATA || perm == PERM_ANDROID_OBB
+ || perm == PERM_ANDROID_MEDIA)
+ return 1;
+ return 0;
+}
+
+static void __fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit, int depth)
+{
+ struct dentry *child;
+ struct sdcardfs_inode_info *info;
+
+ /*
+ * All paths will terminate their recursion on hitting PERM_ANDROID_OBB,
+ * PERM_ANDROID_MEDIA, or PERM_ANDROID_DATA. This happens at a depth of
+ * at most 3.
+ */
+ WARN(depth > 3, "%s: Max expected depth exceeded!\n", __func__);
+ spin_lock_nested(&dentry->d_lock, depth);
+ if (!d_inode(dentry)) {
+ spin_unlock(&dentry->d_lock);
+ return;
+ }
+ info = SDCARDFS_I(d_inode(dentry));
+
+ if (needs_fixup(info->data->perm)) {
+ list_for_each_entry(child, &dentry->d_subdirs, d_child) {
+ spin_lock_nested(&child->d_lock, depth + 1);
+ if (!(limit->flags & BY_NAME) || qstr_case_eq(&child->d_name, &limit->name)) {
+ if (d_inode(child)) {
+ get_derived_permission(dentry, child);
+ fixup_tmp_permissions(d_inode(child));
+ spin_unlock(&child->d_lock);
+ break;
+ }
+ }
+ spin_unlock(&child->d_lock);
+ }
+ } else if (descendant_may_need_fixup(info->data, limit)) {
+ list_for_each_entry(child, &dentry->d_subdirs, d_child) {
+ __fixup_perms_recursive(child, limit, depth + 1);
+ }
+ }
+ spin_unlock(&dentry->d_lock);
+}
+
+void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit)
+{
+ __fixup_perms_recursive(dentry, limit, 0);
+}
+
+/* main function for updating derived permission */
+inline void update_derived_permission_lock(struct dentry *dentry)
+{
+ struct dentry *parent;
+
+ if (!dentry || !d_inode(dentry)) {
+ pr_err("sdcardfs: %s: invalid dentry\n", __func__);
+ return;
+ }
+ /* FIXME:
+ * 1. need to check whether the dentry is updated or not
+ * 2. remove the root dentry update
+ */
+ if (!IS_ROOT(dentry)) {
+ parent = dget_parent(dentry);
+ if (parent) {
+ get_derived_permission(parent, dentry);
+ dput(parent);
+ }
+ }
+ fixup_tmp_permissions(d_inode(dentry));
+}
+
+int need_graft_path(struct dentry *dentry)
+{
+ int ret = 0;
+ struct dentry *parent = dget_parent(dentry);
+ struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent));
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+ struct qstr obb = QSTR_LITERAL("obb");
+
+ if (parent_info->data->perm == PERM_ANDROID &&
+ qstr_case_eq(&dentry->d_name, &obb)) {
+
+ /* /Android/obb is the base obbpath of DERIVED_UNIFIED */
+ if (!(sbi->options.multiuser == false
+ && parent_info->data->userid == 0)) {
+ ret = 1;
+ }
+ }
+ dput(parent);
+ return ret;
+}
+
+int is_obbpath_invalid(struct dentry *dent)
+{
+ int ret = 0;
+ struct sdcardfs_dentry_info *di = SDCARDFS_D(dent);
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dent->d_sb);
+ char *path_buf, *obbpath_s;
+ int need_put = 0;
+ struct path lower_path;
+
+ /* check the base obbpath has been changed.
+ * this routine can check an uninitialized obb dentry as well.
+ * regarding the uninitialized obb, refer to the sdcardfs_mkdir()
+ */
+ spin_lock(&di->lock);
+ if (di->orig_path.dentry) {
+ if (!di->lower_path.dentry) {
+ ret = 1;
+ } else {
+ path_get(&di->lower_path);
+
+ path_buf = kmalloc(PATH_MAX, GFP_ATOMIC);
+ if (!path_buf) {
+ ret = 1;
+ pr_err("sdcardfs: fail to allocate path_buf in %s.\n", __func__);
+ } else {
+ obbpath_s = d_path(&di->lower_path, path_buf, PATH_MAX);
+ if (d_unhashed(di->lower_path.dentry) ||
+ !str_case_eq(sbi->obbpath_s, obbpath_s)) {
+ ret = 1;
+ }
+ kfree(path_buf);
+ }
+
+ pathcpy(&lower_path, &di->lower_path);
+ need_put = 1;
+ }
+ }
+ spin_unlock(&di->lock);
+ if (need_put)
+ path_put(&lower_path);
+ return ret;
+}
+
+int is_base_obbpath(struct dentry *dentry)
+{
+ int ret = 0;
+ struct dentry *parent = dget_parent(dentry);
+ struct sdcardfs_inode_info *parent_info = SDCARDFS_I(d_inode(parent));
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+ struct qstr q_obb = QSTR_LITERAL("obb");
+
+ spin_lock(&SDCARDFS_D(dentry)->lock);
+ if (sbi->options.multiuser) {
+ if (parent_info->data->perm == PERM_PRE_ROOT &&
+ qstr_case_eq(&dentry->d_name, &q_obb)) {
+ ret = 1;
+ }
+ } else if (parent_info->data->perm == PERM_ANDROID &&
+ qstr_case_eq(&dentry->d_name, &q_obb)) {
+ ret = 1;
+ }
+ spin_unlock(&SDCARDFS_D(dentry)->lock);
+ return ret;
+}
+
+/* The lower_path will be stored to the dentry's orig_path
+ * and the base obbpath will be copyed to the lower_path variable.
+ * if an error returned, there's no change in the lower_path
+ * returns: -ERRNO if error (0: no error)
+ */
+int setup_obb_dentry(struct dentry *dentry, struct path *lower_path)
+{
+ int err = 0;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+ struct path obbpath;
+
+ /* A local obb dentry must have its own orig_path to support rmdir
+ * and mkdir of itself. Usually, we expect that the sbi->obbpath
+ * is avaiable on this stage.
+ */
+ sdcardfs_set_orig_path(dentry, lower_path);
+
+ err = kern_path(sbi->obbpath_s,
+ LOOKUP_FOLLOW | LOOKUP_DIRECTORY, &obbpath);
+
+ if (!err) {
+ /* the obbpath base has been found */
+ pathcpy(lower_path, &obbpath);
+ } else {
+ /* if the sbi->obbpath is not available, we can optionally
+ * setup the lower_path with its orig_path.
+ * but, the current implementation just returns an error
+ * because the sdcard daemon also regards this case as
+ * a lookup fail.
+ */
+ pr_info("sdcardfs: the sbi->obbpath is not available\n");
+ }
+ return err;
+}
+
+
diff --git a/fs/sdcardfs/file.c b/fs/sdcardfs/file.c
new file mode 100644
index 0000000..1461254
--- /dev/null
+++ b/fs/sdcardfs/file.c
@@ -0,0 +1,455 @@
+/*
+ * fs/sdcardfs/file.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+#ifdef CONFIG_SDCARD_FS_FADV_NOACTIVE
+#include <linux/backing-dev.h>
+#endif
+
+static ssize_t sdcardfs_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int err;
+ struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
+#ifdef CONFIG_SDCARD_FS_FADV_NOACTIVE
+ struct backing_dev_info *bdi;
+#endif
+
+ lower_file = sdcardfs_lower_file(file);
+
+#ifdef CONFIG_SDCARD_FS_FADV_NOACTIVE
+ if (file->f_mode & FMODE_NOACTIVE) {
+ if (!(lower_file->f_mode & FMODE_NOACTIVE)) {
+ bdi = lower_file->f_mapping->backing_dev_info;
+ lower_file->f_ra.ra_pages = bdi->ra_pages * 2;
+ spin_lock(&lower_file->f_lock);
+ lower_file->f_mode |= FMODE_NOACTIVE;
+ spin_unlock(&lower_file->f_lock);
+ }
+ }
+#endif
+
+ err = vfs_read(lower_file, buf, count, ppos);
+ /* update our inode atime upon a successful lower read */
+ if (err >= 0)
+ fsstack_copy_attr_atime(d_inode(dentry),
+ file_inode(lower_file));
+
+ return err;
+}
+
+static ssize_t sdcardfs_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int err;
+ struct file *lower_file;
+ struct dentry *dentry = file->f_path.dentry;
+ struct inode *inode = d_inode(dentry);
+
+ /* check disk space */
+ if (!check_min_free_space(dentry, count, 0)) {
+ pr_err("No minimum free space.\n");
+ return -ENOSPC;
+ }
+
+ lower_file = sdcardfs_lower_file(file);
+ err = vfs_write(lower_file, buf, count, ppos);
+ /* update our inode times+sizes upon a successful lower write */
+ if (err >= 0) {
+ if (sizeof(loff_t) > sizeof(long))
+ inode_lock(inode);
+ fsstack_copy_inode_size(inode, file_inode(lower_file));
+ fsstack_copy_attr_times(inode, file_inode(lower_file));
+ if (sizeof(loff_t) > sizeof(long))
+ inode_unlock(inode);
+ }
+
+ return err;
+}
+
+static int sdcardfs_readdir(struct file *file, struct dir_context *ctx)
+{
+ int err;
+ struct file *lower_file = NULL;
+ struct dentry *dentry = file->f_path.dentry;
+
+ lower_file = sdcardfs_lower_file(file);
+
+ lower_file->f_pos = file->f_pos;
+ err = iterate_dir(lower_file, ctx);
+ file->f_pos = lower_file->f_pos;
+ if (err >= 0) /* copy the atime */
+ fsstack_copy_attr_atime(d_inode(dentry),
+ file_inode(lower_file));
+ return err;
+}
+
+static long sdcardfs_unlocked_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ long err = -ENOTTY;
+ struct file *lower_file;
+ const struct cred *saved_cred = NULL;
+ struct dentry *dentry = file->f_path.dentry;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+
+ lower_file = sdcardfs_lower_file(file);
+
+ /* XXX: use vfs_ioctl if/when VFS exports it */
+ if (!lower_file || !lower_file->f_op)
+ goto out;
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(sbi, saved_cred, SDCARDFS_I(file_inode(file)));
+
+ if (lower_file->f_op->unlocked_ioctl)
+ err = lower_file->f_op->unlocked_ioctl(lower_file, cmd, arg);
+
+ /* some ioctls can change inode attributes (EXT2_IOC_SETFLAGS) */
+ if (!err)
+ sdcardfs_copy_and_fix_attrs(file_inode(file),
+ file_inode(lower_file));
+ REVERT_CRED(saved_cred);
+out:
+ return err;
+}
+
+#ifdef CONFIG_COMPAT
+static long sdcardfs_compat_ioctl(struct file *file, unsigned int cmd,
+ unsigned long arg)
+{
+ long err = -ENOTTY;
+ struct file *lower_file;
+ const struct cred *saved_cred = NULL;
+ struct dentry *dentry = file->f_path.dentry;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+
+ lower_file = sdcardfs_lower_file(file);
+
+ /* XXX: use vfs_ioctl if/when VFS exports it */
+ if (!lower_file || !lower_file->f_op)
+ goto out;
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(sbi, saved_cred, SDCARDFS_I(file_inode(file)));
+
+ if (lower_file->f_op->compat_ioctl)
+ err = lower_file->f_op->compat_ioctl(lower_file, cmd, arg);
+
+ REVERT_CRED(saved_cred);
+out:
+ return err;
+}
+#endif
+
+static int sdcardfs_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ int err = 0;
+ bool willwrite;
+ struct file *lower_file;
+ const struct vm_operations_struct *saved_vm_ops = NULL;
+
+ /* this might be deferred to mmap's writepage */
+ willwrite = ((vma->vm_flags | VM_SHARED | VM_WRITE) == vma->vm_flags);
+
+ /*
+ * File systems which do not implement ->writepage may use
+ * generic_file_readonly_mmap as their ->mmap op. If you call
+ * generic_file_readonly_mmap with VM_WRITE, you'd get an -EINVAL.
+ * But we cannot call the lower ->mmap op, so we can't tell that
+ * writeable mappings won't work. Therefore, our only choice is to
+ * check if the lower file system supports the ->writepage, and if
+ * not, return EINVAL (the same error that
+ * generic_file_readonly_mmap returns in that case).
+ */
+ lower_file = sdcardfs_lower_file(file);
+ if (willwrite && !lower_file->f_mapping->a_ops->writepage) {
+ err = -EINVAL;
+ pr_err("sdcardfs: lower file system does not support writeable mmap\n");
+ goto out;
+ }
+
+ /*
+ * find and save lower vm_ops.
+ *
+ * XXX: the VFS should have a cleaner way of finding the lower vm_ops
+ */
+ if (!SDCARDFS_F(file)->lower_vm_ops) {
+ err = lower_file->f_op->mmap(lower_file, vma);
+ if (err) {
+ pr_err("sdcardfs: lower mmap failed %d\n", err);
+ goto out;
+ }
+ saved_vm_ops = vma->vm_ops; /* save: came from lower ->mmap */
+ }
+
+ /*
+ * Next 3 lines are all I need from generic_file_mmap. I definitely
+ * don't want its test for ->readpage which returns -ENOEXEC.
+ */
+ file_accessed(file);
+ vma->vm_ops = &sdcardfs_vm_ops;
+
+ file->f_mapping->a_ops = &sdcardfs_aops; /* set our aops */
+ if (!SDCARDFS_F(file)->lower_vm_ops) /* save for our ->fault */
+ SDCARDFS_F(file)->lower_vm_ops = saved_vm_ops;
+ vma->vm_private_data = file;
+ get_file(lower_file);
+ vma->vm_file = lower_file;
+
+out:
+ return err;
+}
+
+static int sdcardfs_open(struct inode *inode, struct file *file)
+{
+ int err = 0;
+ struct file *lower_file = NULL;
+ struct path lower_path;
+ struct dentry *dentry = file->f_path.dentry;
+ struct dentry *parent = dget_parent(dentry);
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+ const struct cred *saved_cred = NULL;
+
+ /* don't open unhashed/deleted files */
+ if (d_unhashed(dentry)) {
+ err = -ENOENT;
+ goto out_err;
+ }
+
+ if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) {
+ err = -EACCES;
+ goto out_err;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(sbi, saved_cred, SDCARDFS_I(inode));
+
+ file->private_data =
+ kzalloc(sizeof(struct sdcardfs_file_info), GFP_KERNEL);
+ if (!SDCARDFS_F(file)) {
+ err = -ENOMEM;
+ goto out_revert_cred;
+ }
+
+ /* open lower object and link sdcardfs's file struct to lower's */
+ sdcardfs_get_lower_path(file->f_path.dentry, &lower_path);
+ lower_file = dentry_open(&lower_path, file->f_flags, current_cred());
+ path_put(&lower_path);
+ if (IS_ERR(lower_file)) {
+ err = PTR_ERR(lower_file);
+ lower_file = sdcardfs_lower_file(file);
+ if (lower_file) {
+ sdcardfs_set_lower_file(file, NULL);
+ fput(lower_file); /* fput calls dput for lower_dentry */
+ }
+ } else {
+ sdcardfs_set_lower_file(file, lower_file);
+ }
+
+ if (err)
+ kfree(SDCARDFS_F(file));
+ else
+ sdcardfs_copy_and_fix_attrs(inode, sdcardfs_lower_inode(inode));
+
+out_revert_cred:
+ REVERT_CRED(saved_cred);
+out_err:
+ dput(parent);
+ return err;
+}
+
+static int sdcardfs_flush(struct file *file, fl_owner_t id)
+{
+ int err = 0;
+ struct file *lower_file = NULL;
+
+ lower_file = sdcardfs_lower_file(file);
+ if (lower_file && lower_file->f_op && lower_file->f_op->flush) {
+ filemap_write_and_wait(file->f_mapping);
+ err = lower_file->f_op->flush(lower_file, id);
+ }
+
+ return err;
+}
+
+/* release all lower object references & free the file info structure */
+static int sdcardfs_file_release(struct inode *inode, struct file *file)
+{
+ struct file *lower_file;
+
+ lower_file = sdcardfs_lower_file(file);
+ if (lower_file) {
+ sdcardfs_set_lower_file(file, NULL);
+ fput(lower_file);
+ }
+
+ kfree(SDCARDFS_F(file));
+ return 0;
+}
+
+static int sdcardfs_fsync(struct file *file, loff_t start, loff_t end,
+ int datasync)
+{
+ int err;
+ struct file *lower_file;
+ struct path lower_path;
+ struct dentry *dentry = file->f_path.dentry;
+
+ err = __generic_file_fsync(file, start, end, datasync);
+ if (err)
+ goto out;
+
+ lower_file = sdcardfs_lower_file(file);
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ err = vfs_fsync_range(lower_file, start, end, datasync);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+out:
+ return err;
+}
+
+static int sdcardfs_fasync(int fd, struct file *file, int flag)
+{
+ int err = 0;
+ struct file *lower_file = NULL;
+
+ lower_file = sdcardfs_lower_file(file);
+ if (lower_file->f_op && lower_file->f_op->fasync)
+ err = lower_file->f_op->fasync(fd, lower_file, flag);
+
+ return err;
+}
+
+/*
+ * Sdcardfs cannot use generic_file_llseek as ->llseek, because it would
+ * only set the offset of the upper file. So we have to implement our
+ * own method to set both the upper and lower file offsets
+ * consistently.
+ */
+static loff_t sdcardfs_file_llseek(struct file *file, loff_t offset, int whence)
+{
+ int err;
+ struct file *lower_file;
+
+ err = generic_file_llseek(file, offset, whence);
+ if (err < 0)
+ goto out;
+
+ lower_file = sdcardfs_lower_file(file);
+ err = generic_file_llseek(lower_file, offset, whence);
+
+out:
+ return err;
+}
+
+/*
+ * Sdcardfs read_iter, redirect modified iocb to lower read_iter
+ */
+ssize_t sdcardfs_read_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+ int err;
+ struct file *file = iocb->ki_filp, *lower_file;
+
+ lower_file = sdcardfs_lower_file(file);
+ if (!lower_file->f_op->read_iter) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ get_file(lower_file); /* prevent lower_file from being released */
+ iocb->ki_filp = lower_file;
+ err = lower_file->f_op->read_iter(iocb, iter);
+ iocb->ki_filp = file;
+ fput(lower_file);
+ /* update upper inode atime as needed */
+ if (err >= 0 || err == -EIOCBQUEUED)
+ fsstack_copy_attr_atime(file->f_path.dentry->d_inode,
+ file_inode(lower_file));
+out:
+ return err;
+}
+
+/*
+ * Sdcardfs write_iter, redirect modified iocb to lower write_iter
+ */
+ssize_t sdcardfs_write_iter(struct kiocb *iocb, struct iov_iter *iter)
+{
+ int err;
+ struct file *file = iocb->ki_filp, *lower_file;
+ struct inode *inode = file->f_path.dentry->d_inode;
+
+ lower_file = sdcardfs_lower_file(file);
+ if (!lower_file->f_op->write_iter) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ get_file(lower_file); /* prevent lower_file from being released */
+ iocb->ki_filp = lower_file;
+ err = lower_file->f_op->write_iter(iocb, iter);
+ iocb->ki_filp = file;
+ fput(lower_file);
+ /* update upper inode times/sizes as needed */
+ if (err >= 0 || err == -EIOCBQUEUED) {
+ if (sizeof(loff_t) > sizeof(long))
+ inode_lock(inode);
+ fsstack_copy_inode_size(inode, file_inode(lower_file));
+ fsstack_copy_attr_times(inode, file_inode(lower_file));
+ if (sizeof(loff_t) > sizeof(long))
+ inode_unlock(inode);
+ }
+out:
+ return err;
+}
+
+const struct file_operations sdcardfs_main_fops = {
+ .llseek = generic_file_llseek,
+ .read = sdcardfs_read,
+ .write = sdcardfs_write,
+ .unlocked_ioctl = sdcardfs_unlocked_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = sdcardfs_compat_ioctl,
+#endif
+ .mmap = sdcardfs_mmap,
+ .open = sdcardfs_open,
+ .flush = sdcardfs_flush,
+ .release = sdcardfs_file_release,
+ .fsync = sdcardfs_fsync,
+ .fasync = sdcardfs_fasync,
+ .read_iter = sdcardfs_read_iter,
+ .write_iter = sdcardfs_write_iter,
+};
+
+/* trimmed directory options */
+const struct file_operations sdcardfs_dir_fops = {
+ .llseek = sdcardfs_file_llseek,
+ .read = generic_read_dir,
+ .iterate = sdcardfs_readdir,
+ .unlocked_ioctl = sdcardfs_unlocked_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = sdcardfs_compat_ioctl,
+#endif
+ .open = sdcardfs_open,
+ .release = sdcardfs_file_release,
+ .flush = sdcardfs_flush,
+ .fsync = sdcardfs_fsync,
+ .fasync = sdcardfs_fasync,
+};
diff --git a/fs/sdcardfs/inode.c b/fs/sdcardfs/inode.c
new file mode 100644
index 0000000..2de5a4d
--- /dev/null
+++ b/fs/sdcardfs/inode.c
@@ -0,0 +1,925 @@
+/*
+ * fs/sdcardfs/inode.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+#include <linux/fs_struct.h>
+#include <linux/ratelimit.h>
+
+/* Do not directly use this function. Use OVERRIDE_CRED() instead. */
+const struct cred *override_fsids(struct sdcardfs_sb_info *sbi,
+ struct sdcardfs_inode_data *data)
+{
+ struct cred *cred;
+ const struct cred *old_cred;
+ uid_t uid;
+
+ cred = prepare_creds();
+ if (!cred)
+ return NULL;
+
+ if (sbi->options.gid_derivation) {
+ if (data->under_obb)
+ uid = AID_MEDIA_OBB;
+ else
+ uid = multiuser_get_uid(data->userid, sbi->options.fs_low_uid);
+ } else {
+ uid = sbi->options.fs_low_uid;
+ }
+ cred->fsuid = make_kuid(&init_user_ns, uid);
+ cred->fsgid = make_kgid(&init_user_ns, sbi->options.fs_low_gid);
+
+ old_cred = override_creds(cred);
+
+ return old_cred;
+}
+
+/* Do not directly use this function, use REVERT_CRED() instead. */
+void revert_fsids(const struct cred *old_cred)
+{
+ const struct cred *cur_cred;
+
+ cur_cred = current->cred;
+ revert_creds(old_cred);
+ put_cred(cur_cred);
+}
+
+static int sdcardfs_create(struct inode *dir, struct dentry *dentry,
+ umode_t mode, bool want_excl)
+{
+ int err;
+ struct dentry *lower_dentry;
+ struct vfsmount *lower_dentry_mnt;
+ struct dentry *lower_parent_dentry = NULL;
+ struct path lower_path;
+ const struct cred *saved_cred = NULL;
+ struct fs_struct *saved_fs;
+ struct fs_struct *copied_fs;
+
+ if (!check_caller_access_to_name(dir, &dentry->d_name)) {
+ err = -EACCES;
+ goto out_eacces;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir));
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ lower_dentry_mnt = lower_path.mnt;
+ lower_parent_dentry = lock_parent(lower_dentry);
+
+ /* set last 16bytes of mode field to 0664 */
+ mode = (mode & S_IFMT) | 00664;
+
+ /* temporarily change umask for lower fs write */
+ saved_fs = current->fs;
+ copied_fs = copy_fs_struct(current->fs);
+ if (!copied_fs) {
+ err = -ENOMEM;
+ goto out_unlock;
+ }
+ current->fs = copied_fs;
+ current->fs->umask = 0;
+ err = vfs_create2(lower_dentry_mnt, d_inode(lower_parent_dentry), lower_dentry, mode, want_excl);
+ if (err)
+ goto out;
+
+ err = sdcardfs_interpose(dentry, dir->i_sb, &lower_path,
+ SDCARDFS_I(dir)->data->userid);
+ if (err)
+ goto out;
+ fsstack_copy_attr_times(dir, sdcardfs_lower_inode(dir));
+ fsstack_copy_inode_size(dir, d_inode(lower_parent_dentry));
+ fixup_lower_ownership(dentry, dentry->d_name.name);
+
+out:
+ current->fs = saved_fs;
+ free_fs_struct(copied_fs);
+out_unlock:
+ unlock_dir(lower_parent_dentry);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ REVERT_CRED(saved_cred);
+out_eacces:
+ return err;
+}
+
+#if 0
+static int sdcardfs_link(struct dentry *old_dentry, struct inode *dir,
+ struct dentry *new_dentry)
+{
+ struct dentry *lower_old_dentry;
+ struct dentry *lower_new_dentry;
+ struct dentry *lower_dir_dentry;
+ u64 file_size_save;
+ int err;
+ struct path lower_old_path, lower_new_path;
+
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb));
+
+ file_size_save = i_size_read(d_inode(old_dentry));
+ sdcardfs_get_lower_path(old_dentry, &lower_old_path);
+ sdcardfs_get_lower_path(new_dentry, &lower_new_path);
+ lower_old_dentry = lower_old_path.dentry;
+ lower_new_dentry = lower_new_path.dentry;
+ lower_dir_dentry = lock_parent(lower_new_dentry);
+
+ err = vfs_link(lower_old_dentry, d_inode(lower_dir_dentry),
+ lower_new_dentry, NULL);
+ if (err || !d_inode(lower_new_dentry))
+ goto out;
+
+ err = sdcardfs_interpose(new_dentry, dir->i_sb, &lower_new_path);
+ if (err)
+ goto out;
+ fsstack_copy_attr_times(dir, d_inode(lower_new_dentry));
+ fsstack_copy_inode_size(dir, d_inode(lower_new_dentry));
+ set_nlink(d_inode(old_dentry),
+ sdcardfs_lower_inode(d_inode(old_dentry))->i_nlink);
+ i_size_write(d_inode(new_dentry), file_size_save);
+out:
+ unlock_dir(lower_dir_dentry);
+ sdcardfs_put_lower_path(old_dentry, &lower_old_path);
+ sdcardfs_put_lower_path(new_dentry, &lower_new_path);
+ REVERT_CRED();
+ return err;
+}
+#endif
+
+static int sdcardfs_unlink(struct inode *dir, struct dentry *dentry)
+{
+ int err;
+ struct dentry *lower_dentry;
+ struct vfsmount *lower_mnt;
+ struct inode *lower_dir_inode = sdcardfs_lower_inode(dir);
+ struct dentry *lower_dir_dentry;
+ struct path lower_path;
+ const struct cred *saved_cred = NULL;
+
+ if (!check_caller_access_to_name(dir, &dentry->d_name)) {
+ err = -EACCES;
+ goto out_eacces;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir));
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ lower_mnt = lower_path.mnt;
+ dget(lower_dentry);
+ lower_dir_dentry = lock_parent(lower_dentry);
+
+ err = vfs_unlink2(lower_mnt, lower_dir_inode, lower_dentry, NULL);
+
+ /*
+ * Note: unlinking on top of NFS can cause silly-renamed files.
+ * Trying to delete such files results in EBUSY from NFS
+ * below. Silly-renamed files will get deleted by NFS later on, so
+ * we just need to detect them here and treat such EBUSY errors as
+ * if the upper file was successfully deleted.
+ */
+ if (err == -EBUSY && lower_dentry->d_flags & DCACHE_NFSFS_RENAMED)
+ err = 0;
+ if (err)
+ goto out;
+ fsstack_copy_attr_times(dir, lower_dir_inode);
+ fsstack_copy_inode_size(dir, lower_dir_inode);
+ set_nlink(d_inode(dentry),
+ sdcardfs_lower_inode(d_inode(dentry))->i_nlink);
+ d_inode(dentry)->i_ctime = dir->i_ctime;
+ d_drop(dentry); /* this is needed, else LTP fails (VFS won't do it) */
+out:
+ unlock_dir(lower_dir_dentry);
+ dput(lower_dentry);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ REVERT_CRED(saved_cred);
+out_eacces:
+ return err;
+}
+
+#if 0
+static int sdcardfs_symlink(struct inode *dir, struct dentry *dentry,
+ const char *symname)
+{
+ int err;
+ struct dentry *lower_dentry;
+ struct dentry *lower_parent_dentry = NULL;
+ struct path lower_path;
+
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb));
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ lower_parent_dentry = lock_parent(lower_dentry);
+
+ err = vfs_symlink(d_inode(lower_parent_dentry), lower_dentry, symname);
+ if (err)
+ goto out;
+ err = sdcardfs_interpose(dentry, dir->i_sb, &lower_path);
+ if (err)
+ goto out;
+ fsstack_copy_attr_times(dir, sdcardfs_lower_inode(dir));
+ fsstack_copy_inode_size(dir, d_inode(lower_parent_dentry));
+
+out:
+ unlock_dir(lower_parent_dentry);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ REVERT_CRED();
+ return err;
+}
+#endif
+
+static int touch(char *abs_path, mode_t mode)
+{
+ struct file *filp = filp_open(abs_path, O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW, mode);
+
+ if (IS_ERR(filp)) {
+ if (PTR_ERR(filp) == -EEXIST) {
+ return 0;
+ } else {
+ pr_err("sdcardfs: failed to open(%s): %ld\n",
+ abs_path, PTR_ERR(filp));
+ return PTR_ERR(filp);
+ }
+ }
+ filp_close(filp, current->files);
+ return 0;
+}
+
+static int sdcardfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
+{
+ int err;
+ int make_nomedia_in_obb = 0;
+ struct dentry *lower_dentry;
+ struct vfsmount *lower_mnt;
+ struct dentry *lower_parent_dentry = NULL;
+ struct dentry *parent_dentry = NULL;
+ struct path lower_path;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+ const struct cred *saved_cred = NULL;
+ struct sdcardfs_inode_data *pd = SDCARDFS_I(dir)->data;
+ int touch_err = 0;
+ struct fs_struct *saved_fs;
+ struct fs_struct *copied_fs;
+ struct qstr q_obb = QSTR_LITERAL("obb");
+ struct qstr q_data = QSTR_LITERAL("data");
+
+ if (!check_caller_access_to_name(dir, &dentry->d_name)) {
+ err = -EACCES;
+ goto out_eacces;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir));
+
+ /* check disk space */
+ parent_dentry = dget_parent(dentry);
+ if (!check_min_free_space(parent_dentry, 0, 1)) {
+ pr_err("sdcardfs: No minimum free space.\n");
+ err = -ENOSPC;
+ dput(parent_dentry);
+ goto out_revert;
+ }
+ dput(parent_dentry);
+
+ /* the lower_dentry is negative here */
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ lower_mnt = lower_path.mnt;
+ lower_parent_dentry = lock_parent(lower_dentry);
+
+ /* set last 16bytes of mode field to 0775 */
+ mode = (mode & S_IFMT) | 00775;
+
+ /* temporarily change umask for lower fs write */
+ saved_fs = current->fs;
+ copied_fs = copy_fs_struct(current->fs);
+ if (!copied_fs) {
+ err = -ENOMEM;
+ unlock_dir(lower_parent_dentry);
+ goto out_unlock;
+ }
+ current->fs = copied_fs;
+ current->fs->umask = 0;
+ err = vfs_mkdir2(lower_mnt, d_inode(lower_parent_dentry), lower_dentry, mode);
+
+ if (err) {
+ unlock_dir(lower_parent_dentry);
+ goto out;
+ }
+
+ /* if it is a local obb dentry, setup it with the base obbpath */
+ if (need_graft_path(dentry)) {
+
+ err = setup_obb_dentry(dentry, &lower_path);
+ if (err) {
+ /* if the sbi->obbpath is not available, the lower_path won't be
+ * changed by setup_obb_dentry() but the lower path is saved to
+ * its orig_path. this dentry will be revalidated later.
+ * but now, the lower_path should be NULL
+ */
+ sdcardfs_put_reset_lower_path(dentry);
+
+ /* the newly created lower path which saved to its orig_path or
+ * the lower_path is the base obbpath.
+ * therefore, an additional path_get is required
+ */
+ path_get(&lower_path);
+ } else
+ make_nomedia_in_obb = 1;
+ }
+
+ err = sdcardfs_interpose(dentry, dir->i_sb, &lower_path, pd->userid);
+ if (err) {
+ unlock_dir(lower_parent_dentry);
+ goto out;
+ }
+
+ fsstack_copy_attr_times(dir, sdcardfs_lower_inode(dir));
+ fsstack_copy_inode_size(dir, d_inode(lower_parent_dentry));
+ /* update number of links on parent directory */
+ set_nlink(dir, sdcardfs_lower_inode(dir)->i_nlink);
+ fixup_lower_ownership(dentry, dentry->d_name.name);
+ unlock_dir(lower_parent_dentry);
+ if ((!sbi->options.multiuser) && (qstr_case_eq(&dentry->d_name, &q_obb))
+ && (pd->perm == PERM_ANDROID) && (pd->userid == 0))
+ make_nomedia_in_obb = 1;
+
+ /* When creating /Android/data and /Android/obb, mark them as .nomedia */
+ if (make_nomedia_in_obb ||
+ ((pd->perm == PERM_ANDROID)
+ && (qstr_case_eq(&dentry->d_name, &q_data)))) {
+ REVERT_CRED(saved_cred);
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(d_inode(dentry)));
+ set_fs_pwd(current->fs, &lower_path);
+ touch_err = touch(".nomedia", 0664);
+ if (touch_err) {
+ pr_err("sdcardfs: failed to create .nomedia in %s: %d\n",
+ lower_path.dentry->d_name.name, touch_err);
+ goto out;
+ }
+ }
+out:
+ current->fs = saved_fs;
+ free_fs_struct(copied_fs);
+out_unlock:
+ sdcardfs_put_lower_path(dentry, &lower_path);
+out_revert:
+ REVERT_CRED(saved_cred);
+out_eacces:
+ return err;
+}
+
+static int sdcardfs_rmdir(struct inode *dir, struct dentry *dentry)
+{
+ struct dentry *lower_dentry;
+ struct dentry *lower_dir_dentry;
+ struct vfsmount *lower_mnt;
+ int err;
+ struct path lower_path;
+ const struct cred *saved_cred = NULL;
+
+ if (!check_caller_access_to_name(dir, &dentry->d_name)) {
+ err = -EACCES;
+ goto out_eacces;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir));
+
+ /* sdcardfs_get_real_lower(): in case of remove an user's obb dentry
+ * the dentry on the original path should be deleted.
+ */
+ sdcardfs_get_real_lower(dentry, &lower_path);
+
+ lower_dentry = lower_path.dentry;
+ lower_mnt = lower_path.mnt;
+ lower_dir_dentry = lock_parent(lower_dentry);
+
+ err = vfs_rmdir2(lower_mnt, d_inode(lower_dir_dentry), lower_dentry);
+ if (err)
+ goto out;
+
+ d_drop(dentry); /* drop our dentry on success (why not VFS's job?) */
+ if (d_inode(dentry))
+ clear_nlink(d_inode(dentry));
+ fsstack_copy_attr_times(dir, d_inode(lower_dir_dentry));
+ fsstack_copy_inode_size(dir, d_inode(lower_dir_dentry));
+ set_nlink(dir, d_inode(lower_dir_dentry)->i_nlink);
+
+out:
+ unlock_dir(lower_dir_dentry);
+ sdcardfs_put_real_lower(dentry, &lower_path);
+ REVERT_CRED(saved_cred);
+out_eacces:
+ return err;
+}
+
+#if 0
+static int sdcardfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode,
+ dev_t dev)
+{
+ int err;
+ struct dentry *lower_dentry;
+ struct dentry *lower_parent_dentry = NULL;
+ struct path lower_path;
+
+ OVERRIDE_CRED(SDCARDFS_SB(dir->i_sb));
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ lower_parent_dentry = lock_parent(lower_dentry);
+
+ err = vfs_mknod(d_inode(lower_parent_dentry), lower_dentry, mode, dev);
+ if (err)
+ goto out;
+
+ err = sdcardfs_interpose(dentry, dir->i_sb, &lower_path);
+ if (err)
+ goto out;
+ fsstack_copy_attr_times(dir, sdcardfs_lower_inode(dir));
+ fsstack_copy_inode_size(dir, d_inode(lower_parent_dentry));
+
+out:
+ unlock_dir(lower_parent_dentry);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ REVERT_CRED();
+ return err;
+}
+#endif
+
+/*
+ * The locking rules in sdcardfs_rename are complex. We could use a simpler
+ * superblock-level name-space lock for renames and copy-ups.
+ */
+static int sdcardfs_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry,
+ unsigned int flags)
+{
+ int err = 0;
+ struct dentry *lower_old_dentry = NULL;
+ struct dentry *lower_new_dentry = NULL;
+ struct dentry *lower_old_dir_dentry = NULL;
+ struct dentry *lower_new_dir_dentry = NULL;
+ struct vfsmount *lower_mnt = NULL;
+ struct dentry *trap = NULL;
+ struct path lower_old_path, lower_new_path;
+ const struct cred *saved_cred = NULL;
+
+ if (flags)
+ return -EINVAL;
+
+ if (!check_caller_access_to_name(old_dir, &old_dentry->d_name) ||
+ !check_caller_access_to_name(new_dir, &new_dentry->d_name)) {
+ err = -EACCES;
+ goto out_eacces;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(SDCARDFS_SB(old_dir->i_sb), saved_cred, SDCARDFS_I(new_dir));
+
+ sdcardfs_get_real_lower(old_dentry, &lower_old_path);
+ sdcardfs_get_lower_path(new_dentry, &lower_new_path);
+ lower_old_dentry = lower_old_path.dentry;
+ lower_new_dentry = lower_new_path.dentry;
+ lower_mnt = lower_old_path.mnt;
+ lower_old_dir_dentry = dget_parent(lower_old_dentry);
+ lower_new_dir_dentry = dget_parent(lower_new_dentry);
+
+ trap = lock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+ /* source should not be ancestor of target */
+ if (trap == lower_old_dentry) {
+ err = -EINVAL;
+ goto out;
+ }
+ /* target should not be ancestor of source */
+ if (trap == lower_new_dentry) {
+ err = -ENOTEMPTY;
+ goto out;
+ }
+
+ err = vfs_rename2(lower_mnt,
+ d_inode(lower_old_dir_dentry), lower_old_dentry,
+ d_inode(lower_new_dir_dentry), lower_new_dentry,
+ NULL, 0);
+ if (err)
+ goto out;
+
+ /* Copy attrs from lower dir, but i_uid/i_gid */
+ sdcardfs_copy_and_fix_attrs(new_dir, d_inode(lower_new_dir_dentry));
+ fsstack_copy_inode_size(new_dir, d_inode(lower_new_dir_dentry));
+
+ if (new_dir != old_dir) {
+ sdcardfs_copy_and_fix_attrs(old_dir, d_inode(lower_old_dir_dentry));
+ fsstack_copy_inode_size(old_dir, d_inode(lower_old_dir_dentry));
+ }
+ get_derived_permission_new(new_dentry->d_parent, old_dentry, &new_dentry->d_name);
+ fixup_tmp_permissions(d_inode(old_dentry));
+ fixup_lower_ownership(old_dentry, new_dentry->d_name.name);
+ d_invalidate(old_dentry); /* Can't fixup ownership recursively :( */
+out:
+ unlock_rename(lower_old_dir_dentry, lower_new_dir_dentry);
+ dput(lower_old_dir_dentry);
+ dput(lower_new_dir_dentry);
+ sdcardfs_put_real_lower(old_dentry, &lower_old_path);
+ sdcardfs_put_lower_path(new_dentry, &lower_new_path);
+ REVERT_CRED(saved_cred);
+out_eacces:
+ return err;
+}
+
+#if 0
+static int sdcardfs_readlink(struct dentry *dentry, char __user *buf, int bufsiz)
+{
+ int err;
+ struct dentry *lower_dentry;
+ struct path lower_path;
+ /* XXX readlink does not requires overriding credential */
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ if (!d_inode(lower_dentry)->i_op ||
+ !d_inode(lower_dentry)->i_op->readlink) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ err = d_inode(lower_dentry)->i_op->readlink(lower_dentry,
+ buf, bufsiz);
+ if (err < 0)
+ goto out;
+ fsstack_copy_attr_atime(d_inode(dentry), d_inode(lower_dentry));
+
+out:
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ return err;
+}
+#endif
+
+#if 0
+static const char *sdcardfs_follow_link(struct dentry *dentry, void **cookie)
+{
+ char *buf;
+ int len = PAGE_SIZE, err;
+ mm_segment_t old_fs;
+
+ /* This is freed by the put_link method assuming a successful call. */
+ buf = kmalloc(len, GFP_KERNEL);
+ if (!buf) {
+ buf = ERR_PTR(-ENOMEM);
+ return buf;
+ }
+
+ /* read the symlink, and then we will follow it */
+ old_fs = get_fs();
+ set_fs(KERNEL_DS);
+ err = sdcardfs_readlink(dentry, buf, len);
+ set_fs(old_fs);
+ if (err < 0) {
+ kfree(buf);
+ buf = ERR_PTR(err);
+ } else {
+ buf[err] = '\0';
+ }
+ return *cookie = buf;
+}
+#endif
+
+static int sdcardfs_permission_wrn(struct inode *inode, int mask)
+{
+ WARN_RATELIMIT(1, "sdcardfs does not support permission. Use permission2.\n");
+ return -EINVAL;
+}
+
+void copy_attrs(struct inode *dest, const struct inode *src)
+{
+ dest->i_mode = src->i_mode;
+ dest->i_uid = src->i_uid;
+ dest->i_gid = src->i_gid;
+ dest->i_rdev = src->i_rdev;
+ dest->i_atime = src->i_atime;
+ dest->i_mtime = src->i_mtime;
+ dest->i_ctime = src->i_ctime;
+ dest->i_blkbits = src->i_blkbits;
+ dest->i_flags = src->i_flags;
+#ifdef CONFIG_FS_POSIX_ACL
+ dest->i_acl = src->i_acl;
+#endif
+#ifdef CONFIG_SECURITY
+ dest->i_security = src->i_security;
+#endif
+}
+
+static int sdcardfs_permission(struct vfsmount *mnt, struct inode *inode, int mask)
+{
+ int err;
+ struct inode tmp;
+ struct sdcardfs_inode_data *top = top_data_get(SDCARDFS_I(inode));
+
+ if (IS_ERR(mnt))
+ return PTR_ERR(mnt);
+
+ if (!top)
+ return -EINVAL;
+
+ /*
+ * Permission check on sdcardfs inode.
+ * Calling process should have AID_SDCARD_RW permission
+ * Since generic_permission only needs i_mode, i_uid,
+ * i_gid, and i_sb, we can create a fake inode to pass
+ * this information down in.
+ *
+ * The underlying code may attempt to take locks in some
+ * cases for features we're not using, but if that changes,
+ * locks must be dealt with to avoid undefined behavior.
+ */
+ copy_attrs(&tmp, inode);
+ tmp.i_uid = make_kuid(&init_user_ns, top->d_uid);
+ tmp.i_gid = make_kgid(&init_user_ns, get_gid(mnt, inode->i_sb, top));
+ tmp.i_mode = (inode->i_mode & S_IFMT)
+ | get_mode(mnt, SDCARDFS_I(inode), top);
+ data_put(top);
+ tmp.i_sb = inode->i_sb;
+ if (IS_POSIXACL(inode))
+ pr_warn("%s: This may be undefined behavior...\n", __func__);
+ err = generic_permission(&tmp, mask);
+ /* XXX
+ * Original sdcardfs code calls inode_permission(lower_inode,.. )
+ * for checking inode permission. But doing such things here seems
+ * duplicated work, because the functions called after this func,
+ * such as vfs_create, vfs_unlink, vfs_rename, and etc,
+ * does exactly same thing, i.e., they calls inode_permission().
+ * So we just let they do the things.
+ * If there are any security hole, just uncomment following if block.
+ */
+#if 0
+ if (!err) {
+ /*
+ * Permission check on lower_inode(=EXT4).
+ * we check it with AID_MEDIA_RW permission
+ */
+ struct inode *lower_inode;
+
+ OVERRIDE_CRED(SDCARDFS_SB(inode->sb));
+
+ lower_inode = sdcardfs_lower_inode(inode);
+ err = inode_permission(lower_inode, mask);
+
+ REVERT_CRED();
+ }
+#endif
+ return err;
+
+}
+
+static int sdcardfs_setattr_wrn(struct dentry *dentry, struct iattr *ia)
+{
+ WARN_RATELIMIT(1, "sdcardfs does not support setattr. User setattr2.\n");
+ return -EINVAL;
+}
+
+static int sdcardfs_setattr(struct vfsmount *mnt, struct dentry *dentry, struct iattr *ia)
+{
+ int err;
+ struct dentry *lower_dentry;
+ struct vfsmount *lower_mnt;
+ struct inode *inode;
+ struct inode *lower_inode;
+ struct path lower_path;
+ struct iattr lower_ia;
+ struct dentry *parent;
+ struct inode tmp;
+ struct dentry tmp_d;
+ struct sdcardfs_inode_data *top;
+
+ const struct cred *saved_cred = NULL;
+
+ inode = d_inode(dentry);
+ top = top_data_get(SDCARDFS_I(inode));
+
+ if (!top)
+ return -EINVAL;
+
+ /*
+ * Permission check on sdcardfs inode.
+ * Calling process should have AID_SDCARD_RW permission
+ * Since generic_permission only needs i_mode, i_uid,
+ * i_gid, and i_sb, we can create a fake inode to pass
+ * this information down in.
+ *
+ * The underlying code may attempt to take locks in some
+ * cases for features we're not using, but if that changes,
+ * locks must be dealt with to avoid undefined behavior.
+ *
+ */
+ copy_attrs(&tmp, inode);
+ tmp.i_uid = make_kuid(&init_user_ns, top->d_uid);
+ tmp.i_gid = make_kgid(&init_user_ns, get_gid(mnt, dentry->d_sb, top));
+ tmp.i_mode = (inode->i_mode & S_IFMT)
+ | get_mode(mnt, SDCARDFS_I(inode), top);
+ tmp.i_size = i_size_read(inode);
+ data_put(top);
+ tmp.i_sb = inode->i_sb;
+ tmp_d.d_inode = &tmp;
+
+ /*
+ * Check if user has permission to change dentry. We don't check if
+ * this user can change the lower inode: that should happen when
+ * calling notify_change on the lower inode.
+ */
+ /* prepare our own lower struct iattr (with the lower file) */
+ memcpy(&lower_ia, ia, sizeof(lower_ia));
+ /* Allow touch updating timestamps. A previous permission check ensures
+ * we have write access. Changes to mode, owner, and group are ignored
+ */
+ ia->ia_valid |= ATTR_FORCE;
+ err = setattr_prepare(&tmp_d, ia);
+
+ if (!err) {
+ /* check the Android group ID */
+ parent = dget_parent(dentry);
+ if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name))
+ err = -EACCES;
+ dput(parent);
+ }
+
+ if (err)
+ goto out_err;
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED(SDCARDFS_SB(dentry->d_sb), saved_cred, SDCARDFS_I(inode));
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ lower_dentry = lower_path.dentry;
+ lower_mnt = lower_path.mnt;
+ lower_inode = sdcardfs_lower_inode(inode);
+
+ if (ia->ia_valid & ATTR_FILE)
+ lower_ia.ia_file = sdcardfs_lower_file(ia->ia_file);
+
+ lower_ia.ia_valid &= ~(ATTR_UID | ATTR_GID | ATTR_MODE);
+
+ /*
+ * If shrinking, first truncate upper level to cancel writing dirty
+ * pages beyond the new eof; and also if its' maxbytes is more
+ * limiting (fail with -EFBIG before making any change to the lower
+ * level). There is no need to vmtruncate the upper level
+ * afterwards in the other cases: we fsstack_copy_inode_size from
+ * the lower level.
+ */
+ if (ia->ia_valid & ATTR_SIZE) {
+ err = inode_newsize_ok(&tmp, ia->ia_size);
+ if (err) {
+ goto out;
+ }
+ truncate_setsize(inode, ia->ia_size);
+ }
+
+ /*
+ * mode change is for clearing setuid/setgid bits. Allow lower fs
+ * to interpret this in its own way.
+ */
+ if (lower_ia.ia_valid & (ATTR_KILL_SUID | ATTR_KILL_SGID))
+ lower_ia.ia_valid &= ~ATTR_MODE;
+
+ /* notify the (possibly copied-up) lower inode */
+ /*
+ * Note: we use d_inode(lower_dentry), because lower_inode may be
+ * unlinked (no inode->i_sb and i_ino==0. This happens if someone
+ * tries to open(), unlink(), then ftruncate() a file.
+ */
+ inode_lock(d_inode(lower_dentry));
+ err = notify_change2(lower_mnt, lower_dentry, &lower_ia, /* note: lower_ia */
+ NULL);
+ inode_unlock(d_inode(lower_dentry));
+ if (err)
+ goto out;
+
+ /* get attributes from the lower inode and update derived permissions */
+ sdcardfs_copy_and_fix_attrs(inode, lower_inode);
+
+ /*
+ * Not running fsstack_copy_inode_size(inode, lower_inode), because
+ * VFS should update our inode size, and notify_change on
+ * lower_inode should update its size.
+ */
+
+out:
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ REVERT_CRED(saved_cred);
+out_err:
+ return err;
+}
+
+static int sdcardfs_fillattr(struct vfsmount *mnt, struct inode *inode,
+ struct kstat *lower_stat, struct kstat *stat)
+{
+ struct sdcardfs_inode_info *info = SDCARDFS_I(inode);
+ struct sdcardfs_inode_data *top = top_data_get(info);
+ struct super_block *sb = inode->i_sb;
+
+ if (!top)
+ return -EINVAL;
+
+ stat->dev = inode->i_sb->s_dev;
+ stat->ino = inode->i_ino;
+ stat->mode = (inode->i_mode & S_IFMT) | get_mode(mnt, info, top);
+ stat->nlink = inode->i_nlink;
+ stat->uid = make_kuid(&init_user_ns, top->d_uid);
+ stat->gid = make_kgid(&init_user_ns, get_gid(mnt, sb, top));
+ stat->rdev = inode->i_rdev;
+ stat->size = lower_stat->size;
+ stat->atime = lower_stat->atime;
+ stat->mtime = lower_stat->mtime;
+ stat->ctime = lower_stat->ctime;
+ stat->blksize = lower_stat->blksize;
+ stat->blocks = lower_stat->blocks;
+ data_put(top);
+ return 0;
+}
+static int sdcardfs_getattr(const struct path *path, struct kstat *stat,
+ u32 request_mask, unsigned int flags)
+{
+ struct vfsmount *mnt = path->mnt;
+ struct dentry *dentry = path->dentry;
+ struct kstat lower_stat;
+ struct path lower_path;
+ struct dentry *parent;
+ int err;
+
+ parent = dget_parent(dentry);
+ if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) {
+ dput(parent);
+ return -EACCES;
+ }
+ dput(parent);
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ err = vfs_getattr(&lower_path, &lower_stat, request_mask, flags);
+ if (err)
+ goto out;
+ sdcardfs_copy_and_fix_attrs(d_inode(dentry),
+ d_inode(lower_path.dentry));
+ err = sdcardfs_fillattr(mnt, d_inode(dentry), &lower_stat, stat);
+out:
+ sdcardfs_put_lower_path(dentry, &lower_path);
+ return err;
+}
+
+const struct inode_operations sdcardfs_symlink_iops = {
+ .permission2 = sdcardfs_permission,
+ .setattr2 = sdcardfs_setattr,
+ /* XXX Following operations are implemented,
+ * but FUSE(sdcard) or FAT does not support them
+ * These methods are *NOT* perfectly tested.
+ .readlink = sdcardfs_readlink,
+ .follow_link = sdcardfs_follow_link,
+ .put_link = kfree_put_link,
+ */
+};
+
+const struct inode_operations sdcardfs_dir_iops = {
+ .create = sdcardfs_create,
+ .lookup = sdcardfs_lookup,
+ .permission = sdcardfs_permission_wrn,
+ .permission2 = sdcardfs_permission,
+ .unlink = sdcardfs_unlink,
+ .mkdir = sdcardfs_mkdir,
+ .rmdir = sdcardfs_rmdir,
+ .rename = sdcardfs_rename,
+ .setattr = sdcardfs_setattr_wrn,
+ .setattr2 = sdcardfs_setattr,
+ .getattr = sdcardfs_getattr,
+ /* XXX Following operations are implemented,
+ * but FUSE(sdcard) or FAT does not support them
+ * These methods are *NOT* perfectly tested.
+ .symlink = sdcardfs_symlink,
+ .link = sdcardfs_link,
+ .mknod = sdcardfs_mknod,
+ */
+};
+
+const struct inode_operations sdcardfs_main_iops = {
+ .permission = sdcardfs_permission_wrn,
+ .permission2 = sdcardfs_permission,
+ .setattr = sdcardfs_setattr_wrn,
+ .setattr2 = sdcardfs_setattr,
+ .getattr = sdcardfs_getattr,
+};
diff --git a/fs/sdcardfs/lookup.c b/fs/sdcardfs/lookup.c
new file mode 100644
index 0000000..2ff3051
--- /dev/null
+++ b/fs/sdcardfs/lookup.c
@@ -0,0 +1,465 @@
+/*
+ * fs/sdcardfs/lookup.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+#include "linux/delay.h"
+
+/* The dentry cache is just so we have properly sized dentries */
+static struct kmem_cache *sdcardfs_dentry_cachep;
+
+int sdcardfs_init_dentry_cache(void)
+{
+ sdcardfs_dentry_cachep =
+ kmem_cache_create("sdcardfs_dentry",
+ sizeof(struct sdcardfs_dentry_info),
+ 0, SLAB_RECLAIM_ACCOUNT, NULL);
+
+ return sdcardfs_dentry_cachep ? 0 : -ENOMEM;
+}
+
+void sdcardfs_destroy_dentry_cache(void)
+{
+ kmem_cache_destroy(sdcardfs_dentry_cachep);
+}
+
+void free_dentry_private_data(struct dentry *dentry)
+{
+ kmem_cache_free(sdcardfs_dentry_cachep, dentry->d_fsdata);
+ dentry->d_fsdata = NULL;
+}
+
+/* allocate new dentry private data */
+int new_dentry_private_data(struct dentry *dentry)
+{
+ struct sdcardfs_dentry_info *info = SDCARDFS_D(dentry);
+
+ /* use zalloc to init dentry_info.lower_path */
+ info = kmem_cache_zalloc(sdcardfs_dentry_cachep, GFP_ATOMIC);
+ if (!info)
+ return -ENOMEM;
+
+ spin_lock_init(&info->lock);
+ dentry->d_fsdata = info;
+
+ return 0;
+}
+
+struct inode_data {
+ struct inode *lower_inode;
+ userid_t id;
+};
+
+static int sdcardfs_inode_test(struct inode *inode, void *candidate_data/*void *candidate_lower_inode*/)
+{
+ struct inode *current_lower_inode = sdcardfs_lower_inode(inode);
+ userid_t current_userid = SDCARDFS_I(inode)->data->userid;
+
+ if (current_lower_inode == ((struct inode_data *)candidate_data)->lower_inode &&
+ current_userid == ((struct inode_data *)candidate_data)->id)
+ return 1; /* found a match */
+ else
+ return 0; /* no match */
+}
+
+static int sdcardfs_inode_set(struct inode *inode, void *lower_inode)
+{
+ /* we do actual inode initialization in sdcardfs_iget */
+ return 0;
+}
+
+struct inode *sdcardfs_iget(struct super_block *sb, struct inode *lower_inode, userid_t id)
+{
+ struct sdcardfs_inode_info *info;
+ struct inode_data data;
+ struct inode *inode; /* the new inode to return */
+
+ if (!igrab(lower_inode))
+ return ERR_PTR(-ESTALE);
+
+ data.id = id;
+ data.lower_inode = lower_inode;
+ inode = iget5_locked(sb, /* our superblock */
+ /*
+ * hashval: we use inode number, but we can
+ * also use "(unsigned long)lower_inode"
+ * instead.
+ */
+ lower_inode->i_ino, /* hashval */
+ sdcardfs_inode_test, /* inode comparison function */
+ sdcardfs_inode_set, /* inode init function */
+ &data); /* data passed to test+set fxns */
+ if (!inode) {
+ iput(lower_inode);
+ return ERR_PTR(-ENOMEM);
+ }
+ /* if found a cached inode, then just return it (after iput) */
+ if (!(inode->i_state & I_NEW)) {
+ iput(lower_inode);
+ return inode;
+ }
+
+ /* initialize new inode */
+ info = SDCARDFS_I(inode);
+
+ inode->i_ino = lower_inode->i_ino;
+ sdcardfs_set_lower_inode(inode, lower_inode);
+
+ inode->i_version++;
+
+ /* use different set of inode ops for symlinks & directories */
+ if (S_ISDIR(lower_inode->i_mode))
+ inode->i_op = &sdcardfs_dir_iops;
+ else if (S_ISLNK(lower_inode->i_mode))
+ inode->i_op = &sdcardfs_symlink_iops;
+ else
+ inode->i_op = &sdcardfs_main_iops;
+
+ /* use different set of file ops for directories */
+ if (S_ISDIR(lower_inode->i_mode))
+ inode->i_fop = &sdcardfs_dir_fops;
+ else
+ inode->i_fop = &sdcardfs_main_fops;
+
+ inode->i_mapping->a_ops = &sdcardfs_aops;
+
+ inode->i_atime.tv_sec = 0;
+ inode->i_atime.tv_nsec = 0;
+ inode->i_mtime.tv_sec = 0;
+ inode->i_mtime.tv_nsec = 0;
+ inode->i_ctime.tv_sec = 0;
+ inode->i_ctime.tv_nsec = 0;
+
+ /* properly initialize special inodes */
+ if (S_ISBLK(lower_inode->i_mode) || S_ISCHR(lower_inode->i_mode) ||
+ S_ISFIFO(lower_inode->i_mode) || S_ISSOCK(lower_inode->i_mode))
+ init_special_inode(inode, lower_inode->i_mode,
+ lower_inode->i_rdev);
+
+ /* all well, copy inode attributes */
+ sdcardfs_copy_and_fix_attrs(inode, lower_inode);
+ fsstack_copy_inode_size(inode, lower_inode);
+
+ unlock_new_inode(inode);
+ return inode;
+}
+
+/*
+ * Helper interpose routine, called directly by ->lookup to handle
+ * spliced dentries.
+ */
+static struct dentry *__sdcardfs_interpose(struct dentry *dentry,
+ struct super_block *sb,
+ struct path *lower_path,
+ userid_t id)
+{
+ struct inode *inode;
+ struct inode *lower_inode;
+ struct super_block *lower_sb;
+ struct dentry *ret_dentry;
+
+ lower_inode = d_inode(lower_path->dentry);
+ lower_sb = sdcardfs_lower_super(sb);
+
+ /* check that the lower file system didn't cross a mount point */
+ if (lower_inode->i_sb != lower_sb) {
+ ret_dentry = ERR_PTR(-EXDEV);
+ goto out;
+ }
+
+ /*
+ * We allocate our new inode below by calling sdcardfs_iget,
+ * which will initialize some of the new inode's fields
+ */
+
+ /* inherit lower inode number for sdcardfs's inode */
+ inode = sdcardfs_iget(sb, lower_inode, id);
+ if (IS_ERR(inode)) {
+ ret_dentry = ERR_CAST(inode);
+ goto out;
+ }
+
+ ret_dentry = d_splice_alias(inode, dentry);
+ dentry = ret_dentry ?: dentry;
+ if (!IS_ERR(dentry))
+ update_derived_permission_lock(dentry);
+out:
+ return ret_dentry;
+}
+
+/*
+ * Connect an sdcardfs inode dentry/inode with several lower ones. This is
+ * the classic stackable file system "vnode interposition" action.
+ *
+ * @dentry: sdcardfs's dentry which interposes on lower one
+ * @sb: sdcardfs's super_block
+ * @lower_path: the lower path (caller does path_get/put)
+ */
+int sdcardfs_interpose(struct dentry *dentry, struct super_block *sb,
+ struct path *lower_path, userid_t id)
+{
+ struct dentry *ret_dentry;
+
+ ret_dentry = __sdcardfs_interpose(dentry, sb, lower_path, id);
+ return PTR_ERR(ret_dentry);
+}
+
+struct sdcardfs_name_data {
+ struct dir_context ctx;
+ const struct qstr *to_find;
+ char *name;
+ bool found;
+};
+
+static int sdcardfs_name_match(struct dir_context *ctx, const char *name,
+ int namelen, loff_t offset, u64 ino, unsigned int d_type)
+{
+ struct sdcardfs_name_data *buf = container_of(ctx, struct sdcardfs_name_data, ctx);
+ struct qstr candidate = QSTR_INIT(name, namelen);
+
+ if (qstr_case_eq(buf->to_find, &candidate)) {
+ memcpy(buf->name, name, namelen);
+ buf->name[namelen] = 0;
+ buf->found = true;
+ return 1;
+ }
+ return 0;
+}
+
+/*
+ * Main driver function for sdcardfs's lookup.
+ *
+ * Returns: NULL (ok), ERR_PTR if an error occurred.
+ * Fills in lower_parent_path with <dentry,mnt> on success.
+ */
+static struct dentry *__sdcardfs_lookup(struct dentry *dentry,
+ unsigned int flags, struct path *lower_parent_path, userid_t id)
+{
+ int err = 0;
+ struct vfsmount *lower_dir_mnt;
+ struct dentry *lower_dir_dentry = NULL;
+ struct dentry *lower_dentry;
+ const struct qstr *name;
+ struct path lower_path;
+ struct qstr dname;
+ struct dentry *ret_dentry = NULL;
+ struct sdcardfs_sb_info *sbi;
+
+ sbi = SDCARDFS_SB(dentry->d_sb);
+ /* must initialize dentry operations */
+ d_set_d_op(dentry, &sdcardfs_ci_dops);
+
+ if (IS_ROOT(dentry))
+ goto out;
+
+ name = &dentry->d_name;
+
+ /* now start the actual lookup procedure */
+ lower_dir_dentry = lower_parent_path->dentry;
+ lower_dir_mnt = lower_parent_path->mnt;
+
+ /* Use vfs_path_lookup to check if the dentry exists or not */
+ err = vfs_path_lookup(lower_dir_dentry, lower_dir_mnt, name->name, 0,
+ &lower_path);
+ /* check for other cases */
+ if (err == -ENOENT) {
+ struct file *file;
+ const struct cred *cred = current_cred();
+
+ struct sdcardfs_name_data buffer = {
+ .ctx.actor = sdcardfs_name_match,
+ .to_find = name,
+ .name = __getname(),
+ .found = false,
+ };
+
+ if (!buffer.name) {
+ err = -ENOMEM;
+ goto out;
+ }
+ file = dentry_open(lower_parent_path, O_RDONLY, cred);
+ if (IS_ERR(file)) {
+ err = PTR_ERR(file);
+ goto put_name;
+ }
+ err = iterate_dir(file, &buffer.ctx);
+ fput(file);
+ if (err)
+ goto put_name;
+
+ if (buffer.found)
+ err = vfs_path_lookup(lower_dir_dentry,
+ lower_dir_mnt,
+ buffer.name, 0,
+ &lower_path);
+ else
+ err = -ENOENT;
+put_name:
+ __putname(buffer.name);
+ }
+
+ /* no error: handle positive dentries */
+ if (!err) {
+ /* check if the dentry is an obb dentry
+ * if true, the lower_inode must be replaced with
+ * the inode of the graft path
+ */
+
+ if (need_graft_path(dentry)) {
+
+ /* setup_obb_dentry()
+ * The lower_path will be stored to the dentry's orig_path
+ * and the base obbpath will be copyed to the lower_path variable.
+ * if an error returned, there's no change in the lower_path
+ * returns: -ERRNO if error (0: no error)
+ */
+ err = setup_obb_dentry(dentry, &lower_path);
+
+ if (err) {
+ /* if the sbi->obbpath is not available, we can optionally
+ * setup the lower_path with its orig_path.
+ * but, the current implementation just returns an error
+ * because the sdcard daemon also regards this case as
+ * a lookup fail.
+ */
+ pr_info("sdcardfs: base obbpath is not available\n");
+ sdcardfs_put_reset_orig_path(dentry);
+ goto out;
+ }
+ }
+
+ sdcardfs_set_lower_path(dentry, &lower_path);
+ ret_dentry =
+ __sdcardfs_interpose(dentry, dentry->d_sb, &lower_path, id);
+ if (IS_ERR(ret_dentry)) {
+ err = PTR_ERR(ret_dentry);
+ /* path_put underlying path on error */
+ sdcardfs_put_reset_lower_path(dentry);
+ }
+ goto out;
+ }
+
+ /*
+ * We don't consider ENOENT an error, and we want to return a
+ * negative dentry.
+ */
+ if (err && err != -ENOENT)
+ goto out;
+
+ /* instatiate a new negative dentry */
+ dname.name = name->name;
+ dname.len = name->len;
+
+ /* See if the low-level filesystem might want
+ * to use its own hash
+ */
+ lower_dentry = d_hash_and_lookup(lower_dir_dentry, &dname);
+ if (IS_ERR(lower_dentry))
+ return lower_dentry;
+
+ if (!lower_dentry) {
+ /* We called vfs_path_lookup earlier, and did not get a negative
+ * dentry then. Don't confuse the lower filesystem by forcing
+ * one on it now...
+ */
+ err = -ENOENT;
+ goto out;
+ }
+
+ lower_path.dentry = lower_dentry;
+ lower_path.mnt = mntget(lower_dir_mnt);
+ sdcardfs_set_lower_path(dentry, &lower_path);
+
+ /*
+ * If the intent is to create a file, then don't return an error, so
+ * the VFS will continue the process of making this negative dentry
+ * into a positive one.
+ */
+ if (flags & (LOOKUP_CREATE|LOOKUP_RENAME_TARGET))
+ err = 0;
+
+out:
+ if (err)
+ return ERR_PTR(err);
+ return ret_dentry;
+}
+
+/*
+ * On success:
+ * fills dentry object appropriate values and returns NULL.
+ * On fail (== error)
+ * returns error ptr
+ *
+ * @dir : Parent inode.
+ * @dentry : Target dentry to lookup. we should set each of fields.
+ * (dentry->d_name is initialized already)
+ * @nd : nameidata of parent inode
+ */
+struct dentry *sdcardfs_lookup(struct inode *dir, struct dentry *dentry,
+ unsigned int flags)
+{
+ struct dentry *ret = NULL, *parent;
+ struct path lower_parent_path;
+ int err = 0;
+ const struct cred *saved_cred = NULL;
+
+ parent = dget_parent(dentry);
+
+ if (!check_caller_access_to_name(d_inode(parent), &dentry->d_name)) {
+ ret = ERR_PTR(-EACCES);
+ goto out_err;
+ }
+
+ /* save current_cred and override it */
+ OVERRIDE_CRED_PTR(SDCARDFS_SB(dir->i_sb), saved_cred, SDCARDFS_I(dir));
+
+ sdcardfs_get_lower_path(parent, &lower_parent_path);
+
+ /* allocate dentry private data. We free it in ->d_release */
+ err = new_dentry_private_data(dentry);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto out;
+ }
+
+ ret = __sdcardfs_lookup(dentry, flags, &lower_parent_path,
+ SDCARDFS_I(dir)->data->userid);
+ if (IS_ERR(ret))
+ goto out;
+ if (ret)
+ dentry = ret;
+ if (d_inode(dentry)) {
+ fsstack_copy_attr_times(d_inode(dentry),
+ sdcardfs_lower_inode(d_inode(dentry)));
+ /* get derived permission */
+ get_derived_permission(parent, dentry);
+ fixup_tmp_permissions(d_inode(dentry));
+ fixup_lower_ownership(dentry, dentry->d_name.name);
+ }
+ /* update parent directory's atime */
+ fsstack_copy_attr_atime(d_inode(parent),
+ sdcardfs_lower_inode(d_inode(parent)));
+
+out:
+ sdcardfs_put_lower_path(parent, &lower_parent_path);
+ REVERT_CRED(saved_cred);
+out_err:
+ dput(parent);
+ return ret;
+}
diff --git a/fs/sdcardfs/main.c b/fs/sdcardfs/main.c
new file mode 100644
index 0000000..27ec726
--- /dev/null
+++ b/fs/sdcardfs/main.c
@@ -0,0 +1,497 @@
+/*
+ * fs/sdcardfs/main.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/parser.h>
+
+enum {
+ Opt_fsuid,
+ Opt_fsgid,
+ Opt_gid,
+ Opt_debug,
+ Opt_mask,
+ Opt_multiuser,
+ Opt_userid,
+ Opt_reserved_mb,
+ Opt_gid_derivation,
+ Opt_default_normal,
+ Opt_err,
+};
+
+static const match_table_t sdcardfs_tokens = {
+ {Opt_fsuid, "fsuid=%u"},
+ {Opt_fsgid, "fsgid=%u"},
+ {Opt_gid, "gid=%u"},
+ {Opt_debug, "debug"},
+ {Opt_mask, "mask=%u"},
+ {Opt_userid, "userid=%d"},
+ {Opt_multiuser, "multiuser"},
+ {Opt_gid_derivation, "derive_gid"},
+ {Opt_default_normal, "default_normal"},
+ {Opt_reserved_mb, "reserved_mb=%u"},
+ {Opt_err, NULL}
+};
+
+static int parse_options(struct super_block *sb, char *options, int silent,
+ int *debug, struct sdcardfs_vfsmount_options *vfsopts,
+ struct sdcardfs_mount_options *opts)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+ int option;
+
+ /* by default, we use AID_MEDIA_RW as uid, gid */
+ opts->fs_low_uid = AID_MEDIA_RW;
+ opts->fs_low_gid = AID_MEDIA_RW;
+ vfsopts->mask = 0;
+ opts->multiuser = false;
+ opts->fs_user_id = 0;
+ vfsopts->gid = 0;
+ /* by default, 0MB is reserved */
+ opts->reserved_mb = 0;
+ /* by default, gid derivation is off */
+ opts->gid_derivation = false;
+ opts->default_normal = false;
+
+ *debug = 0;
+
+ if (!options)
+ return 0;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+
+ if (!*p)
+ continue;
+
+ token = match_token(p, sdcardfs_tokens, args);
+
+ switch (token) {
+ case Opt_debug:
+ *debug = 1;
+ break;
+ case Opt_fsuid:
+ if (match_int(&args[0], &option))
+ return 0;
+ opts->fs_low_uid = option;
+ break;
+ case Opt_fsgid:
+ if (match_int(&args[0], &option))
+ return 0;
+ opts->fs_low_gid = option;
+ break;
+ case Opt_gid:
+ if (match_int(&args[0], &option))
+ return 0;
+ vfsopts->gid = option;
+ break;
+ case Opt_userid:
+ if (match_int(&args[0], &option))
+ return 0;
+ opts->fs_user_id = option;
+ break;
+ case Opt_mask:
+ if (match_int(&args[0], &option))
+ return 0;
+ vfsopts->mask = option;
+ break;
+ case Opt_multiuser:
+ opts->multiuser = true;
+ break;
+ case Opt_reserved_mb:
+ if (match_int(&args[0], &option))
+ return 0;
+ opts->reserved_mb = option;
+ break;
+ case Opt_gid_derivation:
+ opts->gid_derivation = true;
+ break;
+ case Opt_default_normal:
+ opts->default_normal = true;
+ break;
+ /* unknown option */
+ default:
+ if (!silent)
+ pr_err("Unrecognized mount option \"%s\" or missing value", p);
+ return -EINVAL;
+ }
+ }
+
+ if (*debug) {
+ pr_info("sdcardfs : options - debug:%d\n", *debug);
+ pr_info("sdcardfs : options - uid:%d\n",
+ opts->fs_low_uid);
+ pr_info("sdcardfs : options - gid:%d\n",
+ opts->fs_low_gid);
+ }
+
+ return 0;
+}
+
+int parse_options_remount(struct super_block *sb, char *options, int silent,
+ struct sdcardfs_vfsmount_options *vfsopts)
+{
+ char *p;
+ substring_t args[MAX_OPT_ARGS];
+ int option;
+ int debug;
+
+ if (!options)
+ return 0;
+
+ while ((p = strsep(&options, ",")) != NULL) {
+ int token;
+
+ if (!*p)
+ continue;
+
+ token = match_token(p, sdcardfs_tokens, args);
+
+ switch (token) {
+ case Opt_debug:
+ debug = 1;
+ break;
+ case Opt_gid:
+ if (match_int(&args[0], &option))
+ return 0;
+ vfsopts->gid = option;
+
+ break;
+ case Opt_mask:
+ if (match_int(&args[0], &option))
+ return 0;
+ vfsopts->mask = option;
+ break;
+ case Opt_default_normal:
+ case Opt_multiuser:
+ case Opt_userid:
+ case Opt_fsuid:
+ case Opt_fsgid:
+ case Opt_reserved_mb:
+ pr_warn("Option \"%s\" can't be changed during remount\n", p);
+ break;
+ /* unknown option */
+ default:
+ if (!silent)
+ pr_err("Unrecognized mount option \"%s\" or missing value", p);
+ return -EINVAL;
+ }
+ }
+
+ if (debug) {
+ pr_info("sdcardfs : options - debug:%d\n", debug);
+ pr_info("sdcardfs : options - gid:%d\n", vfsopts->gid);
+ pr_info("sdcardfs : options - mask:%d\n", vfsopts->mask);
+ }
+
+ return 0;
+}
+
+#if 0
+/*
+ * our custom d_alloc_root work-alike
+ *
+ * we can't use d_alloc_root if we want to use our own interpose function
+ * unchanged, so we simply call our own "fake" d_alloc_root
+ */
+static struct dentry *sdcardfs_d_alloc_root(struct super_block *sb)
+{
+ struct dentry *ret = NULL;
+
+ if (sb) {
+ static const struct qstr name = {
+ .name = "/",
+ .len = 1
+ };
+
+ ret = d_alloc(NULL, &name);
+ if (ret) {
+ d_set_d_op(ret, &sdcardfs_ci_dops);
+ ret->d_sb = sb;
+ ret->d_parent = ret;
+ }
+ }
+ return ret;
+}
+#endif
+
+DEFINE_MUTEX(sdcardfs_super_list_lock);
+EXPORT_SYMBOL_GPL(sdcardfs_super_list_lock);
+LIST_HEAD(sdcardfs_super_list);
+EXPORT_SYMBOL_GPL(sdcardfs_super_list);
+
+/*
+ * There is no need to lock the sdcardfs_super_info's rwsem as there is no
+ * way anyone can have a reference to the superblock at this point in time.
+ */
+static int sdcardfs_read_super(struct vfsmount *mnt, struct super_block *sb,
+ const char *dev_name, void *raw_data, int silent)
+{
+ int err = 0;
+ int debug;
+ struct super_block *lower_sb;
+ struct path lower_path;
+ struct sdcardfs_sb_info *sb_info;
+ struct sdcardfs_vfsmount_options *mnt_opt = mnt->data;
+ struct inode *inode;
+
+ pr_info("sdcardfs version 2.0\n");
+
+ if (!dev_name) {
+ pr_err("sdcardfs: read_super: missing dev_name argument\n");
+ err = -EINVAL;
+ goto out;
+ }
+
+ pr_info("sdcardfs: dev_name -> %s\n", dev_name);
+ pr_info("sdcardfs: options -> %s\n", (char *)raw_data);
+ pr_info("sdcardfs: mnt -> %p\n", mnt);
+
+ /* parse lower path */
+ err = kern_path(dev_name, LOOKUP_FOLLOW | LOOKUP_DIRECTORY,
+ &lower_path);
+ if (err) {
+ pr_err("sdcardfs: error accessing lower directory '%s'\n", dev_name);
+ goto out;
+ }
+
+ /* allocate superblock private data */
+ sb->s_fs_info = kzalloc(sizeof(struct sdcardfs_sb_info), GFP_KERNEL);
+ if (!SDCARDFS_SB(sb)) {
+ pr_crit("sdcardfs: read_super: out of memory\n");
+ err = -ENOMEM;
+ goto out_free;
+ }
+
+ sb_info = sb->s_fs_info;
+ /* parse options */
+ err = parse_options(sb, raw_data, silent, &debug, mnt_opt, &sb_info->options);
+ if (err) {
+ pr_err("sdcardfs: invalid options\n");
+ goto out_freesbi;
+ }
+
+ /* set the lower superblock field of upper superblock */
+ lower_sb = lower_path.dentry->d_sb;
+ atomic_inc(&lower_sb->s_active);
+ sdcardfs_set_lower_super(sb, lower_sb);
+
+ sb->s_stack_depth = lower_sb->s_stack_depth + 1;
+ if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
+ pr_err("sdcardfs: maximum fs stacking depth exceeded\n");
+ err = -EINVAL;
+ goto out_sput;
+ }
+
+ /* inherit maxbytes from lower file system */
+ sb->s_maxbytes = lower_sb->s_maxbytes;
+
+ /*
+ * Our c/m/atime granularity is 1 ns because we may stack on file
+ * systems whose granularity is as good.
+ */
+ sb->s_time_gran = 1;
+
+ sb->s_magic = SDCARDFS_SUPER_MAGIC;
+ sb->s_op = &sdcardfs_sops;
+
+ /* get a new inode and allocate our root dentry */
+ inode = sdcardfs_iget(sb, d_inode(lower_path.dentry), 0);
+ if (IS_ERR(inode)) {
+ err = PTR_ERR(inode);
+ goto out_sput;
+ }
+ sb->s_root = d_make_root(inode);
+ if (!sb->s_root) {
+ err = -ENOMEM;
+ goto out_sput;
+ }
+ d_set_d_op(sb->s_root, &sdcardfs_ci_dops);
+
+ /* link the upper and lower dentries */
+ sb->s_root->d_fsdata = NULL;
+ err = new_dentry_private_data(sb->s_root);
+ if (err)
+ goto out_freeroot;
+
+ /* set the lower dentries for s_root */
+ sdcardfs_set_lower_path(sb->s_root, &lower_path);
+
+ /*
+ * No need to call interpose because we already have a positive
+ * dentry, which was instantiated by d_make_root. Just need to
+ * d_rehash it.
+ */
+ d_rehash(sb->s_root);
+
+ /* setup permission policy */
+ sb_info->obbpath_s = kzalloc(PATH_MAX, GFP_KERNEL);
+ mutex_lock(&sdcardfs_super_list_lock);
+ if (sb_info->options.multiuser) {
+ setup_derived_state(d_inode(sb->s_root), PERM_PRE_ROOT,
+ sb_info->options.fs_user_id, AID_ROOT);
+ snprintf(sb_info->obbpath_s, PATH_MAX, "%s/obb", dev_name);
+ } else {
+ setup_derived_state(d_inode(sb->s_root), PERM_ROOT,
+ sb_info->options.fs_user_id, AID_ROOT);
+ snprintf(sb_info->obbpath_s, PATH_MAX, "%s/Android/obb", dev_name);
+ }
+ fixup_tmp_permissions(d_inode(sb->s_root));
+ sb_info->sb = sb;
+ list_add(&sb_info->list, &sdcardfs_super_list);
+ mutex_unlock(&sdcardfs_super_list_lock);
+
+ if (!silent)
+ pr_info("sdcardfs: mounted on top of %s type %s\n",
+ dev_name, lower_sb->s_type->name);
+ goto out; /* all is well */
+
+ /* no longer needed: free_dentry_private_data(sb->s_root); */
+out_freeroot:
+ dput(sb->s_root);
+ sb->s_root = NULL;
+out_sput:
+ /* drop refs we took earlier */
+ atomic_dec(&lower_sb->s_active);
+out_freesbi:
+ kfree(SDCARDFS_SB(sb));
+ sb->s_fs_info = NULL;
+out_free:
+ path_put(&lower_path);
+
+out:
+ return err;
+}
+
+struct sdcardfs_mount_private {
+ struct vfsmount *mnt;
+ const char *dev_name;
+ void *raw_data;
+};
+
+static int __sdcardfs_fill_super(
+ struct super_block *sb,
+ void *_priv, int silent)
+{
+ struct sdcardfs_mount_private *priv = _priv;
+
+ return sdcardfs_read_super(priv->mnt,
+ sb, priv->dev_name, priv->raw_data, silent);
+}
+
+static struct dentry *sdcardfs_mount(struct vfsmount *mnt,
+ struct file_system_type *fs_type, int flags,
+ const char *dev_name, void *raw_data)
+{
+ struct sdcardfs_mount_private priv = {
+ .mnt = mnt,
+ .dev_name = dev_name,
+ .raw_data = raw_data
+ };
+
+ return mount_nodev(fs_type, flags,
+ &priv, __sdcardfs_fill_super);
+}
+
+static struct dentry *sdcardfs_mount_wrn(struct file_system_type *fs_type,
+ int flags, const char *dev_name, void *raw_data)
+{
+ WARN(1, "sdcardfs does not support mount. Use mount2.\n");
+ return ERR_PTR(-EINVAL);
+}
+
+void *sdcardfs_alloc_mnt_data(void)
+{
+ return kmalloc(sizeof(struct sdcardfs_vfsmount_options), GFP_KERNEL);
+}
+
+void sdcardfs_kill_sb(struct super_block *sb)
+{
+ struct sdcardfs_sb_info *sbi;
+
+ if (sb->s_magic == SDCARDFS_SUPER_MAGIC && sb->s_fs_info) {
+ sbi = SDCARDFS_SB(sb);
+ mutex_lock(&sdcardfs_super_list_lock);
+ list_del(&sbi->list);
+ mutex_unlock(&sdcardfs_super_list_lock);
+ }
+ kill_anon_super(sb);
+}
+
+static struct file_system_type sdcardfs_fs_type = {
+ .owner = THIS_MODULE,
+ .name = SDCARDFS_NAME,
+ .mount = sdcardfs_mount_wrn,
+ .mount2 = sdcardfs_mount,
+ .alloc_mnt_data = sdcardfs_alloc_mnt_data,
+ .kill_sb = sdcardfs_kill_sb,
+ .fs_flags = 0,
+};
+MODULE_ALIAS_FS(SDCARDFS_NAME);
+
+static int __init init_sdcardfs_fs(void)
+{
+ int err;
+
+ pr_info("Registering sdcardfs " SDCARDFS_VERSION "\n");
+
+ err = sdcardfs_init_inode_cache();
+ if (err)
+ goto out;
+ err = sdcardfs_init_dentry_cache();
+ if (err)
+ goto out;
+ err = packagelist_init();
+ if (err)
+ goto out;
+ err = register_filesystem(&sdcardfs_fs_type);
+out:
+ if (err) {
+ sdcardfs_destroy_inode_cache();
+ sdcardfs_destroy_dentry_cache();
+ packagelist_exit();
+ }
+ return err;
+}
+
+static void __exit exit_sdcardfs_fs(void)
+{
+ sdcardfs_destroy_inode_cache();
+ sdcardfs_destroy_dentry_cache();
+ packagelist_exit();
+ unregister_filesystem(&sdcardfs_fs_type);
+ pr_info("Completed sdcardfs module unload\n");
+}
+
+/* Original wrapfs authors */
+MODULE_AUTHOR("Erez Zadok, Filesystems and Storage Lab, Stony Brook University (http://www.fsl.cs.sunysb.edu/)");
+
+/* Original sdcardfs authors */
+MODULE_AUTHOR("Woojoong Lee, Daeho Jeong, Kitae Lee, Yeongjin Gil System Memory Lab., Samsung Electronics");
+
+/* Current maintainer */
+MODULE_AUTHOR("Daniel Rosenberg, Google");
+MODULE_DESCRIPTION("Sdcardfs " SDCARDFS_VERSION);
+MODULE_LICENSE("GPL");
+
+module_init(init_sdcardfs_fs);
+module_exit(exit_sdcardfs_fs);
diff --git a/fs/sdcardfs/mmap.c b/fs/sdcardfs/mmap.c
new file mode 100644
index 0000000..2847c0e
--- /dev/null
+++ b/fs/sdcardfs/mmap.c
@@ -0,0 +1,87 @@
+/*
+ * fs/sdcardfs/mmap.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+
+static int sdcardfs_fault(struct vm_fault *vmf)
+{
+ int err;
+ struct file *file;
+ const struct vm_operations_struct *lower_vm_ops;
+
+ file = (struct file *)vmf->vma->vm_private_data;
+ lower_vm_ops = SDCARDFS_F(file)->lower_vm_ops;
+ BUG_ON(!lower_vm_ops);
+
+ err = lower_vm_ops->fault(vmf);
+ return err;
+}
+
+static void sdcardfs_vm_open(struct vm_area_struct *vma)
+{
+ struct file *file = (struct file *)vma->vm_private_data;
+
+ get_file(file);
+}
+
+static void sdcardfs_vm_close(struct vm_area_struct *vma)
+{
+ struct file *file = (struct file *)vma->vm_private_data;
+
+ fput(file);
+}
+
+static int sdcardfs_page_mkwrite(struct vm_fault *vmf)
+{
+ int err = 0;
+ struct file *file;
+ const struct vm_operations_struct *lower_vm_ops;
+
+ file = (struct file *)vmf->vma->vm_private_data;
+ lower_vm_ops = SDCARDFS_F(file)->lower_vm_ops;
+ BUG_ON(!lower_vm_ops);
+ if (!lower_vm_ops->page_mkwrite)
+ goto out;
+
+ err = lower_vm_ops->page_mkwrite(vmf);
+out:
+ return err;
+}
+
+static ssize_t sdcardfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+{
+ /*
+ * This function should never be called directly. We need it
+ * to exist, to get past a check in open_check_o_direct(),
+ * which is called from do_last().
+ */
+ return -EINVAL;
+}
+
+const struct address_space_operations sdcardfs_aops = {
+ .direct_IO = sdcardfs_direct_IO,
+};
+
+const struct vm_operations_struct sdcardfs_vm_ops = {
+ .fault = sdcardfs_fault,
+ .page_mkwrite = sdcardfs_page_mkwrite,
+ .open = sdcardfs_vm_open,
+ .close = sdcardfs_vm_close,
+};
diff --git a/fs/sdcardfs/multiuser.h b/fs/sdcardfs/multiuser.h
new file mode 100644
index 0000000..85341e7
--- /dev/null
+++ b/fs/sdcardfs/multiuser.h
@@ -0,0 +1,53 @@
+/*
+ * fs/sdcardfs/multiuser.h
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#define AID_USER_OFFSET 100000 /* offset for uid ranges for each user */
+#define AID_APP_START 10000 /* first app user */
+#define AID_APP_END 19999 /* last app user */
+#define AID_CACHE_GID_START 20000 /* start of gids for apps to mark cached data */
+#define AID_EXT_GID_START 30000 /* start of gids for apps to mark external data */
+#define AID_EXT_CACHE_GID_START 40000 /* start of gids for apps to mark external cached data */
+#define AID_EXT_CACHE_GID_END 49999 /* end of gids for apps to mark external cached data */
+#define AID_SHARED_GID_START 50000 /* start of gids for apps in each user to share */
+
+typedef uid_t userid_t;
+typedef uid_t appid_t;
+
+static inline uid_t multiuser_get_uid(userid_t user_id, appid_t app_id)
+{
+ return (user_id * AID_USER_OFFSET) + (app_id % AID_USER_OFFSET);
+}
+
+static inline bool uid_is_app(uid_t uid)
+{
+ appid_t appid = uid % AID_USER_OFFSET;
+
+ return appid >= AID_APP_START && appid <= AID_APP_END;
+}
+
+static inline gid_t multiuser_get_ext_cache_gid(uid_t uid)
+{
+ return uid - AID_APP_START + AID_EXT_CACHE_GID_START;
+}
+
+static inline gid_t multiuser_get_ext_gid(uid_t uid)
+{
+ return uid - AID_APP_START + AID_EXT_GID_START;
+}
diff --git a/fs/sdcardfs/packagelist.c b/fs/sdcardfs/packagelist.c
new file mode 100644
index 0000000..4b9a563
--- /dev/null
+++ b/fs/sdcardfs/packagelist.c
@@ -0,0 +1,882 @@
+/*
+ * fs/sdcardfs/packagelist.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+#include <linux/hashtable.h>
+#include <linux/ctype.h>
+#include <linux/delay.h>
+#include <linux/radix-tree.h>
+#include <linux/dcache.h>
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include <linux/configfs.h>
+
+struct hashtable_entry {
+ struct hlist_node hlist;
+ struct hlist_node dlist; /* for deletion cleanup */
+ struct qstr key;
+ atomic_t value;
+};
+
+static DEFINE_HASHTABLE(package_to_appid, 8);
+static DEFINE_HASHTABLE(package_to_userid, 8);
+static DEFINE_HASHTABLE(ext_to_groupid, 8);
+
+
+static struct kmem_cache *hashtable_entry_cachep;
+
+static unsigned int full_name_case_hash(const void *salt, const unsigned char *name, unsigned int len)
+{
+ unsigned long hash = init_name_hash(salt);
+
+ while (len--)
+ hash = partial_name_hash(tolower(*name++), hash);
+ return end_name_hash(hash);
+}
+
+static inline void qstr_init(struct qstr *q, const char *name)
+{
+ q->name = name;
+ q->len = strlen(q->name);
+ q->hash = full_name_case_hash(0, q->name, q->len);
+}
+
+static inline int qstr_copy(const struct qstr *src, struct qstr *dest)
+{
+ dest->name = kstrdup(src->name, GFP_KERNEL);
+ dest->hash_len = src->hash_len;
+ return !!dest->name;
+}
+
+
+static appid_t __get_appid(const struct qstr *key)
+{
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = key->hash;
+ appid_t ret_id;
+
+ rcu_read_lock();
+ hash_for_each_possible_rcu(package_to_appid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key)) {
+ ret_id = atomic_read(&hash_cur->value);
+ rcu_read_unlock();
+ return ret_id;
+ }
+ }
+ rcu_read_unlock();
+ return 0;
+}
+
+appid_t get_appid(const char *key)
+{
+ struct qstr q;
+
+ qstr_init(&q, key);
+ return __get_appid(&q);
+}
+
+static appid_t __get_ext_gid(const struct qstr *key)
+{
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = key->hash;
+ appid_t ret_id;
+
+ rcu_read_lock();
+ hash_for_each_possible_rcu(ext_to_groupid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key)) {
+ ret_id = atomic_read(&hash_cur->value);
+ rcu_read_unlock();
+ return ret_id;
+ }
+ }
+ rcu_read_unlock();
+ return 0;
+}
+
+appid_t get_ext_gid(const char *key)
+{
+ struct qstr q;
+
+ qstr_init(&q, key);
+ return __get_ext_gid(&q);
+}
+
+static appid_t __is_excluded(const struct qstr *app_name, userid_t user)
+{
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = app_name->hash;
+
+ rcu_read_lock();
+ hash_for_each_possible_rcu(package_to_userid, hash_cur, hlist, hash) {
+ if (atomic_read(&hash_cur->value) == user &&
+ qstr_case_eq(app_name, &hash_cur->key)) {
+ rcu_read_unlock();
+ return 1;
+ }
+ }
+ rcu_read_unlock();
+ return 0;
+}
+
+appid_t is_excluded(const char *key, userid_t user)
+{
+ struct qstr q;
+ qstr_init(&q, key);
+ return __is_excluded(&q, user);
+}
+
+/* Kernel has already enforced everything we returned through
+ * derive_permissions_locked(), so this is used to lock down access
+ * even further, such as enforcing that apps hold sdcard_rw.
+ */
+int check_caller_access_to_name(struct inode *parent_node, const struct qstr *name)
+{
+ struct qstr q_autorun = QSTR_LITERAL("autorun.inf");
+ struct qstr q__android_secure = QSTR_LITERAL(".android_secure");
+ struct qstr q_android_secure = QSTR_LITERAL("android_secure");
+
+ /* Always block security-sensitive files at root */
+ if (parent_node && SDCARDFS_I(parent_node)->data->perm == PERM_ROOT) {
+ if (qstr_case_eq(name, &q_autorun)
+ || qstr_case_eq(name, &q__android_secure)
+ || qstr_case_eq(name, &q_android_secure)) {
+ return 0;
+ }
+ }
+
+ /* Root always has access; access for any other UIDs should always
+ * be controlled through packages.list.
+ */
+ if (from_kuid(&init_user_ns, current_fsuid()) == 0)
+ return 1;
+
+ /* No extra permissions to enforce */
+ return 1;
+}
+
+static struct hashtable_entry *alloc_hashtable_entry(const struct qstr *key,
+ appid_t value)
+{
+ struct hashtable_entry *ret = kmem_cache_alloc(hashtable_entry_cachep,
+ GFP_KERNEL);
+ if (!ret)
+ return NULL;
+ INIT_HLIST_NODE(&ret->dlist);
+ INIT_HLIST_NODE(&ret->hlist);
+
+ if (!qstr_copy(key, &ret->key)) {
+ kmem_cache_free(hashtable_entry_cachep, ret);
+ return NULL;
+ }
+
+ atomic_set(&ret->value, value);
+ return ret;
+}
+
+static int insert_packagelist_appid_entry_locked(const struct qstr *key, appid_t value)
+{
+ struct hashtable_entry *hash_cur;
+ struct hashtable_entry *new_entry;
+ unsigned int hash = key->hash;
+
+ hash_for_each_possible_rcu(package_to_appid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key)) {
+ atomic_set(&hash_cur->value, value);
+ return 0;
+ }
+ }
+ new_entry = alloc_hashtable_entry(key, value);
+ if (!new_entry)
+ return -ENOMEM;
+ hash_add_rcu(package_to_appid, &new_entry->hlist, hash);
+ return 0;
+}
+
+static int insert_ext_gid_entry_locked(const struct qstr *key, appid_t value)
+{
+ struct hashtable_entry *hash_cur;
+ struct hashtable_entry *new_entry;
+ unsigned int hash = key->hash;
+
+ /* An extension can only belong to one gid */
+ hash_for_each_possible_rcu(ext_to_groupid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key))
+ return -EINVAL;
+ }
+ new_entry = alloc_hashtable_entry(key, value);
+ if (!new_entry)
+ return -ENOMEM;
+ hash_add_rcu(ext_to_groupid, &new_entry->hlist, hash);
+ return 0;
+}
+
+static int insert_userid_exclude_entry_locked(const struct qstr *key, userid_t value)
+{
+ struct hashtable_entry *hash_cur;
+ struct hashtable_entry *new_entry;
+ unsigned int hash = key->hash;
+
+ /* Only insert if not already present */
+ hash_for_each_possible_rcu(package_to_userid, hash_cur, hlist, hash) {
+ if (atomic_read(&hash_cur->value) == value &&
+ qstr_case_eq(key, &hash_cur->key))
+ return 0;
+ }
+ new_entry = alloc_hashtable_entry(key, value);
+ if (!new_entry)
+ return -ENOMEM;
+ hash_add_rcu(package_to_userid, &new_entry->hlist, hash);
+ return 0;
+}
+
+static void fixup_all_perms_name(const struct qstr *key)
+{
+ struct sdcardfs_sb_info *sbinfo;
+ struct limit_search limit = {
+ .flags = BY_NAME,
+ .name = QSTR_INIT(key->name, key->len),
+ };
+ list_for_each_entry(sbinfo, &sdcardfs_super_list, list) {
+ if (sbinfo_has_sdcard_magic(sbinfo))
+ fixup_perms_recursive(sbinfo->sb->s_root, &limit);
+ }
+}
+
+static void fixup_all_perms_name_userid(const struct qstr *key, userid_t userid)
+{
+ struct sdcardfs_sb_info *sbinfo;
+ struct limit_search limit = {
+ .flags = BY_NAME | BY_USERID,
+ .name = QSTR_INIT(key->name, key->len),
+ .userid = userid,
+ };
+ list_for_each_entry(sbinfo, &sdcardfs_super_list, list) {
+ if (sbinfo_has_sdcard_magic(sbinfo))
+ fixup_perms_recursive(sbinfo->sb->s_root, &limit);
+ }
+}
+
+static void fixup_all_perms_userid(userid_t userid)
+{
+ struct sdcardfs_sb_info *sbinfo;
+ struct limit_search limit = {
+ .flags = BY_USERID,
+ .userid = userid,
+ };
+ list_for_each_entry(sbinfo, &sdcardfs_super_list, list) {
+ if (sbinfo_has_sdcard_magic(sbinfo))
+ fixup_perms_recursive(sbinfo->sb->s_root, &limit);
+ }
+}
+
+static int insert_packagelist_entry(const struct qstr *key, appid_t value)
+{
+ int err;
+
+ mutex_lock(&sdcardfs_super_list_lock);
+ err = insert_packagelist_appid_entry_locked(key, value);
+ if (!err)
+ fixup_all_perms_name(key);
+ mutex_unlock(&sdcardfs_super_list_lock);
+
+ return err;
+}
+
+static int insert_ext_gid_entry(const struct qstr *key, appid_t value)
+{
+ int err;
+
+ mutex_lock(&sdcardfs_super_list_lock);
+ err = insert_ext_gid_entry_locked(key, value);
+ mutex_unlock(&sdcardfs_super_list_lock);
+
+ return err;
+}
+
+static int insert_userid_exclude_entry(const struct qstr *key, userid_t value)
+{
+ int err;
+
+ mutex_lock(&sdcardfs_super_list_lock);
+ err = insert_userid_exclude_entry_locked(key, value);
+ if (!err)
+ fixup_all_perms_name_userid(key, value);
+ mutex_unlock(&sdcardfs_super_list_lock);
+
+ return err;
+}
+
+static void free_hashtable_entry(struct hashtable_entry *entry)
+{
+ kfree(entry->key.name);
+ kmem_cache_free(hashtable_entry_cachep, entry);
+}
+
+static void remove_packagelist_entry_locked(const struct qstr *key)
+{
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = key->hash;
+ struct hlist_node *h_t;
+ HLIST_HEAD(free_list);
+
+ hash_for_each_possible_rcu(package_to_userid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key)) {
+ hash_del_rcu(&hash_cur->hlist);
+ hlist_add_head(&hash_cur->dlist, &free_list);
+ }
+ }
+ hash_for_each_possible_rcu(package_to_appid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key)) {
+ hash_del_rcu(&hash_cur->hlist);
+ hlist_add_head(&hash_cur->dlist, &free_list);
+ break;
+ }
+ }
+ synchronize_rcu();
+ hlist_for_each_entry_safe(hash_cur, h_t, &free_list, dlist)
+ free_hashtable_entry(hash_cur);
+}
+
+static void remove_packagelist_entry(const struct qstr *key)
+{
+ mutex_lock(&sdcardfs_super_list_lock);
+ remove_packagelist_entry_locked(key);
+ fixup_all_perms_name(key);
+ mutex_unlock(&sdcardfs_super_list_lock);
+}
+
+static void remove_ext_gid_entry_locked(const struct qstr *key, gid_t group)
+{
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = key->hash;
+
+ hash_for_each_possible_rcu(ext_to_groupid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key) && atomic_read(&hash_cur->value) == group) {
+ hash_del_rcu(&hash_cur->hlist);
+ synchronize_rcu();
+ free_hashtable_entry(hash_cur);
+ break;
+ }
+ }
+}
+
+static void remove_ext_gid_entry(const struct qstr *key, gid_t group)
+{
+ mutex_lock(&sdcardfs_super_list_lock);
+ remove_ext_gid_entry_locked(key, group);
+ mutex_unlock(&sdcardfs_super_list_lock);
+}
+
+static void remove_userid_all_entry_locked(userid_t userid)
+{
+ struct hashtable_entry *hash_cur;
+ struct hlist_node *h_t;
+ HLIST_HEAD(free_list);
+ int i;
+
+ hash_for_each_rcu(package_to_userid, i, hash_cur, hlist) {
+ if (atomic_read(&hash_cur->value) == userid) {
+ hash_del_rcu(&hash_cur->hlist);
+ hlist_add_head(&hash_cur->dlist, &free_list);
+ }
+ }
+ synchronize_rcu();
+ hlist_for_each_entry_safe(hash_cur, h_t, &free_list, dlist) {
+ free_hashtable_entry(hash_cur);
+ }
+}
+
+static void remove_userid_all_entry(userid_t userid)
+{
+ mutex_lock(&sdcardfs_super_list_lock);
+ remove_userid_all_entry_locked(userid);
+ fixup_all_perms_userid(userid);
+ mutex_unlock(&sdcardfs_super_list_lock);
+}
+
+static void remove_userid_exclude_entry_locked(const struct qstr *key, userid_t userid)
+{
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = key->hash;
+
+ hash_for_each_possible_rcu(package_to_userid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(key, &hash_cur->key) &&
+ atomic_read(&hash_cur->value) == userid) {
+ hash_del_rcu(&hash_cur->hlist);
+ synchronize_rcu();
+ free_hashtable_entry(hash_cur);
+ break;
+ }
+ }
+}
+
+static void remove_userid_exclude_entry(const struct qstr *key, userid_t userid)
+{
+ mutex_lock(&sdcardfs_super_list_lock);
+ remove_userid_exclude_entry_locked(key, userid);
+ fixup_all_perms_name_userid(key, userid);
+ mutex_unlock(&sdcardfs_super_list_lock);
+}
+
+static void packagelist_destroy(void)
+{
+ struct hashtable_entry *hash_cur;
+ struct hlist_node *h_t;
+ HLIST_HEAD(free_list);
+ int i;
+
+ mutex_lock(&sdcardfs_super_list_lock);
+ hash_for_each_rcu(package_to_appid, i, hash_cur, hlist) {
+ hash_del_rcu(&hash_cur->hlist);
+ hlist_add_head(&hash_cur->dlist, &free_list);
+ }
+ hash_for_each_rcu(package_to_userid, i, hash_cur, hlist) {
+ hash_del_rcu(&hash_cur->hlist);
+ hlist_add_head(&hash_cur->dlist, &free_list);
+ }
+ synchronize_rcu();
+ hlist_for_each_entry_safe(hash_cur, h_t, &free_list, dlist)
+ free_hashtable_entry(hash_cur);
+ mutex_unlock(&sdcardfs_super_list_lock);
+ pr_info("sdcardfs: destroyed packagelist pkgld\n");
+}
+
+#define SDCARDFS_CONFIGFS_ATTR(_pfx, _name) \
+static struct configfs_attribute _pfx##attr_##_name = { \
+ .ca_name = __stringify(_name), \
+ .ca_mode = S_IRUGO | S_IWUGO, \
+ .ca_owner = THIS_MODULE, \
+ .show = _pfx##_name##_show, \
+ .store = _pfx##_name##_store, \
+}
+
+#define SDCARDFS_CONFIGFS_ATTR_RO(_pfx, _name) \
+static struct configfs_attribute _pfx##attr_##_name = { \
+ .ca_name = __stringify(_name), \
+ .ca_mode = S_IRUGO, \
+ .ca_owner = THIS_MODULE, \
+ .show = _pfx##_name##_show, \
+}
+
+#define SDCARDFS_CONFIGFS_ATTR_WO(_pfx, _name) \
+static struct configfs_attribute _pfx##attr_##_name = { \
+ .ca_name = __stringify(_name), \
+ .ca_mode = S_IWUGO, \
+ .ca_owner = THIS_MODULE, \
+ .store = _pfx##_name##_store, \
+}
+
+struct package_details {
+ struct config_item item;
+ struct qstr name;
+};
+
+static inline struct package_details *to_package_details(struct config_item *item)
+{
+ return item ? container_of(item, struct package_details, item) : NULL;
+}
+
+static ssize_t package_details_appid_show(struct config_item *item, char *page)
+{
+ return scnprintf(page, PAGE_SIZE, "%u\n", __get_appid(&to_package_details(item)->name));
+}
+
+static ssize_t package_details_appid_store(struct config_item *item,
+ const char *page, size_t count)
+{
+ unsigned int tmp;
+ int ret;
+
+ ret = kstrtouint(page, 10, &tmp);
+ if (ret)
+ return ret;
+
+ ret = insert_packagelist_entry(&to_package_details(item)->name, tmp);
+
+ if (ret)
+ return ret;
+
+ return count;
+}
+
+static ssize_t package_details_excluded_userids_show(struct config_item *item,
+ char *page)
+{
+ struct package_details *package_details = to_package_details(item);
+ struct hashtable_entry *hash_cur;
+ unsigned int hash = package_details->name.hash;
+ int count = 0;
+
+ rcu_read_lock();
+ hash_for_each_possible_rcu(package_to_userid, hash_cur, hlist, hash) {
+ if (qstr_case_eq(&package_details->name, &hash_cur->key))
+ count += scnprintf(page + count, PAGE_SIZE - count,
+ "%d ", atomic_read(&hash_cur->value));
+ }
+ rcu_read_unlock();
+ if (count)
+ count--;
+ count += scnprintf(page + count, PAGE_SIZE - count, "\n");
+ return count;
+}
+
+static ssize_t package_details_excluded_userids_store(struct config_item *item,
+ const char *page, size_t count)
+{
+ unsigned int tmp;
+ int ret;
+
+ ret = kstrtouint(page, 10, &tmp);
+ if (ret)
+ return ret;
+
+ ret = insert_userid_exclude_entry(&to_package_details(item)->name, tmp);
+
+ if (ret)
+ return ret;
+
+ return count;
+}
+
+static ssize_t package_details_clear_userid_store(struct config_item *item,
+ const char *page, size_t count)
+{
+ unsigned int tmp;
+ int ret;
+
+ ret = kstrtouint(page, 10, &tmp);
+ if (ret)
+ return ret;
+ remove_userid_exclude_entry(&to_package_details(item)->name, tmp);
+ return count;
+}
+
+static void package_details_release(struct config_item *item)
+{
+ struct package_details *package_details = to_package_details(item);
+
+ pr_info("sdcardfs: removing %s\n", package_details->name.name);
+ remove_packagelist_entry(&package_details->name);
+ kfree(package_details->name.name);
+ kfree(package_details);
+}
+
+SDCARDFS_CONFIGFS_ATTR(package_details_, appid);
+SDCARDFS_CONFIGFS_ATTR(package_details_, excluded_userids);
+SDCARDFS_CONFIGFS_ATTR_WO(package_details_, clear_userid);
+
+static struct configfs_attribute *package_details_attrs[] = {
+ &package_details_attr_appid,
+ &package_details_attr_excluded_userids,
+ &package_details_attr_clear_userid,
+ NULL,
+};
+
+static struct configfs_item_operations package_details_item_ops = {
+ .release = package_details_release,
+};
+
+static struct config_item_type package_appid_type = {
+ .ct_item_ops = &package_details_item_ops,
+ .ct_attrs = package_details_attrs,
+ .ct_owner = THIS_MODULE,
+};
+
+struct extensions_value {
+ struct config_group group;
+ unsigned int num;
+};
+
+struct extension_details {
+ struct config_item item;
+ struct qstr name;
+ unsigned int num;
+};
+
+static inline struct extensions_value *to_extensions_value(struct config_item *item)
+{
+ return item ? container_of(to_config_group(item), struct extensions_value, group) : NULL;
+}
+
+static inline struct extension_details *to_extension_details(struct config_item *item)
+{
+ return item ? container_of(item, struct extension_details, item) : NULL;
+}
+
+static void extension_details_release(struct config_item *item)
+{
+ struct extension_details *extension_details = to_extension_details(item);
+
+ pr_info("sdcardfs: No longer mapping %s files to gid %d\n",
+ extension_details->name.name, extension_details->num);
+ remove_ext_gid_entry(&extension_details->name, extension_details->num);
+ kfree(extension_details->name.name);
+ kfree(extension_details);
+}
+
+static struct configfs_item_operations extension_details_item_ops = {
+ .release = extension_details_release,
+};
+
+static struct config_item_type extension_details_type = {
+ .ct_item_ops = &extension_details_item_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+static struct config_item *extension_details_make_item(struct config_group *group, const char *name)
+{
+ struct extensions_value *extensions_value = to_extensions_value(&group->cg_item);
+ struct extension_details *extension_details = kzalloc(sizeof(struct extension_details), GFP_KERNEL);
+ const char *tmp;
+ int ret;
+
+ if (!extension_details)
+ return ERR_PTR(-ENOMEM);
+
+ tmp = kstrdup(name, GFP_KERNEL);
+ if (!tmp) {
+ kfree(extension_details);
+ return ERR_PTR(-ENOMEM);
+ }
+ qstr_init(&extension_details->name, tmp);
+ extension_details->num = extensions_value->num;
+ ret = insert_ext_gid_entry(&extension_details->name, extensions_value->num);
+
+ if (ret) {
+ kfree(extension_details->name.name);
+ kfree(extension_details);
+ return ERR_PTR(ret);
+ }
+ config_item_init_type_name(&extension_details->item, name, &extension_details_type);
+
+ return &extension_details->item;
+}
+
+static struct configfs_group_operations extensions_value_group_ops = {
+ .make_item = extension_details_make_item,
+};
+
+static struct config_item_type extensions_name_type = {
+ .ct_group_ops = &extensions_value_group_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+static struct config_group *extensions_make_group(struct config_group *group, const char *name)
+{
+ struct extensions_value *extensions_value;
+ unsigned int tmp;
+ int ret;
+
+ extensions_value = kzalloc(sizeof(struct extensions_value), GFP_KERNEL);
+ if (!extensions_value)
+ return ERR_PTR(-ENOMEM);
+ ret = kstrtouint(name, 10, &tmp);
+ if (ret) {
+ kfree(extensions_value);
+ return ERR_PTR(ret);
+ }
+
+ extensions_value->num = tmp;
+ config_group_init_type_name(&extensions_value->group, name,
+ &extensions_name_type);
+ return &extensions_value->group;
+}
+
+static void extensions_drop_group(struct config_group *group, struct config_item *item)
+{
+ struct extensions_value *value = to_extensions_value(item);
+
+ pr_info("sdcardfs: No longer mapping any files to gid %d\n", value->num);
+ kfree(value);
+}
+
+static struct configfs_group_operations extensions_group_ops = {
+ .make_group = extensions_make_group,
+ .drop_item = extensions_drop_group,
+};
+
+static struct config_item_type extensions_type = {
+ .ct_group_ops = &extensions_group_ops,
+ .ct_owner = THIS_MODULE,
+};
+
+struct config_group extension_group = {
+ .cg_item = {
+ .ci_namebuf = "extensions",
+ .ci_type = &extensions_type,
+ },
+};
+
+static struct config_item *packages_make_item(struct config_group *group, const char *name)
+{
+ struct package_details *package_details;
+ const char *tmp;
+
+ package_details = kzalloc(sizeof(struct package_details), GFP_KERNEL);
+ if (!package_details)
+ return ERR_PTR(-ENOMEM);
+ tmp = kstrdup(name, GFP_KERNEL);
+ if (!tmp) {
+ kfree(package_details);
+ return ERR_PTR(-ENOMEM);
+ }
+ qstr_init(&package_details->name, tmp);
+ config_item_init_type_name(&package_details->item, name,
+ &package_appid_type);
+
+ return &package_details->item;
+}
+
+static ssize_t packages_list_show(struct config_item *item, char *page)
+{
+ struct hashtable_entry *hash_cur_app;
+ struct hashtable_entry *hash_cur_user;
+ int i;
+ int count = 0, written = 0;
+ const char errormsg[] = "<truncated>\n";
+ unsigned int hash;
+
+ rcu_read_lock();
+ hash_for_each_rcu(package_to_appid, i, hash_cur_app, hlist) {
+ written = scnprintf(page + count, PAGE_SIZE - sizeof(errormsg) - count, "%s %d\n",
+ hash_cur_app->key.name, atomic_read(&hash_cur_app->value));
+ hash = hash_cur_app->key.hash;
+ hash_for_each_possible_rcu(package_to_userid, hash_cur_user, hlist, hash) {
+ if (qstr_case_eq(&hash_cur_app->key, &hash_cur_user->key)) {
+ written += scnprintf(page + count + written - 1,
+ PAGE_SIZE - sizeof(errormsg) - count - written + 1,
+ " %d\n", atomic_read(&hash_cur_user->value)) - 1;
+ }
+ }
+ if (count + written == PAGE_SIZE - sizeof(errormsg) - 1) {
+ count += scnprintf(page + count, PAGE_SIZE - count, errormsg);
+ break;
+ }
+ count += written;
+ }
+ rcu_read_unlock();
+
+ return count;
+}
+
+static ssize_t packages_remove_userid_store(struct config_item *item,
+ const char *page, size_t count)
+{
+ unsigned int tmp;
+ int ret;
+
+ ret = kstrtouint(page, 10, &tmp);
+ if (ret)
+ return ret;
+ remove_userid_all_entry(tmp);
+ return count;
+}
+
+static struct configfs_attribute packages_attr_packages_gid_list = {
+ .ca_name = "packages_gid.list",
+ .ca_mode = S_IRUGO,
+ .ca_owner = THIS_MODULE,
+ .show = packages_list_show,
+};
+
+SDCARDFS_CONFIGFS_ATTR_WO(packages_, remove_userid);
+
+static struct configfs_attribute *packages_attrs[] = {
+ &packages_attr_packages_gid_list,
+ &packages_attr_remove_userid,
+ NULL,
+};
+
+/*
+ * Note that, since no extra work is required on ->drop_item(),
+ * no ->drop_item() is provided.
+ */
+static struct configfs_group_operations packages_group_ops = {
+ .make_item = packages_make_item,
+};
+
+static struct config_item_type packages_type = {
+ .ct_group_ops = &packages_group_ops,
+ .ct_attrs = packages_attrs,
+ .ct_owner = THIS_MODULE,
+};
+
+struct config_group *sd_default_groups[] = {
+ &extension_group,
+ NULL,
+};
+
+static struct configfs_subsystem sdcardfs_packages = {
+ .su_group = {
+ .cg_item = {
+ .ci_namebuf = "sdcardfs",
+ .ci_type = &packages_type,
+ },
+ },
+};
+
+static int configfs_sdcardfs_init(void)
+{
+ int ret, i;
+ struct configfs_subsystem *subsys = &sdcardfs_packages;
+
+ config_group_init(&subsys->su_group);
+ for (i = 0; sd_default_groups[i]; i++) {
+ config_group_init(sd_default_groups[i]);
+ configfs_add_default_group(sd_default_groups[i], &subsys->su_group);
+ }
+ mutex_init(&subsys->su_mutex);
+ ret = configfs_register_subsystem(subsys);
+ if (ret) {
+ pr_err("Error %d while registering subsystem %s\n",
+ ret,
+ subsys->su_group.cg_item.ci_namebuf);
+ }
+ return ret;
+}
+
+static void configfs_sdcardfs_exit(void)
+{
+ configfs_unregister_subsystem(&sdcardfs_packages);
+}
+
+int packagelist_init(void)
+{
+ hashtable_entry_cachep =
+ kmem_cache_create("packagelist_hashtable_entry",
+ sizeof(struct hashtable_entry), 0, 0, NULL);
+ if (!hashtable_entry_cachep) {
+ pr_err("sdcardfs: failed creating pkgl_hashtable entry slab cache\n");
+ return -ENOMEM;
+ }
+
+ configfs_sdcardfs_init();
+ return 0;
+}
+
+void packagelist_exit(void)
+{
+ configfs_sdcardfs_exit();
+ packagelist_destroy();
+ kmem_cache_destroy(hashtable_entry_cachep);
+}
diff --git a/fs/sdcardfs/sdcardfs.h b/fs/sdcardfs/sdcardfs.h
new file mode 100644
index 0000000..826afb5
--- /dev/null
+++ b/fs/sdcardfs/sdcardfs.h
@@ -0,0 +1,677 @@
+/*
+ * fs/sdcardfs/sdcardfs.h
+ *
+ * The sdcardfs v2.0
+ * This file system replaces the sdcard daemon on Android
+ * On version 2.0, some of the daemon functions have been ported
+ * to support the multi-user concepts of Android 4.4
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#ifndef _SDCARDFS_H_
+#define _SDCARDFS_H_
+
+#include <linux/dcache.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/aio.h>
+#include <linux/kref.h>
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/namei.h>
+#include <linux/seq_file.h>
+#include <linux/statfs.h>
+#include <linux/fs_stack.h>
+#include <linux/magic.h>
+#include <linux/uaccess.h>
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include <linux/types.h>
+#include <linux/security.h>
+#include <linux/string.h>
+#include <linux/list.h>
+#include "multiuser.h"
+
+/* the file system name */
+#define SDCARDFS_NAME "sdcardfs"
+
+/* sdcardfs root inode number */
+#define SDCARDFS_ROOT_INO 1
+
+/* useful for tracking code reachability */
+#define UDBG pr_default("DBG:%s:%s:%d\n", __FILE__, __func__, __LINE__)
+
+#define SDCARDFS_DIRENT_SIZE 256
+
+/* temporary static uid settings for development */
+#define AID_ROOT 0 /* uid for accessing /mnt/sdcard & extSdcard */
+#define AID_MEDIA_RW 1023 /* internal media storage write access */
+
+#define AID_SDCARD_RW 1015 /* external storage write access */
+#define AID_SDCARD_R 1028 /* external storage read access */
+#define AID_SDCARD_PICS 1033 /* external storage photos access */
+#define AID_SDCARD_AV 1034 /* external storage audio/video access */
+#define AID_SDCARD_ALL 1035 /* access all users external storage */
+#define AID_MEDIA_OBB 1059 /* obb files */
+
+#define AID_SDCARD_IMAGE 1057
+
+#define AID_PACKAGE_INFO 1027
+
+
+/*
+ * Permissions are handled by our permission function.
+ * We don't want anyone who happens to look at our inode value to prematurely
+ * block access, so store more permissive values. These are probably never
+ * used.
+ */
+#define fixup_tmp_permissions(x) \
+ do { \
+ (x)->i_uid = make_kuid(&init_user_ns, \
+ SDCARDFS_I(x)->data->d_uid); \
+ (x)->i_gid = make_kgid(&init_user_ns, AID_SDCARD_RW); \
+ (x)->i_mode = ((x)->i_mode & S_IFMT) | 0775;\
+ } while (0)
+
+/* OVERRIDE_CRED() and REVERT_CRED()
+ * OVERRIDE_CRED()
+ * backup original task->cred
+ * and modifies task->cred->fsuid/fsgid to specified value.
+ * REVERT_CRED()
+ * restore original task->cred->fsuid/fsgid.
+ * These two macro should be used in pair, and OVERRIDE_CRED() should be
+ * placed at the beginning of a function, right after variable declaration.
+ */
+#define OVERRIDE_CRED(sdcardfs_sbi, saved_cred, info) \
+ do { \
+ saved_cred = override_fsids(sdcardfs_sbi, info->data); \
+ if (!saved_cred) \
+ return -ENOMEM; \
+ } while (0)
+
+#define OVERRIDE_CRED_PTR(sdcardfs_sbi, saved_cred, info) \
+ do { \
+ saved_cred = override_fsids(sdcardfs_sbi, info->data); \
+ if (!saved_cred) \
+ return ERR_PTR(-ENOMEM); \
+ } while (0)
+
+#define REVERT_CRED(saved_cred) revert_fsids(saved_cred)
+
+/* Android 5.0 support */
+
+/* Permission mode for a specific node. Controls how file permissions
+ * are derived for children nodes.
+ */
+typedef enum {
+ /* Nothing special; this node should just inherit from its parent. */
+ PERM_INHERIT,
+ /* This node is one level above a normal root; used for legacy layouts
+ * which use the first level to represent user_id.
+ */
+ PERM_PRE_ROOT,
+ /* This node is "/" */
+ PERM_ROOT,
+ /* This node is "/Android" */
+ PERM_ANDROID,
+ /* This node is "/Android/data" */
+ PERM_ANDROID_DATA,
+ /* This node is "/Android/obb" */
+ PERM_ANDROID_OBB,
+ /* This node is "/Android/media" */
+ PERM_ANDROID_MEDIA,
+ /* This node is "/Android/[data|media|obb]/[package]" */
+ PERM_ANDROID_PACKAGE,
+ /* This node is "/Android/[data|media|obb]/[package]/cache" */
+ PERM_ANDROID_PACKAGE_CACHE,
+} perm_t;
+
+struct sdcardfs_sb_info;
+struct sdcardfs_mount_options;
+struct sdcardfs_inode_info;
+struct sdcardfs_inode_data;
+
+/* Do not directly use this function. Use OVERRIDE_CRED() instead. */
+const struct cred *override_fsids(struct sdcardfs_sb_info *sbi,
+ struct sdcardfs_inode_data *data);
+/* Do not directly use this function, use REVERT_CRED() instead. */
+void revert_fsids(const struct cred *old_cred);
+
+/* operations vectors defined in specific files */
+extern const struct file_operations sdcardfs_main_fops;
+extern const struct file_operations sdcardfs_dir_fops;
+extern const struct inode_operations sdcardfs_main_iops;
+extern const struct inode_operations sdcardfs_dir_iops;
+extern const struct inode_operations sdcardfs_symlink_iops;
+extern const struct super_operations sdcardfs_sops;
+extern const struct dentry_operations sdcardfs_ci_dops;
+extern const struct address_space_operations sdcardfs_aops, sdcardfs_dummy_aops;
+extern const struct vm_operations_struct sdcardfs_vm_ops;
+
+extern int sdcardfs_init_inode_cache(void);
+extern void sdcardfs_destroy_inode_cache(void);
+extern int sdcardfs_init_dentry_cache(void);
+extern void sdcardfs_destroy_dentry_cache(void);
+extern int new_dentry_private_data(struct dentry *dentry);
+extern void free_dentry_private_data(struct dentry *dentry);
+extern struct dentry *sdcardfs_lookup(struct inode *dir, struct dentry *dentry,
+ unsigned int flags);
+extern struct inode *sdcardfs_iget(struct super_block *sb,
+ struct inode *lower_inode, userid_t id);
+extern int sdcardfs_interpose(struct dentry *dentry, struct super_block *sb,
+ struct path *lower_path, userid_t id);
+
+/* file private data */
+struct sdcardfs_file_info {
+ struct file *lower_file;
+ const struct vm_operations_struct *lower_vm_ops;
+};
+
+struct sdcardfs_inode_data {
+ struct kref refcount;
+ bool abandoned;
+
+ perm_t perm;
+ userid_t userid;
+ uid_t d_uid;
+ bool under_android;
+ bool under_cache;
+ bool under_obb;
+};
+
+/* sdcardfs inode data in memory */
+struct sdcardfs_inode_info {
+ struct inode *lower_inode;
+ /* state derived based on current position in hierarchy */
+ struct sdcardfs_inode_data *data;
+
+ /* top folder for ownership */
+ spinlock_t top_lock;
+ struct sdcardfs_inode_data *top_data;
+
+ struct inode vfs_inode;
+};
+
+
+/* sdcardfs dentry data in memory */
+struct sdcardfs_dentry_info {
+ spinlock_t lock; /* protects lower_path */
+ struct path lower_path;
+ struct path orig_path;
+};
+
+struct sdcardfs_mount_options {
+ uid_t fs_low_uid;
+ gid_t fs_low_gid;
+ userid_t fs_user_id;
+ bool multiuser;
+ bool gid_derivation;
+ bool default_normal;
+ unsigned int reserved_mb;
+};
+
+struct sdcardfs_vfsmount_options {
+ gid_t gid;
+ mode_t mask;
+};
+
+extern int parse_options_remount(struct super_block *sb, char *options, int silent,
+ struct sdcardfs_vfsmount_options *vfsopts);
+
+/* sdcardfs super-block data in memory */
+struct sdcardfs_sb_info {
+ struct super_block *sb;
+ struct super_block *lower_sb;
+ /* derived perm policy : some of options have been added
+ * to sdcardfs_mount_options (Android 4.4 support)
+ */
+ struct sdcardfs_mount_options options;
+ spinlock_t lock; /* protects obbpath */
+ char *obbpath_s;
+ struct path obbpath;
+ void *pkgl_id;
+ struct list_head list;
+};
+
+/*
+ * inode to private data
+ *
+ * Since we use containers and the struct inode is _inside_ the
+ * sdcardfs_inode_info structure, SDCARDFS_I will always (given a non-NULL
+ * inode pointer), return a valid non-NULL pointer.
+ */
+static inline struct sdcardfs_inode_info *SDCARDFS_I(const struct inode *inode)
+{
+ return container_of(inode, struct sdcardfs_inode_info, vfs_inode);
+}
+
+/* dentry to private data */
+#define SDCARDFS_D(dent) ((struct sdcardfs_dentry_info *)(dent)->d_fsdata)
+
+/* superblock to private data */
+#define SDCARDFS_SB(super) ((struct sdcardfs_sb_info *)(super)->s_fs_info)
+
+/* file to private Data */
+#define SDCARDFS_F(file) ((struct sdcardfs_file_info *)((file)->private_data))
+
+/* file to lower file */
+static inline struct file *sdcardfs_lower_file(const struct file *f)
+{
+ return SDCARDFS_F(f)->lower_file;
+}
+
+static inline void sdcardfs_set_lower_file(struct file *f, struct file *val)
+{
+ SDCARDFS_F(f)->lower_file = val;
+}
+
+/* inode to lower inode. */
+static inline struct inode *sdcardfs_lower_inode(const struct inode *i)
+{
+ return SDCARDFS_I(i)->lower_inode;
+}
+
+static inline void sdcardfs_set_lower_inode(struct inode *i, struct inode *val)
+{
+ SDCARDFS_I(i)->lower_inode = val;
+}
+
+/* superblock to lower superblock */
+static inline struct super_block *sdcardfs_lower_super(
+ const struct super_block *sb)
+{
+ return SDCARDFS_SB(sb)->lower_sb;
+}
+
+static inline void sdcardfs_set_lower_super(struct super_block *sb,
+ struct super_block *val)
+{
+ SDCARDFS_SB(sb)->lower_sb = val;
+}
+
+/* path based (dentry/mnt) macros */
+static inline void pathcpy(struct path *dst, const struct path *src)
+{
+ dst->dentry = src->dentry;
+ dst->mnt = src->mnt;
+}
+
+/* sdcardfs_get_pname functions calls path_get()
+ * therefore, the caller must call "proper" path_put functions
+ */
+#define SDCARDFS_DENT_FUNC(pname) \
+static inline void sdcardfs_get_##pname(const struct dentry *dent, \
+ struct path *pname) \
+{ \
+ spin_lock(&SDCARDFS_D(dent)->lock); \
+ pathcpy(pname, &SDCARDFS_D(dent)->pname); \
+ path_get(pname); \
+ spin_unlock(&SDCARDFS_D(dent)->lock); \
+ return; \
+} \
+static inline void sdcardfs_put_##pname(const struct dentry *dent, \
+ struct path *pname) \
+{ \
+ path_put(pname); \
+ return; \
+} \
+static inline void sdcardfs_set_##pname(const struct dentry *dent, \
+ struct path *pname) \
+{ \
+ spin_lock(&SDCARDFS_D(dent)->lock); \
+ pathcpy(&SDCARDFS_D(dent)->pname, pname); \
+ spin_unlock(&SDCARDFS_D(dent)->lock); \
+ return; \
+} \
+static inline void sdcardfs_reset_##pname(const struct dentry *dent) \
+{ \
+ spin_lock(&SDCARDFS_D(dent)->lock); \
+ SDCARDFS_D(dent)->pname.dentry = NULL; \
+ SDCARDFS_D(dent)->pname.mnt = NULL; \
+ spin_unlock(&SDCARDFS_D(dent)->lock); \
+ return; \
+} \
+static inline void sdcardfs_put_reset_##pname(const struct dentry *dent) \
+{ \
+ struct path pname; \
+ spin_lock(&SDCARDFS_D(dent)->lock); \
+ if (SDCARDFS_D(dent)->pname.dentry) { \
+ pathcpy(&pname, &SDCARDFS_D(dent)->pname); \
+ SDCARDFS_D(dent)->pname.dentry = NULL; \
+ SDCARDFS_D(dent)->pname.mnt = NULL; \
+ spin_unlock(&SDCARDFS_D(dent)->lock); \
+ path_put(&pname); \
+ } else \
+ spin_unlock(&SDCARDFS_D(dent)->lock); \
+ return; \
+}
+
+SDCARDFS_DENT_FUNC(lower_path)
+SDCARDFS_DENT_FUNC(orig_path)
+
+static inline bool sbinfo_has_sdcard_magic(struct sdcardfs_sb_info *sbinfo)
+{
+ return sbinfo && sbinfo->sb
+ && sbinfo->sb->s_magic == SDCARDFS_SUPER_MAGIC;
+}
+
+static inline struct sdcardfs_inode_data *data_get(
+ struct sdcardfs_inode_data *data)
+{
+ if (data)
+ kref_get(&data->refcount);
+ return data;
+}
+
+static inline struct sdcardfs_inode_data *top_data_get(
+ struct sdcardfs_inode_info *info)
+{
+ struct sdcardfs_inode_data *top_data;
+
+ spin_lock(&info->top_lock);
+ top_data = data_get(info->top_data);
+ spin_unlock(&info->top_lock);
+ return top_data;
+}
+
+extern void data_release(struct kref *ref);
+
+static inline void data_put(struct sdcardfs_inode_data *data)
+{
+ kref_put(&data->refcount, data_release);
+}
+
+static inline void release_own_data(struct sdcardfs_inode_info *info)
+{
+ /*
+ * This happens exactly once per inode. At this point, the inode that
+ * originally held this data is about to be freed, and all references
+ * to it are held as a top value, and will likely be released soon.
+ */
+ info->data->abandoned = true;
+ data_put(info->data);
+}
+
+static inline void set_top(struct sdcardfs_inode_info *info,
+ struct sdcardfs_inode_info *top_owner)
+{
+ struct sdcardfs_inode_data *old_top;
+ struct sdcardfs_inode_data *new_top = NULL;
+
+ if (top_owner)
+ new_top = top_data_get(top_owner);
+
+ spin_lock(&info->top_lock);
+ old_top = info->top_data;
+ info->top_data = new_top;
+ if (old_top)
+ data_put(old_top);
+ spin_unlock(&info->top_lock);
+}
+
+static inline int get_gid(struct vfsmount *mnt,
+ struct super_block *sb,
+ struct sdcardfs_inode_data *data)
+{
+ struct sdcardfs_vfsmount_options *vfsopts = mnt->data;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(sb);
+
+ if (vfsopts->gid == AID_SDCARD_RW && !sbi->options.default_normal)
+ /* As an optimization, certain trusted system components only run
+ * as owner but operate across all users. Since we're now handing
+ * out the sdcard_rw GID only to trusted apps, we're okay relaxing
+ * the user boundary enforcement for the default view. The UIDs
+ * assigned to app directories are still multiuser aware.
+ */
+ return AID_SDCARD_RW;
+ else
+ return multiuser_get_uid(data->userid, vfsopts->gid);
+}
+
+static inline int get_mode(struct vfsmount *mnt,
+ struct sdcardfs_inode_info *info,
+ struct sdcardfs_inode_data *data)
+{
+ int owner_mode;
+ int filtered_mode;
+ struct sdcardfs_vfsmount_options *opts = mnt->data;
+ int visible_mode = 0775 & ~opts->mask;
+
+
+ if (data->perm == PERM_PRE_ROOT) {
+ /* Top of multi-user view should always be visible to ensure
+ * secondary users can traverse inside.
+ */
+ visible_mode = 0711;
+ } else if (data->under_android) {
+ /* Block "other" access to Android directories, since only apps
+ * belonging to a specific user should be in there; we still
+ * leave +x open for the default view.
+ */
+ if (opts->gid == AID_SDCARD_RW)
+ visible_mode = visible_mode & ~0006;
+ else
+ visible_mode = visible_mode & ~0007;
+ }
+ owner_mode = info->lower_inode->i_mode & 0700;
+ filtered_mode = visible_mode & (owner_mode | (owner_mode >> 3) | (owner_mode >> 6));
+ return filtered_mode;
+}
+
+static inline int has_graft_path(const struct dentry *dent)
+{
+ int ret = 0;
+
+ spin_lock(&SDCARDFS_D(dent)->lock);
+ if (SDCARDFS_D(dent)->orig_path.dentry != NULL)
+ ret = 1;
+ spin_unlock(&SDCARDFS_D(dent)->lock);
+
+ return ret;
+}
+
+static inline void sdcardfs_get_real_lower(const struct dentry *dent,
+ struct path *real_lower)
+{
+ /* in case of a local obb dentry
+ * the orig_path should be returned
+ */
+ if (has_graft_path(dent))
+ sdcardfs_get_orig_path(dent, real_lower);
+ else
+ sdcardfs_get_lower_path(dent, real_lower);
+}
+
+static inline void sdcardfs_put_real_lower(const struct dentry *dent,
+ struct path *real_lower)
+{
+ if (has_graft_path(dent))
+ sdcardfs_put_orig_path(dent, real_lower);
+ else
+ sdcardfs_put_lower_path(dent, real_lower);
+}
+
+extern struct mutex sdcardfs_super_list_lock;
+extern struct list_head sdcardfs_super_list;
+
+/* for packagelist.c */
+extern appid_t get_appid(const char *app_name);
+extern appid_t get_ext_gid(const char *app_name);
+extern appid_t is_excluded(const char *app_name, userid_t userid);
+extern int check_caller_access_to_name(struct inode *parent_node, const struct qstr *name);
+extern int packagelist_init(void);
+extern void packagelist_exit(void);
+
+/* for derived_perm.c */
+#define BY_NAME (1 << 0)
+#define BY_USERID (1 << 1)
+struct limit_search {
+ unsigned int flags;
+ struct qstr name;
+ userid_t userid;
+};
+
+extern void setup_derived_state(struct inode *inode, perm_t perm,
+ userid_t userid, uid_t uid);
+extern void get_derived_permission(struct dentry *parent, struct dentry *dentry);
+extern void get_derived_permission_new(struct dentry *parent, struct dentry *dentry, const struct qstr *name);
+extern void fixup_perms_recursive(struct dentry *dentry, struct limit_search *limit);
+
+extern void update_derived_permission_lock(struct dentry *dentry);
+void fixup_lower_ownership(struct dentry *dentry, const char *name);
+extern int need_graft_path(struct dentry *dentry);
+extern int is_base_obbpath(struct dentry *dentry);
+extern int is_obbpath_invalid(struct dentry *dentry);
+extern int setup_obb_dentry(struct dentry *dentry, struct path *lower_path);
+
+/* locking helpers */
+static inline struct dentry *lock_parent(struct dentry *dentry)
+{
+ struct dentry *dir = dget_parent(dentry);
+
+ inode_lock_nested(d_inode(dir), I_MUTEX_PARENT);
+ return dir;
+}
+
+static inline void unlock_dir(struct dentry *dir)
+{
+ inode_unlock(d_inode(dir));
+ dput(dir);
+}
+
+static inline int prepare_dir(const char *path_s, uid_t uid, gid_t gid, mode_t mode)
+{
+ int err;
+ struct dentry *dent;
+ struct iattr attrs;
+ struct path parent;
+
+ dent = kern_path_locked(path_s, &parent);
+ if (IS_ERR(dent)) {
+ err = PTR_ERR(dent);
+ if (err == -EEXIST)
+ err = 0;
+ goto out_unlock;
+ }
+
+ err = vfs_mkdir2(parent.mnt, d_inode(parent.dentry), dent, mode);
+ if (err) {
+ if (err == -EEXIST)
+ err = 0;
+ goto out_dput;
+ }
+
+ attrs.ia_uid = make_kuid(&init_user_ns, uid);
+ attrs.ia_gid = make_kgid(&init_user_ns, gid);
+ attrs.ia_valid = ATTR_UID | ATTR_GID;
+ inode_lock(d_inode(dent));
+ notify_change2(parent.mnt, dent, &attrs, NULL);
+ inode_unlock(d_inode(dent));
+
+out_dput:
+ dput(dent);
+
+out_unlock:
+ /* parent dentry locked by lookup_create */
+ inode_unlock(d_inode(parent.dentry));
+ path_put(&parent);
+ return err;
+}
+
+/*
+ * Return 1, if a disk has enough free space, otherwise 0.
+ * We assume that any files can not be overwritten.
+ */
+static inline int check_min_free_space(struct dentry *dentry, size_t size, int dir)
+{
+ int err;
+ struct path lower_path;
+ struct kstatfs statfs;
+ u64 avail;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+
+ if (sbi->options.reserved_mb) {
+ /* Get fs stat of lower filesystem. */
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ err = vfs_statfs(&lower_path, &statfs);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+
+ if (unlikely(err))
+ return 0;
+
+ /* Invalid statfs informations. */
+ if (unlikely(statfs.f_bsize == 0))
+ return 0;
+
+ /* if you are checking directory, set size to f_bsize. */
+ if (unlikely(dir))
+ size = statfs.f_bsize;
+
+ /* available size */
+ avail = statfs.f_bavail * statfs.f_bsize;
+
+ /* not enough space */
+ if ((u64)size > avail)
+ return 0;
+
+ /* enough space */
+ if ((avail - size) > (sbi->options.reserved_mb * 1024 * 1024))
+ return 1;
+
+ return 0;
+ } else
+ return 1;
+}
+
+/*
+ * Copies attrs and maintains sdcardfs managed attrs
+ * Since our permission check handles all special permissions, set those to be open
+ */
+static inline void sdcardfs_copy_and_fix_attrs(struct inode *dest, const struct inode *src)
+{
+ dest->i_mode = (src->i_mode & S_IFMT) | S_IRWXU | S_IRWXG |
+ S_IROTH | S_IXOTH; /* 0775 */
+ dest->i_uid = make_kuid(&init_user_ns, SDCARDFS_I(dest)->data->d_uid);
+ dest->i_gid = make_kgid(&init_user_ns, AID_SDCARD_RW);
+ dest->i_rdev = src->i_rdev;
+ dest->i_atime = src->i_atime;
+ dest->i_mtime = src->i_mtime;
+ dest->i_ctime = src->i_ctime;
+ dest->i_blkbits = src->i_blkbits;
+ dest->i_flags = src->i_flags;
+ set_nlink(dest, src->i_nlink);
+}
+
+static inline bool str_case_eq(const char *s1, const char *s2)
+{
+ return !strcasecmp(s1, s2);
+}
+
+static inline bool str_n_case_eq(const char *s1, const char *s2, size_t len)
+{
+ return !strncasecmp(s1, s2, len);
+}
+
+static inline bool qstr_case_eq(const struct qstr *q1, const struct qstr *q2)
+{
+ return q1->len == q2->len && str_n_case_eq(q1->name, q2->name, q2->len);
+}
+
+#define QSTR_LITERAL(string) QSTR_INIT(string, sizeof(string)-1)
+
+#endif /* not _SDCARDFS_H_ */
diff --git a/fs/sdcardfs/super.c b/fs/sdcardfs/super.c
new file mode 100644
index 0000000..cffcdb1
--- /dev/null
+++ b/fs/sdcardfs/super.c
@@ -0,0 +1,331 @@
+/*
+ * fs/sdcardfs/super.c
+ *
+ * Copyright (c) 2013 Samsung Electronics Co. Ltd
+ * Authors: Daeho Jeong, Woojoong Lee, Seunghwan Hyun,
+ * Sunghwan Yun, Sungjong Seo
+ *
+ * This program has been developed as a stackable file system based on
+ * the WrapFS which written by
+ *
+ * Copyright (c) 1998-2011 Erez Zadok
+ * Copyright (c) 2009 Shrikar Archak
+ * Copyright (c) 2003-2011 Stony Brook University
+ * Copyright (c) 2003-2011 The Research Foundation of SUNY
+ *
+ * This file is dual licensed. It may be redistributed and/or modified
+ * under the terms of the Apache 2.0 License OR version 2 of the GNU
+ * General Public License.
+ */
+
+#include "sdcardfs.h"
+
+/*
+ * The inode cache is used with alloc_inode for both our inode info and the
+ * vfs inode.
+ */
+static struct kmem_cache *sdcardfs_inode_cachep;
+
+/*
+ * To support the top references, we must track some data separately.
+ * An sdcardfs_inode_info always has a reference to its data, and once set up,
+ * also has a reference to its top. The top may be itself, in which case it
+ * holds two references to its data. When top is changed, it takes a ref to the
+ * new data and then drops the ref to the old data.
+ */
+static struct kmem_cache *sdcardfs_inode_data_cachep;
+
+void data_release(struct kref *ref)
+{
+ struct sdcardfs_inode_data *data =
+ container_of(ref, struct sdcardfs_inode_data, refcount);
+
+ kmem_cache_free(sdcardfs_inode_data_cachep, data);
+}
+
+/* final actions when unmounting a file system */
+static void sdcardfs_put_super(struct super_block *sb)
+{
+ struct sdcardfs_sb_info *spd;
+ struct super_block *s;
+
+ spd = SDCARDFS_SB(sb);
+ if (!spd)
+ return;
+
+ if (spd->obbpath_s) {
+ kfree(spd->obbpath_s);
+ path_put(&spd->obbpath);
+ }
+
+ /* decrement lower super references */
+ s = sdcardfs_lower_super(sb);
+ sdcardfs_set_lower_super(sb, NULL);
+ atomic_dec(&s->s_active);
+
+ kfree(spd);
+ sb->s_fs_info = NULL;
+}
+
+static int sdcardfs_statfs(struct dentry *dentry, struct kstatfs *buf)
+{
+ int err;
+ struct path lower_path;
+ u32 min_blocks;
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(dentry->d_sb);
+
+ sdcardfs_get_lower_path(dentry, &lower_path);
+ err = vfs_statfs(&lower_path, buf);
+ sdcardfs_put_lower_path(dentry, &lower_path);
+
+ if (sbi->options.reserved_mb) {
+ /* Invalid statfs informations. */
+ if (buf->f_bsize == 0) {
+ pr_err("Returned block size is zero.\n");
+ return -EINVAL;
+ }
+
+ min_blocks = ((sbi->options.reserved_mb * 1024 * 1024)/buf->f_bsize);
+ buf->f_blocks -= min_blocks;
+
+ if (buf->f_bavail > min_blocks)
+ buf->f_bavail -= min_blocks;
+ else
+ buf->f_bavail = 0;
+
+ /* Make reserved blocks invisiable to media storage */
+ buf->f_bfree = buf->f_bavail;
+ }
+
+ /* set return buf to our f/s to avoid confusing user-level utils */
+ buf->f_type = SDCARDFS_SUPER_MAGIC;
+
+ return err;
+}
+
+/*
+ * @flags: numeric mount options
+ * @options: mount options string
+ */
+static int sdcardfs_remount_fs(struct super_block *sb, int *flags, char *options)
+{
+ int err = 0;
+
+ /*
+ * The VFS will take care of "ro" and "rw" flags among others. We
+ * can safely accept a few flags (RDONLY, MANDLOCK), and honor
+ * SILENT, but anything else left over is an error.
+ */
+ if ((*flags & ~(MS_RDONLY | MS_MANDLOCK | MS_SILENT)) != 0) {
+ pr_err("sdcardfs: remount flags 0x%x unsupported\n", *flags);
+ err = -EINVAL;
+ }
+
+ return err;
+}
+
+/*
+ * @mnt: mount point we are remounting
+ * @sb: superblock we are remounting
+ * @flags: numeric mount options
+ * @options: mount options string
+ */
+static int sdcardfs_remount_fs2(struct vfsmount *mnt, struct super_block *sb,
+ int *flags, char *options)
+{
+ int err = 0;
+
+ /*
+ * The VFS will take care of "ro" and "rw" flags among others. We
+ * can safely accept a few flags (RDONLY, MANDLOCK), and honor
+ * SILENT, but anything else left over is an error.
+ */
+ if ((*flags & ~(MS_RDONLY | MS_MANDLOCK | MS_SILENT | MS_REMOUNT)) != 0) {
+ pr_err("sdcardfs: remount flags 0x%x unsupported\n", *flags);
+ err = -EINVAL;
+ }
+ pr_info("Remount options were %s for vfsmnt %p.\n", options, mnt);
+ err = parse_options_remount(sb, options, *flags & ~MS_SILENT, mnt->data);
+
+
+ return err;
+}
+
+static void *sdcardfs_clone_mnt_data(void *data)
+{
+ struct sdcardfs_vfsmount_options *opt = kmalloc(sizeof(struct sdcardfs_vfsmount_options), GFP_KERNEL);
+ struct sdcardfs_vfsmount_options *old = data;
+
+ if (!opt)
+ return NULL;
+ opt->gid = old->gid;
+ opt->mask = old->mask;
+ return opt;
+}
+
+static void sdcardfs_copy_mnt_data(void *data, void *newdata)
+{
+ struct sdcardfs_vfsmount_options *old = data;
+ struct sdcardfs_vfsmount_options *new = newdata;
+
+ old->gid = new->gid;
+ old->mask = new->mask;
+}
+
+/*
+ * Called by iput() when the inode reference count reached zero
+ * and the inode is not hashed anywhere. Used to clear anything
+ * that needs to be, before the inode is completely destroyed and put
+ * on the inode free list.
+ */
+static void sdcardfs_evict_inode(struct inode *inode)
+{
+ struct inode *lower_inode;
+
+ truncate_inode_pages(&inode->i_data, 0);
+ set_top(SDCARDFS_I(inode), NULL);
+ clear_inode(inode);
+ /*
+ * Decrement a reference to a lower_inode, which was incremented
+ * by our read_inode when it was created initially.
+ */
+ lower_inode = sdcardfs_lower_inode(inode);
+ sdcardfs_set_lower_inode(inode, NULL);
+ iput(lower_inode);
+}
+
+static struct inode *sdcardfs_alloc_inode(struct super_block *sb)
+{
+ struct sdcardfs_inode_info *i;
+ struct sdcardfs_inode_data *d;
+
+ i = kmem_cache_alloc(sdcardfs_inode_cachep, GFP_KERNEL);
+ if (!i)
+ return NULL;
+
+ /* memset everything up to the inode to 0 */
+ memset(i, 0, offsetof(struct sdcardfs_inode_info, vfs_inode));
+
+ d = kmem_cache_alloc(sdcardfs_inode_data_cachep,
+ GFP_KERNEL | __GFP_ZERO);
+ if (!d) {
+ kmem_cache_free(sdcardfs_inode_cachep, i);
+ return NULL;
+ }
+
+ i->data = d;
+ kref_init(&d->refcount);
+ i->top_data = d;
+ spin_lock_init(&i->top_lock);
+ kref_get(&d->refcount);
+
+ i->vfs_inode.i_version = 1;
+ return &i->vfs_inode;
+}
+
+static void i_callback(struct rcu_head *head)
+{
+ struct inode *inode = container_of(head, struct inode, i_rcu);
+
+ release_own_data(SDCARDFS_I(inode));
+ kmem_cache_free(sdcardfs_inode_cachep, SDCARDFS_I(inode));
+}
+
+static void sdcardfs_destroy_inode(struct inode *inode)
+{
+ call_rcu(&inode->i_rcu, i_callback);
+}
+
+/* sdcardfs inode cache constructor */
+static void init_once(void *obj)
+{
+ struct sdcardfs_inode_info *i = obj;
+
+ inode_init_once(&i->vfs_inode);
+}
+
+int sdcardfs_init_inode_cache(void)
+{
+ sdcardfs_inode_cachep =
+ kmem_cache_create("sdcardfs_inode_cache",
+ sizeof(struct sdcardfs_inode_info), 0,
+ SLAB_RECLAIM_ACCOUNT, init_once);
+
+ if (!sdcardfs_inode_cachep)
+ return -ENOMEM;
+
+ sdcardfs_inode_data_cachep =
+ kmem_cache_create("sdcardfs_inode_data_cache",
+ sizeof(struct sdcardfs_inode_data), 0,
+ SLAB_RECLAIM_ACCOUNT, NULL);
+ if (!sdcardfs_inode_data_cachep) {
+ kmem_cache_destroy(sdcardfs_inode_cachep);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/* sdcardfs inode cache destructor */
+void sdcardfs_destroy_inode_cache(void)
+{
+ kmem_cache_destroy(sdcardfs_inode_data_cachep);
+ kmem_cache_destroy(sdcardfs_inode_cachep);
+}
+
+/*
+ * Used only in nfs, to kill any pending RPC tasks, so that subsequent
+ * code can actually succeed and won't leave tasks that need handling.
+ */
+static void sdcardfs_umount_begin(struct super_block *sb)
+{
+ struct super_block *lower_sb;
+
+ lower_sb = sdcardfs_lower_super(sb);
+ if (lower_sb && lower_sb->s_op && lower_sb->s_op->umount_begin)
+ lower_sb->s_op->umount_begin(lower_sb);
+}
+
+static int sdcardfs_show_options(struct vfsmount *mnt, struct seq_file *m,
+ struct dentry *root)
+{
+ struct sdcardfs_sb_info *sbi = SDCARDFS_SB(root->d_sb);
+ struct sdcardfs_mount_options *opts = &sbi->options;
+ struct sdcardfs_vfsmount_options *vfsopts = mnt->data;
+
+ if (opts->fs_low_uid != 0)
+ seq_printf(m, ",fsuid=%u", opts->fs_low_uid);
+ if (opts->fs_low_gid != 0)
+ seq_printf(m, ",fsgid=%u", opts->fs_low_gid);
+ if (vfsopts->gid != 0)
+ seq_printf(m, ",gid=%u", vfsopts->gid);
+ if (opts->multiuser)
+ seq_puts(m, ",multiuser");
+ if (vfsopts->mask)
+ seq_printf(m, ",mask=%u", vfsopts->mask);
+ if (opts->fs_user_id)
+ seq_printf(m, ",userid=%u", opts->fs_user_id);
+ if (opts->gid_derivation)
+ seq_puts(m, ",derive_gid");
+ if (opts->default_normal)
+ seq_puts(m, ",default_normal");
+ if (opts->reserved_mb != 0)
+ seq_printf(m, ",reserved=%uMB", opts->reserved_mb);
+
+ return 0;
+};
+
+const struct super_operations sdcardfs_sops = {
+ .put_super = sdcardfs_put_super,
+ .statfs = sdcardfs_statfs,
+ .remount_fs = sdcardfs_remount_fs,
+ .remount_fs2 = sdcardfs_remount_fs2,
+ .clone_mnt_data = sdcardfs_clone_mnt_data,
+ .copy_mnt_data = sdcardfs_copy_mnt_data,
+ .evict_inode = sdcardfs_evict_inode,
+ .umount_begin = sdcardfs_umount_begin,
+ .show_options2 = sdcardfs_show_options,
+ .alloc_inode = sdcardfs_alloc_inode,
+ .destroy_inode = sdcardfs_destroy_inode,
+ .drop_inode = generic_delete_inode,
+};
diff --git a/fs/squashfs/Kconfig b/fs/squashfs/Kconfig
index 1adb334..6c81bf6 100644
--- a/fs/squashfs/Kconfig
+++ b/fs/squashfs/Kconfig
@@ -26,34 +26,6 @@
If unsure, say N.
choice
- prompt "File decompression options"
- depends on SQUASHFS
- help
- Squashfs now supports two options for decompressing file
- data. Traditionally Squashfs has decompressed into an
- intermediate buffer and then memcopied it into the page cache.
- Squashfs now supports the ability to decompress directly into
- the page cache.
-
- If unsure, select "Decompress file data into an intermediate buffer"
-
-config SQUASHFS_FILE_CACHE
- bool "Decompress file data into an intermediate buffer"
- help
- Decompress file data into an intermediate buffer and then
- memcopy it into the page cache.
-
-config SQUASHFS_FILE_DIRECT
- bool "Decompress files directly into the page cache"
- help
- Directly decompress file data into the page cache.
- Doing so can significantly improve performance because
- it eliminates a memcpy and it also removes the lock contention
- on the single buffer.
-
-endchoice
-
-choice
prompt "Decompressor parallelisation options"
depends on SQUASHFS
help
diff --git a/fs/squashfs/Makefile b/fs/squashfs/Makefile
index 7bd9b8b..b9a2990 100644
--- a/fs/squashfs/Makefile
+++ b/fs/squashfs/Makefile
@@ -6,8 +6,7 @@
obj-$(CONFIG_SQUASHFS) += squashfs.o
squashfs-y += block.o cache.o dir.o export.o file.o fragment.o id.o inode.o
squashfs-y += namei.o super.o symlink.o decompressor.o
-squashfs-$(CONFIG_SQUASHFS_FILE_CACHE) += file_cache.o
-squashfs-$(CONFIG_SQUASHFS_FILE_DIRECT) += file_direct.o page_actor.o
+squashfs-y += file_direct.o page_actor.o
squashfs-$(CONFIG_SQUASHFS_DECOMP_SINGLE) += decompressor_single.o
squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI) += decompressor_multi.o
squashfs-$(CONFIG_SQUASHFS_DECOMP_MULTI_PERCPU) += decompressor_multi_percpu.o
diff --git a/fs/squashfs/block.c b/fs/squashfs/block.c
index f098b9f..dfc2e76 100644
--- a/fs/squashfs/block.c
+++ b/fs/squashfs/block.c
@@ -28,10 +28,13 @@
#include <linux/fs.h>
#include <linux/vfs.h>
+#include <linux/bio.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/pagemap.h>
#include <linux/buffer_head.h>
#include <linux/bio.h>
+#include <linux/workqueue.h>
#include "squashfs_fs.h"
#include "squashfs_fs_sb.h"
@@ -39,45 +42,382 @@
#include "decompressor.h"
#include "page_actor.h"
-/*
- * Read the metadata block length, this is stored in the first two
- * bytes of the metadata block.
- */
-static struct buffer_head *get_block_length(struct super_block *sb,
- u64 *cur_index, int *offset, int *length)
+static struct workqueue_struct *squashfs_read_wq;
+
+struct squashfs_read_request {
+ struct super_block *sb;
+ u64 index;
+ int length;
+ int compressed;
+ int offset;
+ u64 read_end;
+ struct squashfs_page_actor *output;
+ enum {
+ SQUASHFS_COPY,
+ SQUASHFS_DECOMPRESS,
+ SQUASHFS_METADATA,
+ } data_processing;
+ bool synchronous;
+
+ /*
+ * If the read is synchronous, it is possible to retrieve information
+ * about the request by setting these pointers.
+ */
+ int *res;
+ int *bytes_read;
+ int *bytes_uncompressed;
+
+ int nr_buffers;
+ struct buffer_head **bh;
+ struct work_struct offload;
+};
+
+struct squashfs_bio_request {
+ struct buffer_head **bh;
+ int nr_buffers;
+};
+
+static int squashfs_bio_submit(struct squashfs_read_request *req);
+
+int squashfs_init_read_wq(void)
{
- struct squashfs_sb_info *msblk = sb->s_fs_info;
- struct buffer_head *bh;
+ squashfs_read_wq = create_workqueue("SquashFS read wq");
+ return !!squashfs_read_wq;
+}
- bh = sb_bread(sb, *cur_index);
- if (bh == NULL)
- return NULL;
+void squashfs_destroy_read_wq(void)
+{
+ flush_workqueue(squashfs_read_wq);
+ destroy_workqueue(squashfs_read_wq);
+}
- if (msblk->devblksize - *offset == 1) {
- *length = (unsigned char) bh->b_data[*offset];
- put_bh(bh);
- bh = sb_bread(sb, ++(*cur_index));
- if (bh == NULL)
- return NULL;
- *length |= (unsigned char) bh->b_data[0] << 8;
- *offset = 1;
- } else {
- *length = (unsigned char) bh->b_data[*offset] |
- (unsigned char) bh->b_data[*offset + 1] << 8;
- *offset += 2;
+static void free_read_request(struct squashfs_read_request *req, int error)
+{
+ if (!req->synchronous)
+ squashfs_page_actor_free(req->output, error);
+ if (req->res)
+ *(req->res) = error;
+ kfree(req->bh);
+ kfree(req);
+}
- if (*offset == msblk->devblksize) {
- put_bh(bh);
- bh = sb_bread(sb, ++(*cur_index));
- if (bh == NULL)
- return NULL;
- *offset = 0;
+static void squashfs_process_blocks(struct squashfs_read_request *req)
+{
+ int error = 0;
+ int bytes, i, length;
+ struct squashfs_sb_info *msblk = req->sb->s_fs_info;
+ struct squashfs_page_actor *actor = req->output;
+ struct buffer_head **bh = req->bh;
+ int nr_buffers = req->nr_buffers;
+
+ for (i = 0; i < nr_buffers; ++i) {
+ if (!bh[i])
+ continue;
+ wait_on_buffer(bh[i]);
+ if (!buffer_uptodate(bh[i]))
+ error = -EIO;
+ }
+ if (error)
+ goto cleanup;
+
+ if (req->data_processing == SQUASHFS_METADATA) {
+ /* Extract the length of the metadata block */
+ if (req->offset != msblk->devblksize - 1) {
+ length = le16_to_cpup((__le16 *)
+ (bh[0]->b_data + req->offset));
+ } else {
+ length = (unsigned char)bh[0]->b_data[req->offset];
+ length |= (unsigned char)bh[1]->b_data[0] << 8;
+ }
+ req->compressed = SQUASHFS_COMPRESSED(length);
+ req->data_processing = req->compressed ? SQUASHFS_DECOMPRESS
+ : SQUASHFS_COPY;
+ length = SQUASHFS_COMPRESSED_SIZE(length);
+ if (req->index + length + 2 > req->read_end) {
+ for (i = 0; i < nr_buffers; ++i)
+ put_bh(bh[i]);
+ kfree(bh);
+ req->length = length;
+ req->index += 2;
+ squashfs_bio_submit(req);
+ return;
+ }
+ req->length = length;
+ req->offset = (req->offset + 2) % PAGE_SIZE;
+ if (req->offset < 2) {
+ put_bh(bh[0]);
+ ++bh;
+ --nr_buffers;
+ }
+ }
+ if (req->bytes_read)
+ *(req->bytes_read) = req->length;
+
+ if (req->data_processing == SQUASHFS_COPY) {
+ squashfs_bh_to_actor(bh, nr_buffers, req->output, req->offset,
+ req->length, msblk->devblksize);
+ } else if (req->data_processing == SQUASHFS_DECOMPRESS) {
+ req->length = squashfs_decompress(msblk, bh, nr_buffers,
+ req->offset, req->length, actor);
+ if (req->length < 0) {
+ error = -EIO;
+ goto cleanup;
}
}
- return bh;
+ /* Last page may have trailing bytes not filled */
+ bytes = req->length % PAGE_SIZE;
+ if (bytes && actor->page[actor->pages - 1])
+ zero_user_segment(actor->page[actor->pages - 1], bytes,
+ PAGE_SIZE);
+
+cleanup:
+ if (req->bytes_uncompressed)
+ *(req->bytes_uncompressed) = req->length;
+ if (error) {
+ for (i = 0; i < nr_buffers; ++i)
+ if (bh[i])
+ put_bh(bh[i]);
+ }
+ free_read_request(req, error);
}
+static void read_wq_handler(struct work_struct *work)
+{
+ squashfs_process_blocks(container_of(work,
+ struct squashfs_read_request, offload));
+}
+
+static void squashfs_bio_end_io(struct bio *bio)
+{
+ int i;
+ blk_status_t error = bio->bi_status;
+ struct squashfs_bio_request *bio_req = bio->bi_private;
+
+ bio_put(bio);
+
+ for (i = 0; i < bio_req->nr_buffers; ++i) {
+ if (!bio_req->bh[i])
+ continue;
+ if (!error)
+ set_buffer_uptodate(bio_req->bh[i]);
+ else
+ clear_buffer_uptodate(bio_req->bh[i]);
+ unlock_buffer(bio_req->bh[i]);
+ }
+ kfree(bio_req);
+}
+
+static int bh_is_optional(struct squashfs_read_request *req, int idx)
+{
+ int start_idx, end_idx;
+ struct squashfs_sb_info *msblk = req->sb->s_fs_info;
+
+ start_idx = (idx * msblk->devblksize - req->offset) >> PAGE_SHIFT;
+ end_idx = ((idx + 1) * msblk->devblksize - req->offset + 1) >> PAGE_SHIFT;
+ if (start_idx >= req->output->pages)
+ return 1;
+ if (start_idx < 0)
+ start_idx = end_idx;
+ if (end_idx >= req->output->pages)
+ end_idx = start_idx;
+ return !req->output->page[start_idx] && !req->output->page[end_idx];
+}
+
+static int actor_getblks(struct squashfs_read_request *req, u64 block)
+{
+ int i;
+
+ req->bh = kmalloc_array(req->nr_buffers, sizeof(*(req->bh)), GFP_NOIO);
+ if (!req->bh)
+ return -ENOMEM;
+
+ for (i = 0; i < req->nr_buffers; ++i) {
+ /*
+ * When dealing with an uncompressed block, the actor may
+ * contains NULL pages. There's no need to read the buffers
+ * associated with these pages.
+ */
+ if (!req->compressed && bh_is_optional(req, i)) {
+ req->bh[i] = NULL;
+ continue;
+ }
+ req->bh[i] = sb_getblk(req->sb, block + i);
+ if (!req->bh[i]) {
+ while (--i) {
+ if (req->bh[i])
+ put_bh(req->bh[i]);
+ }
+ return -1;
+ }
+ }
+ return 0;
+}
+
+static int squashfs_bio_submit(struct squashfs_read_request *req)
+{
+ struct bio *bio = NULL;
+ struct buffer_head *bh;
+ struct squashfs_bio_request *bio_req = NULL;
+ int b = 0, prev_block = 0;
+ struct squashfs_sb_info *msblk = req->sb->s_fs_info;
+
+ u64 read_start = round_down(req->index, msblk->devblksize);
+ u64 read_end = round_up(req->index + req->length, msblk->devblksize);
+ sector_t block = read_start >> msblk->devblksize_log2;
+ sector_t block_end = read_end >> msblk->devblksize_log2;
+ int offset = read_start - round_down(req->index, PAGE_SIZE);
+ int nr_buffers = block_end - block;
+ int blksz = msblk->devblksize;
+ int bio_max_pages = nr_buffers > BIO_MAX_PAGES ? BIO_MAX_PAGES
+ : nr_buffers;
+
+ /* Setup the request */
+ req->read_end = read_end;
+ req->offset = req->index - read_start;
+ req->nr_buffers = nr_buffers;
+ if (actor_getblks(req, block) < 0)
+ goto getblk_failed;
+
+ /* Create and submit the BIOs */
+ for (b = 0; b < nr_buffers; ++b, offset += blksz) {
+ bh = req->bh[b];
+ if (!bh || !trylock_buffer(bh))
+ continue;
+ if (buffer_uptodate(bh)) {
+ unlock_buffer(bh);
+ continue;
+ }
+ offset %= PAGE_SIZE;
+
+ /* Append the buffer to the current BIO if it is contiguous */
+ if (bio && bio_req && prev_block + 1 == b) {
+ if (bio_add_page(bio, bh->b_page, blksz, offset)) {
+ bio_req->nr_buffers += 1;
+ prev_block = b;
+ continue;
+ }
+ }
+
+ /* Otherwise, submit the current BIO and create a new one */
+ if (bio)
+ submit_bio(bio);
+ bio_req = kcalloc(1, sizeof(struct squashfs_bio_request),
+ GFP_NOIO);
+ if (!bio_req)
+ goto req_alloc_failed;
+ bio_req->bh = &req->bh[b];
+ bio = bio_alloc(GFP_NOIO, bio_max_pages);
+ if (!bio)
+ goto bio_alloc_failed;
+ bio_set_dev(bio, req->sb->s_bdev);
+ bio->bi_iter.bi_sector = (block + b)
+ << (msblk->devblksize_log2 - 9);
+ bio_set_op_attrs(bio, REQ_OP_READ, 0);
+ bio->bi_private = bio_req;
+ bio->bi_end_io = squashfs_bio_end_io;
+
+ bio_add_page(bio, bh->b_page, blksz, offset);
+ bio_req->nr_buffers += 1;
+ prev_block = b;
+ }
+ if (bio)
+ submit_bio(bio);
+
+ if (req->synchronous)
+ squashfs_process_blocks(req);
+ else {
+ INIT_WORK(&req->offload, read_wq_handler);
+ schedule_work(&req->offload);
+ }
+ return 0;
+
+bio_alloc_failed:
+ kfree(bio_req);
+req_alloc_failed:
+ unlock_buffer(bh);
+ while (--nr_buffers >= b)
+ if (req->bh[nr_buffers])
+ put_bh(req->bh[nr_buffers]);
+ while (--b >= 0)
+ if (req->bh[b])
+ wait_on_buffer(req->bh[b]);
+getblk_failed:
+ free_read_request(req, -ENOMEM);
+ return -ENOMEM;
+}
+
+static int read_metadata_block(struct squashfs_read_request *req,
+ u64 *next_index)
+{
+ int ret, error, bytes_read = 0, bytes_uncompressed = 0;
+ struct squashfs_sb_info *msblk = req->sb->s_fs_info;
+
+ if (req->index + 2 > msblk->bytes_used) {
+ free_read_request(req, -EINVAL);
+ return -EINVAL;
+ }
+ req->length = 2;
+
+ /* Do not read beyond the end of the device */
+ if (req->index + req->length > msblk->bytes_used)
+ req->length = msblk->bytes_used - req->index;
+ req->data_processing = SQUASHFS_METADATA;
+
+ /*
+ * Reading metadata is always synchronous because we don't know the
+ * length in advance and the function is expected to update
+ * 'next_index' and return the length.
+ */
+ req->synchronous = true;
+ req->res = &error;
+ req->bytes_read = &bytes_read;
+ req->bytes_uncompressed = &bytes_uncompressed;
+
+ TRACE("Metadata block @ 0x%llx, %scompressed size %d, src size %d\n",
+ req->index, req->compressed ? "" : "un", bytes_read,
+ req->output->length);
+
+ ret = squashfs_bio_submit(req);
+ if (ret)
+ return ret;
+ if (error)
+ return error;
+ if (next_index)
+ *next_index += 2 + bytes_read;
+ return bytes_uncompressed;
+}
+
+static int read_data_block(struct squashfs_read_request *req, int length,
+ u64 *next_index, bool synchronous)
+{
+ int ret, error = 0, bytes_uncompressed = 0, bytes_read = 0;
+
+ req->compressed = SQUASHFS_COMPRESSED_BLOCK(length);
+ req->length = length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length);
+ req->data_processing = req->compressed ? SQUASHFS_DECOMPRESS
+ : SQUASHFS_COPY;
+
+ req->synchronous = synchronous;
+ if (synchronous) {
+ req->res = &error;
+ req->bytes_read = &bytes_read;
+ req->bytes_uncompressed = &bytes_uncompressed;
+ }
+
+ TRACE("Data block @ 0x%llx, %scompressed size %d, src size %d\n",
+ req->index, req->compressed ? "" : "un", req->length,
+ req->output->length);
+
+ ret = squashfs_bio_submit(req);
+ if (ret)
+ return ret;
+ if (synchronous)
+ ret = error ? error : bytes_uncompressed;
+ if (next_index)
+ *next_index += length;
+ return ret;
+}
/*
* Read and decompress a metadata block or datablock. Length is non-zero
@@ -88,130 +428,50 @@
* generated a larger block - this does occasionally happen with compression
* algorithms).
*/
-int squashfs_read_data(struct super_block *sb, u64 index, int length,
- u64 *next_index, struct squashfs_page_actor *output)
+static int __squashfs_read_data(struct super_block *sb, u64 index, int length,
+ u64 *next_index, struct squashfs_page_actor *output, bool sync)
{
- struct squashfs_sb_info *msblk = sb->s_fs_info;
- struct buffer_head **bh;
- int offset = index & ((1 << msblk->devblksize_log2) - 1);
- u64 cur_index = index >> msblk->devblksize_log2;
- int bytes, compressed, b = 0, k = 0, avail, i;
+ struct squashfs_read_request *req;
- bh = kcalloc(((output->length + msblk->devblksize - 1)
- >> msblk->devblksize_log2) + 1, sizeof(*bh), GFP_KERNEL);
- if (bh == NULL)
+ req = kcalloc(1, sizeof(struct squashfs_read_request), GFP_KERNEL);
+ if (!req) {
+ if (!sync)
+ squashfs_page_actor_free(output, -ENOMEM);
return -ENOMEM;
-
- if (length) {
- /*
- * Datablock.
- */
- bytes = -offset;
- compressed = SQUASHFS_COMPRESSED_BLOCK(length);
- length = SQUASHFS_COMPRESSED_SIZE_BLOCK(length);
- if (next_index)
- *next_index = index + length;
-
- TRACE("Block @ 0x%llx, %scompressed size %d, src size %d\n",
- index, compressed ? "" : "un", length, output->length);
-
- if (length < 0 || length > output->length ||
- (index + length) > msblk->bytes_used)
- goto read_failure;
-
- for (b = 0; bytes < length; b++, cur_index++) {
- bh[b] = sb_getblk(sb, cur_index);
- if (bh[b] == NULL)
- goto block_release;
- bytes += msblk->devblksize;
- }
- ll_rw_block(REQ_OP_READ, 0, b, bh);
- } else {
- /*
- * Metadata block.
- */
- if ((index + 2) > msblk->bytes_used)
- goto read_failure;
-
- bh[0] = get_block_length(sb, &cur_index, &offset, &length);
- if (bh[0] == NULL)
- goto read_failure;
- b = 1;
-
- bytes = msblk->devblksize - offset;
- compressed = SQUASHFS_COMPRESSED(length);
- length = SQUASHFS_COMPRESSED_SIZE(length);
- if (next_index)
- *next_index = index + length + 2;
-
- TRACE("Block @ 0x%llx, %scompressed size %d\n", index,
- compressed ? "" : "un", length);
-
- if (length < 0 || length > output->length ||
- (index + length) > msblk->bytes_used)
- goto block_release;
-
- for (; bytes < length; b++) {
- bh[b] = sb_getblk(sb, ++cur_index);
- if (bh[b] == NULL)
- goto block_release;
- bytes += msblk->devblksize;
- }
- ll_rw_block(REQ_OP_READ, 0, b - 1, bh + 1);
}
- for (i = 0; i < b; i++) {
- wait_on_buffer(bh[i]);
- if (!buffer_uptodate(bh[i]))
- goto block_release;
+ req->sb = sb;
+ req->index = index;
+ req->output = output;
+
+ if (next_index)
+ *next_index = index;
+
+ if (length)
+ length = read_data_block(req, length, next_index, sync);
+ else
+ length = read_metadata_block(req, next_index);
+
+ if (length < 0) {
+ ERROR("squashfs_read_data failed to read block 0x%llx\n",
+ (unsigned long long)index);
+ return -EIO;
}
- if (compressed) {
- if (!msblk->stream)
- goto read_failure;
- length = squashfs_decompress(msblk, bh, b, offset, length,
- output);
- if (length < 0)
- goto read_failure;
- } else {
- /*
- * Block is uncompressed.
- */
- int in, pg_offset = 0;
- void *data = squashfs_first_page(output);
-
- for (bytes = length; k < b; k++) {
- in = min(bytes, msblk->devblksize - offset);
- bytes -= in;
- while (in) {
- if (pg_offset == PAGE_SIZE) {
- data = squashfs_next_page(output);
- pg_offset = 0;
- }
- avail = min_t(int, in, PAGE_SIZE -
- pg_offset);
- memcpy(data + pg_offset, bh[k]->b_data + offset,
- avail);
- in -= avail;
- pg_offset += avail;
- offset += avail;
- }
- offset = 0;
- put_bh(bh[k]);
- }
- squashfs_finish_page(output);
- }
-
- kfree(bh);
return length;
+}
-block_release:
- for (; k < b; k++)
- put_bh(bh[k]);
+int squashfs_read_data(struct super_block *sb, u64 index, int length,
+ u64 *next_index, struct squashfs_page_actor *output)
+{
+ return __squashfs_read_data(sb, index, length, next_index, output,
+ true);
+}
-read_failure:
- ERROR("squashfs_read_data failed to read block 0x%llx\n",
- (unsigned long long) index);
- kfree(bh);
- return -EIO;
+int squashfs_read_data_async(struct super_block *sb, u64 index, int length,
+ u64 *next_index, struct squashfs_page_actor *output)
+{
+
+ return __squashfs_read_data(sb, index, length, next_index, output,
+ false);
}
diff --git a/fs/squashfs/cache.c b/fs/squashfs/cache.c
index 0839efa..9d9d4aa 100644
--- a/fs/squashfs/cache.c
+++ b/fs/squashfs/cache.c
@@ -209,17 +209,14 @@
*/
void squashfs_cache_delete(struct squashfs_cache *cache)
{
- int i, j;
+ int i;
if (cache == NULL)
return;
for (i = 0; i < cache->entries; i++) {
- if (cache->entry[i].data) {
- for (j = 0; j < cache->pages; j++)
- kfree(cache->entry[i].data[j]);
- kfree(cache->entry[i].data);
- }
+ if (cache->entry[i].page)
+ free_page_array(cache->entry[i].page, cache->pages);
kfree(cache->entry[i].actor);
}
@@ -236,7 +233,7 @@
struct squashfs_cache *squashfs_cache_init(char *name, int entries,
int block_size)
{
- int i, j;
+ int i;
struct squashfs_cache *cache = kzalloc(sizeof(*cache), GFP_KERNEL);
if (cache == NULL) {
@@ -268,22 +265,13 @@
init_waitqueue_head(&cache->entry[i].wait_queue);
entry->cache = cache;
entry->block = SQUASHFS_INVALID_BLK;
- entry->data = kcalloc(cache->pages, sizeof(void *), GFP_KERNEL);
- if (entry->data == NULL) {
+ entry->page = alloc_page_array(cache->pages, GFP_KERNEL);
+ if (!entry->page) {
ERROR("Failed to allocate %s cache entry\n", name);
goto cleanup;
}
-
- for (j = 0; j < cache->pages; j++) {
- entry->data[j] = kmalloc(PAGE_SIZE, GFP_KERNEL);
- if (entry->data[j] == NULL) {
- ERROR("Failed to allocate %s buffer\n", name);
- goto cleanup;
- }
- }
-
- entry->actor = squashfs_page_actor_init(entry->data,
- cache->pages, 0);
+ entry->actor = squashfs_page_actor_init(entry->page,
+ cache->pages, 0, NULL);
if (entry->actor == NULL) {
ERROR("Failed to allocate %s cache entry\n", name);
goto cleanup;
@@ -314,18 +302,20 @@
return min(length, entry->length - offset);
while (offset < entry->length) {
- void *buff = entry->data[offset / PAGE_SIZE]
- + (offset % PAGE_SIZE);
+ void *buff = kmap_atomic(entry->page[offset / PAGE_SIZE])
+ + (offset % PAGE_SIZE);
int bytes = min_t(int, entry->length - offset,
PAGE_SIZE - (offset % PAGE_SIZE));
if (bytes >= remaining) {
memcpy(buffer, buff, remaining);
+ kunmap_atomic(buff);
remaining = 0;
break;
}
memcpy(buffer, buff, bytes);
+ kunmap_atomic(buff);
buffer += bytes;
remaining -= bytes;
offset += bytes;
@@ -419,43 +409,38 @@
void *squashfs_read_table(struct super_block *sb, u64 block, int length)
{
int pages = (length + PAGE_SIZE - 1) >> PAGE_SHIFT;
- int i, res;
- void *table, *buffer, **data;
+ struct page **page;
+ void *buff;
+ int res;
struct squashfs_page_actor *actor;
- table = buffer = kmalloc(length, GFP_KERNEL);
- if (table == NULL)
+ page = alloc_page_array(pages, GFP_KERNEL);
+ if (!page)
return ERR_PTR(-ENOMEM);
- data = kcalloc(pages, sizeof(void *), GFP_KERNEL);
- if (data == NULL) {
+ actor = squashfs_page_actor_init(page, pages, length, NULL);
+ if (actor == NULL) {
res = -ENOMEM;
goto failed;
}
- actor = squashfs_page_actor_init(data, pages, length);
- if (actor == NULL) {
- res = -ENOMEM;
- goto failed2;
- }
-
- for (i = 0; i < pages; i++, buffer += PAGE_SIZE)
- data[i] = buffer;
-
res = squashfs_read_data(sb, block, length |
SQUASHFS_COMPRESSED_BIT_BLOCK, NULL, actor);
- kfree(data);
- kfree(actor);
-
if (res < 0)
- goto failed;
+ goto failed2;
- return table;
+ buff = kmalloc(length, GFP_KERNEL);
+ if (!buff)
+ goto failed2;
+ squashfs_actor_to_buf(actor, buff, length);
+ squashfs_page_actor_free(actor, 0);
+ free_page_array(page, pages);
+ return buff;
failed2:
- kfree(data);
+ squashfs_page_actor_free(actor, 0);
failed:
- kfree(table);
+ free_page_array(page, pages);
return ERR_PTR(res);
}
diff --git a/fs/squashfs/decompressor.c b/fs/squashfs/decompressor.c
index 8366398..86831aa 100644
--- a/fs/squashfs/decompressor.c
+++ b/fs/squashfs/decompressor.c
@@ -24,7 +24,8 @@
#include <linux/types.h>
#include <linux/mutex.h>
#include <linux/slab.h>
-#include <linux/buffer_head.h>
+#include <linux/highmem.h>
+#include <linux/fs.h>
#include "squashfs_fs.h"
#include "squashfs_fs_sb.h"
@@ -101,40 +102,44 @@
static void *get_comp_opts(struct super_block *sb, unsigned short flags)
{
struct squashfs_sb_info *msblk = sb->s_fs_info;
- void *buffer = NULL, *comp_opts;
+ void *comp_opts, *buffer = NULL;
+ struct page *page;
struct squashfs_page_actor *actor = NULL;
int length = 0;
+ if (!SQUASHFS_COMP_OPTS(flags))
+ return squashfs_comp_opts(msblk, buffer, length);
+
/*
* Read decompressor specific options from file system if present
*/
- if (SQUASHFS_COMP_OPTS(flags)) {
- buffer = kmalloc(PAGE_SIZE, GFP_KERNEL);
- if (buffer == NULL) {
- comp_opts = ERR_PTR(-ENOMEM);
- goto out;
- }
- actor = squashfs_page_actor_init(&buffer, 1, 0);
- if (actor == NULL) {
- comp_opts = ERR_PTR(-ENOMEM);
- goto out;
- }
+ page = alloc_page(GFP_KERNEL);
+ if (!page)
+ return ERR_PTR(-ENOMEM);
- length = squashfs_read_data(sb,
- sizeof(struct squashfs_super_block), 0, NULL, actor);
-
- if (length < 0) {
- comp_opts = ERR_PTR(length);
- goto out;
- }
+ actor = squashfs_page_actor_init(&page, 1, 0, NULL);
+ if (actor == NULL) {
+ comp_opts = ERR_PTR(-ENOMEM);
+ goto actor_error;
}
- comp_opts = squashfs_comp_opts(msblk, buffer, length);
+ length = squashfs_read_data(sb,
+ sizeof(struct squashfs_super_block), 0, NULL, actor);
-out:
- kfree(actor);
- kfree(buffer);
+ if (length < 0) {
+ comp_opts = ERR_PTR(length);
+ goto read_error;
+ }
+
+ buffer = kmap_atomic(page);
+ comp_opts = squashfs_comp_opts(msblk, buffer, length);
+ kunmap_atomic(buffer);
+
+read_error:
+ squashfs_page_actor_free(actor, 0);
+actor_error:
+ __free_page(page);
return comp_opts;
}
diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
index f1c1430..9236e7d 100644
--- a/fs/squashfs/file.c
+++ b/fs/squashfs/file.c
@@ -47,6 +47,7 @@
#include <linux/string.h>
#include <linux/pagemap.h>
#include <linux/mutex.h>
+#include <linux/mm_inline.h>
#include "squashfs_fs.h"
#include "squashfs_fs_sb.h"
@@ -451,63 +452,129 @@
return res;
}
+static int squashfs_readpages_fragment(struct page *page,
+ struct list_head *readahead_pages, struct address_space *mapping,
+ int expected)
+{
+ if (!page) {
+ page = lru_to_page(readahead_pages);
+ list_del(&page->lru);
+ if (add_to_page_cache_lru(page, mapping, page->index,
+ mapping_gfp_constraint(mapping, GFP_KERNEL))) {
+ put_page(page);
+ return 0;
+ }
+ }
+ return squashfs_readpage_fragment(page, expected);
+}
+
static int squashfs_readpage_sparse(struct page *page, int expected)
{
squashfs_copy_cache(page, NULL, expected, 0);
return 0;
}
-static int squashfs_readpage(struct file *file, struct page *page)
+static int squashfs_readpages_sparse(struct page *page,
+ struct list_head *readahead_pages, struct address_space *mapping,
+ int expected)
{
- struct inode *inode = page->mapping->host;
+ if (!page) {
+ page = lru_to_page(readahead_pages);
+ list_del(&page->lru);
+ if (add_to_page_cache_lru(page, mapping, page->index,
+ mapping_gfp_constraint(mapping, GFP_KERNEL))) {
+ put_page(page);
+ return 0;
+ }
+ }
+ return squashfs_readpage_sparse(page, expected);
+}
+
+static int __squashfs_readpages(struct file *file, struct page *page,
+ struct list_head *readahead_pages, unsigned int nr_pages,
+ struct address_space *mapping)
+{
+ struct inode *inode = mapping->host;
struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
- int index = page->index >> (msblk->block_log - PAGE_SHIFT);
int file_end = i_size_read(inode) >> msblk->block_log;
- int expected = index == file_end ?
+ int res;
+
+ do {
+ struct page *cur_page = page ? page
+ : lru_to_page(readahead_pages);
+ int page_index = cur_page->index;
+ int index = page_index >> (msblk->block_log - PAGE_SHIFT);
+ int expected = index == file_end ?
(i_size_read(inode) & (msblk->block_size - 1)) :
msblk->block_size;
- int res;
- void *pageaddr;
+
+ if (page_index >= ((i_size_read(inode) + PAGE_SIZE - 1) >>
+ PAGE_SHIFT))
+ return 1;
+
+ if (index < file_end || squashfs_i(inode)->fragment_block ==
+ SQUASHFS_INVALID_BLK) {
+ u64 block = 0;
+ int bsize = read_blocklist(inode, index, &block);
+
+ if (bsize < 0)
+ return -1;
+
+ if (bsize == 0) {
+ res = squashfs_readpages_sparse(page,
+ readahead_pages, mapping, expected);
+ } else {
+ res = squashfs_readpages_block(page,
+ readahead_pages, &nr_pages, mapping,
+ page_index, block, bsize);
+ }
+ } else {
+ res = squashfs_readpages_fragment(page,
+ readahead_pages, mapping, expected);
+ }
+ if (res)
+ return 0;
+ page = NULL;
+ } while (readahead_pages && !list_empty(readahead_pages));
+
+ return 0;
+}
+
+static int squashfs_readpage(struct file *file, struct page *page)
+{
+ int ret;
TRACE("Entered squashfs_readpage, page index %lx, start block %llx\n",
- page->index, squashfs_i(inode)->start);
+ page->index, squashfs_i(page->mapping->host)->start);
- if (page->index >= ((i_size_read(inode) + PAGE_SIZE - 1) >>
- PAGE_SHIFT))
- goto out;
+ get_page(page);
- if (index < file_end || squashfs_i(inode)->fragment_block ==
- SQUASHFS_INVALID_BLK) {
- u64 block = 0;
- int bsize = read_blocklist(inode, index, &block);
- if (bsize < 0)
- goto error_out;
-
- if (bsize == 0)
- res = squashfs_readpage_sparse(page, expected);
+ ret = __squashfs_readpages(file, page, NULL, 1, page->mapping);
+ if (ret) {
+ flush_dcache_page(page);
+ if (ret < 0)
+ SetPageError(page);
else
- res = squashfs_readpage_block(page, block, bsize, expected);
- } else
- res = squashfs_readpage_fragment(page, expected);
+ SetPageUptodate(page);
+ zero_user_segment(page, 0, PAGE_SIZE);
+ unlock_page(page);
+ put_page(page);
+ }
- if (!res)
- return 0;
+ return 0;
+}
-error_out:
- SetPageError(page);
-out:
- pageaddr = kmap_atomic(page);
- memset(pageaddr, 0, PAGE_SIZE);
- kunmap_atomic(pageaddr);
- flush_dcache_page(page);
- if (!PageError(page))
- SetPageUptodate(page);
- unlock_page(page);
-
+static int squashfs_readpages(struct file *file, struct address_space *mapping,
+ struct list_head *pages, unsigned int nr_pages)
+{
+ TRACE("Entered squashfs_readpages, %u pages, first page index %lx\n",
+ nr_pages, lru_to_page(pages)->index);
+ __squashfs_readpages(file, NULL, pages, nr_pages, mapping);
return 0;
}
const struct address_space_operations squashfs_aops = {
- .readpage = squashfs_readpage
+ .readpage = squashfs_readpage,
+ .readpages = squashfs_readpages,
};
diff --git a/fs/squashfs/file_cache.c b/fs/squashfs/file_cache.c
deleted file mode 100644
index a9ba8d9..0000000
--- a/fs/squashfs/file_cache.c
+++ /dev/null
@@ -1,38 +0,0 @@
-/*
- * Copyright (c) 2013
- * Phillip Lougher <phillip@squashfs.org.uk>
- *
- * This work is licensed under the terms of the GNU GPL, version 2. See
- * the COPYING file in the top-level directory.
- */
-
-#include <linux/fs.h>
-#include <linux/vfs.h>
-#include <linux/kernel.h>
-#include <linux/slab.h>
-#include <linux/string.h>
-#include <linux/pagemap.h>
-#include <linux/mutex.h>
-
-#include "squashfs_fs.h"
-#include "squashfs_fs_sb.h"
-#include "squashfs_fs_i.h"
-#include "squashfs.h"
-
-/* Read separately compressed datablock and memcopy into page cache */
-int squashfs_readpage_block(struct page *page, u64 block, int bsize, int expected)
-{
- struct inode *i = page->mapping->host;
- struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb,
- block, bsize);
- int res = buffer->error;
-
- if (res)
- ERROR("Unable to read page, block %llx, size %x\n", block,
- bsize);
- else
- squashfs_copy_cache(page, buffer, expected, 0);
-
- squashfs_cache_put(buffer);
- return res;
-}
diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c
index 80db1b8..dc87f77 100644
--- a/fs/squashfs/file_direct.c
+++ b/fs/squashfs/file_direct.c
@@ -13,6 +13,7 @@
#include <linux/string.h>
#include <linux/pagemap.h>
#include <linux/mutex.h>
+#include <linux/mm_inline.h>
#include "squashfs_fs.h"
#include "squashfs_fs_sb.h"
@@ -20,157 +21,136 @@
#include "squashfs.h"
#include "page_actor.h"
-static int squashfs_read_cache(struct page *target_page, u64 block, int bsize,
- int pages, struct page **page, int bytes);
-
-/* Read separately compressed datablock directly into page cache */
-int squashfs_readpage_block(struct page *target_page, u64 block, int bsize,
- int expected)
-
+static void release_actor_pages(struct page **page, int pages, int error)
{
- struct inode *inode = target_page->mapping->host;
- struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+ int i;
- int file_end = (i_size_read(inode) - 1) >> PAGE_SHIFT;
- int mask = (1 << (msblk->block_log - PAGE_SHIFT)) - 1;
- int start_index = target_page->index & ~mask;
- int end_index = start_index | mask;
- int i, n, pages, missing_pages, bytes, res = -ENOMEM;
+ for (i = 0; i < pages; i++) {
+ if (!page[i])
+ continue;
+ flush_dcache_page(page[i]);
+ if (!error)
+ SetPageUptodate(page[i]);
+ else {
+ SetPageError(page[i]);
+ zero_user_segment(page[i], 0, PAGE_SIZE);
+ }
+ unlock_page(page[i]);
+ put_page(page[i]);
+ }
+ kfree(page);
+}
+
+/*
+ * Create a "page actor" which will kmap and kunmap the
+ * page cache pages appropriately within the decompressor
+ */
+static struct squashfs_page_actor *actor_from_page_cache(
+ unsigned int actor_pages, struct page *target_page,
+ struct list_head *rpages, unsigned int *nr_pages, int start_index,
+ struct address_space *mapping)
+{
struct page **page;
struct squashfs_page_actor *actor;
- void *pageaddr;
+ int i, n;
+ gfp_t gfp = mapping_gfp_constraint(mapping, GFP_KERNEL);
- if (end_index > file_end)
- end_index = file_end;
+ page = kmalloc_array(actor_pages, sizeof(void *), GFP_KERNEL);
+ if (!page)
+ return NULL;
- pages = end_index - start_index + 1;
+ for (i = 0, n = start_index; i < actor_pages; i++, n++) {
+ if (target_page == NULL && rpages && !list_empty(rpages)) {
+ struct page *cur_page = lru_to_page(rpages);
- page = kmalloc_array(pages, sizeof(void *), GFP_KERNEL);
- if (page == NULL)
- return res;
+ if (cur_page->index < start_index + actor_pages) {
+ list_del(&cur_page->lru);
+ --(*nr_pages);
+ if (add_to_page_cache_lru(cur_page, mapping,
+ cur_page->index, gfp))
+ put_page(cur_page);
+ else
+ target_page = cur_page;
+ } else
+ rpages = NULL;
+ }
- /*
- * Create a "page actor" which will kmap and kunmap the
- * page cache pages appropriately within the decompressor
- */
- actor = squashfs_page_actor_init_special(page, pages, 0);
- if (actor == NULL)
- goto out;
-
- /* Try to grab all the pages covered by the Squashfs block */
- for (missing_pages = 0, i = 0, n = start_index; i < pages; i++, n++) {
- page[i] = (n == target_page->index) ? target_page :
- grab_cache_page_nowait(target_page->mapping, n);
-
- if (page[i] == NULL) {
- missing_pages++;
- continue;
+ if (target_page && target_page->index == n) {
+ page[i] = target_page;
+ target_page = NULL;
+ } else {
+ page[i] = grab_cache_page_nowait(mapping, n);
+ if (page[i] == NULL)
+ continue;
}
if (PageUptodate(page[i])) {
unlock_page(page[i]);
put_page(page[i]);
page[i] = NULL;
- missing_pages++;
}
}
- if (missing_pages) {
- /*
- * Couldn't get one or more pages, this page has either
- * been VM reclaimed, but others are still in the page cache
- * and uptodate, or we're racing with another thread in
- * squashfs_readpage also trying to grab them. Fall back to
- * using an intermediate buffer.
- */
- res = squashfs_read_cache(target_page, block, bsize, pages,
- page, expected);
- if (res < 0)
- goto mark_errored;
-
- goto out;
+ actor = squashfs_page_actor_init(page, actor_pages, 0,
+ release_actor_pages);
+ if (!actor) {
+ release_actor_pages(page, actor_pages, -ENOMEM);
+ kfree(page);
+ return NULL;
}
-
- /* Decompress directly into the page cache buffers */
- res = squashfs_read_data(inode->i_sb, block, bsize, NULL, actor);
- if (res < 0)
- goto mark_errored;
-
- if (res != expected) {
- res = -EIO;
- goto mark_errored;
- }
-
- /* Last page may have trailing bytes not filled */
- bytes = res % PAGE_SIZE;
- if (bytes) {
- pageaddr = kmap_atomic(page[pages - 1]);
- memset(pageaddr + bytes, 0, PAGE_SIZE - bytes);
- kunmap_atomic(pageaddr);
- }
-
- /* Mark pages as uptodate, unlock and release */
- for (i = 0; i < pages; i++) {
- flush_dcache_page(page[i]);
- SetPageUptodate(page[i]);
- unlock_page(page[i]);
- if (page[i] != target_page)
- put_page(page[i]);
- }
-
- kfree(actor);
- kfree(page);
-
- return 0;
-
-mark_errored:
- /* Decompression failed, mark pages as errored. Target_page is
- * dealt with by the caller
- */
- for (i = 0; i < pages; i++) {
- if (page[i] == NULL || page[i] == target_page)
- continue;
- flush_dcache_page(page[i]);
- SetPageError(page[i]);
- unlock_page(page[i]);
- put_page(page[i]);
- }
-
-out:
- kfree(actor);
- kfree(page);
- return res;
+ return actor;
}
+int squashfs_readpages_block(struct page *target_page,
+ struct list_head *readahead_pages,
+ unsigned int *nr_pages,
+ struct address_space *mapping,
+ int page_index, u64 block, int bsize)
-static int squashfs_read_cache(struct page *target_page, u64 block, int bsize,
- int pages, struct page **page, int bytes)
{
- struct inode *i = target_page->mapping->host;
- struct squashfs_cache_entry *buffer = squashfs_get_datablock(i->i_sb,
- block, bsize);
- int res = buffer->error, n, offset = 0;
+ struct squashfs_page_actor *actor;
+ struct inode *inode = mapping->host;
+ struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
+ int start_index, end_index, file_end, actor_pages, res;
+ int mask = (1 << (msblk->block_log - PAGE_SHIFT)) - 1;
- if (res) {
- ERROR("Unable to read page, block %llx, size %x\n", block,
- bsize);
- goto out;
+ /*
+ * If readpage() is called on an uncompressed datablock, we can just
+ * read the pages instead of fetching the whole block.
+ * This greatly improves the performance when a process keep doing
+ * random reads because we only fetch the necessary data.
+ * The readahead algorithm will take care of doing speculative reads
+ * if necessary.
+ * We can't read more than 1 block even if readahead provides use more
+ * pages because we don't know yet if the next block is compressed or
+ * not.
+ */
+ if (bsize && !SQUASHFS_COMPRESSED_BLOCK(bsize)) {
+ u64 block_end = block + msblk->block_size;
+
+ block += (page_index & mask) * PAGE_SIZE;
+ actor_pages = (block_end - block) / PAGE_SIZE;
+ if (*nr_pages < actor_pages)
+ actor_pages = *nr_pages;
+ start_index = page_index;
+ bsize = min_t(int, bsize, (PAGE_SIZE * actor_pages)
+ | SQUASHFS_COMPRESSED_BIT_BLOCK);
+ } else {
+ file_end = (i_size_read(inode) - 1) >> PAGE_SHIFT;
+ start_index = page_index & ~mask;
+ end_index = start_index | mask;
+ if (end_index > file_end)
+ end_index = file_end;
+ actor_pages = end_index - start_index + 1;
}
- for (n = 0; n < pages && bytes > 0; n++,
- bytes -= PAGE_SIZE, offset += PAGE_SIZE) {
- int avail = min_t(int, bytes, PAGE_SIZE);
+ actor = actor_from_page_cache(actor_pages, target_page,
+ readahead_pages, nr_pages, start_index,
+ mapping);
+ if (!actor)
+ return -ENOMEM;
- if (page[n] == NULL)
- continue;
-
- squashfs_fill_page(page[n], buffer, offset, avail);
- unlock_page(page[n]);
- if (page[n] != target_page)
- put_page(page[n]);
- }
-
-out:
- squashfs_cache_put(buffer);
- return res;
+ res = squashfs_read_data_async(inode->i_sb, block, bsize, NULL,
+ actor);
+ return res < 0 ? res : 0;
}
diff --git a/fs/squashfs/lz4_wrapper.c b/fs/squashfs/lz4_wrapper.c
index 95da653..5d85125 100644
--- a/fs/squashfs/lz4_wrapper.c
+++ b/fs/squashfs/lz4_wrapper.c
@@ -94,39 +94,17 @@
struct buffer_head **bh, int b, int offset, int length,
struct squashfs_page_actor *output)
{
+ int res;
struct squashfs_lz4 *stream = strm;
- void *buff = stream->input, *data;
- int avail, i, bytes = length, res;
- for (i = 0; i < b; i++) {
- avail = min(bytes, msblk->devblksize - offset);
- memcpy(buff, bh[i]->b_data + offset, avail);
- buff += avail;
- bytes -= avail;
- offset = 0;
- put_bh(bh[i]);
- }
-
+ squashfs_bh_to_buf(bh, b, stream->input, offset, length,
+ msblk->devblksize);
res = LZ4_decompress_safe(stream->input, stream->output,
length, output->length);
if (res < 0)
return -EIO;
-
- bytes = res;
- data = squashfs_first_page(output);
- buff = stream->output;
- while (data) {
- if (bytes <= PAGE_SIZE) {
- memcpy(data, buff, bytes);
- break;
- }
- memcpy(data, buff, PAGE_SIZE);
- buff += PAGE_SIZE;
- bytes -= PAGE_SIZE;
- data = squashfs_next_page(output);
- }
- squashfs_finish_page(output);
+ squashfs_buf_to_actor(stream->output, output, res);
return res;
}
diff --git a/fs/squashfs/lzo_wrapper.c b/fs/squashfs/lzo_wrapper.c
index 934c17e..2c844d5 100644
--- a/fs/squashfs/lzo_wrapper.c
+++ b/fs/squashfs/lzo_wrapper.c
@@ -79,45 +79,19 @@
struct buffer_head **bh, int b, int offset, int length,
struct squashfs_page_actor *output)
{
- struct squashfs_lzo *stream = strm;
- void *buff = stream->input, *data;
- int avail, i, bytes = length, res;
+ int res;
size_t out_len = output->length;
+ struct squashfs_lzo *stream = strm;
- for (i = 0; i < b; i++) {
- avail = min(bytes, msblk->devblksize - offset);
- memcpy(buff, bh[i]->b_data + offset, avail);
- buff += avail;
- bytes -= avail;
- offset = 0;
- put_bh(bh[i]);
- }
-
+ squashfs_bh_to_buf(bh, b, stream->input, offset, length,
+ msblk->devblksize);
res = lzo1x_decompress_safe(stream->input, (size_t)length,
stream->output, &out_len);
if (res != LZO_E_OK)
- goto failed;
+ return -EIO;
+ squashfs_buf_to_actor(stream->output, output, out_len);
- res = bytes = (int)out_len;
- data = squashfs_first_page(output);
- buff = stream->output;
- while (data) {
- if (bytes <= PAGE_SIZE) {
- memcpy(data, buff, bytes);
- break;
- } else {
- memcpy(data, buff, PAGE_SIZE);
- buff += PAGE_SIZE;
- bytes -= PAGE_SIZE;
- data = squashfs_next_page(output);
- }
- }
- squashfs_finish_page(output);
-
- return res;
-
-failed:
- return -EIO;
+ return out_len;
}
const struct squashfs_decompressor squashfs_lzo_comp_ops = {
diff --git a/fs/squashfs/page_actor.c b/fs/squashfs/page_actor.c
index 9b7b1b6..e348f56 100644
--- a/fs/squashfs/page_actor.c
+++ b/fs/squashfs/page_actor.c
@@ -9,79 +9,11 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/pagemap.h>
+#include <linux/buffer_head.h>
#include "page_actor.h"
-/*
- * This file contains implementations of page_actor for decompressing into
- * an intermediate buffer, and for decompressing directly into the
- * page cache.
- *
- * Calling code should avoid sleeping between calls to squashfs_first_page()
- * and squashfs_finish_page().
- */
-
-/* Implementation of page_actor for decompressing into intermediate buffer */
-static void *cache_first_page(struct squashfs_page_actor *actor)
-{
- actor->next_page = 1;
- return actor->buffer[0];
-}
-
-static void *cache_next_page(struct squashfs_page_actor *actor)
-{
- if (actor->next_page == actor->pages)
- return NULL;
-
- return actor->buffer[actor->next_page++];
-}
-
-static void cache_finish_page(struct squashfs_page_actor *actor)
-{
- /* empty */
-}
-
-struct squashfs_page_actor *squashfs_page_actor_init(void **buffer,
- int pages, int length)
-{
- struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
-
- if (actor == NULL)
- return NULL;
-
- actor->length = length ? : pages * PAGE_SIZE;
- actor->buffer = buffer;
- actor->pages = pages;
- actor->next_page = 0;
- actor->squashfs_first_page = cache_first_page;
- actor->squashfs_next_page = cache_next_page;
- actor->squashfs_finish_page = cache_finish_page;
- return actor;
-}
-
-/* Implementation of page_actor for decompressing directly into page cache. */
-static void *direct_first_page(struct squashfs_page_actor *actor)
-{
- actor->next_page = 1;
- return actor->pageaddr = kmap_atomic(actor->page[0]);
-}
-
-static void *direct_next_page(struct squashfs_page_actor *actor)
-{
- if (actor->pageaddr)
- kunmap_atomic(actor->pageaddr);
-
- return actor->pageaddr = actor->next_page == actor->pages ? NULL :
- kmap_atomic(actor->page[actor->next_page++]);
-}
-
-static void direct_finish_page(struct squashfs_page_actor *actor)
-{
- if (actor->pageaddr)
- kunmap_atomic(actor->pageaddr);
-}
-
-struct squashfs_page_actor *squashfs_page_actor_init_special(struct page **page,
- int pages, int length)
+struct squashfs_page_actor *squashfs_page_actor_init(struct page **page,
+ int pages, int length, void (*release_pages)(struct page **, int, int))
{
struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
@@ -93,8 +25,129 @@
actor->pages = pages;
actor->next_page = 0;
actor->pageaddr = NULL;
- actor->squashfs_first_page = direct_first_page;
- actor->squashfs_next_page = direct_next_page;
- actor->squashfs_finish_page = direct_finish_page;
+ actor->release_pages = release_pages;
return actor;
}
+
+void squashfs_page_actor_free(struct squashfs_page_actor *actor, int error)
+{
+ if (!actor)
+ return;
+
+ if (actor->release_pages)
+ actor->release_pages(actor->page, actor->pages, error);
+ kfree(actor);
+}
+
+void squashfs_actor_to_buf(struct squashfs_page_actor *actor, void *buf,
+ int length)
+{
+ void *pageaddr;
+ int pos = 0, avail, i;
+
+ for (i = 0; i < actor->pages && pos < length; ++i) {
+ avail = min_t(int, length - pos, PAGE_SIZE);
+ if (actor->page[i]) {
+ pageaddr = kmap_atomic(actor->page[i]);
+ memcpy(buf + pos, pageaddr, avail);
+ kunmap_atomic(pageaddr);
+ }
+ pos += avail;
+ }
+}
+
+void squashfs_buf_to_actor(void *buf, struct squashfs_page_actor *actor,
+ int length)
+{
+ void *pageaddr;
+ int pos = 0, avail, i;
+
+ for (i = 0; i < actor->pages && pos < length; ++i) {
+ avail = min_t(int, length - pos, PAGE_SIZE);
+ if (actor->page[i]) {
+ pageaddr = kmap_atomic(actor->page[i]);
+ memcpy(pageaddr, buf + pos, avail);
+ kunmap_atomic(pageaddr);
+ }
+ pos += avail;
+ }
+}
+
+void squashfs_bh_to_actor(struct buffer_head **bh, int nr_buffers,
+ struct squashfs_page_actor *actor, int offset, int length, int blksz)
+{
+ void *kaddr = NULL;
+ int bytes = 0, pgoff = 0, b = 0, p = 0, avail, i;
+
+ while (bytes < length) {
+ if (actor->page[p]) {
+ kaddr = kmap_atomic(actor->page[p]);
+ while (pgoff < PAGE_SIZE && bytes < length) {
+ avail = min_t(int, blksz - offset,
+ PAGE_SIZE - pgoff);
+ memcpy(kaddr + pgoff, bh[b]->b_data + offset,
+ avail);
+ pgoff += avail;
+ bytes += avail;
+ offset = (offset + avail) % blksz;
+ if (!offset) {
+ put_bh(bh[b]);
+ ++b;
+ }
+ }
+ kunmap_atomic(kaddr);
+ pgoff = 0;
+ } else {
+ for (i = 0; i < PAGE_SIZE / blksz; ++i) {
+ if (bh[b])
+ put_bh(bh[b]);
+ ++b;
+ }
+ bytes += PAGE_SIZE;
+ }
+ ++p;
+ }
+}
+
+void squashfs_bh_to_buf(struct buffer_head **bh, int nr_buffers, void *buf,
+ int offset, int length, int blksz)
+{
+ int i, avail, bytes = 0;
+
+ for (i = 0; i < nr_buffers && bytes < length; ++i) {
+ avail = min_t(int, length - bytes, blksz - offset);
+ if (bh[i]) {
+ memcpy(buf + bytes, bh[i]->b_data + offset, avail);
+ put_bh(bh[i]);
+ }
+ bytes += avail;
+ offset = 0;
+ }
+}
+
+void free_page_array(struct page **page, int nr_pages)
+{
+ int i;
+
+ for (i = 0; i < nr_pages; ++i)
+ __free_page(page[i]);
+ kfree(page);
+}
+
+struct page **alloc_page_array(int nr_pages, int gfp_mask)
+{
+ int i;
+ struct page **page;
+
+ page = kcalloc(nr_pages, sizeof(struct page *), gfp_mask);
+ if (!page)
+ return NULL;
+ for (i = 0; i < nr_pages; ++i) {
+ page[i] = alloc_page(gfp_mask);
+ if (!page[i]) {
+ free_page_array(page, i);
+ return NULL;
+ }
+ }
+ return page;
+}
diff --git a/fs/squashfs/page_actor.h b/fs/squashfs/page_actor.h
index 98537ea..aa1ed79 100644
--- a/fs/squashfs/page_actor.h
+++ b/fs/squashfs/page_actor.h
@@ -5,77 +5,61 @@
* Phillip Lougher <phillip@squashfs.org.uk>
*
* This work is licensed under the terms of the GNU GPL, version 2. See
- * the COPYING file in the top-level directory.
+ * the COPYING file in the top-level squashfsory.
*/
-#ifndef CONFIG_SQUASHFS_FILE_DIRECT
struct squashfs_page_actor {
- void **page;
+ struct page **page;
+ void *pageaddr;
int pages;
int length;
int next_page;
+ void (*release_pages)(struct page **, int, int);
};
-static inline struct squashfs_page_actor *squashfs_page_actor_init(void **page,
- int pages, int length)
-{
- struct squashfs_page_actor *actor = kmalloc(sizeof(*actor), GFP_KERNEL);
+extern struct squashfs_page_actor *squashfs_page_actor_init(struct page **,
+ int, int, void (*)(struct page **, int, int));
+extern void squashfs_page_actor_free(struct squashfs_page_actor *, int);
- if (actor == NULL)
- return NULL;
+extern void squashfs_actor_to_buf(struct squashfs_page_actor *, void *, int);
+extern void squashfs_buf_to_actor(void *, struct squashfs_page_actor *, int);
+extern void squashfs_bh_to_actor(struct buffer_head **, int,
+ struct squashfs_page_actor *, int, int, int);
+extern void squashfs_bh_to_buf(struct buffer_head **, int, void *, int, int,
+ int);
- actor->length = length ? : pages * PAGE_SIZE;
- actor->page = page;
- actor->pages = pages;
- actor->next_page = 0;
- return actor;
-}
-
+/*
+ * Calling code should avoid sleeping between calls to squashfs_first_page()
+ * and squashfs_finish_page().
+ */
static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
{
actor->next_page = 1;
- return actor->page[0];
+ return actor->pageaddr = actor->page[0] ? kmap_atomic(actor->page[0])
+ : NULL;
}
static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
{
- return actor->next_page == actor->pages ? NULL :
- actor->page[actor->next_page++];
+ if (!IS_ERR_OR_NULL(actor->pageaddr))
+ kunmap_atomic(actor->pageaddr);
+
+ if (actor->next_page == actor->pages)
+ return actor->pageaddr = ERR_PTR(-ENODATA);
+
+ actor->pageaddr = actor->page[actor->next_page] ?
+ kmap_atomic(actor->page[actor->next_page]) : NULL;
+ ++actor->next_page;
+ return actor->pageaddr;
}
static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
{
- /* empty */
+ if (!IS_ERR_OR_NULL(actor->pageaddr))
+ kunmap_atomic(actor->pageaddr);
}
-#else
-struct squashfs_page_actor {
- union {
- void **buffer;
- struct page **page;
- };
- void *pageaddr;
- void *(*squashfs_first_page)(struct squashfs_page_actor *);
- void *(*squashfs_next_page)(struct squashfs_page_actor *);
- void (*squashfs_finish_page)(struct squashfs_page_actor *);
- int pages;
- int length;
- int next_page;
-};
-extern struct squashfs_page_actor *squashfs_page_actor_init(void **, int, int);
-extern struct squashfs_page_actor *squashfs_page_actor_init_special(struct page
- **, int, int);
-static inline void *squashfs_first_page(struct squashfs_page_actor *actor)
-{
- return actor->squashfs_first_page(actor);
-}
-static inline void *squashfs_next_page(struct squashfs_page_actor *actor)
-{
- return actor->squashfs_next_page(actor);
-}
-static inline void squashfs_finish_page(struct squashfs_page_actor *actor)
-{
- actor->squashfs_finish_page(actor);
-}
-#endif
+extern struct page **alloc_page_array(int, int);
+extern void free_page_array(struct page **, int);
+
#endif
diff --git a/fs/squashfs/squashfs.h b/fs/squashfs/squashfs.h
index f89f8a7..a9ea6ee 100644
--- a/fs/squashfs/squashfs.h
+++ b/fs/squashfs/squashfs.h
@@ -28,8 +28,12 @@
#define WARNING(s, args...) pr_warn("SQUASHFS: "s, ## args)
/* block.c */
+extern int squashfs_init_read_wq(void);
+extern void squashfs_destroy_read_wq(void);
extern int squashfs_read_data(struct super_block *, u64, int, u64 *,
struct squashfs_page_actor *);
+extern int squashfs_read_data_async(struct super_block *, u64, int, u64 *,
+ struct squashfs_page_actor *);
/* cache.c */
extern struct squashfs_cache *squashfs_cache_init(char *, int, int);
@@ -71,8 +75,9 @@
void squashfs_copy_cache(struct page *, struct squashfs_cache_entry *, int,
int);
-/* file_xxx.c */
-extern int squashfs_readpage_block(struct page *, u64, int, int);
+/* file_direct.c */
+extern int squashfs_readpages_block(struct page *, struct list_head *,
+ unsigned int *, struct address_space *, int, u64, int);
/* id.c */
extern int squashfs_get_id(struct super_block *, unsigned int, unsigned int *);
diff --git a/fs/squashfs/squashfs_fs_sb.h b/fs/squashfs/squashfs_fs_sb.h
index ef69c31..3b767ce 100644
--- a/fs/squashfs/squashfs_fs_sb.h
+++ b/fs/squashfs/squashfs_fs_sb.h
@@ -49,7 +49,7 @@
int num_waiters;
wait_queue_head_t wait_queue;
struct squashfs_cache *cache;
- void **data;
+ struct page **page;
struct squashfs_page_actor *actor;
};
diff --git a/fs/squashfs/super.c b/fs/squashfs/super.c
index 1516bb7..445ce58 100644
--- a/fs/squashfs/super.c
+++ b/fs/squashfs/super.c
@@ -445,9 +445,15 @@
if (err)
return err;
+ if (!squashfs_init_read_wq()) {
+ destroy_inodecache();
+ return -ENOMEM;
+ }
+
err = register_filesystem(&squashfs_fs_type);
if (err) {
destroy_inodecache();
+ squashfs_destroy_read_wq();
return err;
}
@@ -461,6 +467,7 @@
{
unregister_filesystem(&squashfs_fs_type);
destroy_inodecache();
+ squashfs_destroy_read_wq();
}
diff --git a/fs/squashfs/xz_wrapper.c b/fs/squashfs/xz_wrapper.c
index 6bfaef7..2f7be1f 100644
--- a/fs/squashfs/xz_wrapper.c
+++ b/fs/squashfs/xz_wrapper.c
@@ -55,7 +55,7 @@
struct comp_opts *opts;
int err = 0, n;
- opts = kmalloc(sizeof(*opts), GFP_KERNEL);
+ opts = kmalloc(sizeof(*opts), GFP_ATOMIC);
if (opts == NULL) {
err = -ENOMEM;
goto out2;
@@ -136,6 +136,7 @@
enum xz_ret xz_err;
int avail, total = 0, k = 0;
struct squashfs_xz *stream = strm;
+ void *buf = NULL;
xz_dec_reset(stream->state);
stream->buf.in_pos = 0;
@@ -156,12 +157,20 @@
if (stream->buf.out_pos == stream->buf.out_size) {
stream->buf.out = squashfs_next_page(output);
- if (stream->buf.out != NULL) {
+ if (!IS_ERR(stream->buf.out)) {
stream->buf.out_pos = 0;
total += PAGE_SIZE;
}
}
+ if (!stream->buf.out) {
+ if (!buf) {
+ buf = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+ if (!buf)
+ goto out;
+ }
+ stream->buf.out = buf;
+ }
xz_err = xz_dec_run(stream->state, &stream->buf);
if (stream->buf.in_pos == stream->buf.in_size && k < b)
@@ -173,11 +182,13 @@
if (xz_err != XZ_STREAM_END || k < b)
goto out;
+ kfree(buf);
return total + stream->buf.out_pos;
out:
for (; k < b; k++)
put_bh(bh[k]);
+ kfree(buf);
return -EIO;
}
diff --git a/fs/squashfs/zlib_wrapper.c b/fs/squashfs/zlib_wrapper.c
index 2ec24d1..d917c72 100644
--- a/fs/squashfs/zlib_wrapper.c
+++ b/fs/squashfs/zlib_wrapper.c
@@ -66,6 +66,7 @@
struct buffer_head **bh, int b, int offset, int length,
struct squashfs_page_actor *output)
{
+ void *buf = NULL;
int zlib_err, zlib_init = 0, k = 0;
z_stream *stream = strm;
@@ -84,10 +85,19 @@
if (stream->avail_out == 0) {
stream->next_out = squashfs_next_page(output);
- if (stream->next_out != NULL)
+ if (!IS_ERR(stream->next_out))
stream->avail_out = PAGE_SIZE;
}
+ if (!stream->next_out) {
+ if (!buf) {
+ buf = kmalloc(PAGE_SIZE, GFP_ATOMIC);
+ if (!buf)
+ goto out;
+ }
+ stream->next_out = buf;
+ }
+
if (!zlib_init) {
zlib_err = zlib_inflateInit(stream);
if (zlib_err != Z_OK) {
@@ -115,11 +125,13 @@
if (k < b)
goto out;
+ kfree(buf);
return stream->total_out;
out:
for (; k < b; k++)
put_bh(bh[k]);
+ kfree(buf);
return -EIO;
}
diff --git a/fs/super.c b/fs/super.c
index 219f7ca..589f919 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -814,7 +814,8 @@
}
/**
- * do_remount_sb - asks filesystem to change mount options.
+ * do_remount_sb2 - asks filesystem to change mount options.
+ * @mnt: mount we are looking at
* @sb: superblock in question
* @sb_flags: revised superblock flags
* @data: the rest of options
@@ -822,7 +823,7 @@
*
* Alters the mount options of a mounted file system.
*/
-int do_remount_sb(struct super_block *sb, int sb_flags, void *data, int force)
+int do_remount_sb2(struct vfsmount *mnt, struct super_block *sb, int sb_flags, void *data, int force)
{
int retval;
int remount_ro;
@@ -864,7 +865,16 @@
}
}
- if (sb->s_op->remount_fs) {
+ if (mnt && sb->s_op->remount_fs2) {
+ retval = sb->s_op->remount_fs2(mnt, sb, &sb_flags, data);
+ if (retval) {
+ if (!force)
+ goto cancel_readonly;
+ /* If forced remount, go ahead despite any errors */
+ WARN(1, "forced remount of a %s fs returned %i\n",
+ sb->s_type->name, retval);
+ }
+ } else if (sb->s_op->remount_fs) {
retval = sb->s_op->remount_fs(sb, &sb_flags, data);
if (retval) {
if (!force)
@@ -896,12 +906,17 @@
return retval;
}
+int do_remount_sb(struct super_block *sb, int flags, void *data, int force)
+{
+ return do_remount_sb2(NULL, sb, flags, data, force);
+}
+
static void do_emergency_remount(struct work_struct *work)
{
struct super_block *sb, *p = NULL;
spin_lock(&sb_lock);
- list_for_each_entry(sb, &super_blocks, s_list) {
+ list_for_each_entry_reverse(sb, &super_blocks, s_list) {
if (hlist_unhashed(&sb->s_instances))
continue;
sb->s_count++;
@@ -1217,7 +1232,7 @@
EXPORT_SYMBOL(mount_single);
struct dentry *
-mount_fs(struct file_system_type *type, int flags, const char *name, void *data)
+mount_fs(struct file_system_type *type, int flags, const char *name, struct vfsmount *mnt, void *data)
{
struct dentry *root;
struct super_block *sb;
@@ -1234,7 +1249,10 @@
goto out_free_secdata;
}
- root = type->mount(type, flags, name, data);
+ if (type->mount2)
+ root = type->mount2(mnt, type, flags, name, data);
+ else
+ root = type->mount(type, flags, name, data);
if (IS_ERR(root)) {
error = PTR_ERR(root);
goto out_free_secdata;
diff --git a/fs/sync.c b/fs/sync.c
index 83ac79a..12f2aa5 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -9,7 +9,7 @@
#include <linux/slab.h>
#include <linux/export.h>
#include <linux/namei.h>
-#include <linux/sched.h>
+#include <linux/sched/xacct.h>
#include <linux/writeback.h>
#include <linux/syscalls.h>
#include <linux/linkage.h>
@@ -219,6 +219,7 @@
if (f.file) {
ret = vfs_fsync(f.file, datasync);
fdput(f);
+ inc_syscfs(current);
}
return ret;
}
diff --git a/fs/ubifs/crypto.c b/fs/ubifs/crypto.c
index 16a5d5c..55c508f 100644
--- a/fs/ubifs/crypto.c
+++ b/fs/ubifs/crypto.c
@@ -24,14 +24,6 @@
return ubifs_check_dir_empty(inode) == 0;
}
-static unsigned int ubifs_crypt_max_namelen(struct inode *inode)
-{
- if (S_ISLNK(inode->i_mode))
- return UBIFS_MAX_INO_DATA;
- else
- return UBIFS_MAX_NLEN;
-}
-
int ubifs_encrypt(const struct inode *inode, struct ubifs_data_node *dn,
unsigned int in_len, unsigned int *out_len, int block)
{
@@ -88,7 +80,6 @@
.key_prefix = "ubifs:",
.get_context = ubifs_crypt_get_context,
.set_context = ubifs_crypt_set_context,
- .is_encrypted = __ubifs_crypt_is_encrypted,
.empty_dir = ubifs_crypt_empty_dir,
- .max_namelen = ubifs_crypt_max_namelen,
+ .max_namelen = UBIFS_MAX_NLEN,
};
diff --git a/fs/ubifs/dir.c b/fs/ubifs/dir.c
index ef820f8..7fa97e6 100644
--- a/fs/ubifs/dir.c
+++ b/fs/ubifs/dir.c
@@ -220,20 +220,9 @@
dbg_gen("'%pd' in dir ino %lu", dentry, dir->i_ino);
- if (ubifs_crypt_is_encrypted(dir)) {
- err = fscrypt_get_encryption_info(dir);
-
- /*
- * DCACHE_ENCRYPTED_WITH_KEY is set if the dentry is
- * created while the directory was encrypted and we
- * have access to the key.
- */
- if (fscrypt_has_encryption_key(dir))
- fscrypt_set_encrypted_dentry(dentry);
- fscrypt_set_d_op(dentry);
- if (err && err != -ENOKEY)
- return ERR_PTR(err);
- }
+ err = fscrypt_prepare_lookup(dir, dentry, flags);
+ if (err)
+ return ERR_PTR(err);
err = fscrypt_setup_filename(dir, &dentry->d_name, 1, &nm);
if (err)
@@ -1149,38 +1138,24 @@
struct ubifs_info *c = dir->i_sb->s_fs_info;
int err, len = strlen(symname);
int sz_change = CALC_DENT_SIZE(len);
- struct fscrypt_str disk_link = FSTR_INIT((char *)symname, len + 1);
- struct fscrypt_symlink_data *sd = NULL;
+ struct fscrypt_str disk_link;
struct ubifs_budget_req req = { .new_ino = 1, .new_dent = 1,
.new_ino_d = ALIGN(len, 8),
.dirtied_ino = 1 };
struct fscrypt_name nm;
- if (ubifs_crypt_is_encrypted(dir)) {
- err = fscrypt_get_encryption_info(dir);
- if (err)
- goto out_budg;
+ dbg_gen("dent '%pd', target '%s' in dir ino %lu", dentry,
+ symname, dir->i_ino);
- if (!fscrypt_has_encryption_key(dir)) {
- err = -EPERM;
- goto out_budg;
- }
-
- disk_link.len = (fscrypt_fname_encrypted_size(dir, len) +
- sizeof(struct fscrypt_symlink_data));
- }
+ err = fscrypt_prepare_symlink(dir, symname, len, UBIFS_MAX_INO_DATA,
+ &disk_link);
+ if (err)
+ return err;
/*
* Budget request settings: new inode, new direntry and changing parent
* directory inode.
*/
-
- dbg_gen("dent '%pd', target '%s' in dir ino %lu", dentry,
- symname, dir->i_ino);
-
- if (disk_link.len > UBIFS_MAX_INO_DATA)
- return -ENAMETOOLONG;
-
err = ubifs_budget_space(c, &req);
if (err)
return err;
@@ -1202,36 +1177,20 @@
goto out_inode;
}
- if (ubifs_crypt_is_encrypted(dir)) {
- struct qstr istr = QSTR_INIT(symname, len);
- struct fscrypt_str ostr;
-
- sd = kzalloc(disk_link.len, GFP_NOFS);
- if (!sd) {
- err = -ENOMEM;
- goto out_inode;
- }
-
- ostr.name = sd->encrypted_path;
- ostr.len = disk_link.len;
-
- err = fscrypt_fname_usr_to_disk(inode, &istr, &ostr);
+ if (IS_ENCRYPTED(inode)) {
+ disk_link.name = ui->data; /* encrypt directly into ui->data */
+ err = fscrypt_encrypt_symlink(inode, symname, len, &disk_link);
if (err)
goto out_inode;
-
- sd->len = cpu_to_le16(ostr.len);
- disk_link.name = (char *)sd;
} else {
+ memcpy(ui->data, disk_link.name, disk_link.len);
inode->i_link = ui->data;
}
- memcpy(ui->data, disk_link.name, disk_link.len);
- ((char *)ui->data)[disk_link.len - 1] = '\0';
-
/*
* The terminating zero byte is not written to the flash media and it
* is put just to make later in-memory string processing simpler. Thus,
- * data length is @len, not @len + %1.
+ * data length is @disk_link.len - 1, not @disk_link.len.
*/
ui->data_len = disk_link.len - 1;
inode->i_size = ubifs_inode(inode)->ui_size = disk_link.len - 1;
@@ -1265,7 +1224,6 @@
fscrypt_free_filename(&nm);
out_budg:
ubifs_release_budget(c, &req);
- kfree(sd);
return err;
}
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index a02aa59..ff44fd9 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1662,49 +1662,17 @@
struct inode *inode,
struct delayed_call *done)
{
- int err;
- struct fscrypt_symlink_data *sd;
struct ubifs_inode *ui = ubifs_inode(inode);
- struct fscrypt_str cstr;
- struct fscrypt_str pstr;
- if (!ubifs_crypt_is_encrypted(inode))
+ if (!IS_ENCRYPTED(inode))
return ui->data;
if (!dentry)
return ERR_PTR(-ECHILD);
- err = fscrypt_get_encryption_info(inode);
- if (err)
- return ERR_PTR(err);
-
- sd = (struct fscrypt_symlink_data *)ui->data;
- cstr.name = sd->encrypted_path;
- cstr.len = le16_to_cpu(sd->len);
-
- if (cstr.len == 0)
- return ERR_PTR(-ENOENT);
-
- if ((cstr.len + sizeof(struct fscrypt_symlink_data) - 1) > ui->data_len)
- return ERR_PTR(-EIO);
-
- err = fscrypt_fname_alloc_buffer(inode, cstr.len, &pstr);
- if (err)
- return ERR_PTR(err);
-
- err = fscrypt_fname_disk_to_usr(inode, 0, 0, &cstr, &pstr);
- if (err) {
- fscrypt_fname_free_buffer(&pstr);
- return ERR_PTR(err);
- }
-
- pstr.name[pstr.len] = '\0';
-
- set_delayed_call(done, kfree_link, pstr.name);
- return pstr.name;
+ return fscrypt_get_symlink(inode, ui->data, ui->data_len, done);
}
-
const struct address_space_operations ubifs_file_address_operations = {
.readpage = ubifs_readpage,
.writepage = ubifs_writepage,
diff --git a/fs/ubifs/ioctl.c b/fs/ubifs/ioctl.c
index fdc3112..0164bcc 100644
--- a/fs/ubifs/ioctl.c
+++ b/fs/ubifs/ioctl.c
@@ -38,7 +38,8 @@
{
unsigned int flags = ubifs_inode(inode)->flags;
- inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_DIRSYNC);
+ inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_DIRSYNC |
+ S_ENCRYPTED);
if (flags & UBIFS_SYNC_FL)
inode->i_flags |= S_SYNC;
if (flags & UBIFS_APPEND_FL)
@@ -47,6 +48,8 @@
inode->i_flags |= S_IMMUTABLE;
if (flags & UBIFS_DIRSYNC_FL)
inode->i_flags |= S_DIRSYNC;
+ if (flags & UBIFS_CRYPT_FL)
+ inode->i_flags |= S_ENCRYPTED;
}
/*
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index e1cd3dc..3db93c5 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -379,9 +379,7 @@
}
done:
clear_inode(inode);
-#ifdef CONFIG_UBIFS_FS_ENCRYPTION
- fscrypt_put_encryption_info(inode, NULL);
-#endif
+ fscrypt_put_encryption_info(inode);
}
static void ubifs_dirty_inode(struct inode *inode, int flags)
@@ -2013,12 +2011,6 @@
return c;
}
-#ifndef CONFIG_UBIFS_FS_ENCRYPTION
-const struct fscrypt_operations ubifs_crypt_operations = {
- .is_encrypted = __ubifs_crypt_is_encrypted,
-};
-#endif
-
static int ubifs_fill_super(struct super_block *sb, void *data, int silent)
{
struct ubifs_info *c = sb->s_fs_info;
@@ -2061,7 +2053,9 @@
sb->s_maxbytes = c->max_inode_sz = MAX_LFS_FILESIZE;
sb->s_op = &ubifs_super_operations;
sb->s_xattr = ubifs_xattr_handlers;
+#ifdef CONFIG_UBIFS_FS_ENCRYPTION
sb->s_cop = &ubifs_crypt_operations;
+#endif
mutex_lock(&c->umount_mutex);
err = mount_ubifs(c);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index cd43651..63c7468 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -38,12 +38,11 @@
#include <linux/backing-dev.h>
#include <linux/security.h>
#include <linux/xattr.h>
-#ifdef CONFIG_UBIFS_FS_ENCRYPTION
-#include <linux/fscrypt_supp.h>
-#else
-#include <linux/fscrypt_notsupp.h>
-#endif
#include <linux/random.h>
+
+#define __FS_HAS_ENCRYPTION IS_ENABLED(CONFIG_UBIFS_FS_ENCRYPTION)
+#include <linux/fscrypt.h>
+
#include "ubifs-media.h"
/* Version of this UBIFS implementation */
@@ -1835,16 +1834,11 @@
extern const struct fscrypt_operations ubifs_crypt_operations;
-static inline bool __ubifs_crypt_is_encrypted(struct inode *inode)
-{
- struct ubifs_inode *ui = ubifs_inode(inode);
-
- return ui->flags & UBIFS_CRYPT_FL;
-}
-
static inline bool ubifs_crypt_is_encrypted(const struct inode *inode)
{
- return __ubifs_crypt_is_encrypted((struct inode *)inode);
+ const struct ubifs_inode *ui = ubifs_inode(inode);
+
+ return ui->flags & UBIFS_CRYPT_FL;
}
/* Normal UBIFS messages */
diff --git a/fs/ubifs/xattr.c b/fs/ubifs/xattr.c
index d47f16c..1ebd853 100644
--- a/fs/ubifs/xattr.c
+++ b/fs/ubifs/xattr.c
@@ -176,6 +176,7 @@
err = ubifs_jnl_update(c, host, nm, inode, 0, 1);
if (err)
goto out_cancel;
+ ubifs_set_inode_flags(host);
mutex_unlock(&host_ui->ui_mutex);
ubifs_release_budget(c, &req);
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 3eda623..7e931f4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -881,7 +881,8 @@
new_flags, vma->anon_vma,
vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX);
+ NULL_VM_UFFD_CTX,
+ vma_get_anon_name(vma));
if (prev)
vma = prev;
else
@@ -1424,7 +1425,8 @@
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- ((struct vm_userfaultfd_ctx){ ctx }));
+ ((struct vm_userfaultfd_ctx){ ctx }),
+ vma_get_anon_name(vma));
if (prev) {
vma = prev;
goto next;
@@ -1582,7 +1584,8 @@
prev = vma_merge(mm, prev, start, vma_end, new_flags,
vma->anon_vma, vma->vm_file, vma->vm_pgoff,
vma_policy(vma),
- NULL_VM_UFFD_CTX);
+ NULL_VM_UFFD_CTX,
+ vma_get_anon_name(vma));
if (prev) {
vma = prev;
goto next;
diff --git a/fs/utimes.c b/fs/utimes.c
index e4b3d7c..4f3b158 100644
--- a/fs/utimes.c
+++ b/fs/utimes.c
@@ -88,7 +88,7 @@
}
retry_deleg:
inode_lock(inode);
- error = notify_change(path->dentry, &newattrs, &delegated_inode);
+ error = notify_change2(path->mnt, path->dentry, &newattrs, &delegated_inode);
inode_unlock(inode);
if (delegated_inode) {
error = break_deleg_wait(&delegated_inode);
diff --git a/fs/xattr.c b/fs/xattr.c
index be2ce57..2a8d30a 100644
--- a/fs/xattr.c
+++ b/fs/xattr.c
@@ -131,7 +131,7 @@
return -EPERM;
}
- return inode_permission(inode, mask);
+ return inode_permission2(ERR_PTR(-EOPNOTSUPP), inode, mask);
}
int
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index fcec26d..6224bb4 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -67,10 +67,12 @@
*/
#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
+#define TEXT_CFI_MAIN .text.cfi .text.[0-9a-zA-Z_]*.cfi
#define DATA_MAIN .data .data.[0-9a-zA-Z_]*
#define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
#else
#define TEXT_MAIN .text
+#define TEXT_CFI_MAIN .text.cfi
#define DATA_MAIN .data
#define BSS_MAIN .bss
#endif
@@ -105,7 +107,7 @@
#ifdef CONFIG_FTRACE_MCOUNT_RECORD
#define MCOUNT_REC() . = ALIGN(8); \
VMLINUX_SYMBOL(__start_mcount_loc) = .; \
- *(__mcount_loc) \
+ KEEP(*(__mcount_loc)) \
VMLINUX_SYMBOL(__stop_mcount_loc) = .;
#else
#define MCOUNT_REC()
@@ -460,6 +462,8 @@
ALIGN_FUNCTION(); \
*(.text.hot TEXT_MAIN .text.fixup .text.unlikely) \
*(.text..refcount) \
+ *(.text..ftrace) \
+ *(TEXT_CFI_MAIN) \
*(.ref.text) \
MEM_KEEP(init.text) \
MEM_KEEP(exit.text) \
@@ -512,7 +516,7 @@
VMLINUX_SYMBOL(__softirqentry_text_end) = .;
/* Section used for early init (in .S files) */
-#define HEAD_TEXT *(.head.text)
+#define HEAD_TEXT KEEP(*(.head.text))
#define HEAD_TEXT_SECTION \
.head.text : AT(ADDR(.head.text) - LOAD_OFFSET) { \
@@ -557,7 +561,7 @@
MEM_DISCARD(init.data) \
KERNEL_CTORS() \
MCOUNT_REC() \
- *(.init.rodata) \
+ *(.init.rodata .init.rodata.*) \
FTRACE_EVENTS() \
TRACE_SYSCALLS() \
KPROBE_BLACKLIST() \
@@ -576,7 +580,7 @@
EARLYCON_TABLE()
#define INIT_TEXT \
- *(.init.text) \
+ *(.init.text .init.text.*) \
*(.text.startup) \
MEM_DISCARD(init.text)
@@ -593,7 +597,7 @@
MEM_DISCARD(exit.text)
#define EXIT_CALL \
- *(.exitcall.exit)
+ KEEP(*(.exitcall.exit))
/*
* bss (Block Started by Symbol) - uninitialized data
diff --git a/include/crypto/speck.h b/include/crypto/speck.h
new file mode 100644
index 0000000..73cfc95
--- /dev/null
+++ b/include/crypto/speck.h
@@ -0,0 +1,62 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Common values for the Speck algorithm
+ */
+
+#ifndef _CRYPTO_SPECK_H
+#define _CRYPTO_SPECK_H
+
+#include <linux/types.h>
+
+/* Speck128 */
+
+#define SPECK128_BLOCK_SIZE 16
+
+#define SPECK128_128_KEY_SIZE 16
+#define SPECK128_128_NROUNDS 32
+
+#define SPECK128_192_KEY_SIZE 24
+#define SPECK128_192_NROUNDS 33
+
+#define SPECK128_256_KEY_SIZE 32
+#define SPECK128_256_NROUNDS 34
+
+struct speck128_tfm_ctx {
+ u64 round_keys[SPECK128_256_NROUNDS];
+ int nrounds;
+};
+
+void crypto_speck128_encrypt(const struct speck128_tfm_ctx *ctx,
+ u8 *out, const u8 *in);
+
+void crypto_speck128_decrypt(const struct speck128_tfm_ctx *ctx,
+ u8 *out, const u8 *in);
+
+int crypto_speck128_setkey(struct speck128_tfm_ctx *ctx, const u8 *key,
+ unsigned int keysize);
+
+/* Speck64 */
+
+#define SPECK64_BLOCK_SIZE 8
+
+#define SPECK64_96_KEY_SIZE 12
+#define SPECK64_96_NROUNDS 26
+
+#define SPECK64_128_KEY_SIZE 16
+#define SPECK64_128_NROUNDS 27
+
+struct speck64_tfm_ctx {
+ u32 round_keys[SPECK64_128_NROUNDS];
+ int nrounds;
+};
+
+void crypto_speck64_encrypt(const struct speck64_tfm_ctx *ctx,
+ u8 *out, const u8 *in);
+
+void crypto_speck64_decrypt(const struct speck64_tfm_ctx *ctx,
+ u8 *out, const u8 *in);
+
+int crypto_speck64_setkey(struct speck64_tfm_ctx *ctx, const u8 *key,
+ unsigned int keysize);
+
+#endif /* _CRYPTO_SPECK_H */
diff --git a/include/linux/amba/mmci.h b/include/linux/amba/mmci.h
index da8357b..02dc2f3 100644
--- a/include/linux/amba/mmci.h
+++ b/include/linux/amba/mmci.h
@@ -6,6 +6,15 @@
#define AMBA_MMCI_H
#include <linux/mmc/host.h>
+#include <linux/mmc/card.h>
+#include <linux/mmc/sdio_func.h>
+
+struct embedded_sdio_data {
+ struct sdio_cis cis;
+ struct sdio_cccr cccr;
+ struct sdio_embedded_func *funcs;
+ int num_funcs;
+};
/**
* struct mmci_platform_data - platform configuration for the MMCI
@@ -32,6 +41,7 @@
int gpio_wp;
int gpio_cd;
bool cd_invert;
+ struct embedded_sdio_data *embedded_sdio;
};
#endif
diff --git a/include/linux/android_aid.h b/include/linux/android_aid.h
new file mode 100644
index 0000000..6f1fa179
--- /dev/null
+++ b/include/linux/android_aid.h
@@ -0,0 +1,28 @@
+/* include/linux/android_aid.h
+ *
+ * Copyright (C) 2008 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_ANDROID_AID_H
+#define _LINUX_ANDROID_AID_H
+
+/* AIDs that the kernel treats differently */
+#define AID_OBSOLETE_000 KGIDT_INIT(3001) /* was NET_BT_ADMIN */
+#define AID_OBSOLETE_001 KGIDT_INIT(3002) /* was NET_BT */
+#define AID_INET KGIDT_INIT(3003)
+#define AID_NET_RAW KGIDT_INIT(3004)
+#define AID_NET_ADMIN KGIDT_INIT(3005)
+#define AID_NET_BW_STATS KGIDT_INIT(3006) /* read bandwidth statistics */
+#define AID_NET_BW_ACCT KGIDT_INIT(3007) /* change bandwidth statistics accounting */
+
+#endif
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index d4fcb0e..d70f954 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -6,15 +6,43 @@
#define _LINUX_ARCH_TOPOLOGY_H_
#include <linux/types.h>
+#include <linux/percpu.h>
void topology_normalize_cpu_scale(void);
+int topology_detect_flags(void);
+int topology_smt_flags(void);
+int topology_core_flags(void);
+int topology_cpu_flags(void);
+int topology_update_cpu_topology(void);
struct device_node;
bool topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu);
+DECLARE_PER_CPU(unsigned long, cpu_scale);
+
struct sched_domain;
-unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu);
+static inline
+unsigned long topology_get_cpu_scale(struct sched_domain *sd, int cpu)
+{
+ return per_cpu(cpu_scale, cpu);
+}
void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity);
+DECLARE_PER_CPU(unsigned long, freq_scale);
+
+static inline
+unsigned long topology_get_freq_scale(struct sched_domain *sd, int cpu)
+{
+ return per_cpu(freq_scale, cpu);
+}
+
+DECLARE_PER_CPU(unsigned long, max_freq_scale);
+
+static inline
+unsigned long topology_get_max_freq_scale(struct sched_domain *sd, int cpu)
+{
+ return per_cpu(max_freq_scale, cpu);
+}
+
#endif /* _LINUX_ARCH_TOPOLOGY_H_ */
diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h
index ca1d2cc..43072b1 100644
--- a/include/linux/arm-smccc.h
+++ b/include/linux/arm-smccc.h
@@ -160,6 +160,7 @@
#define SMCCC_SMC_INST "smc #0"
#define SMCCC_HVC_INST "hvc #0"
+#define SMCCC_REG(n) asm("x" # n)
#elif defined(CONFIG_ARM)
#include <asm/opcodes-sec.h>
@@ -167,6 +168,7 @@
#define SMCCC_SMC_INST __SMC(0)
#define SMCCC_HVC_INST __HVC(0)
+#define SMCCC_REG(n) asm("r" # n)
#endif
@@ -199,47 +201,47 @@
#define __declare_arg_0(a0, res) \
struct arm_smccc_res *___res = res; \
- register u32 r0 asm("r0") = a0; \
- register unsigned long r1 asm("r1"); \
- register unsigned long r2 asm("r2"); \
- register unsigned long r3 asm("r3")
+ register u32 r0 SMCCC_REG(0) = a0; \
+ register unsigned long r1 SMCCC_REG(1); \
+ register unsigned long r2 SMCCC_REG(2); \
+ register unsigned long r3 SMCCC_REG(3)
#define __declare_arg_1(a0, a1, res) \
struct arm_smccc_res *___res = res; \
- register u32 r0 asm("r0") = a0; \
- register typeof(a1) r1 asm("r1") = a1; \
- register unsigned long r2 asm("r2"); \
- register unsigned long r3 asm("r3")
+ register u32 r0 SMCCC_REG(0) = a0; \
+ register typeof(a1) r1 SMCCC_REG(1) = a1; \
+ register unsigned long r2 SMCCC_REG(2); \
+ register unsigned long r3 SMCCC_REG(3)
#define __declare_arg_2(a0, a1, a2, res) \
struct arm_smccc_res *___res = res; \
- register u32 r0 asm("r0") = a0; \
- register typeof(a1) r1 asm("r1") = a1; \
- register typeof(a2) r2 asm("r2") = a2; \
- register unsigned long r3 asm("r3")
+ register u32 r0 SMCCC_REG(0) = a0; \
+ register typeof(a1) r1 SMCCC_REG(1) = a1; \
+ register typeof(a2) r2 SMCCC_REG(2) = a2; \
+ register unsigned long r3 SMCCC_REG(3)
#define __declare_arg_3(a0, a1, a2, a3, res) \
struct arm_smccc_res *___res = res; \
- register u32 r0 asm("r0") = a0; \
- register typeof(a1) r1 asm("r1") = a1; \
- register typeof(a2) r2 asm("r2") = a2; \
- register typeof(a3) r3 asm("r3") = a3
+ register u32 r0 SMCCC_REG(0) = a0; \
+ register typeof(a1) r1 SMCCC_REG(1) = a1; \
+ register typeof(a2) r2 SMCCC_REG(2) = a2; \
+ register typeof(a3) r3 SMCCC_REG(3) = a3
#define __declare_arg_4(a0, a1, a2, a3, a4, res) \
__declare_arg_3(a0, a1, a2, a3, res); \
- register typeof(a4) r4 asm("r4") = a4
+ register typeof(a4) r4 SMCCC_REG(4) = a4
#define __declare_arg_5(a0, a1, a2, a3, a4, a5, res) \
__declare_arg_4(a0, a1, a2, a3, a4, res); \
- register typeof(a5) r5 asm("r5") = a5
+ register typeof(a5) r5 SMCCC_REG(5) = a5
#define __declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res) \
__declare_arg_5(a0, a1, a2, a3, a4, a5, res); \
- register typeof(a6) r6 asm("r6") = a6
+ register typeof(a6) r6 SMCCC_REG(6) = a6
#define __declare_arg_7(a0, a1, a2, a3, a4, a5, a6, a7, res) \
__declare_arg_6(a0, a1, a2, a3, a4, a5, a6, res); \
- register typeof(a7) r7 asm("r7") = a7
+ register typeof(a7) r7 SMCCC_REG(7) = a7
#define ___declare_args(count, ...) __declare_arg_ ## count(__VA_ARGS__)
#define __declare_args(count, ...) ___declare_args(count, __VA_ARGS__)
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5c5be80..0961516 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -198,6 +198,9 @@
struct bpf_map **used_maps;
struct bpf_prog *prog;
struct user_struct *user;
+#ifdef CONFIG_SECURITY
+ void *security;
+#endif
union {
struct work_struct work;
struct rcu_head rcu;
@@ -253,6 +256,9 @@
#ifdef CONFIG_BPF_SYSCALL
DECLARE_PER_CPU(int, bpf_prog_active);
+extern const struct file_operations bpf_map_fops;
+extern const struct file_operations bpf_prog_fops;
+
#define BPF_PROG_TYPE(_id, _ops) \
extern const struct bpf_verifier_ops _ops;
#define BPF_MAP_TYPE(_id, _ops) \
@@ -282,11 +288,11 @@
extern int sysctl_unprivileged_bpf_disabled;
-int bpf_map_new_fd(struct bpf_map *map);
+int bpf_map_new_fd(struct bpf_map *map, int flags);
int bpf_prog_new_fd(struct bpf_prog *prog);
int bpf_obj_pin_user(u32 ufd, const char __user *pathname);
-int bpf_obj_get_user(const char __user *pathname);
+int bpf_obj_get_user(const char __user *pathname, int flags);
int bpf_percpu_hash_copy(struct bpf_map *map, void *key, void *value);
int bpf_percpu_array_copy(struct bpf_map *map, void *key, void *value);
@@ -305,6 +311,8 @@
void *key, void *value, u64 map_flags);
int bpf_fd_htab_map_lookup_elem(struct bpf_map *map, void *key, u32 *value);
+int bpf_get_file_flag(int flags);
+
/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
* forced to use 'long' read/writes to try to atomically copy long counters.
* Best-effort only. No barriers here, since it _will_ race with concurrent
@@ -381,7 +389,7 @@
{
}
-static inline int bpf_obj_get_user(const char __user *pathname)
+static inline int bpf_obj_get_user(const char __user *pathname, int flags)
{
return -EOPNOTSUPP;
}
diff --git a/include/linux/cfi.h b/include/linux/cfi.h
new file mode 100644
index 0000000..e27033d
--- /dev/null
+++ b/include/linux/cfi.h
@@ -0,0 +1,38 @@
+#ifndef _LINUX_CFI_H
+#define _LINUX_CFI_H
+
+#include <linux/stringify.h>
+
+#ifdef CONFIG_CFI_CLANG
+#ifdef CONFIG_MODULES
+
+typedef void (*cfi_check_fn)(uint64_t, void *, void *);
+
+/* Compiler-generated function in each module, and the kernel */
+#define CFI_CHECK_FN __cfi_check
+#define CFI_CHECK_FN_NAME __stringify(CFI_CHECK_FN)
+
+extern void CFI_CHECK_FN(uint64_t, void *, void *);
+
+#ifdef CONFIG_CFI_CLANG_SHADOW
+extern void cfi_module_add(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr);
+
+extern void cfi_module_remove(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr);
+#else
+static inline void cfi_module_add(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr)
+{
+}
+
+static inline void cfi_module_remove(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr)
+{
+}
+#endif /* CONFIG_CFI_CLANG_SHADOW */
+
+#endif /* CONFIG_MODULES */
+#endif /* CONFIG_CFI_CLANG */
+
+#endif /* _LINUX_CFI_H */
diff --git a/include/linux/cgroup_subsys.h b/include/linux/cgroup_subsys.h
index acb77dcf..8996c09 100644
--- a/include/linux/cgroup_subsys.h
+++ b/include/linux/cgroup_subsys.h
@@ -21,6 +21,10 @@
SUBSYS(cpuacct)
#endif
+#if IS_ENABLED(CONFIG_SCHED_TUNE)
+SUBSYS(schedtune)
+#endif
+
#if IS_ENABLED(CONFIG_BLK_CGROUP)
SUBSYS(io)
#endif
diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index 28b76f0..c456227 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -24,3 +24,12 @@
#ifdef __noretpoline
#undef __noretpoline
#endif
+
+#ifdef CONFIG_LTO_CLANG
+#ifdef CONFIG_FTRACE_MCOUNT_RECORD
+#define __norecordmcount \
+ __attribute__((__section__(".text..ftrace")))
+#endif
+
+#define __nocfi __attribute__((no_sanitize("cfi")))
+#endif
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 6b79a9b..247b571 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -253,6 +253,14 @@
# define __nostackprotector
#endif
+#ifndef __norecordmcount
+#define __norecordmcount
+#endif
+
+#ifndef __nocfi
+#define __nocfi
+#endif
+
/*
* Assume alignment of return value.
*/
diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
index cbf85c4..938b2a5 100644
--- a/include/linux/cpufreq.h
+++ b/include/linux/cpufreq.h
@@ -920,6 +920,11 @@
extern void arch_freq_prepare_all(void);
extern unsigned int arch_freq_get_on_cpu(int cpu);
+extern void arch_set_freq_scale(struct cpumask *cpus, unsigned long cur_freq,
+ unsigned long max_freq);
+extern void arch_set_max_freq_scale(struct cpumask *cpus,
+ unsigned long policy_max_freq);
+
/* the following are really really optional */
extern struct freq_attr cpufreq_freq_attr_scaling_available_freqs;
extern struct freq_attr cpufreq_freq_attr_scaling_boost_freqs;
diff --git a/include/linux/cpufreq_times.h b/include/linux/cpufreq_times.h
new file mode 100644
index 0000000..757bf0c
--- /dev/null
+++ b/include/linux/cpufreq_times.h
@@ -0,0 +1,45 @@
+/* drivers/cpufreq/cpufreq_times.c
+ *
+ * Copyright (C) 2018 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_CPUFREQ_TIMES_H
+#define _LINUX_CPUFREQ_TIMES_H
+
+#include <linux/cpufreq.h>
+#include <linux/pid.h>
+
+#ifdef CONFIG_CPU_FREQ_TIMES
+void cpufreq_task_times_init(struct task_struct *p);
+void cpufreq_task_times_alloc(struct task_struct *p);
+void cpufreq_task_times_exit(struct task_struct *p);
+int proc_time_in_state_show(struct seq_file *m, struct pid_namespace *ns,
+ struct pid *pid, struct task_struct *p);
+void cpufreq_acct_update_power(struct task_struct *p, u64 cputime);
+void cpufreq_times_create_policy(struct cpufreq_policy *policy);
+void cpufreq_times_record_transition(struct cpufreq_freqs *freq);
+void cpufreq_task_times_remove_uids(uid_t uid_start, uid_t uid_end);
+int single_uid_time_in_state_open(struct inode *inode, struct file *file);
+#else
+static inline void cpufreq_task_times_init(struct task_struct *p) {}
+static inline void cpufreq_task_times_alloc(struct task_struct *p) {}
+static inline void cpufreq_task_times_exit(struct task_struct *p) {}
+static inline void cpufreq_acct_update_power(struct task_struct *p,
+ u64 cputime) {}
+static inline void cpufreq_times_create_policy(struct cpufreq_policy *policy) {}
+static inline void cpufreq_times_record_transition(
+ struct cpufreq_freqs *freq) {}
+static inline void cpufreq_task_times_remove_uids(uid_t uid_start,
+ uid_t uid_end) {}
+#endif /* CONFIG_CPU_FREQ_TIMES */
+#endif /* _LINUX_CPUFREQ_TIMES_H */
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index a6989e0..113a148 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -131,7 +131,8 @@
struct cpuidle_device *dev);
extern int cpuidle_select(struct cpuidle_driver *drv,
- struct cpuidle_device *dev);
+ struct cpuidle_device *dev,
+ bool *stop_tick);
extern int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index);
extern void cpuidle_reflect(struct cpuidle_device *dev, int index);
@@ -163,7 +164,7 @@
struct cpuidle_device *dev)
{return true; }
static inline int cpuidle_select(struct cpuidle_driver *drv,
- struct cpuidle_device *dev)
+ struct cpuidle_device *dev, bool *stop_tick)
{return -ENODEV; }
static inline int cpuidle_enter(struct cpuidle_driver *drv,
struct cpuidle_device *dev, int index)
@@ -214,7 +215,7 @@
#endif
/* kernel/sched/idle.c */
-extern void sched_idle_set_state(struct cpuidle_state *idle_state);
+extern void sched_idle_set_state(struct cpuidle_state *idle_state, int index);
extern void default_idle_call(void);
#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
@@ -246,7 +247,8 @@
struct cpuidle_device *dev);
int (*select) (struct cpuidle_driver *drv,
- struct cpuidle_device *dev);
+ struct cpuidle_device *dev,
+ bool *stop_tick);
void (*reflect) (struct cpuidle_device *dev, int index);
};
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index cc36484..29c4257 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -24,6 +24,7 @@
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/uaccess.h>
+#include <linux/completion.h>
/*
* Autoloaded crypto modules should only use a prefixed name to avoid allowing
@@ -476,6 +477,45 @@
} CRYPTO_MINALIGN_ATTR;
/*
+ * A helper struct for waiting for completion of async crypto ops
+ */
+struct crypto_wait {
+ struct completion completion;
+ int err;
+};
+
+/*
+ * Macro for declaring a crypto op async wait object on stack
+ */
+#define DECLARE_CRYPTO_WAIT(_wait) \
+ struct crypto_wait _wait = { \
+ COMPLETION_INITIALIZER_ONSTACK((_wait).completion), 0 }
+
+/*
+ * Async ops completion helper functioons
+ */
+void crypto_req_done(struct crypto_async_request *req, int err);
+
+static inline int crypto_wait_req(int err, struct crypto_wait *wait)
+{
+ switch (err) {
+ case -EINPROGRESS:
+ case -EBUSY:
+ wait_for_completion(&wait->completion);
+ reinit_completion(&wait->completion);
+ err = wait->err;
+ break;
+ };
+
+ return err;
+}
+
+static inline void crypto_init_wait(struct crypto_wait *wait)
+{
+ init_completion(&wait->completion);
+}
+
+/*
* Algorithm registration interface.
*/
int crypto_register_alg(struct crypto_alg *alg);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 006f4cc..e79b7ba 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -149,6 +149,7 @@
int (*d_manage)(const struct path *, bool);
struct dentry *(*d_real)(struct dentry *, const struct inode *,
unsigned int, unsigned int);
+ void (*d_canonical_path)(const struct path *, struct path *);
} ____cacheline_aligned;
/*
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index a553843..a79c930 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -431,6 +431,12 @@
void *dm_get_mdptr(struct mapped_device *md);
/*
+ * Export the device via the ioctl interface (uses mdptr).
+ */
+int dm_ioctl_export(struct mapped_device *md, const char *name,
+ const char *uuid);
+
+/*
* A device can still be used while suspended, but I/O is deferred.
*/
int dm_suspend(struct mapped_device *md, unsigned suspend_flags);
@@ -459,6 +465,13 @@
struct queue_limits *dm_get_queue_limits(struct mapped_device *md);
+void dm_lock_md_type(struct mapped_device *md);
+void dm_unlock_md_type(struct mapped_device *md);
+void dm_set_md_type(struct mapped_device *md, unsigned type);
+unsigned dm_get_md_type(struct mapped_device *md);
+int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t);
+unsigned dm_table_get_type(struct dm_table *t);
+
/*
* Geometry functions.
*/
diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
index 1718950..6385ecd 100644
--- a/include/linux/dma-fence.h
+++ b/include/linux/dma-fence.h
@@ -111,6 +111,7 @@
* @get_driver_name: returns the driver name.
* @get_timeline_name: return the name of the context this fence belongs to.
* @enable_signaling: enable software signaling of fence.
+ * @disable_signaling: disable software signaling of fence (optional).
* @signaled: [optional] peek whether the fence is signaled, can be null.
* @wait: custom wait implementation, or dma_fence_default_wait.
* @release: [optional] called on destruction of fence, can be null
@@ -170,6 +171,7 @@
const char * (*get_driver_name)(struct dma_fence *fence);
const char * (*get_timeline_name)(struct dma_fence *fence);
bool (*enable_signaling)(struct dma_fence *fence);
+ void (*disable_signaling)(struct dma_fence *fence);
bool (*signaled)(struct dma_fence *fence);
signed long (*wait)(struct dma_fence *fence,
bool intr, signed long timeout);
diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h
index 2a0c453..aa5db8b 100644
--- a/include/linux/f2fs_fs.h
+++ b/include/linux/f2fs_fs.h
@@ -21,6 +21,7 @@
#define F2FS_BLKSIZE 4096 /* support only 4KB block */
#define F2FS_BLKSIZE_BITS 12 /* bits for F2FS_BLKSIZE */
#define F2FS_MAX_EXTENSION 64 /* # of extension entries */
+#define F2FS_EXTENSION_LEN 8 /* max size of extension */
#define F2FS_BLK_ALIGN(x) (((x) + F2FS_BLKSIZE - 1) >> F2FS_BLKSIZE_BITS)
#define NULL_ADDR ((block_t)0) /* used as block_t addresses */
@@ -36,15 +37,16 @@
#define F2FS_NODE_INO(sbi) ((sbi)->node_ino_num)
#define F2FS_META_INO(sbi) ((sbi)->meta_ino_num)
-#define F2FS_IO_SIZE(sbi) (1 << (sbi)->write_io_size_bits) /* Blocks */
-#define F2FS_IO_SIZE_KB(sbi) (1 << ((sbi)->write_io_size_bits + 2)) /* KB */
-#define F2FS_IO_SIZE_BYTES(sbi) (1 << ((sbi)->write_io_size_bits + 12)) /* B */
-#define F2FS_IO_SIZE_BITS(sbi) ((sbi)->write_io_size_bits) /* power of 2 */
+#define F2FS_MAX_QUOTAS 3
+
+#define F2FS_IO_SIZE(sbi) (1 << F2FS_OPTION(sbi).write_io_size_bits) /* Blocks */
+#define F2FS_IO_SIZE_KB(sbi) (1 << (F2FS_OPTION(sbi).write_io_size_bits + 2)) /* KB */
+#define F2FS_IO_SIZE_BYTES(sbi) (1 << (F2FS_OPTION(sbi).write_io_size_bits + 12)) /* B */
+#define F2FS_IO_SIZE_BITS(sbi) (F2FS_OPTION(sbi).write_io_size_bits) /* power of 2 */
#define F2FS_IO_SIZE_MASK(sbi) (F2FS_IO_SIZE(sbi) - 1)
/* This flag is used by node and meta inodes, and by recovery */
#define GFP_F2FS_ZERO (GFP_NOFS | __GFP_ZERO)
-#define GFP_F2FS_HIGH_ZERO (GFP_NOFS | __GFP_ZERO | __GFP_HIGHMEM)
/*
* For further optimization on multi-head logs, on-disk layout supports maximum
@@ -100,7 +102,7 @@
__u8 uuid[16]; /* 128-bit uuid for volume */
__le16 volume_name[MAX_VOLUME_NAME]; /* volume name */
__le32 extension_count; /* # of extensions below */
- __u8 extension_list[F2FS_MAX_EXTENSION][8]; /* extension array */
+ __u8 extension_list[F2FS_MAX_EXTENSION][F2FS_EXTENSION_LEN];/* extension array */
__le32 cp_payload;
__u8 version[VERSION_LEN]; /* the kernel version */
__u8 init_version[VERSION_LEN]; /* the initial kernel version */
@@ -108,12 +110,16 @@
__u8 encryption_level; /* versioning level for encryption */
__u8 encrypt_pw_salt[16]; /* Salt used for string2key algorithm */
struct f2fs_device devs[MAX_DEVICES]; /* device list */
- __u8 reserved[327]; /* valid reserved region */
+ __le32 qf_ino[F2FS_MAX_QUOTAS]; /* quota inode numbers */
+ __u8 hot_ext_count; /* # of hot file extension */
+ __u8 reserved[314]; /* valid reserved region */
} __packed;
/*
* For checkpoint
*/
+#define CP_LARGE_NAT_BITMAP_FLAG 0x00000400
+#define CP_NOCRC_RECOVERY_FLAG 0x00000200
#define CP_TRIMMED_FLAG 0x00000100
#define CP_NAT_BITS_FLAG 0x00000080
#define CP_CRC_RECOVERY_FLAG 0x00000040
@@ -184,7 +190,8 @@
} __packed;
#define F2FS_NAME_LEN 255
-#define F2FS_INLINE_XATTR_ADDRS 50 /* 200 bytes for inline xattrs */
+/* 200 bytes for inline xattrs by default */
+#define DEFAULT_INLINE_XATTR_ADDRS 50
#define DEF_ADDRS_PER_INODE 923 /* Address Pointers in an Inode */
#define CUR_ADDRS_PER_INODE(inode) (DEF_ADDRS_PER_INODE - \
get_extra_isize(inode))
@@ -208,6 +215,7 @@
#define F2FS_DATA_EXIST 0x08 /* file inline data exist flag */
#define F2FS_INLINE_DOTS 0x10 /* file having implicit dot dentries */
#define F2FS_EXTRA_ATTR 0x20 /* file having extra attribute */
+#define F2FS_PIN_FILE 0x40 /* file should not be gced */
struct f2fs_inode {
__le16 i_mode; /* file mode */
@@ -225,7 +233,13 @@
__le32 i_ctime_nsec; /* change time in nano scale */
__le32 i_mtime_nsec; /* modification time in nano scale */
__le32 i_generation; /* file version (for NFS) */
- __le32 i_current_depth; /* only for directory depth */
+ union {
+ __le32 i_current_depth; /* only for directory depth */
+ __le16 i_gc_failures; /*
+ * # of gc failures on pinned file.
+ * only for regular files.
+ */
+ };
__le32 i_xattr_nid; /* nid to save xattr */
__le32 i_flags; /* file attributes */
__le32 i_pino; /* parent inode number */
@@ -238,11 +252,13 @@
union {
struct {
__le16 i_extra_isize; /* extra inode attribute size */
- __le16 i_padding; /* padding */
+ __le16 i_inline_xattr_size; /* inline xattr size, unit: 4 bytes */
__le32 i_projid; /* project id */
__le32 i_inode_checksum;/* inode meta checksum */
+ __le64 i_crtime; /* creation time */
+ __le32 i_crtime_nsec; /* creation time in nano scale */
__le32 i_extra_end[0]; /* for attribute size calculation */
- };
+ } __packed;
__le32 i_addr[DEF_ADDRS_PER_INODE]; /* Pointers to data blocks */
};
__le32 i_nid[DEF_NIDS_PER_INODE]; /* direct(2), indirect(2),
@@ -289,6 +305,10 @@
*/
#define NAT_ENTRY_PER_BLOCK (PAGE_SIZE / sizeof(struct f2fs_nat_entry))
#define NAT_ENTRY_BITMAP_SIZE ((NAT_ENTRY_PER_BLOCK + 7) / 8)
+#define NAT_ENTRY_BITMAP_SIZE_ALIGNED \
+ ((NAT_ENTRY_BITMAP_SIZE + BITS_PER_LONG - 1) / \
+ BITS_PER_LONG * BITS_PER_LONG)
+
struct f2fs_nat_entry {
__u8 version; /* latest version of cached nat entry */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index cc613f20..51f0ea6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1597,13 +1597,21 @@
* VFS helper functions..
*/
extern int vfs_create(struct inode *, struct dentry *, umode_t, bool);
+extern int vfs_create2(struct vfsmount *, struct inode *, struct dentry *, umode_t, bool);
extern int vfs_mkdir(struct inode *, struct dentry *, umode_t);
+extern int vfs_mkdir2(struct vfsmount *, struct inode *, struct dentry *, umode_t);
extern int vfs_mknod(struct inode *, struct dentry *, umode_t, dev_t);
+extern int vfs_mknod2(struct vfsmount *, struct inode *, struct dentry *, umode_t, dev_t);
extern int vfs_symlink(struct inode *, struct dentry *, const char *);
+extern int vfs_symlink2(struct vfsmount *, struct inode *, struct dentry *, const char *);
extern int vfs_link(struct dentry *, struct inode *, struct dentry *, struct inode **);
+extern int vfs_link2(struct vfsmount *, struct dentry *, struct inode *, struct dentry *, struct inode **);
extern int vfs_rmdir(struct inode *, struct dentry *);
+extern int vfs_rmdir2(struct vfsmount *, struct inode *, struct dentry *);
extern int vfs_unlink(struct inode *, struct dentry *, struct inode **);
+extern int vfs_unlink2(struct vfsmount *, struct inode *, struct dentry *, struct inode **);
extern int vfs_rename(struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
+extern int vfs_rename2(struct vfsmount *, struct inode *, struct dentry *, struct inode *, struct dentry *, struct inode **, unsigned int);
extern int vfs_whiteout(struct inode *, struct dentry *);
extern struct dentry *vfs_tmpfile(struct dentry *dentry, umode_t mode,
@@ -1734,6 +1742,7 @@
struct dentry * (*lookup) (struct inode *,struct dentry *, unsigned int);
const char * (*get_link) (struct dentry *, struct inode *, struct delayed_call *);
int (*permission) (struct inode *, int);
+ int (*permission2) (struct vfsmount *, struct inode *, int);
struct posix_acl * (*get_acl)(struct inode *, int);
int (*readlink) (struct dentry *, char __user *,int);
@@ -1748,7 +1757,8 @@
int (*rename) (struct inode *, struct dentry *,
struct inode *, struct dentry *, unsigned int);
int (*setattr) (struct dentry *, struct iattr *);
- int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
+ int (*setattr2) (struct vfsmount *, struct dentry *, struct iattr *);
+ int (*getattr) (const struct path *, struct kstat *, u32, unsigned int);
ssize_t (*listxattr) (struct dentry *, char *, size_t);
int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
u64 len);
@@ -1816,9 +1826,13 @@
int (*unfreeze_fs) (struct super_block *);
int (*statfs) (struct dentry *, struct kstatfs *);
int (*remount_fs) (struct super_block *, int *, char *);
+ int (*remount_fs2) (struct vfsmount *, struct super_block *, int *, char *);
+ void *(*clone_mnt_data) (void *);
+ void (*copy_mnt_data) (void *, void *);
void (*umount_begin) (struct super_block *);
int (*show_options)(struct seq_file *, struct dentry *);
+ int (*show_options2)(struct vfsmount *,struct seq_file *, struct dentry *);
int (*show_devname)(struct seq_file *, struct dentry *);
int (*show_path)(struct seq_file *, struct dentry *);
int (*show_stats)(struct seq_file *, struct dentry *);
@@ -1855,6 +1869,7 @@
#else
#define S_DAX 0 /* Make all the DAX code disappear */
#endif
+#define S_ENCRYPTED 16384 /* Encrypted file (using fs/crypto/) */
/*
* Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1894,6 +1909,7 @@
#define IS_AUTOMOUNT(inode) ((inode)->i_flags & S_AUTOMOUNT)
#define IS_NOSEC(inode) ((inode)->i_flags & S_NOSEC)
#define IS_DAX(inode) ((inode)->i_flags & S_DAX)
+#define IS_ENCRYPTED(inode) ((inode)->i_flags & S_ENCRYPTED)
#define IS_WHITEOUT(inode) (S_ISCHR(inode->i_mode) && \
(inode)->i_rdev == WHITEOUT_DEV)
@@ -2076,6 +2092,9 @@
#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() during rename() internally. */
struct dentry *(*mount) (struct file_system_type *, int,
const char *, void *);
+ struct dentry *(*mount2) (struct vfsmount *, struct file_system_type *, int,
+ const char *, void *);
+ void *(*alloc_mnt_data) (void);
void (*kill_sb) (struct super_block *);
struct module *owner;
struct file_system_type * next;
@@ -2377,6 +2396,8 @@
extern long vfs_truncate(const struct path *, loff_t);
extern int do_truncate(struct dentry *, loff_t start, unsigned int time_attrs,
struct file *filp);
+extern int do_truncate2(struct vfsmount *, struct dentry *, loff_t start,
+ unsigned int time_attrs, struct file *filp);
extern int vfs_fallocate(struct file *file, int mode, loff_t offset,
loff_t len);
extern long do_sys_open(int dfd, const char __user *filename, int flags,
@@ -2681,8 +2702,11 @@
extern sector_t bmap(struct inode *, sector_t);
#endif
extern int notify_change(struct dentry *, struct iattr *, struct inode **);
+extern int notify_change2(struct vfsmount *, struct dentry *, struct iattr *, struct inode **);
extern int inode_permission(struct inode *, int);
+extern int inode_permission2(struct vfsmount *, struct inode *, int);
extern int __inode_permission(struct inode *, int);
+extern int __inode_permission2(struct vfsmount *, struct inode *, int);
extern int generic_permission(struct inode *, int);
extern int __check_sticky(struct inode *dir, struct inode *inode);
diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h
new file mode 100644
index 0000000..952ab97
--- /dev/null
+++ b/include/linux/fscrypt.h
@@ -0,0 +1,254 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * fscrypt.h: declarations for per-file encryption
+ *
+ * Filesystems that implement per-file encryption include this header
+ * file with the __FS_HAS_ENCRYPTION set according to whether that filesystem
+ * is being built with encryption support or not.
+ *
+ * Copyright (C) 2015, Google, Inc.
+ *
+ * Written by Michael Halcrow, 2015.
+ * Modified by Jaegeuk Kim, 2015.
+ */
+#ifndef _LINUX_FSCRYPT_H
+#define _LINUX_FSCRYPT_H
+
+#include <linux/fs.h>
+
+#define FS_CRYPTO_BLOCK_SIZE 16
+
+struct fscrypt_ctx;
+struct fscrypt_info;
+
+struct fscrypt_str {
+ unsigned char *name;
+ u32 len;
+};
+
+struct fscrypt_name {
+ const struct qstr *usr_fname;
+ struct fscrypt_str disk_name;
+ u32 hash;
+ u32 minor_hash;
+ struct fscrypt_str crypto_buf;
+};
+
+#define FSTR_INIT(n, l) { .name = n, .len = l }
+#define FSTR_TO_QSTR(f) QSTR_INIT((f)->name, (f)->len)
+#define fname_name(p) ((p)->disk_name.name)
+#define fname_len(p) ((p)->disk_name.len)
+
+/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
+#define FSCRYPT_SET_CONTEXT_MAX_SIZE 28
+
+#if __FS_HAS_ENCRYPTION
+#include <linux/fscrypt_supp.h>
+#else
+#include <linux/fscrypt_notsupp.h>
+#endif
+
+/**
+ * fscrypt_require_key - require an inode's encryption key
+ * @inode: the inode we need the key for
+ *
+ * If the inode is encrypted, set up its encryption key if not already done.
+ * Then require that the key be present and return -ENOKEY otherwise.
+ *
+ * No locks are needed, and the key will live as long as the struct inode --- so
+ * it won't go away from under you.
+ *
+ * Return: 0 on success, -ENOKEY if the key is missing, or another -errno code
+ * if a problem occurred while setting up the encryption key.
+ */
+static inline int fscrypt_require_key(struct inode *inode)
+{
+ if (IS_ENCRYPTED(inode)) {
+ int err = fscrypt_get_encryption_info(inode);
+
+ if (err)
+ return err;
+ if (!fscrypt_has_encryption_key(inode))
+ return -ENOKEY;
+ }
+ return 0;
+}
+
+/**
+ * fscrypt_prepare_link - prepare to link an inode into a possibly-encrypted directory
+ * @old_dentry: an existing dentry for the inode being linked
+ * @dir: the target directory
+ * @dentry: negative dentry for the target filename
+ *
+ * A new link can only be added to an encrypted directory if the directory's
+ * encryption key is available --- since otherwise we'd have no way to encrypt
+ * the filename. Therefore, we first set up the directory's encryption key (if
+ * not already done) and return an error if it's unavailable.
+ *
+ * We also verify that the link will not violate the constraint that all files
+ * in an encrypted directory tree use the same encryption policy.
+ *
+ * Return: 0 on success, -ENOKEY if the directory's encryption key is missing,
+ * -EPERM if the link would result in an inconsistent encryption policy, or
+ * another -errno code.
+ */
+static inline int fscrypt_prepare_link(struct dentry *old_dentry,
+ struct inode *dir,
+ struct dentry *dentry)
+{
+ if (IS_ENCRYPTED(dir))
+ return __fscrypt_prepare_link(d_inode(old_dentry), dir);
+ return 0;
+}
+
+/**
+ * fscrypt_prepare_rename - prepare for a rename between possibly-encrypted directories
+ * @old_dir: source directory
+ * @old_dentry: dentry for source file
+ * @new_dir: target directory
+ * @new_dentry: dentry for target location (may be negative unless exchanging)
+ * @flags: rename flags (we care at least about %RENAME_EXCHANGE)
+ *
+ * Prepare for ->rename() where the source and/or target directories may be
+ * encrypted. A new link can only be added to an encrypted directory if the
+ * directory's encryption key is available --- since otherwise we'd have no way
+ * to encrypt the filename. A rename to an existing name, on the other hand,
+ * *is* cryptographically possible without the key. However, we take the more
+ * conservative approach and just forbid all no-key renames.
+ *
+ * We also verify that the rename will not violate the constraint that all files
+ * in an encrypted directory tree use the same encryption policy.
+ *
+ * Return: 0 on success, -ENOKEY if an encryption key is missing, -EPERM if the
+ * rename would cause inconsistent encryption policies, or another -errno code.
+ */
+static inline int fscrypt_prepare_rename(struct inode *old_dir,
+ struct dentry *old_dentry,
+ struct inode *new_dir,
+ struct dentry *new_dentry,
+ unsigned int flags)
+{
+ if (IS_ENCRYPTED(old_dir) || IS_ENCRYPTED(new_dir))
+ return __fscrypt_prepare_rename(old_dir, old_dentry,
+ new_dir, new_dentry, flags);
+ return 0;
+}
+
+/**
+ * fscrypt_prepare_lookup - prepare to lookup a name in a possibly-encrypted directory
+ * @dir: directory being searched
+ * @dentry: filename being looked up
+ * @flags: lookup flags
+ *
+ * Prepare for ->lookup() in a directory which may be encrypted. Lookups can be
+ * done with or without the directory's encryption key; without the key,
+ * filenames are presented in encrypted form. Therefore, we'll try to set up
+ * the directory's encryption key, but even without it the lookup can continue.
+ *
+ * To allow invalidating stale dentries if the directory's encryption key is
+ * added later, we also install a custom ->d_revalidate() method and use the
+ * DCACHE_ENCRYPTED_WITH_KEY flag to indicate whether a given dentry is a
+ * plaintext name (flag set) or a ciphertext name (flag cleared).
+ *
+ * Return: 0 on success, -errno if a problem occurred while setting up the
+ * encryption key
+ */
+static inline int fscrypt_prepare_lookup(struct inode *dir,
+ struct dentry *dentry,
+ unsigned int flags)
+{
+ if (IS_ENCRYPTED(dir))
+ return __fscrypt_prepare_lookup(dir, dentry);
+ return 0;
+}
+
+/**
+ * fscrypt_prepare_setattr - prepare to change a possibly-encrypted inode's attributes
+ * @dentry: dentry through which the inode is being changed
+ * @attr: attributes to change
+ *
+ * Prepare for ->setattr() on a possibly-encrypted inode. On an encrypted file,
+ * most attribute changes are allowed even without the encryption key. However,
+ * without the encryption key we do have to forbid truncates. This is needed
+ * because the size being truncated to may not be a multiple of the filesystem
+ * block size, and in that case we'd have to decrypt the final block, zero the
+ * portion past i_size, and re-encrypt it. (We *could* allow truncating to a
+ * filesystem block boundary, but it's simpler to just forbid all truncates ---
+ * and we already forbid all other contents modifications without the key.)
+ *
+ * Return: 0 on success, -ENOKEY if the key is missing, or another -errno code
+ * if a problem occurred while setting up the encryption key.
+ */
+static inline int fscrypt_prepare_setattr(struct dentry *dentry,
+ struct iattr *attr)
+{
+ if (attr->ia_valid & ATTR_SIZE)
+ return fscrypt_require_key(d_inode(dentry));
+ return 0;
+}
+
+/**
+ * fscrypt_prepare_symlink - prepare to create a possibly-encrypted symlink
+ * @dir: directory in which the symlink is being created
+ * @target: plaintext symlink target
+ * @len: length of @target excluding null terminator
+ * @max_len: space the filesystem has available to store the symlink target
+ * @disk_link: (out) the on-disk symlink target being prepared
+ *
+ * This function computes the size the symlink target will require on-disk,
+ * stores it in @disk_link->len, and validates it against @max_len. An
+ * encrypted symlink may be longer than the original.
+ *
+ * Additionally, @disk_link->name is set to @target if the symlink will be
+ * unencrypted, but left NULL if the symlink will be encrypted. For encrypted
+ * symlinks, the filesystem must call fscrypt_encrypt_symlink() to create the
+ * on-disk target later. (The reason for the two-step process is that some
+ * filesystems need to know the size of the symlink target before creating the
+ * inode, e.g. to determine whether it will be a "fast" or "slow" symlink.)
+ *
+ * Return: 0 on success, -ENAMETOOLONG if the symlink target is too long,
+ * -ENOKEY if the encryption key is missing, or another -errno code if a problem
+ * occurred while setting up the encryption key.
+ */
+static inline int fscrypt_prepare_symlink(struct inode *dir,
+ const char *target,
+ unsigned int len,
+ unsigned int max_len,
+ struct fscrypt_str *disk_link)
+{
+ if (IS_ENCRYPTED(dir) || fscrypt_dummy_context_enabled(dir))
+ return __fscrypt_prepare_symlink(dir, len, max_len, disk_link);
+
+ disk_link->name = (unsigned char *)target;
+ disk_link->len = len + 1;
+ if (disk_link->len > max_len)
+ return -ENAMETOOLONG;
+ return 0;
+}
+
+/**
+ * fscrypt_encrypt_symlink - encrypt the symlink target if needed
+ * @inode: symlink inode
+ * @target: plaintext symlink target
+ * @len: length of @target excluding null terminator
+ * @disk_link: (in/out) the on-disk symlink target being prepared
+ *
+ * If the symlink target needs to be encrypted, then this function encrypts it
+ * into @disk_link->name. fscrypt_prepare_symlink() must have been called
+ * previously to compute @disk_link->len. If the filesystem did not allocate a
+ * buffer for @disk_link->name after calling fscrypt_prepare_link(), then one
+ * will be kmalloc()'ed and the filesystem will be responsible for freeing it.
+ *
+ * Return: 0 on success, -errno on failure
+ */
+static inline int fscrypt_encrypt_symlink(struct inode *inode,
+ const char *target,
+ unsigned int len,
+ struct fscrypt_str *disk_link)
+{
+ if (IS_ENCRYPTED(inode))
+ return __fscrypt_encrypt_symlink(inode, target, len, disk_link);
+ return 0;
+}
+
+#endif /* _LINUX_FSCRYPT_H */
diff --git a/include/linux/fscrypt_common.h b/include/linux/fscrypt_common.h
deleted file mode 100644
index 854d724..0000000
--- a/include/linux/fscrypt_common.h
+++ /dev/null
@@ -1,142 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-/*
- * fscrypt_common.h: common declarations for per-file encryption
- *
- * Copyright (C) 2015, Google, Inc.
- *
- * Written by Michael Halcrow, 2015.
- * Modified by Jaegeuk Kim, 2015.
- */
-
-#ifndef _LINUX_FSCRYPT_COMMON_H
-#define _LINUX_FSCRYPT_COMMON_H
-
-#include <linux/key.h>
-#include <linux/fs.h>
-#include <linux/mm.h>
-#include <linux/bio.h>
-#include <linux/dcache.h>
-#include <crypto/skcipher.h>
-#include <uapi/linux/fs.h>
-
-#define FS_CRYPTO_BLOCK_SIZE 16
-
-struct fscrypt_info;
-
-struct fscrypt_ctx {
- union {
- struct {
- struct page *bounce_page; /* Ciphertext page */
- struct page *control_page; /* Original page */
- } w;
- struct {
- struct bio *bio;
- struct work_struct work;
- } r;
- struct list_head free_list; /* Free list */
- };
- u8 flags; /* Flags */
-};
-
-/**
- * For encrypted symlinks, the ciphertext length is stored at the beginning
- * of the string in little-endian format.
- */
-struct fscrypt_symlink_data {
- __le16 len;
- char encrypted_path[1];
-} __packed;
-
-struct fscrypt_str {
- unsigned char *name;
- u32 len;
-};
-
-struct fscrypt_name {
- const struct qstr *usr_fname;
- struct fscrypt_str disk_name;
- u32 hash;
- u32 minor_hash;
- struct fscrypt_str crypto_buf;
-};
-
-#define FSTR_INIT(n, l) { .name = n, .len = l }
-#define FSTR_TO_QSTR(f) QSTR_INIT((f)->name, (f)->len)
-#define fname_name(p) ((p)->disk_name.name)
-#define fname_len(p) ((p)->disk_name.len)
-
-/*
- * fscrypt superblock flags
- */
-#define FS_CFLG_OWN_PAGES (1U << 1)
-
-/*
- * crypto opertions for filesystems
- */
-struct fscrypt_operations {
- unsigned int flags;
- const char *key_prefix;
- int (*get_context)(struct inode *, void *, size_t);
- int (*set_context)(struct inode *, const void *, size_t, void *);
- bool (*dummy_context)(struct inode *);
- bool (*is_encrypted)(struct inode *);
- bool (*empty_dir)(struct inode *);
- unsigned (*max_namelen)(struct inode *);
-};
-
-/* Maximum value for the third parameter of fscrypt_operations.set_context(). */
-#define FSCRYPT_SET_CONTEXT_MAX_SIZE 28
-
-static inline bool fscrypt_dummy_context_enabled(struct inode *inode)
-{
- if (inode->i_sb->s_cop->dummy_context &&
- inode->i_sb->s_cop->dummy_context(inode))
- return true;
- return false;
-}
-
-static inline bool fscrypt_valid_enc_modes(u32 contents_mode,
- u32 filenames_mode)
-{
- if (contents_mode == FS_ENCRYPTION_MODE_AES_128_CBC &&
- filenames_mode == FS_ENCRYPTION_MODE_AES_128_CTS)
- return true;
-
- if (contents_mode == FS_ENCRYPTION_MODE_AES_256_XTS &&
- filenames_mode == FS_ENCRYPTION_MODE_AES_256_CTS)
- return true;
-
- return false;
-}
-
-static inline bool fscrypt_is_dot_dotdot(const struct qstr *str)
-{
- if (str->len == 1 && str->name[0] == '.')
- return true;
-
- if (str->len == 2 && str->name[0] == '.' && str->name[1] == '.')
- return true;
-
- return false;
-}
-
-static inline struct page *fscrypt_control_page(struct page *page)
-{
-#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
- return ((struct fscrypt_ctx *)page_private(page))->w.control_page;
-#else
- WARN_ON_ONCE(1);
- return ERR_PTR(-EINVAL);
-#endif
-}
-
-static inline int fscrypt_has_encryption_key(const struct inode *inode)
-{
-#if IS_ENABLED(CONFIG_FS_ENCRYPTION)
- return (inode->i_crypt_info != NULL);
-#else
- return 0;
-#endif
-}
-
-#endif /* _LINUX_FSCRYPT_COMMON_H */
diff --git a/include/linux/fscrypt_notsupp.h b/include/linux/fscrypt_notsupp.h
index 19609ce..ee8b43e 100644
--- a/include/linux/fscrypt_notsupp.h
+++ b/include/linux/fscrypt_notsupp.h
@@ -4,14 +4,31 @@
*
* This stubs out the fscrypt functions for filesystems configured without
* encryption support.
+ *
+ * Do not include this file directly. Use fscrypt.h instead!
*/
+#ifndef _LINUX_FSCRYPT_H
+#error "Incorrect include of linux/fscrypt_notsupp.h!"
+#endif
#ifndef _LINUX_FSCRYPT_NOTSUPP_H
#define _LINUX_FSCRYPT_NOTSUPP_H
-#include <linux/fscrypt_common.h>
+static inline bool fscrypt_has_encryption_key(const struct inode *inode)
+{
+ return false;
+}
+
+static inline bool fscrypt_dummy_context_enabled(struct inode *inode)
+{
+ return false;
+}
/* crypto.c */
+static inline void fscrypt_enqueue_decrypt_work(struct work_struct *work)
+{
+}
+
static inline struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *inode,
gfp_t gfp_flags)
{
@@ -40,22 +57,17 @@
return -EOPNOTSUPP;
}
+static inline struct page *fscrypt_control_page(struct page *page)
+{
+ WARN_ON_ONCE(1);
+ return ERR_PTR(-EINVAL);
+}
static inline void fscrypt_restore_control_page(struct page *page)
{
return;
}
-static inline void fscrypt_set_d_op(struct dentry *dentry)
-{
- return;
-}
-
-static inline void fscrypt_set_encrypted_dentry(struct dentry *dentry)
-{
- return;
-}
-
/* policy.c */
static inline int fscrypt_ioctl_set_policy(struct file *filp,
const void __user *arg)
@@ -87,8 +99,7 @@
return -EOPNOTSUPP;
}
-static inline void fscrypt_put_encryption_info(struct inode *inode,
- struct fscrypt_info *ci)
+static inline void fscrypt_put_encryption_info(struct inode *inode)
{
return;
}
@@ -98,7 +109,7 @@
const struct qstr *iname,
int lookup, struct fscrypt_name *fname)
{
- if (dir->i_sb->s_cop->is_encrypted(dir))
+ if (IS_ENCRYPTED(dir))
return -EOPNOTSUPP;
memset(fname, 0, sizeof(struct fscrypt_name));
@@ -113,16 +124,8 @@
return;
}
-static inline u32 fscrypt_fname_encrypted_size(const struct inode *inode,
- u32 ilen)
-{
- /* never happens */
- WARN_ON(1);
- return 0;
-}
-
static inline int fscrypt_fname_alloc_buffer(const struct inode *inode,
- u32 ilen,
+ u32 max_encrypted_len,
struct fscrypt_str *crypto_str)
{
return -EOPNOTSUPP;
@@ -141,13 +144,6 @@
return -EOPNOTSUPP;
}
-static inline int fscrypt_fname_usr_to_disk(struct inode *inode,
- const struct qstr *iname,
- struct fscrypt_str *oname)
-{
- return -EOPNOTSUPP;
-}
-
static inline bool fscrypt_match_name(const struct fscrypt_name *fname,
const u8 *de_name, u32 de_name_len)
{
@@ -158,10 +154,13 @@
}
/* bio.c */
-static inline void fscrypt_decrypt_bio_pages(struct fscrypt_ctx *ctx,
- struct bio *bio)
+static inline void fscrypt_decrypt_bio(struct bio *bio)
{
- return;
+}
+
+static inline void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx,
+ struct bio *bio)
+{
}
static inline void fscrypt_pullback_bio_page(struct page **page, bool restore)
@@ -175,4 +174,58 @@
return -EOPNOTSUPP;
}
+/* hooks.c */
+
+static inline int fscrypt_file_open(struct inode *inode, struct file *filp)
+{
+ if (IS_ENCRYPTED(inode))
+ return -EOPNOTSUPP;
+ return 0;
+}
+
+static inline int __fscrypt_prepare_link(struct inode *inode,
+ struct inode *dir)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int __fscrypt_prepare_rename(struct inode *old_dir,
+ struct dentry *old_dentry,
+ struct inode *new_dir,
+ struct dentry *new_dentry,
+ unsigned int flags)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int __fscrypt_prepare_lookup(struct inode *dir,
+ struct dentry *dentry)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int __fscrypt_prepare_symlink(struct inode *dir,
+ unsigned int len,
+ unsigned int max_len,
+ struct fscrypt_str *disk_link)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int __fscrypt_encrypt_symlink(struct inode *inode,
+ const char *target,
+ unsigned int len,
+ struct fscrypt_str *disk_link)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline const char *fscrypt_get_symlink(struct inode *inode,
+ const void *caddr,
+ unsigned int max_size,
+ struct delayed_call *done)
+{
+ return ERR_PTR(-EOPNOTSUPP);
+}
+
#endif /* _LINUX_FSCRYPT_NOTSUPP_H */
diff --git a/include/linux/fscrypt_supp.h b/include/linux/fscrypt_supp.h
index 5153dce..6456c6b 100644
--- a/include/linux/fscrypt_supp.h
+++ b/include/linux/fscrypt_supp.h
@@ -2,16 +2,64 @@
/*
* fscrypt_supp.h
*
- * This is included by filesystems configured with encryption support.
+ * Do not include this file directly. Use fscrypt.h instead!
*/
+#ifndef _LINUX_FSCRYPT_H
+#error "Incorrect include of linux/fscrypt_supp.h!"
+#endif
#ifndef _LINUX_FSCRYPT_SUPP_H
#define _LINUX_FSCRYPT_SUPP_H
-#include <linux/fscrypt_common.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+
+/*
+ * fscrypt superblock flags
+ */
+#define FS_CFLG_OWN_PAGES (1U << 1)
+
+/*
+ * crypto operations for filesystems
+ */
+struct fscrypt_operations {
+ unsigned int flags;
+ const char *key_prefix;
+ int (*get_context)(struct inode *, void *, size_t);
+ int (*set_context)(struct inode *, const void *, size_t, void *);
+ bool (*dummy_context)(struct inode *);
+ bool (*empty_dir)(struct inode *);
+ unsigned int max_namelen;
+};
+
+struct fscrypt_ctx {
+ union {
+ struct {
+ struct page *bounce_page; /* Ciphertext page */
+ struct page *control_page; /* Original page */
+ } w;
+ struct {
+ struct bio *bio;
+ struct work_struct work;
+ } r;
+ struct list_head free_list; /* Free list */
+ };
+ u8 flags; /* Flags */
+};
+
+static inline bool fscrypt_has_encryption_key(const struct inode *inode)
+{
+ return (inode->i_crypt_info != NULL);
+}
+
+static inline bool fscrypt_dummy_context_enabled(struct inode *inode)
+{
+ return inode->i_sb->s_cop->dummy_context &&
+ inode->i_sb->s_cop->dummy_context(inode);
+}
/* crypto.c */
-extern struct kmem_cache *fscrypt_info_cachep;
+extern void fscrypt_enqueue_decrypt_work(struct work_struct *);
extern struct fscrypt_ctx *fscrypt_get_ctx(const struct inode *, gfp_t);
extern void fscrypt_release_ctx(struct fscrypt_ctx *);
extern struct page *fscrypt_encrypt_page(const struct inode *, struct page *,
@@ -19,22 +67,14 @@
u64, gfp_t);
extern int fscrypt_decrypt_page(const struct inode *, struct page *, unsigned int,
unsigned int, u64);
+
+static inline struct page *fscrypt_control_page(struct page *page)
+{
+ return ((struct fscrypt_ctx *)page_private(page))->w.control_page;
+}
+
extern void fscrypt_restore_control_page(struct page *);
-extern const struct dentry_operations fscrypt_d_ops;
-
-static inline void fscrypt_set_d_op(struct dentry *dentry)
-{
- d_set_d_op(dentry, &fscrypt_d_ops);
-}
-
-static inline void fscrypt_set_encrypted_dentry(struct dentry *dentry)
-{
- spin_lock(&dentry->d_lock);
- dentry->d_flags |= DCACHE_ENCRYPTED_WITH_KEY;
- spin_unlock(&dentry->d_lock);
-}
-
/* policy.c */
extern int fscrypt_ioctl_set_policy(struct file *, const void __user *);
extern int fscrypt_ioctl_get_policy(struct file *, void __user *);
@@ -43,7 +83,7 @@
void *, bool);
/* keyinfo.c */
extern int fscrypt_get_encryption_info(struct inode *);
-extern void fscrypt_put_encryption_info(struct inode *, struct fscrypt_info *);
+extern void fscrypt_put_encryption_info(struct inode *);
/* fname.c */
extern int fscrypt_setup_filename(struct inode *, const struct qstr *,
@@ -54,14 +94,11 @@
kfree(fname->crypto_buf.name);
}
-extern u32 fscrypt_fname_encrypted_size(const struct inode *, u32);
extern int fscrypt_fname_alloc_buffer(const struct inode *, u32,
struct fscrypt_str *);
extern void fscrypt_fname_free_buffer(struct fscrypt_str *);
extern int fscrypt_fname_disk_to_usr(struct inode *, u32, u32,
const struct fscrypt_str *, struct fscrypt_str *);
-extern int fscrypt_fname_usr_to_disk(struct inode *, const struct qstr *,
- struct fscrypt_str *);
#define FSCRYPT_FNAME_MAX_UNDIGESTED_SIZE 32
@@ -138,9 +175,30 @@
}
/* bio.c */
-extern void fscrypt_decrypt_bio_pages(struct fscrypt_ctx *, struct bio *);
+extern void fscrypt_decrypt_bio(struct bio *);
+extern void fscrypt_enqueue_decrypt_bio(struct fscrypt_ctx *ctx,
+ struct bio *bio);
extern void fscrypt_pullback_bio_page(struct page **, bool);
extern int fscrypt_zeroout_range(const struct inode *, pgoff_t, sector_t,
unsigned int);
+/* hooks.c */
+extern int fscrypt_file_open(struct inode *inode, struct file *filp);
+extern int __fscrypt_prepare_link(struct inode *inode, struct inode *dir);
+extern int __fscrypt_prepare_rename(struct inode *old_dir,
+ struct dentry *old_dentry,
+ struct inode *new_dir,
+ struct dentry *new_dentry,
+ unsigned int flags);
+extern int __fscrypt_prepare_lookup(struct inode *dir, struct dentry *dentry);
+extern int __fscrypt_prepare_symlink(struct inode *dir, unsigned int len,
+ unsigned int max_len,
+ struct fscrypt_str *disk_link);
+extern int __fscrypt_encrypt_symlink(struct inode *inode, const char *target,
+ unsigned int len,
+ struct fscrypt_str *disk_link);
+extern const char *fscrypt_get_symlink(struct inode *inode, const void *caddr,
+ unsigned int max_size,
+ struct delayed_call *done);
+
#endif /* _LINUX_FSCRYPT_SUPP_H */
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index bdaf225..4636b8f 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -214,12 +214,19 @@
static inline void fsnotify_open(struct file *file)
{
const struct path *path = &file->f_path;
+ struct path lower_path;
struct inode *inode = path->dentry->d_inode;
__u32 mask = FS_OPEN;
if (S_ISDIR(inode->i_mode))
mask |= FS_ISDIR;
+ if (path->dentry->d_op && path->dentry->d_op->d_canonical_path) {
+ path->dentry->d_op->d_canonical_path(path, &lower_path);
+ fsnotify_parent(&lower_path, NULL, mask);
+ fsnotify(lower_path.dentry->d_inode, mask, &lower_path, FSNOTIFY_EVENT_PATH, NULL, 0);
+ path_put(&lower_path);
+ }
fsnotify_parent(path, NULL, mask);
fsnotify(inode, mask, path, FSNOTIFY_EVENT_PATH, NULL, 0);
}
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index e54d257..ff3e5b1 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -254,8 +254,16 @@
return *this_cpu_ptr(ops->disabled);
}
+#ifdef CONFIG_CFI_CLANG
+/* Use a C stub with the correct type for CFI */
+static inline void ftrace_stub(unsigned long a0, unsigned long a1,
+ struct ftrace_ops *op, struct pt_regs *regs)
+{
+}
+#else
extern void ftrace_stub(unsigned long a0, unsigned long a1,
struct ftrace_ops *op, struct pt_regs *regs);
+#endif
#else /* !CONFIG_FUNCTION_TRACER */
/*
@@ -743,7 +751,8 @@
static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { }
#endif
-#ifdef CONFIG_PREEMPT_TRACER
+#if defined(CONFIG_PREEMPT_TRACER) || \
+ (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
extern void trace_preempt_on(unsigned long a0, unsigned long a1);
extern void trace_preempt_off(unsigned long a0, unsigned long a1);
#else
diff --git a/include/linux/gpio_event.h b/include/linux/gpio_event.h
new file mode 100644
index 0000000..2613fc5
--- /dev/null
+++ b/include/linux/gpio_event.h
@@ -0,0 +1,170 @@
+/* include/linux/gpio_event.h
+ *
+ * Copyright (C) 2007 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_GPIO_EVENT_H
+#define _LINUX_GPIO_EVENT_H
+
+#include <linux/input.h>
+
+struct gpio_event_input_devs {
+ int count;
+ struct input_dev *dev[];
+};
+enum {
+ GPIO_EVENT_FUNC_UNINIT = 0x0,
+ GPIO_EVENT_FUNC_INIT = 0x1,
+ GPIO_EVENT_FUNC_SUSPEND = 0x2,
+ GPIO_EVENT_FUNC_RESUME = 0x3,
+};
+struct gpio_event_info {
+ int (*func)(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info,
+ void **data, int func);
+ int (*event)(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info,
+ void **data, unsigned int dev, unsigned int type,
+ unsigned int code, int value); /* out events */
+ bool no_suspend;
+};
+
+struct gpio_event_platform_data {
+ const char *name;
+ struct gpio_event_info **info;
+ size_t info_count;
+ int (*power)(const struct gpio_event_platform_data *pdata, bool on);
+ const char *names[]; /* If name is NULL, names contain a NULL */
+ /* terminated list of input devices to create */
+};
+
+#define GPIO_EVENT_DEV_NAME "gpio-event"
+
+/* Key matrix */
+
+enum gpio_event_matrix_flags {
+ /* unset: drive active output low, set: drive active output high */
+ GPIOKPF_ACTIVE_HIGH = 1U << 0,
+ GPIOKPF_DEBOUNCE = 1U << 1,
+ GPIOKPF_REMOVE_SOME_PHANTOM_KEYS = 1U << 2,
+ GPIOKPF_REMOVE_PHANTOM_KEYS = GPIOKPF_REMOVE_SOME_PHANTOM_KEYS |
+ GPIOKPF_DEBOUNCE,
+ GPIOKPF_DRIVE_INACTIVE = 1U << 3,
+ GPIOKPF_LEVEL_TRIGGERED_IRQ = 1U << 4,
+ GPIOKPF_PRINT_UNMAPPED_KEYS = 1U << 16,
+ GPIOKPF_PRINT_MAPPED_KEYS = 1U << 17,
+ GPIOKPF_PRINT_PHANTOM_KEYS = 1U << 18,
+};
+
+#define MATRIX_CODE_BITS (10)
+#define MATRIX_KEY_MASK ((1U << MATRIX_CODE_BITS) - 1)
+#define MATRIX_KEY(dev, code) \
+ (((dev) << MATRIX_CODE_BITS) | (code & MATRIX_KEY_MASK))
+
+extern int gpio_event_matrix_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func);
+struct gpio_event_matrix_info {
+ /* initialize to gpio_event_matrix_func */
+ struct gpio_event_info info;
+ /* size must be ninputs * noutputs */
+ const unsigned short *keymap;
+ unsigned int *input_gpios;
+ unsigned int *output_gpios;
+ unsigned int ninputs;
+ unsigned int noutputs;
+ /* time to wait before reading inputs after driving each output */
+ ktime_t settle_time;
+ /* time to wait before scanning the keypad a second time */
+ ktime_t debounce_delay;
+ ktime_t poll_time;
+ unsigned flags;
+};
+
+/* Directly connected inputs and outputs */
+
+enum gpio_event_direct_flags {
+ GPIOEDF_ACTIVE_HIGH = 1U << 0,
+/* GPIOEDF_USE_DOWN_IRQ = 1U << 1, */
+/* GPIOEDF_USE_IRQ = (1U << 2) | GPIOIDF_USE_DOWN_IRQ, */
+ GPIOEDF_PRINT_KEYS = 1U << 8,
+ GPIOEDF_PRINT_KEY_DEBOUNCE = 1U << 9,
+ GPIOEDF_PRINT_KEY_UNSTABLE = 1U << 10,
+};
+
+struct gpio_event_direct_entry {
+ uint32_t gpio:16;
+ uint32_t code:10;
+ uint32_t dev:6;
+};
+
+/* inputs */
+extern int gpio_event_input_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func);
+struct gpio_event_input_info {
+ /* initialize to gpio_event_input_func */
+ struct gpio_event_info info;
+ ktime_t debounce_time;
+ ktime_t poll_time;
+ uint16_t flags;
+ uint16_t type;
+ const struct gpio_event_direct_entry *keymap;
+ size_t keymap_size;
+};
+
+/* outputs */
+extern int gpio_event_output_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func);
+extern int gpio_event_output_event(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data,
+ unsigned int dev, unsigned int type,
+ unsigned int code, int value);
+struct gpio_event_output_info {
+ /* initialize to gpio_event_output_func and gpio_event_output_event */
+ struct gpio_event_info info;
+ uint16_t flags;
+ uint16_t type;
+ const struct gpio_event_direct_entry *keymap;
+ size_t keymap_size;
+};
+
+
+/* axes */
+
+enum gpio_event_axis_flags {
+ GPIOEAF_PRINT_UNKNOWN_DIRECTION = 1U << 16,
+ GPIOEAF_PRINT_RAW = 1U << 17,
+ GPIOEAF_PRINT_EVENT = 1U << 18,
+};
+
+extern int gpio_event_axis_func(struct gpio_event_input_devs *input_devs,
+ struct gpio_event_info *info, void **data, int func);
+struct gpio_event_axis_info {
+ /* initialize to gpio_event_axis_func */
+ struct gpio_event_info info;
+ uint8_t count; /* number of gpios for this axis */
+ uint8_t dev; /* device index when using multiple input devices */
+ uint8_t type; /* EV_REL or EV_ABS */
+ uint16_t code;
+ uint16_t decoded_size;
+ uint16_t (*map)(struct gpio_event_axis_info *info, uint16_t in);
+ uint32_t *gpio;
+ uint32_t flags;
+};
+#define gpio_axis_2bit_gray_map gpio_axis_4bit_gray_map
+#define gpio_axis_3bit_gray_map gpio_axis_4bit_gray_map
+uint16_t gpio_axis_4bit_gray_map(
+ struct gpio_event_axis_info *info, uint16_t in);
+uint16_t gpio_axis_5bit_singletrack_map(
+ struct gpio_event_axis_info *info, uint16_t in);
+
+#endif
diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 012c37f..27f14f2 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -405,6 +405,7 @@
}
extern u64 hrtimer_get_next_event(void);
+extern u64 hrtimer_next_event_without(const struct hrtimer *exclude);
extern bool hrtimer_active(const struct hrtimer *timer);
diff --git a/include/linux/init.h b/include/linux/init.h
index 07cab8a..f138e5b 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -47,7 +47,7 @@
/* These are for everybody (although not all archs will actually
discard it in modules) */
-#define __init __section(.init.text) __cold __inittrace __latent_entropy __noinitretpoline
+#define __init __section(.init.text) __cold __inittrace __latent_entropy __noinitretpoline __nocfi
#define __initdata __section(.init.data)
#define __initconst __section(.init.rodata)
#define __exitdata __section(.exit.data)
@@ -153,6 +153,15 @@
#ifndef __ASSEMBLY__
+#ifdef CONFIG_LTO_CLANG
+ /* prepend the variable name with __COUNTER__ to ensure correct ordering */
+ #define ___initcall_name2(c, fn, id) __initcall_##c##_##fn##id
+ #define ___initcall_name1(c, fn, id) ___initcall_name2(c, fn, id)
+ #define __initcall_name(fn, id) ___initcall_name1(__COUNTER__, fn, id)
+#else
+ #define __initcall_name(fn, id) __initcall_##fn##id
+#endif
+
/*
* initcalls are now grouped by functionality into separate
* subsections. Ordering inside the subsections is determined
@@ -170,7 +179,7 @@
*/
#define __define_initcall(fn, id) \
- static initcall_t __initcall_##fn##id __used \
+ static initcall_t __initcall_name(fn, id) __used \
__attribute__((__section__(".initcall" #id ".init"))) = fn;
/*
diff --git a/include/linux/initramfs.h b/include/linux/initramfs.h
new file mode 100644
index 0000000..fc7da63
--- /dev/null
+++ b/include/linux/initramfs.h
@@ -0,0 +1,32 @@
+/*
+ * include/linux/initramfs.h
+ *
+ * Copyright (C) 2015, Google
+ * Rom Lemarchand <romlem@android.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef _LINUX_INITRAMFS_H
+#define _LINUX_INITRAMFS_H
+
+#include <linux/kconfig.h>
+
+#if IS_BUILTIN(CONFIG_BLK_DEV_INITRD)
+
+int __init default_rootfs(void);
+
+#endif
+
+#endif /* _LINUX_INITRAMFS_H */
diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h
index 067a6fa..3e3893f 100644
--- a/include/linux/ipv6.h
+++ b/include/linux/ipv6.h
@@ -42,6 +42,7 @@
__s32 accept_ra_rt_info_max_plen;
#endif
#endif
+ __s32 accept_ra_rt_table;
__s32 proxy_ndp;
__s32 accept_source_route;
__s32 accept_ra_from_local;
diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index 9385aa5..a27cf66 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -62,8 +62,11 @@
/* TICK_NSEC is the time between ticks in nsec assuming SHIFTED_HZ */
#define TICK_NSEC ((NSEC_PER_SEC+HZ/2)/HZ)
-/* TICK_USEC is the time between ticks in usec assuming fake USER_HZ */
-#define TICK_USEC ((1000000UL + USER_HZ/2) / USER_HZ)
+/* TICK_USEC is the time between ticks in usec assuming SHIFTED_HZ */
+#define TICK_USEC ((USEC_PER_SEC + HZ/2) / HZ)
+
+/* USER_TICK_USEC is the time between ticks in usec assuming fake USER_HZ */
+#define USER_TICK_USEC ((1000000UL + USER_HZ/2) / USER_HZ)
#ifndef __jiffy_arch_data
#define __jiffy_arch_data
diff --git a/include/linux/keychord.h b/include/linux/keychord.h
new file mode 100644
index 0000000..08cf540
--- /dev/null
+++ b/include/linux/keychord.h
@@ -0,0 +1,23 @@
+/*
+ * Key chord input driver
+ *
+ * Copyright (C) 2008 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+*/
+
+#ifndef __LINUX_KEYCHORD_H_
+#define __LINUX_KEYCHORD_H_
+
+#include <uapi/linux/keychord.h>
+
+#endif /* __LINUX_KEYCHORD_H_ */
diff --git a/include/linux/keycombo.h b/include/linux/keycombo.h
new file mode 100644
index 0000000..c6db262
--- /dev/null
+++ b/include/linux/keycombo.h
@@ -0,0 +1,36 @@
+/*
+ * include/linux/keycombo.h - platform data structure for keycombo driver
+ *
+ * Copyright (C) 2014 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_KEYCOMBO_H
+#define _LINUX_KEYCOMBO_H
+
+#define KEYCOMBO_NAME "keycombo"
+
+/*
+ * if key_down_fn and key_up_fn are both present, you are guaranteed that
+ * key_down_fn will return before key_up_fn is called, and that key_up_fn
+ * is called iff key_down_fn is called.
+ */
+struct keycombo_platform_data {
+ void (*key_down_fn)(void *);
+ void (*key_up_fn)(void *);
+ void *priv;
+ int key_down_delay; /* Time in ms */
+ int *keys_up;
+ int keys_down[]; /* 0 terminated */
+};
+
+#endif /* _LINUX_KEYCOMBO_H */
diff --git a/include/linux/keyreset.h b/include/linux/keyreset.h
new file mode 100644
index 0000000..2e34afa
--- /dev/null
+++ b/include/linux/keyreset.h
@@ -0,0 +1,29 @@
+/*
+ * include/linux/keyreset.h - platform data structure for resetkeys driver
+ *
+ * Copyright (C) 2014 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _LINUX_KEYRESET_H
+#define _LINUX_KEYRESET_H
+
+#define KEYRESET_NAME "keyreset"
+
+struct keyreset_platform_data {
+ int (*reset_fn)(void);
+ int key_down_delay;
+ int *keys_up;
+ int keys_down[]; /* 0 terminated */
+};
+
+#endif /* _LINUX_KEYRESET_H */
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c925812..7161d8e 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -1351,6 +1351,40 @@
* @inode we wish to get the security context of.
* @ctx is a pointer in which to place the allocated security context.
* @ctxlen points to the place to put the length of @ctx.
+ *
+ * Security hooks for using the eBPF maps and programs functionalities through
+ * eBPF syscalls.
+ *
+ * @bpf:
+ * Do a initial check for all bpf syscalls after the attribute is copied
+ * into the kernel. The actual security module can implement their own
+ * rules to check the specific cmd they need.
+ *
+ * @bpf_map:
+ * Do a check when the kernel generate and return a file descriptor for
+ * eBPF maps.
+ *
+ * @map: bpf map that we want to access
+ * @mask: the access flags
+ *
+ * @bpf_prog:
+ * Do a check when the kernel generate and return a file descriptor for
+ * eBPF programs.
+ *
+ * @prog: bpf prog that userspace want to use.
+ *
+ * @bpf_map_alloc_security:
+ * Initialize the security field inside bpf map.
+ *
+ * @bpf_map_free_security:
+ * Clean up the security information stored inside bpf map.
+ *
+ * @bpf_prog_alloc_security:
+ * Initialize the security field inside bpf program.
+ *
+ * @bpf_prog_free_security:
+ * Clean up the security information stored inside bpf prog.
+ *
*/
union security_list_options {
int (*binder_set_context_mgr)(struct task_struct *mgr);
@@ -1682,6 +1716,17 @@
struct audit_context *actx);
void (*audit_rule_free)(void *lsmrule);
#endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+ int (*bpf)(int cmd, union bpf_attr *attr,
+ unsigned int size);
+ int (*bpf_map)(struct bpf_map *map, fmode_t fmode);
+ int (*bpf_prog)(struct bpf_prog *prog);
+ int (*bpf_map_alloc_security)(struct bpf_map *map);
+ void (*bpf_map_free_security)(struct bpf_map *map);
+ int (*bpf_prog_alloc_security)(struct bpf_prog_aux *aux);
+ void (*bpf_prog_free_security)(struct bpf_prog_aux *aux);
+#endif /* CONFIG_BPF_SYSCALL */
};
struct security_hook_heads {
@@ -1901,6 +1946,15 @@
struct list_head audit_rule_match;
struct list_head audit_rule_free;
#endif /* CONFIG_AUDIT */
+#ifdef CONFIG_BPF_SYSCALL
+ struct list_head bpf;
+ struct list_head bpf_map;
+ struct list_head bpf_prog;
+ struct list_head bpf_map_alloc_security;
+ struct list_head bpf_map_free_security;
+ struct list_head bpf_prog_alloc_security;
+ struct list_head bpf_prog_free_security;
+#endif /* CONFIG_BPF_SYSCALL */
} __randomize_layout;
/*
diff --git a/include/linux/memory-state-time.h b/include/linux/memory-state-time.h
new file mode 100644
index 0000000..d2212b0
--- /dev/null
+++ b/include/linux/memory-state-time.h
@@ -0,0 +1,42 @@
+/* include/linux/memory-state-time.h
+ *
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/workqueue.h>
+
+#define UPDATE_MEMORY_STATE(BLOCK, VALUE) BLOCK->update_call(BLOCK, VALUE)
+
+struct memory_state_update_block;
+
+typedef void (*memory_state_update_fn_t)(struct memory_state_update_block *ub,
+ int value);
+
+/* This struct is populated when you pass it to a memory_state_register*
+ * function. The update_call function is used for an update and defined in the
+ * typedef memory_state_update_fn_t
+ */
+struct memory_state_update_block {
+ memory_state_update_fn_t update_call;
+ int id;
+};
+
+/* Register a frequency struct memory_state_update_block to provide updates to
+ * memory_state_time about frequency changes using its update_call function.
+ */
+struct memory_state_update_block *memory_state_register_frequency_source(void);
+
+/* Register a bandwidth struct memory_state_update_block to provide updates to
+ * memory_state_time about bandwidth changes using its update_call function.
+ */
+struct memory_state_update_block *memory_state_register_bandwidth_source(void);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index a26cf76..5b13b4e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1223,6 +1223,8 @@
extern void show_free_areas(unsigned int flags, nodemask_t *nodemask);
+void shmem_set_file(struct vm_area_struct *vma, struct file *file);
+
extern bool can_do_mlock(void);
extern int user_shm_lock(size_t, struct user_struct *);
extern void user_shm_unlock(size_t, struct user_struct *);
@@ -2094,7 +2096,7 @@
extern struct vm_area_struct *vma_merge(struct mm_struct *,
struct vm_area_struct *prev, unsigned long addr, unsigned long end,
unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t,
- struct mempolicy *, struct vm_userfaultfd_ctx);
+ struct mempolicy *, struct vm_userfaultfd_ctx, const char __user *);
extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *);
extern int __split_vma(struct mm_struct *, struct vm_area_struct *,
unsigned long addr, int new_below);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index e41ef53..1831bca 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -304,11 +304,18 @@
/*
* For areas with an address space and backing store,
* linkage into the address_space->i_mmap interval tree.
+ *
+ * For private anonymous mappings, a pointer to a null terminated string
+ * in the user process containing the name given to the vma, or NULL
+ * if unnamed.
*/
- struct {
- struct rb_node rb;
- unsigned long rb_subtree_last;
- } shared;
+ union {
+ struct {
+ struct rb_node rb;
+ unsigned long rb_subtree_last;
+ } shared;
+ const char __user *anon_name;
+ };
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
@@ -655,4 +662,13 @@
unsigned long val;
} swp_entry_t;
+/* Return the name for an anonymous mapping or NULL for a file-backed mapping */
+static inline const char __user *vma_get_anon_name(struct vm_area_struct *vma)
+{
+ if (vma->vm_file)
+ return NULL;
+
+ return vma->anon_name;
+}
+
#endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h
index 9a43763..227961c 100644
--- a/include/linux/mmc/host.h
+++ b/include/linux/mmc/host.h
@@ -439,6 +439,15 @@
bool cqe_enabled;
bool cqe_on;
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+ struct {
+ struct sdio_cis *cis;
+ struct sdio_cccr *cccr;
+ struct sdio_embedded_func *funcs;
+ int num_funcs;
+ } embedded_sdio_data;
+#endif
+
unsigned long private[0] ____cacheline_aligned;
};
@@ -451,6 +460,14 @@
int mmc_of_parse(struct mmc_host *host);
int mmc_of_parse_voltage(struct device_node *np, u32 *mask);
+#ifdef CONFIG_MMC_EMBEDDED_SDIO
+extern void mmc_set_embedded_sdio_data(struct mmc_host *host,
+ struct sdio_cis *cis,
+ struct sdio_cccr *cccr,
+ struct sdio_embedded_func *funcs,
+ int num_funcs);
+#endif
+
static inline void *mmc_priv(struct mmc_host *host)
{
return (void *)host->private;
diff --git a/include/linux/mmc/pm.h b/include/linux/mmc/pm.h
index 4a13920..6e2d6a1 100644
--- a/include/linux/mmc/pm.h
+++ b/include/linux/mmc/pm.h
@@ -26,5 +26,6 @@
#define MMC_PM_KEEP_POWER (1 << 0) /* preserve card power during suspend */
#define MMC_PM_WAKE_SDIO_IRQ (1 << 1) /* wake up host system on SDIO IRQ assertion */
+#define MMC_PM_IGNORE_PM_NOTIFY (1 << 2) /* ignore mmc pm notify */
#endif /* LINUX_MMC_PM_H */
diff --git a/include/linux/mmc/sdio_func.h b/include/linux/mmc/sdio_func.h
index 97ca105..f466f38 100644
--- a/include/linux/mmc/sdio_func.h
+++ b/include/linux/mmc/sdio_func.h
@@ -23,6 +23,14 @@
typedef void (sdio_irq_handler_t)(struct sdio_func *);
/*
+ * Structure used to hold embedded SDIO device data from platform layer
+ */
+struct sdio_embedded_func {
+ uint8_t f_class;
+ uint32_t f_maxblksize;
+};
+
+/*
* SDIO function CIS tuple (unknown to the core)
*/
struct sdio_func_tuple {
diff --git a/include/linux/module.h b/include/linux/module.h
index b1cc541..b28e094 100644
--- a/include/linux/module.h
+++ b/include/linux/module.h
@@ -19,6 +19,7 @@
#include <linux/jump_label.h>
#include <linux/export.h>
#include <linux/rbtree_latch.h>
+#include <linux/cfi.h>
#include <linux/percpu.h>
#include <asm/module.h>
@@ -346,6 +347,10 @@
const s32 *crcs;
unsigned int num_syms;
+#ifdef CONFIG_CFI_CLANG
+ cfi_check_fn cfi_check;
+#endif
+
/* Kernel parameters. */
#ifdef CONFIG_SYSFS
struct mutex param_lock;
diff --git a/include/linux/moduleparam.h b/include/linux/moduleparam.h
index 1d7140f..ba36506 100644
--- a/include/linux/moduleparam.h
+++ b/include/linux/moduleparam.h
@@ -228,19 +228,11 @@
VERIFY_OCTAL_PERMISSIONS(perm), level, flags, { arg } }
/* Obsolete - use module_param_cb() */
-#define module_param_call(name, set, get, arg, perm) \
- static const struct kernel_param_ops __param_ops_##name = \
- { .flags = 0, (void *)set, (void *)get }; \
+#define module_param_call(name, _set, _get, arg, perm) \
+ static const struct kernel_param_ops __param_ops_##name = \
+ { .flags = 0, .set = _set, .get = _get }; \
__module_param_call(MODULE_PARAM_PREFIX, \
- name, &__param_ops_##name, arg, \
- (perm) + sizeof(__check_old_set_param(set))*0, -1, 0)
-
-/* We don't get oldget: it's often a new-style param_get_uint, etc. */
-static inline int
-__check_old_set_param(int (*oldset)(const char *, struct kernel_param *))
-{
- return 0;
-}
+ name, &__param_ops_##name, arg, perm, -1, 0)
#ifdef CONFIG_SYSFS
extern void kernel_param_lock(struct module *mod);
diff --git a/include/linux/mount.h b/include/linux/mount.h
index 45b1f56..1ff21c1 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -68,6 +68,7 @@
struct dentry *mnt_root; /* root of the mounted tree */
struct super_block *mnt_sb; /* pointer to superblock */
int mnt_flags;
+ void *data;
} __randomize_layout;
struct file; /* forward dec */
diff --git a/include/linux/namei.h b/include/linux/namei.h
index a982bb7..db4f5c0 100644
--- a/include/linux/namei.h
+++ b/include/linux/namei.h
@@ -80,8 +80,11 @@
extern void done_path_create(struct path *, struct dentry *);
extern struct dentry *kern_path_locked(const char *, struct path *);
extern int kern_path_mountpoint(int, const char *, struct path *, unsigned int);
+extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
+ const char *, unsigned int, struct path *);
extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
+extern struct dentry *lookup_one_len2(const char *, struct vfsmount *mnt, struct dentry *, int);
extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int);
extern int follow_down_one(struct path *);
diff --git a/include/linux/netfilter/xt_qtaguid.h b/include/linux/netfilter/xt_qtaguid.h
new file mode 100644
index 0000000..1c67155
--- /dev/null
+++ b/include/linux/netfilter/xt_qtaguid.h
@@ -0,0 +1,14 @@
+#ifndef _XT_QTAGUID_MATCH_H
+#define _XT_QTAGUID_MATCH_H
+
+/* For now we just replace the xt_owner.
+ * FIXME: make iptables aware of qtaguid. */
+#include <linux/netfilter/xt_owner.h>
+
+#define XT_QTAGUID_UID XT_OWNER_UID
+#define XT_QTAGUID_GID XT_OWNER_GID
+#define XT_QTAGUID_SOCKET XT_OWNER_SOCKET
+#define xt_qtaguid_match_info xt_owner_match_info
+
+int qtaguid_untag(struct socket *sock, bool kernel);
+#endif /* _XT_QTAGUID_MATCH_H */
diff --git a/include/linux/netfilter/xt_quota2.h b/include/linux/netfilter/xt_quota2.h
new file mode 100644
index 0000000..eadc69033
--- /dev/null
+++ b/include/linux/netfilter/xt_quota2.h
@@ -0,0 +1,25 @@
+#ifndef _XT_QUOTA_H
+#define _XT_QUOTA_H
+
+enum xt_quota_flags {
+ XT_QUOTA_INVERT = 1 << 0,
+ XT_QUOTA_GROW = 1 << 1,
+ XT_QUOTA_PACKET = 1 << 2,
+ XT_QUOTA_NO_CHANGE = 1 << 3,
+ XT_QUOTA_MASK = 0x0F,
+};
+
+struct xt_quota_counter;
+
+struct xt_quota_mtinfo2 {
+ char name[15];
+ u_int8_t flags;
+
+ /* Comparison-invariant */
+ aligned_u64 quota;
+
+ /* Used internally by the kernel */
+ struct xt_quota_counter *master __attribute__((aligned(8)));
+};
+
+#endif /* _XT_QUOTA_H */
diff --git a/include/linux/of_fdt.h b/include/linux/of_fdt.h
index 013c541..064bac7 100644
--- a/include/linux/of_fdt.h
+++ b/include/linux/of_fdt.h
@@ -66,6 +66,27 @@
extern int of_get_flat_dt_size(void);
extern uint32_t of_get_flat_dt_phandle(unsigned long node);
+/*
+ * early_init_dt_scan_chosen - scan the device tree for ramdisk and bootargs
+ *
+ * The boot arguments will be placed into the memory pointed to by @data.
+ * That memory should be COMMAND_LINE_SIZE big and initialized to be a valid
+ * (possibly empty) string. Logic for what will be in @data after this
+ * function finishes:
+ *
+ * - CONFIG_CMDLINE_FORCE=true
+ * CONFIG_CMDLINE
+ * - CONFIG_CMDLINE_EXTEND=true, @data is non-empty string
+ * @data + dt bootargs (even if dt bootargs are empty)
+ * - CONFIG_CMDLINE_EXTEND=true, @data is empty string
+ * CONFIG_CMDLINE + dt bootargs (even if dt bootargs are empty)
+ * - CMDLINE_FROM_BOOTLOADER=true, dt bootargs=non-empty:
+ * dt bootargs
+ * - CMDLINE_FROM_BOOTLOADER=true, dt bootargs=empty, @data is non-empty string
+ * @data is left unchanged
+ * - CMDLINE_FROM_BOOTLOADER=true, dt bootargs=empty, @data is empty string
+ * CONFIG_CMDLINE (or "" if that's not defined)
+ */
extern int early_init_dt_scan_chosen(unsigned long node, const char *uname,
int depth, void *data);
extern int early_init_dt_scan_memory(unsigned long node, const char *uname,
diff --git a/include/linux/overflow.h b/include/linux/overflow.h
new file mode 100644
index 0000000..8712ff7
--- /dev/null
+++ b/include/linux/overflow.h
@@ -0,0 +1,278 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+#ifndef __LINUX_OVERFLOW_H
+#define __LINUX_OVERFLOW_H
+
+#include <linux/compiler.h>
+
+/*
+ * In the fallback code below, we need to compute the minimum and
+ * maximum values representable in a given type. These macros may also
+ * be useful elsewhere, so we provide them outside the
+ * COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW block.
+ *
+ * It would seem more obvious to do something like
+ *
+ * #define type_min(T) (T)(is_signed_type(T) ? (T)1 << (8*sizeof(T)-1) : 0)
+ * #define type_max(T) (T)(is_signed_type(T) ? ((T)1 << (8*sizeof(T)-1)) - 1 : ~(T)0)
+ *
+ * Unfortunately, the middle expressions, strictly speaking, have
+ * undefined behaviour, and at least some versions of gcc warn about
+ * the type_max expression (but not if -fsanitize=undefined is in
+ * effect; in that case, the warning is deferred to runtime...).
+ *
+ * The slightly excessive casting in type_min is to make sure the
+ * macros also produce sensible values for the exotic type _Bool. [The
+ * overflow checkers only almost work for _Bool, but that's
+ * a-feature-not-a-bug, since people shouldn't be doing arithmetic on
+ * _Bools. Besides, the gcc builtins don't allow _Bool* as third
+ * argument.]
+ *
+ * Idea stolen from
+ * https://mail-index.netbsd.org/tech-misc/2007/02/05/0000.html -
+ * credit to Christian Biere.
+ */
+#define is_signed_type(type) (((type)(-1)) < (type)1)
+#define __type_half_max(type) ((type)1 << (8*sizeof(type) - 1 - is_signed_type(type)))
+#define type_max(T) ((T)((__type_half_max(T) - 1) + __type_half_max(T)))
+#define type_min(T) ((T)((T)-type_max(T)-(T)1))
+
+
+#ifdef COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW
+/*
+ * For simplicity and code hygiene, the fallback code below insists on
+ * a, b and *d having the same type (similar to the min() and max()
+ * macros), whereas gcc's type-generic overflow checkers accept
+ * different types. Hence we don't just make check_add_overflow an
+ * alias for __builtin_add_overflow, but add type checks similar to
+ * below.
+ */
+#define check_add_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ __builtin_add_overflow(__a, __b, __d); \
+})
+
+#define check_sub_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ __builtin_sub_overflow(__a, __b, __d); \
+})
+
+#define check_mul_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ __builtin_mul_overflow(__a, __b, __d); \
+})
+
+#else
+
+
+/* Checking for unsigned overflow is relatively easy without causing UB. */
+#define __unsigned_add_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ *__d = __a + __b; \
+ *__d < __a; \
+})
+#define __unsigned_sub_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ *__d = __a - __b; \
+ __a < __b; \
+})
+/*
+ * If one of a or b is a compile-time constant, this avoids a division.
+ */
+#define __unsigned_mul_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ *__d = __a * __b; \
+ __builtin_constant_p(__b) ? \
+ __b > 0 && __a > type_max(typeof(__a)) / __b : \
+ __a > 0 && __b > type_max(typeof(__b)) / __a; \
+})
+
+/*
+ * For signed types, detecting overflow is much harder, especially if
+ * we want to avoid UB. But the interface of these macros is such that
+ * we must provide a result in *d, and in fact we must produce the
+ * result promised by gcc's builtins, which is simply the possibly
+ * wrapped-around value. Fortunately, we can just formally do the
+ * operations in the widest relevant unsigned type (u64) and then
+ * truncate the result - gcc is smart enough to generate the same code
+ * with and without the (u64) casts.
+ */
+
+/*
+ * Adding two signed integers can overflow only if they have the same
+ * sign, and overflow has happened iff the result has the opposite
+ * sign.
+ */
+#define __signed_add_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ *__d = (u64)__a + (u64)__b; \
+ (((~(__a ^ __b)) & (*__d ^ __a)) \
+ & type_min(typeof(__a))) != 0; \
+})
+
+/*
+ * Subtraction is similar, except that overflow can now happen only
+ * when the signs are opposite. In this case, overflow has happened if
+ * the result has the opposite sign of a.
+ */
+#define __signed_sub_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ *__d = (u64)__a - (u64)__b; \
+ ((((__a ^ __b)) & (*__d ^ __a)) \
+ & type_min(typeof(__a))) != 0; \
+})
+
+/*
+ * Signed multiplication is rather hard. gcc always follows C99, so
+ * division is truncated towards 0. This means that we can write the
+ * overflow check like this:
+ *
+ * (a > 0 && (b > MAX/a || b < MIN/a)) ||
+ * (a < -1 && (b > MIN/a || b < MAX/a) ||
+ * (a == -1 && b == MIN)
+ *
+ * The redundant casts of -1 are to silence an annoying -Wtype-limits
+ * (included in -Wextra) warning: When the type is u8 or u16, the
+ * __b_c_e in check_mul_overflow obviously selects
+ * __unsigned_mul_overflow, but unfortunately gcc still parses this
+ * code and warns about the limited range of __b.
+ */
+
+#define __signed_mul_overflow(a, b, d) ({ \
+ typeof(a) __a = (a); \
+ typeof(b) __b = (b); \
+ typeof(d) __d = (d); \
+ typeof(a) __tmax = type_max(typeof(a)); \
+ typeof(a) __tmin = type_min(typeof(a)); \
+ (void) (&__a == &__b); \
+ (void) (&__a == __d); \
+ *__d = (u64)__a * (u64)__b; \
+ (__b > 0 && (__a > __tmax/__b || __a < __tmin/__b)) || \
+ (__b < (typeof(__b))-1 && (__a > __tmin/__b || __a < __tmax/__b)) || \
+ (__b == (typeof(__b))-1 && __a == __tmin); \
+})
+
+
+#define check_add_overflow(a, b, d) \
+ __builtin_choose_expr(is_signed_type(typeof(a)), \
+ __signed_add_overflow(a, b, d), \
+ __unsigned_add_overflow(a, b, d))
+
+#define check_sub_overflow(a, b, d) \
+ __builtin_choose_expr(is_signed_type(typeof(a)), \
+ __signed_sub_overflow(a, b, d), \
+ __unsigned_sub_overflow(a, b, d))
+
+#define check_mul_overflow(a, b, d) \
+ __builtin_choose_expr(is_signed_type(typeof(a)), \
+ __signed_mul_overflow(a, b, d), \
+ __unsigned_mul_overflow(a, b, d))
+
+
+#endif /* COMPILER_HAS_GENERIC_BUILTIN_OVERFLOW */
+
+/**
+ * array_size() - Calculate size of 2-dimensional array.
+ *
+ * @a: dimension one
+ * @b: dimension two
+ *
+ * Calculates size of 2-dimensional array: @a * @b.
+ *
+ * Returns: number of bytes needed to represent the array or SIZE_MAX on
+ * overflow.
+ */
+static inline __must_check size_t array_size(size_t a, size_t b)
+{
+ size_t bytes;
+
+ if (check_mul_overflow(a, b, &bytes))
+ return SIZE_MAX;
+
+ return bytes;
+}
+
+/**
+ * array3_size() - Calculate size of 3-dimensional array.
+ *
+ * @a: dimension one
+ * @b: dimension two
+ * @c: dimension three
+ *
+ * Calculates size of 3-dimensional array: @a * @b * @c.
+ *
+ * Returns: number of bytes needed to represent the array or SIZE_MAX on
+ * overflow.
+ */
+static inline __must_check size_t array3_size(size_t a, size_t b, size_t c)
+{
+ size_t bytes;
+
+ if (check_mul_overflow(a, b, &bytes))
+ return SIZE_MAX;
+ if (check_mul_overflow(bytes, c, &bytes))
+ return SIZE_MAX;
+
+ return bytes;
+}
+
+static inline __must_check size_t __ab_c_size(size_t n, size_t size, size_t c)
+{
+ size_t bytes;
+
+ if (check_mul_overflow(n, size, &bytes))
+ return SIZE_MAX;
+ if (check_add_overflow(bytes, c, &bytes))
+ return SIZE_MAX;
+
+ return bytes;
+}
+
+/**
+ * struct_size() - Calculate size of structure with trailing array.
+ * @p: Pointer to the structure.
+ * @member: Name of the array member.
+ * @n: Number of elements in the array.
+ *
+ * Calculates size of memory needed for structure @p followed by an
+ * array of @n @member elements.
+ *
+ * Return: number of bytes needed or SIZE_MAX on overflow.
+ */
+#define struct_size(p, member, n) \
+ __ab_c_size(n, \
+ sizeof(*(p)->member) + __must_be_array((p)->member),\
+ sizeof(*(p)))
+
+#endif /* __LINUX_OVERFLOW_H */
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e08b533..467af15 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -243,7 +243,7 @@
__GFP_COLD | __GFP_NORETRY | __GFP_NOWARN;
}
-typedef int filler_t(void *, struct page *);
+typedef int filler_t(struct file *, struct page *);
pgoff_t page_cache_next_hole(struct address_space *mapping,
pgoff_t index, unsigned long max_scan);
@@ -366,8 +366,16 @@
}
unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start,
unsigned int nr_pages, struct page **pages);
-unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
- int tag, unsigned int nr_pages, struct page **pages);
+unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
+ pgoff_t end, int tag, unsigned int nr_pages,
+ struct page **pages);
+static inline unsigned find_get_pages_tag(struct address_space *mapping,
+ pgoff_t *index, int tag, unsigned int nr_pages,
+ struct page **pages)
+{
+ return find_get_pages_range_tag(mapping, index, (pgoff_t)-1, tag,
+ nr_pages, pages);
+}
unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
int tag, unsigned int nr_entries,
struct page **entries, pgoff_t *indices);
@@ -394,7 +402,7 @@
static inline struct page *read_mapping_page(struct address_space *mapping,
pgoff_t index, void *data)
{
- filler_t *filler = (filler_t *)mapping->a_ops->readpage;
+ filler_t *filler = mapping->a_ops->readpage;
return read_cache_page(mapping, index, filler, data);
}
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 2636c0c..4f56e0a 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -38,9 +38,17 @@
return pagevec_lookup_range(pvec, mapping, start, (pgoff_t)-1);
}
-unsigned pagevec_lookup_tag(struct pagevec *pvec,
- struct address_space *mapping, pgoff_t *index, int tag,
- unsigned nr_pages);
+unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
+ struct address_space *mapping, pgoff_t *index, pgoff_t end,
+ int tag);
+unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
+ struct address_space *mapping, pgoff_t *index, pgoff_t end,
+ int tag, unsigned max_pages);
+static inline unsigned pagevec_lookup_tag(struct pagevec *pvec,
+ struct address_space *mapping, pgoff_t *index, int tag)
+{
+ return pagevec_lookup_range_tag(pvec, mapping, index, (pgoff_t)-1, tag);
+}
static inline void pagevec_init(struct pagevec *pvec, int cold)
{
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8e22f24..b7fecdf 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1165,6 +1165,11 @@
int perf_event_max_stack_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
+static inline bool perf_paranoid_any(void)
+{
+ return sysctl_perf_event_paranoid > 2;
+}
+
static inline bool perf_paranoid_tracepoint_raw(void)
{
return sysctl_perf_event_paranoid > -1;
diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h
index b2b7255..540595a 100644
--- a/include/linux/posix_acl.h
+++ b/include/linux/posix_acl.h
@@ -12,6 +12,7 @@
#include <linux/bug.h>
#include <linux/slab.h>
#include <linux/rcupdate.h>
+#include <linux/refcount.h>
#include <uapi/linux/posix_acl.h>
struct posix_acl_entry {
@@ -24,7 +25,7 @@
};
struct posix_acl {
- atomic_t a_refcount;
+ refcount_t a_refcount;
struct rcu_head a_rcu;
unsigned int a_count;
struct posix_acl_entry a_entries[0];
@@ -41,7 +42,7 @@
posix_acl_dup(struct posix_acl *acl)
{
if (acl)
- atomic_inc(&acl->a_refcount);
+ refcount_inc(&acl->a_refcount);
return acl;
}
@@ -51,7 +52,7 @@
static inline void
posix_acl_release(struct posix_acl *acl)
{
- if (acl && atomic_dec_and_test(&acl->a_refcount))
+ if (acl && refcount_dec_and_test(&acl->a_refcount))
kfree_rcu(acl, a_rcu);
}
diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h
index 79e90b3..e270643 100644
--- a/include/linux/power_supply.h
+++ b/include/linux/power_supply.h
@@ -18,6 +18,7 @@
#include <linux/leds.h>
#include <linux/spinlock.h>
#include <linux/notifier.h>
+#include <linux/types.h>
/*
* All voltages, currents, charges, energies, time and temperatures in uV,
@@ -149,6 +150,12 @@
POWER_SUPPLY_PROP_PRECHARGE_CURRENT,
POWER_SUPPLY_PROP_CHARGE_TERM_CURRENT,
POWER_SUPPLY_PROP_CALIBRATE,
+ /* Local extensions */
+ POWER_SUPPLY_PROP_USB_HC,
+ POWER_SUPPLY_PROP_USB_OTG,
+ POWER_SUPPLY_PROP_CHARGE_ENABLED,
+ /* Local extensions of type int64_t */
+ POWER_SUPPLY_PROP_CHARGE_COUNTER_EXT,
/* Properties of type `const char *' */
POWER_SUPPLY_PROP_MODEL_NAME,
POWER_SUPPLY_PROP_MANUFACTURER,
@@ -177,6 +184,7 @@
union power_supply_propval {
int intval;
const char *strval;
+ int64_t int64val;
};
struct device_node;
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 928ef9e..d1c705e 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -71,6 +71,12 @@
#endif /* CONFIG_PROC_FS */
+#ifdef CONFIG_PROC_UID
+extern void proc_register_uid(kuid_t uid);
+#else
+static inline void proc_register_uid(kuid_t uid) {}
+#endif
+
struct net;
static inline struct proc_dir_entry *proc_net_mkdir(
diff --git a/include/linux/pstore_ram.h b/include/linux/pstore_ram.h
index 9395f06..10fe2e2 100644
--- a/include/linux/pstore_ram.h
+++ b/include/linux/pstore_ram.h
@@ -80,6 +80,8 @@
ssize_t persistent_ram_ecc_string(struct persistent_ram_zone *prz,
char *str, size_t len);
+void ramoops_console_write_buf(const char *buf, size_t size);
+
/*
* Ramoops platform data
* @mem_size memory size for ramoops
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e04919a..f8c5e72 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -206,6 +206,15 @@
/* Task command name length: */
#define TASK_COMM_LEN 16
+enum task_event {
+ PUT_PREV_TASK = 0,
+ PICK_NEXT_TASK = 1,
+ TASK_WAKE = 2,
+ TASK_MIGRATE = 3,
+ TASK_UPDATE = 4,
+ IRQ_UPDATE = 5,
+};
+
extern cpumask_var_t cpu_isolated_map;
extern void scheduler_tick(void);
@@ -317,6 +326,34 @@
u32 inv_weight;
};
+/**
+ * struct util_est - Estimation utilization of FAIR tasks
+ * @enqueued: instantaneous estimated utilization of a task/cpu
+ * @ewma: the Exponential Weighted Moving Average (EWMA)
+ * utilization of a task
+ *
+ * Support data structure to track an Exponential Weighted Moving Average
+ * (EWMA) of a FAIR task's utilization. New samples are added to the moving
+ * average each time a task completes an activation. Sample's weight is chosen
+ * so that the EWMA will be relatively insensitive to transient changes to the
+ * task's workload.
+ *
+ * The enqueued attribute has a slightly different meaning for tasks and cpus:
+ * - task: the task's util_avg at last task dequeue time
+ * - cfs_rq: the sum of util_est.enqueued for each RUNNABLE task on that CPU
+ * Thus, the util_est.enqueued of a task represents the contribution on the
+ * estimated utilization of the CPU where that task is currently enqueued.
+ *
+ * Only for tasks we track a moving average of the past instantaneous
+ * estimated utilization. This allows to absorb sporadic drops in utilization
+ * of an otherwise almost periodic task.
+ */
+struct util_est {
+ unsigned int enqueued;
+ unsigned int ewma;
+#define UTIL_EST_WEIGHT_SHIFT 2
+};
+
/*
* The load_avg/util_avg accumulates an infinite geometric series
* (see __update_load_avg() in kernel/sched/fair.c).
@@ -376,6 +413,7 @@
u32 period_contrib;
unsigned long load_avg;
unsigned long util_avg;
+ struct util_est util_est;
};
struct sched_statistics {
@@ -450,6 +488,41 @@
#endif
};
+#ifdef CONFIG_SCHED_WALT
+#define RAVG_HIST_SIZE_MAX 5
+
+/* ravg represents frequency scaled cpu-demand of tasks */
+struct ravg {
+ /*
+ * 'mark_start' marks the beginning of an event (task waking up, task
+ * starting to execute, task being preempted) within a window
+ *
+ * 'sum' represents how runnable a task has been within current
+ * window. It incorporates both running time and wait time and is
+ * frequency scaled.
+ *
+ * 'sum_history' keeps track of history of 'sum' seen over previous
+ * RAVG_HIST_SIZE windows. Windows where task was entirely sleeping are
+ * ignored.
+ *
+ * 'demand' represents maximum sum seen over previous
+ * sysctl_sched_ravg_hist_size windows. 'demand' could drive frequency
+ * demand for tasks.
+ *
+ * 'curr_window' represents task's contribution to cpu busy time
+ * statistics (rq->curr_runnable_sum) in current window
+ *
+ * 'prev_window' represents task's contribution to cpu busy time
+ * statistics (rq->prev_runnable_sum) in previous window
+ */
+ u64 mark_start;
+ u32 sum, demand;
+ u32 sum_history[RAVG_HIST_SIZE_MAX];
+ u32 curr_window, prev_window;
+ u16 active_windows;
+};
+#endif
+
struct sched_rt_entity {
struct list_head run_list;
unsigned long timeout;
@@ -602,6 +675,16 @@
const struct sched_class *sched_class;
struct sched_entity se;
struct sched_rt_entity rt;
+#ifdef CONFIG_SCHED_WALT
+ struct ravg ravg;
+ /*
+ * 'init_load_pct' represents the initial task load assigned to children
+ * of this task
+ */
+ u32 init_load_pct;
+ u64 last_sleep_ts;
+#endif
+
#ifdef CONFIG_CGROUP_SCHED
struct task_group *sched_task_group;
#endif
@@ -752,6 +835,10 @@
u64 stimescaled;
#endif
u64 gtime;
+#ifdef CONFIG_CPU_FREQ_TIMES
+ u64 *time_in_state;
+ unsigned int max_state;
+#endif
struct prev_cputime prev_cputime;
#ifdef CONFIG_VIRT_CPU_ACCOUNTING_GEN
struct vtime vtime;
diff --git a/include/linux/sched/cpufreq.h b/include/linux/sched/cpufreq.h
index d1ad3d8..0b55834 100644
--- a/include/linux/sched/cpufreq.h
+++ b/include/linux/sched/cpufreq.h
@@ -12,8 +12,6 @@
#define SCHED_CPUFREQ_DL (1U << 1)
#define SCHED_CPUFREQ_IOWAIT (1U << 2)
-#define SCHED_CPUFREQ_RT_DL (SCHED_CPUFREQ_RT | SCHED_CPUFREQ_DL)
-
#ifdef CONFIG_CPU_FREQ
struct update_util_data {
void (*func)(struct update_util_data *data, u64 time, unsigned int flags);
diff --git a/include/linux/sched/energy.h b/include/linux/sched/energy.h
new file mode 100644
index 0000000..a9b0349
--- /dev/null
+++ b/include/linux/sched/energy.h
@@ -0,0 +1,40 @@
+#ifndef _LINUX_SCHED_ENERGY_H
+#define _LINUX_SCHED_ENERGY_H
+
+#include <linux/sched.h>
+#include <linux/slab.h>
+
+/*
+ * There doesn't seem to be an NR_CPUS style max number of sched domain
+ * levels so here's an arbitrary constant one for the moment.
+ *
+ * The levels alluded to here correspond to entries in struct
+ * sched_domain_topology_level that are meant to be populated by arch
+ * specific code (topology.c).
+ */
+#define NR_SD_LEVELS 8
+
+#define SD_LEVEL0 0
+#define SD_LEVEL1 1
+#define SD_LEVEL2 2
+#define SD_LEVEL3 3
+#define SD_LEVEL4 4
+#define SD_LEVEL5 5
+#define SD_LEVEL6 6
+#define SD_LEVEL7 7
+
+/*
+ * Convenience macro for iterating through said sd levels.
+ */
+#define for_each_possible_sd_level(level) \
+ for (level = 0; level < NR_SD_LEVELS; level++)
+
+extern struct sched_group_energy *sge_array[NR_CPUS][NR_SD_LEVELS];
+
+#ifdef CONFIG_GENERIC_ARCH_TOPOLOGY
+void init_sched_energy_costs(void);
+#else
+void init_sched_energy_costs(void) {}
+#endif
+
+#endif
diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h
index d6a18a3..e076ff8 100644
--- a/include/linux/sched/sysctl.h
+++ b/include/linux/sched/sysctl.h
@@ -21,8 +21,16 @@
extern unsigned int sysctl_sched_latency;
extern unsigned int sysctl_sched_min_granularity;
+extern unsigned int sysctl_sched_sync_hint_enable;
+extern unsigned int sysctl_sched_cstate_aware;
extern unsigned int sysctl_sched_wakeup_granularity;
extern unsigned int sysctl_sched_child_runs_first;
+#ifdef CONFIG_SCHED_WALT
+extern unsigned int sysctl_sched_use_walt_cpu_util;
+extern unsigned int sysctl_sched_use_walt_task_util;
+extern unsigned int sysctl_sched_walt_init_task_load_pct;
+extern unsigned int sysctl_sched_walt_cpu_high_irqload;
+#endif
enum sched_tunable_scaling {
SCHED_TUNABLESCALING_NONE,
diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h
index cf257c2..5fcb7dd 100644
--- a/include/linux/sched/topology.h
+++ b/include/linux/sched/topology.h
@@ -26,6 +26,7 @@
#define SD_PREFER_SIBLING 0x1000 /* Prefer to place tasks in a sibling domain */
#define SD_OVERLAP 0x2000 /* sched_domains of this level overlap */
#define SD_NUMA 0x4000 /* cross-node balancing */
+#define SD_SHARE_CAP_STATES 0x8000 /* Domain members share capacity state */
/*
* Increase resolution of cpu_capacity calculations
@@ -66,12 +67,31 @@
extern int sched_domain_level_max;
+struct capacity_state {
+ unsigned long cap; /* compute capacity */
+ unsigned long frequency;/* frequency */
+ unsigned long power; /* power consumption at this compute capacity */
+};
+
+struct idle_state {
+ unsigned long power; /* power consumption in this idle state */
+};
+
+struct sched_group_energy {
+ unsigned int nr_idle_states; /* number of idle states */
+ struct idle_state *idle_states; /* ptr to idle state array */
+ unsigned int nr_cap_states; /* number of capacity states */
+ struct capacity_state *cap_states; /* ptr to capacity state array */
+};
+
struct sched_group;
struct sched_domain_shared {
atomic_t ref;
atomic_t nr_busy_cpus;
int has_idle_cores;
+
+ bool overutilized;
};
struct sched_domain {
@@ -173,6 +193,9 @@
typedef const struct cpumask *(*sched_domain_mask_f)(int cpu);
typedef int (*sched_domain_flags_f)(void);
+typedef
+const struct sched_group_energy * const(*sched_domain_energy_f)(int cpu);
+extern bool energy_aware(void);
#define SDTL_OVERLAP 0x01
@@ -186,6 +209,7 @@
struct sched_domain_topology_level {
sched_domain_mask_f mask;
sched_domain_flags_f sd_flags;
+ sched_domain_energy_f energy;
int flags;
int numa_level;
struct sd_data data;
diff --git a/include/linux/sched/wake_q.h b/include/linux/sched/wake_q.h
index 10b19a1..a3661e9 100644
--- a/include/linux/sched/wake_q.h
+++ b/include/linux/sched/wake_q.h
@@ -34,6 +34,7 @@
struct wake_q_head {
struct wake_q_node *first;
struct wake_q_node **lastp;
+ int count;
};
#define WAKE_Q_TAIL ((struct wake_q_node *) 0x01)
@@ -45,6 +46,7 @@
{
head->first = WAKE_Q_TAIL;
head->lastp = &head->first;
+ head->count = 0;
}
extern void wake_q_add(struct wake_q_head *head,
diff --git a/include/linux/sched/xacct.h b/include/linux/sched/xacct.h
index c078f0a..9544c9d 100644
--- a/include/linux/sched/xacct.h
+++ b/include/linux/sched/xacct.h
@@ -28,6 +28,11 @@
{
tsk->ioac.syscw++;
}
+
+static inline void inc_syscfs(struct task_struct *tsk)
+{
+ tsk->ioac.syscfs++;
+}
#else
static inline void add_rchar(struct task_struct *tsk, ssize_t amt)
{
@@ -44,6 +49,10 @@
static inline void inc_syscw(struct task_struct *tsk)
{
}
+
+static inline void inc_syscfs(struct task_struct *tsk)
+{
+}
#endif
#endif /* _LINUX_SCHED_XACCT_H */
diff --git a/include/linux/security.h b/include/linux/security.h
index ce62659..73f1ef6 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -1730,6 +1730,54 @@
#endif
+#ifdef CONFIG_BPF_SYSCALL
+union bpf_attr;
+struct bpf_map;
+struct bpf_prog;
+struct bpf_prog_aux;
+#ifdef CONFIG_SECURITY
+extern int security_bpf(int cmd, union bpf_attr *attr, unsigned int size);
+extern int security_bpf_map(struct bpf_map *map, fmode_t fmode);
+extern int security_bpf_prog(struct bpf_prog *prog);
+extern int security_bpf_map_alloc(struct bpf_map *map);
+extern void security_bpf_map_free(struct bpf_map *map);
+extern int security_bpf_prog_alloc(struct bpf_prog_aux *aux);
+extern void security_bpf_prog_free(struct bpf_prog_aux *aux);
+#else
+static inline int security_bpf(int cmd, union bpf_attr *attr,
+ unsigned int size)
+{
+ return 0;
+}
+
+static inline int security_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+ return 0;
+}
+
+static inline int security_bpf_prog(struct bpf_prog *prog)
+{
+ return 0;
+}
+
+static inline int security_bpf_map_alloc(struct bpf_map *map)
+{
+ return 0;
+}
+
+static inline void security_bpf_map_free(struct bpf_map *map)
+{ }
+
+static inline int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+ return 0;
+}
+
+static inline void security_bpf_prog_free(struct bpf_prog_aux *aux)
+{ }
+#endif /* CONFIG_SECURITY */
+#endif /* CONFIG_BPF_SYSCALL */
+
#ifdef CONFIG_SECURITY
static inline char *alloc_secdata(void)
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 8544357..2e5aa1b 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -444,6 +444,7 @@
extern bool pm_save_wakeup_count(unsigned int count);
extern void pm_wakep_autosleep_enabled(bool set);
extern void pm_print_active_wakeup_sources(void);
+extern void pm_get_active_wakeup_sources(char *pending_sources, size_t max);
static inline void lock_system_sleep(void)
{
diff --git a/include/linux/task_io_accounting.h b/include/linux/task_io_accounting.h
index 6f6acce..bb26108 100644
--- a/include/linux/task_io_accounting.h
+++ b/include/linux/task_io_accounting.h
@@ -19,6 +19,8 @@
u64 syscr;
/* # of write syscalls */
u64 syscw;
+ /* # of fsync syscalls */
+ u64 syscfs;
#endif /* CONFIG_TASK_XACCT */
#ifdef CONFIG_TASK_IO_ACCOUNTING
diff --git a/include/linux/task_io_accounting_ops.h b/include/linux/task_io_accounting_ops.h
index bb5498b..733ab62 100644
--- a/include/linux/task_io_accounting_ops.h
+++ b/include/linux/task_io_accounting_ops.h
@@ -97,6 +97,7 @@
dst->wchar += src->wchar;
dst->syscr += src->syscr;
dst->syscw += src->syscw;
+ dst->syscfs += src->syscfs;
}
#else
static inline void task_chr_io_accounting_add(struct task_io_accounting *dst,
diff --git a/include/linux/tick.h b/include/linux/tick.h
index 5cdac11..bd82199 100644
--- a/include/linux/tick.h
+++ b/include/linux/tick.h
@@ -114,26 +114,45 @@
#ifdef CONFIG_NO_HZ_COMMON
extern bool tick_nohz_enabled;
extern int tick_nohz_tick_stopped(void);
+extern void tick_nohz_idle_stop_tick(void);
+extern void tick_nohz_idle_retain_tick(void);
+extern void tick_nohz_idle_restart_tick(void);
extern void tick_nohz_idle_enter(void);
extern void tick_nohz_idle_exit(void);
extern void tick_nohz_irq_exit(void);
-extern ktime_t tick_nohz_get_sleep_length(void);
+extern bool tick_nohz_idle_got_tick(void);
+extern ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next);
extern unsigned long tick_nohz_get_idle_calls(void);
extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu);
extern u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);
extern u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time);
+
+static inline void tick_nohz_idle_stop_tick_protected(void)
+{
+ local_irq_disable();
+ tick_nohz_idle_stop_tick();
+ local_irq_enable();
+}
+
#else /* !CONFIG_NO_HZ_COMMON */
#define tick_nohz_enabled (0)
static inline int tick_nohz_tick_stopped(void) { return 0; }
+static inline void tick_nohz_idle_stop_tick(void) { }
+static inline void tick_nohz_idle_retain_tick(void) { }
+static inline void tick_nohz_idle_restart_tick(void) { }
static inline void tick_nohz_idle_enter(void) { }
static inline void tick_nohz_idle_exit(void) { }
+static inline bool tick_nohz_idle_got_tick(void) { return false; }
-static inline ktime_t tick_nohz_get_sleep_length(void)
+static inline ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
{
- return NSEC_PER_SEC / HZ;
+ *delta_next = TICK_NSEC;
+ return *delta_next;
}
static inline u64 get_cpu_idle_time_us(int cpu, u64 *unused) { return -1; }
static inline u64 get_cpu_iowait_time_us(int cpu, u64 *unused) { return -1; }
+
+static inline void tick_nohz_idle_stop_tick_protected(void) { }
#endif /* !CONFIG_NO_HZ_COMMON */
#ifdef CONFIG_NO_HZ_FULL
diff --git a/include/linux/usb/class-dual-role.h b/include/linux/usb/class-dual-role.h
new file mode 100644
index 0000000..c6df223
--- /dev/null
+++ b/include/linux/usb/class-dual-role.h
@@ -0,0 +1,129 @@
+#ifndef __LINUX_CLASS_DUAL_ROLE_H__
+#define __LINUX_CLASS_DUAL_ROLE_H__
+
+#include <linux/workqueue.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+
+struct device;
+
+enum dual_role_supported_modes {
+ DUAL_ROLE_SUPPORTED_MODES_DFP_AND_UFP = 0,
+ DUAL_ROLE_SUPPORTED_MODES_DFP,
+ DUAL_ROLE_SUPPORTED_MODES_UFP,
+/*The following should be the last element*/
+ DUAL_ROLE_PROP_SUPPORTED_MODES_TOTAL,
+};
+
+enum {
+ DUAL_ROLE_PROP_MODE_UFP = 0,
+ DUAL_ROLE_PROP_MODE_DFP,
+ DUAL_ROLE_PROP_MODE_NONE,
+/*The following should be the last element*/
+ DUAL_ROLE_PROP_MODE_TOTAL,
+};
+
+enum {
+ DUAL_ROLE_PROP_PR_SRC = 0,
+ DUAL_ROLE_PROP_PR_SNK,
+ DUAL_ROLE_PROP_PR_NONE,
+/*The following should be the last element*/
+ DUAL_ROLE_PROP_PR_TOTAL,
+
+};
+
+enum {
+ DUAL_ROLE_PROP_DR_HOST = 0,
+ DUAL_ROLE_PROP_DR_DEVICE,
+ DUAL_ROLE_PROP_DR_NONE,
+/*The following should be the last element*/
+ DUAL_ROLE_PROP_DR_TOTAL,
+};
+
+enum {
+ DUAL_ROLE_PROP_VCONN_SUPPLY_NO = 0,
+ DUAL_ROLE_PROP_VCONN_SUPPLY_YES,
+/*The following should be the last element*/
+ DUAL_ROLE_PROP_VCONN_SUPPLY_TOTAL,
+};
+
+enum dual_role_property {
+ DUAL_ROLE_PROP_SUPPORTED_MODES = 0,
+ DUAL_ROLE_PROP_MODE,
+ DUAL_ROLE_PROP_PR,
+ DUAL_ROLE_PROP_DR,
+ DUAL_ROLE_PROP_VCONN_SUPPLY,
+};
+
+struct dual_role_phy_instance;
+
+/* Description of typec port */
+struct dual_role_phy_desc {
+ /* /sys/class/dual_role_usb/<name>/ */
+ const char *name;
+ enum dual_role_supported_modes supported_modes;
+ enum dual_role_property *properties;
+ size_t num_properties;
+
+ /* Callback for "cat /sys/class/dual_role_usb/<name>/<property>" */
+ int (*get_property)(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop,
+ unsigned int *val);
+ /* Callback for "echo <value> >
+ * /sys/class/dual_role_usb/<name>/<property>" */
+ int (*set_property)(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop,
+ const unsigned int *val);
+ /* Decides whether userspace can change a specific property */
+ int (*property_is_writeable)(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop);
+};
+
+struct dual_role_phy_instance {
+ const struct dual_role_phy_desc *desc;
+
+ /* Driver private data */
+ void *drv_data;
+
+ struct device dev;
+ struct work_struct changed_work;
+};
+
+#if IS_ENABLED(CONFIG_DUAL_ROLE_USB_INTF)
+extern void dual_role_instance_changed(struct dual_role_phy_instance
+ *dual_role);
+extern struct dual_role_phy_instance *__must_check
+devm_dual_role_instance_register(struct device *parent,
+ const struct dual_role_phy_desc *desc);
+extern void devm_dual_role_instance_unregister(struct device *dev,
+ struct dual_role_phy_instance
+ *dual_role);
+extern int dual_role_get_property(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop,
+ unsigned int *val);
+extern int dual_role_set_property(struct dual_role_phy_instance *dual_role,
+ enum dual_role_property prop,
+ const unsigned int *val);
+extern int dual_role_property_is_writeable(struct dual_role_phy_instance
+ *dual_role,
+ enum dual_role_property prop);
+extern void *dual_role_get_drvdata(struct dual_role_phy_instance *dual_role);
+#else /* CONFIG_DUAL_ROLE_USB_INTF */
+static inline void dual_role_instance_changed(struct dual_role_phy_instance
+ *dual_role){}
+static inline struct dual_role_phy_instance *__must_check
+devm_dual_role_instance_register(struct device *parent,
+ const struct dual_role_phy_desc *desc)
+{
+ return ERR_PTR(-ENOSYS);
+}
+static inline void devm_dual_role_instance_unregister(struct device *dev,
+ struct dual_role_phy_instance
+ *dual_role){}
+static inline void *dual_role_get_drvdata(struct dual_role_phy_instance
+ *dual_role)
+{
+ return ERR_PTR(-ENOSYS);
+}
+#endif /* CONFIG_DUAL_ROLE_USB_INTF */
+#endif /* __LINUX_CLASS_DUAL_ROLE_H__ */
diff --git a/include/linux/usb/composite.h b/include/linux/usb/composite.h
index 590d313..9baea7b 100644
--- a/include/linux/usb/composite.h
+++ b/include/linux/usb/composite.h
@@ -586,6 +586,7 @@
struct config_group group;
struct list_head cfs_list;
struct usb_function_driver *fd;
+ struct usb_function *f;
int (*set_inst_name)(struct usb_function_instance *inst,
const char *name);
void (*free_func_inst)(struct usb_function_instance *inst);
diff --git a/include/linux/usb/f_accessory.h b/include/linux/usb/f_accessory.h
new file mode 100644
index 0000000..ebe3c4d
--- /dev/null
+++ b/include/linux/usb/f_accessory.h
@@ -0,0 +1,23 @@
+/*
+ * Gadget Function Driver for Android USB accessories
+ *
+ * Copyright (C) 2011 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef __LINUX_USB_F_ACCESSORY_H
+#define __LINUX_USB_F_ACCESSORY_H
+
+#include <uapi/linux/usb/f_accessory.h>
+
+#endif /* __LINUX_USB_F_ACCESSORY_H */
diff --git a/include/linux/verification.h b/include/linux/verification.h
index cfa4730..60ea906 100644
--- a/include/linux/verification.h
+++ b/include/linux/verification.h
@@ -32,9 +32,13 @@
};
extern const char *const key_being_used_for[NR__KEY_BEING_USED_FOR];
-#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
-
struct key;
+struct public_key_signature;
+
+extern int verify_signature_one(const struct public_key_signature *sig,
+ struct key *trusted_keys, const char *keyid);
+
+#ifdef CONFIG_SYSTEM_DATA_VERIFICATION
extern int verify_pkcs7_signature(const void *data, size_t len,
const void *raw_pkcs7, size_t pkcs7_len,
diff --git a/include/linux/wakeup_reason.h b/include/linux/wakeup_reason.h
new file mode 100644
index 0000000..d84d8c3
--- /dev/null
+++ b/include/linux/wakeup_reason.h
@@ -0,0 +1,32 @@
+/*
+ * include/linux/wakeup_reason.h
+ *
+ * Logs the reason which caused the kernel to resume
+ * from the suspend mode.
+ *
+ * Copyright (C) 2014 Google, Inc.
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _LINUX_WAKEUP_REASON_H
+#define _LINUX_WAKEUP_REASON_H
+
+#define MAX_SUSPEND_ABORT_LEN 256
+
+void log_wakeup_reason(int irq);
+int check_wakeup_reason(int irq);
+
+#ifdef CONFIG_SUSPEND
+void log_suspend_abort_reason(const char *fmt, ...);
+#else
+static inline void log_suspend_abort_reason(const char *fmt, ...) { }
+#endif
+
+#endif /* _LINUX_WAKEUP_REASON_H */
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 57a8e98..2219cce 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -47,6 +47,8 @@
unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t flags);
void zs_free(struct zs_pool *pool, unsigned long obj);
+size_t zs_huge_class_size(struct zs_pool *pool);
+
void *zs_map_object(struct zs_pool *pool, unsigned long handle,
enum zs_mapmode mm);
void zs_unmap_object(struct zs_pool *pool, unsigned long handle);
diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index 35f5aab..bcd9b88 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -261,6 +261,8 @@
void addrconf_prefix_rcv(struct net_device *dev,
u8 *opt, int len, bool sllao);
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table);
+
/*
* anycast prototypes (anycast.c)
*/
diff --git a/include/net/dst.h b/include/net/dst.h
index ebfb432..cf7ce4fd 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -490,6 +490,14 @@
return dst_orig;
}
+static inline struct dst_entry *
+xfrm_lookup_with_ifid(struct net *net, struct dst_entry *dst_orig,
+ const struct flowi *fl, const struct sock *sk,
+ int flags, u32 if_id)
+{
+ return dst_orig;
+}
+
static inline struct dst_entry *xfrm_lookup_route(struct net *net,
struct dst_entry *dst_orig,
const struct flowi *fl,
@@ -509,6 +517,12 @@
const struct flowi *fl, const struct sock *sk,
int flags);
+struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
+ struct dst_entry *dst_orig,
+ const struct flowi *fl,
+ const struct sock *sk, int flags,
+ u32 if_id);
+
struct dst_entry *xfrm_lookup_route(struct net *net, struct dst_entry *dst_orig,
const struct flowi *fl, const struct sock *sk,
int flags);
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 792c3f6..f5223bf 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -285,7 +285,7 @@
struct kernel_param;
-int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp);
+int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp);
int nf_conntrack_hash_resize(unsigned int hashsize);
extern struct hlist_nulls_head *nf_conntrack_hash;
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 0c828aa..a307f83 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -275,6 +275,7 @@
extern int sysctl_tcp_invalid_ratelimit;
extern int sysctl_tcp_pacing_ss_ratio;
extern int sysctl_tcp_pacing_ca_ratio;
+extern int sysctl_tcp_default_init_rwnd;
extern atomic_long_t tcp_memory_allocated;
extern struct percpu_counter tcp_sockets_allocated;
diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index db99efb..d290de0 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -23,6 +23,7 @@
#include <net/ipv6.h>
#include <net/ip6_fib.h>
#include <net/flow.h>
+#include <net/gro_cells.h>
#include <linux/interrupt.h>
@@ -147,6 +148,7 @@
struct xfrm_id id;
struct xfrm_selector sel;
struct xfrm_mark mark;
+ u32 if_id;
u32 tfcpad;
u32 genid;
@@ -166,7 +168,7 @@
int header_len;
int trailer_len;
u32 extra_flags;
- u32 output_mark;
+ struct xfrm_mark smark;
} props;
struct xfrm_lifetime_cfg lft;
@@ -292,6 +294,13 @@
int (*overflow)(struct xfrm_state *x, struct sk_buff *skb);
};
+struct xfrm_if_cb {
+ struct xfrm_if *(*decode_session)(struct sk_buff *skb);
+};
+
+void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb);
+void xfrm_if_unregister_cb(void);
+
struct net_device;
struct xfrm_type;
struct xfrm_dst;
@@ -573,6 +582,7 @@
atomic_t genid;
u32 priority;
u32 index;
+ u32 if_id;
struct xfrm_mark mark;
struct xfrm_selector selector;
struct xfrm_lifetime_cfg lft;
@@ -1006,6 +1016,22 @@
void xfrm_dst_ifdown(struct dst_entry *dst, struct net_device *dev);
+struct xfrm_if_parms {
+ char name[IFNAMSIZ]; /* name of XFRM device */
+ int link; /* ifindex of underlying L2 interface */
+ u32 if_id; /* interface identifyer */
+};
+
+struct xfrm_if {
+ struct xfrm_if __rcu *next; /* next interface in list */
+ struct net_device *dev; /* virtual device associated with interface */
+ struct net_device *phydev; /* physical device */
+ struct net *net; /* netns for packet i/o */
+ struct xfrm_if_parms p; /* interface parms */
+
+ struct gro_cells gro_cells;
+};
+
struct xfrm_offload {
/* Output sequence number for replay protection on offloading. */
struct {
@@ -1500,8 +1526,8 @@
const struct flowi *fl,
struct xfrm_tmpl *tmpl,
struct xfrm_policy *pol, int *err,
- unsigned short family);
-struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark,
+ unsigned short family, u32 if_id);
+struct xfrm_state *xfrm_stateonly_find(struct net *net, u32 mark, u32 if_id,
xfrm_address_t *daddr,
xfrm_address_t *saddr,
unsigned short family,
@@ -1658,20 +1684,20 @@
void *);
void xfrm_policy_walk_done(struct xfrm_policy_walk *walk, struct net *net);
int xfrm_policy_insert(int dir, struct xfrm_policy *policy, int excl);
-struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark,
+struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u32 if_id,
u8 type, int dir,
struct xfrm_selector *sel,
struct xfrm_sec_ctx *ctx, int delete,
int *err);
-struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8, int dir,
- u32 id, int delete, int *err);
+struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u32 if_id, u8,
+ int dir, u32 id, int delete, int *err);
int xfrm_policy_flush(struct net *net, u8 type, bool task_valid);
void xfrm_policy_hash_rebuild(struct net *net);
u32 xfrm_get_acqseq(void);
int verify_spi_info(u8 proto, u32 min, u32 max);
int xfrm_alloc_spi(struct xfrm_state *x, u32 minspi, u32 maxspi);
struct xfrm_state *xfrm_find_acq(struct net *net, const struct xfrm_mark *mark,
- u8 mode, u32 reqid, u8 proto,
+ u8 mode, u32 reqid, u32 if_id, u8 proto,
const xfrm_address_t *daddr,
const xfrm_address_t *saddr, int create,
unsigned short family);
@@ -1948,6 +1974,22 @@
return ret;
}
+static inline __u32 xfrm_smark_get(__u32 mark, struct xfrm_state *x)
+{
+ struct xfrm_mark *m = &x->props.smark;
+
+ return (m->v & m->m) | (mark & ~m->m);
+}
+
+static inline int xfrm_if_id_put(struct sk_buff *skb, __u32 if_id)
+{
+ int ret = 0;
+
+ if (if_id)
+ ret = nla_put_u32(skb, XFRMA_IF_ID, if_id);
+ return ret;
+}
+
static inline int xfrm_tunnel_check(struct sk_buff *skb, struct xfrm_state *x,
unsigned int family)
{
diff --git a/include/trace/events/android_fs.h b/include/trace/events/android_fs.h
new file mode 100644
index 0000000..4950953
--- /dev/null
+++ b/include/trace/events/android_fs.h
@@ -0,0 +1,65 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM android_fs
+
+#if !defined(_TRACE_ANDROID_FS_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_ANDROID_FS_H
+
+#include <linux/tracepoint.h>
+#include <trace/events/android_fs_template.h>
+
+DEFINE_EVENT(android_fs_data_start_template, android_fs_dataread_start,
+ TP_PROTO(struct inode *inode, loff_t offset, int bytes,
+ pid_t pid, char *pathname, char *command),
+ TP_ARGS(inode, offset, bytes, pid, pathname, command));
+
+DEFINE_EVENT(android_fs_data_end_template, android_fs_dataread_end,
+ TP_PROTO(struct inode *inode, loff_t offset, int bytes),
+ TP_ARGS(inode, offset, bytes));
+
+DEFINE_EVENT(android_fs_data_start_template, android_fs_datawrite_start,
+ TP_PROTO(struct inode *inode, loff_t offset, int bytes,
+ pid_t pid, char *pathname, char *command),
+ TP_ARGS(inode, offset, bytes, pid, pathname, command));
+
+DEFINE_EVENT(android_fs_data_end_template, android_fs_datawrite_end,
+ TP_PROTO(struct inode *inode, loff_t offset, int bytes),
+ TP_ARGS(inode, offset, bytes));
+
+#endif /* _TRACE_ANDROID_FS_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
+
+#ifndef ANDROID_FSTRACE_GET_PATHNAME
+#define ANDROID_FSTRACE_GET_PATHNAME
+
+/* Sizes an on-stack array, so careful if sizing this up ! */
+#define MAX_TRACE_PATHBUF_LEN 256
+
+static inline char *
+android_fstrace_get_pathname(char *buf, int buflen, struct inode *inode)
+{
+ char *path;
+ struct dentry *d;
+
+ /*
+ * d_obtain_alias() will either iput() if it locates an existing
+ * dentry or transfer the reference to the new dentry created.
+ * So get an extra reference here.
+ */
+ ihold(inode);
+ d = d_obtain_alias(inode);
+ if (likely(!IS_ERR(d))) {
+ path = dentry_path_raw(d, buf, buflen);
+ if (unlikely(IS_ERR(path))) {
+ strcpy(buf, "ERROR");
+ path = buf;
+ }
+ dput(d);
+ } else {
+ strcpy(buf, "ERROR");
+ path = buf;
+ }
+ return path;
+}
+#endif
diff --git a/include/trace/events/android_fs_template.h b/include/trace/events/android_fs_template.h
new file mode 100644
index 0000000..b23d17b
--- /dev/null
+++ b/include/trace/events/android_fs_template.h
@@ -0,0 +1,64 @@
+#if !defined(_TRACE_ANDROID_FS_TEMPLATE_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_ANDROID_FS_TEMPLATE_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_EVENT_CLASS(android_fs_data_start_template,
+ TP_PROTO(struct inode *inode, loff_t offset, int bytes,
+ pid_t pid, char *pathname, char *command),
+ TP_ARGS(inode, offset, bytes, pid, pathname, command),
+ TP_STRUCT__entry(
+ __string(pathbuf, pathname);
+ __field(loff_t, offset);
+ __field(int, bytes);
+ __field(loff_t, i_size);
+ __string(cmdline, command);
+ __field(pid_t, pid);
+ __field(ino_t, ino);
+ ),
+ TP_fast_assign(
+ {
+ /*
+ * Replace the spaces in filenames and cmdlines
+ * because this screws up the tooling that parses
+ * the traces.
+ */
+ __assign_str(pathbuf, pathname);
+ (void)strreplace(__get_str(pathbuf), ' ', '_');
+ __entry->offset = offset;
+ __entry->bytes = bytes;
+ __entry->i_size = i_size_read(inode);
+ __assign_str(cmdline, command);
+ (void)strreplace(__get_str(cmdline), ' ', '_');
+ __entry->pid = pid;
+ __entry->ino = inode->i_ino;
+ }
+ ),
+ TP_printk("entry_name %s, offset %llu, bytes %d, cmdline %s,"
+ " pid %d, i_size %llu, ino %lu",
+ __get_str(pathbuf), __entry->offset, __entry->bytes,
+ __get_str(cmdline), __entry->pid, __entry->i_size,
+ (unsigned long) __entry->ino)
+);
+
+DECLARE_EVENT_CLASS(android_fs_data_end_template,
+ TP_PROTO(struct inode *inode, loff_t offset, int bytes),
+ TP_ARGS(inode, offset, bytes),
+ TP_STRUCT__entry(
+ __field(ino_t, ino);
+ __field(loff_t, offset);
+ __field(int, bytes);
+ ),
+ TP_fast_assign(
+ {
+ __entry->ino = inode->i_ino;
+ __entry->offset = offset;
+ __entry->bytes = bytes;
+ }
+ ),
+ TP_printk("ino %lu, offset %llu, bytes %d",
+ (unsigned long) __entry->ino,
+ __entry->offset, __entry->bytes)
+);
+
+#endif /* _TRACE_ANDROID_FS_TEMPLATE_H */
diff --git a/include/trace/events/f2fs.h b/include/trace/events/f2fs.h
index 7ab4049..06c87f9 100644
--- a/include/trace/events/f2fs.h
+++ b/include/trace/events/f2fs.h
@@ -137,6 +137,19 @@
{ CP_UMOUNT, "Umount" }, \
{ CP_TRIMMED, "Trimmed" })
+#define show_fsync_cpreason(type) \
+ __print_symbolic(type, \
+ { CP_NO_NEEDED, "no needed" }, \
+ { CP_NON_REGULAR, "non regular" }, \
+ { CP_HARDLINK, "hardlink" }, \
+ { CP_SB_NEED_CP, "sb needs cp" }, \
+ { CP_WRONG_PINO, "wrong pino" }, \
+ { CP_NO_SPC_ROLL, "no space roll forward" }, \
+ { CP_NODE_NEED_CP, "node needs cp" }, \
+ { CP_FASTBOOT_MODE, "fastboot mode" }, \
+ { CP_SPEC_LOG_NUM, "log type is 2" }, \
+ { CP_RECOVER_DIR, "dir needs recovery" })
+
struct victim_sel_policy;
struct f2fs_map_blocks;
@@ -211,14 +224,14 @@
TRACE_EVENT(f2fs_sync_file_exit,
- TP_PROTO(struct inode *inode, int need_cp, int datasync, int ret),
+ TP_PROTO(struct inode *inode, int cp_reason, int datasync, int ret),
- TP_ARGS(inode, need_cp, datasync, ret),
+ TP_ARGS(inode, cp_reason, datasync, ret),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(ino_t, ino)
- __field(int, need_cp)
+ __field(int, cp_reason)
__field(int, datasync)
__field(int, ret)
),
@@ -226,15 +239,15 @@
TP_fast_assign(
__entry->dev = inode->i_sb->s_dev;
__entry->ino = inode->i_ino;
- __entry->need_cp = need_cp;
+ __entry->cp_reason = cp_reason;
__entry->datasync = datasync;
__entry->ret = ret;
),
- TP_printk("dev = (%d,%d), ino = %lu, checkpoint is %s, "
+ TP_printk("dev = (%d,%d), ino = %lu, cp_reason: %s, "
"datasync = %d, ret = %d",
show_dev_ino(__entry),
- __entry->need_cp ? "needed" : "not needed",
+ show_fsync_cpreason(__entry->cp_reason),
__entry->datasync,
__entry->ret)
);
@@ -729,6 +742,91 @@
__entry->free)
);
+TRACE_EVENT(f2fs_lookup_start,
+
+ TP_PROTO(struct inode *dir, struct dentry *dentry, unsigned int flags),
+
+ TP_ARGS(dir, dentry, flags),
+
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(ino_t, ino)
+ __field(const char *, name)
+ __field(unsigned int, flags)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dir->i_sb->s_dev;
+ __entry->ino = dir->i_ino;
+ __entry->name = dentry->d_name.name;
+ __entry->flags = flags;
+ ),
+
+ TP_printk("dev = (%d,%d), pino = %lu, name:%s, flags:%u",
+ show_dev_ino(__entry),
+ __entry->name,
+ __entry->flags)
+);
+
+TRACE_EVENT(f2fs_lookup_end,
+
+ TP_PROTO(struct inode *dir, struct dentry *dentry, nid_t ino,
+ int err),
+
+ TP_ARGS(dir, dentry, ino, err),
+
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(ino_t, ino)
+ __field(const char *, name)
+ __field(nid_t, cino)
+ __field(int, err)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dir->i_sb->s_dev;
+ __entry->ino = dir->i_ino;
+ __entry->name = dentry->d_name.name;
+ __entry->cino = ino;
+ __entry->err = err;
+ ),
+
+ TP_printk("dev = (%d,%d), pino = %lu, name:%s, ino:%u, err:%d",
+ show_dev_ino(__entry),
+ __entry->name,
+ __entry->cino,
+ __entry->err)
+);
+
+TRACE_EVENT(f2fs_readdir,
+
+ TP_PROTO(struct inode *dir, loff_t start_pos, loff_t end_pos, int err),
+
+ TP_ARGS(dir, start_pos, end_pos, err),
+
+ TP_STRUCT__entry(
+ __field(dev_t, dev)
+ __field(ino_t, ino)
+ __field(loff_t, start)
+ __field(loff_t, end)
+ __field(int, err)
+ ),
+
+ TP_fast_assign(
+ __entry->dev = dir->i_sb->s_dev;
+ __entry->ino = dir->i_ino;
+ __entry->start = start_pos;
+ __entry->end = end_pos;
+ __entry->err = err;
+ ),
+
+ TP_printk("dev = (%d,%d), ino = %lu, start_pos:%llu, end_pos:%llu, err:%d",
+ show_dev_ino(__entry),
+ __entry->start,
+ __entry->end,
+ __entry->err)
+);
+
TRACE_EVENT(f2fs_fallocate,
TP_PROTO(struct inode *inode, int mode,
@@ -1287,6 +1385,13 @@
TP_ARGS(dev, blkstart, blklen)
);
+DEFINE_EVENT(f2fs_discard, f2fs_remove_discard,
+
+ TP_PROTO(struct block_device *dev, block_t blkstart, block_t blklen),
+
+ TP_ARGS(dev, blkstart, blklen)
+);
+
TRACE_EVENT(f2fs_issue_reset_zone,
TP_PROTO(struct block_device *dev, block_t blkstart),
diff --git a/include/trace/events/gpu.h b/include/trace/events/gpu.h
new file mode 100644
index 0000000..7e15cdf
--- /dev/null
+++ b/include/trace/events/gpu.h
@@ -0,0 +1,143 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM gpu
+
+#if !defined(_TRACE_GPU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_GPU_H
+
+#include <linux/tracepoint.h>
+#include <linux/time.h>
+
+#define show_secs_from_ns(ns) \
+ ({ \
+ u64 t = ns + (NSEC_PER_USEC / 2); \
+ do_div(t, NSEC_PER_SEC); \
+ t; \
+ })
+
+#define show_usecs_from_ns(ns) \
+ ({ \
+ u64 t = ns + (NSEC_PER_USEC / 2) ; \
+ u32 rem; \
+ do_div(t, NSEC_PER_USEC); \
+ rem = do_div(t, USEC_PER_SEC); \
+ })
+
+/*
+ * The gpu_sched_switch event indicates that a switch from one GPU context to
+ * another occurred on one of the GPU hardware blocks.
+ *
+ * The gpu_name argument identifies the GPU hardware block. Each independently
+ * scheduled GPU hardware block should have a different name. This may be used
+ * in different ways for different GPUs. For example, if a GPU includes
+ * multiple processing cores it may use names "GPU 0", "GPU 1", etc. If a GPU
+ * includes a separately scheduled 2D and 3D hardware block, it might use the
+ * names "2D" and "3D".
+ *
+ * The timestamp argument is the timestamp at which the switch occurred on the
+ * GPU. These timestamps are in units of nanoseconds and must use
+ * approximately the same time as sched_clock, though they need not come from
+ * any CPU clock. The timestamps for a single hardware block must be
+ * monotonically nondecreasing. This means that if a variable compensation
+ * offset is used to translate from some other clock to the sched_clock, then
+ * care must be taken when increasing that offset, and doing so may result in
+ * multiple events with the same timestamp.
+ *
+ * The next_ctx_id argument identifies the next context that was running on
+ * the GPU hardware block. A value of 0 indicates that the hardware block
+ * will be idle.
+ *
+ * The next_prio argument indicates the priority of the next context at the
+ * time of the event. The exact numeric values may mean different things for
+ * different GPUs, but they should follow the rule that lower values indicate a
+ * higher priority.
+ *
+ * The next_job_id argument identifies the batch of work that the GPU will be
+ * working on. This should correspond to a job_id that was previously traced
+ * as a gpu_job_enqueue event when the batch of work was created.
+ */
+TRACE_EVENT(gpu_sched_switch,
+
+ TP_PROTO(const char *gpu_name, u64 timestamp,
+ u32 next_ctx_id, s32 next_prio, u32 next_job_id),
+
+ TP_ARGS(gpu_name, timestamp, next_ctx_id, next_prio, next_job_id),
+
+ TP_STRUCT__entry(
+ __string( gpu_name, gpu_name )
+ __field( u64, timestamp )
+ __field( u32, next_ctx_id )
+ __field( s32, next_prio )
+ __field( u32, next_job_id )
+ ),
+
+ TP_fast_assign(
+ __assign_str(gpu_name, gpu_name);
+ __entry->timestamp = timestamp;
+ __entry->next_ctx_id = next_ctx_id;
+ __entry->next_prio = next_prio;
+ __entry->next_job_id = next_job_id;
+ ),
+
+ TP_printk("gpu_name=%s ts=%llu.%06lu next_ctx_id=%lu next_prio=%ld "
+ "next_job_id=%lu",
+ __get_str(gpu_name),
+ (unsigned long long)show_secs_from_ns(__entry->timestamp),
+ (unsigned long)show_usecs_from_ns(__entry->timestamp),
+ (unsigned long)__entry->next_ctx_id,
+ (long)__entry->next_prio,
+ (unsigned long)__entry->next_job_id)
+);
+
+/*
+ * The gpu_job_enqueue event indicates that a batch of work has been queued up
+ * to be processed by the GPU. This event is not intended to indicate that
+ * the batch of work has been submitted to the GPU hardware, but rather that
+ * it has been submitted to the GPU kernel driver.
+ *
+ * This event should be traced on the thread that initiated the work being
+ * queued. For example, if a batch of work is submitted to the kernel by a
+ * userland thread, the event should be traced on that thread.
+ *
+ * The ctx_id field identifies the GPU context in which the batch of work
+ * being queued is to be run.
+ *
+ * The job_id field identifies the batch of work being queued within the given
+ * GPU context. The first batch of work submitted for a given GPU context
+ * should have a job_id of 0, and each subsequent batch of work should
+ * increment the job_id by 1.
+ *
+ * The type field identifies the type of the job being enqueued. The job
+ * types may be different for different GPU hardware. For example, a GPU may
+ * differentiate between "2D", "3D", and "compute" jobs.
+ */
+TRACE_EVENT(gpu_job_enqueue,
+
+ TP_PROTO(u32 ctx_id, u32 job_id, const char *type),
+
+ TP_ARGS(ctx_id, job_id, type),
+
+ TP_STRUCT__entry(
+ __field( u32, ctx_id )
+ __field( u32, job_id )
+ __string( type, type )
+ ),
+
+ TP_fast_assign(
+ __entry->ctx_id = ctx_id;
+ __entry->job_id = job_id;
+ __assign_str(type, type);
+ ),
+
+ TP_printk("ctx_id=%lu job_id=%lu type=%s",
+ (unsigned long)__entry->ctx_id,
+ (unsigned long)__entry->job_id,
+ __get_str(type))
+);
+
+#undef show_secs_from_ns
+#undef show_usecs_from_ns
+
+#endif /* _TRACE_GPU_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
diff --git a/include/trace/events/net.h b/include/trace/events/net.h
index 9c88673..f1a300c 100644
--- a/include/trace/events/net.h
+++ b/include/trace/events/net.h
@@ -58,7 +58,7 @@
__entry->gso_type = skb_shinfo(skb)->gso_type;
),
- TP_printk("dev=%s queue_mapping=%u skbaddr=%p vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x ip_summed=%d len=%u data_len=%u network_offset=%d transport_offset_valid=%d transport_offset=%d tx_flags=%d gso_size=%d gso_segs=%d gso_type=%#x",
+ TP_printk("dev=%s queue_mapping=%u skbaddr=%pK vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x ip_summed=%d len=%u data_len=%u network_offset=%d transport_offset_valid=%d transport_offset=%d tx_flags=%d gso_size=%d gso_segs=%d gso_type=%#x",
__get_str(name), __entry->queue_mapping, __entry->skbaddr,
__entry->vlan_tagged, __entry->vlan_proto, __entry->vlan_tci,
__entry->protocol, __entry->ip_summed, __entry->len,
@@ -91,7 +91,7 @@
__assign_str(name, dev->name);
),
- TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+ TP_printk("dev=%s skbaddr=%pK len=%u rc=%d",
__get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
);
@@ -113,7 +113,7 @@
__assign_str(name, skb->dev->name);
),
- TP_printk("dev=%s skbaddr=%p len=%u",
+ TP_printk("dev=%s skbaddr=%pK len=%u",
__get_str(name), __entry->skbaddr, __entry->len)
)
@@ -192,7 +192,7 @@
__entry->gso_type = skb_shinfo(skb)->gso_type;
),
- TP_printk("dev=%s napi_id=%#x queue_mapping=%u skbaddr=%p vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x ip_summed=%d hash=0x%08x l4_hash=%d len=%u data_len=%u truesize=%u mac_header_valid=%d mac_header=%d nr_frags=%d gso_size=%d gso_type=%#x",
+ TP_printk("dev=%s napi_id=%#x queue_mapping=%u skbaddr=%pK vlan_tagged=%d vlan_proto=0x%04x vlan_tci=0x%04x protocol=0x%04x ip_summed=%d hash=0x%08x l4_hash=%d len=%u data_len=%u truesize=%u mac_header_valid=%d mac_header=%d nr_frags=%d gso_size=%d gso_type=%#x",
__get_str(name), __entry->napi_id, __entry->queue_mapping,
__entry->skbaddr, __entry->vlan_tagged, __entry->vlan_proto,
__entry->vlan_tci, __entry->protocol, __entry->ip_summed,
diff --git a/include/trace/events/power.h b/include/trace/events/power.h
index 908977d..3eac1f9 100644
--- a/include/trace/events/power.h
+++ b/include/trace/events/power.h
@@ -148,6 +148,31 @@
TP_ARGS(frequency, cpu_id)
);
+TRACE_EVENT(cpu_frequency_limits,
+
+ TP_PROTO(unsigned int max_freq, unsigned int min_freq,
+ unsigned int cpu_id),
+
+ TP_ARGS(max_freq, min_freq, cpu_id),
+
+ TP_STRUCT__entry(
+ __field( u32, min_freq )
+ __field( u32, max_freq )
+ __field( u32, cpu_id )
+ ),
+
+ TP_fast_assign(
+ __entry->min_freq = min_freq;
+ __entry->max_freq = max_freq;
+ __entry->cpu_id = cpu_id;
+ ),
+
+ TP_printk("min=%lu max=%lu cpu_id=%lu",
+ (unsigned long)__entry->min_freq,
+ (unsigned long)__entry->max_freq,
+ (unsigned long)__entry->cpu_id)
+);
+
TRACE_EVENT(device_pm_callback_start,
TP_PROTO(struct device *dev, const char *pm_ops, int event),
@@ -301,6 +326,25 @@
TP_ARGS(name, state, cpu_id)
);
+TRACE_EVENT(clock_set_parent,
+
+ TP_PROTO(const char *name, const char *parent_name),
+
+ TP_ARGS(name, parent_name),
+
+ TP_STRUCT__entry(
+ __string( name, name )
+ __string( parent_name, parent_name )
+ ),
+
+ TP_fast_assign(
+ __assign_str(name, name);
+ __assign_str(parent_name, parent_name);
+ ),
+
+ TP_printk("%s parent=%s", __get_str(name), __get_str(parent_name))
+);
+
/*
* The power domain events are used for power domains transitions
*/
diff --git a/include/trace/events/preemptirq.h b/include/trace/events/preemptirq.h
new file mode 100644
index 0000000..9c4eb33
--- /dev/null
+++ b/include/trace/events/preemptirq.h
@@ -0,0 +1,73 @@
+#ifdef CONFIG_PREEMPTIRQ_EVENTS
+
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM preemptirq
+
+#if !defined(_TRACE_PREEMPTIRQ_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PREEMPTIRQ_H
+
+#include <linux/ktime.h>
+#include <linux/tracepoint.h>
+#include <linux/string.h>
+#include <asm/sections.h>
+
+DECLARE_EVENT_CLASS(preemptirq_template,
+
+ TP_PROTO(unsigned long ip, unsigned long parent_ip),
+
+ TP_ARGS(ip, parent_ip),
+
+ TP_STRUCT__entry(
+ __field(u32, caller_offs)
+ __field(u32, parent_offs)
+ ),
+
+ TP_fast_assign(
+ __entry->caller_offs = (u32)(ip - (unsigned long)_stext);
+ __entry->parent_offs = (u32)(parent_ip - (unsigned long)_stext);
+ ),
+
+ TP_printk("caller=%pF parent=%pF",
+ (void *)((unsigned long)(_stext) + __entry->caller_offs),
+ (void *)((unsigned long)(_stext) + __entry->parent_offs))
+);
+
+#ifndef CONFIG_PROVE_LOCKING
+DEFINE_EVENT(preemptirq_template, irq_disable,
+ TP_PROTO(unsigned long ip, unsigned long parent_ip),
+ TP_ARGS(ip, parent_ip));
+
+DEFINE_EVENT(preemptirq_template, irq_enable,
+ TP_PROTO(unsigned long ip, unsigned long parent_ip),
+ TP_ARGS(ip, parent_ip));
+#endif
+
+#ifdef CONFIG_DEBUG_PREEMPT
+DEFINE_EVENT(preemptirq_template, preempt_disable,
+ TP_PROTO(unsigned long ip, unsigned long parent_ip),
+ TP_ARGS(ip, parent_ip));
+
+DEFINE_EVENT(preemptirq_template, preempt_enable,
+ TP_PROTO(unsigned long ip, unsigned long parent_ip),
+ TP_ARGS(ip, parent_ip));
+#endif
+
+#endif /* _TRACE_PREEMPTIRQ_H */
+
+#include <trace/define_trace.h>
+
+#endif /* !CONFIG_PREEMPTIRQ_EVENTS */
+
+#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || defined(CONFIG_PROVE_LOCKING)
+#define trace_irq_enable(...)
+#define trace_irq_disable(...)
+#define trace_irq_enable_rcuidle(...)
+#define trace_irq_disable_rcuidle(...)
+#endif
+
+#if !defined(CONFIG_PREEMPTIRQ_EVENTS) || !defined(CONFIG_DEBUG_PREEMPT)
+#define trace_preempt_enable(...)
+#define trace_preempt_disable(...)
+#define trace_preempt_enable_rcuidle(...)
+#define trace_preempt_disable_rcuidle(...)
+#endif
diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h
index 0812cd5..76c6ae3 100644
--- a/include/trace/events/sched.h
+++ b/include/trace/events/sched.h
@@ -226,7 +226,7 @@
DEFINE_EVENT(sched_process_template, sched_process_free,
TP_PROTO(struct task_struct *p),
TP_ARGS(p));
-
+
/*
* Tracepoint for a task exiting:
@@ -381,6 +381,30 @@
TP_ARGS(tsk, delay));
/*
+ * Tracepoint for recording the cause of uninterruptible sleep.
+ */
+TRACE_EVENT(sched_blocked_reason,
+
+ TP_PROTO(struct task_struct *tsk),
+
+ TP_ARGS(tsk),
+
+ TP_STRUCT__entry(
+ __field( pid_t, pid )
+ __field( void*, caller )
+ __field( bool, io_wait )
+ ),
+
+ TP_fast_assign(
+ __entry->pid = tsk->pid;
+ __entry->caller = (void*)get_wchan(tsk);
+ __entry->io_wait = tsk->in_iowait;
+ ),
+
+ TP_printk("pid=%d iowait=%d caller=%pS", __entry->pid, __entry->io_wait, __entry->caller)
+);
+
+/*
* Tracepoint for accounting runtime (time the task is executing
* on a CPU).
*/
@@ -572,6 +596,620 @@
TP_printk("cpu=%d", __entry->cpu)
);
+
+#ifdef CONFIG_SMP
+#ifdef CREATE_TRACE_POINTS
+static inline
+int __trace_sched_cpu(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+ struct rq *rq = cfs_rq ? cfs_rq->rq : NULL;
+#else
+ struct rq *rq = cfs_rq ? container_of(cfs_rq, struct rq, cfs) : NULL;
+#endif
+ return rq ? cpu_of(rq)
+ : task_cpu((container_of(se, struct task_struct, se)));
+}
+
+static inline
+int __trace_sched_path(struct cfs_rq *cfs_rq, char *path, int len)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+ int l = path ? len : 0;
+
+ if (cfs_rq && task_group_is_autogroup(cfs_rq->tg))
+ return autogroup_path(cfs_rq->tg, path, l) + 1;
+ else if (cfs_rq && cfs_rq->tg->css.cgroup)
+ return cgroup_path(cfs_rq->tg->css.cgroup, path, l) + 1;
+#endif
+ if (path)
+ strcpy(path, "(null)");
+
+ return strlen("(null)");
+}
+
+static inline
+struct cfs_rq *__trace_sched_group_cfs_rq(struct sched_entity *se)
+{
+#ifdef CONFIG_FAIR_GROUP_SCHED
+ return se->my_q;
+#else
+ return NULL;
+#endif
+}
+#endif /* CREATE_TRACE_POINTS */
+
+#ifdef CONFIG_SCHED_WALT
+extern unsigned int sysctl_sched_use_walt_cpu_util;
+extern unsigned int sysctl_sched_use_walt_task_util;
+extern unsigned int walt_ravg_window;
+extern bool walt_disabled;
+
+#define walt_util(util_var, demand_sum) {\
+ u64 sum = demand_sum << SCHED_CAPACITY_SHIFT;\
+ do_div(sum, walt_ravg_window);\
+ util_var = (typeof(util_var))sum;\
+ }
+#endif
+
+/*
+ * Tracepoint for cfs_rq load tracking:
+ */
+TRACE_EVENT(sched_load_cfs_rq,
+
+ TP_PROTO(struct cfs_rq *cfs_rq),
+
+ TP_ARGS(cfs_rq),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __dynamic_array(char, path,
+ __trace_sched_path(cfs_rq, NULL, 0) )
+ __field( unsigned long, load )
+ __field( unsigned long, util )
+ __field( unsigned long, util_pelt )
+ __field( unsigned long, util_walt )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = __trace_sched_cpu(cfs_rq, NULL);
+ __trace_sched_path(cfs_rq, __get_dynamic_array(path),
+ __get_dynamic_array_len(path));
+ __entry->load = cfs_rq->runnable_load_avg;
+ __entry->util = cfs_rq->avg.util_avg;
+ __entry->util_pelt = cfs_rq->avg.util_avg;
+ __entry->util_walt = 0;
+#ifdef CONFIG_SCHED_WALT
+ if (&cfs_rq->rq->cfs == cfs_rq) {
+ walt_util(__entry->util_walt,
+ cfs_rq->rq->prev_runnable_sum);
+ if (!walt_disabled && sysctl_sched_use_walt_cpu_util)
+ __entry->util = __entry->util_walt;
+ }
+#endif
+ ),
+
+ TP_printk("cpu=%d path=%s load=%lu util=%lu util_pelt=%lu util_walt=%lu",
+ __entry->cpu, __get_str(path), __entry->load, __entry->util,
+ __entry->util_pelt, __entry->util_walt)
+);
+
+/*
+ * Tracepoint for rt_rq load tracking:
+ */
+struct rt_rq;
+
+TRACE_EVENT(sched_load_rt_rq,
+
+ TP_PROTO(int cpu, struct rt_rq *rt_rq),
+
+ TP_ARGS(cpu, rt_rq),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __field( unsigned long, util )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ __entry->util = rt_rq->avg.util_avg;
+ ),
+
+ TP_printk("cpu=%d util=%lu", __entry->cpu,
+ __entry->util)
+);
+
+/*
+ * Tracepoint for sched_entity load tracking:
+ */
+TRACE_EVENT(sched_load_se,
+
+ TP_PROTO(struct sched_entity *se),
+
+ TP_ARGS(se),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __dynamic_array(char, path,
+ __trace_sched_path(__trace_sched_group_cfs_rq(se), NULL, 0) )
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( unsigned long, load )
+ __field( unsigned long, util )
+ __field( unsigned long, util_pelt )
+ __field( unsigned long, util_walt )
+ ),
+
+ TP_fast_assign(
+ struct cfs_rq *gcfs_rq = __trace_sched_group_cfs_rq(se);
+ struct task_struct *p = gcfs_rq ? NULL
+ : container_of(se, struct task_struct, se);
+
+ __entry->cpu = __trace_sched_cpu(gcfs_rq, se);
+ __trace_sched_path(gcfs_rq, __get_dynamic_array(path),
+ __get_dynamic_array_len(path));
+ memcpy(__entry->comm, p ? p->comm : "(null)", TASK_COMM_LEN);
+ __entry->pid = p ? p->pid : -1;
+ __entry->load = se->avg.load_avg;
+ __entry->util = se->avg.util_avg;
+ __entry->util_pelt = __entry->util;
+ __entry->util_walt = 0;
+#ifdef CONFIG_SCHED_WALT
+ if (!se->my_q) {
+ struct task_struct *p = container_of(se, struct task_struct, se);
+ walt_util(__entry->util_walt, p->ravg.demand);
+ if (!walt_disabled && sysctl_sched_use_walt_task_util)
+ __entry->util = __entry->util_walt;
+ }
+#endif
+ ),
+
+ TP_printk("cpu=%d path=%s comm=%s pid=%d load=%lu util=%lu util_pelt=%lu util_walt=%lu",
+ __entry->cpu, __get_str(path), __entry->comm,
+ __entry->pid, __entry->load, __entry->util,
+ __entry->util_pelt, __entry->util_walt)
+);
+
+/*
+ * Tracepoint for task_group load tracking:
+ */
+#ifdef CONFIG_FAIR_GROUP_SCHED
+TRACE_EVENT(sched_load_tg,
+
+ TP_PROTO(struct cfs_rq *cfs_rq),
+
+ TP_ARGS(cfs_rq),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __dynamic_array(char, path,
+ __trace_sched_path(cfs_rq, NULL, 0) )
+ __field( long, load )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cfs_rq->rq->cpu;
+ __trace_sched_path(cfs_rq, __get_dynamic_array(path),
+ __get_dynamic_array_len(path));
+ __entry->load = atomic_long_read(&cfs_rq->tg->load_avg);
+ ),
+
+ TP_printk("cpu=%d path=%s load=%ld", __entry->cpu, __get_str(path),
+ __entry->load)
+);
+#endif /* CONFIG_FAIR_GROUP_SCHED */
+
+/*
+ * Tracepoint for accounting CPU boosted utilization
+ */
+TRACE_EVENT(sched_boost_cpu,
+
+ TP_PROTO(int cpu, unsigned long util, long margin),
+
+ TP_ARGS(cpu, util, margin),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __field( unsigned long, util )
+ __field(long, margin )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ __entry->util = util;
+ __entry->margin = margin;
+ ),
+
+ TP_printk("cpu=%d util=%lu margin=%ld",
+ __entry->cpu,
+ __entry->util,
+ __entry->margin)
+);
+
+/*
+ * Tracepoint for schedtune_tasks_update
+ */
+TRACE_EVENT(sched_tune_tasks_update,
+
+ TP_PROTO(struct task_struct *tsk, int cpu, int tasks, int idx,
+ int boost, int max_boost, u64 group_ts),
+
+ TP_ARGS(tsk, cpu, tasks, idx, boost, max_boost, group_ts),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( int, cpu )
+ __field( int, tasks )
+ __field( int, idx )
+ __field( int, boost )
+ __field( int, max_boost )
+ __field( u64, group_ts )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ __entry->cpu = cpu;
+ __entry->tasks = tasks;
+ __entry->idx = idx;
+ __entry->boost = boost;
+ __entry->max_boost = max_boost;
+ __entry->group_ts = group_ts;
+ ),
+
+ TP_printk("pid=%d comm=%s "
+ "cpu=%d tasks=%d idx=%d boost=%d max_boost=%d timeout=%llu",
+ __entry->pid, __entry->comm,
+ __entry->cpu, __entry->tasks, __entry->idx,
+ __entry->boost, __entry->max_boost,
+ __entry->group_ts)
+);
+
+/*
+ * Tracepoint for schedtune_boostgroup_update
+ */
+TRACE_EVENT(sched_tune_boostgroup_update,
+
+ TP_PROTO(int cpu, int variation, int max_boost),
+
+ TP_ARGS(cpu, variation, max_boost),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __field( int, variation )
+ __field( int, max_boost )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ __entry->variation = variation;
+ __entry->max_boost = max_boost;
+ ),
+
+ TP_printk("cpu=%d variation=%d max_boost=%d",
+ __entry->cpu, __entry->variation, __entry->max_boost)
+);
+
+/*
+ * Tracepoint for accounting task boosted utilization
+ */
+TRACE_EVENT(sched_boost_task,
+
+ TP_PROTO(struct task_struct *tsk, unsigned long util, long margin),
+
+ TP_ARGS(tsk, util, margin),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( unsigned long, util )
+ __field( long, margin )
+
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ __entry->util = util;
+ __entry->margin = margin;
+ ),
+
+ TP_printk("comm=%s pid=%d util=%lu margin=%ld",
+ __entry->comm, __entry->pid,
+ __entry->util,
+ __entry->margin)
+);
+
+/*
+ * Tracepoint for system overutilized flag
+ */
+struct sched_domain;
+TRACE_EVENT_CONDITION(sched_overutilized,
+
+ TP_PROTO(struct sched_domain *sd, bool was_overutilized, bool overutilized),
+
+ TP_ARGS(sd, was_overutilized, overutilized),
+
+ TP_CONDITION(overutilized != was_overutilized),
+
+ TP_STRUCT__entry(
+ __field( bool, overutilized )
+ __array( char, cpulist , 32 )
+ ),
+
+ TP_fast_assign(
+ __entry->overutilized = overutilized;
+ scnprintf(__entry->cpulist, sizeof(__entry->cpulist), "%*pbl", cpumask_pr_args(sched_domain_span(sd)));
+ ),
+
+ TP_printk("overutilized=%d sd_span=%s",
+ __entry->overutilized ? 1 : 0, __entry->cpulist)
+);
+
+/*
+ * Tracepoint for find_best_target
+ */
+TRACE_EVENT(sched_find_best_target,
+
+ TP_PROTO(struct task_struct *tsk, bool prefer_idle,
+ unsigned long min_util, int start_cpu,
+ int best_idle, int best_active, int target),
+
+ TP_ARGS(tsk, prefer_idle, min_util, start_cpu,
+ best_idle, best_active, target),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( unsigned long, min_util )
+ __field( bool, prefer_idle )
+ __field( int, start_cpu )
+ __field( int, best_idle )
+ __field( int, best_active )
+ __field( int, target )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ __entry->min_util = min_util;
+ __entry->prefer_idle = prefer_idle;
+ __entry->start_cpu = start_cpu;
+ __entry->best_idle = best_idle;
+ __entry->best_active = best_active;
+ __entry->target = target;
+ ),
+
+ TP_printk("pid=%d comm=%s prefer_idle=%d start_cpu=%d "
+ "best_idle=%d best_active=%d target=%d",
+ __entry->pid, __entry->comm,
+ __entry->prefer_idle, __entry->start_cpu,
+ __entry->best_idle, __entry->best_active,
+ __entry->target)
+);
+
+/*
+ * Tracepoint for tasks' estimated utilization.
+ */
+TRACE_EVENT(sched_util_est_task,
+
+ TP_PROTO(struct task_struct *tsk, struct sched_avg *avg),
+
+ TP_ARGS(tsk, avg),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( int, cpu )
+ __field( unsigned int, util_avg )
+ __field( unsigned int, est_enqueued )
+ __field( unsigned int, est_ewma )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ __entry->cpu = task_cpu(tsk);
+ __entry->util_avg = avg->util_avg;
+ __entry->est_enqueued = avg->util_est.enqueued;
+ __entry->est_ewma = avg->util_est.ewma;
+ ),
+
+ TP_printk("comm=%s pid=%d cpu=%d util_avg=%u util_est_ewma=%u util_est_enqueued=%u",
+ __entry->comm,
+ __entry->pid,
+ __entry->cpu,
+ __entry->util_avg,
+ __entry->est_ewma,
+ __entry->est_enqueued)
+);
+
+/*
+ * Tracepoint for root cfs_rq's estimated utilization.
+ */
+TRACE_EVENT(sched_util_est_cpu,
+
+ TP_PROTO(int cpu, struct cfs_rq *cfs_rq),
+
+ TP_ARGS(cpu, cfs_rq),
+
+ TP_STRUCT__entry(
+ __field( int, cpu )
+ __field( unsigned int, util_avg )
+ __field( unsigned int, util_est_enqueued )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu;
+ __entry->util_avg = cfs_rq->avg.util_avg;
+ __entry->util_est_enqueued = cfs_rq->avg.util_est.enqueued;
+ ),
+
+ TP_printk("cpu=%d util_avg=%u util_est_enqueued=%u",
+ __entry->cpu,
+ __entry->util_avg,
+ __entry->util_est_enqueued)
+);
+
+#ifdef CONFIG_SCHED_WALT
+struct rq;
+
+TRACE_EVENT(walt_update_task_ravg,
+
+ TP_PROTO(struct task_struct *p, struct rq *rq, int evt,
+ u64 wallclock, u64 irqtime),
+
+ TP_ARGS(p, rq, evt, wallclock, irqtime),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( pid_t, cur_pid )
+ __field( u64, wallclock )
+ __field( u64, mark_start )
+ __field( u64, delta_m )
+ __field( u64, win_start )
+ __field( u64, delta )
+ __field( u64, irqtime )
+ __array( char, evt, 16 )
+ __field(unsigned int, demand )
+ __field(unsigned int, sum )
+ __field( int, cpu )
+ __field( u64, cs )
+ __field( u64, ps )
+ __field( u32, curr_window )
+ __field( u32, prev_window )
+ __field( u64, nt_cs )
+ __field( u64, nt_ps )
+ __field( u32, active_windows )
+ ),
+
+ TP_fast_assign(
+ static const char* walt_event_names[] =
+ {
+ "PUT_PREV_TASK",
+ "PICK_NEXT_TASK",
+ "TASK_WAKE",
+ "TASK_MIGRATE",
+ "TASK_UPDATE",
+ "IRQ_UPDATE"
+ };
+ __entry->wallclock = wallclock;
+ __entry->win_start = rq->window_start;
+ __entry->delta = (wallclock - rq->window_start);
+ strcpy(__entry->evt, walt_event_names[evt]);
+ __entry->cpu = rq->cpu;
+ __entry->cur_pid = rq->curr->pid;
+ memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+ __entry->pid = p->pid;
+ __entry->mark_start = p->ravg.mark_start;
+ __entry->delta_m = (wallclock - p->ravg.mark_start);
+ __entry->demand = p->ravg.demand;
+ __entry->sum = p->ravg.sum;
+ __entry->irqtime = irqtime;
+ __entry->cs = rq->curr_runnable_sum;
+ __entry->ps = rq->prev_runnable_sum;
+ __entry->curr_window = p->ravg.curr_window;
+ __entry->prev_window = p->ravg.prev_window;
+ __entry->nt_cs = rq->nt_curr_runnable_sum;
+ __entry->nt_ps = rq->nt_prev_runnable_sum;
+ __entry->active_windows = p->ravg.active_windows;
+ ),
+
+ TP_printk("wallclock=%llu window_start=%llu delta=%llu event=%s cpu=%d cur_pid=%d pid=%d comm=%s"
+ " mark_start=%llu delta=%llu demand=%u sum=%u irqtime=%llu"
+ " curr_runnable_sum=%llu prev_runnable_sum=%llu cur_window=%u"
+ " prev_window=%u nt_curr_runnable_sum=%llu nt_prev_runnable_sum=%llu active_windows=%u",
+ __entry->wallclock, __entry->win_start, __entry->delta,
+ __entry->evt, __entry->cpu, __entry->cur_pid,
+ __entry->pid, __entry->comm, __entry->mark_start,
+ __entry->delta_m, __entry->demand,
+ __entry->sum, __entry->irqtime,
+ __entry->cs, __entry->ps,
+ __entry->curr_window, __entry->prev_window,
+ __entry->nt_cs, __entry->nt_ps,
+ __entry->active_windows
+ )
+);
+
+TRACE_EVENT(walt_update_history,
+
+ TP_PROTO(struct rq *rq, struct task_struct *p, u32 runtime, int samples,
+ int evt),
+
+ TP_ARGS(rq, p, runtime, samples, evt),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field(unsigned int, runtime )
+ __field( int, samples )
+ __field( int, evt )
+ __field( u64, demand )
+ __field(unsigned int, walt_avg )
+ __field(unsigned int, pelt_avg )
+ __array( u32, hist, RAVG_HIST_SIZE_MAX)
+ __field( int, cpu )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
+ __entry->pid = p->pid;
+ __entry->runtime = runtime;
+ __entry->samples = samples;
+ __entry->evt = evt;
+ __entry->demand = p->ravg.demand;
+ walt_util(__entry->walt_avg,__entry->demand);
+ __entry->pelt_avg = p->se.avg.util_avg;
+ memcpy(__entry->hist, p->ravg.sum_history,
+ RAVG_HIST_SIZE_MAX * sizeof(u32));
+ __entry->cpu = rq->cpu;
+ ),
+
+ TP_printk("pid=%d comm=%s runtime=%u samples=%d event=%d demand=%llu ravg_window=%u"
+ " walt=%u pelt=%u hist0=%u hist1=%u hist2=%u hist3=%u hist4=%u cpu=%d",
+ __entry->pid, __entry->comm,
+ __entry->runtime, __entry->samples, __entry->evt,
+ __entry->demand,
+ walt_ravg_window,
+ __entry->walt_avg,
+ __entry->pelt_avg,
+ __entry->hist[0], __entry->hist[1],
+ __entry->hist[2], __entry->hist[3],
+ __entry->hist[4], __entry->cpu)
+);
+
+TRACE_EVENT(walt_migration_update_sum,
+
+ TP_PROTO(struct rq *rq, struct task_struct *p),
+
+ TP_ARGS(rq, p),
+
+ TP_STRUCT__entry(
+ __field(int, cpu )
+ __field(int, pid )
+ __field( u64, cs )
+ __field( u64, ps )
+ __field( s64, nt_cs )
+ __field( s64, nt_ps )
+ ),
+
+ TP_fast_assign(
+ __entry->cpu = cpu_of(rq);
+ __entry->cs = rq->curr_runnable_sum;
+ __entry->ps = rq->prev_runnable_sum;
+ __entry->nt_cs = (s64)rq->nt_curr_runnable_sum;
+ __entry->nt_ps = (s64)rq->nt_prev_runnable_sum;
+ __entry->pid = p->pid;
+ ),
+
+ TP_printk("cpu=%d curr_runnable_sum=%llu prev_runnable_sum=%llu nt_curr_runnable_sum=%lld nt_prev_runnable_sum=%lld pid=%d",
+ __entry->cpu, __entry->cs, __entry->ps,
+ __entry->nt_cs, __entry->nt_ps, __entry->pid)
+);
+#endif /* CONFIG_SCHED_WALT */
+#endif /* CONFIG_SMP */
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
diff --git a/include/uapi/linux/android/binder.h b/include/uapi/linux/android/binder.h
index bfaec69..b4723e3 100644
--- a/include/uapi/linux/android/binder.h
+++ b/include/uapi/linux/android/binder.h
@@ -38,9 +38,56 @@
BINDER_TYPE_PTR = B_PACK_CHARS('p', 't', '*', B_TYPE_LARGE),
};
-enum {
+/**
+ * enum flat_binder_object_shifts: shift values for flat_binder_object_flags
+ * @FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT: shift for getting scheduler policy.
+ *
+ */
+enum flat_binder_object_shifts {
+ FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT = 9,
+};
+
+/**
+ * enum flat_binder_object_flags - flags for use in flat_binder_object.flags
+ */
+enum flat_binder_object_flags {
+ /**
+ * @FLAT_BINDER_FLAG_PRIORITY_MASK: bit-mask for min scheduler priority
+ *
+ * These bits can be used to set the minimum scheduler priority
+ * at which transactions into this node should run. Valid values
+ * in these bits depend on the scheduler policy encoded in
+ * @FLAT_BINDER_FLAG_SCHED_POLICY_MASK.
+ *
+ * For SCHED_NORMAL/SCHED_BATCH, the valid range is between [-20..19]
+ * For SCHED_FIFO/SCHED_RR, the value can run between [1..99]
+ */
FLAT_BINDER_FLAG_PRIORITY_MASK = 0xff,
+ /**
+ * @FLAT_BINDER_FLAG_ACCEPTS_FDS: whether the node accepts fds.
+ */
FLAT_BINDER_FLAG_ACCEPTS_FDS = 0x100,
+ /**
+ * @FLAT_BINDER_FLAG_SCHED_POLICY_MASK: bit-mask for scheduling policy
+ *
+ * These two bits can be used to set the min scheduling policy at which
+ * transactions on this node should run. These match the UAPI
+ * scheduler policy values, eg:
+ * 00b: SCHED_NORMAL
+ * 01b: SCHED_FIFO
+ * 10b: SCHED_RR
+ * 11b: SCHED_BATCH
+ */
+ FLAT_BINDER_FLAG_SCHED_POLICY_MASK =
+ 3U << FLAT_BINDER_FLAG_SCHED_POLICY_SHIFT,
+
+ /**
+ * @FLAT_BINDER_FLAG_INHERIT_RT: whether the node inherits RT policy
+ *
+ * Only when set, calls into this node will inherit a real-time
+ * scheduling policy from the caller (for synchronous transactions).
+ */
+ FLAT_BINDER_FLAG_INHERIT_RT = 0x800,
};
#ifdef BINDER_IPC_32BIT
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 30f2ce7..a88b2c4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -176,6 +176,10 @@
/* Specify numa node during map creation */
#define BPF_F_NUMA_NODE (1U << 2)
+/* Flags for accessing BPF object */
+#define BPF_F_RDONLY (1U << 3)
+#define BPF_F_WRONLY (1U << 4)
+
union bpf_attr {
struct { /* anonymous struct used by BPF_MAP_CREATE command */
__u32 map_type; /* one of enum bpf_map_type */
@@ -216,6 +220,7 @@
struct { /* anonymous struct used by BPF_OBJ_* commands */
__aligned_u64 pathname;
__u32 bpf_fd;
+ __u32 file_flags;
};
struct { /* anonymous struct used by BPF_PROG_ATTACH/DETACH commands */
@@ -243,6 +248,7 @@
__u32 map_id;
};
__u32 next_id;
+ __u32 open_flags;
};
struct { /* anonymous struct used by BPF_OBJ_GET_INFO_BY_FD */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 4199f8a..971e82a 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -275,6 +275,8 @@
#define FS_ENCRYPTION_MODE_AES_256_CTS 4
#define FS_ENCRYPTION_MODE_AES_128_CBC 5
#define FS_ENCRYPTION_MODE_AES_128_CTS 6
+#define FS_ENCRYPTION_MODE_SPECK128_256_XTS 7
+#define FS_ENCRYPTION_MODE_SPECK128_256_CTS 8
struct fscrypt_policy {
__u8 version;
diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
index 4b5001c..d5ac0eb 100644
--- a/include/uapi/linux/fuse.h
+++ b/include/uapi/linux/fuse.h
@@ -376,6 +376,7 @@
FUSE_READDIRPLUS = 44,
FUSE_RENAME2 = 45,
FUSE_LSEEK = 46,
+ FUSE_CANONICAL_PATH= 2016,
/* CUSE specific operations */
CUSE_INIT = 4096,
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1f00f0c..b9a7c97 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -451,6 +451,16 @@
#define IFLA_MACSEC_MAX (__IFLA_MACSEC_MAX - 1)
+/* XFRM section */
+enum {
+ IFLA_XFRM_UNSPEC,
+ IFLA_XFRM_LINK,
+ IFLA_XFRM_IF_ID,
+ __IFLA_XFRM_MAX
+};
+
+#define IFLA_XFRM_MAX (__IFLA_XFRM_MAX - 1)
+
enum macsec_validation_type {
MACSEC_VALIDATE_DISABLED = 0,
MACSEC_VALIDATE_CHECK = 1,
diff --git a/include/uapi/linux/ipv6.h b/include/uapi/linux/ipv6.h
index b22a9c4..8c3ea81 100644
--- a/include/uapi/linux/ipv6.h
+++ b/include/uapi/linux/ipv6.h
@@ -166,6 +166,7 @@
DEVCONF_ACCEPT_DAD,
DEVCONF_FORCE_TLLAO,
DEVCONF_NDISC_NOTIFY,
+ DEVCONF_ACCEPT_RA_RT_TABLE,
DEVCONF_MLDV1_UNSOLICITED_REPORT_INTERVAL,
DEVCONF_MLDV2_UNSOLICITED_REPORT_INTERVAL,
DEVCONF_SUPPRESS_FRAG_NDISC,
diff --git a/include/uapi/linux/keychord.h b/include/uapi/linux/keychord.h
new file mode 100644
index 0000000..ea7cf4d
--- /dev/null
+++ b/include/uapi/linux/keychord.h
@@ -0,0 +1,52 @@
+/*
+ * Key chord input driver
+ *
+ * Copyright (C) 2008 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+*/
+
+#ifndef _UAPI_LINUX_KEYCHORD_H_
+#define _UAPI_LINUX_KEYCHORD_H_
+
+#include <linux/input.h>
+
+#define KEYCHORD_VERSION 1
+
+/*
+ * One or more input_keychord structs are written to /dev/keychord
+ * at once to specify the list of keychords to monitor.
+ * Reading /dev/keychord returns the id of a keychord when the
+ * keychord combination is pressed. A keychord is signalled when
+ * all of the keys in the keycode list are in the pressed state.
+ * The order in which the keys are pressed does not matter.
+ * The keychord will not be signalled if keys not in the keycode
+ * list are pressed.
+ * Keychords will not be signalled on key release events.
+ */
+struct input_keychord {
+ /* should be KEYCHORD_VERSION */
+ __u16 version;
+ /*
+ * client specified ID, returned from read()
+ * when this keychord is pressed.
+ */
+ __u16 id;
+
+ /* number of keycodes in this keychord */
+ __u16 count;
+
+ /* variable length array of keycodes */
+ __u16 keycodes[];
+};
+
+#endif /* _UAPI_LINUX_KEYCHORD_H_ */
diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h
index aa50113..5bdb07f 100644
--- a/include/uapi/linux/magic.h
+++ b/include/uapi/linux/magic.h
@@ -55,6 +55,8 @@
#define REISER2FS_SUPER_MAGIC_STRING "ReIsEr2Fs"
#define REISER2FS_JR_SUPER_MAGIC_STRING "ReIsEr3Fs"
+#define SDCARDFS_SUPER_MAGIC 0x5dca2df5
+
#define SMB_SUPER_MAGIC 0x517B
#define CGROUP_SUPER_MAGIC 0x27e0eb
#define CGROUP2_SUPER_MAGIC 0x63677270
diff --git a/include/uapi/linux/netfilter/xt_IDLETIMER.h b/include/uapi/linux/netfilter/xt_IDLETIMER.h
index 3c586a1..c82a1c1 100644
--- a/include/uapi/linux/netfilter/xt_IDLETIMER.h
+++ b/include/uapi/linux/netfilter/xt_IDLETIMER.h
@@ -5,6 +5,7 @@
* Header file for Xtables timer target module.
*
* Copyright (C) 2004, 2010 Nokia Corporation
+ *
* Written by Timo Teras <ext-timo.teras@nokia.com>
*
* Converted to x_tables and forward-ported to 2.6.34
@@ -33,12 +34,19 @@
#include <linux/types.h>
#define MAX_IDLETIMER_LABEL_SIZE 28
+#define NLMSG_MAX_SIZE 64
+
+#define NL_EVENT_TYPE_INACTIVE 0
+#define NL_EVENT_TYPE_ACTIVE 1
struct idletimer_tg_info {
__u32 timeout;
char label[MAX_IDLETIMER_LABEL_SIZE];
+ /* Use netlink messages for notification in addition to sysfs */
+ __u8 send_nl_msg;
+
/* for kernel module internal use only */
struct idletimer_tg *timer __attribute__((aligned(8)));
};
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3027f94..a4c0c8e 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -210,4 +210,7 @@
# define PR_SPEC_DISABLE (1UL << 2)
# define PR_SPEC_FORCE_DISABLE (1UL << 3)
+#define PR_SET_VMA 0x53564d41
+# define PR_SET_VMA_ANON_NAME 0
+
#endif /* _LINUX_PRCTL_H */
diff --git a/include/uapi/linux/usb/f_accessory.h b/include/uapi/linux/usb/f_accessory.h
new file mode 100644
index 0000000..0baeb7d
--- /dev/null
+++ b/include/uapi/linux/usb/f_accessory.h
@@ -0,0 +1,146 @@
+/*
+ * Gadget Function Driver for Android USB accessories
+ *
+ * Copyright (C) 2011 Google, Inc.
+ * Author: Mike Lockwood <lockwood@android.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef _UAPI_LINUX_USB_F_ACCESSORY_H
+#define _UAPI_LINUX_USB_F_ACCESSORY_H
+
+/* Use Google Vendor ID when in accessory mode */
+#define USB_ACCESSORY_VENDOR_ID 0x18D1
+
+
+/* Product ID to use when in accessory mode */
+#define USB_ACCESSORY_PRODUCT_ID 0x2D00
+
+/* Product ID to use when in accessory mode and adb is enabled */
+#define USB_ACCESSORY_ADB_PRODUCT_ID 0x2D01
+
+/* Indexes for strings sent by the host via ACCESSORY_SEND_STRING */
+#define ACCESSORY_STRING_MANUFACTURER 0
+#define ACCESSORY_STRING_MODEL 1
+#define ACCESSORY_STRING_DESCRIPTION 2
+#define ACCESSORY_STRING_VERSION 3
+#define ACCESSORY_STRING_URI 4
+#define ACCESSORY_STRING_SERIAL 5
+
+/* Control request for retrieving device's protocol version
+ *
+ * requestType: USB_DIR_IN | USB_TYPE_VENDOR
+ * request: ACCESSORY_GET_PROTOCOL
+ * value: 0
+ * index: 0
+ * data version number (16 bits little endian)
+ * 1 for original accessory support
+ * 2 adds HID and device to host audio support
+ */
+#define ACCESSORY_GET_PROTOCOL 51
+
+/* Control request for host to send a string to the device
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_SEND_STRING
+ * value: 0
+ * index: string ID
+ * data zero terminated UTF8 string
+ *
+ * The device can later retrieve these strings via the
+ * ACCESSORY_GET_STRING_* ioctls
+ */
+#define ACCESSORY_SEND_STRING 52
+
+/* Control request for starting device in accessory mode.
+ * The host sends this after setting all its strings to the device.
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_START
+ * value: 0
+ * index: 0
+ * data none
+ */
+#define ACCESSORY_START 53
+
+/* Control request for registering a HID device.
+ * Upon registering, a unique ID is sent by the accessory in the
+ * value parameter. This ID will be used for future commands for
+ * the device
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_REGISTER_HID_DEVICE
+ * value: Accessory assigned ID for the HID device
+ * index: total length of the HID report descriptor
+ * data none
+ */
+#define ACCESSORY_REGISTER_HID 54
+
+/* Control request for unregistering a HID device.
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_REGISTER_HID
+ * value: Accessory assigned ID for the HID device
+ * index: 0
+ * data none
+ */
+#define ACCESSORY_UNREGISTER_HID 55
+
+/* Control request for sending the HID report descriptor.
+ * If the HID descriptor is longer than the endpoint zero max packet size,
+ * the descriptor will be sent in multiple ACCESSORY_SET_HID_REPORT_DESC
+ * commands. The data for the descriptor must be sent sequentially
+ * if multiple packets are needed.
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_SET_HID_REPORT_DESC
+ * value: Accessory assigned ID for the HID device
+ * index: offset of data in descriptor
+ * (needed when HID descriptor is too big for one packet)
+ * data the HID report descriptor
+ */
+#define ACCESSORY_SET_HID_REPORT_DESC 56
+
+/* Control request for sending HID events.
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_SEND_HID_EVENT
+ * value: Accessory assigned ID for the HID device
+ * index: 0
+ * data the HID report for the event
+ */
+#define ACCESSORY_SEND_HID_EVENT 57
+
+/* Control request for setting the audio mode.
+ *
+ * requestType: USB_DIR_OUT | USB_TYPE_VENDOR
+ * request: ACCESSORY_SET_AUDIO_MODE
+ * value: 0 - no audio
+ * 1 - device to host, 44100 16-bit stereo PCM
+ * index: 0
+ * data none
+ */
+#define ACCESSORY_SET_AUDIO_MODE 58
+
+/* ioctls for retrieving strings set by the host */
+#define ACCESSORY_GET_STRING_MANUFACTURER _IOW('M', 1, char[256])
+#define ACCESSORY_GET_STRING_MODEL _IOW('M', 2, char[256])
+#define ACCESSORY_GET_STRING_DESCRIPTION _IOW('M', 3, char[256])
+#define ACCESSORY_GET_STRING_VERSION _IOW('M', 4, char[256])
+#define ACCESSORY_GET_STRING_URI _IOW('M', 5, char[256])
+#define ACCESSORY_GET_STRING_SERIAL _IOW('M', 6, char[256])
+/* returns 1 if there is a start request pending */
+#define ACCESSORY_IS_START_REQUESTED _IO('M', 7)
+/* returns audio mode (set via the ACCESSORY_SET_AUDIO_MODE control request) */
+#define ACCESSORY_GET_AUDIO_MODE _IO('M', 8)
+
+#endif /* _UAPI_LINUX_USB_F_ACCESSORY_H */
diff --git a/include/uapi/linux/xfrm.h b/include/uapi/linux/xfrm.h
index e3af285..5f3b9fe 100644
--- a/include/uapi/linux/xfrm.h
+++ b/include/uapi/linux/xfrm.h
@@ -305,9 +305,12 @@
XFRMA_ADDRESS_FILTER, /* struct xfrm_address_filter */
XFRMA_PAD,
XFRMA_OFFLOAD_DEV, /* struct xfrm_state_offload */
- XFRMA_OUTPUT_MARK, /* __u32 */
+ XFRMA_SET_MARK, /* __u32 */
+ XFRMA_SET_MARK_MASK, /* __u32 */
+ XFRMA_IF_ID, /* __u32 */
__XFRMA_MAX
+#define XFRMA_OUTPUT_MARK XFRMA_SET_MARK /* Compatibility */
#define XFRMA_MAX (__XFRMA_MAX - 1)
};
diff --git a/init/Kconfig b/init/Kconfig
index 4607532..37ecab2 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -400,6 +400,15 @@
If in doubt, say N here.
+config SCHED_WALT
+ bool "Support window based load tracking"
+ depends on SMP
+ help
+ This feature will allow the scheduler to maintain a tunable window
+ based set of metrics for tasks and runqueues. These metrics can be
+ used to guide task placement as well as task frequency requirements
+ for cpufreq governors.
+
config BSD_PROCESS_ACCT
bool "BSD Process Accounting"
depends on MULTIUSER
@@ -586,6 +595,41 @@
config GENERIC_SCHED_CLOCK
bool
+menu "FAIR Scheuler tunables"
+
+choice
+ prompt "Utilization's PELT half-Life"
+ default PELT_UTIL_HALFLIFE_32
+ help
+ Allows choosing one of the possible values for the PELT half-life to
+ be used for the update of the utilization of tasks and CPUs.
+ The half-life is the amount of [ms] required by the PELT signal to
+ build up to 50% utilization. The higher the half-life the longer it
+ takes for a task to be represented as a big one.
+
+ If not sure, use the default of 32 ms.
+
+config PELT_UTIL_HALFLIFE_32
+ bool "32 ms, default for server"
+
+config PELT_UTIL_HALFLIFE_16
+ bool "16 ms, suggested for interactive workloads"
+ help
+ Use 16ms as PELT half-life value. This will increase the ramp-up and
+ decay of utlization and load twice as fast as for the default
+ configuration using 32ms.
+
+config PELT_UTIL_HALFLIFE_8
+ bool "8 ms, very fast"
+ help
+ Use 8ms as PELT half-life value. This will increase the ramp-up and
+ decay of utlization and load four time as fast as for the default
+ configuration using 32ms.
+
+endchoice
+
+endmenu # FAIR Scheduler tunables"
+
#
# For architectures that want to enable the support for NUMA-affine scheduler
# balancing logic:
@@ -959,6 +1003,39 @@
desktop applications. Task group autogeneration is currently based
upon task session.
+config SCHED_TUNE
+ bool "Boosting for CFS tasks (EXPERIMENTAL)"
+ depends on SMP
+ help
+ This option enables support for task classification using a new
+ cgroup controller, schedtune. Schedtune allows tasks to be given
+ a boost value and marked as latency-sensitive or not. This option
+ provides the "schedtune" controller.
+
+ This new controller:
+ 1. allows only a two layers hierarchy, where the root defines the
+ system-wide boost value and its direct childrens define each one a
+ different "class of tasks" to be boosted with a different value
+ 2. supports up to 16 different task classes, each one which could be
+ configured with a different boost value
+
+ Latency-sensitive tasks are not subject to energy-aware wakeup
+ task placement. The boost value assigned to tasks is used to
+ influence task placement and CPU frequency selection (if
+ utilization-driven frequency selection is in use).
+
+ If unsure, say N.
+
+config DEFAULT_USE_ENERGY_AWARE
+ bool "Default to enabling the Energy Aware Scheduler feature"
+ default n
+ help
+ This option defaults the ENERGY_AWARE scheduling feature to true,
+ as without SCHED_DEBUG set this feature can't be enabled or disabled
+ via sysctl.
+
+ Say N if unsure.
+
config SYSFS_DEPRECATED
bool "Enable deprecated sysfs features to support old userspace tools"
depends on SYSFS
@@ -1887,7 +1964,7 @@
config MODULES_TREE_LOOKUP
def_bool y
- depends on PERF_EVENTS || TRACING
+ depends on PERF_EVENTS || TRACING || CFI_CLANG
config INIT_ALL_POSSIBLE
bool
diff --git a/init/Makefile b/init/Makefile
index 1dbb237..0320e1a 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -6,11 +6,8 @@
ccflags-y := -fno-function-sections -fno-data-sections
obj-y := main.o version.o mounts.o
-ifneq ($(CONFIG_BLK_DEV_INITRD),y)
obj-y += noinitramfs.o
-else
obj-$(CONFIG_BLK_DEV_INITRD) += initramfs.o
-endif
obj-$(CONFIG_GENERIC_CALIBRATE_DELAY) += calibrate.o
ifneq ($(CONFIG_ARCH_INIT_TASK),y)
@@ -21,6 +18,7 @@
mounts-$(CONFIG_BLK_DEV_RAM) += do_mounts_rd.o
mounts-$(CONFIG_BLK_DEV_INITRD) += do_mounts_initrd.o
mounts-$(CONFIG_BLK_DEV_MD) += do_mounts_md.o
+mounts-$(CONFIG_BLK_DEV_DM) += do_mounts_dm.o
# dependencies on generated files need to be listed explicitly
$(obj)/version.o: include/generated/compile.h
diff --git a/init/do_mounts.c b/init/do_mounts.c
index 7cf4f6d..2782669 100644
--- a/init/do_mounts.c
+++ b/init/do_mounts.c
@@ -565,6 +565,7 @@
wait_for_device_probe();
md_run_setup();
+ dm_run_setup();
if (saved_root_name[0]) {
root_device_name = saved_root_name;
diff --git a/init/do_mounts.h b/init/do_mounts.h
index 5b05c8f..cd20112 100644
--- a/init/do_mounts.h
+++ b/init/do_mounts.h
@@ -61,3 +61,13 @@
static inline void md_run_setup(void) {}
#endif
+
+#ifdef CONFIG_BLK_DEV_DM
+
+void dm_run_setup(void);
+
+#else
+
+static inline void dm_run_setup(void) {}
+
+#endif
diff --git a/init/do_mounts_dm.c b/init/do_mounts_dm.c
new file mode 100644
index 0000000..af84b01
--- /dev/null
+++ b/init/do_mounts_dm.c
@@ -0,0 +1,470 @@
+/* do_mounts_dm.c
+ * Copyright (C) 2010 The Chromium OS Authors <chromium-os-dev@chromium.org>
+ * All Rights Reserved.
+ * Based on do_mounts_md.c
+ *
+ * This file is released under the GPL.
+ */
+#include <linux/async.h>
+#include <linux/ctype.h>
+#include <linux/device-mapper.h>
+#include <linux/fs.h>
+#include <linux/string.h>
+#include <linux/delay.h>
+
+#include "do_mounts.h"
+
+#define DM_MAX_DEVICES 256
+#define DM_MAX_TARGETS 256
+#define DM_MAX_NAME 32
+#define DM_MAX_UUID 129
+#define DM_NO_UUID "none"
+
+#define DM_MSG_PREFIX "init"
+
+/* Separators used for parsing the dm= argument. */
+#define DM_FIELD_SEP " "
+#define DM_LINE_SEP ","
+#define DM_ANY_SEP DM_FIELD_SEP DM_LINE_SEP
+
+/*
+ * When the device-mapper and any targets are compiled into the kernel
+ * (not a module), one or more device-mappers may be created and used
+ * as the root device at boot time with the parameters given with the
+ * boot line dm=...
+ *
+ * Multiple device-mappers can be stacked specifing the number of
+ * devices. A device can have multiple targets if the the number of
+ * targets is specified.
+ *
+ * TODO(taysom:defect 32847)
+ * In the future, the <num> field will be mandatory.
+ *
+ * <device> ::= [<num>] <device-mapper>+
+ * <device-mapper> ::= <head> "," <target>+
+ * <head> ::= <name> <uuid> <mode> [<num>]
+ * <target> ::= <start> <length> <type> <options> ","
+ * <mode> ::= "ro" | "rw"
+ * <uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | "none"
+ * <type> ::= "verity" | "bootcache" | ...
+ *
+ * Example:
+ * 2 vboot none ro 1,
+ * 0 1768000 bootcache
+ * device=aa55b119-2a47-8c45-946a-5ac57765011f+1
+ * signature=76e9be054b15884a9fa85973e9cb274c93afadb6
+ * cache_start=1768000 max_blocks=100000 size_limit=23 max_trace=20000,
+ * vroot none ro 1,
+ * 0 1740800 verity payload=254:0 hashtree=254:0 hashstart=1740800 alg=sha1
+ * root_hexdigest=76e9be054b15884a9fa85973e9cb274c93afadb6
+ * salt=5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe
+ *
+ * Notes:
+ * 1. uuid is a label for the device and we set it to "none".
+ * 2. The <num> field will be optional initially and assumed to be 1.
+ * Once all the scripts that set these fields have been set, it will
+ * be made mandatory.
+ */
+
+struct dm_setup_target {
+ sector_t begin;
+ sector_t length;
+ char *type;
+ char *params;
+ /* simple singly linked list */
+ struct dm_setup_target *next;
+};
+
+struct dm_device {
+ int minor;
+ int ro;
+ char name[DM_MAX_NAME];
+ char uuid[DM_MAX_UUID];
+ unsigned long num_targets;
+ struct dm_setup_target *target;
+ int target_count;
+ struct dm_device *next;
+};
+
+struct dm_option {
+ char *start;
+ char *next;
+ size_t len;
+ char delim;
+};
+
+static struct {
+ unsigned long num_devices;
+ char *str;
+} dm_setup_args __initdata;
+
+static __initdata int dm_early_setup;
+
+static int __init get_dm_option(struct dm_option *opt, const char *accept)
+{
+ char *str = opt->next;
+ char *endp;
+
+ if (!str)
+ return 0;
+
+ str = skip_spaces(str);
+ opt->start = str;
+ endp = strpbrk(str, accept);
+ if (!endp) { /* act like strchrnul */
+ opt->len = strlen(str);
+ endp = str + opt->len;
+ } else {
+ opt->len = endp - str;
+ }
+ opt->delim = *endp;
+ if (*endp == 0) {
+ /* Don't advance past the nul. */
+ opt->next = endp;
+ } else {
+ opt->next = endp + 1;
+ }
+ return opt->len != 0;
+}
+
+static int __init dm_setup_cleanup(struct dm_device *devices)
+{
+ struct dm_device *dev = devices;
+
+ while (dev) {
+ struct dm_device *old_dev = dev;
+ struct dm_setup_target *target = dev->target;
+ while (target) {
+ struct dm_setup_target *old_target = target;
+ kfree(target->type);
+ kfree(target->params);
+ target = target->next;
+ kfree(old_target);
+ dev->target_count--;
+ }
+ BUG_ON(dev->target_count);
+ dev = dev->next;
+ kfree(old_dev);
+ }
+ return 0;
+}
+
+static char * __init dm_parse_device(struct dm_device *dev, char *str)
+{
+ struct dm_option opt;
+ size_t len;
+
+ /* Grab the logical name of the device to be exported to udev */
+ opt.next = str;
+ if (!get_dm_option(&opt, DM_FIELD_SEP)) {
+ DMERR("failed to parse device name");
+ goto parse_fail;
+ }
+ len = min(opt.len + 1, sizeof(dev->name));
+ strlcpy(dev->name, opt.start, len); /* includes nul */
+
+ /* Grab the UUID value or "none" */
+ if (!get_dm_option(&opt, DM_FIELD_SEP)) {
+ DMERR("failed to parse device uuid");
+ goto parse_fail;
+ }
+ len = min(opt.len + 1, sizeof(dev->uuid));
+ strlcpy(dev->uuid, opt.start, len);
+
+ /* Determine if the table/device will be read only or read-write */
+ get_dm_option(&opt, DM_ANY_SEP);
+ if (!strncmp("ro", opt.start, opt.len)) {
+ dev->ro = 1;
+ } else if (!strncmp("rw", opt.start, opt.len)) {
+ dev->ro = 0;
+ } else {
+ DMERR("failed to parse table mode");
+ goto parse_fail;
+ }
+
+ /* Optional number field */
+ /* XXX: The <num> field will be mandatory in the next round */
+ if (opt.delim == DM_FIELD_SEP[0]) {
+ if (!get_dm_option(&opt, DM_LINE_SEP))
+ return NULL;
+ dev->num_targets = simple_strtoul(opt.start, NULL, 10);
+ } else {
+ dev->num_targets = 1;
+ }
+ if (dev->num_targets > DM_MAX_TARGETS) {
+ DMERR("too many targets %lu > %d",
+ dev->num_targets, DM_MAX_TARGETS);
+ }
+ return opt.next;
+
+parse_fail:
+ return NULL;
+}
+
+static char * __init dm_parse_targets(struct dm_device *dev, char *str)
+{
+ struct dm_option opt;
+ struct dm_setup_target **target = &dev->target;
+ unsigned long num_targets = dev->num_targets;
+ unsigned long i;
+
+ /* Targets are defined as per the table format but with a
+ * comma as a newline separator. */
+ opt.next = str;
+ for (i = 0; i < num_targets; i++) {
+ *target = kzalloc(sizeof(struct dm_setup_target), GFP_KERNEL);
+ if (!*target) {
+ DMERR("failed to allocate memory for target %s<%ld>",
+ dev->name, i);
+ goto parse_fail;
+ }
+ dev->target_count++;
+
+ if (!get_dm_option(&opt, DM_FIELD_SEP)) {
+ DMERR("failed to parse starting sector"
+ " for target %s<%ld>", dev->name, i);
+ goto parse_fail;
+ }
+ (*target)->begin = simple_strtoull(opt.start, NULL, 10);
+
+ if (!get_dm_option(&opt, DM_FIELD_SEP)) {
+ DMERR("failed to parse length for target %s<%ld>",
+ dev->name, i);
+ goto parse_fail;
+ }
+ (*target)->length = simple_strtoull(opt.start, NULL, 10);
+
+ if (get_dm_option(&opt, DM_FIELD_SEP))
+ (*target)->type = kstrndup(opt.start, opt.len,
+ GFP_KERNEL);
+ if (!((*target)->type)) {
+ DMERR("failed to parse type for target %s<%ld>",
+ dev->name, i);
+ goto parse_fail;
+ }
+ if (get_dm_option(&opt, DM_LINE_SEP))
+ (*target)->params = kstrndup(opt.start, opt.len,
+ GFP_KERNEL);
+ if (!((*target)->params)) {
+ DMERR("failed to parse params for target %s<%ld>",
+ dev->name, i);
+ goto parse_fail;
+ }
+ target = &((*target)->next);
+ }
+ DMDEBUG("parsed %d targets", dev->target_count);
+
+ return opt.next;
+
+parse_fail:
+ return NULL;
+}
+
+static struct dm_device * __init dm_parse_args(void)
+{
+ struct dm_device *devices = NULL;
+ struct dm_device **tail = &devices;
+ struct dm_device *dev;
+ char *str = dm_setup_args.str;
+ unsigned long num_devices = dm_setup_args.num_devices;
+ unsigned long i;
+
+ if (!str)
+ return NULL;
+ for (i = 0; i < num_devices; i++) {
+ dev = kzalloc(sizeof(*dev), GFP_KERNEL);
+ if (!dev) {
+ DMERR("failed to allocated memory for dev");
+ goto error;
+ }
+ *tail = dev;
+ tail = &dev->next;
+ /*
+ * devices are given minor numbers 0 - n-1
+ * in the order they are found in the arg
+ * string.
+ */
+ dev->minor = i;
+ str = dm_parse_device(dev, str);
+ if (!str) /* NULL indicates error in parsing, bail */
+ goto error;
+
+ str = dm_parse_targets(dev, str);
+ if (!str)
+ goto error;
+ }
+ return devices;
+error:
+ dm_setup_cleanup(devices);
+ return NULL;
+}
+
+/*
+ * Parse the command-line parameters given our kernel, but do not
+ * actually try to invoke the DM device now; that is handled by
+ * dm_setup_drives after the low-level disk drivers have initialised.
+ * dm format is described at the top of the file.
+ *
+ * Because dm minor numbers are assigned in assending order starting with 0,
+ * You can assume the first device is /dev/dm-0, the next device is /dev/dm-1,
+ * and so forth.
+ */
+static int __init dm_setup(char *str)
+{
+ struct dm_option opt;
+ unsigned long num_devices;
+
+ if (!str) {
+ DMDEBUG("str is NULL");
+ goto parse_fail;
+ }
+ opt.next = str;
+ if (!get_dm_option(&opt, DM_FIELD_SEP))
+ goto parse_fail;
+ if (isdigit(opt.start[0])) { /* XXX: Optional number field */
+ num_devices = simple_strtoul(opt.start, NULL, 10);
+ str = opt.next;
+ } else {
+ num_devices = 1;
+ /* Don't advance str */
+ }
+ if (num_devices > DM_MAX_DEVICES) {
+ DMDEBUG("too many devices %lu > %d",
+ num_devices, DM_MAX_DEVICES);
+ }
+ dm_setup_args.str = str;
+ dm_setup_args.num_devices = num_devices;
+ DMINFO("will configure %lu devices", num_devices);
+ dm_early_setup = 1;
+ return 1;
+
+parse_fail:
+ DMWARN("Invalid arguments supplied to dm=.");
+ return 0;
+}
+
+static void __init dm_setup_drives(void)
+{
+ struct mapped_device *md = NULL;
+ struct dm_table *table = NULL;
+ struct dm_setup_target *target;
+ struct dm_device *dev;
+ char *uuid;
+ fmode_t fmode = FMODE_READ;
+ struct dm_device *devices;
+
+ devices = dm_parse_args();
+
+ for (dev = devices; dev; dev = dev->next) {
+ if (dm_create(dev->minor, &md)) {
+ DMDEBUG("failed to create the device");
+ goto dm_create_fail;
+ }
+ DMDEBUG("created device '%s'", dm_device_name(md));
+
+ /*
+ * In addition to flagging the table below, the disk must be
+ * set explicitly ro/rw.
+ */
+ set_disk_ro(dm_disk(md), dev->ro);
+
+ if (!dev->ro)
+ fmode |= FMODE_WRITE;
+ if (dm_table_create(&table, fmode, dev->target_count, md)) {
+ DMDEBUG("failed to create the table");
+ goto dm_table_create_fail;
+ }
+
+ dm_lock_md_type(md);
+
+ for (target = dev->target; target; target = target->next) {
+ DMINFO("adding target '%llu %llu %s %s'",
+ (unsigned long long) target->begin,
+ (unsigned long long) target->length,
+ target->type, target->params);
+ if (dm_table_add_target(table, target->type,
+ target->begin,
+ target->length,
+ target->params)) {
+ DMDEBUG("failed to add the target"
+ " to the table");
+ goto add_target_fail;
+ }
+ }
+ if (dm_table_complete(table)) {
+ DMDEBUG("failed to complete the table");
+ goto table_complete_fail;
+ }
+
+ /* Suspend the device so that we can bind it to the table. */
+ if (dm_suspend(md, 0)) {
+ DMDEBUG("failed to suspend the device pre-bind");
+ goto suspend_fail;
+ }
+
+ /* Initial table load: acquire type of table. */
+ dm_set_md_type(md, dm_table_get_type(table));
+
+ /* Setup md->queue to reflect md's type. */
+ if (dm_setup_md_queue(md, table)) {
+ DMWARN("unable to set up device queue for new table.");
+ goto setup_md_queue_fail;
+ }
+
+ /*
+ * Bind the table to the device. This is the only way
+ * to associate md->map with the table and set the disk
+ * capacity directly.
+ */
+ if (dm_swap_table(md, table)) { /* should return NULL. */
+ DMDEBUG("failed to bind the device to the table");
+ goto table_bind_fail;
+ }
+
+ /* Finally, resume and the device should be ready. */
+ if (dm_resume(md)) {
+ DMDEBUG("failed to resume the device");
+ goto resume_fail;
+ }
+
+ /* Export the dm device via the ioctl interface */
+ if (!strcmp(DM_NO_UUID, dev->uuid))
+ uuid = NULL;
+ if (dm_ioctl_export(md, dev->name, uuid)) {
+ DMDEBUG("failed to export device with given"
+ " name and uuid");
+ goto export_fail;
+ }
+
+ dm_unlock_md_type(md);
+
+ DMINFO("dm-%d is ready", dev->minor);
+ }
+ dm_setup_cleanup(devices);
+ return;
+
+export_fail:
+resume_fail:
+table_bind_fail:
+setup_md_queue_fail:
+suspend_fail:
+table_complete_fail:
+add_target_fail:
+ dm_unlock_md_type(md);
+dm_table_create_fail:
+ dm_put(md);
+dm_create_fail:
+ DMWARN("starting dm-%d (%s) failed",
+ dev->minor, dev->name);
+ dm_setup_cleanup(devices);
+}
+
+__setup("dm=", dm_setup);
+
+void __init dm_run_setup(void)
+{
+ if (!dm_early_setup)
+ return;
+ DMINFO("attempting early device configuration.");
+ dm_setup_drives();
+}
diff --git a/init/initramfs.c b/init/initramfs.c
index 7046fef..5ea7f1b 100644
--- a/init/initramfs.c
+++ b/init/initramfs.c
@@ -20,6 +20,7 @@
#include <linux/syscalls.h>
#include <linux/utime.h>
#include <linux/file.h>
+#include <linux/initramfs.h>
static ssize_t __init xwrite(int fd, const char *p, size_t count)
{
@@ -607,10 +608,29 @@
}
#endif
+static int __initdata do_skip_initramfs;
+
+static int __init skip_initramfs_param(char *str)
+{
+ if (*str)
+ return 0;
+ do_skip_initramfs = 1;
+ return 1;
+}
+__setup("skip_initramfs", skip_initramfs_param);
+
static int __init populate_rootfs(void)
{
+ char *err;
+
+ if (do_skip_initramfs) {
+ if (initrd_start)
+ free_initrd();
+ return default_rootfs();
+ }
+
/* Load the built in initramfs */
- char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
+ err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
if (err)
panic("%s", err); /* Failed to decompress INTERNAL initramfs */
/* If available load the bootloader supplied initrd */
diff --git a/init/noinitramfs.c b/init/noinitramfs.c
index 267739d..bcc8bcb0 100644
--- a/init/noinitramfs.c
+++ b/init/noinitramfs.c
@@ -21,11 +21,16 @@
#include <linux/stat.h>
#include <linux/kdev_t.h>
#include <linux/syscalls.h>
+#include <linux/kconfig.h>
+#include <linux/initramfs.h>
/*
* Create a simple rootfs that is similar to the default initramfs
*/
-static int __init default_rootfs(void)
+#if !IS_BUILTIN(CONFIG_BLK_DEV_INITRD)
+static
+#endif
+int __init default_rootfs(void)
{
int err;
@@ -49,4 +54,6 @@
printk(KERN_WARNING "Failed to create a rootfs\n");
return err;
}
+#if !IS_BUILTIN(CONFIG_BLK_DEV_INITRD)
rootfs_initcall(default_rootfs);
+#endif
diff --git a/ipc/mqueue.c b/ipc/mqueue.c
index d240256..5f46c15 100644
--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -747,7 +747,7 @@
}
mode &= ~current_umask();
- ret = vfs_create(dir, path->dentry, mode, true);
+ ret = vfs_create2(path->mnt, dir, path->dentry, mode, true);
path->dentry->d_fsdata = NULL;
if (ret)
return ERR_PTR(ret);
@@ -763,7 +763,7 @@
if ((oflag & O_ACCMODE) == (O_RDWR | O_WRONLY))
return ERR_PTR(-EINVAL);
acc = oflag2acc[oflag & O_ACCMODE];
- if (inode_permission(d_inode(path->dentry), acc))
+ if (inode_permission2(path->mnt, d_inode(path->dentry), acc))
return ERR_PTR(-EACCES);
return dentry_open(path, oflag, current_cred());
}
@@ -792,7 +792,7 @@
ro = mnt_want_write(mnt); /* we'll drop it in any case */
error = 0;
inode_lock(d_inode(root));
- path.dentry = lookup_one_len(name->name, root, strlen(name->name));
+ path.dentry = lookup_one_len2(name->name, mnt, root, strlen(name->name));
if (IS_ERR(path.dentry)) {
error = PTR_ERR(path.dentry);
goto out_putfd;
@@ -872,7 +872,7 @@
if (err)
goto out_name;
inode_lock_nested(d_inode(mnt->mnt_root), I_MUTEX_PARENT);
- dentry = lookup_one_len(name->name, mnt->mnt_root,
+ dentry = lookup_one_len2(name->name, mnt, mnt->mnt_root,
strlen(name->name));
if (IS_ERR(dentry)) {
err = PTR_ERR(dentry);
@@ -884,7 +884,7 @@
err = -ENOENT;
} else {
ihold(inode);
- err = vfs_unlink(d_inode(dentry->d_parent), dentry, NULL);
+ err = vfs_unlink2(mnt, d_inode(dentry->d_parent), dentry, NULL);
}
dput(dentry);
diff --git a/kernel/Makefile b/kernel/Makefile
index 172d151d..de59ee7 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -34,6 +34,9 @@
# cond_syscall is currently not LTO compatible
CFLAGS_sys_ni.o = $(DISABLE_LTO)
+# Don't instrument error handlers
+CFLAGS_cfi.o = $(DISABLE_CFI_CLANG)
+
obj-y += sched/
obj-y += locking/
obj-y += power/
@@ -101,6 +104,7 @@
obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_CPU_PM) += cpu_pm.o
obj-$(CONFIG_BPF) += bpf/
+obj-$(CONFIG_CFI_CLANG) += cfi.o
obj-$(CONFIG_PERF_EVENTS) += events/
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index f57d0bd..aa83887 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -19,6 +19,9 @@
#include "map_in_map.h"
+#define ARRAY_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
static void bpf_array_free_percpu(struct bpf_array *array)
{
int i;
@@ -60,7 +63,8 @@
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
- attr->value_size == 0 || attr->map_flags & ~BPF_F_NUMA_NODE ||
+ attr->value_size == 0 ||
+ attr->map_flags & ~ARRAY_CREATE_FLAG_MASK ||
(percpu && numa_node != NUMA_NO_NODE))
return ERR_PTR(-EINVAL);
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e745d6a..ebdef54 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -50,6 +50,9 @@
#include <linux/bpf.h>
#include <linux/filter.h>
+#define DEV_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
struct bpf_dtab_netdev {
struct net_device *dev;
struct bpf_dtab *dtab;
@@ -83,7 +86,7 @@
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
- attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
+ attr->value_size != 4 || attr->map_flags & ~DEV_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
dtab = kzalloc(sizeof(*dtab), GFP_USER);
diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
index 3d0ecc2..73fad0b5 100644
--- a/kernel/bpf/hashtab.c
+++ b/kernel/bpf/hashtab.c
@@ -18,8 +18,9 @@
#include "bpf_lru_list.h"
#include "map_in_map.h"
-#define HTAB_CREATE_FLAG_MASK \
- (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE)
+#define HTAB_CREATE_FLAG_MASK \
+ (BPF_F_NO_PREALLOC | BPF_F_NO_COMMON_LRU | BPF_F_NUMA_NODE | \
+ BPF_F_RDONLY | BPF_F_WRONLY)
struct bucket {
struct hlist_nulls_head head;
diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index be1dde9..01aaef1 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -295,7 +295,7 @@
}
static void *bpf_obj_do_get(const struct filename *pathname,
- enum bpf_type *type)
+ enum bpf_type *type, int flags)
{
struct inode *inode;
struct path path;
@@ -307,7 +307,7 @@
return ERR_PTR(ret);
inode = d_backing_inode(path.dentry);
- ret = inode_permission(inode, MAY_WRITE);
+ ret = inode_permission(inode, ACC_MODE(flags));
if (ret)
goto out;
@@ -326,18 +326,23 @@
return ERR_PTR(ret);
}
-int bpf_obj_get_user(const char __user *pathname)
+int bpf_obj_get_user(const char __user *pathname, int flags)
{
enum bpf_type type = BPF_TYPE_UNSPEC;
struct filename *pname;
int ret = -ENOENT;
+ int f_flags;
void *raw;
+ f_flags = bpf_get_file_flag(flags);
+ if (f_flags < 0)
+ return f_flags;
+
pname = getname(pathname);
if (IS_ERR(pname))
return PTR_ERR(pname);
- raw = bpf_obj_do_get(pname, &type);
+ raw = bpf_obj_do_get(pname, &type, f_flags);
if (IS_ERR(raw)) {
ret = PTR_ERR(raw);
goto out;
@@ -346,7 +351,7 @@
if (type == BPF_TYPE_PROG)
ret = bpf_prog_new_fd(raw);
else if (type == BPF_TYPE_MAP)
- ret = bpf_map_new_fd(raw);
+ ret = bpf_map_new_fd(raw, f_flags);
else
goto out;
diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index c28c584..224362f 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -406,7 +406,8 @@
#define LPM_KEY_SIZE_MAX LPM_KEY_SIZE(LPM_DATA_SIZE_MAX)
#define LPM_KEY_SIZE_MIN LPM_KEY_SIZE(LPM_DATA_SIZE_MIN)
-#define LPM_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE)
+#define LPM_CREATE_FLAG_MASK (BPF_F_NO_PREALLOC | BPF_F_NUMA_NODE | \
+ BPF_F_RDONLY | BPF_F_WRONLY)
static struct bpf_map *trie_alloc(union bpf_attr *attr)
{
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 53a4787..8f9bcbe 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -41,6 +41,9 @@
#include <net/strparser.h>
#include <net/tcp.h>
+#define SOCK_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
struct bpf_stab {
struct bpf_map map;
struct sock **sock_map;
@@ -508,7 +511,7 @@
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
- attr->value_size != 4 || attr->map_flags & ~BPF_F_NUMA_NODE)
+ attr->value_size != 4 || attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
if (attr->value_size > KMALLOC_MAX_SIZE)
diff --git a/kernel/bpf/stackmap.c b/kernel/bpf/stackmap.c
index 135be43..a15bc63 100644
--- a/kernel/bpf/stackmap.c
+++ b/kernel/bpf/stackmap.c
@@ -11,6 +11,9 @@
#include <linux/perf_event.h>
#include "percpu_freelist.h"
+#define STACK_CREATE_FLAG_MASK \
+ (BPF_F_NUMA_NODE | BPF_F_RDONLY | BPF_F_WRONLY)
+
struct stack_map_bucket {
struct pcpu_freelist_node fnode;
u32 hash;
@@ -60,7 +63,7 @@
if (!capable(CAP_SYS_ADMIN))
return ERR_PTR(-EPERM);
- if (attr->map_flags & ~BPF_F_NUMA_NODE)
+ if (attr->map_flags & ~STACK_CREATE_FLAG_MASK)
return ERR_PTR(-EINVAL);
/* check sanity of attributes */
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 4e93321..1ac558a 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -31,6 +31,8 @@
#define IS_FD_HASH(map) ((map)->map_type == BPF_MAP_TYPE_HASH_OF_MAPS)
#define IS_FD_MAP(map) (IS_FD_ARRAY(map) || IS_FD_HASH(map))
+#define BPF_OBJ_FLAG_MASK (BPF_F_RDONLY | BPF_F_WRONLY)
+
DEFINE_PER_CPU(int, bpf_prog_active);
static DEFINE_IDR(prog_idr);
static DEFINE_SPINLOCK(prog_idr_lock);
@@ -207,6 +209,7 @@
struct bpf_map *map = container_of(work, struct bpf_map, work);
bpf_map_uncharge_memlock(map);
+ security_bpf_map_free(map);
/* implementation dependent freeing */
map->ops->map_free(map);
}
@@ -291,17 +294,54 @@
}
#endif
-static const struct file_operations bpf_map_fops = {
+static ssize_t bpf_dummy_read(struct file *filp, char __user *buf, size_t siz,
+ loff_t *ppos)
+{
+ /* We need this handler such that alloc_file() enables
+ * f_mode with FMODE_CAN_READ.
+ */
+ return -EINVAL;
+}
+
+static ssize_t bpf_dummy_write(struct file *filp, const char __user *buf,
+ size_t siz, loff_t *ppos)
+{
+ /* We need this handler such that alloc_file() enables
+ * f_mode with FMODE_CAN_WRITE.
+ */
+ return -EINVAL;
+}
+
+const struct file_operations bpf_map_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_map_show_fdinfo,
#endif
.release = bpf_map_release,
+ .read = bpf_dummy_read,
+ .write = bpf_dummy_write,
};
-int bpf_map_new_fd(struct bpf_map *map)
+int bpf_map_new_fd(struct bpf_map *map, int flags)
{
+ int ret;
+
+ ret = security_bpf_map(map, OPEN_FMODE(flags));
+ if (ret < 0)
+ return ret;
+
return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
- O_RDWR | O_CLOEXEC);
+ flags | O_CLOEXEC);
+}
+
+int bpf_get_file_flag(int flags)
+{
+ if ((flags & BPF_F_RDONLY) && (flags & BPF_F_WRONLY))
+ return -EINVAL;
+ if (flags & BPF_F_RDONLY)
+ return O_RDONLY;
+ if (flags & BPF_F_WRONLY)
+ return O_WRONLY;
+ return O_RDWR;
}
/* helper macro to check that unused fields 'union bpf_attr' are zero */
@@ -318,12 +358,17 @@
{
int numa_node = bpf_map_attr_numa_node(attr);
struct bpf_map *map;
+ int f_flags;
int err;
err = CHECK_ATTR(BPF_MAP_CREATE);
if (err)
return -EINVAL;
+ f_flags = bpf_get_file_flag(attr->map_flags);
+ if (f_flags < 0)
+ return f_flags;
+
if (numa_node != NUMA_NO_NODE &&
((unsigned int)numa_node >= nr_node_ids ||
!node_online(numa_node)))
@@ -337,15 +382,19 @@
atomic_set(&map->refcnt, 1);
atomic_set(&map->usercnt, 1);
- err = bpf_map_charge_memlock(map);
+ err = security_bpf_map_alloc(map);
if (err)
goto free_map_nouncharge;
+ err = bpf_map_charge_memlock(map);
+ if (err)
+ goto free_map_sec;
+
err = bpf_map_alloc_id(map);
if (err)
goto free_map;
- err = bpf_map_new_fd(map);
+ err = bpf_map_new_fd(map, f_flags);
if (err < 0) {
/* failed to allocate fd.
* bpf_map_put() is needed because the above
@@ -362,6 +411,8 @@
free_map:
bpf_map_uncharge_memlock(map);
+free_map_sec:
+ security_bpf_map_free(map);
free_map_nouncharge:
map->ops->map_free(map);
return err;
@@ -460,6 +511,11 @@
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_READ)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
err = PTR_ERR(key);
@@ -540,6 +596,11 @@
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
err = PTR_ERR(key);
@@ -623,6 +684,11 @@
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_WRITE)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
err = PTR_ERR(key);
@@ -666,6 +732,11 @@
if (IS_ERR(map))
return PTR_ERR(map);
+ if (!(f.file->f_mode & FMODE_CAN_READ)) {
+ err = -EPERM;
+ goto err_put;
+ }
+
if (ukey) {
key = memdup_user(ukey, map->key_size);
if (IS_ERR(key)) {
@@ -820,6 +891,7 @@
free_used_maps(aux);
bpf_prog_uncharge_memlock(aux->prog);
+ security_bpf_prog_free(aux);
bpf_prog_free(aux->prog);
}
@@ -867,15 +939,23 @@
}
#endif
-static const struct file_operations bpf_prog_fops = {
+const struct file_operations bpf_prog_fops = {
#ifdef CONFIG_PROC_FS
.show_fdinfo = bpf_prog_show_fdinfo,
#endif
.release = bpf_prog_release,
+ .read = bpf_dummy_read,
+ .write = bpf_dummy_write,
};
int bpf_prog_new_fd(struct bpf_prog *prog)
{
+ int ret;
+
+ ret = security_bpf_prog(prog);
+ if (ret < 0)
+ return ret;
+
return anon_inode_getfd("bpf-prog", &bpf_prog_fops, prog,
O_RDWR | O_CLOEXEC);
}
@@ -1015,10 +1095,14 @@
if (!prog)
return -ENOMEM;
- err = bpf_prog_charge_memlock(prog);
+ err = security_bpf_prog_alloc(prog->aux);
if (err)
goto free_prog_nouncharge;
+ err = bpf_prog_charge_memlock(prog);
+ if (err)
+ goto free_prog_sec;
+
prog->len = attr->insn_cnt;
err = -EFAULT;
@@ -1071,16 +1155,18 @@
free_used_maps(prog->aux);
free_prog:
bpf_prog_uncharge_memlock(prog);
+free_prog_sec:
+ security_bpf_prog_free(prog->aux);
free_prog_nouncharge:
bpf_prog_free(prog);
return err;
}
-#define BPF_OBJ_LAST_FIELD bpf_fd
+#define BPF_OBJ_LAST_FIELD file_flags
static int bpf_obj_pin(const union bpf_attr *attr)
{
- if (CHECK_ATTR(BPF_OBJ))
+ if (CHECK_ATTR(BPF_OBJ) || attr->file_flags != 0)
return -EINVAL;
return bpf_obj_pin_user(attr->bpf_fd, u64_to_user_ptr(attr->pathname));
@@ -1088,10 +1174,12 @@
static int bpf_obj_get(const union bpf_attr *attr)
{
- if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0)
+ if (CHECK_ATTR(BPF_OBJ) || attr->bpf_fd != 0 ||
+ attr->file_flags & ~BPF_OBJ_FLAG_MASK)
return -EINVAL;
- return bpf_obj_get_user(u64_to_user_ptr(attr->pathname));
+ return bpf_obj_get_user(u64_to_user_ptr(attr->pathname),
+ attr->file_flags);
}
#ifdef CONFIG_CGROUP_BPF
@@ -1305,20 +1393,26 @@
return fd;
}
-#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD map_id
+#define BPF_MAP_GET_FD_BY_ID_LAST_FIELD open_flags
static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
{
struct bpf_map *map;
u32 id = attr->map_id;
+ int f_flags;
int fd;
- if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID))
+ if (CHECK_ATTR(BPF_MAP_GET_FD_BY_ID) ||
+ attr->open_flags & ~BPF_OBJ_FLAG_MASK)
return -EINVAL;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
+ f_flags = bpf_get_file_flag(attr->open_flags);
+ if (f_flags < 0)
+ return f_flags;
+
spin_lock_bh(&map_idr_lock);
map = idr_find(&map_idr, id);
if (map)
@@ -1330,7 +1424,7 @@
if (IS_ERR(map))
return PTR_ERR(map);
- fd = bpf_map_new_fd(map);
+ fd = bpf_map_new_fd(map, f_flags);
if (fd < 0)
bpf_map_put(map);
@@ -1467,6 +1561,10 @@
if (copy_from_user(&attr, uattr, size) != 0)
return -EFAULT;
+ err = security_bpf(cmd, &attr, size);
+ if (err < 0)
+ return err;
+
switch (cmd) {
case BPF_MAP_CREATE:
err = map_create(&attr);
diff --git a/kernel/cfi.c b/kernel/cfi.c
new file mode 100644
index 0000000..c32e6b3
--- /dev/null
+++ b/kernel/cfi.c
@@ -0,0 +1,300 @@
+/*
+ * CFI (Control Flow Integrity) error and slowpath handling
+ *
+ * Copyright (C) 2017 Google, Inc.
+ */
+
+#include <linux/gfp.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/ratelimit.h>
+#include <linux/rcupdate.h>
+#include <linux/spinlock.h>
+#include <asm/bug.h>
+#include <asm/cacheflush.h>
+#include <asm/memory.h>
+#include <asm/set_memory.h>
+
+/* Compiler-defined handler names */
+#ifdef CONFIG_CFI_PERMISSIVE
+#define cfi_failure_handler __ubsan_handle_cfi_check_fail
+#define cfi_slowpath_handler __cfi_slowpath_diag
+#else /* enforcing */
+#define cfi_failure_handler __ubsan_handle_cfi_check_fail_abort
+#define cfi_slowpath_handler __cfi_slowpath
+#endif /* CONFIG_CFI_PERMISSIVE */
+
+static inline void handle_cfi_failure(void *ptr)
+{
+#ifdef CONFIG_CFI_PERMISSIVE
+ WARN_RATELIMIT(1, "CFI failure (target: [<%px>] %pF):\n", ptr, ptr);
+#else
+ pr_err("CFI failure (target: [<%px>] %pF):\n", ptr, ptr);
+ BUG();
+#endif
+}
+
+#ifdef CONFIG_MODULES
+#ifdef CONFIG_CFI_CLANG_SHADOW
+struct shadow_range {
+ /* Module address range */
+ unsigned long mod_min_addr;
+ unsigned long mod_max_addr;
+ /* Module page range */
+ unsigned long min_page;
+ unsigned long max_page;
+};
+
+#define SHADOW_ORDER 1
+#define SHADOW_PAGES (1 << SHADOW_ORDER)
+#define SHADOW_SIZE \
+ ((SHADOW_PAGES * PAGE_SIZE - sizeof(struct shadow_range)) / sizeof(u16))
+#define SHADOW_INVALID 0xFFFF
+
+struct cfi_shadow {
+ /* Page range covered by the shadow */
+ struct shadow_range r;
+ /* Page offsets to __cfi_check functions in modules */
+ u16 shadow[SHADOW_SIZE];
+};
+
+static DEFINE_SPINLOCK(shadow_update_lock);
+static struct cfi_shadow __rcu *cfi_shadow __read_mostly = NULL;
+
+static inline int ptr_to_shadow(const struct cfi_shadow *s, unsigned long ptr)
+{
+ unsigned long index;
+ unsigned long page = ptr >> PAGE_SHIFT;
+
+ if (unlikely(page < s->r.min_page))
+ return -1; /* Outside of module area */
+
+ index = page - s->r.min_page;
+
+ if (index >= SHADOW_SIZE)
+ return -1; /* Cannot be addressed with shadow */
+
+ return (int)index;
+}
+
+static inline unsigned long shadow_to_ptr(const struct cfi_shadow *s,
+ int index)
+{
+ BUG_ON(index < 0 || index >= SHADOW_SIZE);
+
+ if (unlikely(s->shadow[index] == SHADOW_INVALID))
+ return 0;
+
+ return (s->r.min_page + s->shadow[index]) << PAGE_SHIFT;
+}
+
+static void prepare_next_shadow(const struct cfi_shadow __rcu *prev,
+ struct cfi_shadow *next)
+{
+ int i, index, check;
+
+ /* Mark everything invalid */
+ memset(next->shadow, 0xFF, sizeof(next->shadow));
+
+ if (!prev)
+ return; /* No previous shadow */
+
+ /* If the base address didn't change, update is not needed */
+ if (prev->r.min_page == next->r.min_page) {
+ memcpy(next->shadow, prev->shadow, sizeof(next->shadow));
+ return;
+ }
+
+ /* Convert the previous shadow to the new address range */
+ for (i = 0; i < SHADOW_SIZE; ++i) {
+ if (prev->shadow[i] == SHADOW_INVALID)
+ continue;
+
+ index = ptr_to_shadow(next, shadow_to_ptr(prev, i));
+ if (index < 0)
+ continue;
+
+ check = ptr_to_shadow(next,
+ shadow_to_ptr(prev, prev->shadow[i]));
+ if (check < 0)
+ continue;
+
+ next->shadow[index] = (u16)check;
+ }
+}
+
+static void add_module_to_shadow(struct cfi_shadow *s, struct module *mod)
+{
+ unsigned long ptr;
+ unsigned long min_page_addr;
+ unsigned long max_page_addr;
+ unsigned long check = (unsigned long)mod->cfi_check;
+ int check_index = ptr_to_shadow(s, check);
+
+ BUG_ON((check & PAGE_MASK) != check); /* Must be page aligned */
+
+ if (check_index < 0)
+ return; /* Module not addressable with shadow */
+
+ min_page_addr = (unsigned long)mod->core_layout.base & PAGE_MASK;
+ max_page_addr = (unsigned long)mod->core_layout.base +
+ mod->core_layout.text_size;
+ max_page_addr &= PAGE_MASK;
+
+ /* For each page, store the check function index in the shadow */
+ for (ptr = min_page_addr; ptr <= max_page_addr; ptr += PAGE_SIZE) {
+ int index = ptr_to_shadow(s, ptr);
+ if (index >= 0) {
+ /* Assume a page only contains code for one module */
+ BUG_ON(s->shadow[index] != SHADOW_INVALID);
+ s->shadow[index] = (u16)check_index;
+ }
+ }
+}
+
+static void remove_module_from_shadow(struct cfi_shadow *s, struct module *mod)
+{
+ unsigned long ptr;
+ unsigned long min_page_addr;
+ unsigned long max_page_addr;
+
+ min_page_addr = (unsigned long)mod->core_layout.base & PAGE_MASK;
+ max_page_addr = (unsigned long)mod->core_layout.base +
+ mod->core_layout.text_size;
+ max_page_addr &= PAGE_MASK;
+
+ for (ptr = min_page_addr; ptr <= max_page_addr; ptr += PAGE_SIZE) {
+ int index = ptr_to_shadow(s, ptr);
+ if (index >= 0)
+ s->shadow[index] = SHADOW_INVALID;
+ }
+}
+
+typedef void (*update_shadow_fn)(struct cfi_shadow *, struct module *);
+
+static void update_shadow(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr, update_shadow_fn fn)
+{
+ struct cfi_shadow *prev;
+ struct cfi_shadow *next = (struct cfi_shadow *)
+ __get_free_pages(GFP_KERNEL, SHADOW_ORDER);
+
+ BUG_ON(!next);
+
+ next->r.mod_min_addr = min_addr;
+ next->r.mod_max_addr = max_addr;
+ next->r.min_page = min_addr >> PAGE_SHIFT;
+ next->r.max_page = max_addr >> PAGE_SHIFT;
+
+ spin_lock(&shadow_update_lock);
+ prev = rcu_dereference_protected(cfi_shadow, 1);
+ prepare_next_shadow(prev, next);
+
+ fn(next, mod);
+ set_memory_ro((unsigned long)next, SHADOW_PAGES);
+ rcu_assign_pointer(cfi_shadow, next);
+
+ spin_unlock(&shadow_update_lock);
+ synchronize_rcu();
+
+ if (prev) {
+ set_memory_rw((unsigned long)prev, SHADOW_PAGES);
+ free_pages((unsigned long)prev, SHADOW_ORDER);
+ }
+}
+
+void cfi_module_add(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr)
+{
+ update_shadow(mod, min_addr, max_addr, add_module_to_shadow);
+}
+EXPORT_SYMBOL(cfi_module_add);
+
+void cfi_module_remove(struct module *mod, unsigned long min_addr,
+ unsigned long max_addr)
+{
+ update_shadow(mod, min_addr, max_addr, remove_module_from_shadow);
+}
+EXPORT_SYMBOL(cfi_module_remove);
+
+static inline cfi_check_fn ptr_to_check_fn(const struct cfi_shadow __rcu *s,
+ unsigned long ptr)
+{
+ int index;
+ unsigned long check;
+
+ if (unlikely(!s))
+ return NULL; /* No shadow available */
+
+ if (ptr < s->r.mod_min_addr || ptr > s->r.mod_max_addr)
+ return NULL; /* Not in a mapped module */
+
+ index = ptr_to_shadow(s, ptr);
+ if (index < 0)
+ return NULL; /* Cannot be addressed with shadow */
+
+ return (cfi_check_fn)shadow_to_ptr(s, index);
+}
+#endif /* CONFIG_CFI_CLANG_SHADOW */
+
+static inline cfi_check_fn find_module_cfi_check(void *ptr)
+{
+ struct module *mod;
+
+ preempt_disable();
+ mod = __module_address((unsigned long)ptr);
+ preempt_enable();
+
+ if (mod)
+ return mod->cfi_check;
+
+ return CFI_CHECK_FN;
+}
+
+static inline cfi_check_fn find_cfi_check(void *ptr)
+{
+#ifdef CONFIG_CFI_CLANG_SHADOW
+ cfi_check_fn f;
+
+ if (!rcu_access_pointer(cfi_shadow))
+ return CFI_CHECK_FN; /* No loaded modules */
+
+ /* Look up the __cfi_check function to use */
+ rcu_read_lock();
+ f = ptr_to_check_fn(rcu_dereference(cfi_shadow), (unsigned long)ptr);
+ rcu_read_unlock();
+
+ if (f)
+ return f;
+
+ /*
+ * Fall back to find_module_cfi_check, which works also for a larger
+ * module address space, but is slower.
+ */
+#endif /* CONFIG_CFI_CLANG_SHADOW */
+
+ return find_module_cfi_check(ptr);
+}
+
+void cfi_slowpath_handler(uint64_t id, void *ptr, void *diag)
+{
+ cfi_check_fn check = find_cfi_check(ptr);
+
+ if (likely(check))
+ check(id, ptr, diag);
+ else /* Don't allow unchecked modules */
+ handle_cfi_failure(ptr);
+}
+EXPORT_SYMBOL(cfi_slowpath_handler);
+#endif /* CONFIG_MODULES */
+
+void cfi_failure_handler(void *data, void *ptr, void *vtable)
+{
+ handle_cfi_failure(ptr);
+}
+EXPORT_SYMBOL(cfi_failure_handler);
+
+void __cfi_check_fail(void *data, void *ptr)
+{
+ handle_cfi_failure(ptr);
+}
diff --git a/kernel/cgroup/cgroup-v1.c b/kernel/cgroup/cgroup-v1.c
index a2c05d2..292b601 100644
--- a/kernel/cgroup/cgroup-v1.c
+++ b/kernel/cgroup/cgroup-v1.c
@@ -541,7 +541,8 @@
tcred = get_task_cred(task);
if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) &&
!uid_eq(cred->euid, tcred->uid) &&
- !uid_eq(cred->euid, tcred->suid))
+ !uid_eq(cred->euid, tcred->suid) &&
+ !ns_capable(tcred->user_ns, CAP_SYS_NICE))
ret = -EACCES;
put_cred(tcred);
if (ret)
diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
index 4657e29..2982fb7 100644
--- a/kernel/cgroup/cpuset.c
+++ b/kernel/cgroup/cpuset.c
@@ -103,6 +103,7 @@
/* user-configured CPUs and Memory Nodes allow to tasks */
cpumask_var_t cpus_allowed;
+ cpumask_var_t cpus_requested;
nodemask_t mems_allowed;
/* effective CPUs and Memory Nodes allow to tasks */
@@ -412,7 +413,7 @@
static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q)
{
- return cpumask_subset(p->cpus_allowed, q->cpus_allowed) &&
+ return cpumask_subset(p->cpus_requested, q->cpus_requested) &&
nodes_subset(p->mems_allowed, q->mems_allowed) &&
is_cpu_exclusive(p) <= is_cpu_exclusive(q) &&
is_mem_exclusive(p) <= is_mem_exclusive(q);
@@ -511,7 +512,7 @@
cpuset_for_each_child(c, css, par) {
if ((is_cpu_exclusive(trial) || is_cpu_exclusive(c)) &&
c != cur &&
- cpumask_intersects(trial->cpus_allowed, c->cpus_allowed))
+ cpumask_intersects(trial->cpus_requested, c->cpus_requested))
goto out;
if ((is_mem_exclusive(trial) || is_mem_exclusive(c)) &&
c != cur &&
@@ -976,17 +977,18 @@
if (!*buf) {
cpumask_clear(trialcs->cpus_allowed);
} else {
- retval = cpulist_parse(buf, trialcs->cpus_allowed);
+ retval = cpulist_parse(buf, trialcs->cpus_requested);
if (retval < 0)
return retval;
- if (!cpumask_subset(trialcs->cpus_allowed,
- top_cpuset.cpus_allowed))
+ if (!cpumask_subset(trialcs->cpus_requested, cpu_present_mask))
return -EINVAL;
+
+ cpumask_and(trialcs->cpus_allowed, trialcs->cpus_requested, cpu_active_mask);
}
/* Nothing to do if the cpus didn't change */
- if (cpumask_equal(cs->cpus_allowed, trialcs->cpus_allowed))
+ if (cpumask_equal(cs->cpus_requested, trialcs->cpus_requested))
return 0;
retval = validate_change(cs, trialcs);
@@ -995,6 +997,7 @@
spin_lock_irq(&callback_lock);
cpumask_copy(cs->cpus_allowed, trialcs->cpus_allowed);
+ cpumask_copy(cs->cpus_requested, trialcs->cpus_requested);
spin_unlock_irq(&callback_lock);
/* use trialcs->cpus_allowed as a temp variable */
@@ -1763,7 +1766,7 @@
switch (type) {
case FILE_CPULIST:
- seq_printf(sf, "%*pbl\n", cpumask_pr_args(cs->cpus_allowed));
+ seq_printf(sf, "%*pbl\n", cpumask_pr_args(cs->cpus_requested));
break;
case FILE_MEMLIST:
seq_printf(sf, "%*pbl\n", nodemask_pr_args(&cs->mems_allowed));
@@ -1953,11 +1956,14 @@
return ERR_PTR(-ENOMEM);
if (!alloc_cpumask_var(&cs->cpus_allowed, GFP_KERNEL))
goto free_cs;
+ if (!alloc_cpumask_var(&cs->cpus_requested, GFP_KERNEL))
+ goto free_allowed;
if (!alloc_cpumask_var(&cs->effective_cpus, GFP_KERNEL))
- goto free_cpus;
+ goto free_requested;
set_bit(CS_SCHED_LOAD_BALANCE, &cs->flags);
cpumask_clear(cs->cpus_allowed);
+ cpumask_clear(cs->cpus_requested);
nodes_clear(cs->mems_allowed);
cpumask_clear(cs->effective_cpus);
nodes_clear(cs->effective_mems);
@@ -1966,7 +1972,9 @@
return &cs->css;
-free_cpus:
+free_requested:
+ free_cpumask_var(cs->cpus_requested);
+free_allowed:
free_cpumask_var(cs->cpus_allowed);
free_cs:
kfree(cs);
@@ -2029,6 +2037,7 @@
cs->mems_allowed = parent->mems_allowed;
cs->effective_mems = parent->mems_allowed;
cpumask_copy(cs->cpus_allowed, parent->cpus_allowed);
+ cpumask_copy(cs->cpus_requested, parent->cpus_requested);
cpumask_copy(cs->effective_cpus, parent->cpus_allowed);
spin_unlock_irq(&callback_lock);
out_unlock:
@@ -2063,6 +2072,7 @@
free_cpumask_var(cs->effective_cpus);
free_cpumask_var(cs->cpus_allowed);
+ free_cpumask_var(cs->cpus_requested);
kfree(cs);
}
@@ -2125,8 +2135,10 @@
BUG_ON(!alloc_cpumask_var(&top_cpuset.cpus_allowed, GFP_KERNEL));
BUG_ON(!alloc_cpumask_var(&top_cpuset.effective_cpus, GFP_KERNEL));
+ BUG_ON(!alloc_cpumask_var(&top_cpuset.cpus_requested, GFP_KERNEL));
cpumask_setall(top_cpuset.cpus_allowed);
+ cpumask_setall(top_cpuset.cpus_requested);
nodes_setall(top_cpuset.mems_allowed);
cpumask_setall(top_cpuset.effective_cpus);
nodes_setall(top_cpuset.effective_mems);
@@ -2259,7 +2271,7 @@
goto retry;
}
- cpumask_and(&new_cpus, cs->cpus_allowed, parent_cs(cs)->effective_cpus);
+ cpumask_and(&new_cpus, cs->cpus_requested, parent_cs(cs)->effective_cpus);
nodes_and(new_mems, cs->mems_allowed, parent_cs(cs)->effective_mems);
cpus_updated = !cpumask_equal(&new_cpus, cs->effective_cpus);
diff --git a/kernel/configs/android-base.config b/kernel/configs/android-base.config
deleted file mode 100644
index d3fd428..0000000
--- a/kernel/configs/android-base.config
+++ /dev/null
@@ -1,161 +0,0 @@
-# KEEP ALPHABETICALLY SORTED
-# CONFIG_DEVKMEM is not set
-# CONFIG_DEVMEM is not set
-# CONFIG_FHANDLE is not set
-# CONFIG_INET_LRO is not set
-# CONFIG_NFSD is not set
-# CONFIG_NFS_FS is not set
-# CONFIG_OABI_COMPAT is not set
-# CONFIG_SYSVIPC is not set
-# CONFIG_USELIB is not set
-CONFIG_ANDROID=y
-CONFIG_ANDROID_BINDER_IPC=y
-CONFIG_ANDROID_BINDER_DEVICES=binder,hwbinder,vndbinder
-CONFIG_ANDROID_LOW_MEMORY_KILLER=y
-CONFIG_ARMV8_DEPRECATED=y
-CONFIG_ASHMEM=y
-CONFIG_AUDIT=y
-CONFIG_BLK_DEV_INITRD=y
-CONFIG_CGROUPS=y
-CONFIG_CGROUP_BPF=y
-CONFIG_CGROUP_CPUACCT=y
-CONFIG_CGROUP_DEBUG=y
-CONFIG_CGROUP_FREEZER=y
-CONFIG_CGROUP_SCHED=y
-CONFIG_CP15_BARRIER_EMULATION=y
-CONFIG_DEFAULT_SECURITY_SELINUX=y
-CONFIG_EMBEDDED=y
-CONFIG_FB=y
-CONFIG_HARDENED_USERCOPY=y
-CONFIG_HIGH_RES_TIMERS=y
-CONFIG_IKCONFIG=y
-CONFIG_IKCONFIG_PROC=y
-CONFIG_INET6_AH=y
-CONFIG_INET6_ESP=y
-CONFIG_INET6_IPCOMP=y
-CONFIG_INET=y
-CONFIG_INET_DIAG_DESTROY=y
-CONFIG_INET_ESP=y
-CONFIG_INET_XFRM_MODE_TUNNEL=y
-CONFIG_IP6_NF_FILTER=y
-CONFIG_IP6_NF_IPTABLES=y
-CONFIG_IP6_NF_MANGLE=y
-CONFIG_IP6_NF_RAW=y
-CONFIG_IP6_NF_TARGET_REJECT=y
-CONFIG_IPV6=y
-CONFIG_IPV6_MIP6=y
-CONFIG_IPV6_MULTIPLE_TABLES=y
-CONFIG_IPV6_OPTIMISTIC_DAD=y
-CONFIG_IPV6_ROUTER_PREF=y
-CONFIG_IPV6_ROUTE_INFO=y
-CONFIG_IP_ADVANCED_ROUTER=y
-CONFIG_IP_MULTICAST=y
-CONFIG_IP_MULTIPLE_TABLES=y
-CONFIG_IP_NF_ARPFILTER=y
-CONFIG_IP_NF_ARPTABLES=y
-CONFIG_IP_NF_ARP_MANGLE=y
-CONFIG_IP_NF_FILTER=y
-CONFIG_IP_NF_IPTABLES=y
-CONFIG_IP_NF_MANGLE=y
-CONFIG_IP_NF_MATCH_AH=y
-CONFIG_IP_NF_MATCH_ECN=y
-CONFIG_IP_NF_MATCH_TTL=y
-CONFIG_IP_NF_NAT=y
-CONFIG_IP_NF_RAW=y
-CONFIG_IP_NF_SECURITY=y
-CONFIG_IP_NF_TARGET_MASQUERADE=y
-CONFIG_IP_NF_TARGET_NETMAP=y
-CONFIG_IP_NF_TARGET_REDIRECT=y
-CONFIG_IP_NF_TARGET_REJECT=y
-CONFIG_MODULES=y
-CONFIG_MODULE_UNLOAD=y
-CONFIG_MODVERSIONS=y
-CONFIG_NET=y
-CONFIG_NETDEVICES=y
-CONFIG_NETFILTER=y
-CONFIG_NETFILTER_TPROXY=y
-CONFIG_NETFILTER_XT_MATCH_COMMENT=y
-CONFIG_NETFILTER_XT_MATCH_CONNLIMIT=y
-CONFIG_NETFILTER_XT_MATCH_CONNMARK=y
-CONFIG_NETFILTER_XT_MATCH_CONNTRACK=y
-CONFIG_NETFILTER_XT_MATCH_HASHLIMIT=y
-CONFIG_NETFILTER_XT_MATCH_HELPER=y
-CONFIG_NETFILTER_XT_MATCH_IPRANGE=y
-CONFIG_NETFILTER_XT_MATCH_LENGTH=y
-CONFIG_NETFILTER_XT_MATCH_LIMIT=y
-CONFIG_NETFILTER_XT_MATCH_MAC=y
-CONFIG_NETFILTER_XT_MATCH_MARK=y
-CONFIG_NETFILTER_XT_MATCH_PKTTYPE=y
-CONFIG_NETFILTER_XT_MATCH_POLICY=y
-CONFIG_NETFILTER_XT_MATCH_QUOTA=y
-CONFIG_NETFILTER_XT_MATCH_SOCKET=y
-CONFIG_NETFILTER_XT_MATCH_STATE=y
-CONFIG_NETFILTER_XT_MATCH_STATISTIC=y
-CONFIG_NETFILTER_XT_MATCH_STRING=y
-CONFIG_NETFILTER_XT_MATCH_TIME=y
-CONFIG_NETFILTER_XT_MATCH_U32=y
-CONFIG_NETFILTER_XT_TARGET_CLASSIFY=y
-CONFIG_NETFILTER_XT_TARGET_CONNMARK=y
-CONFIG_NETFILTER_XT_TARGET_CONNSECMARK=y
-CONFIG_NETFILTER_XT_TARGET_IDLETIMER=y
-CONFIG_NETFILTER_XT_TARGET_MARK=y
-CONFIG_NETFILTER_XT_TARGET_NFLOG=y
-CONFIG_NETFILTER_XT_TARGET_NFQUEUE=y
-CONFIG_NETFILTER_XT_TARGET_SECMARK=y
-CONFIG_NETFILTER_XT_TARGET_TCPMSS=y
-CONFIG_NETFILTER_XT_TARGET_TPROXY=y
-CONFIG_NETFILTER_XT_TARGET_TRACE=y
-CONFIG_NET_CLS_ACT=y
-CONFIG_NET_CLS_U32=y
-CONFIG_NET_EMATCH=y
-CONFIG_NET_EMATCH_U32=y
-CONFIG_NET_KEY=y
-CONFIG_NET_SCHED=y
-CONFIG_NET_SCH_HTB=y
-CONFIG_NF_CONNTRACK=y
-CONFIG_NF_CONNTRACK_AMANDA=y
-CONFIG_NF_CONNTRACK_EVENTS=y
-CONFIG_NF_CONNTRACK_FTP=y
-CONFIG_NF_CONNTRACK_H323=y
-CONFIG_NF_CONNTRACK_IPV4=y
-CONFIG_NF_CONNTRACK_IPV6=y
-CONFIG_NF_CONNTRACK_IRC=y
-CONFIG_NF_CONNTRACK_NETBIOS_NS=y
-CONFIG_NF_CONNTRACK_PPTP=y
-CONFIG_NF_CONNTRACK_SANE=y
-CONFIG_NF_CONNTRACK_SECMARK=y
-CONFIG_NF_CONNTRACK_TFTP=y
-CONFIG_NF_CT_NETLINK=y
-CONFIG_NF_CT_PROTO_DCCP=y
-CONFIG_NF_CT_PROTO_SCTP=y
-CONFIG_NF_CT_PROTO_UDPLITE=y
-CONFIG_NF_NAT=y
-CONFIG_NO_HZ=y
-CONFIG_PACKET=y
-CONFIG_PM_AUTOSLEEP=y
-CONFIG_PM_WAKELOCKS=y
-CONFIG_PPP=y
-CONFIG_PPP_BSDCOMP=y
-CONFIG_PPP_DEFLATE=y
-CONFIG_PPP_MPPE=y
-CONFIG_PREEMPT=y
-CONFIG_QUOTA=y
-CONFIG_RANDOMIZE_BASE=y
-CONFIG_RTC_CLASS=y
-CONFIG_RT_GROUP_SCHED=y
-CONFIG_SECCOMP=y
-CONFIG_SECURITY=y
-CONFIG_SECURITY_NETWORK=y
-CONFIG_SECURITY_SELINUX=y
-CONFIG_SETEND_EMULATION=y
-CONFIG_STAGING=y
-CONFIG_SWP_EMULATION=y
-CONFIG_SYNC=y
-CONFIG_TUN=y
-CONFIG_UNIX=y
-CONFIG_USB_GADGET=y
-CONFIG_USB_CONFIGFS=y
-CONFIG_USB_CONFIGFS_F_FS=y
-CONFIG_USB_CONFIGFS_F_MIDI=y
-CONFIG_USB_OTG_WAKELOCK=y
-CONFIG_XFRM_USER=y
diff --git a/kernel/configs/android-fetch-configs.sh b/kernel/configs/android-fetch-configs.sh
new file mode 100755
index 0000000..2dcd298
--- /dev/null
+++ b/kernel/configs/android-fetch-configs.sh
@@ -0,0 +1,4 @@
+#!/bin/sh
+
+curl https://android.googlesource.com/kernel/configs/+archive/master/android-4.14.tar.gz | tar xzv
+
diff --git a/kernel/configs/android-recommended.config b/kernel/configs/android-recommended.config
deleted file mode 100644
index 946fb92..0000000
--- a/kernel/configs/android-recommended.config
+++ /dev/null
@@ -1,129 +0,0 @@
-# KEEP ALPHABETICALLY SORTED
-# CONFIG_AIO is not set
-# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
-# CONFIG_INPUT_MOUSE is not set
-# CONFIG_LEGACY_PTYS is not set
-# CONFIG_NF_CONNTRACK_SIP is not set
-# CONFIG_PM_WAKELOCKS_GC is not set
-# CONFIG_VT is not set
-CONFIG_ARM64_SW_TTBR0_PAN=y
-CONFIG_BACKLIGHT_LCD_SUPPORT=y
-CONFIG_BLK_DEV_DM=y
-CONFIG_BLK_DEV_LOOP=y
-CONFIG_BLK_DEV_RAM=y
-CONFIG_BLK_DEV_RAM_SIZE=8192
-CONFIG_CC_STACKPROTECTOR_STRONG=y
-CONFIG_COMPACTION=y
-CONFIG_CPU_SW_DOMAIN_PAN=y
-CONFIG_DM_CRYPT=y
-CONFIG_DM_UEVENT=y
-CONFIG_DM_VERITY=y
-CONFIG_DM_VERITY_FEC=y
-CONFIG_DRAGONRISE_FF=y
-CONFIG_ENABLE_DEFAULT_TRACERS=y
-CONFIG_EXT4_FS=y
-CONFIG_EXT4_FS_SECURITY=y
-CONFIG_FUSE_FS=y
-CONFIG_GREENASIA_FF=y
-CONFIG_HIDRAW=y
-CONFIG_HID_A4TECH=y
-CONFIG_HID_ACRUX=y
-CONFIG_HID_ACRUX_FF=y
-CONFIG_HID_APPLE=y
-CONFIG_HID_BELKIN=y
-CONFIG_HID_CHERRY=y
-CONFIG_HID_CHICONY=y
-CONFIG_HID_CYPRESS=y
-CONFIG_HID_DRAGONRISE=y
-CONFIG_HID_ELECOM=y
-CONFIG_HID_EMS_FF=y
-CONFIG_HID_EZKEY=y
-CONFIG_HID_GREENASIA=y
-CONFIG_HID_GYRATION=y
-CONFIG_HID_HOLTEK=y
-CONFIG_HID_KENSINGTON=y
-CONFIG_HID_KEYTOUCH=y
-CONFIG_HID_KYE=y
-CONFIG_HID_LCPOWER=y
-CONFIG_HID_LOGITECH=y
-CONFIG_HID_LOGITECH_DJ=y
-CONFIG_HID_MAGICMOUSE=y
-CONFIG_HID_MICROSOFT=y
-CONFIG_HID_MONTEREY=y
-CONFIG_HID_MULTITOUCH=y
-CONFIG_HID_NTRIG=y
-CONFIG_HID_ORTEK=y
-CONFIG_HID_PANTHERLORD=y
-CONFIG_HID_PETALYNX=y
-CONFIG_HID_PICOLCD=y
-CONFIG_HID_PRIMAX=y
-CONFIG_HID_PRODIKEYS=y
-CONFIG_HID_ROCCAT=y
-CONFIG_HID_SAITEK=y
-CONFIG_HID_SAMSUNG=y
-CONFIG_HID_SMARTJOYPLUS=y
-CONFIG_HID_SONY=y
-CONFIG_HID_SPEEDLINK=y
-CONFIG_HID_SUNPLUS=y
-CONFIG_HID_THRUSTMASTER=y
-CONFIG_HID_TIVO=y
-CONFIG_HID_TOPSEED=y
-CONFIG_HID_TWINHAN=y
-CONFIG_HID_UCLOGIC=y
-CONFIG_HID_WACOM=y
-CONFIG_HID_WALTOP=y
-CONFIG_HID_WIIMOTE=y
-CONFIG_HID_ZEROPLUS=y
-CONFIG_HID_ZYDACRON=y
-CONFIG_INPUT_EVDEV=y
-CONFIG_INPUT_GPIO=y
-CONFIG_INPUT_JOYSTICK=y
-CONFIG_INPUT_MISC=y
-CONFIG_INPUT_TABLET=y
-CONFIG_INPUT_UINPUT=y
-CONFIG_ION=y
-CONFIG_JOYSTICK_XPAD=y
-CONFIG_JOYSTICK_XPAD_FF=y
-CONFIG_JOYSTICK_XPAD_LEDS=y
-CONFIG_KALLSYMS_ALL=y
-CONFIG_KSM=y
-CONFIG_LOGIG940_FF=y
-CONFIG_LOGIRUMBLEPAD2_FF=y
-CONFIG_LOGITECH_FF=y
-CONFIG_MD=y
-CONFIG_MEDIA_SUPPORT=y
-CONFIG_MSDOS_FS=y
-CONFIG_PANIC_TIMEOUT=5
-CONFIG_PANTHERLORD_FF=y
-CONFIG_PERF_EVENTS=y
-CONFIG_PM_DEBUG=y
-CONFIG_PM_RUNTIME=y
-CONFIG_PM_WAKELOCKS_LIMIT=0
-CONFIG_POWER_SUPPLY=y
-CONFIG_PSTORE=y
-CONFIG_PSTORE_CONSOLE=y
-CONFIG_PSTORE_RAM=y
-CONFIG_SCHEDSTATS=y
-CONFIG_SMARTJOYPLUS_FF=y
-CONFIG_SND=y
-CONFIG_SOUND=y
-CONFIG_STRICT_KERNEL_RWX=y
-CONFIG_SUSPEND_TIME=y
-CONFIG_TABLET_USB_ACECAD=y
-CONFIG_TABLET_USB_AIPTEK=y
-CONFIG_TABLET_USB_GTCO=y
-CONFIG_TABLET_USB_HANWANG=y
-CONFIG_TABLET_USB_KBTAB=y
-CONFIG_TASKSTATS=y
-CONFIG_TASK_DELAY_ACCT=y
-CONFIG_TASK_IO_ACCOUNTING=y
-CONFIG_TASK_XACCT=y
-CONFIG_TIMER_STATS=y
-CONFIG_TMPFS=y
-CONFIG_TMPFS_POSIX_ACL=y
-CONFIG_UHID=y
-CONFIG_USB_ANNOUNCE_NEW_DEVICES=y
-CONFIG_USB_EHCI_HCD=y
-CONFIG_USB_HIDDEV=y
-CONFIG_USB_USBNET=y
-CONFIG_VFAT_FS=y
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f3f389e..13a647b 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1227,6 +1227,7 @@
void enable_nonboot_cpus(void)
{
int cpu, error;
+ struct device *cpu_device;
/* Allow everyone to use the CPU hotplug again */
cpu_maps_update_begin();
@@ -1244,6 +1245,12 @@
trace_suspend_resume(TPS("CPU_ON"), cpu, false);
if (!error) {
pr_info("CPU%d is up\n", cpu);
+ cpu_device = get_cpu_device(cpu);
+ if (!cpu_device)
+ pr_err("%s: failed to get cpu%d device\n",
+ __func__, cpu);
+ else
+ kobject_uevent(&cpu_device->kobj, KOBJ_ONLINE);
continue;
}
pr_warn("Error taking CPU%d up: %d\n", cpu, error);
diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
index ed5d349..8d28e30 100644
--- a/kernel/debug/kdb/kdb_io.c
+++ b/kernel/debug/kdb/kdb_io.c
@@ -217,7 +217,7 @@
int i;
int diag, dtab_count;
int key;
-
+ static int last_crlf;
diag = kdbgetintenv("DTABCOUNT", &dtab_count);
if (diag)
@@ -238,6 +238,9 @@
return buffer;
if (key != 9)
tab = 0;
+ if (key != 10 && key != 13)
+ last_crlf = 0;
+
switch (key) {
case 8: /* backspace */
if (cp > buffer) {
@@ -255,7 +258,12 @@
*cp = tmp;
}
break;
- case 13: /* enter */
+ case 10: /* new line */
+ case 13: /* carriage return */
+ /* handle \n after \r */
+ if (last_crlf && last_crlf != key)
+ break;
+ last_crlf = key;
*lastchar++ = '\n';
*lastchar++ = '\0';
if (!KDB_STATE(KGDB_TRANS)) {
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c394dd..9069886 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -397,8 +397,13 @@
* 0 - disallow raw tracepoint access for unpriv
* 1 - disallow cpu events for unpriv
* 2 - disallow kernel profiling for unpriv
+ * 3 - disallow all unpriv perf event use
*/
+#ifdef CONFIG_SECURITY_PERF_EVENTS_RESTRICT
+int sysctl_perf_event_paranoid __read_mostly = 3;
+#else
int sysctl_perf_event_paranoid __read_mostly = 2;
+#endif
/* Minimum for 512 kiB + 1 user control page */
int sysctl_perf_event_mlock __read_mostly = 512 + (PAGE_SIZE / 1024); /* 'free' kiB per user */
@@ -9977,6 +9982,9 @@
if (flags & ~PERF_FLAG_ALL)
return -EINVAL;
+ if (perf_paranoid_any() && !capable(CAP_SYS_ADMIN))
+ return -EACCES;
+
err = perf_copy_attr(attr_uptr, &attr);
if (err)
return err;
diff --git a/kernel/fork.c b/kernel/fork.c
index 6a219fe..c0f7cf9 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -90,6 +90,7 @@
#include <linux/kcov.h>
#include <linux/livepatch.h>
#include <linux/thread_info.h>
+#include <linux/cpufreq_times.h>
#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -366,6 +367,8 @@
void free_task(struct task_struct *tsk)
{
+ cpufreq_task_times_exit(tsk);
+
#ifndef CONFIG_THREAD_INFO_IN_TASK
/*
* The task is finally done with both the stack and thread_info,
@@ -1596,6 +1599,8 @@
if (!p)
goto fork_out;
+ cpufreq_task_times_init(p);
+
/*
* This _must_ happen before we call free_task(), i.e. before we jump
* to any of the bad_fork_* labels. This is to avoid freeing
@@ -2057,6 +2062,8 @@
struct completion vfork;
struct pid *pid;
+ cpufreq_task_times_alloc(p);
+
trace_sched_process_fork(current, p);
pid = get_task_pid(p, PIDTYPE_PID);
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 127e7cf..3314c0c 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -302,6 +302,24 @@
!!__bpf_address_lookup(addr, symbolsize, offset, namebuf);
}
+#ifdef CONFIG_CFI_CLANG
+/*
+ * LLVM appends .cfi to function names when CONFIG_CFI_CLANG is enabled,
+ * which causes confusion and potentially breaks user space tools, so we
+ * will strip the postfix from expanded symbol names.
+ */
+static inline void cleanup_symbol_name(char *s)
+{
+ char *res;
+
+ res = strrchr(s, '.');
+ if (res && !strcmp(res, ".cfi"))
+ *res = '\0';
+}
+#else
+static inline void cleanup_symbol_name(char *s) {}
+#endif
+
/*
* Lookup an address
* - modname is set to NULL if it's in the kernel.
@@ -328,7 +346,9 @@
namebuf, KSYM_NAME_LEN);
if (modname)
*modname = NULL;
- return namebuf;
+
+ ret = namebuf;
+ goto found;
}
/* See if it's in a module or a BPF JITed image. */
@@ -337,11 +357,16 @@
if (!ret)
ret = bpf_address_lookup(addr, symbolsize,
offset, modname, namebuf);
+
+found:
+ cleanup_symbol_name(namebuf);
return ret;
}
int lookup_symbol_name(unsigned long addr, char *symname)
{
+ int res;
+
symname[0] = '\0';
symname[KSYM_NAME_LEN - 1] = '\0';
@@ -352,15 +377,23 @@
/* Grab name */
kallsyms_expand_symbol(get_symbol_offset(pos),
symname, KSYM_NAME_LEN);
- return 0;
+ goto found;
}
/* See if it's in a module. */
- return lookup_module_symbol_name(addr, symname);
+ res = lookup_module_symbol_name(addr, symname);
+ if (res)
+ return res;
+
+found:
+ cleanup_symbol_name(symname);
+ return 0;
}
int lookup_symbol_attrs(unsigned long addr, unsigned long *size,
unsigned long *offset, char *modname, char *name)
{
+ int res;
+
name[0] = '\0';
name[KSYM_NAME_LEN - 1] = '\0';
@@ -372,10 +405,16 @@
kallsyms_expand_symbol(get_symbol_offset(pos),
name, KSYM_NAME_LEN);
modname[0] = '\0';
- return 0;
+ goto found;
}
/* See if it's in a module. */
- return lookup_module_symbol_attrs(addr, size, offset, modname, name);
+ res = lookup_module_symbol_attrs(addr, size, offset, modname, name);
+ if (res)
+ return res;
+
+found:
+ cleanup_symbol_name(name);
+ return 0;
}
/* Look up a kernel symbol and return it in a text buffer. */
diff --git a/kernel/module.c b/kernel/module.c
index 321b0b1..d89d348 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2121,6 +2121,8 @@
{
}
+static void cfi_cleanup(struct module *mod);
+
/* Free a module, remove from lists, etc. */
static void free_module(struct module *mod)
{
@@ -2162,6 +2164,10 @@
/* This may be empty, but that's OK */
disable_ro_nx(&mod->init_layout);
+
+ /* Clean up CFI for the module. */
+ cfi_cleanup(mod);
+
module_arch_freeing_init(mod);
module_memfree(mod->init_layout.base);
kfree(mod->args);
@@ -3359,6 +3365,8 @@
return 0;
}
+static void cfi_init(struct module *mod);
+
static int post_relocation(struct module *mod, const struct load_info *info)
{
/* Sort exception table now relocations are done. */
@@ -3371,6 +3379,9 @@
/* Setup kallsyms-specific fields. */
add_kallsyms(mod, info);
+ /* Setup CFI for the module. */
+ cfi_init(mod);
+
/* Arch-specific module finalizing. */
return module_finalize(info->hdr, info->sechdrs, mod);
}
@@ -4114,6 +4125,22 @@
}
#endif /* CONFIG_KALLSYMS */
+static void cfi_init(struct module *mod)
+{
+#ifdef CONFIG_CFI_CLANG
+ mod->cfi_check =
+ (cfi_check_fn)mod_find_symname(mod, CFI_CHECK_FN_NAME);
+ cfi_module_add(mod, module_addr_min, module_addr_max);
+#endif
+}
+
+static void cfi_cleanup(struct module *mod)
+{
+#ifdef CONFIG_CFI_CLANG
+ cfi_module_remove(mod, module_addr_min, module_addr_max);
+#endif
+}
+
/* Maximum number of characters written by module_flags() */
#define MODULE_FLAGS_BUF_SIZE (TAINT_FLAGS_COUNT + 4)
diff --git a/kernel/power/Makefile b/kernel/power/Makefile
index a3f79f0e..5c1743d 100644
--- a/kernel/power/Makefile
+++ b/kernel/power/Makefile
@@ -15,3 +15,5 @@
obj-$(CONFIG_PM_WAKELOCKS) += wakelock.o
obj-$(CONFIG_MAGIC_SYSRQ) += poweroff.o
+
+obj-$(CONFIG_SUSPEND) += wakeup_reason.o
diff --git a/kernel/power/process.c b/kernel/power/process.c
index 7381d49..c366e3d 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -22,6 +22,7 @@
#include <linux/kmod.h>
#include <trace/events/power.h>
#include <linux/cpuset.h>
+#include <linux/wakeup_reason.h>
/*
* Timeout for stopping processes
@@ -38,6 +39,9 @@
unsigned int elapsed_msecs;
bool wakeup = false;
int sleep_usecs = USEC_PER_MSEC;
+#ifdef CONFIG_PM_SLEEP
+ char suspend_abort[MAX_SUSPEND_ABORT_LEN];
+#endif
start = ktime_get_boottime();
@@ -67,6 +71,11 @@
break;
if (pm_wakeup_pending()) {
+#ifdef CONFIG_PM_SLEEP
+ pm_get_active_wakeup_sources(suspend_abort,
+ MAX_SUSPEND_ABORT_LEN);
+ log_suspend_abort_reason(suspend_abort);
+#endif
wakeup = true;
break;
}
@@ -85,26 +94,27 @@
elapsed = ktime_sub(end, start);
elapsed_msecs = ktime_to_ms(elapsed);
- if (todo) {
+ if (wakeup) {
pr_cont("\n");
- pr_err("Freezing of tasks %s after %d.%03d seconds "
- "(%d tasks refusing to freeze, wq_busy=%d):\n",
- wakeup ? "aborted" : "failed",
+ pr_err("Freezing of tasks aborted after %d.%03d seconds",
+ elapsed_msecs / 1000, elapsed_msecs % 1000);
+ } else if (todo) {
+ pr_cont("\n");
+ pr_err("Freezing of tasks failed after %d.%03d seconds"
+ " (%d tasks refusing to freeze, wq_busy=%d):\n",
elapsed_msecs / 1000, elapsed_msecs % 1000,
todo - wq_busy, wq_busy);
if (wq_busy)
show_workqueue_state();
- if (!wakeup) {
- read_lock(&tasklist_lock);
- for_each_process_thread(g, p) {
- if (p != current && !freezer_should_skip(p)
- && freezing(p) && !frozen(p))
- sched_show_task(p);
- }
- read_unlock(&tasklist_lock);
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, p) {
+ if (p != current && !freezer_should_skip(p)
+ && freezing(p) && !frozen(p))
+ sched_show_task(p);
}
+ read_unlock(&tasklist_lock);
} else {
pr_cont("(elapsed %d.%03d seconds) ", elapsed_msecs / 1000,
elapsed_msecs % 1000);
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index c0bc2c8..4789ca7 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -31,6 +31,7 @@
#include <trace/events/power.h>
#include <linux/compiler.h>
#include <linux/moduleparam.h>
+#include <linux/wakeup_reason.h>
#include "power.h"
@@ -389,7 +390,8 @@
*/
static int suspend_enter(suspend_state_t state, bool *wakeup)
{
- int error;
+ char suspend_abort[MAX_SUSPEND_ABORT_LEN];
+ int error, last_dev;
error = platform_suspend_prepare(state);
if (error)
@@ -397,7 +399,11 @@
error = dpm_suspend_late(PMSG_SUSPEND);
if (error) {
+ last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
+ last_dev %= REC_FAILED_NUM;
pr_err("late suspend of devices failed\n");
+ log_suspend_abort_reason("%s device failed to power down",
+ suspend_stats.failed_devs[last_dev]);
goto Platform_finish;
}
error = platform_suspend_prepare_late(state);
@@ -411,7 +417,11 @@
error = dpm_suspend_noirq(PMSG_SUSPEND);
if (error) {
+ last_dev = suspend_stats.last_failed_dev + REC_FAILED_NUM - 1;
+ last_dev %= REC_FAILED_NUM;
pr_err("noirq suspend of devices failed\n");
+ log_suspend_abort_reason("noirq suspend of %s device failed",
+ suspend_stats.failed_devs[last_dev]);
goto Platform_early_resume;
}
error = platform_suspend_prepare_noirq(state);
@@ -422,8 +432,10 @@
goto Platform_wake;
error = disable_nonboot_cpus();
- if (error || suspend_test(TEST_CPUS))
+ if (error || suspend_test(TEST_CPUS)) {
+ log_suspend_abort_reason("Disabling non-boot cpus failed");
goto Enable_cpus;
+ }
arch_suspend_disable_irqs();
BUG_ON(!irqs_disabled());
@@ -438,6 +450,9 @@
trace_suspend_resume(TPS("machine_suspend"),
state, false);
} else if (*wakeup) {
+ pm_get_active_wakeup_sources(suspend_abort,
+ MAX_SUSPEND_ABORT_LEN);
+ log_suspend_abort_reason(suspend_abort);
error = -EBUSY;
}
syscore_resume();
@@ -487,6 +502,7 @@
error = dpm_suspend_start(PMSG_SUSPEND);
if (error) {
pr_err("Some devices failed to suspend, or early wake event detected\n");
+ log_suspend_abort_reason("Some devices failed to suspend, or early wake event detected");
goto Recover_platform;
}
suspend_test_finish("suspend devices");
diff --git a/kernel/power/wakeup_reason.c b/kernel/power/wakeup_reason.c
new file mode 100644
index 0000000..252611f
--- /dev/null
+++ b/kernel/power/wakeup_reason.c
@@ -0,0 +1,225 @@
+/*
+ * kernel/power/wakeup_reason.c
+ *
+ * Logs the reasons which caused the kernel to resume from
+ * the suspend mode.
+ *
+ * Copyright (C) 2014 Google, Inc.
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/wakeup_reason.h>
+#include <linux/kernel.h>
+#include <linux/irq.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/kobject.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/spinlock.h>
+#include <linux/notifier.h>
+#include <linux/suspend.h>
+
+
+#define MAX_WAKEUP_REASON_IRQS 32
+static int irq_list[MAX_WAKEUP_REASON_IRQS];
+static int irqcount;
+static bool suspend_abort;
+static char abort_reason[MAX_SUSPEND_ABORT_LEN];
+static struct kobject *wakeup_reason;
+static DEFINE_SPINLOCK(resume_reason_lock);
+
+static ktime_t last_monotime; /* monotonic time before last suspend */
+static ktime_t curr_monotime; /* monotonic time after last suspend */
+static ktime_t last_stime; /* monotonic boottime offset before last suspend */
+static ktime_t curr_stime; /* monotonic boottime offset after last suspend */
+
+static ssize_t last_resume_reason_show(struct kobject *kobj, struct kobj_attribute *attr,
+ char *buf)
+{
+ int irq_no, buf_offset = 0;
+ struct irq_desc *desc;
+ spin_lock(&resume_reason_lock);
+ if (suspend_abort) {
+ buf_offset = sprintf(buf, "Abort: %s", abort_reason);
+ } else {
+ for (irq_no = 0; irq_no < irqcount; irq_no++) {
+ desc = irq_to_desc(irq_list[irq_no]);
+ if (desc && desc->action && desc->action->name)
+ buf_offset += sprintf(buf + buf_offset, "%d %s\n",
+ irq_list[irq_no], desc->action->name);
+ else
+ buf_offset += sprintf(buf + buf_offset, "%d\n",
+ irq_list[irq_no]);
+ }
+ }
+ spin_unlock(&resume_reason_lock);
+ return buf_offset;
+}
+
+static ssize_t last_suspend_time_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct timespec sleep_time;
+ struct timespec total_time;
+ struct timespec suspend_resume_time;
+
+ /*
+ * total_time is calculated from monotonic bootoffsets because
+ * unlike CLOCK_MONOTONIC it include the time spent in suspend state.
+ */
+ total_time = ktime_to_timespec(ktime_sub(curr_stime, last_stime));
+
+ /*
+ * suspend_resume_time is calculated as monotonic (CLOCK_MONOTONIC)
+ * time interval before entering suspend and post suspend.
+ */
+ suspend_resume_time = ktime_to_timespec(ktime_sub(curr_monotime, last_monotime));
+
+ /* sleep_time = total_time - suspend_resume_time */
+ sleep_time = timespec_sub(total_time, suspend_resume_time);
+
+ /* Export suspend_resume_time and sleep_time in pair here. */
+ return sprintf(buf, "%lu.%09lu %lu.%09lu\n",
+ suspend_resume_time.tv_sec, suspend_resume_time.tv_nsec,
+ sleep_time.tv_sec, sleep_time.tv_nsec);
+}
+
+static struct kobj_attribute resume_reason = __ATTR_RO(last_resume_reason);
+static struct kobj_attribute suspend_time = __ATTR_RO(last_suspend_time);
+
+static struct attribute *attrs[] = {
+ &resume_reason.attr,
+ &suspend_time.attr,
+ NULL,
+};
+static struct attribute_group attr_group = {
+ .attrs = attrs,
+};
+
+/*
+ * logs all the wake up reasons to the kernel
+ * stores the irqs to expose them to the userspace via sysfs
+ */
+void log_wakeup_reason(int irq)
+{
+ struct irq_desc *desc;
+ desc = irq_to_desc(irq);
+ if (desc && desc->action && desc->action->name)
+ printk(KERN_INFO "Resume caused by IRQ %d, %s\n", irq,
+ desc->action->name);
+ else
+ printk(KERN_INFO "Resume caused by IRQ %d\n", irq);
+
+ spin_lock(&resume_reason_lock);
+ if (irqcount == MAX_WAKEUP_REASON_IRQS) {
+ spin_unlock(&resume_reason_lock);
+ printk(KERN_WARNING "Resume caused by more than %d IRQs\n",
+ MAX_WAKEUP_REASON_IRQS);
+ return;
+ }
+
+ irq_list[irqcount++] = irq;
+ spin_unlock(&resume_reason_lock);
+}
+
+int check_wakeup_reason(int irq)
+{
+ int irq_no;
+ int ret = false;
+
+ spin_lock(&resume_reason_lock);
+ for (irq_no = 0; irq_no < irqcount; irq_no++)
+ if (irq_list[irq_no] == irq) {
+ ret = true;
+ break;
+ }
+ spin_unlock(&resume_reason_lock);
+ return ret;
+}
+
+void log_suspend_abort_reason(const char *fmt, ...)
+{
+ va_list args;
+
+ spin_lock(&resume_reason_lock);
+
+ //Suspend abort reason has already been logged.
+ if (suspend_abort) {
+ spin_unlock(&resume_reason_lock);
+ return;
+ }
+
+ suspend_abort = true;
+ va_start(args, fmt);
+ vsnprintf(abort_reason, MAX_SUSPEND_ABORT_LEN, fmt, args);
+ va_end(args);
+ spin_unlock(&resume_reason_lock);
+}
+
+/* Detects a suspend and clears all the previous wake up reasons*/
+static int wakeup_reason_pm_event(struct notifier_block *notifier,
+ unsigned long pm_event, void *unused)
+{
+ switch (pm_event) {
+ case PM_SUSPEND_PREPARE:
+ spin_lock(&resume_reason_lock);
+ irqcount = 0;
+ suspend_abort = false;
+ spin_unlock(&resume_reason_lock);
+ /* monotonic time since boot */
+ last_monotime = ktime_get();
+ /* monotonic time since boot including the time spent in suspend */
+ last_stime = ktime_get_boottime();
+ break;
+ case PM_POST_SUSPEND:
+ /* monotonic time since boot */
+ curr_monotime = ktime_get();
+ /* monotonic time since boot including the time spent in suspend */
+ curr_stime = ktime_get_boottime();
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block wakeup_reason_pm_notifier_block = {
+ .notifier_call = wakeup_reason_pm_event,
+};
+
+/* Initializes the sysfs parameter
+ * registers the pm_event notifier
+ */
+int __init wakeup_reason_init(void)
+{
+ int retval;
+
+ retval = register_pm_notifier(&wakeup_reason_pm_notifier_block);
+ if (retval)
+ printk(KERN_WARNING "[%s] failed to register PM notifier %d\n",
+ __func__, retval);
+
+ wakeup_reason = kobject_create_and_add("wakeup_reasons", kernel_kobj);
+ if (!wakeup_reason) {
+ printk(KERN_WARNING "[%s] failed to create a sysfs kobject\n",
+ __func__);
+ return 1;
+ }
+ retval = sysfs_create_group(wakeup_reason, &attr_group);
+ if (retval) {
+ kobject_put(wakeup_reason);
+ printk(KERN_WARNING "[%s] failed to create a sysfs group %d\n",
+ __func__, retval);
+ }
+ return 0;
+}
+
+late_initcall(wakeup_reason_init);
diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index a9ee16b..b9207a9 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -20,9 +20,12 @@
obj-y += idle_task.o fair.o rt.o deadline.o
obj-y += wait.o wait_bit.o swait.o completion.o idle.o
obj-$(CONFIG_SMP) += cpupri.o cpudeadline.o topology.o stop_task.o
+obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += energy.o
+obj-$(CONFIG_SCHED_WALT) += walt.o
obj-$(CONFIG_SCHED_AUTOGROUP) += autogroup.o
obj-$(CONFIG_SCHEDSTATS) += stats.o
obj-$(CONFIG_SCHED_DEBUG) += debug.o
+obj-$(CONFIG_SCHED_TUNE) += tune.o
obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
obj-$(CONFIG_CPU_FREQ) += cpufreq.o
obj-$(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) += cpufreq_schedutil.o
diff --git a/kernel/sched/autogroup.c b/kernel/sched/autogroup.c
index 1d179ab..0c011dd 100644
--- a/kernel/sched/autogroup.c
+++ b/kernel/sched/autogroup.c
@@ -263,7 +263,6 @@
}
#endif /* CONFIG_PROC_FS */
-#ifdef CONFIG_SCHED_DEBUG
int autogroup_path(struct task_group *tg, char *buf, int buflen)
{
if (!task_group_is_autogroup(tg))
@@ -271,4 +270,3 @@
return snprintf(buf, buflen, "%s-%ld", "/autogroup", tg->autogroup->id);
}
-#endif /* CONFIG_SCHED_DEBUG */
diff --git a/kernel/sched/autogroup.h b/kernel/sched/autogroup.h
index 27cd22b..590184b 100644
--- a/kernel/sched/autogroup.h
+++ b/kernel/sched/autogroup.h
@@ -56,11 +56,9 @@
return tg;
}
-#ifdef CONFIG_SCHED_DEBUG
static inline int autogroup_path(struct task_group *tg, char *buf, int buflen)
{
return 0;
}
-#endif
#endif /* CONFIG_SCHED_AUTOGROUP */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e89ed8..b059158 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -39,6 +39,7 @@
#define CREATE_TRACE_POINTS
#include <trace/events/sched.h>
+#include "walt.h"
DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues);
@@ -438,6 +439,8 @@
if (cmpxchg(&node->next, NULL, WAKE_Q_TAIL))
return;
+ head->count++;
+
get_task_struct(task);
/*
@@ -447,6 +450,10 @@
head->lastp = &node->next;
}
+static int
+try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags,
+ int sibling_count_hint);
+
void wake_up_q(struct wake_q_head *head)
{
struct wake_q_node *node = head->first;
@@ -461,10 +468,10 @@
task->wake_q.next = NULL;
/*
- * wake_up_process() implies a wmb() to pair with the queueing
+ * try_to_wake_up() implies a wmb() to pair with the queueing
* in wake_q_add() so as not to miss wakeups.
*/
- wake_up_process(task);
+ try_to_wake_up(task, TASK_NORMAL, 0, head->count);
put_task_struct(task);
}
}
@@ -1205,6 +1212,8 @@
p->sched_class->migrate_task_rq(p);
p->se.nr_migrations++;
perf_event_task_migrate(p);
+
+ walt_fixup_busy_time(p, new_cpu);
}
__set_task_cpu(p, new_cpu);
@@ -1554,12 +1563,14 @@
* The caller (fork, wakeup) owns p->pi_lock, ->cpus_allowed is stable.
*/
static inline
-int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
+int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags,
+ int sibling_count_hint)
{
lockdep_assert_held(&p->pi_lock);
if (p->nr_cpus_allowed > 1)
- cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags);
+ cpu = p->sched_class->select_task_rq(p, cpu, sd_flags, wake_flags,
+ sibling_count_hint);
else
cpu = cpumask_any(&p->cpus_allowed);
@@ -1965,11 +1976,33 @@
*
*/
+#ifdef CONFIG_SMP
+#ifdef CONFIG_SCHED_WALT
+/* utility function to update walt signals at wakeup */
+static inline void walt_try_to_wake_up(struct task_struct *p)
+{
+ struct rq *rq = cpu_rq(task_cpu(p));
+ struct rq_flags rf;
+ u64 wallclock;
+
+ rq_lock_irqsave(rq, &rf);
+ wallclock = walt_ktime_clock();
+ walt_update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
+ walt_update_task_ravg(p, rq, TASK_WAKE, wallclock, 0);
+ rq_unlock_irqrestore(rq, &rf);
+}
+#else
+#define walt_try_to_wake_up(a) {}
+#endif
+#endif
+
/**
* try_to_wake_up - wake up a thread
* @p: the thread to be awakened
* @state: the mask of task states that can be woken
* @wake_flags: wake modifier flags (WF_*)
+ * @sibling_count_hint: A hint at the number of threads that are being woken up
+ * in this event.
*
* If (@state & @p->state) @p->state = TASK_RUNNING.
*
@@ -1982,7 +2015,8 @@
* %false otherwise.
*/
static int
-try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
+try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags,
+ int sibling_count_hint)
{
unsigned long flags;
int cpu, success = 0;
@@ -2060,6 +2094,8 @@
*/
smp_cond_load_acquire(&p->on_cpu, !VAL);
+ walt_try_to_wake_up(p);
+
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
@@ -2068,7 +2104,8 @@
atomic_dec(&task_rq(p)->nr_iowait);
}
- cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags);
+ cpu = select_task_rq(p, p->wake_cpu, SD_BALANCE_WAKE, wake_flags,
+ sibling_count_hint);
if (task_cpu(p) != cpu) {
wake_flags |= WF_MIGRATED;
set_task_cpu(p, cpu);
@@ -2129,6 +2166,11 @@
trace_sched_waking(p);
if (!task_on_rq_queued(p)) {
+ u64 wallclock = walt_ktime_clock();
+
+ walt_update_task_ravg(rq->curr, rq, TASK_UPDATE, wallclock, 0);
+ walt_update_task_ravg(p, rq, TASK_WAKE, wallclock, 0);
+
if (p->in_iowait) {
delayacct_blkio_end(p);
atomic_dec(&rq->nr_iowait);
@@ -2156,13 +2198,13 @@
*/
int wake_up_process(struct task_struct *p)
{
- return try_to_wake_up(p, TASK_NORMAL, 0);
+ return try_to_wake_up(p, TASK_NORMAL, 0, 1);
}
EXPORT_SYMBOL(wake_up_process);
int wake_up_state(struct task_struct *p, unsigned int state)
{
- return try_to_wake_up(p, state, 0);
+ return try_to_wake_up(p, state, 0, 1);
}
/*
@@ -2181,7 +2223,12 @@
p->se.prev_sum_exec_runtime = 0;
p->se.nr_migrations = 0;
p->se.vruntime = 0;
+#ifdef CONFIG_SCHED_WALT
+ p->last_sleep_ts = 0;
+#endif
+
INIT_LIST_HEAD(&p->se.group_node);
+ walt_init_new_task_load(p);
#ifdef CONFIG_FAIR_GROUP_SCHED
p->se.cfs_rq = NULL;
@@ -2458,6 +2505,9 @@
struct rq *rq;
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
+
+ walt_init_new_task_load(p);
+
p->state = TASK_RUNNING;
#ifdef CONFIG_SMP
/*
@@ -2468,13 +2518,15 @@
* Use __set_task_cpu() to avoid calling sched_class::migrate_task_rq,
* as we're not fully set-up yet.
*/
- __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
+ __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0, 1));
#endif
rq = __task_rq_lock(p, &rf);
update_rq_clock(rq);
post_init_entity_util_avg(&p->se);
activate_task(rq, p, ENQUEUE_NOCLOCK);
+ walt_mark_task_starting(p);
+
p->on_rq = TASK_ON_RQ_QUEUED;
trace_sched_wakeup_new(p);
check_preempt_curr(rq, p, WF_FORK);
@@ -2929,7 +2981,7 @@
int dest_cpu;
raw_spin_lock_irqsave(&p->pi_lock, flags);
- dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0);
+ dest_cpu = p->sched_class->select_task_rq(p, task_cpu(p), SD_BALANCE_EXEC, 0, 1);
if (dest_cpu == smp_processor_id())
goto unlock;
@@ -3028,6 +3080,9 @@
rq_lock(rq, &rf);
+ walt_set_window_start(rq, &rf);
+ walt_update_task_ravg(rq->curr, rq, TASK_UPDATE,
+ walt_ktime_clock(), 0);
update_rq_clock(rq);
curr->sched_class->task_tick(rq, curr, 0);
cpu_load_update_active(rq);
@@ -3299,6 +3354,7 @@
struct rq_flags rf;
struct rq *rq;
int cpu;
+ u64 wallclock;
cpu = smp_processor_id();
rq = cpu_rq(cpu);
@@ -3354,10 +3410,17 @@
}
next = pick_next_task(rq, prev, &rf);
+ wallclock = walt_ktime_clock();
+ walt_update_task_ravg(prev, rq, PUT_PREV_TASK, wallclock, 0);
+ walt_update_task_ravg(next, rq, PICK_NEXT_TASK, wallclock, 0);
clear_tsk_need_resched(prev);
clear_preempt_need_resched();
if (likely(prev != next)) {
+#ifdef CONFIG_SCHED_WALT
+ if (!prev->on_rq)
+ prev->last_sleep_ts = wallclock;
+#endif
rq->nr_switches++;
rq->curr = next;
/*
@@ -3618,7 +3681,7 @@
int default_wake_function(wait_queue_entry_t *curr, unsigned mode, int wake_flags,
void *key)
{
- return try_to_wake_up(curr->private, mode, wake_flags);
+ return try_to_wake_up(curr->private, mode, wake_flags, 1);
}
EXPORT_SYMBOL(default_wake_function);
@@ -5706,6 +5769,9 @@
sched_ttwu_pending();
rq_lock_irqsave(rq, &rf);
+
+ walt_migrate_sync_cpu(cpu);
+
if (rq->rd) {
BUG_ON(!cpumask_test_cpu(cpu, rq->rd->span));
set_rq_offline(rq);
@@ -5913,12 +5979,18 @@
rq->idle_stamp = 0;
rq->avg_idle = 2*sysctl_sched_migration_cost;
rq->max_idle_balance_cost = sysctl_sched_migration_cost;
+#ifdef CONFIG_SCHED_WALT
+ rq->cur_irqload = 0;
+ rq->avg_irqload = 0;
+ rq->irqload_ts = 0;
+#endif
INIT_LIST_HEAD(&rq->cfs_tasks);
rq_attach_root(rq, &def_root_domain);
#ifdef CONFIG_NO_HZ_COMMON
rq->last_load_update_tick = jiffies;
+ rq->last_blocked_load_update_tick = jiffies;
rq->nohz_flags = 0;
#endif
#ifdef CONFIG_NO_HZ_FULL
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 81eb789..d70794a 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -19,11 +19,14 @@
#include "sched.h"
+unsigned long boosted_cpu_util(int cpu, unsigned long other_util);
+
#define SUGOV_KTHREAD_PRIORITY 50
struct sugov_tunables {
struct gov_attr_set attr_set;
- unsigned int rate_limit_us;
+ unsigned int up_rate_limit_us;
+ unsigned int down_rate_limit_us;
};
struct sugov_policy {
@@ -34,7 +37,9 @@
raw_spinlock_t update_lock; /* For shared policies */
u64 last_freq_update_time;
- s64 freq_update_delay_ns;
+ s64 min_rate_limit_ns;
+ s64 up_rate_delay_ns;
+ s64 down_rate_delay_ns;
unsigned int next_freq;
unsigned int cached_raw_freq;
@@ -111,8 +116,32 @@
return true;
}
+ /* No need to recalculate next freq for min_rate_limit_us
+ * at least. However we might still decide to further rate
+ * limit once frequency change direction is decided, according
+ * to the separate rate limits.
+ */
+
delta_ns = time - sg_policy->last_freq_update_time;
- return delta_ns >= sg_policy->freq_update_delay_ns;
+ return delta_ns >= sg_policy->min_rate_limit_ns;
+}
+
+static bool sugov_up_down_rate_limit(struct sugov_policy *sg_policy, u64 time,
+ unsigned int next_freq)
+{
+ s64 delta_ns;
+
+ delta_ns = time - sg_policy->last_freq_update_time;
+
+ if (next_freq > sg_policy->next_freq &&
+ delta_ns < sg_policy->up_rate_delay_ns)
+ return true;
+
+ if (next_freq < sg_policy->next_freq &&
+ delta_ns < sg_policy->down_rate_delay_ns)
+ return true;
+
+ return false;
}
static void sugov_update_commit(struct sugov_policy *sg_policy, u64 time,
@@ -123,6 +152,9 @@
if (sg_policy->next_freq == next_freq)
return;
+ if (sugov_up_down_rate_limit(sg_policy, time, next_freq))
+ return;
+
sg_policy->next_freq = next_freq;
sg_policy->last_freq_update_time = time;
@@ -178,13 +210,15 @@
static void sugov_get_util(unsigned long *util, unsigned long *max, int cpu)
{
- struct rq *rq = cpu_rq(cpu);
- unsigned long cfs_max;
+ unsigned long max_cap, rt;
- cfs_max = arch_scale_cpu_capacity(NULL, cpu);
+ max_cap = arch_scale_cpu_capacity(NULL, cpu);
- *util = min(rq->cfs.avg.util_avg, cfs_max);
- *max = cfs_max;
+ rt = sched_get_rt_rq_util(cpu);
+
+ *util = boosted_cpu_util(cpu, rt);
+ *util = min(*util, max_cap);
+ *max = max_cap;
}
static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
@@ -272,7 +306,7 @@
busy = sugov_cpu_is_busy(sg_cpu);
- if (flags & SCHED_CPUFREQ_RT_DL) {
+ if (flags & SCHED_CPUFREQ_DL) {
next_f = policy->cpuinfo.max_freq;
} else {
sugov_get_util(&util, &max, sg_cpu->cpu);
@@ -290,6 +324,7 @@
sg_policy->cached_raw_freq = 0;
}
}
+
sugov_update_commit(sg_policy, time, next_f);
}
@@ -318,7 +353,7 @@
j_sg_cpu->iowait_boost_pending = false;
continue;
}
- if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
+ if (j_sg_cpu->flags & SCHED_CPUFREQ_DL)
return policy->cpuinfo.max_freq;
j_util = j_sg_cpu->util;
@@ -354,7 +389,7 @@
sg_cpu->last_update = time;
if (sugov_should_update_freq(sg_policy, time)) {
- if (flags & SCHED_CPUFREQ_RT_DL)
+ if (flags & SCHED_CPUFREQ_DL)
next_f = sg_policy->policy->cpuinfo.max_freq;
else
next_f = sugov_next_freq_shared(sg_cpu, time);
@@ -409,15 +444,32 @@
return container_of(attr_set, struct sugov_tunables, attr_set);
}
-static ssize_t rate_limit_us_show(struct gov_attr_set *attr_set, char *buf)
+static DEFINE_MUTEX(min_rate_lock);
+
+static void update_min_rate_limit_ns(struct sugov_policy *sg_policy)
+{
+ mutex_lock(&min_rate_lock);
+ sg_policy->min_rate_limit_ns = min(sg_policy->up_rate_delay_ns,
+ sg_policy->down_rate_delay_ns);
+ mutex_unlock(&min_rate_lock);
+}
+
+static ssize_t up_rate_limit_us_show(struct gov_attr_set *attr_set, char *buf)
{
struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
- return sprintf(buf, "%u\n", tunables->rate_limit_us);
+ return sprintf(buf, "%u\n", tunables->up_rate_limit_us);
}
-static ssize_t rate_limit_us_store(struct gov_attr_set *attr_set, const char *buf,
- size_t count)
+static ssize_t down_rate_limit_us_show(struct gov_attr_set *attr_set, char *buf)
+{
+ struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+
+ return sprintf(buf, "%u\n", tunables->down_rate_limit_us);
+}
+
+static ssize_t up_rate_limit_us_store(struct gov_attr_set *attr_set,
+ const char *buf, size_t count)
{
struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
struct sugov_policy *sg_policy;
@@ -426,18 +478,42 @@
if (kstrtouint(buf, 10, &rate_limit_us))
return -EINVAL;
- tunables->rate_limit_us = rate_limit_us;
+ tunables->up_rate_limit_us = rate_limit_us;
- list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook)
- sg_policy->freq_update_delay_ns = rate_limit_us * NSEC_PER_USEC;
+ list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+ sg_policy->up_rate_delay_ns = rate_limit_us * NSEC_PER_USEC;
+ update_min_rate_limit_ns(sg_policy);
+ }
return count;
}
-static struct governor_attr rate_limit_us = __ATTR_RW(rate_limit_us);
+static ssize_t down_rate_limit_us_store(struct gov_attr_set *attr_set,
+ const char *buf, size_t count)
+{
+ struct sugov_tunables *tunables = to_sugov_tunables(attr_set);
+ struct sugov_policy *sg_policy;
+ unsigned int rate_limit_us;
+
+ if (kstrtouint(buf, 10, &rate_limit_us))
+ return -EINVAL;
+
+ tunables->down_rate_limit_us = rate_limit_us;
+
+ list_for_each_entry(sg_policy, &attr_set->policy_list, tunables_hook) {
+ sg_policy->down_rate_delay_ns = rate_limit_us * NSEC_PER_USEC;
+ update_min_rate_limit_ns(sg_policy);
+ }
+
+ return count;
+}
+
+static struct governor_attr up_rate_limit_us = __ATTR_RW(up_rate_limit_us);
+static struct governor_attr down_rate_limit_us = __ATTR_RW(down_rate_limit_us);
static struct attribute *sugov_attributes[] = {
- &rate_limit_us.attr,
+ &up_rate_limit_us.attr,
+ &down_rate_limit_us.attr,
NULL
};
@@ -584,7 +660,8 @@
goto stop_kthread;
}
- tunables->rate_limit_us = cpufreq_policy_transition_delay_us(policy);
+ tunables->up_rate_limit_us = cpufreq_policy_transition_delay_us(policy);
+ tunables->down_rate_limit_us = cpufreq_policy_transition_delay_us(policy);
policy->governor_data = sg_policy;
sg_policy->tunables = tunables;
@@ -643,7 +720,11 @@
struct sugov_policy *sg_policy = policy->governor_data;
unsigned int cpu;
- sg_policy->freq_update_delay_ns = sg_policy->tunables->rate_limit_us * NSEC_PER_USEC;
+ sg_policy->up_rate_delay_ns =
+ sg_policy->tunables->up_rate_limit_us * NSEC_PER_USEC;
+ sg_policy->down_rate_delay_ns =
+ sg_policy->tunables->down_rate_limit_us * NSEC_PER_USEC;
+ update_min_rate_limit_ns(sg_policy);
sg_policy->last_freq_update_time = 0;
sg_policy->next_freq = UINT_MAX;
sg_policy->work_in_progress = false;
@@ -656,7 +737,7 @@
memset(sg_cpu, 0, sizeof(*sg_cpu));
sg_cpu->cpu = cpu;
sg_cpu->sg_policy = sg_policy;
- sg_cpu->flags = SCHED_CPUFREQ_RT;
+ sg_cpu->flags = SCHED_CPUFREQ_DL;
sg_cpu->iowait_boost_max = policy->cpuinfo.max_freq;
}
diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
index 14d2dbf..d2d874f 100644
--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -5,7 +5,9 @@
#include <linux/static_key.h>
#include <linux/context_tracking.h>
#include <linux/sched/cputime.h>
+#include <linux/cpufreq_times.h>
#include "sched.h"
+#include "walt.h"
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
@@ -55,11 +57,18 @@
struct irqtime *irqtime = this_cpu_ptr(&cpu_irqtime);
s64 delta;
int cpu;
+#ifdef CONFIG_SCHED_WALT
+ u64 wallclock;
+ bool account = true;
+#endif
if (!sched_clock_irqtime)
return;
cpu = smp_processor_id();
+#ifdef CONFIG_SCHED_WALT
+ wallclock = sched_clock_cpu(cpu);
+#endif
delta = sched_clock_cpu(cpu) - irqtime->irq_start_time;
irqtime->irq_start_time += delta;
@@ -73,6 +82,13 @@
irqtime_account_delta(irqtime, delta, CPUTIME_IRQ);
else if (in_serving_softirq() && curr != this_cpu_ksoftirqd())
irqtime_account_delta(irqtime, delta, CPUTIME_SOFTIRQ);
+#ifdef CONFIG_SCHED_WALT
+ else
+ account = false;
+
+ if (account)
+ walt_account_irqtime(cpu, curr, delta, wallclock);
+#endif
}
EXPORT_SYMBOL_GPL(irqtime_account_irq);
@@ -132,6 +148,9 @@
/* Account for user time used */
acct_account_cputime(p);
+
+ /* Account power usage for user time */
+ cpufreq_acct_update_power(p, cputime);
}
/*
@@ -176,6 +195,9 @@
/* Account for system time used */
acct_account_cputime(p);
+
+ /* Account power usage for system time */
+ cpufreq_acct_update_power(p, cputime);
}
/*
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index b2589c7..52b2341 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -20,6 +20,8 @@
#include <linux/slab.h>
#include <uapi/linux/sched/types.h>
+#include "walt.h"
+
struct dl_bandwidth def_dl_bandwidth;
static inline struct task_struct *dl_task_of(struct sched_dl_entity *dl_se)
@@ -1290,6 +1292,7 @@
WARN_ON(!dl_prio(prio));
dl_rq->dl_nr_running++;
add_nr_running(rq_of_dl_rq(dl_rq), 1);
+ walt_inc_cumulative_runnable_avg(rq_of_dl_rq(dl_rq), dl_task_of(dl_se));
inc_dl_deadline(dl_rq, deadline);
inc_dl_migration(dl_se, dl_rq);
@@ -1304,6 +1307,7 @@
WARN_ON(!dl_rq->dl_nr_running);
dl_rq->dl_nr_running--;
sub_nr_running(rq_of_dl_rq(dl_rq), 1);
+ walt_dec_cumulative_runnable_avg(rq_of_dl_rq(dl_rq), dl_task_of(dl_se));
dec_dl_deadline(dl_rq, dl_se->deadline);
dec_dl_migration(dl_se, dl_rq);
@@ -1509,7 +1513,8 @@
static int find_later_rq(struct task_struct *task);
static int
-select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_dl(struct task_struct *p, int cpu, int sd_flag, int flags,
+ int sibling_count_hint)
{
struct task_struct *curr;
struct rq *rq;
@@ -2021,7 +2026,9 @@
deactivate_task(rq, next_task, 0);
sub_running_bw(next_task->dl.dl_bw, &rq->dl);
sub_rq_bw(next_task->dl.dl_bw, &rq->dl);
+ next_task->on_rq = TASK_ON_RQ_MIGRATING;
set_task_cpu(next_task, later_rq->cpu);
+ next_task->on_rq = TASK_ON_RQ_QUEUED;
add_rq_bw(next_task->dl.dl_bw, &later_rq->dl);
add_running_bw(next_task->dl.dl_bw, &later_rq->dl);
activate_task(later_rq, next_task, 0);
@@ -2113,7 +2120,9 @@
deactivate_task(src_rq, p, 0);
sub_running_bw(p->dl.dl_bw, &src_rq->dl);
sub_rq_bw(p->dl.dl_bw, &src_rq->dl);
+ p->on_rq = TASK_ON_RQ_MIGRATING;
set_task_cpu(p, this_cpu);
+ p->on_rq = TASK_ON_RQ_QUEUED;
add_rq_bw(p->dl.dl_bw, &this_rq->dl);
add_running_bw(p->dl.dl_bw, &this_rq->dl);
activate_task(this_rq, p, 0);
diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 2f93e4a..4e75fda 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -267,9 +267,60 @@
}
static struct ctl_table *
+sd_alloc_ctl_energy_table(struct sched_group_energy *sge)
+{
+ struct ctl_table *table = sd_alloc_ctl_entry(5);
+
+ if (table == NULL)
+ return NULL;
+
+ set_table_entry(&table[0], "nr_idle_states", &sge->nr_idle_states,
+ sizeof(int), 0644, proc_dointvec_minmax, false);
+ set_table_entry(&table[1], "idle_states", &sge->idle_states[0].power,
+ sge->nr_idle_states*sizeof(struct idle_state), 0644,
+ proc_doulongvec_minmax, false);
+ set_table_entry(&table[2], "nr_cap_states", &sge->nr_cap_states,
+ sizeof(int), 0644, proc_dointvec_minmax, false);
+ set_table_entry(&table[3], "cap_states", &sge->cap_states[0].cap,
+ sge->nr_cap_states*sizeof(struct capacity_state), 0644,
+ proc_doulongvec_minmax, false);
+
+ return table;
+}
+
+static struct ctl_table *
+sd_alloc_ctl_group_table(struct sched_group *sg)
+{
+ struct ctl_table *table = sd_alloc_ctl_entry(2);
+
+ if (table == NULL)
+ return NULL;
+
+ table->procname = kstrdup("energy", GFP_KERNEL);
+ table->mode = 0555;
+ table->child = sd_alloc_ctl_energy_table((struct sched_group_energy *)sg->sge);
+
+ return table;
+}
+
+static struct ctl_table *
sd_alloc_ctl_domain_table(struct sched_domain *sd)
{
- struct ctl_table *table = sd_alloc_ctl_entry(14);
+ struct ctl_table *table;
+ unsigned int nr_entries = 14;
+
+ int i = 0;
+ struct sched_group *sg = sd->groups;
+
+ if (sg->sge) {
+ int nr_sgs = 0;
+
+ do {} while (nr_sgs++, sg = sg->next, sg != sd->groups);
+
+ nr_entries += nr_sgs;
+ }
+
+ table = sd_alloc_ctl_entry(nr_entries);
if (table == NULL)
return NULL;
@@ -302,7 +353,19 @@
sizeof(long), 0644, proc_doulongvec_minmax, false);
set_table_entry(&table[12], "name", sd->name,
CORENAME_MAX_SIZE, 0444, proc_dostring, false);
- /* &table[13] is terminator */
+ sg = sd->groups;
+ if (sg->sge) {
+ char buf[32];
+ struct ctl_table *entry = &table[13];
+
+ do {
+ snprintf(buf, 32, "group%d", i);
+ entry->procname = kstrdup(buf, GFP_KERNEL);
+ entry->mode = 0555;
+ entry->child = sd_alloc_ctl_group_table(sg);
+ } while (entry++, i++, sg = sg->next, sg != sd->groups);
+ }
+ /* &table[nr_entries-1] is terminator */
return table;
}
@@ -564,6 +627,8 @@
cfs_rq->runnable_load_avg);
SEQ_printf(m, " .%-30s: %lu\n", "util_avg",
cfs_rq->avg.util_avg);
+ SEQ_printf(m, " .%-30s: %u\n", "util_est_enqueued",
+ cfs_rq->avg.util_est.enqueued);
SEQ_printf(m, " .%-30s: %ld\n", "removed_load_avg",
atomic_long_read(&cfs_rq->removed_load_avg));
SEQ_printf(m, " .%-30s: %ld\n", "removed_util_avg",
@@ -1010,6 +1075,8 @@
P(se.avg.load_avg);
P(se.avg.util_avg);
P(se.avg.last_update_time);
+ P(se.avg.util_est.ewma);
+ P(se.avg.util_est.enqueued);
#endif
P(policy);
P(prio);
diff --git a/kernel/sched/energy.c b/kernel/sched/energy.c
new file mode 100644
index 0000000..8342de0
--- /dev/null
+++ b/kernel/sched/energy.c
@@ -0,0 +1,306 @@
+/*
+ * Obtain energy cost data from DT and populate relevant scheduler data
+ * structures.
+ *
+ * Copyright (C) 2015 ARM Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+#define pr_fmt(fmt) "sched-energy: " fmt
+
+#include <linux/gfp.h>
+#include <linux/of.h>
+#include <linux/printk.h>
+#include <linux/sched.h>
+#include <linux/sched/topology.h>
+#include <linux/sched/energy.h>
+#include <linux/stddef.h>
+#include <linux/arch_topology.h>
+#include <linux/cpu.h>
+#include <linux/pm_opp.h>
+#include <linux/platform_device.h>
+
+#include "sched.h"
+
+struct sched_group_energy *sge_array[NR_CPUS][NR_SD_LEVELS];
+
+static void free_resources(void)
+{
+ int cpu, sd_level;
+ struct sched_group_energy *sge;
+
+ for_each_possible_cpu(cpu) {
+ for_each_possible_sd_level(sd_level) {
+ sge = sge_array[cpu][sd_level];
+ if (sge) {
+ kfree(sge->cap_states);
+ kfree(sge->idle_states);
+ kfree(sge);
+ }
+ }
+ }
+}
+static bool sge_ready;
+static bool freq_energy_model;
+
+void check_max_cap_vs_cpu_scale(int cpu, struct sched_group_energy *sge)
+{
+ unsigned long max_cap, cpu_scale;
+
+ max_cap = sge->cap_states[sge->nr_cap_states - 1].cap;
+ cpu_scale = topology_get_cpu_scale(NULL, cpu);
+
+ if (max_cap == cpu_scale)
+ return;
+
+ pr_warn("CPU%d max energy model capacity=%ld != cpu_scale=%ld\n", cpu,
+ max_cap, cpu_scale);
+}
+
+void init_sched_energy_costs(void)
+{
+ struct device_node *cn, *cp;
+ struct capacity_state *cap_states;
+ struct idle_state *idle_states;
+ struct sched_group_energy *sge;
+ const struct property *prop;
+ int sd_level, i, nstates, cpu;
+ const __be32 *val;
+
+ for_each_possible_cpu(cpu) {
+ cn = of_get_cpu_node(cpu, NULL);
+ if (!cn) {
+ pr_warn("CPU device node missing for CPU %d\n", cpu);
+ return;
+ }
+
+ if (!of_find_property(cn, "sched-energy-costs", NULL)) {
+ pr_warn("CPU device node has no sched-energy-costs\n");
+ return;
+ }
+ /* Check if the energy model contains frequency/power values */
+ if (of_find_property(cn, "freq-energy-model", NULL))
+ freq_energy_model = true;
+
+ for_each_possible_sd_level(sd_level) {
+ cp = of_parse_phandle(cn, "sched-energy-costs", sd_level);
+ if (!cp)
+ break;
+
+ prop = of_find_property(cp, "busy-cost-data", NULL);
+ if (!prop || !prop->value) {
+ pr_warn("No busy-cost data, skipping sched_energy init\n");
+ goto out;
+ }
+
+ sge = kcalloc(1, sizeof(struct sched_group_energy),
+ GFP_NOWAIT);
+
+ nstates = (prop->length / sizeof(u32)) / 2;
+ cap_states = kcalloc(nstates,
+ sizeof(struct capacity_state),
+ GFP_NOWAIT);
+
+ for (i = 0, val = prop->value; i < nstates; i++) {
+ if (freq_energy_model) {
+ /*
+ * Capacity values will be calculated later using
+ * frequency reported by OPP driver and cpu_uarch_scale
+ * values.
+ */
+ cap_states[i].frequency = be32_to_cpup(val++);
+ cap_states[i].cap = 0;
+ } else {
+ cap_states[i].frequency = 0;
+ cap_states[i].cap = be32_to_cpup(val++);
+ }
+ cap_states[i].power = be32_to_cpup(val++);
+ }
+
+ sge->nr_cap_states = nstates;
+ sge->cap_states = cap_states;
+
+ prop = of_find_property(cp, "idle-cost-data", NULL);
+ if (!prop || !prop->value) {
+ pr_warn("No idle-cost data, skipping sched_energy init\n");
+ goto out;
+ }
+
+ nstates = (prop->length / sizeof(u32));
+ idle_states = kcalloc(nstates,
+ sizeof(struct idle_state),
+ GFP_NOWAIT);
+
+ for (i = 0, val = prop->value; i < nstates; i++)
+ idle_states[i].power = be32_to_cpup(val++);
+
+ sge->nr_idle_states = nstates;
+ sge->idle_states = idle_states;
+
+ sge_array[cpu][sd_level] = sge;
+ }
+ if (!freq_energy_model)
+ check_max_cap_vs_cpu_scale(cpu, sge_array[cpu][SD_LEVEL0]);
+ }
+ sge_ready = true;
+ pr_info("Sched-energy-costs installed from DT\n");
+ return;
+
+out:
+ free_resources();
+}
+
+static int sched_energy_probe(struct platform_device *pdev)
+{
+ int cpu;
+ unsigned long *max_frequencies = NULL;
+ int ret;
+
+ if (!sge_ready)
+ return -EPROBE_DEFER;
+
+ if (!energy_aware() || !freq_energy_model)
+ return 0;
+
+ max_frequencies = kmalloc_array(nr_cpu_ids, sizeof(unsigned long),
+ GFP_KERNEL);
+ if (!max_frequencies) {
+ ret = -ENOMEM;
+ goto exit;
+ }
+
+ /*
+ * Find system max possible frequency and max frequencies for each
+ * CPUs.
+ */
+ for_each_possible_cpu(cpu) {
+ struct device *cpu_dev;
+ struct dev_pm_opp *opp;
+
+ cpu_dev = get_cpu_device(cpu);
+ if (IS_ERR_OR_NULL(cpu_dev)) {
+ if (!cpu_dev)
+ ret = -EINVAL;
+ else
+ ret = PTR_ERR(cpu_dev);
+ goto exit;
+ }
+
+ max_frequencies[cpu] = ULONG_MAX;
+
+ opp = dev_pm_opp_find_freq_floor(cpu_dev,
+ &max_frequencies[cpu]);
+ if (IS_ERR_OR_NULL(opp)) {
+ if (!opp || PTR_ERR(opp) == -ENODEV)
+ ret = -EPROBE_DEFER;
+ else
+ ret = PTR_ERR(opp);
+ goto exit;
+ }
+
+ /* Convert HZ to KHZ */
+ max_frequencies[cpu] /= 1000;
+ }
+
+ /* update capacity in energy model */
+ for_each_possible_cpu(cpu) {
+ unsigned long cpu_max_cap;
+ struct sched_group_energy *sge_l0, *sge;
+ cpu_max_cap = topology_get_cpu_scale(NULL, cpu);
+
+ /*
+ * All the cap_states have same frequency table so use
+ * SD_LEVEL0's.
+ */
+ sge_l0 = sge_array[cpu][SD_LEVEL0];
+ if (sge_l0 && sge_l0->nr_cap_states > 0) {
+ int i;
+ int ncapstates = sge_l0->nr_cap_states;
+
+ for (i = 0; i < ncapstates; i++) {
+ int sd_level;
+ unsigned long freq, cap;
+
+ /*
+ * Energy model can contain more frequency
+ * steps than actual for multiple speedbin
+ * support. Ceil the max capacity with actual
+ * one.
+ */
+ freq = min(sge_l0->cap_states[i].frequency,
+ max_frequencies[cpu]);
+ cap = DIV_ROUND_UP(cpu_max_cap * freq,
+ max_frequencies[cpu]);
+
+ for_each_possible_sd_level(sd_level) {
+ sge = sge_array[cpu][sd_level];
+ if (!sge)
+ break;
+ sge->cap_states[i].cap = cap;
+ }
+ dev_dbg(&pdev->dev,
+ "cpu=%d freq=%ld cap=%ld power_d0=%ld\n",
+ cpu, freq, sge_l0->cap_states[i].cap,
+ sge_l0->cap_states[i].power);
+ }
+
+ dev_info(&pdev->dev,
+ "cpu=%d [freq=%ld cap=%ld power_d0=%ld] -> [freq=%ld cap=%ld power_d0=%ld]\n",
+ cpu,
+ sge_l0->cap_states[0].frequency,
+ sge_l0->cap_states[0].cap,
+ sge_l0->cap_states[0].power,
+ sge_l0->cap_states[ncapstates - 1].frequency,
+ sge_l0->cap_states[ncapstates - 1].cap,
+ sge_l0->cap_states[ncapstates - 1].power
+ );
+ }
+ }
+
+ kfree(max_frequencies);
+ dev_info(&pdev->dev, "Sched-energy-costs capacity updated\n");
+ return 0;
+
+exit:
+ if (ret != -EPROBE_DEFER)
+ dev_err(&pdev->dev, "error=%d\n", ret);
+ kfree(max_frequencies);
+ return ret;
+}
+
+static struct platform_driver energy_driver = {
+ .driver = {
+ .name = "sched-energy",
+ },
+ .probe = sched_energy_probe,
+};
+
+static struct platform_device energy_device = {
+ .name = "sched-energy",
+};
+
+static int __init sched_energy_init(void)
+{
+ int ret;
+
+ ret = platform_device_register(&energy_device);
+ if (ret)
+ pr_err("%s device_register failed:%d\n", __func__, ret);
+ ret = platform_driver_register(&energy_driver);
+ if (ret) {
+ pr_err("%s driver_register failed:%d\n", __func__, ret);
+ platform_device_unregister(&energy_device);
+ }
+ return ret;
+}
+subsys_initcall(sched_energy_init);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0cc7098..bf2b794 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -37,6 +37,8 @@
#include <trace/events/sched.h>
#include "sched.h"
+#include "tune.h"
+#include "walt.h"
/*
* Targeted preemption latency for CPU-bound tasks:
@@ -55,6 +57,15 @@
unsigned int normalized_sysctl_sched_latency = 6000000ULL;
/*
+ * Enable/disable honoring sync flag in energy-aware wakeups.
+ */
+unsigned int sysctl_sched_sync_hint_enable = 1;
+/*
+ * Enable/disable using cstate knowledge in idle sibling selection
+ */
+unsigned int sysctl_sched_cstate_aware = 1;
+
+/*
* The initial- and re-scaling of tunables is configurable
*
* Options are:
@@ -100,6 +111,13 @@
const_debug unsigned int sysctl_sched_migration_cost = 500000UL;
+#ifdef CONFIG_SCHED_WALT
+unsigned int sysctl_sched_use_walt_cpu_util = 1;
+unsigned int sysctl_sched_use_walt_task_util = 1;
+__read_mostly unsigned int sysctl_sched_walt_cpu_high_irqload =
+ (10 * NSEC_PER_MSEC);
+#endif
+
#ifdef CONFIG_SMP
/*
* For asym packing, by default the lower numbered cpu has higher priority.
@@ -711,6 +729,7 @@
static int select_idle_sibling(struct task_struct *p, int prev_cpu, int cpu);
static unsigned long task_h_load(struct task_struct *p);
+static unsigned long capacity_of(int cpu);
/* Give new sched_entity start runnable values to heavy its load in infant time */
void init_entity_runnable_average(struct sched_entity *se)
@@ -966,6 +985,7 @@
}
trace_sched_stat_blocked(tsk, delta);
+ trace_sched_blocked_reason(tsk);
/*
* Blocking time is in units of nanosecs, so shift by
@@ -1430,7 +1450,6 @@
static unsigned long weighted_cpuload(struct rq *rq);
static unsigned long source_load(int cpu, int type);
static unsigned long target_load(int cpu, int type);
-static unsigned long capacity_of(int cpu);
/* Cached statistics for all CPUs within a node */
struct numa_stats {
@@ -2967,7 +2986,8 @@
*/
static __always_inline int
___update_load_avg(u64 now, int cpu, struct sched_avg *sa,
- unsigned long weight, int running, struct cfs_rq *cfs_rq)
+ unsigned long weight, int running, struct cfs_rq *cfs_rq,
+ struct rt_rq *rt_rq)
{
u64 delta;
@@ -3023,21 +3043,83 @@
}
sa->util_avg = sa->util_sum / (LOAD_AVG_MAX - 1024 + sa->period_contrib);
+ if (cfs_rq)
+ trace_sched_load_cfs_rq(cfs_rq);
+ else {
+ if (likely(!rt_rq))
+ trace_sched_load_se(container_of(sa, struct sched_entity, avg));
+ else
+ trace_sched_load_rt_rq(cpu, rt_rq);
+ }
+
return 1;
}
+/*
+ * When a task is dequeued, its estimated utilization should not be update if
+ * its util_avg has not been updated at least once.
+ * This flag is used to synchronize util_avg updates with util_est updates.
+ * We map this information into the LSB bit of the utilization saved at
+ * dequeue time (i.e. util_est.dequeued).
+ */
+#define UTIL_AVG_UNCHANGED 0x1
+
+static inline void cfs_se_util_change(struct sched_avg *avg)
+{
+ unsigned int enqueued;
+
+ if (!sched_feat(UTIL_EST))
+ return;
+
+ /* Avoid store if the flag has been already set */
+ enqueued = avg->util_est.enqueued;
+ if (!(enqueued & UTIL_AVG_UNCHANGED))
+ return;
+
+ /* Reset flag to report util_avg has been updated */
+ enqueued &= ~UTIL_AVG_UNCHANGED;
+ WRITE_ONCE(avg->util_est.enqueued, enqueued);
+}
+
static int
__update_load_avg_blocked_se(u64 now, int cpu, struct sched_entity *se)
{
- return ___update_load_avg(now, cpu, &se->avg, 0, 0, NULL);
+ return ___update_load_avg(now, cpu, &se->avg, 0, 0, NULL, NULL);
}
static int
__update_load_avg_se(u64 now, int cpu, struct cfs_rq *cfs_rq, struct sched_entity *se)
{
- return ___update_load_avg(now, cpu, &se->avg,
- se->on_rq * scale_load_down(se->load.weight),
- cfs_rq->curr == se, NULL);
+ if (___update_load_avg(now, cpu, &se->avg,
+ se->on_rq * scale_load_down(se->load.weight),
+ cfs_rq->curr == se, NULL, NULL)) {
+ cfs_se_util_change(&se->avg);
+
+#ifdef UTIL_EST_DEBUG
+ /*
+ * Trace utilization only for actual tasks.
+ *
+ * These trace events are mostly useful to get easier to
+ * read plots for the estimated utilization, where we can
+ * compare it with the actual grow/decrease of the original
+ * PELT signal.
+ * Let's keep them disabled by default in "production kernels".
+ */
+ if (entity_is_task(se)) {
+ struct task_struct *tsk = task_of(se);
+
+ trace_sched_util_est_task(tsk, &se->avg);
+
+ /* Trace utilization only for top level CFS RQ */
+ cfs_rq = &(task_rq(tsk)->cfs);
+ trace_sched_util_est_cpu(cpu, cfs_rq);
+ }
+#endif /* UTIL_EST_DEBUG */
+
+ return 1;
+ }
+
+ return 0;
}
static int
@@ -3045,7 +3127,7 @@
{
return ___update_load_avg(now, cpu, &cfs_rq->avg,
scale_load_down(cfs_rq->load.weight),
- cfs_rq->curr != NULL, cfs_rq);
+ cfs_rq->curr != NULL, cfs_rq, NULL);
}
/*
@@ -3098,6 +3180,8 @@
atomic_long_add(delta, &cfs_rq->tg->load_avg);
cfs_rq->tg_load_avg_contrib = cfs_rq->avg.load_avg;
}
+
+ trace_sched_load_tg(cfs_rq);
}
/*
@@ -3266,6 +3350,9 @@
update_tg_cfs_util(cfs_rq, se);
update_tg_cfs_load(cfs_rq, se);
+ trace_sched_load_cfs_rq(cfs_rq);
+ trace_sched_load_se(se);
+
return 1;
}
@@ -3380,6 +3467,21 @@
return decayed || removed_load;
}
+int update_rt_rq_load_avg(u64 now, int cpu, struct rt_rq *rt_rq, int running)
+{
+ int ret;
+
+ ret = ___update_load_avg(now, cpu, &rt_rq->avg, 0, running, NULL, rt_rq);
+
+ return ret;
+}
+
+unsigned long sched_get_rt_rq_util(int cpu)
+{
+ struct rt_rq *rt_rq = &(cpu_rq(cpu)->rt);
+ return rt_rq->avg.util_avg;
+}
+
/*
* Optional action to be done while updating the load average
*/
@@ -3427,6 +3529,8 @@
set_tg_cfs_propagate(cfs_rq);
cfs_rq_util_change(cfs_rq);
+
+ trace_sched_load_cfs_rq(cfs_rq);
}
/**
@@ -3447,6 +3551,8 @@
set_tg_cfs_propagate(cfs_rq);
cfs_rq_util_change(cfs_rq);
+
+ trace_sched_load_cfs_rq(cfs_rq);
}
/* Add the load generated by se into cfs_rq's load average */
@@ -3543,6 +3649,157 @@
static int idle_balance(struct rq *this_rq, struct rq_flags *rf);
+static inline int task_fits_capacity(struct task_struct *p, long capacity);
+
+static inline void update_misfit_status(struct task_struct *p, struct rq *rq)
+{
+ if (!static_branch_unlikely(&sched_asym_cpucapacity))
+ return;
+
+ if (!p) {
+ rq->misfit_task_load = 0;
+ return;
+ }
+
+ if (task_fits_capacity(p, capacity_of(cpu_of(rq)))) {
+ rq->misfit_task_load = 0;
+ return;
+ }
+
+ rq->misfit_task_load = task_h_load(p);
+}
+
+static inline unsigned long task_util(struct task_struct *p)
+{
+#ifdef CONFIG_SCHED_WALT
+ if (likely(!walt_disabled && sysctl_sched_use_walt_task_util))
+ return (p->ravg.demand /
+ (walt_ravg_window >> SCHED_CAPACITY_SHIFT));
+#endif
+ return READ_ONCE(p->se.avg.util_avg);
+}
+
+static inline unsigned long _task_util_est(struct task_struct *p)
+{
+ struct util_est ue = READ_ONCE(p->se.avg.util_est);
+
+ return max(ue.ewma, ue.enqueued);
+}
+
+static inline unsigned long task_util_est(struct task_struct *p)
+{
+#ifdef CONFIG_SCHED_WALT
+ if (likely(!walt_disabled && sysctl_sched_use_walt_task_util))
+ return (p->ravg.demand /
+ (walt_ravg_window >> SCHED_CAPACITY_SHIFT));
+#endif
+ return max(task_util(p), _task_util_est(p));
+}
+
+static inline void util_est_enqueue(struct cfs_rq *cfs_rq,
+ struct task_struct *p)
+{
+ unsigned int enqueued;
+
+ if (!sched_feat(UTIL_EST))
+ return;
+
+ /* Update root cfs_rq's estimated utilization */
+ enqueued = cfs_rq->avg.util_est.enqueued;
+ enqueued += (_task_util_est(p) | UTIL_AVG_UNCHANGED);
+ WRITE_ONCE(cfs_rq->avg.util_est.enqueued, enqueued);
+
+ trace_sched_util_est_task(p, &p->se.avg);
+ trace_sched_util_est_cpu(cpu_of(rq_of(cfs_rq)), cfs_rq);
+}
+
+/*
+ * Check if a (signed) value is within a specified (unsigned) margin,
+ * based on the observation that:
+ *
+ * abs(x) < y := (unsigned)(x + y - 1) < (2 * y - 1)
+ *
+ * NOTE: this only works when value + maring < INT_MAX.
+ */
+static inline bool within_margin(int value, int margin)
+{
+ return ((unsigned int)(value + margin - 1) < (2 * margin - 1));
+}
+
+static void
+util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
+{
+ long last_ewma_diff;
+ struct util_est ue;
+
+ if (!sched_feat(UTIL_EST))
+ return;
+
+ /*
+ * Update root cfs_rq's estimated utilization
+ *
+ * If *p is the last task then the root cfs_rq's estimated utilization
+ * of a CPU is 0 by definition.
+ */
+ ue.enqueued = 0;
+ if (cfs_rq->nr_running) {
+ ue.enqueued = cfs_rq->avg.util_est.enqueued;
+ ue.enqueued -= min_t(unsigned int, ue.enqueued,
+ (_task_util_est(p) | UTIL_AVG_UNCHANGED));
+ }
+ WRITE_ONCE(cfs_rq->avg.util_est.enqueued, ue.enqueued);
+
+ trace_sched_util_est_cpu(cpu_of(rq_of(cfs_rq)), cfs_rq);
+
+ /*
+ * Skip update of task's estimated utilization when the task has not
+ * yet completed an activation, e.g. being migrated.
+ */
+ if (!task_sleep)
+ return;
+
+ /*
+ * If the PELT values haven't changed since enqueue time,
+ * skip the util_est update.
+ */
+ ue = p->se.avg.util_est;
+ if (ue.enqueued & UTIL_AVG_UNCHANGED)
+ return;
+
+ /*
+ * Skip update of task's estimated utilization when its EWMA is
+ * already ~1% close to its last activation value.
+ */
+ ue.enqueued = (task_util(p) | UTIL_AVG_UNCHANGED);
+ last_ewma_diff = ue.enqueued - ue.ewma;
+ if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
+ return;
+
+ /*
+ * Update Task's estimated utilization
+ *
+ * When *p completes an activation we can consolidate another sample
+ * of the task size. This is done by storing the current PELT value
+ * as ue.enqueued and by using this value to update the Exponential
+ * Weighted Moving Average (EWMA):
+ *
+ * ewma(t) = w * task_util(p) + (1-w) * ewma(t-1)
+ * = w * task_util(p) + ewma(t-1) - w * ewma(t-1)
+ * = w * (task_util(p) - ewma(t-1)) + ewma(t-1)
+ * = w * ( last_ewma_diff ) + ewma(t-1)
+ * = w * (last_ewma_diff + ewma(t-1) / w)
+ *
+ * Where 'w' is the weight of new samples, which is configured to be
+ * 0.25, thus making w=1/4 ( >>= UTIL_EST_WEIGHT_SHIFT)
+ */
+ ue.ewma <<= UTIL_EST_WEIGHT_SHIFT;
+ ue.ewma += last_ewma_diff;
+ ue.ewma >>= UTIL_EST_WEIGHT_SHIFT;
+ WRITE_ONCE(p->se.avg.util_est, ue);
+
+ trace_sched_util_est_task(p, &p->se.avg);
+}
+
#else /* CONFIG_SMP */
static inline int
@@ -3551,6 +3808,11 @@
return 0;
}
+int update_rt_rq_load_avg(u64 now, int cpu, struct rt_rq *rt_rq, int running)
+{
+ return 0;
+}
+
#define UPDATE_TG 0x0
#define SKIP_AGE_LOAD 0x0
@@ -3575,6 +3837,15 @@
return 0;
}
+static inline void update_misfit_status(struct task_struct *p, struct rq *rq) {}
+
+static inline void
+util_est_enqueue(struct cfs_rq *cfs_rq, struct task_struct *p) {}
+
+static inline void
+util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p,
+ bool task_sleep) {}
+
#endif /* CONFIG_SMP */
static void check_spread(struct cfs_rq *cfs_rq, struct sched_entity *se)
@@ -4870,6 +5141,46 @@
}
#endif
+#ifdef CONFIG_SMP
+static bool cpu_overutilized(int cpu);
+
+static bool sd_overutilized(struct sched_domain *sd)
+{
+ return sd->shared->overutilized;
+}
+
+static void set_sd_overutilized(struct sched_domain *sd)
+{
+ trace_sched_overutilized(sd, sd->shared->overutilized, true);
+ sd->shared->overutilized = true;
+}
+
+static void clear_sd_overutilized(struct sched_domain *sd)
+{
+ trace_sched_overutilized(sd, sd->shared->overutilized, false);
+ sd->shared->overutilized = false;
+}
+
+static inline void update_overutilized_status(struct rq *rq)
+{
+ struct sched_domain *sd;
+
+ rcu_read_lock();
+ sd = rcu_dereference(rq->sd);
+ if (sd && !sd_overutilized(sd) &&
+ cpu_overutilized(rq->cpu))
+ set_sd_overutilized(sd);
+ rcu_read_unlock();
+}
+
+unsigned long boosted_cpu_util(int cpu, unsigned long other_util);
+#else
+
+#define update_overutilized_status(rq) do {} while (0)
+#define boosted_cpu_util(cpu, other_util) cpu_util_freq(cpu)
+
+#endif /* CONFIG_SMP */
+
/*
* The enqueue_task method is called before nr_running is
* increased. Here we update the fair scheduling stats and
@@ -4880,6 +5191,33 @@
{
struct cfs_rq *cfs_rq;
struct sched_entity *se = &p->se;
+ int task_new = !(flags & ENQUEUE_WAKEUP);
+
+ /*
+ * The code below (indirectly) updates schedutil which looks at
+ * the cfs_rq utilization to select a frequency.
+ * Let's add the task's estimated utilization to the cfs_rq's
+ * estimated utilization, before we update schedutil.
+ */
+ util_est_enqueue(&rq->cfs, p);
+
+ /*
+ * The code below (indirectly) updates schedutil which looks at
+ * the cfs_rq utilization to select a frequency.
+ * Let's update schedtune here to ensure the boost value of the
+ * current task is accounted for in the selection of the OPP.
+ *
+ * We do it also in the case where we enqueue a throttled task;
+ * we could argue that a throttled task should not boost a CPU,
+ * however:
+ * a) properly implementing CPU boosting considering throttled
+ * tasks will increase a lot the complexity of the solution
+ * b) it's not easy to quantify the benefits introduced by
+ * such a more complex solution.
+ * Thus, for the time being we go for the simple solution and boost
+ * also for throttled RQs.
+ */
+ schedtune_enqueue_task(p, cpu_of(rq));
/*
* If in_iowait is set, the code below may not trigger any cpufreq
@@ -4904,6 +5242,7 @@
if (cfs_rq_throttled(cfs_rq))
break;
cfs_rq->h_nr_running++;
+ walt_inc_cfs_cumulative_runnable_avg(cfs_rq, p);
flags = ENQUEUE_WAKEUP;
}
@@ -4911,6 +5250,7 @@
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_running++;
+ walt_inc_cfs_cumulative_runnable_avg(cfs_rq, p);
if (cfs_rq_throttled(cfs_rq))
break;
@@ -4919,8 +5259,12 @@
update_cfs_shares(se);
}
- if (!se)
+ if (!se) {
add_nr_running(rq, 1);
+ if (!task_new)
+ update_overutilized_status(rq);
+ walt_inc_cumulative_runnable_avg(rq, p);
+ }
hrtick_update(rq);
}
@@ -4938,6 +5282,14 @@
struct sched_entity *se = &p->se;
int task_sleep = flags & DEQUEUE_SLEEP;
+ /*
+ * The code below (indirectly) updates schedutil which looks at
+ * the cfs_rq utilization to select a frequency.
+ * Let's update schedtune here to ensure the boost value of the
+ * current task is not more accounted for in the selection of the OPP.
+ */
+ schedtune_dequeue_task(p, cpu_of(rq));
+
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
dequeue_entity(cfs_rq, se, flags);
@@ -4951,6 +5303,7 @@
if (cfs_rq_throttled(cfs_rq))
break;
cfs_rq->h_nr_running--;
+ walt_dec_cfs_cumulative_runnable_avg(cfs_rq, p);
/* Don't dequeue parent if it has other entities besides us */
if (cfs_rq->load.weight) {
@@ -4970,6 +5323,7 @@
for_each_sched_entity(se) {
cfs_rq = cfs_rq_of(se);
cfs_rq->h_nr_running--;
+ walt_dec_cfs_cumulative_runnable_avg(cfs_rq, p);
if (cfs_rq_throttled(cfs_rq))
break;
@@ -4978,9 +5332,12 @@
update_cfs_shares(se);
}
- if (!se)
+ if (!se) {
sub_nr_running(rq, 1);
+ walt_dec_cumulative_runnable_avg(rq, p);
+ }
+ util_est_dequeue(&rq->cfs, p, task_sleep);
hrtick_update(rq);
}
@@ -5288,16 +5645,6 @@
return max(rq->cpu_load[type-1], total);
}
-static unsigned long capacity_of(int cpu)
-{
- return cpu_rq(cpu)->cpu_capacity;
-}
-
-static unsigned long capacity_orig_of(int cpu)
-{
- return cpu_rq(cpu)->cpu_capacity_orig;
-}
-
static unsigned long cpu_avg_load_per_task(int cpu)
{
struct rq *rq = cpu_rq(cpu);
@@ -5328,6 +5675,818 @@
}
/*
+ * Returns the current capacity of cpu after applying both
+ * cpu and freq scaling.
+ */
+unsigned long capacity_curr_of(int cpu)
+{
+ unsigned long max_cap = cpu_rq(cpu)->cpu_capacity_orig;
+ unsigned long scale_freq = arch_scale_freq_capacity(NULL, cpu);
+
+ return cap_scale(max_cap, scale_freq);
+}
+
+inline bool energy_aware(void)
+{
+ return sched_feat(ENERGY_AWARE);
+}
+
+/*
+ * __cpu_norm_util() returns the cpu util relative to a specific capacity,
+ * i.e. it's busy ratio, in the range [0..SCHED_CAPACITY_SCALE] which is useful
+ * for energy calculations. Using the scale-invariant util returned by
+ * cpu_util() and approximating scale-invariant util by:
+ *
+ * util ~ (curr_freq/max_freq)*1024 * capacity_orig/1024 * running_time/time
+ *
+ * the normalized util can be found using the specific capacity.
+ *
+ * capacity = capacity_orig * curr_freq/max_freq
+ *
+ * norm_util = running_time/time ~ util/capacity
+ */
+static unsigned long __cpu_norm_util(unsigned long util, unsigned long capacity)
+{
+ if (util >= capacity)
+ return SCHED_CAPACITY_SCALE;
+
+ return (util << SCHED_CAPACITY_SHIFT)/capacity;
+}
+
+/*
+ * CPU candidates.
+ *
+ * These are labels to reference CPU candidates for an energy_diff.
+ * Currently we support only two possible candidates: the task's previous CPU
+ * and another candiate CPU.
+ * More advanced/aggressive EAS selection policies can consider more
+ * candidates.
+ */
+#define EAS_CPU_PRV 0
+#define EAS_CPU_NXT 1
+#define EAS_CPU_BKP 2
+
+/*
+ * energy_diff - supports the computation of the estimated energy impact in
+ * moving a "task"'s "util_delta" between different CPU candidates.
+ */
+/*
+ * NOTE: When using or examining WALT task signals, all wakeup
+ * latency is included as busy time for task util.
+ *
+ * This is relevant here because:
+ * When debugging is enabled, it can take as much as 1ms to
+ * write the output to the trace buffer for each eenv
+ * scenario. For periodic tasks where the sleep time is of
+ * a similar order, the WALT task util can be inflated.
+ *
+ * Further, and even without debugging enabled,
+ * task wakeup latency changes depending upon the EAS
+ * wakeup algorithm selected - FIND_BEST_TARGET only does
+ * energy calculations for up to 2 candidate CPUs. When
+ * NO_FIND_BEST_TARGET is configured, we can potentially
+ * do an energy calculation across all CPUS in the system.
+ *
+ * The impact to WALT task util on a Juno board
+ * running a periodic task which only sleeps for 200usec
+ * between 1ms activations has been measured.
+ * (i.e. the wakeup latency induced by energy calculation
+ * and debug output is double the desired sleep time and
+ * almost equivalent to the runtime which is more-or-less
+ * the worst case possible for this test)
+ *
+ * In this scenario, a task which has a PELT util of around
+ * 220 is inflated under WALT to have util around 400.
+ *
+ * This is simply a property of the way WALT includes
+ * wakeup latency in busy time while PELT does not.
+ *
+ * Hence - be careful when enabling DEBUG_EENV_DECISIONS
+ * expecially if WALT is the task signal.
+ */
+/*#define DEBUG_EENV_DECISIONS*/
+
+#ifdef DEBUG_EENV_DECISIONS
+/* max of 8 levels of sched groups traversed */
+#define EAS_EENV_DEBUG_LEVELS 16
+
+struct _eenv_debug {
+ unsigned long cap;
+ unsigned long norm_util;
+ unsigned long cap_energy;
+ unsigned long idle_energy;
+ unsigned long this_energy;
+ unsigned long this_busy_energy;
+ unsigned long this_idle_energy;
+ cpumask_t group_cpumask;
+ unsigned long cpu_util[1];
+};
+#endif
+
+struct eenv_cpu {
+ /* CPU ID, must be in cpus_mask */
+ int cpu_id;
+
+ /*
+ * Index (into sched_group_energy::cap_states) of the OPP the
+ * CPU needs to run at if the task is placed on it.
+ * This includes the both active and blocked load, due to
+ * other tasks on this CPU, as well as the task's own
+ * utilization.
+ */
+ int cap_idx;
+ int cap;
+
+ /* Estimated system energy */
+ unsigned long energy;
+
+ /* Estimated energy variation wrt EAS_CPU_PRV */
+ long nrg_delta;
+
+#ifdef DEBUG_EENV_DECISIONS
+ struct _eenv_debug *debug;
+ int debug_idx;
+#endif /* DEBUG_EENV_DECISIONS */
+};
+
+struct energy_env {
+ /* Utilization to move */
+ struct task_struct *p;
+ unsigned long util_delta;
+ unsigned long util_delta_boosted;
+
+ /* Mask of CPUs candidates to evaluate */
+ cpumask_t cpus_mask;
+
+ /* CPU candidates to evaluate */
+ struct eenv_cpu *cpu;
+ int eenv_cpu_count;
+
+#ifdef DEBUG_EENV_DECISIONS
+ /* pointer to the memory block reserved
+ * for debug on this CPU - there will be
+ * sizeof(struct _eenv_debug) *
+ * (EAS_CPU_CNT * EAS_EENV_DEBUG_LEVELS)
+ * bytes allocated here.
+ */
+ struct _eenv_debug *debug;
+#endif
+ /*
+ * Index (into energy_env::cpu) of the morst energy efficient CPU for
+ * the specified energy_env::task
+ */
+ int next_idx;
+ int max_cpu_count;
+
+ /* Support data */
+ struct sched_group *sg_top;
+ struct sched_group *sg_cap;
+ struct sched_group *sg;
+};
+
+/**
+ * Amount of capacity of a CPU that is (estimated to be) used by CFS tasks
+ * @cpu: the CPU to get the utilization of
+ *
+ * The unit of the return value must be the one of capacity so we can compare
+ * the utilization with the capacity of the CPU that is available for CFS task
+ * (ie cpu_capacity).
+ *
+ * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
+ * recent utilization of currently non-runnable tasks on a CPU. It represents
+ * the amount of utilization of a CPU in the range [0..capacity_orig] where
+ * capacity_orig is the cpu_capacity available at the highest frequency,
+ * i.e. arch_scale_cpu_capacity().
+ * The utilization of a CPU converges towards a sum equal to or less than the
+ * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
+ * the running time on this CPU scaled by capacity_curr.
+ *
+ * The estimated utilization of a CPU is defined to be the maximum between its
+ * cfs_rq.avg.util_avg and the sum of the estimated utilization of the tasks
+ * currently RUNNABLE on that CPU.
+ * This allows to properly represent the expected utilization of a CPU which
+ * has just got a big task running since a long sleep period. At the same time
+ * however it preserves the benefits of the "blocked utilization" in
+ * describing the potential for other tasks waking up on the same CPU.
+ *
+ * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
+ * higher than capacity_orig because of unfortunate rounding in
+ * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
+ * the average stabilizes with the new running time. We need to check that the
+ * utilization stays within the range of [0..capacity_orig] and cap it if
+ * necessary. Without utilization capping, a group could be seen as overloaded
+ * (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
+ * available capacity. We allow utilization to overshoot capacity_curr (but not
+ * capacity_orig) as it useful for predicting the capacity required after task
+ * migrations (scheduler-driven DVFS).
+ *
+ * Return: the (estimated) utilization for the specified CPU
+ */
+static inline unsigned long cpu_util(int cpu)
+{
+ struct cfs_rq *cfs_rq;
+ unsigned int util;
+
+#ifdef CONFIG_SCHED_WALT
+ if (likely(!walt_disabled && sysctl_sched_use_walt_cpu_util)) {
+ u64 walt_cpu_util = cpu_rq(cpu)->cumulative_runnable_avg;
+
+ walt_cpu_util <<= SCHED_CAPACITY_SHIFT;
+ do_div(walt_cpu_util, walt_ravg_window);
+
+ return min_t(unsigned long, walt_cpu_util,
+ capacity_orig_of(cpu));
+ }
+#endif
+
+ cfs_rq = &cpu_rq(cpu)->cfs;
+ util = READ_ONCE(cfs_rq->avg.util_avg);
+
+ if (sched_feat(UTIL_EST))
+ util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
+
+ return min_t(unsigned long, util, capacity_orig_of(cpu));
+}
+
+static inline unsigned long cpu_util_freq(int cpu)
+{
+#ifdef CONFIG_SCHED_WALT
+ u64 walt_cpu_util;
+
+ if (unlikely(walt_disabled || !sysctl_sched_use_walt_cpu_util))
+ return cpu_util(cpu);
+
+ walt_cpu_util = cpu_rq(cpu)->prev_runnable_sum;
+ walt_cpu_util <<= SCHED_CAPACITY_SHIFT;
+ do_div(walt_cpu_util, walt_ravg_window);
+
+ return min_t(unsigned long, walt_cpu_util, capacity_orig_of(cpu));
+#else
+ return cpu_util(cpu);
+#endif
+}
+
+/*
+ * cpu_util_wake: Compute CPU utilization with any contributions from
+ * the waking task p removed.
+ */
+static unsigned long cpu_util_wake(int cpu, struct task_struct *p)
+{
+ struct cfs_rq *cfs_rq;
+ unsigned int util;
+
+#ifdef CONFIG_SCHED_WALT
+ /*
+ * WALT does not decay idle tasks in the same manner
+ * as PELT, so it makes little sense to subtract task
+ * utilization from cpu utilization. Instead just use
+ * cpu_util for this case.
+ */
+ if (likely(!walt_disabled && sysctl_sched_use_walt_cpu_util))
+ return cpu_util(cpu);
+#endif
+
+ /* Task has no contribution or is new */
+ if (cpu != task_cpu(p) || !READ_ONCE(p->se.avg.last_update_time))
+ return cpu_util(cpu);
+
+ cfs_rq = &cpu_rq(cpu)->cfs;
+ util = READ_ONCE(cfs_rq->avg.util_avg);
+
+ /* Discount task's blocked util from CPU's util */
+ util -= min_t(unsigned int, util, task_util(p));
+
+ /*
+ * Covered cases:
+ *
+ * a) if *p is the only task sleeping on this CPU, then:
+ * cpu_util (== task_util) > util_est (== 0)
+ * and thus we return:
+ * cpu_util_wake = (cpu_util - task_util) = 0
+ *
+ * b) if other tasks are SLEEPING on this CPU, which is now exiting
+ * IDLE, then:
+ * cpu_util >= task_util
+ * cpu_util > util_est (== 0)
+ * and thus we discount *p's blocked utilization to return:
+ * cpu_util_wake = (cpu_util - task_util) >= 0
+ *
+ * c) if other tasks are RUNNABLE on that CPU and
+ * util_est > cpu_util
+ * then we use util_est since it returns a more restrictive
+ * estimation of the spare capacity on that CPU, by just
+ * considering the expected utilization of tasks already
+ * runnable on that CPU.
+ *
+ * Cases a) and b) are covered by the above code, while case c) is
+ * covered by the following code when estimated utilization is
+ * enabled.
+ */
+ if (sched_feat(UTIL_EST))
+ util = max(util, READ_ONCE(cfs_rq->avg.util_est.enqueued));
+
+ /*
+ * Utilization (estimated) can exceed the CPU capacity, thus let's
+ * clamp to the maximum CPU capacity to ensure consistency with
+ * the cpu_util call.
+ */
+ return min_t(unsigned long, util, capacity_orig_of(cpu));
+}
+
+static unsigned long group_max_util(struct energy_env *eenv, int cpu_idx)
+{
+ unsigned long max_util = 0;
+ unsigned long util;
+ int cpu;
+
+ for_each_cpu(cpu, sched_group_span(eenv->sg_cap)) {
+ util = cpu_util_wake(cpu, eenv->p);
+
+ /*
+ * If we are looking at the target CPU specified by the eenv,
+ * then we should add the (estimated) utilization of the task
+ * assuming we will wake it up on that CPU.
+ */
+ if (unlikely(cpu == eenv->cpu[cpu_idx].cpu_id))
+ util += eenv->util_delta_boosted;
+
+ max_util = max(max_util, util);
+ }
+
+ return max_util;
+}
+
+/*
+ * group_norm_util() returns the approximated group util relative to it's
+ * current capacity (busy ratio) in the range [0..SCHED_CAPACITY_SCALE] for use
+ * in energy calculations. Since task executions may or may not overlap in time
+ * in the group the true normalized util is between max(cpu_norm_util(i)) and
+ * sum(cpu_norm_util(i)) when iterating over all cpus in the group, i. The
+ * latter is used as the estimate as it leads to a more pessimistic energy
+ * estimate (more busy).
+ */
+static unsigned
+long group_norm_util(struct energy_env *eenv, int cpu_idx)
+{
+ unsigned long capacity = eenv->cpu[cpu_idx].cap;
+ unsigned long util, util_sum = 0;
+ int cpu;
+
+ for_each_cpu(cpu, sched_group_span(eenv->sg)) {
+ util = cpu_util_wake(cpu, eenv->p);
+
+ /*
+ * If we are looking at the target CPU specified by the eenv,
+ * then we should add the (estimated) utilization of the task
+ * assuming we will wake it up on that CPU.
+ */
+ if (unlikely(cpu == eenv->cpu[cpu_idx].cpu_id))
+ util += eenv->util_delta;
+
+ util_sum += __cpu_norm_util(util, capacity);
+ }
+
+ if (util_sum > SCHED_CAPACITY_SCALE)
+ return SCHED_CAPACITY_SCALE;
+ return util_sum;
+}
+
+static int find_new_capacity(struct energy_env *eenv, int cpu_idx)
+{
+ const struct sched_group_energy *sge = eenv->sg_cap->sge;
+ unsigned long util = group_max_util(eenv, cpu_idx);
+ int idx, cap_idx;
+
+ cap_idx = sge->nr_cap_states - 1;
+
+ for (idx = 0; idx < sge->nr_cap_states; idx++) {
+ if (sge->cap_states[idx].cap >= util) {
+ cap_idx = idx;
+ break;
+ }
+ }
+ /* Keep track of SG's capacity */
+ eenv->cpu[cpu_idx].cap = sge->cap_states[cap_idx].cap;
+ eenv->cpu[cpu_idx].cap_idx = cap_idx;
+
+ return cap_idx;
+}
+
+static int group_idle_state(struct energy_env *eenv, int cpu_idx)
+{
+ struct sched_group *sg = eenv->sg;
+ int src_in_grp, dst_in_grp;
+ int i, state = INT_MAX;
+ int max_idle_state_idx;
+ long grp_util = 0;
+ int new_state;
+
+ /* Find the shallowest idle state in the sched group. */
+ for_each_cpu(i, sched_group_span(sg))
+ state = min(state, idle_get_state_idx(cpu_rq(i)));
+
+ /* Take non-cpuidle idling into account (active idle/arch_cpu_idle()) */
+ state++;
+ /*
+ * Try to estimate if a deeper idle state is
+ * achievable when we move the task.
+ */
+ for_each_cpu(i, sched_group_span(sg))
+ grp_util += cpu_util(i);
+
+ src_in_grp = cpumask_test_cpu(eenv->cpu[EAS_CPU_PRV].cpu_id,
+ sched_group_span(sg));
+ dst_in_grp = cpumask_test_cpu(eenv->cpu[cpu_idx].cpu_id,
+ sched_group_span(sg));
+ if (src_in_grp == dst_in_grp) {
+ /*
+ * both CPUs under consideration are in the same group or not in
+ * either group, migration should leave idle state the same.
+ */
+ return state;
+ }
+ /*
+ * add or remove util as appropriate to indicate what group util
+ * will be (worst case - no concurrent execution) after moving the task
+ */
+ grp_util += src_in_grp ? -eenv->util_delta : eenv->util_delta;
+
+ if (grp_util >
+ ((long)sg->sgc->max_capacity * (int)sg->group_weight)) {
+ /*
+ * After moving, the group will be fully occupied
+ * so assume it will not be idle at all.
+ */
+ return 0;
+ }
+
+ /*
+ * after moving, this group is at most partly
+ * occupied, so it should have some idle time.
+ */
+ max_idle_state_idx = sg->sge->nr_idle_states - 2;
+ new_state = grp_util * max_idle_state_idx;
+ if (grp_util <= 0) {
+ /* group will have no util, use lowest state */
+ new_state = max_idle_state_idx + 1;
+ } else {
+ /*
+ * for partially idle, linearly map util to idle
+ * states, excluding the lowest one. This does not
+ * correspond to the state we expect to enter in
+ * reality, but an indication of what might happen.
+ */
+ new_state = min_t(int, max_idle_state_idx,
+ new_state / sg->sgc->max_capacity);
+ new_state = max_idle_state_idx - new_state;
+ }
+ return new_state;
+}
+
+#ifdef DEBUG_EENV_DECISIONS
+static struct _eenv_debug *eenv_debug_entry_ptr(struct _eenv_debug *base, int idx);
+
+static void store_energy_calc_debug_info(struct energy_env *eenv, int cpu_idx, int cap_idx, int idle_idx)
+{
+ int debug_idx = eenv->cpu[cpu_idx].debug_idx;
+ unsigned long sg_util, busy_energy, idle_energy;
+ const struct sched_group_energy *sge;
+ struct _eenv_debug *dbg;
+ int cpu;
+
+ if (debug_idx < EAS_EENV_DEBUG_LEVELS) {
+ sge = eenv->sg->sge;
+ sg_util = group_norm_util(eenv, cpu_idx);
+ busy_energy = sge->cap_states[cap_idx].power;
+ busy_energy *= sg_util;
+ idle_energy = SCHED_CAPACITY_SCALE - sg_util;
+ idle_energy *= sge->idle_states[idle_idx].power;
+ /* should we use sg_cap or sg? */
+ dbg = eenv_debug_entry_ptr(eenv->cpu[cpu_idx].debug, debug_idx);
+ dbg->cap = sge->cap_states[cap_idx].cap;
+ dbg->norm_util = sg_util;
+ dbg->cap_energy = sge->cap_states[cap_idx].power;
+ dbg->idle_energy = sge->idle_states[idle_idx].power;
+ dbg->this_energy = busy_energy + idle_energy;
+ dbg->this_busy_energy = busy_energy;
+ dbg->this_idle_energy = idle_energy;
+
+ cpumask_copy(&dbg->group_cpumask,
+ sched_group_span(eenv->sg));
+
+ for_each_cpu(cpu, &dbg->group_cpumask)
+ dbg->cpu_util[cpu] = cpu_util(cpu);
+
+ eenv->cpu[cpu_idx].debug_idx = debug_idx+1;
+ }
+}
+#else
+#define store_energy_calc_debug_info(a,b,c,d) {}
+#endif /* DEBUG_EENV_DECISIONS */
+
+/*
+ * calc_sg_energy: compute energy for the eenv's SG (i.e. eenv->sg).
+ *
+ * This works in iterations to compute the SG's energy for each CPU
+ * candidate defined by the energy_env's cpu array.
+ */
+static void calc_sg_energy(struct energy_env *eenv)
+{
+ struct sched_group *sg = eenv->sg;
+ unsigned long busy_energy, idle_energy;
+ unsigned int busy_power, idle_power;
+ unsigned long total_energy = 0;
+ unsigned long sg_util;
+ int cap_idx, idle_idx;
+ int cpu_idx;
+
+ for (cpu_idx = EAS_CPU_PRV; cpu_idx < eenv->max_cpu_count; ++cpu_idx) {
+ if (eenv->cpu[cpu_idx].cpu_id == -1)
+ continue;
+
+ /* Compute ACTIVE energy */
+ cap_idx = find_new_capacity(eenv, cpu_idx);
+ busy_power = sg->sge->cap_states[cap_idx].power;
+ sg_util = group_norm_util(eenv, cpu_idx);
+ busy_energy = sg_util * busy_power;
+
+ /* Compute IDLE energy */
+ idle_idx = group_idle_state(eenv, cpu_idx);
+ idle_power = sg->sge->idle_states[idle_idx].power;
+ idle_energy = SCHED_CAPACITY_SCALE - sg_util;
+ idle_energy *= idle_power;
+
+ total_energy = busy_energy + idle_energy;
+ eenv->cpu[cpu_idx].energy += total_energy;
+
+ store_energy_calc_debug_info(eenv, cpu_idx, cap_idx, idle_idx);
+ }
+}
+
+/*
+ * compute_energy() computes the absolute variation in energy consumption by
+ * moving eenv.util_delta from EAS_CPU_PRV to EAS_CPU_NXT.
+ *
+ * NOTE: compute_energy() may fail when racing with sched_domain updates, in
+ * which case we abort by returning -EINVAL.
+ */
+static int compute_energy(struct energy_env *eenv)
+{
+ struct sched_domain *sd;
+ int cpu;
+ struct cpumask visit_cpus;
+ struct sched_group *sg;
+ int cpu_count;
+
+ WARN_ON(!eenv->sg_top->sge);
+
+ cpumask_copy(&visit_cpus, sched_group_span(eenv->sg_top));
+ /* If a cpu is hotplugged in while we are in this function, it does
+ * not appear in the existing visit_cpus mask which came from the
+ * sched_group pointer of the sched_domain pointed at by sd_ea for
+ * either the prev or next cpu and was dereferenced in
+ * select_energy_cpu_idx.
+ * Since we will dereference sd_scs later as we iterate through the
+ * CPUs we expect to visit, new CPUs can be present which are not in
+ * the visit_cpus mask. Guard this with cpu_count.
+ */
+ cpu_count = cpumask_weight(&visit_cpus);
+ while (!cpumask_empty(&visit_cpus)) {
+ struct sched_group *sg_shared_cap = NULL;
+
+ cpu = cpumask_first(&visit_cpus);
+
+ /*
+ * Is the group utilization affected by cpus outside this
+ * sched_group?
+ * This sd may have groups with cpus which were not present
+ * when we took visit_cpus.
+ */
+ sd = rcu_dereference(per_cpu(sd_scs, cpu));
+ if (sd && sd->parent)
+ sg_shared_cap = sd->parent->groups;
+
+ for_each_domain(cpu, sd) {
+ sg = sd->groups;
+
+ /* Has this sched_domain already been visited? */
+ if (sd->child && group_first_cpu(sg) != cpu)
+ break;
+
+ do {
+ eenv->sg_cap = sg;
+ if (sg_shared_cap && sg_shared_cap->group_weight >= sg->group_weight)
+ eenv->sg_cap = sg_shared_cap;
+
+ /*
+ * Compute the energy for all the candidate
+ * CPUs in the current visited SG.
+ */
+ eenv->sg = sg;
+ calc_sg_energy(eenv);
+
+ /* remove CPUs we have just visited */
+ if (!sd->child) {
+ /*
+ * cpu_count here is the number of
+ * cpus we expect to visit in this
+ * calculation. If we race against
+ * hotplug, we can have extra cpus
+ * added to the groups we are
+ * iterating which do not appear in
+ * the visit_cpus mask. In that case
+ * we are not able to calculate energy
+ * without restarting so we will bail
+ * out and use prev_cpu this time.
+ */
+ if (!cpu_count)
+ return -EINVAL;
+ cpumask_xor(&visit_cpus, &visit_cpus, sched_group_span(sg));
+ cpu_count--;
+ }
+
+ if (cpumask_equal(sched_group_span(sg), sched_group_span(eenv->sg_top)))
+ goto next_cpu;
+
+ } while (sg = sg->next, sg != sd->groups);
+ }
+next_cpu:
+ cpumask_clear_cpu(cpu, &visit_cpus);
+ continue;
+ }
+
+ return 0;
+}
+
+static inline bool cpu_in_sg(struct sched_group *sg, int cpu)
+{
+ return cpu != -1 && cpumask_test_cpu(cpu, sched_group_span(sg));
+}
+
+#ifdef DEBUG_EENV_DECISIONS
+static void dump_eenv_debug(struct energy_env *eenv)
+{
+ int cpu_idx, grp_idx;
+ char cpu_utils[(NR_CPUS*12)+10]="cpu_util: ";
+ char cpulist[64];
+
+ trace_printk("eenv scenario: task=%p %s task_util=%lu prev_cpu=%d",
+ eenv->p, eenv->p->comm, eenv->util_delta, eenv->cpu[EAS_CPU_PRV].cpu_id);
+
+ for (cpu_idx=EAS_CPU_PRV; cpu_idx < eenv->max_cpu_count; cpu_idx++) {
+ if (eenv->cpu[cpu_idx].cpu_id == -1)
+ continue;
+ trace_printk("---Scenario %d: Place task on cpu %d energy=%lu (%d debug logs at %p)",
+ cpu_idx+1, eenv->cpu[cpu_idx].cpu_id,
+ eenv->cpu[cpu_idx].energy >> SCHED_CAPACITY_SHIFT,
+ eenv->cpu[cpu_idx].debug_idx,
+ eenv->cpu[cpu_idx].debug);
+ for (grp_idx = 0; grp_idx < eenv->cpu[cpu_idx].debug_idx; grp_idx++) {
+ struct _eenv_debug *debug;
+ int cpu, written=0;
+
+ debug = eenv_debug_entry_ptr(eenv->cpu[cpu_idx].debug, grp_idx);
+ cpu = scnprintf(cpulist, sizeof(cpulist), "%*pbl", cpumask_pr_args(&debug->group_cpumask));
+
+ cpu_utils[0] = 0;
+ /* print out the relevant cpu_util */
+ for_each_cpu(cpu, &(debug->group_cpumask)) {
+ char tmp[64];
+ if (written > sizeof(cpu_utils)-10) {
+ cpu_utils[written]=0;
+ break;
+ }
+ written += snprintf(tmp, sizeof(tmp), "cpu%d(%lu) ", cpu, debug->cpu_util[cpu]);
+ strcat(cpu_utils, tmp);
+ }
+ /* trace the data */
+ trace_printk(" | %s : cap=%lu nutil=%lu, cap_nrg=%lu, idle_nrg=%lu energy=%lu busy_energy=%lu idle_energy=%lu %s",
+ cpulist, debug->cap, debug->norm_util,
+ debug->cap_energy, debug->idle_energy,
+ debug->this_energy >> SCHED_CAPACITY_SHIFT,
+ debug->this_busy_energy >> SCHED_CAPACITY_SHIFT,
+ debug->this_idle_energy >> SCHED_CAPACITY_SHIFT,
+ cpu_utils);
+
+ }
+ trace_printk("---");
+ }
+ trace_printk("----- done");
+ return;
+}
+#else
+#define dump_eenv_debug(a) {}
+#endif /* DEBUG_EENV_DECISIONS */
+/*
+ * select_energy_cpu_idx(): estimate the energy impact of changing the
+ * utilization distribution.
+ *
+ * The eenv parameter specifies the changes: utilization amount and a
+ * collection of possible CPU candidates. The number of candidates
+ * depends upon the selection algorithm used.
+ *
+ * If find_best_target was used to select candidate CPUs, there will
+ * be at most 3 including prev_cpu. If not, we used a brute force
+ * selection which will provide the union of:
+ * * CPUs belonging to the highest sd which is not overutilized
+ * * CPUs the task is allowed to run on
+ * * online CPUs
+ *
+ * This function returns the index of a CPU candidate specified by the
+ * energy_env which corresponds to the most energy efficient CPU.
+ * Thus, 0 (EAS_CPU_PRV) means that non of the CPU candidate is more energy
+ * efficient than running on prev_cpu. This is also the value returned in case
+ * of abort due to error conditions during the computations. The only
+ * exception to this if we fail to access the energy model via sd_ea, where
+ * we return -1 with the intent of asking the system to use a different
+ * wakeup placement algorithm.
+ *
+ * A value greater than zero means that the most energy efficient CPU is the
+ * one represented by eenv->cpu[eenv->next_idx].cpu_id.
+ */
+static inline int select_energy_cpu_idx(struct energy_env *eenv)
+{
+ int last_cpu_idx = eenv->max_cpu_count - 1;
+ struct sched_domain *sd;
+ struct sched_group *sg;
+ int sd_cpu = -1;
+ int cpu_idx;
+ int margin;
+
+ sd_cpu = eenv->cpu[EAS_CPU_PRV].cpu_id;
+ sd = rcu_dereference(per_cpu(sd_ea, sd_cpu));
+ if (!sd)
+ return -1;
+
+ cpumask_clear(&eenv->cpus_mask);
+ for (cpu_idx = EAS_CPU_PRV; cpu_idx < eenv->max_cpu_count; ++cpu_idx) {
+ int cpu = eenv->cpu[cpu_idx].cpu_id;
+
+ if (cpu < 0)
+ continue;
+ cpumask_set_cpu(cpu, &eenv->cpus_mask);
+ }
+
+ sg = sd->groups;
+ do {
+ /* Skip SGs which do not contains a candidate CPU */
+ if (!cpumask_intersects(&eenv->cpus_mask, sched_group_span(sg)))
+ continue;
+
+ eenv->sg_top = sg;
+ if (compute_energy(eenv) == -EINVAL)
+ return EAS_CPU_PRV;
+ } while (sg = sg->next, sg != sd->groups);
+ /* remember - eenv energy values are unscaled */
+
+ /*
+ * Compute the dead-zone margin used to prevent too many task
+ * migrations with negligible energy savings.
+ * An energy saving is considered meaningful if it reduces the energy
+ * consumption of EAS_CPU_PRV CPU candidate by at least ~1.56%
+ */
+ margin = eenv->cpu[EAS_CPU_PRV].energy >> 6;
+
+ /*
+ * By default the EAS_CPU_PRV CPU is considered the most energy
+ * efficient, with a 0 energy variation.
+ */
+ eenv->next_idx = EAS_CPU_PRV;
+ eenv->cpu[EAS_CPU_PRV].nrg_delta = 0;
+
+ dump_eenv_debug(eenv);
+
+ /*
+ * Compare the other CPU candidates to find a CPU which can be
+ * more energy efficient then EAS_CPU_PRV
+ */
+ if (sched_feat(FBT_STRICT_ORDER))
+ last_cpu_idx = EAS_CPU_BKP;
+
+ for(cpu_idx = EAS_CPU_NXT; cpu_idx <= last_cpu_idx; cpu_idx++) {
+ if (eenv->cpu[cpu_idx].cpu_id < 0)
+ continue;
+ eenv->cpu[cpu_idx].nrg_delta =
+ eenv->cpu[cpu_idx].energy -
+ eenv->cpu[EAS_CPU_PRV].energy;
+
+ /* filter energy variations within the dead-zone margin */
+ if (abs(eenv->cpu[cpu_idx].nrg_delta) < margin)
+ eenv->cpu[cpu_idx].nrg_delta = 0;
+ /* update the schedule candidate with min(nrg_delta) */
+ if (eenv->cpu[cpu_idx].nrg_delta <
+ eenv->cpu[eenv->next_idx].nrg_delta) {
+ eenv->next_idx = cpu_idx;
+ /* break out if we want to stop on first saving candidate */
+ if (sched_feat(FBT_STRICT_ORDER))
+ break;
+ }
+ }
+
+ return eenv->next_idx;
+}
+
+/*
* Detect M:N waker/wakee relationships via a switching-frequency heuristic.
*
* A waker of many should wake a different task than the one last awakened
@@ -5344,15 +6503,18 @@
* whatever is irrelevant, spread criteria is apparent partner count exceeds
* socket size.
*/
-static int wake_wide(struct task_struct *p)
+static int wake_wide(struct task_struct *p, int sibling_count_hint)
{
unsigned int master = current->wakee_flips;
unsigned int slave = p->wakee_flips;
- int factor = this_cpu_read(sd_llc_size);
+ int llc_size = this_cpu_read(sd_llc_size);
+
+ if (sibling_count_hint >= llc_size)
+ return 1;
if (master < slave)
swap(master, slave);
- if (slave < factor || master < slave * factor)
+ if (slave < llc_size || master < slave * llc_size)
return 0;
return 1;
}
@@ -5438,17 +6600,110 @@
return affine;
}
-static inline int task_util(struct task_struct *p);
-static int cpu_util_wake(int cpu, struct task_struct *p);
+#ifdef CONFIG_SCHED_TUNE
+struct reciprocal_value schedtune_spc_rdiv;
+
+static long
+schedtune_margin(unsigned long signal, long boost)
+{
+ long long margin = 0;
+
+ /*
+ * Signal proportional compensation (SPC)
+ *
+ * The Boost (B) value is used to compute a Margin (M) which is
+ * proportional to the complement of the original Signal (S):
+ * M = B * (SCHED_CAPACITY_SCALE - S)
+ * The obtained M could be used by the caller to "boost" S.
+ */
+ if (boost >= 0) {
+ margin = SCHED_CAPACITY_SCALE - signal;
+ margin *= boost;
+ } else
+ margin = -signal * boost;
+
+ margin = reciprocal_divide(margin, schedtune_spc_rdiv);
+
+ if (boost < 0)
+ margin *= -1;
+ return margin;
+}
+
+static inline int
+schedtune_cpu_margin(unsigned long util, int cpu)
+{
+ int boost = schedtune_cpu_boost(cpu);
+
+ if (boost == 0)
+ return 0;
+
+ return schedtune_margin(util, boost);
+}
+
+static inline long
+schedtune_task_margin(struct task_struct *task)
+{
+ int boost = schedtune_task_boost(task);
+ unsigned long util;
+ long margin;
+
+ if (boost == 0)
+ return 0;
+
+ util = task_util_est(task);
+ margin = schedtune_margin(util, boost);
+
+ return margin;
+}
+
+#else /* CONFIG_SCHED_TUNE */
+
+static inline int
+schedtune_cpu_margin(unsigned long util, int cpu)
+{
+ return 0;
+}
+
+static inline int
+schedtune_task_margin(struct task_struct *task)
+{
+ return 0;
+}
+
+#endif /* CONFIG_SCHED_TUNE */
+
+unsigned long
+boosted_cpu_util(int cpu, unsigned long other_util)
+{
+ unsigned long util = cpu_util_freq(cpu) + other_util;
+ long margin = schedtune_cpu_margin(util, cpu);
+
+ trace_sched_boost_cpu(cpu, util, margin);
+
+ return util + margin;
+}
+
+static inline unsigned long
+boosted_task_util(struct task_struct *task)
+{
+ unsigned long util = task_util_est(task);
+ long margin = schedtune_task_margin(task);
+
+ trace_sched_boost_task(task, util, margin);
+
+ return util + margin;
+}
static unsigned long capacity_spare_wake(int cpu, struct task_struct *p)
{
- return capacity_orig_of(cpu) - cpu_util_wake(cpu, p);
+ return max_t(long, capacity_of(cpu) - cpu_util_wake(cpu, p), 0);
}
/*
* find_idlest_group finds and returns the least busy CPU group within the
* domain.
+ *
+ * Assumes p is allowed on at least one CPU in sd.
*/
static struct sched_group *
find_idlest_group(struct sched_domain *sd, struct task_struct *p,
@@ -5456,8 +6711,9 @@
{
struct sched_group *idlest = NULL, *group = sd->groups;
struct sched_group *most_spare_sg = NULL;
- unsigned long min_runnable_load = ULONG_MAX, this_runnable_load = 0;
- unsigned long min_avg_load = ULONG_MAX, this_avg_load = 0;
+ unsigned long min_runnable_load = ULONG_MAX;
+ unsigned long this_runnable_load = ULONG_MAX;
+ unsigned long min_avg_load = ULONG_MAX, this_avg_load = ULONG_MAX;
unsigned long most_spare = 0, this_spare = 0;
int load_idx = sd->forkexec_idx;
int imbalance_scale = 100 + (sd->imbalance_pct-100)/2;
@@ -5578,10 +6834,10 @@
}
/*
- * find_idlest_cpu - find the idlest cpu among the cpus in group.
+ * find_idlest_group_cpu - find the idlest cpu among the cpus in group.
*/
static int
-find_idlest_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
+find_idlest_group_cpu(struct sched_group *group, struct task_struct *p, int this_cpu)
{
unsigned long load, min_load = ULONG_MAX;
unsigned int min_exit_latency = UINT_MAX;
@@ -5630,6 +6886,53 @@
return shallowest_idle_cpu != -1 ? shallowest_idle_cpu : least_loaded_cpu;
}
+static inline int find_idlest_cpu(struct sched_domain *sd, struct task_struct *p,
+ int cpu, int prev_cpu, int sd_flag)
+{
+ int new_cpu = cpu;
+
+ if (!cpumask_intersects(sched_domain_span(sd), &p->cpus_allowed))
+ return prev_cpu;
+
+ while (sd) {
+ struct sched_group *group;
+ struct sched_domain *tmp;
+ int weight;
+
+ if (!(sd->flags & sd_flag)) {
+ sd = sd->child;
+ continue;
+ }
+
+ group = find_idlest_group(sd, p, cpu, sd_flag);
+ if (!group) {
+ sd = sd->child;
+ continue;
+ }
+
+ new_cpu = find_idlest_group_cpu(group, p, cpu);
+ if (new_cpu == cpu) {
+ /* Now try balancing at a lower domain level of cpu */
+ sd = sd->child;
+ continue;
+ }
+
+ /* Now try balancing at a lower domain level of new_cpu */
+ cpu = new_cpu;
+ weight = sd->span_weight;
+ sd = NULL;
+ for_each_domain(cpu, tmp) {
+ if (weight <= tmp->span_weight)
+ break;
+ if (tmp->flags & sd_flag)
+ sd = tmp;
+ }
+ /* while loop will break here if sd == NULL */
+ }
+
+ return new_cpu;
+}
+
#ifdef CONFIG_SCHED_SMT
DEFINE_STATIC_KEY_FALSE(sched_smt_present);
@@ -5812,7 +7115,7 @@
/*
* Try and locate an idle core/thread in the LLC cache domain.
*/
-static int select_idle_sibling(struct task_struct *p, int prev, int target)
+static inline int __select_idle_sibling(struct task_struct *p, int prev, int target)
{
struct sched_domain *sd;
int i;
@@ -5845,61 +7148,408 @@
return target;
}
-/*
- * cpu_util returns the amount of capacity of a CPU that is used by CFS
- * tasks. The unit of the return value must be the one of capacity so we can
- * compare the utilization with the capacity of the CPU that is available for
- * CFS task (ie cpu_capacity).
- *
- * cfs_rq.avg.util_avg is the sum of running time of runnable tasks plus the
- * recent utilization of currently non-runnable tasks on a CPU. It represents
- * the amount of utilization of a CPU in the range [0..capacity_orig] where
- * capacity_orig is the cpu_capacity available at the highest frequency
- * (arch_scale_freq_capacity()).
- * The utilization of a CPU converges towards a sum equal to or less than the
- * current capacity (capacity_curr <= capacity_orig) of the CPU because it is
- * the running time on this CPU scaled by capacity_curr.
- *
- * Nevertheless, cfs_rq.avg.util_avg can be higher than capacity_curr or even
- * higher than capacity_orig because of unfortunate rounding in
- * cfs.avg.util_avg or just after migrating tasks and new task wakeups until
- * the average stabilizes with the new running time. We need to check that the
- * utilization stays within the range of [0..capacity_orig] and cap it if
- * necessary. Without utilization capping, a group could be seen as overloaded
- * (CPU0 utilization at 121% + CPU1 utilization at 80%) whereas CPU1 has 20% of
- * available capacity. We allow utilization to overshoot capacity_curr (but not
- * capacity_orig) as it useful for predicting the capacity required after task
- * migrations (scheduler-driven DVFS).
- */
-static int cpu_util(int cpu)
+static inline int select_idle_sibling_cstate_aware(struct task_struct *p, int prev, int target)
{
- unsigned long util = cpu_rq(cpu)->cfs.avg.util_avg;
- unsigned long capacity = capacity_orig_of(cpu);
+ struct sched_domain *sd;
+ struct sched_group *sg;
+ int best_idle_cpu = -1;
+ int best_idle_cstate = -1;
+ int best_idle_capacity = INT_MAX;
+ int i;
- return (util >= capacity) ? capacity : util;
+ /*
+ * Iterate the domains and find an elegible idle cpu.
+ */
+ sd = rcu_dereference(per_cpu(sd_llc, target));
+ for_each_lower_domain(sd) {
+ sg = sd->groups;
+ do {
+ if (!cpumask_intersects(
+ sched_group_span(sg), &p->cpus_allowed))
+ goto next;
+
+ for_each_cpu_and(i, &p->cpus_allowed, sched_group_span(sg)) {
+ int idle_idx;
+ unsigned long new_usage;
+ unsigned long capacity_orig;
+
+ if (!idle_cpu(i))
+ goto next;
+
+ /* figure out if the task can fit here at all */
+ new_usage = boosted_task_util(p);
+ capacity_orig = capacity_orig_of(i);
+
+ if (new_usage > capacity_orig)
+ goto next;
+
+ /* if the task fits without changing OPP and we
+ * intended to use this CPU, just proceed
+ */
+ if (i == target && new_usage <= capacity_curr_of(target)) {
+ return target;
+ }
+
+ /* otherwise select CPU with shallowest idle state
+ * to reduce wakeup latency.
+ */
+ idle_idx = idle_get_state_idx(cpu_rq(i));
+
+ if (idle_idx < best_idle_cstate &&
+ capacity_orig <= best_idle_capacity) {
+ best_idle_cpu = i;
+ best_idle_cstate = idle_idx;
+ best_idle_capacity = capacity_orig;
+ }
+ }
+ next:
+ sg = sg->next;
+ } while (sg != sd->groups);
+ }
+
+ if (best_idle_cpu >= 0)
+ target = best_idle_cpu;
+
+ return target;
}
-static inline int task_util(struct task_struct *p)
+static int select_idle_sibling(struct task_struct *p, int prev, int target)
{
- return p->se.avg.util_avg;
+ if (!sysctl_sched_cstate_aware)
+ return __select_idle_sibling(p, prev, target);
+
+ return select_idle_sibling_cstate_aware(p, prev, target);
}
-/*
- * cpu_util_wake: Compute cpu utilization with any contributions from
- * the waking task p removed.
- */
-static int cpu_util_wake(int cpu, struct task_struct *p)
+static inline int task_fits_capacity(struct task_struct *p, long capacity)
{
- unsigned long util, capacity;
+ return capacity * 1024 > boosted_task_util(p) * capacity_margin;
+}
- /* Task has no contribution or is new */
- if (cpu != task_cpu(p) || !p->se.avg.last_update_time)
- return cpu_util(cpu);
+static int start_cpu(bool boosted)
+{
+ struct root_domain *rd = cpu_rq(smp_processor_id())->rd;
- capacity = capacity_orig_of(cpu);
- util = max_t(long, cpu_rq(cpu)->cfs.avg.util_avg - task_util(p), 0);
+ return boosted ? rd->max_cap_orig_cpu : rd->min_cap_orig_cpu;
+}
- return (util >= capacity) ? capacity : util;
+static inline int find_best_target(struct task_struct *p, int *backup_cpu,
+ bool boosted, bool prefer_idle)
+{
+ unsigned long min_util = boosted_task_util(p);
+ unsigned long target_capacity = ULONG_MAX;
+ unsigned long min_wake_util = ULONG_MAX;
+ unsigned long target_max_spare_cap = 0;
+ unsigned long target_util = ULONG_MAX;
+ unsigned long best_active_util = ULONG_MAX;
+ int best_idle_cstate = INT_MAX;
+ struct sched_domain *sd;
+ struct sched_group *sg;
+ int best_active_cpu = -1;
+ int best_idle_cpu = -1;
+ int target_cpu = -1;
+ int cpu, i;
+
+ *backup_cpu = -1;
+
+ /*
+ * In most cases, target_capacity tracks capacity_orig of the most
+ * energy efficient CPU candidate, thus requiring to minimise
+ * target_capacity. For these cases target_capacity is already
+ * initialized to ULONG_MAX.
+ * However, for prefer_idle and boosted tasks we look for a high
+ * performance CPU, thus requiring to maximise target_capacity. In this
+ * case we initialise target_capacity to 0.
+ */
+ if (prefer_idle && boosted)
+ target_capacity = 0;
+
+ /* Find start CPU based on boost value */
+ cpu = start_cpu(boosted);
+ if (cpu < 0)
+ return -1;
+
+ /* Find SD for the start CPU */
+ sd = rcu_dereference(per_cpu(sd_ea, cpu));
+ if (!sd)
+ return -1;
+
+ /* Scan CPUs in all SDs */
+ sg = sd->groups;
+ do {
+ for_each_cpu_and(i, &p->cpus_allowed, sched_group_span(sg)) {
+ unsigned long capacity_curr = capacity_curr_of(i);
+ unsigned long capacity_orig = capacity_orig_of(i);
+ unsigned long wake_util, new_util;
+ long spare_cap;
+ int idle_idx = INT_MAX;
+
+ if (!cpu_online(i))
+ continue;
+
+ if (walt_cpu_high_irqload(i))
+ continue;
+
+ /*
+ * p's blocked utilization is still accounted for on prev_cpu
+ * so prev_cpu will receive a negative bias due to the double
+ * accounting. However, the blocked utilization may be zero.
+ */
+ wake_util = cpu_util_wake(i, p);
+ new_util = wake_util + task_util_est(p);
+
+ /*
+ * Ensure minimum capacity to grant the required boost.
+ * The target CPU can be already at a capacity level higher
+ * than the one required to boost the task.
+ */
+ new_util = max(min_util, new_util);
+ if (new_util > capacity_orig)
+ continue;
+
+ /*
+ * Pre-compute the maximum possible capacity we expect
+ * to have available on this CPU once the task is
+ * enqueued here.
+ */
+ spare_cap = capacity_orig - new_util;
+
+ if (idle_cpu(i))
+ idle_idx = idle_get_state_idx(cpu_rq(i));
+
+
+ /*
+ * Case A) Latency sensitive tasks
+ *
+ * Unconditionally favoring tasks that prefer idle CPU to
+ * improve latency.
+ *
+ * Looking for:
+ * - an idle CPU, whatever its idle_state is, since
+ * the first CPUs we explore are more likely to be
+ * reserved for latency sensitive tasks.
+ * - a non idle CPU where the task fits in its current
+ * capacity and has the maximum spare capacity.
+ * - a non idle CPU with lower contention from other
+ * tasks and running at the lowest possible OPP.
+ *
+ * The last two goals tries to favor a non idle CPU
+ * where the task can run as if it is "almost alone".
+ * A maximum spare capacity CPU is favoured since
+ * the task already fits into that CPU's capacity
+ * without waiting for an OPP chance.
+ *
+ * The following code path is the only one in the CPUs
+ * exploration loop which is always used by
+ * prefer_idle tasks. It exits the loop with wither a
+ * best_active_cpu or a target_cpu which should
+ * represent an optimal choice for latency sensitive
+ * tasks.
+ */
+ if (prefer_idle) {
+
+ /*
+ * Case A.1: IDLE CPU
+ * Return the best IDLE CPU we find:
+ * - for boosted tasks: the CPU with the highest
+ * performance (i.e. biggest capacity_orig)
+ * - for !boosted tasks: the most energy
+ * efficient CPU (i.e. smallest capacity_orig)
+ */
+ if (idle_cpu(i)) {
+ if (boosted &&
+ capacity_orig < target_capacity)
+ continue;
+ if (!boosted &&
+ capacity_orig > target_capacity)
+ continue;
+ if (capacity_orig == target_capacity &&
+ sysctl_sched_cstate_aware &&
+ best_idle_cstate <= idle_idx)
+ continue;
+
+ target_capacity = capacity_orig;
+ best_idle_cstate = idle_idx;
+ best_idle_cpu = i;
+ continue;
+ }
+ if (best_idle_cpu != -1)
+ continue;
+
+ /*
+ * Case A.2: Target ACTIVE CPU
+ * Favor CPUs with max spare capacity.
+ */
+ if (capacity_curr > new_util &&
+ spare_cap > target_max_spare_cap) {
+ target_max_spare_cap = spare_cap;
+ target_cpu = i;
+ continue;
+ }
+ if (target_cpu != -1)
+ continue;
+
+
+ /*
+ * Case A.3: Backup ACTIVE CPU
+ * Favor CPUs with:
+ * - lower utilization due to other tasks
+ * - lower utilization with the task in
+ */
+ if (wake_util > min_wake_util)
+ continue;
+ if (new_util > best_active_util)
+ continue;
+ min_wake_util = wake_util;
+ best_active_util = new_util;
+ best_active_cpu = i;
+ continue;
+ }
+
+ /*
+ * Enforce EAS mode
+ *
+ * For non latency sensitive tasks, skip CPUs that
+ * will be overutilized by moving the task there.
+ *
+ * The goal here is to remain in EAS mode as long as
+ * possible at least for !prefer_idle tasks.
+ */
+ if ((new_util * capacity_margin) >
+ (capacity_orig * SCHED_CAPACITY_SCALE))
+ continue;
+
+ /*
+ * Favor CPUs with smaller capacity for non latency
+ * sensitive tasks.
+ */
+ if (capacity_orig > target_capacity)
+ continue;
+
+ /*
+ * Case B) Non latency sensitive tasks on IDLE CPUs.
+ *
+ * Find an optimal backup IDLE CPU for non latency
+ * sensitive tasks.
+ *
+ * Looking for:
+ * - minimizing the capacity_orig,
+ * i.e. preferring LITTLE CPUs
+ * - favoring shallowest idle states
+ * i.e. avoid to wakeup deep-idle CPUs
+ *
+ * The following code path is used by non latency
+ * sensitive tasks if IDLE CPUs are available. If at
+ * least one of such CPUs are available it sets the
+ * best_idle_cpu to the most suitable idle CPU to be
+ * selected.
+ *
+ * If idle CPUs are available, favour these CPUs to
+ * improve performances by spreading tasks.
+ * Indeed, the energy_diff() computed by the caller
+ * will take care to ensure the minimization of energy
+ * consumptions without affecting performance.
+ */
+ if (idle_cpu(i)) {
+ /*
+ * Skip CPUs in deeper idle state, but only
+ * if they are also less energy efficient.
+ * IOW, prefer a deep IDLE LITTLE CPU vs a
+ * shallow idle big CPU.
+ */
+ if (capacity_orig == target_capacity &&
+ sysctl_sched_cstate_aware &&
+ best_idle_cstate <= idle_idx)
+ continue;
+
+ target_capacity = capacity_orig;
+ best_idle_cstate = idle_idx;
+ best_idle_cpu = i;
+ continue;
+ }
+
+ /*
+ * Case C) Non latency sensitive tasks on ACTIVE CPUs.
+ *
+ * Pack tasks in the most energy efficient capacities.
+ *
+ * This task packing strategy prefers more energy
+ * efficient CPUs (i.e. pack on smaller maximum
+ * capacity CPUs) while also trying to spread tasks to
+ * run them all at the lower OPP.
+ *
+ * This assumes for example that it's more energy
+ * efficient to run two tasks on two CPUs at a lower
+ * OPP than packing both on a single CPU but running
+ * that CPU at an higher OPP.
+ *
+ * Thus, this case keep track of the CPU with the
+ * smallest maximum capacity and highest spare maximum
+ * capacity.
+ */
+
+ /* Favor CPUs with maximum spare capacity */
+ if (capacity_orig == target_capacity &&
+ spare_cap < target_max_spare_cap)
+ continue;
+
+ target_max_spare_cap = spare_cap;
+ target_capacity = capacity_orig;
+ target_util = new_util;
+ target_cpu = i;
+ }
+
+ } while (sg = sg->next, sg != sd->groups);
+
+ /*
+ * For non latency sensitive tasks, cases B and C in the previous loop,
+ * we pick the best IDLE CPU only if we was not able to find a target
+ * ACTIVE CPU.
+ *
+ * Policies priorities:
+ *
+ * - prefer_idle tasks:
+ *
+ * a) IDLE CPU available: best_idle_cpu
+ * b) ACTIVE CPU where task fits and has the bigger maximum spare
+ * capacity (i.e. target_cpu)
+ * c) ACTIVE CPU with less contention due to other tasks
+ * (i.e. best_active_cpu)
+ *
+ * - NON prefer_idle tasks:
+ *
+ * a) ACTIVE CPU: target_cpu
+ * b) IDLE CPU: best_idle_cpu
+ */
+
+ if (prefer_idle && (best_idle_cpu != -1)) {
+ trace_sched_find_best_target(p, prefer_idle, min_util, cpu,
+ best_idle_cpu, best_active_cpu,
+ best_idle_cpu);
+
+ return best_idle_cpu;
+ }
+
+ if (target_cpu == -1)
+ target_cpu = prefer_idle
+ ? best_active_cpu
+ : best_idle_cpu;
+ else
+ *backup_cpu = prefer_idle
+ ? best_active_cpu
+ : best_idle_cpu;
+
+ trace_sched_find_best_target(p, prefer_idle, min_util, cpu,
+ best_idle_cpu, best_active_cpu,
+ target_cpu);
+
+ /* it is possible for target and backup
+ * to select same CPU - if so, drop backup
+ */
+ if (*backup_cpu == target_cpu)
+ *backup_cpu = -1;
+
+ return target_cpu;
}
/*
@@ -5913,8 +7563,11 @@
{
long min_cap, max_cap;
+ if (!static_branch_unlikely(&sched_asym_cpucapacity))
+ return 0;
+
min_cap = min(capacity_orig_of(prev_cpu), capacity_orig_of(cpu));
- max_cap = cpu_rq(cpu)->rd->max_cpu_capacity;
+ max_cap = cpu_rq(cpu)->rd->max_cpu_capacity.val;
/* Minimum capacity is close to max, no need to abort wake_affine */
if (max_cap - min_cap < max_cap >> 3)
@@ -5923,7 +7576,297 @@
/* Bring task utilization in sync with prev_cpu */
sync_entity_load_avg(&p->se);
- return min_cap * 1024 < task_util(p) * capacity_margin;
+ return !task_fits_capacity(p, min_cap);
+}
+
+static bool cpu_overutilized(int cpu)
+{
+ return (capacity_of(cpu) * 1024) < (cpu_util(cpu) * capacity_margin);
+}
+
+DEFINE_PER_CPU(struct energy_env, eenv_cache);
+
+/* kernels often have NR_CPUS defined to be much
+ * larger than exist in practise on booted systems.
+ * Allocate the cpu array for eenv calculations
+ * at boot time to avoid massive overprovisioning.
+ */
+#ifdef DEBUG_EENV_DECISIONS
+static inline int eenv_debug_size_per_dbg_entry(void)
+{
+ return sizeof(struct _eenv_debug) + (sizeof(unsigned long) * num_possible_cpus());
+}
+
+static inline int eenv_debug_size_per_cpu_entry(void)
+{
+ /* each cpu struct has an array of _eenv_debug structs
+ * which have an array of unsigned longs at the end -
+ * the allocation should be extended so that there are
+ * at least 'num_possible_cpus' entries in the array.
+ */
+ return EAS_EENV_DEBUG_LEVELS * eenv_debug_size_per_dbg_entry();
+}
+/* given a per-_eenv_cpu debug env ptr, get the ptr for a given index */
+static inline struct _eenv_debug *eenv_debug_entry_ptr(struct _eenv_debug *base, int idx)
+{
+ char *ptr = (char *)base;
+ ptr += (idx * eenv_debug_size_per_dbg_entry());
+ return (struct _eenv_debug *)ptr;
+}
+/* given a pointer to the per-cpu global copy of _eenv_debug, get
+ * a pointer to the specified _eenv_cpu debug env.
+ */
+static inline struct _eenv_debug *eenv_debug_percpu_debug_env_ptr(struct _eenv_debug *base, int cpu_idx)
+{
+ char *ptr = (char *)base;
+ ptr += (cpu_idx * eenv_debug_size_per_cpu_entry());
+ return (struct _eenv_debug *)ptr;
+}
+
+static inline int eenv_debug_size(void)
+{
+ return num_possible_cpus() * eenv_debug_size_per_cpu_entry();
+}
+#endif
+
+static inline void alloc_eenv(void)
+{
+ int cpu;
+ int cpu_count = num_possible_cpus();
+
+ for_each_possible_cpu(cpu) {
+ struct energy_env *eenv = &per_cpu(eenv_cache, cpu);
+ eenv->cpu = kmalloc(sizeof(struct eenv_cpu) * cpu_count, GFP_KERNEL);
+ eenv->eenv_cpu_count = cpu_count;
+#ifdef DEBUG_EENV_DECISIONS
+ eenv->debug = (struct _eenv_debug *)kmalloc(eenv_debug_size(), GFP_KERNEL);
+#endif
+ }
+}
+
+static inline void reset_eenv(struct energy_env *eenv)
+{
+ int cpu_count;
+ struct eenv_cpu *cpu;
+#ifdef DEBUG_EENV_DECISIONS
+ struct _eenv_debug *debug;
+ int cpu_idx;
+ debug = eenv->debug;
+#endif
+
+ cpu_count = eenv->eenv_cpu_count;
+ cpu = eenv->cpu;
+ memset(eenv, 0, sizeof(struct energy_env));
+ eenv->cpu = cpu;
+ memset(eenv->cpu, 0, sizeof(struct eenv_cpu)*cpu_count);
+ eenv->eenv_cpu_count = cpu_count;
+
+#ifdef DEBUG_EENV_DECISIONS
+ memset(debug, 0, eenv_debug_size());
+ eenv->debug = debug;
+ for(cpu_idx = 0; cpu_idx < eenv->cpu_array_len; cpu_idx++)
+ eenv->cpu[cpu_idx].debug = eenv_debug_percpu_debug_env_ptr(debug, cpu_idx);
+#endif
+}
+/*
+ * get_eenv - reset the eenv struct cached for this CPU
+ *
+ * When the eenv is returned, it is configured to do
+ * energy calculations for the maximum number of CPUs
+ * the task can be placed on. The prev_cpu entry is
+ * filled in here. Callers are responsible for adding
+ * other CPU candidates up to eenv->max_cpu_count.
+ */
+static inline struct energy_env *get_eenv(struct task_struct *p, int prev_cpu)
+{
+ struct energy_env *eenv;
+ cpumask_t cpumask_possible_cpus;
+ int cpu = smp_processor_id();
+ int i;
+
+ eenv = &(per_cpu(eenv_cache, cpu));
+ reset_eenv(eenv);
+
+ /* populate eenv */
+ eenv->p = p;
+ /* use boosted task util for capacity selection
+ * during energy calculation, but unboosted task
+ * util for group utilization calculations
+ */
+ eenv->util_delta = task_util_est(p);
+ eenv->util_delta_boosted = boosted_task_util(p);
+
+ cpumask_and(&cpumask_possible_cpus, &p->cpus_allowed, cpu_online_mask);
+ eenv->max_cpu_count = cpumask_weight(&cpumask_possible_cpus);
+
+ for (i=0; i < eenv->max_cpu_count; i++)
+ eenv->cpu[i].cpu_id = -1;
+ eenv->cpu[EAS_CPU_PRV].cpu_id = prev_cpu;
+ eenv->next_idx = EAS_CPU_PRV;
+
+ return eenv;
+}
+
+/*
+ * Needs to be called inside rcu_read_lock critical section.
+ * sd is a pointer to the sched domain we wish to use for an
+ * energy-aware placement option.
+ */
+static int find_energy_efficient_cpu(struct sched_domain *sd,
+ struct task_struct *p,
+ int cpu, int prev_cpu,
+ int sync)
+{
+ int use_fbt = sched_feat(FIND_BEST_TARGET);
+ int cpu_iter, eas_cpu_idx = EAS_CPU_NXT;
+ int target_cpu = -1;
+ struct energy_env *eenv;
+
+ if (sysctl_sched_sync_hint_enable && sync) {
+ if (cpumask_test_cpu(cpu, &p->cpus_allowed)) {
+ return cpu;
+ }
+ }
+
+ /* prepopulate energy diff environment */
+ eenv = get_eenv(p, prev_cpu);
+ if (eenv->max_cpu_count < 2)
+ return -1;
+
+ if(!use_fbt) {
+ /*
+ * using this function outside wakeup balance will not supply
+ * an sd ptr. Instead, fetch the highest level with energy data.
+ */
+ if (!sd)
+ sd = rcu_dereference(per_cpu(sd_ea, prev_cpu));
+
+ for_each_cpu_and(cpu_iter, &p->cpus_allowed, sched_domain_span(sd)) {
+ unsigned long spare;
+
+ /* prev_cpu already in list */
+ if (cpu_iter == prev_cpu)
+ continue;
+
+ /*
+ * Consider only CPUs where the task is expected to
+ * fit without making the CPU overutilized.
+ */
+ spare = capacity_spare_wake(cpu_iter, p);
+ if (spare * 1024 < capacity_margin * task_util_est(p))
+ continue;
+
+ /* Add CPU candidate */
+ eenv->cpu[eas_cpu_idx++].cpu_id = cpu_iter;
+ eenv->max_cpu_count = eas_cpu_idx;
+
+ /* stop adding CPUs if we have no space left */
+ if (eas_cpu_idx >= eenv->eenv_cpu_count)
+ break;
+ }
+ } else {
+ int boosted = (schedtune_task_boost(p) > 0);
+ int prefer_idle;
+
+ /*
+ * give compiler a hint that if sched_features
+ * cannot be changed, it is safe to optimise out
+ * all if(prefer_idle) blocks.
+ */
+ prefer_idle = sched_feat(EAS_PREFER_IDLE) ?
+ (schedtune_prefer_idle(p) > 0) : 0;
+
+ eenv->max_cpu_count = EAS_CPU_BKP + 1;
+
+ /* Find a cpu with sufficient capacity */
+ target_cpu = find_best_target(p, &eenv->cpu[EAS_CPU_BKP].cpu_id,
+ boosted, prefer_idle);
+
+ /* Immediately return a found idle CPU for a prefer_idle task */
+ if (prefer_idle && target_cpu >= 0 && idle_cpu(target_cpu))
+ return target_cpu;
+
+ /* Place target into NEXT slot */
+ eenv->cpu[EAS_CPU_NXT].cpu_id = target_cpu;
+
+ /* take note if no backup was found */
+ if (eenv->cpu[EAS_CPU_BKP].cpu_id < 0)
+ eenv->max_cpu_count = EAS_CPU_BKP;
+
+ /* take note if no target was found */
+ if (eenv->cpu[EAS_CPU_NXT].cpu_id < 0)
+ eenv->max_cpu_count = EAS_CPU_NXT;
+ }
+
+ if (eenv->max_cpu_count == EAS_CPU_NXT) {
+ /*
+ * we did not find any energy-awareness
+ * candidates beyond prev_cpu, so we will
+ * fall-back to the regular slow-path.
+ */
+ return -1;
+ }
+
+ /* find most energy-efficient CPU */
+ target_cpu = select_energy_cpu_idx(eenv) < 0 ? -1 :
+ eenv->cpu[eenv->next_idx].cpu_id;
+
+ return target_cpu;
+}
+
+static inline bool nohz_kick_needed(struct rq *rq, bool only_update);
+static void nohz_balancer_kick(bool only_update);
+
+/*
+ * wake_energy: Make the decision if we want to use an energy-aware
+ * wakeup task placement or not. This is limited to situations where
+ * we cannot use energy-awareness right now.
+ *
+ * Returns TRUE if we should attempt energy-aware wakeup, FALSE if not.
+ *
+ * Should only be called from select_task_rq_fair inside the RCU
+ * read-side critical section.
+ */
+static inline int wake_energy(struct task_struct *p, int prev_cpu,
+ int sd_flag, int wake_flags)
+{
+ struct sched_domain *sd = NULL;
+ int sync = wake_flags & WF_SYNC;
+
+ sd = rcu_dereference_sched(cpu_rq(prev_cpu)->sd);
+
+ /*
+ * Check all definite no-energy-awareness conditions
+ */
+ if (!sd)
+ return false;
+
+ if (!energy_aware())
+ return false;
+
+ if (sd_overutilized(sd))
+ return false;
+
+ /*
+ * we cannot do energy-aware wakeup placement sensibly
+ * for tasks with 0 utilization, so let them be placed
+ * according to the normal strategy.
+ * However if fbt is in use we may still benefit from
+ * the heuristics we use there in selecting candidate
+ * CPUs.
+ */
+ if (unlikely(!sched_feat(FIND_BEST_TARGET) && !task_util_est(p)))
+ return false;
+
+ if(!sched_feat(EAS_PREFER_IDLE)){
+ /*
+ * Force prefer-idle tasks into the slow path, this may not happen
+ * if none of the sd flags matched.
+ */
+ if (schedtune_prefer_idle(p) > 0 && !sync)
+ return false;
+ }
+ return true;
}
/*
@@ -5939,21 +7882,28 @@
* preempt must be disabled.
*/
static int
-select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags)
+select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_flags,
+ int sibling_count_hint)
{
- struct sched_domain *tmp, *affine_sd = NULL, *sd = NULL;
+ struct sched_domain *tmp, *affine_sd = NULL;
+ struct sched_domain *sd = NULL, *energy_sd = NULL;
int cpu = smp_processor_id();
int new_cpu = prev_cpu;
int want_affine = 0;
+ int want_energy = 0;
int sync = wake_flags & WF_SYNC;
+ rcu_read_lock();
+
if (sd_flag & SD_BALANCE_WAKE) {
record_wakee(p);
- want_affine = !wake_wide(p) && !wake_cap(p, cpu, prev_cpu)
- && cpumask_test_cpu(cpu, &p->cpus_allowed);
+ want_energy = wake_energy(p, prev_cpu, sd_flag, wake_flags);
+ want_affine = !want_energy &&
+ !wake_wide(p, sibling_count_hint) &&
+ !wake_cap(p, cpu, prev_cpu) &&
+ cpumask_test_cpu(cpu, &p->cpus_allowed);
}
- rcu_read_lock();
for_each_domain(cpu, tmp) {
if (!(tmp->flags & SD_LOAD_BALANCE))
break;
@@ -5968,9 +7918,22 @@
break;
}
+ /*
+ * If we are able to try an energy-aware wakeup,
+ * select the highest non-overutilized sched domain
+ * which includes this cpu and prev_cpu
+ *
+ * maybe want to not test prev_cpu and only consider
+ * the current one?
+ */
+ if (want_energy &&
+ !sd_overutilized(tmp) &&
+ cpumask_test_cpu(prev_cpu, sched_domain_span(tmp)))
+ energy_sd = tmp;
+
if (tmp->flags & sd_flag)
sd = tmp;
- else if (!want_affine)
+ else if (!(want_affine || want_energy))
break;
}
@@ -5983,47 +7946,38 @@
new_cpu = cpu;
}
+ if (sd && !(sd_flag & SD_BALANCE_FORK)) {
+ /*
+ * We're going to need the task's util for capacity_spare_wake
+ * in find_idlest_group. Sync it up to prev_cpu's
+ * last_update_time.
+ */
+ sync_entity_load_avg(&p->se);
+ }
+
if (!sd) {
- pick_cpu:
+pick_cpu:
if (sd_flag & SD_BALANCE_WAKE) /* XXX always ? */
new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
- } else while (sd) {
- struct sched_group *group;
- int weight;
+ } else {
+ if (energy_sd)
+ new_cpu = find_energy_efficient_cpu(energy_sd, p, cpu, prev_cpu, sync);
- if (!(sd->flags & sd_flag)) {
- sd = sd->child;
- continue;
- }
-
- group = find_idlest_group(sd, p, cpu, sd_flag);
- if (!group) {
- sd = sd->child;
- continue;
- }
-
- new_cpu = find_idlest_cpu(group, p, cpu);
- if (new_cpu == -1 || new_cpu == cpu) {
- /* Now try balancing at a lower domain level of cpu */
- sd = sd->child;
- continue;
- }
-
- /* Now try balancing at a lower domain level of new_cpu */
- cpu = new_cpu;
- weight = sd->span_weight;
- sd = NULL;
- for_each_domain(cpu, tmp) {
- if (weight <= tmp->span_weight)
- break;
- if (tmp->flags & sd_flag)
- sd = tmp;
- }
- /* while loop will break here if sd == NULL */
+ /* if we did an energy-aware placement and had no choices available
+ * then fall back to the default find_idlest_cpu choice
+ */
+ if (!energy_sd || (energy_sd && new_cpu == -1))
+ new_cpu = find_idlest_cpu(sd, p, cpu, prev_cpu, sd_flag);
}
+
rcu_read_unlock();
+#ifdef CONFIG_NO_HZ_COMMON
+ if (nohz_kick_needed(cpu_rq(new_cpu), true))
+ nohz_balancer_kick(true);
+#endif
+
return new_cpu;
}
@@ -6338,6 +8292,8 @@
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
+ update_misfit_status(p, rq);
+
return p;
simple:
#endif
@@ -6355,9 +8311,12 @@
if (hrtick_enabled(rq))
hrtick_start_fair(rq, p);
+ update_misfit_status(p, rq);
+
return p;
idle:
+ update_misfit_status(NULL, rq);
new_tasks = idle_balance(rq, rf);
/*
@@ -6563,6 +8522,13 @@
enum fbq_type { regular, remote, all };
+enum group_type {
+ group_other = 0,
+ group_misfit_task,
+ group_imbalanced,
+ group_overloaded,
+};
+
#define LBF_ALL_PINNED 0x01
#define LBF_NEED_BREAK 0x02
#define LBF_DST_PINNED 0x04
@@ -6581,6 +8547,7 @@
int new_dst_cpu;
enum cpu_idle_type idle;
long imbalance;
+ unsigned int src_grp_nr_running;
/* The set of CPUs under consideration for load-balancing */
struct cpumask *cpus;
@@ -6591,6 +8558,7 @@
unsigned int loop_max;
enum fbq_type fbq_type;
+ enum group_type src_grp_type;
struct list_head tasks;
};
@@ -7004,6 +8972,10 @@
if (cfs_rq_is_decayed(cfs_rq))
list_del_leaf_cfs_rq(cfs_rq);
}
+ update_rt_rq_load_avg(rq_clock_task(rq), cpu, &rq->rt, 0);
+#ifdef CONFIG_NO_HZ_COMMON
+ rq->last_blocked_load_update_tick = jiffies;
+#endif
rq_unlock_irqrestore(rq, &rf);
}
@@ -7063,6 +9035,10 @@
rq_lock_irqsave(rq, &rf);
update_rq_clock(rq);
update_cfs_rq_load_avg(cfs_rq_clock_task(cfs_rq), cfs_rq);
+ update_rt_rq_load_avg(rq_clock_task(rq), cpu, &rq->rt, 0);
+#ifdef CONFIG_NO_HZ_COMMON
+ rq->last_blocked_load_update_tick = jiffies;
+#endif
rq_unlock_irqrestore(rq, &rf);
}
@@ -7074,12 +9050,6 @@
/********** Helpers for find_busiest_group ************************/
-enum group_type {
- group_other = 0,
- group_imbalanced,
- group_overloaded,
-};
-
/*
* sg_lb_stats - stats of a sched_group required for load_balancing
*/
@@ -7095,6 +9065,8 @@
unsigned int group_weight;
enum group_type group_type;
int group_no_capacity;
+ /* A cpu has a task too big for its capacity */
+ unsigned long group_misfit_task_load;
#ifdef CONFIG_NUMA_BALANCING
unsigned int nr_numa_running;
unsigned int nr_preferred_running;
@@ -7111,6 +9083,7 @@
unsigned long total_running;
unsigned long total_load; /* Total load of all groups in sd */
unsigned long total_capacity; /* Total capacity of all groups in sd */
+ unsigned long total_util; /* Total util of all groups in sd */
unsigned long avg_load; /* Average load across all groups in sd */
struct sg_lb_stats busiest_stat;/* Statistics of the busiest group */
@@ -7131,6 +9104,7 @@
.total_running = 0UL,
.total_load = 0UL,
.total_capacity = 0UL,
+ .total_util = 0UL,
.busiest_stat = {
.avg_load = 0UL,
.sum_nr_running = 0,
@@ -7194,13 +9168,46 @@
return 1;
}
+void init_max_cpu_capacity(struct max_cpu_capacity *mcc)
+{
+ raw_spin_lock_init(&mcc->lock);
+ mcc->val = 0;
+ mcc->cpu = -1;
+}
+
static void update_cpu_capacity(struct sched_domain *sd, int cpu)
{
unsigned long capacity = arch_scale_cpu_capacity(sd, cpu);
struct sched_group *sdg = sd->groups;
+ struct max_cpu_capacity *mcc;
+ unsigned long max_capacity;
+ int max_cap_cpu;
+ unsigned long flags;
cpu_rq(cpu)->cpu_capacity_orig = capacity;
+ capacity *= arch_scale_max_freq_capacity(sd, cpu);
+ capacity >>= SCHED_CAPACITY_SHIFT;
+
+ mcc = &cpu_rq(cpu)->rd->max_cpu_capacity;
+
+ raw_spin_lock_irqsave(&mcc->lock, flags);
+ max_capacity = mcc->val;
+ max_cap_cpu = mcc->cpu;
+
+ if ((max_capacity > capacity && max_cap_cpu == cpu) ||
+ max_capacity < capacity) {
+ mcc->val = capacity;
+ mcc->cpu = cpu;
+#ifdef CONFIG_SCHED_DEBUG
+ raw_spin_unlock_irqrestore(&mcc->lock, flags);
+ pr_info("CPU%d: update max cpu_capacity %lu\n", cpu, capacity);
+ goto skip_unlock;
+#endif
+ }
+ raw_spin_unlock_irqrestore(&mcc->lock, flags);
+
+skip_unlock: __attribute__ ((unused));
capacity *= scale_rt_capacity(cpu);
capacity >>= SCHED_CAPACITY_SHIFT;
@@ -7210,13 +9217,14 @@
cpu_rq(cpu)->cpu_capacity = capacity;
sdg->sgc->capacity = capacity;
sdg->sgc->min_capacity = capacity;
+ sdg->sgc->max_capacity = capacity;
}
void update_group_capacity(struct sched_domain *sd, int cpu)
{
struct sched_domain *child = sd->child;
struct sched_group *group, *sdg = sd->groups;
- unsigned long capacity, min_capacity;
+ unsigned long capacity, min_capacity, max_capacity;
unsigned long interval;
interval = msecs_to_jiffies(sd->balance_interval);
@@ -7230,6 +9238,7 @@
capacity = 0;
min_capacity = ULONG_MAX;
+ max_capacity = 0;
if (child->flags & SD_OVERLAP) {
/*
@@ -7260,6 +9269,7 @@
}
min_capacity = min(capacity, min_capacity);
+ max_capacity = max(capacity, max_capacity);
}
} else {
/*
@@ -7273,12 +9283,14 @@
capacity += sgc->capacity;
min_capacity = min(sgc->min_capacity, min_capacity);
+ max_capacity = max(sgc->max_capacity, max_capacity);
group = group->next;
} while (group != child->groups);
}
sdg->sgc->capacity = capacity;
sdg->sgc->min_capacity = min_capacity;
+ sdg->sgc->max_capacity = max_capacity;
}
/*
@@ -7374,16 +9386,40 @@
}
/*
- * group_smaller_cpu_capacity: Returns true if sched_group sg has smaller
+ * group_smaller_min_cpu_capacity: Returns true if sched_group sg has smaller
* per-CPU capacity than sched_group ref.
*/
static inline bool
-group_smaller_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
+group_smaller_min_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
{
return sg->sgc->min_capacity * capacity_margin <
ref->sgc->min_capacity * 1024;
}
+/*
+ * group_smaller_max_cpu_capacity: Returns true if sched_group sg has smaller
+ * per-CPU capacity_orig than sched_group ref.
+ */
+static inline bool
+group_smaller_max_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
+{
+ return sg->sgc->max_capacity * capacity_margin <
+ ref->sgc->max_capacity * 1024;
+}
+
+/*
+ * group_similar_cpu_capacity: Returns true if the minimum capacity of the
+ * compared groups differ by less than 12.5%.
+ */
+static inline bool
+group_similar_cpu_capacity(struct sched_group *sg, struct sched_group *ref)
+{
+ long diff = sg->sgc->min_capacity - ref->sgc->min_capacity;
+ long max = max(sg->sgc->min_capacity, ref->sgc->min_capacity);
+
+ return abs(diff) < max >> 3;
+}
+
static inline enum
group_type group_classify(struct sched_group *group,
struct sg_lb_stats *sgs)
@@ -7394,6 +9430,9 @@
if (sg_imbalanced(group))
return group_imbalanced;
+ if (sgs->group_misfit_task_load)
+ return group_misfit_task;
+
return group_other;
}
@@ -7404,12 +9443,13 @@
* @load_idx: Load index of sched_domain of this_cpu for load calc.
* @local_group: Does group contain this_cpu.
* @sgs: variable to hold the statistics for this group.
- * @overload: Indicate more than one runnable task for any CPU.
+ * @overload: Indicate pullable load (e.g. >1 runnable task).
+ * @overutilized: Indicate overutilization for any CPU.
*/
static inline void update_sg_lb_stats(struct lb_env *env,
struct sched_group *group, int load_idx,
int local_group, struct sg_lb_stats *sgs,
- bool *overload)
+ bool *overload, bool *overutilized, bool *misfit_task)
{
unsigned long load;
int i, nr_running;
@@ -7443,6 +9483,20 @@
*/
if (!nr_running && idle_cpu(i))
sgs->idle_cpus++;
+
+ if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
+ sgs->group_misfit_task_load < rq->misfit_task_load) {
+ sgs->group_misfit_task_load = rq->misfit_task_load;
+ *overload = 1;
+ }
+
+
+ if (cpu_overutilized(i)) {
+ *overutilized = true;
+
+ if (rq->misfit_task_load)
+ *misfit_task = true;
+ }
}
/* Adjust by relative CPU capacity of the group */
@@ -7478,6 +9532,17 @@
{
struct sg_lb_stats *busiest = &sds->busiest_stat;
+ /*
+ * Don't try to pull misfit tasks we can't help.
+ * We can use max_capacity here as reduction in capacity on some
+ * cpus in the group should either be possible to resolve
+ * internally or be covered by avg_load imbalance (eventually).
+ */
+ if (sgs->group_type == group_misfit_task &&
+ (!group_smaller_max_cpu_capacity(sg, sds->local) ||
+ !group_has_capacity(env, &sds->local_stat)))
+ return false;
+
if (sgs->group_type > busiest->group_type)
return true;
@@ -7497,7 +9562,23 @@
* power/energy consequences are not considered.
*/
if (sgs->sum_nr_running <= sgs->group_weight &&
- group_smaller_cpu_capacity(sds->local, sg))
+ group_smaller_min_cpu_capacity(sds->local, sg))
+ return false;
+
+ /*
+ * Candidate sg doesn't face any severe imbalance issues so
+ * don't disturb unless the groups are of similar capacity
+ * where balancing is more harmless.
+ */
+ if (sgs->group_type == group_other &&
+ !group_similar_cpu_capacity(sds->local, sg))
+ return false;
+
+ /*
+ * If we have more than one misfit sg go with the biggest misfit.
+ */
+ if (sgs->group_type == group_misfit_task &&
+ sgs->group_misfit_task_load < busiest->group_misfit_task_load)
return false;
asym_packing:
@@ -7557,6 +9638,18 @@
}
#endif /* CONFIG_NUMA_BALANCING */
+#ifdef CONFIG_NO_HZ_COMMON
+static struct {
+ cpumask_var_t idle_cpus_mask;
+ atomic_t nr_cpus;
+ unsigned long next_balance; /* in jiffy units */
+ unsigned long next_update; /* in jiffy units */
+} nohz ____cacheline_aligned;
+#endif
+
+#define lb_sd_parent(sd) \
+ (sd->parent && sd->parent->groups != sd->parent->groups->next)
+
/**
* update_sd_lb_stats - Update sched_domain's statistics for load balancing.
* @env: The load balancing environment.
@@ -7568,11 +9661,33 @@
struct sched_group *sg = env->sd->groups;
struct sg_lb_stats *local = &sds->local_stat;
struct sg_lb_stats tmp_sgs;
- int load_idx, prefer_sibling = 0;
- bool overload = false;
+ int load_idx;
+ bool overload = false, overutilized = false, misfit_task = false;
+ bool prefer_sibling = child && child->flags & SD_PREFER_SIBLING;
- if (child && child->flags & SD_PREFER_SIBLING)
- prefer_sibling = 1;
+#ifdef CONFIG_NO_HZ_COMMON
+ if (env->idle == CPU_NEWLY_IDLE) {
+ int cpu;
+
+ /* Update the stats of NOHZ idle CPUs in the sd */
+ for_each_cpu_and(cpu, sched_domain_span(env->sd),
+ nohz.idle_cpus_mask) {
+ struct rq *rq = cpu_rq(cpu);
+
+ /* ... Unless we've already done since the last tick */
+ if (time_after(jiffies,
+ rq->last_blocked_load_update_tick))
+ update_blocked_averages(cpu);
+ }
+ }
+ /*
+ * If we've just updated all of the NOHZ idle CPUs, then we can push
+ * back the next nohz.next_update, which will prevent an unnecessary
+ * wakeup for the nohz stats kick
+ */
+ if (cpumask_subset(nohz.idle_cpus_mask, sched_domain_span(env->sd)))
+ nohz.next_update = jiffies + LOAD_AVG_PERIOD;
+#endif
load_idx = get_sd_load_idx(env->sd, env->idle);
@@ -7591,7 +9706,8 @@
}
update_sg_lb_stats(env, sg, load_idx, local_group, sgs,
- &overload);
+ &overload, &overutilized,
+ &misfit_task);
if (local_group)
goto next_group;
@@ -7623,6 +9739,7 @@
sds->total_running += sgs->sum_nr_running;
sds->total_load += sgs->group_load;
sds->total_capacity += sgs->group_capacity;
+ sds->total_util += sgs->group_util;
sg = sg->next;
} while (sg != env->sd->groups);
@@ -7630,11 +9747,52 @@
if (env->sd->flags & SD_NUMA)
env->fbq_type = fbq_classify_group(&sds->busiest_stat);
- if (!env->sd->parent) {
+ env->src_grp_nr_running = sds->busiest_stat.sum_nr_running;
+
+ if (!lb_sd_parent(env->sd)) {
/* update overload indicator if we are at root domain */
- if (env->dst_rq->rd->overload != overload)
- env->dst_rq->rd->overload = overload;
+ if (READ_ONCE(env->dst_rq->rd->overload) != overload)
+ WRITE_ONCE(env->dst_rq->rd->overload, overload);
}
+
+ if (overutilized)
+ set_sd_overutilized(env->sd);
+ else
+ clear_sd_overutilized(env->sd);
+
+ /*
+ * If there is a misfit task in one cpu in this sched_domain
+ * it is likely that the imbalance cannot be sorted out among
+ * the cpu's in this sched_domain. In this case set the
+ * overutilized flag at the parent sched_domain.
+ */
+ if (misfit_task) {
+ struct sched_domain *sd = env->sd->parent;
+
+ /*
+ * In case of a misfit task, load balance at the parent
+ * sched domain level will make sense only if the the cpus
+ * have a different capacity. If cpus at a domain level have
+ * the same capacity, the misfit task cannot be well
+ * accomodated in any of the cpus and there in no point in
+ * trying a load balance at this level
+ */
+ while (sd) {
+ if (sd->flags & SD_ASYM_CPUCAPACITY) {
+ set_sd_overutilized(sd);
+ break;
+ }
+ sd = sd->parent;
+ }
+ }
+
+ /*
+ * If the domain util is greater that domain capacity, load balancing
+ * needs to be done at the next sched domain level as well.
+ */
+ if (lb_sd_parent(env->sd) &&
+ sds->total_capacity * 1024 < sds->total_util * capacity_margin)
+ set_sd_overutilized(env->sd->parent);
}
/**
@@ -7750,7 +9908,22 @@
capa_move /= SCHED_CAPACITY_SCALE;
/* Move if we gain throughput */
- if (capa_move > capa_now)
+ if (capa_move > capa_now) {
+ env->imbalance = busiest->load_per_task;
+ return;
+ }
+
+ /* We can't see throughput improvement with the load-based
+ * method, but it is possible depending upon group size and
+ * capacity range that there might still be an underutilized
+ * cpu available in an asymmetric capacity system. Do one last
+ * check just in case.
+ */
+ if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
+ busiest->group_type == group_overloaded &&
+ busiest->sum_nr_running > busiest->group_weight &&
+ local->sum_nr_running < local->group_weight &&
+ local->group_capacity < busiest->group_capacity)
env->imbalance = busiest->load_per_task;
}
@@ -7783,8 +9956,9 @@
* factors in sg capacity and sgs with smaller group_type are
* skipped when updating the busiest sg:
*/
- if (busiest->avg_load <= sds->avg_load ||
- local->avg_load >= sds->avg_load) {
+ if (busiest->group_type != group_misfit_task &&
+ (busiest->avg_load <= sds->avg_load ||
+ local->avg_load >= sds->avg_load)) {
env->imbalance = 0;
return fix_small_imbalance(env, sds);
}
@@ -7818,6 +9992,22 @@
(sds->avg_load - local->avg_load) * local->group_capacity
) / SCHED_CAPACITY_SCALE;
+ /* Boost imbalance to allow misfit task to be balanced.
+ * Always do this if we are doing a NEWLY_IDLE balance
+ * on the assumption that any tasks we have must not be
+ * long-running (and hence we cannot rely upon load).
+ * However if we are not idle, we should assume the tasks
+ * we have are longer running and not override load-based
+ * calculations above unless we are sure that the local
+ * group is underutilized.
+ */
+ if (busiest->group_type == group_misfit_task &&
+ (env->idle == CPU_NEWLY_IDLE ||
+ local->sum_nr_running < local->group_weight)) {
+ env->imbalance = max_t(long, env->imbalance,
+ busiest->group_misfit_task_load);
+ }
+
/*
* if *imbalance is less than the average load per runnable task
* there is no guarantee that any tasks will be moved so we'll have
@@ -7853,6 +10043,10 @@
* this level.
*/
update_sd_lb_stats(env, &sds);
+
+ if (energy_aware() && !sd_overutilized(env->sd))
+ goto out_balanced;
+
local = &sds.local_stat;
busiest = &sds.busiest_stat;
@@ -7876,11 +10070,18 @@
if (busiest->group_type == group_imbalanced)
goto force_balance;
- /* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
- if (env->idle == CPU_NEWLY_IDLE && group_has_capacity(env, local) &&
+ /*
+ * When dst_cpu is idle, prevent SMP nice and/or asymmetric group
+ * capacities from resulting in underutilization due to avg_load.
+ */
+ if (env->idle != CPU_NOT_IDLE && group_has_capacity(env, local) &&
busiest->group_no_capacity)
goto force_balance;
+ /* Misfit tasks should be dealt with regardless of the avg load */
+ if (busiest->group_type == group_misfit_task)
+ goto force_balance;
+
/*
* If the local group is busier than the selected busiest group
* don't try and pull any tasks.
@@ -7918,6 +10119,7 @@
force_balance:
/* Looks like there is an imbalance. Compute it */
+ env->src_grp_type = busiest->group_type;
calculate_imbalance(env, &sds);
return sds.busiest;
@@ -7965,8 +10167,31 @@
if (rt > env->fbq_type)
continue;
+ /*
+ * For ASYM_CPUCAPACITY domains with misfit tasks we simply
+ * seek the "biggest" misfit task.
+ */
+ if (env->src_grp_type == group_misfit_task) {
+ if (rq->misfit_task_load > busiest_load) {
+ busiest_load = rq->misfit_task_load;
+ busiest = rq;
+ }
+ continue;
+ }
+
capacity = capacity_of(i);
+ /*
+ * For ASYM_CPUCAPACITY domains, don't pick a cpu that could
+ * eventually lead to active_balancing high->low capacity.
+ * Higher per-cpu capacity is considered better than balancing
+ * average load.
+ */
+ if (env->sd->flags & SD_ASYM_CPUCAPACITY &&
+ capacity_of(env->dst_cpu) < capacity &&
+ rq->nr_running == 1)
+ continue;
+
wl = weighted_cpuload(rq);
/*
@@ -8034,6 +10259,21 @@
return 1;
}
+ if ((capacity_of(env->src_cpu) < capacity_of(env->dst_cpu)) &&
+ ((capacity_orig_of(env->src_cpu) < capacity_orig_of(env->dst_cpu))) &&
+ env->src_rq->cfs.h_nr_running == 1 &&
+ cpu_overutilized(env->src_cpu) &&
+ !cpu_overutilized(env->dst_cpu)) {
+ return 1;
+ }
+
+ if (env->src_grp_type == group_misfit_task)
+ return 1;
+
+ if (env->src_grp_type == group_overloaded &&
+ env->src_rq->misfit_task_load)
+ return 1;
+
return unlikely(sd->nr_balance_failed > sd->cache_nice_tries+2);
}
@@ -8086,7 +10326,7 @@
int *continue_balancing)
{
int ld_moved, cur_ld_moved, active_balance = 0;
- struct sched_domain *sd_parent = sd->parent;
+ struct sched_domain *sd_parent = lb_sd_parent(sd) ? sd->parent : NULL;
struct sched_group *group;
struct rq *busiest;
struct rq_flags rf;
@@ -8252,7 +10492,8 @@
* excessive cache_hot migrations and active balances.
*/
if (idle != CPU_NEWLY_IDLE)
- sd->nr_balance_failed++;
+ if (env.src_grp_nr_running > 1)
+ sd->nr_balance_failed++;
if (need_active_balance(&env)) {
unsigned long flags;
@@ -8348,6 +10589,7 @@
get_sd_balance_interval(struct sched_domain *sd, int cpu_busy)
{
unsigned long interval = sd->balance_interval;
+ unsigned int cpu;
if (cpu_busy)
interval *= sd->busy_factor;
@@ -8356,6 +10598,24 @@
interval = msecs_to_jiffies(interval);
interval = clamp(interval, 1UL, max_load_balance_interval);
+ /*
+ * check if sched domain is marked as overutilized
+ * we ought to only do this on systems which have SD_ASYMCAPACITY
+ * but we want to do it for all sched domains in those systems
+ * So for now, just check if overutilized as a proxy.
+ */
+ /*
+ * If we are overutilized and we have a misfit task, then
+ * we want to balance as soon as practically possible, so
+ * we return an interval of zero.
+ */
+ if (energy_aware() && sd_overutilized(sd)) {
+ /* we know the root is overutilized, let's check for a misfit task */
+ for_each_cpu(cpu, sched_domain_span(sd)) {
+ if (cpu_rq(cpu)->misfit_task_load)
+ return 1;
+ }
+ }
return interval;
}
@@ -8405,7 +10665,7 @@
rq_unpin_lock(this_rq, rf);
if (this_rq->avg_idle < sysctl_sched_migration_cost ||
- !this_rq->rd->overload) {
+ !READ_ONCE(this_rq->rd->overload)) {
rcu_read_lock();
sd = rcu_dereference_check_sched_domain(this_rq->sd);
if (sd)
@@ -8423,8 +10683,12 @@
int continue_balancing = 1;
u64 t0, domain_cost;
- if (!(sd->flags & SD_LOAD_BALANCE))
+ if (!(sd->flags & SD_LOAD_BALANCE)) {
+ if (time_after_eq(jiffies,
+ sd->groups->sgc->next_update))
+ update_group_capacity(sd, this_cpu);
continue;
+ }
if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost) {
update_next_balance(sd, &next_balance);
@@ -8589,11 +10853,6 @@
* needed, they will kick the idle load balancer, which then does idle
* load balancing for all the idle CPUs.
*/
-static struct {
- cpumask_var_t idle_cpus_mask;
- atomic_t nr_cpus;
- unsigned long next_balance; /* in jiffy units */
-} nohz ____cacheline_aligned;
static inline int find_new_ilb(void)
{
@@ -8610,7 +10869,7 @@
* nohz_load_balancer CPU (if there is one) otherwise fallback to any idle
* CPU (if there is one).
*/
-static void nohz_balancer_kick(void)
+static void nohz_balancer_kick(bool only_update)
{
int ilb_cpu;
@@ -8623,6 +10882,10 @@
if (test_and_set_bit(NOHZ_BALANCE_KICK, nohz_flags(ilb_cpu)))
return;
+
+ if (only_update)
+ set_bit(NOHZ_STATS_KICK, nohz_flags(ilb_cpu));
+
/*
* Use smp_send_reschedule() instead of resched_cpu().
* This way we generate a sched IPI on the target cpu which
@@ -8710,6 +10973,8 @@
atomic_inc(&nohz.nr_cpus);
set_bit(NOHZ_TICK_STOPPED, nohz_flags(cpu));
}
+#else
+static inline void nohz_balancer_kick(bool only_update) {}
#endif
static DEFINE_SPINLOCK(balancing);
@@ -8741,8 +11006,6 @@
int need_serialize, need_decay = 0;
u64 max_cost = 0;
- update_blocked_averages(cpu);
-
rcu_read_lock();
for_each_domain(cpu, sd) {
/*
@@ -8757,9 +11020,16 @@
}
max_cost += sd->max_newidle_lb_cost;
- if (!(sd->flags & SD_LOAD_BALANCE))
+ if (energy_aware() && !sd_overutilized(sd))
continue;
+ if (!(sd->flags & SD_LOAD_BALANCE)) {
+ if (time_after_eq(jiffies,
+ sd->groups->sgc->next_update))
+ update_group_capacity(sd, cpu);
+ continue;
+ }
+
/*
* Stop the load balance at this level. There is another
* CPU in our sched group which is doing load balancing more
@@ -8841,6 +11111,7 @@
{
int this_cpu = this_rq->cpu;
struct rq *rq;
+ struct sched_domain *sd;
int balance_cpu;
/* Earliest time when we have to do rebalance again */
unsigned long next_balance = jiffies + 60*HZ;
@@ -8850,6 +11121,23 @@
!test_bit(NOHZ_BALANCE_KICK, nohz_flags(this_cpu)))
goto end;
+ /*
+ * This cpu is going to update the blocked load of idle CPUs either
+ * before doing a rebalancing or just to keep metrics up to date. we
+ * can safely update the next update timestamp
+ */
+ rcu_read_lock();
+ sd = rcu_dereference(this_rq->sd);
+ /*
+ * Check whether there is a sched_domain available for this cpu.
+ * The last other cpu can have been unplugged since the ILB has been
+ * triggered and the sched_domain can now be null. The idle balance
+ * sequence will quickly be aborted as there is no more idle CPUs
+ */
+ if (sd)
+ nohz.next_update = jiffies + msecs_to_jiffies(LOAD_AVG_PERIOD);
+ rcu_read_unlock();
+
for_each_cpu(balance_cpu, nohz.idle_cpus_mask) {
if (balance_cpu == this_cpu || !idle_cpu(balance_cpu))
continue;
@@ -8876,7 +11164,15 @@
cpu_load_update_idle(rq);
rq_unlock_irq(rq, &rf);
- rebalance_domains(rq, CPU_IDLE);
+ update_blocked_averages(balance_cpu);
+ /*
+ * This idle load balance softirq may have been
+ * triggered only to update the blocked load and shares
+ * of idle CPUs (which we have just done for
+ * balance_cpu). In that case skip the actual balance.
+ */
+ if (!test_bit(NOHZ_STATS_KICK, nohz_flags(this_cpu)))
+ rebalance_domains(rq, idle);
}
if (time_after(next_balance, rq->next_balance)) {
@@ -8907,7 +11203,7 @@
* - For SD_ASYM_PACKING, if the lower numbered cpu's in the scheduler
* domain span are idle.
*/
-static inline bool nohz_kick_needed(struct rq *rq)
+static inline bool nohz_kick_needed(struct rq *rq, bool only_update)
{
unsigned long now = jiffies;
struct sched_domain_shared *sds;
@@ -8915,7 +11211,7 @@
int nr_busy, i, cpu = rq->cpu;
bool kick = false;
- if (unlikely(rq->idle_balance))
+ if (unlikely(rq->idle_balance) && !only_update)
return false;
/*
@@ -8932,15 +11228,27 @@
if (likely(!atomic_read(&nohz.nr_cpus)))
return false;
+ if (only_update) {
+ if (time_before(now, nohz.next_update))
+ return false;
+ else
+ return true;
+ }
+
if (time_before(now, nohz.next_balance))
return false;
- if (rq->nr_running >= 2)
+ if (rq->nr_running >= 2 &&
+ (!energy_aware() || cpu_overutilized(cpu)))
return true;
+ /* Do idle load balance if there have misfit task */
+ if (energy_aware())
+ return rq->misfit_task_load;
+
rcu_read_lock();
sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
- if (sds) {
+ if (sds && !energy_aware()) {
/*
* XXX: write a coherent comment on why we do this.
* See also: http://lkml.kernel.org/r/20111202010832.602203411@sbsiddha-desk.sc.intel.com
@@ -8981,6 +11289,7 @@
}
#else
static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle) { }
+static inline bool nohz_kick_needed(struct rq *rq, bool only_update) { return false; }
#endif
/*
@@ -9002,7 +11311,14 @@
* and abort nohz_idle_balance altogether if we pull some load.
*/
nohz_idle_balance(this_rq, idle);
+ update_blocked_averages(this_rq->cpu);
+#ifdef CONFIG_NO_HZ_COMMON
+ if (!test_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu)))
+ rebalance_domains(this_rq, idle);
+ clear_bit(NOHZ_STATS_KICK, nohz_flags(this_rq->cpu));
+#else
rebalance_domains(this_rq, idle);
+#endif
}
/*
@@ -9017,8 +11333,8 @@
if (time_after_eq(jiffies, rq->next_balance))
raise_softirq(SCHED_SOFTIRQ);
#ifdef CONFIG_NO_HZ_COMMON
- if (nohz_kick_needed(rq))
- nohz_balancer_kick();
+ if (nohz_kick_needed(rq, false))
+ nohz_balancer_kick(false);
#endif
}
@@ -9054,6 +11370,10 @@
if (static_branch_unlikely(&sched_numa_balancing))
task_tick_numa(rq, curr);
+
+ update_misfit_status(curr, rq);
+
+ update_overutilized_status(rq);
}
/*
@@ -9597,8 +11917,11 @@
#ifdef CONFIG_NO_HZ_COMMON
nohz.next_balance = jiffies;
+ nohz.next_update = jiffies;
zalloc_cpumask_var(&nohz.idle_cpus_mask, GFP_NOWAIT);
#endif
+
+ alloc_eenv();
#endif /* SMP */
}
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 9552fd5..dbade30 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -85,3 +85,47 @@
SCHED_FEAT(WA_IDLE, true)
SCHED_FEAT(WA_WEIGHT, true)
SCHED_FEAT(WA_BIAS, true)
+
+/*
+ * UtilEstimation. Use estimated CPU utilization.
+ */
+SCHED_FEAT(UTIL_EST, true)
+
+/*
+ * Energy aware scheduling. Use platform energy model to guide scheduling
+ * decisions optimizing for energy efficiency.
+ */
+#ifdef CONFIG_DEFAULT_USE_ENERGY_AWARE
+SCHED_FEAT(ENERGY_AWARE, true)
+#else
+SCHED_FEAT(ENERGY_AWARE, false)
+#endif
+
+/*
+ * Energy aware scheduling algorithm choices:
+ * EAS_PREFER_IDLE
+ * Direct tasks in a schedtune.prefer_idle=1 group through
+ * the EAS path for wakeup task placement. Otherwise, put
+ * those tasks through the mainline slow path.
+ * FIND_BEST_TARGET
+ * Limit the number of placement options for which we calculate
+ * energy by using heuristics to select 'best idle' and
+ * 'best active' cpu options.
+ * FBT_STRICT_ORDER
+ * ON: If the target CPU saves any energy, use that.
+ * OFF: Use whichever of target or backup saves most.
+ */
+SCHED_FEAT(EAS_PREFER_IDLE, true)
+SCHED_FEAT(FIND_BEST_TARGET, true)
+SCHED_FEAT(FBT_STRICT_ORDER, true)
+
+/*
+ * Apply schedtune boost hold to tasks of all sched classes.
+ * If enabled, schedtune will hold the boost applied to a CPU
+ * for 50ms regardless of task activation - if the task is
+ * still running 50ms later, the boost hold expires and schedtune
+ * boost will expire immediately the task stops.
+ * If disabled, this behaviour will only apply to tasks of the
+ * RT class.
+ */
+SCHED_FEAT(SCHEDTUNE_BOOST_HOLD_ALL, false)
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 257f4f0..2e973e6 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -25,9 +25,10 @@
* sched_idle_set_state - Record idle state for the current CPU.
* @idle_state: State to record.
*/
-void sched_idle_set_state(struct cpuidle_state *idle_state)
+void sched_idle_set_state(struct cpuidle_state *idle_state, int index)
{
idle_set_state(this_rq(), idle_state);
+ idle_set_state_idx(this_rq(), index);
}
static int __read_mostly cpu_idle_force_poll;
@@ -146,13 +147,15 @@
}
/*
- * Tell the RCU framework we are entering an idle section,
- * so no more rcu read side critical sections and one more
+ * The RCU framework needs to be told that we are entering an idle
+ * section, so no more rcu read side critical sections and one more
* step to the grace period
*/
- rcu_idle_enter();
if (cpuidle_not_available(drv, dev)) {
+ tick_nohz_idle_stop_tick();
+ rcu_idle_enter();
+
default_idle_call();
goto exit_idle;
}
@@ -169,20 +172,37 @@
if (idle_should_enter_s2idle() || dev->use_deepest_state) {
if (idle_should_enter_s2idle()) {
+ rcu_idle_enter();
+
entered_state = cpuidle_enter_s2idle(drv, dev);
if (entered_state > 0) {
local_irq_enable();
goto exit_idle;
}
+
+ rcu_idle_exit();
}
+ tick_nohz_idle_stop_tick();
+ rcu_idle_enter();
+
next_state = cpuidle_find_deepest_state(drv, dev);
call_cpuidle(drv, dev, next_state);
} else {
+ bool stop_tick = true;
+
/*
* Ask the cpuidle framework to choose a convenient idle state.
*/
- next_state = cpuidle_select(drv, dev);
+ next_state = cpuidle_select(drv, dev, &stop_tick);
+
+ if (stop_tick)
+ tick_nohz_idle_stop_tick();
+ else
+ tick_nohz_idle_retain_tick();
+
+ rcu_idle_enter();
+
entered_state = call_cpuidle(drv, dev, next_state);
/*
* Give the governor an opportunity to reflect on the outcome
@@ -227,6 +247,7 @@
rmb();
if (cpu_is_offline(smp_processor_id())) {
+ tick_nohz_idle_stop_tick_protected();
cpuhp_report_idle_dead();
arch_cpu_idle_dead();
}
@@ -240,10 +261,12 @@
* broadcast device expired for us, we don't want to go deep
* idle as we know that the IPI is going to arrive right away.
*/
- if (cpu_idle_force_poll || tick_check_broadcast_expired())
+ if (cpu_idle_force_poll || tick_check_broadcast_expired()) {
+ tick_nohz_idle_restart_tick();
cpu_idle_poll();
- else
+ } else {
cpuidle_idle_call();
+ }
arch_cpu_idle_exit();
}
diff --git a/kernel/sched/idle_task.c b/kernel/sched/idle_task.c
index d518664..609abdf 100644
--- a/kernel/sched/idle_task.c
+++ b/kernel/sched/idle_task.c
@@ -10,7 +10,8 @@
#ifdef CONFIG_SMP
static int
-select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_idle(struct task_struct *p, int cpu, int sd_flag, int flags,
+ int sibling_count_hint)
{
return task_cpu(p); /* IDLE tasks as never migrated */
}
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index cb9a5b8..58c7498 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,6 +8,9 @@
#include <linux/slab.h>
#include <linux/irq_work.h>
+#include "tune.h"
+
+#include "walt.h"
int sched_rr_timeslice = RR_TIMESLICE;
int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
@@ -1324,10 +1327,13 @@
{
struct sched_rt_entity *rt_se = &p->rt;
+ schedtune_enqueue_task(p, cpu_of(rq));
+
if (flags & ENQUEUE_WAKEUP)
rt_se->timeout = 0;
enqueue_rt_entity(rt_se, flags);
+ walt_inc_cumulative_runnable_avg(rq, p);
if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
@@ -1337,8 +1343,11 @@
{
struct sched_rt_entity *rt_se = &p->rt;
+ schedtune_dequeue_task(p, cpu_of(rq));
+
update_curr_rt(rq);
dequeue_rt_entity(rt_se, flags);
+ walt_dec_cumulative_runnable_avg(rq, p);
dequeue_pushable_task(rq, p);
}
@@ -1381,7 +1390,8 @@
static int find_lowest_rq(struct task_struct *task);
static int
-select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_rt(struct task_struct *p, int cpu, int sd_flag, int flags,
+ int sibling_count_hint)
{
struct task_struct *curr;
struct rq *rq;
@@ -1528,6 +1538,8 @@
return p;
}
+extern int update_rt_rq_load_avg(u64 now, int cpu, struct rt_rq *rt_rq, int running);
+
static struct task_struct *
pick_next_task_rt(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)
{
@@ -1573,6 +1585,10 @@
queue_push_tasks(rq);
+ if (p)
+ update_rt_rq_load_avg(rq_clock_task(rq), cpu_of(rq), rt_rq,
+ rq->curr->sched_class == &rt_sched_class);
+
return p;
}
@@ -1580,6 +1596,8 @@
{
update_curr_rt(rq);
+ update_rt_rq_load_avg(rq_clock_task(rq), cpu_of(rq), &rq->rt, 1);
+
/*
* The previous task needs to be made eligible for pushing
* if it is still active
@@ -1847,7 +1865,9 @@
}
deactivate_task(rq, next_task, 0);
+ next_task->on_rq = TASK_ON_RQ_MIGRATING;
set_task_cpu(next_task, lowest_rq->cpu);
+ next_task->on_rq = TASK_ON_RQ_QUEUED;
activate_task(lowest_rq, next_task, 0);
ret = 1;
@@ -2119,7 +2139,9 @@
resched = true;
deactivate_task(src_rq, p, 0);
+ p->on_rq = TASK_ON_RQ_MIGRATING;
set_task_cpu(p, this_cpu);
+ p->on_rq = TASK_ON_RQ_QUEUED;
activate_task(this_rq, p, 0);
/*
* We continue with the search, just in
@@ -2299,6 +2321,7 @@
struct sched_rt_entity *rt_se = &p->rt;
update_curr_rt(rq);
+ update_rt_rq_load_avg(rq_clock_task(rq), cpu_of(rq), &rq->rt, 1);
watchdog(rq, p);
diff --git a/kernel/sched/sched-pelt.h b/kernel/sched/sched-pelt.h
index a2647367..ac7d489 100644
--- a/kernel/sched/sched-pelt.h
+++ b/kernel/sched/sched-pelt.h
@@ -1,14 +1,62 @@
/* SPDX-License-Identifier: GPL-2.0 */
/* Generated by Documentation/scheduler/sched-pelt; do not modify. */
+
+#ifdef CONFIG_PELT_UTIL_HALFLIFE_32
static const u32 runnable_avg_yN_inv[] = {
- 0xffffffff, 0xfa83b2da, 0xf5257d14, 0xefe4b99a, 0xeac0c6e6, 0xe5b906e6,
- 0xe0ccdeeb, 0xdbfbb796, 0xd744fcc9, 0xd2a81d91, 0xce248c14, 0xc9b9bd85,
- 0xc5672a10, 0xc12c4cc9, 0xbd08a39e, 0xb8fbaf46, 0xb504f333, 0xb123f581,
- 0xad583ee9, 0xa9a15ab4, 0xa5fed6a9, 0xa2704302, 0x9ef5325f, 0x9b8d39b9,
- 0x9837f050, 0x94f4efa8, 0x91c3d373, 0x8ea4398a, 0x8b95c1e3, 0x88980e80,
- 0x85aac367, 0x82cd8698,
+ 0xffffffff,0xfa83b2da,0xf5257d14,0xefe4b99a,
+ 0xeac0c6e6,0xe5b906e6,0xe0ccdeeb,0xdbfbb796,
+ 0xd744fcc9,0xd2a81d91,0xce248c14,0xc9b9bd85,
+ 0xc5672a10,0xc12c4cc9,0xbd08a39e,0xb8fbaf46,
+ 0xb504f333,0xb123f581,0xad583ee9,0xa9a15ab4,
+ 0xa5fed6a9,0xa2704302,0x9ef5325f,0x9b8d39b9,
+ 0x9837f050,0x94f4efa8,0x91c3d373,0x8ea4398a,
+ 0x8b95c1e3,0x88980e80,0x85aac367,0x82cd8698,
+};
+
+static const u32 runnable_avg_yN_sum[] = {
+ 0, 1002, 1982, 2941, 3880, 4798, 5697, 6576, 7437, 8279, 9103,
+ 9909,10698,11470,12226,12966,13690,14398,15091,15769,16433,17082,
+ 17718,18340,18949,19545,20128,20698,21256,21802,22336,22859,23371,
};
#define LOAD_AVG_PERIOD 32
#define LOAD_AVG_MAX 47742
+#define LOAD_AVG_MAX_N 345
+
+#endif
+
+#ifdef CONFIG_PELT_UTIL_HALFLIFE_16
+static const u32 runnable_avg_yN_inv[] = {
+ 0xffffffff,0xf5257d14,0xeac0c6e6,0xe0ccdeeb,
+ 0xd744fcc9,0xce248c14,0xc5672a10,0xbd08a39e,
+ 0xb504f333,0xad583ee9,0xa5fed6a9,0x9ef5325f,
+ 0x9837f050,0x91c3d373,0x8b95c1e3,0x85aac367,
+};
+
+static const u32 runnable_avg_yN_sum[] = {
+ 0,22380,22411,22441,22470,22497,22523,22548,22572,22595,22617,
+ 22638,22658,22677,22696,22714,22731,
+};
+
+#define LOAD_AVG_PERIOD 16
+#define LOAD_AVG_MAX 24152
+#define LOAD_AVG_MAX_N 517
+
+#endif
+
+#ifdef CONFIG_PELT_UTIL_HALFLIFE_8
+static const u32 runnable_avg_yN_inv[] = {
+ 0xffffffff,0xeac0c6e6,0xd744fcc9,0xc5672a10,
+ 0xb504f333,0xa5fed6a9,0x9837f050,0x8b95c1e3,
+};
+
+static const u32 runnable_avg_yN_sum[] = {
+ 0,20844,20053,19327,18661,18051,17491,16978,16507,
+};
+
+#define LOAD_AVG_PERIOD 8
+#define LOAD_AVG_MAX 12337
+#define LOAD_AVG_MAX_N 603
+
+#endif
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b293761..65a3909 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -492,6 +492,9 @@
u64 throttled_clock_task_time;
int throttled, throttle_count;
struct list_head throttled_list;
+#ifdef CONFIG_SCHED_WALT
+ u64 cumulative_runnable_avg;
+#endif /* CONFIG_SCHED_WALT */
#endif /* CONFIG_CFS_BANDWIDTH */
#endif /* CONFIG_FAIR_GROUP_SCHED */
};
@@ -524,6 +527,9 @@
unsigned long rt_nr_total;
int overloaded;
struct plist_head pushable_tasks;
+
+ struct sched_avg avg;
+
#endif /* CONFIG_SMP */
int rt_queued;
@@ -605,6 +611,12 @@
return arch_asym_cpu_priority(a) > arch_asym_cpu_priority(b);
}
+struct max_cpu_capacity {
+ raw_spinlock_t lock;
+ unsigned long val;
+ int cpu;
+};
+
/*
* We add the notion of a root-domain which will be used to define per-domain
* variables. Each exclusive cpuset essentially defines an island domain by
@@ -620,8 +632,12 @@
cpumask_var_t span;
cpumask_var_t online;
- /* Indicate more than one runnable task for any CPU */
- bool overload;
+ /*
+ * Indicate pullable load on at least one CPU, e.g:
+ * - More than one runnable task
+ * - Running task is misfit
+ */
+ int overload;
/*
* The bit corresponding to a CPU gets set here if such CPU has more
@@ -652,13 +668,18 @@
cpumask_var_t rto_mask;
struct cpupri cpupri;
- unsigned long max_cpu_capacity;
+ /* Maximum cpu capacity in the system. */
+ struct max_cpu_capacity max_cpu_capacity;
+
+ /* First cpu with maximum and minimum original capacity */
+ int max_cap_orig_cpu, min_cap_orig_cpu;
};
extern struct root_domain def_root_domain;
extern struct mutex sched_domains_mutex;
extern void init_defrootdomain(void);
+extern void init_max_cpu_capacity(struct max_cpu_capacity *mcc);
extern int sched_init_domains(const struct cpumask *cpu_map);
extern void rq_attach_root(struct rq *rq, struct root_domain *rd);
extern void sched_get_rd(struct root_domain *rd);
@@ -694,6 +715,7 @@
#ifdef CONFIG_NO_HZ_COMMON
#ifdef CONFIG_SMP
unsigned long last_load_update_tick;
+ unsigned long last_blocked_load_update_tick;
#endif /* CONFIG_SMP */
unsigned long nohz_flags;
#endif /* CONFIG_NO_HZ_COMMON */
@@ -743,6 +765,9 @@
struct callback_head *balance_callback;
unsigned char idle_balance;
+
+ unsigned long misfit_task_load;
+
/* For active balancing */
int active_balance;
int push_cpu;
@@ -762,6 +787,20 @@
u64 max_idle_balance_cost;
#endif
+#ifdef CONFIG_SCHED_WALT
+ u64 cumulative_runnable_avg;
+ u64 window_start;
+ u64 curr_runnable_sum;
+ u64 prev_runnable_sum;
+ u64 nt_curr_runnable_sum;
+ u64 nt_prev_runnable_sum;
+ u64 cur_irqload;
+ u64 avg_irqload;
+ u64 irqload_ts;
+ u64 cum_window_demand;
+#endif /* CONFIG_SCHED_WALT */
+
+
#ifdef CONFIG_IRQ_TIME_ACCOUNTING
u64 prev_irq_time;
#endif
@@ -809,6 +848,7 @@
#ifdef CONFIG_CPU_IDLE
/* Must be inspected within a rcu lock section */
struct cpuidle_state *idle_state;
+ int idle_state_idx;
#endif
};
@@ -1067,6 +1107,9 @@
DECLARE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
DECLARE_PER_CPU(struct sched_domain *, sd_numa);
DECLARE_PER_CPU(struct sched_domain *, sd_asym);
+DECLARE_PER_CPU(struct sched_domain *, sd_ea);
+DECLARE_PER_CPU(struct sched_domain *, sd_scs);
+extern struct static_key_false sched_asym_cpucapacity;
struct sched_group_capacity {
atomic_t ref;
@@ -1076,6 +1119,7 @@
*/
unsigned long capacity;
unsigned long min_capacity; /* Min per-CPU capacity in group */
+ unsigned long max_capacity; /* Max per-CPU capacity in group */
unsigned long next_update;
int imbalance; /* XXX unrelated to capacity but shared group state */
@@ -1093,6 +1137,7 @@
unsigned int group_weight;
struct sched_group_capacity *sgc;
int asym_prefer_cpu; /* cpu of highest priority in group */
+ const struct sched_group_energy *sge;
/*
* The CPUs this group covers.
@@ -1433,7 +1478,8 @@
void (*put_prev_task) (struct rq *rq, struct task_struct *p);
#ifdef CONFIG_SMP
- int (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags);
+ int (*select_task_rq)(struct task_struct *p, int task_cpu, int sd_flag, int flags,
+ int subling_count_hint);
void (*migrate_task_rq)(struct task_struct *p);
void (*task_woken) (struct rq *this_rq, struct task_struct *task);
@@ -1520,6 +1566,17 @@
SCHED_WARN_ON(!rcu_read_lock_held());
return rq->idle_state;
}
+
+static inline void idle_set_state_idx(struct rq *rq, int idle_state_idx)
+{
+ rq->idle_state_idx = idle_state_idx;
+}
+
+static inline int idle_get_state_idx(struct rq *rq)
+{
+ WARN_ON(!rcu_read_lock_held());
+ return rq->idle_state_idx;
+}
#else
static inline void idle_set_state(struct rq *rq,
struct cpuidle_state *idle_state)
@@ -1530,6 +1587,15 @@
{
return NULL;
}
+
+static inline void idle_set_state_idx(struct rq *rq, int idle_state_idx)
+{
+}
+
+static inline int idle_get_state_idx(struct rq *rq)
+{
+ return -1;
+}
#endif
extern void schedule_idle(void);
@@ -1599,8 +1665,8 @@
if (prev_nr < 2 && rq->nr_running >= 2) {
#ifdef CONFIG_SMP
- if (!rq->rd->overload)
- rq->rd->overload = true;
+ if (!READ_ONCE(rq->rd->overload))
+ WRITE_ONCE(rq->rd->overload, 1);
#endif
}
@@ -1666,6 +1732,7 @@
#ifdef CONFIG_SMP
extern void sched_avg_update(struct rq *rq);
+extern unsigned long sched_get_rt_rq_util(int cpu);
#ifndef arch_scale_freq_capacity
static __always_inline
@@ -1675,6 +1742,14 @@
}
#endif
+#ifndef arch_scale_max_freq_capacity
+static __always_inline
+unsigned long arch_scale_max_freq_capacity(struct sched_domain *sd, int cpu)
+{
+ return SCHED_CAPACITY_SCALE;
+}
+#endif
+
#ifndef arch_scale_cpu_capacity
static __always_inline
unsigned long arch_scale_cpu_capacity(struct sched_domain *sd, int cpu)
@@ -1686,6 +1761,23 @@
}
#endif
+#ifdef CONFIG_SMP
+static inline unsigned long capacity_of(int cpu)
+{
+ return cpu_rq(cpu)->cpu_capacity;
+}
+
+static inline unsigned long capacity_orig_of(int cpu)
+{
+ return cpu_rq(cpu)->cpu_capacity_orig;
+}
+
+extern unsigned int sysctl_sched_use_walt_cpu_util;
+extern unsigned int walt_ravg_window;
+extern bool walt_disabled;
+
+#endif /* CONFIG_SMP */
+
static inline void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
{
rq->rt_avg += rt_delta * arch_scale_freq_capacity(NULL, cpu_of(rq));
@@ -1992,6 +2084,7 @@
enum rq_nohz_flag_bits {
NOHZ_TICK_STOPPED,
NOHZ_BALANCE_KICK,
+ NOHZ_STATS_KICK
};
#define nohz_flags(cpu) (&cpu_rq(cpu)->nohz_flags)
@@ -2003,6 +2096,9 @@
#ifdef CONFIG_SMP
+
+extern void init_energy_aware_data(int cpu);
+
static inline
void __dl_update(struct dl_bw *dl_b, s64 bw)
{
@@ -2096,6 +2192,17 @@
static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
#endif /* CONFIG_CPU_FREQ */
+#ifdef CONFIG_SCHED_WALT
+
+static inline bool
+walt_task_in_cum_window_demand(struct rq *rq, struct task_struct *p)
+{
+ return cpu_of(rq) == task_cpu(p) &&
+ (p->on_rq || p->last_sleep_ts >= rq->window_start);
+}
+
+#endif /* CONFIG_SCHED_WALT */
+
#ifdef arch_scale_freq_capacity
#ifndef arch_scale_freq_invariant
#define arch_scale_freq_invariant() (true)
diff --git a/kernel/sched/stop_task.c b/kernel/sched/stop_task.c
index 45caf90..7ca03e5 100644
--- a/kernel/sched/stop_task.c
+++ b/kernel/sched/stop_task.c
@@ -1,5 +1,6 @@
// SPDX-License-Identifier: GPL-2.0
#include "sched.h"
+#include "walt.h"
/*
* stop-task scheduling class.
@@ -12,7 +13,8 @@
#ifdef CONFIG_SMP
static int
-select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags)
+select_task_rq_stop(struct task_struct *p, int cpu, int sd_flag, int flags,
+ int sibling_count_hint)
{
return task_cpu(p); /* stop tasks as never migrate */
}
@@ -43,12 +45,14 @@
enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags)
{
add_nr_running(rq, 1);
+ walt_inc_cumulative_runnable_avg(rq, p);
}
static void
dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags)
{
sub_nr_running(rq, 1);
+ walt_dec_cumulative_runnable_avg(rq, p);
}
static void yield_task_stop(struct rq *rq)
diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
index 659e075..9e967e8 100644
--- a/kernel/sched/topology.c
+++ b/kernel/sched/topology.c
@@ -39,9 +39,6 @@
if (!(sd->flags & SD_LOAD_BALANCE)) {
printk("does not load-balance\n");
- if (sd->parent)
- printk(KERN_ERR "ERROR: !SD_LOAD_BALANCE domain"
- " has parent");
return -1;
}
@@ -154,8 +151,12 @@
static int sd_degenerate(struct sched_domain *sd)
{
- if (cpumask_weight(sched_domain_span(sd)) == 1)
- return 1;
+ if (cpumask_weight(sched_domain_span(sd)) == 1) {
+ if (sd->groups->sge)
+ sd->flags &= ~SD_LOAD_BALANCE;
+ else
+ return 1;
+ }
/* Following flags need at least 2 groups */
if (sd->flags & (SD_LOAD_BALANCE |
@@ -165,7 +166,8 @@
SD_SHARE_CPUCAPACITY |
SD_ASYM_CPUCAPACITY |
SD_SHARE_PKG_RESOURCES |
- SD_SHARE_POWERDOMAIN)) {
+ SD_SHARE_POWERDOMAIN |
+ SD_SHARE_CAP_STATES)) {
if (sd->groups != sd->groups->next)
return 0;
}
@@ -198,7 +200,12 @@
SD_SHARE_CPUCAPACITY |
SD_SHARE_PKG_RESOURCES |
SD_PREFER_SIBLING |
- SD_SHARE_POWERDOMAIN);
+ SD_SHARE_POWERDOMAIN |
+ SD_SHARE_CAP_STATES);
+ if (parent->groups->sge) {
+ parent->flags &= ~SD_LOAD_BALANCE;
+ return 0;
+ }
if (nr_node_ids == 1)
pflags &= ~SD_SERIALIZE;
}
@@ -294,6 +301,11 @@
if (cpupri_init(&rd->cpupri) != 0)
goto free_cpudl;
+
+ rd->max_cap_orig_cpu = rd->min_cap_orig_cpu = -1;
+
+ init_max_cpu_capacity(&rd->max_cpu_capacity);
+
return 0;
free_cpudl:
@@ -405,11 +417,15 @@
DEFINE_PER_CPU(struct sched_domain_shared *, sd_llc_shared);
DEFINE_PER_CPU(struct sched_domain *, sd_numa);
DEFINE_PER_CPU(struct sched_domain *, sd_asym);
+DEFINE_PER_CPU(struct sched_domain *, sd_ea);
+DEFINE_PER_CPU(struct sched_domain *, sd_scs);
+DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity);
static void update_top_cache_domain(int cpu)
{
struct sched_domain_shared *sds = NULL;
struct sched_domain *sd;
+ struct sched_domain *ea_sd = NULL;
int id = cpu;
int size = 1;
@@ -430,6 +446,32 @@
sd = highest_flag_domain(cpu, SD_ASYM_PACKING);
rcu_assign_pointer(per_cpu(sd_asym, cpu), sd);
+
+ for_each_domain(cpu, sd) {
+ if (sd->groups->sge)
+ ea_sd = sd;
+ else
+ break;
+ }
+ rcu_assign_pointer(per_cpu(sd_ea, cpu), ea_sd);
+
+ sd = highest_flag_domain(cpu, SD_SHARE_CAP_STATES);
+ rcu_assign_pointer(per_cpu(sd_scs, cpu), sd);
+}
+
+static void update_asym_cpucapacity(int cpu)
+{
+ int enable = false;
+
+ rcu_read_lock();
+ if (lowest_flag_domain(cpu, SD_ASYM_CPUCAPACITY))
+ enable = true;
+ rcu_read_unlock();
+
+ if (enable) {
+ /* This expects to be hotplug-safe */
+ static_branch_enable_cpuslocked(&sched_asym_cpucapacity);
+ }
}
/*
@@ -714,6 +756,7 @@
sg_span = sched_group_span(sg);
sg->sgc->capacity = SCHED_CAPACITY_SCALE * cpumask_weight(sg_span);
sg->sgc->min_capacity = SCHED_CAPACITY_SCALE;
+ sg->sgc->max_capacity = SCHED_CAPACITY_SCALE;
}
static int
@@ -873,6 +916,7 @@
sg->sgc->capacity = SCHED_CAPACITY_SCALE * cpumask_weight(sched_group_span(sg));
sg->sgc->min_capacity = SCHED_CAPACITY_SCALE;
+ sg->sgc->max_capacity = SCHED_CAPACITY_SCALE;
return sg;
}
@@ -962,6 +1006,108 @@
update_group_capacity(sd, cpu);
}
+#define cap_state_power(s,i) (s->cap_states[i].power)
+#define cap_state_cap(s,i) (s->cap_states[i].cap)
+#define idle_state_power(s,i) (s->idle_states[i].power)
+
+static inline int sched_group_energy_equal(const struct sched_group_energy *a,
+ const struct sched_group_energy *b)
+{
+ int i;
+
+ /* check pointers first */
+ if (a == b)
+ return true;
+
+ /* check contents are equivalent */
+ if (a->nr_cap_states != b->nr_cap_states)
+ return false;
+ if (a->nr_idle_states != b->nr_idle_states)
+ return false;
+ for (i=0;i<a->nr_cap_states;i++){
+ if (cap_state_power(a,i) !=
+ cap_state_power(b,i))
+ return false;
+ if (cap_state_cap(a,i) !=
+ cap_state_cap(b,i))
+ return false;
+ }
+ for (i=0;i<a->nr_idle_states;i++){
+ if (idle_state_power(a,i) !=
+ idle_state_power(b,i))
+ return false;
+ }
+
+ return true;
+}
+
+#define energy_eff(e, n) \
+ ((e->cap_states[n].cap << SCHED_CAPACITY_SHIFT)/e->cap_states[n].power)
+
+static void init_sched_groups_energy(int cpu, struct sched_domain *sd,
+ sched_domain_energy_f fn)
+{
+ struct sched_group *sg = sd->groups;
+ const struct sched_group_energy *sge;
+ int i;
+
+ if (!(fn && fn(cpu)))
+ return;
+
+ if (cpu != group_balance_cpu(sg))
+ return;
+
+ if (sd->flags & SD_OVERLAP) {
+ pr_err("BUG: EAS does not support overlapping sd spans\n");
+#ifdef CONFIG_SCHED_DEBUG
+ pr_err(" the %s domain has SD_OVERLAP set\n", sd->name);
+#endif
+ return;
+ }
+
+ if (sd->child && !sd->child->groups->sge) {
+ pr_err("BUG: EAS setup borken for CPU%d\n", cpu);
+#ifdef CONFIG_SCHED_DEBUG
+ pr_err(" energy data on %s but not on %s domain\n",
+ sd->name, sd->child->name);
+#endif
+ return;
+ }
+
+ sge = fn(cpu);
+
+ /*
+ * Check that the per-cpu provided sd energy data is consistent for all
+ * cpus within the mask.
+ */
+ if (cpumask_weight(sched_group_span(sg)) > 1) {
+ struct cpumask mask;
+
+ cpumask_xor(&mask, sched_group_span(sg), get_cpu_mask(cpu));
+
+ for_each_cpu(i, &mask)
+ BUG_ON(!sched_group_energy_equal(sge,fn(i)));
+ }
+
+ /* Check that energy efficiency (capacity/power) is monotonically
+ * decreasing in the capacity state vector with higher indexes
+ */
+ for (i = 0; i < (sge->nr_cap_states - 1); i++) {
+ if (energy_eff(sge, i) > energy_eff(sge, i+1))
+ continue;
+#ifdef CONFIG_SCHED_DEBUG
+ pr_warn("WARN: cpu=%d, domain=%s: incr. energy eff %lu[%d]->%lu[%d]\n",
+ cpu, sd->name, energy_eff(sge, i), i,
+ energy_eff(sge, i+1), i+1);
+#else
+ pr_warn("WARN: cpu=%d: incr. energy eff %lu[%d]->%lu[%d]\n",
+ cpu, energy_eff(sge, i), i, energy_eff(sge, i+1), i+1);
+#endif
+ }
+
+ sd->groups->sge = fn(cpu);
+}
+
/*
* Initializers for schedule domains
* Non-inlined to reduce accumulated stack pressure in build_sched_domains()
@@ -1081,6 +1227,7 @@
* SD_NUMA - describes NUMA topologies
* SD_SHARE_POWERDOMAIN - describes shared power domain
* SD_ASYM_CPUCAPACITY - describes mixed capacity topologies
+ * SD_SHARE_CAP_STATES - describes shared capacity states
*
* Odd one out, which beside describing the topology has a quirk also
* prescribes the desired behaviour that goes along with it:
@@ -1093,7 +1240,8 @@
SD_NUMA | \
SD_ASYM_PACKING | \
SD_ASYM_CPUCAPACITY | \
- SD_SHARE_POWERDOMAIN)
+ SD_SHARE_POWERDOMAIN | \
+ SD_SHARE_CAP_STATES)
static struct sched_domain *
sd_init(struct sched_domain_topology_level *tl,
@@ -1141,7 +1289,7 @@
| 0*SD_SHARE_CPUCAPACITY
| 0*SD_SHARE_PKG_RESOURCES
| 0*SD_SERIALIZE
- | 0*SD_PREFER_SIBLING
+ | 1*SD_PREFER_SIBLING
| 0*SD_NUMA
| sd_flags
,
@@ -1161,18 +1309,43 @@
sd_id = cpumask_first(sched_domain_span(sd));
/*
+ * Check if cpu_map eclipses cpu capacity asymmetry.
+ */
+
+ if (sd->flags & SD_ASYM_CPUCAPACITY) {
+ long capacity = arch_scale_cpu_capacity(NULL, sd_id);
+ bool disable = true;
+ int i;
+
+ for_each_cpu(i, sched_domain_span(sd)) {
+ if (capacity != arch_scale_cpu_capacity(NULL, i)) {
+ disable = false;
+ break;
+ }
+ }
+
+ if (disable)
+ sd->flags &= ~SD_ASYM_CPUCAPACITY;
+ }
+
+ /*
* Convert topological properties into behaviour.
*/
if (sd->flags & SD_ASYM_CPUCAPACITY) {
struct sched_domain *t = sd;
+ /*
+ * Don't attempt to spread across cpus of different capacities.
+ */
+ if (sd->child)
+ sd->child->flags &= ~SD_PREFER_SIBLING;
+
for_each_lower_domain(t)
t->flags |= SD_BALANCE_WAKE;
}
if (sd->flags & SD_SHARE_CPUCAPACITY) {
- sd->flags |= SD_PREFER_SIBLING;
sd->imbalance_pct = 110;
sd->smt_gain = 1178; /* ~15% */
@@ -1187,6 +1360,7 @@
sd->busy_idx = 3;
sd->idle_idx = 2;
+ sd->flags &= ~SD_PREFER_SIBLING;
sd->flags |= SD_SERIALIZE;
if (sched_domains_numa_distance[tl->numa_level] > RECLAIM_DISTANCE) {
sd->flags &= ~(SD_BALANCE_EXEC |
@@ -1196,21 +1370,16 @@
#endif
} else {
- sd->flags |= SD_PREFER_SIBLING;
sd->cache_nice_tries = 1;
sd->busy_idx = 2;
sd->idle_idx = 1;
}
- /*
- * For all levels sharing cache; connect a sched_domain_shared
- * instance.
- */
- if (sd->flags & SD_SHARE_PKG_RESOURCES) {
- sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
- atomic_inc(&sd->shared->ref);
+ sd->shared = *per_cpu_ptr(sdd->sds, sd_id);
+ atomic_inc(&sd->shared->ref);
+
+ if (sd->flags & SD_SHARE_PKG_RESOURCES)
atomic_set(&sd->shared->nr_busy_cpus, sd_weight);
- }
sd->private = sdd;
@@ -1651,7 +1820,6 @@
enum s_alloc alloc_state;
struct sched_domain *sd;
struct s_data d;
- struct rq *rq = NULL;
int i, ret = -ENOMEM;
alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
@@ -1669,8 +1837,6 @@
*per_cpu_ptr(d.sd, i) = sd;
if (tl->flags & SDTL_OVERLAP)
sd->flags |= SD_OVERLAP;
- if (cpumask_equal(cpu_map, sched_domain_span(sd)))
- break;
}
}
@@ -1690,10 +1856,13 @@
/* Calculate CPU capacity for physical packages and nodes */
for (i = nr_cpumask_bits-1; i >= 0; i--) {
+ struct sched_domain_topology_level *tl = sched_domain_topology;
+
if (!cpumask_test_cpu(i, cpu_map))
continue;
- for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
+ for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent, tl++) {
+ init_sched_groups_energy(i, sd, tl->energy);
claim_allocations(i, sd);
init_sched_groups_capacity(i, sd);
}
@@ -1702,21 +1871,25 @@
/* Attach the domains */
rcu_read_lock();
for_each_cpu(i, cpu_map) {
- rq = cpu_rq(i);
+ int max_cpu = READ_ONCE(d.rd->max_cap_orig_cpu);
+ int min_cpu = READ_ONCE(d.rd->min_cap_orig_cpu);
+
sd = *per_cpu_ptr(d.sd, i);
- /* Use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing: */
- if (rq->cpu_capacity_orig > READ_ONCE(d.rd->max_cpu_capacity))
- WRITE_ONCE(d.rd->max_cpu_capacity, rq->cpu_capacity_orig);
+ if ((max_cpu < 0) || (cpu_rq(i)->cpu_capacity_orig >
+ cpu_rq(max_cpu)->cpu_capacity_orig))
+ WRITE_ONCE(d.rd->max_cap_orig_cpu, i);
+
+ if ((min_cpu < 0) || (cpu_rq(i)->cpu_capacity_orig <
+ cpu_rq(min_cpu)->cpu_capacity_orig))
+ WRITE_ONCE(d.rd->min_cap_orig_cpu, i);
cpu_attach_domain(sd, d.rd, i);
}
rcu_read_unlock();
- if (rq && sched_debug_enabled) {
- pr_info("span: %*pbl (max cpu_capacity = %lu)\n",
- cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);
- }
+ if (!cpumask_empty(cpu_map))
+ update_asym_cpucapacity(cpumask_first(cpu_map));
ret = 0;
error:
@@ -1928,4 +2101,3 @@
mutex_unlock(&sched_domains_mutex);
}
-
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
new file mode 100644
index 0000000..765dcd4
--- /dev/null
+++ b/kernel/sched/tune.c
@@ -0,0 +1,621 @@
+#include <linux/cgroup.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/percpu.h>
+#include <linux/printk.h>
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
+
+#include <trace/events/sched.h>
+
+#include "sched.h"
+#include "tune.h"
+
+bool schedtune_initialized = false;
+extern struct reciprocal_value schedtune_spc_rdiv;
+
+/* We hold schedtune boost in effect for at least this long */
+#define SCHEDTUNE_BOOST_HOLD_NS 50000000ULL
+
+/*
+ * EAS scheduler tunables for task groups.
+ */
+
+/* SchdTune tunables for a group of tasks */
+struct schedtune {
+ /* SchedTune CGroup subsystem */
+ struct cgroup_subsys_state css;
+
+ /* Boost group allocated ID */
+ int idx;
+
+ /* Boost value for tasks on that SchedTune CGroup */
+ int boost;
+
+ /* Hint to bias scheduling of tasks on that SchedTune CGroup
+ * towards idle CPUs */
+ int prefer_idle;
+};
+
+static inline struct schedtune *css_st(struct cgroup_subsys_state *css)
+{
+ return css ? container_of(css, struct schedtune, css) : NULL;
+}
+
+static inline struct schedtune *task_schedtune(struct task_struct *tsk)
+{
+ return css_st(task_css(tsk, schedtune_cgrp_id));
+}
+
+static inline struct schedtune *parent_st(struct schedtune *st)
+{
+ return css_st(st->css.parent);
+}
+
+/*
+ * SchedTune root control group
+ * The root control group is used to defined a system-wide boosting tuning,
+ * which is applied to all tasks in the system.
+ * Task specific boost tuning could be specified by creating and
+ * configuring a child control group under the root one.
+ * By default, system-wide boosting is disabled, i.e. no boosting is applied
+ * to tasks which are not into a child control group.
+ */
+static struct schedtune
+root_schedtune = {
+ .boost = 0,
+ .prefer_idle = 0,
+};
+
+/*
+ * Maximum number of boost groups to support
+ * When per-task boosting is used we still allow only limited number of
+ * boost groups for two main reasons:
+ * 1. on a real system we usually have only few classes of workloads which
+ * make sense to boost with different values (e.g. background vs foreground
+ * tasks, interactive vs low-priority tasks)
+ * 2. a limited number allows for a simpler and more memory/time efficient
+ * implementation especially for the computation of the per-CPU boost
+ * value
+ */
+#define BOOSTGROUPS_COUNT 5
+
+/* Array of configured boostgroups */
+static struct schedtune *allocated_group[BOOSTGROUPS_COUNT] = {
+ &root_schedtune,
+ NULL,
+};
+
+/* SchedTune boost groups
+ * Keep track of all the boost groups which impact on CPU, for example when a
+ * CPU has two RUNNABLE tasks belonging to two different boost groups and thus
+ * likely with different boost values.
+ * Since on each system we expect only a limited number of boost groups, here
+ * we use a simple array to keep track of the metrics required to compute the
+ * maximum per-CPU boosting value.
+ */
+struct boost_groups {
+ /* Maximum boost value for all RUNNABLE tasks on a CPU */
+ bool idle;
+ int boost_max;
+ u64 boost_ts;
+ struct {
+ /* The boost for tasks on that boost group */
+ int boost;
+ /* Count of RUNNABLE tasks on that boost group */
+ unsigned tasks;
+ /* Timestamp of boost activation */
+ u64 ts;
+ } group[BOOSTGROUPS_COUNT];
+ /* CPU's boost group locking */
+ raw_spinlock_t lock;
+};
+
+/* Boost groups affecting each CPU in the system */
+DEFINE_PER_CPU(struct boost_groups, cpu_boost_groups);
+
+static inline bool schedtune_boost_timeout(u64 now, u64 ts)
+{
+ return ((now - ts) > SCHEDTUNE_BOOST_HOLD_NS);
+}
+
+static inline bool
+schedtune_boost_group_active(int idx, struct boost_groups* bg, u64 now)
+{
+ if (bg->group[idx].tasks)
+ return true;
+
+ return !schedtune_boost_timeout(now, bg->group[idx].ts);
+}
+
+static void
+schedtune_cpu_update(int cpu, u64 now)
+{
+ struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
+ int boost_max;
+ u64 boost_ts;
+ int idx;
+
+ /* The root boost group is always active */
+ boost_max = bg->group[0].boost;
+ boost_ts = now;
+ for (idx = 1; idx < BOOSTGROUPS_COUNT; ++idx) {
+ /*
+ * A boost group affects a CPU only if it has
+ * RUNNABLE tasks on that CPU or it has hold
+ * in effect from a previous task.
+ */
+ if (!schedtune_boost_group_active(idx, bg, now))
+ continue;
+
+ /* This boost group is active */
+ if (boost_max > bg->group[idx].boost)
+ continue;
+
+ boost_max = bg->group[idx].boost;
+ boost_ts = bg->group[idx].ts;
+ }
+ /* Ensures boost_max is non-negative when all cgroup boost values
+ * are neagtive. Avoids under-accounting of cpu capacity which may cause
+ * task stacking and frequency spikes.*/
+ boost_max = max(boost_max, 0);
+ bg->boost_max = boost_max;
+ bg->boost_ts = boost_ts;
+}
+
+static int
+schedtune_boostgroup_update(int idx, int boost)
+{
+ struct boost_groups *bg;
+ int cur_boost_max;
+ int old_boost;
+ int cpu;
+ u64 now;
+
+ /* Update per CPU boost groups */
+ for_each_possible_cpu(cpu) {
+ bg = &per_cpu(cpu_boost_groups, cpu);
+
+ /*
+ * Keep track of current boost values to compute the per CPU
+ * maximum only when it has been affected by the new value of
+ * the updated boost group
+ */
+ cur_boost_max = bg->boost_max;
+ old_boost = bg->group[idx].boost;
+
+ /* Update the boost value of this boost group */
+ bg->group[idx].boost = boost;
+
+ /* Check if this update increase current max */
+ now = sched_clock_cpu(cpu);
+ if (boost > cur_boost_max &&
+ schedtune_boost_group_active(idx, bg, now)) {
+ bg->boost_max = boost;
+ bg->boost_ts = bg->group[idx].ts;
+
+ trace_sched_tune_boostgroup_update(cpu, 1, bg->boost_max);
+ continue;
+ }
+
+ /* Check if this update has decreased current max */
+ if (cur_boost_max == old_boost && old_boost > boost) {
+ schedtune_cpu_update(cpu, now);
+ trace_sched_tune_boostgroup_update(cpu, -1, bg->boost_max);
+ continue;
+ }
+
+ trace_sched_tune_boostgroup_update(cpu, 0, bg->boost_max);
+ }
+
+ return 0;
+}
+
+#define ENQUEUE_TASK 1
+#define DEQUEUE_TASK -1
+
+static inline bool
+schedtune_update_timestamp(struct task_struct *p)
+{
+ if (sched_feat(SCHEDTUNE_BOOST_HOLD_ALL))
+ return true;
+
+ return task_has_rt_policy(p);
+}
+
+static inline void
+schedtune_tasks_update(struct task_struct *p, int cpu, int idx, int task_count)
+{
+ struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
+ int tasks = bg->group[idx].tasks + task_count;
+
+ /* Update boosted tasks count while avoiding to make it negative */
+ bg->group[idx].tasks = max(0, tasks);
+
+ /* Update timeout on enqueue */
+ if (task_count > 0) {
+ u64 now = sched_clock_cpu(cpu);
+
+ if (schedtune_update_timestamp(p))
+ bg->group[idx].ts = now;
+
+ /* Boost group activation or deactivation on that RQ */
+ if (bg->group[idx].tasks == 1)
+ schedtune_cpu_update(cpu, now);
+ }
+
+ trace_sched_tune_tasks_update(p, cpu, tasks, idx,
+ bg->group[idx].boost, bg->boost_max,
+ bg->group[idx].ts);
+}
+
+/*
+ * NOTE: This function must be called while holding the lock on the CPU RQ
+ */
+void schedtune_enqueue_task(struct task_struct *p, int cpu)
+{
+ struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
+ unsigned long irq_flags;
+ struct schedtune *st;
+ int idx;
+
+ if (unlikely(!schedtune_initialized))
+ return;
+
+ /*
+ * Boost group accouting is protected by a per-cpu lock and requires
+ * interrupt to be disabled to avoid race conditions for example on
+ * do_exit()::cgroup_exit() and task migration.
+ */
+ raw_spin_lock_irqsave(&bg->lock, irq_flags);
+ rcu_read_lock();
+
+ st = task_schedtune(p);
+ idx = st->idx;
+
+ schedtune_tasks_update(p, cpu, idx, ENQUEUE_TASK);
+
+ rcu_read_unlock();
+ raw_spin_unlock_irqrestore(&bg->lock, irq_flags);
+}
+
+int schedtune_can_attach(struct cgroup_taskset *tset)
+{
+ struct task_struct *task;
+ struct cgroup_subsys_state *css;
+ struct boost_groups *bg;
+ struct rq_flags rq_flags;
+ unsigned int cpu;
+ struct rq *rq;
+ int src_bg; /* Source boost group index */
+ int dst_bg; /* Destination boost group index */
+ int tasks;
+ u64 now;
+
+ if (unlikely(!schedtune_initialized))
+ return 0;
+
+
+ cgroup_taskset_for_each(task, css, tset) {
+
+ /*
+ * Lock the CPU's RQ the task is enqueued to avoid race
+ * conditions with migration code while the task is being
+ * accounted
+ */
+ rq = task_rq_lock(task, &rq_flags);
+
+ if (!task->on_rq) {
+ task_rq_unlock(rq, task, &rq_flags);
+ continue;
+ }
+
+ /*
+ * Boost group accouting is protected by a per-cpu lock and requires
+ * interrupt to be disabled to avoid race conditions on...
+ */
+ cpu = cpu_of(rq);
+ bg = &per_cpu(cpu_boost_groups, cpu);
+ raw_spin_lock(&bg->lock);
+
+ dst_bg = css_st(css)->idx;
+ src_bg = task_schedtune(task)->idx;
+
+ /*
+ * Current task is not changing boostgroup, which can
+ * happen when the new hierarchy is in use.
+ */
+ if (unlikely(dst_bg == src_bg)) {
+ raw_spin_unlock(&bg->lock);
+ task_rq_unlock(rq, task, &rq_flags);
+ continue;
+ }
+
+ /*
+ * This is the case of a RUNNABLE task which is switching its
+ * current boost group.
+ */
+
+ /* Move task from src to dst boost group */
+ tasks = bg->group[src_bg].tasks - 1;
+ bg->group[src_bg].tasks = max(0, tasks);
+ bg->group[dst_bg].tasks += 1;
+
+ /* Update boost hold start for this group */
+ now = sched_clock_cpu(cpu);
+ bg->group[dst_bg].ts = now;
+
+ /* Force boost group re-evaluation at next boost check */
+ bg->boost_ts = now - SCHEDTUNE_BOOST_HOLD_NS;
+
+ raw_spin_unlock(&bg->lock);
+ task_rq_unlock(rq, task, &rq_flags);
+ }
+
+ return 0;
+}
+
+void schedtune_cancel_attach(struct cgroup_taskset *tset)
+{
+ /* This can happen only if SchedTune controller is mounted with
+ * other hierarchies ane one of them fails. Since usually SchedTune is
+ * mouted on its own hierarcy, for the time being we do not implement
+ * a proper rollback mechanism */
+ WARN(1, "SchedTune cancel attach not implemented");
+}
+
+/*
+ * NOTE: This function must be called while holding the lock on the CPU RQ
+ */
+void schedtune_dequeue_task(struct task_struct *p, int cpu)
+{
+ struct boost_groups *bg = &per_cpu(cpu_boost_groups, cpu);
+ unsigned long irq_flags;
+ struct schedtune *st;
+ int idx;
+
+ if (unlikely(!schedtune_initialized))
+ return;
+
+ /*
+ * Boost group accouting is protected by a per-cpu lock and requires
+ * interrupt to be disabled to avoid race conditions on...
+ */
+ raw_spin_lock_irqsave(&bg->lock, irq_flags);
+ rcu_read_lock();
+
+ st = task_schedtune(p);
+ idx = st->idx;
+
+ schedtune_tasks_update(p, cpu, idx, DEQUEUE_TASK);
+
+ rcu_read_unlock();
+ raw_spin_unlock_irqrestore(&bg->lock, irq_flags);
+}
+
+int schedtune_cpu_boost(int cpu)
+{
+ struct boost_groups *bg;
+ u64 now;
+
+ bg = &per_cpu(cpu_boost_groups, cpu);
+ now = sched_clock_cpu(cpu);
+
+ /* Check to see if we have a hold in effect */
+ if (schedtune_boost_timeout(now, bg->boost_ts))
+ schedtune_cpu_update(cpu, now);
+
+ return bg->boost_max;
+}
+
+int schedtune_task_boost(struct task_struct *p)
+{
+ struct schedtune *st;
+ int task_boost;
+
+ if (unlikely(!schedtune_initialized))
+ return 0;
+
+ /* Get task boost value */
+ rcu_read_lock();
+ st = task_schedtune(p);
+ task_boost = st->boost;
+ rcu_read_unlock();
+
+ return task_boost;
+}
+
+int schedtune_prefer_idle(struct task_struct *p)
+{
+ struct schedtune *st;
+ int prefer_idle;
+
+ if (unlikely(!schedtune_initialized))
+ return 0;
+
+ /* Get prefer_idle value */
+ rcu_read_lock();
+ st = task_schedtune(p);
+ prefer_idle = st->prefer_idle;
+ rcu_read_unlock();
+
+ return prefer_idle;
+}
+
+static u64
+prefer_idle_read(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ struct schedtune *st = css_st(css);
+
+ return st->prefer_idle;
+}
+
+static int
+prefer_idle_write(struct cgroup_subsys_state *css, struct cftype *cft,
+ u64 prefer_idle)
+{
+ struct schedtune *st = css_st(css);
+ st->prefer_idle = prefer_idle;
+
+ return 0;
+}
+
+static s64
+boost_read(struct cgroup_subsys_state *css, struct cftype *cft)
+{
+ struct schedtune *st = css_st(css);
+
+ return st->boost;
+}
+
+static int
+boost_write(struct cgroup_subsys_state *css, struct cftype *cft,
+ s64 boost)
+{
+ struct schedtune *st = css_st(css);
+
+ if (boost < 0 || boost > 100)
+ return -EINVAL;
+
+ st->boost = boost;
+
+ /* Update CPU boost */
+ schedtune_boostgroup_update(st->idx, st->boost);
+
+ return 0;
+}
+
+static struct cftype files[] = {
+ {
+ .name = "boost",
+ .read_s64 = boost_read,
+ .write_s64 = boost_write,
+ },
+ {
+ .name = "prefer_idle",
+ .read_u64 = prefer_idle_read,
+ .write_u64 = prefer_idle_write,
+ },
+ { } /* terminate */
+};
+
+static int
+schedtune_boostgroup_init(struct schedtune *st)
+{
+ struct boost_groups *bg;
+ int cpu;
+
+ /* Keep track of allocated boost groups */
+ allocated_group[st->idx] = st;
+
+ /* Initialize the per CPU boost groups */
+ for_each_possible_cpu(cpu) {
+ bg = &per_cpu(cpu_boost_groups, cpu);
+ bg->group[st->idx].boost = 0;
+ bg->group[st->idx].tasks = 0;
+ bg->group[st->idx].ts = 0;
+ }
+
+ return 0;
+}
+
+static struct cgroup_subsys_state *
+schedtune_css_alloc(struct cgroup_subsys_state *parent_css)
+{
+ struct schedtune *st;
+ int idx;
+
+ if (!parent_css)
+ return &root_schedtune.css;
+
+ /* Allow only single level hierachies */
+ if (parent_css != &root_schedtune.css) {
+ pr_err("Nested SchedTune boosting groups not allowed\n");
+ return ERR_PTR(-ENOMEM);
+ }
+
+ /* Allow only a limited number of boosting groups */
+ for (idx = 1; idx < BOOSTGROUPS_COUNT; ++idx)
+ if (!allocated_group[idx])
+ break;
+ if (idx == BOOSTGROUPS_COUNT) {
+ pr_err("Trying to create more than %d SchedTune boosting groups\n",
+ BOOSTGROUPS_COUNT);
+ return ERR_PTR(-ENOSPC);
+ }
+
+ st = kzalloc(sizeof(*st), GFP_KERNEL);
+ if (!st)
+ goto out;
+
+ /* Initialize per CPUs boost group support */
+ st->idx = idx;
+ if (schedtune_boostgroup_init(st))
+ goto release;
+
+ return &st->css;
+
+release:
+ kfree(st);
+out:
+ return ERR_PTR(-ENOMEM);
+}
+
+static void
+schedtune_boostgroup_release(struct schedtune *st)
+{
+ /* Reset this boost group */
+ schedtune_boostgroup_update(st->idx, 0);
+
+ /* Keep track of allocated boost groups */
+ allocated_group[st->idx] = NULL;
+}
+
+static void
+schedtune_css_free(struct cgroup_subsys_state *css)
+{
+ struct schedtune *st = css_st(css);
+
+ schedtune_boostgroup_release(st);
+ kfree(st);
+}
+
+struct cgroup_subsys schedtune_cgrp_subsys = {
+ .css_alloc = schedtune_css_alloc,
+ .css_free = schedtune_css_free,
+ .can_attach = schedtune_can_attach,
+ .cancel_attach = schedtune_cancel_attach,
+ .legacy_cftypes = files,
+ .early_init = 1,
+};
+
+static inline void
+schedtune_init_cgroups(void)
+{
+ struct boost_groups *bg;
+ int cpu;
+
+ /* Initialize the per CPU boost groups */
+ for_each_possible_cpu(cpu) {
+ bg = &per_cpu(cpu_boost_groups, cpu);
+ memset(bg, 0, sizeof(struct boost_groups));
+ raw_spin_lock_init(&bg->lock);
+ }
+
+ pr_info("schedtune: configured to support %d boost groups\n",
+ BOOSTGROUPS_COUNT);
+
+ schedtune_initialized = true;
+}
+
+/*
+ * Initialize the cgroup structures
+ */
+static int
+schedtune_init(void)
+{
+ schedtune_spc_rdiv = reciprocal_value(100);
+ schedtune_init_cgroups();
+ return 0;
+}
+postcore_initcall(schedtune_init);
diff --git a/kernel/sched/tune.h b/kernel/sched/tune.h
new file mode 100644
index 0000000..e79e1b1
--- /dev/null
+++ b/kernel/sched/tune.h
@@ -0,0 +1,33 @@
+
+#ifdef CONFIG_SCHED_TUNE
+
+#include <linux/reciprocal_div.h>
+
+/*
+ * System energy normalization constants
+ */
+struct target_nrg {
+ unsigned long min_power;
+ unsigned long max_power;
+ struct reciprocal_value rdiv;
+};
+
+int schedtune_cpu_boost(int cpu);
+int schedtune_task_boost(struct task_struct *tsk);
+
+int schedtune_prefer_idle(struct task_struct *tsk);
+
+void schedtune_enqueue_task(struct task_struct *p, int cpu);
+void schedtune_dequeue_task(struct task_struct *p, int cpu);
+
+#else /* CONFIG_SCHED_TUNE */
+
+#define schedtune_cpu_boost(cpu) 0
+#define schedtune_task_boost(tsk) 0
+
+#define schedtune_prefer_idle(tsk) 0
+
+#define schedtune_enqueue_task(task, cpu) do { } while (0)
+#define schedtune_dequeue_task(task, cpu) do { } while (0)
+
+#endif /* CONFIG_SCHED_TUNE */
diff --git a/kernel/sched/walt.c b/kernel/sched/walt.c
new file mode 100644
index 0000000..6a88c6e
--- /dev/null
+++ b/kernel/sched/walt.c
@@ -0,0 +1,923 @@
+/*
+ * Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ *
+ * Window Assisted Load Tracking (WALT) implementation credits:
+ * Srivatsa Vaddagiri, Steve Muckle, Syed Rameez Mustafa, Joonwoo Park,
+ * Pavan Kumar Kondeti, Olav Haugan
+ *
+ * 2016-03-06: Integration with EAS/refactoring by Vikram Mulukutla
+ * and Todd Kjos
+ */
+
+#include <linux/acpi.h>
+#include <linux/syscore_ops.h>
+#include <trace/events/sched.h>
+#include "sched.h"
+#include "walt.h"
+
+#define WINDOW_STATS_RECENT 0
+#define WINDOW_STATS_MAX 1
+#define WINDOW_STATS_MAX_RECENT_AVG 2
+#define WINDOW_STATS_AVG 3
+#define WINDOW_STATS_INVALID_POLICY 4
+
+#define EXITING_TASK_MARKER 0xdeaddead
+
+static __read_mostly unsigned int walt_ravg_hist_size = 5;
+static __read_mostly unsigned int walt_window_stats_policy =
+ WINDOW_STATS_MAX_RECENT_AVG;
+static __read_mostly unsigned int walt_account_wait_time = 1;
+static __read_mostly unsigned int walt_freq_account_wait_time = 0;
+static __read_mostly unsigned int walt_io_is_busy = 0;
+
+unsigned int sysctl_sched_walt_init_task_load_pct = 15;
+
+/* true -> use PELT based load stats, false -> use window-based load stats */
+bool __read_mostly walt_disabled = false;
+
+/*
+ * Window size (in ns). Adjust for the tick size so that the window
+ * rollover occurs just before the tick boundary.
+ */
+__read_mostly unsigned int walt_ravg_window =
+ (20000000 / TICK_NSEC) * TICK_NSEC;
+#define MIN_SCHED_RAVG_WINDOW ((10000000 / TICK_NSEC) * TICK_NSEC)
+#define MAX_SCHED_RAVG_WINDOW ((1000000000 / TICK_NSEC) * TICK_NSEC)
+
+static unsigned int sync_cpu;
+static ktime_t ktime_last;
+static __read_mostly bool walt_ktime_suspended;
+
+static unsigned int task_load(struct task_struct *p)
+{
+ return p->ravg.demand;
+}
+
+static inline void fixup_cum_window_demand(struct rq *rq, s64 delta)
+{
+ rq->cum_window_demand += delta;
+ if (unlikely((s64)rq->cum_window_demand < 0))
+ rq->cum_window_demand = 0;
+}
+
+void
+walt_inc_cumulative_runnable_avg(struct rq *rq,
+ struct task_struct *p)
+{
+ rq->cumulative_runnable_avg += p->ravg.demand;
+
+ /*
+ * Add a task's contribution to the cumulative window demand when
+ *
+ * (1) task is enqueued with on_rq = 1 i.e migration,
+ * prio/cgroup/class change.
+ * (2) task is waking for the first time in this window.
+ */
+ if (p->on_rq || (p->last_sleep_ts < rq->window_start))
+ fixup_cum_window_demand(rq, p->ravg.demand);
+}
+
+void
+walt_dec_cumulative_runnable_avg(struct rq *rq,
+ struct task_struct *p)
+{
+ rq->cumulative_runnable_avg -= p->ravg.demand;
+ BUG_ON((s64)rq->cumulative_runnable_avg < 0);
+
+ /*
+ * on_rq will be 1 for sleeping tasks. So check if the task
+ * is migrating or dequeuing in RUNNING state to change the
+ * prio/cgroup/class.
+ */
+ if (task_on_rq_migrating(p) || p->state == TASK_RUNNING)
+ fixup_cum_window_demand(rq, -(s64)p->ravg.demand);
+}
+
+static void
+fixup_cumulative_runnable_avg(struct rq *rq,
+ struct task_struct *p, u64 new_task_load)
+{
+ s64 task_load_delta = (s64)new_task_load - task_load(p);
+
+ rq->cumulative_runnable_avg += task_load_delta;
+ if ((s64)rq->cumulative_runnable_avg < 0)
+ panic("cra less than zero: tld: %lld, task_load(p) = %u\n",
+ task_load_delta, task_load(p));
+
+ fixup_cum_window_demand(rq, task_load_delta);
+}
+
+u64 walt_ktime_clock(void)
+{
+ if (unlikely(walt_ktime_suspended))
+ return ktime_to_ns(ktime_last);
+ return ktime_get_ns();
+}
+
+static void walt_resume(void)
+{
+ walt_ktime_suspended = false;
+}
+
+static int walt_suspend(void)
+{
+ ktime_last = ktime_get();
+ walt_ktime_suspended = true;
+ return 0;
+}
+
+static struct syscore_ops walt_syscore_ops = {
+ .resume = walt_resume,
+ .suspend = walt_suspend
+};
+
+static int __init walt_init_ops(void)
+{
+ register_syscore_ops(&walt_syscore_ops);
+ return 0;
+}
+late_initcall(walt_init_ops);
+
+#ifdef CONFIG_CFS_BANDWIDTH
+void walt_inc_cfs_cumulative_runnable_avg(struct cfs_rq *cfs_rq,
+ struct task_struct *p)
+{
+ cfs_rq->cumulative_runnable_avg += p->ravg.demand;
+}
+
+void walt_dec_cfs_cumulative_runnable_avg(struct cfs_rq *cfs_rq,
+ struct task_struct *p)
+{
+ cfs_rq->cumulative_runnable_avg -= p->ravg.demand;
+}
+#endif
+
+static int exiting_task(struct task_struct *p)
+{
+ if (p->flags & PF_EXITING) {
+ if (p->ravg.sum_history[0] != EXITING_TASK_MARKER) {
+ p->ravg.sum_history[0] = EXITING_TASK_MARKER;
+ }
+ return 1;
+ }
+ return 0;
+}
+
+static int __init set_walt_ravg_window(char *str)
+{
+ unsigned int adj_window;
+ bool no_walt = walt_disabled;
+
+ get_option(&str, &walt_ravg_window);
+
+ /* Adjust for CONFIG_HZ */
+ adj_window = (walt_ravg_window / TICK_NSEC) * TICK_NSEC;
+
+ /* Warn if we're a bit too far away from the expected window size */
+ WARN(adj_window < walt_ravg_window - NSEC_PER_MSEC,
+ "tick-adjusted window size %u, original was %u\n", adj_window,
+ walt_ravg_window);
+
+ walt_ravg_window = adj_window;
+
+ walt_disabled = walt_disabled ||
+ (walt_ravg_window < MIN_SCHED_RAVG_WINDOW ||
+ walt_ravg_window > MAX_SCHED_RAVG_WINDOW);
+
+ WARN(!no_walt && walt_disabled,
+ "invalid window size, disabling WALT\n");
+
+ return 0;
+}
+
+early_param("walt_ravg_window", set_walt_ravg_window);
+
+static void
+update_window_start(struct rq *rq, u64 wallclock)
+{
+ s64 delta;
+ int nr_windows;
+
+ delta = wallclock - rq->window_start;
+ /* If the MPM global timer is cleared, set delta as 0 to avoid kernel BUG happening */
+ if (delta < 0) {
+ delta = 0;
+ WARN_ONCE(1, "WALT wallclock appears to have gone backwards or reset\n");
+ }
+
+ if (delta < walt_ravg_window)
+ return;
+
+ nr_windows = div64_u64(delta, walt_ravg_window);
+ rq->window_start += (u64)nr_windows * (u64)walt_ravg_window;
+
+ rq->cum_window_demand = rq->cumulative_runnable_avg;
+}
+
+extern unsigned long capacity_curr_of(int cpu);
+/*
+ * Translate absolute delta time accounted on a CPU
+ * to a scale where 1024 is the capacity of the most
+ * capable CPU running at FMAX
+ */
+static u64 scale_exec_time(u64 delta, struct rq *rq)
+{
+ unsigned long capcurr = capacity_curr_of(cpu_of(rq));
+
+ return (delta * capcurr) >> SCHED_CAPACITY_SHIFT;
+}
+
+static int cpu_is_waiting_on_io(struct rq *rq)
+{
+ if (!walt_io_is_busy)
+ return 0;
+
+ return atomic_read(&rq->nr_iowait);
+}
+
+void walt_account_irqtime(int cpu, struct task_struct *curr,
+ u64 delta, u64 wallclock)
+{
+ struct rq *rq = cpu_rq(cpu);
+ unsigned long flags, nr_windows;
+ u64 cur_jiffies_ts;
+
+ raw_spin_lock_irqsave(&rq->lock, flags);
+
+ /*
+ * cputime (wallclock) uses sched_clock so use the same here for
+ * consistency.
+ */
+ delta += sched_clock() - wallclock;
+ cur_jiffies_ts = get_jiffies_64();
+
+ if (is_idle_task(curr))
+ walt_update_task_ravg(curr, rq, IRQ_UPDATE, walt_ktime_clock(),
+ delta);
+
+ nr_windows = cur_jiffies_ts - rq->irqload_ts;
+
+ if (nr_windows) {
+ if (nr_windows < 10) {
+ /* Decay CPU's irqload by 3/4 for each window. */
+ rq->avg_irqload *= (3 * nr_windows);
+ rq->avg_irqload = div64_u64(rq->avg_irqload,
+ 4 * nr_windows);
+ } else {
+ rq->avg_irqload = 0;
+ }
+ rq->avg_irqload += rq->cur_irqload;
+ rq->cur_irqload = 0;
+ }
+
+ rq->cur_irqload += delta;
+ rq->irqload_ts = cur_jiffies_ts;
+ raw_spin_unlock_irqrestore(&rq->lock, flags);
+}
+
+
+#define WALT_HIGH_IRQ_TIMEOUT 3
+
+u64 walt_irqload(int cpu) {
+ struct rq *rq = cpu_rq(cpu);
+ s64 delta;
+ delta = get_jiffies_64() - rq->irqload_ts;
+
+ /*
+ * Current context can be preempted by irq and rq->irqload_ts can be
+ * updated by irq context so that delta can be negative.
+ * But this is okay and we can safely return as this means there
+ * was recent irq occurrence.
+ */
+
+ if (delta < WALT_HIGH_IRQ_TIMEOUT)
+ return rq->avg_irqload;
+ else
+ return 0;
+}
+
+int walt_cpu_high_irqload(int cpu) {
+ return walt_irqload(cpu) >= sysctl_sched_walt_cpu_high_irqload;
+}
+
+static int account_busy_for_cpu_time(struct rq *rq, struct task_struct *p,
+ u64 irqtime, int event)
+{
+ if (is_idle_task(p)) {
+ /* TASK_WAKE && TASK_MIGRATE is not possible on idle task! */
+ if (event == PICK_NEXT_TASK)
+ return 0;
+
+ /* PUT_PREV_TASK, TASK_UPDATE && IRQ_UPDATE are left */
+ return irqtime || cpu_is_waiting_on_io(rq);
+ }
+
+ if (event == TASK_WAKE)
+ return 0;
+
+ if (event == PUT_PREV_TASK || event == IRQ_UPDATE ||
+ event == TASK_UPDATE)
+ return 1;
+
+ /* Only TASK_MIGRATE && PICK_NEXT_TASK left */
+ return walt_freq_account_wait_time;
+}
+
+/*
+ * Account cpu activity in its busy time counters (rq->curr/prev_runnable_sum)
+ */
+static void update_cpu_busy_time(struct task_struct *p, struct rq *rq,
+ int event, u64 wallclock, u64 irqtime)
+{
+ int new_window, nr_full_windows = 0;
+ int p_is_curr_task = (p == rq->curr);
+ u64 mark_start = p->ravg.mark_start;
+ u64 window_start = rq->window_start;
+ u32 window_size = walt_ravg_window;
+ u64 delta;
+
+ new_window = mark_start < window_start;
+ if (new_window) {
+ nr_full_windows = div64_u64((window_start - mark_start),
+ window_size);
+ if (p->ravg.active_windows < USHRT_MAX)
+ p->ravg.active_windows++;
+ }
+
+ /* Handle per-task window rollover. We don't care about the idle
+ * task or exiting tasks. */
+ if (new_window && !is_idle_task(p) && !exiting_task(p)) {
+ u32 curr_window = 0;
+
+ if (!nr_full_windows)
+ curr_window = p->ravg.curr_window;
+
+ p->ravg.prev_window = curr_window;
+ p->ravg.curr_window = 0;
+ }
+
+ if (!account_busy_for_cpu_time(rq, p, irqtime, event)) {
+ /* account_busy_for_cpu_time() = 0, so no update to the
+ * task's current window needs to be made. This could be
+ * for example
+ *
+ * - a wakeup event on a task within the current
+ * window (!new_window below, no action required),
+ * - switching to a new task from idle (PICK_NEXT_TASK)
+ * in a new window where irqtime is 0 and we aren't
+ * waiting on IO */
+
+ if (!new_window)
+ return;
+
+ /* A new window has started. The RQ demand must be rolled
+ * over if p is the current task. */
+ if (p_is_curr_task) {
+ u64 prev_sum = 0;
+
+ /* p is either idle task or an exiting task */
+ if (!nr_full_windows) {
+ prev_sum = rq->curr_runnable_sum;
+ }
+
+ rq->prev_runnable_sum = prev_sum;
+ rq->curr_runnable_sum = 0;
+ }
+
+ return;
+ }
+
+ if (!new_window) {
+ /* account_busy_for_cpu_time() = 1 so busy time needs
+ * to be accounted to the current window. No rollover
+ * since we didn't start a new window. An example of this is
+ * when a task starts execution and then sleeps within the
+ * same window. */
+
+ if (!irqtime || !is_idle_task(p) || cpu_is_waiting_on_io(rq))
+ delta = wallclock - mark_start;
+ else
+ delta = irqtime;
+ delta = scale_exec_time(delta, rq);
+ rq->curr_runnable_sum += delta;
+ if (!is_idle_task(p) && !exiting_task(p))
+ p->ravg.curr_window += delta;
+
+ return;
+ }
+
+ if (!p_is_curr_task) {
+ /* account_busy_for_cpu_time() = 1 so busy time needs
+ * to be accounted to the current window. A new window
+ * has also started, but p is not the current task, so the
+ * window is not rolled over - just split up and account
+ * as necessary into curr and prev. The window is only
+ * rolled over when a new window is processed for the current
+ * task.
+ *
+ * Irqtime can't be accounted by a task that isn't the
+ * currently running task. */
+
+ if (!nr_full_windows) {
+ /* A full window hasn't elapsed, account partial
+ * contribution to previous completed window. */
+ delta = scale_exec_time(window_start - mark_start, rq);
+ if (!exiting_task(p))
+ p->ravg.prev_window += delta;
+ } else {
+ /* Since at least one full window has elapsed,
+ * the contribution to the previous window is the
+ * full window (window_size). */
+ delta = scale_exec_time(window_size, rq);
+ if (!exiting_task(p))
+ p->ravg.prev_window = delta;
+ }
+ rq->prev_runnable_sum += delta;
+
+ /* Account piece of busy time in the current window. */
+ delta = scale_exec_time(wallclock - window_start, rq);
+ rq->curr_runnable_sum += delta;
+ if (!exiting_task(p))
+ p->ravg.curr_window = delta;
+
+ return;
+ }
+
+ if (!irqtime || !is_idle_task(p) || cpu_is_waiting_on_io(rq)) {
+ /* account_busy_for_cpu_time() = 1 so busy time needs
+ * to be accounted to the current window. A new window
+ * has started and p is the current task so rollover is
+ * needed. If any of these three above conditions are true
+ * then this busy time can't be accounted as irqtime.
+ *
+ * Busy time for the idle task or exiting tasks need not
+ * be accounted.
+ *
+ * An example of this would be a task that starts execution
+ * and then sleeps once a new window has begun. */
+
+ if (!nr_full_windows) {
+ /* A full window hasn't elapsed, account partial
+ * contribution to previous completed window. */
+ delta = scale_exec_time(window_start - mark_start, rq);
+ if (!is_idle_task(p) && !exiting_task(p))
+ p->ravg.prev_window += delta;
+
+ delta += rq->curr_runnable_sum;
+ } else {
+ /* Since at least one full window has elapsed,
+ * the contribution to the previous window is the
+ * full window (window_size). */
+ delta = scale_exec_time(window_size, rq);
+ if (!is_idle_task(p) && !exiting_task(p))
+ p->ravg.prev_window = delta;
+
+ }
+ /*
+ * Rollover for normal runnable sum is done here by overwriting
+ * the values in prev_runnable_sum and curr_runnable_sum.
+ * Rollover for new task runnable sum has completed by previous
+ * if-else statement.
+ */
+ rq->prev_runnable_sum = delta;
+
+ /* Account piece of busy time in the current window. */
+ delta = scale_exec_time(wallclock - window_start, rq);
+ rq->curr_runnable_sum = delta;
+ if (!is_idle_task(p) && !exiting_task(p))
+ p->ravg.curr_window = delta;
+
+ return;
+ }
+
+ if (irqtime) {
+ /* account_busy_for_cpu_time() = 1 so busy time needs
+ * to be accounted to the current window. A new window
+ * has started and p is the current task so rollover is
+ * needed. The current task must be the idle task because
+ * irqtime is not accounted for any other task.
+ *
+ * Irqtime will be accounted each time we process IRQ activity
+ * after a period of idleness, so we know the IRQ busy time
+ * started at wallclock - irqtime. */
+
+ BUG_ON(!is_idle_task(p));
+ mark_start = wallclock - irqtime;
+
+ /* Roll window over. If IRQ busy time was just in the current
+ * window then that is all that need be accounted. */
+ rq->prev_runnable_sum = rq->curr_runnable_sum;
+ if (mark_start > window_start) {
+ rq->curr_runnable_sum = scale_exec_time(irqtime, rq);
+ return;
+ }
+
+ /* The IRQ busy time spanned multiple windows. Process the
+ * busy time preceding the current window start first. */
+ delta = window_start - mark_start;
+ if (delta > window_size)
+ delta = window_size;
+ delta = scale_exec_time(delta, rq);
+ rq->prev_runnable_sum += delta;
+
+ /* Process the remaining IRQ busy time in the current window. */
+ delta = wallclock - window_start;
+ rq->curr_runnable_sum = scale_exec_time(delta, rq);
+
+ return;
+ }
+
+ BUG();
+}
+
+static int account_busy_for_task_demand(struct task_struct *p, int event)
+{
+ /* No need to bother updating task demand for exiting tasks
+ * or the idle task. */
+ if (exiting_task(p) || is_idle_task(p))
+ return 0;
+
+ /* When a task is waking up it is completing a segment of non-busy
+ * time. Likewise, if wait time is not treated as busy time, then
+ * when a task begins to run or is migrated, it is not running and
+ * is completing a segment of non-busy time. */
+ if (event == TASK_WAKE || (!walt_account_wait_time &&
+ (event == PICK_NEXT_TASK || event == TASK_MIGRATE)))
+ return 0;
+
+ return 1;
+}
+
+/*
+ * Called when new window is starting for a task, to record cpu usage over
+ * recently concluded window(s). Normally 'samples' should be 1. It can be > 1
+ * when, say, a real-time task runs without preemption for several windows at a
+ * stretch.
+ */
+static void update_history(struct rq *rq, struct task_struct *p,
+ u32 runtime, int samples, int event)
+{
+ u32 *hist = &p->ravg.sum_history[0];
+ int ridx, widx;
+ u32 max = 0, avg, demand;
+ u64 sum = 0;
+
+ /* Ignore windows where task had no activity */
+ if (!runtime || is_idle_task(p) || exiting_task(p) || !samples)
+ goto done;
+
+ /* Push new 'runtime' value onto stack */
+ widx = walt_ravg_hist_size - 1;
+ ridx = widx - samples;
+ for (; ridx >= 0; --widx, --ridx) {
+ hist[widx] = hist[ridx];
+ sum += hist[widx];
+ if (hist[widx] > max)
+ max = hist[widx];
+ }
+
+ for (widx = 0; widx < samples && widx < walt_ravg_hist_size; widx++) {
+ hist[widx] = runtime;
+ sum += hist[widx];
+ if (hist[widx] > max)
+ max = hist[widx];
+ }
+
+ p->ravg.sum = 0;
+
+ if (walt_window_stats_policy == WINDOW_STATS_RECENT) {
+ demand = runtime;
+ } else if (walt_window_stats_policy == WINDOW_STATS_MAX) {
+ demand = max;
+ } else {
+ avg = div64_u64(sum, walt_ravg_hist_size);
+ if (walt_window_stats_policy == WINDOW_STATS_AVG)
+ demand = avg;
+ else
+ demand = max(avg, runtime);
+ }
+
+ /*
+ * A throttled deadline sched class task gets dequeued without
+ * changing p->on_rq. Since the dequeue decrements hmp stats
+ * avoid decrementing it here again.
+ *
+ * When window is rolled over, the cumulative window demand
+ * is reset to the cumulative runnable average (contribution from
+ * the tasks on the runqueue). If the current task is dequeued
+ * already, it's demand is not included in the cumulative runnable
+ * average. So add the task demand separately to cumulative window
+ * demand.
+ */
+ if (!task_has_dl_policy(p) || !p->dl.dl_throttled) {
+ if (task_on_rq_queued(p))
+ fixup_cumulative_runnable_avg(rq, p, demand);
+ else if (rq->curr == p)
+ fixup_cum_window_demand(rq, demand);
+ }
+
+ p->ravg.demand = demand;
+
+done:
+ trace_walt_update_history(rq, p, runtime, samples, event);
+ return;
+}
+
+static void add_to_task_demand(struct rq *rq, struct task_struct *p,
+ u64 delta)
+{
+ delta = scale_exec_time(delta, rq);
+ p->ravg.sum += delta;
+ if (unlikely(p->ravg.sum > walt_ravg_window))
+ p->ravg.sum = walt_ravg_window;
+}
+
+/*
+ * Account cpu demand of task and/or update task's cpu demand history
+ *
+ * ms = p->ravg.mark_start;
+ * wc = wallclock
+ * ws = rq->window_start
+ *
+ * Three possibilities:
+ *
+ * a) Task event is contained within one window.
+ * window_start < mark_start < wallclock
+ *
+ * ws ms wc
+ * | | |
+ * V V V
+ * |---------------|
+ *
+ * In this case, p->ravg.sum is updated *iff* event is appropriate
+ * (ex: event == PUT_PREV_TASK)
+ *
+ * b) Task event spans two windows.
+ * mark_start < window_start < wallclock
+ *
+ * ms ws wc
+ * | | |
+ * V V V
+ * -----|-------------------
+ *
+ * In this case, p->ravg.sum is updated with (ws - ms) *iff* event
+ * is appropriate, then a new window sample is recorded followed
+ * by p->ravg.sum being set to (wc - ws) *iff* event is appropriate.
+ *
+ * c) Task event spans more than two windows.
+ *
+ * ms ws_tmp ws wc
+ * | | | |
+ * V V V V
+ * ---|-------|-------|-------|-------|------
+ * | |
+ * |<------ nr_full_windows ------>|
+ *
+ * In this case, p->ravg.sum is updated with (ws_tmp - ms) first *iff*
+ * event is appropriate, window sample of p->ravg.sum is recorded,
+ * 'nr_full_window' samples of window_size is also recorded *iff*
+ * event is appropriate and finally p->ravg.sum is set to (wc - ws)
+ * *iff* event is appropriate.
+ *
+ * IMPORTANT : Leave p->ravg.mark_start unchanged, as update_cpu_busy_time()
+ * depends on it!
+ */
+static void update_task_demand(struct task_struct *p, struct rq *rq,
+ int event, u64 wallclock)
+{
+ u64 mark_start = p->ravg.mark_start;
+ u64 delta, window_start = rq->window_start;
+ int new_window, nr_full_windows;
+ u32 window_size = walt_ravg_window;
+
+ new_window = mark_start < window_start;
+ if (!account_busy_for_task_demand(p, event)) {
+ if (new_window)
+ /* If the time accounted isn't being accounted as
+ * busy time, and a new window started, only the
+ * previous window need be closed out with the
+ * pre-existing demand. Multiple windows may have
+ * elapsed, but since empty windows are dropped,
+ * it is not necessary to account those. */
+ update_history(rq, p, p->ravg.sum, 1, event);
+ return;
+ }
+
+ if (!new_window) {
+ /* The simple case - busy time contained within the existing
+ * window. */
+ add_to_task_demand(rq, p, wallclock - mark_start);
+ return;
+ }
+
+ /* Busy time spans at least two windows. Temporarily rewind
+ * window_start to first window boundary after mark_start. */
+ delta = window_start - mark_start;
+ nr_full_windows = div64_u64(delta, window_size);
+ window_start -= (u64)nr_full_windows * (u64)window_size;
+
+ /* Process (window_start - mark_start) first */
+ add_to_task_demand(rq, p, window_start - mark_start);
+
+ /* Push new sample(s) into task's demand history */
+ update_history(rq, p, p->ravg.sum, 1, event);
+ if (nr_full_windows)
+ update_history(rq, p, scale_exec_time(window_size, rq),
+ nr_full_windows, event);
+
+ /* Roll window_start back to current to process any remainder
+ * in current window. */
+ window_start += (u64)nr_full_windows * (u64)window_size;
+
+ /* Process (wallclock - window_start) next */
+ mark_start = window_start;
+ add_to_task_demand(rq, p, wallclock - mark_start);
+}
+
+/* Reflect task activity on its demand and cpu's busy time statistics */
+void walt_update_task_ravg(struct task_struct *p, struct rq *rq,
+ int event, u64 wallclock, u64 irqtime)
+{
+ if (walt_disabled || !rq->window_start)
+ return;
+
+ /* there's a bug here - there are many cases where
+ * we enter here without holding this lock, coming from
+ * walt_fixup_busy_time - looks like in 4.14 we don't
+ * hold the dest_rq at time of migration, but I haven't
+ * yet worked out if it is safe to always lock dest_rq there.
+ *
+ * temporarily disable this assert to continue checking the
+ * rest of the locking here.
+ */
+ //lockdep_assert_held(&rq->lock);
+
+ update_window_start(rq, wallclock);
+
+ if (!p->ravg.mark_start)
+ goto done;
+
+ update_task_demand(p, rq, event, wallclock);
+ update_cpu_busy_time(p, rq, event, wallclock, irqtime);
+
+done:
+ trace_walt_update_task_ravg(p, rq, event, wallclock, irqtime);
+
+ p->ravg.mark_start = wallclock;
+}
+
+static void reset_task_stats(struct task_struct *p)
+{
+ u32 sum = 0;
+
+ if (exiting_task(p))
+ sum = EXITING_TASK_MARKER;
+
+ memset(&p->ravg, 0, sizeof(struct ravg));
+ /* Retain EXITING_TASK marker */
+ p->ravg.sum_history[0] = sum;
+}
+
+void walt_mark_task_starting(struct task_struct *p)
+{
+ u64 wallclock;
+ struct rq *rq = task_rq(p);
+
+ if (!rq->window_start) {
+ reset_task_stats(p);
+ return;
+ }
+
+ wallclock = walt_ktime_clock();
+ p->ravg.mark_start = wallclock;
+}
+
+void walt_set_window_start(struct rq *rq, struct rq_flags *rf)
+{
+ if (likely(rq->window_start))
+ return;
+
+ if (cpu_of(rq) == sync_cpu) {
+ rq->window_start = 1;
+ } else {
+ struct rq *sync_rq = cpu_rq(sync_cpu);
+ rq_unpin_lock(rq, rf);
+ double_lock_balance(rq, sync_rq);
+ rq->window_start = sync_rq->window_start;
+ rq->curr_runnable_sum = rq->prev_runnable_sum = 0;
+ raw_spin_unlock(&sync_rq->lock);
+ rq_repin_lock(rq, rf);
+ }
+
+ rq->curr->ravg.mark_start = rq->window_start;
+}
+
+void walt_migrate_sync_cpu(int cpu)
+{
+ if (cpu == sync_cpu)
+ sync_cpu = smp_processor_id();
+}
+
+void walt_fixup_busy_time(struct task_struct *p, int new_cpu)
+{
+ struct rq *src_rq = task_rq(p);
+ struct rq *dest_rq = cpu_rq(new_cpu);
+ u64 wallclock;
+
+ if (!p->on_rq && p->state != TASK_WAKING)
+ return;
+
+ if (exiting_task(p)) {
+ return;
+ }
+
+ if (p->state == TASK_WAKING)
+ double_rq_lock(src_rq, dest_rq);
+
+ wallclock = walt_ktime_clock();
+
+//#define LOCK_CONDITION(rq) (debug_locks && !lockdep_is_held(&rq->lock))
+// WARN(LOCK_CONDITION(task_rq(p)), "task_rq(p) not held. p->state=%08lx new_cpu=%d task_cpu=%d", p->state, new_cpu, p->cpu);
+// WARN(LOCK_CONDITION(dest_rq), "dest_rq not held. p->state=%08lx new_cpu=%d task_cpu=%d", p->state, new_cpu, p->cpu);
+
+ /*
+ * It seems that in lots of cases we don't have
+ * dest_rq locked when we get here, which means
+ * we can't be sure to the WALT stats - someone
+ * needs to fix this.
+ */
+ walt_update_task_ravg(task_rq(p)->curr, task_rq(p),
+ TASK_UPDATE, wallclock, 0);
+ walt_update_task_ravg(dest_rq->curr, dest_rq,
+ TASK_UPDATE, wallclock, 0);
+
+// WARN(LOCK_CONDITION(task_rq(p)), "task_rq(p) not held after rq update. p->state=%08lx new_cpu=%d task_cpu=%d", p->state, new_cpu, p->cpu);
+ walt_update_task_ravg(p, task_rq(p), TASK_MIGRATE, wallclock, 0);
+
+ /*
+ * When a task is migrating during the wakeup, adjust
+ * the task's contribution towards cumulative window
+ * demand.
+ */
+ if (p->state == TASK_WAKING &&
+ p->last_sleep_ts >= src_rq->window_start) {
+ fixup_cum_window_demand(src_rq, -(s64)p->ravg.demand);
+ fixup_cum_window_demand(dest_rq, p->ravg.demand);
+ }
+
+ if (p->ravg.curr_window) {
+ src_rq->curr_runnable_sum -= p->ravg.curr_window;
+ dest_rq->curr_runnable_sum += p->ravg.curr_window;
+ }
+
+ if (p->ravg.prev_window) {
+ src_rq->prev_runnable_sum -= p->ravg.prev_window;
+ dest_rq->prev_runnable_sum += p->ravg.prev_window;
+ }
+
+ if ((s64)src_rq->prev_runnable_sum < 0) {
+ src_rq->prev_runnable_sum = 0;
+ WARN_ON(1);
+ }
+ if ((s64)src_rq->curr_runnable_sum < 0) {
+ src_rq->curr_runnable_sum = 0;
+ WARN_ON(1);
+ }
+
+ trace_walt_migration_update_sum(src_rq, p);
+ trace_walt_migration_update_sum(dest_rq, p);
+
+ if (p->state == TASK_WAKING)
+ double_rq_unlock(src_rq, dest_rq);
+}
+
+void walt_init_new_task_load(struct task_struct *p)
+{
+ int i;
+ u32 init_load_windows =
+ div64_u64((u64)sysctl_sched_walt_init_task_load_pct *
+ (u64)walt_ravg_window, 100);
+ u32 init_load_pct = current->init_load_pct;
+
+ p->init_load_pct = 0;
+ memset(&p->ravg, 0, sizeof(struct ravg));
+
+ if (init_load_pct) {
+ init_load_windows = div64_u64((u64)init_load_pct *
+ (u64)walt_ravg_window, 100);
+ }
+
+ p->ravg.demand = init_load_windows;
+ for (i = 0; i < RAVG_HIST_SIZE_MAX; ++i)
+ p->ravg.sum_history[i] = init_load_windows;
+}
diff --git a/kernel/sched/walt.h b/kernel/sched/walt.h
new file mode 100644
index 0000000..3446ca1
--- /dev/null
+++ b/kernel/sched/walt.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef __WALT_H
+#define __WALT_H
+
+#ifdef CONFIG_SCHED_WALT
+
+void walt_update_task_ravg(struct task_struct *p, struct rq *rq, int event,
+ u64 wallclock, u64 irqtime);
+void walt_inc_cumulative_runnable_avg(struct rq *rq, struct task_struct *p);
+void walt_dec_cumulative_runnable_avg(struct rq *rq, struct task_struct *p);
+
+void walt_fixup_busy_time(struct task_struct *p, int new_cpu);
+void walt_init_new_task_load(struct task_struct *p);
+void walt_mark_task_starting(struct task_struct *p);
+void walt_set_window_start(struct rq *rq, struct rq_flags *rf);
+void walt_migrate_sync_cpu(int cpu);
+u64 walt_ktime_clock(void);
+void walt_account_irqtime(int cpu, struct task_struct *curr, u64 delta,
+ u64 wallclock);
+
+u64 walt_irqload(int cpu);
+int walt_cpu_high_irqload(int cpu);
+
+#else /* CONFIG_SCHED_WALT */
+
+static inline void walt_update_task_ravg(struct task_struct *p, struct rq *rq,
+ int event, u64 wallclock, u64 irqtime) { }
+static inline void walt_inc_cumulative_runnable_avg(struct rq *rq, struct task_struct *p) { }
+static inline void walt_dec_cumulative_runnable_avg(struct rq *rq, struct task_struct *p) { }
+static inline void walt_fixup_busy_time(struct task_struct *p, int new_cpu) { }
+static inline void walt_init_new_task_load(struct task_struct *p) { }
+static inline void walt_mark_task_starting(struct task_struct *p) { }
+static inline void walt_set_window_start(struct rq *rq, struct rq_flags *rf) { }
+static inline void walt_migrate_sync_cpu(int cpu) { }
+static inline u64 walt_ktime_clock(void) { return 0; }
+
+#define walt_cpu_high_irqload(cpu) false
+
+#endif /* CONFIG_SCHED_WALT */
+
+#if defined(CONFIG_CFS_BANDWIDTH) && defined(CONFIG_SCHED_WALT)
+void walt_inc_cfs_cumulative_runnable_avg(struct cfs_rq *rq,
+ struct task_struct *p);
+void walt_dec_cfs_cumulative_runnable_avg(struct cfs_rq *rq,
+ struct task_struct *p);
+#else
+static inline void walt_inc_cfs_cumulative_runnable_avg(struct cfs_rq *rq,
+ struct task_struct *p) { }
+static inline void walt_dec_cfs_cumulative_runnable_avg(struct cfs_rq *rq,
+ struct task_struct *p) { }
+#endif
+
+extern bool walt_disabled;
+
+#endif
diff --git a/kernel/sys.c b/kernel/sys.c
index e25ec93..26a5cf0 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -42,6 +42,8 @@
#include <linux/syscore_ops.h>
#include <linux/version.h>
#include <linux/ctype.h>
+#include <linux/mm.h>
+#include <linux/mempolicy.h>
#include <linux/compat.h>
#include <linux/syscalls.h>
@@ -2183,6 +2185,153 @@
return 1;
}
+#ifdef CONFIG_MMU
+static int prctl_update_vma_anon_name(struct vm_area_struct *vma,
+ struct vm_area_struct **prev,
+ unsigned long start, unsigned long end,
+ const char __user *name_addr)
+{
+ struct mm_struct *mm = vma->vm_mm;
+ int error = 0;
+ pgoff_t pgoff;
+
+ if (name_addr == vma_get_anon_name(vma)) {
+ *prev = vma;
+ goto out;
+ }
+
+ pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
+ *prev = vma_merge(mm, *prev, start, end, vma->vm_flags, vma->anon_vma,
+ vma->vm_file, pgoff, vma_policy(vma),
+ vma->vm_userfaultfd_ctx, name_addr);
+ if (*prev) {
+ vma = *prev;
+ goto success;
+ }
+
+ *prev = vma;
+
+ if (start != vma->vm_start) {
+ error = split_vma(mm, vma, start, 1);
+ if (error)
+ goto out;
+ }
+
+ if (end != vma->vm_end) {
+ error = split_vma(mm, vma, end, 0);
+ if (error)
+ goto out;
+ }
+
+success:
+ if (!vma->vm_file)
+ vma->anon_name = name_addr;
+
+out:
+ if (error == -ENOMEM)
+ error = -EAGAIN;
+ return error;
+}
+
+static int prctl_set_vma_anon_name(unsigned long start, unsigned long end,
+ unsigned long arg)
+{
+ unsigned long tmp;
+ struct vm_area_struct *vma, *prev;
+ int unmapped_error = 0;
+ int error = -EINVAL;
+
+ /*
+ * If the interval [start,end) covers some unmapped address
+ * ranges, just ignore them, but return -ENOMEM at the end.
+ * - this matches the handling in madvise.
+ */
+ vma = find_vma_prev(current->mm, start, &prev);
+ if (vma && start > vma->vm_start)
+ prev = vma;
+
+ for (;;) {
+ /* Still start < end. */
+ error = -ENOMEM;
+ if (!vma)
+ return error;
+
+ /* Here start < (end|vma->vm_end). */
+ if (start < vma->vm_start) {
+ unmapped_error = -ENOMEM;
+ start = vma->vm_start;
+ if (start >= end)
+ return error;
+ }
+
+ /* Here vma->vm_start <= start < (end|vma->vm_end) */
+ tmp = vma->vm_end;
+ if (end < tmp)
+ tmp = end;
+
+ /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
+ error = prctl_update_vma_anon_name(vma, &prev, start, tmp,
+ (const char __user *)arg);
+ if (error)
+ return error;
+ start = tmp;
+ if (prev && start < prev->vm_end)
+ start = prev->vm_end;
+ error = unmapped_error;
+ if (start >= end)
+ return error;
+ if (prev)
+ vma = prev->vm_next;
+ else /* madvise_remove dropped mmap_sem */
+ vma = find_vma(current->mm, start);
+ }
+}
+
+static int prctl_set_vma(unsigned long opt, unsigned long start,
+ unsigned long len_in, unsigned long arg)
+{
+ struct mm_struct *mm = current->mm;
+ int error;
+ unsigned long len;
+ unsigned long end;
+
+ if (start & ~PAGE_MASK)
+ return -EINVAL;
+ len = (len_in + ~PAGE_MASK) & PAGE_MASK;
+
+ /* Check to see whether len was rounded up from small -ve to zero */
+ if (len_in && !len)
+ return -EINVAL;
+
+ end = start + len;
+ if (end < start)
+ return -EINVAL;
+
+ if (end == start)
+ return 0;
+
+ down_write(&mm->mmap_sem);
+
+ switch (opt) {
+ case PR_SET_VMA_ANON_NAME:
+ error = prctl_set_vma_anon_name(start, end, arg);
+ break;
+ default:
+ error = -EINVAL;
+ }
+
+ up_write(&mm->mmap_sem);
+
+ return error;
+}
+#else /* CONFIG_MMU */
+static int prctl_set_vma(unsigned long opt, unsigned long start,
+ unsigned long len_in, unsigned long arg)
+{
+ return -EINVAL;
+}
+#endif
+
int __weak arch_prctl_spec_ctrl_get(struct task_struct *t, unsigned long which)
{
return -EINVAL;
@@ -2406,6 +2555,9 @@
return -EINVAL;
error = arch_prctl_spec_ctrl_set(me, arg2, arg3);
break;
+ case PR_SET_VMA:
+ error = prctl_set_vma(arg2, arg3, arg4, arg5);
+ break;
default:
error = -EINVAL;
break;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 0695505..f9291b0 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -105,6 +105,7 @@
extern unsigned int core_pipe_limit;
#endif
extern int pid_max;
+extern int extra_free_kbytes;
extern int pid_max_min, pid_max_max;
extern int percpu_pagelist_fraction;
extern int latencytop_enabled;
@@ -328,6 +329,50 @@
.extra1 = &min_sched_granularity_ns,
.extra2 = &max_sched_granularity_ns,
},
+#ifdef CONFIG_SCHED_WALT
+ {
+ .procname = "sched_use_walt_cpu_util",
+ .data = &sysctl_sched_use_walt_cpu_util,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "sched_use_walt_task_util",
+ .data = &sysctl_sched_use_walt_task_util,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "sched_walt_init_task_load_pct",
+ .data = &sysctl_sched_walt_init_task_load_pct,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "sched_walt_cpu_high_irqload",
+ .data = &sysctl_sched_walt_cpu_high_irqload,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif
+ {
+ .procname = "sched_sync_hint_enable",
+ .data = &sysctl_sched_sync_hint_enable,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
+ .procname = "sched_cstate_aware",
+ .data = &sysctl_sched_cstate_aware,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
{
.procname = "sched_wakeup_granularity_ns",
.data = &sysctl_sched_wakeup_granularity,
@@ -1446,6 +1491,14 @@
.extra2 = &one_thousand,
},
{
+ .procname = "extra_free_kbytes",
+ .data = &extra_free_kbytes,
+ .maxlen = sizeof(extra_free_kbytes),
+ .mode = 0644,
+ .proc_handler = min_free_kbytes_sysctl_handler,
+ .extra1 = &zero,
+ },
+ {
.procname = "percpu_pagelist_fraction",
.data = &percpu_pagelist_fraction,
.maxlen = sizeof(percpu_pagelist_fraction),
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d00e85a..12a01cb 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -463,7 +463,8 @@
#endif
}
-static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base)
+static ktime_t __hrtimer_get_next_event(struct hrtimer_cpu_base *cpu_base,
+ const struct hrtimer *exclude)
{
struct hrtimer_clock_base *base = cpu_base->clock_base;
unsigned int active = cpu_base->active_bases;
@@ -479,9 +480,24 @@
next = timerqueue_getnext(&base->active);
timer = container_of(next, struct hrtimer, node);
+ if (timer == exclude) {
+ /* Get to the next timer in the queue. */
+ struct rb_node *rbn = rb_next(&next->node);
+
+ next = rb_entry_safe(rbn, struct timerqueue_node, node);
+ if (!next)
+ continue;
+
+ timer = container_of(next, struct hrtimer, node);
+ }
expires = ktime_sub(hrtimer_get_expires(timer), base->offset);
if (expires < expires_next) {
expires_next = expires;
+
+ /* Skip cpu_base update if a timer is being excluded. */
+ if (exclude)
+ continue;
+
hrtimer_update_next_timer(cpu_base, timer);
}
}
@@ -560,7 +576,7 @@
if (!cpu_base->hres_active)
return;
- expires_next = __hrtimer_get_next_event(cpu_base);
+ expires_next = __hrtimer_get_next_event(cpu_base, NULL);
if (skip_equal && expires_next == cpu_base->expires_next)
return;
@@ -1076,7 +1092,30 @@
raw_spin_lock_irqsave(&cpu_base->lock, flags);
if (!__hrtimer_hres_active(cpu_base))
- expires = __hrtimer_get_next_event(cpu_base);
+ expires = __hrtimer_get_next_event(cpu_base, NULL);
+
+ raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
+
+ return expires;
+}
+
+/**
+ * hrtimer_next_event_without - time until next expiry event w/o one timer
+ * @exclude: timer to exclude
+ *
+ * Returns the next expiry time over all timers except for the @exclude one or
+ * KTIME_MAX if none of them is pending.
+ */
+u64 hrtimer_next_event_without(const struct hrtimer *exclude)
+{
+ struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+ u64 expires = KTIME_MAX;
+ unsigned long flags;
+
+ raw_spin_lock_irqsave(&cpu_base->lock, flags);
+
+ if (__hrtimer_hres_active(cpu_base))
+ expires = __hrtimer_get_next_event(cpu_base, exclude);
raw_spin_unlock_irqrestore(&cpu_base->lock, flags);
@@ -1318,7 +1357,7 @@
__hrtimer_run_queues(cpu_base, now);
/* Reevaluate the clock bases for the next expiry */
- expires_next = __hrtimer_get_next_event(cpu_base);
+ expires_next = __hrtimer_get_next_event(cpu_base, NULL);
/*
* Store the new expiry value so the migration code can verify
* against it.
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 99e03be..9de7702 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -31,7 +31,7 @@
/* USER_HZ period (usecs): */
-unsigned long tick_usec = TICK_USEC;
+unsigned long tick_usec = USER_TICK_USEC;
/* SHIFTED_HZ period (nsecs): */
unsigned long tick_nsec;
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index ea3c062..d415324 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -563,14 +563,11 @@
sched_clock_idle_wakeup_event();
}
-static ktime_t tick_nohz_start_idle(struct tick_sched *ts)
+static void tick_nohz_start_idle(struct tick_sched *ts)
{
- ktime_t now = ktime_get();
-
- ts->idle_entrytime = now;
+ ts->idle_entrytime = ktime_get();
ts->idle_active = 1;
sched_clock_idle_sleep_event();
- return now;
}
/**
@@ -679,13 +676,10 @@
return local_softirq_pending() & BIT(TIMER_SOFTIRQ);
}
-static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
- ktime_t now, int cpu)
+static ktime_t tick_nohz_next_event(struct tick_sched *ts, int cpu)
{
- struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
unsigned long seq, basejiff;
- ktime_t tick;
/* Read jiffies and the time when jiffies were updated last */
do {
@@ -694,6 +688,7 @@
basejiff = jiffies;
} while (read_seqretry(&jiffies_lock, seq));
ts->last_jiffies = basejiff;
+ ts->timer_expires_base = basemono;
/*
* Keep the periodic tick, when RCU, architecture or irq_work
@@ -738,32 +733,20 @@
* next period, so no point in stopping it either, bail.
*/
if (!ts->tick_stopped) {
- tick = 0;
+ ts->timer_expires = 0;
goto out;
}
}
/*
- * If this CPU is the one which updates jiffies, then give up
- * the assignment and let it be taken by the CPU which runs
- * the tick timer next, which might be this CPU as well. If we
- * don't drop this here the jiffies might be stale and
- * do_timer() never invoked. Keep track of the fact that it
- * was the one which had the do_timer() duty last. If this CPU
- * is the one which had the do_timer() duty last, we limit the
- * sleep time to the timekeeping max_deferment value.
+ * If this CPU is the one which had the do_timer() duty last, we limit
+ * the sleep time to the timekeeping max_deferment value.
* Otherwise we can sleep as long as we want.
*/
delta = timekeeping_max_deferment();
- if (cpu == tick_do_timer_cpu) {
- tick_do_timer_cpu = TICK_DO_TIMER_NONE;
- ts->do_timer_last = 1;
- } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+ if (cpu != tick_do_timer_cpu &&
+ (tick_do_timer_cpu != TICK_DO_TIMER_NONE || !ts->do_timer_last))
delta = KTIME_MAX;
- ts->do_timer_last = 0;
- } else if (!ts->do_timer_last) {
- delta = KTIME_MAX;
- }
#ifdef CONFIG_NO_HZ_FULL
/* Limit the tick delta to the maximum scheduler deferment */
@@ -777,14 +760,42 @@
else
expires = KTIME_MAX;
- expires = min_t(u64, expires, next_tick);
- tick = expires;
+ ts->timer_expires = min_t(u64, expires, next_tick);
+
+out:
+ return ts->timer_expires;
+}
+
+static void tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
+{
+ struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
+ u64 basemono = ts->timer_expires_base;
+ u64 expires = ts->timer_expires;
+ ktime_t tick = expires;
+
+ /* Make sure we won't be trying to stop it twice in a row. */
+ ts->timer_expires_base = 0;
+
+ /*
+ * If this CPU is the one which updates jiffies, then give up
+ * the assignment and let it be taken by the CPU which runs
+ * the tick timer next, which might be this CPU as well. If we
+ * don't drop this here the jiffies might be stale and
+ * do_timer() never invoked. Keep track of the fact that it
+ * was the one which had the do_timer() duty last.
+ */
+ if (cpu == tick_do_timer_cpu) {
+ tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+ ts->do_timer_last = 1;
+ } else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+ ts->do_timer_last = 0;
+ }
/* Skip reprogram of event if its not changed */
if (ts->tick_stopped && (expires == ts->next_tick)) {
/* Sanity check: make sure clockevent is actually programmed */
if (tick == KTIME_MAX || ts->next_tick == hrtimer_get_expires(&ts->sched_timer))
- goto out;
+ return;
WARN_ON_ONCE(1);
printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n",
@@ -817,7 +828,7 @@
if (unlikely(expires == KTIME_MAX)) {
if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
hrtimer_cancel(&ts->sched_timer);
- goto out;
+ return;
}
if (ts->nohz_mode == NOHZ_MODE_HIGHRES) {
@@ -826,16 +837,23 @@
hrtimer_set_expires(&ts->sched_timer, tick);
tick_program_event(tick, 1);
}
-
-out:
- /*
- * Update the estimated sleep length until the next timer
- * (not only the tick).
- */
- ts->sleep_length = ktime_sub(dev->next_event, now);
- return tick;
}
+static void tick_nohz_retain_tick(struct tick_sched *ts)
+{
+ ts->timer_expires_base = 0;
+}
+
+#ifdef CONFIG_NO_HZ_FULL
+static void tick_nohz_stop_sched_tick(struct tick_sched *ts, int cpu)
+{
+ if (tick_nohz_next_event(ts, cpu))
+ tick_nohz_stop_tick(ts, cpu);
+ else
+ tick_nohz_retain_tick(ts);
+}
+#endif /* CONFIG_NO_HZ_FULL */
+
static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
{
/* Update jiffies first */
@@ -871,7 +889,7 @@
return;
if (can_stop_full_tick(cpu, ts))
- tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+ tick_nohz_stop_sched_tick(ts, cpu);
else if (ts->tick_stopped)
tick_nohz_restart_sched_tick(ts, ktime_get());
#endif
@@ -897,10 +915,8 @@
return false;
}
- if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) {
- ts->sleep_length = NSEC_PER_SEC / HZ;
+ if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
return false;
- }
if (need_resched())
return false;
@@ -935,42 +951,65 @@
return true;
}
-static void __tick_nohz_idle_enter(struct tick_sched *ts)
+static void __tick_nohz_idle_stop_tick(struct tick_sched *ts)
{
- ktime_t now, expires;
+ ktime_t expires;
int cpu = smp_processor_id();
- now = tick_nohz_start_idle(ts);
+ /*
+ * If tick_nohz_get_sleep_length() ran tick_nohz_next_event(), the
+ * tick timer expiration time is known already.
+ */
+ if (ts->timer_expires_base)
+ expires = ts->timer_expires;
+ else if (can_stop_idle_tick(cpu, ts))
+ expires = tick_nohz_next_event(ts, cpu);
+ else
+ return;
- if (can_stop_idle_tick(cpu, ts)) {
+ ts->idle_calls++;
+
+ if (expires > 0LL) {
int was_stopped = ts->tick_stopped;
- ts->idle_calls++;
+ tick_nohz_stop_tick(ts, cpu);
- expires = tick_nohz_stop_sched_tick(ts, now, cpu);
- if (expires > 0LL) {
- ts->idle_sleeps++;
- ts->idle_expires = expires;
- }
+ ts->idle_sleeps++;
+ ts->idle_expires = expires;
if (!was_stopped && ts->tick_stopped) {
ts->idle_jiffies = ts->last_jiffies;
nohz_balance_enter_idle(cpu);
}
+ } else {
+ tick_nohz_retain_tick(ts);
}
}
/**
- * tick_nohz_idle_enter - stop the idle tick from the idle task
+ * tick_nohz_idle_stop_tick - stop the idle tick from the idle task
*
* When the next event is more than a tick into the future, stop the idle tick
+ */
+void tick_nohz_idle_stop_tick(void)
+{
+ __tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
+}
+
+void tick_nohz_idle_retain_tick(void)
+{
+ tick_nohz_retain_tick(this_cpu_ptr(&tick_cpu_sched));
+ /*
+ * Undo the effect of get_next_timer_interrupt() called from
+ * tick_nohz_next_event().
+ */
+ timer_clear_idle();
+}
+
+/**
+ * tick_nohz_idle_enter - prepare for entering idle on the current CPU
+ *
* Called when we start the idle loop.
- *
- * The arch is responsible of calling:
- *
- * - rcu_idle_enter() after its last use of RCU before the CPU is put
- * to sleep.
- * - rcu_idle_exit() before the first use of RCU after the CPU is woken up.
*/
void tick_nohz_idle_enter(void)
{
@@ -980,7 +1019,7 @@
/*
* Update the idle state in the scheduler domain hierarchy
- * when tick_nohz_stop_sched_tick() is called from the idle loop.
+ * when tick_nohz_stop_tick() is called from the idle loop.
* State will be updated to busy during the first busy tick after
* exiting idle.
*/
@@ -989,8 +1028,11 @@
local_irq_disable();
ts = this_cpu_ptr(&tick_cpu_sched);
+
+ WARN_ON_ONCE(ts->timer_expires_base);
+
ts->inidle = 1;
- __tick_nohz_idle_enter(ts);
+ tick_nohz_start_idle(ts);
local_irq_enable();
}
@@ -1008,21 +1050,62 @@
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
if (ts->inidle)
- __tick_nohz_idle_enter(ts);
+ tick_nohz_start_idle(ts);
else
tick_nohz_full_update_tick(ts);
}
/**
- * tick_nohz_get_sleep_length - return the length of the current sleep
- *
- * Called from power state control code with interrupts disabled
+ * tick_nohz_idle_got_tick - Check whether or not the tick handler has run
*/
-ktime_t tick_nohz_get_sleep_length(void)
+bool tick_nohz_idle_got_tick(void)
{
struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
- return ts->sleep_length;
+ if (ts->inidle > 1) {
+ ts->inidle = 1;
+ return true;
+ }
+ return false;
+}
+
+/**
+ * tick_nohz_get_sleep_length - return the expected length of the current sleep
+ * @delta_next: duration until the next event if the tick cannot be stopped
+ *
+ * Called from power state control code with interrupts disabled
+ */
+ktime_t tick_nohz_get_sleep_length(ktime_t *delta_next)
+{
+ struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
+ struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+ int cpu = smp_processor_id();
+ /*
+ * The idle entry time is expected to be a sufficient approximation of
+ * the current time at this point.
+ */
+ ktime_t now = ts->idle_entrytime;
+ ktime_t next_event;
+
+ WARN_ON_ONCE(!ts->inidle);
+
+ *delta_next = ktime_sub(dev->next_event, now);
+
+ if (!can_stop_idle_tick(cpu, ts))
+ return *delta_next;
+
+ next_event = tick_nohz_next_event(ts, cpu);
+ if (!next_event)
+ return *delta_next;
+
+ /*
+ * If the next highres timer to expire is earlier than next_event, the
+ * idle governor needs to know that.
+ */
+ next_event = min_t(u64, next_event,
+ hrtimer_next_event_without(&ts->sched_timer));
+
+ return ktime_sub(next_event, now);
}
/**
@@ -1071,6 +1154,20 @@
#endif
}
+static void __tick_nohz_idle_restart_tick(struct tick_sched *ts, ktime_t now)
+{
+ tick_nohz_restart_sched_tick(ts, now);
+ tick_nohz_account_idle_ticks(ts);
+}
+
+void tick_nohz_idle_restart_tick(void)
+{
+ struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+
+ if (ts->tick_stopped)
+ __tick_nohz_idle_restart_tick(ts, ktime_get());
+}
+
/**
* tick_nohz_idle_exit - restart the idle tick from the idle task
*
@@ -1086,6 +1183,7 @@
local_irq_disable();
WARN_ON_ONCE(!ts->inidle);
+ WARN_ON_ONCE(ts->timer_expires_base);
ts->inidle = 0;
@@ -1095,10 +1193,8 @@
if (ts->idle_active)
tick_nohz_stop_idle(ts, now);
- if (ts->tick_stopped) {
- tick_nohz_restart_sched_tick(ts, now);
- tick_nohz_account_idle_ticks(ts);
- }
+ if (ts->tick_stopped)
+ __tick_nohz_idle_restart_tick(ts, now);
local_irq_enable();
}
@@ -1112,6 +1208,9 @@
struct pt_regs *regs = get_irq_regs();
ktime_t now = ktime_get();
+ if (ts->inidle)
+ ts->inidle = 2;
+
dev->next_event = KTIME_MAX;
tick_sched_do_timer(now);
@@ -1209,6 +1308,9 @@
struct pt_regs *regs = get_irq_regs();
ktime_t now = ktime_get();
+ if (ts->inidle)
+ ts->inidle = 2;
+
tick_sched_do_timer(now);
/*
diff --git a/kernel/time/tick-sched.h b/kernel/time/tick-sched.h
index 954b43d..2b845f2 100644
--- a/kernel/time/tick-sched.h
+++ b/kernel/time/tick-sched.h
@@ -38,7 +38,8 @@
* @idle_exittime: Time when the idle state was left
* @idle_sleeptime: Sum of the time slept in idle with sched tick stopped
* @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @sleep_length: Duration of the current idle sleep
+ * @timer_expires: Anticipated timer expiration time (in case sched tick is stopped)
+ * @timer_expires_base: Base time clock monotonic for @timer_expires
* @do_timer_lst: CPU was the last one doing do_timer before going idle
*/
struct tick_sched {
@@ -58,8 +59,9 @@
ktime_t idle_exittime;
ktime_t idle_sleeptime;
ktime_t iowait_sleeptime;
- ktime_t sleep_length;
unsigned long last_jiffies;
+ u64 timer_expires;
+ u64 timer_expires_base;
u64 next_timer;
ktime_t idle_expires;
int do_timer_last;
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index 4ad6f6c..3ec4922 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -73,6 +73,9 @@
select GLOB
bool
+config GPU_TRACEPOINTS
+ bool
+
config CONTEXT_SWITCH_TRACER
bool
@@ -160,6 +163,17 @@
address on the current task structure into a stack of calls.
+config PREEMPTIRQ_EVENTS
+ bool "Enable trace events for preempt and irq disable/enable"
+ select TRACE_IRQFLAGS
+ depends on DEBUG_PREEMPT || !PROVE_LOCKING
+ default n
+ help
+ Enable tracing of disable and enable events for preemption and irqs.
+ For tracing preempt disable/enable events, DEBUG_PREEMPT must be
+ enabled. For tracing irq disable/enable events, PROVE_LOCKING must
+ be disabled.
+
config IRQSOFF_TRACER
bool "Interrupts-off Latency Tracer"
default n
diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
index 19a15b2..803e38a 100644
--- a/kernel/trace/Makefile
+++ b/kernel/trace/Makefile
@@ -35,6 +35,7 @@
obj-$(CONFIG_TRACING_MAP) += tracing_map.o
obj-$(CONFIG_CONTEXT_SWITCH_TRACER) += trace_sched_switch.o
obj-$(CONFIG_FUNCTION_TRACER) += trace_functions.o
+obj-$(CONFIG_PREEMPTIRQ_EVENTS) += trace_irqsoff.o
obj-$(CONFIG_IRQSOFF_TRACER) += trace_irqsoff.o
obj-$(CONFIG_PREEMPT_TRACER) += trace_irqsoff.o
obj-$(CONFIG_SCHED_TRACER) += trace_sched_wakeup.o
@@ -68,6 +69,7 @@
endif
obj-$(CONFIG_PROBE_EVENTS) += trace_probe.o
obj-$(CONFIG_UPROBE_EVENTS) += trace_uprobe.o
+obj-$(CONFIG_GPU_TRACEPOINTS) += gpu-traces.o
obj-$(CONFIG_TRACEPOINT_BENCHMARK) += trace_benchmark.o
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 7379bcf..e69d28e 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -122,8 +122,9 @@
struct ftrace_ops *op, struct pt_regs *regs);
#else
/* See comment below, where ftrace_ops_list_func is defined */
-static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip);
-#define ftrace_ops_list_func ((ftrace_func_t)ftrace_ops_no_ops)
+static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *op, struct pt_regs *regs);
+#define ftrace_ops_list_func ftrace_ops_no_ops
#endif
/*
@@ -5572,9 +5573,9 @@
return 0;
}
-static int ftrace_process_locs(struct module *mod,
- unsigned long *start,
- unsigned long *end)
+static int __norecordmcount ftrace_process_locs(struct module *mod,
+ unsigned long *start,
+ unsigned long *end)
{
struct ftrace_page *start_pg;
struct ftrace_page *pg;
@@ -6098,7 +6099,8 @@
__ftrace_ops_list_func(ip, parent_ip, NULL, regs);
}
#else
-static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip)
+static void ftrace_ops_no_ops(unsigned long ip, unsigned long parent_ip,
+ struct ftrace_ops *op, struct pt_regs *regs)
{
__ftrace_ops_list_func(ip, parent_ip, NULL, NULL);
}
@@ -6562,14 +6564,17 @@
fgraph_graph_time = enable;
}
+void ftrace_graph_return_stub(struct ftrace_graph_ret *trace)
+{
+}
+
int ftrace_graph_entry_stub(struct ftrace_graph_ent *trace)
{
return 0;
}
/* The callbacks that hook a function */
-trace_func_graph_ret_t ftrace_graph_return =
- (trace_func_graph_ret_t)ftrace_stub;
+trace_func_graph_ret_t ftrace_graph_return = ftrace_graph_return_stub;
trace_func_graph_ent_t ftrace_graph_entry = ftrace_graph_entry_stub;
static trace_func_graph_ent_t __ftrace_graph_entry = ftrace_graph_entry_stub;
@@ -6797,7 +6802,7 @@
goto out;
ftrace_graph_active--;
- ftrace_graph_return = (trace_func_graph_ret_t)ftrace_stub;
+ ftrace_graph_return = ftrace_graph_return_stub;
ftrace_graph_entry = ftrace_graph_entry_stub;
__ftrace_graph_entry = ftrace_graph_entry_stub;
ftrace_shutdown(&graph_ops, FTRACE_STOP_FUNC_RET);
diff --git a/kernel/trace/gpu-traces.c b/kernel/trace/gpu-traces.c
new file mode 100644
index 0000000..a4b3f00
--- /dev/null
+++ b/kernel/trace/gpu-traces.c
@@ -0,0 +1,23 @@
+/*
+ * GPU tracepoints
+ *
+ * Copyright (C) 2013 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include <linux/module.h>
+
+#define CREATE_TRACE_POINTS
+#include <trace/events/gpu.h>
+
+EXPORT_TRACEPOINT_SYMBOL(gpu_sched_switch);
+EXPORT_TRACEPOINT_SYMBOL(gpu_job_enqueue);
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 169b3c4..6121390 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -66,6 +66,9 @@
#define TRACE_GRAPH_INDENT 2
+/* Flag options */
+#define TRACE_GRAPH_PRINT_FLAT 0x80
+
unsigned int fgraph_max_depth;
static struct tracer_opt trace_opts[] = {
@@ -89,6 +92,8 @@
{ TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) },
/* Include time within nested functions */
{ TRACER_OPT(graph-time, TRACE_GRAPH_GRAPH_TIME) },
+ /* Use standard trace formatting rather than hierarchical */
+ { TRACER_OPT(funcgraph-flat, TRACE_GRAPH_PRINT_FLAT) },
{ } /* Empty entry */
};
@@ -1247,6 +1252,9 @@
int cpu = iter->cpu;
int ret;
+ if (flags & TRACE_GRAPH_PRINT_FLAT)
+ return TRACE_TYPE_UNHANDLED;
+
if (data && per_cpu_ptr(data->cpu_data, cpu)->ignore) {
per_cpu_ptr(data->cpu_data, cpu)->ignore = 0;
return TRACE_TYPE_HANDLED;
@@ -1304,13 +1312,6 @@
return print_graph_function_flags(iter, tracer_flags.val);
}
-static enum print_line_t
-print_graph_function_event(struct trace_iterator *iter, int flags,
- struct trace_event *event)
-{
- return print_graph_function(iter);
-}
-
static void print_lat_header(struct seq_file *s, u32 flags)
{
static const char spaces[] = " " /* 16 spaces */
@@ -1379,6 +1380,11 @@
struct trace_iterator *iter = s->private;
struct trace_array *tr = iter->tr;
+ if (flags & TRACE_GRAPH_PRINT_FLAT) {
+ trace_default_header(s);
+ return;
+ }
+
if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO))
return;
@@ -1460,19 +1466,6 @@
return 0;
}
-static struct trace_event_functions graph_functions = {
- .trace = print_graph_function_event,
-};
-
-static struct trace_event graph_trace_entry_event = {
- .type = TRACE_GRAPH_ENT,
- .funcs = &graph_functions,
-};
-
-static struct trace_event graph_trace_ret_event = {
- .type = TRACE_GRAPH_RET,
- .funcs = &graph_functions
-};
static struct tracer graph_trace __tracer_data = {
.name = "function_graph",
@@ -1549,16 +1542,6 @@
{
max_bytes_for_cpu = snprintf(NULL, 0, "%u", nr_cpu_ids - 1);
- if (!register_trace_event(&graph_trace_entry_event)) {
- pr_warn("Warning: could not register graph trace events\n");
- return 1;
- }
-
- if (!register_trace_event(&graph_trace_ret_event)) {
- pr_warn("Warning: could not register graph trace events\n");
- return 1;
- }
-
return register_tracer(&graph_trace);
}
diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c
index 7758bc0..03ecb44 100644
--- a/kernel/trace/trace_irqsoff.c
+++ b/kernel/trace/trace_irqsoff.c
@@ -16,6 +16,10 @@
#include "trace.h"
+#define CREATE_TRACE_POINTS
+#include <trace/events/preemptirq.h>
+
+#if defined(CONFIG_IRQSOFF_TRACER) || defined(CONFIG_PREEMPT_TRACER)
static struct trace_array *irqsoff_trace __read_mostly;
static int tracer_enabled __read_mostly;
@@ -463,63 +467,43 @@
#else /* !CONFIG_PROVE_LOCKING */
/*
- * Stubs:
- */
-
-void trace_softirqs_on(unsigned long ip)
-{
-}
-
-void trace_softirqs_off(unsigned long ip)
-{
-}
-
-inline void print_irqtrace_events(struct task_struct *curr)
-{
-}
-
-/*
* We are only interested in hardirq on/off events:
*/
-void trace_hardirqs_on(void)
+static inline void tracer_hardirqs_on(void)
{
if (!preempt_trace() && irq_trace())
stop_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
-EXPORT_SYMBOL(trace_hardirqs_on);
-void trace_hardirqs_off(void)
+static inline void tracer_hardirqs_off(void)
{
if (!preempt_trace() && irq_trace())
start_critical_timing(CALLER_ADDR0, CALLER_ADDR1);
}
-EXPORT_SYMBOL(trace_hardirqs_off);
-__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+static inline void tracer_hardirqs_on_caller(unsigned long caller_addr)
{
if (!preempt_trace() && irq_trace())
stop_critical_timing(CALLER_ADDR0, caller_addr);
}
-EXPORT_SYMBOL(trace_hardirqs_on_caller);
-__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+static inline void tracer_hardirqs_off_caller(unsigned long caller_addr)
{
if (!preempt_trace() && irq_trace())
start_critical_timing(CALLER_ADDR0, caller_addr);
}
-EXPORT_SYMBOL(trace_hardirqs_off_caller);
#endif /* CONFIG_PROVE_LOCKING */
#endif /* CONFIG_IRQSOFF_TRACER */
#ifdef CONFIG_PREEMPT_TRACER
-void trace_preempt_on(unsigned long a0, unsigned long a1)
+static inline void tracer_preempt_on(unsigned long a0, unsigned long a1)
{
if (preempt_trace() && !irq_trace())
stop_critical_timing(a0, a1);
}
-void trace_preempt_off(unsigned long a0, unsigned long a1)
+static inline void tracer_preempt_off(unsigned long a0, unsigned long a1)
{
if (preempt_trace() && !irq_trace())
start_critical_timing(a0, a1);
@@ -781,3 +765,100 @@
return 0;
}
core_initcall(init_irqsoff_tracer);
+#endif /* IRQSOFF_TRACER || PREEMPTOFF_TRACER */
+
+#ifndef CONFIG_IRQSOFF_TRACER
+static inline void tracer_hardirqs_on(void) { }
+static inline void tracer_hardirqs_off(void) { }
+static inline void tracer_hardirqs_on_caller(unsigned long caller_addr) { }
+static inline void tracer_hardirqs_off_caller(unsigned long caller_addr) { }
+#endif
+
+#ifndef CONFIG_PREEMPT_TRACER
+static inline void tracer_preempt_on(unsigned long a0, unsigned long a1) { }
+static inline void tracer_preempt_off(unsigned long a0, unsigned long a1) { }
+#endif
+
+#if defined(CONFIG_TRACE_IRQFLAGS) && !defined(CONFIG_PROVE_LOCKING)
+/* Per-cpu variable to prevent redundant calls when IRQs already off */
+static DEFINE_PER_CPU(int, tracing_irq_cpu);
+
+void trace_hardirqs_on(void)
+{
+ if (!this_cpu_read(tracing_irq_cpu))
+ return;
+
+ trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+ tracer_hardirqs_on();
+
+ this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on);
+
+void trace_hardirqs_off(void)
+{
+ if (this_cpu_read(tracing_irq_cpu))
+ return;
+
+ this_cpu_write(tracing_irq_cpu, 1);
+
+ trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+ tracer_hardirqs_off();
+}
+EXPORT_SYMBOL(trace_hardirqs_off);
+
+__visible void trace_hardirqs_on_caller(unsigned long caller_addr)
+{
+ if (!this_cpu_read(tracing_irq_cpu))
+ return;
+
+ trace_irq_enable_rcuidle(CALLER_ADDR0, caller_addr);
+ tracer_hardirqs_on_caller(caller_addr);
+
+ this_cpu_write(tracing_irq_cpu, 0);
+}
+EXPORT_SYMBOL(trace_hardirqs_on_caller);
+
+__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
+{
+ if (this_cpu_read(tracing_irq_cpu))
+ return;
+
+ this_cpu_write(tracing_irq_cpu, 1);
+
+ trace_irq_disable_rcuidle(CALLER_ADDR0, caller_addr);
+ tracer_hardirqs_off_caller(caller_addr);
+}
+EXPORT_SYMBOL(trace_hardirqs_off_caller);
+
+/*
+ * Stubs:
+ */
+
+void trace_softirqs_on(unsigned long ip)
+{
+}
+
+void trace_softirqs_off(unsigned long ip)
+{
+}
+
+inline void print_irqtrace_events(struct task_struct *curr)
+{
+}
+#endif
+
+#if defined(CONFIG_PREEMPT_TRACER) || \
+ (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS))
+void trace_preempt_on(unsigned long a0, unsigned long a1)
+{
+ trace_preempt_enable_rcuidle(a0, a1);
+ tracer_preempt_on(a0, a1);
+}
+
+void trace_preempt_off(unsigned long a0, unsigned long a1)
+{
+ trace_preempt_disable_rcuidle(a0, a1);
+ tracer_preempt_off(a0, a1);
+}
+#endif
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 4500b00..416f7fe 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -911,6 +911,174 @@
.funcs = &trace_fn_funcs,
};
+/* TRACE_GRAPH_ENT */
+static enum print_line_t trace_graph_ent_trace(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct trace_seq *s = &iter->seq;
+ struct ftrace_graph_ent_entry *field;
+
+ trace_assign_type(field, iter->ent);
+
+ trace_seq_puts(s, "graph_ent: func=");
+ if (trace_seq_has_overflowed(s))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ if (!seq_print_ip_sym(s, field->graph_ent.func, flags))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ trace_seq_puts(s, "\n");
+ if (trace_seq_has_overflowed(s))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t trace_graph_ent_raw(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct ftrace_graph_ent_entry *field;
+
+ trace_assign_type(field, iter->ent);
+
+ trace_seq_printf(&iter->seq, "%lx %d\n",
+ field->graph_ent.func,
+ field->graph_ent.depth);
+ if (trace_seq_has_overflowed(&iter->seq))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t trace_graph_ent_hex(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct ftrace_graph_ent_entry *field;
+ struct trace_seq *s = &iter->seq;
+
+ trace_assign_type(field, iter->ent);
+
+ SEQ_PUT_HEX_FIELD(s, field->graph_ent.func);
+ SEQ_PUT_HEX_FIELD(s, field->graph_ent.depth);
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t trace_graph_ent_bin(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct ftrace_graph_ent_entry *field;
+ struct trace_seq *s = &iter->seq;
+
+ trace_assign_type(field, iter->ent);
+
+ SEQ_PUT_FIELD(s, field->graph_ent.func);
+ SEQ_PUT_FIELD(s, field->graph_ent.depth);
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static struct trace_event_functions trace_graph_ent_funcs = {
+ .trace = trace_graph_ent_trace,
+ .raw = trace_graph_ent_raw,
+ .hex = trace_graph_ent_hex,
+ .binary = trace_graph_ent_bin,
+};
+
+static struct trace_event trace_graph_ent_event = {
+ .type = TRACE_GRAPH_ENT,
+ .funcs = &trace_graph_ent_funcs,
+};
+
+/* TRACE_GRAPH_RET */
+static enum print_line_t trace_graph_ret_trace(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct trace_seq *s = &iter->seq;
+ struct trace_entry *entry = iter->ent;
+ struct ftrace_graph_ret_entry *field;
+
+ trace_assign_type(field, entry);
+
+ trace_seq_puts(s, "graph_ret: func=");
+ if (trace_seq_has_overflowed(s))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ if (!seq_print_ip_sym(s, field->ret.func, flags))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ trace_seq_puts(s, "\n");
+ if (trace_seq_has_overflowed(s))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t trace_graph_ret_raw(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct ftrace_graph_ret_entry *field;
+
+ trace_assign_type(field, iter->ent);
+
+ trace_seq_printf(&iter->seq, "%lx %lld %lld %ld %d\n",
+ field->ret.func,
+ field->ret.calltime,
+ field->ret.rettime,
+ field->ret.overrun,
+ field->ret.depth);
+ if (trace_seq_has_overflowed(&iter->seq))
+ return TRACE_TYPE_PARTIAL_LINE;
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t trace_graph_ret_hex(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct ftrace_graph_ret_entry *field;
+ struct trace_seq *s = &iter->seq;
+
+ trace_assign_type(field, iter->ent);
+
+ SEQ_PUT_HEX_FIELD(s, field->ret.func);
+ SEQ_PUT_HEX_FIELD(s, field->ret.calltime);
+ SEQ_PUT_HEX_FIELD(s, field->ret.rettime);
+ SEQ_PUT_HEX_FIELD(s, field->ret.overrun);
+ SEQ_PUT_HEX_FIELD(s, field->ret.depth);
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static enum print_line_t trace_graph_ret_bin(struct trace_iterator *iter, int flags,
+ struct trace_event *event)
+{
+ struct ftrace_graph_ret_entry *field;
+ struct trace_seq *s = &iter->seq;
+
+ trace_assign_type(field, iter->ent);
+
+ SEQ_PUT_FIELD(s, field->ret.func);
+ SEQ_PUT_FIELD(s, field->ret.calltime);
+ SEQ_PUT_FIELD(s, field->ret.rettime);
+ SEQ_PUT_FIELD(s, field->ret.overrun);
+ SEQ_PUT_FIELD(s, field->ret.depth);
+
+ return TRACE_TYPE_HANDLED;
+}
+
+static struct trace_event_functions trace_graph_ret_funcs = {
+ .trace = trace_graph_ret_trace,
+ .raw = trace_graph_ret_raw,
+ .hex = trace_graph_ret_hex,
+ .binary = trace_graph_ret_bin,
+};
+
+static struct trace_event trace_graph_ret_event = {
+ .type = TRACE_GRAPH_RET,
+ .funcs = &trace_graph_ret_funcs,
+};
+
/* TRACE_CTX an TRACE_WAKE */
static enum print_line_t trace_ctxwake_print(struct trace_iterator *iter,
char *delim)
@@ -1382,6 +1550,8 @@
static struct trace_event *events[] __initdata = {
&trace_fn_event,
+ &trace_graph_ent_event,
+ &trace_graph_ret_event,
&trace_ctx_event,
&trace_wake_event,
&trace_stack_event,
diff --git a/kernel/user.c b/kernel/user.c
index 00281ad..1a689bf 100644
--- a/kernel/user.c
+++ b/kernel/user.c
@@ -17,6 +17,7 @@
#include <linux/interrupt.h>
#include <linux/export.h>
#include <linux/user_namespace.h>
+#include <linux/proc_fs.h>
#include <linux/proc_ns.h>
/*
@@ -202,6 +203,7 @@
}
spin_unlock_irq(&uidhash_lock);
}
+ proc_register_uid(uid);
return up;
@@ -223,6 +225,7 @@
spin_lock_irq(&uidhash_lock);
uid_hash_insert(&root_user, uidhashentry(GLOBAL_ROOT_UID));
spin_unlock_irq(&uidhash_lock);
+ proc_register_uid(GLOBAL_ROOT_UID);
return 0;
}
diff --git a/mm/filemap.c b/mm/filemap.c
index e2e738c..1578319 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -420,19 +420,17 @@
return;
pagevec_init(&pvec, 0);
- while ((index <= end) &&
- (nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
- PAGECACHE_TAG_WRITEBACK,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1)) != 0) {
+ while (index <= end) {
unsigned i;
+ nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index,
+ end, PAGECACHE_TAG_WRITEBACK);
+ if (!nr_pages)
+ break;
+
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
- /* until radix tree lookup accepts end_index */
- if (page->index > end)
- continue;
-
wait_on_page_writeback(page);
ClearPageError(page);
}
@@ -1753,9 +1751,10 @@
EXPORT_SYMBOL(find_get_pages_contig);
/**
- * find_get_pages_tag - find and return pages that match @tag
+ * find_get_pages_range_tag - find and return pages in given range matching @tag
* @mapping: the address_space to search
* @index: the starting page index
+ * @end: The final page index (inclusive)
* @tag: the tag index
* @nr_pages: the maximum number of pages
* @pages: where the resulting pages are placed
@@ -1763,8 +1762,9 @@
* Like find_get_pages, except we only return pages which are tagged with
* @tag. We update @index to index the next page for the traversal.
*/
-unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
- int tag, unsigned int nr_pages, struct page **pages)
+unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index,
+ pgoff_t end, int tag, unsigned int nr_pages,
+ struct page **pages)
{
struct radix_tree_iter iter;
void **slot;
@@ -1777,6 +1777,9 @@
radix_tree_for_each_tagged(slot, &mapping->page_tree,
&iter, *index, tag) {
struct page *head, *page;
+
+ if (iter.index > end)
+ break;
repeat:
page = radix_tree_deref_slot(slot);
if (unlikely(!page))
@@ -1818,18 +1821,28 @@
}
pages[ret] = page;
- if (++ret == nr_pages)
- break;
+ if (++ret == nr_pages) {
+ *index = pages[ret - 1]->index + 1;
+ goto out;
+ }
}
+ /*
+ * We come here when we got at @end. We take care to not overflow the
+ * index @index as it confuses some of the callers. This breaks the
+ * iteration when there is page at index -1 but that is already broken
+ * anyway.
+ */
+ if (end == (pgoff_t)-1)
+ *index = (pgoff_t)-1;
+ else
+ *index = end + 1;
+out:
rcu_read_unlock();
- if (ret)
- *index = pages[ret - 1]->index + 1;
-
return ret;
}
-EXPORT_SYMBOL(find_get_pages_tag);
+EXPORT_SYMBOL(find_get_pages_range_tag);
/**
* find_get_entries_tag - find and return entries that match @tag
@@ -2665,7 +2678,7 @@
static struct page *do_read_cache_page(struct address_space *mapping,
pgoff_t index,
- int (*filler)(void *, struct page *),
+ int (*filler)(struct file *, struct page *),
void *data,
gfp_t gfp)
{
@@ -2772,7 +2785,7 @@
*/
struct page *read_cache_page(struct address_space *mapping,
pgoff_t index,
- int (*filler)(void *, struct page *),
+ int (*filler)(struct file *, struct page *),
void *data)
{
return do_read_cache_page(mapping, index, filler, data, mapping_gfp_mask(mapping));
@@ -2794,7 +2807,7 @@
pgoff_t index,
gfp_t gfp)
{
- filler_t *filler = (filler_t *)mapping->a_ops->readpage;
+ filler_t *filler = mapping->a_ops->readpage;
return do_read_cache_page(mapping, index, filler, NULL, gfp);
}
diff --git a/mm/madvise.c b/mm/madvise.c
index 751e97a..86e514b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -138,7 +138,7 @@
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, new_flags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_get_anon_name(vma));
if (*prev) {
vma = *prev;
goto success;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ecbda7f..aa169bf 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -729,7 +729,8 @@
((vmstart - vma->vm_start) >> PAGE_SHIFT);
prev = vma_merge(mm, prev, vmstart, vmend, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff,
- new_pol, vma->vm_userfaultfd_ctx);
+ new_pol, vma->vm_userfaultfd_ctx,
+ vma_get_anon_name(vma));
if (prev) {
vma = prev;
next = vma->vm_next;
diff --git a/mm/mlock.c b/mm/mlock.c
index 46af369..658ad55 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -528,7 +528,7 @@
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*prev = vma_merge(mm, *prev, start, end, newflags, vma->anon_vma,
vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_get_anon_name(vma));
if (*prev) {
vma = *prev;
goto success;
diff --git a/mm/mmap.c b/mm/mmap.c
index 2398776..ddbdf49 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -973,7 +973,8 @@
*/
static inline int is_mergeable_vma(struct vm_area_struct *vma,
struct file *file, unsigned long vm_flags,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ const char __user *anon_name)
{
/*
* VM_SOFTDIRTY should not prevent from VMA merging, if we
@@ -991,6 +992,8 @@
return 0;
if (!is_mergeable_vm_userfaultfd_ctx(vma, vm_userfaultfd_ctx))
return 0;
+ if (vma_get_anon_name(vma) != anon_name)
+ return 0;
return 1;
}
@@ -1023,9 +1026,10 @@
can_vma_merge_before(struct vm_area_struct *vma, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t vm_pgoff,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ const char __user *anon_name)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+ if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
if (vma->vm_pgoff == vm_pgoff)
return 1;
@@ -1044,9 +1048,10 @@
can_vma_merge_after(struct vm_area_struct *vma, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t vm_pgoff,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ const char __user *anon_name)
{
- if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx) &&
+ if (is_mergeable_vma(vma, file, vm_flags, vm_userfaultfd_ctx, anon_name) &&
is_mergeable_anon_vma(anon_vma, vma->anon_vma, vma)) {
pgoff_t vm_pglen;
vm_pglen = vma_pages(vma);
@@ -1057,9 +1062,9 @@
}
/*
- * Given a mapping request (addr,end,vm_flags,file,pgoff), figure out
- * whether that can be merged with its predecessor or its successor.
- * Or both (it neatly fills a hole).
+ * Given a mapping request (addr,end,vm_flags,file,pgoff,anon_name),
+ * figure out whether that can be merged with its predecessor or its
+ * successor. Or both (it neatly fills a hole).
*
* In most cases - when called for mmap, brk or mremap - [addr,end) is
* certain not to be mapped by the time vma_merge is called; but when
@@ -1101,7 +1106,8 @@
unsigned long end, unsigned long vm_flags,
struct anon_vma *anon_vma, struct file *file,
pgoff_t pgoff, struct mempolicy *policy,
- struct vm_userfaultfd_ctx vm_userfaultfd_ctx)
+ struct vm_userfaultfd_ctx vm_userfaultfd_ctx,
+ const char __user *anon_name)
{
pgoff_t pglen = (end - addr) >> PAGE_SHIFT;
struct vm_area_struct *area, *next;
@@ -1134,7 +1140,8 @@
mpol_equal(vma_policy(prev), policy) &&
can_vma_merge_after(prev, vm_flags,
anon_vma, file, pgoff,
- vm_userfaultfd_ctx)) {
+ vm_userfaultfd_ctx,
+ anon_name)) {
/*
* OK, it can. Can we now merge in the successor as well?
*/
@@ -1143,7 +1150,8 @@
can_vma_merge_before(next, vm_flags,
anon_vma, file,
pgoff+pglen,
- vm_userfaultfd_ctx) &&
+ vm_userfaultfd_ctx,
+ anon_name) &&
is_mergeable_anon_vma(prev->anon_vma,
next->anon_vma, NULL)) {
/* cases 1, 6 */
@@ -1166,7 +1174,8 @@
mpol_equal(policy, vma_policy(next)) &&
can_vma_merge_before(next, vm_flags,
anon_vma, file, pgoff+pglen,
- vm_userfaultfd_ctx)) {
+ vm_userfaultfd_ctx,
+ anon_name)) {
if (prev && addr < prev->vm_end) /* case 4 */
err = __vma_adjust(prev, prev->vm_start,
addr, prev->vm_pgoff, NULL, next);
@@ -1678,7 +1687,7 @@
* Can we just expand an old mapping?
*/
vma = vma_merge(mm, prev, addr, addr + len, vm_flags,
- NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX);
+ NULL, file, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
if (vma)
goto out;
@@ -2749,6 +2758,7 @@
return 0;
}
+EXPORT_SYMBOL(do_munmap);
int vm_munmap(unsigned long start, size_t len)
{
@@ -2935,7 +2945,7 @@
/* Can we just expand an old private anonymous mapping? */
vma = vma_merge(mm, prev, addr, addr + len, flags,
- NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX);
+ NULL, NULL, pgoff, NULL, NULL_VM_UFFD_CTX, NULL);
if (vma)
goto out;
@@ -3136,7 +3146,7 @@
return NULL; /* should never get here */
new_vma = vma_merge(mm, prev, addr, addr + len, vma->vm_flags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_get_anon_name(vma));
if (new_vma) {
/*
* Source vma may have been merged into new_vma
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 60864e1..555f673 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -384,7 +384,7 @@
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
*pprev = vma_merge(mm, *pprev, start, end, newflags,
vma->anon_vma, vma->vm_file, pgoff, vma_policy(vma),
- vma->vm_userfaultfd_ctx);
+ vma->vm_userfaultfd_ctx, vma_get_anon_name(vma));
if (*pprev) {
vma = *pprev;
VM_WARN_ON((vma->vm_flags ^ newflags) & ~VM_SOFTDIRTY);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 3175ac8..4135fb0 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2194,30 +2194,14 @@
while (!done && (index <= end)) {
int i;
- nr_pages = pagevec_lookup_tag(&pvec, mapping, &index, tag,
- min(end - index, (pgoff_t)PAGEVEC_SIZE-1) + 1);
+ nr_pages = pagevec_lookup_range_tag(&pvec, mapping, &index, end,
+ tag);
if (nr_pages == 0)
break;
for (i = 0; i < nr_pages; i++) {
struct page *page = pvec.pages[i];
- /*
- * At this point, the page may be truncated or
- * invalidated (changing page->mapping to NULL), or
- * even swizzled back from swapper_space to tmpfs file
- * mapping. However, page->index will not change
- * because we have a reference on the page.
- */
- if (page->index > end) {
- /*
- * can't be range_cyclic (1st pass) because
- * end == -1 in that case.
- */
- done = 1;
- break;
- }
-
done_index = page->index;
lock_page(page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 59ccf45..8c7ae1c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -258,10 +258,22 @@
#endif
};
+/*
+ * Try to keep at least this much lowmem free. Do not allow normal
+ * allocations below this point, only high priority ones. Automatically
+ * tuned according to the amount of memory in the system.
+ */
int min_free_kbytes = 1024;
int user_min_free_kbytes = -1;
int watermark_scale_factor = 10;
+/*
+ * Extra memory for the system to try freeing. Used to temporarily
+ * free memory, to make space for new workloads. Anyone can allocate
+ * down to the min watermarks controlled by min_free_kbytes above.
+ */
+int extra_free_kbytes = 0;
+
static unsigned long __meminitdata nr_kernel_pages;
static unsigned long __meminitdata nr_all_pages;
static unsigned long __meminitdata dma_reserve;
@@ -6891,6 +6903,7 @@
static void __setup_per_zone_wmarks(void)
{
unsigned long pages_min = min_free_kbytes >> (PAGE_SHIFT - 10);
+ unsigned long pages_low = extra_free_kbytes >> (PAGE_SHIFT - 10);
unsigned long lowmem_pages = 0;
struct zone *zone;
unsigned long flags;
@@ -6902,11 +6915,14 @@
}
for_each_zone(zone) {
- u64 tmp;
+ u64 min, low;
spin_lock_irqsave(&zone->lock, flags);
- tmp = (u64)pages_min * zone->managed_pages;
- do_div(tmp, lowmem_pages);
+ min = (u64)pages_min * zone->managed_pages;
+ do_div(min, lowmem_pages);
+ low = (u64)pages_low * zone->managed_pages;
+ do_div(low, vm_total_pages);
+
if (is_highmem(zone)) {
/*
* __GFP_HIGH and PF_MEMALLOC allocations usually don't
@@ -6927,7 +6943,7 @@
* If it's a lowmem zone, reserve a number of pages
* proportionate to the zone's size.
*/
- zone->watermark[WMARK_MIN] = tmp;
+ zone->watermark[WMARK_MIN] = min;
}
/*
@@ -6935,12 +6951,14 @@
* scale factor in proportion to available memory, but
* ensure a minimum size on small systems.
*/
- tmp = max_t(u64, tmp >> 2,
+ min = max_t(u64, min >> 2,
mult_frac(zone->managed_pages,
watermark_scale_factor, 10000));
- zone->watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
- zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;
+ zone->watermark[WMARK_LOW] = min_wmark_pages(zone) +
+ low + min;
+ zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) +
+ low + min * 2;
spin_unlock_irqrestore(&zone->lock, flags);
}
@@ -7023,7 +7041,7 @@
/*
* min_free_kbytes_sysctl_handler - just a wrapper around proc_dointvec() so
* that we can call two helper functions whenever min_free_kbytes
- * changes.
+ * or extra_free_kbytes changes.
*/
int min_free_kbytes_sysctl_handler(struct ctl_table *table, int write,
void __user *buffer, size_t *length, loff_t *ppos)
diff --git a/mm/readahead.c b/mm/readahead.c
index 59aa0d0..bf9db0f 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -81,7 +81,7 @@
* Hides the details of the LRU cache etc from the filesystems.
*/
int read_cache_pages(struct address_space *mapping, struct list_head *pages,
- int (*filler)(void *, struct page *), void *data)
+ int (*filler)(struct file *, struct page *), void *data)
{
struct page *page;
int ret = 0;
diff --git a/mm/shmem.c b/mm/shmem.c
index f383298..6a5973d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4275,6 +4275,14 @@
}
EXPORT_SYMBOL_GPL(shmem_file_setup);
+void shmem_set_file(struct vm_area_struct *vma, struct file *file)
+{
+ if (vma->vm_file)
+ fput(vma->vm_file);
+ vma->vm_file = file;
+ vma->vm_ops = &shmem_vm_ops;
+}
+
/**
* shmem_zero_setup - setup a shared anonymous mapping
* @vma: the vma to be mmapped is prepared by do_mmap_pgoff
@@ -4294,10 +4302,7 @@
if (IS_ERR(file))
return PTR_ERR(file);
- if (vma->vm_file)
- fput(vma->vm_file);
- vma->vm_file = file;
- vma->vm_ops = &shmem_vm_ops;
+ shmem_set_file(vma, file);
if (IS_ENABLED(CONFIG_TRANSPARENT_HUGE_PAGECACHE) &&
((vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK) <
diff --git a/mm/swap.c b/mm/swap.c
index a77d68f..4edac53 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -986,15 +986,25 @@
}
EXPORT_SYMBOL(pagevec_lookup_range);
-unsigned pagevec_lookup_tag(struct pagevec *pvec, struct address_space *mapping,
- pgoff_t *index, int tag, unsigned nr_pages)
+unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
+ struct address_space *mapping, pgoff_t *index, pgoff_t end,
+ int tag)
{
- pvec->nr = find_get_pages_tag(mapping, index, tag,
- nr_pages, pvec->pages);
+ pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
+ PAGEVEC_SIZE, pvec->pages);
return pagevec_count(pvec);
}
-EXPORT_SYMBOL(pagevec_lookup_tag);
+EXPORT_SYMBOL(pagevec_lookup_range_tag);
+unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
+ struct address_space *mapping, pgoff_t *index, pgoff_t end,
+ int tag, unsigned max_pages)
+{
+ pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
+ min_t(unsigned int, max_pages, PAGEVEC_SIZE), pvec->pages);
+ return pagevec_count(pvec);
+}
+EXPORT_SYMBOL(pagevec_lookup_range_nr_tag);
/*
* Perform any setup for the swap system
*/
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 685049a..abf1a56 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -190,6 +190,7 @@
* (see: fix_fullness_group())
*/
static const int fullness_threshold_frac = 4;
+static size_t huge_class_size;
struct size_class {
spinlock_t lock;
@@ -1426,6 +1427,25 @@
}
EXPORT_SYMBOL_GPL(zs_unmap_object);
+/**
+ * zs_huge_class_size() - Returns the size (in bytes) of the first huge
+ * zsmalloc &size_class.
+ * @pool: zsmalloc pool to use
+ *
+ * The function returns the size of the first huge class - any object of equal
+ * or bigger size will be stored in zspage consisting of a single physical
+ * page.
+ *
+ * Context: Any context.
+ *
+ * Return: the size (in bytes) of the first huge zsmalloc &size_class.
+ */
+size_t zs_huge_class_size(struct zs_pool *pool)
+{
+ return huge_class_size;
+}
+EXPORT_SYMBOL_GPL(zs_huge_class_size);
+
static unsigned long obj_malloc(struct size_class *class,
struct zspage *zspage, unsigned long handle)
{
@@ -2386,6 +2406,27 @@
objs_per_zspage = pages_per_zspage * PAGE_SIZE / size;
/*
+ * We iterate from biggest down to smallest classes,
+ * so huge_class_size holds the size of the first huge
+ * class. Any object bigger than or equal to that will
+ * endup in the huge class.
+ */
+ if (pages_per_zspage != 1 && objs_per_zspage != 1 &&
+ !huge_class_size) {
+ huge_class_size = size;
+ /*
+ * The object uses ZS_HANDLE_SIZE bytes to store the
+ * handle. We need to subtract it, because zs_malloc()
+ * unconditionally adds handle size before it performs
+ * size class search - so object may be smaller than
+ * huge class size, yet it still can end up in the huge
+ * class because it grows by ZS_HANDLE_SIZE extra bytes
+ * right before class lookup.
+ */
+ huge_class_size -= (ZS_HANDLE_SIZE - 1);
+ }
+
+ /*
* size_class is used for normal zsmalloc operation such
* as alloc/free for that size. Although it is natural that we
* have one size_class for each size, there is a chance that we
diff --git a/net/Kconfig b/net/Kconfig
index 9dba271..999cc6b 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -91,6 +91,12 @@
endif # if INET
+config ANDROID_PARANOID_NETWORK
+ bool "Only allow certain groups to create sockets"
+ default y
+ help
+ none
+
config NETWORK_SECMARK
bool "Security Marking"
help
@@ -284,6 +290,7 @@
bool "enable BPF Just In Time compiler"
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
depends on MODULES
+ depends on !CFI
---help---
Berkeley Packet Filter filtering capabilities are normally handled
by an interpreter. This option allows kernel to generate a native
diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 91e3ba2..91d52a8 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -108,11 +108,40 @@
}
EXPORT_SYMBOL(bt_sock_unregister);
+#ifdef CONFIG_PARANOID_NETWORK
+static inline int current_has_bt_admin(void)
+{
+ return !current_euid();
+}
+
+static inline int current_has_bt(void)
+{
+ return current_has_bt_admin();
+}
+# else
+static inline int current_has_bt_admin(void)
+{
+ return 1;
+}
+
+static inline int current_has_bt(void)
+{
+ return 1;
+}
+#endif
+
static int bt_sock_create(struct net *net, struct socket *sock, int proto,
int kern)
{
int err;
+ if (proto == BTPROTO_RFCOMM || proto == BTPROTO_SCO ||
+ proto == BTPROTO_L2CAP) {
+ if (!current_has_bt())
+ return -EPERM;
+ } else if (!current_has_bt_admin())
+ return -EPERM;
+
if (net != &init_net)
return -EAFNOSUPPORT;
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 6e1e10f..3bc92c1 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -2345,7 +2345,7 @@
x = xfrm_state_lookup_byspi(pn->net, htonl(pkt_dev->spi), AF_INET);
} else {
/* slow path: we dont already have xfrm_state */
- x = xfrm_stateonly_find(pn->net, DUMMY_MARK,
+ x = xfrm_stateonly_find(pn->net, DUMMY_MARK, 0,
(xfrm_address_t *)&pkt_dev->cur_daddr,
(xfrm_address_t *)&pkt_dev->cur_saddr,
AF_INET,
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index c6c8ad1..d206121 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -17,6 +17,7 @@
obj-$(CONFIG_NET_IP_TUNNEL) += ip_tunnel.o
obj-$(CONFIG_SYSCTL) += sysctl_net_ipv4.o
+obj-$(CONFIG_SYSFS) += sysfs_net_ipv4.o
obj-$(CONFIG_PROC_FS) += proc.o
obj-$(CONFIG_IP_MULTIPLE_TABLES) += fib_rules.o
obj-$(CONFIG_IP_MROUTE) += ipmr.o
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index b9d9a2b..c7e47f4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -89,6 +89,7 @@
#include <linux/netfilter_ipv4.h>
#include <linux/random.h>
#include <linux/slab.h>
+#include <linux/netfilter/xt_qtaguid.h>
#include <linux/uaccess.h>
@@ -121,6 +122,19 @@
#endif
#include <net/l3mdev.h>
+#ifdef CONFIG_ANDROID_PARANOID_NETWORK
+#include <linux/android_aid.h>
+
+static inline int current_has_network(void)
+{
+ return in_egroup_p(AID_INET) || capable(CAP_NET_RAW);
+}
+#else
+static inline int current_has_network(void)
+{
+ return 1;
+}
+#endif
/* The inetsw table contains everything that inet_create needs to
* build a new socket.
@@ -255,6 +269,9 @@
if (protocol < 0 || protocol >= IPPROTO_MAX)
return -EINVAL;
+ if (!current_has_network())
+ return -EACCES;
+
sock->state = SS_UNCONNECTED;
/* Look for the requested type/protocol pair. */
@@ -303,8 +320,7 @@
}
err = -EPERM;
- if (sock->type == SOCK_RAW && !kern &&
- !ns_capable(net->user_ns, CAP_NET_RAW))
+ if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
goto out_rcu_unlock;
sock->ops = answer->ops;
@@ -407,6 +423,9 @@
if (sk) {
long timeout;
+#ifdef CONFIG_NETFILTER_XT_MATCH_QTAGUID
+ qtaguid_untag(sock, true);
+#endif
/* Applications forget to leave groups before exiting */
ip_mc_drop_socket(sk);
diff --git a/net/ipv4/netfilter/nf_socket_ipv4.c b/net/ipv4/netfilter/nf_socket_ipv4.c
index 4824b1e..0f4ed1b 100644
--- a/net/ipv4/netfilter/nf_socket_ipv4.c
+++ b/net/ipv4/netfilter/nf_socket_ipv4.c
@@ -100,6 +100,7 @@
__be16 uninitialized_var(dport), uninitialized_var(sport);
const struct iphdr *iph = ip_hdr(skb);
struct sk_buff *data_skb = NULL;
+ struct sock *sk = skb->sk;
u8 uninitialized_var(protocol);
#if IS_ENABLED(CONFIG_NF_CONNTRACK)
enum ip_conntrack_info ctinfo;
@@ -155,8 +156,14 @@
}
#endif
- return nf_socket_get_sock_v4(net, data_skb, doff, protocol, saddr,
- daddr, sport, dport, indev);
+ if (sk)
+ refcount_inc(&sk->sk_refcnt);
+ else
+ sk = nf_socket_get_sock_v4(dev_net(skb->dev), data_skb, doff,
+ protocol, saddr, daddr, sport,
+ dport, indev);
+
+ return sk;
}
EXPORT_SYMBOL_GPL(nf_sk_lookup_slow_v4);
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index d82e834..e51f5be 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -198,6 +198,21 @@
return ret;
}
+/* Validate changes from /proc interface. */
+static int proc_tcp_default_init_rwnd(struct ctl_table *ctl, int write,
+ void __user *buffer,
+ size_t *lenp, loff_t *ppos)
+{
+ int old_value = *(int *)ctl->data;
+ int ret = proc_dointvec(ctl, write, buffer, lenp, ppos);
+ int new_value = *(int *)ctl->data;
+
+ if (write && ret == 0 && (new_value < 3 || new_value > 100))
+ *(int *)ctl->data = old_value;
+
+ return ret;
+}
+
static int proc_tcp_congestion_control(struct ctl_table *ctl, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -724,6 +739,13 @@
.proc_handler = proc_tcp_available_ulp,
},
{
+ .procname = "tcp_default_init_rwnd",
+ .data = &sysctl_tcp_default_init_rwnd,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_tcp_default_init_rwnd
+ },
+ {
.procname = "icmp_msgs_per_sec",
.data = &sysctl_icmp_msgs_per_sec,
.maxlen = sizeof(int),
diff --git a/net/ipv4/sysfs_net_ipv4.c b/net/ipv4/sysfs_net_ipv4.c
new file mode 100644
index 0000000..0cbbf10
--- /dev/null
+++ b/net/ipv4/sysfs_net_ipv4.c
@@ -0,0 +1,88 @@
+/*
+ * net/ipv4/sysfs_net_ipv4.c
+ *
+ * sysfs-based networking knobs (so we can, unlike with sysctl, control perms)
+ *
+ * Copyright (C) 2008 Google, Inc.
+ *
+ * Robert Love <rlove@google.com>
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <net/tcp.h>
+
+#define CREATE_IPV4_FILE(_name, _var) \
+static ssize_t _name##_show(struct kobject *kobj, \
+ struct kobj_attribute *attr, char *buf) \
+{ \
+ return sprintf(buf, "%d\n", _var); \
+} \
+static ssize_t _name##_store(struct kobject *kobj, \
+ struct kobj_attribute *attr, \
+ const char *buf, size_t count) \
+{ \
+ int val, ret; \
+ ret = sscanf(buf, "%d", &val); \
+ if (ret != 1) \
+ return -EINVAL; \
+ if (val < 0) \
+ return -EINVAL; \
+ _var = val; \
+ return count; \
+} \
+static struct kobj_attribute _name##_attr = \
+ __ATTR(_name, 0644, _name##_show, _name##_store)
+
+CREATE_IPV4_FILE(tcp_wmem_min, sysctl_tcp_wmem[0]);
+CREATE_IPV4_FILE(tcp_wmem_def, sysctl_tcp_wmem[1]);
+CREATE_IPV4_FILE(tcp_wmem_max, sysctl_tcp_wmem[2]);
+
+CREATE_IPV4_FILE(tcp_rmem_min, sysctl_tcp_rmem[0]);
+CREATE_IPV4_FILE(tcp_rmem_def, sysctl_tcp_rmem[1]);
+CREATE_IPV4_FILE(tcp_rmem_max, sysctl_tcp_rmem[2]);
+
+static struct attribute *ipv4_attrs[] = {
+ &tcp_wmem_min_attr.attr,
+ &tcp_wmem_def_attr.attr,
+ &tcp_wmem_max_attr.attr,
+ &tcp_rmem_min_attr.attr,
+ &tcp_rmem_def_attr.attr,
+ &tcp_rmem_max_attr.attr,
+ NULL
+};
+
+static struct attribute_group ipv4_attr_group = {
+ .attrs = ipv4_attrs,
+};
+
+static __init int sysfs_ipv4_init(void)
+{
+ struct kobject *ipv4_kobject;
+ int ret;
+
+ ipv4_kobject = kobject_create_and_add("ipv4", kernel_kobj);
+ if (!ipv4_kobject)
+ return -ENOMEM;
+
+ ret = sysfs_create_group(ipv4_kobject, &ipv4_attr_group);
+ if (ret) {
+ kobject_put(ipv4_kobject);
+ return ret;
+ }
+
+ return 0;
+}
+
+subsys_initcall(sysfs_ipv4_init);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 991f382..2f2070e 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -95,6 +95,7 @@
int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
int sysctl_tcp_early_retrans __read_mostly = 3;
int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_default_init_rwnd __read_mostly = TCP_INIT_CWND * 2;
#define FLAG_DATA 0x01 /* Incoming frame contained data. */
#define FLAG_WIN_UPDATE 0x02 /* Incoming ACK was a window update. */
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b2ead31..6bf0c38 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -194,7 +194,7 @@
* (RFC 3517, Section 4, NextSeg() rule (2)). Further place a
* limit when mss is larger than 1460.
*/
- u32 init_rwnd = TCP_INIT_CWND * 2;
+ u32 init_rwnd = sysctl_tcp_default_init_rwnd;
if (mss > 1460)
init_rwnd = max((1460 * init_rwnd) / mss, 2U);
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 6a76e41..e773245 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -229,6 +229,7 @@
.accept_ra_rt_info_max_plen = 0,
#endif
#endif
+ .accept_ra_rt_table = 0,
.proxy_ndp = 0,
.accept_source_route = 0, /* we do not accept RH0 by default. */
.disable_ipv6 = 0,
@@ -283,6 +284,7 @@
.accept_ra_rt_info_max_plen = 0,
#endif
#endif
+ .accept_ra_rt_table = 0,
.proxy_ndp = 0,
.accept_source_route = 0, /* we do not accept RH0 by default. */
.disable_ipv6 = 0,
@@ -2274,6 +2276,31 @@
ipv6_regen_rndid(idev);
}
+u32 addrconf_rt_table(const struct net_device *dev, u32 default_table) {
+ /* Determines into what table to put autoconf PIO/RIO/default routes
+ * learned on this device.
+ *
+ * - If 0, use the same table for every device. This puts routes into
+ * one of RT_TABLE_{PREFIX,INFO,DFLT} depending on the type of route
+ * (but note that these three are currently all equal to
+ * RT6_TABLE_MAIN).
+ * - If > 0, use the specified table.
+ * - If < 0, put routes into table dev->ifindex + (-rt_table).
+ */
+ struct inet6_dev *idev = in6_dev_get(dev);
+ u32 table;
+ int sysctl = idev->cnf.accept_ra_rt_table;
+ if (sysctl == 0) {
+ table = default_table;
+ } else if (sysctl > 0) {
+ table = (u32) sysctl;
+ } else {
+ table = (unsigned) dev->ifindex + (-sysctl);
+ }
+ in6_dev_put(idev);
+ return table;
+}
+
/*
* Add prefix route.
*/
@@ -2283,7 +2310,7 @@
unsigned long expires, u32 flags)
{
struct fib6_config cfg = {
- .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX,
+ .fc_table = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_PREFIX),
.fc_metric = IP6_RT_PRIO_ADDRCONF,
.fc_ifindex = dev->ifindex,
.fc_expires = expires,
@@ -2316,7 +2343,7 @@
struct fib6_node *fn;
struct rt6_info *rt = NULL;
struct fib6_table *table;
- u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_PREFIX;
+ u32 tb_id = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_PREFIX);
table = fib6_get_table(dev_net(dev), tb_id);
if (!table)
@@ -5043,6 +5070,7 @@
array[DEVCONF_ACCEPT_RA_RT_INFO_MAX_PLEN] = cnf->accept_ra_rt_info_max_plen;
#endif
#endif
+ array[DEVCONF_ACCEPT_RA_RT_TABLE] = cnf->accept_ra_rt_table;
array[DEVCONF_PROXY_NDP] = cnf->proxy_ndp;
array[DEVCONF_ACCEPT_SOURCE_ROUTE] = cnf->accept_source_route;
#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
@@ -6203,6 +6231,13 @@
#endif
#endif
{
+ .procname = "accept_ra_rt_table",
+ .data = &ipv6_devconf.accept_ra_rt_table,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+ {
.procname = "proxy_ndp",
.data = &ipv6_devconf.proxy_ndp,
.maxlen = sizeof(int),
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 9ccbf74..e9fa881 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -66,6 +66,20 @@
#include <linux/uaccess.h>
#include <linux/mroute6.h>
+#ifdef CONFIG_ANDROID_PARANOID_NETWORK
+#include <linux/android_aid.h>
+
+static inline int current_has_network(void)
+{
+ return in_egroup_p(AID_INET) || capable(CAP_NET_RAW);
+}
+#else
+static inline int current_has_network(void)
+{
+ return 1;
+}
+#endif
+
#include "ip6_offload.h"
MODULE_AUTHOR("Cast of dozens");
@@ -122,6 +136,9 @@
if (protocol < 0 || protocol >= IPPROTO_MAX)
return -EINVAL;
+ if (!current_has_network())
+ return -EACCES;
+
/* Look for the requested type/protocol pair. */
lookup_protocol:
err = -ESOCKTNOSUPPORT;
@@ -168,8 +185,7 @@
}
err = -EPERM;
- if (sock->type == SOCK_RAW && !kern &&
- !ns_capable(net->user_ns, CAP_NET_RAW))
+ if (sock->type == SOCK_RAW && !kern && !capable(CAP_NET_RAW))
goto out_rcu_unlock;
sock->ops = answer->ops;
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 305e2ed..477692f 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -166,15 +166,15 @@
* to explore inner IPv6 header, eg. ICMPv6 error messages.
*
* If target header is found, its offset is set in *offset and return protocol
- * number. Otherwise, return -1.
+ * number. Otherwise, return -ENOENT or -EBADMSG.
*
* If the first fragment doesn't contain the final protocol header or
* NEXTHDR_NONE it is considered invalid.
*
* Note that non-1st fragment is special case that "the protocol number
* of last header" is "next header" field in Fragment header. In this case,
- * *offset is meaningless and fragment offset is stored in *fragoff if fragoff
- * isn't NULL.
+ * *offset is meaningless. If fragoff is not NULL, the fragment offset is
+ * stored in *fragoff; if it is NULL, return -EINVAL.
*
* if flags is not NULL and it's a fragment, then the frag flag
* IP6_FH_F_FRAG will be set. If it's an AH header, the
@@ -253,9 +253,12 @@
if (target < 0 &&
((!ipv6_ext_hdr(hp->nexthdr)) ||
hp->nexthdr == NEXTHDR_NONE)) {
- if (fragoff)
+ if (fragoff) {
*fragoff = _frag_off;
- return hp->nexthdr;
+ return hp->nexthdr;
+ } else {
+ return -EINVAL;
+ }
}
if (!found)
return -ENOENT;
diff --git a/net/ipv6/netfilter/nf_socket_ipv6.c b/net/ipv6/netfilter/nf_socket_ipv6.c
index f14de4b..be7ce4d 100644
--- a/net/ipv6/netfilter/nf_socket_ipv6.c
+++ b/net/ipv6/netfilter/nf_socket_ipv6.c
@@ -102,6 +102,7 @@
struct sock *nf_sk_lookup_slow_v6(struct net *net, const struct sk_buff *skb,
const struct net_device *indev)
{
+ struct sock *sk = skb->sk;
__be16 uninitialized_var(dport), uninitialized_var(sport);
const struct in6_addr *daddr = NULL, *saddr = NULL;
struct ipv6hdr *iph = ipv6_hdr(skb);
@@ -143,8 +144,14 @@
return NULL;
}
- return nf_socket_get_sock_v6(net, data_skb, doff, tproto, saddr, daddr,
- sport, dport, indev);
+ if (sk)
+ refcount_inc(&sk->sk_refcnt);
+ else
+ sk = nf_socket_get_sock_v6(dev_net(skb->dev), data_skb, doff,
+ tproto, saddr, daddr, sport, dport,
+ indev);
+
+ return sk;
}
EXPORT_SYMBOL_GPL(nf_sk_lookup_slow_v6);
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 60efd32..49aca95 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2507,8 +2507,7 @@
const struct in6_addr *gwaddr,
struct net_device *dev)
{
- u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO;
- int ifindex = dev->ifindex;
+ u32 tb_id = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_INFO);
struct fib6_node *fn;
struct rt6_info *rt = NULL;
struct fib6_table *table;
@@ -2523,7 +2522,7 @@
goto out;
for (rt = fn->leaf; rt; rt = rt->dst.rt6_next) {
- if (rt->dst.dev->ifindex != ifindex)
+ if (rt->dst.dev->ifindex != dev->ifindex)
continue;
if ((rt->rt6i_flags & (RTF_ROUTEINFO|RTF_GATEWAY)) != (RTF_ROUTEINFO|RTF_GATEWAY))
continue;
@@ -2555,7 +2554,7 @@
.fc_nlinfo.nl_net = net,
};
- cfg.fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_INFO,
+ cfg.fc_table = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_INFO),
cfg.fc_dst = *prefix;
cfg.fc_gateway = *gwaddr;
@@ -2571,7 +2570,7 @@
struct rt6_info *rt6_get_dflt_router(const struct in6_addr *addr, struct net_device *dev)
{
- u32 tb_id = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT;
+ u32 tb_id = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_MAIN);
struct rt6_info *rt;
struct fib6_table *table;
@@ -2597,7 +2596,7 @@
unsigned int pref)
{
struct fib6_config cfg = {
- .fc_table = l3mdev_fib_table(dev) ? : RT6_TABLE_DFLT,
+ .fc_table = l3mdev_fib_table(dev) ? : addrconf_rt_table(dev, RT6_TABLE_DFLT),
.fc_metric = IP6_RT_PRIO_USER,
.fc_ifindex = dev->ifindex,
.fc_flags = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT |
@@ -2621,43 +2620,16 @@
return rt6_get_dflt_router(gwaddr, dev);
}
-static void __rt6_purge_dflt_routers(struct fib6_table *table)
-{
- struct rt6_info *rt;
-
-restart:
- read_lock_bh(&table->tb6_lock);
- for (rt = table->tb6_root.leaf; rt; rt = rt->dst.rt6_next) {
- if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF) &&
- (!rt->rt6i_idev || rt->rt6i_idev->cnf.accept_ra != 2)) {
- dst_hold(&rt->dst);
- read_unlock_bh(&table->tb6_lock);
- ip6_del_rt(rt);
- goto restart;
- }
- }
- read_unlock_bh(&table->tb6_lock);
-
- table->flags &= ~RT6_TABLE_HAS_DFLT_ROUTER;
+int rt6_addrconf_purge(struct rt6_info *rt, void *arg) {
+ if (rt->rt6i_flags & (RTF_DEFAULT | RTF_ADDRCONF) &&
+ (!rt->rt6i_idev || rt->rt6i_idev->cnf.accept_ra != 2))
+ return -1;
+ return 0;
}
void rt6_purge_dflt_routers(struct net *net)
{
- struct fib6_table *table;
- struct hlist_head *head;
- unsigned int h;
-
- rcu_read_lock();
-
- for (h = 0; h < FIB6_TABLE_HASHSZ; h++) {
- head = &net->ipv6.fib_table_hash[h];
- hlist_for_each_entry_rcu(table, head, tb6_hlist) {
- if (table->flags & RT6_TABLE_HAS_DFLT_ROUTER)
- __rt6_purge_dflt_routers(table);
- }
- }
-
- rcu_read_unlock();
+ fib6_clean_all(net, rt6_addrconf_purge, NULL);
}
static void rtmsg_to_fib6_config(struct net *net,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index 3b209cb..294b68e 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -1383,7 +1383,7 @@
}
if (!x)
- x = xfrm_find_acq(net, &dummy_mark, mode, reqid, proto, xdaddr, xsaddr, 1, family);
+ x = xfrm_find_acq(net, &dummy_mark, mode, reqid, 0, proto, xdaddr, xsaddr, 1, family);
if (x == NULL)
return -ENOENT;
@@ -2412,7 +2412,7 @@
return err;
}
- xp = xfrm_policy_bysel_ctx(net, DUMMY_MARK, XFRM_POLICY_TYPE_MAIN,
+ xp = xfrm_policy_bysel_ctx(net, DUMMY_MARK, 0, XFRM_POLICY_TYPE_MAIN,
pol->sadb_x_policy_dir - 1, &sel, pol_ctx,
1, &err);
security_xfrm_policy_free(pol_ctx);
@@ -2661,7 +2661,7 @@
return -EINVAL;
delete = (hdr->sadb_msg_type == SADB_X_SPDDELETE2);
- xp = xfrm_policy_byid(net, DUMMY_MARK, XFRM_POLICY_TYPE_MAIN,
+ xp = xfrm_policy_byid(net, DUMMY_MARK, 0, XFRM_POLICY_TYPE_MAIN,
dir, pol->sadb_x_policy_id, delete, &err);
if (xp == NULL)
return -ENOENT;
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index e4a13cc..1a7a43f 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -1360,6 +1360,8 @@
based on who created the socket: the user or group. It is also
possible to check whether a socket actually exists.
+ Conflicts with '"quota, tag, uid" match'
+
config NETFILTER_XT_MATCH_POLICY
tristate 'IPsec "policy" match support'
depends on XFRM
@@ -1393,6 +1395,22 @@
To compile it as a module, choose M here. If unsure, say N.
+config NETFILTER_XT_MATCH_QTAGUID
+ bool '"quota, tag, owner" match and stats support'
+ depends on NETFILTER_XT_MATCH_SOCKET
+ depends on NETFILTER_XT_MATCH_OWNER=n
+ help
+ This option replaces the `owner' match. In addition to matching
+ on uid, it keeps stats based on a tag assigned to a socket.
+ The full tag is comprised of a UID and an accounting tag.
+ The tags are assignable to sockets from user space (e.g. a download
+ manager can assign the socket to another UID for accounting).
+ Stats and control are done via /proc/net/xt_qtaguid/.
+ It replaces owner as it takes the same arguments, but should
+ really be recognized by the iptables tool.
+
+ If unsure, say `N'.
+
config NETFILTER_XT_MATCH_QUOTA
tristate '"quota" match support'
depends on NETFILTER_ADVANCED
@@ -1403,6 +1421,29 @@
If you want to compile it as a module, say M here and read
<file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+config NETFILTER_XT_MATCH_QUOTA2
+ tristate '"quota2" match support'
+ depends on NETFILTER_ADVANCED
+ help
+ This option adds a `quota2' match, which allows to match on a
+ byte counter correctly and not per CPU.
+ It allows naming the quotas.
+ This is based on http://xtables-addons.git.sourceforge.net
+
+ If you want to compile it as a module, say M here and read
+ <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
+
+config NETFILTER_XT_MATCH_QUOTA2_LOG
+ bool '"quota2" Netfilter LOG support'
+ depends on NETFILTER_XT_MATCH_QUOTA2
+ default n
+ help
+ This option allows `quota2' to log ONCE when a quota limit
+ is passed. It logs via NETLINK using the NETLINK_NFLOG family.
+ It logs similarly to how ipt_ULOG would without data.
+
+ If unsure, say `N'.
+
config NETFILTER_XT_MATCH_RATEEST
tristate '"rateest" match support'
depends on NETFILTER_ADVANCED
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index f78ed24..bcd5603 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -173,7 +173,9 @@
obj-$(CONFIG_NETFILTER_XT_MATCH_PHYSDEV) += xt_physdev.o
obj-$(CONFIG_NETFILTER_XT_MATCH_PKTTYPE) += xt_pkttype.o
obj-$(CONFIG_NETFILTER_XT_MATCH_POLICY) += xt_policy.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_QTAGUID) += xt_qtaguid_print.o xt_qtaguid.o
obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA) += xt_quota.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_QUOTA2) += xt_quota2.o
obj-$(CONFIG_NETFILTER_XT_MATCH_RATEEST) += xt_rateest.o
obj-$(CONFIG_NETFILTER_XT_MATCH_REALM) += xt_realm.o
obj-$(CONFIG_NETFILTER_XT_MATCH_RECENT) += xt_recent.o
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index a268acc..c44500d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1940,7 +1940,7 @@
return 0;
}
-int nf_conntrack_set_hashsize(const char *val, struct kernel_param *kp)
+int nf_conntrack_set_hashsize(const char *val, const struct kernel_param *kp)
{
unsigned int hashsize;
int rc;
diff --git a/net/netfilter/nf_nat_ftp.c b/net/netfilter/nf_nat_ftp.c
index e84a578..d76afaf 100644
--- a/net/netfilter/nf_nat_ftp.c
+++ b/net/netfilter/nf_nat_ftp.c
@@ -134,7 +134,7 @@
}
/* Prior to 2.6.11, we had a ports param. No longer, but don't break users. */
-static int warn_set(const char *val, struct kernel_param *kp)
+static int warn_set(const char *val, const struct kernel_param *kp)
{
printk(KERN_INFO KBUILD_MODNAME
": kernel >= 2.6.10 only uses 'ports' for conntrack modules\n");
diff --git a/net/netfilter/nf_nat_irc.c b/net/netfilter/nf_nat_irc.c
index 0648cb0..dcb5f63 100644
--- a/net/netfilter/nf_nat_irc.c
+++ b/net/netfilter/nf_nat_irc.c
@@ -106,7 +106,7 @@
}
/* Prior to 2.6.11, we had a ports param. No longer, but don't break users. */
-static int warn_set(const char *val, struct kernel_param *kp)
+static int warn_set(const char *val, const struct kernel_param *kp)
{
printk(KERN_INFO KBUILD_MODNAME
": kernel >= 2.6.10 only uses 'ports' for conntrack modules\n");
diff --git a/net/netfilter/xt_IDLETIMER.c b/net/netfilter/xt_IDLETIMER.c
index 1141f08..a4843c1 100644
--- a/net/netfilter/xt_IDLETIMER.c
+++ b/net/netfilter/xt_IDLETIMER.c
@@ -5,6 +5,7 @@
* After timer expires a kevent will be sent.
*
* Copyright (C) 2004, 2010 Nokia Corporation
+ *
* Written by Timo Teras <ext-timo.teras@nokia.com>
*
* Converted to x_tables and reworked for upstream inclusion
@@ -38,8 +39,17 @@
#include <linux/netfilter/xt_IDLETIMER.h>
#include <linux/kdev_t.h>
#include <linux/kobject.h>
+#include <linux/skbuff.h>
#include <linux/workqueue.h>
#include <linux/sysfs.h>
+#include <linux/rtc.h>
+#include <linux/time.h>
+#include <linux/math64.h>
+#include <linux/suspend.h>
+#include <linux/notifier.h>
+#include <net/net_namespace.h>
+#include <net/sock.h>
+#include <net/inet_sock.h>
struct idletimer_tg_attr {
struct attribute attr;
@@ -55,14 +65,110 @@
struct kobject *kobj;
struct idletimer_tg_attr attr;
+ struct timespec delayed_timer_trigger;
+ struct timespec last_modified_timer;
+ struct timespec last_suspend_time;
+ struct notifier_block pm_nb;
+
+ int timeout;
unsigned int refcnt;
+ bool work_pending;
+ bool send_nl_msg;
+ bool active;
+ uid_t uid;
};
static LIST_HEAD(idletimer_tg_list);
static DEFINE_MUTEX(list_mutex);
+static DEFINE_SPINLOCK(timestamp_lock);
static struct kobject *idletimer_tg_kobj;
+static bool check_for_delayed_trigger(struct idletimer_tg *timer,
+ struct timespec *ts)
+{
+ bool state;
+ struct timespec temp;
+ spin_lock_bh(×tamp_lock);
+ timer->work_pending = false;
+ if ((ts->tv_sec - timer->last_modified_timer.tv_sec) > timer->timeout ||
+ timer->delayed_timer_trigger.tv_sec != 0) {
+ state = false;
+ temp.tv_sec = timer->timeout;
+ temp.tv_nsec = 0;
+ if (timer->delayed_timer_trigger.tv_sec != 0) {
+ temp = timespec_add(timer->delayed_timer_trigger, temp);
+ ts->tv_sec = temp.tv_sec;
+ ts->tv_nsec = temp.tv_nsec;
+ timer->delayed_timer_trigger.tv_sec = 0;
+ timer->work_pending = true;
+ schedule_work(&timer->work);
+ } else {
+ temp = timespec_add(timer->last_modified_timer, temp);
+ ts->tv_sec = temp.tv_sec;
+ ts->tv_nsec = temp.tv_nsec;
+ }
+ } else {
+ state = timer->active;
+ }
+ spin_unlock_bh(×tamp_lock);
+ return state;
+}
+
+static void notify_netlink_uevent(const char *iface, struct idletimer_tg *timer)
+{
+ char iface_msg[NLMSG_MAX_SIZE];
+ char state_msg[NLMSG_MAX_SIZE];
+ char timestamp_msg[NLMSG_MAX_SIZE];
+ char uid_msg[NLMSG_MAX_SIZE];
+ char *envp[] = { iface_msg, state_msg, timestamp_msg, uid_msg, NULL };
+ int res;
+ struct timespec ts;
+ uint64_t time_ns;
+ bool state;
+
+ res = snprintf(iface_msg, NLMSG_MAX_SIZE, "INTERFACE=%s",
+ iface);
+ if (NLMSG_MAX_SIZE <= res) {
+ pr_err("message too long (%d)", res);
+ return;
+ }
+
+ get_monotonic_boottime(&ts);
+ state = check_for_delayed_trigger(timer, &ts);
+ res = snprintf(state_msg, NLMSG_MAX_SIZE, "STATE=%s",
+ state ? "active" : "inactive");
+
+ if (NLMSG_MAX_SIZE <= res) {
+ pr_err("message too long (%d)", res);
+ return;
+ }
+
+ if (state) {
+ res = snprintf(uid_msg, NLMSG_MAX_SIZE, "UID=%u", timer->uid);
+ if (NLMSG_MAX_SIZE <= res)
+ pr_err("message too long (%d)", res);
+ } else {
+ res = snprintf(uid_msg, NLMSG_MAX_SIZE, "UID=");
+ if (NLMSG_MAX_SIZE <= res)
+ pr_err("message too long (%d)", res);
+ }
+
+ time_ns = timespec_to_ns(&ts);
+ res = snprintf(timestamp_msg, NLMSG_MAX_SIZE, "TIME_NS=%llu", time_ns);
+ if (NLMSG_MAX_SIZE <= res) {
+ timestamp_msg[0] = '\0';
+ pr_err("message too long (%d)", res);
+ }
+
+ pr_debug("putting nlmsg: <%s> <%s> <%s> <%s>\n", iface_msg, state_msg,
+ timestamp_msg, uid_msg);
+ kobject_uevent_env(idletimer_tg_kobj, KOBJ_CHANGE, envp);
+ return;
+
+
+}
+
static
struct idletimer_tg *__idletimer_tg_find_by_label(const char *label)
{
@@ -83,6 +189,7 @@
{
struct idletimer_tg *timer;
unsigned long expires = 0;
+ unsigned long now = jiffies;
mutex_lock(&list_mutex);
@@ -92,11 +199,15 @@
mutex_unlock(&list_mutex);
- if (time_after(expires, jiffies))
+ if (time_after(expires, now))
return sprintf(buf, "%u\n",
- jiffies_to_msecs(expires - jiffies) / 1000);
+ jiffies_to_msecs(expires - now) / 1000);
- return sprintf(buf, "0\n");
+ if (timer->send_nl_msg)
+ return sprintf(buf, "0 %d\n",
+ jiffies_to_msecs(now - expires) / 1000);
+ else
+ return sprintf(buf, "0\n");
}
static void idletimer_tg_work(struct work_struct *work)
@@ -105,6 +216,9 @@
work);
sysfs_notify(idletimer_tg_kobj, NULL, timer->attr.attr.name);
+
+ if (timer->send_nl_msg)
+ notify_netlink_uevent(timer->attr.attr.name, timer);
}
static void idletimer_tg_expired(unsigned long data)
@@ -112,8 +226,55 @@
struct idletimer_tg *timer = (struct idletimer_tg *) data;
pr_debug("timer %s expired\n", timer->attr.attr.name);
-
+ spin_lock_bh(×tamp_lock);
+ timer->active = false;
+ timer->work_pending = true;
schedule_work(&timer->work);
+ spin_unlock_bh(×tamp_lock);
+}
+
+static int idletimer_resume(struct notifier_block *notifier,
+ unsigned long pm_event, void *unused)
+{
+ struct timespec ts;
+ unsigned long time_diff, now = jiffies;
+ struct idletimer_tg *timer = container_of(notifier,
+ struct idletimer_tg, pm_nb);
+ if (!timer)
+ return NOTIFY_DONE;
+ switch (pm_event) {
+ case PM_SUSPEND_PREPARE:
+ get_monotonic_boottime(&timer->last_suspend_time);
+ break;
+ case PM_POST_SUSPEND:
+ spin_lock_bh(×tamp_lock);
+ if (!timer->active) {
+ spin_unlock_bh(×tamp_lock);
+ break;
+ }
+ /* since jiffies are not updated when suspended now represents
+ * the time it would have suspended */
+ if (time_after(timer->timer.expires, now)) {
+ get_monotonic_boottime(&ts);
+ ts = timespec_sub(ts, timer->last_suspend_time);
+ time_diff = timespec_to_jiffies(&ts);
+ if (timer->timer.expires > (time_diff + now)) {
+ mod_timer_pending(&timer->timer,
+ (timer->timer.expires - time_diff));
+ } else {
+ del_timer(&timer->timer);
+ timer->timer.expires = 0;
+ timer->active = false;
+ timer->work_pending = true;
+ schedule_work(&timer->work);
+ }
+ }
+ spin_unlock_bh(×tamp_lock);
+ break;
+ default:
+ break;
+ }
+ return NOTIFY_DONE;
}
static int idletimer_tg_create(struct idletimer_tg_info *info)
@@ -146,6 +307,21 @@
setup_timer(&info->timer->timer, idletimer_tg_expired,
(unsigned long) info->timer);
info->timer->refcnt = 1;
+ info->timer->send_nl_msg = (info->send_nl_msg == 0) ? false : true;
+ info->timer->active = true;
+ info->timer->timeout = info->timeout;
+
+ info->timer->delayed_timer_trigger.tv_sec = 0;
+ info->timer->delayed_timer_trigger.tv_nsec = 0;
+ info->timer->work_pending = false;
+ info->timer->uid = 0;
+ get_monotonic_boottime(&info->timer->last_modified_timer);
+
+ info->timer->pm_nb.notifier_call = idletimer_resume;
+ ret = register_pm_notifier(&info->timer->pm_nb);
+ if (ret)
+ printk(KERN_WARNING "[%s] Failed to register pm notifier %d\n",
+ __func__, ret);
INIT_WORK(&info->timer->work, idletimer_tg_work);
@@ -162,6 +338,42 @@
return ret;
}
+static void reset_timer(const struct idletimer_tg_info *info,
+ struct sk_buff *skb)
+{
+ unsigned long now = jiffies;
+ struct idletimer_tg *timer = info->timer;
+ bool timer_prev;
+
+ spin_lock_bh(×tamp_lock);
+ timer_prev = timer->active;
+ timer->active = true;
+ /* timer_prev is used to guard overflow problem in time_before*/
+ if (!timer_prev || time_before(timer->timer.expires, now)) {
+ pr_debug("Starting Checkentry timer (Expired, Jiffies): %lu, %lu\n",
+ timer->timer.expires, now);
+
+ /* Stores the uid resposible for waking up the radio */
+ if (skb && (skb->sk)) {
+ timer->uid = from_kuid_munged(current_user_ns(),
+ sock_i_uid(skb_to_full_sk(skb)));
+ }
+
+ /* checks if there is a pending inactive notification*/
+ if (timer->work_pending)
+ timer->delayed_timer_trigger = timer->last_modified_timer;
+ else {
+ timer->work_pending = true;
+ schedule_work(&timer->work);
+ }
+ }
+
+ get_monotonic_boottime(&timer->last_modified_timer);
+ mod_timer(&timer->timer,
+ msecs_to_jiffies(info->timeout * 1000) + now);
+ spin_unlock_bh(×tamp_lock);
+}
+
/*
* The actual xt_tables plugin.
*/
@@ -169,15 +381,23 @@
const struct xt_action_param *par)
{
const struct idletimer_tg_info *info = par->targinfo;
+ unsigned long now = jiffies;
pr_debug("resetting timer %s, timeout period %u\n",
info->label, info->timeout);
BUG_ON(!info->timer);
- mod_timer(&info->timer->timer,
- msecs_to_jiffies(info->timeout * 1000) + jiffies);
+ info->timer->active = true;
+ if (time_before(info->timer->timer.expires, now)) {
+ schedule_work(&info->timer->work);
+ pr_debug("Starting timer %s (Expired, Jiffies): %lu, %lu\n",
+ info->label, info->timer->timer.expires, now);
+ }
+
+ /* TODO: Avoid modifying timers on each packet */
+ reset_timer(info, skb);
return XT_CONTINUE;
}
@@ -186,7 +406,7 @@
struct idletimer_tg_info *info = par->targinfo;
int ret;
- pr_debug("checkentry targinfo%s\n", info->label);
+ pr_debug("checkentry targinfo %s\n", info->label);
if (info->timeout == 0) {
pr_debug("timeout value is zero\n");
@@ -208,9 +428,7 @@
info->timer = __idletimer_tg_find_by_label(info->label);
if (info->timer) {
info->timer->refcnt++;
- mod_timer(&info->timer->timer,
- msecs_to_jiffies(info->timeout * 1000) + jiffies);
-
+ reset_timer(info, NULL);
pr_debug("increased refcnt of timer %s to %u\n",
info->label, info->timer->refcnt);
} else {
@@ -223,6 +441,7 @@
}
mutex_unlock(&list_mutex);
+
return 0;
}
@@ -239,13 +458,14 @@
list_del(&info->timer->entry);
del_timer_sync(&info->timer->timer);
- cancel_work_sync(&info->timer->work);
sysfs_remove_file(idletimer_tg_kobj, &info->timer->attr.attr);
+ unregister_pm_notifier(&info->timer->pm_nb);
+ cancel_work_sync(&info->timer->work);
kfree(info->timer->attr.attr.name);
kfree(info->timer);
} else {
pr_debug("decreased refcnt of timer %s to %u\n",
- info->label, info->timer->refcnt);
+ info->label, info->timer->refcnt);
}
mutex_unlock(&list_mutex);
@@ -253,6 +473,7 @@
static struct xt_target idletimer_tg __read_mostly = {
.name = "IDLETIMER",
+ .revision = 1,
.family = NFPROTO_UNSPEC,
.target = idletimer_tg_target,
.targetsize = sizeof(struct idletimer_tg_info),
@@ -319,3 +540,4 @@
MODULE_LICENSE("GPL v2");
MODULE_ALIAS("ipt_IDLETIMER");
MODULE_ALIAS("ip6t_IDLETIMER");
+MODULE_ALIAS("arpt_IDLETIMER");
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
index 5185ff0..1f7fbd3 100644
--- a/net/netfilter/xt_bpf.c
+++ b/net/netfilter/xt_bpf.c
@@ -62,7 +62,7 @@
return -EINVAL;
set_fs(KERNEL_DS);
- fd = bpf_obj_get_user(path);
+ fd = bpf_obj_get_user(path, 0);
set_fs(oldfs);
if (fd < 0)
return fd;
diff --git a/net/netfilter/xt_qtaguid.c b/net/netfilter/xt_qtaguid.c
new file mode 100644
index 0000000..d261932
--- /dev/null
+++ b/net/netfilter/xt_qtaguid.c
@@ -0,0 +1,3027 @@
+/*
+ * Kernel iptables module to track stats for packets based on user tags.
+ *
+ * (C) 2011 Google, Inc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * There are run-time debug flags enabled via the debug_mask module param, or
+ * via the DEFAULT_DEBUG_MASK. See xt_qtaguid_internal.h.
+ */
+#define DEBUG
+
+#include <linux/file.h>
+#include <linux/inetdevice.h>
+#include <linux/module.h>
+#include <linux/miscdevice.h>
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_qtaguid.h>
+#include <linux/ratelimit.h>
+#include <linux/seq_file.h>
+#include <linux/skbuff.h>
+#include <linux/workqueue.h>
+#include <net/addrconf.h>
+#include <net/sock.h>
+#include <net/tcp.h>
+#include <net/udp.h>
+#include <net/netfilter/nf_socket.h>
+
+#if defined(CONFIG_IP6_NF_IPTABLES) || defined(CONFIG_IP6_NF_IPTABLES_MODULE)
+#include <linux/netfilter_ipv6/ip6_tables.h>
+#endif
+
+#include <linux/netfilter/xt_socket.h>
+#include "xt_qtaguid_internal.h"
+#include "xt_qtaguid_print.h"
+#include "../../fs/proc/internal.h"
+
+/*
+ * We only use the xt_socket funcs within a similar context to avoid unexpected
+ * return values.
+ */
+#define XT_SOCKET_SUPPORTED_HOOKS \
+ ((1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_IN))
+
+
+static const char *module_procdirname = "xt_qtaguid";
+static struct proc_dir_entry *xt_qtaguid_procdir;
+
+static unsigned int proc_iface_perms = S_IRUGO;
+module_param_named(iface_perms, proc_iface_perms, uint, S_IRUGO | S_IWUSR);
+
+static struct proc_dir_entry *xt_qtaguid_stats_file;
+static unsigned int proc_stats_perms = S_IRUGO;
+module_param_named(stats_perms, proc_stats_perms, uint, S_IRUGO | S_IWUSR);
+
+static struct proc_dir_entry *xt_qtaguid_ctrl_file;
+
+/* Everybody can write. But proc_ctrl_write_limited is true by default which
+ * limits what can be controlled. See the can_*() functions.
+ */
+static unsigned int proc_ctrl_perms = S_IRUGO | S_IWUGO;
+module_param_named(ctrl_perms, proc_ctrl_perms, uint, S_IRUGO | S_IWUSR);
+
+/* Limited by default, so the gid of the ctrl and stats proc entries
+ * will limit what can be done. See the can_*() functions.
+ */
+static bool proc_stats_readall_limited = true;
+static bool proc_ctrl_write_limited = true;
+
+module_param_named(stats_readall_limited, proc_stats_readall_limited, bool,
+ S_IRUGO | S_IWUSR);
+module_param_named(ctrl_write_limited, proc_ctrl_write_limited, bool,
+ S_IRUGO | S_IWUSR);
+
+/*
+ * Limit the number of active tags (via socket tags) for a given UID.
+ * Multiple processes could share the UID.
+ */
+static int max_sock_tags = DEFAULT_MAX_SOCK_TAGS;
+module_param(max_sock_tags, int, S_IRUGO | S_IWUSR);
+
+/*
+ * After the kernel has initiallized this module, it is still possible
+ * to make it passive.
+ * Setting passive to Y:
+ * - the iface stats handling will not act on notifications.
+ * - iptables matches will never match.
+ * - ctrl commands silently succeed.
+ * - stats are always empty.
+ * This is mostly usefull when a bug is suspected.
+ */
+static bool module_passive;
+module_param_named(passive, module_passive, bool, S_IRUGO | S_IWUSR);
+
+/*
+ * Control how qtaguid data is tracked per proc/uid.
+ * Setting tag_tracking_passive to Y:
+ * - don't create proc specific structs to track tags
+ * - don't check that active tag stats exceed some limits.
+ * - don't clean up socket tags on process exits.
+ * This is mostly usefull when a bug is suspected.
+ */
+static bool qtu_proc_handling_passive;
+module_param_named(tag_tracking_passive, qtu_proc_handling_passive, bool,
+ S_IRUGO | S_IWUSR);
+
+#define QTU_DEV_NAME "xt_qtaguid"
+
+uint qtaguid_debug_mask = DEFAULT_DEBUG_MASK;
+module_param_named(debug_mask, qtaguid_debug_mask, uint, S_IRUGO | S_IWUSR);
+
+/*---------------------------------------------------------------------------*/
+static const char *iface_stat_procdirname = "iface_stat";
+static struct proc_dir_entry *iface_stat_procdir;
+/*
+ * The iface_stat_all* will go away once userspace gets use to the new fields
+ * that have a format line.
+ */
+static const char *iface_stat_all_procfilename = "iface_stat_all";
+static struct proc_dir_entry *iface_stat_all_procfile;
+static const char *iface_stat_fmt_procfilename = "iface_stat_fmt";
+static struct proc_dir_entry *iface_stat_fmt_procfile;
+
+
+static LIST_HEAD(iface_stat_list);
+static DEFINE_SPINLOCK(iface_stat_list_lock);
+
+static struct rb_root sock_tag_tree = RB_ROOT;
+static DEFINE_SPINLOCK(sock_tag_list_lock);
+
+static struct rb_root tag_counter_set_tree = RB_ROOT;
+static DEFINE_SPINLOCK(tag_counter_set_list_lock);
+
+static struct rb_root uid_tag_data_tree = RB_ROOT;
+static DEFINE_SPINLOCK(uid_tag_data_tree_lock);
+
+static struct rb_root proc_qtu_data_tree = RB_ROOT;
+/* No proc_qtu_data_tree_lock; use uid_tag_data_tree_lock */
+
+static struct qtaguid_event_counts qtu_events;
+/*----------------------------------------------*/
+static bool can_manipulate_uids(void)
+{
+ /* root pwnd */
+ return in_egroup_p(xt_qtaguid_ctrl_file->gid)
+ || unlikely(!from_kuid(&init_user_ns, current_fsuid())) || unlikely(!proc_ctrl_write_limited)
+ || unlikely(uid_eq(current_fsuid(), xt_qtaguid_ctrl_file->uid));
+}
+
+static bool can_impersonate_uid(kuid_t uid)
+{
+ return uid_eq(uid, current_fsuid()) || can_manipulate_uids();
+}
+
+static bool can_read_other_uid_stats(kuid_t uid)
+{
+ /* root pwnd */
+ return in_egroup_p(xt_qtaguid_stats_file->gid)
+ || unlikely(!from_kuid(&init_user_ns, current_fsuid())) || uid_eq(uid, current_fsuid())
+ || unlikely(!proc_stats_readall_limited)
+ || unlikely(uid_eq(current_fsuid(), xt_qtaguid_ctrl_file->uid));
+}
+
+static inline void dc_add_byte_packets(struct data_counters *counters, int set,
+ enum ifs_tx_rx direction,
+ enum ifs_proto ifs_proto,
+ int bytes,
+ int packets)
+{
+ counters->bpc[set][direction][ifs_proto].bytes += bytes;
+ counters->bpc[set][direction][ifs_proto].packets += packets;
+}
+
+static struct tag_node *tag_node_tree_search(struct rb_root *root, tag_t tag)
+{
+ struct rb_node *node = root->rb_node;
+
+ while (node) {
+ struct tag_node *data = rb_entry(node, struct tag_node, node);
+ int result;
+ RB_DEBUG("qtaguid: tag_node_tree_search(0x%llx): "
+ " node=%p data=%p\n", tag, node, data);
+ result = tag_compare(tag, data->tag);
+ RB_DEBUG("qtaguid: tag_node_tree_search(0x%llx): "
+ " data.tag=0x%llx (uid=%u) res=%d\n",
+ tag, data->tag, get_uid_from_tag(data->tag), result);
+ if (result < 0)
+ node = node->rb_left;
+ else if (result > 0)
+ node = node->rb_right;
+ else
+ return data;
+ }
+ return NULL;
+}
+
+static void tag_node_tree_insert(struct tag_node *data, struct rb_root *root)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ struct tag_node *this = rb_entry(*new, struct tag_node,
+ node);
+ int result = tag_compare(data->tag, this->tag);
+ RB_DEBUG("qtaguid: %s(): tag=0x%llx"
+ " (uid=%u)\n", __func__,
+ this->tag,
+ get_uid_from_tag(this->tag));
+ parent = *new;
+ if (result < 0)
+ new = &((*new)->rb_left);
+ else if (result > 0)
+ new = &((*new)->rb_right);
+ else
+ BUG();
+ }
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&data->node, parent, new);
+ rb_insert_color(&data->node, root);
+}
+
+static void tag_stat_tree_insert(struct tag_stat *data, struct rb_root *root)
+{
+ tag_node_tree_insert(&data->tn, root);
+}
+
+static struct tag_stat *tag_stat_tree_search(struct rb_root *root, tag_t tag)
+{
+ struct tag_node *node = tag_node_tree_search(root, tag);
+ if (!node)
+ return NULL;
+ return rb_entry(&node->node, struct tag_stat, tn.node);
+}
+
+static void tag_counter_set_tree_insert(struct tag_counter_set *data,
+ struct rb_root *root)
+{
+ tag_node_tree_insert(&data->tn, root);
+}
+
+static struct tag_counter_set *tag_counter_set_tree_search(struct rb_root *root,
+ tag_t tag)
+{
+ struct tag_node *node = tag_node_tree_search(root, tag);
+ if (!node)
+ return NULL;
+ return rb_entry(&node->node, struct tag_counter_set, tn.node);
+
+}
+
+static void tag_ref_tree_insert(struct tag_ref *data, struct rb_root *root)
+{
+ tag_node_tree_insert(&data->tn, root);
+}
+
+static struct tag_ref *tag_ref_tree_search(struct rb_root *root, tag_t tag)
+{
+ struct tag_node *node = tag_node_tree_search(root, tag);
+ if (!node)
+ return NULL;
+ return rb_entry(&node->node, struct tag_ref, tn.node);
+}
+
+static struct sock_tag *sock_tag_tree_search(struct rb_root *root,
+ const struct sock *sk)
+{
+ struct rb_node *node = root->rb_node;
+
+ while (node) {
+ struct sock_tag *data = rb_entry(node, struct sock_tag,
+ sock_node);
+ if (sk < data->sk)
+ node = node->rb_left;
+ else if (sk > data->sk)
+ node = node->rb_right;
+ else
+ return data;
+ }
+ return NULL;
+}
+
+static void sock_tag_tree_insert(struct sock_tag *data, struct rb_root *root)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ struct sock_tag *this = rb_entry(*new, struct sock_tag,
+ sock_node);
+ parent = *new;
+ if (data->sk < this->sk)
+ new = &((*new)->rb_left);
+ else if (data->sk > this->sk)
+ new = &((*new)->rb_right);
+ else
+ BUG();
+ }
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&data->sock_node, parent, new);
+ rb_insert_color(&data->sock_node, root);
+}
+
+static void sock_tag_tree_erase(struct rb_root *st_to_free_tree)
+{
+ struct rb_node *node;
+ struct sock_tag *st_entry;
+
+ node = rb_first(st_to_free_tree);
+ while (node) {
+ st_entry = rb_entry(node, struct sock_tag, sock_node);
+ node = rb_next(node);
+ CT_DEBUG("qtaguid: %s(): "
+ "erase st: sk=%p tag=0x%llx (uid=%u)\n", __func__,
+ st_entry->sk,
+ st_entry->tag,
+ get_uid_from_tag(st_entry->tag));
+ rb_erase(&st_entry->sock_node, st_to_free_tree);
+ sock_put(st_entry->sk);
+ kfree(st_entry);
+ }
+}
+
+static struct proc_qtu_data *proc_qtu_data_tree_search(struct rb_root *root,
+ const pid_t pid)
+{
+ struct rb_node *node = root->rb_node;
+
+ while (node) {
+ struct proc_qtu_data *data = rb_entry(node,
+ struct proc_qtu_data,
+ node);
+ if (pid < data->pid)
+ node = node->rb_left;
+ else if (pid > data->pid)
+ node = node->rb_right;
+ else
+ return data;
+ }
+ return NULL;
+}
+
+static void proc_qtu_data_tree_insert(struct proc_qtu_data *data,
+ struct rb_root *root)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ struct proc_qtu_data *this = rb_entry(*new,
+ struct proc_qtu_data,
+ node);
+ parent = *new;
+ if (data->pid < this->pid)
+ new = &((*new)->rb_left);
+ else if (data->pid > this->pid)
+ new = &((*new)->rb_right);
+ else
+ BUG();
+ }
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&data->node, parent, new);
+ rb_insert_color(&data->node, root);
+}
+
+static void uid_tag_data_tree_insert(struct uid_tag_data *data,
+ struct rb_root *root)
+{
+ struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+ /* Figure out where to put new node */
+ while (*new) {
+ struct uid_tag_data *this = rb_entry(*new,
+ struct uid_tag_data,
+ node);
+ parent = *new;
+ if (data->uid < this->uid)
+ new = &((*new)->rb_left);
+ else if (data->uid > this->uid)
+ new = &((*new)->rb_right);
+ else
+ BUG();
+ }
+
+ /* Add new node and rebalance tree. */
+ rb_link_node(&data->node, parent, new);
+ rb_insert_color(&data->node, root);
+}
+
+static struct uid_tag_data *uid_tag_data_tree_search(struct rb_root *root,
+ uid_t uid)
+{
+ struct rb_node *node = root->rb_node;
+
+ while (node) {
+ struct uid_tag_data *data = rb_entry(node,
+ struct uid_tag_data,
+ node);
+ if (uid < data->uid)
+ node = node->rb_left;
+ else if (uid > data->uid)
+ node = node->rb_right;
+ else
+ return data;
+ }
+ return NULL;
+}
+
+/*
+ * Allocates a new uid_tag_data struct if needed.
+ * Returns a pointer to the found or allocated uid_tag_data.
+ * Returns a PTR_ERR on failures, and lock is not held.
+ * If found is not NULL:
+ * sets *found to true if not allocated.
+ * sets *found to false if allocated.
+ */
+struct uid_tag_data *get_uid_data(uid_t uid, bool *found_res)
+{
+ struct uid_tag_data *utd_entry;
+
+ /* Look for top level uid_tag_data for the UID */
+ utd_entry = uid_tag_data_tree_search(&uid_tag_data_tree, uid);
+ DR_DEBUG("qtaguid: get_uid_data(%u) utd=%p\n", uid, utd_entry);
+
+ if (found_res)
+ *found_res = utd_entry;
+ if (utd_entry)
+ return utd_entry;
+
+ utd_entry = kzalloc(sizeof(*utd_entry), GFP_ATOMIC);
+ if (!utd_entry) {
+ pr_err("qtaguid: get_uid_data(%u): "
+ "tag data alloc failed\n", uid);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ utd_entry->uid = uid;
+ utd_entry->tag_ref_tree = RB_ROOT;
+ uid_tag_data_tree_insert(utd_entry, &uid_tag_data_tree);
+ DR_DEBUG("qtaguid: get_uid_data(%u) new utd=%p\n", uid, utd_entry);
+ return utd_entry;
+}
+
+/* Never returns NULL. Either PTR_ERR or a valid ptr. */
+static struct tag_ref *new_tag_ref(tag_t new_tag,
+ struct uid_tag_data *utd_entry)
+{
+ struct tag_ref *tr_entry;
+ int res;
+
+ if (utd_entry->num_active_tags + 1 > max_sock_tags) {
+ pr_info("qtaguid: new_tag_ref(0x%llx): "
+ "tag ref alloc quota exceeded. max=%d\n",
+ new_tag, max_sock_tags);
+ res = -EMFILE;
+ goto err_res;
+
+ }
+
+ tr_entry = kzalloc(sizeof(*tr_entry), GFP_ATOMIC);
+ if (!tr_entry) {
+ pr_err("qtaguid: new_tag_ref(0x%llx): "
+ "tag ref alloc failed\n",
+ new_tag);
+ res = -ENOMEM;
+ goto err_res;
+ }
+ tr_entry->tn.tag = new_tag;
+ /* tr_entry->num_sock_tags handled by caller */
+ utd_entry->num_active_tags++;
+ tag_ref_tree_insert(tr_entry, &utd_entry->tag_ref_tree);
+ DR_DEBUG("qtaguid: new_tag_ref(0x%llx): "
+ " inserted new tag ref %p\n",
+ new_tag, tr_entry);
+ return tr_entry;
+
+err_res:
+ return ERR_PTR(res);
+}
+
+static struct tag_ref *lookup_tag_ref(tag_t full_tag,
+ struct uid_tag_data **utd_res)
+{
+ struct uid_tag_data *utd_entry;
+ struct tag_ref *tr_entry;
+ bool found_utd;
+ uid_t uid = get_uid_from_tag(full_tag);
+
+ DR_DEBUG("qtaguid: lookup_tag_ref(tag=0x%llx (uid=%u))\n",
+ full_tag, uid);
+
+ utd_entry = get_uid_data(uid, &found_utd);
+ if (IS_ERR_OR_NULL(utd_entry)) {
+ if (utd_res)
+ *utd_res = utd_entry;
+ return NULL;
+ }
+
+ tr_entry = tag_ref_tree_search(&utd_entry->tag_ref_tree, full_tag);
+ if (utd_res)
+ *utd_res = utd_entry;
+ DR_DEBUG("qtaguid: lookup_tag_ref(0x%llx) utd_entry=%p tr_entry=%p\n",
+ full_tag, utd_entry, tr_entry);
+ return tr_entry;
+}
+
+/* Never returns NULL. Either PTR_ERR or a valid ptr. */
+static struct tag_ref *get_tag_ref(tag_t full_tag,
+ struct uid_tag_data **utd_res)
+{
+ struct uid_tag_data *utd_entry;
+ struct tag_ref *tr_entry;
+
+ DR_DEBUG("qtaguid: get_tag_ref(0x%llx)\n",
+ full_tag);
+ tr_entry = lookup_tag_ref(full_tag, &utd_entry);
+ BUG_ON(IS_ERR_OR_NULL(utd_entry));
+ if (!tr_entry)
+ tr_entry = new_tag_ref(full_tag, utd_entry);
+
+ if (utd_res)
+ *utd_res = utd_entry;
+ DR_DEBUG("qtaguid: get_tag_ref(0x%llx) utd=%p tr=%p\n",
+ full_tag, utd_entry, tr_entry);
+ return tr_entry;
+}
+
+/* Checks and maybe frees the UID Tag Data entry */
+static void put_utd_entry(struct uid_tag_data *utd_entry)
+{
+ /* Are we done with the UID tag data entry? */
+ if (RB_EMPTY_ROOT(&utd_entry->tag_ref_tree) &&
+ !utd_entry->num_pqd) {
+ DR_DEBUG("qtaguid: %s(): "
+ "erase utd_entry=%p uid=%u "
+ "by pid=%u tgid=%u uid=%u\n", __func__,
+ utd_entry, utd_entry->uid,
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+ BUG_ON(utd_entry->num_active_tags);
+ rb_erase(&utd_entry->node, &uid_tag_data_tree);
+ kfree(utd_entry);
+ } else {
+ DR_DEBUG("qtaguid: %s(): "
+ "utd_entry=%p still has %d tags %d proc_qtu_data\n",
+ __func__, utd_entry, utd_entry->num_active_tags,
+ utd_entry->num_pqd);
+ BUG_ON(!(utd_entry->num_active_tags ||
+ utd_entry->num_pqd));
+ }
+}
+
+/*
+ * If no sock_tags are using this tag_ref,
+ * decrements refcount of utd_entry, removes tr_entry
+ * from utd_entry->tag_ref_tree and frees.
+ */
+static void free_tag_ref_from_utd_entry(struct tag_ref *tr_entry,
+ struct uid_tag_data *utd_entry)
+{
+ DR_DEBUG("qtaguid: %s(): %p tag=0x%llx (uid=%u)\n", __func__,
+ tr_entry, tr_entry->tn.tag,
+ get_uid_from_tag(tr_entry->tn.tag));
+ if (!tr_entry->num_sock_tags) {
+ BUG_ON(!utd_entry->num_active_tags);
+ utd_entry->num_active_tags--;
+ rb_erase(&tr_entry->tn.node, &utd_entry->tag_ref_tree);
+ DR_DEBUG("qtaguid: %s(): erased %p\n", __func__, tr_entry);
+ kfree(tr_entry);
+ }
+}
+
+static void put_tag_ref_tree(tag_t full_tag, struct uid_tag_data *utd_entry)
+{
+ struct rb_node *node;
+ struct tag_ref *tr_entry;
+ tag_t acct_tag;
+
+ DR_DEBUG("qtaguid: %s(tag=0x%llx (uid=%u))\n", __func__,
+ full_tag, get_uid_from_tag(full_tag));
+ acct_tag = get_atag_from_tag(full_tag);
+ node = rb_first(&utd_entry->tag_ref_tree);
+ while (node) {
+ tr_entry = rb_entry(node, struct tag_ref, tn.node);
+ node = rb_next(node);
+ if (!acct_tag || tr_entry->tn.tag == full_tag)
+ free_tag_ref_from_utd_entry(tr_entry, utd_entry);
+ }
+}
+
+static ssize_t read_proc_u64(struct file *file, char __user *buf,
+ size_t size, loff_t *ppos)
+{
+ uint64_t *valuep = PDE_DATA(file_inode(file));
+ char tmp[24];
+ size_t tmp_size;
+
+ tmp_size = scnprintf(tmp, sizeof(tmp), "%llu\n", *valuep);
+ return simple_read_from_buffer(buf, size, ppos, tmp, tmp_size);
+}
+
+static ssize_t read_proc_bool(struct file *file, char __user *buf,
+ size_t size, loff_t *ppos)
+{
+ bool *valuep = PDE_DATA(file_inode(file));
+ char tmp[24];
+ size_t tmp_size;
+
+ tmp_size = scnprintf(tmp, sizeof(tmp), "%u\n", *valuep);
+ return simple_read_from_buffer(buf, size, ppos, tmp, tmp_size);
+}
+
+static int get_active_counter_set(tag_t tag)
+{
+ int active_set = 0;
+ struct tag_counter_set *tcs;
+
+ MT_DEBUG("qtaguid: get_active_counter_set(tag=0x%llx)"
+ " (uid=%u)\n",
+ tag, get_uid_from_tag(tag));
+ /* For now we only handle UID tags for active sets */
+ tag = get_utag_from_tag(tag);
+ spin_lock_bh(&tag_counter_set_list_lock);
+ tcs = tag_counter_set_tree_search(&tag_counter_set_tree, tag);
+ if (tcs)
+ active_set = tcs->active_set;
+ spin_unlock_bh(&tag_counter_set_list_lock);
+ return active_set;
+}
+
+/*
+ * Find the entry for tracking the specified interface.
+ * Caller must hold iface_stat_list_lock
+ */
+static struct iface_stat *get_iface_entry(const char *ifname)
+{
+ struct iface_stat *iface_entry;
+
+ /* Find the entry for tracking the specified tag within the interface */
+ if (ifname == NULL) {
+ pr_info("qtaguid: iface_stat: get() NULL device name\n");
+ return NULL;
+ }
+
+ /* Iterate over interfaces */
+ list_for_each_entry(iface_entry, &iface_stat_list, list) {
+ if (!strcmp(ifname, iface_entry->ifname))
+ goto done;
+ }
+ iface_entry = NULL;
+done:
+ return iface_entry;
+}
+
+/* This is for fmt2 only */
+static void pp_iface_stat_header(struct seq_file *m)
+{
+ seq_puts(m,
+ "ifname "
+ "total_skb_rx_bytes total_skb_rx_packets "
+ "total_skb_tx_bytes total_skb_tx_packets "
+ "rx_tcp_bytes rx_tcp_packets "
+ "rx_udp_bytes rx_udp_packets "
+ "rx_other_bytes rx_other_packets "
+ "tx_tcp_bytes tx_tcp_packets "
+ "tx_udp_bytes tx_udp_packets "
+ "tx_other_bytes tx_other_packets\n"
+ );
+}
+
+static void pp_iface_stat_line(struct seq_file *m,
+ struct iface_stat *iface_entry)
+{
+ struct data_counters *cnts;
+ int cnt_set = 0; /* We only use one set for the device */
+ cnts = &iface_entry->totals_via_skb;
+ seq_printf(m, "%s %llu %llu %llu %llu %llu %llu %llu %llu "
+ "%llu %llu %llu %llu %llu %llu %llu %llu\n",
+ iface_entry->ifname,
+ dc_sum_bytes(cnts, cnt_set, IFS_RX),
+ dc_sum_packets(cnts, cnt_set, IFS_RX),
+ dc_sum_bytes(cnts, cnt_set, IFS_TX),
+ dc_sum_packets(cnts, cnt_set, IFS_TX),
+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].bytes,
+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].packets,
+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].bytes,
+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].packets,
+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].bytes,
+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].packets,
+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].bytes,
+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].packets,
+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].bytes,
+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].packets,
+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].bytes,
+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].packets);
+}
+
+struct proc_iface_stat_fmt_info {
+ int fmt;
+};
+
+static void *iface_stat_fmt_proc_start(struct seq_file *m, loff_t *pos)
+{
+ struct proc_iface_stat_fmt_info *p = m->private;
+ loff_t n = *pos;
+
+ /*
+ * This lock will prevent iface_stat_update() from changing active,
+ * and in turn prevent an interface from unregistering itself.
+ */
+ spin_lock_bh(&iface_stat_list_lock);
+
+ if (unlikely(module_passive))
+ return NULL;
+
+ if (!n && p->fmt == 2)
+ pp_iface_stat_header(m);
+
+ return seq_list_start(&iface_stat_list, n);
+}
+
+static void *iface_stat_fmt_proc_next(struct seq_file *m, void *p, loff_t *pos)
+{
+ return seq_list_next(p, &iface_stat_list, pos);
+}
+
+static void iface_stat_fmt_proc_stop(struct seq_file *m, void *p)
+{
+ spin_unlock_bh(&iface_stat_list_lock);
+}
+
+static int iface_stat_fmt_proc_show(struct seq_file *m, void *v)
+{
+ struct proc_iface_stat_fmt_info *p = m->private;
+ struct iface_stat *iface_entry;
+ struct rtnl_link_stats64 dev_stats, *stats;
+ struct rtnl_link_stats64 no_dev_stats = {0};
+
+
+ CT_DEBUG("qtaguid:proc iface_stat_fmt pid=%u tgid=%u uid=%u\n",
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+
+ iface_entry = list_entry(v, struct iface_stat, list);
+
+ if (iface_entry->active) {
+ stats = dev_get_stats(iface_entry->net_dev,
+ &dev_stats);
+ } else {
+ stats = &no_dev_stats;
+ }
+ /*
+ * If the meaning of the data changes, then update the fmtX
+ * string.
+ */
+ if (p->fmt == 1) {
+ seq_printf(m, "%s %d %llu %llu %llu %llu %llu %llu %llu %llu\n",
+ iface_entry->ifname,
+ iface_entry->active,
+ iface_entry->totals_via_dev[IFS_RX].bytes,
+ iface_entry->totals_via_dev[IFS_RX].packets,
+ iface_entry->totals_via_dev[IFS_TX].bytes,
+ iface_entry->totals_via_dev[IFS_TX].packets,
+ stats->rx_bytes, stats->rx_packets,
+ stats->tx_bytes, stats->tx_packets
+ );
+ } else {
+ pp_iface_stat_line(m, iface_entry);
+ }
+ return 0;
+}
+
+static const struct file_operations read_u64_fops = {
+ .read = read_proc_u64,
+ .llseek = default_llseek,
+};
+
+static const struct file_operations read_bool_fops = {
+ .read = read_proc_bool,
+ .llseek = default_llseek,
+};
+
+static void iface_create_proc_worker(struct work_struct *work)
+{
+ struct proc_dir_entry *proc_entry;
+ struct iface_stat_work *isw = container_of(work, struct iface_stat_work,
+ iface_work);
+ struct iface_stat *new_iface = isw->iface_entry;
+
+ /* iface_entries are not deleted, so safe to manipulate. */
+ proc_entry = proc_mkdir(new_iface->ifname, iface_stat_procdir);
+ if (IS_ERR_OR_NULL(proc_entry)) {
+ pr_err("qtaguid: iface_stat: create_proc(): alloc failed.\n");
+ kfree(isw);
+ return;
+ }
+
+ new_iface->proc_ptr = proc_entry;
+
+ proc_create_data("tx_bytes", proc_iface_perms, proc_entry,
+ &read_u64_fops,
+ &new_iface->totals_via_dev[IFS_TX].bytes);
+ proc_create_data("rx_bytes", proc_iface_perms, proc_entry,
+ &read_u64_fops,
+ &new_iface->totals_via_dev[IFS_RX].bytes);
+ proc_create_data("tx_packets", proc_iface_perms, proc_entry,
+ &read_u64_fops,
+ &new_iface->totals_via_dev[IFS_TX].packets);
+ proc_create_data("rx_packets", proc_iface_perms, proc_entry,
+ &read_u64_fops,
+ &new_iface->totals_via_dev[IFS_RX].packets);
+ proc_create_data("active", proc_iface_perms, proc_entry,
+ &read_bool_fops, &new_iface->active);
+
+ IF_DEBUG("qtaguid: iface_stat: create_proc(): done "
+ "entry=%p dev=%s\n", new_iface, new_iface->ifname);
+ kfree(isw);
+}
+
+/*
+ * Will set the entry's active state, and
+ * update the net_dev accordingly also.
+ */
+static void _iface_stat_set_active(struct iface_stat *entry,
+ struct net_device *net_dev,
+ bool activate)
+{
+ if (activate) {
+ entry->net_dev = net_dev;
+ entry->active = true;
+ IF_DEBUG("qtaguid: %s(%s): "
+ "enable tracking. rfcnt=%d\n", __func__,
+ entry->ifname,
+ __this_cpu_read(*net_dev->pcpu_refcnt));
+ } else {
+ entry->active = false;
+ entry->net_dev = NULL;
+ IF_DEBUG("qtaguid: %s(%s): "
+ "disable tracking. rfcnt=%d\n", __func__,
+ entry->ifname,
+ __this_cpu_read(*net_dev->pcpu_refcnt));
+
+ }
+}
+
+/* Caller must hold iface_stat_list_lock */
+static struct iface_stat *iface_alloc(struct net_device *net_dev)
+{
+ struct iface_stat *new_iface;
+ struct iface_stat_work *isw;
+
+ new_iface = kzalloc(sizeof(*new_iface), GFP_ATOMIC);
+ if (new_iface == NULL) {
+ pr_err("qtaguid: iface_stat: create(%s): "
+ "iface_stat alloc failed\n", net_dev->name);
+ return NULL;
+ }
+ new_iface->ifname = kstrdup(net_dev->name, GFP_ATOMIC);
+ if (new_iface->ifname == NULL) {
+ pr_err("qtaguid: iface_stat: create(%s): "
+ "ifname alloc failed\n", net_dev->name);
+ kfree(new_iface);
+ return NULL;
+ }
+ spin_lock_init(&new_iface->tag_stat_list_lock);
+ new_iface->tag_stat_tree = RB_ROOT;
+ _iface_stat_set_active(new_iface, net_dev, true);
+
+ /*
+ * ipv6 notifier chains are atomic :(
+ * No create_proc_read_entry() for you!
+ */
+ isw = kmalloc(sizeof(*isw), GFP_ATOMIC);
+ if (!isw) {
+ pr_err("qtaguid: iface_stat: create(%s): "
+ "work alloc failed\n", new_iface->ifname);
+ _iface_stat_set_active(new_iface, net_dev, false);
+ kfree(new_iface->ifname);
+ kfree(new_iface);
+ return NULL;
+ }
+ isw->iface_entry = new_iface;
+ INIT_WORK(&isw->iface_work, iface_create_proc_worker);
+ schedule_work(&isw->iface_work);
+ list_add(&new_iface->list, &iface_stat_list);
+ return new_iface;
+}
+
+static void iface_check_stats_reset_and_adjust(struct net_device *net_dev,
+ struct iface_stat *iface)
+{
+ struct rtnl_link_stats64 dev_stats, *stats;
+ bool stats_rewound;
+
+ stats = dev_get_stats(net_dev, &dev_stats);
+ /* No empty packets */
+ stats_rewound =
+ (stats->rx_bytes < iface->last_known[IFS_RX].bytes)
+ || (stats->tx_bytes < iface->last_known[IFS_TX].bytes);
+
+ IF_DEBUG("qtaguid: %s(%s): iface=%p netdev=%p "
+ "bytes rx/tx=%llu/%llu "
+ "active=%d last_known=%d "
+ "stats_rewound=%d\n", __func__,
+ net_dev ? net_dev->name : "?",
+ iface, net_dev,
+ stats->rx_bytes, stats->tx_bytes,
+ iface->active, iface->last_known_valid, stats_rewound);
+
+ if (iface->active && iface->last_known_valid && stats_rewound) {
+ pr_warn_once("qtaguid: iface_stat: %s(%s): "
+ "iface reset its stats unexpectedly\n", __func__,
+ net_dev->name);
+
+ iface->totals_via_dev[IFS_TX].bytes +=
+ iface->last_known[IFS_TX].bytes;
+ iface->totals_via_dev[IFS_TX].packets +=
+ iface->last_known[IFS_TX].packets;
+ iface->totals_via_dev[IFS_RX].bytes +=
+ iface->last_known[IFS_RX].bytes;
+ iface->totals_via_dev[IFS_RX].packets +=
+ iface->last_known[IFS_RX].packets;
+ iface->last_known_valid = false;
+ IF_DEBUG("qtaguid: %s(%s): iface=%p "
+ "used last known bytes rx/tx=%llu/%llu\n", __func__,
+ iface->ifname, iface, iface->last_known[IFS_RX].bytes,
+ iface->last_known[IFS_TX].bytes);
+ }
+}
+
+/*
+ * Create a new entry for tracking the specified interface.
+ * Do nothing if the entry already exists.
+ * Called when an interface is configured with a valid IP address.
+ */
+static void iface_stat_create(struct net_device *net_dev,
+ struct in_ifaddr *ifa)
+{
+ struct in_device *in_dev = NULL;
+ const char *ifname;
+ struct iface_stat *entry;
+ __be32 ipaddr = 0;
+ struct iface_stat *new_iface;
+
+ IF_DEBUG("qtaguid: iface_stat: create(%s): ifa=%p netdev=%p\n",
+ net_dev ? net_dev->name : "?",
+ ifa, net_dev);
+ if (!net_dev) {
+ pr_err("qtaguid: iface_stat: create(): no net dev\n");
+ return;
+ }
+
+ ifname = net_dev->name;
+ if (!ifa) {
+ in_dev = in_dev_get(net_dev);
+ if (!in_dev) {
+ pr_err("qtaguid: iface_stat: create(%s): no inet dev\n",
+ ifname);
+ return;
+ }
+ IF_DEBUG("qtaguid: iface_stat: create(%s): in_dev=%p\n",
+ ifname, in_dev);
+ for (ifa = in_dev->ifa_list; ifa; ifa = ifa->ifa_next) {
+ IF_DEBUG("qtaguid: iface_stat: create(%s): "
+ "ifa=%p ifa_label=%s\n",
+ ifname, ifa, ifa->ifa_label);
+ if (!strcmp(ifname, ifa->ifa_label))
+ break;
+ }
+ }
+
+ if (!ifa) {
+ IF_DEBUG("qtaguid: iface_stat: create(%s): no matching IP\n",
+ ifname);
+ goto done_put;
+ }
+ ipaddr = ifa->ifa_local;
+
+ spin_lock_bh(&iface_stat_list_lock);
+ entry = get_iface_entry(ifname);
+ if (entry != NULL) {
+ IF_DEBUG("qtaguid: iface_stat: create(%s): entry=%p\n",
+ ifname, entry);
+ iface_check_stats_reset_and_adjust(net_dev, entry);
+ _iface_stat_set_active(entry, net_dev, true);
+ IF_DEBUG("qtaguid: %s(%s): "
+ "tracking now %d on ip=%pI4\n", __func__,
+ entry->ifname, true, &ipaddr);
+ goto done_unlock_put;
+ }
+
+ new_iface = iface_alloc(net_dev);
+ IF_DEBUG("qtaguid: iface_stat: create(%s): done "
+ "entry=%p ip=%pI4\n", ifname, new_iface, &ipaddr);
+done_unlock_put:
+ spin_unlock_bh(&iface_stat_list_lock);
+done_put:
+ if (in_dev)
+ in_dev_put(in_dev);
+}
+
+static void iface_stat_create_ipv6(struct net_device *net_dev,
+ struct inet6_ifaddr *ifa)
+{
+ struct in_device *in_dev;
+ const char *ifname;
+ struct iface_stat *entry;
+ struct iface_stat *new_iface;
+ int addr_type;
+
+ IF_DEBUG("qtaguid: iface_stat: create6(): ifa=%p netdev=%p->name=%s\n",
+ ifa, net_dev, net_dev ? net_dev->name : "");
+ if (!net_dev) {
+ pr_err("qtaguid: iface_stat: create6(): no net dev!\n");
+ return;
+ }
+ ifname = net_dev->name;
+
+ in_dev = in_dev_get(net_dev);
+ if (!in_dev) {
+ pr_err("qtaguid: iface_stat: create6(%s): no inet dev\n",
+ ifname);
+ return;
+ }
+
+ IF_DEBUG("qtaguid: iface_stat: create6(%s): in_dev=%p\n",
+ ifname, in_dev);
+
+ if (!ifa) {
+ IF_DEBUG("qtaguid: iface_stat: create6(%s): no matching IP\n",
+ ifname);
+ goto done_put;
+ }
+ addr_type = ipv6_addr_type(&ifa->addr);
+
+ spin_lock_bh(&iface_stat_list_lock);
+ entry = get_iface_entry(ifname);
+ if (entry != NULL) {
+ IF_DEBUG("qtaguid: %s(%s): entry=%p\n", __func__,
+ ifname, entry);
+ iface_check_stats_reset_and_adjust(net_dev, entry);
+ _iface_stat_set_active(entry, net_dev, true);
+ IF_DEBUG("qtaguid: %s(%s): "
+ "tracking now %d on ip=%pI6c\n", __func__,
+ entry->ifname, true, &ifa->addr);
+ goto done_unlock_put;
+ }
+
+ new_iface = iface_alloc(net_dev);
+ IF_DEBUG("qtaguid: iface_stat: create6(%s): done "
+ "entry=%p ip=%pI6c\n", ifname, new_iface, &ifa->addr);
+
+done_unlock_put:
+ spin_unlock_bh(&iface_stat_list_lock);
+done_put:
+ in_dev_put(in_dev);
+}
+
+static struct sock_tag *get_sock_stat_nl(const struct sock *sk)
+{
+ MT_DEBUG("qtaguid: get_sock_stat_nl(sk=%p)\n", sk);
+ return sock_tag_tree_search(&sock_tag_tree, sk);
+}
+
+static struct sock_tag *get_sock_stat(const struct sock *sk)
+{
+ struct sock_tag *sock_tag_entry;
+ MT_DEBUG("qtaguid: get_sock_stat(sk=%p)\n", sk);
+ if (!sk)
+ return NULL;
+ spin_lock_bh(&sock_tag_list_lock);
+ sock_tag_entry = get_sock_stat_nl(sk);
+ spin_unlock_bh(&sock_tag_list_lock);
+ return sock_tag_entry;
+}
+
+static int ipx_proto(const struct sk_buff *skb,
+ struct xt_action_param *par)
+{
+ int thoff = 0, tproto;
+
+ switch (par->state->pf) {
+ case NFPROTO_IPV6:
+ tproto = ipv6_find_hdr(skb, &thoff, -1, NULL, NULL);
+ if (tproto < 0)
+ MT_DEBUG("%s(): transport header not found in ipv6"
+ " skb=%p\n", __func__, skb);
+ break;
+ case NFPROTO_IPV4:
+ tproto = ip_hdr(skb)->protocol;
+ break;
+ default:
+ tproto = IPPROTO_RAW;
+ }
+ return tproto;
+}
+
+static void
+data_counters_update(struct data_counters *dc, int set,
+ enum ifs_tx_rx direction, int proto, int bytes)
+{
+ switch (proto) {
+ case IPPROTO_TCP:
+ dc_add_byte_packets(dc, set, direction, IFS_TCP, bytes, 1);
+ break;
+ case IPPROTO_UDP:
+ dc_add_byte_packets(dc, set, direction, IFS_UDP, bytes, 1);
+ break;
+ case IPPROTO_IP:
+ default:
+ dc_add_byte_packets(dc, set, direction, IFS_PROTO_OTHER, bytes,
+ 1);
+ break;
+ }
+}
+
+/*
+ * Update stats for the specified interface. Do nothing if the entry
+ * does not exist (when a device was never configured with an IP address).
+ * Called when an device is being unregistered.
+ */
+static void iface_stat_update(struct net_device *net_dev, bool stash_only)
+{
+ struct rtnl_link_stats64 dev_stats, *stats;
+ struct iface_stat *entry;
+
+ stats = dev_get_stats(net_dev, &dev_stats);
+ spin_lock_bh(&iface_stat_list_lock);
+ entry = get_iface_entry(net_dev->name);
+ if (entry == NULL) {
+ IF_DEBUG("qtaguid: iface_stat: update(%s): not tracked\n",
+ net_dev->name);
+ spin_unlock_bh(&iface_stat_list_lock);
+ return;
+ }
+
+ IF_DEBUG("qtaguid: %s(%s): entry=%p\n", __func__,
+ net_dev->name, entry);
+ if (!entry->active) {
+ IF_DEBUG("qtaguid: %s(%s): already disabled\n", __func__,
+ net_dev->name);
+ spin_unlock_bh(&iface_stat_list_lock);
+ return;
+ }
+
+ if (stash_only) {
+ entry->last_known[IFS_TX].bytes = stats->tx_bytes;
+ entry->last_known[IFS_TX].packets = stats->tx_packets;
+ entry->last_known[IFS_RX].bytes = stats->rx_bytes;
+ entry->last_known[IFS_RX].packets = stats->rx_packets;
+ entry->last_known_valid = true;
+ IF_DEBUG("qtaguid: %s(%s): "
+ "dev stats stashed rx/tx=%llu/%llu\n", __func__,
+ net_dev->name, stats->rx_bytes, stats->tx_bytes);
+ spin_unlock_bh(&iface_stat_list_lock);
+ return;
+ }
+ entry->totals_via_dev[IFS_TX].bytes += stats->tx_bytes;
+ entry->totals_via_dev[IFS_TX].packets += stats->tx_packets;
+ entry->totals_via_dev[IFS_RX].bytes += stats->rx_bytes;
+ entry->totals_via_dev[IFS_RX].packets += stats->rx_packets;
+ /* We don't need the last_known[] anymore */
+ entry->last_known_valid = false;
+ _iface_stat_set_active(entry, net_dev, false);
+ IF_DEBUG("qtaguid: %s(%s): "
+ "disable tracking. rx/tx=%llu/%llu\n", __func__,
+ net_dev->name, stats->rx_bytes, stats->tx_bytes);
+ spin_unlock_bh(&iface_stat_list_lock);
+}
+
+/* Guarantied to return a net_device that has a name */
+static void get_dev_and_dir(const struct sk_buff *skb,
+ struct xt_action_param *par,
+ enum ifs_tx_rx *direction,
+ const struct net_device **el_dev)
+{
+ const struct nf_hook_state *parst = par->state;
+
+ BUG_ON(!direction || !el_dev);
+
+ if (parst->in) {
+ *el_dev = parst->in;
+ *direction = IFS_RX;
+ } else if (parst->out) {
+ *el_dev = parst->out;
+ *direction = IFS_TX;
+ } else {
+ pr_err("qtaguid[%d]: %s(): no par->state->in/out?!!\n",
+ parst->hook, __func__);
+ BUG();
+ }
+ if (skb->dev && *el_dev != skb->dev) {
+ MT_DEBUG("qtaguid[%d]: skb->dev=%p %s vs par->%s=%p %s\n",
+ parst->hook, skb->dev, skb->dev->name,
+ *direction == IFS_RX ? "in" : "out", *el_dev,
+ (*el_dev)->name);
+ }
+}
+
+/*
+ * Update stats for the specified interface from the skb.
+ * Do nothing if the entry
+ * does not exist (when a device was never configured with an IP address).
+ * Called on each sk.
+ */
+static void iface_stat_update_from_skb(const struct sk_buff *skb,
+ struct xt_action_param *par)
+{
+ const struct nf_hook_state *parst = par->state;
+ struct iface_stat *entry;
+ const struct net_device *el_dev;
+ enum ifs_tx_rx direction;
+ int bytes = skb->len;
+ int proto;
+
+ get_dev_and_dir(skb, par, &direction, &el_dev);
+ proto = ipx_proto(skb, par);
+ MT_DEBUG("qtaguid[%d]: iface_stat: %s(%s): "
+ "type=%d fam=%d proto=%d dir=%d\n",
+ parst->hook, __func__, el_dev->name, el_dev->type,
+ parst->pf, proto, direction);
+
+ spin_lock_bh(&iface_stat_list_lock);
+ entry = get_iface_entry(el_dev->name);
+ if (entry == NULL) {
+ IF_DEBUG("qtaguid[%d]: iface_stat: %s(%s): not tracked\n",
+ parst->hook, __func__, el_dev->name);
+ spin_unlock_bh(&iface_stat_list_lock);
+ return;
+ }
+
+ IF_DEBUG("qtaguid[%d]: %s(%s): entry=%p\n", parst->hook, __func__,
+ el_dev->name, entry);
+
+ data_counters_update(&entry->totals_via_skb, 0, direction, proto,
+ bytes);
+ spin_unlock_bh(&iface_stat_list_lock);
+}
+
+static void tag_stat_update(struct tag_stat *tag_entry,
+ enum ifs_tx_rx direction, int proto, int bytes)
+{
+ int active_set;
+ active_set = get_active_counter_set(tag_entry->tn.tag);
+ MT_DEBUG("qtaguid: tag_stat_update(tag=0x%llx (uid=%u) set=%d "
+ "dir=%d proto=%d bytes=%d)\n",
+ tag_entry->tn.tag, get_uid_from_tag(tag_entry->tn.tag),
+ active_set, direction, proto, bytes);
+ data_counters_update(&tag_entry->counters, active_set, direction,
+ proto, bytes);
+ if (tag_entry->parent_counters)
+ data_counters_update(tag_entry->parent_counters, active_set,
+ direction, proto, bytes);
+}
+
+/*
+ * Create a new entry for tracking the specified {acct_tag,uid_tag} within
+ * the interface.
+ * iface_entry->tag_stat_list_lock should be held.
+ */
+static struct tag_stat *create_if_tag_stat(struct iface_stat *iface_entry,
+ tag_t tag)
+{
+ struct tag_stat *new_tag_stat_entry = NULL;
+ IF_DEBUG("qtaguid: iface_stat: %s(): ife=%p tag=0x%llx"
+ " (uid=%u)\n", __func__,
+ iface_entry, tag, get_uid_from_tag(tag));
+ new_tag_stat_entry = kzalloc(sizeof(*new_tag_stat_entry), GFP_ATOMIC);
+ if (!new_tag_stat_entry) {
+ pr_err("qtaguid: iface_stat: tag stat alloc failed\n");
+ goto done;
+ }
+ new_tag_stat_entry->tn.tag = tag;
+ tag_stat_tree_insert(new_tag_stat_entry, &iface_entry->tag_stat_tree);
+done:
+ return new_tag_stat_entry;
+}
+
+static void if_tag_stat_update(const char *ifname, uid_t uid,
+ const struct sock *sk, enum ifs_tx_rx direction,
+ int proto, int bytes)
+{
+ struct tag_stat *tag_stat_entry;
+ tag_t tag, acct_tag;
+ tag_t uid_tag;
+ struct data_counters *uid_tag_counters;
+ struct sock_tag *sock_tag_entry;
+ struct iface_stat *iface_entry;
+ struct tag_stat *new_tag_stat = NULL;
+ MT_DEBUG("qtaguid: if_tag_stat_update(ifname=%s "
+ "uid=%u sk=%p dir=%d proto=%d bytes=%d)\n",
+ ifname, uid, sk, direction, proto, bytes);
+
+ spin_lock_bh(&iface_stat_list_lock);
+ iface_entry = get_iface_entry(ifname);
+ if (!iface_entry) {
+ pr_err_ratelimited("qtaguid: tag_stat: stat_update() "
+ "%s not found\n", ifname);
+ spin_unlock_bh(&iface_stat_list_lock);
+ return;
+ }
+ /* It is ok to process data when an iface_entry is inactive */
+
+ MT_DEBUG("qtaguid: tag_stat: stat_update() dev=%s entry=%p\n",
+ ifname, iface_entry);
+
+ /*
+ * Look for a tagged sock.
+ * It will have an acct_uid.
+ */
+ sock_tag_entry = get_sock_stat(sk);
+ if (sock_tag_entry) {
+ tag = sock_tag_entry->tag;
+ acct_tag = get_atag_from_tag(tag);
+ uid_tag = get_utag_from_tag(tag);
+ } else {
+ acct_tag = make_atag_from_value(0);
+ tag = combine_atag_with_uid(acct_tag, uid);
+ uid_tag = make_tag_from_uid(uid);
+ }
+ MT_DEBUG("qtaguid: tag_stat: stat_update(): "
+ " looking for tag=0x%llx (uid=%u) in ife=%p\n",
+ tag, get_uid_from_tag(tag), iface_entry);
+ /* Loop over tag list under this interface for {acct_tag,uid_tag} */
+ spin_lock_bh(&iface_entry->tag_stat_list_lock);
+
+ tag_stat_entry = tag_stat_tree_search(&iface_entry->tag_stat_tree,
+ tag);
+ if (tag_stat_entry) {
+ /*
+ * Updating the {acct_tag, uid_tag} entry handles both stats:
+ * {0, uid_tag} will also get updated.
+ */
+ tag_stat_update(tag_stat_entry, direction, proto, bytes);
+ goto unlock;
+ }
+
+ /* Loop over tag list under this interface for {0,uid_tag} */
+ tag_stat_entry = tag_stat_tree_search(&iface_entry->tag_stat_tree,
+ uid_tag);
+ if (!tag_stat_entry) {
+ /* Here: the base uid_tag did not exist */
+ /*
+ * No parent counters. So
+ * - No {0, uid_tag} stats and no {acc_tag, uid_tag} stats.
+ */
+ new_tag_stat = create_if_tag_stat(iface_entry, uid_tag);
+ if (!new_tag_stat)
+ goto unlock;
+ uid_tag_counters = &new_tag_stat->counters;
+ } else {
+ uid_tag_counters = &tag_stat_entry->counters;
+ }
+
+ if (acct_tag) {
+ /* Create the child {acct_tag, uid_tag} and hook up parent. */
+ new_tag_stat = create_if_tag_stat(iface_entry, tag);
+ if (!new_tag_stat)
+ goto unlock;
+ new_tag_stat->parent_counters = uid_tag_counters;
+ } else {
+ /*
+ * For new_tag_stat to be still NULL here would require:
+ * {0, uid_tag} exists
+ * and {acct_tag, uid_tag} doesn't exist
+ * AND acct_tag == 0.
+ * Impossible. This reassures us that new_tag_stat
+ * below will always be assigned.
+ */
+ BUG_ON(!new_tag_stat);
+ }
+ tag_stat_update(new_tag_stat, direction, proto, bytes);
+unlock:
+ spin_unlock_bh(&iface_entry->tag_stat_list_lock);
+ spin_unlock_bh(&iface_stat_list_lock);
+}
+
+static int iface_netdev_event_handler(struct notifier_block *nb,
+ unsigned long event, void *ptr) {
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+
+ if (unlikely(module_passive))
+ return NOTIFY_DONE;
+
+ IF_DEBUG("qtaguid: iface_stat: netdev_event(): "
+ "ev=0x%lx/%s netdev=%p->name=%s\n",
+ event, netdev_evt_str(event), dev, dev ? dev->name : "");
+
+ switch (event) {
+ case NETDEV_UP:
+ iface_stat_create(dev, NULL);
+ atomic64_inc(&qtu_events.iface_events);
+ break;
+ case NETDEV_DOWN:
+ case NETDEV_UNREGISTER:
+ iface_stat_update(dev, event == NETDEV_DOWN);
+ atomic64_inc(&qtu_events.iface_events);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static int iface_inet6addr_event_handler(struct notifier_block *nb,
+ unsigned long event, void *ptr)
+{
+ struct inet6_ifaddr *ifa = ptr;
+ struct net_device *dev;
+
+ if (unlikely(module_passive))
+ return NOTIFY_DONE;
+
+ IF_DEBUG("qtaguid: iface_stat: inet6addr_event(): "
+ "ev=0x%lx/%s ifa=%p\n",
+ event, netdev_evt_str(event), ifa);
+
+ switch (event) {
+ case NETDEV_UP:
+ BUG_ON(!ifa || !ifa->idev);
+ dev = (struct net_device *)ifa->idev->dev;
+ iface_stat_create_ipv6(dev, ifa);
+ atomic64_inc(&qtu_events.iface_events);
+ break;
+ case NETDEV_DOWN:
+ case NETDEV_UNREGISTER:
+ BUG_ON(!ifa || !ifa->idev);
+ dev = (struct net_device *)ifa->idev->dev;
+ iface_stat_update(dev, event == NETDEV_DOWN);
+ atomic64_inc(&qtu_events.iface_events);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static int iface_inetaddr_event_handler(struct notifier_block *nb,
+ unsigned long event, void *ptr)
+{
+ struct in_ifaddr *ifa = ptr;
+ struct net_device *dev;
+
+ if (unlikely(module_passive))
+ return NOTIFY_DONE;
+
+ IF_DEBUG("qtaguid: iface_stat: inetaddr_event(): "
+ "ev=0x%lx/%s ifa=%p\n",
+ event, netdev_evt_str(event), ifa);
+
+ switch (event) {
+ case NETDEV_UP:
+ BUG_ON(!ifa || !ifa->ifa_dev);
+ dev = ifa->ifa_dev->dev;
+ iface_stat_create(dev, ifa);
+ atomic64_inc(&qtu_events.iface_events);
+ break;
+ case NETDEV_DOWN:
+ case NETDEV_UNREGISTER:
+ BUG_ON(!ifa || !ifa->ifa_dev);
+ dev = ifa->ifa_dev->dev;
+ iface_stat_update(dev, event == NETDEV_DOWN);
+ atomic64_inc(&qtu_events.iface_events);
+ break;
+ }
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block iface_netdev_notifier_blk = {
+ .notifier_call = iface_netdev_event_handler,
+};
+
+static struct notifier_block iface_inetaddr_notifier_blk = {
+ .notifier_call = iface_inetaddr_event_handler,
+};
+
+static struct notifier_block iface_inet6addr_notifier_blk = {
+ .notifier_call = iface_inet6addr_event_handler,
+};
+
+static const struct seq_operations iface_stat_fmt_proc_seq_ops = {
+ .start = iface_stat_fmt_proc_start,
+ .next = iface_stat_fmt_proc_next,
+ .stop = iface_stat_fmt_proc_stop,
+ .show = iface_stat_fmt_proc_show,
+};
+
+static int proc_iface_stat_fmt_open(struct inode *inode, struct file *file)
+{
+ struct proc_iface_stat_fmt_info *s;
+
+ s = __seq_open_private(file, &iface_stat_fmt_proc_seq_ops,
+ sizeof(struct proc_iface_stat_fmt_info));
+ if (!s)
+ return -ENOMEM;
+
+ s->fmt = (uintptr_t)PDE_DATA(inode);
+ return 0;
+}
+
+static const struct file_operations proc_iface_stat_fmt_fops = {
+ .open = proc_iface_stat_fmt_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+static int __init iface_stat_init(struct proc_dir_entry *parent_procdir)
+{
+ int err;
+
+ iface_stat_procdir = proc_mkdir(iface_stat_procdirname, parent_procdir);
+ if (!iface_stat_procdir) {
+ pr_err("qtaguid: iface_stat: init failed to create proc entry\n");
+ err = -1;
+ goto err;
+ }
+
+ iface_stat_all_procfile = proc_create_data(iface_stat_all_procfilename,
+ proc_iface_perms,
+ parent_procdir,
+ &proc_iface_stat_fmt_fops,
+ (void *)1 /* fmt1 */);
+ if (!iface_stat_all_procfile) {
+ pr_err("qtaguid: iface_stat: init "
+ " failed to create stat_old proc entry\n");
+ err = -1;
+ goto err_zap_entry;
+ }
+
+ iface_stat_fmt_procfile = proc_create_data(iface_stat_fmt_procfilename,
+ proc_iface_perms,
+ parent_procdir,
+ &proc_iface_stat_fmt_fops,
+ (void *)2 /* fmt2 */);
+ if (!iface_stat_fmt_procfile) {
+ pr_err("qtaguid: iface_stat: init "
+ " failed to create stat_all proc entry\n");
+ err = -1;
+ goto err_zap_all_stats_entry;
+ }
+
+
+ err = register_netdevice_notifier(&iface_netdev_notifier_blk);
+ if (err) {
+ pr_err("qtaguid: iface_stat: init "
+ "failed to register dev event handler\n");
+ goto err_zap_all_stats_entries;
+ }
+ err = register_inetaddr_notifier(&iface_inetaddr_notifier_blk);
+ if (err) {
+ pr_err("qtaguid: iface_stat: init "
+ "failed to register ipv4 dev event handler\n");
+ goto err_unreg_nd;
+ }
+
+ err = register_inet6addr_notifier(&iface_inet6addr_notifier_blk);
+ if (err) {
+ pr_err("qtaguid: iface_stat: init "
+ "failed to register ipv6 dev event handler\n");
+ goto err_unreg_ip4_addr;
+ }
+ return 0;
+
+err_unreg_ip4_addr:
+ unregister_inetaddr_notifier(&iface_inetaddr_notifier_blk);
+err_unreg_nd:
+ unregister_netdevice_notifier(&iface_netdev_notifier_blk);
+err_zap_all_stats_entries:
+ remove_proc_entry(iface_stat_fmt_procfilename, parent_procdir);
+err_zap_all_stats_entry:
+ remove_proc_entry(iface_stat_all_procfilename, parent_procdir);
+err_zap_entry:
+ remove_proc_entry(iface_stat_procdirname, parent_procdir);
+err:
+ return err;
+}
+
+static struct sock *qtaguid_find_sk(const struct sk_buff *skb,
+ struct xt_action_param *par)
+{
+ const struct nf_hook_state *parst = par->state;
+ struct sock *sk;
+ unsigned int hook_mask = (1 << parst->hook);
+
+ MT_DEBUG("qtaguid[%d]: find_sk(skb=%p) family=%d\n",
+ parst->hook, skb, parst->pf);
+
+ /*
+ * Let's not abuse the the xt_socket_get*_sk(), or else it will
+ * return garbage SKs.
+ */
+ if (!(hook_mask & XT_SOCKET_SUPPORTED_HOOKS))
+ return NULL;
+
+ switch (parst->pf) {
+ case NFPROTO_IPV6:
+ sk = nf_sk_lookup_slow_v6(dev_net(skb->dev), skb, parst->in);
+ break;
+ case NFPROTO_IPV4:
+ sk = nf_sk_lookup_slow_v4(dev_net(skb->dev), skb, parst->in);
+ break;
+ default:
+ return NULL;
+ }
+
+ if (sk) {
+ MT_DEBUG("qtaguid[%d]: %p->sk_proto=%u->sk_state=%d\n",
+ parst->hook, sk, sk->sk_protocol, sk->sk_state);
+ }
+ return sk;
+}
+
+static void account_for_uid(const struct sk_buff *skb,
+ const struct sock *alternate_sk, uid_t uid,
+ struct xt_action_param *par)
+{
+ const struct net_device *el_dev;
+ enum ifs_tx_rx direction;
+ int proto;
+
+ get_dev_and_dir(skb, par, &direction, &el_dev);
+ proto = ipx_proto(skb, par);
+ MT_DEBUG("qtaguid[%d]: dev name=%s type=%d fam=%d proto=%d dir=%d\n",
+ par->state->hook, el_dev->name, el_dev->type,
+ par->state->pf, proto, direction);
+
+ if_tag_stat_update(el_dev->name, uid,
+ skb->sk ? skb->sk : alternate_sk,
+ direction,
+ proto, skb->len);
+}
+
+static bool qtaguid_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ const struct xt_qtaguid_match_info *info = par->matchinfo;
+ const struct nf_hook_state *parst = par->state;
+ const struct file *filp;
+ bool got_sock = false;
+ struct sock *sk;
+ kuid_t sock_uid;
+ bool res;
+ bool set_sk_callback_lock = false;
+ /*
+ * TODO: unhack how to force just accounting.
+ * For now we only do tag stats when the uid-owner is not requested
+ */
+ bool do_tag_stat = !(info->match & XT_QTAGUID_UID);
+
+ if (unlikely(module_passive))
+ return (info->match ^ info->invert) == 0;
+
+ MT_DEBUG("qtaguid[%d]: entered skb=%p par->in=%p/out=%p fam=%d\n",
+ parst->hook, skb, parst->in, parst->out, parst->pf);
+
+ atomic64_inc(&qtu_events.match_calls);
+ if (skb == NULL) {
+ res = (info->match ^ info->invert) == 0;
+ goto ret_res;
+ }
+
+ switch (parst->hook) {
+ case NF_INET_PRE_ROUTING:
+ case NF_INET_POST_ROUTING:
+ atomic64_inc(&qtu_events.match_calls_prepost);
+ iface_stat_update_from_skb(skb, par);
+ /*
+ * We are done in pre/post. The skb will get processed
+ * further alter.
+ */
+ res = (info->match ^ info->invert);
+ goto ret_res;
+ break;
+ /* default: Fall through and do UID releated work */
+ }
+
+ sk = skb_to_full_sk(skb);
+ /*
+ * When in TCP_TIME_WAIT the sk is not a "struct sock" but
+ * "struct inet_timewait_sock" which is missing fields.
+ * So we ignore it.
+ */
+ if (sk && sk->sk_state == TCP_TIME_WAIT)
+ sk = NULL;
+ if (sk == NULL) {
+ /*
+ * A missing sk->sk_socket happens when packets are in-flight
+ * and the matching socket is already closed and gone.
+ */
+ sk = qtaguid_find_sk(skb, par);
+ /*
+ * TCP_NEW_SYN_RECV are not "struct sock" but "struct request_sock"
+ * where we can get a pointer to a full socket to retrieve uid/gid.
+ * When in TCP_TIME_WAIT, sk is a struct inet_timewait_sock
+ * which is missing fields and does not contain any reference
+ * to a full socket, so just ignore the socket.
+ */
+ if (sk && sk->sk_state == TCP_NEW_SYN_RECV) {
+ sock_gen_put(sk);
+ sk = sk_to_full_sk(sk);
+ } else if (sk && (!sk_fullsock(sk) || sk->sk_state == TCP_TIME_WAIT)) {
+ sock_gen_put(sk);
+ sk = NULL;
+ } else {
+ /*
+ * If we got the socket from the find_sk(), we will need to put
+ * it back, as nf_tproxy_get_sock_v4() got it.
+ */
+ got_sock = sk;
+ }
+ if (sk)
+ atomic64_inc(&qtu_events.match_found_sk_in_ct);
+ else
+ atomic64_inc(&qtu_events.match_found_no_sk_in_ct);
+ } else {
+ atomic64_inc(&qtu_events.match_found_sk);
+ }
+ MT_DEBUG("qtaguid[%d]: sk=%p got_sock=%d fam=%d proto=%d\n",
+ parst->hook, sk, got_sock, parst->pf, ipx_proto(skb, par));
+
+ if (!sk) {
+ /*
+ * Here, the qtaguid_find_sk() using connection tracking
+ * couldn't find the owner, so for now we just count them
+ * against the system.
+ */
+ if (do_tag_stat)
+ account_for_uid(skb, sk, 0, par);
+ MT_DEBUG("qtaguid[%d]: leaving (sk=NULL)\n", parst->hook);
+ res = (info->match ^ info->invert) == 0;
+ atomic64_inc(&qtu_events.match_no_sk);
+ goto put_sock_ret_res;
+ } else if (info->match & info->invert & XT_QTAGUID_SOCKET) {
+ res = false;
+ goto put_sock_ret_res;
+ }
+ sock_uid = sk->sk_uid;
+ if (do_tag_stat)
+ account_for_uid(skb, sk, from_kuid(&init_user_ns, sock_uid),
+ par);
+
+ /*
+ * The following two tests fail the match when:
+ * id not in range AND no inverted condition requested
+ * or id in range AND inverted condition requested
+ * Thus (!a && b) || (a && !b) == a ^ b
+ */
+ if (info->match & XT_QTAGUID_UID) {
+ kuid_t uid_min = make_kuid(&init_user_ns, info->uid_min);
+ kuid_t uid_max = make_kuid(&init_user_ns, info->uid_max);
+
+ if ((uid_gte(sock_uid, uid_min) &&
+ uid_lte(sock_uid, uid_max)) ^
+ !(info->invert & XT_QTAGUID_UID)) {
+ MT_DEBUG("qtaguid[%d]: leaving uid not matching\n",
+ parst->hook);
+ res = false;
+ goto put_sock_ret_res;
+ }
+ }
+ if (info->match & XT_QTAGUID_GID) {
+ kgid_t gid_min = make_kgid(&init_user_ns, info->gid_min);
+ kgid_t gid_max = make_kgid(&init_user_ns, info->gid_max);
+ set_sk_callback_lock = true;
+ read_lock_bh(&sk->sk_callback_lock);
+ MT_DEBUG("qtaguid[%d]: sk=%p->sk_socket=%p->file=%p\n",
+ parst->hook, sk, sk->sk_socket,
+ sk->sk_socket ? sk->sk_socket->file : (void *)-1LL);
+ filp = sk->sk_socket ? sk->sk_socket->file : NULL;
+ if (!filp) {
+ res = ((info->match ^ info->invert) &
+ XT_QTAGUID_GID) == 0;
+ atomic64_inc(&qtu_events.match_no_sk_gid);
+ goto put_sock_ret_res;
+ }
+ MT_DEBUG("qtaguid[%d]: filp...uid=%u\n",
+ parst->hook, filp ?
+ from_kuid(&init_user_ns, filp->f_cred->fsuid) : -1);
+ if ((gid_gte(filp->f_cred->fsgid, gid_min) &&
+ gid_lte(filp->f_cred->fsgid, gid_max)) ^
+ !(info->invert & XT_QTAGUID_GID)) {
+ MT_DEBUG("qtaguid[%d]: leaving gid not matching\n",
+ parst->hook);
+ res = false;
+ goto put_sock_ret_res;
+ }
+ }
+ MT_DEBUG("qtaguid[%d]: leaving matched\n", parst->hook);
+ res = true;
+
+put_sock_ret_res:
+ if (got_sock)
+ sock_gen_put(sk);
+ if (set_sk_callback_lock)
+ read_unlock_bh(&sk->sk_callback_lock);
+ret_res:
+ MT_DEBUG("qtaguid[%d]: left %d\n", parst->hook, res);
+ return res;
+}
+
+#ifdef DDEBUG
+/*
+ * This function is not in xt_qtaguid_print.c because of locks visibility.
+ * The lock of sock_tag_list must be aquired before calling this function
+ */
+static void prdebug_full_state_locked(int indent_level, const char *fmt, ...)
+{
+ va_list args;
+ char *fmt_buff;
+ char *buff;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ fmt_buff = kasprintf(GFP_ATOMIC,
+ "qtaguid: %s(): %s {\n", __func__, fmt);
+ BUG_ON(!fmt_buff);
+ va_start(args, fmt);
+ buff = kvasprintf(GFP_ATOMIC,
+ fmt_buff, args);
+ BUG_ON(!buff);
+ pr_debug("%s", buff);
+ kfree(fmt_buff);
+ kfree(buff);
+ va_end(args);
+
+ prdebug_sock_tag_tree(indent_level, &sock_tag_tree);
+
+ spin_lock_bh(&uid_tag_data_tree_lock);
+ prdebug_uid_tag_data_tree(indent_level, &uid_tag_data_tree);
+ prdebug_proc_qtu_data_tree(indent_level, &proc_qtu_data_tree);
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+
+ spin_lock_bh(&iface_stat_list_lock);
+ prdebug_iface_stat_list(indent_level, &iface_stat_list);
+ spin_unlock_bh(&iface_stat_list_lock);
+
+ pr_debug("qtaguid: %s(): }\n", __func__);
+}
+#else
+static void prdebug_full_state_locked(int indent_level, const char *fmt, ...) {}
+#endif
+
+struct proc_ctrl_print_info {
+ struct sock *sk; /* socket found by reading to sk_pos */
+ loff_t sk_pos;
+};
+
+static void *qtaguid_ctrl_proc_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct proc_ctrl_print_info *pcpi = m->private;
+ struct sock_tag *sock_tag_entry = v;
+ struct rb_node *node;
+
+ (*pos)++;
+
+ if (!v || v == SEQ_START_TOKEN)
+ return NULL;
+
+ node = rb_next(&sock_tag_entry->sock_node);
+ if (!node) {
+ pcpi->sk = NULL;
+ sock_tag_entry = SEQ_START_TOKEN;
+ } else {
+ sock_tag_entry = rb_entry(node, struct sock_tag, sock_node);
+ pcpi->sk = sock_tag_entry->sk;
+ }
+ pcpi->sk_pos = *pos;
+ return sock_tag_entry;
+}
+
+static void *qtaguid_ctrl_proc_start(struct seq_file *m, loff_t *pos)
+{
+ struct proc_ctrl_print_info *pcpi = m->private;
+ struct sock_tag *sock_tag_entry;
+ struct rb_node *node;
+
+ spin_lock_bh(&sock_tag_list_lock);
+
+ if (unlikely(module_passive))
+ return NULL;
+
+ if (*pos == 0) {
+ pcpi->sk_pos = 0;
+ node = rb_first(&sock_tag_tree);
+ if (!node) {
+ pcpi->sk = NULL;
+ return SEQ_START_TOKEN;
+ }
+ sock_tag_entry = rb_entry(node, struct sock_tag, sock_node);
+ pcpi->sk = sock_tag_entry->sk;
+ } else {
+ sock_tag_entry = (pcpi->sk ? get_sock_stat_nl(pcpi->sk) :
+ NULL) ?: SEQ_START_TOKEN;
+ if (*pos != pcpi->sk_pos) {
+ /* seq_read skipped a next call */
+ *pos = pcpi->sk_pos;
+ return qtaguid_ctrl_proc_next(m, sock_tag_entry, pos);
+ }
+ }
+ return sock_tag_entry;
+}
+
+static void qtaguid_ctrl_proc_stop(struct seq_file *m, void *v)
+{
+ spin_unlock_bh(&sock_tag_list_lock);
+}
+
+/*
+ * Procfs reader to get all active socket tags using style "1)" as described in
+ * fs/proc/generic.c
+ */
+static int qtaguid_ctrl_proc_show(struct seq_file *m, void *v)
+{
+ struct sock_tag *sock_tag_entry = v;
+ uid_t uid;
+
+ CT_DEBUG("qtaguid: proc ctrl pid=%u tgid=%u uid=%u\n",
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+
+ if (sock_tag_entry != SEQ_START_TOKEN) {
+ int sk_ref_count;
+ uid = get_uid_from_tag(sock_tag_entry->tag);
+ CT_DEBUG("qtaguid: proc_read(): sk=%p tag=0x%llx (uid=%u) "
+ "pid=%u\n",
+ sock_tag_entry->sk,
+ sock_tag_entry->tag,
+ uid,
+ sock_tag_entry->pid
+ );
+ sk_ref_count = refcount_read(
+ &sock_tag_entry->sk->sk_refcnt);
+ seq_printf(m, "sock=%pK tag=0x%llx (uid=%u) pid=%u "
+ "f_count=%d\n",
+ sock_tag_entry->sk,
+ sock_tag_entry->tag, uid,
+ sock_tag_entry->pid, sk_ref_count);
+ } else {
+ seq_printf(m, "events: sockets_tagged=%llu "
+ "sockets_untagged=%llu "
+ "counter_set_changes=%llu "
+ "delete_cmds=%llu "
+ "iface_events=%llu "
+ "match_calls=%llu "
+ "match_calls_prepost=%llu "
+ "match_found_sk=%llu "
+ "match_found_sk_in_ct=%llu "
+ "match_found_no_sk_in_ct=%llu "
+ "match_no_sk=%llu "
+ "match_no_sk_gid=%llu\n",
+ (u64)atomic64_read(&qtu_events.sockets_tagged),
+ (u64)atomic64_read(&qtu_events.sockets_untagged),
+ (u64)atomic64_read(&qtu_events.counter_set_changes),
+ (u64)atomic64_read(&qtu_events.delete_cmds),
+ (u64)atomic64_read(&qtu_events.iface_events),
+ (u64)atomic64_read(&qtu_events.match_calls),
+ (u64)atomic64_read(&qtu_events.match_calls_prepost),
+ (u64)atomic64_read(&qtu_events.match_found_sk),
+ (u64)atomic64_read(&qtu_events.match_found_sk_in_ct),
+ (u64)atomic64_read(&qtu_events.match_found_no_sk_in_ct),
+ (u64)atomic64_read(&qtu_events.match_no_sk),
+ (u64)atomic64_read(&qtu_events.match_no_sk_gid));
+
+ /* Count the following as part of the last item_index. No need
+ * to lock the sock_tag_list here since it is already locked when
+ * starting the seq_file operation
+ */
+ prdebug_full_state_locked(0, "proc ctrl");
+ }
+
+ return 0;
+}
+
+/*
+ * Delete socket tags, and stat tags associated with a given
+ * accouting tag and uid.
+ */
+static int ctrl_cmd_delete(const char *input)
+{
+ char cmd;
+ int uid_int;
+ kuid_t uid;
+ uid_t entry_uid;
+ tag_t acct_tag;
+ tag_t tag;
+ int res, argc;
+ struct iface_stat *iface_entry;
+ struct rb_node *node;
+ struct sock_tag *st_entry;
+ struct rb_root st_to_free_tree = RB_ROOT;
+ struct tag_stat *ts_entry;
+ struct tag_counter_set *tcs_entry;
+ struct tag_ref *tr_entry;
+ struct uid_tag_data *utd_entry;
+
+ argc = sscanf(input, "%c %llu %u", &cmd, &acct_tag, &uid_int);
+ uid = make_kuid(&init_user_ns, uid_int);
+ CT_DEBUG("qtaguid: ctrl_delete(%s): argc=%d cmd=%c "
+ "user_tag=0x%llx uid=%u\n", input, argc, cmd,
+ acct_tag, uid_int);
+ if (argc < 2) {
+ res = -EINVAL;
+ goto err;
+ }
+ if (!valid_atag(acct_tag)) {
+ pr_info("qtaguid: ctrl_delete(%s): invalid tag\n", input);
+ res = -EINVAL;
+ goto err;
+ }
+ if (argc < 3) {
+ uid = current_fsuid();
+ uid_int = from_kuid(&init_user_ns, uid);
+ } else if (!can_impersonate_uid(uid)) {
+ pr_info("qtaguid: ctrl_delete(%s): "
+ "insufficient priv from pid=%u tgid=%u uid=%u\n",
+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+ res = -EPERM;
+ goto err;
+ }
+
+ tag = combine_atag_with_uid(acct_tag, uid_int);
+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
+ "looking for tag=0x%llx (uid=%u)\n",
+ input, tag, uid_int);
+
+ /* Delete socket tags */
+ spin_lock_bh(&sock_tag_list_lock);
+ spin_lock_bh(&uid_tag_data_tree_lock);
+ node = rb_first(&sock_tag_tree);
+ while (node) {
+ st_entry = rb_entry(node, struct sock_tag, sock_node);
+ entry_uid = get_uid_from_tag(st_entry->tag);
+ node = rb_next(node);
+ if (entry_uid != uid_int)
+ continue;
+
+ CT_DEBUG("qtaguid: ctrl_delete(%s): st tag=0x%llx (uid=%u)\n",
+ input, st_entry->tag, entry_uid);
+
+ if (!acct_tag || st_entry->tag == tag) {
+ rb_erase(&st_entry->sock_node, &sock_tag_tree);
+ /* Can't sockfd_put() within spinlock, do it later. */
+ sock_tag_tree_insert(st_entry, &st_to_free_tree);
+ tr_entry = lookup_tag_ref(st_entry->tag, NULL);
+ BUG_ON(tr_entry->num_sock_tags <= 0);
+ tr_entry->num_sock_tags--;
+ /*
+ * TODO: remove if, and start failing.
+ * This is a hack to work around the fact that in some
+ * places we have "if (IS_ERR_OR_NULL(pqd_entry))"
+ * and are trying to work around apps
+ * that didn't open the /dev/xt_qtaguid.
+ */
+ if (st_entry->list.next && st_entry->list.prev)
+ list_del(&st_entry->list);
+ }
+ }
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ spin_unlock_bh(&sock_tag_list_lock);
+
+ sock_tag_tree_erase(&st_to_free_tree);
+
+ /* Delete tag counter-sets */
+ spin_lock_bh(&tag_counter_set_list_lock);
+ /* Counter sets are only on the uid tag, not full tag */
+ tcs_entry = tag_counter_set_tree_search(&tag_counter_set_tree, tag);
+ if (tcs_entry) {
+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
+ "erase tcs: tag=0x%llx (uid=%u) set=%d\n",
+ input,
+ tcs_entry->tn.tag,
+ get_uid_from_tag(tcs_entry->tn.tag),
+ tcs_entry->active_set);
+ rb_erase(&tcs_entry->tn.node, &tag_counter_set_tree);
+ kfree(tcs_entry);
+ }
+ spin_unlock_bh(&tag_counter_set_list_lock);
+
+ /*
+ * If acct_tag is 0, then all entries belonging to uid are
+ * erased.
+ */
+ spin_lock_bh(&iface_stat_list_lock);
+ list_for_each_entry(iface_entry, &iface_stat_list, list) {
+ spin_lock_bh(&iface_entry->tag_stat_list_lock);
+ node = rb_first(&iface_entry->tag_stat_tree);
+ while (node) {
+ ts_entry = rb_entry(node, struct tag_stat, tn.node);
+ entry_uid = get_uid_from_tag(ts_entry->tn.tag);
+ node = rb_next(node);
+
+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
+ "ts tag=0x%llx (uid=%u)\n",
+ input, ts_entry->tn.tag, entry_uid);
+
+ if (entry_uid != uid_int)
+ continue;
+ if (!acct_tag || ts_entry->tn.tag == tag) {
+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
+ "erase ts: %s 0x%llx %u\n",
+ input, iface_entry->ifname,
+ get_atag_from_tag(ts_entry->tn.tag),
+ entry_uid);
+ rb_erase(&ts_entry->tn.node,
+ &iface_entry->tag_stat_tree);
+ kfree(ts_entry);
+ }
+ }
+ spin_unlock_bh(&iface_entry->tag_stat_list_lock);
+ }
+ spin_unlock_bh(&iface_stat_list_lock);
+
+ /* Cleanup the uid_tag_data */
+ spin_lock_bh(&uid_tag_data_tree_lock);
+ node = rb_first(&uid_tag_data_tree);
+ while (node) {
+ utd_entry = rb_entry(node, struct uid_tag_data, node);
+ entry_uid = utd_entry->uid;
+ node = rb_next(node);
+
+ CT_DEBUG("qtaguid: ctrl_delete(%s): "
+ "utd uid=%u\n",
+ input, entry_uid);
+
+ if (entry_uid != uid_int)
+ continue;
+ /*
+ * Go over the tag_refs, and those that don't have
+ * sock_tags using them are freed.
+ */
+ put_tag_ref_tree(tag, utd_entry);
+ put_utd_entry(utd_entry);
+ }
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+
+ atomic64_inc(&qtu_events.delete_cmds);
+ res = 0;
+
+err:
+ return res;
+}
+
+static int ctrl_cmd_counter_set(const char *input)
+{
+ char cmd;
+ uid_t uid = 0;
+ tag_t tag;
+ int res, argc;
+ struct tag_counter_set *tcs;
+ int counter_set;
+
+ argc = sscanf(input, "%c %d %u", &cmd, &counter_set, &uid);
+ CT_DEBUG("qtaguid: ctrl_counterset(%s): argc=%d cmd=%c "
+ "set=%d uid=%u\n", input, argc, cmd,
+ counter_set, uid);
+ if (argc != 3) {
+ res = -EINVAL;
+ goto err;
+ }
+ if (counter_set < 0 || counter_set >= IFS_MAX_COUNTER_SETS) {
+ pr_info("qtaguid: ctrl_counterset(%s): invalid counter_set range\n",
+ input);
+ res = -EINVAL;
+ goto err;
+ }
+ if (!can_manipulate_uids()) {
+ pr_info("qtaguid: ctrl_counterset(%s): "
+ "insufficient priv from pid=%u tgid=%u uid=%u\n",
+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+ res = -EPERM;
+ goto err;
+ }
+
+ tag = make_tag_from_uid(uid);
+ spin_lock_bh(&tag_counter_set_list_lock);
+ tcs = tag_counter_set_tree_search(&tag_counter_set_tree, tag);
+ if (!tcs) {
+ tcs = kzalloc(sizeof(*tcs), GFP_ATOMIC);
+ if (!tcs) {
+ spin_unlock_bh(&tag_counter_set_list_lock);
+ pr_err("qtaguid: ctrl_counterset(%s): "
+ "failed to alloc counter set\n",
+ input);
+ res = -ENOMEM;
+ goto err;
+ }
+ tcs->tn.tag = tag;
+ tag_counter_set_tree_insert(tcs, &tag_counter_set_tree);
+ CT_DEBUG("qtaguid: ctrl_counterset(%s): added tcs tag=0x%llx "
+ "(uid=%u) set=%d\n",
+ input, tag, get_uid_from_tag(tag), counter_set);
+ }
+ tcs->active_set = counter_set;
+ spin_unlock_bh(&tag_counter_set_list_lock);
+ atomic64_inc(&qtu_events.counter_set_changes);
+ res = 0;
+
+err:
+ return res;
+}
+
+static int ctrl_cmd_tag(const char *input)
+{
+ char cmd;
+ int sock_fd = 0;
+ kuid_t uid;
+ unsigned int uid_int = 0;
+ tag_t acct_tag = make_atag_from_value(0);
+ tag_t full_tag;
+ struct socket *el_socket;
+ int res, argc;
+ struct sock_tag *sock_tag_entry;
+ struct tag_ref *tag_ref_entry;
+ struct uid_tag_data *uid_tag_data_entry;
+ struct proc_qtu_data *pqd_entry;
+
+ /* Unassigned args will get defaulted later. */
+ argc = sscanf(input, "%c %d %llu %u", &cmd, &sock_fd, &acct_tag, &uid_int);
+ uid = make_kuid(&init_user_ns, uid_int);
+ CT_DEBUG("qtaguid: ctrl_tag(%s): argc=%d cmd=%c sock_fd=%d "
+ "acct_tag=0x%llx uid=%u\n", input, argc, cmd, sock_fd,
+ acct_tag, uid_int);
+ if (argc < 2) {
+ res = -EINVAL;
+ goto err;
+ }
+ el_socket = sockfd_lookup(sock_fd, &res); /* This locks the file */
+ if (!el_socket) {
+ pr_info("qtaguid: ctrl_tag(%s): failed to lookup"
+ " sock_fd=%d err=%d pid=%u tgid=%u uid=%u\n",
+ input, sock_fd, res, current->pid, current->tgid,
+ from_kuid(&init_user_ns, current_fsuid()));
+ goto err;
+ }
+ CT_DEBUG("qtaguid: ctrl_tag(%s): socket->...->sk_refcnt=%d ->sk=%p\n",
+ input, refcount_read(&el_socket->sk->sk_refcnt),
+ el_socket->sk);
+ if (argc < 3) {
+ acct_tag = make_atag_from_value(0);
+ } else if (!valid_atag(acct_tag)) {
+ pr_info("qtaguid: ctrl_tag(%s): invalid tag\n", input);
+ res = -EINVAL;
+ goto err_put;
+ }
+ CT_DEBUG("qtaguid: ctrl_tag(%s): "
+ "pid=%u tgid=%u uid=%u euid=%u fsuid=%u "
+ "ctrl.gid=%u in_group()=%d in_egroup()=%d\n",
+ input, current->pid, current->tgid,
+ from_kuid(&init_user_ns, current_uid()),
+ from_kuid(&init_user_ns, current_euid()),
+ from_kuid(&init_user_ns, current_fsuid()),
+ from_kgid(&init_user_ns, xt_qtaguid_ctrl_file->gid),
+ in_group_p(xt_qtaguid_ctrl_file->gid),
+ in_egroup_p(xt_qtaguid_ctrl_file->gid));
+ if (argc < 4) {
+ uid = current_fsuid();
+ uid_int = from_kuid(&init_user_ns, uid);
+ } else if (!can_impersonate_uid(uid)) {
+ pr_info("qtaguid: ctrl_tag(%s): "
+ "insufficient priv from pid=%u tgid=%u uid=%u\n",
+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+ res = -EPERM;
+ goto err_put;
+ }
+ full_tag = combine_atag_with_uid(acct_tag, uid_int);
+
+ spin_lock_bh(&sock_tag_list_lock);
+ spin_lock_bh(&uid_tag_data_tree_lock);
+ sock_tag_entry = get_sock_stat_nl(el_socket->sk);
+ tag_ref_entry = get_tag_ref(full_tag, &uid_tag_data_entry);
+ if (IS_ERR(tag_ref_entry)) {
+ res = PTR_ERR(tag_ref_entry);
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ spin_unlock_bh(&sock_tag_list_lock);
+ goto err_put;
+ }
+ tag_ref_entry->num_sock_tags++;
+ if (sock_tag_entry) {
+ struct tag_ref *prev_tag_ref_entry;
+
+ CT_DEBUG("qtaguid: ctrl_tag(%s): retag for sk=%p "
+ "st@%p ...->sk_refcnt=%d\n",
+ input, el_socket->sk, sock_tag_entry,
+ refcount_read(&el_socket->sk->sk_refcnt));
+ prev_tag_ref_entry = lookup_tag_ref(sock_tag_entry->tag,
+ &uid_tag_data_entry);
+ BUG_ON(IS_ERR_OR_NULL(prev_tag_ref_entry));
+ BUG_ON(prev_tag_ref_entry->num_sock_tags <= 0);
+ prev_tag_ref_entry->num_sock_tags--;
+ sock_tag_entry->tag = full_tag;
+ } else {
+ CT_DEBUG("qtaguid: ctrl_tag(%s): newtag for sk=%p\n",
+ input, el_socket->sk);
+ sock_tag_entry = kzalloc(sizeof(*sock_tag_entry),
+ GFP_ATOMIC);
+ if (!sock_tag_entry) {
+ pr_err("qtaguid: ctrl_tag(%s): "
+ "socket tag alloc failed\n",
+ input);
+ BUG_ON(tag_ref_entry->num_sock_tags <= 0);
+ tag_ref_entry->num_sock_tags--;
+ free_tag_ref_from_utd_entry(tag_ref_entry,
+ uid_tag_data_entry);
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ spin_unlock_bh(&sock_tag_list_lock);
+ res = -ENOMEM;
+ goto err_put;
+ }
+ /*
+ * Hold the sk refcount here to make sure the sk pointer cannot
+ * be freed and reused
+ */
+ sock_hold(el_socket->sk);
+ sock_tag_entry->sk = el_socket->sk;
+ sock_tag_entry->pid = current->tgid;
+ sock_tag_entry->tag = combine_atag_with_uid(acct_tag, uid_int);
+ pqd_entry = proc_qtu_data_tree_search(
+ &proc_qtu_data_tree, current->tgid);
+ /*
+ * TODO: remove if, and start failing.
+ * At first, we want to catch user-space code that is not
+ * opening the /dev/xt_qtaguid.
+ */
+ if (IS_ERR_OR_NULL(pqd_entry))
+ pr_warn_once(
+ "qtaguid: %s(): "
+ "User space forgot to open /dev/xt_qtaguid? "
+ "pid=%u tgid=%u uid=%u\n", __func__,
+ current->pid, current->tgid,
+ from_kuid(&init_user_ns, current_fsuid()));
+ else
+ list_add(&sock_tag_entry->list,
+ &pqd_entry->sock_tag_list);
+
+ sock_tag_tree_insert(sock_tag_entry, &sock_tag_tree);
+ atomic64_inc(&qtu_events.sockets_tagged);
+ }
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ spin_unlock_bh(&sock_tag_list_lock);
+ /* We keep the ref to the sk until it is untagged */
+ CT_DEBUG("qtaguid: ctrl_tag(%s): done st@%p ...->sk_refcnt=%d\n",
+ input, sock_tag_entry,
+ refcount_read(&el_socket->sk->sk_refcnt));
+ sockfd_put(el_socket);
+ return 0;
+
+err_put:
+ CT_DEBUG("qtaguid: ctrl_tag(%s): done. ...->sk_refcnt=%d\n",
+ input, refcount_read(&el_socket->sk->sk_refcnt) - 1);
+ /* Release the sock_fd that was grabbed by sockfd_lookup(). */
+ sockfd_put(el_socket);
+ return res;
+
+err:
+ CT_DEBUG("qtaguid: ctrl_tag(%s): done.\n", input);
+ return res;
+}
+
+static int ctrl_cmd_untag(const char *input)
+{
+ char cmd;
+ int sock_fd = 0;
+ struct socket *el_socket;
+ int res, argc;
+
+ argc = sscanf(input, "%c %d", &cmd, &sock_fd);
+ CT_DEBUG("qtaguid: ctrl_untag(%s): argc=%d cmd=%c sock_fd=%d\n",
+ input, argc, cmd, sock_fd);
+ if (argc < 2) {
+ res = -EINVAL;
+ return res;
+ }
+ el_socket = sockfd_lookup(sock_fd, &res); /* This locks the file */
+ if (!el_socket) {
+ pr_info("qtaguid: ctrl_untag(%s): failed to lookup"
+ " sock_fd=%d err=%d pid=%u tgid=%u uid=%u\n",
+ input, sock_fd, res, current->pid, current->tgid,
+ from_kuid(&init_user_ns, current_fsuid()));
+ return res;
+ }
+ CT_DEBUG("qtaguid: ctrl_untag(%s): socket->...->f_count=%ld ->sk=%p\n",
+ input, atomic_long_read(&el_socket->file->f_count),
+ el_socket->sk);
+ res = qtaguid_untag(el_socket, false);
+ sockfd_put(el_socket);
+ return res;
+}
+
+int qtaguid_untag(struct socket *el_socket, bool kernel)
+{
+ int res;
+ pid_t pid;
+ struct sock_tag *sock_tag_entry;
+ struct tag_ref *tag_ref_entry;
+ struct uid_tag_data *utd_entry;
+ struct proc_qtu_data *pqd_entry;
+
+ spin_lock_bh(&sock_tag_list_lock);
+ sock_tag_entry = get_sock_stat_nl(el_socket->sk);
+ if (!sock_tag_entry) {
+ spin_unlock_bh(&sock_tag_list_lock);
+ res = -EINVAL;
+ return res;
+ }
+ /*
+ * The socket already belongs to the current process
+ * so it can do whatever it wants to it.
+ */
+ rb_erase(&sock_tag_entry->sock_node, &sock_tag_tree);
+
+ tag_ref_entry = lookup_tag_ref(sock_tag_entry->tag, &utd_entry);
+ BUG_ON(!tag_ref_entry);
+ BUG_ON(tag_ref_entry->num_sock_tags <= 0);
+ spin_lock_bh(&uid_tag_data_tree_lock);
+ if (kernel)
+ pid = sock_tag_entry->pid;
+ else
+ pid = current->tgid;
+ pqd_entry = proc_qtu_data_tree_search(
+ &proc_qtu_data_tree, pid);
+ /*
+ * TODO: remove if, and start failing.
+ * At first, we want to catch user-space code that is not
+ * opening the /dev/xt_qtaguid.
+ */
+ if (IS_ERR_OR_NULL(pqd_entry) || !sock_tag_entry->list.next) {
+ pr_warn_once("qtaguid: %s(): "
+ "User space forgot to open /dev/xt_qtaguid? "
+ "pid=%u tgid=%u sk_pid=%u, uid=%u\n", __func__,
+ current->pid, current->tgid, sock_tag_entry->pid,
+ from_kuid(&init_user_ns, current_fsuid()));
+ } else {
+ list_del(&sock_tag_entry->list);
+ }
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ /*
+ * We don't free tag_ref from the utd_entry here,
+ * only during a cmd_delete().
+ */
+ tag_ref_entry->num_sock_tags--;
+ spin_unlock_bh(&sock_tag_list_lock);
+ /*
+ * Release the sock_fd that was grabbed at tag time.
+ */
+ sock_put(sock_tag_entry->sk);
+ CT_DEBUG("qtaguid: done. st@%p ...->sk_refcnt=%d\n",
+ sock_tag_entry,
+ refcount_read(&el_socket->sk->sk_refcnt));
+
+ kfree(sock_tag_entry);
+ atomic64_inc(&qtu_events.sockets_untagged);
+
+ return 0;
+}
+
+static ssize_t qtaguid_ctrl_parse(const char *input, size_t count)
+{
+ char cmd;
+ ssize_t res;
+
+ CT_DEBUG("qtaguid: ctrl(%s): pid=%u tgid=%u uid=%u\n",
+ input, current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+
+ cmd = input[0];
+ /* Collect params for commands */
+ switch (cmd) {
+ case 'd':
+ res = ctrl_cmd_delete(input);
+ break;
+
+ case 's':
+ res = ctrl_cmd_counter_set(input);
+ break;
+
+ case 't':
+ res = ctrl_cmd_tag(input);
+ break;
+
+ case 'u':
+ res = ctrl_cmd_untag(input);
+ break;
+
+ default:
+ res = -EINVAL;
+ goto err;
+ }
+ if (!res)
+ res = count;
+err:
+ CT_DEBUG("qtaguid: ctrl(%s): res=%zd\n", input, res);
+ return res;
+}
+
+#define MAX_QTAGUID_CTRL_INPUT_LEN 255
+static ssize_t qtaguid_ctrl_proc_write(struct file *file, const char __user *buffer,
+ size_t count, loff_t *offp)
+{
+ char input_buf[MAX_QTAGUID_CTRL_INPUT_LEN];
+
+ if (unlikely(module_passive))
+ return count;
+
+ if (count >= MAX_QTAGUID_CTRL_INPUT_LEN)
+ return -EINVAL;
+
+ if (copy_from_user(input_buf, buffer, count))
+ return -EFAULT;
+
+ input_buf[count] = '\0';
+ return qtaguid_ctrl_parse(input_buf, count);
+}
+
+struct proc_print_info {
+ struct iface_stat *iface_entry;
+ int item_index;
+ tag_t tag; /* tag found by reading to tag_pos */
+ off_t tag_pos;
+ int tag_item_index;
+};
+
+static void pp_stats_header(struct seq_file *m)
+{
+ seq_puts(m,
+ "idx iface acct_tag_hex uid_tag_int cnt_set "
+ "rx_bytes rx_packets "
+ "tx_bytes tx_packets "
+ "rx_tcp_bytes rx_tcp_packets "
+ "rx_udp_bytes rx_udp_packets "
+ "rx_other_bytes rx_other_packets "
+ "tx_tcp_bytes tx_tcp_packets "
+ "tx_udp_bytes tx_udp_packets "
+ "tx_other_bytes tx_other_packets\n");
+}
+
+static int pp_stats_line(struct seq_file *m, struct tag_stat *ts_entry,
+ int cnt_set)
+{
+ struct data_counters *cnts;
+ tag_t tag = ts_entry->tn.tag;
+ uid_t stat_uid = get_uid_from_tag(tag);
+ struct proc_print_info *ppi = m->private;
+ /* Detailed tags are not available to everybody */
+ if (!can_read_other_uid_stats(make_kuid(&init_user_ns,stat_uid))) {
+ CT_DEBUG("qtaguid: stats line: "
+ "%s 0x%llx %u: insufficient priv "
+ "from pid=%u tgid=%u uid=%u stats.gid=%u\n",
+ ppi->iface_entry->ifname,
+ get_atag_from_tag(tag), stat_uid,
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()),
+ from_kgid(&init_user_ns,xt_qtaguid_stats_file->gid));
+ return 0;
+ }
+ ppi->item_index++;
+ cnts = &ts_entry->counters;
+ seq_printf(m, "%d %s 0x%llx %u %u "
+ "%llu %llu "
+ "%llu %llu "
+ "%llu %llu "
+ "%llu %llu "
+ "%llu %llu "
+ "%llu %llu "
+ "%llu %llu "
+ "%llu %llu\n",
+ ppi->item_index,
+ ppi->iface_entry->ifname,
+ get_atag_from_tag(tag),
+ stat_uid,
+ cnt_set,
+ dc_sum_bytes(cnts, cnt_set, IFS_RX),
+ dc_sum_packets(cnts, cnt_set, IFS_RX),
+ dc_sum_bytes(cnts, cnt_set, IFS_TX),
+ dc_sum_packets(cnts, cnt_set, IFS_TX),
+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].bytes,
+ cnts->bpc[cnt_set][IFS_RX][IFS_TCP].packets,
+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].bytes,
+ cnts->bpc[cnt_set][IFS_RX][IFS_UDP].packets,
+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].bytes,
+ cnts->bpc[cnt_set][IFS_RX][IFS_PROTO_OTHER].packets,
+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].bytes,
+ cnts->bpc[cnt_set][IFS_TX][IFS_TCP].packets,
+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].bytes,
+ cnts->bpc[cnt_set][IFS_TX][IFS_UDP].packets,
+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].bytes,
+ cnts->bpc[cnt_set][IFS_TX][IFS_PROTO_OTHER].packets);
+ return seq_has_overflowed(m) ? -ENOSPC : 1;
+}
+
+static bool pp_sets(struct seq_file *m, struct tag_stat *ts_entry)
+{
+ int ret;
+ int counter_set;
+ for (counter_set = 0; counter_set < IFS_MAX_COUNTER_SETS;
+ counter_set++) {
+ ret = pp_stats_line(m, ts_entry, counter_set);
+ if (ret < 0)
+ return false;
+ }
+ return true;
+}
+
+static int qtaguid_stats_proc_iface_stat_ptr_valid(struct iface_stat *ptr)
+{
+ struct iface_stat *iface_entry;
+
+ if (!ptr)
+ return false;
+
+ list_for_each_entry(iface_entry, &iface_stat_list, list)
+ if (iface_entry == ptr)
+ return true;
+ return false;
+}
+
+static void qtaguid_stats_proc_next_iface_entry(struct proc_print_info *ppi)
+{
+ spin_unlock_bh(&ppi->iface_entry->tag_stat_list_lock);
+ list_for_each_entry_continue(ppi->iface_entry, &iface_stat_list, list) {
+ spin_lock_bh(&ppi->iface_entry->tag_stat_list_lock);
+ return;
+ }
+ ppi->iface_entry = NULL;
+}
+
+static void *qtaguid_stats_proc_next(struct seq_file *m, void *v, loff_t *pos)
+{
+ struct proc_print_info *ppi = m->private;
+ struct tag_stat *ts_entry;
+ struct rb_node *node;
+
+ if (!v) {
+ pr_err("qtaguid: %s(): unexpected v: NULL\n", __func__);
+ return NULL;
+ }
+
+ (*pos)++;
+
+ if (!ppi->iface_entry || unlikely(module_passive))
+ return NULL;
+
+ if (v == SEQ_START_TOKEN)
+ node = rb_first(&ppi->iface_entry->tag_stat_tree);
+ else
+ node = rb_next(&((struct tag_stat *)v)->tn.node);
+
+ while (!node) {
+ qtaguid_stats_proc_next_iface_entry(ppi);
+ if (!ppi->iface_entry)
+ return NULL;
+ node = rb_first(&ppi->iface_entry->tag_stat_tree);
+ }
+
+ ts_entry = rb_entry(node, struct tag_stat, tn.node);
+ ppi->tag = ts_entry->tn.tag;
+ ppi->tag_pos = *pos;
+ ppi->tag_item_index = ppi->item_index;
+ return ts_entry;
+}
+
+static void *qtaguid_stats_proc_start(struct seq_file *m, loff_t *pos)
+{
+ struct proc_print_info *ppi = m->private;
+ struct tag_stat *ts_entry = NULL;
+
+ spin_lock_bh(&iface_stat_list_lock);
+
+ if (*pos == 0) {
+ ppi->item_index = 1;
+ ppi->tag_pos = 0;
+ if (list_empty(&iface_stat_list)) {
+ ppi->iface_entry = NULL;
+ } else {
+ ppi->iface_entry = list_first_entry(&iface_stat_list,
+ struct iface_stat,
+ list);
+ spin_lock_bh(&ppi->iface_entry->tag_stat_list_lock);
+ }
+ return SEQ_START_TOKEN;
+ }
+ if (!qtaguid_stats_proc_iface_stat_ptr_valid(ppi->iface_entry)) {
+ if (ppi->iface_entry) {
+ pr_err("qtaguid: %s(): iface_entry %p not found\n",
+ __func__, ppi->iface_entry);
+ ppi->iface_entry = NULL;
+ }
+ return NULL;
+ }
+
+ spin_lock_bh(&ppi->iface_entry->tag_stat_list_lock);
+
+ if (!ppi->tag_pos) {
+ /* seq_read skipped first next call */
+ ts_entry = SEQ_START_TOKEN;
+ } else {
+ ts_entry = tag_stat_tree_search(
+ &ppi->iface_entry->tag_stat_tree, ppi->tag);
+ if (!ts_entry) {
+ pr_info("qtaguid: %s(): tag_stat.tag 0x%llx not found. Abort.\n",
+ __func__, ppi->tag);
+ return NULL;
+ }
+ }
+
+ if (*pos == ppi->tag_pos) { /* normal resume */
+ ppi->item_index = ppi->tag_item_index;
+ } else {
+ /* seq_read skipped a next call */
+ *pos = ppi->tag_pos;
+ ts_entry = qtaguid_stats_proc_next(m, ts_entry, pos);
+ }
+
+ return ts_entry;
+}
+
+static void qtaguid_stats_proc_stop(struct seq_file *m, void *v)
+{
+ struct proc_print_info *ppi = m->private;
+ if (ppi->iface_entry)
+ spin_unlock_bh(&ppi->iface_entry->tag_stat_list_lock);
+ spin_unlock_bh(&iface_stat_list_lock);
+}
+
+/*
+ * Procfs reader to get all tag stats using style "1)" as described in
+ * fs/proc/generic.c
+ * Groups all protocols tx/rx bytes.
+ */
+static int qtaguid_stats_proc_show(struct seq_file *m, void *v)
+{
+ struct tag_stat *ts_entry = v;
+
+ if (v == SEQ_START_TOKEN)
+ pp_stats_header(m);
+ else
+ pp_sets(m, ts_entry);
+
+ return 0;
+}
+
+/*------------------------------------------*/
+static int qtudev_open(struct inode *inode, struct file *file)
+{
+ struct uid_tag_data *utd_entry;
+ struct proc_qtu_data *pqd_entry;
+ struct proc_qtu_data *new_pqd_entry;
+ int res;
+ bool utd_entry_found;
+
+ if (unlikely(qtu_proc_handling_passive))
+ return 0;
+
+ DR_DEBUG("qtaguid: qtudev_open(): pid=%u tgid=%u uid=%u\n",
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+
+ spin_lock_bh(&uid_tag_data_tree_lock);
+
+ /* Look for existing uid data, or alloc one. */
+ utd_entry = get_uid_data(from_kuid(&init_user_ns, current_fsuid()), &utd_entry_found);
+ if (IS_ERR_OR_NULL(utd_entry)) {
+ res = PTR_ERR(utd_entry);
+ goto err_unlock;
+ }
+
+ /* Look for existing PID based proc_data */
+ pqd_entry = proc_qtu_data_tree_search(&proc_qtu_data_tree,
+ current->tgid);
+ if (pqd_entry) {
+ pr_err("qtaguid: qtudev_open(): %u/%u %u "
+ "%s already opened\n",
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()),
+ QTU_DEV_NAME);
+ res = -EBUSY;
+ goto err_unlock_free_utd;
+ }
+
+ new_pqd_entry = kzalloc(sizeof(*new_pqd_entry), GFP_ATOMIC);
+ if (!new_pqd_entry) {
+ pr_err("qtaguid: qtudev_open(): %u/%u %u: "
+ "proc data alloc failed\n",
+ current->pid, current->tgid, from_kuid(&init_user_ns, current_fsuid()));
+ res = -ENOMEM;
+ goto err_unlock_free_utd;
+ }
+ new_pqd_entry->pid = current->tgid;
+ INIT_LIST_HEAD(&new_pqd_entry->sock_tag_list);
+ new_pqd_entry->parent_tag_data = utd_entry;
+ utd_entry->num_pqd++;
+
+ proc_qtu_data_tree_insert(new_pqd_entry,
+ &proc_qtu_data_tree);
+
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ DR_DEBUG("qtaguid: tracking data for uid=%u in pqd=%p\n",
+ from_kuid(&init_user_ns, current_fsuid()), new_pqd_entry);
+ file->private_data = new_pqd_entry;
+ return 0;
+
+err_unlock_free_utd:
+ if (!utd_entry_found) {
+ rb_erase(&utd_entry->node, &uid_tag_data_tree);
+ kfree(utd_entry);
+ }
+err_unlock:
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ return res;
+}
+
+static int qtudev_release(struct inode *inode, struct file *file)
+{
+ struct proc_qtu_data *pqd_entry = file->private_data;
+ struct uid_tag_data *utd_entry = pqd_entry->parent_tag_data;
+ struct sock_tag *st_entry;
+ struct rb_root st_to_free_tree = RB_ROOT;
+ struct list_head *entry, *next;
+ struct tag_ref *tr;
+
+ if (unlikely(qtu_proc_handling_passive))
+ return 0;
+
+ /*
+ * Do not trust the current->pid, it might just be a kworker cleaning
+ * up after a dead proc.
+ */
+ DR_DEBUG("qtaguid: qtudev_release(): "
+ "pid=%u tgid=%u uid=%u "
+ "pqd_entry=%p->pid=%u utd_entry=%p->active_tags=%d\n",
+ current->pid, current->tgid, pqd_entry->parent_tag_data->uid,
+ pqd_entry, pqd_entry->pid, utd_entry,
+ utd_entry->num_active_tags);
+
+ spin_lock_bh(&sock_tag_list_lock);
+ spin_lock_bh(&uid_tag_data_tree_lock);
+
+ list_for_each_safe(entry, next, &pqd_entry->sock_tag_list) {
+ st_entry = list_entry(entry, struct sock_tag, list);
+ DR_DEBUG("qtaguid: %s(): "
+ "erase sock_tag=%p->sk=%p pid=%u tgid=%u uid=%u\n",
+ __func__,
+ st_entry, st_entry->sk,
+ current->pid, current->tgid,
+ pqd_entry->parent_tag_data->uid);
+
+ utd_entry = uid_tag_data_tree_search(
+ &uid_tag_data_tree,
+ get_uid_from_tag(st_entry->tag));
+ BUG_ON(IS_ERR_OR_NULL(utd_entry));
+ DR_DEBUG("qtaguid: %s(): "
+ "looking for tag=0x%llx in utd_entry=%p\n", __func__,
+ st_entry->tag, utd_entry);
+ tr = tag_ref_tree_search(&utd_entry->tag_ref_tree,
+ st_entry->tag);
+ BUG_ON(!tr);
+ BUG_ON(tr->num_sock_tags <= 0);
+ tr->num_sock_tags--;
+ free_tag_ref_from_utd_entry(tr, utd_entry);
+
+ rb_erase(&st_entry->sock_node, &sock_tag_tree);
+ list_del(&st_entry->list);
+ /* Can't sockfd_put() within spinlock, do it later. */
+ sock_tag_tree_insert(st_entry, &st_to_free_tree);
+
+ /*
+ * Try to free the utd_entry if no other proc_qtu_data is
+ * using it (num_pqd is 0) and it doesn't have active tags
+ * (num_active_tags is 0).
+ */
+ put_utd_entry(utd_entry);
+ }
+
+ rb_erase(&pqd_entry->node, &proc_qtu_data_tree);
+ BUG_ON(pqd_entry->parent_tag_data->num_pqd < 1);
+ pqd_entry->parent_tag_data->num_pqd--;
+ put_utd_entry(pqd_entry->parent_tag_data);
+ kfree(pqd_entry);
+ file->private_data = NULL;
+
+ spin_unlock_bh(&uid_tag_data_tree_lock);
+ spin_unlock_bh(&sock_tag_list_lock);
+
+
+ sock_tag_tree_erase(&st_to_free_tree);
+
+ spin_lock_bh(&sock_tag_list_lock);
+ prdebug_full_state_locked(0, "%s(): pid=%u tgid=%u", __func__,
+ current->pid, current->tgid);
+ spin_unlock_bh(&sock_tag_list_lock);
+ return 0;
+}
+
+/*------------------------------------------*/
+static const struct file_operations qtudev_fops = {
+ .owner = THIS_MODULE,
+ .open = qtudev_open,
+ .release = qtudev_release,
+};
+
+static struct miscdevice qtu_device = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = QTU_DEV_NAME,
+ .fops = &qtudev_fops,
+ /* How sad it doesn't allow for defaults: .mode = S_IRUGO | S_IWUSR */
+};
+
+static const struct seq_operations proc_qtaguid_ctrl_seqops = {
+ .start = qtaguid_ctrl_proc_start,
+ .next = qtaguid_ctrl_proc_next,
+ .stop = qtaguid_ctrl_proc_stop,
+ .show = qtaguid_ctrl_proc_show,
+};
+
+static int proc_qtaguid_ctrl_open(struct inode *inode, struct file *file)
+{
+ return seq_open_private(file, &proc_qtaguid_ctrl_seqops,
+ sizeof(struct proc_ctrl_print_info));
+}
+
+static const struct file_operations proc_qtaguid_ctrl_fops = {
+ .open = proc_qtaguid_ctrl_open,
+ .read = seq_read,
+ .write = qtaguid_ctrl_proc_write,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+static const struct seq_operations proc_qtaguid_stats_seqops = {
+ .start = qtaguid_stats_proc_start,
+ .next = qtaguid_stats_proc_next,
+ .stop = qtaguid_stats_proc_stop,
+ .show = qtaguid_stats_proc_show,
+};
+
+static int proc_qtaguid_stats_open(struct inode *inode, struct file *file)
+{
+ return seq_open_private(file, &proc_qtaguid_stats_seqops,
+ sizeof(struct proc_print_info));
+}
+
+static const struct file_operations proc_qtaguid_stats_fops = {
+ .open = proc_qtaguid_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = seq_release_private,
+};
+
+/*------------------------------------------*/
+static int __init qtaguid_proc_register(struct proc_dir_entry **res_procdir)
+{
+ int ret;
+ *res_procdir = proc_mkdir(module_procdirname, init_net.proc_net);
+ if (!*res_procdir) {
+ pr_err("qtaguid: failed to create proc/.../xt_qtaguid\n");
+ ret = -ENOMEM;
+ goto no_dir;
+ }
+
+ xt_qtaguid_ctrl_file = proc_create_data("ctrl", proc_ctrl_perms,
+ *res_procdir,
+ &proc_qtaguid_ctrl_fops,
+ NULL);
+ if (!xt_qtaguid_ctrl_file) {
+ pr_err("qtaguid: failed to create xt_qtaguid/ctrl "
+ " file\n");
+ ret = -ENOMEM;
+ goto no_ctrl_entry;
+ }
+
+ xt_qtaguid_stats_file = proc_create_data("stats", proc_stats_perms,
+ *res_procdir,
+ &proc_qtaguid_stats_fops,
+ NULL);
+ if (!xt_qtaguid_stats_file) {
+ pr_err("qtaguid: failed to create xt_qtaguid/stats "
+ "file\n");
+ ret = -ENOMEM;
+ goto no_stats_entry;
+ }
+ /*
+ * TODO: add support counter hacking
+ * xt_qtaguid_stats_file->write_proc = qtaguid_stats_proc_write;
+ */
+ return 0;
+
+no_stats_entry:
+ remove_proc_entry("ctrl", *res_procdir);
+no_ctrl_entry:
+ remove_proc_entry("xt_qtaguid", NULL);
+no_dir:
+ return ret;
+}
+
+static struct xt_match qtaguid_mt_reg __read_mostly = {
+ /*
+ * This module masquerades as the "owner" module so that iptables
+ * tools can deal with it.
+ */
+ .name = "owner",
+ .revision = 1,
+ .family = NFPROTO_UNSPEC,
+ .match = qtaguid_mt,
+ .matchsize = sizeof(struct xt_qtaguid_match_info),
+ .me = THIS_MODULE,
+};
+
+static int __init qtaguid_mt_init(void)
+{
+ if (qtaguid_proc_register(&xt_qtaguid_procdir)
+ || iface_stat_init(xt_qtaguid_procdir)
+ || xt_register_match(&qtaguid_mt_reg)
+ || misc_register(&qtu_device))
+ return -1;
+ return 0;
+}
+
+/*
+ * TODO: allow unloading of the module.
+ * For now stats are permanent.
+ * Kconfig forces'y/n' and never an 'm'.
+ */
+
+module_init(qtaguid_mt_init);
+MODULE_AUTHOR("jpa <jpa@google.com>");
+MODULE_DESCRIPTION("Xtables: socket owner+tag matching and associated stats");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_owner");
+MODULE_ALIAS("ip6t_owner");
+MODULE_ALIAS("ipt_qtaguid");
+MODULE_ALIAS("ip6t_qtaguid");
diff --git a/net/netfilter/xt_qtaguid_internal.h b/net/netfilter/xt_qtaguid_internal.h
new file mode 100644
index 0000000..c705270
--- /dev/null
+++ b/net/netfilter/xt_qtaguid_internal.h
@@ -0,0 +1,350 @@
+/*
+ * Kernel iptables module to track stats for packets based on user tags.
+ *
+ * (C) 2011 Google, Inc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __XT_QTAGUID_INTERNAL_H__
+#define __XT_QTAGUID_INTERNAL_H__
+
+#include <linux/types.h>
+#include <linux/rbtree.h>
+#include <linux/spinlock_types.h>
+#include <linux/workqueue.h>
+
+/* Iface handling */
+#define IDEBUG_MASK (1<<0)
+/* Iptable Matching. Per packet. */
+#define MDEBUG_MASK (1<<1)
+/* Red-black tree handling. Per packet. */
+#define RDEBUG_MASK (1<<2)
+/* procfs ctrl/stats handling */
+#define CDEBUG_MASK (1<<3)
+/* dev and resource tracking */
+#define DDEBUG_MASK (1<<4)
+
+/* E.g (IDEBUG_MASK | CDEBUG_MASK | DDEBUG_MASK) */
+#define DEFAULT_DEBUG_MASK 0
+
+/*
+ * (Un)Define these *DEBUG to compile out/in the pr_debug calls.
+ * All undef: text size ~ 0x3030; all def: ~ 0x4404.
+ */
+#define IDEBUG
+#define MDEBUG
+#define RDEBUG
+#define CDEBUG
+#define DDEBUG
+
+#define MSK_DEBUG(mask, ...) do { \
+ if (unlikely(qtaguid_debug_mask & (mask))) \
+ pr_debug(__VA_ARGS__); \
+ } while (0)
+#ifdef IDEBUG
+#define IF_DEBUG(...) MSK_DEBUG(IDEBUG_MASK, __VA_ARGS__)
+#else
+#define IF_DEBUG(...) no_printk(__VA_ARGS__)
+#endif
+#ifdef MDEBUG
+#define MT_DEBUG(...) MSK_DEBUG(MDEBUG_MASK, __VA_ARGS__)
+#else
+#define MT_DEBUG(...) no_printk(__VA_ARGS__)
+#endif
+#ifdef RDEBUG
+#define RB_DEBUG(...) MSK_DEBUG(RDEBUG_MASK, __VA_ARGS__)
+#else
+#define RB_DEBUG(...) no_printk(__VA_ARGS__)
+#endif
+#ifdef CDEBUG
+#define CT_DEBUG(...) MSK_DEBUG(CDEBUG_MASK, __VA_ARGS__)
+#else
+#define CT_DEBUG(...) no_printk(__VA_ARGS__)
+#endif
+#ifdef DDEBUG
+#define DR_DEBUG(...) MSK_DEBUG(DDEBUG_MASK, __VA_ARGS__)
+#else
+#define DR_DEBUG(...) no_printk(__VA_ARGS__)
+#endif
+
+extern uint qtaguid_debug_mask;
+
+/*---------------------------------------------------------------------------*/
+/*
+ * Tags:
+ *
+ * They represent what the data usage counters will be tracked against.
+ * By default a tag is just based on the UID.
+ * The UID is used as the base for policing, and can not be ignored.
+ * So a tag will always at least represent a UID (uid_tag).
+ *
+ * A tag can be augmented with an "accounting tag" which is associated
+ * with a UID.
+ * User space can set the acct_tag portion of the tag which is then used
+ * with sockets: all data belonging to that socket will be counted against the
+ * tag. The policing is then based on the tag's uid_tag portion,
+ * and stats are collected for the acct_tag portion separately.
+ *
+ * There could be
+ * a: {acct_tag=1, uid_tag=10003}
+ * b: {acct_tag=2, uid_tag=10003}
+ * c: {acct_tag=3, uid_tag=10003}
+ * d: {acct_tag=0, uid_tag=10003}
+ * a, b, and c represent tags associated with specific sockets.
+ * d is for the totals for that uid, including all untagged traffic.
+ * Typically d is used with policing/quota rules.
+ *
+ * We want tag_t big enough to distinguish uid_t and acct_tag.
+ * It might become a struct if needed.
+ * Nothing should be using it as an int.
+ */
+typedef uint64_t tag_t; /* Only used via accessors */
+
+#define TAG_UID_MASK 0xFFFFFFFFULL
+#define TAG_ACCT_MASK (~0xFFFFFFFFULL)
+
+static inline int tag_compare(tag_t t1, tag_t t2)
+{
+ return t1 < t2 ? -1 : t1 == t2 ? 0 : 1;
+}
+
+static inline tag_t combine_atag_with_uid(tag_t acct_tag, uid_t uid)
+{
+ return acct_tag | uid;
+}
+static inline tag_t make_tag_from_uid(uid_t uid)
+{
+ return uid;
+}
+static inline uid_t get_uid_from_tag(tag_t tag)
+{
+ return tag & TAG_UID_MASK;
+}
+static inline tag_t get_utag_from_tag(tag_t tag)
+{
+ return tag & TAG_UID_MASK;
+}
+static inline tag_t get_atag_from_tag(tag_t tag)
+{
+ return tag & TAG_ACCT_MASK;
+}
+
+static inline bool valid_atag(tag_t tag)
+{
+ return !(tag & TAG_UID_MASK);
+}
+static inline tag_t make_atag_from_value(uint32_t value)
+{
+ return (uint64_t)value << 32;
+}
+/*---------------------------------------------------------------------------*/
+
+/*
+ * Maximum number of socket tags that a UID is allowed to have active.
+ * Multiple processes belonging to the same UID contribute towards this limit.
+ * Special UIDs that can impersonate a UID also contribute (e.g. download
+ * manager, ...)
+ */
+#define DEFAULT_MAX_SOCK_TAGS 1024
+
+/*
+ * For now we only track 2 sets of counters.
+ * The default set is 0.
+ * Userspace can activate another set for a given uid being tracked.
+ */
+#define IFS_MAX_COUNTER_SETS 2
+
+enum ifs_tx_rx {
+ IFS_TX,
+ IFS_RX,
+ IFS_MAX_DIRECTIONS
+};
+
+/* For now, TCP, UDP, the rest */
+enum ifs_proto {
+ IFS_TCP,
+ IFS_UDP,
+ IFS_PROTO_OTHER,
+ IFS_MAX_PROTOS
+};
+
+struct byte_packet_counters {
+ uint64_t bytes;
+ uint64_t packets;
+};
+
+struct data_counters {
+ struct byte_packet_counters bpc[IFS_MAX_COUNTER_SETS][IFS_MAX_DIRECTIONS][IFS_MAX_PROTOS];
+};
+
+static inline uint64_t dc_sum_bytes(struct data_counters *counters,
+ int set,
+ enum ifs_tx_rx direction)
+{
+ return counters->bpc[set][direction][IFS_TCP].bytes
+ + counters->bpc[set][direction][IFS_UDP].bytes
+ + counters->bpc[set][direction][IFS_PROTO_OTHER].bytes;
+}
+
+static inline uint64_t dc_sum_packets(struct data_counters *counters,
+ int set,
+ enum ifs_tx_rx direction)
+{
+ return counters->bpc[set][direction][IFS_TCP].packets
+ + counters->bpc[set][direction][IFS_UDP].packets
+ + counters->bpc[set][direction][IFS_PROTO_OTHER].packets;
+}
+
+
+/* Generic X based nodes used as a base for rb_tree ops */
+struct tag_node {
+ struct rb_node node;
+ tag_t tag;
+};
+
+struct tag_stat {
+ struct tag_node tn;
+ struct data_counters counters;
+ /*
+ * If this tag is acct_tag based, we need to count against the
+ * matching parent uid_tag.
+ */
+ struct data_counters *parent_counters;
+};
+
+struct iface_stat {
+ struct list_head list; /* in iface_stat_list */
+ char *ifname;
+ bool active;
+ /* net_dev is only valid for active iface_stat */
+ struct net_device *net_dev;
+
+ struct byte_packet_counters totals_via_dev[IFS_MAX_DIRECTIONS];
+ struct data_counters totals_via_skb;
+ /*
+ * We keep the last_known, because some devices reset their counters
+ * just before NETDEV_UP, while some will reset just before
+ * NETDEV_REGISTER (which is more normal).
+ * So now, if the device didn't do a NETDEV_UNREGISTER and we see
+ * its current dev stats smaller that what was previously known, we
+ * assume an UNREGISTER and just use the last_known.
+ */
+ struct byte_packet_counters last_known[IFS_MAX_DIRECTIONS];
+ /* last_known is usable when last_known_valid is true */
+ bool last_known_valid;
+
+ struct proc_dir_entry *proc_ptr;
+
+ struct rb_root tag_stat_tree;
+ spinlock_t tag_stat_list_lock;
+};
+
+/* This is needed to create proc_dir_entries from atomic context. */
+struct iface_stat_work {
+ struct work_struct iface_work;
+ struct iface_stat *iface_entry;
+};
+
+/*
+ * Track tag that this socket is transferring data for, and not necessarily
+ * the uid that owns the socket.
+ * This is the tag against which tag_stat.counters will be billed.
+ * These structs need to be looked up by sock and pid.
+ */
+struct sock_tag {
+ struct rb_node sock_node;
+ struct sock *sk; /* Only used as a number, never dereferenced */
+ /* Used to associate with a given pid */
+ struct list_head list; /* in proc_qtu_data.sock_tag_list */
+ pid_t pid;
+
+ tag_t tag;
+};
+
+struct qtaguid_event_counts {
+ /* Various successful events */
+ atomic64_t sockets_tagged;
+ atomic64_t sockets_untagged;
+ atomic64_t counter_set_changes;
+ atomic64_t delete_cmds;
+ atomic64_t iface_events; /* Number of NETDEV_* events handled */
+
+ atomic64_t match_calls; /* Number of times iptables called mt */
+ /* Number of times iptables called mt from pre or post routing hooks */
+ atomic64_t match_calls_prepost;
+ /*
+ * match_found_sk_*: numbers related to the netfilter matching
+ * function finding a sock for the sk_buff.
+ * Total skbs processed is sum(match_found*).
+ */
+ atomic64_t match_found_sk; /* An sk was already in the sk_buff. */
+ /* The connection tracker had or didn't have the sk. */
+ atomic64_t match_found_sk_in_ct;
+ atomic64_t match_found_no_sk_in_ct;
+ /*
+ * No sk could be found. No apparent owner. Could happen with
+ * unsolicited traffic.
+ */
+ atomic64_t match_no_sk;
+ /*
+ * The file ptr in the sk_socket wasn't there and we couldn't get GID.
+ * This might happen for traffic while the socket is being closed.
+ */
+ atomic64_t match_no_sk_gid;
+};
+
+/* Track the set active_set for the given tag. */
+struct tag_counter_set {
+ struct tag_node tn;
+ int active_set;
+};
+
+/*----------------------------------------------*/
+/*
+ * The qtu uid data is used to track resources that are created directly or
+ * indirectly by processes (uid tracked).
+ * It is shared by the processes with the same uid.
+ * Some of the resource will be counted to prevent further rogue allocations,
+ * some will need freeing once the owner process (uid) exits.
+ */
+struct uid_tag_data {
+ struct rb_node node;
+ uid_t uid;
+
+ /*
+ * For the uid, how many accounting tags have been set.
+ */
+ int num_active_tags;
+ /* Track the number of proc_qtu_data that reference it */
+ int num_pqd;
+ struct rb_root tag_ref_tree;
+ /* No tag_node_tree_lock; use uid_tag_data_tree_lock */
+};
+
+struct tag_ref {
+ struct tag_node tn;
+
+ /*
+ * This tracks the number of active sockets that have a tag on them
+ * which matches this tag_ref.tn.tag.
+ * A tag ref can live on after the sockets are untagged.
+ * A tag ref can only be removed during a tag delete command.
+ */
+ int num_sock_tags;
+};
+
+struct proc_qtu_data {
+ struct rb_node node;
+ pid_t pid;
+
+ struct uid_tag_data *parent_tag_data;
+
+ /* Tracks the sock_tags that need freeing upon this proc's death */
+ struct list_head sock_tag_list;
+ /* No spinlock_t sock_tag_list_lock; use the global one. */
+};
+
+/*----------------------------------------------*/
+#endif /* ifndef __XT_QTAGUID_INTERNAL_H__ */
diff --git a/net/netfilter/xt_qtaguid_print.c b/net/netfilter/xt_qtaguid_print.c
new file mode 100644
index 0000000..cab478e
--- /dev/null
+++ b/net/netfilter/xt_qtaguid_print.c
@@ -0,0 +1,565 @@
+/*
+ * Pretty printing Support for iptables xt_qtaguid module.
+ *
+ * (C) 2011 Google, Inc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * Most of the functions in this file just waste time if DEBUG is not defined.
+ * The matching xt_qtaguid_print.h will static inline empty funcs if the needed
+ * debug flags ore not defined.
+ * Those funcs that fail to allocate memory will panic as there is no need to
+ * hobble allong just pretending to do the requested work.
+ */
+
+#define DEBUG
+
+#include <linux/fs.h>
+#include <linux/gfp.h>
+#include <linux/net.h>
+#include <linux/rbtree.h>
+#include <linux/slab.h>
+#include <linux/spinlock_types.h>
+#include <net/sock.h>
+
+#include "xt_qtaguid_internal.h"
+#include "xt_qtaguid_print.h"
+
+#ifdef DDEBUG
+
+static void _bug_on_err_or_null(void *ptr)
+{
+ if (IS_ERR_OR_NULL(ptr)) {
+ pr_err("qtaguid: kmalloc failed\n");
+ BUG();
+ }
+}
+
+char *pp_tag_t(tag_t *tag)
+{
+ char *res;
+
+ if (!tag)
+ res = kasprintf(GFP_ATOMIC, "tag_t@null{}");
+ else
+ res = kasprintf(GFP_ATOMIC,
+ "tag_t@%p{tag=0x%llx, uid=%u}",
+ tag, *tag, get_uid_from_tag(*tag));
+ _bug_on_err_or_null(res);
+ return res;
+}
+
+char *pp_data_counters(struct data_counters *dc, bool showValues)
+{
+ char *res;
+
+ if (!dc)
+ res = kasprintf(GFP_ATOMIC, "data_counters@null{}");
+ else if (showValues)
+ res = kasprintf(
+ GFP_ATOMIC, "data_counters@%p{"
+ "set0{"
+ "rx{"
+ "tcp{b=%llu, p=%llu}, "
+ "udp{b=%llu, p=%llu},"
+ "other{b=%llu, p=%llu}}, "
+ "tx{"
+ "tcp{b=%llu, p=%llu}, "
+ "udp{b=%llu, p=%llu},"
+ "other{b=%llu, p=%llu}}}, "
+ "set1{"
+ "rx{"
+ "tcp{b=%llu, p=%llu}, "
+ "udp{b=%llu, p=%llu},"
+ "other{b=%llu, p=%llu}}, "
+ "tx{"
+ "tcp{b=%llu, p=%llu}, "
+ "udp{b=%llu, p=%llu},"
+ "other{b=%llu, p=%llu}}}}",
+ dc,
+ dc->bpc[0][IFS_RX][IFS_TCP].bytes,
+ dc->bpc[0][IFS_RX][IFS_TCP].packets,
+ dc->bpc[0][IFS_RX][IFS_UDP].bytes,
+ dc->bpc[0][IFS_RX][IFS_UDP].packets,
+ dc->bpc[0][IFS_RX][IFS_PROTO_OTHER].bytes,
+ dc->bpc[0][IFS_RX][IFS_PROTO_OTHER].packets,
+ dc->bpc[0][IFS_TX][IFS_TCP].bytes,
+ dc->bpc[0][IFS_TX][IFS_TCP].packets,
+ dc->bpc[0][IFS_TX][IFS_UDP].bytes,
+ dc->bpc[0][IFS_TX][IFS_UDP].packets,
+ dc->bpc[0][IFS_TX][IFS_PROTO_OTHER].bytes,
+ dc->bpc[0][IFS_TX][IFS_PROTO_OTHER].packets,
+ dc->bpc[1][IFS_RX][IFS_TCP].bytes,
+ dc->bpc[1][IFS_RX][IFS_TCP].packets,
+ dc->bpc[1][IFS_RX][IFS_UDP].bytes,
+ dc->bpc[1][IFS_RX][IFS_UDP].packets,
+ dc->bpc[1][IFS_RX][IFS_PROTO_OTHER].bytes,
+ dc->bpc[1][IFS_RX][IFS_PROTO_OTHER].packets,
+ dc->bpc[1][IFS_TX][IFS_TCP].bytes,
+ dc->bpc[1][IFS_TX][IFS_TCP].packets,
+ dc->bpc[1][IFS_TX][IFS_UDP].bytes,
+ dc->bpc[1][IFS_TX][IFS_UDP].packets,
+ dc->bpc[1][IFS_TX][IFS_PROTO_OTHER].bytes,
+ dc->bpc[1][IFS_TX][IFS_PROTO_OTHER].packets);
+ else
+ res = kasprintf(GFP_ATOMIC, "data_counters@%p{...}", dc);
+ _bug_on_err_or_null(res);
+ return res;
+}
+
+char *pp_tag_node(struct tag_node *tn)
+{
+ char *tag_str;
+ char *res;
+
+ if (!tn) {
+ res = kasprintf(GFP_ATOMIC, "tag_node@null{}");
+ _bug_on_err_or_null(res);
+ return res;
+ }
+ tag_str = pp_tag_t(&tn->tag);
+ res = kasprintf(GFP_ATOMIC,
+ "tag_node@%p{tag=%s}",
+ tn, tag_str);
+ _bug_on_err_or_null(res);
+ kfree(tag_str);
+ return res;
+}
+
+char *pp_tag_ref(struct tag_ref *tr)
+{
+ char *tn_str;
+ char *res;
+
+ if (!tr) {
+ res = kasprintf(GFP_ATOMIC, "tag_ref@null{}");
+ _bug_on_err_or_null(res);
+ return res;
+ }
+ tn_str = pp_tag_node(&tr->tn);
+ res = kasprintf(GFP_ATOMIC,
+ "tag_ref@%p{%s, num_sock_tags=%d}",
+ tr, tn_str, tr->num_sock_tags);
+ _bug_on_err_or_null(res);
+ kfree(tn_str);
+ return res;
+}
+
+char *pp_tag_stat(struct tag_stat *ts)
+{
+ char *tn_str;
+ char *counters_str;
+ char *parent_counters_str;
+ char *res;
+
+ if (!ts) {
+ res = kasprintf(GFP_ATOMIC, "tag_stat@null{}");
+ _bug_on_err_or_null(res);
+ return res;
+ }
+ tn_str = pp_tag_node(&ts->tn);
+ counters_str = pp_data_counters(&ts->counters, true);
+ parent_counters_str = pp_data_counters(ts->parent_counters, false);
+ res = kasprintf(GFP_ATOMIC,
+ "tag_stat@%p{%s, counters=%s, parent_counters=%s}",
+ ts, tn_str, counters_str, parent_counters_str);
+ _bug_on_err_or_null(res);
+ kfree(tn_str);
+ kfree(counters_str);
+ kfree(parent_counters_str);
+ return res;
+}
+
+char *pp_iface_stat(struct iface_stat *is)
+{
+ char *res;
+ if (!is) {
+ res = kasprintf(GFP_ATOMIC, "iface_stat@null{}");
+ } else {
+ struct data_counters *cnts = &is->totals_via_skb;
+ res = kasprintf(GFP_ATOMIC, "iface_stat@%p{"
+ "list=list_head{...}, "
+ "ifname=%s, "
+ "total_dev={rx={bytes=%llu, "
+ "packets=%llu}, "
+ "tx={bytes=%llu, "
+ "packets=%llu}}, "
+ "total_skb={rx={bytes=%llu, "
+ "packets=%llu}, "
+ "tx={bytes=%llu, "
+ "packets=%llu}}, "
+ "last_known_valid=%d, "
+ "last_known={rx={bytes=%llu, "
+ "packets=%llu}, "
+ "tx={bytes=%llu, "
+ "packets=%llu}}, "
+ "active=%d, "
+ "net_dev=%p, "
+ "proc_ptr=%p, "
+ "tag_stat_tree=rb_root{...}}",
+ is,
+ is->ifname,
+ is->totals_via_dev[IFS_RX].bytes,
+ is->totals_via_dev[IFS_RX].packets,
+ is->totals_via_dev[IFS_TX].bytes,
+ is->totals_via_dev[IFS_TX].packets,
+ dc_sum_bytes(cnts, 0, IFS_RX),
+ dc_sum_packets(cnts, 0, IFS_RX),
+ dc_sum_bytes(cnts, 0, IFS_TX),
+ dc_sum_packets(cnts, 0, IFS_TX),
+ is->last_known_valid,
+ is->last_known[IFS_RX].bytes,
+ is->last_known[IFS_RX].packets,
+ is->last_known[IFS_TX].bytes,
+ is->last_known[IFS_TX].packets,
+ is->active,
+ is->net_dev,
+ is->proc_ptr);
+ }
+ _bug_on_err_or_null(res);
+ return res;
+}
+
+char *pp_sock_tag(struct sock_tag *st)
+{
+ char *tag_str;
+ char *res;
+
+ if (!st) {
+ res = kasprintf(GFP_ATOMIC, "sock_tag@null{}");
+ _bug_on_err_or_null(res);
+ return res;
+ }
+ tag_str = pp_tag_t(&st->tag);
+ res = kasprintf(GFP_ATOMIC, "sock_tag@%p{"
+ "sock_node=rb_node{...}, "
+ "sk=%p (f_count=%d), list=list_head{...}, "
+ "pid=%u, tag=%s}",
+ st, st->sk, refcount_read(&st->sk->sk_refcnt),
+ st->pid, tag_str);
+ _bug_on_err_or_null(res);
+ kfree(tag_str);
+ return res;
+}
+
+char *pp_uid_tag_data(struct uid_tag_data *utd)
+{
+ char *res;
+
+ if (!utd)
+ res = kasprintf(GFP_ATOMIC, "uid_tag_data@null{}");
+ else
+ res = kasprintf(GFP_ATOMIC, "uid_tag_data@%p{"
+ "uid=%u, num_active_acct_tags=%d, "
+ "num_pqd=%d, "
+ "tag_node_tree=rb_root{...}, "
+ "proc_qtu_data_tree=rb_root{...}}",
+ utd, utd->uid,
+ utd->num_active_tags, utd->num_pqd);
+ _bug_on_err_or_null(res);
+ return res;
+}
+
+char *pp_proc_qtu_data(struct proc_qtu_data *pqd)
+{
+ char *parent_tag_data_str;
+ char *res;
+
+ if (!pqd) {
+ res = kasprintf(GFP_ATOMIC, "proc_qtu_data@null{}");
+ _bug_on_err_or_null(res);
+ return res;
+ }
+ parent_tag_data_str = pp_uid_tag_data(pqd->parent_tag_data);
+ res = kasprintf(GFP_ATOMIC, "proc_qtu_data@%p{"
+ "node=rb_node{...}, pid=%u, "
+ "parent_tag_data=%s, "
+ "sock_tag_list=list_head{...}}",
+ pqd, pqd->pid, parent_tag_data_str
+ );
+ _bug_on_err_or_null(res);
+ kfree(parent_tag_data_str);
+ return res;
+}
+
+/*------------------------------------------*/
+void prdebug_sock_tag_tree(int indent_level,
+ struct rb_root *sock_tag_tree)
+{
+ struct rb_node *node;
+ struct sock_tag *sock_tag_entry;
+ char *str;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (RB_EMPTY_ROOT(sock_tag_tree)) {
+ str = "sock_tag_tree=rb_root{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "sock_tag_tree=rb_root{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ for (node = rb_first(sock_tag_tree);
+ node;
+ node = rb_next(node)) {
+ sock_tag_entry = rb_entry(node, struct sock_tag, sock_node);
+ str = pp_sock_tag(sock_tag_entry);
+ pr_debug("%*d: %s,\n", indent_level*2, indent_level, str);
+ kfree(str);
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+void prdebug_sock_tag_list(int indent_level,
+ struct list_head *sock_tag_list)
+{
+ struct sock_tag *sock_tag_entry;
+ char *str;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (list_empty(sock_tag_list)) {
+ str = "sock_tag_list=list_head{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "sock_tag_list=list_head{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ list_for_each_entry(sock_tag_entry, sock_tag_list, list) {
+ str = pp_sock_tag(sock_tag_entry);
+ pr_debug("%*d: %s,\n", indent_level*2, indent_level, str);
+ kfree(str);
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+void prdebug_proc_qtu_data_tree(int indent_level,
+ struct rb_root *proc_qtu_data_tree)
+{
+ char *str;
+ struct rb_node *node;
+ struct proc_qtu_data *proc_qtu_data_entry;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (RB_EMPTY_ROOT(proc_qtu_data_tree)) {
+ str = "proc_qtu_data_tree=rb_root{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "proc_qtu_data_tree=rb_root{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ for (node = rb_first(proc_qtu_data_tree);
+ node;
+ node = rb_next(node)) {
+ proc_qtu_data_entry = rb_entry(node,
+ struct proc_qtu_data,
+ node);
+ str = pp_proc_qtu_data(proc_qtu_data_entry);
+ pr_debug("%*d: %s,\n", indent_level*2, indent_level,
+ str);
+ kfree(str);
+ indent_level++;
+ prdebug_sock_tag_list(indent_level,
+ &proc_qtu_data_entry->sock_tag_list);
+ indent_level--;
+
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+void prdebug_tag_ref_tree(int indent_level, struct rb_root *tag_ref_tree)
+{
+ char *str;
+ struct rb_node *node;
+ struct tag_ref *tag_ref_entry;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (RB_EMPTY_ROOT(tag_ref_tree)) {
+ str = "tag_ref_tree{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "tag_ref_tree{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ for (node = rb_first(tag_ref_tree);
+ node;
+ node = rb_next(node)) {
+ tag_ref_entry = rb_entry(node,
+ struct tag_ref,
+ tn.node);
+ str = pp_tag_ref(tag_ref_entry);
+ pr_debug("%*d: %s,\n", indent_level*2, indent_level,
+ str);
+ kfree(str);
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+void prdebug_uid_tag_data_tree(int indent_level,
+ struct rb_root *uid_tag_data_tree)
+{
+ char *str;
+ struct rb_node *node;
+ struct uid_tag_data *uid_tag_data_entry;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (RB_EMPTY_ROOT(uid_tag_data_tree)) {
+ str = "uid_tag_data_tree=rb_root{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "uid_tag_data_tree=rb_root{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ for (node = rb_first(uid_tag_data_tree);
+ node;
+ node = rb_next(node)) {
+ uid_tag_data_entry = rb_entry(node, struct uid_tag_data,
+ node);
+ str = pp_uid_tag_data(uid_tag_data_entry);
+ pr_debug("%*d: %s,\n", indent_level*2, indent_level, str);
+ kfree(str);
+ if (!RB_EMPTY_ROOT(&uid_tag_data_entry->tag_ref_tree)) {
+ indent_level++;
+ prdebug_tag_ref_tree(indent_level,
+ &uid_tag_data_entry->tag_ref_tree);
+ indent_level--;
+ }
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+void prdebug_tag_stat_tree(int indent_level,
+ struct rb_root *tag_stat_tree)
+{
+ char *str;
+ struct rb_node *node;
+ struct tag_stat *ts_entry;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (RB_EMPTY_ROOT(tag_stat_tree)) {
+ str = "tag_stat_tree{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "tag_stat_tree{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ for (node = rb_first(tag_stat_tree);
+ node;
+ node = rb_next(node)) {
+ ts_entry = rb_entry(node, struct tag_stat, tn.node);
+ str = pp_tag_stat(ts_entry);
+ pr_debug("%*d: %s\n", indent_level*2, indent_level,
+ str);
+ kfree(str);
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+void prdebug_iface_stat_list(int indent_level,
+ struct list_head *iface_stat_list)
+{
+ char *str;
+ struct iface_stat *iface_entry;
+
+ if (!unlikely(qtaguid_debug_mask & DDEBUG_MASK))
+ return;
+
+ if (list_empty(iface_stat_list)) {
+ str = "iface_stat_list=list_head{}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ return;
+ }
+
+ str = "iface_stat_list=list_head{";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ indent_level++;
+ list_for_each_entry(iface_entry, iface_stat_list, list) {
+ str = pp_iface_stat(iface_entry);
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+ kfree(str);
+
+ spin_lock_bh(&iface_entry->tag_stat_list_lock);
+ if (!RB_EMPTY_ROOT(&iface_entry->tag_stat_tree)) {
+ indent_level++;
+ prdebug_tag_stat_tree(indent_level,
+ &iface_entry->tag_stat_tree);
+ indent_level--;
+ }
+ spin_unlock_bh(&iface_entry->tag_stat_list_lock);
+ }
+ indent_level--;
+ str = "}";
+ pr_debug("%*d: %s\n", indent_level*2, indent_level, str);
+}
+
+#endif /* ifdef DDEBUG */
+/*------------------------------------------*/
+static const char * const netdev_event_strings[] = {
+ "netdev_unknown",
+ "NETDEV_UP",
+ "NETDEV_DOWN",
+ "NETDEV_REBOOT",
+ "NETDEV_CHANGE",
+ "NETDEV_REGISTER",
+ "NETDEV_UNREGISTER",
+ "NETDEV_CHANGEMTU",
+ "NETDEV_CHANGEADDR",
+ "NETDEV_GOING_DOWN",
+ "NETDEV_CHANGENAME",
+ "NETDEV_FEAT_CHANGE",
+ "NETDEV_BONDING_FAILOVER",
+ "NETDEV_PRE_UP",
+ "NETDEV_PRE_TYPE_CHANGE",
+ "NETDEV_POST_TYPE_CHANGE",
+ "NETDEV_POST_INIT",
+ "NETDEV_UNREGISTER_BATCH",
+ "NETDEV_RELEASE",
+ "NETDEV_NOTIFY_PEERS",
+ "NETDEV_JOIN",
+};
+
+const char *netdev_evt_str(int netdev_event)
+{
+ if (netdev_event < 0
+ || netdev_event >= ARRAY_SIZE(netdev_event_strings))
+ return "bad event num";
+ return netdev_event_strings[netdev_event];
+}
diff --git a/net/netfilter/xt_qtaguid_print.h b/net/netfilter/xt_qtaguid_print.h
new file mode 100644
index 0000000..b63871a
--- /dev/null
+++ b/net/netfilter/xt_qtaguid_print.h
@@ -0,0 +1,120 @@
+/*
+ * Pretty printing Support for iptables xt_qtaguid module.
+ *
+ * (C) 2011 Google, Inc
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef __XT_QTAGUID_PRINT_H__
+#define __XT_QTAGUID_PRINT_H__
+
+#include "xt_qtaguid_internal.h"
+
+#ifdef DDEBUG
+
+char *pp_tag_t(tag_t *tag);
+char *pp_data_counters(struct data_counters *dc, bool showValues);
+char *pp_tag_node(struct tag_node *tn);
+char *pp_tag_ref(struct tag_ref *tr);
+char *pp_tag_stat(struct tag_stat *ts);
+char *pp_iface_stat(struct iface_stat *is);
+char *pp_sock_tag(struct sock_tag *st);
+char *pp_uid_tag_data(struct uid_tag_data *qtd);
+char *pp_proc_qtu_data(struct proc_qtu_data *pqd);
+
+/*------------------------------------------*/
+void prdebug_sock_tag_list(int indent_level,
+ struct list_head *sock_tag_list);
+void prdebug_sock_tag_tree(int indent_level,
+ struct rb_root *sock_tag_tree);
+void prdebug_proc_qtu_data_tree(int indent_level,
+ struct rb_root *proc_qtu_data_tree);
+void prdebug_tag_ref_tree(int indent_level, struct rb_root *tag_ref_tree);
+void prdebug_uid_tag_data_tree(int indent_level,
+ struct rb_root *uid_tag_data_tree);
+void prdebug_tag_stat_tree(int indent_level,
+ struct rb_root *tag_stat_tree);
+void prdebug_iface_stat_list(int indent_level,
+ struct list_head *iface_stat_list);
+
+#else
+
+/*------------------------------------------*/
+static inline char *pp_tag_t(tag_t *tag)
+{
+ return NULL;
+}
+static inline char *pp_data_counters(struct data_counters *dc, bool showValues)
+{
+ return NULL;
+}
+static inline char *pp_tag_node(struct tag_node *tn)
+{
+ return NULL;
+}
+static inline char *pp_tag_ref(struct tag_ref *tr)
+{
+ return NULL;
+}
+static inline char *pp_tag_stat(struct tag_stat *ts)
+{
+ return NULL;
+}
+static inline char *pp_iface_stat(struct iface_stat *is)
+{
+ return NULL;
+}
+static inline char *pp_sock_tag(struct sock_tag *st)
+{
+ return NULL;
+}
+static inline char *pp_uid_tag_data(struct uid_tag_data *qtd)
+{
+ return NULL;
+}
+static inline char *pp_proc_qtu_data(struct proc_qtu_data *pqd)
+{
+ return NULL;
+}
+
+/*------------------------------------------*/
+static inline
+void prdebug_sock_tag_list(int indent_level,
+ struct list_head *sock_tag_list)
+{
+}
+static inline
+void prdebug_sock_tag_tree(int indent_level,
+ struct rb_root *sock_tag_tree)
+{
+}
+static inline
+void prdebug_proc_qtu_data_tree(int indent_level,
+ struct rb_root *proc_qtu_data_tree)
+{
+}
+static inline
+void prdebug_tag_ref_tree(int indent_level, struct rb_root *tag_ref_tree)
+{
+}
+static inline
+void prdebug_uid_tag_data_tree(int indent_level,
+ struct rb_root *uid_tag_data_tree)
+{
+}
+static inline
+void prdebug_tag_stat_tree(int indent_level,
+ struct rb_root *tag_stat_tree)
+{
+}
+static inline
+void prdebug_iface_stat_list(int indent_level,
+ struct list_head *iface_stat_list)
+{
+}
+#endif
+/*------------------------------------------*/
+const char *netdev_evt_str(int netdev_event);
+#endif /* ifndef __XT_QTAGUID_PRINT_H__ */
diff --git a/net/netfilter/xt_quota2.c b/net/netfilter/xt_quota2.c
new file mode 100644
index 0000000..24b7742
--- /dev/null
+++ b/net/netfilter/xt_quota2.c
@@ -0,0 +1,401 @@
+/*
+ * xt_quota2 - enhanced xt_quota that can count upwards and in packets
+ * as a minimal accounting match.
+ * by Jan Engelhardt <jengelh@medozas.de>, 2008
+ *
+ * Originally based on xt_quota.c:
+ * netfilter module to enforce network quotas
+ * Sam Johnston <samj@samj.net>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License; either
+ * version 2 of the License, as published by the Free Software Foundation.
+ */
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/proc_fs.h>
+#include <linux/skbuff.h>
+#include <linux/spinlock.h>
+#include <asm/atomic.h>
+#include <net/netlink.h>
+
+#include <linux/netfilter/x_tables.h>
+#include <linux/netfilter/xt_quota2.h>
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+/* For compatibility, these definitions are copied from the
+ * deprecated header file <linux/netfilter_ipv4/ipt_ULOG.h> */
+#define ULOG_MAC_LEN 80
+#define ULOG_PREFIX_LEN 32
+
+/* Format of the ULOG packets passed through netlink */
+typedef struct ulog_packet_msg {
+ unsigned long mark;
+ long timestamp_sec;
+ long timestamp_usec;
+ unsigned int hook;
+ char indev_name[IFNAMSIZ];
+ char outdev_name[IFNAMSIZ];
+ size_t data_len;
+ char prefix[ULOG_PREFIX_LEN];
+ unsigned char mac_len;
+ unsigned char mac[ULOG_MAC_LEN];
+ unsigned char payload[0];
+} ulog_packet_msg_t;
+#endif
+
+/**
+ * @lock: lock to protect quota writers from each other
+ */
+struct xt_quota_counter {
+ u_int64_t quota;
+ spinlock_t lock;
+ struct list_head list;
+ atomic_t ref;
+ char name[sizeof(((struct xt_quota_mtinfo2 *)NULL)->name)];
+ struct proc_dir_entry *procfs_entry;
+};
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+/* Harald's favorite number +1 :D From ipt_ULOG.C */
+static int qlog_nl_event = 112;
+module_param_named(event_num, qlog_nl_event, uint, S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(event_num,
+ "Event number for NETLINK_NFLOG message. 0 disables log."
+ "111 is what ipt_ULOG uses.");
+static struct sock *nflognl;
+#endif
+
+static LIST_HEAD(counter_list);
+static DEFINE_SPINLOCK(counter_list_lock);
+
+static struct proc_dir_entry *proc_xt_quota;
+static unsigned int quota_list_perms = S_IRUGO | S_IWUSR;
+static kuid_t quota_list_uid = KUIDT_INIT(0);
+static kgid_t quota_list_gid = KGIDT_INIT(0);
+module_param_named(perms, quota_list_perms, uint, S_IRUGO | S_IWUSR);
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+static void quota2_log(unsigned int hooknum,
+ const struct sk_buff *skb,
+ const struct net_device *in,
+ const struct net_device *out,
+ const char *prefix)
+{
+ ulog_packet_msg_t *pm;
+ struct sk_buff *log_skb;
+ size_t size;
+ struct nlmsghdr *nlh;
+
+ if (!qlog_nl_event)
+ return;
+
+ size = NLMSG_SPACE(sizeof(*pm));
+ size = max(size, (size_t)NLMSG_GOODSIZE);
+ log_skb = alloc_skb(size, GFP_ATOMIC);
+ if (!log_skb) {
+ pr_err("xt_quota2: cannot alloc skb for logging\n");
+ return;
+ }
+
+ nlh = nlmsg_put(log_skb, /*pid*/0, /*seq*/0, qlog_nl_event,
+ sizeof(*pm), 0);
+ if (!nlh) {
+ pr_err("xt_quota2: nlmsg_put failed\n");
+ kfree_skb(log_skb);
+ return;
+ }
+ pm = nlmsg_data(nlh);
+ if (skb->tstamp == 0)
+ __net_timestamp((struct sk_buff *)skb);
+ pm->data_len = 0;
+ pm->hook = hooknum;
+ if (prefix != NULL)
+ strlcpy(pm->prefix, prefix, sizeof(pm->prefix));
+ else
+ *(pm->prefix) = '\0';
+ if (in)
+ strlcpy(pm->indev_name, in->name, sizeof(pm->indev_name));
+ else
+ pm->indev_name[0] = '\0';
+
+ if (out)
+ strlcpy(pm->outdev_name, out->name, sizeof(pm->outdev_name));
+ else
+ pm->outdev_name[0] = '\0';
+
+ NETLINK_CB(log_skb).dst_group = 1;
+ pr_debug("throwing 1 packets to netlink group 1\n");
+ netlink_broadcast(nflognl, log_skb, 0, 1, GFP_ATOMIC);
+}
+#else
+static void quota2_log(unsigned int hooknum,
+ const struct sk_buff *skb,
+ const struct net_device *in,
+ const struct net_device *out,
+ const char *prefix)
+{
+}
+#endif /* if+else CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG */
+
+static ssize_t quota_proc_read(struct file *file, char __user *buf,
+ size_t size, loff_t *ppos)
+{
+ struct xt_quota_counter *e = PDE_DATA(file_inode(file));
+ char tmp[24];
+ size_t tmp_size;
+
+ spin_lock_bh(&e->lock);
+ tmp_size = scnprintf(tmp, sizeof(tmp), "%llu\n", e->quota);
+ spin_unlock_bh(&e->lock);
+ return simple_read_from_buffer(buf, size, ppos, tmp, tmp_size);
+}
+
+static ssize_t quota_proc_write(struct file *file, const char __user *input,
+ size_t size, loff_t *ppos)
+{
+ struct xt_quota_counter *e = PDE_DATA(file_inode(file));
+ char buf[sizeof("18446744073709551616")];
+
+ if (size > sizeof(buf))
+ size = sizeof(buf);
+ if (copy_from_user(buf, input, size) != 0)
+ return -EFAULT;
+ buf[sizeof(buf)-1] = '\0';
+
+ spin_lock_bh(&e->lock);
+ e->quota = simple_strtoull(buf, NULL, 0);
+ spin_unlock_bh(&e->lock);
+ return size;
+}
+
+static const struct file_operations q2_counter_fops = {
+ .read = quota_proc_read,
+ .write = quota_proc_write,
+ .llseek = default_llseek,
+};
+
+static struct xt_quota_counter *
+q2_new_counter(const struct xt_quota_mtinfo2 *q, bool anon)
+{
+ struct xt_quota_counter *e;
+ unsigned int size;
+
+ /* Do not need all the procfs things for anonymous counters. */
+ size = anon ? offsetof(typeof(*e), list) : sizeof(*e);
+ e = kmalloc(size, GFP_KERNEL);
+ if (e == NULL)
+ return NULL;
+
+ e->quota = q->quota;
+ spin_lock_init(&e->lock);
+ if (!anon) {
+ INIT_LIST_HEAD(&e->list);
+ atomic_set(&e->ref, 1);
+ strlcpy(e->name, q->name, sizeof(e->name));
+ }
+ return e;
+}
+
+/**
+ * q2_get_counter - get ref to counter or create new
+ * @name: name of counter
+ */
+static struct xt_quota_counter *
+q2_get_counter(const struct xt_quota_mtinfo2 *q)
+{
+ struct proc_dir_entry *p;
+ struct xt_quota_counter *e = NULL;
+ struct xt_quota_counter *new_e;
+
+ if (*q->name == '\0')
+ return q2_new_counter(q, true);
+
+ /* No need to hold a lock while getting a new counter */
+ new_e = q2_new_counter(q, false);
+ if (new_e == NULL)
+ goto out;
+
+ spin_lock_bh(&counter_list_lock);
+ list_for_each_entry(e, &counter_list, list)
+ if (strcmp(e->name, q->name) == 0) {
+ atomic_inc(&e->ref);
+ spin_unlock_bh(&counter_list_lock);
+ kfree(new_e);
+ pr_debug("xt_quota2: old counter name=%s", e->name);
+ return e;
+ }
+ e = new_e;
+ pr_debug("xt_quota2: new_counter name=%s", e->name);
+ list_add_tail(&e->list, &counter_list);
+ /* The entry having a refcount of 1 is not directly destructible.
+ * This func has not yet returned the new entry, thus iptables
+ * has not references for destroying this entry.
+ * For another rule to try to destroy it, it would 1st need for this
+ * func* to be re-invoked, acquire a new ref for the same named quota.
+ * Nobody will access the e->procfs_entry either.
+ * So release the lock. */
+ spin_unlock_bh(&counter_list_lock);
+
+ /* create_proc_entry() is not spin_lock happy */
+ p = e->procfs_entry = proc_create_data(e->name, quota_list_perms,
+ proc_xt_quota, &q2_counter_fops, e);
+
+ if (IS_ERR_OR_NULL(p)) {
+ spin_lock_bh(&counter_list_lock);
+ list_del(&e->list);
+ spin_unlock_bh(&counter_list_lock);
+ goto out;
+ }
+ proc_set_user(p, quota_list_uid, quota_list_gid);
+ return e;
+
+ out:
+ kfree(e);
+ return NULL;
+}
+
+static int quota_mt2_check(const struct xt_mtchk_param *par)
+{
+ struct xt_quota_mtinfo2 *q = par->matchinfo;
+
+ pr_debug("xt_quota2: check() flags=0x%04x", q->flags);
+
+ if (q->flags & ~XT_QUOTA_MASK)
+ return -EINVAL;
+
+ q->name[sizeof(q->name)-1] = '\0';
+ if (*q->name == '.' || strchr(q->name, '/') != NULL) {
+ printk(KERN_ERR "xt_quota.3: illegal name\n");
+ return -EINVAL;
+ }
+
+ q->master = q2_get_counter(q);
+ if (q->master == NULL) {
+ printk(KERN_ERR "xt_quota.3: memory alloc failure\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static void quota_mt2_destroy(const struct xt_mtdtor_param *par)
+{
+ struct xt_quota_mtinfo2 *q = par->matchinfo;
+ struct xt_quota_counter *e = q->master;
+
+ if (*q->name == '\0') {
+ kfree(e);
+ return;
+ }
+
+ spin_lock_bh(&counter_list_lock);
+ if (!atomic_dec_and_test(&e->ref)) {
+ spin_unlock_bh(&counter_list_lock);
+ return;
+ }
+
+ list_del(&e->list);
+ remove_proc_entry(e->name, proc_xt_quota);
+ spin_unlock_bh(&counter_list_lock);
+ kfree(e);
+}
+
+static bool
+quota_mt2(const struct sk_buff *skb, struct xt_action_param *par)
+{
+ struct xt_quota_mtinfo2 *q = (void *)par->matchinfo;
+ struct xt_quota_counter *e = q->master;
+ bool ret = q->flags & XT_QUOTA_INVERT;
+
+ spin_lock_bh(&e->lock);
+ if (q->flags & XT_QUOTA_GROW) {
+ /*
+ * While no_change is pointless in "grow" mode, we will
+ * implement it here simply to have a consistent behavior.
+ */
+ if (!(q->flags & XT_QUOTA_NO_CHANGE)) {
+ e->quota += (q->flags & XT_QUOTA_PACKET) ? 1 : skb->len;
+ }
+ ret = true;
+ } else {
+ if (e->quota >= skb->len) {
+ if (!(q->flags & XT_QUOTA_NO_CHANGE))
+ e->quota -= (q->flags & XT_QUOTA_PACKET) ? 1 : skb->len;
+ ret = !ret;
+ } else {
+ /* We are transitioning, log that fact. */
+ if (e->quota) {
+ quota2_log(xt_hooknum(par),
+ skb,
+ xt_in(par),
+ xt_out(par),
+ q->name);
+ }
+ /* we do not allow even small packets from now on */
+ e->quota = 0;
+ }
+ }
+ spin_unlock_bh(&e->lock);
+ return ret;
+}
+
+static struct xt_match quota_mt2_reg[] __read_mostly = {
+ {
+ .name = "quota2",
+ .revision = 3,
+ .family = NFPROTO_IPV4,
+ .checkentry = quota_mt2_check,
+ .match = quota_mt2,
+ .destroy = quota_mt2_destroy,
+ .matchsize = sizeof(struct xt_quota_mtinfo2),
+ .me = THIS_MODULE,
+ },
+ {
+ .name = "quota2",
+ .revision = 3,
+ .family = NFPROTO_IPV6,
+ .checkentry = quota_mt2_check,
+ .match = quota_mt2,
+ .destroy = quota_mt2_destroy,
+ .matchsize = sizeof(struct xt_quota_mtinfo2),
+ .me = THIS_MODULE,
+ },
+};
+
+static int __init quota_mt2_init(void)
+{
+ int ret;
+ pr_debug("xt_quota2: init()");
+
+#ifdef CONFIG_NETFILTER_XT_MATCH_QUOTA2_LOG
+ nflognl = netlink_kernel_create(&init_net, NETLINK_NFLOG, NULL);
+ if (!nflognl)
+ return -ENOMEM;
+#endif
+
+ proc_xt_quota = proc_mkdir("xt_quota", init_net.proc_net);
+ if (proc_xt_quota == NULL)
+ return -EACCES;
+
+ ret = xt_register_matches(quota_mt2_reg, ARRAY_SIZE(quota_mt2_reg));
+ if (ret < 0)
+ remove_proc_entry("xt_quota", init_net.proc_net);
+ pr_debug("xt_quota2: init() %d", ret);
+ return ret;
+}
+
+static void __exit quota_mt2_exit(void)
+{
+ xt_unregister_matches(quota_mt2_reg, ARRAY_SIZE(quota_mt2_reg));
+ remove_proc_entry("xt_quota", init_net.proc_net);
+}
+
+module_init(quota_mt2_init);
+module_exit(quota_mt2_exit);
+MODULE_DESCRIPTION("Xtables: countdown quota match; up counter");
+MODULE_AUTHOR("Sam Johnston <samj@samj.net>");
+MODULE_AUTHOR("Jan Engelhardt <jengelh@medozas.de>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("ipt_quota2");
+MODULE_ALIAS("ip6t_quota2");
diff --git a/net/netfilter/xt_socket.c b/net/netfilter/xt_socket.c
index 575d215..cddd5cb 100644
--- a/net/netfilter/xt_socket.c
+++ b/net/netfilter/xt_socket.c
@@ -79,8 +79,7 @@
transparent && sk_fullsock(sk))
pskb->mark = sk->sk_mark;
- if (sk != skb->sk)
- sock_gen_put(sk);
+ sock_gen_put(sk);
if (wildcard || !transparent)
sk = NULL;
diff --git a/net/nfc/hci/core.c b/net/nfc/hci/core.c
index b740fef..6bf14f4 100644
--- a/net/nfc/hci/core.c
+++ b/net/nfc/hci/core.c
@@ -209,6 +209,11 @@
}
create_info = (struct hci_create_pipe_resp *)skb->data;
+ if (create_info->pipe >= NFC_HCI_MAX_PIPES) {
+ status = NFC_HCI_ANY_E_NOK;
+ goto exit;
+ }
+
/* Save the new created pipe and bind with local gate,
* the description for skb->data[3] is destination gate id
* but since we received this cmd from host controller, we
@@ -232,6 +237,11 @@
}
delete_info = (struct hci_delete_pipe_noti *)skb->data;
+ if (delete_info->pipe >= NFC_HCI_MAX_PIPES) {
+ status = NFC_HCI_ANY_E_NOK;
+ goto exit;
+ }
+
hdev->pipes[delete_info->pipe].gate = NFC_HCI_INVALID_GATE;
hdev->pipes[delete_info->pipe].dest_host = NFC_HCI_INVALID_HOST;
break;
diff --git a/net/rfkill/Kconfig b/net/rfkill/Kconfig
index 060600b..7c33c8b 100644
--- a/net/rfkill/Kconfig
+++ b/net/rfkill/Kconfig
@@ -10,6 +10,11 @@
To compile this driver as a module, choose M here: the
module will be called rfkill.
+config RFKILL_PM
+ bool "Power off on suspend"
+ depends on RFKILL && PM
+ default y
+
# LED trigger support
config RFKILL_LEDS
bool
diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 2064c3a..d223a86e 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -854,8 +854,7 @@
}
EXPORT_SYMBOL(rfkill_resume_polling);
-#ifdef CONFIG_PM_SLEEP
-static int rfkill_suspend(struct device *dev)
+static __maybe_unused int rfkill_suspend(struct device *dev)
{
struct rfkill *rfkill = to_rfkill(dev);
@@ -865,7 +864,7 @@
return 0;
}
-static int rfkill_resume(struct device *dev)
+static __maybe_unused int rfkill_resume(struct device *dev)
{
struct rfkill *rfkill = to_rfkill(dev);
bool cur;
@@ -885,17 +884,13 @@
}
static SIMPLE_DEV_PM_OPS(rfkill_pm_ops, rfkill_suspend, rfkill_resume);
-#define RFKILL_PM_OPS (&rfkill_pm_ops)
-#else
-#define RFKILL_PM_OPS NULL
-#endif
static struct class rfkill_class = {
.name = "rfkill",
.dev_release = rfkill_release,
.dev_groups = rfkill_dev_groups,
.dev_uevent = rfkill_dev_uevent,
- .pm = RFKILL_PM_OPS,
+ .pm = IS_ENABLED(CONFIG_RFKILL_PM) ? &rfkill_pm_ops : NULL,
};
bool rfkill_blocked(struct rfkill *rfkill)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index aa04666..e5e4e18 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -50,7 +50,7 @@
static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */
static int
-param_set_pool_mode(const char *val, struct kernel_param *kp)
+param_set_pool_mode(const char *val, const struct kernel_param *kp)
{
int *ip = (int *)kp->arg;
struct svc_pool_map *m = &svc_pool_map;
@@ -80,7 +80,7 @@
}
static int
-param_get_pool_mode(char *buf, struct kernel_param *kp)
+param_get_pool_mode(char *buf, const struct kernel_param *kp)
{
int *ip = (int *)kp->arg;
diff --git a/net/wireless/scan.c b/net/wireless/scan.c
index f6c5fe48..1289cc1 100644
--- a/net/wireless/scan.c
+++ b/net/wireless/scan.c
@@ -71,7 +71,7 @@
MODULE_PARM_DESC(bss_entries_limit,
"limit to number of scan BSS entries (per wiphy, default 1000)");
-#define IEEE80211_SCAN_RESULT_EXPIRE (30 * HZ)
+#define IEEE80211_SCAN_RESULT_EXPIRE (7 * HZ)
static void bss_free(struct cfg80211_internal_bss *bss)
{
diff --git a/net/xfrm/Kconfig b/net/xfrm/Kconfig
index 286ed25c..5338188 100644
--- a/net/xfrm/Kconfig
+++ b/net/xfrm/Kconfig
@@ -25,6 +25,14 @@
If unsure, say Y.
+config XFRM_INTERFACE
+ tristate "Transformation virtual interface"
+ depends on XFRM && IPV6
+ ---help---
+ This provides a virtual interface to route IPsec traffic.
+
+ If unsure, say N.
+
config XFRM_SUB_POLICY
bool "Transformation sub policy support"
depends on XFRM
diff --git a/net/xfrm/Makefile b/net/xfrm/Makefile
index 0bd2465..fbc4552 100644
--- a/net/xfrm/Makefile
+++ b/net/xfrm/Makefile
@@ -10,3 +10,4 @@
obj-$(CONFIG_XFRM_ALGO) += xfrm_algo.o
obj-$(CONFIG_XFRM_USER) += xfrm_user.o
obj-$(CONFIG_XFRM_IPCOMP) += xfrm_ipcomp.o
+obj-$(CONFIG_XFRM_INTERFACE) += xfrm_interface.o
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 44ac85f..d0ca0db 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -241,7 +241,7 @@
.uinfo = {
.auth = {
- .icv_truncbits = 96,
+ .icv_truncbits = 128,
.icv_fullbits = 256,
}
},
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 30e5746..4a6859e 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -80,7 +80,8 @@
}
dst = __xfrm_dst_lookup(net, 0, 0, saddr, daddr,
- x->props.family, x->props.output_mark);
+ x->props.family,
+ xfrm_smark_get(0, x));
if (IS_ERR(dst))
return 0;
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 9f492dc..a6f6510 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -320,6 +320,7 @@
seq = 0;
if (!spi && (err = xfrm_parse_spi(skb, nexthdr, &spi, &seq)) != 0) {
+ secpath_reset(skb);
XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
goto drop;
}
@@ -328,17 +329,21 @@
XFRM_SPI_SKB_CB(skb)->daddroff);
do {
if (skb->sp->len == XFRM_MAX_DEPTH) {
+ secpath_reset(skb);
XFRM_INC_STATS(net, LINUX_MIB_XFRMINBUFFERERROR);
goto drop;
}
x = xfrm_state_lookup(net, mark, daddr, spi, nexthdr, family);
if (x == NULL) {
+ secpath_reset(skb);
XFRM_INC_STATS(net, LINUX_MIB_XFRMINNOSTATES);
xfrm_audit_state_notfound(skb, family, spi, seq);
goto drop;
}
+ skb->mark = xfrm_smark_get(skb->mark, x);
+
skb->sp->xvec[skb->sp->len++] = x;
lock:
diff --git a/net/xfrm/xfrm_interface.c b/net/xfrm/xfrm_interface.c
new file mode 100644
index 0000000..31acc6f
--- /dev/null
+++ b/net/xfrm/xfrm_interface.c
@@ -0,0 +1,975 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * XFRM virtual interface
+ *
+ * Copyright (C) 2018 secunet Security Networks AG
+ *
+ * Author:
+ * Steffen Klassert <steffen.klassert@secunet.com>
+ */
+
+#include <linux/module.h>
+#include <linux/capability.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/sockios.h>
+#include <linux/icmp.h>
+#include <linux/if.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/net.h>
+#include <linux/in6.h>
+#include <linux/netdevice.h>
+#include <linux/if_link.h>
+#include <linux/if_arp.h>
+#include <linux/icmpv6.h>
+#include <linux/init.h>
+#include <linux/route.h>
+#include <linux/rtnetlink.h>
+#include <linux/netfilter_ipv6.h>
+#include <linux/slab.h>
+#include <linux/hash.h>
+
+#include <linux/uaccess.h>
+#include <linux/atomic.h>
+
+#include <net/icmp.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/ip6_route.h>
+#include <net/addrconf.h>
+#include <net/xfrm.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <linux/etherdevice.h>
+
+static int xfrmi_dev_init(struct net_device *dev);
+static void xfrmi_dev_setup(struct net_device *dev);
+static struct rtnl_link_ops xfrmi_link_ops __read_mostly;
+static unsigned int xfrmi_net_id __read_mostly;
+
+struct xfrmi_net {
+ /* lists for storing interfaces in use */
+ struct xfrm_if __rcu *xfrmi[1];
+};
+
+#define for_each_xfrmi_rcu(start, xi) \
+ for (xi = rcu_dereference(start); xi; xi = rcu_dereference(xi->next))
+
+static struct xfrm_if *xfrmi_lookup(struct net *net, struct xfrm_state *x)
+{
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+ struct xfrm_if *xi;
+
+ for_each_xfrmi_rcu(xfrmn->xfrmi[0], xi) {
+ if (x->if_id == xi->p.if_id &&
+ (xi->dev->flags & IFF_UP))
+ return xi;
+ }
+
+ return NULL;
+}
+
+static struct xfrm_if *xfrmi_decode_session(struct sk_buff *skb)
+{
+ struct xfrmi_net *xfrmn;
+ int ifindex;
+ struct xfrm_if *xi;
+
+ if (!skb->dev)
+ return NULL;
+
+ xfrmn = net_generic(dev_net(skb->dev), xfrmi_net_id);
+ ifindex = skb->dev->ifindex;
+
+ for_each_xfrmi_rcu(xfrmn->xfrmi[0], xi) {
+ if (ifindex == xi->dev->ifindex &&
+ (xi->dev->flags & IFF_UP))
+ return xi;
+ }
+
+ return NULL;
+}
+
+static void xfrmi_link(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
+{
+ struct xfrm_if __rcu **xip = &xfrmn->xfrmi[0];
+
+ rcu_assign_pointer(xi->next , rtnl_dereference(*xip));
+ rcu_assign_pointer(*xip, xi);
+}
+
+static void xfrmi_unlink(struct xfrmi_net *xfrmn, struct xfrm_if *xi)
+{
+ struct xfrm_if __rcu **xip;
+ struct xfrm_if *iter;
+
+ for (xip = &xfrmn->xfrmi[0];
+ (iter = rtnl_dereference(*xip)) != NULL;
+ xip = &iter->next) {
+ if (xi == iter) {
+ rcu_assign_pointer(*xip, xi->next);
+ break;
+ }
+ }
+}
+
+static void xfrmi_dev_free(struct net_device *dev)
+{
+ free_percpu(dev->tstats);
+}
+
+static int xfrmi_create2(struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net *net = dev_net(dev);
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+ int err;
+
+ dev->rtnl_link_ops = &xfrmi_link_ops;
+ err = register_netdevice(dev);
+ if (err < 0)
+ goto out;
+
+ strcpy(xi->p.name, dev->name);
+
+ dev_hold(dev);
+ xfrmi_link(xfrmn, xi);
+
+ return 0;
+
+out:
+ return err;
+}
+
+static struct xfrm_if *xfrmi_create(struct net *net, struct xfrm_if_parms *p)
+{
+ struct net_device *dev;
+ struct xfrm_if *xi;
+ char name[IFNAMSIZ];
+ int err;
+
+ if (p->name[0]) {
+ strlcpy(name, p->name, IFNAMSIZ);
+ } else {
+ err = -EINVAL;
+ goto failed;
+ }
+
+ dev = alloc_netdev(sizeof(*xi), name, NET_NAME_UNKNOWN, xfrmi_dev_setup);
+ if (!dev) {
+ err = -EAGAIN;
+ goto failed;
+ }
+
+ dev_net_set(dev, net);
+
+ xi = netdev_priv(dev);
+ xi->p = *p;
+ xi->net = net;
+ xi->dev = dev;
+ xi->phydev = dev_get_by_index(net, p->link);
+ if (!xi->phydev) {
+ err = -ENODEV;
+ goto failed_free;
+ }
+
+ err = xfrmi_create2(dev);
+ if (err < 0)
+ goto failed_dev_put;
+
+ return xi;
+
+failed_dev_put:
+ dev_put(xi->phydev);
+failed_free:
+ free_netdev(dev);
+failed:
+ return ERR_PTR(err);
+}
+
+static struct xfrm_if *xfrmi_locate(struct net *net, struct xfrm_if_parms *p,
+ int create)
+{
+ struct xfrm_if __rcu **xip;
+ struct xfrm_if *xi;
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+ for (xip = &xfrmn->xfrmi[0];
+ (xi = rtnl_dereference(*xip)) != NULL;
+ xip = &xi->next) {
+ if (xi->p.if_id == p->if_id) {
+ if (create)
+ return ERR_PTR(-EEXIST);
+
+ return xi;
+ }
+ }
+ if (!create)
+ return ERR_PTR(-ENODEV);
+ return xfrmi_create(net, p);
+}
+
+static void xfrmi_dev_uninit(struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct xfrmi_net *xfrmn = net_generic(xi->net, xfrmi_net_id);
+
+ xfrmi_unlink(xfrmn, xi);
+ dev_put(xi->phydev);
+ dev_put(dev);
+}
+
+static void xfrmi_scrub_packet(struct sk_buff *skb, bool xnet)
+{
+ skb->tstamp = 0;
+ skb->pkt_type = PACKET_HOST;
+ skb->skb_iif = 0;
+ skb->ignore_df = 0;
+ skb_dst_drop(skb);
+ nf_reset(skb);
+ nf_reset_trace(skb);
+
+ if (!xnet)
+ return;
+
+ ipvs_reset(skb);
+ secpath_reset(skb);
+ skb_orphan(skb);
+ skb->mark = 0;
+}
+
+static int xfrmi_rcv_cb(struct sk_buff *skb, int err)
+{
+ struct pcpu_sw_netstats *tstats;
+ struct xfrm_mode *inner_mode;
+ struct net_device *dev;
+ struct xfrm_state *x;
+ struct xfrm_if *xi;
+ bool xnet;
+
+ if (err && !skb->sp)
+ return 0;
+
+ x = xfrm_input_state(skb);
+
+ xi = xfrmi_lookup(xs_net(x), x);
+ if (!xi)
+ return 1;
+
+ dev = xi->dev;
+ skb->dev = dev;
+
+ if (err) {
+ dev->stats.rx_errors++;
+ dev->stats.rx_dropped++;
+
+ return 0;
+ }
+
+ xnet = !net_eq(xi->net, dev_net(skb->dev));
+
+ if (xnet) {
+ inner_mode = x->inner_mode;
+
+ if (x->sel.family == AF_UNSPEC) {
+ inner_mode = xfrm_ip2inner_mode(x, XFRM_MODE_SKB_CB(skb)->protocol);
+ if (inner_mode == NULL) {
+ XFRM_INC_STATS(dev_net(skb->dev),
+ LINUX_MIB_XFRMINSTATEMODEERROR);
+ return -EINVAL;
+ }
+ }
+
+ if (!xfrm_policy_check(NULL, XFRM_POLICY_IN, skb,
+ inner_mode->afinfo->family))
+ return -EPERM;
+ }
+
+ xfrmi_scrub_packet(skb, xnet);
+
+ tstats = this_cpu_ptr(dev->tstats);
+
+ u64_stats_update_begin(&tstats->syncp);
+ tstats->rx_packets++;
+ tstats->rx_bytes += skb->len;
+ u64_stats_update_end(&tstats->syncp);
+
+ return 0;
+}
+
+static int
+xfrmi_xmit2(struct sk_buff *skb, struct net_device *dev, struct flowi *fl)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net_device_stats *stats = &xi->dev->stats;
+ struct dst_entry *dst = skb_dst(skb);
+ unsigned int length = skb->len;
+ struct net_device *tdev;
+ struct xfrm_state *x;
+ int err = -1;
+ int mtu;
+
+ if (!dst)
+ goto tx_err_link_failure;
+
+ dst_hold(dst);
+ dst = xfrm_lookup_with_ifid(xi->net, dst, fl, NULL, 0, xi->p.if_id);
+ if (IS_ERR(dst)) {
+ err = PTR_ERR(dst);
+ dst = NULL;
+ goto tx_err_link_failure;
+ }
+
+ x = dst->xfrm;
+ if (!x)
+ goto tx_err_link_failure;
+
+ if (x->if_id != xi->p.if_id)
+ goto tx_err_link_failure;
+
+ tdev = dst->dev;
+
+ if (tdev == dev) {
+ stats->collisions++;
+ net_warn_ratelimited("%s: Local routing loop detected!\n",
+ xi->p.name);
+ goto tx_err_dst_release;
+ }
+
+ mtu = dst_mtu(dst);
+ if (!skb->ignore_df && skb->len > mtu) {
+ skb_dst_update_pmtu(skb, mtu);
+
+ if (skb->protocol == htons(ETH_P_IPV6)) {
+ if (mtu < IPV6_MIN_MTU)
+ mtu = IPV6_MIN_MTU;
+
+ icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu);
+ } else {
+ icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
+ htonl(mtu));
+ }
+
+ dst_release(dst);
+ return -EMSGSIZE;
+ }
+
+ xfrmi_scrub_packet(skb, !net_eq(xi->net, dev_net(dev)));
+ skb_dst_set(skb, dst);
+ skb->dev = tdev;
+
+ err = dst_output(xi->net, skb->sk, skb);
+ if (net_xmit_eval(err) == 0) {
+ struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats);
+
+ u64_stats_update_begin(&tstats->syncp);
+ tstats->tx_bytes += length;
+ tstats->tx_packets++;
+ u64_stats_update_end(&tstats->syncp);
+ } else {
+ stats->tx_errors++;
+ stats->tx_aborted_errors++;
+ }
+
+ return 0;
+tx_err_link_failure:
+ stats->tx_carrier_errors++;
+ dst_link_failure(skb);
+tx_err_dst_release:
+ dst_release(dst);
+ return err;
+}
+
+static netdev_tx_t xfrmi_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net_device_stats *stats = &xi->dev->stats;
+ struct flowi fl;
+ int ret;
+
+ memset(&fl, 0, sizeof(fl));
+
+ switch (skb->protocol) {
+ case htons(ETH_P_IPV6):
+ xfrm_decode_session(skb, &fl, AF_INET6);
+ memset(IP6CB(skb), 0, sizeof(*IP6CB(skb)));
+ break;
+ case htons(ETH_P_IP):
+ xfrm_decode_session(skb, &fl, AF_INET);
+ memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
+ break;
+ default:
+ goto tx_err;
+ }
+
+ fl.flowi_oif = xi->phydev->ifindex;
+
+ ret = xfrmi_xmit2(skb, dev, &fl);
+ if (ret < 0)
+ goto tx_err;
+
+ return NETDEV_TX_OK;
+
+tx_err:
+ stats->tx_errors++;
+ stats->tx_dropped++;
+ kfree_skb(skb);
+ return NETDEV_TX_OK;
+}
+
+static int xfrmi4_err(struct sk_buff *skb, u32 info)
+{
+ const struct iphdr *iph = (const struct iphdr *)skb->data;
+ struct net *net = dev_net(skb->dev);
+ int protocol = iph->protocol;
+ struct ip_comp_hdr *ipch;
+ struct ip_esp_hdr *esph;
+ struct ip_auth_hdr *ah ;
+ struct xfrm_state *x;
+ struct xfrm_if *xi;
+ __be32 spi;
+
+ switch (protocol) {
+ case IPPROTO_ESP:
+ esph = (struct ip_esp_hdr *)(skb->data+(iph->ihl<<2));
+ spi = esph->spi;
+ break;
+ case IPPROTO_AH:
+ ah = (struct ip_auth_hdr *)(skb->data+(iph->ihl<<2));
+ spi = ah->spi;
+ break;
+ case IPPROTO_COMP:
+ ipch = (struct ip_comp_hdr *)(skb->data+(iph->ihl<<2));
+ spi = htonl(ntohs(ipch->cpi));
+ break;
+ default:
+ return 0;
+ }
+
+ switch (icmp_hdr(skb)->type) {
+ case ICMP_DEST_UNREACH:
+ if (icmp_hdr(skb)->code != ICMP_FRAG_NEEDED)
+ return 0;
+ case ICMP_REDIRECT:
+ break;
+ default:
+ return 0;
+ }
+
+ x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
+ spi, protocol, AF_INET);
+ if (!x)
+ return 0;
+
+ xi = xfrmi_lookup(net, x);
+ if (!xi) {
+ xfrm_state_put(x);
+ return -1;
+ }
+
+ if (icmp_hdr(skb)->type == ICMP_DEST_UNREACH)
+ ipv4_update_pmtu(skb, net, info, 0, 0, protocol, 0);
+ else
+ ipv4_redirect(skb, net, 0, 0, protocol, 0);
+ xfrm_state_put(x);
+
+ return 0;
+}
+
+static int xfrmi6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
+ u8 type, u8 code, int offset, __be32 info)
+{
+ const struct ipv6hdr *iph = (const struct ipv6hdr *)skb->data;
+ struct net *net = dev_net(skb->dev);
+ int protocol = iph->nexthdr;
+ struct ip_comp_hdr *ipch;
+ struct ip_esp_hdr *esph;
+ struct ip_auth_hdr *ah;
+ struct xfrm_state *x;
+ struct xfrm_if *xi;
+ __be32 spi;
+
+ switch (protocol) {
+ case IPPROTO_ESP:
+ esph = (struct ip_esp_hdr *)(skb->data + offset);
+ spi = esph->spi;
+ break;
+ case IPPROTO_AH:
+ ah = (struct ip_auth_hdr *)(skb->data + offset);
+ spi = ah->spi;
+ break;
+ case IPPROTO_COMP:
+ ipch = (struct ip_comp_hdr *)(skb->data + offset);
+ spi = htonl(ntohs(ipch->cpi));
+ break;
+ default:
+ return 0;
+ }
+
+ if (type != ICMPV6_PKT_TOOBIG &&
+ type != NDISC_REDIRECT)
+ return 0;
+
+ x = xfrm_state_lookup(net, skb->mark, (const xfrm_address_t *)&iph->daddr,
+ spi, protocol, AF_INET6);
+ if (!x)
+ return 0;
+
+ xi = xfrmi_lookup(net, x);
+ if (!xi) {
+ xfrm_state_put(x);
+ return -1;
+ }
+
+ if (type == NDISC_REDIRECT)
+ ip6_redirect(skb, net, skb->dev->ifindex, 0,
+ sock_net_uid(net, NULL));
+ else
+ ip6_update_pmtu(skb, net, info, 0, 0, sock_net_uid(net, NULL));
+ xfrm_state_put(x);
+
+ return 0;
+}
+
+static int xfrmi_change(struct xfrm_if *xi, const struct xfrm_if_parms *p)
+{
+ if (xi->p.link != p->link)
+ return -EINVAL;
+
+ xi->p.if_id = p->if_id;
+
+ return 0;
+}
+
+static int xfrmi_update(struct xfrm_if *xi, struct xfrm_if_parms *p)
+{
+ struct net *net = dev_net(xi->dev);
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+ int err;
+
+ xfrmi_unlink(xfrmn, xi);
+ synchronize_net();
+ err = xfrmi_change(xi, p);
+ xfrmi_link(xfrmn, xi);
+ netdev_state_change(xi->dev);
+ return err;
+}
+
+static void xfrmi_get_stats64(struct net_device *dev,
+ struct rtnl_link_stats64 *s)
+{
+ int cpu;
+
+ if (!dev->tstats)
+ return;
+
+ for_each_possible_cpu(cpu) {
+ struct pcpu_sw_netstats *stats;
+ struct pcpu_sw_netstats tmp;
+ int start;
+
+ stats = per_cpu_ptr(dev->tstats, cpu);
+ do {
+ start = u64_stats_fetch_begin_irq(&stats->syncp);
+ tmp.rx_packets = stats->rx_packets;
+ tmp.rx_bytes = stats->rx_bytes;
+ tmp.tx_packets = stats->tx_packets;
+ tmp.tx_bytes = stats->tx_bytes;
+ } while (u64_stats_fetch_retry_irq(&stats->syncp, start));
+
+ s->rx_packets += tmp.rx_packets;
+ s->rx_bytes += tmp.rx_bytes;
+ s->tx_packets += tmp.tx_packets;
+ s->tx_bytes += tmp.tx_bytes;
+ }
+
+ s->rx_dropped = dev->stats.rx_dropped;
+ s->tx_dropped = dev->stats.tx_dropped;
+}
+
+static int xfrmi_get_iflink(const struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+
+ return xi->phydev->ifindex;
+}
+
+
+static const struct net_device_ops xfrmi_netdev_ops = {
+ .ndo_init = xfrmi_dev_init,
+ .ndo_uninit = xfrmi_dev_uninit,
+ .ndo_start_xmit = xfrmi_xmit,
+ .ndo_get_stats64 = xfrmi_get_stats64,
+ .ndo_get_iflink = xfrmi_get_iflink,
+};
+
+static void xfrmi_dev_setup(struct net_device *dev)
+{
+ dev->netdev_ops = &xfrmi_netdev_ops;
+ dev->type = ARPHRD_NONE;
+ dev->hard_header_len = ETH_HLEN;
+ dev->min_header_len = ETH_HLEN;
+ dev->mtu = ETH_DATA_LEN;
+ dev->min_mtu = ETH_MIN_MTU;
+ dev->max_mtu = ETH_DATA_LEN;
+ dev->addr_len = ETH_ALEN;
+ dev->flags = IFF_NOARP;
+ dev->needs_free_netdev = true;
+ dev->priv_destructor = xfrmi_dev_free;
+ netif_keep_dst(dev);
+}
+
+static int xfrmi_dev_init(struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net_device *phydev = xi->phydev;
+ int err;
+
+ dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
+ if (!dev->tstats)
+ return -ENOMEM;
+
+ err = gro_cells_init(&xi->gro_cells, dev);
+ if (err) {
+ free_percpu(dev->tstats);
+ return err;
+ }
+
+ dev->features |= NETIF_F_LLTX;
+
+ dev->needed_headroom = phydev->needed_headroom;
+ dev->needed_tailroom = phydev->needed_tailroom;
+
+ if (is_zero_ether_addr(dev->dev_addr))
+ eth_hw_addr_inherit(dev, phydev);
+ if (is_zero_ether_addr(dev->broadcast))
+ memcpy(dev->broadcast, phydev->broadcast, dev->addr_len);
+
+ return 0;
+}
+
+static int xfrmi_validate(struct nlattr *tb[], struct nlattr *data[],
+ struct netlink_ext_ack *extack)
+{
+ return 0;
+}
+
+static void xfrmi_netlink_parms(struct nlattr *data[],
+ struct xfrm_if_parms *parms)
+{
+ memset(parms, 0, sizeof(*parms));
+
+ if (!data)
+ return;
+
+ if (data[IFLA_XFRM_LINK])
+ parms->link = nla_get_u32(data[IFLA_XFRM_LINK]);
+
+ if (data[IFLA_XFRM_IF_ID])
+ parms->if_id = nla_get_u32(data[IFLA_XFRM_IF_ID]);
+}
+
+static int xfrmi_newlink(struct net *src_net, struct net_device *dev,
+ struct nlattr *tb[], struct nlattr *data[],
+ struct netlink_ext_ack *extack)
+{
+ struct net *net = dev_net(dev);
+ struct xfrm_if_parms *p;
+ struct xfrm_if *xi;
+
+ xi = netdev_priv(dev);
+ p = &xi->p;
+
+ xfrmi_netlink_parms(data, p);
+
+ if (!tb[IFLA_IFNAME])
+ return -EINVAL;
+
+ nla_strlcpy(p->name, tb[IFLA_IFNAME], IFNAMSIZ);
+
+ xi = xfrmi_locate(net, p, 1);
+ return PTR_ERR_OR_ZERO(xi);
+}
+
+static void xfrmi_dellink(struct net_device *dev, struct list_head *head)
+{
+ unregister_netdevice_queue(dev, head);
+}
+
+static int xfrmi_changelink(struct net_device *dev, struct nlattr *tb[],
+ struct nlattr *data[],
+ struct netlink_ext_ack *extack)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct net *net = dev_net(dev);
+
+ xfrmi_netlink_parms(data, &xi->p);
+
+ xi = xfrmi_locate(net, &xi->p, 0);
+
+ if (IS_ERR_OR_NULL(xi)) {
+ xi = netdev_priv(dev);
+ } else {
+ if (xi->dev != dev)
+ return -EEXIST;
+ }
+
+ return xfrmi_update(xi, &xi->p);
+}
+
+static size_t xfrmi_get_size(const struct net_device *dev)
+{
+ return
+ /* IFLA_XFRM_LINK */
+ nla_total_size(4) +
+ /* IFLA_XFRM_IF_ID */
+ nla_total_size(4) +
+ 0;
+}
+
+static int xfrmi_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+ struct xfrm_if_parms *parm = &xi->p;
+
+ if (nla_put_u32(skb, IFLA_XFRM_LINK, parm->link) ||
+ nla_put_u32(skb, IFLA_XFRM_IF_ID, parm->if_id))
+ goto nla_put_failure;
+ return 0;
+
+nla_put_failure:
+ return -EMSGSIZE;
+}
+
+struct net *xfrmi_get_link_net(const struct net_device *dev)
+{
+ struct xfrm_if *xi = netdev_priv(dev);
+
+ return dev_net(xi->phydev);
+}
+
+static const struct nla_policy xfrmi_policy[IFLA_XFRM_MAX + 1] = {
+ [IFLA_XFRM_LINK] = { .type = NLA_U32 },
+ [IFLA_XFRM_IF_ID] = { .type = NLA_U32 },
+};
+
+static struct rtnl_link_ops xfrmi_link_ops __read_mostly = {
+ .kind = "xfrm",
+ .maxtype = IFLA_XFRM_MAX,
+ .policy = xfrmi_policy,
+ .priv_size = sizeof(struct xfrm_if),
+ .setup = xfrmi_dev_setup,
+ .validate = xfrmi_validate,
+ .newlink = xfrmi_newlink,
+ .dellink = xfrmi_dellink,
+ .changelink = xfrmi_changelink,
+ .get_size = xfrmi_get_size,
+ .fill_info = xfrmi_fill_info,
+ .get_link_net = xfrmi_get_link_net,
+};
+
+static void __net_exit xfrmi_destroy_interfaces(struct xfrmi_net *xfrmn)
+{
+ struct xfrm_if *xi;
+ LIST_HEAD(list);
+
+ xi = rtnl_dereference(xfrmn->xfrmi[0]);
+ if (!xi)
+ return;
+
+ unregister_netdevice_queue(xi->dev, &list);
+ unregister_netdevice_many(&list);
+}
+
+static int __net_init xfrmi_init_net(struct net *net)
+{
+ return 0;
+}
+
+static void __net_exit xfrmi_exit_net(struct net *net)
+{
+ struct xfrmi_net *xfrmn = net_generic(net, xfrmi_net_id);
+
+ rtnl_lock();
+ xfrmi_destroy_interfaces(xfrmn);
+ rtnl_unlock();
+}
+
+static struct pernet_operations xfrmi_net_ops = {
+ .init = xfrmi_init_net,
+ .exit = xfrmi_exit_net,
+ .id = &xfrmi_net_id,
+ .size = sizeof(struct xfrmi_net),
+};
+
+static struct xfrm6_protocol xfrmi_esp6_protocol __read_mostly = {
+ .handler = xfrm6_rcv,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi6_err,
+ .priority = 10,
+};
+
+static struct xfrm6_protocol xfrmi_ah6_protocol __read_mostly = {
+ .handler = xfrm6_rcv,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi6_err,
+ .priority = 10,
+};
+
+static struct xfrm6_protocol xfrmi_ipcomp6_protocol __read_mostly = {
+ .handler = xfrm6_rcv,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi6_err,
+ .priority = 10,
+};
+
+static struct xfrm4_protocol xfrmi_esp4_protocol __read_mostly = {
+ .handler = xfrm4_rcv,
+ .input_handler = xfrm_input,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi4_err,
+ .priority = 10,
+};
+
+static struct xfrm4_protocol xfrmi_ah4_protocol __read_mostly = {
+ .handler = xfrm4_rcv,
+ .input_handler = xfrm_input,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi4_err,
+ .priority = 10,
+};
+
+static struct xfrm4_protocol xfrmi_ipcomp4_protocol __read_mostly = {
+ .handler = xfrm4_rcv,
+ .input_handler = xfrm_input,
+ .cb_handler = xfrmi_rcv_cb,
+ .err_handler = xfrmi4_err,
+ .priority = 10,
+};
+
+static int __init xfrmi4_init(void)
+{
+ int err;
+
+ err = xfrm4_protocol_register(&xfrmi_esp4_protocol, IPPROTO_ESP);
+ if (err < 0)
+ goto xfrm_proto_esp_failed;
+ err = xfrm4_protocol_register(&xfrmi_ah4_protocol, IPPROTO_AH);
+ if (err < 0)
+ goto xfrm_proto_ah_failed;
+ err = xfrm4_protocol_register(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
+ if (err < 0)
+ goto xfrm_proto_comp_failed;
+
+ return 0;
+
+xfrm_proto_comp_failed:
+ xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
+xfrm_proto_ah_failed:
+ xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
+xfrm_proto_esp_failed:
+ return err;
+}
+
+static void xfrmi4_fini(void)
+{
+ xfrm4_protocol_deregister(&xfrmi_ipcomp4_protocol, IPPROTO_COMP);
+ xfrm4_protocol_deregister(&xfrmi_ah4_protocol, IPPROTO_AH);
+ xfrm4_protocol_deregister(&xfrmi_esp4_protocol, IPPROTO_ESP);
+}
+
+static int __init xfrmi6_init(void)
+{
+ int err;
+
+ err = xfrm6_protocol_register(&xfrmi_esp6_protocol, IPPROTO_ESP);
+ if (err < 0)
+ goto xfrm_proto_esp_failed;
+ err = xfrm6_protocol_register(&xfrmi_ah6_protocol, IPPROTO_AH);
+ if (err < 0)
+ goto xfrm_proto_ah_failed;
+ err = xfrm6_protocol_register(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
+ if (err < 0)
+ goto xfrm_proto_comp_failed;
+
+ return 0;
+
+xfrm_proto_comp_failed:
+ xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
+xfrm_proto_ah_failed:
+ xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
+xfrm_proto_esp_failed:
+ return err;
+}
+
+static void xfrmi6_fini(void)
+{
+ xfrm6_protocol_deregister(&xfrmi_ipcomp6_protocol, IPPROTO_COMP);
+ xfrm6_protocol_deregister(&xfrmi_ah6_protocol, IPPROTO_AH);
+ xfrm6_protocol_deregister(&xfrmi_esp6_protocol, IPPROTO_ESP);
+}
+
+static const struct xfrm_if_cb xfrm_if_cb = {
+ .decode_session = xfrmi_decode_session,
+};
+
+static int __init xfrmi_init(void)
+{
+ const char *msg;
+ int err;
+
+ pr_info("IPsec XFRM device driver\n");
+
+ msg = "tunnel device";
+ err = register_pernet_device(&xfrmi_net_ops);
+ if (err < 0)
+ goto pernet_dev_failed;
+
+ msg = "xfrm4 protocols";
+ err = xfrmi4_init();
+ if (err < 0)
+ goto xfrmi4_failed;
+
+ msg = "xfrm6 protocols";
+ err = xfrmi6_init();
+ if (err < 0)
+ goto xfrmi6_failed;
+
+
+ msg = "netlink interface";
+ err = rtnl_link_register(&xfrmi_link_ops);
+ if (err < 0)
+ goto rtnl_link_failed;
+
+ xfrm_if_register_cb(&xfrm_if_cb);
+
+ return err;
+
+rtnl_link_failed:
+ xfrmi6_fini();
+xfrmi6_failed:
+ xfrmi4_fini();
+xfrmi4_failed:
+ unregister_pernet_device(&xfrmi_net_ops);
+pernet_dev_failed:
+ pr_err("xfrmi init: failed to register %s\n", msg);
+ return err;
+}
+
+static void __exit xfrmi_fini(void)
+{
+ xfrm_if_unregister_cb();
+ rtnl_link_unregister(&xfrmi_link_ops);
+ xfrmi4_fini();
+ xfrmi6_fini();
+ unregister_pernet_device(&xfrmi_net_ops);
+}
+
+module_init(xfrmi_init);
+module_exit(xfrmi_fini);
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_RTNL_LINK("xfrm");
+MODULE_ALIAS_NETDEV("xfrm0");
+MODULE_AUTHOR("Steffen Klassert");
+MODULE_DESCRIPTION("XFRM virtual interface");
diff --git a/net/xfrm/xfrm_output.c b/net/xfrm/xfrm_output.c
index 35610cc..0f26404 100644
--- a/net/xfrm/xfrm_output.c
+++ b/net/xfrm/xfrm_output.c
@@ -66,8 +66,7 @@
goto error_nolock;
}
- if (x->props.output_mark)
- skb->mark = x->props.output_mark;
+ skb->mark = xfrm_smark_get(skb->mark, x);
err = x->outer_mode->output(x, skb);
if (err) {
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index a6c0027..04cd312 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -47,6 +47,9 @@
static DEFINE_PER_CPU(struct xfrm_dst *, xfrm_last_dst);
static struct work_struct *xfrm_pcpu_work __read_mostly;
+static DEFINE_SPINLOCK(xfrm_if_cb_lock);
+static struct xfrm_if_cb const __rcu *xfrm_if_cb __read_mostly;
+
static DEFINE_SPINLOCK(xfrm_policy_afinfo_lock);
static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
__read_mostly;
@@ -119,6 +122,12 @@
return afinfo;
}
+/* Called with rcu_read_lock(). */
+static const struct xfrm_if_cb *xfrm_if_get_cb(void)
+{
+ return rcu_dereference(xfrm_if_cb);
+}
+
struct dst_entry *__xfrm_dst_lookup(struct net *net, int tos, int oif,
const xfrm_address_t *saddr,
const xfrm_address_t *daddr,
@@ -748,6 +757,7 @@
newpos = NULL;
hlist_for_each_entry(pol, chain, bydst) {
if (pol->type == policy->type &&
+ pol->if_id == policy->if_id &&
!selector_cmp(&pol->selector, &policy->selector) &&
xfrm_policy_mark_match(policy, pol) &&
xfrm_sec_ctx_match(pol->security, policy->security) &&
@@ -799,8 +809,9 @@
}
EXPORT_SYMBOL(xfrm_policy_insert);
-struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u8 type,
- int dir, struct xfrm_selector *sel,
+struct xfrm_policy *xfrm_policy_bysel_ctx(struct net *net, u32 mark, u32 if_id,
+ u8 type, int dir,
+ struct xfrm_selector *sel,
struct xfrm_sec_ctx *ctx, int delete,
int *err)
{
@@ -813,6 +824,7 @@
ret = NULL;
hlist_for_each_entry(pol, chain, bydst) {
if (pol->type == type &&
+ pol->if_id == if_id &&
(mark & pol->mark.m) == pol->mark.v &&
!selector_cmp(sel, &pol->selector) &&
xfrm_sec_ctx_match(ctx, pol->security)) {
@@ -838,8 +850,9 @@
}
EXPORT_SYMBOL(xfrm_policy_bysel_ctx);
-struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u8 type,
- int dir, u32 id, int delete, int *err)
+struct xfrm_policy *xfrm_policy_byid(struct net *net, u32 mark, u32 if_id,
+ u8 type, int dir, u32 id, int delete,
+ int *err)
{
struct xfrm_policy *pol, *ret;
struct hlist_head *chain;
@@ -854,6 +867,7 @@
ret = NULL;
hlist_for_each_entry(pol, chain, byidx) {
if (pol->type == type && pol->index == id &&
+ pol->if_id == if_id &&
(mark & pol->mark.m) == pol->mark.v) {
xfrm_pol_hold(pol);
if (delete) {
@@ -1057,13 +1071,14 @@
*/
static int xfrm_policy_match(const struct xfrm_policy *pol,
const struct flowi *fl,
- u8 type, u16 family, int dir)
+ u8 type, u16 family, int dir, u32 if_id)
{
const struct xfrm_selector *sel = &pol->selector;
int ret = -ESRCH;
bool match;
if (pol->family != family ||
+ pol->if_id != if_id ||
(fl->flowi_mark & pol->mark.m) != pol->mark.v ||
pol->type != type)
return ret;
@@ -1078,7 +1093,8 @@
static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
const struct flowi *fl,
- u16 family, u8 dir)
+ u16 family, u8 dir,
+ u32 if_id)
{
int err;
struct xfrm_policy *pol, *ret;
@@ -1102,7 +1118,7 @@
priority = ~0U;
ret = NULL;
hlist_for_each_entry_rcu(pol, chain, bydst) {
- err = xfrm_policy_match(pol, fl, type, family, dir);
+ err = xfrm_policy_match(pol, fl, type, family, dir, if_id);
if (err) {
if (err == -ESRCH)
continue;
@@ -1121,7 +1137,7 @@
if ((pol->priority >= priority) && ret)
break;
- err = xfrm_policy_match(pol, fl, type, family, dir);
+ err = xfrm_policy_match(pol, fl, type, family, dir, if_id);
if (err) {
if (err == -ESRCH)
continue;
@@ -1146,21 +1162,25 @@
return ret;
}
-static struct xfrm_policy *
-xfrm_policy_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir)
+static struct xfrm_policy *xfrm_policy_lookup(struct net *net,
+ const struct flowi *fl,
+ u16 family, u8 dir, u32 if_id)
{
#ifdef CONFIG_XFRM_SUB_POLICY
struct xfrm_policy *pol;
- pol = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_SUB, fl, family, dir);
+ pol = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_SUB, fl, family,
+ dir, if_id);
if (pol != NULL)
return pol;
#endif
- return xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN, fl, family, dir);
+ return xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN, fl, family,
+ dir, if_id);
}
static struct xfrm_policy *xfrm_sk_policy_lookup(const struct sock *sk, int dir,
- const struct flowi *fl, u16 family)
+ const struct flowi *fl,
+ u16 family, u32 if_id)
{
struct xfrm_policy *pol;
@@ -1178,7 +1198,8 @@
match = xfrm_selector_match(&pol->selector, fl, family);
if (match) {
- if ((sk->sk_mark & pol->mark.m) != pol->mark.v) {
+ if ((sk->sk_mark & pol->mark.m) != pol->mark.v ||
+ pol->if_id != if_id) {
pol = NULL;
goto out;
}
@@ -1306,6 +1327,7 @@
newp->lft = old->lft;
newp->curlft = old->curlft;
newp->mark = old->mark;
+ newp->if_id = old->if_id;
newp->action = old->action;
newp->flags = old->flags;
newp->xfrm_nr = old->xfrm_nr;
@@ -1391,7 +1413,8 @@
}
}
- x = xfrm_state_find(remote, local, fl, tmpl, policy, &error, family);
+ x = xfrm_state_find(remote, local, fl, tmpl, policy, &error,
+ family, policy->if_id);
if (x && x->km.state == XFRM_STATE_VALID) {
xfrm[nx++] = x;
@@ -1605,10 +1628,11 @@
dst_copy_metrics(dst1, dst);
if (xfrm[i]->props.mode != XFRM_MODE_TRANSPORT) {
+ __u32 mark = xfrm_smark_get(fl->flowi_mark, xfrm[i]);
+
family = xfrm[i]->props.family;
dst = xfrm_dst_lookup(xfrm[i], tos, fl->flowi_oif,
- &saddr, &daddr, family,
- xfrm[i]->props.output_mark);
+ &saddr, &daddr, family, mark);
err = PTR_ERR(dst);
if (IS_ERR(dst))
goto put_states;
@@ -1693,7 +1717,8 @@
pols[1] = xfrm_policy_lookup_bytype(xp_net(pols[0]),
XFRM_POLICY_TYPE_MAIN,
fl, family,
- XFRM_POLICY_OUT);
+ XFRM_POLICY_OUT,
+ pols[0]->if_id);
if (pols[1]) {
if (IS_ERR(pols[1])) {
xfrm_pols_put(pols, *num_pols);
@@ -2046,8 +2071,10 @@
goto out;
}
-static struct xfrm_dst *
-xfrm_bundle_lookup(struct net *net, const struct flowi *fl, u16 family, u8 dir, struct xfrm_flo *xflo)
+static struct xfrm_dst *xfrm_bundle_lookup(struct net *net,
+ const struct flowi *fl,
+ u16 family, u8 dir,
+ struct xfrm_flo *xflo, u32 if_id)
{
struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX];
int num_pols = 0, num_xfrms = 0, err;
@@ -2056,7 +2083,7 @@
/* Resolve policies to use if we couldn't get them from
* previous cache entry */
num_pols = 1;
- pols[0] = xfrm_policy_lookup(net, fl, family, dir);
+ pols[0] = xfrm_policy_lookup(net, fl, family, dir, if_id);
err = xfrm_expand_policies(fl, family, pols,
&num_pols, &num_xfrms);
if (err < 0)
@@ -2073,6 +2100,11 @@
if (IS_ERR(xdst)) {
err = PTR_ERR(xdst);
+ if (err == -EREMOTE) {
+ xfrm_pols_put(pols, num_pols);
+ return NULL;
+ }
+
if (err != -EAGAIN)
goto error;
goto make_dummy_bundle;
@@ -2122,14 +2154,19 @@
return ret;
}
-/* Main function: finds/creates a bundle for given flow.
+/* Finds/creates a bundle for given flow and if_id
*
* At the moment we eat a raw IP route. Mostly to speed up lookups
* on interfaces with disabled IPsec.
+ *
+ * xfrm_lookup uses an if_id of 0 by default, and is provided for
+ * compatibility
*/
-struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
- const struct flowi *fl,
- const struct sock *sk, int flags)
+struct dst_entry *xfrm_lookup_with_ifid(struct net *net,
+ struct dst_entry *dst_orig,
+ const struct flowi *fl,
+ const struct sock *sk,
+ int flags, u32 if_id)
{
struct xfrm_policy *pols[XFRM_POLICY_TYPE_MAX];
struct xfrm_dst *xdst;
@@ -2145,7 +2182,8 @@
sk = sk_const_to_full_sk(sk);
if (sk && sk->sk_policy[XFRM_POLICY_OUT]) {
num_pols = 1;
- pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl, family);
+ pols[0] = xfrm_sk_policy_lookup(sk, XFRM_POLICY_OUT, fl, family,
+ if_id);
err = xfrm_expand_policies(fl, family, pols,
&num_pols, &num_xfrms);
if (err < 0)
@@ -2166,6 +2204,9 @@
if (IS_ERR(xdst)) {
xfrm_pols_put(pols, num_pols);
err = PTR_ERR(xdst);
+ if (err == -EREMOTE)
+ goto nopol;
+
goto dropdst;
} else if (xdst == NULL) {
num_xfrms = 0;
@@ -2188,7 +2229,7 @@
!net->xfrm.policy_count[XFRM_POLICY_OUT])
goto nopol;
- xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo);
+ xdst = xfrm_bundle_lookup(net, fl, family, dir, &xflo, if_id);
if (xdst == NULL)
goto nopol;
if (IS_ERR(xdst)) {
@@ -2269,6 +2310,19 @@
xfrm_pols_put(pols, drop_pols);
return ERR_PTR(err);
}
+EXPORT_SYMBOL(xfrm_lookup_with_ifid);
+
+/* Main function: finds/creates a bundle for given flow.
+ *
+ * At the moment we eat a raw IP route. Mostly to speed up lookups
+ * on interfaces with disabled IPsec.
+ */
+struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
+ const struct flowi *fl, const struct sock *sk,
+ int flags)
+{
+ return xfrm_lookup_with_ifid(net, dst_orig, fl, sk, flags, 0);
+}
EXPORT_SYMBOL(xfrm_lookup);
/* Callers of xfrm_lookup_route() must ensure a call to dst_output().
@@ -2367,6 +2421,7 @@
return -EAFNOSUPPORT;
afinfo->decode_session(skb, fl, reverse);
+
err = security_xfrm_decode_session(skb, &fl->flowi_secid);
rcu_read_unlock();
return err;
@@ -2397,6 +2452,19 @@
int reverse;
struct flowi fl;
int xerr_idx = -1;
+ const struct xfrm_if_cb *ifcb;
+ struct xfrm_if *xi;
+ u32 if_id = 0;
+
+ rcu_read_lock();
+ ifcb = xfrm_if_get_cb();
+
+ if (ifcb) {
+ xi = ifcb->decode_session(skb);
+ if (xi)
+ if_id = xi->p.if_id;
+ }
+ rcu_read_unlock();
reverse = dir & ~XFRM_POLICY_MASK;
dir &= XFRM_POLICY_MASK;
@@ -2424,7 +2492,7 @@
pol = NULL;
sk = sk_to_full_sk(sk);
if (sk && sk->sk_policy[dir]) {
- pol = xfrm_sk_policy_lookup(sk, dir, &fl, family);
+ pol = xfrm_sk_policy_lookup(sk, dir, &fl, family, if_id);
if (IS_ERR(pol)) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
return 0;
@@ -2432,7 +2500,7 @@
}
if (!pol)
- pol = xfrm_policy_lookup(net, &fl, family, dir);
+ pol = xfrm_policy_lookup(net, &fl, family, dir, if_id);
if (IS_ERR(pol)) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
@@ -2456,7 +2524,7 @@
if (pols[0]->type != XFRM_POLICY_TYPE_MAIN) {
pols[1] = xfrm_policy_lookup_bytype(net, XFRM_POLICY_TYPE_MAIN,
&fl, family,
- XFRM_POLICY_IN);
+ XFRM_POLICY_IN, if_id);
if (pols[1]) {
if (IS_ERR(pols[1])) {
XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR);
@@ -2816,6 +2884,21 @@
}
EXPORT_SYMBOL(xfrm_policy_unregister_afinfo);
+void xfrm_if_register_cb(const struct xfrm_if_cb *ifcb)
+{
+ spin_lock(&xfrm_if_cb_lock);
+ rcu_assign_pointer(xfrm_if_cb, ifcb);
+ spin_unlock(&xfrm_if_cb_lock);
+}
+EXPORT_SYMBOL(xfrm_if_register_cb);
+
+void xfrm_if_unregister_cb(void)
+{
+ RCU_INIT_POINTER(xfrm_if_cb, NULL);
+ synchronize_rcu();
+}
+EXPORT_SYMBOL(xfrm_if_unregister_cb);
+
#ifdef CONFIG_XFRM_STATISTICS
static int __net_init xfrm_statistics_init(struct net *net)
{
@@ -2997,6 +3080,9 @@
register_pernet_subsys(&xfrm_net_ops);
seqcount_init(&xfrm_policy_hash_generation);
xfrm_input_init();
+
+ RCU_INIT_POINTER(xfrm_if_cb, NULL);
+ synchronize_rcu();
}
#ifdef CONFIG_AUDITSYSCALL
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 6c4ec69..5e53ad6 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -931,7 +931,7 @@
xfrm_state_find(const xfrm_address_t *daddr, const xfrm_address_t *saddr,
const struct flowi *fl, struct xfrm_tmpl *tmpl,
struct xfrm_policy *pol, int *err,
- unsigned short family)
+ unsigned short family, u32 if_id)
{
static xfrm_address_t saddr_wildcard = { };
struct net *net = xp_net(pol);
@@ -955,6 +955,7 @@
if (x->props.family == encap_family &&
x->props.reqid == tmpl->reqid &&
(mark & x->mark.m) == x->mark.v &&
+ x->if_id == if_id &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
tmpl->mode == x->props.mode &&
@@ -971,6 +972,7 @@
if (x->props.family == encap_family &&
x->props.reqid == tmpl->reqid &&
(mark & x->mark.m) == x->mark.v &&
+ x->if_id == if_id &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
xfrm_addr_equal(&x->id.daddr, daddr, encap_family) &&
tmpl->mode == x->props.mode &&
@@ -1010,6 +1012,7 @@
* to current session. */
xfrm_init_tempstate(x, fl, tmpl, daddr, saddr, family);
memcpy(&x->mark, &pol->mark, sizeof(x->mark));
+ x->if_id = if_id;
error = security_xfrm_state_alloc_acquire(x, pol->security, fl->flowi_secid);
if (error) {
@@ -1067,7 +1070,7 @@
}
struct xfrm_state *
-xfrm_stateonly_find(struct net *net, u32 mark,
+xfrm_stateonly_find(struct net *net, u32 mark, u32 if_id,
xfrm_address_t *daddr, xfrm_address_t *saddr,
unsigned short family, u8 mode, u8 proto, u32 reqid)
{
@@ -1080,6 +1083,7 @@
if (x->props.family == family &&
x->props.reqid == reqid &&
(mark & x->mark.m) == x->mark.v &&
+ x->if_id == if_id &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
xfrm_state_addr_check(x, daddr, saddr, family) &&
mode == x->props.mode &&
@@ -1160,11 +1164,13 @@
struct xfrm_state *x;
unsigned int h;
u32 mark = xnew->mark.v & xnew->mark.m;
+ u32 if_id = xnew->if_id;
h = xfrm_dst_hash(net, &xnew->id.daddr, &xnew->props.saddr, reqid, family);
hlist_for_each_entry(x, net->xfrm.state_bydst+h, bydst) {
if (x->props.family == family &&
x->props.reqid == reqid &&
+ x->if_id == if_id &&
(mark & x->mark.m) == x->mark.v &&
xfrm_addr_equal(&x->id.daddr, &xnew->id.daddr, family) &&
xfrm_addr_equal(&x->props.saddr, &xnew->props.saddr, family))
@@ -1187,7 +1193,7 @@
static struct xfrm_state *__find_acq_core(struct net *net,
const struct xfrm_mark *m,
unsigned short family, u8 mode,
- u32 reqid, u8 proto,
+ u32 reqid, u32 if_id, u8 proto,
const xfrm_address_t *daddr,
const xfrm_address_t *saddr,
int create)
@@ -1242,6 +1248,7 @@
x->props.family = family;
x->props.mode = mode;
x->props.reqid = reqid;
+ x->if_id = if_id;
x->mark.v = m->v;
x->mark.m = m->m;
x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
@@ -1296,7 +1303,7 @@
if (use_spi && !x1)
x1 = __find_acq_core(net, &x->mark, family, x->props.mode,
- x->props.reqid, x->id.proto,
+ x->props.reqid, x->if_id, x->id.proto,
&x->id.daddr, &x->props.saddr, 0);
__xfrm_state_bump_genids(x);
@@ -1395,6 +1402,7 @@
x->props.flags = orig->props.flags;
x->props.extra_flags = orig->props.extra_flags;
+ x->if_id = orig->if_id;
x->tfcpad = orig->tfcpad;
x->replay_maxdiff = orig->replay_maxdiff;
x->replay_maxage = orig->replay_maxage;
@@ -1550,6 +1558,19 @@
if (x1->curlft.use_time)
xfrm_state_check_expire(x1);
+ if (x->props.smark.m || x->props.smark.v || x->if_id) {
+ spin_lock_bh(&net->xfrm.xfrm_state_lock);
+
+ if (x->props.smark.m || x->props.smark.v)
+ x1->props.smark = x->props.smark;
+
+ if (x->if_id)
+ x1->if_id = x->if_id;
+
+ __xfrm_state_bump_genids(x1);
+ spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+ }
+
err = 0;
x->km.state = XFRM_STATE_DEAD;
__xfrm_state_put(x);
@@ -1613,13 +1634,13 @@
struct xfrm_state *
xfrm_find_acq(struct net *net, const struct xfrm_mark *mark, u8 mode, u32 reqid,
- u8 proto, const xfrm_address_t *daddr,
+ u32 if_id, u8 proto, const xfrm_address_t *daddr,
const xfrm_address_t *saddr, int create, unsigned short family)
{
struct xfrm_state *x;
spin_lock_bh(&net->xfrm.xfrm_state_lock);
- x = __find_acq_core(net, mark, family, mode, reqid, proto, daddr, saddr, create);
+ x = __find_acq_core(net, mark, family, mode, reqid, if_id, proto, daddr, saddr, create);
spin_unlock_bh(&net->xfrm.xfrm_state_lock);
return x;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 5554d28..967a602 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -527,6 +527,19 @@
x->replay_maxdiff = nla_get_u32(rt);
}
+static void xfrm_smark_init(struct nlattr **attrs, struct xfrm_mark *m)
+{
+ if (attrs[XFRMA_SET_MARK]) {
+ m->v = nla_get_u32(attrs[XFRMA_SET_MARK]);
+ if (attrs[XFRMA_SET_MARK_MASK])
+ m->m = nla_get_u32(attrs[XFRMA_SET_MARK_MASK]);
+ else
+ m->m = 0xffffffff;
+ } else {
+ m->v = m->m = 0;
+ }
+}
+
static struct xfrm_state *xfrm_state_construct(struct net *net,
struct xfrm_usersa_info *p,
struct nlattr **attrs,
@@ -579,8 +592,10 @@
xfrm_mark_get(attrs, &x->mark);
- if (attrs[XFRMA_OUTPUT_MARK])
- x->props.output_mark = nla_get_u32(attrs[XFRMA_OUTPUT_MARK]);
+ xfrm_smark_init(attrs, &x->props.smark);
+
+ if (attrs[XFRMA_IF_ID])
+ x->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
err = __xfrm_init_state(x, false, attrs[XFRMA_OFFLOAD_DEV]);
if (err)
@@ -820,6 +835,18 @@
return 0;
}
+static int xfrm_smark_put(struct sk_buff *skb, struct xfrm_mark *m)
+{
+ int ret = 0;
+
+ if (m->v | m->m) {
+ ret = nla_put_u32(skb, XFRMA_SET_MARK, m->v);
+ if (!ret)
+ ret = nla_put_u32(skb, XFRMA_SET_MARK_MASK, m->m);
+ }
+ return ret;
+}
+
/* Don't change this without updating xfrm_sa_len! */
static int copy_to_user_state_extra(struct xfrm_state *x,
struct xfrm_usersa_info *p,
@@ -883,6 +910,11 @@
ret = xfrm_mark_put(skb, &x->mark);
if (ret)
goto out;
+
+ ret = xfrm_smark_put(skb, &x->props.smark);
+ if (ret)
+ goto out;
+
if (x->replay_esn)
ret = nla_put(skb, XFRMA_REPLAY_ESN_VAL,
xfrm_replay_state_esn_len(x->replay_esn),
@@ -896,8 +928,8 @@
ret = copy_user_offload(&x->xso, skb);
if (ret)
goto out;
- if (x->props.output_mark) {
- ret = nla_put_u32(skb, XFRMA_OUTPUT_MARK, x->props.output_mark);
+ if (x->if_id) {
+ ret = nla_put_u32(skb, XFRMA_IF_ID, x->if_id);
if (ret)
goto out;
}
@@ -1249,6 +1281,7 @@
int err;
u32 mark;
struct xfrm_mark m;
+ u32 if_id = 0;
p = nlmsg_data(nlh);
err = verify_spi_info(p->info.id.proto, p->min, p->max);
@@ -1261,6 +1294,10 @@
x = NULL;
mark = xfrm_mark_get(attrs, &m);
+
+ if (attrs[XFRMA_IF_ID])
+ if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
if (p->info.seq) {
x = xfrm_find_acq_byseq(net, mark, p->info.seq);
if (x && !xfrm_addr_equal(&x->id.daddr, daddr, family)) {
@@ -1271,7 +1308,7 @@
if (!x)
x = xfrm_find_acq(net, &m, p->info.mode, p->info.reqid,
- p->info.id.proto, daddr,
+ if_id, p->info.id.proto, daddr,
&p->info.saddr, 1,
family);
err = -ENOENT;
@@ -1559,6 +1596,9 @@
xfrm_mark_get(attrs, &xp->mark);
+ if (attrs[XFRMA_IF_ID])
+ xp->if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
return xp;
error:
*errp = err;
@@ -1706,6 +1746,8 @@
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -1787,6 +1829,7 @@
int delete;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);
+ u32 if_id = 0;
p = nlmsg_data(nlh);
delete = nlh->nlmsg_type == XFRM_MSG_DELPOLICY;
@@ -1799,8 +1842,11 @@
if (err)
return err;
+ if (attrs[XFRMA_IF_ID])
+ if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
if (p->index)
- xp = xfrm_policy_byid(net, mark, type, p->dir, p->index, delete, &err);
+ xp = xfrm_policy_byid(net, mark, if_id, type, p->dir, p->index, delete, &err);
else {
struct nlattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_sec_ctx *ctx;
@@ -1817,7 +1863,7 @@
if (err)
return err;
}
- xp = xfrm_policy_bysel_ctx(net, mark, type, p->dir, &p->sel,
+ xp = xfrm_policy_bysel_ctx(net, mark, if_id, type, p->dir, &p->sel,
ctx, delete, &err);
security_xfrm_policy_free(ctx);
}
@@ -1940,6 +1986,10 @@
if (err)
goto out_cancel;
+ err = xfrm_if_id_put(skb, x->if_id);
+ if (err)
+ goto out_cancel;
+
nlmsg_end(skb, nlh);
return 0;
@@ -2081,6 +2131,7 @@
int err = -ENOENT;
struct xfrm_mark m;
u32 mark = xfrm_mark_get(attrs, &m);
+ u32 if_id = 0;
err = copy_from_user_policy_type(&type, attrs);
if (err)
@@ -2090,8 +2141,11 @@
if (err)
return err;
+ if (attrs[XFRMA_IF_ID])
+ if_id = nla_get_u32(attrs[XFRMA_IF_ID]);
+
if (p->index)
- xp = xfrm_policy_byid(net, mark, type, p->dir, p->index, 0, &err);
+ xp = xfrm_policy_byid(net, mark, if_id, type, p->dir, p->index, 0, &err);
else {
struct nlattr *rt = attrs[XFRMA_SEC_CTX];
struct xfrm_sec_ctx *ctx;
@@ -2108,7 +2162,7 @@
if (err)
return err;
}
- xp = xfrm_policy_bysel_ctx(net, mark, type, p->dir,
+ xp = xfrm_policy_bysel_ctx(net, mark, if_id, type, p->dir,
&p->sel, ctx, 0, &err);
security_xfrm_policy_free(ctx);
}
@@ -2489,7 +2543,9 @@
[XFRMA_PROTO] = { .type = NLA_U8 },
[XFRMA_ADDRESS_FILTER] = { .len = sizeof(struct xfrm_address_filter) },
[XFRMA_OFFLOAD_DEV] = { .len = sizeof(struct xfrm_user_offload) },
- [XFRMA_OUTPUT_MARK] = { .len = NLA_U32 },
+ [XFRMA_SET_MARK] = { .type = NLA_U32 },
+ [XFRMA_SET_MARK_MASK] = { .type = NLA_U32 },
+ [XFRMA_IF_ID] = { .type = NLA_U32 },
};
static const struct nla_policy xfrma_spd_policy[XFRMA_SPD_MAX+1] = {
@@ -2621,6 +2677,10 @@
if (err)
return err;
+ err = xfrm_if_id_put(skb, x->if_id);
+ if (err)
+ return err;
+
nlmsg_end(skb, nlh);
return 0;
}
@@ -2714,8 +2774,12 @@
l += nla_total_size(sizeof(x->props.extra_flags));
if (x->xso.dev)
l += nla_total_size(sizeof(x->xso));
- if (x->props.output_mark)
- l += nla_total_size(sizeof(x->props.output_mark));
+ if (x->props.smark.v | x->props.smark.m) {
+ l += nla_total_size(sizeof(x->props.smark.v));
+ l += nla_total_size(sizeof(x->props.smark.m));
+ }
+ if (x->if_id)
+ l += nla_total_size(sizeof(x->if_id));
/* Must count x->lastused as it may become non-zero behind our back. */
l += nla_total_size_64bit(sizeof(u64));
@@ -2844,6 +2908,8 @@
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -2959,6 +3025,8 @@
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err) {
nlmsg_cancel(skb, nlh);
return err;
@@ -3038,6 +3106,8 @@
err = copy_to_user_policy_type(xp->type, skb);
if (!err)
err = xfrm_mark_put(skb, &xp->mark);
+ if (!err)
+ err = xfrm_if_id_put(skb, xp->if_id);
if (err)
goto out_free_skb;
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index fcbbecf..020097e 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -143,6 +143,37 @@
# Expands to either gcc or clang
cc-name = $(shell $(CC) -v 2>&1 | grep -q "clang version" && echo clang || echo gcc)
+# __cc-version
+# Returns compiler version
+__cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/$(cc-name)-version.sh $(CC))
+
+# __cc-fullversion
+# Returns full compiler version
+__cc-fullversion = $(shell $(CONFIG_SHELL) \
+ $(srctree)/scripts/$(cc-name)-version.sh -p $(CC))
+
+# __cc-ifversion
+# Matches compiler name and version
+# Usage: EXTRA_CFLAGS += $(call cc-if-name-version, gcc, -lt, 0402, -O1)
+__cc-ifversion = $(shell [ $(cc-name) = $(1) ] && [ $(__cc-version) $(2) $(3) ] && echo $(4) || echo $(5))
+
+# __cc-if-fullversion
+# Matches compiler name and full version
+# Usage: EXTRA_CFLAGS += $(call cc-if-name-fullversion, gcc, -lt, 040502, -O1)
+__cc-if-fullversion = $(shell [ $(cc-name) = $(1) ] && [ $(__cc-fullversion) $(2) $(3) ] && echo $(4) || echo $(5))
+
+# gcc-ifversion
+gcc-ifversion = $(call __cc-ifversion, gcc, $(1), $(2), $(3), $(4))
+
+# gcc-if-fullversion
+gcc-if-fullversion = (call __cc-if-fullversion, gcc, $(1), $(2), $(3), $(4))
+
+# clang-ifversion
+clang-ifversion = $(call __cc-ifversion, clang, $(1), $(2), $(3), $(4))
+
+# clang-if-fullversion
+clang-if-fullversion = (call __cc-if-fullversion, clang, $(1), $(2), $(3), $(4))
+
# cc-version
cc-version = $(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-version.sh $(CC))
@@ -150,9 +181,9 @@
cc-fullversion = $(shell $(CONFIG_SHELL) \
$(srctree)/scripts/gcc-version.sh -p $(CC))
-# cc-ifversion
-# Usage: EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
-cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
+# backward compatibility
+cc-ifversion = $(gcc-ifversion)
+cc-if-fullversion = $(gcc-if-fullversion)
# cc-if-fullversion
# Usage: EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
@@ -174,6 +205,10 @@
# Important: no spaces around options
ar-option = $(call try-run, $(AR) rc$(1) "$$TMP",$(1),$(2))
+# ld-name
+# Expands to either bfd or gold
+ld-name = $(shell $(LD) -v 2>&1 | grep -q "GNU gold" && echo gold || echo bfd)
+
# ld-version
# Note this is mainly for HJ Lu's 3 number binutil versions
ld-version = $(shell $(LD) --version | $(srctree)/scripts/ld-version.sh)
@@ -182,6 +217,18 @@
# Usage: $(call ld-ifversion, -ge, 22252, y)
ld-ifversion = $(shell [ $(ld-version) $(1) $(2) ] && echo $(3) || echo $(4))
+# __ld-ifversion
+# Usage: $(call __ld-ifversion, gold, -ge, 112000000, y)
+__ld-ifversion = $(shell [ $(ld-name) = $(1) ] && [ $(ld-version) $(2) $(3) ] && echo $(4) || echo $(5))
+
+# bfd-ifversion
+# Usage: $(call bfd-ifversion, -ge, 227000000, y)
+bfd-ifversion = $(call __ld-ifversion, bfd, $(1), $(2), $(3), $(4))
+
+# gold-ifversion
+# Usage: $(call gold-ifversion, -ge, 112000000, y)
+gold-ifversion = $(call __ld-ifversion, gold, $(1), $(2), $(3), $(4))
+
######
###
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 7143da0..b5b05f1 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -210,6 +210,23 @@
cmd_cc_o_c = $(CC) $(c_flags) -c -o $(@D)/.tmp_$(@F) $<
+ifdef CONFIG_LTO_CLANG
+# Generate .o.symversions files for each .o with exported symbols, and link these
+# to the kernel and/or modules at the end.
+cmd_modversions_c = \
+ if $(OBJDUMP) -h $(@D)/.tmp_$(@F) >/dev/null 2>/dev/null; then \
+ if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then \
+ $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
+ > $(@D)/$(@F).symversions; \
+ fi; \
+ else \
+ if $(LLVM_DIS) -o=- $(@D)/.tmp_$(@F) | grep -q __ksymtab; then \
+ $(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
+ > $(@D)/$(@F).symversions; \
+ fi; \
+ fi; \
+ mv -f $(@D)/.tmp_$(@F) $@;
+else
cmd_modversions_c = \
if $(OBJDUMP) -h $(@D)/.tmp_$(@F) | grep -q __ksymtab; then \
$(call cmd_gensymtypes_c,$(KBUILD_SYMTYPES),$(@:.o=.symtypes)) \
@@ -222,12 +239,19 @@
mv -f $(@D)/.tmp_$(@F) $@; \
fi;
endif
+endif
ifdef CONFIG_FTRACE_MCOUNT_RECORD
ifdef BUILD_C_RECORDMCOUNT
ifeq ("$(origin RECORDMCOUNT_WARN)", "command line")
RECORDMCOUNT_FLAGS = -w
endif
+
+ifdef CONFIG_LTO_CLANG
+# With LTO, we postpone running recordmcount until after the LTO link step, so
+# let's export the parameters for the link script.
+export RECORDMCOUNT_FLAGS
+else
# Due to recursion, we must skip empty.o.
# The empty.o file is created in the make process in order to determine
# the target endianness and word size. It is made before all other C
@@ -236,22 +260,28 @@
if [ $(@) != "scripts/mod/empty.o" ]; then \
$(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) "$(@)"; \
fi;
+endif
+
recordmcount_source := $(srctree)/scripts/recordmcount.c \
$(srctree)/scripts/recordmcount.h
-else
+else # !BUILD_C_RECORDMCOUNT
sub_cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
"$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \
"$(if $(CONFIG_64BIT),64,32)" \
"$(OBJDUMP)" "$(OBJCOPY)" "$(CC) $(KBUILD_CFLAGS)" \
"$(LD)" "$(NM)" "$(RM)" "$(MV)" \
"$(if $(part-of-module),1,0)" "$(@)";
+
recordmcount_source := $(srctree)/scripts/recordmcount.pl
endif # BUILD_C_RECORDMCOUNT
+
+ifndef CONFIG_LTO_CLANG
cmd_record_mcount = \
if [ "$(findstring $(CC_FLAGS_FTRACE),$(_c_flags))" = \
"$(CC_FLAGS_FTRACE)" ]; then \
$(sub_cmd_record_mcount) \
fi;
+endif
endif # CONFIG_FTRACE_MCOUNT_RECORD
ifdef CONFIG_STACK_VALIDATION
@@ -462,8 +492,29 @@
#
ifdef builtin-target
+ifdef CONFIG_LTO_CLANG
+ ifdef CONFIG_MODVERSIONS
+ # combine symversions for later processing
+ update_lto_symversions = \
+ rm -f $@.symversions; \
+ for i in $(filter-out FORCE,$^); do \
+ if [ -f $$i.symversions ]; then \
+ cat $$i.symversions \
+ >> $@.symversions; \
+ fi; \
+ done;
+ endif
+ # rebuild the symbol table with llvm-ar to include IR files
+ update_lto_symtable = ; \
+ mv -f $@ $@.tmp; \
+ $(LLVM_AR) rcsT$(KBUILD_ARFLAGS) $@ \
+ $$($(AR) t $@.tmp); \
+ rm -f $@.tmp
+endif
+
ifdef CONFIG_THIN_ARCHIVES
- cmd_make_builtin = rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS)
+ cmd_make_builtin = $(update_lto_symversions) \
+ rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS)
cmd_make_empty_builtin = rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS)
quiet_cmd_link_o_target = AR $@
else
@@ -504,7 +555,11 @@
quiet_cmd_link_l_target = AR $@
ifdef CONFIG_THIN_ARCHIVES
- cmd_link_l_target = rm -f $@; $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(lib-y)
+ cmd_link_l_target = \
+ $(update_lto_symversions) \
+ rm -f $@; \
+ $(AR) rcsTP$(KBUILD_ARFLAGS) $@ $(lib-y) \
+ $(update_lto_symtable)
else
cmd_link_l_target = rm -f $@; $(AR) rcs$(KBUILD_ARFLAGS) $@ $(lib-y)
endif
@@ -522,14 +577,36 @@
ref_prefix = EXTERN(
endif
-quiet_cmd_export_list = EXPORTS $@
-cmd_export_list = $(OBJDUMP) -h $< | \
- sed -ne '/___ksymtab/s/.*+\([^ ]*\).*/$(ref_prefix)\1)/p' >$(ksyms-lds);\
- rm -f $(dummy-object);\
+filter_export_list = sed -ne '/___ksymtab/s/.*+\([^ "]*\).*/$(ref_prefix)\1)/p'
+link_export_list = rm -f $(dummy-object);\
echo | $(CC) $(a_flags) -c -o $(dummy-object) -x assembler -;\
$(LD) $(ld_flags) -r -o $@ -T $(ksyms-lds) $(dummy-object);\
rm $(dummy-object) $(ksyms-lds)
+quiet_cmd_export_list = EXPORTS $@
+
+ifdef CONFIG_LTO_CLANG
+# objdump doesn't understand IR files and llvm-dis doesn't support archives,
+# so we'll walk through each file in the archive separately
+cmd_export_list = \
+ rm -f $(ksyms-lds); \
+ for o in $$($(AR) t $<); do \
+ if $(OBJDUMP) -h $$o >/dev/null 2>/dev/null; then \
+ $(OBJDUMP) -h $$o | \
+ $(filter_export_list) \
+ >>$(ksyms-lds); \
+ else \
+ $(LLVM_DIS) -o=- $$o | \
+ $(filter_export_list) \
+ >>$(ksyms-lds); \
+ fi; \
+ done; \
+ $(link_export_list)
+else
+cmd_export_list = $(OBJDUMP) -h $< | $(filter_export_list) >$(ksyms-lds); \
+ $(link_export_list)
+endif
+
$(obj)/lib-ksyms.o: $(lib-target) FORCE
$(call if_changed,export_list)
@@ -557,23 +634,32 @@
ifdef CONFIG_THIN_ARCHIVES
quiet_cmd_link_multi-y = AR $@
- cmd_link_multi-y = rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS) $@ $(link_multi_deps)
+ cmd_link_multi-y = $(update_lto_symversions) \
+ rm -f $@; $(AR) rcSTP$(KBUILD_ARFLAGS) $@ $(link_multi_deps) \
+ $(update_lto_symtable)
else
quiet_cmd_link_multi-y = LD $@
cmd_link_multi-y = $(cmd_link_multi-link)
endif
quiet_cmd_link_multi-m = LD [M] $@
-cmd_link_multi-m = $(cmd_link_multi-link)
+
+ifdef CONFIG_LTO_CLANG
+ # don't compile IR until needed
+ cmd_link_multi-m = $(cmd_link_multi-y)
+else
+ cmd_link_multi-m = $(cmd_link_multi-link)
+endif
$(multi-used-y): FORCE
$(call if_changed,link_multi-y)
-$(call multi_depend, $(multi-used-y), .o, -objs -y)
$(multi-used-m): FORCE
$(call if_changed,link_multi-m)
@{ echo $(@:.o=.ko); echo $(link_multi_deps); \
$(cmd_undef_syms); } > $(MODVERDIR)/$(@F:.o=.mod)
+
+$(call multi_depend, $(multi-used-y), .o, -objs -y)
$(call multi_depend, $(multi-used-m), .o, -objs -y -m)
targets += $(multi-used-y) $(multi-used-m)
diff --git a/scripts/Makefile.clean b/scripts/Makefile.clean
index 808d09f..11f4067 100644
--- a/scripts/Makefile.clean
+++ b/scripts/Makefile.clean
@@ -12,7 +12,7 @@
# The filename Kbuild has precedence over Makefile
kbuild-dir := $(if $(filter /%,$(src)),$(src),$(srctree)/$(src))
-include $(if $(wildcard $(kbuild-dir)/Kbuild), $(kbuild-dir)/Kbuild, $(kbuild-dir)/Makefile)
+-include $(if $(wildcard $(kbuild-dir)/Kbuild), $(kbuild-dir)/Kbuild, $(kbuild-dir)/Makefile)
# Figure out what we need to build from the various variables
# ==========================================================================
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index aac94d9..36fa07a 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -318,6 +318,12 @@
dtc-tmp = $(subst $(comma),_,$(dot-target).dts.tmp)
+# cat
+# ---------------------------------------------------------------------------
+# Concatentate multiple files together
+quiet_cmd_cat = CAT $@
+cmd_cat = (cat $(filter-out FORCE,$^) > $@) || (rm -f $@; false)
+
# Bzip2
# ---------------------------------------------------------------------------
diff --git a/scripts/Makefile.modinst b/scripts/Makefile.modinst
index 51ca024..96524c6 100644
--- a/scripts/Makefile.modinst
+++ b/scripts/Makefile.modinst
@@ -30,7 +30,7 @@
INSTALL_MOD_DIR ?= extra
ext-mod-dir = $(INSTALL_MOD_DIR)$(subst $(patsubst %/,%,$(KBUILD_EXTMOD)),,$(@D))
-modinst_dir = $(if $(KBUILD_EXTMOD),$(ext-mod-dir),kernel/$(@D))
+modinst_dir ?= $(if $(KBUILD_EXTMOD),$(ext-mod-dir),kernel/$(@D))
$(modules):
$(call cmd,modules_install,$(MODLIB)/$(modinst_dir))
diff --git a/scripts/Makefile.modpost b/scripts/Makefile.modpost
index 991db7d..acd7ea0 100644
--- a/scripts/Makefile.modpost
+++ b/scripts/Makefile.modpost
@@ -83,12 +83,28 @@
MODPOST_OPT=$(subst -i,-n,$(filter -i,$(MAKEFLAGS)))
+# If CONFIG_LTO_CLANG is enabled, .o files are either LLVM IR, or empty, so we
+# need to link them into actual objects before passing them to modpost
+modpost-ext = $(if $(CONFIG_LTO_CLANG),.lto,)
+
+ifdef CONFIG_LTO_CLANG
+quiet_cmd_cc_lto_link_modules = LD [M] $@
+cmd_cc_lto_link_modules = \
+ $(LD) $(ld_flags) -r -o $(@) \
+ $(shell [ -s $(@:$(modpost-ext).o=.o.symversions) ] && \
+ echo -T $(@:$(modpost-ext).o=.o.symversions)) \
+ --whole-archive $(filter-out FORCE,$^)
+
+$(modules:.ko=$(modpost-ext).o): %$(modpost-ext).o: %.o FORCE
+ $(call if_changed,cc_lto_link_modules)
+endif
+
# We can go over command line length here, so be careful.
quiet_cmd_modpost = MODPOST $(words $(filter-out vmlinux FORCE, $^)) modules
- cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/.o/' | $(modpost) $(MODPOST_OPT) -s -T -
+ cmd_modpost = $(MODLISTCMD) | sed 's/\.ko$$/$(modpost-ext)\.o/' | $(modpost) $(MODPOST_OPT) -s -T -
PHONY += __modpost
-__modpost: $(modules:.ko=.o) FORCE
+__modpost: $(modules:.ko=$(modpost-ext).o) FORCE
$(call cmd,modpost) $(wildcard vmlinux)
quiet_cmd_kernel-mod = MODPOST $@
@@ -98,8 +114,7 @@
$(call cmd,kernel-mod)
# Declare generated files as targets for modpost
-$(modules:.ko=.mod.c): __modpost ;
-
+$(modules:.ko=$(modpost-ext).mod.c): __modpost ;
# Step 5), compile all *.mod.c files
@@ -110,22 +125,37 @@
cmd_cc_o_c = $(CC) $(c_flags) $(KBUILD_CFLAGS_MODULE) $(CFLAGS_MODULE) \
-c -o $@ $<
-$(modules:.ko=.mod.o): %.mod.o: %.mod.c FORCE
+$(modules:.ko=.mod.o): %.mod.o: %$(modpost-ext).mod.c FORCE
$(call if_changed_dep,cc_o_c)
-targets += $(modules:.ko=.mod.o)
+targets += $(modules:.ko=$(modpost-ext).mod.o)
ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
# Step 6), final link of the modules with optional arch pass after final link
quiet_cmd_ld_ko_o = LD [M] $@
+
+ifdef CONFIG_LTO_CLANG
+ cmd_ld_ko_o = \
+ $(LD) -r $(LDFLAGS) \
+ $(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
+ $(shell [ -s $(@:.ko=.o.symversions) ] && \
+ echo -T $(@:.ko=.o.symversions)) \
+ -o $@ --whole-archive \
+ $(filter-out FORCE,$(^:$(modpost-ext).o=.o))
+
+ ifdef CONFIG_FTRACE_MCOUNT_RECORD
+ cmd_ld_ko_o += ; $(objtree)/scripts/recordmcount $(RECORDMCOUNT_FLAGS) $@
+ endif
+else
cmd_ld_ko_o = \
$(LD) -r $(LDFLAGS) \
$(KBUILD_LDFLAGS_MODULE) $(LDFLAGS_MODULE) \
-o $@ $(filter-out FORCE,$^) ; \
$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
+endif
-$(modules): %.ko :%.o %.mod.o FORCE
+$(modules): %.ko: %$(modpost-ext).o %.mod.o FORCE
+$(call if_changed,ld_ko_o)
targets += $(modules)
diff --git a/scripts/clang-version.sh b/scripts/clang-version.sh
new file mode 100755
index 0000000..9780efa
--- /dev/null
+++ b/scripts/clang-version.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+#
+# clang-version [-p] clang-command
+#
+# Prints the compiler version of `clang-command' in a canonical 4-digit form
+# such as `0500' for clang-5.0 etc.
+#
+# With the -p option, prints the patchlevel as well, for example `050001' for
+# clang-5.0.1 etc.
+#
+
+if [ "$1" = "-p" ] ; then
+ with_patchlevel=1;
+ shift;
+fi
+
+compiler="$*"
+
+if [ ${#compiler} -eq 0 ]; then
+ echo "Error: No compiler specified."
+ printf "Usage:\n\t$0 <clang-command>\n"
+ exit 1
+fi
+
+MAJOR=$(echo __clang_major__ | $compiler -E -x c - | tail -n 1)
+MINOR=$(echo __clang_minor__ | $compiler -E -x c - | tail -n 1)
+if [ "x$with_patchlevel" != "x" ] ; then
+ PATCHLEVEL=$(echo __clang_patchlevel__ | $compiler -E -x c - | tail -n 1)
+ printf "%02d%02d%02d\\n" $MAJOR $MINOR $PATCHLEVEL
+else
+ printf "%02d%02d\\n" $MAJOR $MINOR
+fi
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index e6818b8..cfa4471 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -61,9 +61,40 @@
${AR} rcsTP${KBUILD_ARFLAGS} built-in.o \
${KBUILD_VMLINUX_INIT} \
${KBUILD_VMLINUX_MAIN}
+
+ if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ mv -f built-in.o built-in.o.tmp
+ ${LLVM_AR} rcsT${KBUILD_ARFLAGS} built-in.o $(${AR} t built-in.o.tmp)
+ rm -f built-in.o.tmp
+ fi
fi
}
+# If CONFIG_LTO_CLANG is selected, collect generated symbol versions into
+# .tmp_symversions
+modversions()
+{
+ if [ -z "${CONFIG_LTO_CLANG}" ]; then
+ return
+ fi
+
+ if [ -z "${CONFIG_MODVERSIONS}" ]; then
+ return
+ fi
+
+ rm -f .tmp_symversions
+
+ for a in built-in.o ${KBUILD_VMLINUX_LIBS}; do
+ for o in $(${AR} t $a); do
+ if [ -f ${o}.symversions ]; then
+ cat ${o}.symversions >> .tmp_symversions
+ fi
+ done
+ done
+
+ echo "-T .tmp_symversions"
+}
+
# Link of vmlinux.o used for section mismatch analysis
# ${1} output file
modpost_link()
@@ -84,7 +115,29 @@
${KBUILD_VMLINUX_LIBS} \
--end-group"
fi
- ${LD} ${LDFLAGS} -r -o ${1} ${objects}
+
+ if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ # This might take a while, so indicate that we're doing
+ # an LTO link
+ info LTO vmlinux.o
+ else
+ info LD vmlinux.o
+ fi
+
+ ${LD} ${LDFLAGS} -r -o ${1} $(modversions) ${objects}
+}
+
+# If CONFIG_LTO_CLANG is selected, we postpone running recordmcount until
+# we have compiled LLVM IR to an object file.
+recordmcount()
+{
+ if [ -z "${CONFIG_LTO_CLANG}" ]; then
+ return
+ fi
+
+ if [ -n "${CONFIG_FTRACE_MCOUNT_RECORD}" ]; then
+ scripts/recordmcount ${RECORDMCOUNT_FLAGS} $*
+ fi
}
# Link of vmlinux
@@ -96,8 +149,16 @@
local objects
if [ "${SRCARCH}" != "um" ]; then
- if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
- objects="--whole-archive \
+ local ld=${LD}
+ local ldflags="${LDFLAGS} ${LDFLAGS_vmlinux}"
+
+ if [ -n "${LDFINAL_vmlinux}" ]; then
+ ld=${LDFINAL_vmlinux}
+ ldflags="${LDFLAGS_FINAL_vmlinux} ${LDFLAGS_vmlinux}"
+ fi
+
+ if [[ -n "${CONFIG_THIN_ARCHIVES}" && -z "${CONFIG_LTO_CLANG}" ]]; then
+ objects="--whole-archive \
built-in.o \
--no-whole-archive \
--start-group \
@@ -113,8 +174,7 @@
${1}"
fi
- ${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \
- -T ${lds} ${objects}
+ ${ld} ${ldflags} -o ${2} -T ${lds} ${objects}
else
if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
objects="-Wl,--whole-archive \
@@ -141,7 +201,6 @@
fi
}
-
# Create ${2} .o file with all symbols from the ${1} object file
kallsyms()
{
@@ -192,6 +251,7 @@
rm -f .tmp_System.map
rm -f .tmp_kallsyms*
rm -f .tmp_version
+ rm -f .tmp_symversions
rm -f .tmp_vmlinux*
rm -f built-in.o
rm -f System.map
@@ -253,12 +313,22 @@
archive_builtin
#link vmlinux.o
-info LD vmlinux.o
modpost_link vmlinux.o
# modpost vmlinux.o to check for section mismatches
${MAKE} -f "${srctree}/scripts/Makefile.modpost" vmlinux.o
+if [ -n "${CONFIG_LTO_CLANG}" ]; then
+ # Re-use vmlinux.o, so we can avoid the slow LTO link step in
+ # vmlinux_link
+ KBUILD_VMLINUX_INIT=
+ KBUILD_VMLINUX_MAIN=vmlinux.o
+ KBUILD_VMLINUX_LIBS=
+
+ # Call recordmcount if needed
+ recordmcount vmlinux.o
+fi
+
kallsymso=""
kallsyms_vmlinux=""
if [ -n "${CONFIG_KALLSYMS}" ]; then
diff --git a/scripts/mod/Makefile b/scripts/mod/Makefile
index 42c5d50..e014b2f 100644
--- a/scripts/mod/Makefile
+++ b/scripts/mod/Makefile
@@ -1,5 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
OBJECT_FILES_NON_STANDARD := y
+CFLAGS_empty.o += $(DISABLE_LTO)
hostprogs-y := modpost mk_elfconfig
always := $(hostprogs-y) empty.o
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index 16e086d..69a7699 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -420,7 +420,8 @@
strcmp(".softirqentry.text", txtname) == 0 ||
strcmp(".kprobes.text", txtname) == 0 ||
strcmp(".cpuidle.text", txtname) == 0 ||
- strcmp(".text.unlikely", txtname) == 0;
+ (strncmp(".text.", txtname, 6) == 0 &&
+ strcmp(".text..ftrace", txtname) != 0);
}
/* 32 bit and 64 bit are very similar */
diff --git a/security/Kconfig b/security/Kconfig
index 87f2a6f..65d29c3 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -18,6 +18,15 @@
If you are unsure how to answer this question, answer N.
+config SECURITY_PERF_EVENTS_RESTRICT
+ bool "Restrict unprivileged use of performance events"
+ depends on PERF_EVENTS
+ help
+ If you say Y here, the kernel.perf_event_paranoid sysctl
+ will be set to 3 by default, and no unprivileged use of the
+ perf_event_open syscall will be permitted unless it is
+ changed.
+
config SECURITY
bool "Enable different security models"
depends on SYSFS
diff --git a/security/apparmor/lsm.c b/security/apparmor/lsm.c
index 1346ee5..17893fd 100644
--- a/security/apparmor/lsm.c
+++ b/security/apparmor/lsm.c
@@ -813,11 +813,11 @@
.get = param_get_aalockpolicy
};
-static int param_set_audit(const char *val, struct kernel_param *kp);
-static int param_get_audit(char *buffer, struct kernel_param *kp);
+static int param_set_audit(const char *val, const struct kernel_param *kp);
+static int param_get_audit(char *buffer, const struct kernel_param *kp);
-static int param_set_mode(const char *val, struct kernel_param *kp);
-static int param_get_mode(char *buffer, struct kernel_param *kp);
+static int param_set_mode(const char *val, const struct kernel_param *kp);
+static int param_get_mode(char *buffer, const struct kernel_param *kp);
/* Flag values, also controllable via /sys/module/apparmor/parameters
* We define special types as we want to do additional mediation.
@@ -951,7 +951,7 @@
return param_get_uint(buffer, kp);
}
-static int param_get_audit(char *buffer, struct kernel_param *kp)
+static int param_get_audit(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
@@ -960,7 +960,7 @@
return sprintf(buffer, "%s", audit_mode_names[aa_g_audit]);
}
-static int param_set_audit(const char *val, struct kernel_param *kp)
+static int param_set_audit(const char *val, const struct kernel_param *kp)
{
int i;
@@ -981,7 +981,7 @@
return -EINVAL;
}
-static int param_get_mode(char *buffer, struct kernel_param *kp)
+static int param_get_mode(char *buffer, const struct kernel_param *kp)
{
if (!apparmor_enabled)
return -EINVAL;
@@ -991,7 +991,7 @@
return sprintf(buffer, "%s", aa_profile_mode_names[aa_g_profile_mode]);
}
-static int param_set_mode(const char *val, struct kernel_param *kp)
+static int param_set_mode(const char *val, const struct kernel_param *kp)
{
int i;
diff --git a/security/commoncap.c b/security/commoncap.c
index ae26ef0..347e2f3 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -31,6 +31,10 @@
#include <linux/binfmts.h>
#include <linux/personality.h>
+#ifdef CONFIG_ANDROID_PARANOID_NETWORK
+#include <linux/android_aid.h>
+#endif
+
/*
* If a non-root user executes a setuid-root binary in
* !secure(SECURE_NOROOT) mode, then we raise capabilities.
@@ -54,7 +58,7 @@
}
/**
- * cap_capable - Determine whether a task has a particular effective capability
+ * __cap_capable - Determine whether a task has a particular effective capability
* @cred: The credentials to use
* @ns: The user namespace in which we need the capability
* @cap: The capability to check for
@@ -68,7 +72,7 @@
* cap_has_capability() returns 0 when a task has a capability, but the
* kernel's capable() and has_capability() returns 1 for this case.
*/
-int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
+int __cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
int cap, int audit)
{
struct user_namespace *ns = targ_ns;
@@ -106,6 +110,27 @@
/* We never get here */
}
+int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
+ int cap, int audit)
+{
+ int ret = __cap_capable(cred, targ_ns, cap, audit);
+
+#ifdef CONFIG_ANDROID_PARANOID_NETWORK
+ if (ret != 0 && cap == CAP_NET_RAW && in_egroup_p(AID_NET_RAW)) {
+ printk("Process %s granted CAP_NET_RAW from Android group net_raw.\n", current->comm);
+ printk(" Please update the .rc file to explictly set 'capabilities NET_RAW'\n");
+ printk(" Implicit grants are deprecated and will be removed in the future.\n");
+ return 0;
+ }
+ if (ret != 0 && cap == CAP_NET_ADMIN && in_egroup_p(AID_NET_ADMIN)) {
+ printk("Process %s granted CAP_NET_ADMIN from Android group net_admin.\n", current->comm);
+ printk(" Please update the .rc file to explictly set 'capabilities NET_ADMIN'\n");
+ printk(" Implicit grants are deprecated and will be removed in the future.\n");
+ return 0;
+ }
+#endif
+ return ret;
+}
/**
* cap_settime - Determine whether the current process may set the system clock
* @ts: The time to set
diff --git a/security/inode.c b/security/inode.c
index 8dd9ca8..bf28109 100644
--- a/security/inode.c
+++ b/security/inode.c
@@ -122,7 +122,7 @@
dir = d_inode(parent);
inode_lock(dir);
- dentry = lookup_one_len(name, parent, strlen(name));
+ dentry = lookup_one_len2(name, mount, parent, strlen(name));
if (IS_ERR(dentry))
goto out;
diff --git a/security/security.c b/security/security.c
index 4bf0f57..264a5e5 100644
--- a/security/security.c
+++ b/security/security.c
@@ -12,6 +12,7 @@
* (at your option) any later version.
*/
+#include <linux/bpf.h>
#include <linux/capability.h>
#include <linux/dcache.h>
#include <linux/module.h>
@@ -595,6 +596,7 @@
return 0;
return call_int_hook(path_chown, 0, path, uid, gid);
}
+EXPORT_SYMBOL(security_path_chown);
int security_path_chroot(const struct path *path)
{
@@ -1703,3 +1705,34 @@
actx);
}
#endif /* CONFIG_AUDIT */
+
+#ifdef CONFIG_BPF_SYSCALL
+int security_bpf(int cmd, union bpf_attr *attr, unsigned int size)
+{
+ return call_int_hook(bpf, 0, cmd, attr, size);
+}
+int security_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+ return call_int_hook(bpf_map, 0, map, fmode);
+}
+int security_bpf_prog(struct bpf_prog *prog)
+{
+ return call_int_hook(bpf_prog, 0, prog);
+}
+int security_bpf_map_alloc(struct bpf_map *map)
+{
+ return call_int_hook(bpf_map_alloc_security, 0, map);
+}
+int security_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+ return call_int_hook(bpf_prog_alloc_security, 0, aux);
+}
+void security_bpf_map_free(struct bpf_map *map)
+{
+ call_void_hook(bpf_map_free_security, map);
+}
+void security_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+ call_void_hook(bpf_prog_free_security, aux);
+}
+#endif /* CONFIG_BPF_SYSCALL */
diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index f5d3047..2e3a627 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -85,6 +85,7 @@
#include <linux/export.h>
#include <linux/msg.h>
#include <linux/shm.h>
+#include <linux/bpf.h>
#include "avc.h"
#include "objsec.h"
@@ -1814,6 +1815,10 @@
return inode_has_perm(cred, file_inode(file), av, &ad);
}
+#ifdef CONFIG_BPF_SYSCALL
+static int bpf_fd_pass(struct file *file, u32 sid);
+#endif
+
/* Check whether a task can use an open file descriptor to
access an inode in a given way. Check access to the
descriptor itself, and then use dentry_has_perm to
@@ -1844,6 +1849,12 @@
goto out;
}
+#ifdef CONFIG_BPF_SYSCALL
+ rc = bpf_fd_pass(file, cred_sid(cred));
+ if (rc)
+ return rc;
+#endif
+
/* av is zero if only checking access to the descriptor. */
rc = 0;
if (av)
@@ -2164,6 +2175,12 @@
return rc;
}
+#ifdef CONFIG_BPF_SYSCALL
+ rc = bpf_fd_pass(file, sid);
+ if (rc)
+ return rc;
+#endif
+
if (unlikely(IS_PRIVATE(d_backing_inode(dentry))))
return 0;
@@ -6252,6 +6269,139 @@
}
#endif
+#ifdef CONFIG_BPF_SYSCALL
+static int selinux_bpf(int cmd, union bpf_attr *attr,
+ unsigned int size)
+{
+ u32 sid = current_sid();
+ int ret;
+
+ switch (cmd) {
+ case BPF_MAP_CREATE:
+ ret = avc_has_perm(sid, sid, SECCLASS_BPF, BPF__MAP_CREATE,
+ NULL);
+ break;
+ case BPF_PROG_LOAD:
+ ret = avc_has_perm(sid, sid, SECCLASS_BPF, BPF__PROG_LOAD,
+ NULL);
+ break;
+ default:
+ ret = 0;
+ break;
+ }
+
+ return ret;
+}
+
+static u32 bpf_map_fmode_to_av(fmode_t fmode)
+{
+ u32 av = 0;
+
+ if (fmode & FMODE_READ)
+ av |= BPF__MAP_READ;
+ if (fmode & FMODE_WRITE)
+ av |= BPF__MAP_WRITE;
+ return av;
+}
+
+/* This function will check the file pass through unix socket or binder to see
+ * if it is a bpf related object. And apply correspinding checks on the bpf
+ * object based on the type. The bpf maps and programs, not like other files and
+ * socket, are using a shared anonymous inode inside the kernel as their inode.
+ * So checking that inode cannot identify if the process have privilege to
+ * access the bpf object and that's why we have to add this additional check in
+ * selinux_file_receive and selinux_binder_transfer_files.
+ */
+static int bpf_fd_pass(struct file *file, u32 sid)
+{
+ struct bpf_security_struct *bpfsec;
+ struct bpf_prog *prog;
+ struct bpf_map *map;
+ int ret;
+
+ if (file->f_op == &bpf_map_fops) {
+ map = file->private_data;
+ bpfsec = map->security;
+ ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF,
+ bpf_map_fmode_to_av(file->f_mode), NULL);
+ if (ret)
+ return ret;
+ } else if (file->f_op == &bpf_prog_fops) {
+ prog = file->private_data;
+ bpfsec = prog->aux->security;
+ ret = avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF,
+ BPF__PROG_RUN, NULL);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
+static int selinux_bpf_map(struct bpf_map *map, fmode_t fmode)
+{
+ u32 sid = current_sid();
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = map->security;
+ return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF,
+ bpf_map_fmode_to_av(fmode), NULL);
+}
+
+static int selinux_bpf_prog(struct bpf_prog *prog)
+{
+ u32 sid = current_sid();
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = prog->aux->security;
+ return avc_has_perm(sid, bpfsec->sid, SECCLASS_BPF,
+ BPF__PROG_RUN, NULL);
+}
+
+static int selinux_bpf_map_alloc(struct bpf_map *map)
+{
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+ if (!bpfsec)
+ return -ENOMEM;
+
+ bpfsec->sid = current_sid();
+ map->security = bpfsec;
+
+ return 0;
+}
+
+static void selinux_bpf_map_free(struct bpf_map *map)
+{
+ struct bpf_security_struct *bpfsec = map->security;
+
+ map->security = NULL;
+ kfree(bpfsec);
+}
+
+static int selinux_bpf_prog_alloc(struct bpf_prog_aux *aux)
+{
+ struct bpf_security_struct *bpfsec;
+
+ bpfsec = kzalloc(sizeof(*bpfsec), GFP_KERNEL);
+ if (!bpfsec)
+ return -ENOMEM;
+
+ bpfsec->sid = current_sid();
+ aux->security = bpfsec;
+
+ return 0;
+}
+
+static void selinux_bpf_prog_free(struct bpf_prog_aux *aux)
+{
+ struct bpf_security_struct *bpfsec = aux->security;
+
+ aux->security = NULL;
+ kfree(bpfsec);
+}
+#endif
+
static struct security_hook_list selinux_hooks[] __lsm_ro_after_init = {
LSM_HOOK_INIT(binder_set_context_mgr, selinux_binder_set_context_mgr),
LSM_HOOK_INIT(binder_transaction, selinux_binder_transaction),
@@ -6471,6 +6621,16 @@
LSM_HOOK_INIT(audit_rule_match, selinux_audit_rule_match),
LSM_HOOK_INIT(audit_rule_free, selinux_audit_rule_free),
#endif
+
+#ifdef CONFIG_BPF_SYSCALL
+ LSM_HOOK_INIT(bpf, selinux_bpf),
+ LSM_HOOK_INIT(bpf_map, selinux_bpf_map),
+ LSM_HOOK_INIT(bpf_prog, selinux_bpf_prog),
+ LSM_HOOK_INIT(bpf_map_alloc_security, selinux_bpf_map_alloc),
+ LSM_HOOK_INIT(bpf_prog_alloc_security, selinux_bpf_prog_alloc),
+ LSM_HOOK_INIT(bpf_map_free_security, selinux_bpf_map_free),
+ LSM_HOOK_INIT(bpf_prog_free_security, selinux_bpf_prog_free),
+#endif
};
static __init int selinux_init(void)
diff --git a/security/selinux/include/classmap.h b/security/selinux/include/classmap.h
index cc35695..acdee77 100644
--- a/security/selinux/include/classmap.h
+++ b/security/selinux/include/classmap.h
@@ -238,6 +238,8 @@
{ "access", NULL } },
{ "infiniband_endport",
{ "manage_subnet", NULL } },
+ { "bpf",
+ {"map_create", "map_read", "map_write", "prog_load", "prog_run"} },
{ NULL }
};
diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h
index 1649cd1..3d54468 100644
--- a/security/selinux/include/objsec.h
+++ b/security/selinux/include/objsec.h
@@ -150,6 +150,10 @@
u32 sid; /* SID of pkey */
};
+struct bpf_security_struct {
+ u32 sid; /*SID of bpf obj creater*/
+};
+
extern unsigned int selinux_checkreqprot;
#endif /* _SELINUX_OBJSEC_H_ */
diff --git a/verity_dev_keys.x509 b/verity_dev_keys.x509
new file mode 100644
index 0000000..86399c3
--- /dev/null
+++ b/verity_dev_keys.x509
@@ -0,0 +1,24 @@
+-----BEGIN CERTIFICATE-----
+MIID/TCCAuWgAwIBAgIJAJcPmDkJqolJMA0GCSqGSIb3DQEBBQUAMIGUMQswCQYD
+VQQGEwJVUzETMBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNTW91bnRhaW4g
+VmlldzEQMA4GA1UECgwHQW5kcm9pZDEQMA4GA1UECwwHQW5kcm9pZDEQMA4GA1UE
+AwwHQW5kcm9pZDEiMCAGCSqGSIb3DQEJARYTYW5kcm9pZEBhbmRyb2lkLmNvbTAe
+Fw0xNDExMDYxOTA3NDBaFw00MjAzMjQxOTA3NDBaMIGUMQswCQYDVQQGEwJVUzET
+MBEGA1UECAwKQ2FsaWZvcm5pYTEWMBQGA1UEBwwNTW91bnRhaW4gVmlldzEQMA4G
+A1UECgwHQW5kcm9pZDEQMA4GA1UECwwHQW5kcm9pZDEQMA4GA1UEAwwHQW5kcm9p
+ZDEiMCAGCSqGSIb3DQEJARYTYW5kcm9pZEBhbmRyb2lkLmNvbTCCASIwDQYJKoZI
+hvcNAQEBBQADggEPADCCAQoCggEBAOjreE0vTVSRenuzO9vnaWfk0eQzYab0gqpi
+6xAzi6dmD+ugoEKJmbPiuE5Dwf21isZ9uhUUu0dQM46dK4ocKxMRrcnmGxydFn6o
+fs3ODJMXOkv2gKXL/FdbEPdDbxzdu8z3yk+W67udM/fW7WbaQ3DO0knu+izKak/3
+T41c5uoXmQ81UNtAzRGzGchNVXMmWuTGOkg6U+0I2Td7K8yvUMWhAWPPpKLtVH9r
+AL5TzjYNR92izdKcz3AjRsI3CTjtpiVABGeX0TcjRSuZB7K9EK56HV+OFNS6I1NP
+jdD7FIShyGlqqZdUOkAUZYanbpgeT5N7QL6uuqcGpoTOkalu6kkCAwEAAaNQME4w
+HQYDVR0OBBYEFH5DM/m7oArf4O3peeKO0ZIEkrQPMB8GA1UdIwQYMBaAFH5DM/m7
+oArf4O3peeKO0ZIEkrQPMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEFBQADggEB
+AHO3NSvDE5jFvMehGGtS8BnFYdFKRIglDMc4niWSzhzOVYRH4WajxdtBWc5fx0ix
+NF/+hVKVhP6AIOQa+++sk+HIi7RvioPPbhjcsVlZe7cUEGrLSSveGouQyc+j0+m6
+JF84kszIl5GGNMTnx0XRPO+g8t6h5LWfnVydgZfpGRRg+WHewk1U2HlvTjIceb0N
+dcoJ8WKJAFWdcuE7VIm4w+vF/DYX/A2Oyzr2+QRhmYSv1cusgAeC1tvH4ap+J1Lg
+UnOu5Kh/FqPLLSwNVQp4Bu7b9QFfqK8Moj84bj88NqRGZgDyqzuTrFxn6FW7dmyA
+yttuAJAEAymk1mipd9+zp38=
+-----END CERTIFICATE-----