| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)
to do the replacement at the end of the merge window.
Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Moyster <oysterized@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
范龙飞 reports that KASAN can report a use-after-free in __lock_acquire.
The reason is due to insufficient serialization in lo_release(), which
will continue to use the loop device even after it has decremented the
lo_refcnt to zero.
In the meantime, another process can come in, open the loop device
again as it is being shut down. Confusion ensues.
(cherry-picked from ae6650163c66a7eff1acd6eb8b0f752dcfa8eba5)
Change-Id: Iada5ceba515f0137db8c95d0746e8ffc753037f3
Reported-by: 范龙飞 <long7573@126.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Marissa Wall <marissaw@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(cherry picked from commit b277da0a8a594308e17881f4926879bd5fca2a2d)
Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
QUEUE_FLAG_NONROT.
Historically, all block devices have automatically made entropy
contributions. But as previously stated in commit e2e1a148 ("block: add
sysfs knob for turning off disk entropy contributions"):
- On SSD disks, the completion times aren't as random as they
are for rotational drives. So it's questionable whether they
should contribute to the random pool in the first place.
- Calling add_disk_randomness() has a lot of overhead.
There are more reliable sources for randomness than non-rotational block
devices. From a security perspective it is better to err on the side of
caution than to allow entropy contributions from unreliable "random"
sources.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Mister Oyster <oysterized@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d72e9a7a93e4f8e9e52491921d99e0c8aa89eb4e upstream.
The copy_page is optimized memcpy for page-alinged address. If it is
used with non-page aligned address, it can corrupt memory which means
system corruption. With zram, it can happen with
1. 64K architecture
2. partial IO
3. slub debug
Partial IO need to allocate a page and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned
address. And finally, copy_page(mem, cmem) corrupts memory.
So, this patch changes it to memcpy.
Actuaully, we don't need to change zram_bvec_write part because zsmalloc
returns page-aligned address in case of PAGE_SIZE class but it's not
good to rely on the internal of zsmalloc.
Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removes it by "same page merge" feature so
it's hard to backport this patch to -stable tree.
I will handle it when I receive the mail from stable tree maintainer to
merge this patch to backport.
Change-Id: I22e122640cee66282afda3ecb18e3312637eafba
Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zram hot_add sysfs attribute is a very 'special' attribute - reading
from it creates a new uninitialized zram device. This file, by a
mistake, can be read by a 'normal' user at the moment, while only root
must be able to create a new zram device, therefore hot_add attribute
must have S_IRUSR mode, not S_IRUGO.
[akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols]
Fixes: 6566d1a32bf72 ("zram: add dynamic device add/remove functionality")
Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Steven Allen <steven@stebalien.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The zram hot removal code calls idr_remove() even when zram_remove()
returns an error (typically -EBUSY). This results in a leftover at the
device release, eventually leading to a crash when the module is
reloaded.
As described in the bug report below, the following procedure would
cause an Oops with zram:
- provision three zram devices via modprobe zram num_devices=3
- configure a size for each device
+ echo "1G" > /sys/block/$zram_name/disksize
- mkfs and mount zram0 only
- attempt to hot remove all three devices
+ echo 2 > /sys/class/zram-control/hot_remove
+ echo 1 > /sys/class/zram-control/hot_remove
+ echo 0 > /sys/class/zram-control/hot_remove
- zram0 removal fails with EBUSY, as expected
- unmount zram0
- try zram0 hot remove again
+ echo 0 > /sys/class/zram-control/hot_remove
- fails with ENODEV (unexpected)
- unload zram kernel module
+ completes successfully
- zram0 device node still exists
- attempt to mount /dev/zram0
+ mount command is killed
+ following BUG is encountered
BUG: unable to handle kernel paging request at ffffffffa0002ba0
IP: get_disk+0x16/0x50
Oops: 0000 [#1] SMP
CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176
Call Trace:
exact_lock+0xc/0x20
kobj_lookup+0xdc/0x160
get_gendisk+0x2f/0x110
__blkdev_get+0x10c/0x3c0
blkdev_get+0x19d/0x2e0
blkdev_open+0x56/0x70
do_dentry_open.isra.19+0x1ff/0x310
vfs_open+0x43/0x60
path_openat+0x2c9/0xf30
do_filp_open+0x79/0xd0
do_sys_open+0x114/0x1e0
SyS_open+0x19/0x20
entry_SYSCALL_64_fastpath+0x13/0x94
This patch adds the proper error check in hot_remove_store() not to call
idr_remove() unconditionally.
Fixes: 17ec4cd98578 ("zram: don't call idr_remove() from zram_remove()")
Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970
Link: http://lkml.kernel.org/r/20161121132140.12683-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: David Disseldorp <ddiss@suse.de>
Reported-by: David Disseldorp <ddiss@suse.de>
Tested-by: David Disseldorp <ddiss@suse.de>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org> [4.4+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
| |
We'll have to make do without the granular specification of how
reclaim should be handled.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We now allocate streams from CPU_UP hot-plug path, there are no
context-dependent stream allocations anymore and we can schedule from
zcomp_strm_alloc(). Use GFP_KERNEL directly and drop a gfp_t parameter.
Link: http://lkml.kernel.org/r/20160531122017.2878-9-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add "deflate", "lz4hc", "842" algorithms to the list of known
compression backends. The real availability of those algorithms,
however, depends on the corresponding CONFIG_CRYPTO_FOO config options.
[sergey.senozhatsky@gmail.com: zram-add-more-compression-algorithms-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-7-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-8-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Remove lzo/lz4 backends, we use crypto API now.
[sergey.senozhatsky@gmail.com: zram-delete-custom-lzo-lz4-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-6-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-7-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no way to get a string with all the crypto comp algorithms
supported by the crypto comp engine, so we need to maintain our own
backends list. At the same time we additionally need to use
crypto_has_comp() to make sure that the user has requested a compression
algorithm that is recognized by the crypto comp engine. Relying on
/proc/crypto is not an options here, because it does not show
not-yet-inserted compression modules.
Example:
modprobe zram
cat /proc/crypto | grep -i lz4
modprobe lz4
cat /proc/crypto | grep -i lz4
name : lz4
driver : lz4-generic
module : lz4
So the user can't tell exactly if the lz4 is really supported from
/proc/crypto output, unless someone or something has loaded it.
This patch also adds crypto_has_comp() to zcomp_available_show(). We
store all the compression algorithms names in zcomp's `backends' array,
regardless the CONFIG_CRYPTO_FOO configuration, but show only those that
are also supported by crypto engine. This helps user to know the exact
list of compression algorithms that can be used.
Example:
module lz4 is not loaded yet, but is supported by the crypto
engine. /proc/crypto has no information on this module, while
zram's `comp_algorithm' lists it:
cat /proc/crypto | grep -i lz4
cat /sys/block/zram0/comp_algorithm
[lzo] lz4 deflate lz4hc 842
We still use the `backends' array to determine if the requested
compression backend is known to crypto api. This array, however, may not
contain some entries, therefore as the last step we call crypto_has_comp()
function which attempts to insmod the requested compression algorithm to
determine if crypto api supports it. The advantage of this method is that
now we permit the usage of out-of-tree crypto compression modules
(implementing S/W or H/W compression).
[sergey.senozhatsky@gmail.com: zram-use-crypto-api-to-check-alg-availability-v3]
Link: http://lkml.kernel.org/r/20160604024902.11778-4-sergey.senozhatsky@gmail.com
Link: http://lkml.kernel.org/r/20160531122017.2878-5-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression. This
removes possibilities of read paths preempting writes at wrong places
(which could badly affect the performance of both paths) and at the same
time opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.
Joonsoo Kim [1] attempted to do this a while ago, but faced with the
need of introducing a new crypto API interface. The root cause was the
fact that crypto API compression algorithms require a compression stream
structure (in zram terminology) for both compression and decompression
ops, while in reality only several of compression algorithms really need
it. This resulted in a concept of context-less crypto API compression
backends [2]. Both write and read paths, though, would have been
executed with the preemption enabled, which in the worst case could have
resulted in a decreased worst-case performance, e.g. consider the
following case:
CPU0
zram_write()
spin_lock()
take the last idle stream
spin_unlock()
<< preempted >>
zram_read()
spin_lock()
no idle streams
spin_unlock()
schedule()
resuming zram_write compression()
but it took me some time to realize that, and it took even longer to
evolve zram and to make it ready for crypto API. The key turned out to be
-- drop the idle streams list entirely. Without the idle streams list we
are free to use compression algorithms that require compression stream for
decompression (read), because streams are now placed in per-cpu data and
each write path has to disable preemption for compression op, almost
completely eliminating the aforementioned case (technically, we still have
a small chance, because write path has a fast and a slow paths and the
slow path is executed with the preemption enabled; but the frequency of
failed fast path is too low).
TEST
====
- 4 CPUs, x86_64 system
- 3G zram, lzo
- fio tests: read, randread, write, randwrite, rw, randrw
test script [3] command:
ZRAM_SIZE=3G LOG_SUFFIX=XXXX FIO_LOOPS=5 ./zram-fio-test.sh
BASE PATCHED
jobs1
READ: 2527.2MB/s 2482.7MB/s
READ: 2102.7MB/s 2045.0MB/s
WRITE: 1284.3MB/s 1324.3MB/s
WRITE: 1080.7MB/s 1101.9MB/s
READ: 430125KB/s 437498KB/s
WRITE: 430538KB/s 437919KB/s
READ: 399593KB/s 403987KB/s
WRITE: 399910KB/s 404308KB/s
jobs2
READ: 8133.5MB/s 7854.8MB/s
READ: 7086.6MB/s 6912.8MB/s
WRITE: 3177.2MB/s 3298.3MB/s
WRITE: 2810.2MB/s 2871.4MB/s
READ: 1017.6MB/s 1023.4MB/s
WRITE: 1018.2MB/s 1023.1MB/s
READ: 977836KB/s 984205KB/s
WRITE: 979435KB/s 985814KB/s
jobs3
READ: 13557MB/s 13391MB/s
READ: 11876MB/s 11752MB/s
WRITE: 4641.5MB/s 4682.1MB/s
WRITE: 4164.9MB/s 4179.3MB/s
READ: 1453.8MB/s 1455.1MB/s
WRITE: 1455.1MB/s 1458.2MB/s
READ: 1387.7MB/s 1395.7MB/s
WRITE: 1386.1MB/s 1394.9MB/s
jobs4
READ: 20271MB/s 20078MB/s
READ: 18033MB/s 17928MB/s
WRITE: 6176.8MB/s 6180.5MB/s
WRITE: 5686.3MB/s 5705.3MB/s
READ: 2009.4MB/s 2006.7MB/s
WRITE: 2007.5MB/s 2004.9MB/s
READ: 1929.7MB/s 1935.6MB/s
WRITE: 1926.8MB/s 1932.6MB/s
jobs5
READ: 18823MB/s 19024MB/s
READ: 18968MB/s 19071MB/s
WRITE: 6191.6MB/s 6372.1MB/s
WRITE: 5818.7MB/s 5787.1MB/s
READ: 2011.7MB/s 1981.3MB/s
WRITE: 2011.4MB/s 1980.1MB/s
READ: 1949.3MB/s 1935.7MB/s
WRITE: 1940.4MB/s 1926.1MB/s
jobs6
READ: 21870MB/s 21715MB/s
READ: 19957MB/s 19879MB/s
WRITE: 6528.4MB/s 6537.6MB/s
WRITE: 6098.9MB/s 6073.6MB/s
READ: 2048.6MB/s 2049.9MB/s
WRITE: 2041.7MB/s 2042.9MB/s
READ: 2013.4MB/s 1990.4MB/s
WRITE: 2009.4MB/s 1986.5MB/s
jobs7
READ: 21359MB/s 21124MB/s
READ: 19746MB/s 19293MB/s
WRITE: 6660.4MB/s 6518.8MB/s
WRITE: 6211.6MB/s 6193.1MB/s
READ: 2089.7MB/s 2080.6MB/s
WRITE: 2085.8MB/s 2076.5MB/s
READ: 2041.2MB/s 2052.5MB/s
WRITE: 2037.5MB/s 2048.8MB/s
jobs8
READ: 20477MB/s 19974MB/s
READ: 18922MB/s 18576MB/s
WRITE: 6851.9MB/s 6788.3MB/s
WRITE: 6407.7MB/s 6347.5MB/s
READ: 2134.8MB/s 2136.1MB/s
WRITE: 2132.8MB/s 2134.4MB/s
READ: 2074.2MB/s 2069.6MB/s
WRITE: 2087.3MB/s 2082.4MB/s
jobs9
READ: 19797MB/s 19994MB/s
READ: 18806MB/s 18581MB/s
WRITE: 6878.7MB/s 6822.7MB/s
WRITE: 6456.8MB/s 6447.2MB/s
READ: 2141.1MB/s 2154.7MB/s
WRITE: 2144.4MB/s 2157.3MB/s
READ: 2084.1MB/s 2085.1MB/s
WRITE: 2091.5MB/s 2092.5MB/s
jobs10
READ: 19794MB/s 19784MB/s
READ: 18794MB/s 18745MB/s
WRITE: 6984.4MB/s 6676.3MB/s
WRITE: 6532.3MB/s 6342.7MB/s
READ: 2150.6MB/s 2155.4MB/s
WRITE: 2156.8MB/s 2161.5MB/s
READ: 2106.4MB/s 2095.6MB/s
WRITE: 2109.7MB/s 2098.4MB/s
BASE PATCHED
jobs1 perfstat
stalled-cycles-frontend 102,480,595,419 ( 41.53%) 114,508,864,804 ( 46.92%)
stalled-cycles-backend 51,941,417,832 ( 21.05%) 46,836,112,388 ( 19.19%)
instructions 283,612,054,215 ( 1.15) 283,918,134,959 ( 1.16)
branches 56,372,560,385 ( 724.923) 56,449,814,753 ( 733.766)
branch-misses 374,826,000 ( 0.66%) 326,935,859 ( 0.58%)
jobs2 perfstat
stalled-cycles-frontend 155,142,745,777 ( 40.99%) 164,170,979,198 ( 43.82%)
stalled-cycles-backend 70,813,866,387 ( 18.71%) 66,456,858,165 ( 17.74%)
instructions 463,436,648,173 ( 1.22) 464,221,890,191 ( 1.24)
branches 91,088,733,902 ( 760.088) 91,278,144,546 ( 769.133)
branch-misses 504,460,363 ( 0.55%) 394,033,842 ( 0.43%)
jobs3 perfstat
stalled-cycles-frontend 201,300,397,212 ( 39.84%) 223,969,902,257 ( 44.44%)
stalled-cycles-backend 87,712,593,974 ( 17.36%) 81,618,888,712 ( 16.19%)
instructions 642,869,545,023 ( 1.27) 644,677,354,132 ( 1.28)
branches 125,724,560,594 ( 690.682) 126,133,159,521 ( 694.542)
branch-misses 527,941,798 ( 0.42%) 444,782,220 ( 0.35%)
jobs4 perfstat
stalled-cycles-frontend 246,701,197,429 ( 38.12%) 280,076,030,886 ( 43.29%)
stalled-cycles-backend 119,050,341,112 ( 18.40%) 110,955,641,671 ( 17.15%)
instructions 822,716,962,127 ( 1.27) 825,536,969,320 ( 1.28)
branches 160,590,028,545 ( 688.614) 161,152,996,915 ( 691.068)
branch-misses 650,295,287 ( 0.40%) 550,229,113 ( 0.34%)
jobs5 perfstat
stalled-cycles-frontend 298,958,462,516 ( 38.30%) 344,852,200,358 ( 44.16%)
stalled-cycles-backend 137,558,742,122 ( 17.62%) 129,465,067,102 ( 16.58%)
instructions 1,005,714,688,752 ( 1.29) 1,007,657,999,432 ( 1.29)
branches 195,988,773,962 ( 697.730) 196,446,873,984 ( 700.319)
branch-misses 695,818,940 ( 0.36%) 624,823,263 ( 0.32%)
jobs6 perfstat
stalled-cycles-frontend 334,497,602,856 ( 36.71%) 387,590,419,779 ( 42.38%)
stalled-cycles-backend 163,539,365,335 ( 17.95%) 152,640,193,639 ( 16.69%)
instructions 1,184,738,177,851 ( 1.30) 1,187,396,281,677 ( 1.30)
branches 230,592,915,640 ( 702.902) 231,253,802,882 ( 702.356)
branch-misses 747,934,786 ( 0.32%) 643,902,424 ( 0.28%)
jobs7 perfstat
stalled-cycles-frontend 396,724,684,187 ( 37.71%) 460,705,858,952 ( 43.84%)
stalled-cycles-backend 188,096,616,496 ( 17.88%) 175,785,787,036 ( 16.73%)
instructions 1,364,041,136,608 ( 1.30) 1,366,689,075,112 ( 1.30)
branches 265,253,096,936 ( 700.078) 265,890,524,883 ( 702.839)
branch-misses 784,991,589 ( 0.30%) 729,196,689 ( 0.27%)
jobs8 perfstat
stalled-cycles-frontend 440,248,299,870 ( 36.92%) 509,554,793,816 ( 42.46%)
stalled-cycles-backend 222,575,930,616 ( 18.67%) 213,401,248,432 ( 17.78%)
instructions 1,542,262,045,114 ( 1.29) 1,545,233,932,257 ( 1.29)
branches 299,775,178,439 ( 697.666) 300,528,458,505 ( 694.769)
branch-misses 847,496,084 ( 0.28%) 748,794,308 ( 0.25%)
jobs9 perfstat
stalled-cycles-frontend 506,269,882,480 ( 37.86%) 592,798,032,820 ( 44.43%)
stalled-cycles-backend 253,192,498,861 ( 18.93%) 233,727,666,185 ( 17.52%)
instructions 1,721,985,080,913 ( 1.29) 1,724,666,236,005 ( 1.29)
branches 334,517,360,255 ( 694.134) 335,199,758,164 ( 697.131)
branch-misses 873,496,730 ( 0.26%) 815,379,236 ( 0.24%)
jobs10 perfstat
stalled-cycles-frontend 549,063,363,749 ( 37.18%) 651,302,376,662 ( 43.61%)
stalled-cycles-backend 281,680,986,810 ( 19.07%) 277,005,235,582 ( 18.55%)
instructions 1,901,859,271,180 ( 1.29) 1,906,311,064,230 ( 1.28)
branches 369,398,536,153 ( 694.004) 370,527,696,358 ( 688.409)
branch-misses 967,929,335 ( 0.26%) 890,125,056 ( 0.24%)
BASE PATCHED
seconds elapsed 79.421641008 78.735285546
seconds elapsed 61.471246133 60.869085949
seconds elapsed 62.317058173 62.224188495
seconds elapsed 60.030739363 60.081102518
seconds elapsed 74.070398362 74.317582865
seconds elapsed 84.985953007 85.414364176
seconds elapsed 97.724553255 98.173311344
seconds elapsed 109.488066758 110.268399318
seconds elapsed 122.768189405 122.967164498
seconds elapsed 135.130035105 136.934770801
On my other system (8 x86_64 CPUs, short version of test results):
BASE PATCHED
seconds elapsed 19.518065994 19.806320662
seconds elapsed 15.172772749 15.594718291
seconds elapsed 13.820925970 13.821708564
seconds elapsed 13.293097816 14.585206405
seconds elapsed 16.207284118 16.064431606
seconds elapsed 17.958376158 17.771825767
seconds elapsed 19.478009164 19.602961508
seconds elapsed 21.347152811 21.352318709
seconds elapsed 24.478121126 24.171088735
seconds elapsed 26.865057442 26.767327618
So performance-wise the numbers are quite similar.
Also update zcomp interface to be more aligned with the crypto API.
[1] http://marc.info/?l=linux-kernel&m=144480832108927&w=2
[2] http://marc.info/?l=linux-kernel&m=145379613507518&w=2
[3] https://github.com/sergey-senozhatsky/zram-perf-test
Link: http://lkml.kernel.org/r/20160531122017.2878-3-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Suggested-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This has started as a 'add zlib support' work, but after some thinking I
saw no blockers for a bigger change -- a switch to crypto API.
We don't have an idle zstreams list anymore and our write path now works
absolutely differently, preventing preemption during compression. This
removes possibilities of read paths preempting writes at wrong places
and opens the door for a move from custom LZO/LZ4 compression backends
implementation to a more generic one, using crypto compress API.
This patch set also eliminates the need of a new context-less crypto API
interface, which was quite hard to sell, so we can move along faster.
benchmarks:
(x86_64, 4GB, zram-perf script)
perf reported run-time fio (max jobs=3). I performed fio test with the
increasing number of parallel jobs (max to 3) on a 3G zram device, using
`static' data and the following crypto comp algorithms:
842, deflate, lz4, lz4hc, lzo
the output was:
- test running time (which can tell us what algorithms performs faster)
and
- zram mm_stat (which tells the compressed memory size, max used memory, etc).
It's just for information. for example, LZ4HC has twice the running
time of LZO, but the compressed memory size is: 23592960 vs 34603008
bytes.
test-fio-zram-842
197.907655282 seconds time elapsed
201.623142884 seconds time elapsed
226.854291345 seconds time elapsed
test-fio-zram-DEFLATE
253.259516155 seconds time elapsed
258.148563401 seconds time elapsed
290.251909365 seconds time elapsed
test-fio-zram-LZ4
27.022598717 seconds time elapsed
29.580522717 seconds time elapsed
33.293463430 seconds time elapsed
test-fio-zram-LZ4HC
56.393954615 seconds time elapsed
74.904659747 seconds time elapsed
101.940998564 seconds time elapsed
test-fio-zram-LZO
28.155948075 seconds time elapsed
30.390036330 seconds time elapsed
34.455773159 seconds time elapsed
zram mm_stat-s (max fio jobs=3)
test-fio-zram-842
mm_stat (jobs1): 3221225472 673185792 690266112 0 690266112 0 0
mm_stat (jobs2): 3221225472 673185792 690266112 0 690266112 0 0
mm_stat (jobs3): 3221225472 673185792 690266112 0 690266112 0 0
test-fio-zram-DEFLATE
mm_stat (jobs1): 3221225472 24379392 37761024 0 37761024 0 0
mm_stat (jobs2): 3221225472 24379392 37761024 0 37761024 0 0
mm_stat (jobs3): 3221225472 24379392 37761024 0 37761024 0 0
test-fio-zram-LZ4
mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0
test-fio-zram-LZ4HC
mm_stat (jobs1): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs2): 3221225472 23592960 37761024 0 37761024 0 0
mm_stat (jobs3): 3221225472 23592960 37761024 0 37761024 0 0
test-fio-zram-LZO
mm_stat (jobs1): 3221225472 34603008 50335744 0 50335744 0 0
mm_stat (jobs2): 3221225472 34603008 50335744 0 50335744 0 0
mm_stat (jobs3): 3221225472 34603008 50335744 0 50339840 0 0
This patch (of 8):
We don't perform any zstream idle list lookup anymore, so
zcomp_strm_find()/zcomp_strm_release() names are not representative.
Rename to zcomp_stream_get()/zcomp_stream_put().
Link: http://lkml.kernel.org/r/20160531122017.2878-2-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
debug_stat sysfs is read-only and represents various debugging data that
zram developers may need. This file is not meant to be used by anyone
else: its content is not documented and will change any time w/o any
notice. Therefore, the output of debug_stat file contains a version
string. To avoid any confusion, we will increase the version number
every time we modify the output.
At the moment this file exports only one value -- the number of
re-compressions, IOW, the number of times compression fast path has
failed. This stat is temporary any will be useful in case if any
per-cpu compression streams regressions will be reported.
Link: http://lkml.kernel.org/r/20160513230834.GB26763@bbox
Link: http://lkml.kernel.org/r/20160511134553.12655-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove the internal part of max_comp_streams interface, since we
switched to per-cpu streams. We will keep RW max_comp_streams attr
around, because:
a) we may (silently) switch back to idle compression streams list and
don't want to disturb user space
b) max_comp_streams attr must wait for the next 'lay off cycle'; we
give user space 2 years to adjust before we remove/downgrade the attr,
and there are already several attrs scheduled for removal in 4.11, so
it's too late for max_comp_streams.
This slightly change a user visible behaviour:
- First, reading from max_comp_stream file now will always return the
number of online CPUs.
- Second, writing to max_comp_stream will not take any effect.
Link: http://lkml.kernel.org/r/20160503165546.25201-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Remove idle streams list and keep compression streams in per-cpu data.
This removes two contented spin_lock()/spin_unlock() calls from write
path and also prevent write OP from being preempted while holding the
compression stream, which can cause slow downs.
For instance, let's assume that we have N cpus and N-2
max_comp_streams.TASK1 owns the last idle stream, TASK2-TASK3 come in
with the write requests:
TASK1 TASK2 TASK3
zram_bvec_write()
spin_lock
find stream
spin_unlock
compress
<<preempted>> zram_bvec_write()
spin_lock
find stream
spin_unlock
no_stream
schedule
zram_bvec_write()
spin_lock
find_stream
spin_unlock
no_stream
schedule
spin_lock
release stream
spin_unlock
wake up TASK2
not only TASK2 and TASK3 will not get the stream, TASK1 will be
preempted in the middle of its operation; while we would prefer it to
finish compression and release the stream.
Test environment: x86_64, 4 CPU box, 3G zram, lzo
The following fio tests were executed:
read, randread, write, randwrite, rw, randrw
with the increasing number of jobs from 1 to 10.
4 streams 8 streams per-cpu
===========================================================
jobs1
READ: 2520.1MB/s 2566.5MB/s 2491.5MB/s
READ: 2102.7MB/s 2104.2MB/s 2091.3MB/s
WRITE: 1355.1MB/s 1320.2MB/s 1378.9MB/s
WRITE: 1103.5MB/s 1097.2MB/s 1122.5MB/s
READ: 434013KB/s 435153KB/s 439961KB/s
WRITE: 433969KB/s 435109KB/s 439917KB/s
READ: 403166KB/s 405139KB/s 403373KB/s
WRITE: 403223KB/s 405197KB/s 403430KB/s
jobs2
READ: 7958.6MB/s 8105.6MB/s 8073.7MB/s
READ: 6864.9MB/s 6989.8MB/s 7021.8MB/s
WRITE: 2438.1MB/s 2346.9MB/s 3400.2MB/s
WRITE: 1994.2MB/s 1990.3MB/s 2941.2MB/s
READ: 981504KB/s 973906KB/s 1018.8MB/s
WRITE: 981659KB/s 974060KB/s 1018.1MB/s
READ: 937021KB/s 938976KB/s 987250KB/s
WRITE: 934878KB/s 936830KB/s 984993KB/s
jobs3
READ: 13280MB/s 13553MB/s 13553MB/s
READ: 11534MB/s 11785MB/s 11755MB/s
WRITE: 3456.9MB/s 3469.9MB/s 4810.3MB/s
WRITE: 3029.6MB/s 3031.6MB/s 4264.8MB/s
READ: 1363.8MB/s 1362.6MB/s 1448.9MB/s
WRITE: 1361.9MB/s 1360.7MB/s 1446.9MB/s
READ: 1309.4MB/s 1310.6MB/s 1397.5MB/s
WRITE: 1307.4MB/s 1308.5MB/s 1395.3MB/s
jobs4
READ: 20244MB/s 20177MB/s 20344MB/s
READ: 17886MB/s 17913MB/s 17835MB/s
WRITE: 4071.6MB/s 4046.1MB/s 6370.2MB/s
WRITE: 3608.9MB/s 3576.3MB/s 5785.4MB/s
READ: 1824.3MB/s 1821.6MB/s 1997.5MB/s
WRITE: 1819.8MB/s 1817.4MB/s 1992.5MB/s
READ: 1765.7MB/s 1768.3MB/s 1937.3MB/s
WRITE: 1767.5MB/s 1769.1MB/s 1939.2MB/s
jobs5
READ: 18663MB/s 18986MB/s 18823MB/s
READ: 16659MB/s 16605MB/s 16954MB/s
WRITE: 3912.4MB/s 3888.7MB/s 6126.9MB/s
WRITE: 3506.4MB/s 3442.5MB/s 5519.3MB/s
READ: 1798.2MB/s 1746.5MB/s 1935.8MB/s
WRITE: 1792.7MB/s 1740.7MB/s 1929.1MB/s
READ: 1727.6MB/s 1658.2MB/s 1917.3MB/s
WRITE: 1726.5MB/s 1657.2MB/s 1916.6MB/s
jobs6
READ: 21017MB/s 20922MB/s 21162MB/s
READ: 19022MB/s 19140MB/s 18770MB/s
WRITE: 3968.2MB/s 4037.7MB/s 6620.8MB/s
WRITE: 3643.5MB/s 3590.2MB/s 6027.5MB/s
READ: 1871.8MB/s 1880.5MB/s 2049.9MB/s
WRITE: 1867.8MB/s 1877.2MB/s 2046.2MB/s
READ: 1755.8MB/s 1710.3MB/s 1964.7MB/s
WRITE: 1750.5MB/s 1705.9MB/s 1958.8MB/s
jobs7
READ: 21103MB/s 20677MB/s 21482MB/s
READ: 18522MB/s 18379MB/s 19443MB/s
WRITE: 4022.5MB/s 4067.4MB/s 6755.9MB/s
WRITE: 3691.7MB/s 3695.5MB/s 5925.6MB/s
READ: 1841.5MB/s 1933.9MB/s 2090.5MB/s
WRITE: 1842.7MB/s 1935.3MB/s 2091.9MB/s
READ: 1832.4MB/s 1856.4MB/s 1971.5MB/s
WRITE: 1822.3MB/s 1846.2MB/s 1960.6MB/s
jobs8
READ: 20463MB/s 20194MB/s 20862MB/s
READ: 18178MB/s 17978MB/s 18299MB/s
WRITE: 4085.9MB/s 4060.2MB/s 7023.8MB/s
WRITE: 3776.3MB/s 3737.9MB/s 6278.2MB/s
READ: 1957.6MB/s 1944.4MB/s 2109.5MB/s
WRITE: 1959.2MB/s 1946.2MB/s 2111.4MB/s
READ: 1900.6MB/s 1885.7MB/s 2082.1MB/s
WRITE: 1896.2MB/s 1881.4MB/s 2078.3MB/s
jobs9
READ: 19692MB/s 19734MB/s 19334MB/s
READ: 17678MB/s 18249MB/s 17666MB/s
WRITE: 4004.7MB/s 4064.8MB/s 6990.7MB/s
WRITE: 3724.7MB/s 3772.1MB/s 6193.6MB/s
READ: 1953.7MB/s 1967.3MB/s 2105.6MB/s
WRITE: 1953.4MB/s 1966.7MB/s 2104.1MB/s
READ: 1860.4MB/s 1897.4MB/s 2068.5MB/s
WRITE: 1858.9MB/s 1895.9MB/s 2066.8MB/s
jobs10
READ: 19730MB/s 19579MB/s 19492MB/s
READ: 18028MB/s 18018MB/s 18221MB/s
WRITE: 4027.3MB/s 4090.6MB/s 7020.1MB/s
WRITE: 3810.5MB/s 3846.8MB/s 6426.8MB/s
READ: 1956.1MB/s 1994.6MB/s 2145.2MB/s
WRITE: 1955.9MB/s 1993.5MB/s 2144.8MB/s
READ: 1852.8MB/s 1911.6MB/s 2075.8MB/s
WRITE: 1855.7MB/s 1914.6MB/s 2078.1MB/s
perf stat
4 streams 8 streams per-cpu
====================================================================================================================
jobs1
stalled-cycles-frontend 23,174,811,209 ( 38.21%) 23,220,254,188 ( 38.25%) 23,061,406,918 ( 38.34%)
stalled-cycles-backend 11,514,174,638 ( 18.98%) 11,696,722,657 ( 19.27%) 11,370,852,810 ( 18.90%)
instructions 73,925,005,782 ( 1.22) 73,903,177,632 ( 1.22) 73,507,201,037 ( 1.22)
branches 14,455,124,835 ( 756.063) 14,455,184,779 ( 755.281) 14,378,599,509 ( 758.546)
branch-misses 69,801,336 ( 0.48%) 80,225,529 ( 0.55%) 72,044,726 ( 0.50%)
jobs2
stalled-cycles-frontend 49,912,741,782 ( 46.11%) 50,101,189,290 ( 45.95%) 32,874,195,633 ( 35.11%)
stalled-cycles-backend 27,080,366,230 ( 25.02%) 27,949,970,232 ( 25.63%) 16,461,222,706 ( 17.58%)
instructions 122,831,629,690 ( 1.13) 122,919,846,419 ( 1.13) 121,924,786,775 ( 1.30)
branches 23,725,889,239 ( 692.663) 23,733,547,140 ( 688.062) 23,553,950,311 ( 794.794)
branch-misses 90,733,041 ( 0.38%) 96,320,895 ( 0.41%) 84,561,092 ( 0.36%)
jobs3
stalled-cycles-frontend 66,437,834,608 ( 45.58%) 63,534,923,344 ( 43.69%) 42,101,478,505 ( 33.19%)
stalled-cycles-backend 34,940,799,661 ( 23.97%) 34,774,043,148 ( 23.91%) 21,163,324,388 ( 16.68%)
instructions 171,692,121,862 ( 1.18) 171,775,373,044 ( 1.18) 170,353,542,261 ( 1.34)
branches 32,968,962,622 ( 628.723) 32,987,739,894 ( 630.512) 32,729,463,918 ( 717.027)
branch-misses 111,522,732 ( 0.34%) 110,472,894 ( 0.33%) 99,791,291 ( 0.30%)
jobs4
stalled-cycles-frontend 98,741,701,675 ( 49.72%) 94,797,349,965 ( 47.59%) 54,535,655,381 ( 33.53%)
stalled-cycles-backend 54,642,609,615 ( 27.51%) 55,233,554,408 ( 27.73%) 27,882,323,541 ( 17.14%)
instructions 220,884,807,851 ( 1.11) 220,930,887,273 ( 1.11) 218,926,845,851 ( 1.35)
branches 42,354,518,180 ( 592.105) 42,362,770,587 ( 590.452) 41,955,552,870 ( 716.154)
branch-misses 138,093,449 ( 0.33%) 131,295,286 ( 0.31%) 121,794,771 ( 0.29%)
jobs5
stalled-cycles-frontend 116,219,747,212 ( 48.14%) 110,310,397,012 ( 46.29%) 66,373,082,723 ( 33.70%)
stalled-cycles-backend 66,325,434,776 ( 27.48%) 64,157,087,914 ( 26.92%) 32,999,097,299 ( 16.76%)
instructions 270,615,008,466 ( 1.12) 270,546,409,525 ( 1.14) 268,439,910,948 ( 1.36)
branches 51,834,046,557 ( 599.108) 51,811,867,722 ( 608.883) 51,412,576,077 ( 729.213)
branch-misses 158,197,086 ( 0.31%) 142,639,805 ( 0.28%) 133,425,455 ( 0.26%)
jobs6
stalled-cycles-frontend 138,009,414,492 ( 48.23%) 139,063,571,254 ( 48.80%) 75,278,568,278 ( 32.80%)
stalled-cycles-backend 79,211,949,650 ( 27.68%) 79,077,241,028 ( 27.75%) 37,735,797,899 ( 16.44%)
instructions 319,763,993,731 ( 1.12) 319,937,782,834 ( 1.12) 316,663,600,784 ( 1.38)
branches 61,219,433,294 ( 595.056) 61,250,355,540 ( 598.215) 60,523,446,617 ( 733.706)
branch-misses 169,257,123 ( 0.28%) 154,898,028 ( 0.25%) 141,180,587 ( 0.23%)
jobs7
stalled-cycles-frontend 162,974,812,119 ( 49.20%) 159,290,061,987 ( 48.43%) 88,046,641,169 ( 33.21%)
stalled-cycles-backend 92,223,151,661 ( 27.84%) 91,667,904,406 ( 27.87%) 44,068,454,971 ( 16.62%)
instructions 369,516,432,430 ( 1.12) 369,361,799,063 ( 1.12) 365,290,380,661 ( 1.38)
branches 70,795,673,950 ( 594.220) 70,743,136,124 ( 597.876) 69,803,996,038 ( 732.822)
branch-misses 181,708,327 ( 0.26%) 165,767,821 ( 0.23%) 150,109,797 ( 0.22%)
jobs8
stalled-cycles-frontend 185,000,017,027 ( 49.30%) 182,334,345,473 ( 48.37%) 99,980,147,041 ( 33.26%)
stalled-cycles-backend 105,753,516,186 ( 28.18%) 107,937,830,322 ( 28.63%) 51,404,177,181 ( 17.10%)
instructions 418,153,161,055 ( 1.11) 418,308,565,828 ( 1.11) 413,653,475,581 ( 1.38)
branches 80,035,882,398 ( 592.296) 80,063,204,510 ( 589.843) 79,024,105,589 ( 730.530)
branch-misses 199,764,528 ( 0.25%) 177,936,926 ( 0.22%) 160,525,449 ( 0.20%)
jobs9
stalled-cycles-frontend 210,941,799,094 ( 49.63%) 204,714,679,254 ( 48.55%) 114,251,113,756 ( 33.96%)
stalled-cycles-backend 122,640,849,067 ( 28.85%) 122,188,553,256 ( 28.98%) 58,360,041,127 ( 17.35%)
instructions 468,151,025,415 ( 1.10) 467,354,869,323 ( 1.11) 462,665,165,216 ( 1.38)
branches 89,657,067,510 ( 585.628) 89,411,550,407 ( 588.990) 88,360,523,943 ( 730.151)
branch-misses 218,292,301 ( 0.24%) 191,701,247 ( 0.21%) 178,535,678 ( 0.20%)
jobs10
stalled-cycles-frontend 233,595,958,008 ( 49.81%) 227,540,615,689 ( 49.11%) 160,341,979,938 ( 43.07%)
stalled-cycles-backend 136,153,676,021 ( 29.03%) 133,635,240,742 ( 28.84%) 65,909,135,465 ( 17.70%)
instructions 517,001,168,497 ( 1.10) 516,210,976,158 ( 1.11) 511,374,038,613 ( 1.37)
branches 98,911,641,329 ( 585.796) 98,700,069,712 ( 591.583) 97,646,761,028 ( 728.712)
branch-misses 232,341,823 ( 0.23%) 199,256,308 ( 0.20%) 183,135,268 ( 0.19%)
per-cpu streams tend to cause significantly less stalled cycles; execute
less branches and hit less branch-misses.
perf stat reported execution time
4 streams 8 streams per-cpu
====================================================================
jobs1
seconds elapsed 20.909073870 20.875670495 20.817838540
jobs2
seconds elapsed 18.529488399 18.720566469 16.356103108
jobs3
seconds elapsed 18.991159531 18.991340812 16.766216066
jobs4
seconds elapsed 19.560643828 19.551323547 16.246621715
jobs5
seconds elapsed 24.746498464 25.221646740 20.696112444
jobs6
seconds elapsed 28.258181828 28.289765505 22.885688857
jobs7
seconds elapsed 32.632490241 31.909125381 26.272753738
jobs8
seconds elapsed 35.651403851 36.027596308 29.108024711
jobs9
seconds elapsed 40.569362365 40.024227989 32.898204012
jobs10
seconds elapsed 44.673112304 43.874898137 35.632952191
Please see
Link: http://marc.info/?l=linux-kernel&m=146166970727530
Link: http://marc.info/?l=linux-kernel&m=146174716719650
for more test results (under low memory conditions).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Pass GFP flags to zs_malloc() instead of using a fixed mask supplied to
zs_create_pool(), so we can be more flexible, but, more importantly, we
need this to switch zram to per-cpu compression streams -- zram will try
to allocate handle with preemption disabled in a fast path and switch to
a slow path (using different gfp mask) if the fast one has failed.
Apart from that, this also align zs_malloc() interface with zspool/zbud.
[sergey.senozhatsky@gmail.com: pass GFP flags to zs_malloc() instead of using a fixed mask]
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Link: http://lkml.kernel.org/r/20160429150942.GA637@swordfish
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Compaction returns back to zram the number of migrated objects, which is
quite uninformative -- we have objects of different sizes so user space
cannot obtain any valuable data from that number. Change compaction to
operate in terms of pages and return back to compaction issuer the
number of pages that were freed during compaction. So from now on we
will export more meaningful value in zram<id>/mm_stat -- the number of
freed (compacted) pages.
This requires:
(a) a rename of `num_migrated' to 'pages_compacted'
(b) a internal API change -- return first_page's fullness_group from
putback_zspage(), so we know when putback_zspage() did
free_zspage(). It helps us to account compaction stats correctly.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`zs_compact_control' accounts the number of migrated objects but it has
a limited lifespan -- we lose it as soon as zs_compaction() returns back
to zram. It worked fine, because (a) zram had it's own counter of
migrated objects and (b) only zram could trigger compaction. However,
this does not work for automatic pool compaction (not issued by zram).
To account objects migrated during auto-compaction (issued by the
shrinker) we need to store this number in zs_pool.
Define a new `struct zs_pool_stats' structure to keep zs_pool's stats
there. It provides only `num_migrated', as of this writing, but it
surely can be extended.
A new zsmalloc zs_pool_stats() symbol exports zs_pool's stats back to
caller.
Use zs_pool_stats() in zram and remove `num_migrated' from zram_stats.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The use of idr_remove() is forbidden in the callback functions of
idr_for_each(). It is therefore unsafe to call idr_remove in
zram_remove().
This patch moves the call to idr_remove() from zram_remove() to
hot_remove_store(). In the detroy_devices() path, idrs are removed by
idr_destroy(). This solves an use-after-free detected by KASan.
[akpm@linux-foundation.org: fix coding stype, per Sergey]
Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Do not __GFP_ZERO allocated zcomp ->private pages. We keep allocated
streams around and use them for read/write requests, so we supply a
zeroed out ->private to compression algorithm as a scratch buffer only
once -- the first time we use that stream. For the rest of IO requests
served by this stream ->private usually contains some temporarily data
from the previous requests.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each zcomp backend uses own gfp flag but it's pointless because the
context they could be called is driven by upper layer(ie, zcomp
frontend). As well, zcomp frondend could call them in different
context. One context(ie, zram init part) is it should be better to make
sure successful allocation other context(ie, further stream allocation
part for accelarating I/O speed) is just optional so let's pass gfp down
from driver (ie, zcomp frontend) like normal MM convention.
[sergey.senozhatsky@gmail.com: add missing __vmalloc zero and highmem gfps]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When we're using LZ4 multi compression streams for zram swap, we found
out page allocation failure message in system running test. That was
not only once, but a few(2 - 5 times per test). Also, some failure
cases were continually occurring to try allocation order 3.
In order to make parallel compression private data, we should call
kzalloc() with order 2/3 in runtime(lzo/lz4). But if there is no order
2/3 size memory to allocate in that time, page allocation fails. This
patch makes to use vmalloc() as fallback of kmalloc(), this prevents
page alloc failure warning.
After using this, we never found warning message in running test, also
It could reduce process startup latency about 60-120ms in each case.
For reference a call trace :
Binder_1: page allocation failure: order:3, mode:0x10c0d0
CPU: 0 PID: 424 Comm: Binder_1 Tainted: GW 3.10.49-perf-g991d02b-dirty #20
Call trace:
dump_backtrace+0x0/0x270
show_stack+0x10/0x1c
dump_stack+0x1c/0x28
warn_alloc_failed+0xfc/0x11c
__alloc_pages_nodemask+0x724/0x7f0
__get_free_pages+0x14/0x5c
kmalloc_order_trace+0x38/0xd8
zcomp_lz4_create+0x2c/0x38
zcomp_strm_alloc+0x34/0x78
zcomp_strm_multi_find+0x124/0x1ec
zcomp_strm_find+0xc/0x18
zram_bvec_rw+0x2fc/0x780
zram_make_request+0x25c/0x2d4
generic_make_request+0x80/0xbc
submit_bio+0xa4/0x15c
__swap_writepage+0x218/0x230
swap_writepage+0x3c/0x4c
shrink_page_list+0x51c/0x8d0
shrink_inactive_list+0x3f8/0x60c
shrink_lruvec+0x33c/0x4cc
shrink_zone+0x3c/0x100
try_to_free_pages+0x2b8/0x54c
__alloc_pages_nodemask+0x514/0x7f0
__get_free_pages+0x14/0x5c
proc_info_read+0x50/0xe4
vfs_read+0xa0/0x12c
SyS_read+0x44/0x74
DMA: 3397*4kB (MC) 26*8kB (RC) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB
0*512kB 0*1024kB 0*2048kB 0*4096kB = 13796kB
[minchan@kernel.org: change vmalloc gfp and adding comment about gfp]
[sergey.senozhatsky@gmail.com: tweak comments and styles]
Signed-off-by: Kyeongdon Kim <kyeongdon.kim@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We can end up allocating a new compression stream with GFP_KERNEL from
within the IO path, which may result is nested (recursive) IO
operations. That can introduce problems if the IO path in question is a
reclaimer, holding some locks that will deadlock nested IOs.
Allocate streams and working memory using GFP_NOIO flag, forbidding
recursive IO and FS operations.
An example:
inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
git/20158 [HC0[0]:SC0[0]:HE1:SE1] takes:
(jbd2_handle){+.+.?.}, at: start_this_handle+0x4ca/0x555
{IN-RECLAIM_FS-W} state was registered at:
__lock_acquire+0x8da/0x117b
lock_acquire+0x10c/0x1a7
start_this_handle+0x52d/0x555
jbd2__journal_start+0xb4/0x237
__ext4_journal_start_sb+0x108/0x17e
ext4_dirty_inode+0x32/0x61
__mark_inode_dirty+0x16b/0x60c
iput+0x11e/0x274
__dentry_kill+0x148/0x1b8
shrink_dentry_list+0x274/0x44a
prune_dcache_sb+0x4a/0x55
super_cache_scan+0xfc/0x176
shrink_slab.part.14.constprop.25+0x2a2/0x4d3
shrink_zone+0x74/0x140
kswapd+0x6b7/0x930
kthread+0x107/0x10f
ret_from_fork+0x3f/0x70
irq event stamp: 138297
hardirqs last enabled at (138297): debug_check_no_locks_freed+0x113/0x12f
hardirqs last disabled at (138296): debug_check_no_locks_freed+0x33/0x12f
softirqs last enabled at (137818): __do_softirq+0x2d3/0x3e9
softirqs last disabled at (137813): irq_exit+0x41/0x95
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(jbd2_handle);
<Interrupt>
lock(jbd2_handle);
*** DEADLOCK ***
5 locks held by git/20158:
#0: (sb_writers#7){.+.+.+}, at: [<ffffffff81155411>] mnt_want_write+0x24/0x4b
#1: (&type->i_mutex_dir_key#2/1){+.+.+.}, at: [<ffffffff81145087>] lock_rename+0xd9/0xe3
#2: (&sb->s_type->i_mutex_key#11){+.+.+.}, at: [<ffffffff8114f8e2>] lock_two_nondirectories+0x3f/0x6b
#3: (&sb->s_type->i_mutex_key#11/4){+.+.+.}, at: [<ffffffff8114f909>] lock_two_nondirectories+0x66/0x6b
#4: (jbd2_handle){+.+.?.}, at: [<ffffffff811e31db>] start_this_handle+0x4ca/0x555
stack backtrace:
CPU: 2 PID: 20158 Comm: git Not tainted 4.1.0-rc7-next-20150615-dbg-00016-g8bdf555-dirty #211
Call Trace:
dump_stack+0x4c/0x6e
mark_lock+0x384/0x56d
mark_held_locks+0x5f/0x76
lockdep_trace_alloc+0xb2/0xb5
kmem_cache_alloc_trace+0x32/0x1e2
zcomp_strm_alloc+0x25/0x73 [zram]
zcomp_strm_multi_find+0xe7/0x173 [zram]
zcomp_strm_find+0xc/0xe [zram]
zram_bvec_rw+0x2ca/0x7e0 [zram]
zram_make_request+0x1fa/0x301 [zram]
generic_make_request+0x9c/0xdb
submit_bio+0xf7/0x120
ext4_io_submit+0x2e/0x43
ext4_bio_write_page+0x1b7/0x300
mpage_submit_page+0x60/0x77
mpage_map_and_submit_buffers+0x10f/0x21d
ext4_writepages+0xc8c/0xe1b
do_writepages+0x23/0x2c
__filemap_fdatawrite_range+0x84/0x8b
filemap_flush+0x1c/0x1e
ext4_alloc_da_blocks+0xb8/0x117
ext4_rename+0x132/0x6dc
? mark_held_locks+0x5f/0x76
ext4_rename2+0x29/0x2b
vfs_rename+0x540/0x636
SyS_renameat2+0x359/0x44d
SyS_rename+0x1e/0x20
entry_SYSCALL_64_fastpath+0x12/0x6f
[minchan@kernel.org: add stable mark]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Kyeongdon Kim <kyeongdon.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
Make is_partial_io()/valid_io_request()/page_zero_filled() return boolean,
since each function only uses either one or zero as its return value.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`mem_used_max' is designed to store the max amount of memory zram consumed
to store the data. However, it does not represent the actual
'overcommited' (max) value. The existing code goes to -ENOMEM
overcommited case before it updates `->stats.max_used_pages', which hides
the reason we went to -ENOMEM in the first place -- we actually used more
memory than `->limit_pages':
alloced_pages = zs_get_total_pages(meta->mem_pool);
if (zram->limit_pages && alloced_pages > zram->limit_pages) {
zs_free(meta->mem_pool, handle);
ret = -ENOMEM;
goto out;
}
update_used_max(zram, alloced_pages);
Which is misleading. User will see -ENOMEM, check `->limit_pages', check
`->stats.max_used_pages', which will keep the value BEFORE zram passed
`->limit_pages', and see:
`->stats.max_used_pages' < `->limit_pages'
Move update_used_max() before we do `->limit_pages' check, so that
user will see:
`->stats.max_used_pages' > `->limit_pages'
should the overcommit and -ENOMEM happen.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the user supplies an unsupported compression algorithm, keep the
previously selected one (knowingly supported) or the default one (if the
compression algorithm hasn't been changed yet).
Note that previously this operation (i.e. setting an invalid algorithm)
would result in no algorithm being selected, which means that this
represents a small change in the default behaviour.
Minchan said:
For initializing zram, we need to set up 3 optional parameters in advance.
1. the number of compression streams
2. memory limitation
3. compression algorithm
Although user pass completely wrong value to set up for 1 and 2
parameters, it's okay because they have default value so zram will be
initialized with the default value (of course, when user passes a wrong
value via *echo*, sysfs returns -EINVAL so the user can notice it).
But 3 is not consistent with other optional parameters. IOW, if the
user passes a wrong value to set up 3 parameter, zram's initialization
would fail unlike other optional parameters.
So this patch makes them consistent.
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
zram_meta_alloc() constructs a pool name for zs_create_pool() call as
snprintf(pool_name, sizeof(pool_name), "zram%d", device_id);
However, it defines pool name buffer to be only 8 bytes long (minus
trailing zero), which means that we can have only 1000 pool names: zram0
-- zram999.
With CONFIG_ZSMALLOC_STAT enabled an attempt to create a device zram1000
can fail if device zram100 already exists, because snprintf() will
truncate new pool name to zram100 and pass it debugfs_create_dir(),
causing:
debugfs dir <zram100> creation failed
zram: Error creating memory pool
... and so on.
Fix it by passing zram->disk->disk_name to zram_meta_alloc() instead of
divice_id. We construct zram%d name earlier and keep it as a ->disk_name,
no need to snprintf() it again.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Improvement idea by Marcin Jabrzyk.
comp_algorithm_store() silently accepts any supplied algorithm name,
because zram performs algorithm availability check later, during the
device configuration phase in disksize_store() and emits the following
error:
"zram: Cannot initialise %s compressing backend"
this error line is somewhat generic and, besides, can indicate a failed
attempt to allocate compression backend's working buffers.
add algorithm availability check to comp_algorithm_store():
echo lzz > /sys/block/zram0/comp_algorithm
-bash: echo: write error: Invalid argument
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Supplied sysfs values sometimes contain new-line symbols (echo vs. echo
-n), which we also copy as a compression algorithm name. it works fine
when we lookup for compression algorithm, because we use sysfs_streq()
which takes care of new line symbols. however, it doesn't look nice when
we print compression algorithm name if zcomp_create() failed:
zram: Cannot initialise LXZ
compressing backend
cut trailing new-line, so the error string will look like
zram: Cannot initialise LXZ compressing backend
we also now can replace sysfs_streq() in zcomp_available_show() with
strcmp().
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
`bool locked' local variable tells us if we should perform
zcomp_strm_release() or not (jumped to `out' label before
zcomp_strm_find() occurred), which is equivalent to `zstrm' being or not
being NULL. remove `locked' and check `zstrm' instead.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently don't support on-demand device creation. The one and only
way to have N zram devices is to specify num_devices module parameter
(default value: 1). IOW if, for some reason, at some point, user wants
to have N + 1 devies he/she must umount all the existing devices, unload
the module, load the module passing num_devices equals to N + 1. And do
this again, if needed.
This patch introduces zram control sysfs class, which has two sysfs
attrs:
- hot_add -- add a new zram device
- hot_remove -- remove a specific (device_id) zram device
hot_add sysfs attr is read-only and has only automatic device id
assignment mode (as requested by Minchan Kim). read operation performed
on this attr creates a new zram device and returns back its device_id or
error status.
Usage example:
# add a new specific zram device
cat /sys/class/zram-control/hot_add
2
# remove a specific zram device
echo 4 > /sys/class/zram-control/hot_remove
Returning zram_add() error code back to user (-ENOMEM in this case)
cat /sys/class/zram-control/hot_add
cat: /sys/class/zram-control/hot_add: Cannot allocate memory
NOTE, there might be users who already depend on the fact that at least
zram0 device gets always created by zram_init(). Preserve this behavior.
[minchan@kernel.org: use zram->claim to avoid lockdep splat]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
[ Original patch from Minchan Kim <minchan@kernel.org> ]
Commit ba6b17d68c8e ("zram: fix umount-reset_store-mount race
condition") introduced bdev->bd_mutex to protect a race between mount
and reset. At that time, we don't have dynamic zram-add/remove feature
so it was okay.
However, as we introduce dynamic device feature, bd_mutex became
trouble.
CPU 0
echo 1 > /sys/block/zram<id>/reset
-> kernfs->s_active(A)
-> zram:reset_store->bd_mutex(B)
CPU 1
echo <id> > /sys/class/zram/zram-remove
->zram:zram_remove: bd_mutex(B)
-> sysfs_remove_group
-> kernfs->s_active(A)
IOW, AB -> BA deadlock
The reason we are holding bd_mutex for zram_remove is to prevent
any incoming open /dev/zram[0-9]. Otherwise, we could remove zram
others already have opened. But it causes above deadlock problem.
To fix the problem, this patch overrides block_device.open and
it returns -EBUSY if zram asserts he claims zram to reset so any
incoming open will be failed so we don't need to hold bd_mutex
for zram_remove ayn more.
This patch is to prepare for zram-add/remove feature.
[sergey.senozhatsky@gmail.com: simplify reset_store()]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
This patch prepares zram to enable on-demand device creation.
zram_add() performs automatic device_id assignment and returns
new device id (>= 0) or error code (< 0).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
| |
We don't have meta->tb_lock anymore and use meta table entry bit_spin_lock
instead. update corresponding comment.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With dynamic device creation/removal (which will be introduced later in
the series) printing num_devices in zram_init() will not make a lot of
sense, as well as printing the number of destroyed devices in
destroy_devices(). Print per-device action (added/removed) in zram_add()
and zram_remove() instead.
Example:
[ 3645.259652] zram: Added device: zram5
[ 3646.152074] zram: Added device: zram6
[ 3650.585012] zram: Removed device: zram5
[ 3655.845584] zram: Added device: zram8
[ 3660.975223] zram: Removed device: zram6
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
Limiting the number of zram devices to 32 (default max_num_devices value)
is confusing, let's drop it. A user with 2TB or 4TB of RAM, for example,
can request as many devices as he can handle.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch looks big, but basically it just moves code blocks.
No functional changes.
Our current code layout looks like a sandwitch.
For example,
a) between read/write handlers, we have update_used_max() helper function:
static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw
b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:
static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request
c) we first a bunch of sysfs read/store functions. then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.
Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):
-- one-liners: zram_test_flag/etc.
-- helpers: is_partial_io/update_position/etc
-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: reset and disksize store functions are required to be after
meta() functions. because we do device create/destroy actions in these
sysfs handlers.
-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page
-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page
-- device contol: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch makes some preparations for on-demand device add/remove
functionality.
Remove `zram_devices' array and switch to id-to-pointer translation (idr).
idr doesn't bloat zram struct with additional members, f.e. list_head,
yet still provides ability to match the device_id with the device pointer.
No user-space visible changes.
[Julia.Lawall@lip6.fr: return -ENOMEM when `queue' alloc fails]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
| |
Fix a misplaced backslash.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
|
|
|
|
|
|
|
| |
This config option doesn't provide any usage for zram.
Signed-off-by: Marcin Jabrzyk <m.jabrzyk@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
|
|
| |
LZ4 is much faster than LZO, and should be preferred when available.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 3aaf14da807a4e9931a37f21e4251abb8a67021b upstream.
zcomp_create() verifies the success of zcomp_strm_{multi,single}_create()
through comp->stream, which can potentially be pointing to memory that
was freed if these functions returned an error.
While at it, replace a 'ERR_PTR(-ENOMEM)' by a more generic
'ERR_PTR(error)' as in the future zcomp_strm_{multi,siggle}_create()
could return other error codes. Function documentation updated
accordingly.
Fixes: beca3ec71fe5 ("zram: add multi stream functionality")
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
|
|
|
|
| |
fix zram drv h include
fix build
|
| |
|
|
| |
Change-Id: I23f6f75979077992298d848efd79a6efc0d776bd
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Revert "zram: do not use copy_page with non-page aligned address"
This reverts commit ed3e8707d2e19d6da506d8ab298e68e79b6621f2.
Revert "zram: sym permissions -> octal perm (checkpath warnings)"
This reverts commit 920095f4566b901834f9b41395968b739b402d4c.
Revert "zram: fix indents/warnings from checkpath"
This reverts commit 0a2fdee5446969c8c70bbdc9f8fde93eb1d47327.
Revert "UPSTREAM: zram/zcomp: do not zero out zcomp private pages"
This reverts commit d13c0c08323df29367affc7b7623d9d2d0ccfbb2.
Revert "UPSTREAM: zram: pass gfp from zcomp frontend to backend"
This reverts commit 6d22d73c07a0f2ffe706e88c302d52371ad29206.
Revert "UPSTREAM: zram: try vmalloc() after kmalloc()"
This reverts commit e6af82ad8a5599a783e9850aca8f1b32fc1f93f4.
Revert "UPSTREAM: zram/zcomp: use GFP_NOIO to allocate streams"
This reverts commit 38e34f1f6f1c9ee9c7f3958fcb35e72174337690.
Revert "zram: Fix a wrong return after merged new LZ4 version"
This reverts commit 7832ce6d8a006747a4c27840b4f7e7d3c12f0dbb.
Revert "zram: change usage of LZ4 to work with new LZ4 version"
This reverts commit 56622e86d4356054aad833aa8547992fdb76e4e3.
Revert "zram: avoid lockdep splat by revalidate_disk"
This reverts commit 149cadf4d8043f55a0d92cacc4b3d3d9cfb75148.
Revert "zram: revalidate disk after capacity change"
This reverts commit 270bdcb8d33f5c4769edab61f33f2fe43c8636f8.
|
| |
|
|
|
| |
result of :
git grep -l '__FUNCTION__' | xargs sed -i 's/__FUNCTION__/__func__/g'
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
__FUNCTION__ hasn't been treated as a string literal since gcc 3.4, so
this only helps people who only test-compile using 3.3 (compiler-gcc3.h
barks at anything older than that). Besides, there are almost no
occurrences of __FUNCTION__ left in the tree.
[akpm@linux-foundation.org: convert remaining __FUNCTION__ references]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Moyster <oysterized@gmail.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d72e9a7a93e4f8e9e52491921d99e0c8aa89eb4e upstream.
The copy_page is optimized memcpy for page-alinged address. If it is
used with non-page aligned address, it can corrupt memory which means
system corruption. With zram, it can happen with
1. 64K architecture
2. partial IO
3. slub debug
Partial IO need to allocate a page and zram allocates it via kmalloc.
With slub debug, kmalloc(PAGE_SIZE) doesn't return page-size aligned
address. And finally, copy_page(mem, cmem) corrupts memory.
So, this patch changes it to memcpy.
Actuaully, we don't need to change zram_bvec_write part because zsmalloc
returns page-aligned address in case of PAGE_SIZE class but it's not
good to rely on the internal of zsmalloc.
Note:
When this patch is merged to stable, clear_page should be fixed, too.
Unfortunately, recent zram removes it by "same page merge" feature so
it's hard to backport this patch to -stable tree.
I will handle it when I receive the mail from stable tree maintainer to
merge this patch to backport.
Fixes: 42e99bd ("zram: optimize memory operations with clear_page()/copy_page()")
Link: http://lkml.kernel.org/r/1492042622-12074-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Joe Maples <joe@frap129.org>
|
| | |
|