aboutsummaryrefslogtreecommitdiff
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
* f2fs: fix potential overflow when adjusting GC cycleChao Yu2017-10-042-9/+16
| | | | | | | | | While comparing signed and unsigned variables, compiler will converts the signed value to unsigned one, due to this reason, {in,de}crease_sleep_time may return overflowed result. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid unneeded sync on quota fileChao Yu2017-10-041-2/+2
| | | | | | | | We only need to sync quota file with appointed quota type instead of all types in f2fs_quota_{on,off}. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce gc_urgent mode for background GCJaegeuk Kim2017-10-043-2/+28
| | | | | | | | | This patch adds a sysfs entry to control urgent mode for background GC. If this is set, background GC thread conducts GC with gc_urgent_sleep_time all the time. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix the size value in __check_sit_bitmapYunlong Song2017-10-041-2/+5
| | | | | | | | The current size value is not correct and will miss bitmap check. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add app/fs io statChao Yu2017-10-0410-32/+202
| | | | | | | | | | | This patch enables inner app/fs io stats and introduces below virtual fs nodes for exposing stats info: /sys/fs/f2fs/<dev>/iostat_enable /proc/fs/f2fs/<dev>/iostat_info Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix wrong stat assignment] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: do not change the valid_block value if cur_valid_map was wrongly set ↵Yunlong Song2017-10-041-0/+4
| | | | | | | | or cleared Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: update cur_valid_map_mir together with cur_valid_mapYunlong Song2017-10-041-14/+30
| | | | | | | | | | | | When cur_valid_map passes the f2fs_test_and_set(,clear)_bit test, cur_valid_map_mir update is skipped unlikely, so fix it. The fix now changes the mirror check together with cur_valid_map all the time. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: Fix unused variable and add unlikely for corner condition.] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: use printk_ratelimited for f2fs_msgJaegeuk Kim2017-10-041-1/+1
| | | | | | This patch reduces contention of printks. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: expose features to sysfs entryJaegeuk Kim2017-10-041-26/+130
| | | | | | | | | | | This patch exposes what features are supported by current f2fs build to sysfs entry via: /sys/fs/f2fs/features/ /sys/fs/f2fs/dev/features Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support inode checksumChao Yu2017-10-045-1/+118
| | | | | | | | This patch adds to support inode checksum in f2fs. Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix verification flow] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: return wrong error number on f2fs_quota_writeJaegeuk Kim2017-10-041-1/+1
| | | | | | This must return size, not error number. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: provide f2fs_balance_fs to __write_node_pageYunlong Song2017-10-043-8/+13
| | | | | | | | Let node writeback also do f2fs_balance_fs to ensure there are always enough free segments. Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce f2fs_statfs_projectChao Yu2017-10-041-0/+48
| | | | | | | | This patch introduces f2fs_statfs_project, it enables to show usage status of directory tree which is limited with project quota. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't need to wait for node writes for atomic writeJaegeuk Kim2017-10-041-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | We have a node chain to serialize node block writes, so if any IOs for node block writes are reordered, we'll get broken node chain. IOWs, roll-forward recovery will see all or none node blocks given fsync mark. E.g., Node chain consists of: N1 -> N2 -> N3 -> NFSYNC -> N1' -> N2' -> N'FSYNC Reordered to: 1) N1 -> N2 -> N3 -> N2' -> NFSYNC -> N'FSYNC -> power-cut 2) N1 -> N2 -> N3 -> N1' -> NFSYNC -> power-cut 3) N1 -> N2 -> NFSYNC -> N1' -> N'FSYNC -> N3 -> power-cut 4) N1 -> NFSYNC -> N1' -> N2' -> N'FSYNC -> N3 -> power-cut Roll-forward recovery can proceed to: 1) N1 -> N2 -> N3 -> NFSYNC -> X 2) N1 -> N2 -> N3 -> NFSYNC -> N1' -> X 3) N1 -> N2 -> N3 -> FSYNC -> N1' -> X 4) N1 -> X Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: avoid naming confusion of sysfs initJaegeuk Kim2017-10-043-14/+14
| | | | | | | | | | | This patch changes the function names of sysfs init to follow ext4. f2fs_init_sysfs <-> f2fs_register_sysfs f2fs_exit_sysfs <-> f2fs_unregister_sysfs Suggested-by: Chao Yu <yuchao0@huawei.com> Reivewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support project quotaChao Yu2017-10-046-15/+118
| | | | | | | This patch adds to support plain project quota. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: record quota during dot{,dot} recoveryChao Yu2017-10-041-0/+2
| | | | | | | | | | | In ->lookup(), we will have a try to recover dot or dotdot for corrupted directory, once disk quota is on, if it allocates new block during dotdot recovery, we need to record disk quota info for the allocation, so this patch fixes this issue by adding missing dquot_initialize() in __recover_dot_dentries. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: enhance on-disk inode structure scalabilityChao Yu2017-10-049-50/+124
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline to indicate that on-disk structure of current inode is extended. In order to extend, we changed the inode structure a bit: Original one: struct f2fs_inode { ... struct f2fs_extent i_ext; __le32 i_addr[DEF_ADDRS_PER_INODE]; __le32 i_nid[DEF_NIDS_PER_INODE]; } Extended one: struct f2fs_inode { ... struct f2fs_extent i_ext; union { struct { __le16 i_extra_isize; __le16 i_padding; __le32 i_extra_end[0]; }; __le32 i_addr[DEF_ADDRS_PER_INODE]; }; __le32 i_nid[DEF_NIDS_PER_INODE]; } Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of i_addr field for storing i_extra_isize and i_padding. with i_extra_isize, we can calculate actual size of reserved space in i_addr, available attribute fields included in total extra attribute fields for current inode can be described as below: +--------------------+ | .i_mode | | ... | | .i_ext | +--------------------+ | .i_extra_isize |-----+ | .i_padding | | | .i_prjid | | | .i_atime_extra | | | .i_ctime_extra | | | .i_mtime_extra |<----+ | .i_inode_cs |<----- store blkaddr/inline from here | .i_xattr_cs | | ... | +--------------------+ | | | block address | | | +--------------------+ | .i_nid | +--------------------+ | node_footer | | (nid, ino, offset) | +--------------------+ Hence, with this patch, we would enhance scalability of f2fs inode for storing more newly added attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: make max inline size changeableChao Yu2017-10-045-59/+95
| | | | | | | | | | | | | | | | | This patch tries to make below macros calculating max inline size, inline dentry field size considerring reserving size-changeable space: - MAX_INLINE_DATA - NR_INLINE_DENTRY - INLINE_DENTRY_BITMAP_SIZE - INLINE_RESERVED_SIZE Then, when inline_{data,dentry} options is enabled, it allows us to reserve inline space with different size flexibly for adding newly introduced inode attribute. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: add ioctl to expose current featuresJaegeuk Kim2017-10-042-0/+15
| | | | | | | | | This patch adds an ioctl to provide feature information to user. For exapmle, SQLite can use this ioctl to detect whether f2fs support atomic write or not. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't give partially written atomic data from process crashJaegeuk Kim2017-10-042-0/+19
| | | | | | | | | | | | | | | | | | | | | | This patch resolves the below scenario. == Process 1 == == Process 2 == open(w) open(rw) begin write(new_#1) process_crash f_op->flush locks_remove_posix f_op>release read (new_#1) In order to avoid corrupted database caused by new_#1, we must do roll-back at process_crash time. In order to check that, this patch keeps task which triggers transaction begin, and does roll-back in f_op->flush before removing file locks. Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: preserve i_mode if __f2fs_set_acl() failsErnesto A. Fernández2017-10-021-2/+3
| | | | | | | | | | | | | | | | | When changing a file's acl mask, __f2fs_set_acl() will first set the group bits of i_mode to the value of the mask, and only then set the actual extended attribute representing the new acl. If the second part fails (due to lack of space, for example) and the file had no acl attribute to begin with, the system will from now on assume that the mask permission bits are actual group permission bits, potentially granting access to the wrong users. Prevent this by only changing the inode mode after the acl has been set. Signed-off-by: Ernesto A. Fernández <ernesto.mnd.fernandez@gmail.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: alloc new nids for xattr block in recoveryYunlei He2017-10-021-13/+15
| | | | | | | | | | | | | | | | | recovery file A: recovery file B: -get_dnode_of_data -alloc_nid -recover_xattr_data -set_node_addr(sbi, &ni, NEW_ADDR, false); --->bug_on for nid has been used by file A In recovery process, new allocated node blocks may "reuse" xattr block nids, this patch alloc new nids for xattr blocks in recovery process to avoid this problem. Signed-off-by: Yunlei He <heyunlei@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: spread struct f2fs_dentry_ptr for inline pathChao Yu2017-10-022-23/+31
| | | | | | | | | | Use f2fs_dentry_ptr structure to indicate inline dentry structure as much as possible, so we can wrap inline dentry with size-fixed fields to the one with size-changeable fields. With this change, we can handle size-changeable inline dentry more easily. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove unused input parameterYunlei He2017-10-023-7/+5
| | | | | | | | | | This patch remove unused input parameter in function new_node_page. Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Yong Sheng <shengyong1@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* debugfs: add get/set for atomic typesSeth Jennings2017-09-261-0/+42
| | | | | | | | | | | | | | | | | | | debugfs currently lack the ability to create attributes that set/get atomic_t values. This patch adds support for this through a new debugfs_create_atomic_t() function. Change-Id: I60cb007e9a67a410771e0ad78621f0875cb6d48c Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Git-commit: 3a76e5e09fbb51e756b4e732e3e65446f4984cf5 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
* block: zram: Backport from Linux 4.1Sultan Qasim Khan2017-09-251-0/+28
| | | | Change-Id: I23f6f75979077992298d848efd79a6efc0d776bd
* fs: proc: task_mmu: fix proc_mem_open creds for fs access checksMister Oyster2017-09-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Should fix : [ 2201.337557]<0> (0)[672:android.bg]WARNING: at ../../../../../../kernel/meizu/m2note/kernel/ptrace.c:239 __ptrace_may_access+0x164/0x178() [ 2201.337568]<0> (0)[672:android.bg]denying ptrace access check without PTRACE_MODE_*CREDS [ 2201.337583]<0> (0)[672:android.bg]CPU: 0 PID: 672 Comm: android.bg Tainted: G W 3.10.107-NOyster #1 [ 2201.337593]<0> (0)[672:android.bg]Call trace: [ 2201.337609]<0> (0)[672:android.bg][<ffffffc000089558>] dump_backtrace+0x0/0x148 [ 2201.337625]<0> (0)[672:android.bg][<ffffffc0000896b4>] show_stack+0x14/0x1c [ 2201.337642]<0> (0)[672:android.bg][<ffffffc0009e88d4>] dump_stack+0x20/0x28 [ 2201.337657]<0> (0)[672:android.bg][<ffffffc00009b6cc>] warn_slowpath_fmt+0xb0/0x134 [ 2201.337673]<0> (0)[672:android.bg][<ffffffc0000ab378>] __ptrace_may_access+0x164/0x178 [ 2201.337687]<0> (0)[672:android.bg][<ffffffc0000ab8c0>] ptrace_may_access+0x2c/0x4c [ 2201.337704]<0> (0)[672:android.bg][<ffffffc000098d48>] mm_access+0x98/0xe0 [ 2201.337722]<0> (0)[672:android.bg][<ffffffc00020e788>] proc_mem_open+0x2c/0xa0 [ 2201.337739]<0> (0)[672:android.bg][<ffffffc00020a79c>] pid_smaps_open+0x48/0x88 [ 2201.337756]<0> (0)[672:android.bg][<ffffffc0001a0c54>] do_dentry_open+0x178/0x268 [ 2201.337772]<0> (0)[672:android.bg][<ffffffc0001a1e30>] finish_open+0x30/0x5c [ 2201.337787]<0> (0)[672:android.bg][<ffffffc0001b12c0>] do_last.isra.29+0x45c/0xcbc [ 2201.337802]<0> (0)[672:android.bg][<ffffffc0001b1bd8>] path_openat.isra.30+0xb8/0x494 [ 2201.337817]<0> (0)[672:android.bg][<ffffffc0001b2028>] do_filp_open+0x40/0xb4 [ 2201.337834]<0> (0)[672:android.bg][<ffffffc0001a22fc>] do_sys_open+0x118/0x1f0 [ 2201.337851]<0> (0)[672:android.bg][<ffffffc0001a240c>] SyS_openat+0x10/0x18 [ 2201.337861]<0> (0)[672:android.bg]---[ end trace e7bf4b0b0cb5766d ]--- Signed-off-by: Mister Oyster <oysterized@gmail.com>
* seq_file: remove "%n" usage from seq_file usersTetsuo Handa2017-09-234-41/+22
| | | | | | | | | | | | | | | | | | | All seq_printf() users are using "%n" for calculating padding size, convert them to use seq_setwidth() / seq_pad() pair. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 652586df95e5d76b37d07a11839126dcfede1621 Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git [davidb@codeaurora.org: Resolve merge conflicts with ipv4/6 ping changes in upstream] CRs-fixed: 665291 Change-Id: Ia0416c9dbe3d80ff35f24f9c93c3543d1200a327 Signed-off-by: David Brown <davidb@codeaurora.org>
* seq_file: introduce seq_setwidth() and seq_pad()Tetsuo Handa2017-09-231-0/+15
| | | | | | | | | | | | | | | | | | | | | There are several users who want to know bytes written by seq_*() for alignment purpose. Currently they are using %n format for knowing it because seq_*() returns 0 on success. This patch introduces seq_setwidth() and seq_pad() for allowing them to align without using %n format. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Joe Perches <joe@perches.com> Cc: David Miller <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Git-commit: 839cc2a94cc3665bafe32203c2f095f4dd470a80) Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git CRs-fixed: 665291 Change-Id: I727d9af5ed320d717295c9d0f82c88623fb181c1 Signed-off-by: David Brown <davidb@codeaurora.org>
* misc: replace __FUNCTION__ by __function__Moyster2017-09-232-2/+2
| | | | | result of : git grep -l '__FUNCTION__' | xargs sed -i 's/__FUNCTION__/__func__/g'
* fsync: cleanupMister Oyster2017-09-221-22/+18
| | | | Signed-off-by: Mister Oyster <oysterized@gmail.com>
* fs: reduce permissions of dynamic sync control sysfs to 0660fire8552017-09-221-1/+1
| | | | this fixes compiling because 0666 isn't allowed
* Revert to dynamic sync control v1.5Mister Oyster2017-09-154-144/+182
| | | | | This reverts commit 6e4b043748ce83d6c7ba6dbf9ba50bd857d659d6. This reverts commit 1abef3b2cf1b835192ba2484e42f4a1dbba26807.
* fs: f2fs: remove duplicate lib llist helper from f2fs since it was merged ↵Mister Oyster2017-09-131-22/+0
| | | | https://github.com/Moyster/android_kernel_m2note/commit/cb0f38652f775a21e09939cac3031ffe7417c563
* ANDROID: sdcardfs: Add missing breakDaniel Rosenberg2017-09-131-0/+1
| | | | | | Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 63245673 Change-Id: I5fc596420301045895e5a9a7e297fd05434babf9
* ANDROID: Sdcardfs: Move gid derivation under flagDaniel Rosenberg2017-09-135-4/+20
| | | | | | | | | This moves the code to adjust the gid/uid of lower filesystem files under the mount flag derive_gid. Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I44eaad4ef67c7fcfda3b6ea3502afab94442610c Bug: 63245673
* ANDROID: mnt: Fix freeing of mount dataDaniel Rosenberg2017-09-131-3/+1
| | | | | | | | | Fix double free on error paths Signed-off-by: Daniel Rosenberg <drosen@google.com> Change-Id: I1c25a175e87e5dd5cafcdcf9d78bf4c0dc3f88ef Bug: 65386954 Fixes: aa6d3ace42f9 ("mnt: Add filesystem private data to mount points")
* cifs: Fix df output for users with quota limitsSachin Prabhu2017-09-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 42bec214d8bd432be6d32a1acb0a9079ecd4d142 upstream. The df for a SMB2 share triggers a GetInfo call for FS_FULL_SIZE_INFORMATION. The values returned are used to populate struct statfs. The problem is that none of the information returned by the call contains the total blocks available on the filesystem. Instead we use the blocks available to the user ie. quota limitation when filling out statfs.f_blocks. The information returned does contain Actual free units on the filesystem and is used to populate statfs.f_bfree. For users with quota enabled, it can lead to situations where the total free space reported is more than the total blocks on the system ending up with df reports like the following # df -h /mnt/a Filesystem Size Used Avail Use% Mounted on //192.168.22.10/a 2.5G -2.3G 2.5G - /mnt/a To fix this problem, we instead populate both statfs.f_bfree with the same value as statfs.f_bavail ie. CallerAvailableAllocationUnits. This is similar to what is done already in the code for cifs and df now reports the quota information for the user used to mount the share. # df --si /mnt/a Filesystem Size Used Avail Use% Mounted on //192.168.22.10/a 2.7G 101M 2.6G 4% /mnt/a Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Pierguido Lambri <plambri@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Francisco Franco <franciscofranco.1990@gmail.com>
* FROMLIST: f2fs: introduce discard_granularity sysfs entryChao Yu2017-09-083-15/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (url: https://patchwork.kernel.org/patch/9876921/) Commit d618ebaf0aa8 ("f2fs: enable small discard by default") enables f2fs to issue 4K size discard in real-time discard mode. However, issuing smaller discard may cost more lifetime but releasing less free space in flash device. Since f2fs has ability of separating hot/cold data and garbage collection, we can expect that small-sized invalid region would expand soon with OPU, deletion or garbage collection on valid datas, so it's better to delay or skip issuing smaller size discards, it could help to reduce overmuch consumption of IO bandwidth and lifetime of flash storage. This patch makes f2fs selectng 64K size as its default minimal granularity, and issue discard with the size which is not smaller than minimal granularity. Also it exposes discard granularity as sysfs entry for configuration in different scenario. Jaegeuk Kim: We must issue all the accumulated discard commands when fstrim is called. So, I've added pend_list_tag[] to indicate whether we should issue the commands or not. If tag sets P_ACTIVE or P_TRIM, we have to issue them. P_TRIM is set once at a time, given fstrim trigger. In addition, issue_discard_thread is calling too much due to the number of discard commands remaining in the pending list. I added a timer to control it likewise gc_thread. Change-Id: Ia90dd686c25cb27f144137ea3c9bcc1c943a9aea Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* FROMLIST: f2fs: give a try to do atomic write in -ENOMEM caseJaegeuk Kim2017-09-081-2/+6
| | | | | | | | | | It'd be better to retry writing atomic pages when we get -ENOMEM. (url https://www.spinics.net/lists/linux-fsdevel/msg113844.html) Bug: 63260873 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* FROMLIST: f2fs: use IPU for cold filesJaegeuk Kim2017-09-081-0/+4
| | | | | | | | | | | | (url: https://patchwork.kernel.org/patch/9886419/) We expect cold files write data sequentially, but sometimes some of small data can be updated, which incurs fragmentation. Let's avoid that. Change-Id: Ib4a8db92e05bc88b1c7809707078efd249421045 Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* FROMLIST: f2fs: make background threads of f2fs being aware of freezingChao Yu2017-09-082-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (url: https://patchwork.kernel.org/patch/9857935/) When ->freeze_fs is called from lvm for doing snapshot, it needs to make sure there will be no more changes in filesystem's data, however, previously, background threads like GC thread wasn't aware of freezing, so in environment with active background threads, data of snapshot becomes unstable. This patch fixes this issue by adding sb_{start,end}_intwrite in below background threads: - GC thread - flush thread - discard thread Note that, don't use sb_start_intwrite() in gc_thread_func() due to: generic/241 reports below bug: ====================================================== WARNING: possible circular locking dependency detected 4.13.0-rc1+ #32 Tainted: G O ------------------------------------------------------ f2fs_gc-250:0/22186 is trying to acquire lock: (&sbi->gc_mutex){+.+...}, at: [<f8fa7f0b>] f2fs_sync_fs+0x7b/0x1b0 [f2fs] but task is already holding lock: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (sb_internal#2){++++.-}: __lock_acquire+0x405/0x7b0 lock_acquire+0xae/0x220 __sb_start_write+0x11d/0x1f0 f2fs_evict_inode+0x2d6/0x4e0 [f2fs] evict+0xa8/0x170 iput+0x1fb/0x2c0 f2fs_sync_inode_meta+0x3f/0xf0 [f2fs] write_checkpoint+0x1b1/0x750 [f2fs] f2fs_sync_fs+0x85/0x1b0 [f2fs] f2fs_do_sync_file.isra.24+0x137/0xa30 [f2fs] f2fs_sync_file+0x34/0x40 [f2fs] vfs_fsync_range+0x4a/0xa0 do_fsync+0x3c/0x60 SyS_fdatasync+0x15/0x20 do_fast_syscall_32+0xa1/0x1b0 entry_SYSENTER_32+0x4c/0x7b -> #1 (&sbi->cp_mutex){+.+...}: __lock_acquire+0x405/0x7b0 lock_acquire+0xae/0x220 __mutex_lock+0x4f/0x830 mutex_lock_nested+0x25/0x30 write_checkpoint+0x2f/0x750 [f2fs] f2fs_sync_fs+0x85/0x1b0 [f2fs] sync_filesystem+0x67/0x80 generic_shutdown_super+0x27/0x100 kill_block_super+0x22/0x50 kill_f2fs_super+0x3a/0x40 [f2fs] deactivate_locked_super+0x3d/0x70 deactivate_super+0x40/0x60 cleanup_mnt+0x39/0x70 __cleanup_mnt+0x10/0x20 task_work_run+0x69/0x80 exit_to_usermode_loop+0x57/0x92 do_fast_syscall_32+0x18c/0x1b0 entry_SYSENTER_32+0x4c/0x7b -> #0 (&sbi->gc_mutex){+.+...}: validate_chain.isra.36+0xc50/0xdb0 __lock_acquire+0x405/0x7b0 lock_acquire+0xae/0x220 __mutex_lock+0x4f/0x830 mutex_lock_nested+0x25/0x30 f2fs_sync_fs+0x7b/0x1b0 [f2fs] f2fs_balance_fs_bg+0xb9/0x200 [f2fs] gc_thread_func+0x302/0x4a0 [f2fs] kthread+0xe9/0x120 ret_from_fork+0x19/0x24 other info that might help us debug this: Chain exists of: &sbi->gc_mutex --> &sbi->cp_mutex --> sb_internal#2 Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(sb_internal#2); lock(&sbi->cp_mutex); lock(sb_internal#2); lock(&sbi->gc_mutex); *** DEADLOCK *** 1 lock held by f2fs_gc-250:0/22186: #0: (sb_internal#2){++++.-}, at: [<f8fb5609>] gc_thread_func+0x159/0x4a0 [f2fs] stack backtrace: CPU: 2 PID: 22186 Comm: f2fs_gc-250:0 Tainted: G O 4.13.0-rc1+ #32 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 Call Trace: dump_stack+0x5f/0x92 print_circular_bug+0x1b3/0x1bd validate_chain.isra.36+0xc50/0xdb0 ? __this_cpu_preempt_check+0xf/0x20 __lock_acquire+0x405/0x7b0 lock_acquire+0xae/0x220 ? f2fs_sync_fs+0x7b/0x1b0 [f2fs] __mutex_lock+0x4f/0x830 ? f2fs_sync_fs+0x7b/0x1b0 [f2fs] mutex_lock_nested+0x25/0x30 ? f2fs_sync_fs+0x7b/0x1b0 [f2fs] f2fs_sync_fs+0x7b/0x1b0 [f2fs] f2fs_balance_fs_bg+0xb9/0x200 [f2fs] gc_thread_func+0x302/0x4a0 [f2fs] ? preempt_schedule_common+0x2f/0x4d ? f2fs_gc+0x540/0x540 [f2fs] kthread+0xe9/0x120 ? f2fs_gc+0x540/0x540 [f2fs] ? kthread_create_on_node+0x30/0x30 ret_from_fork+0x19/0x24 The deadlock occurs in below condition: GC Thread Thread B - sb_start_intwrite - f2fs_sync_file - f2fs_sync_fs - mutex_lock(&sbi->gc_mutex) - write_checkpoint - block_operations - f2fs_sync_inode_meta - iput - sb_start_intwrite - mutex_lock(&sbi->gc_mutex) Fix this by altering sb_start_intwrite to sb_start_write_trylock. Change-Id: Ibea8cff73d684e5aebc950f29ef4d611fc10df76 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* FS: F2FS: Use jiffiesDorimanx2017-09-047-8/+9
| | | | Signed-off-by: Joe Maples <joe@frap129.org>
* fs: exfat: import latest dorimanx/exfat-nofuse driversMister Oyster2017-08-3129-0/+12655
|
* exec: Limit arg stack to at most 75% of _STK_LIMKees Cook2017-08-311-5/+6
| | | | | | | | To avoid pathological stack usage or the need to special-case setuid execs, just limit all arg stack usage to at most 75% of _STK_LIM (6MB). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pstore: Use dynamic spinlock initializerKees Cook2017-08-201-1/+1
| | | | | | | | | | | | The per-prz spinlock should be using the dynamic initializer so that lockdep can correctly track it. Without this, under lockdep, we get a warning at boot that the lock is in non-static memory. Fixes: 109704492ef6 ("pstore: Make spinlock per zone instead of global") Fixes: 76d5692a5803 ("pstore: Correctly initialize spinlock and flags") Signed-off-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Joe Maples <joe@frap129.org>
* pstore: Correctly initialize spinlock and flagsKees Cook2017-08-201-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ram backend wasn't always initializing its spinlock correctly. Since it was coming from kzalloc memory, though, it was harmless on architectures that initialize unlocked spinlocks to 0 (at least x86 and ARM). This also fixes a possibly ignored flag setting too. When running under CONFIG_DEBUG_SPINLOCK, the following Oops was visible: [ 0.760836] persistent_ram: found existing buffer, size 29988, start 29988 [ 0.765112] persistent_ram: found existing buffer, size 30105, start 30105 [ 0.769435] persistent_ram: found existing buffer, size 118542, start 118542 [ 0.785960] persistent_ram: found existing buffer, size 0, start 0 [ 0.786098] persistent_ram: found existing buffer, size 0, start 0 [ 0.786131] pstore: using zlib compression [ 0.790716] BUG: spinlock bad magic on CPU#0, swapper/0/1 [ 0.790729] lock: 0xffffffc0d1ca9bb0, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0 [ 0.790742] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.10.0-rc2+ #913 [ 0.790747] Hardware name: Google Kevin (DT) [ 0.790750] Call trace: [ 0.790768] [<ffffff900808ae88>] dump_backtrace+0x0/0x2bc [ 0.790780] [<ffffff900808b164>] show_stack+0x20/0x28 [ 0.790794] [<ffffff9008460ee0>] dump_stack+0xa4/0xcc [ 0.790809] [<ffffff9008113cfc>] spin_dump+0xe0/0xf0 [ 0.790821] [<ffffff9008113d3c>] spin_bug+0x30/0x3c [ 0.790834] [<ffffff9008113e28>] do_raw_spin_lock+0x50/0x1b8 [ 0.790846] [<ffffff9008a2d2ec>] _raw_spin_lock_irqsave+0x54/0x6c [ 0.790862] [<ffffff90083ac3b4>] buffer_size_add+0x48/0xcc [ 0.790875] [<ffffff90083acb34>] persistent_ram_write+0x60/0x11c [ 0.790888] [<ffffff90083aab1c>] ramoops_pstore_write_buf+0xd4/0x2a4 [ 0.790900] [<ffffff90083a9d3c>] pstore_console_write+0xf0/0x134 [ 0.790912] [<ffffff900811c304>] console_unlock+0x48c/0x5e8 [ 0.790923] [<ffffff900811da18>] register_console+0x3b0/0x4d4 [ 0.790935] [<ffffff90083aa7d0>] pstore_register+0x1a8/0x234 [ 0.790947] [<ffffff90083ac250>] ramoops_probe+0x6b8/0x7d4 [ 0.790961] [<ffffff90085ca548>] platform_drv_probe+0x7c/0xd0 [ 0.790972] [<ffffff90085c76ac>] driver_probe_device+0x1b4/0x3bc [ 0.790982] [<ffffff90085c7ac8>] __device_attach_driver+0xc8/0xf4 [ 0.790996] [<ffffff90085c4bfc>] bus_for_each_drv+0xb4/0xe4 [ 0.791006] [<ffffff90085c7414>] __device_attach+0xd0/0x158 [ 0.791016] [<ffffff90085c7b18>] device_initial_probe+0x24/0x30 [ 0.791026] [<ffffff90085c648c>] bus_probe_device+0x50/0xe4 [ 0.791038] [<ffffff90085c35b8>] device_add+0x3a4/0x76c [ 0.791051] [<ffffff90087d0e84>] of_device_add+0x74/0x84 [ 0.791062] [<ffffff90087d19b8>] of_platform_device_create_pdata+0xc0/0x100 [ 0.791073] [<ffffff90087d1a2c>] of_platform_device_create+0x34/0x40 [ 0.791086] [<ffffff900903c910>] of_platform_default_populate_init+0x58/0x78 [ 0.791097] [<ffffff90080831fc>] do_one_initcall+0x88/0x160 [ 0.791109] [<ffffff90090010ac>] kernel_init_freeable+0x264/0x31c [ 0.791123] [<ffffff9008a25bd0>] kernel_init+0x18/0x11c [ 0.791133] [<ffffff9008082ec0>] ret_from_fork+0x10/0x50 [ 0.793717] console [pstore-1] enabled [ 0.797845] pstore: Registered ramoops as persistent store backend [ 0.804647] ramoops: attached 0x100000@0xf7edc000, ecc: 0/0 Fixes: 663deb47880f ("pstore: Allow prz to control need for locking") Fixes: 109704492ef6 ("pstore: Make spinlock per zone instead of global") Reported-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Joe Maples <joe@frap129.org>
* pstore: Allow prz to control need for lockingJoel Fernandes2017-08-202-11/+18
| | | | | | | | | | | In preparation of not locking at all for certain buffers depending on if there's contention, make locking optional depending on the initialization of the prz. Signed-off-by: Joel Fernandes <joelaf@google.com> [kees: moved locking flag into prz instead of via caller arguments] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Joe Maples <joe@frap129.org>
* pstore: Make spinlock per zone instead of globalJoel Fernandes2017-08-201-6/+5
| | | | | | | | | | | | | | | Currently pstore has a global spinlock for all zones. Since the zones are independent and modify different areas of memory, there's no need to have a global lock, so we should use a per-zone lock as introduced here. Also, when ramoops's ftrace use-case has a FTRACE_PER_CPU flag introduced later, which splits the ftrace memory area into a single zone per CPU, it will eliminate the need for locking. In preparation for this, make the locking optional. Signed-off-by: Joel Fernandes <joelaf@google.com> [kees: updated commit message] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Joe Maples <joe@frap129.org>