aboutsummaryrefslogtreecommitdiff
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
...
* ext4 crypto: add ext4_mpage_readpages()Theodore Ts'o2017-05-274-3/+271
| | | | | | | | | | This takes code from fs/mpage.c and optimizes it for ext4. Its primary reason is to allow us to more easily add encryption to ext4's read path in an efficient manner. Change-Id: I7d3a27c9768c1487dd374754b40ea6fe64589593 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4: Add support for FIDTRIM, a best-effort ioctl for deep discard trimJP Abgrall2017-05-273-12/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (This is a re-application of commit 687f999e1fbd3b553bccbd7f52996ae56c5e327e applied to the ext4 file system code as of 3.18.) * What This provides an interface for issuing an FITRIM which uses the secure discard instead of just a discard. Only the eMMC command is "secure", and not how the FS uses it: due to the fact that the FS might reassign a region somewhere else, the original deleted data will not be affected by the "trim" which only handles un-used regions. So we'll just call it "deep discard", and note that this is a "best effort" cleanup. * Why Once in a while, We want to be able to cleanup most of the unused blocks after erasing a bunch of files. We don't want to constantly secure-discard via a mount option. From an eMMC spec perspective, it tells the device to really get rid of all the data for the specified blocks and not just put them back into the pool of free ones (unlike the normal TRIM). The eMMC spec says the secure trim handling must make sure the data (and metadata) is not available anymore. A simple TRIM doesn't clear the data, it just puts blocks in the free pool. JEDEC Standard No. 84-A441 7.6.9 Secure Erase 7.6.10 Secure Trim From an FS perspective, it is acceptable to leave some data behind. - directory entries related to deleted files - databases entries related to deleted files - small-file data stored in inode extents - blocks held by the FS waiting to be re-used (mitigated by sync). - blocks reassigned by the FS prior to FIDTRIM. Change-Id: I687f999e1fbd3b553bccbd7f52996ae56c5e327e Signed-off-by: Geremy Condra <gcondra@google.com> Signed-off-by: JP Abgrall <jpa@google.com>
* ext4: use old legacy direct I/O interface for 3.18 backportTheodore Ts'o2017-05-273-15/+20
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* non-ext4 portions of "direct-io: Implement generic deferred AIO completions"Theodore Ts'o2017-05-275-58/+84
| | | | | | Originally from 7b7a8665edd8db73 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: revert to old read/write/aio interface for 3.18 backportTheodore Ts'o2017-05-271-5/+176
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: remove tmpfile, rename2, set_acl operations for 3.18 backportTheodore Ts'o2017-05-271-12/+19
| | | | | | Also switch to the rename operation. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use old interface for ext4_readdir()Theodore Ts'o2017-05-271-2/+15
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* sync: don't block the flusher thread waiting on IODave Chinner2017-05-271-2/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sync does it's WB_SYNC_ALL writeback, it issues data Io and then immediately waits for IO completion. This is done in the context of the flusher thread, and hence completely ties up the flusher thread for the backing device until all the dirty inodes have been synced. On filesystems that are dirtying inodes constantly and quickly, this means the flusher thread can be tied up for minutes per sync call and hence badly affect system level write IO performance as the page cache cannot be cleaned quickly. We already have a wait loop for IO completion for sync(2), so cut this out of the flusher thread and delegate it to wait_sb_inodes(). Hence we can do rapid IO submission, and then wait for it all to complete. Effect of sync on fsmark before the patch: FSUse% Count Size Files/sec App Overhead ..... 0 640000 4096 35154.6 1026984 0 720000 4096 36740.3 1023844 0 800000 4096 36184.6 916599 0 880000 4096 1282.7 1054367 0 960000 4096 3951.3 918773 0 1040000 4096 40646.2 996448 0 1120000 4096 43610.1 895647 0 1200000 4096 40333.1 921048 And a single sync pass took: real 0m52.407s user 0m0.000s sys 0m0.090s After the patch, there is no impact on fsmark results, and each individual sync(2) operation run concurrently with the same fsmark workload takes roughly 7s: real 0m6.930s user 0m0.000s sys 0m0.039s IOWs, sync is 7-8x faster on a busy filesystem and does not have an adverse impact on ongoing async data write operations. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 7747bd4bceb3079572695d3942294a6c7b265557)
* ext4: fix up bio->bi_iter.bi_sectorTheodore Ts'o2017-05-271-2/+2
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use old truncate_pagecache() interface for ext4 3.18 backportTheodore Ts'o2017-05-272-3/+5
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use old percpu_counter_init() function signatureTheodore Ts'o2017-05-272-9/+6
| | | | | | Older kernels don't pass in a gfp_flags argument. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: use the old shrinker APITheodore Ts'o2017-05-271-6/+10
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: disable ext4 acl handling for 3.18 backportTheodore Ts'o2017-05-271-0/+3
| | | | | | | The ACL code changes too much between 3.18 and 3.10; so disable acl handling so we don't have make things actually work. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* BACKPORT: ext4 from 3.18 to mtk-3.10Mister Oyster2017-05-2738-6175/+6782
|
* fs: add inode_set_flags() for ext4 3.18 backportTheodore Ts'o2017-05-271-0/+31
| | | | | | Excerpted from commit 5f16f3225b062 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* fs: add {lock,unlock}_two_nondirectories for 3.18 backportTheodore Ts'o2017-05-271-0/+35
| | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ANDROID: sdcardfs: Check for NULL in revalidateDaniel Rosenberg2017-05-241-3/+5
| | | | | | | | | If the inode is in the process of being evicted, the top value may be NULL. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 38502532 Change-Id: I0b9d04aab621e0398d44d1c5dc53293106aa5f89
* UPSTREAM: proc: actually make proc_fd_permission() thread-friendlyOleg Nesterov2017-05-241-2/+5
| | | | | | | | | | | | | | | | | | | | | (cherry pick from commit 54708d2858e79a2bdda10bf8a20c80eb96c20613) The commit 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly") fixed the access to /proc/self/fd from sub-threads, but introduced another problem: a sub-thread can't access /proc/<tid>/fd/ or /proc/thread-self/fd if generic_permission() fails. Change proc_fd_permission() to check same_thread_group(pid_task(), current). Fixes: 96d0df79f264 ("proc: make proc_fd_permission() thread-friendly") Reported-by: "Jin, Yihua" <yihua.jin@intel.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 26016905 Change-Id: I1894c78d0b13f0bde8cde84bd142ba67590dc0f1
* UPSTREAM: proc: make proc_fd_permission() thread-friendlyOleg Nesterov2017-05-241-5/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (cherry pick from commit 96d0df79f2644fc823f26c06491e182d87a90c2a) proc_fd_permission() says "process can still access /proc/self/fd after it has executed a setuid()", but the "task_pid() = proc_pid() check only helps if the task is group leader, /proc/self points to /proc/<leader-pid>. Change this check to use task_tgid() so that the whole thread group can access its /proc/self/fd or /proc/<tid-of-sub-thread>/fd. Notes: - CLONE_THREAD does not require CLONE_FILES so task->files can differ, but I don't think this can lead to any security problem. And this matches same_thread_group() in __ptrace_may_access(). - /proc/self should probably point to /proc/<thread-tid>, but it is too late to change the rules. Perhaps it makes sense to add /proc/thread though. Test-case: void *tfunc(void *arg) { assert(opendir("/proc/self/fd")); return NULL; } int main(void) { pthread_t t; pthread_create(&t, NULL, tfunc, NULL); pthread_join(t, NULL); return 0; } fails if, say, this executable is not readable and suid_dumpable = 0. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 26016905 Change-Id: Ifdd6403a8fccd073122e89d3547c13ccc08f0dce Signed-off-by: Joe Maples <joe@frap129.org> Conflicts: fs/proc/fd.c
* fs/mpage.c: factor page_endio() out of mpage_end_io()Matthew Wilcox2017-05-241-17/+1
| | | | | | | | | | | | | | | page_endio() takes care of updating all the appropriate page flags once I/O has finished to a page. Switch to using mapping_set_error() instead of setting AS_EIO directly; this will handle thin-provisioned devices correctly. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* fs/mpage.c: factor clean_buffers() out of __mpage_writepage()Matthew Wilcox2017-05-241-24/+30
| | | | | | | | | | | | | | | __mpage_writepage() is over 200 lines long, has 20 local variables, four goto labels and could desperately use simplification. Splitting clean_buffers() into a helper function improves matters a little, removing 20+ lines from it. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* fs/mpage.c: Convert to use bio_for_each_segment()Kent Overstreet2017-05-241-9/+8
| | | | | | | | | | | | | | | | | | | | | | With immutable biovecs we don't want code accessing bi_io_vec directly - the uses this patch changes weren't incorrect since they all own the bio, but it makes the code harder to audit for no good reason - also, this will help with multipage bvecs later. Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Chris Mason <chris.mason@fusionio.com> Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com> Cc: Joern Engel <joern@logfs.org> Cc: Prasad Joshi <prasadjoshi.linux@gmail.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Upstream 2c30c71bd653afcbed7f6754e8fe3d16e0e708a1. Patch applied just to fs/mpage.c for future merges. Signed-off-by: Park Ju Hyung <qkrwngud825@gmail.com> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* fs/block_dev.c: add bdev_read_page() and bdev_write_page()Matthew Wilcox2017-05-242-0/+75
| | | | | | | | | | | | | | | | A block device driver may choose to provide a rw_page operation. These will be called when the filesystem is attempting to do page sized I/O to page cache pages (ie not for direct I/O). This does preclude I/Os that are larger than page size, so this may only be a performance gain for some devices. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Tested-by: Dheeraj Reddy <dheeraj.reddy@intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* mm/mempolicy.c: convert the shared_policy lock to a rwlockNathan Zimmer2017-05-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running the SPECint_rate gcc on some very large boxes it was noticed that the system was spending lots of time in mpol_shared_policy_lookup(). The gamess benchmark can also show it and is what I mostly used to chase down the issue since the setup for that I found to be easier. To be clear the binaries were on tmpfs because of disk I/O requirements. We then used text replication to avoid icache misses and having all the copies banging on the memory where the instruction code resides. This results in us hitting a bottleneck in mpol_shared_policy_lookup() since lookup is serialised by the shared_policy lock. I have only reproduced this on very large (3k+ cores) boxes. The problem starts showing up at just a few hundred ranks getting worse until it threatens to livelock once it gets large enough. For example on the gamess benchmark at 128 ranks this area consumes only ~1% of time, at 512 ranks it consumes nearly 13%, and at 2k ranks it is over 90%. To alleviate the contention in this area I converted the spinlock to an rwlock. This allows a large number of lookups to happen simultaneously. The results were quite good reducing this consumtion at max ranks to around 2%. [akpm@linux-foundation.org: tidy up code comments] Signed-off-by: Nathan Zimmer <nzimmer@sgi.com> Acked-by: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Nadia Yvette Chambers <nyc@holomorphy.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
* Fix "hide su" patch for 3.10Alberto972017-05-241-0/+1
| | | | | | | | | | Without this, "ls system/xbin" returns "ls: system/xbin/su: No such file or directory" if root is disabled in Developer Settings. This happens because EXT4 uses "readdir" instead of "iterate". 3.18 kernel, instead, unconditionally goes for the "iterate" way here and that explains why I'm not seeing this error there. Change-Id: I26426683df0fd199a80f053294f352e31754bec5
* fs: namespace: fix maybe-uninitialized warningNathan Chancellor2017-05-231-2/+2
| | | | | | | | | | | | | | | | | | | | | | fs/namespace.c: In function 'SyS_mount': fs/namespace.c:2617:8: warning: 'kernel_dev' may be used uninitialized in this function [-Wmaybe-uninitialized] ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (void *) data_page); ~~~~~~~~~~~~~~~~~~~ fs/namespace.c:2596:8: note: 'kernel_dev' was declared here char *kernel_dev; ^~~~~~~~~~ fs/namespace.c:2617:8: warning: 'kernel_type' may be used uninitialized in this function [-Wmaybe-uninitialized] ret = do_mount(kernel_dev, kernel_dir->name, kernel_type, flags, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (void *) data_page); ~~~~~~~~~~~~~~~~~~~ fs/namespace.c:2594:8: note: 'kernel_type' was declared here char *kernel_type; ^~~~~~~~~~~ Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
* kernel: Fix potential refcount leak in su checkTom Marshall2017-05-221-1/+3
| | | | | Change-Id: I3d241ae805ba708c18bccfd5e5d6cdcc8a5bc1c8 Signed-off-by: Kevin F. Haggerty <haggertk@lineageos.org>
* kernel: Only expose su when daemon is runningTom Marshall2017-05-213-0/+28
| | | | | | | | | | | | | | It has been claimed that the PG implementation of 'su' has security vulnerabilities even when disabled. Unfortunately, the people that find these vulnerabilities often like to keep them private so they can profit from exploits while leaving users exposed to malicious hackers. In order to reduce the attack surface for vulnerabilites, it is therefore necessary to make 'su' completely inaccessible when it is not in use (except by the root and system users). Change-Id: I79716c72f74d0b7af34ec3a8054896c6559a181d
* f2fs: switch to using fscrypt_match_name()Eric Biggers2017-05-211-24/+4
| | | | | | | | | Switch f2fs directory searches to use the fscrypt_match_name() helper function. There should be no functional change. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* fscrypt: introduce helper function for filename matchingEric Biggers2017-05-212-22/+70
| | | | | | | | | Introduce a helper function fscrypt_match_name() which tests whether a fscrypt_name matches a directory entry. Also clean up the magic numbers and document things properly. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* fscrypt: fix context consistency check when key(s) unavailableEric Biggers2017-05-211-19/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To mitigate some types of offline attacks, filesystem encryption is designed to enforce that all files in an encrypted directory tree use the same encryption policy (i.e. the same encryption context excluding the nonce). However, the fscrypt_has_permitted_context() function which enforces this relies on comparing struct fscrypt_info's, which are only available when we have the encryption keys. This can cause two incorrect behaviors: 1. If we have the parent directory's key but not the child's key, or vice versa, then fscrypt_has_permitted_context() returned false, causing applications to see EPERM or ENOKEY. This is incorrect if the encryption contexts are in fact consistent. Although we'd normally have either both keys or neither key in that case since the master_key_descriptors would be the same, this is not guaranteed because keys can be added or removed from keyrings at any time. 2. If we have neither the parent's key nor the child's key, then fscrypt_has_permitted_context() returned true, causing applications to see no error (or else an error for some other reason). This is incorrect if the encryption contexts are in fact inconsistent, since in that case we should deny access. To fix this, retrieve and compare the fscrypt_contexts if we are unable to set up both fscrypt_infos. While this slightly hurts performance when accessing an encrypted directory tree without the key, this isn't a case we really need to be optimizing for; access *with* the key is much more important. Furthermore, the performance hit is barely noticeable given that we are already retrieving the fscrypt_context and doing two keyring searches in fscrypt_get_encryption_info(). If we ever actually wanted to optimize this case we might start by caching the fscrypt_contexts. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* fscrypt: Move key structure and constants to uapiJoe Richey2017-05-211-11/+0
| | | | | | | | | | | | | | | | | | | This commit exposes the necessary constants and structures for a userspace program to pass filesystem encryption keys into the keyring. The fscrypt_key structure was already part of the kernel ABI, this change just makes it so programs no longer have to redeclare these structures (like e4crypt in e2fsprogs currently does). Note that we do not expose the other FS_*_KEY_SIZE constants as they are not necessary. Only XTS is supported for contents_encryption_mode, so currently FS_MAX_KEY_SIZE bytes of key material must always be passed to the kernel. This commit also removes __packed from fscrypt_key as it does not contain any implicit padding and does not refer to an on-disk structure. Signed-off-by: Joe Richey <joerichey@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* fscrypt: remove unnecessary checks for NULL operationsEric Biggers2017-05-212-13/+1
| | | | | | | | | | | | | | The functions in fs/crypto/*.c are only called by filesystems configured with encryption support. Since the ->get_context(), ->set_context(), and ->empty_dir() operations are always provided in that case (and must be, otherwise there would be no way to get/set encryption policies, or in the case of ->get_context() even access encrypted files at all), there is no need to check for these operations being NULL and we can remove these unneeded checks. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* fscrypt: eliminate ->prepare_context() operationEric Biggers2017-05-211-7/+0
| | | | | | | | | | | | | | | | The only use of the ->prepare_context() fscrypt operation was to allow ext4 to evict inline data from the inode before ->set_context(). However, there is no reason why this cannot be done as simply the first step in ->set_context(), and in fact it makes more sense to do it that way because then the policy modes and flags get validated before any real work is done. Therefore, merge ext4_prepare_context() into ext4_set_context(), and remove ->prepare_context(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Conflicts: fs/ext4/super.c
* fscrypt: avoid collisions when presenting long encrypted filenamesEric Biggers2017-05-212-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When accessing an encrypted directory without the key, userspace must operate on filenames derived from the ciphertext names, which contain arbitrary bytes. Since we must support filenames as long as NAME_MAX, we can't always just base64-encode the ciphertext, since that may make it too long. Currently, this is solved by presenting long names in an abbreviated form containing any needed filesystem-specific hashes (e.g. to identify a directory block), then the last 16 bytes of ciphertext. This needs to be sufficient to identify the actual name on lookup. However, there is a bug. It seems to have been assumed that due to the use of a CBC (ciphertext block chaining)-based encryption mode, the last 16 bytes (i.e. the AES block size) of ciphertext would depend on the full plaintext, preventing collisions. However, we actually use CBC with ciphertext stealing (CTS), which handles the last two blocks specially, causing them to appear "flipped". Thus, it's actually the second-to-last block which depends on the full plaintext. This caused long filenames that differ only near the end of their plaintexts to, when observed without the key, point to the wrong inode and be undeletable. For example, with ext4: # echo pass | e4crypt add_key -p 16 edir/ # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l 100000 # sync # echo 3 > /proc/sys/vm/drop_caches # keyctl new_session # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l 2004 # rm -rf edir/ rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning ... To fix this, when presenting long encrypted filenames, encode the second-to-last block of ciphertext rather than the last 16 bytes. Although it would be nice to solve this without depending on a specific encryption mode, that would mean doing a cryptographic hash like SHA-256 which would be much less efficient. This way is sufficient for now, and it's still compatible with encryption modes like HEH which are strong pseudorandom permutations. Also, changing the presented names is still allowed at any time because they are only provided to allow applications to do things like delete encrypted directories. They're not designed to be used to persistently identify files --- which would be hard to do anyway, given that they're encrypted after all. For ease of backports, this patch only makes the minimal fix to both ext4 and f2fs. It leaves ubifs as-is, since ubifs doesn't compare the ciphertext block yet. Follow-on patches will clean things up properly and make the filesystems use a shared helper function. Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption") Reported-by: Gwendal Grignou <gwendal@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* f2fs: check entire encrypted bigname when finding a dentryJaegeuk Kim2017-05-214-20/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If user has no key under an encrypted dir, fscrypt gives digested dentries. Previously, when looking up a dentry, f2fs only checks its hash value with first 4 bytes of the digested dentry, which didn't handle hash collisions fully. This patch enhances to check entire dentry bytes likewise ext4. Eric reported how to reproduce this issue by: # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch # find edir -type f | xargs stat -c %i | sort | uniq | wc -l 100000 # sync # echo 3 > /proc/sys/vm/drop_caches # keyctl new_session # find edir -type f | xargs stat -c %i | sort | uniq | wc -l 99999 Cc: <stable@vger.kernel.org> Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (fixed f2fs_dentry_hash() to work even when the hash is 0) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Conflicts: fs/f2fs/inline.c
* f2fs: sync f2fs_lookup() with ext4_lookup()Eric Biggers2017-05-211-3/+4
| | | | | | | | | | As for ext4, now that fscrypt_has_permitted_context() correctly handles the case where we have the key for the parent directory but not the child, f2fs_lookup() no longer has to work around it. Also add the same warning message that ext4 uses. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* f2fs: fix a mount fail for wrong next_scan_nidYunlei He2017-05-211-0/+3
| | | | | | | | | | | | | | | -write_checkpoint -do_checkpoint -next_free_nid <--- something wrong with next free nid -f2fs_fill_super -build_node_manager -build_free_nids -get_current_nat_page -__get_meta_page <--- attempt to access beyond end of device Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGSChao Yu2017-05-211-3/+4
| | | | | | | | This patch expands cover region of inode->i_rwsem to keep setting flag atomically. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: show available_nids in f2fs/statusJaegeuk Kim2017-05-212-3/+5
| | | | | | This patch adds an entry in f2fs/status to show # of available nids. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: flush dirty nats periodicallyJaegeuk Kim2017-05-211-1/+1
| | | | | | | This patch flushes dirty nats in order to acquire available nids by writing checkpoint. Otherwise, we can have no chance to get freed nids. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discardChao Yu2017-05-214-8/+31
| | | | | | | | | | | | | Introduce CP_TRIMMED_FLAG to indicate all invalid block were trimmed before umount, so once we do mount with image which contain the flag, we don't record invalid blocks as undiscard one, when fstrim is being triggered, we can avoid issuing redundant discard commands. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: include/trace/events/f2fs.h
* f2fs: allow cpc->reason to indicate more than one reasonChao Yu2017-05-213-20/+18
| | | | | | | | Change to use different bits of cpc->reason to indicate different status, so cpc->reason can indicate more than one reason. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: release cp and dnode lock before IPUHou Pengyang2017-05-214-16/+27
| | | | | | | | We don't need to rewrite the page under cp_rwsem and dnode locks. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: shrink size of struct discard_cmdChao Yu2017-05-211-1/+1
| | | | | | | | In order to shrink size of struct discard_cmd, change variable type of @state in struct discard_cmd from int to unsigned char. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: don't hold cmd_lock during waiting discard commandChao Yu2017-05-212-5/+21
| | | | | | | | | | | | | Previously, with protection of cmd_lock, we will wait for end io of discard command which potentially may lead long latency, making worse concurrency. So, in this patch, we try to add reference into discard entry to prevent the entry being released by other thread, then we can avoid holding global cmd_lock during waiting discard to finish. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: nullify fio->encrypted_page for each writesJaegeuk Kim2017-05-211-2/+2
| | | | | | | | | This makes sure each write request has nullified encrypted_page pointer. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/segment.c
* f2fs: sanity check segment countJin Qian2017-05-211-0/+7
| | | | | | | | F2FS uses 4 bytes to represent block address. As a result, supported size of disk is 16 TB and it equals to 16 * 1024 * 1024 / 2 segments. Signed-off-by: Jin Qian <jinqian@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce valid_ipu_blkaddr to clean upJaegeuk Kim2017-05-211-5/+12
| | | | | | | This patch introduces valid_ipu_blkaddr to clean up checking block address for inplace-update. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: lookup extent cache first under IPU scenarioHou Pengyang2017-05-213-3/+17
| | | | | | | | | | | | | | | If a page is cold, NOT atomit written and need_ipu now, there is a high probability that IPU should be adapted. For IPU, we try to check extent tree to get the block index first, instead of reading the dnode page, where may lead to an useless dnode IO, since no need to update the dnode index for IPU. Signed-off-by: Hou Pengyang <houpengyang@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Conflicts: fs/f2fs/gc.c