path: root/fs/ext4
Commit message | Author | Age | Files | Lines
...
* ext4: add optimization for the lazytime mount option | Theodore Ts'o | 2017-05-29 | 2 | -2/+72
    Add an optimization for the MS_LAZYTIME mount option so that we will opportunistically write out any inodes with the I_DIRTY_TIME flag set in a particular inode table block when we need to update some inode in that inode table block anyway.

    Also add some temporary code so that we can set the lazytime mount option without needing a modified /sbin/mount program which can set MS_LAZYTIME. We can eventually make this go away once util-linux has added support.

    Google-Bug-Id: 18297052
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: add support for a lazytime mount option | Theodore Ts'o | 2017-05-29 | 1 | -0/+6
    Add a new mount option which enables a new "lazytime" mode. This mode causes atime, mtime, and ctime updates to only be made to the in-memory version of the inode. The on-disk times will only get updated when (a) the inode needs to be updated for some non-time-related change, (b) userspace calls fsync(), syncfs() or sync(), or (c) just before an undeleted inode is evicted from memory.

    This is OK according to POSIX because there are no guarantees after a crash unless userspace explicitly requests them via an fsync(2) call.

    For workloads which feature a large number of random writes to a preallocated file, the lazytime mount option significantly reduces writes to the inode table. The repeated 4k writes to a single block will result in undesirable stress on flash devices and SMR disk drives. Even on conventional HDDs, the repeated writes to the inode table block will trigger Adjacent Track Interference (ATI) remediation latencies, which very negatively impact long tail latencies --- which is a very big deal for web serving tiers (for example).

    Google-Bug-Id: 18297052
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
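The lazytime behaviour described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `struct sketch_inode`, `touch_mtime()` and `flush_if_dirty()` are hypothetical names, and the "disk" is just a second field plus a write counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical in-memory inode; field names are illustrative only. */
struct sketch_inode {
    time_t mem_mtime;   /* timestamp held in memory */
    time_t disk_mtime;  /* timestamp last written to the "disk" */
    bool dirty_time;    /* only timestamps differ (plays the role of I_DIRTY_TIME) */
    int disk_writes;    /* number of simulated inode-table writes */
};

/* Simulate writing the inode to its inode table block. */
static void write_inode(struct sketch_inode *ino)
{
    ino->disk_mtime = ino->mem_mtime;
    ino->dirty_time = false;
    ino->disk_writes++;
}

/* A timestamp-only update: with lazytime it touches memory only. */
static void touch_mtime(struct sketch_inode *ino, time_t now, bool lazytime)
{
    ino->mem_mtime = now;
    if (lazytime)
        ino->dirty_time = true;
    else
        write_inode(ino);
}

/* Called for fsync(), a non-time change, or eviction: flush pending times. */
static void flush_if_dirty(struct sketch_inode *ino)
{
    if (ino->dirty_time)
        write_inode(ino);
}
```

In this model, a run of timestamp updates under lazytime costs no writes until one of the flush conditions occurs, at which point a single write carries the latest timestamp.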
* fs: push sync_filesystem() down to the file system's remount_fs() | Theodore Ts'o | 2017-05-29 | 1 | -0/+2
    Previously, the no-op "mount -o remount /dev/xxx" operation, when the file system is already mounted read-write, caused an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly not documented or guaranteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only.

    However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs).

    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Artem Bityutskiy <dedekind1@gmail.com>
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Cc: Evgeniy Dushistov <dushistov@mail.ru>
    Cc: Jan Kara <jack@suse.cz>
    Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
    Cc: Anders Larsen <al@alarsen.net>
    Cc: Phillip Lougher <phillip@squashfs.org.uk>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
    Cc: Petr Vandrovec <petr@vandrovec.name>
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org
    Change-Id: Ie6fc68d845b0d327f56e4da91a8a9ba0673e5d5e
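The message suggests that most file systems only need to sync on a read-write to read-only transition. A minimal sketch of that decision follows; `SKETCH_RDONLY` and `remount_needs_sync()` are hypothetical names for illustration, and the commit itself only moves the existing call into each file system's remount_fs().

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_RDONLY 0x1u  /* hypothetical stand-in for the read-only mount flag */

/* Sync only when a read-write mount is being remounted read-only. */
static bool remount_needs_sync(unsigned old_flags, unsigned new_flags)
{
    bool was_rw = !(old_flags & SKETCH_RDONLY);
    bool now_ro = (new_flags & SKETCH_RDONLY) != 0;

    return was_rw && now_ro;
}
```

A file system that adopted this policy would skip the sync for a no-op remount of an already read-write mount.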
* fs: ext4: disable support for fallocate FALLOC_FL_PUNCH_HOLE | Nick Desaulniers | 2017-05-29 | 1 | -0/+7
    Bug: 28760453
    Change-Id: I019c2de559db9e4b95860ab852211b456d78c4ca
    Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
* ext4 crypto: use dget_parent() in ext4_d_revalidate() | Theodore Ts'o | 2017-05-29 | 1 | -4/+8
    This avoids potential problems caused by a race where the inode gets renamed out from its parent directory and the parent directory is deleted while ext4_d_revalidate() is running.

    Upstream commit: 3d43bcfef5f0548845a425365011c499875491b0
    Fixes: 28b4c263961c
    Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@vger.kernel.org
    Change-Id: Ia970597753fae0d67fa6eebb972de24d5c1194f8
* ext4 crypto: don't let data integrity writebacks fail with ENOMEM | Theodore Ts'o | 2017-05-29 | 4 | -19/+38
    We don't want the writeback triggered from the journal commit (in data=writeback mode) to cause the journal to abort due to generic_writepages() returning an ENOMEM error. In addition, if fsync() fails with ENOMEM, most applications will probably not do the right thing.

    So if we are doing a data integrity sync, and ext4_encrypt() returns ENOMEM, we will submit any queued I/O to date, and then retry the allocation using GFP_NOFAIL.

    Upstream commit: c9af28fdd44922a6c10c9f8315718408af98e315
    Google-Bug-Id: 27641567
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Change-Id: I55b6ab35c9ad4eb2ca6d06380755395f17525496
* ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea() | Theodore Ts'o | 2017-05-29 | 2 | -1/+4
    We aren't checking to see if the in-inode extended attribute is corrupted before we try to expand the inode's extra isize fields.

    This can lead to potential crashes caused by the BUG_ON() check in ext4_xattr_shift_entries().

    Upstream commit: 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Change-Id: Idd5c5eaaaf7e244e3d310fc528840c13ce4c44a4
* ext4 crypto: fix memleak in ext4_readdir() | Kirill Tkhai | 2017-05-29 | 1 | -2/+5
    When ext4_bread() fails, fname_crypto_str remains allocated after return. Fix that.

    Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
    Change-Id: Ie137cb7be090c52c65c65872035b537ece8c2f17
* ext4 crypto: revalidate dentry after adding or removing the key | Theodore Ts'o | 2017-05-29 | 4 | -0/+81
    Add a validation check for dentries in encrypted directories to make sure we're not caching stale data after a key has been added or removed.

    Also check to make sure that the status of the encryption key is updated when readdir(2) is executed.

    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
    Change-Id: Ic7a90d79d9447272fc512ae2abbd299523de02b8
* ext4 crypto: simplify interfaces to directory entry insert functions | Theodore Ts'o | 2017-05-29 | 3 | -17/+11
    A number of functions, including ext4_add_dx_entry, make_indexed_dir, etc., are being passed a dentry even though the only thing they use is the containing parent. We can shrink the code size slightly by making this replacement. This will also be useful in cases where we don't have a dentry as the argument to the directory entry insert functions.

    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Change-Id: I9267c577ab4d7d60e34cbf37c71eaf443e637c5f
* ext4 crypto: add missing locking for keyring_key access | Theodore Ts'o | 2017-05-29 | 1 | -0/+4
    Cc: stable@kernel.org
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Change-Id: Ia13629b6512a0c5dd2a09e7e3676c74af20c96a3
* ext4 crypto: exit cleanly if ext4_derive_key_aes() fails | Laurent Navet | 2017-05-29 | 1 | -0/+2
    The return value of ext4_derive_key_aes() is stored but not used. Add a test to exit cleanly if ext4_derive_key_aes() fails. Also fix coverity CID 1309760.

    Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Change-Id: I796cdfc65386e546f332a3dbbf9f2c2cd76e3301
* ext4 crypto: check for too-short encrypted file names | Theodore Ts'o | 2017-05-29 | 1 | -0/+4
    An encrypted file name should never be shorter than 16 bytes, the AES block size. The 3.10 crypto layer will oops and crash the kernel if ciphertext shorter than the block size is passed to it.

    Fortunately, in modern kernels the crypto layer will not crash the kernel in this scenario, but nevertheless, it represents a corrupted directory, and we should detect it and mark the file system as corrupted so that e2fsck can fix this.

    Change-Id: Ic42808e5161b22ff607689d3570be4d04e6158ed
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
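The length rule stated above amounts to a one-line check. The sketch below shows only that comparison; `SKETCH_AES_BLOCK_SIZE` and `encrypted_name_len_ok()` are hypothetical names, and the real patch additionally marks the file system as corrupted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_AES_BLOCK_SIZE 16  /* AES block size in bytes, per the commit message */

/* An on-disk encrypted name shorter than one AES block indicates corruption. */
static bool encrypted_name_len_ok(size_t ciphertext_len)
{
    return ciphertext_len >= SKETCH_AES_BLOCK_SIZE;
}
```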
* ext4 crypto: use a jbd2 transaction when adding a crypto policy | Theodore Ts'o | 2017-05-29 | 1 | -2/+15
    Start a jbd2 transaction, and mark the inode dirty under that transaction after setting the encrypt flag. Otherwise if the directory isn't modified after setting the crypto policy, the encrypted flag might not survive the inode getting pushed out from memory, or the file system getting unmounted and remounted.

    Change-Id: I5868e0531881922d8a5e68fa88b6cf2bb1675b99
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4: fix data corruption caused by unwritten and delayed extents | Lukas Czerner | 2017-05-29 | 2 | -0/+10
    Currently it is possible to lose a whole file system block's worth of data when we hit a specific interaction between unwritten and delayed extents in the extent status tree.

    The problem is that when we insert a delayed extent into the extent status tree, the only way to get rid of it is to write out the delayed buffer. However, there is a limitation in the extent status tree implementation: when inserting an unwritten extent, should there be even a single delayed block, the whole unwritten extent is marked as delayed.

    At this point, there is no way to get rid of the delayed extents, because there are no delayed buffers to write out. So when we write into said unwritten extent we will convert it to written, but it still remains delayed.

    When we try to write into that block later, ext4_da_map_blocks() will set the buffer new and delayed and map it to an invalid block, which causes the rest of the block to be zeroed, losing already written data.

    For now we can fix this by simply not allowing the delayed status to be set on a written extent in the extent status tree. Also add a WARN_ON() to make sure that we notice if this happens in the future.

    This problem can be easily reproduced by running the following xfs_io commands:

        xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
                  -c "falloc 0 131072" \
                  -c "pwrite -S 0xbb 65536 2048" \
                  -c "fsync" /mnt/test/fff

        echo 3 > /proc/sys/vm/drop_caches
        xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

    In theory this can also be reproduced at random by running fsx, but that is not very reliable; on machines with a bigger page size (like ppc) it is seen more often (especially with xfstest generic/127).

    Change-Id: I0ba800f68cf35a0137a5c5b0903017e08bc6f814
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Cc: stable@vger.kernel.org
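The fix described above, never letting a written extent carry the delayed status, can be sketched as a small filter on status bits. The flag names and `sanitize_status()` are hypothetical and do not correspond to the kernel's actual extent status definitions; the real patch also emits a WARN_ON().

```c
#include <assert.h>

/* Hypothetical status bits for an extent status tree entry. */
#define SKETCH_ES_WRITTEN   0x1u
#define SKETCH_ES_UNWRITTEN 0x2u
#define SKETCH_ES_DELAYED   0x4u

/* Return the status to store: a written extent never keeps DELAYED. */
static unsigned sanitize_status(unsigned status)
{
    if ((status & SKETCH_ES_WRITTEN) && (status & SKETCH_ES_DELAYED))
        status &= ~SKETCH_ES_DELAYED;
    return status;
}
```

Unwritten extents are left untouched in this sketch, since the commit only restricts the written case.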
* ext4 crypto: fix bugs in ext4_encrypted_zeroout() | Theodore Ts'o | 2017-05-29 | 2 | -4/+21
    Fix multiple bugs in ext4_encrypted_zeroout(), including one that could cause us to write an encrypted zero page to the wrong location on disk, potentially causing data and file system corruption. Fortunately, this tends to only show up in stress tests, but even with these fixes, we are seeing some test failures with generic/127 --- but these are now caused by data failures instead of metadata corruption.

    Since ext4_encrypted_zeroout() is only used for some optimizations to keep the extent tree from being too fragmented, and ext4_encrypted_zeroout() itself isn't all that optimized from a time or IOPS perspective, disable the extent tree optimization for encrypted inodes for now. This prevents the data corruption issues reported by generic/127 until we can figure out what's going wrong.

    Change-Id: I795e6b479c75f0f930bb47092720c4d7add538da
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Cc: stable@vger.kernel.org
* ext4 crypto: replace some BUG_ON()'s with error checks | Theodore Ts'o | 2017-05-29 | 4 | -7/+15
    Buggy (or hostile) userspace should not be able to cause the kernel to crash.

    Change-Id: I67f7b32dd458d577b506ddff6ef07955e804e3ff
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Cc: stable@vger.kernel.org
* ext4 crypto: ext4_page_crypto() doesn't need a encryption context | Theodore Ts'o | 2017-05-29 | 4 | -28/+9
    Since ext4_page_crypto() doesn't need an encryption context (at least not any more), this allows us to simplify a number of function signatures and also allows us to avoid needing to allocate a context in ext4_block_write_begin(). It also means we no longer need a separate ext4_decrypt_one() function.

    Change-Id: I2f83f5745487ef85312bf8469a6b2a190545a5e4
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
* ext4 crypto: fix memory leak in ext4_bio_write_page() | Theodore Ts'o | 2017-05-29 | 1 | -1/+4
    There are times when ext4_bio_write_page() is called even though we don't actually need to do any I/O. This happens when ext4_writepage() gets called by the jbd2 commit path when an inode needs to force its pages written out in order to provide data=ordered guarantees --- and a page is backed by an unwritten (e.g., uninitialized) block on disk, or if delayed allocation means the page's backing store hasn't been allocated yet. In that case, we need to skip the call to ext4_encrypt_page(), since in addition to wasting CPU, it leads to a bounce page and an ext4 crypto context getting leaked.

    Change-Id: Icd2123808fd7372c11e6f9e17849e242837d729d
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
    Cc: stable@vger.kernel.org
* ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea() | Theodore Ts'o | 2017-05-27 | 1 | -4/+28
    commit 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35 upstream.

    We aren't checking to see if the in-inode extended attribute is corrupted before we try to expand the inode's extra isize fields.

    This can lead to potential crashes caused by the BUG_ON() check in ext4_xattr_shift_entries().

    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: Julia Lawall <julia.lawall@lip6.fr>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* UPSTREAM: ext4: fix fencepost in s_first_meta_bg validation | Theodore Ts'o | 2017-05-27 | 1 | -1/+1
    (cherry-picked from commit 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2)

    It is OK for s_first_meta_bg to be equal to the number of block group descriptor blocks. (It rarely happens, but it shouldn't cause any problems.)

    https://bugzilla.kernel.org/show_bug.cgi?id=194567

    Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Cc: stable@vger.kernel.org
    Change-Id: Ib414feb50f88dcd42dc846429b81df6c72b28136
* BACKPORT: ext4: validate s_first_meta_bg at mount time | Eryu Guan | 2017-05-27 | 1 | -0/+9
    (Cherry-picked from commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe)

    Ralf Spenneberg reported that he hit a kernel crash when mounting a modified ext4 image. It turns out that the kernel crashed when calculating fs overhead (ext4_calculate_overhead()). This is because the image has a very large s_first_meta_bg (debug code shows it's 842150400), and ext4 overruns the memory in count_overhead() when setting the bitmap buffer, which is PAGE_SIZE.

        ext4_calculate_overhead():
          buf = get_zeroed_page(GFP_NOFS);  <=== PAGE_SIZE buffer
          blks = count_overhead(sb, i, buf);

        count_overhead():
          for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
                  ext4_set_bit(EXT4_B2C(sbi, s++), buf);   <=== buffer overrun
                  count++;
          }

    This can be reproduced easily for me by this script:

        #!/bin/bash
        rm -f fs.img
        mkdir -p /mnt/ext4
        fallocate -l 16M fs.img
        mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
        debugfs -w -R "ssv first_meta_bg 842150400" fs.img
        mount -o loop fs.img /mnt/ext4

    Fix it by validating s_first_meta_bg first at mount time, and refusing to mount if its value exceeds the largest possible meta_bg number.

    Reported-by: Ralf Spenneberg <ralf@os-t.de>
    Signed-off-by: Eryu Guan <guaneryu@gmail.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Andreas Dilger <adilger@dilger.ca>
    Change-Id: I252fda33d116b044a3e710b79bdd0c7ce2870145
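The mount-time validation described above can be sketched in userspace as a bounds check against the number of group descriptor blocks. The function names and parameters are hypothetical; equality is accepted here, in line with the fencepost fix listed earlier in this log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of group descriptor blocks needed for `groups` block groups. */
static uint64_t desc_block_count(uint32_t groups, uint32_t desc_per_block)
{
    assert(desc_per_block > 0);
    return ((uint64_t)groups + desc_per_block - 1) / desc_per_block;
}

/* Refuse a first_meta_bg that exceeds the number of descriptor blocks. */
static bool first_meta_bg_valid(uint32_t first_meta_bg, uint32_t groups,
                                uint32_t desc_per_block)
{
    return first_meta_bg <= desc_block_count(groups, desc_per_block);
}
```

With a check of this kind, the value 842150400 from the report is rejected before any code uses it as a loop bound.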
* ANDROID: ext4 crypto: Disables zeroing on truncation when there's no key | Michael Halcrow | 2017-05-27 | 1 | -0/+5
    When performing orphan cleanup on mount, ext4 may truncate pages. Truncation as currently implemented may require the encryption key for partial zeroing, and the key isn't necessarily available on mount. Since the userspace tools don't perform the partial zeroing operation anyway, let's just skip doing that in the kernel.

    This patch fixes a BUG_ON() oops.

    Bug: 35209576
    Change-Id: I2527a3f8d2c57d2de5df03fda69ee397f76095d7
    Signed-off-by: Michael Halcrow <mhalcrow@google.com>
* ext4 crypto: fix return value for ext4_es_scan() | Theodore Ts'o | 2017-05-27 | 1 | -1/+1
    Between 3.10 and 3.18, the abstraction to scan for objects in the slab cache which can be freed when the system is under memory pressure changed. When I backported the ext4 code from 3.18 to the 3.10 kernel, I didn't get the return value required by the calling conventions for the scan function correct, which could potentially cause the memory reclaimer to loop indefinitely.

    Change-Id: I1712fedf96fd91c911fb4d019d7ef76f6c4c1808
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
* ext4 crypto: allocate bounce pages using GFP_NOWAIT | Theodore Ts'o | 2017-05-27 | 2 | -23/+7
    Previously we allocated bounce pages using a combination of alloc_page() and mempool_alloc() with the __GFP_WAIT bit set. Instead, use mempool_alloc() with GFP_NOWAIT. The mempool_alloc() function will try using alloc_pages() initially, and then only use the mempool reserve of pages if alloc_pages() is unable to fulfill the request.

    This minimizes the impact on the mm layer when we need to do a large amount of writeback of encrypted files, as Jaeguk Kim had reported that under a heavy fio workload on a system with a restricted amount of memory (which, unfortunately, includes many mobile handsets), he had observed the OOM killer getting triggered several times.

    If the mempool_alloc() function fails when using GFP_NOWAIT, we will retry the page writeback at a later time; the function of the mempool is to ensure that we can write back at least 32 pages at a time, so we can more efficiently dispatch I/O under high memory pressure situations. In the future we should make this be a tunable so we can determine the best tradeoff between permanently sequestering memory and the ability to quickly launder pages so we can free up memory quickly when necessary.

    Change-Id: I3dbb5eb9a3aa04f40e551338eee5e8d06f352fe8
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
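The allocation order described above (ordinary allocator first, reserve second, give up without blocking) can be modelled in userspace as follows. This is a sketch under assumptions: `struct sketch_pool`, `pool_alloc_nowait()` and `always_fail()` are hypothetical names, and the kernel's mempool has more machinery than this model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_RESERVE 32  /* reserve size mentioned in the commit message */

/* Hypothetical pool: a stack of pre-allocated pages kept for emergencies. */
struct sketch_pool {
    void *reserve[POOL_RESERVE];
    int avail;
};

typedef void *(*alloc_fn)(size_t);

/* Allocator stub that simulates memory pressure by always failing. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Try the ordinary allocator first and dip into the reserve only if it
 * fails. Returns NULL when both are exhausted, so the caller can retry
 * the writeback later instead of blocking.
 */
static void *pool_alloc_nowait(struct sketch_pool *p, alloc_fn try_alloc,
                               size_t size)
{
    void *page = try_alloc(size);

    if (page)
        return page;
    if (p->avail > 0)
        return p->reserve[--p->avail];
    return NULL;
}
```

The design point is that the reserve is consumed only under pressure, so normal operation does not drain it.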
* ext4 crypto: release crypto resource on module exit | Chao Yu | 2017-05-27 | 1 | -0/+1
    Crypto resources should be released when the ext4 module exits; otherwise they will cause a memory leak.

    Change-Id: Ie298e73bd766768707a7af440691ce2f418f5acc
    Signed-off-by: Chao Yu <chao2.yu@samsung.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4 crypto: handle unexpected lack of encryption keys | Theodore Ts'o | 2017-05-27 | 3 | -9/+14
    Fix up attempts by users to try to write to a file when they don't have access to the encryption key.

    Change-Id: Iabdd438b26b409eaccf9c847fcf9c3ab52f1959e
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4 crypto: allocate the right amount of memory for the on-disk symlink | Theodore Ts'o | 2017-05-27 | 3 | -21/+37
    Previously we were not taking the required padding into account when allocating space for the on-disk symlink. This caused a buffer overrun which could trigger a kernel crash when running fsstress.

    Change-Id: I4e05ff207748192036de58bc5af91ae4c357b5b4
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: clean up error handling in ext4_fname_setup_filename | Theodore Ts'o | 2017-05-27 | 1 | -19/+16
    Fix a potential memory leak where fname->crypto_buf.name wouldn't get freed in some error paths, and also make the error handling easier to understand/audit.

    Change-Id: I251041ff2df61dcc2a818539783cfc0de2e2933a
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: policies may only be set on directories | Theodore Ts'o | 2017-05-27 | 1 | -0/+2
    Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out we were missing this check.

    Change-Id: I823edbeddf6cc5086e4d17262d7c497368b1acb7
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: enforce crypto policy restrictions on cross-renames | Theodore Ts'o | 2017-05-27 | 1 | -0/+9
    Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out the need for this check.

    Change-Id: I957a4e4be043582972d3c8799f18826fc136d567
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: encrypt tmpfile located in encryption protected directory | Theodore Ts'o | 2017-05-27 | 3 | -34/+30
    Factor out calls to ext4_inherit_context() and move them to __ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't calling ext4_inherit_context(), so the temporary file wasn't getting protected. Since the blocks for the tmpfile could end up on disk, they really should be protected if the tmpfile is created within the context of an encrypted directory.

    Change-Id: I05e04109aa38878aba970d537de0316326a96fe1
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: make sure the encryption info is initialized on opendir(2) | Theodore Ts'o | 2017-05-27 | 1 | -0/+8
    Change-Id: Ie78f2f807c0b3bc5959d2b601f18826f2658984d
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: set up encryption info for new inodes in ext4_inherit_context() | Theodore Ts'o | 2017-05-27 | 1 | -0/+1
    Set up the encryption information for newly created inodes immediately after they inherit their encryption context from their parent directories.

    Change-Id: Ie2a48cde918eaf8ad978a8a698de24627b363955
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: fix memory leaks in ext4_encrypted_zeroout | Theodore Ts'o | 2017-05-27 | 1 | -31/+31
    ext4_encrypted_zeroout() could end up leaking a bio and bounce page. Fortunately it's not used much. While we're fixing things up, refactor out common code into the static function alloc_bounce_page().

    Change-Id: I44023c01de7ec97ad43bfa85cd7d3b97b22ee0c0
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: use per-inode tfm structure | Theodore Ts'o | 2017-05-27 | 9 | -156/+96
    As suggested by Herbert Xu, we shouldn't allocate a new tfm each time we read or write a page. Instead we can use a single tfm hanging off the inode's crypt_info structure for all of our encryption needs for that inode, since the tfm can be used by multiple crypto requests in parallel.

    Also use cmpxchg() to avoid races that could result in the crypt_info structure getting doubly allocated or doubly freed.

    Change-Id: I4ae5c07d0e5d99ec1e26eeb49d833c4a284d9a5f
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
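The compare-and-exchange pattern mentioned above can be sketched in userspace with C11 atomics, standing in for the kernel's cmpxchg(). `struct sketch_crypt_info` and `install_once()` are hypothetical names; the sketch covers only the "install exactly once" half of the pattern.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-inode crypto state. */
struct sketch_crypt_info {
    int key_id;
};

/*
 * Publish `fresh` only if the slot is still empty. If another caller
 * installed a structure first, free ours and return the winner, so the
 * slot is never allocated twice.
 */
static struct sketch_crypt_info *
install_once(_Atomic(struct sketch_crypt_info *) *slot,
             struct sketch_crypt_info *fresh)
{
    struct sketch_crypt_info *expected = NULL;

    if (atomic_compare_exchange_strong(slot, &expected, fresh))
        return fresh;
    free(fresh);
    return expected;  /* the value installed by whoever won the race */
}
```

Each racing caller allocates speculatively, but only one allocation survives; the losers free their copy and share the winner's.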
* ext4 crypto: require CONFIG_CRYPTO_CTR if ext4 encryption is enabled | Theodore Ts'o | 2017-05-27 | 1 | -0/+1
    On arm64 this is apparently needed for CTS mode to function correctly. Otherwise attempts to use CTS return ENOENT.

    Change-Id: I3f597f5f88e806dbeed75a7123c3d6bb7e608350
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
* ext4 crypto: shrink size of the ext4_crypto_ctx structure | Theodore Ts'o | 2017-05-27 | 4 | -34/+30
    Some fields are only used when the crypto_ctx is being used on the read path, some are only used on the write path, and some are only used when the structure is on the free list. Optimize memory use by using a union.

    Change-Id: I66de766a0f1122463edf3280ff0c2923be2472b8
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
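The space saving from overlapping mutually exclusive fields can be seen in a small standalone comparison. All structure and field names below are hypothetical and are not the actual layout of ext4_crypto_ctx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-state fields; only one group is live at any time. */
struct sketch_read_state  { void *bio; void *work; };
struct sketch_write_state { void *bounce_page; void *control_page; };
struct sketch_free_state  { void *next; void *prev; };

/* Layout with every group always present. */
struct ctx_flat {
    struct sketch_read_state r;
    struct sketch_write_state w;
    struct sketch_free_state f;
    unsigned flags;
};

/* Layout where the three groups share the same storage. */
struct ctx_union {
    union {
        struct sketch_read_state r;
        struct sketch_write_state w;
        struct sketch_free_state f;
    } u;
    unsigned flags;
};
```

The union version is only safe because a context is never on the read path, the write path, and the free list at the same time.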
* ext4 crypto: get rid of ci_mode from struct ext4_crypt_info | Theodore Ts'o | 2017-05-27 | 4 | -15/+12
    The ci_mode field was superfluous, and getting rid of it gets rid of an unused hole in the structure.

    Change-Id: I0f4c38a1162fa9c6da8a3529b7477ff5560c21df
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
* ext4 crypto: use slab caches | Theodore Ts'o | 2017-05-27 | 3 | -34/+39
    Use slab caches for the ext4_crypto_ctx and ext4_crypt_info structures for slightly better memory efficiency and debuggability.

    Change-Id: If47986e2e29fa181d113864dcd9d1cae79c72639
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
* ext4: clean up superblock encryption mode fields | Theodore Ts'o | 2017-05-27 | 4 | -32/+7
    The superblock fields s_file_encryption_mode and s_dir_encryption_mode are vestigial, so remove them as a cleanup. While we're at it, allow file systems with both encryption and inline_data enabled at the same time to work correctly. We can't have encrypted inodes with inline data, but there's no reason to prohibit unencrypted inodes from using the inline data feature.

    Change-Id: Ia90b7e24bcf9ebabef529b710d70bd8ba71a17a4
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: "Theodore Ts'o" <tytso@google.com>
* ext4 crypto: reorganize how we store keys in the inode | Theodore Ts'o | 2017-05-27 | 11 | -346/+246
    This is a pretty massive patch which does a number of different things:

    1) The per-inode encryption information is now stored in an allocated data structure, ext4_crypt_info, instead of directly in the inode. This reduces the size usage of an in-memory inode when it is not using encryption.

    2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode encryption structure instead. This removes an unnecessary memory allocation and free for the fname_crypto_ctx as well as allowing us to reuse the ctfm in a directory for multiple lookups and file creations.

    3) We also cache the inode's policy information in the ext4_crypt_info structure so we don't have to continually read it out of the extended attributes.

    4) We now keep the keyring key in the inode's encryption structure instead of releasing it after we are done using it to derive the per-inode key. This allows us to test to see if the key has been revoked; if it has, we prevent the use of the derived key and free it.

    5) When an inode is released (or when the derived key is freed), we will use memset_explicit() to zero out the derived key, so it's not left hanging around in memory. This implies that when a user logs out, it is important to first revoke the key, and then unlink it, and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to release any decrypted pages and dcache entries from the system caches.

    6) All this, and we also shrink the number of lines of code by around 100. :-)

    Change-Id: I948f7844d425c0ce616f800446ecb0b6bea686f8
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: separate kernel and userspace structure for the key | Theodore Ts'o | 2017-05-27 | 6 | -48/+43
    Use struct ext4_encryption_key only for the master key passed via the kernel keyring.

    For internal kernel space users, we now use struct ext4_crypt_info. This will allow us to put information from the policy structure so we can cache it and avoid needing to constantly look up the extended attribute. We will do this in a separate patch. This patch is mostly mechanical to make it easier for patch review.

    Change-Id: I208472675d0550df5f60b3b58652a9a1b434caed
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: don't allocate a page when encrypting/decrypting file names | Theodore Ts'o | 2017-05-27 | 5 | -54/+28
    Change-Id: Ib0deff3a9aff318d8f2be6b4a550168d4771ccc2
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: optimize filename encryption | Theodore Ts'o | 2017-05-27 | 4 | -313/+230
    Encrypt the filename as soon as it is passed in by the user. This avoids our needing to encrypt the filename 2 or 3 times while in the process of creating a filename.

    Similarly, when looking up a directory entry, encrypt the filename early, or if the encryption key is not available, base-64 decode the file name so that the hash value and the last 16 bytes of the encrypted filename are available in the new struct ext4_filename data structure.

    Change-Id: Ia76a5e51770840c57a53180cd89476f2e9b8c966
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: add padding to filenames before encrypting | Theodore Ts'o | 2017-05-27 | 5 | -8/+31
    This obscures the length of the filenames, to decrease the amount of information leakage. By default, we pad the filenames to the next 4-byte boundary. This costs nothing, since the directory entries are aligned to 4-byte boundaries anyway. Filenames can also be padded to 8, 16, or 32 bytes, which will consume more directory space.

    Change-Id: I2d4ab2b76797ab93fada683f405e3876e0cff9dc
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
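The padding rule above is a round-up to the next multiple of the chosen boundary. The sketch below shows only that arithmetic; `padded_name_len()` is a hypothetical helper, and any other length constraints the real code applies are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Round `len` up to the next multiple of `pad`, where pad is 4, 8, 16, or 32. */
static size_t padded_name_len(size_t len, size_t pad)
{
    assert(pad == 4 || pad == 8 || pad == 16 || pad == 32);
    return (len + pad - 1) / pad * pad;
}
```

For example, a 5-byte name pads to 8 bytes with the default 4-byte boundary, while a 32-byte boundary hides more of the length at the cost of more directory space.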
* ext4 crypto: simplify and speed up filename encryption | Theodore Ts'o | 2017-05-27 | 5 | -204/+149
    Avoid using SHA-1 when calculating the user-visible filename when the encryption key is available, and avoid decrypting lots of filenames when searching for a directory entry in a directory block.

    Change-Id: Ifff4c07a80740112e2e984d2da3105e2fe41ab68
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: sync up the mainline 4.1-rc1 merge | Theodore Ts'o | 2017-05-27 | 5 | -15/+16
    There were some last minute changes that weren't reflected in the ext4 crypto patches that we were syncing with flounder. They were mostly whitespace changes, plus an error-handling bugfix for normal (non-crypto-related) errors when adding a directory entry to an inode while creating a file.

    Change-Id: I01e1f8ee07aef2f826a27efcbfa85a825000f2bc
    Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4: make fsync to sync parent dir in no-journal for real this time | Theodore Ts'o | 2017-05-27 | 1 | -9/+11
    (cherry picked from commit e12fb97222fc41e8442896934f76d39ef99b590a)
    (needed to avoid patch conflicts with further ext4 crypto patches)

    Previously commit 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c added support for syncing the parent directory of newly created inodes to make sure that the inode is not lost after a power failure in no-journal mode.

    However, this does not work in the majority of cases, namely:
    - if the directory has inline data
    - if the directory is already indexed
    - if the directory already has at least one block and:
        - the new entry fits into it
        - or we've successfully converted it to indexed

    So in those cases we might lose the inode entirely even after fsync in the no-journal mode. This also includes ext2 default mode, obviously.

    I noticed this while running xfstest generic/321; even though the test should fail (we need to run fsck after a crash in no-journal mode), I could not find newly created entries even when they had been fsynced before.

    Fix this by adjusting the ext4_add_entry() successful exit paths to set the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the parent directory as well.

    Change-Id: I742fb1c5304986cb990352a2471186bcd2c77ceb
    Signed-off-by: Lukas Czerner <lczerner@redhat.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Reviewed-by: Jan Kara <jack@suse.cz>
    Cc: Frank Mayhar <fmayhar@google.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Theodore Ts'o <tytso@google.com>
* ext4 crypto: enable encryption feature flag | Theodore Ts'o | 2017-05-27 | 6 | -24/+79
    Also add the test dummy encryption mode flag so we can more easily test the encryption patches using xfstests.

    Change-Id: Iaae44110ab5870e5da60aca76197828f0ebc139b
    Signed-off-by: Michael Halcrow <mhalcrow@google.com>
    Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    Signed-off-by: Theodore Ts'o <tytso@google.com>