| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add an optimization for the MS_LAZYTIME mount option so that we will
opportunistically write out any inodes with the I_DIRTY_TIME flag set
in a particular inode table block when we need to update some inode in
that inode table block anyway.
Also add some temporary code so that we can set the lazytime mount
option without needing a modified /sbin/mount program which can set
MS_LAZYTIME. We can eventually make this go away once util-linux has
added support.
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new mount option which enables a new "lazytime" mode. This mode
causes atime, mtime, and ctime updates to only be made to the
in-memory version of the inode. The on-disk times will only get
updated when (a) if the inode needs to be updated for some non-time
related change, (b) if userspace calls fsync(), syncfs() or sync(), or
(c) just before an undeleted inode is evicted from memory.
This is OK according to POSIX because there are no guarantees after a
crash unless userspace explicitly requests via a fsync(2) call.
For workloads which feature a large number of random write to a
preallocated file, the lazytime mount option significantly reduces
writes to the inode table. The repeated 4k writes to a single block
will result in undesirable stress on flash devices and SMR disk
drives. Even on conventional HDD's, the repeated writes to the inode
table block will trigger Adjacent Track Interference (ATI) remediation
latencies, which very negatively impact long tail latencies --- which
is a very big deal for web serving tiers (for example).
Google-Bug-Id: 18297052
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the no-op "mount -o mount /dev/xxx" operation when the
file system is already mounted read-write causes an implied,
unconditional syncfs(). This seems pretty stupid, and it's certainly
documented or guaraunteed to do this, nor is it particularly useful,
except in the case where the file system was mounted rw and is getting
remounted read-only.
However, it's possible that there might be some file systems that are
actually depending on this behavior. In most file systems, it's
probably fine to only call sync_filesystem() when transitioning from
read-write to read-only, and there are some file systems where this is
not needed at all (for example, for a pseudo-filesystem or something
like romfs).
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Artem Bityutskiy <dedekind1@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Jan Kara <jack@suse.cz>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Anders Larsen <al@alarsen.net>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: fuse-devel@lists.sourceforge.net
Cc: cluster-devel@redhat.com
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: linux-nilfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org
Change-Id: Ie6fc68d845b0d327f56e4da91a8a9ba0673e5d5e
|
| |
|
|
|
|
| |
Bug: 28760453
Change-Id: I019c2de559db9e4b95860ab852211b456d78c4ca
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
This avoids potential problems caused by a race where the inode gets
renamed out from its parent directory and the parent directory is
deleted while ext4_d_revalidate() is running.
Upstream commit: 3d43bcfef5f0548845a425365011c499875491b0
Fixes: 28b4c263961c
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Change-Id: Ia970597753fae0d67fa6eebb972de24d5c1194f8
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We don't want the writeback triggered from the journal commit (in
data=writeback mode) to cause the journal to abort due to
generic_writepages() returning an ENOMEM error. In addition, if
fsync() fails with ENOMEM, most applications will probably not do the
right thing.
So if we are doing a data integrity sync, and ext4_encrypt() returns
ENOMEM, we will submit any queued I/O to date, and then retry the
allocation using GFP_NOFAIL.
Upstream commit: c9af28fdd44922a6c10c9f8315718408af98e315
Google-Bug-Id: 27641567
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Change-Id: I55b6ab35c9ad4eb2ca6d06380755395f17525496
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
We aren't checking to see if the in-inode extended attribute is
corrupted before we try to expand the inode's extra isize fields.
This can lead to potential crashes caused by the BUG_ON() check in
ext4_xattr_shift_entries().
Upstream commit: 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Change-Id: Idd5c5eaaaf7e244e3d310fc528840c13ce4c44a4
|
| |
|
|
|
|
|
|
|
|
|
| |
When ext4_bread() fails, fname_crypto_str remains
allocated after return. Fix that.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@google.com>
Change-Id: Ie137cb7be090c52c65c65872035b537ece8c2f17
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Add a validation check for dentries for encrypted directory to make
sure we're not caching stale data after a key has been added or removed.
Also check to make sure that status of the encryption key is updated
when readdir(2) is executed.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
Change-Id: Ic7a90d79d9447272fc512ae2abbd299523de02b8
|
| |
|
|
|
|
|
|
|
|
|
|
| |
A number of functions include ext4_add_dx_entry, make_indexed_dir,
etc. are being passed a dentry even though the only thing they use is
the containing parent. We can shrink the code size slightly by maing
this replacement. This will also be useful in cases where we don't
have a dentry as the argument to the directory entry insert functions.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Change-Id: I9267c577ab4d7d60e34cbf37c71eaf443e637c5f
|
| |
|
|
|
|
|
| |
Cc: stable@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Change-Id: Ia13629b6512a0c5dd2a09e7e3676c74af20c96a3
|
| |
|
|
|
|
|
|
|
|
|
| |
Return value of ext4_derive_key_aes() is stored but not used.
Add test to exit cleanly if ext4_derive_key_aes() fail.
Also fix coverity CID 1309760.
Signed-off-by: Laurent Navet <laurent.navet@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Change-Id: I796cdfc65386e546f332a3dbbf9f2c2cd76e3301
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An encrypted file name should never be shorter than an 16 bytes, the
AES block size. The 3.10 crypto layer will oops and crash the kernel
if ciphertext shorter than the block size is passed to it.
Fortunately, in modern kernels the crypto layer will not crash the
kernel in this scenario, but nevertheless, it represents a corrupted
directory, and we should detect it and mark the file system as
corrupted so that e2fsck can fix this.
Change-Id: Ic42808e5161b22ff607689d3570be4d04e6158ed
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Start a jbd2 transaction, and mark the inode dirty on the inode under
that transaction after setting the encrypt flag. Otherwise if the
directory isn't modified after setting the crypto policy, the
encrypted flag might not survive the inode getting pushed out from
memory, or the the file system getting unmounted and remounted.
Change-Id: I5868e0531881922d8a5e68fa88b6cf2bb1675b99
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently it is possible to lose whole file system block worth of data
when we hit the specific interaction with unwritten and delayed extents
in status extent tree.
The problem is that when we insert delayed extent into extent status
tree the only way to get rid of it is when we write out delayed buffer.
However there is a limitation in the extent status tree implementation
so that when inserting unwritten extent should there be even a single
delayed block the whole unwritten extent would be marked as delayed.
At this point, there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when a we write
into said unwritten extent we will convert it to written, but it still
remains delayed.
When we try to write into that block later ext4_da_map_blocks() will set
the buffer new and delayed and map it to invalid block which causes
the rest of the block to be zeroed loosing already written data.
For now we can fix this by simply not allowing to set delayed status on
written extent in the extent status tree. Also add WARN_ON() to make
sure that we notice if this happens in the future.
This problem can be easily reproduced by running the following xfs_io.
xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
-c "falloc 0 131072" \
-c "pwrite -S 0xbb 65536 2048" \
-c "fsync" /mnt/test/fff
echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
This can be theoretically also reproduced by at random by running fsx,
but it's not very reliable, though on machines with bigger page size
(like ppc) this can be seen more often (especially xfstest generic/127)
Change-Id: I0ba800f68cf35a0137a5c5b0903017e08bc6f814
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Cc: stable@vger.kernel.org
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix multiple bugs in ext4_encrypted_zeroout(), including one that
could cause us to write an encrypted zero page to the wrong location
on disk, potentially causing data and file system corruption.
Fortunately, this tends to only show up in stress tests, but even with
these fixes, we are seeing some test failures with generic/127 --- but
these are now caused by data failures instead of metadata corruption.
Since ext4_encrypted_zeroout() is only used for some optimizations to
keep the extent tree from being too fragmented, and
ext4_encrypted_zeroout() itself isn't all that optimized from a time
or IOPS perspective, disable the extent tree optimization for
encrypted inodes for now. This prevents the data corruption issues
reported by generic/127 until we can figure out what's going wrong.
Change-Id: I795e6b479c75f0f930bb47092720c4d7add538da
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Cc: stable@vger.kernel.org
|
| |
|
|
|
|
|
|
|
|
| |
Buggy (or hostile) userspace should not be able to cause the kernel to
crash.
Change-Id: I67f7b32dd458d577b506ddff6ef07955e804e3ff
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Cc: stable@vger.kernel.org
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Since ext4_page_crypto() doesn't need an encryption context (at least
not any more), this allows us to simplify a number function signature
and also allows us to avoid needing to allocate a context in
ext4_block_write_begin(). It also means we no longer need a separate
ext4_decrypt_one() function.
Change-Id: I2f83f5745487ef85312bf8469a6b2a190545a5e4
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There are times when ext4_bio_write_page() is called even though we
don't actually need to do any I/O. This happens when ext4_writepage()
gets called by the jbd2 commit path when an inode needs to force its
pages written out in order to provide data=ordered guarantees --- and
a page is backed by an unwritten (e.g., uninitialized) block on disk,
or if delayed allocation means the page's backing store hasn't been
allocated yet. In that case, we need to skip the call to
ext4_encrypt_page(), since in addition to wasting CPU, it leads to a
bounce page and an ext4 crypto context getting leaked.
Change-Id: Icd2123808fd7372c11e6f9e17849e242837d729d
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
Cc: stable@vger.kernel.org
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit 9e92f48c34eb2b9af9d12f892e2fe1fce5e8ce35 upstream.
We aren't checking to see if the in-inode extended attribute is
corrupted before we try to expand the inode's extra isize fields.
This can lead to potential crashes caused by the BUG_ON() check in
ext4_xattr_shift_entries().
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(cherry-picked from commit 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2)
It is OK for s_first_meta_bg to be equal to the number of block group
descriptor blocks. (It rarely happens, but it shouldn't cause any
problems.)
https://bugzilla.kernel.org/show_bug.cgi?id=194567
Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
Change-Id: Ib414feb50f88dcd42dc846429b81df6c72b28136
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(Cherry-picked from commit 3a4b77cd47bb837b8557595ec7425f281f2ca1fe)
Ralf Spenneberg reported that he hit a kernel crash when mounting a
modified ext4 image. And it turns out that kernel crashed when
calculating fs overhead (ext4_calculate_overhead()), this is because
the image has very large s_first_meta_bg (debug code shows it's
842150400), and ext4 overruns the memory in count_overhead() when
setting bitmap buffer, which is PAGE_SIZE.
ext4_calculate_overhead():
buf = get_zeroed_page(GFP_NOFS); <=== PAGE_SIZE buffer
blks = count_overhead(sb, i, buf);
count_overhead():
for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400
ext4_set_bit(EXT4_B2C(sbi, s++), buf); <=== buffer overrun
count++;
}
This can be reproduced easily for me by this script:
#!/bin/bash
rm -f fs.img
mkdir -p /mnt/ext4
fallocate -l 16M fs.img
mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img
debugfs -w -R "ssv first_meta_bg 842150400" fs.img
mount -o loop fs.img /mnt/ext4
Fix it by validating s_first_meta_bg first at mount time, and
refusing to mount if its value exceeds the largest possible meta_bg
number.
Reported-by: Ralf Spenneberg <ralf@os-t.de>
Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Change-Id: I252fda33d116b044a3e710b79bdd0c7ce2870145
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
When performing orphan cleanup on mount, ext4 may truncate pages.
Truncation as currently implemented may require the encryption key for
partial zeroing, and the key isn't necessarily available on mount.
Since the userspace tools don't perform the partial zeroing operation
anyway, let's just skip doing that in the kernel.
This patch fixes a BUG_ON() oops.
Bug: 35209576
Change-Id: I2527a3f8d2c57d2de5df03fda69ee397f76095d7
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Between 3.10 and 3.18, the abstraction to scan for objects in the slab
cache which can be freed when the system is under memory pressure
changed. When I backported the ext4 code from 3.18 to the 3.10
kernel, I didn't get the return value required by the calling
conventions for the scan function correct, which could potentially
cause the memory reclaimer to loop indefinitely.
Change-Id: I1712fedf96fd91c911fb4d019d7ef76f6c4c1808
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we allocated bounce pages using a combination of
alloc_page() and mempool_alloc() with the __GFP_WAIT bit set.
Instead, use mempool_alloc() with GFP_NOWAIT. The mempool_alloc()
function will try using alloc_pages() initially, and then only use the
mempool reserve of pages if alloc_pages() is unable to fulfill the
request.
This minimizes the the impact on the mm layer when we need to do a
large amount of writeback of encrypted files, as Jaeguk Kim had
reported that under a heavy fio workload on a system with restricted
amounts memory (which unfortunately, includes many mobile handsets),
he had observed the the OOM killer getting triggered several times.
Using GFP_NOWAIT
If the mempool_alloc() function fails, we will retry the page
writeback at a later time; the function of the mempool is to ensure
that we can writeback at least 32 pages at a time, so we can more
efficiently dispatch I/O under high memory pressure situations. In
the future we should make this be a tunable so we can determine the
best tradeoff between permanently sequestering memory and the ability
to quickly launder pages so we can free up memory quickly when
necessary.
Change-Id: I3dbb5eb9a3aa04f40e551338eee5e8d06f352fe8
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
|
|
|
|
|
|
|
| |
Crypto resource should be released when ext4 module exits, otherwise
it will cause memory leak.
Change-Id: Ie298e73bd766768707a7af440691ce2f418f5acc
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
|
|
|
|
|
|
| |
Fix up attempts by users to try to write to a file when they don't
have access to the encryption key.
Change-Id: Iabdd438b26b409eaccf9c847fcf9c3ab52f1959e
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |
|
|
|
|
|
|
|
|
| |
Previously we were taking the required padding when allocating space
for the on-disk symlink. This caused a buffer overrun which could
trigger a krenel crash when running fsstress.
Change-Id: I4e05ff207748192036de58bc5af91ae4c357b5b4
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
| |
Fix a potential memory leak where fname->crypto_buf.name wouldn't get
freed in some error paths, and also make the error handling easier to
understand/audit.
Change-Id: I251041ff2df61dcc2a818539783cfc0de2e2933a
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
| |
Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out we were
missing this check.
Change-Id: I823edbeddf6cc5086e4d17262d7c497368b1acb7
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
| |
Thanks to Chao Yu <chao2.yu@samsung.com> for pointing out the need for
this check.
Change-Id: I957a4e4be043582972d3c8799f18826fc136d567
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Factor out calls to ext4_inherit_context() and move them to
__ext4_new_inode(); this fixes a problem where ext4_tmpfile() wasn't
calling calling ext4_inherit_context(), so the temporary file wasn't
getting protected. Since the blocks for the tmpfile could end up on
disk, they really should be protected if the tmpfile is created within
the context of an encrypted directory.
Change-Id: I05e04109aa38878aba970d537de0316326a96fe1
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
| |
Change-Id: Ie78f2f807c0b3bc5959d2b601f18826f2658984d
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
| |
Set up the encryption information for newly created inodes immediately
after they inherit their encryption context from their parent
directories.
Change-Id: Ie2a48cde918eaf8ad978a8a698de24627b363955
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
| |
ext4_encrypted_zeroout() could end up leaking a bio and bounce page.
Fortunately it's not used much. While we're fixing things up,
refactor out common code into the static function alloc_bounce_page().
Change-Id: I44023c01de7ec97ad43bfa85cd7d3b97b22ee0c0
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As suggested by Herbert Xu, we shouldn't allocate a new tfm each time
we read or write a page. Instead we can use a single tfm hanging off
the inode's crypt_info structure for all of our encryption needs for
that inode, since the tfm can be used by multiple crypto requests in
parallel.
Also use cmpxchg() to avoid races that could result in crypt_info
structure getting doubly allocated or doubly freed.
Change-Id: I4ae5c07d0e5d99ec1e26eeb49d833c4a284d9a5f
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
| |
On arm64 this is apparently needed for CTS mode to function correctly.
Otherwise attempts to use CTS return ENOENT.
Change-Id: I3f597f5f88e806dbeed75a7123c3d6bb7e608350
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
Some fields are only used when the crypto_ctx is being used on the
read path, some are only used on the write path, and some are only
used when the structure is on free list. Optimize memory use by using
a union.
Change-Id: I66de766a0f1122463edf3280ff0c2923be2472b8
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
| |
The ci_mode field was superfluous, and getting rid of it gets rid of
an unused hole in the structure.
Change-Id: I0f4c38a1162fa9c6da8a3529b7477ff5560c21df
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
| |
Use slab caches the ext4_crypto_ctx and ext4_crypt_info structures for
slighly better memory efficiency and debuggability.
Change-Id: If47986e2e29fa181d113864dcd9d1cae79c72639
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The superblock fields s_file_encryption_mode and s_dir_encryption_mode
are vestigal, so remove them as a cleanup. While we're at it, allow
file systems with both encryption and inline_data enabled at the same
time to work correctly. We can't have encrypted inodes with inline
data, but there's no reason to prohibit unencrypted inodes from using
the inline data feature.
Change-Id: Ia90b7e24bcf9ebabef529b710d70bd8ba71a17a4
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: "Theodore Ts'o" <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a pretty massive patch which does a number of different things:
1) The per-inode encryption information is now stored in an allocated
data structure, ext4_crypt_info, instead of directly in the node.
This reduces the size usage of an in-memory inode when it is not
using encryption.
2) We drop the ext4_fname_crypto_ctx entirely, and use the per-inode
encryption structure instead. This remove an unnecessary memory
allocation and free for the fname_crypto_ctx as well as allowing us
to reuse the ctfm in a directory for multiple lookups and file
creations.
3) We also cache the inode's policy information in the ext4_crypt_info
structure so we don't have to continually read it out of the
extended attributes.
4) We now keep the keyring key in the inode's encryption structure
instead of releasing it after we are done using it to derive the
per-inode key. This allows us to test to see if the key has been
revoked; if it has, we prevent the use of the derived key and free
it.
5) When an inode is released (or when the derived key is freed), we
will use memset_explicit() to zero out the derived key, so it's not
left hanging around in memory. This implies that when a user logs
out, it is important to first revoke the key, and then unlink it,
and then finally, to use "echo 3 > /proc/sys/vm/drop_caches" to
release any decrypted pages and dcache entries from the system
caches.
6) All this, and we also shrink the number of lines of code by around
100. :-)
Change-Id: I948f7844d425c0ce616f800446ecb0b6bea686f8
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Use struct ext4_encryption_key only for the master key passed via the
kernel keyring.
For internal kernel space users, we now use struct ext4_crypt_info.
This will allow us to put information from the policy structure so we
can cache it and avoid needing to constantly looking up the extended
attribute. We will do this in a spearate patch. This patch is mostly
mechnical to make it easier for patch review.
Change-Id: I208472675d0550df5f60b3b58652a9a1b434caed
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
| |
Change-Id: Ib0deff3a9aff318d8f2be6b4a550168d4771ccc2
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Encrypt the filename as soon it is passed in by the user. This avoids
our needing to encrypt the filename 2 or 3 times while in the process
of creating a filename.
Similarly, when looking up a directory entry, encrypt the filename
early, or if the encryption key is not available, base-64 decode the
file syystem so that the hash value and the last 16 bytes of the
encrypted filename is available in the new struct ext4_filename data
structure.
Change-Id: Ia76a5e51770840c57a53180cd89476f2e9b8c966
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This obscures the length of the filenames, to decrease the amount of
information leakage. By default, we pad the filenames to the next 4
byte boundaries. This costs nothing, since the directory entries are
aligned to 4 byte boundaries anyway. Filenames can also be padded to
8, 16, or 32 bytes, which will consume more directory space.
Change-Id: I2d4ab2b76797ab93fada683f405e3876e0cff9dc
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
| |
Avoid using SHA-1 when calculating the user-visible filename when the
encryption key is available, and avoid decrypting lots of filenames
when searching for a directory entry in a directory block.
Change-Id: Ifff4c07a80740112e2e984d2da3105e2fe41ab68
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
There were some last minute changes that weren't reflected in the ext4
crypto patches that we were syncing with flounder. They were mostly
whitespace changes, plus an error handling bugfix if there was a
normal (non-crypto-related) bugs when adding a directory entry to an
inode while creating a file.
Change-Id: I01e1f8ee07aef2f826a27efcbfa85a825000f2bc
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(cherry picked from commit e12fb97222fc41e8442896934f76d39ef99b590a)
(needed to avoid patch conflicts with further ext4 crypto patches)
Previously commit 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c added a
support for for syncing parent directory of newly created inodes to
make sure that the inode is not lost after a power failure in
no-journal mode.
However this does not work in majority of cases, namely:
- if the directory has inline data
- if the directory is already indexed
- if the directory already has at least one block and:
- the new entry fits into it
- or we've successfully converted it to indexed
So in those cases we might lose the inode entirely even after fsync in
the no-journal mode. This also includes ext2 default mode obviously.
I've noticed this while running xfstest generic/321 and even though the
test should fail (we need to run fsck after a crash in no-journal mode)
I could not find a newly created entries even when if it was fsynced
before.
Fix this by adjusting the ext4_add_entry() successful exit paths to set
the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
parent directory as well.
Change-Id: I742fb1c5304986cb990352a2471186bcd2c77ceb
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Frank Mayhar <fmayhar@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Theodore Ts'o <tytso@google.com>
|
| |
|
|
|
|
|
|
|
|
| |
Also add the test dummy encryption mode flag so we can more easily
test the encryption patches using xfstests.
Change-Id: Iaae44110ab5870e5da60aca76197828f0ebc139b
Signed-off-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Theodore Ts'o <tytso@google.com>
|