aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
* mt6735: Add an option to multiplex AP and STA on wlan0Diogo Ferreira2017-06-045-8/+14
| | | | | | | | | | | | | This adds CONFIG_MTK_COMBO_AOSP_TETHERING_SUPPORT which, when enabled, allows ap and wlan to co-exist in the same interface, as Android expects. Most of this functionality is also available (albeit not compilable broken) under CFG_TC1_FEATURE but that has larger implications around the radio and usb stack that we do not want to adopt. Change-Id: Ib1d1be40566f1bb9ccc7be45b49ec8d1f3b3ba58 Ticket: PORRIDGE-30
* defconfig: use cubic instead of westwood (seems to give better results on ↵Mister Oyster2017-06-031-15/+11
| | | | xDSL lines) and disable unused ones
* ipv6: fix out of bound writes in __ip6_append_data()Eric Dumazet2017-06-021-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | Andrey Konovalov and idaifish@gmail.com reported crashes caused by one skb shared_info being overwritten from __ip6_append_data() Andrey program lead to following state : copy -4200 datalen 2000 fraglen 2040 maxfraglen 2040 alloclen 2048 transhdrlen 0 offset 0 fraggap 6200 The skb_copy_and_csum_bits(skb_prev, maxfraglen, data + transhdrlen, fraggap, 0); is overwriting skb->head and skb_shared_info Since we apparently detect this rare condition too late, move the code earlier to even avoid allocating skb and risking crashes. Once again, many thanks to Andrey and syzkaller team. Change-Id: Ieb39488dca91832154217fdac4a98246d8e407da Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Reported-by: <idaifish@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ANDROID: hid: uhid: implement refcount for open and closeDmitry Torokhov2017-06-021-2/+15
| | | | | | | | | | | | | | | Fix concurrent open and close activity sending a UHID_CLOSE while some consumers still have the device open. Temporary solution for reference counts on device open and close calls, absent a facility for this in the HID core likely to appear in the future. [toddpoynor@google.com: commit text] Bug: 38448648 Signed-off-by: Todd Poynor <toddpoynor@google.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Change-Id: I57413e42ec961a960a8ddc4942228df22c730d80
* Revert "jiffies conversions: Use compile time constants when possible"Mister Oyster2017-06-022-144/+98
| | | | | Build breaks when any of this code is included This reverts commit 9722cd7360077819a5af4937c0f742149fcec82c.
* defconfig: move to config_hz=300Moyster2017-06-022-4/+7
|
* Revert "use -O3 compile flags"Moyster2017-06-021-2/+2
| | | | This reverts commit 7bd42c9fc7c3f553fd2b502249fbbfba462a1555.
* ext4: fix deadlock during page writebackJan Kara2017-06-011-3/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | [ Upstream commit 646caa9c8e196880b41cd3e3d33a2ebc752bdb85 ] Commit 06bd3c36a733 (ext4: fix data exposure after a crash) uncovered a deadlock in ext4_writepages() which was previously much harder to hit. After this commit xfstest generic/130 reproduces the deadlock on small filesystems. The problem happens when ext4_do_update_inode() sets LARGE_FILE feature and marks current inode handle as synchronous. That subsequently results in ext4_journal_stop() called from ext4_writepages() to block waiting for transaction commit while still holding page locks, reference to io_end, and some prepared bio in mpd structure each of which can possibly block transaction commit from completing and thus results in deadlock. Fix the problem by releasing page locks, io_end reference, and submitting prepared bio before calling ext4_journal_stop(). [ Changed to defer the call to ext4_journal_stop() only if the handle is synchronous. --tytso ] Change-Id: I724640d96ffaa03e512cd0b48cea056b4030c382 Reported-and-tested-by: Eryu Guan <eguan@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: fix data exposure after a crashJan Kara2017-06-011-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 06bd3c36a733ac27962fea7d6f47168841376824 upstream. Huang has reported that in his powerfail testing he is seeing stale block contents in some of recently allocated blocks although he mounts ext4 in data=ordered mode. After some investigation I have found out that indeed when delayed allocation is used, we don't add inode to transaction's list of inodes needing flushing before commit. Originally we were doing that but commit f3b59291a69d removed the logic with a flawed argument that it is not needed. The problem is that although for delayed allocated blocks we write their contents immediately after allocating them, there is no guarantee that the IO scheduler or device doesn't reorder things and thus transaction allocating blocks and attaching them to inode can reach stable storage before actual block contents. Actually whenever we attach freshly allocated blocks to inode using a written extent, we should add inode to transaction's ordered inode list to make sure we properly wait for block contents to be written before committing the transaction. So that is what we do in this patch. This also handles other cases where stale data exposure was possible - like filling hole via mmap in data=ordered,nodelalloc mode. The only exception to the above rule are extending direct IO writes where blkdev_direct_IO() waits for IO to complete before increasing i_size and thus stale data exposure is not possible. For now we don't complicate the code with optimizing this special case since the overhead is pretty low. In case this is observed to be a performance problem we can always handle it using a special flag to ext4_map_blocks(). Change-Id: I9f8b371c9fd716bf3d8af3780ce43e73d80cfb28 Fixes: f3b59291a69d0b734be1fc8be489fef2dd846d3d Reported-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> [bwh: Backported to 3.16: - Drop check for EXT4_GET_BLOCKS_ZERO flag - Adjust context] Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
* defconfig: renable rps/xps/rfs_accelMister Oyster2017-05-311-0/+4
|
* net: fix config_xps and config_rps since use_generic_smp_helpers was removedMister Oyster2017-05-311-3/+3
|
* ANDROID: mnt: Fix next_descendentDaniel Rosenberg2017-05-311-3/+8
| | | | | | | | | | | | | | next_descendent did not properly handle the case where the initial mount had no slaves. In this case, we would look for the next slave, but since don't have a master, the check for wrapping around to the start of the list will always fail. Instead, we check for this case, and ensure that we end the iteration when we come back to the root. Signed-off-by: Daniel Rosenberg <drosen@google.com> Bug: 62094374 Change-Id: I43dfcee041aa3730cb4b9a1161418974ef84812e
* defconfig: cleanup net stuffMister Oyster2017-05-311-47/+48
|
* netfilter: x_tables: enforce nul-terminated table name from getsockopt ↵Pablo Neira Ayuso2017-05-304-0/+10
| | | | | | | | | | | | GET_ENTRIES Make sure the table names via getsockopt GET_ENTRIES is nul-terminated in ebtables and all the x_tables variants and their respective compat code. Uncovered by KASAN. Change-Id: Id72a6528fe4b5fb550979a1960b34f9b9ee0b71a Reported-by: Baozeng Ding <sploving1@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* f2fs: add lazytime mount optionJaegeuk Kim2017-05-291-0/+14
| | | | | | This patch adds lazytime support. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* binder: remove unused varMister Oyster2017-05-291-1/+0
| | | | | | ../../../../../../kernel/meizu/m2note/drivers/android/binder.c:63:14: warning: 'system_server_pid' defined but not used [-Wunused-variable] static pid_t system_server_pid;
* binder: Quiet BinderUma Maheshwari Bhiram2017-05-291-2/+2
| | | | | | | | | | Temporary change to avoid watchdog bark because of excessive failed transaction logging CRs-Fixed: 572081 Change-Id: Id664d65ab9e78627991f8b7d4f4e5e126908c214 Signed-off-by: Uma Maheshwari Bhiram <ubhira@codeaurora.org>
* vfs: add find_inode_nowait() functionTheodore Ts'o2017-05-292-0/+55
| | | | | | | | | | Add a new function find_inode_nowait() which is an even more general version of ilookup5_nowait(). It is designed for callers which need very fine grained control over when the function is allowed to block or increment the inode's reference count. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fscrypt: remove duplicate d_inode definition in ext4 & f2fs In file included ↵Mister Oyster2017-05-292-15/+0
| | | | from ../../../../../../kernel/meizu/m2note/fs/crypto/crypto.c:29:0: ../../../../../../kernel/meizu/m2note/fs/crypto/fscrypt_private.h:103:29: error: redefinition of 'd_inode' static inline struct inode *d_inode(const struct dentry *dentry) ^ In file included from ../../../../../../kernel/meizu/m2note/include/linux/fs.h:8:0, from ../../../../../../kernel/meizu/m2note/include/linux/pagemap.h:8, from ../../../../../../kernel/meizu/m2note/fs/crypto/crypto.c:22: ../../../../../../kernel/meizu/m2note/include/linux/dcache.h:430:29: note: previous definition of 'd_inode' was here static inline struct inode *d_inode(const struct dentry *dentry) ^ In file included from ../../../../../../kernel/meizu/m2note/fs/crypto/crypto.c:29:0: ../../../../../../kernel/meizu/m2note/fs/crypto/fscrypt_private.h:108:20: error: redefinition of 'd_is_negative' static inline bool d_is_negative(const struct dentry *dentry) In file included from ../../../../../../kernel/meizu/m2note/fs/f2fs/dir.c:14:0: ../../../../../../kernel/meizu/m2note/fs/f2fs/f2fs.h:182:29: error: redefinition of 'd_inode' static inline struct inode *d_inode(const struct dentry *dentry) ^ In file included from ../../../../../../kernel/meizu/m2note/include/linux/fs.h:8:0, from ../../../../../../kernel/meizu/m2note/fs/f2fs/dir.c:11: ../../../../../../kernel/meizu/m2note/include/linux/dcache.h:430:29: note: previous definition of 'd_inode' was here static inline struct inode *d_inode(const struct dentry *dentry)
* tracing: Fix event header writeback.h to include tracepoint.hSteven Rostedt (Red Hat)2017-05-291-0/+1
| | | | | | | | | | | | | The trace event headers are required to include tracepoint.h. The only reason they worked now is because module.h included tracepoint.h, and that will soon change. Link: http://lkml.kernel.org/r/20140226190644.442886305@goodmis.org Fixes: 455b2864686d "writeback: Initial tracing support" Cc: Dave Chinner <dchinner@redhat.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* fs: add dirtytime_expire_seconds sysctlTheodore Ts'o2017-05-293-0/+22
| | | | | | | | Add a tuning knob so we can adjust the dirtytime expiration timeout, which is very useful for testing lazytime. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* fs: make sure the timestamps for lazytime inodes eventually get writtenTheodore Ts'o2017-05-292-10/+73
| | | | | | | | | | | | | | | | | | Jan Kara pointed out that if there is an inode which is constantly getting dirtied with I_DIRTY_PAGES, an inode with an updated timestamp will never be written since inode->dirtied_when is constantly getting updated. We fix this by adding an extra field to the inode, dirtied_time_when, so inodes with a stale dirtytime can get detected and handled. In addition, if we have a dirtytime inode caused by an atime update, and there is no write activity on the file system, we need to have a secondary system to make sure these inodes get written out. We do this by setting up a second delayed work structure which wakes up the CPU much more rarely compared to writeback_expire_centisecs. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: prevent bugon on race between write/fcntlDmitry Monakhov2017-05-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a41537e69b4aa43f0fea02498c2595a81267383b upstream. O_DIRECT flags can be toggeled via fcntl(F_SETFL). But this value checked twice inside ext4_file_write_iter() and __generic_file_write() which result in BUG_ON inside ext4_direct_IO. Let's initialize iocb->private unconditionally. TESTCASE: xfstest:generic/036 https://patchwork.ozlabs.org/patch/402445/ #TYPICAL STACK TRACE: kernel BUG at fs/ext4/inode.c:2960! invalid opcode: 0000 [#1] SMP Modules linked in: brd iTCO_wdt lpc_ich mfd_core igb ptp dm_mirror dm_region_hash dm_log dm_mod CPU: 6 PID: 5505 Comm: aio-dio-fcntl-r Not tainted 3.17.0-rc2-00176-gff5c017 #161 Hardware name: Intel Corporation W2600CR/W2600CR, BIOS SE5C600.86B.99.99.x028.061320111235 06/13/2011 task: ffff88080e95a7c0 ti: ffff88080f908000 task.ti: ffff88080f908000 RIP: 0010:[<ffffffff811fabf2>] [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0 RSP: 0018:ffff88080f90bb58 EFLAGS: 00010246 RAX: 0000000000000400 RBX: ffff88080fdb2a28 RCX: 00000000a802c818 RDX: 0000040000080000 RSI: ffff88080d8aeb80 RDI: 0000000000000001 RBP: ffff88080f90bbc8 R08: 0000000000000000 R09: 0000000000001581 R10: 0000000000000000 R11: 0000000000000000 R12: ffff88080d8aeb80 R13: ffff88080f90bbf8 R14: ffff88080fdb28c8 R15: ffff88080fdb2a28 FS: 00007f23b2055700(0000) GS:ffff880818400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f23b2045000 CR3: 000000080cedf000 CR4: 00000000000407e0 Stack: ffff88080f90bb98 0000000000000000 7ffffffffffffffe ffff88080fdb2c30 0000000000000200 0000000000000200 0000000000000001 0000000000000200 ffff88080f90bbc8 ffff88080fdb2c30 ffff88080f90be08 0000000000000200 Call Trace: [<ffffffff8112ca9d>] generic_file_direct_write+0xed/0x180 [<ffffffff8112f2b2>] __generic_file_write_iter+0x222/0x370 [<ffffffff811f495b>] ext4_file_write_iter+0x34b/0x400 [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410 [<ffffffff811bd709>] ? aio_run_iocb+0x239/0x410 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810abd94>] ? __lock_acquire+0x274/0x700 [<ffffffff811f4610>] ? ext4_unwritten_wait+0xb0/0xb0 [<ffffffff811bd756>] aio_run_iocb+0x286/0x410 [<ffffffff810990e5>] ? local_clock+0x25/0x30 [<ffffffff810ac359>] ? lock_release_holdtime+0x29/0x190 [<ffffffff811bc05b>] ? lookup_ioctx+0x4b/0xf0 [<ffffffff811bde3b>] do_io_submit+0x55b/0x740 [<ffffffff811bdcaa>] ? do_io_submit+0x3ca/0x740 [<ffffffff811be030>] SyS_io_submit+0x10/0x20 [<ffffffff815ce192>] system_call_fastpath+0x16/0x1b Code: 01 48 8b 80 f0 01 00 00 48 8b 18 49 8b 45 10 0f 85 f1 01 00 00 48 03 45 c8 48 3b 43 48 0f 8f e3 01 00 00 49 83 7c 24 18 00 75 04 <0f> 0b eb fe f0 ff 83 ec 01 00 00 49 8b 44 24 18 8b 00 85 c0 89 RIP [<ffffffff811fabf2>] ext4_direct_IO+0x162/0x3d0 RSP <ffff88080f90bb58> Reported-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> [hujianyang: Backported to 3.10 - Move initialization of iocb->private to ext4_file_write() as we don't have ext4_file_write_iter(), which is introduced by commit 9b884164. - Adjust context to make 'overwrite' changes apply to ext4_file_dio_write() as ext4_file_dio_write() is not move into ext4_file_write()] Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: allow DAX writeback for hole punchRoss Zwisler2017-05-291-2/+2
| | | | | | | | | | | | | | commit cca32b7eeb4ea24fa6596650e06279ad9130af98 upstream. Currently when doing a DAX hole punch with ext4 we fail to do a writeback. This is because the logic around filemap_write_and_wait_range() in ext4_punch_hole() only looks for dirty page cache pages in the radix tree, not for dirty DAX exceptional entries. Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: reinforce check of i_dtime when clearing high fields of uid and gidDaeho Jeong2017-05-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | commit 93e3b4e6631d2a74a8cf7429138096862ff9f452 upstream. Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid of deleted and evicted inode to fix up interoperability with old kernels. However, it checks only i_dtime of an inode to determine whether the inode was deleted and evicted, and this is very risky, because i_dtime can be used for the pointer maintaining orphan inode list, too. We need to further check whether the i_dtime is being used for the orphan inode list even if the i_dtime is not NULL. We found that high 16-bit fields of uid/gid of inode are unintentionally and permanently cleared when the inode truncation is just triggered, but not finished, and the inode metadata, whose high uid/gid bits are cleared, is written on disk, and the sudden power-off follows that in order. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: use __GFP_NOFAIL in ext4_free_blocks()Konstantin Khlebnikov2017-05-291-19/+28
| | | | | | | | | | | | | commit adb7ef600cc9d9d15ecc934cc26af5c1379777df upstream. This might be unexpected but pages allocated for sbi->s_buddy_cache are charged to current memory cgroup. So, GFP_NOFS allocation could fail if current task has been killed by OOM or if current memory cgroup has no free memory left. Block allocator cannot handle such failures here yet. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: avoid modifying checksum fields directly during checksum verificationDaeho Jeong2017-05-294-36/+42
| | | | | | | | | | | | | | | | | | commit b47820edd1634dc1208f9212b7ecfb4230610a23 upstream. We temporally change checksum fields in buffers of some types of metadata into '0' for verifying the checksum values. By doing this without locking the buffer, some metadata's checksums, which are being committed or written back to the storage, could be damaged. In our test, several metadata blocks were found with damaged metadata checksum value during recovery process. When we only verify the checksum value, we have to avoid modifying checksum fields directly. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: validate that metadata blocks do not overlap superblockTheodore Ts'o2017-05-291-1/+17
| | | | | | | | | | | | | | commit 829fa70dddadf9dd041d62b82cd7cea63943899d upstream. A number of fuzzing failures seem to be caused by allocation bitmaps or other metadata blocks being pointed at the superblock. This can cause kernel BUG or WARNings once the superblock is overwritten, so validate the group descriptor blocks to make sure this doesn't happen. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: fix BACKPORT: posix_acl: Clear SGID bit when setting file permissionsMister Oyster2017-05-291-8/+4
|
* ext4: return -ENOMEM instead of successDan Carpenter2017-05-291-1/+3
| | | | | | | | | | | | [ Upstream commit 578620f451f836389424833f1454eeeb2ffc9e9f ] We should set the error code if kzalloc() fails. Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: fix mballoc breakage with 64k block sizeChandan Rajendra2017-05-291-1/+1
| | | | | | | | | | | | | | | | [ Upstream commit 69e43e8cc971a79dd1ee5d4343d8e63f82725123 ] 'border' variable is set to a value of 2 times the block size of the underlying filesystem. With 64k block size, the resulting value won't fit into a 16-bit variable. Hence this commit changes the data type of 'border' to 'unsigned int'. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: fix stack memory corruption with 64k block sizeChandan Rajendra2017-05-291-1/+1
| | | | | | | | | | | | | | | | | [ Upstream commit 30a9d7afe70ed6bd9191d3000e2ef1a34fb58493 ] The number of 'counters' elements needed in 'struct sg' is super_block->s_blocksize_bits + 2. Presently we have 16 'counters' elements in the array. This is insufficient for block sizes >= 32k. In such cases the memcpy operation performed in ext4_mb_seq_groups_show() would cause stack memory corruption. Fixes: c9de560ded61f Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: sanity check the block and cluster size at mount timeTheodore Ts'o2017-05-292-1/+17
| | | | | | | | | | | | | | | | [ Upstream commit 9e47a4c9fc58032ee135bf76516809c7624b1551 ] If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: use more strict checks for inodes_per_block on mountTheodore Ts'o2017-05-291-9/+6
| | | | | | | | | | | | [ Upstream commit cd6bb35bf7f6d7d922509bf50265383a0ceabe96 ] Centralize the checks for inodes_per_block and be more strict to make sure the inodes_per_block_group can't end up being zero. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: add sanity checking to count_overhead()Theodore Ts'o2017-05-291-3/+8
| | | | | | | | | | | | | [ Upstream commit c48ae41bafe31e9a66d8be2ced4e42a6b57fa814 ] The commit "ext4: sanity check the block and cluster size at mount time" should prevent any problems, but in case the superblock is modified while the file system is mounted, add an extra safety check to make sure we won't overrun the allocated buffer. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
* ext4: fix reference counting bug on block allocation errorVegard Nossum2017-05-291-14/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 554a5ccc4e4a20c5f3ec859de0842db4b4b9c77e upstream. If we hit this error when mounted with errors=continue or errors=remount-ro: EXT4-fs error (device loop0): ext4_mb_mark_diskspace_used:2940: comm ext4.exe: Allocating blocks 5090-6081 which overlap fs metadata then ext4_mb_new_blocks() will call ext4_mb_release_context() and try to continue. However, ext4_mb_release_context() is the wrong thing to call here since we are still actually using the allocation context. Instead, just error out. We could retry the allocation, but there is a possibility of getting stuck in an infinite loop instead, so this seems safer. [ Fixed up so we don't return EAGAIN to userspace. --tytso ] Fixes: 8556e8f3b6 ("ext4: Don't allow new groups to be added during block allocation") Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: stable@vger.kernel.org [wt: 3.10 doesn't have EFSCORRUPTED, but XFS uses EUCLEAN as does 3.14 on this patch so use this instead] Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: short-cut orphan cleanup on errorVegard Nossum2017-05-291-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c65d5c6c81a1f27dec5f627f67840726fcd146de upstream. If we encounter a filesystem error during orphan cleanup, we should stop. Otherwise, we may end up in an infinite loop where the same inode is processed again and again. EXT4-fs (loop0): warning: checktime reached, running e2fsck is recommended EXT4-fs error (device loop0): ext4_mb_generate_buddy:758: group 2, block bitmap and bg descriptor inconsistent: 6117 vs 0 free clusters Aborting journal on device loop0-8. EXT4-fs (loop0): Remounting filesystem read-only EXT4-fs error (device loop0) in ext4_free_blocks:4895: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs error (device loop0) in ext4_ext_remove_space:3068: IO failure EXT4-fs error (device loop0) in ext4_ext_truncate:4667: Journal has aborted EXT4-fs error (device loop0) in ext4_orphan_del:2927: Journal has aborted EXT4-fs error (device loop0) in ext4_do_update_inode:4893: Journal has aborted EXT4-fs (loop0): Inode 16 (00000000618192a0): orphan list check failed! [...] EXT4-fs (loop0): Inode 16 (0000000061819748): orphan list check failed! [...] EXT4-fs (loop0): Inode 16 (0000000061819bf0): orphan list check failed! [...] See-also: c9eb13a9105 ("ext4: fix hang when processing corrupted orphaned inode list") Cc: Jan Kara <jack@suse.cz> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: don't call ext4_should_journal_data() on the journal inodeVegard Nossum2017-05-291-3/+3
| | | | | | | | | | | | | | | | | | | | | commit 6a7fd522a7c94cdef0a3b08acf8e6702056e635c upstream. If ext4_fill_super() fails early, it's possible for ext4_evict_inode() to call ext4_should_journal_data() before superblock options and flags are fully set up. In that case, the iput() on the journal inode can end up causing a BUG(). Work around this problem by reordering the tests so we only call ext4_should_journal_data() after we know it's not the journal inode. Fixes: 2d859db3e4 ("ext4: fix data corruption in inodes with journalled data") Fixes: 2b405bfa84 ("ext4: fix data=journal fast mount/umount hang") Cc: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: check for extents that wrap aroundVegard Nossum2017-05-291-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f70749ca42943faa4d4dcce46dfdcaadb1d0c4b6 upstream. An extent with lblock = 4294967295 and len = 1 will pass the ext4_valid_extent() test: ext4_lblk_t last = lblock + len - 1; if (len == 0 || lblock > last) return 0; since last = 4294967295 + 1 - 1 = 4294967295. This would later trigger the BUG_ON(es->es_lblk + es->es_len < es->es_lblk) in ext4_es_end(). We can simplify it by removing the - 1 altogether and changing the test to use lblock + len <= lblock, since now if len = 0, then lblock + 0 == lblock and it fails, and if len > 0 then lblock + len > lblock in order to pass (i.e. it doesn't overflow). Fixes: 5946d0893 ("ext4: check for overlapping extents in ext4_valid_extent_entries()") Fixes: 2f974865f ("ext4: check for zero length extent explicitly") Cc: Eryu Guan <guaneryu@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: verify extent header depthVegard Nossum2017-05-291-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 7bc9491645118c9461bd21099c31755ff6783593 upstream. Although the extent tree depth of 5 should enough be for the worst case of 2*32 extents of length 1, the extent tree code does not currently to merge nodes which are less than half-full with a sibling node, or to shrink the tree depth if possible. So it's possible, at least in theory, for the tree depth to be greater than 5. However, even in the worst case, a tree depth of 32 is highly unlikely, and if the file system is maliciously corrupted, an insanely large eh_depth can cause memory allocation failures that will trigger kernel warnings (here, eh_depth = 65280): JBD2: ext4.exe wants too many credits credits:195849 rsv_credits:0 max:256 ------------[ cut here ]------------ WARNING: CPU: 0 PID: 50 at fs/jbd2/transaction.c:293 start_this_handle+0x569/0x580 CPU: 0 PID: 50 Comm: ext4.exe Not tainted 4.7.0-rc5+ #508 Stack: 604a8947 625badd8 0002fd09 00000000 60078643 00000000 62623910 601bf9bc 62623970 6002fc84 626239b0 900000125 Call Trace: [<6001c2dc>] show_stack+0xdc/0x1a0 [<601bf9bc>] dump_stack+0x2a/0x2e [<6002fc84>] __warn+0x114/0x140 [<6002fdff>] warn_slowpath_null+0x1f/0x30 [<60165829>] start_this_handle+0x569/0x580 [<60165d4e>] jbd2__journal_start+0x11e/0x220 [<60146690>] __ext4_journal_start_sb+0x60/0xa0 [<60120a81>] ext4_truncate+0x131/0x3a0 [<60123677>] ext4_setattr+0x757/0x840 [<600d5d0f>] notify_change+0x16f/0x2a0 [<600b2b16>] do_truncate+0x76/0xc0 [<600c3e56>] path_openat+0x806/0x1300 [<600c55c9>] do_filp_open+0x89/0xf0 [<600b4074>] do_sys_open+0x134/0x1e0 [<600b4140>] SyS_open+0x20/0x30 [<6001ea68>] handle_syscall+0x88/0x90 [<600295fd>] userspace+0x3fd/0x500 [<6001ac55>] fork_handler+0x85/0x90 ---[ end trace 08b0b88b6387a244 ]--- [ Commit message modified and the extent tree depath check changed from 5 to 32 -- tytso ] Cc: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: silence UBSAN in ext4_mb_init()Nicolai Stange2017-05-291-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 935244cd54b86ca46e69bc6604d2adfb1aec2d42 upstream. Currently, in ext4_mb_init(), there's a loop like the following: do { ... offset += 1 << (sb->s_blocksize_bits - i); i++; } while (i <= sb->s_blocksize_bits + 1); Note that the updated offset is used in the loop's next iteration only. However, at the last iteration, that is at i == sb->s_blocksize_bits + 1, the shift count becomes equal to (unsigned)-1 > 31 (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:2621:15 shift exponent 4294967295 is too large for 32-bit type 'int' [...] Call Trace: [<ffffffff818c4d25>] dump_stack+0xbc/0x117 [<ffffffff818c4c69>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411ab>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cac>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ab1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff814b6dc1>] ? kmem_cache_alloc+0x101/0x390 [<ffffffff816fc13b>] ? ext4_mb_init+0x13b/0xfd0 [<ffffffff814293c7>] ? create_cache+0x57/0x1f0 [<ffffffff8142948a>] ? create_cache+0x11a/0x1f0 [<ffffffff821c2168>] ? mutex_lock+0x38/0x60 [<ffffffff821c23ab>] ? mutex_unlock+0x1b/0x50 [<ffffffff814c26ab>] ? put_online_mems+0x5b/0xc0 [<ffffffff81429677>] ? kmem_cache_create+0x117/0x2c0 [<ffffffff816fcc49>] ext4_mb_init+0xc49/0xfd0 [...] Observe that the mentioned shift exponent, 4294967295, equals (unsigned)-1. Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of offset is never used again. Silence UBSAN by introducing another variable, offset_incr, holding the next increment to apply to offset and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: address UBSAN warning in mb_find_order_for_block()Nicolai Stange2017-05-291-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit b5cb316cdf3a3f5f6125412b0f6065185240cfdc upstream. Currently, in mb_find_order_for_block(), there's a loop like the following: while (order <= e4b->bd_blkbits + 1) { ... bb += 1 << (e4b->bd_blkbits - order); } Note that the updated bb is used in the loop's next iteration only. However, at the last iteration, that is at order == e4b->bd_blkbits + 1, the shift count becomes negative (c.f. C99 6.5.7(3)) and UBSAN reports UBSAN: Undefined behaviour in fs/ext4/mballoc.c:1281:11 shift exponent -1 is negative [...] Call Trace: [<ffffffff818c4d35>] dump_stack+0xbc/0x117 [<ffffffff818c4c79>] ? _atomic_dec_and_lock+0x169/0x169 [<ffffffff819411bb>] ubsan_epilogue+0xd/0x4e [<ffffffff81941cbc>] __ubsan_handle_shift_out_of_bounds+0x1fb/0x254 [<ffffffff81941ac1>] ? __ubsan_handle_load_invalid_value+0x158/0x158 [<ffffffff816e93a0>] ? ext4_mb_generate_from_pa+0x590/0x590 [<ffffffff816502c8>] ? ext4_read_block_bitmap_nowait+0x598/0xe80 [<ffffffff816e7b7e>] mb_find_order_for_block+0x1ce/0x240 [...] Unless compilers start to do some fancy transformations (which at least GCC 6.0.0 doesn't currently do), the issue is of cosmetic nature only: the such calculated value of bb is never used again. Silence UBSAN by introducing another variable, bb_incr, holding the next increment to apply to bb and adjust that one by right shifting it by one position per loop iteration. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=114701 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=112161 Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: fix hang when processing corrupted orphaned inode listTheodore Ts'o2017-05-291-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c9eb13a9105e2e418f72e46a2b6da3f49e696902 upstream. If the orphaned inode list contains inode #5, ext4_iget() returns a bad inode (since the bootloader inode should never be referenced directly). Because of the bad inode, we end up processing the inode repeatedly and this hangs the machine. This can be reproduced via: mke2fs -t ext4 /tmp/foo.img 100 debugfs -w -R "ssv last_orphan 5" /tmp/foo.img mount -o loop /tmp/foo.img /mnt (But don't do this if you are using an unpatched kernel if you care about the system staying functional. :-) This bug was found by the port of American Fuzzy Lop into the kernel to find file system problems[1]. (Since it *only* happens if inode #5 shows up on the orphan list --- 3, 7, 8, etc. won't do it, it's not surprising that AFL needed two hours before it found it.) [1] http://events.linuxfoundation.org/sites/events/files/slides/AFL%20filesystem%20fuzzing%2C%20Vault%202016_0.pdf Cc: stable@vger.kernel.org Reported by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: add lockdep annotations for i_data_semTheodore Ts'o2017-05-293-4/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | commit daf647d2dd58cec59570d7698a45b98e580f2076 upstream. With the internal Quota feature, mke2fs creates empty quota inodes and quota usage tracking is enabled as soon as the file system is mounted. Since quotacheck is no longer preallocating all of the blocks in the quota inode that are likely needed to be written to, we are now seeing a lockdep false positive caused by needing to allocate a quota block from inside ext4_map_blocks(), while holding i_data_sem for a data inode. This results in this complaint: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); Google-Bug-Id: 27907753 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()Eryu Guan2017-05-291-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5e1021f2b6dff1a86a468a1424d59faae2bc63c1 upstream. ext4_reserve_inode_write() in ext4_mark_inode_dirty() could fail on error (e.g. EIO) and iloc.bh can be NULL in this case. But the error is ignored in the following "if" condition and ext4_expand_extra_isize() might be called with NULL iloc.bh set, which triggers NULL pointer dereference. This is uncovered by commit 8b4953e13f4c ("ext4: reserve code points for the project quota feature"), which enlarges the ext4_inode size, and run the following script on new kernel but with old mke2fs: #/bin/bash mnt=/mnt/ext4 devname=ext4-error dev=/dev/mapper/$devname fsimg=/home/fs.img trap cleanup 0 1 2 3 9 15 cleanup() { umount $mnt >/dev/null 2>&1 dmsetup remove $devname losetup -d $backend_dev rm -f $fsimg exit 0 } rm -f $fsimg fallocate -l 1g $fsimg backend_dev=`losetup -f --show $fsimg` devsize=`blockdev --getsz $backend_dev` good_tab="0 $devsize linear $backend_dev 0" error_tab="0 $devsize error $backend_dev 0" dmsetup create $devname --table "$good_tab" mkfs -t ext4 $dev mount -t ext4 -o errors=continue,strictatime $dev $mnt dmsetup load $devname --table "$error_tab" && dmsetup resume $devname echo 3 > /proc/sys/vm/drop_caches ls -l $mnt exit 0 [ Patch changed to simplify the function a tiny bit. -- Ted ] Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ext4: fix potential integer overflowInsu Yun2017-05-291-1/+1
| | | | | | | | | | | | | commit 46901760b46064964b41015d00c140c83aa05bcf upstream. Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data), integer overflow could be happened. Therefore, need to fix integer overflow sanitization. Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: Fix handling of extended tv_secDavid Turner2017-05-291-7/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit a4dad1ae24f850410c4e60f22823cba1289b8d52 upstream. In ext4, the bottom two bits of {a,c,m}time_extra are used to extend the {a,c,m}time fields, deferring the year 2038 problem to the year 2446. When decoding these extended fields, for times whose bottom 32 bits would represent a negative number, sign extension causes the 64-bit extended timestamp to be negative as well, which is not what's intended. This patch corrects that issue, so that the only negative {a,c,m}times are those between 1901 and 1970 (as per 32-bit signed timestamps). Some older kernels might have written pre-1970 dates with 1,1 in the extra bits. This patch treats those incorrectly-encoded dates as pre-1970, instead of post-2311, until kernel 4.20 is released. Hopefully by then e2fsck will have fixed up the bad data. Also add a comment explaining the encoding of ext4's extra {a,c,m}time bits. Signed-off-by: David Turner <novalis@novalis.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Mark Harris <mh8928@yahoo.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=23732 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* fix calculation of meta_bg descriptor backupsAndy Leiserson2017-05-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | commit 904dad4742d211b7a8910e92695c0fa957483836 upstream. "group" is the group where the backup will be placed, and is initialized to zero in the declaration. This meant that backups for meta_bg descriptors were erroneously written to the backup block group descriptors in groups 1 and (desc_per_block-1). Reproduction information: mke2fs -Fq -t ext4 -b 1024 -O ^resize_inode /tmp/foo.img 16G truncate -s 24G /tmp/foo.img losetup /dev/loop0 /tmp/foo.img mount /dev/loop0 /mnt resize2fs /dev/loop0 umount /dev/loop0 dd if=/dev/zero of=/dev/loop0 bs=1024 count=2 e2fsck -fy /dev/loop0 losetup -d /dev/loop0 Signed-off-by: Andy Leiserson <andy@leiserson.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4, jbd2: ensure entering into panic after recording an error in superblockDaeho Jeong2017-05-293-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 4327ba52afd03fc4b5afa0ee1d774c9c5b0e85c5 upstream. If a EXT4 filesystem utilizes JBD2 journaling and an error occurs, the journaling will be aborted first and the error number will be recorded into JBD2 superblock and, finally, the system will enter into the panic state in "errors=panic" option. But, in the rare case, this sequence is little twisted like the below figure and it will happen that the system enters into panic state, which means the system reset in mobile environment, before completion of recording an error in the journal superblock. In this case, e2fsck cannot recognize that the filesystem failure occurred in the previous run and the corruption wouldn't be fixed. Task A Task B ext4_handle_error() -> jbd2_journal_abort() -> __journal_abort_soft() -> __jbd2_journal_abort_hard() | -> journal->j_flags |= JBD2_ABORT; | | __ext4_abort() | -> jbd2_journal_abort() | | -> __journal_abort_soft() | | -> if (journal->j_flags & JBD2_ABORT) | | return; | -> panic() | -> jbd2_journal_update_sb_errno() Tested-by: Hobin Woo <hobin.woo@samsung.com> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* ext4: replace open coded nofail allocation in ext4_free_blocks()Michal Hocko2017-05-291-11/+5
| | | | | | | | | | | | | | | commit 7444a072c387a93ebee7066e8aee776954ab0e41 upstream. ext4_free_blocks is looping around the allocation request and mimics __GFP_NOFAIL behavior without any allocation fallback strategy. Let's remove the open coded loop and replace it with __GFP_NOFAIL. Without the flag the allocator has no way to find out never-fail requirement and cannot help in any way. Signed-off-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>