aboutsummaryrefslogtreecommitdiff
Commit message (Collapse)AuthorAgeFilesLines
...
* crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey pathHerbert Xu2017-04-112-1/+9
| | | | | | | | | | | | commit 6a935170a980024dd29199e9dbb5c4da4767a1b9 upstream. This patch allows af_alg_release_parent to be called even for nokey sockets. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: algif_skcipher - Add key check exception for cipher_nullHerbert Xu2017-04-111-1/+1
| | | | | | | | | | | | commit 6e8d8ecf438792ecf7a3207488fb4eebc4edb040 upstream. This patch adds an exception to the key check so that cipher_null users may continue to use algif_skcipher without setting a key. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: skcipher - Add crypto_skcipher_has_setkeyHerbert Xu2017-04-113-0/+11
| | | | | | | | | | | | commit a1383cd86a062fc798899ab20f0ec2116cce39cb upstream. This patch adds a way for skcipher users to determine whether a key is required by a transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: algif_hash - Require setkey before accept(2)Herbert Xu2017-04-111-8/+193
| | | | | | | | | | | | | | | | | commit 6de62f15b581f920ade22d758f4c338311c2f0d4 upstream. Hash implementations that require a key may crash if you use them without setting a key. This patch adds the necessary checks so that if you do attempt to use them without a key that we return -ENOKEY instead of proceeding. This patch also adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: shash - Fix has_key settingHerbert Xu2017-04-111-4/+3
| | | | | | | | | | | | | | | | commit 00420a65fa2beb3206090ead86942484df2275f3 upstream. The has_key logic is wrong for shash algorithms as they always have a setkey function. So we should instead be testing against shash_no_setkey. Fixes: a5596d633278 ("crypto: hash - Add crypto_ahash_has_setkey") Cc: stable@vger.kernel.org Reported-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: hash - Add crypto_ahash_has_setkeyHerbert Xu2017-04-113-2/+13
| | | | | | | | | | | | commit a5596d6332787fd383b3b5427b41f94254430827 upstream. This patch adds a way for ahash users to determine whether a key is required by a crypto_ahash transform. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: algif_skcipher - Add nokey compatibility pathHerbert Xu2017-04-111-5/+144
| | | | | | | | | | | | commit a0fa2d037129a9849918a92d91b79ed6c7bd2818 upstream. This patch adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: af_alg - Add nokey compatibility pathHerbert Xu2017-04-112-1/+14
| | | | | | | | | | | | commit 37766586c965d63758ad542325a96d5384f4a8c9 upstream. This patch adds a compatibility path to support old applications that do acept(2) before setkey. Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: af_alg - Disallow bind/setkey/... after accept(2)Herbert Xu2017-04-112-8/+35
| | | | | | | | | | | | | | | | | | | | | commit c840ac6af3f8713a71b4d2363419145760bd6044 upstream. Each af_alg parent socket obtained by socket(2) corresponds to a tfm object once bind(2) has succeeded. An accept(2) call on that parent socket creates a context which then uses the tfm object. Therefore as long as any child sockets created by accept(2) exist the parent socket must not be modified or freed. This patch guarantees this by using locks and a reference count on the parent socket. Any attempt to modify the parent socket will fail with EBUSY. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* crypto: algif_skcipher - Require setkey before accept(2)Herbert Xu2017-04-111-9/+42
| | | | | | | | | | | | | | | | commit dd504589577d8e8e70f51f997ad487a4cb6c026f upstream. Some cipher implementations will crash if you try to use them without calling setkey first. This patch adds a check so that the accept(2) call will fail with -ENOKEY if setkey hasn't been done on the socket yet. Cc: stable@vger.kernel.org Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* metag: Only define atomic_dec_if_positive conditionallyGuenter Roeck2017-04-111-2/+1
| | | | | | | | | | | | | | | | | | | | | commit 35d04077ad96ed33ceea2501f5a4f1eacda77218 upstream. The definition of atomic_dec_if_positive() assumes that atomic_sub_if_positive() exists, which is only the case if metag specific atomics are used. This results in the following build error when trying to build metag1_defconfig. kernel/ucount.c: In function 'dec_ucount': kernel/ucount.c:211: error: implicit declaration of function 'atomic_sub_if_positive' Moving the definition of atomic_dec_if_positive() into the metag conditional code fixes the problem. Fixes: 6006c0d8ce94 ("metag: Atomics, locks and bitops") Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: James Hogan <james.hogan@imgtec.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* fbdev/efifb: Fix 16 color palette entry calculationMax Staudt2017-04-111-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | commit d50b3f43db739f03fcf8c0a00664b3d2fed0496e upstream. When using efifb with a 16-bit (5:6:5) visual, fbcon's text is rendered in the wrong colors - e.g. text gray (#aaaaaa) is rendered as green (#50bc50) and neighboring pixels have slightly different values (such as #50bc78). The reason is that fbcon loads its 16 color palette through efifb_setcolreg(), which in turn calculates a 32-bit value to write into memory for each palette index. Until now, this code could only handle 8-bit visuals and didn't mask overlapping values when ORing them. With this patch, fbcon displays the correct colors when a qemu VM is booted in 16-bit mode (in GRUB: "set gfxpayload=800x600x16"). Fixes: 7c83172b98e5 ("x86_64 EFI boot support: EFI frame buffer driver") # v2.6.24+ Signed-off-by: Max Staudt <mstaudt@suse.de> Acked-By: Peter Jones <pjones@redhat.com> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* dm: mark request_queue dead before destroying the DM deviceBart Van Assche2017-04-111-0/+5
| | | | | | | | | | | | | | commit 3b785fbcf81c3533772c52b717f77293099498d3 upstream. This avoids that new requests are queued while __dm_destroy() is in progress. [js] use md->queue instead of non-present helper Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
* regulator: tps65910: Work around silicon erratum SWCZ010Jan Remmet2017-04-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 8f9165c981fed187bb483de84caf9adf835aefda upstream. http://www.ti.com/lit/pdf/SWCZ010: DCDC o/p voltage can go higher than programmed value Impact: VDDI, VDD2, and VIO output programmed voltage level can go higher than expected or crash, when coming out of PFM to PWM mode or using DVFS. Description: When DCDC CLK SYNC bits are 11/01: * VIO 3-MHz oscillator is the source clock of the digital core and input clock of VDD1 and VDD2 * Turn-on of VDD1 and VDD2 HSD PFETis synchronized or at a constant phase shift * Current pulled though VCC1+VCC2 is Iload(VDD1) + Iload(VDD2) * The 3 HSD PFET will be turned-on at the same time, causing the highest possible switching noise on the application. This noise level depends on the layout, the VBAT level, and the load current. The noise level increases with improper layout. When DCDC CLK SYNC bits are 00: * VIO 3-MHz oscillator is the source clock of digital core * VDD1 and VDD2 are running on their own 3-MHz oscillator * Current pulled though VCC1+VCC2 average of Iload(VDD1) + Iload(VDD2) * The switching noise of the 3 SMPS will be randomly spread over time, causing lower overall switching noise. Workaround: Set DCDCCTRL_REG[1:0]= 00. Signed-off-by: Jan Remmet <j.remmet@phytec.de> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ASoC: omap-mcpdm: Fix irq resource handlingPeter Ujfalusi2017-04-111-2/+3
| | | | | | | | | | | | | | | commit a8719670687c46ed2e904c0d05fa4cd7e4950cd1 upstream. Fixes: ddd17531ad908 ("ASoC: omap-mcpdm: Clean up with devm_* function") Managed irq request will not doing any good in ASoC probe level as it is not going to free up the irq when the driver is unbound from the sound card. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Reported-by: Russell King <linux@armlinux.org.uk> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mfd: 88pm80x: Double shifting bug in suspend/resumeDan Carpenter2017-04-111-2/+2
| | | | | | | | | | | | | commit 9a6dc644512fd083400a96ac4a035ac154fe6b8d upstream. set_bit() and clear_bit() take the bit number so this code is really doing "1 << (1 << irq)" which is a double shift bug. It's done consistently so it won't cause a problem unless "irq" is more than 4. Fixes: 70c6cce04066 ('mfd: Support 88pm80x in 80x driver') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]Andrey Ryabinin2017-04-111-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit f5527fffff3f002b0a6b376163613b82f69de073 upstream. This fixes CVE-2016-8650. If mpi_powm() is given a zero exponent, it wants to immediately return either 1 or 0, depending on the modulus. However, if the result was initalised with zero limb space, no limbs space is allocated and a NULL-pointer exception ensues. Fix this by allocating a minimal amount of limb space for the result when the 0-exponent case when the result is 1 and not touching the limb space when the result is 0. This affects the use of RSA keys and X.509 certificates that carry them. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 PGD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278 Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014 task: ffff8804011944c0 task.stack: ffff880401294000 RIP: 0010:[<ffffffff8138ce5d>] [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP: 0018:ffff880401297ad8 EFLAGS: 00010212 RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0 RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0 RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000 R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50 FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0 Stack: ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4 0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30 ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8 Call Trace: [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66 [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146 [<ffffffff8132a95c>] rsa_verify+0x9d/0xee [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1 [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228 [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4 [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1 [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1 [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61 [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399 [<ffffffff812fe227>] SyS_add_key+0x154/0x19e [<ffffffff81001c2b>] do_syscall_64+0x80/0x191 [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25 Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f RIP [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6 RSP <ffff880401297ad8> CR2: 0000000000000000 ---[ end trace d82015255d4a5d8d ]--- Basically, this is a backport of a libgcrypt patch: http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526 Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)") Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> cc: linux-ima-devel@lists.sourceforge.net Signed-off-by: James Morris <james.l.morris@oracle.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* hwmon: (adt7411) set bit 3 in CFG1 registerMichael Walle2017-04-111-1/+4
| | | | | | | | | | | | commit b53893aae441a034bf4dbbad42fe218561d7d81f upstream. According to the datasheet you should only write 1 to this bit. If it is not set, at least AIN3 will return bad values on newer silicon revisions. Fixes: d84ca5b345c2 ("hwmon: Add driver for ADT7411 voltage and temperature sensor") Signed-off-by: Michael Walle <michael@walle.cc> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
* can: dev: fix deadlock reported after bus-offSergei Miroshnichenko2017-04-112-11/+19
| | | | | | | | | | | | | | | | | | | | commit 9abefcb1aaa58b9d5aa40a8bb12c87d02415e4c8 upstream. A timer was used to restart after the bus-off state, leading to a relatively large can_restart() executed in an interrupt context, which in turn sets up pinctrl. When this happens during system boot, there is a high probability of grabbing the pinctrl_list_mutex, which is locked already by the probe() of other device, making the kernel suspect a deadlock condition [1]. To resolve this issue, the restart_timer is replaced by a delayed work. [1] https://github.com/victronenergy/venus/issues/24 Signed-off-by: Sergei Miroshnichenko <sergeimir@emcraft.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
* dm flakey: fix reads to be issued if drop_writes configuredMike Snitzer2017-04-111-11/+16
| | | | | | | | | | | | | commit 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc upstream. v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the down_interval") overlooked the 'drop_writes' feature, which is meant to allow reads to be issued rather than errored, during the down_interval. Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval") Reported-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* tile: avoid using clocksource_cyc2ns with absolute cycle countChris Metcalf2017-04-111-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit e658a6f14d7c0243205f035979d0ecf6c12a036f upstream. For large values of "mult" and long uptimes, the intermediate result of "cycles * mult" can overflow 64 bits. For example, the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock; we have mult = 853, and after 208.5 days, we overflow 64 bits. Since clocksource_cyc2ns() is intended to be used for relative cycle counts, not absolute cycle counts, performance is more importance than accepting a wider range of cycle values. So, just use mult_frac() directly in tile's sched_clock(). Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow in sched_clock") by Salman Qazi results in essentially the same generated code for x86 as this change does for tile. In fact, a follow-on change by Salman introduced mult_frac() and switched to using it, so the C code was largely identical at that point too. Peter Zijlstra then added mul_u64_u32_shr() and switched x86 to use it. This is, in principle, better; by optimizing the 64x64->64 multiplies to be 32x32->64 multiplies we can potentially save some time. However, the compiler piplines the 64x64->64 multiplies pretty well, and the conditional branch in the generic mul_u64_u32_shr() causes some bubbles in execution, with the result that it's pretty much a wash. If tilegx provided its own implementation of mul_u64_u32_shr() without the conditional branch, we could potentially save 3 cycles, but that seems like small gain for a fair amount of additional build scaffolding; no other platform currently provides a mul_u64_u32_shr() override, and tile doesn't currently have an <asm/div64.h> header to put the override in. Additionally, gcc currently has an optimization bug that prevents it from recognizing the opportunity to use a 32x32->64 multiply, and so the result would be no better than the existing mult_frac() until such time as the compiler is fixed. For now, just using mult_frac() seems like the right answer. Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* PCI: Handle read-only BARs on AMD CS553x devicesMyron Stowe2017-04-111-4/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 06cf35f903aa6da0cc8d9f81e9bcd1f7e1b534bb upstream. Some AMD CS553x devices have read-only BARs because of a firmware or hardware defect. There's a workaround in quirk_cs5536_vsa(), but it no longer works after 36e8164882ca ("PCI: Restore detection of read-only BARs"). Prior to 36e8164882ca, we filled in res->start; afterwards we leave it zeroed out. The quirk only updated the size, so the driver tried to use a region starting at zero, which didn't work. Expand quirk_cs5536_vsa() to read the base addresses from the BARs and hard-code the sizes. On Nix's system BAR 2's read-only value is 0x6200. Prior to 36e8164882ca, we interpret that as a 512-byte BAR based on the lowest-order bit set. Per datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to avoid clearing any address bits if a platform uses only 64-byte alignment. [js] pcibios_bus_to_resource takes pdev, not bus in 3.12 [bhelgaas: changelog, reduce BAR 2 size to 64] Fixes: 36e8164882ca ("PCI: Restore detection of read-only BARs") Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4 Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf Reported-and-tested-by: Nix <nix@esperi.org.uk> Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ACPI / APEI: Fix incorrect return value of ghes_proc()Punit Agrawal2017-04-111-1/+1
| | | | | | | | | | | | | | | | commit 806487a8fc8f385af75ed261e9ab658fc845e633 upstream. Although ghes_proc() tests for errors while reading the error status, it always return success (0). Fix this by propagating the return value. Fixes: d334a49113a4a33 (ACPI, APEI, Generic Hardware Error Source memory error support) Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Reviewed-by: Borislav Petkov <bp@suse.de> [ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mei: bus: fix received data size check in NFC fixupAlexander Usyskin2017-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | commit 582ab27a063a506ccb55fc48afcc325342a2deba upstream. NFC version reply size checked against only header size, not against full message size. That may lead potentially to uninitialized memory access in version data. That leads to warnings when version data is accessed: drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]: => 212:2 Reported in Build regressions/improvements in v4.9-rc3 https://lkml.org/lkml/2016/10/30/57 [js] the check is in 3.12 only once Fixes: 59fcd7c63abf (mei: nfc: Initial nfc implementation) Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
* staging: iio: ad5933: avoid uninitialized variable in error caseArnd Bergmann2017-04-111-7/+10
| | | | | | | | | | | | | | | | | | | | | commit 34eee70a7b82b09dbda4cb453e0e21d460dae226 upstream. The ad5933_i2c_read function returns an error code to indicate whether it could read data or not. However ad5933_work() ignores this return code and just accesses the data unconditionally, which gets detected by gcc as a possible bug: drivers/staging/iio/impedance-analyzer/ad5933.c: In function 'ad5933_work': drivers/staging/iio/impedance-analyzer/ad5933.c:649:16: warning: 'status' may be used uninitialized in this function [-Wmaybe-uninitialized] This adds minimal error handling so we only evaluate the data if it was correctly read. Link: https://patchwork.kernel.org/patch/8110281/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Lars-Peter Clausen <lars@metafoo.de> Signed-off-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* hv: do not lose pending heartbeat vmbus packetsLong Li2017-04-111-3/+7
| | | | | | | | | | | | | | | | commit 407a3aee6ee2d2cb46d9ba3fc380bc29f35d020c upstream. The host keeps sending heartbeat packets independent of the guest responding to them. Even though we respond to the heartbeat messages at interrupt level, we can have situations where there maybe multiple heartbeat messages pending that have not been responded to. For instance this occurs when the VM is paused and the host continues to send the heartbeat messages. Address this issue by draining and responding to all the heartbeat messages that maybe pending. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* uio: fix dmem_region_start computationJan Viktorin2017-04-111-1/+1
| | | | | | | | | | | | | | | | commit 4d31a2588ae37a5d0f61f4d956454e9504846aeb upstream. The variable i contains a total number of resources (including IORESOURCE_IRQ). However, we want the dmem_region_start to point after the last resource of type IORESOURCE_MEM. The original behaviour leads (very likely) to skipping several UIO mapping regions and makes them useless. Fix this by computing dmem_region_start from the uiomem which points to the last used UIO mapping. Fixes: 0a0c3b5a24bd ("Add new uio device for dynamic memory allocation") Signed-off-by: Jan Viktorin <viktorin@rehivetech.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* gpio: mpc8xxx: Correct irq handler functionLiu Gang2017-04-111-1/+1
| | | | | | | | | | | | | | | | | | commit d71cf15b865bdd45925f7b094d169aaabd705145 upstream. From the beginning of the gpio-mpc8xxx.c, the "handle_level_irq" has being used to handle GPIO interrupts in the PowerPC/Layerscape platforms. But actually, almost all PowerPC/Layerscape platforms assert an interrupt request upon either a high-to-low change or any change on the state of the signal. So the "handle_level_irq" is not reasonable for PowerPC/Layerscape GPIO interrupt, it should be "handle_edge_irq". Otherwise the system may lost some interrupts from the PIN's state changes. Signed-off-by: Liu Gang <Gang.Liu@nxp.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* cx231xx: fix GPIOs for Pixelview SBTVD hybridMauro Carvalho Chehab2017-04-112-2/+3
| | | | | | | | | | | | | commit 24b923f073ac37eb744f56a2c7f77107b8219ab2 upstream. This device uses GPIOs: 28 to switch between analog and digital modes: on digital mode, it should be set to 1. The code that sets it on analog mode is OK, but it misses the logic that sets it on digital mode. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* cx231xx: don't return error on successMauro Carvalho Chehab2017-04-111-1/+4
| | | | | | | | | | | commit 1871d718a9db649b70f0929d2778dc01bc49b286 upstream. The cx231xx_set_agc_analog_digital_mux_select() callers expect it to return 0 or an error. Returning a positive value makes the first attempt to switch between analog/digital to fail. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mb86a20s: fix demod settingsMauro Carvalho Chehab2017-04-111-50/+42
| | | | | | | | | | | | | | | | | commit 505a0ea706fc1db4381baa6c6bd2e596e730a55e upstream. With the current settings, only one channel locks properly. That's likely because, when this driver was written, Brazil were still using experimental transmissions. Change it to reproduce the settings used by the newer drivers. That makes it lock on other channels. Tested with both PixelView SBTVD Hybrid (cx231xx-based) and C3Tech Digital Duo HDTV/SDTV (em28xx-based) devices. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mb86a20s: fix the locking logicMauro Carvalho Chehab2017-04-111-1/+11
| | | | | | | | | | | | | | | | | | | commit dafb65fb98d85d8e78405e82c83e81975e5d5480 upstream. On this frontend, it takes a while to start output normal TS data. That only happens on state S9. On S8, the TS output is enabled, but it is not reliable enough. However, the zigzag loop is too fast to let it sync. As, on practical tests, the zigzag software loop doesn't seem to be helping, but just slowing down the tuning, let's switch to hardware algorithm, as the tuners used on such devices are capable of work with frequency drifts without any help from software. Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* pstore/ram: Use memcpy_fromio() to save old bufferAndrew Bresticker2017-04-111-2/+2
| | | | | | | | | | | | | | | | | | commit d771fdf94180de2bd811ac90cba75f0f346abf8d upstream. The ramoops buffer may be mapped as either I/O memory or uncached memory. On ARM64, this results in a device-type (strongly-ordered) mapping. Since unnaligned accesses to device-type memory will generate an alignment fault (regardless of whether or not strict alignment checking is enabled), it is not safe to use memcpy(). memcpy_fromio() is guaranteed to only use aligned accesses, so use that instead. Signed-off-by: Andrew Bresticker <abrestic@chromium.org> Signed-off-by: Enric Balletbo Serra <enric.balletbo@collabora.com> Reviewed-by: Puneet Kumar <puneetster@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* pstore/ram: Use memcpy_toio instead of memcpyFurquan Shaikh2017-04-111-1/+1
| | | | | | | | | | | | | | | | | | commit 7e75678d23167c2527e655658a8ef36a36c8b4d9 upstream. persistent_ram_update uses vmap / iomap based on whether the buffer is in memory region or reserved region. However, both map it as non-cacheable memory. For armv8 specifically, non-cacheable mapping requests use a memory type that has to be accessed aligned to the request size. memcpy() doesn't guarantee that. Signed-off-by: Furquan Shaikh <furquan@google.com> Signed-off-by: Enric Balletbo Serra <enric.balletbo@collabora.com> Reviewed-by: Aaron Durbin <adurbin@chromium.org> Reviewed-by: Olof Johansson <olofj@chromium.org> Tested-by: Furquan Shaikh <furquan@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* pstore/core: drop cmpxchg based updatesSebastian Andrzej Siewior2017-04-111-41/+2
| | | | | | | | | | | | | | | | | | | | | | | commit d5a9bf0b38d2ac85c9a693c7fb851f74fd2a2494 upstream. I have here a FPGA behind PCIe which exports SRAM which I use for pstore. Now it seems that the FPGA no longer supports cmpxchg based updates and writes back 0xff…ff and returns the same. This leads to crash during crash rendering pstore useless. Since I doubt that there is much benefit from using cmpxchg() here, I am dropping this atomic access and use the spinlock based version. Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Kees Cook <keescook@chromium.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Rabin Vincent <rabinv@axis.com> Tested-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reviewed-by: Guenter Roeck <linux@roeck-us.net> [kees: remove "_locked" suffix since it's the only option now] Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mmc: block: don't use CMD23 with very old MMC cardsDaniel Glöckner2017-04-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | commit 0ed50abb2d8fc81570b53af25621dad560cd49b3 upstream. CMD23 aka SET_BLOCK_COUNT was introduced with MMC v3.1. Older versions of the specification allowed to terminate multi-block transfers only with CMD12. The patch fixes the following problem: mmc0: new MMC card at address 0001 mmcblk0: mmc0:0001 SDMB-16 15.3 MiB mmcblk0: timed out sending SET_BLOCK_COUNT command, card status 0x400900 ... blk_update_request: I/O error, dev mmcblk0, sector 0 Buffer I/O error on dev mmcblk0, logical block 0, async page read mmcblk0: unable to read partition table Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mmc: mxs: Initialize the spinlock prior to using itFabio Estevam2017-04-111-2/+2
| | | | | | | | | | | | | | | | | commit f91346e8b5f46aaf12f1df26e87140584ffd1b3f upstream. An interrupt may occur right after devm_request_irq() is called and prior to the spinlock initialization, leading to a kernel oops, as the interrupt handler uses the spinlock. In order to prevent this problem, move the spinlock initialization prior to requesting the interrupts. Fixes: e4243f13d10e (mmc: mxs-mmc: add mmc host driver for i.MX23/28) Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Reviewed-by: Marek Vasut <marex@denx.de> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* PM / sleep: fix device reference leak in test_suspendJohan Hovold2017-04-111-1/+3
| | | | | | | | | | | | commit ceb75787bc75d0a7b88519ab8a68067ac690f55a upstream. Make sure to drop the reference taken by class_find_device() after opening the RTC device. Fixes: 77437fd4e61f (pm: boot time suspend selftest) Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* mfd: core: Fix device reference leak in mfd_clone_cellJohan Hovold2017-04-111-0/+2
| | | | | | | | | | | | commit 722f191080de641f023feaa7d5648caf377844f5 upstream. Make sure to drop the reference taken by bus_find_device_by_name() before returning from mfd_clone_cell(). Fixes: a9bbba996302 ("mfd: add platform_device sharing support for mfd") Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* ratelimit: fix bug in time interval by resetting right begin timeJaewon Kim2017-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c2594bc37f4464bc74f2c119eb3269a643400aa0 upstream. rs->begin in ratelimit is set in two cases. 1) when rs->begin was not initialized 2) when rs->interval was passed For case #2, current ratelimit sets the begin to 0. This incurrs improper suppression. The begin value will be set in the next ratelimit call by 1). Then the time interval check will be always false, and rs->printed will not be initialized. Although enough time passed, ratelimit may return 0 if rs->printed is not less than rs->burst. To reset interval properly, begin should be jiffies rather than 0. For an example code below: static DEFINE_RATELIMIT_STATE(mylimit, 1, 1); for (i = 1; i <= 10; i++) { if (__ratelimit(&mylimit)) printk("ratelimit test count %d\n", i); msleep(3000); } test result in the current code shows suppression even there is 3 seconds sleep. [ 78.391148] ratelimit test count 1 [ 81.295988] ratelimit test count 2 [ 87.315981] ratelimit test count 4 [ 93.336267] ratelimit test count 6 [ 99.356031] ratelimit test count 8 [ 105.376367] ratelimit test count 10 Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* rcu: Fix soft lockup for rcu_nocb_kthreadDing Tianhong2017-04-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit bedc1969150d480c462cdac320fa944b694a7162 upstream. Carrying out the following steps results in a softlockup in the RCU callback-offload (rcuo) kthreads: 1. Connect to ixgbevf, and set the speed to 10Gb/s. 2. Use ifconfig to bring the nic up and down repeatedly. [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15] [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000 [ 368.106005] RIP: 0010:[<ffffffff81579e04>] [<ffffffff81579e04>] fib_table_lookup+0x14/0x390 [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286 [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001 [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00 [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000 [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58 [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0 [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000 [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0 [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 368.106005] Stack: [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00 [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146 [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0 [ 368.106005] Call Trace: [ 368.106005] <IRQ> [ 368.106005] [ 368.106005] [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0 [ 368.106005] [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110 [ 368.106005] [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0 [ 368.106005] [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350 [ 368.106005] [<ffffffff81537034>] ip_rcv+0x234/0x380 [ 368.106005] [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870 [ 368.106005] [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60 [ 368.106005] [<ffffffff814fe4de>] process_backlog+0xae/0x180 [ 368.106005] [<ffffffff814fdcb2>] net_rx_action+0x152/0x240 [ 368.106005] [<ffffffff81077b3f>] __do_softirq+0xef/0x280 [ 368.106005] [<ffffffff8161619c>] call_softirq+0x1c/0x30 [ 368.106005] <EOI> [ 368.106005] [ 368.106005] [<ffffffff81015d95>] do_softirq+0x65/0xa0 [ 368.106005] [<ffffffff81077174>] local_bh_enable+0x94/0xa0 [ 368.106005] [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370 [ 368.106005] [<ffffffff81098250>] ? wake_up_bit+0x30/0x30 [ 368.106005] [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40 [ 368.106005] [<ffffffff8109728f>] kthread+0xcf/0xe0 [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140 [ 368.106005] [<ffffffff816147d8>] ret_from_fork+0x58/0x90 [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140 ==================================cut here============================== It turns out that the rcuos callback-offload kthread is busy processing a very large quantity of RCU callbacks, and it is not reliquishing the CPU while doing so. This commit therefore adds an cond_resched_rcu_qs() within the loop to allow other tasks to run. [js] use onlu cond_resched() in 3.12 Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Dhaval Giani <dhaval.giani@oracle.com> Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Willy Tarreau <w@1wt.eu>
* tools/vm/slabinfo: fix an unintentional printfDan Carpenter2017-04-111-1/+2
| | | | | | | | | | | | | | | | | commit 2d6a4d64812bb12dda53704943b61a7496d02098 upstream. The curly braces are missing here so we print stuff unintentionally. Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling') Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Christoph Lameter <cl@linux.com> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Laura Abbott <labbott@fedoraproject.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* lib/genalloc.c: start search from start of chunkDaniel Mentz2017-04-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 62e931fac45b17c2a42549389879411572f75804 upstream. gen_pool_alloc_algo() iterates over the chunks of a pool trying to find a contiguous block of memory that satisfies the allocation request. The shortcut if (size > atomic_read(&chunk->avail)) continue; makes the loop skip over chunks that do not have enough bytes left to fulfill the request. There are two situations, though, where an allocation might still fail: (1) The available memory is not contiguous, i.e. the request cannot be fulfilled due to external fragmentation. (2) A race condition. Another thread runs the same code concurrently and is quicker to grab the available memory. In those situations, the loop calls pool->algo() to search the entire chunk, and pool->algo() returns some value that is >= end_bit to indicate that the search failed. This return value is then assigned to start_bit. The variables start_bit and end_bit describe the range that should be searched, and this range should be reset for every chunk that is searched. Today, the code fails to reset start_bit to 0. As a result, prefixes of subsequent chunks are ignored. Memory allocations might fail even though there is plenty of room left in these prefixes of those other chunks. Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless") Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com Signed-off-by: Daniel Mentz <danielmentz@google.com> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* drbd: Fix kernel_sendmsg() usage - potential NULL derefRichard Weinberger2017-04-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit d8e9e5e80e882b4f90cba7edf1e6cb7376e52e54 upstream. Don't pass a size larger than iov_len to kernel_sendmsg(). Otherwise it will cause a NULL pointer deref when kernel_sendmsg() returns with rv < size. DRBD as external module has been around in the kernel 2.4 days already. We used to be compatible to 2.4 and very early 2.6 kernels, we used to use rv = sock_sendmsg(sock, &msg, iov.iov_len); then later changed to rv = kernel_sendmsg(sock, &msg, &iov, 1, size); when we should have used rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len); tcp_sendmsg() used to totally ignore the size parameter. 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives changes that, and exposes our long standing error. Even with this error exposed, to trigger the bug, we would need to have an environment (config or otherwise) causing us to not use sendpage() for larger transfers, a failing connection, and have it fail "just at the right time". Apparently that was unlikely enough for most, so this went unnoticed for years. Still, it is known to trigger at least some of these, and suspected for the others: [0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html [1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html [2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546 [3] https://ubuntuforums.org/showthread.php?t=2336150 [4] http://e2.howsolveproblem.com/i/1175162/ This should go into 4.9, and into all stable branches since and including v4.0, which is the first to contain the exposing change. It is correct for all stable branches older than that as well (which contain the DRBD driver; which is 2.6.33 and up). It requires a small "conflict" resolution for v4.4 and earlier, with v4.5 we dropped the comment block immediately preceding the kernel_sendmsg(). Fixes: b411b3637fa7 ("The DRBD driver") Cc: viro@zeniv.linux.org.uk Cc: christoph.lechleitner@iteg.at Cc: wolfgang.glas@iteg.at Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Signed-off-by: Richard Weinberger <richard@nod.at> [changed oneliner to be "obvious" without context; more verbose message] Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* cfq: fix starvation of asynchronous writesGlauber Costa2017-04-111-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3932a86b4b9d1f0b049d64d4591ce58ad18b44ec upstream. While debugging timeouts happening in my application workload (ScyllaDB), I have observed calls to open() taking a long time, ranging everywhere from 2 seconds - the first ones that are enough to time out my application - to more than 30 seconds. The problem seems to happen because XFS may block on pending metadata updates under certain circumnstances, and that's confirmed with the following backtrace taken by the offcputime tool (iovisor/bcc): ffffffffb90c57b1 finish_task_switch ffffffffb97dffb5 schedule ffffffffb97e310c schedule_timeout ffffffffb97e1f12 __down ffffffffb90ea821 down ffffffffc046a9dc xfs_buf_lock ffffffffc046abfb _xfs_buf_find ffffffffc046ae4a xfs_buf_get_map ffffffffc046babd xfs_buf_read_map ffffffffc0499931 xfs_trans_read_buf_map ffffffffc044a561 xfs_da_read_buf ffffffffc0451390 xfs_dir3_leaf_read.constprop.16 ffffffffc0452b90 xfs_dir2_leaf_lookup_int ffffffffc0452e0f xfs_dir2_leaf_lookup ffffffffc044d9d3 xfs_dir_lookup ffffffffc047d1d9 xfs_lookup ffffffffc0479e53 xfs_vn_lookup ffffffffb925347a path_openat ffffffffb9254a71 do_filp_open ffffffffb9242a94 do_sys_open ffffffffb9242b9e sys_open ffffffffb97e42b2 entry_SYSCALL_64_fastpath 00007fb0698162ed [unknown] Inspecting my run with blktrace, I can see that the xfsaild kthread exhibit very high "Dispatch wait" times, on the dozens of seconds range and consistent with the open() times I have saw in that run. Still from the blktrace output, we can after searching a bit, identify the request that wasn't dispatched: 8,0 11 152 81.092472813 804 A WM 141698288 + 8 <- (8,1) 141696240 8,0 11 153 81.092472889 804 Q WM 141698288 + 8 [xfsaild/sda1] 8,0 11 154 81.092473207 804 G WM 141698288 + 8 [xfsaild/sda1] 8,0 11 206 81.092496118 804 I WM 141698288 + 8 ( 22911) [xfsaild/sda1] <==== 'I' means Inserted (into the IO scheduler) ===================================> 8,0 0 289372 96.718761435 0 D WM 141698288 + 8 (15626265317) [swapper/0] <==== Only 15s later the CFQ scheduler dispatches the request ======================> As we can see above, in this particular example CFQ took 15 seconds to dispatch this request. Going back to the full trace, we can see that the xfsaild queue had plenty of opportunity to run, and it was selected as the active queue many times. It would just always be preempted by something else (example): 8,0 1 0 81.117912979 0 m N cfq1618SN / insert_request 8,0 1 0 81.117913419 0 m N cfq1618SN / add_to_rr 8,0 1 0 81.117914044 0 m N cfq1618SN / preempt 8,0 1 0 81.117914398 0 m N cfq767A / slice expired t=1 8,0 1 0 81.117914755 0 m N cfq767A / resid=40 8,0 1 0 81.117915340 0 m N / served: vt=1948520448 min_vt=1948520448 8,0 1 0 81.117915858 0 m N cfq767A / sl_used=1 disp=0 charge=0 iops=1 sect=0 where cfq767 is the xfsaild queue and cfq1618 corresponds to one of the ScyllaDB IO dispatchers. The requests preempting the xfsaild queue are synchronous requests. That's a characteristic of ScyllaDB workloads, as we only ever issue O_DIRECT requests. While it can be argued that preempting ASYNC requests in favor of SYNC is part of the CFQ logic, I don't believe that doing so for 15+ seconds is anyone's goal. Moreover, unless I am misunderstanding something, that breaks the expectation set by the "fifo_expire_async" tunable, which in my system is set to the default. Looking at the code, it seems to me that the issue is that after we make an async queue active, there is no guarantee that it will execute any request. When the queue itself tests if it cfq_may_dispatch() it can bail if it sees SYNC requests in flight. An incoming request from another queue can also preempt it in such situation before we have the chance to execute anything (as seen in the trace above). This patch sets the must_dispatch flag if we notice that we have requests that are already fifo_expired. This flag is always cleared after cfq_dispatch_request() returns from cfq_dispatch_requests(), so it won't pin the queue for subsequent requests (unless they are themselves expired) Care is taken during preempt to still allow rt requests to preempt us regardless. Testing my workload with this patch applied produces much better results. From the application side I see no timeouts, and the open() latency histogram generated by systemtap looks much better, with the worst outlier at 131ms: Latency histogram of xfs_buf_lock acquisition (microseconds): value |-------------------------------------------------- count 0 | 11 1 |@@@@ 161 2 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1966 4 |@ 54 8 | 36 16 | 7 32 | 0 64 | 0 ~ 1024 | 0 2048 | 0 4096 | 1 8192 | 1 16384 | 2 32768 | 0 65536 | 0 131072 | 1 262144 | 0 524288 | 0 Signed-off-by: Glauber Costa <glauber@scylladb.com> CC: Jens Axboe <axboe@kernel.dk> CC: linux-block@vger.kernel.org CC: linux-kernel@vger.kernel.org Signed-off-by: Glauber Costa <glauber@scylladb.com> Signed-off-by: Jens Axboe <axboe@fb.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* Revert "ipc/sem.c: optimize sem_lock()"Willy Tarreau2017-04-111-8/+0
| | | | | | | | | | | | This reverts commit 901f6fedc5340d66e2ca67c70dfee926cb5a1ea0 (upstream commit 6d07b68ce16ae9535955ba2059dedba5309c3ca1). As suggested in commit 5864a2fd3088db73d47942370d0f7210a807b9bc (ipc/sem.c: fix complex_count vs. simple op race) since it introduces a regression and the candidate fix requires too many changes for 3.10. Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Willy Tarreau <w@1wt.eu>
* kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscdMichal Hocko2017-04-111-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 735f2770a770156100f534646158cb58cb8b2939 upstream. Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") has caused a subtle regression in nscd which uses CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the shared databases, so that the clients are notified when nscd is restarted. Now, when nscd uses a non-persistent database, clients that have it mapped keep thinking the database is being updated by nscd, when in fact nscd has created a new (anonymous) one (for non-persistent databases it uses an unlinked file as backend). The original proposal for the CLONE_CHILD_CLEARTID change claimed (https://lkml.org/lkml/2006/10/25/233): : The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls : on behalf of pthread_create() library calls. This feature is used to : request that the kernel clear the thread-id in user space (at an address : provided in the syscall) when the thread disassociates itself from the : address space, which is done in mm_release(). : : Unfortunately, when a multi-threaded process incurs a core dump (such as : from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of : the other threads, which then proceed to clear their user-space tids : before synchronizing in exit_mm() with the start of core dumping. This : misrepresents the state of process's address space at the time of the : SIGSEGV and makes it more difficult for someone to debug NPTL and glibc : problems (misleading him/her to conclude that the threads had gone away : before the fault). : : The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a : core dump has been initiated. The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269) seems to have a larger scope than the original patch asked for. It seems that limitting the scope of the check to core dumping should work for SIGSEGV issue describe above. [Changelog partly based on Andreas' description] Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit") Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Tested-by: William Preston <wpreston@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Andreas Schwab <schwab@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* tracing: Move mutex to protect against resetting of seq dataSteven Rostedt (Red Hat)2017-04-111-7/+8
| | | | | | | | | | | | commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream. The iter->seq can be reset outside the protection of the mutex. So can reading of user data. Move the mutex up to the beginning of the function. Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants") Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Willy Tarreau <w@1wt.eu>
* kaweth: fix firmware downloadOliver Neukum2017-04-111-2/+1
| | | | | | | | | | | commit 60bcabd080f53561efa9288be45c128feda1a8bb upstream. This fixes the oops discovered by the Umap2 project and Alan Stern. The intf member needs to be set before the firmware is downloaded. Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>
* net: sky2: Fix shutdown crashJeremy Linton2017-04-111-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 06ba3b2133dc203e1e9bc36cee7f0839b79a9e8b upstream. The sky2 frequently crashes during machine shutdown with: sky2_get_stats+0x60/0x3d8 [sky2] dev_get_stats+0x68/0xd8 rtnl_fill_stats+0x54/0x140 rtnl_fill_ifinfo+0x46c/0xc68 rtmsg_ifinfo_build_skb+0x7c/0xf0 rtmsg_ifinfo.part.22+0x3c/0x70 rtmsg_ifinfo+0x50/0x5c netdev_state_change+0x4c/0x58 linkwatch_do_dev+0x50/0x88 __linkwatch_run_queue+0x104/0x1a4 linkwatch_event+0x30/0x3c process_one_work+0x140/0x3e0 worker_thread+0x60/0x44c kthread+0xdc/0xf0 ret_from_fork+0x10/0x50 This is caused by the sky2 being called after it has been shutdown. A previous thread about this can be found here: https://lkml.org/lkml/2016/4/12/410 An alternative fix is to assure that IFF_UP gets cleared by calling dev_close() during shutdown. This is similar to what the bnx2/tg3/xgene and maybe others are doing to assure that the driver isn't being called following _shutdown(). Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Willy Tarreau <w@1wt.eu>