Shutdown-seeking AI

Journal article

Goldstein, Simon David and Robinson, Pamela. (2024). Shutdown-seeking AI. Philosophical Studies : an international journal for philosophy in the analytic tradition. pp. 1-13. https://doi.org/10.1007/s11098-024-02099-6

Publication dates
Authors	Goldstein, Simon David and Robinson, Pamela
Abstract	We propose developing AIs whose only final goal is being shut down. We argue that this approach to AI safety has three benefits: (i) it could potentially be implemented in reinforcement learning, (ii) it avoids some dangerous instrumental convergence dynamics, and (iii) it creates trip wires for monitoring dangerous capabilities. We also argue that the proposal can overcome a key challenge raised by Soares et al. (2015), that shutdown-seeking AIs will manipulate humans into shutting them down. We conclude by comparing our approach with Soares et al.'s corrigibility framework.
Keywords	AI safety; Instrumental convergence; Reward misspecification
Year	2024
Journal	Philosophical Studies : an international journal for philosophy in the analytic tradition
Journal citation	pp. 1-13
Publisher	Springer Science and Business Media B.V.
ISSN	0031-8116
Digital Object Identifier (DOI)	https://doi.org/10.1007/s11098-024-02099-6
Web address (URL)	https://link.springer.com/article/10.1007/s11098-024-02099-6
Open access	Published as non-open access
Research or scholarly	Research
Page range	1-13
Publisher's version	OA_Goldstein_2024_Shutdown_seeking_AI.pdf License CC BY 4.0 File Access Level Open
Output status	Published
Online	06 Jun 2024
Publication process dates
Accepted	30 Dec 2023
Deposited	04 Oct 2024
Additional information	© The Author(s) 2024
	This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Place of publication	Netherlands

Permalink -

https://acuresearchbank.acu.edu.au/item/90zy8/shutdown-seeking-ai




Related outputs

AI deception : A survey of examples, risks, and potential solutions

Park, Peter S., Goldstein, Simon, O'Gara, Aidan, Chen, Michael and Hendrycks, Dan. (2024). AI deception : A survey of examples, risks, and potential solutions. Patterns. 5(5), pp. 1-16. https://doi.org/10.1016/j.patter.2024.100988

A question-sensitive theory of intention

Beddor, Bob and Goldstein, Simon. (2023). A question-sensitive theory of intention. The Philosophical Quarterly. 73(2), pp. 346-378. https://doi.org/10.1093/pq/pqac031

Getting accurate about knowledge

Carter, Sam and Goldstein, Simon. (2023). Getting accurate about knowledge. Mind. 132(525), pp. 158-191. https://doi.org/10.1093/mind/fzac009

Language agents reduce the risk of existential catastrophe

Goldstein, Simon and Kirk-Giannini, Cameron Domenico. (2023). Language agents reduce the risk of existential catastrophe. AI & Society. pp. 1-11. https://doi.org/10.1007/s00146-023-01748-4

Attitude verbs’ local context

Blumberg, Kyle and Goldstein, Simon. (2023). Attitude verbs’ local context. Linguistics and Philosophy. 46(3), pp. 483-507. https://doi.org/10.1007/s10988-022-09373-y

Fragile knowledge

Goldstein, Simon. (2022). Fragile knowledge. Mind. 131(522), pp. 487-515. https://doi.org/10.1093/mind/fzab040

Contextology

Goldstein, Simon and Kirk-Giannini, Cameron Domenico. (2022). Contextology. Philosophical Studies. 179(11), pp. 3187-3209. https://doi.org/10.1007/s11098-022-01820-7

Sly Pete in dynamic semantics

Goldstein, Simon David. (2022). Sly Pete in dynamic semantics. Journal of Philosophical Logic. 51(5), pp. 1103-1117. https://doi.org/10.1007/s10992-022-09660-w

Knowledge from multiple experiences

Goldstein, Simon and Hawthorne, John. (2022). Knowledge from multiple experiences. Philosophical Studies. 179(4), pp. 1341-1372. https://doi.org/10.1007/s11098-021-01710-4

Counterfactual contamination

Goldstein, Simon and Hawthorne, John. (2022). Counterfactual contamination. Australasian Journal of Philosophy. 100(2), pp. 262-278. https://doi.org/10.1080/00048402.2021.1886129

Probability for epistemic modalities

Goldstein, Simon and Santorio, Paolo. (2021). Probability for epistemic modalities. Philosophers' Imprint. 21(33), pp. 1-37.

Mighty knowledge

Beddor, Bob and Goldstein, Simon. (2021). Mighty knowledge. Journal of Philosophy. 118(5), pp. 229-269. https://doi.org/10.5840/jphil2021118518

The normality of error

Carter, Sam and Goldstein, Simon. (2021). The normality of error. Philosophical Studies. 178, pp. 2509-2533. https://doi.org/10.1007/s11098-020-01560-6

Losing confidence in luminosity

Goldstein, Simon and Waxman, Daniel. (2021). Losing confidence in luminosity. Noûs. 55(4), pp. 962-991. https://doi.org/10.1111/nous.12348

Epistemic modal credence

Goldstein, Simon. (2021). Epistemic modal credence. Philosophers' Imprint. 21(26), pp. 1-24.

The counterfactual direct argument

Goldstein, Simon. (2020). The counterfactual direct argument. Linguistics and Philosophy. 43(2), pp. 193-232. https://doi.org/10.1007/s10988-019-09272-9

Free choice impossibility results

Goldstein, Simon. (2020). Free choice impossibility results. Journal of Philosophical Logic. 49(2), pp. 249-282. https://doi.org/10.1007/s10992-019-09517-9

Conditional heresies

Cariani, Fabrizio and Goldstein, Simon. (2020). Conditional heresies. Philosophy and Phenomenological Research. 101(2), pp. 251-282. https://doi.org/10.1111/phpr.12565

A theory of conditional assertion

Goldstein, Simon. (2019). A theory of conditional assertion. Journal of Philosophy. 116(6), pp. 293-318. https://doi.org/10.5840/jphil2019116620

Generalized update semantics

Goldstein, Simon. (2019). Generalized update semantics. Mind: A Quarterly review of philosophy. 128(511), pp. 795-835. https://doi.org/10.1093/mind/fzy076

Free choice and homogeneity

Goldstein, Simon. (2019). Free choice and homogeneity. Semantics and Pragmatics. 12, pp. 1-47. https://doi.org/10.3765/sp.12.23

Triviality results for probabilistic modals

Goldstein, Simon. (2019). Triviality results for probabilistic modals. Philosophy and Phenomenological Research. 99(1), pp. 188-222. https://doi.org/10.1111/phpr.12477

A stronger doctrine of double effect

Bronner, Ben and Goldstein, Simon. (2018). A stronger doctrine of double effect. Australasian Journal of Philosophy. 96(4), pp. 793-805. https://doi.org/10.1080/00048402.2017.1400572

Believing epistemic contradictions

Beddor, Bob and Goldstein, Simon. (2018). Believing epistemic contradictions. The Review of Symbolic Logic. 11(1), pp. 87-114. https://doi.org/10.1017/S1755020316000514

A preface paradox for intention

Goldstein, Simon. (2016). A preface paradox for intention. Philosophers' Imprint. 16(14), pp. 1-20.