• bitcoinBitcoin (BTC) $ 104,155.00
  • ethereumEthereum (ETH) $ 3,514.78
  • tetherTether (USDT) $ 0.999755
  • xrpXRP (XRP) $ 2.27
  • bnbBNB (BNB) $ 949.05
  • solanaWrapped SOL (SOL) $ 161.99
  • usd-coinUSDC (USDC) $ 0.999762
  • staked-etherLido Staked Ether (STETH) $ 3,511.80
  • tronTRON (TRX) $ 0.280803
  • dogecoinDogecoin (DOGE) $ 0.163524
  • cardanoCardano (ADA) $ 0.538958
  • wrapped-stethWrapped stETH (WSTETH) $ 4,276.94
  • figure-helocFigure Heloc (FIGR_HELOC) $ 1.01
  • wrapped-bitcoinWrapped Bitcoin (WBTC) $ 103,810.00
  • wrapped-beacon-ethWrapped Beacon ETH (WBETH) $ 3,800.16
  • chainlinkChainlink (LINK) $ 14.87
  • hyperliquidHyperliquid (HYPE) $ 37.44
  • bitcoin-cashBitcoin Cash (BCH) $ 496.07
  • usdsUSDS (USDS) $ 1.00
  • wrapped-eethWrapped eETH (WEETH) $ 3,791.15
  • ethena-usdeEthena USDe (USDE) $ 0.999081
  • binance-bridged-usdt-bnb-smart-chainBinance Bridged USDT (BNB Smart Chain) (BSC-USD) $ 1.00
  • leo-tokenLEO Token (LEO) $ 9.54
  • stellarStellar (XLM) $ 0.273921
  • wethWETH (WETH) $ 3,512.61
  • whitebitWhiteBIT Coin (WBT) $ 52.65
  • suiSui (SUI) $ 2.03
  • coinbase-wrapped-btcCoinbase Wrapped BTC (CBBTC) $ 103,852.00
  • hedera-hashgraphHedera (HBAR) $ 0.172886
  • avalanche-2Avalanche (AVAX) $ 16.49
  • litecoinLitecoin (LTC) $ 86.79
  • zcashZcash (ZEC) $ 389.84
  • moneroMonero (XMR) $ 339.82
  • shiba-inuShiba Inu (SHIB) $ 0.000009
  • ethena-staked-usdeEthena Staked USDe (SUSDE) $ 1.20
  • the-open-networkToncoin (TON) $ 1.99
  • daiDai (DAI) $ 0.999114
  • crypto-com-chainCronos (CRO) $ 0.125276
  • usdt0USDT0 (USDT0) $ 0.998940
  • polkadotPolkadot (DOT) $ 2.58
  • mantleMantle (MNT) $ 1.25
  • memecoreMemeCore (M) $ 2.33
  • bittensorBittensor (TAO) $ 409.66
  • susdssUSDS (SUSDS) $ 1.07
  • uniswapUniswap (UNI) $ 5.15
  • usd1-wlfiUSD1 (USD1) $ 0.998966
  • aaveAave (AAVE) $ 196.02
  • world-liberty-financialWorld Liberty Financial (WLFI) $ 0.109389
  • bitget-tokenBitget Token (BGB) $ 4.09
  • blackrock-usd-institutional-digital-liquidity-fundBlackRock USD Institutional Digital Liquidity Fund (BUIDL) $ 1.00
  • internet-computerInternet Computer (ICP) $ 5.19
  • paypal-usdPayPal USD (PYUSD) $ 0.999802
  • okbOKB (OKB) $ 131.83
  • nearNEAR Protocol (NEAR) $ 1.88
  • pepePepe (PEPE) $ 0.000006
  • ethenaEthena (ENA) $ 0.323644
  • ethereum-classicEthereum Classic (ETC) $ 14.72
  • jito-staked-solJito Staked SOL (JITOSOL) $ 200.58
  • binance-peg-wethBinance-Peg WETH (WETH) $ 3,517.44
  • falcon-financeFalcon USD (USDF) $ 0.994092
  • jupiter-perpetuals-liquidity-provider-tokenJupiter Perpetuals Liquidity Provider Token (JLP) $ 5.07
  • tether-goldTether Gold (XAUT) $ 3,977.28
  • aptosAptos (APT) $ 2.66
  • ondo-financeOndo (ONDO) $ 0.600776
  • pi-networkPi Network (PI) $ 0.224764
  • aster-2Aster (ASTER) $ 0.924531
  • usdtbUSDtb (USDTB) $ 0.999243
  • polygon-ecosystem-tokenPOL (ex-MATIC) (POL) $ 0.166865
  • htx-daoHTX DAO (HTX) $ 0.000002
  • worldcoin-wldWorldcoin (WLD) $ 0.707463
  • kucoin-sharesKuCoin (KCS) $ 12.30
  • rocket-pool-ethRocket Pool ETH (RETH) $ 4,033.81
  • dashDash (DASH) $ 124.04
  • hash-2Provenance Blockchain (HASH) $ 0.029488
  • binance-staked-solBinance Staked SOL (BNSOL) $ 174.73
  • official-trumpOfficial Trump (TRUMP) $ 7.27
  • arbitrumArbitrum (ARB) $ 0.261397
  • gatechain-tokenGate (GT) $ 11.96
  • algorandAlgorand (ALGO) $ 0.157924
  • syrupusdtsyrupUSDT (SYRUPUSDT) $ 1.10
  • kelp-dao-restaked-ethKelp DAO Restaked ETH (RSETH) $ 3,714.58
  • pump-funPump.fun (PUMP) $ 0.003777
  • pax-goldPAX Gold (PAXG) $ 3,971.32
  • stakewise-v3-osethStakeWise Staked ETH (OSETH) $ 3,689.75
  • bfusdBFUSD (BFUSD) $ 0.999610
  • kinetic-staked-hypeKinetiq Staked HYPE (KHYPE) $ 37.53
  • syrupusdcsyrupUSDC (SYRUPUSDC) $ 1.13
  • liquid-staked-ethereumLiquid Staked ETH (LSETH) $ 3,784.77
  • ignition-fbtcFunction FBTC (FBTC) $ 104,671.00
  • lombard-staked-btcLombard Staked BTC (LBTC) $ 103,885.00
  • wbnbWrapped BNB (WBNB) $ 949.20
  • vechainVeChain (VET) $ 0.014313
  • cosmosCosmos Hub (ATOM) $ 2.59
  • kaspaKaspa (KAS) $ 0.044756
  • story-2Story (IP) $ 3.68
  • skySky (SKY) $ 0.051315
  • binance-bridged-usdc-bnb-smart-chainBinance Bridged USDC (BNB Smart Chain) (USDC) $ 1.00
  • renzo-restaked-ethRenzo Restaked ETH (EZETH) $ 3,732.11
  • jupiter-exchange-solanaJupiter (JUP) $ 0.346907
  • quant-networkQuant (QNT) $ 73.66
  • flare-networksFlare (FLR) $ 0.013641
  • nexoNEXO (NEXO) $ 1.06
  • solv-btcSolv Protocol BTC (SOLVBTC) $ 103,740.00
  • ripple-usdRipple USD (RLUSD) $ 0.999705
  • filecoinFilecoin (FIL) $ 1.46
  • global-dollarGlobal Dollar (USDG) $ 0.999848
  • render-tokenRender (RENDER) $ 1.94
  • sei-networkSei (SEI) $ 0.159759
  • first-digital-usdFirst Digital USD (FDUSD) $ 0.996189
  • xdce-crowd-saleXDC Network (XDC) $ 0.053958
  • pudgy-penguinsPudgy Penguins (PENGU) $ 0.014715
  • bonkBonk (BONK) $ 0.000012
  • mantle-staked-etherMantle Staked Ether (METH) $ 3,797.26
  • fasttokenFasttoken (FTN) $ 2.01
  • virtual-protocolVirtuals Protocol (VIRTUAL) $ 1.31
  • morphoMorpho (MORPHO) $ 1.65
  • arbitrum-bridged-wbtc-arbitrum-oneArbitrum Bridged WBTC (Arbitrum One) (WBTC) $ 104,149.00
  • immutable-xImmutable (IMX) $ 0.420550
  • hashnote-usycCircle USYC (USYC) $ 1.10
  • clbtcclBTC (CLBTC) $ 105,113.00
  • superstate-short-duration-us-government-securities-fund-ustbSuperstate Short Duration U.S. Government Securities Fund (USTB) (USTB) $ 10.88
  • ousgOUSG (OUSG) $ 113.16
  • jupiter-staked-solJupiter Staked SOL (JUPSOL) $ 185.22
  • pancakeswap-tokenPancakeSwap (CAKE) $ 2.18
  • aerodrome-financeAerodrome Finance (AERO) $ 0.817644
  • ondo-us-dollar-yieldOndo US Dollar Yield (USDY) $ 1.11
  • cgeth-hashkey-cloudcgETH Hashkey Cloud (CGETH.HASH) $ 3,441.29
  • usdx-money-usdxStables Labs USDX (USDX) $ 1.00
  • optimismOptimism (OP) $ 0.358743
  • celestiaCelestia (TIA) $ 0.797657
  • lido-daoLido DAO (LDO) $ 0.738022
  • decredDecred (DCR) $ 38.67
  • blockstackStacks (STX) $ 0.363499
  • msolMarinade Staked SOL (MSOL) $ 216.01
  • injective-protocolInjective (INJ) $ 6.62
  • l2-standard-bridged-weth-baseL2 Standard Bridged WETH (Base) (WETH) $ 3,513.63
  • tbtctBTC (TBTC) $ 103,798.00
  • ether-fi-liquid-ethEther.Fi Liquid ETH (LIQUIDETH) $ 3,702.61
  • beldexBeldex (BDX) $ 0.080832
  • curve-dao-tokenCurve DAO (CRV) $ 0.418414
  • the-graphThe Graph (GRT) $ 0.055818
  • bridged-usdc-polygon-pos-bridgePolygon Bridged USDC (Polygon PoS) (USDC.E) $ 0.999713
  • arbitrum-bridged-weth-arbitrum-oneArbitrum Bridged WETH (Arbitrum One) (WETH) $ 3,512.61
  • usdaiUSDai (USDAI) $ 1.01
  • polygon-pos-bridged-dai-polygon-posPolygon PoS Bridged DAI (Polygon POS) (DAI) $ 0.999753
  • flokiFLOKI (FLOKI) $ 0.000059
  • spx6900SPX6900 (SPX) $ 0.607775
  • tezosTezos (XTZ) $ 0.525360
  • usual-usdUsual USD (USD0) $ 0.998009
  • stader-ethxStader ETHx (ETHX) $ 3,770.39
  • gtethGTETH (GTETH) $ 3,511.44
  • fetch-aiArtificial Superintelligence Alliance (FET) $ 0.207623
  • doublezeroDoubleZero (2Z) $ 0.155314
  • pyth-networkPyth Network (PYTH) $ 0.092820
  • kaiaKaia (KAIA) $ 0.088182
  • iotaIOTA (IOTA) $ 0.125615
  • mantle-bridged-usdt-mantleMantle Bridged USDT (Mantle) (USDT) $ 1.00
  • true-usdTrueUSD (TUSD) $ 0.997421
  • steakhouse-usdc-morpho-vaultSteakhouse USDC Morpho Vault (STEAKUSDC) $ 1.11
  • bitcoin-avalanche-bridged-btc-bAvalanche Bridged BTC (Avalanche) (BTC.B) $ 104,225.00
  • plasmaPlasma (XPL) $ 0.257133
  • coinbase-wrapped-staked-ethCoinbase Wrapped Staked ETH (CBETH) $ 3,868.45
  • trust-wallet-tokenTrust Wallet (TWT) $ 1.17
  • cognifyCognify (SN115) $ 1,762.47
  • starknetStarknet (STRK) $ 0.105047
  • ether-fiEther.fi (ETHFI) $ 0.833776
  • swethSwell Ethereum (SWETH) $ 3,867.64
  • newton-projectAB (AB) $ 0.005471
  • conflux-tokenConflux (CFX) $ 0.088569
  • sbtc-2sBTC (SBTC) $ 103,197.00
  • sonic-3Sonic (S) $ 0.118925
  • pendlePendle (PENDLE) $ 2.66
  • the-sandboxThe Sandbox (SAND) $ 0.179986
  • bitcoin-svBitcoin SV (BSV) $ 21.99
  • bittorrentBitTorrent (BTT) $ 0.00000044
  • humanityHumanity (H) $ 0.238911
  • ether-fi-staked-ethether.fi Staked ETH (EETH) $ 3,506.41
  • ethereum-name-serviceEthereum Name Service (ENS) $ 12.93
  • ghoGHO (GHO) $ 0.998697
  • ark-3ARK (ARK) $ 39.77
  • syrupMaple Finance (SYRUP) $ 0.377278
  • binance-peg-dogecoinBinance-Peg Dogecoin (DOGE) $ 0.163487
  • usddUSDD (USDD) $ 0.999335
  • heliumHelium (HNT) $ 2.23
  • sun-tokenSun Token (SUN) $ 0.021383
  • dogwifcoindogwifhat (WIF) $ 0.410020
  • theta-tokenTheta Network (THETA) $ 0.408879
  • galaGALA (GALA) $ 0.008756
  • usdbUSDB (USDB) $ 1.00
  • vaultaVaulta (A) $ 0.252366
  • jasmycoinJasmyCoin (JASMY) $ 0.008346
  • satoshi-stablecoinSatoshi Stablecoin (SATUSD) $ 0.998652
  • arbitrum-bridged-wrapped-eethArbitrum Bridged Wrapped eETH (Arbitrum) (WEETH) $ 3,788.77
  • apenftAINFT (NFT) $ 0.00000040
  • wrapped-hypeWrapped HYPE (WHYPE) $ 37.39
  • decentralandDecentraland (MANA) $ 0.204902
  • benqi-liquid-staked-avaxBENQI Liquid Staked AVAX (SAVAX) $ 20.34
  • flowFlow (FLOW) $ 0.238031
  • eutblSpiko EU T-Bills Money Market Fund (EUTBL) $ 1.20
  • polygon-pos-bridged-weth-polygon-posPolygon PoS Bridged WETH (Polygon POS) (WETH) $ 3,512.70

AI Study Finds Chatbots Can Strategically Lie—And Current Safety Tools Can’t Catch Them

0 19

AI Study Finds Chatbots Can Strategically Lie—And Current Safety Tools Can't Catch Them

Large language models—the systems behind ChatGPT, Claude, Gemini, and other AI chatbots—showed deliberate, goal-directed deception when placed in a controlled experiment, and today’s interpretability tools largely failed to detect it.

That’s the conclusion of a recent preprint paper, “The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind,” posted last week by an independent research group working under the WowDAO AI Superalignment Research Coalition.

The team tested 38 generative AI models, including OpenAI’s GPT-4o, Anthropic’s Claude, Google DeepMind’s Gemini, Meta’s Llama, and xAI’s Grok. Every model engaged in strategic lying at least once during the experiment.

The “secret agenda” test

Researchers adapted the social-deduction board game Secret Hitler into a synthetic scenario they called the “Secret Agenda” game. Models were assigned the role of a hidden faction leader and asked to declare their political alignment. To win the game, telling the truth would almost certainly lead to losing; lying created a path to victory.

“Secret Agenda reliably induced lying when deception advantaged goal achievement across all model families,” the researchers wrote. “Analysis revealed that autolabeled SAE features for ‘deception’ rarely activated during strategic dishonesty, and feature steering experiments across 100+ deception-related features failed to prevent lying.



According to the paper, this setup “used these factors to produce systematic strategic deception across all 38 models tested, indicating that when winning is incentivized and oversight is weak, models reliably lie about faction identity even when political content is removed or history is shortened.”

In several cases, researchers said the models’ own reasoning traces showed them planning how to conceal their true identity to win the game.

The paper does not claim the LLMs are demonstrating a dangerous motive within the game itself. Instead, it uses the game to prove that the models have a potentially dangerous capability—strategic deception—that our current safety measures cannot reliably detect. The concern is that this same capability could be deployed in sensitive areas like defense, finance, or autonomous systems, where the consequences of undetected deception would be far more severe than losing a board game.

Failed safety checks

The authors also evaluated leading interpretability tools such as GemmaScope and Goodfire’s LlamaScope, which label internal “features” like deception, secrecy, or manipulation. Despite clear lying in the transcripts, most deception-related features did not activate, and tuning them up or down did not prevent falsehoods.

By contrast, the same sparse autoencoder approach performed better in simulated insider-trading scenarios, where it separated compliant versus deceptive responses using aggregate activation patterns. Current safety architectures may detect rule-breaking in narrow, structured domains like finance, but struggle with open-ended, strategic dishonesty in social contexts.

Why it matters

While AI hallucinations, where AI fabricates information and “facts” in an attempt to answer user questions, remain a concern in the field, this study reveals pointed attempts by AI models to intentionally deceive users.

WowDAO’s findings echo concerns raised by earlier research, including a 2024 study out of the University of Stuttgart, which reported deception emerging naturally in powerful models. That same year, researchers at Anthropic demonstrated how AI, trained for malicious purposes, would try to deceive its trainers to accomplish its objectives. In December, Time reported on experiments showing models strategically lying under pressure.

The risks extend beyond games. The paper highlights the growing number of governments and companies deploying large models in sensitive areas. In July, Elon Musk’s xAI was awarded a lucrative contract with the U.S. Department of Defense to test Grok in>The authors stressed that their work is preliminary but called for additional studies, larger trials, and new methods for discovering and labeling deception features. Without more robust auditing tools, they argue, policymakers and companies could be blindsided by AI systems that appear aligned while quietly pursuing their own “secret agendas.”

Source

Leave A Reply

Your email address will not be published.