Have a reference story (or a couple), something people can relate to, that represents the upper limit of complexity you consider acceptable for a story (ideally something that can still be completed in under about 3 days). Ask team members to give a thumbs up/thumbs down on whether the story being discussed is “more complex” than that reference story. Discuss.
The key here is that the reference story is a “good story” but at the upper limit. A variation might be to use a “medium” reference story (in the ~1-2 day range). Then vote above, same, or below, have a conversation, etc.
You might also play a splitting game whereby each team member individually attempts to split the story (“is this splittable?”) and then shares their work.
Or ask the team to stack rank the stories in terms of complexity/risk/scope, and then attack the top third to de-risk and split.
… but to be clear, let’s say your complexity ranking game ENDED once you’d stopped splitting, and you never used this artifact elsewhere (for velocity, etc.). I don’t think that would be all that bad. You might play with a set of emoji cards. The issue is extending the discussion of complexity to other systems/jobs.
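If it helps to see the stack-rank variation concretely, here is a rough sketch in Python. The story names and the `top_third_by_rank` helper are made up for illustration, and rounding the "top third" up is my assumption. The ranking is only used to pick what to split and is then thrown away, so nothing feeds into velocity or anything else.

```python
import math


def top_third_by_rank(ranked_stories):
    """Return the most complex third (rounded up) of a stack-ranked list.

    Assumes the list is already ordered by the team, most complex first.
    """
    if not ranked_stories:
        return []
    cutoff = math.ceil(len(ranked_stories) / 3)
    return ranked_stories[:cutoff]


# Hypothetical backlog, stack ranked by the team (most complex/risky first).
ranked = [
    "migrate billing",
    "new onboarding flow",
    "audit log export",
    "rename settings tab",
    "fix typo in footer",
    "update favicon",
]

# These are the stories to de-risk and split; the ranking itself is not kept.
to_split = top_third_by_rank(ranked)
print(to_split)
```

Rounding up means a small backlog still surfaces at least one story to talk about; rounding down would work just as well if the team prefers a shorter list.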