This is a good question. In such a case there would still be a fixed set of action probabilities, but they would vary contextually given the state. In a card game, for example, the cards may change, but the maximum number of cards a single player can hold at a given time is often limited. If a player can have up to 10 cards in their hand, then each action could correspond to playing the card in slot 1, 2, 3, ..., 10. The network would be given an input tensor conveying the actual content of each of those "card positions," and the agent could act contextually given this information.
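Here's a minimal sketch of the idea, using a toy linear policy in NumPy (the deck size, hand size, and helper names are just illustrative assumptions, not from any particular game). Each of the 10 fixed actions means "play the card in slot i," the state encodes which card occupies each slot, and empty slots are masked out so they get zero probability:

```python
import numpy as np

# Hypothetical setup: 52 card types, a hand of up to 10 slots.
NUM_CARD_TYPES = 52
HAND_SIZE = 10

rng = np.random.default_rng(0)

def encode_hand(hand):
    """One-hot encode each of the 10 card slots; empty slots (None) stay all-zero.
    Returns a flat vector of length HAND_SIZE * NUM_CARD_TYPES."""
    x = np.zeros((HAND_SIZE, NUM_CARD_TYPES))
    for slot, card in enumerate(hand):
        if card is not None:
            x[slot, card] = 1.0
    return x.reshape(-1)

# Toy linear policy: one logit per slot, meaning "play the card in slot i".
W = rng.normal(scale=0.1, size=(HAND_SIZE, HAND_SIZE * NUM_CARD_TYPES))

def action_probs(hand):
    x = encode_hand(hand)
    logits = W @ x
    # Mask empty slots so the agent can't "play" a card it doesn't hold.
    mask = np.array([card is not None for card in hand])
    logits = np.where(mask, logits, -np.inf)
    exp = np.exp(logits - logits[mask].max())  # stable softmax over held cards
    return exp / exp.sum()

hand = [3, 17, 41, None, None, None, None, None, None, None]  # three cards held
p = action_probs(hand)
print(p.round(3))   # fixed-size distribution over the 10 slots
print(p[3:].sum())  # empty slots get zero probability
```

The key point is that the output layer stays a fixed size regardless of which cards are actually in hand; only the input encoding (and the mask) changes from state to state.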
Also, while large action spaces of hundreds or thousands of actions are difficult to work with in discrete settings, policy gradient methods, if trained properly, can support dozens of potential actions, which may be enough for some of the changing scenarios you have in mind.
Hope that helps answer your question!