How exactly do professional sound mixers on big-budget movies balance dialogue levels against music (not to mention sound design and Foley)?
I'll answer your question with another question: how did van Gogh paint Starry Night?
Mixing sound for picture is an art form just like any other. You start with the proper tools, then add talent and experience.
Is it common practice to get all the dialogue levels the way you want them, then lay the music underneath at a constant volume, or do music tracks go through many audio level changes depending on whether or not dialogue is being spoken at any given time?
Every rerecording mixer is going to have his/her own process, but most start with the dialog and build around it. Even so, the dialog tracks are constantly readjusted as the rest of the sounds are brought in around them, and when dialog is not present, the sound design, music and score are adjusted second by second to complement the visuals. And volume is far from the only tool used to mix; there's EQ, dynamic processing (limiters, compressors, etc.) and effects (usually reverb). Of course, every scene and film has its own unique requirements, so you may start with the music or Foley/sound FX and work from there. That is part of the challenge of designing and mixing sound-for-picture: you are working in support of the visuals, the story/plot and the characters, and that takes precedence over everything else.
It would seem to me that an audience might notice if a song was playing loudly, then dipped suddenly right as a character spoke their lines, then came back up when they finished.
Now you're getting into decisions that apply to a specific scene in a specific movie. There are a variety of techniques that are used in combination to sustain the proper illusion. Dynamic processing of the music can give the illusion of loudness without making the music really loud. (We won't go into the loudness war currently under way in the music industry.) You use dynamic processing on dialog to make it "pop" out of the mix a little more. You use EQ to carve out "holes" in the frequency spectrum, and/or add frequencies to help the dialog cut through the density of the music.
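One way to picture "carving a hole" with EQ is a peaking filter that cuts a few dB out of the music in the range where speech intelligibility lives. The sketch below uses the peaking-EQ biquad from Robert Bristow-Johnson's Audio EQ Cookbook; the function names are hypothetical, and the 3 kHz center, -4 dB cut and Q of 1 are assumed example values, not settings from any particular mix.

```python
import cmath
import math
from typing import List, Tuple

Coeffs = Tuple[List[float], List[float]]


def peaking_eq(fs: float, f0: float, gain_db: float, q: float) -> Coeffs:
    """Normalized (b, a) biquad coefficients for an RBJ-cookbook peaking EQ.

    A negative gain_db produces a dip ("hole") centered on f0.
    """
    amp = 10 ** (gain_db / 40)          # square root of the linear gain
    w0 = 2 * math.pi * f0 / fs          # center frequency in radians/sample
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    b = [1 + alpha * amp, -2 * cos_w0, 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * cos_w0, 1 - alpha / amp]
    # Normalize so that a[0] == 1
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b: List[float], a: List[float], fs: float, f: float) -> float:
    """Magnitude response of the biquad at frequency f, in dB."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def biquad(samples: List[float], b: List[float], a: List[float]) -> List[float]:
    """Filter a list of float samples (Direct Form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


# Assumed example: a 4 dB dip around 3 kHz on the music track
b, a = peaking_eq(48000, 3000, -4.0, 1.0)
```

With these values the filter cuts exactly 4 dB at 3 kHz and leaves low frequencies (such as 100 Hz) essentially untouched. In a real mix the depth and center of such a cut would typically be automated along with everything else rather than left static.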
Now comes the question of the specific scene. Does the dialog begin almost immediately, or are there 30 or more seconds of establishing shot(s)? What is happening in these shots? What kind of music is being played? How far away from the source of the music are the characters? What kind of conversation is to occur, and what are the implications of the dialog for the story/plot/characters?
*** BTW - It also helps if the editor cuts with the audio needs in mind.
*** BTW #2 - this is not as much of a problem if it is thought out carefully during preproduction and executed properly during production. Here's an example: let's say that the scene is set in a club. Well, what's going on in a club? The music is pounding, and lots of people are speaking loudly to be heard over it. However, I'll get that scene and the actors are speaking as if they are sitting on a couch having an intimate conversation. When the scene was shot, the actors should have been speaking very loudly. Besides being much more realistic, that also raises the "pitch" of the voices and naturally adds harmonic frequencies with more "cut," which makes the dialog more audible in a loud environment. Check out the club scene in "The Social Network": the dialog is entirely realistic, showing it was performed properly on the set, which facilitated the audio post and rerecording process. The audio post community (only half-jokingly) thought the movie should win the Oscar for best sound for this reason alone.
Also, would you typically make an actual cut in the music track to raise or lower the level? Or would you set an audio level automation on that track? Or would you make all audio adjustments with plug-ins on each clip, like a gain plug-in for instance?
You automate everything - volume, EQ, processing and effects.
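Volume automation is essentially a breakpoint curve that the DAW reads as it plays back. A minimal sketch, assuming gain breakpoints in dB that are interpolated linearly between points (real DAWs differ in the curve shapes they offer and in how often they update), shows how a gradual ride can replace an abrupt step. The function names and the example ride are hypothetical.

```python
import bisect
from typing import List, Sequence, Tuple

Breakpoint = Tuple[float, float]  # (time in seconds, gain in dB)


def gain_db_at(points: Sequence[Breakpoint], t: float) -> float:
    """Linearly interpolate the automation curve (in dB) at time t.

    Before the first and after the last breakpoint the value is held.
    Breakpoints must be sorted by time.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect.bisect_right(times, t)   # times[i-1] <= t < times[i]
    (t0, g0), (t1, g1) = points[i - 1], points[i]
    frac = (t - t0) / (t1 - t0)
    return g0 + frac * (g1 - g0)


def apply_automation(samples: List[float], fs: float,
                     points: Sequence[Breakpoint]) -> List[float]:
    """Multiply each sample by the automated gain at its timestamp."""
    out = []
    for n, x in enumerate(samples):
        g = 10 ** (gain_db_at(points, n / fs) / 20)  # dB -> linear gain
        out.append(x * g)
    return out


# Hypothetical ride: hold the song at 0 dB, ease it down 10 dB over half a
# second before a line at 2.0 s, then bring it back up after the line ends.
music_ride = [(0.0, 0.0), (1.5, 0.0), (2.0, -10.0), (5.0, -10.0), (5.8, 0.0)]
```

Interpolating in dB rather than in linear gain is one common choice because equal steps in dB sound roughly like equal changes in loudness; the same breakpoint idea extends to EQ, processing and effect parameters.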
For my specific case, I'm referring to a low key indie feature film with no score, just band music that mostly plays from a source within the scene, like radios and record players. Sometimes a music cue will start with a character listening to music on headphones, but then continues into other scenes.
Now you've changed the entire conversation. Most low/no/mini/micro budget indie filmmakers will not have access to the tools that a professional rerecording mixer uses as a matter of course, nor will they have the knowledge or experience. The biggest issue that most will face is processing power. In essence, you are "rendering" every audio change in real time. It is not at all unusual to run five plug-ins on each of dozens, even hundreds, of audio tracks, plus multiple sub-mixes that may have several plug-ins of their own, plus multiple effects return channels (usually reverbs).
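To put the processing-power point in rough numbers: at a given sample rate and buffer size, every active plug-in has to finish its share of work within one buffer period. The arithmetic below assumes a hypothetical session (48 kHz, 256-sample buffer, 100 tracks with five plug-ins each, eight sub-mixes with three each, four reverb returns) and, unrealistically, that everything runs serially on one core with no overhead; it only illustrates the scale of the budget.

```python
def realtime_budget(fs_hz: float, buffer_samples: int, plugin_instances: int):
    """Return (buffer period in ms, average microseconds per plug-in).

    Assumes every plug-in runs serially on one core within one buffer
    period, ignoring DAW overhead, multicore scheduling and latency.
    """
    buffer_ms = buffer_samples / fs_hz * 1000.0
    per_plugin_us = buffer_ms * 1000.0 / plugin_instances
    return buffer_ms, per_plugin_us


# Hypothetical session: 100 tracks x 5, 8 sub-mixes x 3, 4 reverb returns x 1
instances = 100 * 5 + 8 * 3 + 4 * 1
buffer_ms, per_plugin_us = realtime_budget(48_000, 256, instances)
print(f"{instances} plug-ins, {buffer_ms:.2f} ms per buffer, {per_plugin_us:.1f} us each")
# prints: 528 plug-ins, 5.33 ms per buffer, 10.1 us each
```

Larger buffers buy more time per plug-in at the cost of latency, which is one reason heavy sessions get rendered or "frozen" instead of processed live.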
Probably the easiest thing for a DIY "low key indie" type to do is to keep the original music track, create a preprocessed "futz" (radio or TV or headphone or whatever) version of the music on another track, and cross-fade between the two as needed.
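The futz-and-crossfade approach can be sketched in a few lines. A crude futz is mostly band-limiting plus a little distortion; the version below assumes one-pole filters at roughly 300 Hz and 3 kHz and tanh saturation, all hypothetical choices (a real futz would more likely use steeper filters, speaker impulse-response convolution, or a dedicated plug-in). The crossfade uses the common equal-power (cos/sin) law; whether equal-power or equal-gain sounds smoother depends on how correlated the two versions are. All function names are hypothetical.

```python
import math
from typing import List


def one_pole_lowpass(samples: List[float], fs: float, fc: float) -> List[float]:
    """Simple one-pole low-pass filter with cutoff fc (Hz)."""
    a = 1 - math.exp(-2 * math.pi * fc / fs)
    y, out = 0.0, []
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out


def futz(samples: List[float], fs: float, low_cut: float = 300.0,
         high_cut: float = 3000.0, drive: float = 2.0) -> List[float]:
    """Crude 'small speaker' futz: band-limit, then soft-clip."""
    # High-pass = input minus its own low-passed copy (removes the lows)
    lows = one_pole_lowpass(samples, fs, low_cut)
    hp = [x - lo for x, lo in zip(samples, lows)]
    # Low-pass the result to remove the highs
    band = one_pole_lowpass(hp, fs, high_cut)
    # Gentle tanh saturation, scaled so an input of 1.0 maps to 1.0
    return [math.tanh(drive * s) / math.tanh(drive) for s in band]


def equal_power_crossfade(clean: List[float], futzed: List[float],
                          start: int, length: int) -> List[float]:
    """Fade from clean to futzed over `length` samples starting at `start`."""
    out = []
    for n, (c, f) in enumerate(zip(clean, futzed)):
        p = min(max((n - start) / length, 0.0), 1.0)  # 0 = clean, 1 = futz
        out.append(c * math.cos(p * math.pi / 2) + f * math.sin(p * math.pi / 2))
    return out
```

Because the futz is rendered once ahead of time, playback only costs a crossfade per track, which is what makes this approach practical when processing power is limited.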
I'm sure that A.P.E will chime in since mixing sound-for-picture is what he does for a living.