The comment about the mics is good because I will be shooting both in and out of cabs, so that is another technicality that I need to figure out. As you all say, it is definitely an ongoing learning process.
I shot a subject inside a cab once and the sound was horrible. It got better after the cab driver shut off the AC, but I didn't have a mic on my subject, who was in the backseat, and I was shooting from the front seat, through the partition. It was nighttime, so the background of cars and lights as the cab wove in and out of NYC traffic was great, but it was very difficult to stabilize the camera and the footage needed lots of post-production correcting (Image Stabilization in Avid). Not easy, and I could only use a tiny bit of it, which is a shame because my subject was saying some great stuff back there.
As for a script, you should look at some two-column documentary scripts. One column is for video, one for audio (if you're having a narrator, music, or sound FX, this is important). I find them extremely helpful in planning: even though you don't know what's going to happen or what someone will say, you have a map to follow. Generally, you make one before you shoot, and then revise it afterwards when you know what footage you've got.
I wanted to include a link to a sample of a 2-column script from a short documentary (school project) I made a few years ago, just to give you an idea - but I had to take it off the web. It was a doc about Geo. Washington and the time he spent in NYC, and I used footage of some Rev. War reenactors, still images, and interviews. I narrated it myself. At the point I wrote the script, I pretty much knew what my narration would be, but didn't have all my stills or footage yet. You could tell that I didn't know what my interview subject, a historian ("KQ"), would say, but I knew what I wanted her to talk about, so I just summarized it in my script. I just made it as a table in Word.
The important thing is having the first line of your video description at the same height in the cell as the first line of the audio description that coincides with it -- so we know what will be both seen and heard simultaneously. It's more acceptable to describe shots and camera angles in a 2-column script than in a regular narrative script. If there will be lots of changes in audio (music, voiceover, narrator, FX) and there's only one line of description for the video, that cell on the video side will just get bigger and bigger. You don't introduce a new video description until you've finished with all the audio that goes with the previous one. Like a regular script, it needs to convey everything that's happening on the screen at one time.
For example, on the video side, if you indicate there will be a CU of an interview subject who will be talking, you put their first line of speech (or describe what they'll talk about) directly across from it in the audio column. You can easily find other samples on the web by Googling "two column script." They're used a lot for TV, too. Remember that scripts are sort of "living documents" that change anyway. But having a preliminary two-column script for your doc is really helpful for planning and editing.