Sacado: Fix loop for hierarchical parallelism.

One loop in GeneralFad was incorrect for Cuda with hierarchical parallelism. This affects only computations involving a passive scalar (i.e., a variable declared as a Fad but with a zero-length derivative array).

