Content-based video structuring above the shot level faces technical challenges in semantic feature extraction and in the flexible organization of shot clusters. To address these problems, a two-layer shot clustering approach for home video structuring, operating directly in the MPEG domain, is proposed. The approach goes one step further than the conventional one-layer structure and constructs a hierarchical content structure that represents more detail of the video content as well as its internal correlations. With two independent aspects of human perception taken into consideration, this structure provides a fine-grained organization of video shots. Experiments on MPEG-7 test videos yield promising results.