This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of media descriptions that can be computed in today's content management systems. To facilitate high-level semantics-based content annotation and interpretation, we tackle the problem of automatic decomposition of motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Film Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. Further, we present different refinement mechanisms such as film punctuation detection founded on film grammar to further improve the results. The refinement techniques demonstrate significant improvements in overall performance.