A WebVTT file starts with a header and then contains a series of data blocks. If a data block has a start and end time, it is called a WebVTT cue.
A comment is another kind of data block. Different kinds of data can be carried in WebVTT files. The HTML specification identifies captions, subtitles, chapters, audio descriptions and metadata as data kinds and specifies which one is being used in the text track kind attribute of the text track element [HTML51]. A WebVTT file must only contain data of one kind, never a mix of different kinds of data.
The environment is responsible for interpreting the data correctly. WebVTT caption or subtitle cues are rendered as overlays on top of a video viewport or into a region, which is a subarea of the video viewport. The cue box of a WebVTT cue is a box within which the text of all lines of the cue is to be rendered. The writing direction determines whether the line, position, and size cue settings are interpreted with respect to the width or the height of the video.
By default, the writing direction is set to horizontal. The vertical growing left writing direction could be used for vertical Chinese, Japanese, and Korean, and the vertical growing right writing direction could be used for vertical Mongolian.
The snap-to-lines flag is a boolean indicating whether the line is an integer number of lines (using the line dimensions of the first line of the cue) or a percentage of the dimension of the video. The flag is set to true when lines are counted, and false otherwise.
Cues where the flag is false will be offset as requested, modulo overlap avoidance if multiple cues are in the same place. By default, the snap-to-lines flag is set to true. The line defines the positioning of the cue box. It offsets the cue box from the top, the right or the left of the video viewport, as defined by the writing direction, the snap-to-lines flag, or the lines occupied by any other showing tracks.
The line is set either as a number of lines, a percentage of the video viewport height or width, or as the special value auto, which means the offset is to depend on the other showing tracks. By default, the line is set to auto.
If the writing direction is horizontal, then the line percentages are relative to the height of the video, otherwise to the width of the video. A WebVTT cue has a computed line whose value is that returned by the following algorithm, which is defined in terms of the other aspects of the cue:
If the line is numeric, the WebVTT cue snap-to-lines flag of the WebVTT cue is false, and the line is negative or greater than 100, then return 100 and abort these steps. Although the WebVTT parser will not set the line to a number outside the range 0..100 when the snap-to-lines flag is false, such values can still reach this step by other means. If the line is numeric, return the value of the WebVTT cue line and abort these steps. At this point either the WebVTT cue snap-to-lines flag is true, so any value is valid, not just those in the range 0..100, or the value already lies within that range. Otherwise, the line is the special value auto.
Let cue be the WebVTT cue. Let track be the text track whose list of cues the cue is in. By default, the line alignment is set to start. The line alignment is separate from the text alignment. The position defines the indent of the cue box in the direction defined by the writing direction. The position is either a number giving the position of the cue box as a percentage value or the special value auto, which means the position is to depend on the text alignment of the cue.
If the cue is not within a region, the percentage value is to be interpreted as a percentage of the video dimensions, otherwise as a percentage of the region dimensions.
By default, the position is set to auto. If the writing direction is horizontal, then the position percentages are relative to the width of the video, otherwise to the height of the video. A WebVTT cue has a computed position whose value is that returned by the following algorithm, which is defined in terms of the other aspects of the cue: If the position is numeric between 0 and 100, then return the value of the position and abort these steps. Otherwise, the position is the special value auto.
If the cue text alignment is left, return 0 and abort these steps. If the cue text alignment is right, return 100 and abort these steps. Even for horizontal cues with right-to-left cue text, the cue box is positioned from the left edge of the video viewport. This allows defining a rendering space template which can be filled with either left-to-right or right-to-left cue text, or both. Since cue text can consist of text with left-to-right base direction, or right-to-left base direction, or both (on different lines), such automatic positioning would have unexpected results.
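The computed position steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `computed_position` is hypothetical, and the final fallback of 50 (centred) for all other alignments is an assumption, since the text above does not state what is returned when the alignment is neither left nor right.

```python
from typing import Union


def computed_position(position: Union[float, str], text_align: str) -> float:
    """Return the computed position of a cue as a percentage."""
    # A numeric position between 0 and 100 is returned unchanged.
    if isinstance(position, (int, float)) and 0 <= position <= 100:
        return float(position)
    # Otherwise the position is treated as the special value "auto"
    # and depends on the cue text alignment.
    if text_align == "left":
        return 0.0
    if text_align == "right":
        return 100.0
    # Assumption (not stated in the text above): everything else is centred.
    return 50.0
```

Note that the base direction of the cue text plays no part here, which matches the remark that the cue box is always positioned from the left edge for horizontal cues.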
The position alignment is an alignment for the cue box in the dimension of the writing direction, describing what the position is anchored to. It is one of line-left, center, line-right, or auto. By default, the position alignment is set to auto. A WebVTT cue has a computed position alignment whose value is that returned by the following algorithm, which is defined in terms of other aspects of the cue:
If the WebVTT cue text alignment is left, return line-left and abort these steps. If the WebVTT cue text alignment is right, return line-right and abort these steps.
If the WebVTT cue text alignment is start, return line-left if the base direction of the cue text is left-to-right, line-right otherwise. If the WebVTT cue text alignment is end, return line-right if the base direction of the cue text is left-to-right, line-left otherwise. Otherwise, return center. Since the position always measures from the left of the video (for horizontal cues) or the top (otherwise), the WebVTT cue position alignment line-left value varies between left and top for horizontal and vertical cues.
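As a rough sketch, the computed position alignment steps above map directly onto a small function. The name `computed_position_alignment` and the `"ltr"`/`"rtl"` strings are hypothetical. The first check, returning an explicit non-auto position alignment unchanged, is an assumption that the text above implies but does not spell out.

```python
def computed_position_alignment(position_align: str,
                                text_align: str,
                                base_direction: str) -> str:
    """Return 'line-left', 'center', or 'line-right' for a cue."""
    # Assumption: an explicitly set position alignment is used as-is.
    if position_align != "auto":
        return position_align
    if text_align == "left":
        return "line-left"
    if text_align == "right":
        return "line-right"
    left_to_right = base_direction == "ltr"
    if text_align == "start":
        return "line-left" if left_to_right else "line-right"
    if text_align == "end":
        return "line-right" if left_to_right else "line-left"
    # Any remaining alignment (that is, center) anchors the box at its centre.
    return "center"
```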
The size is a number giving the size of the cue box, to be interpreted as a percentage of the video, as defined by the writing direction. If the writing direction is horizontal, then the size percentages are relative to the width of the video, otherwise to the height of the video. The text alignment is an alignment for all lines of text within the cue box, in the dimension of the writing direction. It is one of start, center, end, left, or right. By default, the text alignment is set to center.
The base direction of each line in a cue, which is used by the Unicode Bidirectional Algorithm to determine the order in which to display the characters in the line, is determined by looking up the first strong directional character in each line, using the CSS plaintext algorithm. In this example, the second cue will have a right-to-left base direction. Note that the text below shows all characters left-to-right; a text editor would not necessarily have the same rendering.
Where the base direction of some embedded text within a line needs to be different from the surrounding text on that line, this can be achieved by using the paired Unicode bidi formatting code characters. Again, the text below shows all characters left-to-right. The default text alignment is center alignment regardless of the base direction of the cue text. To make the text alignment of each line match the base direction of the line, start alignment can be used, as in this example.
The first line is left-aligned because the base direction is left-to-right, and the second line is right-aligned because the base direction is right-to-left.
The left alignment and right alignment can be used to left-align or right-align the cue text regardless of its lines' base direction. A cue can optionally belong to a WebVTT region. By default, the region is set to null. Regions provide a means to group caption or subtitle cues so the cues can be rendered together, which is particularly important when scrolling up. Each WebVTT region consists of the following. An identifier: a string that names the region. Defaults to the empty string.
A width: a number giving the width of the box within which the text of each line of the containing cues is to be rendered, to be interpreted as a percentage of the video width.
Defaults to 100. A lines value: a number giving the number of lines of the box within which the text of each line of the containing cues is to be rendered.
Defaults to 3. Since a WebVTT region defines a fixed rendering area, a cue that has more lines than the region allows will be clipped. For scrolling regions, the clipping happens at the top; for non-scrolling regions, it happens at the bottom.
A region anchor: two numbers giving the x and y coordinates within the region which is anchored to the video viewport and does not change location even when the region does, for example because the font size changes. Defaults to (0, 100), that is, the bottom left corner of the region. A region viewport anchor: two numbers giving the x and y coordinates within the video viewport to which the region anchor point is anchored.
The following diagram illustrates how anchoring of a region to a video viewport works. Think of it as sticking a pin through a note onto a board. A list of zero or more WebVTT regions. Chapter cues mark up the timeline of an audio or video file in consecutive, non-overlapping intervals.
It is further possible to subdivide these intervals into sub-chapters building a navigation tree. A WebVTT file body consists of the following components, in the following order:. A WebVTT line terminator consists of one of the following:. A WebVTT region definition block consists of the following components, in the given order:. A WebVTT style block consists of the following components, in the given order:. A WebVTT cue block consists of the following components, in the given order:.
The cue payload is the text or data associated with the cue. Different cues can overlap. Cues are always listed ordered by their start time. A WebVTT timestamp consists of the following components, in the given order: Each setting consists of the following components, in the order given: A WebVTT percentage consists of the following components: When interpreted as a number, a WebVTT percentage must be in the range 0..100. A WebVTT comment block consists of the following components, in the given order:
A WebVTT comment block is ignored by the parser. WebVTT metadata text cues are only useful for scripted applications e. WebVTT caption or subtitle cue text is cue payload that consists of zero or more WebVTT caption or subtitle cue components , in any order, each optionally separated from the next by a WebVTT line terminator.
The WebVTT caption or subtitle cue components are: All WebVTT caption or subtitle cue components, bar the HTML character reference, may have one or more cue component class names attached to them by separating the cue component class name from the cue component start tag using the period ('.') character.
The class name must immediately follow the "period". A WebVTT cue ruby span consists of the following components, in the order given:. Cue positioning controls the positioning of the baseline text, not the ruby text. This might be extended in the future to also support an object for ruby base text as well as complex ruby, when these features are more mature in HTML and CSS.
A WebVTT cue voice span consists of the following components, in the order given:. A WebVTT cue language span consists of the following components, in the order given:.
The requirement above regarding valid BCP 47 language tag is an authoring requirement, so a conformance checker will do validity checking of the language tag, but other user agents will not. A WebVTT cue span start tag has a tag name and either requires or disallows an annotation, and consists of the following components, in the order given:. A WebVTT cue span end tag has a tag name and consists of the following components, in the order given:.
WebVTT chapter title text is cue text that makes use of zero or more of the following components, each optionally separated from the next by a WebVTT line terminator :. To define a region, a WebVTT region definition block is specified.
Each component must not be included more than once per WebVTT region settings list string. The WebVTT region settings list gives configuration options regarding the dimensions, positioning and anchoring of the region.
For example, it allows a group of cues within a region to be anchored in the center of the region and the center of the video viewport. In this example, when the font size grows, the region grows uniformly in all directions from the center. A WebVTT region identifier setting consists of the following components, in the order given:. The WebVTT region identifier setting gives a name to the region so it can be referenced by the cues that belong to the region.
A WebVTT region width setting consists of the following components, in the order given:. A WebVTT percentage. The WebVTT region width setting provides a fixed width as a percentage of the video width for the region into which cues are rendered and based on which alignment is calculated.
A WebVTT region lines setting consists of the following components, in the order given:. The WebVTT region lines setting provides a fixed height as a number of lines for the region into which cues are rendered.
As such, it defines the height of the roll-up region if it is a scroll region. A WebVTT region anchor setting consists of the following components, in the order given:. The WebVTT region anchor setting provides a tuple of two percentages that specify the point within the region box that is fixed in location. The first percentage measures the x-dimension and the second percentage y-dimension from the top left corner of the region box.
A WebVTT region viewport anchor setting consists of the following components, in the order given:. The WebVTT region viewport anchor setting provides a tuple of two percentages that specify the point within the video viewport that the region anchor point is anchored to. The first percentage measures the x-dimension and the second percentage measures the y-dimension from the top left corner of the video viewport box.
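The region anchor and viewport anchor settings described above amount to simple arithmetic: the viewport anchor fixes a "pin" on the video, and the region box is shifted so that its own anchor point sits on that pin. The sketch below is illustrative only. `region_box` and all of its parameter names are hypothetical, and `line_height_px` is an assumed input, since the text does not say how the pixel height of one line is derived.

```python
from typing import Tuple


def region_box(video_w: float, video_h: float,
               width_pct: float, lines: int, line_height_px: float,
               anchor: Tuple[float, float],
               viewport_anchor: Tuple[float, float]
               ) -> Tuple[float, float, float, float]:
    """Return (left, top, width, height) of the region box in pixels."""
    # Region size: width is a percentage of the video width,
    # height is a whole number of lines.
    box_w = video_w * width_pct / 100
    box_h = lines * line_height_px
    # The "pin": a point in the viewport, measured from its top left corner.
    pin_x = video_w * viewport_anchor[0] / 100
    pin_y = video_h * viewport_anchor[1] / 100
    # Shift the box so that its own anchor point lands on the pin.
    left = pin_x - box_w * anchor[0] / 100
    top = pin_y - box_h * anchor[1] / 100
    return left, top, box_w, box_h
```

With both anchors at (50, 50), the region stays centred on the video and grows uniformly in all directions when `line_height_px` increases, which is the centred case described earlier.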
For browsers, the region maps to an absolute positioned CSS box relative to the video viewport, i. Overflow is hidden. A WebVTT region scroll setting consists of the following components, in the order given:. The WebVTT region scroll setting specifies whether cues rendered into the region are allowed to move out of their initial rendering place and roll up, i.
If the scroll setting is omitted, cues do not move from their rendered position. Cues are added to a region one line at a time below existing cue lines. When an existing rendered cue line is removed, and it was above another already rendered cue line, that cue line moves into its space, thus scrolling in the given direction. If there is not enough space for a new cue line to be added to a region, the top-most cue line is pushed off the visible region thus slowly becoming invisible as it moves into overflow:hidden.
This eventually makes space for the new cue line and allows it to be added. When there is no scroll direction, cue lines are added in the empty line closest to the line in the bottom of the region. If no empty line is available, the oldest line is replaced. A WebVTT cue setting is part of a WebVTT cue settings list and provides configuration options regarding the position and alignment of the cue box and the cue text within.
For example, a set of WebVTT cue settings may allow a cue box to be aligned to the left or positioned at the top right with the cue text within center aligned. Each of these settings must not be included more than once per WebVTT cue settings list. A WebVTT vertical text cue setting configures the cue to use vertical text layout rather than horizontal text layout.
Vertical text layout is sometimes used in Japanese, for example. The default is horizontal layout. A WebVTT line cue setting consists of the following components, in the order given:. The string " line " as the WebVTT cue setting name. The offset is for the start , center , or end of the cue box, depending on the WebVTT cue line alignment value - start by default. The offset can be given either as a percentage of the relevant writing-mode dependent video viewport dimension or as a line number.
Line numbers are based on the size of the first line of the cue. A WebVTT position cue setting consists of the following components, in the order given:. The string " position " as the WebVTT cue setting name.
For horizontal cues, this is the horizontal position. The cue position is given as a percentage of the video viewport. A WebVTT size cue setting consists of the following components, in the order given:. The string " size " as the WebVTT cue setting name.
For horizontal cues, this is the width of the cue box. It is given as a percentage of the width of the video viewport. A WebVTT alignment cue setting consists of the following components, in the order given:.
The string " align " as the WebVTT cue setting name. A WebVTT alignment cue setting configures the alignment of the text within the cue. A WebVTT region cue setting consists of the following components, in the order given:.
The string " region " as the WebVTT cue setting name. If a cue is part of a region, its cue settings for "position" and "align" are applied to the line boxes in the cue relative to the region box and the cue box width and height are calculated relative to the region dimensions rather than the viewport dimensions. For example:. In this ninety-second example, the two cues partly overlap, with the first ending before the second ends and the second starting before the first ends. This therefore is not a WebVTT file using only nested cues.
However, only a small subset of WebVTT file types are typically authored. Conformance checkers, when validating WebVTT files , may offer to restrict syntax checking for validating these types. Many captioning formats have simple ways of specifying a limited subset of text colors and background colors for text. Therefore, the WebVTT spec makes available a set of default cue component class names for WebVTT caption or subtitle cue components that authors can use in a standard way to mark up colored text and text background.
User agents that support CSS style sheets may implement this section through adding User Agent stylesheets. WebVTT caption or subtitle cue components that have one or more class names matching those in the first cell of a row in the table below must set their color property as presentational hints to the value in the second cell of the row:.
Do not use the classes blue and black on the default dark background, since they result in unreadable text. WebVTT caption or subtitle cue components that have one or more class names matching those in the first cell of a row in the table below must set their background-color property as presentational hints to the value in the second cell of the row:.
For the purpose of determining the cascade of the color and background classes, the order of appearance determines the cascade of the classes. Default classes can be changed by authors, e. Most of the steps will be skipped for chapters or metadata files. A WebVTT parser , given an input byte stream, a text track list of cues output , and a collection of CSS style sheets stylesheets , must decode the byte stream using the UTF-8 decode algorithm, and then must parse the resulting string according to the WebVTT parser algorithm below.
A WebVTT parser , specifically its conversion and parsing steps, is typically run asynchronously, with the input byte stream being updated incrementally as the resource is downloaded; this is called an incremental WebVTT parser.
A WebVTT parser verifies a file signature before parsing the provided byte stream. If the stream lacks this WebVTT file signature, then the parser aborts. The WebVTT parser algorithm is as follows:. Let input be the string being parsed, after conversion to Unicode, and with the following transformations applied:. Let position be a pointer into input , initially pointing at the start of the string. In an incremental WebVTT parser , when this algorithm or further algorithms that it uses moves the position pointer, the user agent must wait until appropriate further characters from the byte stream have been added to input before moving the pointer, so that the algorithm never reads past the end of the input string.
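The signature check in this part of the algorithm can be sketched as follows. This is a minimal illustration under stated assumptions. The function name `has_webvtt_signature` is hypothetical. The literal signature string `WEBVTT`, the line-ending normalisation, and the rule that a seventh character must be a space, tab, or line feed are taken from general knowledge of the WebVTT format, not from the text here, which only mentions the six-character minimum.

```python
def has_webvtt_signature(data: bytes) -> bool:
    """Return True if the byte stream starts with a WebVTT file signature."""
    # Decode as UTF-8; "utf-8-sig" also drops a leading byte order mark.
    text = data.decode("utf-8-sig", errors="replace")
    # Assumption: line endings are normalised to LF before checking.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Fewer than six characters cannot hold the signature.
    if len(text) < 6:
        return False
    if not text.startswith("WEBVTT"):
        return False
    # Assumption: anything directly after the signature must be
    # a space, a tab, or a line feed.
    if len(text) > 6 and text[6] not in " \t\n":
        return False
    return True
```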
Once the byte stream has ended, and all characters have been added to input , then the position pointer may, when so instructed by the algorithms, be moved past the end of input. If input is less than six characters long, then abort these steps. The file does not start with the correct WebVTT file signature and was therefore not successfully processed. If position is past the end of input , then abort these steps.
The file was successfully processed, but it contains no useful data and so no WebVTT cues were added to output. Advance position to the next character in input. Otherwise, advance position to the next character in input. Let regions be an empty text track list of regions. Collect a WebVTT block , and let block be the returned value.
If block is a WebVTT cue , add block to the text track list of cues output. Otherwise, if block is a CSS style sheet , add block to stylesheets. Otherwise, if block is a WebVTT region object , add block to regions. End : The file has ended. Abort these steps. The WebVTT parser has finished. The file was successfully processed. When the algorithm above says to collect a WebVTT block , optionally with a flag in header set, the user agent must run the following steps:.
Let input , position , seen cue and regions be the same variables as those of the same name in the algorithm that invoked these steps. Let line be those characters, if any.
If position is past the end of input , let seen EOF be true. Collect WebVTT cue timings and settings from line using regions for cue. If that fails, let cue be null. Otherwise, let buffer be the empty string and let seen cue be true. If cue is not null, let the cue text of cue be buffer , and return cue. Otherwise, if stylesheet is not null, then Parse a stylesheet from buffer. Otherwise, if region is not null, then collect WebVTT region settings from buffer using region for the results.
When the WebVTT parser algorithm says to collect WebVTT region settings from a string input for a text track , the user agent must run the following algorithm. Let settings be the result of splitting input on spaces. If name is a case-sensitive match for " id ". Otherwise if name is a case-sensitive match for " width ".
If value contains any characters other than ASCII digits , then jump to the step labeled next setting. The rules to parse a percentage string are as follows.
This will return either a number in the range 0..100 or nothing. If at any point the algorithm says that it "fails", this means that it is aborted at that point and returns nothing. If input does not match the syntax for a WebVTT percentage, then fail.
Let percentage be the result of parsing input using the rules for parsing floating-point number values. When the algorithm above says to collect WebVTT cue timings and settings from a string input using a text track list of regions regions for a WebVTT cue cue, the user agent must run the following algorithm.
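The rules for parsing a percentage string described above can be sketched like this. The function name `parse_percentage` is hypothetical. The regular expression encodes an assumed syntax (digits, an optional fractional part, then a percent sign), because the component list for a WebVTT percentage is not reproduced in this text. `None` stands in for the algorithm returning nothing.

```python
import re
from typing import Optional

# Assumed syntax: one or more digits, optionally "." and more digits, then "%".
_PERCENTAGE = re.compile(r"[0-9]+(?:\.[0-9]+)?%")


def parse_percentage(value: str) -> Optional[float]:
    """Return a number in the range 0..100, or None if parsing fails."""
    if not _PERCENTAGE.fullmatch(value):
        return None  # does not match the percentage syntax: fail
    number = float(value[:-1])  # drop the trailing "%"
    if number < 0 or number > 100:
        return None  # outside the allowed range: fail
    return number
```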
Skip whitespace. Collect a WebVTT timestamp. If that algorithm fails, then abort these steps and return failure. Otherwise, move position forwards one character. Let remainder be the trailing substring of input starting at position. Parse the WebVTT cue settings from remainder using regions for cue. When the user agent is to parse the WebVTT cue settings from a string input using a text track list of regions regions for a text track cue cue , the user agent must run the following steps:.
Otherwise let linepos be the full value string and linealign be null. Let number be the result of parsing linepos using the rules for parsing floating-point number values. Otherwise, if linealign is not null, then jump to the step labeled next setting.
Otherwise, let it be true. Otherwise let colpos be the full value string and colalign be null. Otherwise, if colalign is not null, then jump to the step labeled next setting. When this specification says that a user agent is to collect a WebVTT timestamp , the user agent must run the following steps:. Let input and position be the same variables as those of the same name in the algorithm that invoked these steps.
If position is past the end of input , return an error and abort these steps. If the character indicated by position is not an ASCII digit , then return an error and abort these steps. Interpret string as a base-ten integer. Let value 1 be that integer. If string is not exactly two characters in length, or if value 1 is greater than 59, let most significant units be hours.
If string is not exactly two characters in length, return an error and abort these steps. Let value 2 be that integer. Let value 3 be that integer. If string is not exactly three characters in length, return an error and abort these steps.
Let value 4 be that integer. If value 2 is greater than 59 or if value 3 is greater than 59, return an error and abort these steps. A WebVTT Node Object is a conceptual construct used to represent components of cue text so that its processing can be described without reference to the underlying syntax.
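The timestamp collection steps described earlier in this section (value 1 through value 4, the two-character and three-character length checks, and the limit of 59 on minutes and seconds) can be sketched as a small hand-written scanner. The names `collect_timestamp` and `_digits` are hypothetical. The way the four values are combined into seconds is an assumption based on their roles as hours, minutes, seconds, and milliseconds, and `None` stands in for "return an error".

```python
from typing import Optional, Tuple

_DIGITS = "0123456789"


def _digits(s: str, pos: int) -> Tuple[str, int]:
    """Collect a run of ASCII digits starting at pos."""
    start = pos
    while pos < len(s) and s[pos] in _DIGITS:
        pos += 1
    return s[start:pos], pos


def collect_timestamp(s: str, pos: int = 0) -> Optional[Tuple[float, int]]:
    """Return (seconds, new position), or None on error."""
    if pos >= len(s) or s[pos] not in _DIGITS:
        return None
    string, pos = _digits(s, pos)
    value1 = int(string)
    # Anything other than a two-digit value up to 59 must be an hours field.
    hours_first = len(string) != 2 or value1 > 59
    if pos >= len(s) or s[pos] != ":":
        return None
    string, pos = _digits(s, pos + 1)
    if len(string) != 2:
        return None
    value2 = int(string)
    if hours_first or (pos < len(s) and s[pos] == ":"):
        if pos >= len(s) or s[pos] != ":":
            return None
        string, pos = _digits(s, pos + 1)
        if len(string) != 2:
            return None
        value3 = int(string)
    else:
        # No hours field: shift minutes and seconds into place.
        value1, value2, value3 = 0, value1, value2
    if pos >= len(s) or s[pos] != ".":
        return None
    string, pos = _digits(s, pos + 1)
    if len(string) != 3:
        return None
    value4 = int(string)
    if value2 > 59 or value3 > 59:
        return None
    # Assumption: hours, minutes, seconds, milliseconds combine as usual.
    return value1 * 3600 + value2 * 60 + value3 + value4 / 1000, pos
```

The returned position lets a caller continue scanning the same line, for example to look for the second timestamp of a cue timing.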
Cycles do not occur; the parent-child relationships so constructed form a tree structure. WebVTT Internal Node Objects also have an ordered list of class names , known as their applicable classes , and a language, known as their applicable language , which is to be interpreted as a BCP 47 language tag. User agents will add a language tag as the applicable language even if it is not a valid or not even well-formed language tag.
These represent spans of text (a WebVTT cue class span) in cue text, and are used to annotate parts of the cue with applicable classes without implying further meaning (such as italics or bold).
A fragment of text. A timestamp. A WebVTT Timestamp Object has a value, in seconds and fractions of a second, which is the time represented by the timestamp. The WebVTT cue text parsing rules consist of the following algorithm.
The input is a string input supposedly containing WebVTT caption or subtitle cue text , and optionally a fallback language language. Loop : If position is past the end of input , return result and abort these steps. Let token be the result of invoking the WebVTT cue text tokenizer. When the steps above say to attach a WebVTT Internal Node Object of a particular concrete class, the user agent must run the following steps:.
If any of the following conditions is true, then let current be the parent node of current. Otherwise, if the tag name of the end tag token token is " lang " and current is a WebVTT Language Object , then let current be the parent node of current , and pop the top value from the language stack. Otherwise, if the tag name of the end tag token token is " ruby " and current is a WebVTT Ruby Text Object , then let current be the parent node of the parent node of current.
If that algorithm does not fail, and if position now points at the end of input i. The WebVTT cue text tokenizer is as follows. It emits a token, which is either a string whose value is a sequence of characters , a start tag with a tag name, a list of classes, and optionally an annotation , an end tag with a tag name , or a timestamp tag with a tag value.
Let tokenizer state be WebVTT data state. Loop : If position is past the end of input , let c be an end-of-file marker. Otherwise, let c be the character in input pointed to by position. An end-of-file marker is not a Unicode character, it is used to end the tokenizer. Set tokenizer state to the HTML character reference in data state , and jump to the step labeled next.
If result is the empty string, then set tokenizer state to the WebVTT tag state and jump to the step labeled next. Attempt to consume an HTML character reference , with no additional allowed character. Then, in any case, set tokenizer state to the WebVTT data state , and jump to the step labeled next.
Set tokenizer state to the WebVTT start tag annotation state , and jump to the step labeled next. Set tokenizer state to the WebVTT start tag class state , and jump to the step labeled next.
Set tokenizer state to the WebVTT end tag state , and jump to the step labeled next. Set result to c , set tokenizer state to the WebVTT timestamp tag state , and jump to the step labeled next. Advance position to the next character in input , then jump to the next "end-of-file marker" entry below. Return a start tag whose tag name is the empty string, with no classes and no annotation, and abort these steps. Set result to c , set tokenizer state to the WebVTT start tag state , and jump to the step labeled next.
Set buffer to c , set tokenizer state to the WebVTT start tag annotation state , and jump to the step labeled next. Return a start tag whose tag name is result , with no classes and no annotation, and abort these steps. Append to classes an entry whose value is buffer , set buffer to the empty string, set tokenizer state to the WebVTT start tag annotation state , and jump to the step labeled next. Append to classes an entry whose value is buffer , set buffer to c , set tokenizer state to the WebVTT start tag annotation state , and jump to the step labeled next.
Append to classes an entry whose value is buffer , set buffer to the empty string, and jump to the step labeled next. Append to classes an entry whose value is buffer , then return a start tag whose tag name is result , with the classes given in classes but no annotation, and abort these steps. It must end with a single newline.
They do not have to be unique, although it is common to number them e. A cue timing indicates when the cue is shown. It has a start and end time which are represented by timestamps.
The end time must be greater than the start time, and the start time must be greater than or equal to all previous start times. Cues may have overlapping timings. Cue settings are optional components used to position where the cue payload text will be displayed over the video. This includes whether the text is displayed horizontally or vertically. There can be zero or more of them, and they can be used in any order so long as each setting is used no more than once.
The cue settings are added to the right of the cue timings. There must be one or more spaces between the cue timing and the first setting and between each setting.
A setting's name and value are separated by a colon. The settings are case sensitive so use lower case as shown. There are five cue settings:. The first line demonstrates no settings. The second line might be used to overlay text on a sign or label. The third line might be used for a title. The last line might be used for an Asian language. The payload is where the main information or content is located.
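The cue settings syntax described above (space-separated settings, a colon between name and value, lower-case names, each setting used at most once) can be checked with a short validator. This is only a sketch. `parse_cue_settings` is a hypothetical name, and the set of accepted names is an assumption assembled from the setting names that appear elsewhere in this document (vertical, line, position, size, align, and region).

```python
from typing import Dict

# Assumption: the setting names mentioned elsewhere in this document.
KNOWN_SETTINGS = {"vertical", "line", "position", "size", "align", "region"}


def parse_cue_settings(text: str) -> Dict[str, str]:
    """Split a cue settings list into a name -> value mapping.

    Raises ValueError for malformed, unknown, or repeated settings.
    """
    settings: Dict[str, str] = {}
    for token in text.split():  # one or more spaces between settings
        name, colon, value = token.partition(":")
        if not colon or not name or not value:
            raise ValueError(f"malformed setting: {token!r}")
        if name not in KNOWN_SETTINGS:  # names are case sensitive
            raise ValueError(f"unknown setting: {name!r}")
        if name in settings:
            raise ValueError(f"setting used more than once: {name!r}")
        settings[name] = value
    return settings
```

Raising on a repeated setting reflects the authoring rule that each setting is used no more than once; a lenient player might instead choose to ignore or overwrite duplicates.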
In normal usage the payload contains the subtitles to be displayed. The payload text may contain newlines but it cannot contain a blank line, which is equivalent to two consecutive newlines. A blank line signifies the end of a cue. If you are using the WebVTT file for metadata these restrictions do not apply. In addition to the three escape sequences mentioned above, there are four others. They are listed in the table below.
The following tags are the HTML tags allowed in a cue, and they require opening and closing tags. The methods used in WebVTT alter either the cue or the region, since the attributes of the two interfaces differ.
They can be grouped by the interface they belong to: A simple WebVTT file can be written in a few steps, given below:
CSS pseudo-classes let us single out one type of object from other types of objects. One of the useful features supported by WebVTT is localization and the use of class names, which work the same way as in HTML and CSS, except that here they are used for classifying and styling cues, as shown below:
The type of pseudo class is determined by the selector it is using and working is similar in nature as it works in HTML. Following CSS pseudo classes can be used:. This has been corrected. WebVTT was implemented in Firefox 24 behind the preference media. WebVTT is enabled by default starting in Firefox 31 and can be disabled by setting the preference to false. Firefox 58 now fully supports VTTRegion and its use; however, this feature is disabled by default behind the preference media.
Regions are enabled by default starting in Firefox 59 see bugs bug and bug You could use this to add a description to the file. A blank line, which is equivalent to two consecutive newlines. Zero or more cues or comments. Zero or more blank lines. They're after the bugs. A space or a newline. Zero or more characters other than those noted above.
NOTE One comment that is spanning more than one line.
NOTE You can also make a comment across more than one line this way.
NOTE This last line may not translate well.
NOTE style blocks cannot appear after the first cue.
A cue consists of five components: an optional cue identifier followed by a newline; cue timings; optional cue settings, with at least one space before the first and between each setting; a single newline; and the cue payload text.
Example 7 - Example of a cue: 1 - Title Crawl. Example 8 - Cue identifier from Example 7: 1 - Title Crawl.
Each cue timing contains five components: a timestamp for the start time; at least one space; the string "-->"; at least one space; and a timestamp for the end time.
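To make the cue structure concrete, the sketch below splits one cue block into its identifier, timings, settings, and payload. The function name `split_cue_block` is hypothetical, and it relies on one assumption not stated above: a cue identifier never contains the string `-->`, so the first line holding `-->` is the timings line. The timestamps are returned as raw strings and are not validated.

```python
from typing import Dict, List, Optional, Union

CuePart = Union[Optional[str], List[str]]


def split_cue_block(block: str) -> Dict[str, CuePart]:
    """Split one cue block into id, start, end, settings, and payload."""
    lines = block.strip("\n").split("\n")
    identifier: Optional[str] = None
    # Assumption: an identifier line never contains "-->".
    if "-->" not in lines[0]:
        identifier = lines[0]
        lines = lines[1:]
    if not lines or "-->" not in lines[0]:
        raise ValueError("missing cue timings line")
    start, _, rest = lines[0].partition("-->")
    parts = rest.split()
    if not start.strip() or not parts:
        raise ValueError("cue timings need a start and an end timestamp")
    return {
        "id": identifier,
        "start": start.strip(),
        "end": parts[0],
        "settings": parts[1:],  # zero or more "name:value" strings
        "payload": "\n".join(lines[1:]),
    }
```

For instance, a block whose first line is the identifier `1 - Title Crawl` from Example 7, followed by a timings line and one line of text, yields that identifier, the two timestamps, and the text as the payload.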