US20120269364A1 - Composite audio waveforms - Google Patents
Composite audio waveforms
- Publication number
- US20120269364A1 (application Ser. No. 13/540,513)
- Authority
- US
- United States
- Prior art keywords
- audio
- clip
- media
- media clip
- waveform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/10—Indexing; Addressing; Timing or synchronising; Measuring tape travel
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B27/00—Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
- G11B27/02—Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
- G11B27/031—Electronic editing of digitised analogue information signals, e.g. audio or video signals
- G11B27/034—Electronic editing of digitised analogue information signals, e.g. audio or video signals on discs
Definitions
- Digital audio players or video players are capable of playing audio and video data from digital files, such as, for example, MP3, WAV, or AIFF files.
- Known digital audio or video players are capable of showing basic information about a media file, such as the name of the file and any status or progression information regarding the playback process if the audio or video file is being played back on a digital audio or video player. This same type of information is available to and displayed by video, audio, and movie editing software.
- FIG. 1 illustrates digital movie editing software that shows the name 102 of an audio file 101 , basic time length information 104 associated with audio file 101 , and a progression status bar 106 . While this information is useful and, indeed, necessary in digital media editing, it would be beneficial to be able to see more detailed information about audio data, such as audio intensity over time, via a visual representation.
- audio data is comprised of multiple channels, such as a surround sound mix which could have six or more channels.
- the additional detailed information could be presented with six or more visual representations, each visual representation associated with one channel. Not only do such visual representations occupy much space on a computer display, but much of the information may not necessarily be useful (i.e., the type of digital media editing a user wants to perform does not require editing multiple channels), unless a user is interested in working specifically on one or more of those channels.
- Another problem associated with digital media editing is generating visual representations of media files, such as audio clips. Significant time and memory are required to read in all the audio data for a given audio clip and then generate a visual representation based on the audio data.
- a user may wish to begin a video clip as soon as an audio clip begins.
- an audio clip begins with silence and a video clip begins with blank video.
- simply aligning the beginning or ending (i.e., edges) of a video clip with an edge of an audio clip may not produce the desired results.
- editors of digital media may have to manually edit each clip, such as deleting “silence” at the beginning of a media clip, or manually aligning the media clips with a selection device, such as a mouse.
- FIG. 1 is a representative screenshot of digital movie editing software displaying basic information about an audio clip without any accompanying graphic waveforms
- FIG. 2 illustrates two individual audio waveforms, each representing one of two channels of audio data, that are coalesced to create the single composite waveform of FIG. 3 ;
- FIG. 3 is a representative screenshot of digital movie editing software displaying information about an audio clip that includes a single composite waveform that represents two channels of audio data, according to an embodiment of the invention
- FIG. 4 is a representative screenshot illustrating a “timeline snap” feature, according to an embodiment of the invention.
- FIG. 5 is a block diagram that depicts a computer system upon which an embodiment of the invention may be implemented.
- Additional information may be displayed while an audio clip is being played as well as when the audio clip is not being played.
- An example of information that may be displayed is information about volume intensity at different points in time in an audio clip. Certain points of interest within an audio clip may also be automatically identified for “snapping.”
- Information about an audio clip could be used when editing movies or other video data to more accurately synchronize video data with audio data, for example.
- graphical audio waveforms corresponding to audio in a video file may be used to detect potential problems in the video file, such as loud outbursts from crowds of people.
- Graphical audio waveforms may also be used to identify points in a soundtrack to place edit points.
- Other benefits of allowing users to view information about audio data in a file, such as graphical waveforms representing intensity levels over time, will be apparent to those skilled in the art.
- the techniques described herein may be implemented in a variety of ways. Performance of such techniques may be integrated into a system or a device, or may be implemented as a stand-alone mechanism. Furthermore, the approach may be implemented in computer software, hardware, or a combination thereof.
- the techniques described herein provide users with a single visual waveform graphic that represents a characteristic produced collectively by multiple tracks of audio data in a media clip.
- Media data is digital data that represents audio or video and that can be played or generated by an electronic device, such as a sound card, video card, or digital video recorder.
- a media clip is an image, audio, or video file or any portion thereof.
- a single waveform that reflects a characteristic produced collectively by multiple tracks is referred to herein as a “composite audio waveform”.
- a displayed composite audio waveform may help in a variety of ways, such as to help a user to synchronize a song or sound clip to match action in a video clip.
- Embodiments that make use of the composite audio waveform to synchronize audio and video are useful in digital movie editing software.
- the composite audio waveform is a representation of the audio intensity (volume) produced by combining all tracks found within the audio media clip.
- a composite audio waveform that reflects the collective intensity of all tracks within an audio clip may be used to see where an audio clip builds in intensity.
- Users of movie editing software may use the visual cues provided in the composite audio waveform to align video frames to the audio. For example, users may use composite audio waveforms to align video to audio events, such as a certain drumbeat or the exact beginning or end of the audio.
- users have an option of turning the visual display of graphic waveforms on or off.
- an option to turn waveforms on or off may be a preferences option.
- the displayed waveforms may be resized or zoomed in on, allowing a user to see more details of the waveform when desired. For example, a user could select a waveform and press up and down arrows to change the zoom.
- the technique of generating and displaying a composite audio waveform may be applied to the audio within video clips, as well as to audio clips themselves.
- Audio from a video clip may be extracted from video clips that also include audio tracks. Extracting audio from a video clip allows users to move or copy the audio to a different place within a movie.
- An individual audio clip may be composed of a number of channels, e.g., two channels for stereo data—one for the left speaker and one for the right speaker.
- all channels are coalesced into a single waveform that represents all channels. That is, one waveform shows the combined audio intensity for all channels of an audio clip.
- a single waveform may show a cumulative intensity, for example, by summing the intensity volumes of the separate channels.
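The summing approach mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `composite_intensity` is hypothetical, and treating "intensity" as the absolute value of each sample is an assumption made here for simplicity.

```python
from typing import List, Sequence


def composite_intensity(channels: Sequence[Sequence[float]]) -> List[float]:
    """Sum per-channel intensity, sample by sample, into one waveform.

    Assumption: intensity is taken to be the absolute sample amplitude.
    Channels of unequal length are truncated to the shortest one.
    """
    if not channels:
        return []
    length = min(len(ch) for ch in channels)
    return [sum(abs(ch[i]) for ch in channels) for i in range(length)]


# Hypothetical stereo clip: left and right channel samples.
left = [0.0, 0.5, -0.25, 0.0]
right = [0.1, -0.5, 0.25, 0.0]
composite = composite_intensity([left, right])
```

The same function accepts six or more channels, so a surround mix would also collapse to one waveform under this sketch.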
- FIG. 2 illustrates two individual audio waveforms, each representing one of two channels of audio data, that are combined to create the single composite waveform of FIG. 3 .
- the sound in channel 202 is more intense at the beginning of the audio clip than the sound in channel 204 . This could occur, for example, from a guitar coming from the right speaker at the beginning of a song, but not from the left.
- FIG. 3 illustrates digital movie editing software that includes the ability to display a waveform graphic 301 for audio clip 101 of FIG. 1 , according to an embodiment of the invention.
- waveform graphic 301 indicates the average intensity of the audio data in the audio clip over time.
- Waveform graphic 301 represents a single composite waveform that represents audio from an audio clip having multiple audio channels (e.g., a stereo audio clip), where the multiple channels are coalesced into one composite waveform.
- Summing, or coalescing, the two audio channels of FIG. 2 into a single composite waveform as in FIG. 3 makes editing movies with audio more user-friendly and less confusing, while simultaneously conserving screen space on the user's display.
- the process of coalescing multiple channels into a single composite waveform may include summing, decimating, and reducing the bit depth of audio samples as described in more detail herein.
- data is decimated using a 128 point sinc function to reduce the amount of data being managed in the application.
- a 128 point sinc function is not required, and in alternative embodiments, various sinc functions may be used, such as a 400 point sinc function. Any sinc function or any other method for downsampling audio data may be used.
- a sinc function sinc(x), or “sampling function,” is a function associated with digital signal processing and the theory of Fourier transforms. The full name of the function is “sine cardinal,” but it may be referred to as “sinc.”
- Sinc filters may be used in many applications of signal processing. In one embodiment, a sinc process is used to take multiple sequential samples of digital audio data and reduce them to one sample that is a weighted average of all of the samples.
- an audio file with one channel, i.e., a mono audio file.
- the goal is to reduce the total number of samples in the audio file but still have a usable, representative signal.
- a 128 point sinc function will take 128 audio samples at a time, and reduce them to one sample that represents all 128 audio samples.
- this process is iterated over the entire audio file and creates a file (or creates data in memory) that is 1/128th the size of the original file.
- Sinc filtering may be considered a “weighted average” using coefficients generated from a sinc function.
- the sinc function used is:
- x is the absolute value of the distance from the center of the samples being filtered.
- Decimating the data reduces memory overhead and increases speed of processing and plotting.
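The block-wise weighted average described above might look like the sketch below. Several details are assumptions, since the exact formula is not reproduced in this text: the normalized form sin(πx)/(πx) is used, x is scaled so that every sample in a block falls inside the main lobe, and the weights are normalized to sum to 1. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def sinc(x: float) -> float:
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1 (assumed form)."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sinc_weights(points: int) -> List[float]:
    """Weights for one block of `points` samples, normalized to sum to 1."""
    center = (points - 1) / 2.0
    half_width = points / 2.0  # assumed scaling: keeps x below 1
    # x is the absolute distance from the center of the block.
    raw = [sinc(abs(i - center) / half_width) for i in range(points)]
    total = sum(raw)
    return [w / total for w in raw]


def decimate(samples: Sequence[float], points: int = 128) -> List[float]:
    """Reduce each consecutive block of `points` samples to one sample.

    A trailing partial block is dropped in this sketch.
    """
    weights = sinc_weights(points)
    out = []
    for start in range(0, len(samples) - points + 1, points):
        block = samples[start:start + points]
        out.append(sum(w * s for w, s in zip(weights, block)))
    return out


constant = decimate([1.0] * 256, points=128)  # two blocks of a constant signal
ramp = decimate(list(range(128)), points=128)  # one block of a linear ramp
```

Because the weights are symmetric about the block center and sum to 1, a constant signal is preserved and a linear ramp reduces to its midpoint value.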
- the data is further reduced in size by changing the bit depth of the audio samples. For example, most audio files stored on computers use 16 bits to represent one sample of audio.
- the sample size is reduced to 8 bits of data by truncation and rounding techniques. Bit depth reduction alone lowers the memory footprint of the audio data by 50%.
- the overall data size of a 16 bit stereo file can be reduced to 1/1600th of its original size during the processing of the data prior to plotting and saving to disk.
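A short sketch of the bit depth step and the overall size arithmetic follows. The rounding rule (add half a step, shift, then clamp) is one plausible choice among several, and both function names are hypothetical. Under this reading, the 1/1600 figure is what results from coalescing two channels into one (a factor of 2), decimating with a 400 point sinc (a factor of 400), and halving the bit depth (a factor of 2); with a 128 point sinc the same arithmetic gives 1/512. That interpretation is an inference, not a statement from the text.

```python
def to_8_bit(sample_16: int) -> int:
    """Reduce a signed 16-bit sample to signed 8 bits by rounding.

    Adds half of the discarded range (128) before shifting right by 8 bits,
    then clamps so the largest positive values do not overflow to 128.
    """
    rounded = (sample_16 + 128) >> 8
    return max(-128, min(127, rounded))


def reduction_factor(channels: int, decimation_points: int,
                     source_bits: int = 16, target_bits: int = 8) -> int:
    """Overall size reduction: channel coalescing x decimation x bit depth."""
    return channels * decimation_points * source_bits // target_bits


factor_400 = reduction_factor(channels=2, decimation_points=400)
factor_128 = reduction_factor(channels=2, decimation_points=128)
```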
- displaying audio waveforms is a slow, computationally expensive process that consumes a significant amount of memory.
- Techniques for displaying audio waveforms can require (1) the audio data for the waveform be read from disk, (2) the waveform to be calculated from the audio data, and (3) the waveform to be drawn to the screen, each time a particular audio waveform is to be displayed.
- caching is used to solve performance problems associated with known waveform display techniques.
- the waveform is calculated from the audio data only once during the lifetime of a “project” in movie editing software, or other software that uses the techniques of the present invention. Because the waveform is only calculated once from the audio data, the waveform does not have to be recalculated from data read from disk every time the waveform is to be displayed. In one embodiment, the waveform is calculated from the audio data only once during the lifetime of the audio data.
- the audio data is transformed into a digital image representing the waveform during a first session of an application.
- a session is the period of time a user interfaces with an application; in this case, a media editing application.
- the session begins when the user accesses the application and ends when the user quits, or closes, the application.
- the digital image is durably saved to persistent storage, such as hard disks, floppy disks, optical disks, or tapes.
- Input is received, e.g., from a user, to initiate a second session of the application.
- Input is also received to load the digital image from persistent storage.
- the digital image is loaded by reading the digital image from the persistent storage and displaying the digital image on a computer display, e.g., via a graphical user interface. Consequently, the digital image only needs to be calculated and generated once.
- audio data is drawn once into a digital image, and the digital image is saved.
- the saved image is much smaller than the actual audio data that it represents and therefore much faster to load and display.
- the saved image may be resized or cropped as needed within a user interface.
- the “previously-calculated” image may also be presented with faded opacity, while the waveform is being recalculated, to indicate to a user that waveform processing is in progress.
- displaying a waveform in this manner is much faster than known methods of recalculating and displaying audio waveforms.
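The compute-once, load-later behavior described above can be sketched as a small disk cache. This is an illustration under assumptions: the rendered waveform is represented as opaque bytes standing in for the digital image, the cache key is a hash of the clip's path, and `cached_waveform` and `fake_render` are hypothetical names.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_waveform(audio_path: str, cache_dir: Path,
                    render: Callable[[str], bytes]) -> bytes:
    """Return the waveform image for a clip, rendering it at most once.

    The image is saved to persistent storage on first use; later calls
    (including calls in a later session) read the saved file instead.
    """
    key = hashlib.sha256(audio_path.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.waveform"
    if cache_file.exists():
        return cache_file.read_bytes()
    image = render(audio_path)
    cache_file.write_bytes(image)
    return image


render_calls = []


def fake_render(path: str) -> bytes:
    """Stand-in for the expensive read, calculate, and draw steps."""
    render_calls.append(path)
    return bytes([0, 64, 127, 64, 0])


with tempfile.TemporaryDirectory() as tmp:
    first = cached_waveform("song.aiff", Path(tmp), fake_render)
    second = cached_waveform("song.aiff", Path(tmp), fake_render)
```

A real implementation would also need a way to invalidate the saved image when the underlying audio changes, which this sketch leaves out.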
- waveforms may be used to align video or photo clips with key audio events.
- waveforms may be more useful since waveforms visually depict information about the underlying audio data associated with the audio clip.
- the composite waveforms described above depict the intensity of audio data over time. Illustration of the changes in audio intensity over time is helpful because such changes indicate likely key audio events.
- “Snapping” refers to the process of automatically aligning a particular point in a media clip, such as a video clip, with a particular point in another media clip, such as at the beginning or ending of an audio clip. For example, a user “drags” a first visual representation (e.g., waveform) of a first media clip using an input device, such as a mouse, and aligns an edge of the first visual representation with an edge of a second visual representation of a second media clip. When the edge of the first visual representation arrives within a few pixels of the edge of the second visual representation, the edge of the first visual representation “snaps into place,” or automatically aligns with the edge of the second visual representation.
- a user desires to align media clips so that the beginning of one media clip coincides with the beginning or end of another media clip. For instance, the user may wish to have her favorite music begin when video of her high school graduation ceremony begins. In this scenario, the user would use snapping to align the beginning edge of the visual representation of the music clip with the beginning edge of the visual representation of the video clip.
- a technique for aligning an intra-clip POI of a media clip with an intra-clip POI or edge of a visual representation of another media clip. Instead of simply snapping to an edge of a visual representation of a media clip, it is now possible to snap to key events (i.e., intra-clip POIs) within a media clip.
- An intra-clip POI is any point within a media clip, excluding the exact beginning and ending of the media clip, with which a user might be interested in aligning another media clip.
- Intra-clip POIs include, but are not limited to, the beginning and ending of silence in a media clip, and peaks of audio intensity within the media clip, such as rhythmic beats. There may be many other “interesting” points within a media clip that can be identified and subsequently used to align multiple media clips. Thus, key events in a media clip (such as the beginning of video in a video clip) may be aligned with, or “snapped” to, key events in another media clip (such as the beginning of audio in an audio clip).
- the first point will “snap,” or automatically align, with the second point when the first point arrives within a certain number of pixels of the second point.
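The pixel-threshold snapping rule above can be sketched as a single function. The name `snap`, the default threshold of 5 pixels, and the choice to snap to the nearest candidate are assumptions for illustration; the text only says "a certain number of pixels."

```python
from typing import Sequence


def snap(dragged_px: float, targets_px: Sequence[float],
         threshold_px: float = 5.0) -> float:
    """Return the nearest snap target if it is within the threshold.

    `targets_px` holds the on-screen x positions of clip edges and
    intra-clip POIs. If no target is close enough, the dragged position
    is returned unchanged.
    """
    nearest = min(targets_px, key=lambda t: abs(t - dragged_px), default=None)
    if nearest is not None and abs(nearest - dragged_px) <= threshold_px:
        return nearest
    return dragged_px


snapped = snap(103.0, [0.0, 100.0, 250.0])    # within 5 px of 100
unsnapped = snap(120.0, [0.0, 100.0, 250.0])  # too far from any target
```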
- an audio peak in the visual representation of an audio clip may be defined as an intra-clip POI and later used to align the audio clip with video in a movie.
- video frames may be set to start or end exactly at the start or end of the audio with no awkward moments of silence.
- digital image clips may be set to change at audio peaks, such as a drumbeat or guitar solo, in an audio file.
- a snap timeline feature is implemented by displaying a snap line at the beginning and end of video and/or audio within a media clip.
- a snap line provides the user with a visual cue of precisely where an intra-clip POI is located so that the user may more easily align multiple media clips.
- the snap line may be colored differently than the colors immediately around the snap line in order for the snap line to stand out in the display.
- a snap line may be set at other intra-clip POIs.
- movie editing software utilizing the disclosed techniques may be configured to display a snap line whenever three or more frames of audio silence occur.
- snap lines may be configured to display at peaks of the waveforms that indicate loud audio events or peaks of audio transients. Snap lines may also be configured when intensity of the audio falls below a certain level and when intensity of the audio rises above a certain level.
- many intra-clip POIs in media clips may be appropriate for snap lines, and snap lines may be configured and determined in many ways.
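Two of the configurations mentioned above, peaks of the waveform and points where intensity rises above or falls below a level, can be sketched as follows. The function names and the simple local-maximum test are assumptions; a production detector would likely smooth the data first.

```python
from typing import List, Sequence, Tuple


def find_peaks(intensity: Sequence[float], min_level: float) -> List[int]:
    """Indices of local maxima at or above `min_level`.

    For a flat-topped peak, only the first index of the plateau is reported.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        if (intensity[i] >= min_level
                and intensity[i] > intensity[i - 1]
                and intensity[i] >= intensity[i + 1]):
            peaks.append(i)
    return peaks


def level_crossings(intensity: Sequence[float],
                    level: float) -> Tuple[List[int], List[int]]:
    """Indices where intensity rises to or above `level`, and where it falls below."""
    rises, falls = [], []
    for i in range(1, len(intensity)):
        before, now = intensity[i - 1], intensity[i]
        if before < level <= now:
            rises.append(i)
        elif before >= level > now:
            falls.append(i)
    return rises, falls


waveform = [0, 1, 5, 2, 0, 7, 7, 1]
peaks = find_peaks(waveform, min_level=4)
rises, falls = level_crossings(waveform, level=4)
```

Each returned index is a candidate position for a snap line.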
- snap lines mark the beginning and end of clips and transitions in the timeline. Snap lines may become visible as audio clips are dragged along a timeline. When snap lines of the audio and video are aligned, this “timeline snap” feature allows a very precise fit between audio and video data that would be difficult to achieve without snap lines.
- the timeline snap feature is a powerful editing feature for movie editing software.
- the timeline snap feature is implemented by marking the locations where zero intensity audio begins and/or ends in the graphic waveform (i.e., intra-clip POIs) associated with the audio clip.
- a zero intensity location may be any location where the intensity falls below a certain threshold (i.e., zero intensity need not be absolute zero). These “zero locations” may serve as alignment guides.
- a snap line is drawn on the screen. The snap line shows where the other media clip may “snap” to, such as the beginnings or endings of the zero locations, or other intra-clip POIs.
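Locating the "zero locations" described above amounts to finding runs where intensity stays below a threshold. The sketch below is one way to do that; the function name is hypothetical, and the `min_run` default of 3 treats each intensity value as one frame, which is an assumption made to mirror the "three or more frames of audio silence" configuration.

```python
from typing import List, Sequence, Tuple


def silence_boundaries(intensity: Sequence[float], threshold: float,
                       min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for runs below `threshold`.

    `end` is exclusive, so it is also the first non-silent index after
    the run. Runs shorter than `min_run` values are ignored.
    """
    runs = []
    start = None
    for i, level in enumerate(intensity):
        if level < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    if start is not None and len(intensity) - start >= min_run:
        runs.append((start, len(intensity)))
    return runs


levels = [0, 0, 0, 0, 9, 8, 0, 0, 7, 0, 0, 0]
runs = silence_boundaries(levels, threshold=1)
```

In this example the first run ends at index 4, which corresponds to the first non-zero location where a video clip could be snapped.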
- music stored in an audio file typically has silence at the beginning of the file.
- the video clip may be dragged and “snapped” to the first non-zero location (i.e. an intra-clip POI).
- the user “drags” an edge of the visual representation of the video clip within a few pixels of the intra-clip POI in the audio clip
- a snap line may be drawn at the intra-clip POI and the video clip is automatically snapped to that location without the user having to manually position the edge of the visual representation of the video clip next to the intra-clip POI.
- FIG. 4 illustrates one example of how snap lines may be used to align clips, according to an embodiment of the invention.
- a snap pointer 402 is located at the left edge of a top clip 410 .
- the left edge of top clip 410 is being aligned with an intra-clip POI (e.g., beginning of audio) in a bottom clip 412 .
- top clip 410 “snaps” into place when an edge of top clip 410 is within a few pixels of an intra-clip POI (i.e., snap location) in bottom clip 412 .
- a snap line 404 may be drawn as a visual indicator of an intra-clip POI. Each snap location in bottom clip 412 may be visually indicated simultaneously with a separate snap line, or only the snap lines within a specified number of pixels from an edge of a selected media clip (i.e. top clip 410 ) may be generated.
- a user may move top clip 410 left and right by attaching snap pointer 402 to an edge of top clip 410 .
- as snap pointer 402 is moved left and right, top clip 410 is also moved left and right.
- snap line 404 may change color indicating a snap.
- the audio in top clip 410 will then start in sync with a sound in the middle of the bottom clip 412 , as indicated in FIG. 4 .
- a short “pop” sound may be played in addition to, or to the exclusion of, the visual indications described above when a media clip is snapped.
- Audible indications such as a “pop” or “click” provide further alignment feedback to the user. Users may configure whether the pop sound is played.
- the snapping technique described herein allows users to quickly identify intra-clip POIs and easily align media clips to locations of silence or other intra-clip POIs within media clips.
- FIG. 5 is a block diagram that depicts a computer system 500 upon which an embodiment of the invention may be implemented.
- Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a processor 504 coupled with bus 502 for processing information.
- Computer system 500 also includes a main memory 506 , such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504 .
- Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504 .
- Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504 .
- a storage device 510 such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
- Computer system 500 may be coupled via bus 502 to a display 512 , such as a cathode ray tube (CRT), for displaying information to a computer user.
- An input device 514 is coupled to bus 502 for communicating information and command selections to processor 504 .
- Another type of user input device is cursor control 516 , such as a mouse, a trackball, or cursor direction keys, for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512 .
- This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
- the invention is related to the use of computer system 500 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506 . Such instructions may be read into main memory 506 from another computer-readable medium, such as storage device 510 . Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
- Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510 .
- Volatile media includes dynamic memory, such as main memory 506 .
- Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502 . Transmission media may also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer may read.
- Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution.
- the instructions may initially be carried on a magnetic disk of a remote computer.
- the remote computer may load the instructions into its dynamic memory and send the instructions over a telephone line using a modem.
- a modem local to computer system 500 may receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal.
- An infra-red detector may receive the data carried in the infra-red signal and appropriate circuitry may place the data on bus 502 .
- Bus 502 carries the data to main memory 506 , from which processor 504 retrieves and executes the instructions.
- the instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504 .
- Computer system 500 also includes a communication interface 518 coupled to bus 502 .
- Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522 .
- communication interface 518 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line.
- communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN.
- Wireless links may also be implemented.
- communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- Network link 520 typically provides data communication through one or more networks to other data devices.
- network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526 .
- ISP 526 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 528 .
- Internet 528 uses electrical, electromagnetic or optical signals that carry digital data streams.
- the signals through the various networks and the signals on network link 520 and through communication interface 518 which carry the digital data to and from computer system 500 , are exemplary forms of carrier waves transporting the information.
- Computer system 500 may send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518 .
- a server 530 might transmit a requested code for an application program through Internet 528 , ISP 526 , local network 522 and communication interface 518 .
- the received code may be executed by processor 504 as it is received, and/or stored in storage device 510 , or other non-volatile storage for later execution. In this manner, computer system 500 may obtain application code in the form of a carrier wave.
Abstract
A technique for aligning a plurality of media clips is provided. One or more intra-clip points of interest (POIs) are identified in at least a first media clip. When aligning a first point in the first media clip with a second point in a second media clip, the first point may be snapped to the second point, wherein at least one of the first point and second point is an intra-clip POI. When a snap occurs, at least one of a visual or audible indication is generated, such as a “pop” sound, a snap line, or automatically aligning the first point with the second point when the first point is within a specified number of pixels of the second point. Techniques for representing multiple channels of an audio clip as a single waveform and caching waveforms are also provided.
Description
- This application claims the benefit as a Continuation of application Ser. No. 11/325,886, filed Jan. 4, 2006 the entire contents of which is hereby incorporated by reference as if fully set forth herein, under 35 U.S.C. §120. The applicant(s) hereby rescind any disclaimer of claim scope in the parent application(s) or the prosecution history thereof and advise the USPTO that the claims in this application may be broader than any claim in the parent application(s), which claims the benefit of priority from U.S. Provisional Application No. 60/642,138, filed on Jan. 5, 2005, entitled “Composite Audio Waveforms with Precision Alignment Guides”; the entire content of which is incorporated by this reference for all purposes as if fully disclosed herein.
- The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the described approaches qualify as prior art merely by virtue of their inclusion in this section.
- Digital audio players or video players are capable of playing audio and video data from digital files, such as, for example, MP3, WAV, or AIFF files. Known digital audio or video players are capable of showing basic information about a media file, such as the name of the file and any status or progression information regarding the playback process if the audio or video file is being played back on a digital audio or video player. This same type of information is available to and displayed by video, audio, and movie editing software.
- FIG. 1 illustrates digital movie editing software that shows the name 102 of an audio file 101, basic time length information 104 associated with audio file 101, and a progression status bar 106. While this information is useful and, indeed, necessary in digital media editing, it would be beneficial to be able to see more detailed information about audio data, such as audio intensity over time, via a visual representation.
- Sometimes, audio data is comprised of multiple channels, such as a surround sound mix which could have six or more channels. Thus, the additional detailed information, alluded to above, could be presented with six or more visual representations, each visual representation associated with one channel. Not only do such visual representations occupy much space on a computer display, but much of the information may not necessarily be useful (i.e., the type of digital media editing a user wants to perform does not require editing multiple channels), unless a user is interested in working specifically on one or more of those channels.
- Another problem associated with digital media editing is generating visual representations of media files, such as audio clips. Significant time and memory are required to read in all the audio data for a given audio clip and then generate a visual representation based on the audio data.
- Lastly, many users of media editing software wish to align two or more media clips. For example, a user may wish to begin a video clip as soon as an audio clip begins. However, an audio clip often begins with silence and a video clip often begins with blank video. Furthermore, there may be many places within a video and audio clip, other than where the audio begins, in which a user may wish to align the media clips. Thus, it is likely that simply aligning the beginning or ending (i.e., edges) of a video clip with an edge of an audio clip may not produce the desired results.
- Because simply aligning the edges of media clips may not produce the desired results, editors of digital media may have to manually edit each clip, such as deleting “silence” at the beginning of a media clip, or manually aligning the media clips with a selection device, such as a mouse. Each of these latter techniques is prone to producing less than precise alignments, where too much or too little audio is deleted at the beginning of an audio clip (when manually editing) or where a video clip may not start exactly when audio begins (when manually aligning).
- The present invention is depicted by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements and in which:
-
FIG. 1 is a representative screenshot of digital movie editing software displaying basic information about an audio clip without any accompanying graphic waveforms; -
FIG. 2 illustrates two individual audio waveforms, each representing one of two channels of audio data, that are coalesced to create the single composite waveform of FIG. 3; -
FIG. 3 is a representative screenshot of digital movie editing software displaying information about an audio clip that includes a single composite waveform that represents two channels of audio data, according to an embodiment of the invention; -
FIG. 4 is a representative screenshot illustrating a “timeline snap” feature, according to an embodiment of the invention; and -
FIG. 5 is a block diagram that depicts a computer system upon which an embodiment of the invention may be implemented. - Techniques are described hereafter for providing detailed information about audio when editing digital media, for example. Additional information may be displayed while an audio clip is being played as well as when the audio clip is not being played. An example of information that may be displayed is information about volume intensity at different points in time in an audio clip. Certain points of interest within an audio clip may also be automatically identified for “snapping.”
- Information about an audio clip could be used when editing movies or other video data to more accurately synchronize video data with audio data, for example. Additionally, graphical audio waveforms corresponding to audio in a video file may be used to detect potential problems in the video file, such as loud outbursts from crowds of people. Graphical audio waveforms may also be used to identify points in a soundtrack to place edit points. Other benefits of allowing users to view information about audio data in a file, such as graphical waveforms representing intensity levels over time, will be apparent to those skilled in the art.
- The techniques described herein may be implemented in a variety of ways. Performance of such techniques may be integrated into a system or a device, or may be implemented as a stand-alone mechanism. Furthermore, the approach may be implemented in computer software, hardware, or a combination thereof.
- The techniques described herein provide users with a single visual waveform graphic that represents a characteristic produced collectively by multiple tracks of audio data in a media clip. Media data is digital data that represents audio or video and that can be played or generated by an electronic device, such as a sound card, video card, or digital video recorder. A media clip is an image, audio, or video file or any portion thereof. A single waveform that reflects a characteristic produced collectively by multiple tracks is referred to herein as a “composite audio waveform”.
- A displayed composite audio waveform may help in a variety of ways, such as to help a user to synchronize a song or sound clip to match action in a video clip. Embodiments that make use of the composite audio waveform to synchronize audio and video are useful in digital movie editing software.
- According to one embodiment, the composite audio waveform is a representation of the audio intensity (volume) produced by combining all tracks found within the audio media clip.
- A composite audio waveform that reflects the collective intensity of all tracks within an audio clip may be used to see where an audio clip builds in intensity. Users of movie editing software may use the visual cues provided in the composite audio waveform to align video frames to the audio. For example, users may use composite audio waveforms to align video to audio events, such as a certain drumbeat or the exact beginning or end of the audio.
- In one embodiment, users have an option of turning the visual display of graphic waveforms on or off. For example, an option to turn waveforms on or off may be a preferences option. In another embodiment, the displayed waveforms may be resized or zoomed in on, allowing a user to see more details of the waveform when desired. For example, a user could select a waveform and press up and down arrows to change the zoom.
- According to one embodiment, the technique of generating and displaying a composite audio waveform may be applied to the audio within video clips, as well as to audio clips themselves. Audio may be extracted from video clips that include audio tracks. Extracting audio from a video clip allows users to move or copy the audio to a different place within a movie.
- An individual audio clip may be composed of a number of channels, e.g., two channels for stereo data—one for the left speaker and one for the right speaker. In one embodiment, all channels are coalesced into a single waveform that represents all channels. That is, one waveform shows the combined audio intensity for all channels of an audio clip. A single waveform may show a cumulative intensity, for example, by summing the intensity volumes of the separate channels.
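- The summing of channel intensities described above can be sketched as follows. This is a minimal illustration, not the implementation of any embodiment: the function name `coalesce_channels` is hypothetical, and treating the intensity of a sample as its absolute value is an assumption made only for this sketch.

```python
def coalesce_channels(channels):
    """Sum per-sample intensity across channels into one sequence.

    `channels` is a list of equal-length sample sequences, one per channel.
    Intensity is taken as the absolute sample value (an assumption here).
    """
    if not channels:
        return []
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    return [sum(abs(ch[i]) for ch in channels) for i in range(length)]


# Two channels of a stereo clip, four samples each.
left = [0, 3, -5, 2]
right = [1, -1, 0, -2]
composite = coalesce_channels([left, right])  # [1, 4, 5, 4]
```

The same function accepts any number of channels, so a six-channel surround mix would also reduce to a single sequence.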
-
FIG. 2 illustrates two individual audio waveforms, each representing one of two channels of audio data, that are combined to create the single composite waveform of FIG. 3. As shown, the sound in channel 202 is more intense at the beginning of the audio clip than the sound in channel 204. This could occur, for example, from a guitar coming from the right speaker at the beginning of a song, but not from the left.
- FIG. 3 illustrates digital movie editing software that includes the ability to display a waveform graphic 301 for audio clip 101 of FIG. 1, according to an embodiment of the invention. As shown, waveform graphic 301 indicates the average intensity of the audio data in the audio clip over time. Waveform graphic 301 is a single composite waveform that represents audio from an audio clip having multiple audio channels (e.g., a stereo audio clip), where the multiple channels are coalesced into one composite waveform.
- Summing, or coalescing, the two audio channels of FIG. 2 into a single composite waveform as in FIG. 3 makes editing movies with audio more user-friendly and less confusing, while simultaneously conserving screen space on the user's display. The process of coalescing multiple channels into a single composite waveform may include summing, decimating, and reducing the bit depth of audio samples as described in more detail herein.
- In one embodiment, to form the composite waveform, data is decimated using a 128 point sinc function to reduce the amount of data being managed in the application. A 128 point sinc function is not required, and in alternative embodiments, various sinc functions may be used, such as a 400 point sinc function. Any sinc function or any other method for downsampling audio data may be used.
- A sinc function sinc(x), or “sampling function,” is a function associated with digital signal processing and the theory of Fourier transforms. The full name of the function is “sine cardinal,” but it is commonly referred to as “sinc.” Sinc filters may be used in many applications of signal processing. In one embodiment, a sinc process is used to take multiple sequential samples of digital audio data and reduce them to one sample that is a weighted average of all of the samples.
- For example, consider an audio file with one channel (i.e., a mono audio file). The goal is to reduce the total number of samples in the audio file but still have a usable, representative signal. A 128 point sinc function will take 128 audio samples at a time, and reduce them to one sample that represents all 128 audio samples. In embodiments of the present invention, this process is iterated over the entire audio file and creates a file (or creates data in memory) that is 1/128th the size of the original file.
- Sinc filtering may be considered a “weighted average” using coefficients generated from a sinc function. In one embodiment of the present invention, the sinc function used is:
-
sinc(x)=sin(x)/x - where x is the absolute value of the distance from the center of the samples being filtered.
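- A sinc-weighted decimation of this kind might look like the sketch below. Several details are assumptions made only for illustration: the distance x is measured in samples from the center of each block, the weights are normalized so that they sum to one, and a trailing partial block is dropped. The names `sinc_weights` and `decimate` are hypothetical.

```python
import math


def sinc(x):
    """sinc(x) = sin(x)/x, using the limiting value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_weights(points):
    """Normalized weights for one block of `points` samples.

    Each weight is sinc of the absolute distance from the block center.
    """
    center = (points - 1) / 2.0
    raw = [sinc(abs(i - center)) for i in range(points)]
    total = sum(raw)
    return [w / total for w in raw]


def decimate(samples, points=128):
    """Reduce each full block of `points` samples to one weighted average."""
    weights = sinc_weights(points)
    out = []
    # Step through the data one block at a time; a partial tail is ignored.
    for start in range(0, len(samples) - points + 1, points):
        block = samples[start:start + points]
        out.append(sum(w * s for w, s in zip(weights, block)))
    return out
```

With `points=128`, an input of 256 samples yields 2 output samples, which matches the 1/128 reduction described above.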
- Decimating the data, such as by using a sinc function, reduces memory overhead and increases speed of processing and plotting. After decimating, the data is further reduced in size by changing the bit depth of the audio samples. For example, most audio files stored on computers use 16 bits to represent one sample of audio. In one embodiment of the present invention, the sample size is reduced to 8 bits of data by truncation and rounding techniques. Bit depth reduction alone lowers the memory footprint of the audio data by 50%. Thus, when a 400 point sinc function is used, the overall data size of a 16 bit stereo file can be reduced to 1/1600th of its original size during the processing of the data prior to plotting and saving to disk. In summary, these memory savings come from coalescing stereo data (reduced to ½ the size), reducing bit depth (reduced to ½ the size again), and decimating the audio data by a 400 point sinc function (reduced additionally to 1/400th of the size).
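- The bit depth reduction and the overall size arithmetic can be checked with a short sketch. The helper `to_8_bit` is hypothetical; it rounds a signed 16 bit sample to the nearest signed 8 bit value and clamps the result, which is only one of several possible truncation and rounding choices.

```python
from fractions import Fraction


def to_8_bit(sample_16):
    """Round a signed 16 bit sample to a signed 8 bit sample.

    Adding 128 before shifting right by 8 bits rounds to the nearest
    value instead of simply truncating; the result is clamped to the
    signed 8 bit range.
    """
    rounded = (sample_16 + 128) >> 8
    return max(-128, min(127, rounded))


# Overall reduction for a 16 bit stereo file with a 400 point sinc function:
# stereo to one channel (1/2), 16 to 8 bits (1/2), decimation (1/400).
overall = Fraction(1, 2) * Fraction(1, 2) * Fraction(1, 400)  # Fraction(1, 1600)
```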
- Typically, displaying audio waveforms is a slow, computationally expensive process that consumes a significant amount of memory. Techniques for displaying audio waveforms can require (1) the audio data for the waveform be read from disk, (2) the waveform to be calculated from the audio data, and (3) the waveform to be drawn to the screen, each time a particular audio waveform is to be displayed.
- In some embodiments, caching is used to solve performance problems associated with known waveform display techniques. In one embodiment, the waveform is calculated from the audio data only once during the lifetime of a “project” in movie editing software, or other software that uses the techniques of the present invention. Because the waveform is only calculated once from the audio data, the waveform does not have to be recalculated from data read from disk every time the waveform is to be displayed. In one embodiment, the waveform is calculated from the audio data only once during the lifetime of the audio data.
- To cache the waveform after it has been calculated from the audio data, the audio data is transformed into a digital image representing the waveform during a first session of an application. A session is the period of time a user interfaces with an application; in this case, a media editing application. The session begins when the user accesses the application and ends when the user quits, or closes, the application. Before the first session ends, the digital image is durably saved to persistent storage, such as a hard disk, floppy disk, optical disk, or tape.
- Input is received, e.g., from a user, to initiate a second session of the application. Input is also received to load the digital image from persistent storage. The digital image is loaded by reading the digital image from the persistent storage and displaying the digital image on a computer display, e.g., via a graphical user interface. Consequently, the digital image only needs to be calculated and generated once.
- More specifically, audio data is drawn once into a digital image, and the digital image is saved. Typically, the saved image is much smaller than the actual audio data that it represents and therefore much faster to load and display. Using common fast graphic routines, the saved image may be resized or cropped as needed within a user interface. The “previously-calculated” image may also be presented with faded opacity, while the waveform is being recalculated, to indicate to a user that waveform processing is in progress. Thus, displaying a waveform in this manner is much faster than known methods of recalculating and displaying audio waveforms.
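- The caching behavior described above, in which the waveform image is calculated once and reloaded in later sessions, might be sketched as follows. The function name, the cache file naming scheme, and the `render` callable are all hypothetical; `render` stands in for whatever routine reads the audio data and draws the waveform image.

```python
import hashlib
from pathlib import Path


def cached_waveform_image(audio_path, cache_dir, render):
    """Return image bytes for `audio_path`, rendering at most once.

    `render` is a caller-supplied function that takes the audio path and
    returns encoded image bytes. It runs only when no saved image exists.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Derive a stable cache file name from the audio path (an assumption).
    key = hashlib.sha256(str(audio_path).encode("utf-8")).hexdigest()
    image_file = cache_dir / (key + ".img")
    if image_file.exists():
        # Later sessions: load the saved image instead of recalculating.
        return image_file.read_bytes()
    image = render(audio_path)
    # First session: save the image to persistent storage for reuse.
    image_file.write_bytes(image)
    return image
```

A real application would also need to invalidate the saved image when the underlying audio changes; that step is omitted here.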
- As discussed above, techniques are provided to display a waveform graphic of audio data from an audio clip. These waveforms may be used to align video or photo clips with key audio events. Although many visual representations of an audio clip, such as audio clip 101 in FIG. 1, will suffice in aligning media clips, waveforms may be more useful since waveforms visually depict information about the underlying audio data associated with the audio clip. For example, the composite waveforms described above depict the intensity of audio data over time. Illustration of the changes in audio intensity over time is helpful because such changes indicate likely key audio events.
- “Snapping” refers to the process of automatically aligning a particular point in a media clip, such as a video clip, with a particular point in another media clip, such as at the beginning or ending of an audio clip. For example, a user “drags” a first visual representation (e.g., waveform) of a first media clip using an input device, such as a mouse, and aligns an edge of the first visual representation with an edge of a second visual representation of a second media clip. When the edge of the first visual representation arrives within a few pixels of the edge of the second visual representation, the edge of the first visual representation “snaps into place,” or automatically aligns with the edge of the second visual representation.
- Typically, a user desires to align media clips so that the beginning of one media clip coincides with the beginning or end of another media clip. For instance, the user may wish to have her favorite music begin when video of her high school graduation ceremony begins. In this scenario, the user would use snapping to align the beginning edge of the visual representation of the music clip with the beginning edge of the visual representation of the video clip.
- However, given current snapping techniques, if the music clip begins with silence, as many audio clips do, the user will have to manually identify when the music begins and then manually align the beginning of the audio in the music clip with the beginning edge of the visual representation of the video clip. Thus, current techniques do not snap intra-clip points of interest (POI) of a media clip with an edge or intra-clip POI of another media clip.
- According to an embodiment of the invention, a technique is provided for aligning an intra-clip POI of a media clip with an intra-clip POI or edge of a visual representation of another media clip. Instead of simply snapping to an edge of a visual representation of a media clip, it is now possible to snap to key events (i.e., intra-clip POIs) within a media clip. An intra-clip POI is any point within a media clip, excluding the exact beginning and ending of the media clip, with which a user might be interested in aligning another media clip.
- Intra-clip POIs include, but are not limited to, the beginning and ending of silence in a media clip, and peaks of audio intensity within the media clip, such as rhythmic beats. There may be many other “interesting” points within a media clip that can be identified and subsequently used to align multiple media clips. Thus, key events in a media clip (such as the beginning of video in a video clip) may be aligned with, or “snapped” to, key events in another media clip (such as the beginning of audio in an audio clip).
- As a user “drags” the visual representation of a first media clip, instead of struggling to position a first point (i.e., intra-clip POI) in the visual representation of the first media clip at a second point (e.g., an edge or intra-clip POI) in the visual representation of the second media clip, the first point will “snap,” or automatically align, with the second point when the first point arrives within a certain number of pixels of the second point.
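- The snapping behavior described above can be illustrated with a small sketch. The function name `snap_position` and the default tolerance of 4 pixels are assumptions for illustration; the description only requires that the dragged point be within a certain number of pixels of a snap target.

```python
def snap_position(dragged_px, targets_px, tolerance_px=4):
    """Return the nearest snap target within tolerance, else the input.

    `dragged_px` is the current horizontal pixel position of the dragged
    point; `targets_px` holds pixel positions of edges or intra-clip POIs.
    """
    best = None
    for target in targets_px:
        distance = abs(target - dragged_px)
        if distance > tolerance_px:
            continue
        # Keep the closest target seen so far.
        if best is None or distance < abs(best - dragged_px):
            best = target
    return dragged_px if best is None else best


targets = [0, 100, 250]
near = snap_position(103, targets)  # within 4 pixels of 100, so it snaps
far = snap_position(120, targets)   # no target nearby, position unchanged
```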
- For example, when editing media clips with movie editing software, an audio peak in the visual representation of an audio clip may be defined as an intra-clip POI and later used to align the audio clip with video in a movie. As another example, video frames may be set to start or end exactly at the start or end of the audio with no awkward moments of silence. As another example, when creating a slideshow, digital image clips may be set to change at audio peaks, such as a drumbeat or guitar solo, in an audio file.
- In one embodiment, a snap timeline feature is implemented by displaying a snap line at the beginning and end of video and/or audio within a media clip. A snap line provides the user with a visual cue of precisely where an intra-clip POI is located so that the user may more easily align multiple media clips. The snap line may be colored differently than the colors immediately around the snap line in order for the snap line to stand out in the display.
- In addition, a snap line may be set at other intra-clip POIs. For example, movie editing software utilizing the disclosed techniques may be configured to display a snap line whenever three or more frames of audio silence occur. Also, snap lines may be configured to display at peaks of the waveforms that indicate loud audio events or peaks of audio transients. Snap lines may also be configured when intensity of the audio falls below a certain level and when intensity of the audio rises above a certain level. As will be apparent to those skilled in the art, many intra-clip POIs in media clips may be appropriate for snap lines, and snap lines may be configured and determined in many ways.
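- One of the configurations mentioned above, placing snap lines at waveform peaks that indicate loud audio events, might be sketched as follows. The function name `peak_locations` and the simple local-maximum test are assumptions; other definitions of a peak or transient could be substituted.

```python
def peak_locations(intensity, min_level):
    """Return indices of local maxima whose intensity is at least `min_level`.

    A sample counts as a peak when it is strictly greater than the previous
    sample and not less than the next one, so a flat top is reported once.
    """
    peaks = []
    for i in range(1, len(intensity) - 1):
        value = intensity[i]
        if value >= min_level and value > intensity[i - 1] and value >= intensity[i + 1]:
            peaks.append(i)
    return peaks


loud_events = peak_locations([0, 2, 9, 3, 1, 8, 8, 2], min_level=5)  # [2, 5]
```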
- Typically, snap lines mark the beginning and end of clips and transitions in the timeline. Snap lines may become visible as audio clips are dragged along a timeline. When snap lines of the audio and video are aligned, this “timeline snap” feature allows a very precise fit between audio and video data that would be difficult to achieve without snap lines. The timeline snap feature is a powerful editing feature for movie editing software.
- In one embodiment, the timeline snap feature is implemented by marking the locations where zero intensity audio begins and/or ends in the graphic waveform (i.e., intra-clip POIs) associated with the audio clip. In this context, a zero intensity location may be any location where the intensity falls below a certain threshold (i.e., zero intensity need not be absolute zero). These “zero locations” may serve as alignment guides. Thus, as a user drags another media clip across the timeline, whenever an alignment guide is approached, a snap line is drawn on the screen. The snap line shows where the other media clip may “snap” to, such as the beginnings or endings of the zero locations, or other intra-clip POIs.
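- Locating the “zero locations” described above amounts to finding runs of samples whose intensity falls below a threshold. The sketch below is illustrative only: the function name `zero_locations` is hypothetical, and it operates on a list of per-sample intensities such as a decimated composite waveform.

```python
def zero_locations(intensity, threshold):
    """Return (start, end) index pairs of runs below `threshold`.

    `end` is exclusive, so it is also the index at which audio resumes.
    """
    regions = []
    start = None
    for i, value in enumerate(intensity):
        if value < threshold:
            if start is None:
                start = i  # silence begins here
        elif start is not None:
            regions.append((start, i))  # silence ended just before i
            start = None
    if start is not None:
        regions.append((start, len(intensity)))
    return regions


regions = zero_locations([0, 0, 5, 9, 1, 0, 7], threshold=2)  # [(0, 2), (4, 6)]
```

The start and end of each run are candidate alignment guides. For a clip that opens with silence, the end of the first run marks where the audio begins.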
- For example, music stored in an audio file typically has silence at the beginning of the file. When a user wants a video clip to start at the same time the music in the audio file begins, the video clip may be dragged and “snapped” to the first non-zero location (i.e. an intra-clip POI). When the user “drags” an edge of the visual representation of the video clip within a few pixels of the intra-clip POI in the audio clip, a snap line may be drawn at the intra-clip POI and the video clip is automatically snapped to that location without the user having to manually position the edge of the visual representation of the video clip next to the intra-clip POI.
-
FIG. 4 illustrates one example of how snap lines may be used to align clips, according to an embodiment of the invention. As shown in FIG. 4, a snap pointer 402 is located at the left edge of a top clip 410. The left edge of top clip 410 is being aligned with an intra-clip POI (e.g., beginning of audio) in a bottom clip 412. As a user drags top clip 410 left and right, top clip 410 “snaps” into place when an edge of top clip 410 is within a few pixels of an intra-clip POI (i.e., snap location) in bottom clip 412.
- A snap line 404 may be drawn as a visual indicator of an intra-clip POI. Each snap location in bottom clip 412 may be visually indicated simultaneously with a separate snap line, or only the snap lines within a specified number of pixels from an edge of a selected media clip (i.e., top clip 410) may be generated.
- The user may also drag top clip 410 left and right by attaching snap pointer 402 to an edge of top clip 410. Thus, as snap pointer 402 is dragged left and right, top clip 410 is also moved left and right. Once the snap is made, snap line 404 may change color, indicating a snap. The audio in top clip 410 will then start in sync with a sound in the middle of bottom clip 412, as indicated in FIG. 4.
- According to one embodiment, a short “pop” sound may be played in addition to, or to the exclusion of, the visual indications described above when a media clip is snapped. Audible indications, such as a “pop” or “click,” provide further alignment feedback to the user. Users may configure whether the pop sound is played.
- In sum, the snapping technique described herein allows users to quickly identify intra-clip POIs and easily align media clips to locations of silence or other intra-clip POIs within media clips.
-
FIG. 5 is a block diagram that depicts a computer system 500 upon which an embodiment of the invention may be implemented. Computer system 500 includes a bus 502 or other communication mechanism for communicating information, and a processor 504 coupled with bus 502 for processing information. Computer system 500 also includes a main memory 506, such as a random access memory (RAM) or other dynamic storage device, coupled to bus 502 for storing information and instructions to be executed by processor 504. Main memory 506 also may be used for storing temporary variables or other intermediate information during execution of instructions to be executed by processor 504. Computer system 500 further includes a read only memory (ROM) 508 or other static storage device coupled to bus 502 for storing static information and instructions for processor 504. A storage device 510, such as a magnetic disk or optical disk, is provided and coupled to bus 502 for storing information and instructions.
- Computer system 500 may be coupled via bus 502 to a display 512, such as a cathode ray tube (CRT), for displaying information to a computer user. An input device 514, including alphanumeric and other keys, is coupled to bus 502 for communicating information and command selections to processor 504. Another type of user input device is cursor control 516, such as a mouse, a trackball, or cursor direction keys for communicating direction information and command selections to processor 504 and for controlling cursor movement on display 512. This input device typically has two degrees of freedom in two axes, a first axis (e.g., x) and a second axis (e.g., y), that allows the device to specify positions in a plane.
- The invention is related to the use of computer system 500 for implementing the techniques described herein. According to one embodiment of the invention, those techniques are performed by computer system 500 in response to processor 504 executing one or more sequences of one or more instructions contained in main memory 506. Such instructions may be read into main memory 506 from another computer-readable medium, such as storage device 510. Execution of the sequences of instructions contained in main memory 506 causes processor 504 to perform the process steps described herein. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the invention. Thus, embodiments of the invention are not limited to any specific combination of hardware circuitry and software.
- The term “computer-readable medium” as used herein refers to any medium that participates in providing instructions to processor 504 for execution. Such a medium may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media includes, for example, optical or magnetic disks, such as storage device 510. Volatile media includes dynamic memory, such as main memory 506. Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise bus 502. Transmission media may also take the form of acoustic or light waves, such as those generated during radio-wave and infra-red data communications.
- Common forms of computer-readable media include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, or any other magnetic medium, a CD-ROM, any other optical medium, punchcards, papertape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave as described hereinafter, or any other medium from which a computer may read.
- Various forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to processor 504 for execution. For example, the instructions may initially be carried on a magnetic disk of a remote computer. The remote computer may load the instructions into its dynamic memory and send the instructions over a telephone line using a modem. A modem local to computer system 500 may receive the data on the telephone line and use an infra-red transmitter to convert the data to an infra-red signal. An infra-red detector may receive the data carried in the infra-red signal and appropriate circuitry may place the data on bus 502. Bus 502 carries the data to main memory 506, from which processor 504 retrieves and executes the instructions. The instructions received by main memory 506 may optionally be stored on storage device 510 either before or after execution by processor 504.
- Computer system 500 also includes a communication interface 518 coupled to bus 502. Communication interface 518 provides a two-way data communication coupling to a network link 520 that is connected to a local network 522. For example, communication interface 518 may be an integrated services digital network (ISDN) card or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, communication interface 518 may be a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links may also be implemented. In any such implementation, communication interface 518 sends and receives electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
- Network link 520 typically provides data communication through one or more networks to other data devices. For example, network link 520 may provide a connection through local network 522 to a host computer 524 or to data equipment operated by an Internet Service Provider (ISP) 526. ISP 526 in turn provides data communication services through the world wide packet data communication network now commonly referred to as the “Internet” 528. Local network 522 and Internet 528 both use electrical, electromagnetic or optical signals that carry digital data streams. The signals through the various networks and the signals on network link 520 and through communication interface 518, which carry the digital data to and from computer system 500, are exemplary forms of carrier waves transporting the information.
- Computer system 500 may send messages and receive data, including program code, through the network(s), network link 520 and communication interface 518. In the Internet example, a server 530 might transmit a requested code for an application program through Internet 528, ISP 526, local network 522 and communication interface 518.
- The received code may be executed by processor 504 as it is received, and/or stored in storage device 510, or other non-volatile storage for later execution. In this manner, computer system 500 may obtain application code in the form of a carrier wave.
Claims (20)
1. A method comprising:
generating a single audio waveform based on a plurality of audio channels within a media clip;
wherein the single audio waveform reflects a combined characteristic based on audio information from each of the plurality of audio channels; and
generating a depiction of the media clip based on the single audio waveform;
wherein the method is performed by one or more computing devices.
2. The method of claim 1, wherein the single audio waveform depicts the audio intensity of the media clip over time.
3. The method of claim 1, wherein generating the single audio waveform comprises coalescing two or more audio channels of the plurality of audio channels into one audio channel.
4. The method of claim 1, wherein generating the single audio waveform comprises reducing the number of audio samples in the media clip.
5. The method of claim 4, wherein reducing the number of audio samples in the media clip comprises using a sinc function.
6. The method of claim 1, wherein generating the single audio waveform comprises changing the bit depth of audio samples in the media clip.
7. The method of claim 1, wherein generating the single audio waveform comprises reducing the number of audio samples in the media clip and changing the bit depth of multiple audio samples in the media clip.
8. One or more non-transitory computer-readable media carrying instructions which, when executed by one or more processors, cause:
generating a single audio waveform based on a plurality of audio channels within a media clip;
wherein the single audio waveform reflects a combined characteristic based on audio information from each of the plurality of audio channels; and
generating a depiction of the media clip based on the single audio waveform.
9. The one or more non-transitory computer-readable media of claim 8, wherein the single audio waveform depicts the audio intensity of the media clip over time.
10. The one or more non-transitory computer-readable media of claim 8, wherein generating the single audio waveform comprises coalescing two or more audio channels of the plurality of audio channels into one audio channel.
11. The one or more non-transitory computer-readable media of claim 8, wherein generating the single audio waveform comprises reducing the number of audio samples in the media clip.
12. The one or more non-transitory computer-readable media of claim 11, wherein reducing the number of audio samples in the media clip comprises using a sinc function.
13. The one or more non-transitory computer-readable media of claim 8, wherein generating the single audio waveform comprises changing the bit depth of audio samples in the media clip.
14. The one or more non-transitory computer-readable media of claim 8, wherein generating the single audio waveform comprises reducing the number of audio samples in the media clip and changing the bit depth of multiple audio samples in the media clip.
15. An apparatus comprising:
one or more processors;
one or more storage media storing instructions which, when executed by the one or more processors, cause:
generating a single audio waveform based on a plurality of audio channels within a media clip;
wherein the single audio waveform reflects a combined characteristic based on audio information from each of the plurality of audio channels; and
generating a depiction of the media clip based on the single audio waveform.
16. The apparatus of claim 15, wherein the single audio waveform depicts the audio intensity of the media clip over time.
17. The apparatus of claim 15, wherein generating the single audio waveform comprises coalescing two or more audio channels of the plurality of audio channels into one audio channel.
18. The apparatus of claim 15, wherein generating the single audio waveform comprises reducing the number of audio samples in the media clip.
19. The apparatus of claim 18, wherein reducing the number of audio samples in the media clip comprises using a sinc function.
20. The apparatus of claim 15, wherein generating the single audio waveform comprises changing the bit depth of audio samples in the media clip.
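For illustration only, the operations recited in the dependent claims (coalescing channels, reducing the number of audio samples using a sinc function, and changing the bit depth) might be sketched as follows. This sketch is not taken from the specification: the function names, the use of a per-sample average to coalesce channels, the Hann window applied to the sinc kernel, the decimation factor, and the 8-bit target depth are all assumptions chosen for the example.

```python
import math


def coalesce_channels(channels):
    """Average equal-length channels sample by sample into one channel.

    Averaging is an assumption; the claims only recite "coalescing".
    """
    count = len(channels)
    return [sum(frame) / count for frame in zip(*channels)]


def sinc_decimate(samples, factor, half_width=8):
    """Low-pass with a Hann-windowed sinc kernel, then keep every
    `factor`-th sample, reducing the number of samples by `factor`."""
    radius = half_width * factor
    kernel = []
    for k in range(-radius, radius + 1):
        x = k / factor
        sinc = 1.0 if k == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 * (1.0 + math.cos(math.pi * k / radius))
        kernel.append(sinc * hann)
    gain = sum(kernel)
    kernel = [tap / gain for tap in kernel]  # unity gain for constant input

    out = []
    for center in range(0, len(samples), factor):
        acc = 0.0
        for j, tap in enumerate(kernel):
            i = center + j - radius
            if 0 <= i < len(samples):  # treat samples outside the clip as silence
                acc += tap * samples[i]
        out.append(acc)
    return out


def reduce_bit_depth(samples, bits=8):
    """Quantize floats in [-1, 1] to signed integers of the given bit depth."""
    top = 2 ** (bits - 1) - 1
    return [max(-top, min(top, round(s * top))) for s in samples]


def composite_waveform(channels, factor=4, bits=8):
    """Hypothetical pipeline: coalesce, reduce sample count, change bit depth."""
    mono = coalesce_channels(channels)
    return reduce_bit_depth(sinc_decimate(mono, factor), bits)
```

Calling `composite_waveform([left, right])` on two equal-length lists of floats in the range [-1, 1] returns a single, shorter list of small integers that a user interface could draw as one waveform for the clip. Samples near the start and end of the clip are attenuated in this sketch because the kernel treats anything outside the clip as silence.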
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/540,513 US20120269364A1 (en) | 2005-01-05 | 2012-07-02 | Composite audio waveforms |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US64213805P | 2005-01-05 | 2005-01-05 | |
US11/325,886 US8271872B2 (en) | 2005-01-05 | 2006-01-04 | Composite audio waveforms with precision alignment guides |
US13/540,513 US20120269364A1 (en) | 2005-01-05 | 2012-07-02 | Composite audio waveforms |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/325,886 Continuation US8271872B2 (en) | 2005-01-05 | 2006-01-04 | Composite audio waveforms with precision alignment guides |
Publications (1)
Publication Number | Publication Date |
---|---|
US20120269364A1 (en) | 2012-10-25 |
Family ID: 36642115
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/325,886 Active 2029-03-11 US8271872B2 (en) | 2005-01-05 | 2006-01-04 | Composite audio waveforms with precision alignment guides |
US13/540,513 Abandoned US20120269364A1 (en) | 2005-01-05 | 2012-07-02 | Composite audio waveforms |
Country Status (1)
Country | Link |
---|---|
US (2) | US8271872B2 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9542976B2 (en) | 2013-09-13 | 2017-01-10 | Google Inc. | Synchronizing videos with frame-based metadata using video content |
Families Citing this family (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7725828B1 (en) * | 2003-10-15 | 2010-05-25 | Apple Inc. | Application of speed effects to a video presentation |
US20080040770A1 (en) | 2006-08-09 | 2008-02-14 | Nils Angquist | Media map for capture of content from random access devices |
US8413184B2 (en) | 2006-08-09 | 2013-04-02 | Apple Inc. | Media map for capture of content from random access devices |
US8751022B2 (en) * | 2007-04-14 | 2014-06-10 | Apple Inc. | Multi-take compositing of digital media assets |
US20080263433A1 (en) * | 2007-04-14 | 2008-10-23 | Aaron Eppolito | Multiple version merge for media production |
US20080256448A1 (en) * | 2007-04-14 | 2008-10-16 | Nikhil Mahesh Bhatt | Multi-Frame Video Display Method and Apparatus |
US20080256136A1 (en) * | 2007-04-14 | 2008-10-16 | Jerremy Holland | Techniques and tools for managing attributes of media content |
US8225207B1 (en) * | 2007-09-14 | 2012-07-17 | Adobe Systems Incorporated | Compression threshold control |
US8205148B1 (en) | 2008-01-11 | 2012-06-19 | Bruce Sharpe | Methods and apparatus for temporal alignment of media |
US8330802B2 (en) * | 2008-12-09 | 2012-12-11 | Microsoft Corp. | Stereo movie editing |
US9190110B2 (en) | 2009-05-12 | 2015-11-17 | JBF Interlude 2009 LTD | System and method for assembling a recorded composition |
US9607655B2 (en) * | 2010-02-17 | 2017-03-28 | JBF Interlude 2009 LTD | System and method for seamless multimedia assembly |
US11232458B2 (en) | 2010-02-17 | 2022-01-25 | JBF Interlude 2009 LTD | System and method for data mining within interactive multimedia |
US8819557B2 (en) | 2010-07-15 | 2014-08-26 | Apple Inc. | Media-editing application with a free-form space for organizing or compositing media clips |
US8875025B2 (en) | 2010-07-15 | 2014-10-28 | Apple Inc. | Media-editing application with media clips grouping capabilities |
US9323438B2 (en) | 2010-07-15 | 2016-04-26 | Apple Inc. | Media-editing application with live dragging and live editing capabilities |
US9251855B2 (en) | 2011-01-28 | 2016-02-02 | Apple Inc. | Efficient media processing |
US11747972B2 (en) | 2011-02-16 | 2023-09-05 | Apple Inc. | Media-editing application with novel editing tools |
US9997196B2 (en) | 2011-02-16 | 2018-06-12 | Apple Inc. | Retiming media presentations |
US8966367B2 (en) | 2011-02-16 | 2015-02-24 | Apple Inc. | Anchor override for a media-editing application with an anchored timeline |
US8600220B2 (en) | 2012-04-02 | 2013-12-03 | JBF Interlude 2009 Ltd—Israel | Systems and methods for loading more than one video content at a time |
US9009619B2 (en) | 2012-09-19 | 2015-04-14 | JBF Interlude 2009 Ltd—Israel | Progress bar for branched videos |
US8860882B2 (en) | 2012-09-19 | 2014-10-14 | JBF Interlude 2009 Ltd—Israel | Systems and methods for constructing multimedia content modules |
WO2014053474A1 (en) * | 2012-10-01 | 2014-04-10 | Kehlet Korrektur | Method and system for organising image recordings and sound recordings |
US9257148B2 (en) | 2013-03-15 | 2016-02-09 | JBF Interlude 2009 LTD | System and method for synchronization of selectably presentable media streams |
US9832516B2 (en) | 2013-06-19 | 2017-11-28 | JBF Interlude 2009 LTD | Systems and methods for multiple device interaction with selectably presentable media streams |
US10448119B2 (en) | 2013-08-30 | 2019-10-15 | JBF Interlude 2009 LTD | Methods and systems for unfolding video pre-roll |
US9530454B2 (en) | 2013-10-10 | 2016-12-27 | JBF Interlude 2009 LTD | Systems and methods for real-time pixel switching |
US9520155B2 (en) | 2013-12-24 | 2016-12-13 | JBF Interlude 2009 LTD | Methods and systems for seeking to non-key frames |
US9641898B2 (en) | 2013-12-24 | 2017-05-02 | JBF Interlude 2009 LTD | Methods and systems for in-video library |
US9792026B2 (en) | 2014-04-10 | 2017-10-17 | JBF Interlude 2009 LTD | Dynamic timeline for branched video |
US9653115B2 (en) | 2014-04-10 | 2017-05-16 | JBF Interlude 2009 LTD | Systems and methods for creating linear video from branched video |
US9792957B2 (en) | 2014-10-08 | 2017-10-17 | JBF Interlude 2009 LTD | Systems and methods for dynamic video bookmarking |
US11412276B2 (en) | 2014-10-10 | 2022-08-09 | JBF Interlude 2009 LTD | Systems and methods for parallel track transitions |
US10582265B2 (en) | 2015-04-30 | 2020-03-03 | JBF Interlude 2009 LTD | Systems and methods for nonlinear video playback using linear real-time video players |
US9672868B2 (en) | 2015-04-30 | 2017-06-06 | JBF Interlude 2009 LTD | Systems and methods for seamless media creation |
US9691429B2 (en) | 2015-05-11 | 2017-06-27 | Mibblio, Inc. | Systems and methods for creating music videos synchronized with an audio track |
US10681408B2 (en) | 2015-05-11 | 2020-06-09 | David Leiberman | Systems and methods for creating composite videos |
US9940746B2 (en) | 2015-06-18 | 2018-04-10 | Apple Inc. | Image fetching for timeline scrubbing of digital media |
US10460765B2 (en) | 2015-08-26 | 2019-10-29 | JBF Interlude 2009 LTD | Systems and methods for adaptive and responsive video |
US11164548B2 (en) | 2015-12-22 | 2021-11-02 | JBF Interlude 2009 LTD | Intelligent buffering of large-scale video |
US11128853B2 (en) | 2015-12-22 | 2021-09-21 | JBF Interlude 2009 LTD | Seamless transitions in large-scale video |
US10462202B2 (en) | 2016-03-30 | 2019-10-29 | JBF Interlude 2009 LTD | Media stream rate synchronization |
US11856271B2 (en) | 2016-04-12 | 2023-12-26 | JBF Interlude 2009 LTD | Symbiotic interactive video |
US10218760B2 (en) | 2016-06-22 | 2019-02-26 | JBF Interlude 2009 LTD | Dynamic summary generation for real-time switchable videos |
US11050809B2 (en) | 2016-12-30 | 2021-06-29 | JBF Interlude 2009 LTD | Systems and methods for dynamic weighting of branched video paths |
US10257578B1 (en) | 2018-01-05 | 2019-04-09 | JBF Interlude 2009 LTD | Dynamic library display for interactive videos |
US11601721B2 (en) | 2018-06-04 | 2023-03-07 | JBF Interlude 2009 LTD | Interactive video dynamic adaptation and user profiling |
CN109194979B (en) * | 2018-10-30 | 2022-06-17 | 湖南天鸿瑞达集团有限公司 | Audio and video processing method and device, mobile terminal and readable storage medium |
US11321904B2 (en) | 2019-08-30 | 2022-05-03 | Maxon Computer Gmbh | Methods and systems for context passing between nodes in three-dimensional modeling |
US11490047B2 (en) | 2019-10-02 | 2022-11-01 | JBF Interlude 2009 LTD | Systems and methods for dynamically adjusting video aspect ratios |
US11245961B2 (en) | 2020-02-18 | 2022-02-08 | JBF Interlude 2009 LTD | System and methods for detecting anomalous activities for interactive videos |
US11714928B2 (en) | 2020-02-27 | 2023-08-01 | Maxon Computer Gmbh | Systems and methods for a self-adjusting node workspace |
US11373369B2 (en) | 2020-09-02 | 2022-06-28 | Maxon Computer Gmbh | Systems and methods for extraction of mesh geometry from straight skeleton for beveled shapes |
CN112822543A (en) * | 2020-12-30 | 2021-05-18 | 北京达佳互联信息技术有限公司 | Video processing method and device, electronic equipment and storage medium |
US11882337B2 (en) | 2021-05-28 | 2024-01-23 | JBF Interlude 2009 LTD | Automated platform for generating interactive videos |
US11934477B2 (en) | 2021-09-24 | 2024-03-19 | JBF Interlude 2009 LTD | Video player integration within websites |
Family Cites Families (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5371851A (en) * | 1989-04-26 | 1994-12-06 | Credence Systems Corporation | Graphical data base editor |
US5200564A (en) * | 1990-06-29 | 1993-04-06 | Casio Computer Co., Ltd. | Digital information processing apparatus with multiple CPUs |
US5999173A (en) * | 1992-04-03 | 1999-12-07 | Adobe Systems Incorporated | Method and apparatus for video editing with video clip representations displayed along a time line |
GB2280778B (en) * | 1992-04-10 | 1996-12-04 | Avid Technology Inc | Digital audio workstation providing digital storage and display of video information |
US5642171A (en) * | 1994-06-08 | 1997-06-24 | Dell Usa, L.P. | Method and apparatus for synchronizing audio and video data streams in a multimedia system |
US5732184A (en) * | 1995-10-20 | 1998-03-24 | Digital Processing Systems, Inc. | Video and audio cursor video editing system |
US20020002562A1 (en) * | 1995-11-03 | 2002-01-03 | Thomas P. Moran | Computer controlled display system using a graphical replay device to control playback of temporal data representing collaborative activities |
US6175632B1 (en) * | 1996-08-09 | 2001-01-16 | Elliot S. Marx | Universal beat synchronization of audio and lighting sources with interactive visual cueing |
US6160548A (en) * | 1997-04-15 | 2000-12-12 | Lea; Christopher B. | Method and mechanism for synchronizing hardware and software modules |
DE69833434T2 (en) * | 1997-06-30 | 2006-08-31 | Noritsu Koki Co., Ltd. | PICTURE PROCESSING APPARATUS AND RECORDING MEDIA WITH A LANGUAGE-CODED IMAGE |
US6177928B1 (en) * | 1997-08-22 | 2001-01-23 | At&T Corp. | Flexible synchronization framework for multimedia streams having inserted time stamp |
US6088027A (en) * | 1998-01-08 | 2000-07-11 | Macromedia, Inc. | Method and apparatus for screen object manipulation |
US6163510A (en) * | 1998-06-30 | 2000-12-19 | International Business Machines Corporation | Multimedia search and indexing system and method of operation using audio cues with signal thresholds |
US20020175917A1 (en) * | 2001-04-10 | 2002-11-28 | Dipto Chakravarty | Method and system for streaming media manager |
US7254455B2 (en) * | 2001-04-13 | 2007-08-07 | Sony Creative Software Inc. | System for and method of determining the period of recurring events within a recorded signal |
US8046688B2 (en) * | 2001-06-15 | 2011-10-25 | Sony Corporation | System for and method of adjusting tempo to match audio events to video events or other audio events in a recorded signal |
US7314994B2 (en) * | 2001-11-19 | 2008-01-01 | Ricoh Company, Ltd. | Music processing printer |
WO2003085801A2 (en) * | 2002-04-03 | 2003-10-16 | Borealis Technical Limited | High phase order electrical rotating machine with distributed windings |
JP4220353B2 (en) * | 2003-11-06 | 2009-02-04 | 株式会社ケンウッド | Modulation apparatus, mobile communication system, modulation method, and communication method |
US7512886B1 (en) * | 2004-04-15 | 2009-03-31 | Magix Ag | System and method of automatically aligning video scenes with an audio track |
US20050286497A1 (en) * | 2004-05-06 | 2005-12-29 | Brad Zutaut | Directional facilitator system for transferring media content between a computer and a mobile device via a data network |
US7653204B2 (en) * | 2004-06-14 | 2010-01-26 | Broadcom Corporation | Method and system for codec with polyringer |
- 2006-01-04: US application US11/325,886 filed; patent US8271872B2 (active)
- 2012-07-02: US application US13/540,513 filed; publication US20120269364A1 (abandoned)
Patent Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5204969A (en) * | 1988-12-30 | 1993-04-20 | Macromedia, Inc. | Sound editing system using visually displayed control line for altering specified characteristic of adjacent segment of stored waveform |
US5530454A (en) * | 1994-04-13 | 1996-06-25 | Tektronix, Inc. | Digital oscilloscope architecture for signal monitoring with enhanced duty cycle |
US5874950A (en) * | 1995-12-20 | 1999-02-23 | International Business Machines Corporation | Method and system for graphically displaying audio data on a monitor within a computer system |
US5839100A (en) * | 1996-04-22 | 1998-11-17 | Wegener; Albert William | Lossless and loss-limited compression of sampled data signals |
US7257452B2 (en) * | 1997-11-07 | 2007-08-14 | Microsoft Corporation | Gui for digital audio signal filtering mechanism |
US20050248474A1 (en) * | 1997-11-07 | 2005-11-10 | Microsoft Corporation | GUI for digital audio signal filtering mechanism |
US20050010409A1 (en) * | 2001-11-19 | 2005-01-13 | Hull Jonathan J. | Printable representations for time-based media |
US20040205514A1 (en) * | 2002-06-28 | 2004-10-14 | Microsoft Corporation | Hyperlink preview utility and method |
US20040037202A1 (en) * | 2002-08-26 | 2004-02-26 | Brommer Karl D. | Multichannel digital recording system with multi-user detection |
US20060074321A1 (en) * | 2002-08-27 | 2006-04-06 | Kenji Kouchi | Vital sign display and its method |
US20050128110A1 (en) * | 2003-12-10 | 2005-06-16 | Matsushita Electric Industrial Co., Ltd. | A/D converter apparatus and D/A converter apparatus |
US20070208565A1 (en) * | 2004-03-12 | 2007-09-06 | Ari Lakaniemi | Synthesizing a Mono Audio Signal |
US20050259828A1 (en) * | 2004-04-30 | 2005-11-24 | Van Den Berghe Guido | Multi-channel compatible stereo recording |
US20120131462A1 (en) * | 2010-11-24 | 2012-05-24 | Hon Hai Precision Industry Co., Ltd. | Handheld device and user interface creating method |
Non-Patent Citations (2)
Title |
---|
"Liquifier Pro 4.0 for Windows User's Guide", dated 1998, Liquid Audio Inc., 468 pages. * |
"Sound Forge 7.0", released 09/02/2004, Sony Pictures Digital Inc., pages 1-274, URL: * |
Also Published As
Publication number | Publication date |
---|---|
US20060150072A1 (en) | 2006-07-06 |
US8271872B2 (en) | 2012-09-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8271872B2 (en) | Composite audio waveforms with precision alignment guides | |
US11157154B2 (en) | Media-editing application with novel editing tools | |
US7869892B2 (en) | Audio file editing system and method | |
JP3185505B2 (en) | Meeting record creation support device | |
US8966367B2 (en) | Anchor override for a media-editing application with an anchored timeline | |
US8875025B2 (en) | Media-editing application with media clips grouping capabilities | |
US10380773B2 (en) | Information processing apparatus, information processing method, and computer readable medium | |
US20030124502A1 (en) | Computer method and apparatus to digitize and simulate the classroom lecturing | |
US20140035920A1 (en) | Colorization of audio segments | |
JP2003052011A (en) | Video editing method and system for editing video project | |
KR20070090751A (en) | Image displaying method and video playback apparatus | |
US11747972B2 (en) | Media-editing application with novel editing tools | |
US20040177317A1 (en) | Closed caption navigation | |
US7827297B2 (en) | Multimedia linking and synchronization method, presentation and editing apparatus | |
WO2022001579A1 (en) | Audio processing method and apparatus, device, and storage medium | |
Gohlke et al. | Track displays in DAW software: Beyond waveform views | |
US7571064B2 (en) | Display digital signal visualizations with increasing accuracy | |
JP2007267356A (en) | File management program, thumb nail image display method, and moving image reproduction device | |
US9817829B2 (en) | Systems and methods for prioritizing textual metadata | |
JP2003037806A (en) | Nonlinear editing method, device thereof program and storing medium recording the same | |
KR20140137219A (en) | Method for providing s,e,u-contents by easily, quickly and accurately extracting only wanted part from multimedia file | |
CN110662104B (en) | Video dragging bar generation method and device, electronic equipment and storage medium | |
Whitt | Audio-Video Capture, Conversion, and Editing Software | |
Loviscach | A nimble video editor that puts audio first | |
Gerhard et al. | Focus-Plus-Context Audio Interaction Design |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |