Motion data format
Understanding the motion capture data format is crucial for integrating Move API output into your applications. This guide explains the structure and content of the motion capture data returned by the API.
Overview
Motion capture data from the Move API contains 3D skeletal animation information that can be used in games, animations, analysis, and other applications. The data is structured to be both human-readable and machine-processable.
Data structure
Take metadata
Each take includes metadata about the capture:
{
  "id": "take_789012",
  "duration": 5.2,
  "frame_count": 156,
  "frame_rate": 30,
  "model_used": "s1",
  "created_at": "2024-01-15T10:35:00Z",
  "coordinate_system": {
    "origin": [0, 0, 0],
    "units": "meters",
    "up_axis": "Y"
  }
}
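The metadata fields are mutually consistent: duration equals frame_count / frame_rate (here 156 / 30 = 5.2 s). A quick sanity check along these lines can catch truncated or mislabeled takes when ingesting data:

```python
# Take metadata (values copied from the example above)
metadata = {
    "duration": 5.2,
    "frame_count": 156,
    "frame_rate": 30,
}

# duration should equal frame_count / frame_rate
expected = metadata["frame_count"] / metadata["frame_rate"]
assert abs(metadata["duration"] - expected) < 1e-9
print(f"{metadata['frame_count']} frames at {metadata['frame_rate']} fps = {expected} s")
```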
Frame data
Motion capture data is organized by frames, with each frame containing the 3D positions of all skeletal joints:
{
  "frames": [
    {
      "frame_number": 0,
      "timestamp": 0.0,
      "joints": {
        "Hips": {
          "position": [0.0, 1.0, 0.0],
          "rotation": [0.0, 0.0, 0.0, 1.0],
          "confidence": 0.95
        },
        "Spine": {
          "position": [0.0, 1.2, 0.0],
          "rotation": [0.0, 0.0, 0.0, 1.0],
          "confidence": 0.92
        }
      }
    }
  ]
}
Joint structure
Standard humanoid joints
The Move API uses a standard humanoid joint hierarchy:
Hips
├── Spine
│   └── Chest
│       ├── Neck
│       │   └── Head
│       ├── LeftShoulder
│       │   └── LeftArm
│       │       └── LeftForeArm
│       │           └── LeftHand
│       │               ├── LeftHandIndex1
│       │               └── LeftHandThumb1
│       └── RightShoulder
│           └── RightArm
│               └── RightForeArm
│                   └── RightHand
│                       ├── RightHandIndex1
│                       └── RightHandThumb1
├── LeftHip
│   └── LeftUpLeg
│       └── LeftLeg
│           └── LeftFoot
│               └── LeftToeBase
│                   └── LeftToeEnd
└── RightHip
    └── RightUpLeg
        └── RightLeg
            └── RightFoot
                └── RightToeBase
                    └── RightToeEnd
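For traversal in code, the hierarchy can be represented as a parent map. A minimal sketch (joint names taken from the tree above; only the left-arm chain is shown for brevity, and `PARENTS` and `chain_to_root` are illustrative helpers, not part of the API):

```python
# Parent of each joint; None marks the root. Left-arm subset for brevity.
PARENTS = {
    "Hips": None,
    "Spine": "Hips",
    "Chest": "Spine",
    "Neck": "Chest",
    "Head": "Neck",
    "LeftShoulder": "Chest",
    "LeftArm": "LeftShoulder",
    "LeftForeArm": "LeftArm",
    "LeftHand": "LeftForeArm",
}

def chain_to_root(joint):
    """Return the list of joints from `joint` up to the root."""
    chain = []
    while joint is not None:
        chain.append(joint)
        joint = PARENTS[joint]
    return chain

print(chain_to_root("LeftHand"))
# ['LeftHand', 'LeftForeArm', 'LeftArm', 'LeftShoulder', 'Chest', 'Spine', 'Hips']
```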
Joint data format
Each joint contains:
- Position: 3D coordinates [x, y, z] in meters
- Rotation: Quaternion [x, y, z, w] representing orientation
- Confidence: Confidence score (0.0 to 1.0) for tracking accuracy
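Because positions are in meters and rotations are unit quaternions, two quick derived checks are possible: the distance between a parent and child joint gives the bone length, and a quaternion's magnitude should be ~1. A minimal sketch using the example frame values above (`bone_length` and `is_unit_quaternion` are illustrative helpers):

```python
import math

def bone_length(parent_pos, child_pos):
    """Euclidean distance between two joint positions (meters)."""
    return math.dist(parent_pos, child_pos)

def is_unit_quaternion(q, tol=1e-6):
    """True if the [x, y, z, w] quaternion has magnitude ~1."""
    return abs(math.sqrt(sum(c * c for c in q)) - 1.0) < tol

hips_pos = [0.0, 1.0, 0.0]   # values from the frame example above
spine_pos = [0.0, 1.2, 0.0]
print(round(bone_length(hips_pos, spine_pos), 6))  # 0.2
print(is_unit_quaternion([0.0, 0.0, 0.0, 1.0]))    # True
```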
Export formats
The Move API supports multiple output formats for different use cases:
FBX format
FBX (Filmbox) is a proprietary format widely used in 3D animation:
- Applications: Maya, Blender, Unity, Unreal Engine
- Content: Skeletal animation with mesh data
- Advantages: Industry standard, rich metadata
- File Size: Larger due to binary format
BVH format
BVH (Biovision Hierarchy) is a text-based motion capture format:
- Applications: Motion analysis, research, some 3D software
- Content: Hierarchical skeletal data
- Advantages: Human-readable, compact
- File Size: Smaller than FBX
USD formats
USD (Universal Scene Description) is an open-source format for 3D scene data:
- USDC: Binary USD format for efficient storage and transmission
- USDZ: Compressed USD format optimized for iOS and AR applications
- Applications: Maya, Blender, Houdini, Omniverse, custom pipelines
- Content: Skeletal animation with scene composition capabilities
- Advantages: Open standard, highly composable, efficient for large scenes
- File Size: Optimized for complex scenes and pipelines
GLB format
GLB (GL Binary) is the binary format for glTF:
- Applications: Web applications, mobile apps, AR/VR
- Content: 3D models with animations
- Advantages: Compact, web-optimized, widely supported
- File Size: Efficient binary format
Blend format
Blend is the native format for Blender:
- Applications: Blender 3D software
- Content: Complete scene data with animations
- Advantages: Native Blender format, preserves all data
- File Size: Varies based on scene complexity
C3D format
C3D (Coordinate 3D) is a standard format for motion capture:
- Applications: Biomechanics, sports analysis, research
- Content: 3D coordinate data with analog data support
- Advantages: Industry standard for motion analysis
- File Size: Efficient for coordinate data
JSON format
JSON provides programmatic access to motion capture data:
- Applications: Custom applications, analysis, web integration
- Content: Raw motion capture data with metadata
- Advantages: Easy to parse, flexible structure
- File Size: Moderate, depends on frame count
Video outputs
- Render Video: Preview video showing the motion capture data
- Render Overlay Video: Preview video with motion capture data overlaid on the original video (single camera only)
Other outputs
- Sync Data: Timing information about video offsets (.pkl format)
- Motion Data: Raw motion capture data in JSON format
Coordinate system
World coordinates
The Move API uses a right-handed coordinate system:
- X-axis: Left to right
- Y-axis: Up (vertical)
- Z-axis: Forward (depth)
Units
- Distance: Meters
- Rotation: Unit quaternions [x, y, z, w] (components are dimensionless)
- Time: Seconds
- Confidence: 0.0 to 1.0 (no units)
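When targeting an application that uses different units (for example an engine that works in centimeters), positions need scaling, and timestamps can be mapped back to frame indices via the frame rate. A minimal sketch (`to_centimeters` and `timestamp_to_frame` are illustrative helpers):

```python
def to_centimeters(position_m):
    """Scale an [x, y, z] position from meters to centimeters."""
    return [c * 100.0 for c in position_m]

def timestamp_to_frame(t_seconds, frame_rate=30):
    """Map a timestamp in seconds to the nearest frame index."""
    return round(t_seconds * frame_rate)

print(to_centimeters([0.0, 1.0, 0.0]))  # [0.0, 100.0, 0.0]
print(timestamp_to_frame(5.2))          # 156
```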
Data quality
Confidence scores
Each joint includes a confidence score indicating tracking quality:
- 0.9-1.0: Excellent tracking
- 0.7-0.9: Good tracking
- 0.5-0.7: Fair tracking
- 0.0-0.5: Poor tracking or occluded
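These bands map directly to a small classifier; a minimal sketch (the band boundaries follow the table above, with boundary values assigned to the higher band here as a design choice):

```python
def confidence_band(score):
    """Map a joint confidence score (0.0-1.0) to a quality band."""
    if score >= 0.9:
        return "excellent"
    if score >= 0.7:
        return "good"
    if score >= 0.5:
        return "fair"
    return "poor/occluded"

print(confidence_band(0.95))  # excellent
print(confidence_band(0.62))  # fair
```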
Quality factors
Tracking quality depends on:
- Model Used: s2 and m2 provide higher accuracy
- Camera Setup: Multi-camera setups improve quality
- Lighting: Good lighting improves tracking
- Occlusion: Hidden body parts reduce confidence
- Motion Speed: Very fast movements may reduce accuracy
Working with motion capture data
Python example
import json

# Load motion capture data
with open("motion_data.json", "r") as f:
    motion_data = json.load(f)

# Access frame data
for frame in motion_data["frames"]:
    frame_num = frame["frame_number"]
    timestamp = frame["timestamp"]

    # Access joint positions
    hips_pos = frame["joints"]["Hips"]["position"]
    spine_pos = frame["joints"]["Spine"]["position"]

    print(f"Frame {frame_num}: Hips at {hips_pos}")
JavaScript example
// Load motion capture data
fetch('motion_data.json')
  .then(response => response.json())
  .then(data => {
    // Process frames
    data.frames.forEach(frame => {
      const hips = frame.joints.Hips;
      const confidence = hips.confidence;
      if (confidence > 0.8) {
        console.log(`High confidence frame: ${frame.frame_number}`);
      }
    });
  });
Data processing
Filtering by confidence
def filter_high_confidence_frames(motion_data, threshold=0.8):
    filtered_frames = []
    for frame in motion_data["frames"]:
        # Keep a frame only if every joint clears the confidence threshold
        all_high_confidence = all(
            joint["confidence"] > threshold
            for joint in frame["joints"].values()
        )
        if all_high_confidence:
            filtered_frames.append(frame)
    return filtered_frames
Converting to different formats
def convert_to_custom_format(motion_data):
    custom_data = {
        "animation": [],
        "metadata": {
            "duration": motion_data["duration"],
            "frame_rate": motion_data["frame_rate"]
        }
    }
    for frame in motion_data["frames"]:
        frame_data = {
            "time": frame["timestamp"],
            "positions": {},
            "rotations": {}
        }
        for joint_name, joint_data in frame["joints"].items():
            frame_data["positions"][joint_name] = joint_data["position"]
            frame_data["rotations"][joint_name] = joint_data["rotation"]
        custom_data["animation"].append(frame_data)
    return custom_data
Best practices
Data validation
- Check Confidence: Filter out low-confidence frames
- Validate Coordinates: Ensure positions are within expected ranges
- Check Completeness: Verify all expected joints are present
Performance optimization
- Frame Sampling: Use every Nth frame for real-time applications
- Joint Filtering: Only process joints relevant to your use case
- Caching: Cache processed motion capture data for repeated use
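Frame sampling and joint filtering can both be expressed in a few lines; a minimal sketch (`sample_frames` and `filter_joints` are illustrative helpers, not part of the API):

```python
def sample_frames(frames, step=3):
    """Keep every `step`-th frame (e.g. 30 fps -> 10 fps when step=3)."""
    return frames[::step]

def filter_joints(frame, wanted=("Hips", "Spine")):
    """Drop joints the application does not need."""
    return {name: frame["joints"][name] for name in wanted if name in frame["joints"]}

# Synthetic frames matching the structure described above
frames = [
    {"frame_number": i, "joints": {"Hips": {}, "Spine": {}, "Head": {}}}
    for i in range(156)
]
print(len(sample_frames(frames)))        # 52
print(sorted(filter_joints(frames[0])))  # ['Hips', 'Spine']
```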
Integration
- Coordinate System: Ensure your application uses the same coordinate system
- Units: Convert units if your application uses different measurements
- Frame Rate: Handle frame rate differences between source and target
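As an example of the coordinate-system point, converting a position from the API's right-handed Y-up convention to a right-handed Z-up convention (used by Blender, for instance) amounts to a +90° rotation about X. A minimal sketch (`y_up_to_z_up` is an illustrative helper):

```python
def y_up_to_z_up(position):
    """Rotate an [x, y, z] position +90 degrees about X:
    right-handed Y-up -> right-handed Z-up."""
    x, y, z = position
    return [x, -z, y]

# A point one meter up and half a meter forward in Y-up space
print(y_up_to_z_up([0.0, 1.0, 0.5]))  # [0.0, -0.5, 1.0]
```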
Next steps
- API Reference - Detailed API documentation
- Usage Guides - Practical implementation examples
- GitHub Recipes - Code samples and tutorials