Smart Yoga Instructor for Guiding and Correcting Yoga Postures in Real Time : International Journal of Yoga


Brief Communication

Smart Yoga Instructor for Guiding and Correcting Yoga Postures in Real Time

Kishore, D Mohan; Bindu, S; Manjunath, Nandi Krishnamurthy

International Journal of Yoga 15(3):p 254-261, Sep–Dec 2022. | DOI: 10.4103/ijoy.ijoy_137_22



Yoga is a popular form of exercise practiced all over the world for its physical, mental, and spiritual benefits. Practicing Yoga inaccurately without professional guidance, or pushing beyond one's flexibility limit into a wrong posture, could lead to pain and added muscular problems. Hence, performing Yoga postures accurately is important. A trainer monitoring the correctness of the performance could be beneficial, but practitioners may be short of time due to work pressure, and an instructor for personal classes could be expensive.

The COVID pandemic has created awareness of the health benefits of practicing Yoga and has also made people apprehensive about taking assistance from Yoga practitioners. Computer vision techniques for Yoga posture estimation and correction could be a promising solution but are seldom used in the domain of health and exercise due to limited literature.[1] In this study, an artificially intelligent system has been created that identifies the performed posture and guides the user, visually and through audio, on the correctness of the posture in real time through the various steps of the yogic cycle.[2] OpenCV, an optimized open-source library for computer vision, machine learning, and image processing that supports multiple languages including Python, was used for detection and identification.[3] A model was trained to estimate the closeness of the pose performed in real time or in images and videos. The pose estimation technique determines the key points, or positions of the joints, to form a skeleton, which is then used to determine the angles between the points and to give corrective feedback on the pose performed. A dataset of standard postures for Ardha Chandrasana/Half-moon pose, Tadasana/Mountain pose, Trikonasana/Triangular pose, Veerabhadrasana/Warrior pose, and Vrikshasana/Tree pose was created and used for training and testing the model. The dataset consisted of approximately 6000 images of the above five postures, of which 75% was used for training the model and 25% for testing. Human body modelling and human pose estimation are described below.

Human Body Modelling

The human body can be modeled for pose estimation by determining the position of the joints in the human skeleton drawn over the image. Commonly, kinematic models are used, which carry information about the body through its limbs and joints. Human body modeling can be done by different methods: the skeleton-based model, where the key points represent the joints of the human body; the planar model, in which multiple rectangular boxes represent the shape of the human body; or the volumetric model, which represents the human body in 3D with shapes and poses, as shown in Figure 1.

Figure 1:
Human body modelling

However, the skeleton-based model does not represent the texture or shape of the body. Identifying the joints during pose estimation can also be challenging, as it depends on the background, clothing, lighting, and view angle.

Human Pose Estimation

Computer vision is used to estimate the human pose by identifying human joints as key points in images or videos, for example, the left shoulder, right knee, elbows, and wrists.[4] Pose estimation seeks an exact pose in the space of all performed poses. It can be done by single-pose or multi-pose estimation: a single object is estimated by the single-pose method, and multiple objects are estimated by the multi-pose method.[5] Human posture assessment can be done mathematically, by what are called generative strategies, or pictorially, by what are called discriminative strategies.[6] Image processing techniques use AI-based models such as convolutional neural networks (CNNs), tailoring the CNN architecture for human pose inference.[7] Pose estimation can follow either a bottom-up or a top-down approach. In the bottom-up approach, body joints are first estimated and then grouped to form unique poses, whereas top-down methods first detect a bounding box and only then estimate the body joints.[8]

Pose Estimation Methods

Pose estimation with deep learning

When it comes to object detection, deep learning systems outperform traditional computer vision techniques. As a result, pose estimation can be significantly improved by deep learning approaches.[9,10] Epipolar Pose, Open Pose, Posenet, and Mediapipe are a few of the popular pose estimation techniques. From a 2D image of a human pose, Epipolar Pose creates a 3D structure. This architecture's primary benefit is that no ground truth data are needed.[11] First, a 2D image of the human pose is taken, and then a 3D pose estimator is trained using epipolar geometry.[12] The requirement for at least two cameras is its principal drawback. Another 2D method for pose estimation is Open Pose. Its input images can also come from webcams and CCTV cameras. The benefit of Open Pose is the simultaneous identification of key points on the torso, face, and limbs.[13] Posenet can estimate single or multiple poses from video inputs and is independent of image size, so it provides accurate results whether the image is enlarged or reduced.[14,15,16] Mediapipe is a reliable pose estimation architecture that identifies 33 key points in a color image. It can estimate poses from videos and detect sign language and gestures, as well as physical workouts such as Yoga, dance, and other fitness postures. It can also recognize gestures used for control.[17]

Pose estimation for fitness-related movements is difficult because it involves multiple postures with a lot of room for interpretation and depends on the clothing worn at the time. The literature reports that Mediapipe is fast, accurate, and reliable. We concur, having previously found Mediapipe to be superior to several of the current techniques. Mediapipe, however, is unable to identify the neck key point. It also has problems with lighting and background contrast and requires considerable processing time.

In a separate study, we compared different pose estimation techniques, including Epipolar Pose, Open Pose, Posenet, and Mediapipe, and found that Mediapipe provided the highest accuracy. In this work, we therefore discuss posture estimation and correction using Mediapipe exclusively.

Pose Estimation with Mediapipe

For pose estimation, the input image is fed to the Mediapipe library to extract the key points for detection. A set of coordinates along the X, Y, and Z axes is generated for 33 major key points of the human body. The major portion of the body in the input image is identified from the extracted coordinates, and a skeleton is formed on the image. The extracted key points are indexed from 0 to 32: the first 11 landmarks detect the facial region; the next 11 landmarks determine the upper part of the body, such as the shoulders, elbows, wrists, and hands, with an estimate of three fingers (little finger, index finger, and thumb) on both hands; and the final 11 key points define the lower body, consisting of the hips, knees, legs, and feet. Together, the key points give a complete orientation of the body in 3D space. A skeleton of the human pose is drawn with the help of these points and is then used to derive the angles between them, enabling us to correct the user's Yoga poses effectively. As our work involves full-body pose estimation, we have not extracted the facial features; instead, 14 key points other than the facial features are extracted.
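
As an illustration of the key point selection described above, the following minimal Python sketch drops the facial landmarks and keeps a 14-point body subset. The text does not list which 14 key points were used, so the `BODY_KEYPOINTS` table is an assumption (shoulders, elbows, wrists, hips, knees, ankles, and foot tips, using Mediapipe's pose landmark indices), and `select_body_keypoints` is a hypothetical helper, not a function of the Mediapipe library.

```python
from typing import Dict, Sequence, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) of one landmark

# ASSUMPTION: the paper does not name its 14 key points; this subset is
# illustrative only. Indices follow Mediapipe's 33-landmark pose layout.
BODY_KEYPOINTS: Dict[str, int] = {
    "left_shoulder": 11, "right_shoulder": 12,
    "left_elbow": 13, "right_elbow": 14,
    "left_wrist": 15, "right_wrist": 16,
    "left_hip": 23, "right_hip": 24,
    "left_knee": 25, "right_knee": 26,
    "left_ankle": 27, "right_ankle": 28,
    "left_foot_index": 31, "right_foot_index": 32,
}


def select_body_keypoints(landmarks: Sequence[Point3D]) -> Dict[str, Point3D]:
    """Keep only the named body key points from a full 33-landmark pose."""
    if len(landmarks) != 33:
        raise ValueError("expected 33 landmarks, got %d" % len(landmarks))
    return {name: landmarks[index] for name, index in BODY_KEYPOINTS.items()}
```

In use, `landmarks` would be the list of (x, y, z) triples produced by the pose estimator for one image; the returned dictionary is what the later angle calculations would operate on.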

Literature Review

Practicing Yoga without an instructor is the need of the hour, but improper practice could lead to injury. Several researchers have reported Yoga posture estimation and correction methods to solve this problem. One work provided concise feedback for Natarajasana, Trikonasana, Vrikshasana, Virbhadrasana 1 and 2, and Utkatasana. It achieved a classification accuracy of 95% for pose identification and provided feedback to improve the posture performed. The authors identified the difference in the location of each key point with respect to its neighboring key point and tested for any mistakes performed.[18] Pose Tutor is an AI-based explainable pose recognition and correction system that combines vision and pose skeleton models to predict the pose. An angle detection mechanism was used for pose prediction and to detect wrongly formed joints.[19] TAGteach, an acoustic guiding system involving auditory feedback such as the generation of sound when the desired behavior occurs, has been implemented to correct posture in sports, dance, and walking. However, no correction procedure has been discussed in that work.[20] The Yoga Tutor system captures user motion through a mobile camera and sends it to a pose detection system implemented using the Open Pose method, which uses a time-distributed CNN, LSTM, and SoftMax regression to analyze and predict the user pose.[21] A similar pose detection method has been implemented using only a PC camera.[22] A similar efficient and low-cost human pose estimation technique based on computer vision has also been reported.[23] Yoga hand gesture identification for five postures using XGBoost with Random Search CV models has been presented, achieving an accuracy of 99.2%.[24] Correction using the cosine technique has been proposed with the Open Pose architecture. All human joints were extracted and connected using the greedy technique. The Euclidean angle between the body parts was calculated and compared with the reference angles.[25] However, because the literature on using Mediapipe for full-body posture estimation is limited, in our work we have explored the use of the Mediapipe architecture for full-body posture estimation and have performed correction by dividing the image into four quadrants and comparing them using a rule-based algorithm.


The participants involved in creating the database of selected Yoga postures were both men and women, graduates and undergraduates from medical backgrounds. They were BNYS, naturopathy, or BSc/MSc Yoga degree holders, or Yoga practitioners, aged 18–35 years. Participants who were mentally and psychologically fit, who had no history of any serious disease, and who had no significant addiction to alcohol or smoking were considered.

An online questionnaire on Yoga practice, the presence of any ailment, and any addiction to alcohol or smoking was administered, along with a consent form stating willingness to have their images included in the database. During the selection of participants, it was ensured that only those who had regularly practiced any form of Yoga for over 2–3 years were considered. Physically inactive people and people with serious health issues were excluded.

Procedure of Data Collection

A large amount of data was collected because datasets for the chosen Yoga postures were unavailable. For dataset creation, the participants were asked to perform Ardha Chandrasana/Half-moon pose, Tadasana/Mountain pose, Trikonasana/Triangular pose, Veerabhadrasana/Warrior pose II, and Vrikshasana/Tree pose. As the dataset had 6000 postures, a majority of the images were taken with a high-definition Panasonic camera, web camera, or mobile camera by participants at a place and time convenient to them. Yoga pose videos and images were captured at a distance of 4–5 m in front of the camera.

The Yoga pose dataset comprised both males and females performing at different locations, at a place and time convenient to them. To make the data realistic and to train the model for real-life environments, the images and videos were captured in living rooms, garden areas, terraces, and studios. A few images without proper illumination were deliberately included to make the model more robust during training.

The complete posture correction system is shown in Figure 2. Initially, the image of a Yoga practitioner performing an asana is captured and fed to Mediapipe, a pretrained pose estimation model that detects human postures in images or videos by extracting the key points. A rule-based algorithm divides the input image into four quadrants, and the key points lying within each quadrant are compared with the standard key points. Using the trained dataset, real-time pose estimation and correction are implemented. If the pose does not match any of the selected Yoga postures (asanas) in the database, an error is shown.

Figure 2:
Block diagram of the system
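
The quadrant-based rule is not specified in detail in the text. A minimal sketch, assuming normalized image coordinates in [0, 1], quadrants split at the image center, and a match defined as a key point falling in the same quadrant as its reference counterpart, might look like the following. The function names `quadrant` and `mismatched_keypoints` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

Point2D = Tuple[float, float]  # normalized (x, y), origin at the top-left


def quadrant(x: float, y: float) -> str:
    """Name the image quadrant containing (x, y).

    ASSUMPTION: quadrants are split at the image center (0.5, 0.5).
    """
    vertical = "top" if y < 0.5 else "bottom"
    horizontal = "left" if x < 0.5 else "right"
    return vertical + "_" + horizontal


def mismatched_keypoints(test: Dict[str, Point2D],
                         reference: Dict[str, Point2D]) -> List[str]:
    """List key points that are missing or lie in a different quadrant
    than the same key point in the reference pose."""
    wrong = []
    for name, ref_point in reference.items():
        test_point = test.get(name)
        if test_point is None or quadrant(*test_point) != quadrant(*ref_point):
            wrong.append(name)
    return sorted(wrong)
```

An empty list would indicate that every key point sits in the expected quadrant; a non-empty list names the key points a finer, angle-based check could then examine.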

Data analysis

A step-by-step description of the implementation of the proposed artificially intelligent system:

  • Obtaining the dataset of 6000 images and dividing it into training (80%) and testing (20%) datasets
  • Classifying the images into the 5 Yoga poses and labelling them
  • Pre-training a deep learning model using Google Teachable Machine with 100 epochs, a batch size of 32, and a learning rate of 0.001
  • Extracting key points using Mediapipe; the image is divided into four quadrants, and the key points lying within each quadrant are compared with the standard key points extracted from the reference images
  • Capturing the test image using a camera and giving it as input to the pretrained model to detect all the key points
  • The key points detected by the pretrained model give a skeletal view of the pose
  • In the correction model, the slope formula and the tan formula are used to find the angles between the key points
  • The angles from the key points of the test image are compared with those of the reference image
  • The difference in angle between the test and the reference image is used to correct the posture: if the difference is positive, the correction direction is downward, and if negative, it is upward
  • Pose correction is conveyed by voice and text. Key point extraction was used along with Google text-to-speech and speech-to-text for assistance.
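
The first step in the list, splitting the image set into training and testing portions, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the shuffling, the fixed seed, and the helper name `split_dataset` are assumptions, and the training fraction is left as a parameter.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T],
                  train_fraction: float = 0.8,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the items reproducibly and split them into (train, test)."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # ASSUMPTION: a seeded shuffle, so that the split can be reproduced.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

With 6000 images and the default fraction, this yields 4800 training and 1200 testing images. In practice one would probably split each pose class separately so that all five postures are represented in both portions.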


Pose estimation has been done using Mediapipe for the five Yoga postures (asanas) considered. For simplicity, images of the same individual are shown (after taking consent) for all estimations and comparisons. The five Yoga poses considered for posture estimation are:

  1. Ardha Chandrasana/Half-moon pose: The subject has a Yoga block handy at the front right-hand corner of the mat. Start in Warrior 2 with the right foot at the front of the mat and the front knee in line with the toes. Place the left hand on the hip, reach out and then down with the right arm, and place the fingertips in front of the right toes. Step the back foot a bit forward and shift the weight into the right leg. While pressing the right foot down, begin to extend the standing leg as the left leg floats up in line with the hips. Place the right hand on the block directly under the shoulder, toward the little-toe side of the right foot. To find stability in this pose, bring the left leg slightly more forward rather than backward, as it will have the tendency to float in the space behind.
  2. Tadasana/Mountain pose: Tada means a mountain. Sama means upright, straight, unmoved. Sthiti is standing still, steadiness. Tadasana, therefore, implies a pose where one stands firm and erect as a mountain. In Tadasana, the arms are stretched out over the head, but for the sake of convenience, the subject placed them by the side of the thighs.
  3. Trikonasana/Triangular pose: Stand straight. Separate the feet comfortably wide apart. Turn the right foot out 90° and the left foot in by 15°. Align the center of the right heel with the center of the arch of the left foot. Ensure that the feet are pressing the ground and the weight of the body is equally balanced on both feet. Inhale deeply and, while exhaling, bend the body to the right, downward from the hips, keeping the waist straight, allowing the left hand to come up in the air while the right hand comes down toward the floor. Keep both arms in a straight line.
  4. Veerabhadrasana/Warrior pose II: Stand in Tadasana. Raise both arms above the head; stretch up and join the palms. Take a deep inhalation and with a jump spread the legs apart sideways 4–4 1/5 feet. Exhale, turn to the right. Simultaneously turn the right foot 90° to the right and the left foot slightly to the right. Flex the right knee till the right thigh is parallel to the floor and the right shin perpendicular to the floor, forming a right angle between the right thigh and the right calf. The bent knee should not extend beyond the ankle, but should be in line with the heel. Stretch out the left leg and tighten at the knee. The face, chest, and right knee should face the same way as the right foot, as illustrated. Throw the head up, stretch the spine from the coccyx, and gaze at the joined palms.
  5. Vrikshasana/Tree pose: In this posture, the subject bends the right leg at the knee and places the right heel at the root of the left thigh. Rest the foot on the left thigh, toes pointing downward. Balance on the left leg, join the palms, and raise the arms straight over the head. The same was repeated on the other side.

The results of the pose estimation using Mediapipe are given in Figure 3.

Figure 3:
Key point detection by Mediapipe for the postures mentioned from a-e


Pose estimation of the five Yoga postures was carried out with the Mediapipe architecture, as shown in Figure 3. As described in the data analysis, sample images were captured in real time and fed individually to the model to estimate the posture accuracy. The average accuracy is tabulated in Table 1. Accuracy is calculated as the classification score, which is the ratio of the number of correct predictions (CP) to the total number of predictions (TP), where the total number of predictions is the sum of CP and the number of wrong predictions (WP).
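
Written as a formula, with CP the number of correct predictions, WP the number of wrong predictions, and TP the total number of predictions, the classification score is:

```latex
\[
\text{Accuracy}\,(\%) = \frac{CP}{TP} \times 100, \qquad TP = CP + WP
\]
```

For instance, with hypothetical counts CP = 85 and WP = 15 (illustrative values, not taken from Table 1), TP = 100 and the accuracy is 85%.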

Table 1:
Comparison of postures with accuracy (%) of prediction

It may be observed from the table that the accuracy of prediction using Mediapipe is around 85%. The estimation accuracy is better than that of Epipolar Pose, Open Pose, and Posenet, which we have reported in another work.[26] Mediapipe is preferred as it does not need a dedicated platform for deployment, detects better in low light, and employs low-light filtration. It has the best library for detecting key points of the whole body accurately using a single camera. Further investigation would be required to extend this technique to other advanced postures for pose estimation and correction using the same methodology, which involves simple tools, with better accuracy, to help people practicing Yoga postures as a self-evaluation as well as a bio-feedback technique.

Pose correction

Pose correction is done by first extracting the 14 coordinate values using Mediapipe, as shown in Figure 4. The angle at each joint, for example, between the shoulder, elbow, and wrist, can then be calculated. Let P, Q, and R represent the three joints (points), where Q is the common point. The lines PQ and QR intersect at Q, and the angle between PQ and QR can be calculated using the basic slope formula.

Figure 4:
Sketch showing angle calculation at a key point
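
For two lines with slopes m1 and m2, the slope formula gives the angle θ between them as tan θ = |(m1 − m2) / (1 + m1·m2)|. The following minimal Python sketch computes the angle at Q between PQ and QR. It swaps in `math.atan2` on the two direction vectors instead of the slope ratio, which is an implementation choice made here (not stated in the text) so that vertical segments, whose slope is undefined, and angles above 90° are handled. The helper name `joint_angle` is hypothetical.

```python
import math
from typing import Tuple

Point2D = Tuple[float, float]


def joint_angle(p: Point2D, q: Point2D, r: Point2D) -> float:
    """Angle in degrees (0 to 180) at joint q between segments q-p and q-r."""
    if p == q or r == q:
        raise ValueError("p and r must differ from the joint q")
    # Direction of each segment, measured from the common point q.
    angle_qp = math.atan2(p[1] - q[1], p[0] - q[0])
    angle_qr = math.atan2(r[1] - q[1], r[0] - q[0])
    angle = abs(math.degrees(angle_qp - angle_qr))
    # Fold reflex angles back into the 0 to 180 range.
    return 360.0 - angle if angle > 180.0 else angle
```

For example, taking P as the shoulder, Q as the elbow, and R as the wrist, a fully extended arm would give an angle close to 180°, and an arm bent at a right angle would give about 90°.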

Referring to Figure 4, PQ and QR can be considered as two bones of the human skeletal structure. Taking the line PQ as the elbow and the line QR as the hand, the angle made between the elbow and hand can be calculated. Applying the same analysis to all the other joints gives the angle made at each joint. As an example, for one of the selected Yoga poses, the various angles after attaining the final posture in the left and right directions for the Warrior II pose (shown in Figure 5) are given in Table 2.

Figure 5:
Final position for Warrior II Pose

Table 2:
Key point values for Warrior II pose in terms of angle

During the data preparation stage, all the angles for each of the five Yoga poses are calculated in advance and stored in the database for reference.

Giving feedback to the user on the performed pose is of utmost importance. It guides the user to correct the posture if wrongly done and thus to learn to practice the Yoga pose correctly. Feedback on the user's performance is provided in real time via the display or audio messages. The user is notified when the pose deviates from the reference beyond a threshold value. Depending on the user's level of flexibility, the threshold can be set to 10° or 20° in either direction. Users can observe the correction and make the necessary adjustments to practice the Yoga routine accurately. The feedback is given as a visual alert on the screen and also as an audio message through a speaker, so that the user need not turn their head to read the message on the screen. Depending on the Yoga pose, the user may also be away from the screen and unable to read the text shown on it. Thus, a Bluetooth-connected speaker could be a good alternative for sending the feedback message to the user.
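
The threshold rule can be sketched as follows. This is a minimal illustration under stated assumptions: the deviation is taken as the measured angle minus the reference angle, the direction follows the rule given in the data analysis steps (positive difference means correct downward, negative means upward), and the function name `feedback_message` and the message wording are hypothetical.

```python
from typing import Optional


def feedback_message(joint: str,
                     measured_deg: float,
                     reference_deg: float,
                     threshold_deg: float = 10.0) -> Optional[str]:
    """Return a correction message, or None if the joint is within tolerance."""
    # ASSUMPTION: difference = measured angle - reference angle.
    difference = measured_deg - reference_deg
    if abs(difference) <= threshold_deg:
        return None
    direction = "down" if difference > 0 else "up"
    return "Move your %s %s by %.0f degrees" % (joint, direction, abs(difference))
```

The returned string could be shown on the display and passed to a text-to-speech engine; a `None` result means no correction needs to be announced for that joint.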

Step by step description of the result from the viewpoint of the end-user:

  • The voice assistant greets you with a good morning/good afternoon/good evening message and displays the Yoga poses available for the user to perform
  • Voice input is obtained from the user regarding the pose he/she wishes to perform
  • Once the input is obtained, a demo video of the selected asana is displayed to the user
  • The voice assistant also gives the steps to be followed to perform the pose
  • In addition, an image of the selected pose is displayed for user convenience
  • Now, the user has all the information about the asana he/she wants to perform
  • The voice assistant now asks the user which of the available poses is to be performed
  • The user has to input the time required to perform the asana (a trained person may require less time than others). The selected time is displayed to the user using a down counter
  • When the user performs the pose, a screenshot at the 0th second is captured and used for prediction
  • The key points on the user's pose are highlighted, and at the back end they are compared with those of the reference image
  • The correctness of the pose is checked. If there are any variations with respect to the reference image, the correction is conveyed to the user in the form of voice and text, for example, "Your accuracy is 85%; move your hand up/down by 10°"
  • Based on the correction the user will be able to perform the Yoga poses effectively
  • Hence pose correction is performed.
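
Two of the steps above, the time-of-day greeting and the down counter that ends with a capture at the 0th second, can be sketched briefly. The hour boundaries (noon and 17:00) and the helper names `greeting` and `countdown` are assumptions for illustration; the text does not specify them.

```python
from typing import List


def greeting(hour: int) -> str:
    """Pick a greeting from the hour of day (0 to 23).

    ASSUMPTION: morning ends at 12:00 and afternoon at 17:00.
    """
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if hour < 12:
        return "Good morning"
    if hour < 17:
        return "Good afternoon"
    return "Good evening"


def countdown(seconds: int) -> List[int]:
    """Values shown by the down counter, ending at 0 when the image is captured."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return list(range(seconds, -1, -1))
```

A real-time loop would display one counter value per second and trigger the screenshot when the value reaches 0.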

Limitations and future suggestions

It is observed that the accuracy of prediction using Mediapipe is around 85%. As this work involves capturing images from only one camera, the accuracy of a pose is lower. The accuracy for a few Yoga postures is also lower because Mediapipe does not detect the neck key point. The accuracy in each case could be increased further by enlarging the training dataset.

Further research would be needed to expand this technique for other advanced postures for pose estimation and correction using the same methodology which involves simple tools with better accuracy to assist individuals practicing Yoga postures as a self-evaluation as well as a bio-feedback mechanism.


Advancements in machine learning, artificial intelligence, and computer vision have made it possible to implement human pose estimation and correction that can be used effectively in the health and fitness sector. As Yoga is popular and widely accepted all over the globe, an assistive system that can guide a person to perform Yoga on their own premises without the need for a trainer has been implemented. Pose estimation for fitness applications is challenging and involves creating a huge database of asanas. A further challenge is the variety of appearances and outfits encountered while creating the database. In this work, a complete pose correction system using voice assistance and display messages is implemented. As further development, this work can be extended to areas such as gym workouts, Zumba, aerobics, physiotherapy for particular health conditions, and the effective treatment of a few chronic diseases through proper Yoga practice.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


References

1. Bakshi A, Sheikh D, Ansari Y, Sharma C, Naik H. Pose estimate based Yoga instructor. Int J Recent Adv Multidiscip Top. 2021;2:70–3.
2. Mahendran JK, Barry DT, Nivedha AK, Bhandarkar SM. Computer vision-based assistance system for the visually impaired using mobile edge artificial intelligence. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2021:2418–27.
3. Verma I, Marhatta U, Sharma S, Kumar V. Age prediction using image dataset using machine learning. Int J Innov Technol Explor Eng. 2020;8:107–13.
4. Zhao M, Li T, Abu Alsheikh M, Tian Y, Zhao H, Torralba A, et al. Through-wall human pose estimation using radio signals. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:7356–65.
5. Li J, Wang C, Zhu H, Mao Y, Fang HS, Lu C. Crowdpose: Efficient crowded scenes pose estimation and a new benchmark. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019:10863–72.
6. Sarafianos N, Boteanu B, Ionescu B, Kakadiaris IA. 3D Human pose estimation: A review of the literature and analysis of covariates. Comput Vis Image Underst. 2016;152:1–20.
7. Jiang Y, Li C. Convolutional neural networks for image-based high-throughput plant phenotyping: A review. Plant Phenomics. 2020;2020:4152816.
8. Wang M, Tighe J, Modolo D. Combining detection and tracking for human pose estimation in videos. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020:11088–96.
9. Pauzi AS, Bin MN, Sani S, Bataineh AM, Hisyam MN, Jaafar MH, et al. Movement estimation using mediapipe blazepose. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2021;13051:562–71.
10. Chai X, Zhou F, Chen X. Epipolar constraint of single-camera mirror binocular stereo vision systems. Opt Eng. 2017;56:1.
11. Pawang F. OpenPose: Human Pose Estimation Method. Last accessed on 2022 Jul 25. Available from:
12. Walch F, Hazirbas C, Leal-Taixe L, Sattler T, Hilsenbeck S, Cremers D. Image-Based Localization Using LSTMs for Structured Feature Correlation. Proceedings of the IEEE International Conference on Computer Vision. 2017:627–37.
13. He Y, Yan R, Fragkiadaki K, Yu SI. Epipolar transformer for multi-view human pose estimation. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops. 2020:4466–71.
14. Haque S, Rabby AS, Laboni MA, Neehal N, Hossain SA. ExNET: Deep Neural Network for Exercise Pose Detection. Communications in Computer and Information Science. 2019:186–93.
15. Mehta D, Sotnychenko O, Mueller F, Xu W, Elgharib M, Fua P, et al. Xnect: Real-time multi-person 3d human pose estimation with a single rgb camera. arXiv preprint arXiv:1907.00837. 2019.
16. Jose J, Shailesh S. Yoga Asana identification: A deep learning approach. IOP Conf Ser Mater Sci Eng. 2021;1110:012002.
17. Yadav SK, Singh A, Gupta A, Raheja JL. Real-time Yoga recognition using deep learning. Neural Comput Appl. 2019;31:9349–61.
18. Chaudhari A, Dalvi O, Ramade O, Ambawade D. Yog-Guru: Real-Time Yoga Pose Correction System Using Deep Learning Methods. Proceedings-International Conference on Communication, Information and Computing Technology (ICCICT). 2021:2021.
19. Dittakavi B, Bavikadi D, Desai SV, Chakraborty S, Reddy N, Balasubramanian VN, et al. Pose Tutor: An Explainable System for Pose Correction in the Wild. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022:3540–9.
20. Ennett TM, Zonneveld KLM, Thomson KM, Vause T, Ditor D. Comparison of two TAGteach error-correction procedures to teach beginner yoga poses to adults. J Appl Behav Anal. 2020;53:222–36.
21. Rishan F, De Silva B, Alawathugoda S, Nijabdeen S, Rupasinghe L, Liyanapathirana C. Infinity Yoga Tutor: Yoga Posture Detection and Correction System. Proceedings of ICITR 2020-5th International Conference on Information Technology Research: Towards the New Digital Enlightenment. 2020.
22. Thar MC, Winn KZ, Funabiki N. A Proposal of Yoga Pose Assessment Method Using Pose Detection for Self-Learning. 2019 International Conference on Advanced Information Technologies, ICAIT 2019. 2019:137–42.
23. Gupta A, Jangid A. Yoga Pose Detection and Validation. 2021 International Symposium of Asian Control Association on Intelligent Robotics and Industrial Automation, IRIA 2021. 2021:319–24.
24. Sharma A, Agrawal Y, Shah Y, Jain P. iYogacare: Real-Time Yoga Recognition and Self-Correction for Smart Healthcare. IEEE Consumer Electronics Magazine. 2022.
25. Muley S, Mahajan P, Medidar P, Chavan H, Patil P. Yoga guidance using human pose estimation. IRJMETS. 2020;2:1533–7.
26. Mohan Kishore D, Bindu SN. Estimation of yoga poses using machine learning techniques. Int J Yoga. 2022:15.

Keywords: Human posture recognition; Mediapipe; pose detection and pose correction; real-time; Yoga

© 2022 International Journal of Yoga | Published by Wolters Kluwer – Medknow