Aneesh Tickoo, Author at MarkTechPost
https://www.marktechpost.com/author/aneesh-tickoo/

Researchers from Yale and Google DeepMind Unlock Math Problem-Solving Success with Advanced Fine-Tuning Techniques on Large Language Models
https://www.marktechpost.com/2023/10/26/researchers-from-yale-and-google-deepmind-unlock-math-problem-solving-success-with-advanced-fine-tuning-techniques-on-large-language-models/ (Thu, 26 Oct 2023)


Even the most advanced large language models (LLMs), such as GPT-4 and PaLM 2, find mathematical problems difficult to solve because they call for imagination, mathematical reasoning, and computation. The chance that an LLM discovers a correct answer rises considerably when it is permitted to attempt a problem many times, so LLMs already demonstrate the potential to improve on this math problem-solving challenge. For instance, pre-trained PaLM 2-L reaches about 33.4% accuracy with greedy decoding, yet at least one accurate answer exists 79.4% of the time (pass@64) when sampling 64 solutions with temperature sampling (Table 1).

Table 1: Results of supervised solution fine-tuning, contrasting two sources of training data: the MATH dataset and the PRM800K dataset.
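For context, pass@k figures such as the 79.4% above are conventionally computed with the standard unbiased estimator from the sampling literature; the snippet below is a sketch of that general formula, not necessarily the paper's exact evaluation code.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: the probability that at least one of k
    solutions drawn (without replacement) from n samples, c of them
    correct, is correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples exist, so a hit is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=64, c=10, k=1))   # ~0.156: single-sample success rate
print(pass_at_k(n=64, c=10, k=64))  # 1.0: at least one of the 64 samples is correct
```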

This significant performance disparity shows that LLMs can often generate accurate answers but have difficulty distinguishing correct solutions from incorrect ones. To narrow the gap mentioned above, they therefore investigate task-specific fine-tuning techniques that might enhance the LLM's capacity for both solution generation and solution evaluation.

They examine three fine-tuning techniques: 

(1) Supervised step-by-step solution fine-tuning (SSFT). As a baseline, they study whether the pre-trained LLMs benefit from a supervised fine-tuning step, tuning the models to produce the complete step-by-step solution and final answer.

(2) Solution-cluster Reranking (SCR). They continue fine-tuning the generator as a solution evaluator for reranking candidate solutions, to improve the LLM's ability to evaluate solutions. While earlier research has explored such sample-then-rerank schemes, they offer a novel method that combines the advantages of majority voting with reranking while lowering ranking costs. More precisely, as in the preliminary stage of majority voting, they first sort the candidate answers into groups based on their mathematical equivalence; they then apply the solution evaluator only to the solutions in the most frequent clusters to further improve on the majority-vote outcome (a short code sketch of this pipeline follows the list below).

(3) Sequential multi-task fine-tuning. Beyond the solution evaluation task, they are also interested in enhancing the LLM's performance on solution generation and in determining whether the solution evaluation task's training objective can help the model generate solutions.

To achieve this, they design a sequential multi-task learning setup in which the solution evaluation task is framed as a natural language generation problem, so that its training objective can offer a valuable supervision signal to the solution generation model. In detail, they fine-tune the model in three stages: (1) as a generator (SSFT), (2) as a solution evaluator (SCR), and (3) again as a generator (SSFT).
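As a concrete illustration of the SCR pipeline referenced in technique (2), here is a minimal sketch. The helpers `answer_of` (final-answer extraction with mathematical-equivalence canonicalization) and `evaluator_score` (the fine-tuned evaluator's correctness score) are assumptions; the paper's exact implementation is not reproduced here.

```python
from collections import defaultdict

def solution_cluster_rerank(candidates, answer_of, evaluator_score, top_clusters=3):
    """Sketch of solution-cluster reranking (SCR) under assumed helpers."""
    clusters = defaultdict(list)
    for sol in candidates:                      # step 1: group by equivalent final answer
        clusters[answer_of(sol)].append(sol)
    shortlist = sorted(clusters.items(),        # step 2: majority-vote shortlist
                       key=lambda kv: len(kv[1]), reverse=True)[:top_clusters]
    scored = [(ans, sum(evaluator_score(s) for s in sols))  # step 3: rerank the shortlist
              for ans, sols in shortlist]
    return max(scored, key=lambda kv: kv[1])[0]
```

Restricting the evaluator to the most frequent clusters keeps the majority-voting prior while paying the reranking cost on only a handful of candidates.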

They conduct extensive experiments with PaLM 2-S* and PaLM 2-L, the small and large variants of PaLM 2, on the challenging MATH dataset, which yields the following conclusions:

• The quality and style of the step-by-step solutions can significantly influence the fine-tuned model: SSFT benefits more from fine-grained, well-formatted solutions.

• Reranking only the most frequent solution clusters yields better performance than reranking all of the solutions while also improving computational efficiency, which is why they recommend it as standard practice for future work.

• They demonstrate the benefit of training the model for both solution generation and evaluation tasks, presenting a successful attempt at leveraging the learning signal of a binary evaluation task for a generation model. Their proposed sequential multi-task fine-tuning improves the solution generation model more effectively than supervised solution fine-tuning alone.


Researchers from Google and the University of Toronto Introduce Groundbreaking Zero-Shot Agent for Autonomous Learning and Task Execution in Live Computer Environments
https://www.marktechpost.com/2023/10/25/researchers-from-google-and-the-university-of-toronto-introduce-groundbreaking-zero-shot-agent-for-autonomous-learning-and-task-execution-in-live-computer-environments/ (Wed, 25 Oct 2023)


Earlier efforts such as SAYCAN, REACT, TOOLFORMER, and SWIFTSAGE have shown the promise of large language models (LLMs) for action generation in various live environments, such as ALFWORLD and ALPHACODE. In these systems, LLMs are used to follow expert traces, understand environmental changes, plan and carry out future actions, and compose API calls. Several studies, including REFLEXION and SELF-REFINE, have demonstrated that repeatedly attempting a task over multiple rounds of self-reflection can significantly improve task completion: the LLM is asked to revise a previous execution plan in light of environmental feedback, and the revision is incorporated into the action generator's prompt for the next round.

MINIWOB++ has recently been used as a testbed to evaluate LLM performance on modularized computer tasks. Standard methods for learning a task rely on comprehensive trace examples, whether for direct supervision (WebGUM), self-supervision, or few/many-shot prompting (SYNAPSE). These have completed dozens of computer tasks with a completion rate above 90%, seemingly solving the computer-control problem. Nonetheless, the need for expert traces constrains an agent's capacity to learn new tasks. Can an agent independently learn and improve its control of a computer without using curated traces as guidance? Researchers from Google Research and the University of Toronto propose a zero-shot agent to answer this question.

Their agent is built on top of PaLM 2, a recent LLM, and uses a single set of instruction prompts for all tasks rather than task-specific prompts. In contrast, contemporary efforts like RCI, ADAPLANNER, and SYNAPSE use screen representations that may include far more information than what is actually displayed to the user. For instance, Fig. 1 illustrates items that are contained in the HTML provided to the LLM but are not shown on screen. Using this extra knowledge makes the task artificially easier to complete. In typical usage scenarios, however, such information may not be readily accessible, and depending on it could limit how broadly the agent can be applied.

Figure 1: Discrepancies between the screen and its HTML. Fig. 1a-1c show the social-media task (seed=2) before and after pressing the "more" button; the content is already exposed in the HTML before the click. Fig. 1d-1e: the click-tab-2 task (seed=0) exhibits the same problem.

They carefully examined 13 relatively difficult MINIWOB++ tasks that are meant to span multiple screens and found that 5 of them expose such multi-screen information in the HTML of a single observation. Their contributions are as follows. First, in comparison to earlier studies, they adopt a condensed screen representation, which makes the test environment more general and realistic. Second, they provide a simple but effective action planner that, in a single pass, plans out executable actions on a state; they demonstrate that with the most recent LLM capability this "naive" strategy can complete nearly all the easy tasks on the MINIWOB++ benchmark.

For more difficult tasks, they propose a structured thought-management strategy, influenced by Reflexion, that helps the agent learn from exploratory failures. After a few rounds of attempts, their agent achieves performance comparable to the prior few/many-shot state of the art. To the best of their knowledge, it is the first zero-shot agent design for computer-control tasks.
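Under this design, the agent's interaction loop can be sketched as follows. The environment wrapper `env` (exposing `reset()` returning a compact screen text and `step(action)` returning the new screen plus done/success flags) and the prompt wording are assumptions; only the high-level loop, one-pass planning plus Reflexion-style retries, follows the description above.

```python
def run_episode(llm, env, max_rounds: int = 3):
    """Sketch of a zero-shot computer-control agent with reflective retries."""
    reflections = []
    for _ in range(max_rounds):
        screen = env.reset()
        lessons = "\n".join(reflections)
        plan = llm(f"Screen:\n{screen}\nLessons from failed attempts:\n{lessons}\n"
                   "Plan the executable actions for this task, one per line:")
        done, success = False, False
        for action in plan.splitlines():        # single-pass plan, executed stepwise
            screen, done, success = env.step(action)
            if done:
                break
        if success:
            return True
        # Structured thought management: distil the failure into a lesson.
        reflections.append(llm(f"The attempt failed; final screen:\n{screen}\n"
                               "State one concrete lesson for the next attempt:"))
    return False
```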


Researchers from CMU and UC Santa Barbara Propose Innovative AI-Based 'Diagnosis of Thought' Prompting for Cognitive Distortion Detection in Psychotherapy
https://www.marktechpost.com/2023/10/24/researchers-from-cmu-and-uc-santa-barbara-propose-innovative-ai-based-diagnosis-of-thought-prompting-for-cognitive-distortion-detection-in-psychotherapy/ (Tue, 24 Oct 2023)


Worldwide, about one in eight people live with a mental disorder. Yet mental health care remains significantly underserved for various reasons, including a shortage of mental health specialists, subpar treatments, prohibitive costs, and societal stigma. Treatment coverage for mental health services is 33% in high-income regions and just 8% in low- and lower-middle-income ones. According to a recent APA report, six in ten psychologists "no longer have openings for new patients." Continuous work has therefore gone into creating automated tools for mental health support, such as empathetic chatbots and sentiment analysis, to lessen the impact of these circumstances.

Existing efforts, however, typically stop at superficial heuristics, such as emotion analysis or generating consoling responses. Such systems fall well short of contributing to professional psychotherapy, which calls for in-depth analysis of the patient's thought processes and techniques for constructing and reconstructing models of their cognition; widely used treatment paradigms such as cognitive behavioral therapy (CBT) and acceptance and commitment therapy (ACT) are built around exactly these techniques. Building professional support for psychotherapy is made harder still because most data sources documenting interactions between patients and licensed professionals are confidential.

Recent advances in large language model (LLM) development reveal an astounding aptitude for diverse textual reasoning problems in zero-shot settings. ChatGPT and GPT-4 produce highly promising results on the classic Sally-Anne test, which assesses the basic theory-of-mind capacity to ascribe mental states, including beliefs, emotions, and desires. Applying this capacity to intricate cognitive analysis and reasoning is likewise promising, so the moment is right to build expert, focused, structured AI support for psychotherapy. The authors take a first step in this work by examining the first essential procedure in CBT: the task of cognitive distortion detection.

Researchers from Carnegie Mellon University and the University of California, Santa Barbara propose Diagnosis of Thought (DoT) prompting, inspired by how psychotherapy specialists perform sophisticated diagnosis over a patient's speech. DoT diagnoses the patient's speech in three steps: subjectivity assessment, contrastive reasoning, and schema analysis. In subjectivity assessment, the patient's subjective thoughts are separated from the objective facts. In contrastive reasoning, the justifications supporting and contradicting the patient's thoughts are extracted. Finally, in schema analysis, the underlying thought schema is summarized and connected to the various types of cognitive distortion.
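The three-stage prompt chain can be written down directly. The prompt wording below is an assumption; only the stage structure (subjectivity assessment, contrastive reasoning, schema analysis) follows the description above, and `llm` stands for any prompt-to-completion function.

```python
def diagnosis_of_thought(llm, patient_speech: str) -> str:
    """Sketch of DoT prompting: three diagnostic stages over patient speech."""
    # Stage 1: subjectivity assessment - separate subjective thoughts from facts.
    subjectivity = llm(f"Patient: {patient_speech}\n"
                       "Separate the patient's subjective thoughts from the objective facts.")
    # Stage 2: contrastive reasoning - evidence for and against those thoughts.
    contrast = llm(f"Patient: {patient_speech}\nAssessment: {subjectivity}\n"
                   "List the reasons supporting and contradicting these thoughts.")
    # Stage 3: schema analysis - summarize the schema and map it to a distortion.
    return llm(f"Patient: {patient_speech}\nContrastive reasoning: {contrast}\n"
               "Summarize the underlying thought schema and name the cognitive "
               "distortion type it matches, if any.")
```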

They run extensive experiments with the most recent top-performing LLMs. In zero-shot settings on ChatGPT, DoT achieves relative gains of over 10% for distortion assessment and over 15% for distortion classification. The rationales generated during the three steps make the diagnostic procedure fully interpretable, and human experts further confirm their quality. The work demonstrates the enormous potential of LLMs for enhancing professional psychotherapy and serves as the starting point for a larger project: the authors invite the AI and psychotherapy communities to collaborate, with the ultimate objective of providing expert, safe, AI-driven help that can significantly improve mental health support systems.


MIT Researchers Introduce a New Training-Free and Game-Theoretic AI Procedure for Language Model Decoding
https://www.marktechpost.com/2023/10/23/mit-researchers-introduce-a-new-training-free-and-game-theoretic-ai-procedure-for-language-model-decoding/ (Mon, 23 Oct 2023)


Current language models (LMs) handle some tasks requiring the generation or verification of factual assertions, such as question answering, fact-checking, and even unconditional text generation, with relative success. However, growing evidence shows that as LMs scale up they become more prone to producing statements that are erroneous yet frequently repeated, so they remain far from fully dependable. Matters are further complicated by the fact that LMs offer several different affordances for solving factual generation tasks.

They can be used generatively (asking for the most likely answer to a question) or discriminatively (presenting a question-answer pair and asking whether the answer is acceptable), but the two procedures sometimes yield different results. Generative methods can fail when probability mass is spread across multiple contradictory answers, whereas discriminative methods can fail because of miscalibration or a subtle dependence on the question's wording. How should an LM's best estimate of the truth be extracted from these noisy and frequently contradictory signals? In this research, researchers from MIT use a signaling game, the CONSENSUS GAME, to offer a method for bridging generative and discriminative LM decoding procedures.

At a high level, a GENERATOR agent must communicate an abstract correct-or-incorrect value to a DISCRIMINATOR agent, yet can only do so through a limited set of possible natural-language strings. It stands to reason that a joint policy in which the GENERATOR and DISCRIMINATOR agree on the assignment of strings to correctness values would be a successful strategy for this game; such a policy can then be examined to find candidates both agree are correct. Doing so requires solving a multi-step game with a difficult (string-valued) action space. No-regret learning algorithms have recently become the go-to method for computing winning strategies in games such as Poker, Stratego, and Diplomacy.

Here, they demonstrate that such algorithms can also be applied to free-form language generation tasks. They call this game-theoretic approach to LM decoding EQUILIBRIUM-RANKING. Applied across 6 question-answering benchmarks (MMLU, ARC, RACE, HHH, TruthfulQA, and GSM8K), EQUILIBRIUM-RANKING significantly outperforms existing generative, discriminative, and mixed decoding procedures. More broadly, the findings show how the game-theoretic toolkit can be used to formalize and improve coherence in LMs, and improved coherence in turn raises accuracy on factual tasks.
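To make the agreement idea concrete, the toy dynamic below ranks a fixed set of candidate answers by nudging a generator distribution and a discriminator distribution toward consensus while anchoring each to its initial LM policy. It is an illustrative stand-in for, not a reproduction of, the paper's no-regret equilibrium computation, and normalizing the discriminator's per-candidate correctness scores into a distribution is a simplifying assumption.

```python
import numpy as np

def consensus_rank(gen_logp, disc_logp, iters=200, lr=0.1, lam=0.2):
    """Toy consensus-seeking dynamic over candidate answers (illustrative only)."""
    g0 = np.exp(gen_logp - np.max(gen_logp)); g0 /= g0.sum()
    d0 = np.exp(disc_logp - np.max(disc_logp)); d0 /= d0.sum()
    g, d = g0.copy(), d0.copy()
    for _ in range(iters):
        # Each player drifts toward the other's policy (seeking agreement)
        # while staying anchored (weight lam) to its initial LM policy.
        g = (1 - lr) * g + lr * ((1 - lam) * d + lam * g0)
        d = (1 - lr) * d + lr * ((1 - lam) * g + lam * d0)
    return np.argsort(-(g * d))  # candidates ranked by joint consensus mass

# Example: three candidates on which the generator and discriminator disagree.
print(consensus_rank(np.log([0.6, 0.3, 0.1]), np.log([0.2, 0.2, 0.6])))
```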


Meet OmniControl: An Artificial Intelligence Approach for Incorporating Flexible Spatial Control Signals into a Text-Conditioned Human Motion Generation Model Based on the Diffusion Process
https://www.marktechpost.com/2023/10/22/meet-omnicontrol-an-artificial-intelligence-approach-for-incorporating-flexible-spatial-control-signals-into-a-text-conditioned-human-motion-generation-model-based-on-the-diffusion-process/ (Sun, 22 Oct 2023)


Researchers address the problem of incorporating spatial control signals over any joint at any given time into text-conditioned human motion generation. Modern diffusion-based techniques can produce diverse and lifelike human motion but struggle to incorporate flexible spatial control signals, which are essential for many applications. For instance, to synthesize the action of picking up a cup, a model must understand the semantics of "pick up" and also control the hand position so that it touches the cup at a particular place and time. Similarly, when moving through a room with low ceilings, a model must carefully regulate the height of the head for a certain duration to avoid collisions.

Because they are difficult to describe in a textual prompt, these control signals are usually supplied as global positions of the joints of interest at keyframes. Previous inpainting-based approaches, however, cannot accommodate flexible control signals because of the relative human pose representations they adopt: each joint is expressed relative to the pelvis, and the pelvis relative to the previous frame. A global pelvis position supplied as a control signal must therefore be converted into a location relative to the previous frame before it can be imposed at a keyframe, and the global positions of other joints likewise depend on the pelvis before they can be converted and input.

However, the pelvis’ relative locations between the diffusion generation process must be more present or corrected in both instances. To integrate any spatial control signal on joints other than the pelvis, one must first need help managing sparse limitations on the pelvis. Others present a two-stage model, but it still has trouble regulating other joints due to the limited control signals over the pelvis. In this study, researchers from Northeastern University and Google Research suggest OmniControl, a brand-new diffusion-based human generation model that may include flexible spatial control signals over any joint at any given moment. Building on OmniControl, realism guiding is added to regulate the creation of human movements. 

Figure 1: Given a text prompt and flexible spatial control signals, OmniControl can produce convincing human motions. Darker colors indicate later frames in the sequence; the green line or points show the input control signals.

To keep the model effective, they retain the same relative human pose representation for input and output. In contrast to existing approaches, however, their spatial guidance module converts the generated motion to global coordinates for direct comparison with the input control signals, and the gradients of the error are used to refine the motion. This eliminates the ambiguity about the pelvis's relative positions that limited earlier inpainting-based methods, and it enables dynamic, iterative refinement of the generated motion, improving control accuracy over previous approaches.
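A single guidance step of this kind can be sketched as a classifier-guidance-style gradient correction. The function `to_global` (the differentiable conversion from the relative pose representation to global joint positions) and the tensor shapes are assumptions, and the paper's scaling and scheduling of the perturbation are omitted.

```python
import torch

def spatial_guidance_step(x_rel, to_global, targets, mask, step_size=0.1):
    """One sketched spatial-guidance update on a denoised motion estimate.
    x_rel:   (frames, pose_dim) motion in the relative representation
    targets: (frames, joints, 3) sparse global joint-position controls
    mask:    (frames, joints, 1) 1 where a control signal is given"""
    x = x_rel.detach().requires_grad_(True)
    global_joints = to_global(x)                           # relative -> global coordinates
    err = (((global_joints - targets) ** 2) * mask).sum()  # error only at controlled joints
    (grad,) = torch.autograd.grad(err, x)
    return (x - step_size * grad).detach()                 # nudge the motion toward the targets
```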

Although it successfully enforces spatial constraints, spatial guidance alone frequently causes drifting and unnatural human motion. Drawing inspiration from controllable image generation, they therefore introduce realism guidance, which outputs residuals with respect to the features in each attention layer of the motion diffusion model; these residuals can explicitly and densely adjust whole-body motion. Both the spatial and the realism guidance are crucial, and they are complementary in balancing control accuracy and motion realism, producing realistic, coherent movements consistent with the spatial constraints.

Experiments on HumanML3D and KIT-ML show that OmniControl significantly outperforms state-of-the-art text-based motion generation methods on pelvis control in terms of both motion realism and control accuracy. More importantly, OmniControl excels at incorporating spatial constraints over any joint at any time. As illustrated in Fig. 1, a single model can also be trained to control multiple joints together rather than separately (for example, both the left and right wrists).

These capabilities enable several downstream applications, such as binding generated human motion to the surrounding scene and objects, as shown in the last column of Fig. 1. In brief, their contributions are: (1) to their knowledge, OmniControl is the first approach capable of incorporating spatial control signals over any joint at any time; (2) a novel control module that uses spatial and realism guidance to effectively balance control accuracy and motion realism in the generated motion; (3) experiments showing that a single OmniControl model can control multiple joints in text-based motion generation, setting a new state of the art for pelvis control and opening up various applications in human motion generation.


Demystifying Generative Artificial Intelligence: An In-Depth Dive into Diffusion Models and Visual Computing Evolution
https://www.marktechpost.com/2023/10/21/demystifying-generative-artificial-intelligence-an-in-depth-dive-into-diffusion-models-and-visual-computing-evolution/ (Sat, 21 Oct 2023)


For decades, the computer graphics and 3D computer vision communities have worked to create physically realistic models, whether to synthesize computer-generated visuals or to infer the physical characteristics of a scene from pictures. This methodology, which includes rendering, simulation, geometry processing, and photogrammetry, underpins several industries, including visual effects, gaming, image and video processing, computer-aided design, virtual and augmented reality, data visualization, robotics, autonomous vehicles, and remote sensing. An entirely new way of thinking about visual computing has emerged with the rise of generative artificial intelligence (AI): generative AI systems enable the creation and manipulation of photorealistic and stylized images, videos, or 3D objects with nothing more than a text prompt or high-level human instruction as input.

These technologies automate many time-consuming visual-computing tasks that were previously reserved for specialists with deep domain expertise. The unparalleled power of generative AI has been unlocked by foundation models for visual computing such as Stable Diffusion, Imagen, Midjourney, or DALL-E 2 and DALL-E 3. Trained on hundreds of millions to billions of text-image pairs, these models have "seen it all" and are extremely large, with up to a few billion learnable parameters. They were trained on enormous clouds of powerful graphics processing units (GPUs) and form the basis of the generative AI tools mentioned above.

The diffusion models frequently used to generate images, videos, and 3D objects are based on convolutional neural networks (CNNs) and integrate text conditioning, computed with transformer-based architectures such as CLIP, in a multi-modal fashion. Although well-funded industry players have devoted substantial resources to developing and training foundation models for 2D image generation, there is still ample room for the academic community to contribute significantly to these tools for graphics and vision. For example, it remains unclear how to adapt existing image foundation models to other, higher-dimensional domains such as video and 3D scene generation.

This gap is mainly caused by the scarcity of suitable training data: the web holds vastly more low-quality, generic 2D images than high-quality, diverse 3D objects or scenes. Furthermore, it is not immediately obvious how to scale 2D image generation architectures to the higher dimensions required for video, 3D scene, or 4D multi-view-consistent scene synthesis. Computation is another current limitation: even though an enormous amount of (unlabeled) video data is available on the web, current network architectures are often too inefficient to train in a reasonable amount of time or on a reasonable amount of compute. Diffusion models are also rather slow at inference time, owing to the large size of their networks and their iterative sampling procedure.
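That iterative cost is easy to see in a minimal ancestral-sampling loop for a DDPM-style model; this is a generic sketch assuming a noise-prediction network, not any specific system covered by the report.

```python
import torch

@torch.no_grad()
def ddpm_sample(model, shape, betas):
    """Minimal DDPM sampling: one full network call per step, for
    hundreds or thousands of steps, which is what makes inference slow."""
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)                                 # start from pure noise
    for t in range(len(betas) - 1, -1, -1):
        eps = model(x, t)                                  # the expensive network call
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bar[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise            # one denoising step
    return x
```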

Figure 1: This state-of-the-art report covers the theory and application of diffusion models for visual computing, which have recently become the de facto standard for generating and editing images, videos, and 3D and 4D objects.

Despite these unresolved issues, the number of diffusion models for visual computing has grown dramatically in the past year (see illustrative examples in Fig. 1). The objectives of this state-of-the-art report (STAR), developed by researchers from multiple universities, are to present an organized review of the many recent publications on applications of diffusion models in visual computing, to teach the fundamentals of diffusion models, and to identify open problems.


Meet SwimXYZ: A Synthetic Dataset of Swimming Motions and Videos Containing 3.4M Frames Annotated with Ground Truth 2D and 3D Joints
https://www.marktechpost.com/2023/10/20/meet-swimxyz-a-synthetic-dataset-of-swimming-motions-and-videos-containing-3-4m-frames-annotated-with-ground-truth-2d-and-3d-joints/ (Fri, 20 Oct 2023)


Human motion capture has emerged as a key tool across industries, including sports, medicine, and character animation for entertainment. In sports, motion capture serves multiple purposes, from injury prevention and injury analysis to video-game animation and informative visualizations for TV broadcasters. Traditional motion capture systems deliver solid results in most circumstances, but they are expensive and time-consuming to set up, calibrate, and post-process, which makes them hard to deploy at scale. These concerns are amplified for aquatic activities like swimming, which raise unique problems such as marker reflections and the installation of underwater cameras.

Recent developments have made it possible to capture motion from RGB images and videos using simple, affordable devices. These real-time, single-camera systems could open the door to widespread motion capture at sporting events by exploiting existing live video feeds, and they could be used in small facilities to enhance amateur athletes' training programs. For swimming, however, computer-vision-based motion capture faces several obstacles, chiefly a lack of data. Every Human Pose and Shape (HPS) estimation approach, whether 2D (2D joints, body segmentation) or 3D (3D joints, virtual markers), must extract information from the image, yet computer-vision algorithms trained on conventional datasets struggle with aquatic data, which differs greatly from their training images.

Recent advances in HPS estimation have demonstrated that synthetic data can replace or supplement real images. To broaden the application of image-based motion capture to swimming, the authors introduce SwimXYZ, a synthetic dataset of swimming-specific videos set in swimming pools and annotated with ground-truth 2D and 3D joints. SwimXYZ comprises 11,520 videos totaling 3.4 million frames, which vary in camera viewpoint, subject and water appearance, lighting, and stroke; it also offers 240 synthetic swimming motion sequences in SMPL format covering a variety of body shapes and swimming motions.
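A loader for such a dataset might look like the sketch below. The summary does not specify SwimXYZ's on-disk layout, so the directory structure and JSON schema here are purely hypothetical; only the annotation content (per-frame 2D and 3D joints alongside rendered frames) matches the description above.

```python
import json
from pathlib import Path

def load_swimxyz_clip(clip_dir: Path):
    """Hypothetical per-clip loader: file names and schema are assumptions."""
    frames = sorted(clip_dir.glob("frames/*.png"))
    joints = json.loads((clip_dir / "joints.json").read_text())
    for i, frame in enumerate(frames):
        yield {"image": frame,                # path to the rendered RGB frame
               "joints_2d": joints["2d"][i],  # ground-truth 2D keypoints
               "joints_3d": joints["3d"][i]}  # ground-truth 3D joints
```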

In this study, researchers from CentraleSupélec, IETR UMR, Centrale Nantes, and Université Technologique de Compiègne establish SwimXYZ, a sizable collection of synthetic swimming motions and videos that will be made available online when the paper is accepted. Their experiments demonstrate the potential of motion capture for swimming, and their goal is to help make it more widely used. Beyond training 2D and 3D pose estimation models on the videos, future studies may use the SMPL-format motions to train pose and motion priors or swimming-stroke classifiers. SwimXYZ's limited diversity in subjects (gender, body type, swimsuit appearance) and settings (outdoor environments, pool floors) could be addressed in future work, as could additional annotations (such as segmentation and depth maps) and further swimming motions, such as dives and turns.


Are Pre-Trained Foundation Models the Future of Molecular Machine Learning? Introducing Unprecedented Datasets and the Graphium Machine Learning Library
https://www.marktechpost.com/2023/10/18/are-pre-trained-foundation-models-the-future-of-molecular-machine-learning-introducing-unprecedented-datasets-and-the-graphium-machine-learning-library/ (Thu, 19 Oct 2023)


Recent machine learning results in drug discovery have largely been attributed to graph and geometric deep learning models. These techniques have proven effective for modeling atomistic interactions, molecular representation learning, 3D and 4D scenarios, activity and property prediction, force field development, and molecular generation. Like other deep learning techniques, they need a lot of training data to reach high modeling accuracy, yet most training datasets in the current therapeutics literature have small sample sizes. Meanwhile, recent developments in self-supervised learning and in foundation models for computer vision and natural language processing have significantly increased data efficiency.

Indeed, it has been shown that investing upfront in pre-training large models on abundant data, a one-time expense, yields a learned inductive bias that reduces the data requirements of downstream tasks. Following these successes, other research has examined the advantages of pre-training large molecular graph neural networks for low-data molecular modeling. For lack of large labeled molecular datasets, those investigations could only use self-supervised approaches such as contrastive learning, autoencoders, or denoising tasks, and fine-tuning from such models has so far delivered only a small fraction of the improvement that self-supervised models achieved in NLP and CV.

Since molecules’ and their conformers’ behavior depends on their environment and is primarily controlled by quantum physics, this is partially explained by the underspecification of molecules and their conformers as graphs. For instance, it is widely known that molecules with comparable structures can exhibit significantly varying levels of bioactivity, a phenomenon known as an activity cliff, which restricts graph modeling based only on structural data. According to their argument, developing efficient base models for molecular modeling necessitates supervised training using information derived from quantum mechanical descriptions and biological environment-dependent data. 

Researchers from the Québec AI Institute, Valence Labs, Université de Montréal, McGill University, Graphcore, the New Jersey Institute of Technology, RWTH Aachen University, and HEC Montréal make three contributions to molecular research. First, they present a new family of multitask datasets that are orders of magnitude larger than the state of the art. Second, they introduce Graphium, a graph machine learning library enabling efficient training on enormous datasets. Third, they provide various baseline models that demonstrate the benefit of training on multiple tasks. The three comprehensive, rigorously curated multi-label datasets are the largest to date, with nearly 100 million molecules and over 3,000 sparsely defined activities. Designed for the supervised training of foundation models, they combine labels describing quantum and biological properties obtained through simulation and wet-lab experiments, with tasks spanning both the node level and the graph level.

The variety of labels facilitates effective transfer learning and makes it possible to build foundation models that generalize across diverse downstream molecular modeling tasks. To produce these extensive collections, the existing data were meticulously vetted and augmented with new information, so each molecule in the collection is described by both its quantum mechanical characteristics and its biological functions. The energetic, electronic, and geometric components of the QM properties are calculated with various state-of-the-art techniques, including semi-empirical methods like PM6 and density functional theory approaches such as B3LYP. As shown in Figure 1, the biological-activity data include molecular signatures from toxicological profiling, gene expression profiling, and dose-response bioassays.

Figure 1: A visual overview of the proposed molecular dataset collections. The "mixes" are designed to be predicted concurrently in a multitask fashion; they comprise graph-level and node-level tasks covering quantum, chemical, and biological aspects, with both categorical and continuous data points.

Modeling quantum and biological effects simultaneously makes it possible to characterize complex, environment-dependent features of molecules that could never be obtained from the often small experimental datasets alone. To enable efficient training on these enormous multitask datasets, the authors built Graphium, a comprehensive graph machine learning library. Graphium streamlines the creation and training of molecular graph foundation models by supporting feature ensembles and complex feature interactions; by treating features and representations as first-class building blocks and implementing state-of-the-art GNN layers, it overcomes the limitations of previous frameworks, which were designed primarily for sequential samples with little interaction among node, edge, and graph features.

Additionally, Graphium makes the crucial and otherwise difficult engineering of training on huge dataset ensembles simple and highly configurable, with features such as dataset combination, missing-data handling, and joint training. For the dataset mixes provided, they train various models in single-dataset and multi-dataset scenarios. These provide reliable baselines that can serve as reference points for future users of the datasets and offer insight into the benefits of the multi-dataset approach: the results specifically demonstrate that performance on low-resource tasks can be greatly improved by training them jointly with larger datasets.
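The core engineering issue of sparse multitask labels can be illustrated with a generic masked loss, a plain-PyTorch sketch rather than Graphium's actual API: with thousands of tasks per molecule, most label entries are missing, so the loss must average only over the labels that exist.

```python
import torch

def multitask_loss(preds, labels, mask):
    """Masked regression loss over sparsely labeled tasks.
    preds, labels: (batch, n_tasks); mask: (batch, n_tasks), 1 where labeled."""
    labels = torch.nan_to_num(labels)  # missing entries are zeroed, then masked out
    per_label = torch.nn.functional.mse_loss(preds, labels, reduction="none")
    return (per_label * mask).sum() / mask.sum().clamp(min=1)

# Example: 2 molecules, 4 tasks, only 3 labels observed in total.
preds = torch.zeros(2, 4)
labels = torch.tensor([[1.0, float("nan"), float("nan"), 2.0],
                       [float("nan"), 0.5, float("nan"), float("nan")]])
mask = (~labels.isnan()).float()
print(multitask_loss(preds, labels, mask))  # mean over the 3 observed labels
```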

In conclusion, this work offers the largest 2D molecular datasets to date, created expressly to train foundation models that can accurately capture molecules' quantum characteristics and biological adaptability and thus be tailored to various downstream applications. The authors also created the Graphium library to simplify the training of such models and provide a range of baseline results demonstrating the effectiveness of the datasets and the library.


This AI Research Presents RoboHive: A Comprehensive Software Platform and Ecosystem for Research in the Field of Robot Learning and Embodied Artificial Intelligence
https://www.marktechpost.com/2023/10/18/this-ai-research-presents-robohive-a-comprehensive-software-platform-and-ecosystem-for-research-in-the-field-of-robot-learning-and-embodied-artificial-intelligence/ (Wed, 18 Oct 2023)


Recent years have brought notable advances in artificial intelligence (AI), particularly in language modeling, protein folding, and gameplay, while progress in robot learning has been modest. Moravec's paradox, which holds that sensorimotor behaviors are inherently harder for AI agents than high-level cognitive tasks, may partly explain this slower progress. But an equally important issue deserves attention: the complexity of software frameworks for robot learning and the absence of common benchmarks. These raise the barrier to entry, restrict rapid prototyping, and constrain the flow of ideas, leaving robotics more fragmented than fields such as computer vision or natural language processing, where benchmarks and datasets are standardized.

To close this gap, researchers from the University of Washington, UC Berkeley, CMU, UT Austin, OpenAI, Google AI, and Meta AI present RoboHive, a unified environment designed specifically for robot learning that serves as both a benchmarking and a research platform. It offers a wide range of environments, precise task specifications, and strict evaluation criteria to support a variety of learning paradigms, including reinforcement, imitation, and transfer learning, enabling efficient exploration and prototyping for researchers. RoboHive also gives users hardware integration and teleoperation capabilities, allowing a seamless transition between virtual and real-world robots. The creation and open-sourcing of RoboHive, with the aim of closing the gap between robot learning's present state and its potential, is the main contribution of this work.

RoboHive’s salient characteristics include: 

1. The environment zoo: RoboHive offers a diverse collection of environments spanning many research areas, covering manipulation tasks including dexterous in-hand manipulation, locomotion with bipedal and quadrupedal robots, and even manipulation with musculoskeletal arm-hand models. The simulated worlds are powered by MuJoCo, which offers fast physics simulation, and are built with a focus on physical realism.

2. A unified sim-to-real abstraction: RoboHive presents a unifying RobotClass abstraction that interacts seamlessly with both virtual and physical robots via sim-hooks and hardware-hooks. By changing a single flag, researchers can work directly with robotic hardware and translate their findings from simulation to reality (a sketch of this routing follows the list below).

3. Teleoperation support and expert datasets: RoboHive offers out-of-the-box teleoperation via multiple modalities, including keyboard, 3D space mouse, and virtual reality controllers. The team is also releasing RoboSet, one of the largest real-world manipulation datasets collected through human teleoperation, covering 12 skills across several kitchen tasks. These teleoperation capabilities and datasets will be especially helpful to researchers in imitation learning, offline learning, and related fields.

4. Visual diversity and physics fidelity: RoboHive emphasizes tasks with high physical realism and extensive visual diversity, surpassing prior benchmarks and pointing to the next research frontier for real-world robots. Complex assets, rich textures, and improved scene composition connect visuomotor control research with the visual complexity of everyday life. RoboHive also natively supports scene-layout and visual domain randomization across environments, strengthening the robustness of visual perception while delivering realistic, rich physical content.

5. Metrics and baselines: RoboHive evaluates algorithm performance with concise, unambiguous metrics across its environments. The framework exposes a user-friendly, gym-like API for seamless integration with learning algorithms, making it accessible to a broad range of researchers and practitioners (the interaction loop is sketched below). In partnership with TorchRL and mjRL, RoboHive also includes thorough baseline results for widely studied algorithms, providing a reference point for performance comparison and study.
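Two of the points above lend themselves to short sketches. First, the single-flag routing between simulation and hardware from item 2; this is an illustrative sketch, not RoboHive's actual code, with stub hook classes standing in for its sim-hooks and hardware-hooks.

```python
class SimHooks:
    def read_sensors(self): return {"qpos": [0.0]}  # stub: query the simulator state
    def write_controls(self, ctrl): pass            # stub: step the simulator

class HardwareHooks:
    def read_sensors(self): raise NotImplementedError   # would call a device driver
    def write_controls(self, ctrl): raise NotImplementedError

class Robot:
    """Sketch of a RobotClass-style abstraction: one flag picks the backend,
    so the same experiment script runs in simulation and on hardware."""
    def __init__(self, use_hardware: bool = False):
        self.backend = HardwareHooks() if use_hardware else SimHooks()
    def get_sensors(self):
        return self.backend.read_sensors()
    def apply_commands(self, ctrl):
        self.backend.write_controls(ctrl)
```

Second, the gym-like API from item 5 implies the classic interaction loop. The environment id below is a placeholder, since this summary does not name RoboHive's actual environment ids, and newer gym/gymnasium releases return five values from step().

```python
import gym  # RoboHive is described as exposing a gym-like API

env = gym.make("HypotheticalRoboHiveTask-v0")   # placeholder id, not a real name
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()          # stand-in for a learned policy
    obs, reward, done, info = env.step(action)  # classic 4-tuple gym signature
env.close()
```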


Researchers from Stanford and Microsoft Introduce Self-Improving AI: Leveraging GPT-4 to Elevate Scaffolding Program Performance
https://www.marktechpost.com/2023/10/17/researchers-from-stanford-and-microsoft-introduce-self-improving-ai-leveraging-gpt-4-to-elevate-scaffolding-program-performance/ (Tue, 17 Oct 2023)


Almost any objective described in natural language can be optimized by querying a language model, yet a program that makes several structured calls to a language model can frequently produce outputs with higher objective values. The authors refer to these as "scaffolding" programs, typically written (by humans) in a programming language such as Python. Their key observation is that, for any distribution over optimization problems and any given language model, the design of a scaffolding program is itself an optimization problem. In this paper, researchers from Microsoft Research and Stanford University describe the Self-Taught Optimizer (STOP), a technique in which code that uses a language model to improve arbitrary solutions is applied recursively to itself, leading to self-improvement.

Their method starts with an initial seed "improver" scaffolding program that uses the language model to improve a solution to a downstream task. As the system iterates, the model refines this improver program itself. To measure the effectiveness of the self-optimizing framework, they evaluate it on a small selection of downstream algorithmic tasks, and their findings show that the improver gets better as it runs through more iterations of its self-improvement techniques. STOP thus demonstrates how language models can act as their own meta-optimizers. In addition, they analyze the kinds of self-improvement strategies the model proposes (see Figure 1), how well the proposed strategies transfer to downstream tasks, and whether the model is susceptible to unsafe self-improvement practices.

Figure 1: Examples of self-improvement strategies proposed and used by GPT-4. Each strategy is then used as scaffolding to revise arbitrary code, including the scaffolding code itself.
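The seed improver itself can be sketched in a few lines; the prompt wording and candidate count are assumptions, with `utility` and `llm` standing for the task's utility function and a language-model call.

```python
def improve(solution: str, utility, llm, n_candidates: int = 4) -> str:
    """Sketch of a STOP-style seed improver: sample candidate revisions
    and keep the best one under the task's utility function."""
    candidates = [
        llm(f"Improve the following solution.\n\nSolution:\n{solution}\n\n"
            "Improved solution:")
        for _ in range(n_candidates)
    ]
    return max(candidates + [solution], key=utility)  # never regress below the input
```

The recursion then consists of handing `improve` its own source code as the `solution`, with a utility that scores a candidate improver by how well it improves solutions to the downstream tasks.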

Because the underlying language model itself is unaltered, the authors call this setting recursively self-improving code generation: it is inspired by, but is not fully, a Recursively Self-Improving (RSI) system. The concept of RSI was formalized at least 50 years ago, but that work focused on building systems that grow more capable in general and assumed the model could improve every part of its own code. This research takes a modest step in that direction, considering only the model's ability to improve the scaffold that invokes it iteratively. The study is the first to state the RSI-code-generation problem as a mathematically well-defined one.

They then build and evaluate STOP to illustrate the potential of RSI code generation, demonstrating improvements across different downstream tasks. Figure 1 shows some of the intriguing and useful scaffolds STOP proposes when using a version of the GPT-4 language model trained on data up to 2021, well before the debut of most scaffolding systems. Further experiments track how frequently the model attempts to disable a sandbox flag, and the paper closes with a discussion of concerns around the ethical development of such technology.

The main contributions of this work are:

  1. Formulating a meta-optimization strategy where a scaffolding system recursively improves itself.
  2. Demonstrating that this system can successfully recursively improve itself using a modern language model (GPT-4 in particular).
  3. Examining the self-improvement techniques proposed and implemented by the model, including how the model avoids safety precautions like a sandbox.
