Tanya Malhotra, Author at MarkTechPost (An Artificial Intelligence News Platform)

This AI Paper Presents Video Language Planning (VLP): A Novel Artificial Intelligence Approach that Consists of a Tree Search Procedure with Vision-Language Models and Text-to-Video Dynamics (October 26, 2023)

As applications of Artificial Intelligence continue to advance, generative models are growing at a fast pace. Enabling intelligent interaction with the physical environment has been a long-standing topic of discussion, as it requires planning at two different levels: low-level underlying dynamics and high-level semantic abstractions. Both layers are essential for robotic systems to be controlled properly and carry out activities in the real world.

The notion of dividing the planning problem into these two layers has long been recognized in robotics, and many strategies have been developed as a result, including combining motion planning with task planning and learning control policies for intricate manipulation tasks. These methods seek to produce plans that respect both the goals of the task and the dynamics of the real environment. Large Language Models (LLMs), for their part, can create high-level plans from symbolic task descriptions but have trouble grounding such plans: they cannot reason about the more tangible parts of tasks, such as shapes, physics, and constraints.

In recent research, a team from Google DeepMind, MIT, and UC Berkeley has proposed merging text-to-video models and vision-language models (VLMs) to overcome these drawbacks. The resulting integration, called Video Language Planning (VLP), combines the advantages of both model families with the goal of facilitating visual planning for long-horizon, complex activities. The method makes use of recent advances in large generative models that have undergone extensive pre-training on internet-scale data. VLP's main objective is to make it easier to plan tasks that call for lengthy action sequences and comprehension in both the language and visual domains, from simple object rearrangements to complex robotic system operations.

The foundation of VLP is a tree search procedure built from two primary components, which are as follows.

  1. Vision-Language Models: These models fulfill the roles of both value functions and policies, supporting the creation and evaluation of plans. After comprehending the task description and the available visual information, they suggest the next course of action toward completing the task.
  2. Text-to-Video Models: These models serve as dynamics models, since they can foresee how candidate decisions will play out. They predict the potential outcomes of the actions suggested by the vision-language models.

VLP takes two primary inputs: a long-horizon task instruction and the current visual observations. Its output is a complete and detailed video plan that combines language and visual features to provide step-by-step guidance toward the ultimate objective, effectively bridging the gap between written task descriptions and visual comprehension.
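A minimal sketch of how such a planning loop could be organized is shown below. The callables `vlm_policy`, `video_model`, and `vlm_value` are hypothetical stand-ins for the three roles described above, not the authors' API, and the beam-search bookkeeping is a simplification of the paper's tree search.

```python
# Minimal sketch of a VLP-style planning loop. `vlm_policy`, `video_model`, and
# `vlm_value` are hypothetical stand-ins for the roles described in the article.

def video_language_plan(goal, image, vlm_policy, video_model, vlm_value,
                        branching=4, beam=2, depth=3):
    """Return the highest-value sequence of (sub-goal, video clip) steps."""
    beams = [([], image, 0.0)]                      # (plan so far, last frame, score)
    for _ in range(depth):
        candidates = []
        for plan, frame, score in beams:
            # 1) VLM acting as a policy: propose textual sub-goals.
            for sub_goal in vlm_policy(goal, frame, n=branching):
                # 2) Text-to-video model acting as a dynamics model:
                #    imagine how the scene evolves if this sub-goal is pursued.
                clip = video_model(sub_goal, frame)
                # 3) VLM acting as a value function: score the imagined outcome.
                value = vlm_value(goal, clip[-1])
                candidates.append((plan + [(sub_goal, clip)], clip[-1], score + value))
        # Keep only the best branches (beam search over the tree).
        beams = sorted(candidates, key=lambda c: c[2], reverse=True)[:beam]
    return beams[0][0]
```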

VLP handles a variety of activities, including bi-arm dexterous manipulation and multi-object rearrangement, demonstrating the approach's wide range of possible applications. The generated video plans can also be executed on real robotic systems: goal-conditioned policies convert the virtual plan into actual robot behavior, using each intermediate frame of the video plan as a guide so the robot carries out the task step by step.

Experiments comparing VLP with earlier techniques show significant gains in long-horizon task success rates. These evaluations were carried out both in simulation and on real robots across three different hardware platforms.


Microsoft Researchers Introduce Table-GPT: Elevating Language Models to Excel in Two-Dimensional Table Understanding and Tasks (October 25, 2023)

With recent developments in the field of Artificial Intelligence, Large Language Models, including GPT and LLaMA, continue to show remarkable performance over a broad spectrum of natural language tasks. They have proven effective across various domains and have advanced the field of Natural Language Processing considerably, following human instructions to carry out many different tasks. They have a notable drawback, however: they struggle with tasks that require understanding tables. The reason is that they are trained primarily on one-dimensional natural language text, whereas tables are two-dimensional structures.

To address this, a team of researchers has proposed table-tuning, an innovative way to alleviate the issue. The method entails further training pre-existing language models, such as GPT-3.5 and ChatGPT, on a wide range of table-related tasks synthesized from real tables. The main objective of table-tuning is to enhance these models' capacity to understand and manipulate tables; a minimal example of how such a training instance might be synthesized is sketched below.
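The sketch below illustrates the "synthesize" half of the recipe: serializing a real table and turning it into an (instruction, completion) pair for a missing-value task. The serialization format and the task prompt are illustrative choices, not Table-GPT's exact ones.

```python
# Minimal sketch of synthesizing one table-tuning example from a real table.
# The Markdown serialization and the task prompt are illustrative, not Table-GPT's exact format.

def serialize_table(header, rows):
    """Flatten a 2-D table into Markdown so a 1-D language model can read it."""
    lines = ["| " + " | ".join(header) + " |",
             "| " + " | ".join("---" for _ in header) + " |"]
    lines += ["| " + " | ".join(str(c) for c in row) + " |" for row in rows]
    return "\n".join(lines)

def make_missing_value_example(header, rows, row_idx, col_idx):
    """Mask one cell and build an (instruction, completion) pair for table-tuning."""
    answer = rows[row_idx][col_idx]
    masked = [list(r) for r in rows]
    masked[row_idx][col_idx] = "[MISSING]"
    instruction = (
        "The table below has one missing cell marked [MISSING]. "
        "Fill in the missing value.\n\n" + serialize_table(header, masked)
    )
    return {"instruction": instruction, "completion": str(answer)}

example = make_missing_value_example(
    ["country", "capital"], [["France", "Paris"], ["Japan", "Tokyo"]], 1, 1)
```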

The Table-GPT models produced through table-tuning exhibit improved table-understanding capabilities, consistently outperforming standard GPT-3.5 and ChatGPT on a wide range of table-based tasks, which means they interpret and manipulate tabular data more accurately. Despite being specialized for table tasks, the Table-GPT models retain a high degree of generalizability: they can follow a range of human instructions and adapt to new table-related activities, a flexibility comparable to that of the original GPT-3.5 and ChatGPT on natural language tasks.

The primary contributions have been summarized as follows.

  1. Table-Tuning Paradigm: A table-tuning paradigm has been introduced that further trains language models with the express purpose of improving their performance on table tasks. It employs a variety of table-based tasks synthesized from real tables using a synthesize-then-augment methodology.
  2. Data Augmentation Approaches: Data augmentation approaches have been developed at the task, table, instruction, and completion levels. These methods are essential for preventing overfitting and maintaining Table-GPT's generalizability; by adding diversity to the training set, they strengthen the model.
  3. Performance on Table Tasks: Out of the box, Table-GPT exhibits strong competence on table-based tasks in both zero-shot and few-shot settings, indicating that the model performs these tasks well even with little specialized training or few examples.
  4. Table-GPT's adaptability makes it suitable for use as a table foundation model. For downstream single-task optimizations such as task-specific fine-tuning and prompt engineering, it can be a better starting point than vanilla GPT, demonstrating its usefulness well beyond the tasks it was tuned on.

In summary, the suggested table-tuning paradigm offers a way to overcome the difficulty of teaching language models to work with tables. It improves their comprehension of two-dimensional data structures and equips them to succeed on a wide range of table-related tasks, both seen and unseen.


SalesForce AI Introduces CodeChain: An Innovative Artificial Intelligence Framework For Modular Code Generation Through A Chain of Self-Revisions With Representative Sub-Modules (October 21, 2023)

A major objective in the study of Artificial Intelligence is the development of AI systems that can provide useful computer programs to address challenging issues. Much progress has been made in this direction in recent years, especially with the remarkable successes of massive pretrained Large Language Models (LLMs). These models were first created for natural language comprehension, but they have now expanded to include the ability to generate and comprehend code and text. Notable progress has been made in producing code from descriptions of natural language problems as a result of this development.

LLMs have already proven themselves capable of handling straightforward programming tasks, as shown by their results on benchmarks such as MBPP and HumanEval. However, these models encounter significant difficulty with harder, competition-level programming tasks. One of the primary causes is their propensity to produce code solutions as monolithic blocks rather than decomposing them into logical subtasks and reusable sub-modules. Skilled human programmers, by contrast, instinctively write modular and abstract code when faced with complex problems, effectively building on their existing expertise by reusing previously created modules.

In recent work, a team from Salesforce Research has introduced CodeChain, an innovative framework for bridging this gap between LLMs and human developers. The framework aims to improve modularized code generation through a sequence of self-revisions, each driven by representative sub-modules developed in earlier iterations. CodeChain first instructs the LLM to write modularized code using a chain-of-thought prompt, motivating the model to approach problem-solving in terms of logical subtasks and sub-modules.

A sequence of self-revisions forms the core of CodeChain, with each revision consisting of two phases (a simplified sketch of the loop follows the list).

  1. Sub-Module Extraction and Clustering: In this stage, sub-modules are extracted from the code that the LLM has produced and arranged into clusters. A representative sub-module is chosen from each cluster; these representatives are considered more widely applicable and reusable.
  2. Prompt Augmentation and Re-Generation: The original chain-of-thought prompt is augmented with the representative module implementations chosen in the preceding stage, and the LLM is instructed to produce fresh modularized solutions. As a result, the model can effectively build on the knowledge it has accumulated from earlier iterations.
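Below is a toy sketch of one such revision round. The `llm_generate` callable is a hypothetical stand-in for the underlying LLM, and grouping functions by name is a simplified substitute for the paper's embedding-based clustering of sub-modules.

```python
import ast
from collections import defaultdict

# Toy sketch of one CodeChain self-revision round. `llm_generate` is a hypothetical
# callable; grouping functions by name stands in for embedding-based clustering.

def extract_sub_modules(solutions):
    """Pull top-level function definitions out of each generated solution."""
    modules = []
    for code in solutions:
        try:
            tree = ast.parse(code)
        except SyntaxError:
            continue
        modules += [ast.unparse(n) for n in tree.body if isinstance(n, ast.FunctionDef)]
    return modules

def select_representatives(modules):
    """Cluster sub-modules (here: by function name) and keep one per cluster."""
    clusters = defaultdict(list)
    for src in modules:
        name = ast.parse(src).body[0].name
        clusters[name].append(src)
    return [max(group, key=len) for group in clusters.values()]

def revision_round(problem, solutions, llm_generate):
    reps = select_representatives(extract_sub_modules(solutions))
    prompt = (problem + "\n\nReuse or adapt these sub-modules where helpful:\n\n"
              + "\n\n".join(reps) + "\n\nWrite a new modular solution.")
    return [llm_generate(prompt) for _ in range(len(solutions))]
```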

CodeChain has a substantial impact on code generation. The team reports that pushing the LLM to build upon and reuse pre-existing, verified sub-modules greatly improves both the modularity and the accuracy of generated solutions: the framework achieves relative pass@1 improvements of 35% on APPS and an impressive 76% on CodeContests. These gains hold across a variety of LLMs, including OpenAI models and open-source LLMs such as WizardCoder. Comprehensive ablation studies examining prompting techniques, the number of clusters, LLM model sizes, and the quality of the generated programs shed further light on why CodeChain is so effective at raising the quality and modularity of LLM-generated code.

To sum up, CodeChain is a notable development in large language model code generation: by promoting modularity and facilitating self-revisions that reuse previously created sub-modules, it narrows the gap between LLMs and seasoned human programmers.


Meet ScaleCrafter: Unlocking Ultra-High-Resolution Image Synthesis with Pre-trained Diffusion Models (October 20, 2023)

Image synthesis techniques have experienced a notable upsurge in recent years, garnering major interest from academia and industry. Text-to-image generation models, most prominently Stable Diffusion (SD), are the most widely used developments in this field. Although these models have demonstrated remarkable abilities, they currently produce images only up to a maximum resolution of 1024 x 1024 pixels, which is insufficient for high-resolution applications such as advertising.

Problems arise when generating images larger than the training resolution, mostly object repetition and deformed object structures. If a Stable Diffusion model trained on 512 x 512 images is used to generate images at 1024 x 1024 or beyond, object duplication becomes increasingly problematic as the image size grows.

In the resulting images, these problems mostly show up as object duplication and incorrect object topologies. Existing methods for creating higher-resolution images, such as those based on joint-diffusion techniques and attention mechanisms, struggle to address these issues adequately. By examining the structural elements of the U-Net architecture in diffusion models, the researchers pinpoint a crucial cause of the problems: the constrained receptive fields of the convolutional kernels. In essence, issues like object recurrence arise because the model's convolutions have a limited capacity to perceive and comprehend the content of the input images.

A team of researchers has proposed ScaleCrafter for higher-resolution visual generation at inference time. It relies on re-dilation, a simple yet powerful technique that dynamically adjusts the convolutional receptive field during image generation, enabling the models to handle higher resolutions and varying aspect ratios more effectively and improving the coherence and quality of the generated images. The work presents two further advances, dispersed convolution and noise-damped classifier-free guidance, with which the model can produce ultra-high-resolution images up to 4096 x 4096 pixels. The method requires no extra training or optimization stages, making it a practical remedy for the repetition and structural problems of high-resolution image synthesis.
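The core re-dilation idea can be sketched in a few lines: run a pretrained convolution with a larger dilation (and matching padding) at inference time so its receptive field grows with the image resolution, without touching the weights. This is a simplified illustration; ScaleCrafter applies this selectively inside the U-Net, and the paper's dispersed convolution and noise-damped guidance are not shown.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of re-dilation: reuse a pretrained 3x3 conv's weights at inference
# with a larger dilation so its receptive field widens at higher resolutions.
# (Simplified; not the full ScaleCrafter pipeline.)

def redilated_conv(x, conv, dilation=2):
    k = conv.kernel_size[0]
    padding = dilation * (k // 2)          # keep the spatial size unchanged
    return F.conv2d(x, conv.weight, conv.bias,
                    stride=conv.stride, padding=padding, dilation=dilation)

conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)   # stands in for a U-Net layer
x = torch.randn(1, 4, 128, 128)                           # a higher-resolution latent
y = redilated_conv(x, conv, dilation=2)                   # same weights, wider field
assert y.shape == x.shape
```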

Comprehensive tests carried out for the study show that the method successfully addresses the object repetition issue and delivers state-of-the-art results in higher-resolution image generation, excelling in particular at rendering complex texture details. The work also highlights the possibility of using diffusion models pre-trained on low-resolution images to generate high-resolution visuals without extensive retraining, which could guide future work on ultra-high-resolution image and video synthesis.

The primary contributions have been summarized as follows.

  1. The team has found that the primary cause of object repetition is the constrained receptive field of the convolutional operations, rather than the number of attention tokens.
  2. Based on this finding, the team has proposed re-dilation, an approach that dynamically increases the convolutional receptive field during inference and thus tackles the root of the issue.
  3. Two further strategies have been presented, dispersed convolution and noise-damped classifier-free guidance, specifically designed for generating ultra-high-resolution images.
  4. The method has been applied to a text-to-video model and evaluated comprehensively across a variety of diffusion models, including different versions of Stable Diffusion, over a wide range of aspect ratios and image resolutions, showcasing its effectiveness in addressing object recurrence and improving high-resolution image synthesis.

Researchers from Stanford, NVIDIA, and UT Austin Propose Cross-Episodic Curriculum (CEC): A New Artificial Intelligence Algorithm to Boost the Learning Efficiency and Generalization of Transformer Agents (October 19, 2023)

Sequential decision-making is undergoing a major transition due to the paradigm shift brought about by foundation models. These models, such as Transformers, have transformed a number of fields, including planning, control, and pre-trained visual representation. Despite these impressive developments, applying such data-hungry algorithms to data-scarce fields like robotics remains a major barrier, raising the question of whether the limited data that is available, irrespective of its source or quality, can be exploited to support more effective learning.

To address these challenges, a group of researchers has recently presented a new algorithm named Cross-Episodic Curriculum (CEC). CEC exploits how experiences are distributed differently across episodes when they are arranged into a curriculum, with the goal of improving Transformer agents' learning efficiency and generalization. Its fundamental idea is to incorporate cross-episodic experiences into a Transformer model as a curriculum: online learning trials and mixed-quality demonstrations are ordered step by step so the sequence captures the learning curve and the improvement in skill across episodes. Leveraging Transformers' strong pattern-recognition capabilities, CEC then builds a powerful cross-episodic attention mechanism on top of this curriculum.

The team has provided two example scenarios to illustrate the efficacy of CEC, which are as follows.

  1. DeepMind Lab’s Multi-Task Reinforcement Learning with Discrete Control: This scenario uses CEC to solve a discrete control multi-task reinforcement learning challenge. The curriculum developed by CEC captures the learning path in both individualized and progressively complicated contexts. This enables agents to gradually master increasingly difficult tasks by learning and adapting in small steps.
  1. RoboMimic, Imitation Learning Using Mixed-Quality Data for Continuous Control – The second scenario, which is pertinent to RoboMimic, uses continuous control and imitation learning with mixed-quality data. The goal of the curriculum that CEC created is to record the increase in demonstrators’ level of expertise. 

In both scenarios, the policies produced by CEC perform exceptionally well and generalize strongly, suggesting that CEC is a viable strategy for enhancing Transformer agents' adaptability and learning efficiency across a variety of contexts. The Cross-Episodic Curriculum method comprises two essential steps, which are as follows.

  1. Curricular Data Preparation: The first step in the CEC process is to organize the experiences in a particular order and structure, arranging events so that curriculum patterns emerge clearly. These patterns can take many forms, such as policy improvement in single environments, learning progress across progressively harder environments, or an increase in the demonstrator's expertise.
  2. Cross-Episodic Attention Model Training: In the second stage, the model is trained to predict actions. The distinctive aspect of this method is that the model can look back at earlier episodes in addition to the current one, internalizing the improvements and policy adjustments visible in the curricular data. Because the model draws on this prior experience, learning becomes more efficient.

In the paper's figures, these stages are visualized with colored triangles that stand in for causal Transformer models. These models are central to the CEC method because they make it possible to fold cross-episodic experience into the learning process, and the actions they recommend, denoted "a^," drive the agent's decisions.
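The data-preparation step can be sketched as follows: order episodes by improvement (here, simply by return) and concatenate them into one long sequence so a causal Transformer can attend across episode boundaries. The token layout and the use of return as the ordering signal are illustrative assumptions, not CEC's exact scheme.

```python
import torch

# Minimal sketch of curricular data preparation for cross-episodic attention:
# order episodes from worst to best (here, by return) and concatenate them so a
# causal Transformer predicting actions late in the context can attend to earlier,
# weaker attempts. Token layout is illustrative only.

def build_cross_episodic_batch(episodes):
    """episodes: list of dicts with 'obs' (T, D), 'actions' (T,), 'return' (float)."""
    ordered = sorted(episodes, key=lambda ep: ep["return"])    # curriculum: worst -> best
    obs = torch.cat([ep["obs"] for ep in ordered], dim=0)       # (sum_T, D)
    actions = torch.cat([ep["actions"] for ep in ordered], dim=0)
    # A standard causal mask over the concatenated sequence gives every timestep
    # access to all earlier episodes as in-context experience.
    total = obs.shape[0]
    causal_mask = torch.tril(torch.ones(total, total, dtype=torch.bool))
    return obs, actions, causal_mask

episodes = [
    {"obs": torch.randn(5, 8), "actions": torch.randint(0, 4, (5,)), "return": 1.0},
    {"obs": torch.randn(5, 8), "actions": torch.randint(0, 4, (5,)), "return": 3.0},
]
obs, actions, mask = build_cross_episodic_batch(episodes)
```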


Amazon Researchers Present a Deep Learning Compiler for Training Consisting of Three Main Features: a Syncfree Optimizer, Compiler Caching, and Multi-Threaded Execution (October 18, 2023)

One of the biggest challenges in Machine Learning has always been training and using neural networks efficiently. A turning point was reached with the introduction of the Transformer architecture, which created new opportunities for gradient-descent parallelization and distribution strategies, enabling the training of bigger, more intricate models at a wider scale. However, the exponential increase in model sizes has brought a number of issues with memory limitations and GPU availability; a significant problem is that many models are now larger than the memory available on a single GPU. The enormous disparities in size between pre-trained language and vision models present another challenge. Compilation is a potentially effective remedy that can balance the needs of computing efficiency and model size.

In recent research, a team from Amazon has introduced a deep learning compiler designed specifically for neural network training. Built around three essential components, a sync-free optimizer, compiler caching, and multi-threaded execution, their work shows remarkable speedups over traditional approaches such as native implementations and PyTorch's XLA (Accelerated Linear Algebra) framework, on both common language and vision workloads.

This deep learning compiler features a sync-free optimizer implementation. Optimizers play a crucial role in neural network training, modifying model parameters to minimize the loss function. Traditional optimizer implementations often introduce synchronization barriers, which can become a bottleneck in training. A sync-free optimizer seeks to reduce or eliminate the need for such synchronization, enabling more effective parallelism and better use of computational resources, which is especially helpful when synchronization would otherwise hurt training speed and resource efficiency.
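The general idea can be illustrated generically (this is not Amazon's implementation): keep an optimizer step entirely on the accelerator so that no host-device synchronization point stalls the pipeline. The contrast below shows a clipping step that forces a sync via `.item()` versus one that keeps the clip factor as a device tensor.

```python
import torch

# Generic illustration of avoiding a host-device synchronization in an optimizer
# step (not the paper's implementation). The naive version calls .item(), which
# makes the host wait for the device; the sync-free version keeps the clip factor
# as a tensor so the whole update stays asynchronous on the accelerator.

def clipped_sgd_step_naive(params, lr=1e-2, max_norm=1.0):
    total_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    scale = min(1.0, max_norm / (total_norm.item() + 1e-6))   # .item() = sync point
    for p in params:
        p.data.add_(p.grad, alpha=-lr * scale)

def clipped_sgd_step_syncfree(params, lr=1e-2, max_norm=1.0):
    total_norm = torch.norm(torch.stack([p.grad.norm() for p in params]))
    scale = torch.clamp(max_norm / (total_norm + 1e-6), max=1.0)  # stays on device
    for p in params:
        p.data.add_(-lr * scale * p.grad)
```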

Another important feature of the compiler is compiler caching: pre-compiled representations of neural network or computation-graph components are stored and reused. Recompiling the entire network from scratch on every run is inefficient; by saving and reusing previously compiled components, compiler caching alleviates this inefficiency and can drastically cut training time, conserving compute by capitalizing on earlier compilation work.
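A generic illustration of the caching idea is shown below: compile the training step once per input signature (shapes and dtypes) and reuse the compiled artifact on later calls. The `compile_backend` argument is treated as an opaque, hypothetical hook; this is not the API of the compiler described in the paper.

```python
# Generic illustration of compiler caching (not the paper's implementation):
# compile the training step once per input signature and reuse the compiled
# artifact instead of re-tracing on every call.

_compiled_cache = {}

def cached_compile(fn, compile_backend):
    """compile_backend: anything that turns a Python fn into a faster callable;
    treated here as an opaque, hypothetical argument."""
    def wrapper(*tensors):
        signature = tuple((tuple(t.shape), str(t.dtype)) for t in tensors)
        if signature not in _compiled_cache:
            _compiled_cache[signature] = compile_backend(fn)   # compile once
        return _compiled_cache[signature](*tensors)            # reuse thereafter
    return wrapper
```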

The third essential component is multi-threaded execution. Neural network training involves many operations that can be parallelized; running them concurrently on multi-core processors via multi-threading can yield significant speedups. By optimizing the training procedure for multi-threaded execution, the compiler utilizes the hardware more effectively and accelerates deep learning model training.

The team illustrates the practical significance of these features by comparing their compiler with two well-established baselines, native implementations and the XLA framework within PyTorch, on common computer vision and natural language processing workloads. Against these baselines, their compiler achieves significant speedups and better resource efficiency, highlighting the promise of deep learning compilers for making neural network training more effective and practical in real-world applications.

In conclusion, this work is a significant step forward for deep learning training, with the potential to speed up and streamline training procedures. The experiments and findings demonstrate the effectiveness of the team's changes to the PyTorch XLA compiler, which prove highly useful for accelerating neural network training across multiple domains and configurations.


Researchers from Princeton Introduce ShearedLLaMA Models for Accelerating Language Model Pre-Training via Structured Pruning (October 17, 2023)

Large Language Models (LLMs) have become extremely popular because of their outstanding capabilities on a variety of natural language tasks. Though the field is advancing at a fast pace, the massive computational resources needed to train these models remain a major drawback. Consequently, there has been a surge of interest in creating more compact and effective LLMs, such as LLaMA, MPT, and Falcon. These medium-sized models are intended to support various use cases by providing efficient inference and fine-tuning. However, training even the smallest billion-parameter LLMs from scratch is prohibitively expensive for many organizations due to the significant computational resources required.

Researchers have previously demonstrated that smaller language models, like moderate-sized LLMs such as LLaMA, can be just as powerful, and they are regarded as a more efficient alternative to large LLMs that require enormous processing power to train. In a recent study, a team of researchers from Princeton examined structured pruning as an effective technique for shrinking larger, pre-trained models into smaller LLMs. The method relies on two essential strategies, described below (a minimal sketch of the second follows the list).

  1. Targeted Structured Pruning: A technique that methodically removes layers, heads, and intermediate and hidden dimensions from a bigger language model to trim it down to a target configuration. Because the procedure is carried out end to end, the model's coherence and functioning are preserved: it shrinks the model without sacrificing vital language-comprehension abilities.
  2. Dynamic Batch Loading: A method that adjusts the composition of the training data within each batch according to the evolving losses across different domains. It dynamically modifies the data samples used in each batch so the model concentrates more on domains where it is underperforming, allowing it to adjust its performance efficiently and improving overall training efficiency.
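The sketch below illustrates dynamic batch loading with a simplified update rule (not the exact rule from the paper): domains whose loss sits furthest above a reference loss get a larger share of the next batch.

```python
import numpy as np

# Minimal sketch of dynamic batch loading (simplified update rule, not the exact
# one from the paper): domains whose loss is furthest above a reference loss get
# a larger share of the next batch.

def update_domain_weights(weights, current_loss, reference_loss, step_size=1.0):
    excess = np.maximum(np.array(current_loss) - np.array(reference_loss), 0.0)
    new_weights = np.array(weights) * np.exp(step_size * excess)
    return new_weights / new_weights.sum()

def sample_batch_composition(weights, batch_size, rng=np.random.default_rng(0)):
    counts = rng.multinomial(batch_size, weights)
    return dict(zip(range(len(weights)), counts))   # domain index -> #examples

weights = [0.25, 0.25, 0.25, 0.25]                   # e.g. web, code, books, wiki
weights = update_domain_weights(weights, current_loss=[2.1, 3.0, 2.4, 2.2],
                                reference_loss=[2.0, 2.2, 2.3, 2.1])
print(sample_batch_composition(weights, batch_size=64))
```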

Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B, two smaller LLMs obtained by pruning an LLaMA2-7B model, show how effective the procedure is. The pruning-and-retraining process consumes only 50 billion tokens, about 5% of OpenLLaMA's pre-training budget. Despite this modest budget, Sheared-LLaMA-1.3B and Sheared-LLaMA-2.7B outperform other well-known LLMs of comparable scale, such as Pythia, INCITE, and OpenLLaMA, on 11 typical downstream tasks covering topics such as instruction tuning for open-ended generation, reading comprehension, common-sense understanding, and world knowledge.

Based on the performance trajectory of the pruned models, additional training on more tokens may yield even greater gains. While the study's experiments are restricted to models with at most 7 billion parameters, the LLM-shearing technique is designed to generalize and could be extended to large language models of any size in future work.

To sum up, LLM shearing provides a complete approach to LLM size reduction via targeted structured pruning and dynamic batch loading. The Sheared-LLaMA models, which outperform equivalently sized models on a variety of downstream tasks, are an effective demonstration of it, showing that smaller yet strong LLMs can be developed more effectively and economically across a wide range of model sizes.


This Artificial Intelligence Survey Research Provides A Comprehensive Overview Of Large Language Models Applied To The Healthcare Domain (October 15, 2023)

Natural language processing (NLP) systems have long relied heavily on Pretrained Language Models (PLMs) for a variety of tasks, including speech recognition, metaphor processing, sentiment analysis, information extraction, and machine translation. PLMs are evolving quickly, and recent advances show that they can function as stand-alone systems. A major stride in this direction has been made with OpenAI's development of Large Language Models (LLMs), such as GPT-4, which have shown improved performance on NLP tasks as well as in subjects like biology, chemistry, and medical exams. A new era of possibilities has begun with Google's Med-PaLM 2, which is specifically designed for the medical sector and has attained "expert"-level performance on medical question datasets.

LLMs have the power to revolutionize the healthcare industry by improving the efficacy and efficiency of numerous applications. These models can offer insightful analysis and answers to medical questions since they have a thorough understanding of medical ideas and terminologies. They can help with patient interactions, clinical decision support, and even the interpretation of medical imaging. There are also certain drawbacks to LLMs, including the requirement for substantial amounts of training data and the potential for biases in that data to be propagated.

In a recent study, a team of researchers surveyed the capabilities of LLMs in healthcare. Understanding the significant improvement from PLMs to LLMs requires contrasting the two types of language models: while PLMs are fundamental building blocks, LLMs have a wider range of capabilities that allow them to produce coherent, context-aware responses in healthcare settings. The switch from PLMs to LLMs also reflects a move from discriminative AI approaches, in which models categorize or forecast outcomes, to generative AI approaches, in which models produce language-based answers, and further highlights the shift from model-centered to data-centered approaches.

There are many different models in the LLM world, each suited to a certain specialty. Notable models specially tailored for the healthcare industry include HuatuoGPT, Med-PaLM 2, and Visual Med-Alpaca. HuatuoGPT, for example, asks questions to actively involve patients, whereas Visual Med-Alpaca works with visual experts to carry out tasks such as radiological image interpretation. This diversity allows LLMs to tackle a wide variety of healthcare-related problems.

How well LLMs perform in healthcare applications depends heavily on the training data, techniques, and optimization strategies used, and the survey explores these technical elements of building and optimizing LLMs for medical settings. The use of LLMs in healthcare also raises practical and ethical issues: fairness, accountability, transparency, and ethics must be guaranteed, and healthcare applications must be free from bias, follow ethical guidelines, and give clear justifications for their answers, especially when patient care is involved.

The primary contributions have been summarized by the team as follows.

  1. A transition path from PLMs to LLMs has been laid out, with updates on recent developments.
  2. Training data, evaluation tools, and other data resources for LLMs in healthcare have been compiled to help medical researchers choose the LLMs best suited to their individual requirements.
  3. Ethical considerations, including fairness, equity, and transparency, have been examined.

How Can Transformers Handle Longer Inputs? CMU and Google Researchers Unveil a Novel Approach (FIRE): A Functional Interpolation for Relative Position Encoding (October 14, 2023)

Transformer-based language models have transformed the field of Natural Language Processing (NLP) in recent years. Their capacity to comprehend and produce human-like text has led to ground-breaking improvements across a range of NLP tasks. However, these models have a serious flaw: when exposed to input sequences longer than those seen during training, their performance tends to decline noticeably. This limitation has spurred the search for ways to extend the context lengths they can handle in real-world applications.

Although the Transformer architecture can in principle handle inputs of different lengths, the model's effectiveness on longer inputs is often limited by the position encoding used during training. To address this, a team of researchers from Carnegie Mellon University, Google Research, and Google DeepMind has introduced an approach called Functional Interpolation for Relative Positional Encoding (FIRE), which aims to improve Transformers' ability to generalize to long context lengths through progressive interpolation with functional relative position encoding.

The basic idea of FIRE is to give Transformer models a more flexible means of representing token positions within a sequence. In place of a predefined position encoding scheme, FIRE offers a dynamic, learnable mechanism for encoding positional information, which lets the model adapt its notion of position to the particular context and sequence length it encounters.
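A minimal sketch of this idea is shown below: a tiny learned MLP maps a normalized, log-transformed relative distance to an additive attention bias, with the normalization by the query position playing the role of progressive interpolation. The exact functional form and normalization used in the paper may differ; this only illustrates the "learned function of i - j" idea.

```python
import torch
import torch.nn as nn

# Minimal sketch of a functional relative position bias in the spirit of FIRE:
# a small MLP maps a normalized, log-transformed relative distance to an additive
# attention bias. The exact transform used in the paper may differ.

class FunctionalRelativeBias(nn.Module):
    def __init__(self, hidden=32, num_heads=8):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, num_heads))

    def forward(self, seq_len):
        i = torch.arange(seq_len).view(-1, 1)          # query positions
        j = torch.arange(seq_len).view(1, -1)          # key positions
        rel = (i - j).clamp(min=0).float()             # causal relative distance
        # Progressive interpolation: normalize by the query position so the
        # input to the MLP stays in a bounded range at any sequence length.
        norm = torch.log1p(rel) / torch.log1p(i.float().clamp(min=1))
        bias = self.mlp(norm.unsqueeze(-1))            # (L, L, heads)
        return bias.permute(2, 0, 1)                   # (heads, L, L), added to attention logits

bias = FunctionalRelativeBias()(seq_len=16)
```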

One of FIRE's main advantages is that it can theoretically represent widely used relative position encoding techniques such as Kerple, Alibi, and T5's Relative Positional Encoding (RPE), which means it remains compatible with existing methods and models while providing enhanced performance.

A number of experiments assess the performance of FIRE-equipped models in situations where long-context comprehension is crucial, covering a range of benchmarks such as zero-shot language modeling and tasks with long textual inputs. Models using the new method show better length generalization: when presented with longer sequences, they remain capable of comprehending and producing meaningful text, a skill that is extremely useful in practical settings.

The main contributions have been summarized by the researchers as follows.

  1. A new functional relative positional encoding technique called FIRE has been introduced. FIRE can represent popular position encoding methods such as Alibi, Kerple, and T5's RPE, unifying these approaches.
  2. FIRE outperforms existing techniques in zero-shot and fine-tuning scenarios on a variety of datasets and benchmarks, exhibiting strong length-generalization performance. It beats the best baseline by 2.28 perplexity points on C4 language modeling and outperforms other techniques by an average of more than 1 point on the SCROLLS long-text benchmark.
  3. Visualizations of the learned position embeddings show that FIRE can capture both local and anti-local position biases, which enhances its versatility across tasks.

In conclusion, FIRE offers a compelling resolution to a persistent issue with Transformer models: by approaching relative position encoding in a flexible, learnable way, it enables these models to maintain high performance even on input sequences far longer than those seen during training.


Researchers at Stanford Propose DDBMs: A Simple and Scalable Extension to Diffusion Models Suitable for Distribution Translation Problems (October 12, 2023)

Diffusion models have recently seen much success and attention in the Artificial Intelligence community. Belonging to the family of generative models, they can effectively reverse a diffusion process that turns data into noise, allowing them to capture complex data distributions. This approach has been a breakthrough in a number of generative tasks, particularly the generation of high-quality images, where it has outperformed conventional GAN-based techniques, and it underpins modern text-to-image generative AI systems.

Diffusion models have performed exceptionally well in some areas but not in others. Because they presuppose a prior distribution of random noise, they are difficult to apply to tasks such as image translation, where the goal is to map between pairs of images. Complex workarounds, such as specially training the model or manually adjusting the sampling procedure, are frequently used to address this problem, but these techniques have weak theoretical underpinnings and usually support only one-way mapping, typically from corrupted to clean images, abandoning the idea of cycle consistency.

In contrast to the conventional diffusion-model paradigm, a team of researchers from Stanford has introduced a new strategy known as Denoising Diffusion Bridge Models (DDBMs). DDBMs build on diffusion bridges, a class of processes that smoothly interpolate between two paired distributions specified as endpoints. Rather than starting from random noise, DDBMs learn the score of the diffusion bridge directly from data, and the learned score then guides the model as it solves a stochastic differential equation to map from one endpoint distribution to the other.
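The sketch below illustrates how a training sample on a simple Brownian-bridge-style process between a paired (x0, xT) could be formed: interpolate the endpoints and add noise whose scale vanishes at both ends. This is a simplified illustration; DDBMs' exact bridge process, noise schedule, and score parameterization are not reproduced here.

```python
import torch

# Minimal sketch of forming one training sample on a Brownian-bridge-style process
# between a paired (x0, xT). Simplified illustration only; not DDBMs' exact bridge.

def sample_bridge_point(x0, xT, sigma=1.0):
    """Draw t ~ U(0,1) and an intermediate x_t that pins x0 at t=0 and xT at t=1."""
    t = torch.rand(x0.shape[0], *[1] * (x0.dim() - 1))        # one t per example
    mean = (1.0 - t) * x0 + t * xT                             # interpolate endpoints
    std = sigma * torch.sqrt(t * (1.0 - t))                    # vanishes at both ends
    noise = torch.randn_like(x0)
    x_t = mean + std * noise
    return x_t, t, noise     # a network would be trained to recover the noise/score from (x_t, t, xT)

x0, xT = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)  # a paired batch
x_t, t, noise = sample_bridge_point(x0, xT)
```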

One of DDBMs' main advantages is their capacity to naturally combine several kinds of generative models: they can integrate components from OT-Flow-Matching and score-based diffusion models, allowing existing design decisions and architectural strategies to be adapted to their more general problem.

For their empirical analysis, the team applied DDBMs to challenging image datasets, considering both pixel-space and latent-space models. DDBMs greatly outperform baseline approaches on standard image translation tasks, demonstrating their suitability for difficult image manipulation problems. And when the problem is simplified by taking the source distribution to be random noise, DDBMs achieve FID scores competitive with state-of-the-art techniques designed specifically for image generation.

This shows how adaptable and reliable DDBMs are across a variety of generative tasks, even ones they were not specifically designed for. In conclusion, while diffusion models have been effective for many generative tasks, they have drawbacks for problems like image translation; the proposed DDBMs offer a simple and scalable extension that unifies diffusion-based generation and distribution translation, improving performance and versatility on challenging image-related tasks.

