Top Related Projects
Integrate cutting-edge LLM technology quickly and easily into your apps
An open-source NLP research library, built on PyTorch.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Quick Overview
GitHub's semantic is an open-source Haskell library and command line tool for parsing, analyzing, and comparing source code. It generates per-language syntax types from tree-sitter grammar definitions, producing parse trees and symbol information across many programming languages, and has been used to power code navigation on github.com.
Pros
- Supports many programming languages, including Ruby, JavaScript, TypeScript, Python, Go, and PHP
- Produces precise parse trees and symbol listings in several output formats (s-expressions, JSON, protobuf)
- Powers code-navigation features such as those on github.com
- Can be integrated into various development tools and workflows
Cons
- Requires significant computational resources for large codebases
- May have a steep learning curve for advanced usage
- Documentation could be more comprehensive for some features
- Limited community support compared to some other code analysis tools
Code Examples
- Parsing a file into an s-expression parse tree (the default output format):
semantic parse --sexpression path/to/file.rb
- Emitting a JSON symbol list for one or more files:
semantic parse --json-symbols path/to/file.ts
- Checking parse timings without producing output:
semantic parse --quiet path/to/file.py
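The command line tool can also be driven from other programs. As a sketch, here is a small Python wrapper around `semantic parse --json-symbols` (the flag comes from semantic's own usage text; the wrapper functions themselves are hypothetical, not part of semantic):

```python
import json
import subprocess

def symbols_command(paths, semantic_bin="semantic"):
    """Build the argv for a JSON symbol listing (hypothetical helper)."""
    return [semantic_bin, "parse", "--json-symbols", *paths]

def parse_symbols(paths):
    """Run semantic and decode its JSON symbol output.
    Requires the semantic binary to be on PATH."""
    result = subprocess.run(
        symbols_command(paths), capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)
```

For example, `parse_symbols(["app.rb"])` would shell out to `semantic parse --json-symbols app.rb` and return the decoded symbol list.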
Getting Started
To get started with semantic, follow these steps:
- Clone the repository and bootstrap it (requires at least GHC 8.10.1 and Cabal 3.0):
git clone git@github.com:github/semantic.git
cd semantic
script/bootstrap
- Build the project and run the command line tool:
cabal v2-build all
cabal v2-run semantic:semantic -- --help
- Refer to the documentation for more advanced usage and configuration options.
Competitor Comparisons
Integrate cutting-edge LLM technology quickly and easily into your apps
Pros of Semantic Kernel
- More active development with frequent updates and contributions
- Broader scope, focusing on integrating AI capabilities into various applications
- Extensive documentation and examples for easier adoption
Cons of Semantic Kernel
- Steeper learning curve due to its comprehensive nature
- Heavier dependency on external AI services, potentially increasing costs
- Less focused on specific code analysis tasks compared to Semantic
Code Comparison
Semantic:
parseModule :: Parser Module
parseModule = do
  header <- optional moduleHeader
  imports <- many importDecl
  decls <- many topDecl
  return $ Module header imports decls
Semantic Kernel:
public class SemanticFunction
{
public string Name { get; set; }
public string Description { get; set; }
public ISKFunction Function { get; set; }
public List<ParameterView> Parameters { get; set; }
}
Summary
Semantic focuses on code analysis and parsing, while Semantic Kernel offers a broader toolkit for AI integration. Semantic may be more suitable for specific code-related tasks, whereas Semantic Kernel provides a more versatile platform for AI-powered applications. The choice between them depends on the project's requirements and the desired level of AI integration.
An open-source NLP research library, built on PyTorch.
Pros of AllenNLP
- More comprehensive NLP toolkit with a wider range of pre-built models and tasks
- Extensive documentation and tutorials, making it more accessible for beginners
- Active community and regular updates
Cons of AllenNLP
- Steeper learning curve due to its extensive feature set
- Potentially slower performance for specific tasks compared to more specialized libraries
Code Comparison
AllenNLP:
from allennlp.predictors.predictor import Predictor
predictor = Predictor.from_path("https://storage.googleapis.com/allennlp-public-models/bert-base-srl-2020.03.24.tar.gz")
result = predictor.predict(sentence="The cat sat on the mat.")
Semantic:
{-# LANGUAGE OverloadedStrings #-}
import Semantic

main :: IO ()
main = do
  let src = "function foo() { return 42; }"
  tree <- parseTreeFromString JavaScript src
  print tree
AllenNLP provides detailed, customizable pipelines for natural-language text, while Semantic parses and analyzes program source code. AllenNLP's example loads a specific pre-trained model for semantic role labeling, whereas Semantic's example parses a JavaScript snippet into a syntax tree. AllenNLP suits researchers and developers working with human language, while Semantic suits tooling that operates on source code.
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Pros of transformers
- Extensive library of pre-trained models for various NLP tasks
- Active community and frequent updates
- Comprehensive documentation and tutorials
Cons of transformers
- Can be resource-intensive for large models
- Steeper learning curve for beginners
- Limited support for source code analysis, which is semantic's specialty
Code comparison
transformers:
from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I love this product!")[0]
print(f"Label: {result['label']}, Score: {result['score']:.4f}")
semantic:
{-# LANGUAGE OverloadedStrings #-}
import Semantic
main :: IO ()
main = do
  let src = "function foo() { return 42; }"
  tree <- parseTreeFromString JavaScript src
  print tree
The code snippets demonstrate the different focus areas of the two libraries. transformers provides high-level APIs for various NLP tasks, while semantic is more focused on parsing and analyzing source code.
transformers is better suited for general NLP tasks and offers a wide range of pre-trained models. semantic, on the other hand, excels in semantic analysis of source code and is more specialized for programming language processing.
💫 Industrial-strength Natural Language Processing (NLP) in Python
Pros of spaCy
- Extensive language support with pre-trained models for multiple languages
- Comprehensive documentation and active community support
- Efficient and fast processing for large-scale text analysis
Cons of spaCy
- Steeper learning curve for beginners compared to semantic
- No support for parsing and analyzing source code, which is semantic's focus
- May require more manual configuration for specialized NLP tasks
Code Comparison
spaCy:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
for ent in doc.ents:
    print(ent.text, ent.label_)
semantic:
{-# LANGUAGE OverloadedStrings #-}
import Semantic

main :: IO ()
main = do
  let src = "def hello(): print('Hello, world!')"
  ast <- parseFile Python src
  print ast
Note: The code examples demonstrate basic usage for each library. spaCy performs named entity recognition over natural-language text in this example, while semantic parses program source code into a syntax tree.
Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
Pros of fairseq
- Broader scope: Supports a wide range of sequence modeling tasks, including machine translation, text summarization, and language modeling
- Active development: Regularly updated with new features and improvements
- Extensive documentation: Comprehensive guides and examples for various use cases
Cons of fairseq
- Steeper learning curve: Requires more in-depth knowledge of NLP concepts
- Higher resource requirements: May need more computational power for training and inference
Code Comparison
fairseq:
from fairseq.models.transformer import TransformerModel
en2de = TransformerModel.from_pretrained(
'/path/to/checkpoints',
checkpoint_file='checkpoint_best.pt',
data_name_or_path='data-bin/wmt16_en_de_bpe32k'
)
en2de.translate('Hello world!')
semantic:
import Semantic.Api
import Semantic.Config
main :: IO ()
main = do
  config <- defaultConfig
  result <- runSemantic config $ do
    parseFile "path/to/file.py"
  print result
The code snippets demonstrate the different focus areas of the two projects. fairseq is geared towards NLP tasks, while semantic is designed for parsing and analyzing source code.
Stanford NLP Python library for tokenization, sentence segmentation, NER, and parsing of many human languages
Pros of Stanza
- Supports a wide range of languages (over 60) for various NLP tasks
- Provides pre-trained neural models for accurate linguistic annotations
- Offers a Python interface with easy integration into existing workflows
Cons of Stanza
- May have slower processing speed compared to Semantic
- Requires more computational resources for running neural models
- Limited focus on code analysis and programming language support
Code Comparison
Stanza example:
import stanza
nlp = stanza.Pipeline('en')
doc = nlp("Hello world!")
for sentence in doc.sentences:
    print([(word.text, word.upos) for word in sentence.words])
Semantic example:
{-# LANGUAGE OverloadedStrings #-}
import Semantic
main :: IO ()
main = do
  let src = "def hello(): print('Hello, world!')"
  ast <- parseFile Python src
  print ast
While Stanza focuses on natural language processing tasks, Semantic is tailored for parsing and analyzing source code across multiple programming languages. Stanza excels in linguistic annotations for human languages, whereas Semantic provides powerful tools for code analysis, making it more suitable for developers working with source code and programming languages.
README
Semantic
semantic is a Haskell library and command line tool for parsing, analyzing, and comparing source code.
In a hurry? Check out our documentation of example uses for the semantic command line tool.
Table of Contents
- Usage
- Language support
- Development
- Technology and architecture
- Licensing
Usage
Run semantic --help for a complete list of up-to-date options.
Parse
Usage: semantic parse [--sexpression | (--json-symbols|--symbols) |
--proto-symbols | --show | --quiet] [FILES...]
Generate parse trees for path(s)
Available options:
--sexpression Output s-expression parse trees (default)
--json-symbols,--symbols Output JSON symbol list
--proto-symbols Output protobufs symbol list
--show Output using the Show instance (debug only, format
subject to change without notice)
--quiet Don't produce output, but show timing stats
-h,--help Show this help text
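The default --sexpression format renders parse trees as nested s-expressions. As a rough sketch of that output shape (the node names below are hypothetical, not taken from semantic's actual grammars), here is a toy s-expression printer in Python:

```python
def to_sexpr(node):
    """Render a (name, children) tuple tree as a parenthesized s-expression."""
    name, children = node
    if not children:
        return f"({name})"
    return "(" + name + " " + " ".join(to_sexpr(c) for c in children) + ")"

# Toy tree for an assignment like `x = 1` (hypothetical node names)
tree = ("Assignment", [("Identifier", []), ("Integer", [])])
print(to_sexpr(tree))  # (Assignment (Identifier) (Integer))
```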
Language support
Language | Parse | AST Symbols† | Stack graphs |
---|---|---|---|
Ruby | ✅ | ✅ | |
JavaScript | ✅ | ✅ | |
TypeScript | ✅ | ✅ | 🚧 |
Python | ✅ | ✅ | 🚧 |
Go | ✅ | ✅ | |
PHP | ✅ | ✅ | |
Java | 🚧 | ✅ | |
JSON | ✅ | ⬜️ | ⬜️ |
JSX | ✅ | ✅ | |
TSX | ✅ | ✅ | |
CodeQL | ✅ | ✅ | |
Haskell | 🚧 | 🚧 | |
† Used for code navigation on github.com.
- ✅ — Supported
- 🔶 — Partial support
- 🚧 — Under development
- ⬜️ — N/A
Development
semantic requires at least GHC 8.10.1 and Cabal 3.0. We strongly recommend using ghcup to sandbox GHC versions, as GHC packages installed through your OS's package manager may not install statically-linked versions of the GHC boot libraries. semantic currently builds only on Unix systems; users of other operating systems may wish to use the Docker images.
We use cabal's Nix-style local builds for development. To get started quickly:
git clone git@github.com:github/semantic.git
cd semantic
script/bootstrap
cabal v2-build all
cabal v2-run semantic:test
cabal v2-run semantic:semantic -- --help
You can also use the Bazel build system for development. To learn more about Bazel and why it might give you a better development experience, check the build documentation.
git clone git@github.com:github/semantic.git
cd semantic
script/bootstrap-bazel
bazel build //...
stack as a build tool is not officially supported; there is unofficial stack.yaml support available, though we cannot make guarantees as to its stability.
Technology and architecture
Architecturally, semantic:
- Generates per-language Haskell syntax types based on tree-sitter grammar definitions.
- Reads blobs from a filesystem or provided via a protocol buffer request.
- Returns blobs or performs analysis.
- Renders output in one of many supported formats.
Throughout its lifecycle, semantic has leveraged a number of interesting algorithms and techniques, including:
- Myers' algorithm (SES) as described in the paper An O(ND) Difference Algorithm and Its Variations
- RWS as described in the paper RWS-Diff: Flexible and Efficient Change Detection in Hierarchical Data.
- Open unions and data types à la carte.
- An implementation of Abstracting Definitional Interpreters extended to work with an à la carte representation of syntax terms.
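To make the first item concrete, here is a minimal Python sketch of Myers' greedy O(ND) forward search, computing only the length of the shortest edit script (semantic's actual Haskell implementation differs; this shows just the core idea):

```python
def ses_length(a, b):
    """Length of the shortest edit script (insertions + deletions)
    turning sequence a into sequence b, via Myers' greedy forward search."""
    n, m = len(a), len(b)
    max_d = n + m
    # v[offset + k] = furthest x reached so far on diagonal k = x - y
    offset = max_d
    v = [0] * (2 * max_d + 2)
    for d in range(max_d + 1):
        for k in range(-d, d + 1, 2):
            # Extend from the better of the two neighboring diagonals:
            # a move down (insertion) from k+1, or right (deletion) from k-1.
            if k == -d or (k != d and v[offset + k - 1] < v[offset + k + 1]):
                x = v[offset + k + 1]
            else:
                x = v[offset + k - 1] + 1
            y = x - k
            # Follow the free diagonal "snake" over matching elements.
            while x < n and y < m and a[x] == b[y]:
                x, y = x + 1, y + 1
            v[offset + k] = x
            if x >= n and y >= m:
                return d
    return max_d

print(ses_length("ABCABBA", "CBABAC"))  # 5, the example from Myers' paper
```

The quadratic worst case is what motivates refinements like RWS-Diff for large hierarchical structures, the second technique listed above.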
Contributions
Contributions are welcome! Please see our contribution guidelines and our code of conduct for details on how to participate in our community.
Licensing
Semantic is licensed under the MIT license.