icu

The home of the ICU project source code.

3,118

810

Top Related Projects

icu

3,045

The home of the ICU project source code.

double-conversion

1,147

Efficient binary-decimal and decimal-binary conversion routines for IEEE doubles.

cityhash

1,172

Automatically exported from code.google.com/p/cityhash

re2

9,246

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Quick Overview

International Components for Unicode for C/C++ (ICU4C) is a C/C++ library that provides Unicode and globalization support for software applications. It is widely used to handle text, dates, numbers, and other locale-specific data consistently across platforms and languages.

Pros

Comprehensive Unicode Support: ICU4C provides comprehensive support for the Unicode standard, including character encoding, normalization, collation, and more.
Cross-Platform Compatibility: The library is designed to be cross-platform, allowing developers to use the same code on various operating systems, including Windows, macOS, and Linux.
Localization and Internationalization: ICU4C offers a rich set of tools and APIs for handling localization and internationalization, making it easier to develop software that can be used in different languages and regions.
Active Development and Community: The project is actively maintained by the Unicode Consortium and has a large and engaged community of contributors and users.

Cons

Complexity: The library is quite extensive and can have a steep learning curve, especially for developers who are new to internationalization and globalization.
Performance Overhead: Depending on the use case, the comprehensive features of ICU4C may introduce some performance overhead, which can be a concern for certain applications.
Dependency Management: Integrating ICU4C into a project may require managing dependencies and build configurations, which can add complexity to the development process.
Uneven Documentation: The project has a user guide and API reference, but some areas are less thoroughly documented, which can make it harder for new users to get started.

Code Examples

Here are a few examples of how to use ICU4C:

Performing Unicode Normalization:

#include <unicode/normalizer2.h>
#include <unicode/unistr.h>

UErrorCode status = U_ZERO_ERROR;
// The NFC instance is a shared singleton owned by ICU; do not delete it.
const icu::Normalizer2* normalizer = icu::Normalizer2::getNFCInstance(status);
// "Cafe" followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
icu::UnicodeString input(u"Cafe\u0301");
icu::UnicodeString normalized;
if (U_SUCCESS(status)) {
    normalized = normalizer->normalize(input, status);
}
// normalized now holds "Café" with a single precomposed U+00E9
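
To see why normalization matters, it may help to look at what NFC composition does to the one accented letter in the example above. The sketch below is a toy that uses only the standard library, not ICU code: `composeToy` is a hypothetical helper that knows a single rule (`e` followed by U+0301 becomes U+00E9), whereas ICU's normalizer applies the full Unicode data.

```cpp
#include <string>

// Code points used by the toy rule below.
constexpr char32_t kCombiningAcute = 0x0301;  // COMBINING ACUTE ACCENT
constexpr char32_t kEAcute = 0x00E9;          // LATIN SMALL LETTER E WITH ACUTE

// Toy NFC-style composition for exactly one pair: 'e' + U+0301 -> U+00E9.
// Hypothetical helper for illustration; real normalization needs ICU's data.
std::u32string composeToy(const std::u32string& in) {
    std::u32string out;
    for (char32_t c : in) {
        if (c == kCombiningAcute && !out.empty() && out.back() == U'e') {
            out.back() = kEAcute;  // merge the base letter and the accent
        } else {
            out.push_back(c);
        }
    }
    return out;
}
```

Before composition, the two spellings of "Café" (four code points versus five) compare unequal code point by code point; after it, they are identical, which is the property that searching and comparison code usually relies on.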

Formatting Dates and Times:

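#include <unicode/udat.h>

UErrorCode status = U_ZERO_ERROR;
// Date-only formatter: UDAT_NONE for the time style, UDAT_FULL for the date style.
UDateFormat* formatter = udat_open(UDAT_NONE, UDAT_FULL, "en_US", NULL, 0, NULL, 0, &status);
// ICU's C API works on UTF-16 (UChar) text, not char buffers.
UDate date = udat_parse(formatter, u"Friday, April 14, 2023", -1, NULL, &status);
UChar buffer[256];
int32_t len = udat_format(formatter, date, buffer, 256, NULL, &status);
// On success, buffer holds the formatted date as UTF-16 and len is its length in UChars
udat_close(formatter);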

Performing Locale-Aware String Comparison:

#include <unicode/ucol.h>
#include <unicode/unistr.h>

UErrorCode status = U_ZERO_ERROR;
UCollator* collator = ucol_open("fr_FR", &status);
icu::UnicodeString str1(u"café");
icu::UnicodeString str2(u"cafe");
UCollationResult result = ucol_strcoll(collator, str1.getBuffer(), str1.length(), str2.getBuffer(), str2.length());
// result is UCOL_GREATER: the accented "café" sorts after the unaccented "cafe"
ucol_close(collator);

Getting Started

To get started with ICU4C, follow these steps:

Download the latest version of ICU4C from the official website.
Extract the downloaded archive and navigate to the icu/source directory.
Run the configure script to generate the necessary build files for your platform:
```
./configure
```
Build the library:
```
make
```
Install the library:
```
make install
```
In your C/C++ project, include the necessary ICU4C headers and link against the ICU4C libraries. For example, in your `CMakeLists.txt

Competitor Comparisons

icu

3,045

The home of the ICU project source code.

Pros of ICU

More comprehensive, including both C/C++ (ICU4C) and Java (ICU4J) implementations
Unified codebase for multiple programming languages
Broader range of internationalization services and utilities

Cons of ICU

Larger project size, potentially more complex to navigate
May require more resources due to its comprehensive nature
Potentially slower release cycle due to coordinating multiple language implementations

Code Comparison

ICU (Java implementation):

Collator coll = Collator.getInstance(new ULocale("de"));
coll.setStrength(Collator.PRIMARY);
boolean result = coll.equals("ß", "ss");

ICU4C (C API, callable from C++):

UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("de", &status);
ucol_setStrength(coll, UCOL_PRIMARY);
UBool result = ucol_equal(coll, u"ß", -1, u"ss", -1);
ucol_close(coll);

Both examples demonstrate creating a collator for the German locale and comparing "ß" with "ss" at primary strength. The ICU repository includes both implementations, while ICU4C focuses solely on the C/C++ version.
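
The idea of comparison "strength" can be illustrated without ICU. The following is a toy model under stated assumptions, not ICU's algorithm or data: `toyElements` and `toyCompare` are hypothetical helpers, the weight table covers only "ß" and "é", and all levels below the primary one are collapsed into a single weight. It shows why "ß" and "ss" can compare equal when only base letters are considered and still differ once lower levels are consulted.

```cpp
#include <string>
#include <utility>
#include <vector>

// One collation element: a primary weight (base letter) and a lower-level
// weight (accent or variant marker). Toy model only; ICU uses CLDR data.
using Element = std::pair<int, int>;

// Hypothetical mapping covering just the characters discussed in the text.
std::vector<Element> toyElements(const std::u32string& s) {
    std::vector<Element> out;
    for (char32_t c : s) {
        if (c == 0x00DF) {            // ß expands to two 's' elements
            out.push_back({'s', 1});
            out.push_back({'s', 1});
        } else if (c == 0x00E9) {     // é is 'e' plus an accent weight
            out.push_back({'e', 1});
        } else {
            out.push_back({static_cast<int>(c), 0});
        }
    }
    return out;
}

// Returns <0, 0, or >0. Primary weights are compared over the whole string
// first; lower-level weights only break ties, and only when requested.
int toyCompare(const std::u32string& a, const std::u32string& b, bool primaryOnly) {
    std::vector<int> pa, pb, la, lb;
    for (const Element& e : toyElements(a)) { pa.push_back(e.first); la.push_back(e.second); }
    for (const Element& e : toyElements(b)) { pb.push_back(e.first); lb.push_back(e.second); }
    if (pa != pb) return pa < pb ? -1 : 1;
    if (primaryOnly || la == lb) return 0;
    return la < lb ? -1 : 1;
}
```

With `primaryOnly` set, `toyCompare(U"\u00DF", U"ss", true)` returns 0, mirroring the primary-strength comparison above; with it cleared, the same call returns a positive value because the variant weights differ.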

double-conversion

1,147

Efficient binary-decimal and decimal-binary conversion routines for IEEE doubles.

Pros of double-conversion

Lightweight and focused specifically on double-to-string and string-to-double conversions
Highly optimized for performance in these specific operations
Easy to integrate into existing C++ projects

Cons of double-conversion

Limited scope compared to ICU4C's comprehensive internationalization features
Lacks support for other data types and internationalization tasks
May require additional libraries for full-fledged internationalization support

Code Comparison

ICU4C (formatting a number):

UErrorCode status = U_ZERO_ERROR;
UnicodeString result;
NumberFormat* formatter = NumberFormat::createInstance(Locale::getUS(), status);
formatter->format(12345.678, result);
delete formatter;

double-conversion (converting double to string):

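char buffer[128];
double value = 12345.678;
// ToFixed is a member function that writes into a StringBuilder, not a raw buffer
double_conversion::StringBuilder builder(buffer, sizeof(buffer));
double_conversion::DoubleToStringConverter::EcmaScriptConverter().ToFixed(value, 3, &builder);
const char* text = builder.Finalize();  // null-terminated result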

Both libraries offer efficient number formatting, but ICU4C provides a more comprehensive set of internationalization tools, while double-conversion focuses on fast and precise double-string conversions.
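
The difference in scope can be sketched with the standard library alone. This is an illustration under stated assumptions, not code from either project: `toFixedPlain` stands in for a raw double-to-string conversion, and `groupThousands` is a hypothetical helper showing the kind of locale convention (grouping and decimal separators) that a locale-aware formatter layers on top.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Plain fixed-notation conversion: digits only, no locale conventions.
std::string toFixedPlain(double value, int digits) {
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer), "%.*f", digits, value);
    return buffer;
}

// Hypothetical helper: inserts a grouping separator every three digits of
// the integer part and swaps in the requested decimal point.
std::string groupThousands(const std::string& plain, char separator, char decimalPoint) {
    const std::size_t dot = plain.find('.');
    const std::size_t intEnd = (dot == std::string::npos) ? plain.size() : dot;
    const std::size_t intStart = (!plain.empty() && plain[0] == '-') ? 1 : 0;
    std::string out = plain.substr(0, intStart);  // keep a leading minus sign
    for (std::size_t i = intStart; i < intEnd; ++i) {
        if (i > intStart && (intEnd - i) % 3 == 0) out.push_back(separator);
        out.push_back(plain[i]);
    }
    if (dot != std::string::npos) {
        out.push_back(decimalPoint);
        out.append(plain, dot + 1, std::string::npos);
    }
    return out;
}
```

For example, `groupThousands(toFixedPlain(12345.678, 3), ',', '.')` yields `12,345.678`, while passing `'.'` and `','` yields the `12.345,678` style used in many European locales. A real formatter also covers digit shapes, currency, and rounding rules that this sketch ignores.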

cityhash

1,172

Automatically exported from code.google.com/p/cityhash

Pros of CityHash

Lightweight and focused on fast hash functions for strings
Optimized for modern processors with good performance on short strings
Simple API with minimal dependencies

Cons of CityHash

Limited scope compared to ICU4C's comprehensive internationalization features
Less active development and community support
Not designed for Unicode-aware string operations or locale-specific functionality

Code Comparison

CityHash (hash function example):

uint64_t CityHash64(const char *buf, size_t len) {
  if (len <= 32) {
    if (len <= 16) {
      return HashLen0to16(buf, len);
    } else {
      return HashLen17to32(buf, len);
    }
  }
  // ... (additional code for longer strings)
}

ICU4C (string comparison example):

UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
int result = ucol_strcoll(coll, str1, -1, str2, -1);
ucol_close(coll);

Summary

CityHash is a specialized library for fast string hashing, while ICU4C is a comprehensive internationalization framework. CityHash offers simplicity and performance for specific use cases, but lacks the broad Unicode and locale support provided by ICU4C. The choice between them depends on whether you need focused hashing functionality or extensive internationalization capabilities.

re2

9,246

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Pros of re2

Faster performance for regular expression matching
Guaranteed linear time complexity, avoiding catastrophic backtracking
Smaller memory footprint compared to ICU4C
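
The linear-time guarantee comes from tracking sets of matcher states instead of backtracking. The sketch below is not RE2's implementation; it is a toy for a glob-like subset (literal characters plus `*` for "any run of characters"), and `stateSetMatch` is a hypothetical name. It is meant only to show how processing each input character once against a bounded state set avoids the blow-up a naive backtracking matcher can hit on patterns such as `a*a*a*a*b`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Matches text against a pattern where '*' means "any run of characters"
// and every other character matches itself. Instead of backtracking, it
// tracks which pattern positions are reachable after each input character,
// so the work is bounded by text.size() * pattern.size().
bool stateSetMatch(const std::string& pattern, const std::string& text) {
    const std::size_t m = pattern.size();
    // active[i] == true: the first i pattern characters have been matched.
    std::vector<bool> active(m + 1, false);
    active[0] = true;
    // A '*' may match nothing, so a state just before one also reaches past it.
    auto skipStars = [&](std::vector<bool>& states) {
        for (std::size_t i = 0; i < m; ++i)
            if (states[i] && pattern[i] == '*') states[i + 1] = true;
    };
    skipStars(active);
    for (char c : text) {
        std::vector<bool> next(m + 1, false);
        for (std::size_t i = 0; i < m; ++i) {
            if (!active[i]) continue;
            if (pattern[i] == '*') next[i] = true;          // '*' absorbs c
            else if (pattern[i] == c) next[i + 1] = true;   // literal match
        }
        skipStars(next);
        active = next;
    }
    return active[m];
}
```

Calling `stateSetMatch("a*a*a*a*b", std::string(30, 'a'))` returns `false` after a single pass over the input, with no retrying of earlier choices.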

Cons of re2

Limited Unicode support compared to ICU4C's extensive capabilities
Fewer internationalization features and locale-specific functionalities
Not as widely adopted or supported in various programming environments

Code Comparison

re2:

#include <re2/re2.h>

RE2 pattern("(\\w+)");  // capture group needed so FindAndConsume can fill `match`
re2::StringPiece input("Hello, World!");
re2::StringPiece match;
while (RE2::FindAndConsume(&input, pattern, &match)) {
    // Process match
}

ICU4C:

#include <unicode/regex.h>

UErrorCode status = U_ZERO_ERROR;
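icu::UnicodeString input("Hello, World!");  // must outlive the matcher, which keeps a reference to it
icu::RegexMatcher* matcher = new icu::RegexMatcher("\\w+", 0, status);
matcher->reset(input);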
while (matcher->find(status)) {
    // Process match
}
delete matcher;

Both libraries provide regular expression functionality, but re2 focuses on performance and safety, while ICU4C offers more comprehensive Unicode and internationalization support. re2 is better suited for high-performance regex operations, while ICU4C excels in applications requiring extensive Unicode handling and localization features.

snappy

6,313

A fast compressor/decompressor

Pros of Snappy

Lightweight and focused on fast compression/decompression
Simple API, easy to integrate into existing projects
Optimized for speed, particularly suitable for real-time systems

Cons of Snappy

Limited to compression functionality, lacks broader text processing capabilities
Not designed for maximum compression ratio, prioritizes speed over file size reduction
Less widely adopted compared to ICU4C in international software development

Code Comparison

Snappy (C++):

snappy::Compress(input_data.data(), input_data.size(), &compressed);
snappy::Uncompress(compressed.data(), compressed.size(), &uncompressed);

ICU4C (C):

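UErrorCode status = U_ZERO_ERROR;
UChar result[64];
// u_strToUpper(dest, destCapacity, src, srcLength, locale, status) returns the output length
int32_t result_length = u_strToUpper(result, 64, source, -1, NULL, &status);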

Summary

Snappy is a fast compression/decompression library, while ICU4C is a comprehensive internationalization library. Snappy excels in scenarios requiring quick data compression, whereas ICU4C provides extensive support for Unicode and globalization features. The choice between them depends on the specific needs of the project, with Snappy being more suitable for performance-critical compression tasks and ICU4C for applications requiring robust internationalization support.

README

International Components for Unicode

This is the repository for the International Components for Unicode. The ICU project is under the stewardship of The Unicode Consortium.

Source: https://github.com/unicode-org/icu
Bugs: https://unicode-org.atlassian.net/projects/ICU
API Docs: https://unicode-org.github.io/icu-docs/
User Guide: https://unicode-org.github.io/icu/

Build Status (`main` branch)

Build	Status
GitHub Actions (ICU4C)
GitHub Actions (ICU4J)
GitHub Actions (Valgrind)
Exhaustive Tests
Fuzzing
OpenSSF Scorecard

Subdirectories and Information

Copyright & Licenses

Copyright © 2016 and later: Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries. License & terms of use: https://www.unicode.org/copyright.html

A CLA is required to contribute to this project - please refer to the CONTRIBUTING.md file (or start a Pull Request) for more information.

The contents of this repository are governed by the Unicode Terms of Use and are released under LICENSE.
