unicode-org/icu

The home of the ICU project source code.

Quick Overview

International Components for Unicode for C/C++ (ICU4C) is a C/C++ library that provides Unicode and globalization support for software applications. It is widely used to handle text, dates, numbers, and other locale-specific data consistently across platforms and languages.

Pros

  • Comprehensive Unicode Support: ICU4C provides comprehensive support for the Unicode standard, including character encoding, normalization, collation, and more.
  • Cross-Platform Compatibility: The library is designed to be cross-platform, allowing developers to use the same code on various operating systems, including Windows, macOS, and Linux.
  • Localization and Internationalization: ICU4C offers a rich set of tools and APIs for handling localization and internationalization, making it easier to develop software that can be used in different languages and regions.
  • Active Development and Community: The project is actively maintained by the Unicode Consortium and has a large and engaged community of contributors and users.

Cons

  • Complexity: The library is quite extensive and can have a steep learning curve, especially for developers who are new to internationalization and globalization.
  • Performance Overhead: Depending on the use case, the comprehensive features of ICU4C may introduce some performance overhead, which can be a concern for certain applications.
  • Dependency Management: Integrating ICU4C into a project may require managing dependencies and build configurations, which can add complexity to the development process.
  • Uneven Documentation: Although the project's documentation is good overall, some areas are covered less thoroughly, which can make it harder for new users to get started.

Code Examples

Here are a few examples of how to use ICU4C:

  1. Performing Unicode Normalization:
#include <unicode/normalizer2.h>
#include <unicode/unistr.h>

UErrorCode status = U_ZERO_ERROR;
const icu::Normalizer2* normalizer = icu::Normalizer2::getNFCInstance(status);
// Decomposed input: "Cafe" followed by U+0301 (combining acute accent)
icu::UnicodeString input(u"Cafe\u0301");
icu::UnicodeString normalized;
if (U_SUCCESS(status)) normalized = normalizer->normalize(input, status);
// On success, normalized holds "Café" with the precomposed character U+00E9
  2. Formatting Dates and Times:
#include <unicode/udat.h>

UErrorCode status = U_ZERO_ERROR;
// Date-only formatter: time style UDAT_NONE, date style UDAT_FULL
UDateFormat* formatter = udat_open(UDAT_NONE, UDAT_FULL, "en_US", NULL, 0, NULL, 0, &status);
// The C API works on UTF-16 (UChar) text, so the input is a u"" literal
UDate date = udat_parse(formatter, u"Friday, April 14, 2023", -1, NULL, &status);
UChar buffer[256];
int32_t len = udat_format(formatter, date, buffer, 256, NULL, &status);
// On success, buffer holds the formatted date as UTF-16 text of length len
udat_close(formatter);
  3. Performing Locale-Aware String Comparison:
#include <unicode/ucol.h>
#include <unicode/unistr.h>

UErrorCode status = U_ZERO_ERROR;
UCollator* collator = ucol_open("fr_FR", &status);
icu::UnicodeString str1(u"caf\u00E9");  // "café"
icu::UnicodeString str2(u"cafe");
UCollationResult result = ucol_strcoll(collator, str1.getBuffer(), str1.length(), str2.getBuffer(), str2.length());
// result is UCOL_GREATER: the accented "café" sorts after "cafe" (a secondary-level difference)
ucol_close(collator);
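The normalization and collation examples above address problems that plain byte-oriented string handling does not solve. The following is a minimal sketch that uses only the C++ standard library (no ICU calls) to show the limitation; the names `kPrecomposed`, `kDecomposed`, `bytewise_equal`, and `bytewise_sorted` are hypothetical and chosen for illustration only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// "café" with "é" as the single precomposed code point U+00E9 (UTF-8: C3 A9).
const std::string kPrecomposed = "caf\xC3\xA9";
// "café" with "e" followed by the combining acute accent U+0301 (UTF-8: CC 81).
const std::string kDecomposed = "cafe\xCC\x81";

// Plain byte comparison, which is all std::string::operator== performs.
bool bytewise_equal(const std::string& a, const std::string& b) {
    return a == b;
}

// Sort by raw byte values; std::string compares bytes as unsigned values,
// so every non-ASCII UTF-8 lead byte sorts after every ASCII letter.
std::vector<std::string> bytewise_sorted(std::vector<std::string> words) {
    std::sort(words.begin(), words.end());
    return words;
}
```

Here `bytewise_equal(kPrecomposed, kDecomposed)` is false even though both strings display as "café", which is the gap that NFC normalization closes. Likewise, `bytewise_sorted` places "école" (written as `"\xC3\xA9" "cole"`) after "zoo", whereas a locale-aware collator would be expected to place it between "apple" and "zoo".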

Getting Started

To get started with ICU4C, follow these steps:

  1. Download the latest version of ICU4C from the official website.
  2. Extract the downloaded archive and navigate to the icu/source directory.
  3. Run the configure script to generate the necessary build files for your platform:
    ./configure
    
  4. Build the library:
    make
    
  5. Install the library:
    make install
    
  6. In your C/C++ project, include the necessary ICU4C headers and link against the ICU4C libraries.
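One quick way to confirm that the installed headers are visible to your compiler is a small check like the sketch below. It assumes a compiler that supports `__has_include` (standard since C++17) and relies on the `U_ICU_VERSION` macro from `unicode/uvernum.h`; the function name `icu_header_version` is hypothetical. The check only covers the include path. It does not call into the library, so it does not verify that linking works.

```cpp
// Include the ICU version header only if the compiler can find it.
#if defined(__has_include)
#  if __has_include(<unicode/uvernum.h>)
#    include <unicode/uvernum.h>
#  endif
#endif

// Returns the ICU version string from the headers (for example "74.2"),
// or nullptr when the ICU headers are not on the include path.
const char* icu_header_version() {
#ifdef U_ICU_VERSION
    return U_ICU_VERSION;
#else
    return nullptr;
#endif
}
```

If `icu_header_version()` returns a null pointer, the include path most likely needs adjusting before any of the examples above will compile.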

Competitor Comparisons

The home of the ICU project source code.

Pros of ICU

  • More comprehensive, including both C/C++ (ICU4C) and Java (ICU4J) implementations
  • Unified codebase for multiple programming languages
  • Broader range of internationalization services and utilities

Cons of ICU

  • Larger project size, potentially more complex to navigate
  • May require more resources due to its comprehensive nature
  • Potentially slower release cycle due to coordinating multiple language implementations

Code Comparison

ICU (Java implementation):

Collator coll = Collator.getInstance(new ULocale("de"));
coll.setStrength(Collator.PRIMARY);
boolean result = coll.equals("ß", "ss");

ICU4C (C++ implementation):

UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("de", &status);
ucol_setStrength(coll, UCOL_PRIMARY);
UBool result = ucol_equal(coll, u"ß", -1, u"ss", -1);
ucol_close(coll);

Both examples demonstrate creating a collator for the German locale and comparing "ß" with "ss" at primary strength. The ICU repository includes both implementations, while ICU4C focuses solely on the C/C++ version.

Efficient binary-decimal and decimal-binary conversion routines for IEEE doubles.

Pros of double-conversion

  • Lightweight and focused specifically on double-to-string and string-to-double conversions
  • Highly optimized for performance in these specific operations
  • Easy to integrate into existing C++ projects

Cons of double-conversion

  • Limited scope compared to ICU4C's comprehensive internationalization features
  • Lacks support for other data types and internationalization tasks
  • May require additional libraries for full-fledged internationalization support

Code Comparison

ICU4C (formatting a number):

UErrorCode status = U_ZERO_ERROR;
UnicodeString result;
NumberFormat* formatter = NumberFormat::createInstance(Locale::getUS(), status);
formatter->format(12345.678, result);
delete formatter;

double-conversion (converting double to string):

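using namespace double_conversion;
char buffer[128];
StringBuilder builder(buffer, sizeof(buffer));
// ToFixed is a member function that writes into a StringBuilder and returns false on failure
bool ok = DoubleToStringConverter::EcmaScriptConverter().ToFixed(12345.678, 3, &builder);
const char* text = builder.Finalize();  // "12345.678"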

Both libraries offer efficient number formatting, but ICU4C provides a more comprehensive set of internationalization tools, while double-conversion focuses on fast and precise double-string conversions.
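The practical difference between the two approaches shows up in the output: a locale-independent conversion yields plain digits, while a locale-aware formatter for en_US typically adds grouping separators. The sketch below imitates both outputs with the C++ standard library only. It is not how either library is implemented, and the helper names `to_fixed` and `group_thousands` are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Locale-independent fixed-point conversion, similar in spirit to ToFixed.
// Assumes the default "C" numeric locale; returns an empty string on overflow.
std::string to_fixed(double value, int digits) {
    char buffer[64];
    int n = std::snprintf(buffer, sizeof(buffer), "%.*f", digits, value);
    if (n <= 0 || n >= static_cast<int>(sizeof(buffer))) return std::string();
    return std::string(buffer, static_cast<std::size_t>(n));
}

// Insert a separator between every group of three integer digits,
// imitating the grouping a US-English number format applies.
std::string group_thousands(const std::string& plain, char separator = ',') {
    std::size_t start = (!plain.empty() && plain[0] == '-') ? 1 : 0;
    std::size_t end = plain.find('.');
    if (end == std::string::npos) end = plain.size();
    std::string out = plain;
    // Walk left from the decimal point; earlier positions are unaffected
    // by insertions made to their right.
    for (std::size_t pos = end; pos > start + 3; pos -= 3) {
        out.insert(pos - 3, 1, separator);
    }
    return out;
}
```

With these helpers, `to_fixed(12345.678, 3)` gives the plain form and `group_thousands` turns it into "12,345.678". A real locale-aware formatter also has to pick the separator characters, group sizes, and digit shapes per locale, which is the part ICU4C provides.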

Automatically exported from code.google.com/p/cityhash

Pros of CityHash

  • Lightweight and focused on fast hash functions for strings
  • Optimized for modern processors with good performance on short strings
  • Simple API with minimal dependencies

Cons of CityHash

  • Limited scope compared to ICU4C's comprehensive internationalization features
  • Less active development and community support
  • Not designed for Unicode-aware string operations or locale-specific functionality

Code Comparison

CityHash (hash function example):

uint64_t CityHash64(const char *buf, size_t len) {
  if (len <= 32) {
    if (len <= 16) {
      return HashLen0to16(buf, len);
    } else {
      return HashLen17to32(buf, len);
    }
  }
  // ... (additional code for longer strings)
}

ICU4C (string comparison example):

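UErrorCode status = U_ZERO_ERROR;
UCollator *coll = ucol_open("en_US", &status);
// The inputs are NUL-terminated UTF-16 strings, so the lengths can be passed as -1
const UChar *str1 = u"apple", *str2 = u"banana";
UCollationResult result = ucol_strcoll(coll, str1, -1, str2, -1);  // UCOL_LESS
ucol_close(coll);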

Summary

CityHash is a specialized library for fast string hashing, while ICU4C is a comprehensive internationalization framework. CityHash offers simplicity and performance for specific use cases, but lacks the broad Unicode and locale support provided by ICU4C. The choice between them depends on whether you need focused hashing functionality or extensive internationalization capabilities.

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Pros of re2

  • Faster performance for regular expression matching
  • Guaranteed linear time complexity, avoiding catastrophic backtracking
  • Smaller memory footprint compared to ICU4C

Cons of re2

  • Limited Unicode support compared to ICU4C's extensive capabilities
  • Fewer internationalization features and locale-specific functionalities
  • Not as widely adopted or supported in various programming environments

Code Comparison

re2:

#include <re2/re2.h>

// FindAndConsume fills one argument per capturing group, so the pattern needs a group
RE2 pattern("(\\w+)");
re2::StringPiece input("Hello, World!");
re2::StringPiece match;
while (RE2::FindAndConsume(&input, pattern, &match)) {
    // Process match
}

ICU4C:

#include <unicode/regex.h>

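UErrorCode status = U_ZERO_ERROR;
// The matcher keeps a reference to the input, so the string must outlive the matcher
icu::UnicodeString input(u"Hello, World!");
icu::RegexMatcher matcher(icu::UnicodeString(u"\\w+"), input, 0, status);
while (U_SUCCESS(status) && matcher.find(status)) {
    icu::UnicodeString word = matcher.group(status);  // the matched text
}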

Both libraries provide regular expression functionality, but re2 focuses on performance and safety, while ICU4C offers more comprehensive Unicode and internationalization support. re2 is better suited for high-performance regex operations, while ICU4C excels in applications requiring extensive Unicode handling and localization features.

A fast compressor/decompressor

Pros of Snappy

  • Lightweight and focused on fast compression/decompression
  • Simple API, easy to integrate into existing projects
  • Optimized for speed, particularly suitable for real-time systems

Cons of Snappy

  • Limited to compression functionality, lacks broader text processing capabilities
  • Not designed for maximum compression ratio, prioritizes speed over file size reduction
  • Less widely adopted compared to ICU4C in international software development

Code Comparison

Snappy (C++):

snappy::Compress(input_data.data(), input_data.size(), &compressed);
snappy::Uncompress(compressed.data(), compressed.size(), &uncompressed);

ICU4C (C):

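UErrorCode status = U_ZERO_ERROR;
const UChar source[] = u"hello";
UChar result[32];
// u_strToUpper(dest, destCapacity, src, srcLength, locale, status) returns the output length
int32_t result_length = u_strToUpper(result, 32, source, -1, "en_US", &status);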

Summary

Snappy is a fast compression/decompression library, while ICU4C is a comprehensive internationalization library. Snappy excels in scenarios requiring quick data compression, whereas ICU4C provides extensive support for Unicode and globalization features. The choice between them depends on the specific needs of the project, with Snappy being more suitable for performance-critical compression tasks and ICU4C for applications requiring robust internationalization support.

README

International Components for Unicode

This is the repository for the International Components for Unicode. The ICU project is under the stewardship of The Unicode Consortium.

Copyright & Licenses

Copyright © 2016-2024 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the United States and other countries.

A CLA is required to contribute to this project - please refer to the CONTRIBUTING.md file (or start a Pull Request) for more information.

The contents of this repository are governed by the Unicode Terms of Use and are released under LICENSE.