
rvagg/node-worker-farm

Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options.


Top Related Projects

  • workerpool – Offload tasks to a pool of workers on node.js and in the browser
  • threads.js – 🧵 Make web workers & worker threads as simple as a function call.
  • Piscina – A fast, efficient Node.js Worker Thread Pool implementation
  • Napa.js – a multi-threaded JavaScript runtime

Quick Overview

Node Worker Farm is a module for Node.js that allows you to easily create and manage a farm of worker processes. It provides a simple interface for offloading CPU-intensive tasks to child processes, enabling better utilization of multi-core systems and improved performance for Node.js applications.

Pros

  • Easy to use API for creating and managing worker processes
  • Automatic load balancing across worker processes
  • Configurable pool size and worker lifetime
  • Supports both synchronous and asynchronous task execution

Cons

  • Limited to Node.js environments
  • Overhead of inter-process communication can impact performance for very small tasks
  • Requires careful consideration of shared state and memory usage
  • May not be suitable for all types of workloads

Code Examples

  1. Basic usage:
const workerFarm = require('worker-farm');
const workers = workerFarm(require.resolve('./worker'));

let completed = 0;
for (let i = 0; i < 10; i++) {
  workers('Hello', i, (err, result) => {
    console.log(result);
    // Only end the farm once every call has returned; end() discards queued calls
    if (++completed === 10) workerFarm.end(workers);
  });
}
  2. Using promises:
const workerFarm = require('worker-farm');
const { promisify } = require('util');

const farm = workerFarm(require.resolve('./worker'));
const workers = promisify(farm);

async function runTasks() {
  const results = await Promise.all(
    Array.from({ length: 10 }, (_, i) => workers('Hello', i))
  );
  console.log(results);
  // Pass the original farm function to end(), not the promisified wrapper
  workerFarm.end(farm);
}

runTasks();
  3. Configuring worker farm options:
const workerFarm = require('worker-farm');

const options = {
  maxConcurrentWorkers: 4,
  maxConcurrentCallsPerWorker: 1,
  maxRetries: 3,
  autoStart: true
};

const workers = workerFarm(options, require.resolve('./worker'));

// Use workers as before

Getting Started

  1. Install the package:

    npm install worker-farm
    
  2. Create a worker file (e.g., worker.js):

    module.exports = (input, callback) => {
      // Perform some CPU-intensive task
      const result = input.toUpperCase();
      callback(null, result);
    };
    
  3. Use the worker farm in your main file:

    const workerFarm = require('worker-farm');
    const workers = workerFarm(require.resolve('./worker'));
    
    workers('hello world', (err, result) => {
      console.log(result); // Outputs: HELLO WORLD
      workerFarm.end(workers);
    });
    

Competitor Comparisons

workerpool: Offload tasks to a pool of workers on node.js and in the browser

Pros of workerpool

  • Supports both Node.js and browser environments
  • Offers more advanced features like promises, cancellation, and timeouts
  • Provides a more flexible API for creating and managing worker pools

Cons of workerpool

  • May have a steeper learning curve due to its more extensive feature set
  • Potentially higher overhead for simple use cases compared to node-worker-farm

Code Comparison

workerpool:

const workerpool = require('workerpool');
const pool = workerpool.pool();

// myFunction must be self-contained: it is serialized and executed in a worker
pool.exec(myFunction, [arg1, arg2])
  .then(result => console.log(result))
  .catch(error => console.error(error));

node-worker-farm:

const workerFarm = require('worker-farm');
const workers = workerFarm(require.resolve('./worker'));

workers(arg1, arg2, (err, result) => {
  if (err) console.error(err);
  else console.log(result);
});

Key Differences

  1. API style: workerpool uses promises, while node-worker-farm uses callbacks
  2. Flexibility: workerpool allows inline function execution, node-worker-farm requires separate worker files
  3. Environment support: workerpool works in both Node.js and browsers, node-worker-farm is Node.js-specific
  4. Feature set: workerpool offers more advanced features like cancellation and timeouts

Both libraries provide worker pool functionality for Node.js, but workerpool offers a more comprehensive solution with broader platform support and advanced features, while node-worker-farm focuses on simplicity and ease of use for Node.js environments.

threads.js: 🧵 Make web workers & worker threads as simple as a function call.

Pros of threads.js

  • Supports both Node.js and browsers, offering greater flexibility
  • Provides a more modern, Promise-based API for easier asynchronous programming
  • Offers built-in support for transferable objects, improving performance for large data transfers

Cons of threads.js

  • May have a steeper learning curve due to its more complex API
  • Potentially higher overhead for simple tasks compared to worker-farm's lightweight approach

Code Comparison

threads.js:

import { spawn, Thread, Worker } from 'threads'

const worker = await spawn(new Worker('./worker'))
const result = await worker.doWork(data)
await Thread.terminate(worker)

node-worker-farm:

const workerFarm = require('worker-farm')
// exportedMethods must be listed to get named functions on the returned object
const workers = workerFarm(require.resolve('./worker'), ['doWork'])

workers.doWork(data, (err, result) => {
  workerFarm.end(workers)
})

Key Differences

  • threads.js uses a more modern, Promise-based approach, while node-worker-farm relies on callbacks
  • threads.js offers a more feature-rich API with support for various thread pool strategies and worker termination
  • node-worker-farm provides a simpler, more straightforward API for basic multi-threading tasks

Use Cases

  • Choose threads.js for projects requiring cross-platform support or more advanced threading features
  • Opt for node-worker-farm for simpler Node.js-specific projects with basic multi-threading needs

Piscina: A fast, efficient Node.js Worker Thread Pool implementation

Pros of Piscina

  • Better performance and scalability, especially for CPU-intensive tasks
  • Built-in support for TypeScript and ESM modules
  • More active development and maintenance

Cons of Piscina

  • Slightly more complex API compared to Worker Farm
  • Requires Node.js 12.11.0 or later, while Worker Farm supports older versions

Code Comparison

Piscina:

const path = require('path');
const { Piscina } = require('piscina');

const piscina = new Piscina({
  filename: path.resolve(__dirname, 'worker.js')
});

// inside an async function:
const result = await piscina.run(data);

Worker Farm:

const workerFarm = require('worker-farm');
const workers = workerFarm(require.resolve('./worker'));

workers(data, (err, result) => {
  // Handle result
});

Both Piscina and Worker Farm are Node.js libraries for managing worker threads, but they differ in their approach and features. Piscina offers better performance and modern JavaScript support, while Worker Farm has a simpler API and broader Node.js version compatibility. The choice between them depends on your specific project requirements, performance needs, and the Node.js version you're using.


Napa.js: a multi-threaded JavaScript runtime

Pros of Napajs

  • Multi-threaded JavaScript runtime with shared memory, allowing for more efficient parallel processing
  • Supports TypeScript out of the box, providing better type safety and developer experience
  • Offers a more comprehensive API for managing workers and shared resources

Cons of Napajs

  • More complex setup and usage compared to node-worker-farm's simpler API
  • Less actively maintained, with the last update being several years ago
  • Requires more understanding of multi-threading concepts and potential pitfalls

Code Comparison

node-worker-farm:

const workerFarm = require('worker-farm');
const workers = workerFarm(require.resolve('./worker'));

workers('Hello, World!', (err, output) => {
  console.log(output);
  workerFarm.end(workers);
});

Napajs:

const napa = require('napajs');
const zone = napa.zone.create('zone', { workers: 4 });

zone.execute(() => {
  return 'Hello, World!';
}).then((result) => {
  console.log(result.value);
});

Both libraries aim to improve Node.js performance through parallel processing, but they take different approaches. node-worker-farm focuses on simplicity and ease of use, while Napajs offers more advanced features for multi-threading and shared memory. The choice between them depends on the specific requirements of your project and the level of complexity you're willing to manage.


README

Worker Farm

Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options. Available in npm as worker-farm.

Example

Given a file, child.js:

module.exports = function (inp, callback) {
  callback(null, inp + ' BAR (' + process.pid + ')')
}

And a main file:

var workerFarm = require('worker-farm')
  , workers    = workerFarm(require.resolve('./child'))
  , ret        = 0

for (var i = 0; i < 10; i++) {
  workers('#' + i + ' FOO', function (err, outp) {
    console.log(outp)
    if (++ret == 10)
      workerFarm.end(workers)
  })
}

We'll get an output something like the following:

#1 FOO BAR (8546)
#0 FOO BAR (8545)
#8 FOO BAR (8545)
#9 FOO BAR (8546)
#2 FOO BAR (8548)
#4 FOO BAR (8551)
#3 FOO BAR (8549)
#6 FOO BAR (8555)
#5 FOO BAR (8553)
#7 FOO BAR (8557)

This example is contained in the examples/basic directory.

Example #1: Estimating π using child workers

You will also find a more complex example in examples/pi that estimates the value of π by using a Monte Carlo area-under-the-curve method and compares the speed of doing it all in-process vs using child workers to complete separate portions.

Running node examples/pi will give you something like:

Doing it the slow (single-process) way...
π ≈ 3.1416269360000006  (0.0000342824102075312 away from actual!)
took 8341 milliseconds
Doing it the fast (multi-process) way...
π ≈ 3.1416233600000036  (0.00003070641021052367 away from actual!)
took 1985 milliseconds
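
The estimator behind examples/pi is easy to sketch in isolation (this is an illustrative stand-alone version, not the actual example code): sample random points in the unit square and count how many land inside the quarter circle x*x + y*y <= 1, whose area is π/4.

```javascript
// Stand-alone Monte Carlo sketch (not the actual examples/pi code):
// the fraction of uniform random points in the unit square that fall
// inside the quarter circle approaches π/4 as the sample count grows.
function estimatePi (samples) {
  let inside = 0;
  for (let i = 0; i < samples; i++) {
    const x = Math.random();
    const y = Math.random();
    if (x * x + y * y <= 1) inside++;
  }
  return 4 * inside / samples;
}
```

The multi-process version in examples/pi splits the samples across child workers and sums the per-worker counts.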

Durability

An important feature of Worker Farm is call durability. If a child process dies for any reason during the execution of call(s), those calls will be re-queued and taken care of by other child processes. In this way, when you ask for something to be done, unless there is something seriously wrong with what you're doing, you should get a result on your callback function.

My use-case

There are other libraries for managing worker processes available, but my use-case was fairly specific: I need to make heavy use of the node-java library to interact with JVM code. Unfortunately, because the JVM garbage collector is so difficult to interact with, it's prone to killing your Node process when the GC kicks in under heavy load. For safety I needed a durable way to make calls so that (1) it wouldn't kill my main process and (2) any calls that weren't successful would be resubmitted for processing.

Worker Farm allows me to spin up multiple JVMs to be controlled by Node, through a single, uncomplicated API that behaves the same way as an in-process API. Even if an error kills a child process mid-call, the call is simply re-queued and handled by a new child process.

But, don't think that Worker Farm is specific to that use-case, it's designed to be very generic and simple to adapt to anything requiring the use of child Node processes.

API

Worker Farm exports a main function and an end() method. The main function sets up a "farm" of coordinated child-process workers and it can be used to instantiate multiple farms, all operating independently.

workerFarm([options, ]pathToModule[, exportedMethods])

In its most basic form, you call workerFarm() with the path to a module file to be invoked by the child process. You should use an absolute path to the module file; the best way to obtain one is with require.resolve('./path/to/module'). This function can be used in exactly the same way as require('./path/to/module'), but it returns an absolute path.

exportedMethods

If your module exports a single function on module.exports then you should omit the final parameter. However, if you are exporting multiple functions on module.exports then you should list them in an Array of Strings:

var workers = workerFarm(require.resolve('./mod'), [ 'doSomething', 'doSomethingElse' ])
workers.doSomething(function () {})
workers.doSomethingElse(function () {})

Listing the available methods instructs Worker Farm what API to provide on the returned object. If you don't list an exportedMethods Array then you'll get a single callable function to use; but if you list the available methods then you'll get an object with callable functions by those names.

It is assumed that each function you call on your child module will take a callback function as the last argument.

options

If you don't provide an options object then the following defaults will be used:

{
    workerOptions               : {}
  , maxCallsPerWorker           : Infinity
  , maxConcurrentWorkers        : require('os').cpus().length
  , maxConcurrentCallsPerWorker : 10
  , maxConcurrentCalls          : Infinity
  , maxCallTime                 : Infinity
  , maxRetries                  : Infinity
  , autoStart                   : false
  , onChild                     : function() {}
}
  • workerOptions allows you to customize all the parameters passed to child nodes. This object supports all possible options of child_process.fork. The default options passed are the parent execArgv, cwd and env. Any (or all) of them can be overridden, and others can be added as well.

  • maxCallsPerWorker allows you to control the lifespan of your child processes. A positive number will indicate that you only want each child to accept that many calls before it is terminated. This may be useful if you need to control memory leaks or similar in child processes.

  • maxConcurrentWorkers will set the number of child processes to maintain concurrently. By default it is set to the number of CPUs available on the current system, but it can be any reasonable number, including 1.

  • maxConcurrentCallsPerWorker allows you to control the concurrency of individual child processes. Calls are placed into a queue and farmed out to child processes according to the number of calls they are allowed to handle concurrently. It is arbitrarily set to 10 by default so that calls are shared relatively evenly across workers. However, if your calls predictably take a similar amount of time then you could set it to Infinity and Worker Farm won't queue any calls but will spread them evenly across child processes and let them go at it. If your calls aren't I/O bound then it won't matter what value you use here, as the individual workers won't be able to execute more than a single call at a time.

  • maxConcurrentCalls allows you to control the maximum number of calls in the queue—either actively being processed or waiting for a worker to process them. Infinity indicates no limit, but if you have conditions that may endlessly queue jobs and you need to set a limit then provide a >0 value; any calls that push past the limit will return on their callback with a MaxConcurrentCallsError error (check err.type == 'MaxConcurrentCallsError').

  • maxCallTime (use with caution, understand what this does before you use it!) when !== Infinity, will cap the time, in milliseconds, that any single call can take to execute in a worker. If this time limit is exceeded by even a single call then the worker running that call will be killed and any calls running on that worker will have their callbacks returned with a TimeoutError (check err.type == 'TimeoutError'). If you are running with a maxConcurrentCallsPerWorker value greater than 1 then all calls currently executing will fail and will be automatically resubmitted unless you've changed the maxRetries option. Use this if you have jobs that may potentially end in infinite loops that you can't programmatically end from your child code. Preferably run this with maxConcurrentCallsPerWorker set to 1 so you don't interrupt other calls when you have a timeout. This timeout operates on a per-call basis but will interrupt a whole worker.

  • maxRetries allows you to control the maximum number of call requeues after worker termination (unexpected or timeout). By default this option is set to Infinity, which means that each call of each terminated worker will always be auto-requeued. When the number of retries exceeds the maxRetries value, the job's callback will be executed with a ProcessTerminatedError. Note that if you are running with a finite maxCallTime and a maxConcurrentCallsPerWorker greater than 1, then any TimeoutError will increase the retries counter for each concurrent call of the terminated worker.

  • autoStart when set to true will start the workers as early as possible. Use this when your workers have to do expensive initialization. That way they'll be ready when the first request comes through.

  • onChild is called whenever a new child process starts, with the subprocess object as its argument. Use this when you need to set up custom communication with child processes.
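
The three error types described above (MaxConcurrentCallsError, TimeoutError, ProcessTerminatedError) all arrive as the first argument of your callback. A sketch of dispatching on err.type; the message strings here are illustrative, not library output:

```javascript
// Illustrative helper showing how the documented err.type values could be
// handled in a worker-farm callback; the returned strings are examples only.
function describeFarmError (err) {
  switch (err && err.type) {
    case 'MaxConcurrentCallsError':
      return 'queue limit hit (maxConcurrentCalls); back off and retry later';
    case 'TimeoutError':
      return 'call exceeded maxCallTime; the worker was killed';
    case 'ProcessTerminatedError':
      return 'worker died and maxRetries was exhausted';
    default:
      return 'error thrown by the child module itself';
  }
}
```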

workerFarm.end(farm)

Child processes stay alive waiting for jobs indefinitely and your farm manager will stay alive managing its workers, so if you need it to stop then you have to do so explicitly. If you send your farm API to workerFarm.end() then it'll cleanly end your worker processes. Note though that it's a soft ending so it'll wait for child processes to finish what they are working on before asking them to die.

Any calls that are queued and not yet being handled by a child process will be discarded. end() only waits for those currently in progress.

Once you end a farm, it won't handle any more calls, so don't even try!

Related

  • farm-cli – Launch a farm of workers from CLI.

License

Worker Farm is Copyright (c) Rod Vagg and licensed under the MIT license. All rights not explicitly granted in the MIT license are reserved. See the included LICENSE.md file for more details.
