Closed Bug 1833647 Opened 1 year ago Closed 2 months ago

Implement Float16Array

Categories

(Core :: JavaScript Engine, enhancement, P3)


Tracking


RESOLVED FIXED
127 Branch
Tracking Status
firefox127 --- fixed

People

(Reporter: dminor, Assigned: dminor)

References

(Blocks 4 open bugs)

Details

(Keywords: dev-doc-complete)

Attachments

(8 files, 1 obsolete file)

This is at stage 3 and ready for implementation. Some thoughts:

  • We define typed arrays here and have precedent for defining custom types on top of an underlying primitive type in uint8_clamped.
  • It looks like we could do something similar for f16.
  • Conversion from f32 to f16 is a matter of bit-twiddling; it looks like there is hardware support for conversions / storage on x86 and ARM, so this is likely optimizable.
  • From what I can tell from the uint8_clamped implementation, this is sufficient; we'll convert back to a Number for operations like filter or fill.
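For illustration, the f32 → f16 bit-twiddling could look roughly like this. This is a hypothetical scalar sketch with round-to-nearest-even, overflow, subnormal, and NaN handling, not SpiderMonkey's eventual implementation (which, as discussed later in this bug, was derived from the half library):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar float32 -> binary16 conversion, round-to-nearest-even.
// binary16 layout: 1 sign bit | 5 exponent bits (bias 15) | 10 mantissa bits.
static uint16_t FloatToFloat16Bits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));

  uint16_t sign = uint16_t((bits >> 16) & 0x8000);
  bits &= 0x7fffffff;  // drop the sign bit

  if (bits > 0x7f800000) {  // NaN: return a canonical quiet NaN
    return uint16_t(sign | 0x7e00);
  }
  if (bits >= 0x47800000) {  // infinity, or a finite value >= 65536: half infinity
    return uint16_t(sign | 0x7c00);
  }
  if (bits >= 0x38800000) {  // normal half range [2^-14, 65504]
    uint32_t mant = bits & 0x7fffff;
    uint32_t half = ((bits >> 23) - 112) << 10 | (mant >> 13);  // re-bias 127 -> 15
    uint32_t rem = mant & 0x1fff;  // the low 13 bits are rounded away
    if (rem > 0x1000 || (rem == 0x1000 && (half & 1))) {
      half++;  // round up; a carry into the exponent field is still correct
    }
    return uint16_t(sign | half);
  }
  if (bits >= 0x33000000) {  // rounds to a subnormal half
    uint32_t mant = (bits & 0x7fffff) | 0x800000;  // restore the implicit bit
    int shift = 126 - int(bits >> 23);             // 14..24
    uint32_t half = mant >> shift;
    uint32_t rem = mant & ((uint32_t(1) << shift) - 1);
    uint32_t halfway = uint32_t(1) << (shift - 1);
    if (rem > halfway || (rem == halfway && (half & 1))) {
      half++;
    }
    return uint16_t(sign | half);
  }
  return sign;  // everything smaller underflows to (signed) zero
}
```

The subnormal and NaN branches are exactly the cases naive Stack Overflow snippets tend to get wrong, as noted further down in this bug.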

The changes required to support this are self-contained. I think this would be a good project for someone new who is interested in contributing to SpiderMonkey; it's not trivial, but not a huge project either.

I haven't hacked SpiderMonkey before but would like to take a crack at this :)

Excellent! Let me know if you need any help getting started. Here's some info on getting a build going: https://firefox-source-docs.mozilla.org/js/build.html

Thank you. From the code you linked I assume the first steps are:

  • Define a js::float16 type similar to js::uint8_clamped.
  • Represent float16 as a 16-bit C++ integer (uint16_t). Use bit-twiddling to extract the appropriate bits from a float32, most likely optimized with hardware-supported instructions when available.

AFAIU the benefits come from the smaller representation? I.e. to do computation with float16 one must convert to float32 (bit twiddling or hardware instructions), do the computation, convert back to float16. Hardware support for computation in float16 is only available in recent Intel processors from 2020 onwards and is called BFloat16. Is this correct?

The thing I'm most uncertain about is what the interface of the js::float16 type should look like. js::uint8_clamped has conversions from various primitive types. I assume we need more than that? I'm trying to answer these questions myself by looking through uses of js::uint8_clamped on Searchfox.

Yes, that sounds right. Don't worry about the hardware optimizations as part of this bug, it's not worth the extra complexity right now. If that changes, we'll file a follow up bug to handle the optimizations.

Yes, you're right, the advantage is the smaller representation, e.g. for passing buffers to GPUs that do have native support for float16 arithmetic when doing graphics or neural networks.

That's a good question about whether we need the various primitive type conversions; I'm not sure offhand. I'd suggest starting with the float32 conversion for now and worrying about the other types later. I'll look into it too.

Severity: -- → N/A
Priority: -- → P3

After looking at the code some more, I'm not sure what ExternalT in MACRO(ExternalT, NativeT, Name) means. For all the other definitions it's pretty clear what it's supposed to be without knowing the meaning of it.

I'm at the point where the compiler is complaining that it doesn't know how to sort Float16 typed arrays. Makes sense. I implemented TypedArrayStdSort by calling std::sort and converting the uint16_t half-precision floats into single-precision floats. Seems reasonable. However, due to sizeof(js::float16) == 2, it wants to use radix sort (1), which in turn calls counting sort. That seems less reasonable for floating point data.

(1): https://searchfox.org/mozilla-central/source/js/src/vm/TypedArrayObject.cpp#2763

(In reply to B. Dahse from comment #3)

I.e. to do computation with float16 one must convert to float32 (bit twiddling or hardware instructions), do the computation, convert back to float16. Hardware support for computation in float16 is only available in recent Intel processors from 2020 onwards and is called BFloat16. Is this correct?

BFloat16 is a different 16-bit float type. The Float16Array proposal uses instead the IEEE-754/IEC 60559 binary16, see also https://en.wikipedia.org/wiki/Half-precision_floating-point_format.
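The two formats split the 16 bits differently: binary16 is 1 sign | 5 exponent | 10 mantissa bits (bias 15), while bfloat16 is 1 sign | 8 exponent | 7 mantissa bits (bias 127). Because bfloat16 keeps float32's 8-bit exponent, a (truncating, non-rounding) conversion to it is trivial, which is why hardware vendors like it; this is just an illustration, not anything SpiderMonkey needs:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// binary16 (IEEE 754 half): 1 sign | 5 exponent | 10 mantissa (bias 15)
// bfloat16:                 1 sign | 8 exponent |  7 mantissa (bias 127)
//
// bfloat16 shares float32's exponent field, so converting to it (ignoring
// rounding) is just keeping the top 16 bits of the float32 representation.
// binary16, which Float16Array uses, requires re-biasing the exponent.
static uint16_t FloatToBFloat16Truncate(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return uint16_t(bits >> 16);
}
```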

(In reply to B. Dahse from comment #5)

After looking at the code some more, I'm not sure what ExternalT in MACRO(ExternalT, NativeT, Name) means. For all the other definitions it's pretty clear what it's supposed to be without knowing the meaning of it.

TypedArray contents are also exposed via the public JS SpiderMonkey API. The public API uses the ExternalT type. I think it should be possible to use ExternalT = uint16_t for now.

(In reply to B. Dahse from comment #6)

I'm at the point where the compiler is complaining that it doesn't know how to sort Float16 typed arrays. Makes sense. I implemented TypedArrayStdSort by calling std::sort and converting the uint16_t half-precision floats into single-precision floats. Seems reasonable.

It should be possible to specialise UnsignedSortValue similar to the existing specialisations for float/double, and then enable this function for float16 in addition to float/double. That way it's not necessary to convert to float.
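For the curious, the order-preserving bit trick behind those float/double specialisations carries over to float16 like this (hypothetical helper name; the real SpiderMonkey code is structured differently):

```cpp
#include <cassert>
#include <cstdint>

// Map binary16 bits to an unsigned key whose integer ordering matches the
// numeric ordering of the floats: flip every bit of a negative value, but
// only the sign bit of a non-negative one. This is the same trick the
// existing float/double specialisations use; NaNs still need separate
// handling, as in the real sort code.
static uint16_t UnsignedSortKeyFloat16(uint16_t bits) {
  uint16_t mask = (bits & 0x8000) ? uint16_t(0xffff) : uint16_t(0x8000);
  return uint16_t(bits ^ mask);
}
```

With this key, plain unsigned comparisons (or a radix pass over the keys) sort the underlying floats into ascending order.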

However, due to sizeof(js::float16) == 2, it wants to use radix sort (1), which in turn calls counting sort. That seems less reasonable for floating point data.

Just change the if constexpr condition, so float16 types don't call counting sort. We can worry about any possible performance improvements later.

(In reply to Dan Minor [:dminor] from comment #0)

  • Conversion from f32 to f16 is a matter of bit-twiddling, it looks there is hardware support for conversions / storage on x86 and ARM, so this is likely optimizable.

The conversion functions from Stack Overflow don't correctly handle all inputs, for example denormals. https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/builtins/truncsfhf2.c (Apache 2 licensed) seems better when comparing its results against the results from vcvtps2ph on x86-64.

It's easy to create a test program to ensure the conversion works correctly:

#include <stdint.h>
#include <cstring>
#include <cstdio>

union Float16 {
 private:
  uint16_t value_ = 0;

 public:
  Float16() = default;
  Float16(uint16_t value) : value_(value) {}

  bool operator==(const Float16& other) const { return value_ == other.value_; }
  bool operator!=(const Float16& other) const { return value_ != other.value_; }

  auto value() const { return value_; }
};

static auto ConvertFloat16(float flt) {
  // https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html
  _Float16 f16 = flt;

  uint16_t fltInt16;
  std::memcpy(&fltInt16, &f16, sizeof(_Float16));

  return Float16(fltInt16);
}

static auto MyConvertFloat16(float flt) {
  // TODO: Implement conversion here.
  return ConvertFloat16(flt);
}

// Compile with:
// (1) -O3 -std=c++17 -mfpmath=sse -msse2
// (2) -O3 -std=c++17 -mfpmath=sse -msse2 -march=x86-64-v3
//
// (1) tests the intrinsic compiler functions.
// (2) tests the hardware optimised instructions.
int main() {
  for (uint64_t j = 0; j <= uint64_t(0xffff'ffff); ++j) {
    uint32_t i = uint32_t(j);
    float flt;
    std::memcpy(&flt, &i, sizeof(float));

    auto x = ConvertFloat16(flt);
    auto y = MyConvertFloat16(flt);

    if (x != y) {
      std::printf("%08x: x(%04x) != y(%04x)\n", i, x.value(), y.value());
      break;
    }
  }
}

Thank you for the detailed comment. It sounds like I'm on the right track. The specialization of UnsignedSortValue and enabling the floating point implementation for float16 was my first idea, but I ran into difficulties, so I decided to just code a float16-specific version to get something compiling.

I took your advice about disabling radix/counting sort for float16 so I can compile SpiderMonkey once again. Yay!

Now unfortunately some 500 of the JIT tests are broken. Most fail with an assertion "MOZ_CRASH(unexpected JSProtoKey)" which I have not had time to look into just yet. There are also quite a few "TypeError: can't convert BigInt to number" failures. I suspect it has to do with where I defined the new Float16 scalar. I put it after Big(U)int64, so maybe there is some type confusion going on. Any advice is welcome :)

Oh, the JSProtoKey thing was easy enough; it just needed another case in the switch.

Also, I'm a bit unsure how to handle float16 in codegen. Since we represent it as a uint16_t at runtime, I guess I should handle it the same way in JIT? Because we don't want to reinterpret the bits as a float for example.

Seems my intuition about the order of declarations was right. After moving things around, I'm now down to a handful of WebAssembly test failures:

FAILURES:
    wasm/memory64/memory-copy.js
    --wasm-compiler=optimizing wasm/memory64/memory-copy.js
    --wasm-compiler=baseline wasm/memory64/memory-copy.js
    wasm/memory64/memory-fill.js
    --test-wasm-await-tier2 wasm/memory64/memory-copy.js
    --wasm-compiler=optimizing wasm/memory64/memory-fill.js
TIMEOUTS:
    --wasm-compiler=optimizing wasm/atomicity.js
    --wasm-compiler=baseline wasm/atomicity.js
    --disable-wasm-huge-memory wasm/atomicity.js

(In reply to B. Dahse from comment #10)

Also, I'm a bit unsure how to handle float16 in codegen. Since we represent it as a uint16_t at runtime, I guess I should handle it the same way in JIT?

Code needs to be added to convert the bits representing the Float16 number to a double, similar to the existing MacroAssembler::convertFloat32ToDouble function.

But this can be added in a follow-up bug. Instead I'd suggest disabling JIT support for Float16 TypedArrays by adding the following check to GetPropIRGenerator::tryAttachTypedArrayElement and SetPropIRGenerator::tryAttachSetTypedArrayElement:

// JIT support for Float16 isn't yet implemented.
if (tarr->type() == Scalar::Float16) {
  return AttachDecision::NoAction;
}
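The conversion anba describes (half bits widened to a double) amounts to the following reference semantics. This is a hypothetical scalar sketch of what a convertFloat16ToDouble helper would compute; the JIT would emit machine instructions for it instead:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Expand 16 binary16 bits into a double. Illustrative scalar code only.
static double Float16BitsToDouble(uint16_t h) {
  uint32_t sign = (h >> 15) & 1;
  uint32_t exp = (h >> 10) & 0x1f;
  uint32_t mant = h & 0x3ff;

  double result;
  if (exp == 0x1f) {  // all-ones exponent: infinity or NaN
    result = mant ? std::numeric_limits<double>::quiet_NaN()
                  : std::numeric_limits<double>::infinity();
  } else if (exp == 0) {  // zero or subnormal: mant * 2^-24
    result = std::ldexp(double(mant), -24);
  } else {  // normal: (1024 + mant) * 2^(exp - 15 - 10)
    result = std::ldexp(double(mant | 0x400), int(exp) - 25);
  }
  return sign ? -result : result;
}
```

Every binary16 value is exactly representable as a double, so this direction needs no rounding, unlike the narrowing conversion.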

I see. I will review my diff and send it in because I don't think there is much more that needs to be done in this bug in that case.

(In reply to B. Dahse from comment #13)

I see. I will review my diff and send it in because I don't think there is much more that needs to be done in this bug in that case.

Great! Thank you for working on this :)

Blocks: 1835034

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1835034 as the follow-up from Comment 12; please mention that bug number when you disable JIT support as Anba suggested.

Assignee: nobody → ronoueb
Status: NEW → ASSIGNED

Hi, thanks for your work on this so far :) I was wondering if you were planning to continue? It's ok if not, the work you've done is useful, and we'll look for someone else to finish it off.

Flags: needinfo?(ronoueb)

I haven't forgotten about this, and I think I actually more or less solved it, but I currently have a broken build (ugh, C++ templates...). Now that I know there's still interest, I will try to get it rebased and rig it to use the Mozilla floating point traits templates instead of my own to get it building again. If I'm blocked on anything, it's because I'm not familiar with the appropriate Phabricator workflow for what I'm trying to do.

Flags: needinfo?(ronoueb)

Great :) I'm glad you're still interested in this. Let us know if you have any questions.

Hi! I wanted to check in again to see if you've had any luck with this. We'd like to get this landed in 2024, so please let me know if there's any way I can help out. If you don't have the time for it right now, I'd be happy to pick up whatever you have in progress and get it finished off. Thanks again for your work :)

Flags: needinfo?(ronoueb)

I'm going to take this one over.

Assignee: ronoueb → dminor
Mentor: dminor
Flags: needinfo?(ronoueb)

Yes, that's fine; I'm starting a new job, so I don't have time to push this one over the finish line. This is the code I wrote for doing branchless conversion from double/float to float16, tested exhaustively on all 32-bit inputs. It's based on a snippet from Stack Overflow with NaN normalization added.

  template<typename F>
  float16& operator=(F x) {
    using enc = mozilla::FloatingPoint<float16>;
    using flt = mozilla::FloatingPoint<F>;
    using raw_type = typename flt::Bits;

    static constexpr auto sig_diff = flt::kSignificandWidth - enc::kSignificandWidth;
    static constexpr auto bit_diff = CHAR_BIT * (sizeof(raw_type) - sizeof(enc::Bits));
    static constexpr auto bias_mul = raw_type(enc::kExponentBias) << flt::kSignificandWidth;
    static constexpr auto flt_inf  = flt::kExponentBits;
    static constexpr auto enc_inf  = enc::kExponentBits;
    static constexpr auto qnan     = enc_inf | enc_inf >> 1;

    auto bits = mozilla::BitwiseCast<raw_type>(x);
    auto sign = bits & flt::kSignBit; // save sign
    bits ^= sign; // clear sign
    auto is_nan = flt_inf < bits; // compare before rounding!!
    auto payload = bits >> sig_diff & 0x1ff;

    static constexpr auto min_norm = raw_type(flt::kExponentBias - enc::kExponentBias + 1) << flt::kSignificandWidth;
    static constexpr auto sub_rnd = enc::kExponentBias < sig_diff
      ? raw_type(1) << (flt::kSignificandWidth - 1 + enc::kExponentBias - sig_diff)
      : raw_type(enc::kExponentBias - sig_diff) << flt::kSignificandWidth;
    static constexpr auto sub_mul = raw_type(flt::kExponentBias + sig_diff) << flt::kSignificandWidth;

    bool is_sub = bits < min_norm;
    auto norm = mozilla::BitwiseCast<F>(bits);
    auto subn = norm;
    subn *= mozilla::BitwiseCast<F>(sub_rnd);  // round subnormals
    subn *= mozilla::BitwiseCast<F>(sub_mul);  // correct subnormal exp
    norm *= mozilla::BitwiseCast<F>(bias_mul); // fix exp bias
    bits  = mozilla::BitwiseCast<raw_type>(norm);
    bits += (bits >> sig_diff) & 1; // add tie breaking bias
    bits += (raw_type(1) << (sig_diff - 1)) - 1; // round up to half
    //if( is_sub ) bits = std::bit_cast<raw_type>(subn);
    bits ^= -is_sub & (mozilla::BitwiseCast<raw_type>(subn) ^ bits);

    bits >>= sig_diff; // truncate
    //if( enc::inf < bits ) bits = enc::inf; // fix overflow
    bits ^= -( enc_inf < bits ) & ( enc_inf ^ bits );
    //if( is_nan ) bits = enc::qnan | payload;
    bits ^= -is_nan & ((qnan | payload) ^ bits);
    bits |= sign >> bit_diff; // restore sign

    val = bits;
    return *this;
  }
Depends on: 1877822

This adds a Float16 type with conversion code for double derived from the half
library. The half library (https://sourceforge.net/projects/half/) is MIT
licensed.

Optimized code, based upon lookup tables, exists for single precision, but it's
not obvious to me whether it's correct to first round to single precision and
then to half precision when converting from double. The fact that the half
library does not implement their conversion this way leads me to believe that
this is not correct.

The V8 implementation is using this library: https://github.com/Maratyszcza/FP16,
which only supports single precision conversions.

Depends on D203414

The tests are based upon the existing fround.js testcase, with a polyfill and
additional tests from https://github.com/petamoriken/float16/, which is MIT
licensed.

Depends on D203415

This adds the Float16Array implementation. This code is derived from an
earlier patch written by Benjamin Dahse.

Depends on D203416

Depends on D203417

Depends on D203418

it's not obvious to me whether it's correct to first round to single precision and then to half precision when converting from double

Indeed it is not correct. Consider the float64 1.00048828125000022204, which is the next float64 after 1.00048828125; that value is exactly halfway between 1.0 and the next float16 after 1.0, i.e. 1.0009765625.

Float64ToFloat32(1.00048828125000022204) is 1.00048828125, so Float32ToFloat16(Float64ToFloat32(1.00048828125000022204)) is 1.0 (under ties-to-even).

Float64ToFloat16(1.00048828125000022204) is of course 1.0009765625, since the original value was specifically chosen to be > halfway between 1.0 and 1.0009765625.
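The collapse described above can be reproduced in a few lines of C++. Only the float64 → float32 step is checked here, since _Float16 support is compiler-dependent; the hypothetical helper name is mine:

```cpp
#include <cassert>
#include <cmath>

// Demonstrates the double-rounding hazard: the float64 just above the exact
// float16 tie point 1.00048828125 (= 1 + 2^-11) collapses back onto the tie
// point when first rounded to float32, so a subsequent float32 -> float16
// rounding (ties-to-even) would give 1.0 instead of the correct 1.0009765625.
static bool RoundTripCollapses() {
  const double halfway = 1.00048828125;           // exact in float32 and float64
  const double v = std::nextafter(halfway, 2.0);  // next float64 above the tie
  return v > halfway && double(float(v)) == halfway;
}
```

This is why converting double → half must round directly in one step, rather than going through single precision.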

Blocks: 1883766
Attachment #9389011 - Attachment description: Bug 1833647 - Add Float16.h; r=anba,nbp! → Bug 1833647 - Add Float16.h; r=anba,nbp!,dnazer!

The Float16Array tests in test262 check for Float16Array on the fly instead of
using a feature flag. This adds support for checking the list of include files
for a specific include file, and for adding the appropriate command line option
to the existing shell option code if it is present.

Depends on D203419

Attachment #9336461 - Attachment is obsolete: true

Incidentally you may wish to follow / comment on https://github.com/tc39/proposal-float16array/issues/12.

While other browsers also do not yet support Float16Array, the lack of support is the reason for our interop2024 failures in a number of indexedDB structured-clone tests and idb-binary-key-roundtrip.htm, so I'm marking this bug on wpt.fyi's dashboard for those tests.

Attachment #9392383 - Attachment description: Bug 1833647 - Add support for feature detection to test262-update.py; r=anba,nbp! → Bug 1833647 - Add support for feature detection to test262-update.py; r=anba!
Attachment #9392384 - Attachment description: Bug 1833647 - Reimport test262 with Float16Array tests enabled; r=anba,nbp! → Bug 1833647 - Reimport test262 with Float16Array tests enabled; r=anba!
Blocks: 1893229
Pushed by [email protected]:
https://hg.mozilla.org/integration/autoland/rev/cab7c4515d6e
Add pref for Float16Array; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/9f14aaa722d5
Add Float16.h; r=anba,nbp,sylvestre
https://hg.mozilla.org/integration/autoland/rev/b9dd8b1e2baf
Implement Math.f16round; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/ae7d211b9183
Add Float16Array; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/2f0b7bba5b1a
Enable some tests for Float16Array; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/c292d2436e9b
Implement DataView float16 methods; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/dd63daeb6c07
Add support for feature detection to test262-update.py; r=anba
https://hg.mozilla.org/integration/autoland/rev/9355724ed379
Reimport test262 with Float16Array tests enabled; r=anba

Backed out for causing crashes with js::GlobalObject::skipDeselectedConstructor

Flags: needinfo?(dminor)

I think I have this fixed by only adding Float16Array to JSProto for Nightly builds: https://treeherder.mozilla.org/jobs?repo=try&revision=ea84ec64c025afafba0df634cec40ac6a19c0321

Flags: needinfo?(dminor)
Pushed by [email protected]:
https://hg.mozilla.org/integration/autoland/rev/efbb8096ba45
Add pref for Float16Array; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/c8f44ec2be16
Add Float16.h; r=anba,nbp,sylvestre
https://hg.mozilla.org/integration/autoland/rev/3fbfc13c9452
Implement Math.f16round; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/a4640f7b9be7
Add Float16Array; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/a399a0da4980
Enable some tests for Float16Array; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/7e6742efa89b
Implement DataView float16 methods; r=anba,nbp
https://hg.mozilla.org/integration/autoland/rev/18435f4cca59
Add support for feature detection to test262-update.py; r=anba
https://hg.mozilla.org/integration/autoland/rev/2768653c3e4a
Reimport test262 with Float16Array tests enabled; r=anba
Regressions: 1893797
Keywords: dev-doc-needed

MDN docs work for this can be tracked in https://github.com/mdn/content/issues/33561

A question: this is gated behind a preference, but some of it seems to be conditioned on Nightly, such as https://hg.mozilla.org/integration/autoland/rev/3fbfc13c9452#l1.30

Does this mean that this feature only works on Nightly AND with the preference set?

Flags: needinfo?(dminor)

Yes, for now, this requires a Nightly build and the preference set. My current plan is to ship this in Firefox 130, at which point the pref will default to true, and the Nightly requirement will go away.

Flags: needinfo?(dminor)
Regressions: 1904648
Regressions: 1904649