Vectorized Algorithms in Java

DZone

There has been a Cambrian explosion of JVM data technologies in recent years. It’s all very exciting, but is the JVM really competitive with C in this area? I would argue that there is a reason Apache Arrow is polyglot, and it’s not just interoperability with Python. To pick on one project impressive enough to be thriving after seven years, if you’ve actually used Apache Spark, you will be aware that it looks fastest next to its predecessor, MapReduce. Big data is a lot like teenage sex: everybody talks about it, nobody really knows how to do it, and everyone keeps their embarrassing stories to themselves. In games of incomplete information, it’s possible to overestimate the competence of others: nobody opens up about how slow their Spark jobs really are because there’s a risk of looking stupid.

If it can be accepted that Spark is inefficient, the question becomes: Is Spark fundamentally inefficient?Flare provides a drop-in replacement for Spark’s backend, but replaces JIT compiled code with highly efficient native code, yielding order of magnitude improvements in job throughput. Some of Flare’s gains come from generating specialized code, but the rest comes from just generating better native code than C2 does. If Flare validates Spark’s execution model, perhaps it raises questions about the suitability of the JVM for high-throughput data processing.

Source: DZone

Pyntax

Vectorized Algorithms in Java

ByRichard Startin

By Richard Startin

Related Post

The Power of Software Visualization

How to Configure log4j2 In a Spring Boot Application? | Spring Boot Logging [Video]

Azure Databricks: 14 Best Practices For a Developer

You missed

Teslas made in Texas will likely have to leave the state before Texans can buy them

MagSafe used to fish out iPhone 12 Pro dropped in canal

Wacom Cintiq Pro 24 Touch review: Beautiful but needs improvement

Google made it hard for users to keep location data private

Pyntax