
Parallella FPGA: how does 64-core performance compare with GPUs and expensive FPGAs?

This is the Parallella:

http://anycpu.org/forum/viewtopic.php?f=13&t=66

It has 64 cores and 1 GB of RAM, runs Linux, and has Ethernet. Everyone is shouting about it.

My question is: from a performance/capability perspective, how does the Parallella compare with more expensive FPGAs? Do those just have wider buses, more memory, faster processor clocks, or more processors on the chip?

I understand that GPUs are for massively parallel simple operations and CPUs are better for more complicated single-threaded computation, so where do expensive FPGAs and the Parallella fit on this curve?

The Parallella runs Linux, yet I was always under the impression that FPGAs have their logic flashed onto them by writing Verilog or VHDL?

A partial answer: FPGAs tend not to have ANY processors on the chip (there are exceptions), but if you think of processing as fetching instructions and executing them one after the other, you haven't really grasped FPGAs. If you can see how to execute one complete iteration of your inner loop in a single clock cycle, you're getting there.
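To make the "one iteration per clock cycle" idea concrete, here is a rough cycle-count sketch, not a model of any real device. It compares a processor that spends several instruction cycles on each loop iteration with a pipelined hardware datapath that accepts a new iteration every clock. The function names and the example numbers (8 instructions per iteration, a 5-stage pipeline) are made up for illustration.

```python
def sequential_cycles(iterations, instructions_per_iteration):
    """Cycles for a simple processor that executes one instruction per
    cycle and needs several instructions for each loop iteration."""
    return iterations * instructions_per_iteration


def pipelined_cycles(iterations, pipeline_depth, initiation_interval=1):
    """Cycles for a pipelined datapath: the first result appears after
    `pipeline_depth` cycles, then one more result arrives every
    `initiation_interval` cycles."""
    if iterations == 0:
        return 0
    return pipeline_depth + (iterations - 1) * initiation_interval


# Hypothetical inner loop: 1000 iterations, 8 instructions each on a CPU,
# versus a 5-stage pipeline that starts a new iteration every clock.
cpu = sequential_cycles(1000, 8)
fpga = pipelined_cycles(1000, 5)
print(cpu, fpga)
```

The clock rates differ too (an FPGA design typically clocks well below a CPU), so the cycle counts alone don't settle which is faster; the point is only that the per-iteration cost collapses to about one cycle once the loop body is laid out as hardware.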

There will be tasks where this is easy, and the FPGA can wipe the floor with any other solution. There will be tasks where it is impossible, and the Parallella will be a contender. I don't see any one high-performance solution as an overall winner; there are impressive things being done with GPUs (low power isn't one of them!), and many-core XMOS or Parallella solutions have their place too.

The only Parallellas available now have 16 cores. They have a Xilinx Zynq 7010 or 7020, which combines a dual-core ARM at 800 MHz/1 GHz with an FPGA of about 80k logic cells that is used to communicate with the Parallella chip. I don't know how much of the FPGA is available to play with, though.

If the Parallella has 16 cores, and we assume that each core has a hardware multiplier running at 1 GHz, its overall computational ability is roughly comparable with a $200 FPGA and definitely worse than a $1000 FPGA. However, in most applications the math computation is not the main processor's job; it is handled by an ASIC (or by an IP core or DSP coprocessor inside the main processor), for example an H.264 codec or WiFi data modulation. For applications supported by an ASIC, a high-performance processor plus the corresponding ASIC is always the best solution. Only if you want to be unique in some part, for example with better image processing algorithms, will you probably want to implement your own signal processing algorithm, and this is where multi-core DSPs, GPUs, and high-end FPGAs compete.
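The back-of-the-envelope estimate above can be written out as a small sketch. The 16 cores and the 1 GHz multiplier are the assumptions stated in the answer; the 250 MHz FPGA multiplier clock is a purely hypothetical figure chosen for illustration and not the specification of any particular part.

```python
def peak_multiplies_per_second(units, clock_hz, multiplies_per_cycle=1):
    """Peak rate if every multiplier issues on every clock cycle."""
    return units * clock_hz * multiplies_per_cycle


def multipliers_needed(target_rate, clock_hz):
    """Smallest number of multipliers at `clock_hz` that reaches
    `target_rate` (integer ceiling division)."""
    return -(-target_rate // clock_hz)


# Assumption from the answer: 16 cores, one multiplier each, at 1 GHz.
parallella_peak = peak_multiplies_per_second(16, 1_000_000_000)

# Hypothetical FPGA whose hardware multipliers run at 250 MHz.
fpga_multipliers = multipliers_needed(parallella_peak, 250_000_000)
print(parallella_peak, fpga_multipliers)
```

Under these assumptions the question becomes how many hardware multipliers an FPGA at a given price offers, which is something to check against the vendor's datasheet. These are peak figures only; sustained throughput on either platform depends on memory bandwidth and on how well the algorithm keeps the multipliers busy.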

