Saturday, January 8, 2022

graph analytics processor (2017)

 

https://robertsinterests.wordpress.com/2017/06/15/darpa-funds-development-of-new-type-of-processor-ee-times/


DARPA Funds Development of New Type of Processor | EE Times
Posted on June 15, 2017

A completely new kind of non-von-Neumann processor called HIVE — Hierarchical Identify Verify Exploit — is being funded by the Defense Advanced Research Projects Agency (DARPA) to the tune of $80 million over four-and-a-half years. Chipmakers Intel and Qualcomm are participating in the project, along with a national laboratory, a university, and defense contractor Northrop Grumman.

Pacific Northwest National Laboratory (Richland, Washington) and Georgia Tech are involved in creating software tools for the processor, while Northrop Grumman will build a Baltimore center that uncovers and transfers the Defense Department’s graph analytic needs for what is being called the world’s first graph analytic processor (GAP).

Hierarchical Identify Verify Exploit (HIVE) uses a sequence that begins with the multi-layer graphical representations of data (see figure) that open the way for graph analytic processing to identify relationships between data within and perhaps between the layers.
(Source: DARPA)

“When we look at computer architectures today, they use the same [John] von Neumann architecture invented in the 1940s. CPUs and GPUs have gone parallel, but each core is still a von Neumann processor,” Trung Tran, a program manager in DARPA’s Microsystems Technology Office (MTO), told EE Times in an exclusive interview.

“HIVE is not von Neumann because of the sparseness of its data and its ability to simultaneously perform different processes on different areas of memory,” Tran said. “This non-von-Neumann approach allows one big map that can be accessed by many processors at the same time, each using its own local scratch-pad memory while simultaneously performing scatter-and-gather operations across global memory.”

Graph analytic processors do not exist today, but they theoretically differ from CPUs and GPUs in key ways. First of all, they are optimized for processing sparse graph primitives. Because the items they process are sparsely located in global memory, they also involve a new memory architecture that can access randomly placed memory locations at ultra-high speeds (up to terabytes per second).

Today’s memory chips are optimized to access long sequential locations (to fill their caches) at their highest speeds, which are in the much slower gigabytes-per-second range. HIVEs, on the other hand, will access random eight-byte data points from global memory at their highest speed, then process them independently using their private scratch-pad memories. The architecture is also specified to be scalable up to however many HIVE processors are needed to perform a specific graph algorithm.

“Of all the data collected today, only about 20 percent is useful — that’s why it’s sparse — making our eight-byte granularity much more efficient for Big Data problems,” said Tran.
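
The access pattern described above (random eight-byte reads scattered across global memory, rather than long sequential cache-line fills) can be illustrated with a small sketch. This only shows the gather-style access a graph workload generates; the buffer size, the index list, and the sampling fraction are made-up numbers, not part of the HIVE specification.

    import random
    import struct

    # Pretend "global memory" holding eight-byte (64-bit) values.
    WORD_SIZE = 8                    # bytes per data point, as in the article
    NUM_WORDS = 1_000_000            # hypothetical buffer size
    global_memory = bytearray(NUM_WORDS * WORD_SIZE)

    def sequential_scan(memory):
        """Cache-friendly pattern: walk the buffer front to back."""
        total = 0
        for offset in range(0, len(memory), WORD_SIZE):
            (value,) = struct.unpack_from("<Q", memory, offset)
            total += value
        return total

    def sparse_gather(memory, word_indices):
        """Graph-style pattern: fetch eight-byte words at scattered offsets."""
        total = 0
        for i in word_indices:
            (value,) = struct.unpack_from("<Q", memory, i * WORD_SIZE)
            total += value
        return total

    # A sparse workload touches only a small, scattered subset of memory.
    touched = random.sample(range(NUM_WORDS), k=NUM_WORDS // 50)
    print(sequential_scan(global_memory), sparse_gather(global_memory, touched))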

The Giga Traversed Edges Per Second Per Watt needed for real-time graph analysis that identifies relationships as they unfold in the field is 1,000 times faster (green) than the fastest GPU (blue) or CPU (red) today. (Source: DARPA)

Together, the new arithmetic processing unit (APU) optimized for graph analytics plus the new memory architecture chips are specified by DARPA to use 1,000 times less power than today’s supercomputers. The participants, especially Intel and Qualcomm, will also have the rights to commercialize the processor and memory architectures they invent to create a HIVE.

The graph analytics processor is needed, according to DARPA, for Big Data problems, which typically involve many-to-many rather than the many-to-one or one-to-one relationships for which today’s processors are optimized. A military example, according to DARPA, might be the first digital missives of a cyberattack. A civilian example, according to Intel, might be all the people buying from Amazon mapped to all the items each of them bought (clearly delineating the many-to-many relationships as people-to-products).

“From my standpoint, the next big problem to solve is Big Data, that today is analyzed by regression which is inefficient for relations between data points that are very sparse,” said Tran. “We found that the CPU and GPU leave a big gap between the size of problems and the richness of results, whereas graph theory is a perfect fit for which we see an emerging commercial market too.”

Besides the HIVE chip, the DARPA mandate calls for the development of software tools to help program the new architecture, which goes beyond today’s parallel processing paradigm by also allowing simultaneous parallel access to random memory locations. If successful, DARPA claims, the graph analytics processor will be able to recognize and identify many types of situations that are intractable for conventional CPUs and GPUs.

Applications (top) and performance (bottom) comparisons between Intel CPUs, Nvidia GPUs, Google TPUs, and DARPA’s proposed HIVE processor. (Source: DARPA)

DARPA describes its Big Data as sensor feeds, economic indicators, and scientific and environmental measurements as the nodes of a graph, with the edges of the graph being the relationships between the nodes, such as “bought” in the Amazon example.

The basis of graph theory analytics can be traced back to the famous philosopher Gottfried Wilhelm Leibniz, but is usually attributed to the first paper on the subject, the “Seven Bridges of Königsberg” published in 1736 by Leonhard Euler. Since then it has been developed into a host of algorithms and mathematical structures that model the relationships between random data points. The HIVE architecture is designed to use these graph analytics to identify threats, track disease outbreaks, and otherwise answer Big Data questions that today are intractable for conventional CPUs and GPUs.

[ 4 years, 6 months DARPA program ]

The four-and-a-half-year DARPA program will spend the first year with Intel and Qualcomm designing rival architectures, while Georgia Tech and PNNL design rival software tools. After the first year, one hardware design and one software design will be chosen. DARPA will provide $50 million in funding to the company with the winning hardware design, on the condition that the company kick in $50 million of its own. DARPA will also provide $7 million to the organization that provides the winning software design.

Meanwhile, Northrop will be given $11 million in non-matching funds to set up the Baltimore center to survey all of the Defense Department’s needs in graph analytics and make sure that the hardware and software builders meet those needs.

“HIVE is a team effort to collaborate on data handling that leverages machine learning and other AI using graph analytic processors,” Dhiraj Mallick, vice president of Intel’s Data Center Group, told EE Times.

Confident that Intel will beat out Qualcomm with the winning chip design, Mallick continued: “Intel has been asked to provide a 16-node platform at the end of the program using 16 HIVE processors on a single printed circuit board. Intel will also have the rights to productize versions for the worldwide market.”

The resulting HIVE processor will enable real-time identification and awareness of strategic assets as situations unfold, whereas today we have to depend on after-the-fact analysis, “closing the barn door after the horse has been stolen,” Mallick said.

http://www.eetimes.com/document.asp?doc_id=1331871

 <---------------------------------------------------------------------------->

    DARPA Funds Development of New Type of Processor
    World’s 1st Non-Von-Neumann
    R. Colin Johnson
    6/9/2017 01:01 AM EDT       

       to address the big data (data science) problems 
       graph theory
        graph theory analytics 
        graph analytics 
       graph analytics processor
       graph analytics chip
       graph analytics chip project 

       graph analytics processor will be able to recognize and identify many types of situations that are intractable for conventional CPUs and GPUs. 

    https://robertsinterests.wordpress.com/2017/06/15/darpa-funds-development-of-new-type-of-processor-ee-times/

    http://www.eetimes.com/document.asp?doc_id=1331871&
    http://www.eetimes.com/document.asp?doc_id=1331871&page_number=2

    https://www.darpa.mil/news-events/2017-06-01
    
    https://www.top500.org/news/darpa-taps-intel-for-graph-analytics-chip-project/
    
    https://www.darpa.mil/news-events/2017-06-02

    https://www.fbo.gov/index?s=opportunity&mode=form&id=daa4d6dbee8741f56d837c404eac726d&tab=core&_cview=1

    https://en.wikipedia.org/wiki/Vector_processor

 <---------------------------------------------------------------------------->

    CPU - central processing unit (Intel) 
        - 
    GPU - graphic processing unit (Nvidia) 
    HPU - hybrid processing unit  (Intel) 
    SOC - system on a chip    

RISC - argued for a set of simplified instructions instead of raising the 
       semantic level of instructions, ...
       RISC [reduced instruction set computing architecture]
       RISC vs. CISC (Complex Instruction Set Computing architecture)
       ([ A personal-computer Wintel machine would be an example of CISC ])
       ([   Wintel is CISC architecture because the Intel CPU is CISC    ])
       ([ iPhones and Android devices would be examples of RISC          ])
       ([   they are RISC architecture because they have an ARM chip     ])
       ([   inside them, and the ARM chip has a RISC architecture        ]) 
       ([ CPU - central processing unit - the processor                  ])
       ([     - the CPU is the micro-electronic integrated circuit (IC)  ])
       ([       that performs the math (add, subtract, multiply, divide) ]) 
       ([       and Boolean logic (AND, OR, NAND, NOR, comparison)       ]) 
       /* 
        * A Sun workstation (Sun is now part of Oracle) would be an 
        * example of a workstation computer using RISC architecture with 
        * the UltraSPARC processor, because the UltraSPARC processor 
        * was designed using a RISC architecture.
        */
        en.wikipedia.org/wiki/Reduced_instruction_set_computing

    CPU CISC Architecture (Intel)
    CPU RISC Architecture (ARM) 


 <---------------------------------------------------------------------------->

    multi-node NUMA


    distributed, cache-coherent, non-uniform memory access machines (CC-NUMA) as advocated by the Stanford DASH project and later embodied in the SGI Origin computers.


NOW  - building scalable large-scale computers from standard networks 
       and computers as compared to distributed, cache-coherent, non-uniform
       memory access time machines (CC-NUMA),
       Stanford DASH project, SGI Origin computers,
       Network of Workstation (NOW)
       now.cs.berkeley.edu

• The Network of Workstations (NOW) project, 1993-1998, developed NOW-I and NOW-II, which were clusters of workstations that proved valuable in applications ranging from encryption to sorting. The Inktomi search engine was first built on NOW-II, which led to a search engine startup company. Inktomi Inc. in turn demonstrated to the fledgling Internet industry the value of clusters of a large number of low-cost computers versus fewer more expensive high-end servers, which Google and others followed. Project alumni became faculty at Berkeley, Harvard, Illinois, Princeton, Rutgers, Stanford, Texas, and Wisconsin. Tom Anderson, Eric Brewer, David Culler, and I led the NOW project.

• NOW was controversial in that it argued for building scalable large-scale computers from standard networks and computers as compared to distributed, cache-coherent, non-uniform memory access time machines (CC-NUMA) as advocated by the Stanford DASH project and later embodied in the SGI Origin computers.

 <---------------------------------------------------------------------------->

https://www.darpa.mil/news-events/2017-06-01

Defense Advanced Research Projects Agency
News And Events
Beyond Scaling: An Electronics Resurgence Initiative
With its new multifaceted push, DARPA aims to lay groundwork for the next era of world-changing electronics and microsystems
outreach@darpa.mil
6/1/2017
Image Caption: The patchwork of microelectronic dies represents work performed by a multitude of university groups that participated in previous DARPA-industry-academe collaborations. DARPA’s new electronics initiative is pushing for a new era of microsystem structures and capabilities.

The Department of Defense’s proposed FY 2018 budget includes a $75 million allocation for DARPA in support of a new, public-private “electronics resurgence” initiative. The initiative seeks to undergird a new era of electronics in which advances in performance will be catalyzed not just by continued component miniaturization but also by radically new microsystem materials, designs, and architectures. The new funds will supplement the Agency’s FY 2018 R&D portfolio in electronics, photonics, and related systems to create a coordinated effort valued at more than $200 million, to be further supplemented by significant commercial sector investments.

The new initiative comes at a time when the microsystems technology community is facing an array of long-anticipated obstacles to its relentless and storied decades-long march of progress. The microelectronics revolution—which began after World War II with the invention of the transistor and led to today’s chips bearing billions of these now astoundingly minuscule digital switches—has arrived at an inflection point, beyond which innovators will no longer be able to rely solely on the benefits of cramming more and more electronic devices into smaller and smaller spaces.

“For nearly seventy years, the United States has enjoyed the economic and security advantages that have come from national leadership in electronics innovation,” said Bill Chappell, director of DARPA’s Microsystems Technology Office (MTO), which will lead the new effort. “If we want to remain out front, we need to foment an electronics revolution that does not depend on traditional methods of achieving progress. That’s the point of this new initiative – to embrace progress through circuit specialization and to wrangle the complexity of the next phase of advances, which will have broad implications on both commercial and national defense interests.”

To appreciate the magnitude of the change the initiative aims to achieve, it helps to look back at the last such paradigm shift, which quietly became public on July 1, 1948. That’s when the word “transistor” made its understated debut on page D4 of the New York Times. That day’s “The News of Radio” column started out with descriptions of two new radio shows but ended with a bit of obscure technology news: “A device called the transistor, which has several applications in radio where a vacuum tube ordinarily is employed, was demonstrated for the first time yesterday.” The shift from vacuum tubes to transistors would prove monumental, kicking off more than 70 years of electronics improvements based on these increasingly minuscule components.

Ten years after the rollout of the transistor, for example, Jack Kilby of Texas Instruments demonstrated the first integrated circuit—a breakthrough in which all circuit components shared space aboard a single chip of semiconductor material—and opened the pathway to what has been a relentless sprint of seemingly miraculous transistor miniaturization. Nothing could be more emblematic of modernity than the countless microelectronic chips underlying today’s vast and varied technoscape. The essence of that pathway from discrete transistors in the 1940s to the billions that engineers now integrate on individual chips often is referred to as “scaling.” DARPA, which was founded in 1958, the same year Kilby introduced integrated circuits, supported many of the breakthroughs that enabled that evolutionary trek through the Silicon Age, including fundamental advances in semiconductor materials, massive-scale integration, and precision manufacturing.

But there always has been a finish line on the horizon. The fantastic saga of electronics miniaturization that has yielded ever more computing power at ever-lower unit costs—represented by the famed Moore’s Law (named after Intel’s co-founder Gordon Moore)—has always been destined to encounter the limitations of both physics and economics. As this inflection point nears, continued progress in microelectronics will require a new phase of innovation to keep the modern miracle of electronics innovation moving forward.

This inflection point would mark not only a flattening in one of the most consequential technological trajectories in the history of humanity, but also the beginning of what likely will be an even more audacious era of technological creativity and advance. This is where the DARPA initiative comes in.

By focusing on the development of new materials for use in electronic devices, new architectures for integrating those devices into complex circuits, and software and hardware design innovations for transforming microsystem designs into reality far more efficiently than ever before, the initiative aims to ensure continued improvements in electronics performance even without the benefit of traditional scaling. Over the coming months, DARPA’s MTO will engage with the microelectronics community through technology discussions, workshops, and other channels to forge a collaborative, cost-shared research agenda to usher microsystems into an exciting new age of innovation. The new research effort will complement DARPA’s recently created Joint University Microelectronics Program (JUMP), the largest university research effort in basic electronics, co-funded by DARPA and the Semiconductor Research Corporation, an industry consortium.

The materials portion of the initiative will explore the use of unconventional circuit ingredients to substantially increase circuit performance without requiring smaller transistors. Although silicon is the most familiar microsystem material and compound semiconductors such as silicon germanium already play niche roles, these materials offer limited flexibility in function and reside in a single planar layer. The initiative will show that the Periodic Table provides a vast reservoir of candidate materials for next-generation logic and memory components. Research will unfold with an eye on integrating different semiconductor materials on individual chips, “sticky logic” devices that combine processing and memory functions, and vertical rather than only planar integration of microsystem components.

The architecture portion of the initiative will examine circuit structures that are optimized to the specific tasks they perform. Graphics processing units, which underlie much of the ongoing progress in machine learning, have already demonstrated the performance improvement derived from specialized hardware architectures. The initiative will explore other opportunities, such as reconfigurable physical structures that adjust to the needs of the software they support.

The design portion of the initiative will focus on developing tools for rapidly designing and realizing specialized circuits. Unlike general-purpose circuitry, specialized electronics can be much faster and more energy efficient. Although DARPA has consistently invested in these application-specific integrated circuits (ASICs) for military use, ASICs can be costly and time-consuming to develop. New design tools and an open-source design paradigm could be transformative, enabling innovators to rapidly and cheaply create specialized circuits for a range of commercial applications.

“The proliferation and increasing sophistication of microelectronics—and the computing, communications, navigation, and countless other technologies that depend on those electronics—have been astounding, and have primarily happened with essentially the same silicon-based approach,” said Chappell. “Look at how much the world has changed as a result of mobile phone technology alone in just the past ten years. To keep this pace of progress moving forward even as we lose the benefit of conventional scaling, we need to break away from tradition and embrace the kinds of innovations that the new initiative is all about. We are looking forward to working with the commercial sector, the defense industrial base, academia, the national laboratories, and other hotbeds of innovation to initiate the next electronics revolution.”


 <---------------------------------------------------------------------------->

http://www.epanorama.net/newepa/2017/01/29/computer-trends-2017/comment-page-7/

Computer trends 2017

    Tomi Engdahl
    January 29, 2017
    Computers, Trends and predictions

I did not have time to post my computer technology predictions at the end of 2016. Because I missed the year-end deadline, I thought there was no point in posting anything before the news from CES 2017 had been published. Here are some of my picks on the current computer technology trends:

CES 2017 had 3 significant technology trends: deep learning goes deep, Alexa everywhere and Wi-Fi gets meshy. The PC sector seemed to be pretty boring.

Gartner expects that IT sales will grow (2.7%), but hardware sales will not have any growth and can drop this year. TEKsystems’ 2017 IT forecast shows IT budgets rebounding from a slump in 2016, and IT leaders’ confidence high going into the new year. But challenges around talent acquisition and organizational alignment will persist. Programming and software development continue to be among the most crucial and hard-to-find IT skill sets.

Smartphone sales (expected to be 1.89 billion) and PC sales (expected to be 432 million) will not grow in 2017. According to IDC, PC shipments declined for a fifth consecutive year in 2016 as the industry continued to suffer from stagnation and a lack of compelling drivers for upgrades. Both Gartner and IDC estimated that PC shipments declined about 6% in 2016. Revenue in the traditional (non-cloud) IT infrastructure segment decreased 10.8 percent year over year in the third quarter of 2016. The only PC category that has potential for growth is ultramobile (which includes the Microsoft Surface and Apple MacBook Air). Need for memory chips is increasing.

Browsers suffer from JavaScript-creep disease: the browsing experience seems to become slower even though computers and broadband connections are getting faster all the time. Bloat on web pages has been going on for ages, and this trend seems to continue.

Microsoft is trying all it can to make people switch from older Windows versions to Windows 10. Microsoft says that continued usage of Windows 7 increases maintenance and operating costs for businesses due to malware attacks that could have been avoided by upgrading to Windows 10. Microsoft: Windows 7 Does Not Meet the Demands of Modern Technology; Recommends Windows 10. In February 2017 Microsoft stops the 20-year-long tradition of monthly security updates. Windows 10 “Creators Update” is coming in early 2017 for free, featuring 3D and mixed reality, 4K gaming, and more.

Microsoft plans to emulate x86 instructions on ARM chips, throwing a compatibility lifeline to future Windows tablets and phones. Microsoft’s x86-on-ARM64 emulation is coming in 2017. This capability is coming to Windows 10, though not until “Redstone 3” in the fall of 2017.

Parents should worry less about the amount of time their children spend using smartphones, computers and playing video games because screen time is actually beneficial, the University of Oxford has concluded. 257 minutes is the time teens can spend on computers each day before harming wellbeing.

Outsourcing IT operations to foreign countries is not trendy anymore, and companies live in uncertain times. India’s $150 billion outsourcing industry stares at an uncertain future. In the past five years, revenue and profit growth for the top five companies listed on the BSE have halved. Industry leader TCS too felt the impact as it made a shift in business model towards software platforms and chased digital contracts.

Containers will become hot this year and cloud will stay hot. Research firm 451 Research predicts that containerization will be a US$762 million business this year and that containers will become a $2.6 billion software business in 2020 (a 40 percent annual growth rate).

Cloud services are expected to have a 22 percent annual growth rate. By 2020, the sector would grow from the current $22.2 billion to $46 billion. In Finland, 30% of companies now prefer to buy cloud services when buying IT (20 percent of the IT budget goes to cloud). Cloud spend will make up over a third of IT budgets by 2017. Cloud and hosting services will be responsible for 34% of IT budgets by 2017, up from 28% by the end of 2016, according to 451 Research. Cloud services have many advantages, but they also have disadvantages. In five years, SaaS will be the cloud that matters.

When cloud is growing, so is the spending on cloud hardware by the cloud companies. Cloud hardware spend hits US$8.4bn/quarter as traditional kit sinks; 2017 is forecast to see cloud kit clock $11bn every 90 days. In 2016’s third quarter, vendor revenue from sales of infrastructure products (server, storage, and Ethernet switch) for cloud IT, including public and private cloud, grew by 8.1 percent year over year to $8.4 billion. Private cloud accounted for $3.3 billion with the rest going to public clouds. Data centers need lower-latency components, so Google Searches for Better Silicon.

The first signs of the decline and fall of the 20+ year x86 hegemony will appear in 2017. The availability of industry leading fab processes will allow other processor architectures (including AMD x86, ARM, Open Power and even the new RISC-V architecture) to compete with Intel on a level playing field.

USB-C will now come to screens: the Type-C USB connector promises to become the single physical interface for all equipment. The HDMI connection will disappear from laptops in the future. Thunderbolt 3 is arranged to work with USB Type-C, but it’s not the same thing (Thunderbolt is four times faster than USB 3.1).

World’s first ‘exascale’ supercomputer prototype will be ready by the end of 2017, says China

It seems that Oracle Begins Aggressively Pursuing Java Licensing Fees in 2017. Java SE is free, but Java SE Suite and various flavors of Java SE Advanced are not. Oracle is massively ramping up audits of Java customers it claims are in breach of its licences – six years after it bought Sun Microsystems. Huge sums of money are at stake. The version of Java in contention is Java SE, with three paid flavours that range from $40 to $300 per named user and from $5,000 to $15,000 for a processor licence. If you download Java, you get everything – and you need to make sure you are installing only the components you are entitled to and you need to remove the bits you aren’t using.

The “Your Year in Review, Unsung Hero” article sees the following trends in 2017:

    A battle between ASICs, GPUs, and FPGAs to run emerging workloads in artificial intelligence
    A race to create the first generation of 5G silicon
    Continued efforts to define new memories that have meaningful impact
    New players trying to take share in the huge market for smartphones
    An emerging market for VR gaining critical mass

Virtual Reality will stay hot on both PC and mobile. “VR is the heaviest heterogeneous workload we encounter in mobile—there’s a lot going on, much more than in a standard app,” said Tim Leland, a vice president for graphics and imaging at Qualcomm. The challenge is in the need to calculate data from multiple sensors and respond to it with updated visuals in less than 18 ms to keep up with the viewer’s head motions, so the CPUs, GPUs, DSPs, sensor fusion core, display engine, and video-decoding block are all running at close to full tilt.

 <---------------------------------------------------------------------------->

http://www.eetimes.com/document.asp?doc_id=1332014&

News & Analysis
Woodie Flowers: Things, Not Theory
Martin Rowe
7/17/2017 11:59 AM EDT


CAMBRIDGE, Mass. — If you've never heard of Woodie Flowers, you're missing out on an important aspect of engineering education: solving problems. The Papalardo Professor Emeritus of mechanical engineering at MIT has a resume that could fill a room. Flowers and Dean Kamen started the FIRST Robotics Competition, which holds events worldwide and is responsible for many people choosing to study and practice engineering. Search his name and you'll see Woodie's many accomplishments.

Flowers gave a keynote address at NI Week 2017, where he talked about why engineers need to think not about equations and theory, but about how they can use their skills to solve many of the world's problems. He also said that engineers, because of their ability to separate fact from fiction, are in a great position to understand how the world works. EE Times met with Flowers at his office on the MIT campus.
Woodie Flowers speaks to engineers at NIWeek 2017. Photo courtesy of National Instruments.

EE Times: How did you go about developing a hands-on engineering course?
Flowers: When I joined the MIT faculty, the head of the ME department was Asher Shapiro, one of the world's leaders in fluid dynamics. His grad course in fluid mechanics was the best course I ever took. He was, though, a classic engineering science guy. He'd start at the upper left corner of the blackboard, and at the end of the class, he was at the lower right corner.

What stuck in my head, however, came from a film series from the National Science Foundation (NSF) that Shapiro supervised. When I look out the window of an airplane and see that little thing called a wing, I'm more likely to remember it not from Navier-Stokes equations, but because I remember the mechanics of why the wing is there. I realized that for engineers to truly learn how to solve problems, they needed more than theory. So I developed a hands-on course in design in which students had to solve real problems on their own.

EE Times: Do students come in with a yearning to learn?
Flowers: In my classes, particularly in a freshman seminar (which became the biggest freshman seminar at MIT), I would write "Things, not theory" on the blackboard. I would ask them to tell me what they wanted to learn about. I would get some 400 responses, and we'd pick about 150 things where students would give a five-minute presentation to the class. That showed me students could be curious about a lot of things. Creating a self-image with a license to be curious and a need to know is a big deal. Learn about the things around you. Can you figure out what this is?
One of the many objects that Flowers keeps in his office. He often hands such items to prospective PhD students during interviews. They are expected to show curiosity regarding the object.

Nearly half of the students in that freshman seminar were women. Some claimed to have been culturally deprived of the opportunity to learn how things worked.

When I was on nonprofit boards, I would read The Wall Street Journal because I needed to know. All you have to do is plant the seed and set the vector off in the right direction for people to realize their curiosity and willingness to learn.

When I first got involved in the sophomore design course, I was still a graduate student. The department head heard about a creativity kit from Xerox PARC where people were given a bag of things and told to make something. The students really struggled with what to make. That was a frustrating experience, and it was hard for me to help them. They had to tackle a difficult problem first. The following semester, we decided to define the problem first. We told them to build a device that goes up a ramp in 30 seconds. That was all they needed.

EE Times: Reminds me of the movie "Apollo 13" when engineers were given the items on board and had to figure out how to create a tool and tell the astronauts how to build it.
Flowers: I believe that the kids here are lucky because they are told, "Here's the problem, here's the stuff, go solve the problem." They are given the problem, but there are many ways to succeed. It's not a pass or fail. It has latitude but is well-defined.

EE Times: In trying to create a course that was hands-on, did it ruffle a few feathers?
Flowers: My colleagues would say, "Woodie, what are you doing? The students are spending all their time on your course and not doing the work they need to do."

Asher said, "Because you're in design, you have almost no chance of getting tenure." He was trying to be kind and honest, but that gave me license to "do my thing," so I didn’t try to get tenure until I was in sight of it. I would have liked to hear what they said about me when discussing my tenure. I did make some noise about a different kind of education and stirred things up a bit. Today, it's nice to walk down the halls of MIT as a retired professor and have people smile at me. I never would have predicted that.

In Part 2, Flowers discusses the value of both working alone and working as part of a team.

—Martin Rowe covers test and measurement for EE Times and EDN. Contact him at martin.rowe@aspencore.com

 <---------------------------------------------------------------------------->

https://www.top500.org/news/darpa-taps-intel-for-graph-analytics-chip-project/

 DARPA Taps Intel for Graph Analytics Chip Project
Michael Feldman | June 7, 2017 04:22 CEST

The Defense Advanced Research Projects Agency (DARPA) has selected Intel to develop a graph analytics processor that will be a thousand times faster than anything available today.

The work is being done under DARPA’s Hierarchical Identify Verify Exploit (HIVE) program, a four-and-a-half-year effort whose goal is to develop and integrate new graph hardware and software technologies for accelerating DoD analytics workloads. Along with Intel, DARPA has also brought in Pacific Northwest National Laboratory, Georgia Tech, Northrop Grumman, and Qualcomm Intelligent Solutions to help principally with the system software and application effort.

Graph analytics is applied to problems where causal relationships need to be derived from large datasets. This applies to a wide array of applications such as transportation routing, genomics processing, financial transaction optimization, and consumer purchasing analysis, just to name a few. In the case of DARPA and the DoD, the more relevant applications are in areas like communications, intelligence, surveillance, and reconnaissance.

The problem is that the average computer cluster is not very adept at these types of problems. Generally, graph processing requires large amounts of high-bandwidth memory to operate with any efficiency. And as the problem size gets larger, the cluster network becomes a secondary bottleneck.

In a write-up posted on DARPA’s news site, Trung Tran, a program manager in the agency’s Microsystems Technology Office (MTO), outlined the case for the project. “Today’s hardware is ill-suited to handle such data challenges, and these challenges are only going to get harder as the amount of data continues to grow exponentially,” explained Tran. The HIVE effort adds the additional demand of real-time support for cases where streaming data needs to be analyzed on the fly.

The challenge is that correlation across a graph tends to be computationally expensive, requiring a processor that is highly parallel in nature and has access to highly performant memory. The closest commercial architectures we currently have for this computing model are Intel’s Xeon Phi and GPUs. It’s noteworthy that DARPA’s original HIVE description made a specific reference to graphics processors, saying “the goal is to see a 1000x improvement in power and performance on the HIVE chip compared to a GPU.”

That doesn’t mean the graph analytics processor will be some variant of the Xeon Phi. The DARPA document specifies the HIVE effort will be to “research and design a new chip architecture from scratch.” Much of the work will actually focus on componentry outside the processor cores themselves, especially in the development of a memory architecture that supports a multi-node NUMA model.

NUMA [non-uniform memory access]

 More specifically, DARPA has outlined the HIVE architectural goals as follows:

    1. Create an accelerator architecture and processor pipeline which supports the processing of identified graph primitives in a native sparse matrix format.

    2. Develop a chip architecture that supports the rapid and efficient movement of data from memory or I/Os to the accelerators based on an identified data flow model. Emphasis should be on redefining cache based architectures so that they address both sparse and dense data sets.

    3. Develop an external memory controller designed to ensure efficient use of the identified data mapping tools. The controller should be able to efficiently handle random and sequential memory accesses on memory transfers as small as 8 to 32 bytes.
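
The first goal above calls for processing graph primitives in a native sparse matrix format. The program documents do not name one specific format, but compressed sparse row (CSR) is a common choice for storing only the edges that exist, so here is a minimal sketch of it; the five-edge example graph is made up for illustration.

    # Minimal CSR sketch: row_ptr[v]..row_ptr[v+1] indexes v's neighbors in col_idx.
    edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]   # hypothetical (source, destination) pairs
    num_vertices = 4

    row_counts = [0] * num_vertices
    for src, _ in edges:
        row_counts[src] += 1

    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + row_counts[v]

    col_idx = [0] * len(edges)
    fill = row_ptr[:-1].copy()                 # next free slot per row
    for src, dst in sorted(edges):
        col_idx[fill[src]] = dst
        fill[src] += 1

    # Neighbors of vertex 2, read without touching any of the zero entries
    # a full adjacency matrix would have to store:
    v = 2
    print(col_idx[row_ptr[v]:row_ptr[v + 1]])   # -> [0]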

Presumably most of the hardware effort will fall to Intel, which will tap its Data Center Group, Platform Engineering Group, and Intel Labs to develop the graph analytics processor. The company stands to collect more than $100 million from DARPA over the four-and-a-half-year project.

According to Dhiraj Mallick, vice president of the Data Center Group and general manager of the Innovation Pathfinding and Architecture Group at Intel, by the middle of 2021, they and their HIVE contract partners will deliver “a 16-node demonstration platform showcasing 1,000x performance-per-watt improvement over today’s best-in-class hardware and software for graph analytics workloads.”

Mallick says commercial graph analytics products from this effort may arrive even sooner.

 <---------------------------------------------------------------------------->

https://www.darpa.mil/news-events/2017-06-02


Extracting Insight from the Data Deluge Is a Hard-to-Do Must-Do
DARPA selects five performers to develop specialized ‘graph analytics’ hardware and techniques for ferreting out insights that otherwise would remain indiscernible within our oceans of data
outreach@darpa.mil
6/2/2017

A mantra of these data-rife times is that within the vast and growing volumes of diverse data types, such as sensor feeds, economic indicators, and scientific and environmental measurements, are dots of significance that can tell important stories, if only those dots could be identified and connected in authentically meaningful ways. Getting good at that exercise of data synthesis and interpretation ought to open new, quicker routes to identifying threats, tracking disease outbreaks, and otherwise answering questions and solving problems that previously were intractable.

Now for a reality check. “Today’s hardware is ill-suited to handle such data challenges, and these challenges are only going to get harder as the amount of data continues to grow exponentially,” said Trung Tran, a program manager in DARPA’s Microsystems Technology Office (MTO). To take on that technology shortfall, MTO last summer unveiled its Hierarchical Identify Verify Exploit (HIVE) program, which has now signed on five performers to carry out HIVE’s mandate: to develop a powerful new data-handling and computing platform specialized for analyzing and interpreting huge amounts of data with unprecedented deftness. “It will be a privilege to work with this innovative team of performers to develop a new category of server processors specifically designed to handle the data workloads of today and tomorrow,” said Tran, who is overseeing HIVE.

The quintet of performers includes a mix of large commercial electronics firms, a national laboratory, a university, and a veteran defense-industry company: Intel Corporation (Santa Clara, California), Qualcomm Intelligent Solutions (San Diego, California), Pacific Northwest National Laboratory (Richland, Washington), Georgia Tech (Atlanta, Georgia), and Northrop Grumman (Falls Church, Virginia).

“The HIVE program is an exemplary prototype for how to engage the U.S. commercial industry, leverage their design expertise, and enhance U.S. competitiveness, while also enhancing national security,” said William Chappell, director of MTO. “By forming a team with members in both the commercial and defense sectors, we hope to forge new R&D pathways that can deliver unprecedented levels of hardware specialization. That can be a boost for commercial players but also can advance our military electronics supply to make sure the national defense infrastructure is empowered with the best capabilities in the world.”

Central to HIVE is the creation of a “graph analytics processor,” which incorporates the power of graphical representations of relationships in a network more efficiently than traditional data formats and processing techniques. Examples of these relationships among data elements and categories include person-to-person interactions as well as seemingly disparate links between, say, geography and changes in doctor visit trends or social media and regional strife. In combination with emerging machine learning and other artificial intelligence techniques that can categorize raw data elements, and by updating the elements in the graph as new data becomes available, a powerful graph analytics processor could discern otherwise hidden causal relationships and stories among the data elements in the graph representations.

If HIVE is successful, it could deliver a graph analytics processor that achieves a thousandfold improvement in processing efficiency over today’s best processors, enabling the real-time identification of strategically important relationships as they unfold in the field rather than relying on after-the-fact analyses in data centers. “This should empower data scientists to make associations previously thought impractical due to the amount of processing required,” said Tran. These could include the ability to spot, for example, early signs of an Ebola outbreak, the first digital missives of a cyberattack, or even the plans to carry out such an attack before it happens.

The words in the program’s name, Hierarchical Identify Verify Exploit, indicate a sequence that begins with the multi-layer graphical representations of data. This opens the way for graph analytic processing to identify relationships between data within and perhaps between the layers. A key next step is the application of verification filters and tests that can distinguish relationships that have causal connections from meaningless correlations among the data. “Taken together, these elements should allow us to exploit the enormous amount of data being generated today, to make better decisions about if, when, and how to act in furtherance of the public good and national security,” Tran said.

Read more about the HIVE program in the Broad Agency Announcement: DARPA-BAA-16-52.

 <---------------------------------------------------------------------------->

HARDWARE CONSIDERATIONS FOR FASTER AND DEEPER INSIGHTS FROM YOUR LARGE SCALE GRAPH
Fundamentally, graph analytics workloads exhibit different compute, memory, and network characteristics compared to traditional workloads. Traditional hardware-architecture efficiency techniques pose significant bottlenecks to obtaining performance at scale with graph workloads. This session discusses the design considerations for pointer-chasing graph analytics hardware, drawing parallels to Intel’s Programmable Integrated Unified Memory Architecture (PIUMA), whose goal is to enable real-time analytics on large-scale data to drive deeper and faster insights.

 <---------------------------------------------------------------------------->

 https://dmccreary.medium.com/intels-incredible-piuma-graph-analytics-hardware-a2e9c3daf8d8

https://www.textise.net/showText.aspx?strURL=https%253A//dmccreary.medium.com/intels-incredible-piuma-graph-analytics-hardware-a2e9c3daf8d8

Dan McCreary

Nov 22, 2020
10 min read
Intel’s Incredible PIUMA Graph Analytics Hardware
The Intel PIUMA graph-optimized ASIC will focus on many RISC cores and fast random memory access. (from Figure 3 of the paper)

For the last few years, I have been promoting the idea of the Hardware Graph. My assertion was that graph hardware needs a focus on simple pointer hopping at scale. What I have always stated is that the best way to do this is to use full-custom ASIC chips and memory designed for random memory access that supports pointer hopping over large memory footprints. Although I had privileged access to some insider knowledge of developments in this field, on October 13th, 2020, Intel published the results of their groundbreaking research on how they are building the next generation of graph hardware. Now that this paper is in the public domain, we can openly discuss this new architecture and its impact on the Enterprise Knowledge Graph (EKG) industry. I hope to convince you the impact could be huge!

Intel’s name for this new architecture is PIUMA: Programmable Integrated Unified Memory Architecture. Although the word “memory” is prominent in the title, it is not only about optimizing memory hardware. It goes beyond that to include the design of new RISC cores. The instruction sets of these cores are optimized for graph traversal. There are many performance gains from using more lightweight cores with smaller instruction sets. When combined with memory tuned for graph-access patterns, the goal is a 1,000x speedup over other complex instruction set computers (CISC) for graph algorithm execution.
Inspired by the DARPA HIVE Project

Most enterprise knowledge graph companies are focused on getting the most out of conventional commodity hardware that is configured to support high core counts, large terabyte RAM, and a high-bandwidth low-latency network interconnecting the servers in a cluster. Because the market for enterprise knowledge graphs is so new, these companies don’t have the budget to build their own full-custom ASICs to optimize their algorithms. But with the prompting of the DARPA HIVE program, Intel does have the resources and expertise to design and build graph optimized hardware. And to be clear, the goals of the DARPA HIVE project are impressive:

The HIVE program is looking to build a graph analytics processor that can process streaming graphs 1,000X faster and at much lower power than current processing technology.

I think after you read the Intel PIUMA paper, you will agree that they are well on their way to meeting, and in some benchmarks exceeding, these goals!

Before we dive into the predicted performance of this new hardware, a bit of background might be helpful for those of you not familiar with the concepts in graph hardware optimization.
Classification vs Connection

The paper points out that many of the recent developments in AI and machine learning are focused on the identification of objects in unstructured data. This is referred to as object classification. This includes examples like:

    Finding objects in an image (image classification)
    Finding entities (nouns) like people, places, and things in text documents (entity extraction)
    Finding words in speech (automatic speech recognition)
    Finding variations in a genomic sequence

However, once these items are discovered in unstructured data, there must be ways of connecting this information into coherent knowledge graphs. This is really the next step in AI. The recent NIPS conference papers have 136 references to the string “graph” in the titles. So clearly combining machine learning with graph representations of knowledge is a big trend in AI research.

Bottom line: modern AI must both classify and connect in real-time
Dense vs Sparse Analytics Workloads

The Intel PIUMA paper mentions that this new hardware is optimized for “sparse” workloads as opposed to “dense” workloads. But it does not go into much detail about where these workloads come from and how they are different. In general, classification problems are often solved using dense matrix representations of unstructured data. By dense, we mean that most of the values in the matrix are non-zero. For example, converting an image into a matrix would mean that every point in the image has non-zero numbers for the grayscale or color values at each point.

Unlike classification problems, connection problems tend to best be represented by a graph. We use an existing knowledge graph of known connections and then we use both rules and machine learning to predict the new relationships. This is what the semantic web community calls “inference”: the discovery of new relationships within an existing graph. There are both deterministic rules (ontologies) and machine learning algorithms that work together to perform inference.

Historically, many academics working in “symbolic AI” focused on how to store ontologies in different forms (SKOS, OWL, etc.) and when to execute these rules. Most of the Semantic Web Stack is concerned with this area of research. Sometimes the results of the inference take a long time to compute and can be stored as new materialized edges or recalculated on demand. There are still many interesting topics in inference engines that will impact knowledge graphs in the future. However, they all boil down to graph traversals, which are really just pointer hopping. No one has ever given me an inference rule that could not be converted to a graph query. Very often the rules are just a few lines of graph queries.
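
As a small illustration of the claim that inference rules reduce to traversals, here is a sketch of a transitive containment rule evaluated as repeated pointer hopping. The facts and the located_in relation are invented for the example; a real system would express this in an ontology or a graph query language rather than Python.

    # Hypothetical facts: direct "located_in" edges.
    located_in = {
        "Richland": "Washington",
        "Washington": "USA",
        "Atlanta": "Georgia",
        "Georgia": "USA",
    }

    def all_containers(place):
        """The rule 'X in Y and Y in Z implies X in Z' is just repeated
        pointer hopping along located_in edges."""
        result = []
        while place in located_in:
            place = located_in[place]
            result.append(place)
        return result

    print(all_containers("Richland"))   # -> ['Washington', 'USA']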

Bottom line: many problems in AI require sparse representations of knowledge
Example: Using GPUs to Store Knowledge Graphs

GPU hardware is best suited for transforming dense information. The native data structure of the GPU is a matrix of numerical values. If most of the data in a matrix are non-zero, you can efficiently transform it with a GPU. GPUs are designed with many small cores that work together on matrix transforms. Yet many forms of knowledge don’t fit efficiently in a matrix. For example, to represent a graph in a matrix we use a data representation called an adjacency matrix. Imagine a matrix where each vertex has a row and a column. If two vertices are connected we put a “1” in the cell at that row and column. If not, we put a “0” in that cell. For a typical knowledge graph with a million vertices, we might have an average of five connections per vertex. That means that only 0.0005% of the matrix has non-zero values. This is an inefficient way to store knowledge. For small graphs of a few hundred vertices, using an adjacency matrix representation is not a big problem. But the larger your graphs, the more inefficient GPUs become.
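
The arithmetic in the paragraph above is easy to reproduce. The sketch below uses the article's illustrative numbers (one million vertices, about five edges per vertex) to show why an adjacency matrix wastes space compared to an adjacency list or CSR.

    # Reproducing the density arithmetic from the paragraph above.
    num_vertices = 1_000_000
    avg_edges_per_vertex = 5

    matrix_cells = num_vertices ** 2                     # 1e12 cells in an adjacency matrix
    nonzero_cells = num_vertices * avg_edges_per_vertex  # ~5e6 actual edges

    density_percent = 100.0 * nonzero_cells / matrix_cells
    print(f"{density_percent:.4f}% of the adjacency matrix is non-zero")   # 0.0005%

    # An adjacency list (or CSR) stores only the edges that exist:
    # roughly 5 million entries instead of a trillion matrix cells.
    print(f"adjacency list entries: {nonzero_cells:,} vs matrix cells: {matrix_cells:,}")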

Bottom line: GPUs are not designed for graph inference and are inefficient at large knowledge graph representation.
Why CISC Architectures are Inefficient for Graph Traversal

Before we go into the Intel PIUMA architecture, there are a few items we need to cover to give readers a better understanding of why CISC processors are not appropriate for efficient graph pointer hopping needed in modern native property graphs.

If you take a look at the actual silicon real estate in many CISC processors, you see they allocate a huge amount of silicon to optimizing the performance of compiled programs. These programs have many IF/THEN/ELSE operations that need to run quickly. They are implemented as a comparison followed by a branch to other locations in code. CISC processors try to speculate on which branches will be executed and load memory references for different branches in parallel. If you are programming with a modern strongly typed language, every time you call a function, there is code that checks that the parameters are of the right type and in the right range of values. They boil down to IF (ERROR) THEN GOTO ERROR_HANDLER ELSE CONTINUE. As you can guess, going down the ERROR path is extremely rare, so a good compiler will not pre-fetch the error path.

Bottom line: most CISC processors today are heavily optimized for executing compiled code, not graph traversal.
The Top Graph Challenges

Intel clearly has done their homework. They have done a detailed analysis of many of the most common graph algorithms like PageRank and looked at how new hardware can be optimized to make execution fast. They summarize the top challenges to graph traversal via pointer chasing to random memory locations.

    Cache and Bandwidth — optimizing memory access patterns for random pointer hopping.
    Irregular Computation and Memory Intensity — creating many small cores that execute the pointer hops and are not constantly waiting for memory.
    Synchronization — if a traversal needs to update memory within a core, how to get that update into global memory efficiently.
    Scale — since a multi-trillion-vertex graph will not fit in the RAM of a single server, how traversals that reach into other nodes can be executed quickly.

Graphs Need Fast Random Memory Access

Graph algorithms don’t need the hardware complexity needed to optimize many diverse memory access patterns. A graph query is simply moving the program counter to new addresses in memory and then following simple rules about what paths to take next based on the properties of the vertex or the edges. To be efficient, graph databases need to quickly assign a traversal to a thread of execution in any core processor and let the cores skip through memory. Every transistor that doesn’t contribute to pointer hopping gets in the way. The challenge is that this memory access pattern is complicated to predict since graph traversal really looks like continuous access to random addresses. So great graph hardware needs to be optimized for different memory access patterns.
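
A minimal sketch of what that pointer hopping looks like in software: each traversal step just follows stored neighbor references, and the addresses visited are effectively random. The tiny graph, the edge labels, and the breadth-first strategy are all hypothetical, chosen only to make the access pattern visible.

    from collections import deque

    # Made-up property-graph fragment: vertex -> list of (edge label, neighbor).
    graph = {
        "alice":  [("bought", "laptop"), ("knows", "bob")],
        "bob":    [("bought", "laptop"), ("knows", "carol")],
        "carol":  [("bought", "phone")],
        "laptop": [],
        "phone":  [],
    }

    def traverse(start, edge_label, max_hops=3):
        """Breadth-first hop along edges with a given label."""
        seen, frontier, reached = {start}, deque([(start, 0)]), []
        while frontier:
            vertex, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for label, neighbor in graph[vertex]:       # the "pointer hop"
                if label == edge_label and neighbor not in seen:
                    seen.add(neighbor)
                    reached.append(neighbor)
                    frontier.append((neighbor, depth + 1))
        return reached

    print(traverse("alice", "knows"))    # -> ['bob', 'carol']
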
Linear Memory Scans Can Be Boosted with Caches

Imagine that you are searching your hard drive for a file with a specific keyword. Your computer needs to bring each document into RAM and search the RAM for a string. This is called “memory scans” and it is very common in unindexed database searches.

As the paper states:

…graph applications suffer from inefficient cache and bandwidth utilization

If you are doing a simple search for a string of text in a long unindexed document, you will be simply stepping through the document page after page. A modern CISC processor will work hard to pre-fetch pages and put them in a cache so when the CPU is ready to use that page, the memory access time will be fast.

Many relational databases are focused on getting many rows from tables. These rows are frequently laid out in memory next to each other and grouped together in “pages” of memory. Current CPUs use elaborate caching algorithms to predict the next page of memory that the CPU will need. So the complex caching silicon that helps with search and relational databases doesn’t help us with graph traversal.

Bottom line: Graph databases rarely need to do large memory scans. Elaborate caching hardware will not help graph algorithms.

As a side note, converting any low-cardinality attribute of a graph (like PersonGenderCode) into a vertex is a way of using the outbound edges of a vertex like an index. If you need to filter all members by gender, you just traverse the edges from the right gender vertex. That avoids having to create secondary indexes that slow updates.
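
A sketch of that side note, with made-up data: when the attribute is modeled as a vertex, the filter becomes a one-hop traversal from the attribute vertex instead of a scan over every person.

    # Attribute-as-column: filtering means scanning every person.
    people = {"alice": "F", "bob": "M", "carol": "F"}
    by_scan = [p for p, gender in people.items() if gender == "F"]

    # Attribute-as-vertex: follow the edges out of the "gender:F" vertex instead.
    gender_vertex_edges = {
        "gender:F": ["alice", "carol"],
        "gender:M": ["bob"],
    }
    by_traversal = gender_vertex_edges["gender:F"]

    assert sorted(by_scan) == sorted(by_traversal)
    print(by_traversal)   # -> ['alice', 'carol']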

The paper continues to describe the results of their analysis of the graph analytics algorithms and how they are optimizing their hardware to maximize the performance of native graphs that use direct pointer-hops. For now, I am going to defer the analysis of these sections to a later blog and focus on the performance results of the hardware simulations.
Summarizing Performance Simulations

The paper concludes with a comparison of a PIUMA node with a current Xeon Gold 6140 system with 4 sockets and 18 cores per socket. The PIUMA node will have 256 PIUMA “blocks,” and each block will have a number of both multi-threaded and single-threaded cores. Doing an exact comparison of thread counts may not be very meaningful since the cores are very specialized. But in general, we are going from 144 threads on the Xeon node to 16K threads on the PIUMA node. The key to the performance speedups they are predicting from their hardware simulations is not just high thread counts, but a combination of high thread counts with enhanced memory and network hardware.

The paper uses a kernel called Sparse Matrix Dense Vector Multiplication (SpMV), which is a good proxy for the types of calculations done in PageRank-style algorithms. The data is stored in a format called Compressed Sparse Row (CSR), also known as Yale format. The simulator shows a 10x performance improvement even before taking the memory improvements into consideration. Once the full memory-access simulation is included, we get a 29x performance improvement on a single node. Then, by extrapolating the simulation to a 16-node configuration, we see a 467x speedup. By adding more nodes to the simulation, we can see that the DARPA HIVE goal of 1,000x is within reach.
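For readers unfamiliar with CSR and SpMV, here is a minimal sketch (my own illustration of the standard format, not code from the paper) of a sparse matrix stored as three CSR arrays and multiplied by a dense vector; the x[col_idx[k]] gathers are the irregular memory accesses that this kind of hardware is built to accelerate.

    # CSR (Yale) format: three arrays describe the nonzeros of a sparse matrix.
    #   values  - the nonzero entries, stored row by row
    #   col_idx - the column of each nonzero
    #   row_ptr - where each row's nonzeros begin in `values` (length = rows + 1)
    values  = [10.0, 20.0, 30.0, 40.0]
    col_idx = [0,    2,    1,    2   ]
    row_ptr = [0, 2, 3, 4]     # row 0 -> values[0:2], row 1 -> [2:3], row 2 -> [3:4]

    def spmv(values, col_idx, row_ptr, x):
        """y = A @ x for a CSR matrix A; the x[col_idx[k]] reads are scattered gathers."""
        y = [0.0] * (len(row_ptr) - 1)
        for row in range(len(y)):
            for k in range(row_ptr[row], row_ptr[row + 1]):
                y[row] += values[k] * x[col_idx[k]]
        return y

    print(spmv(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))   # -> [70.0, 60.0, 120.0]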

The paper also provides simulation results for many other graph algorithms. One of the most impressive speedups is for random walks, at 2,606x. Random walks are critical for calculating graph structures called graph embeddings — another topic that has become central to graph analytics. Although the hardware simulation and the extrapolation do not yield exact numbers, the results are a good sign that Intel PIUMA is heading in the right direction.
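A random walk itself is trivial to express; the hard part is running billions of them, each one a chain of dependent random reads. A toy sketch (my own illustration, not from the paper):

    import random

    # Toy adjacency list; in practice this would be a billion-edge graph.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

    def random_walk(start, length):
        """One fixed-length walk; every step depends on the memory read before it."""
        walk = [start]
        for _ in range(length):
            neighbors = adj[walk[-1]]
            if not neighbors:
                break
            walk.append(random.choice(neighbors))
        return walk

    # DeepWalk/node2vec-style embeddings are trained on many such walks,
    # treating each walk like a sentence of vertex "words".
    print(random_walk(0, 5))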

Bottom line: Custom silicon could give graph analytics a 1,000x speed improvement over the existing hardware.
Next Steps and The Future

As the paper mentions, Intel is still early in the design and fabrication of the new PIUMA architecture. Getting the silicon working, developing all the software, and porting graph databases to the new hardware will take time. However, I think that any hardware company in the AI space will need to compete against Intel PIUMA to be a contender. The contenders in the graph hardware space include Graphcore, NVIDIA (with its proposed acquisition of ARM), and now AMD and Xilinx after their merger closes next year. All of these firms have teams of engineers who appreciate the need for non-matrix parallel computation. Each will need to invest hundreds of millions of dollars to build full-custom ASICs optimized for pointer hopping. This new hardware will compete with existing high-performance computing products like the Cray Graph Engine (Cray is now part of Hewlett Packard Enterprise) but at a tiny fraction of the cost of those systems.

All of these contenders will be pressed to provide not just low-level C-language interfaces; they will also need to leverage the new GQL standard to implement solutions with graph database vendors. There is still much work to be done before it is easy to take advantage of the 1,000x improvements that graph-optimized architectures can offer our Enterprise Knowledge Graphs.
Apology and Gratitude

First, I should have gotten this blog post out several weeks ago, right after the paper came out. I was in the middle of rolling out a new graph training curriculum and got quite behind on my blogging.

I want to close by thanking both the people at DARPA who sponsored the HIVE challenge and the incredible team at Intel that is designing and building the new hardware. You may not have started your project scope with "let's lower the cost of healthcare for everyone," but I truly believe that this will be one of the "unintended consequences" of your leadership. So thank you and good luck!
 <---------------------------------------------------------------------------->

The Berkeley Par Lab:
Progress in the Parallel Computing Landscape
David Patterson, Dennis Gannon, and Michael Wrinn
Editors
August 23, 2013

Foreword

Around the middle of the last decade, the computing landscape began to change in several significant ways. The most striking development was that processor performance was not accelerating at its previous spectacular rate, and the traditional approaches of decreasing feature size and improving compiler technology were no longer yielding the results they had in years past. On the other hand, it was equally clear that future applications were going to require far more computational power to support the many advances in human-computer interaction and machine learning that were on the horizon.
     At that time, Intel was releasing its first multicore systems and parallelism was an obvious solution to the performance challenge. While parallel computing has been part of the computer designer’s toolbox for over 30 years, its use had mostly been restricted to high-performance “supercomputers.” We had little experience with using such techniques to enhance the performance of personal computing devices.
     It was clear that the industry needed a basic research program to understand how our client applications and systems could be redesigned to take advantage of this parallel computing revolution.
   ... [...] ...
 Their goal was to reinvent the entire application-software-hardware stack using parallelism as the foundational concept.
   ... [...] ...
     This book is the first collection to present a comprehensive view of how parallel computing will transform the experience of using computers. It lays the groundwork for a new generation of systems and applications that will not only change the industry, it will usher in the next revolution in computing.

Craig Mundie
Microsoft Corporation

   ... [...] ...

Origins and Vision of the UC Berkeley
Parallel Computing Laboratory

David Patterson
   ... [...] ...

The seven dwarfs are:
(The dwarfs were also called motifs, as some preferred we find a word other than dwarf.)

1. Structured Grids – including adaptive mesh refinement
2. Unstructured Grids
3. Fast Fourier Transform (later Spectral Methods)
4. Dense Linear Algebra
5. Sparse Linear Algebra
6. Particles (later N-body)
7. Monte Carlo

 embedded computing

8. Finite State Machines—for control applications
9. Combinational Circuits—for security and error correction

 electronic design automation

10. Graph Algorithms
11. Backtrack Branch and Bound

(See Chapter 3 on Content-Based Image Retrieval.)

12. Dynamic Programming

computations in graphical models and the nature of computations in graph algorithms, particularly graph traversal

13. Graphical Models—for probabilistic reasoning in Machine Learning

 Nevertheless, it became clear that the dwarfs failed to realize the principal
goal behind their initial definition: to guide and direct the development of future microprocessor architectures and microarchitectures.
    Dwarfs fell short of guiding microarchitecture development because a given dwarf might not even define a particular algorithm. In other words, for a given dwarf, such as N-body, there could be radically different algorithms, each of which implied completely different approaches to computation and communication. These different approaches to organizing the computation and communication of the solution might have very different implementation styles in software, and the different software implementations might in turn prefer different microarchitectural support.
    Thus, while the 13 dwarfs provided a terse palette of computations that a programmer might need to consider, they did not of themselves aid in determining a strategy for parallelizing an application nor did they make particular recommendations for microarchitectural elements that might aid in their efficient execution.
    So, what precisely was the use of the dwarfs? Tim Mattson suggested that while the dwarfs did not indicate a definite micro-architecture or software implementation, they did provide a nice bridge between the computationally focused thinking of application developers and particular approaches to parallelizing computations that he had expressed in his pattern language for parallel programming.

   ... [...] ...

    Mattson et al.’s 2004 book “Patterns for Parallel Programming” [18] was the first such attempt to systematize parallel programming using a complete pattern language.

   ... [...] ...

 <---------------------------------------------------------------------------->

http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=1686852&highlight=

Cray Agrees to Sell Interconnect Hardware Assets to Intel
Company to Host Conference Call to Discuss the Definitive Agreement

SEATTLE, WA, Apr 24, 2012 -- Global supercomputer leader Cray Inc. (NASDAQ: CRAY) today announced it signed a definitive agreement to sell its interconnect hardware development program and related intellectual property to Intel Corporation (NASDAQ: INTC) for $140 million in cash.

"This agreement is evidence of the leadership position we've established in high performance computing, and is an exciting win for our customers, our company and our shareholders," said Peter Ungaro, president and CEO of Cray. "By broadening our relationship with Intel, we are positioned to further penetrate the HPC market and expand on our industry-leading technologies in support of our Adaptive Supercomputing vision. Our product roadmap remains intact as we continue to build the highly differentiated, tightly integrated supercomputers that our customers have come to expect from Cray. This agreement also dramatically strengthens our balance sheet and increases our options for further growth, profitability and creating shareholder value."

Highlights of the agreement include:

--  Cray to receive $140 million in cash at closing;
--  Cray will continue to develop, sell and support current product lines,
    as well as the Company's next-generation supercomputer code-named
    "Cascade";
--  Cray has opportunities to leverage important differentiating features
    of certain future Intel products;
--  Cray to retain certain rights to use the transferred assets and
    intellectual property in Cray products;
--  Up to 74 Cray employees will join Intel;
--  The transaction is expected to close relatively quickly, but in any
    event before the end of the current quarter, subject to customary
    closing conditions.

 <---------------------------------------------------------------------------->

  • http://www.realworldtech.com/iedm-2005/15/
     • At IEDM 2005, Sony announced a breakthrough in magnetic random access memory technology (MRAM). MRAM devices are highly attractive as a potential, unified replacement memory in system-on-chip applications. Similar to non-volatile flash memory (NVRAM), MRAM does not require power to maintain data storage. However, unlike NVRAM, MRAM devices have the advantage that write speed is comparable to read speed. Furthermore, MRAM devices are not known to be constrained to a limited number of erase-write cycles. Also, one final advantage to MRAM is that MRAM storage is based on switching magnetic fields and no active silicon real estate for charge storage is needed. Because no active silicon real estate is needed, MRAM cells can be fabricated in between the metal layers above active logic circuits, and MRAM could be an ideal candidate as local memory for any embedded processor, simultaneously replacing SRAM, eDRAM, and any embedded NVRAM. 
     • MRAM devices are known to have three drawbacks when compared to other types of embedded memory: high write current, relatively large cell sizes, and compatibility with logic-based processes. The process compatibility issue means that suitable materials had to be found and integrated within the framework of the process flow of a logic-targeted process to preserve the functionality and performance of transistors on the surface of the active silicon and enable the integration of embedded MRAM cells with logic circuits. Although the process compatibility issue is a serious concern, the cell size issue for MRAM is less of a concern for embedded systems. The current generation of MRAM cells (~40 f²) is larger than eDRAM (~25 f²) and NVRAM (~4 f²); it is, however, far smaller than SRAM (>100 f²). Moreover, the fact that MRAM cells do not require the use of active silicon means the cell size issue is less of a concern, particularly when compared to SRAM, eDRAM, or NVRAM on a given design. However, the final issue of high write current needed to reverse magnetic fields remained a concern for embedded systems, since the high write current required associated costs in additional circuits and peak power consumption. Fortunately, the breakthrough announced by Sony at IEDM 2005 is precisely targeted to reduce the high write current of MRAM devices. 

 <---------------------------------------------------------------------------->

  • Intel and Micron announce a new type of memory technology
  • it is resistive memory or ReMem for short
  • implementation of memristor technology  
  • we should realize that the incumbent technologies are entrenched in the market, and there is an associated cost and learning curve to switch to a new technical breakthrough
  • http://semiaccurate.com/2015/07/29/intel-micron-introduce-new-memory-type-3d-xpoint/
  • http://www.eetimes.com/document.asp?doc_id=1279473
  • http://venturebeat.com/2013/08/05/crossbar-says-it-will-explode-the-60b-flash-memory-market-with-resistive-ram-which-stores-a-terabyte-on-a-chip/
  • http://www.webopedia.com/TERM/R/resistive_memory_reram_rram.html
  • https://en.wikipedia.org/wiki/Resistive_random-access_memory
  • https://en.wikipedia.org/wiki/Memristor
  • http://www.eetimes.com/document.asp?doc_id=1327292
     • Jim Handy, principal analyst with Objective Analysis, was on-site for the announcement, and as a result, had access to a second question and answer session with Intel and Micron technology staff. When he asked what memory technology 3D XPoint was closest to, he was told there was no technology as mature as this one. 

 <---------------------------------------------------------------------------->

   Memory technology development 
     HP (U.S. based) and SK Hynix (Korea based): memristor 
     Intel (U.S. based) and Micron Technology (U.S. based): ReMem (resistive memory, 3D XPoint) 
     Crossbar: RRAM 
     Sony: MRAM (magnetic random access memory) 

   Memory products (2017) 
     DRAM, flash 


M. Mitchell Waldrop, The Dream Machine, 2001                                [ ]

p.247
“All through their design”, they wrote, “both in hardware and in programming, [the IBM engineers] seem to have taken the view that a system is a static thing which is only created once and never modified.”  In particular, IBM had built in the assumption that each computer would have one and only one processing unit sitting at its center like a spider in a web, with all the memory banks and input-output equipment feeding into it. And indeed, for most business applications, that assumption was perfectly adequate. For the Project MAC representatives, however, it was wrongheaded on two counts. First, because they wanted to create an information utility that would be able to grow and evolve in ways they could not anticipate, they needed a machine that could operate with MANY central processing units at once. Not only would this greatly enhance reliability--since if one processor failed, the others could keep the system running--but it would provide a natural path for expansion: you could just add more processors.  With the System/360 design, by contrast, there would be no way to upgrade without replacing the whole computer. Second, because the job of coordinating all those processors would have to be handled by Multics itself, which would reside in the computer's memory banks, Corbató and his colleagues were looking for a “memory-centered” architecture. 

p.248
   The upshot was that Corbató, Glaser, Dennis, and Graham politely thanked their hosts for the information and quietly made plans to look elsewhere. 

   (M. Mitchell Waldrop, The Dream Machine: J. C. R. Licklider and the Revolution That Made Computing Personal, 2001) 

 <---------------------------------------------------------------------------->

http://www.theregister.co.uk/2014/12/18/crossbar_jumps_over_higher_rram_bar/

When does Handy think RRAM could appear in products?

"I spoke with someone who was in the session [a highly-respected process expert] who lauded Crossbar's promotional efforts but said that the technology is still a long way from production.

"My own position has long been that I see 2023 as the year that this, or some competing technology, will displace flash or DRAM. Entrenched technologies will be with us for some time. Until then, this and all the other technologies vying to replace DRAM and flash will be relegated to niches."


http://www.technologyreview.com/featuredstory/536786/machine-dreams/

Machine Dreams

To rescue its struggling business, Hewlett-Packard is making a long-shot bid to change the fundamentals of how computers work.

    By Tom Simonite on April 21, 2015

 ● memristor chips (Combining memory and storage)
    Edwin Kan, a professor at Cornell University who works on memory technology, says that progress on memristors and similar devices appeared to stall when companies tried to integrate them into dense, reliable chips. 
    Dmitri Strukov, one of Williams’s former collaborators at HP, says memristors have yet to pass a key test. Strukov, an assistant professor at the University of California, Santa Barbara, and lead author on the 2008 paper announcing the memristor, says that while technical publications released by HP and SK Hynix have shown that individual memristors can be switched trillions of times without failing, it’s not yet clear that large arrays perform the same way. “That’s nontrivial,” he says.
 ● photonic interconnects 
 

The two-tier system of storage and memory means computers spend a lot of time and energy moving data back and forth just to get into a position to use it. This is why your laptop can’t boot up instantly: the operating system must be retrieved from storage and loaded into memory. One constraint on the battery life of your smartphone is its need to spend energy keeping data alive in DRAM even when it is idling in your pocket.

That may be a mere annoyance for you, but it’s a costly headache for people working on computers that do the sort of powerful number-crunching that’s becoming so important in all kinds of industries, says Yuanyuan Zhou, a professor at the University of California, San Diego, who researches storage technologies. “People working on data-intensive problems are limited by the traditional architecture,” she says.

http://cseweb.ucsd.edu/~yyzhou/

The Machine is designed to overcome these problems by scrapping the distinction between storage and memory. A single large store of memory based on HP’s memristors will both hold data and make it available for the processor. Combining memory and storage isn’t a new idea, but there hasn’t yet been a nonvolatile memory technology fast enough to make it practical, says Tsu-Jae King Liu, a professor who studies microelectronics at the University of California, Berkeley. Liu is an advisor to Crossbar, a startup working on a memristor-like memory technology known as resistive RAM. It and a handful of other companies are developing the technology as a direct replacement for flash memory in existing computer designs. HP is alone, however, in saying its devices are ready to change computers more radically.

To make the Machine work as well as Fink imagines, HP needs to create memristor memory chips and a new kind of operating system designed to use a single, giant store of memory. Fink’s blueprint also calls for two other departures from the usual computer design. One is to move data between the Machine’s processors and memory using light pulses sent over optical fibers, a faster and more energy-efficient alternative to metal wiring. The second is to use groups of specialized energy-efficient chips, such as those found in mobile devices, instead of individual, general-purpose processors. The low-energy processors, made by companies such as Intel, can be bought off the shelf today. HP must invent everything else.

Rich Friedrich, director of system software for the Machine

Sharad Singhal, who leads HP’s data analysis research, expects particularly striking improvements for problems involving data sets in the form of a mathematical graph—where entities are linked by a web of connections, not organized in rows and columns. 

The switching effect came from a layer of titanium, used like glue to stick the rotaxane layer to the electrodes. More surprising, versions of the devices built around that material fulfilled a prediction made in 1971 of a completely new kind of basic electronic device. When Leon Chua, a professor at the University of California, Berkeley, predicted the existence of this device, engineering orthodoxy held that all electronic circuits had to be built from just three basic elements: capacitors, resistors, and inductors. Chua calculated that there should be a fourth; it was he who named it the memristor, or resistor with memory. The device’s essential property is that its electrical resistance—a measure of how much it inhibits the flow of electrons—can be altered by applying a voltage. That resistance, a kind of memory of the voltage the device experienced in the past, can be used to encode data.

HP’s latest manifestation of the component is simple: just a stack of thin films of titanium dioxide a few nanometers thick, sandwiched between two electrodes. Some of the layers in the stack conduct electricity; others are insulators because they are depleted of oxygen atoms, giving the device as a whole high electrical resistance. Applying the right amount of voltage pushes oxygen atoms from a conducting layer into an insulating one, permitting current to pass more easily. Research scientist Jean Paul Strachan demonstrates this by using his mouse to click a button marked “1” on his computer screen. That causes a narrow stream of oxygen atoms to flow briefly inside one layer of titanium dioxide in a memristor on a nearby silicon wafer. “We just created a bridge that electrons can travel through,” says Strachan. Numbers on his screen indicate that the electrical resistance of the device has dropped by a factor of a thousand. When he clicks a button marked “0,” the oxygen atoms retreat and the device’s resistance soars back up again. The resistance can be switched like that in just picoseconds, about a thousand times faster than the basic elements of DRAM and using a fraction of the energy. And crucially, the resistance remains fixed even after the voltage is turned off.

 in 2010 HP announced that it had struck a deal with the South Korean memory chip manufacturer SK Hynix to commercialize the technology

 <---------------------------------------------------------------------------->

   Everything is connected.  You can separate them out.  You can ignore them. 
   But that does not mean, they are not connected.  I am typing this out on a 
   notebook computing device using the technology and micro-electronics that 
   have caused miscarriages and birth defects in women.  
   
 <---------------------------------------------------------------------------->

Bloomberg Businessweek
June 19, 2017

Chipmaking moved to Asia.  Miscarriages and birth defects followed 



The price of a digital world
by Cam Simpson
   with Ben Elgin, Heesu Lee, and Kanoko Matsuyama

pp.58-65

p.58
25 years ago, U.S. chipmakers vowed to stop using chemicals that caused miscarriages and birth defects. And they did──by outsourcing the danger to women in Asia

p.58
money can cloud science (see: tobacco companies vs. cancer researchers). 

p.58
1984, Harris Pastides, associate professor of epidemiology at the University of Massachusetts at Amherst. 

p.58
James Stewart
Making computer chips involved hundreds of chemicals. The women on the production line worked in so-called cleanrooms and wore protective suits, but that was for the chips' protection, not theirs. The women were exposed to, and in some cases directly touched, chemicals that included reproductive toxins, mutagens, and carcinogens. 

p.58
Reproductive dangers are among the most serious concerns in occupational health, because workers' unborn children can suffer birth defects or childhood diseases, and also because reproductive issues can be sentinels for disorders, especially cancer, that don't show up in the workers themselves until long after exposure. 

p.58
   In epidemiology, follow-up studies usually get bigger and tougher, and for that reason they often contradict one another. But by December 1992, something rare had happened. All three studies──all paid for by the industry──showed similar results: roughly a doubling of the rate of miscarriages for thousands of potentially exposed women. This time the industry reacted quickly. SIA [Semiconductor Industry Association, representing International Business Machines Corp., Intel Corp., and about a dozen other top technology companies] pointed to a family of toxic chemicals widely used in chipmaking as the likely cause and declared that its companies would accelerate efforts to phase them out. IBM went further: It pledged to rid its global chip production of them by 1995. 

p.60
As semiconductor production shifted to less expensive countries, the industry's promised fixes do not appear to have made the same journey, at least not in full. Confidential data reviewed by Bloomberg Businessweek show that thousands of women and their unborn children continued to face potential exposure to the same toxins until at least 2015. Some are probably still being exposed today. Separate evidence shows the same reproductive-health effects also persisted across the decades. 

p.60
Two young women working side-by-side at the same Samsung Electronics workstation and using the same chemicals contracted the same aggressive form of leukemia. The disease kills only 3 out of every 100,000 South Koreans each year, but these young co-workers died within 8 months of each other. And their disease was among those most clearly tied to carcinogens. 

p.60
Kim Myoung-hee, South Korean physician, also an epidemiologist 
“I had no idea that this is a chemical industry, not the electronics industry”, she says. 
([ electronics industry is the end-user of a category of chemical mixtures produced by chemical companies ])

p.60
   Physics drives the design of microchips, but their production is mostly about chemistry. In a basic sense, chemicals and light combine to photographically print circuits onto silicon wafers. Gordon Moore, a founder of Intel and a major figure in the creation of the modern chip in 1960, is a chemist. He worked closely on the printing process with a physicist named Jay Last. “We were putting into industrial production a lot of really nasty chemicals”, Last said in an interview he did with Moore for an oral history project of the Chemical Heritage Foundation. “There was just no knowledge of these things, and we were pouring stuff down into the city sewer system.”

p.60
Authorities would end up designating more Superfund hazardous waste sites in Santa Clara County, the heart of Silicon Valley, than in any other county in the U.S.  ([look up designated Superfund hazardous waste sites in U.S.])

p.60
The toxic ingredients were called ethylene glycol ethers, or EGEs. They also became key ingredients in solvent mixtures known as strippers, which are used to clean the chips during printing. 

ethylene glycol ethers

p.60
The IBM study found miscarriage rates tripled for women who worked specifically with EGEs. Separate studies showed EGEs easily permeated rubber gloves, like water through a net, and that skin absorption was the most dangerous route, leading to exposure rates 500 to 800 times above the level deemed safe. The dangers were so abundantly clear that the U.S. Occupational Safety and Health Administration in 1993 formally proposed exposure levels so minute that, practically speaking, companies would have to ban EGEs to comply. 

p.60
Historical reproductive-health studies connected microelectronics production to fatal birth defects in the children of male workers, childhood cancers among the children of female workers, and infertility and prolonged menstrual cycles. 

p.60
... the chemicals [EGEs - ethylene glycol ethers] also had become classified as Category 1 reproductive toxins under international standards, and European regulators had placed them on a list of the most highly toxic chemicals known to science, designating them Substances of Very High Concern. 

p.63
One was benzene, which was known to cause the rare form of leukemia that killed Samsung co-workers, and another was the most toxic of the EGEs, a chemical commonly called 2-Methoxyethanol, or 2-ME. 

p.63
... the two photoresists with the highest concentrations of 2-ME were made by the same manufacturer: the Shin-Etsu Chemical Co. in Tokyo. 

p.63
A company in Taipei called Topco Scientific Co. is the exclusive distributor for Shin-Etsu chemical in Taiwan and China, ... 

p.63
   That the risks could persist overseas was flagged more than two decades ago by the Johns Hopkins researchers working at IBM. They knew that EGEs were cheap, effective, and abundantly available and that less-dangerous alternatives were far more expensive. Their published report cited the higher costs of safety and specifically warned that this could mean the dangers would persist overseas. 

p.64
the newspaper Kyunghyang Shinmun in its March 1, 1996, edition. 


ethylene glycol ethers


p.64
   After IBM started buying memory chips from South Korea, it cut production in at least one of the plants where Correa and his colleagues found the elevated miscarriage rates. Other members of the Semiconductor Industry Association also made deals in South Korea similar to IBM's, including Motorola, Texas Instruments, and HP. Intel began buying Samsung memory chips to put into its then world-dominating Pentium-processor chipsets in 1996. To the extent the South Koreans continued using products containing EGEs, the industry was in effect trading exposure in U.S. workers for exposure in women overseas. 

p.64
   Samsung and SK Hynix have dominated global production of memory chips for two decades──they controlled more than 74 percent of the market in 2015. Their chips are in iPhones, Android phones, laptops, cars, televisions, and game consoles──anything with an electronic brain. It's a safe bet that virtually every consumer in the industrialized world has purchased products containing memory chips made by Samsung or SK Hynix. 

p.64
   Kim, the epidemiologist, says the secrecy of these settlements is a reason there was so little discussion for so long of the risks in chipmaking. “It was not published in academic papers”, she says. “Just some hidden settlements between the companies and some victims.”

p.64
CMR agents──shorthand for carcinogens, mutagens, and reproductive toxins. In addition to benzene and EGEs, they've historically included arsenic, hydrofluoric acid, and trichloroethylene. 

p.64
“trade secret” designations

p.65
   The business of selling chemicals to chipmakers is worth $20 billion a year, according to a February 2016 report by Frost & Sullivan, a market research company in Mountain View, California. Pure EGEs [ethylene glycol ethers] are manufactured by at least 24 companies in 10 countries, according to a 2010-11 directory of chemical manufacturers published by SRI Consulting.  U.S. producers include Dow Chemical Co., which makes EGEs [ethylene glycol ethers] in Texas, and Monument Chemical, which makes them in Kentucky. International producers include BASF in Germany, Switzerland-based Clariant, and Sinopec Tianjin, which is a subsidiary of the China Petroleum & Chemical Corp., the state-owned giant. 

p.65
Women are also critical to the industry in Taiwan, Singapore, and Malaysia, which have all relied heavily on foreign migrant workers. 

 <---------------------------------------------------------------------------->

Kevin Kelly, out of control, 1994                                           [ ]

p.356
... Farmer learned three important things about predicting the future ...
   • First, you CAN milk underlying patterns inherent in chaotic systems to make good predictions.
   • Second, you don't need to look very far ahead to make a useful prediction. 
   • And third, even a LITTLE BIT of information about the future can be valuable. 

p.357
Prediction Company
“seeing further is not seeing better”
When immersed in real world complexity, where few choices are clear cut and every decision is clouded by incomplete information, evaluating choices too far ahead becomes counterproductive. 

pp.357-358
However, we never have sufficient information to make a fully informed decision. We operate in the dark. To compensate, we use rules of thumb or rough guidelines. Chess rules of thumb are actually pretty good rules to live by. (Notes to my daughters: Favor moves that increase options; shy away from moves that end well but require cutting off choices; work from strong positions that have many adjoining strong positions. Balance looking ahead with really paying attention to what's happening now on the whole board.) 


p.358
We employ limited look-ahead guided by rules of thumb. 

   (Kevin Kelly, out of control, 1994, filename: ooc-mf.pdf  )

 <---------------------------------------------------------------------------->


