Data Science Conference: Full Schedule

1:00pm PDT

Welcome

Speakers

Jan E. Odegard

Executive Director Ken Kennedy Institute/ Associate Vice President Research Computing, Rice University

Jan E. Odegard Executive Director, Ken Kennedy Institute for Information Technology and Associate Vice President, Research Computing & Cyberinfrastructure at Rice University. Dr. Odegard joined Rice University in 2002, and has over 15 years of experience supporting and enabling research... Read More →

Odegard 2017 DAY1 DS Welcome pdf

Monday October 9, 2017 1:00pm - 1:15pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Welcome

Host Organization Ken Kennedy Institute for Information Technology

1:15pm PDT

Keynote: What Did the Public Really Know About Harvey, and How Can We Better Inform Them?, Eric Berger, Space City Weather

WATCH THE PRESENTATION

Speakers

Eric Berger

Space City Weather

Eric Berger is the editor for Space City Weather. As a certified meteorologist, Berger has written about weather in the Houston area for more than a decade. Formerly a journalist with the Houston Chronicle, he is well known for providing level-headed weather reporting. Following his... Read More →

1 Berger flooding1 pdf

Monday October 9, 2017 1:15pm - 2:00pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Keynote

Host Organization Ken Kennedy Institute for Information Technology

2:00pm PDT

Plenary: Learning Discrete Markov Random Fields with Optimal Runtime and Sample Complexity, Adam Klivans, University of Texas

WATCH THE PRESENTATION

We give a simple, multiplicative-weight update algorithm for learningundirected graphical models or Markov random fields (MRFs). Theapproach is new, and for the well-studied case of Ising models orBoltzmann machines, we obtain an algorithm that uses a nearly optimalnumber of samples and has quadratic running time (up to logarithmicfactors), subsuming and improving on all prior work. Additionally, wegive the first efficient algorithm for learning Ising models overgeneral alphabets.
Our main application is an algorithm for learning the structure oft-wise MRFs with nearly-optimal sample complexity (up to polynomiallosses in necessary terms that depend on the weights) and running timethat is n^t. In addition, given n^t samples, we can also learn theparameters of the model and generate a hypothesis that is close instatistical distance to the true MRF. All prior work runs in timen^d for graphs of bounded degree d and does not generate ahypothesis close in statistical distance even for t=3. We observe thatour runtime has the correct dependence on n and t assuming thehardness of learning sparse parities with noise.
Our algorithm--the Sparsitron-- is easy to implement (has only oneparameter) and holds in the on-line setting. Its analysis applies aregret bound from Freund and Schapire's classic Hedge algorithm. Italso gives the first solution to the problem of learning sparseGeneralized Linear Models (GLMs).
Joint work with Raghu Meka

Speakers

Adam Klivans

Associate Professor, The University of Texas at Austin

Monday October 9, 2017 2:00pm - 2:30pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Plenary

Host Organization Ken Kennedy Institute for Information Technology

2:30pm PDT

Plenary: Shared Infrastructure for Data Science, Wes McKinney, Two Sigma

WATCH THE PRESENTATION

Wes McKinney makes the case for a shared infrastructure for data science, discusses the open source community's efforts on Apache Arrow, and offers a vision for seamless computation and data sharing across languages.

Speakers

Wes McKinney

Software Architect, Two Sigma

Wes McKinney is a software architect at Two Sigma Investments. He is the creator of Python’s pandas library and a PMC member for Apache Arrow and Apache Parquet. He wrote the book Python for Data Analysis. Previously, Wes worked for Cloudera and was the founder and CEO of DataP... Read More →

mckinney rice slides pdf

Monday October 9, 2017 2:30pm - 3:00pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Plenary

Host Organization Ken Kennedy Institute for Information Technology

3:00pm PDT

Break

Monday October 9, 2017 3:00pm - 3:30pm PDT
Event Hall, BRC 6500 Main Street

Break

Host Organization Ken Kennedy Institute for Information Technology

3:30pm PDT

Tools and Infrastructure: Designing Next Generation Resource-Frugal Deep Learning Algorithms

WATCH THE PRESENTATION

Current deep learning architectures are growing larger in order to learn from complex datasets. The quest for a unified machine learning algorithm which can simultaneously generalize from diverse sources of information (or transfer learning) has made it imperative to train astronomical sized neural networks with enormous computational and memory overheads.

Conversely, the data sizes are exploding, and furthermore, there is another growing trend to bring deep learning to low-power, embedded devices. Current algorithms, including the well-known backpropagation algorithm, for training and testing, do not meet the resource constraints required by future big-data and IoT systems. Massive deep architectures need giant matrix multiplication operations to train millions of parameters. The matrix operations, associated with both training and testing of deep networks, are costly from a computational, memory and energy standpoint.

In this talk, we will show a novel set of algorithms to deal with computational, energy and memory challenged associated with massive networks. We will discuss our recent success in developing a novel hashing based scalable and sustainable technique to reduce the computations associated with backpropagation algorithm drastically. Utilizing the magic that made search algorithms over web faster, we will demonstrate how our algorithms only need 5% of the total computations, and they can still manage to keep on average within 1% of the accuracy of the classical backpropagation.

A unique property of the proposed hashing based back-propagation is that the updates are always sparse. Due to the sparse gradient updates, our algorithm is ideally suited for asynchronous and parallel training leading to near linear speedup with increasing number of cores. We demonstrate the scalability and sustainability (energy efficiency) of the new proposed algorithm over several real datasets.

In the end, if time permits, we will show a simple algorithm for reducing the memory requirements associated with deep networks. Using our algorithms, we can train 100,000 classes with 400,000 features, on a single machine while only needing 5% or less memory required to store all the weights. Running a simple logistic regression on this data, the model size of 320GB is unavoidable.

Moderators

Trond Elefsen

Invatare

Roy Keyes

Houston Data Science Group

Speakers

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp

Anshumali Shrivastava's research focuses on Large Scale Machine Learning, Scalable and Sustainable Deep Learning, Randomized Algorithms for Big-Data and Graph Mining.

1 Talk pdf

Monday October 9, 2017 3:30pm - 3:50pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session A

Host Organization Ken Kennedy Institute for Information Technology

3:30pm PDT

Extending Tools & Methods: Scalable Real-Time Analytics for Cluster-Based Production Forecasting

WATCH THE PRESENTATION

A cornerstone question in Production and Reservoir Engineering is to be able to perform reliable production forecasting and reserves estimation on a timely basis. Given the high operational cost and complexity of operating a well, it is critical to proactively anticipate potential problems and act on them early. To this end, clustering can be used to group similar production curves to understand different field development scenarios. However, there is very few literature around the use of clustering to facilitate forecasting applications. The reason is that traditional distance metrics may not be suitable to align multiple time-dependent features associated to the physics and operational structure of the problem. This work focuses on the integration of a scalable real-time data platform with a novel clustering approach to describe different production forecasting trends and help detect temporal anomalies. We provide a high level description of the data platform as well as the basis for supporting our clustering approach in contrast to the widely used K-means algorithm.

Moderators

Scott Morton

Francisco Sanchez

HEDS Group

Speakers

Hector Klie

Chief Executive Officer, DeepCast.ai

I'm an experienced computational and data scientist with a passion to develop innovative physics- and data-driven solutions for a wide range of engineering and geoscientific applications in Oil & Gas. Hence, I'm very interested in discussing any topic related to Machine Learning... Read More →

Arturo Klie

CTO, DeepCast.ai

Duc Le

Sr Reservoir Engineer & Software Engineer, DeepCast.ai

1 Rice 2017 DSConf Klie pdf

Monday October 9, 2017 3:30pm - 3:50pm PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session B

Host Organization Ken Kennedy Institute for Information Technology

3:50pm PDT

Tools and Infrastructure: Deep Neural Network Hyperparameter Optimization with Genetic Algorithms

WATCH THE PRESENTATION

We propose EvoDevo: a scalable, parallel, high-performance computing (HPC) framework for fast and efficient optimization of deep neural network (DNN) hyperparameters and topologies via genetic algorithms. Written in C with MPI parallelization, EvoDevo's wrapper-based approach supports multiple NN toolkits including Google's TensorFlow as well as Microsoft's Cognitive Toolkit (CNTK). Optimized hyperparameters currently include: (1) number of filters for each convolutional layer, (2) kernel size for each convolutional layer, and (3) size of each fully-connected hidden layer. This work uses the MNIST and CIFAR-10 data sets with the LeNet-5 and ResNet-110 neural network models. Optimization is performed with 16 NVIDIA Tesla P100 GPUs on an internal Cray XC50 supercomputer. We observe improvements in final classification accuracy as well as training time to accuracy from the optimization of LeNet-5 on MNIST in TensorFlow and ResNet-110 on CIFAR-10 in CNTK.

Moderators

Trond Elefsen

Invatare

Roy Keyes

Houston Data Science Group

Speakers

Jacob Balma

Cray Inc.

Aaron Vose

Cray Inc.

Geert Wenes

Sr. Practice Leader/Architect, Cray, Inc.

2 evodevo nnopt rice final pdf

Monday October 9, 2017 3:50pm - 4:10pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session A

Host Organization Ken Kennedy Institute for Information Technology

3:50pm PDT

Extending Tools & Methods: Immersive Data Analysis for NASA Biomedical Data

WATCH THE PRESENTATION

NASA researchers are gathering high-dimensional astronaut data from research and surveillance studies to understand the risks and long-term consequences of human spaceflight and to generate new hypotheses regarding potential physiological changes associated with deep space exploration. Simultaneously, novel software and methods are being developed to analyze these data. Advances in data visualization have provided a medium for exploring data and providing enriched analytical results back to the investigator; however, our interaction with and visualization of these results are restricted to 2-dimensional displays, limiting our ability to extract the full value of information stored in the data. Currently, we are developing an immersive data analytics platform in virtual reality (VR) that will enhance data visualization, presentation, and analysis. We will demonstrate our current developments for VR data analytics in the context of network analysis and discuss how VR may help analysts and researchers identify patterns in the data that were previously unobservable.

Moderators

Scott Morton

Francisco Sanchez

HEDS Group

Speakers

Matthew Koslovsky

Biostatistician, KBRWyle

Last year I received my PhD in biostatistics from the UTHealth School of Public Health. Since graduation I have been working as a biostatistician at KBRwyle supporting research and clinical studies on astronaut health and physiological changes associated with spaceflight. My background... Read More →

Jennifer Mendez

Leidos

James Mireles

Business Data Analyst, KBRWyle

Virtual reality, augmented reality, mixed reality. I've been in the field since 2014. VR/AR/MR is a dynamic, even exploding field and the possibilities for B2B and enterprise uses are myriad. I'm an active member of the Houston VR community, co-founder of the Immersive Technology... Read More →

2 Rice Data Science Conference Koslovsky pptx

Monday October 9, 2017 3:50pm - 4:10pm PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session B

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Planner - Transient Optimization in Gas Pipelines

A planner model was implemented as an extension to an existing real-time model of a gas pipeline network. The objectives of planner are two-fold: determining the feasibility of given and calculated target flow rates, and optimization of gas inventory to minimize transportation costs of the gas in pipeline network to various off-takes. The feasibility of proposed solution is determined based on inlet and outlet flowrate constraints (targets), pipeline pressure constraints and delivery requirements. Under certain assumptions, the problem is reduced to “navigating” between lower and upper inventories using optimization algorithms. This pseudo-steady state solution is verified using full transient solver.
The planner was tested against various scenarios:
- Shutdown/startup
- Change in targets
- Switching operation in the pipeline network
- Actual operational experience

A unique testing strategy for all scenarios has been developed and used as part of deploying this planner to production.

The second part of this work describes experience solving the optimization models using Linear Simplex Solver and Non-Linear Interior Point Solver. The Modeling is achieved using mathematical programming languages popular for describing linear and nonlinear optimization problems in Operations Research (OR) such as AMPL. Some of the tools used in this work provide fast prototyping in Excel, and auto-generation of code and connectors to many open-source and closed-source solvers, particularly IPOPT. The presentation will also discuss some open problems.

Speakers

Denis Akhiyarov

Monday October 9, 2017 4:00pm - 6:00pm PDT
Event Hall BRC

Poster Session

4:10pm PDT

Tools and Infrastructure: Deep Learning for Data Fusion

PRESENTATION NOT AVAILABLE

Many methods and datasets in machine learning assume a single input source, consisting of either text, images, or a vector of numeric inputs. Yet many applications require integrating multiple sources of information, one or more of which may be missing for any particular instance. Because of its importance, this setting has been extensively studied under several names with nuanced distinctions, e.g. data fusion, information fusion, sensor fusion, and multimodal learning. It is especially relevant in fields that rely on collection of data from heterogeneous sources or physical sensors, such as robotics or oil and gas exploration. In recent years, deep learning has been yielded breakthroughs in many applications. In this talk, we discuss current research on using deep learning for data fusion and multimodal learning and present an architecture for general-purpose end-to-end multimodal learning with specific application to processing of documents collected and curated as part of legal proceedings and investigations.

Moderators

Trond Elefsen

Invatare

Roy Keyes

Houston Data Science Group

Speakers

Alan Lockett

CS Disco, Inc.

3 Deep Learning for Data Fusion at Rice DS 2 pdf

Monday October 9, 2017 4:10pm - 4:30pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session A

Host Organization Ken Kennedy Institute for Information Technology

4:10pm PDT

Extending Tools & Methods: Rapid Development of Open Real-Time Drilling Analytics System

WATCH THE PRESENTATION

In order to have an open RTD analytics system, Anadarko Petroleum Corporation decided to develop an in-house RTD analytics system based on a generic complex-event processing framework. The system consists of three subsystems: data acquisition, analytics, and a GUI front end. The open analytics system refers to a framework where various analytics modules can be developed in-house and added to the system, including physics based engineering models and data driven or machine learning models. It further refers to a generic real-time analytics system where the data acquisition layer is inter-changeable, and not limited to the RTD data. Currently in the analytics subsystem, several analytic modules are deployed or under development:

1. Drilling Activity Recognition module (online)
1.1. Real-time categorize the drilling activity every second (trip-in, trip-out, rotation drilling, sliding drilling et al. ); enable the real-time monitor of drilling operations.
1.2. Foundation for the remaining analytics modules.
2. Sliding Drilling Guidance module (online)
2.1. Automated directional calculations including motor yield and build rate needed to land as surveys are posted. Also, includes basic aggregation of actual second by second data.
3. Drilling Key Performance Indicators module (online)
3.1. Aggregation of second by second catergorized data into a “macro” state to provide general drilling KPIs.
4. Real-time Torque and Drag module (online)
4.1. Provides real-time information on downhole friction and hole issues as well as automatic casing monitoring. It helps make necessary adjustments to reduce the chance of stuck pipe event and damaged equipment and ensure casing integrity.
5. Rotational Drilling Guidance module (ready for deployment)
5.1. Learning the best drilling patterns from analog wells using machine-learning algorithm, generate the best drilling roadmap to guide the drilling of a new well.
6. Optimal Wellbore Trajectory Control module (under development)
6.1. Drilling automation: real time monitor the actual well path; propose a best well path back to plan well path once the actual well is deviated from plan well path

Within three months, the RTD analytics system with two analytics modules was built from scratch and placed online in production by a small team (one data scientist, one drilling engineer, two part-time developers). Currently the RTD system has four analytic modules. As time goes, more physics based engineering modules or machine learning modules will be added to the system as necessary. This real-time decision-support tool has been fully accepted by the business and has become a powerful tool for the whole drilling engineering team.

Moderators

Scott Morton

Francisco Sanchez

HEDS Group

Speakers

Dingzhou Cao

Presenter, Anadarko Petroleum Corporation

Chad Loesel

Anadarko

Sanjay Paranji

Anadarko

3 Rapid Development of Open Real Time Drilling Analytics System V3 pdf

Monday October 9, 2017 4:10pm - 4:30pm PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session B

Host Organization Ken Kennedy Institute for Information Technology

4:30pm PDT

Tools and Infrastructure: Data Analytics for Asset Health Management in Oil & Gas Services

PRESENTATION NOT AVAILABLE

The widespread availability of low-cost sensors, affordable computing power and massive data storage started a revolution in asset health management in recent years. It is now possible to deploy complex workflows dedicated to acquire asset data, detect anomalies, estimate remaining useful life for components, and optimize asset maintenance accordingly. Data Analytics plays a central role in all such workflows, leveraging the tremendous amounts of data generated by IoT sensors, maintenance systems, operating systems, etc.

Oil & Gas service companies have taken their first steps in implementing asset health management schemes across different product lines. Failures in equipment during field operations heavily contribute to maintenance costs, Non-Productive Time (NPT) and, overall, poor Service Quality (SQ). Service companies quickly realized of the huge benefits health management can have for their bottom lines as well as for their reputations with customers.

Our group in Schlumberger is responsible for development and co-deployment of Analytics-based asset management solutions across several Product Lines. Typically, our projects span four distinct phases: Data Ingestion, Exploration & Transformation, Algorithm Development, and Deployment. Two different applications, that we have successfully deployed in the field, will be described in detail in this presentation. The first one entails real-time monitoring of frac pumps; the second one illustrates offline on-demand health analysis and maintenance prescription for a donwnhole LWD tool.

Moderators

Trond Elefsen

Invatare

Roy Keyes

Houston Data Science Group

Speakers

Daniel Viassolo

Team Lead & Principal Data Scientist, Schlumberger

Daniel is the global Team Lead for Equipment Analytics with Schlumberger's Enterprise Solutions organization. Over the last 18 years, he has made impactful contributions to the field of Industrial Asset Health Management and Controls, in diverse applications domains - oil & gas services... Read More →

Monday October 9, 2017 4:30pm - 4:50pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session A

Host Organization Ken Kennedy Institute for Information Technology

4:30pm PDT

Extending Tools & Methods: An Extended DEIM Algorithm for Subset Selection

PRESENTATION NOT AVAILABLE

With large data sets becoming more prevalent, there is an increased demand for dimension reduction techniques. One approach to this problem is to select a subset of original samples using the discrete empirical interpolation method (DEIM), preserving the interpretability of the dimension-reduced data set. However, the number of DEIM-selected samples is limited to be no more than the rank of the original data matrix. While this is not an issue for many data sets, there are a number of settings in which this can limit the algorithm's potential for selecting a subset that contains representatives from each class present in the data. In the presented work, we address this issue through an extension of the DEIM algorithm that allows for the selection of a subset with size greater than the matrix rank.

Moderators

Scott Morton

Francisco Sanchez

HEDS Group

Speakers

Emily Hendryx

Rice University

Beatrice Riviere

Noah Harding Chair and Professor, Rice University

Craig Rusin

Chief Technology Officer, Baylor College of Medicine

Dr. Craig Rusin is the Chief Technology Officer at Medical Informatics Corporation. Craig is an engineer and professor whose groundbreaking work in medical research led to the creation of the grid-computing platform that MIC uses to make sense of patient data in order to improve... Read More →

Danny Sorensen

Rice University

Hendryx Rice Data Science F2017 unpublished material removed copy pdf

Monday October 9, 2017 4:30pm - 4:50pm PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session B

Host Organization Ken Kennedy Institute for Information Technology

4:50pm PDT

Networking Reception

Monday October 9, 2017 4:50pm - 7:00pm PDT
Event Hall, BRC 6500 Main Street

Networking Reception

Host Organization Ken Kennedy Institute for Information Technology

8:00am PDT

Networking & Breakfast

Tuesday October 10, 2017 8:00am - 9:00am PDT
Event Hall, BRC 6500 Main Street

Break

Host Organization Ken Kennedy Institute for Information Technology

9:00am PDT

Welcome - Day 2

Speakers

Keith Cooper

Jan E. Odegard

Executive Director Ken Kennedy Institute/ Associate Vice President Research Computing, Rice University

Tuesday October 10, 2017 9:00am - 9:15am PDT
Event Hall, BRC 6500 Main Street

Welcome

Host Organization Ken Kennedy Institute for Information Technology

9:15am PDT

Keynote: Integrating Simulation, Data Analysis and Deep Learning in Science and Engineering Applications, Rick Stevens, Argonne National Lab

WATCH THE PRESENTATION

The adoption of machine learning is proving to be an amazingly successful strategy for developing predictive models for where we lack a mechanistic understanding of a phenomena. These types of problems are widespread in science and engineering. However, machine learning is recently also starting to have impact when used in conjunction with simulation where we have first principles understanding, and with data analysis cases where it can augment or complement traditional methods.
In this talk, I will discuss the impact that machine learning is having on large-scale data analysis in a variety of fields including projects my group is working on in Cancer research where deep learning, in particular, is used to advance our ability to diagnosis and classify tumors. Recently demonstrated automated systems are routinely out performing human expertise. Deep learning is also being used to predict patient response to cancer treatments and to screen for new anti-cancer compounds. In these areas and in many others, the attraction of deep learning is the ability to train large volumes of data into high-capacity models and to work without complex feature engineering and in the presence of noise. Because of these advantages, it’s becoming an important component of scientific workloads. From a computational architecture standpoint, deep neural network (DNN) based scientific applications have some unique requirements. They require high compute density to support matrix-matrix and matrix-vector operations, but they rarely require 64bit or even 32bits of precision. Thus, architects are creating new instructions and new design points to accelerate training. Most current DNNs rely on dense fully connected networks and convolutional networks, and thus are reasonably matched to current HPC accelerators. However, future DNNs may rely less on dense communication patterns. Similar to simulation codes, power efficient DNNs require high-bandwidth memory to be physically close to arithmetic units to reduce costs of data motion and a high-bandwidth communication fabric between (perhaps modest scale) groups of processors to support network model parallelism. DNNs in general do not have good strong scaling behavior, so to fully exploit large-scale parallelism they rely on a combination of model, data and search parallelism. Deep learning problems also require large quantities of training data to be made available or generated at each node, thus providing opportunities for NVRAM. Discovering optimal deep learning models often involves a large-scale search of hyperparameters. It’s not uncommon to search a space of tens of thousands of model configurations. Naïve searches are outperformed by various intelligent searching strategies, including new approaches that use generative neural networks to manage the search space. HPC architectures that can support these large-scale intelligent search methods as well as efficient model training are needed.

Speakers

Rick Stevens

Since 1999, Rick Stevens has been a professor at the University of Chicago and since 2004, an Associate Laboratory Director at Argonne National Laboratory. He is internationally known for work in high-performance computing, collaboration and visualization technology, and for building... Read More →

Rice Data Science Stevens pdf

Tuesday October 10, 2017 9:15am - 10:00am PDT
Room 103, Auditorium, BRC 6500 Main Street

Keynote

Host Organization Ken Kennedy Institute for Information Technology

10:00am PDT

Plenary: Using Big Data and Machine Learning to Build Spatially Fine-Grained Prediction Models of Wind and Flood Damage Risk, Devika Subramanian, Rice University

WATCH THE PRESENTATION

Accurate, spatially fine-grained prediction models of wind and flood damage risk in coastal urban environments such as Houston are critical for making them resilient in the face of ever-increasing numbers of hydro-meteorological events. In this talk, I will present a new deep learning model for predicting wind damage risk at the kilometer square block level built by fusing high resolution LIDAR data, land use data, structural data on homes from tax appraisal records, as well as damage data on 800,000 homes from Hurricane Ike in Houston. Our model significantly outperforms the state-of-the-art HAZUS-MH4 model currently used by FEMA, improving predictive AUC from 0.43 to 0.7. I will also present some preliminary results on using high resolution LIDAR data, 311 flood call data from 2013-2015, land use data, road and bayou networks to predict street level flooding. Our model’s accuracy is competitive with present-day physics-based models and is significantly more efficient than them in terms of computational resources.

Speakers

Devika Subramanian

Rice University

My research interests are in artificial intelligence and machine learning and their applications in computational systems biology, neuroscience of human learning, assessments of hurricane risks, network analysis of power grids, mortality prediction in cardiology, conflict forecasting... Read More →

Subramanian pdf

Tuesday October 10, 2017 10:00am - 10:30am PDT
Room 103, Auditorium, BRC 6500 Main Street

Plenary

Host Organization Ken Kennedy Institute for Information Technology

10:30am PDT

Break

Tuesday October 10, 2017 10:30am - 11:00am PDT
Event Hall, BRC 6500 Main Street

Break

Host Organization Ken Kennedy Institute for Information Technology

11:00am PDT

Algorithms and Methods: Statistical Inference of Network Structures in Ultra-high Dimensional Space

WATCH THE PRESENTATION

This era of data explosion offers unprecedented opportunities for researchers to investigate and understand scientific questions at a much higher resolution. Particularly, it is of significant importance to further advance our understanding of large-scale networks under a variety of conditions. However, for such challenging problems, the dimension of variables is ultra-high (e.g., tens of thousands or even millions) such that the very standard analyses could become unsuitable, due to the curse of dimensionality. In this study, we concern networks of a size on the order of 104 and the number of parameters on the order of millions. While a few previous computational studies have claimed success in revealing network structures from time-course data, recent work suggests that these methods still suffer from the curse of dimensionality as network size increases to 100 or higher. We thus propose a novel scalable algorithm for identifying complex network structures, and the highlight of our method is that it can achieve a superior performance even for a network size O(104).

Moderators

Natalie Beretovsky

Anadarko

Alena Crivello

Chevron

Speakers

Hongyu Miao

Associate Professor, UT Health

• graphical model, dynamic system, causal model, ultra-high dimensional problem • big data, statistical learning, time series, discrete event, Monte Carlo inference • mHealth, neuroimaging, infectious disease, systems biology, epidemiology

1 DMI Miao pdf

Tuesday October 10, 2017 11:00am - 11:20am PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session C

Host Organization Ken Kennedy Institute for Information Technology

11:00am PDT

Time-Varying Data Analytics: Self-Learning Models for Securing Payment Card Transactions and Cyber Networks

WATCH THE PRESENTATION

People and devices change their behavior over time and their present behavior may not reflect their normal behavior in the future. Hackers and credit card fraudsters constantly evolve newer techniques to breach organization or defraud a card, shifting the behavior pattern of the criminal cases. So we have developed techniques to constantly monitor, learn and evaluate the shift in behavior pattern of a device or a component to capture true anomalous behavior due to a cyber-breach or a payment card fraud. For instance, to detect payment card fraud leveraging mobile phones, our algorithm sits inside the bank’s mobile app and captures user behavior to learn a rich set of behavioral data on a mobile device based on usage locations, networks connected to, apps used and in-app behavior. Similarly, to detect cyber intrusion, our analytics sits inside the various nodes of a computer-network, and constantly captures the activities and connections of the various devices and components. Archetypes of behavior are then derived from this behavioral data using patented generative Bayesian models called collaborative profiling. Each device or component is then represented as a mixture of these archetypes. Our patented self-calibrating outlier models continually monitor various features derived from these archetypes and learn the evolving normal behavior pattern for each feature in real time, without relying on historical data storage, allowing for up to date and fast outlier computation for each of the feature. All outlier models are then combined together to generate a single anomaly score, using our multi-layer self-calibrating model, which is structured like a Neural Network. Input features in the hidden nodes are selected to minimize correlation between them. The weights are either derived empirically or are based on expert knowledge. High score represents high likelihood of card fraud or cyber intrusion. This provides a sophisticated security mechanism that is immune to spoofing of biometric authentication or escaping the traditional signature based detection algorithms.

Moderators

Craig Rusin

Chief Technology Officer, Baylor College of Medicine

Jim Ward

Two Sigma

Speakers

Shafi Rahman

Scott Zoldi

FICO

Tuesday October 10, 2017 11:00am - 11:20am PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session D

Host Organization Ken Kennedy Institute for Information Technology

11:20am PDT

Algorithms and Methods: Causal Inference Models for Large-Scale Gene Regulatory Network Analysis

PRESENTATION NOT AVAILABLE

The popular methods for construction of gene regulatory networks are correlation analysis for co-expression networks, independent tests and Bayesian networks for construction of small gene regulatory networks. Although it can construct large-scale co-expression network, correlation analysis is unable to discover regulatory direction. The independent tests and Bayesian networks can discover regulatory direction. However, the traditional causal inference can only identify up to Markov equivalence class and cannot find the unique causal solutions or unique gene regulatory network. In addition, Bayesian networks can only identify small gene regulatory networks. To uniquely discover the causal relationships and causal gene regulatory networks, novel functional additive noise model (ANM) for causal inference where smoothing spline regression was used to fit the functional model has been developed and applied to construction of causal gene regulatory network.
Besides pairwise ANM, we also implemented DAG-based Causal Additive Model (CAM), score-based linear structural equation model (SEM) coupled with integer programming as an optimal graph search algorithm, and Glasso algorithm for identifying gene regulatory networks and compare their performances. The four methods were applied to Wnt signaling pathway with RNA-Seq of 145 genes measured in 447 tissue samples. Fifty most significant causal relations were selected to test detection power that was defined as the proportion of correctly inferred paths, given that the paths in the KEGG pathway database were taken as the true paths. Given directed paths in KEGG, results showed that 38% of detected paths by the ANM agreed with the paths in the KEGG, while CAM and SEM only reached 16% and 20% accuracy respectively. Given both directed and undirected paths, the results showed that the ANM could reach 46% accuracy in path detection while CAM, SEM and Glasso could only reach 24%, 26% and 24% respectively. In this experiment, the pairwise ANM outperformed the DAG-based CAM, the score-based linear SEM and the Glasso algorithms for gene regulatory network construction.

Moderators

Natalie Beretovsky

Anadarko

Alena Crivello

Chevron

Speakers

Zixin Hu

Fudan University

Rong Jiao

UT Health

Momiao Xiong

UT Health

Tuesday October 10, 2017 11:20am - 11:40am PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session C

Host Organization Ken Kennedy Institute for Information Technology

11:20am PDT

Time-Varying Data Analytics: Radiomics - A Tool for Predicting Tumor Response in Head and Neck Cancer Patients

WATCH THE PRESENTATION

Aim: The prediction of tumor response during radiotherapy (RT) could potentially lead to treatment individualization. In a cohort of head and neck cancer (HNC) patients, kinetics of radiomics features were investigated to characterize early changes in tumor structure that could be correlated to treatment response.

Background: Radiomics is a burgeoning research application in clinical oncology. The field allows for quantitative evaluation of anatomical structures from diagnostic imaging modalities. When correlated with clinical characteristics, this application may guide treatment decisions. Previous research attempted to describe the clinical efficacy of radiomics but relied on traditional principal component analysis, which cannot adequately project tumor trends over treatment. Our research, therefore, addresses an unmet need: a cohort of patients undergoing a standardized treatment regimen with tumor characteristics extracted at multiple timepoints for functional analysis.

Methods: Thirty nine HNC patients undergoing image-guided RT were included. Primary tumor response was retrieved at the end of RT, per RECIST v1.1. A total of 155 in-treatment CT scans at days 1, 5, 10 and 15 of RT were reclaimed. Primary gross tumor volumes were contoured. A total of 145 radiomic features were selected from the following categories: intensity-direct, neighborhood intensity difference, grey-level co-occurrence matrix (GLCM), grey-level run length, and shape. These features were analyzed using an in-house radiomics analytics software “IBEX” that runs on Matlab. A Spearman correlation coefiicient cutoff of 0.7 was pre-set to rule out volume-dependent radiomics features. After adjusting for dose, this led to the reduction of 145 features to 7.

Moderators

Craig Rusin

Chief Technology Officer, Baylor College of Medicine

Jim Ward

Two Sigma

Speakers

Shauna Campbell

The University of Texas MD Anderson Cancer Center

Laurence Court

Director, The Radiation Planning Assistant Project; Associate Professor, Radiation Oncology, UT MD Anderson Cancer Center

Hesham Elhalawani

The University of Texas MD Anderson Cancer Center

Xenia Fave

The University of Texas MD Anderson Cancer Center

Rachel Ger

The University of Texas MD Anderson Cancer Center

Robin Granberry

The University of Texas MD Anderson Cancer Center

Jolien Heukelom

The Netherlands Cancer Institute

Amit Jethanandani

MD Anderson

Timothy Lin

The University of Texas MD Anderson Cancer Center

Dennis Mackin

The University of Texas MD Anderson Cancer Center

Elisabeta Marai

The University of Illinois at Chicago

Abdallah Mohamed

The University of Texas MD Anderson Cancer Center

Arvind Rao

The University of Texas MD Anderson Cancer Center

David Vock

Associate Professor, Biostatistics, School of Public Health, University of Minnesota, United States

My research focuses on two major areas. The first is statistical methods development for electronic health data with a particular focus on development of machine learning techniques to handle censored data. Second, I work on novel methods for causal inference and estimation of dynamic... Read More →

Stefania Volpe

The University of Texas MD Anderson Cancer Center, University of Milan

Pei Yang

The University of Texas MD Anderson Cancer Center

Lifei Zhang

The University of Texas MD Anderson Cancer Center

2 Jethanandani Radiomics 10092017 pdf

Tuesday October 10, 2017 11:20am - 11:40am PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session D

Host Organization Ken Kennedy Institute for Information Technology

11:40am PDT

Algorithms and Methods: Recognizing Rock Facies By Gradient Boosting - An Application of Machine Learning in Geophysics

WATCH THE PRESENTATION

Big data analysis has drawn much attention from different industries. Geoscientists, meanwhile, have been doing analysis with voluminous data for many years, without even bragging how big it is. In this paper, we present an application of machine learning, to be more specific, the gradient boosting method, in Rock Facies Classification based on certain geological features and constraints. Gradient boosting is a both popular and effective approach in classification, which produces a prediction model in an ensemble of weak models, typically decision trees. The key for gradient boosting to work successfully lies in introducing a customized objective function and tuning the parameters iteratively based on cross-validation. Our model achieves a rather high F1 score in evaluating two test wells data.

Moderators

Natalie Beretovsky

Anadarko

Alena Crivello

Chevron

Speakers

Cheng Zhan

data scientist, Microsoft

Licheng Zhang

The University of Houston

3 recognize rock facies final pdf

Tuesday October 10, 2017 11:40am - 12:00pm PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session C

Host Organization Ken Kennedy Institute for Information Technology

11:40am PDT

Time-Varying Data Analytics: Sparse Dynamic Bayesian Network and its Application to Longitudinal Genetic-Imaging Data Analysis

WATCH THE PRESENTATION

Dynamic Bayesian network is a probabilistic framework for modeling gene regulatory networks with time-course and longitudinal expression data. We consider two types of dynamic Bayesian networks: stationary dynamic Bayesian network and non-stationary dynamic Bayesian networks. A stationary dynamic Bayesian network assumes that the structures and parameters of the dynamic Bayesian networks are fixed over time. A non-stationary dynamic Bayesian network assumes that the structures and parameters of the dynamic Bayesian networks vary over time. In general, gene regulatory and signal transduction processes in cells change to response the environmental stimuli and growth, such as immune response, developmental processes and disease progression. The non-stationary dynamic Bayesian networks can be used to model non-stationary gene expression data. In general, the dynamic structural Bayesian networks are sparse. In this talk, we present a novel sparse dynamic network models that is formulated as a nonsmooth optimization problem. The traditional Newton’s method is an efficient tool for solving unconstrained smooth optimization problem, but is not suited for solving large nonsmooth convex problem. Proximal methods can be viewed as an extension of Newton’s method from solving smooth optimization problems to nonsmooth optimization problems. We derive primal gradient algorithm for - penalized maximum likelihood estimation and generalized least square estimates of of parameters in the sparse dynamic structural equation models. Finally, the introduced dynamic structural equation model is coupled with integer programming algorithm for modeling dynamic Bayesian networks that consists of two steps. The dynamic structural equation model at the first step serves two purposes: generating initial dynamic directed graphs and calculate score function for each node. Integer linear programming at the second step is used to search the best dynamic Bayesian network with the optimal score. The proposed model is applied to 282 diffusion tensor images in the Alzheimer’s Disease Neuroimaging Initiative at five different time points: baseline, 3 months, 6 months, 12 months and 24 months. By using the proposed sparse dynamic structural equation model, we can construct brain modulation network at different time points. We hope that our discovery can make some contribution to decode the brain functions.

Moderators

Craig Rusin

Chief Technology Officer, Baylor College of Medicine

Jim Ward

Two Sigma

Speakers

Rong Jiao

UT Health

Nan Lin

UT Health

Momiao Xiong

UT Health

3 Sparse Dynamic Rice Data Science pdf

Tuesday October 10, 2017 11:40am - 12:00pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session D

Host Organization Ken Kennedy Institute for Information Technology

12:00pm PDT

Algorithms and Methods: Opportunity Detection for Business-to-Business (B2B) Sales Organizations

WATCH THE PRESENTATION

In a business-to-business setting, sales representatives often face the challenge of simultaneously managing a large number of accounts as well as having a wide offering of products to select from. Understanding the needs of each individual account and knowing which products these accounts are likely to purchase at a particular time often requires more time than a sales representative has. In this talk we will discuss a solution to this problem that utilizes a collection of pattern detection techniques to uncover high-probability sales opportunities, prioritize these opportunities according to their revenue potential, and finally surface them to the sales representative directly in their customer relationship management (CRM) system. PROS is a cloud software company that helps competitive enterprises create frictionless and personalized buying experiences for customers. Fueled by dynamic pricing science and machine learning, PROS solutions make it possible for companies to price, configure and sell their products and services with speed, precision and consistency in an omnichannel environment.

Moderators

Natalie Beretovsky

Anadarko

Alena Crivello

Chevron

Speakers

Justin Silver

Senior Scientist, PROS Inc

Yan Xu

Presenter, PROS Inc

4 PROS RiceDSConf 20171010 v2 pdf

Tuesday October 10, 2017 12:00pm - 12:20pm PDT
Room 280 & 282, BRC 6500 Main Street

Parallel Session C

Host Organization Ken Kennedy Institute for Information Technology

12:00pm PDT

Time-Varying Data Analytics: ST-COPOT—A Serial, Density-Contour Based Spatio-Temporal Clustering Approach and its Application to Taxi Trip Location Streams

WATCH THE PRESENTATION

Spatio-temporal clustering of data streams aims to discover interesting regions in large spatio-temporal data streams efficiently using a small amount of memory and time. In this paper, we propose a serial, density-contour based clustering algorithm called ST-COPOT, which can identify spatio-temporal cluster at multiple levels of granularity in approximately linear time. The proposed algorithm employs a non-parametric density estimation approach and contouring algorithms to obtain spatial clusters from point cloud streams and spatio-temporal clusters are then formed by identifying continuing relationships between spatial clusters in consecutive time windows. In particular, our approach subdivides the incoming data into batches; next, for each batch spatial clusters are generated as regions which are enclosed by polygons that correspond to given set of density thresholds; finally, spatio-temporal clusters are generated by analyzing continuity between spatial clusters in consecutive batches. Moreover, the paper introduces a unique data structure called contour polygon tree which provides a compact representation of the spatial clusters obtained for each batch for different density thresholds, and a family of novel distance functions that operate on contour polygon trees are proposed to identify continuing clusters. We evaluate our approach by conducting a case study involving NYC taxi trips data. The experimental results show that ST-COPOT can effectively discover interesting spatio-temporal patterns in taxi pickup location streams.

Moderators

Craig Rusin

Chief Technology Officer, Baylor College of Medicine

Jim Ward

Two Sigma

Speakers

Christoph F. Eick

University Of Houston

Tuesday October 10, 2017 12:00pm - 12:20pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Parallel Session D

Host Organization Ken Kennedy Institute for Information Technology

12:20pm PDT

Lunch & Networking

Tuesday October 10, 2017 12:20pm - 1:45pm PDT
Event Hall, BRC 6500 Main Street

Break

Host Organization Ken Kennedy Institute for Information Technology

1:45pm PDT

Plenary: Practical Considerations for Data Science Consulting and Innovation in a Large Organization, Yulan Lin and Justin Gosses, NASA

WATCH THE PRESENTATION

There are things that can make or break a data science project that have nothing to do with programming or math, and everything to do with the social, cultural, and management context and constraints of innovating in a large organization. These considerations are not groundbreaking, but they are incredibly important, not taught in academic settings, and rarely discussed. This talk will focus on practical considerations for data science teams that build prototypes and provide internal consulting for organizations larger than 2,000 people in a non-technology industry.

Speakers

Justin Gosses

Software Engineer, Valador / NASA

Justin is a software engineer on the agency data analytics team within NASA’s Office of the Chief Information Officer’s Technology and Innovation division. He consults and builds prototypes for internal NASA customers in finance, human resources, facilities, space technology... Read More →

Yulan Lin

Software Engineer, Valador/NASA

Yulan is a Software Engineer and Data Scientist working with NASA's Office of the Chief Information Officer's Data Analytics Team. She consults and creates prototypes based on new data science, data visualization, and machine learning technologies to help internal clients best leverage... Read More →

Practical Considrations of Data Science final pdf

Tuesday October 10, 2017 1:45pm - 2:15pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Plenary

Host Organization Ken Kennedy Institute for Information Technology

2:15pm PDT

Plenary: Wrangling Data at Texas Advanced Computing Center, Niall Gaffney, Texas Advanced Computing Center

WATCH THE PRESENTATION

For the past five years, advances in computational technologies have led to a revolution in data research that parallel those started 2 decades ago in computational research. With the coming of commodity large scale storage systems, faster networking, and new frameworks for exploring both structured and unstructured data came the era of “big data”. The Texas Advanced Computing Center (TACC), which has been at the forefront of computational research for the past decade, is also leading the way in data research today. TACC has systems for the storing, sharing, and analyzing petascale data along with a team of data scientists and high performance computing experts to help bring new technologies and optimizations to many different research problems. In this talk, we will look at both the hardware and software capabilities at TACC that address many of the challenges faced by data researchers today. We will examine some of the complex data problems being addressed at TACC in fields from genomics and geophysics to civil engineering and economics. We will also look at the shift over the past decade in how researchers interact with computational resources and how TACC is addressing the shift from command line interfaces to rich web based data environments. And finally, we will look at the current state of machine and deep learning, now often called AI, and how this technology is being used to address both data and computational challenges enabling new capabilities for the diverse range of domains doing data research today.

Speakers

Niall Gaffney

Director of Data Intensive Computing, Texas Advanced Computing Center

Niall Gaffney's background largely revolves around the management and utilization of large inhomogeneous scientific datasets. Niall, who earned his B.A., M.A., and Ph.D. degrees in astronomy from The University of Texas at Austin, joined TACC in May 2013. Prior to that he worked for... Read More →

Tuesday October 10, 2017 2:15pm - 2:45pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Plenary

Host Organization Ken Kennedy Institute for Information Technology

2:45pm PDT

Plenary: Interactive and Dynamic Visualization for Clustering, Genevera Allen, Rice University

WATCH THE PRESENTATION

Hierarchical clustering enjoys wide popularity because of its fast computation, ease of interpretation, and appealing visualizations via the dendogram and cluster heatmap. Recently, several have proposed and studied convex clustering and biclustering which similar in spirit to hierarchical clustering, achieve cluster merges via convex fusion penalties. While these techniques enjoy superior statistical performance, they suffer from slower computation and are not generally conducive to representation as a dendogram.

In this talk, we present new convex (bi)clustering methods and fast algorithms that inherit all of the advantages of hierarchical clustering. Specifically, we develop a new fast approximation and variation of the convex (bi)clustering solution path that can be represented as a dendogram or cluster heatmap. Also, as one tuning parameter indexes the sequence of convex (bi)clustering solutions, we can use these to develop interactive and dynamic visualization strategies that allow one to watch their data form clusters.

Speakers

Genevera Allen

Associate Professor, Rice University

I am an Associate Professor at Rice University in the Departments of Statistics, Computer Science (by courtesy), and Electrical and Computer Engineering (by courtesy) and at Baylor College of Medicine where I am an investigator in the Jan and Dan Duncan Neurological Research Inst... Read More →

Tuesday October 10, 2017 2:45pm - 3:15pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Plenary

Host Organization Ken Kennedy Institute for Information Technology

3:15pm PDT

Keynote: Optimal Data Assimilation Algorithms, Ronald DeVore, Texas A&M University

WATCH THE PRESENTATION

A common scientific problem is that we are given some data about a function f and we wish to use this information to either (i) approximatef or (ii) answer some question about f called a quantity of interest. We discussrecent results on data fitting whichdetermine optimal algorithms for the two scenarios above under the assumption that f is in a model class described by approximation.

Speakers

Ronald DeVore

Texas A&M University

Ronald DeVore is a mathematician recognized for his work in applied mathematics, particularly those areas that interface numerical analysis, partial differential equations, data processing, machine learning and approximation of functions. DeVore was born and raised in Detroit... Read More →

DeVore DATA Rice pdf

Tuesday October 10, 2017 3:15pm - 4:00pm PDT
Room 103, Auditorium, BRC 6500 Main Street

Keynote

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: A Comparison of CUR Subset Selection Algorithms

Speakers

Nabil Chaabane

Rice University

Emily Hendryx

Rice University

Beatrice Riviere

Noah Harding Chair and Professor, Rice University

Craig Rusin

Chief Technology Officer, Baylor College of Medicine

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: A computational view of the 2014 Hongkong Umbrella Revolution through Twitter

Speakers

Zhouhan Chen

Rice University

Richard Stoll

Rice University

Devika Subramanian

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: A Machine Learning Approach to Debris Estimation During Storm and Flood Events

Speakers

Carl Bernier

Rice University

Jamie Padgett

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: A Visually Interpretable Approach to Radiomic Classification

Speakers

William Deaderick

Rice University

Srikanth Kuthuru

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Application of Machine Learning Techniques for the Reliability Analysis of Vertical Concrete Dry Casks Tip-over

Speakers

Jamie Padgett

Rice University

Majid Ebad Sichani

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Arrays of (locality-sensitive) Count Estimators (ACE): Anomaly Detection on the Edge

Speakers

Chen Luo

Rice University

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp

Anshumali Shrivastava's research focuses on Large Scale Machine Learning, Scalable and Sustainable Deep Learning, Randomized Algorithms for Big-Data and Graph Mining.

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Convergence of K-indicators Clustering with Alternating Projection Algorithms

Speakers

Yuchen Yang

Rice University

Yin Zhang

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Deep Learning Experiments for Seismic Interpretation

The Cloud Computing Research Lab at Prairie View A&M University has been working on developing a scalable deep-learning enabled cloud platform for petroleum data analytics to facilitate a variety of interpretation scenarios. The recognition of 3D complex geological features in noisy seismic datasets is a daunting challenge in pattern recognition. This poster will present our experiments of applying deep learning to facilitate seismic interpretation on top of our developed seismic data analytics cloud platform, which is built on open source software such as Apache Hadoop, Spark and Google TensorFlow. The poster will demonstrate our experimental results of applying a variety of deep learning models to detect geological faults on real seismic volumes. Moreover, the poster will also present the components of the deep-learning enabled big data analytics platform that manages/analyzes/visualizes large petroleum datasets that are distributed in the Hadoop and Spark environment. The work is sponsored by National Science Foundation (NSF).

Speakers

Ted Clee

President, TEC Applications Analysis

TEC Applications Analysis

Lei Huang

Assistant Professor, Prairie View A&M University

Xishuang Dong

Prairie View A & M University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall BRC

Poster Session

4:00pm PDT

Poster: Distributed Representations of Extracted Knowledge for Machine Learning of Drug Side-Effect Relationships

Speakers

Trevor Cohen

UT Health

Justin Mower

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Improving Diagnosis of Skin Disease by Combining Deep Neural Network and Human Expertise

Speakers

Cui Tao

UTHSC

Xinyuan Zhang

Graduate Research Assistant, UTHSC

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Improving YouTube-8M Classification Performance by Ensemble Approaches in Google Cloud Platform

Speakers

Shujiao Huang

The University of Houston

Chang-Chun Wan

Zhiwei Xiao

Anadarko

Pei Yang

The University of Texas MD Anderson Cancer Center

Cheng Zhan

data scientist, Microsoft

Licheng Zhang

The University of Houston

Zhenzhen Zhong

Virginia Tech Alumni

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Integrate Deep Believe Network with Ontology Fingerprint for Drug Combination Prediction

Speakers

Guocai Chen

UT Health

W. Jim Zheng

UT Health

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Integration of Functional Principal Component Analysis in Temporal Imaging Analysis: Assessment of Parotid Gland Radiomic Features Changes in Irradiated Head and Neck Cancer Patients

Speakers

Shauna Campbell

The University of Texas MD Anderson Cancer Center

Laurence Court

Director, The Radiation Planning Assistant Project; Associate Professor, Radiation Oncology, UT MD Anderson Cancer Center

Hesham Elhalawani

The University of Texas MD Anderson Cancer Center

Xenia Fave

The University of Texas MD Anderson Cancer Center

Rachel Ger

The University of Texas MD Anderson Cancer Center

Robin Granberry

The University of Texas MD Anderson Cancer Center

Amit Jethanandani

MD Anderson

Timothy Lin

The University of Texas MD Anderson Cancer Center

Dennis Mackin

The University of Texas MD Anderson Cancer Center

Elisabeta Marai

The University of Illinois at Chicago

Abdallah Mohamed

The University of Texas MD Anderson Cancer Center

Arvind Rao

The University of Texas MD Anderson Cancer Center

David Vock

Associate Professor, Biostatistics, School of Public Health, University of Minnesota, United States

Stefania Volpe

The University of Texas MD Anderson Cancer Center, University of Milan

Pei Yang

The University of Texas MD Anderson Cancer Center

Lifei Zhang

The University of Texas MD Anderson Cancer Center

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Kolmogorov Superposition Theorem: Univariate Encodings of Multivariate Functions

Speakers

Jonas Actor

Presenter, Rice University

Matthew Knepley

University of Buffalo

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Latent Variable Models for Hippocampal Sequence Analysis

Speakers

Etienne Ackermann

Rice University

Caleb Kemere

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Machine Learning Based Tools for Performance Assessment of Above Ground Storage Tanks Subjected to Storm Surge and Wave Loads

Speakers

Carl Bernier

Rice University

Sabarethinam Kameshwar

Graduate Research Assistat, Rice University

Jamie Padgett

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Machine Learning in Experimental High Energy Physics

Speakers

Jamal Rorie

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Machine Learning in Oil and Gas Industry

Speakers

Qianyun Zhou

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Mixture Proportion Estimation via Dimension Reduction with Classifier

Speakers

Zhenfeng Lin

Texas A&M University

James P. Long

Texas A&M University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: MOOC Data Developments@RICE

Speakers

Nenpan Tunkuda

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Multi-Task Learning for Commercial Brain Computer Interfaces

Speakers

George Panagopoulos

The University of Houston

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Muon Identification in High-Energy Heavy Ion Collisions

Speakers

James Brandenburg

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Network Reliability Estimation: Latest Methods and Benchmarks for a Hard Computational Problem

Speakers

Leonardo Duenas-Osorio

Rice University

Kuldeep S. Meel

Graduate Student, Rice University

Kuldeep is a PhD student in Rice working with Prof. Moshe Vardi and Supratik Chakraborty and obtained his B.Tech. from IIT Bombay in 2012. His research broadly falls into the intersection of program synthesis, computer-aided verification and formal methods. He is the recipient of... Read More →

Roger Paredes

Rice University

Moshe Vardi

George Distinguished Service Professor in Computational Engineering, Rice University

Moshe Y. Vardi is the George Distinguished Service Professor in Computational Engineering and Director of the Ken Kennedy Institute for Information Technology at Rice University. He is the recipient of three IBM Outstanding Innovation Awards, the ACM SIGACT Goedel Prize, the ACM... Read More →

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Overcoming the Challenges of Large Spatial Datasets

Speakers

Heather O'Connel

Rice University

Julia Schedler

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Perfomance of Missing Data imputation by EM algorithm under Gradient Boosting

Speakers

Weilu Han

UT Health

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Probabilistic hurricane-induced coastal flooding hazard assessment through kriging surrogate modeling techniques

Speakers

Ioannis Gidaris

Rice University

Jamie Padgett

Rice University

Tuesday October 10, 2017 4:00pm - 6:00pm PDT
Event Hall, BRC 6500 Main Street

Poster Session

Host Organization Ken Kennedy Institute for Information Technology

4:00pm PDT

Poster: Scalable and Sustainable Deep Learning via Randomized Hashing

Speakers

Anshumali Shrivastava

Professor, Rice University; Founder, ThirdAI Corp

Anshumali Shrivastava's research focuses on Large Scale Machine Learning, Scalable and Sustainable Deep Learning, Randomized Algorithms for Big-Data and Graph Mining.