Online 1. F11ENGR6201 : Introduction to Optimization. 2011 Fall [2011]
 Stanford University. Department of Engineering (Sponsor)
 Stanford (Calif.), 2011
 Description
 Book — 1 text file
 Summary

Formulation and analysis of linear optimization problems. Solution using Excel Solver. Polyhedral geometry and duality theory. Applications to contingent claims analysis, production scheduling, pattern recognition, two-player zero-sum games, and network flows. Prerequisite: CME 100 or MATH 51.
 Collection
 Stanford University Syllabi
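The course description above can be made concrete with a small linear program. A minimal sketch using SciPy's `linprog` rather than the Excel Solver the course uses; the product-mix numbers are illustrative, not from the syllabus:

```python
from scipy.optimize import linprog

# Maximize profit 3*x1 + 5*x2 subject to resource limits
# (linprog minimizes, so we negate the objective).
c = [-3.0, -5.0]
A_ub = [[1.0, 0.0],   # machine A hours: x1 <= 4
        [0.0, 2.0],   # machine B hours: 2*x2 <= 12
        [3.0, 2.0]]   # machine C hours: 3*x1 + 2*x2 <= 18
b_ub = [4.0, 12.0, 18.0]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, -res.fun)  # optimal plan (2, 6) with profit 36
```

With the default HiGHS solver, recent SciPy versions also report dual values for the inequality constraints (`res.ineqlin.marginals`), which connects directly to the duality theory the description mentions.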
Online 2. F10ENGR6201 : Introduction to Optimization. 2010 Fall [2010]
 Stanford University. Department of Engineering (Sponsor)
 Stanford (Calif.), 2010
 Description
 Book — 1 text file
 Summary

Formulation and analysis of linear optimization problems. Solution using Excel Solver. Polyhedral geometry and duality theory. Applications to contingent claims analysis, production scheduling, pattern recognition, two-player zero-sum games, and network flows. Prerequisite: CME 100 or MATH 51.
 Collection
 Stanford University Syllabi
Online 3. Sp10ENGR6201 : Introduction to Optimization. 2010 Spring [2010]
 Stanford University. Department of Engineering (Sponsor)
 Stanford (Calif.), 2010
 Description
 Book — 1 text file
 Summary

Formulation and analysis of linear optimization problems. Solution using Excel Solver. Polyhedral geometry and duality theory. Applications to contingent claims analysis, production scheduling, pattern recognition, two-player zero-sum games, and network flows. Prerequisite: CME 100 or MATH 51.
 Collection
 Stanford University Syllabi
 Yan, Xiang (Author)
 2006-06-01
 Description
 Book
 Summary

In this paper, we develop an algorithm that optimizes logarithmic utility in pairs trading. We assume price processes for two assets, with transaction cost linear with respect to the rate of change in portfolio weights. We then solve the optimization problem via a linear programming approach to approximate dynamic programming. Our simulation results show that when asset price volatility and transaction cost are sufficiently high, our ADP strategy offers significant benefits over the chosen baseline strategy. Our baseline strategy is an optimized version of a pairs trading heuristic studied in the literature.
 Collection
 Undergraduate Theses, School of Engineering
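The "linear programming approach to approximate dynamic programming" mentioned in the abstract builds on the exact LP formulation of a Markov decision process: minimize the sum of values subject to the Bellman inequalities. A toy sketch of the exact version, with illustrative numbers rather than the thesis's trading model:

```python
import numpy as np
from scipy.optimize import linprog

# Tiny 2-state, 2-action MDP (hypothetical numbers):
# minimize sum_s V(s)  s.t.  V(s) >= R(s,a) + gamma * sum_s' P(s'|s,a) V(s')
gamma = 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s][a] = next-state distribution
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0], [0.5, 2.0]])    # R[s][a] = immediate reward

A_ub, b_ub = [], []
for s in range(2):
    for a in range(2):
        row = gamma * P[s][a].copy()
        row[s] -= 1.0            # encodes (gamma*P - I) V <= -R, i.e. V >= R + gamma*P V
        A_ub.append(row)
        b_ub.append(-R[s][a])

res = linprog(np.ones(2), A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * 2)
V = res.x
print(V)  # the optimal value function of the toy MDP
```

The ADP variant restricts V to a low-dimensional basis, which is what makes the approach scale to problems like the paper's pairs-trading formulation.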
5. A Tutorial on Thompson Sampling [2018]
 Russo, Daniel J., author.
 Hanover, MA : Now Publishers Inc., 2018
 Description
 Book — 1 online resource
 Summary

 1. Introduction
 2. Greedy Decisions
 3. Thompson Sampling for the Bernoulli Bandit
 4. General Thompson Sampling
 5. Approximations
 6. Practical Modeling Considerations
 7. Further Examples
 8. Why it Works, When it Fails, and Alternative Approaches
 Acknowledgements
 References.
 (source: Nielsen Book Data)
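The Bernoulli-bandit case covered in chapter 3 of the tutorial has a particularly compact implementation: keep a Beta posterior per arm, draw one sample from each posterior, and pull the arm with the largest draw. A minimal sketch, not taken from the book itself:

```python
import random

def thompson_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling: maintain a Beta(s+1, f+1)
    posterior per arm, sample from each, and play the argmax."""
    rng = random.Random(seed)
    k = len(true_means)
    succ, fail = [0] * k, [0] * k
    total = 0
    for _ in range(horizon):
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_means[arm] else 0
        succ[arm] += reward
        fail[arm] += 1 - reward
        total += reward
    return total, succ, fail

total, succ, fail = thompson_bernoulli([0.2, 0.5, 0.8], horizon=2000)
print(total / 2000)  # average reward approaches the best arm's mean, 0.8
```

Because arms are chosen by posterior sampling rather than greedily, the algorithm keeps exploring arms that might still be best, which is the contrast with the greedy decisions of chapter 2.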
Online 6. F11MSandE11101 : Introduction to Optimization. 2011 Fall [2011]
 Stanford University. Department of Management Science and Engineering (Sponsor)
 Stanford (Calif.), 2011
 Description
 Book — 1 text file
 Summary

Formulation and analysis of linear optimization problems. Solution using Excel Solver. Polyhedral geometry and duality theory. Applications to contingent claims analysis, production scheduling, pattern recognition, two-player zero-sum games, and network flows. Prerequisite: CME 100 or MATH 51.
 Collection
 Stanford University Syllabi
Online 7. F10MSandE11101 : Introduction to Optimization. 2010 Fall [2010]
 Stanford University. Department of Management Science and Engineering (Sponsor)
 Stanford (Calif.), 2010
 Description
 Book — 1 text file
 Summary

Formulation and analysis of linear optimization problems. Solution using Excel Solver. Polyhedral geometry and duality theory. Applications to contingent claims analysis, production scheduling, pattern recognition, two-player zero-sum games, and network flows. Prerequisite: CME 100 or MATH 51.
 Collection
 Stanford University Syllabi
Online 8. Sp10MSandE11101 : Introduction to Optimization. 2010 Spring [2010]
 Stanford University. Department of Management Science and Engineering (Sponsor)
 Stanford (Calif.), 2010
 Description
 Book — 1 text file
 Summary

Formulation and analysis of linear optimization problems. Solution using Excel Solver. Polyhedral geometry and duality theory. Applications to contingent claims analysis, production scheduling, pattern recognition, two-player zero-sum games, and network flows. Prerequisite: CME 100 or MATH 51.
 Collection
 Stanford University Syllabi
 Farias, Vivek Francis.
 2007.
 Description
 Book — xiii, 99 p.
 Online

 Search ProQuest Dissertations & Theses. Not all titles available.
 Google Books (Full view)
SAL3 (off-campus storage), Special Collections
SAL3 (off-campus storage)  Status 

Stacks  Request (opens in new tab) 
3781 2007 F  Available 
Special Collections  Status 

 University Archives  Request on-site access (opens in new tab) 
 3781 2007 F  In-library use 
 Weintraub, Gabriel Y.
 Stanford, CA : Graduate School of Business, Stanford University, [2007]
 Description
 Book — 43 p. : col. ill. ; 28 cm.
 Online
Business Library
Business Library  Status 

Archives: Ask at iDesk  
 HF5006 .S72 NO 1969  In-library use 
 Weintraub, Gabriel Y.
 [Rev.]  Stanford, CA : Graduate School of Business, Stanford University, [2007]
 Description
 Book — 33, [4] p. ; 28 cm.
 Online
Business Library
Business Library  Status 

Archives: Ask at iDesk  
 HF5006 .S72 NO 1919R  In-library use 
 Weintraub, Gabriel Y.
 Stanford, CA : Graduate School of Business, Stanford University, [2005]
 Description
 Book — 57 p. : col. ill. ; 28 cm.
 Online
Business Library
Business Library  Status 

Archives: Ask at iDesk  
 HF5006 .S72 NO.1919  In-library use 
Online 13. Over-the-air statistical estimation [2021]
 Lee, Chuan Zheng, author.
 [Stanford, California] : [Stanford University], 2021
 Description
 Book — 1 online resource
 Summary

The data fueling today's rise in machine learning is often generated by devices at the edge of a network, like sensors or mobile devices. To use all this data to train a common model, devices need to communicate something about their own data to a central server. But physical communication channels have limits, and these constraints are increasingly becoming the bottleneck in distributed and federated learning systems. Can we improve such learning algorithms by explicitly incorporating the physical communication layer into their design? We explore this question using a new framework that draws on wireless communication theory and statistical estimation. We propose "analog" estimation schemes that exploit the superposition inherent in multiple-access wireless networks, and analyze their performance. We then consider fundamental limits on how well "digital" schemes, which separate the communication and estimation stages, can possibly do. Comparing the two under several statistical models shows that the analog approach can yield drastic improvements in estimation error over the digital one. We also derive lower bounds for analog schemes that are within a logarithmic factor of our achievability results, and we present experimental results showing that these ideas can translate to performance gains in a federated machine learning context.
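The core idea of the "analog" schemes described above, letting the wireless channel itself perform the sum, can be caricatured in a few lines. A toy sketch under strong simplifying assumptions (perfect synchronization, unit channel gains, additive Gaussian noise; all names are illustrative and not from the dissertation):

```python
import random

def over_the_air_mean(values, noise_std, seed=0):
    """Each device transmits its scalar simultaneously; the
    multiple-access channel physically sums the transmissions,
    so the server receives sum(values) + noise in ONE channel use
    and divides by the number of devices to estimate the mean."""
    rng = random.Random(seed)
    received = sum(values) + rng.gauss(0.0, noise_std)
    return received / len(values)

values = [0.9, 1.1, 1.0, 0.95, 1.05]  # e.g., local gradient coordinates
est = over_the_air_mean(values, noise_std=0.1)
print(est)  # close to the true mean, 1.0
```

The contrast with a "digital" scheme is that each device would otherwise need its own quantized transmission, so the channel cost grows with the number of devices instead of staying at one channel use.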
 Also online at

Online 14. Robust causal inference and machine learning with clinical applications [2020]
 Yadlowsky, Steven, author.
 [Stanford, California] : [Stanford University], 2020
 Description
 Book — 1 online resource
 Summary

As healthcare data becomes increasingly ubiquitous, improving data-driven biomedical research is timely and important. There is a rush to learn from these new sources of data, and to implement research findings into clinical practice. While machine learning methods provide compelling examples of recognizing sophisticated patterns in data, their impact rests heavily on their ability to use data to influence decision making, especially in healthcare. The relationship between machine learning and decision making becomes particularly clear through the lens of causal inference. In general, the harm and benefit attributed to a medical decision depend on the causal treatment effect of the decision in the appropriate population, beyond their baseline risk of poor outcomes. In precision medicine research, the goal is to develop treatment decisions for individual patients by considering the subpopulation of individuals with covariates similar to each patient's. This thesis advances methodology and practice for applying machine learning to learn better decision-making rules that influence clinical practice, and for understanding the fundamental possibilities and limitations of using data to learn to make optimal decisions. First, we develop an approach for personalized treatment effect estimation based on the relative ratio of treatment outcomes. Second, we study when we can trust causal results learned from data, and develop a sensitivity analysis for conditional and average treatment effects to bound the bias created by unobserved confounding. Third, noting that treatment benefit is highly correlated with baseline risk for preventative treatments for atherosclerotic cardiovascular disease (ASCVD), we use machine learning approaches to improve ASCVD risk predictions from longitudinal cohort data that affect clinical prescribing practice, particularly among underrepresented minorities.
 Also online at

Online 15. Efficient exploration in bandit and reinforcement learning [2019]
 Kazerouni, Abbas, author.
 [Stanford, California] : [Stanford University], 2019.
 Description
 Book — 1 online resource.
 Summary

Sequential decision-making problems lie at the core of many real-world applications. In such problems, an agent aims to achieve a certain goal by optimally taking a sequence of actions based on noisy observations. Bandits and reinforcement learning are fundamental frameworks for modeling decision making under uncertainty. Efficient exploration in such problems significantly increases data efficiency by speeding up the learning process and requiring less data for making decisions. As such, it is of utmost importance to design sophisticated exploration schemes based on the special characteristics of each practical problem. In this dissertation, we first consider a safe exploration problem in linear bandits and propose an algorithm that satisfies safety constraints while minimizing regret. We provide theoretical analysis and simulation results to demonstrate the efficiency of the proposed algorithm. Then, we consider the best-arm identification problem in generalized linear bandits and provide a gap-based exploration strategy that achieves the desired accuracy. We also provide an upper bound on the sample complexity of the proposed algorithm and offer numerical studies to evaluate its performance.
 Also online at

Online 16. Sufficient statistics for team decision problems [electronic resource] [2013]
 Wu, Jeffrey.
 2013.
 Description
 Book — 1 online resource.
 Summary

Decentralized control problems involve multiple controllers, each having access to different measurements but working together to optimize a common objective. Despite being extremely difficult to solve in general, a common thread behind the more tractable cases is the identification of sufficient statistics for each controller, i.e., reductions of the measurements for each controller that do not sacrifice optimal performance. These sufficient statistics serve to greatly reduce the controller search space, thus making the problem easier to solve. In this dissertation, we develop for the first time a general theory of sufficient statistics for team decision problems, a fundamental type of decentralized control problem. We give rigorous definitions for team decisions and team sufficient statistics, and show how team decisions based only on these sufficient statistics do not affect optimal performance. In a similar spirit to the Kalman filter, we also show how to gracefully update the team sufficient statistics as the state evolves and additional measurements are collected. Finally, we show how to compute team sufficient statistics for partially nested problems, a large class of team decision problems that tend to have easier solutions. These team sufficient statistics have intuitive and compelling interpretations. We also give general conditions under which these team sufficient statistics can be updated without the state growing in size. To illustrate the results, we give examples for finite-state systems and systems whose variables are jointly Gaussian.
 Also online at

Special Collections
Special Collections  Status 

 University Archives  Request on-site access (opens in new tab) 
 3781 2013 W  In-library use 
Online 17. Adaptive and efficient batch reinforcement learning algorithms [2021]
 Liu, Yao, author.
 [Stanford, California] : [Stanford University], 2021
 Description
 Book — 1 online resource
 Summary

Reinforcement learning (RL) focuses on sequential decision-making in an unknown environment and has achieved many successes in domains with good simulators (Atari, Go, etc.), using hundreds of millions of samples. However, real-world applications of reinforcement learning often cannot afford high-risk online exploration. To bridge this gap, this dissertation investigates how to perform reinforcement learning from an offline dataset, an approach known as batch reinforcement learning. We provide theoretically justified new algorithms, as well as empirical validation on simulated and real datasets. More specifically, this dissertation studies the two main aspects of batch reinforcement learning. How do we evaluate a policy given a fixed dataset collected by other policies? Offline policy evaluation is a challenging counterfactual reasoning problem in these partial-information, sequential settings. Solutions to this problem rely on two types of estimators: direct model-based estimators and importance-reweighting estimators. We propose a model-based estimator with a mean-squared error bound and analyze a variance-reduction heuristic for importance sampling. How do we learn a new policy from a fixed dataset collected by other policies? Learning a policy with function approximation from a fixed dataset suffers from overestimation of counterfactual policies under empirical reward/value maximization. Prior theoretical justifications rely on strong assumptions about the data distribution and are thus less informative for practice. We propose the idea of pessimism to constrain the policy search space and avoid instability. Following this intuition, we propose three new algorithms: the first convergent offline policy gradient method, value function learning with finite-sample error bounds, and a method for avoiding overfitting in direct policy gradient. 
The primary contribution of this dissertation is advancing the foundations of batch RL and developing batch RL algorithms with provable guarantees under realistic assumptions. One of the driving goals is finite-sample error bounds for algorithms in function-approximation settings. To that end, this dissertation makes progress on both policy evaluation and policy learning by studying the theory of these problems and using the theoretical insights to derive new algorithms with strong empirical performance as well.
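The importance-reweighting estimators mentioned in the abstract have a standard textbook form: reweight each trajectory's return by the ratio of action probabilities under the evaluation and behavior policies. A minimal sketch of ordinary importance sampling (not the dissertation's estimator; all names are illustrative):

```python
import random

def is_policy_value(trajectories, pi_e, pi_b, gamma=1.0):
    """Ordinary importance-sampling estimate of an evaluation
    policy's value from trajectories collected under a behavior
    policy. Each trajectory is a list of (state, action, reward);
    pi_e[s][a] and pi_b[s][a] give action probabilities."""
    estimates = []
    for traj in trajectories:
        weight, ret, discount = 1.0, 0.0, 1.0
        for s, a, r in traj:
            weight *= pi_e[s][a] / pi_b[s][a]
            ret += discount * r
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)

# One-step example: behavior is uniform over 2 actions, evaluation
# policy always takes action 1, which yields reward 1.
rng = random.Random(1)
pi_b = {0: [0.5, 0.5]}
pi_e = {0: [0.0, 1.0]}
trajs = []
for _ in range(5000):
    a = rng.randrange(2)
    trajs.append([(0, a, 1.0 if a == 1 else 0.0)])
val = is_policy_value(trajs, pi_e, pi_b)
print(val)  # close to 1.0, the true value of always taking action 1
```

The variance of this weight product grows quickly with horizon, which is why variance-reduction heuristics of the kind the dissertation analyzes matter in practice.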
 Also online at

Online 18. Learning preferences from choices and rankings [2021]
 Seshadri, Arjun, author.
 [Stanford, California] : [Stanford University], 2021
 Description
 Book — 1 online resource
 Summary

A large and growing experimental literature has shown that individual choices and judgements can be affected by irrelevant aspects of the context in which they are made. Despite these findings, much of the existing modeling work in preference learning still relies on the simplifying assumption that choices come from the maximization of a stable utility function. In this dissertation, we discuss our progress in tractably modeling violations of utility-based reasoning in choices and rankings at scale. First, we describe the context-dependent random utility model (CDM), our choice model that captures a broad class of context effects while remaining inferentially tractable. Second, we consider testing whether violations of a popular notion of rationality, the Independence of Irrelevant Alternatives (IIA), exist in practice. Our work contributes effective methods for testing IIA and characterizes the fundamental statistical limitations of doing so. Third, we show how our advances in choice modeling can be leveraged to develop the contextual repeated selection (CRS) model of ranking, a model that brings a natural multimodality and richness to the rankings space along with strong statistical guarantees.
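The IIA property that the abstract tests is easiest to see in the multinomial logit model, where the odds between two alternatives are unaffected by what else is offered. A small illustration (generic textbook model, not the dissertation's CDM; the utilities are made up):

```python
import math

def mnl_probs(utilities, choice_set):
    """Multinomial logit choice probabilities over a choice set.
    Under MNL, the ratio P(a)/P(b) is identical in every set that
    contains both a and b -- exactly the IIA property."""
    expu = {i: math.exp(utilities[i]) for i in choice_set}
    z = sum(expu.values())
    return {i: expu[i] / z for i in choice_set}

u = {"a": 1.0, "b": 0.0, "c": 0.5}
p_small = mnl_probs(u, ["a", "b"])
p_large = mnl_probs(u, ["a", "b", "c"])
print(p_small["a"] / p_small["b"], p_large["a"] / p_large["b"])  # equal odds
```

Context effects are precisely departures from this invariance: adding "c" changes the a-vs-b odds in real choice data, which is what models like the CDM are built to capture.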
 Also online at

Online 19. Problems, models, and algorithms in data-driven energy demand management [electronic resource] [2014]
 Albert, Adrian.
 2014.
 Description
 Book — 1 online resource.
 Summary

A compelling vision for the electricity grid of the 21st century is that of a highly instrumented system that integrates distributed generation from renewable and conventional sources, where superior monitoring allows a targeted, localized, dynamic matching of demand and supply while maintaining a high degree of overall stability. To better monitor demand, utilities have recently deployed massive advanced sensing infrastructure (smart meters) to collect energy consumption data at fine (sub-hourly) time scales from large consumer populations; thus, there is an urgent need to formalize the new problems and develop the appropriate models, scalable algorithms, and methodologies that can leverage this new information to improve grid operations. The key tension in shaping demand is that while the benefits of demand-side management programs are realized in the aggregate (over many consumers), consumption change happens at the level of the individual consumer. As such, incentive schemes (e.g., dynamic pricing) that aim to change certain aspects of the average consumer's consumption may not be optimal for any particular real consumer. Thus, the perspective this thesis takes is that of data-driven energy program targeting, i.e., using smart meter readings to identify high-potential types of consumers for certain demand-response and energy-efficiency programs, and designing tailored controls and incentives to improve their usage behavior. This is as much a computational and engineering problem as a management and marketing one. The central contribution of this thesis is methodology for quantifying uncertainty in individual energy consumption, and relating it to the potential for flexibility in the design and operation of certain demand-side programs. 
In particular, three algorithmic and modeling contributions are presented, motivated by the question of comparing and benchmarking the impact and potential of individual consumers in providing flexibility for demand-side management. First, it is noted that individual consumption is empirically observed to be highly volatile; as such, no matter how good a predictive model, part of consumption will remain uncertain. Here, this variability is shown to be related to the stress each consumer places on the grid (through their respective cost-of-service); moreover, a scalable clustering algorithm is proposed to uncover patterns in variability as encoded in typical distribution functions of consumption. Second, a model of individual consumption is proposed that interprets smart meter readings as the observed outcome of latent, temperature-driven decisions to use either heating, air conditioning, or no HVAC at all; algorithms for learning such response models are introduced that are based on the maximum likelihood estimation framework. The dynamic consumption model is validated experimentally by emphasizing the intended end-use of statistical modeling when comparing with ground-truth data. A third methodological contribution leverages the statistical description of individual consumer response to weather to derive normative, tailored control schedules for thermally sensitive appliances. These actions are optimal in the sense that they both satisfy individual effort constraints and contribute to reducing uncertainty in the aggregate over a large population. In addition to the algorithmic and modeling contributions, this thesis presents at length the application of the methods developed here to realistic situations of segmenting and targeting large populations of consumers for demand-side programs. 
We illustrate our models and algorithms on a variety of data sets consisting of heterogeneous sources (electricity usage, weather information, consumer attributes) and of various sizes, from a few hundred households in Austin, TX to 120,000 households in Northern California. We validate our dynamic consumption model experimentally, emphasizing the end purpose of decisions made using the outcome of the statistical representation of consumption. Finally, we discuss the two sides of the data coin (increased effectiveness in program management versus potential loss of consumer privacy) in an experimental study in which we argue that certain patterns in consumption extracted from smart meter data may in some cases aid in predicting relevant consumer attributes (such as the presence of large appliances, or lifestyle factors such as employment or children), but not many others. This, in turn, can enable the program administrator or marketer to target those consumers whose actual data indicates that they might respond to the program, and may contribute to the debate on what consumers unwillingly reveal about themselves when using energy.
 Also online at

Special Collections
Special Collections  Status 

 University Archives  Request on-site access (opens in new tab) 
 3781 2014 A  In-library use 
Online 20. Design and implementation of stochastic control policies via convex optimization [electronic resource] [2012]
 Wang, Yang.
 2012.
 Description
 Book — 1 online resource.
 Summary

In this dissertation we consider the design and implementation of control policies for stochastic control problems with arbitrary dynamics, objective, and constraints. In some very special cases, these problems can be solved analytically. For instance, when the dynamics are linear and the objective is quadratic, the optimal control policy is linear state feedback. Another simple case where the optimal policy can be computed exactly is when the state and action spaces are finite, in which case methods such as value iteration or policy iteration can be used. When the state and action spaces are infinite but low dimensional, the optimal control problem can be solved by gridding or other discretization methods. In general, however, the optimal control policy cannot be tractably computed. In such situations, there are many methods for finding suboptimal controllers that hopefully achieve a small objective value. One particular method we will discuss in detail is approximate dynamic programming (ADP), which relies on an expression for the optimal policy in terms of the value function of the stochastic control problem. In ADP we use the same expression as the optimal policy, but replace the true value function with an approximation. Another widely used suboptimal policy is receding horizon control (RHC), also known as model predictive control (MPC). In MPC, we solve an optimization problem at each time step to determine a plan of action over a fixed time horizon, and then apply the first input from the plan. At the next time step we repeat the planning process, solving a new optimization problem with the time horizon shifted one step forward. In the design of policies such as these, we must choose parameters, such as approximate value functions, terminal costs, and time horizon, to achieve good performance. Ideally, we would like to be able to compare the performance of a suboptimal controller with the optimal performance, which we cannot compute. 
In this dissertation, we describe a general method for obtaining a function that lower-bounds the value function of a stochastic control problem. Our method yields a numerical lower bound on the optimal objective value, as well as a value function underestimator that can be used as a parameter for ADP or MPC. We can then compare our bound to the performance achieved by our policies. If the gap between the two is small, we can conclude that the policies are nearly optimal and the bound is nearly tight. Thus, our method simultaneously yields suboptimal policy designs, as well as a way to certify their performance. Our underestimator/bound is non-generic, in the sense that it does not simply depend on problem dimensions and some basic assumptions about the problem data. Instead, it is computed (numerically) for each specific problem instance. We will see that for many problem families, our method is based on solving a convex optimization problem, thus avoiding the 'curses of dimensionality' usually associated with dynamic programming. One drawback of the suboptimal policies we design is that an optimization problem must be solved at each time step to determine the input to apply to the system. Using conventional optimization solvers this can take seconds, if not minutes. Thus, applications of these policies have traditionally been limited to systems with relatively slow dynamics, with sampling times measured in seconds, minutes, or hours. In the second part of this dissertation, we outline a collection of optimization methods that exploit the particular structure of the control problem. Our custom methods are up to around 1000 times faster than generic optimization packages such as SDPT3 or SeDuMi. These advances, combined with ever-increasing computing power, extend the application of optimization-based policies to a wide range of applications, including those with millisecond or microsecond sampling periods.
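For the finite state/action case the abstract mentions, value iteration is the standard exact method: repeatedly apply the Bellman optimality operator until the values stop changing. A minimal sketch on a hypothetical two-state example:

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Exact value iteration for a finite MDP.
    P[s][a] is a list of (prob, next_state) pairs; R[s][a] is a reward."""
    n = len(P)
    V = [0.0] * n
    while True:
        V_new = []
        for s in range(n):
            V_new.append(max(
                R[s][a] + gamma * sum(p * V[t] for p, t in P[s][a])
                for a in range(len(P[s]))))
        if max(abs(v - w) for v, w in zip(V, V_new)) < tol:
            return V_new
        V = V_new

# Two-state example: action 0 stays put (reward 0 in state 0, 1 in state 1),
# action 1 jumps to the other state with reward 0.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0], [1.0, 0.0]]
V = value_iteration(P, R)
print(V)  # state 1 is worth 1/(1-0.9) = 10; state 0 is worth 0.9 * 10 = 9
```

ADP keeps the same argmax-over-actions policy expression but swaps the exact V for a cheap approximation, which is what makes the approach viable when the state space is too large to enumerate as above.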
 Also online at

Special Collections
Special Collections  Status 

 University Archives  Request on-site access (opens in new tab) 
 3781 2012 W  In-library use 