Tagged: Using

  • jkabtech 12:17 pm on January 14, 2018 Permalink | Reply
    Tags: Salad, Spoon, Using

    Make Tuna Salad Faster by Using a Spoon Instead of a Fork 

     Melissa Kirsch | Friday 3:22pm | Filed to: time saving, tuna fish, kitchen hacks

    View the Original article

     
  • jkabtech 12:17 pm on January 3, 2018 Permalink | Reply
    Tags: Paper, Toilet-Seat, Using, Wrong

    You're Using That Paper Toilet-Seat Cover Wrong 

     Leigh Anderson | Friday 9:30am | Filed to: Toilets, Hygiene, Personal Hygiene

    View the Original article

     
  • jkabtech 4:17 am on November 24, 2017 Permalink | Reply
    Tags: Using, visual

    eBay launches visual search tools that let you shop using photos 

    eBay today is launching two new visual search tools that will allow online shoppers to use photos they snap, have saved on their phone, or even those they find while browsing the web or other social networking sites, in order to find matching products from eBay’s catalog. The tools, Image Search and Find it on eBay, leverage advancements in computer vision and deep learning, including the use of neural networks, the company notes.

    These tools were originally announced this July, with plans to launch in the fall.

    The first tool, Image Search, allows mobile consumers to take a photo of something they want to buy or use an image saved to their phone’s Camera Roll in order to shop eBay. The website will then return listings of items that are either a close match or at least visually similar to the product you’ve photographed.

    View the Original article

     
  • jkabtech 4:17 am on November 13, 2017 Permalink | Reply
    Tags: Distributed, Hivemind, Using

    Show HN: Hivemind – Distributed jobs using AWS Lambda functions 

     GitHub – littlstar/hivemind: For creating distributed jobs using AWS Lambda functions

    View the Original article

     
  • jkabtech 4:17 am on October 11, 2017 Permalink | Reply
    Tags: Beginner's, Strap-On, Using

    The Beginner's Guide to Using a Strap-On 

    Illustration by Jim Cooke.

    One of the great things about strap-ons is that anyone can use them. If you have a penis, you can use a strap-on. If you don’t have a penis, you can use a strap-on. You can use a strap-on to penetrate a partner of any gender, to subvert stereotypical gender roles or to play with those roles. You can use one for fun, or you can use one to manage erectile or orgasmic challenges. You can use a strap-on for vaginal sex, anal sex, oral sex, manual sex, or masturbation.

    If you’re interested in trying it out, here’s your beginner’s guide.

    View the Original article

     
  • jkabtech 8:17 pm on October 10, 2017 Permalink | Reply
    Tags: Actually, Using

    Actually Using Your Phone Might Make Work Easier 

    Image credit: Fabio Sola Penna/Flickr

     Who prefers phone calls over emails? No one, that’s who. If making phone calls for work elicits a bit of anxiety, you’re not alone. TrackMaven CEO Allen Gannett also preferred texting and email over good old-fashioned phone calls, but decided to emulate the traits of more productive people and turn to the tried and true telephone instead of the impersonal email.

    Connecting with customers or clients on the phone might be more helpful if you’re trying to discern how they actually feel about something. It’s notoriously difficult to decipher emotions in texts and emails; a 2005 study revealed recipients of either serious or sarcastic emails were able to identify the tone only 56% of the time.

    Advertisement

     Gannett became more phone-friendly by responding to emails with a request for a phone call instead. He also kept a call list of people he needed to reach out to over the course of the week. The result?

    View the Original article

     
  • jkabtech 5:51 pm on July 24, 2017 Permalink | Reply
    Tags: efficiently, Using

    Using chip memory more efficiently 


    View the Original article

     
  • jkabtech 5:51 pm on July 23, 2017 Permalink | Reply
    Tags: construct, Regex, Using

    Show HN: Using functions to construct Regex in Python 

     GitHub – iogf/crocs: Write regex using pure python class/function syntax and test it better. (Regex for humans)

    View the Original article

     
  • jkabtech 3:17 pm on January 19, 2016 Permalink | Reply
    Tags: Numpy, Replacement, stdndslice, Using   

    Using D and std.ndslice as a Numpy Replacement 

    Published January 2, 2016

    Disclosure: I am writing this article from a biased perspective. I have been writing Python for six years, three professionally, and have written a book on Python. But, I have been writing D for the past eight months and four months ago I started contributing to D’s standard library. I also served as the review manager for std.ndslice’s inclusion into the standard library.

     Today, the new addition to D’s standard library, std.ndslice, was merged into master and will be included in the next D release (v2.070, due this month). std.ndslice is a multidimensional array implementation, not unlike Numpy, with very low overhead, as it’s based on D’s concept of ranges, which avoids a lot of copying and allows for lazy generation of data. In this article, I will show some of the advantages std.ndslice has over Numpy and why you should consider D for your next numerical project.

    This article is written for Numpy users who might be interested in using D. So while it will cover some D basics, D veterans can learn something about std.ndslice as well.

    Simply put, if you write your numerical code in D, it will be much, much faster while retaining code readability and programmer productivity.

    This section is mainly for D newbies. If you already know D, I suggest you skim the code and then head straight for the Getting Hands On section.

    To give you a quick taste of the library before diving in, the following code will take the numbers 0 through 999 using the iota function (acts like Python’s xrange) and return a 5x5x40 three dimensional range.

     import std.range : iota;
     import std.experimental.ndslice;

     void main() {
         auto slice = sliced(iota(1000), 5, 5, 40);
     }

    D is statically typed, but for the sake of simplicity, this article will use D’s type deduction with auto. The sliced function is just a factory function that returns a multidimensional slice. The sliced factory function can also accept regular arrays, as they are ranges as well. So now we have a 5x5x40 cube with the numbers 0 through 999.

    A range is a common abstraction of any sequence of values. A range is any type (so a class or struct) which provides the functions front, which returns the next value in the sequence, popFront which moves the sequence to the next value, and empty, which returns a boolean determining if the sequence is empty or not. Ranges can either generate their values as they are called, lazy, or have a sequence of values already and just provide an interface to those values, eager.
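     Since this article targets Python users, here is a rough Python sketch of that protocol. The class and method names are mine, mirroring D's front/popFront/empty; this is an analogy, not a real D or Python API.

```python
class Iota:
    """Lazily yields 0 .. n-1, like D's iota, via the input-range protocol."""
    def __init__(self, n):
        self._i = 0
        self._n = n

    @property
    def empty(self):
        # True once the sequence is exhausted
        return self._i >= self._n

    @property
    def front(self):
        # the value at the front of the sequence; does not advance
        return self._i

    def pop_front(self):
        # advance to the next value
        self._i += 1

r = Iota(3)
values = []
while not r.empty:
    values.append(r.front)
    r.pop_front()
print(values)  # [0, 1, 2]
```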

    For a more in depth look at ranges, see The official D tutorial’s section on ranges

     Look Ma, no allocations! This is because iota returns a lazy input range, and sliced returns a Slice (the struct that lies at the heart of std.ndslice) that acts as a wrapper around the iota range and modifies the underlying data as it’s accessed. So, when the data in the slice is accessed, the Slice range pulls values from iota, which are lazily generated, and determines which dimension each value is in and how it will be returned to the user.

    iota -> slice -> user accessing the data

    So, std.ndslice is a bit different in concept than Numpy. Numpy creates its own type of arrays while std.ndslice provides a view of existing data. The composition of ranges to create something completely new is the basis of ranged-based code, and is one of the reasons D is so powerful. It allows you to make programs where the values returned are like parts on an assembly line, going from station to station, only to be assembled at the very end to avoid unnecessary allocations. This will be important to remember when the performance benchmarks are compared.
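     The same assembly-line composition can be sketched with Python generators, which also compose lazily (a loose analogy to D ranges, not a translation):

```python
# Each stage is lazy; nothing is materialized until the final sum().
values = range(1000)                       # lazy source, like iota
evens = (v for v in values if v % 2 == 0)  # lazy filter stage
scaled = (v * 2 for v in evens)            # lazy map stage
result = sum(scaled)                       # one pass, no intermediate lists
print(result)  # 499000
```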

     The classic example of this is the following code, which takes input from stdin, keeps only the unique lines, sorts them, and writes them back to stdout:

     import std.stdio;
     import std.array;
     import std.algorithm;

     void main() {
         stdin // get stdin as a range
             .byLine(KeepTerminator.yes)
             .uniq
             // stdin is immutable, so we need a copy
             .map!(a => a.idup)
             .array
             .sort
             // stdout.lockingTextWriter() is an output range, meaning values can be
             // inserted into it, which in this case will be sent to stdout
             .copy(stdout.lockingTextWriter());
     }

     For an advanced look at lazy generation with ranges, see H. S. Teoh’s article Component programming with ranges, in which he writes a calendar program with ranges (that sits entirely on the stack!).

    Because slice is three dimensional, it is a range which returns ranges of ranges. This can easily be seen by looping over the values:

     import std.range : iota;
     import std.stdio : writeln;
     import std.experimental.ndslice;

     void main() {
         auto slice = sliced(iota(1000), 5, 5, 40);
         foreach (item; slice) {
             writeln(item);
         }
     }

    Which outputs something like this (shortened for brevity)

     [[0, 1, … 38, 39], [40, 41, … 78, 79], [80, 81, … 118, 119], [120, 121, … 158, 159], [160, 161, … 198, 199]]
     …
     [[800, 801, … 838, 839], [840, 841, … 878, 879], [880, 881, … 918, 919], [920, 921, … 958, 959], [960, 961, … 998, 999]]

     The foreach loop in D is much like the for loop in Python. The difference is that D gives you the option of C-style loops and Python-style loops (using for and foreach respectively) without having to use workarounds like enumerate or xrange in the loop.

    Using Uniform Function Call Syntax (UFCS), the original example can be rewritten as the following:

     import std.range : iota;
     import std.experimental.ndslice;

     void main() {
         auto slice = 1000.iota.sliced(5, 5, 40);
     }

    UFCS transforms the call

    a.func(b)

    to

    func(a, b)

    if a doesn’t have a method named func.

    UFCS makes generative range-based code easier to follow, so it will be used in the rest of the examples in this article. For a primer on UFCS and why it was made, see this article by Walter Bright, D’s creator.

     If you don’t want to follow along and play around with std.ndslice, skip to the next section. There are two ways to get your hands on std.ndslice: use digger to download and build the head of the master branch of DMD, the reference D compiler, or use dub, D’s official package manager/build system, to download the dub version.

     This article will cover the dub path, as using digger to get the latest executable is well explained on its GitHub page. Download dub from the above link or use the instructions on the same page to get it using your package manager of choice.

     Once you have dub, create a new directory with a new file called dub.json, which is dub’s config file. I will not explain the dub.json format here, as there is a tutorial for that here; if you just want to follow along, copy and paste the following code:

     {
         "name": "test",
         "sourcePaths": ["."],
         "dependencies": {
             "dip80-ndslice": "~>0.8.7"
         },
         "targetType": "executable"
     }

     This configuration tells dub that your project, named test, lies in the current directory, will be compiled to an executable, and requires the package “dip80-ndslice” (a DIP is a D Improvement Proposal, much like a PEP). Now, in a new file called main.d, we can import std.ndslice:

     import std.experimental.ndslice;

     void main() {}

    Why the std.experimental? For those of you who are not familiar with the process, all new modules in the D standard library must wait in a staging area, std.experimental, before going into the main namespace. This is to allow people to test new modules and find any bugs that were overlooked during the review process while signaling that the code is not quite ready for prime time.

    To build and run this project, use dub with no arguments

    $ dub

     std.ndslice has many of the same functions that Numpy has. In the following two sections, I could just provide some simple examples of Numpy and translate them, but halfway through writing that I realized anyone could find that out themselves by reading the documentation, so this section is designed to whet your appetite. To read the docs for std.ndslice and see the function equivalents, click here.

    Translating multidimensional slicing from Numpy to std.ndslice is very simple. The example

     a = numpy.arange(1000).reshape((5, 5, 40))
     b = a[2:-1, 1, 10:20]

    is equivalent to

     auto a = 1000.iota.sliced(5, 5, 40);
     auto b = a[2 .. $, 1, 10 .. 20];

     The main difference is D’s use of $ as a symbol for the range’s length. Any Numpy slicing code can be translated to std.ndslice with no problem.
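     As a quick check, the Numpy slice above can be reproduced with plain nested lists (no Numpy needed). One boundary detail worth noting: Python’s 2:-1 stops before the last plane, while D’s 2 .. $ runs to the end, so the two snippets actually differ by one plane; the sketch below follows the Python version.

```python
# Reproduce a[2:-1, 1, 10:20] on a 5x5x40 cube built from plain lists.
data = list(range(1000))
# reshape (5, 5, 40): flat index of (i, j, k) is (i*5 + j)*40 + k
cube = [[[data[(i * 5 + j) * 40 + k] for k in range(40)]
         for j in range(5)]
        for i in range(5)]
# a[2:-1, 1, 10:20] -> planes 2..3, row 1, columns 10..19
b = [cube[i][1][10:20] for i in range(2, 4)]
print(len(b), len(b[0]))  # 2 10
print(b[0][0])            # plane 2, row 1, column 10 -> (2*5 + 1)*40 + 10 = 450
```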

     So let’s look at something a bit more involved. Let’s take a 2D array and get an array of the means of each of its columns.

     Python:

     import numpy
     data = numpy.arange(100000).reshape((100, 1000))
     means = numpy.mean(data, axis=0)

     D:

     import std.range;
     import std.algorithm.iteration;
     import std.experimental.ndslice;
     import std.array : array;

     void main() {
         auto means = 100_000.iota
             .sliced(100, 1000)
             .transposed
             .map!(r => sum(r) / r.length)
             .array;
     }

     To make this comparison apples to apples, I forced execution of the result in order to get a D array at the end by appending array. If I had not done that, the final D result would be a lazy input range rather than a D array, which would be unfair to Numpy, as the Numpy code outputs an array at the end. In a normal D program, however, the results would not be computed until they are used by another part of the program. Also, D doesn’t have any stats functions in its standard library (yet; it’s being worked on), so this example uses a simple lambda function for the mean. In the map function call, you may have noticed the ! in front of the parentheses. This denotes a compile-time function argument rather than a run-time argument: the compiler generates the map function code based on the lambda function.
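     For readers following along without a D compiler, the transpose-then-average shape of the computation can be sketched in plain Python on a smaller matrix (a loose analogy to the D pipeline, not a translation of it):

```python
# Column means of a 4x5 matrix, computed the same way as the D code:
# transpose, then take the mean of each resulting row.
rows, cols = 4, 5
data = [[r * cols + c for c in range(cols)] for r in range(rows)]
transposed = list(zip(*data))                    # like .transposed
means = [sum(col) / len(col) for col in transposed]  # like .map!(r => sum(r) / r.length)
print(means)  # [7.5, 8.5, 9.5, 10.5, 11.5]
```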

    As a quick aside, this example also illustrates something Walter Bright said about D in his 2014 Dconf talk:

    No [explicit] loops. Loops in your program are bugs.

     The reason the D code is much more verbose than the Python is that the map function with the mean lambda works on any sequence of values that conforms to the concept of a finite input range (duck typing), whereas the Python version uses a special Numpy function that only works on Numpy arrays. I will elaborate on this point, and why I believe the D version is better, in the section titled Numpy’s Main Problem, and How D Avoids It.

    But despite the D code’s length, it is way faster.

     These numbers were recorded on a 2015 MacBook Pro with a 2.9 GHz Intel Core Broadwell i5. For Python, I used IPython’s %timeit functionality to get a fair time, and made sure to only test the numpy.mean line in the Python code in order to not measure Numpy’s known slow initialization times. For the D code, I used std.datetime.benchmark with 10000 tests and took the mean of the results. The D code was compiled with LDC, the LLVM-based D compiler, v0.17.0 alpha 1 (built with LLVM 3.6), using ldmd2 -release -inline -boundscheck=off -O. For those of you using dub, that is equivalent to dub --build=release-nobounds --compiler=ldmd2.

     Results:

     Python: 145 µs
     LDC: 5 µs

     D is 29x faster.

    Not bad, considering the above D code uses the often loathed D GC in order to allocate the new array, and the fact that the vast majority of Numpy is written in C. To quote Walter Bright once again:

    There really is no reason your D code can’t be as fast or faster than C or C++ code.

     Numpy is fast. Compared to regular array handling in Python, Numpy is several orders of magnitude faster. But therein lies the problem: normal Python and Numpy code don’t mix well.

     Numpy lives in its own world, with its own functions and ways of handling values and types. For example, when using a non-Numpy API, or functions that return regular arrays, you either have to use the normal Python functions (slow) or use np.asarray, which copies the data into a new variable (also slow). A quick search on GitHub shows just how widespread this issue is, with 139,151 results. Granted, some of those are misuses of Numpy, where array literals could be directly passed to the function in order to avoid copies, but many aren’t. And this is just open source code! I have seen this pattern many times in closed source projects, where it can’t be avoided save rewriting large parts of an existing code base.

    Another example of this problem is the amount of Python standard library functions that had to be rewritten in Numpy to take advantage of the type information. Examples include:

     No, not all of the functions in those links have standard library equivalents, but there are enough of them to start asking questions about DRY.

     The problem with this duplication is, again, having to switch contexts between Python and Numpy code. Accidentally writing

    sum(a)

    instead of

    a.sum()

    and whoops, your code is 10x slower.

    The root cause of the above problems is that Python code can only be made so fast, so the Numpy developers tried to make an unholy match of Python and a type system using a ton of C code.

     D is a compiled, statically and strongly typed language to begin with. Its code generation already takes advantage of type information with arrays and ranges. With std.ndslice, you can use the entire std.algorithm and std.range libraries with no problems. No code had to be reworked or rewritten to accommodate std.ndslice. And, as a testament to D’s code generation abilities with its templates, std.ndslice is entirely a library solution: there were no changes to the compiler or runtime, and only D code is used.

    Using the sum example above:

     import std.range : iota;
     import std.algorithm.iteration : sum;
     import std.experimental.ndslice;

     void main() {
         auto slice = 1000.iota.sliced(5, 5, 40);
         // sum expects an input range of numerical values, so to get one
         // we call std.experimental.ndslice.byElement to get the unwound range
         auto result = slice.byElement.sum;
     }

    This code is using the same sum function that every other piece of D code uses, in the same way you use it every other time.

    As another example, the Pythonic way to get a list of a specified length that is initialized with a certain value is to write

    a = [0] * 1000

    but Numpy has a special function for that

    a = numpy.zeros((1000))

    and if you don’t use it your code is four times slower (in this case) not even counting the copying you would have to do with numpy.asarray in the first example. In D, to get a range of a specified length initialized with a certain value you write

    auto a = repeat(0, 1000).array;

    and to get the ndslice of that

    auto a = repeat(0, 1000).array.sliced(5, 5, 40);

     Where Numpy really shines is the large number of libraries that are built with it. Numpy is used in tons of open source financial and machine learning libraries, so if you just use those libraries, you can write fast numerical programs in Python. Numpy also has tons of tutorials, books, and examples on the Internet for people to learn from.

    But, this isn’t exactly a fair comparison in my opinion, as it could be argued that std.ndslice isn’t actually released yet, as it’s still in std.experimental. Also, this is already starting to change, as ndslice’s author, Ilya Yaroshenko, has stated his next project is writing a std.blas for D, completely in D using std.ndslice.

    The following example and explanation was written by Ilya Yaroshenko, the author of std.ndslice, who was gracious enough to let me include it in this article. I have reworded and expanded in some places. This example uses more complicated D code, so don’t worry if you don’t understand everything.

     Now that you have a more thorough understanding of the library, this will be a more advanced example. This code is a median image filter, as well as the command line interface for the resulting program. The function movingWindowByChannel can also be used with other filters that take a sliding window as an argument, in particular with convolution matrices such as the Sobel operator.

    movingWindowByChannel iterates over an image in sliding window mode. Each window is transferred to a filter, which calculates the value of the pixel that corresponds to the given window.

     This function does not calculate border cases, in which a window overlaps the image only partially. However, the function can still be used to carry out such calculations. That can be done by creating an amplified image, with the edges reflected from the original image, and then applying the given function to the new image.

     /**
     Params:
         filter = unary function. Dimension window 2D is the argument.
         image = image dimensions `(h, w, c)`, where c is the number of channels in the image
         nr = number of rows in the window
         nc = number of columns in the window
     Returns:
         image dimensions `(h - nr + 1, w - nc + 1, c)`, where c is the number of
         channels in the image. Dense data layout is guaranteed.
     */
     Slice!(3, C*) movingWindowByChannel(alias filter, C)(Slice!(3, C*) image, size_t nr, size_t nc)
     {
         // local imports in D work much like Python's local imports,
         // meaning if your code never runs this function, these will
         // never be imported because this function wasn't compiled
         import std.algorithm.iteration : map;
         import std.array : array;

         // 0. 3D
         // The last dimension represents the color channel.
         auto wnds = image
             // 1. 2D composed of 1D
             // Packs the last dimension.
             .pack!1
             // 2. 2D composed of 2D composed of 1D
             // Splits image into overlapping windows.
             .windows(nr, nc)
             // 3. 5D
             // Unpacks the windows.
             .unpack
             // 4. 5D
             // Brings the color channel dimension to the third position.
             .transposed!(0, 1, 4)
             // 5. 3D composed of 2D
             // Packs the last two dimensions.
             .pack!2;

         return wnds
             // 6. Range composed of 2D
             // Gathers all windows in the range.
             .byElement
             // 7. Range composed of pixels
             // 2D to pixel lazy conversion.
             .map!filter
             // 8. `C[]`
             // The only memory allocation in this function.
             .array
             // 9. 3D
             // Returns slice with corresponding shape.
             .sliced(wnds.shape);
     }
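     To make the sliding-window idea concrete without a D toolchain, here is a hedged pure-Python sketch of the same computation for a single-channel image (the function name and layout are mine, not from the D code): slide an nr x nc window over the image and replace each interior pixel with the window median, dropping the borders exactly as the D version does.

```python
import statistics

def moving_window_median(image, nr, nc):
    """Median filter over a 2D list-of-lists image; borders are dropped."""
    h, w = len(image), len(image[0])
    out = []
    for i in range(h - nr + 1):
        row = []
        for j in range(w - nc + 1):
            # gather the nr x nc window anchored at (i, j)
            window = [image[i + di][j + dj]
                      for di in range(nr) for dj in range(nc)]
            row.append(statistics.median_low(window))
        out.append(row)
    return out

img = [[0, 0, 0, 0],
       [0, 9, 9, 0],
       [0, 0, 0, 0]]
# A 3x3 window over a 3x4 image yields a 1x2 result; each window holds
# more zeros than nines, so the median wipes the "noise" out.
print(moving_window_median(img, 3, 3))  # [[0, 0]]
```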

     A function that calculates the median of a range is also necessary. This function was designed more for simplicity than for speed, and could be optimized heavily.

     /**
     Params:
         r = input range
         buf = buffer with length no less than the number of elements in `r`
     Returns:
         median value over the range `r`
     */
     T median(Range, T)(Range r, T[] buf)
     {
         import std.algorithm.sorting : sort;

         size_t n;
         foreach (e; r)
         {
             buf[n++] = e;
         }
         buf[0 .. n].sort();
         immutable m = n >> 1;
         return n & 1 ? buf[m] : cast(T)((buf[m - 1] + buf[m]) / 2);
     }
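     The same helper translates almost line for line into Python (a sketch for illustration; Python's sorted allocates a new list, unlike the buffer-reusing D version): sort the values, then take the middle element, or the average of the two middle elements for even lengths.

```python
def median(r):
    """Median of an iterable: middle element, or mean of the two middles."""
    buf = sorted(r)
    n = len(buf)
    m = n >> 1
    return buf[m] if n & 1 else (buf[m - 1] + buf[m]) / 2

print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```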

    The main function:

     void main(string[] args)
     {
         import std.conv : to;
         import std.getopt : getopt, defaultGetoptPrinter;
         import std.path : stripExtension;

         // In D, getopt is part of the standard library
         uint nr, nc, def = 3;
         auto helpInformation = args.getopt(
             "nr", "number of rows in window, default value is " ~ def.to!string, &nr,
             "nc", "number of columns in window, default value is equal to nr", &nc);
         if (helpInformation.helpWanted)
         {
             defaultGetoptPrinter(
                 "Usage: median-filter [] []\noptions:",
                 helpInformation.options);
             return;
         }
         if (!nr) nr = def;
         if (!nc) nc = nr;

         auto buf = new ubyte[nr * nc];
         foreach (name; args[1 .. $])
         {
             import imageformats; // can be found at code.dlang.org

             IFImage image = read_image(name);
             auto ret = image.pixels
                 .sliced(cast(size_t)image.h, cast(size_t)image.w, cast(size_t)image.c)
                 .movingWindowByChannel!(window => median(window.byElement, buf))(nr, nc);
             write_image(
                 name.stripExtension ~ "_filtered.png",
                 ret.length!1,
                 ret.length!0,
                 (&ret[0, 0, 0])[0 .. ret.elementsCount]);
         }
     }

    I hope any Python users who have read this found std.ndslice tempting, or at least interesting. If you feel the need to learn more about D, then I highly suggest the official D tutorial here.

    And I would suggest any D users reading this to consider moving any Numpy code they have written to std.ndslice.

    View the original article here

     
  • jkabtech 11:13 am on January 19, 2016 Permalink | Reply
    Tags: basketball, predict, scores, Using

    Using machine learning to predict basketball scores 

    By: Scott Clark, PhD

    Here at SigOpt we think a lot about model tuning and building optimization strategies; one of our goals is to help users get the most out of their Machine Learning (ML) models as quickly as possible. When our last hackathon rolled around I was inspired by some recent articles about using machine learning to make sports bets. For my hackathon project I teamed up with our amazing intern George Ke and set out to use a simple algorithm and open data to build a model that could predict the best basketball bets to make. We used SigOpt to tune the features and hyperparameters of this model to make it as profitable as possible, hoping to find a winning combination that could beat the house. Is it possible to use optimized machine learning models to beat Vegas? The short answer is yes; read on to find out how [0].

    Broadly speaking, there are three main challenges before deploying a machine learning model. First, you must Extract the data from somewhere, Transform it into a usable state, and then Load it somewhere you can quickly access it (ETL). This stage often requires a lot of creativity and good old-fashioned hacking. Next, you must apply your domain expertise about the problem to build the features and pick the model that will best solve it. Once you have your data and model you must train and tune the model to get it to the best possible state. This is what we will focus on in this post.

    It is often completely intractable to tune a model with more than a handful of parameters using traditional methods like grid and random search, because of the curse of dimensionality and how resource-intensive this process is. Model tuning is non-intuitive and orthogonal to the domain expertise required for the rest of the ML process so it is often also prohibitively inefficient to be done by hand. However, with the advent of optimization tools like SigOpt to properly tune models it is now possible for experts in any field to get the most out of their models quickly and easily. While sometimes in practice this final stage of model building is skipped, it can often mean the difference between making money and losing money with your model, as we see below.

     We used one of the simplest possible sports bets you can make in Vegas for our experiment, the Over/Under line. This is a bet that the total number of points scored by both teams in a game will be higher, or lower, than some number that Vegas picks. For example, if Vegas says the sum of scores for a game will be 200.5, the scores totaled to 210, and we bet “over,” then we would be entitled to $100 of winnings for every $110 we bet [1]; otherwise (if we bet “under” or the score came in lower than 200.5) we would lose our $110 bet. On each game we simulated the same $110 bet (only winning $100 when we chose correctly). We picked NBA games for the experiment both for the wide availability of open statistics [2] and because over 1,000 games are played per year, giving us many data points with which to train our model.
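     The payoff rule described above can be sketched as a small Python function (the names and signature are mine, purely illustrative; the post's actual simulation code lives in the SigOpt repo): bet over when the prediction exceeds the line, win $100 when the bet lands on the right side, lose the $110 stake otherwise.

```python
def bet_profit(prediction, line, actual, stake=110, win=100):
    """Bet 'over' if prediction > line, else 'under'; return the profit."""
    bet_over = prediction > line      # our side of the bet
    went_over = actual > line         # what actually happened
    return win if bet_over == went_over else -stake

print(bet_profit(prediction=210, line=200.5, actual=210))  # 100  (bet over, hit)
print(bet_profit(prediction=195, line=200.5, actual=210))  # -110 (bet under, missed)
```

     Because Vegas lines end in .5, a push (score landing exactly on the line) never occurs, so the sketch ignores that case.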

    We picked a random forest regression model as our algorithm because it is easy to use and has interesting parameters to tune (hyperparameters) [3]. 23 different team-based statistics were chosen to build the features of the model [4]. We did not modify the feature set beyond our initial picks in order to show how model tuning, independent of feature selection, would fare against Vegas. For each of the 23 features we created a slow and fast moving average for both the home and away team. These fast and slow moving averages are tunable feature parameters which we use SigOpt to optimize [5]. The averages were calculated both for a total number of games and for a number of games of similar type (home games for the home team, away games for the away team). This led us to 184 total features for every game and a total of 7 tunable parameters [3] [5].
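     As an illustration of those tunable feature parameters, here is a hedged Python sketch of a fast/slow moving average with an exponential weighting knob. This is my own simplified parameterization for illustration only; the post's actual decay formula (see footnote [5]) differs.

```python
def moving_average(points, window, decay=0.0):
    """Average the last `window` scores; decay > 0 weights recent games more."""
    recent = points[-window:]
    # weight the most recent game highest: w_i = (1 + decay) ** i
    weights = [(1.0 + decay) ** i for i in range(len(recent))]
    total = sum(w * p for w, p in zip(weights, recent))
    return total / sum(weights)

games = [95, 102, 99, 110, 104, 120]           # a team's recent point totals
fast = moving_average(games, window=2)          # fast average: last 2 games
slow = moving_average(games, window=6)          # slow average: all 6 games
print(fast, slow)  # 112.0 105.0
```

     Tuning then amounts to searching over the window lengths, the decay, and the betting threshold jointly, which is exactly the 7-parameter space handed to SigOpt.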

     The output of our model is a predicted number of total points scored, given the historical statistics of the two teams playing in a given game. If the model predicts a lower score than the Vegas Over/Under line, then we will bet under; similarly, if the model predicts a higher score, we will bet over. We will also let SigOpt tune how “certain” the model needs to be in order for us to make a bet, by only simulating a bet when the difference between our prediction and the over/under line is greater than a tunable threshold.

    We used the ‘00-’14 NBA seasons to train our model (training data), and random subsets of the ‘14-’15 season to evaluate it in the tuning phase (test data). For every set of tunable parameters we calculated the average winnings (and variance of winnings) that we would have achieved over many random subsets of the testing data. Every evaluation took 15 minutes on a high CPU Linux machine. Note that grid search and random search (traditional approaches to model tuning) would be an impractical way to perform parameter optimization on this problem because the number of required evaluations grows so large with the number of parameters for both methods [6]. SigOpt takes a linear number of evaluations with respect to the number of parameters in practice. It is worth noting that even though it requires fewer evaluations, SigOpt also tends to find better results than grid and random search. Figure 1 shows how profitability increases with evaluations as SigOpt tunes the model.

    image

     Figure 1: Over the course of 100 different train and test evaluations, SigOpt was able to tune our model from losing more than $500 to winning more than $1,000, on average. This value was computed on random subsets of the ‘14-’15 test season, which was not used for training.

     Once we have used SigOpt to fine tune the model, we want to see how it performs on a holdout dataset that we have never seen before. This is simulating using our model to make bets where the only information is historical information. Since the model was trained and tuned on the ‘00-’15 seasons, we used the first games of the ‘15-’16 season (being played now) to evaluate our tuned model. After simulating 131 total bets over a month, we observe that the SigOpt tuned model would have made $1,550 in profit. An untuned version of this same model racked up $1,020 in losses over the same holdout dataset [7]. Not only does model tuning with SigOpt make a huge difference, but a simple, well-tuned model can beat the house.

    image

    Figure 2: The blue line is cumulative winnings after each day of the SigOpt tuned model. The grey dashed line is the cumulative winnings of the untuned model. The dashed red line is the breakeven line.

     We are releasing all of the code used in this example in our GitHub repo. We were able to use the power of SigOpt optimization to take a relatively simple model and make it beat Vegas. Can you use a more complicated model to get better results? Can you think of more features to add? Does including individual player stats increase accuracy? All of these questions can be explored by forking the repository and using a free trial of SigOpt to tune your model [0].

    [0]: All bets in this blog post were simulated, no actual money was gambled. SigOpt does not advocate sports gambling. Check your local laws to learn if gambling is legal in your area. Make sure you read the SigOpt terms of service before using SigOpt to tune your models.

     [1]: Betting $110 to win $100 is part of the edge that Vegas keeps. This keeps a player from breaking even by picking “over” and “under” randomly.

    [2]: NBA stats: http://stats.nba.com, Vegas lines: http://www.nowgoal.net/nba.htm

    [3]: http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html We will tune the hyperparameters of n_estimators, min_samples_leaf and min_samples_split.

    [4]: 23 different team level features were chosen: points per minute, offensive rebounds per minute, defensive rebounds per minute, steals per minute, blocks per minute, assists per minute, points in paint per minute, second chance points per minute, fast break points per minute, lead changes per minute, times tied per minute, largest lead per game, point differential per game, field goal attempts per minute, field goals made per minute, free throw attempts per minute, free throws made per minute, three point field goals attempted per minute, three point field goals made per minute, first quarter points per game, second quarter points per game, third quarter points per game, and fourth quarter points per game.

     [5]: The feature parameters included the number of games to look back over for the slow and fast moving averages, as well as an exponential decay parameter for how much the most recent games count towards that average (with a value of 0 indicating linear decay), and the threshold for the difference between our prediction and the over/under line required to make a bet.

    [6]: Even a coarse grid of width 5 would require 5^7 = 78125 evaluations, taking over 800 days to run sequentially. The coarse width would almost certainly also perform poorly compared to the Bayesian approach that SigOpt takes, for examples see this blog post.

    [7]: The untuned model uses the same random forest implementation (with default hyperparameters), the same features, a fast and slow moving linear average of 1 and 10 games respectively, and a “certainty” threshold of 0.0 points.

    View the original article here

     