Take the probability of loss times the amount of possible loss from the probability of gain times the amount of possible gain. That's what I'm trying to do in business. It's imperfect, but that's what it's all about.
Why, in the Transformer model from the "Attention Is All You Need" paper, is no activation function applied after the multi-head attention layer or to the residual connections? It seems to me that the
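For reference, here is a minimal NumPy sketch of one encoder layer in the paper's arrangement. Each sublayer's output is `LayerNorm(x + Sublayer(x))`, and the feed-forward block is `max(0, xW1 + b1)W2 + b2`. The sketch shows where the nonlinearities actually sit: the softmax inside attention and the ReLU inside the feed-forward network. The attention output projection and the residual sum stay linear and are followed only by layer normalization. This is an illustration under stated assumptions, not the paper's reference code. The function names, the toy sizes, and the random weights are hypothetical. Dropout, masking, and the learned LayerNorm gain and bias are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    # Normalize each position's feature vector (learned gain/bias omitted for brevity).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_attention(x, wq, wk, wv, wo, n_heads):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    heads = softmax(scores) @ v  # softmax is the only nonlinearity in this sublayer
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ wo  # linear output projection, no activation afterwards

def feed_forward(x, w1, b1, w2, b2):
    # FFN(x) = max(0, x W1 + b1) W2 + b2: the ReLU lives inside this block.
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2

def encoder_layer(x, p, n_heads):
    # Sublayer 1: LayerNorm(x + MultiHeadAttention(x)); the residual sum is not passed through an activation.
    x = layer_norm(x + multi_head_attention(x, p["wq"], p["wk"], p["wv"], p["wo"], n_heads))
    # Sublayer 2: LayerNorm(x + FFN(x)).
    x = layer_norm(x + feed_forward(x, p["w1"], p["b1"], p["w2"], p["b2"]))
    return x

# Toy usage with hypothetical sizes and random weights.
d_model, d_ff, n_heads, seq_len = 8, 32, 2, 5
shapes = {
    "wq": (d_model, d_model), "wk": (d_model, d_model),
    "wv": (d_model, d_model), "wo": (d_model, d_model),
    "w1": (d_model, d_ff), "b1": (d_ff,),
    "w2": (d_ff, d_model), "b2": (d_model,),
}
params = {name: rng.normal(scale=0.1, size=shape) for name, shape in shapes.items()}
x = rng.normal(size=(seq_len, d_model))
y = encoder_layer(x, params, n_heads)
print(y.shape)  # (5, 8)
```

Because nothing clips the residual sum, the layer's output keeps negative values after normalization. A ReLU placed on the residual path would zero those values out.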