eng Embeddings trained on CONLL2017 Corpora
eng The embeddings were trained with finalfrontier on the CONLL2017 corpora with more than 100 million tokens. For all languages, embeddings were trained with the skipgram and structgram algorithms and contain subword n-grams. All embeddings are stored in the finalfusion format and can be used and processed with tools provided by the finalfusion ecosystem. Training parameters: n-gram range (inclusive): 3-6; number of hashing buckets: 2^21; hashing function: FNV-1a; window size: 10; negative samples: 5; dimensions: 300; minimum token frequency: 30.
2020-09-15
1
41d1ad19-4548-45f9-b43c-186315227aff
8cefa5dd-f5fb-4527-8acb-88cc6824eb48