Indicators for quality in small wikipedias
Looking only at article numbers can be very misleading, as many have learned since the Waray-Waray wikipedia passed a million articles. According to the story I remember, a man from Sweden wanted to honour the language of his wife from the Philippines, and he knew how to write programs that translate certain types of short, easily translatable entries from English. I have no idea whether the result is readable. A huge wikipedia. Is it a “success”?
Article numbers as such (which have often been compared and celebrated as a sign of progress of a wikipedia version) say little or nothing about quality and acceptance. For example, the "top African-language wikipedia" would be Malagasy with more than 90,000 articles, but it has very few readers: in the user statistics for Madagascar it accounts for just 0.8% of all wikipedia lookups in that country, far behind French, English and Russian (!).
For evaluation I propose to go for a mix of indicators.
Number of articles
For very small wikipedias the number of articles is a good indicator.
Beyond, say, 2,000 articles we should also look at some quality indicators.
Thematic spread of "1000 articles" and "10,000 articles every wikipedia should have"
These two lists give an idea of whether a wikipedia version in a given language has a good spread of topics across the wide field of knowledge.
An easy one is the ranking ("score") in the 1,000-article index (List of Wikipedias by sample of articles).
Top among African languages are Afrikaans at 26% and Swahili at 17%; all others are at 8% or less of the possible 100 points.
Similar is the 10,000-article list (List of Wikipedias by expanded sample of articles).
Here Afrikaans is at 32%, Swahili at 20%, Malagasy does better at 14%, Egyptian Arabic is at 12%, and the rest are at 10% or less of the reachable points.
Pageviews and active contributors
Generally https://stats.wikimedia.org/EN/Sitemap.htm gives an insight into basic numbers of wikipedias.
Pageview numbers are important (who reads the stuff?) but difficult to compare because the numbers of speakers vary so much between languages.
Also interesting as background are the details of how many wikipedians are actually active as contributors. Small wikipedias especially often depend on only one, two or three authors.
I propose to look at the market share in the “home country” (like Afrikaans/South Africa, Swahili/Tanzania-Kenya, Amharic/Ethiopia), using https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm.
I see this as an indicator of the importance and acceptance of a wikipedia language in its own society. General viewer numbers can be misleading if the rate of internet users in a country is low, because a small number of expatriates in a foreign country with easy internet access can then generate relatively high numbers. An example is Yoruba, which has hardly any viewers in Nigeria and has its viewer base in China and the USA.
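The two views of the data mentioned here are easy to mix up, so a small sketch may help. It assumes we already have plain view counts copied from the reports; the function names, the language code "xx" and all the counts are invented purely for illustration and are not taken from the statistics pages.

```python
def market_share(views_by_language, language):
    """Share (%) of one language version among all wikipedia views from ONE country."""
    total = sum(views_by_language.values())
    if total == 0:
        return 0.0
    return 100.0 * views_by_language.get(language, 0) / total


def home_share(views_by_country, home_country):
    """Share (%) of ONE language version's views that come from its home country."""
    total = sum(views_by_country.values())
    if total == 0:
        return 0.0
    return 100.0 * views_by_country.get(home_country, 0) / total


# Invented counts: views from the home country, split by language version.
views_from_home_country = {"en": 7000, "fr": 2000, "xx": 1000}

# Invented counts: views of the "xx" version, split by requesting country.
views_of_xx_version = {"home": 50, "abroad_a": 600, "abroad_b": 350}

print(market_share(views_from_home_country, "xx"))  # share inside the home country
print(home_share(views_of_xx_version, "home"))      # how much of the readership is at home
```

A version can look healthy on total views and still score low on both numbers, which is the Yoruba pattern described above.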
These figures are perhaps not statistically very strong for some countries (because of relatively small view numbers overall). It is also possible to look at the readership of a language by checking “Pageviews per language” at https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm, which shows the countries where requests come from. Many smaller African language versions have their readers abroad, in the USA or Europe (the homesick African student? Exception: Igbo!).
Interestingly but not surprisingly, no African language wikipedia has more than a 12% share of overall wikipedia views in its “home country”. This has to do with the fact that in all African countries higher levels of education are taught in former colonial languages, so that speakers of African languages a) are often not used to reading and writing in their language and b) often lack active vocabulary across a wide field of topics.
September 2018: Swahili has the top position amongst African languages in its "home country" Tanzania, at 11.2% of all Tanzanian wikipedia lookups. In Somalia, with its very small number of internet users, the Somali-language version has a share of 10.5%. Amharic follows in Ethiopia with 4.7% (down from previous higher levels), and Afrikaans has 2.3% in South Africa.
(Data from August 2017, in earlier version of this entry)
In August 2017 Somali (11.9% in Somalia) and Swahili (10.9% in Tanzania) were top. Afrikaans showed 3.0% of lookups in South Africa; considering that Afrikaans speakers make up 13.5% of the population, this could point to a share of more than 20% among Afrikaans speakers. Amharic now reached 7% in Ethiopia. These languages show interesting positive changes compared to less than one year earlier.
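The Afrikaans estimate is a simple ratio of the two percentages given. Here is a minimal sketch of that arithmetic. It rests on an assumption that is mine and not in the statistics: that Afrikaans speakers look up wikipedia about as often per person as other South Africans. The function name is made up for this example.

```python
def share_among_speakers(lookup_share_pct, speaker_share_pct):
    """Estimate the share (%) of a language community's wikipedia lookups
    that go to its own language version.

    Assumes speakers use wikipedia as often per person as the rest of the
    country (a simplification).
    """
    if speaker_share_pct <= 0:
        raise ValueError("speaker share must be positive")
    return 100.0 * lookup_share_pct / speaker_share_pct


# August 2017 figures from the text: 3.0% of South African lookups,
# Afrikaans speakers are 13.5% of the population.
estimate = share_among_speakers(3.0, 13.5)
print(f"{estimate:.1f}%")  # 22.2%
```

If Afrikaans speakers are online more than the national average, the true share among them would be lower than this estimate, and higher if they are online less.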
(Data from December 2016, in earlier version of this entry)
Top are Somali in Somalia (very weak database) and Swahili in Tanzania with 8-9%; the large majority reads the English wikipedia. Afrikaans reaches less than 2% of South African wikipedia lookups, and Amharic gets 4% in Ethiopia.
Yoruba is not visible in Nigerian wikipedia lookups; the vast majority of its readers seem to be abroad. The same goes for Malagasy in Madagascar.
Check of local relevance of a language version by comparing use of selected articles
I try to balance that with a check using the langviews analysis tool at https://tools.wmflabs.org/langviews/?project=en.wikipedia.org
(figures of December 2016 follow)
I go for some locations which will probably not be searched a lot from outside the country. (Not Cape Town and not Dar es Salaam: these are looked up from all over the world, so their English versions must have more readers. I assume that small places will be looked up mostly by people inside the country.) If the entry is connected to wikidata, I get a comparison of lookups per language. My random check shows a surprisingly strong position of Swahili compared to English.
Places in Tanzania
Pos. Lang. Name lookup/day
# en Mbozi District 6 / day
# sw Mbozi 3 / day
# en Mbeya Rural District 3 / day
# sw Mbeya Vijijini 1 / day
# en Mpwapwa District 4 / day
# sw Wilaya ya Mpwapwa 4 / day
# en Kigoma Region 34 / day
# sw Mkoa wa Kigoma 28 / day
# en Sumbawanga 15 / day
# sw Sumbawanga (mji) 1 / day
# en Tabora 39 / day
# sw Tabora (mji) 6 / day
# en Tabora Region 20 / day
# sw Mkoa wa Tabora 17 / day
This very tentative comparison puts Swahili in a pretty strong position compared to English! If we have content, people find it (by way of search engines) and look it up!
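To put a number on the comparison, the lookups in the table above can be turned into a Swahili share per place. This is only a rough sketch: the figures are copied by hand from the table (December 2016), the share is calculated over English plus Swahili only (other language versions are ignored), and the names in the code are my own.

```python
# (English article title, en lookups/day, sw lookups/day), copied from the table above.
PLACES = [
    ("Mbozi District", 6, 3),
    ("Mbeya Rural District", 3, 1),
    ("Mpwapwa District", 4, 4),
    ("Kigoma Region", 34, 28),
    ("Sumbawanga", 15, 1),
    ("Tabora", 39, 6),
    ("Tabora Region", 20, 17),
]


def swahili_share(en, sw):
    """Swahili lookups as a percentage of English + Swahili lookups."""
    return 100.0 * sw / (en + sw)


for name, en, sw in PLACES:
    print(f"{name:22s} en={en:3d}  sw={sw:3d}  sw share={swahili_share(en, sw):5.1f}%")

total_en = sum(en for _, en, _ in PLACES)
total_sw = sum(sw for _, _, sw in PLACES)
print(f"All seven places: sw share={swahili_share(total_en, total_sw):.1f}%")  # 33.1%
```

On these figures Swahili gets about a third of the combined lookups. It is close to English for the districts and regions and much weaker for the two towns (Sumbawanga, Tabora).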
OK, these are just some indicators for ways to look at quality. Quantity alone should not be the decisive factor when deciding where to invest energy and time.