Third Post: auto-translate the description/title from Russian to English. (An hour.)


NOTE: I claim no knowledge of anyone on this list, or the veracity of their titles, since they were supplied in Russian, and I don’t read Russian.  The only change I made from the original list is to translate each item into various languages.

This took a lot longer than my estimate of an hour!  Running the final script, after debugging, took four hours by itself.  And since I had to sign up for one of the translation services’ APIs, I’m not including the code I wrote for it.

I’ve uploaded the spreadsheet to Catbox:

https://files.catbox.moe/4y5p7q.csv

The first line is “headers”.  There are 30 columns.  The first three start with “src_” and are from the original list: “src_name_ru” for the name in Russian; “src_name_en” for the name in English; and “src_desc_ru” for the description in Russian.

The second three columns are the “reverse” of the first three: “name_ru_en” is the name in Russian translated to English; “name_en_ru” is the English name translated to Russian; and “desc_ru_en” is the description in Russian, translated to English.

Those three headings follow the same format as the headings after them: each begins with the part after “src_” from one of the first three columns, with “_LANG” added to the end, where “LANG” is the language being translated into.  In the columns that follow, each group of three is translated into a single target language; the second three columns are similar, except that the target is the opposite language of the original (Russian to English, and vice versa).

Then the final 24 columns are the first three columns translated into the other 8 of the top 10 languages by speaker, with the same layout: “name_ru_LANG”, “name_en_LANG”, and “desc_ru_LANG”, one for each of the 3 initial inputs from the list provided by https://www.mid.ru/ru/maps/us/1814243/ (which loads slowly; I see someone archived it, so I’ll include a link to that as well: https://archive.ph/5Jr9O ).

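30 columns total: 10 languages, times 3 for the name in Russian, the name in English, and the description in Russian.  Put another way: 3 source columns, plus 3 “reverse” columns, plus 24 for the other 8 languages.  (The math works because both Russian and English are in the top ten languages by speaker, so the source columns count as one of the ten; see the list at the end of this post.)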
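To double-check that arithmetic, here is a minimal sketch that rebuilds the header row from the rules described above. It is not the script I ran. The function name `build_headers`, the order of the languages, and the exact spelling of the language codes inside the headers (for example “zh-CN”) are all assumptions for illustration; the real file may differ.

```python
# Language codes as given in the list at the end of this post.
# Assumption: the headers use these codes verbatim and in this order.
LANGS = ["en", "zh-CN", "hi", "es", "fr", "ar", "bn", "ru", "pt-BR", "ur"]


def build_headers(langs=LANGS):
    """Return the 30 column names implied by the layout described above."""
    # The three original columns from the source list.
    headers = ["src_name_ru", "src_name_en", "src_desc_ru"]
    # The three "reverse" columns (Russian to English, English to Russian).
    headers += ["name_ru_en", "name_en_ru", "desc_ru_en"]
    # Three columns for each of the remaining eight languages.
    for lang in langs:
        if lang in ("en", "ru"):
            continue  # already covered by the source and reverse columns
        headers += [f"name_ru_{lang}", f"name_en_{lang}", f"desc_ru_{lang}"]
    return headers


headers = build_headers()
print(len(headers))  # 3 + 3 + 8 * 3
```

Comparing this list against the first line of the CSV would be a quick way to spot a missing or misnamed column.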

I notice some differences in the translations in the second three columns.  Looks like perhaps they had “more details” in the original English?  For example, there are “Jr.” suffixes in “src_name_en” that are not present in “name_ru_en” (meaning they were also not present in “src_name_ru”).  And those two columns appear to have the last name in all caps, for most but not all of them.
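Those two observations could be checked row by row instead of by eye. The following is only a sketch: the helper names and the suffix list are my own assumptions, and the sample names in the usage lines are invented, not taken from the spreadsheet.

```python
# Assumed list of suffixes to look for; extend as needed.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def _tokens(name):
    """Split a name on whitespace and drop surrounding periods and commas."""
    return [t.strip(".,") for t in name.split()]


def has_suffix(name):
    """True if any token looks like a generational suffix."""
    return any(t.lower() in SUFFIXES for t in _tokens(name))


def suffix_only_in_source(src_name_en, name_ru_en):
    """True if the original English name has a suffix that the
    Russian-to-English translation lacks."""
    return has_suffix(src_name_en) and not has_suffix(name_ru_en)


def has_all_caps_word(name):
    """True if some word longer than one letter is fully upper case.
    Suffixes such as "IV" and single initials are ignored."""
    return any(
        len(t) > 1 and t.isupper() and t.lower() not in SUFFIXES
        for t in _tokens(name)
    )


# Invented sample values, for illustration only.
print(suffix_only_in_source("John Q. PUBLIC Jr.", "John Q. PUBLIC"))
print(has_all_caps_word("John Q. PUBLIC Jr."))
```

Running both checks over every row would give a count of how many names have the extra suffix and how many lack the all-caps last name.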

The list also appears to be in alphabetical order by last name, based on scanning the “name_ru_en” column.  I should split the name into up to four parts (first, middle, last, suffix (“Jr.”, “IV”, etc.)); perhaps that should be “Pre-Fourth Post”?  There, I just updated the About page to add it.
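As a first pass at that split, something like the sketch below might do. It assumes the names are written as “First Middle Last Suffix”, separated by whitespace, which I have not verified for every row; the function name `split_name`, the suffix list, and the sample name are all hypothetical.

```python
# Assumed list of suffixes; extend as needed.
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def split_name(full_name):
    """Split a name into first, middle, last, and suffix parts.

    Assumes "First [Middle ...] Last [Suffix]" order. Missing parts
    come back as empty strings.
    """
    tokens = full_name.replace(",", " ").split()
    suffix = ""
    # Peel off a trailing suffix such as "Jr." or "IV", if present.
    if len(tokens) > 1 and tokens[-1].rstrip(".").lower() in SUFFIXES:
        suffix = tokens.pop()
    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    # Everything between the first and last token is treated as middle.
    middle = " ".join(tokens[1:-1])
    return {"first": first, "middle": middle, "last": last, "suffix": suffix}


# Invented sample name, for illustration only.
print(split_name("John Quincy PUBLIC Jr."))
```

Names that do not follow that order (for example, last name first) would need a different rule, so the output should be spot-checked against the spreadsheet before relying on it.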

The top ten languages by speaker, I obtained from here:

https://en.wikipedia.org/wiki/List_of_languages_by_total_number_of_speakers

  1. English (en)
  2. Mandarin Chinese (zh-CN)
  3. Hindi (hi)
  4. Spanish (es)
  5. French (fr)
  6. Modern Standard Arabic (ar)
  7. Bengali (bn)
  8. Russian (ru)
  9. Brazilian Portuguese (pt-BR)
  10. Urdu (ur)

 

