cherry, cherry!

Memento coding

Wednesday, 25 April 2018

The HSK5 words graph

Some time ago I took on an incredible challenge: studying for and passing HSK5.
While studying, I thought it would be useful to single out all the similar-looking words in the HSK5 vocabulary.
I find it easy to remember Chinese characters that have some prominent feature, but when that feature is missing, remembering them can be difficult for me.

So I developed a small script to show all the HSK5 characters that share a common component, just to spot the "similar" ones.
OK, I know that "similarity" between Chinese characters is a much deeper topic (did anyone say "radicals"?), but this is just a naive way to get a representation.

I wrote the script in Python, starting from the HSK5 character list: the script basically builds a pseudo-adjacency list. Since I wanted to use Gephi for the visualization, and Gephi adds up the weight when an edge is repeated, I repeated characters in the list to increase the weight of the corresponding edges in the graph (pretty ugly, but fast and effective for visualization!).
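To give an idea of the repetition trick, here is a minimal sketch. It is not the code from the repository: the `COMPONENTS` table is a tiny hand-written sample (not the real HSK5 list), the function names are made up for illustration, and the semicolon-separated output is just an assumption about a convenient format to import as an adjacency list.

```python
# Hypothetical sample: each character mapped to the components it contains.
# This is NOT the HSK5 list, only a few characters to show the idea.
COMPONENTS = {
    "妈": ["女", "马"],
    "吗": ["口", "马"],
    "骂": ["口", "口", "马"],
    "好": ["女", "子"],
    "吃": ["口", "乞"],
}


def build_adjacency_rows(components):
    """Build rows of the form [source, neighbour, neighbour, ...].

    A neighbour is repeated once per distinct shared component, so a tool
    that sums repeated edges ends up with the number of shared components
    as the edge weight. Each pair is emitted only once (source before target).
    """
    chars = list(components)
    rows = []
    for i, source in enumerate(chars):
        row = [source]
        for target in chars[i + 1:]:
            shared = set(components[source]) & set(components[target])
            row.extend([target] * len(shared))
        if len(row) > 1:  # skip characters with no neighbours left to list
            rows.append(row)
    return rows


def to_csv(rows, sep=";"):
    """Join every row with the separator, one row per line."""
    return "\n".join(sep.join(row) for row in rows)


ROWS = build_adjacency_rows(COMPONENTS)
print(to_csv(ROWS))
```

In this sample 吗 and 骂 share both 口 and 马, so 骂 shows up twice in the row for 吗, which is exactly the "ugly but fast" way of giving that edge a weight of 2.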

Here you can find the sources:

https://github.com/antigones/hsk5_words_graph

OK, so what does this graph look like? Full resolution here!


Looking at the graph, some nice relationships emerge:



Some students asked me to make this graph navigable, so I used Gephi to export a .gexf file to load into this nice visualizer:

https://github.com/raphv/gexf-js

And here is the link to the final result:

https://antigones.bitbucket.io/projects/hsk5_graph/




