cherry, cherry!

Memento coding

Wednesday, 2 May 2018

HSK5 words sharing semantic/phonetic components

Let's extend and deepen the analysis from the previous post by looking at the semantic/phonetic components shared by HSK5-only words.
This time, I used a dictionary file to pair every hanzi with its semantic/phonetic component.
You can find the source code of the Python script producing the gexf file here:


Opening the file in Gephi gives the following result (click here to see image at full resolution!).



And here is a detail of the graph:


In this graph, an outbound edge means that the character contributes as a semantic or phonetic component to another hanzi.
Some contributions are nice to observe. Take, for example, the character 少 (shao3), which has the following outbound edges:
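To make the edge direction concrete, here is a minimal sketch of how such a graph could be written to a gexf file with the Python standard library only. It is not the script linked above: the `RELATIONS` list, the `build_gexf` and `write_gexf` helpers and the output file name are all assumptions made for illustration, and the only data in it are the two 少 relations discussed in this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: (component, character, role).
# A real run would load these triples from a dictionary file.
RELATIONS = [
    ("少", "劣", "semantic"),
    ("少", "抄", "phonetic"),
]


def build_gexf(relations):
    """Build a GEXF 1.2 tree with one directed edge per relation.

    Each edge goes from the component to the character it contributes to,
    and its label records the role (semantic or phonetic).
    """
    gexf = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(gexf, "graph", defaultedgetype="directed")
    nodes = ET.SubElement(graph, "nodes")
    edges = ET.SubElement(graph, "edges")

    # Declare every hanzi once, in order of first appearance.
    seen = set()
    for component, character, _role in relations:
        for hanzi in (component, character):
            if hanzi not in seen:
                seen.add(hanzi)
                ET.SubElement(nodes, "node", id=hanzi, label=hanzi)

    # Outbound edge: component -> character.
    for index, (component, character, role) in enumerate(relations):
        ET.SubElement(
            edges, "edge",
            id=str(index), source=component, target=character, label=role,
        )
    return ET.ElementTree(gexf)


def write_gexf(relations, path):
    """Write the graph to `path` as UTF-8 XML."""
    build_gexf(relations).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    write_gexf(RELATIONS, "hsk5_components.gexf")  # hypothetical file name
```

Keeping the role as an edge label means a single file carries both kinds of contribution, and they can still be told apart when inspecting the edges.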




It contributes semantically to 劣 (lie4, "bad, inferior"), where it conveys the idea of "lower", and phonetically to 抄 (chao1), whose pronunciation is a slight variation of shao3.

Besides, when calculating the degree for every node, Gephi reports the following degree distribution:


It suggests that, in this set, a large number of hanzi each contribute to only a few other characters, while a very small subset of hanzi contributes to many. The palette percentages Gephi reports for node out-degree confirm this observation.
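For readers without Gephi at hand, the same kind of out-degree percentages can be approximated in a few lines of Python. This is only a sketch under assumptions: the `RELATIONS` sample and the `out_degrees` and `degree_percentages` helpers are hypothetical names, and the sample contains just the two 少 relations from this post, so its numbers are not those of the full HSK5 graph.

```python
from collections import Counter

# Hypothetical sample data: (component, character, role).
RELATIONS = [
    ("少", "劣", "semantic"),
    ("少", "抄", "phonetic"),
]


def out_degrees(relations):
    """Count outbound edges per hanzi; pure targets get an out-degree of 0."""
    degrees = Counter()
    for component, character, _role in relations:
        degrees[component] += 1
        degrees[character] += 0  # make sure the target appears as a node
    return degrees


def degree_percentages(degrees):
    """Map each out-degree value to the percentage of nodes that have it."""
    total = len(degrees)
    histogram = Counter(degrees.values())
    return {
        degree: 100.0 * count / total
        for degree, count in sorted(histogram.items())
    }


if __name__ == "__main__":
    for degree, share in degree_percentages(out_degrees(RELATIONS)).items():
        print(f"out-degree {degree}: {share:.2f}% of nodes")
```

On the real data, a distribution like the one described above would show most of the percentage mass on the lowest out-degree values and only a thin tail on the high ones.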



