Memento Coding: HSK5 words sharing semantic/phonetic components

Let's extend and deepen the analysis from the previous post, this time focusing on the semantic/phonetic components shared by HSK5-only words...

This time, I used a dictionary file to pair every hanzi with its semantic/phonetic component.

You can find the source code of the Python script producing the gexf file here:

https://github.com/antigones/hsk5_rad_graph

Here is a "walkable" view of the graph you can play with:

https://antigones.bitbucket.io/projects/hsk5_graph/#output_etym.gexf

Opening the file in Gephi gives the following result.

And here is a detail of the graph:

In this graph, an outbound edge means that the character contributes to another hanzi as its semantic or phonetic component.

It's nice to observe some of these contributions. Take, for example, the character 少 (shao3), which has the following outbound edges:

We can see that it contributes semantically to 劣 (lie4, "bad, inferior"), conveying the idea of "lower", and phonetically to 抄 (chao1), whose pronunciation is a "slight" variation of shao3.
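
To make the edge direction concrete, here is a minimal sketch of how such component-to-character edges could be derived. It is not the script from the repository: the `COMPONENTS` mapping, its format, and the helper names `build_edges` and `outbound` are assumptions made for illustration, and the data contains only the two relations for 少 described above.

```python
from collections import defaultdict

# Hypothetical dictionary entries: hanzi -> list of (component, role).
# The real dictionary file and its format are not shown in this post;
# only the two relations discussed above are included.
COMPONENTS = {
    "劣": [("少", "semantic")],
    "抄": [("少", "phonetic")],
}


def build_edges(components):
    """Return directed edges (component, hanzi, role)."""
    edges = []
    for hanzi, parts in components.items():
        for component, role in parts:
            # The edge goes from the contributing component to the hanzi.
            edges.append((component, hanzi, role))
    return edges


def outbound(edges):
    """Group edges by their source node."""
    out = defaultdict(list)
    for source, target, role in edges:
        out[source].append((target, role))
    return dict(out)


edges = build_edges(COMPONENTS)
outbound_edges = outbound(edges)

for target, role in outbound_edges["少"]:
    print(f"少 -> {target} ({role})")
```

Keeping the role ("semantic" or "phonetic") on each edge is one possible design choice; it would allow colouring or filtering edges by contribution type later on.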

Besides, when calculating the degree of every node, Gephi reports the following degree distribution:

It suggests that in this set a large number of hanzi each contribute to only a few other characters, while a very small subset of hanzi contribute to many. This observation is confirmed by the palette percentages for the nodes' out-degree:
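
For readers without Gephi at hand, an out-degree distribution of this kind can also be computed directly from an edge list. The following is only a sketch under stated assumptions: `toy_edges` is a hypothetical edge list limited to the 少 example from this post, and `out_degree_distribution` is an illustrative helper, not part of the original script or of Gephi.

```python
from collections import Counter

# Toy edge list (component -> hanzi), limited to the example from this post.
# A real run would use all edges of the HSK5 graph instead.
toy_edges = [("少", "劣"), ("少", "抄")]


def out_degree_distribution(edge_list):
    """Map each out-degree value to the number of nodes having it."""
    nodes = set()
    out_degree = Counter()
    for source, target in edge_list:
        nodes.update((source, target))
        out_degree[source] += 1
    # Nodes that never appear as a source have out-degree 0;
    # Counter returns 0 for missing keys.
    return Counter(out_degree[node] for node in nodes)


distribution = out_degree_distribution(toy_edges)
total = sum(distribution.values())

for degree in sorted(distribution):
    share = 100 * distribution[degree] / total
    print(f"out-degree {degree}: {distribution[degree]} node(s), {share:.1f}%")
```

On the full graph, a distribution dominated by low out-degree values with a handful of high ones would match the pattern described above.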

Wednesday, 2 May 2018
