Applications of OpenOdia tools¶
Objectives¶
- Application of the tools in the package are mentioned here.
- These will expand your ideas to have practical realistic implementation of NLP tools in Odia language.
- Which in turn will increase productivity and digital presence of Odia resources.
Generate Odia names¶
- Using the Odia name generation package you can create lakhs of unique Odia names.
from openodia import name
name.generate_names()
'''
['ଯଦୁମଣୀ ମାଢ଼ୀ', 'ବାସନ୍ତି ବ୍ରହ୍ମା', 'ପ୍ରବୀଣ ସିଂହ ମୁକ୍କିମ', 'ବୃନ୍ଦାବନ ଧଳ', 'ଅଶ୍ୱିନୀ କିଶୋର ଜଗଦେବ',
'ଶ୍ରୀଯୁକ୍ତ ଇରାଶିଷ ସେଠୀ', 'କୁମାରୀ ସୁମନ ସିଂଦେଓ', 'ସଲିଲ ଅଲ୍ଲୀ ଛତ୍ରିଆ', 'ଦିବାକରନାଥ ରାଧାରାଣୀ ଆଚାର୍ଯ୍ୟ', 'ଦୁର୍ଗା ସୁନ୍ଦରସୁର୍ଯ୍ୟା ପୁଟୀ']
'''
Name morphology
- In Odia names, mostly the surname carries the family name.
- Therefore, there is a limited number of surnames, which can be used to identify names in a text.
Exceptions
- There has been a trend recently where, after marriage the spouse add their in-law's surname; which makes two surnames in a single name.
- However, recently there has been no surnames in the names or surname without any family name has been kept. Which needs to be kept an eye for.
Data Masking¶
- As these names database has been taken from Wikipedia and actual person names in Odisha, you can use these as NER for person name recognition.
- In data engineering projects when you need to ingest large amount of data, you need to mask or anonymize the PI (Personal Identification) data from the data source.
- This is to remove bias and increase privacy on your data/model.
Masking names:¶
- To avoid disrupting privacy of individuals, you can identify the names of people in the text and
- Replace the names with a mask tag like
<name>
. -
Example
ବୈଦ୍ୟନାଥ ସିଂହଦେଓ ବାବୁ ସସ୍ତ୍ରୀକ ନିଜ କାରରେ ବସି ପୁରୀ ବୁଲିଯାଇଛନ୍ତି ।
has potential personal data that which a person can derive like:ବୈଦ୍ୟନାଥ ସିଂହଦେଓ
is married.ବୈଦ୍ୟନାଥ ସିଂହଦେଓ
is male.- He has a car.
- He is in Puri.
- To avoid this kind of personal privacy exposure, we need to mask the text to
<name> ବାବୁ ସସ୍ତ୍ରୀକ ନିଜ କାରରେ ବସି ପୁରୀ ବୁଲିଯାଇଛନ୍ତି ।
- The problem with this method is you miss the credibility of an important feature of the input data. As there is no uniqueness among the name field.
Anonymization of names:¶
- Using this method rather than completely removing all names with a
<name>
tag you can substitute with a fake name. - It maintains the uniqueness of the name field.
- However, in many cases you may need to revert back to the original person name after a certain operation has been completed. That's not possible as anonymization is an irreversible procedure.
-
Example
ବୈଦ୍ୟନାଥ ସିଂହଦେଓ ବାବୁ ସସ୍ତ୍ରୀକ ନିଜ କାରରେ ବସି ପୁରୀ ବୁଲିଯାଇଛନ୍ତି ।
will be anonymized asବ୍ରଜମୋହନ ପଣ୍ଡା ବାବୁ ସସ୍ତ୍ରୀକ ନିଜ କାରରେ ବସି ପୁରୀ ବୁଲିଯାଇଛନ୍ତି ।
- Did you notice how
ବୈଦ୍ୟନାଥ ସିଂହଦେଓ
has changed toବ୍ରଜମୋହନ ପଣ୍ଡା
?
- Here are some situations in which you may want to use anonymization:1
- If you no longer need to communicate or work with a consumer, but wish to archive their activity, order history, or any other details that could not be used to identify them.
- To perform data analyses that are unrelated to the services you provide the consumer.
- If you need to make data available to a group of people outside those that are designated to fulfill your services, such as a wide group of employees or consultants.
Pseudonimization of names:¶
- This is same as anonymization, with only difference that, the anonymized names can be replaced with their original names.
-
Example
ବୈଦ୍ୟନାଥ ସିଂହଦେଓ ବାବୁ ସସ୍ତ୍ରୀକ ନିଜ କାରରେ ବସି ପୁରୀ ବୁଲିଯାଇଛନ୍ତି ।
will be pseudonymized ashrtvd6wqbvs4320 ବାବୁ ସସ୍ତ୍ରୀକ ନିଜ କାରରେ ବସି ପୁରୀ ବୁଲିଯାଇଛନ୍ତି ।
- Did you notice how
ବୈଦ୍ୟନାଥ ସିଂହଦେଓ
has changed tohrtvd6wqbvs4320
? - Here
hrtvd6wqbvs4320
has shown as a demonstration of a key, which can be stored in a hash table with value asବୈଦ୍ୟନାଥ ସିଂହଦେଓ
. - This text can be further reversed by referring to this hash table or a unique private key.
- Data pseudonymization can be used when you will need to re-identify users in the future:1
- To keep data secure during the fulfillment of services, by masking identifying details to employees or other data handlers that do not need those details.
- To maintain data protection within your database or records, order histories, and inactive customers that you remain in contact with.
- To transfer data over international borders.
- To maintain data protection and Privacy by Design principles laid out by the GDPR.
If you find this page useful, please cite this using:
@misc{OpenOdia,
author = {Soumendra Kumar Sahoo},
title = {OpenOdia Applications},
howpublished = {\url{https://www.openodia.soumendrak.com/}},
year = {2021}
}
Last update: 2023-03-29