THE ROLE OF FOLK ART BASED DESCRIPTIONS AND MARKETPLACE INDICATORS IN PRICING INDIAN FOLK ART ONLINE: AN NLP-BASED TEXT ANALYTICS STUDY

    DOI: https://doie.org/10.65985/APER.2026549597

    Authors:

    Vidhya Rao, Surekha Kohle


    Keywords:

    BoW, TF-IDF, Word2Vec, LASSO, Random Forest, XGBoost, Folk art paintings


    Abstract:

    Online marketplaces have become major platforms for the commercialization of Indian folk and traditional paintings, yet empirical evidence on how textual descriptions influence artwork pricing remains limited. This study examines how descriptive language, along with marketplace indicators such as art type, painting area, ratings, and reviews, shapes prices of online folk art listings. We introduce the Indian Painting Ecommerce Metadata (IPEM) dataset comprising 385 manually authenticated online listings of Indian paintings, including textual descriptions, prices, physical dimensions, art form categories, and market signals. Manual verification excluded counterfeit and replica artworks, ensuring dataset reliability. Machine learning–based text analytics are applied using three representation techniques: Bag of Words (BoW), Term Frequency–Inverse Document Frequency (TF-IDF), and Word2Vec. Each representation is paired with an appropriate learning algorithm. High-dimensional and sparse BoW features are analyzed using LASSO (Least Absolute Shrinkage and Selection Operator) regression to enable feature selection and interpretability. TF-IDF representations are modeled using Random Forest, while Word2Vec embeddings are combined with XGBoost (Extreme Gradient Boosting) to exploit semantic interactions. Experimental results show that TF-IDF with Random Forest achieves the strongest predictive performance, explaining approximately 73% of the variance in log-transformed prices. The BoW + LASSO model reveals that keywords related to cultural identity, regional heritage, craftsmanship, traditional materials, and emotional aesthetics positively influence pricing, whereas decor-oriented, generic, or reproduction-related descriptors are negatively associated with value. The study provides managerial insights for sellers, marketplaces, and policymakers, emphasizing strategic text optimization to enhance visibility and pricing outcomes.
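
To make the first two text representations named in the abstract concrete, the following is a minimal pure-Python sketch of Bag of Words counts and TF-IDF weights. It is not the authors' pipeline: the three listing descriptions are invented for illustration and are not taken from the IPEM dataset, the function names `tokenize`, `bag_of_words`, and `tf_idf` are hypothetical, and the smoothed IDF formula is one common variant chosen here as an assumption, since the abstract does not state which weighting was used.

```python
import math
from collections import Counter

# Hypothetical listing descriptions, invented for illustration only;
# they are NOT drawn from the IPEM dataset.
descriptions = [
    "handmade madhubani painting natural dyes",
    "decor print wall art",
    "handmade warli painting tribal art",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(docs):
    """Return the sorted vocabulary and a document-by-term count matrix."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    counts = [Counter(tokenize(doc)) for doc in docs]
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def tf_idf(docs):
    """Return the vocabulary and a TF-IDF matrix.

    Assumed weighting: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1 (a common smoothed variant).
    """
    vocab, bow = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in bow if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weights = []
    for row in bow:
        length = sum(row)
        weights.append([(row[j] / length) * idf[j] for j in range(len(vocab))])
    return vocab, weights


if __name__ == "__main__":
    vocab, weights = tf_idf(descriptions)
    # Show the highest-weighted term of each hypothetical listing.
    for doc, row in zip(descriptions, weights):
        top = max(range(len(vocab)), key=lambda j: row[j])
        print(f"{doc!r}: top term = {vocab[top]}")
```

In this sketch, a term that appears in only one description (such as an art-form name) receives a higher TF-IDF weight than a term shared across descriptions, which is the property that lets a downstream regressor separate distinctive vocabulary from generic wording. The resulting matrices would then be passed to a model such as LASSO or Random Forest, with the logarithm of the price as the target.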



Type: Journal

Language: English

Publisher: ya tai jing ji bian ji bu

ISSN: 1000-6052
