Package 'shoppingwords' reference manual

Title:	Text Processing Tools for Turkish E-Commerce Data
Description:	Provides several datasets useful for processing and analysis of text in Turkish from an online shopping platform.
Authors:	Betul Kan-Kilinc [aut, cre] (ORCID: <https://orcid.org/0000-0002-3746-2327>), Mine Çetinkaya-Rundel [ctb] (ORCID: <https://orcid.org/0000-0001-6452-2420>), Colin Rundel [ctb] (ORCID: <https://orcid.org/0000-0002-6058-8251>)
Maintainer:	Betul Kan-Kilinc <[email protected]>
License:	MIT + file LICENSE
Version:	0.1.0
Built:	2026-05-20 07:02:58 UTC
Source:	https://github.com/bkanx/shoppingwords

Remove Stopwords from User Reviews

Description

This function processes a dataframe containing user reviews and removes predefined stopwords. It first searches the package's internal stopwords dataset (stopwords_tr), and if no match is found, it falls back to the broader stopwords_iso list.

Usage

match_stopwords(df)

Arguments

df

Dataframe containing user reviews, with required columns comment (text) and rating (numerical score).

Details

The function converts text to a standardized format by removing accents and special characters, transforming it into basic Latin characters, and making all letters lowercase. It then tokenizes the text, filters out stopwords, and returns the cleaned version.
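
The steps described above can be sketched in base R. This is only an illustration, not the package's actual implementation: clean_text_sketch is a hypothetical helper, the character mapping covers Turkish-specific letters only, and the stopword vector passed in is assumed to be already normalized to lowercase basic Latin.

```r
# Hypothetical helper illustrating normalize -> tokenize -> filter.
# Not the code used inside match_stopwords().
clean_text_sketch <- function(text, stopwords) {
  # Map Turkish-specific letters to basic Latin equivalents.
  ascii <- chartr("çğıöşüÇĞİÖŞÜ", "cgiosuCGIOSU", text)
  # Lowercase, then replace anything that is not a letter, digit, or space.
  ascii <- gsub("[^a-z0-9 ]", " ", tolower(ascii))
  # Tokenize on whitespace.
  tokens <- strsplit(trimws(ascii), "\\s+")
  # Drop stopwords and glue the remaining tokens back together.
  vapply(
    tokens,
    function(tok) paste(tok[!tok %in% stopwords], collapse = " "),
    character(1)
  )
}

# The stopword vector here is a made-up example, not stopwords_tr.
clean_text_sketch("Fiyat çok pahalı ama kaliteli", c("cok", "ama"))
```

Normalizing before matching means a stopword list stored in basic Latin form still matches accented input such as "çok".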

Value

A modified dataframe with an additional cleaned_text column containing stopword-free text.

Examples

reviews_sample <- tibble::tibble(
  comment = c("Bu ürün xs ancak fiyatı yüksek gibi",
              "Fiyat çok pahalı ama kaliteli iyi"),
  rating = c(4.5, 3.0)
)
match_stopwords(reviews_sample)

A dataset of phrases

Description

Contains common negative-emotion phrases extracted from user reviews.

Usage

phrases

Format

A tbl_df with 205 rows and 1 variable:

word: Phrase (n-gram).

Examples

phrases

A dataset of reviews

Description

User reviews collected from an e-commerce site.

Usage

reviews

Format

A tbl_df with 260,308 rows and 3 variables:

rating: Rating score, out of 5.
comment: Comment text, in Turkish.
id: Rating ID.

Examples

reviews

A test dataset

Description

A sample dataset for testing analysis functions. It is separate from the reviews dataset; its text column corresponds to the comment column in reviews. Note that 170 texts in this data frame match comments in the reviews dataset verbatim, because some users wrote identical comments. The id column shows that these are distinct observations, not duplicated rows.
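
The overlap check described above can be sketched as follows. The two data frames are toy stand-ins with made-up values, not the packaged data; with the package loaded, the same comparison would presumably use reviews_test$text and reviews$comment.

```r
# Toy stand-ins for the two datasets; all values below are made up.
reviews_toy <- data.frame(
  id = c(1, 2, 3),
  comment = c("çok güzel", "beğenmedim", "hızlı kargo")
)
reviews_test_toy <- data.frame(
  id = c(10, 11),
  text = c("çok güzel", "kalitesiz")
)

# Flag test texts that also appear verbatim as comments in the larger data.
shared_text <- reviews_test_toy$text %in% reviews_toy$comment
sum(shared_text)

# Ids of the flagged rows that also occur in the larger data (none here),
# which is what marks them as different observations with the same wording.
shared_ids <- intersect(reviews_test_toy$id[shared_text], reviews_toy$id)
length(shared_ids)
```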

Usage

reviews_test

Format

A tbl_df with 1,481 rows and 4 variables:

rating: Rating score, out of 5.
text: Comment text, in Turkish.
emotion: n for negative, p for positive.
id: Rating ID.

Examples

reviews_test

A dataset of Turkish stopwords

Description

A dataset of stopwords used in Turkish text analysis.

Usage

stopwords_tr

Format

A tbl_df with 92 rows and 1 variable:

word: Stopword, in Turkish.

Examples

stopwords_tr
