Using rlike with a list to create a new DataFrame in Scala

Question

I just started with Scala two days ago.

Here's the thing: I have a df and a list. The df contains two columns, authors and paragraphs; the list contains words (strings). For each word in the list, I need the count of paragraphs in which that word appears, grouped by author.

My idea so far is to loop over the list, query the df with rlike for each word, and build a new df from the results, but I don't know how to write that, or whether it would even work. Any help is appreciated!

Edit: Adding example data and expected output

// Example df and list
val df = Seq(("auth1", "some text word1"), ("auth2", "some text word2"), ("auth3", "more text word1")).toDF("a", "t")

df.show

+-------+---------------+
|      a|              t|
+-------+---------------+
|auth1  |some text word1|
|auth2  |some text word2|
|auth1  |more text word1|
+-------+---------------+
    
val list = List("word1", "word2")
    
// Expected output

newDF.show

+-------+-----+----------+
|   word|    a|text count|
+-------+-----+----------+
|word1  |auth1|         2|
|word2  |auth2|         1|
+-------+-----+----------+

Answer 1

You can filter and aggregate once for each word in the list, then combine the resulting DataFrames using unionAll:

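import org.apache.spark.sql.functions.{count, lit}

// For each word: keep the rows whose text contains it as a whole word,
// count those rows per author, then stack the per-word results.
val result = list.map(word =>
    df.filter(df("t").rlike(s"\\b${word}\\b"))
      .groupBy("a")
      .agg(lit(word).as("word"), count(lit(1)).as("text count"))
).reduce(_ unionAll _)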

result.show
+-----+-----+----------+
|    a| word|text count|
+-----+-----+----------+
|auth3|word1|         1|
|auth1|word1|         1|
|auth2|word2|         1|
+-----+-----+----------+
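
One caveat with building the pattern as s"\\b${word}\\b": rlike treats its argument as a Java regular expression, so a word containing metacharacters such as "." or "+" would not be matched literally. A minimal sketch of one way to guard against that, using plain Scala strings so it runs without Spark. The names wordPattern and containsWord are hypothetical helpers for illustration, not part of the answer above or of any Spark API:

```scala
import java.util.regex.Pattern

object WordPatternSketch {
  // Hypothetical helper: wraps a literal word in \b word boundaries and
  // quotes it so that regex metacharacters are matched literally.
  def wordPattern(word: String): String =
    "\\b" + Pattern.quote(word) + "\\b"

  // Mirrors rlike's "match anywhere in the string" behaviour on a plain String.
  def containsWord(text: String, word: String): Boolean =
    Pattern.compile(wordPattern(word)).matcher(text).find()

  def main(args: Array[String]): Unit = {
    println(containsWord("some text word1", "word1"))     // true
    println(containsWord("some text word10", "word1"))    // false: no boundary after "word1"
    println(containsWord("version 1x5 released", "1.5"))  // false: "." is matched literally
  }
}
```

Under that assumption, the filter in the answer would become df("t").rlike(wordPattern(word)). Note that \b only marks a boundary next to a word character, so this approach is likely to miss words that begin or end with punctuation.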

solution1
0 ACCEPTED 2021-03-21 07:54:07
