Splitting data in a column using SQL/HiveQL

Question

I have a university project where I need to do some simple analysis on a large dataset of my choosing, and we are to run this on Hadoop. I have essentially no experience with databases, but I like Hive, so that is what I am using.

Anyway, I have got a chess dataset, and I have been able to extract some columns of interest, such as the names of the opening moves, and find how often they occur. Things like that.

I would like to be able to take a look at the first few moves from each game, and that brings me to my problem. The notation for all moves is stored in a column called moves, and looks like this:

This column is in a .csv file called chess_game.

How would I go about extracting, say, the first 4 moves into a new table called something like opening_moves?

Thanks in advance for any advice.

Answer 1

You can split the moves string using the split function, like this:

select rating,
       moves[0] as first,
       moves[1] as second,
       moves[2] as third,
       moves[3] as fourth
from
(
select rating, split(moves, ' ') as moves from your_table
) s
;
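
To land the result in a new table, as the question asks, the same split could be wrapped in a CREATE TABLE ... AS SELECT. This is only a minimal sketch: it assumes the CSV has already been loaded as a Hive table named chess_game, that moves is a string column, and that the individual moves are separated by single spaces. The column names move_1 to move_4 are made up for illustration.

```sql
-- Assumption: chess_game is a Hive table with a string column `moves`
-- holding space-separated move notation (names taken from the question).
-- move_1 .. move_4 are hypothetical column names.
CREATE TABLE opening_moves AS
SELECT s.parts[0] AS move_1,
       s.parts[1] AS move_2,
       s.parts[2] AS move_3,
       s.parts[3] AS move_4
FROM (
    -- split() takes a regular expression and returns array<string>
    SELECT split(moves, ' ') AS parts
    FROM chess_game
) s;
```

Array indexes are zero-based, so parts[0] is the first token. If the notation also contains move numbers (for example "1."), those count as tokens too, and the indexes would need adjusting.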
