Splitting data in a column using SQL/HiveQL

I have a university project where I need to do some simple analysis on a large dataset of my choosing, and we have to run it on Hadoop. I chose Hive: I have essentially no experience with databases, but I like Hive.

Anyway, I have a chess dataset, and I have been able to extract some columns of interest, such as the names of the opening moves, and find how often they occur, things like that.

I would like to look at the first few moves of each game, which brings me to my problem. The notation for all moves is stored in a column called moves, and looks like this:

[Image: sample values of the moves column]

This column is in a .csv file called chess_game.

How would I go about extracting, say, the first 4 moves into a new table called something like opening_moves?

Thanks in advance for any advice.

You can split the moves string into an array with the split function and then pick elements by index, like this:

-- split() turns the space-separated moves string into an array; indexes start at 0
select rating,
       moves[0] as first,
       moves[1] as second,
       moves[2] as third,
       moves[3] as fourth
from (
  select rating, split(moves, ' ') as moves from your_table
) s
;
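
If you want the result stored as the opening_moves table from the question, one option is Hive's CREATE TABLE ... AS SELECT. This is only a sketch: it assumes a Hive table named chess_game has already been defined over the CSV and that it has rating and moves columns (rating comes from the answer above and may not exist in your data), and the move1 to move4 column names are made up for illustration.

```sql
-- Sketch: materialize the first four moves into a new table.
-- Assumes a Hive table chess_game (defined over the CSV) with columns rating and moves;
-- move1..move4 are hypothetical column names.
create table opening_moves as
select rating,
       moves[0] as move1,   -- Hive arrays are 0-indexed
       moves[1] as move2,
       moves[2] as move3,
       moves[3] as move4    -- out-of-range index gives NULL for very short games
from (
  select rating, split(moves, ' ') as moves
  from chess_game
) s;

-- Example follow-up: how often each four-move start occurs
select move1, move2, move3, move4, count(*) as games
from opening_moves
group by move1, move2, move3, move4
order by games desc
limit 10;
```

If the column stores each player's move as a separate token (for example "e4 e5 Nf3 Nc6"), then four array elements cover only two full moves; take eight elements if you want four full moves.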

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please cite the site URL or the original address. For questions, contact: yoyou2525@163.com.