Python - format csv data for Binary Classifier
I have some csv data in the following format:
headers = [artist_list, song_list, lyrics_track, lyrics_artist, lyrics]
and this snippet:
import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('lyrics.tsv', newline='') as f:
    reader = csv.reader(f, delimiter="\t")
    for i, line in enumerate(reader):
        print('line[{}] = {}'.format(i, line))
which prints:
(...)
line[808] = ['Pearl Jam', 'Wishlist', 'Wishlist', 'Pearl Jam', "I wish I was a neutron bomb\nfor once I could go off\nI wish I was a sacrifice\nbut somehow still lived on\nI wish I was a sentimental\nornament you hung on\nthe Christmas tree, I wish I was\nthe star that went on top\nI wish I was the evidence\nI wish I was the grounds\nfor fifty million hands upraised and opened toward the sky\nI wish I was a sailor with\nsomeone who waited for me\nI wish I was as fortunate\nas fortunate as me\nI wish I was a messenger\nand all the news was good\nI wish I was the full moon shining\noff a Camaro's hood\nI wish I was an alien\nat home behind the sun\nI wish I was the souvenir\nyou kept your house key on\nI wish I was the pedal break\nthat you depended on\nI wish I was the verb to trust\nand never let you down\nI wish I was a radio song\nthe one that you turned up\nI wish ..."]
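Now I want to use the data for classification, keeping only the lyrics of every row and adding a column for a binary value (always the same, 0), so that the data is transformed into: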
lyrics                                                    type
(...)                                                     (...)
I wish I was a neutron bomb\nfor once I could go off..    0
How can I get there, starting from the code above?
I think something like this might work (assuming your data is in a DataFrame named lyrics_df):
lyrics_df['type'] = 0
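The answer above assumes the data is already loaded and does not show how to drop the other columns. As a rough sketch of the whole transformation using only the csv module from the question, something like the following could work. The helper name lyrics_with_label and the in-memory sample are hypothetical, and the sketch assumes the file has no header row and that the columns appear in the order of the headers list (a header row, if present, would need to be skipped first).

```python
import csv
import io

# Column order taken from the question's header list (assumed to match the file).
HEADERS = ["artist_list", "song_list", "lyrics_track", "lyrics_artist", "lyrics"]


def lyrics_with_label(tsv_file, label=0):
    """Return one {'lyrics': ..., 'type': label} dict per TSV row.

    Hypothetical helper: keeps only the lyrics column and attaches
    the same constant label to every row.
    """
    reader = csv.reader(tsv_file, delimiter="\t")
    lyrics_idx = HEADERS.index("lyrics")
    rows = []
    for line in reader:
        if len(line) <= lyrics_idx:
            continue  # skip empty or truncated rows that have no lyrics field
        rows.append({"lyrics": line[lyrics_idx], "type": label})
    return rows


# Tiny in-memory stand-in for lyrics.tsv (made-up sample, not the real file).
sample = io.StringIO(
    "Pearl Jam\tWishlist\tWishlist\tPearl Jam\tI wish I was a neutron bomb\n"
)
rows = lyrics_with_label(sample)
print(rows)
```

For the real file, the same function can be called inside `with open('lyrics.tsv', newline='') as f:`. If pandas is installed, `pandas.DataFrame(rows)` should turn the result into a DataFrame with the two columns lyrics and type, comparable to the lyrics_df assumed in the answer.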