How do I load a txt file as data in Python?

Question

I'm learning how to use sklearn, scikit, and all of that to do some machine learning.

I'd like to know how to import this as data.

The data comes from the Million Song Genre Dataset.

How can I get data.target[0] to equal 0 (standing for "classic pop and rock"), data.target[1] to equal 0 (also "classic pop and rock"), and data.target[640] to equal 1 (standing for "folk")?

And how can I get data.data[0,:] to equal -8.697, 155.007, 1, 9, and so on (all the numeric values after the title column)?

Answer 1

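As others have mentioned, it's a little unclear what shape you're after. As a general starting point, and to get the data into a very flexible format, you can read the text file into Python and convert it to a pandas DataFrame. I'm sure there are more compact ways to do this, but to lay out clear steps, we can start with: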

import pandas as pd
import re 

file = 'filepath'  # path to the saved text file
# read all lines; the with-block closes the file when done
with open(file, 'r') as music:
    lines = music.readlines()
# split the lines by comma
lines = [line.split(',') for line in lines]
# capturing the column line
columns = lines[9]
# capturing the actual content of the data, and dismissing the header info
content = lines[10:]

musicdf = pd.DataFrame(content)
# assign the column names to our dataframe
musicdf.columns = columns
# preview the dataframe
musicdf.head(10)

# the final column had formatting issues, so wanted to provide code to get rid of the "\n" in both the column title and the column values

def cleaner(txt):
    txt = re.sub(r'[\n]+', '', txt)
    return txt

# rename the column of issue
musicdf = musicdf.rename(columns = {'var_timbre12\n' : 'var_timbre12'})

# applying the column cleaning function above to the column of interest
musicdf['var_timbre12'] = musicdf['var_timbre12'].apply(cleaner)

# checking the top and bottom of dataframe for column var_timbre12
musicdf['var_timbre12'].head(10)
musicdf['var_timbre12'].tail(10)

The result looks like this:

             %genre            track_id       artist_name  
0  classic pop and rock  TRFCOOU128F427AEC0  Blue Oyster Cult   
1  classic pop and rock  TRNJTPB128F427AE9F  Blue Oyster Cult
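
For comparison, here is a minimal sketch of one of the more compact routes, using pandas' read_csv. It assumes the same layout the code above relies on (9 header lines, then the column line, then the rows) and that no field contains an embedded comma. The sample text, track IDs, artist names and titles are hypothetical stand-ins so the snippet runs on its own. With the real file you would pass its path instead of the StringIO object.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 9 header lines, then the column
# line, then data rows. Only a few columns are included for brevity.
sample = "".join(f"# header line {i}\n" for i in range(1, 10)) + (
    "%genre,track_id,artist_name,title,loudness,tempo\n"
    "classic pop and rock,TR0001,Artist A,Song A,-8.697,155.007\n"
    "folk,TR0002,Artist B,Song B,-10.5,120.0\n"
)

# skiprows=9 drops the header block, so the next line supplies the column names.
musicdf = pd.read_csv(io.StringIO(sample), skiprows=9)

# Drop the leading "%" from the first column name.
musicdf = musicdf.rename(columns={"%genre": "genre"})

# Numeric columns are parsed as floats, and no trailing "\n" is left behind.
print(musicdf.dtypes)
```

Because read_csv handles line endings and type inference itself, the separate newline cleanup step is not needed on this route.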

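With the data in this format, you can now use the pandas groupby function to carry out a wide range of grouping tasks, look up particular genres and their associated attributes, and so on.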
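
As a rough sketch of what that could look like, and of how to get from the DataFrame to the data.target / data.data layout the question asks about, the snippet below uses groupby together with pd.factorize. The tiny DataFrame is a hypothetical miniature of musicdf. Only the first row's loudness and tempo come from the question, and the other values are placeholders. The numbers are stored as strings on purpose, since splitting lines by hand leaves every field as text.

```python
import pandas as pd

# Hypothetical miniature of the DataFrame built above. Values are placeholders
# except the first row's loudness and tempo, which appear in the question.
musicdf = pd.DataFrame({
    "%genre": ["classic pop and rock", "classic pop and rock", "folk"],
    "track_id": ["TR0001", "TR0002", "TR0003"],
    "loudness": ["-8.697", "-9.1", "-12.0"],
    "tempo": ["155.007", "120.5", "98.2"],
})

# Count how many tracks fall under each genre.
print(musicdf.groupby("%genre").size())

# Encode genres as integers in order of first appearance:
# the first genre seen becomes 0, the next new one becomes 1, and so on.
target, genre_names = pd.factorize(musicdf["%genre"])

# Keep only the numeric feature columns and convert them from str to float.
data = musicdf[["loudness", "tempo"]].astype(float).to_numpy()

print(target)                  # integer label per row
print(genre_names[target[2]])  # genre name behind the third row's label
print(data[0])                 # feature values of the first row
```

On the full dataset you would select every column after the title column rather than just these two. Whether "folk" ends up as label 1 depends on it being the second genre to appear in the file.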

Hope this helps!

Solution 1
2, accepted, 2015-11-11 23:29:40
