Convert .dat into .csv in Python

I want to convert the data set in a .dat file into a CSV file. The data format looks like this:

Each row begins with the sentiment score followed by the text associated with that rating.

[Image of the .dat file]

I want the sentiment value (-1 or 1) in one column and the review text corresponding to that sentiment value in a second column.

WHAT I TRIED SO FAR

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np  
import csv

# read train.dat into a list of lists
datContent = [i.strip().split() for i in open("train.dat").readlines()]

# write it as a new CSV file
with open("train.csv", "wb") as f:
    writer = csv.writer(f)
    writer.writerows(datContent)
def your_func(row):
    return row['Sentiments'] / row['Review']

columns_to_keep = ['Sentiments', 'Review']
dataframe = pd.read_csv("train.csv", usecols=columns_to_keep)
dataframe['new_column'] = dataframe.apply(your_func, axis=1)

print dataframe

Sample screenshot of the resulting train.csv: it has a comma after every word in the review.

[Image: output of train.csv]

If all your rows follow that consistent format, you can use pd.read_fwf. This is a little safer than using read_csv in the event that your second column also contains the delimiter you are attempting to split on.

df = pd.read_fwf('data.txt', header=None, 
        widths=[2, int(1e5)], names=['label', 'text'])

print(df)
   label                       text
0     -1  ieafxf  rjzy xfxk ymi wuy
1      1     lqqm  ceegjnbjpxnidygr
2     -1  zss awoj anxb rfw  kgbvnl

data.txt

-1  ieafxf  rjzy xfxk ymi wuy
+1  lqqm  ceegjnbjpxnidygr
-1  zss awoj anxb rfw  kgbvnl
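The snippet above only reads the file; to finish the conversion, the frame still has to be written out. A minimal sketch, assuming the same two-character label layout as data.txt (the in-memory sample and the `out` buffer are stand-ins I introduced so the example runs without any files on disk):

```python
import io

import pandas as pd

# Hypothetical sample standing in for the .dat file (same layout as data.txt).
sample = io.StringIO(
    "-1  ieafxf  rjzy xfxk ymi wuy\n"
    "+1  lqqm  ceegjnbjpxnidygr\n"
    "-1  zss awoj anxb rfw  kgbvnl\n"
)

# The first 2 characters are the label; everything after that is the text.
df = pd.read_fwf(sample, header=None,
                 widths=[2, int(1e5)], names=["label", "text"])

# Write to an in-memory buffer here; in practice pass a path such as "train.csv".
out = io.StringIO()
df.to_csv(out, index=False)  # index=False keeps the row numbers out of the CSV
csv_text = out.getvalue()
print(csv_text)
```

Because to_csv quotes any field that contains a comma, review text with commas in it should still end up in a single column.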

As mentioned in the comments, read_csv would be appropriate here, provided the sentiment and the review are separated by a tab.

df = pd.read_csv('train_csv.csv', sep='\t', names=['Sentiments', 'Review'])

  Sentiments     Review
0         -1    alskjdf
1          1      asdfa
2          1       afsd
3         -1        sdf
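For reference, the commas after every word in the question come from calling split() with no arguments, which breaks the line on every run of whitespace. A sketch of a standard-library alternative, assuming Python 3 and that the sentiment is the first whitespace-separated token on each line (the helper name `dat_to_rows` and the in-memory sample are hypothetical, not from the original post):

```python
import csv
import io


def dat_to_rows(lines):
    """Yield (sentiment, review) pairs, splitting each line only once."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1: cut after the first token, keep the rest of the line whole
        parts = line.split(None, 1)
        sentiment = int(parts[0])  # int() accepts both "-1" and "+1"
        review = parts[1] if len(parts) > 1 else ""
        yield sentiment, review


# Hypothetical sample; in practice iterate over open("train.dat").
sample = io.StringIO(
    "-1  ieafxf  rjzy xfxk ymi wuy\n"
    "+1  lqqm  ceegjnbjpxnidygr\n"
)
rows = list(dat_to_rows(sample))

# In practice: open("train.csv", "w", newline="") instead of the buffer.
out = io.StringIO(newline="")
writer = csv.writer(out)
writer.writerow(["Sentiments", "Review"])
writer.writerows(rows)
csv_text = out.getvalue()
print(csv_text)
```

Each review stays in one field, so the resulting file can be loaded with pd.read_csv using the default comma separator.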
