
How to use Python to separate a one-column CSV file if the columns have no headings, then save this into a new Excel file?

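I'm quite new to Python and have been googling, but I haven't found a good solution. What I want to do is use Python to automate Excel's "Text to Columns" on a document that has no headers.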

This is my Excel sheet. It is a CSV file in which all the data sits in a single column with no headers.

Ex. hi ho loe time work barber jim joan hello

009 00487 08234 0240 2.0348 20.34829

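The delimiters are spaces and commas.

I want the output saved in another Excel file, with the first two rows removed and the data separated into columns (this can be done with Text to Columns in Excel, but I want to automate it for several Excel sheets):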

009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829
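The splitting step on its own can be sketched with the standard library. This is a minimal illustration, assuming each row arrives as one string and that any run of spaces and/or commas counts as a single delimiter; `split_fields` is a hypothetical helper name, not something from the original post.

```python
import re


def split_fields(line):
    """Split one raw line on runs of spaces and/or commas, dropping empty fields."""
    return [field for field in re.split(r"[ ,]+", line.strip()) if field]


row = split_fields("009 00487 08234 0240 2.0348 20.34829")
print(" | ".join(row))  # 009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829
```

The fields stay as strings, which keeps leading zeros such as `009` intact.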

The code I have written so far is this:

    import os

    import pandas as pd

    path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'

    for root, dirs, files in os.walk(path):
        for f in files:
            # Split on runs of spaces and/or commas; the file has no header row
            df = pd.read_csv(os.path.join(root, f), sep=r'[ ,]+',
                             header=None, engine='python')

In the original file, named data.xlsx, all the data we need is under the Data column.

Code to split the data of a single file into multiple columns:

import pandas as pd 
import numpy as np 

f = 'data.xlsx'

# -- Insert the following code in your `for f in files` loop -- 
file_data = pd.read_excel(f) 

# Since number of values to be split is not known, set the value of `num_cols` to
# number of columns you expect in the modified excel file
num_cols = 20

# Create a dataframe with twenty columns 
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])

# Change the column name of the first column in new_file to "Data"
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})

# Add the value of the first cell in the original file to the first cell of the 
# new excel file
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

# Loop through all rows of original excel file
for index, row in file_data.iterrows():

    # Skip the first row
    if index == 0:
        continue

    # Split the row on spaces and commas. This gives us a list of strings.
    split_data = str(row["Data"]).replace(",", " ").split()
    print(split_data)

    # Convert each element to a float (a number) if we want numbers and not strings
    # split_data = [float(i) for i in split_data]

    # Pad the list so its length matches the number of columns in `new_file`.
    # np.nan represents a missing value; the leading one leaves the first column empty.
    split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)

    # Store the list at a given index using `.loc` method
    new_file.loc[index] = split_data

# Drop every column that is entirely empty (all NaN)
new_file.dropna(axis=1, how='all', inplace=True)

# Get the original excel file name without its extension
new_file_name = f.rsplit(".", 1)[0]

# Save the new excel file at the same location where the original file is. 
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)

This creates a new Excel file named data_modified.xlsx, containing a single sheet.
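The row-by-row loop can also be written as a single vectorized split. The following is only a sketch of that alternative, not the approach used in the code above: it assumes the same single `Data` column, uses a small in-memory frame as a hypothetical stand-in for `pd.read_excel('data.xlsx')`, and splits on both spaces and commas.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_excel("data.xlsx"): one column named "Data"
file_data = pd.DataFrame(
    {"Data": ["some title row", "009 00487,08234 0240", "1 2 3"]}
)

# Skip the first row, then split every remaining cell on runs of spaces/commas.
# expand=True spreads the pieces over as many columns as the longest row needs.
body = file_data["Data"].iloc[1:].astype(str)
split = body.str.split(r"[ ,]+", expand=True)
split.columns = ["col_{}".format(i + 1) for i in range(split.shape[1])]

print(split)

# Saving works the same way as before (needs an Excel writer such as openpyxl):
# split.to_excel("data_modified.xlsx", index=False)
```

With this form there is no need to guess `num_cols` in advance; shorter rows are padded with missing values automatically.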

Summary (code without comments)

import pandas as pd 
import numpy as np 

f = 'data.xlsx'

file_data = pd.read_excel(f) 

num_cols = 20
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

for index, row in file_data.iterrows():

    if index == 0:
        continue

    split_data = str(row["Data"]).replace(",", " ").split()
    split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)
    new_file.loc[index] = split_data

new_file.dropna(axis=1, how='all', inplace=True)
new_file_name = f.rsplit(".", 1)[0]
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
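To run the same steps over a whole folder, as the `for f in files` loop in the question intends, the file discovery and output naming could be factored out roughly as follows. This is a sketch under assumptions: `modified_name` and `find_inputs` are hypothetical helper names, and the accepted extensions are a guess at what the folder contains.

```python
import os
from pathlib import Path


def modified_name(path):
    """Return the output path: same folder, original stem plus '_modified.xlsx'."""
    p = Path(path)
    return p.with_name(p.stem + "_modified.xlsx")


def find_inputs(folder, suffixes=(".csv", ".xlsx")):
    """Yield data files under `folder`, skipping outputs from earlier runs."""
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            p = Path(root) / name
            if p.suffix.lower() in suffixes and not p.stem.endswith("_modified"):
                yield p


# Usage sketch (the per-file processing is the code shown above):
# for src in find_inputs(path):
#     ...build new_file from src...
#     new_file.to_excel(modified_name(src), index=False)
```

Using `Path.stem` instead of splitting on the first dot keeps names such as `data.v2.csv` or folders such as `reports.2024` intact, and skipping `*_modified` files avoids reprocessing earlier output on a second run.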

