How to split a one-column CSV file with Python when the column has no header, and save the result to a new Excel file?

Question

So, I am quite new to Python and have been googling a lot, but have not found a good solution. What I am looking to do is automate Excel's Text to Columns with Python for a document without headers.

Here is the Excel sheet I have.

It is a CSV file where all the data is in one column without headers.

Example:

    hi ho loe time jobs barber jim joan hello
    009 00487 08234 0240 2.0348 20.34829

The delimiters are spaces and commas.
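Since the values are separated by spaces, commas, or both, one option is to treat any run of those characters as a single delimiter using the regular expression `[ ,]+`. A minimal sketch with Python's `re` module; `split_fields` and the sample line are hypothetical:

```python
import re


def split_fields(text):
    """Split on any run of spaces and/or commas, dropping empty pieces."""
    return [part for part in re.split(r"[ ,]+", text.strip()) if part]


# Hypothetical sample row mixing both delimiters
line = "009, 00487 08234,0240 2.0348 20.34829"
print(split_fields(line))
```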

What I want is to save the output to another Excel file, with the first two rows deleted and the data separated into columns. (This can be done with Text to Columns in Excel, but I would like to automate it for several Excel sheets.)

009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829
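One way to get that result without Excel is to let `pandas.read_csv` do the splitting: a regex `sep` handles both delimiters, `header=None` reflects the missing header, `skiprows=2` drops the first two rows, and `dtype=str` keeps leading zeros such as `009`, which would otherwise be parsed as the number 9. A sketch under those assumptions; the function name `split_csv` and the sample rows are hypothetical:

```python
import io

import pandas as pd


def split_csv(source, skip_rows=2):
    """Read a header-less, one-column CSV and split it into separate columns.

    `source` may be a file path or a file-like object.
    """
    return pd.read_csv(
        source,
        sep=r"[ ,]+",        # any run of spaces and/or commas is one delimiter
        engine="python",     # regex separators require the python parser
        header=None,         # the file has no header row
        skiprows=skip_rows,  # drop the first `skip_rows` lines
        dtype=str,           # keep values as text so "009" stays "009"
    )


# Hypothetical sample: two rows to discard, then one data row.
sample = io.StringIO(
    "hi ho loe time jobs barber jim joan hello\n"
    "second row to discard\n"
    "009 00487 08234 0240 2.0348 20.34829\n"
)
print(split_csv(sample).iloc[0].tolist())
```

To produce the Excel file, something like `split_csv(path).to_excel("output.xlsx", index=False, header=False)` should work, provided an Excel engine such as `openpyxl` is installed. This assumes every data row has the same number of values; a row with more fields than the first kept row may cause a parsing error.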

The code I have written so far looks like this:

    import os
    import csv

    import pandas as pd


    path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'

    os.chdir(path)

    for root, dirs, files in os.walk(path):

        for f in files:

            df = pd.read_csv(os.path.join(root, f), delimiter='\t' + ';', engine = 'python')
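For the folder loop itself, `os.walk` yields bare file names, so each one has to be joined with its `root` before it can be opened, and the separator can be the same regex used for a single file. A sketch assuming every `.csv` file under the folder has the same layout; `convert_folder`, the `save` flag, and the `_split.xlsx` suffix are hypothetical choices:

```python
import os

import pandas as pd


def convert_folder(path, skip_rows=2, save=True):
    """Split every .csv file under `path` into columns.

    Returns a dict mapping each input file path to its DataFrame. When `save`
    is True, each result is also written next to its CSV as `<name>_split.xlsx`.
    """
    results = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            if not name.lower().endswith(".csv"):
                continue  # ignore non-CSV files, including any .xlsx outputs
            full_path = os.path.join(root, name)  # os.walk yields bare names
            df = pd.read_csv(
                full_path,
                sep=r"[ ,]+",  # spaces and/or commas act as one delimiter
                engine="python",
                header=None,
                skiprows=skip_rows,
                dtype=str,  # preserve leading zeros
            )
            results[full_path] = df
            if save:
                # Writing .xlsx requires an Excel engine such as openpyxl.
                out_path = os.path.splitext(full_path)[0] + "_split.xlsx"
                df.to_excel(out_path, index=False, header=False)
    return results
```

For example, `convert_folder('C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel')` would write one `_split.xlsx` file next to each CSV. Passing `save=False` returns the DataFrames without writing anything, which makes it easy to inspect the result first.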

Answer 1

The original file is named `data.xlsx`, and all the data we need is in its column `Data`.

Code to split the data into multiple columns for a single file:

    import pandas as pd
    import numpy as np

    f = 'data.xlsx'

    # -- Insert the following code in your `for f in files` loop --
    file_data = pd.read_excel(f)

    # Since the number of values to be split is not known, set `num_cols` to the
    # number of columns you expect in the modified Excel file
    num_cols = 20

    # Create a dataframe with `num_cols` columns
    new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])

    # Rename the first column of new_file to the original header ("Data")
    new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})

    # Copy the value of the first cell in the original file to the first cell of the
    # new Excel file
    new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

    # Loop through all rows of the original Excel file
    for index, row in file_data.iterrows():

        # Skip the first row
        if index == 0:
            continue

        # Split the row on spaces. This gives us a list of strings.
        split_data = file_data.loc[index, "Data"].split(" ")
        print(split_data)

        # Convert each element to a float (a number) if you want numbers, not strings
        # split_data = [float(i) for i in split_data]

        # Pad the list so its length matches the number of columns in `new_file`.
        # np.nan represents a missing value.
        split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)

        # Store the list at the given index using the `.loc` indexer
        new_file.loc[index] = split_data

    # Drop all columns that do not contain a single value
    new_file.dropna(axis=1, how='all', inplace=True)

    # Get the original Excel file name (without extension)
    new_file_name = f.split(".")[0]

    # Save the new Excel file in the same location as the original file
    new_file.to_excel(new_file_name + "_modified.xlsx", index=False)

This creates a new Excel file (with a single sheet) named `data_modified.xlsx`.
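The row-by-row loop works, but pandas can also split a whole text column at once with `Series.str.split(expand=True)`, which creates as many columns as the longest row needs and fills shorter rows with missing values, so `num_cols` does not have to be guessed. A sketch assuming the column is named `Data` as in the answer; `split_data_column` and the sample frame are hypothetical:

```python
import pandas as pd


def split_data_column(file_data, column="Data"):
    """Split one text column on whitespace into as many columns as needed."""
    # expand=True returns a DataFrame; rows with fewer values get missing cells
    return file_data[column].str.split(expand=True)


# Hypothetical in-memory stand-in for pd.read_excel("data.xlsx")
file_data = pd.DataFrame({"Data": ["009 00487 08234 0240 2.0348 20.34829", "1 2 3"]})
print(split_data_column(file_data))
```

To skip the first row as the answer does, slice before splitting, e.g. `split_data_column(file_data.iloc[1:])`.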

Summary (the same code without comments):

    import pandas as pd
    import numpy as np

    f = 'data.xlsx'

    file_data = pd.read_excel(f)

    num_cols = 20
    new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
    new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
    new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

    for index, row in file_data.iterrows():

        if index == 0:
            continue

        split_data = file_data.loc[index, "Data"].split(" ")
        split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)
        new_file.loc[index] = split_data

    new_file.dropna(axis=1, how='all', inplace=True)
    new_file_name = f.split(".")[0]
    new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
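One detail in both versions: `f.split(".")[0]` keeps only the text before the first dot, so a path such as `reports.v2/data.xlsx` would produce `reports_modified.xlsx`. A sketch of an alternative using `pathlib`, which removes only the final extension; `modified_name` is a hypothetical helper:

```python
from pathlib import Path


def modified_name(path):
    """Return the path of `<stem>_modified.xlsx` in the same folder as `path`."""
    p = Path(path)
    # p.stem drops only the last suffix: "my.data.xlsx" -> "my.data"
    return str(p.with_name(p.stem + "_modified.xlsx"))


print(modified_name("data.xlsx"))             # data_modified.xlsx
print(modified_name("reports.v2/data.xlsx"))  # reports.v2/data_modified.xlsx on POSIX
```

The answer's last line could then become `new_file.to_excel(modified_name(f), index=False)`.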

Solution 1: score 0, accepted, 2022-05-02 11:19:14