
How to use Python to separate a one-column CSV file if the columns have no headings, then save this into a new Excel file?

I am quite new to Python and have been googling a lot, but have not found a good solution. What I am looking to do is automate Excel's text-to-columns feature using Python, for an Excel document without headers.

Here is the Excel sheet I have:

It is a CSV file where all the data is in one column without headers, e.g.

hi ho loe time jobs barber jim joan hello

009 00487 08234 0240 2.0348 20.34829

The delimiter is space and comma.

What I want is for the output to be saved in another Excel file, with the first two rows deleted and the data separated into columns. (This can be done using Text to Columns in Excel, but I would like to automate it for several Excel sheets.)

009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829

The code I have written so far is:

    import os
    import csv

    import pandas as pd

    path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'

    os.chdir(path)

    for root, dirs, files in os.walk(path):
        for f in files:
            df = pd.read_csv(f, delimiter='\t' + ';', engine='python')

The original file, named data.xlsx:

[image: contents of data.xlsx]

This means all the data we need is under the column Data.

Code to split the data into multiple columns for a single file:

    import pandas as pd
    import numpy as np

    f = 'data.xlsx'

    # -- Insert the following code in your `for f in files` loop --
    file_data = pd.read_excel(f)

    # Since the number of values to be split is not known in advance, set
    # `num_cols` to the number of columns you expect in the modified Excel file.
    num_cols = 20

    # Create an empty dataframe with `num_cols` columns
    new_file = pd.DataFrame(columns=["col_{}".format(i) for i in range(num_cols)])

    # Rename the first column of new_file to match the original ("Data")
    new_file = new_file.rename(columns={"col_0": file_data.columns[0]})

    # Copy the first cell of the original file into the first cell of the
    # new Excel file
    new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

    # Loop through all rows of the original Excel file
    for index, row in file_data.iterrows():

        # Skip the first row
        if index == 0:
            continue

        # Split the row on spaces. This gives us a list of strings.
        split_data = file_data.loc[index, "Data"].split(" ")
        print(split_data)

        # Convert each element to a float if we want numbers and not strings
        # split_data = [float(i) for i in split_data]

        # Pad the list with np.nan (no value) so its length matches the number
        # of columns in `new_file`. The leading np.nan leaves the "Data" column
        # empty. A row with more than num_cols - 1 values will raise an error.
        split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)

        # Store the list at the given index using `.loc`
        new_file.loc[index] = split_data

    # Drop all columns that do not contain a single value
    new_file.dropna(axis=1, how='all', inplace=True)

    # Get the original Excel file name without its extension
    new_file_name = f.split(".")[0]

    # Save the new Excel file in the same location as the original file
    new_file.to_excel(new_file_name + "_modified.xlsx", index=False)

This creates a new Excel file (with a single sheet) named data_modified.xlsx:

[image: contents of data_modified.xlsx]
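As an alternative to the row-by-row loop, pandas can usually do the split in one vectorized step with `Series.str.split(..., expand=True)`. The following is a minimal sketch, not part of the code above: the helper name `split_text_column` and the pattern `[,\s]+` (runs of commas or whitespace, chosen to match the "space and comma" delimiter in the question) are assumptions for illustration.

```python
import pandas as pd


def split_text_column(series, pattern=r"[,\s]+"):
    """Split each string in `series` into columns on runs of commas/whitespace.

    `split_text_column` and the default `pattern` are hypothetical choices
    made for this sketch.
    """
    # Work on strings so values with leading zeros such as "009" are kept
    # as text. Note that missing cells become the literal string "nan".
    cleaned = series.astype(str).str.strip()

    # expand=True returns a DataFrame with one column per token;
    # shorter rows are padded with missing values.
    wide = cleaned.str.split(pattern, expand=True)
    wide.columns = ["col_{}".format(i) for i in range(wide.shape[1])]
    return wide


# Small in-memory example using the sample row from the question
example = pd.Series(["009 00487 08234 0240 2.0348 20.34829", "1, 2 3"])
result = split_text_column(example)
print(result)
```

With the answer's variables, this would be called as `split_text_column(file_data["Data"])`, and the number of columns would not need to be guessed up front through `num_cols`.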

Summary (code without comments):

    import pandas as pd
    import numpy as np

    f = 'data.xlsx'

    file_data = pd.read_excel(f)

    num_cols = 20
    new_file = pd.DataFrame(columns=["col_{}".format(i) for i in range(num_cols)])
    new_file = new_file.rename(columns={"col_0": file_data.columns[0]})
    new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

    for index, row in file_data.iterrows():

        if index == 0:
            continue

        split_data = file_data.loc[index, "Data"].split(" ")
        split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)
        new_file.loc[index] = split_data

    new_file.dropna(axis=1, how='all', inplace=True)
    new_file_name = f.split(".")[0]
    new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
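Since the question asks to automate this for several files and to drop the first two rows, the idea could be extended to a whole folder of raw CSV files. This is only a sketch under stated assumptions: the function names `read_split_rows` and `convert_folder`, the `*.csv` glob, the UTF-8 encoding, and the `_modified.xlsx` suffix are illustrative choices, and writing `.xlsx` files assumes an Excel engine such as openpyxl is installed.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed delimiter: any run of commas and/or whitespace
SPLIT_PATTERN = re.compile(r"[,\s]+")


def read_split_rows(csv_path, skip_rows=2):
    """Read a one-column text/CSV file and split every line into columns.

    The first `skip_rows` lines are dropped, as requested in the question.
    """
    with open(csv_path, encoding="utf-8") as handle:
        lines = [line.strip() for line in handle]

    rows = []
    for line in lines[skip_rows:]:
        # Discard empty tokens, e.g. from a trailing comma
        tokens = [tok for tok in SPLIT_PATTERN.split(line) if tok]
        if tokens:
            rows.append(tokens)

    # Rows of unequal length are padded with missing values.
    # Values stay strings, so leading zeros ("009") are preserved.
    return pd.DataFrame(rows)


def convert_folder(folder, skip_rows=2):
    """Convert every *.csv file in `folder` to a headerless .xlsx file."""
    written = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        table = read_split_rows(csv_path, skip_rows=skip_rows)
        out_path = csv_path.with_name(csv_path.stem + "_modified.xlsx")
        table.to_excel(out_path, index=False, header=False)
        written.append(out_path)
    return written
```

Calling `convert_folder(path)` with the folder from the question would write one `_modified.xlsx` file next to each CSV. Reading the lines directly, instead of going through `pd.read_csv`, sidesteps the problem of rows with differing numbers of fields.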


