
How to use Python to separate a one-column CSV file with no column headings, then save the result to a new Excel file?

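I am quite new to Python and have been googling a lot, but I have not found a good solution. What I want to do is automate Excel's Text to Columns with Python, on an Excel document that has no headers.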

This is my Excel sheet.

It is a CSV file in which all the data sits in a single column without headers.

Ex. Hi Ho loe time work barber Jim Joan hello

009 00487 08234 0240 2.0348 20.34829

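The delimiters are spaces and commas.

What I want as output is another Excel file in which the first two rows are deleted and the data is separated into columns (this can be done with Text to Columns in Excel, but I want to automate it for several Excel sheets):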

009 | 00487 | 08234 | 0240 | 2.0348 | 20.34829

The code I have written so far is this:

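    import os
    import csv

    import pandas as pd

    path = 'C:/Users/ionan/OneDrive - Universiteit Utrecht/Desktop/UCU/test_excel'

    os.chdir(path)

    for root, dirs, files in os.walk(path):
        for f in files:
            # Note: '\t' + ';' is read as the regex "tab followed by semicolon",
            # which does not match the space/comma delimiters described above.
            df = pd.read_csv(os.path.join(root, f), delimiter='\t' + ';', engine='python')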

The original file is named data.xlsx. All the data we need is under the Data column.

Code to split the data of a single file into multiple columns:

import pandas as pd 
import numpy as np 

f = 'data.xlsx'

# -- Insert the following code in your `for f in files` loop -- 
file_data = pd.read_excel(f) 

# Since number of values to be split is not known, set the value of `num_cols` to
# number of columns you expect in the modified excel file
num_cols = 20

# Create a dataframe with twenty columns 
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])

# Change the column name of the first column in new_file to "Data"
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})

# Add the value of the first cell in the original file to the first cell of the 
# new excel file
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

# Loop through all rows of original excel file
for index, row in file_data.iterrows():

    # Skip the first row
    if index == 0:
        continue

    # Split the row by `space`. This gives us a list of strings.
    split_data = file_data.loc[index, "Data"].split(" ")
    print(split_data)

    # Convert each element to a float (a number) if we want numbers and not strings
    # split_data = [float(i) for i in split_data]

    # Make sure the size of the list matches the number of columns in `new_file`.
    # np.nan represents no value (the np.NaN alias was removed in NumPy 2.0).
    split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)

    # Store the list at a given index using `.loc` method
    new_file.loc[index] = split_data

# Drop all the columns where there is not a single number
new_file.dropna(axis=1, how='all', inplace=True)

# Get the original excel file name
new_file_name = f.split(".")[0]

# Save the new excel file at the same location where the original file is. 
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)

This creates a new Excel file named data_modified.xlsx (with a single sheet).
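The row-by-row loop above could probably also be written with pandas' vectorized string methods. The following is only a sketch under a few assumptions: the helper name `split_data_column` and the `col_N` output names are made up for illustration, the column is assumed to be called `Data` as in the example file, and the first data row is dropped (instead of being kept in the first cell) because the question asks for the leading text rows to be removed. It splits on runs of spaces and/or commas, since the question names both as delimiters.

```python
import pandas as pd


def split_data_column(df: pd.DataFrame, column: str = "Data") -> pd.DataFrame:
    """Split one text column into several columns on spaces and/or commas.

    Hypothetical helper: the name and the `col_N` labels are illustrative only.
    """
    # Skip the first data row, mirroring the `if index == 0: continue` step
    values = df[column].iloc[1:].astype(str).str.strip()

    # A multi-character pattern is treated as a regular expression by pandas;
    # expand=True returns one column per piece, padding short rows with missing values
    parts = values.str.split(r"[ ,]+", expand=True)

    parts.columns = [f"col_{i + 1}" for i in range(parts.shape[1])]
    return parts
```

The pieces stay as strings, so leading zeros such as `009` survive. Converting them to floats, as the commented-out line in the loop does, would turn `009` into `9.0`.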

Summary (code without comments)

import pandas as pd 
import numpy as np 

f = 'data.xlsx'

file_data = pd.read_excel(f) 

num_cols = 20
new_file = pd.DataFrame(columns = ["col_{}".format(i) for i in range(num_cols)])
new_file = new_file.rename(columns = {"col_0": file_data.columns[0]})
new_file.loc[0, new_file.columns[0]] = file_data.iloc[0, 0]

for index, row in file_data.iterrows():

    if index == 0:
        continue

    split_data = file_data.loc[index, "Data"].split(" ")
    split_data = [np.nan] + split_data + [np.nan] * (num_cols - len(split_data) - 1)
    new_file.loc[index] = split_data

new_file.dropna(axis=1, how='all', inplace=True)
new_file_name = f.split(".")[0]
new_file.to_excel(new_file_name + "_modified.xlsx", index=False)
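Since the question mentions plain CSV files and several of them, here is a rough sketch of how the same idea might be applied to every file in a folder without going through `read_excel`. Everything named here (`parse_lines`, `convert_folder`, the `*.csv` pattern, UTF-8 encoding, skipping exactly two rows) is an assumption for illustration, not something taken from the original files. Writing `.xlsx` output also assumes an Excel engine such as openpyxl is installed.

```python
import re
from pathlib import Path

import pandas as pd

# One or more spaces and/or commas count as a single delimiter
DELIMITERS = re.compile(r"[ ,]+")


def parse_lines(lines, skip_rows=2):
    """Split each non-empty line on the delimiters, ignoring the first `skip_rows` lines."""
    rows = [
        DELIMITERS.split(line.strip())
        for line in list(lines)[skip_rows:]
        if line.strip()
    ]
    # Rows of unequal length are padded with missing values by the constructor
    return pd.DataFrame(rows)


def convert_folder(folder, pattern="*.csv"):
    """Write `<name>_modified.xlsx` next to every matching file in `folder`."""
    for csv_path in sorted(Path(folder).glob(pattern)):
        # Assumes the files are UTF-8 text; adjust the encoding if they are not
        with open(csv_path, encoding="utf-8") as handle:
            table = parse_lines(handle)
        target = csv_path.with_name(csv_path.stem + "_modified.xlsx")
        table.to_excel(target, index=False, header=False)
```

Calling `convert_folder('path/to/folder')` (a placeholder path) would process each CSV in turn. Values are kept as strings so that entries like `00487` keep their leading zeros.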

