
How to create a csv file from a txt file with a column separator after "x" characters

I have a txt file that looks like this:

MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area

I need to create a csv file from this. I have little experience with it, but I have been learning as I go, although likely not efficiently.

My issue right now is that when I use pandas, it creates columns after the ",". What I need is for the column separator to come after the code on the left, "MT0113820000000". Although the codes do change, they are all the same length.

Thanks in advance, I know this is a really noobie question.

Here's my code currently:

import pandas as pd

dataframe1 = pd.read_csv("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")  
dataframe1.to_csv('output_.csv', index = None)

And the output:

COLUMN 1                                COLUMN 2
MT0111500000000 Anniston-Oxford-Jacksonville     | AL Metropolitan Statistical Area

Alternatively, using read_fwf as mentioned in a comment above:

from io import StringIO
import pandas as pd

testdata = '''\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area
'''

buff = StringIO(testdata)

df = pd.read_fwf(buff, header=None, colspecs=[(0, 15), (16, 64 * 1024)])

print(df.to_csv(index=False, columns=[0, 1], header=['COLUMN1', 'COLUMN2']))

That's not a CSV, and I don't see a convenient way of convincing read_csv to do the right thing. Luckily, there seems to be an easy rule here: the stuff before the first space, then the stuff after. str.split with a maximum of one split does that.

import pandas as pd
from pathlib import Path

#in_file = Path("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")
in_file = Path("test.txt")
out_file = in_file.with_name(in_file.stem + "_").with_suffix(".csv")

# test data
open(in_file, "w").write("""\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area""")

# convert to csv: split each line once, at the first space
pd.DataFrame([line.strip().split(" ", 1) for line in open(in_file)],
    columns=["COLUMN1", "COLUMN2"]).to_csv(out_file, index=False, header=False)

# visual verification
print(open(out_file).read())

Output

MT0111500000000,"Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area"
MT0112220000000,"Auburn-Opelika, AL Metropolitan Statistical Area"
MT0113820000000,"Birmingham-Hoover, AL Metropolitan Statistical Area"

In the pandas example I wrote the csv immediately, so the dataframe is released from memory right away. You could also do this with the csv module, writing one line at a time. This uses less memory because it doesn't have to hold the entire file in memory. And since csv is part of the standard Python library, there is no external dependency on pandas. Adding a bit of file name handling:

import csv
from pathlib import Path

#in_file = Path("C:/Users/andre/Desktop/bea_api_test/python-bureau-economic-analysis-api-client/testttt/output.txt")
in_file = Path("test.txt")
out_file = in_file.with_name(in_file.stem + "_").with_suffix(".csv")

# test data
open(in_file, "w").write("""\
MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area
MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  
MT0113820000000 Birmingham-Hoover, AL Metropolitan Statistical Area""")

# convert to csv
# newline="" lets the csv module control line endings (avoids blank rows on Windows)
with open(in_file) as infp, open(out_file, "w", newline="") as outfp:
    writer = csv.writer(outfp)
    writer.writerows(line.strip().split(" ",1) for line in infp)

# visual verification
print(open(out_file).read())
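
Since the question says the codes all have the same length, a variation is to cut each line at a fixed offset instead of at the first space. The following is only a sketch: the width of 15 is read off the sample data, and `split_fixed` and `CODE_WIDTH` are hypothetical names introduced here for illustration. It uses in-memory buffers so it runs without touching any files.

```python
import csv
from io import StringIO

# Assumption: every code is exactly 15 characters, as in the sample data.
CODE_WIDTH = 15

def split_fixed(line, width=CODE_WIDTH):
    """Return [code, rest] by cutting the line after `width` characters."""
    line = line.strip()
    return [line[:width], line[width:].strip()]

# Two sample lines; the second has trailing spaces, like the original file.
sample = StringIO(
    "MT0111500000000 Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area\n"
    "MT0112220000000 Auburn-Opelika, AL Metropolitan Statistical Area  \n"
)

out = StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(split_fixed(line) for line in sample)

print(out.getvalue())
```

Unlike splitting on the first space, slicing would still work if a code ever contained a space, but it gives wrong results if the width assumption does not hold for some line.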

You can split the data at the first occurrence of the whitespace:

import pandas as pd

# read each line as a single column (the squeeze argument was removed in pandas 2.0)
data = pd.read_table("data.txt", header=None)[0].str.split(" ", n=1)
df = pd.DataFrame(data.tolist(), columns=["column1", "column2"])

df.to_csv("df.csv", index=False)
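
In CSV output like the kind shown earlier on this page, the second field is wrapped in quotes because it contains a comma. As a quick sanity check, here is a minimal sketch, using only the standard csv module and the sample rows copied into a string, showing that a CSV reader treats each quoted name as one column:

```python
import csv
from io import StringIO

# CSV text copied from the sample output on this page.
csv_text = '''\
MT0111500000000,"Anniston-Oxford-Jacksonville, AL Metropolitan Statistical Area"
MT0112220000000,"Auburn-Opelika, AL Metropolitan Statistical Area"
MT0113820000000,"Birmingham-Hoover, AL Metropolitan Statistical Area"
'''

# csv.reader honours the quotes, so the comma inside the name does not split it.
rows = list(csv.reader(StringIO(csv_text)))

for code, name in rows:
    print(code, "|", name)
```

Each row comes back with exactly two fields, which is the layout the question asks for.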


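Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.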
