
Text to Pandas dataframe

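I have been trying to convert this text file to a dataframe, but it keeps giving me either an error or NaN. I need some guidance. Below are my code and a sample of the text. A sample of materials.txt looks like this: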

_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 2' Generic-013
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011
_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043
_accurender\Concrete\Exposed Aggregate, Pink Concrete-028
_accurender\Concrete\Exposed Aggregate, Tan Concrete-029
_accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052
_accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030
_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029

df = pd.read_csv('materials.txt', sep=';', header=None,names=['Revit_type', 'Material_Category', 'Material_Name', 'Material_Description'], encoding = 'latin')

I would like the dataframe to look like this:

     Material_Type   Material_Category    Material_Name    Material_Description

0    _accurender      Masonry              Brick            Brown,_8",Soldier   Masonry-029

Please help. Thank you.

Hope this helps. Before running it, though, you need to edit your txt file so that it follows this format:

_accurender\Ceiling\Acoustic Tile_Standard, Gray, 2' x 2' Generic-051 
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 2' Generic-013 
_accurender\Ceiling\Acoustic Tile_Standard, White, 2' x 4' Generic-011 
_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043 
_accurender\Concrete\Exposed Aggregate, Pink Concrete-028 
_accurender\Concrete\Exposed Aggregate, Tan Concrete-029 
_accurender\Exterior\Shakes\Roofing,Shake,Square, Non-Uniform Weathering Generic-052 
_accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030 
_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029

After editing the file, just run this code.

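import pandas as pd

Material_Type = []
Material_Category = []
Material_Name = []
Material_Description = []

# Read the edited file; the with-block closes it automatically.
with open('sample.txt', 'r') as file1:
    Lines = file1.readlines()

for line in Lines:
    line = line.strip()  # drop the trailing newline and stray spaces
    if not line:
        continue  # skip blank lines
    # Split on the first three backslashes only, so deeper paths stay in the last field.
    res_list = line.split('\\', 3)
    if len(res_list) < 3:
        continue  # too few fields to fill the columns
    Material_Type.append(res_list[0])
    Material_Category.append(res_list[1])
    if len(res_list) == 3:
        # Only two backslashes: take the text before the first comma as the name
        # and the rest as the description.
        name, _, description = res_list[2].partition(',')
        Material_Name.append(name.strip())
        Material_Description.append(description.strip())
    else:
        Material_Name.append(res_list[2])
        Material_Description.append(res_list[3])

final_dict = {
    "Material_Type": Material_Type,
    "Material_Category": Material_Category,
    "Material_Name": Material_Name,
    "Material_Description": Material_Description,
}
df = pd.DataFrame(final_dict)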
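
The same split could also be written with pandas string methods instead of an explicit loop. This is only a sketch under a few assumptions: the three sample lines are held in memory rather than read from the file, and for lines with only two backslashes the text before the first comma is taken as the name. That comma rule is a guess at the intended layout, not something stated in the question.

```python
import pandas as pd

# Sample lines held in memory for illustration; real data would come from the file.
lines = [
    r'_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029',
    r'_accurender\Concrete\Exposed Aggregate, Pink Concrete-028',
    r'_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043',
]
columns = ["Material_Type", "Material_Category", "Material_Name", "Material_Description"]

# Split on at most three backslashes; shorter lines get a missing value in the last column.
df = pd.Series(lines).str.split("\\", n=3, expand=True)
df = df.reindex(columns=range(4))  # guarantee four columns even if every line is short
df.columns = columns

# Assumption: when the description is missing, split the name at its first comma.
short = df["Material_Description"].isna()
pieces = df.loc[short, "Material_Name"].str.partition(",")
df.loc[short, "Material_Name"] = pieces[0].str.strip()
df.loc[short, "Material_Description"] = pieces[2].str.strip()

print(df)
```

Lines with more than three backslashes keep the remaining path inside Material_Description, so nothing is dropped silently.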

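Try using sep='\\' instead of sep=';'. Note that this will not fix the problem for every entry in your input file, because its formatting appears to be somewhat inconsistent.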
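
As a rough illustration of that suggestion, the sketch below reads two of the sample lines that contain exactly three backslashes, then counts the fields in a couple of other lines to show the inconsistency. The in-memory strings and the `quoting=csv.QUOTE_NONE` setting are assumptions added here, the latter so that the inch marks (") are treated as ordinary characters.

```python
import csv
import io
from collections import Counter

import pandas as pd

columns = ["Revit_type", "Material_Category", "Material_Name", "Material_Description"]

# Two sample lines that each contain exactly three backslashes.
regular = "\n".join([
    r'_accurender\Masonry\Brick\Brown, Non-uniform,_8",Running Masonry-030',
    r'_accurender\Masonry\Brick\Brown,_8",Soldier Masonry-029',
])

df = pd.read_csv(
    io.StringIO(regular),
    sep="\\",                # a single backslash character
    header=None,
    names=columns,
    quoting=csv.QUOTE_NONE,  # do not treat " as a quote character
)
print(df)

# Other lines have a different number of backslash-separated fields.
irregular = [
    r'_accurender\Concrete\Exposed Aggregate, Pink Concrete-028',
    r'_accurender\Ceramic Tile\Mosaic\Square\2"_Salmon,High Gloss Ceramic-043',
]
field_counts = Counter(
    line.count("\\") + 1 for line in regular.splitlines() + irregular
)
print(field_counts)
```

Because the field count varies from line to line, the separator change alone is unlikely to give four clean columns for the whole file; the irregular lines would need to be pre-processed or handled separately.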

