How to create a dataframe from csv with a colon separator

I am parsing an Outlook message with the following code:

email_content = str(message.Body)
lines_stripped = [line.strip() for line in email_content.split('\r\n') if line.strip() != '']
for line in lines_stripped:
    writer = csv.writer(write_file, delimiter=" ")
    writer.writerow(line.split())

The resulting CSV file looks like this:

Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car

I would like to transform it into something like this:

Car     Color       Comment
Mazda   Green       A very nice Car
Toyota  Black       Okay car

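I would do most of the work in pure Python, using this split_at pattern:

In [11]: def split_at(lst, f):
    ...:     # indices of the items that start a new group
    ...:     inds = [i for i, x in enumerate(lst) if f(x)]
    ...:     # pair each start with the next one; the last group runs to the end
    ...:     for i, j in zip(inds, inds[1:] + [len(lst)]):
    ...:         yield lst[i:j]
    ...: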

With the raw text in cars, split each line into a key/value pair, then group the pairs into one dict per car:

In [12]: cars = [c.split(": ", 1) for c in cars.splitlines() if c]

In [13]: cars
Out[13]:
[['Car', 'Mazda'],
 ['Color', 'Green'],
 ['Comment', 'A very nice Car'],
 ['Car', 'Toyota'],
 ['Color', 'Black'],
 ['Comment', 'Okay car']]

In [14]: pd.DataFrame([dict(c) for c in split_at(cars, lambda x: x[0] == "Car")])
Out[14]:
      Car  Color          Comment
0   Mazda  Green  A very nice Car
1  Toyota  Black         Okay car
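
The session above assumes that `cars` already holds the raw text as a string. A minimal self-contained sketch of the same idea is given here; the names `raw_text`, `pairs`, `records` and `cars_df` are illustrative and not from the original answer. This variant appends `len(items)` as the final boundary, so it also copes with input that contains a single record.

```python
import pandas as pd


def split_at(items, is_start):
    """Yield consecutive slices of items, each beginning where is_start is true."""
    starts = [i for i, item in enumerate(items) if is_start(item)]
    # Pair every start index with the next one; the last slice ends at len(items).
    for begin, end in zip(starts, starts[1:] + [len(items)]):
        yield items[begin:end]


raw_text = """Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car
"""

# One [key, value] pair per non-blank line; split only on the first ": ".
pairs = [line.split(": ", 1) for line in raw_text.splitlines() if line.strip()]

# Every "Car" line starts a new record; each group of pairs becomes one dict.
records = [dict(group) for group in split_at(pairs, lambda pair: pair[0] == "Car")]

cars_df = pd.DataFrame(records)
print(cars_df)
```

Anything that appears before the first "Car" line is dropped by this sketch, which is fine for the sample data but worth checking against real messages.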
##data

from io import StringIO
import pandas as pd

temp = StringIO("""
Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car""")

df = pd.read_csv(temp, sep=':', engine='python', header=None)
df.columns = ['A','B']

##print(df)

         A                 B
0      Car             Mazda
1    Color             Green
2  Comment   A very nice Car
3      Car            Toyota
4    Color             Black
5  Comment          Okay car

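Pivot on column A, then sort each column with pd.isnull as the key so the NaN values sink to the bottom and can be dropped (in recent pandas versions pivot is called on the DataFrame):

df.pivot(columns='A', values='B').apply(sorted, key=pd.isnull).dropna()

Output: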

A      Car   Color           Comment
0    Mazda   Green   A very nice Car
1   Toyota   Black          Okay car
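
As a possible alternative to sorting the NaN values away, each row can be given an explicit record number before pivoting. The sketch below is not from the original answer: the `record` column and the `wide` name are hypothetical, and it assumes that every record starts with a "Car" line and that no value contains a colon.

```python
from io import StringIO

import pandas as pd

raw = StringIO("""Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car
""")

# Blank lines are skipped by default; skipinitialspace drops the space after ":".
df = pd.read_csv(raw, sep=":", header=None, names=["A", "B"], skipinitialspace=True)

# Assumption: a "Car" row opens a new record, so a running count of those
# rows gives every row the number of the record it belongs to.
df["record"] = (df["A"] == "Car").cumsum() - 1

# One row per record, one column per attribute name.
wide = df.pivot(index="record", columns="A", values="B")
print(wide)
```

Because each cell is addressed by its record number, a record with a missing attribute would simply get NaN in that column instead of shifting the values of later records.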

This should work, provided every record has exactly three fields in the same order:

import numpy as np
import pandas as pd
import io

temp = '''
Car: Mazda

Color: Green

Comment: A very nice Car

Car: Toyota

Color: Black

Comment: Okay car

'''
input_csv = io.StringIO(temp)
#input_csv = 'hello.csv'
df = pd.read_csv(input_csv, sep=":", skip_blank_lines=True,header=None)
# integer division: the number of sections must be a whole number
data = np.array_split(df[1].to_numpy(), len(df) // 3)
df2 = pd.DataFrame(data, columns=df[0].unique())
print(df2)

       Car   Color           Comment
0    Mazda   Green   A very nice Car
1   Toyota   Black          Okay car

Using pure Python + pandas (temp is the string defined above):

cars = []
colors = []
comments = []

lines = io.StringIO(temp).readlines()
for line in lines:
  # split on the first colon only, so values that contain ':' stay intact
  if line.startswith('Car'):
    cars.append(line.split(':', 1)[1].strip())
  if line.startswith('Color'):
    colors.append(line.split(':', 1)[1].strip())
  if line.startswith('Comment'):
    comments.append(line.split(':', 1)[1].strip())

df = pd.DataFrame({'car': cars, 'color': colors, 'comment': comments})
df
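
The three parallel lists above only line up when every record has all three fields; if one is missing, the lists end up with different lengths and pd.DataFrame raises an error. A hedged sketch of a more tolerant variant follows. The sample text is made up for illustration (it is not the asker's data), and the rule that a "Car" line starts a new record is an assumption.

```python
import pandas as pd

# Made-up sample: the second record has no Color and one comment contains a colon.
raw_text = """Car: Mazda
Color: Green
Comment: Rated 5: a very nice car

Car: Toyota
Comment: Okay car
"""

records = []
for line in raw_text.splitlines():
    # partition splits on the first colon only, so later colons stay in the value
    key, sep, value = line.partition(":")
    if not sep:
        continue  # blank line or a line without any colon
    key, value = key.strip(), value.strip()
    if key == "Car" or not records:
        records.append({})  # assumption: "Car" opens a new record
    records[-1][key] = value

# Keys that are missing from a record become NaN in the resulting frame.
cars_df = pd.DataFrame(records)
print(cars_df)
```

Collecting one dict per record also avoids hard-coding the field names, so an extra attribute in the message would just show up as another column.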
