Creating a pandas DataFrame from a raw text file

Question

I have a text file that I want to format into a pandas DataFrame. It is read in as a string of the following form:
print(text)=

product: 1
description: product 1 desc
rating: 7.8
review: product 1 review

product: 2
description: product 2 desc
rating: 4.5
review: product 2 review

product: 3
description: product 3 desc
rating: 8.5
review: product 3 review

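I figured I would split the records by using text.split('\n\n') to group them into a list. I assume that iterating over each block to build a dictionary, and then loading those into a pandas DataFrame, would be a good route, but I am having trouble doing so. Is this the best route, and could someone help me get this into a pandas DataFrame?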

Answer 1

You can use read_csv, create a group id by comparing the first column against the string product, and then pivot:

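import pandas as pd

df = pd.read_csv('file.txt', header=None, sep=': ', engine='python')
# every 'product' row starts a new group; pivot needs keyword arguments in pandas 2.0+
df = df.assign(g = df[0].eq('product').cumsum()).pivot(index='g', columns=0, values=1)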
print (df)
0      description product rating             review
g                                                   
1   product 1 desc       1    7.8   product 1 review
2   product 2 desc       2    4.5   product 2 review
3   product 3 desc       3    8.5   product 3 review
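
Since the question holds the data as a string rather than a file, the same grouping idea can be sketched with io.StringIO standing in for the file path. This is a minimal sketch, not a definitive implementation: the names long_df, group_id and wide are illustrative, and it assumes that no value itself contains ': '.

```python
import io
import pandas as pd

# Sample text copied from the question.
text = """product: 1
description: product 1 desc
rating: 7.8
review: product 1 review

product: 2
description: product 2 desc
rating: 4.5
review: product 2 review

product: 3
description: product 3 desc
rating: 8.5
review: product 3 review
"""

# One row per "key: value" line; blank lines are skipped by read_csv by default.
# Assumption: no value contains ': ', otherwise a line would split into more than two fields.
long_df = pd.read_csv(io.StringIO(text), header=None, sep=': ', engine='python')

# Each 'product' row marks the start of a record, so the running count
# of those rows gives every line the id of the record it belongs to.
group_id = long_df[0].eq('product').cumsum()

# Turn the keys in column 0 into columns, one row per record.
wide = long_df.assign(g=group_id).pivot(index='g', columns=0, values=1)

# Values arrive as strings because column 1 mixes text and numbers.
wide['rating'] = pd.to_numeric(wide['rating'])
print(wide)
```

Unlike a fixed-size reshape, this grouping does not require every record to have the same number of fields; a key missing from one record simply becomes NaN in that row.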

Or create a list of dictionaries:

#https://stackoverflow.com/a/18970794/2901002
data = []
current = {}
with open('file.txt') as f:
    for line in f:
        pair = line.split(':', 1)
        if len(pair) == 2:
            if pair[0] == 'product' and current:
                # start of a new block
                data.append(current)
                current = {}
            current[pair[0]] = pair[1].strip()
    if current:
        data.append(current)
        
df = pd.DataFrame(data)
print (df)
  product     description rating            review
0       1  product 1 desc    7.8  product 1 review
1       2  product 2 desc    4.5  product 2 review
2       3  product 3 desc    8.5  product 3 review
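
The route proposed in the question, text.split('\n\n') followed by one dictionary per block, also works directly on the in-memory string. The following is a minimal sketch under that assumption; parse_blocks is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Sample text copied from the question.
text = """product: 1
description: product 1 desc
rating: 7.8
review: product 1 review

product: 2
description: product 2 desc
rating: 4.5
review: product 2 review

product: 3
description: product 3 desc
rating: 8.5
review: product 3 review
"""

def parse_blocks(raw):
    """Split raw text on blank lines and turn each block into a dict."""
    records = []
    for block in raw.strip().split('\n\n'):
        record = {}
        for line in block.splitlines():
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(':')
            if sep:
                record[key.strip()] = value.strip()
        if record:
            records.append(record)
    return records

df = pd.DataFrame(parse_blocks(text))
# All values are strings at this point; convert the numeric column explicitly.
df['rating'] = pd.to_numeric(df['rating'])
print(df)
```

Because each record is built independently, a block with a missing or extra field still loads; pandas fills the gaps with NaN.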

Or reshape every 4 values into a 2D NumPy array and pass it to the DataFrame constructor:

df = pd.read_csv('file.txt', header=None, sep=': ', engine='python')

df = pd.DataFrame(df[1].to_numpy().reshape(-1, 4), columns=df[0].iloc[:4].tolist())
print (df)
  product     description rating            review
0       1  product 1 desc    7.8  product 1 review
1       2  product 2 desc    4.5  product 2 review
2       3  product 3 desc    8.5  product 3 review
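
The reshape approach silently relies on every record having exactly 4 fields in the same order. A hedged sketch of how that assumption could be checked before reshaping is shown below; reshape_records is a hypothetical helper, and io.StringIO stands in for the file so the example is self-contained.

```python
import io
import pandas as pd

# Sample text copied from the question.
text = """product: 1
description: product 1 desc
rating: 7.8
review: product 1 review

product: 2
description: product 2 desc
rating: 4.5
review: product 2 review

product: 3
description: product 3 desc
rating: 8.5
review: product 3 review
"""

def reshape_records(long_df, n_fields):
    """Reshape key/value rows into one row per record, validating the layout first."""
    keys = long_df[0].to_numpy()
    values = long_df[1].to_numpy()
    if len(keys) % n_fields != 0:
        raise ValueError('number of lines is not a multiple of n_fields')
    key_grid = keys.reshape(-1, n_fields)
    # Every record must list the same keys in the same order as the first one.
    if not (key_grid == key_grid[0]).all():
        raise ValueError('records do not share the same key order')
    return pd.DataFrame(values.reshape(-1, n_fields), columns=key_grid[0].tolist())

long_df = pd.read_csv(io.StringIO(text), header=None, sep=': ', engine='python')
df = reshape_records(long_df, n_fields=4)
print(df)
```

If the check fails, the grouping or dictionary approaches are the safer choice, since neither depends on a fixed record size.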
