
Problems Sorting Data out of a text-file

I have a CSV file imported into a DataFrame and am having trouble sorting the data.

df looks like this:

    Data
0                          <WindSpeed>0.69</WindSpeed>
1                         <PowerOutput>0</PowerOutput>
2              <ThrustCoEfficient>0</ThrustCoEffici...
3                        <RotorSpeed>8.17</RotorSpeed>
4                     <ReactivePower>0</ReactivePower>
5                                         </DataPoint>
6                                          <DataPoint>
7                          <WindSpeed>0.87</WindSpeed>
8                         <PowerOutput>0</PowerOutput

I want it to look like this:

0   Windspeed   Poweroutput
1   0.69        0.0

Here's the code that I have written so far:


import pandas as pd
from pandas.compat import StringIO
import re
import numpy as np


df= pd.read_csv('powercurve.csv', encoding='utf-8',skiprows=42)
df.columns=['Data']


no_of_rows=df.Data.str.count("WindSpeed").sum()/2
rows=no_of_rows.astype(np.uint32)
TRBX=pd.DataFrame(index=range(0,abs(rows)),columns=['WSpd[m/s]','Power[kW]'],dtype='float')
i=0
for i in range(len(df)):

  if 'WindSpeed' in df['Data']:
       TRBX['WSpd[m/s]', i]= re.findall ("'(\d+)'",'Data')


  elif 'Rotorspeed' in df['Data']:
       TRBX['WSpd[m/s]', i]= re.findall ("'(\d+)'",'Data') 

Is this a suitable approach? If so, no values have been written into the TRBX DataFrame so far. Where is my mistake?

The code below should help, provided your df really is in the format you showed:

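import re
import pandas as pd

# Split each row on '<' or '>', e.g. '<WindSpeed>0.69</WindSpeed>'
# becomes ['', 'WindSpeed', '0.69', '/WindSpeed', ''].
split_func = lambda x: re.split('<|>', str(x))

split_series = df.Data.apply(split_func)
data = split_series.apply(lambda x: x[2]).rename('data')          # text between the tags
features = split_series.apply(lambda x: x[1]).rename('features')  # tag name
df = pd.DataFrame(data).set_index(features).T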

You may want to drop some columns that have no data, or fill in some N/A values afterwards. You may also want to rename the variables and series to names that make more sense to you.
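If the goal is one row per data point, as in the desired output in the question, a possible follow-up is to extract the tag name and value with a regular expression and then pivot. The sketch below is only an illustration under some assumptions: the sample rows are made up to mimic the question's frame, every record is assumed to start with a line containing exactly `<DataPoint>`, and the helper name `tags_to_table` is hypothetical.

```python
import pandas as pd

# Hypothetical sample mimicking the single-column frame from the question.
raw = """<DataPoint>
<WindSpeed>0.69</WindSpeed>
<PowerOutput>0</PowerOutput>
<RotorSpeed>8.17</RotorSpeed>
</DataPoint>
<DataPoint>
<WindSpeed>0.87</WindSpeed>
<PowerOutput>0</PowerOutput>
<RotorSpeed>8.20</RotorSpeed>
</DataPoint>"""
df = pd.DataFrame({"Data": raw.splitlines()})


def tags_to_table(frame, column="Data"):
    """Turn one-tag-per-row text into one row per <DataPoint> block."""
    text = frame[column].astype(str).str.strip()
    # Capture the tag name and the text between the opening and closing tag.
    # Lines such as <DataPoint> or </DataPoint> do not match and become NaN.
    parts = text.str.extract(r"^<(?P<tag>\w+)>(?P<value>[^<]*)</")
    # Non-numeric or empty values become NaN instead of raising.
    parts["value"] = pd.to_numeric(parts["value"], errors="coerce")
    # Each opening <DataPoint> line starts a new record.
    parts["record"] = text.eq("<DataPoint>").cumsum()
    wide = parts.dropna(subset=["tag"]).pivot(
        index="record", columns="tag", values="value"
    )
    return wide.reset_index(drop=True).rename_axis(columns=None)


table = tags_to_table(df)
print(table[["WindSpeed", "PowerOutput"]])
```

Note that `pivot` raises a `ValueError` if the same tag appears twice within one record; in that case `pivot_table` with an explicit aggregation would be one alternative. Columns can then be renamed, for example with `table.rename(columns={"WindSpeed": "WSpd[m/s]", "PowerOutput": "Power[kW]"})`.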
