Import Data in Python with Pandas just for specific rows

Question

I am really new to Python and I hope this is the right community for my question. Sorry if it is not.

I am trying to import data from a .txt file with pandas. The file looks like this:

# Raman Scattering Spectrum
# X-Axis:  Frequency (cm-1)
# Y-Axis:  Intensity (10-36 m2 cm/sr)

# Harmonic Data

# Peak information (Harmonic)
#                  X                   Y
#      20.1304976000        1.1465331676
#      25.5433266000        6.0306906544
...
#    3211.8081700000        0.3440113123
#    3224.5118500000        0.8814596030

# Plot Curve (Harmonic)
#                  X                   Y               DY/DX
    0.0000000000        8.4803414671        0.6546818124
    8.0000000000       17.8239097502        2.0146387573

I have already written this piece of code to import my data:

import pandas as pd
# import matplotlib.pyplot as plt
# import scipy as sp

data = pd.read_csv('/home/andrea/Schreibtisch/raman_gauss.txt', sep='\t')
data

With this I get just one column. If I try it with

pd.read_fwf(file)

I get 3 columns, but the X and Y values from Plot Curve (Harmonic) end up in one column.

Now I want to import the X, Y and DY/DX values from Plot Curve (Harmonic) into separate variables or containers, as Series. The hard part for me is how to split X and Y into 2 columns, and how to tell Python that the import should start 2 lines after the line containing 'Plot Curve (Harmonic)'.

I have thought about it, and my idea was to check every line for the string 'Plot Curve (Harmonic)'. That gives me a new Series of True or False values. Then I need to find out which line number is True for the search string, and start the import from that line. I am still too much of a newbie in Python, and not yet familiar enough with the documentation, to find the command I need.

Does anyone have a tip for me, such as which command to use? And how do I split the columns?

Thank you very much!

Answer 1

You can read the file as follows.

Code

import pandas as pd
import re  # Regex to parse header

def get_data(filename):
    # Find the row containing 'Plot Curve (Harmonic)'
    with open(filename, 'r') as f:
        for i, line in enumerate(f):
            if 'Plot Curve (Harmonic)' in line:
                start_row = i
                # Parse the header on the next line;
                # [1:] skips the '#' at the beginning of the line
                header = re.findall(r'\S+', next(f))[1:]
                break
        else:
            start_row = None  # not found

    # Compare against None explicitly: row 0 is a valid match but is falsy
    if start_row is None:
        return None

    # delimiter=r"\s+": the numbers are separated by multiple spaces
    # skiprows=start_row + 2: skip to the data
    #   (skips the marker row and the header row)
    #   reference: https://thispointer.com/pandas-skip-rows-while-reading-csv-file-to-a-dataframe-using-read_csv-in-python/
    # names=header: assigns the column names
    df = pd.read_csv(filename, delimiter=r"\s+", skiprows=start_row + 2,
                     names=header)

    return df

Test

df = get_data('data.txt')
print(df)

data.txt file

# Raman Scattering Spectrum
# X-Axis:  Frequency (cm-1)
# Y-Axis:  Intensity (10-36 m2 cm/sr)

# Harmonic Data

# Peak information (Harmonic)
#                  X                   Y
#      20.1304976000        1.1465331676
#      25.5433266000        6.0306906544
...
#    3211.8081700000        0.3440113123
#    3224.5118500000        0.8814596030

# Plot Curve (Harmonic)
#                  X                   Y               DY/DX
    0.0000000000        8.4803414671        0.6546818124
    8.0000000000       17.8239097502        2.0146387573

Output

    X          Y     DY/DX
0  0.0   8.480341  0.654682
1  8.0  17.823910  2.014639
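
As a shorter alternative to locating the start row by hand: in the sample file every line outside the curve data starts with '#', so `read_csv` can be asked to drop those lines with `comment='#'`. The sketch below assumes that this holds for the whole real file (the part elided with `...` is not shown, so this is an assumption) and that the column names are typed in by hand instead of being parsed from the header. The name `read_uncommented` and the shortened `SAMPLE` text are made up for illustration.

```python
import io

import pandas as pd

# Shortened copy of the sample file; the real file has more rows.
SAMPLE = """# Raman Scattering Spectrum
# X-Axis:  Frequency (cm-1)

# Peak information (Harmonic)
#                  X                   Y
#      20.1304976000        1.1465331676

# Plot Curve (Harmonic)
#                  X                   Y               DY/DX
    0.0000000000        8.4803414671        0.6546818124
    8.0000000000       17.8239097502        2.0146387573
"""


def read_uncommented(source):
    """Read only the lines that do not start with '#'."""
    return pd.read_csv(
        source,
        comment="#",      # drop everything from '#' to the end of the line
        sep=r"\s+",       # columns are separated by runs of spaces
        header=None,      # the header line is commented out, so name columns here
        names=["X", "Y", "DY/DX"],
    )


curve = read_uncommented(io.StringIO(SAMPLE))
print(curve.shape)
```

For a file on disk, pass the path instead of the `io.StringIO` object. The trade-off is that this version relies on the peak table being commented out; `get_data` above does not need that assumption.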

Answer 2

First, thank you very much for your answer; it helps me a lot. I tried to use the comment function, but I cannot add a line break there.

I want to plot the data that I can now extract from the file, but when I add my standard plot code:

plt.plot(df.X, df.Y)
plt.legend(['simulated'])
plt.xlabel('raman_shift')
plt.ylabel('intensity')
plt.grid(True)

plt.show()

I now get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-240-8594f8545868> in <module>
     28 plt.plot(df.X, df.Y)
     29 plt.legend(['simulated'])
---> 30 plt.xlabel('raman_shift')
     31 plt.ylabel('intensity')
     32 plt.grid(True)

TypeError: 'str' object is not callable

I have not changed anything in the label call. In my other project these lines work well. I also don't know how to read out the DY/DX column, since the '/' cannot be used in the column name. Do you have a tip for me again? :)

Thanks.
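
A hedged sketch for both points in this follow-up. For the column, bracket notation accepts any column name, so `df["DY/DX"]` works where `df.DY/DX` cannot. For the error, one common cause of "'str' object is not callable" on `plt.xlabel(...)` is an earlier cell in the same session that ran `plt.xlabel = '...'` (an assignment instead of a call). Whether that happened here is a guess, since the earlier cells are not shown. The example reproduces the mechanism with a stand-in module called `fake_plt` (hypothetical, used so the snippet runs without matplotlib), and `DY_DX` is a made-up replacement name.

```python
import types

import pandas as pd

# Two rows copied from the "Plot Curve (Harmonic)" block of the question.
df = pd.DataFrame({
    "X": [0.0, 8.0],
    "Y": [8.4803414671, 17.8239097502],
    "DY/DX": [0.6546818124, 2.0146387573],
})

# Attribute access (df.X) needs a valid Python identifier;
# bracket access works for any column name, including "DY/DX".
slope = df["DY/DX"]

# Optional: rename the column so attribute access works as well.
renamed = df.rename(columns={"DY/DX": "DY_DX"})

# Stand-in for matplotlib.pyplot, only to show how the error can arise.
fake_plt = types.ModuleType("fake_plt")
fake_plt.xlabel = lambda text: text   # behaves like a function at first
fake_plt.xlabel = "raman_shift"       # accidental assignment replaces it

try:
    fake_plt.xlabel("raman_shift")    # now this calls a string
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'str' object is not callable
```

If an assignment like this is the cause, restarting the kernel and importing `matplotlib.pyplot` again brings back the original function. The `ipython-input-240` in the traceback suggests a long-running session, where such a leftover assignment is easy to miss.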
